This Week in Fluvio #51
Fluvio is a distributed, programmable streaming platform written in Rust.
Welcome to the 51st edition of this week in Fluvio.
For this edition, we have more details on a community contribution.
Carson Rajcan presented a really cool contribution to the Fluvio open source project to our team.
In this project, Carson developed the functionality to apply SmartModule transformations to the Producer - PR #3014
Here is Carson’s experience…
The goal: get the Stream Processing Unit (SPU) to apply SmartModule transformations to Producer requests coming from the CLI (before commit).
At present, Fluvio SmartModules are only applied to Consumer requests, after the records have been read from disk. Shaping the data before it enters the topic spares the Consumer and other downstream services from this burden.
While investigating another issue, I had recently gained some insight into how the Stream Processing Unit handles Producer requests, as well as how it uses the Fluvio Smart Engine to transform records for the Consumer.
Critically, I noticed that the Smart Engine was well encapsulated – it was going to be easy to repurpose.
I also noticed that the Producer request handler was well organized, so it wasn't going to be a nightmare to plug in some additional functionality.
Where was I going to start? Well, TDD is my friend.
There were plenty of test cases for Consumer SmartModule transformations that used the CLI. All I had to do was move the SmartModule options from the Consumer commands over to the Producer commands.
In this example, we are processing email addresses on the Consumer side.
Our test email is FooBar@test.com (note the capital letters).
echo "FooBar@test.com" | fluvio produce emails
fluvio consume emails -B -d --smartmodule lowercase
In this example, we are still processing email addresses, but now it is possible to do so on the Producer side.
echo "FooBar@test.com" | fluvio produce emails --smartmodule lowercase
fluvio consume emails -B -d
Both result in the same output with the email address using only lowercase letters:
```
foobar@test.com
```
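The lowercase SmartModule referenced in these commands isn't shown in the post. As a rough illustration only (not the actual module), a map SmartModule that downcases the record value might look like the following, based on the fluvio-smartmodule map API; exact types and signatures may vary between Fluvio versions.

```rust
use fluvio_smartmodule::{smartmodule, Record, RecordData, Result};

// A map SmartModule receives each record and returns a new (key, value) pair.
// Here we lowercase the UTF-8 value, e.g. "FooBar@test.com" -> "foobar@test.com".
#[smartmodule(map)]
pub fn map(record: &Record) -> Result<(Option<RecordData>, RecordData)> {
    let key = record.key.clone();
    let value = std::str::from_utf8(record.value.as_ref())?.to_lowercase();
    Ok((key, value.into()))
}
```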
With my TDD workflow ready to go, I made the changes from the outside-in.
- Added the SmartModule options to the CLI Produce command.
- Used those arguments to build SmartModuleInvocation(s), a type used to model SmartModules during network requests.
- Added the SmartModuleInvocation(s) to the ProduceRequest.
- Defined how the SmartModuleInvocation(s) would be encoded and decoded.
- Translated the SmartModuleInvocation(s) into a SmartModuleChain, a type which can be passed to the Smart Engine.
- Finally, I fed the SmartModuleChain and the Produce request's records to the SmartEngine (a simplified sketch of this flow appears below).
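To make the shape of these changes concrete, here is a heavily simplified, self-contained sketch of that outside-in flow. The type names mirror the real Fluvio types, but the fields and methods below are hypothetical stand-ins, not the actual definitions.

```rust
// Hypothetical stand-ins: the names mirror real Fluvio types, but the
// fields and methods are simplified for illustration.
struct CliSmartModuleOpt { name: String }       // `--smartmodule <name>` on `fluvio produce`
struct SmartModuleInvocation { name: String }   // travels inside the ProduceRequest (encoded/decoded)
struct SmartModuleChain { invocations: Vec<SmartModuleInvocation> } // what the Smart Engine runs
struct SmartEngine;

impl SmartEngine {
    // Apply the chain to the produced records before they are committed to the topic.
    fn process(&self, _chain: &SmartModuleChain, records: Vec<String>) -> Vec<String> {
        // Placeholder standing in for executing the WASM SmartModules.
        records.into_iter().map(|r| r.to_lowercase()).collect()
    }
}

fn main() {
    // 1. CLI option -> SmartModuleInvocation, added to the ProduceRequest.
    let opt = CliSmartModuleOpt { name: "lowercase".into() };
    let invocation = SmartModuleInvocation { name: opt.name };

    // 2. On the SPU, the invocation(s) are translated into a SmartModuleChain.
    let chain = SmartModuleChain { invocations: vec![invocation] };

    // 3. The chain and the ProduceRequest's records are fed to the SmartEngine.
    let engine = SmartEngine;
    let out = engine.process(&chain, vec!["FooBar@test.com".to_string()]);
    assert_eq!(out, vec!["foobar@test.com".to_string()]);
}
```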
Learning to translate between types in someone else’s codebase can be challenging.
There are many types to become familiar with in the Stream Processing Unit. Notably, the SPU uses a few different Batch types, parameterized by different Record types (Record, RawRecords, MemoryRecords). You end up seeing Batch<BatchRecords>, Batch<RawRecords>, and Batch<MemoryRecords> quite often. It took me a while to figure out when and where to use each, and how to convert between them.
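As an illustration only (the real definitions live in Fluvio's protocol crates and carry much more, such as headers and compression metadata), a batch type parameterized over its record representation, with a conversion between representations, might look roughly like this:

```rust
// Hypothetical, simplified stand-ins for the SPU's batch/record types.
struct RawRecords(Vec<u8>);          // records still encoded as raw bytes
struct MemoryRecords(Vec<String>);   // records decoded into in-memory values

struct Batch<R> {
    base_offset: i64,
    records: R,
}

impl Batch<RawRecords> {
    // Decoding turns a batch of raw bytes into in-memory records,
    // which is the form the SmartEngine can work with.
    fn into_memory(self) -> Batch<MemoryRecords> {
        let decoded: Vec<String> = String::from_utf8(self.records.0)
            .map(|s| s.lines().map(str::to_string).collect())
            .unwrap_or_default();
        Batch { base_offset: self.base_offset, records: MemoryRecords(decoded) }
    }
}

fn main() {
    let raw = Batch { base_offset: 0, records: RawRecords(b"FooBar@test.com".to_vec()) };
    let in_memory = raw.into_memory();
    assert_eq!(in_memory.records.0, vec!["FooBar@test.com".to_string()]);
}
```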
What happens when a Producer sends compressed records and requests that the SPU perform a SmartModule transformation?
The records must be decompressed so they can be fed to the SmartEngine, then compressed again before storage. To pull this off, I had to dig up the code that performs compression and decompression, and figure out how to use it while handling Producer requests.
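Here is a self-contained sketch of that ordering; the decompress, compress, and apply_smartmodules helpers below are placeholders for the real codec and SmartEngine code, not Fluvio APIs.

```rust
// Sketch of the order of operations when a Producer sends compressed records
// together with a SmartModule transformation request. The helpers below are
// placeholders, not Fluvio APIs.

fn decompress(batch: Vec<u8>) -> Vec<u8> { batch }   // stand-in for the real codec
fn compress(batch: Vec<u8>) -> Vec<u8> { batch }     // stand-in for the real codec

fn apply_smartmodules(records: Vec<u8>) -> Vec<u8> {
    // Stand-in for feeding the records through the SmartModuleChain / SmartEngine.
    records.to_ascii_lowercase()
}

fn handle_produce(compressed_batch: Vec<u8>) -> Vec<u8> {
    // 1. Decompress so the SmartEngine can read the records.
    let records = decompress(compressed_batch);
    // 2. Apply the SmartModule transformations.
    let transformed = apply_smartmodules(records);
    // 3. Re-compress before the batch is written to storage.
    compress(transformed)
}

fn main() {
    let stored = handle_produce(b"FooBar@test.com".to_vec());
    assert_eq!(stored, b"foobar@test.com".to_vec());
}
```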
This change unlocks a couple of benefits:
- SmartModules that perform filtering and aggregation can now be applied before commit, saving storage (a small filter sketch appears after this list).
- Time-intensive SmartModule operations can be performed on write, rather than while consuming.
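For example, a filter SmartModule that drops malformed records before they ever reach storage might look roughly like this, based on the fluvio-smartmodule filter API; exact signatures may vary by version.

```rust
use fluvio_smartmodule::{smartmodule, Record, Result};

// Keep only records whose value looks like an email address; everything
// else is dropped before the batch is committed to storage.
#[smartmodule(filter)]
pub fn filter(record: &Record) -> Result<bool> {
    let value = std::str::from_utf8(record.value.as_ref())?;
    Ok(value.contains('@'))
}
```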
Thank you for your feedback on Discord. We are working on a public road map that should be out soon.
Keep the feedback flowing in our Discord channel and let us know if you’d like to see the video of Carson walking through the code.
Get in touch with us on Github Discussions or join our Discord channel and come say hello!
For the full list of changes this week, be sure to check out our CHANGELOG.