Group Split (Rust)
While I was working at Misix, I wrote a recommendation engine. It was made up of programs that worked entirely from TSV data, which was streamed from a profiler program to a ranker program. These programs worked fine on a single set of data, and to get parallelism the input could be chunked by the buyer's ID. Once a chunk has been produced for each processor, it is simply a case of using GNU Parallel to start each recommendation process. I really wanted to keep the design of the recommender programs simple, and the decision to build on conventional Unix processes paid off in spades, both in all the headaches I never had and in how simple the system is to understand.
To produce these chunks, I wrote Group Split. Just like the recommendation engine, it was written in Rust. Given TSV data over stdin, it chunks the data into many files according to an identifier column. Those files are then read by the recommendation processes.
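To give a feel for the idea, here is a minimal sketch of that kind of chunking. It is not the actual Group Split source: the function names, the choice of an FNV-1a hash to pick a chunk, and the error handling are all my assumptions for illustration. The one property it is meant to show is that every row with the same identifier ends up in the same chunk.

```rust
use std::io::{self, BufRead, Write};

/// FNV-1a hash of the key bytes. Assumed here because it is small,
/// deterministic, and gives the same chunk for the same ID on every run.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Hypothetical helper: copies each TSV row from `input` to one of
/// `outputs`, chosen by hashing the field at `key_col`, so rows that
/// share an identifier always share a chunk.
pub fn group_split<R: BufRead, W: Write>(
    input: R,
    key_col: usize,
    outputs: &mut [W],
) -> io::Result<()> {
    if outputs.is_empty() {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "no outputs given"));
    }
    for line in input.lines() {
        let line = line?;
        // Pick out the identifier column; a row without it is an error.
        let key = line.split('\t').nth(key_col).ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidData, "row has no identifier column")
        })?;
        let chunk = (fnv1a(key.as_bytes()) % outputs.len() as u64) as usize;
        writeln!(outputs[chunk], "{}", line)?;
    }
    Ok(())
}
```

In a real tool the input would presumably be `io::stdin().lock()` and each output a `BufWriter<File>`, one per processor. The sketch ignores header rows and does not try to balance chunk sizes, so a few very active buyers could leave one chunk much larger than the others.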
For those interested in the code, it can be downloaded here.