Recommendation Engine
For a number of years while working at Misix, I developed and maintained a recommendation engine for at-auction vehicles. The project started with marketing emails and grew into a component the customer would use on their e-bid website to advertise buyer-specific vehicles. During development, the recommender matured from a JS program that filled HTML templates and sent emails into a robust, well-designed set of shell tools written in Rust.
The early days
When this project started, I was on a silly, immature kick of "I'll use Node.js for everything." I feel that it hampered progress. I'll explain how I got past it in a bit, but I thought it was important to offer this perspective up front.
I asked our boss (Andy) about making a recommendation engine for these cars, and it turned out that he was already talking about the same idea. This fast-tracked the project, and we started building an engine with the intent of sending marketing emails to prospective buyers. Over a couple of months, I developed the programs to run it, and our economist (Jonathan) tested the accuracy of our recommendations.
At first, because we were sending marketing emails, I designed the program around that. That assumption later bit me, because modifying its many parameters became difficult. The program was written in Node.js, and while that allowed fast prototyping, execution was extremely slow. The inefficiencies in the design were especially troublesome. When we started, we only had a small batch of people to recommend cars to. I used database queries for everything, and that turned into a bottleneck when we expanded the pool of buyers. It was clear that a full set of recommendations was going to take more time than my computer had, so I later rewrote it in Rust as a few command-line components.
The JS recommender program had the following components:
- SOAP downloader (get vehicles from customer)
- Vehicle database
- Profiler
- Ranker
- HTML generator
- Email sender
- Analytics downloader
Becoming useful
Due to the inefficiencies, I redesigned our recommender system around the Unix philosophy: "do one thing well," "be small," and "be generic." Redesigning some components as shell tools made parallelism natural, and GNU Parallel made automation convenient. CSV/TSV was the intermediate format of choice, and it was easy to split up and recombine for different jobs in the batch. The code was written in Rust, which compiles to very fast machine code. In all, the rewrite took the recommendation process from a day in JS down to minutes in Rust. The components were far more modular, and additional pieces were later built to handle filling and sending more marketing emails.
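To illustrate the shape of such a tool, here is a minimal sketch of a stdin-to-stdout TSV filter. It is not taken from the original code: the function name `run` and the computed column (a sum of the numeric fields) are placeholders I made up for illustration. The point is only that each row is handled on its own, so the input can be split into chunks, processed separately, and the outputs concatenated.

```rust
use std::io::{self, BufRead, Write};

/// Reads tab-separated rows from `input` and writes one row per input row
/// to `output`: the first field (a key) followed by one computed column.
/// The computed column is hypothetical; here it is the sum of the numeric
/// fields that follow the key.
fn run<R: BufRead, W: Write>(input: R, mut output: W) -> io::Result<()> {
    for line in input.lines() {
        let line = line?;
        if line.is_empty() {
            continue; // tolerate blank lines between concatenated chunks
        }
        let mut fields = line.split('\t');
        let key = fields.next().unwrap_or("");
        // Fields that do not parse as numbers are skipped, not treated as errors.
        let total: f64 = fields.filter_map(|f| f.trim().parse::<f64>().ok()).sum();
        writeln!(output, "{}\t{}", key, total)?;
    }
    Ok(())
}
```

A `main` for a tool like this would only need to call `run(io::stdin().lock(), io::stdout().lock())`. Because no state is carried between rows, running it over two halves of a file and concatenating the results gives the same output as running it over the whole file.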
The following is the process used to produce assets:
- A query to the database produces a CSV of historical inventory.
- A query to the database produces a CSV of current inventory.
- The historical inventory is piped into the profiler program, which produces a CSV of buyer profiles.
- Both the CSV of current inventory and the CSV of buyer profiles are piped into the ranker program, which produces a CSV of ranked inventory. This is the deliverable asset.
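The profiler and ranker steps above can be sketched roughly as follows. This is an assumption-heavy illustration, not the original code: I am assuming each vehicle has already been reduced to a numeric feature vector, that a buyer profile is the mean of the vectors of vehicles that buyer bought, and that ranking uses cosine similarity. The names `Features`, `build_profiles`, `cosine`, and `rank` are all hypothetical.

```rust
use std::collections::HashMap;

/// A vehicle reduced to a numeric attribute vector (hypothetical encoding).
type Features = Vec<f64>;

/// Profiler sketch: average the feature vectors of each buyer's past purchases.
/// Assumes every vector has the same length.
fn build_profiles(history: &[(String, Features)]) -> HashMap<String, Features> {
    let mut sums: HashMap<String, (Features, usize)> = HashMap::new();
    for (buyer, feats) in history {
        let entry = sums
            .entry(buyer.clone())
            .or_insert_with(|| (vec![0.0; feats.len()], 0));
        for (s, f) in entry.0.iter_mut().zip(feats) {
            *s += *f;
        }
        entry.1 += 1;
    }
    sums.into_iter()
        .map(|(buyer, (sum, n))| (buyer, sum.into_iter().map(|s| s / n as f64).collect()))
        .collect()
}

/// Cosine similarity between two vectors; 0.0 if either has zero length.
fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Ranker sketch: score every vehicle in the current inventory against one
/// buyer profile and return (vehicle id, score) pairs, best match first.
fn rank<'a>(profile: &[f64], inventory: &'a [(String, Features)]) -> Vec<(&'a str, f64)> {
    let mut scored: Vec<(&str, f64)> = inventory
        .iter()
        .map(|(id, feats)| (id.as_str(), cosine(profile, feats)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```

In a pipeline like the one described, the profiler tool would write the output of something like `build_profiles` as CSV, and the ranker tool would read that CSV alongside the current inventory and emit the ranked rows.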
Unfulfilled goals
I didn't stay at Misix long enough to see the end of this. The project was stalled for a long time. From what I heard secondhand, our customer never did use my code on their website, because their Windows IT couldn't figure out how to compile my source and their .NET engineers couldn't read Rust code. So they ended up not using it for long.
Source code
I have uploaded the source code of the Rust version of the engine, for a couple of reasons. One, it is a good example of my clean and tightly written code. It is also a good example of the design of command-line utilities that do one thing and do it well. It works on any kind of inventory, ranked by any kind of attribute. While it was written while I worked at Misix, it was never used in any commercial product, and the company is now defunct. Beyond that, it is so small that I don't know if it's actually copyrightable; there isn't anything "new" in it. It was purely an exercise in vector algebra. Download it here; the git repo has been stripped, since it may have contained customer-specific information.