Eyeview is a video marketing technology company that applies algorithms and machine learning models to deliver the most relevant video messages to consumers across TV, desktop, mobile and social media platforms. Our Real-Time Bidder handles more than half a million requests per second and is responsible for making decisions in under 100 milliseconds.
To give consumers the best experience by matching them to brands that interest them, our bidder bids on less than 1% of the auctions we receive. The bidder is extremely selective because it is predicting each user’s potential interest in our advertisements. As you can imagine, matching users to advertisements in real time is an extremely challenging task.
This blog post is divided into the following sections: problem definition, user scoring model, handling frequent changes, user score evaluation, enter PMML, component workflow, conclusion, achievements, and future work and improvements.
Problem Definition – Finding the Right Users Using Data Science Models
At Eyeview, we predict outcomes using data science models. When we evaluate users, we use a score generated by the model to reflect how relevant a user is for a given campaign, based on the campaign's KPI (key performance indicator). Our analytics team owns and tunes these models using techniques such as logistic regression, decision trees and random forests.
For example, if the KPI of a motorcycle company campaign is to drive visits to their store, then the higher the user score, the more likely the user is to visit the store. Let’s examine how these models are written.
User Scoring Model
One way to calculate user scores is with a logistic regression model. Here is what that might look like in a simplified way (our real models are a lot more sophisticated and part of our “secret sauce” here at Eyeview):
As you can see, bringing people to a store may involve factors like proximity to the store, past trips to the same store, having a particular rewards card, etc.
In data science, these factors are called features. Each feature represents a property of the user and is associated with a weight that represents the importance of that property to the desired outcome. The training algorithm considers all of these features and assigns a weight to each one.
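To make the relationship between features and weights concrete, here is a minimal sketch of a logistic regression score. The feature names are loosely based on the store-visit factors mentioned above, and the weights and intercept are made-up values for illustration only, not Eyeview's actual model.

```python
import math
from typing import Mapping

# Hypothetical values for illustration; real weights come out of training.
INTERCEPT = -2.0
WEIGHTS = {
    "miles_to_store": -0.3,    # farther from the store lowers the score
    "past_store_visits": 0.8,  # each past visit raises the score
    "has_rewards_card": 1.2,   # 1.0 if the user holds the card, else 0.0
}


def user_score(features: Mapping[str, float]) -> float:
    """Return a logistic regression score in (0, 1) for one user."""
    # Weighted sum of the features; a feature the user lacks counts as 0.
    z = INTERCEPT + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    # The logistic (sigmoid) function squashes the sum into a probability-like score.
    return 1.0 / (1.0 + math.exp(-z))


nearby_regular = {"miles_to_store": 1.0, "past_store_visits": 3.0, "has_rewards_card": 1.0}
distant_stranger = {"miles_to_store": 20.0, "past_store_visits": 0.0, "has_rewards_card": 0.0}

print(user_score(nearby_regular) > user_score(distant_stranger))
```

With these assumed weights, a nearby repeat visitor with a rewards card scores higher than a distant user with no history, which is the behavior the motorcycle-store example describes.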
The Problem – Handling Frequent Changes
You might have already noticed that the model above has two parts: a dynamic part (the features) and a fixed part (the weights). The features are dynamic because each user is unique and their features are retrieved at runtime. The weights are generated offline by a training process that our analytics team manages and fine-tunes periodically, forming a feedback loop. Because user behavior changes over time, the weights of the features may change as often as daily, so we needed a way to feed them into our real-time system.
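The split between per-request features and periodically refreshed weights can be sketched as follows. This is only an illustration of the idea: the `WeightStore` class and the JSON payload format are hypothetical and are not how our bidder is implemented.

```python
import json
import math
import threading
from typing import Mapping


class WeightStore:
    """Holds the current weights; a retraining job can swap them at any time."""

    def __init__(self, intercept: float, weights: Mapping[str, float]) -> None:
        self._lock = threading.Lock()
        self._model = (intercept, dict(weights))

    def update_from_json(self, payload: str) -> None:
        """Replace intercept and weights with a freshly trained set (assumed format)."""
        doc = json.loads(payload)
        new_model = (
            float(doc["intercept"]),
            {name: float(w) for name, w in doc["weights"].items()},
        )
        with self._lock:
            self._model = new_model  # swapped as a whole, never half-updated

    def score(self, features: Mapping[str, float]) -> float:
        """Score one user's runtime features against the current weights."""
        with self._lock:
            intercept, weights = self._model
        z = intercept + sum(w * features.get(n, 0.0) for n, w in weights.items())
        return 1.0 / (1.0 + math.exp(-z))


store = WeightStore(-2.0, {"has_rewards_card": 1.2})
user = {"has_rewards_card": 1.0, "past_store_visits": 2.0}  # retrieved at runtime

before = store.score(user)
# A retrained model arrives with a changed weight and a newly added feature.
store.update_from_json(
    '{"intercept": -2.0, "weights": {"has_rewards_card": 2.0, "past_store_visits": 0.5}}'
)
after = store.score(user)
print(before < after)
```

The point of the sketch is that the scoring code never changes when weights or features change; only the data it loads does.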
User Score Evaluation
Currently, our bidder evaluates user scores in real time by calling a microservice that is responsible for evaluating the model and returning the user score. We needed a generic scoring framework that is also aware of the data science algorithms, and this microservice serves that purpose.
The models are trained in Spark, separately from the scoring code, so the user-scoring model had to be changed every time analytics added new features, which happened occasionally. The number of models being trained had also grown over time, and our analytics team was finding it extremely laborious to copy over the weights every time a model was re-trained. We knew we needed a different approach, one that is easier to manage, fits our requirements and scales better.
In our initial research we looked at a few options, such as MLeap and MLflow, and PMML (Predictive Model Markup Language) seemed like the best fit. PMML is an XML-based language that enables the definition and sharing of predictive models between applications. A predictive model is a statistical model that is designed to predict the likelihood of target occurrences given established variables or features. Since our training and real-time scoring spanned two applications and the number of features (including their weights) kept changing, PMML looked like an option to consider.
Let’s look at an example of how PMML can be used between Spark (used for training the model) and the user-scoring microservice (which computes the user score).
As seen here, once analytics creates the model in Spark, they convert it to a PMML object and persist it. At bid evaluation time, the user-scoring microservice deserializes the model (with the help of JPMML) and supplies the received feature values as its inputs. The model evaluator then evaluates the model and returns the user score in real time.
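The serialize-then-evaluate round trip can be illustrated with a small, self-contained sketch. It is not Spark's PMML exporter and not JPMML: `export_pmml` and `score_from_pmml` are hypothetical helpers that write and read only a tiny subset of PMML (a single `RegressionModel` with `NumericPredictor` terms and logit normalization), and the feature names and weights are made up.

```python
import math
import xml.etree.ElementTree as ET
from typing import Mapping

PMML_NS = "http://www.dmg.org/PMML-4_3"


def q(tag: str) -> str:
    """Qualify a tag name with the PMML namespace."""
    return f"{{{PMML_NS}}}{tag}"


def export_pmml(intercept: float, weights: Mapping[str, float], target: str = "score") -> str:
    """Training side (sketch): serialize a logistic regression model as PMML text."""
    ET.register_namespace("", PMML_NS)
    root = ET.Element(q("PMML"), {"version": "4.3"})
    ET.SubElement(root, q("Header"))
    dictionary = ET.SubElement(root, q("DataDictionary"))
    model = ET.SubElement(
        root, q("RegressionModel"),
        {"functionName": "regression", "normalizationMethod": "logit"},
    )
    schema = ET.SubElement(model, q("MiningSchema"))
    table = ET.SubElement(model, q("RegressionTable"), {"intercept": repr(float(intercept))})
    for name, weight in weights.items():
        ET.SubElement(dictionary, q("DataField"),
                      {"name": name, "optype": "continuous", "dataType": "double"})
        ET.SubElement(schema, q("MiningField"), {"name": name})
        ET.SubElement(table, q("NumericPredictor"),
                      {"name": name, "coefficient": repr(float(weight))})
    ET.SubElement(dictionary, q("DataField"),
                  {"name": target, "optype": "continuous", "dataType": "double"})
    ET.SubElement(schema, q("MiningField"), {"name": target, "usageType": "target"})
    return ET.tostring(root, encoding="unicode")


def score_from_pmml(pmml_text: str, features: Mapping[str, float]) -> float:
    """Scoring side (sketch): deserialize the model and evaluate it for one user."""
    model = ET.fromstring(pmml_text).find(q("RegressionModel"))
    table = model.find(q("RegressionTable"))
    z = float(table.get("intercept"))
    for predictor in table.findall(q("NumericPredictor")):
        # Raises KeyError if a required feature is missing from the request.
        z += float(predictor.get("coefficient")) * features[predictor.get("name")]
    if model.get("normalizationMethod") == "logit":
        return 1.0 / (1.0 + math.exp(-z))
    return z


# Training application: persist the model (here, just keep the text in memory).
pmml_text = export_pmml(-2.0, {"past_store_visits": 0.8, "has_rewards_card": 1.2})
# Scoring application: load the model and score the features received at bid time.
score = score_from_pmml(pmml_text, {"past_store_visits": 3.0, "has_rewards_card": 1.0})
print(round(score, 3))
```

Because the feature names and weights travel inside the PMML document, adding a feature or retraining changes only the persisted file, not the scoring code. A real service would parse the document once and reuse the evaluator across requests rather than re-parsing on every call.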
Here’s a diagram illustrating the complete flow:
Because serialization and deserialization can happen in different places, we split them across two separate applications, training and user scoring, to achieve an automated feedback loop, real-time training and price predictions.
By automating the training and prediction feedback loop we were able to scale our usage of machine learning algorithms to all campaigns and eliminate all manual processes. Our Analytics and Data Science teams can now focus on improving the accuracy and quality of the actual models used.
Future Work and Improvements
PMML turned out to be an excellent fit for our use case. Going forward, as our scale and data science needs grow, it would make sense to use frameworks that support additional model types with more features.