Decisioning / July 5, 2017

Decision Sciences 101: Machine Learning Algorithms in RTB

This is part one of our Decision Sciences 101 series. 

Online advertising is all about matchmaking among three parties: the advertisement, the user and the publisher. A perfectly placed advertisement is one that is relevant to the user at that moment and appears on publisher content that does not compete for the user's attention and elicits positive sentiment toward the advertisement.

This is much easier said than done, especially in an ecosystem dominated by real-time bidding (RTB), where billions of bid opportunities from hundreds of thousands of publishers are presented daily. The RTB-heavy ecosystem is also driving a rapid increase in applications of machine-learning algorithms. Some of the most common applications include response modeling, audience modeling, bid optimization, fraud detection and contextual advertising.

At Eyeview, we receive more than 30 billion bid requests per day, and we bid on fewer than 1% of them. Deciding which bid requests to respond to, what price to bid, and which video to show the user is therefore critical to campaign outcomes as well as to profitability.

Here, we’ll discuss response and audience modeling, which are used to determine whether and how much to bid on a bid request, and tackle the other applications in part two of this series. While the objective of both response modeling and audience modeling is to identify the audience most likely to respond to the campaign, the former is typically used when explicit or implicit responses to past marketing campaigns are available, and the latter is used when response data is lacking.

Response Modeling

Response modeling is a long-established practice in direct marketing channels, such as email and direct mail marketing. However, the online environment presents additional challenges as well as opportunities. The main challenge is the lack of consistent user identification, which is caused by a combination of third-party cookies having limited lifespans and the widespread ownership of multiple devices. The development of cross-device identification helps to reduce this problem. At Eyeview, we leverage third-party device graphs as well as proprietary algorithms to uniquely identify consumers across devices and create consumer-level profiles for modeling and real-time scoring.
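To make the idea of consumer-level profiles concrete, here is a minimal sketch of how device links might be merged into per-consumer groups. It is not Eyeview's proprietary method: the input format (pairs of device IDs asserted to belong to the same person), the ID strings and the function name group_devices are all assumptions made for illustration, and union-find is just one common way to do the merging.

```python
from collections import defaultdict


def group_devices(links):
    """Group device IDs into consumer-level clusters using union-find.

    `links` is an iterable of (device_a, device_b) pairs that a device
    graph asserts belong to the same person (hypothetical input format).
    """
    parent = {}

    def find(x):
        # Register unseen devices as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    clusters = defaultdict(set)
    for device in list(parent):
        clusters[find(device)].add(device)
    return list(clusters.values())


# Hypothetical links: a cookie, a phone and a tablet belong to one person,
# while a second cookie and a connected TV belong to another.
links = [
    ("cookie:123", "phone:abc"),
    ("phone:abc", "tablet:xyz"),
    ("cookie:999", "ctv:555"),
]
consumers = group_devices(links)
```

Each resulting set can then serve as the key for a single consumer profile, so that behavior observed on any one device contributes to the same model features.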

On the other hand, a large amount of information that is not available to direct marketers is open to online marketers. For example, retargeting campaigns can use a user’s site-visitation behavior on a retailer’s website, which is akin to following a shopper around the store or watching a shopper flip through a catalog. This makes conversion prediction much more accurate than relying solely on purchasing behavior, as in the offline, direct marketing scenario. Additionally, within the RTB environment, where the ad appears, its contextual environment and whether it is viewable to the user all play important roles in determining the response.

Response modeling for online advertising has mainly focused on predicting the probability of a user clicking on an ad and the probability of a conversion after viewing or clicking on an ad, as these are the performance metrics most commonly used by advertisers. A number of machine-learning algorithms are used in response modeling, from traditional logistic regression and decision trees to more recent algorithms such as factorization machines and gradient-boosted decision trees. The predicted probabilities are then used to evaluate bid opportunities.
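As a rough illustration of how a predicted probability can feed a bid decision, the sketch below scores a bid request with a logistic-regression formula and converts the result into a CPM bid. Everything specific here is an assumption: the feature names, the coefficients, the value per response, the margin and the floor price are hypothetical, and the expected-value pricing rule is one simple option rather than a description of any production bidder.

```python
import math


def response_probability(features, weights, bias):
    """Logistic-regression score: sigmoid of a weighted sum of features."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def bid_cpm(probability, value_per_response, margin=0.3):
    """Expected value of one impression, less a margin, expressed per 1,000."""
    return probability * value_per_response * (1.0 - margin) * 1000.0


# Hypothetical coefficients; in practice they would be fit on past campaign data.
WEIGHTS = {"visited_site_7d": 1.2, "viewable": 0.8, "cart_abandoner": 1.5}
BIAS = -6.0

# Hypothetical bid request: a recent site visitor in a viewable placement.
request = {"visited_site_7d": 1.0, "viewable": 1.0, "cart_abandoner": 0.0}

p = response_probability(request, WEIGHTS, BIAS)
price = bid_cpm(p, value_per_response=2.0)
should_bid = price >= 0.50  # hypothetical floor price, in CPM dollars
```

The same structure applies if the logistic regression is swapped for a factorization machine or a boosted-tree model: only response_probability changes, while the pricing step still consumes a probability.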

Audience Modeling

Audience modeling is also a long-existing practice in direct marketing, and in particular in prospecting and acquisition campaigns. In this case, since there is no historical interaction data available on the prospects, marketers take their best customers or respondents and identify which prospects resemble that group the most by creating look-alike models.
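A look-alike model can be sketched in a few lines. The example below ranks prospects by cosine similarity to the average profile of a seed group of best customers. This nearest-centroid approach is a deliberate simplification chosen for brevity, and the feature layout, the prospect IDs and the function names are hypothetical.

```python
import math


def centroid(vectors):
    """Element-wise mean of equally sized feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_lookalikes(seed_vectors, prospects):
    """Return prospect IDs ordered from most to least similar to the seeds."""
    seed = centroid(seed_vectors)
    scored = [(cosine(vec, seed), pid) for pid, vec in prospects.items()]
    return [pid for _, pid in sorted(scored, reverse=True)]


# Hypothetical features: [auto content share, travel content share, coupon use]
best_customers = [[0.9, 0.1, 0.8], [0.8, 0.2, 0.9]]
prospects = {
    "p1": [0.7, 0.1, 0.9],
    "p2": [0.1, 0.9, 0.0],
    "p3": [0.5, 0.5, 0.5],
}
ranked = rank_lookalikes(best_customers, prospects)
```

In this toy data the prospect whose profile most closely mirrors the seed group is ranked first, and a campaign would typically target only the top portion of such a ranking.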

The data used for modeling ranges from demographics and geography in the more basic models to detailed behavioral data, such as credit card transactions and travel profiles. In the online world, the data is further enriched by online behavioral data, such as web browsing, search, social networking, video watching and app usage. Because of the large volume and sometimes complex structure of this data, it is much more challenging to develop an accurate model.

The techniques applied include nearest-neighbor methods, random forests, gradient-boosted decision trees, and clustering techniques. Which technique to apply largely depends on the types and structure of the data available for modeling. Recently, we have also seen some successful applications of topic modeling to this problem. The idea is to group users into clusters based on the probabilities of their observed behaviors, so that each user has a certain probability of belonging to each cluster, as opposed to traditional clustering solutions, where clusters are mutually exclusive.
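The difference between probabilistic and mutually exclusive membership can be shown with a small sketch. It assumes the topics are already known and computes, for one user, the probability of belonging to each topic given visit counts. This is a single-topic mixture posterior with a uniform prior, a much simpler stand-in for a full topic model; the topic names, site categories and probabilities are all hypothetical.

```python
import math

# Hypothetical topics: probability of visiting each site category under a topic.
TOPICS = {
    "auto_intender": {"car_reviews": 0.6, "finance": 0.3, "recipes": 0.1},
    "home_cook": {"car_reviews": 0.1, "finance": 0.2, "recipes": 0.7},
}


def soft_membership(visit_counts, topics=TOPICS):
    """Probability of each topic given a user's visit counts (uniform prior)."""
    log_scores = {
        topic: sum(n * math.log(dist[cat]) for cat, n in visit_counts.items())
        for topic, dist in topics.items()
    }
    peak = max(log_scores.values())  # subtract the max for numerical stability
    unnormalized = {t: math.exp(s - peak) for t, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {t: v / total for t, v in unnormalized.items()}


def hard_assignment(membership):
    """Traditional clustering view: keep only the single most likely topic."""
    return max(membership, key=membership.get)


# A user who browses both car reviews and recipes.
user = {"car_reviews": 2, "finance": 1, "recipes": 2}
membership = soft_membership(user)
label = hard_assignment(membership)
```

For this user the two topics come out nearly equally likely, so a hard assignment discards almost half of the signal, whereas the soft membership lets the user qualify for campaigns aimed at either group.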

Topic modeling is particularly useful for data with an inherent hierarchical structure, such as website browsing data and data provided by third-party data management platforms. The calculation of actual user scores from these models is often done in batch mode rather than in real time, due to the complexity of the models as well as the number of data sources and the large volume of data they use.

Final Thoughts

At Eyeview, we employ response models to drive site engagement and online purchases for multichannel retail and travel clients, and look-alike audience models for CPG and auto clients, where brand interaction data is more limited. The next step in further optimizing campaign performance is to select, for each audience group, the video creative that is most likely to create an emotional connection and drive purchasing decisions, which we’ll cover in the next installment of this series.


Yvonne Nikas VP Decision Sciences
