Over the past few years, Eyeview's data team has faced a difficult task: handling billions of events per day from a growing number of sources, in different formats, resolutions, and delivery patterns, with some arriving in real time and some through daily batches.
To further complicate things, the processed data has multiple consumers. Customers look at real-time graphs, account managers pull aggregated reports, data analysts run SQL queries, and data scientists do research in Scala or R, and all of them should see the same data no matter who they are or which channel they use to access it.
In this presentation, we describe the data pipeline architecture that allows Eyeview to handle these billions of events daily. We discuss how we use a variety of Amazon services, including Kinesis, SQS, Redshift, S3, and DynamoDB, and we share some of the lessons we learned along the way.
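As a rough illustration of the ingestion side of such a pipeline (this is a hypothetical sketch, not Eyeview's actual code), real-time events are typically batched into the record format expected by the Kinesis `PutRecords` API, with a partition key chosen so related events land on the same shard. The `user_id` field below is an assumed example key:

```python
import json

def to_kinesis_records(events, max_batch=500):
    """Yield batches of records in PutRecords format.

    Kinesis accepts at most 500 records per PutRecords call, so the
    event stream is chunked accordingly. The partition key here is a
    hypothetical user_id field; any stable event attribute works.
    """
    batch = []
    for event in events:
        batch.append({
            "Data": json.dumps(event).encode("utf-8"),
            "PartitionKey": str(event["user_id"]),
        })
        if len(batch) == max_batch:
            yield batch
            batch = []
    if batch:
        yield batch

# With boto3, each batch would then be sent to an assumed stream name:
#   client = boto3.client("kinesis")
#   for batch in to_kinesis_records(events):
#       client.put_records(StreamName="events", Records=batch)
```

The batching keeps API call volume predictable at billions of events per day, since throughput scales with record batches rather than individual events.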