Aggregating, reporting, and visualizing 10s of billions of records per day from a global footprint

Scale
06/12/2017 - 14:50 to 15:10
Kesselhaus
short talk (20 min)
Advanced

Session abstract: 

The beauty of digital advertising is the capability for per-transaction measurement and optimization, but in order to make that a reality, customers need to know what happened in the sea of billions of transactions. They want slick visualizations, interactive query times, and drill-down capabilities for fine-grained analysis.  Learn how we evolved our platform from a fixed set of tabular reports with a basic UI to a truly responsive and immersive reporting experience using classic data warehouse techniques, optimized techniques for specialized data sets, and new cloud platforms that address a new reality.

Managing a global footprint serving 10s of billions of transactions per day is challenging, but moving, aggregating, processing, and reporting on all of that data presents an even larger challenge. Gone are the days when coarse dimensions, fixed report ranges, and multi-second load times were acceptable. Customers want to ask relatively open-ended questions across full data sets, customize the reports, drill down, and get responses in seconds. Join us as we explore our journey from a fixed set of basic reports to a much more interactive experience that better meets our customers' needs, all while scaling total volume 10x in 2 years.

Along the way, we'll discuss:

  • Data transport and disaster recovery logistics from worldwide data centers using Kafka, Secor, and Camus
  • The evolution of batch processing from Pig to native MapReduce, and the framework that enabled 6x performance gains
  • The integration of Spark Streaming data flows for real-time processing, data enrichment, and analytics
  • Leveraging new cloud-based data warehouses and technologies to iterate faster and avoid re-inventing the wheel
  • Re-imagining data models and processing flows to optimize performance
  • Hard lessons learned about managing machines, data, and hybrid technology platforms at scale
