Translating 800 million IP addresses to coordinates each day using Kafka Streams

Stream
06/11/2018 - 17:20 to 18:00
Kesselhaus
long talk (40 min)
Intermediate

Session abstract: 

The Schibsted Data Platform is the global processing hub for data in Schibsted, and we receive roughly 800 million user behaviour events from more than 40 sites worldwide each day. The Data Platform’s responsibility is not only to collect, structure, and index the incoming data, but also to add value by attaching additional information to the events, known as enrichments.

To offer targeted advertising based on location, the Data Platform enriches all incoming events using an API that translates IP addresses into coordinates. To do this in real time and at scale, with sub-second latencies, we use Kafka and Kafka Streams.

In this presentation, I will introduce Kafka, Kafka Streams, the Kafka Streams DSL and the Processor API, and explain how they can be used for branching, caching, bulking, asynchronous HTTP lookups, and joining. I will also talk about experiences related to operations, performance, and scaling.
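
To make the caching, bulking, and asynchronous lookup ideas concrete, here is a minimal sketch in plain Java. It is not the implementation from the talk and does not use the Kafka Streams API. The names `IpGeoEnricher`, `Coordinates`, and `bulkLookup` are hypothetical, and the HTTP call is assumed to be supplied by the caller as a function returning a `CompletableFuture`.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: cache known IPs, send only the misses as one bulk request.
class IpGeoEnricher {

    // Hypothetical value type for the result of an IP-to-location lookup.
    static final class Coordinates {
        final double lat;
        final double lon;

        Coordinates(double lat, double lon) {
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Unbounded in-memory cache; a production cache would need eviction (assumption).
    private final Map<String, Coordinates> cache = new ConcurrentHashMap<>();

    // Stand-in for the asynchronous HTTP client: takes a set of IPs, returns a
    // future holding the coordinates found for them (assumed to contain no nulls).
    private final Function<Set<String>, CompletableFuture<Map<String, Coordinates>>> bulkLookup;

    IpGeoEnricher(Function<Set<String>, CompletableFuture<Map<String, Coordinates>>> bulkLookup) {
        this.bulkLookup = bulkLookup;
    }

    // Resolves a batch of IPs. Cached IPs are answered locally; the remaining
    // distinct IPs go out in a single bulk call whose result is cached.
    CompletableFuture<Map<String, Coordinates>> enrich(List<String> ips) {
        Map<String, Coordinates> result = new HashMap<>();
        Set<String> misses = new LinkedHashSet<>(); // de-duplicates repeated IPs
        for (String ip : ips) {
            Coordinates hit = cache.get(ip);
            if (hit != null) {
                result.put(ip, hit);
            } else {
                misses.add(ip);
            }
        }
        if (misses.isEmpty()) {
            return CompletableFuture.completedFuture(result);
        }
        return bulkLookup.apply(misses).thenApply(found -> {
            cache.putAll(found);
            result.putAll(found);
            return result;
        });
    }
}
```

In this sketch a batch containing the same IP several times triggers at most one remote lookup for it, and later batches reuse the cached coordinates without calling the service at all. IPs the service cannot resolve are simply absent from the returned map.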
