Writing a Distributed Ray Tracer with Apache Beam, Abridged

Stream

06/18/2019 - 11:50 to 12:10

Maschinenhaus

short talk (20 min)

Intermediate

Session abstract:

Ray Tracing is an embarrassingly parallel way to render high-quality images, but distributing work between machines is hard. Apache Beam is a model for efficient distributed data processing. In this talk I map Ray Tracing onto the Apache Beam Go SDK.

Apache Beam SDKs portably abstract computations and data into DoFns and PCollections, and allow you to construct a graph of how data flows and connects. Any compatible Runner can then optimize that graph for computation and distribute the work to its workers.

I'll give an overview of Ray Tracing, introduce the Apache Beam model and its history, and demonstrate how to map one onto the other. I'll explain the benefits and limits of the model and how to get the most out of your pipelines, and show it running on compatible runners like Apache Flink and Google Cloud Dataflow. Finally, I'll show the same pipeline running in both batch and streaming modes, with minimal changes.