Towards Flink 2.0: Rethinking the stack and APIs to unify Batch & Stream

Stream
06/18/2019 - 15:20 to 16:00
Kesselhaus
long talk (40 min)
Intermediate

Session abstract: 

Flink currently features different APIs for bounded/batch (DataSet) and streaming (DataStream) programs. While the DataStream API can handle batch use cases, it does so much less efficiently than the DataSet API. The Table API was built as a unified API on top of both: it covers batch and streaming with the same API and, under the hood, delegates to either DataSet or DataStream.

In this talk, we present the latest on the Flink community's efforts to rework the APIs and the stack for a better unified batch & streaming experience. We will discuss:

- The future roles and interplay of DataSet, DataStream, and Table API

- The new Flink stack and the abstractions on which these APIs will build

- The new unified batch/streaming sources

- How batch and streaming optimizations differ in the runtime, and what the future interplay of batch and streaming execution could look like
