Thank you Ruslan.

1 min read · Nov 6, 2018

Beam handles serialization for you, but if you want to write your own IO module, you can find the serialization details in section 4.3 of the programming guide: https://beam.apache.org/documentation/programming-guide/
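To give a rough idea of what that serialization contract looks like, here is a minimal sketch in plain Python. It only mirrors the encode / decode / is_deterministic shape of a Beam coder; the `Reading` record and `ReadingCoder` class are hypothetical names made up for this illustration, and in a real pipeline the coder would subclass `apache_beam.coders.Coder` rather than stand alone.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Hypothetical record type, used only for this illustration."""
    sensor_id: str
    value: float


class ReadingCoder:
    """Plain-Python stand-in that mirrors the shape of a Beam coder.

    Assumption: a real implementation would subclass
    apache_beam.coders.Coder instead of being a bare class.
    """

    def encode(self, reading: Reading) -> bytes:
        # sort_keys keeps the byte output stable for equal values.
        payload = {"sensor_id": reading.sensor_id, "value": reading.value}
        return json.dumps(payload, sort_keys=True).encode("utf-8")

    def decode(self, encoded: bytes) -> Reading:
        payload = json.loads(encoded.decode("utf-8"))
        return Reading(sensor_id=payload["sensor_id"], value=payload["value"])

    def is_deterministic(self) -> bool:
        # Equal readings always produce identical bytes.
        return True


coder = ReadingCoder()
original = Reading(sensor_id="s-1", value=21.5)
round_tripped = coder.decode(coder.encode(original))
```

The point of the round trip is that whatever a worker encodes, another worker must be able to decode into an equal value, which is what lets the runner move elements between machines.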

You can find more details on the SparkRunner here:

Apache Spark Runner

Apache Beam is an open source, unified model and set of language-specific SDKs for defining and executing data…

beam.apache.org

And on the execution model for parallelized tasks:
https://beam.apache.org/documentation/execution-model/

Please keep in mind that the Python SDK is still quite experimental and lags well behind the Java SDK. This gap should narrow in the coming versions, though.



Written by Vincent Teyssier
