At training time, I used the standard tf.data.Dataset APIs to prefetch and load data in parallel; there's nothing special about it. The key point is that you can preprocess the data with Spark when you have a lot of data and your transformation logic is complex. Spark is good at this kind of heavy lifting because it scales out to multiple machines easily. With TensorFlow's Dataset APIs, doing the same work on a single machine's CPU is slower, since the Dataset's parallelism is limited by the number of cores on that machine. Plus, when your transformation logic is complex, you probably have to run it in Python on the CPU and then move the data from CPU to GPU, which is an expensive process.
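For reference, a minimal sketch of the tf.data prefetch-and-parallel-load pattern described above (the `parse` function here is a hypothetical stand-in for whatever light per-record transformation remains after Spark has done the heavy preprocessing):

```python
import tensorflow as tf

def parse(x):
    # Hypothetical lightweight transform: normalize a raw value to [0, 1].
    return tf.cast(x, tf.float32) / 255.0

dataset = (
    tf.data.Dataset.range(1000)                       # stand-in for the real data source
    .map(parse, num_parallel_calls=tf.data.AUTOTUNE)  # parallel map, bounded by CPU cores
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)                       # overlap data prep with GPU training
)
```

`AUTOTUNE` lets the tf.data runtime pick the degree of parallelism and prefetch depth, but the parallel `map` is still bounded by the cores of the one machine, which is exactly the limitation that motivates pushing heavy transformations out to Spark.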