How can I use multiprocessing with a Keras Sequence as training data?
I tried just passing use_multiprocessing=True and workers > 1 to model.fit(), but that doesn't work because of several errors.
What I want is the following:
Read a huge TFRecords dataset on multiple CPU cores and perform the training on the GPU.
I cannot use tf.data, because that approach requires reading the whole training dataset into memory.
So I have to pass a generator or a keras.utils.Sequence. Without multiprocessing I got both of those working.
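To make the setup concrete, here is a minimal, self-contained stand-in for the Sequence I use. All names are illustrative, and the pickle-based shard format only stands in for TFRecord shards; the real class subclasses tf.keras.utils.Sequence and decodes records with tf.io. Only the batch-indexing logic is the same:

```python
import math
import pickle
from pathlib import Path


class ShardedSequence:
    """Simplified stand-in for a tf.keras.utils.Sequence over record shards.

    Each shard file holds a pickled list of samples; the real code reads
    TFRecord shards instead.
    """

    def __init__(self, shard_paths, batch_size):
        self.shard_paths = [Path(p) for p in shard_paths]
        self.batch_size = batch_size
        # Read each shard once up front just to record its sample count,
        # so a batch index can later be mapped to (shard, offset) pairs.
        # Real TFRecord code would use a precomputed index instead.
        self.shard_sizes = []
        for p in self.shard_paths:
            with open(p, "rb") as f:
                self.shard_sizes.append(len(pickle.load(f)))
        self.total = sum(self.shard_sizes)

    def __len__(self):
        # Number of batches, counting a final partial batch.
        return math.ceil(self.total / self.batch_size)

    def __getitem__(self, idx):
        # Gather the samples for batch `idx`, crossing shard boundaries
        # if necessary. Only the shards actually needed are opened.
        start = idx * self.batch_size
        stop = min(start + self.batch_size, self.total)
        batch, offset = [], 0
        for p, size in zip(self.shard_paths, self.shard_sizes):
            if offset + size > start and offset < stop:
                with open(p, "rb") as f:
                    samples = pickle.load(f)
                lo = max(start - offset, 0)
                hi = min(stop - offset, size)
                batch.extend(samples[lo:hi])
            offset += size
        return batch
```

Reading a batch this way touches only the shards that batch needs, which is why I hoped several worker processes could each call `__getitem__` independently.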
Whenever I pass use_multiprocessing=True to model.fit(), I get an error.
I also tried this approach, https://muditb.medium.com/speed-up-your-keras-sequence-pipeline-f5d158359f46, which basically builds a generator from a Keras Sequence running on multiple processes with shared memory. But there as well I get an error, this time when calling process.start().
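For reference, the pattern from that article, stripped down to the standard library (the names are mine, and the placeholder lists stand in for decoded TFRecord batches): worker processes fill a bounded multiprocessing.Queue, and the main process drains it and yields batches to fit().

```python
import multiprocessing as mp


def _producer(batch_indices, queue):
    # Worker process: produces its assigned batches and pushes them into
    # the shared queue. In the real pipeline this would call
    # sequence[idx] to read and decode a TFRecord batch.
    for idx in batch_indices:
        queue.put((idx, [idx * 10, idx * 10 + 1]))  # placeholder "batch"
    queue.put(None)  # sentinel: this worker is done


def parallel_batches(num_batches, num_workers=2):
    """Yield (index, batch) pairs produced by several worker processes."""
    queue = mp.Queue(maxsize=8)  # bounded so workers can't run far ahead
    # Stride the batch indices across workers: worker w gets w, w+N, w+2N, ...
    chunks = [range(w, num_batches, num_workers) for w in range(num_workers)]
    procs = [mp.Process(target=_producer, args=(c, queue)) for c in chunks]
    for p in procs:
        p.start()
    done = 0
    while done < num_workers:
        item = queue.get()
        if item is None:
            done += 1  # one worker finished
        else:
            yield item
    for p in procs:
        p.join()
```

Batches arrive in whatever order the workers produce them, so the index is carried along with each batch. On Ubuntu the default "fork" start method lets the workers inherit the parent's state; with "spawn" the producer function must be importable from a module.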
Is there a working example of a data pipeline that reads batches from a TFRecords dataset in parallel?
An ideal example would run on Ubuntu with a GPU-enabled TensorFlow build and let me control how many CPU cores run the data pipeline.