Keras Custom Data Generator - Stuck on First Epoch, No Output?

mempet · August 17, 2021, 3:13am

stackoverflow link: python - Keras Custom Data Generator - Stuck on First Epoch, No Output? - Stack Overflow

I’ve been trying to get a multi-input data generator to work in Keras for a multi-input model. Each input consists of an image and an associated number.

I’ve tried two different custom data generators; the simpler one just uses ImageDataGenerator and flow_from_dataframe with two outputs. Later on, I turn one of the outputs into an input and feed it into the model. The relevant code is as follows, where y_col is the output, number_col is the associated number and path_col is the path to the images:

# data generator

df_gen = img_data_gen.flow_from_dataframe(
    **all_args,
    x_col=path_col,
    y_col=[y_col, number_col],
    shuffle=False,
    class_mode='raw')

# sending data to model, wrapped in a larger function

while True:
    data_batch = next(df_gen)

    # fake data, works in the model perfectly
    number_labels = np.random.randint(1, 219, len(data_batch[1]))

    outputdata, numberdata = data_batch[1].T
    outputdata = np.asarray(outputdata).astype('float32')

    # this code never works, the model freezes
    numberdata = np.asarray(numberdata).astype(np.int32)

    yield [numberdata, data_batch[0]], outputdata

# fitting the model
history = model.fit(
    train_generator,
    steps_per_epoch=int(len(train_df) / batch_size),
    validation_data=val_generator,
    validation_steps=int(len(validation_df) / batch_size),
    epochs=epochs,
    callbacks=callbacks,
    verbose=1
)

When I run this model, the output freezes at ‘Epoch 1/12’. I’ve checked that the data is in the right format, the right length, and matching properly to the other input.

When I generate a random list of numbers, the model runs perfectly. I can also see that when fake data is generated, the number data is also getting generated.

However, when I use the correct number data as an input into the model, the model freezes at the second ‘next’ call. I can also use a smaller sub-dataset with the same data structure and the model runs correctly. But when I use the entire dataset, the problem occurs again.
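One way to narrow down where the stall happens is to pull batches from the generator by hand, outside model.fit, and time each next call. The following is only a rough sketch of that idea: probe_generator and fake_batches are hypothetical names, and the fake generator is a stand-in that would be swapped for the real train_generator. If the second next call also hangs here, that would point at the generator rather than at the model.

```python
import time


def probe_generator(gen, n_batches):
    """Pull up to n_batches from gen and record how long each next() takes."""
    report = []
    for i in range(n_batches):
        start = time.perf_counter()
        try:
            inputs, targets = next(gen)
        except StopIteration:
            break  # the generator ended before n_batches were produced
        row = {
            "batch": i,
            "seconds": time.perf_counter() - start,
            "n_inputs": len(inputs),    # number of model inputs in the batch
            "n_targets": len(targets),  # number of samples in the batch
        }
        # Print immediately so the last line shown marks the last batch that worked.
        print(row, flush=True)
        report.append(row)
    return report


def fake_batches(batch_size):
    """Hypothetical stand-in for train_generator: yields ([numbers, images], outputs)."""
    step = 0
    while True:
        numbers = [step * batch_size + k for k in range(batch_size)]
        images = [[0.0] * 4 for _ in range(batch_size)]
        outputs = [float(n) for n in numbers]
        yield [numbers, images], outputs
        step += 1


report = probe_generator(fake_batches(batch_size=8), n_batches=3)
```

Because each row is printed as soon as the batch arrives, a hang shows up as output that stops after a particular batch index.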

Do you know what could be causing this problem? I’m using AWS Sagemaker to run the model and can’t seem to figure out where this problem is coming from. Thank you for your help!

Renu_Patel · November 2, 2023, 2:12pm

Hi @mempet

Welcome to the TensorFlow Forum!

Could you please share standalone code (if it is shareable) and the dataset shape or a sample dataset so that we can reproduce the issue? We need to understand the dataset's shape and type, the preprocessing, and how you have defined the model for training before we can help with this issue. Thank you.
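A short summary of the number column is one example of what could be shared. The sketch below is an illustration in plain Python, assuming the column can be iterated over value by value; summarize_numbers is a hypothetical helper (not part of Keras or pandas) and the sample values are made up. It counts values that would not convert cleanly to int32.

```python
import math

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def summarize_numbers(values):
    """Count values that would not survive a clean cast to int32."""
    summary = {"total": 0, "missing": 0, "non_integer": 0,
               "out_of_int32": 0, "min": None, "max": None}
    for v in values:
        summary["total"] += 1
        if v is None:
            summary["missing"] += 1
            continue
        try:
            f = float(v)
        except (TypeError, ValueError):
            summary["non_integer"] += 1  # e.g. a string that is not a number
            continue
        if math.isnan(f):
            summary["missing"] += 1
            continue
        if math.isinf(f) or f != int(f):
            summary["non_integer"] += 1
            continue
        if not INT32_MIN <= f <= INT32_MAX:
            summary["out_of_int32"] += 1
        # Track the range over all finite, integer-valued entries.
        summary["min"] = f if summary["min"] is None else min(summary["min"], f)
        summary["max"] = f if summary["max"] is None else max(summary["max"], f)
    return summary


# Made-up sample values, for illustration only.
sample = [1, 218, 3.0, None, float("nan"), 2.5, "abc", 2**40]
summary = summarize_numbers(sample)
print(summary)
```

Non-zero counts for missing, non_integer, or out_of_int32 on the real column would be worth including in the report.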