How to load a big image segmentation dataset

I have a U-Net model that I would like to train on a big dataset of images stored on my local machine. Unfortunately the dataset is too big to load directly into memory, and I also cannot store the images in a way that allows the use of image_dataset_from_directory, since each pixel of the image must be assigned its own class.
What is the correct way to load the dataset and train the model? How can I create an appropriate dataset?

The dataset is stored in four directories: two for labels (train, test) and two for source images. Each of the four directories contains a set of folders named with unique ids, each containing either an .npy image or a .tif label.

The structure is shown here:
train_labels > id_1 > label.tif

train_images > id_1 > image.npy

test_labels > id_2 > label.tif

test_images > id_2 > image.npy
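
For context, this is roughly how the (image, label) pairs line up on disk (a minimal sketch, using the folder layout above; list_pairs is just an illustrative helper name):

```python
from pathlib import Path

def list_pairs(image_root, label_root):
    # Walk the id folders under the image root and pick up the matching label
    # from the corresponding id folder under the label root.
    for img_dir in sorted(Path(image_root).iterdir()):
        label_path = Path(label_root) / img_dir.name / "label.tif"
        if img_dir.is_dir() and label_path.exists():
            yield img_dir / "image.npy", label_path

train_pairs = list(list_pairs("train_images", "train_labels"))
test_pairs = list(list_pairs("test_images", "test_labels"))
```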

Hi @zoythum,

I would suggest using TFRecords. Please find the documentation here. The TFRecord format is a simple format for storing a sequence of binary records.
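
As a rough sketch (assuming the .npy images are float arrays of shape height × width × channels, the .tif masks are 2-D integer arrays that PIL can open, and the file names are exactly as in your post; helper names like serialize_pair and write_split are just illustrative), you could first serialize each (image, label) pair into a TFRecord file:

```python
import numpy as np
import tensorflow as tf
from pathlib import Path
from PIL import Image  # assuming PIL can open the .tif masks; tifffile is an alternative

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[int(value)]))

def serialize_pair(image, label):
    # One tf.train.Example per (image, label) pair, with the shape stored
    # alongside the raw bytes so the arrays can be rebuilt when reading.
    feature = {
        "image": _bytes_feature(image.astype(np.float32).tobytes()),
        "label": _bytes_feature(label.astype(np.uint8).tobytes()),
        "height": _int64_feature(image.shape[0]),
        "width": _int64_feature(image.shape[1]),
        "channels": _int64_feature(image.shape[2]),
    }
    return tf.train.Example(
        features=tf.train.Features(feature=feature)).SerializeToString()

def write_split(image_root, label_root, out_path):
    # Walk the id folders and write one record per (image, label) pair.
    with tf.io.TFRecordWriter(out_path) as writer:
        for img_dir in sorted(Path(image_root).iterdir()):
            if not img_dir.is_dir():
                continue
            image = np.load(img_dir / "image.npy")
            label = np.array(Image.open(Path(label_root) / img_dir.name / "label.tif"))
            writer.write(serialize_pair(image, label))

write_split("train_images", "train_labels", "train.tfrecord")
write_split("test_images", "test_labels", "test.tfrecord")
```

You can then read the records back lazily with tf.data, so the full dataset never has to sit in memory:

```python
FEATURES = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.string),
    "height": tf.io.FixedLenFeature([], tf.int64),
    "width": tf.io.FixedLenFeature([], tf.int64),
    "channels": tf.io.FixedLenFeature([], tf.int64),
}

def parse_example(serialized):
    # Rebuild the image and mask tensors from the raw bytes using the stored shape.
    ex = tf.io.parse_single_example(serialized, FEATURES)
    image = tf.reshape(tf.io.decode_raw(ex["image"], tf.float32),
                       tf.stack([ex["height"], ex["width"], ex["channels"]]))
    label = tf.reshape(tf.io.decode_raw(ex["label"], tf.uint8),
                       tf.stack([ex["height"], ex["width"]]))
    return image, label

# Records are streamed from disk in small batches.
# batch() assumes all images share the same size; otherwise resize/pad in parse_example.
train_ds = (tf.data.TFRecordDataset("train.tfrecord")
            .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
            .shuffle(256)
            .batch(8)
            .prefetch(tf.data.AUTOTUNE))
```

The resulting train_ds can be passed directly to model.fit.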

The Oxford Pets Dataset might be a match for the structure of your images and labels. You can look at how the raw data is organized, and how the TensorFlow Datasets configuration files present the samples.

It can be quite tedious from a technical point of view, as we don't have a good image data exploration tool.