Creating datasets from scratch tutorial needed

I have spent the past few months completing accreditations in TensorFlow and want to create a CNN for image classification from scratch. I don't want to use an existing dataset; instead I plan to collect and create a small dataset myself, but I don't know where to start in terms of the correct method to collate and prepare the data. If anybody can recommend a short course or tutorials that would step me through the process, I would really appreciate it. Thanks!

Hi @Nicci_Thomson, to collect images you can capture them yourself, download them from the internet, etc. Once you have collected images that suit your image classification task, place them in sub-directories within a main directory, where each sub-directory name is the class name of the images it contains. For example, with 2 classes A and B, the main directory contains two sub-directories named A and B, each holding the images of that class.
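
The layout described above can be sketched as follows. This is only an illustration: the name `main_directory`, the helper `make_class_dirs`, and the image file names in the comments are placeholders I made up, and a temporary folder is used so the snippet is safe to run anywhere.

```python
from pathlib import Path
import tempfile


def make_class_dirs(root, class_names):
    """Create one sub-directory per class under root and return their paths."""
    root = Path(root)
    paths = []
    for name in class_names:
        class_dir = root / name
        class_dir.mkdir(parents=True, exist_ok=True)
        paths.append(class_dir)
    return paths


# Hypothetical main directory inside a temporary folder.
main_dir = Path(tempfile.mkdtemp()) / "main_directory"
class_dirs = make_class_dirs(main_dir, ["A", "B"])

# Layout once the images are copied in (file names are placeholders):
# main_directory/
#     A/
#         image_1.jpg
#         image_2.jpg
#     B/
#         image_3.jpg
#         image_4.jpg
print(sorted(p.name for p in main_dir.iterdir()))  # prints ['A', 'B']
```

The sub-directory names are what get used as class labels when the dataset is loaded from the main directory, so it helps to name them carefully.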


If you have only a few images, you can apply data augmentation techniques to enlarge your dataset so that training gives results that generalize better.
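
As a rough sketch of what augmentation can look like, Keras preprocessing layers can apply random flips, rotations, and zooms. The specific layers, the factors (0.1), and the 180x180 image size below are my assumptions, not values from this thread, and the random tensor only stands in for real images.

```python
import tensorflow as tf

# Random transforms are only applied when the layers run with training=True.
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),  # assumed factor
    tf.keras.layers.RandomZoom(0.1),      # assumed factor
])

# Dummy batch standing in for real images: 4 RGB images of 180x180 pixels.
images = tf.random.uniform((4, 180, 180, 3), maxval=255.0)
augmented = data_augmentation(images, training=True)

# The augmented batch keeps the same shape as the input batch.
print(augmented.shape)
```

Which transforms make sense depends on your images; for example, flipping may not be appropriate if orientation matters for the classes.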

Now you can use tf.keras.utils.image_dataset_from_directory() to create the training and validation datasets.

For example,

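import tensorflow as tf
# data_dir, img_height and img_width must be defined beforehand.
# validation_split=0.2 and seed=123 are assumed values; use the same ones in both calls.
train_ds = tf.keras.utils.image_dataset_from_directory(data_dir,
  validation_split=0.2, subset="training", seed=123,
  image_size=(img_height, img_width))

val_ds = tf.keras.utils.image_dataset_from_directory(data_dir,
  validation_split=0.2, subset="validation", seed=123,
  image_size=(img_height, img_width))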
Before passing the images to the model, you should normalize them; if you have not, add a normalization layer to the model.
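
Both options can be sketched like this, assuming pixel values in [0, 255] that should be rescaled to [0, 1]. The dummy tensors below only stand in for a real `train_ds`, and the 180x180 size and the tiny model are placeholders for illustration.

```python
import tensorflow as tf

# Stand-in for train_ds: 8 dummy RGB images with pixel values in [0, 255].
images = tf.random.uniform((8, 180, 180, 3), maxval=255.0)
labels = tf.constant([0, 1, 0, 1, 0, 1, 0, 1])
train_ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(4)

rescale = tf.keras.layers.Rescaling(1.0 / 255)

# Option 1: normalize the dataset itself before it reaches the model.
normalized_ds = train_ds.map(lambda x, y: (rescale(x), y))
batch, _ = next(iter(normalized_ds))
max_after_rescale = float(tf.reduce_max(batch))
print(max_after_rescale <= 1.0)

# Option 2: leave the dataset as it is and make rescaling the first model layer.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(180, 180, 3)),
    tf.keras.layers.Rescaling(1.0 / 255),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2),
])
```

Use one option or the other, not both, otherwise the images get rescaled twice.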

Now you can train your model with those images.
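
For reference, a minimal training sketch could look like the one below. The architecture, the 64x64 image size, the optimizer, and the single epoch are all my assumptions chosen to keep the example small, and the random tensors stand in for a real dataset loaded from your image directory.

```python
import tensorflow as tf

num_classes = 2                 # classes A and B
img_height, img_width = 64, 64  # assumed small size to keep the sketch fast

# Dummy data standing in for a dataset loaded from the image directory.
images = tf.random.uniform((16, img_height, img_width, 3), maxval=255.0)
labels = tf.random.uniform((16,), maxval=num_classes, dtype=tf.int32)
train_ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(8)

# A small CNN; rescaling is built in, so raw [0, 255] images can be passed.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(img_height, img_width, 3)),
    tf.keras.layers.Rescaling(1.0 / 255),
    tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_classes),  # outputs raw logits
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# One epoch only, to show the call; real training needs more epochs
# and a validation dataset passed via validation_data.
history = model.fit(train_ds, epochs=1, verbose=0)
```

With real data you would pass your training dataset to `fit` and your validation dataset as `validation_data`, then watch the validation accuracy to decide when to stop.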

Thank You!