Unknown image file format. One of JPEG, PNG, GIF, BMP required. TensorFlow error

Mokshit_Surana · April 20, 2023, 8:21am

I have six sub-folders, one per class, for my classification task. The images have the extensions jpg, png, and jpeg, all of which TensorFlow accepts.

I load them with the function image_dataset_from_directory and then print the shape of each batch to check that they are correct.

for x, y in val_ds:
    print(x.shape, y.shape)

This loop raises the unknown image file format error.

I also ran two scripts, taken from similar questions, that are meant to remove corrupted images.

import os
import shutil
from pathlib import Path
import imghdr  # note: deprecated since Python 3.11 and removed in 3.13
destination_folder_path = '../bekar/flood_bekar'
data_dir = "./flood"
# add all your image file extensions here; Path.suffix includes the leading dot
image_extensions = [".png", ".jpg", ".bmp", ".jpeg"]
img_type_accepted_by_tf = ["bmp", "gif", "jpeg", "png"]
os.makedirs(destination_folder_path, exist_ok=True)  # the target folder must exist before moving files
for filepath in Path(data_dir).rglob("*"):
    if filepath.suffix.lower() in image_extensions:
        # imghdr.what returns the detected type (e.g. "jpeg"), or None if unrecognized
        img_type = imghdr.what(filepath)
        destination_file_path = os.path.join(destination_folder_path, os.path.basename(filepath))
        if img_type is None:
            print(f"{filepath} is not an image")
            shutil.move(filepath, destination_file_path)
        elif img_type not in img_type_accepted_by_tf:
            print(f"{filepath} is a {img_type}, not accepted by TensorFlow")
            shutil.move(filepath, destination_file_path)

This removed at least 500 images from each of the sub-folders (classes).

import os
import cv2
import imghdr
def check_images(s_dir, ext_list):
    bad_images = []  # files that OpenCV cannot read
    bad_ext = []     # files whose detected type is not in ext_list
    for klass in os.listdir(s_dir):
        klass_path = os.path.join(s_dir, klass)
        print('processing class directory ', klass)
        if os.path.isdir(klass_path):
            for f in os.listdir(klass_path):
                f_path = os.path.join(klass_path, f)
                if os.path.isfile(f_path):
                    # check the detected type only for regular files
                    tip = imghdr.what(f_path)
                    if tip not in ext_list:
                        bad_ext.append(f_path)
                    # cv2.imread returns None on failure instead of raising
                    img = cv2.imread(f_path)
                    if img is None:
                        print('file ', f_path, ' is not a valid image file')
                        bad_images.append(f_path)
                else:
                    print('*** fatal error, you have a sub directory ', f, ' in class directory ', klass)
        else:
            print('*** WARNING*** you have files in ', s_dir, ' it should only contain sub directories')
    return bad_images, bad_ext
source_dir = './'
good_exts = ['jpg', 'png', 'jpeg', 'gif', 'bmp']
bad_file_list, bad_ext_list = check_images(source_dir, good_exts)
if bad_file_list or bad_ext_list:
    print('improper image files are listed below')
    # a file can fail both checks, so print each path once
    for path in sorted(set(bad_file_list + bad_ext_list)):
        print(path)
else:
    print('no improper image files were found')

Even after removing the files flagged by these two scripts, I still get the unknown file format error.

Kiran_Sai_Ramineni · April 20, 2023, 1:21pm

Hi @Mokshit_Surana, could you please check whether there are any hidden files in the directory? Also, please refer to this gist for a working code example of your use case. Thank you.
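
One way to check for hidden files is to walk the dataset directory and list every entry whose name starts with a dot. The sketch below is only an illustration: the function name is hypothetical, and it assumes that "hidden" means a dot-prefixed name, which covers things like macOS metadata files named ._photo.jpg that still end in an image extension.

```python
import os


def find_hidden_entries(root):
    """Return the sorted paths of all dot-prefixed files and folders under root."""
    hidden = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if name.startswith("."):
                hidden.append(os.path.join(dirpath, name))
    return sorted(hidden)
```

Calling find_hidden_entries("./flood") and getting an empty list back would confirm that no dot-prefixed entries exist anywhere below the dataset root.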

Mokshit_Surana · April 20, 2023, 1:48pm

There are no hidden files.

laijasonk · April 25, 2023, 3:10am

For issues like this, it can be useful to take a step back and do some basic sanity checking. I list a few examples below:

Confirm whether any of the images work as intended. Since you have the images in directories, you can try keeping only a few images in each directory and re-run your test.
Confirm that the issue is not extension specific. The same as above, except you only keep one image of each extension (jpg, png, etc.).
Confirm that the issue is not class-specific. You can exclude specific classes by removing the sub-directory to see if you continue to get an error.
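
Before trimming the directories by hand, it can help to see how the files are distributed across classes and extensions. This is a minimal sketch, assuming the usual layout of one sub-directory per class under a root such as ./flood; the function name is illustrative and not part of TensorFlow.

```python
from collections import Counter
from pathlib import Path


def tally_by_class_and_extension(data_dir):
    """Count files per (class sub-directory, lower-cased extension)."""
    root = Path(data_dir)
    counts = Counter()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        if len(rel.parts) < 2:
            # Skip stray files that sit directly in the root, outside any class.
            continue
        counts[(rel.parts[0], path.suffix.lower())] += 1
    return counts
```

The resulting counts show which class and extension combinations exist, so each of the checks above can be run on a small, known subset.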

Once all those basic checks are done, you can attempt to see which images are causing issues with a simple try block in your for loop. There may be a smarter way to do this, but a quick and dirty approach would be to set the batch size to 1 and test each batch individually so that you can identify all the failed images. Make sure to turn off shuffling or else the image file names get all mixed up.

See below for an example:

import tensorflow as tf

val_ds = tf.keras.utils.image_dataset_from_directory(
    ...
    ...
    shuffle=False,  # make sure you do NOT shuffle for this test
    batch_size=1,
    ...
    ...
)

# Track the image file names
img_paths = val_ds.file_paths
img_idx = 0

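# Print the image path whenever fetching a batch fails. The decode error is
# raised while the dataset is being iterated, so next() must sit inside the try.
val_iter = iter(val_ds)
while img_idx < len(img_paths):
    try:
        x, y = next(val_iter)
    except StopIteration:
        break
    except tf.errors.InvalidArgumentError:
        print(img_paths[img_idx])
    img_idx += 1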
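
A complementary check that does not need the dataset at all is to look at each file's first bytes. The sketch below assumes that the decoder picks the format from the file's leading signature rather than from its extension, which is consistent with the error message naming JPEG, PNG, GIF, and BMP. The helper names are hypothetical. It would catch, for instance, a WebP file saved with a .jpg extension.

```python
from pathlib import Path

# Leading byte signatures of the four formats named in the error message.
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "gif": b"GIF8",
    "bmp": b"BM",
}


def sniff_format(path):
    """Return 'jpeg', 'png', 'gif' or 'bmp' based on the file header, else None."""
    with open(path, "rb") as f:
        head = f.read(8)
    for name, magic in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None


def find_unrecognized(data_dir, extensions=(".jpg", ".jpeg", ".png", ".gif", ".bmp")):
    """List files with an image extension whose header matches no known signature."""
    bad = []
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            if sniff_format(path) is None:
                bad.append(path)
    return bad
```

Calling find_unrecognized("./flood") returns the paths whose content matches none of the four signatures. A file that passes can still be truncated or otherwise corrupt, so this narrows the search rather than replacing an actual decode test.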

Hopefully this puts you on the right track! Good luck!

Kiran_Sai_Ramineni · May 15, 2023, 1:01pm

Hi @Mokshit_Surana, try filtering out and deleting the badly encoded images that do not contain the string “JFIF” in their header. You can do that with the code presented here. Thank you.
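
A minimal sketch of such a filter, under a few assumptions: the function name is hypothetical, only .jpg and .jpeg files are inspected (PNG files never carry this marker), and the function reports paths instead of deleting them. The check is also stricter than it needs to be, because valid JPEGs that begin with an Exif segment lack the marker as well, so the results are worth reviewing before anything is removed.

```python
import os


def find_non_jfif(data_dir, extensions=(".jpg", ".jpeg")):
    """Return sorted paths of JPEG-named files with no b'JFIF' in the first 16 bytes."""
    flagged = []
    for dirpath, _, filenames in os.walk(data_dir):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                header = f.read(16)
            if b"JFIF" not in header:
                flagged.append(path)
    return sorted(flagged)
```

Running find_non_jfif("./flood") lists the candidates, which can then be moved aside before rebuilding the dataset.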