Problem with Image Captioning tutorial

Hello. I’m new to NLP and I’m trying to follow TensorFlow’s tutorial on image captioning (Image captioning with visual attention | TensorFlow Core), but I ran into a problem when trying to preprocess the images with InceptionV3. As described in the tutorial, I have a preprocessing function:

def load_image(image_path):
    # Read the raw file contents, as in the tutorial
    img = tf.io.read_file(image_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (299, 299))
    img = tf.keras.applications.inception_v3.preprocess_input(img)
    return img, image_path

Then I use it to get a BatchDataset (I get a <BatchDataset shapes: ((None, 299, 299, 3), (None,)), types: (tf.float32, tf.string)>)

# Get unique images
encode_train = sorted(set(img_name_vector))
# Feel free to change batch_size according to your system configuration
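image_dataset = tf.data.Dataset.from_tensor_slices(encode_train)
image_dataset = image_dataset.map(
    load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE).batch(16)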

Up to this point, everything works, but then when I try

for img, path in image_dataset:
    pass  # Do something

Either nothing happens or the kernel dies. Is there a way to fix or circumvent that issue?


I’ve just tried the colab you linked to and it passed the part you mentioned (and finished fine).
Is your runtime configured to use GPU?

When the kernel dies, it might be an out-of-memory issue. Did you notice anything on the RAM bar in the top right corner?

Thanks for the reply, @lgusm .
I don’t think this is a memory issue. I purposefully reduced the number of images down to 64 so that such a problem can’t happen. I also checked my task manager and didn’t see any problem there either.
Could there be a problem with the TensorFlow version that I’m using? It’s probably not the cause, but I think I have an earlier version than the one used on Colab.

hum, maybe

@markdaoust might know more about it.

Given that, you may try running the tutorial with v2.5 on your local machine and check if that fixes your issue.

The notebook example is runnable in Colab end-to-end and Colab is loaded with the current latest version (TF 2.5).

This might be due to the compute and memory requirements of the example, but we don’t have all the information about your setup needed to fully debug this. As you’re probably aware, it’s not a small dataset for a typical demo:

“… large download ahead. You’ll use the training set, which is a 13GB file…”

The “Caching the features extracted from InceptionV3” step can also be compute-intensive. It comes with a warning in the tutorial:

“You will pre-process each image with InceptionV3 and cache the output to disk. Caching the output in RAM would be faster but also memory intensive, requiring 8 * 8 * 2048 floats per image. At the time of writing, this exceeds the memory limitations of Colab (currently 12GB of memory).”
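To put the quoted figure in perspective, here is a rough back-of-the-envelope sketch of the cache size. It assumes 4-byte float32 values (an assumption, not stated in the quote), and the helper name `feature_cache_bytes` is made up for illustration.

```python
# Rough estimate of the memory needed to hold the cached InceptionV3 features.
FLOATS_PER_IMAGE = 8 * 8 * 2048   # figure quoted from the tutorial
BYTES_PER_FLOAT = 4               # assumption: features stored as float32


def feature_cache_bytes(num_images: int) -> int:
    """Return the approximate number of bytes for num_images feature maps."""
    return num_images * FLOATS_PER_IMAGE * BYTES_PER_FLOAT


# One image needs 0.5 MiB; the 64-image subset mentioned earlier needs 32 MiB.
print(feature_cache_bytes(1) / 2**20, "MiB per image")
print(feature_cache_bytes(64) / 2**20, "MiB for 64 images")
```

Under that assumption, a 64-image subset is far too small to exhaust memory on its own, which is consistent with the issue lying elsewhere.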

Also keeping in mind that, as the doc says:

“Performance could be improved with a more sophisticated caching strategy (for example, by sharding the images to reduce random access disk I/O), but that would require more code.”

Maybe some or all of that is contributing to the issue.

Let us know if upgrading to TF 2.5 fixes anything.


Yes, the amount of memory needed is probably very large, which is why I restricted the number of images to 64 just to see if the rest of the code works.
I managed to upgrade TensorFlow to the latest version (TF 2.5), and now if I write

for img, path in image_dataset:

the kernel doesn’t die anymore, but I get an error:

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-20-55361e928603> in <module>
----> 1 for img, path in image_dataset:
      2     pass

~\anaconda3\envs\TF2_5\lib\site-packages\tensorflow\python\data\ops\ in __next__(self)
    759   def __next__(self):
    760     try:
--> 761       return self._next_internal()
    762     except errors.OutOfRangeError:
    763       raise StopIteration

~\anaconda3\envs\TF2_5\lib\site-packages\tensorflow\python\data\ops\ in _next_internal(self)
    745           self._iterator_resource,
    746           output_types=self._flat_output_types,
--> 747           output_shapes=self._flat_output_shapes)
    749       try:

~\anaconda3\envs\TF2_5\lib\site-packages\tensorflow\python\ops\ in iterator_get_next(iterator, output_types, output_shapes, name)
   2722       _result = pywrap_tfe.TFE_Py_FastPathExecute(
   2723         _ctx, "IteratorGetNext", name, iterator, "output_types", output_types,
-> 2724         "output_shapes", output_shapes)
   2725       return _result
   2726     except _core._NotOkStatusException as e:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 117: invalid continuation byte
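For context, byte 0xe9 is the letter “é” in Latin-1 and Windows-1252, but on its own it is not valid UTF-8. One possible reading (an assumption, not something confirmed in this thread) is that some non-UTF-8 text, such as a localized system message or a file path, was decoded as UTF-8 while the real error was being reported. The sketch below only reproduces the same kind of decode failure with a made-up byte string.

```python
# Hypothetical bytes: the word "spécifié" encoded as Windows-1252.
raw = b"sp\xe9cifi\xe9"

try:
    raw.decode("utf-8")
    message = ""
except UnicodeDecodeError as err:
    # 0xe9 starts a multi-byte UTF-8 sequence, but the next byte is plain ASCII.
    message = str(err)

# Decoding with the encoding the bytes were written in succeeds.
decoded = raw.decode("cp1252")

print(message)
print(decoded)
```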

I fixed the problem, so in case someone has the same problem, here’s how I solved it:
I found out that the file “captions_train2014.json” contains image IDs that do not exist in the “train2014” folder, so the error occurred when trying to iterate over the images. More precisely, there are 82783 different IDs, but I have only 74891 images. I fixed that by checking whether the image path exists before opening the image. I have no idea why it works in Colab though (maybe my download just went wrong).


Hey, I am running into the same problem. Can you please show how you fixed it?
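The original poster's code was not shared, so the following is only a minimal sketch of the fix described above: check that each image path exists before it reaches the loading step. The helper name `drop_missing_images` is hypothetical, and it assumes the image paths and captions are kept in two parallel lists.

```python
import os


def drop_missing_images(image_paths, captions):
    """Keep only the (path, caption) pairs whose image file exists on disk."""
    kept_paths, kept_captions = [], []
    for path, caption in zip(image_paths, captions):
        if os.path.exists(path):
            kept_paths.append(path)
            kept_captions.append(caption)
    return kept_paths, kept_captions


# Hypothetical usage, before building encode_train:
# img_name_vector, train_captions = drop_missing_images(img_name_vector, train_captions)
# encode_train = sorted(set(img_name_vector))
```

Filtering the paths and captions together keeps the two lists aligned, so later steps that pair each caption with its image still line up.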