Need help training with ModelMaker & Cloud TPU in Colab

I’m trying to train an object detection model in Colab using TFLite Model Maker, and I’d like to use a Cloud TPU, but so far all I’m getting are errors. I have searched for a tutorial that shows how to do it, but have come up empty. I’d appreciate any tips on what to try next!

The essence of what I have so far is derived from the Model Maker Object Detection Tutorial:

import tensorflow as tf
from tflite_model_maker import object_detector

# Locate the Colab TPU
tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
print(tpu.master()) # --> grpc://10.5.111.234:8470
# drive_dir is defined earlier in the notebook
train_data, validation_data, test_data = object_detector.DataLoader.from_csv(
    drive_dir + 'cub.csv', images_dir="images", num_shards=10)
spec = object_detector.EfficientDetLite0Spec(tflite_max_detections=10,
                                             strategy='tpu', tpu=tpu.master(), debug=True)
model = object_detector.create(train_data=train_data,
                               model_spec=spec,
                               validation_data=validation_data,
                               epochs=15,
                               batch_size=30,
                               train_whole_model=True)

The error I get in the last step:

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in shape(self)
   1196         # `_tensor_shape` is declared and defined in the definition of
   1197         # `EagerTensor`, in C.
-> 1198         self._tensor_shape = tensor_shape.TensorShape(self._shape_tuple())
   1199       except core._NotOkStatusException as e:
   1200         six.raise_from(core._status_to_exception(e.code, e.message), None)

InvalidArgumentError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on /tmp/tfhub_modules/0771ebbebbfa831cd20c20680920ec4ad9deb2bd/variables/variables: Unimplemented: File system scheme '[local]' not implemented (file: '/tmp/tfhub_modules/0771ebbebbfa831cd20c20680920ec4ad9deb2bd/variables/variables')

Is the issue that I have to put the training images into GCS? Or is there more to getting this to work?


I moved all the images to a GCS bucket and adjusted the references in the CSV, but I still get the same error…
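
For anyone following along, the CSV adjustment can be sketched roughly as below. This is only an illustration: it assumes the AutoML-style layout with the image path in the second column, and the bucket name `gs://my-bucket/cub/images` and the helper `point_csv_at_bucket` are hypothetical.

```python
import csv
import io


def point_csv_at_bucket(rows, bucket_prefix, path_column=1):
    """Return copies of the rows with the image path rewritten to a gs:// URI.

    Assumes the image path sits in `path_column` (second column by default)
    and that all images live directly under `bucket_prefix`.
    """
    prefix = bucket_prefix.rstrip("/")
    rewritten_rows = []
    for row in rows:
        new_row = list(row)  # copy so the input rows are left untouched
        filename = new_row[path_column].rsplit("/", 1)[-1]
        new_row[path_column] = f"{prefix}/{filename}"
        rewritten_rows.append(new_row)
    return rewritten_rows


# One made-up annotation line, used only to show the rewrite
sample = "TRAIN,images/bird_001.jpg,Cardinal,0.1,0.2,,,0.8,0.9,,\n"
rows = list(csv.reader(io.StringIO(sample)))
rewritten = point_csv_at_bucket(rows, "gs://my-bucket/cub/images")
print(rewritten[0][1])
```

The rewritten rows can then be written back out with `csv.writer` before calling `DataLoader.from_csv`.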

Hi tve,

Did you manage to get this working?
I think the problem is that the TPU cannot access the base model that Model Maker uses on Colab: the error points at a local path under /tmp/tfhub_modules, which the TPU worker has no way to read.

Model Maker takes its base models from TensorFlow Hub.
You can set an environment variable to load the models directly from GCS, which might fix your issue:

import os
import tensorflow_hub as hub

os.environ["TFHUB_MODEL_LOAD_FORMAT"] = "UNCOMPRESSED"
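
The ordering matters, so here is a rough sketch of where the variable would go relative to the calls from the question. It assumes the variable only needs to be set before the model is created; `train_on_tpu` is a hypothetical wrapper, and the Model Maker calls inside it are copied from the original post rather than verified on a TPU.

```python
import os

# Set this before the model is created, so TF Hub reads the base model
# directly from GCS instead of caching it under /tmp on the Colab VM.
os.environ["TFHUB_MODEL_LOAD_FORMAT"] = "UNCOMPRESSED"


def train_on_tpu(csv_path, images_dir, epochs=15, batch_size=30):
    """Hypothetical wrapper around the training calls from the question."""
    # Imported lazily so the environment variable above is already in place
    import tensorflow as tf
    from tflite_model_maker import object_detector

    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    train_data, validation_data, test_data = object_detector.DataLoader.from_csv(
        csv_path, images_dir=images_dir, num_shards=10)
    spec = object_detector.EfficientDetLite0Spec(
        tflite_max_detections=10, strategy='tpu', tpu=tpu.master(), debug=True)
    return object_detector.create(train_data=train_data,
                                  model_spec=spec,
                                  validation_data=validation_data,
                                  epochs=epochs,
                                  batch_size=batch_size,
                                  train_whole_model=True)
```

In the notebook this would be called as `model = train_on_tpu(drive_dir + 'cub.csv', 'images')`.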

Did it help?