OP_REQUIRES failed at cast_op.cc:121 : UNIMPLEMENTED: Cast string to float is not supported

This question is based on the TFX recommender tutorial. Please note that the code is being orchestrated by LocalDagRunner rather than run interactively in a notebook.
We have a MovielensModel with a compute_loss method as follows:

import glob
import os
from typing import Dict, List, Text

import tensorflow as tf
import tensorflow_recommenders as tfrs
from tensorflow_transform import TFTransformOutput


class MovielensModel(tfrs.Model):
    def __init__(self, user_model, movie_model, tf_transform_output: TFTransformOutput, movie_uris: List[str]):
        super().__init__()
        self.movie_model: tf.keras.Model = movie_model
        self.user_model: tf.keras.Model = user_model

        movie_files = glob.glob(os.path.join(movie_uris[0], '*'))
        movies = tf.data.TFRecordDataset(movie_files, compression_type="GZIP")
        movies_dataset = extract_str_feature(movies, 'movie_title')

        loss_metrics = tfrs.metrics.FactorizedTopK(
            candidates=movies_dataset.batch(128).map(movie_model)
        )

        self.task: tf.keras.layers.Layer = tfrs.tasks.Retrieval(
            metrics=loss_metrics
        )

    def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
        # Pick out the user features and pass them into the user model.
        user_embeddings = tf.squeeze(self.user_model(features['user_id']), axis=1)

        # Pick out the movie features and pass them into the movie model,
        # getting embeddings back.
        print(features['movie_title'])
        print(type(features['movie_title']))

        positive_movie_embeddings = self.movie_model(features['movie_title'])

        # The task computes the loss and the metrics.
        return self.task(user_embeddings, positive_movie_embeddings)

The model fails on the line positive_movie_embeddings = self.movie_model(features['movie_title']) with the error:
W tensorflow/core/framework/op_kernel.cc:1722] OP_REQUIRES failed at cast_op.cc:121 : UNIMPLEMENTED: Cast string to float is not supported

We also see in the trace:

Node: 'sequential_1/Cast'
Cast string to float is not supported
         [[{{node sequential_1/Cast}}]] [Op:__inference_train_function_1214007]
ERROR:absl:######## ERROR IN run_fn during fit:
Graph execution error:

We see that features['movie_title'] is of type

SparseTensor(indices=Tensor("inputs_21_copy:0", shape=(None, 2), dtype=int64), values=Tensor("inputs_22_copy:0", shape=(None,), dtype=string), dense_shape=Tensor("inputs_23_copy:0", shape=(2,), dtype=int64)) 

The string values are expected, as the data files for this tutorial contain movie titles as strings.
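For reference, in TensorFlow tf.sparse.to_dense(features['movie_title'], default_value='') would turn this sparse representation into an ordinary dense string tensor before it reaches the movie model. As a plain-Python illustration of the scatter that densification performs (this is only a sketch of the semantics, not the TF API itself):

```python
def densify_strings(indices, values, dense_shape, default=''):
    # Mirror what tf.sparse.to_dense does for a rank-2 SparseTensor of
    # strings: start from a default-filled matrix and scatter each value
    # into its (row, col) position.
    rows, cols = dense_shape
    dense = [[default] * cols for _ in range(rows)]
    for (r, c), v in zip(indices, values):
        dense[r][c] = v
    return dense

# densify_strings([(0, 0), (1, 0)], ['Toy Story', 'Heat'], (2, 1))
# → [['Toy Story'], ['Heat']]
```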

I have looked at the other Stack Overflow posts on this error but cannot relate them to this context. What could be causing this issue?

Not really sure, but it could be caused by using LocalDagRunner instead of InteractiveContext. To simplify the example, we put the additional channels in custom_config instead of subclassing the component and adding them to the spec. This works in InteractiveContext because the order of execution is the order in which the cells are run, but for other DAG runners there is a race condition.

@Robert_Crowe, we have already created a custom TFX Trainer component to supply extra parameters to the trainer module. This explains the previous error we had with custom_config; however, we now see the error above.

  1. If you run the code in a Colab using the InteractiveContext, do you still see the error?
  2. Alternatively, if you subclass the Trainer component spec and include the additional channels, instead of passing them in the custom_config, do you still see the error?

Eventually resolved by using the custom component class and fn_args, and by making sure the artifact path was correct in the trainer.
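Since the resolution hinged on the trainer receiving the right artifact path, a small guard can make a wrong movie_uris entry fail fast with a clear message instead of surfacing later as an opaque graph error. This is a hypothetical helper (resolve_movie_files is not part of the tutorial), sketched here as one way to add that check:

```python
import glob
import os


def resolve_movie_files(movie_uri):
    # Guard against a wrong or missing artifact path: an empty glob here
    # would otherwise only surface later, inside model training, as a
    # confusing error.
    pattern = os.path.join(movie_uri, '*')
    files = glob.glob(pattern)
    if not files:
        raise FileNotFoundError(f'No data files found under {pattern!r}')
    return files
```

In the model's __init__, movie_files = resolve_movie_files(movie_uris[0]) would then replace the bare glob.glob call.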
