I am currently working through the example “Text Classification with TF Hub” on the TensorFlow for R website, and I am trying to replicate the model with some of my own data.
In the example, the data are first loaded from external text files via text_dataset_from_directory and stored as a BatchDataset of shape (None,), named train_data. In my own data, the texts and labels are available as R vectors of length 80. When I try to convert these data, I obtain a BatchDataset of shape (None, 80), which is incompatible with the TensorFlow model defined later.
My question is how I should preprocess my data so that I can use them with the TensorFlow Hub model.
I am providing a minimal example based on my R code. In the end, I obtain a ValueError, because the model expects data with shape (None,) but gets data with shape (None, 80). I am grateful for any help.
    # Loading the packages
    library(keras)
    library(tensorflow)
    library(tfdatasets)
    library(tfhub)

    # We transform the vectors into a tensorflow dataset:
    # texts_train and labels_train are vectors of length 80
    texts_train_tf <- as_tensor(texts_train)   # Has shape (80)
    labels_train_tf <- as_tensor(labels_train) # Has shape (80)
    train_tf <- tensors_dataset(c(texts_train_tf, labels_train_tf)) # TensorDataset of shape (80,)
    train_data_tf <- dataset_batch(train_tf, batch_size = 32)       # BatchDataset of shape (80,)

    # Exploring the data
    batch_tf <- train_data_tf %>%
      reticulate::as_iterator() %>%
      reticulate::iter_next()

    # Following the example "Text Classification with TF Hub"
    embedding <- "https://tfhub.dev/google/nnlm-en-dim50/2"
    hub_layer <- tfhub::layer_hub(handle = embedding, trainable = TRUE)

    model <- keras_model_sequential() %>%
      hub_layer() %>%
      layer_dense(32, activation = "relu") %>%
      layer_dense(1)

    model %>% compile(
      optimizer = 'adam',
      loss = 'mean_squared_error',
      metrics = 'accuracy'
    )

    history <- model %>% fit(
      train_data_tf,
      epochs = 10,
      verbose = 1
    )
Fitting this model leads to the following error message:
    Error: ValueError: in user code:

        <… omitted …>

        File "C:\Users\RDEBEL~1\AppData\Local\Temp_autograph_generated_filei_8nea4e.py", line 37, in if_body_3
            result = ag_.converted_call(ag__.ld(f), (), None, fscope)

        ValueError: Exception encountered when calling layer 'keras_layer_16' (type KerasLayer).

        in user code:

            File "C:\Users\RDEBEL~1\AppData\Local\R-MINI~1\envs\R-RETI~1\lib\site-packages\tensorflow_hub\keras_layer.py", line 234, in call  *
                result = f()

            ValueError: Python inputs incompatible with input_signature:
              inputs: (
                Tensor("IteratorGetNext:0", shape=(None, 80), dtype=string))
              input_signature: (
                TensorSpec(shape=(None,), dtype=tf.string, name=None)).

        Call arguments received by layer 'keras_layer_16' (type KerasLayer):
          • inputs=tf.Tensor(shape=(None, 80), dtype=string)
          • training=True

    reticulate::py_last_error() for details
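For comparison, here is a minimal sketch of a dataset construction that I would expect to yield batches of shape (None,). It rests on one fact about tf.data: tensors_dataset wraps its input as a single dataset element, while tensor_slices_dataset slices the input along its first dimension, giving one element per text. The placeholder values for texts_train and labels_train are hypothetical stand-ins for my real vectors, and wrapping the pair in reticulate::tuple() is my own assumption about how best to pass a (text, label) structure. I have not confirmed that this is the recommended approach for the TF Hub example.

```r
library(tensorflow)
library(tfdatasets)

# Hypothetical stand-ins for the real data: 80 texts and 80 numeric labels.
texts_train  <- paste("example text", seq_len(80))
labels_train <- as.numeric(seq_len(80) %% 2)

# Slice along the first dimension: 80 elements, each a (text, label) pair
# of scalars, rather than one element holding two length-80 tensors.
# reticulate::tuple() is used so the pair is passed as a Python tuple
# (an assumption made for this sketch).
train_tf <- tensor_slices_dataset(
  reticulate::tuple(as_tensor(texts_train), as_tensor(labels_train))
)

# Batching now stacks scalar elements, so the text component is rank 1.
train_data_tf <- dataset_batch(train_tf, batch_size = 32)

# Pull the first batch to inspect it: batch[[1]] holds the texts,
# batch[[2]] the labels.
batch <- train_data_tf %>%
  reticulate::as_iterator() %>%
  reticulate::iter_next()

batch[[1]]$shape
```

If the text component of the batch comes out as rank 1, the dataset should match the input signature TensorSpec(shape=(None,), dtype=tf.string) reported in the error above.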