Use vector features as input to tf neural network model

Jaden · April 5, 2024, 3:30am

Hello,

I am fairly new to TensorFlow and I’m not exactly sure how to build my model. I’m working with a model with 11 features; however, 5 of these features are vectors rather than scalars (for example, one of them is a word embedding of length 50). Each of the 5 vector features has its own length, but within a given feature every vector has the same length (e.g., all of the word embeddings are of length 50, while the vectors of another feature have a different length). Is there a way to build the NN while taking these vectors into account? I want my output layer to have 100 nodes. I’m not sure if Concatenate() will work, since each feature has a different vector length. Thanks for the help.

rcauvin · April 5, 2024, 2:34pm

Keras is designed to do exactly what you are proposing. You can create preprocessing layers that, for example, vectorize text and pass embeddings to the neural network. See this guide for examples. What kinds of predictions are you trying to make?
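As a rough illustration of the multi-input approach, here is a minimal sketch using the Keras functional API with one Input per feature. The feature names, the vector lengths other than 50, the hidden layer size, and the loss are all assumptions made for the example; only the length-50 embedding, the 11 features (5 vector, 6 scalar), and the 100-unit output come from the thread. Concatenate() joins tensors along the last axis, so the inputs only need to agree on batch size, not on vector length.

```python
import numpy as np
import tensorflow as tf

# Hypothetical names and lengths for the 5 vector features (only 50 is from the thread).
vector_dims = {"vec_a": 50, "vec_b": 64, "vec_c": 300, "vec_d": 20, "vec_e": 10}
n_scalar = 6  # the remaining 6 of the 11 features, packed into one input

# One named Input per feature, each with its own length.
inputs = {name: tf.keras.Input(shape=(dim,), name=name)
          for name, dim in vector_dims.items()}
inputs["scalars"] = tf.keras.Input(shape=(n_scalar,), name="scalars")

# Concatenate along the last axis: 50 + 64 + 300 + 20 + 10 + 6 = 450 values per row.
merged = tf.keras.layers.Concatenate()(list(inputs.values()))
hidden = tf.keras.layers.Dense(128, activation="relu")(merged)  # size is arbitrary
outputs = tf.keras.layers.Dense(100)(hidden)  # 100-dimensional sentence embedding

model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="mse")  # loss choice is an assumption

# Random stand-in data, just to check that the shapes line up.
rng = np.random.default_rng(0)
batch = {name: rng.random((4, dim), dtype=np.float32)
         for name, dim in vector_dims.items()}
batch["scalars"] = rng.random((4, n_scalar), dtype=np.float32)
preds = model.predict(batch, verbose=0)
```

Feeding the model a dict keyed by input name keeps each feature as its own 2-D float array, which avoids packing arrays inside a single table cell.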

Jaden · April 5, 2024, 3:19pm

I’m essentially trying to predict a sentence embedding given certain features of sentences. It’s for a project. My vector features are average word embeddings (using word2vec), summed word embeddings, a tf-idf vector, etc. As I said, I also have some scalar features. I had the tabular data formatted as a pandas dataframe, which I then converted to a numpy array with df.values. With this setup, I’m trying to get an output prediction of a vector of length 100, representing a sentence embedding. However, I keep getting errors, as it seems Keras is unable to handle the input format I’m giving it.

rcauvin · April 5, 2024, 3:24pm

Do you have a simplified example of an input pipeline that is resulting in the errors you are seeing?

Jaden · April 5, 2024, 3:38pm

I think this is what you mean. This is the pandas version of what I’m training on. I’m inputting the numpy array version of this into Keras, and all the embeddings/vectors within the dataframe are also numpy arrays. If you were looking for something else, apologies; let me know.

rcauvin · April 5, 2024, 5:49pm

Now I see the embeddings/vectors are already in the input dataset. Keras should be able to handle it, but you probably need to do some manipulation.

Are you converting the numpy arrays into tensors or putting them in a tf.data.Dataset at any point?

Are the errors occurring when you try to train the model?

Jaden · April 5, 2024, 6:19pm

I tried to convert the arrays to tensors, but I’m met with a ValueError stating “setting an array element with a sequence.” I suspect it has to do with the differing vector lengths across the features, but I’m not entirely sure.
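One likely cause of this error, assuming the dataframe stores a whole numpy array in each cell, is that df.values returns an object-dtype array, which cannot be cast to a numeric array or tensor directly. The sketch below uses a hypothetical two-row frame (the column names are made up) to show the failing cast and one way around it: stack each vector column into its own 2-D float array, then concatenate the columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one scalar column and one column whose cells hold arrays.
df = pd.DataFrame({
    "length": [3.0, 5.0],
    "avg_w2v": [np.zeros(50), np.ones(50)],
})

raw = df.values  # dtype=object, shape (2, 2): each cell is a Python object

cast_failed = False
try:
    raw.astype(np.float32)  # a cell holding a 50-element array cannot become one float
except (ValueError, TypeError) as err:
    cast_failed = True
    print(type(err).__name__, err)

# Stack the per-row arrays of a vector column into a regular (n_rows, 50) array.
avg_w2v = np.stack(df["avg_w2v"].tolist()).astype(np.float32)
length = df[["length"]].to_numpy(dtype=np.float32)  # shape (n_rows, 1)

# Different widths are fine when concatenating along axis 1.
X = np.concatenate([length, avg_w2v], axis=1)
```

Under this assumption the differing vector lengths are not the problem in themselves; the nested arrays inside an object array are.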

I haven’t tried anything with tf.data.Dataset. Should I look into that?

Jaden · April 8, 2024, 3:24pm

I also tried using tf.data.Dataset, but I’m getting errors and also crashes whenever it runs. Originally, though, the errors came up when I tried to train the model.
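For the tf.data.Dataset route, a minimal sketch is shown below. It assumes each feature has already been turned into a regular 2-D float32 array (one row per sentence); the feature names, the lengths other than 50, and the random data are placeholders, not the poster's actual data. tf.data.Dataset.from_tensor_slices accepts a dict of arrays, so features of different widths can sit side by side as long as they share the same number of rows.

```python
import numpy as np
import tensorflow as tf

n = 8  # number of example rows (placeholder)
rng = np.random.default_rng(0)

# Hypothetical features: each is a regular float32 array with n rows.
features = {
    "avg_w2v": rng.random((n, 50), dtype=np.float32),
    "tfidf": rng.random((n, 300), dtype=np.float32),
    "scalars": rng.random((n, 6), dtype=np.float32),
}
targets = rng.random((n, 100), dtype=np.float32)  # 100-dim sentence embeddings

# Slice along the first axis, then shuffle and batch.
dataset = (
    tf.data.Dataset.from_tensor_slices((features, targets))
    .shuffle(buffer_size=n)
    .batch(4)
)

# Inspect one batch to confirm the shapes.
for batch_features, batch_targets in dataset.take(1):
    shapes = {name: tuple(t.shape) for name, t in batch_features.items()}
    target_shape = tuple(batch_targets.shape)
```

A dataset built this way can be passed to model.fit for a model whose Input layers are named after the same dict keys.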