BoostedTreesClassifier format of input

Doug · June 15, 2021, 1:00am

Hi all,

I have a pandas DataFrame with features as columns and rows as observations. One of the columns is a Series where each element is a 512-long tf.Tensor. I am trying to pass this Tensor vector, along with the other features, into a tf.estimator.BoostedTreesClassifier model. However, I am receiving the following error when passing the tf.Tensor column:

AttributeError: Tensor.name is meaningless when eager execution is enabled.

Your help is much appreciated! Below is a reproducible example. Many thanks in advance for your help!

import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub

df = pd.DataFrame({"Text": ['This is text one', 'This is text two', 'And well, this is just the third text']})

# Universal Sentence Encoder: maps each input string to a 512-long embedding.
model_url = "https://tfhub.dev/google/universal-sentence-encoder/4"
encodings = tf.keras.Sequential(
    [
        tf.keras.layers.InputLayer(dtype=tf.string),
        hub.KerasLayer(model_url, input_shape=[], dtype=tf.string),
    ]
)

def encodes_text(txt):
    # Encode a single string; the result is a tf.Tensor holding one embedding.
    return encodings(tf.constant([txt]))

# Apply the encoder to the text column only, one row at a time.
df['embeddings'] = df['Text'].map(encodes_text)

# This is the call that fails with the AttributeError quoted above.
tree_class = tf.estimator.BoostedTreesClassifier(
    df['embeddings'],
    max_depth=3,
    n_classes=2,
    n_trees=50,
    n_batches_per_layer=1
)

markdaoust · June 15, 2021, 4:20pm

If you’re just getting started on this project, my advice is: don’t use anything in tf.estimator. Use TensorFlow Decision Forests, which takes advantage of modern APIs.

If you’re going to ignore that advice and do it with tf.estimator anyway, then the fix is to note that the first argument isn’t meant to be the data. It’s meant to be a list of tf.feature_column objects that describe how the model should process the data.
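
As a rough sketch of the data side of this, the embedding column can be flattened into one scalar column per dimension, so that each one has a name a feature column can refer to. Everything here is illustrative: the `emb_0` ... `emb_511` names are made up, and random vectors stand in for the real encoder output (with actual tf.Tensor values you would first convert each one, for example with `.numpy()`, and flatten it to length 512). Each name could then be described by something like `tf.feature_column.numeric_column(name)`, with the dict of arrays returned from an input function; that part is left out so the snippet runs without TensorFlow.

```python
import numpy as np
import pandas as pd

EMBEDDING_DIM = 512  # length of each embedding vector, as in the question

df = pd.DataFrame({"Text": ["This is text one", "This is text two",
                            "And well, this is just the third text"]})

# Stand-in for the encoder output: one 512-long float vector per row.
rng = np.random.default_rng(0)
df["embeddings"] = list(rng.normal(size=(len(df), EMBEDDING_DIM)).astype("float32"))

# Stack the per-row vectors into a single (n_rows, 512) array.
matrix = np.stack(df["embeddings"].tolist())

# One scalar column per embedding dimension (hypothetical names).
feature_names = [f"emb_{i}" for i in range(EMBEDDING_DIM)]
features_df = pd.DataFrame(matrix, columns=feature_names, index=df.index)

# Dict of name -> 1-D array, the kind of structure an input function returns.
features = {name: features_df[name].to_numpy() for name in feature_names}

print(matrix.shape, len(features))
```

The point of the design is the separation: the model constructor only gets descriptions of the columns, while the values themselves are supplied later, at training time.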

See:

Doug · June 16, 2021, 6:29am

Many thanks, @markdaoust, for the pointers! I’ll be happy to use tfdf instead, given this model will be run on a Linux cloud.

Incidentally, will tf.estimator models be deprecated? And is your advice not to use them based just on a new API being available via tfdf, or also on other things like model performance, stability, etc.?

markdaoust · June 16, 2021, 12:43pm

Estimators are fundamentally a TF1 thing. Supporting TF1 takes resources we’d rather spend on making TF2 better. We’d like to resolve that eventually. The less estimator code there is out there the easier that will be.

Doug · June 16, 2021, 1:33pm

Noted, @markdaoust. Thanks for clarifying!