Direct loading/inference on model created in Vertex AI

I’m not familiar with TF Serving, so I would appreciate any information or direction on the problem I’m facing.

I’ve trained a classification AutoML model on Vertex AI. I then downloaded the model locally, and I am trying to load it and run inference with it directly. To clarify: I have been able to run the model inside the AutoML Docker image, but that is not what I want. I need to load the SavedModel directly and run inference with it, following the same workflow I would use with a Keras model.
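
For illustration, this is the kind of workflow I mean (a minimal sketch; the model path and the input shape are just placeholders):

import numpy as np
import tensorflow as tf

# Load a Keras model from disk and run inference on it directly.
model = tf.keras.models.load_model('path/to/keras_model')  # placeholder path
dummy_batch = np.zeros((1, 10), dtype=np.float32)           # placeholder input
predictions = model.predict(dummy_batch)
print(predictions)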

I have been struggling with this for several weeks now. I even made a post on StackOverflow, but I got no replies: protocol buffers - Get predictions from Tensorflow Serve SavedModel - Stack Overflow

I have since managed to get past that error. I went into the AutoML Docker container, figured out (or so I think) the protobuf functions used to transform the input, and I have been able to feed that input to the model. However, the model now always returns the same output: no matter what I give it as input, I get exactly the same prediction.

I thought it might be because I was loading the model with TF2 functions, so I tried loading it the TF1 way instead. I process the data in the same way, and I still get the same output every time. I'm at my wits' end here, so any feedback is appreciated. I am posting the relevant part of the TF1 code below; the TF2-style loading I tried is sketched a bit further down.

import numpy as np
# struct2tensor and tensorflow_addons are imported so that the custom ops
# used by the AutoML SavedModel get registered.
from struct2tensor import *
import tensorflow_addons as tfa
tfa.register_all()
import tensorflow as tf
import tensorflow.compat.v1 as tf1
tf1.disable_eager_execution()
from sklearn.model_selection import train_test_split
import pandas as pd
import json

# The `translate` function is the one I found in the AutoML Docker container
import prediction.translate as translate

# Read every column in as a string.
df = pd.read_csv('data/sample_data.csv', converters={i: str for i in range(0, 500)})

target = ['Objective'] 
train_x = df.drop(target, axis=1)
train_y = df[target]
variables = list(train_x.columns)
(train_x, test_x, train_y, test_y) = train_test_split(train_x, train_y, test_size = 0.3, random_state=1)

with tf1.Session() as sess:
    # Load the SavedModel exported by AutoML (the '001' directory) the TF1 way.
    model = tf1.saved_model.load(sess, ["serve"], '001')
    sig_def = model.signature_def[tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
    # Tensor names for the serving signature's input and output.
    input_name = sig_def.inputs['inputs'].name
    output_name = sig_def.outputs['scores'].name

    def predict(data):
        res = sess.run(output_name, feed_dict={input_name: data})
        print('Output with input', data, ': ', res)

    for k in range(100):
        # Build a TF Serving-style JSON request from one row of the dataframe...
        data = json.dumps({"instances": [json.loads(train_x.iloc[k].to_json(orient='index'))]})
        # ...translate it the way the AutoML container does, and feed the
        # resulting serialized strings to the model.
        req, batch_size, err = translate.user_request_to_tf_serving_request(data)
        model_input = req.inputs['inputs'].string_val
        predict(model_input)

This snippet prints the same output 100 times, which in my case is [[0.41441688 0.5855831 ]].
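
For completeness, the TF2-style loading I mentioned above looked roughly like this and gave the same repeated output (a minimal sketch, run as a separate script with eager execution enabled; I am assuming the default 'serving_default' signature key and the same 'inputs'/'scores' keys as in the signature above):

import tensorflow as tf

# Load the SavedModel with the TF2 API and grab the serving signature.
loaded = tf.saved_model.load('001')
infer = loaded.signatures['serving_default']

def predict_tf2(serialized_examples):
    # The signature takes a batch of serialized strings under the 'inputs' key
    # and returns the class scores under the 'scores' key.
    outputs = infer(inputs=tf.constant(serialized_examples, dtype=tf.string))
    print(outputs['scores'].numpy())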

What I am thinking of trying next is to load the TF1 model and re-save it as a TF2 SavedModel. I am not sure whether this will solve my problem, though. As I mentioned, I am really not familiar with TF Serving, so this is all quite confusing for me.
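
If I go down that road, I am picturing something along these lines (an untested sketch; I do not know whether re-exporting the loaded graph this way keeps the custom ops and preprocessing intact):

import tensorflow as tf

# Load the TF1-era SavedModel with the TF2 loader and re-export it as a
# TF2 SavedModel, keeping only the serving signature.
loaded = tf.saved_model.load('001')
tf.saved_model.save(
    loaded,
    'resaved_model',  # placeholder export directory
    signatures=loaded.signatures['serving_default'],
)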

Any help is appreciated.

I was able to replicate pretty much all of the problems you described. However, as I understand it, the only supported way to run an AutoML model is through the Docker container. See this comment from Helin Wang (a Cloud Googler).

I’m curious whether the model worked for you in the Docker container. It did not work for me, so I’ll chase that up with the AutoML teams; it would be good to know if you’re also affected.