TensorFlow Serving in Kubernetes deployment fails to predict on input JSON (text-based messages) - Output exceeds the size limit error

I have created a TensorFlow SavedModel using tf.keras.models.save_model as below -

tf.keras.models.save_model(
    text_model,
    export_path,
    overwrite=True,
    include_optimizer=True,
    save_format=None,
    signatures=None,
    options=None
)
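
As a sanity check on the export (a minimal sketch; export_path is the same directory passed to save_model above, and TF Serving expects it under a numeric version subdirectory such as /models/model/1/), the serving signature can be printed to confirm the input name, dtype, and shape the server will expect:

import tensorflow as tf

# Load the export back and print its serving signature: this shows the
# input name, dtype, and shape that TF Serving will expect in requests.
# export_path is the same directory passed to save_model above.
loaded = tf.saved_model.load(export_path)
infer = loaded.signatures["serving_default"]
print(infer.structured_input_signature)
print(infer.structured_outputs)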

Then I deployed the same model in a Kubernetes cluster with the TensorFlow Serving image. The deployment YAML looks like this -

apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: tensorflow-serving
  name: news-classifier
spec:
  selector:
    matchLabels:
      app: news-classifier-server
  replicas: 3
  template:
    metadata:
      labels:
        app: news-classifier-server
    spec:
      containers:
      - name: news-classifier-container
        image: tensorflow/serving:latest
        ports:
        - containerPort: 8501
        volumeMounts:
          - name: news-classifier-vol
            mountPath: "/models/model"
        env:
        - name: MODEL_NAME
          value: "model"
        - name: MODEL_BASE_PATH
          value: "/models" 
      volumes:
      - name: news-classifier-vol
        persistentVolumeClaim:
          claimName: news-classifier
---
apiVersion: v1
kind: Service
metadata:
  namespace: tensorflow-serving
  name: news-classifier-service
spec:
  ports:
  - port: 8501
    targetPort: 8501
  selector:
    app: news-classifier-server
  type: ClusterIP

I could successfully mount the saved model inside the pod using a PVC, and the serving logs show no errors.
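
As an additional sanity check (assuming port 8501 is reachable locally, e.g. via kubectl port-forward), the model status endpoint should report the loaded version as AVAILABLE:

import requests

# Query the model status endpoint; the loaded version should be reported
# with state "AVAILABLE". Assumes 8501 is reachable on localhost,
# e.g. via kubectl port-forward.
status = requests.get('http://localhost:8501/v1/models/model')
print(status.status_code, status.json())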

But when I try to call the model's predict endpoint using the code below, it fails.

import requests
import json
import numpy as np

sample_news = ["In the last weeks, there has been many transfer suprises in footbal. Ronaldo went back to Old Trafford.",
               "while Messi went to Paris Saint Germain to join his former colleague Neymar.",
               "We can't wait to see these two clubs will perform in upcoming leagues"]

data = json.dumps({"instances": sample_news})

# Define headers with content-type set to json
headers = {"content-type": "application/json"}

# Capture the response by making a request to the appropriate URL with the appropriate parameters
json_response = requests.post('http://localhost:8501/v1/models/model:predict', data=data, headers=headers)

# Parse the predictions out of the response
# (this assumes the request succeeded; check json_response.status_code first)
predictions = json.loads(json_response.text)['predictions']
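
For reference, the metadata endpoint (same host and port, reusing the requests import above) can be queried to confirm the input name and dtype the signature expects - a quick sketch:

# Query the model metadata endpoint to see the signature definition,
# including the expected input tensor name and dtype.
meta = requests.get('http://localhost:8501/v1/models/model/metadata')
print(meta.status_code)
print(meta.text)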

I get the following error -

Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
RemoteDisconnected                        Traceback (most recent call last)
c:\Users\htmrhv\apps\Python\Python38\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    698             # Make the request on the httplib connection object.
--> 699             httplib_response = self._make_request(
    700                 conn,

c:\Users\htmrhv\apps\Python\Python38\lib\site-packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    444                     # Otherwise it looks like a bug in the code.
--> 445                     six.raise_from(e, None)
    446         except (SocketTimeout, BaseSSLError, SocketError) as e:

c:\Users\htmrhv\apps\Python\Python38\lib\site-packages\urllib3\packages\six.py in raise_from(value, from_value)

c:\Users\htmrhv\apps\Python\Python38\lib\site-packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    439                 try:
--> 440                     httplib_response = conn.getresponse()
    441                 except BaseException as e:

c:\Users\htmrhv\apps\Python\Python38\lib\http\client.py in getresponse(self)
   1343             try:
-> 1344                 response.begin()
   1345             except ConnectionError:

c:\Users\htmrhv\apps\Python\Python38\lib\http\client.py in begin(self)
...
--> 498             raise ConnectionError(err, request=request)
    499 
    500         except MaxRetryError as e:

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

The pod log shows the following output -

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
/usr/bin/tf_serving_entrypoint.sh: line 3:     7 Aborted                 tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"

Please suggest.

For debugging purposes, can you send the request to a single TF Serving instance? If that works, then something is up with your K8s setup. If not, well, you should look into your request.

Hi @Wei_Wei, thanks for responding!

I have tried with a single replica as well, and the error is the same.

I am not sure what is missing in the request. Do you see any problem with how the request is formed?