TensorFlow 1.15: Protobuf UTF-8 Error with Large Dataset When Saving Model

Huang_Nuoxian · January 3, 2024, 3:54am

Hello TensorFlow Community,

I am experiencing a protobuf-related error in TensorFlow 1.15 when attempting to save my model using tf.train.Saver() with a large dataset. The same process works fine with a smaller subset of the dataset. Both subsets are sampled from a 20GB dataset, and I am confident that the data processing flow is correct.

Error Message:

libprotobuf ERROR google/protobuf/wire_format_lite.cc:581] String field 'tensorflow.TensorShapeProto.Dim.name' contains invalid UTF-8 data when parsing a protocol buffer. Use the 'bytes' type if you intend to send raw bytes. 
Traceback (most recent call last):
  File "main_time_series_deconfounder.py", line 67, in <module>
    test_time_series_deconfounder(dataset=dataset, num_substitute_confounders=args.num_substitute_hidden_confounders,
  File "/root/autodl-tmp/Conformity_Casual_Inferance/time_series_deconfounder.py", line 370, in test_time_series_deconfounder
    rmse_without_confounders = train_rmsn(dataset_map, 'rmsn_' + str(exp_name), b_use_predicted_confounders=False)
  File "/root/autodl-tmp/Conformity_Casual_Inferance/time_series_deconfounder.py", line 201, in train_rmsn
    rnn_fit(dataset_map=dataset_map, networks_to_train='propensity_networks', MODEL_ROOT=MODEL_ROOT,
  File "/root/autodl-tmp/Conformity_Casual_Inferance/rmsn/script_rnn_fit.py", line 162, in rnn_fit
    hyperparam_opt = train(net_name, expt_name,
  File "/root/autodl-tmp/Conformity_Casual_Inferance/rmsn/core_routines.py", line 219, in train
    helpers.save_network(sess, model_folder, cp_name, optimisation_summary)
  File "/root/autodl-tmp/Conformity_Casual_Inferance/rmsn/libs/net_helpers.py", line 148, in save_network
    save_path = saver.save(tf_session, os.path.join(model_folder, "{0}.ckpt".format(cp_name)))
  File "/root/miniconda3/envs/casual/lib/python3.8/site-packages/tensorflow_core/python/training/saver.py", line 1200, in save
    self.export_meta_graph(
  File "/root/miniconda3/envs/casual/lib/python3.8/site-packages/tensorflow_core/python/training/saver.py", line 1246, in export_meta_graph
    graph_def=ops.get_default_graph().as_graph_def(add_shapes=True),
  File "/root/miniconda3/envs/casual/lib/python3.8/site-packages/tensorflow_core/python/framework/ops.py", line 3238, in as_graph_def
    result, _ = self._as_graph_def(from_version, add_shapes)
  File "/root/miniconda3/envs/casual/lib/python3.8/site-packages/tensorflow_core/python/framework/ops.py", line 3166, in _as_graph_def
    graph.ParseFromString(compat.as_bytes(data))
google.protobuf.message.DecodeError: Error parsing message with type 'tensorflow.GraphDef'

The error occurs during the model-saving step, and it seems to be related to how protobuf handles certain data. I came across a similar issue described in another post, where the user faced the same error. The traceback points to graph.ParseFromString in TensorFlow’s internal operations.

Additional Observations:

The error suggests that ‘invalid UTF-8 data’ was sent to protobuf.
The issue is reproducible only with a larger subset (2.5GB) of the data, not with a smaller subset (1.5GB).
Before the execution of graph.ParseFromString(compat.as_bytes(data)), it was confirmed that data is of type <class 'bytes'>.

Environment:

TensorFlow version: 1.15
Operating System: Ubuntu 20.04.3 LTS

Has anyone encountered a similar problem or can offer any advice on why this error might be occurring with larger datasets in TensorFlow 1.15? Any thoughts or suggestions would be highly appreciated.

Thank you!

Renu_Patel · January 9, 2024, 6:32am

Hi @Huang_Nuoxian

Welcome to the TensorFlow Forum!

Could you please try running the same code with the latest TensorFlow version and let us know whether the issue persists? If it does, please share minimal reproducible code so that we can replicate the error and understand the issue. Thank you.

Tim_Wolfe · January 27, 2024, 3:47am

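The protobuf UTF-8 error you’re seeing in TensorFlow 1.15 when saving a model is usually a symptom of a size problem rather than of genuinely malformed string data. TensorFlow uses protobuf (Protocol Buffers) to serialize and deserialize structured data, including the GraphDef that saver.save() exports, and a single protobuf message cannot exceed 2 GB. It seems like you’re hitting that limit: the save works with the 1.5GB subset and fails with the 2.5GB one, which is what you would expect if the data ends up stored inside the graph itself.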
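
To make the size argument concrete, here is a minimal back-of-the-envelope sketch in plain Python (no TensorFlow needed). It assumes the dataset is embedded in the graph as constants and that its in-graph size is roughly the subset size quoted in the question; neither is confirmed in this thread. The helper name fits_in_one_message is hypothetical and only for illustration.

```python
# Protobuf stores message sizes as signed 32-bit integers, so one serialized
# message (for example a GraphDef) can hold at most 2**31 - 1 bytes.
PROTOBUF_MAX_BYTES = 2**31 - 1


def fits_in_one_message(total_bytes, limit=PROTOBUF_MAX_BYTES):
    """Return True if a payload of total_bytes stays within the protobuf limit.

    Hypothetical helper: it ignores the (comparatively small) overhead of the
    graph structure itself and only looks at the embedded data.
    """
    return total_bytes <= limit


# Assumption: "GB" in the question means 10**9 bytes. Using 2**30 instead
# gives the same verdicts for these two sizes.
GB = 10**9
small_subset = int(1.5 * GB)  # the subset that saves successfully
large_subset = int(2.5 * GB)  # the subset that triggers the DecodeError

print(fits_in_one_message(small_subset))  # True
print(fits_in_one_message(large_subset))  # False
```

If this is indeed the cause, the usual remedy is to keep the data out of the graph, for example by feeding it through placeholders at run time instead of creating constants from the arrays, so that the exported GraphDef only contains the model structure.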