How to reuse training results when migrating from TF 1.x to 2.3

Hi

I have results trained with a tensorflow.contrib.slim.fully_connected layer, and the trained weights are stored as “fully_connected/weights”. After migrating to TF 2.3, the contrib package is no longer supported, so I am using tf.compat.v1.layers.dense(…, name=‘fully_connected’) instead. However, the dense layer’s kernel variable is named “kernel”, whereas in tensorflow.contrib.slim.fully_connected it was named “weights”.

When I load my trained results, it complains that it cannot find “fully_connected/kernel” in my checkpoint (it was saved as “fully_connected/weights”).

How can I resolve this kernel name difference?

Thanks for your help

Any comment is appreciated. Thanks.

Can you paste the error?

Here is the error message:

2021-07-12 09:49:20.004761: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key Agent_0/fully_connected/kernel not found in checkpoint
Traceback (most recent call last):
File “C:\Users\zhuli\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py”, line 1356, in _do_call
return fn(*args)
File “C:\Users\zhuli\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py”, line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File “C:\Users\zhuli\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py”, line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key Agent_0/fully_connected/kernel not found in checkpoint
[[{{node save/RestoreV2}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “C:\Users\zhuli\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py”, line 1286, in restore
{self.saver_def.filename_tensor_name: save_path})
File “C:\Users\zhuli\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py”, line 950, in run
run_metadata_ptr)
File “C:\Users\zhuli\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py”, line 1173, in _run
feed_dict_tensor, options, run_metadata)
File “C:\Users\zhuli\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py”, line 1350, in _do_run
run_metadata)
File “C:\Users\zhuli\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py”, line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key Agent_0/fully_connected/kernel not found in checkpoint
[[node save/RestoreV2 (defined at \Documents\PortfolioOptConvert\a3c\server.py:44) ]]

Original stack trace for ‘save/RestoreV2’:
File “\Documents\PortfolioOptConvert\a3c.py”, line 90, in <module>
r = Server().main()
File “\Documents\PortfolioOptConvert\a3c\server.py”, line 44, in __init__
self.saver = tf.compat.v1.train.Saver(max_to_keep=Config.MAX_SAVE_RESULT)
File “\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py”, line 825, in __init__
self.build()
File “\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py”, line 837, in build
self._build(self._filename, build_save=True, build_restore=True)
File “\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py”, line 875, in _build
build_restore=build_restore)
File “\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py”, line 508, in _build_internal
restore_sequentially, reshape)
File “\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py”, line 328, in _AddRestoreOps
restore_sequentially)
File “\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py”, line 575, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File “\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_io_ops.py”, line 1779, in restore_v2
name=name)
File “\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py”, line 788, in _apply_op_helper
op_def=op_def)
File “\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\deprecation.py”, line 507, in new_func
return func(*args, **kwargs)
File “\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py”, line 3616, in create_op
op_def=op_def)
File “\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py”, line 2005, in __init__
self._traceback = tf_stack.extract_stack()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “C:\Users\zhuli\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py”, line 1296, in restore
names_to_keys = object_graph_key_mapping(save_path)
File “C:\Users\zhuli\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py”, line 1614, in object_graph_key_mapping
object_graph_string = reader.get_tensor(trackable.OBJECT_GRAPH_PROTO_KEY)
File “C:\Users\zhuli\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py”, line 678, in get_tensor
return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “C:\Users\zhuli\Documents\PortfolioOptConvert\a3c.py”, line 90, in <module>
r = Server().main()
File “C:\Users\zhuli\Documents\PortfolioOptConvert\a3c\server.py”, line 115, in main
self.saver.restore(sess, ckpt.all_model_checkpoint_paths[Config.WHICH_CHECKPOINT])
File “C:\Users\zhuli\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py”, line 1302, in restore
err, “a Variable name or other graph key that is missing”)
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key Agent_0/fully_connected/kernel not found in checkpoint
[[node save/RestoreV2 (defined at \Documents\PortfolioOptConvert\a3c\server.py:44) ]]

Original stack trace for ‘save/RestoreV2’:
File “\Documents\PortfolioOptConvert\a3c.py”, line 90, in <module>
r = Server().main()
File “\Documents\PortfolioOptConvert\a3c\server.py”, line 44, in __init__
self.saver = tf.compat.v1.train.Saver(max_to_keep=Config.MAX_SAVE_RESULT)
File “\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py”, line 825, in __init__
self.build()
File “\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py”, line 837, in build
self._build(self._filename, build_save=True, build_restore=True)
File “\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py”, line 875, in _build
build_restore=build_restore)
File “\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py”, line 508, in _build_internal
restore_sequentially, reshape)
File “\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py”, line 328, in _AddRestoreOps
restore_sequentially)
File “\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py”, line 575, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File “\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_io_ops.py”, line 1779, in restore_v2
name=name)
File “\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py”, line 788, in _apply_op_helper
op_def=op_def)
File “\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\deprecation.py”, line 507, in new_func
return func(*args, **kwargs)
File “\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py”, line 3616, in create_op
op_def=op_def)
File “\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py”, line 2005, in __init__
self._traceback = tf_stack.extract_stack()

I suppose you can simply rename the variable in the checkpoint.

Check if this script is still working correctly:
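
The linked script is not reproduced here. Independently of it, a minimal sketch of the renaming idea might look like the following. It reads every tensor from the old checkpoint, recreates it under a new name, and saves a fresh checkpoint. The helper names `rename_key` and `rename_checkpoint_variables`, the `RENAMES` table, and the paths in the usage note are all hypothetical. The “weights” to “kernel” entry comes from the question above; the “biases” to “bias” entry is an assumption about slim's naming, so drop it if your checkpoint does not contain such keys.

```python
# Hypothetical rename table: old substring -> new substring.
# The "biases" entry is an assumption; only "weights" appears in the question.
RENAMES = {
    "fully_connected/weights": "fully_connected/kernel",
    "fully_connected/biases": "fully_connected/bias",
}


def rename_key(name, renames=RENAMES):
    """Return the checkpoint key with every old substring replaced.

    Substring replacement (rather than suffix matching) also covers
    optimizer slot variables such as ".../weights/Adam".
    """
    for old, new in renames.items():
        name = name.replace(old, new)
    return name


def rename_checkpoint_variables(src_prefix, dst_prefix, renames=RENAMES):
    """Copy a name-based (TF1-style) checkpoint, renaming its variables."""
    # Imported here so rename_key can be tried out without TensorFlow.
    import tensorflow as tf

    reader = tf.train.load_checkpoint(src_prefix)
    old_names = sorted(reader.get_variable_to_shape_map())

    # Build a throwaway graph whose variables carry the new names.
    with tf.Graph().as_default():
        new_vars = [
            tf.compat.v1.Variable(reader.get_tensor(old), name=rename_key(old, renames))
            for old in old_names
        ]
        saver = tf.compat.v1.train.Saver(new_vars)
        with tf.compat.v1.Session() as sess:
            sess.run(tf.compat.v1.global_variables_initializer())
            saver.save(sess, dst_prefix, write_meta_graph=False)
```

It could then be called as `rename_checkpoint_variables("old_ckpt/model-1000", "new_ckpt/model-1000")` (placeholder paths). Writing to a separate directory keeps the original checkpoint untouched, and `tf.train.list_variables("new_ckpt/model-1000")` can be used afterwards to confirm that the keys now end in “kernel”.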

It works!! Much appreciated, thanks for your help.