I need help with this: what does it mean?

Lawrence1 · May 24, 2023, 2:54am

3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 2045, in __init__
self._traceback = tf_stack.extract_stack_for_node(self._c_op)

chunduriv · May 24, 2023, 6:49am

@Lawrence1,

Welcome to the TensorFlow Forum!

We need more information in order to help.

Could you please provide the standalone code and complete stack trace to investigate the issue?

Thank You!

Lawrence1 · May 26, 2023, 1:38am

Thank you. See below:
Starting. Press “Enter” to stop training and save model.
Error: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[3,3,2048,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node src_dst_opt_1/Select_26 (defined at C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\ops\__init__.py:212) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn’t available when running in Eager mode.

     [[concat_7/concat/_947]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn’t available when running in Eager mode.

(1) Resource exhausted: OOM when allocating tensor with shape[3,3,2048,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node src_dst_opt_1/Select_26 (defined at C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\ops\__init__.py:212) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn’t available when running in Eager mode.

0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node src_dst_opt_1/Select_26:
src_dst_opt_1/Less_26 (defined at C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\ops\__init__.py:211)

Input Source operations connected to node src_dst_opt_1/Select_26:
src_dst_opt_1/Less_26 (defined at C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\ops\__init__.py:211)

Original stack trace for ‘src_dst_opt_1/Select_26’:
File “threading.py”, line 884, in _bootstrap
File “threading.py”, line 916, in _bootstrap_inner
File “threading.py”, line 864, in run
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\mainscripts\Trainer.py”, line 58, in trainerThread
debug=debug)
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\ModelBase.py”, line 193, in __init__
self.on_initialize()
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\Model_SAEHD\Model.py”, line 341, in on_initialize
self.src_dst_opt.initialize_variables (self.src_dst_saveable_weights, vars_on_cpu=optimizer_vars_on_cpu, lr_dropout_on_cpu=self.options[‘lr_dropout’]==‘cpu’)
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py”, line 45, in initialize_variables
lr_rnds = [ nn.random_binomial( v.shape, p=self.lr_dropout, dtype=v.dtype) for v in trainable_weights ]
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py”, line 45, in <listcomp>
lr_rnds = [ nn.random_binomial( v.shape, p=self.lr_dropout, dtype=v.dtype) for v in trainable_weights ]
File "C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\ops\__init__.py", line 212, in random_binomial
array_ops.ones(shape, dtype=dtype), array_ops.zeros(shape, dtype=dtype))
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\dispatch.py”, line 206, in wrapper
return target(*args, **kwargs)
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\array_ops.py”, line 4589, in where
return gen_math_ops.select(condition=condition, x=x, y=y, name=name)
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\gen_math_ops.py”, line 8853, in select
“Select”, condition=condition, t=x, e=y, name=name)
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\op_def_library.py”, line 750, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py”, line 3569, in _create_op_internal
op_def=op_def)
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py”, line 2045, in __init__
self._traceback = tf_stack.extract_stack_for_node(self._c_op)

Traceback (most recent call last):
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py”, line 1375, in _do_call
return fn(*args)
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py”, line 1360, in _run_fn
target_list, run_metadata)
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py”, line 1453, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[3,3,2048,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node src_dst_opt_1/Select_26}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn’t available when running in Eager mode.

     [[concat_7/concat/_947]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn’t available when running in Eager mode.

(1) Resource exhausted: OOM when allocating tensor with shape[3,3,2048,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node src_dst_opt_1/Select_26}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn’t available when running in Eager mode.

0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\mainscripts\Trainer.py”, line 129, in trainerThread
iter, iter_time = model.train_one_iter()
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\ModelBase.py”, line 474, in train_one_iter
losses = self.onTrainOneIter()
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\Model_SAEHD\Model.py”, line 774, in onTrainOneIter
src_loss, dst_loss = self.src_dst_train (warped_src, target_src, target_srcm, target_srcm_em, warped_dst, target_dst, target_dstm, target_dstm_em)
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\Model_SAEHD\Model.py”, line 584, in src_dst_train
self.target_dstm_em:target_dstm_em,
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py”, line 968, in run
run_metadata_ptr)
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py”, line 1191, in _run
feed_dict_tensor, options, run_metadata)
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py”, line 1369, in _do_run
run_metadata)
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py”, line 1394, in _do_call
raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[3,3,2048,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node src_dst_opt_1/Select_26 (defined at C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\ops\__init__.py:212) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn’t available when running in Eager mode.

     [[concat_7/concat/_947]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn’t available when running in Eager mode.

(1) Resource exhausted: OOM when allocating tensor with shape[3,3,2048,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node src_dst_opt_1/Select_26 (defined at C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\ops\__init__.py:212) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn’t available when running in Eager mode.

0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node src_dst_opt_1/Select_26:
src_dst_opt_1/Less_26 (defined at C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\ops\__init__.py:211)

Input Source operations connected to node src_dst_opt_1/Select_26:
src_dst_opt_1/Less_26 (defined at C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\ops\__init__.py:211)

Original stack trace for ‘src_dst_opt_1/Select_26’:
File “threading.py”, line 884, in _bootstrap
File “threading.py”, line 916, in _bootstrap_inner
File “threading.py”, line 864, in run
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\mainscripts\Trainer.py”, line 58, in trainerThread
debug=debug)
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\ModelBase.py”, line 193, in __init__
self.on_initialize()
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\Model_SAEHD\Model.py”, line 341, in on_initialize
self.src_dst_opt.initialize_variables (self.src_dst_saveable_weights, vars_on_cpu=optimizer_vars_on_cpu, lr_dropout_on_cpu=self.options[‘lr_dropout’]==‘cpu’)
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py”, line 45, in initialize_variables
lr_rnds = [ nn.random_binomial( v.shape, p=self.lr_dropout, dtype=v.dtype) for v in trainable_weights ]
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py”, line 45, in <listcomp>
lr_rnds = [ nn.random_binomial( v.shape, p=self.lr_dropout, dtype=v.dtype) for v in trainable_weights ]
File "C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\ops\__init__.py", line 212, in random_binomial
array_ops.ones(shape, dtype=dtype), array_ops.zeros(shape, dtype=dtype))
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\dispatch.py”, line 206, in wrapper
return target(*args, **kwargs)
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\array_ops.py”, line 4589, in where
return gen_math_ops.select(condition=condition, x=x, y=y, name=name)
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\gen_math_ops.py”, line 8853, in select
“Select”, condition=condition, t=x, e=y, name=name)
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\op_def_library.py”, line 750, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py”, line 3569, in _create_op_internal
op_def=op_def)
File “C:\Users\newle\Downloads\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py”, line 2045, in __init__
self._traceback = tf_stack.extract_stack_for_node(self._c_op)

chunduriv · May 26, 2023, 9:35am

@Lawrence1,

(0) Resource exhausted: OOM when allocating tensor with shape[3,3,2048,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

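This indicates that the model has run out of GPU memory: TensorFlow's GPU allocator (GPU_0_bfc) could not find room for another float tensor of shape [3,3,2048,2048] on GPU:0.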
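To put the failing allocation in perspective, the size of the tensor named in the error can be worked out from its shape. This is a minimal sketch, assuming "type float" in the message means 32-bit floats (4 bytes per element); the helper name tensor_bytes is made up for illustration.

```python
from math import prod  # Python 3.8+


def tensor_bytes(shape, bytes_per_element=4):
    """Return the memory footprint in bytes of a dense tensor.

    bytes_per_element=4 assumes float32, which is an assumption
    about what "type float" means in the error message.
    """
    return prod(shape) * bytes_per_element


size = tensor_bytes([3, 3, 2048, 2048])
print(f"{size} bytes = {size / 2**20:.0f} MiB")  # 150994944 bytes = 144 MiB
```

A single request of about 144 MiB is not large by itself, so the GPU was most likely nearly full already when it was made. The stack trace places the failing op in the optimizer's initialize_variables, where random_binomial builds tensors with the same shape as each trainable weight.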

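You can try the following steps:

1. Reduce the batch size. Training will be somewhat slower, but each step needs less GPU memory.
2. Control how TensorFlow allocates GPU memory. This can be done in two ways, and either one has to be applied before TensorFlow initializes the GPU:
   - Turn on memory growth by calling tf.config.experimental.set_memory_growth(gpu, True). TensorFlow then starts with a small allocation and grows it as the process demands more memory, instead of reserving most of the GPU up front.
   - Set a hard limit on the total memory by calling tf.config.set_logical_device_configuration(gpu, [tf.config.LogicalDeviceConfiguration(memory_limit=1024)]), where memory_limit is given in MB.
   These settings change how memory is reserved. They do not add capacity, so a model that needs more memory than the GPU has will still fail.
3. Use a GPU with more memory.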
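The two memory settings from step 2 can be sketched as follows. This is only an illustration under stated assumptions: it assumes a TensorFlow version that provides tf.config.set_logical_device_configuration (roughly 2.4 or later), the function name configure_gpu_memory is hypothetical, and where such a call would belong inside DeepFaceLab is not something this thread shows.

```python
try:
    import tensorflow as tf
except ImportError:  # keeps the sketch importable on machines without TensorFlow
    tf = None


def configure_gpu_memory(memory_limit_mb=None):
    """Apply one of the two memory settings to every visible GPU.

    memory_limit_mb=None -> enable memory growth
    memory_limit_mb=N    -> cap each GPU at N MB
    Returns the names of the GPUs that were configured.
    (Hypothetical helper, not part of TensorFlow or DeepFaceLab.)
    """
    if tf is None:
        return []
    configured = []
    for gpu in tf.config.list_physical_devices("GPU"):
        try:
            if memory_limit_mb is None:
                # Grow the allocation on demand instead of reserving it up front.
                tf.config.experimental.set_memory_growth(gpu, True)
            else:
                # Expose a single logical GPU capped at memory_limit_mb.
                tf.config.set_logical_device_configuration(
                    gpu,
                    [tf.config.LogicalDeviceConfiguration(memory_limit=memory_limit_mb)],
                )
            configured.append(gpu.name)
        except RuntimeError as err:
            # Raised when the GPU has already been initialized by TensorFlow.
            print(f"Could not configure {gpu.name}: {err}")
    return configured


# Must run before any op or session touches the GPU.
print(configure_gpu_memory())
```

The sketch applies one setting or the other per GPU rather than both. Passing memory_limit_mb=1024 would reproduce the 1024 MB cap from step 2.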

Thank you!