Unable to Load Large 18GB Dataset (Numpy Array) with 11GB GPU

Hello,

I have my features and labels saved in a .npz file (NumPy arrays) that's about 18GB. My GPU has 11GB of memory, although the machine has 120GB of system (CPU) RAM.

I’m trying to load the data as a tf.data.Dataset but not having any luck.

I’m using TensorFlow 2.14 and the below code:

import tensorflow as tf
import numpy as np

# load from npz file
data = np.load('231017_encoded_seqs_labels.npz')
encoded_sequences = data['encoded_sequences']
input_labels = data['input_labels']

# Create a tf.data.Dataset from the arrays.
dataset = tf.data.Dataset.from_tensor_slices((encoded_sequences, input_labels))

Everything runs fine except the last line, which fails with the error below. Kindly assist.


2023-10-19 11:27:34.562484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10778 MB memory:  -> device: 0, name: Tesla K40m, pci bus id: 0000:05:00.0, compute capability: 3.5
2023-10-19 11:27:34.563653: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 10778 MB memory:  -> device: 1, name: Tesla K40m, pci bus id: 0000:81:00.0, compute capability: 3.5
2023-10-19 11:27:34.566228: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 19403961120 exceeds 10% of free system memory.
2023-10-19 11:27:59.419189: W tensorflow/tsl/framework/bfc_allocator.cc:485] Allocator (GPU_0_bfc) ran out of memory trying to allocate 18.07GiB (rounded to 19403961344)requested by op _EagerConst
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation. 
Current allocation summary follows.
Current allocation summary follows.
2023-10-19 11:27:59.419249: I tensorflow/tsl/framework/bfc_allocator.cc:1039] BFCAllocator dump for GPU_0_bfc
2023-10-19 11:27:59.419273: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (256): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419289: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (512): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419303: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (1024): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419317: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (2048): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419331: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (4096): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419344: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (8192): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419358: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (16384): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419371: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (32768): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419385: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (65536): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419398: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (131072): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419411: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (262144): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419425: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (524288): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419439: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (1048576): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419453: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (2097152): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419466: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (4194304): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419480: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (8388608): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419495: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (16777216): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419509: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (33554432): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419522: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (67108864): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419535: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (134217728): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419549: I tensorflow/tsl/framework/bfc_allocator.cc:1046] Bin (268435456): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-10-19 11:27:59.419566: I tensorflow/tsl/framework/bfc_allocator.cc:1062] Bin for 18.07GiB was 256.00MiB, Chunk State: 
2023-10-19 11:27:59.419578: I tensorflow/tsl/framework/bfc_allocator.cc:1100]      Summary of in-use Chunks by size: 
2023-10-19 11:27:59.419589: I tensorflow/tsl/framework/bfc_allocator.cc:1107] Sum Total of in-use chunks: 0B
2023-10-19 11:27:59.419604: I tensorflow/tsl/framework/bfc_allocator.cc:1109] Total bytes in pool: 0 memory_limit_: 11301945344 available bytes: 11301945344 curr_region_allocation_bytes_: 11301945344
2023-10-19 11:27:59.419622: I tensorflow/tsl/framework/bfc_allocator.cc:1114] Stats: 
Limit:                     11301945344
InUse:                               0
MaxInUse:                            0
NumAllocs:                           0
MaxAllocSize:                        0
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0

2023-10-19 11:27:59.419639: W tensorflow/tsl/framework/bfc_allocator.cc:497] <allocator contains no memory>
---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
Cell In[6], line 2
      1 # Create tf.data.Dataset from your data.
----> 2 dataset = tf.data.Dataset.from_tensor_slices((encoded_sequences, input_labels))

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/data/ops/dataset_ops.py:821, in DatasetV2.from_tensor_slices(tensors, name)
    817 # Loaded lazily due to a circular dependency (dataset_ops ->
    818 # from_tensor_slices_op -> dataset_ops).
    819 # pylint: disable=g-import-not-at-top,protected-access
    820 from tensorflow.python.data.ops import from_tensor_slices_op
--> 821 return from_tensor_slices_op._from_tensor_slices(tensors, name)

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/data/ops/from_tensor_slices_op.py:25, in _from_tensor_slices(tensors, name)
     24 def _from_tensor_slices(tensors, name=None):
---> 25   return _TensorSliceDataset(tensors, name=name)

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/data/ops/from_tensor_slices_op.py:33, in _TensorSliceDataset.__init__(self, element, is_files, name)
     31 def __init__(self, element, is_files=False, name=None):
     32   """See `Dataset.from_tensor_slices` for details."""
---> 33   element = structure.normalize_element(element)
     34   batched_spec = structure.type_spec_from_value(element)
     35   self._tensors = structure.to_batched_tensor_list(batched_spec, element)

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/data/util/structure.py:134, in normalize_element(element, element_signature)
    131       else:
    132         dtype = getattr(spec, "dtype", None)
    133         normalized_components.append(
--> 134             ops.convert_to_tensor(t, name="component_%d" % i, dtype=dtype))
    135 return nest.pack_sequence_as(pack_as, normalized_components)

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/profiler/trace.py:183, in trace_wrapper.<locals>.inner_wrapper.<locals>.wrapped(*args, **kwargs)
    181   with Trace(trace_name, **trace_kwargs):
    182     return func(*args, **kwargs)
--> 183 return func(*args, **kwargs)

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/framework/ops.py:698, in convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, dtype_hint, ctx, accepted_result_types)
    696 # TODO(b/142518781): Fix all call-sites and remove redundant arg
    697 preferred_dtype = preferred_dtype or dtype_hint
--> 698 return tensor_conversion_registry.convert(
    699     value, dtype, name, as_ref, preferred_dtype, accepted_result_types
    700 )

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/framework/tensor_conversion_registry.py:234, in convert(value, dtype, name, as_ref, preferred_dtype, accepted_result_types)
    225       raise RuntimeError(
    226           _add_error_prefix(
    227               f"Conversion function {conversion_func!r} for type "
   (...)
    230               f"actual = {ret.dtype.base_dtype.name}",
    231               name=name))
    233 if ret is None:
--> 234   ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
    236 if ret is NotImplemented:
    237   continue

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py:328, in _constant_tensor_conversion_function(v, dtype, name, as_ref)
    325 def _constant_tensor_conversion_function(v, dtype=None, name=None,
    326                                          as_ref=False):
    327   _ = as_ref
--> 328   return constant(v, dtype=dtype, name=name)

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py:267, in constant(value, dtype, shape, name)
    170 @tf_export("constant", v1=[])
    171 def constant(value, dtype=None, shape=None, name="Const"):
    172   """Creates a constant tensor from a tensor-like object.
    173 
    174   Note: All eager `tf.Tensor` values are immutable (in contrast to
   (...)
    265     ValueError: if called on a symbolic tensor.
    266   """
--> 267   return _constant_impl(value, dtype, shape, name, verify_shape=False,
    268                         allow_broadcast=True)

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py:279, in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
    277     with trace.Trace("tf.constant"):
    278       return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
--> 279   return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
    281 const_tensor = ops._create_graph_constant(  # pylint: disable=protected-access
    282     value, dtype, shape, name, verify_shape, allow_broadcast
    283 )
    284 return const_tensor

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py:289, in _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
    287 def _constant_eager_impl(ctx, value, dtype, shape, verify_shape):
    288   """Creates a constant on the current device."""
--> 289   t = convert_to_eager_tensor(value, ctx, dtype)
    290   if shape is None:
    291     return t

File ~/.conda/envs/tfgpupip/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py:102, in convert_to_eager_tensor(value, ctx, dtype)
    100     dtype = dtypes.as_dtype(dtype).as_datatype_enum
    101 ctx.ensure_initialized()
--> 102 return ops.EagerTensor(value, ctx.device_name, dtype)

InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.

Hey @Felix_M
Can you refactor your code a little to use tf.data.Dataset? This should help with memory-related issues.
If you are not yet familiar with it, you can follow the "Build TensorFlow input pipelines" tutorial to get started.
Thank you.


That’s what my code is trying to do.

As I mentioned above, it’s the tf.data.Dataset line of code that gives the error.
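The log suggests the failure is about device placement rather than tf.data as such: in eager mode, `from_tensor_slices` first converts each NumPy array to a constant tensor (the `_EagerConst` op in the log), and that constant is created on the default device, GPU:0, whose allocator reports a limit of 11301945344 bytes (about 10.5 GiB) against an 18.07 GiB request. A minimal sketch of a common workaround, assuming the arrays fit in host RAM (120GB here, though the conversion may briefly hold extra copies), is to build the dataset inside a `tf.device("/CPU:0")` scope so the source tensors stay in host memory and only batches are moved to the GPU during training. The helper name `make_dataset_on_cpu` and the small stand-in arrays are hypothetical.

```python
import numpy as np
import tensorflow as tf


def make_dataset_on_cpu(features: np.ndarray, labels: np.ndarray) -> tf.data.Dataset:
    """Hypothetical helper: build the dataset with its source tensors in host memory.

    Without the device scope, eager TensorFlow places the constant created
    from each NumPy array on the default device (GPU:0 when a GPU is visible).
    """
    with tf.device("/CPU:0"):
        return tf.data.Dataset.from_tensor_slices((features, labels))


# Small stand-ins for encoded_sequences / input_labels from the question.
features = np.zeros((8, 4), dtype=np.float32)
labels = np.arange(8, dtype=np.int64)
dataset = make_dataset_on_cpu(features, labels).batch(4)
```

With the real data, the same call would take `encoded_sequences` and `input_labels` directly; batching and `prefetch` can be chained as usual.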

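The TF docs mention a 2 GB limit when feeding NumPy arrays: `from_tensor_slices` embeds the arrays as `tf.constant()` operations, which can run into the 2 GB limit of the `tf.GraphDef` protocol buffer.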
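One way to keep the arrays out of the graph entirely is `tf.data.Dataset.from_generator`, which pulls examples from a Python generator instead of embedding the data as constants. A minimal sketch, assuming the arrays are already loaded with `np.load` as in the question; the helper name `make_generator_dataset` is hypothetical.

```python
import numpy as np
import tensorflow as tf


def make_generator_dataset(features: np.ndarray, labels: np.ndarray) -> tf.data.Dataset:
    """Hypothetical helper: stream (feature, label) pairs from NumPy arrays
    without converting the whole arrays into tf.constant tensors."""

    def gen():
        # Yields one example at a time; the full arrays stay in NumPy memory.
        for i in range(len(features)):
            yield features[i], labels[i]

    # Per-example shapes and dtypes, derived from the arrays themselves.
    output_signature = (
        tf.TensorSpec(shape=features.shape[1:], dtype=tf.as_dtype(features.dtype)),
        tf.TensorSpec(shape=labels.shape[1:], dtype=tf.as_dtype(labels.dtype)),
    )
    return tf.data.Dataset.from_generator(gen, output_signature=output_signature)
```

Usage would look like `make_generator_dataset(encoded_sequences, input_labels).batch(32).prefetch(tf.data.AUTOTUNE)`. A per-example Python generator can be slow because it runs under the GIL; yielding whole batches from the generator is one way to reduce that overhead.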

Have you tried reading the file directly into a TF Dataset? There's a library called TensorFlow I/O (tfio) that's pretty nice; see the tfio.experimental.IODataset documentation.
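The exact tfio calls for .npz input aren't shown in the thread, so here is a sketch of a different technique (plain NumPy, not tfio) with the same goal of reading from disk instead of holding everything in memory. Because a .npz is a zip archive, the sketch assumes a one-off conversion of each array to its own .npy file, which `np.load(..., mmap_mode="r")` can memory-map; a generator then reads one contiguous batch at a time. The helpers `npz_to_npy` and `mmap_batches` are hypothetical, and the conversion step still loads each array into RAM once.

```python
import numpy as np
import tensorflow as tf


def npz_to_npy(npz_path: str, prefix: str) -> dict:
    """Hypothetical one-off conversion: write each array in the .npz to its
    own .npy file and return a mapping from array name to file path."""
    paths = {}
    with np.load(npz_path) as data:
        for name in data.files:
            path = f"{prefix}_{name}.npy"
            np.save(path, data[name])  # loads this array into RAM once
            paths[name] = path
    return paths


def mmap_batches(features_path: str, labels_path: str, batch_size: int) -> tf.data.Dataset:
    """Hypothetical helper: a dataset that reads contiguous batches from
    memory-mapped .npy files."""
    features = np.load(features_path, mmap_mode="r")
    labels = np.load(labels_path, mmap_mode="r")

    def gen():
        for start in range(0, len(features), batch_size):
            stop = start + batch_size
            # np.array copies just this slice from the memory map into RAM.
            yield np.array(features[start:stop]), np.array(labels[start:stop])

    # Leading None: the last batch may be smaller than batch_size.
    output_signature = (
        tf.TensorSpec(shape=(None,) + features.shape[1:], dtype=tf.as_dtype(features.dtype)),
        tf.TensorSpec(shape=(None,) + labels.shape[1:], dtype=tf.as_dtype(labels.dtype)),
    )
    return tf.data.Dataset.from_generator(gen, output_signature=output_signature)
```

With the file from the question, the calls would be along the lines of `paths = npz_to_npy('231017_encoded_seqs_labels.npz', 'seqs')` followed by `mmap_batches(paths['encoded_sequences'], paths['input_labels'], batch_size=32)`. Reading contiguous slices keeps disk access sequential; shuffling would need something like shuffling batch start offsets or a `shuffle` buffer after unbatching.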

Looks interesting.

I’ll have a look and see if that’s the way to go.

Thanks.

Cool - let me know if it works.