Hi,
Yes, the features have the same number of columns in both datasets. When I execute:
train = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(4*128).batch(128)
validate = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(128)
I get the following error:
2022-12-01 12:44:29.498244: W tensorflow/tsl/framework/bfc_allocator.cc:479] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.43GiB (rounded to 1536092928)requested by op _EagerConst
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
2022-12-01 12:44:29.498311: I tensorflow/tsl/framework/bfc_allocator.cc:1034] BFCAllocator dump for GPU_0_bfc
2022-12-01 12:44:29.498328: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (256): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-01 12:44:29.498339: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (512): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-01 12:44:29.498350: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (1024): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-01 12:44:29.498359: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (2048): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-01 12:44:29.498369: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (4096): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-01 12:44:29.498379: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (8192): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-01 12:44:29.498388: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (16384): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-01 12:44:29.498398: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (32768): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-01 12:44:29.498407: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (65536): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-01 12:44:29.498417: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (131072): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-01 12:44:29.498426: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (262144): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-01 12:44:29.498435: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (524288): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-01 12:44:29.498445: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (1048576): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-01 12:44:29.498454: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (2097152): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-01 12:44:29.498464: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (4194304): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-01 12:44:29.498473: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (8388608): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-01 12:44:29.498482: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (16777216): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-01 12:44:29.498492: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-01 12:44:29.498501: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-01 12:44:29.498511: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (134217728): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-01 12:44:29.498520: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (268435456): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-01 12:44:29.498533: I tensorflow/tsl/framework/bfc_allocator.cc:1057] Bin for 1.43GiB was 256.00MiB, Chunk State:
2022-12-01 12:44:29.498541: I tensorflow/tsl/framework/bfc_allocator.cc:1095] Summary of in-use Chunks by size:
2022-12-01 12:44:29.498549: I tensorflow/tsl/framework/bfc_allocator.cc:1102] Sum Total of in-use chunks: 0B
2022-12-01 12:44:29.498557: I tensorflow/tsl/framework/bfc_allocator.cc:1104] total_region_allocated_bytes_: 0 memory_limit_: 924254208 available bytes: 924254208 curr_region_allocation_bytes_: 924254208
2022-12-01 12:44:29.498580: I tensorflow/tsl/framework/bfc_allocator.cc:1110] Stats:
Limit: 924254208
InUse: 0
MaxInUse: 0
NumAllocs: 0
MaxAllocSize: 0
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
2022-12-01 12:44:29.498599: W tensorflow/tsl/framework/bfc_allocator.cc:492] <allocator contains no memory>
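As a sanity check on the numbers in the log, here is a small sketch that compares the requested allocation with the allocator limit. The two byte counts are copied from the log above; the variable names are only illustrative.

```python
# Byte counts copied from the allocator log above.
requested_bytes = 1536092928  # "rounded to" value in the out-of-memory warning
limit_bytes = 924254208       # memory_limit_ / Limit in the allocator stats

GIB = 1024 ** 3
MIB = 1024 ** 2

requested_gib = requested_bytes / GIB
limit_mib = limit_bytes / MIB
fits = requested_bytes <= limit_bytes

print(f"requested: {requested_gib:.2f} GiB")
print(f"limit:     {limit_mib:.0f} MiB")
print(f"fits in GPU memory limit: {fits}")
```

So the single request is larger than everything the allocator is allowed to use, even though nothing else is allocated yet (InUse: 0).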
So it still seems that TensorFlow tries to copy the complete tensor to the GPU (the failing op is _EagerConst) instead of one batch at a time.
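One thing I could try is building the dataset under a CPU device scope, so that the source arrays stay in host memory and only the batches are moved later. This is a minimal sketch, assuming the _EagerConst op comes from converting the whole NumPy array to a constant on the default (GPU) device. The small random arrays are hypothetical stand-ins for my real x_train and y_train.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-ins for x_train / y_train; the real arrays are much larger.
x_train = np.random.rand(1024, 16).astype(np.float32)
y_train = np.random.randint(0, 2, size=(1024,)).astype(np.int32)

# Create the dataset under a CPU device scope so the full arrays are
# converted to tensors in host memory, not as one large GPU constant.
with tf.device("/CPU:0"):
    train = (
        tf.data.Dataset.from_tensor_slices((x_train, y_train))
        .shuffle(4 * 128)
        .batch(128)
    )

# Pull one batch to confirm the pipeline yields batches of 128 rows.
xb, yb = next(iter(train))
print(xb.shape, yb.shape)
```

The same scope would be used for the validation dataset. I have not confirmed that this resolves the error on my machine.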
Any suggestions?
GW