Deciphering memory allocation warnings

glestrade · July 30, 2021, 1:34am

Hello! I believe this is my first time actually posting on these forums. I tend to follow the general advice to Google for answers first, but at this point I think my problem is specific enough to warrant its own post. I'll show the full warnings first, then my observations and hypothesis about what's going on, and then the details of my hardware and OS.

I'm getting the following output, which ends with two memory allocation warnings:

2021-07-28 15:45:36.838849: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8

2021-07-28 15:45:36.839143: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

2021-07-28 15:45:36.840230: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

2021-07-28 15:45:36.850568: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0

2021-07-28 15:45:36.852247: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: FMA

To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

2021-07-28 15:45:36.852921: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

2021-07-28 15:45:36.853943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:

pciBusID: 0000:01:00.0 name: GeForce GTX 1070 computeCapability: 6.1

coreClock: 1.683GHz coreCount: 15 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 238.66GiB/s

2021-07-28 15:45:36.854084: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-28 15:45:36.855763: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

2021-07-28 15:45:36.856682: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0

2021-07-28 15:45:36.899944: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0

2021-07-28 15:45:41.158493: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:

2021-07-28 15:45:41.158534: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0

2021-07-28 15:45:41.158545: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N

2021-07-28 15:45:41.158768: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

2021-07-28 15:45:41.159384: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

2021-07-28 15:45:41.159956: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

2021-07-28 15:45:41.160578: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6702 MB memory) → physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)

2021-07-28 15:45:41.205455: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 376320000 exceeds 10% of free system memory.

2021-07-28 15:45:41.475303: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 376320000 exceeds 10% of free system memory

Observations and Hypothesis

When the training loop first starts, I'm fairly sure it begins fine: it compiles and runs without problems. Since I have a mechanical hard drive, I can hear the head moving inside. I started training at around 4:30 PM yesterday and let it run whilst I was asleep. When I checked it just before going to sleep, the desktop GUI (I think it's GNOME) had crashed, but the hard drive was still making noise.

At 7 AM, when I woke up, I could no longer hear the hard drive. However, Jupyter Notebook still showed an asterisk next to the cell containing the training loop, which indicates that the cell was still running.

All of this makes me think that TF was overcommitting memory (which I believe is the default behavior on Linux?), and that, for some reason, the OOM killer chose to kill GNOME rather than TF. But since I no longer heard the hard drive, I don't think the training loop was still running, which would mean that TF crashed too (?).
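Whether the kernel overcommits, and whether the OOM killer actually fired, can be checked directly instead of inferred from disk noise. Below is a small standard-library sketch, assuming a Linux system: /proc/sys/vm/overcommit_memory holds the overcommit policy (0 = heuristic overcommit, the kernel default; 1 = always overcommit; 2 = strict accounting), and OOM-killer activity shows up in the kernel log, e.g. the output of dmesg or journalctl -k. The helper names are hypothetical, and the matched phrases are an assumption, since the exact wording varies between kernel versions.

```python
from pathlib import Path
from typing import List, Optional

# Values documented for /proc/sys/vm/overcommit_memory
OVERCOMMIT_POLICIES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}

def describe_overcommit(raw: str) -> str:
    """Map the contents of /proc/sys/vm/overcommit_memory to a readable policy."""
    mode = int(raw.strip())
    return OVERCOMMIT_POLICIES.get(mode, f"unknown mode {mode}")

def read_overcommit_policy(path: str = "/proc/sys/vm/overcommit_memory") -> Optional[str]:
    """Return the current policy, or None if the file is missing (not Linux, or no /proc)."""
    p = Path(path)
    if not p.exists():
        return None
    return describe_overcommit(p.read_text())

def oom_lines(kernel_log: str) -> List[str]:
    """Lines of a kernel-log dump that look like OOM-killer reports (assumed markers)."""
    markers = ("Out of memory", "oom-kill", "oom_reaper")
    return [line for line in kernel_log.splitlines() if any(m in line for m in markers)]

print(read_overcommit_policy())
```

To use oom_lines, capture the kernel log after a crash (for example with subprocess.run(["dmesg"], capture_output=True, text=True).stdout, which may require elevated permissions on some systems) and pass that text in. An empty result suggests the process was not killed by the OOM killer.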

The warning messages in the log above are what lead me to think that overcommitting could be a factor here. I'm thinking about getting more RAM; unfortunately, my current motherboard only supports DDR3.
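To put the warning in numbers: as far as I can tell, TensorFlow's CPU allocator prints it whenever a single allocation is larger than 10% of the RAM it sees as free at that moment, so on its own it is a heads-up rather than proof of overcommitting. The sketch below only does the arithmetic: 376,320,000 bytes is roughly 359 MiB, so the message appears once free RAM drops below about 3.5 GiB, which is easy to reach on an 8 GiB desktop. The 60000 x 28 x 28 shape is purely a guess (it happens to match the MNIST training images stored as float64); the post does not say what is being allocated, and the helper names are hypothetical.

```python
WARNING_PERCENT = 10       # the "10%" quoted in the log message
ALLOC_BYTES = 376_320_000  # allocation size reported in both warnings

def bytes_to_mib(n_bytes: int) -> float:
    """Convert bytes to mebibytes (2**20 bytes)."""
    return n_bytes / 2**20

def triggers_warning(alloc_bytes: int, free_bytes: int) -> bool:
    """Approximate condition: one allocation larger than 10% of currently free RAM."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return alloc_bytes * 100 > WARNING_PERCENT * free_bytes

# Free RAM below which an allocation of this size gets reported.
free_threshold_bytes = ALLOC_BYTES * 100 // WARNING_PERCENT

# Hypothetical: 60000 x 28 x 28 values at 8 bytes each (float64, NumPy's default
# float type) matches the reported size exactly; float32 would need half as much.
n_values = 60_000 * 28 * 28
FLOAT64_BYTES, FLOAT32_BYTES = 8, 4

print(f"allocation: {bytes_to_mib(ALLOC_BYTES):.1f} MiB")
print(f"reported when free RAM < {bytes_to_mib(free_threshold_bytes):.0f} MiB")
print(n_values * FLOAT64_BYTES == ALLOC_BYTES, n_values * FLOAT32_BYTES)
```

If the guess about the data is right, casting the input array to float32 before handing it to TensorFlow would halve this particular allocation; the warning itself does not say which tensor it belongs to.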

OS and specs
OS: Ubuntu 20.04.2 LTS
Kernel: 5.4.0-80-generic

CPU: AMD A10-6800K APU (4) @ 4.1GHz
GPU: GeForce GTX 1070
RAM: 8 GiB DDR3

Thank you for reading this far! Hope I can get some advice.

glestrade · July 31, 2021, 9:29pm

Update with more information: since posting the above, I've installed more RAM (now 16 GB). I've also run free -m and nvidia-smi, and I can see that TF is using a lot of memory, both on the GPU and in system RAM (free -m shows 14287 MiB of 16033 MiB in use, and nvidia-smi shows the python3 process holding 4541 MiB of GPU memory).

I'm not sure why it's using so much memory. I'm using the following code so that it at least doesn't allocate all of the GPU memory up front:

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
print(gpus)
if gpus:
    # Allocate GPU memory on demand; must run before the GPU is first used
    tf.config.experimental.set_memory_growth(gpus[0], True)

Do the NUMA messages in my original post have any bearing on TF's ability to use memory efficiently?
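On the NUMA question: those lines are logged at info level (the I prefix), not as errors, and the message itself says TensorFlow falls back to NUMA node 0. The value comes from sysfs; the sketch below, assuming Linux and the PCI bus ID 0000:01:00.0 shown in the log, reads what is presumably the same file. A value of -1 usually just means the platform reports no NUMA affinity for the device, which is expected on a single-socket desktop like this one. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional

SYSFS_PCI = Path("/sys/bus/pci/devices")

def effective_numa_node(raw: str) -> int:
    """Mirror the fallback described in the log: a negative value is treated as node 0."""
    node = int(raw.strip())
    return 0 if node < 0 else node

def gpu_numa_node(pci_bus_id: str = "0000:01:00.0") -> Optional[int]:
    """Read the device's NUMA node from sysfs; None if the file is missing (e.g. not Linux)."""
    path = SYSFS_PCI / pci_bus_id / "numa_node"
    if not path.exists():
        return None
    return effective_numa_node(path.read_text())

print(gpu_numa_node())
```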

erick@erickusb:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:          16033       14287         177          41        1568        1418
Swap:          7381         898        6483
erick@erickusb:~$ nvidia-smi
Sat Jul 31 14:13:48 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
| 36%   53C    P2    34W / 151W |   5056MiB /  8116MiB |      5%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1139      G   /usr/lib/xorg/Xorg                101MiB |
|    0   N/A  N/A     12157      G   /usr/lib/xorg/Xorg                187MiB |
|    0   N/A  N/A     12274      G   /usr/bin/gnome-shell               74MiB |
|    0   N/A  N/A     13778      G   /usr/lib/firefox/firefox          113MiB |
|    0   N/A  N/A     17034      C   /usr/bin/python3                 4541MiB |
|    0   N/A  N/A     18032      G   …mviewer/tv_bin/TeamViewer         16MiB |
+-----------------------------------------------------------------------------+

A couple of additional details: the Jupyter kernel running TF is now crashing instead of the GUI. Also, what I'm trying to implement is more or less Algorithm 2 from Wang's paper: https://arxiv.org/pdf/1809.03428.pdf

I've also experimented with the batch size, and changing it doesn't seem to help much.
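When changing the batch size doesn't help, one thing worth checking is whether memory grows with the number of training steps rather than with the batch size, since that points at something accumulating across iterations. Below is a minimal Linux-only sketch, assuming /proc/self/status is available (as on the Ubuntu machine above). It reads the process's resident memory (VmRSS), which includes TensorFlow's native allocations, not just Python objects. The helper names and the 50 MiB tolerance are hypothetical choices.

```python
from pathlib import Path
from typing import List, Optional

def parse_vmrss_kib(status_text: str) -> Optional[int]:
    """Extract VmRSS (resident memory, in KiB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # line looks like "VmRSS:\t  204800 kB"
    return None

def current_rss_mib() -> Optional[float]:
    """Resident memory of this process in MiB, or None where /proc is unavailable."""
    status = Path("/proc/self/status")
    if not status.exists():
        return None
    kib = parse_vmrss_kib(status.read_text())
    return None if kib is None else kib / 1024

def is_growing(samples_mib: List[float], tolerance_mib: float = 50.0) -> bool:
    """True if memory rose by more than tolerance_mib between the first and last sample."""
    return len(samples_mib) >= 2 and samples_mib[-1] - samples_mib[0] > tolerance_mib

# Usage inside a (hypothetical) training loop:
# history = []
# for step in range(num_steps):
#     train_step(...)
#     if step % 100 == 0:
#         history.append(current_rss_mib())
#         print(step, history[-1], is_growing(history))
```

If the readings climb steadily at every batch size, the growth is per step rather than per batch, which is the pattern a leak across iterations would produce.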

glestrade · September 19, 2021, 10:02pm

Hey folks, I figured out what was going on back in August: I wasn't clearing the Laplace noise samples that I created as part of the algorithm, so they kept accumulating in memory. With the same code, but storing only one hundred noise samples at a time, the error no longer occurs. I'll link to my next question post (if and when it is approved) so you can follow along with the "story" if you are so inclined.
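The post doesn't include the actual code, so this is only a hypothetical sketch of the kind of fix described: keep a bounded number of noise samples instead of an ever-growing list. Scalar draws stand in for the real noise tensors; MAX_STORED, laplace_sample and run_steps are made-up names. The Laplace draw uses the fact that the difference of two independent exponential variables with mean b follows Laplace(0, b).

```python
import random
from collections import deque

MAX_STORED = 100  # keep only the most recent hundred samples, as in the post

def laplace_sample(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials with mean scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def run_steps(num_steps: int, scale: float = 1.0, bounded: bool = True, seed: int = 0):
    """Hypothetical training loop that only records the noise it generates."""
    rng = random.Random(seed)
    # deque(maxlen=...) drops the oldest entry once it is full;
    # a plain list keeps every sample and grows with the number of steps.
    noise_store = deque(maxlen=MAX_STORED) if bounded else []
    for _ in range(num_steps):
        noise = laplace_sample(scale, rng)
        noise_store.append(noise)
        # ... the real update step would consume `noise` here ...
    return noise_store

print(len(run_steps(10_000, bounded=False)))  # grows with num_steps
print(len(run_steps(10_000, bounded=True)))   # capped at MAX_STORED
```

With real noise tensors, each stored entry can be as large as the model's parameters, so the unbounded version's memory use scales with the number of steps, while the bounded version stays flat after the first hundred.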

Hope someone can help with my next question. I'm really giving myself a crash course in this stuff.

https://tensorflow-prod.ospodiscourse.com/t/custom-loss-function-shapes-of-all-inputs-must-match/4476