When I preprocess the data and build the dataset as tensors, the data is uploaded to GPU memory as soon as it is handled in TF form.
Since the data on the GPU is what gets used, why do input bottlenecks still occur, and why is GPU utilization so low?
In addition, training the model with all of the data already on the GPU (it is placed there automatically when converted to TF format) results in Out of Memory.
In that case, I would expect the GPU to simply use the data already allocated to it, without the CPU having to prepare and feed it. I also don't understand why Out of Memory occurs when I am only computing on memory that is already on the GPU. Does anyone know what is going on?
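To make the question concrete, here is a minimal sketch of the two patterns I am comparing (the array, shapes, and sizes below are placeholders, not my actual data): converting the whole dataset into one tensor up front, versus streaming it batch by batch with tf.data.

```python
import numpy as np
import tensorflow as tf

# Placeholder data, not my real dataset: 200k feature vectors (~400 MB as float32).
features = np.random.rand(200_000, 512).astype("float32")

# Pattern 1: convert everything to a single tf.Tensor up front.
# In eager mode with a GPU visible, this tensor is placed on the GPU,
# so a large enough array can exhaust the 8 GiB of device memory by itself.
all_at_once = tf.constant(features)
print(all_at_once.device)

# Pattern 2: keep the data on the host and stream it with tf.data.
# The pipeline runs on the CPU; each batch is copied to the GPU only when
# the training step consumes it, and prefetch() overlaps that host-side
# preparation with the GPU compute of the previous step.
dataset = (
    tf.data.Dataset.from_tensor_slices(features)
    .shuffle(10_000)
    .batch(64)
    .prefetch(tf.data.AUTOTUNE)
)
print(dataset.element_spec)
```

My expectation was that pattern 1 should be fastest because nothing needs to be fed from the CPU, but that is exactly where I see the OOM and the low utilization.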
Hi @wonjun_choi
Welcome to the TensorFlow Forum!
Could you please share reproducible code to replicate the error, along with the GPU capacity details of your system and the shape of the dataset you are using, so we can understand the issue? Thank you.
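For example, something like the snippet below can be included in your reply to report the TensorFlow version, the visible GPUs, and the element shapes of your dataset (train_dataset is just a placeholder name):

```python
import tensorflow as tf

print("TensorFlow:", tf.__version__)
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))

# If the data is a tf.data.Dataset (train_dataset is a placeholder name),
# element_spec reports the dtype and shape of each element:
# print(train_dataset.element_spec)
```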
This is my GPU capacity:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02    Driver Version: 470.223.02    CUDA Version: 11.4   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   47C    P8    31W / 250W |      5MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   53C    P8    20W / 250W |      5MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    586664      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A    586664      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
https://github.com/wonjunchoi-arc/transformer_xl/blob/main/basic.ipynb
Run basic.ipynb.
If you set the data path to '/home/jun/transform/workspace company_xl/data/wiki_short/train.txt', it will run. If you hit an error, please run pip install transformers first.
Please provide some more details, such as the TensorFlow version, Python version, and operating system you are using. Also verify that you have set up the GPU correctly by checking the hardware/software requirements in the official TF install guide, and that you have installed versions of CUDA and cuDNN compatible with your installed TensorFlow version, as listed in the tested build configurations. Thank you.
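As a quick check (a minimal sketch, assuming TensorFlow 2.x), the following prints the CUDA and cuDNN versions the installed TensorFlow binary was built against, which you can compare with the tested build configurations:

```python
import tensorflow as tf

print("TensorFlow:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())

# CUDA / cuDNN versions this TensorFlow binary was built against:
build = tf.sysconfig.get_build_info()
print("CUDA:", build.get("cuda_version"), "cuDNN:", build.get("cudnn_version"))

print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
```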