GPU utilization is not 100%

Hi community, I am running inference with Faster R-CNN (ResNet) on a 24 GB Titan RTX GPU. GPU memory utilisation is 100%, but GPU utilisation is only around 50%, and I am getting around 10 FPS, while the benchmark for the same model on RTX is 115 FPS. Can anyone guide me on this? Thanks.

Have you tried to profile your run with this?

Thank you for the response!
I tried profiling and got the error CUPTI_ERROR_INVALID_PARAMETER.
This is the output I got:

TensorFlow version: 2.2.0
2021-08-20 14:56:13.757732: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-08-20 14:56:13.784613: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2894525000 Hz
2021-08-20 14:56:13.789324: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5599bd185990 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-08-20 14:56:13.789363: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-08-20 14:56:13.790973: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-08-20 14:56:13.866766: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-20 14:56:13.867544: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5599bd1ebf50 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-08-20 14:56:13.867585: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): TITAN RTX, Compute Capability 7.5
2021-08-20 14:56:13.867843: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-20 14:56:13.868562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:4c:00.0 name: TITAN RTX computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.65GiB deviceMemoryBandwidth: 625.94GiB/s
2021-08-20 14:56:13.868755: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-08-20 14:56:13.869682: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-08-20 14:56:13.870820: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-08-20 14:56:13.870976: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-08-20 14:56:13.871992: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-08-20 14:56:13.872571: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-08-20 14:56:13.874770: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-08-20 14:56:13.874879: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-20 14:56:13.875618: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-20 14:56:13.876260: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2021-08-20 14:56:13.876283: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-08-20 14:56:13.877311: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-20 14:56:13.877322: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2021-08-20 14:56:13.877327: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2021-08-20 14:56:13.877412: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-20 14:56:13.878107: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-20 14:56:13.878783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:0 with 22182 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:4c:00.0, compute capability: 7.5)
Found GPU at: /device:GPU:0
2021-08-20 14:56:14.067066: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-20 14:56:14.067820: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:4c:00.0 name: TITAN RTX computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.65GiB deviceMemoryBandwidth: 625.94GiB/s
2021-08-20 14:56:14.067863: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-08-20 14:56:14.067875: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-08-20 14:56:14.067886: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-08-20 14:56:14.067895: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-08-20 14:56:14.067905: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-08-20 14:56:14.067914: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-08-20 14:56:14.067924: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-08-20 14:56:14.067972: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-20 14:56:14.068670: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-20 14:56:14.069327: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2021-08-20 14:56:14.069647: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-20 14:56:14.070312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:4c:00.0 name: TITAN RTX computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.65GiB deviceMemoryBandwidth: 625.94GiB/s
2021-08-20 14:56:14.070329: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-08-20 14:56:14.070339: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-08-20 14:56:14.070348: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-08-20 14:56:14.070356: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-08-20 14:56:14.070364: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-08-20 14:56:14.070373: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-08-20 14:56:14.070381: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-08-20 14:56:14.070421: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-20 14:56:14.071106: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-20 14:56:14.071754: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2021-08-20 14:56:14.071777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-20 14:56:14.071783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2021-08-20 14:56:14.071787: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2021-08-20 14:56:14.071853: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-20 14:56:14.072549: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-20 14:56:14.073211: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22182 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:4c:00.0, compute capability: 7.5)
2021-08-20 14:56:14.645568: I tensorflow/core/profiler/lib/profiler_session.cc:159] Profiler session started.
2021-08-20 14:56:14.645613: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1363] Profiler found 1 GPUs
2021-08-20 14:56:14.646103: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcupti.so.10.1
2021-08-20 14:56:14.746402: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1408] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
2021-08-20 14:56:14.746924: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1447] function cupti_interface_->ActivityRegisterCallbacks( AllocCuptiActivityBuffer, FreeCuptiActivityBuffer)failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
2021-08-20 14:56:14.746958: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1430] function cupti_interface_->EnableCallback( 0 , subscriber_, CUPTI_CB_DOMAIN_DRIVER_API, cbid)failed with error CUPTI_ERROR_INVALID_PARAMETER
Epoch 1/2
2021-08-20 14:56:15.012248: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
469/469 [==============================] - 3s 7ms/step - loss: 0.3631 - accuracy: 0.8996 - val_loss: 0.1974 - val_accuracy: 0.9440
Epoch 2/2
21/469 [>…] - ETA: 2s - loss: 0.1786 - accuracy: 0.9501
2021-08-20 14:56:18.617558: I tensorflow/core/profiler/lib/profiler_session.cc:159] Profiler session started.
2021-08-20 14:56:18.617629: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1408] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI_ERROR_NOT_INITIALIZED
2021-08-20 14:56:18.617652: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1447] function cupti_interface_->ActivityRegisterCallbacks( AllocCuptiActivityBuffer, FreeCuptiActivityBuffer)failed with error CUPTI_ERROR_NOT_INITIALIZED
47/469 [==>…] - ETA: 2s - loss: 0.2012 - accuracy: 0.9415
2021-08-20 14:56:18.755693: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1430] function cupti_interface_->EnableCallback( 0 , subscriber_, CUPTI_CB_DOMAIN_DRIVER_API, cbid)failed with error CUPTI_ERROR_INVALID_PARAMETER
2021-08-20 14:56:18.782747: I tensorflow/core/profiler/internal/gpu/device_tracer.cc:216] GpuTracer has collected 0 callback api events and 0 activity events.
2021-08-20 14:56:18.832614: I tensorflow/core/profiler/rpc/client/save_profile.cc:168] Creating directory: logs/20210820-145614/train/plugins/profile/2021_08_20_14_56_18
2021-08-20 14:56:18.882673: I tensorflow/core/profiler/rpc/client/save_profile.cc:174] Dumped gzipped tool data for trace.json.gz to logs/20210820-145614/train/plugins/profile/2021_08_20_14_56_18/amd-System-Product-Name.trace.json.gz
2021-08-20 14:56:18.897768: I tensorflow/core/profiler/utils/event_span.cc:288] Generation of step-events took 0 ms

2021-08-20 14:56:18.898363: I tensorflow/python/profiler/internal/profiler_wrapper.cc:87] Creating directory: logs/20210820-145614/train/plugins/profile/2021_08_20_14_56_18
Dumped tool data for overview_page.pb to logs/20210820-145614/train/plugins/profile/2021_08_20_14_56_18/amd-System-Product-Name.overview_page.pb
Dumped tool data for input_pipeline.pb to logs/20210820-145614/train/plugins/profile/2021_08_20_14_56_18/amd-System-Product-Name.input_pipeline.pb
Dumped tool data for tensorflow_stats.pb to logs/20210820-145614/train/plugins/profile/2021_08_20_14_56_18/amd-System-Product-Name.tensorflow_stats.pb
Dumped tool data for kernel_stats.pb to logs/20210820-145614/train/plugins/profile/2021_08_20_14_56_18/amd-System-Product-Name.kernel_stats.pb

469/469 [==============================] - 3s 7ms/step - loss: 0.1683 - accuracy: 0.9519 - val_loss: 0.1409 - val_accuracy: 0.9578
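
The log is long, so it can help to pull out just the CUPTI failures. The following is a minimal sketch using only the Python standard library; the function name `summarize_cupti_errors` is made up for illustration, and the sample lines are shortened copies of lines from the log above.

```python
import re
from collections import Counter

# Matches CUPTI error names such as CUPTI_ERROR_INSUFFICIENT_PRIVILEGES.
CUPTI_ERROR_RE = re.compile(r"CUPTI_ERROR_[A-Z_]+")


def summarize_cupti_errors(log_text):
    """Return (first_error, counts) for CUPTI errors found in log_text.

    first_error is the first error name in the text (or None). It is
    often the most informative one, since later errors can be follow-ons.
    """
    names = CUPTI_ERROR_RE.findall(log_text)
    first = names[0] if names else None
    return first, Counter(names)


# Shortened sample lines taken from the log in this thread.
sample_log = """\
cupti_tracer.cc:1408] function cupti_interface_->Subscribe(...)failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
cupti_tracer.cc:1447] function cupti_interface_->ActivityRegisterCallbacks(...)failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
cupti_tracer.cc:1430] function cupti_interface_->EnableCallback(...)failed with error CUPTI_ERROR_INVALID_PARAMETER
"""

first_error, error_counts = summarize_cupti_errors(sample_log)
print("first error:", first_error)
for name, count in error_counts.items():
    print(name, count)
```

In the log above, the first CUPTI failure is CUPTI_ERROR_INSUFFICIENT_PRIVILEGES; the CUPTI_ERROR_INVALID_PARAMETER error only appears after it.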

When I ran the Lambda TensorFlow benchmark, I got this result:

| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  TITAN RTX           Off  | 00000000:4C:00.0  On |                  N/A |
| 59%   80C    P2   255W / 280W |  23328MiB / 24214MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2728      G   /usr/lib/xorg/Xorg                131MiB |
|    0   N/A  N/A      4240      G   compiz                             83MiB |
|    0   N/A  N/A      4801      G   …mviewer/tv_bin/TeamViewer          3MiB |
|    0   N/A  N/A     14255      G   …AAAAAAAAA= --shared-files         84MiB |
|    0   N/A  N/A     61184      G   …_16989.log --shared-files         11MiB |
|    0   N/A  N/A     64687      C   python3                         23007MiB |
+-----------------------------------------------------------------------------+

Done warm up
Step Img/sec total_loss
1 images/sec: 220.2 +/- 0.0 (jitter = 0.0) 7.317 1629459411
10 images/sec: 219.0 +/- 1.4 (jitter = 1.4) 7.329 1629459415
20 images/sec: 220.0 +/- 0.8 (jitter = 1.3) 7.294 1629459420
30 images/sec: 220.0 +/- 0.6 (jitter = 1.0) 7.262 1629459425
40 images/sec: 218.9 +/- 0.7 (jitter = 1.5) 7.283 1629459431
50 images/sec: 218.7 +/- 0.6 (jitter = 1.7) 7.299 1629459436
60 images/sec: 218.4 +/- 0.6 (jitter = 1.9) 7.308 1629459441
70 images/sec: 218.5 +/- 0.5 (jitter = 1.7) 7.328 1629459446
80 images/sec: 218.3 +/- 0.5 (jitter = 1.6) 7.306 1629459451
90 images/sec: 218.2 +/- 0.5 (jitter = 1.4) 7.256 1629459456
100 images/sec: 218.3 +/- 0.5 (jitter = 1.2) 7.303 1629459461

total images/sec: 218.27

Check TensorBoard callback without profile_batch setting cause Errors: CUPTI_ERROR_INSUFFICIENT_PRIVILEGES and CUPTI_ERROR_INVALID_PARAMETER · Issue #35860 · tensorflow/tensorflow · GitHub

blacklist nvidia-96-updates
blacklist nvidia-450-updates
alias nvidia nvidia_450
alias nvidia-uvm nvidia_450_uvm
alias nvidia-modeset nvidia_450_modeset
alias nvidia-drm nvidia_450_drm
alias nouveau off
alias lbm-nouveau off

options nvidia_450_drm modeset=0
options nvidia "NVreg_RestrictProfilingToAdminUsers=0"

I edited /etc/modprobe.d/nvidia-kernel-common.conf, but that did not solve the issue.
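
One way to check whether the edited option actually took effect is to look at the parameters the loaded driver reports. The sketch below rests on two assumptions that were not verified on this machine: that the driver exposes its parameters in /proc/driver/nvidia/params with a line like `RmProfilingAdminOnly: 1` (my understanding of NVIDIA's documentation on this permission issue), and that a changed modprobe option only applies once the kernel module is reloaded, which usually means rebuilding the initramfs and rebooting. The helper name `profiling_admin_only` is hypothetical.

```python
from pathlib import Path

# Assumed location of the loaded NVIDIA driver's parameters.
PARAMS_PATH = Path("/proc/driver/nvidia/params")


def profiling_admin_only(params_text):
    """Parse the RmProfilingAdminOnly value from the driver params text.

    Returns True if profiling is restricted to admin users, False if it
    is open to all users, and None if the key is not present.
    """
    for line in params_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "RmProfilingAdminOnly":
            return value.strip() != "0"
    return None


if PARAMS_PATH.exists():
    restricted = profiling_admin_only(PARAMS_PATH.read_text())
    print("profiling restricted to admin users:", restricted)
else:
    print(f"{PARAMS_PATH} not found; is the NVIDIA driver loaded?")
```

If this still reports a restriction after the conf file has been edited, the new option has probably not reached the loaded driver yet.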

I also see insufficient CUPTI permissions in your log. Check the NVIDIA docs:

In that case I would have to install all my requirements as root, and I am using a conda environment.

We don’t officially support conda environments, as conda support is provided by a third party:

https://docs.anaconda.com/anaconda/user-guide/tasks/tensorflow/

My code uses the GPU, but not at 100%. When I run the benchmark (GitHub - lambdal/lambda-tensorflow-benchmark), it processes 218 images/sec and reaches 95-99% GPU utilisation.

As I said, you could use the profiler to check for bottlenecks in your code.
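
Even without a working GPU trace, a rough per-stage timing of the inference loop can show whether the time goes into the model call or into the work around it (decoding, resizing, post-processing). The following is a minimal sketch using only the standard library; `load_frame`, `preprocess`, and `run_model` are hypothetical stand-ins for the real pipeline stages, and the sleeps only simulate work. One caveat: frameworks such as TensorFlow can launch GPU work asynchronously, so the model stage should only be considered finished once its result has been copied back to the host.

```python
import time
from collections import defaultdict


def load_frame(i):
    # Hypothetical stand-in for reading and decoding one image.
    time.sleep(0.002)
    return i


def preprocess(frame):
    # Hypothetical stand-in for resizing/normalising on the CPU.
    time.sleep(0.003)
    return frame


def run_model(batch):
    # Hypothetical stand-in for the detector call plus copying results to host.
    time.sleep(0.001)
    return batch


def time_stages(num_frames):
    """Accumulate wall-clock seconds spent in each pipeline stage."""
    totals = defaultdict(float)
    stages = [("load", load_frame), ("preprocess", preprocess), ("model", run_model)]
    for i in range(num_frames):
        value = i
        for name, stage in stages:
            start = time.perf_counter()
            value = stage(value)
            totals[name] += time.perf_counter() - start
    return dict(totals)


num_frames = 20
totals = time_stages(num_frames)
total_time = sum(totals.values())
for name, seconds in totals.items():
    print(f"{name:>10}: {seconds / total_time:6.1%} of time")
print(f"throughput: {num_frames / total_time:.1f} frames/sec")
```

If most of the time falls outside the model stage, the GPU is likely waiting on the CPU, which would be consistent with utilisation staying around 50%.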

If I set things up with a Python virtualenv, will this work? I tried that setup and also changed my conf file. Here is my conf file:

# This file was installed by nvidia-450
# Do not edit this file manually

blacklist nouveau
blacklist lbm-nouveau
blacklist nvidia-current
blacklist nvidia-173
blacklist nvidia-96
blacklist nvidia-current-updates
blacklist nvidia-173-updates
blacklist nvidia-96-updates
blacklist nvidia-450-updates
alias nvidia nvidia_450
alias nvidia-uvm nvidia_450_uvm
alias nvidia-modeset nvidia_450_modeset
alias nvidia-drm nvidia_450_drm
alias nouveau off
alias lbm-nouveau off

options nvidia_450_drm modeset=0
options nvidia "NVreg_RestrictProfilingToAdminUsers=0"
options nvidia "NVreg_RestrictProfilingToAdminUsers=1"

But it is still not working.
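
One thing that stands out in the conf file above is that `NVreg_RestrictProfilingToAdminUsers` is set twice, once to 0 and once to 1. A small sketch that flags such conflicts in a modprobe-style conf file follows. The function name `find_conflicting_options` is made up for illustration, and the parsing is deliberately simplistic: it only looks at `options` lines and `key=value` pairs. It does not determine which value the driver ends up using.

```python
import re
from collections import defaultdict

# key=value pairs, tolerating straight or curly quotes around them.
PAIR_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)=([^\s"“”]+)')


def find_conflicting_options(conf_text):
    """Return {(module, key): [values...]} for keys set to more than one value."""
    seen = defaultdict(list)
    for line in conf_text.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3 or parts[0] != "options":
            continue  # not an "options <module> ..." line
        module, rest = parts[1], parts[2]
        for key, value in PAIR_RE.findall(rest):
            if value not in seen[(module, key)]:
                seen[(module, key)].append(value)
    return {k: v for k, v in seen.items() if len(v) > 1}


# The relevant lines from the conf file posted above.
conf = """\
options nvidia_450_drm modeset=0
options nvidia "NVreg_RestrictProfilingToAdminUsers=0"
options nvidia "NVreg_RestrictProfilingToAdminUsers=1"
"""

for (module, key), values in find_conflicting_options(conf).items():
    print(f"{module}: {key} is set to conflicting values {values}")
```

Keeping only one of the two lines would remove the ambiguity about which setting is intended.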

Can you try to profile your code in our GPU image:

Okay, thank you, I will work on this. I really appreciate the help.
