Hi there. I’m new to this community, and I’m not sure whether this is the right place to raise this topic. If not, please forgive me and point me to the correct channel. Many thanks!
- Our project uses TensorFlow as its backbone, and we implement our main functionality as custom ops.
- We usually allocate a lot of temporary/output buffers, whose memory is managed by TensorFlow.
- However, operating on raw data pointers is dangerous and error-prone.
- Does TensorFlow provide any way to help detect memory violations, e.g. warning the user when a read/write outside one tensor’s memory space happens?
- Are there any suggestions for detecting or preventing memory violations as early as possible?
Hi @Will_Sun, could you please take a look at the Profiling document? Profiling helps you understand the hardware resource consumption (such as time and memory) of the various TensorFlow operations (ops) in your model. Thank you.
Hi Kiran, thanks for your reply. I will check the profiling tool to see if it meets my requirements.
The problem I have is more often related to incorrect results rather than memory resource consumption.
Usually, we have quite a few run-time variables defined on the GPU. When a memory violation happens during an operation on one variable, it may corrupt the memory of other variables and eventually fail a sanity check somewhere down the line. Unfortunately, that sanity-check failure usually happens quite late and is unpredictable.
That’s really painful. I would like the violation to be caught right when it happens. One idea is to allocate some extra space and write tags there for a later memory check. The problem is that memory allocation is handled entirely by TensorFlow, and I don’t know whether TensorFlow provides any built-in solution for this.