Hi all, I just wanted to show my appreciation for the PluggableDevices implementation. I think it was a good compromise between expanding the availability of GPU acceleration and not touching the core CUDA kernels, which would probably require a complete redevelopment.
In particular, I have been using the macOS/Metal implementation and liking it very much. One question I have, to take this one step further: what guidelines on memory usage can the experts share? For example, in my setting I have an 8 GB AMD Radeon Pro 5500. When I'm setting buffer sizes for training my TF models, is there a rule of thumb or other rough guideline for getting the most bang for the buck (in other words, the most GPU acceleration relative to the CPU overhead of sending data to and fetching it back from the GPU)?
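For context, here is a minimal sketch of the kind of input pipeline I mean; the batch size, shuffle buffer, and data/model names below are placeholders rather than my actual setup, and these are the knobs I'm trying to tune against the 8 GB of VRAM:

```python
import tensorflow as tf

BATCH_SIZE = 64            # placeholder; the main knob against GPU memory
SHUFFLE_BUFFER = 10_000    # placeholder; held in host memory, not VRAM

# features, labels, and model stand in for my real data and network.
dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(buffer_size=SHUFFLE_BUFFER)
    .batch(BATCH_SIZE)
    .prefetch(tf.data.AUTOTUNE)  # overlap host-side prep with GPU compute
)

model.fit(dataset, epochs=10)
```

Essentially, I'm asking how to reason about sizing BATCH_SIZE and the prefetch/shuffle buffers on a card like this so the GPU stays busy without running out of memory.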
Many thanks,
Doug