We are working on a research project in which we want to adapt CNN models for embedded systems.
Our model is written in Keras and contains Conv2DTranspose operations.
One of our candidate solutions is to implement the complete model with cuDNN, to try to optimize it and reduce memory usage.
We are having difficulty implementing and understanding the Conv2DTranspose operation.
Does anyone know how this operation is implemented on NVIDIA GPUs?
I searched the TensorFlow source code (on GitHub), but I could not find the piece of code that does this…
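
To make the question concrete, here is a minimal sketch of what we understand Conv2DTranspose to compute. It is only a plain-Python reference, not the GPU implementation we are looking for. It assumes a single channel, no bias, "valid" padding, and a kernel at least as large as the stride; the function name `conv2d_transpose_reference` is our own and does not come from Keras or cuDNN.

```python
def conv2d_transpose_reference(x, kernel, stride=1):
    """Naive transposed 2D convolution (hypothetical helper, for illustration).

    Assumptions: single channel, no bias, 'valid' padding, kernel size >= stride.
    Each input value scatters a scaled copy of the kernel into the output,
    which is the gradient of a strided cross-correlation w.r.t. its input.
    """
    in_h, in_w = len(x), len(x[0])
    k_h, k_w = len(kernel), len(kernel[0])

    # Output size for 'valid' padding when kernel size >= stride.
    out_h = (in_h - 1) * stride + k_h
    out_w = (in_w - 1) * stride + k_w
    out = [[0.0] * out_w for _ in range(out_h)]

    for i in range(in_h):
        for j in range(in_w):
            for a in range(k_h):
                for b in range(k_w):
                    # Overlapping kernel copies are summed.
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]
    kernel = [[1.0, 1.0], [1.0, 1.0]]
    for row in conv2d_transpose_reference(x, kernel, stride=2):
        print(row)
```

With stride 2 and a 2x2 kernel the kernel copies do not overlap, so each input value simply fills its own 2x2 block of the 4x4 output; with stride 1 the copies overlap and are accumulated. If this understanding is wrong, a correction would already help us a lot.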