Training a 320x320 spaghettinet model on 640x640 images

Isaac_Padberg · June 23, 2022, 7:14pm

Hi,

I’m trying to train a spaghettinet model on 640x640 images. The pipeline.config file has these lines in it:

      fixed_shape_resizer {
        height: 320
        width: 320
      }
    }

I’m not completely sure what the fixed_shape_resizer option does, but it’s probably important here.
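For context, fixed_shape_resizer in the Object Detection API resizes every input image to the given height and width before it reaches the model, so a 640x640 image would be downscaled to 320x320. The snippet below is only a rough sketch of that idea in plain Python: the helper name `fixed_shape_resize` is hypothetical, and it uses nearest-neighbor sampling for simplicity, whereas the real resizer interpolates (bilinear by default, as far as I know).

```python
def fixed_shape_resize(image, height, width):
    """Nearest-neighbor resize of a 2-D list of pixels to (height, width).

    Hypothetical helper for illustration only; not the library's code.
    """
    src_h, src_w = len(image), len(image[0])
    return [
        # Map each output pixel back to the source pixel it samples from.
        [image[r * src_h // height][c * src_w // width] for c in range(width)]
        for r in range(height)
    ]


# A dummy 640x640 "image" whose pixels store their own (row, col) position.
image = [[(r, c) for c in range(640)] for r in range(640)]
resized = fixed_shape_resize(image, 320, 320)

print(len(resized), len(resized[0]))  # output height and width
print(resized[1][1])                  # source pixel sampled for output (1, 1)
```

With these numbers each output pixel covers a 2x2 region of the original, so small objects in the 640x640 images lose detail, but the model itself still only ever sees 320x320 inputs.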

When I build a dataset of images that are exactly 320x320, the model trains as expected.
However, whenever I use my dataset of strictly 640x640 images, the model trains correctly until the step count reaches the graph_rewriter delay. Once it hits this delay, the loss shoots up massively and training eventually fails with a NaN error.
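To illustrate why behavior can change exactly at the delay step: the graph_rewriter quantization delay is the number of training steps to wait before simulated (fake) quantization takes effect. The sketch below is a simplified assumption of that mechanism, not the library's implementation. The function names, the [-1, 1] range, and the choice to start quantizing at `step == delay` are all hypothetical.

```python
def fake_quantize(x, min_val, max_val, num_bits=8):
    """Clamp x to [min_val, max_val] and snap it to one of 2**num_bits levels."""
    levels = 2 ** num_bits - 1
    scale = (max_val - min_val) / levels
    clipped = min(max(x, min_val), max_val)
    return round((clipped - min_val) / scale) * scale + min_val


def maybe_quantize(x, step, delay, min_val=-1.0, max_val=1.0):
    """Pass x through unchanged before `delay`, fake-quantize it afterwards.

    Whether quantization starts at step == delay or one step later is an
    assumption made for this sketch.
    """
    if step < delay:
        return x
    return fake_quantize(x, min_val, max_val)


print(maybe_quantize(0.3, step=499, delay=500))  # still full precision
print(maybe_quantize(0.3, step=500, delay=500))  # snapped to an 8-bit level
print(maybe_quantize(5.0, step=500, delay=500))  # clamped to the range maximum
```

In this toy version, values outside the tracked range are clamped once quantization switches on, which is one way a loss curve can jump at the delay step.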

If I remove the graph_rewriter, training runs through fine. However, I need to convert the model to TFLite.

Here is a screenshot of the loss spike at step 500 (this is the delay for the graph_rewriter).

If anybody has an idea of what could be going on here, I would greatly appreciate some direction!

Thanks

lgusm · June 27, 2022, 9:55am

Hi Isaac, can you also please share the tutorial, Git repo, or code sample that you are using?