Hi,
I’m trying to train a SpaghettiNet model on 640x640 images. The pipeline.config file has these lines in it:
fixed_shape_resizer {
height: 320
width: 320
}
}
I’m not completely sure what the fixed_shape_resizer option does, but it’s probably important here.
When I build a dataset of images that are exactly 320x320, the model trains as expected.
However, whenever I use my dataset of strictly 640x640 images, the model trains correctly until the step count reaches the graph_rewriter delay option. Once it hits this delay, the loss shoots up massively and eventually results in a NaN error.
If I remove the graph_rewriter, training runs through fine. However, I need to convert the model to tflite, so I can’t just leave it out.
Here is a screenshot of the loss spike at step 500 (this is the delay for the graph_rewriter).
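For context, the graph_rewriter section follows the usual Object Detection API layout, roughly like the sketch below. Only the delay of 500 comes from my run; the 8-bit values are the common defaults and are an assumption here, not copied from my file.

```
graph_rewriter {
  quantization {
    # Step at which fake-quantization ops are switched on (500 in my run).
    delay: 500
    # Assumed values: typical 8-bit settings, not taken from my config.
    weight_bits: 8
    activation_bits: 8
  }
}
```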
If anybody has an idea of what could be going on here, I would greatly appreciate some direction!
Thanks