Best way to choose steps_per_execution?

I have a few questions about the steps_per_execution argument to Keras's Model.compile method:

  1. Why should this argument not always be set to a very high number?
  2. What impact does setting steps_per_execution to a high number have on memory, CPU, and device resource utilization?
  3. Are there any concerns about model accuracy when using a very high steps_per_execution, or will models trained with different steps_per_execution values always converge to the same metrics? (By contrast, very large batch sizes can hurt model accuracy, as covered in this discussion and paper.)
  4. For distributed strategies such as TPUStrategy, is there any concern about setting a very large steps_per_execution? When do the gradient all-reduces across pod devices happen when steps_per_execution is large? Does the behavior of optimizer.apply_gradients change with large steps_per_execution values?
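
To make the question concrete, here is a minimal sketch of the kind of setup I mean. The tiny model, the synthetic data, and the value 32 are placeholders chosen for illustration, not my real workload; the only part that matters is the steps_per_execution argument passed to compile.

```python
import numpy as np
import tensorflow as tf

# Placeholder sizes, chosen so the numbers divide evenly.
num_examples = 1024
batch_size = 16
steps_per_epoch = num_examples // batch_size  # 64 batches per epoch

# Synthetic regression data; stands in for a real dataset.
x = np.random.rand(num_examples, 8).astype("float32")
y = np.random.rand(num_examples, 1).astype("float32")

# Hypothetical toy model, just enough to call compile() and fit().
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(1),
])

# steps_per_execution asks Keras to run this many batches inside each
# compiled function call instead of one batch per call.
model.compile(optimizer="sgd", loss="mse", steps_per_execution=32)

history = model.fit(x, y, batch_size=batch_size, epochs=1, verbose=0)
```

With these placeholder numbers an epoch is 64 batches, so a value of 32 groups them into two calls. My questions are about what changes as that value grows.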