When creating large models (a couple thousand nodes) in graph mode, initializing the metrics can take a very long time. The following toy example takes ~30 seconds on my machine (TF 2.6) to start training:

```
import tensorflow as tf
import numpy as np
from tensorflow.python.keras import backend as K

with K.get_session() as sess:
    print("DEF")
    model = tf.keras.Sequential(
        [tf.keras.layers.Dense(1) for _ in range(500)]
    )
    print("METRICS")
    metrics = [tf.keras.metrics.Accuracy(str(i)) for i in range(100)]
    print("COMPILE")
    model.compile(loss="mse", metrics=metrics, run_eagerly=False)
    x, y = np.zeros((2, 1000), dtype=np.float32)
    print("FIT")
    model.fit(x=x, y=y)
```

Most of the startup time is spent in this loop initializing the metrics.
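For reference, this is roughly how I attributed the time (a minimal sketch using `cProfile` from the standard library; the helper name and entry count are my own choices):

```python
import cProfile
import pstats

def profile_startup(fn, top=15):
    """Run fn under cProfile and print the slowest entries by cumulative time."""
    prof = cProfile.Profile()
    prof.enable()
    fn()
    prof.disable()
    pstats.Stats(prof).sort_stats("cumulative").print_stats(top)

# Usage, e.g.:
# profile_startup(lambda: model.compile(loss="mse", metrics=metrics))
```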

In the actual model I am currently investigating, startup takes ~20 minutes, since it’s quite a large model with data loading included in the graph and ~400 metrics. The latter is due to having 4 per-class metrics for ~100 classes. This time quadruples when adding another GPU with `MirroredStrategy`.
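
The per-class metrics are built along these lines (a sketch only; the exact four metric types differ in my model, precision/recall at two thresholds stand in here):

```python
import tensorflow as tf

NUM_CLASSES = 100  # placeholder; the real count comes from my dataset

def make_per_class_metrics():
    """Build 4 metrics per class, ~400 metric objects in total."""
    metrics = []
    for i in range(NUM_CLASSES):
        metrics += [
            tf.keras.metrics.Precision(class_id=i, name=f"precision_{i}"),
            tf.keras.metrics.Recall(class_id=i, name=f"recall_{i}"),
            tf.keras.metrics.Precision(class_id=i, thresholds=0.25,
                                       name=f"precision_lo_{i}"),
            tf.keras.metrics.Recall(class_id=i, thresholds=0.25,
                                    name=f"recall_lo_{i}"),
        ]
    return metrics
```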

What could I do to improve startup time in this case? So far, I’ve tried:

- Running in eager mode, which works fine on a single GPU, but scaling out will be more challenging.
- Creating one metric class for all classes so that I only need to register 4 metrics. But it doesn’t seem to be possible for metrics to return arrays.
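
For the second attempt, this is roughly the kind of vectorized metric I tried (a sketch; the class name and the accuracy computation are illustrative, not my exact code):

```python
import tensorflow as tf

class PerClassAccuracy(tf.keras.metrics.Metric):
    """One metric object holding per-class state for all classes at once."""

    def __init__(self, num_classes, name="per_class_accuracy", **kwargs):
        super().__init__(name=name, **kwargs)
        self.num_classes = num_classes
        # Vector-valued state: one slot per class.
        self.correct = self.add_weight(
            name="correct", shape=(num_classes,), initializer="zeros")
        self.total = self.add_weight(
            name="total", shape=(num_classes,), initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.cast(tf.reshape(y_true, [-1]), tf.int32)
        y_pred = tf.cast(tf.argmax(y_pred, axis=-1), tf.int32)
        hits = tf.cast(y_true == y_pred, tf.float32)
        # Accumulate hits and counts per true class in one vectorized op.
        self.correct.assign_add(
            tf.math.unsorted_segment_sum(hits, y_true, self.num_classes))
        self.total.assign_add(
            tf.math.unsorted_segment_sum(tf.ones_like(hits), y_true,
                                         self.num_classes))

    def result(self):
        # Returns a (num_classes,) tensor -- but Keras expects a scalar here,
        # which is where this approach breaks down during compile/fit.
        return self.correct / tf.maximum(self.total, 1.0)
```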