Using MultiHeadAttention in custom layers

The MultiHeadAttention documentation states:

When using MultiHeadAttention inside a custom layer, the custom layer must implement its own build() method and call MultiHeadAttention's _build_from_signature() there.

Is this guidance up to date? I don’t see this advice implemented in any of the examples that use this layer in the TensorFlow documentation, like this one.

If it is up to date, can anyone share an example? The signature of the build method is def build(self, input_shape), whereas the multi-head attention layer has def _build_from_signature(self, query, value, key=None). How do I derive values for query, value, and key from input_shape?
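For the self-attention case, one answer is that query, value, and key can all be the input shape itself: _build_from_signature accepts TensorShapes as well as tensors. Here is a minimal sketch of a custom layer doing that, assuming tf.keras in TF 2.x where the private _build_from_signature method still exists (the class name TransformerBlock and the hyperparameters are made up for illustration):

```python
import tensorflow as tf


class TransformerBlock(tf.keras.layers.Layer):
    """Hypothetical custom layer wrapping MultiHeadAttention for self-attention."""

    def __init__(self, num_heads=2, key_dim=32, **kwargs):
        super().__init__(**kwargs)
        self.mha = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=key_dim
        )

    def build(self, input_shape):
        # For self-attention, query, value, and key all share the input's
        # shape, so input_shape can be passed for each. _build_from_signature
        # handles both tensors and TensorShapes (key defaults to value).
        self.mha._build_from_signature(query=input_shape, value=input_shape)
        super().build(input_shape)

    def call(self, inputs):
        # Self-attention: the input attends to itself.
        return self.mha(inputs, inputs)


layer = TransformerBlock(num_heads=2, key_dim=32)
out = layer(tf.zeros((1, 4, 32)))
print(out.shape)  # (1, 4, 32)
```

For cross-attention, where query and value have different shapes, build(self, input_shape) alone is not enough; you would need to override build to accept both shapes or call the layer once on real inputs so the weights are created lazily.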

Interesting. It’s possible that the examples that skip this step don’t use the Keras feature that requires _build_from_signature. It may be Keras model saving that requires it, and that tutorial only does a tf.saved_model export.

@bischof may be able to provide more information here.