How to implement LayerDrop in TensorFlow Transformers

I need to implement layer drop in TensorFlow Transformer. Can some one guide me how to do that??

Reference paper: https://arxiv.org/pdf/1909.11556.pdf

Thanks!!

You can take a look at the HugginFace impl in different TF models.

E.g. TFBert

1 Like

Hello @Bhack,

thanks for your reply. I have tried to implement this way, but this implementation is not working when decorating train-step with tf.function(...) since new variables for few layers won’t get formed during 1st step & tf.function(...) doesn’t allow us to form variables in further steps.

Just a side note: This solution is perfectly working in eager mode though.

Do you get an explicit error in graph mode (tf.function)?

yes. I will share that complete message here in a min.

here is the error message:

Traceback (most recent call last):
  File "main.py", line 93, in <module>
    main(args)
  File "main.py", line 82, in main
    verbose="auto",
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py", line 1183, in fit
    tmp_logs = self.train_function(iterator)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py", line 889, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py", line 950, in _call
    return self._stateless_fn(*args, **kwds)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 3022, in __call__
    filtered_flat_args) = self._maybe_define_function(args, kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 3444, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 3289, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py", line 999, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py", line 672, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py", line 986, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:855 train_function  *
        return step_function(self, iterator)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:845 step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:1285 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:2833 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:3608 _call_for_each_replica
        return fn(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py:838 run_step  **
        outputs = model.train_step(data)
    /content/gsoc-wav2vec2/src/wav2vec2/modeling.py:236 train_step
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:636 apply_gradients
        self._create_all_weights(var_list)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:823 _create_all_weights
        self._create_slots(var_list)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/optimizer_v2/adam.py:124 _create_slots
        self.add_slot(var, 'm')
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:913 add_slot
        initial_value=initial_value)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/variables.py:262 __call__
        return cls._variable_v2_call(*args, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/variables.py:256 _variable_v2_call
        shape=shape)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/variables.py:67 getter
        return captured_getter(captured_previous, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:3523 creator
        return next_creator(**kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/variables.py:67 getter
        return captured_getter(captured_previous, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:3523 creator
        return next_creator(**kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/variables.py:67 getter
        return captured_getter(captured_previous, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py:3523 creator
        return next_creator(**kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/variables.py:67 getter
        return captured_getter(captured_previous, **kwargs)
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py:769 invalid_creator_scope
        "tf.function-decorated function tried to create "

    ValueError: tf.function-decorated function tried to create variables on non-first call.

Probably you could have some impact with:

But I’ve not tested your specific case with this new flag.

1 Like