Autograd fails for reduce_sum on ragged tensor

While using tf.reduce_sum with ragged tensors, I stumbled upon an issue where autograd produces an exception in graph mode. The following code fails:

import tensorflow as tf


@tf.function()
def f(x):
    return tf.reduce_sum(x, axis=-1)


def test_autograd():
    values = tf.random.uniform((8,), seed=213)
    sizes = tf.constant([4, 2, 2])
    x = tf.RaggedTensor.from_row_lengths(values, sizes)

    with tf.GradientTape() as tape:
        tape.watch(x.flat_values)
        y = f(x)

    grad = tape.gradient(y, x.flat_values)

If I run test_autograd, I get an error:

self = <tf.Operation 'RaggedReduceSum/UnsortedSegmentSum' type=UnsortedSegmentSum>
name = '_XlaCompile'

    def get_attr(self, name):
      """Returns the value of the attr of this op with the given `name`.
    
      Args:
        name: The name of the attr to fetch.
    
      Returns:
        The value of the attr, as a Python object.
    
      Raises:
        ValueError: If this op does not have an attr with the given `name`.
      """
      fields = ("s", "i", "f", "b", "type", "shape", "tensor", "func")
      try:
        with c_api_util.tf_buffer() as buf:
>         pywrap_tf_session.TF_OperationGetAttrValueProto(self._c_op, name, buf)
E         tensorflow.python.framework.errors_impl.InvalidArgumentError: Operation 'RaggedReduceSum/UnsortedSegmentSum' has no attr named '_XlaCompile'.

../../venv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:2328: InvalidArgumentError

During handling of the above exception, another exception occurred:

scope = 'gradients'
op = <tf.Operation 'RaggedReduceSum/UnsortedSegmentSum' type=UnsortedSegmentSum>
func = None
grad_fn = <function _GradientsHelper.<locals>.<lambda> at 0x7f206004f4d0>

    def _MaybeCompile(scope, op, func, grad_fn):
      """Compile the calculation in grad_fn if op was marked as compiled."""
      scope = scope.rstrip("/").replace("/", "_")
      if func is not None:
        xla_compile = func.definition.attr["_XlaCompile"].b
        xla_separate_compiled_gradients = func.definition.attr[
            "_XlaSeparateCompiledGradients"].b
        xla_scope = func.definition.attr["_XlaScope"].s.decode()
      else:
        try:
>         xla_compile = op.get_attr("_XlaCompile")

../../venv/lib/python3.7/site-packages/tensorflow/python/ops/gradients_util.py:331: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <tf.Operation 'RaggedReduceSum/UnsortedSegmentSum' type=UnsortedSegmentSum>
name = '_XlaCompile'

    def get_attr(self, name):
      """Returns the value of the attr of this op with the given `name`.
    
      Args:
        name: The name of the attr to fetch.
    
      Returns:
        The value of the attr, as a Python object.
    
      Raises:
        ValueError: If this op does not have an attr with the given `name`.
      """
      fields = ("s", "i", "f", "b", "type", "shape", "tensor", "func")
      try:
        with c_api_util.tf_buffer() as buf:
          pywrap_tf_session.TF_OperationGetAttrValueProto(self._c_op, name, buf)
          data = pywrap_tf_session.TF_GetBuffer(buf)
      except errors.InvalidArgumentError as e:
        # Convert to ValueError for backwards compatibility.
>       raise ValueError(str(e))
E       ValueError: Operation 'RaggedReduceSum/UnsortedSegmentSum' has no attr named '_XlaCompile'.

../../venv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:2332: ValueError

During handling of the above exception, another exception occurred:

    def test_autograd():
        values = tf.random.uniform((8,), seed=213)
        sizes = tf.constant([4, 2, 2])
        x = tf.RaggedTensor.from_row_lengths(values, sizes)
    
        with tf.GradientTape() as tape:
            tape.watch(x.flat_values)
>           y = f(x)

graphs/tf2_sandwich_model_test.py:170: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py:580: in __call__
    result = self._call(*args, **kwds)
../../venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py:650: in _call
    return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds)  # pylint: disable=protected-access
../../venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py:1665: in _filtered_call
    self.captured_inputs)
../../venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py:1751: in _call_flat
    forward_function, args_with_tangents = forward_backward.forward()
../../venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py:1477: in forward
    self._inference_args, self._input_tangents)
../../venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py:1233: in forward
    self._forward_and_backward_functions(inference_args, input_tangents))
../../venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py:1385: in _forward_and_backward_functions
    outputs, inference_args, input_tangents)
../../venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py:943: in _build_functions_for_outputs
    src_graph=self._func_graph)
../../venv/lib/python3.7/site-packages/tensorflow/python/ops/gradients_util.py:669: in _GradientsHelper
    lambda: grad_fn(op, *out_grads))
../../venv/lib/python3.7/site-packages/tensorflow/python/ops/gradients_util.py:336: in _MaybeCompile
    return grad_fn()  # Exit early
../../venv/lib/python3.7/site-packages/tensorflow/python/ops/gradients_util.py:669: in <lambda>
    lambda: grad_fn(op, *out_grads))
../../venv/lib/python3.7/site-packages/tensorflow/python/ops/math_grad.py:470: in _UnsortedSegmentSumGrad
    return _GatherDropNegatives(grad, op.inputs[1])[0], None, None
../../venv/lib/python3.7/site-packages/tensorflow/python/ops/math_grad.py:438: in _GatherDropNegatives
    dtype=is_positive_shape.dtype)],
../../venv/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py:2967: in ones
    output = _constant_if_small(one, shape, dtype, name)
../../venv/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py:2662: in _constant_if_small
    if np.prod(shape) < 1000:
<__array_function__ internals>:6: in prod
    ???
../../venv/lib/python3.7/site-packages/numpy/core/fromnumeric.py:3031: in prod
    keepdims=keepdims, initial=initial, where=where)
../../venv/lib/python3.7/site-packages/numpy/core/fromnumeric.py:87: in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <tf.Tensor 'gradients/RaggedReduceSum/UnsortedSegmentSum_grad/sub:0' shape=() dtype=int32>

    def __array__(self):
      raise NotImplementedError("Cannot convert a symbolic Tensor ({}) to a numpy"
>                               " array.".format(self.name))
E     NotImplementedError: Cannot convert a symbolic Tensor (gradients/RaggedReduceSum/UnsortedSegmentSum_grad/sub:0) to a numpy array.

../../venv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:749: NotImplementedError

If I call tf.reduce_sum directly in the test (without going through f), it does work though. How can I avoid this problem?

What is your TF version?

The TensorFlow version is 2.5.0.

It is working fine for me. Here is the gist.
@bennofs, can you please check again and, if possible, provide the gist in Colab?

Wow, I copied the example from the Colab to my local Python REPL. It prints the same TensorFlow version but still fails with the error from the original post. I cannot reproduce this error on Colab.

It also happens with both a system-wide installed tensorflow and a tensorflow installed into a local venv.
I don’t have any GPUs (except integrated).

Can you try to reproduce the error on your machine with Docker? I cannot reproduce it with this command on my local setup:

docker run tensorflow/tensorflow:2.5.0 python -c "
import tensorflow as tf 

@tf.function()
def f(x):
    return tf.reduce_sum(x, axis=-1)


def test_autograd():
    values = tf.random.uniform((8,), seed=213)
    sizes = tf.constant([4, 2, 2])
    x = tf.RaggedTensor.from_row_lengths(values, sizes)

    with tf.GradientTape() as tape:
        tape.watch(x.flat_values)
        y = f(x)

    grad = tape.gradient(y, x.flat_values)
test_autograd()
print('Ok')
"

You are missing a call to test_autograd there. But I can confirm that it doesn’t reproduce in docker for me either. I’ll debug further.

I’ve updated the example to add the function call. But since it runs fine in Docker, you need to investigate your local environment and installation.

I reinstalled TensorFlow (with pip -U in the existing venv) and now the error is gone, which is great. Unfortunately, I didn’t make a backup of the venv before running pip, so I cannot debug any further to find the root cause.

Don’t worry, there is always next time.
However, it is always better to double-check with Docker so that you have a reproducible environment and can rule out environment or installation issues in a local setup.
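For comparing a local setup against the Docker image, a small script like the one below can be run in both places and the output diffed. This is only a sketch: the helper name report_versions is made up for illustration, and the choice of packages (tensorflow and numpy, since the traceback above passes through both) is an assumption, not a diagnosis of the root cause, which this thread never identified.

```python
import sys

try:
    from importlib import metadata  # Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # backport package for Python 3.7


def report_versions(packages=("tensorflow", "numpy")):
    """Map each package name to its installed version, or None if it is absent.

    The default package list is an assumption made for this example.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The interpreter path shows whether the venv or the system Python is in use.
    print("python    :", sys.version.split()[0])
    print("executable:", sys.executable)
    for name, version in report_versions().items():
        print(f"{name:10}:", version or "not installed")
```

Running it once inside the venv and once via docker run makes any version mismatch visible before reinstalling anything, so the broken state can still be inspected.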
