Using numpy function in dataset map function

So I’m calling my map function like this:

test_dataset = test_dataset.map(
    lambda x: tf.py_function(func=split, inp=[x]))

But I still get the error:

AttributeError: 'Tensor' object has no attribute 'numpy'

My split function looks like this:

import numpy as np
import tensorflow as tf
from scipy.signal import find_peaks

def split(data):
    # split off intervention
    splitteded = tf.split(data, [6, 1], axis=0)

    # split into input and target image
    splitted = tf.split(splitteded[0], 2, axis=1)

    # get intervention over full window
    sum = tf.reduce_sum(splitteded[1])
    average_intervention = sum.numpy() / np.shape(splitteded[1])[1]

    # get average heartrate over full window
    peaks, _ = find_peaks([i[0].numpy() for i in tf.split(data, [1, 1, 5], axis=0)[1][0]])
    if peaks[0] == 0:
        peaks = peaks[1:-1]
    start = peaks[0]
    end = peaks[-1]
    average_heartrate = len(peaks) / ((end - start) / 50)
    return (splitted[0], splitted[1], [average_intervention, average_heartrate])

Everything I found on this says to use tf.py_function, since eager execution otherwise does not work inside the dataset map function. But it is still not working for me. I feel I’m missing something very obvious.
Any help is greatly appreciated!


I think you can achieve that with tf.py_func, or tf.py_function (the newer version). It does exactly what you want: it wraps a numpy function that operates on arrays in a TensorFlow operation that you can include as part of your dataset graph.
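For example, a minimal sketch of that pattern (using a made-up numpy_double function rather than the split from the question):

import numpy as np
import tensorflow as tf

# Toy numpy-based function; inside tf.py_function the argument arrives as an
# eager tensor, so .numpy() is available here.
def numpy_double(x):
    return np.float32(x.numpy() * 2)

ds = tf.data.Dataset.from_tensor_slices(np.arange(5, dtype=np.float32))
ds = ds.map(lambda x: tf.py_function(func=numpy_double, inp=[x], Tout=tf.float32))

for item in ds:
    print(item.numpy())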

Yes, I get the same error too.

Hi @jkummert, as per the official documentation for tf.py_function, you also need to pass the Tout argument. I reproduced the issue with code similar to yours and did not get any error. Please refer to this gist. Thank you.
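For reference, the map call with Tout would look roughly like this (a sketch, assuming split returns three float32 outputs as in the original post):

test_dataset = test_dataset.map(
    lambda x: tf.py_function(
        func=split,
        inp=[x],
        Tout=[tf.float32, tf.float32, tf.float32]))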

Adding Tout gives the same error. Could this be related to running TensorFlow on CPU instead of GPU?

Hi,

I ran into the same issue and tried using wrapper functions, but that didn’t work. However, I found another solution myself.

This problem is caused because data is a tensor: the first time dataset.map() traces your function, sum is a symbolic tensor holding no value, so of course it doesn’t have .numpy().

However, TensorFlow allows such calculations if and only if all the calculations are done within the TensorFlow framework. Maybe you should try changing sum.numpy() / np.shape() into sum / tf.constant(shape1, shape2, etc…).
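For illustration, a minimal sketch of that kind of change, staying entirely in TensorFlow ops (the helper name is made up):

import tensorflow as tf

# Compute the average with TensorFlow ops only, so it also works on the
# symbolic tensors that Dataset.map passes in while tracing.
def average_intervention(intervention):
    total = tf.reduce_sum(intervention)
    # tf.shape returns int32, so cast before dividing
    width = tf.cast(tf.shape(intervention)[1], total.dtype)
    return total / width

# Example with a dummy 1 x 6 intervention row
print(average_intervention(tf.constant([[0., 1., 0., 1., 1., 0.]])))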