Model only works with run_functions_eagerly(True)

I have a subclass model which trains fine with model.fit when using tf.config.experimental_run_functions_eagerly(True).
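For reference, this is how I toggle eager execution (in newer TF versions, tf.config.run_functions_eagerly is the non-experimental replacement):

import tensorflow as tf

# Force every tf.function-compiled code path (including the ones built by
# model.fit) to run eagerly; handy for debugging, but slower than graph mode.
tf.config.experimental_run_functions_eagerly(True)

When I try to run the model without eager mode, I get the following error: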

Exception has occurred: ValueError
in user code:

    File "/home/rwo/.local/lib/python3.10/site-packages/keras/engine/training.py", line 1021, in train_function  *
        return step_function(self, iterator)
    File "/home/rwo/.local/lib/python3.10/site-packages/keras/engine/training.py", line 1010, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/rwo/.local/lib/python3.10/site-packages/keras/engine/training.py", line 1000, in run_step  **
        outputs = model.train_step(data)
    File "/home/rwo/.local/lib/python3.10/site-packages/keras/engine/training.py", line 859, in train_step
        y_pred = self(x, training=True)
    File "/home/rwo/.local/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None

    ValueError: Exception encountered when calling layer "learned_transformation" (type LearnedTransformation).
    
    in user code:
    
        File "/home/rwo/Documents/Masterstudium/Studium/Masterarbeit/src/models/learned_transformation.py", line 115, in call  *
            microphone_channels = self.expand_reshape(inputs[0])
        File "/home/rwo/.local/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler  **
            raise e.with_traceback(filtered_tb) from None
    
        ValueError: Exception encountered when calling layer "reshape" (type Reshape).
        
        as_list() is not defined on an unknown TensorShape.
        
        Call arguments received:
          • inputs=tf.Tensor(shape=<unknown>, dtype=float32)
    
    
    Call arguments received:
      • inputs=('tf.Tensor(shape=<unknown>, dtype=float32)', 'tf.Tensor(shape=<unknown>, dtype=float32)')
      • training=True

I use a tf.data dataset as input. The model has multiple inputs, hence the indexed access. I have a different, minimal model which trains both with and without eager mode on the same dataset, so that model seems to be able to infer the TensorShape from the dataset. Can someone give me a hint why the TensorShape might be unknown here, and whether the problem is caused by the dataset or by the model?
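A quick way to check whether the shape information is already missing at the dataset level would be to inspect the element_spec (a sketch; input_dataset stands for my zipped dataset shown below):

# Unknown shapes in the dataset itself would show up here as
# TensorSpec(shape=<unknown>, ...) before the model is ever called.
print(input_dataset.element_spec)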

Are you using tf.py_func?

I don’t use any py_funcs; I just use standard Keras layers in my model.

Do you have a minimized Colab to reproduce your problem?

It took me a while to produce a minimal example. This is the call function of the subclass model. The model works fine in graph mode if I only return u or v. When I add the Add() layer and return z, I get the "as_list() is not defined" error. The model works fine with the Add layer in eager mode.

def call(self, inputs, training=False):
    u = inputs[0]
    v = inputs[1]
    z = Add()((u, v))
    return z
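For completeness, the call above sits in a minimal subclass model roughly like this (a sketch; the class name is a placeholder):

import tensorflow as tf
from tensorflow.keras.layers import Add

class MinimalModel(tf.keras.Model):
    def call(self, inputs, training=False):
        u = inputs[0]
        v = inputs[1]
        # Summing the two input branches is the step that fails in graph mode.
        z = Add()((u, v))
        return z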

My dataset is a bit complicated; I will try to provide a minimal example for it in a later post. Each element of the dataset is a tuple of two [16, 10000] tensors (two sequences of audio data with batch size 16). The input dataset is generated by zipping two datasets containing audio sequences.

    input_dataset = tf.data.Dataset.zip(tuple(input_datasets))

One example input batch:

(<tf.Tensor: shape=(16, 53212), dtype=float32, numpy=
array([[ 0.00057983,  0.00085449, -0.00036621, ...,  0.0032959 ,
        -0.00643921,  0.00299072],
       [-0.04324341, -0.02734375, -0.03860474, ...,  0.01126099,
         0.00628662, -0.0010376 ],
       [ 0.00302124,  0.00244141,  0.00268555, ..., -0.02523804,
        -0.07363892, -0.04864502],
       ...,
       [ 0.00521851,  0.00869751,  0.00906372, ..., -0.00387573,
        -0.0043335 , -0.00323486],
       [-0.0211792 , -0.02825928, -0.03375244, ...,  0.00512695,
         0.00213623, -0.00305176],
       [ 0.00387573,  0.00442505,  0.00476074, ...,  0.1361084 ,
         0.12088013,  0.09637451]], dtype=float32)>, <tf.Tensor: shape=(16, 53212), dtype=float32, numpy=
array([[-0.00021362,  0.00048828,  0.00076294, ...,  0.00076294,
         0.00640869,  0.00418091],
       [-0.05352783, -0.0640564 , -0.06536865, ..., -0.0123291 ,
        -0.00335693,  0.00466919],
       [ 0.00326538,  0.00332642,  0.00299072, ...,  0.00942993,
         0.01193237,  0.01638794],
       ...,
       [-0.00482178, -0.00405884, -0.00375366, ...,  0.06103516,
         0.06112671,  0.05459595],
       [ 0.2133789 ,  0.18325806,  0.15246582, ...,  0.02453613,
         0.00723267, -0.00280762],
       [-0.09811401, -0.07116699, -0.0411377 , ..., -0.00030518,
        -0.00088501, -0.00210571]], dtype=float32)>) 

Thanks. I suppose you can replicate this with a random/constant input to be more general.
If you can share a self-contained minimized Colab/gist it could help.

The closer we get to a test-like example, the faster we can debug and isolate the problem. I understand it may require some work on your side, but it is the best thing to do, as the extra code only adds noise and readability overhead to the debugging/support activity.
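For example, something along these lines, with random tensors standing in for the audio, would make it reproducible for everyone (a sketch; the shapes follow your description above):

import tensorflow as tf

# Two zipped datasets of random float32 sequences, mimicking the described
# structure: batches of 16 segments of length 10000.
u = tf.data.Dataset.from_tensor_slices(tf.random.normal([64, 10000]))
v = tf.data.Dataset.from_tensor_slices(tf.random.normal([64, 10000]))
synthetic_dataset = tf.data.Dataset.zip((u, v)).batch(16)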

I created a Colab with synthetic test tensors. The Colab example works fine, but if I replace the synthetic dataset with the real dataset, the error appears.

https://colab.research.google.com/drive/1ShC3uL0MJTUsMWIWXDP0wLl2InRJ-o0M?usp=sharing

So the error must be in the dataset (even though one model works in graph mode with this dataset?). I am using input audio files of unknown length, which I frame into fixed-length segments. Then I unbatch the segments of multiple files into one continuous dimension, and finally I batch the shuffled segments into batches of 16. Below you can find my code; the entry point is the build_dataset function.

How can I make this process conform with graph execution? Which dimension is TensorFlow missing, and how can I give it a hint?

import tensorflow as tf
import pathlib


def process_path(file_path):
    # load audio
    raw_audio = tf.io.read_file(file_path)
    audio, sample_rate_in = tf.audio.decode_wav(raw_audio)
    audio = tf.squeeze(audio)
    return audio


def frame_audio(audio, microbatch_size):
    frames = tf.signal.frame(audio, microbatch_size, microbatch_size)
    frames = tf.ensure_shape(frames, [None, microbatch_size])
    return frames


def process_dataset_folder(datasetpath, subfolder, microbatch_size):
    # get files in folder
    file_root = pathlib.Path(datasetpath)
    dataset = tf.data.Dataset.list_files(
        str(file_root / subfolder / "*"), shuffle=False
    )
    # convert file paths to audio tensors
    dataset = dataset.map(process_path, num_parallel_calls=tf.data.AUTOTUNE)
    if microbatch_size > 0:
        dataset = dataset.map(lambda x: frame_audio(x, microbatch_size))
        return dataset.unbatch()
    else:
        return dataset


def build_dataset(dataset_path, inputs, outputs, microbatch_size=0):
    # create single dataset for all inputs
    input_datasets = [
        process_dataset_folder(dataset_path, input_name, microbatch_size)
        for input_name in inputs
    ]
    # combine input datasets into one dataset
    input_dataset = tf.data.Dataset.zip(tuple(input_datasets))
    if outputs is not None:
        output_datasets = [
            process_dataset_folder(dataset_path, output_name, microbatch_size)
            for output_name in outputs
        ]
        output_dataset = tf.data.Dataset.zip(tuple(output_datasets))
        return tf.data.Dataset.zip((input_dataset, output_dataset))
    else:
        return input_dataset
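One thing I noticed while re-reading process_path: tf.squeeze without an explicit axis can produce a tensor of unknown rank in graph mode, because the number of squeezed dimensions is only known at runtime. A sketch of a variant with a pinned axis (assuming mono WAV files; I have not verified that this fixes the error):

def process_path(file_path):
    raw_audio = tf.io.read_file(file_path)
    audio, _ = tf.audio.decode_wav(raw_audio)  # shape: [samples, channels]
    # Squeezing an explicit axis keeps the rank statically known.
    audio = tf.squeeze(audio, axis=-1)
    return audio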

Have you tried organizing your audio dataset like the tfio.audio.AudioIODataset derived class? See also how it is organized there.
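The idea there is that the dtype is always provided explicitly, so the element spec stays fully defined; a rough sketch (the file name is a placeholder, and I have not run this against your data):

import tensorflow as tf
import tensorflow_io as tfio

# AudioIOTensor takes an explicit dtype, which is required in graph mode
# and keeps shape/dtype information available.
audio_tensor = tfio.audio.AudioIOTensor("example.wav", dtype=tf.float32)
audio = audio_tensor.to_tensor()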

It seems like the problem is solved for me by adding a cast to tf.float32 (inspired by the statement "assert dtype is not None, 'dtype must be provided in graph mode'" from the AudioIODataset class). I will test the proposed solution a bit more and mark the thread as solved if successful. Thanks for your help, Bhack.

def process_path(file_path):
    # load audio
    raw_audio = tf.io.read_file(file_path)
    audio, sample_rate_in = tf.audio.decode_wav(raw_audio)
    audio = tf.squeeze(audio)
    # explicit dtype so the element spec stays fully defined in graph mode
    audio = tf.cast(audio, tf.float32)
    return audio
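As a quick sanity check (a sketch; the folder names are placeholders), printing the element spec of the rebuilt dataset should now show concrete dtypes instead of shape=<unknown>:

ds = build_dataset("path/to/dataset", ["input_a", "input_b"], None, microbatch_size=10000)
print(ds.element_spec)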