tf.data.Dataset.from_generator error when using np.array: 'charmap' codec cannot encode character

The following code fails with the error in the title ('charmap' codec cannot encode character). I believe it is related to tf.data.Dataset:

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam  # the optimizer class is Adam, not adam
import tensorflow as tf
import numpy as np

a = np.array([[1,2,3,4,4,5,61,2,3,4,4,5,6],[1,2,3,4,4,5,61,2,3,4,4,5,6],[1,2,3,4,4,5,61,2,3,4,4,5,6],[1,2,3,4,4,5,61,2,3,4,4,5,6]])
y = np.array([[1],[1],[1],[1]])
model = Sequential([
    Dense(20, activation="relu"),
    Dense(100, activation="relu"),
    Dense(1, activation="sigmoid")
])

print(np.shape(a))

def generator_aa(a, y, batch_size):
    while True:
        indices = np.random.permutation(len(a))
        for i in range(0, len(indices), batch_size):
            batch_indices = indices[i:i+batch_size]
            yield a[batch_indices], y[batch_indices]



my_opt = Adam(learning_rate=0.01)
model.compile(loss='binary_crossentropy', optimizer=my_opt, metrics=['accuracy'])

# dataset is built with tf.data.Dataset.from_generator (construction omitted here)
model.fit(dataset, epochs=3)

I'm not sure why this happens.

Hi @daisydaisy, if you are performing any string encoding during data preprocessing, could you please encode the strings using 'utf-8'? If that is not the case, could you please provide complete standalone code to reproduce the issue? Thank you.
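For context on the UTF-8 suggestion: 'charmap' is the codec name Python reports when encoding with single-byte Windows code pages such as cp1252, which can represent at most 256 characters, whereas UTF-8 can encode any Unicode character. The minimal sketch below (pure Python, no TensorFlow) reproduces that class of error. The helper encode_or_report and the sample string are hypothetical, and the thread does not establish which character or code path triggered the original error.

```python
def encode_or_report(text, encoding):
    """Hypothetical helper: return the encoded bytes, or the codec name from the error."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as err:
        # For cp1252, Python reports the codec as 'charmap', as in the thread's error.
        return f"failed in the '{err.encoding}' codec"


text = "loss \u2713"  # U+2713 CHECK MARK has no slot in the cp1252 code page

print(encode_or_report(text, "cp1252"))  # failed in the 'charmap' codec

encoded = encode_or_report(text, "utf-8")
print(encoded)                           # b'loss \xe2\x9c\x93'
print(encoded.decode("utf-8") == text)   # True
```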

Thank you very much! I ran the standalone code below and got the same error. The TF version is 2.9 and I'm training the model on an A100; there may be some restrictions on the system, such as mixed precision not being allowed. Thank you again!

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam  # the optimizer class is Adam, not adam
import tensorflow as tf
import numpy as np

a = np.array([[1,2,3,4,4,5,61,2,3,4,4,5,6],[1,2,3,4,4,5,61,2,3,4,4,5,6],[1,2,3,4,4,5,61,2,3,4,4,5,6],[1,2,3,4,4,5,61,2,3,4,4,5,6]])
y = np.array([[1],[1],[1],[1]])
model = Sequential([
    Dense(20, activation="relu"),
    Dense(100, activation="relu"),
    Dense(1, activation="sigmoid")
])

print(np.shape(a))

def generator_aa(a, y, batch_size):
    while True:
        indices = np.random.permutation(len(a))
        for i in range(0, len(indices), batch_size):
            batch_indices = indices[i:i+batch_size]
            yield a[batch_indices], y[batch_indices]



my_opt = Adam(learning_rate=0.01)
model.compile(loss='binary_crossentropy', optimizer=my_opt, metrics=['accuracy'])

# pass the generator function defined above (generator_aa), with batch_size=2
dataset = tf.data.Dataset.from_generator(generator_aa, args=(a, y, 2),
                                         output_types=('float32', 'float32'))

model.fit(dataset, epochs=3)

Hi @daisydaisy, I used output_signature instead of output_types in from_generator, and it works fine:

batch_size = 2  # same batch size as in the original from_generator call

dataset = tf.data.Dataset.from_generator(
    generator_aa,
    args=(a, y, batch_size),
    output_signature=(
        tf.TensorSpec(shape=(None, 13), dtype=tf.float32),
        tf.TensorSpec(shape=(None, 1), dtype=tf.float32)
    )
)
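To illustrate what this output_signature declares, here is a NumPy-only sketch (it does not call TensorFlow) that pulls batches from generator_aa and checks them against the shapes (None, 13) and (None, 1). The helper fits_shape is hypothetical and checks shapes only, not dtypes. With 4 rows and a batch size of 3, the second batch has only 1 row, which is why the batch dimension is declared as None rather than a fixed number.

```python
import numpy as np

# Same data layout and generator as in the thread.
a = np.array([[1, 2, 3, 4, 4, 5, 61, 2, 3, 4, 4, 5, 6]] * 4)
y = np.array([[1]] * 4)


def generator_aa(a, y, batch_size):
    while True:
        indices = np.random.permutation(len(a))
        for i in range(0, len(indices), batch_size):
            batch_indices = indices[i:i + batch_size]
            yield a[batch_indices], y[batch_indices]


def fits_shape(arr, spec_shape):
    """Hypothetical helper: True if arr matches spec_shape; None matches any size."""
    return arr.ndim == len(spec_shape) and all(
        want is None or want == got for want, got in zip(spec_shape, arr.shape)
    )


gen = generator_aa(a, y, 3)
batch_sizes = []
for _ in range(2):  # one pass over 4 rows with batch_size 3 is 2 batches
    xb, yb = next(gen)
    assert fits_shape(xb, (None, 13)) and fits_shape(yb, (None, 1))
    batch_sizes.append(len(xb))
print(batch_sizes)  # [3, 1]
```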

However, generator_aa loops forever (while True), so model.fit would never reach the end of an epoch and the code would run in an infinite loop. To overcome this, I added steps_per_epoch to model.fit:

steps_per_epoch = len(a) // batch_size  # number of batches in one pass over a
model.fit(dataset, epochs=3, steps_per_epoch=steps_per_epoch)
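The effect of steps_per_epoch can be checked without TensorFlow as well. The sketch below mimics how model.fit consumes an infinite dataset by pulling steps_per_epoch batches from generator_aa with itertools.islice, and records which rows were seen. The helper rows_seen_in_one_epoch is hypothetical, and the math.ceil variant is an alternative not used in the thread; it only matters when len(a) is not a multiple of batch_size.

```python
import itertools
import math

import numpy as np


def generator_aa(a, y, batch_size):
    # Infinite generator from the thread: reshuffles and starts over forever.
    while True:
        indices = np.random.permutation(len(a))
        for i in range(0, len(indices), batch_size):
            batch_indices = indices[i:i + batch_size]
            yield a[batch_indices], y[batch_indices]


def rows_seen_in_one_epoch(n_rows, batch_size, steps_per_epoch):
    """Hypothetical helper: take steps_per_epoch batches and return the sorted row ids."""
    a = np.arange(n_rows).reshape(n_rows, 1)  # each row stores its own index
    y = np.zeros((n_rows, 1))
    seen = []
    for xb, _ in itertools.islice(generator_aa(a, y, batch_size), steps_per_epoch):
        seen.extend(xb[:, 0].tolist())
    return sorted(seen)


# 4 rows, batch_size 2: len(a) // batch_size = 2 steps covers every row once.
print(rows_seen_in_one_epoch(4, 2, 4 // 2))              # [0, 1, 2, 3]

# 5 rows, batch_size 2: floor division gives 2 steps, so only 4 of 5 rows are seen;
# math.ceil gives 3 steps, and the last batch holds a single row.
print(len(rows_seen_in_one_epoch(5, 2, 5 // 2)))         # 4
print(rows_seen_in_one_epoch(5, 2, math.ceil(5 / 2)))    # [0, 1, 2, 3, 4]
```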

Please refer to this gist for a working code example. Thank you.

Thank you very much!