Help: model with various input shapes (csv floats and integer sequences)

Hello everyone,

I built a model that reads its input from a CSV file. Everything works fine with the first four features, but as soon as I add the fifth one, problems arise. Each entry of feature 5 is a NumPy array (shape=(4,288)) stored as a string in the CSV file.

The csv file has the following structure:
Feature1,Feature2,Feature3,Feature4,Labels,Feature5
13.37,33.09,-0.08,992.2,nass,"[[1, 160, 246, 255], … ,[1, 160, 246, 255]]"
26.37,33.03,-0.08,992.2,trocken,"[[110, 160, 246, 255], … ,[20, 160, 246, 255]]"
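
To make the layout concrete, here is a minimal sketch that reads one row of this shape with Python's standard csv module. The sample is shortened and the two-row Feature5 value is made up for illustration; the real entries are much longer. The point is only that the quoted cell comes back as one plain string, not as an array. pandas.read_csv treats such a column the same way and gives it dtype object.

```python
import csv
import io

# Shortened sample in the same layout; the Feature5 value is invented for illustration.
sample = (
    "Feature1,Feature2,Feature3,Feature4,Labels,Feature5\n"
    '13.37,33.09,-0.08,992.2,nass,"[[1, 160, 246, 255], [1, 160, 246, 255]]"\n'
)

rows = list(csv.DictReader(io.StringIO(sample)))
cell = rows[0]["Feature5"]

# The double quotes keep the commas inside the brackets in a single field,
# so the whole nested list arrives as one str.
print(type(cell).__name__)
print(cell)
```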

I use pandas to read the csv file.

CODE:

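import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Activation, Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical


def build_model(num_features: int, num_classes: int) -> Sequential:
    model = Sequential()
    # Two hidden layers with 100 units each, followed by a softmax output
    model.add(Dense(units=100, input_shape=(num_features,)))
    model.add(Activation("relu"))
    model.add(Dense(units=100, input_shape=(num_features,)))
    model.add(Activation("relu"))
    model.add(Dense(units=num_classes))
    model.add(Activation("softmax"))
    model.summary()
    return model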


df = pd.read_csv(r'D:\pyenv\Daten\gps_time_Regenverlauf2.csv', skiprows=[1])

# Map the German surface labels (dry, damp, wet, standing water,
# snow cover, slush, thin snow layer) to three classes
df = df.replace(['trocken','feucht','nass','Wasser steht','Schnee(decke)','Schnee(matsch)','duenne Schneeschicht'],[0,1,1,1,2,2,2])
df = df.dropna()

for i in range(0,999):
    df.iat[i,999] = df.iat[i,999].strip("'")

print("Data:", df.head())

labels = df.pop('Labels')
labels = np.asarray(labels).astype('float32')

data = np.array(df.loc[:1000, ['Feature1', 'Feature2', 'Feature3', 'Feature4', 'Feature5']])
#data = np.array(df.loc[:1000, ['Feature1', 'Feature2', 'Feature3', 'Feature4']])

x_train, x_test, labels_train, labels_test = train_test_split(data, labels, test_size=0.20, random_state=42)

num_features = 5
num_classes = 3

labels_train = to_categorical(labels_train, num_classes=num_classes, dtype=np.float32)
labels_test = to_categorical(labels_test, num_classes=num_classes, dtype=np.float32)

model = build_model(num_features, num_classes)
model.compile(loss=tf.keras.losses.CategoricalCrossentropy(), optimizer=tf.keras.optimizers.Adam(), metrics=["accuracy"])
model.fit(x_train, labels_train, epochs=50, verbose=1, validation_data=(x_test, labels_test))
model.summary()

# x_test is used here because no normalized copy is created in this snippet
model.evaluate(x_test, labels_test, verbose=2)
x_pred = model.predict(x_test)

I get the following error.

ERROR:

 dense_2 (Dense)             (None, 3)                 303

 activation2 (Activation)   (None, 3)                 0

=================================================================
Total params: 10,903
Trainable params: 10,903
Non-trainable params: 0
_________________________________________________________________
Traceback (most recent call last):
  File "classification_DNN.py", line 165, in <module>
    model.fit(x_train, labels_train, epochs=10, verbose=1,validation_data=(x_test , labels_test))
  File "D:\pyenv.venv\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "D:\pyenv.venv\lib\site-packages\tensorflow\python\framework\constant_op.py", line 102, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).

How can I make the model work with feature 5 ?
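
As far as I can tell, the "Unsupported object type" message usually means the array handed to TensorFlow has dtype object. Below is a small sketch of how that comes about when four floats and one text cell sit in the same row. The Feature5 string is a shortened, made-up value, and the try/except is only there to show the failure without stopping the script.

```python
import numpy as np

# One row shaped like the DataFrame: four floats plus Feature5 still as text.
# The Feature5 string is a shortened, invented value.
row = [13.37, 33.09, -0.08, 992.2, "[[1, 160, 246, 255]]"]
data = np.array([row], dtype=object)

# Mixed floats and strings can only be held as generic Python objects.
print(data.dtype)

try:
    data.astype("float32")
    converted = True
except ValueError as err:
    # The text cell cannot be read as a single float.
    converted = False
    print("conversion failed:", err)
```

So the string has to be parsed into numbers before the array can become a float32 tensor.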

I was thinking about reshaping feature 5, but then I would still have a sequence. I read that for sequences I should use an RNN with LSTM layers, but I don't know how to combine that with the other input features.
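
For the reshape idea, one simple option is to flatten each parsed feature 5 block into a long vector and append it to the four scalar features, so the existing Dense model can take everything as one input. This is only a sketch with toy sizes and random numbers (the names scalars, sequences and combined are made up here), and it gives up the sequence structure. Keeping that structure would need a model with two inputs, for example via the Keras functional API, which is not shown here.

```python
import numpy as np

n_samples, seq_len, channels = 6, 3, 4  # toy sizes, not the real ones

# Stand-ins for Feature1..Feature4 and for the already parsed Feature5 blocks.
scalars = np.random.rand(n_samples, 4).astype("float32")
sequences = np.random.rand(n_samples, seq_len, channels).astype("float32")

# Flatten each sample's 2D block into one vector: (6, 3, 4) -> (6, 12).
flat = sequences.reshape(n_samples, -1)

# Put the scalar features and the flattened block side by side: (6, 16).
combined = np.concatenate([scalars, flat], axis=1)

# This is the value build_model would need as num_features.
num_features = combined.shape[1]
print(combined.shape, num_features)
```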

All help will be much appreciated.

Cheers, Mo

Hi @Moritz

Have you tried converting the list to a NumPy array and changing the dtype to float32? This worked for me.
You can do it like this:

Feature5=np.asarray(Feature5).astype(np.float32)

Hi @Shirshak_Ghatak

Thanks for your time. That might be the issue, but I'm not sure. I tried different ways to convert the CSV string back to a NumPy array, but it still doesn't work.

I thought the following line of code would take care of the conversion, because I get only the error described above when I use just this line:

data = np.array(df.loc[:1000, ['Feature1', 'Feature2', 'Feature3', 'Feature4', 'Feature5']])

But it fails differently when I read the CSV file with pandas into df, try to remove the ' characters with strip, and then convert the column myself:

for i in range(0,999):
    df.iat[i,10] = df.iat[i,10].strip("'")
    
data = df.pop('Feature5')
data = np.asarray(data).astype('float32') 

I get the following ERROR:

Traceback (most recent call last):
  File "classification_DNN_forum.py", line 78, in <module>
    data = np.asarray(data).astype('float32')
ValueError: could not convert string to float: '[[1, 160, 246, 255], [1, 160, 246, 255], ...

But I’m not sure how to do it properly. Any ideas how to fix that?

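Not sure if this is the best way to solve it, but you could try the following method:

import numpy as np

ar = "[[1,2,3,4],[3,4,6,7]]"
emp = []
i = 0
temp = []
for ch in ar:  # ch instead of str, which would shadow the built-in
    # Skip brackets and commas; every other character is read as one digit,
    # so this only handles single-digit values without spaces
    if all([ch != '[', ch != ']', ch != ',']):
        i += 1
        num = int(ch)
        temp.append(num)
    if i == 4:  # a row of four values is complete
        emp.append(temp)
        temp = []
        i = 0
emp = np.array(emp).astype('float32')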

Then do the same for all the data points
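
Note that the loop above walks the string one character at a time, so a value like 160 would be read as three separate digits and a space would make int() fail. A possible variant, sketched here with a hypothetical helper called parse_rows, pulls out whole numbers with a regular expression first and then groups them into rows of four. The row length of 4 is taken from the sample data in this thread.

```python
import re

def parse_rows(text, row_len=4):
    """Parse a string like "[[1, 160, ...], ...]" into a list of integer rows."""
    # findall returns whole numbers, so "160" stays one value and spaces are ignored.
    numbers = [int(tok) for tok in re.findall(r"-?\d+", text)]
    # Slice the flat list into consecutive rows of row_len values.
    return [numbers[i:i + row_len] for i in range(0, len(numbers), row_len)]

rows = parse_rows("[[1, 160, 246, 255], [20, 160, 246, 255]]")
print(rows)
```

The result is a nested list that can go straight into np.array(rows).astype('float32').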


Hi @Shirshak_Ghatak

I tried the following

CODE:

for k in range(1,999):
    ar=df.loc[k, ['feature5']].to_string(header=None)
    i=0
    temp=[]
    for str in ar:
        if all([str!='[',str!=']',str!=',']):
            i+=1
            num=int(str)
            temp.append(num)
        if i==4:
            emp.append(temp)
            temp=[]
            i=0
    emp=np.array(emp).astype('float32')

and I get this

ERROR:

classification_DNN_forum.py:91: DeprecationWarning: string or file could not be read to its end due to unmatched data; this will raise a ValueError in the future.
  emp = np.fromstring(ar, dtype='float32', sep=',')
Traceback (most recent call last):
  File "classification_DNN_forum.py", line 98, in <module>
    num=int(str)
ValueError: invalid literal for int() with base 10: 'R'

so I replaced int() with float() and got this

ERROR:

Traceback (most recent call last):
  File "classification_DNN_forum.py", line 98, in <module>
    num=float(str)
ValueError: could not convert string to float: 'R'

Could you find out where this 'R' is coming from in your feature5? I thought it consisted only of numerical 2D matrices cast to strings :sweat_smile:. Then I could help you further.

@Shirshak_Ghatak thanks for your help! I'm sure your solution works too. I don't know where the 'R' comes from. Maybe from the header, but the header shouldn't be in ar.
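
One possible source, and this is only a guess that the thread does not confirm: selecting with a list, as in df.loc[k, ['feature5']], returns a one-element Series, and its to_string() output starts with the column label before the value. If the real column name begins with 'R', that would be the first character the loop sees. A small sketch with a made-up column name "Regen":

```python
import pandas as pd

# "Regen" is a made-up column name, chosen only so that the label starts with 'R'.
df_demo = pd.DataFrame({"Regen": ["[[1, 160, 246, 255]]"]})

# A list selector returns a one-element Series; its text form begins with the label.
as_text = df_demo.loc[0, ["Regen"]].to_string(header=False)
print(repr(as_text))

# A scalar selector returns the stored cell itself, without any label.
cell = df_demo.loc[0, "Regen"]
print(repr(cell))
```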

This is how I solved it for now:

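import ast
import numpy as np

arrays = []
for i in range(599):
    # Turn the stored string back into a nested list, then into an array
    my_list = ast.literal_eval(df['feature5'][i])
    arrays.append(np.array(my_list))
# Stack once at the end; starting from np.empty([289,4]) and using vstack
# would leave an uninitialized first block in the result
data = np.stack(arrays)  # shape (599, 289, 4) if every entry is 289 x 4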

Happy to try and help :grin:
