U-Net trains very slowly

Hello everyone!

I have written a U-Net in Python with TensorFlow and Keras. With it I want to predict CT images from MR images. It’s nothing fancy and the code is very well commented:

import csv
import nibabel as nib
import tensorflow as tf

# U-Net architecture
def UNet_model_2D(img_height, img_width, clr_channels):
    
    inputs = tf.keras.layers.Input(shape=(img_height, img_width, clr_channels))
    
    # contraction path
    c1 = tf.keras.layers.Conv2D(32, (3,3), activation='relu', kernel_initializer='he_normal', padding='same')(inputs)
    c1 = tf.keras.layers.Dropout(0.1)(c1)
    c1 = tf.keras.layers.Conv2D(32, (3,3), activation='relu', kernel_initializer='he_normal', padding='same')(c1)
    p1 = tf.keras.layers.MaxPooling2D((2,2))(c1)
    
    # ...
    
    # base
    c5 = tf.keras.layers.Conv2D(512, (3,3), activation='relu', kernel_initializer='he_normal', padding='same')(p4)
    c5 = tf.keras.layers.Dropout(0.3)(c5)
    c5 = tf.keras.layers.Conv2D(512, (3,3), activation='relu', kernel_initializer='he_normal', padding='same')(c5)
    
    # expansive path 
    u6 = tf.keras.layers.Conv2DTranspose(256, (2,2), strides=(2,2), padding='same')(c5)
    u6 = tf.keras.layers.concatenate([u6, c4])
    c6 = tf.keras.layers.Conv2D(256, (3,3), activation='relu', kernel_initializer='he_normal', padding='same')(u6)
    c6 = tf.keras.layers.Dropout(0.2)(c6)
    c6 = tf.keras.layers.Conv2D(256, (3,3), activation='relu', kernel_initializer='he_normal', padding='same')(c6)
     
    # ...
    
    outputs = tf.keras.layers.Conv2D(1, (1, 1), activation='linear')(c9) 
    model = tf.keras.Model(inputs=[inputs], outputs=[outputs])
    
    # configure the model with optimizer and loss
    model.compile(optimizer='adam', loss='mean_absolute_error')

    model.summary()
    
    return model

################################ main program #################################

# load training data concatenated along 0th dimension
X = nib.load('MRIs_train.nii.gz').get_fdata()
Y = nib.load('CTs_train.nii.gz').get_fdata()

# load validation data concatenated along 0th dimension
x_val = nib.load('MRIs_val.nii.gz').get_fdata()
y_val = nib.load('CTs_val.nii.gz').get_fdata()

# files containing the loss values over time will be added to this folder
callbacks = [tf.keras.callbacks.TensorBoard(log_dir='log_survival')]
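# (the loss curves can then be viewed with: tensorboard --logdir log_survival)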
epochs = 50

# get shape of concatenated MRIs/CTs
s = X.shape

model = UNet_model_2D(s[1], s[2], 1)

# fit() returns a History object whose '.history' attribute records the
# training loss, metrics, validation loss, and validation metrics per epoch
results = model.fit(
    x=X,  # input data
    y=Y,  # ground truth data
    batch_size=16, 
    epochs=epochs,
    verbose=1,
    callbacks=callbacks, 
    validation_data=(x_val, y_val),
    use_multiprocessing=True,
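    # (as far as I understand the Keras docs, use_multiprocessing only has an
    # effect for generator / keras.utils.Sequence inputs, so with plain NumPy
    # arrays like these it may be ignored)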
)

# per-epoch loss curves, accessed by key instead of dict position
train_loss = results.history['loss']
val_loss = results.history['val_loss']

# write/append the loss curves to CSV files
with open('log_train_loss.csv', 'a', newline='') as f:
    csv.writer(f).writerow(train_loss)

with open('log_val_loss.csv', 'a', newline='') as f:
    csv.writer(f).writerow(val_loss)

model.save(f'pCT_2D_{epochs}ep', save_format='tf')  # export in SavedModel format

I started training over 5 hours ago (112 concatenated 3D MR/CT volumes as training data and 25 as validation data), and it’s not even through the first of 50 epochs.

My guess is that Spyder is not using the available 32 processor cores and/or the Quadro GV100 (32 GB) GPU. I also have 512 GB of RAM at my disposal.
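To check whether TensorFlow even sees the GPU from the environment Spyder runs in, I assume a minimal check like this would tell me:

import tensorflow as tf

# list the GPUs TensorFlow can see; an empty list means training
# runs entirely on the CPU
print(tf.config.list_physical_devices('GPU'))

# check whether this TensorFlow build was compiled with CUDA support
print(tf.test.is_built_with_cuda())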

The only thing I did in the Python code regarding multiprocessing was to set use_multiprocessing=True.
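If the GPU is visible, I could additionally turn on device placement logging before building the model, so every op reports whether it lands on the GPU or the CPU (a minimal sketch):

import tensorflow as tf

# log the device (CPU/GPU) every operation is placed on;
# must be called before any ops are created
tf.debugging.set_log_device_placement(True)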

Thanks in advance for ANY support!