Hi everyone, this is my first post here.

I’ve found a way to run my inference on my 2 GPUs, but it takes as much time as running it on 1. I am quite new to TensorFlow and multi-GPU setups, so I reckon I need some help to:

- understand whether I am going in the right direction
- possibly improve what’s been done so far

My inputs are 3D volumes, reshaped into patches of shape (16, 16, 16, 64, 64, 64). Here is the code:

```
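# multi-GPU try 1
# Imports assumed by this snippet (pathLogDir, folder_name, weights_name,
# patches and number_patchify are defined earlier in my script)
import time

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import model_from_json
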
model_path = pathLogDir+'/'+folder_name+ "/"+weights_name
json_file = open(model_path+".json", 'r')
#json_file = open(newpath+'model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)
loaded_model.load_weights(model_path+".h5")
new_patches = np.reshape(patches, (-1, number_patchify, number_patchify, number_patchify))
print(new_patches.shape)
strategy = tf.distribute.MirroredStrategy()
print('Number of devices: {}'.format(strategy.num_replicas_in_sync))
outputs = []
new_dimension = 3
print("patches shape: " + str(patches.shape))
dataset = tf.data.Dataset.from_tensor_slices(new_patches).batch(4)
distributed_dataset = strategy.experimental_distribute_dataset(dataset)
print("number of batches: " + str(tf.data.experimental.cardinality(dataset).numpy()))
@tf.function
def inference_step(inputs):
    # Run the forward pass on the model
    predicted = strategy.run(loaded_model, args=(inputs,))
    return strategy.reduce(tf.distribute.ReduceOp.MEAN, predicted, axis=None)
total_time = 0.0
begin = time.time()
for batch in distributed_dataset:
    # Collect the per-replica tensors and flatten them back into one array of patches
    tensors_tuple = batch.values
    batch_array = tf.identity(tensors_tuple)
    batch_array = np.reshape(batch_array, (-1, number_patchify, number_patchify, number_patchify))
    # Add a channel axis and repeat it 3 times
    input_array_with_new_dim = batch_array[..., np.newaxis]
    output_array = np.repeat(input_array_with_new_dim, 3, axis=-1)
    predicted = inference_step(output_array.astype(np.float32))
    # Threshold the predictions into a boolean mask
    predicted = predicted >= 0.5
    outputs.append(predicted)
# Measure the elapsed time once, after the whole loop
total_time += time.time() - begin
print('Total time on 2 GPUs: ', total_time)
#outputs = np.concatenate(outputs)
outputs = np.array(outputs)
# Reshape the concatenated outputs to match the original image
print(outputs.shape)
outputs_reshaped = np.reshape(outputs, (patches.shape[0],patches.shape[1],patches.shape[2],number_patchify,number_patchify,number_patchify))
print(outputs_reshaped.shape)
```
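
To make the batch arithmetic behind the "number of batches" print concrete, here is a minimal sketch. It assumes the 16 x 16 x 16 patch grid and `.batch(4)` from the snippet, 2 replicas, and that `strategy.experimental_distribute_dataset` treats the dataset batch size as the global batch size and splits it across replicas, as the TensorFlow documentation describes. The helper names `num_batches` and `per_replica_batch_size` are made up for illustration, and the sketch only covers the evenly divisible case.

```python
import math


def num_batches(num_patches: int, global_batch_size: int) -> int:
    """Number of batches a dataset yields when the last partial batch is kept."""
    return math.ceil(num_patches / global_batch_size)


def per_replica_batch_size(global_batch_size: int, num_replicas: int) -> int:
    """Patches in each replica's slice of one global batch (even split only)."""
    if global_batch_size % num_replicas != 0:
        # Assumption of this sketch: uneven splits are simply not handled here.
        raise ValueError("this sketch expects the global batch size to be divisible by the replica count")
    return global_batch_size // num_replicas


# Assumed values from the post: a 16 x 16 x 16 grid of patches, batch(4), 2 GPUs.
total_patches = 16 * 16 * 16
print(num_batches(total_patches, 4))   # 1024
print(per_replica_batch_size(4, 2))    # 2
```

With these numbers, each replica's slice of the distributed dataset holds only 2 patches per step; a larger global batch size gives each replica proportionally more.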

Predictions work: I get my volumes predicted and both of my GPUs are used, but it takes as much time as when only 1 is used. Also, my GPUs only reach about 40-50% utilization at most (this depends heavily on the batch size, of course). I am really reaching the end of my knowledge here, and can’t seem to find anything online after weeks of searching.
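
For comparing the 1-GPU and 2-GPU timings fairly, here is a minimal sketch of a timing helper. It assumes the step function is a `tf.function`, whose first call includes graph tracing, so it runs untimed warm-up calls before measuring. `time_calls` and the dummy `step` are hypothetical names; the dummy only stands in for `inference_step` so the sketch runs without TensorFlow.

```python
import time
from typing import Callable, Sequence


def time_calls(step: Callable[[object], object],
               batches: Sequence[object],
               warmup: int = 1) -> float:
    """Return total wall-clock seconds spent calling `step` on every batch,
    after `warmup` untimed calls."""
    for batch in batches[:warmup]:
        step(batch)  # untimed: absorbs one-off costs such as tracing
    start = time.perf_counter()
    for batch in batches:
        step(batch)
    return time.perf_counter() - start


# Dummy stand-in for inference_step, only to show usage.
def step(batch):
    return [x * 2 for x in batch]


elapsed = time_calls(step, [[1, 2], [3, 4]])
print(f"total time: {elapsed:.6f} s")
```

With real GPU work it may also be worth pulling each result back to the host inside the step (for example with `.numpy()`), so that the timer covers the computation itself and not just the time to launch it.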

Thanks in advance, everyone.