Calculated FLOPs in TensorFlow and PyTorch are not equal?

Given the same model, I found that the FLOPs calculated in PyTorch and TensorFlow are different. I used keras-flops (keras-flops · PyPI) in TensorFlow and ptflops (ptflops · PyPI) in PyTorch to calculate FLOPs. The FLOPs reported in PyTorch are close to my own hand calculation. Does TensorFlow use some tricks to speed up the computation so that fewer FLOPs are measured? My model in TensorFlow:

from tensorflow.keras.layers import Input, Conv2D, PReLU, Conv2DTranspose
from tensorflow.keras.models import Model

d = 56
s = 12

inp = Input((750, 750, 1))
x = Conv2D(d, (5, 5), padding='same')(inp)
x = PReLU()(x)

x = Conv2D(s, (1, 1), padding='valid')(x)
x = PReLU()(x)

x = Conv2D(s, (3, 3), padding='same')(x)
x = PReLU()(x)
x = Conv2D(s, (3, 3), padding='same')(x)
x = PReLU()(x)
x = Conv2D(s, (3, 3), padding='same')(x)
x = PReLU()(x)
x = Conv2D(s, (3, 3), padding='same')(x)
x = PReLU()(x)

x = Conv2D(d, (1, 1), padding='same')(x)
x = PReLU()(x)
out = Conv2DTranspose(1, (9, 9), strides=(4, 4), padding='same', output_padding=3)(x)
model = Model(inp, out)
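
For reference, a minimal sketch of how the report below can be produced with keras-flops; get_flops is the library's documented entry point, and model is the one built above:

from keras_flops import get_flops

flops = get_flops(model, batch_size=1)
print(f"The FLOPs is: {flops / 1e9:.1f} GFlops")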

The FLOPs report in TensorFlow:

node name | # float_ops
Conv2D                   8.92b float_ops (100.00%, 61.95%)
Conv2DBackpropInput      5.10b float_ops (38.05%, 35.44%)
Neg                      180.00m float_ops (2.61%, 1.25%)
BiasAdd                  105.75m float_ops (1.36%, 0.73%)
Mul                      90.00m float_ops (0.63%, 0.63%)

======================End of Report==========================
The FLOPs is: 14.3 GFlops

However, the FLOPs in PyTorch are:

Model_1(
  0.013 M, 100.000% Params, 45.486 GMac, 100.000% MACs, 
  (begin): Sequential(
    0.002 M, 11.804% Params, 0.851 GMac, 1.870% MACs, 
    (0): Conv2d(0.001 M, 11.367% Params, 0.819 GMac, 1.801% MACs, 1, 56, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): PReLU(0.0 M, 0.437% Params, 0.032 GMac, 0.069% MACs, num_parameters=56)
  )
  (middle): Sequential(
    0.007 M, 52.775% Params, 3.803 GMac, 8.360% MACs, 
    (0): Conv2d(0.001 M, 5.340% Params, 0.385 GMac, 0.846% MACs, 56, 12, kernel_size=(1, 1), stride=(1, 1))
    (1): PReLU(0.0 M, 0.094% Params, 0.007 GMac, 0.015% MACs, num_parameters=12)
    (2): Conv2d(0.001 M, 10.212% Params, 0.736 GMac, 1.618% MACs, 12, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): PReLU(0.0 M, 0.094% Params, 0.007 GMac, 0.015% MACs, num_parameters=12)
    (4): Conv2d(0.001 M, 10.212% Params, 0.736 GMac, 1.618% MACs, 12, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): PReLU(0.0 M, 0.094% Params, 0.007 GMac, 0.015% MACs, num_parameters=12)
    (6): Conv2d(0.001 M, 10.212% Params, 0.736 GMac, 1.618% MACs, 12, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): PReLU(0.0 M, 0.094% Params, 0.007 GMac, 0.015% MACs, num_parameters=12)
    (8): Conv2d(0.001 M, 10.212% Params, 0.736 GMac, 1.618% MACs, 12, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): PReLU(0.0 M, 0.094% Params, 0.007 GMac, 0.015% MACs, num_parameters=12)
    (10): Conv2d(0.001 M, 5.684% Params, 0.409 GMac, 0.900% MACs, 12, 56, kernel_size=(1, 1), stride=(1, 1))
    (11): PReLU(0.0 M, 0.437% Params, 0.032 GMac, 0.069% MACs, num_parameters=56)
  )
  (final): ConvTranspose2d(0.005 M, 35.420% Params, 40.833 GMac, 89.770% MACs, 56, 1, kernel_size=(9, 9), stride=(4, 4), padding=(4, 4), output_padding=(3, 3))
)
Computational complexity:       45.49 GMac
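
For completeness, here is a reconstruction of the PyTorch side read off the printout above (the class name, module layout, and layer arguments come from the report; the forward order is an assumption), measured with ptflops' get_model_complexity_info:

import torch.nn as nn
from ptflops import get_model_complexity_info

class Model_1(nn.Module):
    """Reconstructed from the ptflops printout above."""
    def __init__(self, d=56, s=12):
        super().__init__()
        self.begin = nn.Sequential(
            nn.Conv2d(1, d, kernel_size=5, padding=2),
            nn.PReLU(num_parameters=d),
        )
        self.middle = nn.Sequential(
            nn.Conv2d(d, s, kernel_size=1),
            nn.PReLU(num_parameters=s),
            nn.Conv2d(s, s, kernel_size=3, padding=1),
            nn.PReLU(num_parameters=s),
            nn.Conv2d(s, s, kernel_size=3, padding=1),
            nn.PReLU(num_parameters=s),
            nn.Conv2d(s, s, kernel_size=3, padding=1),
            nn.PReLU(num_parameters=s),
            nn.Conv2d(s, s, kernel_size=3, padding=1),
            nn.PReLU(num_parameters=s),
            nn.Conv2d(s, d, kernel_size=1),
            nn.PReLU(num_parameters=d),
        )
        self.final = nn.ConvTranspose2d(d, 1, kernel_size=9, stride=4,
                                        padding=4, output_padding=3)

    def forward(self, x):
        return self.final(self.middle(self.begin(x)))

macs, params = get_model_complexity_info(
    Model_1(), (1, 750, 750), as_strings=True, print_per_layer_stat=True)
print(f"Computational complexity: {macs}")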

Can anyone help, please?


@Bhack @seanpmorgan any concrete leads here?


Having an official tool is a quite popular feature request in TF.

In the meantime, I think you need to reach out to these two third-party projects' repos about their specific implementations.


Thanks for the reply.
I think keras-flops (keras-flops · PyPI) calls the official tool's function.
Does TensorFlow use some tricks to speed up the computation so that fewer FLOPs are measured?


I don’t know the ptflops internals, and it is using the GMac metric.

You could try giving it a run with FB's semi-official FLOPs tool.
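
The original link is not preserved in this thread; assuming the tool meant is fvcore's FlopCountAnalysis (an assumption), a minimal sketch:

import torch
from fvcore.nn import FlopCountAnalysis

model = Model_1()                         # the PyTorch model reconstructed above
inputs = torch.randn(1, 1, 750, 750)
flops = FlopCountAnalysis(model, inputs)
print(flops.total())  # note: fvcore counts one multiply-add as one "flop", i.e. MACs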


@markdaoust @jbgordon this thread leads me to request a thorough tutorial on reporting FLOPs and similar metrics.


There’s too much going on in the initial post. Start by comparing individual layers, not whole models. That will make things easier to untangle.

My first impression is that you're not measuring the same thing. Do we know why, in the PT model, 90% of the GMac comes from the final ConvTranspose2d layer, while TensorFlow attributes only 5.10b float_ops to the corresponding Conv2DBackpropInput?

"MAC" stands for "multiply-accumulate": one MAC is one multiply plus one add, so the exchange rate is 2 FLOPs per MAC. The Conv2D layers come to ~8.9 GFLOPs (TF) or ~4.5 GMac (PT). So that part makes sense.
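
To make the exchange rate concrete, here is a worked check for the first conv layer (shapes taken from the model above; counting a +1 per output element for the bias, which is how the ptflops figures above appear to be computed):

# First layer: 1 -> 56 channels, 5x5 kernel, 750x750 'same' output.
cin, cout, k, h, w = 1, 56, 5, 750, 750
macs = (cin * k * k + 1) * cout * h * w   # +1 per output element for the bias
flops = 2 * macs                          # one multiply + one add per MAC
print(f"{macs / 1e9:.3f} GMac, {flops / 1e9:.3f} GFLOPs")  # 0.819 GMac, 1.638 GFLOPs

The 0.819 GMac matches the ptflops line for Conv2d(1, 56, ...) above.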


Thanks for the clarification.
Yes, the deconvolution is a bit weird.
I tried to calculate it myself as follows.
The FLOPs for the deconvolution:
Cout * (1 + Cin * k * k) * Hout * Wout
= 1 * (1 + 56 * 9 * 9) * 3000 * 3000
= 40.83 G

This value is close to the PyTorch result (40.833 GMac), but different from what TensorFlow reports.
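
Both reported numbers can at least be reproduced arithmetically (a sketch; the suggestion about which grid TensorFlow's profiler uses to size Conv2DBackpropInput is an inference from the numbers, not confirmed from its source):

# Transposed conv: 56 -> 1 channels, 9x9 kernel, stride 4, 750x750 -> 3000x3000.
cin, cout, k = 56, 1, 9
h_in = w_in = 750
h_out = w_out = 3000

# Output-centric count, matching the ptflops line for ConvTranspose2d:
pt_macs = cout * (cin * k * k + 1) * h_out * w_out
print(f"{pt_macs / 1e9:.3f} GMac")      # 40.833

# TF's 5.10b float_ops is reproduced if Conv2DBackpropInput is sized by
# the 750x750 grid instead (an inference, not confirmed from TF's source):
tf_flops = 2 * cin * k * k * cout * h_in * w_in
print(f"{tf_flops / 1e9:.2f} GFLOPs")   # 5.10

The ratio between the two, 2 * 40.833 / 5.10 ≈ 16, is exactly the stride squared.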


Thanks, this will head off endless, bottomless debates about GPUs.


But if you look at the softmax activation function, it contains the calculation of e to the power x.
That will be counted as FLOPs, not MACs.
My understanding is that one cannot simply divide FLOPs by 2 to get MACs.
Please correct me if I am wrong.
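
As a quick tally of softmax's cost over N inputs (a sketch; exact counts depend on the implementation):

\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}}
\qquad \text{cost: } N \text{ exponentials} + (N - 1) \text{ additions} + N \text{ divisions}

None of these operations pair into multiply-adds, so dividing FLOPs by 2 will not recover MACs for such a layer.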


Yes, I agree with you.
2 FLOPs = 1 MAC is an approximate estimate.
The difference will depend on the model itself.


Does anyone think there is a miscalculation in my equation, since TensorFlow's count_flops reports a different answer?
Thanks


Ah, the return of the revenge of "how does one calculate a FLOP". I thought this debate had been settled with tensor cores (true, the GPUs in question lack them); we are not in 1995, wondering about the real power of a Cray T3E-1200.
