Calculated FLOPs in TensorFlow and PyTorch are not equal?

Given the same model, I found that the calculated FLOPs in PyTorch and TensorFlow are different. I used keras_flops (keras-flops · PyPI) in TensorFlow and ptflops (ptflops · PyPI) in PyTorch to calculate FLOPs. The FLOPs in PyTorch seem close to my own calculation by hand. Does TensorFlow have some tricks to speed up the computation so that fewer FLOPs are measured? My model in TensorFlow:

from tensorflow.keras.layers import Input, Conv2D, PReLU, Conv2DTranspose

d = 56
s = 12

inp = Input((750, 750, 1))
x = Conv2D(d, (5, 5), padding='same')(inp)
x = PReLU()(x)

x = Conv2D(s, (1, 1), padding='valid')(x)
x = PReLU()(x)

x = Conv2D(s, (3, 3), padding='same')(x)
x = PReLU()(x)
x = Conv2D(s, (3, 3), padding='same')(x)
x = PReLU()(x)
x = Conv2D(s, (3, 3), padding='same')(x)
x = PReLU()(x)
x = Conv2D(s, (3, 3), padding='same')(x)
x = PReLU()(x)

x = Conv2D(d, (1, 1), padding='same')(x)
x = PReLU()(x)
out = Conv2DTranspose(1, (9, 9), strides=(4, 4), padding='same', output_padding=3)(x)
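
The 14.3 GFLOPs figure below came from keras-flops on this graph, roughly as follows (a minimal sketch; get_flops with an explicit batch_size is the package's documented entry point):

from tensorflow.keras import Model
from keras_flops import get_flops

model = Model(inp, out)

# keras-flops profiles a single forward pass at the given batch size.
flops = get_flops(model, batch_size=1)
print(f"FLOPs: {flops / 1e9:.1f} G")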

The FLOPs report in TensorFlow:

node name | # float_ops
Conv2D                   8.92b float_ops (100.00%, 61.95%)
Conv2DBackpropInput      5.10b float_ops (38.05%, 35.44%)
Neg                      180.00m float_ops (2.61%, 1.25%)
BiasAdd                  105.75m float_ops (1.36%, 0.73%)
Mul                      90.00m float_ops (0.63%, 0.63%)

======================End of Report==========================
The FLOPs is: 14.3 GFlops

However, the FLOPs in PyTorch are:

Model_1(
  0.013 M, 100.000% Params, 45.486 GMac, 100.000% MACs, 
  (begin): Sequential(
    0.002 M, 11.804% Params, 0.851 GMac, 1.870% MACs, 
    (0): Conv2d(0.001 M, 11.367% Params, 0.819 GMac, 1.801% MACs, 1, 56, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): PReLU(0.0 M, 0.437% Params, 0.032 GMac, 0.069% MACs, num_parameters=56)
  )
  (middle): Sequential(
    0.007 M, 52.775% Params, 3.803 GMac, 8.360% MACs, 
    (0): Conv2d(0.001 M, 5.340% Params, 0.385 GMac, 0.846% MACs, 56, 12, kernel_size=(1, 1), stride=(1, 1))
    (1): PReLU(0.0 M, 0.094% Params, 0.007 GMac, 0.015% MACs, num_parameters=12)
    (2): Conv2d(0.001 M, 10.212% Params, 0.736 GMac, 1.618% MACs, 12, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): PReLU(0.0 M, 0.094% Params, 0.007 GMac, 0.015% MACs, num_parameters=12)
    (4): Conv2d(0.001 M, 10.212% Params, 0.736 GMac, 1.618% MACs, 12, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): PReLU(0.0 M, 0.094% Params, 0.007 GMac, 0.015% MACs, num_parameters=12)
    (6): Conv2d(0.001 M, 10.212% Params, 0.736 GMac, 1.618% MACs, 12, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): PReLU(0.0 M, 0.094% Params, 0.007 GMac, 0.015% MACs, num_parameters=12)
    (8): Conv2d(0.001 M, 10.212% Params, 0.736 GMac, 1.618% MACs, 12, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): PReLU(0.0 M, 0.094% Params, 0.007 GMac, 0.015% MACs, num_parameters=12)
    (10): Conv2d(0.001 M, 5.684% Params, 0.409 GMac, 0.900% MACs, 12, 56, kernel_size=(1, 1), stride=(1, 1))
    (11): PReLU(0.0 M, 0.437% Params, 0.032 GMac, 0.069% MACs, num_parameters=56)
  )
  (final): ConvTranspose2d(0.005 M, 35.420% Params, 40.833 GMac, 89.770% MACs, 56, 1, kernel_size=(9, 9), stride=(4, 4), padding=(4, 4), output_padding=(3, 3))
)
Computational complexity:       45.49 GMac

Can anyone help, please?

@Bhack @seanpmorgan any concrete leads here?

Having an official tool is a quite popular feature request in TF.

In the meantime, I think you need to interact with these two third-party projects' repos about their specific implementations.

Thanks for the reply.
I think keras-flops (keras-flops · PyPI) calls the official tool's functions under the hood.
Does TensorFlow have some tricks to speed up the computation so that fewer FLOPs are measured?

I don't know the ptflops internals, and it reports the GMac metric.

You could try giving it a run with FB's semi-official FLOPs tool, fvcore.

@markdaoust @jbgordon this thread leads me to request a thorough tutorial on reporting FLOPs and similar metrics.

There’s too much going on in the initial post. Start by comparing individual layers, not whole models. That will make things easier to untangle.

My first impression is that you're not measuring the same thing. Do we know why, in the PT model, 90% of the GMac comes from the final ConvTranspose2d layer, but that's not listed for TensorFlow?

"MAC" is "multiply-accumulate": one multiply plus one add, so 2:1 is the exchange rate between FLOPs and MACs. The Conv2D layers are ~9 GFLOPs (TF) or ~4.5 GMac (PT); summing the PyTorch Conv2d rows above gives 0.819 + 0.385 + 4 × 0.736 + 0.409 ≈ 4.56 GMac, i.e. about 9.1 GFLOPs against TF's 8.92 GFLOPs for Conv2D. So that part makes sense.
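
To isolate the disagreement, here is a minimal sketch of that single-layer comparison for the transposed convolution, assuming the documented entry points of the same two tools used above (keras_flops.get_flops and ptflops.get_model_complexity_info):

import tensorflow as tf
import torch
from keras_flops import get_flops
from ptflops import get_model_complexity_info

# TensorFlow: a model containing only the transposed convolution.
inp = tf.keras.Input((750, 750, 56))
out = tf.keras.layers.Conv2DTranspose(
    1, (9, 9), strides=(4, 4), padding='same', output_padding=3)(inp)
tf_layer = tf.keras.Model(inp, out)
print(f"TF: {get_flops(tf_layer, batch_size=1) / 1e9:.2f} GFLOPs")

# PyTorch: the equivalent standalone layer, matching the ptflops report above.
pt_layer = torch.nn.ConvTranspose2d(
    56, 1, kernel_size=9, stride=4, padding=4, output_padding=3)
macs, _ = get_model_complexity_info(
    pt_layer, (56, 750, 750), as_strings=False, print_per_layer_stat=False)
print(f"PT: {macs / 1e9:.2f} GMac")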

Thanks for the clarification.
Yes, the deconvolution is a bit weird.
I tried to calculate it myself as follows. The multiply-add count for the deconvolution is:

Cout * (1 + Cin * k * k) * Hout * Wout
= 1 * (1 + 56 * 9 * 9) * 3000 * 3000
= 40.83 G

This value is close to the 40.833 GMac that PyTorch calculated, but different from what TensorFlow reported.
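
A quick sanity check of that arithmetic in plain Python (3000 comes from the 750-pixel input upsampled by stride 4):

# Every output pixel of every output channel needs Cin * k * k
# multiply-adds, plus one for the bias.
c_in, c_out, k = 56, 1, 9
h_out = w_out = 750 * 4  # stride-4 upsampling of the 750 x 750 input

macs = c_out * (1 + c_in * k * k) * h_out * w_out
print(f"{macs / 1e9:.3f} GMac")  # 40.833, matching the ptflops report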

Thanks, this will avoid endless and bottomless debates about GPUs.

But if you look at the softmax activation function, it contains the computation of e to the power x. That will be counted as FLOPs, not MACs: a softmax over N logits needs roughly N exponentials, N - 1 additions, and N divisions, none of which pair up as multiply-adds.
So my understanding is that one cannot simply divide FLOPs by 2 to get MACs.
Please correct me if I am wrong.

Yes, I agree with you.
2 FLOPs = 1 MAC is only an approximate estimate.
The difference depends on the model itself.

Does anyone think there is a miscalculation in my equation, given that TensorFlow's FLOPs counter reports a different answer?
Thanks

Well, the return of the revenge of "how does one calculate a FLOP". I thought this debate was behind us with tensor cores (granted, the GPUs in question are missing here); we are not in 1995, wondering about the real power of a Cray T3E-1200.

Hello @mrgreen3325. I do not know if this is still a problem. However, TensorFlow computes FLOPs, while the tools that "compute FLOPs" for PyTorch actually calculate MACs.
According to the information provided in a GitHub issue, you can compute MACs and FLOPs in TensorFlow using the following code snippet:

TensorFlow

import tensorflow as tf
from tensorflow.python.profiler.model_analyzer import profile
from tensorflow.python.profiler.option_builder import ProfileOptionBuilder

def get_flops(model):
  # Trace one forward pass at batch size 1 and profile its graph
  # for float operations.
  forward_pass = tf.function(
      model.call,
      input_signature=[tf.TensorSpec(shape=(1,) + model.input_shape[1:])])
  graph_info = profile(forward_pass.get_concrete_function().graph,
                       options=ProfileOptionBuilder.float_operation())
  return graph_info.total_float_ops

model = tf.keras.applications.ConvNeXtTiny()

# model.compile(optimizer='adam', loss='bce', metrics=['accuracy'])

flops = get_flops(model)
macs = flops / 2
print(f"MACs: {macs / 1e+9:,} G")
print(f"FLOPs: {flops / 1e+9:,} G")
MACs: 4.3900329795 G
FLOPs: 8.780065959 G

Just keep in mind that the source computes MACs.

PyTorch

According to a Medium post, it is possible to compute MACs and FLOPs using the same approach. Therefore, I believe your hand calculation produced MACs, which is why it was close to PyTorch's number. I would suggest using fvcore, the official/semi-official FLOPs counter provided by Facebook/Meta, instead of torchprofile, which was employed in the Medium post.

!pip -q install fvcore
import torch
import torchvision
from fvcore.nn import FlopCountAnalysis

convnexttiny_weights = torchvision.models.ConvNeXt_Tiny_Weights.IMAGENET1K_V1
model = torchvision.models.convnext_tiny(weights=convnexttiny_weights)

inputs = (torch.randn(1, 3, 224, 224),)

# fvcore counts fused multiply-adds (MACs), despite the "Flop" in the name.
analysis = FlopCountAnalysis(model, inputs)
macs = analysis.total()
flops = macs * 2
print(f'MACs = {macs / 1e+9:,} G')
print(f'FLOPs = {flops / 1e+9:,} G')
MACs = 4.470437376 G
FLOPs = 8.940874752 G

If you refer to Figure 2 of the ConvNeXt paper, you will notice that they provide MACs rather than FLOPs for ConvNeXtTiny. ConvNeXt code: TensorFlow, PyTorch

I executed the code on Colab.

Disclaimer: I cannot claim to be familiar with all PyTorch tools. However, I am aware that torchprofile and fvcore are capable of computing MACs.

I hope it helps.
I kindly request that someone correct me if I am mistaken.