Missing values not being detected by StatisticsGen

Hi, I am using a dataset that has many NaN/missing values, but StatisticsGen does not detect them: the missing-values figure for every column reads "0%".

Hi @Aditya_Soni, could you provide a bit more context?
For example, did you follow the TensorFlow documentation, e.g. "Data validation using TFX Pipeline and TensorFlow Data Validation"?
Do you get an error message, aside from the "0%" you are seeing?
Also, could you share a Colab or your code?
Thank you.

@tagoma
Thanks for the response. Yes, I did follow the documentation.
Aside from the "0%", I was not getting any error. In fact, the ExampleValidator component was also reporting that no anomalies were found.

Unfortunately, due to company policy, I cannot share the code.

Could this be a case where a numerical feature is being mistaken for a text feature? What do the NaN and missing values look like?


I wouldn’t expect NaNs to be reported as missing: a NaN is a value that is present, so it does not count toward the missing percentage. The num_nan statistic should report those instead. See https://github.com/tensorflow/metadata/blob/a85e542f292562284f4d2aaa3a93c4d74060b05e/tensorflow_metadata/proto/v0/statistics.proto#L502. If the user wants to get anomalies for NaNs, they will need to set disallow_nan on the feature's float_domain in their schema. See https://github.com/tensorflow/metadata/blob/a85e542f292562284f4d2aaa3a93c4d74060b05e/tensorflow_metadata/proto/v0/schema.proto#L541.
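
To make the distinction concrete, here is a minimal plain-Python sketch of why a column full of NaNs can still show 0% missing. It is not how TFDV computes its statistics; the summarize_column helper and its output keys are hypothetical names chosen for illustration, and it assumes that a missing entry is represented as None while a NaN is an ordinary float value.

```python
import math


def summarize_column(values):
    """Count absent entries and NaN entries separately for one column.

    Hypothetical helper for illustration only: None stands for a missing
    entry, while float("nan") is a value that is present but not a number.
    """
    num_missing = sum(1 for v in values if v is None)
    num_nan = sum(1 for v in values if isinstance(v, float) and math.isnan(v))
    pct_missing = 100.0 * num_missing / len(values) if values else 0.0
    return {
        "num_missing": num_missing,
        "num_nan": num_nan,
        "pct_missing": pct_missing,
    }


# Every row has a value, but two of those values are NaN.
column_with_nans = [1.0, float("nan"), 3.0, float("nan")]
# Two rows have no value at all.
column_with_gaps = [1.0, None, 3.0, None]

nan_summary = summarize_column(column_with_nans)
gap_summary = summarize_column(column_with_gaps)

print(nan_summary)  # pct_missing is 0.0 even though half the values are NaN
print(gap_summary)  # pct_missing is 50.0 and num_nan is 0
```

If the dataset encodes gaps as NaN, the place to look is therefore the NaN count rather than the missing percentage.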