TFX for vision?

Sayak_Paul · May 24, 2021, 5:49am

Great updates from the TFX team at I/O. I am also a newbie in this area and after completing a few lessons from the Coursera MLOps Specialization, I got instantly hooked on TFX. Such an amazing framework! Data validator is my favorite component so far. @Robert_Crowe thank you for teaching it so beautifully.

In this video, I learned about TFX for NLP. As someone heavily into Computer Vision, a question arises:

Will there be core support for Vision inside TFX?

AseiSugiyama · May 24, 2021, 8:26am

FWIW, there are CIFAR10 example pipelines in the tensorflow/tfx repo.

Sayak_Paul · May 24, 2021, 8:41am

Aware of that, mate. TFX for NLP offers utilities specific to NLP tasks, such as easy reporting of topline metrics, seamless integration of preprocessing utilities, and so on.

lgusm · May 24, 2021, 10:48am

maybe @Robert_Crowe can help here

Sayak_Paul · May 25, 2021, 12:10pm

Yes, hence tagged him.

Robert_Crowe · June 3, 2021, 8:56pm

Hi, sorry for the delay in responding! Sayak, you are correct that TFX offers some deeper support for NLP in the Evaluator component, by offering BLEU and ROUGE metrics.

For image processing we don’t yet have the same level of support, but there is an example for this. It shows:

How to use a Keras application and to fine-tune the MobileNet image classification model.
How to perform data augmentation within the pipeline to ensure that the model generalizes well.
It invokes the rewriter to convert the model into TFLite.
Finally it shows how to write the required TFLite metadata to make the model compatible with MLKit.

Sayak_Paul · June 4, 2021, 1:46am

Thanks, @Robert_Crowe. I think it might be good to have support for more involved vision metrics like mAP and IoU, since they are common in object detection and segmentation, respectively. Both of them are quite extensively used.
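
For readers less familiar with IoU, here is a minimal NumPy sketch of how it is typically computed for segmentation masks. It is not a TFMA API; the function names `mask_iou` and `mean_iou` are hypothetical, and treating two empty masks as a perfect match is a convention chosen here, not a standard.

```python
import numpy as np

def mask_iou(pred, target):
    """Intersection over Union for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks are empty: counted as a perfect match (assumed convention).
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(intersection) / float(union)

def mean_iou(pred_labels, true_labels, num_classes):
    """Average the per-class IoU over classes present in either label map."""
    pred_labels = np.asarray(pred_labels)
    true_labels = np.asarray(true_labels)
    ious = []
    for c in range(num_classes):
        p, t = pred_labels == c, true_labels == c
        if not (p.any() or t.any()):
            continue  # class absent from both maps, so it is skipped
        ious.append(mask_iou(p, t))
    return float(np.mean(ious))
```

Skipping absent classes avoids rewarding or penalizing a model for classes that never appear in a given image; other conventions exist.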

Robert_Crowe · June 12, 2021, 6:43pm

Thanks Sayak, and I agree that for object detection and segmentation those would be great metrics to have in TFMA. We also need validation measures in TFDV, and I know that some folks are working on that.

peddy · June 14, 2021, 3:33pm

This is fantastic feedback Sayak - one of the first things I think the TFMA team will ask for when presented with a feature request (FR) for a new metric implementation is a set of large benchmarks (or at least one) they can use to test/finetune the metric implementation.

So if you (or anyone else) are aware of popular tasks defined on large datasets where modelers are interested in computing such accuracy metrics, please do add them.

Sayak_Paul · June 14, 2021, 3:58pm

Sure. Could you shed some more light on the large benchmark you mentioned? Something like state-of-the-art accuracy on the ImageNet-1k dataset? If so, I think you could use TensorFlow Model Garden to see what performance their implementations of image classification models get us to.

Similarly, you will find implementations of the other most commonly used vision tasks: object detection and semantic segmentation. All have well-established baselines on large datasets like MS-COCO, Places365, etc. For detection, mAP is the single most important metric we care about, and for segmentation, IoU should do just fine.
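
As a rough illustration of what goes into mAP, the sketch below computes average precision (AP) for a single class using all-point interpolation of the precision-recall curve. It assumes the detections have already been matched to ground-truth boxes at some IoU threshold, so each detection is flagged as a true or false positive. The function name `average_precision` is hypothetical, and benchmark-specific details (for example, averaging over several IoU thresholds as COCO does) are left out. mAP would then be the mean of this value over classes.

```python
import numpy as np

def average_precision(scores, is_true_positive, num_ground_truth):
    """AP for one class from scored detections already matched to ground truth."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    scores = np.asarray(scores, dtype=np.float64)
    tp = np.asarray(is_true_positive, dtype=np.float64)

    # Rank detections from most to least confident.
    order = np.argsort(-scores)
    tp = tp[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / num_ground_truth
    precision = cum_tp / (cum_tp + cum_fp)

    # Pad the curve, then make precision monotonically non-increasing.
    mrec = np.concatenate([[0.0], recall, [1.0]])
    mpre = np.concatenate([[0.0], precision, [0.0]])
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])

    # Sum the area under the curve wherever recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))
```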

As I am writing, I think the suite of vision APIs from TFX could also enable the following things:

Validator components to help developers understand the distribution of a large image dataset. I personally think this tooling is not available for enterprise-grade ML.
Utilities for running fast and distributed image similarity searches. There are already well established techniques for compressing embeddings and using them for fast retrieval. But having them integrated inside TFX would be a fun ride I think.

I am starting as an MLE at a startup today, in fact. So expect more TFX related stuff from me in the coming days

peddy · June 14, 2021, 4:25pm

This is great! To answer your question specifically, it’s extremely useful for the team not to just know what evaluation metrics (e.g., mAP) are useful to the community, but also what tasks (or task types) the community wants to evaluate their models on, that use that metric.

This allows the team to

have a concrete benchmark to test the metric computation implementation and finetune performance (parallelization of compute, resource usage, etc.)
possibly easily develop an example to showcase the new metric’s availability in TFMA/TFDV

Many teams using TFX will have their own proprietary datasets that they can’t share with us. But it would be really useful to know which public tasks [1] map most closely to the ones the community is interested in.

[1] Computer Vision | Papers With Code

Sayak_Paul · June 14, 2021, 4:44pm

So, I guess I got it right then. I have provided the information you asked for I believe. If anything is unclear let me know

peddy · June 14, 2021, 4:57pm

Yup you absolutely have - this was more of an explanation if any future readers would like to contribute their suggestions as well. Thanks Sayak!

Robert_Crowe · June 14, 2021, 10:56pm

While we’re discussing images, my view is that some basics would be useful. For validation, things like:

Is it an image?
Is the size between this min and max?
Does it have the right number of color channels?
Do the pixels have a minimum level of variance (for example, did we take the lens cap off?)
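
The checks above could be sketched roughly as follows. This is not a TFDV feature; `validate_image` and its thresholds (`min_side`, `max_side`, `min_std`) are hypothetical, and "is it an image?" is approximated here as "is it a numeric array shaped like (H, W) or (H, W, C)?" rather than by decoding a file.

```python
import numpy as np

def validate_image(img, min_side=32, max_side=4096, channels=3, min_std=5.0):
    """Return a list of problems found; an empty list means all checks passed."""
    arr = np.asarray(img)

    # Is it an image? Approximated as a numeric 2-D or 3-D array.
    if arr.ndim not in (2, 3) or not np.issubdtype(arr.dtype, np.number):
        return ["not an image-like array"]

    problems = []
    height, width = arr.shape[:2]
    found_channels = 1 if arr.ndim == 2 else arr.shape[2]

    # Is the size between the min and max?
    if not (min_side <= height <= max_side and min_side <= width <= max_side):
        problems.append(f"size {height}x{width} outside [{min_side}, {max_side}]")

    # Does it have the right number of color channels?
    if found_channels != channels:
        problems.append(f"expected {channels} channels, found {found_channels}")

    # Do the pixels have a minimum level of variance (lens cap check)?
    if float(arr.std()) < min_std:
        problems.append("pixel standard deviation below threshold")

    return problems
```

The threshold values are placeholders and would need tuning per dataset and per pixel range (0-255 is assumed here).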

For feature engineering, things like:

Gaussian blur
Grayscale
Sobel filter
Canny filter
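
A few of these transforms can be sketched in plain NumPy, as below. The helper names (`to_grayscale`, `filter2d`, `gaussian_blur`, `sobel_magnitude`) are hypothetical and this is not a TFX Transform API; in a real pipeline one would more likely reach for an existing image library. Canny is omitted because it needs extra steps (non-maximum suppression and hysteresis thresholding) on top of the gradient computed here.

```python
import numpy as np

def to_grayscale(rgb):
    """Luma-weighted grayscale (BT.601 weights) for an (H, W, 3) array."""
    return np.asarray(rgb, dtype=np.float64) @ np.array([0.299, 0.587, 0.114])

def filter2d(img, kernel):
    """Cross-correlate a 2-D image with a small odd-sized kernel, edge-padded."""
    img = np.asarray(img, dtype=np.float64)
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def gaussian_blur(img, sigma=1.0):
    """Blur with a normalized Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    g = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    g /= g.sum()
    return filter2d(img, np.outer(g, g))

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)

def sobel_magnitude(img):
    """Gradient magnitude from horizontal and vertical Sobel responses."""
    gx = filter2d(img, SOBEL_X)
    gy = filter2d(img, SOBEL_X.T)
    return np.hypot(gx, gy)
```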

Do you think that would be valuable?

Sayak_Paul · June 15, 2021, 2:03am

Could you expand a bit more on this? I agree with the rest of the points mentioned for validation. Checking for format corruption might be another good addition.

Anomaly detection for vision datasets is way more nuanced than for many other modalities, I believe. For text-based problems, it’s easier to compute the descriptive statistics and compare examples against those, but just getting the mean and std of a couple million images can be an expensive operation (given we are operating on high-res images like 224x224x3).
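
One common way to keep the memory cost of such statistics bounded is to accumulate running sums over batches instead of loading the whole dataset. The sketch below assumes batches shaped (N, H, W, C); `streaming_channel_stats` is a hypothetical helper, not part of TFDV. It still has to read every pixel once, so it reduces memory pressure, not total compute.

```python
import numpy as np

def streaming_channel_stats(batches):
    """Per-channel mean and population std over an iterable of (N, H, W, C) batches."""
    count = 0
    total = None
    total_sq = None
    for batch in batches:
        x = np.asarray(batch, dtype=np.float64)
        flat = x.reshape(-1, x.shape[-1])  # one row per pixel
        if total is None:
            total = np.zeros(flat.shape[1])
            total_sq = np.zeros(flat.shape[1])
        total += flat.sum(axis=0)
        total_sq += (flat ** 2).sum(axis=0)
        count += flat.shape[0]
    if count == 0:
        raise ValueError("no data was provided")
    mean = total / count
    # Var[x] = E[x^2] - E[x]^2; clamp tiny negatives caused by rounding.
    var = np.maximum(total_sq / count - mean ** 2, 0.0)
    return mean, np.sqrt(var)
```

Because the accumulators are plain sums, partial results from different workers can be added together, which is what makes this shape of computation easy to distribute.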

Another post-training technique to discard some of the anomalies inside the training set is to plot the samples that cause a model to incur high loss values. Here’s a PoC of this technique. I learned this at one of the fast.ai classes.
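
This is not the PoC mentioned above, just a minimal sketch of the idea: compute the per-example loss from a trained classifier's predicted probabilities and pull out the examples with the highest loss for manual inspection. The names `per_example_cross_entropy` and `top_loss_indices` are hypothetical.

```python
import numpy as np

def per_example_cross_entropy(probs, labels, eps=1e-12):
    """Cross-entropy of each example given predicted class probabilities."""
    probs = np.asarray(probs, dtype=np.float64)
    labels = np.asarray(labels)
    # Probability the model assigned to the true class of each example.
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(np.clip(picked, eps, 1.0))

def top_loss_indices(probs, labels, k):
    """Indices and losses of the k examples with the highest loss."""
    losses = per_example_cross_entropy(probs, labels)
    order = np.argsort(-losses)[:k]
    return order, losses[order]
```

The returned indices point back into the dataset, so the corresponding images can be plotted and checked for wrong labels or corrupted content.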

Computing the distribution overlap is also not very straightforward for large vision datasets. In my experience, simple recipes like computing histogram frequencies often fail to capture the important trends.

One cheap but often effective technique I have seen on Kaggle (for tabular datasets in most cases) is to train a simple model to classify whether a given data point is from the training distribution. How does this work in practice? We take the entries of the training set, discard their labels, and assign 1 as the new label to all the training data points. This is now the training data for the simple model. We then run the trained model on the validation and test sets to investigate the overlap.
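
The sketch below shows a closely related variant often called adversarial validation, which differs slightly from the description above: training points are labeled 1, points from the other split are labeled 0, and a simple classifier is trained to tell them apart. A held-out AUC near 0.5 suggests the two sets are hard to distinguish, while an AUC near 1.0 suggests a distribution shift. Everything here is an assumption for illustration: the function names are hypothetical, a hand-rolled logistic regression stands in for the "simple model", and for images the feature rows would probably be embeddings rather than raw pixels.

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, steps=300):
    """Plain gradient descent on the logistic loss; the bias is the last weight."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_scores(w, X):
    """Logits; sufficient for ranking-based metrics such as AUC."""
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def roc_auc(y, scores):
    """Rank-based AUC (Mann-Whitney U); assumes no tied scores."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return float((ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

def adversarial_validation_auc(train, other, seed=0):
    """Held-out AUC of a classifier separating `train` (1) from `other` (0)."""
    rng = np.random.default_rng(seed)
    X = np.vstack([train, other]).astype(np.float64)
    y = np.concatenate([np.ones(len(train)), np.zeros(len(other))])

    # Fit on one random half, score on the other half.
    idx = rng.permutation(len(X))
    fit_idx, eval_idx = idx[: len(X) // 2], idx[len(X) // 2:]

    # Standardize using statistics from the fitting half only.
    mu = X[fit_idx].mean(axis=0)
    sd = X[fit_idx].std(axis=0) + 1e-8
    Xs = (X - mu) / sd

    w = fit_logistic(Xs[fit_idx], y[fit_idx])
    return roc_auc(y[eval_idx], predict_scores(w, Xs[eval_idx]))
```

Scoring on a held-out half matters: evaluating on the same rows used for fitting would overstate how separable the two sets are.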

All of this probably calls for a well-structured paper

For feature engineering, I think for most DL-related vision workflows users would want to have their own data augmentation pipelines incorporated.

Bhack · June 15, 2021, 10:22am

I think that we have two related interesting points:

Automated or semiautomated Data clean up
Policy learning or any other learning task for automatic augmentation

Sayak_Paul · June 15, 2021, 10:37am

A very interesting study in the first link.

I would prefer RandAugment since it’s computationally cheaper and yields similar or better performance than AutoAugment.
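
For context, the core of RandAugment is small: pick a fixed number of operations uniformly at random and apply each at one shared magnitude, with no learned policy. The sketch below illustrates only that selection logic, using a handful of toy NumPy operations on float images in [0, 1]. The operation set, the magnitude scaling, and the name `rand_augment` are assumptions for illustration; real implementations use a larger set of image transforms.

```python
import numpy as np

def _identity(img, magnitude):
    return img

def _flip_horizontal(img, magnitude):
    return img[:, ::-1]

def _brightness(img, magnitude):
    return np.clip(img * (1.0 + magnitude), 0.0, 1.0)

def _contrast(img, magnitude):
    mean = img.mean()
    return np.clip((img - mean) * (1.0 + magnitude) + mean, 0.0, 1.0)

def _shift_x(img, magnitude):
    # Wrap-around shift; a true translation would pad instead of wrapping.
    return np.roll(img, int(round(magnitude * 0.3 * img.shape[1])), axis=1)

OPS = [_identity, _flip_horizontal, _brightness, _contrast, _shift_x]

def rand_augment(img, num_ops=2, magnitude=0.5, rng=None):
    """Apply `num_ops` randomly chosen ops, all at the same `magnitude`."""
    if rng is None:
        rng = np.random.default_rng()
    out = np.asarray(img, dtype=np.float64)
    # Ops are sampled uniformly, with replacement.
    for op_index in rng.integers(0, len(OPS), size=num_ops):
        out = OPS[op_index](out, magnitude)
    return out
```

With only two knobs (`num_ops` and `magnitude`), the search space is small enough for a simple grid search, which is where the cost advantage over a learned policy comes from.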

Bhack · June 15, 2021, 11:19am

I would prefer RandAugment since it’s computationally cheaper and yields similar or better performance than AutoAugment.

There are other cheaper solutions like:

Sayak_Paul · June 15, 2021, 11:51am

I would still very much prefer RandAugment because of its simplicity and because it generally performs well across a wide variety of tasks.

But that’s not even the point. For feature engineering, my point is that the end user should be able to set up their own augmentation pipelines.

Robert_Crowe · August 12, 2021, 5:24pm

What I meant was that a bad image can have very little hue and value variance, which is the case if the lens cap is on, or if I take a very out-of-focus image (like when things are way too close), or an image with not nearly enough light, or an image of the sun, or an image of the inside of my pocket, etc.