Continuous Adaptation for Machine Learning System to Data Changes

@deep-diver and I have been working on an MLOps project for the past couple of months. It shows how “Continuous Adaptation for ML System to Data Changes” can be done by building and interconnecting two separate pipelines (note that this project is built with TFX and various GCP services).

We have written a blog post about some of the internal implementation details, and it is published on the TensorFlow Blog. Please find it here:

Also, we have open-sourced all the materials needed to reproduce this project, including in-depth explanations in a set of Jupyter notebooks. You can find the repo here

@Robert_Crowe, huge thanks for your help on this one.

Thanks for taking the time to read this; we hope it will be helpful :slight_smile:


Thanks, it is a nice tutorial. It could be interesting one day to expand it to cover:

  • some state-of-the-art continual learning approaches instead of retraining the whole model.
  • handling drift in an open-set/open-world context instead of just a misclassification threshold over the closed-set classes.

Thanks for the suggestions! As you likely know, many SoTA approaches fall flat when they are exposed to real-world data, but we will investigate and dig deeper.

We acknowledge (in the post itself too) that JS divergence, which is just one possible measure, could also have been used to capture the drift, but we wanted to follow another path.

But meanwhile, PRs are welcome :slight_smile:


Yes. Also, even in cases less “extreme” than continual learning and open-set recognition, active learning is always around the corner, including setups with a more “static” model but dynamic data pipelines:


An overview of some of the required features is in https://arxiv.org/abs/2106.03122:


Thanks @Bhack for the further information.

The materials you shared will definitely expand our knowledge and let us think about the next project to work on.

We have been interested in two topics recently.

  1. monitoring data drift by comparing datasets directly (without model predictions), for example with JS divergence
  2. combining two CI/CD MLOps systems to open-source a complete MLOps system. Note that this doesn’t mean covering every use case, but providing a complete kickstarter for one specific use case.
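
For the first topic, here is a rough sketch of what comparing two datasets with JS divergence might look like, without any model predictions involved. This is not code from the project: the function names `js_divergence` and `feature_drift`, the shared-histogram binning, and the bin count are all assumptions made for illustration, and it uses plain NumPy rather than TFX.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two histograms.

    Inputs are non-negative counts or probabilities of equal length.
    The result lies in [0, 1]: 0 for identical distributions,
    1 for distributions with no overlap.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # Only sum where a > 0; b (the mixture) is positive there too.
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def feature_drift(baseline, incoming, bins=10):
    """Hypothetical helper: JS divergence of one numeric feature
    between a baseline dataset and newly collected data."""
    baseline = np.asarray(baseline, dtype=float)
    incoming = np.asarray(incoming, dtype=float)
    # Use shared bin edges so both histograms are comparable.
    lo = min(baseline.min(), incoming.min())
    hi = max(baseline.max(), incoming.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(baseline, bins=edges)
    q, _ = np.histogram(incoming, bins=edges)
    return js_divergence(p, q)
```

A monitoring job could call `feature_drift(train_feature, new_feature)` per feature and flag drift when the value exceeds some threshold; the right threshold depends on the data and would have to be tuned.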

I also think that some state-of-the-art learning challenges will end up impacting MLOps:

https://arxiv.org/abs/2111.01956


Or like OW-DETR: Open-world Detection Transformer