TFX Pipeline on Azure

I’m learning how to set up TFX pipelines, and I’d like to know whether they can be deployed in a cloud environment such as Azure.

Is there any documentation about this? Which services would be needed?

Cheers

Yes. You could run it either in a VM or in a containerized environment like Kubernetes on Azure. You could probably also run it on Airflow. However, I haven’t tried any of these.

It would be amazing if Apache Beam in TFX could push the computation down to Spark. In that case, would Airflow just be orchestrating the components of the pipeline? Would Kubeflow be a better choice here? I can’t believe how awesome TFX is.

You can use the Beam Spark Runner to push the computation to a Spark cluster. To use it with TFX, you would use the portable runner, which supports Python.
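
For anyone trying this, here is a minimal sketch of what the portable runner options might look like. It assumes a Beam Spark job server is already running and reachable at localhost:8099; that endpoint, the LOOPBACK environment type, and the helper name are illustrative assumptions, not something confirmed in this thread. The resulting list is the kind of value usually passed as beam_pipeline_args when building a TFX pipeline.

```python
def spark_portable_runner_args(job_endpoint="localhost:8099",
                               environment_type="LOOPBACK"):
    """Build Beam pipeline options for the portable runner.

    Hypothetical helper: the default endpoint and environment type are
    assumptions for a local test setup, so adjust them for a real cluster.
    """
    return [
        # Submit the job through Beam's portability framework.
        "--runner=PortableRunner",
        # Address of the job server that hands the work to Spark.
        f"--job_endpoint={job_endpoint}",
        # Where the Python SDK worker code runs (LOOPBACK is for local testing).
        f"--environment_type={environment_type}",
    ]


beam_args = spark_portable_runner_args()
```

On a real cluster the environment type would likely need to change, since LOOPBACK runs the Python workers in the submitting process.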

Getting TFX to work on Azure was pretty easy, even though it uses Anaconda. The local dockerized Spark cluster was also very easy to set up and get working by passing additional Beam options. Next up is pushing the computation down to Spark, which I believe shouldn’t be too big of a problem to solve.

I tried executing some of the course notebooks on AzureML (the course 4 notebooks from deeplearning.ai). The notebooks are executed as PySpark notebooks and ran into dependency issues that could not be resolved (in the “Import package” section of the notebook).

Out of curiosity, why did you configure a local PySpark session if the end goal of the OP is to run TFX on Azure? How did you resolve the dependency issues when installing TFX with pip (did you use requirements files)?

Hi Henry, please check your version of Python in Azure. The current TFX release requires Python 3.9 or below, and if you’re running 3.10, it will probably try to load a very old version of TFX.
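
A quick way to check this from inside the notebook session is sketched below. The 3.9 upper bound is taken from the reply above and may not hold for other TFX releases; the function name is hypothetical, and the check only looks at the upper bound.

```python
import sys

# Assumption from the reply above: the TFX release under discussion
# supports Python 3.9 or below. Verify this against the release you install.
MAX_SUPPORTED = (3, 9)


def python_ok_for_tfx(version=None, max_supported=MAX_SUPPORTED):
    """Return True if the (major, minor) version does not exceed the bound."""
    if version is None:
        # Default to the interpreter running this code.
        version = sys.version_info
    major, minor = tuple(version)[:2]
    return (major, minor) <= max_supported
```

Calling python_ok_for_tfx() with no arguments checks the running interpreter, while passing a tuple such as (3, 10) checks a specific version.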

Thanks a lot, but the session already uses Python 3.8, Scala 2.12.15, Java 1.8.0_282, .NET Core 3.1, .NET for Apache Spark 2.0, and Delta Lake 1.2. TFX doesn’t seem to have much documentation for other cloud providers. I really want to apply what I learned from your deeplearning.ai course to the enterprise environment at work. But since there’s not much documentation for the alternative cloud providers (the ones that have been approved at work), I don’t think I can set up the environment to run the scripts. Do you know of any documentation for TFX on Azure?

I don’t know of any documentation for running TFX on Azure. Are you running it on Azure Kubernetes Service (AKS)? I also notice that there is something completely different that is also called “TFX”. That could be confusing, and it could break things if you’re following any of that documentation.

Running TFX on AKS should be like running it on any Kubernetes deployment. You do mention a lot of other things (Scala 2.12.15, Java 1.8.0_282, .NET Core 3.1, .NET for Apache Spark 2.0, Delta Lake 1.2) that could potentially conflict.