Has anyone implemented TFX on AWS?


I was wondering whether it is possible, and how easy it would be, to implement a TFX pipeline on AWS (on a real dataset of 100+ GB, not a tutorial with a small dataset)?

For orchestration, I might use Kubeflow. But I suppose the major issue would be setting up a properly scalable runner for Apache Beam. I am thinking of using Apache Flink for that.
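To make the Flink idea a bit more concrete, here is a minimal sketch of the Beam flags such a setup would probably need. The helper name `flink_beam_pipeline_args` and the address `flink-jobmanager:8081` are made up for illustration; the flags themselves (`--runner`, `--flink_master`, `--parallelism`, `--environment_type`) are standard Beam pipeline options, but the right values depend on the cluster.

```python
from typing import List


def flink_beam_pipeline_args(
    flink_master: str,
    parallelism: int = 4,
    environment_type: str = "DOCKER",
) -> List[str]:
    """Build Beam pipeline flags that point a pipeline at a Flink cluster.

    Hypothetical helper: it only assembles strings and does not contact Flink.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return [
        "--runner=FlinkRunner",
        f"--flink_master={flink_master}",      # host:port of the Flink job manager
        f"--parallelism={parallelism}",        # parallelism for the submitted job
        f"--environment_type={environment_type}",  # how SDK workers are launched
    ]


# Assumed job manager address; replace with the real service name and port.
args = flink_beam_pipeline_args("flink-jobmanager:8081", parallelism=16)
print(args)
```

A list like this would typically be handed to the TFX pipeline through its `beam_pipeline_args` argument, so that the Beam-based components run on Flink rather than on the local runner.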

Does anyone have experience doing this? More generally, how would you go about putting a TF model into production on AWS when the model needs to be retrained regularly on new data? Do you write the pipeline from scratch or use some tool?

Thank You,
Gorjan Todorovski


In general, I suggest selecting tags and a category from the menu when starting a new thread, because specialized technical team members may be subscribed to only a subset of tags (in this case I suggest the tfx tag).


I don’t know how to add tags, and I don’t even see an option to edit the post.


Do you see these options with your account?

See: A comprehensive guide to Discourse tags - faq - Discourse Meta



We made some progress running this on AWS with Kubeflow, but we just hit one obstacle that will take a while to get past:

ValueError: Unable to get the Filesystem for path s3:///data.csv
It’s interesting because it is successfully connecting to S3 to read the filename, data.csv. We only specify the bucket.

However, I think the error that is raised is related to Apache Beam’s Python SDK not having an S3 FileSystem.
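To illustrate why a missing S3 filesystem would show up as this kind of ValueError, here is a simplified imitation of a scheme-based filesystem lookup. It is not Beam's actual code: the registry contents, the helper names and the bucket name `example-bucket` are assumptions for illustration only. The idea is that the path's scheme (`s3`) is looked up among the registered filesystems, and an unregistered scheme fails even though the bucket itself is reachable by other means.

```python
import re
from typing import Dict, Optional

# Matches a leading URI scheme such as "gs://" or "s3://".
_SCHEME_RE = re.compile(r"^(?P<scheme>[a-zA-Z][-a-zA-Z0-9+.]*)://")

# Illustrative registry: scheme -> filesystem name. No "s3" entry on purpose.
REGISTRY: Dict[Optional[str], str] = {
    None: "LocalFileSystem",   # paths without a scheme are treated as local
    "gs": "GCSFileSystem",
    "hdfs": "HadoopFileSystem",
}


def path_scheme(path: str) -> Optional[str]:
    """Return the URI scheme of a path, or None for a plain local path."""
    match = _SCHEME_RE.match(path)
    return match.group("scheme") if match else None


def get_filesystem(path: str, registry: Dict[Optional[str], str]) -> str:
    """Look up the filesystem for a path; fail if its scheme is unregistered."""
    scheme = path_scheme(path)
    if scheme not in registry:
        raise ValueError("Unable to get the Filesystem for path %s" % path)
    return registry[scheme]


print(get_filesystem("/tmp/data.csv", REGISTRY))
try:
    get_filesystem("s3://example-bucket/data.csv", REGISTRY)  # hypothetical bucket
except ValueError as err:
    print(err)
```

If this is indeed the cause, the fix would be to get an S3 filesystem registered with Beam. As far as I know, later Beam releases provide one through the `apache-beam[aws]` extra, but whether that applies here depends on the Beam version that the TFX release in use is pinned to.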

I got so determined to make TFX work on AWS (in an easy-to-implement manner) that I have started working on a platform for running TFX pipelines on AWS (and potentially in any cloud environment). I am also creating a new GUI and orchestrator, as I don’t like Kubeflow Pipelines.

It is still a work in progress:


Verified - Divvya Saxena