Anyone implemented TFX in AWS?

gorjant · May 28, 2021, 8:38am

Hi,

I was wondering whether it is possible, and how easy it would be, to implement a TFX pipeline in AWS on a real dataset (100+ GB, not a tutorial with a small dataset).

For orchestration, I might use Kubeflow. But I suppose the major issue would be setting up a properly scalable runner for Apache Beam. I am thinking of using Apache Flink for that.
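For illustration only, here is a minimal sketch of the Beam flags that would point a pipeline at a Flink cluster. The helper name `flink_beam_args` and the JobManager address are hypothetical; the flags themselves (`--runner`, `--flink_master`, `--parallelism`, `--environment_type`) are standard Beam options, and the right environment type depends on how the Python workers are deployed.

```python
def flink_beam_args(flink_master, parallelism, environment_type="DOCKER"):
    """Build Beam pipeline flags that select the Flink runner.

    flink_master: host:port of the Flink JobManager REST endpoint.
    parallelism: default parallelism for the Beam job on Flink.
    environment_type: how Beam starts the Python SDK workers
        (for example DOCKER, EXTERNAL, or LOOPBACK).
    """
    return [
        "--runner=FlinkRunner",
        f"--flink_master={flink_master}",
        f"--parallelism={parallelism}",
        f"--environment_type={environment_type}",
    ]


# Hypothetical JobManager address; replace with your own cluster endpoint.
args = flink_beam_args("flink-jobmanager.example.internal:8081", parallelism=8)
print(args)
```

In TFX, a list like this is typically handed to the pipeline through its `beam_pipeline_args` parameter, so every Beam-based component picks up the same runner settings.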

Does anyone have experience doing this? In general, how would you go about putting a TF model into production in AWS when you need to retrain it regularly on new data: do you write the pipeline from scratch or use some tool?

Thank You,
Gorjan Todorovski

Bhack · May 28, 2021, 1:25pm

In general, I suggest selecting tags and a category from the menu when you start a new thread, because specialized technical team members may be subscribed only to a subset of tags (e.g. in this case I suggest the tfx tag).

gorjant · May 29, 2021, 3:07pm

I don’t know how to add tags, and I don’t even see an option to edit the post.

Bhack · May 29, 2021, 4:25pm

Do you see these options with your account?

A comprehensive guide to Discourse tags - faq - Discourse Meta

surbhi_jain · December 24, 2021, 7:44am

Hi,

We made some headway running this on AWS with Kubeflow, but we just hit one obstacle that will take a while to get past:

ValueError: Unable to get the Filesystem for path s3:///data.csv
It’s interesting because it is successfully connecting to S3 to read the filename, data.csv. We only specify the bucket.

However, I think the error that is raised is related to Apache Beam’s Python SDK not having an S3 FileSystem.
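One thing that may be worth checking: as far as I know, newer Beam releases ship an S3 filesystem behind the optional `aws` extra (`pip install apache-beam[aws]`), and Beam skips registering it when that import fails. The sketch below is only a diagnostic under that assumption. It checks which scheme a path uses and whether the S3 filesystem module can be imported in the environment where the pipeline runs. The helper names and the bucket name are hypothetical.

```python
import importlib
from urllib.parse import urlparse


def path_scheme(path):
    """Return the URL scheme Beam would use to pick a filesystem."""
    return urlparse(path).scheme


def s3_filesystem_available():
    """Return True if Beam's S3 filesystem module can be imported here.

    The import fails when apache-beam is missing or when the optional
    AWS dependencies (boto3) are not installed.
    """
    try:
        importlib.import_module("apache_beam.io.aws.s3filesystem")
    except ImportError:
        return False
    return True


# Hypothetical bucket name, used only for illustration.
print(path_scheme("s3://example-bucket/data.csv"))
print(s3_filesystem_available())
```

If the second check prints `False` inside the container image that Kubeflow uses for the TFX components, the missing extra in that image would be a plausible cause of the error above.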

gorjant · December 24, 2021, 9:08am

I got so determined to make TFX work in AWS (in an easy-to-implement manner) that I have started working on a platform that enables running TFX pipelines in AWS (and potentially in any cloud environment). I am also creating a new GUI and orchestrator, as I don’t like Kubeflow Pipelines.

It is still a work in progress:

Divvya_Saxena · December 27, 2021, 6:02am

Verified - Divvya Saxena

Merelda · January 20, 2023, 9:02am

Did you succeed?
Struggling to find resources on it…

gorjant · January 20, 2023, 2:10pm

Hi Merelda,

As this was also a pain for me, I have worked on creating a managed TFX platform where you can run your TFX pipeline in any cloud environment or on-premises: robotika.ai.

Let me know if you are interested; we can discuss how we can help you run your TFX pipelines. You can shoot me an email at: (Removed by moderator)

Cheers,
Gorjan