TFX: CsvExampleGen does not work with a simple example? (help)

Hi there! :slight_smile:

I am trying out TFX pipelines, and the first step is to ingest data. In this simple example I am ingesting data from a local (Google Drive) CSV file with CsvExampleGen, following the book "Building Machine Learning Pipelines", published by O’Reilly. However, simply importing the CSV file has proven difficult. I don’t get an error, but the artifact is empty.

Here is the code snippet:

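import os

from tfx.components import CsvExampleGen
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

# Work from the notebook folder on the mounted Google Drive.
os.chdir("/content/gdrive/MyDrive/TFXnotebooks")
base_dir = os.getcwd()
data_dir = os.path.join(base_dir, "data/")

# input_base is the directory that contains the CSV file(s).
context = InteractiveContext()
example_gen = CsvExampleGen(input_base=data_dir)
context.run(example_gen)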

I’ve checked with os.listdir(data_dir) that the file is indeed there.

Note: I know many sources say to import external_input from tfx.utils.dsl_utils, but this approach is no longer supported (the module no longer exists).

Hope someone can help,
Thanks :slight_smile:

Maybe @Robert_Crowe can help here.

Could you try running this Colab to see how a CSV file is ingested?

Yes, this one runs the way I expected it to. Apart from the deprecated external_input, I guess I was confused by the fact that ExampleGen has a single artifact as its output, where I expected to see two (train and val).
Thanks a lot for the reply!

I am having the same problem you described: an empty artifact when loading a CSV file from the project directory. How did you resolve it?

Hello AmyH!

Well, it turned out the .csv file did in fact get imported correctly. I was simply mistaken in thinking there would be two output artifacts (train and val) instead of one. Did you try the Colab notebook posted above? It should work.
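
If it helps, here is a rough sketch of how one might check that the single artifact really contains data. The helper list_splits is a name made up for this post (it is not part of TFX); it only lists the subdirectories of a folder and counts the files in each. The commented usage assumes the artifact is reached the way the TFX tutorials do it, via example_gen.outputs["examples"].get()[0], and that each split is stored in its own subdirectory under the artifact's uri; the exact subdirectory names may differ between TFX versions.

```python
import os


def list_splits(artifact_uri):
    """Return {subdirectory name: number of entries} for an artifact folder.

    Hypothetical helper, not a TFX function: it only inspects the file system.
    """
    splits = {}
    for name in sorted(os.listdir(artifact_uri)):
        path = os.path.join(artifact_uri, name)
        if os.path.isdir(path):  # ignore stray files at the top level
            splits[name] = len(os.listdir(path))
    return splits


# Assumed usage in the notebook, after context.run(example_gen):
#   artifact = example_gen.outputs["examples"].get()[0]
#   print(artifact.split_names)       # names of the splits inside the one artifact
#   print(list_splits(artifact.uri))  # one subdirectory per split, with file counts
```

If every subdirectory reports at least one file, the CSV was ingested and the "empty" artifact was just a misreading of the output.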

Sorry for the late reply!