UnicodeDecodeError using CsvExampleGen

Hi
I am trying to use tfx.components.CsvExampleGen with a CSV file that contains Spanish words, and I get an error:

UnicodeDecodeError: ‘utf-8 [while running ‘InputToRecord/ReadFromText’]’ codec can’t decode byte 0xc1 in position 72: invalid start byte

When I read this CSV with pandas, the problem goes away if I pass encoding="latin-1". Is there a similar option for TFX?
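The error comes from decoding. Byte 0xC1 is "Á" in Latin-1, but in UTF-8 it can never start a character. That is why a UTF-8 reader fails and pandas with encoding="latin-1" does not. The sketch below reproduces this in plain Python. The sample row and the helper name try_decode are hypothetical, used only for illustration.

```python
from typing import Optional


def try_decode(raw: bytes, encoding: str) -> Optional[str]:
    """Return the decoded text, or None if `raw` is not valid in `encoding`."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as err:
        # Same kind of failure CsvExampleGen reports for the whole file.
        print(f"{encoding} failed at position {err.start}: {err.reason}")
        return None


# Hypothetical sample row: "Á" is stored as the single byte 0xC1 in Latin-1.
raw = "Árbol,canción\n".encode("latin-1")
print(hex(raw[0]))                 # 0xc1
print(try_decode(raw, "utf-8"))    # prints the failure message, then None
print(try_decode(raw, "latin-1"))  # decodes cleanly, as pandas does
```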

There isn't an equivalent option in TFX, at least not yet; we have a bug open to address this. In the meantime, I think setting the default encoding should work.
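Another possible workaround, separate from the one above, is to convert the file to UTF-8 once before the pipeline runs, so CsvExampleGen never sees Latin-1 bytes. This is a minimal sketch. It assumes the source really is Latin-1, as the pandas result suggests. The function name reencode_to_utf8 and the paths are hypothetical.

```python
import shutil


def reencode_to_utf8(src: str, dst: str, source_encoding: str = "latin-1") -> None:
    """Copy the text file `src` to `dst`, converting it from `source_encoding` to UTF-8.

    newline="" on both sides keeps the original line endings (e.g. \\r\\n) untouched.
    """
    with open(src, "r", encoding=source_encoding, newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        shutil.copyfileobj(fin, fout)


# Hypothetical usage: write the UTF-8 copy into its own directory and point
# CsvExampleGen's input_base at that directory, e.g.
#   reencode_to_utf8("raw/data.csv", "data_utf8/data.csv")
#   example_gen = tfx.components.CsvExampleGen(input_base="data_utf8")
```

CsvExampleGen reads the files under input_base. Keeping the original Latin-1 file out of that directory should therefore avoid the decode error entirely.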
