I am currently facing a related issue on Dataflow when using the TFX library. The TFX pipeline works fine locally, but it fails on Dataflow.
Overview: The issue arises when using the tensorflow_io library for preprocessing in the TFX Transform component. It works fine when Dataflow runs with a single worker, but it throws the error shown below when multiple workers are used. Is there a correct way to load external libraries such as ‘tensorflow_io’ when Dataflow is specified through the Beam pipeline arguments? I have built a custom Docker container, using ‘tfx create’ to build the pipeline and ‘tfx run’ to run it.
In the Docker container, I have explicitly installed the tensorflow_io library.
FileNotFoundError: Op type not registered ‘IO>AudioResample’ in binary running on cmle-training-workerpool0-12345. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed
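For reference, here is a minimal sketch of how Beam pipeline arguments for a Dataflow run with a custom container might be assembled. It is not the exact configuration from this pipeline: the helper `build_beam_pipeline_args` and all values (project, region, bucket, image name) are hypothetical placeholders. It also assumes a Beam version that accepts `--sdk_container_image` together with the `use_runner_v2` experiment.

```python
from typing import List


def build_beam_pipeline_args(
    project: str,
    region: str,
    temp_location: str,
    container_image: str,
) -> List[str]:
    """Assemble Beam flags for a Dataflow run that uses a custom worker image.

    Hypothetical helper for illustration; not part of TFX or Beam.
    """
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Custom containers on Dataflow require Runner v2.
        "--experiments=use_runner_v2",
        # Image that has tensorflow_io installed, so every worker has the library.
        f"--sdk_container_image={container_image}",
    ]


# All values below are placeholders, not a real project or image.
beam_pipeline_args = build_beam_pipeline_args(
    project="my-gcp-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
    container_image="gcr.io/my-gcp-project/my-tfx-image:latest",
)
```

A list like this would typically be passed as `beam_pipeline_args` when constructing the TFX pipeline. Having the package installed in the image may not be enough on its own: the custom ops are usually registered only once `tensorflow_io` is imported in the worker process, which is consistent with the "Op type not registered" message above.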