How can I access the HDFS file system in the latest TensorFlow 2.6.0?

I recently upgraded the TensorFlow version used in my program to the newly released 2.6.0, but I ran into a problem.

import tensorflow as tf

pattern = 'hdfs://mypath'
print(tf.io.gfile.glob(pattern))

The call above raises an exception in version 2.6:

tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme 'hdfs' not implemented (file:xxxxx)

I then checked the relevant implementation code and found that the official recommendation is to use tensorflow/io to access HDFS, and that the environment variable TF_USE_MODULAR_FILESYSTEM is provided to enable legacy access support. Because my code is complex and difficult to refactor quickly, I tried setting this environment variable, but the call still failed.

In short, my questions are:

  1. In the latest version of TensorFlow, if “tfio” is not used, how can I still access HDFS files?
  2. If “tfio” must be used, what is the equivalent of tf.io.gfile.glob?

Since version 2.6.0, you need to import tensorflow_io alongside tensorflow.

Install with:
pip install tensorflow-io

In your code:
import tensorflow as tf
import tensorflow_io as tfio

You only need to install the tensorflow-io pip package and add import tensorflow_io as tfio. No other code change is needed, because the filesystem plugin is loaded behind the scenes.
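
As a rough illustration, here is a minimal sketch of the call from the question with the extra import in place. It assumes tensorflow-io is installed and that the Hadoop client environment on the machine is already configured; the helper name list_matching_files is hypothetical and only used for this example.

```python
def list_matching_files(pattern):
    """Return the paths matching `pattern` using tf.io.gfile.glob."""
    # Imports are kept inside the function so the plugin import sits
    # right next to the call that depends on it.
    import tensorflow as tf
    # Imported only for its side effect: importing tensorflow_io is what
    # makes the hdfs:// scheme available to tf.io.gfile.
    import tensorflow_io as tfio  # noqa: F401

    # Same API as in the question; no other code change is made.
    return tf.io.gfile.glob(pattern)
```

With this in place, list_matching_files('hdfs://mypath') would be the equivalent of the original snippet.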

Alternatively, exporting TF_USE_MODULAR_FILESYSTEM=1 on all hosts/environments that need to access HDFS should work. If it doesn't, can you provide a minimal example and the error logs?
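
If exporting the variable in the shell is inconvenient, one possible way to set it from Python is sketched below. This rests on the assumption that TensorFlow reads the variable when it is loaded, so the helper has to run before the first import of tensorflow; the function name enable_modular_filesystem is hypothetical.

```python
import os


def enable_modular_filesystem(env=os.environ):
    """Set TF_USE_MODULAR_FILESYSTEM=1 unless a value is already present.

    Returns the value that ends up in `env`.
    """
    # setdefault keeps any value that was already exported in the shell.
    env.setdefault("TF_USE_MODULAR_FILESYSTEM", "1")
    return env["TF_USE_MODULAR_FILESYSTEM"]


# Assumption: this must run before `import tensorflow` to have any effect.
enable_modular_filesystem()
```

Using setdefault rather than a plain assignment means a value exported on the host still takes precedence, which keeps the per-host configuration described above in control.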