I have a TensorBoard server in Kubeflow backed by an S3-compatible MinIO bucket, and I would like to write TensorBoard logs to that bucket.
When calling tf.keras.callbacks.TensorBoard(log_dir="s3://minio-server:port/my-public-bucket"), I got the following error:
tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme 's3' not implemented
How can I write the Keras training logs for TensorBoard to an S3-compatible MinIO bucket with keras.callbacks?
I would really appreciate it if anyone could share some code snippets.
I don’t have experience using minio so I’m not sure how much help I will be. But some thoughts:
- Have you installed tensorflow-io? It’s the companion package that handles much of the I/O and protocols.
- Have you tried using another protocol like “s3e” or “s3a”? I don’t recall what protocols are supported with tensorflow-io, but you might experiment with ‘s3e://bucket_name/path/to/file’.
- What version of tensorflow? Can you experiment with other versions?
Sorry I don’t have anything more definitive. Good luck. And let us know what worked.
cheers,
Dennis
@dennisobrien Thanks for mentioning tensorflow-io. I used a CPU TF 2.9.1 image with tensorflow-io==0.26.0; according to the version compatibility information, this combination should work.
import tensorflow_io
...
log_dir = "s3e://minio_ip:port/public_bucket/"
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir)
...
and still got
tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme 's3e' not implemented (file: 's3e://xxx.xxx.xxx.107:xxx97/kf-tensor-board/train') [Op:CreateSummaryFileWriter]
Just to double check: do I need to wrap the S3/MinIO path with some tensorflow_io writer?
Thanks for mentioning s3e, which seems to apply to tensorflow-io==0.17 (File system scheme 's3' not implemented · Issue #40302 · tensorflow/tensorflow · GitHub).
I get “UnimplementedError: File system scheme ‘s3’ not implemented” too.
Did you manage to find a solution?
I used a workaround: write the logs to a local log dir, then copy that folder to the MinIO S3 bucket with boto3.
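For anyone who wants to try the same workaround, here is a minimal sketch of the copy step. It assumes TensorBoard wrote to a local directory first. The helper names (iter_log_files, upload_log_dir, make_minio_client), the endpoint, bucket, and paths are all made up for illustration; only boto3.client(...) and client.upload_file(Filename, Bucket, Key) are real boto3 calls.

```python
import os


def iter_log_files(log_dir, prefix=""):
    """Yield (local_path, object_key) for every file under log_dir."""
    clean_prefix = prefix.strip("/")
    for root, _dirs, files in os.walk(log_dir):
        for name in sorted(files):
            local_path = os.path.join(root, name)
            # Object keys always use forward slashes, whatever the OS.
            rel = os.path.relpath(local_path, log_dir).replace(os.sep, "/")
            key = clean_prefix + "/" + rel if clean_prefix else rel
            yield local_path, key


def upload_log_dir(client, log_dir, bucket, prefix=""):
    """Upload every file under log_dir and return the uploaded keys."""
    keys = []
    for local_path, key in iter_log_files(log_dir, prefix):
        client.upload_file(local_path, bucket, key)
        keys.append(key)
    return keys


def make_minio_client(endpoint_url, access_key, secret_key):
    """Build a boto3 S3 client pointed at a MinIO endpoint (hypothetical helper)."""
    import boto3  # imported lazily so the helpers above work without boto3

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )


# Example usage after training (all values below are placeholders):
# client = make_minio_client("http://minio-server:9000", "ACCESS_KEY", "SECRET_KEY")
# upload_log_dir(client, "/tmp/tb_logs", "my-public-bucket", prefix="tb_logs")
```

This only uploads once after training finishes, so TensorBoard will not see live updates; calling upload_log_dir periodically (for example from a custom callback) would be one way around that.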
I declared the environment variables as in this example (io/tests/test_s3.py at master · tensorflow/io · GitHub), and after that the logs began to be written to S3:
import os
import tensorflow as tf
import tensorflow_io  # provides the s3:// file system for recent TensorFlow versions

os.environ["AWS_REGION"] = "us-east-1"
os.environ["AWS_ACCESS_KEY_ID"] = "ACCESS_KEY"
os.environ["AWS_SECRET_ACCESS_KEY"] = "SECRET_KEY"
os.environ["S3_VERIFY_SSL"] = "0"
os.environ["S3_ENDPOINT"] = "http://localhost:4566"
# path_model is defined elsewhere in my script
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=("s3://storage/" + path_model + "models/logs_buf"),
    write_graph=False,
    update_freq="batch",
)
Versions: tensorflow==2.8.0, tensorflow-io==0.25.0
Now I get an error:
curlCode: 60, SSL peer certificate or SSH remote key was not OK
It is probably because I am using a custom SSL certificate:
os.environ["S3_ENDPOINT"] = "https://netappp1.company.com"
Neither of these environment variables
os.environ["SSL_VERIFY_HOSTNAME"] = "0"
os.environ["S3_VERIFY_SSL"] = "0"
seems to work.
Sorry for the confusion, the actual error is “Could not initialize events writer”:
tensorflow.python.framework.errors_impl.UnknownError: {{function_node __wrapped__CreateSummaryFileWriter_device_/job:localhost/replica:0/task:0/device:CPU:0}} : curlCode: 60, SSL peer certificate or SSH remote key was not OK
Failed to flush 1 events to s3://kind-mlflow/28_07_2023/train/events.out.tfevents.1690562484.tensorboard-2zbmj-1505363171.36.0.v2
Flushing first event.
Could not initialize events writer. [Op:CreateSummaryFileWriter]
I have these environment settings:
os.environ["S3_USE_HTTPS"]="1"
os.environ["S3_VERIFY_SSL"] = "0"
This might be because tensorflow-io has removed support for S3_VERIFY_SSL (improvements for `s3` environements variables by vnghia · Pull Request #1343 · tensorflow/io · GitHub), which has still not been reimplemented in the current tensorflow-io==0.32.0 release.
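If the tensorflow-io S3 file system cannot be made to accept the custom certificate, one possible fallback is the local-log-dir-plus-boto3 route mentioned earlier in the thread, because boto3 has its own verify argument. The sketch below only builds the keyword arguments for boto3.client; the helper name s3_client_kwargs and the CA bundle path are hypothetical, and this affects boto3 only, not tensorflow-io.

```python
def s3_client_kwargs(endpoint_url, ca_bundle=None, verify_ssl=True):
    """Build keyword arguments for boto3.client("s3", ...).

    ca_bundle: path to a PEM file with the custom CA (preferred).
    verify_ssl: set to False to skip certificate checks (testing only).
    """
    kwargs = {"endpoint_url": endpoint_url}
    if ca_bundle is not None:
        # boto3 accepts a CA bundle path as the `verify` argument.
        kwargs["verify"] = ca_bundle
    elif not verify_ssl:
        # Disables certificate verification entirely; avoid in production.
        kwargs["verify"] = False
    return kwargs


# Example usage (the bundle path is a placeholder, credentials omitted):
# import boto3
# client = boto3.client(
#     "s3",
#     **s3_client_kwargs("https://netappp1.company.com",
#                        ca_bundle="/path/to/company-ca.pem"),
# )
```

Pointing verify at the CA bundle keeps certificate checking on, which is generally safer than turning verification off.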