Is it possible to decode a file on the GPU?

innat · December 15, 2022, 1:34pm

Is it possible to decode a file on the GPU while training a model? Resizing, rescaling, etc. can already be done as part of the model.

# Pseudocode: run the decode op on the GPU
with tf.device('/GPU:0'):
    tf.io.decode_*

# Pseudocode: decoding as the first layer of the model
model = Sequential(
    [
        ImageReader(),
        ImageResizer(),
        ImageNetModel(),
        ...
    ]
)

Reference: https://developer.nvidia.com/dali

Bhack · December 15, 2022, 2:07pm

I don't think we currently have GPU decoding.
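For context, here is a minimal sketch of the usual arrangement today, assuming decoding stays on the host CPU inside a `tf.data` pipeline while resizing and rescaling run as Keras layers on whatever device the model uses. The names `make_jpeg`, `decode`, and `preprocess` are illustrative, and the images are generated in memory only to keep the example self-contained.

```python
import tensorflow as tf


def make_jpeg(value):
    # Encode a small uniform-colour image so no files on disk are needed.
    pixels = tf.cast(tf.fill([64, 48, 3], value), tf.uint8)
    return tf.io.encode_jpeg(pixels)


# A 1-D string tensor holding four encoded JPEGs.
encoded = tf.stack([make_jpeg(10 * i) for i in range(4)])


def decode(contents):
    # This runs inside the tf.data pipeline, i.e. on the host CPU.
    image = tf.io.decode_jpeg(contents, channels=3)
    return tf.cast(image, tf.float32)


dataset = (
    tf.data.Dataset.from_tensor_slices(encoded)
    .map(decode, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(2)  # works here because all images share one size
    .prefetch(tf.data.AUTOTUNE)
)

# Resizing and rescaling as layers, so they execute with the model.
preprocess = tf.keras.Sequential([
    tf.keras.layers.Resizing(32, 32),
    tf.keras.layers.Rescaling(1.0 / 255),
])

batch = next(iter(dataset))
out = preprocess(batch)
print(out.shape)
```

In a real model the `preprocess` layers would sit in front of the network, so only the decode step is left on the CPU.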

We had a thread on preprocessing and decoding at:

github.com/keras-team/keras-cv
[Canary test] GridMask: Vectorized function call on the batch dimension
(keras-team:master ← bhack:patch-2, opened by bhack on 22 Feb 2022, 02:31PM UTC, +1 -1)

As we have discussed in https://github.com/keras-team/keras-cv/pull/143#issuecom…ment-1047215737, this is just a canary (failing) test (check the CI):

`ValueError: Input "maxval" of op 'RandomUniformInt' expected to be loop invariant.`

As I've mentioned in the thread, we really need to understand whether we want randomness inside the batch or between batches, and what the trade-offs are among computing overhead, contribution speed/code readability, and network convergence.

Also, I don't know if @joker-eph or @qlzh727 could explain to us the pros and cons of using `jit_compile` on a function vs. using `vectorized_map`, or whether they are orthogonal. With many CV transformations we cannot compile the function because the underlying `tf.raw_ops.ImageProjectiveTransformV3` op isn't supported by XLA.

/cc @chjort

We have also discussed something for Video:

Another emerging approach is:
RGB no more: Minimally-decoded JPEG Vision Transformers

river_shah · December 21, 2022, 7:23am

Does NVIDIA DALI suit your use case? I have not used it, but it could be worth a look.

Also, if you are using distribution strategies, see these experimental options:

@tf_export("distribute.InputOptions", v1=[])
class InputOptions(
    collections.namedtuple("InputOptions", [
        "experimental_fetch_to_device",
        "experimental_replication_mode",
        "experimental_place_dataset_on_device",
        "experimental_per_replica_buffer_size",
    ])):
...

Perhaps experimental_place_dataset_on_device does what you are looking for.
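A rough sketch of how that option might be set, assuming the constraints described in the `tf.distribute.InputOptions` documentation: placing the dataset on the device requires `PER_REPLICA` replication and `experimental_fetch_to_device=False`. The `dataset_fn` and `distribute` helpers are hypothetical, and this has not been checked to move image decoding onto the GPU; it only controls where the dataset itself is placed.

```python
import tensorflow as tf

# Assumption: this combination follows the InputOptions docs
# (place-on-device goes together with PER_REPLICA and no fetch-to-device).
options = tf.distribute.InputOptions(
    experimental_fetch_to_device=False,
    experimental_replication_mode=tf.distribute.InputReplicationMode.PER_REPLICA,
    experimental_place_dataset_on_device=True,
)


def dataset_fn(input_context):
    # Hypothetical input function; a real one would read and decode files.
    return tf.data.Dataset.range(8).batch(2)


def distribute(strategy):
    # PER_REPLICA mode is only supported through
    # distribute_datasets_from_function, not experimental_distribute_dataset.
    return strategy.distribute_datasets_from_function(dataset_fn, options)


print(options)
```

With a multi-GPU setup this would be called as `distribute(tf.distribute.MirroredStrategy())`, and the result iterated inside the training loop.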