Debugging XLA compile

I am trying to enable XLA using tf.function(fn, jit_compile=True), but I get the following error:

2022-05-18 00:48:53.146626: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at xla_ops.cc:287 : INVALID_ARGUMENT: Trying to access resource Resource-1419-at-0x561f8a868820 located in device /job:localhost/replica:0/task:0/device:CPU:0 from device /job:localhost/replica:0/task:0/device:GPU:0
Traceback (most recent call last):
  File "run_training.py", line 27, in <module>
    main()
  File "run_training.py", line 23, in main
    trainer.train()
  File "trainers.py", line 138, in train
    training_step(dataset_iter)
  File "miniconda3/envs/vlu/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "miniconda3/envs/vlu/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Trying to access resource Resource-1419-at-0x561f8a868820 located in device /job:localhost/replica:0/task:0/device:CPU:0 from device /job:localhost/replica:0/task:0/device:GPU:0 [Op:__inference_training_step_254644]

I understand that variables placed on different devices are a known issue. Is there a way to find the variable/resource that is on the CPU so that I can fix this error? The resource ID in the error message is difficult to make use of.
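
One way to narrow this down is to list where each variable actually lives. Below is a minimal sketch, not a confirmed fix for this error. It assumes the offending resource is a tf.Variable, and it relies only on the `.name` and `.device` attributes that tf.Variable exposes. The helper name `variables_off_device` and its `expected` parameter are hypothetical, and the demo uses stand-in objects so that it runs without TensorFlow installed.

```python
from types import SimpleNamespace


def variables_off_device(variables, expected="GPU"):
    """Return (name, device) pairs for variables not placed on `expected`.

    Hypothetical helper: it only reads the `.name` and `.device` attributes,
    so it accepts tf.Variable objects or any stand-in with the same fields.
    """
    needle = f"device:{expected}".lower()
    return [
        (v.name, v.device)
        for v in variables
        # An empty or missing device string is reported too, since it is
        # not known to be on the expected device.
        if needle not in (v.device or "").lower()
    ]


if __name__ == "__main__":
    # Stand-ins for tf.Variable objects (assumption: the real ones would come
    # from your own model or trainer).
    demo = [
        SimpleNamespace(
            name="dense/kernel:0",
            device="/job:localhost/replica:0/task:0/device:GPU:0",
        ),
        SimpleNamespace(
            name="step:0",
            device="/job:localhost/replica:0/task:0/device:CPU:0",
        ),
    ]
    for name, device in variables_off_device(demo):
        print(name, device)
```

In a real script you would pass your own variables instead of the stand-ins, for example `model.variables` if you have a Keras model (an assumption, since the thread does not show the trainer code). If the list comes back empty, the resource may not be a variable at all: other resources, such as dataset iterators or lookup tables, also have a device placement. Calling `tf.debugging.set_log_device_placement(True)` before building the model logs where each op is placed, which may help in that case.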


Hi, I am facing exactly the same problem:

2023-02-04 08:45:00.238849: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at xla_ops.cc:287 : INVALID_ARGUMENT: Trying to access resource Resource-3-at-0x3c25c20 located in device /job:localhost/replica:0/task:0/device:CPU:0 from device /job:localhost/replica:0/task:0/device:GPU:0
Any advice is appreciated.
