Object detection model building fails on Mac M2 with GPU usage with a weird error

gautam · September 10, 2023, 5:49pm

I am training a model on a custom dataset with mask_rcnn_inception_resnet as the base model. I have managed to complete a training run on Ubuntu (CPU). Now I am trying to make it work on a MacBook M2.

The test runs advised by various sources all succeed, for example the one from:

https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/install.html

From within TensorFlow/models/research/

python object_detection/builders/model_builder_tf2_test.py

But when I run my actual model training script, I get a weird error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__IteratorGetNext_output_types_18_device_/job:localhost/replica:0/task:0/device:GPU:0}} indices[0] = 0 is not in [0, 0)
	 [[{{node GatherV2_7}}]]
	 [[MultiDeviceIteratorGetNextFromShard]]
	 [[RemoteCall]] [Op:IteratorGetNext] name:
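
For readers unfamiliar with this message: `indices[0] = 0 is not in [0, 0)` is the bounds check of a gather op, and the half-open range `[0, 0)` means the tensor being indexed has zero entries along the gathered axis. The following is a plain-Python sketch of that check, not TensorFlow's implementation; `gather` is a hypothetical helper written only for illustration.

```python
def gather(params, indices):
    """Mimic the bounds check a gather op performs along the first axis."""
    n = len(params)
    out = []
    for pos, idx in enumerate(indices):
        # Every index must fall inside the half-open range [0, n).
        if not 0 <= idx < n:
            raise IndexError(f"indices[{pos}] = {idx} is not in [0, {n})")
        out.append(params[idx])
    return out


# Gathering from a non-empty list works as expected.
print(gather([10, 20, 30], [2, 0]))

# Gathering from an empty list reproduces the shape of the message above.
try:
    gather([], [0])
except IndexError as err:
    print(err)
```

So whatever `GatherV2_7` indexes into in the input pipeline appears to be empty for the first example that is read.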

The full console log is below:

python3 model_main_tf2.py --model_dir=models/ark_mask_rcnn_inception_resnet_v2 --pipeline_config_path=models/ark_mask_rcnn_inception_resnet_v2/pipeline.config
2023-09-10 23:11:55.486121: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M2 Pro
2023-09-10 23:11:55.486143: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 32.00 GB
2023-09-10 23:11:55.486147: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 10.67 GB
2023-09-10 23:11:55.486172: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:303] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-09-10 23:11:55.486190: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:269] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2023-09-10 23:11:55.487664: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:303] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-09-10 23:11:55.487673: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:269] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I0910 23:11:55.487923 8568659456 mirrored_strategy.py:419] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
INFO:tensorflow:Maybe overwriting train_steps: None
I0910 23:11:55.496275 8568659456 config_util.py:552] Maybe overwriting train_steps: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0910 23:11:55.496325 8568659456 config_util.py:552] Maybe overwriting use_bfloat16: False
WARNING:tensorflow:From /Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/object_detection/model_lib_v2.py:563: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
W0910 23:11:55.509262 8568659456 deprecation.py:364] From /Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/object_detection/model_lib_v2.py:563: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
INFO:tensorflow:Reading unweighted datasets: ['annotations/train.record']
I0910 23:11:55.512531 8568659456 dataset_builder.py:162] Reading unweighted datasets: ['annotations/train.record']
INFO:tensorflow:Reading record datasets for input file: ['annotations/train.record']
I0910 23:11:55.512593 8568659456 dataset_builder.py:79] Reading record datasets for input file: ['annotations/train.record']
INFO:tensorflow:Number of filenames to read: 1
I0910 23:11:55.512616 8568659456 dataset_builder.py:80] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0910 23:11:55.512634 8568659456 dataset_builder.py:86] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.deterministic`.
W0910 23:11:55.515808 8568659456 deprecation.py:364] From /Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/object_detection/builders/dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.deterministic`.
WARNING:tensorflow:From /Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/object_detection/builders/dataset_builder.py:235: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0910 23:11:55.524945 8568659456 deprecation.py:364] From /Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/object_detection/builders/dataset_builder.py:235: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From /Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/tensorflow/python/autograph/impl/api.py:459: calling map_fn (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
W0910 23:11:56.154265 8568659456 deprecation.py:569] From /Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/tensorflow/python/autograph/impl/api.py:459: calling map_fn (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
WARNING:tensorflow:From /Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py:1176: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0910 23:11:58.014597 8568659456 deprecation.py:364] From /Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py:1176: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0910 23:11:58.898831 8568659456 deprecation.py:364] From /Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
2023-09-10 23:12:00.177526: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2023-09-10 23:12:00.181435: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
Traceback (most recent call last):
  File "/Users/_dga/ml-git/tf-ark/Tensorflow/workspace/training_demo/model_main_tf2.py", line 126, in <module>
    tf.compat.v1.app.run()
  File "/Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/tensorflow/python/platform/app.py", line 36, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/Users/_dga/ml-git/tf-ark/Tensorflow/workspace/training_demo/model_main_tf2.py", line 117, in main
    model_lib_v2.train_loop(
  File "/Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/object_detection/model_lib_v2.py", line 605, in train_loop
    load_fine_tune_checkpoint(
  File "/Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/object_detection/model_lib_v2.py", line 401, in load_fine_tune_checkpoint
    _ensure_model_is_built(model, input_dataset, unpad_groundtruth_tensors)
  File "/Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/object_detection/model_lib_v2.py", line 161, in _ensure_model_is_built
    features, labels = iter(input_dataset).next()
  File "/Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/tensorflow/python/distribute/input_lib.py", line 260, in next
    return self.__next__()
  File "/Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/tensorflow/python/distribute/input_lib.py", line 264, in __next__
    return self.get_next()
  File "/Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/tensorflow/python/distribute/input_lib.py", line 325, in get_next
    return self._get_next_no_partial_batch_handling(name)
  File "/Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/tensorflow/python/distribute/input_lib.py", line 361, in _get_next_no_partial_batch_handling
    replicas.extend(self._iterators[i].get_next_as_list(new_name))
  File "/Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/tensorflow/python/distribute/input_lib.py", line 1427, in get_next_as_list
    return self._format_data_list_with_options(self._iterator.get_next())
  File "/Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py", line 553, in get_next
    result.append(self._device_iterators[i].get_next())
  File "/Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 867, in get_next
    return self._next_internal()
  File "/Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 777, in _next_internal
    ret = gen_dataset_ops.iterator_get_next(
  File "/Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 3028, in iterator_get_next
    _ops.raise_from_not_ok_status(e, name)
  File "/Users/_dga/anaconda3/envs/tf-ark/lib/python3.9/site-packages/tensorflow/python/framework/ops.py", line 6656, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__IteratorGetNext_output_types_18_device_/job:localhost/replica:0/task:0/device:GPU:0}} indices[0] = 0 is not in [0, 0)
	 [[{{node GatherV2_7}}]]
	 [[MultiDeviceIteratorGetNextFromShard]]
	 [[RemoteCall]] [Op:IteratorGetNext] name: 
(tf-ark)  _dga@ :>
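
Since the failure happens while the first batch is pulled from `annotations/train.record`, one thing that may be worth ruling out is a record whose instance masks are missing or do not match its boxes. This is only a guess at a possible cause, not a confirmed diagnosis. The sketch below assumes the standard TF Object Detection API feature keys `image/object/bbox/xmin` and `image/object/mask`; `summarize_example` and `scan_tfrecord` are hypothetical helper names.

```python
def summarize_example(num_boxes, num_masks):
    """Classify one record from its box count and mask count."""
    if num_boxes == 0:
        return "no boxes"
    if num_masks == 0:
        return "boxes but no masks"
    if num_masks != num_boxes:
        return "mask/box count mismatch"
    return "ok"


def scan_tfrecord(path):
    """Yield (record_index, status) for every example in a TFRecord file."""
    # Imported lazily so summarize_example stays usable without TensorFlow.
    import tensorflow as tf

    for index, raw in enumerate(tf.data.TFRecordDataset(path)):
        example = tf.train.Example.FromString(raw.numpy())
        feats = example.features.feature
        # Assumed keys: the ones the Object Detection API decoder reads.
        num_boxes = len(feats["image/object/bbox/xmin"].float_list.value)
        num_masks = len(feats["image/object/mask"].bytes_list.value)
        yield index, summarize_example(num_boxes, num_masks)
```

Looping over `scan_tfrecord("annotations/train.record")` and printing every entry whose status is not `"ok"` would show whether any record lacks masks.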

gautam · September 12, 2023, 1:51pm

In the absence of any response to this question / help request, I had to do my own investigation. My assumption is listed in the only answer here → python 3.x - Training custom data set model using mask_rcnn_inception from tensorflow model zoo on Macbook pro M2 - Stack Overflow

for future reference ^^

gautam · September 12, 2023, 4:12pm

This assumption also seems to fall flat; it could be a compatibility issue of the model with TF2 or Python. Multiple other people have reported bugs matching this error, but there is no resolution yet.

github.com/tensorflow/tensorflow

Error: "tensorflow.python.framework.errors_impl.InvalidArgumentError" when training Mask RCNN Inception Resnet V2 1024x1024 model

Opened 08:58PM - 13 Nov 20 UTC by ib124, closed 05:51AM - 01 Dec 20 UTC

Labels: stat:awaiting response, type:support, stale, TF 2.3

I am training a Mask R-CNN Inception ResNet V2 1024x1024 algorithm using my computer's GPU. This was downloaded from the TensorFlow Detection Model Zoo, and I labeled my images (dimensions of 1100x1100 pixels) with Label-img. Here is what I am working with:

- GPU: NVIDIA GEFORCE RTX 2060
- GPU: 16GB RAM, 6 processor cores
- TensorFlow: 2.3.1
- Python: 3.8.6
- CUDA: 10.1
- cuDNN: 7.6
- Anaconda 3 command prompt

All tfrecord files have been generated, and when I start to train my model using ```python model_main_tf2.py --model_dir=models/my_faster_rcnn --pipeline_config_path=models/my_faster_rcnn/pipeline.config```, I get the following errors:

```
Traceback (most recent call last):
  File "model_main_tf2.py", line 113, in <module>
    tf.compat.v1.app.run()
  File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\tensorflow\python\platform\app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\absl\app.py", line 303, in run
    _run_main(main, args)
  File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\absl\app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "model_main_tf2.py", line 104, in main
    model_lib_v2.train_loop(
  File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\object_detection\model_lib_v2.py", line 564, in train_loop
    load_fine_tune_checkpoint(detection_model,
  File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\object_detection\model_lib_v2.py", line 350, in load_fine_tune_checkpoint
    features, labels = iter(input_dataset).next()
  File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 645, in next
    return self.__next__()
  File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 649, in __next__
    return self.get_next()
  File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 694, in get_next
    self._iterators[i].get_next_as_list_static_shapes(new_name))
  File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 1474, in get_next_as_list_static_shapes
    return self._iterator.get_next()
  File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\tensorflow\python\data\ops\multi_device_iterator_ops.py", line 581, in get_next
    result.append(self._device_iterators[i].get_next())
  File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 825, in get_next
    return self._next_internal()
  File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 764, in _next_internal
    return structure.from_compatible_tensor_list(self._element_spec, ret)
  File "C:\user\anaconda3\envs\object_detection_api\lib\contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\tensorflow\python\eager\context.py", line 2105, in execution_mode
    executor_new.wait()
  File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\tensorflow\python\eager\executor.py", line 67, in wait
    pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[16] = 16 is not in [0, 0)
	 [[{{node GatherV2_7}}]]
	 [[MultiDeviceIteratorGetNextFromShard]]
	 [[RemoteCall]]
```

The config file that was used to run the model is:

```
# Mask R-CNN with Inception Resnet v2 (no atrous)
# Sync-trained on COCO (with 8 GPUs) with batch size 16 (1024x1024 resolution)
# Initialized from Imagenet classification checkpoint
#
# Train on GPU-8
#
# Achieves 40.4 box mAP and 35.5 mask mAP on COCO17 val

model {
  faster_rcnn {
    number_of_stages: 3
    num_classes: 1
    image_resizer {
      fixed_shape_resizer {
        height: 1024
        width: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_inception_resnet_v2_keras'
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 17
    maxpool_kernel_size: 1
    maxpool_stride: 1
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
        mask_height: 33
        mask_width: 33
        mask_prediction_conv_depth: 0
        mask_prediction_num_conv_layers: 4
        conv_hyperparams {
          op: CONV
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.01
            }
          }
        }
        predict_instance_masks: true
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
    second_stage_mask_prediction_loss_weight: 4.0
    resize_masks: false
  }
}

train_config: {
  batch_size: 1
  num_steps: 200000
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: 0.008
          total_steps: 200000
          warmup_learning_rate: 0.0
          warmup_steps: 5000
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint: "pre-trained-models/mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}

train_input_reader: {
  label_map_path: "annotations/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "annotations/train.record"
  }
  load_instance_masks: true
  mask_type: PNG_MASKS
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  metrics_set: "coco_mask_metrics"
  eval_instance_masks: true
  use_moving_averages: false
  batch_size: 1
  include_metrics_per_category: true
}

eval_input_reader: {
  label_map_path: "annotations/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "annotations/test.record"
  }
  load_instance_masks: true
  mask_type: PNG_MASKS
}
```

**What can be done to fix this?**

##############################################

Below are the scripts that are referenced in the error:

File "model_main_tf2.py", line 113:

```
#Lines 74-113:
def main(unused_argv):
  flags.mark_flag_as_required('model_dir')
  flags.mark_flag_as_required('pipeline_config_path')
  tf.config.set_soft_device_placement(True)

  if FLAGS.checkpoint_dir:
    model_lib_v2.eval_continuously(
        pipeline_config_path=FLAGS.pipeline_config_path,
        model_dir=FLAGS.model_dir,
        train_steps=FLAGS.num_train_steps,
        sample_1_of_n_eval_examples=FLAGS.sample_1_of_n_eval_examples,
        sample_1_of_n_eval_on_train_examples=(
            FLAGS.sample_1_of_n_eval_on_train_examples),
        checkpoint_dir=FLAGS.checkpoint_dir,
        wait_interval=300, timeout=FLAGS.eval_timeout)
  else:
    if FLAGS.use_tpu:
      # TPU is automatically inferred if tpu_name is None and
      # we are running under cloud ai-platform.
      resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
          FLAGS.tpu_name)
      tf.config.experimental_connect_to_cluster(resolver)
      tf.tpu.experimental.initialize_tpu_system(resolver)
      strategy = tf.distribute.experimental.TPUStrategy(resolver)
    elif FLAGS.num_workers > 1:
      strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
    else:
      strategy = tf.compat.v2.distribute.MirroredStrategy()

    with strategy.scope():
      model_lib_v2.train_loop(
          pipeline_config_path=FLAGS.pipeline_config_path,
          model_dir=FLAGS.model_dir,
          train_steps=FLAGS.num_train_steps,
          use_tpu=FLAGS.use_tpu,
          checkpoint_every_n=FLAGS.checkpoint_every_n,
          record_summaries=FLAGS.record_summaries)

if __name__ == '__main__':
  tf.compat.v1.app.run()
```

File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\tensorflow\python\platform\app.py", line 40:

```
#Lines 17-40:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import sys as _sys

from absl.app import run as _run

from tensorflow.python.platform import flags
from tensorflow.python.util.tf_export import tf_export


def _parse_flags_tolerate_undef(argv):
  """Parse args, returning any unknown flags (ABSL defaults to crashing)."""
  return flags.FLAGS(_sys.argv if argv is None else argv, known_only=True)


@tf_export(v1=['app.run'])
def run(main=None, argv=None):
  """Runs the program with an optional 'main' function and 'argv' list."""

  main = main or _sys.modules['__main__'].main

  _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
```

File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\absl\app.py", line 303:

```
#Lines 294-328:
  try:
    args = _run_init(
        sys.argv if argv is None else argv,
        flags_parser,
    )
    while _init_callbacks:
      callback = _init_callbacks.popleft()
      callback()
    try:
      _run_main(main, args)
    except UsageError as error:
      usage(shorthelp=True, detailed_error=error, exitcode=error.exitcode)
    except:
      exc = sys.exc_info()[1]
      # Don't try to post-mortem debug successful SystemExits, since those
      # mean there wasn't actually an error. In particular, the test framework
      # raises SystemExit(False) even if all tests passed.
      if isinstance(exc, SystemExit) and not exc.code:
        raise

      # Check the tty so that we don't hang waiting for input in an
      # non-interactive scenario.
      if FLAGS.pdb_post_mortem and sys.stdout.isatty():
        traceback.print_exc()
        print()
        print(' *** Entering post-mortem debugging ***')
        print()
        pdb.post_mortem()
      raise
  except Exception as e:
    _call_exception_handlers(e)
    raise

# Callbacks which have been deferred until after _run_init has been called.
_init_callbacks = collections.deque()
```

File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\absl\app.py", line 251:

```
#Lines 231-251:
def _run_main(main, argv):
  """Calls main, optionally with pdb or profiler."""
  if FLAGS.run_with_pdb:
    sys.exit(pdb.runcall(main, argv))
  elif FLAGS.run_with_profiling or FLAGS.profile_file:
    # Avoid import overhead since most apps (including performance-sensitive
    # ones) won't be run with profiling.
    import atexit
    if FLAGS.use_cprofile_for_profiling:
      import cProfile as profile
    else:
      import profile
    profiler = profile.Profile()
    if FLAGS.profile_file:
      atexit.register(profiler.dump_stats, FLAGS.profile_file)
    else:
      atexit.register(profiler.print_stats)
    retval = profiler.runcall(main, argv)
    sys.exit(retval)
  else:
    sys.exit(main(argv))
```

File "model_main_tf2.py", line 104:

```
#Lines 74-113:
def main(unused_argv):
  flags.mark_flag_as_required('model_dir')
  flags.mark_flag_as_required('pipeline_config_path')
  tf.config.set_soft_device_placement(True)

  if FLAGS.checkpoint_dir:
    model_lib_v2.eval_continuously(
        pipeline_config_path=FLAGS.pipeline_config_path,
        model_dir=FLAGS.model_dir,
        train_steps=FLAGS.num_train_steps,
        sample_1_of_n_eval_examples=FLAGS.sample_1_of_n_eval_examples,
        sample_1_of_n_eval_on_train_examples=(
            FLAGS.sample_1_of_n_eval_on_train_examples),
        checkpoint_dir=FLAGS.checkpoint_dir,
        wait_interval=300, timeout=FLAGS.eval_timeout)
  else:
    if FLAGS.use_tpu:
      # TPU is automatically inferred if tpu_name is None and
      # we are running under cloud ai-platform.
      resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
          FLAGS.tpu_name)
      tf.config.experimental_connect_to_cluster(resolver)
      tf.tpu.experimental.initialize_tpu_system(resolver)
      strategy = tf.distribute.experimental.TPUStrategy(resolver)
    elif FLAGS.num_workers > 1:
      strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
    else:
      strategy = tf.compat.v2.distribute.MirroredStrategy()

    with strategy.scope():
      model_lib_v2.train_loop(
          pipeline_config_path=FLAGS.pipeline_config_path,
          model_dir=FLAGS.model_dir,
          train_steps=FLAGS.num_train_steps,
          use_tpu=FLAGS.use_tpu,
          checkpoint_every_n=FLAGS.checkpoint_every_n,
          record_summaries=FLAGS.record_summaries)

if __name__ == '__main__':
  tf.compat.v1.app.run()
```

File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\object_detection\model_lib_v2.py", line 564:

```
#Line 545-569:
  if record_summaries:
    summary_writer = tf.compat.v2.summary.create_file_writer(
        summary_writer_filepath)
  else:
    summary_writer = tf2.summary.create_noop_writer()

  if use_tpu:
    num_steps_per_iteration = 100
  else:
    # TODO(b/135933080) Explore setting to 100 when GPU performance issues
    # are fixed.
    num_steps_per_iteration = 1

  with summary_writer.as_default():
    with strategy.scope():
      with tf.compat.v2.summary.record_if(
          lambda: global_step % num_steps_per_iteration == 0):
        # Load a fine-tuning checkpoint.
        if train_config.fine_tune_checkpoint:
          load_fine_tune_checkpoint(detection_model,
                                    train_config.fine_tune_checkpoint,
                                    fine_tune_checkpoint_type,
                                    fine_tune_checkpoint_version,
                                    train_input,
                                    unpad_groundtruth_tensors)
```

File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\object_detection\model_lib_v2.py", line 350:

```
#Lines 312-350:
def load_fine_tune_checkpoint(
    model, checkpoint_path, checkpoint_type, checkpoint_version, input_dataset,
    unpad_groundtruth_tensors):
  """Load a fine tuning classification or detection checkpoint.

  To make sure the model variables are all built, this method first executes
  the model by computing a dummy loss. (Models might not have built their
  variables before their first execution)

  It then loads an object-based classification or detection checkpoint.

  This method updates the model in-place and does not return a value.

  Args:
    model: A DetectionModel (based on Keras) to load a fine-tuning
      checkpoint for.
    checkpoint_path: Directory with checkpoints file or path to checkpoint.
    checkpoint_type: Whether to restore from a full detection
      checkpoint (with compatible variable names) or to restore from a
      classification checkpoint for initialization prior to training.
      Valid values: `detection`, `classification`.
    checkpoint_version: train_pb2.CheckpointVersion.V1 or V2 enum indicating
      whether to load checkpoints in V1 style or V2 style. In this binary
      we only support V2 style (object-based) checkpoints.
    input_dataset: The tf.data Dataset the model is being trained on. Needed
      to get the shapes for the dummy loss computation.
    unpad_groundtruth_tensors: A parameter passed to unstack_batch.

  Raises:
    IOError: if `checkpoint_path` does not point at a valid object-based
      checkpoint
    ValueError: if `checkpoint_version` is not train_pb2.CheckpointVersion.V2
  """
  if not is_object_based_checkpoint(checkpoint_path):
    raise IOError('Checkpoint is expected to be an object-based checkpoint.')
  if checkpoint_version == train_pb2.CheckpointVersion.V1:
    raise ValueError('Checkpoint version should be V2')

  features, labels = iter(input_dataset).next()
```

File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\tensorflow\python\distribute\input_lib.py", issues with line 645, 645, 694:

```
#Lines 615-728:
class DistributedIteratorBase(DistributedIteratorInterface):
  """Common implementation for all input iterators."""

  # pylint: disable=super-init-not-called
  def __init__(self, input_workers, iterators, strategy):
    static_shape = _get_static_shape(iterators)

    # TODO(b/133073708): we currently need a flag to control the usage because
    # there is a performance difference between get_next() and
    # get_next_as_optional(). And we only enable get_next_as_optional when the
    # output shapes are not static.
    #
    # TODO(rxsang): We want to always enable the get_next_as_optional behavior
    # when user passed input_fn instead of dataset.
    if getattr(
        strategy.extended, "experimental_enable_get_next_as_optional", False):
      self._enable_get_next_as_optional = (
          not static_shape) or strategy.extended._in_multi_worker_mode()
    else:
      self._enable_get_next_as_optional = False

    assert isinstance(input_workers, InputWorkers)
    if not input_workers.worker_devices:
      raise ValueError("Should have at least one worker for input iterator.")

    self._iterators = iterators
    self._input_workers = input_workers
    self._strategy = strategy

  def next(self):
    return self.__next__()

  def __next__(self):
    try:
      return self.get_next()
    except errors.OutOfRangeError:
      raise StopIteration

  def __iter__(self):
    return self

  def get_next_as_optional(self):
    global_has_value, replicas = _get_next_as_optional(self, self._strategy)

    def return_none():
      return optional_ops.Optional.empty(self._element_spec)

    def return_value(replicas):
      """Wraps the inputs for replicas in an `tf.experimental.Optional`."""
      results = []
      for i, worker in enumerate(self._input_workers.worker_devices):
        with ops.device(worker):
          devices = self._input_workers.compute_devices_for_worker(i)
          for j, device in enumerate(devices):
            with ops.device(device):
              result = replicas[i][j]
              results.append(result)
      replicas = results

      return optional_ops.Optional.from_value(
          distribute_utils.regroup(replicas))

    return control_flow_ops.cond(global_has_value,
                                 lambda: return_value(replicas),
                                 lambda: return_none())  # pylint: disable=unnecessary-lambda

  def get_next(self, name=None):
    """Returns the next input from the iterator for all replicas."""
    if not self._enable_get_next_as_optional:
      replicas = []
      for i, worker in enumerate(self._input_workers.worker_devices):
        if name is not None:
          d = tf_device.DeviceSpec.from_string(worker)
          new_name = "%s_%s_%d" % (name, d.job, d.task)
        else:
          new_name = None
        with ops.device(worker):
          # Make `replicas` a flat list of values across all replicas.
          replicas.extend(
              self._iterators[i].get_next_as_list_static_shapes(new_name))
      return distribute_utils.regroup(replicas)

    out_of_range_replicas = []
    def out_of_range_fn(worker_index, device):
      """This function will throw an OutOfRange error."""
      # As this will be only called when there is no data left, so calling
      # get_next() will trigger an OutOfRange error.
      data = self._iterators[worker_index].get_next(device)
      out_of_range_replicas.append(data)
      return data

    global_has_value, replicas = _get_next_as_optional(self, self._strategy)
    results = []
    for i, worker in enumerate(self._input_workers.worker_devices):
      with ops.device(worker):
        devices = self._input_workers.compute_devices_for_worker(i)
        for j, device in enumerate(devices):
          with ops.device(device):
            # pylint: disable=undefined-loop-variable
            # pylint: disable=cell-var-from-loop
            # It is fine for the lambda to capture variables from the loop as
            # the lambda is executed in the loop as well.
            result = control_flow_ops.cond(
                global_has_value,
                lambda: replicas[i][j],
                lambda: out_of_range_fn(i, device),
                strict=True,
            )
            # pylint: enable=cell-var-from-loop
            # pylint: enable=undefined-loop-variable
            results.append(result)
    replicas = results

    return distribute_utils.regroup(replicas)
```

File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 1474

```
#Lines 1459-1474:
  def get_next_as_list_static_shapes(self, name=None):
    """Get next element from the underlying iterator.

    Runs the iterator get_next() within a device scope. Since this doesn't use
    get_next_as_optional(), is is considerably faster than get_next_as_list()
    (but can only be used when the shapes are static).

    Args:
      name: not used.

    Returns:
      A list consisting of the next data from each device.
    """
    del name
    with ops.device(self._worker):
      return self._iterator.get_next()
```

File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\tensorflow\python\data\ops\multi_device_iterator_ops.py", line 581:

```
#Lines 572-588:
  def get_next(self, device=None):
    """Returns the next element given a `device`, else returns all in a list."""
    if device is not None:
      index = self._devices.index(device)
      return self._device_iterators[index].get_next()

    result = []
    for i, device in enumerate(self._devices):
      with ops.device(device):
        result.append(self._device_iterators[i].get_next())
    return result

  def __iter__(self):
    return self

  def __next__(self):
    return self.next()
```

File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 764 and 825:

```
#Lines 750-834:
    with context.execution_mode(context.SYNC):
      with ops.device(self._device):
        # TODO(ashankar): Consider removing this ops.device() context manager
        # and instead mimic ops placement in graphs: Operations on resource
        # handles execute on the same device as where the resource is placed.
        ret = gen_dataset_ops.iterator_get_next(
            self._iterator_resource,
            output_types=self._flat_output_types,
            output_shapes=self._flat_output_shapes)

      try:
        # Fast path for the case `self._structure` is not a nested structure.
        return self._element_spec._from_compatible_tensor_list(ret)  # pylint: disable=protected-access
      except AttributeError:
        return structure.from_compatible_tensor_list(self._element_spec, ret)

  @property
  def _type_spec(self):
    return IteratorSpec(self.element_spec)

  def next(self):
    try:
      return self._next_internal()
    except errors.OutOfRangeError:
      raise StopIteration

  @property
  @deprecation.deprecated(
      None, "Use `tf.compat.v1.data.get_output_classes(iterator)`.")
  def output_classes(self):
    """Returns the class of each component of an element of this iterator.

    The expected values are `tf.Tensor` and `tf.sparse.SparseTensor`.

    Returns:
      A nested structure of Python `type` objects corresponding to each
      component of an element of this dataset.
    """
    return nest.map_structure(
        lambda component_spec: component_spec._to_legacy_output_classes(),  # pylint: disable=protected-access
        self._element_spec)

  @property
  @deprecation.deprecated(
      None, "Use `tf.compat.v1.data.get_output_shapes(iterator)`.")
  def output_shapes(self):
    """Returns the shape of each component of an element of this iterator.

    Returns:
      A nested structure of `tf.TensorShape` objects corresponding to each
      component of an element of this dataset.
    """
    return nest.map_structure(
        lambda component_spec: component_spec._to_legacy_output_shapes(),  # pylint: disable=protected-access
        self._element_spec)

  @property
  @deprecation.deprecated(
      None, "Use `tf.compat.v1.data.get_output_types(iterator)`.")
  def output_types(self):
    """Returns the type of each component of an element of this iterator.

    Returns:
      A nested structure of `tf.DType` objects corresponding to each component
      of an element of this dataset.
""" return nest.map_structure( lambda component_spec: component_spec._to_legacy_output_types(), # pylint: disable=protected-access self._element_spec) @property def element_spec(self): return self._element_spec def get_next(self): return self._next_internal() def get_next_as_optional(self): # pylint: disable=protected-access return optional_ops._OptionalImpl( gen_dataset_ops.iterator_get_next_as_optional( self._iterator_resource, output_types=structure.get_flat_tensor_types(self.element_spec), output_shapes=structure.get_flat_tensor_shapes( self.element_spec)), self.element_spec) ``` File "C:\user\anaconda3\envs\object_detection_api\lib\contextlib.py", line 131: ``` #Lines 97-162: class _GeneratorContextManager(_GeneratorContextManagerBase, AbstractContextManager, ContextDecorator): """Helper for @contextmanager decorator.""" def _recreate_cm(self): # _GCM instances are one-shot context managers, so the # CM must be recreated each time a decorated function is # called return self.__class__(self.func, self.args, self.kwds) def __enter__(self): # do not keep args and kwds alive unnecessarily # they are only needed for recreation, which is not possible anymore del self.args, self.kwds, self.func try: return next(self.gen) except StopIteration: raise RuntimeError("generator didn't yield") from None def __exit__(self, type, value, traceback): if type is None: try: next(self.gen) except StopIteration: return False else: raise RuntimeError("generator didn't stop") else: if value is None: # Need to force instantiation so we can reliably # tell if we get the same exception back value = type() try: self.gen.throw(type, value, traceback) except StopIteration as exc: # Suppress StopIteration *unless* it's the same exception that # was passed to throw(). This prevents a StopIteration # raised inside the "with" statement from being suppressed. return exc is not value except RuntimeError as exc: # Don't re-raise the passed in exception. 
(issue27122) if exc is value: return False # Likewise, avoid suppressing if a StopIteration exception # was passed to throw() and later wrapped into a RuntimeError # (see PEP 479). if type is StopIteration and exc.__cause__ is value: return False raise except: # only re-raise if it's *not* the exception that was # passed to throw(), because __exit__() must not raise # an exception unless __exit__() itself failed. But throw() # has to raise the exception to signal propagation, so this # fixes the impedance mismatch between the throw() protocol # and the __exit__() protocol. # # This cannot use 'except BaseException as exc' (as in the # async implementation) to maintain compatibility with # Python 2, where old-style class exceptions are not caught # by 'except BaseException'. if sys.exc_info()[1] is value: return False raise raise RuntimeError("generator didn't stop after throw()") ``` File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\tensorflow\python\eager\context.py", line 2105: ``` #Lines 2001-2013: def graph_mode(): """Context-manager to disable eager execution for the current thread.""" return context()._mode(GRAPH_MODE) # pylint: disable=protected-access def eager_mode(): """Context-manager to enable eager execution for the current thread.""" return context()._mode(EAGER_MODE) # pylint: disable=protected-access def scope_name(): """Name of the current scope.""" return context().scope_name ``` File "C:\user\anaconda3\envs\object_detection_api\lib\site-packages\tensorflow\python\eager\executor.py", line 67: ``` #Lines 24-76: class Executor(object): """A class for handling eager execution. The default behavior for asynchronous execution is to serialize all ops on a single thread. 
Having different `Executor` objects in different threads enables executing ops asynchronously in parallel: ```python def thread_function(): executor = executor.Executor(enable_async=True): context.set_executor(executor) a = threading.Thread(target=thread_function) a.start() b = threading.Thread(target=thread_function) b.start() """ def __init__(self, handle): self._handle = handle def __del__(self): try: # pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle) pywrap_tfe.TFE_DeleteExecutor(self._handle) except TypeError: # Suppress some exceptions, mainly for the case when we're running on # module deletion. Things that can go wrong include the pywrap module # already being unloaded, self._handle. no longer being # valid, and so on. Printing warnings in these cases is silly # (exceptions raised from __del__ are printed as warnings to stderr). pass # 'NoneType' object is not callable when the handle has been # partially unloaded. def is_async(self): return pywrap_tfe.TFE_ExecutorIsAsync(self._handle) def handle(self): return self._handle def wait(self): """Waits for ops dispatched in this executor to finish.""" pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle) def clear_error(self): """Clears errors raised in this executor during execution.""" pywrap_tfe.TFE_ExecutorClearError(self._handle) def new_executor(enable_async): handle = pywrap_tfe.TFE_NewExecutor(enable_async) return Executor(handle) ```
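
None of the library code quoted above looks like the source of the failure; those frames only surface an error raised inside the input pipeline by a gather node (`GatherV2_...`). The message `indices[i] = i is not in [0, 0)` reads as "index i was requested from an axis of length 0". Below is a minimal plain-Python sketch of that bounds check. It is not TensorFlow's implementation, and the `gather` helper and the `boxes`/`masks` names are hypothetical, chosen only to show how an empty per-object field could produce this exact wording.

```python
def gather(params, indices):
    """Pick params[i] for each i in indices, with a gather-style bounds check.

    Hypothetical helper for illustration only; it mimics the wording of the
    TensorFlow error, not the real op.
    """
    size = len(params)
    picked = []
    for pos, idx in enumerate(indices):
        if not 0 <= idx < size:
            raise IndexError(f"indices[{pos}] = {idx} is not in [0, {size})")
        picked.append(params[idx])
    return picked


# Assumption: two annotated objects, but one per-object field was never filled.
boxes = ["box0", "box1"]
masks = []  # empty field, so its valid index range is [0, 0)

print(gather(boxes, [0, 1]))  # indexing the populated field works

try:
    gather(masks, [0, 1])     # same indices against the empty field
except IndexError as err:
    message = str(err)
    print(message)
```

If this reading is right, the thing to inspect is the training data (which per-object fields the records actually contain compared with what the pipeline config asks the decoder to load), rather than the iterator code in the traceback.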

gautam · September 12, 2023, 4:14pm

github.com/tensorflow/models

tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[10] = 10 is not in [0, 0)

opened 08:42AM - 07 Aug 20 UTC

magedhelmy1

type:bug models:research:odapi

# Prerequisites

Please answer the following questions for yourself before submitting an issue.

- [x ] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
- [x ] I am reporting the issue to the correct repository. (Model Garden official or research directory)
- [x ] I checked to make sure that this issue has not already been filed.

## 1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md

## 2. Describe the bug

I am trying to write the equivalent of this [code][1] which converts CSV to TF records but instead, I am trying to convert from JSON to TFrecords. I am trying to generate TFrecords for using it in [object detection API][2].

Here is my full error message

    Traceback (most recent call last):
      File "model_main_tf2.py", line 113, in <module>
        tf.compat.v1.app.run()
      File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\platform\app.py", line 40, in run
        _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
      File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\absl\app.py", line 299, in run
        _run_main(main, args)
      File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\absl\app.py", line 250, in _run_main
        sys.exit(main(argv))
      File "model_main_tf2.py", line 109, in main
        record_summaries=FLAGS.record_summaries)
      File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\object_detection\model_lib_v2.py", line 561, in train_loop
        unpad_groundtruth_tensors)
      File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\object_detection\model_lib_v2.py", line 342, in load_fine_tune_checkpoint
        features, labels = iter(input_dataset).next()
      File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 645, in next
        return self.__next__()
      File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 649, in __next__
        return self.get_next()
      File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 694, in get_next
        self._iterators[i].get_next_as_list_static_shapes(new_name))
      File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 1474, in get_next_as_list_static_shapes
        return self._iterator.get_next()
      File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\data\ops\multi_device_iterator_ops.py", line 581, in get_next
        result.append(self._device_iterators[i].get_next())
      File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 825, in get_next
        return self._next_internal()
      File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 764, in _next_internal
        return structure.from_compatible_tensor_list(self._element_spec, ret)
      File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\contextlib.py", line 99, in __exit__
        self.gen.throw(type, value, traceback)
      File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\eager\context.py", line 2105, in execution_mode
        executor_new.wait()
      File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\eager\executor.py", line 67, in wait
        pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[12] = 12 is not in [0, 0)
         [[{{node GatherV2_4}}]]
         [[MultiDeviceIteratorGetNextFromShard]]
         [[RemoteCall]]

And here is my code, which is an attempt to convert JSON files into TFrecords

Sample JSON file

    {
      "0.jpg59329": {
        "filename": "0.jpg",
        "size": 59329,
        "regions": [{
          "shape_attributes": {
            "name": "rect",
            "x": 412,
            "y": 130,
            "width": 95,
            "height": 104
          },
          "region_attributes": {}
        }, {
          "shape_attributes": {
            "name": "rect",
            "x": 521,
            "y": 82,
            "width": 126,
            "height": 106
          },
          "region_attributes": {}
        }
      }
    }

My Python Code

    # Ref 1: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md
    # Ref 2: https://github.com/datitran/raccoon_dataset/blob/master/generate_tfrecord.py
    import json
    import glob
    from object_detection.utils import dataset_util
    import tensorflow as tf
    from pathlib import Path

    flags = tf.compat.v1.app.flags
    flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
    FLAGS = flags.FLAGS


    def json_to_tf(jsonFile, im):
        with open(im, "rb") as image:
            encoded_image_data = image.read()

        with open(jsonFile) as json_file:
            data = json.load(json_file)

            for key, value in data.items():
                width = 1920
                height = 1080
                filename = value["filename"]
                filename = filename.encode('utf8')
                image_format = b'jpeg'
                xmins = []
                xmaxs = []
                ymins = []
                ymaxs = []
                classes_text = []
                classes = []

                for x in value["regions"]:
                    xmins.append(x["shape_attributes"]['x'])
                    xmaxs.append(x["shape_attributes"]['width'] + x["shape_attributes"]['x'])
                    ymins.append(x["shape_attributes"]['y'])
                    ymaxs.append(x["shape_attributes"]['height'] + x["shape_attributes"]['y'])
                    classes_text.append("cars".encode('utf8'))
                    classes.append(1)

        tf_example = tf.train.Example(features=tf.train.Features(feature={
            'image/height': dataset_util.int64_feature(height),
            'image/width': dataset_util.int64_feature(width),
            'image/filename': dataset_util.bytes_feature(filename),
            'image/source_id': dataset_util.bytes_feature(filename),
            'image/encoded': dataset_util.bytes_feature(encoded_image_data),
            'image/format': dataset_util.bytes_feature(image_format),
            'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
            'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
            'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
            'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
            'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
            'image/object/class/label': dataset_util.int64_list_feature(classes),
        }))
        return tf_example


    writer = tf.compat.v1.python_io.TFRecordWriter("train.record")

    for fn in glob.glob("annotation_refined\\*.json"):
        for img in glob.glob("images\\*.jpg"):
            if Path(fn).stem == Path(img).stem:
                tf_example_1 = json_to_tf(fn, img)
                writer.write(tf_example_1.SerializeToString())

    writer.close()

Can someone give any tips on what is going wrong?

[1]: https://github.com/abdelrahman-gaber/tf2-object-detection-api-tutorial/blob/master/data_gen/generate_tfrecord.py
[2]: https://github.com/tensorflow/models/tree/master/research/object_detection

## 3. Steps to reproduce

I execute the code above to convert JSON to TF records

## 4. Expected behavior

A TF record that works which I can use when executing the following

```
out_dir=output/
mkdir -p $out_dir
python model_main_tf2.py --alsologtostderr --model_dir=$out_dir --checkpoint_every_n=1 \
    --pipeline_config_path=mask_rcnn_inception_resnet_v2.config \
    --eval_on_train_data 2>&1 | tee $out_dir/train.log
```

Here is my config file

```
# Mask R-CNN with Inception Resnet v2 (no atrous)
# Sync-trained on COCO (with 8 GPUs) with batch size 16 (1024x1024 resolution)
# Initialized from Imagenet classification checkpoint
#
# Train on GPU-8
#
# Achieves 40.4 box mAP and 35.5 mask mAP on COCO17 val

model {
  faster_rcnn {
    number_of_stages: 3
    num_classes: 1
    image_resizer {
      fixed_shape_resizer {
        height: 1024
        width: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_inception_resnet_v2_keras'
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 17
    maxpool_kernel_size: 1
    maxpool_stride: 1
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
        mask_height: 33
        mask_width: 33
        mask_prediction_conv_depth: 0
        mask_prediction_num_conv_layers: 4
        conv_hyperparams {
          op: CONV
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.01
            }
          }
        }
        predict_instance_masks: true
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
    second_stage_mask_prediction_loss_weight: 4.0
    resize_masks: false
  }
}

train_config: {
  batch_size: 16
  num_steps: 2
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: 0.008
          total_steps: 2
          warmup_learning_rate: 0.0
          warmup_steps: 1
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint: "mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}

train_input_reader: {
  label_map_path: "capillary_labelmap.pbtxt"
  tf_record_input_reader {
    input_path: "train.record"
  }
  load_instance_masks: true
  mask_type: PNG_MASKS
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  metrics_set: "coco_mask_metrics"
  eval_instance_masks: true
  use_moving_averages: false
  batch_size: 512
  include_metrics_per_category: true
}

eval_input_reader: {
  label_map_path: "capillary_labelmap.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "train.record"
  }
  load_instance_masks: true
  mask_type: PNG_MASKS
}
```

## 5. Additional context

```
2020-08-07 10:38:35.337851: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-08-07 10:38:38.463844: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2020-08-07 10:38:38.513432: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: pciBusID: 0000:09:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5 coreClock: 1.74GHz coreCount: 68 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 573.69GiB/s
2020-08-07 10:38:38.513852: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-08-07 10:38:38.518276: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-08-07 10:38:38.522366: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-08-07 10:38:38.524155: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-08-07 10:38:38.529030: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-08-07 10:38:38.531871: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-08-07 10:38:38.540810: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-08-07 10:38:38.542364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-08-07 10:38:38.543192: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library
(oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2020-08-07 10:38:38.554233: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x21c5e9cf1e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices: 2020-08-07 10:38:38.554563: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version 2020-08-07 10:38:38.555417: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: pciBusID: 0000:09:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5 coreClock: 1.74GHz coreCount: 68 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 573.69GiB/s 2020-08-07 10:38:38.555804: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll 2020-08-07 10:38:38.556098: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll 2020-08-07 10:38:38.556281: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll 2020-08-07 10:38:38.556479: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll 2020-08-07 10:38:38.556678: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll 2020-08-07 10:38:38.556864: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll 2020-08-07 10:38:38.557049: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll 2020-08-07 10:38:38.558452: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0 2020-08-07 10:38:39.469766: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device 
interconnect StreamExecutor with strength 1 edge matrix: 2020-08-07 10:38:39.470008: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0 2020-08-07 10:38:39.470183: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N 2020-08-07 10:38:39.471856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8583 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:09:00.0, compute capability: 7.5) 2020-08-07 10:38:39.474601: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x21c05d6bb10 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices: 2020-08-07 10:38:39.474889: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5 INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',) I0807 10:38:39.477294 26156 mirrored_strategy.py:341] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',) INFO:tensorflow:Maybe overwriting train_steps: None I0807 10:38:39.482280 26156 config_util.py:552] Maybe overwriting train_steps: None INFO:tensorflow:Maybe overwriting use_bfloat16: False I0807 10:38:39.482280 26156 config_util.py:552] Maybe overwriting use_bfloat16: False WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards. W0807 10:38:39.949601 26156 dataset_builder.py:83] num_readers has been reduced to 1 to match input file shards. WARNING:tensorflow:From C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\object_detection\builders\dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. 
If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`. W0807 10:38:39.953629 26156 deprecation.py:323] From C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\object_detection\builders\dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`. WARNING:tensorflow:From C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\object_detection\builders\dataset_builder.py:175: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.data.Dataset.map() W0807 10:38:39.996484 26156 deprecation.py:323] From C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\object_detection\builders\dataset_builder.py:175: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.data.Dataset.map() WARNING:tensorflow:From C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\object_detection\data_decoders\tf_example_decoder.py:703: calling map_fn (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version. Instructions for updating: Use fn_output_signature instead W0807 10:38:41.664895 26156 deprecation.py:506] From C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\object_detection\data_decoders\tf_example_decoder.py:703: calling map_fn (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version. 
Instructions for updating: Use fn_output_signature instead WARNING:tensorflow:From C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\util\dispatch.py:201: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version. Instructions for updating: Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead. W0807 10:38:47.547355 26156 deprecation.py:323] From C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\util\dispatch.py:201: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version. Instructions for updating: Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead. WARNING:tensorflow:From C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\object_detection\inputs.py:259: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.cast` instead. W0807 10:38:50.558459 26156 deprecation.py:323] From C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\object_detection\inputs.py:259: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use `tf.cast` instead. 
Traceback (most recent call last): File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\eager\context.py", line 2102, in execution_mode yield File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 758, in _next_internal output_shapes=self._flat_output_shapes) File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\ops\gen_dataset_ops.py", line 2610, in iterator_get_next _ops.raise_from_not_ok_status(e, name) File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\framework\ops.py", line 6843, in raise_from_not_ok_status six.raise_from(core._status_to_exception(e.code, message), None) File "<string>", line 3, in raise_from tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[10] = 10 is not in [0, 0) [[{{node GatherV2_4}}]] [[MultiDeviceIteratorGetNextFromShard]] [[RemoteCall]] [Op:IteratorGetNext] During handling of the above exception, another exception occurred: Traceback (most recent call last): File "model_main_tf2.py", line 113, in <module> tf.compat.v1.app.run() File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\platform\app.py", line 40, in run _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef) File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\absl\app.py", line 299, in run _run_main(main, args) File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\absl\app.py", line 250, in _run_main sys.exit(main(argv)) File "model_main_tf2.py", line 109, in main record_summaries=FLAGS.record_summaries) File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\object_detection\model_lib_v2.py", line 561, in train_loop unpad_groundtruth_tensors) File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\object_detection\model_lib_v2.py", line 342, in load_fine_tune_checkpoint features, labels = 
iter(input_dataset).next() File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 645, in next return self.__next__() File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 649, in __next__ return self.get_next() File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 694, in get_next self._iterators[i].get_next_as_list_static_shapes(new_name)) File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 1474, in get_next_as_list_static_shapes return self._iterator.get_next() File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\data\ops\multi_device_iterator_ops.py", line 581, in get_next result.append(self._device_iterators[i].get_next()) File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 825, in get_next return self._next_internal() File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 764, in _next_internal return structure.from_compatible_tensor_list(self._element_spec, ret) File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\contextlib.py", line 99, in __exit__ self.gen.throw(type, value, traceback) File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\eager\context.py", line 2105, in execution_mode executor_new.wait() File "C:\ProgramData\anaconda3\envs\4_SOA_OD_v2\lib\site-packages\tensorflow\python\eager\executor.py", line 67, in wait pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle) tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[10] = 10 is not in [0, 0) [[{{node GatherV2_4}}]] [[MultiDeviceIteratorGetNextFromShard]] [[RemoteCall]] ``` ## 6. 
System information - OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 - TensorFlow installed from (source or binary): Using pip install - TensorFlow version (use command below): v2.3.0-rc2-23-gb36436b087 2.3.0 - Python version: 3.6 - CUDA/cuDNN version: cudart64_101.dll - GPU model and memory: GTX 2080TI 11GB ``` == check python =================================================== python version: 3.6.11 python branch: python build version: ('default', 'Aug 5 2020 19:41:03') python compiler version: MSC v.1916 64 bit (AMD64) python implementation: CPython == check os platform =============================================== os: Windows os kernel version: 10.0.17763 os release version: 10 os platform: Windows-10-10.0.17763-SP0 linux distribution: ('', '', '') linux os distribution: ('', '', '') mac version: ('', ('', '', ''), '') uname: uname_result(system='Windows', node='DESKTOP-8I4BAP8', release='10', version='10.0.17763', machine='AMD64', processor='Intel64 Family 6 Model 158 Stepping 10, GenuineIntel') architecture: ('64bit', 'WindowsPE') machine: AMD64 == are we in docker ============================================= No == compiler ===================================================== c++.exe (MinGW.org GCC Build-20200227-1) 9.2.0 Copyright (C) 2019 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
== check pips ===================================================
numpy 1.19.1
protobuf 3.12.3
tensorflow 2.3.0
tensorflow-addons 0.11.0
tensorflow-datasets 3.2.1
tensorflow-estimator 2.3.0
tensorflow-hub 0.8.0
tensorflow-metadata 0.22.2
tensorflow-model-optimization 0.4.1

== check for virtualenv =========================================
False

== tensorflow import ============================================
tf.version.VERSION = 2.3.0
tf.version.GIT_VERSION = v2.3.0-rc2-23-gb36436b087
tf.version.COMPILER_VERSION = MSVC 192628806
2020-08-07 10:33:58.725634: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll

== env ==========================================================
LD_LIBRARY_PATH is unset
DYLD_LIBRARY_PATH is unset

== nvidia-smi ===================================================
Fri Aug 07 10:34:01 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 442.19 Driver Version: 442.19 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... WDDM | 00000000:09:00.0 On | N/A |
| 25% 30C P8 28W / 300W | 1352MiB / 11264MiB | 1% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1352 C+G ...mmersiveControlPanel\SystemSettings.exe N/A |
| 0 1756 C+G ...2.0_x64__8wekyb3d8bbwe\WinStore.App.exe N/A |
| 0 1964 C+G C:\Windows\System32\MicrosoftEdgeCP.exe N/A |
| 0 5040 C+G Insufficient Permissions N/A |
| 0 5424 C+G C:\Windows\explorer.exe N/A |
| 0 11260 C+G ...t_cw5n1h2txyewy\ShellExperienceHost.exe N/A |
| 0 12224 C+G ...lmy\AppData\Roaming\Spotify\Spotify.exe N/A |
| 0 12868 C+G ...osoft.LockApp_cw5n1h2txyewy\LockApp.exe N/A |
| 0 12900 C+G C:\Program Files\Hue Sync\HueSync.exe N/A |
| 0 13396 C+G ...1.95.0_x64__8wekyb3d8bbwe\YourPhone.exe N/A |
| 0 13492 C+G ...hell.Experiences.TextInput.InputApp.exe N/A |
| 0 13556 C+G ...0.6242.0_x64__8wekyb3d8bbwe\GameBar.exe N/A |
| 0 15188 C+G ...x64__8wekyb3d8bbwe\Microsoft.Photos.exe N/A |
| 0 16064 C+G Insufficient Permissions N/A |
| 0 16748 C+G ...oftEdge_8wekyb3d8bbwe\MicrosoftEdge.exe N/A |
| 0 17708 C+G ...dqpraam7r\AcrobatNotificationClient.exe N/A |
| 0 17744 C+G ...rosoft Office\root\Office16\OUTLOOK.EXE N/A |
| 0 18160 C+G ...AppData\Local\slack\app-4.7.0\slack.exe N/A |
| 0 19316 C+G ...DIA GeForce Experience\NVIDIA Share.exe N/A |
| 0 21808 C+G ...16211.0_x64__8wekyb3d8bbwe\Video.UI.exe N/A |
| 0 22076 C+G ...dows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A |
| 0 24124 C+G ... Files (x86)\Dropbox\Client\Dropbox.exe N/A |
+-----------------------------------------------------------------------------+

== cuda libs ===================================================

== tensorflow installed from info ==================
Name: tensorflow
Version: 2.3.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: c:\programdata\anaconda3\envs\4_soa_od_v2\lib\site-packages
Required-by: tf-models-official

== python version ==============================================
(major, minor, micro, releaselevel, serial)
(3, 6, 11, 'final', 0)

== bazel version ===============================================

== check python ===================================================
python version: 3.6.11
python branch:
python build version: ('default', 'Aug 5 2020 19:41:03')
python compiler version: MSC v.1916 64 bit (AMD64)
python implementation: CPython

== check os platform ===============================================
os: Windows
os kernel version: 10.0.17763
os release version: 10
os platform: Windows-10-10.0.17763-SP0
linux distribution: ('', '', '')
linux os distribution: ('', '', '')
mac version: ('', ('', '', ''), '')
uname: uname_result(system='Windows', node='DESKTOP-8I4BAP8', release='10', version='10.0.17763', machine='AMD64', processor='Intel64 Family 6 Model 158 Stepping 10, GenuineIntel')
architecture: ('64bit', 'WindowsPE')
machine: AMD64

== are we in docker =============================================
No

== compiler =====================================================
c++.exe (MinGW.org GCC Build-20200227-1) 9.2.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
== check pips ===================================================
numpy 1.19.1
protobuf 3.12.3
tensorflow 2.3.0
tensorflow-addons 0.11.0
tensorflow-datasets 3.2.1
tensorflow-estimator 2.3.0
tensorflow-hub 0.8.0
tensorflow-metadata 0.22.2
tensorflow-model-optimization 0.4.1

== check for virtualenv =========================================
False

== tensorflow import ============================================
tf.version.VERSION = 2.3.0
tf.version.GIT_VERSION = v2.3.0-rc2-23-gb36436b087
tf.version.COMPILER_VERSION = MSVC 192628806
2020-08-07 10:35:32.098298: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll

== env ==========================================================
LD_LIBRARY_PATH is unset
DYLD_LIBRARY_PATH is unset

== nvidia-smi ===================================================
Fri Aug 07 10:35:34 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 442.19 Driver Version: 442.19 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... WDDM | 00000000:09:00.0 On | N/A |
| 25% 30C P8 26W / 300W | 1344MiB / 11264MiB | 1% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1352 C+G ...mmersiveControlPanel\SystemSettings.exe N/A |
| 0 1756 C+G ...2.0_x64__8wekyb3d8bbwe\WinStore.App.exe N/A |
| 0 1964 C+G C:\Windows\System32\MicrosoftEdgeCP.exe N/A |
| 0 5040 C+G Insufficient Permissions N/A |
| 0 5424 C+G C:\Windows\explorer.exe N/A |
| 0 11260 C+G ...t_cw5n1h2txyewy\ShellExperienceHost.exe N/A |
| 0 12224 C+G ...lmy\AppData\Roaming\Spotify\Spotify.exe N/A |
| 0 12868 C+G ...osoft.LockApp_cw5n1h2txyewy\LockApp.exe N/A |
| 0 12900 C+G C:\Program Files\Hue Sync\HueSync.exe N/A |
| 0 13396 C+G ...1.95.0_x64__8wekyb3d8bbwe\YourPhone.exe N/A |
| 0 13492 C+G ...hell.Experiences.TextInput.InputApp.exe N/A |
| 0 13556 C+G ...0.6242.0_x64__8wekyb3d8bbwe\GameBar.exe N/A |
| 0 15188 C+G ...x64__8wekyb3d8bbwe\Microsoft.Photos.exe N/A |
| 0 16064 C+G Insufficient Permissions N/A |
| 0 16748 C+G ...oftEdge_8wekyb3d8bbwe\MicrosoftEdge.exe N/A |
| 0 17708 C+G ...dqpraam7r\AcrobatNotificationClient.exe N/A |
| 0 17744 C+G ...rosoft Office\root\Office16\OUTLOOK.EXE N/A |
| 0 18160 C+G ...AppData\Local\slack\app-4.7.0\slack.exe N/A |
| 0 19316 C+G ...DIA GeForce Experience\NVIDIA Share.exe N/A |
| 0 21808 C+G ...16211.0_x64__8wekyb3d8bbwe\Video.UI.exe N/A |
| 0 22076 C+G ...dows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A |
| 0 24124 C+G ... Files (x86)\Dropbox\Client\Dropbox.exe N/A |
+-----------------------------------------------------------------------------+

== cuda libs ===================================================

== tensorflow installed from info ==================
Name: tensorflow
Version: 2.3.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: c:\programdata\anaconda3\envs\4_soa_od_v2\lib\site-packages
Required-by: tf-models-official

== python version ==============================================
(major, minor, micro, releaselevel, serial)
(3, 6, 11, 'final', 0)

== bazel version ===============================================
```

more issues matching it.
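
For anyone puzzling over the message itself: `indices[i] = k is not in [0, N)` is the bounds check of a gather operation, where `N` is the size of the axis being indexed. A range of `[0, 0)` means that axis is empty, so every index is rejected. The sketch below is a pure-Python stand-in that reproduces only the wording of the check. It is not TensorFlow's implementation, the names `gather` and `try_gather` are made up for illustration, and it does not say which tensor in the input pipeline ended up empty.

```python
def gather(params, indices):
    """Pure-Python stand-in for a gather with a TensorFlow-style bounds check."""
    n = len(params)  # size of the axis being indexed
    out = []
    for pos, idx in enumerate(indices):
        # Valid indices lie in the half-open range [0, n).
        if not 0 <= idx < n:
            raise IndexError(f"indices[{pos}] = {idx} is not in [0, {n})")
        out.append(params[idx])
    return out


def try_gather(params, indices):
    """Return the gathered values, or the error text if an index is out of range."""
    try:
        return gather(params, indices)
    except IndexError as err:
        return str(err)


print(try_gather(["a", "b", "c"], [2, 0]))  # ['c', 'a']
print(try_gather([], [0]))                  # indices[0] = 0 is not in [0, 0)
```

Read this way, the traceback says that some per-example field reached the gather with zero entries while the indices assumed it had at least one, so the data fed to the pipeline is a reasonable first thing to inspect.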

chunduriv · September 19, 2023, 10:43pm

@gautam,

Unfortunately, we do not support research models, and we suggest you use the official object detection models instead.

We ran the same code on an M1 MacBook Pro, and it works as expected.

Please see the gist for running on M1.

Thank you!

gautam · October 10, 2023, 6:57am

@chunduriv I am trying to run the above gist on an M2, but I am running into errors.

As you mentioned, I can't install the tf-models-official package for 2.13, but I am able to install it from the models repo, which gives me version 2.5 …

Is there an API change, or am I missing a package? I am using the same ipynb file from the gist, so I believe I have installed all the required packages.
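
When comparing against a setup that works elsewhere, it can help to post the exact installed versions. A minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+) follows. The package list is an assumption: `tensorflow-macos` and `tensorflow-metal` are guesses at what a Mac install might contain, and `installed_versions` is a hypothetical helper name, so adjust both to your environment.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed package names; edit to match what you actually installed.
PACKAGES = ["tensorflow", "tensorflow-macos", "tensorflow-metal", "tf-models-official"]

report = installed_versions(PACKAGES)
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting this output alongside the error makes a mismatch between the TensorFlow version and the tf-models-official version easier to spot.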