Dataset with XML and Polygons

Hi all - firstly, I’m sorry if this is the wrong place to post this, but I’m honestly not sure how to tackle this problem. It may be worth prefacing this with the fact I’m still very new to TensorFlow and learning every day, so any help is really valuable.

I have a dataset structured as such:

{Dataset}
----- Images
---------- *.jpg
----- Annotations
---------- *.xml

Each image is named the same as the corresponding annotation XML, so image_1.jpg and image_1.xml. This is fine, and I’ve done a bunch with this such as overlaying the annotations and the images with different class colours to verify they’re correct.

Where I struggle now is that all of the resources I see online for dealing with XML files are for bounding boxes. These XML files all use polygons, structured like the snippet at the end of this post (obviously the points aren’t actually all 1s).

There are several classes with several polygons per image.

How would I go about preparing this dataset for use in a semantic segmentation scenario?

Thanks in advance, I really appreciate any help I can get.

		<polygon>
			<point>
				<x>1</x>
				<y>1</y>
			</point>
			<point>
				<x>1</x>
				<y>1</y>
			</point>
			<point>
				<x>1</x>
				<y>1</y>
			</point>
			<point>
				<x>1</x>
				<y>1</y>
			</point>
			<point>
				<x>1</x>
				<y>1</y>
			</point>
		</polygon>

@book_keeper1 Welcome to the TensorFlow forum!

Here’s a high-level outline for preparing your polygon-annotated dataset for semantic segmentation:

Parse XML Files:

Use a suitable XML parsing library (e.g., ElementTree in Python) to extract polygon coordinates and classes from each XML file. Load the XML file, locate the relevant elements (e.g., <polygon> tags), and extract:

  • Image name
  • Class names
  • Polygon coordinates (lists of x, y pairs for each polygon vertex)
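
As a rough sketch of the parsing step with the standard library's ElementTree: your snippet only shows the <polygon>/<point>/<x>/<y> part, so I'm assuming each polygon sits inside an <object> element that has a <name> child holding the class. Those two tag names are my guess (they are common in VOC-style files), so swap in whatever your files actually use. The function names are just placeholders.

```python
import xml.etree.ElementTree as ET


def extract_polygons(root):
    """Return a list of (class_name, [(x, y), ...]) pairs from a parsed XML root."""
    shapes = []
    # ASSUMPTION: <object> and <name> are hypothetical tag names; only
    # <polygon>, <point>, <x> and <y> appear in the posted snippet.
    for obj in root.iter("object"):
        class_name = obj.findtext("name", default="unknown").strip()
        for polygon in obj.iter("polygon"):
            points = [
                (float(pt.findtext("x")), float(pt.findtext("y")))
                for pt in polygon.iter("point")
            ]
            shapes.append((class_name, points))
    return shapes


def parse_annotation(path):
    """Parse one annotation file from disk, e.g. Annotations/image_1.xml."""
    return extract_polygons(ET.parse(path).getroot())
```

Since image_1.jpg and image_1.xml share a stem, the image name can be taken from the annotation file name rather than from inside the XML.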

Choose a suitable mask representation:

  • Single-channel image with pixel values representing class IDs
  • Multi-channel image with one channel per class (binary masks)
  • NumPy array with shape (height, width, num_classes)

Fill Polygons: Iterate through polygon coordinates and fill pixels within each polygon with the corresponding class ID or binary value.
Handle Multiple Classes: If multiple classes exist per image, create separate masks for each class or use a multi-channel representation.
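
For the filling step, one option is Pillow's ImageDraw.polygon, which rasterizes a list of (x, y) vertices. Below is a minimal sketch that builds a single-channel class-ID mask. It assumes the polygons are already available as (class_name, points) pairs, that 0 can serve as the background ID, and that there are fewer than 256 classes; polygons_to_mask and class_ids are names made up for the example.

```python
import numpy as np
from PIL import Image, ImageDraw


def polygons_to_mask(shapes, class_ids, width, height):
    """Rasterize (class_name, points) pairs into a (height, width) uint8 mask."""
    # Mode "L" is a single 8-bit channel; 0 is used here as the background ID.
    mask = Image.new("L", (width, height), 0)
    draw = ImageDraw.Draw(mask)
    for class_name, points in shapes:
        # Each polygon is filled with its integer class ID. Where polygons
        # overlap, the one drawn later overwrites the earlier one.
        draw.polygon(points, fill=class_ids[class_name])
    return np.array(mask)
```

The width and height should match the corresponding image so the mask lines up with it pixel for pixel.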

Handle Overlapping Polygons: If polygons overlap, decide on a priority scheme (e.g., assign the class of the first polygon drawn). If appropriate, merge overlapping polygons of the same class into a single mask region.
Scale Pixel Values: If necessary, normalize the image pixel values to a specific range (e.g., 0-1). Leave class-ID masks as integers, since scaling them would change the labels.
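
One way to make the overlap rule explicit is to control the drawing order: with Pillow, whatever is drawn last wins, so drawing low-priority classes first and high-priority classes last resolves overlaps deterministically. This is only a sketch under that assumption; the priority list, class_ids mapping and function names are illustrative. If you would rather have the first polygon in the file win, iterating over the polygons in reverse order has that effect.

```python
import numpy as np
from PIL import Image, ImageDraw


def draw_with_priority(shapes, class_ids, priority, size):
    """shapes: (class_name, points) pairs; priority: class names, lowest first."""
    rank = {name: i for i, name in enumerate(priority)}
    mask = Image.new("L", size, 0)  # size is (width, height)
    draw = ImageDraw.Draw(mask)
    # sorted() is stable, so polygons of the same class keep their file order.
    for class_name, points in sorted(shapes, key=lambda s: rank[s[0]]):
        draw.polygon(points, fill=class_ids[class_name])
    return np.array(mask)


def normalize_image(image):
    """Scale a uint8 image array to float32 values in [0, 1]."""
    return image.astype(np.float32) / 255.0
```

Only the input image goes through normalize_image; the mask keeps its integer class IDs.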

Also, keep the following points in mind:

  • Adjust your code for multi-channel masks or NumPy arrays as needed.
  • Explore libraries like imgaug or Pillow for mask manipulation.
  • Choose a mask format compatible with your segmentation model.
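
On the mask format point, a class-ID mask can be converted to the one-channel-per-class layout with plain NumPy. A minimal sketch, assuming the mask holds integer IDs in the range 0 to num_classes - 1 (to_one_hot is just a name picked for the example):

```python
import numpy as np


def to_one_hot(mask, num_classes):
    """Convert a (H, W) integer class-ID mask to (H, W, num_classes) binary channels."""
    # Row i of the identity matrix is the one-hot vector for class i, so
    # indexing it with the mask looks up one vector per pixel.
    return np.eye(num_classes, dtype=np.uint8)[mask]
```

In Keras, integer masks of shape (height, width) pair with SparseCategoricalCrossentropy, while the one-hot form pairs with CategoricalCrossentropy.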

Let us know if this helps!