What kind of network architecture should I use when the input features include multiple XYZ coordinates (for example, four points (x1,y1,z1), (x2,y2,z2), (x3,y3,z3), (x4,y4,z4)) plus a few other numeric features, and the target is numeric?
The coordinates carry spatial information (drone positions in XYZ space, to be specific). Is there a specific network that preserves their spatial meaning, or is a simple regression network fine?

Whatever you’re doing, you need to be careful about invariances. You probably want relative vectors between the things you care about. It’s hard to say more without more info.
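To make the "relative vectors" idea concrete, here is a minimal sketch. It assumes the ground is the plane z = 0, so only x and y are re-centred on the swarm's horizontal centroid and z is kept as height above ground. The function names `horizontal_centroid` and `to_relative` are made up for illustration. The target has to be shifted by the same offset, otherwise the network is asked to predict something the inputs no longer describe.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]


def horizontal_centroid(drones: Sequence[Vec3]) -> Vec2:
    """Mean x and mean y of the drone positions."""
    n = len(drones)
    return (sum(d[0] for d in drones) / n, sum(d[1] for d in drones) / n)


def to_relative(drones: Sequence[Vec3], target_xy: Vec2) -> Tuple[List[Vec3], Vec2]:
    """Express drones and target relative to the swarm's horizontal centroid.

    z is left untouched: with the ground assumed at z = 0 (an assumption),
    the height of each drone is already a meaningful relative quantity.
    """
    cx, cy = horizontal_centroid(drones)
    rel_drones = [(x - cx, y - cy, z) for x, y, z in drones]
    rel_target = (target_xy[0] - cx, target_xy[1] - cy)
    return rel_drones, rel_target
```

With this encoding, moving the whole scene sideways leaves the inputs and the target unchanged, so the network does not have to learn that invariance from data. At prediction time you add the centroid back to the network output to get absolute ground coordinates.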

The XYZ coordinates are the drone locations, and the other input features are time measurements (the time taken for a radio wave to be reflected back from the other drones and from the ground object of interest). The drones are tracking the ground object, and the target is the ground object's location.
So I want to train the network on the time measurements and drone coordinates and predict the object's XY coordinates on the ground. (The unknown is the ground object's coordinates.)
The drones operate in a fixed environment/area.
I have a mathematical solution for this, but I also want to compare the results with the neural net.
Thanks.

Do you need the neural network just for tracking the bbox of the target object on the ground?

If you already know the drone's XYZ position relative to the ground and its yaw, pitch and roll, and the camera is calibrated, you can project the bbox from pixel coordinates onto the ground plane.

Yes, I only need a model to locate the object's coordinates.
Mathematically, one solution is to solve a least-squares problem in which the knowns are the drone locations, the unknown is the ground object's location, and the speed of the radio wave is a constant.
So I train the network with the drone coordinates and time measurements as inputs and the ground coordinates as the target (the drones operate in fixed areas).
But since the drone coordinates are vectors, I wanted to know whether there is a specific network architecture that retains their spatial meaning, rather than just flattening them into a 1-D array and feeding that to the network.
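For reference, the least-squares baseline mentioned in this post could look roughly like the sketch below. It is not the poster's actual solver. It assumes each time measurement is a round trip from one drone to the object and back (so range = c * t / 2), that the object lies on the plane z = 0, and it uses a plain Gauss-Newton iteration. The name `locate_ground_object` is hypothetical.

```python
import math
from typing import Sequence, Tuple

C = 299_792_458.0  # speed of light in m/s


def locate_ground_object(
    drones: Sequence[Tuple[float, float, float]],
    round_trip_times: Sequence[float],
    iters: int = 20,
) -> Tuple[float, float]:
    """Estimate the (x, y) of an object on z = 0 by Gauss-Newton least squares."""
    # Assumption: each time is drone -> object -> drone, so halve the path.
    ranges = [C * t / 2.0 for t in round_trip_times]

    # Start from the horizontal centroid of the drones.
    x = sum(d[0] for d in drones) / len(drones)
    y = sum(d[1] for d in drones) / len(drones)

    for _ in range(iters):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for (dx, dy, dz), r in zip(drones, ranges):
            dist = math.sqrt((x - dx) ** 2 + (y - dy) ** 2 + dz ** 2)
            jx, jy = (x - dx) / dist, (y - dy) / dist  # d(dist)/dx, d(dist)/dy
            res = dist - r                              # range residual
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            b1 += jx * res
            b2 += jy * res
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:  # degenerate geometry, stop updating
            break
        # Solve the 2x2 normal equations (J^T J) step = J^T res.
        x -= (a22 * b1 - a12 * b2) / det
        y -= (a11 * b2 - a12 * b1) / det
    return x, y
```

Running the network and this solver on the same held-out measurements gives a like-for-like comparison. If the times are one-way rather than round-trip, drop the division by 2.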

Yes, no camera. These are drone swarms. The drones communicate with each other using radio waves to detect the object's location on the ground. So I have a large dataset of time measurements and object locations and want to use it for training.

I was looking at the problem from the perspective of non-linear regression. The time measurements are not absolute times, just the time taken for the radio waves to reach the other drone or the object. I will have a look at the links, though.
Any suggestions on how to pass each coordinate vector as input? Is it the right approach to convert all the coordinates into a 1-D array?
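One possible way to do the flattening, sketched under assumptions: with a fixed number of drones, a flat vector fed to a plain regression network is a reasonable starting point, as long as each drone's time stays next to its own coordinates and the drones always appear in a consistent order. The helper below (a hypothetical `build_input`, with an assumed `time_scale` that converts seconds to microseconds so the tiny time values are not swamped by coordinates in metres) groups the values per drone and sorts the groups into a canonical order before flattening.

```python
from typing import List, Sequence, Tuple


def build_input(
    drones: Sequence[Tuple[float, float, float]],
    times: Sequence[float],
    time_scale: float = 1e6,  # assumed: seconds -> microseconds
) -> List[float]:
    """Flatten per-drone (x, y, z, scaled time) rows into one input vector."""
    rows = [(x, y, z, t * time_scale) for (x, y, z), t in zip(drones, times)]
    # Canonical ordering: the same swarm always produces the same vector,
    # regardless of the order in which the drones were listed.
    rows.sort()
    return [value for row in rows for value in row]
```

For four drones this yields a 16-element vector. If the number of drones varies, a flat vector no longer fits and a shared per-drone encoder followed by pooling would be the more natural design, but that goes beyond this sketch.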