Multiple, Different Shaped Inputs

Hi All,

New to TensorFlow and neural networks in general, so apologies if this is an overly simple question. I did some digging but didn’t find what I’m looking for. I want to build a model that accepts different types of data as multiple inputs. For example, a US-based, 5-digit postal code and the time of day, expressed as an integer (1 - morning, 2 - afternoon, 3 - evening, etc.).

I’m not sure if it’s the right approach, but I’m one-hot encoding the data. Postal codes in this case look like this (note: I’m working in JS):

tf.tensor3d([
	[ 
		// 94701
		[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 ],
		[ 0, 0, 0, 0, 1, 0, 0, 0, 0, 0 ],
		[ 0, 0, 0, 0, 0, 0, 0, 1, 0, 0 ],
		[ 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 ],
		[ 0, 1, 0, 0, 0, 0, 0, 0, 0, 0 ]
	],
	...
]);

This has worked really well with a single input; I’m struggling to introduce the second one, however. Data for the time of day looks like:

tf.tensor3d([
	[ 
		// evening
		[ 0, 0, 1, 0 ]
	],
	...
]);

Everything I’ve tried so far leads to shape mismatch errors. I’ve tried concatenating the tensors across different axes, tried concatenating layers, and tried using an array of inputs with tf.model().

It seems like multiple inputs with different data types should be a common use case. Am I thinking about this wrong? Should I be trying to normalize the data so it all takes the same shape (use 10 digits to represent the time of day, for example)? Should I be combining the data into a single tensor? Something else I’m missing?

Many thanks!


Welcome, and thanks for the question; it’s certainly a great one!

How come you are using tensor3d here? Maybe I’m misunderstanding what you are trying to do, but if you just have a single post code and a single time of day as input to the model, I would have expected it to be in the form of:

// 94701                        // evening
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0 ]

So the above would be a tensor1d, which, when sent as a batch to the model, would be a tensor2d (you can just call expandDims() to do that).

So as a batch you can do multiple predictions which would look like:

// Tensor2D
[
  [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0 ],
  [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0 ],
  [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0 ],
  [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0 ]
]

You do not need to normalize one-hot encoded data here.
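In plain JS, building such a row is just array concatenation; here is a minimal sketch with hypothetical helper names (plain arrays, which you would wrap with tf.tensor2d() when feeding the model):

```javascript
// Hypothetical helper, not from any library: build a one-hot vector
// of the given depth with a 1 at `index`.
function oneHot(index, depth) {
	const vec = new Array(depth).fill(0);
	vec[index] = 1;
	return vec;
}

// One digit one-hot (depth 10) plus a time-of-day one-hot (depth 4),
// concatenated into a single flat input row of 14 values.
const row = oneHot(9, 10).concat(oneHot(2, 4));

// A batch of rows is just an array of arrays, i.e. the 2D shape above;
// tf.tensor2d(batch) would turn it into a Tensor2D.
const batch = [ row, row, row, row ];
```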

P.S. If you are just starting out, you may be interested in my upcoming course for TensorFlow.js that covers topics like this over on edX. It’s free to do if you do not need certification:

Hi Jason,

Thanks for the response! The 3D tensor is because there are (or will be) multiple postal codes and each character of the postal code is encoded. Here’s a clearer example:

[
	[ 
		[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 ], // 9
		[ 0, 0, 0, 0, 1, 0, 0, 0, 0, 0 ], // 4
		[ 0, 0, 0, 0, 0, 0, 0, 1, 0, 0 ], // 7
		[ 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 ], // 0
		[ 0, 1, 0, 0, 0, 0, 0, 0, 0, 0 ] // 1
	], [ 
		[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 ], // 9
		[ 0, 0, 0, 0, 1, 0, 0, 0, 0, 0 ], // 4
		[ 0, 1, 0, 0, 0, 0, 0, 0, 0, 0 ], // 1
		[ 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 ], // 0
		[ 0, 0, 1, 0, 0, 0, 0, 0, 0, 0 ] // 2
	],
	...
]

Hopefully that makes sense. From your response, it sounds like this could be a 2D tensor, however, yeah? If I normalize the time-of-day data to match the postal code data, I could do something like:

[
	[ 
		[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 ], // 9
		[ 0, 0, 0, 0, 1, 0, 0, 0, 0, 0 ], // 4
		[ 0, 0, 0, 0, 0, 0, 0, 1, 0, 0 ], // 7
		[ 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 ], // 0
		[ 0, 1, 0, 0, 0, 0, 0, 0, 0, 0 ], // 1
		[ 0, 0, 1, 0, 0, 0, 0, 0, 0, 0 ] // evening
	], 
	...
]

This would be simple enough to do but feels a little dirty. Maybe that’s just how it has to be done? I also worry about how easy this will be when adding more data inputs; not all data may conform so easily. Actually, another thought as I work this through in my head: I think I could use your example but expand the columns for each postal code character.

[ 
	[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, // 9
	0, 0, 0, 0, 1, 0, 0, 0, 0, 0, // 4
	0, 0, 0, 0, 0, 0, 0, 1, 0, 0, // 7
	1, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0
	0, 1, 0, 0, 0, 0, 0, 0, 0, 0, // 1
	0, 0, 1, 0  ] // evening
]

Hmm, that seems like it would work well, and it would support future inputs, as they would continue to get tacked on to the end. Thoughts on this?
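As a sketch of that flattening idea (hypothetical helper name; plain arrays, to be wrapped with tf.tensor2d() before training):

```javascript
// Hypothetical encoder for one sample: 5 zip digits one-hot encoded
// (10 values each), followed by a 4-value time-of-day one-hot,
// all flattened into a single 54-value row.
function encodeSample(zip, timeIndex) {
	const row = [];
	for (const ch of zip) {
		const digit = new Array(10).fill(0);
		digit[Number(ch)] = 1;
		row.push(...digit);
	}
	const time = new Array(4).fill(0);
	time[timeIndex] = 1;
	row.push(...time);
	return row; // 5 * 10 + 4 = 54 values
}

const sample = encodeSample('94701', 2); // 2 = evening
```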

I was also wondering if I should skip the one-hot encoding and simply have something like:

[ 
	[ 94701, 2 ],
	[ 94102, 1 ],
	...
]

The encoding of the postal codes has worked really well in my tests. I haven’t tried this approach but am curious whether it could/would work.

Thanks again! I’ll check out the link, thanks for sharing!


Ah. I see.

So some questions and thoughts:

  1. One-hot encoding 40,000+ post codes is maybe less than ideal, as you would have 40,000+ inputs each time, so…

  2. What is it about the post code data that is actually useful to the problem you are trying to solve? For example, the first 2 digits of a post code are the most useful, for US post codes at least, as they basically determine the city/town. And if you take the first 3 digits, that may be enough to group similarly located zips within that town, which may be good enough for your needs. You would then have a max of 1,000 values in a one-hot encoding of those 3-digit zip prefixes (assuming all digits are used).

  3. It would need to be encoded in some form, as 94701 is not “better” or “worth more” than 94700 simply by being a higher number, so some encoding would be needed to avoid issues there.

  4. Your current proposition above essentially converts a 5-digit post code into 5 separate one-hot encodings of 10 values each, so 50 total. I have not seen it done like that before; is that working well for you? I was just talking to @Laurence_Moroney about this, and we are curious to hear how well it works if you do manage to try it.
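To sketch the idea in point 2 (hypothetical, plain JS): bucketing by the first 3 digits gives at most 1,000 buckets, which could then be one-hot encoded as a single vector:

```javascript
// Hypothetical: bucket a US zip by its first 3 digits (0-999),
// then one-hot encode the bucket index.
function zipPrefixOneHot(zip) {
	const bucket = Number(zip.slice(0, 3)); // e.g. '94701' -> 947
	const vec = new Array(1000).fill(0);
	vec[bucket] = 1;
	return vec;
}

const prefixVec = zipPrefixOneHot('94701');
```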

I should preface this by saying I’m by no means an expert at model design, but I have some thoughts I’d like to share.

If your model is relating postal codes to some geographical feature, then you should choose an encoding where small differences in a value correspond to small differences in geographic location.

This stackexchange thread has a lot of good discussion on the topic. I particularly like the idea of splitting the Earth (or US) into a quadtree and encoding zip codes as regions of the quadtree since numerically similar quadtree regions correspond to geographically similar areas.
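For illustration, one very rough way to compute such a quadtree key in JS (hypothetical sketch: it uses a plain equirectangular mapping of lat/lon onto the unit square, not Mercator, and assumes you can look up a centroid lat/lon for each zip):

```javascript
// Hypothetical quadkey: subdivide the lat/lon plane `depth` times,
// emitting one quadrant digit (0-3) per level. Nearby locations share
// long key prefixes, so numerically similar keys are geographically close.
function quadkey(lat, lon, depth) {
	let x = (lon + 180) / 360; // 0..1, west -> east
	let y = (90 - lat) / 180;  // 0..1, north -> south
	let key = '';
	for (let i = 0; i < depth; i++) {
		x *= 2;
		y *= 2;
		const qx = Math.min(Math.floor(x), 1);
		const qy = Math.min(Math.floor(y), 1);
		key += String(qy * 2 + qx); // 0=NW, 1=NE, 2=SW, 3=SE
		x -= qx;
		y -= qy;
	}
	return key;
}
```

Deeper keys refine shallower ones, so a key at depth d always starts with the same location's key at depth d − 1.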

You can also provide additional data based on the geographical location if it would help your model. For example, if you’re predicting the weather, I’d suggest adding “distance from a body of water / the ocean” as an input (which can be looked up / computed from the geographical location). Alternatively, you can tell the model which quadtree regions are water and let it figure that out.

I hope this helps!

As postal codes could be mapped to a geofenced or bucketized coordinate encoding, you could try taking a look at:

P.S. The feature-column part of the mentioned tutorial relates only to TF 1.x.

You can explore the embedding approach:

https://towardsdatascience.com/why-you-should-always-use-feature-embeddings-with-structured-datasets-7f280b40e716

As this question is for TensorFlow.js, though, not Python: it seems it’s just a function that transforms a number into some bucketized version, right? So that is not too hard to replicate in JS as a pre-processing step before you use the values, if it needs to be done client side.
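As a sketch of what that pre-processing step might look like in JS (hypothetical; this mirrors the idea of TF’s bucketized feature columns rather than any actual tf.js API, and the boundary values are just the zip ranges from earlier in the thread):

```javascript
// Hypothetical: map a numeric value into a bucket given sorted boundary
// values, then one-hot encode the bucket. With n boundaries there are
// n + 1 buckets (values >= the last boundary fall in the final bucket).
function bucketize(value, boundaries) {
	let bucket = boundaries.length;
	for (let i = 0; i < boundaries.length; i++) {
		if (value < boundaries[i]) {
			bucket = i;
			break;
		}
	}
	const vec = new Array(boundaries.length + 1).fill(0);
	vec[bucket] = 1;
	return vec;
}

const zipBuckets = [ 25000, 50000, 75000 ];
const encoded = bucketize(94701, zipBuckets);
```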

Curious to see the results you get, @sphere, for the different approaches, including yours with the 5× one-hot encodings representing each digit, versus something like this bucketed version to reduce the number of unique encodings needed to represent the data.

Yes, but I don’t know if it was a hard constraint to prepare the model training in JS directly or not.
So I’ve also pointed to some Python resources for Keras structured data classification that supersede the tf.feature_column (TF 1.x) approach mentioned in these zip/geo tutorials.

Many users just prepare the model/experiments in Python and then run inference in JS, so Python is not mutually exclusive with JS.

But if it is an end-to-end TF.js requirement, then yes, you could just prepare your embedding or your bucketized input data directly in JS.

Thanks for confirming! Indeed, on my side I see a lot of folks starting to do whole pipelines purely in JS now (Node for larger data and models) as web folk get more confident with ML.

I think this will continue to grow, and then more utils will appear in the JS community too; Pandas, for example, now exists as Danfo.js, and so on. Hopefully more useful things like this will get ported as the needs arise.

Yes, and JS/WebAssembly is not exclusively related to the browser or Node.js.

There will be expanding opportunities with WASI in non-browser environments/runtimes, e.g.:


Thanks for sharing! Interesting read!

Hi All,

Thanks for all the responses. I’ll start working through the material provided as I can. I am working in Node. I’m not opposed to using Python but am not as proficient in it, and JS has worked great for rapid prototyping. As for what I’m trying to solve: at the moment I’m experimenting and working through some conceptual ideas. The problem isn’t necessarily concrete, and I’m mostly trying to understand how I can work with different types of data. There are a couple of possible use cases around postal codes (or location data) as inputs that I’m exploring.

I don’t have a data set I’m working with. I wrote code to generate random five-character strings of digits 0-9 to represent postal codes. I then introduced a pattern in the data that associates groups of codes with four possible output categories. The data is a bit (or totally?) contrived, but it’s been helpful, as I know there is a pattern to train and test against. This is a bit simplified, but it basically looks like:

// randomly select our output values
var outputs = [ '1', '2', '3', '4' ];
var [ output1 ] = outputs.splice( Math.floor( Math.random() * outputs.length ), 1 );
var [ output2 ] = outputs.splice( Math.floor( Math.random() * outputs.length ), 1 );
var [ output3 ] = outputs.splice( Math.floor( Math.random() * outputs.length ), 1 );
var [ output4 ] = outputs;

// create the data set
for( var i = 0; i < sampleSize; i++ ){
	
	// generate a pseudo postal code

	// now map the postal code to an output, creating a pattern in the data
	if( postalCodeAsInt < 25000 ){
		output = output1;
	} else if( postalCodeAsInt < 50000 ){
		output = output2;
	} else if( postalCodeAsInt < 75000 ){
		output = output3;
	} else {
		output = output4;
	}

	// push postal code as the input and output as the output value to data set
}

There’s also code that sets every 3rd postal code to 94701, with the same randomly selected output every time, so it doesn’t follow the same rule/pattern as the other postal codes. This was me testing to see if the model could pull that out in its predictions (it can, no problem). The data comes out looking something like:

[ '10561', '2' ],
[ '68267', '3' ],
[ '94701', '1' ],
[ '55471', '3' ],
[ '32613', '1' ],
[ '94701', '1' ],
[ '73666', '3' ],
...
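A runnable sketch of this generator, with the elided parts filled in by assumption (random 5-digit strings, and every 3rd sample forced to ‘94701’ with a fixed label; helper names are mine):

```javascript
// Hypothetical reconstruction of the generator described above.
function buildDataset(sampleSize) {
	// Crudely shuffle the four labels so the range -> label mapping is random.
	const labels = [ '1', '2', '3', '4' ].sort(() => Math.random() - 0.5);
	// Pick one fixed label that '94701' will always receive.
	const fixedLabel = labels[Math.floor(Math.random() * labels.length)];
	const data = [];
	for (let i = 0; i < sampleSize; i++) {
		if ((i + 1) % 3 === 0) {
			// Every 3rd sample deliberately breaks the range rule.
			data.push([ '94701', fixedLabel ]);
			continue;
		}
		// Generate a pseudo postal code: 5 random digits.
		const zip = String(Math.floor(Math.random() * 100000)).padStart(5, '0');
		// Map the code to a label via its 25,000-wide range, creating the pattern.
		const bucket = Math.min(Math.floor(Number(zip) / 25000), 3);
		data.push([ zip, labels[bucket] ]);
	}
	return data;
}
```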

The data and the outputs are one-hot encoded, as seen in my prior post. I’ve been playing with different types of models and am currently running this through an RNN; I think that’s why I have these as 3D tensors. I know this isn’t really sequence data, and there’s probably a more appropriate model to use, but, again, I'm experimenting and learning. I assume it’s because of the simplicity of the data, but the model can predict the output with 100% accuracy after training on a relatively small data set. I don’t expect real-world data to be as clean (I probably should introduce some noise in my tests).

Anyway, hopefully that’s some helpful context. I’m currently at the point where I would like to add more inputs, hence the thread.

Hi All,

Following up on this: stringing the one-hot encoded data together into 2D tensors worked great!

[ [ 
	0, 0, 0, 0, 0, 0, 0, 0, 0, 1, // 9
	0, 0, 0, 0, 1, 0, 0, 0, 0, 0, // 4
	0, 0, 0, 0, 0, 0, 0, 1, 0, 0, // 7
	1, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0
	0, 1, 0, 0, 0, 0, 0, 0, 0, 0, // 1
	0, 0, 1, 0 // evening
], ... ]

Thanks for following up @sphere! Good to know!