Very Basic TensorFlow

Can somebody please explain to me exactly what is going on in this code?

model = Sequential()
model.add(Dense(10, input_dim=x.shape[1], activation='relu'))
model.add(Dense(50, input_dim=x.shape[1], activation='relu'))
model.add(Dense(10, input_dim=x.shape[1], activation='relu'))
model.add(Dense(1, kernel_initializer='normal'))
model.add(Dense(y.shape[1],activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

model.fit(x_train,y_train,validation_data=(x_test,y_test),
          verbose=2,epochs=10)

What is input_dim exactly? What are the numbers 10, 50, 10, and 1, and why is y.shape[1] in the last layer?
I know activation functions, but what is kernel_initializer='normal'?

Thanks in advance.

Hi Tornike,

The numbers you are asking about are the numbers of neurons in each of those layers.

For kernel_initializer (docs), this defines how the starting random weights of that layer are initialized. The 'normal' string draws those starting values from a normal (Gaussian) distribution; as far as I can tell, it is shorthand for the RandomNormal initializer.
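
To make that concrete, here is a small sketch (the layer size is made up) showing the string shorthand next to the explicit initializer object; the explicit form also lets you set the distribution's parameters yourself:

from tensorflow.keras.layers import Dense
from tensorflow.keras import initializers

# String shorthand, as in the original code: weights start from a normal distribution.
layer_a = Dense(10, activation='relu', kernel_initializer='normal')

# Explicit initializer object with the mean and standard deviation spelled out
# (these values are the Keras defaults for RandomNormal).
layer_b = Dense(10, activation='relu',
                kernel_initializer=initializers.RandomNormal(mean=0.0, stddev=0.05))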

y.shape[1] is the second dimension size of y, for example:

import tensorflow as tf

a = [[1, 2, 3], [1, 2, 3]]
b = tf.constant(a)
b.shape
>>> TensorShape([2, 3])
b.shape[1]
>>> 3

You can find more information here: tf.shape | TensorFlow Core v2.8.0

Hi lgusm, thank you for your reply,

But I wonder: are these numbers of neurons arbitrary? I think that y.shape[1] is the output (the labels), but I can't understand how the numbers of neurons in those layers are chosen.

The number of neurons affects the model's ability to learn and its computational requirements. There is usually some room for experimentation there, so in that sense they are arbitrary, and you can change them to try to get better performance. The only size that is fixed is the output layer, since it needs to be the same size as the labels it is compared against in the loss function.
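
Here is a hedged sketch with made-up data (8 features, 3 classes) showing which numbers are free choices and which one is pinned down by the labels:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

x = np.random.rand(100, 8).astype('float32')                         # 8 input features (made up)
y = np.eye(3)[np.random.randint(0, 3, size=100)].astype('float32')   # one-hot labels for 3 classes (made up)

model = Sequential([
    Dense(32, input_dim=x.shape[1], activation='relu'),   # hidden width: a tunable choice
    Dense(16, activation='relu'),                          # another tunable choice
    Dense(y.shape[1], activation='softmax'),               # must equal the number of label columns
])
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(x, y, epochs=3, verbose=0)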

For some additional resources, you might want to check out this video series:

It will give you a better understanding of the neurons and layers.

The first Colab is a gem to play with and experiment with until you get the idea.


Great questions.

As @lgusm mentioned, “this defines how the starting random weights of that layer will be initialized”. And there are a lot of initializers to choose from (Module: tf.keras.initializers | TensorFlow Core v2.8.0).

From Experiments on learning by back propagation (Plaut, Nowlan & Hinton (1986)) (see also Learning representations by back-propagating errors (Rumelhart, Hinton & Williams (1986))):

[the learning procedure] repeatedly adjusts the weights in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector given the current input vector…
…
The aim of the learning procedure is to find a set of weights which ensures that for each input vector the output vector produced by the network is the same as (or sufficiently close to) the desired output vector.
…
To minimize [the error] by gradient descent it is necessary to compute the partial derivative of E with respect to each weight in the network
…
The learning procedure is entirely deterministic, so if two units within a layer start off with the same connectivity and weights, there is nothing to make them ever differ from each other. We break this symmetry by starting with small random weights.
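
To see that last point in action, here is a toy experiment (data and layer sizes made up): when both layers start from a constant initializer instead of random values, the hidden units receive identical gradients and never become different from one another.

import numpy as np
import tensorflow as tf

x = np.random.rand(64, 4).astype('float32')   # toy inputs (made up)
y = np.random.rand(64, 1).astype('float32')   # toy targets (made up)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(3, activation='relu', input_dim=4,
                          kernel_initializer='ones'),    # every hidden unit starts identical
    tf.keras.layers.Dense(1, kernel_initializer='ones'),
])
model.compile(loss='mse', optimizer='sgd')
model.fit(x, y, epochs=5, verbose=0)

w = model.layers[0].get_weights()[0]   # shape (4, 3): one column per hidden unit
print(w)   # the three columns stay identical; random initialization is what breaks this symmetry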

The input shape depends on the data you're feeding into the network's input layer. When building a model (at least in a supervised learning setting like basic classification with labelled data), you choose or design an initializer, a network architecture, an optimizer, and so on, so it's kind of an art form.
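
As a tiny illustration (data made up): input_dim is simply the number of feature columns in your data, which is exactly what x.shape[1] gives you.

import numpy as np
from tensorflow.keras.layers import Dense

x = np.random.rand(500, 12)   # 500 samples with 12 features each (made up)
print(x.shape[1])             # 12: the width of one input row

first_layer = Dense(10, input_dim=x.shape[1], activation='relu')   # 12 inputs feeding 10 neurons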

https://arxiv.org/abs/1611.02167 (2016):

While constructing a CNN, a network designer has to make numerous design choices: the number of layers of each type, the ordering of layers, and the hyperparameters for each type of layer, e.g., the receptive field size, stride, and number of receptive fields for a convolution layer. The number of possible choices makes the design space of CNN architectures extremely large and hence, infeasible for an exhaustive manual search.

There are niche research fields - e.g. AutoML and meta-learning - where you use ML for ML to optimize the tasks that you’d normally do manually. For example, from https://arxiv.org/abs/1611.01578 :

Our experiments show that Neural Architecture Search can design good models from scratch, an achievement considered not possible with other methods.
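
As a small, hedged taste of that idea (this sketch uses the separate keras-tuner package and made-up data, and it is hyperparameter search rather than full architecture search), you can let a tuner pick the hidden width instead of hard-coding it:

import numpy as np
import tensorflow as tf
import keras_tuner as kt   # pip install keras-tuner

x = np.random.rand(200, 8).astype('float32')                         # toy features (made up)
y = np.eye(3)[np.random.randint(0, 3, size=200)].astype('float32')   # toy one-hot labels (made up)

def build_model(hp):
    # The tuner chooses the number of neurons instead of us hard-coding 10/50/10.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(hp.Int('units', min_value=8, max_value=64, step=8),
                              activation='relu', input_dim=x.shape[1]),
        tf.keras.layers.Dense(y.shape[1], activation='softmax'),
    ])
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

tuner = kt.RandomSearch(build_model, objective='val_accuracy', max_trials=5)
tuner.search(x, y, validation_split=0.2, epochs=5, verbose=0)
best_model = tuner.get_best_models(num_models=1)[0]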

@tornike_amaghlobeli There are a lot of good courses that teach ML and deep learning theory with TensorFlow and Keras, such as Coursera's deeplearning.ai and Udacity; you can check them out here: Basics of machine learning | TensorFlow, if you're interested.

In addition, Kaggle has good material for learning ML and deep learning - Learn Intro to Deep Learning Tutorials, A Single Neuron | Kaggle, Deep Neural Networks | Kaggle.


input_dim=x.shape[1]

The repeated input_dim arguments on the second and third Dense layers don't make any sense; only the first layer needs to be told its input size, and Keras infers the shapes of the later layers from the layer that comes before them. I don't think they should be there.