Great questions.

As @lgusm mentioned, "this defines how the starting random weights of that layer will be initialized". There are a lot of initializers to choose from (Module: tf.keras.initializers | TensorFlow Core v2.8.0).

From Experiments on learning by back propagation (Plaut, Nowlan & Hinton, 1986); see also Learning representations by back-propagating errors (Rumelhart, Hinton & Williams, 1986):

[the learning procedure] repeatedly adjusts the weights in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector given the current input vector…

…

The aim of the learning procedure is to find a set of weights which ensures that for each input vector the output vector produced by the network is the same as (or sufficiently close to) the desired output vector.

…

To minimize [the error] by gradient descent it is necessary to compute the partial derivative of E with respect to each weight in the network

…

The learning procedure is entirely deterministic, so if two units within a layer start off with the same connectivity and weights, there is nothing to make them ever differ from each other. **We break this symmetry by starting with small random weights**.
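To make the symmetry argument concrete, here is a minimal NumPy sketch (not taken from the paper). It trains a tiny two-layer network with plain gradient descent, once from constant weights and once from small random weights. The toy data, layer sizes, learning rate, and the helper name `train_hidden_weights` are all assumptions made for illustration.

```python
import numpy as np

def train_hidden_weights(W1, W2, steps=50, lr=0.1):
    """Run plain gradient descent on a tiny 2-layer net and return the final W1."""
    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 2))      # 8 toy inputs with 2 features each
    y = X[:, :1] * X[:, 1:]          # arbitrary toy target, shape (8, 1)
    W1, W2 = W1.copy(), W2.copy()
    for _ in range(steps):
        h = np.tanh(X @ W1)          # hidden activations, shape (8, 3)
        err = h @ W2 - y             # output error, shape (8, 1)
        grad_W2 = h.T @ err / len(X)
        grad_h = (err @ W2.T) * (1 - h ** 2)   # backprop through tanh
        grad_W1 = X.T @ grad_h / len(X)
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
    return W1

# Every hidden unit starts with the same weights: the units stay identical.
sym = train_hidden_weights(np.full((2, 3), 0.5), np.full((3, 1), 0.5))

# Small random starting weights: the units are free to learn different features.
init_rng = np.random.default_rng(1)
rnd = train_hidden_weights(
    init_rng.normal(scale=0.1, size=(2, 3)),
    init_rng.normal(scale=0.1, size=(3, 1)),
)

print("constant init, hidden weights (one column per unit):\n", sym)
print("random init, hidden weights (one column per unit):\n", rnd)
```

With the constant start, each hidden unit receives the same gradient at every step, so the columns of `sym` remain equal to each other. With the random start the columns differ, which is the symmetry breaking the quote describes.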

The input shape depends on the data you're feeding into the network's input layer. When building a model (at least in a supervised learning setting like basic classification with labelled data), you choose or design an initializer, a network architecture, an optimizer, etc., so it's kind of an art form.
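As a rough sketch of how those choices show up in Keras, the snippet below sets the input shape from the data and passes an explicit initializer to one layer. The feature count, layer sizes, `stddev`, and seed are hypothetical values picked only for illustration, not recommendations.

```python
import tensorflow as tf

# Hypothetical data: each example is a vector of 4 features.
num_features = 4

hidden = tf.keras.layers.Dense(
    3,
    activation="relu",
    # Small random starting weights; stddev and seed are arbitrary choices.
    kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.05, seed=42),
)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(num_features,)),  # input shape follows the data
    hidden,
    tf.keras.layers.Dense(1),  # no initializer given, so the default is used
])

# The kernel has one row per input feature and one column per unit.
print(hidden.kernel.shape)
```

Swapping `RandomNormal` for another class from `tf.keras.initializers` changes only how the starting weights are drawn; the layer shapes stay the same.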

https://arxiv.org/abs/1611.02167 (2016):

While constructing a CNN, a network designer has to make numerous design choices: the number of layers of each type, the ordering of layers, and the hyperparameters for each type of layer, e.g., the receptive field size, stride, and number of receptive fields for a convolution layer. The number of possible choices makes the design space of CNN architectures extremely large and hence, infeasible for an exhaustive manual search.

There are niche research fields - e.g. AutoML and meta-learning - where you use ML for ML to optimize the tasks that you'd normally do manually. For example, from https://arxiv.org/abs/1611.01578 :

Our experiments show that Neural Architecture Search can design good models from scratch, an achievement considered not possible with other methods.

@tornike_amaghlobeli There are a lot of good courses that teach ML and deep learning theory with TensorFlow and Keras, such as Coursera's deeplearning.ai and Udacity. If you're interested, you can find them listed at Basics of machine learning | TensorFlow.

In addition, Kaggle has good material for learning ML and deep learning: Learn Intro to Deep Learning Tutorials, A Single Neuron | Kaggle, and Deep Neural Networks | Kaggle.