Best starting point for understanding the codebase?


I would like to understand the TensorFlow codebase and wondered if there are any recommended resources. As for my background, I am a very experienced software developer with many years in machine learning and with the TensorFlow APIs. However, I have very little knowledge of concepts such as Kernel, NodeDef, GraphDef, or ‘registering a kernel’. I have read through some background on MLIR and have some understanding of it.

So far I’ve compiled the source with bazelisk and taken it for a run in VS Code.

Thanks in advance for any advice!




Hi Henry

If you already know TensorFlow, that’s a great start!

I imagine you’ve seen this already: Contribute to the TensorFlow code

Of course it doesn’t answer your question, but it gives a start.

Regarding the terms you mentioned:

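A kernel is the implementation of an operation (op) for a particular device and data type, for example the CPU or GPU code that performs a matrix multiplication or a convolution on tensors. A node in the graph names the op; the kernel is the code that actually runs it.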

GraphDef is a definition of a computation graph. A computation graph is a way of representing the operations and data flow in a machine learning model, and consists of nodes that represent different operations or pieces of data.

NodeDef is a definition of a single node in a computation graph. It records the node's name, the type of operation it performs, the names of the nodes it takes as input, and the attributes that configure the operation.

Both GraphDef and NodeDef are saved in a protocol buffer (.pb) format
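To make GraphDef and NodeDef concrete, here is a minimal sketch that builds a tiny graph from Python and prints the NodeDef entries it contains. It assumes a TensorFlow 2.x installation; the node names "a", "b", and "c" are chosen only for this illustration.

```python
import tensorflow as tf

# Build a small graph explicitly instead of running eagerly.
graph = tf.Graph()
with graph.as_default():
    a = tf.constant(1.0, name="a")
    b = tf.constant(2.0, name="b")
    c = tf.add(a, b, name="c")

# as_graph_def() returns the GraphDef protocol buffer for the graph.
graph_def = graph.as_graph_def()

# Each entry in graph_def.node is a NodeDef with a name, an op type,
# and the names of its input nodes.
nodes = {}
for node in graph_def.node:
    nodes[node.name] = (node.op, list(node.input))
    print(node.name, node.op, list(node.input))
```

The op type printed for each node is the string that TensorFlow uses to look up a matching kernel when the graph runs.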

Registering a kernel is the process of adding a new kernel implementation to the set of kernels that TensorFlow can choose from when it runs a computation graph. This is useful for adding custom kernels or for using third-party libraries with TensorFlow.

Sorry for the non-answer, but I hope it helped a little.

Hi Luiz,

Thank you for the overview. That is definitely helpful. Yesterday I was suspecting this was the case, but didn’t know where to look. Finally stumbled onto tensorflow/core/ops/ops.pbtxt, which helps complete another tiny part of the picture.

So, just to confirm, I have looked over the link you sent, but I didn’t see any overall codebase architecture or design principles. Does such a document exist, to your knowledge?

Thanks again,


I imagine it exists, I just don’t know where it is!
If I find something along those lines I’ll let you know.


TensorFlow is:

  1. a library,
  2. a common front-end to a variety of operating system plug-ins,
  3. a controller for compiling to custom binaries used by the plug-ins.

You may find it difficult to trace through the code. The build process exposes a lot of the structure.