Profiling Multi-Process TF Sessions

We have a multi-process application built on TensorFlow. Each process launches its own session, and each session executes a graph composed of custom ops that implement the steps of an image processing algorithm. We would like to profile the application: specifically, to measure the performance of each op and each session, and to see whether ops from different sessions overlap in time during the overall execution. Can this be done with TensorBoard, or do we have to rely on NVIDIA Nsight Systems?
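
To make the overlap question concrete, here is a minimal sketch of the kind of analysis we have in mind. It assumes each process can dump its op timings as Chrome trace events (dictionaries with `ph`, `name`, `ts`, and `dur` fields, times in microseconds) and that timestamps from different processes come from a shared clock. The helper names, process IDs, and op names are hypothetical and only illustrate the idea.

```python
from itertools import combinations


def complete_events(trace_events):
    """Keep only complete ('X') events, which carry a start time and a duration."""
    return [e for e in trace_events if e.get("ph") == "X"]


def merge_traces(per_process_events):
    """Merge per-process event lists into one list sorted by start time.

    per_process_events maps a process ID to that process's trace events.
    Each kept event is tagged with the process ID it came from.
    """
    merged = []
    for pid, events in per_process_events.items():
        for event in complete_events(events):
            merged.append({**event, "pid": pid})
    return sorted(merged, key=lambda e: e["ts"])


def cross_session_overlaps(merged):
    """Return (name_a, name_b, overlap_us) for ops from different processes
    whose time intervals intersect."""
    overlaps = []
    for a, b in combinations(merged, 2):
        if a["pid"] == b["pid"]:
            continue  # only compare ops from different sessions
        start = max(a["ts"], b["ts"])
        end = min(a["ts"] + a["dur"], b["ts"] + b["dur"])
        if end > start:
            overlaps.append((a["name"], b["name"], end - start))
    return overlaps


# Hypothetical sample data standing in for two sessions' traces.
session_a = [
    {"ph": "X", "name": "Denoise", "ts": 0, "dur": 100},
    {"ph": "X", "name": "Sharpen", "ts": 100, "dur": 50},
]
session_b = [
    {"ph": "X", "name": "Resize", "ts": 80, "dur": 40},
    {"ph": "M", "name": "process_name"},  # metadata event, ignored
]

merged = merge_traces({1: session_a, 2: session_b})
overlaps = cross_session_overlaps(merged)
print(overlaps)
```

The pairwise comparison is quadratic in the number of events, which is fine for a small illustration but would need a sweep-line approach for real traces.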