Calculate Hessian of loss with respect to model layer for a batch of samples


I want to calculate the influence score for a model. This requires computing the second derivative (the Hessian) of the training loss, averaged over the whole dataset. The picture below shows it:

The influence score can be calculated as:


What I need to do is to calculate H once and then plug in different test/train values into the left and right terms, respectively.
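To make the "calculate H once, then plug in gradients" step concrete, here is a minimal sketch. It assumes PyTorch and assumes the score has the usual form of a test-loss gradient on the left, the inverse of H in the middle, and a training-loss gradient on the right, with a negative sign. The function name `influence_scores` and the `damping` term are my own additions for illustration; damping is there only because H can be singular, in which case a plain solve fails.

```python
import torch

def influence_scores(H, grad_test, grads_train, damping=1e-3):
    """Return -grad_test^T (H + damping*I)^{-1} g for every training gradient g.

    H:           averaged Hessian, any shape that flattens to (D, D)
    grad_test:   gradient of one test loss, D elements in total
    grads_train: stack of M training gradients, M * D elements in total
    damping:     hypothetical regulariser, not part of the original formula
    """
    D = grad_test.numel()
    H = H.reshape(D, D)
    H_damped = H + damping * torch.eye(D, dtype=H.dtype)
    # H is symmetric, so solving once for the test gradient is enough:
    # g_test^T H^{-1} g_train == (H^{-1} g_test)^T g_train
    s_test = torch.linalg.solve(H_damped, grad_test.reshape(D))
    return -(grads_train.reshape(-1, D) @ s_test)

# Toy check with a (3, 2) layer and H = 2 * I
H = (2.0 * torch.eye(6)).reshape(3, 2, 3, 2)
grad_test = torch.ones(3, 2)
grads_train = torch.stack([torch.ones(3, 2), torch.zeros(3, 2)])
scores = influence_scores(H, grad_test, grads_train, damping=0.0)
print(scores)
```

Solving one linear system per test point and reusing the result for all training gradients avoids forming the inverse explicitly.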

First of all, it turns out that if I am to calculate the second derivative, I have to do it per layer of the model. Ok, this is acceptable.

My question is: how can I vectorize this Hessian calculation for a batch of samples? For example, given a fully connected layer with weights of shape (200, 10) and a batch of 13 images, what I need is a tensor of shape (13, 200, 10, 200, 10), i.e. one Hessian H per sample.

So far I have been able to do it element-wise, but that is super slow, and I have had a very hard time finding a way to vectorize the process.
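One possible way to vectorize this, sketched under several assumptions: PyTorch 2.x with `torch.func` available, a softmax cross-entropy loss, and the layer in question being a bias-free final linear layer. The names `sample_loss` and `per_sample_hessians` are hypothetical. The idea is to write the loss for a single sample as a function of the weight, take its Hessian with `hessian`, and let `vmap` map that over the batch.

```python
import torch
import torch.nn.functional as F
from torch.func import hessian, vmap

def sample_loss(weight, x, y_onehot):
    # weight: (in_features, out_features), x: (in_features,), y_onehot: (out_features,)
    logits = x @ weight
    # Cross-entropy written out explicitly: logsumexp(logits) - logit of the true class
    return torch.logsumexp(logits, dim=-1) - (y_onehot * logits).sum()

def per_sample_hessians(weight, xs, ys_onehot):
    # hessian differentiates twice w.r.t. argument 0 (the weight);
    # vmap shares the weight (None) and maps over dim 0 of xs and ys_onehot
    return vmap(hessian(sample_loss), in_dims=(None, 0, 0))(weight, xs, ys_onehot)

torch.manual_seed(0)
in_features, out_features, batch = 20, 10, 13   # small sizes to keep the demo light
weight = torch.randn(in_features, out_features)
xs = torch.randn(batch, in_features)
ys = torch.randint(0, out_features, (batch,))
ys_onehot = F.one_hot(ys, out_features).to(xs.dtype)

H = per_sample_hessians(weight, xs, ys_onehot)
print(H.shape)          # (batch, in, out, in, out)
H_mean = H.mean(dim=0)  # average over the batch
```

With the (200, 10) layer and 13 samples from the question, the result has 13 × 2000 × 2000 = 52,000,000 entries, roughly 208 MB in float32, so memory rather than speed may become the limit. If only the average is needed, accumulating `H.mean(dim=0)` over smaller batches keeps the footprint down.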


Do you want to replicate ?

I have not tested the repo personally but have you already tried to look at:

Thank you for your reply.

Yes, I want to replicate it. The thing is, the formula I mentioned is the vanilla version, which is too slow. There are approximation methods, such as those based on Hessian-vector products.

I wanted to implement the vanilla version on MNIST first: even though it is very expensive, I wanted to see how good the results of the original version are before switching to approximation methods.
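For comparison, a Hessian-vector product can be computed without ever materializing H. The sketch below assumes PyTorch 2.x with `torch.func`, the same softmax cross-entropy loss on a bias-free linear layer, and hypothetical names (`mean_loss`, `hvp`). It differentiates the gradient function in the direction `v` (forward-mode over reverse-mode), which is one common way to do this, not necessarily what any particular influence-function implementation uses.

```python
import torch
import torch.nn.functional as F
from torch.func import grad, jvp

def mean_loss(weight, xs, ys_onehot):
    # Mean softmax cross-entropy over the batch
    logits = xs @ weight
    per_sample = torch.logsumexp(logits, dim=-1) - (ys_onehot * logits).sum(dim=-1)
    return per_sample.mean()

def hvp(weight, v, xs, ys_onehot):
    # Gradient of the mean loss as a function of the weight only
    grad_fn = lambda w: grad(mean_loss)(w, xs, ys_onehot)
    # jvp returns (grad_fn(weight), directional derivative of grad_fn along v);
    # the second element is H @ v, reshaped like the weight
    return jvp(grad_fn, (weight,), (v,))[1]

torch.manual_seed(0)
in_features, out_features, batch = 20, 10, 13
weight = torch.randn(in_features, out_features)
xs = torch.randn(batch, in_features)
ys_onehot = F.one_hot(torch.randint(0, out_features, (batch,)), out_features).to(xs.dtype)
v = torch.randn(in_features, out_features)

hv = hvp(weight, v, xs, ys_onehot)
print(hv.shape)  # same shape as the weight
```

The cost per product is a small multiple of one gradient evaluation, and the memory use is on the order of the weight itself instead of its square.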