Inconsistency of New Saliency Metric Calculation #4

Open
yulongwang12 opened this issue May 14, 2018 · 2 comments


@yulongwang12

Hi Piotr,

Thanks for your great work and the code release. I am currently working on saliency and noticed that you propose a new saliency metric (Sec. 3.2), s(a, p) = log(a) - log(p). However, when I evaluated it on ImageNet with the Caffe pretrained GoogLeNet, I got the following results:

| | Saliency Metric (paper reported) | Saliency Metric (mine) |
| --- | --- | --- |
| ground truth | 0.284 | 0.3044 |
| max box | 1.366 | 1.3443 |
| central box | 0.645 | 0.7238 |

Judging from the max box result, I think my calculation matches yours. The ground truth result is also close (the difference arises because I didn't evaluate all the ground truth boxes). But the difference in the central box result is quite large, and I cannot figure out what is going wrong. Could you please release the saliency metric evaluation code? Thank you very much; I look forward to your reply.

Best regards

Yulong

@PiotrDabkowski
Owner

Hmm, that's strange, but it could be explained by differences in the classifier (are you sure you are using GoogLeNet?) and in the resizing strategy. The classifier has a very similar loss on both the max box and the central box (the -log(p) term). The area is halved for the central box, so the area term is log(1) = 0 for the max box and log(0.5) ≈ -0.7 for the central box. We would therefore expect the saliency metric for the central box to be about 0.7 smaller. In my case the classifier actually performed slightly better on the central box than on the max box; in your case the opposite appears to be true. Anyway, the results look reasonable, but make sure you are using the same net.
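To make the arithmetic above concrete, here is a minimal sketch that splits each number from the table into its area term and its classifier term. It assumes the metric exactly as quoted in this thread, s(a, p) = log(a) - log(p), with a = 1 for the max box and a = 0.5 for the central box; the helper names are made up for illustration and are not taken from the released code.

```python
import math

def saliency_metric(area_fraction, prob):
    """s(a, p) = log(a) - log(p), as quoted in this thread."""
    return math.log(area_fraction) - math.log(prob)

def implied_neg_log_p(metric, area_fraction):
    """Recover the mean -log(p) term from a reported metric.

    Valid for the fixed-area baselines only, because log(a) is then
    the same constant for every image and can be subtracted out.
    """
    return metric - math.log(area_fraction)

# Numbers copied from the table in this issue.
reported = {
    "paper": {"max box": 1.366, "central box": 0.645},
    "mine": {"max box": 1.3443, "central box": 0.7238},
}
# Assumed area fractions: whole image vs. half the image area.
areas = {"max box": 1.0, "central box": 0.5}

for source, rows in reported.items():
    for box, metric in rows.items():
        loss = implied_neg_log_p(metric, areas[box])
        print(f"{source:5s} {box:11s} metric={metric:.4f} -log(p)={loss:.4f}")
```

Under these assumptions the implied classifier term is slightly lower on the central box than on the max box for the paper's numbers, and higher for the numbers reported in this issue, which matches the explanation above.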

The main implementation used for the paper was originally written in TensorFlow and was a mess, so I reimplemented the main part in PyTorch. I could release the evaluation code as well if that helps.

@yulongwang12
Author

I'm currently using the Caffe pretrained GoogLeNet, which is also called Inception-v1. Do you also use the Inception-v1 model? I found the official TensorFlow implementation here; is that the right one?

I also want to confirm the preprocessing procedure for the central box. First, crop the area at the center of the image with size (1/sqrt(2) * H, 1/sqrt(2) * W). Then resize the cropped image to 224 x 224 and apply the same normalization as before. Did I miss something? Thanks for your reply.
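The crop geometry I have in mind can be sketched as follows. This only computes the crop window; the function name `central_crop_box` and the rounding choice are my own assumptions, and the resize to 224 x 224 and the normalization would follow in whatever image library is used.

```python
import math

def central_crop_box(height, width):
    """Return (top, left, crop_h, crop_w) for a centered crop whose
    sides are scaled by 1/sqrt(2), so it covers about half the area."""
    scale = 1.0 / math.sqrt(2.0)
    crop_h = round(height * scale)  # rounding to whole pixels is an assumption
    crop_w = round(width * scale)
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    return top, left, crop_h, crop_w

top, left, crop_h, crop_w = central_crop_box(224, 224)
area_fraction = (crop_h * crop_w) / (224 * 224)
print(top, left, crop_h, crop_w, round(area_fraction, 4))
```

Because of the rounding, the cropped area is only approximately half of the image, so the area term is close to, but not exactly, log(0.5).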
