I think it prevents the logsumexp from becoming too large. Is there a reference for doing this, or is it just a practical workaround? I derived it to the following equation.
In the referenced paper (https://arxiv.org/abs/2206.13517), a regularization term is added to prevent divergence.
Is that right? @enijkamp Could I get further details about this loss term?
I think it is intended for numerical stability, but I don't understand how it works.
Could you explain it or provide a reference for that code?
jaxformer/jaxformer/models/decoder/inter/model.py
Line 71 in 9c41fd4
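For what it's worth, if the term in question penalizes the squared log-partition function, it resembles the "z-loss" auxiliary term used in Mesh TensorFlow and the PaLM paper, which discourages the logits from drifting to large magnitudes. A minimal NumPy sketch of that idea (the function name and the 1e-4 weight are illustrative assumptions, not taken from the jaxformer code):

```python
import numpy as np

def cross_entropy_with_z_loss(logits, target, z_loss_weight=1e-4):
    """Softmax cross-entropy plus a z-loss regularizer.

    The z-loss adds z_loss_weight * log(Z)**2, where
    log(Z) = logsumexp(logits) is the log-partition function.
    Penalizing it keeps the logits near a well-scaled range,
    which helps numerical stability during training.
    """
    # Numerically stable logsumexp: subtract the max before exponentiating.
    m = logits.max()
    log_z = m + np.log(np.exp(logits - m).sum())

    # Standard cross-entropy: -log softmax(logits)[target].
    ce = log_z - logits[target]

    # Auxiliary penalty on the squared log-partition.
    z_loss = z_loss_weight * log_z ** 2
    return ce + z_loss

logits = np.array([2.0, 1.0, 0.5])
loss = cross_entropy_with_z_loss(logits, target=0)
```

The gradient of the z-loss term pushes logsumexp toward zero, so the softmax normalizer stays close to 1 and the exponentials stay in a safe floating-point range.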