[BUGFIX] fix ELU function will appear nan when calculating the gradient #14673
Conversation
python/mxnet/gluon/nn/activations.py (Outdated)
@@ -158,7 +158,8 @@ def __init__(self, alpha=1.0, **kwargs):
         self._alpha = alpha

     def hybrid_forward(self, F, x):
-        return F.where(x > 0, x, self._alpha * (F.exp(x) - 1.0))
+        _x = F.where(x < 0, x, F.zeros_like(x))
+        return F.where(x > 0, x, self._alpha * (F.exp(_x) - 1.0))
Thanks for your contribution!
Actually, we already have ELU implemented in the backend, so you can call it directly. Please use:
return F.LeakyReLU(x, act_type='elu', slope=self._alpha)
here instead. You also want to change https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_gluon.py#L1183 to:
return mx.nd.expm1(x) if x <= 0.0 else x
so that the test still passes after this change.
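Putting the two suggested snippets together, the change would look roughly like the sketch below. This is based on the review comments, not the exact merged diff; the class shown only mirrors the shape of the existing `ELU` block in `activations.py`.

```python
import mxnet as mx
from mxnet.gluon import HybridBlock

class ELU(HybridBlock):
    """ELU block that defers to the backend LeakyReLU kernel."""
    def __init__(self, alpha=1.0, **kwargs):
        super(ELU, self).__init__(**kwargs)
        self._alpha = alpha

    def hybrid_forward(self, F, x):
        # Backend ELU: x if x > 0 else alpha * (exp(x) - 1), with a stable gradient.
        return F.LeakyReLU(x, act_type='elu', slope=self._alpha)

# Reference function for the unit test in tests/python/unittest/test_gluon.py,
# exactly as suggested in the review comment above:
def elu(x):
    return mx.nd.expm1(x) if x <= 0.0 else x
```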
This bug does not affect the forward computation; it occurs when the gradient is computed in the backward pass.
@fierceX Yes I understand, and I'm presenting the recommended way to fix this issue to you in the above comment.
@fierceX You also need to change tests/python/unittest/test_gluon.py Line 1183 to: return mx.nd.expm1(x) if x <= 0.0 else x so that the test can pass, thanks!
python/mxnet/gluon/nn/activations.py (Outdated)
@@ -158,7 +158,7 @@ def __init__(self, alpha=1.0, **kwargs):
         self._alpha = alpha

     def hybrid_forward(self, F, x):
-        return F.where(x > 0, x, self._alpha * (F.exp(x) - 1.0))
+        F.LeakyReLU(x, act_type='elu', slope=self._alpha)
It looks like you have extra whitespace here; please remove it and this is good to go.
Sorry, the return was missing.
python/mxnet/gluon/nn/activations.py (Outdated)
@@ -158,7 +158,7 @@ def __init__(self, alpha=1.0, **kwargs):
         self._alpha = alpha

     def hybrid_forward(self, F, x):
-        return F.where(x > 0, x, self._alpha * (F.exp(x) - 1.0))
+         return F.LeakyReLU(x, act_type='elu', slope=self._alpha)
Nit: there is an extra space before `return`.
LGTM. Will merge once the test passes.
Note: there should still be a bug in the underlying `where` gradient when it is combined with `exp` (see the description below).
@fierceX We'll surely investigate that.
@fierceX Can you rebase and do a force push?
@fierceX Merged, thanks for your contribution!
Description
Fix the ELU function producing `nan` when calculating the gradient, for example (reproduction sketch below):
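A minimal sketch of such a reproduction, assuming the pre-fix Gluon `ELU` (the exact snippet and output from the original description are not preserved here; x = 100 is chosen so that exp(x) overflows in float32):

```python
import mxnet as mx
from mxnet import autograd, gluon

net = gluon.nn.ELU(alpha=1.0)   # pre-fix: F.where(x > 0, x, alpha * (F.exp(x) - 1.0))
net.initialize()

x = mx.nd.array([100.0])        # large enough that exp(x) overflows to inf in float32
x.attach_grad()
with autograd.record():
    y = net(x)
y.backward()

print(y)        # forward value is correct: [100.]
print(x.grad)   # gradient comes out as [nan]
```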
After experiments, it was found that this bug occurs when `where` and `exp` are used together, for example (sketch below):
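A sketch of the `where`/`exp` interaction being described, standing in for the original snippet and its output:

```python
import mxnet as mx
from mxnet import autograd

x = mx.nd.array([100.0])
x.attach_grad()
with autograd.record():
    # exp(x) overflows to inf, even though `where` never selects that branch.
    y = mx.nd.where(x > 0, x, mx.nd.exp(x) - 1.0)
y.backward()

print(y)        # [100.] -- forward result is fine
print(x.grad)   # [nan]  -- 0 * inf from the unselected branch poisons the gradient
```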
When the `exp` calculation produces `inf`, the gradient still becomes `nan` even though `where` does not select that value. So this PR does not completely fix the underlying problem; instead, it modifies the ELU calculation so that `nan` no longer appears.
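For illustration, a sketch of the workaround this PR initially takes (mirroring the first diff above): clamp the input to `exp` so the unselected branch can no longer overflow, which keeps the gradient finite.

```python
import mxnet as mx
from mxnet import autograd

alpha = 1.0
x = mx.nd.array([100.0])
x.attach_grad()
with autograd.record():
    _x = mx.nd.where(x < 0, x, mx.nd.zeros_like(x))         # exp() now sees at most 0
    y = mx.nd.where(x > 0, x, alpha * (mx.nd.exp(_x) - 1.0))
y.backward()

print(y)        # [100.]
print(x.grad)   # [1.] -- finite, no nan
```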