
Improve sparse adagrad update #9651

Merged: 10 commits merged into apache:master on Mar 3, 2018

Conversation

@eric-haibin-lin (Member) commented on Jan 31, 2018

Description

  • The weight decay term in Adagrad is incorrect. See "Incorrect weight_decay implementation in AdaGrad" (#9363) for more details.
  • The sparse update is not fast enough for the language model I'm training. This PR moves the logic to the C++ backend so that we don't suffer from blocking calls such as RowSparseNDArray.indices and RowSparseNDArray.data (see the sketch below).
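
For context, here is a minimal, illustrative C++ sketch of the per-row update a row-sparse AdaGrad step performs; the names (SparseAdagradUpdate, grad_idx, state, ...) are hypothetical and not the operator's actual identifiers.

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Sketch: apply AdaGrad only to the rows present in the row-sparse gradient.
void SparseAdagradUpdate(std::vector<float>* weight,            // dense weights, shape [rows, cols]
                         std::vector<float>* state,             // accumulated squared gradients
                         const std::vector<float>& grad,        // packed non-zero gradient rows
                         const std::vector<int64_t>& grad_idx,  // row ids of the non-zero rows
                         int64_t cols, float lr, float rescale_grad,
                         float clip_gradient, float epsilon) {
  for (size_t i = 0; i < grad_idx.size(); ++i) {
    const int64_t row = grad_idx[i];
    for (int64_t j = 0; j < cols; ++j) {
      float g = rescale_grad * grad[i * cols + j];
      if (clip_gradient > 0.0f) {                                 // optional gradient clipping
        g = std::max(std::min(g, clip_gradient), -clip_gradient);
      }
      const int64_t k = row * cols + j;
      (*state)[k] += g * g;                                       // history += grad^2
      (*weight)[k] -= lr * g / std::sqrt((*state)[k] + epsilon);  // weight -= lr * grad / sqrt(history + eps)
    }
  }
}

Performing the whole loop in one backend kernel avoids the frontend round-trips needed to read the gradient's indices and data.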

@ZiyueHuang @piiswrong @sxjscience @cjolivier01 @szha
Commit author is mistaken. Nvm.

Checklist

Essentials

  • Passed code style checking (make lint)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, the expected performance on the test set, and a reference to the original paper if applicable
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@ZiyueHuang (Member) left a comment


Looks good. Just some minor comments.

}
const DType grad_squared = grad_rescaled * grad_rescaled;
state_data[data_j] += grad_squared;
const DType div = grad_rescaled / math::sqrt(state_data[data_j] + epsilon);

nit: use mshadow_op::square_root::Map? It seems that the other kernels use mshadow_op instead of calling math functions directly.
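
As an illustration of that suggestion (reusing the names from the diff hunk above; not the PR's final code), the division would become:

// Hypothetical rewrite of the sqrt call using mshadow_op::square_root::Map.
const DType div = grad_rescaled /
                  mshadow_op::square_root::Map(state_data[data_j] + epsilon);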

state_data[data_j] += grad_squared;
const DType div = grad_rescaled / math::sqrt(state_data[data_j] + epsilon);
// No need to use KERNEL_ASSIGN, as we already checked req is kWriteInplace
out_data[data_j] = weight_data[data_j] + div * -lr;

nit: write this as weight_data[data_j] - div * lr?

'rescale_grad': self.rescale_grad}
if self.clip_gradient:
    kwargs['clip_gradient'] = self.clip_gradient
sparse.adagrad_update(weight, grad, history, out=weight, lr=lr, wd=wd, **kwargs)

Since wd is still passed to the operator, we can just check whether wd is 0 or not in the backend.

@eric-haibin-lin (Member, Author)

Done
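
For illustration, such a backend check might look roughly like the following (hypothetical; the merged code may phrase it differently):

// Reject non-zero weight decay inside the operator instead of raising in Python.
CHECK_EQ(param.wd, 0.0f)
    << "sparse adagrad_update does not support weight decay (wd must be 0)";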

float wd;
DMLC_DECLARE_PARAMETER(AdagradParam) {
  DMLC_DECLARE_FIELD(lr)
    .describe("Learning rate");

indentation?

@eric-haibin-lin (Member, Author)

It's indented. What's the concern?


Never mind, it's correct.

@cjolivier01 (Member)

Is this ready to go in?

@eric-haibin-lin changed the title from "Fix weight decay in Adagrad and improve sparse adagrad update" to "Improve sparse adagrad update" on Mar 1, 2018
@eric-haibin-lin (Member, Author)

Hi everyone,

Considering that the wd term is implemented inconsistently in many other optimizers (see #9881), I removed the changes for wd in this PR so that the scope of changes is limited. Let's discuss how to fix the behavior of wd systematically in a separate issue/PR.

@eric-haibin-lin added the Sparse label and removed the Bug label on Mar 1, 2018
@eric-haibin-lin merged commit fc9e70b into apache:master on Mar 3, 2018
eric-haibin-lin added a commit to eric-haibin-lin/mxnet that referenced this pull request Mar 4, 2018
* fix adagrad

* add test

* fix lint

* CR comments

* remove raise in python

* enhance unit test

* revert wd changes

* revert dense op changes
jinhuang415 pushed a commit to jinhuang415/incubator-mxnet that referenced this pull request Mar 30, 2018
rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018
@eric-haibin-lin deleted the adagrad-fix branch on September 18, 2018