This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Cudnn dropout #13896

Merged
merged 9 commits into from
Feb 5, 2019

Conversation

szha
Member

@szha szha commented Jan 16, 2019

Description

Use the cuDNN implementation of dropout.

Tested on p3.2x (V100). Test case:

import mxnet as mx
a = mx.nd.ones((10, 200, 300, 500), ctx=mx.gpu(0))
a.attach_grad()
mx.autograd.set_recording(True)
%timeit mx.nd.Dropout(a, 0.5, mode='always').wait_to_read()
            fwd          fwd+bwd
cudnn GPU   46ms 4.3ms   48ms 15ms
prev GPU    305ms        321ms

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • CuDNN Dropout.
  • skip resource request for parallel random when using cudnn forward.
  • extend cpptest to support stateful operators, resourcerequestex, ResourceRequest::kCuDNNDropoutDesc.
  • fix bug where the backward pass on an inference-mode dropout forward didn't act as identity.
  • add kCuDNNDropoutDesc resource for sharing dropout state across operators.
  • replace dropout with identity when it's not dropping anything.
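
The identity behavior described in the last bullets can be sketched in plain Python. This is a reference model only, not the operator's C++/CUDA code; the function names and the list-based representation are illustrative assumptions. It assumes the usual inverted-dropout convention, where kept elements are scaled by 1/(1-p), so that when nothing is dropped (inference mode or p = 0) both forward and backward act as identity.

```python
import random


def dropout_forward(x, p, training, rng=random):
    """Inverted dropout on a flat list of floats; returns (output, mask)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        # Nothing is dropped: forward is the identity and the mask is all ones,
        # so the backward pass below is the identity as well.
        return list(x), [1.0] * len(x)
    scale = 1.0 / (1.0 - p)
    # Each element is kept with probability 1 - p and scaled by 1 / (1 - p).
    mask = [scale if rng.random() >= p else 0.0 for _ in x]
    return [xi * mi for xi, mi in zip(x, mask)], mask


def dropout_backward(grad_out, mask):
    """The gradient flows only through kept elements, with the same scaling."""
    return [g * m for g, m in zip(grad_out, mask)]
```

With training=False the returned mask is all ones, so dropout_backward returns the incoming gradient unchanged.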

Comments

  • cudnnSetDropoutDescriptor is an expensive call because it initializes state on each streaming multiprocessor of the GPU. Since cuDNN 7, cudnnRestoreDropoutDescriptor has been available, so the initialized state space can be cached and reused. The descriptor is currently used in both the Dropout op and the RNN op. We need a mechanism for caching it so that initialization on each stream happens only once, since the same descriptor can be shared among operators.
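
The caching idea can be illustrated with a toy model in plain Python. This is a sketch of the pattern only: DropoutStateCache and fake_init are hypothetical names, and fake_init merely stands in for the expensive cuDNN initialization. Nothing here calls cuDNN or reflects how the kCuDNNDropoutDesc resource is actually implemented.

```python
class DropoutStateCache:
    """Toy model: one initialized dropout state per device, shared by operators."""

    def __init__(self, init_fn):
        self._init_fn = init_fn  # expensive step (stand-in for cudnnSetDropoutDescriptor)
        self._states = {}        # device id -> initialized state
        self.init_calls = 0      # counts how often the expensive step ran

    def get(self, device_id, seed):
        if device_id not in self._states:
            self._states[device_id] = self._init_fn(device_id, seed)
            self.init_calls += 1
        # Cheap path (stand-in for restoring a descriptor from cached state).
        # Note: the seed is only used on first initialization in this sketch.
        return self._states[device_id]


def fake_init(device_id, seed):
    # Hypothetical placeholder for the per-device random state.
    return {"device": device_id, "seed": seed}


cache = DropoutStateCache(fake_init)
dropout_state = cache.get(0, seed=17)  # first request pays for initialization
rnn_state = cache.get(0, seed=17)      # second request reuses the same state
```

Here two consumers on device 0 trigger a single initialization, which is the property the comment asks for.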

@szha szha force-pushed the cudnn_dropout branch 2 times, most recently from f2a6f7f to 03453c4 Compare January 16, 2019 04:53
@eric-haibin-lin eric-haibin-lin self-requested a review January 16, 2019 05:51
@szha szha force-pushed the cudnn_dropout branch 2 times, most recently from c5699d8 to 9c23d1b Compare January 16, 2019 07:26
@stu1130
Contributor

stu1130 commented Jan 16, 2019

@mxnet-label-bot add [pr-work-in-progress]
Thanks for your contribution @szha

@marcoabreu marcoabreu added the pr-work-in-progress PR is still work in progress label Jan 16, 2019
@szha szha requested a review from anirudh2290 as a code owner January 19, 2019 03:19
@szha szha removed the request for review from anirudh2290 January 19, 2019 03:55
@szha szha force-pushed the cudnn_dropout branch 2 times, most recently from a211d6b to 3b797e6 Compare January 19, 2019 07:27
@szha szha changed the title from "[WIP] Cudnn dropout" to "Cudnn dropout" Jan 20, 2019
@szha szha removed the pr-work-in-progress PR is still work in progress label Jan 20, 2019
@eric-haibin-lin
Member

Shall we also add a cudnn_off flag to this op?

@szha szha requested a review from yzhliu January 21, 2019 23:37
@sandeep-krishnamurthy sandeep-krishnamurthy added Operator pr-awaiting-review PR is waiting for code review labels Jan 25, 2019
@szha szha added pr-work-in-progress PR is still work in progress and removed pr-awaiting-review PR is waiting for code review labels Jan 25, 2019
@szha szha force-pushed the cudnn_dropout branch 2 times, most recently from d15143d to 9f29aec Compare January 27, 2019 07:43
src/operator/nn/dropout-inl.h (outdated, resolved)
src/operator/nn/dropout-inl.h (outdated, resolved)
@szha szha force-pushed the cudnn_dropout branch 2 times, most recently from 8a7707c to cb3d2b0 Compare January 28, 2019 02:22
@szha szha force-pushed the cudnn_dropout branch 2 times, most recently from 89f497d to 94a48f9 Compare January 28, 2019 04:09
@szha szha force-pushed the cudnn_dropout branch 3 times, most recently from 9db5fde to 34288d4 Compare January 31, 2019 02:38
@szha szha force-pushed the cudnn_dropout branch 2 times, most recently from b3a2af4 to dcc7636 Compare January 31, 2019 05:57
Member

@eric-haibin-lin eric-haibin-lin left a comment

@TaoLv @pengzhao-intel @ptrendx @DickJC123 could you guys help review?

src/executor/attach_op_resource_pass.cc (resolved)
src/operator/cudnn_rnn-inl.h (resolved)
src/operator/nn/dropout-inl.h (resolved)
src/operator/nn/dropout-inl.h (resolved)
src/operator/nn/dropout-inl.h (resolved)
src/resource.cc (resolved)
src/operator/nn/dropout.cc (resolved)
@szha
Member Author

szha commented Feb 4, 2019

Thanks for the review, @eric-haibin-lin @TaoLv.

@pengzhao-intel @ptrendx @DickJC123 I'm holding off on updating the PR until you get a chance to review it.

@pengzhao-intel
Contributor

@szha sorry, I am on vacation and don't have enough time to look into the details.

@TaoLv took the review, so please go ahead and move forward with this PR.

Happy Chinese New Year @szha @eric-haibin-lin @TaoLv :)

Member

@TaoLv TaoLv left a comment

Thank you. My comments are addressed.

@eric-haibin-lin eric-haibin-lin merged commit 18b8704 into apache:master Feb 5, 2019
@szha szha deleted the cudnn_dropout branch February 5, 2019 18:33
stephenrawls pushed a commit to stephenrawls/incubator-mxnet that referenced this pull request Feb 16, 2019
* cudnn dropout

* test dropout as stateful op

* add cudnn_off

* refactor

* fix bug when using inf forward

* turn on cudnn in gluon

* reuse dropout state space

* dropout passthrough

* address comments
@roywei
Member

roywei commented Feb 28, 2019

I'm not able to reproduce the speed in the test case; see #13825 (comment)

@szha
Member Author

szha commented Feb 28, 2019

@roywei by default, cudnn_off is turned on, so the cuDNN path is disabled. You need to turn it off (cudnn_off=False) to benefit from cuDNN dropout.
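
A hedged sketch of what that looks like when re-running the test case from the description outside IPython. The cudnn_off argument name comes from this PR; dropout_kwargs and time_dropout are hypothetical helper names, and the sketch assumes a CUDA build of MXNet that includes this change and a GPU at index 0.

```python
import timeit


def dropout_kwargs(use_cudnn):
    """Arguments for mx.nd.Dropout; cudnn_off=False selects the cuDNN path."""
    return {"p": 0.5, "mode": "always", "cudnn_off": not use_cudnn}


def time_dropout(use_cudnn, repeat=10):
    """Average seconds per forward pass on the PR's test tensor."""
    import mxnet as mx  # imported lazily: needs a CUDA build of MXNet and a GPU

    a = mx.nd.ones((10, 200, 300, 500), ctx=mx.gpu(0))
    kwargs = dropout_kwargs(use_cudnn)
    # Warm-up call so one-time setup cost is not included in the timing.
    mx.nd.Dropout(a, **kwargs).wait_to_read()
    total = timeit.timeit(
        lambda: mx.nd.Dropout(a, **kwargs).wait_to_read(), number=repeat
    )
    return total / repeat


# Example (on a GPU machine):
#   print("cudnn on :", time_dropout(True))
#   print("cudnn off:", time_dropout(False))
```

The warm-up call matters here because, as noted in the PR comments, the first use pays for dropout state initialization.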

@roywei roywei mentioned this pull request Feb 28, 2019
vdantu pushed a commit to vdantu/incubator-mxnet that referenced this pull request Mar 31, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
@roywei roywei mentioned this pull request Oct 23, 2019

9 participants