Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Less cudaGet/SetDevice calls in Gluon execution #13764

Merged
merged 6 commits into from
Jan 5, 2019

Conversation

ptrendx
Copy link
Member

@ptrendx ptrendx commented Jan 2, 2019

Description

This PR reduces the number of cudaGetDevice/cudaSetDevice calls during Gluon execution.
Previously, during every call to allocate/free buffer in StorageManager DeviceStore would call cudaGetDevice and 2x cudaSetDevice (to get the current device, set the new device and lastly to set the original device again), even if no actual allocation took place (due to caching allocator usage).

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • This PR changes the DeviceStore so that cudaSetDevice calls are made only when necessary (when the new device is different than the original device)
  • This PR changes the Storage so that only the actual GPU and pinned CPU allocations are guarded by the DeviceStore - memory returned from allocator cache does not need to be guarded.

Comments

@Roshrini
Copy link
Member

Roshrini commented Jan 3, 2019

@ptrendx Can you look into failing CI builds?
@mxnet-label-bot Add [pr-awaiting-review, Gluon]

@marcoabreu marcoabreu added Gluon pr-awaiting-review PR is waiting for code review labels Jan 3, 2019
Copy link
Member

@eric-haibin-lin eric-haibin-lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! One quesiton

for (int i = 0; i < n; ++i) {
device_store.SetDevice(gpus[i]);
// Restores active device to what it was before EnableP2P
mxnet::common::cuda::DeviceStore device_store(gpus[i]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is cudaGetDevice costly? This change would cause 2x cudaGetDevice calls

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is executed only during initialization, so I'm not concerned about its performance (to answer your question though - cudaGetDevice is slightly less costly than cudaSetDevice).
I made a change here just because it is then real RAII guard instead of just a setdevice call.

@eric-haibin-lin
Copy link
Member

@ctcyang could you also take a look?

@ptrendx
Copy link
Member Author

ptrendx commented Jan 4, 2019

Not sure why the website check is showing as pending - it seems to have finished successfully in Details view.

Copy link
Contributor

@ctcyang ctcyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! The only nitpick I have is that after these changes, the only place where cudaSetDevice is still used directly is: https://github.com/apache/incubator-mxnet/blob/e9a7aa42ec380d92b1623025d6434b8856724402/src/engine/threaded_engine_pooled.cc#L136

Could you change that to use this new API too?

Copy link
Contributor

@ctcyang ctcyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eric-haibin-lin eric-haibin-lin merged commit 863fb86 into apache:master Jan 5, 2019
rondogency pushed a commit to rondogency/incubator-mxnet that referenced this pull request Jan 9, 2019
* Remove unnecessary cudaGetDevice/cudaSetDevice calls

* Fixes for the DeviceGuard

* Retrigger CI

* Fix for possible invalid device ordinal when using DeviceStore while
driver is unloading

* Fix for RTC when the driver API call is the first call

* Added DeviceStore to pooled engine
perdasilva pushed a commit to perdasilva/incubator-mxnet that referenced this pull request Mar 13, 2019
* Remove unnecessary cudaGetDevice/cudaSetDevice calls

* Fixes for the DeviceGuard

* Retrigger CI

* Fix for possible invalid device ordinal when using DeviceStore while
driver is unloading

* Fix for RTC when the driver API call is the first call

* Added DeviceStore to pooled engine
perdasilva pushed a commit to perdasilva/incubator-mxnet that referenced this pull request Mar 14, 2019
* Remove unnecessary cudaGetDevice/cudaSetDevice calls

* Fixes for the DeviceGuard

* Retrigger CI

* Fix for possible invalid device ordinal when using DeviceStore while
driver is unloading

* Fix for RTC when the driver API call is the first call

* Added DeviceStore to pooled engine
szha pushed a commit that referenced this pull request Apr 12, 2019
* Remove unnecessary cudaGetDevice/cudaSetDevice calls

* Fixes for the DeviceGuard

* Retrigger CI

* Fix for possible invalid device ordinal when using DeviceStore while
driver is unloading

* Fix for RTC when the driver API call is the first call

* Added DeviceStore to pooled engine
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* Remove unnecessary cudaGetDevice/cudaSetDevice calls

* Fixes for the DeviceGuard

* Retrigger CI

* Fix for possible invalid device ordinal when using DeviceStore while
driver is unloading

* Fix for RTC when the driver API call is the first call

* Added DeviceStore to pooled engine
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Gluon pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants