Add an asynchronous single GPU dataloader example #1521
Conversation
Hello @HenryJia! Thanks for updating this PR.
Comment last updated at 2020-04-29 20:03:05 UTC
This pull request is now in conflict... :(
Codecov Report
@@           Coverage Diff           @@
##           master   #1521   +/-   ##
=======================================
  Coverage      88%      88%
=======================================
  Files          71       71
  Lines        4175     4175
=======================================
  Hits         3692     3692
  Misses        483      483
@HenryJia The len attribute seems to be wrong. To the best of my knowledge, DataLoaders do not have a len. To access the length, you should use DataLoader.dataset to get at the underlying dataset.
@veritas9872 That's not correct. The PyTorch DataLoader does have a length, computed from the batch size and the dataset/sampler length. It's the dataset that does not necessarily have a length.
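For reference, a minimal sketch of the behaviour being described here (standard PyTorch, nothing specific to this PR): len() on a map-style DataLoader returns the number of batches, derived from the dataset/sampler length and the batch size; it is a dataset without __len__ (e.g. an IterableDataset) where a length is genuinely unavailable.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 100 samples, batches of 32
dataset = TensorDataset(torch.randn(100, 3))
loader = DataLoader(dataset, batch_size=32)

print(len(dataset))  # 100 -> number of samples
print(len(loader))   # 4   -> number of batches: ceil(100 / 32) with drop_last=False
```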
This pull request is now in conflict... :(
Looks good to me now :)
I think as an example this is good now. However, there are some issues with DDP, where we automatically alter the sampler when necessary.
I'd say merge this now, since your loader code currently lives only in the examples, but maybe follow up with a new PR that adds the code to the framework and also takes care of these changes?
Thoughts @Borda @williamFalcon @PyTorchLightning/core-contributors?
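For context, a rough sketch of what "automatically alter the sampler" under DDP amounts to (the helper name and signature below are hypothetical, not Lightning's actual API): the user's DataLoader is rebuilt around a DistributedSampler so each process reads a distinct shard of the dataset, which is why a loader wrapped outside the framework could conflict with that step.

```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler


def with_distributed_sampler(loader: DataLoader, num_replicas: int, rank: int) -> DataLoader:
    """Hypothetical helper: rebuild a DataLoader with a DistributedSampler for DDP."""
    sampler = DistributedSampler(loader.dataset, num_replicas=num_replicas, rank=rank)
    return DataLoader(
        loader.dataset,
        batch_size=loader.batch_size,
        sampler=sampler,                 # replaces the original (typically random) sampler
        num_workers=loader.num_workers,
        pin_memory=loader.pin_memory,
    )
```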
from pl_examples.models.lightning_template import LightningTemplateModel
from pl_examples.utils.loaders import AsynchronousLoader

SEED = 2334
this shall be fixed in #1572
Is the seed actually needed? The example does not produce any meaningful results; it's just a demo for a dataloader, right? Otherwise I would wait until #1572 is merged.
@justusschock I thought about adding it to Lightning itself instead of just as an example, but I'm not sure exactly where to put it. It's not very generalisable, and it's only single-GPU compatible. Getting it to work with infinite dataloaders would add a bit of complexity, since I'd need to add thread-stopping conditions. So, sticking with the PyTorch/PyTorch Lightning philosophy of modularity, I figured I'd keep it as a wrapper example around the dataloader.
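For readers following along, here is a minimal sketch of the wrapper idea being discussed (the class and its names are hypothetical, not the PR's actual AsynchronousLoader code): a background thread iterates over an ordinary DataLoader and issues host-to-device copies on a side CUDA stream, so the transfer can overlap with compute on the default stream. Meaningful overlap also assumes the wrapped DataLoader uses pin_memory=True and yields sequences of tensors.

```python
import queue
import threading

import torch


class AsyncGPULoaderSketch:
    """Hypothetical sketch of an asynchronous single-GPU loader wrapper."""

    def __init__(self, dataloader, device=torch.device("cuda:0"), queue_size=2):
        self.dataloader = dataloader
        self.device = device
        self.queue = queue.Queue(maxsize=queue_size)
        self.stream = torch.cuda.Stream(device=device)  # side stream for the copies

    def _worker(self):
        for batch in self.dataloader:  # assumes each batch is a sequence of tensors
            with torch.cuda.stream(self.stream):
                # non_blocking copies from pinned host memory overlap with compute
                batch = [t.to(self.device, non_blocking=True) for t in batch]
            self.queue.put(batch)
        self.queue.put(None)  # sentinel: end of the epoch

    def __iter__(self):
        threading.Thread(target=self._worker, daemon=True).start()
        while True:
            batch = self.queue.get()
            if batch is None:
                return
            # make the copies issued on the side stream visible to the default stream
            torch.cuda.current_stream(self.device).wait_stream(self.stream)
            yield batch

    def __len__(self):
        return len(self.dataloader)
```

The thread-stopping conditions for infinite dataloaders mentioned above are exactly what this sketch omits: the worker thread only terminates once the wrapped loader is exhausted.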
This pull request is now in conflict... :(
Looks good, I tried it myself and it worked fine.
The template filename could be renamed to gpu_async_template just for better lexicographic ordering in the folder.
And maybe the dataloader could be moved to a "dataloaders" folder, just like the models that live in the "models" folder.
There is a merge conflict, could you resolve it?
@HenryJia this is awesome, but I think this belongs in bolts?
Could be a good place for these kinds of examples
Maybe it could go to bolts.dataloaders?
Following some discussions with William, we agreed that this would be best added to the datamodules or dataloaders section of bolts instead. As such, I'm closing this PR for now unless anyone objects. I will open a new PR for bolts soon, once I'm past my final-year university exams.
Before submitting
What does this PR do?
Addresses #1454, #1404, #1316
This was the simplest way to do it without adding any extra complexity to pytorch-lightning itself. It's only possible to load asynchronously for single-GPU training anyway: multi-GPU goes through PyTorch's DataParallel.scatter(), which seems to have synchronisation constraints.
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃