test_ImageRecordIter_seed_augmentation flaky test fix #12485
Conversation
Force-pushed from 0deffe9 to 6aa8d75
Thanks for your contribution @perdasilva @mxnet-label-bot[pr-work-in-progress]
@perdasilva - Did you run the tests multiple times to verify it is not flaky anymore?
Force-pushed from 6aa8d75 to 97ae4a9
@sandeep-krishnamurthy sorry for the delay in responding. I've been absent lately because life got in the way. The issue was the multi-threading. I've just done a rebase from master and it seems the number of threads has been reduced to 1 (which fixes the flakiness). I need to look at the history of the file to see what's changed since I was last looking at this problem. EDIT: The problem is still there. I think something might have changed in the build system in the meantime and MP wasn't enabled. I've rebuilt from scratch with MP and it's still flaky.
I have sketched out a fix to the flakiness here, although it's a bit of a hack at the moment. One random number generator is created per preprocessing thread (https://github.com/apache/incubator-mxnet/blob/master/src/io/iter_image_recordio_2.cc#L162). While the image records are retrieved in the same order (https://github.com/apache/incubator-mxnet/blob/master/src/io/iter_image_recordio_2.cc#L508) and always stored at the same index in the output array (https://github.com/apache/incubator-mxnet/blob/master/src/io/iter_image_recordio_2.cc#L567), they are not guaranteed to be processed by the same thread every time. This matters because the default augmenter seeds the thread's random number generator (https://github.com/apache/incubator-mxnet/blob/master/src/io/image_aug_default.cc#L255) the first time it gets called (https://github.com/apache/incubator-mxnet/blob/master/src/io/iter_image_recordio_2.cc#L563). This means that across runs, the same record can fall at a different position in the sequence of draws from a thread's random number generator, leading to different random numbers being generated, which lead to different changes to the image -> flakiness.

To remedy the issue, I've changed the code so that the augmenter seed is a parameter to the image parser. I then use this seed and the image record index to seed the random number generator (outside of the default augmenter) before any changes are made to the image. Since the records are always retrieved in the same order, the same record will always have the same generator seed, independent of the number of threads used -> same random numbers being generated -> reproducible. It's a bit of a hack, so I would appreciate some input from the code owner (@anirudh2290) and the community.
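The idea, as a minimal Python sketch (the real change is in the C++ parser; `augment` and the seed arithmetic here are illustrative stand-ins, not the actual API):

```python
import random

def augment(record_idx, seed_aug):
    # Re-seed from the augmentation seed plus the record index, so the
    # draw depends only on the record, not on which thread runs it.
    rng = random.Random(seed_aug + record_idx)
    return rng.random()  # stands in for a random crop/flip parameter

# The same record always gets the same draw, regardless of scheduling.
assert augment(7, seed_aug=42) == augment(7, seed_aug=42)
# Different records still get different augmentations.
assert augment(7, seed_aug=42) != augment(8, seed_aug=42)
```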
Force-pushed from c6152f9 to 1550da3
@haojin2 - Can you provide some inputs here? Thanks.
@haojin2 @apeforest Can you please review this PR? Thanks
@haojin2 @apeforest ping for reviewing the PR!
src/io/iter_image_recordio_2.cc
Outdated
LOG(INFO) << "tid: " << tid << " idx: " << idx << " index: " << rec.image_index();
}
if (param_.seed_aug.has_value()) {
  prnds_[tid]->seed(idx + param_.seed_aug.value());
I do not understand why seeds need to be reset all the time. The logic is a bit obscure - the random generators are in the augmenter and get passed to the iterator. Anyway, what you could try to do (if not refactoring) is to seed all generators in DefaultImageAugmenter::Init here, or in a separate method (since the random generator is not passed in to Init). I guess the interface ImageAugmenter would then require a separate seed method that could be called sequentially (instead of openmp parallelized).
There are two requirements here: that setting the seed will yield reproducible results, and that parallelization should be used to speed up image augmentation. We need to reset the seed for each image because there is no guarantee that the same image will be processed by the same thread, or even that it will be the ith image processed by that thread, across independent runs or different hardware (the code figures out the number of threads to use for processing). This means the position of an image in the sequence of draws from a given random number generator is not guaranteed. Therefore, resetting the random number generator at the start of processing each image is the only way (at least that I could think of) to guarantee reproducibility when setting a fixed seed - that is, to guarantee that the same random distortions will be applied to the same image across independent runs and different hardware configurations. I hope this makes it a little clearer. It's not the easiest topic to discuss in written form lol.
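A small Python simulation of the failure mode described above (the thread schedule and record names are hypothetical; per-thread RNGs are seeded once on first use, as in the original augmenter):

```python
import random

def run(schedule, seed=0):
    """Simulate per-thread RNGs seeded once on first use.

    schedule is a list of (thread_id, record) pairs in processing order.
    """
    rngs = {}
    out = {}
    for thread, record in schedule:
        if thread not in rngs:               # seeded on first call only
            rngs[thread] = random.Random(seed + thread)
        out[record] = rngs[thread].random()  # augmentation parameter
    return out

# Same records and seed, but the OS hands img0 to a different thread:
run_a = run([(0, 'img0'), (1, 'img1'), (0, 'img2')])
run_b = run([(1, 'img0'), (0, 'img1'), (0, 'img2')])
assert run_a['img0'] != run_b['img0']  # not reproducible across runs
```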
I agree with your assessment - it's pretty obscure and it took me a while to figure out exactly what was going on.
Force-pushed from 1550da3 to 12ffd99
@@ -165,6 +167,8 @@ struct ImageRecParserParam : public dmlc::Parameter<ImageRecParserParam> {
      .describe("The data shuffle buffer size in MB. Only valid if shuffle is true.");
  DMLC_DECLARE_FIELD(shuffle_chunk_seed).set_default(0)
      .describe("The random seed for shuffling");
  DMLC_DECLARE_FIELD(seed_aug).set_default(dmlc::optional<int>())
Maybe I am missing something here, but the PR says it fixes a flaky test. Are we adding this new field just to fix the flakiness of a particular test? It does not seem right; we can't change the functionality just so that it will be easier for us to run tests.
Fair call. However, the problem, I think, is architectural.
The seed_aug parameter is passed in to the default image augmenter (https://github.com/apache/incubator-mxnet/blob/master/src/io/image_aug_default.cc#L101), but the actual random number generator is external to the augmenter while being seeded inside it (https://github.com/apache/incubator-mxnet/blob/master/src/io/image_aug_default.cc#L254).
As I've explained above, I can't see any other way to meet both requirements: processing the images in parallel and ensuring that the output is reproducible (by setting the augmentation seed).
My fix for the flakiness (without violating the aforementioned requirements) is to ensure that the RNG used to process an image is seeded just before the image is processed, within whichever thread processes it.
To be clear, none of the changes were made to facilitate running the tests, but to actually correct the underlying problem. I agree that the PR is a little rough around the edges atm, but what I'm trying to do is reach out to the community to understand the best way to actually fix the underlying problem.
tests/python/unittest/test_io.py
Outdated
@@ -427,7 +427,7 @@ def check_CSVIter_synthetic(dtype='float32'):
    for dtype in ['int32', 'int64', 'float32']:
        check_CSVIter_synthetic(dtype=dtype)

- @unittest.skip("Flaky test: https://github.com/apache/incubator-mxnet/issues/11359")
+ # @unittest.skip("Flaky test: https://github.com/apache/incubator-mxnet/issues/11359")
please remove rather than comment.
Done. Although again, I need a review of my current approach to see if it's the right one. If I get validation there, I can then prepare the PR for proper review...
Many of the nits I'll fix before removing [WIP]
@perdasilva thanks for your contributions. We are looking forward to merging your PR. Could you please address comments above to proceed further?
Force-pushed from 728586b to 6984ba6
Force-pushed from ac885eb to 5dcea9d
@perdasilva is it ready for review?
Force-pushed from 5dcea9d to aa3d167
@stu1130 I will polish it tomorrow and submit it for review. I'm sorry for the delay.
Force-pushed from aa3d167 to c92ccba
@stu1130 ready for review
Looks really good
tests/python/unittest/test_io.py
Outdated
def assert_dataiter_equals(dataiter1, dataiter2):
    for batch1, batch2 in zip_longest(dataiter1, dataiter2):
        # ensure iterators are of same length
        assert(batch1 and batch2)
what is this assert doing?
This is a function for asserting that two data iterators are producing the same output. The data iterator iterates over batches (lists of arrays). So, basically it runs through the iterators to get the batches, then iterates through the batches and ensures that they are equal.
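Since the snippet above is only a fragment, here is a self-contained sketch of the same idea (assuming, for illustration, that each batch is a plain list of lists rather than a real mx.io.DataBatch of NDArrays):

```python
from itertools import zip_longest

def assert_dataiter_equals(dataiter1, dataiter2):
    for batch1, batch2 in zip_longest(dataiter1, dataiter2):
        # zip_longest pads the shorter iterator with None, so a
        # length mismatch trips this assert.
        assert batch1 is not None and batch2 is not None
        # compare the arrays within each batch pairwise
        for arr1, arr2 in zip(batch1, batch2):
            assert arr1 == arr2

# Two "iterators" yielding identical batches compare equal.
assert_dataiter_equals(iter([[[0, 0]], [[1, 1]]]),
                       iter([[[0, 0]], [[1, 1]]]))
```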
tests/python/unittest/test_io.py
Outdated
for batch1, batch2 in zip_longest(dataiter1, dataiter2):

    # try to ensure iterators are of same length
    assert(batch1 and batch2)
how is this ensuring same length?
The problem is that (as far as I could tell) there was no easy way of getting the number of batches in an iterator - it doesn't have a size or length method. So, I'm using zip_longest (https://docs.python.org/3.0/library/itertools.html#itertools.zip_longest) to iterate through the two iterators together. If they don't have the same length, a fill value (None by default) is returned in place of the missing item (or batch, in this case). Therefore, if either batch1 or batch2 is None, the assertion will be triggered.
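For reference, a tiny demonstration of zip_longest's padding behaviour (the values here are hypothetical):

```python
from itertools import zip_longest

short = [10, 20]
long_ = [10, 20, 30]
pairs = list(zip_longest(short, long_))
# The shorter input is padded with None, which is what makes
# assert(batch1 and batch2) fail on a length mismatch.
assert pairs == [(10, 10), (20, 20), (None, 30)]
```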
# check whether seed_aug changes the iterator behavior
- dataiter = mx.io.ImageRecordIter(
+ dataiter1 = mx.io.ImageRecordIter(
what is getting tested here that is not being tested in the first test?
This test was already here when I got here =) I think it was meant as a sanity check to ensure that the seed aug parameter is not influencing the iterator (other than in the case where the iterator has augmentations applied to its images).
if sys.version_info >= (3,0):
    from itertools import zip_longest
else:
    from itertools import izip_longest as zip_longest
Suggestion for handling different python versions:

try:
    from itertools import izip_longest as zip_longest
except ImportError:
    from itertools import zip_longest
I've updated it with your suggestion. Much nicer.
Force-pushed from c92ccba to 451291b
@anirudhacharya all good now?
Force-pushed from 39117c5 to 88d2d6d
LGTM.
@ChaiBapchya @stu1130 Could you please review this change?
LGTM. Have you run the test_ImageRecordIter_seed_augmentation at least 10000 times to see if it's still flaky?
@stu1130 - not 10000x. I will do that tomorrow and confirm ^^
@stu1130 - ok each test takes around 10 seconds - might take a tad bit longer than expected...
@stu1130 - I can confirm, all good and flake free, please merge at your convenience
LGTM!
@anirudh2290 could you take a look at this and merge it if it looks good
@anirudh2290 any chance you could merge this? =)
@marcoabreu @sandeep-krishnamurthy Can someone merge this PR?
Force-pushed from 88d2d6d to 1861e85
@sandeep-krishnamurthy @nswamy could someone merge this PR?
* Moves seed_aug parameter to ImageRecParserParam and re-seeds RNG before each augmentation to guarantee reproducibility
* Update image record iterator tests to check the whole iterator, not only the first image
Description
Fixes flakiness in the ImageRecordIter augmentation seed tests (#11359) by moving the seed_aug parameter and RNG seeding out of the default augmenter. We now seed the RNG before each image augmentation (if the seed_aug parameter is set). This guarantees reproducibility while still enabling multithreaded augmentation (thus fixing the flakiness).
Additionally, it makes the test more robust by testing the whole iterator and not just the first image.
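The shape of the strengthened test, as a pure-Python analogue (the real test builds two mx.io.ImageRecordIter instances with the same seed_aug; here a per-record-seeded generator stands in for the iterator, and the batch contents are hypothetical):

```python
import random
from itertools import zip_longest

def fake_record_iter(seed_aug, num_batches=5):
    # Stand-in for an ImageRecordIter with a fixed augmentation seed:
    # each record's RNG is re-seeded from seed_aug plus its index.
    for idx in range(num_batches):
        rng = random.Random(seed_aug + idx)
        yield [rng.random() for _ in range(4)]

# Compare the whole iterator, not just the first batch.
for b1, b2 in zip_longest(fake_record_iter(42), fake_record_iter(42)):
    assert b1 is not None and b2 is not None
    assert b1 == b2
```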