fix test_depthwise_convolution for occasional CI failures #14016
Can we also compare between cpu and cpu as context?
Could you also create an issue for the discrepancy between the CPU and GPU implementations, including the input needed to replay it? Thanks!
@apeforest We already do that. We actually compare between default contexts, so when the default context is cpu, we compare cpu with cpu.
LGTM. @mseth10 Thanks for your thorough analysis!
Good catch!
I would also recommend noting the input you tested this on, if there is one. Overall this looks good to me.
On a side note, you should also investigate whether this test can run in a non-cuDNN environment. I am not sure if we have one in MXNet CI; if we do, the test should be part of the suite that runs test cases for the non-cuDNN environment.
@Vikas89 I checked. We do have a GPU environment without cuDNN running on our CI. I will make sure this test runs there.
This reverts commit 1f95d02.
Unfortunately, the same test is failing in this PR:
Not sure whether it's the same error though.
@lebeg It's not the same error: the current failure is on unix-cpu, whereas earlier it was on unix-gpu. I also verified that this failure is not due to the fix; it is caused by some other code change that happened while this test was inactive.
…to fix-depthwise-conv
@Vikas89 This test already runs in the GPU environment without cuDNN, so we are good there.
Nice.
However, since the test is still failing (albeit due to a different root cause), we should probably keep skipping this test.
…to fix-depthwise-conv
a985b9d to 7a4a26b
7a4a26b to c1299d7
Filed another issue, #14052, for the MKLDNN bug that causes a CI failure when the test is enabled. Keeping the test disabled for now.
…to fix-depthwise-conv
@mxnet-label-bot add [pr-awaiting-merge, Flaky, Test]
Good catch! LGTM.
* keeping same contexts for comparison
* enabling test
* testing default context
* Revert "testing default context" This reverts commit 1f95d02.
* Disabling test due to CI failure on MKL-DNN
Description
This PR aims to fix #12203. The test_depthwise_convolution test fails occasionally on CI for the GPU context. The problem occurs because the test compares the GPU output of a convolution on the full image against the CPU output of convolutions on sliced images that are concatenated afterwards, and the error tolerance is sometimes exceeded. When the comparison is instead made between the GPU output on the full image and the GPU output on the sliced-and-concatenated images, the test passes.
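The comparison described above can be sketched in plain Python. This is only an illustration of the idea, not the MXNet test itself: the helper names (`conv2d_single`, `depthwise_conv`, `sliced_conv`, `all_close`), the tolerance values, and the `reverse` flag are all hypothetical. The `reverse` flag changes the floating-point accumulation order as a stand-in for two implementations that agree mathematically but may round differently.

```python
import random


def conv2d_single(image, kernel, reverse=False):
    """Valid (no padding, stride 1) 2-D cross-correlation of one channel.

    `reverse` only changes the order in which products are summed; it is a
    hypothetical stand-in for a different backend's accumulation order.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    offsets = [(a, b) for a in range(kh) for b in range(kw)]
    if reverse:
        offsets.reverse()
    return [
        [sum(image[i + a][j + b] * kernel[a][b] for a, b in offsets)
         for j in range(out_w)]
        for i in range(out_h)
    ]


def depthwise_conv(images, kernels, reverse=False):
    """Depthwise convolution: channel c is convolved only with kernel c."""
    return [conv2d_single(img, k, reverse) for img, k in zip(images, kernels)]


def sliced_conv(images, kernels, reverse=False):
    """Reference path: slice each channel, convolve it, then concatenate."""
    outputs = []
    for c in range(len(images)):
        outputs.extend(
            depthwise_conv(images[c:c + 1], kernels[c:c + 1], reverse))
    return outputs


def all_close(a, b, rtol=1e-3, atol=1e-3):
    """Element-wise check that |x - y| <= atol + rtol * |y| everywhere."""
    return all(
        abs(x - y) <= atol + rtol * abs(y)
        for ch_a, ch_b in zip(a, b)
        for row_a, row_b in zip(ch_a, ch_b)
        for x, y in zip(row_a, row_b)
    )


random.seed(0)
channels, size, ksize = 3, 5, 3
images = [[[random.uniform(-1, 1) for _ in range(size)] for _ in range(size)]
          for _ in range(channels)]
kernels = [[[random.uniform(-1, 1) for _ in range(ksize)] for _ in range(ksize)]
           for _ in range(channels)]

full = depthwise_conv(images, kernels)               # whole image, one call
ref = sliced_conv(images, kernels, reverse=True)     # sliced, other sum order
print(all_close(full, ref))
```

In this toy setting the two paths differ only by rounding, so the tolerance check is comfortably met. The point of the PR is that when the two paths run on different devices the gap can be larger, which is why both sides of the comparison are kept on the same context.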