-
Notifications
You must be signed in to change notification settings - Fork 6.8k
[MXNET-1408] Adding test to verify Large Tensor Support for ravel and unravel #15048
Conversation
@mxnet-label-bot add [pr-awaiting-review] |
@apeforest : Please review |
tests/nightly/test_large_array.py
Outdated
@@ -279,6 +279,19 @@ def test_diag(): | |||
assert_almost_equal(r.asnumpy(), np.diag(a_np, k=k)) | |||
|
|||
|
|||
def test_ravel_and_unravel(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
break it into two tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its more readable and understandable this way. Also I have split it into 2 functions inside.
tests/nightly/test_large_array.py
Outdated
|
||
def test_unravel_index(): | ||
original_2d_idxs = mx.nd.unravel_index(mx.nd.array(idx, dtype=np.int64), shape=(LARGE_X,SMALL_Y)) | ||
assert (original_2d_idxs.asnumpy() == np.array(idxs_2d)).all() == True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove == True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
7c915e3
to
e2970e9
Compare
tests/nightly/test_large_array.py
Outdated
@@ -279,6 +279,19 @@ def test_diag(): | |||
assert_almost_equal(r.asnumpy(), np.diag(a_np, k=k)) | |||
|
|||
|
|||
def test_ravel_and_unravel(): | |||
idxs_2d = [[LARGE_X-1,LARGE_X-100,6],[SMALL_Y-1,SMALL_Y-10,1]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use C++ tests with mocking of the memory allocation instead of actually allocating memory.
Although this is nightly, I don't think it's necessary for these kind of tests to actually allocate real memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add C++ unit test to mock the memory allocation. However, I thought the nightly test is used for end-to-end testing purpose to mimic the real deployment situation, or is it not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are to some extent, but in same cases the cost vs return ratio is not that good. While smoke tests are certainly necessary in some cases, resource intensive ones on the other hand can usually be abstracted away. If we assume that memory access on the system side is properly working and only mock this very laster layer, this result should be pretty much as meaningful as running it end to end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marcoabreu : I didn't understand your last comment. What did you mean by "very laster layer" ? All these tests are single operator tests. This tests I combined because its easier to test when combined together
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @marcoabreu, thanks a lot for your suggestion and it's a very good learning for me too. The large tensor support is crucial to DGL (Deep Graph Library) that is using MXNet and planning to release with support of large tensors. Without this test, we cannot make this feature production ready.
While I appreciate the direction of testing you suggested and definitely would like to explore it further, I am also afraid that not adding tests before we implement the approach you suggested will block the release of DGL and other MXNet projects that depend on large tensors as well.
Can we have a seperate project to address the test efficiency (we may need some expertise from CI team for this project) while moving on with the large tensor project? Please let me know if you have other concerns or advice. Thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marcoabreu : We have to test cases where we have to allocate large tensors on actual physical memory. That is the whole purpose of adding these tests. Without which we cannot support or even guarantee large dense tensor support on MXNet.
The purpose of these tests are not at all to test only Sparse. Currently we are not addressing performance issues with large dense arrays but trying to support them. From your reply it seems like you are under the impression that everything should work fine if tensor size >2^32 elements(Please correct me if I am wrong). But that's not the case and the operator simply segfaults or gives unpredictable behavior. That is why is necessary to test them with such large arrays/tensors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marcoabreu : Let me know if you have other unanswered queries that will help you understand the use case better and unblock this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey,
I understand the importance of large tensors and I'm really looking forward to them as well. Don't get me wrong, I'm not underestimating the work that comes with supporting them.
We already had a history of issues that were caused by too large allocations. Thus, for the sake of keeping CI stable (independent on whether we're running on per-PR or nightly) and to adhere to a high test standard, I'd prefer if we would not delay adressing the test strategy.
I think there's a misunderstanding. With my proposed approach, you would still be able to make a full test of a single operator. But instead of randomly generating the input tensor, you would use a generator model that gives you predictable and procedually generatable input tensor instead of a large and randomly generated one.
The idea here would then be to give this procedually generated input tensor to the operator and since it's a predicutable input, you can also have a predictable output that you can verify. On the way, the operator will run as usual, it's just that the source of the data is not physical memory but generated values.
The only case that wouldn't be supported is the chaining of operators since that would require storing the intermediary values.
The idea here is that (in whatever fashion you'd like) we generate fake input and output tensors that return predictable and valid data, but doesn't actually consume memory on the RAM. From MXNets perspective, it shouldn't matter where the real values is coming from. For the sake of an operator, it's important that it actually retrieves proper values and that would be the case here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marcoabreu I still don't understand how will I know that on actual large tensor after memory allocation would it work as expected or not. Procedurally generating the values will indeed give me random values on random locations that are indexed at locations > 2^32. But the input should come from RAM. it does matter. It needs to be tested on actual physical memory so there are no surprises. If nighly tests are not feasible do we have a weekly test where we can run this ?
tests/nightly/test_large_array.py
Outdated
@@ -279,6 +279,19 @@ def test_diag(): | |||
assert_almost_equal(r.asnumpy(), np.diag(a_np, k=k)) | |||
|
|||
|
|||
def test_ravel_and_unravel(): | |||
idxs_2d = [[LARGE_X-1,LARGE_X-100,6],[SMALL_Y-1,SMALL_Y-10,1]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: space between operators and operands
tests/nightly/test_large_array.py
Outdated
idx = mx.nd.array(1) | ||
def test_ravel_multi_index(): | ||
idx = mx.nd.ravel_multi_index(mx.nd.array(idxs_2d, dtype=np.int64), shape=(LARGE_X,SMALL_Y)) | ||
idx_numpy = np.ravel_multi_index(idxs_2d, (LARGE_X,SMALL_Y)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: space. btw, do you see any pylint warning out of these?
tests/nightly/test_large_array.py
Outdated
assert np.sum(1 for i in range(idx.size) if idx[i] == idx_numpy[i]) == 3 | ||
|
||
def test_unravel_index(): | ||
original_2d_idxs = mx.nd.unravel_index(mx.nd.array(idx, dtype=np.int64), shape=(LARGE_X,SMALL_Y)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same: add space
tests/nightly/test_large_array.py
Outdated
@@ -279,6 +279,19 @@ def test_diag(): | |||
assert_almost_equal(r.asnumpy(), np.diag(a_np, k=k)) | |||
|
|||
|
|||
def test_ravel_and_unravel(): | |||
idxs_2d = [[LARGE_X-1,LARGE_X-100,6],[SMALL_Y-1,SMALL_Y-10,1]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: do you mean indices_2d
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes . Though make pylint
gives this output.
ubuntu@ip-172-31-82-110 ~/mxnet3 (master) $ make pylint
Makefile:345: WARNING: Significant performance increases can be achieved by installing and enabling gperftools or jemalloc development packages
python3 -m pylint --rcfile=/home/ubuntu/mxnet3/ci/other/pylintrc --ignore-patterns="..so$,..dll$,..dylib$" python/mxnet tools/caffe_converter/.py
Using config file /home/ubuntu/mxnet3/ci/other/pylintrc
Your code has been rated at 10.00/10
I will still fix it. Dunno why pylint isn't catching it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it .... mxnet makefile doesn't do pylint on tests/nightly and tests/python/unittest ... added manually to local repo. Thanks for the advice !
import numpy as np | ||
import mxnet as mx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested reordering of imports by pylint
@@ -130,13 +130,6 @@ def test_clip(): | |||
assert np.sum(res[-1].asnumpy() == 1000) == a.shape[1] | |||
|
|||
|
|||
def test_take(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was duplicate
idx_numpy = np.ravel_multi_index(indices_2d, (LARGE_X, SMALL_Y)) | ||
assert np.sum(1 for i in range(idx.size) if idx[i] == idx_numpy[i]) == 3 | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove extra line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just talked to Lin and we agreed on merging the end to end tests for now. The condition is that they have to be replaced by 15th of July or the next minor release that includes large tensor support; whichever comes first.
tests/nightly/test_large_array.py
Outdated
@@ -254,8 +247,8 @@ def numpy_space_to_depth(x, blocksize): | |||
|
|||
|
|||
def test_diag(): | |||
h = np.random.randint(2,9) | |||
w = np.random.randint(2,9) | |||
h = np.random.randint(2, 9) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see the necessity of creating h and w variables here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you decide to move forward with them, please add the with seed annotation to the test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apeforest Thats how the unittest for diagonal operator is implemented.
@marcoabreu Good point will add @with_seed()
tests/nightly/test_large_array.py
Outdated
r = mx.nd.diag(a, k=k) | ||
assert_almost_equal(r.asnumpy(), np.diag(a_np, k=k)) | ||
|
||
|
||
def test_ravel_multi_index(): | ||
indices_2d = [[LARGE_X-1, LARGE_X-100, 6], [SMALL_Y-1, SMALL_Y-10, 1]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we generate the random indices instead?
tests/nightly/test_large_array.py
Outdated
|
||
|
||
def test_unravel_index(): | ||
original_2d_indices = [[LARGE_X-1, LARGE_X-100, 6], [SMALL_Y-1, SMALL_Y-10, 1]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here. can we generate random indices? you can pass in the 2d_array as input so you only keep one copy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The input is not 2d_array, we pass indices as input. I can do following:
low_large_value = 2**32
high_large_value = 2**34
a = nd.random.randint(low_large_value, high_large_value, dtype=np.int64)
for random Large indices
Good Point
tests/nightly/test_large_array.py
Outdated
@@ -216,8 +209,8 @@ def test_where(): | |||
|
|||
def test_pick(): | |||
a = mx.nd.ones(shape=(256*35, 1024*1024)) | |||
b = mx.nd.ones(shape=(256*35,)) | |||
res = mx.nd.pick(a,b) | |||
b = mx.nd.ones(shape=(256*35, )) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: space around * operator
tests/nightly/test_large_array.py
Outdated
r = mx.nd.diag(a, k=k) | ||
assert_almost_equal(r.asnumpy(), np.diag(a_np, k=k)) | ||
|
||
|
||
def random_2d_coordinates(x_low, x_high, y_low, y_high): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems you are re-use the existing utility method: https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/test_utils.py#L410
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is different and we need one value that to very large for tests case to be successful not any values between (1, very large value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will move this to test_utils.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Description
New Ops: ravel_multi_index, unravel_index
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments