[MXNET-688] Fix quantization divide by zero errors #11833
Conversation
    for j in range(num_quantized_bins):
        start = j * num_merged_bins
        if j == num_quantized_bins - 1:
-           stop = -1
+           stop = len(is_nonzeros)
This is an off-by-one error that can be caught by the quantization tests I added: indexing a numpy array with x[a:-1] excludes the last element.
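As a quick standalone illustration of that slicing behaviour (this snippet is just for demonstration, not code from this PR):

    import numpy as np

    is_nonzeros = np.array([1, 0, 1, 1, 1])

    # stop = -1 silently drops the last element of the slice ...
    print(is_nonzeros[2:-1])                 # [1 1]
    # ... while stop = len(is_nonzeros) keeps it.
    print(is_nonzeros[2:len(is_nonzeros)])   # [1 1 1]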
python/mxnet/contrib/quantization.py (outdated)
        else:
            stop = start + num_merged_bins
        norm = is_nonzeros[start:stop].sum()
        if norm != 0:
            q[start:stop] = float(quantized_bins[j]) / float(norm)
    q[sliced_nd_hist == 0] = 0
This is not representative of the quantized distribution: artificially setting these values to 0 will not correctly represent the quantized activation output.
python/mxnet/contrib/quantization.py (outdated)
        else:
            stop = start + num_merged_bins
        norm = is_nonzeros[start:stop].sum()
        if norm != 0:
-           q[start:stop] = float(quantized_bins[j]) / float(norm)
-   q[sliced_nd_hist == 0] = 0
+           q[start:stop] = float(quantized_bins[j]) / float(num_quantized_bins)
Originally this was float(norm), and that is not appropriate. Suppose you have the distribution:
[0, 0, ..., 0, 1]
If num_quantized_bins is 3, then you theoretically should get:
[0, 0, ..., 1/3, 1/3, 1/3]
instead of:
[0, 0, ..., 1, 1, 1]
To make this clearer, suppose your original distribution is:
[0, 0, 0, ..., 1/3, 1/3, 1/3]
After quantization this should be equivalent to the first distribution, but it isn't. Under the old rule you would get the same array back, [..., 1/3, 1/3, 1/3], while the first distribution would give you [..., 1, 1, 1], off by a constant multiplier.
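A small numeric sketch of this argument (the expand helper and the 3-element slices are made up for illustration; this is not the code in quantization.py):

    import numpy as np

    def expand(hist_slice, divisor):
        # Spread the total count of one quantized bin uniformly back over its
        # merged histogram bins, dividing by the chosen divisor.
        return np.full(len(hist_slice), hist_slice.sum() / divisor)

    slice_a = np.array([0.0, 0.0, 1.0])      # ..., 0, 0, 1
    slice_b = np.array([1/3, 1/3, 1/3])      # ..., 1/3, 1/3, 1/3

    # Dividing by norm (the count of nonzero bins) treats the two slices differently:
    print(expand(slice_a, np.count_nonzero(slice_a)))  # [1. 1. 1.]
    print(expand(slice_b, np.count_nonzero(slice_b)))  # [0.333... 0.333... 0.333...]

    # Dividing by 3 (num_quantized_bins in this example) expands both the same way:
    print(expand(slice_a, 3))                          # [0.333... 0.333... 0.333...]
    print(expand(slice_b, 3))                          # [0.333... 0.333... 0.333...]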
I'm not sure I understand your change here. The original implementation is following the explanation here (see page 38):
http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf
@@ -274,22 +279,21 @@ def _get_optimal_threshold(arr, num_bins=8001, num_quantized_bins=255):
    max_val = np.max(arr)
    th = max(abs(min_val), abs(max_val))

-   hist, hist_edeges = np.histogram(arr, bins=num_bins, range=(-th, th))
+   hist, hist_edges = np.histogram(arr, bins=num_bins, range=(-th, th))
edges is misspelled as edeges throughout the code.
# at one edge: [0, 0, ..., 1000]. (histogram)
# We want to make sure that the optimal threshold in this case is the max.
arr = np.array([2]*1000)
res = mx.contrib.quant._get_optimal_threshold(arr, num_quantized_bins=5)
With the previously (incorrectly) implemented _get_optimal_threshold, this call would result in a divide-by-zero error.
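Roughly, the constant input concentrates the whole histogram in the edge bin, so candidate slices around zero are empty; a hedged sketch of that mechanism (toy bin count, not the num_bins=8001 used in the real code):

    import numpy as np

    # A constant array has min == max, so th = max(|min|, |max|) lands all of
    # the mass in the very last histogram bin.
    arr = np.array([2] * 1000)
    th = max(abs(arr.min()), abs(arr.max()))
    hist, hist_edges = np.histogram(arr, bins=10, range=(-th, th))
    print(hist)          # [   0    0    0    0    0    0    0    0    0 1000]

    # A candidate slice around the center that excludes that edge bin is all
    # zeros, and normalizing or smoothing an all-zero candidate divides by zero.
    sliced = hist[3:7]
    print(sliced.sum())  # 0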
python/mxnet/contrib/quantization.py (outdated)
try:
    q = _smooth_distribution(q)
except ValueError:
    divergence[i - num_half_quantized_bins] = float("inf")
If the distribution is improper, we set the KL divergence to infinity: such a distribution could theoretically model a uniform distribution over parameters [a, b] with either bound unbounded, which means the KL divergence is infinite.
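This is consistent with how KL divergence behaves when a candidate assigns zero probability where the reference has mass; a tiny sketch (assuming scipy.stats.entropy, which I believe the calibration code uses to compute the divergence):

    from scipy import stats

    p = [0.5, 0.5]
    q = [1.0, 0.0]   # the candidate assigns zero probability where p has mass
    print(stats.entropy(p, q))   # inf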
For reference, with the new PR, on ImageNet, ResNet-152, 5 batches, entropy method:
Previously:
This is proof that there was no degradation in performance. @reminisce
        else:
            stop = start + num_merged_bins
        norm = is_nonzeros[start:stop].sum()
        if norm != 0:
            q[start:stop] = float(quantized_bins[j]) / float(norm)
-   q[sliced_nd_hist == 0] = 0
+   q[p == 0] = 0
According to the slides on page 38 (http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf), the zeroed-out bins are meant to be taken w.r.t. the reference distribution p, rather than sliced_nd_hist.
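A made-up sketch of why the two masks differ (the numbers are hypothetical; the key point is that the outlier counts clamped into p's edge bins make p nonzero where sliced_nd_hist may be zero):

    import numpy as np

    sliced_nd_hist = np.array([0., 3., 0., 2., 0.])
    p = sliced_nd_hist.copy()
    p[0] += 10.0    # left outlier count clamped into the first bin of p
    p[-1] += 4.0    # right outlier count clamped into the last bin of p

    q = np.array([1., 1., 0., 1., 1.])   # some expanded quantized candidate

    q_by_hist = q.copy(); q_by_hist[sliced_nd_hist == 0] = 0
    q_by_p = q.copy();    q_by_p[p == 0] = 0
    print(q_by_hist)   # [0. 1. 0. 1. 0.]  edge bins zeroed even though p has mass there
    print(q_by_p)      # [1. 1. 0. 1. 1.]  edge bins kept, matching the reference p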
* Fix quantization bug
* Added tests and made sure the edge case is now considered correctly without 1 off errors
* Changed back to original truncated distribution but with different kl divergence calc
* Reorder back to original format
* Reorder back to original format (again)
* Change comments
* Clarified comments
* Changed norm division
Description
The current quantization strategy for calib_mode='entropy' is to calculate the KL divergence for different thresholds and choose the best one. This assumes that the random variable is continuous and nonzero over all reals. Because we are discretizing the distribution, we smooth it over the range [-threshold, threshold]. What we were not considering is that the entire sampled distribution may fall outside the range [-threshold, threshold], in which case we end up with all zeros in the sampled candidate distribution p inside _get_optimal_threshold. I have added a check that the (possibly unnormalized) distribution is proper before attempting to smooth it; otherwise we run into a divide-by-zero error.
In most cases, activation functions and layers for classification-type problems output values symmetric around 0. This is not the case for a regressor's last layer, and there are various other examples where the activation distribution is not centered around 0; this was a major blocker for Airbnb's adoption of MXNet's quantization capabilities.
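A minimal sketch of the kind of guard described above (the helper name is hypothetical; in the PR the check effectively lives in _smooth_distribution raising ValueError and the caller treating that candidate's divergence as infinite):

    import numpy as np

    def _is_proper(p):
        # A (possibly unnormalized) discrete distribution is only usable if it
        # carries some mass; an all-zero candidate cannot be smoothed or
        # normalized without dividing by zero.
        p = np.asarray(p, dtype=np.float64)
        return p.size > 0 and p.sum() > 0

    print(_is_proper([0.0, 0.0, 0.0]))   # False -> treat this candidate's divergence as inf
    print(_is_proper([0.0, 2.0, 1.0]))   # True  -> safe to smooth and normalize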
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes