Support Quantized Fully Connected by INT8 GEMM #12922
Conversation
@reminisce @zheng-da could you help review this?
@lihaofd Thanks for the contribution!
enum QuantilizedfcOpResource {kTempSpace};
}

struct QuantizedSumInitKernelWithBias {
Suggest moving all the implementation to the .cc file since it's only for CPU.
fixed
*dispatch_mode = DispatchMode::kFComputeEx;
}
for (size_t i = 0; i < out_attrs->size(); i++)
  (*out_attrs)[i] = kDefaultStorage;
Use STORAGE_TYPE_ASSIGN_CHECK.
fixed
Just delete this line.
fixed
}
for (size_t i = 0; i < out_attrs->size(); i++)
  (*out_attrs)[i] = kDefaultStorage;
return true;
What if in_attrs has unknown storage types? You need to:
- Check and assign stype to in_attrs as well.
- Return false if any stype is unknown in in_attrs and out_attrs.
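For reference, a minimal sketch of such a storage-type inference function, using STORAGE_TYPE_ASSIGN_CHECK on both input and output attributes (the function name and exact structure here are illustrative, not the PR's final code):

bool SketchQuantizedFCStorageType(const nnvm::NodeAttrs& attrs,
                                  const int dev_mask,
                                  DispatchMode* dispatch_mode,
                                  std::vector<int>* in_attrs,
                                  std::vector<int>* out_attrs) {
  *dispatch_mode = DispatchMode::kFComputeEx;
  for (size_t i = 0; i < in_attrs->size(); ++i) {
    // assigns kDefaultStorage if the entry is still unknown; fails loudly on a conflict
    STORAGE_TYPE_ASSIGN_CHECK(*in_attrs, i, kDefaultStorage);
  }
  for (size_t i = 0; i < out_attrs->size(); ++i) {
    STORAGE_TYPE_ASSIGN_CHECK(*out_attrs, i, kDefaultStorage);
  }
  return true;
}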
fixed
Please consider using range-based for loops for readability.
fixed
I think @larroy meant to use:
for (auto &v : in_attrs) {
// ...
}
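One caveat, judging from the surrounding code where the attributes are accessed as (*in_attrs)[i]: in_attrs is a pointer, so the range-based form would need to dereference it, roughly:

for (int& v : *in_attrs) {
  v = kDefaultStorage;
}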
fixed
@@ -283,16 +283,16 @@ def check_quantized_fc(data_shape, num_hidden, no_bias, qdtype, flatten=True):
fc_fp32_exe = fc_fp32.simple_bind(ctx=mx.current_context(), grad_req='null')
if qdtype == 'uint8':
    data_low = 0.0
    data_high = 127.0
    data_high = 63.0
Any reason for changing this?
Changed the data range from (-127, 127) to (-63, 63) to avoid potential overflow when using igemm on some hardware platforms.
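(Presumably the concern is that some igemm implementations accumulate pairs of 8-bit products into a saturating 16-bit intermediate, which can overflow when both operands use the full 8-bit range: for example, 255 × 127 + 255 × 127 = 64770 exceeds the int16 maximum of 32767. Restricting the test data to ±63 keeps the intermediate sums within the representable range.)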
}

for (size_t i = 0; i < in_attrs->size(); i++) {
  (*in_attrs)[i] = kDefaultStorage;
Same here, delete this line.
fixed
@zheng-da Could you please take a look?
#include "../nn/fully_connected-inl.h"

namespace mxnet {
namespace op {

namespace quantized_fc {
enum QuantilizedfcOpResource {kTempSpace};
'Quantized'
fixed
using mshadow::red::limits::MinValue;
using mshadow::red::limits::MaxValue;
float float_for_one_out_quant =
MaxAbs(*min_out, *max_out) / static_cast<double>(MaxValue<T1>());
4 spaces as indent
fixed
}
}
};
template<typename SrcType>
need a blank line before this line
fixed
}
};
template<typename SrcType>
void MKLDNNQuantizedFullyConnectedForward(const nnvm::NodeAttrs& attrs,
Change to another function name? Since it doesn't call any MKL-DNN APIs.
fixed
Thanks for your contribution @lihaofd

template<typename SrcType>
void QuantizedFullyConnectedForward(const nnvm::NodeAttrs& attrs,
const OpContext &ctx,
Fix indent.
LGTM. All of my comments are addressed.
@mxnet-label-bot update [pr-awaiting-merge]
@reminisce @zheng-da Does this look OK to one of you?
n,
&oc);
#else
LOG(FATAL) << "s8u8s32 is only supported by MKL BLAS library";
Can this error message be made a little bit more verbose for users? For example, by mentioning Quantized INT8.
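For illustration only, a more descriptive message might read something like:

LOG(FATAL) << "Quantized INT8 fully connected relies on the s8u8s32 GEMM routine, which is only available when MXNet is built with the full Intel MKL BLAS library.";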
fixed
out[i] = bias[i] * float_for_one_bias_quant /
float_for_one_out_quant;
} else {
LOG(INFO) << "WARNING: QuantizedBiasAddKernel float_for_one_out_quant is 0 !";
Can we make this info more verbose and add more details?
fixed
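(Background, as I read the kernel above: the quantized bias carries its own scale, float_for_one_bias_quant, so before it can be folded into the int32 GEMM accumulators it is rescaled by float_for_one_bias_quant / float_for_one_out_quant; when the output min/max range is zero that divisor is zero, hence the guard and the warning.)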
@reminisce @sandeep-krishnamurthy @KellenSunderland Please check if your comments are addressed and then we can move forward. Thank you.
out[i] = bias[i] * float_for_one_bias_quant /
float_for_one_out_quant;
} else {
LOG(INFO) << "WARNING: float_for_one_out_quant is 0, need to check min/max data !";
NIT: Seems like a mix of INFO / WARNING usage here.
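If it is really meant as a warning, one option would be to emit it at warning severity instead, e.g.:

LOG(WARNING) << "float_for_one_out_quant is 0, need to check min/max data!";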
@@ -270,7 +270,7 @@ def check_quantized_pooling(data_shape, kernel, pool_type, pad, stride, global_p
def test_quantized_fc():
    def check_quantized_fc(data_shape, num_hidden, no_bias, qdtype, flatten=True):
        if mx.current_context().device_type != 'gpu':
We should be able to run this test on CPU in CI. Could we test to see if 'MKL' is in the env var 'BUILD_TAG' and run the test if it is.
@KellenSunderland good suggestion! Currently, the CI doesn't include the Intel MKL library as the BLAS library, and @azai91 is working on adding it so that we can have better coverage, such as batch GEMM, quantized FC, etc.
fixed
@pengzhao-intel Oh sorry, didn't realize that was the case. If the tests won't pass without full MKL installed and it's not there, let's add this in a later PR.
@pengzhao-intel do you mean the full MKL? We already use MKLML on CI.
@lebeg yes, I mean full MKL. The MKLML doesn't have the INT8 GEMM now :)
Force-pushed from 0de4156 to 1f98f63
@KellenSunderland, @pengzhao-intel, @TaoLv
Thanks for asking. Great changes. No blocking issues. However, reading through the error messages, I don't think it is very easy for users to understand what failed and what actions they should take next.
@lihaofd Thanks for resetting. I'll try to monitor this PR closely and merge when you're ready. Ping me on the other PR as well and I'll try and help out there. By the way: quite a few people on my team are looking forward to this PR. Thanks for the contribution and patience in the review.
@lihaofd please rebase the code and trigger the MKL CI.
The test case needs to be refined so that it can run with MKL BLAS.
Synced to the latest master code base.
@KellenSunderland @TaoLv, could you help review and check whether it can be merged? Thanks!
@@ -48,8 +54,9 @@ bool QuantizedFullyConnectedShape(const nnvm::NodeAttrs& attrs,
  SHAPE_ASSIGN_CHECK(*in_shape, 2, bshape);
}

for (size_t i = num_inputs; i < 3 * num_inputs; ++i) {
  SHAPE_ASSIGN_CHECK(*in_shape, i, TShape{1});
Why not use SHAPE_ASSIGN_CHECK?
@@ -66,11 +73,12 @@ bool QuantizedFullyConnectedType(const nnvm::NodeAttrs& attrs,
CHECK_EQ(in_type->size(), num_inputs * 3);
CHECK_EQ(out_type->size(), 3U);

for (size_t i = 0; i < num_inputs; ++i) {
  TYPE_ASSIGN_CHECK(*in_type, i, mshadow::kInt8);
Why not use TYPE_ASSIGN_CHECK?
}
for (size_t i = num_inputs; i < 3 * num_inputs; ++i) {
  TYPE_ASSIGN_CHECK(*in_type, i, mshadow::kFloat32);
Why not use TYPE_ASSIGN_CHECK?
@lihaofd Thank you for your contribution and patience. Now merging.
* add quantized fully connect support
* disable qfc cpu case since s8u8s32 is only supported by MKL BLAS library
* retrigger to ci testing
* move implementation to cc file and add STORAGE_TYPE_ASSIGN_CHECK
* fix typo bug
* retrigger the ci test
* fix typo bug
* retrigger ci
* retrigger the ci test
* retrigger the ci
* retrigger the ci test
* retrigger ci test
* fix indent issue
* retrigger the ci
* retrigger the ci test
* add verbose message
* update log message
* using range for loop
* using for auto range
* enable MKL BLAS ci test
* fix typo issue
* use TYPE_ASSIGN_CHECK
* retrigger the ci
Description
This PR adds a quantized fully connected op implemented with INT8 GEMM.
@pengzhao-intel, @TaoLv, @ciyongch
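At a high level, based on the code discussed above: the op takes int8/uint8 data and int8 weights together with their float min/max ranges, runs the matrix multiplication through an s8u8s32 GEMM from the full Intel MKL BLAS library, rescales and adds the quantized bias into the int32 accumulators, and produces an int32 output along with its min/max range.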
Feature changes
New features
Unit-test changes
Checklist