[Relay][AlterOp] NHWC to NCHWc support for Pool, concatenate, sum. #4059
zhiics merged 1 commit into apache:master
Conversation
@yzhliu @vinx13 @ZihengJiang @tqchen Please review when you get time.
2d2794e to 254adbe
src/relay/op/nn/pad.cc
Outdated
bool is_layout_modified = new_in_layouts.defined();
if (new_in_layouts.defined()) {
  LOG(INFO) << "New = " << new_in_layouts[0];
  LOG(INFO) << "Old = " << old_in_layouts[0];
src/relay/op/nn/pad.cc
Outdated
  new_pad_width.push_back(axis_pad_width.at(axis_name));
} else {
  // This is the axis that got split. So, check that pad_width was [0, 0] originally.
  LOG(INFO) << "Old axis " << axis_name;
src/relay/op/nn/pad.cc
Outdated
    ret = Layout::Undef();
  }
}
LOG(INFO) << "Final layout = " << ret;
src/relay/op/nn/pad.cc
Outdated
// split.

// 1) Create a map from axis to param_width using old layout.
std::map<std::string, tvm::Array<tvm::Expr>> axis_pad_width;
do you think it is more convenient to add a hash function in LayoutAxis, and use const LayoutAxis& as key?
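For illustration only (a sketch of the idea, not code from the PR; it assumes `LayoutAxis` exposes `name()` and that instances come from `LayoutAxis::Get()`), keying the map on `LayoutAxis` could look roughly like this:

```cpp
// Sketch: hash/equality functors so the pad-width map can be keyed on
// LayoutAxis (here via pointer, since LayoutAxis::Get() hands out references)
// instead of std::string axis names. Needs <functional>, <string>, <unordered_map>.
struct LayoutAxisHash {
  size_t operator()(const LayoutAxis* axis) const {
    return std::hash<std::string>()(axis->name());
  }
};
struct LayoutAxisEqual {
  bool operator()(const LayoutAxis* lhs, const LayoutAxis* rhs) const {
    return lhs->name() == rhs->name();
  }
};
// std::unordered_map<const LayoutAxis*, tvm::Array<tvm::Expr>,
//                    LayoutAxisHash, LayoutAxisEqual> axis_pad_width;
```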
src/relay/op/nn/pad.cc
Outdated
std::map<std::string, tvm::Array<tvm::Expr>> axis_pad_width;
int index_counter = 0;
CHECK_EQ(new_in_layouts.size(), 1);
for (auto axis : old_in_layouts[0]->axes) {
CHECK_EQ(old_in_layouts[0]->axes.size(), params->pad_width.size())
src/relay/op/nn/pad.cc
Outdated
for (auto axis : new_in_layouts[0]->axes) {
  auto axis_name = axis->var->name_hint;
  if (axis_pad_width.count(axis_name) != 0) {
    // This is a primal axis, so directly use the original pad_width.
there's no guarantee that the old layouts are all primal, right?
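To make the concern concrete, a hypothetical guard (not the PR's actual code, using only the helpers already visible in this file) could refuse the transform whenever the old input layout itself contains a sub axis:

```cpp
// Hypothetical guard: if the old input layout is already packed (e.g. "NCHW16c"),
// some of its axes are sub axes and the primal-axis assumption above breaks.
// Fall back to an undefined layout so no transform is attempted.
for (const auto& iter_axis : old_in_layouts[0]->axes) {
  if (!LayoutAxis::Get(iter_axis).IsPrimal()) {
    return Array<Array<Layout>>{{Layout::Undef()}, {Layout::Undef()}};
  }
}
```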
src/relay/op/tensor/reduce.cc
Outdated
  return r_axes;
}

Array<Array<Layout>> SumInferCorrectLayout(const Attrs& attrs, const Array<Layout>& new_in_layouts,
Suggested change:
- Array<Array<Layout>> SumInferCorrectLayout(const Attrs& attrs, const Array<Layout>& new_in_layouts,
+ Array<Array<Layout>> ReduceInferCorrectLayout(const Attrs& attrs, const Array<Layout>& new_in_layouts,
  }
  // Collect only the primal axes.
  if (LayoutAxis::Get(layout_axis).IsPrimal()) {
    new_layout_string += layout_dim;
For a sub axis we don't have to do anything. Basically, this will insert a layout transform from NCHW16c to NCHW before the reduce.
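As an illustration of that behaviour (example values only, not new code in the PR), the loop in this hunk effectively strips the packing suffix:

```cpp
// Example: for a packed input layout such as "NCHW16c", only the primal axes
// N, C, H, W are appended; the sub axis 16c is skipped. The inferred layout is
// therefore "NCHW", and AlterOpLayout inserts layout_transform(NCHW16c -> NCHW)
// in front of the reduce.
std::string new_layout_string = "";
for (auto layout_axis : new_in_layouts[0]->axes) {
  const LayoutAxis& axis = LayoutAxis::Get(layout_axis);
  if (axis.IsPrimal()) {
    new_layout_string += axis.name();
  }
}
// new_layout_string == "NCHW" when new_in_layouts[0] is "NCHW16c"
```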
src/relay/op/tensor/reduce.cc
Outdated
std::string new_layout_string = "";
int axis_index = 0;
for (auto layout_axis : new_in_layouts[0]->axes) {
  char layout_dim = layout_axis->var->name_hint.at(0);
It would be better to consistently use LayoutAxis.
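Concretely, the flagged line could go through the same helper (a sketch of the reviewer's suggestion, not the final patch):

```cpp
// Resolve the axis through LayoutAxis once, instead of peeking at the raw
// name_hint character; IsPrimal() and name() then come from the same object.
const LayoutAxis& axis = LayoutAxis::Get(layout_axis);
char layout_dim = axis.name()[0];  // only if the raw character is still needed downstream
```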
c65a536 to 87f012e
@anijain2305 Could you take a look at the failing test?
I am getting some flaky CoreML errors with the pad operator. I will try to reproduce them. For now, I am removing the Pad changes from this PR and will send them separately.
I will take a look later.
zhiics left a comment
Only a few small questions to clarify. It looks good to me.
                (pad_top, pad_bottom),
                (pad_left, pad_right),
                (0, 0)))
do_pad = not (pad_top == 0 and pad_bottom == 0 and pad_left == 0 and pad_right == 0)
Is this needed? or should it be done with pad?
This one is fine. It is just an optimization that ensures that we do not insert redundant pads.
const Array<Layout>& old_in_layouts,
const Array<Array<IndexExpr>>& old_in_shapes) {
  // NOTE: Discard "const" qualifier here.
  ReduceAttrs* params = const_cast<ReduceAttrs*>(attrs.as<ReduceAttrs>());
Why const_cast here? Is it because we need to change the axis?
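For context on why the attrs need to be mutated at all: the reduce axes are stored as indices into the old layout, so a layout change has to remap them. A hypothetical sketch of that remapping (illustrative only; it ignores negative axes and sub axes):

```cpp
// Example: sum over NHWC with axis = [3] reduces C; once the input becomes
// NCHW, the same reduction must be recorded as axis = [1]. Rewriting the
// stored ReduceAttrs in place is what forces the const_cast above.
Array<Integer> new_axes;
for (const auto& old_axis : params->axis) {
  // Name of the axis the reduce originally referred to, e.g. 'C' in NHWC.
  const auto& axis_name = old_in_layouts[0]->axes[old_axis->value]->var->name_hint;
  // Find where that axis lives in the new layout, e.g. position 1 in NCHW.
  for (size_t i = 0; i < new_in_layouts[0]->axes.size(); ++i) {
    if (new_in_layouts[0]->axes[i]->var->name_hint == axis_name) {
      new_axes.push_back(Integer(static_cast<int>(i)));
    }
  }
}
params->axis = new_axes;
```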
Thanks @anijain2305 @yzhliu @tmoreau89. This is now merged.
For quantized InceptionV3, the number of layout transforms is reduced with this PR:
Original - 260 layout transforms / 1836 total operators
After PR - 172 layout transforms / 1718 total operators
(Many of the remaining 172 transforms are also cheaper than the original 260.)