Add FP requantize flow. Set float32 flow by default for llvm x86 targets with sse4.1 support. #9637
cc @jwfromm
masahi
left a comment
Looks good, only minor comments
Please go through your change and remove all uses of the term |
python/tvm/topi/x86/utils.py
    "amdfam10",
    "athlon-4",
    "athlon-xp",
    "c3-2",
Do we need this level of detail? I'd prefer dropping them. I don't think people would ever specify these targets...
I think sse4.1 through vnni is enough.
I agree, sse4.1 looks good. Users can always use requantize_config to change the default behavior.
Done.
masahi
left a comment
Very nice, just more minor comments and I'll merge this.
…ets with sse4.1 support
Please kick another CI job.
…ets with (apache#9637) sse4.1 support
Added a new calculation_flow_type parameter to relay.qnn.op.requantize. This parameter controls which implementation flow the function uses. Valid values: "int64", "float32", "float64".
The basic idea is that on some targets, flows other than "int64" (currently the only implementation) will perform better.
The measurements below were taken on an AMD Ryzen 7 5800H with TVM_NUM_THREADS=1.
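To make the difference between the flows concrete, here is a minimal NumPy sketch of requantizing int32 values to int8 in two ways: with floating-point arithmetic and with an int64 fixed-point multiply. It is an illustration only, not the code in this PR. The function names `requantize_float` and `requantize_fixed_point` are hypothetical, zero points and scales are assumed to be per-tensor scalars, and the rounding behaviour on exact ties may differ from what TVM does.

```python
import math

import numpy as np


def requantize_float(data, in_scale, in_zp, out_scale, out_zp, dtype=np.float32):
    """Requantize int32 data to int8 using floating-point arithmetic (sketch)."""
    ratio = dtype(in_scale) / dtype(out_scale)
    scaled = (data.astype(dtype) - dtype(in_zp)) * ratio
    # np.round rounds exact ties to the nearest even value.
    out = np.round(scaled) + out_zp
    return np.clip(out, -128, 127).astype(np.int8)


def requantize_fixed_point(data, in_scale, in_zp, out_scale, out_zp):
    """Requantize int32 data to int8 using an int64 fixed-point multiply (sketch)."""
    ratio = in_scale / out_scale
    # Decompose ratio = mantissa * 2**exponent with mantissa in [0.5, 1).
    mantissa, exponent = math.frexp(ratio)
    multiplier = int(round(mantissa * (1 << 31)))  # Q31 fixed-point mantissa
    if multiplier == (1 << 31):  # mantissa rounded up to 1.0; renormalize
        multiplier //= 2
        exponent += 1
    shift = 31 - exponent  # total right shift; assumes 1 <= shift <= 62
    prod = (data.astype(np.int64) - in_zp) * multiplier
    # Add half of the divisor before the arithmetic shift: ties round toward +inf.
    rounded = (prod + (1 << (shift - 1))) >> shift
    return np.clip(rounded + out_zp, -128, 127).astype(np.int8)


data = np.array([-1000, -3, 0, 7, 1234, 5000], dtype=np.int32)
print(requantize_float(data, 0.05, 0, 0.5, 0))
print(requantize_fixed_point(data, 0.05, 0, 0.5, 0))
```

The float version maps directly onto vector float multiply and round instructions, while the fixed-point version needs 64-bit integer multiplies and shifts, which is one plausible reason the float32 flow can be faster on x86 targets with SSE4.1.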
Performance with "llvm -mcpu=core-avx2" target:

Performance with "llvm" target:

Accuracy with "llvm -mcpu=core-avx2" target:
Accuracy with "llvm" target:
Additional changes: