Optimize multisample resolve with SSE2 instructions
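
For readers unfamiliar with the technique, here is a minimal sketch of how a 4x multisample resolve of 8-bit RGBA data could use SSE2 byte averaging. It is an illustration only, not the code from this change: the function name `resolve4xRGBA8`, the layout of four separate sample planes, and the scalar fallback are all assumptions. Note that `_mm_avg_epu8` rounds up, so a tree of pairwise averages can differ slightly from an exact four-way mean.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

// Rounded-up average of two bytes, matching the per-byte behavior of _mm_avg_epu8.
static inline uint8_t avgByte(uint8_t a, uint8_t b)
{
    return static_cast<uint8_t>((a + b + 1) >> 1);
}

// Hypothetical helper: averages four sample planes (s0..s3) into dst.
// All buffers are assumed to hold byteCount bytes of 8-bit channel data.
void resolve4xRGBA8(uint8_t *dst,
                    const uint8_t *s0, const uint8_t *s1,
                    const uint8_t *s2, const uint8_t *s3,
                    size_t byteCount)
{
    size_t i = 0;

#ifdef SKETCH_HAVE_SSE2
    // Process 16 bytes (four RGBA8 pixels) per iteration.
    for(; i + 16 <= byteCount; i += 16)
    {
        __m128i c0 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(s0 + i));
        __m128i c1 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(s1 + i));
        __m128i c2 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(s2 + i));
        __m128i c3 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(s3 + i));

        c0 = _mm_avg_epu8(c0, c1);  // average samples 0 and 1
        c2 = _mm_avg_epu8(c2, c3);  // average samples 2 and 3
        c0 = _mm_avg_epu8(c0, c2);  // average the two partial results

        _mm_storeu_si128(reinterpret_cast<__m128i *>(dst + i), c0);
    }
#endif

    // Scalar path for the remaining bytes (or for everything without SSE2).
    for(; i < byteCount; i++)
    {
        dst[i] = avgByte(avgByte(s0[i], s1[i]), avgByte(s2[i], s3[i]));
    }
}
```

The scalar loop uses the same rounding as the vector path, so both produce identical results for any buffer length.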

Benchmark results:

Run on (48 X 2594 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x24)
  L1 Instruction 32 KiB (x24)
  L2 Unified 256 KiB (x24)
  L3 Unified 30720 KiB (x2)
---------------------------------------------------------------
Benchmark                     Time           CPU     Iterations
---------------------------------------------------------------
(LLVM, before)
Triangle/Hello            0.845 ms      0.439 ms           1673
Triangle/Multisample       6.95 ms      0.781 ms           1000
(LLVM, after)
Triangle/Hello            0.861 ms      0.450 ms           1493
Triangle/Multisample       4.03 ms      0.753 ms            747
(Subzero, before)
Triangle/Hello             1.19 ms      0.474 ms           1120
Triangle/Multisample       11.8 ms      0.920 ms            747
(Subzero, after)
Triangle/Hello            0.907 ms      0.486 ms           1673
Triangle/Multisample       4.62 ms      0.781 ms           1000
Bug: b/147802090
Change-Id: Iea8498f2b745c86cf578db5c0f7ef2329b73c736
Reviewed-on: https://swiftshader-review.googlesource.com/c/SwiftShader/+/47970
Presubmit-Ready: Nicolas Capens <nicolascapens@google.com>
Tested-by: Nicolas Capens <nicolascapens@google.com>
Reviewed-by: Alexis Hétu <sugoi@google.com>
Kokoro-Result: kokoro <noreply+kokoro@google.com>