Optimize transcendentals for Subzero
With this change, we can now select implementations of most
transcendentals from either the "emulated" or "optimal"
namespaces. The emulated versions generally call the math.h
standard function on each component for vector types, while
the optimal versions typically implement some approximation
in Reactor to produce vectorized code.
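As a rough illustration of the emulated/optimal split described above, here is a minimal standalone C++ sketch. It does not use Reactor: `Vec4`, the `selected` namespace, and the parabolic sine approximation are assumptions chosen for illustration, not code from this change.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for a 4-wide float vector; this is not Reactor's type.
using Vec4 = std::array<float, 4>;

namespace emulated {
// Calls the standard library function once per component.
inline Vec4 Sin(const Vec4 &x)
{
    Vec4 r{};
    for(std::size_t i = 0; i < x.size(); i++) { r[i] = std::sin(x[i]); }
    return r;
}
}  // namespace emulated

namespace optimal {
// Parabolic sine approximation (illustrative only, not the commit's code).
// Each step is branch-free and per-lane, so it maps naturally to SIMD.
inline Vec4 Sin(const Vec4 &x)
{
    Vec4 r{};
    for(std::size_t i = 0; i < x.size(); i++)
    {
        float y = x[i] * 0.15915494f;                   // x / (2*pi)
        y = y - std::round(y);                          // wrap to [-0.5, 0.5]
        float s = 8.0f * y - 16.0f * y * std::fabs(y);  // parabola through sine's zeros and peaks
        r[i] = 0.225f * (s * std::fabs(s) - s) + s;     // one refinement step
    }
    return r;
}
}  // namespace optimal

// Hypothetical selection point: change the namespace to switch implementation.
namespace selected {
using optimal::Sin;
}
```

Calling `selected::Sin` on a `Vec4` then goes through the approximation, while `emulated::Sin` stays available as a reference to compare against.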
Most of the optimal versions were taken directly from ShaderCore.cpp,
except for Asin, for which I implemented an 8-term approximation. The
new versions are faster and pass all dEQP precision tests.
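For readers unfamiliar with this kind of approximation, the sketch below shows one possible 8-term arcsine polynomial. The coefficients are the ones tabulated in Abramowitz & Stegun 4.4.46, which is an assumption: the message does not say which coefficients the change uses, and `AsinApprox8` is a hypothetical name.

```cpp
#include <cmath>

// Sketch only: asin(x) ~= pi/2 - sqrt(1 - |x|) * P(|x|), with P an 8-term
// polynomial (Abramowitz & Stegun 4.4.46). These coefficients may differ
// from the ones used in the actual change.
// Inputs outside [-1, 1] produce NaN, since sqrt receives a negative value.
inline float AsinApprox8(float x)
{
    const float halfPi = 1.57079632679f;
    float ax = std::fabs(x);

    // Horner evaluation, highest-order coefficient first.
    float p = -0.0012624911f;
    p = p * ax + 0.0066700901f;
    p = p * ax - 0.0170881256f;
    p = p * ax + 0.0308918810f;
    p = p * ax - 0.0501743046f;
    p = p * ax + 0.0889789874f;
    p = p * ax - 0.2145988016f;
    p = p * ax + 1.5707963050f;

    float r = halfPi - std::sqrt(1.0f - ax) * p;
    return std::copysign(r, x);  // asin is odd, so restore the sign of x
}
```

The function has no branches, so the same sequence of multiplies and adds can be applied to every lane of a vector at once.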
The table below shows benchmark timings before and after this change.
Note that Asin and Acos now take a Precision parameter for Full and
Relaxed precision:
Benchmark           Before     After
rr_Sin              48.6 ns    10.6 ns
rr_Cos              67.1 ns    9.62 ns
rr_Tan              75.5 ns    19.4 ns
rr_Asin_fullp       24.2 ns    23.0 ns
rr_Asin_relaxedp    N/A        9.31 ns
rr_Acos_fullp       14.3 ns    6.35 ns
rr_Acos_relaxedp    N/A        4.56 ns
rr_Atan             66.8 ns    12.9 ns
rr_Sinh             79.7 ns    11.5 ns
rr_Cosh             80.1 ns    11.5 ns
rr_Tanh             62.9 ns    12.1 ns
rr_Asinh            104 ns     9.44 ns
rr_Acosh            14.4 ns    10.2 ns
rr_Atanh            170 ns     9.81 ns
rr_Atan2            73.5 ns    22.8 ns
rr_Pow              87.9 ns    16.3 ns
rr_Exp              40.2 ns    5.72 ns
rr_Log              44.0 ns    7.35 ns
rr_Exp2             101 ns     5.38 ns
rr_Log2             106 ns     9.24 ns
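The Full/Relaxed split for Asin and Acos could look roughly like the sketch below. Everything here is an assumption made for illustration: the `Precision` enum and `AcosSketch` are hypothetical names, the relaxed path uses the 4-term polynomial from Abramowitz & Stegun 4.4.45, and the full path simply defers to `std::acos` as a stand-in for a higher-order approximation.

```cpp
#include <cmath>

// Hypothetical enum mirroring the Full/Relaxed split named in the message.
enum class Precision { Full, Relaxed };

// Sketch only: a precision parameter choosing between a reference result
// and a cheaper, coarser polynomial.
inline float AcosSketch(float x, Precision p)
{
    if(p == Precision::Full)
    {
        return std::acos(x);  // stand-in for a higher-order approximation
    }

    const float pi = 3.14159265359f;
    float ax = std::fabs(x);

    // 4-term polynomial (Abramowitz & Stegun 4.4.45), Horner form.
    float poly = -0.0187293f;
    poly = poly * ax + 0.0742610f;
    poly = poly * ax - 0.2121144f;
    poly = poly * ax + 1.5707288f;

    float r = std::sqrt(1.0f - ax) * poly;  // approximates acos(|x|)
    return x < 0.0f ? pi - r : r;           // acos(-x) = pi - acos(x)
}
```

A caller would pass `Precision::Relaxed` where a reduced-precision result is acceptable and `Precision::Full` otherwise, which is the trade-off the fullp and relaxedp rows in the table reflect.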
Bug: b/147818976
Change-Id: I791893bd9f005dbbae4770fb474de338a04845be
Reviewed-on: https://swiftshader-review.googlesource.com/c/SwiftShader/+/48588
Kokoro-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Nicolas Capens <nicolascapens@google.com>
Tested-by: Antonio Maiorano <amaiorano@google.com>
src/Reactor/OptimalIntrinsics.cpp 0 → 100644
src/Reactor/OptimalIntrinsics.hpp 0 → 100644