Commits · 83b8036b4e0fb45bfc0bb7e237279dce57bea42e · Chen Yisong / swiftshader

16 Jul, 2014 2 commits

Lower casting operations that involve vector types. · 83b8036b

authored Jul 16, 2014

Impacted instructions:

bitcast {v4f32, v4i32, v8i16, v16i8} <-> {v4f32, v4i32, v8i16, v16i8}
bitcast v8i1 <-> i8
bitcast v16i1 <-> i16

(There was already code present to handle trivial bitcasts like v16i1 <-> v16i1.)

[sz]ext v4i1 -> v4i32
[sz]ext v8i1 -> v8i16
[sz]ext v16i1 -> v16i8

trunc v4i32 -> v4i1
trunc v8i16 -> v8i1
trunc v16i8 -> v16i1

[su]itofp v4i32 -> v4f32
fpto[su]i v4f32 -> v4i32

Where there is a relatively simple lowering to x86 instructions, it has been used. Otherwise a helper call is used.

Some lowerings require materializing an integer vector with 1s in each entry. Since there is no support for vector constant pools, the constant is materialized purely through register operations.
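As a rough illustration of building such a constant from register operations only, the sketch below models one XMM register as four 32-bit lanes in portable C++. The instruction sequence (pxor, pcmpeqd, psubd) is an assumption about one workable approach, not a transcript of the code in this change, and the type and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model type: four 32-bit lanes standing in for one XMM register.
using V4i32 = std::array<std::uint32_t, 4>;

// Models `pxor reg, reg`: every lane becomes 0.
V4i32 makeZeros() { return V4i32{0, 0, 0, 0}; }

// Models `pcmpeqd reg, reg`: a register always equals itself, so every lane
// becomes all-ones (0xFFFFFFFF, i.e. -1 as a signed value).
V4i32 makeMinusOnes() {
  V4i32 Result{};
  Result.fill(0xFFFFFFFFu);
  return Result;
}

// Models `psubd`: lane-wise subtraction with wraparound.
V4i32 subLanes(const V4i32 &A, const V4i32 &B) {
  V4i32 Result{};
  for (std::size_t I = 0; I < Result.size(); ++I)
    Result[I] = A[I] - B[I];
  return Result;
}

// 0 - (-1) wraps to 1 in every lane, so no memory constant is needed.
V4i32 makeOnes() {
  V4i32 Ones = subLanes(makeZeros(), makeMinusOnes());
  for (std::uint32_t Lane : Ones)
    assert(Lane == 1 && "each lane should hold the value 1");
  return Ones;
}
```

A vector of ones built this way could then serve as a mask, for example to keep only the low bit of each lane when zero-extending a vector of i1 values.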

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/383303003

83b8036b

Lower bitmanip intrinsics, assuming absence of BMI/SSE4.2 for now. · e4da26f6

authored Jul 15, 2014

We'll need the fallbacks in any case. However, once we've
decided on how to specify the CPU features of the user
machine we can use the nicer LZCNT/TZCNT/POPCNT as well.

Adds cmov, bsf, and bsr instructions.

Calls a popcount helper function for machines without SSE4.2.

Not handling bswap yet (which can also take i16 params).
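The fallback idea can be sketched in portable C++. The functions below model bsr and bsf with loops, use a conditional select where the lowering would presumably use cmov to cover the zero input, and show a generic branch-free popcount. How the commit actually combines the instructions, and what its popcount helper looks like, are not shown here, so all names and details are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Models x86 `bsr`: index of the highest set bit. The instruction leaves its
// destination undefined for a zero input, so callers must handle zero.
unsigned bitScanReverse(std::uint32_t X) {
  assert(X != 0 && "bsr result is undefined for 0");
  unsigned Index = 0;
  while (X >>= 1)
    ++Index;
  return Index;
}

// Models x86 `bsf`: index of the lowest set bit, also undefined for zero.
unsigned bitScanForward(std::uint32_t X) {
  assert(X != 0 && "bsf result is undefined for 0");
  unsigned Index = 0;
  while ((X & 1u) == 0) {
    X >>= 1;
    ++Index;
  }
  return Index;
}

// ctlz fallback: the select on zero plays the role a cmov would play.
unsigned countLeadingZeros(std::uint32_t X) {
  return X == 0 ? 32u : 31u - bitScanReverse(X);
}

// cttz fallback, with the same zero handling.
unsigned countTrailingZeros(std::uint32_t X) {
  return X == 0 ? 32u : bitScanForward(X);
}

// Generic popcount for machines without POPCNT: sum bits in 2-, 4-, and
// 8-bit groups, then add the four byte counts with one multiply.
unsigned popCount(std::uint32_t X) {
  X = X - ((X >> 1) & 0x55555555u);
  X = (X & 0x33333333u) + ((X >> 2) & 0x33333333u);
  X = (X + (X >> 4)) & 0x0F0F0F0Fu;
  return (X * 0x01010101u) >> 24;
}
```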

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org, wala@chromium.org

Review URL: https://codereview.chromium.org/390443005

e4da26f6

15 Jul, 2014 2 commits

Various improvements related to legalization code. · ad8f7265

authored Jul 14, 2014

1) In makeHelperCall(), function pointers that are created should have
type IceType_i32, not the functions' own return type.

2) In legalize(), change the name of WillHaveRegister to
MustHaveRegister. Add a comment to clarify the condition being computed.

3) In legalize(), add an assert to make sure that vector "constants"
don't get legalized (other than undef). There should be no constants of
vector type.

4) In copyToReg(), replace an unnecessary use of Src->getType().

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/385133006

ad8f7265

Fix floating point vector frem lowering. · 0ecabc82

authored Jul 14, 2014

The frem operation takes two arguments.
Pass both Src0 and Src1 to __frem_v4f32.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/387153002

0ecabc82

14 Jul, 2014 2 commits

Remove memcpy test workaround for name mangling substitutions. · 140bb0d8

authored Jul 14, 2014

Now that the name mangling is a bit smarter (from commit:
217dc082), we don't need to
avoid having the same type twice in the function signature.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/389683003

140bb0d8

Subzero: lower the rest of the atomic operations. · a3a01a2f

authored Jul 14, 2014

64-bit ops are expanded via a cmpxchg8b loop.

64/32-bit and/or/xor are also expanded into a cmpxchg /
cmpxchg8b loop.
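The shape of such a loop can be shown at the source level with std::atomic, whose compare-exchange typically compiles to lock cmpxchg (or cmpxchg8b for 64-bit values on 32-bit x86). This is only a sketch of the retry pattern; atomicRMW and exampleAtomicAnd are hypothetical names, not functions from this change.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Generic read-modify-write built from a compare-exchange loop.
// Returns the value that was stored before the update.
template <typename T, typename Op>
T atomicRMW(std::atomic<T> &Target, T Operand, Op Combine) {
  T Old = Target.load();
  // On failure, compare_exchange_weak reloads Old with the current value,
  // so each retry recomputes the new value from fresh data.
  while (!Target.compare_exchange_weak(Old, Combine(Old, Operand))) {
  }
  return Old;
}

// Usage example: a 64-bit atomic AND expressed through the loop above.
void exampleAtomicAnd() {
  std::atomic<std::uint64_t> Flags{0xFF00FF00FF00FF00ULL};
  std::uint64_t Mask = 0x0F0F0F0F0F0F0F0FULL;
  std::uint64_t Old = atomicRMW(
      Flags, Mask, [](std::uint64_t A, std::uint64_t B) { return A & B; });
  assert(Old == 0xFF00FF00FF00FF00ULL);
  assert(Flags.load() == 0x0F000F000F000F00ULL);
}
```

The same helper covers or and xor by swapping the lambda, which mirrors why and/or/xor all end up sharing one loop shape.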

Add a cross test for atomic RMW operations and
compare and swap.

Misc: Test that atomic.is.lock.free can be optimized out if result is ignored.

TODO:
* optimize compare and swap with compare+branch further down
instruction stream.

* optimize atomic RMW when the return value is ignored
(adds a locked field to binary ops though).

* We may want to do some actual target-dependent basic
block splitting + expansion (the instructions inserted by
the expansion must reference the pre-colored registers,
etc.). Otherwise, we are currently getting by with modeling
the extended liveness of the variables used in the loops
using fake uses.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=jfb@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/362463002

a3a01a2f

11 Jul, 2014 4 commits

Lower vector floating point arithmetic operations. · 8d1072e7

authored Jul 11, 2014

This adds lowering code for fadd, fsub, fmul, fdiv, and frem. frem, having no native x86 counterpart, is implemented by making a helper call.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/389653002

8d1072e7

Subzero: Fix the name mangling code's base-36 increment. · 78b4c0b8

authored Jul 11, 2014

SZZZ_ was being incremented to S0000_ instead of S1000_.
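A minimal sketch of a base-36 increment with carry, written independently of the actual Subzero code (incrementBase36 is a hypothetical name). When every digit carries, the number grows by one digit and the new leading digit must be 1, which is the case the message describes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Increments a base-36 number written with digits 0-9 and A-Z, most
// significant digit first. An empty string stands for the id-less "S_"
// form, whose successor is "0" (i.e. "S0_").
std::string incrementBase36(std::string Digits) {
  for (std::size_t I = Digits.size(); I-- > 0;) {
    char &C = Digits[I];
    assert(((C >= '0' && C <= '9') || (C >= 'A' && C <= 'Z')) &&
           "not a base-36 digit");
    if (C == 'Z') {
      C = '0'; // Wrap this digit and carry into the next one.
      continue;
    }
    C = (C == '9') ? 'A' : static_cast<char>(C + 1);
    return Digits; // No carry left to propagate.
  }
  // Every digit carried (or there were none): prepend the new leading digit.
  return Digits.empty() ? std::string("0") : "1" + Digits;
}
```

With this, the digits of SZZZ_ become 1000, giving S1000_.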

BUG= https://codereview.chromium.org/385273002/
R=wala@chromium.org

Review URL: https://codereview.chromium.org/390533002

78b4c0b8

Subzero: Deal with substitutions in the primitive remangler. · 217dc082

authored Jul 11, 2014

https://refspecs.linuxbase.org/cxxabi-1.75.html#mangling-compression
describes the mechanism for compressing mangled strings by using substitutions of the form S[0-9A-Z]*_ to represent repeated components.

When the prefix is handled as wrapping inside a namespace, the base-36 substitution numbers all have to be incremented.

This is implemented in a very simple way by scanning the string only for instances of the substitution pattern.

Unfortunately, false matches are possible because the S[0-9A-Z]*_ pattern can be a substring of the type name, or can span other components of the mangled name. Getting this completely right would essentially require a full demangling parser - see the ~4000 lines of code in cxa_demangle.cpp and ItaniumMangle.cpp.

Since this is just for testing, any false matches will likely cause a linking error and the test can be rewritten to avoid false matches.
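The pattern scan, and the kind of false match described above, can be illustrated with std::regex. This is a sketch rather than the remangler's own code: findSubstitutions is a hypothetical helper and the mangled names in the example are made up for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Returns every substring of Mangled that looks like a substitution,
// i.e. matches S[0-9A-Z]*_, in order of appearance.
std::vector<std::string> findSubstitutions(const std::string &Mangled) {
  static const std::regex Pattern("S[0-9A-Z]*_");
  std::vector<std::string> Found;
  for (std::sregex_iterator It(Mangled.begin(), Mangled.end(), Pattern), End;
       It != End; ++It)
    Found.push_back(It->str());
  return Found;
}

// Usage example: one name with genuine substitutions, one with a false match.
void exampleScan() {
  // Two real substitutions follow the parameter type P3Bar.
  std::vector<std::string> Real = findSubstitutions("_Z3fooP3BarS0_S_");
  assert(Real.size() == 2 && Real[0] == "S0_" && Real[1] == "S_");

  // Here "SA_" is just a three-character function name, yet it matches.
  std::vector<std::string> False = findSubstitutions("_Z3SA_v");
  assert(False.size() == 1 && False[0] == "SA_");
}
```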

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/385273002

217dc082

Clean up exit status and globals processing in llvm2ice. · b164d208

authored Jul 11, 2014

Makes IceTranslator.ExitStatus a boolean (rather than int), and changes
code to check flag when done. Fixes bug introduced in
https://codereview.chromium.org/387023002.

Also cleans up the (Ice) Converter class to handle globals processing,
rather than doing it in llvm2ice.cpp.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3894
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/387023002

b164d208

10 Jul, 2014 1 commit

Subzero: Fix a regalloc bug involving too-aggressive AllowRegisterOverlap. · ca662e9d

authored Jul 10, 2014

See the BUG description for more details.  In short, the register allocator
was inappropriately honoring AllowRegisterOverlap even when the variable's
live range overlaps with an Unhandled variable precolored to the preferred
register.

Also changes legalize() logic to recognize when a variable is guaranteed
to ultimately have a physical register due to infinite weight, and not
create a new temporary in those cases.

Finally, dumps RegisterPreference and AllowRegisterOverlap info for
Variables for improved diagnostics.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3897
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/380363002

ca662e9d

09 Jul, 2014 4 commits

Subzero: Add "make format-diff" target. · 240e0f8a

authored Jul 09, 2014

This invokes clang-format-diff.py so you can easily reformat just
the code you touched.

(Caution, this may not apply to new files.)

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/372133002

240e0f8a

Add support for passing and returning vectors in accordance with the x86 calling convention. · 45a06236

authored Jul 09, 2014

- Add TargetLowering::lowerArguments() as a new stage in TargetLowering.
- Add support for passing arguments/return values in XMM registers in the x86 target.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/372113005

45a06236

Add scalar lowering for sqrt intrinsic. · f37fbbe9

authored Jul 09, 2014

Re-used test_arith_main.cpp, mostly to share the set of interesting
floating point constants.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org, wala@chromium.org

Review URL: https://codereview.chromium.org/384443003

f37fbbe9

Avoid assigning esp (or ebp for framepointer-using frames) in Om1. · 9559899d

authored Jul 09, 2014

For ebp, exclude as needed. For esp, don't mark it as
an int register.

Not sure exactly how to do a targeted test for this Om1
register allocator. The Om1 regalloc seems to start w/ a
fresh whitelist after each instruction, so it may assign
the same register (e.g., eax), as an earlier instruction.
Without pre-colored registers, I'm not sure how to force it
to allocate something other than the first few registers.
I do have a test case that has a ton of pre-colored
registers, (e.g., cmpxchg8b), but that is a different CL:
https://codereview.chromium.org/362463002/

Encountered for:
BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/369573005

9559899d

08 Jul, 2014 1 commit

Subzero: Temporary fix for build error. · e169e66d

authored Jul 08, 2014

The compile error was introduced in https://codereview.chromium.org/361733002/ .

BUG= none
R=wala@chromium.org

Review URL: https://codereview.chromium.org/376923003

e169e66d

07 Jul, 2014 2 commits

Add support for vector types. · 928f1297

authored Jul 07, 2014

- Add vector types to the type table.

- Add support for parsing vector types in llvm2ice.

- Legalize undef vector values to zero. Test that undef vector values are lowered correctly.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/353553004

928f1297

Update Subzero to start parsing PNaCl bitcode files. · 8d7abae9

authored Jul 07, 2014

This patch only handles global addresses in PNaCl bitcode files.
Function blocks are still not parsed. Also, factors out a common API
for translation, so that generated ICE can always be translated using
the same code.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3892
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/361733002

8d7abae9

29 Jun, 2014 1 commit

Subzero: Partial implementation of global initializers. · de4ca71e

authored Jun 29, 2014

This is still missing a couple things:

1. It only supports flat arrays and zeroinitializers. Arrays of structs are not yet supported.

2. Initializers can't yet contain relocatables, e.g. the address of another global.

Some changes are made to work around an llvm-mc assembler bug. When assembling using intel syntax, llvm-mc doesn't correctly parse symbolic constants or add relocation entries in some circumstances. Call instructions work, and use in a memory operand works, e.g. mov eax, [ArrayBase+4*ecx]. To work around this, we adjust legalize() to not allow ConstantRelocatable by default, except for memory operands and when called from lowerCall(), so the relocatable ends up being the source operand of a mov instruction. Then, the mov emit routine actually emits an lea instruction for such moves.

A few lit tests needed to be adjusted to make szdiff work properly with respect to global initializers.

In the new cross test, the driver calls test code that returns a pointer to an array with a global initializer, and the driver compares the arrays returned by llc and Subzero.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/358013003

de4ca71e

27 Jun, 2014 1 commit

Refactor llvm2ice so that Ice can be built while reading bitcode. · e1e013cf

Karl Schimpf authored Jun 27, 2014

BUG=None
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/350933002

e1e013cf

26 Jun, 2014 1 commit

Subzero: Add 'not' to the list of LLVM commands in lit.cfg. · cc27a53a

authored Jun 26, 2014

Without this being in the command substitutions list, lit will rely on the 'not' command being in $PATH.

The substitution code is adapted from llvm/test/lit.cfg to add word-break regexps to the list.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/344063004

cc27a53a

25 Jun, 2014 1 commit

Add atomic load/store, fetch_add, fence, and is-lock-free lowering. · 5cd240df

authored Jun 25, 2014

Loads/stores w/ type i8, i16, and i32 are converted to
plain load/store instructions and lowered w/ the plain
lowerLoad/lowerStore.  Atomic stores are followed by an mfence
for sequential consistency.

For 64-bit types, use movq to do 64-bit memory
loads/stores (vs the usual load/store being broken into
separate 32-bit load/stores). This means bitcasting the
i64 -> f64, first (which splits the load of the value to be
stored into two 32-bit ops) then stores in a single op. For
load, load into f64 then bitcast back to i64 (which splits
after the atomic load). This follows what GCC does for
C++11 std::atomic<uint64_t> load/store methods (uses movq
when -mfpmath=sse). This introduces some redundancy between
movq and movsd, but the convention seems to be to use movq
when working with integer quantities. Otherwise, movsd
could work too. The difference seems to be in whether or
not the XMM register's upper 64-bits are filled with 0 or
not. Zero-extending could help avoid partial register
stalls.
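A source-level analogue of the i64 -> f64 -> i64 bitcast round trip is sketched below. It only reinterprets bytes with std::memcpy and never performs floating point arithmetic, which is the property the movq-based sequence relies on. The function name is hypothetical, and this check does not exercise the actual lowering.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "f64 and i64 must have the same width");

// Copies 64 integer bits into a double and back without interpreting them.
std::uint64_t roundTripThroughF64(std::uint64_t Bits) {
  double AsF64;
  std::memcpy(&AsF64, &Bits, sizeof AsF64); // i64 -> f64 bitcast
  std::uint64_t Back;
  std::memcpy(&Back, &AsF64, sizeof Back); // f64 -> i64 bitcast
  return Back;
}

// Usage example: bit patterns that would decode as NaNs must survive intact.
void exampleNaNPatterns() {
  assert(roundTripThroughF64(0x7FF0000000000001ULL) == 0x7FF0000000000001ULL);
  assert(roundTripThroughF64(0x7FF8000000000000ULL) == 0x7FF8000000000000ULL);
}
```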

Handle up to i32 fetch_add. TODO: add i64 via a cmpxchg loop.

TODO: add some runnable crosstests to make sure that this
doesn't do funny things to integer bit patterns that happen
to look like signaling NaNs and quiet NaNs. However, the system
clang would not know how to handle "llvm.nacl.*" if we choose to
target that level directly via .ll files. Or, (a) we use old-school __sync
methods (sync_fetch_and_add w/ 0 to load) or (b) require buildbot's
clang/gcc to support c++11...

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/342763004

5cd240df

24 Jun, 2014 1 commit

Bitcast of 64-bit immediates may need to split the immediate, not a var. · 1ee34165

authored Jun 24, 2014

Currently, the integer immediate is legalized to a
64-bit integer register first, and then the lower/upper
parts of that register are used for the bitcast.
However, mov(64_bit_reg, imm) done by the legalization
isn't legal.

Similarly, trunc of 64-bit immediates need to take the
lower half of the immediate, not legalize to a var first.
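Splitting the immediate itself is plain integer arithmetic, as in the sketch below. The struct and function names are hypothetical and only illustrate the idea of producing two 32-bit immediates directly, with no 64-bit register in between.

```cpp
#include <cstdint>

// The two 32-bit halves of a 64-bit immediate.
struct SplitImmediate {
  std::uint32_t Lo;
  std::uint32_t Hi;
};

// Splits a 64-bit immediate into halves that can each be used as an
// ordinary 32-bit immediate operand.
constexpr SplitImmediate splitImmediate(std::uint64_t Imm) {
  return SplitImmediate{static_cast<std::uint32_t>(Imm),
                        static_cast<std::uint32_t>(Imm >> 32)};
}

// trunc i64 -> i32 of an immediate is just its lower half.
constexpr std::uint32_t truncImmediate(std::uint64_t Imm) {
  return splitImmediate(Imm).Lo;
}

static_assert(splitImmediate(0x1122334455667788ULL).Lo == 0x55667788u,
              "lower half");
static_assert(splitImmediate(0x1122334455667788ULL).Hi == 0x11223344u,
              "upper half");
```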

This shifts the legalization code around.

Other cases where immediates are illegal and legalized
are idiv/div, but for those cases 64-bit operands are
handled separately via a function call. The function
call code properly splits up immediate arguments.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/348373005

1ee34165

18 Jun, 2014 5 commits

Add a few Subzero intrinsics (not the atomic ones yet). · 3bd9f1af

authored Jun 18, 2014

Handle:
* mem{cpy,move,set} (without optimizations for known lengths)
* nacl.read.tp
* setjmp, longjmp
* trap

Mostly see if the dispatching/organization is okay.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/321993002

3bd9f1af

Add ss/sd suffix to InstX8632Store and legalize FP constants. · 5a13f456

authored Jun 18, 2014

InstX8632Store is essentially a "mov" and it would emit
a mov, but it did not add the ss/sd suffix based on the operand type.

Also, there are some cases where legalization would leave
two memory operands in the case that one of them
is a floating point immediate:

storeDoubleConst:
.LstoreDoubleConst$entry:
  mov     eax, dword ptr [esp+4]
  mov     qword ptr [eax], qword ptr [L$double$1]
  ret

BUG=none
R=stichnot@chromium.org, wala@chromium.org

Review URL: https://codereview.chromium.org/341683002

5a13f456

Use GlobalContext::getConstantZero() to get zero valued constants. · 43ff7ebe

Matt Wala authored Jun 18, 2014

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/344613002

43ff7ebe

Add support for undef values in ICE IR. Undef values represent an · d8f4a7de

authored Jun 18, 2014

arbitrary bit pattern and are lowered to a zero constant.

IceOperand.h: Introduce a new ConstantUndef subclass of
Constant. Add a getConstantZero() method.

IceGlobalContext.h / IceGlobalContext.cpp: Implement pooling for
ConstantUndefs.

IceTargetLoweringX8632.cpp: Legalize ConstantUndefs to constant
zeros.

llvm2ice.cpp: Translate LLVM Undefs into ConstantUndefs.

undef.ll: Test that undef values are recognized and legalized to
zero.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/339783002

d8f4a7de

Change some tests to be valid PNaCl IR (parameter type from i1 -> i32). · bdbe4023

authored Jun 17, 2014

Change the i1 zeroext parameter to an explicit zext and
i32. Add an assert in lowerCall that the type is at least
32-bits.

I ended up putting the assert in lowerCall instead of
InstX8632Push, since technically there are quite a few
modes that push allows: 16-bit reg/mem (just not 8-bit
reg/mem) and 8/16/32 bit constants.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/339933004

bdbe4023

17 Jun, 2014 2 commits

Fix subzero build for mac · 44712d15

authored Jun 17, 2014

The subzero mac build fails with errors like the following:
/Users/dschuff/code/nacl/native_client/toolchain_build/src/subzero/src/IceGlobalContext.cpp:116: error: ISO C++ forbids variable-size array 'NameBase'

Replace the variable-length array with llvm::SmallVector which will still
allow stack allocation most of the time.

R=stichnot@chromium.org
BUG=build subzero on the bots

Review URL: https://codereview.chromium.org/335343005

44712d15

Legalize div/idiv operands to avoid immediates. · 70d6883a

authored Jun 17, 2014

The div/idiv instruction operand must be a register or memory.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/339643003

70d6883a

12 Jun, 2014 2 commits

Ignore stack adjustment for ebp-based variables. · b0e142bd

authored Jun 12, 2014

The TargetX8632 class maintains a "current stack adjustment" during a push sequence, so that pushing or otherwise accessing stack locations during a function arg push sequence can use the right esp offset.

This adjustment should only be used for esp-based frames, but it was being used for ebp-based frames as well, causing the wrong stack-based arguments to be pushed.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3878
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/331743002

b0e142bd

Subzero: give crosstest .sz intermediate files names that depend on flags · 798b4155

authored Jun 12, 2014

Currently only the output has a unique name (supplied by the invocation),
but the intermediate files (.sz.s, .sz.o) can get overwritten (w/ different
optlevels, or targets).

Would be nice to keep them around for debugging. (bug may happen
for Om1 but not O2).

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/333713004

798b4155

06 Jun, 2014 1 commit

Make py import not assume dir is "pnacl-subzero". Avoid autovect in crosstest. · 1248a6d1

authored Jun 06, 2014

Derek's CL to check out subzero calls the source directory
"subzero", and the file header comments call the directory
"subzero". Just make the python sys.path munging for
importing pydir more generic.

Also change crosstest to not run the raw LLVM "opt" with
optimizations (only use it for ABI stabilization passes).
Instead run pnacl-clang with -O2. Otherwise, newer NACL_SDK
versions include a newer LLVM "opt" binary which
autovectorizes and may generate vector IR that is not
handled by Subzero yet.

E.g.,
LLVM ERROR: Invalid PNaCl instruction:   %1 = insertelement <4 x i32> undef, i32 %0, i32 0
w/ pepper_canary to version 37, revision 274873

BUG=none
TEST=make -f Makefile.standalone check
R=stichnot@chromium.org, wala@chromium.org

Review URL: https://codereview.chromium.org/317963002

1248a6d1

05 Jun, 2014 1 commit

Fix a C++ violation. · ab8242ca

authored Jun 05, 2014

Ice::Inst::NumberSentinel is defined within the Inst class definition:

class Inst {
  ...
  static const InstNumberT NumberDeleted = -1;
  static const InstNumberT NumberSentinel = 0;
  ...
};

Under some compilers/options, this causes a link error when passing NumberSentinel as a const T& argument.

(Another option would be to move the actual definitions into IceInst.cpp.)
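For background, the sketch below reproduces the situation in a single file and shows one standard remedy, an out-of-class definition for each constant. It is not necessarily the change made in this commit. The InstNumberT typedef is an assumption, and atLeastSentinel and exampleOdrUse are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

typedef std::int32_t InstNumberT; // Assumption: the real typedef is not shown.

class Inst {
public:
  // The in-class initializers let these be used as constants, but they are
  // only declarations: no storage exists for them yet.
  static const InstNumberT NumberDeleted = -1;
  static const InstNumberT NumberSentinel = 0;
};

// Out-of-class definitions (no initializer repeated) give the constants
// storage, so binding a reference to them links correctly.
const InstNumberT Inst::NumberDeleted;
const InstNumberT Inst::NumberSentinel;

// std::max takes its arguments as const T&, so this binds a reference to
// NumberSentinel and needs the definition above at link time.
InstNumberT atLeastSentinel(InstNumberT Number) {
  return std::max(Number, Inst::NumberSentinel);
}

// Usage example.
void exampleOdrUse() {
  assert(atLeastSentinel(Inst::NumberDeleted) == Inst::NumberSentinel);
  assert(atLeastSentinel(7) == 7);
}
```

Without the two definitions, whether the link fails depends on whether the compiler happens to fold the constant, which matches the "some compilers/options" behavior described above.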

BUG= none
R=jfb@chromium.org

Review URL: https://codereview.chromium.org/311243006

ab8242ca

04 Jun, 2014 1 commit

Subzero: Initial O2 lowering · d97c7df5

authored Jun 04, 2014

Includes the following:
1. Liveness analysis.
2. Linear-scan register allocation.
3. Address mode optimization.
4. Compare-branch fusing.

All of these depend on liveness analysis. There are three versions of liveness analysis (in order of increasing cost):
1. Lightweight. This computes last-uses for variables local to a single basic block.
2. Full. This computes last-uses for all variables based on global dataflow analysis.
3. Full live ranges. This computes all last-uses, plus calculates the live range intervals in terms of instruction numbers. (The live ranges are needed for register allocation.)

For testing the full live range computation, Cfg::validateLiveness() checks every Variable of every Inst and verifies that the current Inst is contained within the Variable's live range.
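The lightweight variant can be pictured as a single backward scan over one basic block. The sketch below uses a toy instruction type rather than Subzero's classes and assumes every variable is local to the block, so nothing is live on exit. ToyInst and computeLastUses are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy instruction: one destination variable and any number of sources.
struct ToyInst {
  std::string Dest;
  std::vector<std::string> Srcs;
};

// For each instruction, returns the source variables whose last use within
// the block is that instruction.
std::vector<std::set<std::string>>
computeLastUses(const std::vector<ToyInst> &Block) {
  std::vector<std::set<std::string>> LastUses(Block.size());
  std::set<std::string> Live; // Variables that are used later in the block.
  for (std::size_t I = Block.size(); I-- > 0;) {
    const ToyInst &Inst = Block[I];
    Live.erase(Inst.Dest); // The definition ends the variable's liveness.
    for (const std::string &Src : Inst.Srcs)
      if (Live.insert(Src).second) // Not live below, so this use is the last.
        LastUses[I].insert(Src);
  }
  return LastUses;
}

// Usage example: a = ...; b = a; c = a, b; d = c.
void exampleLastUses() {
  std::vector<ToyInst> Block = {
      {"a", {}}, {"b", {"a"}}, {"c", {"a", "b"}}, {"d", {"c"}}};
  std::vector<std::set<std::string>> Last = computeLastUses(Block);
  assert(Last[0].empty() && Last[1].empty());
  assert((Last[2] == std::set<std::string>{"a", "b"}));
  assert((Last[3] == std::set<std::string>{"c"}));
}
```

The full variants extend the same idea across blocks by seeding Live from each block's successors and iterating to a fixed point.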

The cross tests are run with O2 in addition to Om1.

Some of the lit tests (for what good they do) are updated with O2 code sequences.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/300563003

d97c7df5

02 Jun, 2014 1 commit

Add wala@chromium.org to owners list · 88a485ed

authored Jun 02, 2014

BUG= none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/305973005

88a485ed

23 May, 2014 2 commits

Fix g++ -pedantic warnings. · 4376d292

authored May 23, 2014

1. Comma-terminated enumerator lists.
2. Empty macro arguments.
3. Variable-length arrays.

The first issue is definitely hitting the Mac bots. The other two issues will quite possibly follow.

BUG= none
R=jfb@chromium.org

Review URL: https://codereview.chromium.org/296823013

4376d292

Fix x86 floating-point constant emission. · f61d5b22

authored May 23, 2014

Previously, the basis of constant pooling was implemented, but two things were lacking:

1. The constant pools were not being emitted in the asm file.

2. A direct FP value was emitted in an FP instruction, e.g. "addss xmm0, 1.0000e00". Curiously, at least for some FP constants, llvm-mc was accepting this syntax.

BUG= none
R=jfb@chromium.org

Review URL: https://codereview.chromium.org/291213003

f61d5b22

22 May, 2014 2 commits

Add Makefiles to support building along with LLVM · bc643135

authored May 22, 2014

This change now supports building subzero as part of the LLVM build (instead
of in a separate build step). It is modeled on clang's Makefiles.

The existing Makefile has been renamed and can still be used manually, e.g.
make -f Makefile.standalone

It does not yet support running tests, just building.

R=stichnot@chromium.org, jvoung@chromium.org
BUG=

Review URL: https://codereview.chromium.org/293983007

bc643135

Add Om1 lowering with no optimizations. · 5bc2b1d1

authored May 22, 2014

This adds infrastructure for low-level x86-32 instructions, and the target lowering patterns.

Practically no optimizations are performed. Optimizations to be introduced later include liveness analysis, dead-code elimination, global linear-scan register allocation, linear-scan based stack slot coalescing, and compare/branch fusing. One optimization that is present is simple coalescing of stack slots for variables that are only live within a single basic block.

There are also some fairly comprehensive cross tests. This testing infrastructure translates bitcode using both Subzero and llc, and a testing harness calls both versions with a variety of "interesting" inputs and compares the results. Specifically, Arithmetic, Icmp, Fcmp, and Cast instructions are tested this way, across all PNaCl primitive types.

BUG=
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/265703002

5bc2b1d1