Commits · 89d7956d7dcc007fa29bea5c64150602feab47ee · Chen Yisong / swiftshader

27 Aug, 2014 3 commits

Subzero: Fix address mode optimization involving phi temporaries. · 89d7956d

authored Aug 27, 2014

Also adds much-needed logging of the decision process that goes into the address mode optimization.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/490333003


Subzero: Fix the link command for Trusty. · 14c3f417

authored Aug 27, 2014

With the original link command, -lpthread comes before some other LLVM libraries, and this ends up causing undefined pthreads symbols. The new link command makes sure the -lpthread part comes last.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/514723004


Subzero: Fix some legalization issues involving immediates. · ef8cf0e0

authored Aug 26, 2014

Some lowering sequences were incorrectly allowing immediate operands in native instructions. This includes 32-bit icmp, 64-bit icmp, select, switch, and 64-bit mul.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/511543002


26 Aug, 2014 4 commits

Subzero: Add a check-lit target for faster smoke testing. · ac9c9439
Jim Stichnoth authored Aug 26, 2014
```
BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/507813002
```

Subzero: Fixes for Hello World and bisection debugging. · bfb03e57

authored Aug 26, 2014

Add the llvm2ice -sandbox option (false by default) to select between
native and sandboxed code generation.  Currently, it controls whether
the llvm.nacl.read.tp intrinsic is lowered to gs:[0x0] or a call to
__nacl_read_tp.

Change the asm output slightly for -ffunction-sections so that objdump
is more willing to provide a disassembly.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/504963002


Revert "COmmit" · 6170e757

authored Aug 26, 2014

This was committed as a test, not actually intended.

This reverts commit 420e8bf2.

BUG=
R=dschuff@chromium.org

Review URL: https://codereview.chromium.org/504073003


COmmit · 420e8bf2
Jim Stichnoth authored Aug 26, 2014
```
Patch from Jim Stichnoth <stichnot@chromium.org>.
```

18 Aug, 2014 1 commit

Subzero: Fix the simple register allocation for -Om1. · 4d79fe5b

authored Aug 18, 2014

Background: After lowering each high-level ICE instruction, Om1 calls
postLower() to do simple register allocation.  It only assigns
registers where absolutely necessary, specifically for infinite-weight
variables, while honoring pre-coloring decisions.  The original Om1
register allocation never tried to reuse registers within a lowered
sequence, which was generally OK except for very long lowering
sequences, such as call instructions or some intrinsics.  In these
cases, when it ran out of physical registers, it would just reset the
free list and hope for the best, but with no guarantee of correctness.

The fix involves keeping track of which instruction in the lowered
sequence holds the last use of each variable, and releasing each
register back to the free list after its last use.  This makes much
better use of registers.  It's not necessarily optimal, at least with
respect to pre-colored variables, since those registers are
black-listed even if they don't interfere with an infinite-weight
variable.
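The last-use bookkeeping described above can be sketched roughly as follows. This is a simplified model, not the actual postLower() code: the `Inst` type, `assignRegisters`, and the idea of representing an instruction as a bare set of variable ids are all assumptions made for illustration, and pre-colored variables are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical model: a lowered instruction is just the set of variable ids
// it references. Real Subzero instructions carry far more information.
using Inst = std::set<int>;

// Assigns one of NumRegs registers to every variable in Seq, returning each
// register to the free list right after the variable's last use.
std::map<int, int> assignRegisters(const std::vector<Inst> &Seq, int NumRegs) {
  // Pass 1: index of the last instruction that references each variable.
  std::map<int, std::size_t> LastUse;
  for (std::size_t I = 0; I < Seq.size(); ++I)
    for (int Var : Seq[I])
      LastUse[Var] = I;

  // Free list, filled so that pop_back() initially hands out register 0.
  std::vector<int> Free;
  for (int Reg = NumRegs - 1; Reg >= 0; --Reg)
    Free.push_back(Reg);

  // Pass 2: allocate on first reference, release after the last reference.
  std::map<int, int> RegOf;
  for (std::size_t I = 0; I < Seq.size(); ++I) {
    for (int Var : Seq[I]) {
      if (RegOf.count(Var))
        continue;
      assert(!Free.empty() && "lowered sequence needs more registers");
      RegOf[Var] = Free.back();
      Free.pop_back();
    }
    for (int Var : Seq[I])
      if (LastUse[Var] == I)
        Free.push_back(RegOf[Var]);
  }
  return RegOf;
}
```

In this model, a sequence touching variables {0,1}, {1,2}, {2,3} fits in two registers, whereas an allocator that never reuses registers would need four.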

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/483453002


15 Aug, 2014 2 commits

Subzero: Randomly insert nops. · c3302746

authored Aug 15, 2014

Adds command line options -nop-insertion, -nop-insertion-probability=X, and -max-nops-per-instruction=X.
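As a rough illustration of how a probability and a per-instruction cap might interact, here is a minimal sketch. It assumes each instruction gets up to the maximum number of independent chances to be preceded by a nop; that interpretation, the `insertNops` function, and the use of `std::mt19937` are assumptions for illustration and are not taken from the Subzero implementation.

```cpp
#include <cassert>
#include <random>
#include <string>
#include <vector>

// Hypothetical sketch: instructions are modeled as strings. Before each
// instruction, make MaxNopsPerInst independent draws and emit a nop for each
// draw that succeeds with the given probability.
std::vector<std::string> insertNops(const std::vector<std::string> &Insts,
                                    double Probability, int MaxNopsPerInst,
                                    unsigned Seed) {
  assert(Probability >= 0.0 && Probability <= 1.0);
  std::mt19937 Rng(Seed); // seeded so that a given seed reproduces the output
  std::bernoulli_distribution Flip(Probability);
  std::vector<std::string> Out;
  for (const std::string &I : Insts) {
    for (int N = 0; N < MaxNopsPerInst; ++N)
      if (Flip(Rng))
        Out.push_back("nop");
    Out.push_back(I);
  }
  return Out;
}
```

Seeding the generator explicitly keeps a diversified build reproducible, which matters when a randomly perturbed binary has to be debugged.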

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/463563006


Subzero: Start a list of SIMD improvement ideas. · 9dbe38e3

authored Aug 15, 2014

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/477773003


14 Aug, 2014 2 commits

Subzero: Align spill locations to natural alignment. · d4799f47

authored Aug 14, 2014

This requires sorting the spilled variables based on alignment and
introducing additional padding around the spill location areas.

These changes allow vector instructions to accept memory operands.

Old stack frame layout:  New stack frame layout:
+---------------------+  +---------------------+
| return address      |  | return address      |
+---------------------+  +---------------------+
| preserved registers |  | preserved registers |
+---------------------+  +---------------------+
| global spill area   |  | padding             |
+---------------------+  +---------------------+
| local spill area    |  | global spill area   |
+---------------------+  +---------------------+
| padding             |  | padding             |
+---------------------+  +---------------------+
| local variables     |  | local spill area    |
+---------------------+  +---------------------+
                         | padding             |
                         +---------------------+
                         | local variables     |
                         +---------------------+
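The sort-then-pad idea can be sketched as below. The `SpillSlot` struct and `layoutSpillArea` function are hypothetical names, and the sketch assumes the spill area itself starts at an address aligned to the largest slot alignment (the job of the padding blocks in the diagram above).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one spilled variable.
struct SpillSlot {
  std::uint32_t Size;   // bytes
  std::uint32_t Align;  // required alignment, a power of two
  std::uint32_t Offset; // filled in by layoutSpillArea()
};

// Rounds Value up to the next multiple of Align (a power of two).
std::uint32_t alignUp(std::uint32_t Value, std::uint32_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0);
  return (Value + Align - 1) & ~(Align - 1);
}

// Sorts slots by decreasing alignment, assigns offsets, and returns the
// total size of the area.
std::uint32_t layoutSpillArea(std::vector<SpillSlot> &Slots) {
  std::stable_sort(Slots.begin(), Slots.end(),
                   [](const SpillSlot &A, const SpillSlot &B) {
                     return A.Align > B.Align;
                   });
  std::uint32_t Offset = 0;
  for (SpillSlot &S : Slots) {
    Offset = alignUp(Offset, S.Align); // no-op when Size == Align throughout
    S.Offset = Offset;
    Offset += S.Size;
  }
  return Offset;
}
```

Placing the most strictly aligned slots first means that naturally aligned slots pack without internal gaps, so padding is only needed at the boundaries of each area.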

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/465413003


Emit .local before .comm for bss to make llvm-mc happy. · f820da5e

authored Aug 14, 2014

Otherwise llvm-mc asserts. This is also the order that llc emits the directives.
Change a couple of RUIN -> RUN in lit tests.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/469973002


13 Aug, 2014 1 commit

Convert lit test llvm-mc -arch arguments to full -triple. · c8e87812

authored Aug 13, 2014

Mostly to make them a bit more portable across OSes.
Otherwise llvm-mc assumes the build/host OS. On a Mac, llvm-mc therefore
assumes it is targeting Darwin and accepts only Mach-O assembler
directives. Sections like .rodata.cst8 are not accepted
(I'm guessing it uses .cstring, .literal4, etc. instead?).

Force an OS (NaCl) so that ELF-related assembler macros make sense.

Also remove a now unused function typeIdentString to make clang happy.

Example errors:
Command 5 Stderr:
<stdin>:5:2: error: unknown directive
        .type   fixed_400,@function
        ^
<stdin>:23:2: error: unknown directive
        .type   variable_n,@function
        ^
<stdin>:40:11: error: mach-o section specifier uses an unknown section type
        .section        .rodata.cst4,"aM",@progbits,4
                        ^
<stdin>:42:11: error: mach-o section specifier uses an unknown section type
        .section        .rodata.cst8,"aM",@progbits,8
                        ^

BUG=none
R=stichnot@chromium.org, wala@chromium.org

Review URL: https://codereview.chromium.org/467103004


12 Aug, 2014 4 commits

Subzero: Factor out commonalities between mov-like instructions. · e58178ab

authored Aug 12, 2014

Introduce a base class for mov, movq, and movp instruction classes.

BUG=none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/466733005


Subzero: Align the stack at the point of function calls. · 105b7044

authored Aug 11, 2014

Be compatible with the x86-32 calling convention by ensuring that the
stack is aligned to 16 bytes at the point of the call
instruction. Also ensure that vector arguments passed on the stack are
16-byte aligned.

Also, make alloca instructions respect alignment.
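One way to picture the two alignment requirements is to lay out the outgoing argument area. The sketch below is an illustration under stated assumptions, not the Subzero lowering: `ArgKind`, `ArgArea`, and `layoutStackArgs` are hypothetical, scalars are assumed to take 4 bytes, vectors 16 bytes, and esp is assumed to be 16-byte aligned before the area is reserved.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical argument classification for this sketch.
enum class ArgKind { Scalar32, Vector128 };

struct ArgArea {
  std::vector<std::uint32_t> Offsets; // offset of each argument within the area
  std::uint32_t Size;                 // bytes to reserve, a multiple of 16
};

ArgArea layoutStackArgs(const std::vector<ArgKind> &Args) {
  const std::uint32_t StackAlign = 16;
  ArgArea Area{{}, 0};
  std::uint32_t Offset = 0;
  for (ArgKind Kind : Args) {
    const bool IsVector = Kind == ArgKind::Vector128;
    if (IsVector) // vector arguments start on a 16-byte boundary
      Offset = (Offset + StackAlign - 1) / StackAlign * StackAlign;
    Area.Offsets.push_back(Offset);
    Offset += IsVector ? 16 : 4;
  }
  // Reserving a multiple of 16 keeps an aligned esp aligned at the call.
  Area.Size = (Offset + StackAlign - 1) / StackAlign * StackAlign;
  assert(Area.Size % StackAlign == 0);
  return Area;
}
```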

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/444443002


Subzero: address mode opt: Transform *(reg+const) into [reg+const]. · 8835b89b

authored Aug 11, 2014

Teach address mode optimization about Base=Base+Const,
Base=Const+Base, and Base=Base-Const patterns.

Change ConstantInteger::emit() to emit signed values.
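The Base=Base+Const family of patterns amounts to walking back through the definitions of the base variable and accumulating the constants into the displacement. The following is a minimal sketch of that idea; `Def`, `Address`, and `foldConstants` are hypothetical, definitions are assumed to be acyclic (as in SSA form), and a real implementation would also have to guard against displacement overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical record of "Var = Src + Delta". Base-Const is stored with a
// negative Delta, and Const+Base is normalized to the same form.
struct Def {
  int Src;
  std::int32_t Delta;
};

// Models the x86 operand [Base + Offset].
struct Address {
  int Base;
  std::int32_t Offset;
};

// Repeatedly replaces Base with the variable it was computed from, folding
// the constant into the displacement, until Base has no such definition.
Address foldConstants(Address Addr, const std::map<int, Def> &Defs) {
  for (auto It = Defs.find(Addr.Base); It != Defs.end();
       It = Defs.find(Addr.Base)) {
    Addr.Offset += It->second.Delta;
    Addr.Base = It->second.Src;
  }
  return Addr;
}
```

With v1 = v0 - 4 and v2 = v1 + 8, a load from *v2 folds to [v0 + 4], which is why the displacement has to be emitted as a signed value.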

BUG=none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/459133002


Subzero: Fix a debugging string in the test_icmp crosstest. · 89cbfb08

authored Aug 11, 2014

STR(inst) should be STR(cmp).

BUG=none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/466543002


08 Aug, 2014 3 commits

Subzero: Add a random number generator. · 1bd2fce4

authored Aug 08, 2014

This is initial work necessary for diversification support in Subzero.
The random number generator implementation is temporary.  It will
eventually use a cryptographically secure pseudorandom number
generator (perhaps from LLVM, if LLVM gets one).

Add the -rng-seed= option to seed the random number generator from
the command line.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/455593004


Subzero: Add the "llvm2ice -ffunction-sections" argument. · 989a703f

authored Aug 08, 2014

The purpose is to enable bisection debugging of Subzero-translated functions, using objcopy to selectively splice functions from llc and Subzero into the binary.

Note that llvm-mc claims to take this argument, but actually does nothing with it, so we need to implement it in Subzero.

Also moves the ClFlags object into the GlobalContext so everyone can access it.

BUG= none
R=wala@chromium.org

Review URL: https://codereview.chromium.org/455633002


Subzero: Make InstX8632Cbwdq a UnaryOp. · 51e8cfba

authored Aug 08, 2014

After the changes in CL 443203003, InstX8632Cbwdq fits the template for
a UnaryOp, so change it to be an instance of this class.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/452143003


07 Aug, 2014 2 commits

Subzero: Use scalar arithmetic when no vector instruction exists. · afeaee41

authored Aug 07, 2014

Implement scalarizeArithmetic() which extracts the components of the
input vectors, performs the operation with scalar instructions, and
builds the output vector component by component.
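The extract, operate, rebuild sequence can be modeled on plain arrays. This sketch only mirrors the shape of the transformation; the `scalarize` template is a hypothetical stand-in and does not reflect how scalarizeArithmetic() manipulates Subzero instructions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Applies a scalar operation lane by lane: extract element I of each input,
// combine the two scalars, and insert the result into lane I of the output.
template <typename T, std::size_t N, typename ScalarOp>
std::array<T, N> scalarize(const std::array<T, N> &A,
                           const std::array<T, N> &B, ScalarOp Op) {
  std::array<T, N> Result{};
  for (std::size_t I = 0; I < N; ++I)
    Result[I] = Op(A[I], B[I]);
  return Result;
}
```

Integer division is a typical candidate, since SSE has no packed integer divide: passing a lambda that computes `X / Y` yields the lane-wise quotient.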

Fix the lowering of sdiv and srem.  These previously emitted cdq for
i8 and i16 inputs, which need cbw and cwd respectively.

In the test_arith crosstest, mask the inputs to vector shift
operations to ensure that the shifts are in range.  Otherwise the
Subzero output is not identical to the llc output in some (undefined)
cases.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/443203003


Subzero: A few fixes toward running larger programs. · 206833c6

authored Aug 07, 2014

1. Add 'llvm2ice -disable-globals' to disable Subzero translation of
global initializers, since full support isn't yet implemented.

2. Change the names of intra-block branch target labels to avoid
collisions with basic block labels.

3. Fix lowering of "br i1 <constant>, label ...", which was producing
invalid instructions like "cmp 1, 0".

4. Fix the "make format-diff" operation, which was diffing against the wrong target.

BUG= none
R=wala@chromium.org

Review URL: https://codereview.chromium.org/449093002


05 Aug, 2014 1 commit

Subzero: Fix and clean up some cross tests. · 7da431b5

authored Aug 05, 2014

1. It turns out that the crosstest scripts mix different versions of
clang - build_pnacl_ir.py uses pnacl-clang from the NaCl SDK for the
tests, while crosstest.py uses clang/clang++ from LLVM_BIN_PATH for
the driver.  The SDK has been updated to use a different version of
the standard library, and now there is a mismatch as to whether int8_t
is typedef'd to 'char' or 'signed char', leading to name mangling
mismatches.  (char, signed char, and unsigned char are distinct
types.)  We deal with this by using myint8_t which is explicitly
defined as signed char.

2. Some ugly function pointer casting in test_arith_main.cpp is fixed/removed.

3. std::endl is replaced with "\n".

4. License text is added to tests that were touched by the above items.
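The type distinction behind item 1 can be checked directly. In this small sketch the `describe` overloads are hypothetical and exist only to show that the three character types select different overloads, which is also why they mangle differently (under the Itanium C++ ABI, `char` is encoded as `c` and `signed char` as `a`).

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// char, signed char, and unsigned char are three distinct types, regardless
// of whether plain char happens to be signed on the target.
static_assert(!std::is_same<char, signed char>::value,
              "char is not signed char");
static_assert(!std::is_same<char, unsigned char>::value,
              "char is not unsigned char");

// Pinning the typedef to signed char removes the dependence on how a given
// standard library defines int8_t.
typedef signed char myint8_t;

// Hypothetical overload set: each character type gets its own function.
std::string describe(char) { return "char"; }
std::string describe(signed char) { return "signed char"; }
std::string describe(unsigned char) { return "unsigned char"; }
```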

BUG= none
R=wala@chromium.org

Review URL: https://codereview.chromium.org/435353002


31 Jul, 2014 1 commit

Subzero: Fix some issues related to legalization and undef handling. · e377767c

authored Jul 31, 2014

1. Much of the lowering code for vector operations was not properly
checking that the input operand was in a register or memory. This
problem could be exhibited by passing undef values as inputs.

=> Change the vector legalization code to legalize input operands to
register or memory before producing instructions that use the
operands. Also, append a suffix to the variable names in the vector
legalization code to clarify the legalization status of the values.

2. Undef values should never be emitted directly. Rather, they should
have been appropriately legalized to a zero value.

=> To enforce this, make ConstantUndef::emit() issue an error
message. Do this in the x86 backend, as other backends may decide to
treat undef values differently.

3. The regalloc_evict_non_overlap test was loading from an undef
pointer. Subzero was not handling this correctly (the undef pointer was
being emitted without being legalized), but it does not have to handle
this case since PNaCl IR disallows undef pointers.

=> Fix the regalloc_evict_non_overlap test to use an inttoptr instead of
directly loading from the undef pointer. Also, add an assert in
IceTargetLoweringX8632::FormMemoryOperand() to make sure that undef
pointers are never encountered.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/432613002


30 Jul, 2014 6 commits

Subzero: Fix a signed/unsigned warning reported on the Mac. · 5acafbc0

authored Jul 30, 2014

Also cleans up some unneeded table size const static variables.

BUG= https://codereview.chromium.org/296053008/
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/428353002


Subzero: Try to fix warnings and errors in the Windows build. · 6e992147

authored Jul 30, 2014

Quiet some unused-variable warnings when their only use is in an assert().
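One common idiom for this situation is a cast to void, sketched below. The commit message does not say which technique was used, so treat this as an illustration only; `halveEven` is a hypothetical function.

```cpp
#include <cassert>

// With NDEBUG defined, assert() expands to nothing and IsEven would
// otherwise be reported as unused. The void cast counts as a use.
int halveEven(int Value) {
  const bool IsEven = (Value % 2) == 0;
  (void)IsEven;
  assert(IsEven && "halveEven expects an even value");
  return Value / 2;
}
```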

Forward-declare partial template specializations when the template method already has a default implementation, to avoid ODR violations and link errors.

BUG= https://codereview.chromium.org/296053008/
R=wala@chromium.org

Review URL: https://codereview.chromium.org/429993002


Add dtor to InstX8632Lockable. · 1e889586

authored Jul 30, 2014

Speculative fix for Mac GCC build.

BUG=none
R=dschuff@chromium.org

Review URL: https://codereview.chromium.org/432523002


Subzero: Add support for SSE4.1 instructions. · 0a450519

authored Jul 30, 2014

* Add initial support for code generation with SSE4.1 instructions. The
following operations are affected:
 - multiplication with v4i32
 - select
 - insertelement
 - extractelement

* Add appropriate lit checks for SSE4.1 instructions. Run the crosstests
in both SSE2 and SSE4.1 mode.

* Introduce the -mattr flag to llvm2ice to control which instruction set
gets used.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/427843002


Fix bug when atomic load is fused with an arith op (and not in the entry BB) · e6e497db

authored Jul 30, 2014

Normally, the FakeUse for preserving the atomic load ends
up on the load's Dest. However, for fused load+add, the load
is deleted, and its Dest is no longer defined. This trips
up the liveness analysis when it happens on a non-entry
block. So the FakeUse should be for the add's dest instead,
in that case.

We have no access to the add, so introduce a
getLastInserted() helper. A couple of ways to do that:
- modify insert() to track explicitly
- rewind from Next one step

Either that, or we disable the fusing for atomic loads.

BUG=  https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/417353003


Remove extra semicolon after method definition · d7ee9728

authored Jul 30, 2014

The Mac build treats this as an error.

R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/429253002


29 Jul, 2014 1 commit

Add a peephole to fuse cmpxchg w/ later cmp+branch. · c820ddf2

authored Jul 29, 2014

The cmpxchg instruction already sets ZF for comparing the return value
vs the expected value. So there is no need to compare eq again.

Lots of pexes-in-the-wild have this pattern. Some compare against
a constant, some compare against a variable.
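For context, source code of roughly the following shape produces the cmpxchg-then-compare pattern. This is an illustrative sketch using `std::atomic`; the function names are hypothetical and the code is not taken from any particular pexe.

```cpp
#include <atomic>
#include <cassert>

// Returns the value observed in Obj, storing Desired only if that value
// equals Expected. compare_exchange_strong writes the observed value back
// into Expected on failure and leaves it unchanged on success, so Expected
// holds the old value either way.
int compareAndSwapValue(std::atomic<int> &Obj, int Expected, int Desired) {
  Obj.compare_exchange_strong(Expected, Desired);
  return Expected;
}

// The comparison against the expected value here is the "compare eq again"
// step: the hardware cmpxchg has already recorded the same answer in ZF.
bool tryAcquire(std::atomic<int> &Lock) {
  return compareAndSwapValue(Lock, 0, 1) == 0;
}
```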

BUG=https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/413903002


28 Jul, 2014 2 commits

A couple of fixes for using Makefile.standalone on Mac. · 839c4cea

authored Jul 28, 2014

(*) PNaCl toolchain_build builds 64-bit libraries for LLVM on Mac.
    That won't link with subzero code if subzero is built with -m32,
    so add an option to override the -m32.
(*) include locale header
(*) Mark xMacroIntegrityCheck unused to avoid clang compiler warning.
(*) virtual dtor, for inheritable class
(*) Mark compare function const

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/428733003


Subzero: Make Ice::Ostream a typedef for llvm::raw_ostream. · 78282f6c

authored Jul 27, 2014

Previously Ostream was a class that wrapped a raw_ostream pointer,
structured that way in case we wanted to wrap an alternate stream
type.

Ostream also used to hold a Cfg pointer.  That pointer was removed
when Ostream became associated with the GlobalContext, which outlives
the Cfg, leaving only the raw_ostream.

Since llvm::raw_ostream is supposed to be very lightweight, we can
just give up the abstraction and equate it to Ice::Ostream.

BUG= none
R=kschimpf@google.com

Review URL: https://codereview.chromium.org/413393005


25 Jul, 2014 1 commit

Use movss to implement insertelement when elements = 4 and index = 0. · cfe5146f

authored Jul 25, 2014

This avoids using a pair of shufps instructions as the previous lowering
was doing.  Instead, we use movss to copy the element to be inserted
into the lower 32 bits of the destination.

Define InstX8632Movss as a Binop, the class to which it properly
belongs.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/412353005


24 Jul, 2014 4 commits

Lower the fcmp instruction for <4 x float> operands. · ce0ca8f8

authored Jul 24, 2014

Most fcmp conditions map directly to single x86 instructions. For
these, the lowering is table driven.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/413053002


Lower the select instruction when the operands are of vector type. · 9cb61e2f

authored Jul 24, 2014

Select of vectors is implemented by appropriately masking and
combining the inputs with sign extend / bitwise operations
and without the use of branches.
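The mask-and-combine approach can be modeled per lane as follows. This is a scalar sketch of the idea rather than the emitted instruction sequence; `Vec4` and `selectLanes` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<std::uint32_t, 4>;

// Branch-free select: each boolean condition is widened to an all-ones or
// all-zeros lane mask (the effect of sign-extending an i1), then the result
// is (A & Mask) | (B & ~Mask).
Vec4 selectLanes(const std::array<bool, 4> &Cond, const Vec4 &A,
                 const Vec4 &B) {
  Vec4 Result{};
  for (std::size_t I = 0; I < Result.size(); ++I) {
    const std::uint32_t Mask = Cond[I] ? 0xFFFFFFFFu : 0u;
    Result[I] = (A[I] & Mask) | (B[I] & ~Mask);
  }
  return Result;
}
```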

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/417653004


Fix a counter in the test_global crosstest. · 656d1767

authored Jul 24, 2014

Change TotalTests so that the test count matches up with the number of
recorded passes and failures.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/415803004


Subzero: Fix a regalloc eviction bug. · 68e28192

authored Jul 24, 2014

We don't need/want to evict an inactive live range when it doesn't
overlap with the live range currently being considered.

This is especially important for Variables representing scratch
registers that are killed by call instructions.  These register
assignments should obviously never be evicted.

Note that the algorithm that computes the min-weight register to evict
doesn't consider inactive and non-overlapping live ranges.
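The overlap test at the heart of this fix can be sketched as below, assuming a live range is a sorted list of disjoint half-open segments. `Segment`, `LiveRange`, `overlaps`, and `needsEviction` are hypothetical names and do not reflect Subzero's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical live range: sorted, disjoint, half-open [Begin, End) segments.
struct Segment {
  int Begin;
  int End;
};
using LiveRange = std::vector<Segment>;

// Linear merge-style scan over both segment lists.
bool overlaps(const LiveRange &A, const LiveRange &B) {
  std::size_t I = 0, J = 0;
  while (I < A.size() && J < B.size()) {
    if (A[I].End <= B[J].Begin)
      ++I; // A's segment ends before B's begins
    else if (B[J].End <= A[I].Begin)
      ++J; // B's segment ends before A's begins
    else
      return true;
  }
  return false;
}

// An inactive range sitting in a hole of the current range does not conflict
// with it, so its register assignment can be left alone.
bool needsEviction(const LiveRange &Inactive, const LiveRange &Current) {
  assert(!Current.empty());
  return overlaps(Inactive, Current);
}
```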

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3903
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/417933004


23 Jul, 2014 2 commits

Lower icmp operations between vector values. · 9a0168a9

authored Jul 23, 2014

SSE2 only has signed integer comparison. Unsigned compares are
implemented by inverting the sign bits of the operands and doing a
signed compare.
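The sign-bit trick can be verified on scalars. The sketch below shows it for one 32-bit lane; the function names are hypothetical, and `std::memcpy` is used only to reinterpret the bits without relying on implementation-defined conversions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterprets the 32 bits as a signed two's-complement integer.
std::int32_t asSigned(std::uint32_t Bits) {
  std::int32_t Value;
  std::memcpy(&Value, &Bits, sizeof Value);
  return Value;
}

// Flipping the sign bit maps the unsigned range [0, 2^32) onto the signed
// range in an order-preserving way, so a signed compare of the flipped
// values gives the unsigned result.
bool unsignedGreaterThan(std::uint32_t A, std::uint32_t B) {
  const std::uint32_t SignBit = 0x80000000u;
  return asSigned(A ^ SignBit) > asSigned(B ^ SignBit);
}
```

In the vector case the flip is a single XOR of each operand with a constant whose lanes all hold the sign bit, followed by the signed packed compare.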

A common pattern in clang generated IR is a vector compare which
generates an i1 vector followed by a sign extension of the result of the
compare. The x86 comparison instructions already generate sign extended
values, so we can eliminate unnecessary sext operations that follow
compares in the IR.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/412593002


Add llvm-mc to the set of commands lit knows about. · 87543355
Jim Stichnoth authored Jul 23, 2014
```
BUG= none
R=wala@chromium.org

Review URL: https://codereview.chromium.org/415583003
```