Commits · e4289e23cd226f487ef767a59c96e6c77073309a · Chen Yisong / swiftshader

27 Oct, 2015 3 commits

Fix ARM integrated assembler to be able to compile spec2k examples. · e4289e23

authored Oct 27, 2015

Fixes a couple of bugs that stopped the ARM integrated assembler from
generating assembly code for any spec2k examples.

Fixes are:

1) Handle conditional branches with no else branch.

2) Fix the usage of fixups so that the emit method does any needed buffer
lookups. This fixes the case where textual fixups (with zero length)
appear at the end of the assembly file.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1417173003 .

Add ADC (immediate) instruction to ARM integrated assembler. · db098880

authored Oct 27, 2015

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1424773002 .

Handle branch relative to pc in ARM integrated assembler. · 137e62bd

authored Oct 27, 2015

Adds an explicit branch instruction (near form only), which allows
branching from the current pc by a signed 26-bit byte offset (roughly
2**25 bytes, or 32 MB, in either direction). For now, this near
restriction (within a function) doesn't appear to be a problem, so only
near jumps have been implemented.
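
As a rough illustration of the near-branch range, the sketch below encodes an A32 `B` instruction. It assumes ARM state, where the pc reads as the branch address plus 8. `encodeBranch` is a hypothetical helper, not Subzero's assembler API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: encode an A32 "B" (branch, no link) instruction.
// `offset` is the target address minus the branch's own address. In ARM state
// the pc reads as the instruction address + 8, so the field holds (offset - 8) / 4.
std::optional<uint32_t> encodeBranch(int32_t offset, uint32_t cond = 0xE /* AL */) {
  int64_t delta = int64_t(offset) - 8;
  if (delta % 4 != 0) return std::nullopt;  // targets must be word aligned
  // imm24 is a signed word offset, so the byte range is [-2^25, 2^25 - 4].
  if (delta < -(int64_t(1) << 25) || delta > (int64_t(1) << 25) - 4)
    return std::nullopt;
  uint32_t imm24 = uint32_t(delta >> 2) & 0x00FFFFFFu;
  return (cond << 28) | (0b101u << 25) | imm24;  // bit 24 (link) stays 0
}
```

A branch to itself (`offset == 0`) encodes as `0xEAFFFFFE`, the familiar `b .` idiom. Returning `std::nullopt` for out-of-range targets is where a far-branch form would be needed.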

Also clarifies the notation by naming the following types:

InstValueType : The 32-bit encoding of an instruction value.
InstOffsetType : Offset (+/-) used within an instruction.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1418313003 .

23 Oct, 2015 1 commit

Generate block labels in the ARM hybrid assembler. · 50a3331c

authored Oct 23, 2015

Fixes an issue where branches don't compile in the hybrid integrated
assembler because some jump instructions have not yet been integrated.
It does this by adding an instruction label for each corresponding
label generated by the standalone ARM assembler.

Note that in order to fix this, I had to change the signature of the
virtual method Assembler::bindCfgNodeLabel to take the CfgNode (rather
than its index). This allows the ARM hybrid assembler to generate a
label for each CfgNode (using the getAsmName() method).

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1407273006 .

22 Oct, 2015 1 commit

Add hybrid assembler concept to ARM assembler. · 2fee2a2f

authored Oct 22, 2015

Adds a notion of a hybrid assembler. That is, if the integrated
assembler can lower an instruction to bytes, it does. Otherwise, it
uses the standalone assembler to generate text as the placeholder for
the instruction. This is done using a textual fixup in the assembly
buffer.

The advantage of the hybrid assembler is that one can incrementally
implement the integrated assembler and still test the generated
assembly.
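
A minimal sketch of the hybrid idea follows. It assumes a buffer that holds either encoded instruction words or text placeholders produced by the standalone assembler. `HybridBuffer`, `emitInst`, `emitTextInst`, and the `.long` rendering are illustrative assumptions, not Subzero's actual classes or output format.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical sketch: each entry is either an encoded 32-bit instruction or a
// textual placeholder for an instruction the integrated assembler cannot yet encode.
class HybridBuffer {
public:
  void emitInst(uint32_t encoding) { Entries.push_back({true, encoding, ""}); }
  void emitTextInst(const std::string &text) { Entries.push_back({false, 0, text}); }

  // Render as assembly text: encoded instructions become .long directives,
  // text placeholders are copied through verbatim.
  std::string render() const {
    std::string out;
    char buf[32];
    for (const Entry &e : Entries) {
      if (e.IsEncoded) {
        std::snprintf(buf, sizeof(buf), "\t.long\t0x%08x\n",
                      static_cast<unsigned>(e.Encoding));
        out += buf;
      } else {
        out += "\t" + e.Text + "\n";
      }
    }
    return out;
  }

private:
  struct Entry {
    bool IsEncoded;
    uint32_t Encoding;
    std::string Text;
  };
  std::vector<Entry> Entries;
};
```

With this design, the output stays testable as ordinary assembly while more instructions move from the text path to the encoded path.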

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1418523002 .

21 Oct, 2015 1 commit

Implements simple returns and call args for Mips. · ac8da5cf

authored Oct 21, 2015

This patch is essentially the same as for ARM https://codereview.chromium.org/1127963004

I have incorporated the new 64-bit register work, which was not available
at the time of the earlier patch.

The MIPS O32 ABI support in this patch is not perfect, but I am more or less
following the development of the ARM patches, which were also preliminary at
this stage. I will make corrections in a later patch when I incorporate
more of the ARM patches.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4167
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1416493002 .

17 Oct, 2015 1 commit

Emit add/sub (register) instructions in the integrated ARM assembler. · 4c2153b1

authored Oct 17, 2015

Also cleans up comments and condition violations for all implemented ARM
instructions.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1411873002 .

16 Oct, 2015 5 commits

Subzero: Fix MINIMAL build issues. · 659cc4f2

authored Oct 16, 2015

BUG= none
R=kschimpf@google.com

Review URL: https://codereview.chromium.org/1407263005 .

Merge compares and branches · d981025a

authored Oct 16, 2015

Generalize folding of icmp instructions into br.  64-bit comparisons are
considered as candidates unless they feed a select.

BUG=
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1407143002 .

Subzero: Add -allow-extern as an alias for --allow-externally-defined-symbols. · 4e10aa2c

authored Oct 16, 2015

Also remind the user of that option in IceConverter.cpp, similar to PNaClTranslator.cpp.

BUG= none
R=kschimpf@google.com

Review URL: https://codereview.chromium.org/1408023004 .

Subzero. Misc ARM32 bugfixes. · afc92af5

authored Oct 16, 2015

With this CL, Spec2k built by the Sz ARM32 backend runs and verifies
successfully.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1407063002 .

Handle stack spills in ARM integrated assembler. · 745ad1d8

authored Oct 16, 2015

Add code to handle spilling stack variables, that is, loading from and
storing to stack addresses.
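
For illustration, a stack-slot access typically becomes an A32 `ldr`/`str` with an immediate offset from `sp` or a frame pointer. The hypothetical helper below encodes only the word-sized, offset-addressing form with a 12-bit offset. Larger offsets would need a different instruction sequence, which is not shown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: encode A32 LDR/STR (immediate, word, offset addressing,
// no writeback) as "ldr/str Rt, [Rn, #offset]". Returns false if the offset
// does not fit in the 12-bit immediate field.
bool encodeStackAccess(bool isLoad, uint32_t rt, uint32_t rn, int32_t offset,
                       uint32_t &out) {
  if (rt > 15 || rn > 15) return false;
  uint32_t u = offset >= 0 ? 1 : 0;  // U bit: add (1) or subtract (0) the offset
  uint32_t mag = offset >= 0 ? uint32_t(offset) : uint32_t(-int64_t(offset));
  if (mag > 0xFFF) return false;     // imm12 covers 0..4095
  const uint32_t cond = 0xE;         // AL (always)
  out = (cond << 28) | (0b01u << 26) // single data transfer, I=0 (immediate)
      | (1u << 24)                   // P=1: offset addressing, W=0: no writeback
      | (u << 23)                    // B=0 (word), bit 22 left clear
      | (uint32_t(isLoad) << 20)     // L bit: 1 = load, 0 = store
      | (rn << 16) | (rt << 12) | mag;
  return true;
}
```

For example, `ldr r0, [sp, #4]` encodes as `0xE59D0004` and `str r1, [sp, #-8]` as `0xE50D1008`.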

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334
R=jpp@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/1402403002 .

15 Oct, 2015 2 commits

Subzero: Various fixes in preparation for x86-32 register aliasing. · 1fb030c6

authored Oct 15, 2015

1. Helper function sameVarOrReg() also needs to return true if the two physical registers alias or overlap. Otherwise advanced phi lowering may pick an incorrect ordering.

2. With -asm-verbose, redundant truncation assignments expressed as _mov instructions, like "mov cl, ecx", need to have their register use counts updated properly, so that the LIVEEND= annotations are correct.

3. The register allocator should consider suitably typed aliases when choosing a register preference.

4. When evicting a variable, the register allocator should decrement the use count of all aliases.

5. When saving/restoring callee-save registers in the prolog/epilog, map each register to its "canonical" register (e.g. %bl --> %ebx) and make sure each canonical register is only considered once.

6. Remove some unnecessary Variable::setMustHaveReg() calls.

7. When assigning bool results as a constant 0 or 1, use an 8-bit constant instead of 32-bit so that only the 8-bit register gets assigned.
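
Item 5 can be sketched as follows. The sketch makes two assumptions for illustration: registers are identified by name, and `used` already contains only callee-save registers. The helper names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical mapping from x86-32 (sub)registers to the 32-bit register that
// contains them; names not in the table are treated as already canonical.
std::string canonicalReg(const std::string &r) {
  static const std::map<std::string, std::string> kCanon = {
      {"bl", "ebx"}, {"bh", "ebx"}, {"bx", "ebx"}, {"ebx", "ebx"},
      {"si", "esi"}, {"esi", "esi"}, {"di", "edi"}, {"edi", "edi"},
      {"bp", "ebp"}, {"ebp", "ebp"}};
  auto it = kCanon.find(r);
  return it == kCanon.end() ? r : it->second;
}

// Collect each canonical callee-save register once, in first-use order, so
// that e.g. %bl and %ebx do not cause %ebx to be pushed twice.
std::vector<std::string> calleeSavesToPush(const std::vector<std::string> &used) {
  std::set<std::string> seen;
  std::vector<std::string> out;
  for (const std::string &r : used) {
    std::string c = canonicalReg(r);
    if (seen.insert(c).second) out.push_back(c);  // first time this canonical reg appears
  }
  return out;
}
```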

BUG= none
TEST= make check, plus spec2k -asm-verbose output is unchanged
R=kschimpf@google.com

Review URL: https://codereview.chromium.org/1405643003 .

Optimize 64-bit compares with zero · 5c87542a

authored Oct 15, 2015

Comparisons with zero can be done with no branches in most cases and with
simpler sequences of operations.
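
The sketch below shows why zero comparisons are cheap. It assumes the 64-bit value is held as two 32-bit halves `lo` and `hi`, as on x86-32. The C++ only models the logic; the instruction sequences Subzero actually emits are not shown.

```cpp
#include <cassert>
#include <cstdint>

// Model of 64-bit comparisons against zero for a value split into 32-bit
// halves (lo, hi). None of these needs a branch:
//   x == 0, x != 0       : OR the halves together and test the result for zero.
//   signed x < 0, x >= 0 : only the sign bit of hi matters; lo is ignored.
// Unsigned x < 0 is always false and unsigned x >= 0 is always true, so those
// fold to constants; unsigned x > 0 is the same as x != 0.
bool eqZero(uint32_t lo, uint32_t hi) { return (lo | hi) == 0; }
bool neZero(uint32_t lo, uint32_t hi) { return (lo | hi) != 0; }
bool sltZero(uint32_t /*lo*/, uint32_t hi) { return (hi >> 31) != 0; }
bool sgeZero(uint32_t /*lo*/, uint32_t hi) { return (hi >> 31) == 0; }
```

Signed `x > 0` and `x <= 0` depend on both halves and are not covered by this sketch.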

BUG=
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1406593003 .

14 Oct, 2015 1 commit

Add "sub immediate" instruction to the ARM integrated assembler. · e3455053

authored Oct 14, 2015

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1388323003 .

13 Oct, 2015 2 commits

Add "add immediate" instruction to the ARM integrated assembler. · 372bdd6e

authored Oct 13, 2015

Also does some bikeshed cleanups. In particular, the (ARM)
instruction method emitIAS only needs to choose the applicable ARM
instruction and pass the corresponding operands to the corresponding
instruction method of the assembler. The assembler method then
extracts the appropriate data from the operands and decides which
rule to apply for the corresponding ARM instruction.
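
To illustrate one kind of rule selection an assembler method performs for `add` with an immediate, here is a sketch of the A32 "modified immediate" check: the constant must be an 8-bit value rotated right by an even amount. The helper names are hypothetical; only the bit layouts follow the ARM architecture.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: find the A32 modified-immediate form of `value`, an
// 8-bit constant rotated right by 2*rot (rot in 0..15). On success sets
// imm12 = rot << 8 | imm8.
bool encodeModifiedImm(uint32_t value, uint32_t &imm12) {
  for (uint32_t rot = 0; rot < 16; ++rot) {
    // Rotating left by 2*rot undoes a rotate-right by 2*rot.
    uint32_t shift = 2 * rot;
    uint32_t imm8 = shift == 0 ? value : (value << shift) | (value >> (32 - shift));
    if (imm8 <= 0xFF) {
      imm12 = (rot << 8) | imm8;
      return true;
    }
  }
  return false;  // not representable; a different sequence would be needed
}

// Data-processing (immediate) layout: cond 00 I=1 opcode S Rn Rd imm12.
// Opcodes: ADD = 0b0100, SUB = 0b0010, ADC = 0b0101. S=0 (flags not set).
bool encodeDPImm(uint32_t opcode, uint32_t rd, uint32_t rn, uint32_t value,
                 uint32_t &out) {
  uint32_t imm12;
  if (rd > 15 || rn > 15 || !encodeModifiedImm(value, imm12)) return false;
  const uint32_t cond = 0xE;  // AL (always)
  out = (cond << 28) | (1u << 25) | (opcode << 21) | (rn << 16) | (rd << 12) | imm12;
  return true;
}
```

With these helpers, `add r0, r1, #1` encodes as `0xE2810001`. A constant such as `0x101` has no modified-immediate form, so the method would have to choose another rule.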

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334
R=jpp@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/1407613002 .

Fix emission of move immediate for ARM integrated assembler. · 85342a76

authored Oct 13, 2015

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334
R=jpp@chromium.org

Review URL: https://codereview.chromium.org/1397043003 .

12 Oct, 2015 1 commit

Subzero: Consider all instruction variables for register preference. · 28b71be4

authored Oct 12, 2015

The original code only looked at top-level source operands in the defining instruction, with a TODO to instead consider all inner variables in the instruction.

The primary reason is so that we end up with more instructions like
  mov eax, eax
which are later elided as redundant assignments.

A secondary reason is to foster more instructions like:
  mov ecx, [ecx]
rather than
  mov eax, [ecx]
where ecx's live range ends.  This hopefully keeps eax (in the latter case) free for longer and may allow some other variable to get a register.  By considering all instruction variables, we enable this.

BUG= none
R=jpp@chromium.org

Review URL: https://codereview.chromium.org/1392383003 .

09 Oct, 2015 5 commits

Subzero: Implement "second-chance bin-packing" for register allocation. · 4001c939

authored Oct 09, 2015

If a variable gets a register but is later evicted because of a higher-weight variable, there's a chance that the first variable could have been allocated a register if only its initial choice had been different.

To improve this, we keep track of which variables are evicted, and then allow register allocation to run again, focusing only on those once-evicted variables, and not changing any previous register assignments.

This can iterate until there are no more evictions.

This is more or less what the linear-scan literature describes as "second-chance bin-packing".
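
The retry loop can be illustrated with a toy model. Variables are half-open live ranges with weights, allocation is greedy in start order, and a variable may evict lower-weight holders that are candidates in the same pass. Everything here (`Var`, `allocatePass`, `allocateWithSecondChance`) is a hypothetical simplification, not Subzero's linear-scan code. It omits live-range holes, register preferences, and spill costs.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

struct Var {
  int start, end;  // half-open live range [start, end)
  double weight;   // assumed positive; higher weight = more deserving of a register
  int reg = -1;    // -1 means no register (spilled)
};

static bool overlaps(const Var &a, const Var &b) {
  return a.start < b.end && b.start < a.end;
}

// One greedy pass over `cand` in start order. Variables outside `cand` keep
// their registers and are never evicted. Returns the variables evicted here.
static std::vector<int> allocatePass(std::vector<Var> &vars, std::vector<int> cand,
                                     int numRegs) {
  std::vector<bool> inPass(vars.size(), false);
  for (int i : cand) inPass[i] = true;
  std::stable_sort(cand.begin(), cand.end(),
                   [&](int a, int b) { return vars[a].start < vars[b].start; });
  std::vector<int> evicted;
  const int n = static_cast<int>(vars.size());
  for (int cur : cand) {
    int bestReg = -1;
    double bestCost = std::numeric_limits<double>::infinity();
    for (int r = 0; r < numRegs; ++r) {
      // Usable if every overlapping holder is a lower-weight candidate of this
      // pass; cost is the heaviest such holder, so a free register costs 0.
      bool usable = true;
      double cost = 0;
      for (int j = 0; j < n; ++j) {
        if (j == cur || vars[j].reg != r || !overlaps(vars[j], vars[cur])) continue;
        if (!inPass[j] || vars[j].weight >= vars[cur].weight) usable = false;
        cost = std::max(cost, vars[j].weight);
      }
      if (usable && cost < bestCost) { bestCost = cost; bestReg = r; }
    }
    if (bestReg < 0) continue;  // no register: cur stays spilled
    for (int j = 0; j < n; ++j) {
      if (j != cur && vars[j].reg == bestReg && overlaps(vars[j], vars[cur])) {
        vars[j].reg = -1;
        evicted.push_back(j);
      }
    }
    vars[cur].reg = bestReg;
  }
  return evicted;
}

// Re-run allocation on just the evicted variables until a pass evicts nothing.
// Terminates: the last variable to evict in a pass keeps its register, so each
// retry set is strictly smaller than the previous candidate set.
void allocateWithSecondChance(std::vector<Var> &vars, int numRegs) {
  std::vector<int> cand;
  for (int i = 0; i < static_cast<int>(vars.size()); ++i) cand.push_back(i);
  while (!cand.empty()) cand = allocatePass(vars, cand, numRegs);
}
```

Consider two registers and the ranges A=[0,10) w1, X=[0,12) w2, Z=[5,13) w10, and Y=[11,14) w5. The first pass evicts A (for Z) and X (for Y). The retry finds r1 free over A's whole range, because Y only starts at 11, so A gets a register that a single pass would have denied it.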

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095
R=jpp@chromium.org

Review URL: https://codereview.chromium.org/1395693005 .

Start incorporating the ARM integrated assembler. · c5abdc13

authored Oct 09, 2015

Extends the ARM32 assembler to be able to generate a trivial function
footprint using the -filetype=iasm option.

Also does a couple of cleanups:

1) Move UnimplementedError macro to common location so that it can be
used by everyone.

2) Add a GlobalContext argument to the assembler, so that it can
look at flags etc.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1397933002 .

Subzero: Don't bother printing stack/frame ptr as part of LiveIn/LiveOut. · e7418719

authored Oct 09, 2015

The LiveIn and LiveOut register sets are printed for each basic block in -asm-verbose mode. These sets would generally include the stack and/or frame pointer registers, which is just noise, so we suppress that.

BUG= none
R=jpp@chromium.org

Review URL: https://codereview.chromium.org/1399523003 .

Subzero: Don't "and" i1 values with 1. · 485d0773

authored Oct 09, 2015

In x86 lowering, i1 values are held in i8 register and memory slots. We were conservatively "and"ing them with 1 before zero-extending them for some lowering operations, but this "and" with 1 is unnecessary and just clutters the code.

We maintain the invariant that all i1-produced values in an i8 slot are either 0 or 1.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095
R=jpp@chromium.org

Review URL: https://codereview.chromium.org/1394413002 .

Subzero: Change aliases_init --> alias_init for consistency. · 69a85b14

Jim Stichnoth authored Oct 09, 2015

BUG= none
R=jpp@chromium.org

Review URL: https://codereview.chromium.org/1392403002 .
08 Oct, 2015 2 commits

Subzero: Remove trailing whitespace. · a00b1f7f

authored Oct 08, 2015

BUG= none
R=kschimpf@google.com

Review URL: https://codereview.chromium.org/1396923002 .

Add correction message to bad linkage error. · a313a121

authored Oct 08, 2015

Adds message to use "-allow-externally-defined-symbols" on bad
linkage errors.

Also cleans up code by defining common reporting routine.

BUG=None
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1392273002 .

07 Oct, 2015 3 commits

Create local copy of Dart assembler code. · 3e53dc99

authored Oct 07, 2015

Creates a local copy of the Dart assembler code before it is merged
into our code base. The goal of these files is to track code as it is
moved from the Dart implementation into our code base.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334
R=jpp@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/1394613002 .

Make sure that all globals are internal, except for "start" functions. · 57d31ac7

authored Oct 07, 2015

The existing code, when run on a fuzzed example, generates a runtime
assertion. The reason for this is that the input defines "memmove" as
an external global. However, the code generator can generate calls to
"memmove" which assumes it is internal (see PNaCl ABI). As a result,
the assertion that checks that global names are unique (for memmove)
fails.

This code fixes the problem by checking that global names are
internal, unless they are one of the "start" functions,
or the function is an intrinsic. To allow for
non-PNaCl ABI input, a flag was added to allow functions to be
external. However, in such cases the external can't be one of
Subzero's runtime helper functions.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4330
R=jpp@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/1387963002 .

Generate better two address code by using commutativity · 487bad02

authored Oct 06, 2015

For operations such as
    t0 = t1 + t2
Subzero's pattern for arithmetic operations generates two address code that
looks like
    movl ...t1..., %ecx
    addl ...t2..., %ecx // t0 is in %ecx

When register pressure is high this sometimes becomes:
    movl ...t2..., SPILL
    movl ...t1..., %ecx
    addl SPILL, %ecx // t0 is in %ecx

This CL takes advantage of cases where the use of t2 is the last one, so the
register that held t2 before the operation can be reused.  The optimization
simply swaps the (commutative) operation to
    t0 = t2 + t1
which then generates code as
    movl ...t2..., %ecx
    addl ...t1..., %ecx // t0 is in %ecx

This optimization is used for any commutative operation, which now includes
Fadd and Fmul, which were erroneously marked as non-commutative.  See the
rationale in IceInst.def for the IEEE wordings.
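
The swap itself is simple. The hypothetical sketch below models an instruction `dest = src0 op src1` whose two-address lowering copies `src0` into `dest`'s register. It assumes liveness ("is this the last use?") has been computed elsewhere. It also skips the swap when both operands die, since that case gains nothing; this is a design assumption, not taken from the commit.

```cpp
#include <cassert>
#include <string>
#include <utility>

enum class Op { Add, Sub, Mul, And, Or, Xor, Fadd, Fmul };

// Fadd and Fmul are treated as commutative, as the commit describes.
bool isCommutative(Op op) {
  switch (op) {
  case Op::Add: case Op::Mul: case Op::And: case Op::Or: case Op::Xor:
  case Op::Fadd: case Op::Fmul:
    return true;
  case Op::Sub:
    return false;
  }
  return false;
}

struct BinInst {
  Op op;
  std::string dest, src0, src1;
};

// Hypothetical helper: if src1 dies at this instruction but src0 does not,
// swap the operands so the register that held src1 can be reused for dest.
void preferDyingFirstOperand(BinInst &inst, bool src0LastUse, bool src1LastUse) {
  if (isCommutative(inst.op) && src1LastUse && !src0LastUse)
    std::swap(inst.src0, inst.src1);
}
```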

BUG=
R=jfb@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/1371703003 .

06 Oct, 2015 2 commits

Use three-address form of imul · e11f878a

authored Oct 06, 2015

Previously we did not take advantage of the three address versions of the
imul instruction.  With this we are able to avoid some copies before imuls.

BUG=
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1365433004 .

Subzero. Enable Atomics in ARM. · 578f1161

authored Oct 06, 2015

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1369333003 .

05 Oct, 2015 3 commits

Subzero: Improve lowering sequence for "a=b*b". · ebbb5912

authored Oct 05, 2015

Originally, the lowering sequence looked like:
  T = b
  T *= b
  a = T
Now it looks like:
  T = b
  T *= T
  a = T

If "b" gets a register and its live range ends after this instruction, then the new lowering sequence allows its register to be reused for "T".  This decreases register pressure, and removes an instruction (register move) from what could be a critical path.

This optimization is actually applicable for most arithmetic operations whose source operands are identical, but mul/fmul are the only ones that seem at all likely in practice.

BUG= none
R=kschimpf@google.com

Review URL: https://codereview.chromium.org/1377213004 .

Subzero: Fix nondeterministic behavior in constant pool creation. · b36757e1

authored Oct 05, 2015

This issue was discovered as the result of a spurious "make check-lit" failure in undef.ll.

The problem is that constant pool label strings depend on the order the constants are created, and this order can be different with multithreaded translation.

Even -filetype=obj is affected by this, because the label string is put into the ELF .o file. This means that different runs of Subzero on the same input could potentially produce slightly different output.

The solution is to base the label name on the actual value of the constant. We do this by using the hex representation of the constant, rather than the sequence number of the constant within the pool. This actually simplifies things a bit, as we no longer need to track the sequence number.

In addition, for floating-point constant labels in asm-verbose mode, include a human-readable rendering of the value in the label name.
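
A minimal sketch of value-based naming for a float pool entry follows. The `.L$float$` prefix is a hypothetical format, not necessarily Subzero's. The point is that the label is a pure function of the IEEE-754 bit pattern, so it no longer depends on creation order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical label scheme: name a float constant-pool entry by the hex of
// its bit pattern, independent of the order in which constants were created.
std::string floatPoolLabel(float value) {
  uint32_t bits;
  static_assert(sizeof(bits) == sizeof(value), "float must be 32 bits");
  std::memcpy(&bits, &value, sizeof(bits));  // well-defined type pun
  char buf[32];
  std::snprintf(buf, sizeof(buf), ".L$float$%08x", static_cast<unsigned>(bits));
  return buf;
}
```

Keying on bits rather than on numeric equality also keeps `0.0f` and `-0.0f` as distinct entries (`...00000000` vs `...80000000`), which matters because they are different constants.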

BUG= none
R=kschimpf@google.com

Review URL: https://codereview.chromium.org/1386593004 .

Subzero: With -asm-verbose, make the predecessor list more compact. · 9a63babb

authored Oct 05, 2015

Instead of a comment like this:

  # preds=.Lfv_update_nonbon$split___114___115_0,.Lfv_update_nonbon$split___138___115_1

remove some redundancy and make the comment like this:

  # preds=$split___114___115_0,$split___138___115_1

This makes it slightly easier to read, and less likely to exceed 80 columns.
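
The compaction amounts to stripping the per-function label prefix. The sketch below assumes that prefix is `.L` followed by the function name, as in the example above. `predsComment` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: build the "# preds=" comment, dropping the
// ".L<function name>" prefix from each predecessor label that has it.
std::string predsComment(const std::string &funcName,
                         const std::vector<std::string> &preds) {
  const std::string prefix = ".L" + funcName;
  std::string out = "# preds=";
  for (size_t i = 0; i < preds.size(); ++i) {
    const std::string &p = preds[i];
    if (i) out += ",";
    out += p.compare(0, prefix.size(), prefix) == 0 ? p.substr(prefix.size()) : p;
  }
  return out;
}
```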

BUG= none
R=kschimpf@google.com

Review URL: https://codereview.chromium.org/1380323003 .

02 Oct, 2015 3 commits

Change from ::stdout to stderr when reporting fatal error. · 4e6ea83a

authored Oct 02, 2015

The pnacl linux x86_64 buildbot doesn't understand ::stdout (it uses
a macro to define stdout). Fix by removing the :: prefix. Also redirects
the error messages to stderr instead of stdout.

BUG=None
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1383053002 .

Remove dependence on header file unistd.h. · 7e64eaaa

authored Oct 02, 2015

Fixes a bug in function reportFatalErrorThenExitSuccess by using fwrite
instead of write (declared in the POSIX header unistd.h, which MSC does
not support).

BUG=None
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1370323005 .

Subzero: Use register availability during lowering to improve the code. · 318f4cda

authored Oct 01, 2015

The problem is that given code like this:

  a = b + c
  d = a + e
  ...
  ... (use of a) ...

Lowering may produce code like this, at least on x86:

  T1 = b
  T1 += c
  a = T1
  T2 = a
  T2 += e
  d = T2
  ...
  ... (use of a) ...

If "a" has a long live range, it may not get a register, resulting in clumsy code in the middle of the sequence like "a=reg; reg=a".  Normally one might expect store forwarding to make the clumsy code fast, but it does presumably add an extra instruction-retirement cycle to the critical path in a pointer-chasing loop, and makes a big difference on some benchmarks.

The simple fix here is, at the end of lowering "a=b+c", keep track of the final "a=T1" assignment.  Then, when lowering "d=a+e" and we look up "a", we can substitute "T1".  This slightly increases the live range of T1, but it does a great job of avoiding the redundant reload of the register from the stack location.

A more general fix (in the future) might be to do live range splitting and let the register allocator handle it.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095
R=kschimpf@google.com

Review URL: https://codereview.chromium.org/1385433002 .

01 Oct, 2015 4 commits

Subzero. Adds I64 register pairs for ARM32. · ed2c06b2

authored Oct 01, 2015

This is in preparation for llvm.nacl.atomic.* lowerings. Atomic i64
loads and stores require their operands to be consecutive registers
starting at an even register that is not r14.
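
The pairing rule stated above can be checked as in the sketch below. Registers are numbered 0 to 15, and `isValidI64Pair` is a hypothetical helper, not Subzero's register-class code.

```cpp
#include <cassert>

// Hypothetical check of the LDREXD/STREXD transfer-register constraint: the
// first register is even-numbered and not r14, and the second is the next
// register (r0/r1, r2/r3, and so on).
bool isValidI64Pair(int first, int second) {
  return first >= 0 && first <= 15 && first % 2 == 0 && first != 14 &&
         second == first + 1;
}
```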

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
R=kschimpf@google.com

Review URL: https://codereview.chromium.org/1382063002 .

Subzero. Fixes a bug in the register allocator. · 7cb12682

authored Oct 01, 2015

This bug was uncovered while implementing the llvm.nacl.atomic.cmpxchg
lowering for i64 for ARM32. For reference, the lowering is

retry:
    ldrexd     tmp_i, tmp_i+1, [addr]
    cmp        tmp_i+1, expected_i+1
    cmpeq      tmp_i, expected_i
    strexdeq   success, new_i, new_i+1, [addr]
    movne      expected_i+1, tmp_i+1
    movne      expected_i, tmp_i
    cmpeq      success, #0
    bne        retry
    mov        dest_i+1, tmp_i+1
    mov        dest_i, tmp_i

The register allocator would allocate r4 to both success and new_i,
which is clearly wrong (expected_i is alive throughout the cmpxchg loop).
Adding a fake-use(new_i) after the loop caused the register allocator
to fail because it could not allocate a register for an infinite-weight
variable. The problem was caused by not evicting live ranges that were
assigned registers aliasing the selected register.

BUG=
R=kschimpf@google.com, stichnot@chromium.org

Review URL: https://codereview.chromium.org/1373823006 .

Subzero. Adds ldrex, strex, and dmb support (ARM32) · 16991847

authored Oct 01, 2015

These instructions are used to load/store data atomically, and to
notify the processor about a data memory barrier. They are used for
implementing the llvm.nacl.atomic.* lowerings.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1378303003 .

Add include files so that IceCompilerServer.cpp can compile on MSC. · 166cbf4a

authored Oct 01, 2015

A recent change to IceCompilerServer.cpp allowed fatal errors to
return exit status zero. However, this code called ::write (a POSIX
function) that is not declared when compiling with MSC. This CL adds
includes to fix this problem.

BUG=None
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1379613005 .
