Commits · 5acafbc06bd00aab96fa299b615a56a965e81885 · Chen Yisong / swiftshader

30 Jul, 2014 6 commits

Subzero: Fix a signed/unsigned warning reported on the Mac. · 5acafbc0

authored Jul 30, 2014

Also cleans up some unneeded table size const static variables.

BUG= https://codereview.chromium.org/296053008/
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/428353002

5acafbc0

Subzero: Try to fix warnings and errors in the Windows build. · 6e992147

authored Jul 30, 2014

Quiet some unused-variable warnings when their only use is in an assert().

Forward-declare partial template specializations when the template method already has a default implementation, to avoid ODR violations and link errors.

BUG= https://codereview.chromium.org/296053008/
R=wala@chromium.org

Review URL: https://codereview.chromium.org/429993002

6e992147

Add dtor to InstX8632Lockable. · 1e889586

authored Jul 30, 2014

Speculative fix for Mac GCC build.

BUG=none
R=dschuff@chromium.org

Review URL: https://codereview.chromium.org/432523002

1e889586

Subzero: Add support for SSE4.1 instructions. · 0a450519

authored Jul 30, 2014

* Add initial support for code generation with SSE4.1 instructions. The
following operations are affected:
 - multiplication with v4i32
 - select
 - insertelement
 - extractelement

* Add appropriate lit checks for SSE4.1 instructions. Run the crosstests
in both SSE2 and SSE4.1 mode.

* Introduce the -mattr flag to llvm2ice to control which instruction set
gets used.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/427843002

0a450519

Fix bug when atomic load is fused with an arith op (and not in the entry BB) · e6e497db

authored Jul 30, 2014

Normally, the FakeUse for preserving the atomic load ends
up on the load's Dest. However, for fused load+add, the load
is deleted, and its Dest is no longer defined. This trips
up the liveness analysis when it happens on a non-entry
block. So the FakeUse should be for the add's dest instead,
in that case.

We have no access to the add, so introduce a
getLastInserted() helper. A couple of ways to do that:
- modify insert() to track explicitly
- rewind from Next one step

Either that, or we disable the fusing for atomic loads.

BUG=  https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/417353003

e6e497db

Remove extra semicolon after method definition · d7ee9728

authored Jul 30, 2014

The mac build treats this as an error.

R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/429253002

d7ee9728

29 Jul, 2014 1 commit

Add a peephole to fuse cmpxchg w/ later cmp+branch. · c820ddf2

authored Jul 29, 2014

The cmpxchg instruction already sets ZF for comparing the return value
vs the expected value. So there is no need to compare eq again.

Lots of pexes-in-the-wild have this pattern. Some compare against
a constant, some compare against a variable.
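
As a source-level analogy only (this is not the Subzero peephole itself), the redundancy can be sketched with std::atomic. The helper names tryClaim and tryClaimRedundant are hypothetical. The success flag returned by compare_exchange_strong plays the role of ZF, so comparing the returned value against the expected value a second time adds no information.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: returns true if `slot` held `expected` and was
// replaced with `desired`. The boolean result is the analogue of ZF.
inline bool tryClaim(std::atomic<int> &slot, int expected, int desired) {
  return slot.compare_exchange_strong(expected, desired);
}

// The shape of the pattern being fused: swap, then compare the old value
// against the expected value again.
inline bool tryClaimRedundant(std::atomic<int> &slot, int expected,
                              int desired) {
  int old = expected;
  bool swapped = slot.compare_exchange_strong(old, desired);
  // On failure `old` is overwritten with the value actually observed.
  bool equal = (old == expected);
  assert(swapped == equal); // the flag already carried this answer
  (void)swapped;            // keep NDEBUG builds free of unused warnings
  return equal;
}
```

Both helpers return the same result for the same inputs; the second one simply performs a comparison whose outcome is already known.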

BUG=https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/413903002

c820ddf2

28 Jul, 2014 2 commits

A couple of fixes for using Makefile.standalone on Mac. · 839c4cea

authored Jul 28, 2014

(*) PNaCl toolchain_build builds 64-bit libraries for LLVM on Mac.
    That won't link with subzero code if subzero is built with -m32,
    so add an option to override the -m32.
(*) include locale header
(*) Mark xMacroIntegrityCheck unused to avoid clang compiler warning.
(*) virtual dtor, for inheritable class
(*) Mark compare function const

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/428733003

839c4cea

Subzero: Make Ice::Ostream a typedef for llvm::raw_ostream. · 78282f6c

authored Jul 27, 2014

Previously Ostream was a class that wrapped a raw_ostream pointer,
structured that way in case we wanted to wrap an alternate stream
type.

Also, Ostream used to include a Cfg pointer, but that pointer was removed
when the Ostream became associated with the GlobalContext, which persists
beyond the Cfg lifetime. That left only the raw_ostream.

Since llvm::raw_ostream is supposed to be very lightweight, we can
just give up the abstraction and equate it to Ice::Ostream.

BUG= none
R=kschimpf@google.com

Review URL: https://codereview.chromium.org/413393005

78282f6c

25 Jul, 2014 1 commit

Use movss to implement insertelement when elements = 4 and index = 0. · cfe5146f

authored Jul 25, 2014

This avoids using a pair of shufps instructions as the previous lowering
was doing.  Instead, we use movss to copy the element to be inserted
into the lower 32 bits of the destination.

Define InstX8632Movss as a Binop, the class to which it properly
belongs.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/412353005

cfe5146f

24 Jul, 2014 4 commits

Lower the fcmp instruction for <4 x float> operands. · ce0ca8f8

authored Jul 24, 2014

Most fcmp conditions map directly to single x86 instructions. For
these, the lowering is table driven.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/413053002

ce0ca8f8

Lower the select instruction when the operands are of vector type. · 9cb61e2f

authored Jul 24, 2014

Select of vectors is implemented by appropriately masking and
combining the inputs with sign extend / bitwise operations
and without the use of branches.
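
A minimal scalar sketch of the masking idea, assuming 4 x 32-bit lanes modeled with std::array. The names V4, selectV4, and selectV4Example are hypothetical, and the real lowering works on XMM registers rather than on arrays.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4 = std::array<std::uint32_t, 4>;

// Branch-free lane-wise select: out[i] = cond[i] ? a[i] : b[i].
inline V4 selectV4(const std::array<bool, 4> &cond, const V4 &a,
                   const V4 &b) {
  V4 out{};
  for (std::size_t i = 0; i < out.size(); ++i) {
    // Sign-extend the 1-bit condition to a full-width mask:
    // false -> 0x00000000, true -> 0xFFFFFFFF.
    std::uint32_t mask = 0u - static_cast<std::uint32_t>(cond[i]);
    // Keep a's bits where the mask is set and b's bits elsewhere.
    out[i] = (a[i] & mask) | (b[i] & ~mask);
  }
  return out;
}

// Usage sketch: lanes 0 and 2 come from `a`, lanes 1 and 3 from `b`.
inline void selectV4Example() {
  V4 r = selectV4({true, false, true, false}, {1, 2, 3, 4}, {10, 20, 30, 40});
  assert((r == V4{1, 20, 3, 40}));
}
```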

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/417653004

9cb61e2f

Fix a counter in the test_global crosstest. · 656d1767

authored Jul 24, 2014

Change TotalTests so that the test count matches up with the number of
recorded passes and failures.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/415803004

656d1767

Subzero: Fix a regalloc eviction bug. · 68e28192

authored Jul 24, 2014

We don't need/want to evict an inactive live range when it doesn't
overlap with the live range currently being considered.

This is especially important for Variables representing scratch
registers that are killed by call instructions.  These register
assignments should obviously never be evicted.

Note that the algorithm that computes the min-weight register to evict
doesn't consider inactive and non-overlapping live ranges.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3903
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/417933004

68e28192

23 Jul, 2014 3 commits

Lower icmp operations between vector values. · 9a0168a9

authored Jul 23, 2014

SSE2 only has signed integer comparison. Unsigned compares are
implemented by inverting the sign bits of the operands and doing a
signed compare.

A common pattern in clang generated IR is a vector compare which
generates an i1 vector followed by a sign extension of the result of the
compare. The x86 comparison instructions already generate sign extended
values, so we can eliminate unnecessary sext operations that follow
compares in the IR.
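
The sign-bit trick can be sketched on a single 32-bit lane as follows, assuming two's complement integers. The helper names asSigned, unsignedGreaterViaSigned, and laneMaskUgt are hypothetical. Flipping the top bit of both operands shifts the unsigned range onto the signed range while preserving order, so a signed compare then gives the unsigned answer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of an unsigned value as signed without relying on
// implementation-defined integer conversions.
inline std::int32_t asSigned(std::uint32_t v) {
  std::int32_t s;
  std::memcpy(&s, &v, sizeof s);
  return s;
}

// Unsigned a > b computed using only a signed comparison.
inline bool unsignedGreaterViaSigned(std::uint32_t a, std::uint32_t b) {
  const std::uint32_t signBit = 0x80000000u;
  bool result = asSigned(a ^ signBit) > asSigned(b ^ signBit);
  assert(result == (a > b)); // cross-check against the native compare
  return result;
}

// x86 vector compares yield an all-ones or all-zeros lane, which is already
// the sign extension of the i1 result, so a following sext is a no-op.
inline std::uint32_t laneMaskUgt(std::uint32_t a, std::uint32_t b) {
  return unsignedGreaterViaSigned(a, b) ? 0xFFFFFFFFu : 0u;
}
```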

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/412593002

9a0168a9

Add llvm-mc to the set of commands lit knows about. · 87543355

Jim Stichnoth authored Jul 23, 2014

BUG= none
R=wala@chromium.org

Review URL: https://codereview.chromium.org/415583003

87543355

Add -arch=x86 and -filetype=obj to all RUN lines involving llvm-mc. · d9ea7ad5

authored Jul 22, 2014

This fixes the failing validation of callindirect.pnacl.ll.

The following tests fail to validate (some due to the
addition of -filetype=obj):
 * convert.ll
 * globalinit.pnacl.ll
 * mangle.ll
 * nacl-atomic-fence-all.ll
 * shift.ll

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/410743005

d9ea7ad5

22 Jul, 2014 3 commits

Fix legalization of source operand to bsr and bsf. · 53c5e609

authored Jul 22, 2014

The source operand to bsr and bsf must be in a register or memory.

BUG=none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/407093014

53c5e609

Validate the assembly code that Subzero generates in unit tests. · 927cc171

authored Jul 22, 2014

Add RUN lines to applicable lit tests to pipe the output of Subzero (in
-Om1 and/or -O2 mode) to llvm-mc for validation.

Note that the following unit tests fail the validation:
 * callindirect.pnacl.ll
 * mangle.ll
 * nacl-other-intrinsics.ll

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/411693003

927cc171

Factor out common vector crosstesting code. · 89a7c2bd

authored Jul 22, 2014

Add vectors.h and vector.def to hold vector type declarations and useful
vector utilities. Change the existing tests to use this new header where
applicable (arith, vector_ops).

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/407543003

89a7c2bd

21 Jul, 2014 1 commit

Use lowerCast instead of inlined _movzx, to get legalization, for memset. · 957c50d9

authored Jul 21, 2014

Otherwise, there can be a movzx reg, 0, which is illegal,
when the memset value is constant 0.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/402253002

957c50d9

18 Jul, 2014 4 commits

Fix array index in test initialization. · 35ec373d

authored Jul 18, 2014

Index() % NumElementsInType should be Index() % NumValues.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/404553007

35ec373d

Lower stacksave and restore intrinsics. · 7b34b597

authored Jul 18, 2014

Just copies the current stack pointer to/from a variable.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/396993009

7b34b597

Lower byte swap intrinsic. · 7fa813b3

authored Jul 18, 2014

Clump the negate instruction w/ the bswap instruction as an
"inplace" operation. One difference is that bswap has stricter
requirements on the operand type.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org, wala@chromium.org

Review URL: https://codereview.chromium.org/401533002

7fa813b3

Lower insertelement and extractelement. · 49889239

authored Jul 18, 2014

Use instructions that do the operations in registers and that are
available in SSE2. Spill to memory to perform the operation in the
absence of any other reasonable options (v16i8 and v16i1).

Unfortunately there is no natural class of SSE2 instructions that
insertelement / extractelement can get lowered to for all vector types
(though pinsr[bwd] and pextr[bwd] are available in SSE4.1). In some cases
there are a large number of choices available for lowering, and I have
not yet looked into which choices are best, besides using LLVM output
as a guide.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/401523003

49889239

17 Jul, 2014 1 commit

Lower the rest of the vector arithmetic operations. · 7fa22d8a

authored Jul 17, 2014

The instructions emitted by the lowering operations require memory
operands to be aligned to 16 bytes. Since there is no support for
aligning memory operands in Subzero, do the arithmetic in registers for
now.

Add vector arithmetic to the arith crosstest. Pass the -mstackrealign
parameter to the crosstest clang so that llc code called back from
Subzero code (helper calls) doesn't assume that the stack is aligned at
the entry to the call.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/397833002

7fa22d8a

16 Jul, 2014 2 commits

Lower casting operations that involve vector types. · 83b8036b

authored Jul 16, 2014

Impacted instructions:

bitcast {v4f32, v4i32, v8i16, v16i8} <-> {v4f32, v4i32, v8i16, v16i8}
bitcast v8i1 <-> i8
bitcast v16i1 <-> i16

(There was already code present to handle trivial bitcasts like v16i1 <-> v16i1.)

[sz]ext v4i1 -> v4i32
[sz]ext v8i1 -> v8i16
[sz]ext v16i1 -> v16i8

trunc v4i32 -> v4i1
trunc v8i16 -> v8i1
trunc v16i8 -> v16i1

[su]itofp v4i32 -> v4f32
fpto[su]i v4f32 -> v4i32

Where there is a relatively simple lowering to x86 instructions, it has been used. Otherwise a helper call is used.

Some lowerings require a materialization of an integer vector with 1s in each entry. Since there is no support for vector constant pools, the constant is materialized purely through register operations.
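
The commit message does not say which register operations are used. One possible sequence, shown here purely as an assumption, is to zero a register, compare it with itself to get all-ones lanes, and subtract that from zero. The sketch below models the lanes with std::array; the names Lanes, compareEqual, subtract, and makeVectorOfOnes are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Lanes = std::array<std::uint32_t, 4>;

// Lane-wise equality producing all-ones or all-zeros, like pcmpeqd.
inline Lanes compareEqual(const Lanes &a, const Lanes &b) {
  Lanes out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = (a[i] == b[i]) ? 0xFFFFFFFFu : 0u;
  return out;
}

// Lane-wise wrapping subtraction, like psubd.
inline Lanes subtract(const Lanes &a, const Lanes &b) {
  Lanes out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = a[i] - b[i];
  return out;
}

// Build {1, 1, 1, 1} without loading a constant from memory.
inline Lanes makeVectorOfOnes() {
  Lanes zero{};                             // like pxor reg, reg
  Lanes allOnes = compareEqual(zero, zero); // every lane is 0xFFFFFFFF (-1)
  Lanes ones = subtract(zero, allOnes);     // 0 - (-1) == 1 in every lane
  assert(ones[0] == 1u);
  return ones;
}
```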

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/383303003

83b8036b

Lower bitmanip intrinsics, assuming absence of BMI/SSE4.2 for now. · e4da26f6

authored Jul 15, 2014

We'll need the fallbacks in any case. However, once we've
decided on how to specify the CPU features of the user
machine we can use the nicer LZCNT/TZCNT/POPCNT as well.

Adds cmov, bsf, and bsr instructions.

Calls a popcount helper function for machines without SSE4.2.
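
The actual helper is not shown in this log. As a sketch of what such fallbacks can look like, the following uses the well-known SWAR bit-counting technique for popcount and derives count-leading-zeros from the index of the highest set bit (the value bsr would produce). The names popcount32, ctlz32, and bitmanipExample are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Portable 32-bit population count using parallel (SWAR) partial sums.
inline std::uint32_t popcount32(std::uint32_t x) {
  x = x - ((x >> 1) & 0x55555555u);                 // 2-bit sums
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); // 4-bit sums
  x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 // 8-bit sums
  return (x * 0x01010101u) >> 24;                   // add the four bytes
}

// Count leading zeros via the highest set bit index. bsr leaves its result
// undefined for a zero input, so zero is handled separately.
inline std::uint32_t ctlz32(std::uint32_t x) {
  if (x == 0)
    return 32;
  std::uint32_t highest = 31;
  while (((x >> highest) & 1u) == 0)
    --highest; // `highest` ends at the index bsr would report
  return 31 - highest;
}

// Usage sketch.
inline void bitmanipExample() {
  assert(popcount32(0x80000001u) == 2);
  assert(ctlz32(1u) == 31);
}
```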

Not handling bswap yet (which can also take i16 params).

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org, wala@chromium.org

Review URL: https://codereview.chromium.org/390443005

e4da26f6

15 Jul, 2014 2 commits

Various improvements related to legalization code. · ad8f7265

authored Jul 14, 2014

1) In makeHelperCall(), function pointers that are created should have
type IceType_i32, not the functions' own return type.

2) In legalize(), change the name of WillHaveRegister to
MustHaveRegister. Add a comment to clarify the condition being computed.

3) In legalize(), add an assert to make sure that vector "constants"
don't get legalized (other than undef). There should be no constants of
vector type.

4) In copyToReg(), replace an unnecessary use of Src->getType().

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/385133006

ad8f7265

Fix floating point vector frem lowering. · 0ecabc82

authored Jul 14, 2014

The frem operation takes two arguments.
Pass both Src0 and Src1 to __frem_v4f32.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/387153002

0ecabc82

14 Jul, 2014 2 commits

Remove memcpy test workaround for name mangling substitutions. · 140bb0d8

authored Jul 14, 2014

Now that the name mangling is a bit smarter (from commit 217dc082),
we don't need to avoid having the same type twice in the function
signature.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/389683003

140bb0d8

Subzero: lower the rest of the atomic operations. · a3a01a2f

authored Jul 14, 2014

64-bit ops are expanded via a cmpxchg8b loop.

64/32-bit and/or/xor are also expanded into a cmpxchg /
cmpxchg8b loop.
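
At the source level, the loop shape can be sketched with std::atomic. This is an illustration of the retry pattern, not the emitted x86 sequence; the names fetchAndViaCas and fetchAndViaCasExample are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Atomic fetch-and built from a compare-and-swap retry loop.
// Returns the value observed immediately before the update.
inline std::uint64_t fetchAndViaCas(std::atomic<std::uint64_t> &target,
                                    std::uint64_t operand) {
  std::uint64_t observed = target.load();
  // On failure, compare_exchange_weak refreshes `observed` with the current
  // value, so the next iteration recomputes `observed & operand`.
  while (!target.compare_exchange_weak(observed, observed & operand)) {
  }
  return observed;
}

// Usage sketch.
inline void fetchAndViaCasExample() {
  std::atomic<std::uint64_t> word{0xF0F0u};
  std::uint64_t before = fetchAndViaCas(word, 0xFF00u);
  assert(before == 0xF0F0u);
  assert(word.load() == 0xF000u);
}
```

The same loop works for or and xor by changing the operator applied to `observed`.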

Add a cross test for atomic RMW operations and
compare and swap.

Misc: Test that atomic.is.lock.free can be optimized out if result is ignored.

TODO:
* optimize compare and swap with compare+branch further down
instruction stream.

* optimize atomic RMW when the return value is ignored
(adds a locked field to binary ops though).

* We may want to do some actual target-dependent basic
block splitting + expansion (the instructions inserted by
the expansion must reference the pre-colored registers,
etc.). Otherwise, we are currently getting by with modeling
the extended liveness of the variables used in the loops
using fake uses.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=jfb@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/362463002

a3a01a2f

11 Jul, 2014 4 commits

Lower vector floating point arithmetic operations. · 8d1072e7

authored Jul 11, 2014

This adds lowering code for fadd, fsub, fmul, fdiv, and frem. frem, having no native x86 counterpart, is implemented by making a helper call.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/389653002

8d1072e7

Subzero: Fix the name mangling code's base-36 increment. · 78b4c0b8

authored Jul 11, 2014

SZZZ_ was being incremented to S0000_ instead of S1000_.
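
A minimal sketch of a base-36 increment with correct carry handling, operating on the digits between the 'S' and the '_'. The function name incrementSeqId is hypothetical, and the code assumes an ASCII-like character set where 'A'..'Z' are contiguous. In the Itanium ABI numbering, the empty id ("S_") is followed by "0" ("S0_").

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Increment a base-36 sequence id whose digits are 0-9 followed by A-Z.
inline std::string incrementSeqId(std::string id) {
  if (id.empty())
    return "0"; // "S_" is followed by "S0_"
  for (std::size_t i = id.size(); i-- > 0;) {
    char &c = id[i];
    assert((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z'));
    if (c == 'Z') {
      c = '0'; // wrap this digit and carry into the next one
      continue;
    }
    c = (c == '9') ? 'A' : static_cast<char>(c + 1);
    return id; // no carry left
  }
  // Every digit was 'Z': the carry produces a new leading '1'.
  id.insert(id.begin(), '1');
  return id;
}
```

With this logic "ZZZ" becomes "1000", which is the behavior the fix restores.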

BUG= https://codereview.chromium.org/385273002/
R=wala@chromium.org

Review URL: https://codereview.chromium.org/390533002

78b4c0b8

Subzero: Deal with substitutions in the primitive remangler. · 217dc082

authored Jul 11, 2014

https://refspecs.linuxbase.org/cxxabi-1.75.html#mangling-compression
describes the mechanism for compressing mangled strings by using substitutions of the form S[0-9A-Z]*_ to represent repeated components.

When the prefix is handled as wrapping inside a namespace, the base-36 substitution numbers all have to be incremented.

This is implemented in a very simple way by scanning the string only for instances of the substitution pattern.

Unfortunately, false matches are possible because the S[0-9A-Z]*_ pattern can be a substring of the type name, or can span other components of the mangled name. Getting this completely right would essentially require a full demangling parser - see the ~4000 lines of code in cxa_demangle.cpp and ItaniumMangle.cpp.

Since this is just for testing, any false matches will likely cause a linking error and the test can be rewritten to avoid false matches.
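
The scanning approach and its false-match risk can be sketched with std::regex. This is an illustration under stated assumptions, not the remangler's code; the function names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Return every substring that looks like a substitution, S[0-9A-Z]*_,
// without parsing the mangled name.
inline std::vector<std::string>
findSubstitutionCandidates(const std::string &mangled) {
  static const std::regex pattern("S[0-9A-Z]*_");
  std::vector<std::string> hits;
  for (std::sregex_iterator it(mangled.begin(), mangled.end(), pattern), end;
       it != end; ++it)
    hits.push_back(it->str());
  return hits;
}

// Usage sketch: one genuine substitution and one false match.
inline void findSubstitutionCandidatesExample() {
  // void f(Foo*, Foo*): the second parameter is the substitution S0_.
  std::vector<std::string> real = findSubstitutionCandidates("_Z1fP3FooS0_");
  assert(real.size() == 1 && real[0] == "S0_");
  // void US_FLAG(): "S_" is part of the identifier, not a substitution.
  std::vector<std::string> fake = findSubstitutionCandidates("_Z7US_FLAGv");
  assert(fake.size() == 1 && fake[0] == "S_");
}
```

The second case is the kind of false match described above: a pure pattern scan cannot tell that the text sits inside a length-prefixed identifier.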

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/385273002

217dc082

Clean up exit status and globals processing in llvm2ice. · b164d208

authored Jul 11, 2014

Makes IceTranslator.ExitStatus a boolean (rather than int), and changes
the code to check the flag when done. Fixes a bug introduced in
https://codereview.chromium.org/387023002.

Also cleans up the (Ice) Converter class to handle globals processing,
rather than doing it in llvm2ice.cpp.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3894
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/387023002

b164d208

10 Jul, 2014 1 commit

Subzero: Fix a regalloc bug involving too-aggressive AllowRegisterOverlap. · ca662e9d

authored Jul 10, 2014

See the BUG description for more details.  In short, the register allocator
was inappropriately honoring AllowRegisterOverlap even when the variable's
live range overlaps with an Unhandled variable precolored to the preferred
register.

Also changes legalize() logic to recognize when a variable is guaranteed
to ultimately have a physical register due to infinite weight, and not
create a new temporary in those cases.

Finally, dumps RegisterPreference and AllowRegisterOverlap info for
Variables for improved diagnostics.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3897
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/380363002

ca662e9d

09 Jul, 2014 3 commits

Subzero: Add "make format-diff" target. · 240e0f8a

authored Jul 09, 2014

This invokes clang-format-diff.py so you can easily reformat just
the code you touched.

(Caution, this may not apply to new files.)

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/372133002

240e0f8a

Add support for passing and returning vectors in accordance with the x86 calling convention. · 45a06236

authored Jul 09, 2014

- Add TargetLowering::lowerArguments() as a new stage in TargetLowering.
- Add support for passing arguments/return values in XMM registers in the x86 target.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/372113005

45a06236

Add scalar lowering for sqrt intrinsic. · f37fbbe9

authored Jul 09, 2014

Re-used test_arith_main.cpp, mostly to share the set of interesting
floating point constants.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org, wala@chromium.org

Review URL: https://codereview.chromium.org/384443003

f37fbbe9