Commits · cfe5146fc08cd992aed6aefcee5f3d8642b4c2d8 · Chen Yisong / swiftshader

25 Jul, 2014 1 commit

Use movss to implement insertelement when elements = 4 and index = 0. · cfe5146f

authored Jul 25, 2014

This avoids using a pair of shufps instructions as the previous lowering
was doing.  Instead, we use movss to copy the element to be inserted
into the lower 32 bits of the destination.
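
A minimal scalar sketch of the effect described above, assuming the register-to-register form of movss (lane 0 is copied, lanes 1 to 3 of the destination are preserved). The names `V4F32`, `movssModel`, and `insertElement0` are hypothetical and are not taken from the Subzero sources.

```cpp
#include <array>

using V4F32 = std::array<float, 4>;

// Scalar model of register-to-register movss: lane 0 of Dest is replaced by
// lane 0 of Src, and lanes 1..3 of Dest are left unchanged.
inline V4F32 movssModel(V4F32 Dest, const V4F32 &Src) {
  Dest[0] = Src[0];
  return Dest;
}

// insertelement <4 x float> Vec, float Elt, i32 0, expressed with the model.
inline V4F32 insertElement0(const V4F32 &Vec, float Elt) {
  V4F32 EltInLane0 = {Elt, 0.0f, 0.0f, 0.0f};
  return movssModel(Vec, EltInLane0);
}
```

Only index 0 can be handled this way, because movss always writes the lowest lane.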

Define InstX8632Movss as a Binop, the class to which it properly
belongs.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/412353005

cfe5146f

24 Jul, 2014 4 commits

Lower the fcmp instruction for <4 x float> operands. · ce0ca8f8

authored Jul 24, 2014

Most fcmp conditions map directly to single x86 instructions. For
these, the lowering is table driven.
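
A rough sketch of what such a table could look like, assuming the cmpps immediate predicate encoding (0=eq, 1=lt, 2=le, 3=unord, 4=neq, 5=nlt, 6=nle, 7=ord). The struct, table, and function names are hypothetical; the actual Subzero table may be organized differently.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical table entry: the cmpps predicate to use and whether the two
// source operands must be swapped before the compare.
struct FcmpEntry {
  const char *Cond;   // LLVM fcmp condition name
  uint8_t Predicate;  // cmpps imm8 predicate
  bool SwapOperands;
};

static const FcmpEntry FcmpTable[] = {
    {"oeq", 0, false}, {"olt", 1, false}, {"ole", 2, false},
    {"uno", 3, false}, {"une", 4, false}, {"uge", 5, false},
    {"ugt", 6, false}, {"ord", 7, false},
    // a > b is computed as b < a, and so on.
    {"ogt", 1, true},  {"oge", 2, true},
    {"ult", 6, true},  {"ule", 5, true},
};

// Returns the entry for Cond, or nullptr for conditions (such as "one" and
// "ueq") that have no single-instruction form in this sketch.
inline const FcmpEntry *lookupFcmp(const char *Cond) {
  for (const FcmpEntry &E : FcmpTable)
    if (std::strcmp(E.Cond, Cond) == 0)
      return &E;
  return nullptr;
}
```

Conditions missing from the table would need a combination of compares, which is why the commit says "most" rather than "all".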

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/413053002

ce0ca8f8

Lower the select instruction when the operands are of vector type. · 9cb61e2f

authored Jul 24, 2014

Select of vectors is implemented by appropriately masking and
combining the inputs with sign extend / bitwise operations
and without the use of branches.
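
The masking idea can be sketched per lane as follows. This is a scalar model under the assumption that the condition arrives as one boolean per lane; `V4I32` and `selectV4` are illustrative names only.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V4I32 = std::array<uint32_t, 4>;

// Cond holds one boolean per lane (0 or 1), as in a <4 x i1> value.
inline V4I32 selectV4(const V4I32 &Cond, const V4I32 &A, const V4I32 &B) {
  V4I32 Result{};
  for (std::size_t I = 0; I < 4; ++I) {
    // Sign extend the i1 to a full-width mask: 1 -> 0xFFFFFFFF, 0 -> 0.
    uint32_t Mask = 0u - (Cond[I] & 1u);
    // Take A where the mask is set and B elsewhere, with no branch.
    Result[I] = (A[I] & Mask) | (B[I] & ~Mask);
  }
  return Result;
}
```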

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/417653004

9cb61e2f

Fix a counter in the test_global crosstest. · 656d1767

authored Jul 24, 2014

Change TotalTests so that the test count matches up with the number of
recorded passes and failures.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/415803004

656d1767

Subzero: Fix a regalloc eviction bug. · 68e28192

authored Jul 24, 2014

We don't need/want to evict an inactive live range when it doesn't
overlap with the live range currently being considered.

This is especially important for Variables representing scratch
registers that are killed by call instructions.  These register
assignments should obviously never be evicted.

Note that the algorithm that computes the min-weight register to evict
doesn't consider inactive and non-overlapping live ranges.
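
The overlap condition can be illustrated with a simplified model in which a live range is a list of half-open segments. The types and function names here are hypothetical and much simpler than the real allocator's data structures.

```cpp
#include <vector>

// A live range modeled as a list of half-open [Start, End) segments.
struct Segment {
  int Start;
  int End;
};
using LiveRange = std::vector<Segment>;

// True if any segment of A intersects any segment of B.
inline bool overlaps(const LiveRange &A, const LiveRange &B) {
  for (const Segment &SA : A)
    for (const Segment &SB : B)
      if (SA.Start < SB.End && SB.Start < SA.End)
        return true;
  return false;
}

// An inactive range is only worth evicting if it conflicts with the range
// currently being allocated.
inline bool shouldConsiderForEviction(const LiveRange &Inactive,
                                      const LiveRange &Current) {
  return overlaps(Inactive, Current);
}
```

In this model, a range with segments [0, 4) and [20, 30) does not conflict with a current range of [5, 10), so it would be left alone.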

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3903
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/417933004

68e28192

23 Jul, 2014 3 commits

Lower icmp operations between vector values. · 9a0168a9

authored Jul 23, 2014

SSE2 only has signed integer comparison. Unsigned compares are
implemented by inverting the sign bits of the operands and doing a
signed compare.

A common pattern in clang generated IR is a vector compare which
generates an i1 vector followed by a sign extension of the result of the
compare. The x86 comparison instructions already generate sign extended
values, so we can eliminate unnecessary sext operations that follow
compares in the IR.
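
Both points can be seen in a one-lane scalar sketch. The function names are illustrative, and the signed compare is modeled on pcmpgtd, which produces an all-ones lane when the comparison is true.

```cpp
#include <cstdint>

// Signed greater-than for one 32-bit lane. The result is already a full-width
// mask (all ones or all zeros), so a following sext from i1 is redundant.
inline uint32_t signedGtMask(uint32_t A, uint32_t B) {
  return static_cast<int32_t>(A) > static_cast<int32_t>(B) ? 0xFFFFFFFFu : 0u;
}

// Unsigned greater-than built from the signed compare by flipping the sign
// bit of both operands, which maps unsigned order onto signed order.
inline uint32_t unsignedGtMask(uint32_t A, uint32_t B) {
  const uint32_t SignBit = 0x80000000u;
  return signedGtMask(A ^ SignBit, B ^ SignBit);
}
```

For example, 0xFFFFFFFF compared with 1 is "less than" as signed values but "greater than" as unsigned values, and the sign-bit flip recovers the unsigned answer.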

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/412593002

9a0168a9

Add llvm-mc to the set of commands lit knows about. · 87543355

Jim Stichnoth authored Jul 23, 2014

BUG= none
R=wala@chromium.org

Review URL: https://codereview.chromium.org/415583003

87543355

Add -arch=x86 and -filetype=obj to all RUN lines involving llvm-mc. · d9ea7ad5

authored Jul 22, 2014

This fixes the failing validation of callindirect.pnacl.ll.

The following tests fail to validate (some due to the
addition of -filetype=obj):
 * convert.ll
 * globalinit.pnacl.ll
 * mangle.ll
 * nacl-atomic-fence-all.ll
 * shift.ll

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/410743005

d9ea7ad5

22 Jul, 2014 3 commits

Fix legalization of source operand to bsr and bsf. · 53c5e609

authored Jul 22, 2014

The source operand to bsr and bsf must be in a register or memory.

BUG=none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/407093014

53c5e609

Validate the assembly code that Subzero generates in unit tests. · 927cc171

authored Jul 22, 2014

Add RUN lines to applicable lit tests to pipe the output of Subzero (in
-Om1 and/or -O2 mode) to llvm-mc for validation.

Note that the following unit tests fail the validation:
 * callindirect.pnacl.ll
 * mangle.ll
 * nacl-other-intrinsics.ll

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/411693003

927cc171

Factor out common vector crosstesting code. · 89a7c2bd

authored Jul 22, 2014

Add vectors.h and vector.def to hold vector type declarations and useful
vector utilities. Change the existing tests to use this new header where
applicable (arith, vector_ops).

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/407543003

89a7c2bd

21 Jul, 2014 1 commit

Use lowerCast instead of inlined _movzx, to get legalization, for memset. · 957c50d9

authored Jul 21, 2014

Otherwise, there can be a movzx reg, 0, which is illegal,
when the memset value is constant 0.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/402253002

957c50d9

18 Jul, 2014 4 commits

Fix array index in test initialization. · 35ec373d

authored Jul 18, 2014

Index() % NumElementsInType should be Index() % NumValues.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/404553007

35ec373d

Lower stacksave and restore intrinsics. · 7b34b597

authored Jul 18, 2014

Just copies the current stack pointer to/from a variable.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/396993009

7b34b597

Lower byte swap intrinsic. · 7fa813b3

authored Jul 18, 2014

Clump the negate instruction w/ the bswap instruction as an
"inplace" operation. One difference is that bswap has stricter
requirements on the operand type.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org, wala@chromium.org

Review URL: https://codereview.chromium.org/401533002

7fa813b3

Lower insertelement and extractelement. · 49889239

authored Jul 18, 2014

Use instructions that do the operations in registers and that are
available in SSE2. Spill to memory to perform the operation in the
absence of any other reasonable options (v16i8 and v16i1).

Unfortunately there is no natural class of SSE2 instructions that
insertelement / extractelement can get lowered to for all vector types
(though pinsr[bwd] and pextr[bwd] are available in SSE4.1). There are
in some cases a large number of choices available for lowering and I
have not looked into which choices are the best yet, besides using
LLVM output as a guide.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/401523003

49889239

17 Jul, 2014 1 commit

Lower the rest of the vector arithmetic operations. · 7fa22d8a

authored Jul 17, 2014

The instructions emitted by the lowering operations require memory
operands to be aligned to 16 bytes. Since there is no support for
aligning memory operands in Subzero, do the arithmetic in registers for
now.

Add vector arithmetic to the arith crosstest. Pass the -mstackrealign
parameter to the crosstest clang so that llc code called back from
Subzero code (helper calls) doesn't assume that the stack is aligned at
the entry to the call.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/397833002

7fa22d8a

16 Jul, 2014 2 commits

Lower casting operations that involve vector types. · 83b8036b

authored Jul 16, 2014

Impacted instructions:

bitcast {v4f32, v4i32, v8i16, v16i8} <-> {v4f32, v4i32, v8i16, v16i8}
bitcast v8i1 <-> i8
bitcast v16i1 <-> i16

(There was already code present to handle trivial bitcasts like v16i1 <-> v16i1.)

[sz]ext v4i1 -> v4i32
[sz]ext v8i1 -> v8i16
[sz]ext v16i1 -> v16i8

trunc v4i32 -> v4i1
trunc v8i16 -> v8i1
trunc v16i8 -> v16i1

[su]itofp v4i32 -> v4f32
fpto[su]i v4f32 -> v4i32

Where there is a relatively simple lowering to x86 instructions, it has been used. Otherwise a helper call is used.

Some lowerings require a materialization of an integer vector with 1s in each entry. Since there is no support for vector constant pools, the constant is materialized purely through register operations.
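
One common register-only idiom for this is sketched below as a scalar model: compare a register with itself to get all ones (that is, -1 in each lane), then subtract that from zero. Whether Subzero uses exactly this sequence is an assumption, and all names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V4I32 = std::array<uint32_t, 4>;

// Model of pcmpeqd: each lane becomes all ones if the inputs are equal.
inline V4I32 pcmpeqdModel(const V4I32 &A, const V4I32 &B) {
  V4I32 R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = (A[I] == B[I]) ? 0xFFFFFFFFu : 0u;
  return R;
}

// Model of psubd: lane-wise wrapping subtraction.
inline V4I32 psubdModel(const V4I32 &A, const V4I32 &B) {
  V4I32 R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = A[I] - B[I];
  return R;
}

// Builds <1, 1, 1, 1> without loading a constant from memory.
inline V4I32 makeVectorOfOnesSketch() {
  V4I32 Zero{};                               // stands in for pxor reg, reg
  V4I32 MinusOne = pcmpeqdModel(Zero, Zero);  // every lane is 0xFFFFFFFF
  return psubdModel(Zero, MinusOne);          // 0 - (-1) == 1 in every lane
}
```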

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/383303003

83b8036b

Lower bitmanip intrinsics, assuming absence of BMI/SSE4.2 for now. · e4da26f6

authored Jul 15, 2014

We'll need the fallbacks in any case. However, once we've
decided on how to specify the CPU features of the user
machine we can use the nicer LZCNT/TZCNT/POPCNT as well.

Adds cmov, bsf, and bsr instructions.

Calls a popcount helper function for machines without SSE4.2.

Not handling bswap yet (which can also take i16 params).
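
A sketch of how count-leading-zeros and count-trailing-zeros can be derived from bsr and bsf, written as scalar models. The zero-input case needs separate handling because bsr and bsf leave their destination undefined for a zero source; on hardware that selection could be done with cmov. The function names are hypothetical, and the exact instruction sequence Subzero emits is not given in the commit message.

```cpp
#include <cstdint>

// Model of bsr: index of the highest set bit. Meaningful for nonzero input.
inline uint32_t bsrModel(uint32_t X) {
  uint32_t Index = 0;
  while (X >>= 1)
    ++Index;
  return Index;
}

// Model of bsf: index of the lowest set bit. Meaningful for nonzero input.
inline uint32_t bsfModel(uint32_t X) {
  uint32_t Index = 0;
  while (X != 0 && (X & 1u) == 0) {
    X >>= 1;
    ++Index;
  }
  return Index;
}

// ctlz: 31 minus the highest set bit index, with 32 selected for zero input.
inline uint32_t ctlz32(uint32_t X) { return X == 0 ? 32u : 31u - bsrModel(X); }

// cttz: the lowest set bit index, with 32 selected for zero input.
inline uint32_t cttz32(uint32_t X) { return X == 0 ? 32u : bsfModel(X); }
```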

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org, wala@chromium.org

Review URL: https://codereview.chromium.org/390443005

e4da26f6

15 Jul, 2014 2 commits

Various improvements related to legalization code. · ad8f7265

authored Jul 14, 2014

1) In makeHelperCall(), function pointers that are created should have
type IceType_i32, not the functions' own return type.

2) In legalize(), change the name of WillHaveRegister to
MustHaveRegister. Add a comment to clarify the condition being computed.

3) In legalize(), add an assert to make sure that vector "constants"
don't get legalized (other than undef). There should be no constants of
vector type.

4) In copyToReg(), replace an unnecessary use of Src->getType().

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/385133006

ad8f7265

Fix floating point vector frem lowering. · 0ecabc82

authored Jul 14, 2014

The frem operation takes two arguments.
Pass both Src0 and Src1 to __frem_v4f32.
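
For illustration, a helper of this kind could look like the following sketch, which assumes the usual frem semantics (the result takes the sign of the dividend, as with `std::fmod`). The name `fremV4F32Sketch` is hypothetical and this is not the actual body of `__frem_v4f32`.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using V4F32 = std::array<float, 4>;

// Lane-wise floating point remainder. Both operands are required: Src0 is
// the dividend and Src1 is the divisor.
inline V4F32 fremV4F32Sketch(const V4F32 &Src0, const V4F32 &Src1) {
  V4F32 Result{};
  for (std::size_t I = 0; I < 4; ++I)
    Result[I] = std::fmod(Src0[I], Src1[I]);
  return Result;
}
```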

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/387153002

0ecabc82

14 Jul, 2014 2 commits

Remove memcpy test workaround for name mangling substitutions. · 140bb0d8

authored Jul 14, 2014

Now that the name mangling is a bit smarter (from commit: 217dc082),
we don't need to avoid having the same type twice in the function
signature.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/389683003

140bb0d8

Subzero: lower the rest of the atomic operations. · a3a01a2f

authored Jul 14, 2014

64-bit ops are expanded via a cmpxchg8b loop.

64/32-bit and/or/xor are also expanded into a cmpxchg /
cmpxchg8b loop.
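
The shape of such a loop can be sketched in portable C++ with `std::atomic`, here for a 64-bit atomic AND. This is an analogy at the source level, not the emitted x86 sequence, and the function name is hypothetical.

```cpp
#include <atomic>
#include <cstdint>

// Atomic fetch-and-AND written as a compare-exchange retry loop.
// Returns the value that was in Target before the update.
inline uint64_t fetchAndViaCmpxchg(std::atomic<uint64_t> &Target,
                                   uint64_t Operand) {
  uint64_t Expected = Target.load();
  // On failure, compare_exchange_weak refreshes Expected with the current
  // value, so the desired value is recomputed on every iteration.
  while (!Target.compare_exchange_weak(Expected, Expected & Operand)) {
  }
  return Expected;
}
```

The same loop works for or and xor by changing the operator used to compute the desired value.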

Add a cross test for atomic RMW operations and
compare and swap.

Misc: Test that atomic.is.lock.free can be optimized out if result is ignored.

TODO:
* optimize compare and swap with compare+branch further down
instruction stream.

* optimize atomic RMW when the return value is ignored
(adds a locked field to binary ops though).

* We may want to do some actual target-dependent basic
block splitting + expansion (the instructions inserted by
the expansion must reference the pre-colored registers,
etc.). Otherwise, we are currently getting by with modeling
the extended liveness of the variables used in the loops
using fake uses.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=jfb@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/362463002

a3a01a2f

11 Jul, 2014 4 commits

Lower vector floating point arithmetic operations. · 8d1072e7

authored Jul 11, 2014

This adds lowering code for fadd, fsub, fmul, fdiv, and frem. frem, having no native x86 counterpart, is implemented by making a helper call.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/389653002

8d1072e7

Subzero: Fix the name mangling code's base-36 increment. · 78b4c0b8

authored Jul 11, 2014

SZZZ_ was being incremented to S0000_ instead of S1000_.
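
A minimal sketch of a base-36 increment with the carry handled correctly, operating on the digits between `S` and `_`. It assumes the Itanium ABI convention that `S_` comes first and `S0_` second; the function name is hypothetical and this is not the Subzero implementation.

```cpp
#include <cstddef>
#include <string>

// Increments a substitution sequence id written in base 36 with digits
// 0-9 then A-Z. An empty id (from "S_") becomes "0".
inline std::string incrementSeqId(std::string Id) {
  if (Id.empty())
    return "0";
  for (std::size_t I = Id.size(); I-- > 0;) {
    char &C = Id[I];
    if (C == 'Z') {
      C = '0'; // wrap this digit and carry into the next one
      continue;
    }
    C = (C == '9') ? 'A' : static_cast<char>(C + 1);
    return Id;
  }
  // Every digit was Z, so the carry produces a new leading 1.
  return "1" + Id;
}
```

With this logic the digits `ZZZ` become `1000`, matching the corrected `SZZZ_` to `S1000_` behavior.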

BUG= https://codereview.chromium.org/385273002/
R=wala@chromium.org

Review URL: https://codereview.chromium.org/390533002

78b4c0b8

Subzero: Deal with substitutions in the primitive remangler. · 217dc082

authored Jul 11, 2014

https://refspecs.linuxbase.org/cxxabi-1.75.html#mangling-compression
describes the mechanism for compressing mangled strings by using substitutions of the form S[0-9A-Z]*_ to represent repeated components.

When the prefix is handled as wrapping inside a namespace, the base-36 substitution numbers all have to be incremented.

This is implemented in a very simple way by scanning the string only for instances of the substitution pattern.

Unfortunately, false matches are possible because the S[0-9A-Z]*_ pattern can be a substring of the type name, or can span other components of the mangled name. Getting this completely right would essentially require a full demangling parser - see the ~4000 lines of code in cxa_demangle.cpp and ItaniumMangle.cpp.

Since this is just for testing, any false matches will likely cause a linking error and the test can be rewritten to avoid false matches.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/385273002

217dc082

Clean up exit status and globals processing in llvm2ice. · b164d208

authored Jul 11, 2014

Makes IceTranslator.ExitStatus a boolean (rather than int), and changes
code to check flag when done. Fixes bug introduced in
https://codereview.chromium.org/387023002.

Also cleans up the (Ice) Converter class to handle globals processing,
rather than doing it in llvm2ice.cpp.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3894
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/387023002

b164d208

10 Jul, 2014 1 commit

Subzero: Fix a regalloc bug involving too-aggressive AllowRegisterOverlap. · ca662e9d

authored Jul 10, 2014

See the BUG description for more details.  In short, the register allocator
was inappropriately honoring AllowRegisterOverlap even when the variable's
live range overlaps with an Unhandled variable precolored to the preferred
register.

Also changes legalize() logic to recognize when a variable is guaranteed
to ultimately have a physical register due to infinite weight, and not
create a new temporary in those cases.

Finally, dumps RegisterPreference and AllowRegisterOverlap info for
Variables for improved diagnostics.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3897
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/380363002

ca662e9d

09 Jul, 2014 4 commits

Subzero: Add "make format-diff" target. · 240e0f8a

authored Jul 09, 2014

This invokes clang-format-diff.py so you can easily reformat just
the code you touched.

(Caution, this may not apply to new files.)

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/372133002

240e0f8a

Add support for passing and returning vectors in accordance with the x86 calling convention. · 45a06236

authored Jul 09, 2014

- Add TargetLowering::lowerArguments() as a new stage in TargetLowering.
- Add support for passing arguments/return values in XMM registers in the x86 target.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/372113005

45a06236

Add scalar lowering for sqrt intrinsic. · f37fbbe9

authored Jul 09, 2014

Re-used test_arith_main.cpp, mostly to share the set of interesting
floating point constants.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org, wala@chromium.org

Review URL: https://codereview.chromium.org/384443003

f37fbbe9

Avoid assigning esp (or ebp for framepointer-using frames) in Om1. · 9559899d

authored Jul 09, 2014

For ebp, exclude as needed. For esp, don't mark it as
an int register.

Not sure exactly how to do a targeted test for this Om1
register allocator. The Om1 regalloc seems to start w/ a
fresh whitelist after each instruction, so it may assign
the same register (e.g., eax), as an earlier instruction.
Without pre-colored registers, I'm not sure how to force it
to allocate something other than the first few registers.
I do have a test case that has a ton of pre-colored
registers, (e.g., cmpxchg8b), but that is a different CL:
https://codereview.chromium.org/362463002/

Encountered for:
BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/369573005

9559899d

08 Jul, 2014 1 commit

Subzero: Temporary fix for build error. · e169e66d

authored Jul 08, 2014

The compile error was introduced in https://codereview.chromium.org/361733002/ .

BUG= none
R=wala@chromium.org

Review URL: https://codereview.chromium.org/376923003

e169e66d

07 Jul, 2014 2 commits

Add support for vector types. · 928f1297

authored Jul 07, 2014

- Add vector types to the type table.

- Add support for parsing vector types in llvm2ice.

- Legalize undef vector values to zero. Test that undef vector values are lowered correctly.

BUG=none
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/353553004

928f1297

Update Subzero to start parsing PNaCl bitcode files. · 8d7abae9

authored Jul 07, 2014

This patch only handles global addresses in PNaCl bitcode files.
Function blocks are still not parsed. Also, factors out a common API
for translation, so that generated ICE can always be translated using
the same code.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3892
R=jvoung@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/361733002

8d7abae9

29 Jun, 2014 1 commit

Subzero: Partial implementation of global initializers. · de4ca71e

authored Jun 29, 2014

This is still missing a couple things:

1. It only supports flat arrays and zeroinitializers. Arrays of structs are not yet supported.

2. Initializers can't yet contain relocatables, e.g. the address of another global.

Some changes are made to work around an llvm-mc assembler bug. When assembling using intel syntax, llvm-mc doesn't correctly parse symbolic constants or add relocation entries in some circumstances. Call instructions work, and use in a memory operand works, e.g. mov eax, [ArrayBase+4*ecx]. To work around this, we adjust legalize() to not allow ConstantRelocatable by default, except for memory operands and when called from lowerCall(), so the relocatable ends up being the source operand of a mov instruction. Then, the mov emit routine actually emits an lea instruction for such moves.

A few lit tests needed to be adjusted to make szdiff work properly with respect to global initializers.

In the new cross test, the driver calls test code that returns a pointer to an array with a global initializer, and the driver compares the arrays returned by llc and Subzero.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/358013003

de4ca71e

27 Jun, 2014 1 commit

Refactor llvm2ice so that Ice can be built while reading bitcode. · e1e013cf

Karl Schimpf authored Jun 27, 2014

BUG=None
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/350933002

e1e013cf

26 Jun, 2014 1 commit

Subzero: Add 'not' to the list of LLVM commands in lit.cfg. · cc27a53a

authored Jun 26, 2014

Without this being in the command substitutions list, lit will rely on the 'not' command being in $PATH.

The substitution code is adapted from llvm/test/lit.cfg to add word-break regexps to the list.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/344063004

cc27a53a

25 Jun, 2014 1 commit

Add atomic load/store, fetch_add, fence, and is-lock-free lowering. · 5cd240df

authored Jun 25, 2014

Loads/stores w/ type i8, i16, and i32 are converted to
plain load/store instructions and lowered w/ the plain
lowerLoad/lowerStore.  Atomic stores are followed by an mfence
for sequential consistency.

For 64-bit types, use movq to do 64-bit memory
loads/stores (vs the usual load/store being broken into
separate 32-bit load/stores). This means bitcasting the
i64 -> f64, first (which splits the load of the value to be
stored into two 32-bit ops) then stores in a single op. For
load, load into f64 then bitcast back to i64 (which splits
after the atomic load). This follows what GCC does for
c++11 std::atomic<uint64_t> load/store methods (uses movq
when -mfpmath=sse). This introduces some redundancy between
movq and movsd, but the convention seems to be to use movq
when working with integer quantities. Otherwise, movsd
could work too. The difference seems to be in whether the
XMM register's upper 64 bits are filled with 0.
Zero-extending could help avoid partial register
stalls.
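
The i64 <-> f64 reinterpretation described above must preserve every bit. A source-level sketch using `memcpy` is shown below. It models only the bit-preserving copy through a floating point value, not the atomicity of a single movq, and all names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(uint64_t), "f64 must be 64 bits wide");

// Reinterprets the 64 bits of an integer as a double, without conversion.
inline double bitcastToF64(uint64_t Bits) {
  double D;
  std::memcpy(&D, &Bits, sizeof D);
  return D;
}

// Reinterprets the 64 bits of a double as an integer, without conversion.
inline uint64_t bitcastToI64(double D) {
  uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof Bits);
  return Bits;
}

// Store: bitcast i64 -> f64 first, then write the value as one 8-byte unit.
inline void store64ViaF64(void *Dest, uint64_t Value) {
  double Tmp = bitcastToF64(Value);
  std::memcpy(Dest, &Tmp, sizeof Tmp);
}

// Load: read one 8-byte unit as f64, then bitcast back to i64.
inline uint64_t load64ViaF64(const void *Src) {
  double Tmp;
  std::memcpy(&Tmp, Src, sizeof Tmp);
  return bitcastToI64(Tmp);
}
```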

Handle up to i32 fetch_add. TODO: add i64 via a cmpxchg loop.

TODO: add some runnable crosstests to make sure that this
doesn't do funny things to integer bit patterns that happen
to look like signaling NaNs and quiet NaNs. However, the system
clang would not know how to handle "llvm.nacl.*" if we choose to
target that level directly via .ll files. Or, (a) we use old-school __sync
methods (sync_fetch_and_add w/ 0 to load) or (b) require buildbot's
clang/gcc to support c++11...

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/342763004

5cd240df

24 Jun, 2014 1 commit

Bitcast of 64-bit immediates may need to split the immediate, not a var. · 1ee34165

authored Jun 24, 2014

Currently, the integer immediate is legalized to a
64-bit integer register first, and then the lower/upper
parts of that register are used for the bitcast.
However, mov(64_bit_reg, imm) done by the legalization
isn't legal.

Similarly, trunc of a 64-bit immediate needs to take the
lower half of the immediate, not legalize to a var first.
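
Splitting the immediate itself, rather than a 64-bit register holding it, amounts to the following sketch. The function names are hypothetical.

```cpp
#include <cstdint>

// Lower 32 bits of a 64-bit immediate. trunc i64 -> i32 of an immediate is
// exactly this value.
inline uint32_t loHalf(uint64_t Imm) { return static_cast<uint32_t>(Imm); }

// Upper 32 bits of a 64-bit immediate.
inline uint32_t hiHalf(uint64_t Imm) {
  return static_cast<uint32_t>(Imm >> 32);
}
```

Each half is then an ordinary 32-bit immediate that can be used directly as an operand.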

This shifts the legalization code around.

Other cases where immediates are illegal and legalized
are idiv/div, but for those cases 64-bit operands are
handled separately via a function call. The function
call code properly splits up immediate arguments.

BUG=none
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/348373005

1ee34165