- 25 Jul, 2014 1 commit
Matt Wala authored
This avoids the pair of shufps instructions that the previous lowering used. Instead, we use movss to copy the element to be inserted into the lower 32 bits of the destination. Define InstX8632Movss as a Binop, the class to which it properly belongs. BUG=none R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/412353005
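A lane-level sketch of the index-0 case, assuming the element to insert already sits in the low lane of an XMM register: register-to-register movss replaces only the low 32 bits of the destination and leaves the other three lanes alone. The helper names `movss` and `insertElement0` are hypothetical models, not Subzero code.

```cpp
#include <array>
#include <cassert>

using V4F = std::array<float, 4>;

// Model of register-to-register movss: copies lane 0 of Src into Dest
// and leaves lanes 1..3 of Dest unchanged.
V4F movss(V4F Dest, const V4F &Src) {
  Dest[0] = Src[0];
  return Dest;
}

// Hypothetical model of insertelement <4 x float> Vec, float Elt, i32 0.
// Assumes Elt is already in lane 0 of an XMM register, as a scalar float is.
V4F insertElement0(const V4F &Vec, float Elt) {
  V4F EltReg{Elt, 0.0f, 0.0f, 0.0f};
  return movss(Vec, EltReg);
}
```

No shuffle is needed for index 0 because movss already preserves the upper lanes.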
-
- 24 Jul, 2014 4 commits
Matt Wala authored
Most fcmp conditions map directly to single x86 instructions. For these, the lowering is table-driven. BUG=none R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/413053002
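A sketch of what such a table could look like for the packed-float case, assuming the standard SSE cmpps predicate immediates (0 = eq, 1 = lt, 2 = le, 3 = unord, 4 = neq, 5 = nlt, 6 = nle, 7 = ord) and that `ogt`, `oge`, `ult`, and `ule` are handled by swapping operands. The names `FCond`, `Table`, and `lowerFcmp` are hypothetical; `one` and `ueq` are left out because each needs two compares. A scalar model of one lane checks the mapping against IEEE comparison semantics.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// fcmp conditions that map to a single cmpps (one/ueq omitted).
enum class FCond { Oeq, Ogt, Oge, Olt, Ole, Ord, Uno, Ugt, Uge, Ult, Ule, Une };

// cmpps predicate immediates as defined by SSE.
enum Pred : uint8_t { EQ = 0, LT = 1, LE = 2, UNORD = 3, NEQ = 4, NLT = 5, NLE = 6, ORD = 7 };

struct Entry {
  Pred P;
  bool SwapOperands;
};

// Hypothetical lowering table, indexed by FCond.
constexpr Entry Table[] = {
    /* Oeq */ {EQ, false},    /* Ogt */ {LT, true},   /* Oge */ {LE, true},
    /* Olt */ {LT, false},    /* Ole */ {LE, false},  /* Ord */ {ORD, false},
    /* Uno */ {UNORD, false}, /* Ugt */ {NLE, false}, /* Uge */ {NLT, false},
    /* Ult */ {NLE, true},    /* Ule */ {NLT, true},  /* Une */ {NEQ, false},
};

// Scalar model of one cmpps lane (true means the lane becomes all ones).
bool evalPred(Pred P, float A, float B) {
  switch (P) {
  case EQ: return A == B;
  case LT: return A < B;
  case LE: return A <= B;
  case UNORD: return std::isnan(A) || std::isnan(B);
  case NEQ: return !(A == B);
  case NLT: return !(A < B);
  case NLE: return !(A <= B);
  case ORD: return !std::isnan(A) && !std::isnan(B);
  }
  return false;
}

bool lowerFcmp(FCond C, float A, float B) {
  const Entry &E = Table[static_cast<int>(C)];
  return E.SwapOperands ? evalPred(E.P, B, A) : evalPred(E.P, A, B);
}
```

Swapping operands lets one predicate serve two conditions, e.g. `ult(a, b)` is `nle(b, a)`, so the table stays one entry per condition.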
-
Matt Wala authored
Select of vectors is implemented by appropriately masking and combining the inputs with sign extend / bitwise operations and without the use of branches. BUG=none R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/417653004
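A lane-by-lane scalar model of the branch-free select, assuming 32-bit lanes: each i1 condition is sign-extended into a 0 or all-ones mask, and the two inputs are merged with AND, AND-NOT, and OR (the operations SSE2 `pand`, `pandn`, and `por` perform). `selectV4` is a hypothetical name, and the exact instruction sequence Subzero emits is not shown here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4 = std::array<uint32_t, 4>;

// Branch-free select: each i1 condition is sign-extended to a
// 0x00000000 / 0xFFFFFFFF mask, then the inputs are combined bitwise.
V4 selectV4(const std::array<bool, 4> &Cond, const V4 &T, const V4 &F) {
  V4 Result{};
  for (int I = 0; I < 4; ++I) {
    uint32_t Mask = 0u - static_cast<uint32_t>(Cond[I]); // sext i1 -> i32
    Result[I] = (Mask & T[I]) | (~Mask & F[I]);         // pand / pandn / por
  }
  return Result;
}
```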
-
Matt Wala authored
Change TotalTests so that the test count matches up with the number of recorded passes and failures. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/415803004
-
Jim Stichnoth authored
We don't need/want to evict an inactive live range when it doesn't overlap with the live range currently being considered. This is especially important for Variables representing scratch registers that are killed by call instructions. These register assignments should obviously never be evicted. Note that the algorithm that computes the min-weight register to evict doesn't consider inactive and non-overlapping live ranges. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3903 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/417933004
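A minimal sketch of the overlap test behind this rule, assuming a live range is a sorted list of disjoint half-open `[Start, End)` instruction-number segments. `LiveRange`, `overlaps`, and `shouldEvictInactive` are hypothetical names, not the allocator's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical live range: sorted, disjoint [Start, End) segments.
using LiveRange = std::vector<std::pair<int, int>>;

// Two-pointer walk over both segment lists.
bool overlaps(const LiveRange &A, const LiveRange &B) {
  std::size_t I = 0, J = 0;
  while (I < A.size() && J < B.size()) {
    if (A[I].first < B[J].second && B[J].first < A[I].second)
      return true;
    // Advance whichever segment ends first.
    if (A[I].second < B[J].second)
      ++I;
    else
      ++J;
  }
  return false;
}

// An inactive range holding the chosen register is evicted only if it
// actually overlaps the range currently being allocated.
bool shouldEvictInactive(int InactiveReg, const LiveRange &Inactive,
                         int ChosenReg, const LiveRange &Current) {
  return InactiveReg == ChosenReg && overlaps(Inactive, Current);
}
```

A scratch register killed at a single call site has a tiny range, so it usually fails the overlap test and keeps its register.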
-
- 23 Jul, 2014 3 commits
Matt Wala authored
SSE2 only has signed integer comparison. Unsigned compares are implemented by inverting the sign bits of the operands and doing a signed compare. A common pattern in clang-generated IR is a vector compare, which produces an i1 vector, followed by a sign extension of the compare's result. The x86 comparison instructions already produce sign-extended values, so we can eliminate unnecessary sext operations that follow compares in the IR. BUG=none R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/412593002
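A scalar model of the unsigned-compare trick for `v4i32`, assuming `pcmpgtd` semantics (signed greater-than producing 0 or all-ones per lane): XOR-ing both operands with `0x80000000` maps unsigned order onto signed order. The lane results are already 0 or all-ones, which is why a following `sext` is redundant. The helper names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4 = std::array<uint32_t, 4>;

// Model of pcmpgtd: signed lane-wise A > B, giving all ones or all zeros.
V4 pcmpgtd(const V4 &A, const V4 &B) {
  V4 R{};
  for (int I = 0; I < 4; ++I)
    R[I] = static_cast<int32_t>(A[I]) > static_cast<int32_t>(B[I]) ? 0xFFFFFFFFu : 0u;
  return R;
}

// Unsigned A > B: flip the sign bit of both operands (a pxor with
// 0x80000000 in every lane), then do the signed compare.
V4 icmpUgt(const V4 &A, const V4 &B) {
  V4 AF{}, BF{};
  for (int I = 0; I < 4; ++I) {
    AF[I] = A[I] ^ 0x80000000u;
    BF[I] = B[I] ^ 0x80000000u;
  }
  return pcmpgtd(AF, BF);
}
```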
-
Jim Stichnoth authored
BUG= none R=wala@chromium.org Review URL: https://codereview.chromium.org/415583003
-
Matt Wala authored
llvm-mc. This fixes the failing validation of callindirect.pnacl.ll. The following tests fail to validate (some due to the addition of -filetype=obj):
* convert.ll
* globalinit.pnacl.ll
* mangle.ll
* nacl-atomic-fence-all.ll
* shift.ll
BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/410743005
-
- 22 Jul, 2014 3 commits
Matt Wala authored
The source operand to bsr and bsf must be in a register or memory. BUG=none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/407093014
-
Matt Wala authored
Add RUN lines to applicable lit tests to pipe the output of Subzero (in -Om1 and/or -O2 mode) to llvm-mc for validation. Note that the following unit tests fail the validation:
* callindirect.pnacl.ll
* mangle.ll
* nacl-other-intrinsics.ll
BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/411693003
-
Matt Wala authored
Add vectors.h and vector.def to hold vector type declarations and useful vector utilities. Change the existing tests to use this new header where applicable (arith, vector_ops). BUG=none R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/407543003
-
- 21 Jul, 2014 1 commit
Jan Voung authored
Otherwise, there can be a movzx reg, 0, which is illegal, when the memset value is constant 0. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/402253002
-
- 18 Jul, 2014 4 commits
Matt Wala authored
Index() % NumElementsInType should be Index() % NumValues. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/404553007
-
Jan Voung authored
Just copies the current stack pointer to/from a variable. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/396993009
-
Jan Voung authored
Group the negate instruction with the bswap instruction as an "in-place" operation. One difference is that bswap has stricter requirements on the operand type. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882 R=stichnot@chromium.org, wala@chromium.org Review URL: https://codereview.chromium.org/401533002
-
Matt Wala authored
Use instructions that do the operations in registers and that are available in SSE2. Spill to memory to perform the operation in the absence of any other reasonable options (v16i8 and v16i1). Unfortunately there is no natural class of SSE2 instructions that insertelement / extractelement can get lowered to for all vector types (though pinsr[bwd] and pextr[bwd] are available in SSE4.1). There are in some cases a large number of choices available for lowering and I have not looked into which choices are the best yet, besides using LLVM output as a guide. BUG=none R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/401523003
-
- 17 Jul, 2014 1 commit
Matt Wala authored
The instructions emitted by the lowering operations require memory operands to be aligned to 16 bytes. Since there is no support for aligning memory operands in Subzero, do the arithmetic in registers for now. Add vector arithmetic to the arith crosstest. Pass the -mstackrealign parameter to the crosstest clang so that llc code called back from Subzero code (helper calls) doesn't assume that the stack is aligned at the entry to the call. BUG=none R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/397833002
-
- 16 Jul, 2014 2 commits
Matt Wala authored
Impacted instructions:
bitcast {v4f32, v4i32, v8i16, v16i8} <-> {v4f32, v4i32, v8i16, v16i8}
bitcast v8i1 <-> i8
bitcast v16i1 <-> i16
(There was already code present to handle trivial bitcasts like v16i1 <-> v16i1.)
[sz]ext v4i1 -> v4i32
[sz]ext v8i1 -> v8i16
[sz]ext v16i1 -> v16i8
trunc v4i32 -> v4i1
trunc v8i16 -> v8i1
trunc v16i8 -> v16i1
[su]itofp v4i32 -> v4f32
fpto[su]i v4f32 -> v4i32
Where there is a relatively simple lowering to x86 instructions, it has been used. Otherwise a helper call is used. Some lowerings require materializing an integer vector with 1s in each entry. Since there is no support for vector constant pools, the constant is materialized purely through register operations. BUG=none R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/383303003
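One register-only way to materialize a vector of 1s, shown as a scalar model under the assumption that `pcmpeqd`, `pxor`, and `psubd` are used (all are SSE2): comparing a register with itself yields -1 in every lane, and subtracting that from zero yields 1. This is an illustration; the exact sequence the commit emits is not spelled out here, and `makeVectorOfOnes` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4 = std::array<uint32_t, 4>;

// pcmpeqd X, X: every lane equals itself, so every lane becomes all ones (-1).
V4 pcmpeqd(const V4 &A, const V4 &B) {
  V4 R{};
  for (int I = 0; I < 4; ++I)
    R[I] = A[I] == B[I] ? 0xFFFFFFFFu : 0u;
  return R;
}

// pxor X, X: the usual register-zeroing idiom.
V4 pxor(const V4 &A, const V4 &B) {
  V4 R{};
  for (int I = 0; I < 4; ++I)
    R[I] = A[I] ^ B[I];
  return R;
}

// psubd: lane-wise subtraction, wrapping modulo 2^32.
V4 psubd(const V4 &A, const V4 &B) {
  V4 R{};
  for (int I = 0; I < 4; ++I)
    R[I] = A[I] - B[I];
  return R;
}

// Works regardless of what AnyReg currently holds; no memory constant needed.
V4 makeVectorOfOnes(const V4 &AnyReg) {
  V4 MinusOnes = pcmpeqd(AnyReg, AnyReg);
  V4 Zeros = pxor(AnyReg, AnyReg);
  return psubd(Zeros, MinusOnes); // 0 - (-1) == 1 in each lane
}
```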
Jan Voung authored
We'll need the fallbacks in any case. However, once we've decided on how to specify the CPU features of the user machine we can use the nicer LZCNT/TZCNT/POPCNT as well. Adds cmov, bsf, and bsr instructions. Calls a popcount helper function for machines without SSE4.2. Not handling bswap yet (which can also take i16 params). BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882 R=stichnot@chromium.org, wala@chromium.org Review URL: https://codereview.chromium.org/390443005
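A scalar sketch of how ctlz and cttz can be built from bsr / bsf plus cmov, assuming a sequence similar to what LLVM emits for 32-bit operands: bsr and bsf set ZF and leave the destination undefined for a zero input, so a cmov supplies the zero-input result. For ctlz, preloading 63 and XOR-ing with 31 gives `31 - bsr(x)` for nonzero inputs and 32 for zero. All function names are hypothetical models.

```cpp
#include <cassert>
#include <cstdint>

// Model of bsr: returns false (ZF=1) for a zero source and leaves Index
// untouched; otherwise stores the index of the highest set bit.
bool bsr(uint32_t Src, uint32_t &Index) {
  if (Src == 0)
    return false;
  uint32_t I = 31;
  while ((Src >> I) == 0)
    --I;
  Index = I;
  return true;
}

// Model of bsf: same contract, but finds the lowest set bit.
bool bsf(uint32_t Src, uint32_t &Index) {
  if (Src == 0)
    return false;
  uint32_t I = 0;
  while (((Src >> I) & 1u) == 0)
    ++I;
  Index = I;
  return true;
}

uint32_t ctlz32(uint32_t X) {
  uint32_t T = 63; // result selected when X == 0
  uint32_t Index = 0;
  if (bsr(X, Index))
    T = Index;   // cmovne
  return T ^ 31; // 31 - Index for X != 0; 63 ^ 31 == 32 for X == 0
}

uint32_t cttz32(uint32_t X) {
  uint32_t T = 32; // result selected when X == 0
  uint32_t Index = 0;
  if (bsf(X, Index))
    T = Index; // cmovne
  return T;
}
```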
-
- 15 Jul, 2014 2 commits
Matt Wala authored
1) In makeHelperCall(), function pointers that are created should have type IceType_i32, not the functions' own return type.
2) In legalize(), change the name of WillHaveRegister to MustHaveRegister. Add a comment to clarify the condition being computed.
3) In legalize(), add an assert to make sure that vector "constants" don't get legalized (other than undef). There should be no constants of vector type.
4) In copyToReg(), replace an unnecessary use of Src->getType().
BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/385133006
-
Matt Wala authored
The frem operation takes two arguments. Pass both Src0 and Src1 to __frem_v4f32. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/387153002
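A sketch of what a lane-wise frem helper might look like, assuming it simply applies the C library `fmod` to each lane (frem has no x86 instruction). `fremV4f32` is a hypothetical stand-in for the runtime helper `__frem_v4f32`, whose implementation is not shown in this log; note that it needs both operands, which is exactly what this fix passes.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using V4F = std::array<float, 4>;

// Hypothetical stand-in for __frem_v4f32: scalarizes the remainder,
// computing fmod(Src0[i], Src1[i]) for each lane.
V4F fremV4f32(const V4F &Src0, const V4F &Src1) {
  V4F R{};
  for (int I = 0; I < 4; ++I)
    R[I] = std::fmod(Src0[I], Src1[I]); // result takes the sign of Src0[I]
  return R;
}
```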
-
- 14 Jul, 2014 2 commits
Jan Voung authored
Now that the name mangling is a bit smarter (from commit: 217dc082), we don't need to avoid having the same type twice in the function signature. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/389683003
-
Jan Voung authored
64-bit ops are expanded via a cmpxchg8b loop. 64/32-bit and/or/xor are also expanded into a cmpxchg / cmpxchg8b loop. Add a cross test for atomic RMW operations and compare and swap. Misc: Test that atomic.is.lock.free can be optimized out if result is ignored. TODO: * optimize compare and swap with compare+branch further down instruction stream. * optimize atomic RMW when the return value is ignored (adds a locked field to binary ops though). * We may want to do some actual target-dependent basic block splitting + expansion (the instructions inserted by the expansion must reference the pre-colored registers, etc.). Otherwise, we are currently getting by with modeling the extended liveness of the variables used in the loops using fake uses. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882 R=jfb@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/362463002
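The shape of the cmpxchg / cmpxchg8b retry loop, sketched with portable `std::atomic` instead of x86 instructions: read the old value, compute the new one, and retry the compare-and-swap until no other thread has intervened. `atomicRMW64` and `fetchAnd64` are hypothetical names; in ordinary C++ code `std::atomic<uint64_t>::fetch_and` would be the direct choice.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Generic read-modify-write built from compare-and-swap.
template <typename Op>
uint64_t atomicRMW64(std::atomic<uint64_t> &Mem, uint64_t Operand, Op Compute) {
  uint64_t Old = Mem.load();
  // On failure, compare_exchange_weak reloads Old with the current value,
  // so the new value is recomputed from fresh data on every retry.
  while (!Mem.compare_exchange_weak(Old, Compute(Old, Operand))) {
  }
  return Old; // RMW intrinsics return the value before the update
}

uint64_t fetchAnd64(std::atomic<uint64_t> &Mem, uint64_t V) {
  return atomicRMW64(Mem, V, [](uint64_t A, uint64_t B) { return A & B; });
}
```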
-
- 11 Jul, 2014 4 commits
Matt Wala authored
This adds lowering code for fadd, fsub, fmul, fdiv, and frem. frem, having no native x86 counterpart, is implemented by making a helper call. BUG=none R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/389653002
-
Jim Stichnoth authored
SZZZ_ was being incremented to S0000_ instead of S1000_. BUG= https://codereview.chromium.org/385273002/ R=wala@chromium.org Review URL: https://codereview.chromium.org/390533002
-
Jim Stichnoth authored
https://refspecs.linuxbase.org/cxxabi-1.75.html#mangling-compression describes the mechanism for compressing mangled strings by using substitutions of the form S[0-9A-Z]*_ to represent repeated components. When the prefix is handled as wrapping inside a namespace, the base-36 substitution numbers all have to be incremented. This is implemented in a very simple way by scanning the string only for instances of the substitution pattern. Unfortunately, false matches are possible because the S[0-9A-Z]*_ pattern can be a substring of the type name, or can span other components of the mangled name. Getting this completely right would essentially require a full demangling parser - see the ~4000 lines of code in cxa_demangle.cpp and ItaniumMangle.cpp. Since this is just for testing, any false matches will likely cause a linking error and the test can be rewritten to avoid false matches. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/385273002
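A minimal sketch of the base-36 increment itself (digits `0-9` then `A-Z`), assuming the `<seq-id>` between `S` and `_` is passed in as a string. In the Itanium scheme `S_` is the first substitution, so the empty id increments to `0`; a carry out of the top digit must add a new leading `1`, which is the `SZZZ_` -> `S1000_` case. `incrementSeqId` is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Increments a base-36 substitution id; "" (from "S_") becomes "0".
std::string incrementSeqId(std::string Id) {
  if (Id.empty())
    return "0";
  for (int I = static_cast<int>(Id.size()) - 1; I >= 0; --I) {
    char &C = Id[I];
    if (C == '9') {
      C = 'A';
      return Id;
    }
    if (C != 'Z') {
      ++C;
      return Id;
    }
    C = '0'; // 'Z' wraps around; carry into the next digit
  }
  return "1" + Id; // every digit overflowed, e.g. "ZZZ" -> "1000"
}
```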
-
Karl Schimpf authored
Makes IceTranslator.ExitStatus a boolean (rather than an int), and changes the code to check the flag when done. Fixes a bug introduced in https://codereview.chromium.org/387023002. Also cleans up the (Ice) Converter class to handle globals processing, rather than doing it in llvm2ice.cpp. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3894 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/387023002
-
- 10 Jul, 2014 1 commit
Jim Stichnoth authored
See the BUG description for more details. In short, the register allocator was inappropriately honoring AllowRegisterOverlap even when the variable's live range overlaps with an Unhandled variable precolored to the preferred register. Also changes legalize() logic to recognize when a variable is guaranteed to ultimately have a physical register due to infinite weight, and not create a new temporary in those cases. Finally, dumps RegisterPreference and AllowRegisterOverlap info for Variables for improved diagnostics. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3897 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/380363002
-
- 09 Jul, 2014 4 commits
Jim Stichnoth authored
This invokes clang-format-diff.py so you can easily reformat just the code you touched. (Caution, this may not apply to new files.) BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/372133002
-
Matt Wala authored
- Add TargetLowering::lowerArguments() as a new stage in TargetLowering. - Add support for passing arguments/return values in XMM registers in the x86 target. BUG=none R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/372113005
-
Jan Voung authored
Re-used test_arith_main.cpp, mostly to share the set of interesting floating point constants. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882 R=stichnot@chromium.org, wala@chromium.org Review URL: https://codereview.chromium.org/384443003
-
Jan Voung authored
For ebp, exclude it as needed. For esp, don't mark it as an int register. Not sure exactly how to do a targeted test for this Om1 register allocator. The Om1 regalloc seems to start w/ a fresh whitelist after each instruction, so it may assign the same register (e.g., eax) as an earlier instruction. Without pre-colored registers, I'm not sure how to force it to allocate something other than the first few registers. I do have a test case that has a ton of pre-colored registers (e.g., cmpxchg8b), but that is in a different CL: https://codereview.chromium.org/362463002/ Encountered for: BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/369573005
-
- 08 Jul, 2014 1 commit
Jim Stichnoth authored
The compile error was introduced in https://codereview.chromium.org/361733002/ . BUG= none R=wala@chromium.org Review URL: https://codereview.chromium.org/376923003
-
- 07 Jul, 2014 2 commits
Matt Wala authored
- Add vector types to the type table. - Add support for parsing vector types in llvm2ice. - Legalize undef vector values to zero. Test that undef vector values are lowered correctly. BUG=none R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/353553004
-
Karl Schimpf authored
This patch only handles global addresses in PNaCl bitcode files. Function blocks are still not parsed. Also, factors out a common API for translation, so that generated ICE can always be translated using the same code. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3892 R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/361733002
-
- 29 Jun, 2014 1 commit
Jim Stichnoth authored
This is still missing a couple of things: 1. It only supports flat arrays and zeroinitializers. Arrays of structs are not yet supported. 2. Initializers can't yet contain relocatables, e.g. the address of another global. Some changes are made to work around an llvm-mc assembler bug. When assembling using intel syntax, llvm-mc doesn't correctly parse symbolic constants or add relocation entries in some circumstances. Call instructions work, and use in a memory operand works, e.g. mov eax, [ArrayBase+4*ecx]. To work around this, we adjust legalize() to not allow ConstantRelocatable by default, except for memory operands and when called from lowerCall(), so the relocatable ends up being the source operand of a mov instruction. Then, the mov emit routine actually emits an lea instruction for such moves. A few lit tests needed to be adjusted to make szdiff work properly with respect to global initializers. In the new cross test, the driver calls test code that returns a pointer to an array with a global initializer, and the driver compares the arrays returned by llc and Subzero. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/358013003
-
- 27 Jun, 2014 1 commit
Karl Schimpf authored
BUG=None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/350933002
-
- 26 Jun, 2014 1 commit
Jim Stichnoth authored
Without this being in the command substitutions list, lit will rely on the 'not' command being in $PATH. The substitution code is adapted from llvm/test/lit.cfg to add word-break regexps to the list. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/344063004
-
- 25 Jun, 2014 1 commit
Jan Voung authored
Loads/stores w/ type i8, i16, and i32 are converted to plain load/store instructions and lowered w/ the plain lowerLoad/lowerStore. Atomic stores are followed by an mfence for sequential consistency. For 64-bit types, use movq to do 64-bit memory loads/stores (instead of the usual load/store being broken into separate 32-bit loads/stores). For a store, this means first bitcasting the i64 to f64 (which splits the load of the value to be stored into two 32-bit ops), then storing it in a single op. For a load, load into an f64, then bitcast back to i64 (which splits the value after the atomic load). This follows what GCC does for C++11 std::atomic<uint64_t> load/store methods (uses movq when -mfpmath=sse). This introduces some redundancy between movq and movsd, but the convention seems to be to use movq when working with integer quantities. Otherwise, movsd could work too. The difference seems to be whether or not the XMM register's upper 64 bits are filled with 0. Zero-extending could help avoid partial register stalls. Handle up to i32 fetch_add. TODO: add i64 via a cmpxchg loop. TODO: add some runnable crosstests to make sure that this doesn't do funny things to integer bit patterns that happen to look like signaling NaNs and quiet NaNs. However, the system clang would not know how to handle "llvm.nacl.*" if we choose to target that level directly via .ll files. Alternatively, (a) we use old-school __sync methods (sync_fetch_and_add w/ 0 to load) or (b) require the buildbot's clang/gcc to support C++11... BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/342763004
-
- 24 Jun, 2014 1 commit
Jan Voung authored
Currently, the integer immediate is legalized to a 64-bit integer register first, and then the lower/upper parts of that register are used for the bitcast. However, the mov(64_bit_reg, imm) done by the legalization isn't legal. Similarly, trunc of a 64-bit immediate needs to take the lower half of the immediate rather than legalizing it to a variable first. This shifts the legalization code around. Other cases where immediates are illegal and get legalized are idiv/div, but for those cases 64-bit operands are handled separately via a function call. The function call code properly splits up immediate arguments. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/348373005
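A sketch of the immediate splitting this commit describes, assuming the usual x86-32 convention that an i64 lives in a lo/hi pair of 32-bit registers: the immediate is cut into two 32-bit immediates, and truncating it just takes the low half, so no 64-bit register move is ever needed. `ImmPair`, `splitImm64`, and `truncImm64` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// The two 32-bit immediates that stand in for one 64-bit immediate.
struct ImmPair {
  uint32_t Lo;
  uint32_t Hi;
};

ImmPair splitImm64(uint64_t Imm) {
  return {static_cast<uint32_t>(Imm), static_cast<uint32_t>(Imm >> 32)};
}

// trunc i64 -> i32 of an immediate is just its low half; no register needed.
uint32_t truncImm64(uint64_t Imm) { return splitImm64(Imm).Lo; }
```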
-