1. 25 Jul, 2014 1 commit
  2. 24 Jul, 2014 4 commits
  3. 23 Jul, 2014 3 commits
  4. 22 Jul, 2014 3 commits
  5. 21 Jul, 2014 1 commit
  6. 18 Jul, 2014 4 commits
  7. 17 Jul, 2014 1 commit
    • Lower the rest of the vector arithmetic operations. · 7fa22d8a
      Matt Wala authored
      The instructions emitted by the lowering operations require memory
      operands to be aligned to 16 bytes. Since there is no support for
      aligning memory operands in Subzero, do the arithmetic in registers for
      now.
      
      Add vector arithmetic to the arith crosstest. Pass the -mstackrealign
      parameter to the crosstest clang so that llc code called back from
      Subzero code (helper calls) doesn't assume that the stack is aligned at
      the entry to the call.
      
      BUG=none
      R=jvoung@chromium.org, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/397833002
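The alignment constraint driving this commit can be illustrated with SSE2 intrinsics. This is a hypothetical sketch, not Subzero code: packed arithmetic such as paddd requires 16-byte-aligned memory operands, so without support for aligning memory operands the lowering must bring both values into registers first (the unaligned loads) and let the add itself use only register operands.

```c
#include <emmintrin.h> /* SSE2 */
#include <stdint.h>

/* Sketch (my own names, not Subzero's): keep vector arithmetic in
 * registers so no SSE instruction takes a possibly-unaligned memory
 * operand. */
void add_v4i32(const int32_t *a, const int32_t *b, int32_t *out) {
    /* Unaligned loads move the data into registers... */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* ...so paddd sees only register operands. */
    __m128i vc = _mm_add_epi32(va, vb);
    _mm_storeu_si128((__m128i *)out, vc);
}
```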
  8. 16 Jul, 2014 2 commits
    • Lower casting operations that involve vector types. · 83b8036b
      Matt Wala authored
      Impacted instructions:
      
      bitcast {v4f32, v4i32, v8i16, v16i8} <-> {v4f32, v4i32, v8i16, v16i8}
      bitcast v8i1 <-> i8
      bitcast v16i1 <-> i16
      
      (There was already code present to handle trivial bitcasts like v16i1 <-> v16i1.)
      
      [sz]ext v4i1 -> v4i32
      [sz]ext v8i1 -> v8i16
      [sz]ext v16i1 -> v16i8
      
      trunc v4i32 -> v4i1
      trunc v8i16 -> v8i1
      trunc v16i8 -> v16i1
      
      [su]itofp v4i32 -> v4f32
      fpto[su]i v4f32 -> v4i32
      
Where there is a relatively simple lowering to x86 instructions, it has been used. Otherwise, a helper call is used.

Some lowerings require materializing an integer vector with a 1 in each element. Since there is no support for vector constant pools, the constant is materialized purely through register operations.
      
      BUG=none
      R=jvoung@chromium.org, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/383303003
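Two of the register-only tricks behind these lowerings can be sketched with SSE2 intrinsics (hypothetical helper names, not Subzero's): sign-extending a v4i1 whose boolean sits in bit 0 of each lane via a shift pair, and materializing a vector of 1s with no constant pool by comparing a register with itself (all-ones) and logically shifting each lane right by 31.

```c
#include <emmintrin.h> /* SSE2 */
#include <stdint.h>

/* sext v4i1 -> v4i32: move bit 0 to bit 31 (pslld), then arithmetic
 * shift it back across the whole lane (psrad). */
__m128i sext_v4i1_to_v4i32(__m128i mask) {
    __m128i shifted = _mm_slli_epi32(mask, 31);
    return _mm_srai_epi32(shifted, 31); /* 0 -> 0, 1 -> 0xFFFFFFFF */
}

/* {1,1,1,1} purely through register operations: pcmpeqd of a register
 * with itself yields all-ones; psrld by 31 leaves a 1 in each lane. */
__m128i splat_ones_v4i32(void) {
    __m128i z = _mm_setzero_si128();
    __m128i allbits = _mm_cmpeq_epi32(z, z); /* all bits set */
    return _mm_srli_epi32(allbits, 31);
}
```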
    • Lower bitmanip intrinsics, assuming absence of BMI/SSE4.2 for now. · e4da26f6
      Jan Voung authored
We'll need the fallbacks in any case. However, once we've
decided on how to specify the CPU features of the user's
machine, we can use the nicer LZCNT/TZCNT/POPCNT as well.
      
      Adds cmov, bsf, and bsr instructions.
      
      Calls a popcount helper function for machines without SSE4.2.
      
      Not handling bswap yet (which can also take i16 params).
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
      R=stichnot@chromium.org, wala@chromium.org
      
      Review URL: https://codereview.chromium.org/390443005
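The popcount helper called on pre-SSE4.2 machines lives in Subzero's runtime; for illustration, a fallback like it is typically the standard bit-twiddling reduction (this sketch is mine, not the actual helper):

```c
#include <stdint.h>

/* Portable popcount fallback for machines without POPCNT:
 * sum bits in pairs, then nibbles, then bytes, then combine. */
uint32_t popcount32(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums   */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums   */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums   */
    return (x * 0x01010101u) >> 24;                   /* total in top byte */
}
```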
  9. 15 Jul, 2014 2 commits
  10. 14 Jul, 2014 2 commits
  11. 11 Jul, 2014 4 commits
  12. 10 Jul, 2014 1 commit
  13. 09 Jul, 2014 4 commits
  14. 08 Jul, 2014 1 commit
  15. 07 Jul, 2014 2 commits
  16. 29 Jun, 2014 1 commit
    • Subzero: Partial implementation of global initializers. · de4ca71e
      Jim Stichnoth authored
      This is still missing a couple things:
      
      1. It only supports flat arrays and zeroinitializers.  Arrays of structs are not yet supported.
      
2. Initializers can't yet contain relocatables, e.g. the address of another global.
      
Some changes are made to work around an llvm-mc assembler bug. When assembling with Intel syntax, llvm-mc doesn't correctly parse symbolic constants or add relocation entries in some circumstances. Call instructions work, and use in a memory operand works, e.g. mov eax, [ArrayBase+4*ecx].

To work around this, we adjust legalize() to disallow ConstantRelocatable by default, except for memory operands and when called from lowerCall(), so the relocatable ends up being the source operand of a mov instruction. The mov emit routine then actually emits an lea instruction for such moves.
      
      A few lit tests needed to be adjusted to make szdiff work properly with respect to global initializers.
      
      In the new cross test, the driver calls test code that returns a pointer to an array with a global initializer, and the driver compares the arrays returned by llc and Subzero.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/358013003
  17. 27 Jun, 2014 1 commit
  18. 26 Jun, 2014 1 commit
  19. 25 Jun, 2014 1 commit
    • Add atomic load/store, fetch_add, fence, and is-lock-free lowering. · 5cd240df
      Jan Voung authored
      Loads/stores w/ type i8, i16, and i32 are converted to
      plain load/store instructions and lowered w/ the plain
      lowerLoad/lowerStore.  Atomic stores are followed by an mfence
      for sequential consistency.
      
For 64-bit types, use movq to do 64-bit memory
loads/stores (vs the usual load/store being broken into
separate 32-bit load/stores). This means bitcasting the
i64 -> f64 first (which splits the load of the value to be
stored into two 32-bit ops), then storing in a single op. For
load, load into f64 then bitcast back to i64 (which splits
after the atomic load). This follows what GCC does for
C++11 std::atomic<uint64_t> load/store methods (it uses movq
when -mfpmath=sse). This introduces some redundancy between
movq and movsd, but the convention seems to be to use movq
when working with integer quantities; otherwise, movsd
could work too. The difference seems to be whether the
XMM register's upper 64 bits are zeroed, and zero-extending
could help avoid partial register stalls.
      
      Handle up to i32 fetch_add. TODO: add i64 via a cmpxchg loop.
      
      TODO: add some runnable crosstests to make sure that this
      doesn't do funny things to integer bit patterns that happen
      to look like signaling NaNs and quiet NaNs. However, the system
      clang would not know how to handle "llvm.nacl.*" if we choose to
      target that level directly via .ll files. Or, (a) we use old-school __sync
      methods (sync_fetch_and_add w/ 0 to load) or (b) require buildbot's
      clang/gcc to support c++11...
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/342763004
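The movq approach described above can be sketched with SSE2 intrinsics (assumed helper names, not Subzero's lowering output): on 32-bit x86 a plain i64 load or store is split into two 32-bit ops, but movq through an XMM register moves all 64 bits in one instruction, which is what makes the access atomic.

```c
#include <emmintrin.h> /* SSE2 */
#include <stdint.h>

/* Single-instruction 64-bit load: movq xmm, m64. */
uint64_t atomic_load_u64(const uint64_t *p) {
    __m128i v = _mm_loadl_epi64((const __m128i *)p);
    uint64_t result;
    _mm_storel_epi64((__m128i *)&result, v);
    return result;
}

/* Single-instruction 64-bit store: movq m64, xmm, followed by an
 * mfence for sequential consistency, as with the other widths. */
void atomic_store_u64(uint64_t *p, uint64_t value) {
    __m128i v = _mm_loadl_epi64((const __m128i *)&value);
    _mm_storel_epi64((__m128i *)p, v);
    _mm_mfence();
}
```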
  20. 24 Jun, 2014 1 commit
    • Bitcast of 64-bit immediates may need to split the immediate, not a var. · 1ee34165
      Jan Voung authored
      Currently, the integer immediate is legalized to a
      64-bit integer register first, and then the lower/upper
      parts of that register are used for the bitcast.
      However, mov(64_bit_reg, imm) done by the legalization
      isn't legal.
      
Similarly, trunc of a 64-bit immediate needs to take the
lower half of the immediate, not legalize to a var first.
      
      This shifts the legalization code around.
      
      Other cases where immediates are illegal and legalized
      are idiv/div, but for those cases 64-bit operands are
      handled separately via a function call. The function
      call code properly splits up immediate arguments.
      
      BUG=none
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/348373005
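The splitting described above amounts to taking the two 32-bit halves of the immediate directly, rather than first legalizing it into a 64-bit register (which would require the illegal mov(64_bit_reg, imm)). A minimal illustration, not Subzero code:

```c
#include <stdint.h>

/* Split a 64-bit immediate into its lower and upper 32-bit halves,
 * as the bitcast lowering must do for the two halves of the value. */
void split_imm64(uint64_t imm, uint32_t *lo, uint32_t *hi) {
    *lo = (uint32_t)(imm & 0xFFFFFFFFu);
    *hi = (uint32_t)(imm >> 32);
}

/* trunc i64 -> i32 of an immediate just takes the lower half. */
uint32_t trunc_imm64(uint64_t imm) {
    return (uint32_t)imm;
}
```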