1. 16 Jul, 2014 2 commits
    • Lower casting operations that involve vector types. · 83b8036b
      Matt Wala authored
      Impacted instructions:
      
      bitcast {v4f32, v4i32, v8i16, v16i8} <-> {v4f32, v4i32, v8i16, v16i8}
      bitcast v8i1 <-> i8
      bitcast v16i1 <-> i16
      
      (There was already code present to handle trivial bitcasts like v16i1 <-> v16i1.)
      
      [sz]ext v4i1 -> v4i32
      [sz]ext v8i1 -> v8i16
      [sz]ext v16i1 -> v16i8
      
      trunc v4i32 -> v4i1
      trunc v8i16 -> v8i1
      trunc v16i8 -> v16i1
      
      [su]itofp v4i32 -> v4f32
      fpto[su]i v4f32 -> v4i32
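      
      If bitcast v8i1 <-> i8 goes through a helper call, the helper only has to pack or unpack bits. The sketch below shows a minimal version of that. It rests on two assumptions. First, each i1 lane sits in bit 0 of a 16-bit lane, matching the v8i16 type it extends to. Second, lane I maps to bit I of the i8, which is LLVM's little-endian convention. The function names are hypothetical and are not Subzero's actual runtime helpers.

```cpp
#include <array>
#include <cstdint>

// Hypothetical representation: a v8i1 value occupies eight 16-bit lanes, and
// only bit 0 of each lane is meaningful (upper bits may be garbage).
using V8i1 = std::array<uint16_t, 8>;

// bitcast v8i1 -> i8: lane I becomes bit I of the result.
uint8_t bitcastV8i1ToI8(const V8i1 &V) {
  uint8_t Result = 0;
  for (int I = 0; I < 8; ++I)
    Result |= static_cast<uint8_t>((V[I] & 1u) << I);
  return Result;
}

// bitcast i8 -> v8i1: bit I of the input becomes lane I.
V8i1 bitcastI8ToV8i1(uint8_t X) {
  V8i1 V{};
  for (int I = 0; I < 8; ++I)
    V[I] = static_cast<uint16_t>((X >> I) & 1u);
  return V;
}
```

      The v16i1 <-> i16 pair follows the same pattern with sixteen 8-bit lanes.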
      
      Where there is a relatively simple lowering to x86 instructions, it has been used. Otherwise a helper call is used.
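      
      For casts between i1 vectors and wider integer vectors, one simple SSE2 sequence works as follows. For sext, a shift pair does the job: pslld by 31 moves bit 0 into the sign bit, and psrad by 31 copies it across the lane. For zext and trunc, a pand with a vector of 1s is enough. The sketch below emulates those instructions lane by lane in plain C++. It assumes that each i1 lane keeps its value in bit 0 of a 32-bit lane and that the upper bits may be arbitrary. The helper names are illustrative, and this commit's exact sequences may differ.

```cpp
#include <array>
#include <cstdint>

// One XMM register viewed as four 32-bit lanes.
using V4 = std::array<uint32_t, 4>;

// Lane-wise emulations of the SSE2 instructions involved (shift counts are
// assumed to be in [0, 31]).
V4 pslld(V4 V, unsigned N) {
  for (uint32_t &L : V)
    L <<= N;
  return V;
}

V4 psrad(V4 V, unsigned N) {
  for (uint32_t &L : V) {
    bool Negative = (L >> 31) != 0;
    L >>= N;
    if (Negative && N != 0)
      L |= ~(UINT32_MAX >> N); // replicate the sign bit into vacated bits
  }
  return V;
}

V4 pand(V4 A, const V4 &B) {
  for (int I = 0; I < 4; ++I)
    A[I] &= B[I];
  return A;
}

// Hypothetical lowering helpers; each i1 lane holds its value in bit 0.
V4 sextV4i1ToV4i32(const V4 &Src) { // bit 0 -> sign bit -> whole lane
  return psrad(pslld(Src, 31), 31);
}

V4 zextV4i1ToV4i32(const V4 &Src) { // keep only bit 0
  const V4 Ones = {{1, 1, 1, 1}};
  return pand(Src, Ones);
}

V4 truncV4i32ToV4i1(const V4 &Src) { // same mask: keep only bit 0
  const V4 Ones = {{1, 1, 1, 1}};
  return pand(Src, Ones);
}
```

      The pslld/psrad pair ignores whatever garbage sits in the upper bits, so sext needs no mask.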
      
      Some lowerings require the materialization of an integer vector with 1s in each entry. Since there is no support for vector constant pools, the constant is materialized purely through register operations.
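      
      Here is one register-only way to build that vector. First, pxor a register with itself to get zeros. Next, pcmpeqd it with itself to get all-ones lanes (-1). Finally, psubd computes 0 - (-1) = 1 in every lane. The sketch below emulates this sequence lane by lane. The function names are hypothetical. For 32-bit lanes, psrld of the -1 vector by 31 would also work.

```cpp
#include <array>
#include <cstdint>

using V4 = std::array<uint32_t, 4>;

// Lane-wise emulations of pxor, pcmpeqd and psubd.
V4 pxor(V4 A, const V4 &B) {
  for (int I = 0; I < 4; ++I)
    A[I] ^= B[I];
  return A;
}

V4 pcmpeqd(const V4 &A, const V4 &B) {
  V4 R{};
  for (int I = 0; I < 4; ++I)
    R[I] = (A[I] == B[I]) ? 0xFFFFFFFFu : 0u;
  return R;
}

V4 psubd(V4 A, const V4 &B) {
  for (int I = 0; I < 4; ++I)
    A[I] -= B[I]; // wraps modulo 2^32, like the hardware
  return A;
}

// Hypothetical helper: builds <1, 1, 1, 1> from whatever a register holds,
// with no memory operands and no constant pool.
V4 makeVectorOfOnes(const V4 &AnyReg) {
  V4 Zeros = pxor(AnyReg, AnyReg);        // pxor xmm, xmm    -> 0 per lane
  V4 MinusOnes = pcmpeqd(AnyReg, AnyReg); // pcmpeqd xmm, xmm -> -1 per lane
  return psubd(Zeros, MinusOnes);         // 0 - (-1)         -> 1 per lane
}
```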
      
      BUG=none
      R=jvoung@chromium.org, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/383303003
    • Lower bitmanip intrinsics, assuming absence of BMI/SSE4.2 for now. · e4da26f6
      Jan Voung authored
      We'll need the fallbacks in any case. However, once we've
      decided on how to specify the CPU features of the user
      machine we can use the nicer LZCNT/TZCNT/POPCNT as well.
      
      Adds cmov, bsf, and bsr instructions.
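      
      Without LZCNT/TZCNT, ctlz and cttz can be built from
      bsr/bsf plus cmov. When the source is zero, bsr and bsf
      set ZF and leave the destination undefined. A cmov on ZF
      therefore substitutes a fixed value for that case. The
      sketch below models the instructions in C++ for i32. It
      is an assumed sequence for illustration, and the lowering
      in this commit may differ in detail.

```cpp
#include <cstdint>

// Models the two results of x86 bsr/bsf: the bit index, and ZF (set when the
// source is zero, in which case the real destination register is undefined).
struct BitScan {
  uint32_t Index;
  bool ZF;
};

BitScan bsr(uint32_t Src) { // index of the highest set bit
  if (Src == 0)
    return {0, true}; // hardware leaves Index undefined; 0 is arbitrary here
  uint32_t Index = 31;
  while ((Src & (1u << Index)) == 0)
    --Index;
  return {Index, false};
}

BitScan bsf(uint32_t Src) { // index of the lowest set bit
  if (Src == 0)
    return {0, true};
  uint32_t Index = 0;
  while ((Src & (1u << Index)) == 0)
    ++Index;
  return {Index, false};
}

// ctlz without LZCNT: preload 63 so that a zero input yields 63 ^ 31 == 32;
// cmovne replaces it with the bsr result when the input was nonzero. For an
// index in [0, 31], Index ^ 31 == 31 - Index.
uint32_t ctlz32(uint32_t X) {
  BitScan R = bsr(X);
  uint32_t T = 63;
  if (!R.ZF) // cmovne
    T = R.Index;
  return T ^ 31;
}

// cttz without TZCNT: bsf already yields the trailing-zero count; cmov
// supplies 32 for a zero input.
uint32_t cttz32(uint32_t X) {
  BitScan R = bsf(X);
  uint32_t T = 32;
  if (!R.ZF) // cmovne
    T = R.Index;
  return T;
}
```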
      
      Calls a popcount helper function for machines without SSE4.2.
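      
      On machines without POPCNT, such a helper could use the
      standard SWAR bit-counting trick. This is a sketch, and
      the helper names are hypothetical rather than the actual
      runtime function names.

```cpp
#include <cstdint>

// Classic SWAR bit counting: sum bits in 2-, 4-, then 8-bit groups, and
// gather the four byte counts into the top byte with one multiply.
uint32_t popcount32(uint32_t X) {
  X = X - ((X >> 1) & 0x55555555u);
  X = (X & 0x33333333u) + ((X >> 2) & 0x33333333u);
  X = (X + (X >> 4)) & 0x0F0F0F0Fu;
  return (X * 0x01010101u) >> 24;
}

// The i64 variant can count each 32-bit half separately, which also suits
// x86-32, where an i64 lives in two 32-bit registers.
uint32_t popcount64(uint64_t X) {
  return popcount32(static_cast<uint32_t>(X)) +
         popcount32(static_cast<uint32_t>(X >> 32));
}
```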
      
      Not handling bswap yet (which can also take i16 params).
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
      R=stichnot@chromium.org, wala@chromium.org
      
      Review URL: https://codereview.chromium.org/390443005
  2. 15 Jul, 2014 2 commits
  3. 14 Jul, 2014 2 commits
  4. 11 Jul, 2014 4 commits
  5. 10 Jul, 2014 1 commit
  6. 09 Jul, 2014 4 commits
  7. 08 Jul, 2014 1 commit
  8. 07 Jul, 2014 2 commits
  9. 29 Jun, 2014 1 commit
    • Subzero: Partial implementation of global initializers. · de4ca71e
      Jim Stichnoth authored
      This is still missing a couple things:
      
      1. It only supports flat arrays and zeroinitializers.  Arrays of structs are not yet supported.
      
      2. Initializers can't yet contain relocatables, e.g. the address of another global.
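      
      For the two supported forms, emission can stay simple. The sketch below assumes GNU-assembler-style directives, which llvm-mc accepts: .byte for a flat array of bytes and .zero for a zeroinitializer. The function names are hypothetical, and the sketch omits alignment and section directives.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical emitter for a flat array initializer (assumed non-empty).
std::string emitFlatArray(const std::string &Name,
                          const std::vector<uint8_t> &Bytes) {
  std::ostringstream OS;
  OS << Name << ":\n\t.byte ";
  for (std::size_t I = 0; I < Bytes.size(); ++I) {
    if (I != 0)
      OS << ",";
    OS << static_cast<unsigned>(Bytes[I]); // print a number, not a character
  }
  OS << "\n";
  return OS.str();
}

// Hypothetical emitter for a zeroinitializer of the given size.
std::string emitZeroInit(const std::string &Name, std::size_t SizeBytes) {
  std::ostringstream OS;
  OS << Name << ":\n\t.zero " << SizeBytes << "\n";
  return OS.str();
}
```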
      
      Some changes are made to work around an llvm-mc assembler bug.  When assembling using intel syntax, llvm-mc doesn't correctly parse symbolic constants or add relocation entries in some circumstances.  Call instructions work, and use in a memory operand works, e.g. mov eax, [ArrayBase+4*ecx].  To work around this, we adjust legalize() to not allow ConstantRelocatable by default, except for memory operands and when called from lowerCall(), so the relocatable ends up being the source operand of a mov instruction.  Then, the mov emit routine actually emits an lea instruction for such moves.
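      
      The emit-side half of the workaround can be sketched as follows. The operand type and function are hypothetical stand-ins, not Subzero's classes. The point is only this: a mov whose source is a relocatable gets printed as an lea with a memory operand. According to the paragraph above, llvm-mc parses that form correctly.

```cpp
#include <string>

// Hypothetical source operand of a mov: a register, an immediate, or a
// ConstantRelocatable-style symbol reference.
struct MovSrc {
  enum KindT { Register, Immediate, Relocatable } Kind;
  std::string Text;
};

// Emits one Intel-syntax mov. A relocatable source is emitted as lea with a
// memory operand instead of a bare symbolic immediate; both put the symbol's
// address into DestReg.
std::string emitMov(const std::string &DestReg, const MovSrc &Src) {
  if (Src.Kind == MovSrc::Relocatable)
    return "lea " + DestReg + ", [" + Src.Text + "]";
  return "mov " + DestReg + ", " + Src.Text;
}
```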
      
      A few lit tests needed to be adjusted to make szdiff work properly with respect to global initializers.
      
      In the new cross test, the driver calls test code that returns a pointer to an array with a global initializer, and the driver compares the arrays returned by llc and Subzero.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/358013003
  10. 27 Jun, 2014 1 commit
  11. 26 Jun, 2014 1 commit
  12. 25 Jun, 2014 1 commit
    • Add atomic load/store, fetch_add, fence, and is-lock-free lowering. · 5cd240df
      Jan Voung authored
      Loads/stores w/ type i8, i16, and i32 are converted to
      plain load/store instructions and lowered w/ the plain
      lowerLoad/lowerStore.  Atomic stores are followed by an mfence
      for sequential consistency.
      
      For 64-bit types, use movq to do 64-bit memory
      loads/stores (vs the usual load/store being broken into
      separate 32-bit load/stores). For a store, this means
      first bitcasting the i64 value to f64 (which still loads
      the value to be stored with two 32-bit ops) and then
      storing it with a single 64-bit op. For a load, load into
      an f64 and then bitcast back to i64 (which splits the
      value into 32-bit halves only after the atomic load).
      This follows what GCC does for C++11
      std::atomic<uint64_t> load/store methods (it uses movq
      when -mfpmath=sse). This introduces some redundancy
      between movq and movsd, but the convention seems to be to
      use movq when working with integer quantities; otherwise,
      movsd could work too. The difference seems to be whether
      the XMM register's upper 64 bits are filled with 0.
      Zero-extending could help avoid partial register stalls.
      
      Handle up to i32 fetch_add. TODO: add i64 via a cmpxchg loop.
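      
      A cmpxchg loop for i64 fetch_add has the shape sketched
      below. The sketch uses C++11 std::atomic, whose
      compare_exchange_weak on a 64-bit value typically
      compiles to lock cmpxchg8b on x86-32. The function name
      is hypothetical.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: 64-bit fetch_add built from compare-and-swap.
// Returns the value held before the addition, like fetch_add.
uint64_t fetchAdd64(std::atomic<uint64_t> &Obj, uint64_t Delta) {
  uint64_t Expected = Obj.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak stores the current value into Expected,
  // so each retry recomputes the sum from fresh data. The weak form may also
  // fail spuriously, which the loop absorbs.
  while (!Obj.compare_exchange_weak(Expected, Expected + Delta,
                                    std::memory_order_seq_cst,
                                    std::memory_order_relaxed)) {
  }
  return Expected;
}
```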
      
      TODO: add some runnable crosstests to make sure that this
      doesn't do funny things to integer bit patterns that happen
      to look like signaling NaNs and quiet NaNs. However, the system
      clang would not know how to handle "llvm.nacl.*" if we chose to
      target that level directly via .ll files. Alternatively, we
      could (a) use old-school __sync methods (__sync_fetch_and_add
      with 0 to load), or (b) require the buildbot's clang/gcc to
      support C++11...
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/342763004
  13. 24 Jun, 2014 1 commit
    • Bitcast of 64-bit immediates may need to split the immediate, not a var. · 1ee34165
      Jan Voung authored
      Currently, the integer immediate is legalized to a
      64-bit integer register first, and then the lower/upper
      parts of that register are used for the bitcast.
      However, mov(64_bit_reg, imm) done by the legalization
      isn't legal.
      
      Similarly, trunc of 64-bit immediates need to take the
      lower half of the immediate, not legalize to a var first.
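      
      Splitting the constant itself is plain integer arithmetic
      at translation time. In this minimal sketch, with
      hypothetical names, the low and high 32-bit halves become
      two separate 32-bit immediates, and trunc to i32 simply
      keeps the low half.

```cpp
#include <cstdint>

// Hypothetical pair of 32-bit immediates produced from one 64-bit constant.
struct SplitImm {
  uint32_t Lo;
  uint32_t Hi;
};

// x86-32 has no 64-bit general-purpose registers, so split the constant at
// translation time instead of legalizing it to a variable first.
SplitImm splitImm64(uint64_t Imm) {
  return {static_cast<uint32_t>(Imm), static_cast<uint32_t>(Imm >> 32)};
}

// trunc of a 64-bit constant to i32: just the low half.
uint32_t truncImm64(uint64_t Imm) { return splitImm64(Imm).Lo; }
```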
      
      This shifts the legalization code around.
      
      Other cases where immediates are illegal and legalized
      are idiv/div, but for those cases 64-bit operands are
      handled separately via a function call. The function
      call code properly splits up immediate arguments.
      
      BUG=none
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/348373005
  14. 18 Jun, 2014 5 commits
  15. 17 Jun, 2014 2 commits
  16. 12 Jun, 2014 2 commits
  17. 06 Jun, 2014 1 commit
    • Make py import not assume dir is "pnacl-subzero". Avoid autovect in crosstest. · 1248a6d1
      Jan Voung authored
      Derek's CL to check out subzero calls the source directory
      "subzero", and the file header comments call the directory
      "subzero". Just make the python sys.path munging for
      importing pydir more generic.
      
      Also change crosstest to not run the raw LLVM "opt" with
      optimizations (only use it for ABI stabilization passes).
      Instead run pnacl-clang with -O2. Otherwise, newer NACL_SDK
      versions include a newer LLVM "opt" binary which
      autovectorizes and may generate vector IR that is not
      handled by Subzero yet.
      
      E.g.,
      LLVM ERROR: Invalid PNaCl instruction:   %1 = insertelement <4 x i32> undef, i32 %0, i32 0
      w/ pepper_canary to version 37, revision 274873
      
      BUG=none
      TEST=make -f Makefile.standalone check
      R=stichnot@chromium.org, wala@chromium.org
      
      Review URL: https://codereview.chromium.org/317963002
  18. 05 Jun, 2014 1 commit
    • Fix a C++ violation. · ab8242ca
      Jim Stichnoth authored
      Ice::Inst::NumberSentinel is declared with an in-class initializer inside the Inst class definition:
      
      class Inst {
        ...
        static const InstNumberT NumberDeleted = -1;
        static const InstNumberT NumberSentinel = 0;
        ...
      };
      
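      Under some compilers/options, this causes a link error when passing NumberSentinel as a const T& argument. Binding a reference odr-uses the member, and an odr-used member needs an out-of-class definition, which does not exist.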
      
      (Another option would be to move the actual definitions into IceInst.cpp.)
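      
      The alternative mentioned above can be sketched in a single file. Two parts of the sketch are assumptions: InstNumberT is taken to be a 32-bit signed integer, and clampToSentinel is a hypothetical caller. Because std::max takes its arguments as const T&, a call like this is exactly the kind that can trigger the link error. With the out-of-class definitions present, the program is well-formed and links.

```cpp
#include <algorithm>
#include <cstdint>

// Assumption: InstNumberT is a 32-bit signed integer, as the -1 sentinel suggests.
typedef int32_t InstNumberT;

class Inst {
public:
  static const InstNumberT NumberDeleted = -1;
  static const InstNumberT NumberSentinel = 0;
};

// Out-of-class definitions (the initializer is not repeated). In a real build
// these belong in exactly one .cpp file, e.g. IceInst.cpp. They give the
// members storage, so binding them to a const reference always links.
const InstNumberT Inst::NumberDeleted;
const InstNumberT Inst::NumberSentinel;

// Hypothetical caller: std::max binds its arguments to const T&, so this
// odr-uses Inst::NumberSentinel.
InstNumberT clampToSentinel(InstNumberT Number) {
  return std::max(Number, Inst::NumberSentinel);
}
```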
      
      BUG= none
      R=jfb@chromium.org
      
      Review URL: https://codereview.chromium.org/311243006
  19. 04 Jun, 2014 1 commit
    • Subzero: Initial O2 lowering · d97c7df5
      Jim Stichnoth authored
      Includes the following:
      1. Liveness analysis.
      2. Linear-scan register allocation.
      3. Address mode optimization.
      4. Compare-branch fusing.
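      
      For orientation, linear-scan allocation in its textbook form (Poletto and Sarkar) is sketched below. The example uses hypothetical types and is not Subzero's allocator, which may differ in its spill heuristic and in other details. Intervals are visited in order of start, and expired intervals return their registers. When no register is free, the interval that ends last is spilled.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical live interval over instruction numbers [Start, End].
struct LiveInterval {
  int Var;
  int Start;
  int End;
  int Reg; // output: assigned register, or -1 if spilled
};

void linearScan(std::vector<LiveInterval> &Intervals, int NumRegs) {
  for (LiveInterval &Iv : Intervals)
    Iv.Reg = -1;
  std::sort(Intervals.begin(), Intervals.end(),
            [](const LiveInterval &A, const LiveInterval &B) {
              return A.Start < B.Start;
            });
  std::vector<int> FreeRegs;
  for (int R = NumRegs - 1; R >= 0; --R)
    FreeRegs.push_back(R); // register 0 is handed out first
  // Indices of intervals currently holding a register, sorted by End.
  std::vector<std::size_t> Active;
  auto EndsBefore = [&Intervals](std::size_t A, std::size_t B) {
    return Intervals[A].End < Intervals[B].End;
  };
  for (std::size_t I = 0; I < Intervals.size(); ++I) {
    LiveInterval &Cur = Intervals[I];
    // Expire intervals that ended before Cur starts; reclaim their registers.
    while (!Active.empty() && Intervals[Active.front()].End < Cur.Start) {
      FreeRegs.push_back(Intervals[Active.front()].Reg);
      Active.erase(Active.begin());
    }
    if (!FreeRegs.empty()) {
      Cur.Reg = FreeRegs.back();
      FreeRegs.pop_back();
    } else if (!Active.empty() && Intervals[Active.back()].End > Cur.End) {
      // No free register: the active interval that ends last outlives Cur,
      // so Cur takes its register and that interval is spilled instead.
      Cur.Reg = Intervals[Active.back()].Reg;
      Intervals[Active.back()].Reg = -1;
      Active.pop_back();
    } else {
      continue; // Cur itself is spilled (Reg stays -1).
    }
    Active.insert(
        std::upper_bound(Active.begin(), Active.end(), I, EndsBefore), I);
  }
}
```

      The live ranges this consumes are exactly what the third, most expensive liveness mode computes.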
      
      All of these depend on liveness analysis.  There are three versions of liveness analysis (in order of increasing cost):
      1. Lightweight.  This computes last-uses for variables local to a single basic block.
      2. Full.  This computes last-uses for all variables based on global dataflow analysis.
      3. Full live ranges.  This computes all last-uses, plus calculates the live range intervals in terms of instruction numbers.  (The live ranges are needed for register allocation.)
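      
      The lightweight variant can be sketched as a single backward walk over one block. The block representation and the function name are hypothetical. The sketch assumes the set of block-local variables is already known, and it records last uses only for those variables.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical block representation: each instruction lists the variable IDs
// it reads.
using Block = std::vector<std::vector<int>>;

// Walking backward, a source operand is a last use the first time (from the
// end) its variable is seen. Returns one flag per source operand.
std::vector<std::vector<bool>> computeLastUses(const Block &B,
                                               const std::set<int> &Locals) {
  std::vector<std::vector<bool>> LastUse(B.size());
  std::set<int> SeenLater; // variables read by some later instruction
  for (std::size_t I = B.size(); I-- > 0;) {
    const std::vector<int> &Srcs = B[I];
    LastUse[I].resize(Srcs.size(), false);
    for (std::size_t J = 0; J < Srcs.size(); ++J)
      LastUse[I][J] =
          Locals.count(Srcs[J]) != 0 && SeenLater.count(Srcs[J]) == 0;
    // Insert after the loop so a variable read twice by one instruction is
    // flagged on both operands.
    SeenLater.insert(Srcs.begin(), Srcs.end());
  }
  return LastUse;
}
```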
      
      For testing the full live range computation, Cfg::validateLiveness() checks every Variable of every Inst and verifies that the current Inst is contained within the Variable's live range.
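      
      A check in the spirit of Cfg::validateLiveness() might look like the sketch below. The types are hypothetical. Treating each live-range segment as a closed interval of instruction numbers is an assumption.

```cpp
#include <utility>
#include <vector>

// Closed range [first, second] of instruction numbers (an assumption).
using Segment = std::pair<int, int>;

struct LiveRange {
  std::vector<Segment> Segments;
  bool contains(int InstNum) const {
    for (const Segment &S : Segments)
      if (S.first <= InstNum && InstNum <= S.second)
        return true;
    return false;
  }
};

// Hypothetical instruction record: its number and the variables it references.
struct InstRecord {
  int Number;
  std::vector<int> Vars;
};

// Returns false if some instruction references a variable whose live range
// does not cover that instruction.
bool validateLiveness(const std::vector<InstRecord> &Insts,
                      const std::vector<LiveRange> &Ranges) {
  for (const InstRecord &I : Insts)
    for (int V : I.Vars)
      if (!Ranges[V].contains(I.Number))
        return false;
  return true;
}
```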
      
      The cross tests are run with O2 in addition to Om1.
      
      Some of the lit tests (for what good they do) are updated with O2 code sequences.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/300563003
  20. 02 Jun, 2014 1 commit
  21. 23 May, 2014 2 commits
  22. 22 May, 2014 2 commits
    • Add Makefiles to support building along with LLVM · bc643135
      Derek Schuff authored
      This change now supports building subzero as part of the LLVM build (instead
      of in a separate build step). It is modeled on clang's Makefiles.
      
      The existing Makefile has been renamed and can still be used manually, e.g.
      make -f Makefile.standalone
      
      It does not yet support running tests, just building.
      
      R=stichnot@chromium.org, jvoung@chromium.org
      BUG=
      
      Review URL: https://codereview.chromium.org/293983007
    • Add Om1 lowering with no optimizations. · 5bc2b1d1
      Jim Stichnoth authored
      This adds infrastructure for low-level x86-32 instructions, and the target lowering patterns.
      
      Practically no optimizations are performed.  Optimizations to be introduced later include liveness analysis, dead-code elimination, global linear-scan register allocation, linear-scan based stack slot coalescing, and compare/branch fusing.  One optimization that is present is simple coalescing of stack slots for variables that are only live within a single basic block.
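      
      The simple coalescing can be sketched as follows. The sketch assumes that each variable spanning several blocks gets a dedicated slot. The block-local variables of different blocks instead share one overlapping area, sized for the largest block. Types and names are hypothetical, and alignment is ignored.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stack variable: its size, the single block it is live in (or
// -1 if it is live across several blocks), and its assigned offset.
struct StackVar {
  uint32_t SizeBytes;
  int Block;
  uint32_t Offset;
};

// Lays out the spill area and returns its total size in bytes.
uint32_t layoutSpillArea(std::vector<StackVar> &Vars, std::size_t NumBlocks) {
  // Multi-block variables each get their own slot at the bottom of the area.
  uint32_t GlobalsSize = 0;
  for (StackVar &V : Vars) {
    if (V.Block < 0) {
      V.Offset = GlobalsSize;
      GlobalsSize += V.SizeBytes;
    }
  }
  // Variables local to different blocks are never live at the same time, so
  // every block's locals start at the same offset and overlap.
  std::vector<uint32_t> LocalsSize(NumBlocks, 0);
  for (StackVar &V : Vars) {
    if (V.Block >= 0) {
      uint32_t &Used = LocalsSize[static_cast<std::size_t>(V.Block)];
      V.Offset = GlobalsSize + Used;
      Used += V.SizeBytes;
    }
  }
  uint32_t MaxLocals = 0;
  for (uint32_t Size : LocalsSize)
    MaxLocals = std::max(MaxLocals, Size);
  return GlobalsSize + MaxLocals;
}
```

      In the usage below, four variables totaling 20 bytes fit in 16 bytes, because the two blocks' locals overlap.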
      
      There are also some fairly comprehensive cross tests.  This testing infrastructure translates bitcode using both Subzero and llc, and a testing harness calls both versions with a variety of "interesting" inputs and compares the results.  Specifically, Arithmetic, Icmp, Fcmp, and Cast instructions are tested this way, across all PNaCl primitive types.
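      
      The harness idea reduces to calling both translations on the same interesting inputs and counting disagreements. Here is a minimal sketch for a 32-bit binary operation. The function pointer type, the names, and the input list are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical signature of one translated test function, e.g. a 32-bit add.
using BinOp = uint32_t (*)(uint32_t, uint32_t);

// Calls both translations on every pair of "interesting" inputs and returns
// the number of pairs on which they disagree.
std::size_t crossTest(BinOp LlcVersion, BinOp SubzeroVersion) {
  const std::vector<uint32_t> Interesting = {
      0u, 1u, 2u, 0x7FFFFFFFu, 0x80000000u, 0x80000001u, 0xFFFFFFFEu,
      0xFFFFFFFFu};
  std::size_t Mismatches = 0;
  for (uint32_t A : Interesting)
    for (uint32_t B : Interesting)
      if (LlcVersion(A, B) != SubzeroVersion(A, B))
        ++Mismatches;
  return Mismatches;
}
```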
      
      BUG=
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/265703002