1. 06 Aug, 2015 4 commits
  2. 05 Aug, 2015 5 commits
    • Subzero: Fix an Om1 crash from memset lowering. · f6f9825e
      Jim Stichnoth authored
      With a certain combination of memset arguments, legalizeToReg() is called but its result is unused.  Om1 register allocation does not like this because it sees an infinite-weight variable with a definition but no uses.  The simplest fix is to add a fake use.
      
      The problem shows up when building spec2k with -Om1.
      
      BUG= none
      R=ascull@google.com
      
      Review URL: https://codereview.chromium.org/1272823004.
    • Subzero: Slight improvement to phi lowering. · 552490c2
      Jim Stichnoth authored
      When doing post phi lowering register allocation, the original code limited register allocation to pre-colored or infinite-weight variables with a non-empty live range within the new edge-split nodes.  This limitation ends up missing some opportunities.  Specifically, when a temporary is introduced to break a dependency cycle, e.g.:
        // a = phi(b)
        // b = phi(a)
        t = a
        a = b
        b = t
      then t was always stack-allocated, even if a physical register was available.
      
      In the new design, the RangeMask bitvector specifies which variables should have their live ranges tracked and computed.  For normal liveness analysis, all variables are tracked.  For post phi lowering liveness analysis, all variables created from phi lowering, plus all pre-colored variables, plus all infinite-weight variables, are tracked.
      
      The result is slightly better code quality, and sometimes the frame size is 1 or 2 words smaller.
      
      The hope was to narrow the 10% translation-time degradation in pnacl-llc.pexe compared to the old, hackish way of phi lowering, but that goal still proves to be elusive.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1271923002.
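
      The cycle-breaking step described in this message can be sketched as follows. This is a minimal illustration and not Subzero's code: the Move alias, the sequentialize function, and the "t0" temporary naming are all assumptions made for the example.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <string>
      #include <utility>
      #include <vector>

      // A move "first = second" between named variables (hypothetical type).
      using Move = std::pair<std::string, std::string>;

      // Turn a set of parallel phi copies into an ordered list of moves.
      // A copy is safe to emit once no other pending copy reads its destination.
      // If every pending copy is blocked, the remaining copies form cycles, and
      // one cycle is broken by saving a destination in a fresh temporary.
      std::vector<Move> sequentialize(std::vector<Move> Pending) {
        std::vector<Move> Out;
        int NextTemp = 0;
        while (!Pending.empty()) {
          bool Progress = false;
          for (std::size_t I = 0; I < Pending.size(); ++I) {
            const std::string Dest = Pending[I].first;
            bool DestStillRead = false;
            for (std::size_t J = 0; J < Pending.size(); ++J)
              if (J != I && Pending[J].second == Dest)
                DestStillRead = true;
            if (!DestStillRead) {
              Out.push_back(Pending[I]);
              Pending.erase(Pending.begin() + I);
              Progress = true;
              break;
            }
          }
          if (Progress)
            continue;
          // Cycle: copy the first destination to a temporary, then make every
          // pending reader of that destination read the temporary instead.
          const std::string Dest = Pending[0].first;
          const std::string Temp = "t" + std::to_string(NextTemp++);
          Out.push_back(Move(Temp, Dest));
          for (Move &M : Pending)
            if (M.second == Dest)
              M.second = Temp;
        }
        return Out;
      }
      ```

      For the pair a = phi(b), b = phi(a), this produces the three moves shown in the message, with t0 in the role of t. Whether the temporary ends up in a register or on the stack is then the register allocator's decision.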
    • Subzero. Implements x86-64 lowerCall. · e0d9afa8
      John Porto authored
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4077
      R=jvoung@chromium.org, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1266673003.
    • Clarify which type "Label" refers to (generic vs X86) · c2ec5817
      Jan Voung authored
      There is a generic Label that can be used as-is for ARM
      and MIPS, and there is an x86 derived class Label which
      adds the concept of near and far (for 8-bit vs 32-bit
      jumps). Previously, one method "getOrCreateCfgNodeLabel"
      would say that it returns a Label when it means the generic
      one in some cases and the x86 one in other cases.
      
      Split that into getCfgNodeLabel and getOrCreateCfgNodeLabel
      where getCfgNodeLabel returns the generic one and
      getOrCreateCfgNodeLabel is part of the x86 code and
      returns the x86 one.
      
      BUG=none
      R=ascull@google.com, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1265023003 .
    • Order jump tables for deterministic or randomized emission. · 1eda90a1
      Andrew Scull authored
      BUG=
      R=stichnot@chromium.org, jvoung, stichnot
      
      Review URL: https://codereview.chromium.org/1260183008.
  3. 04 Aug, 2015 2 commits
  4. 02 Aug, 2015 1 commit
    • Subzero: Expand the liveness consistency check. · b3bfcbcb
      Jim Stichnoth authored
      Liveness analysis includes a consistency check on each node, to verify that variables referenced in only one block do not appear to be live coming into a block (and are therefore live across multiple blocks).  This check was disabled in the entry block because there might be function arguments that are referenced only in the entry block but are still live coming in.
      
      It seems that this entry-block exclusion has been largely unnecessary for some time.  This is because input arguments and other special variables are now pre-marked as multi-block.  The exclusion masks problems in some single-block lit tests, so it's best if it can be removed.
      
      This CL removes the exclusion, and fixes some minor issues uncovered in the MIPS and ARM target lowering.  A key issue is that when implementing a new target lowering and using --skip-unimplemented to make progress with existing tests, it may be necessary to add FakeDef instructions to avoid liveness inconsistency errors.
      
      Note that when this patch is applied to 448c16f0, it correctly identifies the liveness consistency error (as shown by a "make check-lit" failure) that was fixed in 59f2d925.
      
      BUG= none
      R=jpp@chromium.org
      
      Review URL: https://codereview.chromium.org/1265093002.
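
      The consistency check described in this message can be pictured with the sketch below. The Block struct and findInconsistencies function are hypothetical and use plain string names in place of Subzero's variables and nodes.

      ```cpp
      #include <cassert>
      #include <set>
      #include <string>
      #include <vector>

      // Hypothetical stand-in for a CFG node with its computed live-in set.
      struct Block {
        std::string Name;
        std::set<std::string> LiveIn; // variables live on entry to the block
      };

      // Report every "block:variable" pair where a variable marked as
      // single-block is nevertheless live coming into a block.
      std::vector<std::string>
      findInconsistencies(const std::vector<Block> &Blocks,
                          const std::set<std::string> &SingleBlockVars) {
        std::vector<std::string> Bad;
        for (const Block &B : Blocks)
          for (const std::string &V : B.LiveIn)
            if (SingleBlockVars.count(V))
              Bad.push_back(B.Name + ":" + V);
        return Bad;
      }
      ```

      In this model, a function argument that is pre-marked as multi-block never appears in SingleBlockVars, so the entry block needs no special exclusion.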
  5. 31 Jul, 2015 5 commits
    • Subzero. Buildable, non-functional TargetLoweringX8664. · 453660ff
      John Porto authored
      This CL adds a TargetLoweringX8664 that inherits from TargetX86Base, but
      other than that it does nothing to generate runnable code.
      
      Things that need to be addressed in follow up CLs:
      1) lowerCall
      2) lowerArguments
      3) lowerRet
      4) addPrologue
      5) addEpilogue
      6) Native 64-bit arithmetic
      7) 32- to 64-bit addressing
      
      (7) will be particularly interesting. Pointers in Pexes are always
      32-bit wide, so pexes have a de facto 32-bit address space. In
      Sandboxed mode that's solved by using RZP (i.e., r15) as a base
      register. For native codegen, we still need to decide what to do
      -- very likely we will start targeting X32.
      
      NOTE: This CL also
      
      s/IceType_ForceRexW/RexTypeForceRexW/g
      
      because I forgot to do it in the X8664 assembler cl.
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4077
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1257643004.
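
      As a rough illustration of item (7), the sketch below forms a 64-bit address from a 64-bit base (the role the message assigns to r15 in sandboxed mode) and a 32-bit pexe pointer. The function name sandboxedAddress is an assumption for this example.

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Hypothetical helper: the 32-bit pointer is zero-extended and added to
      // the base, so the result always lies within 4 GiB above Base.
      std::uint64_t sandboxedAddress(std::uint64_t Base, std::uint32_t Ptr) {
        return Base + static_cast<std::uint64_t>(Ptr);
      }
      ```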
    • Subzero. Misc fixes. · 59f2d925
      John Porto authored
      This CL disables the X86 assembler tests by default. They take too long
      to compile, so there's very little point in running them with the other
      unittests.
      
      This CL fixes a bug introduced in
      https://codereview.chromium.org/1260163003/ that caused liveness
      analysis to assert due to an uninitialized Variable.
      
      BUG=
      R=jvoung@chromium.org, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1266863002.
    • ARM: Add a postRA pass to legalize stack offsets. Greedy approach (reserve IP). · 28068adb
      Jan Voung authored
      Make a post-register allocation and post-addProlog pass to
      go through variables with stack offsets and legalize them
      in case the offsets are not encodable. The naive approach
      is to reserve IP, and use IP to movw/movt the offset, then
      add/sub the frame/stack pointer to IP and use IP as the new
      base instead of the frame/stack pointer. We do some amount
      of CSE within a basic block, and share the IP base pointer
      when it is (a) within range for later stack references,
      and (b) IP hasn't been clobbered (e.g., by a function call).
      I chose to do this greedy approach for both Om1 and O2,
      since it should just be a linear pass, and it reduces the
      number of variables/instructions created compared to the
      super-naive peephole approach (so might be faster?).
      
      Introduce a test-only flag and use that to artificially
      bloat the stack frame so that spill offsets are out
      of range for ARM. Use that flag for cross tests to
      stress this new code a bit more (than would have been
      stressed by simply doing a lit test + FileCheck).
      
      Also, the previous version of emitVariable() was using the
      Var's type to determine the range (only +/- 255 for i16,
      vs +/- 4095 for i32), even though mov's emit() always
      uses a full 32-bit "ldr" instead of a 16-bit "ldrh".
      Use a common legality check, which uses the stackSlotType
      instead of the Var's type. This previously caused the
      test_bitmanip to spuriously complain, even though the
      offsets for Om1 were "only" in the 300 byte range. With this
      fixed, we can then enable the test_bitmanip test too.
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1241763002 .
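
      The range check and the movw/movt split mentioned in this message can be sketched as below. The ranges are the ones quoted above (+/- 255 and +/- 4095); the SlotWidth enum and both function names are hypothetical and not Subzero's API.

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Hypothetical width of the access used for a stack slot.
      enum class SlotWidth { Bits16, Bits32 };

      // An immediate offset is encodable if it fits the range of the
      // load/store form that will actually be emitted for the slot.
      bool isLegalStackOffset(SlotWidth W, std::int32_t Offset) {
        const std::int32_t Limit = (W == SlotWidth::Bits16) ? 255 : 4095;
        return Offset >= -Limit && Offset <= Limit;
      }

      // movw materializes the low 16 bits of a constant, movt the high 16.
      struct MovwMovt {
        std::uint16_t Low;
        std::uint16_t High;
      };

      MovwMovt splitOffset(std::uint32_t Offset) {
        return {static_cast<std::uint16_t>(Offset & 0xFFFF),
                static_cast<std::uint16_t>(Offset >> 16)};
      }
      ```

      With these ranges, an offset of 300 is legal for a 32-bit slot access but not for a 16-bit one, which matches the spurious test_bitmanip complaint described above.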
    • Add -reorder-basic-blocks option and fix nop insertion · 969f6a33
      Qining Lu authored
      1. Basic block reordering can be enabled with the -reorder-basic-blocks option.
      
      Blocks will be sorted in reverse post-order, but the next
      node to visit among all candidate children nodes is selected 'randomly'.
      
      Example:
        A
       / \
      B   C
       \ /
        D
      
      This CFG can generate two possible layouts:
      A-B-C-D or A-C-B-D
      
      2. Fix nop insertion

      Add checks to avoid insertions in empty basic blocks (dead blocks) and bundle-locked instructions.
      
      BUG=
      R=jpp@chromium.org, jvoung@chromium.org, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1255303004.
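
      A reverse post-order layout with a randomized choice among successors can be sketched as follows. This is an illustration under assumptions, not the code from this CL: the Graph alias, the function names, and the use of std::mt19937 are all choices made for the example.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <random>
      #include <vector>

      // Successor lists indexed by node number; node 0 is the entry block.
      using Graph = std::vector<std::vector<int>>;

      // Depth-first walk that visits successors in a shuffled order and
      // records each node after all of its unvisited successors.
      static void visit(const Graph &G, int Node, std::vector<bool> &Seen,
                        std::vector<int> &PostOrder, std::mt19937 &Rng) {
        Seen[Node] = true;
        std::vector<int> Succs = G[Node];
        std::shuffle(Succs.begin(), Succs.end(), Rng);
        for (int S : Succs)
          if (!Seen[S])
            visit(G, S, Seen, PostOrder, Rng);
        PostOrder.push_back(Node);
      }

      // Reversing the post-order gives a layout in which the entry comes first.
      std::vector<int> randomizedLayout(const Graph &G, unsigned Seed) {
        std::mt19937 Rng(Seed);
        std::vector<bool> Seen(G.size(), false);
        std::vector<int> Order;
        visit(G, 0, Seen, Order, Rng);
        std::reverse(Order.begin(), Order.end());
        return Order;
      }
      ```

      For the diamond CFG above (A=0, B=1, C=2, D=3), the result is always one of the two layouts listed in the message, depending on which child of A is visited first.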
    • Fix a -Wcovered-switch-default warning in emitJumpTables. · c2648c2d
      Jan Voung authored
      The Subzero build inside of the LLVM build system turns this on.
      
      BUG=none
      R=ascull@google.com
      
      Review URL: https://codereview.chromium.org/1264913005 .
  6. 30 Jul, 2015 2 commits
    • Iasm and obj lowering for advanced switch lowering. · 86df4e9e
      Andrew Scull authored
      Jump table emission is delayed until offsets are known. X86 local jumps can be
      near or far. Sandboxing is applied to indirect jumps from the jump table.
      
      BUG=
      R=stichnot@chromium.org, jvoung
      
      Review URL: https://codereview.chromium.org/1257283004.
    • Subzero: Cleanly implement register allocation after phi lowering. · a3f57b9a
      Jim Stichnoth authored
      After finding a valid linearization of phi assignments, the old approach calls a complicated target-specific method that lowers and ad-hoc register allocates the phi assignments.
      
      In the new approach, we use existing target lowering to lower assignments into mov/whatever instructions, and enhance the register allocator to be able to forcibly spill and reload a register if one is needed but none are available.
      
      The new approach incrementally updates liveness and live ranges for newly added nodes and variable uses, to avoid having to expensively recompute it all.
      
      Advanced phi lowering is enabled now on ARM, and constant blinding no longer needs to be disabled during phi lowering.
      
      Some of the metadata regarding which CfgNode a local variable belongs to needed to be made non-const, in order to add spill/fill instructions to a CfgNode during register allocation.
      
      Most of the testing came from spec2k.  There are some minor differences in the output regarding stack frame offsets, probably related to the order that new nodes are phi-lowered.  The changes related to constant blinding were tested by running spec with "-randomize-pool-immediates=randomize -randomize-pool-threshold=8".
      
      Unfortunately, this appears to add about 10% to the translation time for 176.gcc.  The cost is clear in the -timing output so it can be investigated later.  There is a TODO suggesting the possible cause and solution, for later investigation.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1253833002.
  7. 28 Jul, 2015 2 commits
  8. 23 Jul, 2015 4 commits
  9. 21 Jul, 2015 5 commits
    • Make ARM RegNames[] static like X86 (no ARM syms in X86-only build). · 0dab0324
      Jan Voung authored
      The X86 code was switched out here:
      https://codereview.chromium.org/1216933015/diff/150001/src/IceTargetLoweringX86Base.h
      
      The important bit might be that it's static const char * instead of
      static IceString. This removes static ctor/dtor for that array,
      which LTO doesn't seem to be able to optimize out, leaving ARM
      and MIPS symbols in the X86-only build. After changing it to static
      const char *, LTO is able to optimize out the ARM and MIPS
      symbols in the x86-only build, saving about 3KB of .text and
      a few bytes of .rodata.
      
      BUG=none
      R=jpp@chromium.org
      
      Review URL: https://codereview.chromium.org/1246013004 .
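
      The difference the message relies on can be seen in a small sketch. An array of string literals is constant-initialized, so it needs no static constructor or destructor, whereas an array of objects with a non-trivial constructor (for example std::string) is initialized at startup. The table contents and the getRegName helper below are hypothetical.

      ```cpp
      #include <cassert>
      #include <cstring>

      // Constant-initialized table: only pointers to string literals.
      static const char *const RegNames[] = {"r0", "r1", "r2", "r3"};

      // Look up a name, falling back to a placeholder for bad indices.
      const char *getRegName(unsigned RegNum) {
        constexpr unsigned NumRegs = sizeof(RegNames) / sizeof(RegNames[0]);
        return RegNum < NumRegs ? RegNames[RegNum] : "<invalid>";
      }
      ```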
    • Changes the TargetX8632 to inherit from TargetX86Base<TargetX8632>. · 5aeed955
      John Porto authored
      Previously, TargetX8632 was defined as
      
      class TargetX8632 : public TargetLowering;
      
      and its create method would do
      
      TargetX8632 *TargetX8632::create() {
        return TargetX86Base<TargetX8632>::create();
      }
      
      TargetX86Base<M> was defined as
      
      template <class M> class TargetX86Base : public M;
      
      which meant TargetX8632 had no way to access methods defined in
      TargetX86Base<M>. This used to not be a problem, but with the X8664
      backend around the corner it became obvious that the actual TargetX86
      targets (e.g., X8632, X8664SysV, X8664Win) would need access to some
      methods in TargetX86Base (e.g., _mov, _fld, _fstp etc.)
      
      This CL changes the class hierarchy to something like
      
      TargetLowering <-- TargetX86Base<X8632> <-- X8632
                     <-- TargetX86Base<X8664SysV> <-- X8664SysV (TODO)
                     <-- TargetX86Base<X8664Win> <-- X8664Win (TODO)
      
      One problem with this new design is that TargetX86Base<M> needs to be
      able to invoke methods in the actual backends. For example, each
      backend will have its own way of lowering llvm.nacl.read.tp. This
      creates a chicken/egg problem that is solved with (you guessed it)
      template machinery (some would call it voodoo.)
      
      In this CL, as a proof of concept, we introduce the
      
         TargetX86Base::dispatchToConcrete
      
      template method. It is a very simple method: it downcasts "this" from
      the template base class (TargetX86Base<TargetX8632>) to the actual
      (concrete) class (TargetX8632), and then it invokes the requested
      method. It uses perfect forwarding for passing arguments to the method
      being invoked, and returns whatever that method returns.
      
      A simple proof-of-concept for using dispatchToConcrete is introduced
      with this CL: it is used to invoke createNaClReadTPSrcOperand on the
      concrete target class. In a way, dispatchToConcrete is a poor man's
      virtual method call, without the virtual method call overhead.
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4077
      R=jvoung@chromium.org, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1217443024.
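
      The pattern described in this message can be sketched in a few lines. This is a simplified illustration, not the code from the CL: the class bodies are reduced to stubs, and lowerReadTP and threadPointerOffset are hypothetical member names invented for the example.

      ```cpp
      #include <cassert>
      #include <utility>

      // Stub for the common, non-template base class.
      class TargetLowering {
      public:
        virtual ~TargetLowering() = default;
      };

      template <class Concrete> class TargetX86Base : public TargetLowering {
      protected:
        // Downcast "this" to the concrete target, call the given member
        // function with perfectly forwarded arguments, and return its result.
        template <typename Ret, typename... Params, typename... Args>
        Ret dispatchToConcrete(Ret (Concrete::*Method)(Params...), Args &&...A) {
          return (static_cast<Concrete *>(this)->*Method)(std::forward<Args>(A)...);
        }

      public:
        // Shared lowering code that needs a target-specific answer.
        int lowerReadTP(int Scale) {
          return dispatchToConcrete(&Concrete::threadPointerOffset, Scale);
        }
      };

      class TargetX8632 : public TargetX86Base<TargetX8632> {
      public:
        // Target-specific piece, resolved at compile time without a vtable.
        int threadPointerOffset(int Scale) { return 4 * Scale; }
      };
      ```

      Because the concrete class is a template argument of its own base, the static_cast is resolved at compile time, which is what makes the call cheaper than a virtual method call.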
    • Only run adv-switch test when asm is allowed. · 8c8f3bc1
      Andrew Scull authored
      BUG=
      R=stichnot@chromium.org, jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1248823003.
    • Rename legalizeToVar to the more accurate legalizeToReg. · 97f460dc
      Andrew Scull authored
      BUG=
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1245063003.
    • Fix --filetype=iasm non-pc-rel fixup offsets (double counted). · b7db1a52
      Jan Voung authored
      For pc-rel fixups, we have a ConstantRelocatable referring
      to Foo+0, and the offset "-4" is encoded in the code
      buffer (but not the ConstantRelocatable object). Thus we
      need to load from the code buffer in order to
      get that "-4" instead of just taking the +0 from Foo+0.
      
      For non-pc-rel fixups, we have the ConstantRelocatable
      with a true offset, and we also write that offset into the
      code buffer (for ELF REL and not RELA, it expects the
      offset in the code buffer). In this case, we want to choose
      one and not double-count.
      
      BUG=none
      176.gcc seemed to be failing when compiled with --filetype=iasm...
      load address for 64-bit pointers were +8 instead of +4
      
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1241313002 .
  10. 20 Jul, 2015 1 commit
    • Introduction of improved switch lowering. · 87f80c12
      Andrew Scull authored
      This includes the high level analysis of switches, the x86 lowering,
      the repointing of targets in jump tables and ASM emission of jump
      tables.
      
      The technique uses jump tables, range test and binary search with
      worst case O(lg n) which improves the previous worst case of O(n)
      from a sequential search.
      
      Use is hidden by the --adv-switch flag as the IAS emission still
      needs to be implemented.
      
      BUG=None
      R=jvoung@chromium.org, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1234803007.
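
      The O(lg n) bound comes from the binary search component, which can be sketched as below. The Case struct and findTarget function are hypothetical; the real lowering emits compare-and-branch instructions rather than running a loop at run time, and it combines this with jump tables and range tests.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <vector>

      // One switch case: a value and the index of its target block.
      struct Case {
        std::int64_t Value;
        int Target;
      };

      // Find the target for V among cases sorted by Value. Each step halves
      // the candidate range, so the number of comparisons is O(lg n) rather
      // than the O(n) of a sequential chain of compares.
      int findTarget(const std::vector<Case> &Sorted, std::int64_t V,
                     int DefaultTarget) {
        std::size_t Lo = 0, Hi = Sorted.size();
        while (Lo < Hi) {
          const std::size_t Mid = Lo + (Hi - Lo) / 2;
          if (Sorted[Mid].Value == V)
            return Sorted[Mid].Target;
          if (Sorted[Mid].Value < V)
            Lo = Mid + 1;
          else
            Hi = Mid;
        }
        return DefaultTarget;
      }
      ```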
  11. 16 Jul, 2015 1 commit
    • Factor out prelowerPhi for 32-bit targets. Disable adv phi lowering for ARM. · 53483691
      Jan Voung authored
      This way, prelowerPhi can be shared between 32-bit targets (split 64-bit
      values into 32-bit ones, and legalize undef). Suggestions from template
      experts on how to share prelowerPhi welcome. I'm not particularly happy
      with the first pass in that legalizeUndef has to be made public (though
      other methods used are also public). Also the methods required from the
      template type TargetT aren't clear without looking through the code.
      
      The current advanced phi lowering code depends on lowerPhiAssignments.
      That is a special case of lowerAssign that does some adhoc register
      allocation. The current adhoc register allocation doesn't work as
      well when a target may need to spill more than one register.
      Disable that optimization for ARM for now, until we have a better
      way that works for ARM, and enable O2 cross testing on ARM.
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1223133007 .
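
      The splitting of 64-bit values that prelowerPhi performs for 32-bit targets can be pictured with the sketch below. The Phi struct, the splitWidePhis function, and the ".lo"/".hi" naming are assumptions made for illustration and do not reflect Subzero's data structures.

      ```cpp
      #include <cassert>
      #include <string>
      #include <vector>

      // A phi "Dest = phi(Srcs...)" over named variables (hypothetical type).
      struct Phi {
        std::string Dest;
        std::vector<std::string> Srcs;
        bool Is64; // true when the phi's type is 64 bits wide
      };

      // Replace each 64-bit phi by two 32-bit phis, one over the low halves
      // and one over the high halves; other phis pass through unchanged.
      std::vector<Phi> splitWidePhis(const std::vector<Phi> &In) {
        std::vector<Phi> Out;
        for (const Phi &P : In) {
          if (!P.Is64) {
            Out.push_back(P);
            continue;
          }
          Phi Lo{P.Dest + ".lo", {}, false};
          Phi Hi{P.Dest + ".hi", {}, false};
          for (const std::string &S : P.Srcs) {
            Lo.Srcs.push_back(S + ".lo");
            Hi.Srcs.push_back(S + ".hi");
          }
          Out.push_back(Lo);
          Out.push_back(Hi);
        }
        return Out;
      }
      ```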
  12. 15 Jul, 2015 2 commits
    • Factor out legalization of undef, and handle more cases for ARM. · fbdd2440
      Jan Voung authored
      By factoring out legalizeUndef(), we can use the same
      logic in prelowerPhis which may help if we ever change the
      value used (though if we switch from zero-ing out regs to
      using uninitialized regs, it'll take more work -- e.g.,
      can't return a 64-bit reg).
      
      For x86, use legalizeUndef where it's clear that the value
      is immediately fed to loOperand/hiOperand then another
      legalize() call. Otherwise, leave the general
      X = legalize(X); alone where the code is counting on that
      being the sole legalization.
      
      For x86 legalize(const64) is a pass-through, which can then
      be passed to loOperand/hiOperand nicely. However, for ARM,
      legalize(const64) may end up trying to copy the const64 to
      a register, but we don't have 64-bit registers. Instead do
      legalizeUndef(X) where x86 would have just done
      legalize(X). This happens to work because legalizeUndef
      doesn't try to copy to reg, and we immediately pass the
      result to loOperand/hiOperand() which then passes the
      result to a real legalization call.
      
      Add a few more undef tests.
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1233903002 .
    • Subzero: Fix register encodings. · 728c1d40
      Jim Stichnoth authored
      Specifically, we were ending up with Encoded_Reg_xmm0=0 yet Encoded_Reg_xmm1=10, Encoded_Reg_xmm2=11, etc.
      
      It's a mystery as to why this wasn't triggering any failures with filetype!=asm.
      
      BUG= none
      R=jpp@chromium.org
      
      Review URL: https://codereview.chromium.org/1231973003.
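
      The message does not state the cause of the bad values, so the sketch below only shows the kind of check that would flag them. On x86, xmm0 through xmm7 are encoded as 0 through 7, so a table of xmm encodings should count up from zero. The function name encodingsAreSequential is an assumption for this example.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <vector>

      // Returns true when the encodings are 0, 1, 2, ... in table order.
      bool encodingsAreSequential(const std::vector<int> &Encodings) {
        for (std::size_t I = 0; I < Encodings.size(); ++I)
          if (Encodings[I] != static_cast<int>(I))
            return false;
        return true;
      }
      ```

      A table beginning 0, 10, 11, as in the faulty case described above, fails this check.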
  13. 13 Jul, 2015 1 commit
    • Add a cross include path for ARM to work around clang bug 22937. · 112b6e89
      Jan Voung authored
      Clang appears to be missing an include path to find
      bits/c++config.h so we were unable to compile the
      unsandboxed c++ based cross tests and link against the
      subzero unsandboxed ARM object files.
      
      Work around this for now by finding and including the
      missing path.
      
      Turn on a few ARM cross tests that should be working
      (mem_intrin and test_strengthreduce -- though the
      strength-reduction isn't done for ARM). The test_bitmanip
      still fails, because under Om1 we overflow the stack offset
      and need to materialize that offset with a register first.
      
      Update a few other references that still say x8632.
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
      R=jpp@chromium.org, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1232183002 .
  14. 11 Jul, 2015 1 commit
  15. 10 Jul, 2015 1 commit
  16. 09 Jul, 2015 2 commits
  17. 08 Jul, 2015 1 commit