1. 15 Sep, 2015 2 commits
    • Subzero: Fix labels for block profiling. · e7dbc0bc
      Jim Stichnoth authored
      The problem is that the block profiling pass runs at the very beginning and commits to particular label strings, but the actual label names might change by emission time because of node reordering.
      
      There was actually something of a workaround: given a label string from the profile output, inspect the *profiled* asm code and search for the block containing the increment of the counter location, since the name of the counter location label is related to the label string in the profile output. However, it's tedious to mentally filter out the counter update code, and the counter update code has a huge impact on register allocation.
      
      The solution is to use a persistent number in CfgNode for constructing the label string, which doesn't change when the nodes are reordered.
      
      One note (independent of this change): Without block profiling, empty basic blocks are deleted and don't appear in the asm output.  But with block profiling, these blocks are never empty because they contain profile update instructions.  This means the profile output may contain labels that don't exist in the non-profiled asm.
      
      Another note: New nodes created as a result of edge splitting from advanced phi lowering are not profiled.
      
      BUG= none
      R=ascull@google.com, jpp@chromium.org
      
      Review URL: https://codereview.chromium.org/1341613002 .
  2. 14 Sep, 2015 2 commits
  3. 11 Sep, 2015 1 commit
  4. 09 Sep, 2015 2 commits
  5. 08 Sep, 2015 2 commits
  6. 04 Sep, 2015 3 commits
  7. 03 Sep, 2015 1 commit
  8. 31 Aug, 2015 1 commit
  9. 28 Aug, 2015 1 commit
  10. 25 Aug, 2015 2 commits
  11. 21 Aug, 2015 2 commits
  12. 20 Aug, 2015 3 commits
    • Use separate random number generator for each randomization pass · aee5fa8d
      Qining Lu authored
      This removes the random number generator from the GlobalContext class and decouples the different randomization passes.
      
      1. Add a new constructor for the random number generator, which merges three arguments into one seed for the underlying random number generator implementation.
      
      RandomNumberGenerator(uint64_t Seed, RandomizationPassesEnum RandomizationPassID, uint64_t Salt=0)
      
      param Seed: Should be the global random number seed passed through command line.
      param RandomizationPassID: Should be the ID for different randomization passes.
      param Salt: Should be an additional integer salt; defaults to 0.
      
      2. Move the creation of random number generators to the call sites of the randomization passes. Each randomization pass creates its own random number generator with a specific salt value.
      
      Function reordering:           Salt = 0 (default)
      Basic Block reordering:        Salt = Function Sequence Number
      Global Variable reordering:    Salt = 0 (default)
      Pooled Constants reordering:   Salt = Constants' Kind value (return of getKind())
        *Jump Tables:                Salt = 0
      Nop Insertion:                 Salt = Function Sequence Number
      Register Alloc Randomization:  Salt = (Function Sequence Number << 1) ^ (Kind == RAK_Phi ? 0u : 1u)
      Constants Blinding:            Salt = Function Sequence Number
      
      *Jump tables are treated as pooled constants, but without Kind value as salt.
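
      The seeding scheme above can be sketched as follows. This is not Subzero's implementation: the pass enumerators, the `RAK_Global` kind, and the splitmix64-style mixing function are assumptions chosen for illustration. Only the register allocation salt formula is taken from the table.

```cpp
#include <cstdint>

// Hypothetical pass IDs; the real RandomizationPassesEnum may differ.
enum RandomizationPassesEnum : std::uint64_t {
  RPE_FunctionReordering,
  RPE_BasicBlockReordering,
  RPE_GlobalVariableReordering,
  RPE_PooledConstantReordering,
  RPE_NopInsertion,
  RPE_RegAllocRandomization,
  RPE_ConstantBlinding,
};

// splitmix64 finalizer, used here only as an example mixing function.
std::uint64_t mix64(std::uint64_t X) {
  X += 0x9E3779B97F4A7C15ULL;
  X = (X ^ (X >> 30)) * 0xBF58476D1CE4E5B9ULL;
  X = (X ^ (X >> 27)) * 0x94D049BB133111EBULL;
  return X ^ (X >> 31);
}

// Folds the global seed, the pass ID, and the salt into a single seed.
std::uint64_t combineSeed(std::uint64_t Seed, RandomizationPassesEnum PassID,
                          std::uint64_t Salt = 0) {
  return mix64(mix64(mix64(Seed) ^ PassID) ^ Salt);
}

// Hypothetical register allocation kinds; RAK_Phi appears in the table.
enum RegAllocKind { RAK_Global, RAK_Phi };

// Register Alloc Randomization salt, as given in the table.
std::uint64_t regAllocSalt(std::uint64_t FunctionSeq, RegAllocKind Kind) {
  return (FunctionSeq << 1) ^ (Kind == RAK_Phi ? 0u : 1u);
}
```

      Each step of `mix64` is a bijection on 64-bit values, so in this sketch changing only the pass ID, or only the salt, always yields a different combined seed.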
      
      BUG=
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1300993002.
    • Inline memmove for small constant sizes and refactor memcpy and memset. · cfa628b5
      Andrew Scull authored
      The memory intrinsics are only optimized at -O1 and higher, unless the
      -fmem-intrin-opt flag is set to force the optimization to take place.
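
      One common way to make an inlined memmove safe for overlapping buffers is to load every source byte into registers before issuing any store. The sketch below models that in plain C++. The commit does not spell out its lowering, so both the strategy and the 16-byte threshold (`kMaxInlineMemmoveBytes`, a hypothetical name) are assumptions.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical threshold; the commit does not state the size limit used.
constexpr std::size_t kMaxInlineMemmoveBytes = 16;

// Models an inlined memmove for a small, constant Size: every byte is loaded
// into "registers" (a local buffer here) before any byte is stored, so
// overlapping Src/Dst ranges are handled correctly. Returns false when Size
// is too large, i.e. when a library call would be emitted instead.
bool inlineSmallMemmove(std::uint8_t *Dst, const std::uint8_t *Src,
                        std::size_t Size) {
  if (Size > kMaxInlineMemmoveBytes)
    return false;
  std::uint8_t Regs[kMaxInlineMemmoveBytes];
  for (std::size_t I = 0; I < Size; ++I)
    Regs[I] = Src[I];  // all loads first
  for (std::size_t I = 0; I < Size; ++I)
    Dst[I] = Regs[I];  // then all stores
  return true;
}
```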
      
      This change also introduces the xchg instruction for two register operands. It
      is no longer used in the memory intrinsic lowering (or by anything else), but the
      implementation is left for future use.
      
      BUG=
      R=jvoung@chromium.org, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1278173009.
    • Change to use arena allocation for function-local data in parser. · 209318af
      Karl Schimpf authored
      Changes the vectors in the function parser to use the arena allocator
      of the CFG associated with the function.
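
      A rough sketch of the general technique, assuming a bump-pointer arena plus a standard-library allocator adapter is representative. `Arena`, `ArenaAllocator`, and `ArenaVector` are hypothetical names, not Subzero's classes. Vectors allocated this way never free memory individually; everything is released at once when the arena (conceptually, the function's CFG) is destroyed.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

constexpr std::size_t kDefaultSlabBytes = 4096;

// Hypothetical bump-pointer arena standing in for the CFG's arena allocator.
// Assumes requested alignments never exceed alignof(std::max_align_t).
class Arena {
public:
  void *allocate(std::size_t Bytes, std::size_t Align) {
    std::size_t Offset = (Used + Align - 1) / Align * Align;
    if (Slabs.empty() || Offset + Bytes > SlabBytes) {
      // Start a new slab that is large enough for this request.
      SlabBytes = Bytes > kDefaultSlabBytes ? Bytes : kDefaultSlabBytes;
      std::size_t Units = (SlabBytes + sizeof(std::max_align_t) - 1) /
                          sizeof(std::max_align_t);
      Slabs.emplace_back(new std::max_align_t[Units]);
      Offset = 0;
    }
    Used = Offset + Bytes;
    return reinterpret_cast<char *>(Slabs.back().get()) + Offset;
  }
  std::size_t numSlabs() const { return Slabs.size(); }

private:
  std::vector<std::unique_ptr<std::max_align_t[]>> Slabs;
  std::size_t SlabBytes = 0; // size of the current slab
  std::size_t Used = 0;      // bytes used in the current slab
};

// Minimal allocator adapter so standard containers can use the arena.
template <typename T> struct ArenaAllocator {
  using value_type = T;
  Arena *A;
  explicit ArenaAllocator(Arena &Ar) : A(&Ar) {}
  template <typename U>
  ArenaAllocator(const ArenaAllocator<U> &Other) : A(Other.A) {}
  T *allocate(std::size_t N) {
    return static_cast<T *>(A->allocate(N * sizeof(T), alignof(T)));
  }
  // Individual frees are no-ops; memory is reclaimed with the arena.
  void deallocate(T *, std::size_t) {}
};

template <typename T, typename U>
bool operator==(const ArenaAllocator<T> &X, const ArenaAllocator<U> &Y) {
  return X.A == Y.A;
}
template <typename T, typename U>
bool operator!=(const ArenaAllocator<T> &X, const ArenaAllocator<U> &Y) {
  return !(X == Y);
}

template <typename T> using ArenaVector = std::vector<T, ArenaAllocator<T>>;
```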
      
      BUG=None
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1293343003 .
  13. 17 Aug, 2015 1 commit
    • Restore function-local variables to use a vector. · 7a99327d
      Karl Schimpf authored
      CL 1282523002 changed the bitcode parser from using a vector to using
      an unordered map. This was done because a local variable could be
      forward referenced, and the parser would then freeze the computer trying
      to allocate a vector large enough to contain the index.
      
      This patch goes back to using vectors. To handle forward variable
      references, we use the number of bytes in the function to determine if
      the index is possible. This stops very large (problematic) vector
      resizes from happening.
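
      The bounded resize can be sketched like this. The specific bound (at most one local value per bit of the function block) and the names `maxLocalValues` and `getLocalSlot` are hypothetical. The point is only that an implausibly large index is rejected before the vector is resized.

```cpp
#include <cstddef>
#include <vector>

struct Value {};  // hypothetical placeholder for a parsed local value

// Hypothetical bound: assume each local value needs at least one bit of the
// function block, so NumBytesInFunction bytes allow at most 8x that many.
std::size_t maxLocalValues(std::size_t NumBytesInFunction) {
  return NumBytesInFunction * 8;
}

// Returns the slot for Index, growing the vector only if the index is
// plausible; an impossible forward reference yields nullptr (an error).
Value **getLocalSlot(std::vector<Value *> &Locals, std::size_t Index,
                     std::size_t NumBytesInFunction) {
  if (Index >= maxLocalValues(NumBytesInFunction))
    return nullptr;  // caller reports the error; no huge resize happens
  if (Index >= Locals.size())
    Locals.resize(Index + 1, nullptr);
  return &Locals[Index];
}
```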
      
      BUG=None
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1293923002 .
  14. 14 Aug, 2015 1 commit
    • Change tracking of basic blocks (within function) to use a vector. · 98ed4464
      Karl Schimpf authored
      Changes the code to "preallocate" basic blocks in a vector, rather
      than creating them dynamically on demand. This has the advantage of not
      requiring basic blocks to be sorted after the bitcode is parsed.
      
      This also means that the names of the basic blocks remain constant,
      even during parsing, making debugging easier.
      
      The drawback is that the DECLAREBLOCKS bitcode record of a function
      block can define a very large number of basic blocks. To control this,
      we look at the function block size (within the bitstream) to determine
      the maximal number of basic blocks that could be defined. If the
      DECLAREBLOCKS record specifies a number larger than this, we generate
      an error and recover (if applicable).
      
      We also add a cleanup test that confirms the number of declared basic
      blocks corresponds to the number of basic blocks defined in the
      function.
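
      A sketch of the declare/define/verify flow described above, with hypothetical names throughout. `MaxBlocks` stands in for whatever limit is derived from the function block size. The `__N` block naming is an assumption, used only to show that names are fixed at declaration time.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Block {
  std::string Name;
  bool Defined = false;
};

class FunctionBlocks {
public:
  // Preallocates Count blocks, rejecting counts the block size can't support.
  bool declareBlocks(std::size_t Count, std::size_t MaxBlocks) {
    if (Count > MaxBlocks)
      return false;  // caller reports an error and recovers if it can
    Blocks.resize(Count);
    for (std::size_t I = 0; I < Count; ++I)
      Blocks[I].Name = "__" + std::to_string(I);  // hypothetical naming
    return true;
  }
  // Marks block Index as defined by the function body.
  bool defineBlock(std::size_t Index) {
    if (Index >= Blocks.size())
      return false;
    Blocks[Index].Defined = true;
    return true;
  }
  // Cleanup check: every declared block must actually be defined.
  bool allDeclaredBlocksDefined() const {
    for (const Block &B : Blocks)
      if (!B.Defined)
        return false;
    return true;
  }
  const std::string &name(std::size_t Index) const { return Blocks[Index].Name; }

private:
  std::vector<Block> Blocks;
};
```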
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4261
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1297433002 .
  15. 12 Aug, 2015 2 commits
  16. 10 Aug, 2015 2 commits
    • Subzero: Misc fixes/cleanup. · 992f91dd
      Jim Stichnoth authored
      1. Fix MINIMAL build.
        (a) Add a void cast to a var only used in asserts.
        (b) Use "REQUIRES:" instead of "REQUIRES" in a .ll file.
      2. Use StrError instead of StrDump for errors.
      3. Use a lambda instead of a functor because C++11.
      4. Explicit check for -filetype=obj in a non-dump-enabled build, to avoid cryptic downstream error messages.
      5. Run "make format" which was neglected earlier.
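
      Items 1(a) and 3 in a small, self-contained form. The functions are hypothetical examples, not code from the commit: the `(void)` cast keeps a variable that is only read inside `assert()` from triggering an unused-variable warning when `NDEBUG` is defined, and the lambda replaces what would otherwise be a separately declared comparison functor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Item 1(a): Total is only read by assert(), which compiles away under
// NDEBUG; the void cast keeps that build free of unused-variable warnings.
int countNonEmpty(const std::vector<std::string> &Names) {
  int Count = 0;
  for (const std::string &N : Names)
    if (!N.empty())
      ++Count;
  const std::size_t Total = Names.size();
  (void)Total;
  assert(static_cast<std::size_t>(Count) <= Total);
  return Count;
}

// Item 3: a C++11 lambda instead of a hand-written comparison functor.
void sortByLength(std::vector<std::string> &Names) {
  std::sort(Names.begin(), Names.end(),
            [](const std::string &A, const std::string &B) {
              return A.size() < B.size();
            });
}
```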
      
      BUG= none
      R=kschimpf@google.com
      
      Review URL: https://codereview.chromium.org/1284493003.
    • Fix processing of local variable indices in function blocks. · c6acf08f
      Karl Schimpf authored
      The previous code used a vector to hold local values associated with
      indices in the bitcode file. The problem was that the vector would be
      expanded to match the index of a "variable index forward reference".
      If the index was very large, the program would freeze the computer
      trying to allocate an array large enough to contain the index.
      
      This patch fixes this by using a local unordered map instead of a
      vector. Hence, forward index references just add a single entry into
      the map.
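
      A minimal sketch of the map-based approach, with a hypothetical `LocalValueTable` and a placeholder `Value` type. Note that CL 1293923002 (listed above) later went back to vectors, with a bound derived from the function's size.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

struct Value {
  std::uint32_t Index;  // hypothetical placeholder for a parsed local value
};

// Local values keyed by bitcode index: a forward reference to a huge index
// inserts exactly one entry instead of resizing a vector up to that index.
class LocalValueTable {
public:
  // Returns the value for Index, creating a placeholder on first reference.
  Value &getOrCreate(std::uint32_t Index) {
    return Table.emplace(Index, Value{Index}).first->second;
  }
  std::size_t size() const { return Table.size(); }

private:
  std::unordered_map<std::uint32_t, Value> Table;
};
```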
      
      Note that this fix doesn't have a corresponding issue. However, the
      problem was made apparent from the problems noted in issues 4257 and
      4261.
      
      BUG=None
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1282523002 .
  17. 09 Aug, 2015 1 commit
    • Add the ARM32 FP register table entries, simple arith, and args. · 86ebec12
      Jan Voung authored
      Lower some instructions, without much guarantee of
      correctness. *Running* generated code will be risky
      because the register allocator isn't aware of register
      aliasing.
      
      Fill in v{add,div,mul,sub}.f{32,64}, vmov, vldr
      and vsqrt.f{32,64}. I tried to make the nacl-other-intrinsics
      test not explode, so I added vsqrt too. That was pretty
      easy for sqrt, but then fabs tests also exploded. Those are not
      truly fixed but are currently "fixed" by adding a FakeDef to
      satisfy liveness.
      
      Propagate float/double arguments to the right register
      in lowerArguments, lowerCall, and propagate to s0/d0/q0
      for lowerReturn. May need to double check the calling convention.
      Currently can't test call-ret because vpush/vpop for prologues
      and epilogues isn't done.
      
      Legalize FP immediates to make the nacl-other-intrinsics sqrt
      test happy. Use the correct type of load (vldr (.32 and .64 are
      optional) instead of ldr{b,h,,d}).
      
      Whether or not the float/vector instructions can be
      predicated is a bit interesting. The float/double ones
      can, but the SIMD versions cannot. E.g.
      
      vadd<cond>.f32 s0, s0, s1 is okay
      vadd<cond>.f32 q0, q0, q1 is not okay.
      
      For now, just omit conditions from instructions that may
      end up being reused for SIMD.
      
      Split up the fp.pnacl.ll test into multiple ones so that
      parts of lowering can be tested incrementally.
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1266263003 .
  18. 08 Aug, 2015 1 commit
  19. 07 Aug, 2015 3 commits
    • Fix processing of global variable indices in the global vars block. · aa0ce790
      Karl Schimpf authored
      The code used to use a vector to hold global variables associated with
      indices. The problem was that the count record in the global vars
      block would generate variables for the given count (even if very
      large).
      
      To fix this, we created a local unordered map to associate indices
      with defined/referenced globals. After processing the global vars
      block, this unordered map is used to verify the size makes sense, and
      then install the recognized global variables into the (top-level)
      contents.
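
      The collect-then-verify approach might look like the following sketch. `GlobalVarsCollector` and its methods are hypothetical, and the check used here (indices must be exactly 0..N-1, with N equal to the declared count) is an assumption about what "the size makes sense" means.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct GlobalVar {
  std::string Name;  // hypothetical placeholder
};

class GlobalVarsCollector {
public:
  // The COUNT record is only remembered; nothing is preallocated from it.
  void setDeclaredCount(std::uint64_t Count) { DeclaredCount = Count; }
  GlobalVar &get(std::uint32_t Index) { return Vars[Index]; }

  // After the block: verify the indices, then install them in index order.
  // On failure, Out is left untouched.
  bool install(std::vector<GlobalVar> &Out) const {
    if (Vars.size() != DeclaredCount)
      return false;
    std::vector<GlobalVar> Result;
    Result.reserve(Vars.size());
    for (std::uint32_t I = 0; I < Vars.size(); ++I) {
      auto It = Vars.find(I);
      if (It == Vars.end())
        return false;  // a gap in the indices
      Result.push_back(It->second);
    }
    Out.swap(Result);
    return true;
  }

private:
  std::uint64_t DeclaredCount = 0;
  std::unordered_map<std::uint32_t, GlobalVar> Vars;
};
```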
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4257
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1278793006 .
    • Inline memcpy for small constant sizes. · 9df4a379
      Andrew Scull authored
      Combined with memset inlining, this has shown an improvement of over 11% on
      the eon benchmark. This is the only C++ program in spec2k, and it seems C++
      programs have a significantly larger number of memset/memcpy calls. Other
      benchmarks also showed improvement of ~5% (perlbmk, parser) while most had
      a 1-2% improvement.
      
      This commit also includes a refactoring of lowerMemset that makes it much more
      readable and removes the fake use of the destination pointer register.
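
      One simple way to lower a small constant-size memcpy is to split the size into the widest available moves first. The sketch below only plans those moves. The widths, the 32-byte cutoff, and the name `planInlineCopy` are assumptions, and the actual lowering in this commit may choose moves differently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical threshold: the commit does not state the largest size inlined.
constexpr std::size_t kMaxInlineCopyBytes = 32;

// Plans the moves for copying a constant Size bytes, widest first (a 16-byte
// vector move, then 8, 4, 2 and 1-byte moves). Returns false when Size is too
// large to inline, in which case a call to memcpy would be emitted instead.
bool planInlineCopy(std::size_t Size, std::vector<std::size_t> &Chunks) {
  Chunks.clear();
  if (Size > kMaxInlineCopyBytes)
    return false;
  const std::size_t Widths[] = {16, 8, 4, 2, 1};
  for (std::size_t W : Widths) {
    while (Size >= W) {
      Chunks.push_back(W);
      Size -= W;
    }
  }
  return true;
}
```

      For example, a 15-byte copy becomes 8 + 4 + 2 + 1 byte moves, and a 20-byte copy becomes 16 + 4.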
      
      BUG=
      R=jvoung@chromium.org, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1279833005.
    • Subzero: Completely remove tracking of stack pointer live range. · f9df4523
      Jim Stichnoth authored
      Specifically, if a variable is marked with IgnoreLiveness=true, then:
        1. Completely avoid adding any segments to its live range
        2. Assert that no one tries to add segments to its live range
      
      This is done in part by incorporating Variable::IgnoreLiveness into Liveness::RangeMask.
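
      A simplified model of the mechanism, using hypothetical `Variable`, `buildRangeMask`, and `addSegmentToTracked` definitions rather than Subzero's: variables with `IgnoreLiveness` set are masked out up front, and adding a segment to one of them trips an assertion.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified model; Subzero's Variable and Liveness differ.
struct Variable {
  bool IgnoreLiveness = false;                 // e.g. the stack pointer
  std::vector<std::pair<int, int>> LiveRange;  // [Start, End) segments

  void addLiveRangeSegment(int Start, int End) {
    assert(!IgnoreLiveness && "no segments may be added to this variable");
    LiveRange.emplace_back(Start, End);
  }
};

// RangeMask[I] says whether variable I's live range is tracked at all.
std::vector<bool> buildRangeMask(const std::vector<Variable> &Vars) {
  std::vector<bool> Mask(Vars.size());
  for (std::size_t I = 0; I < Vars.size(); ++I)
    Mask[I] = !Vars[I].IgnoreLiveness;
  return Mask;
}

// Liveness consults the mask, so ignored variables never receive segments.
void addSegmentToTracked(std::vector<Variable> &Vars,
                         const std::vector<bool> &Mask, int Start, int End) {
  for (std::size_t I = 0; I < Vars.size(); ++I)
    if (Mask[I])
      Vars[I].addLiveRangeSegment(Start, End);
}
```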
      
      Also, change a functor into a lambda because C++11.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1273823003.
  20. 06 Aug, 2015 4 commits
  21. 05 Aug, 2015 3 commits
    • Subzero: Fix an Om1 crash from memset lowering. · f6f9825e
      Jim Stichnoth authored
      With a certain combination of memset arguments, legalizeToReg() is called but its result is unused.  Om1 register allocation does not like this because it sees an infinite-weight variable with a definition but no uses.  The simplest fix is to add a fake use.
      
      The problem shows up when building spec2k with -Om1.
      
      BUG= none
      R=ascull@google.com
      
      Review URL: https://codereview.chromium.org/1272823004.
    • Subzero: Slight improvement to phi lowering. · 552490c2
      Jim Stichnoth authored
      When doing post phi lowering register allocation, the original code limited register allocation to pre-colored or infinite-weight variables with a non-empty live range within the new edge-split nodes.  This limitation ends up missing some opportunities.  Specifically, when a temporary is introduced to break a dependency cycle, e.g.:
        // a = phi(b)
        // b = phi(a)
        t = a
        a = b
        b = t
      then t was always stack-allocated, even if a physical register was available.
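
      The temporary in the example comes from sequentializing a parallel copy. The sketch below is the textbook algorithm, not Subzero's code: emit any move whose destination is no longer read by a pending move, and when only cycles remain, save one destination into a temporary and redirect its readers. `Move` and `sequentialize` are hypothetical names, and all destinations are assumed to be distinct.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// One move of a parallel copy: Dst receives the old value of Src.
struct Move {
  std::string Dst;
  std::string Src;
};

// Turns a parallel copy (all Dst distinct) into an equivalent sequence of
// ordinary moves, using Tmp to break cycles such as a = b, b = a.
std::vector<Move> sequentialize(std::vector<Move> Pending,
                                const std::string &Tmp) {
  // Self-moves (x = x) are no-ops.
  Pending.erase(std::remove_if(Pending.begin(), Pending.end(),
                               [](const Move &M) { return M.Dst == M.Src; }),
                Pending.end());
  std::vector<Move> Out;
  while (!Pending.empty()) {
    bool Emitted = false;
    for (std::size_t I = 0; I < Pending.size() && !Emitted; ++I) {
      bool DstStillRead = false;
      for (const Move &Other : Pending)
        if (Other.Src == Pending[I].Dst)
          DstStillRead = true;
      if (!DstStillRead) {
        // Safe to overwrite Dst: no pending move needs its old value.
        Out.push_back(Pending[I]);
        Pending.erase(Pending.begin() + I);
        Emitted = true;
      }
    }
    if (!Emitted) {
      // Only cycles remain: save one destination in Tmp and make its
      // readers read Tmp instead, which turns the cycle into a chain.
      const std::string Saved = Pending.front().Dst;
      Out.push_back({Tmp, Saved});
      for (Move &M : Pending)
        if (M.Src == Saved)
          M.Src = Tmp;
    }
  }
  return Out;
}
```

      For the a/b swap, this produces `t = a; a = b; b = t`, matching the sequence shown in the commit message.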
      
      In the new design, the RangeMask bitvector specifies which variables should have their live ranges tracked and computed.  For normal liveness analysis, all variables are tracked.  For post phi lowering liveness analysis, all variables created from phi lowering, plus all pre-colored variables, plus all infinite-weight variables, are tracked.
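
      The two tracking modes can be sketched as a mask-building function. `VarInfo`, `LivenessMode`, and `buildRangeMask` are hypothetical stand-ins. Only the selection rule itself comes from the paragraph above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-variable flags (Subzero's Variable class differs).
struct VarInfo {
  bool FromPhiLowering = false;  // created while lowering phis
  bool PreColored = false;       // already bound to a physical register
  bool InfiniteWeight = false;   // must be assigned a register
};

enum class LivenessMode { Normal, PostPhiLowering };

// Builds the RangeMask: which variables get their live ranges computed.
std::vector<bool> buildRangeMask(const std::vector<VarInfo> &Vars,
                                 LivenessMode Mode) {
  std::vector<bool> Mask(Vars.size(), true);  // Normal mode tracks everything
  if (Mode == LivenessMode::PostPhiLowering)
    for (std::size_t I = 0; I < Vars.size(); ++I)
      Mask[I] = Vars[I].FromPhiLowering || Vars[I].PreColored ||
                Vars[I].InfiniteWeight;
  return Mask;
}
```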
      
      The result is slightly better code quality, and sometimes the frame size is 1 or 2 words smaller.
      
      The hope was to narrow the 10% translation-time degradation in pnacl-llc.pexe compared to the old, hackish way of phi lowering, but that goal still proves to be elusive.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1271923002.
    • Subzero. Implements x86-64 lowerCall. · e0d9afa8
      John Porto authored
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4077
      R=jvoung@chromium.org, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1266673003.