  1. 02 Oct, 2015 3 commits
    • Change from ::stdout to stderr when reporting fatal error. · 4e6ea83a
      Karl Schimpf authored
      The pnacl linux x86_64 buildbot doesn't understand ::stdout (it uses
      a macro to define stdout). Fix by removing :: prefix. Also redirects
      the error messages to stderr instead of stdout.
      
      BUG=None
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1383053002 .
    • Remove dependence on header file unistd.h. · 7e64eaaa
      Karl Schimpf authored
      Fixes a bug in function reportFatalErrorThenExitSuccess by using fwrite
      instead of write (declared in unistd.h, a POSIX header not supported by MSC).
      
      BUG=None
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1370323005 .
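
      A minimal sketch of the portable approach, assuming a hypothetical helper
      named writeFatalMessage (the body of reportFatalErrorThenExitSuccess is
      not shown in this log). It relies only on <cstdio>, so it needs neither
      unistd.h nor the POSIX write call:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper, not the Subzero function: writes Message to Stream
// with fwrite, which <cstdio> declares on both POSIX toolchains and MSC.
// Returns the number of bytes actually written.
std::size_t writeFatalMessage(std::FILE *Stream, const std::string &Message) {
  assert(Stream != nullptr);
  const std::size_t Written =
      std::fwrite(Message.data(), 1, Message.size(), Stream);
  std::fflush(Stream); // push the text out before the process exits
  return Written;
}
```

      A call such as writeFatalMessage(stderr, "fatal: example\n") names the
      stream without a :: prefix, since stderr may be defined as a macro.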
    • Subzero: Use register availability during lowering to improve the code. · 318f4cda
      Jim Stichnoth authored
      The problem is that given code like this:
      
        a = b + c
        d = a + e
        ...
        ... (use of a) ...
      
      Lowering may produce code like this, at least on x86:
      
        T1 = b
        T1 += c
        a = T1
        T2 = a
        T2 += e
        d = T2
        ...
        ... (use of a) ...
      
      If "a" has a long live range, it may not get a register, resulting in clumsy code in the middle of the sequence like "a=reg; reg=a".  Normally one might expect store forwarding to make the clumsy code fast, but it does presumably add an extra instruction-retirement cycle to the critical path in a pointer-chasing loop, and makes a big difference on some benchmarks.
      
      The simple fix here is, at the end of lowering "a=b+c", keep track of the final "a=T1" assignment.  Then, when lowering "d=a+e" and we look up "a", we can substitute "T1".  This slightly increases the live range of T1, but it does a great job of avoiding the redundant reload of the register from the stack location.
      
      A more general fix (in the future) might be to do live range splitting and let the register allocator handle it.
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095
      R=kschimpf@google.com
      
      Review URL: https://codereview.chromium.org/1385433002 .
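
      The substitution described above can be illustrated with a toy model.
      This is only a sketch under stated assumptions: ToyLowering, lookup, and
      lowerAdd are hypothetical names, instructions are plain strings, and only
      the most recent "Var = Temp" assignment is remembered.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of lowering "Dest = Src0 + Src1" into two-address pseudo-code.
// All names are hypothetical; this is not the Subzero implementation.
struct ToyLowering {
  bool UseAvailability = true;
  std::vector<std::string> Code;
  std::string AvailVar;  // variable most recently assigned from a temporary
  std::string AvailTemp; // temporary that still holds AvailVar's value
  int NextTemp = 1;

  // Returns AvailTemp when Src is known to be available in it, else Src.
  std::string lookup(const std::string &Src) const {
    return (UseAvailability && Src == AvailVar) ? AvailTemp : Src;
  }

  void lowerAdd(const std::string &Dest, const std::string &Src0,
                const std::string &Src1) {
    assert(!Dest.empty() && !Src0.empty() && !Src1.empty());
    const std::string T = "T" + std::to_string(NextTemp++);
    Code.push_back(T + " = " + lookup(Src0));
    Code.push_back(T + " += " + lookup(Src1));
    Code.push_back(Dest + " = " + T);
    // Remember the final "Dest = T" assignment for the next instruction.
    AvailVar = Dest;
    AvailTemp = T;
  }
};
```

      Lowering "a = b + c" followed by "d = a + e" with this model emits
      "T2 = T1" as the fourth instruction instead of "T2 = a", which is the
      reload the change avoids.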
  2. 01 Oct, 2015 7 commits
    • Subzero. Adds I64 register pairs for ARM32. · ed2c06b2
      John Porto authored
      This is in preparation for llvm.nacl.atomic.* lowerings. atomic i64
      loads and stores require their operands to be consecutive registers
      starting at an even register that is not r14.
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
      R=kschimpf@google.com
      
      Review URL: https://codereview.chromium.org/1382063002 .
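
      The pairing constraint can be written as a small predicate. This is a
      sketch only: the function names and the numbering (0 for r0 through 15
      for r15) are assumptions, and it checks nothing beyond the rule quoted
      above.

```cpp
#include <cassert>

constexpr unsigned NumGPRs = 16; // r0 .. r15

// Hypothetical predicate: may register number Base be the first register
// of an i64 pair? It must be even and must not be r14.
constexpr bool isValidI64PairBase(unsigned Base) {
  return Base < NumGPRs && Base % 2 == 0 && Base != 14;
}

// The second register of the pair is the one immediately after Base.
inline unsigned pairSecond(unsigned Base) {
  assert(isValidI64PairBase(Base));
  return Base + 1;
}
```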
    • Subzero. Fixes a bug in the register allocator. · 7cb12682
      John Porto authored
      This bug was uncovered while implementing the llvm.nacl.atomic.cmpxchg
      lowering for i64 for ARM32. For reference, the lowering is
      
      retry:
          ldrexd     tmp_i, tmp_i+1, [addr]
          cmp        tmp_i+1, expected_i+1
          cmpeq      tmp_i, expected_i
          strexdeq   success, new_i, new_i+1, [addr]
          movne      expected_i+1, tmp_i+1
          movne      expected_i, tmp_i
          cmpeq      success, #0
          bne        retry
          mov        dest_i+1, tmp_i+1
          mov        dest_i, tmp_i
      
      The register allocator would allocate r4 to both success and new_i,
      which is clearly wrong (expected_i is alive throughout the cmpxchg loop).
      Adding a fake-use(new_i) after the loop caused the register allocator
      to fail because it could not allocate a register for an infinite-weight
      variable. The problem was caused by not evicting live ranges that were
      assigned registers that alias the selected register.
      
      BUG=
      R=kschimpf@google.com, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1373823006 .
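
      The eviction rule behind the fix can be sketched as follows. The types
      and the alias table are hypothetical stand-ins (registers are plain
      strings here), so this illustrates the idea rather than the allocator's
      actual code:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

struct ToyLiveRange {
  std::string Var; // variable owning the live range
  std::string Reg; // register currently assigned to it
};

// Maps each register to the set of registers it aliases, itself included.
using AliasTable = std::map<std::string, std::set<std::string>>;

// Removes from Active every live range whose register aliases Selected
// (not only those assigned exactly Selected) and returns the evicted ranges.
std::vector<ToyLiveRange> evictAliasing(std::vector<ToyLiveRange> &Active,
                                        const std::string &Selected,
                                        const AliasTable &Aliases) {
  const auto It = Aliases.find(Selected);
  assert(It != Aliases.end() && It->second.count(Selected) == 1);
  const std::set<std::string> &Conflicting = It->second;
  const auto FirstEvicted = std::stable_partition(
      Active.begin(), Active.end(), [&](const ToyLiveRange &LR) {
        return Conflicting.count(LR.Reg) == 0; // keep non-aliasing ranges
      });
  std::vector<ToyLiveRange> Evicted(FirstEvicted, Active.end());
  Active.erase(FirstEvicted, Active.end());
  return Evicted;
}
```

      With an alias set for r4 that also contains a pair register such as
      "r4r5", selecting r4 evicts ranges held in either one.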
    • Subzero. Adds ldrex, strex, and dmb support (ARM32) · 16991847
      John Porto authored
      These instructions are used to load/store data atomically, and to
      notify the processor about a data memory barrier. They are used for
      implementing the llvm.nacl.atomic.* lowerings.
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1378303003 .
    • Add include files so that IceCompilerServer.cpp can compile on MSC. · 166cbf4a
      Karl Schimpf authored
      A recent change to IceCompilerServer.cpp was added to allow fatal
      errors to return exit status zero. However, this code called ::write
      (a C function) that is not defined when compiling with MSC. This CL
      adds includes to fix this problem.
      
      BUG=None
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1379613005 .
    • Subzero: Fix a bug in register allocator overlap computation. · 48e3ae5c
      Jim Stichnoth authored
      When the register allocator decides whether to allow the candidate's live range to overlap its preferred variable's live range (and share their register), it needs to consider whether any redefinitions in one variable occur within the live range of the other variable, in which case overlap should not be allowed.
      
      There was a bug in the API for iterating over the defining instructions for a variable, in which the earliest definition might be ignored in some cases.  This came from the fact that the first definition and latter definitions are split apart for translation speed reasons, and a particular API is needed for finding an unambiguous first definition, which is possible when all definitions are within a single block but not so possible when definitions cross block boundaries.  (This only happens for the simple phi lowering.)
      
      Since both semantics are needed, a separate API is added to support both.
      
      For spec2k, the asm output is identical to before, so this changes nothing.  When translating spec2k with "-O2 -phi-edge-split=0", there is a single minor difference in ammp that actually looks legit in both cases.
      
      However, when testing an upcoming CL, -phi-edge-split=0 triggered the bug, causing gcc and crafty to fail with incorrect output.
      
      This CL also fixes some minor issues, and adds dump output of the instruction definition list when available.
      
      BUG= none
      R=jpp@chromium.org
      
      Review URL: https://codereview.chromium.org/1381563004 .
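
      A simplified sketch of the overlap check, under several assumptions:
      instruction numbers are plain ints, a live range is a single half-open
      interval, and only one direction of the check is modeled (definitions of
      the preferred variable against the candidate's live range). The names
      are hypothetical. The point it illustrates is that every definition,
      including the earliest one, has to be examined.

```cpp
#include <cassert>
#include <vector>

// Half-open interval [Start, End) of instruction numbers.
struct ToyRange {
  int Start;
  int End;
  bool contains(int InstNumber) const {
    return Start <= InstNumber && InstNumber < End;
  }
};

// True if any definition in Defs, the earliest included, lies inside Live.
bool definedWithin(const std::vector<int> &Defs, const ToyRange &Live) {
  assert(Live.Start <= Live.End);
  for (int Def : Defs)
    if (Live.contains(Def))
      return true;
  return false;
}

// Overlap is rejected when the preferred variable is (re)defined while the
// candidate is live, because the two would then stop holding the same value.
bool overlapAllowed(const ToyRange &CandidateLive,
                    const std::vector<int> &PreferredDefs) {
  return !definedWithin(PreferredDefs, CandidateLive);
}
```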
    • Subzero: Add missing content to CfgLocalAllocator. · 91d1b80f
      Jim Stichnoth authored
      The std::list<> implementation used by g++ needs some extra stuff defined in the custom allocator.
      
      This can be smoke-tested with:
      
        make -f Makefile.standalone CXX=g++ LLVM_EXTRA_WARNINGS="-Wno-unknown-pragmas -Wno-unused-parameter -Wno-comment -Wno-enum-compare -Wno-strict-aliasing" STDLIB_FLAGS=
      
      until the link fails for unrelated reasons.
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4325
      R=kschimpf@google.com
      
      Review URL: https://codereview.chromium.org/1367403004 .
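
      For illustration, here is a custom allocator that spells out the full
      set of member types and functions that older standard library
      implementations expect instead of relying on std::allocator_traits
      defaults. ToyAllocator is a hypothetical stand-in that forwards to
      operator new; it is not the CfgLocalAllocator code, and which members
      were actually missing is not stated in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <new>
#include <utility>

template <typename T> struct ToyAllocator {
  using value_type = T;
  using pointer = T *;
  using const_pointer = const T *;
  using reference = T &;
  using const_reference = const T &;
  using size_type = std::size_t;
  using difference_type = std::ptrdiff_t;
  // Lets a container obtain an allocator for its internal node type.
  template <typename U> struct rebind { using other = ToyAllocator<U>; };

  ToyAllocator() = default;
  template <typename U> ToyAllocator(const ToyAllocator<U> &) {}

  pointer allocate(size_type N) {
    return static_cast<pointer>(::operator new(N * sizeof(T)));
  }
  void deallocate(pointer P, size_type) { ::operator delete(P); }

  template <typename U, typename... Args> void construct(U *P, Args &&...A) {
    assert(P != nullptr);
    ::new (static_cast<void *>(P)) U(std::forward<Args>(A)...);
  }
  template <typename U> void destroy(U *P) { P->~U(); }
};

// The allocator is stateless, so any two instances are interchangeable.
template <typename T, typename U>
bool operator==(const ToyAllocator<T> &, const ToyAllocator<U> &) {
  return true;
}
template <typename T, typename U>
bool operator!=(const ToyAllocator<T> &, const ToyAllocator<U> &) {
  return false;
}
```

      A declaration such as std::list<int, ToyAllocator<int>> exercises the
      rebind path, because the list allocates nodes rather than bare ints.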
    • Subzero: Change -asm-verbose output to print more useful info. · 238b4c16
      Jim Stichnoth authored
      Frame offsets for variables are emitted using a symbolic name based on the variable's name.  This makes it a bit easier to digest the asm code.
      
      For example, if variable Foo gets an esp offset 24, asm like this:
        ... 24(%esp) ...
      will instead be emitted like this:
        lv$Foo = 24
        ...
        ... lv$Foo(%esp) ...
      
      Predecessor labels are printed for each basic block.
      
      Loop nest depth is printed for each basic block.  (Would be nice if we had loop header info as well.)
      
      BUG= none
      R=jpp@chromium.org
      
      Review URL: https://codereview.chromium.org/1377323002 .
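
      The symbolic naming can be sketched with a few string helpers. The
      function names are hypothetical; only the "lv$" prefix and the output
      shapes are taken from the example above.

```cpp
#include <cassert>
#include <string>

// Symbolic name used in place of a numeric frame offset.
std::string frameSymbol(const std::string &VarName) {
  assert(!VarName.empty());
  return "lv$" + VarName;
}

// Line that binds the symbol to its offset, e.g. "lv$Foo = 24".
std::string frameSymbolDefinition(const std::string &VarName, int Offset) {
  return frameSymbol(VarName) + " = " + std::to_string(Offset);
}

// Memory operand that uses the symbol, e.g. "lv$Foo(%esp)".
std::string frameOperand(const std::string &VarName,
                         const std::string &BaseReg) {
  return frameSymbol(VarName) + "(%" + BaseReg + ")";
}
```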
  3. 30 Sep, 2015 2 commits
  4. 28 Sep, 2015 2 commits
  5. 26 Sep, 2015 1 commit
    • Subzero: Improve usability of liveness-related tools. · 230d4101
      Jim Stichnoth authored
      1. Rename all identifiers containing "nonkillable" to use the more understandable "redefined".
      
      2. Change inferTwoAddress() to be called inferRedefinition(), and to check *all* instruction source variables (instead of just the first source operand) against the Dest variable.  This eliminates the need for several instances of _set_dest_redefined().  The performance impact on translation time is something like 0.1%, which is dwarfed by the usability gain.
      
      3. Change a cryptic assert in (O2) live range construction to print detailed information on the liveness errors.
      
      4. Change a cryptic assert in (Om1) live range construction to do the same.
      
      BUG= none
      R=jpp@chromium.org
      
      Review URL: https://codereview.chromium.org/1368993004 .
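
      The difference described in item 2 can be shown with a toy instruction
      type. ToyInst and both function names are hypothetical; the sketch only
      contrasts checking the first source operand with checking all of them.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct ToyInst {
  std::string Dest;
  std::vector<std::string> Srcs;
};

// Old-style check: only the first source operand is compared with Dest.
bool destMatchesFirstSource(const ToyInst &I) {
  return !I.Srcs.empty() && I.Srcs.front() == I.Dest;
}

// New-style check: Dest counts as redefined if it appears as any source.
bool destMatchesAnySource(const ToyInst &I) {
  assert(!I.Dest.empty());
  return std::find(I.Srcs.begin(), I.Srcs.end(), I.Dest) != I.Srcs.end();
}
```

      For an instruction whose Dest appears only as the second source, the
      first check misses the redefinition and the second one finds it.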
  6. 25 Sep, 2015 1 commit
  7. 24 Sep, 2015 1 commit
  8. 23 Sep, 2015 2 commits
    • Subzero: Improve handling of alloca instructions of constant size. · 55f931f6
      Jim Stichnoth authored
      PNaCl simplifies varargs calls by creating a known-size argument array with an alloca instruction, and passing the address of that argument array.  These alloca instructions don't necessarily require use of a frame pointer, freeing up the frame pointer register for normal register allocation.
      
      These varargs calls sometimes show up in cold paths of hot functions, so increasing the number of registers available to the register allocator can produce tangible gains.
      
      This patch does a simple recognition of these alloca patterns, and on x86 doesn't force a frame pointer if all alloca instructions are suitable.
      
      Future work is to avoid saving the alloca result as a local variable, and instead rematerialize the address as needed with respect to the stack or frame pointer.
      
      BUG= none
      R=jpp@chromium.org
      
      Review URL: https://codereview.chromium.org/1361803002 .
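
      A rough sketch of the decision, assuming that "suitable" simply means
      the alloca has a compile-time constant size (the patch may apply further
      criteria that are not listed here). ToyAlloca and the function names are
      hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct ToyAlloca {
  bool IsConstantSize;    // false when the size is computed at run time
  std::uint32_t SizeBytes; // meaningful only when IsConstantSize is true
};

// A frame pointer is forced as soon as one alloca is not suitable.
bool needsFramePointer(const std::vector<ToyAlloca> &Allocas) {
  for (const ToyAlloca &A : Allocas)
    if (!A.IsConstantSize)
      return true;
  return false;
}

// Total stack space for the constant-size allocas of a function.
std::uint32_t fixedAllocaBytes(const std::vector<ToyAlloca> &Allocas) {
  assert(!needsFramePointer(Allocas));
  std::uint32_t Total = 0;
  for (const ToyAlloca &A : Allocas)
    Total += A.SizeBytes;
  return Total;
}
```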
    • Subzero: Enable the asm-verbose.ll test for ARM32. · 467a222f
      Jim Stichnoth authored
      BUG= none
      R=jpp@chromium.org
      
      Review URL: https://codereview.chromium.org/1356953003 .
  9. 22 Sep, 2015 4 commits
  10. 21 Sep, 2015 1 commit
  11. 18 Sep, 2015 6 commits
  12. 17 Sep, 2015 2 commits
  13. 16 Sep, 2015 7 commits
  14. 15 Sep, 2015 1 commit
    • Subzero: Add a flag to mock up bounds checking on unsafe references. · ad2989b6
      Jim Stichnoth authored
      The idea is that, before each load or store operation, we add a couple of compares/branches against the load/store address, one for the lower bound and one for the upper bound.  The conditional branches would be to an error-throwing routine, and would never be taken in practice.  The compares might be against an immediate or a global location.  So a load of [reg] will mock-expand to this:
      
        cmp reg, 0
        je label
        cmp reg, 1
        je label
      label:
        mov xxx, [reg]
      
      We also make address mode inference less aggressive, because for a load of e.g. [eax+4*ecx], we can't compare that address expression against anything in any instruction, so we would have to reconstruct the address and undo at least part of the address mode inference.
      
      The bounds-check mock is added for loads, stores, and rmw operations (with an exclusion for stores to the stack for out-arg pushes).  There are probably a small handful of other cases that are missing the bounds check, but if we add the transformation inside legalize(), which is the most obvious place, we may add extra bounds checks because sometimes legalize() is called twice on the same operand.
      
      BUG= none
      R=ascull@google.com
      
      Review URL: https://codereview.chromium.org/1338633005 .
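
      The mock expansion shown above can be generated by a small helper. This
      is a sketch with hypothetical names that emits text only; the immediates
      0 and 1 are copied from the example and stand in for the lower and upper
      bounds.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the mock bounds-checked sequence for "mov Dest, [AddrReg]".
// Both branches target Label, which sits directly before the load, so
// neither branch changes behavior.
std::vector<std::string> mockBoundsCheckedLoad(const std::string &Dest,
                                               const std::string &AddrReg,
                                               const std::string &Label) {
  assert(!Dest.empty() && !AddrReg.empty() && !Label.empty());
  return {
      "  cmp " + AddrReg + ", 0", // stand-in for the lower-bound compare
      "  je " + Label,
      "  cmp " + AddrReg + ", 1", // stand-in for the upper-bound compare
      "  je " + Label,
      Label + ":",
      "  mov " + Dest + ", [" + AddrReg + "]",
  };
}
```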