1. 20 Dec, 2014 1 commit
  2. 19 Dec, 2014 2 commits
    • Subzero: Use CFG-local arena allocation for relevant containers. · 31c95590
      Jim Stichnoth authored
      In particular, node lists for in and out edges of a CfgNode, and the live range segment list in a Variable.  This is done by making the Cfg allocator globally available through TLS, and providing the STL containers with an allocator struct that uses this.
      
      This also cleans up some other allocation-related issues:
      
      * The allocator is now hung off the Cfg via a pointer, rather than being embedded into the Cfg.  This allows a const Cfg pointer to be stored in TLS while still allowing its allocator to be mutated.
      
      * Cfg is now created via a static create() method.
      
      * The redundant Cfg::allocateInst<> methods are removed.
      
      * The Variable::asType() method allocates a whole new Variable from the Cfg arena, rather than allocating it on the stack, removing the need for the move constructor in Variable and Operand.  This is OK since asType() is only used for textual asm emission.
      
      * The same 1MB arena allocator is now used by the assembler as well.  The fact that it wasn't changed over to be the same as Cfg and GlobalContext was an oversight.  (It turns out this adds ~3MB to the translator memory footprint, so that could be tuned later.)
      
      BUG= none
      R=jfb@chromium.org, jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/802183004
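
The arrangement described above (a per-Cfg arena reached through TLS, plus an allocator struct handed to STL containers) can be sketched roughly as follows. This is a minimal illustration, not Subzero's actual code: `Arena`, `CfgLocalAllocator`, `setCurrentCfg()`, `getCurrentCfg()`, and `slabCount()` are hypothetical names, and the arena is a simplified stand-in for whatever bump-pointer allocator the real Cfg uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <vector>

// Hypothetical bump-pointer arena; memory is released only when it is destroyed.
class Arena {
public:
  explicit Arena(std::size_t SlabSize = 1 << 20) : SlabSize(SlabSize) {}
  Arena(const Arena &) = delete;
  Arena &operator=(const Arena &) = delete;
  ~Arena() {
    for (char *S : Slabs)
      std::free(S);
  }
  // Align must be a power of two no larger than malloc's alignment.
  void *allocate(std::size_t Bytes, std::size_t Align) {
    std::size_t Offset = (Used + Align - 1) & ~(Align - 1);
    if (Slabs.empty() || Offset + Bytes > Capacity) {
      Capacity = Bytes > SlabSize ? Bytes : SlabSize;
      void *Slab = std::malloc(Capacity);
      if (!Slab)
        throw std::bad_alloc();
      Slabs.push_back(static_cast<char *>(Slab));
      Offset = 0;
    }
    Used = Offset + Bytes;
    return Slabs.back() + Offset;
  }
  std::size_t slabCount() const { return Slabs.size(); }

private:
  std::size_t SlabSize;
  std::size_t Capacity = 0;
  std::size_t Used = 0;
  std::vector<char *> Slabs;
};

class Cfg {
public:
  // Construction goes through a static create() method.
  static std::unique_ptr<Cfg> create() { return std::unique_ptr<Cfg>(new Cfg()); }
  // The arena hangs off a pointer, so a const Cfg can still hand out a
  // mutable allocator.
  Arena *getAllocator() const { return Allocator.get(); }
  static void setCurrentCfg(const Cfg *Func) { CurrentCfg = Func; }
  static const Cfg *getCurrentCfg() { return CurrentCfg; }

private:
  Cfg() : Allocator(new Arena()) {}
  std::unique_ptr<Arena> Allocator;
  static thread_local const Cfg *CurrentCfg;
};
thread_local const Cfg *Cfg::CurrentCfg = nullptr;

// Minimal C++11 allocator that forwards to the current thread's Cfg arena.
template <typename T> struct CfgLocalAllocator {
  using value_type = T;
  CfgLocalAllocator() = default;
  template <typename U> CfgLocalAllocator(const CfgLocalAllocator<U> &) {}
  T *allocate(std::size_t N) {
    const Cfg *Func = Cfg::getCurrentCfg();
    assert(Func && "no Cfg registered in TLS");
    return static_cast<T *>(Func->getAllocator()->allocate(N * sizeof(T), alignof(T)));
  }
  void deallocate(T *, std::size_t) {} // The arena frees everything at once.
};
template <typename T, typename U>
bool operator==(const CfgLocalAllocator<T> &, const CfgLocalAllocator<U> &) { return true; }
template <typename T, typename U>
bool operator!=(const CfgLocalAllocator<T> &, const CfgLocalAllocator<U> &) { return false; }
```

A container such as `std::vector<int, CfgLocalAllocator<int>>` then draws its storage from the arena of whichever Cfg the current thread has registered, so it must not allocate after that Cfg is gone.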
    • Subzero: Randomize register assignment. · e6d24789
      Jim Stichnoth authored
      Randomize the order in which registers appear in the free list.  Only
      randomize fully "equivalent" registers to ensure no extra spills.
      
      This adds the -randomize-regalloc option.
      
      This is a continuation of https://codereview.chromium.org/456033003/ which Matt owns.
      
      BUG= none
      R=jfb@chromium.org
      
      Review URL: https://codereview.chromium.org/807293003
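
One way to picture the idea is to shuffle registers only within groups that are interchangeable, so each position in the free list still holds a register of the same kind. The sketch below assumes each register carries an equivalence-class id; how Subzero actually decides that two registers are "equivalent" is not stated in the commit message, and `Reg`, `EquivClass`, and `randomizedOrder()` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <random>
#include <vector>

// Hypothetical register description: a number plus an equivalence class id.
struct Reg {
  int Num;
  int EquivClass;
};

// Returns a new ordering of register numbers in which registers are permuted
// only among the positions that belong to their own equivalence class.
std::vector<int> randomizedOrder(const std::vector<Reg> &Regs, std::uint32_t Seed) {
  std::mt19937 Rng(Seed);
  // Map each class to the list positions its members occupy.
  std::map<int, std::vector<std::size_t>> Slots;
  for (std::size_t I = 0; I < Regs.size(); ++I)
    Slots[Regs[I].EquivClass].push_back(I);

  std::vector<int> Order(Regs.size());
  for (auto &Entry : Slots) {
    std::vector<std::size_t> Shuffled = Entry.second;
    std::shuffle(Shuffled.begin(), Shuffled.end(), Rng);
    for (std::size_t K = 0; K < Shuffled.size(); ++K)
      Order[Entry.second[K]] = Regs[Shuffled[K]].Num;
  }
  return Order;
}
```

Because every slot keeps a register from its original class, the allocator sees the same number of registers of each kind as before, which is the property the commit relies on to avoid extra spills.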
  3. 15 Dec, 2014 4 commits
    • Remove TypeConverter and Module from minimal subzero build. · 4019f084
      Karl Schimpf authored
      Removes the need to model LLVM types in the minimal Subzero build.
      This modeling is kept in the nonminimal build because IceConverter still
      needs to be able to convert LLVM types to corresponding Ice types.
      
      Note that this CL reduces the size of Release+Min/llvm2ice (after
      strip) to about 638K bytes.
      
      BUG=None
      R=jvoung@chromium.org, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/805943002
    • Remove using LLVM tools to check correctness of cast operation. · bf170370
      Karl Schimpf authored
      Replaces cast instruction checks (in PNaClTranslator.cpp) that used
      LLVM utilities with locally defined methods. This removes the need
      to call naclbitc::DecodeCastOpcode and CastInst::castIsValid.
      
      Also removes two more calls to convertToLLVMType.
      
      BUG= None
      R=jvoung@chromium.org, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/794823002
    • Subzero: Clean up live range construction. · e5b73e6e
      Jim Stichnoth authored
      Moves the deletion of newly dead instructions into the main liveness() routine.  The old livenessPostProcess() routine is renamed and now used purely for live range construction.
      
      The hack is removed in which live in-args have a custom live range segment added to avoid an artifact of the live ranges.  It is replaced with a gentler hack that extends the instruction numbering range of the initial basic block to avoid the artifact.
      
      Since special live range segments no longer need to be prepended, the live range representation is simplified and we can always assume that segments are being appended, never prepended (and as before, never added to the middle).
      
      Some magic constants involving special instruction numbers are replaced with symbolic constants.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/802003003
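
An append-only live range can be sketched as a sorted vector of segments, where each new segment must start at or after the end of the last one. This is an illustration only: the half-open `[Start, End)` convention, the coalescing of adjacent segments, and the names `LiveRange`, `addSegment()`, and `overlaps()` are assumptions, not the actual Subzero representation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical live range: sorted, non-overlapping, half-open segments of
// instruction numbers.
class LiveRange {
public:
  using Segment = std::pair<int, int>; // [Start, End)

  // Segments are only ever appended, so the vector stays sorted for free.
  void addSegment(int Start, int End) {
    assert(Start < End);
    assert(Segments.empty() || Segments.back().second <= Start);
    if (!Segments.empty() && Segments.back().second == Start)
      Segments.back().second = End; // Coalesce with the previous segment.
    else
      Segments.emplace_back(Start, End);
  }

  // Linear two-pointer walk, valid because both lists are sorted.
  bool overlaps(const LiveRange &Other) const {
    auto I = Segments.begin(), E = Segments.end();
    auto J = Other.Segments.begin(), F = Other.Segments.end();
    while (I != E && J != F) {
      if (I->second <= J->first)
        ++I;
      else if (J->second <= I->first)
        ++J;
      else
        return true;
    }
    return false;
  }

  std::size_t numSegments() const { return Segments.size(); }

private:
  std::vector<Segment> Segments;
};
```

With the append-only invariant there is no need for a container that supports cheap insertion at the front or in the middle.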
    • Simplify LLVM's APInt and APFloat for use in Subzero. · 3281748c
      Karl Schimpf authored
      In Subzero, we only need to be able to convert 64 bit constants in
      bitcode files to the corresponding Ice integer or floating type. This
      CL extracts the minimal implementation needed for Subzero. The intent
      of this change is to remove loading unnecessary LLVM code into
      (minimal) llvm2ice.
      
      BUG=None
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/797323002
  4. 11 Dec, 2014 3 commits
  5. 10 Dec, 2014 2 commits
  6. 09 Dec, 2014 1 commit
  7. 08 Dec, 2014 1 commit
    • Subzero: Disable stats and timers under the MINIMAL build. · 1c44d819
      Jim Stichnoth authored
      Specifically, don't bother to collect "-timing" and "-szstats" information, since it doesn't get printed out under the MINIMAL build anyway.  This is done by using the ALLOW_DUMP flag to guard whether code and timing stats are collected.  This ends up reducing the native translator size by about 3%.  ALLOW_DUMP is used as the guard since it already guards the output of the collected data; there is no sense collecting the data if it can never be printed out.
      
      To minimize the number of ALLOW_DUMP tests, we push the tests into the timing/stats class methods.
      
      BUG= none
      R=jvoung@chromium.org, kschimpf@google.com
      
      Review URL: https://codereview.chromium.org/788713002
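
Pushing the guard into the stats class methods might look like the sketch below. ALLOW_DUMP is the flag named in the commit message; treating it as a 0/1 preprocessor macro, and the `CodeStats` class with its members, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed to be supplied by the build (0 under MINIMAL); defaulted here so
// the sketch compiles on its own.
#ifndef ALLOW_DUMP
#define ALLOW_DUMP 1
#endif

// Hypothetical stats collector. Each update method tests ALLOW_DUMP itself,
// so call sites stay free of guards and the compiler can drop the bodies
// when the flag is 0.
class CodeStats {
public:
  void updateSpills() {
    if (!ALLOW_DUMP)
      return;
    ++Spills;
  }
  void updateFills() {
    if (!ALLOW_DUMP)
      return;
    ++Fills;
  }
  std::uint32_t spills() const { return Spills; }
  std::uint32_t fills() const { return Fills; }

private:
  std::uint32_t Spills = 0;
  std::uint32_t Fills = 0;
};

// A caller needs no knowledge of the flag.
void recordExampleFunction(CodeStats &Stats) {
  Stats.updateSpills();
  Stats.updateSpills();
  Stats.updateFills();
}
```

Keeping the test in one place per method means a call site cannot forget the guard, at the cost of one constant-folded branch per method.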
  8. 07 Dec, 2014 1 commit
  9. 06 Dec, 2014 1 commit
  10. 05 Dec, 2014 1 commit
  11. 04 Dec, 2014 4 commits
  12. 03 Dec, 2014 2 commits
  13. 02 Dec, 2014 1 commit
    • Subzero: Add basic ELFObjectWriter (text section, symtab, strtab, headers). · 08c3bcd6
      Jan Voung authored
      Able to write out the ELF file header w/ a text section,
      a symbol table, and string table. Write text buffer
      directly to file after translating each CFG.
      This means that the header is written out early w/ fake
      data and then we seek back and write the real header
      at the very end.
      
      Does not yet handle relocations, data, rodata, constant
      pools, bss, -ffunction-sections, more than 64K sections,
      or more than 2^24 symbols.
      
      Numbers w/ current NOASSERT=1 build on 176.gcc:
      
      w/out -elf-writer:
          0.233771 (21.1%): [ 1287] emit
          28MB .s file
      
      w/ -elf-writer:
          0.051056 ( 5.6%): [ 1287] emit
          2.4MB .o file
      
      BUG=none
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/678533005
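
The "write a fake header, stream the text, then seek back" pattern can be illustrated with a toy format. The header below is deliberately not a real ELF header: `ToyHeader`, `ToyObjectWriter`, and their fields are hypothetical, and the struct is written in host byte order purely to keep the sketch short.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Toy stand-in for a file header whose fields are known only at the end.
struct ToyHeader {
  char Magic[4];
  std::uint32_t TextSize;
  std::uint32_t NumFunctions;
};

class ToyObjectWriter {
public:
  explicit ToyObjectWriter(std::ostream &Out) : Out(Out) {
    ToyHeader Fake = {};
    writeHeader(Fake); // Reserve space with placeholder data.
  }
  // Called once per translated function; bytes go straight to the stream.
  void writeFunctionText(const std::string &Bytes) {
    Out.write(Bytes.data(), static_cast<std::streamsize>(Bytes.size()));
    TextSize += static_cast<std::uint32_t>(Bytes.size());
    ++NumFunctions;
  }
  // Seek back and overwrite the placeholder with the real values.
  void finish() {
    ToyHeader Real = {{0x7f, 'E', 'L', 'F'}, TextSize, NumFunctions};
    Out.seekp(0);
    writeHeader(Real);
    Out.seekp(0, std::ios::end);
  }

private:
  void writeHeader(const ToyHeader &H) {
    Out.write(reinterpret_cast<const char *>(&H), sizeof(H));
  }
  std::ostream &Out;
  std::uint32_t TextSize = 0;
  std::uint32_t NumFunctions = 0;
};
```

The benefit is that function text never has to be buffered in memory until the final sizes are known; the requirement is a seekable output stream.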
  14. 01 Dec, 2014 2 commits
  15. 26 Nov, 2014 1 commit
    • Subzero: Improve malloc/free behavior. · 9d801a01
      Jim Stichnoth authored
      Use a bigger block size in the bump-pointer allocators, since we
      basically know up front that we'll need lots of memory.  The 1MB value
      (versus the default of 4KB) was chosen somewhat arbitrarily, and
      succeeds in pretty much removing bump-pointer related mallocs from the
      profile.
      
      Pre-reserve the a priori known number of edges in getTerminatorEdges()
      to avoid vector resizing.
      
      BUG= none
      R=kschimpf@google.com
      
      Review URL: https://codereview.chromium.org/760973002
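
To see why the block size matters, the sketch below counts how many malloc calls a simple bump-pointer allocator makes for the same workload at two slab sizes, and shows the reserve-before-fill pattern for a vector whose final size is known up front. `BumpAllocator`, `mallocCallsFor()`, and `collectEdges()` are hypothetical stand-ins, not the allocator or the getTerminatorEdges() code the commit touches.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical bump-pointer allocator that requests memory one slab at a time.
class BumpAllocator {
public:
  explicit BumpAllocator(std::size_t SlabSize) : SlabSize(SlabSize) {}
  BumpAllocator(const BumpAllocator &) = delete;
  BumpAllocator &operator=(const BumpAllocator &) = delete;
  ~BumpAllocator() {
    for (char *S : Slabs)
      std::free(S);
  }
  void *allocate(std::size_t Bytes) {
    Bytes = (Bytes + 7) & ~std::size_t(7); // Keep 8-byte alignment.
    if (Slabs.empty() || Used + Bytes > Capacity) {
      Capacity = Bytes > SlabSize ? Bytes : SlabSize;
      void *Slab = std::malloc(Capacity);
      if (!Slab)
        throw std::bad_alloc();
      Slabs.push_back(static_cast<char *>(Slab));
      Used = 0;
    }
    void *Result = Slabs.back() + Used;
    Used += Bytes;
    return Result;
  }
  std::size_t mallocCalls() const { return Slabs.size(); }

private:
  std::size_t SlabSize;
  std::size_t Capacity = 0;
  std::size_t Used = 0;
  std::vector<char *> Slabs;
};

// Number of slab mallocs needed to serve NumObjects objects of ObjectSize bytes.
std::size_t mallocCallsFor(std::size_t SlabSize, std::size_t NumObjects,
                           std::size_t ObjectSize) {
  BumpAllocator Alloc(SlabSize);
  for (std::size_t I = 0; I < NumObjects; ++I)
    Alloc.allocate(ObjectSize);
  return Alloc.mallocCalls();
}

// Reserve the known edge count first so push_back never reallocates.
std::vector<int> collectEdges(const int *Targets, std::size_t NumTargets) {
  std::vector<int> Edges;
  Edges.reserve(NumTargets);
  for (std::size_t I = 0; I < NumTargets; ++I)
    Edges.push_back(Targets[I]);
  return Edges;
}
```

For 1000 objects of 64 bytes each, this toy allocator needs 16 slabs at a 4KB slab size and a single slab at 1MB.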
  16. 24 Nov, 2014 1 commit
  17. 21 Nov, 2014 1 commit
  18. 20 Nov, 2014 1 commit
    • Subzero: Simplify the constant pools. · d2cb4361
      Jim Stichnoth authored
      Internally, create a separate constant pool for each integer type, instead of a single i64 pool that uses the Ice::Type value as part of the key.  This means each constant pool key can be a simple primitive value, rather than a tuple.
      
      Represent the pools using std::unordered_map instead of std::map since we're using C++11 now.
      
      Use signed integers instead of unsigned integers for the integer constant pools, to benefit from sign extension and to be more consistent.
      
      Remove the SuppressMangling field from hash and comparison functions on RelocatableTuple, since we'll never have two symbols with the same name but different values of SuppressMangling.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/737513008
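
A per-type pool keyed by a plain primitive value might be sketched as follows. The class and member names (`ConstantPool`, `ConstantPools`, `getOrAdd()`, `PoolIndex`) are hypothetical; only the use of std::unordered_map, signed keys, and one pool per integer type comes from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>

// Hypothetical pool of uniqued constants for one primitive key type.
template <typename KeyT> class ConstantPool {
public:
  struct Constant {
    KeyT Value;
    unsigned PoolIndex;
  };

  // Returns the existing entry for V, or creates one on first use.
  const Constant *getOrAdd(KeyT V) {
    auto It = Pool.find(V);
    if (It != Pool.end())
      return It->second.get();
    unsigned Index = static_cast<unsigned>(Pool.size());
    std::unique_ptr<Constant> C(new Constant{V, Index});
    const Constant *Result = C.get();
    Pool.emplace(V, std::move(C));
    return Result;
  }
  std::size_t size() const { return Pool.size(); }

private:
  std::unordered_map<KeyT, std::unique_ptr<Constant>> Pool;
};

// One pool per integer type, so no (type, value) tuple is needed as the key.
struct ConstantPools {
  ConstantPool<std::int8_t> Int8s;
  ConstantPool<std::int16_t> Int16s;
  ConstantPool<std::int32_t> Int32s;
  ConstantPool<std::int64_t> Int64s;
};
```

With signed keys, a narrow value such as an 8-bit -1 widens to -1 by ordinary sign extension, and the same numeric value can live independently in each type's pool.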
  19. 18 Nov, 2014 1 commit
  20. 17 Nov, 2014 1 commit
  21. 14 Nov, 2014 4 commits
    • Subzero: Use the linear-scan register allocator for Om1 as well. · 70d0a054
      Jim Stichnoth authored
      This removes the need for Om1's postLower() code which did its own ad-hoc register allocation.  And it actually speeds up Om1 translation significantly.
      
      This mode of register allocation only allocates for infinite-weight Variables, while respecting live ranges of pre-colored Variables.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/733643005
    • Add irt_random to szbuild link line. · edc115ec
      Jan Voung authored
      Seems to be part of the non-sfi link now:
      https://codereview.chromium.org/686723003/diff/180001/pnacl/driver/pnacl-translate.py
      
      Otherwise I get:
      x86-32-linux/lib/unsandboxed_irt.o:(.rodata+0x68): undefined reference to `nacl_secure_random'
      
      BUG=none
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/726093002
    • Subzero: Simplify the FakeKill instruction. · 87ff3a18
      Jim Stichnoth authored
      Even after earlier simplifications, FakeKill was still handled somewhat inefficiently for the register allocator.  For x86-32, any function containing call instructions would result in about 11 pre-colored Variables, each with an identical and relatively complex live range consisting of points.  They would start out on the UnhandledPrecolored list, then all move to the Inactive list, where they would be repeatedly compared against each register allocation candidate via overlapsRange().
      
      We improve this by keeping around a single copy of that live range and directly masking out the Free[] register set when that live range overlaps the current candidate's live range.  This saves ~10 overlaps() calculations per candidate while FakeKills are still pending.
      
      Also, slightly rearrange the initialization of the Unhandled etc. sets into a separate init routine, which will make it easier to reuse the register allocator in other situations such as Om1 post-lowering.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/720343003
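
The masking step can be sketched as follows: keep one sorted list of kill points (for example, the instruction numbers of calls), and when a candidate's live range contains any of them, clear the killed registers from the free set in a single operation. The bitmask representation, the single-segment candidate range, and the names `Range`, `containsAny()`, and `availableRegs()` are assumptions for illustration, not the allocator's real data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Half-open range of instruction numbers: [Start, End).
struct Range {
  int Start;
  int End;
};

// True if any point in the sorted list falls inside R.
bool containsAny(const std::vector<int> &SortedPoints, Range R) {
  auto It = std::lower_bound(SortedPoints.begin(), SortedPoints.end(), R.Start);
  return It != SortedPoints.end() && *It < R.End;
}

// Instead of comparing the candidate against one pre-colored Variable per
// killed register, do a single overlap test and mask the whole set at once.
std::uint32_t availableRegs(std::uint32_t Free, std::uint32_t KilledMask,
                            const std::vector<int> &KillPoints, Range Candidate) {
  if (containsAny(KillPoints, Candidate))
    Free &= ~KilledMask;
  return Free;
}
```

One shared overlap test per candidate replaces a separate test for each killed register, which is where the saved overlap calculations come from.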
    • Subzero: Auto-set -build-on-read=0 for .ll input files. · 51596d43
      Jim Stichnoth authored
      This is purely for convenience of personal testing/debugging.
      
      To demonstrate its correctness in this CL, -build-on-read=0 is removed
      from the two .ll lit tests that explicitly use it, and also from the
      crosstest.py script.  The lit test wrapper run-llvm2ice.py is left
      unchanged to be safe.
      
      BUG= none
      R=kschimpf@google.com
      
      Review URL: https://codereview.chromium.org/732583002
  22. 11 Nov, 2014 1 commit
    • Subzero: Remove Variable::NeedsStackSlot. · 33c80641
      Jim Stichnoth authored
      Instead, separately compute it during prolog generation via another
      pass over the Cfg.
      
      This may slow down translation by ~1%, but it greatly simplifies the
      management of this flag/property.
      
      The higher motivation is to pull this management out of register
      allocation to make it easier to extend register allocation for other
      uses.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/692633004
  23. 06 Nov, 2014 3 commits