1. 01 Feb, 2015 2 commits
  2. 31 Jan, 2015 2 commits
    • Subzero: Fix stats collection and output for multithreading. · a1dd3cc8
      Jim Stichnoth authored
      Updates of current-function and cumulative stats are done entirely in TLS.  At the end, cumulative stats are merged across all threads' TLS into the global cumulative stats.
      
      Printing of cumulative stats after every function is removed, since there's very little value from that.  It was probably done in the first place just to give partial cumulative information in the face of crashes or assertion failures.
      
      BUG= none
      R=jfb@chromium.org
      
      Review URL: https://codereview.chromium.org/887213002
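
      The TLS-then-merge scheme described in this commit can be sketched roughly as follows. This is a minimal illustration, not Subzero's code: the Stats fields, StatsRegistry, and runWorkers are hypothetical names, and each worker merges its own totals on exit, whereas the commit describes merging across all threads' TLS at the end.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical counters; the real stats object tracks other fields.
struct Stats {
  uint64_t InstructionsEmitted = 0;
  uint64_t Spills = 0;
  void add(const Stats &Other) {
    InstructionsEmitted += Other.InstructionsEmitted;
    Spills += Other.Spills;
  }
};

// Holds the global cumulative stats behind a lock (assumed design).
class StatsRegistry {
public:
  // Called once per worker, after it has finished all of its updates.
  void mergeFrom(const Stats &PerThread) {
    std::lock_guard<std::mutex> Guard(Mutex);
    Global.add(PerThread);
  }
  Stats total() {
    std::lock_guard<std::mutex> Guard(Mutex);
    return Global;
  }

private:
  std::mutex Mutex;
  Stats Global;
};

// Each thread gets its own zero-initialized copy, so updates take no lock.
thread_local Stats ThreadStats;

Stats runWorkers(unsigned NumThreads, unsigned FunctionsPerThread) {
  StatsRegistry Registry;
  std::vector<std::thread> Workers;
  for (unsigned T = 0; T < NumThreads; ++T) {
    Workers.emplace_back([&Registry, FunctionsPerThread] {
      for (unsigned F = 0; F < FunctionsPerThread; ++F) {
        ThreadStats.InstructionsEmitted += 10; // lock-free hot path
        ThreadStats.Spills += 1;
      }
      Registry.mergeFrom(ThreadStats); // single locked merge at the end
    });
  }
  for (std::thread &W : Workers)
    W.join();
  return Registry.total();
}
```

      The point of the design is that the lock is taken once per thread rather than once per stat update.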
    • Fix subzero Windows build · ae6e12ca
      JF Bastien authored
      MinGW's GCC 4.8.1 was sad because SectionType was shadowing the other SectionType. Also, the enum's values are in the ELFObjectWriter namespace, not ELFObjectWriter::SectionType.
      
      R=stichnot@chromium.org, jvoung@chromium.org
      BUG= Windows build is sad
      
      Review URL: https://codereview.chromium.org/891953002
  3. 30 Jan, 2015 2 commits
    • Subzero: Fix timers for multithreaded translation. · 380d7b96
      Jim Stichnoth authored
      Now that multithreaded parsing and translation is in place, timer operations have to be made thread-local.  After the non-main threads end, their thread-local timer data needs to be merged into the global timer data, which resides in the GlobalContext object.  The merge is a bit tricky because the internal timer stack structure is built up dynamically as items are pushed and popped.  Two threads may have radically different timing data:
      
      1. The parser thread profile is completely different from a translator thread.
      
      2. For -timing-funcs, two translator threads hold data for entirely different sets of functions.
      
      A bit more tweaking will need to be done to make the timing output fully usable in a multithreaded run.  Because of multiple threads, times may add up to >100%.  Also, time spent blocked is being "unfairly" attributed to the caller of the blocking operation - we should either count the user time instead of wall-clock time, or add a special timer marker for blocking locking operations.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/878383004
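
      One way to picture the merge problem from this commit: if every thread discovers timer stacks in a different order, per-thread indices cannot be added together directly, but stacks identified by name can. The sketch below assumes a simplified layout (StackPath, TimerData, mergeTimers, and timeFor are all hypothetical names) and is not the actual Subzero data structure.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A timer stack is identified by the names on it, outermost first.
using StackPath = std::vector<std::string>;
// Accumulated seconds per distinct stack path (hypothetical layout).
using TimerData = std::map<StackPath, double>;

// Folds one thread's timings into the global data. Paths are matched by
// name, so the order in which each thread first saw them does not matter.
void mergeTimers(TimerData &Global, const TimerData &Thread) {
  for (const auto &Entry : Thread)
    Global[Entry.first] += Entry.second;
}

// Total time attributed to a path, or 0 if no thread recorded it.
double timeFor(const TimerData &Data, const StackPath &Path) {
  auto It = Data.find(Path);
  return It == Data.end() ? 0.0 : It->second;
}
```

      Because the merged totals sum wall-clock time from several threads, they can exceed the elapsed time of the run, which matches the ">100%" caveat in the commit message.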
    • Subzero: Minor Makefile fix. · 51d00936
      Jim Stichnoth authored
      The problem showed up after the link step failed, in which case $(OBJDIR)/llvm2ice was deleted but the ./llvm2ice symlink still existed.  A subsequent "make check-lit" or "make check" would fail, so the basic "make" would have to be done first.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/887873002
  4. 29 Jan, 2015 1 commit
    • Write out global initializers and data rel directly to ELF file. · 72984d88
      Jan Voung authored
      The local symbol relocations are a bit different from
      llvm-mc, which are section-relative. E.g., instead of "bytes",
      it will be ".data + offsetof(bytes, .data)". So the
      contents of the text/data/rodata sections can also differ
      since the offsets written in place are different.
      
      Still need to fill the symbol table with undefined
      symbols (e.g., memset, and szrt lib functions) before
      trying to link.
      
      BUG=none
      R=kschimpf@google.com, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/874353006
  5. 28 Jan, 2015 6 commits
  6. 27 Jan, 2015 3 commits
    • Fix pedantic build warnings; · 8427ea2b
      JF Bastien authored
      GCC 4.8.1 is sad;
      
      There are extra semicolons in Subzero;
      
      It removes the semicolons or it gets the build warning hose again;^H
      
      R=stichnot@chromium.org
      BUG= none
      
      Review URL: https://codereview.chromium.org/882743003
    • Subzero: Use a "known" version of clang-format. · dd842dbb
      Jim Stichnoth authored
      There are two problems with "make format" and "make format-diff" in
      Makefile.standalone:
      
      1. You have to make sure clang-format and clang-format-diff.py are
      available in $PATH.
      
      2. Different users may have different versions installed (even for the
      same user on different machines), leading to whitespace wars.  Can't we
      all just get along?
      
      Since the normal LLVM build that Subzero depends on also exposes and
      builds clang-format and friends, we might as well use it.  The
      clang-format binary is found in $LLVM_BIN_PATH, and clang-format-diff.py
      is found relative to $LLVM_SRC_PATH.  As long as the user's LLVM build
      is fairly up to date, whitespace wars are unlikely.
      
      Given this, there's a much higher incentive to use "make format"
      regularly instead of "make format-diff".  In particular, inline comments
      on variable/field declaration lists can get lined up more nicely by
      looking at the entire context, rather than the small diff window.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/877003003
    • Subzero: Initial implementation of multithreaded translation. · fa4efea5
      Jim Stichnoth authored
      Provides a single-producer, multiple-consumer translation queue where the number of translation threads is given by the -threads=N argument.  The producer (i.e., bitcode parser) blocks if the queue size is >=N, in order to control the memory footprint.  If N=0 (which is the default), execution is purely single-threaded.  If N=1, there is a single translation thread running in parallel with the parser thread.  "make check" succeeds with the default changed to N=1.
      
      Currently emission is also done by the translation thread, which limits scalability since the emit stream has to be locked.  Also, since the ELF writer stream is not locked, it won't be safe to use N>1 with the ELF writer.  Furthermore, for N>1, emitted function ordering is nondeterministic and needs to be recombobulated.  This will all be fixed in a follow-on CL.
      
      The -timing option is broken for N>0.  This will be fixed in a follow-on CL.
      
      Verbose flags are now managed in the Cfg instead of (or in addition to) the GlobalContext, due to the -verbose-focus option which wants to temporarily change the verbose level for a particular function.
      
      TargetLowering::emitConstants() and related methods are changed to be static, so that a valid TargetLowering object isn't required.  This is because the TargetLowering object wants to hold a valid Cfg, and none really exists after all functions are translated and the constant pool is ready for emission.
      
      The Makefile.standalone now has a TSAN=1 option to enable ThreadSanitizer.
      
      BUG= none
      R=jfb@chromium.org
      
      Review URL: https://codereview.chromium.org/870653002
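
      The single-producer, multiple-consumer queue with a blocking producer can be sketched with a mutex and two condition variables. This is a generic illustration under assumptions, not the Subzero implementation: BoundedQueue, close(), and translateAll are hypothetical names, and "translation" is reduced to summing integers.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

template <typename T> class BoundedQueue {
public:
  explicit BoundedQueue(size_t MaxSize) : MaxSize(MaxSize) {}

  // Producer side: blocks while the queue already holds MaxSize items.
  void push(T Item) {
    std::unique_lock<std::mutex> Guard(Mutex);
    NotFull.wait(Guard, [this] { return Items.size() < MaxSize; });
    Items.push_back(std::move(Item));
    NotEmpty.notify_one();
  }

  // Signals that the producer will push no more items.
  void close() {
    std::lock_guard<std::mutex> Guard(Mutex);
    Closed = true;
    NotEmpty.notify_all();
  }

  // Consumer side: returns false once the queue is closed and drained.
  bool pop(T &Out) {
    std::unique_lock<std::mutex> Guard(Mutex);
    NotEmpty.wait(Guard, [this] { return !Items.empty() || Closed; });
    if (Items.empty())
      return false;
    Out = std::move(Items.front());
    Items.pop_front();
    NotFull.notify_one();
    return true;
  }

private:
  const size_t MaxSize;
  std::mutex Mutex;
  std::condition_variable NotFull, NotEmpty;
  std::deque<T> Items;
  bool Closed = false;
};

// "Translates" the items 1..Count using NumThreads consumers (must be >= 1;
// the purely single-threaded N=0 mode is not modeled here).
long translateAll(int Count, size_t NumThreads) {
  BoundedQueue<int> Queue(NumThreads);
  std::mutex SumMutex;
  long Sum = 0;
  std::vector<std::thread> Workers;
  for (size_t T = 0; T < NumThreads; ++T) {
    Workers.emplace_back([&Queue, &SumMutex, &Sum] {
      long Local = 0;
      int Item;
      while (Queue.pop(Item))
        Local += Item;
      std::lock_guard<std::mutex> Guard(SumMutex);
      Sum += Local;
    });
  }
  for (int I = 1; I <= Count; ++I)
    Queue.push(I); // the calling thread plays the role of the parser
  Queue.close();
  for (std::thread &W : Workers)
    W.join();
  return Sum;
}
```

      Capping the queue at the thread count bounds how many parsed functions are held in memory at once, which is the memory-footprint control the commit message describes.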
  7. 26 Jan, 2015 1 commit
  8. 25 Jan, 2015 1 commit
    • Make use of BSS more explicit in global initializers (vs a local .comm). · fed97aff
      Jan Voung authored
      This reduces the number of conditionals, and will more closely reflect
      the structure of the ELF writer's version of the same thing.
      Without fdata-sections, the ELF writer version will have to batch all
      initializers of a certain type so that they can be contiguous on the file
      and the overall alignment can be determined.
      
      A downside of this is that .s files will differ from llc's output.
      The spec .o and executables are identical before/after the change.
      
      BUG=none
      R=kschimpf@google.com, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/870123003
  9. 23 Jan, 2015 1 commit
  10. 22 Jan, 2015 2 commits
  11. 20 Jan, 2015 3 commits
    • Subzero: Remove the GlobalContext::GlobalDeclarations vector. · a086b913
      Jim Stichnoth authored
      Elements were added to this vector, but never inspected, so it is
      essentially a useless field.  Plus, the removal allows us to remove a
      couple of friend declarations.
      
      BUG=none
      R=kschimpf@google.com
      
      Review URL: https://codereview.chromium.org/814163004
    • Subzero: Add locking to prepare for multithreaded translation. · e4a8f400
      Jim Stichnoth authored
      This just gets the locking in place.  Actual multithreading will be added later.
      
      Mutexes are added for accessing the GlobalContext allocator, the constant pool, the stats data, and the profiling timers.  These are managed via the LockedPtr<> helper.  Finer grain locks on the constant pool may be added later, i.e. a separate lock for each data type.
      
      A vector of pointers to TLS objects is added to GlobalContext.  Each new thread will get its own TLS object, whose address is added to the vector.  (After threads complete, things like stats can be combined by iterating over the vector.)
      
      The dump/emit streams are guarded by a separate lock, to avoid fine-grain interleaving of output by multiple threads.  E.g., lock the streams, emit an entire function, and unlock the streams.  This works for dumping too, though dump output for different passes on the same function may be interleaved with that of another thread.  There is an OstreamLocker helper class to keep this simple.
      
      CodeStats is made an inner class of GlobalContext (this was missed on a previous CL).
      
      BUG= none
      R=jfb@chromium.org, jvoung@chromium.org, kschimpf@google.com
      
      Review URL: https://codereview.chromium.org/848193003
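
      A helper in the spirit of LockedPtr<> might look like the sketch below: a move-only handle that takes the mutex on construction and releases it on destruction, so the guarded object is reachable only while the lock is held. The member names, the Context class, and fillPool are assumptions made for illustration; the real helper may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

template <typename T> class LockedPtr {
public:
  LockedPtr(T *Value, std::mutex *Lock) : Value(Value), Lock(Lock) {
    Lock->lock();
  }
  // Moving transfers ownership of the held lock.
  LockedPtr(LockedPtr &&Other) : Value(Other.Value), Lock(Other.Lock) {
    Other.Value = nullptr;
    Other.Lock = nullptr;
  }
  LockedPtr(const LockedPtr &) = delete;
  LockedPtr &operator=(const LockedPtr &) = delete;
  ~LockedPtr() {
    if (Lock)
      Lock->unlock();
  }
  T *operator->() const { return Value; }
  T &operator*() const { return *Value; }

private:
  T *Value;
  std::mutex *Lock;
};

// Stand-in for a context that guards a shared pool (hypothetical).
class Context {
public:
  LockedPtr<std::vector<int>> getConstPool() {
    return LockedPtr<std::vector<int>>(&ConstPool, &ConstPoolLock);
  }

private:
  std::mutex ConstPoolLock;
  std::vector<int> ConstPool;
};

size_t fillPool(unsigned NumThreads, unsigned PerThread) {
  Context Ctx;
  std::vector<std::thread> Workers;
  for (unsigned T = 0; T < NumThreads; ++T) {
    Workers.emplace_back([&Ctx, PerThread] {
      for (unsigned I = 0; I < PerThread; ++I)
        Ctx.getConstPool()->push_back(static_cast<int>(I)); // locked per call
    });
  }
  for (std::thread &W : Workers)
    W.join();
  return Ctx.getConstPool()->size();
}
```

      The temporary returned by getConstPool() lives until the end of the full expression, so each push_back runs entirely under the lock without the caller writing any explicit lock or unlock.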
    • Add instruction alignment tests to unit tests. · af238b25
      Karl Schimpf authored
      BUG=None
      R=jvoung@chromium.org, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/848473002
  12. 15 Jan, 2015 1 commit
    • Subzero: Remove the IceV_RegManager enum value. · 769be681
      Jim Stichnoth authored
      This hasn't been used in a very long time, and there's no intention of using it again.
      
      Originally there was the idea of a "fast" block-local register allocator for an O1-like configuration, which would allocate registers for infinite-weight temporaries during target lowering, using a "local register manager".  This verbose option was for tracing execution of this register manager.  However, by now it seems unlikely that this would do a better/faster job than the current Om1 register allocation approach, which reuses the linear-scan code quite effectively and does very well at separation of concerns.  So adios IceV_RegManager!
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/831663008
  13. 13 Jan, 2015 1 commit
    • Start writing out some relocation sections (text). · ec270731
      Jan Voung authored
      Pass the full assembler pointer to the elf writer, so
      that it has access to both the text buffer and the fixups.
      
      Remove some child classes of AssemblerFixups. They didn't
      really do much, and were pretty much identical to the
      original AssemblerFixup class. Dart had a virtual method
      for fixups to do necessary patching, but we currently
      don't do the patching and just emit the relocations.
      TODO see if patching is more efficient than writing out
      relocations and letting the linker do the work.
      
      This CL also makes AssemblerFixups POD.
      
      Change the fixup kind to be a plain unsigned int, which
      the target can fill w/ target/container-specific values.
      
      Move the fwd declaration of Assembler to IceDefs and remove
      the others. Do a similar fwd declaration refactoring for
      ELFWriter.
      
      Make the createAssembler method return a std::unique_ptr.
      
      BUG=none
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/828873002
  14. 12 Jan, 2015 2 commits
  15. 09 Jan, 2015 6 commits
  16. 23 Dec, 2014 1 commit
  17. 20 Dec, 2014 1 commit
  18. 19 Dec, 2014 2 commits
    • Subzero: Use CFG-local arena allocation for relevant containers. · 31c95590
      Jim Stichnoth authored
      In particular, node lists for in and out edges of a CfgNode, and the live range segment list in a Variable.  This is done by making the Cfg allocator globally available through TLS, and providing the STL containers with an allocator struct that uses this.
      
      This also cleans up some other allocation-related issues:
      
      * The allocator is now hung off the Cfg via a pointer, rather than being embedded into the Cfg.  This allows a const Cfg pointer to be stored in TLS while still allowing its allocator to be mutated.
      
      * Cfg is now created via a static create() method.
      
      * The redundant Cfg::allocateInst<> methods are removed.
      
      * The Variable::asType() method allocates a whole new Variable from the Cfg arena, rather than allocating it on the stack, removing the need for the move constructor in Variable and Operand.  This is OK since asType() is only used for textual asm emission.
      
      * The same 1MB arena allocator is now used by the assembler as well.  The fact that it wasn't changed over to be the same as Cfg and GlobalContext was an oversight.  (It turns out this adds ~3MB to the translator memory footprint, so that could be tuned later.)
      
      BUG= none
      R=jfb@chromium.org, jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/802183004
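
      The idea of giving STL containers an allocator that reaches an arena through TLS can be sketched as below. Everything here is an assumption for illustration (Arena, CurrentArena, ArenaAllocator, demoArenaBytes); the sketch uses a plain malloc-backed bump allocator rather than whatever allocator Subzero actually uses, and it only supports alignments up to that of malloc's result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Bump allocator: memory is only released when the whole arena dies.
class Arena {
public:
  explicit Arena(size_t SlabSize = 1 << 20) : SlabSize(SlabSize) {}
  Arena(const Arena &) = delete;
  Arena &operator=(const Arena &) = delete;
  ~Arena() {
    for (void *Slab : Slabs)
      std::free(Slab);
  }
  void *allocate(size_t Bytes, size_t Align) {
    size_t Start = (Offset + Align - 1) & ~(Align - 1);
    if (Slabs.empty() || Start + Bytes > CurSize) {
      CurSize = Bytes > SlabSize ? Bytes : SlabSize;
      void *Slab = std::malloc(CurSize);
      if (!Slab)
        throw std::bad_alloc();
      Slabs.push_back(Slab);
      Start = 0; // a fresh slab is aligned for any fundamental type
    }
    Offset = Start + Bytes;
    Used += Bytes;
    return static_cast<char *>(Slabs.back()) + Start;
  }
  size_t bytesUsed() const { return Used; }

private:
  size_t SlabSize, CurSize = 0, Offset = 0, Used = 0;
  std::vector<void *> Slabs;
};

// The arena of the function currently being translated on this thread.
thread_local Arena *CurrentArena = nullptr;

// Minimal C++11 allocator that forwards to the thread's current arena.
template <typename T> struct ArenaAllocator {
  using value_type = T;
  ArenaAllocator() = default;
  template <typename U> ArenaAllocator(const ArenaAllocator<U> &) {}
  T *allocate(size_t N) {
    return static_cast<T *>(
        CurrentArena->allocate(N * sizeof(T), alignof(T)));
  }
  void deallocate(T *, size_t) {} // no-op: freed with the arena
};
template <typename T, typename U>
bool operator==(const ArenaAllocator<T> &, const ArenaAllocator<U> &) {
  return true;
}
template <typename T, typename U>
bool operator!=(const ArenaAllocator<T> &, const ArenaAllocator<U> &) {
  return false;
}

// Fills an arena-backed list and reports how many bytes the arena served.
size_t demoArenaBytes() {
  Arena A;
  CurrentArena = &A;
  size_t Used = 0;
  {
    std::vector<int, ArenaAllocator<int>> InEdges;
    for (int I = 0; I < 100; ++I)
      InEdges.push_back(I);
    Used = A.bytesUsed();
  }
  CurrentArena = nullptr;
  return Used;
}
```

      Since deallocate() does nothing, a growing vector leaves its old buffers behind in the arena; that trade of memory for speed is typical of arena allocation and is reclaimed in one step when the arena is destroyed.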
    • Subzero: Randomize register assignment. · e6d24789
      Jim Stichnoth authored
      Randomize the order that registers appear in the free list.  Only
      randomize fully "equivalent" registers to ensure no extra spills.
      
      This adds the -randomize-regalloc option.
      
      This is a continuation of https://codereview.chromium.org/456033003/ which Matt owns.
      
      BUG= none
      R=jfb@chromium.org
      
      Review URL: https://codereview.chromium.org/807293003
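
      Shuffling only among "equivalent" registers can be illustrated as a permutation that never moves a register into a slot belonging to a different class. The sketch below is an assumption about the general technique, not the commit's code: Reg, EquivClass, and randomizeFreeList are hypothetical, and what makes two registers equivalent is left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <vector>

struct Reg {
  int Num;        // register number
  int EquivClass; // registers sharing a class are interchangeable
};

// Returns a copy of Regs in which registers are permuted only among the
// positions originally held by members of their own equivalence class.
std::vector<Reg> randomizeFreeList(const std::vector<Reg> &Regs,
                                   unsigned Seed) {
  std::mt19937 Rng(Seed);
  // Record which positions each class occupies in the original order.
  std::map<int, std::vector<size_t>> Positions;
  for (size_t I = 0; I < Regs.size(); ++I)
    Positions[Regs[I].EquivClass].push_back(I);

  std::vector<Reg> Result = Regs;
  for (const auto &Entry : Positions) {
    std::vector<size_t> Shuffled = Entry.second;
    std::shuffle(Shuffled.begin(), Shuffled.end(), Rng);
    // Slot K of this class receives the register from a shuffled slot.
    for (size_t K = 0; K < Shuffled.size(); ++K)
      Result[Entry.second[K]] = Regs[Shuffled[K]];
  }
  return Result;
}
```

      Keeping each class in its original positions preserves any preference order between classes, which is one way to read the commit's goal of adding randomness without causing extra spills.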
  19. 15 Dec, 2014 2 commits