1. 12 Feb, 2015 1 commit
    • Subzero: Emit functions and global initializers in a separate thread. · bbca754a
      Jim Stichnoth authored
      (This is a continuation of https://codereview.chromium.org/876083007/ .)
      
      Emission is done in a separate thread when -threads=N with N>0 is specified.  This includes both functions and global initializers.
      
      Emission is deterministic.  The parser assigns sequence numbers, and the emitter thread reassembles work units into their original order, regardless of the number of threads.
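
The reassembly step can be pictured as a small reorder buffer. This is only a sketch under assumptions: `EmitItem`, `ReorderBuffer`, and `add()` are hypothetical names rather than Subzero's actual classes, and the queueing and locking that a real emitter thread needs are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical work unit: the parser stamps each one with a sequence number.
struct EmitItem {
  uint32_t Seq;
  std::string Text;
};

// Hypothetical reorder buffer: accepts items in any arrival order and
// releases them strictly in sequence-number order.
class ReorderBuffer {
public:
  // Stores Item, then returns every item that is now ready, in order.
  std::vector<EmitItem> add(EmitItem Item) {
    uint32_t Key = Item.Seq;
    Pending.emplace(Key, std::move(Item));
    std::vector<EmitItem> Ready;
    // Drain the run of consecutive sequence numbers starting at NextSeq.
    for (auto It = Pending.find(NextSeq); It != Pending.end();
         It = Pending.find(NextSeq)) {
      Ready.push_back(std::move(It->second));
      Pending.erase(It);
      ++NextSeq;
    }
    return Ready;
  }

  bool empty() const { return Pending.empty(); }

private:
  uint32_t NextSeq = 0;                 // next sequence number to emit
  std::map<uint32_t, EmitItem> Pending; // items that arrived early
};
```

An item that arrives ahead of its turn is held until the gap before it is filled, so the output order depends only on the sequence numbers and not on which thread finished first.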
      
      Dump output, however, is not intended to be in deterministic, reassembled order.  As such, lit tests that test dump output (i.e., '-verbose inst') are explicitly run with -threads=0.
      
      For -elf-writer and -ias=1, the translator thread invokes Cfg::emitIAS() and the assembler buffer is passed to the emitter thread.  For -ias=0, the translator thread passes the Cfg to the emitter thread, which then invokes Cfg::emit() to produce the textual asm.
      
      Minor cleanup along the way:
        * Removed Flags from the Ice::Translator object and ctor, since it was redundant with Ctx->getFlags().
        * Cfg::getAssembler<> is the same as Cfg::getAssembler<Assembler> and is useful for just passing the assembler around.
        * Removed the redundant Ctx argument from TargetDataLowering::lowerConstants() .
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4075
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/916653004
  2. 10 Feb, 2015 1 commit
    • Fix PNaCl bitcode reader to release global variables to emitter. · 6ca7d2b6
      Karl Schimpf authored
      Fixes the PNaCl bitcode reader to maintain two lists of global
      variables. The first, VariableDeclarations, is the list of
      variable declarations to be lowered by the emitter. The second,
      ValueIDConstants, is the list of corresponding constant symbols
      to use when a global variable declaration is referenced while
      processing functions.
      
      BUG=None
      R=jvoung@chromium.org, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/883673005
  3. 09 Feb, 2015 1 commit
  4. 06 Feb, 2015 1 commit
  5. 05 Feb, 2015 1 commit
  6. 04 Feb, 2015 1 commit
    • Add comment for the forked Dart revision for the assembler code. · 33a5f41d
      Jan Voung authored
      Also note to keep that up to date.
      See also Patch set 1 of https://codereview.chromium.org/574133002/,
      vs later patch sets.
      
      Some things that were changed:
      (*) Headers / constants use Ice version (RegX8632::Encoded_Reg_eax vs EAX), (KB / MB -> other...)
      (*) Use llvm/Subzero allocator instead of Dart one.
      (*) Class/Field/On-stack-replacement/Dart runtime stuff is removed
      (*) Relocation/Fixups are now POD -- rather than a class
      with a virtual method for fixup. For now, we write out
      an ELF relocation, but later we may do a target pass
      to handle function calls within the same section, etc.
      (*) ASSERT -> assert
      (*) uword -> uintptr_t (should check).
      (*) clang-format
      (*) ???
      
      BUG=none
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/901453002
  7. 03 Feb, 2015 4 commits
  8. 01 Feb, 2015 2 commits
  9. 31 Jan, 2015 2 commits
    • Subzero: Fix stats collection and output for multithreading. · a1dd3cc8
      Jim Stichnoth authored
      Updates of current-function and cumulative stats are done entirely in TLS.  At the end, cumulative stats are merged across all threads' TLS into the global cumulative stats.
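
The merge at the end can be sketched as follows. The names `ThreadStats`, `StatsRegistry`, `registerThread()`, and `mergeAll()` are hypothetical, as are the two counter fields; the commit message does not show the real stats layout. The sketch assumes each thread updates only its own object and that the merge runs after all worker threads have finished.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical per-thread counters (the real fields are not shown above).
struct ThreadStats {
  uint64_t InstCount = 0;
  uint64_t SpillCount = 0;
};

// Hypothetical registry that remembers every thread's stats object.
class StatsRegistry {
public:
  // Called once per thread when its thread-local stats object is created.
  void registerThread(ThreadStats *Stats) {
    std::lock_guard<std::mutex> L(Lock);
    PerThread.push_back(Stats);
  }

  // Called after the worker threads have ended: sums all per-thread data.
  ThreadStats mergeAll() {
    std::lock_guard<std::mutex> L(Lock);
    ThreadStats Total;
    for (const ThreadStats *S : PerThread) {
      Total.InstCount += S->InstCount;
      Total.SpillCount += S->SpillCount;
    }
    return Total;
  }

private:
  std::mutex Lock; // guards only the list, not the per-thread counters
  std::vector<ThreadStats *> PerThread;
};
```

Because the counters themselves are touched by one thread each, only registration and the final merge need the lock.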
      
      Printing of cumulative stats after every function is removed, since there's very little value from that.  It was probably done in the first place just to give partial cumulative information in the face of crashes or assertion failures.
      
      BUG= none
      R=jfb@chromium.org
      
      Review URL: https://codereview.chromium.org/887213002
    • Fix subzero Windows build · ae6e12ca
      JF Bastien authored
      MinGW's GCC 4.8.1 was sad because SectionType was shadowing the other SectionType. Also, the enum's values are in the ELFObjectWriter namespace, not ELFObjectWriter::SectionType.
      
      R=stichnot@chromium.org, jvoung@chromium.org
      BUG= Windows build is sad
      
      Review URL: https://codereview.chromium.org/891953002
  10. 30 Jan, 2015 2 commits
    • Subzero: Fix timers for multithreaded translation. · 380d7b96
      Jim Stichnoth authored
      Now that multithreaded parsing and translation are in place, timer operations have to be made thread-local.  After the non-main threads end, their thread-local timer data needs to be merged into the global timer data, which resides in the GlobalContext object.  The merge is a bit tricky because the internal timer stack structure is built up dynamically as items are pushed and popped.  Two threads may have radically different timing data:
      
      1. The parser thread profile is completely different from a translator thread.
      
      2. For -timing-funcs, two translator threads hold data for entirely different sets of functions.
      
      A bit more tweaking will need to be done to make the timing output fully usable in a multithreaded run.  Because of multiple threads, times may add up to >100%.  Also, time spent blocked is being "unfairly" attributed to the caller of the blocking operation - we should either count the user time instead of wall-clock time, or add a special timer marker for blocking locking operations.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/878383004
    • Subzero: Minor Makefile fix. · 51d00936
      Jim Stichnoth authored
      The problem showed up after the link step failed, in which case $(OBJDIR)/llvm2ice was deleted but the ./llvm2ice symlink still existed.  A subsequent "make check-lit" or "make check" would fail, so the basic "make" would have to be done first.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/887873002
  11. 29 Jan, 2015 1 commit
    • Write out global initializers and data rel directly to ELF file. · 72984d88
      Jan Voung authored
      The local symbol relocations are a bit different from
      llvm-mc, which are section-relative. E.g., instead of "bytes",
      it will be ".data + offsetof(bytes, .data)". So the
      contents of the text/data/rodata sections can also differ
      since the offsets written in place are different.
      
      Still need to fill the symbol table with undefined
      symbols (e.g., memset, and szrt lib functions) before
      trying to link.
      
      BUG=none
      R=kschimpf@google.com, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/874353006
  12. 28 Jan, 2015 6 commits
  13. 27 Jan, 2015 3 commits
    • Fix pedantic build warnings; · 8427ea2b
      JF Bastien authored
      GCC 4.8.1 is sad;
      
      There are extra semicolons in Subzero;
      
      It removes the semicolons or it gets the build warning hose again;^H
      
      R=stichnot@chromium.org
      BUG= none
      
      Review URL: https://codereview.chromium.org/882743003
    • Subzero: Use a "known" version of clang-format. · dd842dbb
      Jim Stichnoth authored
      There are two problems with "make format" and "make format-diff" in
      Makefile.standalone:
      
      1. You have to make sure clang-format and clang-format-diff.py are
      available in $PATH.
      
      2. Different users may have different versions installed (even for the
      same user on different machines), leading to whitespace wars.  Can't we
      all just get along?
      
      Since the normal LLVM build that Subzero depends on also exposes and
      builds clang-format and friends, we might as well use it.  The
      clang-format binary is found in $LLVM_BIN_PATH, and clang-format-diff.py
      is found relative to $LLVM_SRC_PATH.  As long as the user's LLVM build
      is fairly up to date, whitespace wars are unlikely.
      
      Given this, there's a much higher incentive to use "make format"
      regularly instead of "make format-diff".  In particular, inline comments
      on variable/field declaration lists can get lined up more nicely by
      looking at the entire context, rather than the small diff window.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/877003003
    • Subzero: Initial implementation of multithreaded translation. · fa4efea5
      Jim Stichnoth authored
      Provides a single-producer, multiple-consumer translation queue where the number of translation threads is given by the -threads=N argument.  The producer (i.e., bitcode parser) blocks if the queue size is >=N, in order to control the memory footprint.  If N=0 (which is the default), execution is purely single-threaded.  If N=1, there is a single translation thread running in parallel with the parser thread.  "make check" succeeds with the default changed to N=1.
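
A bounded queue of this kind can be sketched with a mutex and two condition variables. `BoundedQueue`, `push()`, `pop()`, and `close()` are hypothetical names, not the interface used in Subzero, and the sketch assumes a capacity of at least 1 (the N=0 case bypasses the queue entirely).

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical bounded queue: one producer, any number of consumers.
template <typename T> class BoundedQueue {
public:
  explicit BoundedQueue(size_t MaxSize) : MaxSize(MaxSize) {}

  // Producer side: blocks while the queue already holds MaxSize items,
  // which caps the number of work units held in memory.
  void push(T Item) {
    std::unique_lock<std::mutex> L(Lock);
    NotFull.wait(L, [this] { return Items.size() < MaxSize; });
    Items.push_back(std::move(Item));
    NotEmpty.notify_one();
  }

  // Consumer side: blocks until an item is available. Returns false once
  // the queue has been closed and fully drained.
  bool pop(T &Out) {
    std::unique_lock<std::mutex> L(Lock);
    NotEmpty.wait(L, [this] { return !Items.empty() || Closed; });
    if (Items.empty())
      return false;
    Out = std::move(Items.front());
    Items.pop_front();
    NotFull.notify_one();
    return true;
  }

  // Producer signals that no more items will arrive.
  void close() {
    {
      std::lock_guard<std::mutex> L(Lock);
      Closed = true;
    }
    NotEmpty.notify_all();
  }

private:
  const size_t MaxSize;
  bool Closed = false;
  std::mutex Lock;
  std::condition_variable NotFull;
  std::condition_variable NotEmpty;
  std::deque<T> Items;
};
```

Each translation thread would loop on `pop()` until it returns false, while the parser calls `push()` per function and `close()` at the end of the input.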
      
      Currently emission is also done by the translation thread, which limits scalability since the emit stream has to be locked.  Also, since the ELF writer stream is not locked, it won't be safe to use N>1 with the ELF writer.  Furthermore, for N>1, emitted function ordering is nondeterministic and needs to be recombobulated.  This will all be fixed in a follow-on CL.
      
      The -timing option is broken for N>0.  This will be fixed in a follow-on CL.
      
      Verbose flags are now managed in the Cfg instead of (or in addition to) the GlobalContext, due to the -verbose-focus option which wants to temporarily change the verbose level for a particular function.
      
      TargetLowering::emitConstants() and related methods are changed to be static, so that a valid TargetLowering object isn't required.  This is because the TargetLowering object wants to hold a valid Cfg, and none really exists after all functions are translated and the constant pool is ready for emission.
      
      The Makefile.standalone now has a TSAN=1 option to enable ThreadSanitizer.
      
      BUG= none
      R=jfb@chromium.org
      
      Review URL: https://codereview.chromium.org/870653002
  14. 26 Jan, 2015 1 commit
  15. 25 Jan, 2015 1 commit
    • Make use of BSS more explicit in global initializers (vs a local .comm). · fed97aff
      Jan Voung authored
      This reduces the number of conditionals, and will more closely reflect
      the structure of the ELF writer's version of the same thing.
      Without fdata-sections, the ELF writer version will have to batch all
      initializers of a certain type so that they can be contiguous in the file
      and the overall alignment can be determined.
      
      A downside of this is that .s files will be different from llc's output.
      The spec .o and executables are identical before/after the change.
      
      BUG=none
      R=kschimpf@google.com, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/870123003
  16. 23 Jan, 2015 1 commit
  17. 22 Jan, 2015 2 commits
  18. 20 Jan, 2015 3 commits
    • Subzero: Remove the GlobalContext::GlobalDeclarations vector. · a086b913
      Jim Stichnoth authored
      Elements were added to this vector, but never inspected, so it is
      essentially a useless field.  Plus, the removal allows us to remove a
      couple of friend declarations.
      
      BUG=none
      R=kschimpf@google.com
      
      Review URL: https://codereview.chromium.org/814163004
    • Subzero: Add locking to prepare for multithreaded translation. · e4a8f400
      Jim Stichnoth authored
      This just gets the locking in place.  Actual multithreading will be added later.
      
      Mutexes are added for accessing the GlobalContext allocator, the constant pool, the stats data, and the profiling timers.  These are managed via the LockedPtr<> helper.  Finer grain locks on the constant pool may be added later, i.e. a separate lock for each data type.
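
One way such a helper could work is as a pointer wrapper that holds the lock for its own lifetime. The class below is a guess at the idea, named `LockedPtrSketch` to make clear it is not the actual LockedPtr<> implementation; `Counter` is a hypothetical stand-in for the guarded data.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical RAII wrapper: acquires the mutex on construction and
// releases it on destruction, so the pointee is only reachable while
// the lock is held.
template <typename T> class LockedPtrSketch {
public:
  LockedPtrSketch(T *Value, std::mutex *Lock) : Value(Value), Guard(*Lock) {}

  T *operator->() const { return Value; }
  T &operator*() const { return *Value; }

private:
  T *Value;
  std::unique_lock<std::mutex> Guard; // unlocks automatically at scope exit
};

// Hypothetical guarded object, standing in for e.g. a stats block.
struct Counter {
  int Hits = 0;
};
```

A caller would create the wrapper in a narrow scope, touch the data through `->` or `*`, and let the destructor drop the lock.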
      
      A vector of pointers to TLS objects is added to GlobalContext.  Each new thread will get its own TLS object, whose address is added to the vector.  (After threads complete, things like stats can be combined by iterating over the vector.)
      
      The dump/emit streams are guarded by a separate lock, to avoid fine-grain interleaving of output by multiple threads.  E.g., lock the streams, emit an entire function, and unlock the streams.  This works for dumping too, though dump output for different passes on the same function may be interleaved with that of another thread.  There is an OstreamLocker helper class to keep this simple.
      
      CodeStats is made an inner class of GlobalContext (this was missed on a previous CL).
      
      BUG= none
      R=jfb@chromium.org, jvoung@chromium.org, kschimpf@google.com
      
      Review URL: https://codereview.chromium.org/848193003
    • Add instruction alignment tests to unit tests. · af238b25
      Karl Schimpf authored
      BUG=None
      R=jvoung@chromium.org, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/848473002
  19. 15 Jan, 2015 1 commit
    • Subzero: Remove the IceV_RegManager enum value. · 769be681
      Jim Stichnoth authored
      This hasn't been used in a very long time, and there's no intention of using it again.
      
      Originally there was the idea of a "fast" block-local register allocator for an O1-like configuration, which would allocate registers for infinite-weight temporaries during target lowering, using a "local register manager".  This verbose option was for tracing execution of this register manager.  However, by now it seems unlikely that this would do a better/faster job than the current Om1 register allocation approach, which reuses the linear-scan code quite effectively and does very well at separation of concerns.  So adios IceV_RegManager!
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/831663008
  20. 13 Jan, 2015 1 commit
    • Start writing out some relocation sections (text). · ec270731
      Jan Voung authored
      Pass the full assembler pointer to the elf writer, so
      that it has access to both the text buffer and the fixups.
      
      Remove some child classes of AssemblerFixups. They didn't
      really do much, and were pretty much identical to the
      original AssemblerFixup class. Dart had a virtual method
      for fixups to do necessary patching, but we currently
      don't do the patching and just emit the relocations.
      TODO see if patching is more efficient than writing out
      relocations and letting the linker do the work.
      
      This CL also makes AssemblerFixups POD.
      
      Change the fixup kind to be a plain unsigned int, which
      the target can fill w/ target/container-specific values.
      
      Move the fwd declaration of Assembler to IceDefs and remove
      the others. Do a similar fwd declaration refactoring for
      ELFWriter.
      
      Make the createAssembler method return a std::unique_ptr.
      
      BUG=none
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/828873002
  21. 12 Jan, 2015 2 commits
  22. 09 Jan, 2015 2 commits
    • Make fixups reference any constant (allow const float/double pool literals). · 1d62cf08
      Jan Voung authored
      This avoids calling the global context's getConstantSym during
      emitIAS(), which may be desirable for multi-threading, since each
      function's emitIAS() should be able to happen on a separate thread.
      
      The stringification is moved till later, so it still happens, just without
      creating a constant relocatable w/ offset of 0.
      
      This ends up tickling an issue where -O0 on 252.eon now gets 2x as many
      page faults, and I'm not sure exactly why. This makes the overall time
      higher, though emit time is lower.
      
      When translating with -O2 # of page faults is about the same before/after,
      so that oddness is restricted to O0.
      
      Before this change, tweaking the slab size at O0 didn't
      seem to cause swings as drastic as 2x either.
      
      To work around this, I turned the slab size of the assembler down to 32KB.
      
      ===
      
      Move all the .L$type$poolid into a function (replacing getPoolEntryID).
      
      BUG=none
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/837553009
    • Add ability to test parsing of bitcode records in Subzero. · 2e7daeff
      Karl Schimpf authored
      Extends the NaCl bitcode munger so that the PNaClTranslator parser
      can be applied to the defined sequence of record values.
      
      BUG=None
      R=jvoung@chromium.org, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/800883006