1. 12 May, 2015 2 commits
  2. 07 May, 2015 1 commit
  3. 04 May, 2015 1 commit
    • Subzero: Use a setcc sequence for better icmp lowering. · f48b320c
      Jim Stichnoth authored
      For an example like:
        %a = icmp eq i32 %b, %c
      
      The original icmp lowering sequence for i8/i16/i32 was something like:
      
        cmpl b, c
        movb 1, a
        je label
        movb 0, a
      label:
      
      The improved sequence is:
        cmpl b, c
        sete a
      
      In O2 mode, this doesn't help when successive compare/branch instructions are fused, but it does help when the boolean result needs to be saved and later used.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1118353005
  4. 30 Apr, 2015 3 commits
  5. 29 Apr, 2015 1 commit
    • Subzero: Produce actually correct code in --asm-verbose mode. · 76dcf1a8
      Jim Stichnoth authored
      The "pnacl-sz --asm-verbose=1" mode annotates the asm output with physical register liveness information, including which registers are live at the beginning and end of each basic block, and which registers' live ranges end at each instruction.  Computing this information requires a final liveness analysis pass.  One of the side effects of liveness analysis is to remove dead instructions, which happens when the instruction's dest variable is not live and the instruction lacks important side effects.
      
      In some cases, direct manipulation of physical registers was missing extra fakedef/fakeuse/etc., and as as result these instructions could be eliminated, leading to incorrect code.  Without --asm-verbose, these instructions were being created after the last run of liveness analysis, so they had no chance of being eliminated and everything was fine.  But with --asm-verbose, some instructions would be eliminated.
      
      This CL fixes the omissions so that the resulting code is runnable.
      
      An alternative would be to add a flag to liveness analysis directing it not to dead-code eliminate any more instructions.  However, it's better to get the liveness right in case future late-stage optimizations rely on it.
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4135
      TEST= pydir/szbuild_spec2k.py --filetype=asm -v --sz=--asm-verbose=1 --force
      R=jvoung@chromium.org, kschimpf@google.com
      
      Review URL: https://codereview.chromium.org/1113683002
  6. 28 Apr, 2015 1 commit
    • Subzero: Fix asm (non-ELF) output files. · 620ad732
      Jim Stichnoth authored
      In an earlier version of Subzero, the text output stream object was
      stack-allocated within main.  A later refactoring moved its allocation
      into a helper function, but it was still being stack-allocated, which
      was bad when the helper function returned.
      
      This change allocates the object via "new", which fixes that problem,
      but reveals another problem: the raw_ostream object for some reason
      doesn't finish writing everything to disk and yielding a truncated
      output file.  This is solved in the style of the ELF streamer, by
      using raw_fd_ostream instead.
      
      BUG= none
      R=kschimpf@google.com
      
      Review URL: https://codereview.chromium.org/1111603003
  7. 22 Apr, 2015 2 commits
  8. 21 Apr, 2015 2 commits
    • Subzero: Improve "make check-unit" execution. · e7e9b024
      Jim Stichnoth authored
      If you switch between "cmake" and "autoconf" toolchain builds, and
      neglect to clean out pnacl_newlib_raw/ in between, the wrong libgtest
      and libgtest_main may get pulled in for the autoconf build, leading to
      an assertion failure in "make check-unit".
      
      This tweak fixes that problem by rejiggering the lib search path.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1099093005
    • Subzero: Auto-detect cmake versus autoconf LLVM build. · 0a9e1261
      Jim Stichnoth authored
      The CMAKE=1 option is no longer needed.
      
      Pretty much all the tools we need are now in pnacl_newlib_raw/bin, so use PNACL_BIN_PATH set to that instead of using LLVM_BIN_PATH and BINUTILS_BIN_PATH.
      
      However, for the autoconf build, libgtest and libtest_main and clang-format are only under the llvm_x86_64_linux_work directory, so they need special casing.  This also means that you have to actually do an LLVM build and not blow away the work directory in order to "make check-unit" or "make format".
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1085733002
  9. 16 Apr, 2015 5 commits
  10. 10 Apr, 2015 1 commit
  11. 09 Apr, 2015 2 commits
  12. 07 Apr, 2015 1 commit
  13. 06 Apr, 2015 1 commit
  14. 31 Mar, 2015 1 commit
    • Add argv[0] before parsing commandline flags. · 9c1d3869
      Jan Voung authored
      The \0 delimited string array that the browser sends doesn't have
      the program name and the IRT only tokenizes that and forwards
      it along. We need argv[0] to make the llvm CL parser happy
      (used for -help message, etc).
      
      Alternatively, we could have the IRT fill in a program name
      so that the argv is a real argv. That will involve less copying since
      the argv will be the right size to begin with, but prevents each app
      from customizing its argv[0] =/
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4091
      TEST= manual for now (construct the sel_universal script to only pass
      the "--build-atts" flag and see it exits without being swallowed,
      or pass "-Ofoo" and see an error + exit)
      
      R=mtrofin@chromium.org, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1041843003
  15. 30 Mar, 2015 1 commit
  16. 29 Mar, 2015 1 commit
    • Subzero: Fix dependency checking to avoid unnecessary rebuilds. · 8c7b0a2c
      Jim Stichnoth authored
      When trying to do bisection debugging, the pnacl-llc translation was happening every time even if the pexe didn't change.  This is because it was checking for a binary called 'llc' in the current directory, instead of an absolute path the pnacl-llc.  (This check is done so that updating pnacl-llc triggers a rebuild of the bisection binary, similar to the check for an update of pnacl-sz.)
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1044623003
  17. 27 Mar, 2015 1 commit
    • Refactor Subzero initialization and add a browser callback handler. · 44c3a804
      Jan Voung authored
      Handlers are represented as a "compile server" even though
      right now it can really only handle a single
      compile request.
      
      Then there can be a commandline-based server and a
      browser-based server. This server takes over the main
      thread. In the browser-based case the server can block,
      waiting on bytes to be pushed. This becomes a producer of
      bitcode bytes.
      
      The original main thread which did bitcode reading is now
      shifted to yet another worker thread, which is then the
      consumer of bitcode bytes.
      
      This uses an IRT interface for listening to messages
      from the browser:
      https://codereview.chromium.org/984713003/
      
      TEST=Build the IRT core nexe w/ the above patch and compile w/ something like:
      
      echo """
      readwrite_file objfile /tmp/temp.nexe---gcc.opt.stripped.pexe---.o
      rpc StreamInitWithSplit i(4) h(objfile) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) C(4,-O2\x00) * s()
      stream_file /usr/local/google/home/jvoung/pexe_tests/gcc.opt.stripped.pexe 65536 1000000000
      rpc StreamEnd * i() s() s() s()
      echo "pnacl-sz complete"
      """ | scons-out/opt-linux-x86-32/staging/sel_universal \
          -a -B scons-out/nacl_irt-x86-32/staging/irt_core.nexe \
          --abort_on_error \
          -- toolchain/linux_x86/pnacl_translator/translator/x86-32/bin/pnacl-sz.nexe
      
      echo """
      readwrite_file nexefile /tmp/temp.nexe.tmp
      readonly_file objfile0 /tmp/temp.nexe---gcc.opt.stripped.pexe---.o
      rpc RunWithSplit i(1) h(objfile0) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(nexefile) *
      echo "ld complete"
      """ | /usr/local/google/home/nacl3/native_client/scons-out/opt-linux-x86-32/staging/sel_universal \
          --abort_on_error \
          -a -B \
          scons-out/nacl_irt-x86-32/staging/irt_core.nexe \
          -E NACL_IRT_OPEN_RESOURCE_BASE=toolchain/linux_x86/pnacl_translator/translator/x86-32/lib/ \
          -E NACL_IRT_OPEN_RESOURCE_REMAP=libpnacl_irt_shim.a:libpnacl_irt_shim_dummy.a \
          -- toolchain/linux_x86/pnacl_translator/translator/x86-32/bin/ld.nexe
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4091
      R=kschimpf@google.com, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/997773002
  18. 24 Mar, 2015 2 commits
    • Subzero: Fix a lowering bug involving xchg and xadd instructions. · 0e432ac4
      Jim Stichnoth authored
      The x86-32 xchg and xadd instructions are modeled using two source operands, one of which is a memory operand and the other ultimately a physical register.  These instructions have a side effect of modifying both operands.
      
      During lowering, we need to specially express that the instruction modifies the Variable operand (since it doesn't appear as the instruction's Dest variable).  This makes the register allocator aware of the Variable being multi-def, and prevents it from sharing a register with an overlapping live range.
      
      This was being partially expressed by adding a FakeDef instruction.  However, FakeDef instructions are still allowed to be dead-code eliminated, and if this happens, the Variable may appear to be single-def, triggering the unsafe register sharing.
      
      The solution is to prevent the FakeDef instruction from being eliminated, via a FakeUse instruction.
      
      It turns out that the current register allocator isn't aggressive enough to manifest the bug with cmpxchg instructions, but the fix and tests are there just in case.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1020853011
    • Make compile without ICE_THREAD_LOCAL_HACK (avoid "Type *TLS = TLS;") · 3e5009f6
      Jan Voung authored
      Otherwise you get:
      
      In file included from src/IceGlobalContext.cpp:21:
      In file included from src/IceCfg.h:21:
      src/IceGlobalContext.h:257:44: error: variable 'TLS' is uninitialized when used within its own initialization [-Werror,-Wuninitialized]
          ThreadContext *TLS = ICE_TLS_GET_FIELD(TLS);
                         ~~~                     ^~~
      src/IceTLS.h:95:39: note: expanded from macro 'ICE_TLS_GET_FIELD'
                                            ^
      So rename the local var to Tls.
      
      BUG=none
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1030793002
  19. 23 Mar, 2015 3 commits
    • Subzero: Don't use key SSE instructions on potentially unaligned loads. · f79d2cb6
      Jim Stichnoth authored
      The non-mov-like SSE instructions generally require 16-byte aligned memory operands.  The PNaCl bitcode ABI only guarantees 4-byte alignment or less on vector loads and stores.  Subzero maintains stack alignment so stack memory operands are fine.
      
      We handle this by legalizing memory operands into a register wherever there is doubt.
      
      This bug was first discovered on the vector_align scons test.
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4083
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4133
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1024253003
    • Subzero: Prune unreachable nodes after constructing the Cfg. · 69d3f9c6
      Jim Stichnoth authored
      The gcc torture test suite has examples where there is a function call (to a routine that throws an exception or aborts or something), followed by an "unreachable" instruction, followed by more code that may e.g. return a value to the caller.  In these examples, the code following the unreachable is itself unreachable.
      
      Problems arise when the unreachable code references a variable defined in the reachable code.  This triggers a liveness consistency error because the use of the variable has no reaching definition.
      
      It's a bit surprising that LLVM actually allows this, but it does so we need to deal with it.
      
      The solution is, after initial CFG construction, do a traversal starting from the entry node and then delete any undiscovered nodes.
      
      There is code in Subzero that assumes Cfg::Nodes[i]->Number == i, so the nodes need to be renumbered after pruning.  The alternative was to set Nodes[i]=nullptr and not change the node number, but that would mean peppering the code base with CfgNode null checks.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1027933002
    • Subzero: Fix inappropriate use of nullptr. · 27c56bf6
      Jim Stichnoth authored
      When lowering of a couple of atomic intrinsics down to a loop structure, a FakeUse on the memory address's base variable is created.  However, if the memory address is a global constant, there is no base variable.  So check for that and don't create a FakeUse if there is none.
      
      BUG= none
      TEST=synchronization_sync (scons test)
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1023673007
  20. 20 Mar, 2015 3 commits
  21. 19 Mar, 2015 3 commits
    • Subzero: Fix floating-point constant pooling. · 5bfe2157
      Jim Stichnoth authored
      This fixes a regression likely introduced in d2cb4361 .
      
      The problem is that by using the default std::unordered_map comparison predicate std::equal_to, we get incorrect behavior when the key is float or double:
      
      1. 0.0 and -0.0 appear equal, so they share a constant pool entry even though the bit patterns are different.  This is a correctness bug.
      
      2. Each instance of NaN gets a separate constant pool entry, because NaN != NaN by C equality rules.  This is a performance bug.  (This problem doesn't show up with the native bitcode reader, because constants are already unique-ified in the PNaCl bitcode file.)
      
      The solution is to use memcmp for floating-point key types.
      
      Also, the abi-atomics.ll test is disabled for the MINIMAL build, to fix an oversight from a previous CL.
      
      BUG= none
      R=jfb@chromium.org
      
      Review URL: https://codereview.chromium.org/1019233002
    • Subzero: Add fabs intrinsic support. · 8c980d0d
      Jim Stichnoth authored
      The intrinsic is lowered using the standard technique of masking off the FP sign bit, which is the high-order bit.
      
      To construct this mask, we use the existing trick of loading a vector register with all "1" bits, then logical-shift-right by one bit.
      
      In the future, we should add 128-bit vector values to the constant pool and force them to memory, and this could be used for the other routines that synthesize a vector constant.
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4097
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1022573004
    • Assemble calls to constant addresses. · f644a4b3
      Jan Voung authored
      Finally address this TODO in the assembler. This will help translate
      non-IRT using programs (no ABI stability). The default scons testing
      mode is non-IRT, so this helps with that. I haven't actually tested
      this against scons yet, but I'm filling in the tests based on how
      LLVM translates the same bitcode.
      
      The filetype=asm is adjusted to omit the "*" and the "$".
      
      The filetype=obj is adjusted to check for fixups with NullSymbols,
      and also fill the assembler buffer at the instruction's immediate
      field w/ the right constant.
      
      The filetype=iasm is still TODO (hits an new assert in the Fixup's emit() function).
      
      Reverts 7ad1bed9:
      "Allow stubbing of called constant addresses using command line argument."
      since this is now handled (except for iasm).
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4080
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1017373002
  22. 18 Mar, 2015 2 commits