1. 04 Jun, 2015 1 commit
    • Use report_fatal_error before destroying input object on error. · 2f7f2b7e
      Jan Voung authored
      The input object may be a QueueStreamer, which the compile
      server will still have a reference to (even though
      downstream the memory object API and parser API think it
      has a unique_ptr). Terminate the thread quickly on error,
      instead of freeing the object and causing a use-after-free.
      
      Also set up a report_fatal_error handler which has access
      to the server's state. This allows the server to record the
      error and stop pushing bytes to the QueueStreamer.
      Otherwise the QueueStreamer can get full without a consumer
      still active to unblock.
      
      Unfortunately the fatal error handler only terminates the
      current thread, and not all worker threads. NaCl doesn't
      have support for signals or pthread_kill (e.g., as in
      pthread_kill(std_thread.native_handle(), SIGABRT)), so
      other worker/emitter threads will have to hang, for
      example waiting on more input.
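The handler/queue interaction described above can be sketched in standard C++. This is a single-threaded model, not the server code: `ServerState`, `BoundedByteQueue` (a stand-in for QueueStreamer), `onFatalError`, and `streamChunk` are hypothetical names, and the queue reports "full" instead of blocking so the sketch runs without threads.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical server state that the fatal error handler can reach.
struct ServerState {
  bool HadError = false;
  std::string ErrorMsg;
};

// Hypothetical stand-in for QueueStreamer: a bounded byte buffer.
// A real streamer would block the producer when full; this one returns false.
class BoundedByteQueue {
public:
  explicit BoundedByteQueue(std::size_t Cap) : Capacity(Cap) {}
  bool tryPush(char C) {
    if (Bytes.size() >= Capacity)
      return false;
    Bytes.push_back(C);
    return true;
  }
  std::size_t size() const { return Bytes.size(); }

private:
  std::size_t Capacity;
  std::deque<char> Bytes;
};

// Hypothetical handler body: record the error where the producer can see it.
void onFatalError(ServerState &State, const std::string &Msg) {
  State.HadError = true;
  State.ErrorMsg = Msg;
}

// Producer side: once an error is recorded the consumer is gone, so stop
// pushing rather than filling the queue and waiting for space forever.
// Returns the number of bytes actually queued.
std::size_t streamChunk(ServerState &State, BoundedByteQueue &Queue,
                        const std::string &Chunk) {
  std::size_t Pushed = 0;
  for (char C : Chunk) {
    if (State.HadError || !Queue.tryPush(C))
      break;
    ++Pushed;
  }
  return Pushed;
}
```

In the commit the handler also ends the current thread; this sketch only models the part where the recorded error stops the producer.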
      
      Random clang-format edits from 3.7.
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4163
      TEST= tbd:
      
      I manually ran the translator on a dummy text file (invalid bitcode
      header), and observed that this no longer crashes. Instead the SRPC
      calls finish and I see:
      
      3> [17812,4147750656:14:23:02.025382] Streaming file at 100000 bps
      [17812,4147750656:14:23:12.511574] RPC call failed: Rpc application returned an error.
      [17812,4147750656:14:23:12.511625] StreamChunk failed
      [17812,4147750656:14:23:12.511655] stream_file: SendDataChunk failed, but returning without failing. Expect call to StreamEnd.4> rpc call initiated StreamEnd::isss
      [17812,4147750656:14:23:12.511931] RPC call failed: Rpc application returned an error.
      rpc call complete StreamEnd::isss
      output 0:  i(0)
      output 1:  s("")
      output 2:  s("")
      output 3:  s("Invalid PNaCl bitcode header")
      [17812,4147750656:14:23:12.512102] Command [rpc] failed.
      
      R=kschimpf@google.com, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1168543002
  2. 03 Jun, 2015 2 commits
    • Subzero: Improve/refactor folding loads into the next instruction. · 8e6bf6e1
      Jim Stichnoth authored
      This is turned into a separate (O2-only) pass that looks for opportunities:
      1. A Load instruction, or an AtomicLoad intrinsic that would be lowered just like a Load instruction
      2. Followed immediately by an instruction with a whitelisted kind that uses the Load dest variable as one of its operands
      3. Where the whitelisted instruction ends the live range of the Load dest variable.
      
      In such cases, the original two instructions are deleted and a new instruction is added that folds the load into the whitelisted instruction.
      
      We also do some work to splice the liveness information (Inst::LiveRangesEnded and Inst::isLastUse()) into the new instruction, so that the target lowering pass can still take advantage of it.  Currently this is used quite sparingly, but in the future we could use that along with operator commutativity to choose among different lowering sequences to reduce register pressure.
      
      The whitelisted instruction kinds are chosen based primarily on whether the main operation's native instruction can use a memory operand - e.g., arithmetic (add/sub/imul/etc), compare (cmp/ucomiss), cast (movsx/movzx/etc).  Notably, call and ret are not included because arg passing is done through simple assignments, for which normal lowering is sufficient.
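A minimal sketch of the pattern match, on a hypothetical toy IR (`Inst`, `Kind`, `foldLoads` are illustrative names, not Subzero's classes). It handles plain loads only, not the AtomicLoad case, and models Inst::LiveRangesEnded as a list of variable names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy IR (hypothetical, not Subzero's Inst classes).
enum class Kind { Load, Arith, Icmp, Cast, Call, Ret };

struct Inst {
  Kind K;
  std::string Dest;              // empty if the instruction has no dest
  std::vector<std::string> Srcs; // "[p]" denotes a memory operand at address p
  std::vector<std::string> Ends; // variables whose live range ends here
};

// Consumers whose native instruction can take a memory operand.
bool isWhitelisted(Kind K) {
  return K == Kind::Arith || K == Kind::Icmp || K == Kind::Cast;
}

bool contains(const std::vector<std::string> &V, const std::string &S) {
  return std::find(V.begin(), V.end(), S) != V.end();
}

// Rewrites "t = load p; x = op ..., t" into "x = op ..., [p]" when op is
// whitelisted, uses t exactly once, and ends t's live range.
std::vector<Inst> foldLoads(const std::vector<Inst> &Block) {
  std::vector<Inst> Out;
  for (std::size_t I = 0; I < Block.size(); ++I) {
    const Inst &Cur = Block[I];
    if (Cur.K == Kind::Load && I + 1 < Block.size()) {
      const Inst &Next = Block[I + 1];
      const std::string &T = Cur.Dest;
      if (isWhitelisted(Next.K) &&
          std::count(Next.Srcs.begin(), Next.Srcs.end(), T) == 1 &&
          contains(Next.Ends, T)) {
        Inst Folded = Next;
        *std::find(Folded.Srcs.begin(), Folded.Srcs.end(), T) =
            "[" + Cur.Srcs[0] + "]";
        // Splice liveness: t no longer exists, and whatever ended at the
        // load (e.g. the address variable) now ends at the folded instruction.
        Folded.Ends.erase(
            std::remove(Folded.Ends.begin(), Folded.Ends.end(), T),
            Folded.Ends.end());
        Folded.Ends.insert(Folded.Ends.end(), Cur.Ends.begin(),
                           Cur.Ends.end());
        Out.push_back(Folded);
        ++I; // the consumer has been replaced as well
        continue;
      }
    }
    Out.push_back(Cur);
  }
  return Out;
}
```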
      
      BUG= none
      R=jvoung@chromium.org, mtrofin@chromium.org
      
      Review URL: https://codereview.chromium.org/1169493002
    • Subzero: Change pnacl_newlib ==> pnacl_newlib_raw in scripts. · bb9d11a5
      Jim Stichnoth authored
      BUG= none
      R=jvoung@chromium.org, kschimpf@google.com
      
      Review URL: https://codereview.chromium.org/1162903003
  3. 02 Jun, 2015 1 commit
  4. 01 Jun, 2015 4 commits
    • Subzero: Changes needed for LLVM 3.7 integration. · e5b58fbe
      Jim Stichnoth authored
      1. Change Makefile.standalone from 3.6 to 3.7.
      
      2. Update to new load instruction .ll syntax.  This includes changing InstLoad::dump() to match.
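For reference, LLVM 3.7 changed the textual load syntax to spell the result type before the pointer operand (`%x = load i32, i32* %p` rather than `%x = load i32* %p`). The helper below is a hypothetical formatter showing that shape; it is not InstLoad::dump().

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not Subzero's InstLoad::dump()): formats a load in
// the LLVM 3.7 syntax "dest = load Ty, Ty* ptr[, align N]".
std::string dumpLoad(const std::string &Dest, const std::string &Ty,
                     const std::string &Ptr, unsigned Align) {
  std::string S = Dest + " = load " + Ty + ", " + Ty + "* " + Ptr;
  if (Align != 0)
    S += ", align " + std::to_string(Align);
  return S;
}
```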
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1161543005
    • Subzero: Remove a compile-time warning. · 0769299d
      Jim Stichnoth authored
      BUG= none
      R=kschimpf@google.com
      
      Review URL: https://codereview.chromium.org/1161353002
    • Subzero ARM: addProlog/addEpilogue -- share some code with x86. · 0fa6c5a0
      Jan Voung authored
      Split out some of the addProlog code from x86 and
      reuse that for ARM. Mainly, the code that doesn't
      concern preserved registers or stack arguments is split out.
      
      ARM push and pop take a whole list of registers (not
      necessarily consecutive, but should be in ascending order).
      There is also "vpush" for callee-saved float/vector
      registers but we do not handle that yet (the register
      numbers for that have to be consecutive).
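Why the order matters: in the ARM encoding, push/pop carry their operands as a 16-bit register list in which bit N selects rN, so the list behaves like a set ordered by register number. The sketch below (helper names hypothetical; r13/r14/r15 printed as sp/lr/pc per the standard ARM numbering) builds such a mask and prints it in ascending order.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

std::string regName(unsigned R) {
  if (R == 13) return "sp";
  if (R == 14) return "lr";
  if (R == 15) return "pc";
  return "r" + std::to_string(R);
}

// Bit N of the mask stands for register rN.
uint16_t regListMask(const std::set<unsigned> &Regs) {
  uint16_t Mask = 0;
  for (unsigned R : Regs)
    Mask |= static_cast<uint16_t>(1u << R);
  return Mask;
}

// Walking the bits from low to high yields the ascending order the
// assembler expects; registers need not be consecutive.
std::string formatPush(uint16_t Mask) {
  std::string S = "push {";
  bool First = true;
  for (unsigned R = 0; R < 16; ++R) {
    if (!(Mask & (1u << R)))
      continue;
    if (!First)
      S += ", ";
    S += regName(R);
    First = false;
  }
  return S + "}";
}
```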
      
      Enable some of the int-arg.ll tests, which relied on
      addPrologue's finishArgumentLowering to pull from the
      correct argument stack slot.
      
      Test some of the frame pointer usage (push/pop) when
      handling a variable sized alloca.
      
      Also change the classification of LR and PC so that
      they are not "CalleeSave". We don't want to push LR
      if it isn't overwritten by another call. It will certainly be
      "used" by the return however. The prologue code only checks
      if a CalleeSave register is used somewhere before deciding
      to preserve it. We could make that stricter and check if
      the register is also written to, but there are some
      additional writes that are not visible till after the
      push/pop are generated (e.g., copy from argument stack slot
      to the argument register). Instead, keep checking use
      only, and handle LR as a special case (IsLeafFunction).
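The prologue decision described above might look like this as a simplified sketch. It assumes r4-r11 are the callee-saved registers and LR is r14; Subzero's actual register tables are not shown in the commit, and `regsToPreserve`/`isCalleeSave` are hypothetical names.

```cpp
#include <cassert>
#include <set>

// Assumed numbering: r0-r15, with LR = r14.
constexpr unsigned RegLR = 14;

// Assumption for this sketch: r4-r11 are callee-saved; LR is not.
bool isCalleeSave(unsigned Reg) { return Reg >= 4 && Reg <= 11; }

// Preserve every callee-saved register that is used anywhere. Push LR only
// when the function makes calls (is not a leaf), since only then is LR
// overwritten before the return reads it.
std::set<unsigned> regsToPreserve(const std::set<unsigned> &UsedRegs,
                                  bool IsLeafFunction) {
  std::set<unsigned> Preserve;
  for (unsigned Reg : UsedRegs)
    if (isCalleeSave(Reg))
      Preserve.insert(Reg);
  if (!IsLeafFunction)
    Preserve.insert(RegLR);
  return Preserve;
}
```

In a leaf function LR still appears in the "used" set (the return reads it), but it is not pushed.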
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1159013002
    • Subzero: Fold the load instruction into the next cast instruction. · c77f817f
      Jim Stichnoth authored
      This is similar to the way a load instruction may be folded into the next arithmetic instruction.
      
      Usually the effect is to improve a sequence like:
        mov ax, WORD PTR [mem]
        movsx eax, ax
      into this:
        movsx eax, WORD PTR [mem]
      without actually improving register allocation, though other kinds of casts may have different improvements.
      
      Existing tests needed to be fixed when they "inadvertently" did a cast to i32 return type and triggered the optimization when it wasn't wanted.  These were fixed by inserting a "dummy" instruction between the load and the cast.
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1152783006
  5. 27 May, 2015 3 commits
  6. 26 May, 2015 2 commits
  7. 22 May, 2015 2 commits
  8. 19 May, 2015 2 commits
    • Lower a few basic ARM binops for i{8,16,32,64}. · 2971997a
      Jan Voung authored
      Do basic lowering for add, sub, and, or, xor, mul.
      We don't yet take advantage of commuting immediate operands
      (e.g., use rsb to reverse subtract instead of sub) or
      inverting immediate operands (use bic to bit clear instead
      of using and).
      
      The binary operations can set the flags register (e.g., to
      have the carry bit for use with a subsequent adc
      instruction). That is optional for the "data processing"
      instructions.
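To illustrate why setting flags matters, here is a model (not Subzero code; `Flags`, `adds`, `adc`, and `add64` are hypothetical names) of a 64-bit add lowered to two 32-bit operations: `adds` on the low words sets the carry flag, and `adc` on the high words consumes it.

```cpp
#include <cassert>
#include <cstdint>

struct Flags {
  bool C = false; // carry flag
};

// Like ARM "adds": add and set the carry flag on unsigned overflow.
uint32_t adds(uint32_t A, uint32_t B, Flags &F) {
  uint32_t R = A + B; // wraps modulo 2^32
  F.C = R < A;
  return R;
}

// Like ARM "adc": add with the incoming carry.
uint32_t adc(uint32_t A, uint32_t B, const Flags &F) {
  return A + B + (F.C ? 1u : 0u);
}

uint64_t add64(uint64_t X, uint64_t Y) {
  Flags F;
  uint32_t Lo = adds(static_cast<uint32_t>(X), static_cast<uint32_t>(Y), F);
  uint32_t Hi = adc(static_cast<uint32_t>(X >> 32),
                    static_cast<uint32_t>(Y >> 32), F);
  return (static_cast<uint64_t>(Hi) << 32) | Lo;
}
```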
      
      I'm not yet able to compile 8bit.pnacl.ll and
      64bit.pnacl.ll so 8-bit and 64-bit are not well tested yet.
      The only tests are in the arith.ll file (like arith-opt.ll, but
      assembled instead of testing the "verbose inst" output).
      
      Not doing divide yet. ARM divide by 0 does not trap, but
      PNaCl requires uniform behavior for such bad code. Thus,
      in LLVM we insert a check for a zero divisor, and we would have to do the same here.
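A sketch of the semantics the extra check must provide, assuming the hardware divide returns 0 for a zero divisor. `armUdiv`, `pnaclUdiv`, and `DivideByZeroTrap` are hypothetical names, and an exception stands in for the trap.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Model of the ARM hardware divide, assumed to return 0 for a zero divisor
// rather than trapping.
uint32_t armUdiv(uint32_t A, uint32_t B) { return B == 0 ? 0 : A / B; }

// Hypothetical stand-in for the trap PNaCl requires on every target.
struct DivideByZeroTrap : std::runtime_error {
  DivideByZeroTrap() : std::runtime_error("integer divide by zero") {}
};

// The lowering must add an explicit zero check before the divide so the
// behavior matches x86, where div faults on a zero divisor.
uint32_t pnaclUdiv(uint32_t A, uint32_t B) {
  if (B == 0)
    throw DivideByZeroTrap();
  return armUdiv(A, B);
}
```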
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1127003003
    • Subzero: Use cmov to improve lowering for the select instruction. · 537b5ba0
      Jim Stichnoth authored
      This is instead of explicit control flow which may interfere with branch prediction.  However, explicit control flow is still needed for types other than i16 and i32, due to cmov limitations.
      
      The assembler for cmov is extended to allow the non-dest operand to be a memory operand.
      
      The select lowering is getting large enough that it was in our best interest to combine the default lowering with the bool-folding optimization.
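A schematic of the two lowering shapes for `dest = select i1 cond, T a, T b`, in Intel-style syntax. `lowerSelect` and the `.Ldone` label are hypothetical, and real lowering works on operands rather than strings. cmov has no 8-bit form and only moves general-purpose registers, which is one reason the other types keep the branch.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> lowerSelect(const std::string &Ty,
                                     const std::string &Dest,
                                     const std::string &Cond,
                                     const std::string &A,
                                     const std::string &B) {
  if (Ty == "i16" || Ty == "i32") {
    return {"mov " + Dest + ", " + B,     // start with the false value
            "cmp " + Cond + ", 0",
            "cmovne " + Dest + ", " + A}; // take A if cond != 0; A may be memory
  }
  // Other types: explicit control flow.
  return {"mov " + Dest + ", " + A, "cmp " + Cond + ", 0", "jne .Ldone",
          "mov " + Dest + ", " + B, ".Ldone:"};
}
```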
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1125323004
  9. 18 May, 2015 1 commit
  10. 17 May, 2015 1 commit
    • Subzero: Fold icmp into br/select lowering. · a59ae6ff
      Jim Stichnoth authored
      Originally there was a peephole-style optimization in lowerIcmp() that looks ahead to see if the next instruction is a conditional branch with the right properties, and if so, folds the icmp and br into a single lowering sequence.
      
      However, sometimes extra instructions come between the icmp and br instructions, disabling the folding even though it would still be possible.
      
      One thought is to do the folding inside lowerBr() instead of lowerIcmp(), by looking backward for a suitable icmp instruction.  The problem here is that the icmp lowering code may leave lowered instructions that can't easily be dead-code eliminated, e.g. instructions lacking a dest variable.
      
      Instead, before lowering a basic block, we do a prepass on the block to identify folding candidates.  For the icmp/br example, the prepass would tentatively delete the icmp instruction and then the br lowering would fold in the icmp.
      
      This folding can also be extended to several producers:
        icmp (i32 operands), icmp (i64 operands), fcmp, trunc .. to i1
      and several consumers:
        br, select, sext, zext
      
      This CL starts with 2 combinations: icmp32 paired with br & select.  Other combinations will be added in later CLs.
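A sketch of the prepass on a toy block. The folding conditions are assumptions for illustration (the icmp result has exactly one use, a later br or select in the same block, and is not live out), and `Inst`, `Op`, and `findFoldingCandidates` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

enum class Op { Icmp32, Br, Select, Other };

struct Inst {
  Op Kind;
  std::string Dest;
  std::vector<std::string> Srcs; // for br/select, Srcs[0] is the condition
  bool Deleted = false;          // set when the prepass folds it away
};

// Returns consumer index -> icmp index, and tentatively deletes the icmps.
std::map<std::size_t, std::size_t>
findFoldingCandidates(std::vector<Inst> &Block,
                      const std::vector<std::string> &LiveOut) {
  std::map<std::string, std::size_t> Producer; // icmp dest -> index
  std::map<std::string, int> Uses;             // use counts in this block
  for (std::size_t I = 0; I < Block.size(); ++I) {
    for (const std::string &S : Block[I].Srcs)
      ++Uses[S];
    if (Block[I].Kind == Op::Icmp32)
      Producer[Block[I].Dest] = I;
  }
  std::map<std::size_t, std::size_t> Folds;
  for (std::size_t I = 0; I < Block.size(); ++I) {
    const Inst &Consumer = Block[I];
    if ((Consumer.Kind != Op::Br && Consumer.Kind != Op::Select) ||
        Consumer.Srcs.empty())
      continue;
    const std::string &Cond = Consumer.Srcs[0];
    auto P = Producer.find(Cond);
    bool IsLiveOut =
        std::find(LiveOut.begin(), LiveOut.end(), Cond) != LiveOut.end();
    if (P != Producer.end() && P->second < I && Uses[Cond] == 1 &&
        !IsLiveOut) {
      Block[P->second].Deleted = true; // the icmp is not lowered on its own
      Folds[I] = P->second;            // the consumer's lowering folds it in
    }
  }
  return Folds;
}
```

An unrelated instruction between the icmp and the br does not block the fold here, which is the case the old look-ahead in lowerIcmp() missed.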
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4162
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1141213004
  11. 16 May, 2015 1 commit
  12. 14 May, 2015 1 commit
    • Convert Constant->emit() definitions to allow multiple targets to define them. · 76bb0bec
      Jan Voung authored
      Wasn't sure how to allow TargetX8632 and TargetARM32
      to both define "ConstantInteger32::emit(GlobalContext *)",
      and define them differently if both targets happen to be
      ifdef'ed into the code. Rearranged things so that it's now
      "TargetFoo::emit(ConstantInteger32 *)", so that each
      TargetFoo can have a separate definition.
      
      Some targets may allow emitting some types of constants
      while other targets do not (64-bit int for x86-64?).
      Also they emit constants with a different style.
      E.g., the prefix for x86 is "$" while the prefix for ARM
      is "#" and there isn't a prefix for mips(?).
      Renamed emitWithoutDollar to emitWithoutPrefix.
      
      Did this sort of multi-method dispatch via a visitor
      pattern, which is a bit verbose though.
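A minimal sketch of that dispatch. The class names follow the commit message, but the members and bodies are hypothetical: each constant forwards `emit()` to the target, and the target overloads `emit()` per constant kind, so the (constant kind, target) pair picks the definition. Both targets here reject 64-bit constants, only to show that support can differ per method.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

class ConstantInteger32;
class ConstantInteger64;

// Each target overloads emit() per constant kind.
class TargetLowering {
public:
  virtual ~TargetLowering() = default;
  virtual void emit(const ConstantInteger32 *C) = 0;
  virtual void emit(const ConstantInteger64 *C) = 0;
  std::string str() const { return Out.str(); }

protected:
  std::ostringstream Out;
};

// Each constant forwards to the target: visitor-style double dispatch.
class Constant {
public:
  virtual ~Constant() = default;
  virtual void emit(TargetLowering *Target) const = 0;
};

class ConstantInteger32 : public Constant {
public:
  explicit ConstantInteger32(int32_t V) : Value(V) {}
  void emit(TargetLowering *Target) const override { Target->emit(this); }
  int32_t Value;
};

class ConstantInteger64 : public Constant {
public:
  explicit ConstantInteger64(int64_t V) : Value(V) {}
  void emit(TargetLowering *Target) const override { Target->emit(this); }
  int64_t Value;
};

class TargetX8632 : public TargetLowering {
public:
  void emit(const ConstantInteger32 *C) override { Out << "$" << C->Value; }
  void emit(const ConstantInteger64 *) override {
    throw std::logic_error("64-bit constant not emitted directly on x86-32");
  }
};

class TargetARM32 : public TargetLowering {
public:
  void emit(const ConstantInteger32 *C) override { Out << "#" << C->Value; }
  void emit(const ConstantInteger64 *) override {
    throw std::logic_error("64-bit constant not emitted directly on ARM32");
  }
};
```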
      
      We may be able to remove the emitWithoutDollar/Prefix for
      ConstantPrimitive by just inlining that into the few places
      that need it (only needed for ConstantInteger32). This
      undoes the unreachable methods added by: https://codereview.chromium.org/1017373002/diff/60001/src/IceTargetLoweringX8632.cpp
      The only extra place was for emitting calls to constants.
      There was already an inlined instance for OperandX8632Mem.
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1129263005
  13. 12 May, 2015 2 commits
  14. 07 May, 2015 1 commit
  15. 04 May, 2015 1 commit
    • Subzero: Use a setcc sequence for better icmp lowering. · f48b320c
      Jim Stichnoth authored
      For an example like:
        %a = icmp eq i32 %b, %c
      
      The original icmp lowering sequence for i8/i16/i32 was something like:
      
        cmpl b, c
        movb 1, a
        je label
        movb 0, a
      label:
      
      The improved sequence is:
        cmpl b, c
        sete a
      
      In O2 mode, this doesn't help when successive compare/branch instructions are fused, but it does help when the boolean result needs to be saved and later used.
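A sketch of the predicate-to-setcc mapping. The mnemonics are the standard x86 SETcc forms (unsigned predicates use above/below, signed ones greater/less); the function name is hypothetical, and it prints Intel operand order, where `cmp b, c` sets flags from b - c.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Lowers "dest = icmp Pred b, c" (i8/i16/i32) to a compare plus setcc.
std::vector<std::string> lowerIcmp(const std::string &Pred,
                                   const std::string &Dest,
                                   const std::string &B,
                                   const std::string &C) {
  static const std::map<std::string, std::string> SetCC = {
      {"eq", "sete"},  {"ne", "setne"},  {"ugt", "seta"}, {"uge", "setae"},
      {"ult", "setb"}, {"ule", "setbe"}, {"sgt", "setg"}, {"sge", "setge"},
      {"slt", "setl"}, {"sle", "setle"}};
  return {"cmp " + B + ", " + C, SetCC.at(Pred) + " " + Dest};
}
```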
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1118353005
  16. 30 Apr, 2015 3 commits
  17. 29 Apr, 2015 1 commit
    • Subzero: Produce actually correct code in --asm-verbose mode. · 76dcf1a8
      Jim Stichnoth authored
      The "pnacl-sz --asm-verbose=1" mode annotates the asm output with physical register liveness information, including which registers are live at the beginning and end of each basic block, and which registers' live ranges end at each instruction.  Computing this information requires a final liveness analysis pass.  One of the side effects of liveness analysis is to remove dead instructions, which happens when the instruction's dest variable is not live and the instruction lacks important side effects.
      
      In some cases, direct manipulation of physical registers was missing extra fakedef/fakeuse/etc., and as a result these instructions could be eliminated, leading to incorrect code.  Without --asm-verbose, these instructions were being created after the last run of liveness analysis, so they had no chance of being eliminated and everything was fine.  But with --asm-verbose, some instructions would be eliminated.
      
      This CL fixes the omissions so that the resulting code is runnable.
      
      An alternative would be to add a flag to liveness analysis directing it not to dead-code eliminate any more instructions.  However, it's better to get the liveness right in case future late-stage optimizations rely on it.
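Why a missing fakeuse breaks the code can be seen in a simplified single-block liveness pass with dead-code elimination. This toy model (hypothetical `Inst` and `livenessDCE`) treats a fakeuse as a side-effecting instruction that reads a register, and assumes `ret` reads eax implicitly without recording that use.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct Inst {
  std::string Name;              // for readability only
  std::string Dest;              // empty if the instruction has no dest
  std::vector<std::string> Srcs;
  bool HasSideEffects;
};

// Walk backward; drop an instruction whose dest is not live and that has no
// side effects. Otherwise kill its dest and make its sources live.
std::vector<Inst> livenessDCE(const std::vector<Inst> &Block,
                              std::set<std::string> Live) {
  std::vector<Inst> Kept;
  for (auto It = Block.rbegin(); It != Block.rend(); ++It) {
    const Inst &I = *It;
    bool Dead = !I.Dest.empty() && !Live.count(I.Dest) && !I.HasSideEffects;
    if (Dead)
      continue;
    if (!I.Dest.empty())
      Live.erase(I.Dest);
    Live.insert(I.Srcs.begin(), I.Srcs.end());
    Kept.push_back(I);
  }
  std::reverse(Kept.begin(), Kept.end());
  return Kept;
}
```

Without a fakeuse of eax before `ret`, the `mov` into eax looks dead and is removed; with one, it survives.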
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4135
      TEST= pydir/szbuild_spec2k.py --filetype=asm -v --sz=--asm-verbose=1 --force
      R=jvoung@chromium.org, kschimpf@google.com
      
      Review URL: https://codereview.chromium.org/1113683002
  18. 28 Apr, 2015 1 commit
    • Subzero: Fix asm (non-ELF) output files. · 620ad732
      Jim Stichnoth authored
      In an earlier version of Subzero, the text output stream object was
      stack-allocated within main.  A later refactoring moved its allocation
      into a helper function, but it was still being stack-allocated, which
      was bad when the helper function returned.
      
      This change allocates the object via "new", which fixes that problem,
      but reveals another problem: the raw_ostream object for some reason
      doesn't finish writing everything to disk, yielding a truncated
      output file.  This is solved in the style of the ELF streamer, by
      using raw_fd_ostream instead.
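The lifetime bug and its fix can be illustrated with standard streams. This is an analogy only, since Subzero uses LLVM's raw_ostream classes, and `makeStream`/`readFile` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <memory>
#include <sstream>
#include <string>

// The buggy shape: a stream local to a helper, escaping by pointer.
//
//   std::ostream *makeStreamBuggy(const std::string &Path) {
//     std::ofstream Os(Path);
//     return &Os; // dangling once the helper returns
//   }
//
// The fix: heap-allocate and hand ownership to the caller. A file-backed
// stream writes its buffered output when flushed or destroyed.
std::unique_ptr<std::ostream> makeStream(const std::string &Path) {
  return std::make_unique<std::ofstream>(Path);
}

std::string readFile(const std::string &Path) {
  std::ifstream In(Path);
  std::stringstream Buf;
  Buf << In.rdbuf();
  return Buf.str();
}
```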
      
      BUG= none
      R=kschimpf@google.com
      
      Review URL: https://codereview.chromium.org/1111603003
  19. 22 Apr, 2015 2 commits
  20. 21 Apr, 2015 2 commits
    • Subzero: Improve "make check-unit" execution. · e7e9b024
      Jim Stichnoth authored
      If you switch between "cmake" and "autoconf" toolchain builds, and
      neglect to clean out pnacl_newlib_raw/ in between, the wrong libgtest
      and libgtest_main may get pulled in for the autoconf build, leading to
      an assertion failure in "make check-unit".
      
      This tweak fixes that problem by rejiggering the lib search path.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1099093005
    • Subzero: Auto-detect cmake versus autoconf LLVM build. · 0a9e1261
      Jim Stichnoth authored
      The CMAKE=1 option is no longer needed.
      
      Pretty much all the tools we need are now in pnacl_newlib_raw/bin, so use PNACL_BIN_PATH set to that instead of using LLVM_BIN_PATH and BINUTILS_BIN_PATH.
      
      However, for the autoconf build, libgtest and libgtest_main and clang-format are only under the llvm_x86_64_linux_work directory, so they need special casing.  This also means that you have to actually do an LLVM build and not blow away the work directory in order to "make check-unit" or "make format".
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1085733002
  21. 16 Apr, 2015 5 commits
  22. 10 Apr, 2015 1 commit