1. 22 Jun, 2015 1 commit
    • Add constant blinding/pooling option for X8632 code translation. · 253dc8a8
      Qining Lu authored
      GOAL:
      The goal is to remove the ability of an attacker to control immediates emitted into the text section.
      
      OPTION:
      The option -randomize-pool-immediates is set to none by default (-randomize-pool-immediates=none). To turn on constant blinding, set -randomize-pool-immediates=randomize; to turn on constant pooling, use -randomize-pool-immediates=pool.
      
      Not all constant integers in the input pexe file will be randomized or pooled. The signed representation of a candidate constant integer must be between -randomizeOrPoolImmediatesThreshold/2 and +randomizeOrPoolImmediatesThreshold/2. This threshold can be set with the command line option "-randomize-pool-threshold"; by default it is 0xffff.
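
The range test described above can be sketched as follows. This is only an illustration of the stated window, not the body of ConstantInteger32::shouldBeRandomizedOrPooled(); the helper name isWithinThresholdWindow and its signature are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Subzero's code): reports whether the signed
// representation of a 32-bit constant lies in [-Threshold/2, +Threshold/2].
bool isWithinThresholdWindow(int32_t Value, uint32_t Threshold) {
  // Widen to 64 bits so that a large threshold cannot overflow the compare.
  const int64_t Half = static_cast<int64_t>(Threshold / 2);
  const int64_t Signed = static_cast<int64_t>(Value);
  return Signed >= -Half && Signed <= Half;
}
```

With the default threshold of 0xffff the window is [-0x7fff, +0x7fff], so 0x1234 falls inside it while 0x8000 does not.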
      
      The constants introduced by instruction lowering (e.g. constants in shifting, masking) and argument lowering are not blinded in this way. The mask used for sandboxing is not affected either.
      
      APPROACH:
      We use GAS syntax in these examples.
      
      Constant blinding for immediates:
      Original:
          add 0x1234, eax
      After:
          mov 0x1234+cookie, temp_reg
          lea -cookie(temp_reg), temp_reg
          add temp_reg, eax
      
      Constant blinding for memory addressing offsets:
      Original:
        mov 0x1234(eax, esi, 1), ebx
      After:
        lea 0x1234+cookie(eax), temp_reg
        mov -cookie(temp_reg, esi, 1), ebx
      
      We use "lea" here because it does not affect the flags register, which makes it safer for transforming instructions that involve immediates.
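
The arithmetic behind blinding can be checked with a small model. The sketch below assumes 32-bit wraparound arithmetic and uses made-up names (BlindedImmediate, blind, unblind); it models the mov/lea pair above and is not the code in TargetX8632::randomizeOrPoolImmediate().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of constant blinding; all names are illustrative.
struct BlindedImmediate {
  uint32_t Emitted; // value that actually appears in the instruction stream
  uint32_t Cookie;  // random value chosen by the translator
};

BlindedImmediate blind(uint32_t Value, uint32_t Cookie) {
  // Unsigned addition wraps modulo 2^32, like 32-bit register arithmetic.
  return {Value + Cookie, Cookie};
}

uint32_t unblind(const BlindedImmediate &B) {
  // Models "lea -cookie(temp_reg), temp_reg": subtract the cookie again.
  return B.Emitted - B.Cookie;
}
```

Because both steps wrap modulo 2^32, the original value is recovered for any cookie, while the bytes emitted into the text section depend on the cookie rather than on the attacker-chosen constant alone.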
      
      Constant pooling for immediates:
      Original:
          add 0x1234, eax
      After:
          mov [memory label of 0x1234], temp_reg
          add temp_reg, eax
      
      Constant pooling for addressing offsets:
      Original:
        mov 0x1234, eax
      After:
        mov [memory label of 0x1234], temp_reg
        mov temp_reg, eax
      
      Note that in both cases temp_reg may be assigned "eax", depending on the
      liveness analysis, so this approach may not require an extra register.
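
As a rough illustration of pooling, the sketch below maps each distinct constant to one memory label so that instructions can reference the label instead of carrying the immediate. The ConstantPool class, its method names, and the label format are assumptions for this example and do not reflect Subzero's pool implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical pool: every distinct value gets exactly one label.
class ConstantPool {
public:
  explicit ConstantPool(std::string Prefix) : Prefix(std::move(Prefix)) {}

  // Returns the label for Value, creating a new entry on first use.
  const std::string &labelFor(uint32_t Value) {
    auto It = Labels.find(Value);
    if (It == Labels.end()) {
      std::string Label = Prefix + std::to_string(Labels.size());
      It = Labels.emplace(Value, std::move(Label)).first;
    }
    return It->second;
  }

  std::size_t size() const { return Labels.size(); }

private:
  std::string Prefix;
  std::map<uint32_t, std::string> Labels;
};
```

Repeated requests for the same constant return the same label, so the pool grows only with the number of distinct values.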
      
      IMPLEMENTATION:
        Processing:
         TargetX8632::randomizeOrPoolImmediate(Constant *Immediate, int32_t RegNum);
         TargetX8632::randomizeOrPoolImmediate(OperandX8632Mem *Memoperand, int32_t RegNum);
      
        Checking eligibility:
          ConstantInteger32::shouldBeRandomizedOrPooled(const GlobalContext *Ctx);
      
      ISSUES:
      1. bool Ice::TargetX8632::RandomizationPoolingPaused is used to guard some translation phases to disable constant blinding/pooling temporarily. Helper class BoolFlagSaver is added to latch the value of RandomizationPoolingPaused.
      
      Known phases that need to be guarded are: doLoadOpt() and advancedPhiLowering(). However, during advancedPhiLowering(), if the destination variable has a physical register allocated, constant blinding and pooling are allowed. Stopping blinding/pooling for doLoadOpt() won't hurt our randomization or pooling as the optimized addressing operands will be processed again in genCode() phase.
      
      2. i8 and i16 constants are now collected in separate constant pools instead of sharing the same constant pool with i32 constants. This requires emitting two more pools during constant lowering, and hence creates two more read-only data sections in the resulting ELF and ASM output. No runtime issues have been observed so far.
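
The flag-latching helper mentioned in issue 1 is most naturally written as a scope guard. The following is a guess at its general shape, assuming it saves the flag on construction and restores it on destruction; the real BoolFlagSaver may have a different interface. The function observedInsideGuard exists only to demonstrate the behavior.

```cpp
#include <cassert>

// Hypothetical reconstruction of a flag-saving scope guard.
class BoolFlagSaver {
public:
  BoolFlagSaver(bool &F, bool NewValue) : Flag(F), OldValue(F) {
    Flag = NewValue; // override the flag for the lifetime of this object
  }
  ~BoolFlagSaver() { Flag = OldValue; } // restore the latched value

  BoolFlagSaver(const BoolFlagSaver &) = delete;
  BoolFlagSaver &operator=(const BoolFlagSaver &) = delete;

private:
  bool &Flag;
  const bool OldValue;
};

// Demonstration only: reports the flag value seen while the guard is alive.
bool observedInsideGuard(bool &Paused) {
  BoolFlagSaver Guard(Paused, true);
  return Paused;
}
```

A phase such as doLoadOpt() could create such a guard at its start, so that the paused state is undone on every exit path without explicit cleanup.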
      
      BUG=
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1185703004.
  2. 18 Jun, 2015 4 commits
  3. 17 Jun, 2015 2 commits
  4. 16 Jun, 2015 1 commit
  5. 15 Jun, 2015 2 commits
  6. 12 Jun, 2015 3 commits
  7. 11 Jun, 2015 5 commits
  8. 10 Jun, 2015 2 commits
  9. 08 Jun, 2015 1 commit
  10. 05 Jun, 2015 4 commits
  11. 04 Jun, 2015 2 commits
    • Subzero: Legalize FP constants directly into memory operands. · 03ffa585
      Jim Stichnoth authored
      Previously, the legalize() function would always force a floating point constant into an xmm register before it could be used in an instruction.  This uses an extra register unnecessarily when the instruction allows a memory operand for that operand.
      
      We improve this by lowering the FP constant operand to an OperandX8632Mem that wraps a ConstantRelocatable representing the label for the constant pool entry, e.g. [.L$float$0].  (This may end up being copied into an xmm register if the instruction doesn't allow a memory operand for that operand.)
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1163943005
    • Use report_fatal_error before destroying input object on error. · 2f7f2b7e
      Jan Voung authored
      The input object may be a QueueStreamer, which the compile
      server will still have a reference to (even though
      downstream the memory object API and parser API thinks it
      has a unique_ptr). Terminate the thread quickly on error,
      instead of free'ing and causing a use-after-free.
      
      Also set up a report_fatal_error handler which has access
      to the server's state. This allows the server to record the
      error and stop pushing bytes to the QueueStreamer.
      Otherwise the QueueStreamer can get full without a consumer
      still active to unblock.
      
      Unfortunately the fatal error handler only terminates the
      current thread, and not all worker threads. NaCl doesn't
      have support for signals or pthread_kill (e.g.,
      pthread_kill(std_thread.native_handle(), SIGABRT)), so
      other worker/emitter threads will have to hang waiting on
      more input.
      
      Random clang-format edits from 3.7.
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4163
      TEST= tbd:
      
      I manually ran the translator on a dummy text file (invalid bitcode
      header), and observed that this no longer crashes. Instead the SRPC
      calls finish and I see:
      
      3> [17812,4147750656:14:23:02.025382] Streaming file at 100000 bps
      [17812,4147750656:14:23:12.511574] RPC call failed: Rpc application returned an error.
      [17812,4147750656:14:23:12.511625] StreamChunk failed
      [17812,4147750656:14:23:12.511655] stream_file: SendDataChunk failed, but returning without failing. Expect call to StreamEnd.
      4> rpc call initiated StreamEnd::isss
      [17812,4147750656:14:23:12.511931] RPC call failed: Rpc application returned an error.
      rpc call complete StreamEnd::isss
      output 0:  i(0)
      output 1:  s("")
      output 2:  s("")
      output 3:  s("Invalid PNaCl bitcode header")
      [17812,4147750656:14:23:12.512102] Command [rpc] failed.
      
      R=kschimpf@google.com, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1168543002
  12. 03 Jun, 2015 2 commits
    • Subzero: Improve/refactor folding loads into the next instruction. · 8e6bf6e1
      Jim Stichnoth authored
      This is turned into a separate (O2-only) pass that looks for opportunities:
      1. A Load instruction, or an AtomicLoad intrinsic that would be lowered just like a Load instruction
      2. Followed immediately by an instruction with a whitelisted kind that uses the Load dest variable as one of its operands
      3. Where the whitelisted instruction ends the live range of the Load dest variable.
      
      In such cases, the original two instructions are deleted and a new instruction is added that folds the load into the whitelisted instruction.
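
The three conditions can be illustrated on a toy instruction list. Everything in this sketch (the Kind enum, the Inst struct, foldLoads) is invented for the example and is far simpler than Subzero's IR; it also assumes the loaded value is used exactly once by the following instruction, since only one memory operand is available. The liveness splice is reduced to moving the load's ended live ranges onto the folded instruction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy IR for illustration only; these types are not Subzero's.
enum class Kind { Load, Arith, Compare, Cast, Call };

struct Inst {
  Kind K;
  std::string Dest;
  std::vector<std::string> Srcs;     // for a Load, Srcs[0] names the address
  std::vector<std::string> LastUses; // variables whose live range ends here
};

static bool isWhitelisted(Kind K) {
  return K == Kind::Arith || K == Kind::Compare || K == Kind::Cast;
}

static bool contains(const std::vector<std::string> &V, const std::string &S) {
  return std::find(V.begin(), V.end(), S) != V.end();
}

std::vector<Inst> foldLoads(const std::vector<Inst> &In) {
  std::vector<Inst> Out;
  for (std::size_t I = 0; I < In.size(); ++I) {
    const Inst &Load = In[I];
    const bool HasNext = I + 1 < In.size();
    if (Load.K == Kind::Load && !Load.Srcs.empty() && HasNext) {
      const Inst &Next = In[I + 1];
      // Assumption: fold only if the loaded value is used exactly once.
      const bool UsedOnce =
          std::count(Next.Srcs.begin(), Next.Srcs.end(), Load.Dest) == 1;
      if (isWhitelisted(Next.K) && UsedOnce &&
          contains(Next.LastUses, Load.Dest)) {
        Inst Folded = Next;
        // Replace the use of the load result with a memory operand.
        std::replace(Folded.Srcs.begin(), Folded.Srcs.end(), Load.Dest,
                     "[" + Load.Srcs[0] + "]");
        // Splice liveness: drop the dead temporary, keep the load's info.
        auto &LU = Folded.LastUses;
        LU.erase(std::remove(LU.begin(), LU.end(), Load.Dest), LU.end());
        LU.insert(LU.end(), Load.LastUses.begin(), Load.LastUses.end());
        Out.push_back(Folded);
        ++I; // the following instruction has been consumed as well
        continue;
      }
    }
    Out.push_back(Load);
  }
  return Out;
}
```

A load followed by an add that ends the load result's live range collapses into one instruction with a memory operand; if the result is still live afterwards, or the next instruction is a call, both instructions are kept.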
      
      We also do some work to splice the liveness information (Inst::LiveRangesEnded and Inst::isLastUse()) into the new instruction, so that the target lowering pass can still take advantage of it.  Currently this is used quite sparingly, but in the future we could use that along with operator commutativity to choose among different lowering sequences to reduce register pressure.
      
      The whitelisted instruction kinds are chosen based primarily on whether the main operation's native instruction can use a memory operand - e.g., arithmetic (add/sub/imul/etc), compare (cmp/ucomiss), cast (movsx/movzx/etc).  Notably, call and ret are not included because arg passing is done through simple assignments, for which normal lowering is sufficient.
      
      BUG= none
      R=jvoung@chromium.org, mtrofin@chromium.org
      
      Review URL: https://codereview.chromium.org/1169493002
    • Subzero: Change pnacl_newlib ==> pnacl_newlib_raw in scripts. · bb9d11a5
      Jim Stichnoth authored
      BUG= none
      R=jvoung@chromium.org, kschimpf@google.com
      
      Review URL: https://codereview.chromium.org/1162903003
  13. 02 Jun, 2015 1 commit
  14. 01 Jun, 2015 4 commits
    • Subzero: Changes needed for LLVM 3.7 integration. · e5b58fbe
      Jim Stichnoth authored
      1. Change Makefile.standalone from 3.6 to 3.7.
      
      2. Update to new load instruction .ll syntax.  This includes changing InstLoad::dump() to match.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1161543005
    • Subzero: Remove a compile-time warning. · 0769299d
      Jim Stichnoth authored
      BUG= none
      R=kschimpf@google.com
      
      Review URL: https://codereview.chromium.org/1161353002
    • Subzero ARM: addProlog/addEpilogue -- share some code with x86. · 0fa6c5a0
      Jan Voung authored
      Split out some of the addProlog code from x86 and
      reuse that for ARM. Mainly, the code that doesn't
      concern preserved registers or stack arguments is split out.
      
      ARM push and pop take a whole list of registers (not
      necessarily consecutive, but should be in ascending order).
      There is also "vpush" for callee-saved float/vector
      registers but we do not handle that yet (the register
      numbers for that have to be consecutive).
      
      Enable some of the int-arg.ll tests, which relied on
      addPrologue's finishArgumentLowering to pull from the
      correct argument stack slot.
      
      Test some of the frame pointer usage (push/pop) when
      handling a variable sized alloca.
      
      Also change the classification of LR and PC so that
      they are not "CalleeSave". We don't want to push LR
      if it isn't overwritten by another call. It will certainly be
      "used" by the return however. The prologue code only checks
      if a CalleeSave register is used somewhere before deciding
      to preserve it. We could make that stricter and check if
      the register is also written to, but there are some
      additional writes that are not visible till after the
      push/pop are generated (e.g., copy from argument stack slot
      to the argument register). Instead, keep checking use
      only, and handle LR as a special case (IsLeafFunction).
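
One way to picture the resulting decision is as a bit mask, since ARM push/pop encode their register list as one bit per register (which is also why the list is implicitly in ascending order). The sketch below is an assumption-laden illustration: the names regBit, CalleeSaveMask and computePushMask are invented, r4-r11 are assumed to be the callee-saved set, and LR is assumed to be pushed exactly when the function is not a leaf.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register numbering following the usual ARM convention.
constexpr unsigned RegLR = 14;

constexpr uint16_t regBit(unsigned RegNum) {
  return static_cast<uint16_t>(1u << RegNum);
}

// Assumption: r4-r11 are callee-saved; LR and PC are deliberately excluded.
constexpr uint16_t CalleeSaveMask = 0x0FF0;

// Returns the register-list mask for the prologue push.
uint16_t computePushMask(uint16_t UsedRegs, bool IsLeafFunction) {
  // Preserve only callee-saved registers that are used somewhere.
  uint16_t Mask = static_cast<uint16_t>(UsedRegs & CalleeSaveMask);
  // LR is the special case: a use by the return alone does not force a push.
  if (!IsLeafFunction)
    Mask = static_cast<uint16_t>(Mask | regBit(RegLR));
  return Mask;
}
```

In a leaf function that uses r0, r4 and r5, only r4 and r5 end up in the list; the same function with a call in it would also push LR.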
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1159013002
    • Subzero: Fold the load instruction into the next cast instruction. · c77f817f
      Jim Stichnoth authored
      This is similar to the way a load instruction may be folded into the next arithmetic instruction.
      
      Usually the effect is to improve a sequence like:
        mov ax, WORD PTR [mem]
        movsx eax, ax
      into this:
        movsx eax, WORD PTR [mem]
      without actually improving register allocation, though other kinds of casts may have different improvements.
      
      Existing tests needed to be fixed when they "inadvertently" did a cast to i32 return type and triggered the optimization when it wasn't wanted.  These were fixed by inserting a "dummy" instruction between the load and the cast.
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/1152783006
  15. 27 May, 2015 3 commits
  16. 26 May, 2015 2 commits
  17. 22 May, 2015 1 commit
    • Subzero ARM: do lowerIcmp, lowerBr, and a bit of lowerCall. · 3bfd99a3
      Jan Voung authored
      Allow instructions to be predicated and use that in lower icmp
      and branch. Tracking the predicate for almost every instruction
      is a bit overkill, but technically possible. Add that to most of
      the instruction constructors except ret and call for now.
      
      This doesn't yet do compare + branch fusing, but it does handle
      the branch fallthrough to avoid branching twice.
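
The fallthrough handling can be sketched as follows. This is a generic version of the technique under assumed names (Cond, lowerCondBr) and textual output; whether the actual lowering inverts the condition when the true target is the fallthrough block is an assumption of this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative subset of ARM condition codes.
enum class Cond { EQ, NE, LT, GE };

static Cond invert(Cond C) {
  switch (C) {
  case Cond::EQ: return Cond::NE;
  case Cond::NE: return Cond::EQ;
  case Cond::LT: return Cond::GE;
  case Cond::GE: return Cond::LT;
  }
  return C; // not reached for valid enumerators
}

static std::string suffix(Cond C) {
  switch (C) {
  case Cond::EQ: return "eq";
  case Cond::NE: return "ne";
  case Cond::LT: return "lt";
  case Cond::GE: return "ge";
  }
  return "";
}

// Emits the branches for "if (C) goto TrueBB else goto FalseBB", given
// that NextBB is the block laid out immediately after the current one.
std::vector<std::string> lowerCondBr(Cond C, const std::string &TrueBB,
                                     const std::string &FalseBB,
                                     const std::string &NextBB) {
  if (FalseBB == NextBB) // false edge falls through
    return {"b" + suffix(C) + " " + TrueBB};
  if (TrueBB == NextBB) // true edge falls through: branch on the inverse
    return {"b" + suffix(invert(C)) + " " + FalseBB};
  // Neither target follows directly, so two branches are needed.
  return {"b" + suffix(C) + " " + TrueBB, "b " + FalseBB};
}
```

Only the last case emits two branches; whenever one of the targets is the next block in layout order, a single predicated branch is enough.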
      
      I can't yet test 8bit and 16bit, since those come from "trunc"
      and "trunc" is not lowered yet (or load, which also isn't
      handled yet).
      
      Adds basic "call(void)" lowering, just to get the call markers
      showing up in tests.
      
      64bit.pnacl.ll no longer explodes with liveness consistency errors,
      so risk running that and backfill some of the 64bit arith tests.
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/1151663004