- 22 Jun, 2015 1 commit
-
-
Qining Lu authored
GOAL:
The goal is to remove the ability of an attacker to control immediates emitted into the text section.

OPTION:
The option -randomize-pool-immediates is set to none by default (-randomize-pool-immediates=none). To turn on constant blinding, set -randomize-pool-immediates=randomize; to turn on constant pooling, use -randomize-pool-immediates=pool.

Not all constant integers in the input pexe file will be randomized or pooled. The signed representation of a candidate constant integer must be between -randomizeOrPoolImmediatesThreshold/2 and +randomizeOrPoolImmediatesThreshold/2. This threshold value can be set with the command line option "-randomize-pool-threshold". By default this threshold is set to 0xffff.

The constants introduced by instruction lowering (e.g. constants in shifting, masking) and argument lowering are not blinded in this way. The mask used for sandboxing is not affected either.

APPROACH:
We use GAS syntax in these examples.

Constant blinding for immediates:
  Original:
    add 0x1234, eax
  After:
    mov 0x1234+cookie, temp_reg
    lea -cookie(temp_reg), temp_reg
    add temp_reg, eax

Constant blinding for memory addressing offsets:
  Original:
    mov 0x1234(eax, esi, 1), ebx
  After:
    lea 0x1234+cookie(eax), temp_reg
    mov -cookie(temp_reg, esi, 1), ebx

We use "lea" here because it does not affect the flags register, so it is safer for transforming instructions that involve immediates.

Constant pooling for immediates:
  Original:
    add 0x1234, eax
  After:
    mov [memory label of 0x1234], temp_reg
    add temp_reg, eax

Constant pooling for addressing offsets:
  Original:
    mov 0x1234, eax
  After:
    mov [memory label of 0x1234], temp_reg
    mov temp_reg, eax

Note that in both cases temp_reg may be assigned to "eax", depending on the liveness analysis, so this approach may not require an extra register.

IMPLEMENTATION:
Processing:
  TargetX8632::randomizeOrPoolImmediate(Constant *Immediate, int32_t RegNum);
  TargetX8632::randomizeOrPoolImmediate(OperandX8632Mem *Memoperand, int32_t RegNum);
Checking eligibility:
  ConstantInteger32::shouldBeRandomizedOrPooled(const GlobalContext *Ctx);

ISSUES:
1. bool Ice::TargetX8632::RandomizationPoolingPaused is used to guard some translation phases, to disable constant blinding/pooling temporarily. The helper class BoolFlagSaver is added to latch the value of RandomizationPoolingPaused. Known phases that need to be guarded are doLoadOpt() and advancedPhiLowering(). However, during advancedPhiLowering(), if the destination variable has a physical register allocated, constant blinding and pooling are allowed. Stopping blinding/pooling for doLoadOpt() won't hurt our randomization or pooling, as the optimized addressing operands will be processed again in the genCode() phase.
2. i8 and i16 constants are now collected in separate constant pools, instead of sharing one constant pool with i32 constants. This requires emitting two more pools during constants lowering, and hence creates two more read-only data sections in the resulting ELF and ASM. No runtime issues have been observed so far.

BUG=
R=stichnot@chromium.org
Review URL: https://codereview.chromium.org/1185703004.
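The arithmetic behind the blinding sequence above can be modeled in a few lines. This is only an illustrative sketch, not the Subzero implementation: the names BlindedImmediate, blind and unblind are hypothetical, and the cookie is passed in explicitly rather than generated randomly.

```cpp
#include <cstdint>

// Hypothetical model of the two-step blinding sequence:
//   mov 0x1234+cookie, temp_reg      -> only Emitted appears in the text section
//   lea -cookie(temp_reg), temp_reg  -> recovers the original immediate
struct BlindedImmediate {
  uint32_t Emitted; // Immediate + Cookie, wrapping modulo 2^32
  uint32_t Cookie;  // random value chosen by the translator (assumed input here)
};

// Adds the cookie so the attacker-chosen constant never appears verbatim.
inline BlindedImmediate blind(uint32_t Immediate, uint32_t Cookie) {
  return {Immediate + Cookie, Cookie};
}

// Models the "lea" step: subtracting the cookie restores the immediate.
// Unsigned wraparound makes this exact for every Immediate/Cookie pair.
inline uint32_t unblind(const BlindedImmediate &B) {
  return B.Emitted - B.Cookie;
}
```

Because unsigned addition and subtraction wrap, the round trip is exact for any cookie, which is why no range check on the cookie is needed in this model.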
-
- 18 Jun, 2015 4 commits
-
-
Jan Voung authored
Actually assign arguments to r0-r3 at the call site. Previously this was left unhandled. There was only logic for pulling formal parameters out of r0-r3. Refactor the GPR counter and move it into a class so that the rounding up for i64 arguments is in one place for callsites and for pulling out of parameters. We might be able to use a similar pattern to count the FP/SIMD registers later. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1187513006.
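A rough sketch of such a counter is shown below. It is not the class from this CL: GPRArgCounter, allocI32 and allocI64 are hypothetical names, and the even-register rounding for i64 is modeled on my reading of the ARM procedure call standard (an i64 occupies r0:r1 or r2:r3).

```cpp
// Hypothetical GPR argument counter for r0-r3. The same object could be
// used at call sites and when pulling formal parameters out of registers,
// so the i64 rounding rule lives in one place.
class GPRArgCounter {
public:
  static constexpr int NumArgGPRs = 4; // r0-r3
  static constexpr int NoReg = -1;     // argument must go on the stack

  // Returns the register number for an i32 argument, or NoReg.
  int allocI32() {
    if (Used >= NumArgGPRs)
      return NoReg;
    return Used++;
  }

  // Returns the low register of an even-aligned pair for an i64
  // argument, or NoReg if no pair is left.
  int allocI64() {
    Used = (Used + 1) & ~1; // round up to an even register number
    if (Used + 2 > NumArgGPRs) {
      Used = NumArgGPRs; // later arguments also go on the stack
      return NoReg;
    }
    int Lo = Used;
    Used += 2;
    return Lo;
  }

private:
  int Used = 0;
};
```

For example, for a signature (i32, i64, i32) this model assigns r0, then r2:r3 (skipping r1), and sends the last argument to the stack.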
-
Jim Stichnoth authored
Specifically: sub, and, or, xor; for all integer types. Turns out that RMW is not possible for fadd/fsub/fmul/fdiv as well as operations on vector types, because the corresponding x86 instructions require the result to be in a physical register. Refactors the assembler's implementations of add/or/adc/sbb/and/sub/xor/cmp to avoid repetition. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1186713010
-
Jim Stichnoth authored
BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1195553002
-
Jim Stichnoth authored
Search for sequences of Load/Arith/Store instructions that can be transformed into single non-atomic Read-Modify-Write instructions. Corresponding operands must match up, and it is limited to the operator/type combinations that have simple lowerings. For suitable sequences, an RMW pseudo-instruction is added. Extra variables are attached to the RMW instruction and the original Store instruction, to make it easy to figure out whether to retain the original Store instruction or the new RMW instruction (but never both). The RMW instructions are similar to their non-RMW counterparts, except that the RMW instruction has no Dest variable - the Src[0] operand doubles as the memory-operand dest. The x86-32 integrated assembler has some new forms of existing instructions added. Note: this CL puts the machinery in place to identify, lower, and emit RMW operations only for the "add" instruction operating on i32/i16/i8 operands. The next CL will fill in the rest of the options. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1182603004
-
- 17 Jun, 2015 2 commits
-
-
John Porto authored
Creates a single TargetDataLowering. BUG= None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1179313004.
-
Jan Voung authored
We can't run O2 yet because some of the advanced Phi lowering hooks aren't implemented for O2 yet. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1160873006.
-
- 16 Jun, 2015 1 commit
-
-
Jan Voung authored
That way, we don't have to use -mattr=sse2 for ARM in cross tests, etc. Default to NEON for now. Also put in an entry for HW divide in ARM mode. There are many other possible features, though; see e.g. https://github.com/llvm-mirror/llvm/blob/master/lib/Target/ARM/ARM.td BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1191573003.
-
- 15 Jun, 2015 2 commits
-
-
Jan Voung authored
Emitting the global initializers is mostly the same across each architecture (same filling, alignment, etc.). The only difference is in assembler-directive quirks. E.g., on ARM for ".align N" N is the exponent for a power of 2, while on x86 N is the actual number of bytes. To avoid target-specific directives, use .p2align which is always a power of 2. Similarly, use % instead of @. Either one may be a comment character for *some* architecture, but for the architectures we care about % is not a comment character while @ is sometimes (ARM). Usually MIPS uses ".space N" for ".zero", but the assembler seems to accept ".zero" so don't change that for now. May need to adjust .long in the future too. .word for AArch64 and .4byte for MIPS? Potentially we can refactor the lowerGlobals() dispatcher (ELF vs ASM vs IASM). The only thing target-specific about that is *probably* just the relocation type. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1188603002.
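To illustrate the .p2align choice, here is a minimal sketch that turns a byte alignment into a target-independent directive. The helper names log2OfPowerOf2 and alignDirective are hypothetical and are not taken from the CL.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Returns N such that AlignBytes == 2^N. The alignment is assumed to be
// a nonzero power of two.
inline unsigned log2OfPowerOf2(uint32_t AlignBytes) {
  assert(AlignBytes != 0 && (AlignBytes & (AlignBytes - 1)) == 0);
  unsigned Exp = 0;
  while ((AlignBytes >>= 1) != 0)
    ++Exp;
  return Exp;
}

// Builds a ".p2align N" directive. Unlike ".align N", whose operand is a
// byte count on x86 but an exponent on ARM, the ".p2align" operand is
// always the power-of-two exponent.
inline std::string alignDirective(uint32_t AlignBytes) {
  return "\t.p2align\t" + std::to_string(log2OfPowerOf2(AlignBytes));
}
```

With this helper, a 16-byte alignment is written as ".p2align 4" regardless of the target.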
-
John Porto authored
Removes const qualifier for TargetDataLowering::lowerGlobals() and TargetDataLowering::lowerConstants() BUG= None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1177873003.
-
- 12 Jun, 2015 3 commits
-
-
Jan Voung authored
Use PNaCl built binutils, which is known to support ARM and MIPS. Otherwise the system-provided binutils may or may not have that support (mine did not and perhaps expected a prefix like arm-xxx-objcopy for the version that did support arm). Split off from CL to run crosstests for ARM under qemu. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=jpp@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/1185703006.
-
Jim Stichnoth authored
These all appear to some degree in spec2k. This is implemented for i8/i16/i32 types. It is done as part of core lowering, so in theory all optimization levels could benefit, but it is explicitly disabled for Om1/O0 to keep things simple there. While clang appears to strength-reduce udiv/urem by a constant power of 2, for some reason it does not always strength-reduce multiplies (given that they appear in the spec2k bitcode). For multiplies by 3, 5, or 9, we can make use of the lea instruction. We can do combinations of shift and lea to multiply by other constants, e.g. 100=5*5*4. If too many operations would be required, just give up and use the mul instruction. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095 R=jpp@chromium.org, jvoung@chromium.org Review URL: https://codereview.chromium.org/1146803002
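The decomposition described above (e.g. 100=5*5*4) can be sketched as follows. This is an assumption-laden model rather than the code from this CL: MulPlan, planMultiply and applyPlan are hypothetical names, and the limit of three operations is an illustrative default.

```cpp
#include <cstdint>

// Hypothetical plan: multiply by 9, 5 or 3 with one lea each, and by a
// power of two with a single shift.
struct MulPlan {
  bool Ok = false; // false means "give up and use the mul instruction"
  unsigned Count9 = 0, Count5 = 0, Count3 = 0;
  unsigned Shift = 0;
};

// Factors Multiplier into 9s, 5s, 3s and a power of two. Fails if another
// factor remains or if more than MaxOps instructions would be needed.
inline MulPlan planMultiply(uint32_t Multiplier, unsigned MaxOps = 3) {
  MulPlan Plan;
  if (Multiplier == 0)
    return Plan;
  uint32_t Rest = Multiplier;
  while (Rest % 9 == 0) { Rest /= 9; ++Plan.Count9; }
  while (Rest % 5 == 0) { Rest /= 5; ++Plan.Count5; }
  while (Rest % 3 == 0) { Rest /= 3; ++Plan.Count3; }
  while (Rest % 2 == 0) { Rest /= 2; ++Plan.Shift; }
  unsigned Ops = Plan.Count9 + Plan.Count5 + Plan.Count3 + (Plan.Shift ? 1 : 0);
  Plan.Ok = (Rest == 1) && Ops <= MaxOps;
  return Plan;
}

// Applies the plan the way the lea/shl sequence would, modulo 2^32.
inline uint32_t applyPlan(const MulPlan &Plan, uint32_t X) {
  for (unsigned I = 0; I < Plan.Count9; ++I) X = X + (X << 3); // lea (x,x,8)
  for (unsigned I = 0; I < Plan.Count5; ++I) X = X + (X << 2); // lea (x,x,4)
  for (unsigned I = 0; I < Plan.Count3; ++I) X = X + (X << 1); // lea (x,x,2)
  return X << Plan.Shift;                                      // shl
}
```

In this model, 100 becomes two lea-by-5 steps followed by a shift by 2, while a multiplier such as 7 has no plan and falls back to mul.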
-
Jim Stichnoth authored
BUG= none R=jpp@chromium.org Review URL: https://codereview.chromium.org/1182673003
-
- 11 Jun, 2015 5 commits
-
-
Jan Voung authored
The ARM linker will check that .o files declare compatible build attributes (e.g., all claim hard-float calling convention, all claim VFP-vX, etc.). Thus, in order to set up cross tests that link LLC generated code against Subzero generated code, we need the build attributes to be compatible. Pick ARMv7, hard-float calling convention, and neon, etc. which we use for PNaCl LLVM. Will probably have to reorganize to keep in sync once the ELF writer also emits this. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=kschimpf@google.com, stichnot@chromium.org Review URL: https://codereview.chromium.org/1171563002.
-
John Porto authored
Adjusts the expected unittest output. BUG= None R=kschimpf@google.com, stichnot@chromium.org Review URL: https://codereview.chromium.org/1173353003.
-
Jim Stichnoth authored
BUG= https://code.google.com/p/nativeclient/issues/detail?id=4167 Move issue https://codereview.chromium.org/1159823004/ here so that it's under the proper email. Review URL: https://codereview.chromium.org/1169533003
-
Jim Stichnoth authored
1. The data symbol __Sz_block_profile_info should never be mangled (for cross tests), similar to runtime helper calls. Add a SuppressMangling override for such variable declarations.
2. When cross tests contain more than one translated object file, we end up with multiple definitions of __Sz_block_profile_info. Work around this by making that symbol weak.
3. Don't try to attach global inits to an EmitterWorkItem that represents a translation error.
4. Update one lit test to reflect the additional profiling value in the data section.
5. Update one lit test to reflect that global initializers are emitted at the end instead of the beginning.
The check-unit test is still broken and will be fixed in a separate CL. BUG= none R=kschimpf@google.com Review URL: https://codereview.chromium.org/1180883002
-
John Porto authored
Initializes IceAssembler::Allocator before IceAssembler::Buffer. BUG= None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1177843006.
-
- 10 Jun, 2015 2 commits
-
-
John Porto authored
Renames the assembler* files to IceAssembler*. Fixes whatever breaks. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4077 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1179563004.
-
John Porto authored
BUG= None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1147023007.
-
- 08 Jun, 2015 1 commit
-
-
Karl Schimpf authored
Simplify the munging unit tests to follow the new NaCl utilities for munging tests. Note that this CL takes advantage of changes added by CL https://codereview.chromium.org/1140153004 BUG=None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1149423011
-
- 05 Jun, 2015 4 commits
-
-
John Porto authored
Merge branch 'master' of https://chromium.googlesource.com/native_client/pnacl-subzero into subzero-ownership
-
John Porto authored
BUG= None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1149213006
-
John Porto authored
-
Jan Voung authored
Sext, etc. usually use shifts (especially for i1 and i64), so implement shift, then implement those casts. Implement just enough of bitcast to handle accessing global addresses (used by some tests). Otherwise, most other bitcasts are from GPR to FP, and FP regs aren't modeled yet. Generally following the GCC style for 64-bit shifts. This takes advantage of the flexible second operand in an "orr", and of the fact that shifts beyond the bitwidth saturate. LLVM is almost the same, but only seems to take advantage of this on one side of the 32-bit boundary, not the other. Should really get some of the execution tests running to test this behavior! Fix InstARM32Str::dump(). Str doesn't have a Dest, so use Src. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1143323013
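The 64-bit left shift can be modeled with 32-bit halves as below. This is a sketch of the general technique as I understand it, not the lowering from this CL: armLsl, armLsr and shl64 are hypothetical helpers, and they assume that an ARM register-specified shift uses the low byte of the amount and yields 0 for LSL/LSR amounts of 32 or more.

```cpp
#include <cstdint>

// Models ARM "lsl Rd, Rn, Rs": only the low byte of the amount is used,
// and amounts of 32 or more produce 0 (assumed saturation behavior).
inline uint32_t armLsl(uint32_t V, uint32_t Amt) {
  Amt &= 0xff;
  return Amt >= 32 ? 0 : V << Amt;
}

// Same model for a logical shift right.
inline uint32_t armLsr(uint32_t V, uint32_t Amt) {
  Amt &= 0xff;
  return Amt >= 32 ? 0 : V >> Amt;
}

// Branch-free 64-bit shift left for Amt in [0, 63], built from 32-bit ops.
inline uint64_t shl64(uint32_t Lo, uint32_t Hi, uint32_t Amt) {
  uint32_t T = 32 - Amt;            // rsb t, amt, #32
  uint32_t T2 = Amt - 32;           // sub t2, amt, #32
  uint32_t NewHi = armLsl(Hi, Amt); // lsl hi, hi, amt
  NewHi |= armLsl(Lo, T2);          // orr hi, hi, lo, lsl t2 (live when amt >= 32)
  NewHi |= armLsr(Lo, T);           // orr hi, hi, lo, lsr t  (live when amt <= 32)
  uint32_t NewLo = armLsl(Lo, Amt); // lsl lo, lo, amt
  return (static_cast<uint64_t>(NewHi) << 32) | NewLo;
}
```

For any given amount, the out-of-range shift terms collapse to 0, so no compare or branch on the shift amount is needed.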
-
- 04 Jun, 2015 2 commits
-
-
Jim Stichnoth authored
Previously, the legalize() function would always force a floating point constant into an xmm register before it could be used in an instruction. This uses an extra register unnecessarily when the instruction allows a memory operand for that operand. We improve this by lowering the FP constant operand to an OperandX8632Mem that wraps a ConstantRelocatable representing the label for the constant pool entry, e.g. [.L$float$0]. (This may end up being copied into an xmm register if the instruction doesn't allow a memory operand for that operand.) BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1163943005
-
Jan Voung authored
The input object may be a QueueStreamer, which the compile server will still have a reference to (even though downstream the memory object API and parser API think they have a unique_ptr). Terminate the thread quickly on error, instead of freeing and causing a use-after-free. Also set up a report_fatal_error handler which has access to the server's state. This allows the server to record the error and stop pushing bytes to the QueueStreamer. Otherwise the QueueStreamer can get full without a consumer still active to unblock. Unfortunately the fatal error handler only terminates the current thread, and not all worker threads. NaCl doesn't have support for signals or pthread_kill (e.g., pthread_kill(std_thread.native_handle(), SIGABRT)). So, other worker/emitter threads will have to hang waiting on more input or something. Random clang-format edits from 3.7. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4163 TEST= tbd: I manually ran the translator on a dummy text file (invalid bitcode header), and observed that this no longer crashes. Instead the SRPC calls finish and I see:
3> [17812,4147750656:14:23:02.025382] Streaming file at 100000 bps
[17812,4147750656:14:23:12.511574] RPC call failed: Rpc application returned an error.
[17812,4147750656:14:23:12.511625] StreamChunk failed
[17812,4147750656:14:23:12.511655] stream_file: SendDataChunk failed, but returning without failing. Expect call to StreamEnd.
4> rpc call initiated StreamEnd::isss
[17812,4147750656:14:23:12.511931] RPC call failed: Rpc application returned an error.
rpc call complete StreamEnd::isss
output 0: i(0)
output 1: s("")
output 2: s("")
output 3: s("Invalid PNaCl bitcode header")
[17812,4147750656:14:23:12.512102] Command [rpc] failed.
R=kschimpf@google.com, stichnot@chromium.org Review URL: https://codereview.chromium.org/1168543002
-
- 03 Jun, 2015 2 commits
-
-
Jim Stichnoth authored
This is turned into a separate (O2-only) pass that looks for opportunities:
1. A Load instruction, or an AtomicLoad intrinsic that would be lowered just like a Load instruction
2. Followed immediately by an instruction with a whitelisted kind that uses the Load dest variable as one of its operands
3. Where the whitelisted instruction ends the live range of the Load dest variable.
In such cases, the original two instructions are deleted and a new instruction is added that folds the load into the whitelisted instruction. We also do some work to splice the liveness information (Inst::LiveRangesEnded and Inst::isLastUse()) into the new instruction, so that the target lowering pass might still take advantage. Currently this is used quite sparingly, but in the future we could use that along with operator commutativity to choose among different lowering sequences to reduce register pressure. The whitelisted instruction kinds are chosen based primarily on whether the main operation's native instruction can use a memory operand - e.g., arithmetic (add/sub/imul/etc), compare (cmp/ucomiss), cast (movsx/movzx/etc). Notably, call and ret are not included because arg passing is done through simple assignments which normal lowering is sufficient for. BUG= none R=jvoung@chromium.org, mtrofin@chromium.org Review URL: https://codereview.chromium.org/1169493002
-
Jim Stichnoth authored
BUG= none R=jvoung@chromium.org, kschimpf@google.com Review URL: https://codereview.chromium.org/1162903003
-
- 02 Jun, 2015 1 commit
-
-
Jan Voung authored
Thought about leaving "mov" simple and not handling memory operands, but then we'd have to duplicate some of the lowerAssign code for lowerLoad =/ BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=kschimpf@google.com, stichnot@chromium.org Review URL: https://codereview.chromium.org/1152703006
-
- 01 Jun, 2015 4 commits
-
-
Jim Stichnoth authored
1. Change Makefile.standalone from 3.6 to 3.7. 2. Update to new load instruction .ll syntax. This includes changing InstLoad::dump() to match. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1161543005
-
Jim Stichnoth authored
BUG= none R=kschimpf@google.com Review URL: https://codereview.chromium.org/1161353002
-
Jan Voung authored
Split out some of the addProlog code from x86 and reuse that for ARM. Mainly, the code that doesn't concern preserved registers or stack arguments is split out. ARM push and pop take a whole list of registers (not necessarily consecutive, but should be in ascending order). There is also "vpush" for callee-saved float/vector registers but we do not handle that yet (the register numbers for that have to be consecutive). Enable some of the int-arg.ll tests, which relied on addPrologue's finishArgumentLowering to pull from the correct argument stack slot. Test some of the frame pointer usage (push/pop) when handling a variable sized alloca. Also change the classification of LR, and PC so that they are not "CalleeSave". We don't want to push LR if it isn't overwritten by another call. It will certainly be "used" by the return however. The prologue code only checks if a CalleeSave register is used somewhere before deciding to preserve it. We could make that stricter and check if the register is also written to, but there are some additional writes that are not visible till after the push/pop are generated (e.g., copy from argument stack slot to the argument register). Instead, keep checking use only, and handle LR as a special case (IsLeafFunction). BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1159013002
-
Jim Stichnoth authored
This is similar to the way a load instruction may be folded into the next arithmetic instruction. Usually the effect is to improve a sequence like:
  mov ax, WORD PTR [mem]
  movsx eax, ax
into this:
  movsx eax, WORD PTR [mem]
without actually improving register allocation, though other kinds of casts may have different improvements. Existing tests needed to be fixed when they "inadvertently" did a cast to i32 return type and triggered the optimization when it wasn't wanted. These were fixed by inserting a "dummy" instruction between the load and the cast. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1152783006
-
- 27 May, 2015 3 commits
-
-
Jan Voung authored
So far we've been using ldr/str (32-bit) to load/store the whole stack slot, independent of the variable type. Toggle on some tests that didn't have an Om1 variant previously. Didn't toggle everything since there are still some problems with liveness from code being unimplemented. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1144923008
-
Jim Stichnoth authored
It turns out that code deleted in 9a05aea8 actually had a legitimate purpose, so it is added back, this time with more extensive comments justifying it. Also, takes the instruction's IsDestNonKillable flag into account when updating the live register usage count (along with extra comments on why that is necessary). Furthermore, removes an unnecessary assert that otherwise fails when --asm-verbose is used with --filetype=iasm or --filetype=obj. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1158113002
-
Jan Voung authored
Might have gotten replaced by some other field, but don't quite remember. Spotted while looking for ways to share the addProlog() code between targets. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1158713005
-
- 26 May, 2015 2 commits
-
-
Jim Stichnoth authored
Fixes a bug where a num-uses counter wasn't being updated because of C operator && semantics. The code was something like "if (A && --B) ..." but we want --B to happen even when A is false. Sorts the LiveIn and LiveOut lists by regnum so that the lists always display the set of registers in a consistent/familiar order. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1152813003
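The short-circuit pitfall can be reproduced in isolation. The functions below are a hypothetical reduction, not the code that was fixed: checkBuggy mirrors the "if (A && --B)" shape, and checkFixed performs the decrement unconditionally before testing.

```cpp
// Buggy shape: because && short-circuits, --B is skipped whenever A is
// false, so the counter silently stops being updated.
inline bool checkBuggy(bool A, int &B) {
  return A && --B;
}

// Fixed shape: decrement first, then combine the two conditions.
inline bool checkFixed(bool A, int &B) {
  --B;
  return A && B != 0;
}
```

Both versions return the same boolean whenever A is true; they differ only in whether the counter is updated when A is false.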
-
Jan Voung authored
Lower alloca in a way similar to x86. Subtract the stack and align if needed, then copy that stack address to dest. Sometimes use "bic" for the mask, sometimes use "and", depending on what fits better. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1156713003
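The stack adjustment can be modeled numerically as follows. This is a simplified sketch under my own assumptions (a downward-growing stack and a power-of-two alignment); lowerAllocaModel is a hypothetical name and the exact instruction order in the CL may differ.

```cpp
#include <cassert>
#include <cstdint>

// Models: sub sp, sp, #Size ; bic sp, sp, #(Align - 1) ; mov dest, sp
// Returns the address that would be copied into the alloca's dest.
inline uint32_t lowerAllocaModel(uint32_t &SP, uint32_t Size, uint32_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0); // power of two
  SP -= Size;          // reserve at least Size bytes
  SP &= ~(Align - 1);  // "bic" clears the low bits; an "and" with the
                       // inverted mask computes the same value
  return SP;
}
```

Whether "bic" with Align-1 or "and" with the inverted mask is used depends on which constant is easier to encode as an immediate, as the message says; both produce the same aligned address.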
-
- 22 May, 2015 1 commit
-
-
Jan Voung authored
Allow instructions to be predicated and use that in lower icmp and branch. Tracking the predicate for almost every instruction is a bit overkill, but technically possible. Add that to most of the instruction constructors except ret and call for now. This doesn't yet do compare + branch fusing, but it does handle the branch fallthrough to avoid branching twice. I can't yet test 8bit and 16bit, since those come from "trunc" and "trunc" is not lowered yet (or load, which also isn't handled yet). Adds basic "call(void)" lowering, just to get the call markers showing up in tests. 64bit.pnacl.ll no longer explodes with liveness consistency errors, so risk running that and backfill some of the 64bit arith tests. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1151663004
-