Commits · 36087cd4e098def6c9e7dfc9072edcd34888fff6 · Chen Yisong / swiftshader

24 Jun, 2015 3 commits

authored Jun 24, 2015

It turns out that using "using TargetLowering::<member>" declarations causes problems when compiling with g++. The problem was fixed by using
Machine:: instead, where Machine is the template parameter. With a dependent name, g++ does the right thing.
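
A minimal sketch of the Machine:: qualification described above. All class and member names here are hypothetical stand-ins, not the actual Subzero classes; the only assumption carried over from the message is that the base class is reached through the template parameter.

```cpp
// Hypothetical stand-ins for illustration; these are not the Subzero classes.
struct LoweringSketch {
  int stackAlignment() const { return 16; }
};

struct MachineSketch : LoweringSketch {};

// The base class is the template parameter, so its members are dependent names.
template <class Machine> struct TargetBaseSketch : Machine {
  int alignedSize(int Bytes) const {
    // Qualifying with Machine:: defers the lookup until instantiation.
    const int Align = Machine::stackAlignment();
    return (Bytes + Align - 1) / Align * Align;
  }
};
```

Because the qualifier names the template parameter, the compiler cannot resolve stackAlignment() until TargetBaseSketch is instantiated with a concrete Machine.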

BUG= None
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1208663002.

36087cd4

Subzero: Reduce the amount of #ifdef'd code. · 20b71f58

authored Jun 24, 2015

Try to make most #ifdef'd code be compiled under all configurations,
to catch code rot earlier.  When #ifdef code is required, try to use
it only to guard trivial code like "return;".
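
One way to follow this guideline is sketched below. The macro and namespace names are hypothetical and are not claimed to match the Subzero build flags; the point is only that the function body is compiled in every configuration and the build flag guards a trivial early return.

```cpp
#include <string>

// Hypothetical build flag; a real build would set this from the makefile.
#ifndef ALLOW_DUMP_SKETCH
#define ALLOW_DUMP_SKETCH 1
#endif

namespace BuildDefsSketch {
// The preprocessor is confined to this one trivial accessor.
constexpr bool dump() { return ALLOW_DUMP_SKETCH != 0; }
} // namespace BuildDefsSketch

std::string describeSketch(int Value) {
  // The code below is always compiled, so it cannot silently rot.
  if (!BuildDefsSketch::dump())
    return std::string();
  return "value=" + std::to_string(Value);
}
```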

BUG= none
R=jpp@chromium.org

Review URL: https://codereview.chromium.org/1197863003

20b71f58

Remove unnecessary TEXTBC_LIBS makefile definition. · 28f3f731

authored Jun 24, 2015

Jan correctly pointed out that this makefile definition was redundant
since -lLLVMNaClBitTestUtils was already defined in LLVM_LIBS_LIST.
Removing the definition and simplifying makefile.

BUG=None
R=jpp@chromium.org

Review URL: https://codereview.chromium.org/1211593003.

28f3f731

23 Jun, 2015 4 commits

Subzero. Adds x86-64 to the list of supported Subzero targets. · d58f01ca

authored Jun 23, 2015

Related changes:
NaCl change: https://codereview.chromium.org/1201483005
LLVM change: https://codereview.chromium.org/1193843016

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4077
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1199043006.

d58f01ca

Subzero: Make life a little easier for emacs users. · 8fa8b437

authored Jun 23, 2015

Emacs will try to execute .dir-locals.el whenever loading a file under
the subzero directory.  It sets local variables depending on the mode.

Set the fill-column to 80 for c++-mode, c-mode, and python-mode.  The
main use is when using M-q to reformat multi-line comments.

Disable tabs (use spaces instead) in c++-mode, c-mode, and
python-mode.

Set the tab-width to 2 spaces in python-mode.  (The tab-width doesn't
really matter in c++-mode or c-mode thanks to clang-format.)

BUG= none
R=kschimpf@google.com

Review URL: https://codereview.chromium.org/1199133005

8fa8b437

Fix handling of TYPE_CODE_NUMENTRY record when size large. · 74cd883a

authored Jun 23, 2015

Fixes how (very) large size entries in the TYPE_CODE_NUMENTRY record are
handled when reading bitcode. Makes sure that we don't call
vector.resize() with too large a value (replacing an allocation
exception with a parse error).
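
The guard can be sketched roughly as follows. The function name, the MaxEntries limit, and the error text are assumptions made for illustration; the message does not say how the actual limit is chosen.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: reports a parse error instead of letting
// vector::resize() throw an allocation exception on an absurd size.
bool resizeTypeTableSketch(std::vector<int> &Types, uint64_t NumEntries,
                           uint64_t MaxEntries, std::string &Error) {
  if (NumEntries > MaxEntries) {
    Error = "NUMENTRY size too large: " + std::to_string(NumEntries);
    return false; // the table is left untouched
  }
  Types.resize(static_cast<std::vector<int>::size_type>(NumEntries));
  return true;
}
```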

Also tries to clean up type modeling of bitcode indices (references to
values etc in the bitcode). Uses common type NaClBcIndexSize_t and
NaClRelBcIndexSize_t (defined in nacl) to describe these (32-bit)
values.

Note: We use cast truncation of 64-bit values to NaClBcIndexSize_t and
NaClRelBcIndexSize_t, since negative value indices are stored both as
32 and 64 bit values. The truncation cast handles this difference
correctly (and efficiently).
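
A small sketch of why the truncation works. BcIndexSketch_t is a hypothetical 32-bit alias standing in for NaClBcIndexSize_t, whose real definition lives in nacl and is not shown here.

```cpp
#include <cstdint>

// Hypothetical 32-bit index type for illustration.
using BcIndexSketch_t = uint32_t;

// Truncating keeps the low 32 bits, so a negative index stored as a 32-bit
// value and the same index sign-extended to 64 bits map to the same result.
BcIndexSketch_t toIndexSketch(uint64_t Raw) {
  return static_cast<BcIndexSketch_t>(Raw);
}
```

For example, -1 stored as 0xFFFFFFFF and -1 stored as 0xFFFFFFFFFFFFFFFF both truncate to the same 32-bit pattern.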

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4195
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1182323011

74cd883a

Extracts a TargetX86Base target which will be used as the common X86{32,64} implementation. · 7e93c62d

authored Jun 23, 2015

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4077
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1202533003.

7e93c62d

22 Jun, 2015 8 commits

Subzero: Use more "= default;" for ctors and dtors. · e587d949

authored Jun 22, 2015

Look for "() override {}" and "() final {}" patterns.
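
As an illustration of the pattern being replaced (class names are hypothetical):

```cpp
struct BaseSketch {
  virtual ~BaseSketch() = default;
  virtual int kind() const { return 0; }
};

struct DerivedSketch final : BaseSketch {
  DerivedSketch() = default;           // was: DerivedSketch() {}
  ~DerivedSketch() override = default; // was: ~DerivedSketch() override {}
  int kind() const override { return 1; }
};
```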

Don't touch IceTargetLoweringX8632.* to spare a refactoring in
progress.

BUG= none
R=jpp@chromium.org

Review URL: https://codereview.chromium.org/1201023002

e587d949

Fix llvm makefile to handle macro INPUT_IS_TEXTUAL_BITCODE. · cac05851
Karl Schimpf authored Jun 22, 2015
```
BUG=None
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1205463002
```
cac05851

Allow pnacl-sz to be compiled to textual bitcode records. · 6f9ba115

authored Jun 22, 2015

This has been added to allow fuzzing to be applied to textual bitcode
records. When built with make option TEXTUAL_BITCODE=1, the
corresponding generated pnacl-sz will preprocess the input file
(containing the textual form of bitcode records) and generate
a corresponding data stream with the binary form.

Note that the textual form of bitcode records is not LLVM assembly
(i.e. .ll files). Rather, it is sequences of textual integers
corresponding to bitcode records.

Dependent on: https://codereview.chromium.org/1191393004

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4169
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1190413004

6f9ba115

Subzero: Fix "make -f Makefile.standalone MINIMAL=1 check". · c8799688

authored Jun 22, 2015

Some recent ARM changes turned out to break the lit tests for the MINIMAL build. Two main issues:

1. ARM tests are currently asm-only, so allow_dump needs to be required.

2. GlobalContext::emitFileHeader() needs to return gracefully instead of calling report_fatal_error(), to allow error tests to produce the right error output.

BUG= none
R=kschimpf@google.com

Review URL: https://codereview.chromium.org/1202563002

c8799688

Subzero. Fixes memory leaks. · 1bec8bcd

authored Jun 22, 2015

Adds named constructors to initializers. Removes destructor from Inst.

BUG= None
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1181013016.

1bec8bcd

Subzero: Apply commutativity to the RMW optimization. · 8525c329

authored Jun 22, 2015

The read-modify-write (RMW) optimization looks for patterns like this:

  a = Load addr
  b = <op> a, other
  Store b, addr

and essentially transforms them into this:

  RMW <op>, addr, other

This CL also applies the transformation when the middle instruction is
  b = <op> other, a
and <op> is commutative.
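
The matching rule can be sketched on a toy IR as follows. The types and the matcher are hypothetical; the sketch ignores everything a real pass must also check (operand types, liveness of "a" and "b", intervening instructions).

```cpp
#include <string>

// Toy IR for illustration only; not the Subzero instruction classes.
enum class OpSketch { Add, Sub, And, Or, Xor };

bool isCommutativeSketch(OpSketch O) { return O != OpSketch::Sub; }

struct LoadSketch { std::string Dest, Addr; };
struct ArithSketch { OpSketch Kind; std::string Dest, Src0, Src1; };
struct StoreSketch { std::string Value, Addr; };
struct RMWSketch { OpSketch Kind; std::string Addr, Other; };

// Returns true and fills Out when Load/Arith/Store forms an RMW pattern.
bool matchRMWSketch(const LoadSketch &L, const ArithSketch &A,
                    const StoreSketch &S, RMWSketch &Out) {
  if (L.Addr != S.Addr || S.Value != A.Dest)
    return false;
  std::string Other;
  if (A.Src0 == L.Dest)
    Other = A.Src1; // b = <op> a, other
  else if (A.Src1 == L.Dest && isCommutativeSketch(A.Kind))
    Other = A.Src0; // b = <op> other, a (commutative ops only)
  else
    return false;
  Out.Kind = A.Kind;
  Out.Addr = L.Addr;
  Out.Other = Other;
  return true;
}
```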

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095
R=jpp@chromium.org

Review URL: https://codereview.chromium.org/1193103005

8525c329

Subzero: Use C++11 member initializers where practical. · eafb56cb

authored Jun 22, 2015

Also change the pattern "foo() {}" into "foo() = default;" for ctors and dtors.

Generally avoids initializing unique_ptr<> members to nullptr in a .h file, because that requires knowing the definition of the underlying class which may not be available to all includers.
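
A short sketch of the style, with a hypothetical class:

```cpp
#include <cstdint>

class CounterSketch {
public:
  CounterSketch() = default; // was: CounterSketch() : Count(0), Enabled(true) {}
  uint32_t count() const { return Count; }
  bool enabled() const { return Enabled; }
  void bump() {
    if (Enabled)
      ++Count;
  }

private:
  // C++11 member initializers keep the defaults next to the declarations.
  uint32_t Count = 0;
  bool Enabled = true;
};
```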

BUG= none
R=jpp@chromium.org

Review URL: https://codereview.chromium.org/1197223002

eafb56cb

Add constant blinding/pooling option for X8632 code translation. · 253dc8a8

authored Jun 22, 2015

GOAL:
The goal is to remove the ability of an attacker to control immediates emitted into the text section.

OPTION:
The option -randomize-pool-immediates is set to none by default (-randomize-pool-immediates=none). To turn on constant blinding, set -randomize-pool-immediates=randomize; to turn on constant pooling, use -randomize-pool-immediates=pool.

Not all constant integers in the input pexe file will be randomized or pooled. The signed representation of a candidate constant integer must be between -randomizeOrPoolImmediatesThreshold/2 and +randomizeOrPoolImmediatesThreshold/2. This threshold value can be set with command line option: "-randomize-pool-threshold". By default this threshold is set to 0xffff.

The constants introduced by instruction lowering (e.g. constants in shifting, masking) and argument lowering are not blinded in this way. The mask used for sandboxing is not affected either.

APPROACH:
We use GAS syntax in these examples.

Constant blinding for immediates:
Original:
add 0x1234, eax
After:
mov 0x1234+cookie, temp_reg
lea -cookie(temp_reg), temp_reg
add temp_reg, eax

Constant blinding for memory addressing offsets:
Original:
mov 0x1234(eax, esi, 1), ebx
After:
lea 0x1234+cookie(eax), temp_reg
mov -cookie(temp_reg, esi, 1), ebx

We use "lea" here because it does not affect the flags register, so it is safer for transforming instructions that involve immediates.
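
The arithmetic behind blinding can be sketched as below. The struct and function names are hypothetical, and how the cookie is generated is not covered; the sketch only shows that adding and then subtracting the cookie with 32-bit wraparound recovers the original constant.

```cpp
#include <cstdint>

// Hypothetical pair of values that would be emitted instead of the constant.
struct BlindedSketch {
  uint32_t Blinded; // immediate of the first instruction: value + cookie
  uint32_t Cookie;  // subtracted again by the following lea
};

BlindedSketch blindImmediateSketch(uint32_t Immediate, uint32_t Cookie) {
  // Unsigned addition wraps modulo 2^32, like the 32-bit machine add.
  return {Immediate + Cookie, Cookie};
}

uint32_t unblindSketch(const BlindedSketch &B) {
  return B.Blinded - B.Cookie; // wraps back to the original immediate
}
```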

Constant pooling for immediates:
Original:
add 0x1234, eax
After:
mov [memory label of 0x1234], temp_reg
add temp_reg, eax

Constant pooling for addressing offsets:
Original:
mov 0x1234, eax
After:
mov [memory label of 0x1234], temp_reg
mov temp_reg, eax

Note that in both cases, temp_reg may be assigned "eax" here, depending on the
liveness analysis, so this approach may not require an extra register.

IMPLEMENTATION:
Processing:
TargetX8632::randomizeOrPoolImmediate(Constant *Immediate, int32_t RegNum);
TargetX8632::randomizeOrPoolImmediate(OperandX8632Mem *Memoperand, int32_t RegNum);

Checking eligibility:
ConstantInteger32::shouldBeRandomizedOrPooled(const GlobalContext *Ctx);

ISSUES:
1. bool Ice::TargetX8632::RandomizationPoolingPaused is used to guard some translation phases to disable constant blinding/pooling temporarily. Helper class BoolFlagSaver is added to latch the value of RandomizationPoolingPaused.

Known phases that need to be guarded are: doLoadOpt() and advancedPhiLowering(). However, during advancedPhiLowering(), if the destination variable has a physical register allocated, constant blinding and pooling are allowed. Stopping blinding/pooling for doLoadOpt() won't hurt our randomization or pooling as the optimized addressing operands will be processed again in genCode() phase.

2. i8 and i16 constants are now collected in separate constant pools, instead of sharing the same constant pool with i32 constants. This requires emitting two more pools during constants lowering, and hence creates two more read-only data sections in the resulting ELF and ASM. No runtime issues have been observed so far.
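
A flag saver of the kind mentioned in issue 1 could look roughly like this. The interface of the real BoolFlagSaver is not shown in this message, so the constructor signature below is an assumption.

```cpp
// Hypothetical RAII helper: sets a flag for the lifetime of the object and
// restores the previous value on destruction.
class BoolFlagSaverSketch {
public:
  BoolFlagSaverSketch(bool &Target, bool NewValue)
      : FlagRef(Target), OldValue(Target) {
    FlagRef = NewValue;
  }
  ~BoolFlagSaverSketch() { FlagRef = OldValue; }

  BoolFlagSaverSketch(const BoolFlagSaverSketch &) = delete;
  BoolFlagSaverSketch &operator=(const BoolFlagSaverSketch &) = delete;

private:
  bool &FlagRef;
  const bool OldValue;
};
```

A guarded phase would then create one of these on the stack with the paused flag set to true, and the flag returns to its earlier value when the phase returns, including on early exits.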

BUG=
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1185703004.

253dc8a8

18 Jun, 2015 4 commits

ARM: Assign "actuals" at call site to the appropriate GPR/stack slot. · b0a8c24e

authored Jun 18, 2015

Actually assign arguments to r0-r3 at the call site. Previously
this was left unhandled. There was only logic for pulling
formal parameters out of r0-r3.

Refactor the GPR counter and move it into a class so that the
rounding up for i64 arguments is in one place for callsites
and for pulling out of parameters. We might be able to use a
similar pattern to count the FP/SIMD registers later.
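
A sketch of such a counter, assuming the usual ARM convention that an i64 argument occupies an even-numbered register pair (r0:r1 or r2:r3) and that skipped registers are not reused. The class and method names are hypothetical.

```cpp
// Hypothetical counter for the four argument GPRs r0-r3.
class GPRCounterSketch {
public:
  // Returns the register number for an i32 argument, or -1 for the stack.
  int allocI32() {
    if (Used >= 4)
      return -1;
    return static_cast<int>(Used++);
  }

  // Returns the low register of the pair for an i64 argument, or -1.
  int allocI64() {
    Used = (Used + 1) & ~1u; // round up to an even register number
    if (Used + 2 > 4) {
      Used = 4; // later arguments also go to the stack
      return -1;
    }
    const int Reg = static_cast<int>(Used);
    Used += 2;
    return Reg;
  }

private:
  unsigned Used = 0;
};
```

Keeping the rounding in one class means call sites and formal-parameter handling cannot drift apart.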

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1187513006.

b0a8c24e

Subzero: Add more kinds of RMW lowering. · cac003e8

authored Jun 18, 2015

Specifically: sub, and, or, xor; for all integer types.

Turns out that RMW is not possible for fadd/fsub/fmul/fdiv as well as operations on vector types, because the corresponding x86 instructions require the result to be in a physical register.

Refactors the assembler's implementations of add/or/adc/sbb/and/sub/xor/cmp to avoid repetition.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/1186713010

cac003e8

Subzero: Correct the cross test's diagnostic message for a test failure. · a9eeb420
Jim Stichnoth authored Jun 17, 2015
```
BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/1195553002
```
a9eeb420

Subzero: Transform suitable Load/Arith/Store sequences into RMW ops. · e4f65d86

authored Jun 17, 2015

Search for sequences of Load/Arith/Store instructions that can be transformed into single non-atomic Read-Modify-Write instructions. Corresponding operands must match up, and it is limited to the operator/type combinations that have simple lowerings.

For suitable sequences, an RMW pseudo-instruction is added. Extra variables are attached to the RMW instruction and the original Store instruction, to make it easy to figure out whether to retain the original Store instruction or the new RMW instruction (but never both).

The RMW instructions are similar to their non-RMW counterparts, except that the RMW instruction has no Dest variable - the Src[0] operand doubles as the memory-operand dest.

The x86-32 integrated assembler has some new forms of existing instructions added.

Note: this CL puts the machinery in place to identify, lower, and emit RMW operations only for the "add" instruction operating on i32/i16/i8 operands. The next CL will fill in the rest of the options.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/1182603004

e4f65d86

17 Jun, 2015 2 commits

Fix a bug that would cause subzero to fail when --threads=0. · 8b1a7051

authored Jun 17, 2015

Creates a single TargetDataLowering.

BUG= None
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1179313004.

8b1a7051

Set up crosstest to run simple loop in Om1 on ARM. · 8e32fed5

authored Jun 17, 2015

We can't run O2 yet because some of the advanced Phi lowering
hooks aren't implemented for O2 yet.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1160873006.

8e32fed5

16 Jun, 2015 1 commit

Add a basic enum for ARM InstructionSet / cpu features. · d062f73a

authored Jun 15, 2015

That way, we don't have to use -mattr=sse2 for ARM in
cross tests, etc.

Default to NEON for now. Also put in an entry for HW
divide in ARM mode. Many more features are
possible though, e.g.:

https://github.com/llvm-mirror/llvm/blob/master/lib/Target/ARM/ARM.td

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1191573003.

d062f73a

15 Jun, 2015 2 commits

Move lowerGlobal() from target-specific code to emitGlobal() in generic code. · 58eea4d1

authored Jun 15, 2015

Emitting the global initializers is mostly the same across
each architecture (same filling, alignment, etc.). The only difference
is in assembler-directive quirks. E.g., on ARM for ".align N" N is
the exponent for a power of 2, while on x86 N is the actual number
of bytes. To avoid target-specific directives, use .p2align which
is always a power of 2. Similarly, use % instead of @. Either one
may be a comment character for *some* architecture, but for the
architectures we care about % is not a comment character while @
is sometimes (ARM).
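
For instance, a target-independent alignment directive could be produced along these lines. The helper name and the exact output spacing are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: converts a byte alignment into a .p2align directive,
// whose operand is always the exponent of a power of 2.
std::string p2alignDirectiveSketch(uint32_t AlignBytes) {
  // Only powers of 2 are valid alignments here.
  assert(AlignBytes != 0 && (AlignBytes & (AlignBytes - 1)) == 0);
  unsigned Log2 = 0;
  while ((1u << Log2) < AlignBytes)
    ++Log2;
  return ".p2align " + std::to_string(Log2);
}
```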

Usually MIPS uses ".space N" for ".zero", but the assembler seems
to accept ".zero" so don't change that for now.

May need to adjust .long in the future too.
.word for AArch64 and .4byte for MIPS?

Potentially we can refactor the lowerGlobals() dispatcher
(ELF vs ASM vs IASM). The only thing target-specific about that
is *probably* just the relocation type.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1188603002.

58eea4d1

Removes const qualification for two methods in TargetDataLowering. · 0f86d03c

authored Jun 15, 2015

Removes const qualifier for TargetDataLowering::lowerGlobals() and TargetDataLowering::lowerConstants()

BUG= None
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1177873003.

0f86d03c

12 Jun, 2015 3 commits

Build ARM SZ runtime files. Use le32-nacl-objcopy in various places. · 050deaa6

authored Jun 12, 2015

Use PNaCl built binutils, which is known to support ARM and MIPS.
Otherwise the system-provided binutils may or may not have that
support (mine did not and perhaps expected a prefix like
arm-xxx-objcopy for the version that did support arm).

Split off from CL to run crosstests for ARM under qemu.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
R=jpp@chromium.org, stichnot@chromium.org

Review URL: https://codereview.chromium.org/1185703006.

050deaa6

Subzero: Strength-reduce mul by certain constants. · 0933c0cf

authored Jun 12, 2015

These all appear to some degree in spec2k.

This is implemented for i8/i16/i32 types. It is done as part of core lowering, so in theory all optimization levels could benefit, but it is explicitly disabled for Om1/O0 to keep things simple there.

While clang appears to strength-reduce udiv/urem by a constant power of 2, for some reason it does not always strength-reduce multiplies (given that they appear in the spec2k bitcode).

For multiplies by 3, 5, or 9, we can make use of the lea instruction. We can do combinations of shift and lea to multiply by other constants, e.g. 100=5*5*4. If too many operations would be required, just give up and use the mul instruction.
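
The decomposition idea can be sketched as follows. This is a simplified model, not the actual lowering: the names, the greedy factoring order, and the MaxOps cutoff are assumptions, and each factor of 3, 5, or 9 stands for one lea of the form x + x*2, x + x*4, or x + x*8.

```cpp
#include <cstdint>

// Hypothetical plan: how many lea steps of each kind, plus one final shift.
struct MulPlanSketch {
  unsigned Count9 = 0, Count5 = 0, Count3 = 0, Shift = 0;
  bool Ok = false; // false means "give up and use mul"
};

MulPlanSketch planMulSketch(uint32_t C, unsigned MaxOps) {
  MulPlanSketch P;
  if (C == 0)
    return P;
  while (C % 9 == 0) { C /= 9; ++P.Count9; }
  while (C % 5 == 0) { C /= 5; ++P.Count5; }
  while (C % 3 == 0) { C /= 3; ++P.Count3; }
  while (C % 2 == 0) { C /= 2; ++P.Shift; }
  const unsigned Ops = P.Count9 + P.Count5 + P.Count3 + (P.Shift ? 1 : 0);
  P.Ok = (C == 1) && Ops <= MaxOps;
  return P;
}

// Applies the plan using only shift-and-add steps, as lea and shl would.
uint32_t applyPlanSketch(uint32_t X, const MulPlanSketch &P) {
  for (unsigned I = 0; I < P.Count9; ++I) X = X + (X << 3);
  for (unsigned I = 0; I < P.Count5; ++I) X = X + (X << 2);
  for (unsigned I = 0; I < P.Count3; ++I) X = X + (X << 1);
  return X << P.Shift;
}
```

With this sketch, 100 factors as two multiplies by 5 followed by a shift by 2, matching the 100=5*5*4 example; a constant such as 7 has no such factoring and falls back to mul.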

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095
R=jpp@chromium.org, jvoung@chromium.org

Review URL: https://codereview.chromium.org/1146803002

0933c0cf

Subzero: Fix compilation error in MINIMAL=1 or NOASSERT=1 mode. · 326534a3
Jim Stichnoth authored Jun 11, 2015
```
BUG= none
R=jpp@chromium.org

Review URL: https://codereview.chromium.org/1182673003
```
326534a3

11 Jun, 2015 5 commits

Emit ARM build-attributes in the file scope (as header). · fb79284d

authored Jun 11, 2015

The ARM linker will check that .o files declare compatible
build attributes (e.g., all claim hard-float calling convention,
all claim VFP-vX, etc.). Thus, in order to set up cross tests that
link LLC-generated code against Subzero-generated code,
we need the build attributes to be compatible.

Pick ARMv7, hard-float calling convention, and neon, etc. which
we use for PNaCl LLVM.

Will probably have to reorganize to keep in sync once the ELF
writer also emits this.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
R=kschimpf@google.com, stichnot@chromium.org

Review URL: https://codereview.chromium.org/1171563002.

fb79284d

Unittest fixes. · 8eefffad

authored Jun 11, 2015

Adjusts the expected unittest output.

BUG= None
R=kschimpf@google.com, stichnot@chromium.org

Review URL: https://codereview.chromium.org/1173353003.

8eefffad

First patch for Mips subzero compiler · 6da4cef7

authored Jun 11, 2015

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4167

Move issue https://codereview.chromium.org/1159823004/ here so that
it's under the proper email.

Review URL: https://codereview.chromium.org/1169533003

6da4cef7

Subzero: Fix lit and cross tests broken in . · d9f1f9fc

authored Jun 11, 2015

1. The data symbol __Sz_block_profile_info should never be mangled (for cross tests), similar to runtime helper calls. Add a SuppressMangling override for such variable declarations.

2. When cross tests contain more than one translated object file, we end up with multiple definitions of __Sz_block_profile_info . Work around this by making that symbol weak.

3. Don't try to attach global inits to an EmitterWorkItem that represents a translation error.

4. Update one lit test to reflect the additional profiling value in the data section.

5. Update one lit test to reflect that global initializers are emitted at the end instead of the beginning.

The check-unit test is still broken and will be fixed in a separate CL.

BUG= none
R=kschimpf@google.com

Review URL: https://codereview.chromium.org/1180883002

d9f1f9fc

Fixes a bug that caused IceAssembler to use Allocator before it was initialized. · 1a9043e7

authored Jun 11, 2015

Initializes IceAssembler::Allocator before IceAssembler::Buffer.
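
The underlying C++ rule is that non-static data members are initialized in declaration order, regardless of the order in the constructor's initializer list. A sketch with hypothetical stand-in classes:

```cpp
// Hypothetical stand-ins; not the actual IceAssembler members.
struct AllocatorSketch {
  bool Ready = true;
};

struct BufferSketch {
  // Records whether the allocator was already constructed when used.
  explicit BufferSketch(const AllocatorSketch &A)
      : SawReadyAllocator(A.Ready) {}
  bool SawReadyAllocator;
};

class AssemblerSketch {
public:
  AssemblerSketch() : Allocator(), Buffer(Allocator) {}
  bool bufferSawReadyAllocator() const { return Buffer.SawReadyAllocator; }

private:
  // Declared first, so it is constructed before Buffer uses it.
  AllocatorSketch Allocator;
  BufferSketch Buffer;
};
```

If Buffer were declared before Allocator, its constructor would read an allocator that had not been constructed yet.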

BUG= None
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1177843006.

1a9043e7

10 Jun, 2015 2 commits

Renames the assembler* files. · aff4ccf9

authored Jun 10, 2015

Renames the assembler* files to IceAssembler*. Fixes whatever breaks.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4077
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/1179563004.

aff4ccf9

Subzero: Basic Block Profiler. · f8b4cc84

authored Jun 09, 2015

BUG= None
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1147023007.

f8b4cc84

08 Jun, 2015 1 commit

Clean up unit munging unit tests using common NaCl API. · cbb1d3d7

authored Jun 08, 2015

Simplify the munging unit tests to follow the new NaCl utilities
for munging tests.

Note that this CL takes advantage of changes added by
CL https://codereview.chromium.org/1140153004

BUG=None
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1149423011

cbb1d3d7

05 Jun, 2015 4 commits

Merge branch 'master' of… · 8af4aac7

authored Jun 05, 2015

Merge branch 'master' of https://chromium.googlesource.com/native_client/pnacl-subzero into subzero-ownership

8af4aac7

Subzero: adding jpp@chromium.org to OWNERS. · af9032fc
John Porto authored Jun 05, 2015
```
BUG= None
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1149213006
```
af9032fc
Subzero: adding jpp@chromium.org to OWNERS. · 09a18657
John Porto authored Jun 05, 2015

09a18657

Subzero ARM32: Lower shift and zext, sext, and trunc. · 66c3d5ec

authored Jun 04, 2015

Sext, etc. usually uses shifts (especially for i1 and i64)
so implement shift, then implement those casts.

Implement just enough of bitcast to handle accessing
global addresses (used by some tests). Otherwise,
most other bitcasts are from GPR to FP and FP regs
aren't modeled yet.

Generally following the GCC style for 64-bit shifts.
This takes advantage of the flexible second operand in a "orr",
and takes advantage of the shift-beyond bitwidth saturation.
LLVM is almost the same, but only seems to take advantage
on one side of the 32-bits, not the other side. Should really
get some of the execution tests running to test this behavior!
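
A C++ model of the idea, as a sketch only: the actual instruction sequence is not shown in this message. The model assumes the ARM behavior for register-specified shifts, where only the bottom byte of the amount is used and LSL/LSR by 32 or more yield 0, so no branches are needed for a 64-bit left shift.

```cpp
#include <cstdint>

// Models ARM "lsl rd, rn, rs": bottom byte of the amount, 0 when >= 32.
uint32_t lslRegSketch(uint32_t X, uint32_t Amt) {
  Amt &= 0xff;
  return Amt >= 32 ? 0 : X << Amt;
}

// Models ARM "lsr rd, rn, rs" with the same saturation.
uint32_t lsrRegSketch(uint32_t X, uint32_t Amt) {
  Amt &= 0xff;
  return Amt >= 32 ? 0 : X >> Amt;
}

// 64-bit shift left by N (0..63) built from 32-bit halves.
uint64_t shl64Sketch(uint64_t V, uint32_t N) {
  const uint32_t Lo = static_cast<uint32_t>(V);
  const uint32_t Hi = static_cast<uint32_t>(V >> 32);
  // Each term would be one orr with a shifted second operand; out-of-range
  // amounts (including wrapped negative ones) contribute 0.
  const uint32_t NewHi =
      lslRegSketch(Hi, N) | lslRegSketch(Lo, N - 32) | lsrRegSketch(Lo, 32 - N);
  const uint32_t NewLo = lslRegSketch(Lo, N);
  return (static_cast<uint64_t>(NewHi) << 32) | NewLo;
}
```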

Fix InstARM32Str::dump(). Str doesn't have a Dest, so use Src.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076
R=stichnot@chromium.org

Review URL: https://codereview.chromium.org/1143323013

66c3d5ec

04 Jun, 2015 1 commit

Subzero: Legalize FP constants directly into memory operands. · 03ffa585

authored Jun 04, 2015

Previously, the legalize() function would always force a floating point constant into an xmm register before it could be used in an instruction. This uses an extra register unnecessarily when the instruction allows a memory operand for that operand.

We improve this by lowering the FP constant operand to an OperandX8632Mem that wraps a ConstantRelocatable representing the label for the constant pool entry, e.g. [.L$float$0]. (This may end up being copied into an xmm register if the instruction doesn't allow a memory operand for that operand.)

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/1163943005

03ffa585