- 29 Oct, 2015 2 commits
-
-
Karl Schimpf authored
Adds a new type of fixup to handle the relocatable fixups needed for movw and movt on global addresses. Also adds movw and movt methods to the ARM assembler, and makes ARM register names visible (without a target lowering object) so that the ARM integrated assembler can generate the appropriate assembly. Note that the integrated assembler needs to generate the corresponding movw/movt, and follows the instruction with the bytes that appear in the corresponding assembler buffer. This makes it possible to test whether we have generated the correct values, and will be set up properly for ELF emission. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1424863005 .
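As a rough illustration of what a movw/movt pair encodes, here is a minimal sketch, assuming the standard A32 encodings of MOVW and MOVT with condition AL. The helper names are hypothetical and are not Subzero's assembler API; a relocatable fixup would leave the immediate fields to be patched once the global's address is known.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: packs a 16-bit immediate into the imm4:imm12 fields
// shared by the A32 MOVW and MOVT encodings.
inline uint32_t encodeImm16(uint32_t Imm16) {
  assert(Imm16 <= 0xFFFFu);
  return ((Imm16 & 0xF000u) << 4) | (Imm16 & 0x0FFFu);
}

// movw Rd, #lower16(Addr), condition AL.
inline uint32_t encodeMovw(uint32_t Rd, uint32_t Addr) {
  assert(Rd < 16);
  return 0xE3000000u | (Rd << 12) | encodeImm16(Addr & 0xFFFFu);
}

// movt Rd, #upper16(Addr), condition AL.
inline uint32_t encodeMovt(uint32_t Rd, uint32_t Addr) {
  assert(Rd < 16);
  return 0xE3400000u | (Rd << 12) | encodeImm16(Addr >> 16);
}
```

For example, loading the address 0x12345678 into r0 would use encodeMovw(0, 0x12345678) followed by encodeMovt(0, 0x12345678).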
-
Karl Schimpf authored
BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1424213003 .
-
- 28 Oct, 2015 2 commits
-
-
Jim Stichnoth authored
Sets the stage for enabling the use of the 8-bit high registers, but doesn't yet turn it on because more work is needed for correctness.
In the lowering, typing is tightened up so that we don't specify e.g. eax when we really mean ax or al. This gets rid of the ShiftHack hack. The one exception is the pinsr instruction, which always requires an r32 register even if the memory operand is m8 or m16.
The x86 assembler unit tests are fixed by not passing a GlobalContext arg to the Assembler ctor.
Many constexpr and "auto *" upgrades are applied. Sorry for not putting this into a separate CL - a few local fixes got out of hand...
Tested in the following ways:
  - "make check-lit" - some .ll CHECK line changes due to register randomization
  - "make check-xtest"
  - "make check-xtest" with forced filetype=asm (via local .py hack)
  - spec2k with all -filetype options
  - compare before-and-after spec2k filetype=asm output - a few differences where the correct narrow register is used instead of the full-width register
To do in the next CL:
  1. Add new register classes:
     (a) 32-bit GPR truncable to 8-bit (eax, ecx, edx, ebx)
     (b) 16-bit GPR truncable to 8-bit (ax, cx, dx, bx)
     (c) 8-bit truncable from 16/32-bit (al, bl, cl, dl)
     (d) 8-bit "mov"able from ah/bh/ch/dh
  2. Enable use of ah/bh/ch/dh for x86-32.
  3. Enable use of ah (but skip bh/ch/dh) for x86-64.
  4. Statically initialize register tables in the TargetLowering subclass.
BUG= none R=jpp@chromium.org, kschimpf@google.com Review URL: https://codereview.chromium.org/1419903002 .
-
John Porto authored
Implements the Availability optimization:
  a = b
  x = f(a, c)
becomes
  a = b
  x = f(b, c)
This only triggers if b is an infinite-weight temporary, and it prevents a potential spill at the cost of higher register pressure. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1424873003 .
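The rewrite can be sketched as a small forward pass over one basic block. This is only an illustration under simplifying assumptions: the Inst struct, the "mov" opcode string, and the InfiniteWeight set are hypothetical stand-ins, not Subzero's IR or its actual implementation.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified instruction: Dest = Op(Srcs...). A copy is
// modelled as Op == "mov" with exactly one source.
struct Inst {
  std::string Dest;
  std::string Op;
  std::vector<std::string> Srcs;
};

// After a copy "A = B" where B is in InfiniteWeight, rewrites later uses of
// A to B. Works on a single basic block.
void propagateAvailability(std::vector<Inst> &Block,
                           const std::set<std::string> &InfiniteWeight) {
  std::map<std::string, std::string> Available; // A -> B
  for (Inst &I : Block) {
    // Replace sources that currently have an available equivalent.
    for (std::string &Src : I.Srcs) {
      auto It = Available.find(Src);
      if (It != Available.end())
        Src = It->second;
    }
    // Redefining a variable invalidates every mapping that mentions it.
    for (auto It = Available.begin(); It != Available.end();) {
      if (It->first == I.Dest || It->second == I.Dest)
        It = Available.erase(It);
      else
        ++It;
    }
    // Record the new copy if its source is an infinite-weight temporary.
    if (I.Op == "mov" && I.Srcs.size() == 1 && I.Srcs[0] != I.Dest &&
        InfiniteWeight.count(I.Srcs[0]) != 0)
      Available[I.Dest] = I.Srcs[0];
  }
}
```

The copy itself is left in place; whether it can later be removed depends on other uses of its destination.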
-
- 27 Oct, 2015 5 commits
-
-
David Sehr authored
This adds some more patterns to address mode recovery to recover ConstantRelocatables as displacements, and a few more generalizations that catch indexed addressing. BUG= R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1428443002 .
-
Karl Schimpf authored
BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1410183004 .
-
Karl Schimpf authored
Fixes a couple of bugs that stopped the ARM integrated assembler from generating assembly code for any spec2k examples. Fixes are:
  1) Handle conditional branches with no else branch.
  2) Fix usage of fixups so that the emit method does any needed buffer lookups. This fixes the case where textual fixups (with zero length) appear at the end of the assembly file.
BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1417173003 .
-
Karl Schimpf authored
BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1424773002 .
-
Karl Schimpf authored
Adds an explicit branch instruction (near form only), which allows branching within a signed 26-bit byte offset of the current pc (roughly 2**25 bytes in either direction). For now, this near restriction (within a function) doesn't appear to be a bad restriction, and only near jumps have been implemented. Also fixes the notation for the following types:
  InstValueType : The 32-bit encoding of an instruction value.
  InstOffsetType : Offset (+/-) used within an instruction.
BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1418313003 .
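A minimal sketch of a near-branch encoding, assuming the standard A32 B instruction with condition AL, whose immediate is a signed 24-bit word offset (a signed 26-bit byte offset). The type names come from the commit message, but their underlying types and the encodeBranch helper are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

using InstValueType = uint32_t; // 32-bit encoding of an instruction value
using InstOffsetType = int32_t; // assumed: signed offset within an instruction

// Hypothetical helper: encodes "b <target>" given the byte offset from the
// branch instruction itself to its target.
inline InstValueType encodeBranch(InstOffsetType ByteOffset) {
  // In ARM state the pc reads as the address of the branch plus 8.
  const InstOffsetType Adjusted = ByteOffset - 8;
  assert((Adjusted & 3) == 0 && "target must be word aligned");
  assert(Adjusted >= -(1 << 25) && Adjusted < (1 << 25) && "not a near jump");
  // Drop the two low zero bits and keep the 24-bit two's complement field.
  const uint32_t Imm24 = (static_cast<uint32_t>(Adjusted) >> 2) & 0x00FFFFFFu;
  return 0xEA000000u | Imm24;
}
```

A branch to itself (ByteOffset of 0) yields the familiar 0xEAFFFFFE.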
-
- 23 Oct, 2015 1 commit
-
-
Karl Schimpf authored
Fixes an issue where branches don't compile in the hybrid integrated assembler because some jump instructions have not yet been integrated. It does this by adding an instruction label for each corresponding label generated by the standalone ARM assembler. Note that in order to fix this, I had to change the signature of virtual method Assembler::bindCfgNodeLabel to get the Cfg node (rather than the index value). This allows the ARM hybrid assembler to generate a label for each CfgNode (using the getAsmName() method). BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1407273006 .
-
- 22 Oct, 2015 1 commit
-
-
Karl Schimpf authored
Adds a notion of a hybrid assembler. That is, if the integrated assembler can lower an instruction to bytes, it does. Otherwise, it uses the standalone assembler to generate text as the placeholder for the instruction. This is done using a textual fixup in the assembly buffer. The advantage of the hybrid assembler is that one can incrementally implement the integrated assembler and still test the generated assembly. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1418523002 .
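The idea can be sketched with a toy buffer that stores encoded bytes alongside zero-length textual placeholders. The class and member names below are hypothetical and do not reflect Subzero's actual assembler buffer or fixup classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical textual placeholder: occupies no bytes in the buffer.
struct TextFixup {
  std::size_t Position; // byte offset at which the text belongs
  std::string Text;     // assembly text from the standalone assembler
};

class HybridBuffer {
public:
  // Used when the integrated assembler can encode the instruction.
  void emitWord(uint32_t Inst) {
    for (int Shift = 0; Shift < 32; Shift += 8)
      Bytes.push_back(static_cast<uint8_t>(Inst >> Shift)); // little-endian
  }
  // Used otherwise: records the text at the current buffer position.
  void emitText(const std::string &Text) {
    Fixups.push_back({Bytes.size(), Text});
  }
  // Renders the buffer as assembly: fixup text verbatim, bytes as .byte.
  std::string render() const {
    std::string Out;
    std::size_t NextFixup = 0;
    // Iterate one past the last byte so trailing fixups are emitted too.
    for (std::size_t Pos = 0; Pos <= Bytes.size(); ++Pos) {
      while (NextFixup < Fixups.size() && Fixups[NextFixup].Position == Pos)
        Out += "\t" + Fixups[NextFixup++].Text + "\n";
      if (Pos < Bytes.size())
        Out += "\t.byte " + std::to_string(Bytes[Pos]) + "\n";
    }
    return Out;
  }

private:
  std::vector<uint8_t> Bytes;
  std::vector<TextFixup> Fixups;
};
```

Because the rendered output is ordinary assembly text, it can be fed to an external assembler while only part of the instruction set has been implemented in the integrated one.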
-
- 21 Oct, 2015 1 commit
-
-
Jim Stichnoth authored
This patch is essentially the same as for ARM https://codereview.chromium.org/1127963004 . I have incorporated the new 64-bit register work, which was not available at the time of this earlier patch. The MIPS O32 ABI is not perfect in this patch, but I am more or less following the development of the ARM patches and those were preliminary at this stage too. I will make corrections in a later patch when I incorporate more of the ARM patches. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4167 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1416493002 .
-
- 17 Oct, 2015 1 commit
-
-
Karl Schimpf authored
Also cleans up comments and condition violations for all implemented ARM instructions. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1411873002 .
-
- 16 Oct, 2015 5 commits
-
-
Jim Stichnoth authored
BUG= none R=kschimpf@google.com Review URL: https://codereview.chromium.org/1407263005 .
-
David Sehr authored
Generalize folding of icmp instructions into br. 64-bit comparisons are considered as candidates unless they feed a select. BUG= R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1407143002 .
-
Jim Stichnoth authored
Also remind the user of that option in IceConverter.cpp, similar to PNaClTranslator.cpp. BUG= none R=kschimpf@google.com Review URL: https://codereview.chromium.org/1408023004 .
-
John Porto authored
With this CL, Spec2k built by the Sz ARM32 backend runs and verifies successfully. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1407063002 .
-
Karl Schimpf authored
Add code to handle spilling stack variables. That is, add code to handle loading and storing to stack addresses. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334 R=jpp@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/1402403002 .
-
- 15 Oct, 2015 2 commits
-
-
Jim Stichnoth authored
1. Helper function sameVarOrReg() also needs to return true if the two physical registers alias or overlap. Otherwise advanced phi lowering may pick an incorrect ordering.
2. With -asm-verbose, redundant truncation assignments expressed as _mov instructions, like "mov cl, ecx", need to have their register use counts updated properly, so that the LIVEEND= annotations are correct.
3. The register allocator should consider suitably typed aliases when choosing a register preference.
4. When evicting a variable, the register allocator should decrement the use count of all aliases.
5. When saving/restoring callee-save registers in the prolog/epilog, map each register to its "canonical" register (e.g. %bl --> %ebx) and make sure each canonical register is only considered once.
6. Remove some unnecessary Variable::setMustHaveReg() calls.
7. When assigning bool results as a constant 0 or 1, use an 8-bit constant instead of 32-bit so that only the 8-bit register gets assigned.
BUG= none TEST= make check, plus spec2k -asm-verbose output is unchanged R=kschimpf@google.com Review URL: https://codereview.chromium.org/1405643003 .
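Item 5 can be sketched as follows. This is a simplified illustration: the function name and the string-keyed alias table are hypothetical, whereas a real backend would use its own register numbering and alias tables.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Returns the canonical callee-save registers to save, each listed once, in
// order of first use. CanonicalOf is a hypothetical alias table (e.g.
// bl -> ebx); registers missing from it are treated as already canonical.
std::vector<std::string>
calleeSavesToPush(const std::vector<std::string> &UsedRegs,
                  const std::map<std::string, std::string> &CanonicalOf) {
  std::vector<std::string> Result;
  std::set<std::string> Seen;
  for (const std::string &Reg : UsedRegs) {
    auto It = CanonicalOf.find(Reg);
    const std::string &Canonical = It == CanonicalOf.end() ? Reg : It->second;
    // insert() reports whether the canonical register is new.
    if (Seen.insert(Canonical).second)
      Result.push_back(Canonical);
  }
  return Result;
}
```

With this shape, a function that uses bl, bx and ebx would save ebx only once.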
-
David Sehr authored
Comparisons with zero can be done with no branches in most cases and with simpler sequences of operations. BUG= R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1406593003 .
-
- 14 Oct, 2015 1 commit
-
-
Karl Schimpf authored
BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1388323003 .
-
- 13 Oct, 2015 2 commits
-
-
Karl Schimpf authored
Also does some bikeshed clean ups. In particular, the (ARM) instruction method emitIAS only needs to choose the applicable ARM instruction, and then passes the corresponding operands to the corresponding instruction method of the assembler. The assembler method then extracts the appropriate data from the operands, and decides which rule to apply for the corresponding ARM instruction. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334 R=jpp@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/1407613002 .
-
Karl Schimpf authored
BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334 R=jpp@chromium.org Review URL: https://codereview.chromium.org/1397043003 .
-
- 12 Oct, 2015 1 commit
-
-
Jim Stichnoth authored
The original code only looked at top-level source operands in the defining instruction, with a TODO to instead consider all inner variables in the instruction. The primary reason is so that we end up with more instructions like
  mov eax, eax
which are later elided as redundant assignments. A secondary reason is to foster more instructions like:
  mov ecx, [ecx]
rather than
  mov eax, [ecx]
where ecx's live range ends. This hopefully keeps eax (in the latter case) free for longer and may allow some other variable to get a register. By considering all instruction variables, we enable this. BUG= none R=jpp@chromium.org Review URL: https://codereview.chromium.org/1392383003 .
-
- 09 Oct, 2015 5 commits
-
-
Jim Stichnoth authored
If a variable gets a register but is later evicted because of a higher-weight variable, there's a chance that the first variable could have been allocated a register if only its initial choice had been different. To improve this, we keep track of which variables are evicted, and then allow register allocation to run again, focusing only on those once-evicted variables, and not changing any previous register assignments. This can iterate until there are no more evictions. This is more or less what the linear-scan literature describes as "second-chance bin-packing". BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095 R=jpp@chromium.org Review URL: https://codereview.chromium.org/1395693005 .
-
Karl Schimpf authored
Extends the ARM32 assembler to be able to generate a trivial function footprint using the -filetype=iasm option. Also does a couple of cleanups: 1) Move UnimplementedError macro to common location so that it can be used by everyone. 2) Add a GlobalContext argument to the assembler, so that it can look at flags etc. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1397933002 .
-
Jim Stichnoth authored
The LiveIn and LiveOut register sets are printed for each basic block in -asm-verbose mode. These sets would generally include the stack and/or frame pointer registers, which is just noise, so we suppress that. BUG= none R=jpp@chromium.org Review URL: https://codereview.chromium.org/1399523003 .
-
Jim Stichnoth authored
In x86 lowering, i1 values are held in i8 register and memory slots. We were conservatively "and"ing them with 1 before zero-extending them for some lowering operations, but this "and" with 1 is unnecessary and just clutters the code. We continue the invariant that all i1-produced values in an i8 slot are either 0 or 1. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095 R=jpp@chromium.org Review URL: https://codereview.chromium.org/1394413002 .
-
Jim Stichnoth authored
BUG= none R=jpp@chromium.org Review URL: https://codereview.chromium.org/1392403002 .
-
- 08 Oct, 2015 2 commits
-
-
Jim Stichnoth authored
BUG= none R=kschimpf@google.com Review URL: https://codereview.chromium.org/1396923002 .
-
Karl Schimpf authored
Adds message to use "-allow-externally-defined-symbols" on bad linkage errors. Also cleans up code by defining common reporting routine. BUG=None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1392273002 .
-
- 07 Oct, 2015 3 commits
-
-
Karl Schimpf authored
Creates a local version of the Dart assembler code, before being merged into our code base. The goal of these files is to track code as it is moved from the Dart implementation into our code base. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4334 R=jpp@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/1394613002 .
-
Karl Schimpf authored
The existing code, when run on a fuzzed example, generates a runtime assertion. The reason for this is that the input defines "memmove" as an external global. However, the code generator can generate calls to "memmove" which assumes it is internal (see PNaCl ABI). As a result, the assertion that checks that global names are unique (for memmove) fails. This code fixes the problem by checking that global names are internal, unless they are one of the "start" functions, or the function is an intrinsic. To allow for non-PNaCl ABI input, a flag was added to allow functions to be external. However, in such cases the external can't be one of Subzero's runtime helper functions. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4330 R=jpp@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/1387963002 .
-
David Sehr authored
For operations such as
  t0 = t1 + t2
Subzero's pattern for arithmetic operations generates two-address code that looks like
  movl ...t1..., %ecx
  addl ...t2..., %ecx // t0 is in %ecx
When register pressure is high this sometimes becomes:
  movl ...t2..., SPILL
  movl ...t1..., %ecx
  addl SPILL, %ecx // t0 is in %ecx
This CL takes advantage of cases where the use of t2 is the last one, so the register that held t2 before the operation can be reused. The optimization simply swaps the (commutative) operation to
  t0 = t2 + t1
which then generates code as
  movl ...t2..., %ecx
  addl ...t1..., %ecx // t0 is in %ecx
This optimization is used for any commutative operation, which now includes Fadd and Fmul, which were erroneously marked as non-commutative. See the rationale in IceInst.def for the IEEE wordings. BUG= R=jfb@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/1371703003 .
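The swap decision can be sketched as below. The Operand struct and its IsLastUse flag are hypothetical stand-ins for whatever liveness information the lowering actually consults; this is an illustration of the idea, not the code from the CL.

```cpp
#include <utility>

// Hypothetical operand summary for illustration.
struct Operand {
  int VarId;
  bool IsLastUse; // true if this instruction ends the variable's live range
};

// For Dest = Src0 op Src1 lowered as "mov Src0 -> T; op Src1 -> T", moves a
// dying operand into the first position so its register can be reused for T.
// Returns true if the operands were swapped.
bool preferDyingOperandFirst(bool IsCommutative, Operand &Src0,
                             Operand &Src1) {
  if (!IsCommutative)
    return false; // swapping would change the result
  if (Src0.IsLastUse || !Src1.IsLastUse)
    return false; // nothing to gain from swapping
  std::swap(Src0, Src1);
  return true;
}
```

The commutativity flag matters: operations such as subtraction must keep their operand order.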
-
- 06 Oct, 2015 2 commits
-
-
David Sehr authored
Previously we did not take advantage of the three-address versions of the imul instruction. With this we are able to avoid some copies before imuls. BUG= R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1365433004 .
-
John Porto authored
BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1369333003 .
-
- 05 Oct, 2015 3 commits
-
-
Jim Stichnoth authored
Originally, the lowering sequence looked like:
  T = b
  T *= b
  a = T
Now it looks like:
  T = b
  T *= T
  a = T
If "b" gets a register and its live range ends after this instruction, then the new lowering sequence allows its register to be reused for "T". This decreases register pressure, and removes an instruction (register move) from what could be a critical path. This optimization is actually applicable for most arithmetic operations whose source operands are identical, but mul/fmul are the only ones that seem at all likely in practice. BUG= none R=kschimpf@google.com Review URL: https://codereview.chromium.org/1377213004 .
-
Jim Stichnoth authored
This issue was discovered as the result of a spurious "make check-lit" failure in undef.ll. The problem is that constant pool label strings depend on the order the constants are created, and this order can be different with multithreaded translation. Even -filetype=obj is affected by this, because the label string is put into the ELF .o file. This means that different runs of Subzero on the same input could potentially produce slightly different output. The solution is to base the label name on the actual value of the constant. We do this by using the hex representation of the constant, rather than the sequence number of the constant within the pool. This actually simplifies things a bit, as we no longer need to track the sequence number. In addition, for floating-point constant labels in asm-verbose mode, include a human-readable rendering of the value in the label name. BUG= none R=kschimpf@google.com Review URL: https://codereview.chromium.org/1386593004 .
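A minimal sketch of value-based naming for a 32-bit float constant. The ".L$float$" prefix and the function name are assumptions chosen for illustration; the actual label format used by Subzero is not given in the commit message.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical label scheme: names a float constant after its bit pattern,
// so the label does not depend on the order in which constants are created.
std::string floatPoolLabel(float Value) {
  uint32_t Bits;
  static_assert(sizeof(Bits) == sizeof(Value), "float must be 32 bits");
  std::memcpy(&Bits, &Value, sizeof(Bits)); // well-defined way to read bits
  char Buf[32];
  std::snprintf(Buf, sizeof(Buf), ".L$float$%08x",
                static_cast<unsigned>(Bits));
  return Buf;
}
```

Two threads that create the same constants in different orders would still agree on every label, since each label is a pure function of the value.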
-
Jim Stichnoth authored
Instead of a comment like this:
  # preds=.Lfv_update_nonbon$split___114___115_0,.Lfv_update_nonbon$split___138___115_1
remove some redundancy and make the comment like this:
  # preds=$split___114___115_0,$split___138___115_1
This makes it slightly easier to read, and less likely to exceed 80 columns. BUG= none R=kschimpf@google.com Review URL: https://codereview.chromium.org/1380323003 .
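The shortening amounts to dropping the ".L<function name>" prefix from each predecessor label. A small sketch, with a hypothetical helper name and assuming the prefix has exactly that form:

```cpp
#include <string>

// Hypothetical helper: strips a leading ".L<FuncName>" from a node label and
// returns the label unchanged if the prefix is absent.
std::string shortenPredLabel(const std::string &Label,
                             const std::string &FuncName) {
  const std::string Prefix = ".L" + FuncName;
  if (Label.compare(0, Prefix.size(), Prefix) == 0)
    return Label.substr(Prefix.size());
  return Label;
}
```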
-
- 02 Oct, 2015 1 commit
-
-
Karl Schimpf authored
The pnacl linux x86_64 buildbot doesn't understand ::stdout (it uses a macro to define stdout). Fix by removing :: prefix. Also redirects the error messages to stderr instead of stdout. BUG=None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1383053002 .
-