- 31 Jul, 2015 4 commits
-
-
John Porto authored
This CL disables the X86 assembler tests by default. They take too long to compile, so there's very little point in running them with the other unittests. This CL fixes a bug introduced in https://codereview.chromium.org/1260163003/ that caused liveness analysis to assert due to an uninitialized Variable. BUG= R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/1266863002.
-
Jan Voung authored
Make a post-register allocation and post-addProlog pass to go through variables with stack offsets and legalize them in case the offsets are not encodable. The naive approach is to reserve IP, and use IP to movw/movt the offset, then add/sub the frame/stack pointer to IP and use IP as the new base instead of the frame/stack pointer. We do some amount of CSE within a basic block, and share the IP base pointer when it is (a) within range for later stack references, and (b) IP hasn't been clobbered (e.g., by a function call). I chose to do this greedy approach for both Om1 and O2, since it should just be a linear pass, and it reduces the number of variables/instructions created compared to the super-naive peephole approach (so might be faster?). Introduce a test-only flag and use that to artificially bloat the stack frame so that spill offsets are out of range for ARM. Use that flag for cross tests to stress this new code a bit more (than would have been stressed by simply doing a lit test + FileCheck). Also, the previous version of emitVariable() was using the Var's type to determine the range (only +/- 255 for i16, vs +/- 4095 for i32), even though mov's emit() always uses a full 32-bit "ldr" instead of a 16-bit "ldrh". Use a common legality check, which uses the stackSlotType instead of the Var's type. This previously caused test_bitmanip to complain spuriously, even though the offsets for Om1 were "only" in the 300 byte range. With this fixed, we can then enable the test_bitmanip test too. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1241763002 .
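The legality check and the greedy sharing of the scratch base described above can be sketched roughly as follows. This is a simplified model and not Subzero's code: `SlotType`, `isLegalStackOffset`, and `ScratchBase` are hypothetical names, and the real pass emits movw/movt and add/sub instructions where this sketch only counts materializations.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical slot kinds; the names are illustrative, not Subzero's.
enum class SlotType { I16, I32 };

// Immediate offset range quoted in the commit message: +/-255 for the
// halfword forms (ldrh), +/-4095 for the word forms (ldr).
constexpr int32_t maxImmOffset(SlotType Ty) {
  return Ty == SlotType::I16 ? 255 : 4095;
}

// True if Offset fits directly in the instruction's immediate field.
bool isLegalStackOffset(SlotType Ty, int32_t Offset) {
  const int32_t Max = maxImmOffset(Ty);
  return Offset >= -Max && Offset <= Max;
}

// Models a scratch base register (standing in for IP) within one basic block.
struct ScratchBase {
  bool Valid = false;     // false after a clobber, e.g. by a call
  int32_t BaseOffset = 0; // scratch == SP/FP + BaseOffset while Valid

  // Returns the offset to encode relative to the chosen base. UseScratch is
  // set when the access must go through the scratch register, and
  // Materializations counts how often a new base had to be built.
  int32_t legalize(SlotType Ty, int32_t Offset, bool &UseScratch,
                   int &Materializations) {
    if (isLegalStackOffset(Ty, Offset)) {
      UseScratch = false;
      return Offset;
    }
    UseScratch = true;
    if (Valid && isLegalStackOffset(Ty, Offset - BaseOffset))
      return Offset - BaseOffset; // reuse the existing base
    Valid = true;                 // otherwise build a new base at Offset
    BaseOffset = Offset;
    ++Materializations;
    return 0;
  }

  void clobber() { Valid = false; }
};
```

In this model, two nearby out-of-range spill slots share one base, and a call between them forces a second materialization.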
-
Qining Lu authored
1. Basic block reordering can be enabled with the -reorder-basic-blocks option. Blocks are sorted in reverse post-order, but the next node to visit among the candidate child nodes is selected 'randomly'. Example:

      A
     / \
    B   C
     \ /
      D

This CFG can generate two possible layouts: A-B-C-D or A-C-B-D.
2. Fix nop insertion. Add checks to avoid insertions in empty basic blocks (dead blocks) and bundle-locked instructions.
BUG= R=jpp@chromium.org, jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/1255303004.
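A minimal sketch of a reverse post-order traversal with randomized child selection, as described in item 1. The graph representation, the function names, and the use of `std::mt19937` are assumptions made for illustration; the actual implementation may choose its random source and data structures differently.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Adjacency list: Succs[N] lists the successors of node N.
using Graph = std::vector<std::vector<int>>;

// Depth-first visit that shuffles the children before descending.
static void visit(const Graph &Succs, int Node, std::vector<bool> &Seen,
                  std::vector<int> &PostOrder, std::mt19937 &Rng) {
  Seen[Node] = true;
  std::vector<int> Children = Succs[Node];
  std::shuffle(Children.begin(), Children.end(), Rng);
  for (int Child : Children)
    if (!Seen[Child])
      visit(Succs, Child, Seen, PostOrder, Rng);
  PostOrder.push_back(Node);
}

// Returns the nodes reachable from Entry in a randomized reverse post-order.
std::vector<int> randomizedRPO(const Graph &Succs, int Entry, unsigned Seed) {
  std::mt19937 Rng(Seed);
  std::vector<bool> Seen(Succs.size(), false);
  std::vector<int> Order;
  visit(Succs, Entry, Seen, Order, Rng);
  std::reverse(Order.begin(), Order.end());
  return Order;
}
```

For the diamond CFG with A=0, B=1, C=2, D=3 (`Graph G = {{1, 2}, {3}, {3}, {}}`), the result always starts with A and ends with D, with B and C in either order depending on the seed.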
-
Jan Voung authored
The Subzero build inside of the LLVM build system turns this on. BUG=none R=ascull@google.com Review URL: https://codereview.chromium.org/1264913005 .
-
- 30 Jul, 2015 2 commits
-
-
Andrew Scull authored
Jump table emission is delayed until offsets are known. X86 local jumps can be near or far. Sandboxing is applied to indirect jumps from the jump table. BUG= R=stichnot@chromium.org, jvoung Review URL: https://codereview.chromium.org/1257283004.
-
Jim Stichnoth authored
After finding a valid linearization of phi assignments, the old approach calls a complicated target-specific method that lowers and ad-hoc register allocates the phi assignments. In the new approach, we use existing target lowering to lower assignments into mov/whatever instructions, and enhance the register allocator to be able to forcibly spill and reload a register if one is needed but none are available. The new approach incrementally updates liveness and live ranges for newly added nodes and variable uses, to avoid having to expensively recompute it all. Advanced phi lowering is enabled now on ARM, and constant blinding no longer needs to be disabled during phi lowering. Some of the metadata regarding which CfgNode a local variable belongs to, needed to be made non-const, in order to add spill/fill instructions to a CfgNode during register allocation. Most of the testing came from spec2k. There are some minor differences in the output regarding stack frame offsets, probably related to the order that new nodes are phi-lowered. The changes related to constant blinding were tested by running spec with "-randomize-pool-immediates=randomize -randomize-pool-threshold=8". Unfortunately, this appears to add about 10% to the translation time for 176.gcc. The cost is clear in the -timing output so it can be investigated later. There is a TODO suggesting the possible cause and solution, for later investigation. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1253833002.
-
- 28 Jul, 2015 2 commits
-
-
John Porto authored
AH is a thorn in the flesh for our X86-64 backend. The assembler was designed to always encode the low 8-bit registers, so %ah would become %spl. While it is true we **could** force %spl to always be encoded as %ah, that would not work if the instruction has a rex prefix. This CL removes references to %ah from TargetX86Base. There used to be 2 uses of ah in the target lowering:
1) To zero-extend %al before an unsigned div:
    mov <<src0>>, %al
    mov 0, %ah
    div <<src1>>
This pattern has been changed to
    xor %eax, %eax
    mov <<src0>>, %al
    div <<src1>>
2) To access the 8-bit remainder for 8-bit division:
    mov %ah, <<dest>>
This pattern has been changed to
    shr $8, %eax
    mov %al, <<dest>>
BUG= https://code.google.com/p/nativeclient/issues/detail?id=4077 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1260163003.
-
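The %ah-free sequence for 8-bit unsigned division can be modeled in plain C++ to check that it yields the same quotient and remainder. This is only an arithmetic model of the register contents; `udiv8` and `DivResult` are hypothetical names and no actual lowering code is shown.

```cpp
#include <cassert>
#include <cstdint>

struct DivResult {
  uint8_t Quotient;
  uint8_t Remainder;
};

// Models the rewritten sequence on a simulated %eax. Src1 must be non-zero.
DivResult udiv8(uint8_t Src0, uint8_t Src1) {
  assert(Src1 != 0);
  uint32_t Eax = 0;            // xor %eax, %eax
  Eax = (Eax & ~0xFFu) | Src0; // mov <<src0>>, %al
  // div <<src1>>: divides %ax by the 8-bit operand, leaving the quotient in
  // %al and the remainder in %ah.
  const uint16_t Ax = static_cast<uint16_t>(Eax & 0xFFFFu);
  const uint8_t Q = static_cast<uint8_t>(Ax / Src1);
  const uint8_t R = static_cast<uint8_t>(Ax % Src1);
  Eax = (Eax & ~0xFFFFu) | (static_cast<uint32_t>(R) << 8) | Q;
  const uint8_t Quot = static_cast<uint8_t>(Eax & 0xFFu); // quotient from %al
  Eax >>= 8;                                              // shr $8, %eax
  const uint8_t Rem = static_cast<uint8_t>(Eax & 0xFFu);  // mov %al, <<dest>>
  return DivResult{Quot, Rem};
}
```

For example, `udiv8(200, 7)` gives quotient 28 and remainder 4, the same values a direct read of %al and %ah would have produced.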
John Porto authored
As part of this CL, x86-32 assembler tests are also introduced. They were implemented before the x86 base assembler template was modified for x86-64 support. BUG=https://code.google.com/p/nativeclient/issues/detail?id=4077 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1224173006.
-
- 23 Jul, 2015 4 commits
-
-
Andrew Scull authored
BUG= R=jvoung@chromium.org, jvoung Review URL: https://codereview.chromium.org/1247833003.
-
Andrew Scull authored
During switch lowering a binary search tree is created. The height of this tree is usually small so no need for heap allocation. BUG= R=jvoung@chromium.org, jvoung, stichnot Review URL: https://codereview.chromium.org/1240323005.
-
Karl Schimpf authored
Added command line flag "--bitcode-as-text", and triggered the acceptance of textual bitcode on this flag. To make sure this isn't added to the sandboxed translator, the reading of bitcode text is also dependent on the translator not being a minimal build. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4222 R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/1215463014 .
-
Andrew Scull authored
After contracting a node reorder it as unreachable even if it hasn't yet been placed. BUG= R=jvoung@chromium.org, jvoung, stichnot Review URL: https://codereview.chromium.org/1246173003.
-
- 21 Jul, 2015 5 commits
-
-
Jan Voung authored
The X86 code was switched out here: https://codereview.chromium.org/1216933015/diff/150001/src/IceTargetLoweringX86Base.h The important bit might be that it's static const char * instead of static IceString. This removes static ctor/dtor for that array, which LTO doesn't seem to be able to optimize out, leaving ARM and MIPS symbols in the X86-only build. After changing it to static const char *, LTO is able to optimize out the ARM and MIPS symbols in the x86-only build, saving about 3KB of .text and a few bytes of .rodata. BUG=none R=jpp@chromium.org Review URL: https://codereview.chromium.org/1246013004 .
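The difference can be illustrated with a small table of names. An array of string literals is constant-initialized and needs no static constructor or destructor, whereas an array of string objects (assuming IceString behaves like std::string) must be constructed at startup and destroyed at exit. The table contents and `getRegName` below are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Constant-initialized: the pointers refer to literals in read-only data, so
// no static ctor/dtor is generated for this array.
static const char *const RegNames[] = {"eax", "ecx", "edx", "ebx"};
constexpr unsigned NumRegs = sizeof(RegNames) / sizeof(RegNames[0]);

// Hypothetical accessor over the table.
const char *getRegName(unsigned RegNum) {
  return RegNum < NumRegs ? RegNames[RegNum] : "<invalid>";
}
```

Because nothing in the table runs code at load time, a link-time optimizer is free to drop it entirely when no caller references it.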
-
John Porto authored
Previously, TargetX8632 was defined as
    class TargetX8632 : public TargetLowering;
and its create method would do
    TargetX8632 *TargetX8632::create() { return TargetX86Base<TargetX8632>::create(); }
TargetX86Base<M> was defined as
    template <class M> class TargetX86Base : public M;
which meant TargetX8632 had no way to access methods defined in TargetX86Base<M>. This used to not be a problem, but with the X8664 backend around the corner it became obvious that the actual TargetX86 targets (e.g., X8632, X8664SysV, X8664Win) would need access to some methods in TargetX86Base (e.g., _mov, _fld, _fstp etc.) This CL changes the class hierarchy to something like
    TargetLowering <-- TargetX86Base<X8632> <-- X8632
                   <-- TargetX86Base<X8664SysV> <-- X8664SysV (TODO)
                   <-- TargetX86Base<X8664Win> <-- X8664Win (TODO)
One problem with this new design is that TargetX86Base<M> needs to be able to invoke methods in the actual backends. For example, each backend will have its own way of lowering llvm.nacl.read.tp. This creates a chicken/egg problem that is solved with (you guessed it) template machinery (some would call it voodoo.) In this CL, as a proof of concept, we introduce the TargetX86Base::dispatchToConcrete template method. It is a very simple method: it downcasts "this" from the template base class (TargetX86Base<TargetX8632>) to the actual (concrete) class (TargetX8632), and then it invokes the requested method. It uses perfect forwarding for passing arguments to the method being invoked, and returns whatever that method returns. A simple proof-of-concept for using dispatchToConcrete is introduced with this CL: it is used to invoke createNaClReadTPSrcOperand on the concrete target class. In a way, dispatchToConcrete is a poor man's virtual method call, without the virtual method call overhead.
BUG= https://code.google.com/p/nativeclient/issues/detail?id=4077 R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/1217443024.
-
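The dispatchToConcrete idea is the curiously recurring template pattern with a forwarding helper. The sketch below is a stripped-down illustration, not the code from the CL: the method signature, the `lowerReadTP` wrapper, and the returned string are placeholders.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct TargetLowering {
  virtual ~TargetLowering() = default;
};

template <class Concrete> class TargetX86Base : public TargetLowering {
protected:
  // Downcasts `this` to the concrete target and invokes Method on it,
  // perfectly forwarding the arguments and returning whatever Method returns.
  template <typename Ret, typename... Params, typename... Args>
  Ret dispatchToConcrete(Ret (Concrete::*Method)(Params...), Args &&...args) {
    return (static_cast<Concrete *>(this)->*Method)(
        std::forward<Args>(args)...);
  }

public:
  // Shared lowering code that needs a target-specific answer.
  std::string lowerReadTP() {
    return dispatchToConcrete(&Concrete::createNaClReadTPSrcOperand);
  }
};

class TargetX8632 : public TargetX86Base<TargetX8632> {
public:
  // Placeholder result; the real method builds an operand, not a string.
  std::string createNaClReadTPSrcOperand() { return "x8632-tp-operand"; }
};
```

Calling `TargetX8632().lowerReadTP()` resolves the target-specific method at compile time through the `static_cast`, so no vtable lookup is involved.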
Andrew Scull authored
BUG= R=stichnot@chromium.org, jvoung@chromium.org Review URL: https://codereview.chromium.org/1248823003.
-
Andrew Scull authored
BUG= R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1245063003.
-
Jan Voung authored
For pc-rel fixups, we have a ConstantRelocatable referring to Foo+0, and the offset "-4" is encoded in the code buffer (but not the ConstantRelocatable object). Thus we need to load from the code buffer in order to get that "-4" instead of just taking the +0 from Foo+0. For non-pc-rel fixups, we have the ConstantRelocatable with a true offset, and we also write that offset into the code buffer (for ELF REL and not RELA, it expects the offset in the code buffer). In this case, we want to choose one and not double-count. BUG=none 176.gcc seemed to be failing when compiled with --filetype=iasm... load address for 64-bit pointers were +8 instead of +4 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1241313002 .
-
- 20 Jul, 2015 1 commit
-
-
Andrew Scull authored
This includes the high level analysis of switches, the x86 lowering, the repointing of targets in jump tables and ASM emission of jump tables. The technique uses jump tables, range test and binary search with worst case O(lg n) which improves the previous worst case of O(n) from a sequential search. Use is hidden by the --adv-switch flag as the IAS emission still needs to be implemented. BUG=None R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/1234803007.
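The O(lg n) bound comes from searching sorted case ranges by bisection rather than testing each case in sequence. A minimal sketch of that search follows; `CaseRange` and `findTarget` are hypothetical names, and the real lowering emits compare-and-branch instructions (or a jump table for dense ranges) rather than running a loop at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A contiguous run of case values that all branch to the same target.
struct CaseRange {
  int64_t Low;
  int64_t High;
  int Target;
};

// Cases must be sorted by Low and must not overlap. Returns the target for
// Value, or DefaultTarget when no range contains it.
int findTarget(const std::vector<CaseRange> &Cases, int64_t Value,
               int DefaultTarget) {
  std::size_t Lo = 0, Hi = Cases.size();
  while (Lo < Hi) {
    const std::size_t Mid = Lo + (Hi - Lo) / 2;
    if (Value < Cases[Mid].Low)
      Hi = Mid; // continue in the left half
    else if (Value > Cases[Mid].High)
      Lo = Mid + 1; // continue in the right half
    else
      return Cases[Mid].Target; // range test succeeded
  }
  return DefaultTarget;
}
```

Each iteration halves the candidate set, so a switch with n ranges needs at most about lg n comparisons before reaching a target or the default.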
-
- 16 Jul, 2015 1 commit
-
-
Jan Voung authored
This way, prelowerPhi can be shared between 32-bit targets (split 64-bit values into 32-bit ones, and legalize undef). Suggestions from template experts on how to share prelowerPhi welcome. I'm not particularly happy with the first pass in that legalizeUndef has to be made public (though other methods used are also public). Also the methods required from the template type TargetT aren't clear without looking through the code. The current advanced phi lowering code depends on lowerPhiAssignments. That is a special case of lowerAssign that does some ad hoc register allocation. The current ad hoc register allocation doesn't work as well when a target may need to spill more than one register. Disable that optimization for ARM for now, until we have a better way that works for ARM, and enable O2 cross testing on ARM. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1223133007 .
-
- 15 Jul, 2015 2 commits
-
-
Jan Voung authored
By factoring out legalizeUndef(), we can use the same logic in prelowerPhis which may help if we ever change the value used (though if we switch from zero-ing out regs to using uninitialized regs, it'll take more work -- e.g., can't return a 64-bit reg). For x86, use legalizeUndef where it's clear that the value is immediately fed to loOperand/hiOperand then another legalize() call. Otherwise, leave the general X = legalize(X); alone where the code is counting on that being the sole legalization. For x86 legalize(const64) is a pass-through, which can then be passed to loOperand/hiOperand nicely. However, for ARM, legalize(const64) may end up trying to copy the const64 to a register, but we don't have 64-bit registers. Instead do legalizeUndef(X) where x86 would have just done legalize(X). This happens to work because legalizeUndef doesn't try to copy to reg, and we immediately pass the result to loOperand/hiOperand() which then passes the result to a real legalization call. Add a few more undef tests. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1233903002 .
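The ordering described above (replace undef first, then split into 32-bit halves, then legalize each half) can be sketched with plain integers. `Operand64` is a hypothetical stand-in for a 64-bit operand, and using zero for undef follows the current behavior mentioned in the message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit operand: either a known constant or undef.
struct Operand64 {
  bool IsUndef;
  uint64_t Value;
};

// Replaces undef with a concrete value (zero here) without trying to copy
// the 64-bit value into a register.
Operand64 legalizeUndef(Operand64 Op) {
  if (Op.IsUndef)
    return Operand64{false, 0};
  return Op;
}

// The 32-bit halves; on a 32-bit target each half is legalized separately.
uint32_t loOperand(Operand64 Op) {
  assert(!Op.IsUndef && "legalizeUndef must run first");
  return static_cast<uint32_t>(Op.Value);
}

uint32_t hiOperand(Operand64 Op) {
  assert(!Op.IsUndef && "legalizeUndef must run first");
  return static_cast<uint32_t>(Op.Value >> 32);
}
```

Only the 32-bit halves ever reach a real legalization step, which is why this works on a target without 64-bit registers.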
-
Jim Stichnoth authored
Specifically, we were ending up with Encoded_Reg_xmm0=0 yet Encoded_Reg_xmm1=10, Encoded_Reg_xmm2=11, etc. It's a mystery as to why this wasn't triggering any failures with filetype!=asm. BUG= none R=jpp@chromium.org Review URL: https://codereview.chromium.org/1231973003.
-
- 13 Jul, 2015 1 commit
-
-
Jan Voung authored
Clang appears to be missing an include path to find bits/c++config.h so we were unable to compile the unsandboxed c++ based cross tests and link against the subzero unsandboxed ARM object files. Work around this for now by finding and including the missing path. Turn on a few ARM cross tests that should be working (mem_intrin and test_strengthreduce -- though the strength-reduction isn't done for ARM). The test_bitmanip still fails, because under Om1 we overflow the stack offset and need to materialize that offset with a register first. Update a few other references that still say x8632. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=jpp@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/1232183002 .
-
- 11 Jul, 2015 1 commit
-
-
John Porto authored
More tests will be added during the AssemblerX8664 implementation (in a separate CL). BUG= R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1231903002.
-
- 10 Jul, 2015 1 commit
-
-
Reed Kotler authored
This changes the run-pnacl-sz script to be more consistent with the other targets. BUG=https://code.google.com/p/nativeclient/issues/detail?id=4167 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1232483003 . Patch from Reed Kotler <reed.kotler@imgtec.com>.
-
- 09 Jul, 2015 2 commits
-
-
Jan Voung authored
Lower stacksave/restore. Lower ctlz, cttz, bswap, and popcount. Popcount is just done with a helper call. Cttz can use the clz instruction after reversing the bits. We can only crosstest stacksave/restore for now which happens to be written in C for the C99 VLAs. I can't seem to compile the CXX crosstests with arm-cross-g++ (missing headers), so I will check that later after resolving the cross compilation issue. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=jpp@chromium.org Review URL: https://codereview.chromium.org/1222943003 .
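The cttz-via-clz trick relies on the identity cttz(x) = clz(bitreverse(x)). A portable model of it is sketched below; the function names are hypothetical, and the loops and masks stand in for what would each be a single instruction on the target (presumably rbit for the reversal).

```cpp
#include <cassert>
#include <cstdint>

// Reverses the bit order of a 32-bit value by swapping progressively larger
// groups of bits.
uint32_t reverseBits(uint32_t X) {
  X = ((X >> 1) & 0x55555555u) | ((X & 0x55555555u) << 1);
  X = ((X >> 2) & 0x33333333u) | ((X & 0x33333333u) << 2);
  X = ((X >> 4) & 0x0F0F0F0Fu) | ((X & 0x0F0F0F0Fu) << 4);
  X = ((X >> 8) & 0x00FF00FFu) | ((X & 0x00FF00FFu) << 8);
  return (X >> 16) | (X << 16);
}

// Counts leading zero bits; returns 32 for an input of 0.
uint32_t countLeadingZeros(uint32_t X) {
  uint32_t N = 0;
  for (uint32_t Mask = 0x80000000u; Mask != 0 && (X & Mask) == 0; Mask >>= 1)
    ++N;
  return N;
}

// Trailing zeros of X are the leading zeros of X with its bits reversed.
uint32_t countTrailingZeros(uint32_t X) {
  return countLeadingZeros(reverseBits(X));
}
```

With this definition an input of 0 yields 32 for both counts, so no special case is needed for zero.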
-
Jan Voung authored
When compiling with DEBUG, there is a problem linking InstMIPS32. It overrides dump, but never defines it. Also, update the code for some recent changes. Namely, we no longer check ALLOW_DUMP but instead check BuildDefs::dump(). Also, the instruction dtors have been deleted. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4167 R=kschimpf@google.com Review URL: https://codereview.chromium.org/1214863019 .
-
- 08 Jul, 2015 1 commit
-
-
Reed Kotler authored
BUG= https://code.google.com/p/nativeclient/issues/detail?id=4167 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1176133004 . Patch from Reed Kotler <reed.kotler@imgtec.com>.
-
- 07 Jul, 2015 1 commit
-
-
John Porto authored
This CL introduces the X86Inst templates. The previous implementation relied on template specialization which did not play nicely with the new design. This required a lot of other boilerplate code (i.e., tons of new named constructors, one for each X86Inst.) This CL also moves X8632 code out of the X86Base{Impl}?.h files so that they are **almost** target agnostic. As we move to adding other X86 targets more methods will be moved to the target-specific trait class (e.g., call/ret/argument lowering.) BUG= https://code.google.com/p/nativeclient/issues/detail?id=4077 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1216933015.
-
- 06 Jul, 2015 3 commits
-
-
Andrew Scull authored
Accidentally resurrected during rebase when it shouldn't have been. BUG= R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1217503011.
-
Andrew Scull authored
There were many // comments used to document classes, functions etc. but those are not picked up by doxygen which expects /// comments. This converts many comments from // to /// in order to improve the generated documentation. BUG= R=jvoung@chromium.org, kschimpf@google.com Review URL: https://codereview.chromium.org/1216963007.
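A small illustration of the two comment styles. The function is a made-up example, not taken from the code base.

```cpp
#include <cassert>

/// Returns the larger of \p A and \p B.
///
/// Triple-slash comments such as this one are extracted by Doxygen and
/// attached to the declaration that follows them.
int maxOf(int A, int B) {
  // A plain double-slash comment stays an implementation note and does not
  // appear in the generated documentation.
  return A > B ? A : B;
}
```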
-
Jan Voung authored
The original arithmetic lowering was introducing some unused mov instructions from legalization (e.g., the upper part of shift num bits -- which should be 0 anyway), and div helper calls don't actually use the legalized parameters (handled separately by lowerCall). These unused instructions cause the Om1 allocator to assert that LRBegin exists but LREnd does not. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=kschimpf@google.com Review URL: https://codereview.chromium.org/1210073017.
-
- 30 Jun, 2015 6 commits
-
-
John Porto authored
As part of the refactoring, this CL moves MachineTraits<TargetX8632> to a separate header. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4077 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1216033004.
-
Karl Schimpf authored
Deals with fact that minimal builds generate simple "generic" error messages, rather than descriptive error messages. BUG=None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1209083005.
-
Jan Voung authored
BUG=none R=jpp@chromium.org, kschimpf@google.com Review URL: https://codereview.chromium.org/1219883003.
-
Jan Voung authored
ARM normally just returns 0 when dividing by 0 with the software and hw implementations, which is different from what X86 does. So, for NaCl, we've modified LLVM to trap by inserting explicit 0 checks. Uses -mattr=hwdiv-arm attribute to decide if 32-bit sdiv/udiv are supported. Also lower the unreachable-inst to a trap-inst, since we need a trap instruction for divide by 0 anyway. Misc: fix switch test under MINIMAL=1, since ARM requires allow_dump for filetype=asm. Random clang-format changes... TODO: check via cross tests BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1214693004.
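The explicit zero check amounts to a guard in front of the divide. In the sketch below the trap is modeled as a flag in the result so the behavior can be observed; `DivOutcome` and `udivWithTrap` are hypothetical names, and real lowered code would branch to a trap instruction instead.

```cpp
#include <cassert>
#include <cstdint>

struct DivOutcome {
  bool Trapped;   // true when the zero check fired
  uint32_t Value; // quotient; meaningless when Trapped is set
};

// Unsigned 32-bit division guarded by an explicit divide-by-zero check.
DivOutcome udivWithTrap(uint32_t Num, uint32_t Den) {
  if (Den == 0)
    return DivOutcome{true, 0}; // stands in for the trap instruction
  return DivOutcome{false, Num / Den};
}
```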
-
Karl Schimpf authored
If the bitcode parser detects that the last block in the function is missing a terminator, generate an error message and insert a terminator instruction. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4214 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1210013005.
-
Jan Voung authored
No bool-folding optimization, just the straightforward compare followed by mov and conditional mov. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1211243005.
-
- 29 Jun, 2015 3 commits
-
-
Andrew Scull authored
A naive implementation of switch lowering using sequential tests for each of the cases. BUG= none TEST=switch-opt.ll R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1213593002.
-
Andrew Scull authored
BUG=none R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/1207823002.
-
John Porto authored
IceCfg::getAssembler() is a template that simply static_casts the CFG's assembler. This could potentially be problematic in the future, so we enabled the (relatively) cheap llvm dyn_cast operator for Assemblers. This CL also renames assembler_mips32.h to IceAssemblerMIPS32.h. BUG= R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1211863004.
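LLVM-style casting works by giving each subclass a static classof() predicate over a kind tag stored in the base class. The sketch below hand-rolls a minimal `dynCast` so that it compiles without LLVM headers; with LLVM available, `llvm::dyn_cast` plays that role. The kind enumerators and class bodies are illustrative only.

```cpp
#include <cassert>

class Assembler {
public:
  enum AssemblerKind { Asm_MIPS32, Asm_X8632 };
  virtual ~Assembler() = default;
  AssemblerKind getKind() const { return Kind; }

protected:
  explicit Assembler(AssemblerKind K) : Kind(K) {}

private:
  const AssemblerKind Kind;
};

class AssemblerX8632 : public Assembler {
public:
  AssemblerX8632() : Assembler(Asm_X8632) {}
  static bool classof(const Assembler *A) { return A->getKind() == Asm_X8632; }
};

class AssemblerMIPS32 : public Assembler {
public:
  AssemblerMIPS32() : Assembler(Asm_MIPS32) {}
  static bool classof(const Assembler *A) {
    return A->getKind() == Asm_MIPS32;
  }
};

// Minimal stand-in for llvm::dyn_cast: returns nullptr on a kind mismatch
// instead of silently reinterpreting the object as static_cast would.
template <typename To, typename From> To *dynCast(From *Ptr) {
  return To::classof(Ptr) ? static_cast<To *>(Ptr) : nullptr;
}
```

The check costs one integer comparison, which is why it is much cheaper than C++ RTTI-based `dynamic_cast`.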
-