- 01 Dec, 2014 1 commit
-
-
Jim Stichnoth authored
In -O2 mode, postLower() is supposed to iterate over just the instructions that were most recently added. Instead, it was iterating all the way to the end of the block, also post-lowering high-level ICE instructions that hadn't yet been lowered. This was basically harmless, given that the spec2k asm code is identical after this patch, but it improves performance. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/721333004
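
A minimal sketch of the fix described above, assuming a hypothetical `LowInst` type and a plain `std::list` rather than Subzero's real instruction classes: the pass is handed an explicit `[Begin, End)` range covering only the newly added instructions, instead of running to the end of the block.

```cpp
#include <list>
#include <string>

// Hypothetical stand-in for a lowered instruction; not Subzero's Inst class.
struct LowInst {
  std::string Name;
  bool PostLowered;
};
typedef std::list<LowInst> LowInstList;

// Post-lowers only the instructions in [Begin, End) and returns how many
// were visited. Instructions at or after End are left untouched.
int postLowerRange(LowInstList::iterator Begin, LowInstList::iterator End) {
  int Visited = 0;
  for (LowInstList::iterator I = Begin; I != End; ++I) {
    I->PostLowered = true;
    ++Visited;
  }
  return Visited;
}
```

Passing `End` explicitly keeps the cost proportional to the number of freshly inserted instructions rather than to the remaining length of the block.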
-
- 26 Nov, 2014 1 commit
-
-
Jim Stichnoth authored
Use a bigger block size in the bump-pointer allocators, since we basically know up front that we'll need lots of memory. The 1MB value (versus the default of 4KB) was chosen somewhat arbitrarily, and succeeds in pretty much removing bump-pointer related mallocs from the profile. Pre-reserve the a priori known number of edges in getTerminatorEdges() to avoid vector resizing. BUG= none R=kschimpf@google.com Review URL: https://codereview.chromium.org/760973002
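
To illustrate why the slab size matters, here is a minimal sketch of a bump-pointer allocator. `BumpAllocator` and `slabsNeeded` are hypothetical names written for this example; this is not the allocator class Subzero actually uses. Each slab costs one `malloc`, so a larger slab means fewer calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical bump-pointer allocator for illustration only.
class BumpAllocator {
public:
  explicit BumpAllocator(std::size_t SlabSize) : SlabSize(SlabSize) {}
  ~BumpAllocator() {
    for (void *Slab : Slabs)
      std::free(Slab);
  }
  BumpAllocator(const BumpAllocator &) = delete;
  BumpAllocator &operator=(const BumpAllocator &) = delete;

  // Returns Size bytes, rounded up to the maximum fundamental alignment.
  // A request must fit within a single slab.
  void *allocate(std::size_t Size) {
    const std::size_t Align = alignof(std::max_align_t);
    Size = (Size + Align - 1) / Align * Align;
    assert(Size <= SlabSize && "request larger than a slab");
    if (Slabs.empty() || Used + Size > SlabSize) {
      void *Slab = std::malloc(SlabSize); // one malloc per slab
      if (!Slab)
        std::abort();
      Slabs.push_back(Slab);
      Used = 0;
    }
    void *Result = static_cast<char *>(Slabs.back()) + Used;
    Used += Size; // "bump" the pointer
    return Result;
  }

  std::size_t numSlabs() const { return Slabs.size(); }

private:
  std::size_t SlabSize;
  std::size_t Used = 0;
  std::vector<void *> Slabs;
};

// Counts the slabs (and therefore mallocs) needed for N allocations of Size
// bytes with the given slab size.
std::size_t slabsNeeded(std::size_t SlabSize, std::size_t N, std::size_t Size) {
  BumpAllocator Alloc(SlabSize);
  for (std::size_t I = 0; I < N; ++I)
    Alloc.allocate(Size);
  return Alloc.numSlabs();
}
```

With these assumptions, 10000 allocations of 64 bytes need 157 slabs at 4KB each but only one slab at 1MB, which is the effect the commit relies on.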
-
- 24 Nov, 2014 1 commit
-
-
Jim Stichnoth authored
We need to link with -lpthread now. The CALLTARGETS workaround in our lit tests can be removed, since llvm-objdump has gotten more accurate than before with respect to symbols. The -stats and -rng-seed options need to be renamed to avoid conflicting with the LLVM options being brought in. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3930 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/756543002
-
- 21 Nov, 2014 1 commit
-
-
JF Bastien authored
R=stichnot@chromium.org BUG= https://code.google.com/p/nativeclient/issues/detail?id=3930 Review URL: https://codereview.chromium.org/752603003
-
- 20 Nov, 2014 1 commit
-
-
Jim Stichnoth authored
Internally, create a separate constant pool for each integer type, instead of a single i64 pool that uses the Ice::Type value as part of the key. This means each constant pool key can be a simple primitive value, rather than a tuple. Represent the pools using std::unordered_map instead of std::map since we're using C++11 now. Use signed integers instead of unsigned integers for the integer constant pools, to benefit from sign extension and to be more consistent. Remove the SuppressMangling field from hash and comparison functions on RelocatableTuple, since we'll never have two symbols with the same name but different values of SuppressMangling. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/737513008
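
The per-type pool layout might look roughly like the following sketch. `ConstantInt`, `IntPool`, and `ConstantPools` are hypothetical stand-ins, not Subzero's real classes; the point is only that each pool is keyed by a plain signed integer in a `std::unordered_map`, with no type tag in the key.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for a pooled integer constant.
struct ConstantInt {
  int64_t Value;
};

// One pool per integer width: the key is just the signed value, so no
// (type, value) tuple or custom hash function is needed.
template <typename KeyT> class IntPool {
public:
  // Returns the unique constant for Key, creating it on first use.
  ConstantInt *getOrAdd(KeyT Key) {
    std::unique_ptr<ConstantInt> &Slot = Pool[Key];
    if (!Slot)
      Slot.reset(new ConstantInt{static_cast<int64_t>(Key)}); // sign-extends
    return Slot.get();
  }
  std::size_t size() const { return Pool.size(); }

private:
  std::unordered_map<KeyT, std::unique_ptr<ConstantInt>> Pool;
};

// The type is implied by which pool is consulted.
struct ConstantPools {
  IntPool<int8_t> Int8s;
  IntPool<int16_t> Int16s;
  IntPool<int32_t> Int32s;
  IntPool<int64_t> Int64s;
};
```

Looking up `-1` twice in `Int32s` yields the same object, while `-1` in `Int8s` is a distinct constant whose stored value is still `-1` thanks to sign extension.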
-
- 18 Nov, 2014 1 commit
-
-
Jim Stichnoth authored
It's not meant to be comprehensive, but rather to help someone new get started, assuming they already have PNaCl working. BUG= none R=jfb@chromium.org, jvoung@chromium.org Review URL: https://codereview.chromium.org/727583003
-
- 17 Nov, 2014 1 commit
-
-
Karl Schimpf authored
Remove the dump/emit routines when ALLOW_DUMP=0. Also fixes some verbosity messages to not print if ALLOW_DUMP=0. Note: emit routines needed for emitIAS are not turned off. BUG=None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/686913005
-
- 14 Nov, 2014 4 commits
-
-
Jim Stichnoth authored
This removes the need for Om1's postLower() code which did its own ad-hoc register allocation. And it actually speeds up Om1 translation significantly. This mode of register allocation only allocates for infinite-weight Variables, while respecting live ranges of pre-colored Variables. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/733643005
-
Jan Voung authored
Seems to be part of the non-sfi link now: https://codereview.chromium.org/686723003/diff/180001/pnacl/driver/pnacl-translate.py Otherwise I get: x86-32-linux/lib/unsandboxed_irt.o:(.rodata+0x68): undefined reference to `nacl_secure_random' BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/726093002
-
Jim Stichnoth authored
Even after earlier simplifications, FakeKill was still handled somewhat inefficiently for the register allocator. For x86-32, any function containing call instructions would result in about 11 pre-colored Variables, each with an identical and relatively complex live range consisting of points. They would start out on the UnhandledPrecolored list, then all move to the Inactive list, where they would be repeatedly compared against each register allocation candidate via overlapsRange(). We improve this by keeping around a single copy of that live range and directly masking out the Free[] register set when that live range overlaps the current candidate's live range. This saves ~10 overlaps() calculations per candidate while FakeKills are still pending. Also, slightly rearrange the initialization of the Unhandled etc. sets into a separate init routine, which will make it easier to reuse the register allocator in other situations such as Om1 post-lowering. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/720343003
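
The masking idea can be sketched as follows, under simplifying assumptions: a live range is modelled as a sorted vector of half-open segments, and register sets are 32-bit masks. `LiveRange`, `overlaps`, and `freeRegsFor` are illustrative names, not the register allocator's real interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical live range: sorted, non-overlapping half-open [Start, End)
// segments of instruction numbers.
typedef std::vector<std::pair<int, int>> LiveRange;

// Returns true if any segment of A intersects any segment of B.
bool overlaps(const LiveRange &A, const LiveRange &B) {
  std::size_t I = 0, J = 0;
  while (I < A.size() && J < B.size()) {
    if (A[I].second <= B[J].first)
      ++I; // A's segment ends before B's begins
    else if (B[J].second <= A[I].first)
      ++J; // B's segment ends before A's begins
    else
      return true;
  }
  return false;
}

// Computes the free register mask for a candidate. KillRange is the single
// shared live range covering all FakeKill points; one overlap test replaces
// a separate test against each pre-colored scratch register.
uint32_t freeRegsFor(const LiveRange &Candidate, const LiveRange &KillRange,
                     uint32_t AllRegs, uint32_t ScratchRegs) {
  uint32_t Free = AllRegs;
  if (overlaps(Candidate, KillRange))
    Free &= ~ScratchRegs;
  return Free;
}
```

A candidate that is live across a call loses all scratch registers in one step; a candidate that lives entirely between calls keeps the full set.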
-
Jim Stichnoth authored
This is purely for convenience of personal testing/debugging. To demonstrate its correctness in this CL, -build-on-read=0 is removed from the two .ll lit tests that explicitly use it, and also from the crosstest.py script. The lit test wrapper run-llvm2ice.py is left unchanged to be safe. BUG= none R=kschimpf@google.com Review URL: https://codereview.chromium.org/732583002
-
- 11 Nov, 2014 1 commit
-
-
Jim Stichnoth authored
Instead, separately compute it during prolog generation via another pass over the Cfg. This may slow down translation by ~1%, but it greatly simplifies the management of this flag/property. The higher motivation is to pull this management out of register allocation to make it easier to extend register allocation for other uses. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/692633004
-
- 06 Nov, 2014 4 commits
-
-
Jim Stichnoth authored
Use LLVM's intrusive list ADT template to implement instruction lists. This embeds prev/next pointers into the instruction, and as such, iterators essentially double as instruction pointers. This means stripping off one level of indirection when dereferencing, and also the range-based for loop can't be used. The performance difference in translation time seems to be 1-2%. I tried to also do this for the much less used PhiList and AssignList, but ran into SFINAE problems. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/709533002
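
For readers unfamiliar with intrusive lists, here is a hand-rolled sketch of the concept. It deliberately does not use LLVM's ilist template; `Inst`, `InstList`, and `joinNames` are hypothetical names. The prev/next links live inside the instruction, so a pointer to the instruction is also a position in the list.

```cpp
#include <string>

// Hypothetical instruction with embedded links.
struct Inst {
  explicit Inst(const std::string &Name) : Name(Name) {}
  std::string Name;
  Inst *Prev = nullptr;
  Inst *Next = nullptr;
};

// Minimal non-owning intrusive doubly linked list.
class InstList {
public:
  void pushBack(Inst *I) {
    I->Prev = Tail;
    I->Next = nullptr;
    if (Tail)
      Tail->Next = I;
    else
      Head = I;
    Tail = I;
  }
  // Unlinks I in O(1) given only the instruction pointer; no search and no
  // separate iterator object are required.
  void remove(Inst *I) {
    if (I->Prev)
      I->Prev->Next = I->Next;
    else
      Head = I->Next;
    if (I->Next)
      I->Next->Prev = I->Prev;
    else
      Tail = I->Prev;
    I->Prev = I->Next = nullptr;
  }
  Inst *front() const { return Head; }

private:
  Inst *Head = nullptr;
  Inst *Tail = nullptr;
};

// Walks the list through the embedded links and joins the names with spaces.
std::string joinNames(const InstList &List) {
  std::string Out;
  for (Inst *I = List.front(); I; I = I->Next) {
    if (!Out.empty())
      Out += ' ';
    Out += I->Name;
  }
  return Out;
}
```

Compared with a `std::list<Inst *>`, there is no separately allocated list node to chase, which is the level of indirection the commit message refers to.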
-
Karl Schimpf authored
This CL allows one to time Subzero's bitcode parsing without IR generation (other than types, function declarations, and uninitialized global variables) for performance testing. BUG=None R=jvoung@chromium.org Review URL: https://codereview.chromium.org/696383004
-
Jan Voung authored
Eventually, I wanted to have a flag "UseELFWriter" like: https://codereview.chromium.org/678533005/diff/120001/src/IceCfg.cpp Where the emit OStream would not have text, and only have binary. This refactor hopefully means fewer places to check for a flag to disable the text version of IAS, and be able to write binary. Otherwise, there are some text labels for branches that are still being dumped out. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/700263003
-
Jim Stichnoth authored
Currently NodeList is defined as std::vector<CfgNode*>, but in the future it may be desirable to change it to something like std::list<CfgNode*> so that it is easier to split edges and insert the new nodes at the right locations, rather than re-sorting them in a separate pass. This gets us closer by using foo.front() instead of foo[0]. There are still a couple more places using the [] operator, but the changes would be more intrusive. Also, a few instances of ".size()==0" are changed to the possibly more efficient ".empty()". BUG= none R=jvoung@chromium.org, kschimpf@google.com Review URL: https://codereview.chromium.org/704753007
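
As a small illustration of why `front()` and `empty()` are the more portable choice, the sketch below compiles unchanged for both `std::vector` and `std::list`. `CfgNode` here is a placeholder struct and `getEntryNode` is a hypothetical helper, not a function from the Subzero sources.

```cpp
#include <list>
#include <vector>

// Placeholder for the real CfgNode class.
struct CfgNode {
  int Index;
};

// Uses only front() and empty(), which every standard sequence container
// provides, so the container type behind NodeList can change freely.
// operator[] would restrict this to random-access containers.
template <typename NodeListT>
CfgNode *getEntryNode(const NodeListT &Nodes) {
  return Nodes.empty() ? nullptr : Nodes.front();
}
```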
-
- 05 Nov, 2014 1 commit
-
-
Jan Voung authored
There is one in IceDefs.h and one in IceGlobalInits.h. Can we just use one? BUG=none (mini cleanup) R=kschimpf@google.com, stichnot@chromium.org Review URL: https://codereview.chromium.org/695563006
-
- 04 Nov, 2014 4 commits
-
-
Jan Voung authored
More consistently use auto while doing llvm::dyn_cast/cast in the emit and emitIAS routines. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/699923002
-
Jim Stichnoth authored
Also deletes a few entirely trivial test files. These tests were useful in the early days of Subzero, but now mostly just clutter things up. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/705513002
-
Jim Stichnoth authored
The FakeKill instruction is always used for killing scratch registers, and a fair amount of effort is spent building new copies of the same scratch register list each time (i.e., for each lowered call instruction). As such, we can create one master list of scratch registers and share it among all FakeKill instructions. Also, in all situations where an instruction's Srcs[] were considered for liveness, we had to either explicitly ignore an InstFakeKill instruction, or treat it specially. Now that InstFakeKill lacks any Srcs[] (or Dest), it doesn't need to be specially ignored, and the code is simplified. In addition, the text asm emitter no longer clutters the output with FakeKill comments (and FakeUse as well). BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/691693003
-
Jim Stichnoth authored
BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/701673002
-
- 03 Nov, 2014 3 commits
-
-
Jim Stichnoth authored
The integrated assembler assumed there is at most one fixup per instruction. For x86, there could actually be two fixups, for any instruction that allows memory and immediate operands at the same time. Using the now-default -build-on-read flag, it happens in spec2k - the smallest function I found where this happens is 176.gcc and perm_tree_cons. This changes the textual emission of integrated assembler code to allow for multiple consecutive fixups in a single instruction. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/693393002
-
Jim Stichnoth authored
Build a pure-Subzero binary when neither --include nor --exclude is specified. A pure-Subzero binary is built without flags: -externalize -ffunction-sections -fdata-sections which is good because that configuration is closer to what real usage will be, and will get more testing during development. BUG= none R=kschimpf@google.com Review URL: https://codereview.chromium.org/693393003
-
Karl Schimpf authored
Adds timers to each bitcode block parser in Subzero, to get a reading on how much time is used by the bitcode parser. BUG=None R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/688543003
-
- 02 Nov, 2014 1 commit
-
-
Jim Stichnoth authored
They are no longer needed now that we aren't using the buggy llvm-mc parser for Intel syntax. This also gets all spec2k components to work with -build-on-read. Also, adds an emit-time check that infinite-weight Variables actually got a physical register. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/696153002
-
- 01 Nov, 2014 3 commits
-
-
Jim Stichnoth authored
The main motivation is that -build-on-read introduces Intel-style asm output like: mov al, byte ptr [flags] and llvm-mc misinterprets the global symbol "flags" as the flags register. Further workarounds will likely cost more effort than switching over to AT&T syntax. Most of the lit tests don't need changing, since the asm text is generated by assembling and disassembling the llvm2ice asm output. There are some LEAHACK TODOs that can be fixed, but that would change some of the instructions, so that can be a separate CL. The Operand emit() routines really ought to be moved entirely into the target-specific source files. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/695993004
-
Jim Stichnoth authored
The -asm-verbose flag adds comments to the text asm output about register availability. Specifically, it prints the registers in use at the beginning and end of each block, and it prints which registers' live ranges end at each instruction. This is extremely helpful when studying the output to find opportunities to improve the code quality. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/682983004
-
Jan Voung authored
The 32-bit validator is now consistent with the 64-bit validator w.r.t. 16-bit shld/shrd and accepts it. We didn't really use the 16-bit form in Subzero though, only the 32-bit one for 64-bit ops, I think. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/696753003
-
- 30 Oct, 2014 3 commits
-
-
Karl Schimpf authored
When compiling using toolchain_build_pnacl.py, we get errors of the form: "Don't use default labels in fully covered switches over enumerations". I tried different combinations of -Wno-covered-switch-default and -Wno-error=covered-switch-default, but was not able to stop this error from being generated. Hence, taking the simpler route of removing -Werror from Makefile. (see www.llvm.org/docs/CodingStandards.html for more details) BUG=None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/686103006
-
Jim Stichnoth authored
Delays Phi lowering until after register allocation. This lets the Phi assignment order take register allocation into account and avoid creating false dependencies. All edges that lead to Phi instructions are split, and the new node gets mov instructions in the correct topological order, using available physical registers as needed. This lowering style is controllable under -O2 using -phi-edge-split (enabled by default). The result is faster translation time (due to fewer temporaries leading to faster liveness analysis and register allocation) as well as better code quality (due to better register allocation and fewer phi-based assignments). BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/680733002
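
Ordering the mov instructions on a split edge is an instance of sequentializing a parallel copy. The sketch below shows one common way to do it, assuming variables are identified by name and that a single spare location `Temp` is available to break cycles. `Copy` and `sequentialize` are hypothetical names, and this is not claimed to be the algorithm Subzero implements.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> Copy; // (Dest, Src)

// Orders a set of parallel copies (distinct destinations) so that no source
// is overwritten before it is read. Temp must not appear in Pending.
std::vector<Copy> sequentialize(std::vector<Copy> Pending,
                                const std::string &Temp) {
  std::vector<Copy> Out;
  // Drop no-op copies such as a <- a.
  for (std::size_t I = 0; I < Pending.size();) {
    if (Pending[I].first == Pending[I].second)
      Pending.erase(Pending.begin() + I);
    else
      ++I;
  }
  while (!Pending.empty()) {
    bool Emitted = false;
    for (std::size_t I = 0; I < Pending.size(); ++I) {
      const std::string Dest = Pending[I].first;
      bool StillRead = false;
      for (std::size_t J = 0; J < Pending.size(); ++J)
        if (J != I && Pending[J].second == Dest)
          StillRead = true;
      if (!StillRead) {
        // Safe: no remaining copy needs the old value of Dest.
        Out.push_back(Pending[I]);
        Pending.erase(Pending.begin() + I);
        Emitted = true;
        break;
      }
    }
    if (Emitted)
      continue;
    // Only cycles remain. Save one destination in Temp and redirect its
    // readers, which turns the cycle into a chain.
    const std::string Dest = Pending.front().first;
    Out.push_back(Copy(Temp, Dest));
    for (Copy &C : Pending)
      if (C.second == Dest)
        C.second = Temp;
  }
  return Out;
}
```

For the parallel copies `a <- b`, `b <- a`, `c <- a`, this produces `c <- a`, `tmp <- a`, `a <- b`, `b <- tmp`. Doing this after register allocation means `Temp` can be whichever physical register happens to be free on that edge.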
-
Jim Stichnoth authored
The file ifatts.py no longer exists. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/687403002
-
- 29 Oct, 2014 3 commits
-
-
Karl Schimpf authored
BUG= R=stichnot@chromium.org Review URL: https://codereview.chromium.org/689433002
-
Karl Schimpf authored
BUG=None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/689753002
-
Karl Schimpf authored
indices are used. Also introduces an "error" instruction method for inserting an instruction placeholder when an instruction is erroneous and the type of the generated value is known. BUG= R=stichnot@chromium.org Review URL: https://codereview.chromium.org/686913003
-
- 27 Oct, 2014 2 commits
-
-
Karl Schimpf authored
Adds conditionality to lit tests in two ways: 1) Allows the use of "; REQUIRES: XXX" lines in lit tests. In this case, the tests defined by the file are only run if all REQUIRES are met. 2) Allows the conditional running of RUN commands, based on build flags. This comes in two subforms. There are predefined %ifX commands that run the command defined by the remaining arguments, if the corresponding %X2i command is applicable. Alternatively, one can use %if with explicit '--att' arguments to define what conditions should be checked. In any case, unlike REQUIRES, the %if commands RUN all the time, but simply generate empty output, rather than the output defined by the following command, if the condition is not met. These latter tests are useful when the same input is to be tested under different conditions, since the REQUIRES form does not allow this. Note that m2i, p2i, l2i, and lc2i are also conditionally controlled, so that they do nothing if the build did not construct the appropriate Subzero translator. This CL replaces https://codereview.chromium.org/644143002 BUG=None R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/659513005
-
Jim Stichnoth authored
The (final) newline is emitted by the caller of emit(), instead of by all the emit() implementations. This sets the stage for being able to add useful comments to the textual asm, such as annotating which registers became free after the instruction. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/681783002
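
A rough sketch of the pattern, with hypothetical names (`AsmInst`, `emitBlock`) rather than Subzero's real emit() signatures: because each instruction stops short of the newline, the caller is free to append a comment on the same line before ending it.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical instruction type for illustration.
struct AsmInst {
  std::string Text;
  // Emits the instruction text without a trailing newline.
  void emit(std::ostream &Str) const { Str << "\t" << Text; }
};

// The caller owns the newline, so it can append an optional per-instruction
// comment (Comments[I], if present and non-empty) before ending the line.
std::string emitBlock(const std::vector<AsmInst> &Insts,
                      const std::vector<std::string> &Comments) {
  std::ostringstream Str;
  for (std::size_t I = 0; I < Insts.size(); ++I) {
    Insts[I].emit(Str);
    if (I < Comments.size() && !Comments[I].empty())
      Str << "\t# " << Comments[I];
    Str << "\n";
  }
  return Str.str();
}
```

If every emit() implementation wrote its own newline, such a comment could only land on the following line.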
-
- 24 Oct, 2014 3 commits
-
-
Jim Stichnoth authored
The only functional change (though not actually visible at this point) is that redundant assignment elimination is moved into a separate pass. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/672393003
-
Jan Voung authored
Currently not testing fixups of forward branches and instead streaming a ".byte (foo - (. + 1))" or ".long (foo - (. + 4))". It should be supported once emitIAS() delays writing things out until after the function is fully emitted (and therefore forward labels have all been bound). BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/673543002
-
Jan Voung authored
Previously, llvm would emit naclcall instead of call to get the effect of bundle-align-to-end, but now it just emits plain calls and aligns plain calls like NaCl binutils would do. Adjust a few Subzero tests. See: https://codereview.chromium.org/647443005/ Currently leaving the nop checks loose in case the integrated assembler has some intermediate stage where it emits the bytes of a call in a "raw" manner (without padding). BUG=none R=kschimpf@google.com Review URL: https://codereview.chromium.org/671193003
-
- 23 Oct, 2014 1 commit
-
-
Jim Stichnoth authored
1. Decorate the list of live-in and live-out variables with register assignments in the dump() output. This helps one to assess register pressure. 2. Fix a bug where the DisableInternal flag wasn't being honored for function definitions. 3. Add a -translate-only=<symbol> option to limit translation to a single function or global variable. This makes it easier to focus on debugging a single function. 4. Change the -no-phi-edge-split option to -phi-edge-split and invert the meaning, to avoid a double negative. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/673783002
-