- 30 Oct, 2014 2 commits
-
-
Jim Stichnoth authored
Delays Phi lowering until after register allocation. This lets the Phi assignment order take register allocation into account and avoid creating false dependencies. All edges that lead to Phi instructions are split, and the new node gets mov instructions in the correct topological order, using available physical registers as needed. This lowering style is controllable under -O2 using -phi-edge-split (enabled by default). The result is faster translation time (due to fewer temporaries leading to faster liveness analysis and register allocation) as well as better code quality (due to better register allocation and fewer phi-based assignments). BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/680733002
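The ordering problem described above can be sketched as parallel-move sequencing. This is a minimal illustration, not Subzero's implementation: the phi assignments on one split edge are treated as a parallel copy, a move is emitted only once no other pending move still reads its destination, and cycles are broken with a spare location. The function name and the `tmp` placeholder are hypothetical; the commit only says that available physical registers are used as needed.

```python
def sequence_parallel_moves(moves, temp="tmp"):
    """Order a parallel copy {dest: src} into sequential (dest, src) moves.

    A move is safe to emit once no other pending move still reads its
    destination. If only cycles remain, one value is saved in `temp`
    (a stand-in for a spare register) to break the cycle.
    """
    # Drop no-op moves such as a <- a.
    pending = {d: s for d, s in moves.items() if d != s}
    ordered = []
    while pending:
        sources = set(pending.values())
        ready = [d for d in pending if d not in sources]
        if ready:
            dest = ready[0]
            ordered.append((dest, pending.pop(dest)))
        else:
            # Every remaining destination is still read by another move,
            # so the remaining moves form one or more cycles.
            victim = next(iter(pending))
            ordered.append((temp, victim))
            # Readers of `victim` now read the saved copy instead.
            for dest in list(pending):
                if pending[dest] == victim:
                    pending[dest] = temp
    return ordered
```

For a swap such as `{"a": "b", "b": "a"}`, the sketch saves `a` in the temporary first, then emits `a <- b` and `b <- tmp`. A chain such as `{"a": "b", "b": "c"}` needs no temporary because `a <- b` can simply be emitted before `b <- c`.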
-
Jim Stichnoth authored
The file ifatts.py no longer exists. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/687403002
-
- 29 Oct, 2014 3 commits
-
-
Karl Schimpf authored
BUG= R=stichnot@chromium.org Review URL: https://codereview.chromium.org/689433002
-
Karl Schimpf authored
BUG=None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/689753002
-
Karl Schimpf authored
indices are used. Also introduces an "error" instruction method for inserting an instruction placeholder when an instruction is erroneous, and the type of the generated value is known. BUG= R=stichnot@chromium.org Review URL: https://codereview.chromium.org/686913003
-
- 27 Oct, 2014 2 commits
-
-
Karl Schimpf authored
Adds conditionality to lit tests in two ways: 1) Allows the use of "; REQUIRES: XXX" lines in lit tests. In this case, the tests defined by the file are only run if all REQUIRES are met. 2) Allows the conditional running of RUN commands, based on build flags. This comes in two subforms. There are predefined %ifX commands that run the command defined by the remaining arguments, if the corresponding %X2i command is applicable. Alternatively, one can use %if with explicit '--att' arguments to define what conditions should be checked. In any case, unlike REQUIRES, the %if commands RUN all the time, but simply generate empty output, rather than the output defined by the following command, if the condition is not met. These latter tests are useful when the same input is to be tested under different conditions, since the REQUIRES form does not allow this. Note that m2i, p2i, l2i, and lc2i are also conditionally controlled, so that they do nothing if the build did not construct the appropriate Subzero translator. This CL replaces https://codereview.chromium.org/644143002 BUG=None R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/659513005
-
Jim Stichnoth authored
The (final) newline is emitted by the caller of emit(), instead of by all the emit() implementations. This sets the stage for being able to add useful comments to the textual asm, such as annotating which registers became free after the instruction. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/681783002
-
- 24 Oct, 2014 3 commits
-
-
Jim Stichnoth authored
The only functional change (though not actually visible at this point) is that redundant assignment elimination is moved into a separate pass. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/672393003
-
Jan Voung authored
Currently not testing fixups of forward branches and instead streaming a ".byte (foo - (. + 1))" or ".long (foo - (. + 4))". It should be supported once emitIAS() delays writing things out until after the function is fully emitted (and therefore forward labels have all been bound). BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/673543002
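The deferred approach mentioned above (write forward-branch displacements only after all labels are bound) can be sketched as a simple fixup list. This is an illustration under assumptions, not the emitIAS() code: the class and method names are hypothetical, and the displacement formula mirrors the `foo - (. + 1)` and `foo - (. + 4)` expressions quoted in the commit.

```python
import struct


class AsmBuffer:
    """Toy code buffer that patches label displacements at the end."""

    def __init__(self):
        self.code = bytearray()
        self.fixups = []   # (label, offset of displacement, size in bytes)
        self.labels = {}   # label -> bound offset

    def emit_bytes(self, data):
        self.code += bytes(data)

    def emit_rel(self, label, size):
        """Reserve a `size`-byte (1 or 4) displacement to `label`."""
        self.fixups.append((label, len(self.code), size))
        self.code += bytes(size)  # zero placeholder, patched in finalize()

    def bind(self, label):
        self.labels[label] = len(self.code)

    def finalize(self):
        """Patch every displacement once all labels are bound."""
        for label, offset, size in self.fixups:
            # Relative to the end of the displacement field:
            # label - (. + size)
            disp = self.labels[label] - (offset + size)
            fmt = "<b" if size == 1 else "<i"
            self.code[offset:offset + size] = struct.pack(fmt, disp)
        return bytes(self.code)
```

As a usage example, emitting `0xEB` (x86 short jmp), a 1-byte displacement to `foo`, two `0x90` nops, then binding `foo` and emitting `0xC3` (ret) yields the bytes `EB 02 90 90 C3` after `finalize()`.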
-
Jan Voung authored
Previously, llvm would emit naclcall instead of call to get the effect of bundle-align-to-end, but now it just emits plain calls and aligns plain calls like NaCl binutils would do. Adjust a few Subzero tests. See: https://codereview.chromium.org/647443005/ Currently leaving the nop checks loose in case the integrated assembler has some intermediate stage where it emits the bytes of a call in a "raw" manner (without padding). BUG=none R=kschimpf@google.com Review URL: https://codereview.chromium.org/671193003
-
- 23 Oct, 2014 1 commit
-
-
Jim Stichnoth authored
1. Decorate the list of live-in and live-out variables with register assignments in the dump() output. This helps one to assess register pressure. 2. Fix a bug where the DisableInternal flag wasn't being honored for function definitions. 3. Add a -translate-only=<symbol> to limit translation to a single function or global variable. This makes it easier to focus on debugging a single function. 4. Change the -no-phi-edge-split option to -phi-edge-split and invert the meaning, to better not avoid the non double negatives. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/673783002
-
- 21 Oct, 2014 1 commit
-
-
Jan Voung authored
Helps make it work with p2i instead of lc2i. This affected the address mode optimizations, so some of the test expectations have changed. BUG=none (happened to notice it while trying to test some things manually) R=stichnot@chromium.org Review URL: https://codereview.chromium.org/671443003
-
- 20 Oct, 2014 2 commits
-
-
Karl Schimpf authored
The definition of ExternName4 in crosstest/test_global_main.cpp was not defined as a pointer, but was in crosstest/test_global.cpp. As a result, that name was not linked properly. BUG=None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/651673003
-
Karl Schimpf authored
Fixes a bug in the representation of relocation names (in either global initializers or as constant expressions in code) so that they understand when the name is externally defined. This allows us to test this property using command line arguments, and fixes relocation tests in cross compilations (where externally referenced names shouldn't be name mangled). BUG= R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/667763002
-
- 16 Oct, 2014 2 commits
-
-
Jan Voung authored
Similar to https://codereview.chromium.org/656123003/, but cover some of the assembler files which were avoided to avoid conflicts. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/643903006
-
Jan Voung authored
We can't do direct calls via the .long sym hack, since that is normally for an absolute relocation, but calls are expecting relative relocations (except for reg/mem forms). Nop-out the InstFake emitIAS methods. Remove the generic dispatcher that redirects emitIAS() to emit(), since only branches and labels are left. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/647193003
-
- 15 Oct, 2014 5 commits
-
-
Jan Voung authored
Force dest to be the full 32-bit reg instead of sometimes being a 16-bit reg. This is to save on an operand-size prefix (and avoid passing the DestTy down to the dispatchers). BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/647223004
-
Jim Stichnoth authored
Currently, O2 calls VariablesMetadata::init() 4 times: - Twice for liveness analysis, where only multi-block use information is needed for dealing with sparse bit vectors. - Once for address mode inference, where single-definition information is needed. - Once for register allocation, where all information is needed, including the set of all definitions which is needed for determining AllowOverlap. So we limit the amount of data we gather based on the actual need. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/650613003
-
Jim Stichnoth authored
For consistency, put deleted ctors at the beginning of the class definition. If the default copy ctor or assignment operator is not deleted, and the default implementation is used, leave it commented out to indicate it is intentional. Also, fixed one C++11 related TODO. BUG= none R=jvoung@chromium.org, kschimpf@google.com Review URL: https://codereview.chromium.org/656123003
-
Jim Stichnoth authored
This removes the redundancy between live ranges stored in the Variable and those stored in Liveness, by removing the Liveness copy. After liveness analysis, live ranges are constructed directly into the Variable. Also, the LiveRangeWrapper is removed and Variable * is directly used instead. The original thought behind LiveRangeWrapper was that it could be extended to include live range splitting. However, when/if live range splitting is implemented, it will probably involve creating a new variable with its own live range, and carrying around some extra bookkeeping until the split is committed, so such a wrapper probably won't be needed. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/656023002
-
Jan Voung authored
Give a different name to the crosstest .s and .o files depending on the CPU features as well. That way the SSE2 and SSE4.1 .s and .o are separate. The encodings for Pextrw and Pextrb/d... make me sad. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/656983002
-
- 14 Oct, 2014 2 commits
-
-
Jim Stichnoth authored
This adds update counts to the output, e.g.: Total across all functions - Flat times: 0.262297 (13.0%): [ 1287] linearScan 0.243965 (12.1%): [ 1287] emit ... This is useful to know when some passes are called once per function and others are called several times per function. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/655563005
-
Jim Stichnoth authored
The key performance problem was that the per-block LiveBegin and LiveEnd vectors were dense with respect to the multi-block "global" variables, even though very few of the global variables are ever live within the block. This led to large vectors needlessly initialized and iterated over. The new approach is to accumulate two small vectors of <variable,instruction_number> tuples (LiveBegin and LiveEnd) as each block is processed, then sort the vectors and iterate over them in parallel to construct the live ranges. Some of the anomalies in the original liveness analysis code have been straightened out: 1. Variables have an IgnoreLiveness attribute to suppress analysis. This is currently used only on the esp register. 2. Instructions have a DestNonKillable attribute which causes the Dest variable not to be marked as starting a new live range at that instruction. This is used when a variable is non-SSA and has more than one assignment within a block, but we want to treat it as a single live range. This lets the variable have zero or one live range begins or ends within a block. DestNonKillable is derived automatically for two-address instructions, and annotated manually in a few other cases. This is tested by comparing the O2 asm output in each Spec2K component. In theory, the output should be the same except for some differences in pseudo-instructions output as comments. However, some actual differences showed up, related to the i64 shl instruction followed by trunc to i32. This turned out to be a liveness bug that was accidentally fixed. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/652633002
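The sort-and-merge step described above can be sketched as follows. This is a simplified illustration, not the Subzero code: variables are plain integer indices, and the handling of a variable that appears in only one of the two lists (treated here as live from the block's first instruction or to its last) is an assumption made for the example. The function and parameter names are hypothetical.

```python
def build_segments(live_begin, live_end, block_first, block_last):
    """Merge sparse (variable, instruction_number) lists into segments.

    Returns a list of (variable, start, end) tuples for one block.
    Each variable is assumed to appear at most once per list.
    """
    begins = sorted(live_begin)
    ends = sorted(live_end)
    segments = []
    i = j = 0
    # Walk both sorted lists in parallel, keyed on the variable index.
    while i < len(begins) or j < len(ends):
        bvar = begins[i][0] if i < len(begins) else None
        evar = ends[j][0] if j < len(ends) else None
        if evar is None or (bvar is not None and bvar < evar):
            # Begins here with no recorded end: live to the block's end.
            segments.append((bvar, begins[i][1], block_last))
            i += 1
        elif bvar is None or evar < bvar:
            # Ends here with no recorded begin: live from the block's start.
            segments.append((evar, block_first, ends[j][1]))
            j += 1
        else:
            # Same variable in both lists: begins and ends in this block.
            segments.append((bvar, begins[i][1], ends[j][1]))
            i += 1
            j += 1
    return segments
```

The cost is proportional to the number of variables that actually begin or end in the block (plus the sort), rather than to the total number of multi-block variables, which is the point of the change.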
-
- 13 Oct, 2014 4 commits
-
-
Jim Stichnoth authored
1. Use a sorted std::vector instead of std::set to improve management of the Unhandled sets. This is the main performance gain. 2. Use std::list.splice() to move items between lists, instead of erase()+push_back(). This doesn't really save much, but the intention is somewhat clearer. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/642603005
-
Jan Voung authored
Currently, this checks and emits the segment override only for GPR instructions, assuming it's mostly only used for nacl.read.tp. The code will assert when used in other situations. The lea hack is still tested in some files, but it's not emitted with emitIAS, and instead the "immediate" operand now has a fixup. There is a more compact encoding for "mov eax, moffs32", etc., but that isn't used right now. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/649463002
-
Karl Schimpf authored
Introduces the notion of a function address, to replace using LLVM IR's Function class. Modifies Ice converter, and Subzero's bitcode reader, to build function addresses. BUG=None R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/641193002
-
Jan Voung authored
BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/650573002
-
- 09 Oct, 2014 1 commit
-
-
Jan Voung authored
BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/634333002
-
- 08 Oct, 2014 5 commits
-
-
Jan Voung authored
BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/640603002
-
Jim Stichnoth authored
Also changes the szbuild.py script to add -fdata-sections, and entirely removes the -disable-globals option. The global initializer emission basically copies what llc does, based on 3 properties of the global: constant vs non-constant, internal vs external, and full zero-initializer vs non-trivial initializer. BUG= none R=jvoung@chromium.org, kschimpf@google.com Review URL: https://codereview.chromium.org/631383003
-
Karl Schimpf authored
Changes Subzero's bitcode reader to build and store ICE types, instead of using LLVM's types. Note: This code doesn't remove all uses of LLVM types. They are still used to check types for instructions and to generate function addresses. BUG=None R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/625243002
-
Jim Stichnoth authored
Makes sure the percentages represent only the function(s) focused on, and not with respect to the total translation time across all functions. Reset the timings between functions so that --timing-focus=* gives reasonable numbers. Also, adds a timer for the live range construction phase. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/640713003
-
Jan Voung authored
Add the SZ runtime functions for unsigned conversion. Add some more cast tests before doing emitIAS for cvt. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/639543002
-
- 07 Oct, 2014 4 commits
-
-
Jan Voung authored
Since push isn't used for args passing anymore, the cases of handling push for vectors and floats/doubles aren't needed anymore. Passing vectors requires a bit more care with alignment, so that was changed. I can imagine push needing to handle addresses later (at least on x86-64 to push the lower 32-bits of return address), but for now, this means only handling GPRs. The XMM registers are not callee saved. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/633553003
-
Jim Stichnoth authored
The main optimization is for the repeated overlaps() calls against the Inactive set, by iteratively trimming away the early sections of the Inactive live ranges that can no longer overlap with Cur. A more minor optimization doesn't bother checking pure point-valued Inactive ranges for expiring or reactivating. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/627203002
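The trimming idea above can be sketched with a cursor into each range's sorted segment list. This is a minimal illustration under assumptions: segments are half-open `(start, end)` pairs, which may not match Subzero's exact overlap rules, and the class and method names are hypothetical. Because Cur ranges are visited in increasing start order, a segment that ends before the current start can never overlap a later Cur either, so it is safe to skip it permanently.

```python
class LiveRange:
    """Sorted list of half-open (start, end) segments with a trim cursor."""

    def __init__(self, segments):
        self.segments = sorted(segments)
        self.first = 0  # index of the first segment not yet trimmed away

    def trim(self, point):
        """Permanently skip segments that end at or before `point`."""
        while (self.first < len(self.segments)
               and self.segments[self.first][1] <= point):
            self.first += 1

    def overlaps(self, other):
        """Two-pointer overlap test that starts from the trim cursors."""
        a, b = self.segments, other.segments
        i, j = self.first, other.first
        while i < len(a) and j < len(b):
            if a[i][1] <= b[j][0]:
                i += 1          # a's segment lies entirely before b's
            elif b[j][1] <= a[i][0]:
                j += 1          # b's segment lies entirely before a's
            else:
                return True
        return False
```

Calling `trim(cur_start)` on each Inactive range before the `overlaps()` checks keeps repeated checks from rescanning early segments that can no longer matter.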
-
Karl Schimpf authored
Modifies both LLVM to ICE converter, and Subzero's bitcode reader, to build Subzero's global initializers. Modifies target lowering routines for global initializers to use this new model. Also modifies both to now handle relocations in global variable initializers. BUG=None R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/624663002
-
Jim Stichnoth authored
--timing-funcs - Produces a sorted list of total time spent translating each function. --timing-focus=<F> - Turns on the --timing equivalent just for one function. Use '*' to do this for all functions, i.e. get complete timing breakdowns across all functions. --verbose-focus=<F> - Temporarily turns on --verbose=all for one function. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/620373004
-
- 06 Oct, 2014 1 commit
-
-
Jan Voung authored
The "test" instruction is used in very limited situations. I've made a best effort to fill in the possible forms (address for the first operand), but it's not tested, so I put the *untested* parts behind an assert. Otherwise it's very similar to icmp, so if it starts to be used and tested then the asserts can be taken out, and the code shared with icmp. Tighten some of the XMM dispatch/emitters. Most of those XMM instructions can only encode the variant where dest is a register. Rather than waste a slot for a NULL method pointer, just make the struct type have two variants instead of three. Fill out a couple of XMM instructions which *do* allow mem-ops as dest (mov instructions). BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/624263002
-
- 04 Oct, 2014 2 commits
-
-
Jim Stichnoth authored
Call instruction lowering includes the FakeKill instruction, which creates several precolored variables, one for each scratch register. The live range for each of these variables consists of a set of "point" ranges, one point for every FakeKill instruction. The overlaps() logic is such that a point range never overlaps with an individual instruction, but it can overlap with a normal non-point range. It turns out that during register allocation, usually most of the variables on the Inactive list come from these FakeKill instructions. The live range representation can be quite large if there are many calls in the function. In the "Check for inactive ranges that have expired or reactivated" section, a lot of time was spent on overlapsStart() calls that were doomed to return false. This change lets the live range keep track of whether it contains non-point segments, and if not, optimize the overlaps(InstNumberT) method. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/631483002
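The fast path described above can be sketched as a flag maintained while segments are added. This is an illustration only: the class, attribute, and method names are hypothetical, and a point segment is modeled as `start == end` with half-open overlap semantics, which is an assumption rather than Subzero's exact rule.

```python
class PointAwareLiveRange:
    """Live range that remembers whether any segment is a non-point one."""

    def __init__(self):
        self.segments = []
        self.has_nonpoint = False

    def add_segment(self, start, end):
        self.segments.append((start, end))
        if start != end:
            self.has_nonpoint = True

    def overlaps_inst(self, inst):
        """Does this range overlap a single instruction number?"""
        if not self.has_nonpoint:
            # A range made only of points never overlaps one instruction,
            # so the scan over a potentially long segment list is skipped.
            return False
        return any(start <= inst < end for start, end in self.segments)
```

With this, a range built only from FakeKill-style points answers `overlaps_inst()` in constant time no matter how many calls the function contains, while ordinary ranges still take the full scan.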
-
Jan Voung authored
For the integer shift ops, since the Src1 operand is forced to be an immediate or register (cl), it should be legal to have Dest+Src0 be either register or memory. However, we are currently only using the register form. It might be the case that shifts with Dest+Src0 as mem are less optimized on some micro-architectures though, since they have to load, shift, and store all in one operation, but I'm not sure. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/622113002
-