- 04 Oct, 2014 1 commit
-
-
Jan Voung authored
For the integer shift ops, since the Src1 operand is forced to be an immediate or a register (cl), it should be legal to have Dest+Src0 be either a register or memory. However, we currently use only the register form. Shifts with Dest+Src0 in memory might be less optimized on some micro-architectures, since they have to load, shift, and store all in one operation, but I'm not sure. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/622113002
-
- 02 Oct, 2014 1 commit
-
-
Jim Stichnoth authored
A lot of time was being spent in the two loops that check precolored ranges in the Unhandled set, specifically in the endsBefore() check. Solve this by keeping a shadow copy of Unhandled, restricted to the ranges that are precolored. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/622553003
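The shadow-copy idea can be sketched roughly as follows. This is a simplified illustration, not the Subzero implementation: `LiveRange`, `Scanner`, `add()` and `conflictsWithPrecolored()` are hypothetical names, and only `Unhandled`, `endsBefore()` and the notion of a precolored range come from the message above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified live range covering [Start, End).
struct LiveRange {
  int Start;
  int End;
  int PrecoloredReg; // -1 means "not precolored" (assumption for this sketch)
  bool endsBefore(const LiveRange &Other) const { return End <= Other.Start; }
  bool overlaps(const LiveRange &Other) const {
    return !endsBefore(Other) && !Other.endsBefore(*this);
  }
};

// Hypothetical scanner state: the full Unhandled set plus a shadow copy
// that holds only the precolored ranges.
struct Scanner {
  std::vector<LiveRange> Unhandled;
  std::vector<LiveRange> UnhandledPrecolored;

  void add(const LiveRange &R) {
    Unhandled.push_back(R);
    if (R.PrecoloredReg >= 0)
      UnhandledPrecolored.push_back(R); // keep the shadow copy in sync
  }

  // Checks Cur only against the (usually much smaller) precolored subset
  // instead of walking every unhandled range.
  bool conflictsWithPrecolored(const LiveRange &Cur, int Reg) const {
    for (const LiveRange &R : UnhandledPrecolored)
      if (R.PrecoloredReg == Reg && R.overlaps(Cur))
        return true;
    return false;
  }
};
```

The saving comes from the loop bound: the check iterates over the precolored subset only, which is typically a small fraction of all unhandled ranges.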
-
- 01 Oct, 2014 5 commits
-
-
Jim Stichnoth authored
sed -i 's/LLVM_DELETED_FUNCTION/= delete/' src/*.{h,cpp} BUG= https://codereview.chromium.org/512933006/ R=jfb@chromium.org Review URL: https://codereview.chromium.org/619983002
-
Jim Stichnoth authored
Use C++11 'auto' where practical to make iteration more concise. Use C++11 range-based for loops where possible. BUG= none R=jfb@chromium.org, kschimpf@google.com Review URL: https://codereview.chromium.org/619893002
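As a generic illustration of the kind of rewrite described (the function names below are made up for this sketch and are not taken from the Subzero sources), an explicit iterator loop and its range-based equivalent might look like this:

```cpp
#include <cassert>
#include <vector>

// Pre-C++11 style: the iterator type is spelled out in full.
int sumWithIterators(const std::vector<int> &Values) {
  int Sum = 0;
  for (std::vector<int>::const_iterator I = Values.begin(), E = Values.end();
       I != E; ++I)
    Sum += *I;
  return Sum;
}

// C++11 style: 'auto' plus a range-based for loop, same behavior.
int sumWithRangeFor(const std::vector<int> &Values) {
  int Sum = 0;
  for (auto V : Values)
    Sum += V;
  return Sum;
}
```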
-
Jim Stichnoth authored
BUG= none R=jvoung@chromium.org, kschimpf@google.com Review URL: https://codereview.chromium.org/618313003
-
Jim Stichnoth authored
1. Setting command-line make variable NOASSERT=1 adds -DNDEBUG and builds in a separate directory. By default, we still get Release+Asserts.
2. Add "(void)foo;" as necessary when foo is only used in an assert(), to remove warnings.
3. Minimize inclusion of llvm/Support/Timer.h because it adds warnings.
4. Call validateLiveness() only when asserts are enabled, because it's relatively expensive.
BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/623493002
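The "(void)foo;" idiom from item 2 can be illustrated with a small, generic example. `popLast()` and `WasNonEmpty` are hypothetical names chosen for this sketch:

```cpp
#include <cassert>
#include <vector>

// Removes and returns the last element of Stack.
int popLast(std::vector<int> &Stack) {
  const bool WasNonEmpty = !Stack.empty();
  assert(WasNonEmpty);
  // With -DNDEBUG the assert() above expands to nothing, so WasNonEmpty
  // would otherwise be reported as an unused variable.
  (void)WasNonEmpty;
  int Top = Stack.back();
  Stack.pop_back();
  return Top;
}
```

The cast to void has no runtime effect; it only counts as a use of the variable so that NDEBUG builds compile without the warning.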
-
Jim Stichnoth authored
While I'm at it, normalize the #include order:
1. C++ library headers
2. LLVM headers
3. Subzero headers
A blank line between each group. Each group sorted alphabetically, case-insensitive.
BUG= https://code.google.com/p/nativeclient/issues/detail?id=3930 R=jfb@chromium.org, jvoung@chromium.org Review URL: https://codereview.chromium.org/622443002
-
- 30 Sep, 2014 3 commits
-
-
Jim Stichnoth authored
This makes it much more useful for individual analysis and long-term translation performance tracking.
1. Collect and report timings aggregated across the entire translation, instead of function-by-function. If you really care about a single function, just extract it and translate it separately for analysis.
2. Remove "-verbose time" and just use -timing.
3. Collect two kinds of timings: cumulative and flat. Cumulative measures the total time, even if a callee also times itself. Flat only measures the currently active timer at the top of the stack. The flat times should add up to 100%, but cumulative will usually add up to much more than 100%.
BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/610813002
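The difference between flat and cumulative timings can be sketched with a small timer stack. This is an illustration under stated assumptions, not the Subzero timer code: `TimerStack`, `push()` and `pop()` are hypothetical, timestamps are passed in explicitly as plain tick counts so the example is deterministic, and recursive (re-entrant) timers are not handled.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

class TimerStack {
public:
  // Starts a timer named Name at time Now.
  void push(const std::string &Name, long Now) {
    attribute(Now);
    Stack.push_back({Name, Now});
  }

  // Stops the timer at the top of the stack at time Now.
  void pop(long Now) {
    assert(!Stack.empty());
    attribute(Now);
    const Entry &Top = Stack.back();
    // Cumulative: everything between push and pop, callees included.
    Cumulative[Top.Name] += Now - Top.Start;
    Stack.pop_back();
  }

  std::map<std::string, long> Flat;
  std::map<std::string, long> Cumulative;

private:
  struct Entry {
    std::string Name;
    long Start;
  };

  // Flat: charge the time since the last push/pop to the active (top) timer.
  void attribute(long Now) {
    if (!Stack.empty())
      Flat[Stack.back().Name] += Now - LastEvent;
    LastEvent = Now;
  }

  std::vector<Entry> Stack;
  long LastEvent = 0;
};
```

For a hypothetical "translate" timer running from tick 0 to 10 with a nested "regalloc" timer from tick 2 to 7, the flat times are 5 and 5 (summing to the total of 10), while the cumulative times are 10 and 5 (summing to 15).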
-
Jim Stichnoth authored
This makes it much easier to copy/paste the output. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/611983003
-
Jan Voung authored
Be sure to legalize 8-bit imul immediates (there is only the r/m form). Add a test for that, and cover a couple of other ops too... There is a one-byte-shorter form when Dest/Src0 == EAX and Src1 is not an immediate, but that isn't taken advantage of. Go ahead and add the optimization for 8-bit immediates for i16/i32 (not allowed for i8). It shows up sometimes in spec, e.g., to multiply by 10. There is a lot of multiply by 4 as well, that we could strength-reduce. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/617593002
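The immediate-size choice for 32-bit imul can be sketched as below. This is not the Subzero assembler: `fitsInInt8()` and `encodeImul32()` are hypothetical helpers, and only the register-register form is covered. The opcodes are the standard x86 ones: 0x6B /r ib for a sign-extended 8-bit immediate and 0x69 /r id for a full 32-bit immediate.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// True if Imm is representable as a sign-extended 8-bit immediate.
bool fitsInInt8(int32_t Imm) { return Imm >= -128 && Imm <= 127; }

// Encodes "imul Dst, Src, Imm" for 32-bit registers (encodings 0-7).
std::vector<uint8_t> encodeImul32(unsigned Dst, unsigned Src, int32_t Imm) {
  assert(Dst < 8 && Src < 8);
  std::vector<uint8_t> Bytes;
  // ModRM with mod=11 (register direct), reg=Dst, rm=Src.
  const uint8_t ModRM = static_cast<uint8_t>(0xC0 | (Dst << 3) | Src);
  if (fitsInInt8(Imm)) {
    Bytes.push_back(0x6B); // short form: 8-bit immediate
    Bytes.push_back(ModRM);
    Bytes.push_back(static_cast<uint8_t>(Imm));
  } else {
    Bytes.push_back(0x69); // long form: 32-bit immediate, little-endian
    Bytes.push_back(ModRM);
    const uint32_t U = static_cast<uint32_t>(Imm);
    for (int I = 0; I < 4; ++I)
      Bytes.push_back(static_cast<uint8_t>(U >> (8 * I)));
  }
  return Bytes;
}
```

With this sketch, multiplying eax by 10 takes three bytes (6B C0 0A) instead of the six bytes needed by the 32-bit immediate form.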
-
- 29 Sep, 2014 3 commits
-
-
Jan Voung authored
For some arithmetic assembler methods, instead of checking IceType_i8 || IceType_i1, only allow IceType_i8 and assert if an i1 leaked to that stage (should have been vetted earlier by the bitcode reader / ABI checks). Could have looked up the type width and isIntegerArithmeticType, etc. in the property table, but that seemed a bit heavy for just checking one type (or one of two types).
Also changed some f32 || f64 checks into just using isScalarFloatingType(), which looks things up in a property table. Could alternatively just keep it as a simple f32 || f64 check, and I could change isScalarFloatingType()'s implementation.
In some places where we assume something is either i32 or i64 and do a select, change that into using a helper function so that we can do one compare, and then assert. Some of the asserts are really redundant (already within a branch which already checked that), but hopefully that disappears if we compile in release mode. Similar for f32 or f64 (which happened a lot in the assembler).
BUG=none R=kschimpf@google.com, stichnot@chromium.org Review URL: https://codereview.chromium.org/613483002
-
Jim Stichnoth authored
BUG= none R=dschuff@chromium.org, jvoung@chromium.org Review URL: https://codereview.chromium.org/610273004
-
Jim Stichnoth authored
The operand type needs to be propagated into EmitImmediate() and EmitComplex() so that we know whether to emit the 2-byte or 4-byte form. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/607353002
-
- 27 Sep, 2014 1 commit
-
-
Jim Stichnoth authored
Separate objects are built with -O0 and -O2. Separate executables are built:
build/Release/llvm2ice - Release build
build/Debug/llvm2ice - Debug build
The executable built depends on whether the DEBUG make variable is set:
make -f Makefile.standalone
make -f Makefile.standalone DEBUG=1
The llvm2ice file in the top-level directory is always removed and symlinked to the appropriate build.
BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/605093002
-
- 26 Sep, 2014 6 commits
-
-
Jan Voung authored
Add a test to check that the encodings are efficient for immediates (choosing the i8 and eax encodings when appropriate). The .byte syntax breaks NaCl bundle straddle checking in llvm-mc, so I had to change one of the tests which noted that a nop appeared (it no longer does). This also assumes that _add(), etc. are usually done with _add(T, ...) and then _mov(dst, T) so that the dest is always a register. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/604873003
-
Jim Stichnoth authored
Subzero translation is stable enough that szbuild.py should prefer Subzero-translated symbols by default. The exception is that if you explicitly use --include, the intuitive interpretation is that you only want Subzero to include those symbols (minus any given with --exclude). BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/605283002
-
Jim Stichnoth authored
Not necessary for the LLVM 3.5 merge, but nice to have anyway. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3930 R=jfb@chromium.org, jvoung@chromium.org Review URL: https://codereview.chromium.org/605123002
-
Karl Schimpf authored
szdiff is an approximate match tool used in early tests. When Subzero's bitcode reader tests already exist for failing cases of szdiff, remove the broken tests. BUG=None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/609813003
-
Karl Schimpf authored
This test was previously failing because insertelement returned the wrong type. However, a previous CL fixed this problem and the test now works with Subzero's bitcode reader. BUG=None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/605273002
-
Jim Stichnoth authored
This just adds -std=c++11 to the compiler flags and fixes the resulting errors/warnings. Later CLs can fix things related to the LLVM 3.5 merge. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3930 R=jfb@chromium.org, kschimpf@google.com Review URL: https://codereview.chromium.org/607443003
-
- 25 Sep, 2014 3 commits
-
-
Karl Schimpf authored
Instruction insertelement was incorrectly generating a result corresponding to the element type, instead of the updated vector type. BUG= None R=jvoung@chromium.org Review URL: https://codereview.chromium.org/604023003
-
Jim Stichnoth authored
Originally, for a given Variable, register preference and overlap were manually specified. That is, when choosing a free register for a Variable, it would be manually specified which (if any) related Variable would be a good choice for register selection, all things being equal. Also, it allowed the rather dangerous "AllowOverlap" specification which let the Variable use its preferred Variable's register, even if their live ranges overlap. Now, all this selection is automatic, and the machinery for manual specification is removed.
A few other changes in this CL:
- Address mode inference leverages the more precise
- Better regalloc dump messages to follow the logic
- "-verbose most" enables all verbose options except regalloc and time
- "-ias" is an alias for "-integrated-as"
- Bug fix: prevent 8-bit register ah from being used in register allocation, unless it is pre-colored
- Bug fix: the _mov helper where Dest is NULL wasn't always actually creating a new Variable
- A few tests are updated based on slightly different O2 register allocation decisions
The static stats actually improve slightly across the board (around 1%), except that frame size improves by 6-10%. This is probably from smarter register allocation decisions, particularly involving phi lowering temporaries, where the manual hints weren't too good to start with.
BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/597003004
-
Karl Schimpf authored
Adds the python script run-llvm2ice.py (was llvm2iceinsts.py) that automatically handles conversion of LLVM source to a PEXE file, and then runs llvm2ice on the corresponding PEXE file. Also, defines three paths in tests, based on the executable chosen:
%lc2i - Directly reads from LLVM source, and converts to Subzero.
%l2i - Parses a PEXE file into LLVM IR, and converts to Subzero.
%p2i - Parses a PEXE directly into Subzero.
Note that for all three executables, the same arguments can be used, making it easy to change how the input is handled. Also moves tests to use %p2i whenever possible.
BUG= https://code.google.com/p/nativeclient/issues/detail?id=3892 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/600043002
-
- 24 Sep, 2014 1 commit
-
-
Jan Voung authored
Extend the bswap test to have a case which will exhibit a bit of register pressure to test register encoding more (at first wasn't sure if it was 0xC8 + reg or 0xC8 | reg... but it should be the same since there's only 0-7 for regs). BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/595093002
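The point about 0xC8 + reg versus 0xC8 | reg can be checked directly: the low three bits of 0xC8 are zero, so adding or OR-ing a register number in the range 0-7 gives the same byte. A minimal sketch, with hypothetical helper names, for the 32-bit form (0F C8+rd):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes "bswap r32" for a register encoding in 0-7 as 0F C8+rd.
std::vector<uint8_t> encodeBswap32(unsigned Reg) {
  assert(Reg < 8);
  return {0x0F, static_cast<uint8_t>(0xC8 + Reg)};
}

// Returns true if 0xC8 + Reg and 0xC8 | Reg agree for every Reg in 0-7.
bool plusAndOrAgreeForAllRegs() {
  for (unsigned Reg = 0; Reg < 8; ++Reg)
    if ((0xC8 + Reg) != (0xC8 | Reg))
      return false;
  return true;
}
```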
-
- 23 Sep, 2014 2 commits
-
-
Jan Voung authored
BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/597643002
-
Jan Voung authored
Add a flag to use the integrated assembler. Handle simple XMM binary op instructions as an initial example of how instructions might be handled. This tests fixups in a very limited sense -- Track buffer locations of fixups for floating point immediates. Patchset one shows the original dart assembler code (revision 39313), so that it can be diffed. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/574133002
-
- 22 Sep, 2014 3 commits
-
-
Jim Stichnoth authored
This affects tracking of two kinds of Variable metadata: whether a Variable is block-local (i.e., all uses are in a single block) and if so, which CfgNode that is; and whether a Variable has a single defining instruction, and if so, which Inst that is. Originally, this metadata was constructed incrementally, which was quite fragile and most likely inaccurate under many circumstances. In the new approach, this metadata is reconstructed in a separate pass as needed. As a side benefit, the metadata fields are removed from each Variable and pulled into a separate structure, shrinking the size of Variable. There should be no functional changes, except that simple stack slot coalescing is turned off under Om1, since it takes a separate pass to calculate block-local variables, and passes are minimized under Om1. As a result, a couple of the lit tests needed to be changed. There are a few non-mechanical changes, generally to tighten up Variable tracking for liveness analysis. This is being done mainly to get precise Variable definition information so that register allocation can infer the best register preferences as well as when overlapping live ranges are allowable. BUG=none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/589003002
-
Jan Voung authored
Should be fixed now. BUG=https://code.google.com/p/nativeclient/issues/detail?id=3929 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/588893005
-
Karl Schimpf authored
Adds workaround that uses IceConverter's convertGlobals to generate global initializers. This should complete the initial implementation of Subzero's bitcode reader. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3892 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/587893003
-
- 20 Sep, 2014 1 commit
-
-
Jim Stichnoth authored
When doing a bitcast between int and FP types, the way lowering works is that a spill temporary is created, with regalloc weight of zero to inhibit register allocation, and this spill temporary is used for the cvt instruction. If the other variable does not get register-allocated, then addProlog() forces the spill temporary to share the same stack slot as the other variable. Currently, the lowering code passes this information to addProlog() by using the setPreferredRegister() mechanism. This is changed by creating a target-specific subclass of Variable, so that only the spill temporaries need to carry this extra information. Ultimately, many of the existing Variable fields will be refactored into a separate structure, and only generated/used as needed by various optimization passes. The spill temporary linkage is the one thing that is still needed with Om1 when no optimizations are enabled, motivating this change. A couple other minor cleanups are also done here. The key test is that the cast cross tests continue to work, specifically the bitcast tests. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/586943003
-
- 19 Sep, 2014 3 commits
-
-
Jan Voung authored
See: https://codereview.chromium.org/580983002 BUG=none R=kschimpf@google.com, stichnot@chromium.org Review URL: https://codereview.chromium.org/581293003
-
Karl Schimpf authored
BUG= https://code.google.com/p/nativeclient/issues/detail?id=3892 R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/577353003
-
Jan Voung authored
Lift the enums out of IceInstX8632.h and IceTargetLoweringX8632.h. This will later allow the assembler to share the enum values and use them as encodings where appropriate. E.g., to avoid having a separate enum in: https://codereview.chromium.org/476323004/diff/680001/src/assembler_constants_ia32.h The "all registers" enum is retained, but separate GPRRegister and XmmRegister enums are created with tags "Encoded_Reg_foo" to represent the encoded value of register "foo". Functions are added to convert from the "all registers" namespace to the encoded ones. Re-order the BrCond so that they match the encoding according to the "Instruction Subcode" in B.1 of the Intel Manuals. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/582113003
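A rough sketch of the split between an "all registers" enum and per-class encoded enums is shown below. The enum names GPRRegister and XmmRegister and the "Encoded_Reg_foo" tags follow the message above; the layout of `AllRegisters` and the conversion functions `getEncodedGPR()` and `getEncodedXmm()` are assumptions made for illustration and may not match the real headers.

```cpp
#include <cassert>

// Hypothetical "all registers" numbering: GPRs first, then XMM registers.
enum AllRegisters {
  Reg_eax, Reg_ecx, Reg_edx, Reg_ebx, Reg_esp, Reg_ebp, Reg_esi, Reg_edi,
  Reg_xmm0, Reg_xmm1, Reg_xmm2, Reg_xmm3,
  Reg_xmm4, Reg_xmm5, Reg_xmm6, Reg_xmm7,
  Reg_NUM
};

// Values match the x86 hardware register encodings.
enum GPRRegister {
  Encoded_Reg_eax = 0, Encoded_Reg_ecx = 1, Encoded_Reg_edx = 2,
  Encoded_Reg_ebx = 3, Encoded_Reg_esp = 4, Encoded_Reg_ebp = 5,
  Encoded_Reg_esi = 6, Encoded_Reg_edi = 7
};

enum XmmRegister {
  Encoded_Reg_xmm0 = 0, Encoded_Reg_xmm1 = 1, Encoded_Reg_xmm2 = 2,
  Encoded_Reg_xmm3 = 3, Encoded_Reg_xmm4 = 4, Encoded_Reg_xmm5 = 5,
  Encoded_Reg_xmm6 = 6, Encoded_Reg_xmm7 = 7
};

// Converts from the "all registers" namespace to the encoded GPR value.
GPRRegister getEncodedGPR(AllRegisters R) {
  assert(R >= Reg_eax && R <= Reg_edi);
  return static_cast<GPRRegister>(R - Reg_eax);
}

// Converts from the "all registers" namespace to the encoded XMM value.
XmmRegister getEncodedXmm(AllRegisters R) {
  assert(R >= Reg_xmm0 && R <= Reg_xmm7);
  return static_cast<XmmRegister>(R - Reg_xmm0);
}
```

Because the encoded enums carry the hardware values, an assembler can place them directly into ModRM or opcode bytes without a second lookup table.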
-
- 18 Sep, 2014 2 commits
-
-
Jim Stichnoth authored
Use --llc to pass extra arguments to pnacl-translate. Use --sz to pass extra arguments to llvm2ice. The --stats argument is removed from the script because it is Subzero-only, and can now be done with --sz=--stats . BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/582593002
-
Jim Stichnoth authored
1. An unconditional branch to the next basic block is removed.
2. For a conditional branch with a "false" edge to the next basic block, remove the unconditional branch to the fallthrough block.
3. For a conditional branch with a "true" edge to the next basic block, invert the condition and proceed as in case 2.
This is enabled only for O2, particularly because inverting the branch condition is a marginally risky operation. This decreases the instruction count by about 5-6%. Also, --stats prints a final tally to make it easier to post-process the output.
BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/580903005
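The three branch cases can be sketched on a toy representation. Everything here is hypothetical and much simpler than the real instruction classes: `Cond`, `Branch`, `invert()` and `optimizeBranch()` are made-up names, blocks are identified by integer index, and -1 means that no jump is emitted for that edge.

```cpp
#include <cassert>

enum class Cond { None, Eq, Ne }; // None marks an unconditional branch

struct Branch {
  Cond C;
  int TargetTrue;  // jump taken when the condition holds (-1 if unused)
  int TargetFalse; // jump emitted otherwise, or the unconditional target
};

Cond invert(Cond C) {
  assert(C != Cond::None);
  return C == Cond::Eq ? Cond::Ne : Cond::Eq;
}

// Rewrites B given that NextBlock is laid out immediately after it.
// Returns true if a jump was removed.
bool optimizeBranch(Branch &B, int NextBlock) {
  if (B.TargetFalse == NextBlock) {
    // Cases 1 and 2: the explicit jump to the fallthrough block is dropped.
    B.TargetFalse = -1;
    return true;
  }
  if (B.C != Cond::None && B.TargetTrue == NextBlock) {
    // Case 3: invert the condition so the old false edge becomes the taken
    // edge, then fall through to the next block.
    B.C = invert(B.C);
    B.TargetTrue = B.TargetFalse;
    B.TargetFalse = -1;
    return true;
  }
  return false;
}
```

In case 3 the two-instruction sequence "jump if equal to the next block; jump to block N" becomes the single instruction "jump if not equal to block N".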
-
- 17 Sep, 2014 5 commits
-
-
Karl Schimpf authored
BUG= https://code.google.com/p/nativeclient/issues/detail?id=3892 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/576243002
-
Jim Stichnoth authored
BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/559723003
-
Karl Schimpf authored
BUG= https://code.google.com/p/nativeclient/issues/detail?id=3892 R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/576853002
-
Jim Stichnoth authored
This is needed since we are now using an absolute (and non-standard) path to clang++. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/567393007
-
Jim Stichnoth authored
The following are collected:
- Number of machine instructions emitted
- Number of registers saved/restored in prolog/epilog
- Number of stack frame bytes (non-alloca) allocated
- Number of "spills", or stores to stack slots
- Number of "fills", or loads/operations from stack slots
- Fill+Spill count (sum of above two)
These are somewhat reasonable approximations of code quality, and the primary intention is to compare before-and-after when trying out an optimization. The statistics are dumped after translating each function. Per-function and cumulative statistics are collected. The output lines have a prefix that is easy to filter.
BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/580633002
-