- 13 Oct, 2014 3 commits
-
-
Jan Voung authored
Currently, this checks and emits the segment override only for GPR instructions, assuming it's mostly used for nacl.read.tp. The code will assert when used in other situations. The lea hack is still tested in some files, but it's not emitted with emitIAS, and instead the "immediate" operand now has a fixup. There is a more compact encoding for "mov eax, moffs32", etc., but that isn't used right now. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/649463002
-
Karl Schimpf authored
Introduces the notion of a function address, to replace using LLVM IR's Function class. Modifies Ice converter, and Subzero's bitcode reader, to build function addresses. BUG=None R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/641193002
-
Jan Voung authored
BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/650573002
-
- 09 Oct, 2014 1 commit
-
-
Jan Voung authored
BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/634333002
-
- 08 Oct, 2014 5 commits
-
-
Jan Voung authored
BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/640603002
-
Jim Stichnoth authored
Also changes the szbuild.py script to add -fdata-sections, and entirely removes the -disable-globals option. The global initializer emission basically copies what llc does, based on 3 properties of the global: constant vs non-constant, internal vs external, and full zero-initializer vs non-trivial initializer. BUG= none R=jvoung@chromium.org, kschimpf@google.com Review URL: https://codereview.chromium.org/631383003
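A minimal sketch of how the three properties named above could drive emission decisions. The struct, the function names, and the exact section names are assumptions for illustration, not the code from this CL.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of a global variable; field names are illustrative.
struct GlobalInfo {
  bool IsConstant; // constant vs non-constant
  bool IsInternal; // internal vs external linkage
  bool IsZeroInit; // full zero-initializer vs non-trivial initializer
};

// Assumed mapping: constants go to read-only data, zero-initialized writable
// globals go to .bss, and everything else goes to .data.
std::string sectionFor(const GlobalInfo &G) {
  if (G.IsConstant)
    return ".rodata";
  if (G.IsZeroInit)
    return ".bss";
  return ".data";
}

// Assumed rule: only externally visible globals need a .globl directive.
bool needsGloblDirective(const GlobalInfo &G) { return !G.IsInternal; }
```

With -fdata-sections, each global would typically get its own subsection (for example a name appended to the section above), which this sketch leaves out.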
-
Karl Schimpf authored
Changes Subzero's bitcode reader to build and store ICE types, instead of using LLVM's types. Note: This code doesn't remove all uses of LLVM types. They are still used to check types for instructions and to generate function addresses. BUG=None R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/625243002
-
Jim Stichnoth authored
Makes sure the percentages represent only the function(s) focused on, and not with respect to the total translation time across all functions. Reset the timings between functions so that --timing-focus=* gives reasonable numbers. Also, adds a timer for the live range construction phase. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/640713003
-
Jan Voung authored
Add the SZ runtime functions for unsigned conversion. Add some more cast tests before doing emitIAS for cvt. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/639543002
-
- 07 Oct, 2014 4 commits
-
-
Jan Voung authored
Since push isn't used for argument passing anymore, handling push for vectors and floats/doubles is no longer needed. Passing vectors requires a bit more care with alignment, so that was changed. I can imagine push needing to handle addresses later (at least on x86-64 to push the lower 32 bits of the return address), but for now, this means only handling GPRs. The XMM registers are not callee-saved. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/633553003
-
Jim Stichnoth authored
The main optimization is for the repeated overlaps() calls against the Inactive set, by iteratively trimming away the early sections of the Inactive live ranges that can no longer overlap with Cur. A more minor optimization doesn't bother checking pure point-valued Inactive ranges for expiring or reactivating. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/627203002
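The trimming idea can be sketched as follows, assuming a live range is a sorted list of disjoint half-open segments. The class name, `trim()`, and `remainingSegments()` are hypothetical and only illustrate the technique.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using InstNumberT = int;
using Segment = std::pair<InstNumberT, InstNumberT>; // half-open [first, second)

// Hypothetical live range holding sorted, disjoint segments.
class TrimmedLiveRange {
public:
  explicit TrimmedLiveRange(std::vector<Segment> Segs)
      : Segments(std::move(Segs)) {}

  // Skip segments that end at or before Lower. They can never overlap a
  // range that starts at Lower or later, so later overlaps() calls ignore them.
  void trim(InstNumberT Lower) {
    while (TrimmedBegin < Segments.size() &&
           Segments[TrimmedBegin].second <= Lower)
      ++TrimmedBegin;
  }

  // Merge-style scan over both segment lists, starting at the trimmed position.
  bool overlaps(const TrimmedLiveRange &Other) const {
    std::size_t I = TrimmedBegin, J = Other.TrimmedBegin;
    while (I < Segments.size() && J < Other.Segments.size()) {
      if (Segments[I].second <= Other.Segments[J].first)
        ++I;
      else if (Other.Segments[J].second <= Segments[I].first)
        ++J;
      else
        return true;
    }
    return false;
  }

  std::size_t remainingSegments() const {
    return Segments.size() - TrimmedBegin;
  }

private:
  std::vector<Segment> Segments;
  std::size_t TrimmedBegin = 0;
};
```

Because the allocator visits ranges in increasing start order, trimming each Inactive range to the start of Cur is safe under that assumption, and repeated overlaps() calls then scan fewer segments.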
-
Karl Schimpf authored
Modifies both LLVM to ICE converter, and Subzero's bitcode reader, to build Subzero's global initializers. Modifies target lowering routines for global initializers to use this new model. Also modifies both to now handle relocations in global variable initializers. BUG=None R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/624663002
-
Jim Stichnoth authored
--timing-funcs - Produces a sorted list of total time spent translating each function. --timing-focus=<F> - Turns on the --timing equivalent just for one function. Use '*' to do this for all functions, i.e. get complete timing breakdowns across all functions. --verbose-focus=<F> - Temporarily turns on --verbose=all for one function. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/620373004
-
- 06 Oct, 2014 1 commit
-
-
Jan Voung authored
The "test" instruction is used in very limited situations. I've made a best effort to fill in the possible forms (address for the first operand), but it's not tested, so I put the *untested* parts behind an assert. Otherwise it's very similar to icmp, so if it starts to be used and tested then the asserts can be taken out, and the code shared with icmp. Tighten some of the XMM dispatch/emitters. Most of those XMM instructions can only encode the variant where dest is a register. Rather than waste a slot for a NULL method pointer, just make the struct type have two variants instead of three. Fill out a couple of XMM instructions which *do* allow mem-ops as dest (mov instructions). BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/624263002
-
- 04 Oct, 2014 2 commits
-
-
Jim Stichnoth authored
Call instruction lowering includes the FakeKill instruction, which creates several precolored variables, one for each scratch register. The live range for each of these variables consists of a set of "point" ranges, one point for every FakeKill instruction. The overlaps() logic is such that a point range never overlaps with an individual instruction, but it can overlap with a normal non-point range. It turns out that during register allocation, usually most of the variables on the Inactive list are these FakeKill instructions. The live range representation can be quite large if there are many calls in the function. In the "Check for inactive ranges that have expired or reactivated" section, a lot of time was spent on overlapsStart() calls that were doomed to return false. This change lets the live range keep track of whether it contains non-point segments, and if not, optimize the overlaps(InstNumberT) method. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/631483002
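A rough illustration of the point-range shortcut, assuming segments are half-open so that a point segment (start equal to end) never contains an instruction number. The class and member names are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using InstNumberT = int;

// Hypothetical live range that remembers whether any segment is non-point.
class PointAwareLiveRange {
public:
  // Segments must be added in increasing order.
  void addSegment(InstNumberT Start, InstNumberT End) {
    assert(Start <= End);
    Segments.emplace_back(Start, End);
    if (Start != End)
      HasNonPoint = true;
  }

  // True if instruction N falls inside some segment [Start, End).
  bool overlapsInst(InstNumberT N) const {
    // Fast path: a range made only of point segments can never overlap a
    // single instruction, so skip the scan entirely.
    if (!HasNonPoint)
      return false;
    for (const auto &S : Segments) {
      if (N < S.first)
        return false; // sorted, so no later segment can contain N
      if (N < S.second)
        return true;
    }
    return false;
  }

private:
  std::vector<std::pair<InstNumberT, InstNumberT>> Segments;
  bool HasNonPoint = false;
};
```

For a precolored FakeKill variable with one point per call site, the fast path turns a scan over every call in the function into a single flag test.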
-
Jan Voung authored
For the integer shift ops, since the Src1 operand is forced to be an immediate or register (cl), it should be legal to have Dest+Src0 be either register or memory. However, we are currently only using the register form. It might be the case that shift w/ Dest+Src0 as mem are less optimized on some micro-architectures though, since it has to load, shift, and store all in one operation, but I'm not sure. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/622113002
-
- 02 Oct, 2014 1 commit
-
-
Jim Stichnoth authored
A lot of time was being spent in the two loops that check precolored ranges in the Unhandled set, specifically in the endsBefore() check. Solve this by keeping a shadow copy of Unhandled, restricted to the ranges that are precolored. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/622553003
-
- 01 Oct, 2014 5 commits
-
-
Jim Stichnoth authored
sed -i 's/LLVM_DELETED_FUNCTION/= delete/' src/*.{h,cpp} BUG= https://codereview.chromium.org/512933006/ R=jfb@chromium.org Review URL: https://codereview.chromium.org/619983002
-
Jim Stichnoth authored
Use C++11 'auto' where practical to make iteration more concise. Use C++11 range-based for loops where possible. BUG= none R=jfb@chromium.org, kschimpf@google.com Review URL: https://codereview.chromium.org/619893002
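For illustration, the kind of rewrite this describes might look like the following. `Inst` and `InstList` are hypothetical stand-ins, not Subzero's actual types.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an instruction list.
struct Inst {
  int Number;
  bool Deleted;
};
using InstList = std::vector<Inst>;

// Pre-C++11 style: the iterator type is spelled out explicitly.
int countLiveOld(const InstList &Insts) {
  int Count = 0;
  for (InstList::const_iterator I = Insts.begin(), E = Insts.end(); I != E; ++I)
    if (!I->Deleted)
      ++Count;
  return Count;
}

// With 'auto', for loops that still need the iterator itself.
int countLiveAuto(const InstList &Insts) {
  int Count = 0;
  for (auto I = Insts.begin(), E = Insts.end(); I != E; ++I)
    if (!I->Deleted)
      ++Count;
  return Count;
}

// With a range-based for loop, when only the element is needed.
int countLiveRange(const InstList &Insts) {
  int Count = 0;
  for (const Inst &I : Insts)
    if (!I.Deleted)
      ++Count;
  return Count;
}
```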
-
Jim Stichnoth authored
BUG= none R=jvoung@chromium.org, kschimpf@google.com Review URL: https://codereview.chromium.org/618313003
-
Jim Stichnoth authored
1. Setting command-line make variable NOASSERT=1 adds -DNDEBUG and builds in a separate directory. By default, we still get Release+Asserts. 2. Add "(void)foo;" as necessary when foo is only used in an assert(), to remove warnings. 3. Minimize inclusion of llvm/Support/Timer.h because it adds warnings. 4. Call validateLiveness() only when asserts are enabled, because it's relatively expensive. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/623493002
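Item 2 can be illustrated with a small example. The function below is hypothetical; it only shows why the "(void)foo;" line is needed once -DNDEBUG compiles assert() away.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: pops and returns the top of a non-empty stack.
int popChecked(std::vector<int> &Stack) {
  const bool WasEmpty = Stack.empty();
  // WasEmpty is only read inside assert(). With -DNDEBUG the assert expands
  // to nothing, so this cast keeps the compiler from warning that the
  // variable is unused.
  (void)WasEmpty;
  assert(!WasEmpty);
  int Top = Stack.back();
  Stack.pop_back();
  return Top;
}
```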
-
Jim Stichnoth authored
While I'm at it, normalize the #include order: 1. C++ library headers 2. LLVM headers 3. Subzero headers A blank line between each group. Each group sorted alphabetically, case-insensitive. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3930 R=jfb@chromium.org, jvoung@chromium.org Review URL: https://codereview.chromium.org/622443002
-
- 30 Sep, 2014 3 commits
-
-
Jim Stichnoth authored
This makes it much more useful for individual analysis and long-term translation performance tracking. 1. Collect and report timings aggregated across the entire translation, instead of function-by-function. If you really care about a single function, just extract it and translate it separately for analysis. 2. Remove "-verbose time" and just use -timing. 3. Collect two kinds of timings: cumulative and flat. Cumulative measures the total time, even if a callee also times itself. Flat only measures the currently active timer at the top of the stack. The flat times should add up to 100%, but cumulative will usually add up to much more than 100%. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/610813002
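The difference between cumulative and flat timing can be sketched with a stack of timers. The class below is a hypothetical illustration that takes explicit timestamps rather than reading a clock, and it assumes a timer is never nested inside itself.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical timer stack driven by caller-supplied timestamps.
class TimerStack {
  struct Entry {
    std::string Name;
    double Start;
  };

public:
  void push(const std::string &Name, double Now) {
    chargeTop(Now);
    Stack.push_back({Name, Now});
  }

  void pop(double Now) {
    assert(!Stack.empty());
    chargeTop(Now);
    // Cumulative time covers the whole push-to-pop interval, callees included.
    Cumulative[Stack.back().Name] += Now - Stack.back().Start;
    Stack.pop_back();
  }

  double flat(const std::string &Name) const { return lookup(Flat, Name); }
  double cumulative(const std::string &Name) const {
    return lookup(Cumulative, Name);
  }

private:
  // Flat time since the last event goes only to the timer on top of the stack.
  void chargeTop(double Now) {
    if (!Stack.empty())
      Flat[Stack.back().Name] += Now - LastEvent;
    LastEvent = Now;
  }

  static double lookup(const std::map<std::string, double> &M,
                       const std::string &Name) {
    auto It = M.find(Name);
    return It == M.end() ? 0.0 : It->second;
  }

  std::vector<Entry> Stack;
  std::map<std::string, double> Flat, Cumulative;
  double LastEvent = 0;
};
```

If an outer timer runs from 0 to 10 and an inner one from 2 to 7, the flat times are 5 and 5 (summing to the total of 10), while the cumulative times are 10 and 5 (summing to more than the total).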
-
Jim Stichnoth authored
This makes it much easier to copy/paste the output. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/611983003
-
Jan Voung authored
Be sure to legalize 8-bit imul immediates (there is only the r/m form). Add a test for that, and cover a couple of other ops too... There is a one-byte-shorter form when Dest/Src0 == EAX and Src1 is not an immediate, but that isn't taken advantage of. Go ahead and add the optimization for 8-bit immediates for i16/i32 (not allowed for i8). It shows up sometimes in spec, e.g., to multiply by 10. There is a lot of multiply by 4 as well, that we could strength-reduce. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/617593002
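As a sketch of the 8-bit immediate optimization, the function below picks between the x86 opcodes 6B /r ib (sign-extended 8-bit immediate) and 69 /r iw/id (full-width immediate) for a register-register-immediate imul. The function name and signature are hypothetical and do not reflect the assembler's actual interface.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical encoder for "imul Reg, Reg, Imm" on a 16-bit or 32-bit GPR.
// Reg is the 3-bit register number (0 = eax/ax, 1 = ecx/cx, ...).
std::vector<uint8_t> encodeImulRegImm(unsigned Reg, int32_t Imm, bool Is16Bit) {
  assert(Reg < 8);
  std::vector<uint8_t> Bytes;
  if (Is16Bit)
    Bytes.push_back(0x66); // operand-size override prefix
  // ModRM with mod=11 (register direct), reg=Reg, rm=Reg.
  const uint8_t ModRM = static_cast<uint8_t>(0xC0 | (Reg << 3) | Reg);
  if (Imm >= -128 && Imm <= 127) {
    // Short form: the immediate fits in a sign-extended byte.
    Bytes.push_back(0x6B);
    Bytes.push_back(ModRM);
    Bytes.push_back(static_cast<uint8_t>(Imm));
    return Bytes;
  }
  // Long form: 2-byte immediate for 16-bit operands, 4-byte otherwise.
  Bytes.push_back(0x69);
  Bytes.push_back(ModRM);
  const unsigned ImmBytes = Is16Bit ? 2 : 4;
  const uint32_t U = static_cast<uint32_t>(Imm);
  for (unsigned I = 0; I < ImmBytes; ++I)
    Bytes.push_back(static_cast<uint8_t>(U >> (8 * I)));
  return Bytes;
}
```

Multiplying ecx by 10 then takes 3 bytes instead of 6. There is no immediate form at all for 8-bit imul, which is why that case has to be legalized first.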
-
- 29 Sep, 2014 3 commits
-
-
Jan Voung authored
For some arithmetic assembler methods, instead of checking IceType_i8 || IceType_i1, only allow IceType_i8 and assert if an i1 leaked to that stage (should have been vetted earlier by the bitcode reader / ABI checks). Could have looked up the type width and isIntegerArithmeticType, etc. in the property table, but that seemed a bit heavy for just checking one type (or one of two types). Also changed some f32 || f64 checks into just using isScalarFloatingType(), which looks things up in a property table. Could alternatively just keep it as a simple f32 || f64 check, and I could change isScalarFloatingType()'s implementation. In some places where we assume something is either i32 or i64 and do a select, change that into using a helper function so that we can do one compare, and then assert. Some of the asserts are really redundant (already within a branch which already checked that), but hopefully that disappears if we compile in release mode. Similar for f32 or f64 (which happened a lot in the assembler). BUG=none R=kschimpf@google.com, stichnot@chromium.org Review URL: https://codereview.chromium.org/613483002
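The "one compare, then assert" pattern could look roughly like this. The enum is a cut-down stand-in for the real type list, and the helper names are assumptions made for the example.

```cpp
#include <cassert>

// Hypothetical subset of the type enumeration.
enum Type {
  IceType_i1,
  IceType_i8,
  IceType_i16,
  IceType_i32,
  IceType_i64,
  IceType_f32,
  IceType_f64
};

// Returns true for i32 and false for i64. Any other type is a caller bug,
// caught by the assert in builds with asserts enabled.
inline bool isInt32Asserting32Or64(Type Ty) {
  const bool IsI32 = Ty == IceType_i32;
  assert(IsI32 || Ty == IceType_i64);
  return IsI32;
}

// Returns true only for i8, asserting that no i1 leaked this far.
inline bool isByteSizedAssertingNotI1(Type Ty) {
  assert(Ty != IceType_i1);
  return Ty == IceType_i8;
}

// Example use: select an operand size with a single comparison.
inline unsigned operandBytes32Or64(Type Ty) {
  return isInt32Asserting32Or64(Ty) ? 4 : 8;
}
```

In a release build the assert compiles away, leaving only the single comparison.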
-
Jim Stichnoth authored
BUG= none R=dschuff@chromium.org, jvoung@chromium.org Review URL: https://codereview.chromium.org/610273004
-
Jim Stichnoth authored
The operand type needs to be propagated into EmitImmediate() and EmitComplex() so that we know whether to emit the 2-byte or 4-byte form. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/607353002
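A small sketch of why the type matters when emitting an immediate. The enum and the function are hypothetical; the real EmitImmediate() signature is not shown in this log.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical operand types relevant to immediate width.
enum OperandType { Operand_i16, Operand_i32 };

// Appends Imm in little-endian order: 2 bytes for i16 operands, 4 bytes for
// i32. Without the type, the emitter cannot know which width to use.
void emitImmediate(OperandType Ty, int32_t Imm, std::vector<uint8_t> &Out) {
  const unsigned NumBytes = (Ty == Operand_i16) ? 2 : 4;
  const uint32_t U = static_cast<uint32_t>(Imm);
  for (unsigned I = 0; I < NumBytes; ++I)
    Out.push_back(static_cast<uint8_t>(U >> (8 * I)));
}
```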
-
- 27 Sep, 2014 1 commit
-
-
Jim Stichnoth authored
Separate objects are built with -O0 and -O2. Separate executables are built: build/Release/llvm2ice - Release build build/Debug/llvm2ice - Debug build The executable built depends on whether the DEBUG make variable is set: make -f Makefile.standalone make -f Makefile.standalone DEBUG=1 The llvm2ice file in the top-level directory is always removed and symlinked to the appropriate build. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/605093002
-
- 26 Sep, 2014 6 commits
-
-
Jan Voung authored
Add a test to check that the encodings are efficient for immediates (chooses the i8, and eax encodings when appropriate). The .byte syntax breaks NaCl bundle straddle checking in llvm-mc, so I had to change one of the tests which noted that a nop appeared (no longer does). This also assumes that _add(), etc. are usually done with _add(T, ...) and then _mov(dst, T) so that the dest is always register. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/604873003
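To illustrate what "efficient" means here, the sketch below picks among three x86 encodings of a 32-bit "add reg, imm": 83 /0 ib (sign-extended 8-bit immediate), 05 id (eax-only short form), and 81 /0 id (general form). The function is a hypothetical example, not the assembler's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical encoder for "add Reg, Imm" on a 32-bit GPR (0 = eax).
std::vector<uint8_t> encodeAddRegImm(unsigned Reg, int32_t Imm) {
  assert(Reg < 8);
  std::vector<uint8_t> Bytes;
  // ModRM with mod=11 (register direct), reg=/0 (add), rm=Reg.
  const uint8_t ModRM = static_cast<uint8_t>(0xC0 | Reg);
  if (Imm >= -128 && Imm <= 127) {
    // 3 bytes: shortest form, valid for any register.
    Bytes.push_back(0x83);
    Bytes.push_back(ModRM);
    Bytes.push_back(static_cast<uint8_t>(Imm));
    return Bytes;
  }
  if (Reg == 0) {
    // 5 bytes: eax-specific opcode with no ModRM byte.
    Bytes.push_back(0x05);
  } else {
    // 6 bytes: general form.
    Bytes.push_back(0x81);
    Bytes.push_back(ModRM);
  }
  const uint32_t U = static_cast<uint32_t>(Imm);
  for (unsigned I = 0; I < 4; ++I)
    Bytes.push_back(static_cast<uint8_t>(U >> (8 * I)));
  return Bytes;
}
```

The 8-bit form is checked first because it is shorter than the eax form even when the destination is eax.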
-
Jim Stichnoth authored
Subzero translation is stable enough that szbuild.py should prefer Subzero-translated symbols by default. The exception is that if you explicitly use --include, the intuitive interpretation is that you only want Subzero to include those symbols (minus any given with --exclude). BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/605283002
-
Jim Stichnoth authored
Not necessary for the LLVM 3.5 merge, but nice to have anyway. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3930 R=jfb@chromium.org, jvoung@chromium.org Review URL: https://codereview.chromium.org/605123002
-
Karl Schimpf authored
szdiff is an approximate match tool used in early tests. When Subzero's bitcode reader tests already exist for failing cases of szdiff, remove the broken tests. BUG=None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/609813003
-
Karl Schimpf authored
This test was previously failing because insertelement returned the wrong type. However, a previous CL fixed this problem and the test now works with Subzero's bitcode reader. BUG=None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/605273002
-
Jim Stichnoth authored
This just adds -std=c++11 to the compiler flags and fixes the resulting errors/warnings. Later CLs can fix things related to the LLVM 3.5 merge. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3930 R=jfb@chromium.org, kschimpf@google.com Review URL: https://codereview.chromium.org/607443003
-
- 25 Sep, 2014 3 commits
-
-
Karl Schimpf authored
Instruction insertelement was incorrectly generating a result corresponding to the element type, instead of the updated vector type. BUG= None R=jvoung@chromium.org Review URL: https://codereview.chromium.org/604023003
-
Jim Stichnoth authored
Originally, for a given Variable, register preference and overlap were manually specified. That is, when choosing a free register for a Variable, it would be manually specified which (if any) related Variable would be a good choice for register selection, all things being equal. Also, it allowed the rather dangerous "AllowOverlap" specification which let the Variable use its preferred Variable's register, even if their live ranges overlap. Now, all this selection is automatic, and the machinery for manual specification is removed. A few other changes in this CL: - Address mode inference leverages the more precise - Better regalloc dump messages to follow the logic - "-verbose most" enables all verbose options except regalloc and time - "-ias" is an alias for "-integrated-as" - Bug fix: prevent 8-bit register ah from being used in register allocation, unless it is pre-colored - Bug fix: the _mov helper where Dest is NULL wasn't always actually creating a new Variable - A few tests are updated based on slightly different O2 register allocation decisions The static stats actually improve slightly across the board (around 1%), except that frame size improves by 6-10%. This is probably from smarter register allocation decisions, particularly involving phi lowering temporaries, where the manual hints weren't too good to start with. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/597003004
-
Karl Schimpf authored
Adds the python script run-llvm2ice.py (was llvm2iceinsts.py) that automatically handles conversion of LLVM source to a PEXE file, and then runs llvm2ice on the corresponding PEXE file. Also, defines three paths in tests, based on the executable chosen: %lc2i - Directly reads from LLVM source, and converts to Subzero. %l2i - Parses a PEXE file into LLVM IR, and converts to Subzero. %p2i - Parses a PEXE directly into Subzero. Note that for all three executables, the same arguments can be used, making it easy to change how the input is handled. Also moves tests to use %p2i whenever possible. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3892 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/600043002
-
- 24 Sep, 2014 1 commit
-
-
Jan Voung authored
Extend the bswap test to have a case which will exhibit a bit of register pressure to test register encoding more (at first wasn't sure if it was 0xC8 + reg or 0xC8 | reg... but it should be the same since there's only 0-7 for regs). BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/595093002
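The equivalence mentioned above follows from 0xC8 having its low three bits clear, so adding or OR-ing a register number in the range 0-7 gives the same byte. A short check, with hypothetical function names:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical encoder for 32-bit "bswap reg": 0F C8+rd.
std::vector<uint8_t> encodeBswap(unsigned Reg) {
  assert(Reg < 8);
  std::vector<uint8_t> Bytes;
  Bytes.push_back(0x0F);
  Bytes.push_back(static_cast<uint8_t>(0xC8 + Reg));
  return Bytes;
}

// Returns true if 0xC8 + reg and 0xC8 | reg agree for every register 0-7.
bool addAndOrAgree() {
  for (unsigned Reg = 0; Reg < 8; ++Reg)
    if ((0xC8 + Reg) != (0xC8 | Reg))
      return false;
  return true;
}
```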
-
- 23 Sep, 2014 1 commit
-
-
Jan Voung authored
BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/597643002
-