- 16 Jul, 2014 2 commits
-
-
Matt Wala authored
Impacted instructions: bitcast {v4f32, v4i32, v8i16, v16i8} <-> {v4f32, v4i32, v8i16, v16i8} bitcast v8i1 <-> i8 bitcast v16i1 <-> i16 (There was already code present to handle trivial bitcasts like v16i1 <-> v16i1.) [sz]ext v4i1 -> v4i32 [sz]ext v8i1 -> v8i16 [sz]ext v16i1 -> v16i8 trunc v4i32 -> v4i1 trunc v8i16 -> v8i1 trunc v16i8 -> v16i1 [su]itofp v4i32 -> v4f32 fpto[su]i v4f32 -> v4i32 Where there is a relatively simple lowering to x86 instructions, it has been used. Otherwise a helper call is used. Some lowerings require a materialization of an integer vector with 1s in each entry. Since there is no support for vector constant pools, the constant is materialized purely through register operations. BUG=none R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/383303003
-
Jan Voung authored
We'll need the fallbacks in any case. However, once we've decided on how to specify the CPU features of the user machine we can use the nicer LZCNT/TZCNT/POPCNT as well. Adds cmov, bsf, and bsr instructions. Calls a popcount helper function for machines without SSE4.2. Not handling bswap yet (which can also take i16 params). BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882 R=stichnot@chromium.org, wala@chromium.org Review URL: https://codereview.chromium.org/390443005
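The fallback paths mentioned above can be sketched in portable C++. This is only an illustration under assumptions, not Subzero's actual helper or lowering: the function names are hypothetical, and the loops merely model what bsr and bsf compute. Since bsr and bsf leave their destination undefined for a zero input, the zero case is selected separately here, which is presumably where a cmov would come in.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: population count without the POPCNT instruction.
uint32_t popcount32(uint32_t X) {
  X = X - ((X >> 1) & 0x55555555u);                 // sums of 2-bit groups
  X = (X & 0x33333333u) + ((X >> 2) & 0x33333333u); // sums of 4-bit groups
  X = (X + (X >> 4)) & 0x0F0F0F0Fu;                 // sums of 8-bit groups
  return (X * 0x01010101u) >> 24;                   // add the four bytes
}

// Models bsr: index of the highest set bit. Input must be nonzero.
uint32_t bitScanReverse(uint32_t X) {
  assert(X != 0);
  uint32_t Index = 0;
  while (X >>= 1)
    ++Index;
  return Index;
}

// Models bsf: index of the lowest set bit. Input must be nonzero.
uint32_t bitScanForward(uint32_t X) {
  assert(X != 0);
  uint32_t Index = 0;
  while ((X & 1u) == 0) {
    X >>= 1;
    ++Index;
  }
  return Index;
}

// Count leading zeros; the zero input is handled by a separate select.
uint32_t countLeadingZeros32(uint32_t X) {
  return X == 0 ? 32 : 31 - bitScanReverse(X);
}

// Count trailing zeros; the zero input is handled by a separate select.
uint32_t countTrailingZeros32(uint32_t X) {
  return X == 0 ? 32 : bitScanForward(X);
}
```

For instance, countLeadingZeros32(1) yields 31 and countTrailingZeros32(8) yields 3, while both functions return 32 for a zero input.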
-
- 15 Jul, 2014 2 commits
-
-
Matt Wala authored
1) In makeHelperCall(), function pointers that are created should have type IceType_i32, not the functions' own return type. 2) In legalize(), change the name of WillHaveRegister to MustHaveRegister. Add a comment to clarify the condition being computed. 3) In legalize(), add an assert to make sure that vector "constants" don't get legalized (other than undef). There should be no constants of vector type. 4) In copyToReg(), replace an unnecessary use of Src->getType(). BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/385133006
-
Matt Wala authored
The frem operation takes two arguments. Pass both Src0 and Src1 to __frem_v4f32. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/387153002
-
- 14 Jul, 2014 2 commits
-
-
Jan Voung authored
Now that the name mangling is a bit smarter (from commit: 217dc082), we don't need to avoid having the same type twice in the function signature. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/389683003
-
Jan Voung authored
64-bit ops are expanded via a cmpxchg8b loop. 64/32-bit and/or/xor are also expanded into a cmpxchg / cmpxchg8b loop. Add a cross test for atomic RMW operations and compare and swap. Misc: Test that atomic.is.lock.free can be optimized out if result is ignored. TODO: * optimize compare and swap with compare+branch further down instruction stream. * optimize atomic RMW when the return value is ignored (adds a locked field to binary ops though). * We may want to do some actual target-dependent basic block splitting + expansion (the instructions inserted by the expansion must reference the pre-colored registers, etc.). Otherwise, we are currently getting by with modeling the extended liveness of the variables used in the loops using fake uses. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882 R=jfb@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/362463002
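The shape of such a compare-and-swap loop can be sketched at the C++ level with std::atomic. This is a model of the technique only, not the emitted x86 sequence; the function name atomicRmwViaCas is hypothetical.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sketch: an atomic read-modify-write built from a
// compare-and-swap loop. Returns the value observed before the update.
template <typename OpT>
uint64_t atomicRmwViaCas(std::atomic<uint64_t> &Target, uint64_t Operand,
                         OpT Op) {
  uint64_t Expected = Target.load();
  // On failure, compare_exchange_weak refreshes Expected with the current
  // value, so the new value is recomputed on every retry.
  while (!Target.compare_exchange_weak(Expected, Op(Expected, Operand))) {
  }
  return Expected;
}
```

Passing a lambda such as `[](uint64_t A, uint64_t B) { return A & B; }` gives a 64-bit atomic and; or and xor follow the same pattern.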
-
- 11 Jul, 2014 4 commits
-
-
Matt Wala authored
This adds lowering code for fadd, fsub, fmul, fdiv, and frem. frem, having no native x86 counterpart, is implemented by making a helper call. BUG=none R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/389653002
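As a rough picture of what a vector frem helper has to compute, here is a lane-wise sketch in C++. It assumes frem follows std::fmod semantics (the result takes the sign of the dividend); the type alias and function name are hypothetical stand-ins, not the real helper.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using V4F32 = std::array<float, 4>;

// Hypothetical stand-in for a v4f32 frem helper: applies the scalar
// floating-point remainder to each pair of lanes.
V4F32 fremV4F32(const V4F32 &Src0, const V4F32 &Src1) {
  V4F32 Result{};
  for (std::size_t I = 0; I < Result.size(); ++I)
    Result[I] = std::fmod(Src0[I], Src1[I]);
  return Result;
}
```

Note that the helper needs both operands, the dividend and the divisor.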
-
Jim Stichnoth authored
SZZZ_ was being incremented to S0000_ instead of S1000_. BUG= https://codereview.chromium.org/385273002/ R=wala@chromium.org Review URL: https://codereview.chromium.org/390533002
-
Jim Stichnoth authored
https://refspecs.linuxbase.org/cxxabi-1.75.html#mangling-compression describes the mechanism for compressing mangled strings by using substitutions of the form S[0-9A-Z]*_ to represent repeated components. When the prefix is handled as wrapping inside a namespace, the base-36 substitution numbers all have to be incremented. This is implemented in a very simple way by scanning the string only for instances of the substitution pattern. Unfortunately, false matches are possible because the S[0-9A-Z]*_ pattern can be a substring of the type name, or can span other components of the mangled name. Getting this completely right would essentially require a full demangling parser - see the ~4000 lines of code in cxa_demangle.cpp and ItaniumMangle.cpp. Since this is just for testing, any false matches will likely cause a linking error and the test can be rewritten to avoid false matches. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/385273002
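The increment step on a single substitution can be sketched as follows. This is a minimal illustration, assuming the increment is by one and operating only on the seq-id text between `S` and `_`; the function name is hypothetical and the pattern scanning described above is not shown. In the ABI's numbering, `S_` comes first, then `S0_`, `S1_`, and so on in base 36.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: given the seq-id between 'S' and '_' (possibly
// empty, for "S_"), return the seq-id of the next substitution.
std::string incrementSeqId(const std::string &SeqId) {
  if (SeqId.empty())
    return "0"; // S_ is followed by S0_
  std::string Next = SeqId;
  for (std::size_t I = Next.size(); I-- > 0;) {
    char &C = Next[I];
    assert((C >= '0' && C <= '9') || (C >= 'A' && C <= 'Z'));
    if (C == 'Z') {
      C = '0'; // digit wraps around; carry into the next position
      continue;
    }
    C = (C == '9') ? 'A' : static_cast<char>(C + 1);
    return Next;
  }
  return "1" + Next; // every digit was 'Z', so the number grows by a digit
}
```

With this carry handling, a seq-id of ZZZ becomes 1000 rather than 0000.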
-
Karl Schimpf authored
Makes IceTranslator.ExitStatus a boolean (rather than int), and changes code to check flag when done. Fixes bug introduced in https://codereview.chromium.org/387023002. Also cleans up the (Ice) Converter class to handle globals processing, rather than doing it in llvm2ice.cpp. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3894 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/387023002
-
- 10 Jul, 2014 1 commit
-
-
Jim Stichnoth authored
See the BUG description for more details. In short, the register allocator was inappropriately honoring AllowRegisterOverlap even when the variable's live range overlaps with an Unhandled variable precolored to the preferred register. Also changes legalize() logic to recognize when a variable is guaranteed to ultimately have a physical register due to infinite weight, and not create a new temporary in those cases. Finally, dumps RegisterPreference and AllowRegisterOverlap info for Variables for improved diagnostics. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3897 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/380363002
-
- 09 Jul, 2014 4 commits
-
-
Jim Stichnoth authored
This invokes clang-format-diff.py so you can easily reformat just the code you touched. (Caution, this may not apply to new files.) BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/372133002
-
Matt Wala authored
- Add TargetLowering::lowerArguments() as a new stage in TargetLowering. - Add support for passing arguments/return values in XMM registers in the x86 target. BUG=none R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/372113005
-
Jan Voung authored
Re-used test_arith_main.cpp, mostly to share the set of interesting floating point constants. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882 R=stichnot@chromium.org, wala@chromium.org Review URL: https://codereview.chromium.org/384443003
-
Jan Voung authored
For ebp, exclude as needed. For esp, don't mark it as an int register. Not sure exactly how to do a targeted test for this Om1 register allocator. The Om1 regalloc seems to start w/ a fresh whitelist after each instruction, so it may assign the same register (e.g., eax), as an earlier instruction. Without pre-colored registers, I'm not sure how to force it to allocate something other than the first few registers. I do have a test case that has a ton of pre-colored registers, (e.g., cmpxchg8b), but that is a different CL: https://codereview.chromium.org/362463002/ Encountered for: BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/369573005
-
- 08 Jul, 2014 1 commit
-
-
Jim Stichnoth authored
The compile error was introduced in https://codereview.chromium.org/361733002/ . BUG= none R=wala@chromium.org Review URL: https://codereview.chromium.org/376923003
-
- 07 Jul, 2014 2 commits
-
-
Matt Wala authored
- Add vector types to the type table. - Add support for parsing vector types in llvm2ice. - Legalize undef vector values to zero. Test that undef vector values are lowered correctly. BUG=none R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/353553004
-
Karl Schimpf authored
This patch only handles global addresses in PNaCl bitcode files. Function blocks are still not parsed. Also, factors out a common API for translation, so that generated ICE can always be translated using the same code. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3892 R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/361733002
-
- 29 Jun, 2014 1 commit
-
-
Jim Stichnoth authored
This is still missing a couple of things: 1. It only supports flat arrays and zeroinitializers. Arrays of structs are not yet supported. 2. Initializers can't yet contain relocatables, e.g. the address of another global. Some changes are made to work around an llvm-mc assembler bug. When assembling using intel syntax, llvm-mc doesn't correctly parse symbolic constants or add relocation entries in some circumstances. Call instructions work, and use in a memory operand works, e.g. mov eax, [ArrayBase+4*ecx]. To work around this, we adjust legalize() to not allow ConstantRelocatable by default, except for memory operands and when called from lowerCall(), so the relocatable ends up being the source operand of a mov instruction. Then, the mov emit routine actually emits an lea instruction for such moves. A few lit tests needed to be adjusted to make szdiff work properly with respect to global initializers. In the new cross test, the driver calls test code that returns a pointer to an array with a global initializer, and the driver compares the arrays returned by llc and Subzero. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/358013003
-
- 27 Jun, 2014 1 commit
-
-
Karl Schimpf authored
BUG=None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/350933002
-
- 26 Jun, 2014 1 commit
-
-
Jim Stichnoth authored
Without this being in the command substitutions list, lit will rely on the 'not' command being in $PATH. The substitution code is adapted from llvm/test/lit.cfg to add word-break regexps to the list. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/344063004
-
- 25 Jun, 2014 1 commit
-
-
Jan Voung authored
Loads/stores w/ type i8, i16, and i32 are converted to plain load/store instructions and lowered w/ the plain lowerLoad/lowerStore. Atomic stores are followed by an mfence for sequential consistency. For 64-bit types, use movq to do 64-bit memory loads/stores (vs the usual load/store being broken into separate 32-bit load/stores). This means bitcasting the i64 -> f64, first (which splits the load of the value to be stored into two 32-bit ops) then stores in a single op. For load, load into f64 then bitcast back to i64 (which splits after the atomic load). This follows what GCC does for c++11 std::atomic<uint64_t> load/store methods (uses movq when -mfpmath=sse). This introduces some redundancy between movq and movsd, but the convention seems to be to use movq when working with integer quantities. Otherwise, movsd could work too. The difference seems to be in whether or not the XMM register's upper 64-bits are filled with 0 or not. Zero-extending could help avoid partial register stalls. Handle up to i32 fetch_add. TODO: add i64 via a cmpxchg loop. TODO: add some runnable crosstests to make sure that this doesn't do funny things to integer bit patterns that happen to look like signaling NaNs and quiet NaNs. However, the system clang would not know how to handle "llvm.nacl.*" if we choose to target that level directly via .ll files. Or, (a) we use old-school __sync methods (sync_fetch_and_add w/ 0 to load) or (b) require buildbot's clang/gcc to support c++11... BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/342763004
-
- 24 Jun, 2014 1 commit
-
-
Jan Voung authored
Currently, the integer immediate is legalized to a 64-bit integer register first, and then the lower/upper parts of that register are used for the bitcast. However, mov(64_bit_reg, imm) done by the legalization isn't legal. Similarly, trunc of 64-bit immediates need to take the lower half of the immediate, not legalize to a var first. This shifts the legalization code around. Other cases where immediates are illegal and legalized are idiv/div, but for those cases 64-bit operands are handled separately via a function call. The function call code properly splits up immediate arguments. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/348373005
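The splitting of a 64-bit immediate into halves, as opposed to first moving it into a 64-bit register, amounts to the following arithmetic. This is a sketch with hypothetical names, not the legalization code itself.

```cpp
#include <cstdint>

// Hypothetical pair of 32-bit immediates standing in for one 64-bit value.
struct SplitImm {
  uint32_t Lo; // bits 0..31
  uint32_t Hi; // bits 32..63
};

// Split a 64-bit immediate so each half can be used as a 32-bit operand.
SplitImm splitImm64(uint64_t Imm) {
  return SplitImm{static_cast<uint32_t>(Imm & 0xFFFFFFFFull),
                  static_cast<uint32_t>(Imm >> 32)};
}

// trunc of a 64-bit immediate to i32 simply keeps the lower half.
uint32_t truncImm64To32(uint64_t Imm) { return splitImm64(Imm).Lo; }
```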
-
- 18 Jun, 2014 5 commits
-
-
Jan Voung authored
Handle: * mem{cpy,move,set} (without optimizations for known lengths) * nacl.read.tp * setjmp, longjmp * trap Mostly see if the dispatching/organization is okay. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/321993002
-
Jan Voung authored
InstX8632Store is essentially a "mov" and it would emit a mov, but it did not add the ss/sd suffix based on the operand type. Also, there are some cases where legalization would leave two memory operands in the case that one of them is a floating point immediate:
  storeDoubleConst:
  .LstoreDoubleConst$entry:
    mov eax, dword ptr [esp+4]
    mov qword ptr [eax], qword ptr [L$double$1]
    ret
BUG=none R=stichnot@chromium.org, wala@chromium.org Review URL: https://codereview.chromium.org/341683002
-
Matt Wala authored
BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/344613002
-
Matt Wala authored
arbitrary bit pattern and are lowered to a zero constant. IceOperand.h: Introduce a new ConstantUndef subclass of Constant. Add a getConstantZero() method. IceGlobalContext.h / IceGlobalContext.cpp: Implement pooling for ConstantUndefs. IceTargetLoweringX8632.cpp: Legalize ConstantUndefs to constant zeros. llvm2ice.cpp: Translate LLVM Undefs into ConstantUndefs. undef.ll: Test that undef values are recognized and legalized to zero. BUG=none R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/339783002
-
Jan Voung authored
Change the i1 zeroext parameter to an explicit zext and i32. Add an assert in lowerCall that the type is at least 32-bits. I ended up putting the assert in lowerCall instead of InstX8632Push, since technically there are quite a few modes that push allows: 16-bit reg/mem (just not 8-bit reg/mem) and 8/16/32 bit constants. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/339933004
-
- 17 Jun, 2014 2 commits
-
-
Derek Schuff authored
The subzero mac build fails with errors like the following: /Users/dschuff/code/nacl/native_client/toolchain_build/src/subzero/src/IceGlobalContext.cpp:116: error: ISO C++ forbids variable-size array 'NameBase' Replace the variable-length array with llvm::SmallVector which will still allow stack allocation most of the time. R=stichnot@chromium.org BUG=build subzero on the bots Review URL: https://codereview.chromium.org/335343005
-
Jan Voung authored
The div/idiv instruction operand must be a register or memory. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/339643003
-
- 12 Jun, 2014 2 commits
-
-
Jim Stichnoth authored
The TargetX8632 class maintains a "current stack adjustment" during a push sequence, so that pushing or otherwise accessing stack locations during a function arg push sequence can use the right esp offset. This adjustment should only be used for esp-based frames, but it was being used for ebp-based frames as well, causing the wrong stack-based arguments to be pushed. BUG= https://code.google.com/p/nativeclient/issues/detail?id=3878 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/331743002
-
Jan Voung authored
Currently only the output has a unique name (supplied by the invocation), but the intermediate files (.sz.s, .sz.o) can get overwritten (w/ different optlevels, or targets). Would be nice to keep them around for debugging. (bug may happen for Om1 but not O2). BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/333713004
-
- 06 Jun, 2014 1 commit
-
-
Jan Voung authored
Derek's CL to check out subzero calls the source directory "subzero", and the file header comments call the directory "subzero". Just make the python sys.path munging for importing pydir more generic. Also change crosstest to not run the raw LLVM "opt" with optimizations (only use it for ABI stabilization passes). Instead run pnacl-clang with -O2. Otherwise, newer NACL_SDK versions include a newer LLVM "opt" binary which autovectorizes and may generate vector IR that is not handled by Subzero yet. E.g., LLVM ERROR: Invalid PNaCl instruction: %1 = insertelement <4 x i32> undef, i32 %0, i32 0 w/ pepper_canary to version 37, revision 274873 BUG=none TEST=make -f Makefile.standalone check R=stichnot@chromium.org, wala@chromium.org Review URL: https://codereview.chromium.org/317963002
-
- 05 Jun, 2014 1 commit
-
-
Jim Stichnoth authored
Ice::Inst::NumberSentinel is defined within the Inst class definition: class Inst { ... static const InstNumberT NumberDeleted = -1; static const InstNumberT NumberSentinel = 0; ... }; Under some compilers/options, this causes a link error when passing NumberSentinel as a const T& argument. (Another option would be to move the actual definitions into IceInst.cpp.) BUG= none R=jfb@chromium.org Review URL: https://codereview.chromium.org/311243006
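The link error comes from binding an in-class-initialized static const member to a reference, which requires the member to have a definition at namespace scope. The sketch below shows one conventional remedy, out-of-class definitions. It is not necessarily the fix this commit chose, and clampToSentinel is a hypothetical function added only to trigger the reference binding.

```cpp
#include <algorithm>
#include <cstdint>

typedef int32_t InstNumberT;

class Inst {
public:
  // In-class initializers make these usable as constants, but they are
  // only declarations as far as the linker is concerned.
  static const InstNumberT NumberDeleted = -1;
  static const InstNumberT NumberSentinel = 0;
};

// Out-of-class definitions (no initializer repeated). Without them,
// binding a member to a const T& parameter may fail to link.
const InstNumberT Inst::NumberDeleted;
const InstNumberT Inst::NumberSentinel;

// Hypothetical use: std::max takes its arguments by const reference, so
// NumberSentinel needs storage here.
InstNumberT clampToSentinel(InstNumberT Number) {
  return std::max(Number, Inst::NumberSentinel);
}
```

In a multi-file project the two definitions would live in exactly one .cpp file.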
-
- 04 Jun, 2014 1 commit
-
-
Jim Stichnoth authored
Includes the following: 1. Liveness analysis. 2. Linear-scan register allocation. 3. Address mode optimization. 4. Compare-branch fusing. All of these depend on liveness analysis. There are three versions of liveness analysis (in order of increasing cost): 1. Lightweight. This computes last-uses for variables local to a single basic block. 2. Full. This computes last-uses for all variables based on global dataflow analysis. 3. Full live ranges. This computes all last-uses, plus calculates the live range intervals in terms of instruction numbers. (The live ranges are needed for register allocation.) For testing the full live range computation, Cfg::validateLiveness() checks every Variable of every Inst and verifies that the current Inst is contained within the Variable's live range. The cross tests are run with O2 in addition to Om1. Some of the lit tests (for what good they do) are updated with O2 code sequences. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/300563003
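The lightweight variant can be pictured as a single backward scan over one basic block. The sketch below is a simplified model, assuming every variable is local to the block (nothing is live on exit); SimpleInst and computeLastUses are hypothetical names, not Subzero's classes.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical, simplified instruction: one optional destination variable
// and a list of source variables, all identified by integer ids.
struct SimpleInst {
  int Dest;              // variable defined here, or -1 if none
  std::vector<int> Srcs; // variables read here
};

// For each instruction, returns the sources whose last use within the
// block is that instruction.
std::vector<std::set<int>>
computeLastUses(const std::vector<SimpleInst> &Block) {
  std::vector<std::set<int>> LastUses(Block.size());
  std::set<int> Live; // variables that are read later in the block
  for (std::size_t I = Block.size(); I-- > 0;) {
    const SimpleInst &Cur = Block[I];
    // Scanning backward, a definition ends the variable's live range.
    if (Cur.Dest >= 0)
      Live.erase(Cur.Dest);
    // A source that was not live below this point is used here last.
    for (int Src : Cur.Srcs)
      if (Live.insert(Src).second)
        LastUses[I].insert(Src);
  }
  return LastUses;
}
```

The full variants extend this idea across basic blocks with dataflow iteration, and the live-range variant additionally records the instruction numbers spanned by each variable.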
-
- 02 Jun, 2014 1 commit
-
-
Matt Wala authored
BUG= none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/305973005
-
- 23 May, 2014 2 commits
-
-
Jim Stichnoth authored
1. Comma-terminated enumerator lists. 2. Empty macro arguments. 3. Variable-length arrays. The first issue is definitely hitting the Mac bots. The other two issues will quite possibly follow. BUG= none R=jfb@chromium.org Review URL: https://codereview.chromium.org/296823013
-
Jim Stichnoth authored
Previously, the basis of constant pooling was implemented, but two things were lacking: 1. The constant pools were not being emitted in the asm file. 2. A direct FP value was emitted in an FP instruction, e.g. "addss xmm0, 1.0000e00". Curiously, at least for some FP constants, llvm-mc was accepting this syntax. BUG= none R=jfb@chromium.org Review URL: https://codereview.chromium.org/291213003
-
- 22 May, 2014 2 commits
-
-
Derek Schuff authored
This change now supports building subzero as part of the LLVM build (instead of in a separate build step). It is modeled on clang's Makefiles. The existing Makefile has been renamed and can still be used manually (e.g. make -f Makefile.standalone). It does not yet support running tests, just building. R=stichnot@chromium.org, jvoung@chromium.org BUG= Review URL: https://codereview.chromium.org/293983007
-
Jim Stichnoth authored
This adds infrastructure for low-level x86-32 instructions, and the target lowering patterns. Practically no optimizations are performed. Optimizations to be introduced later include liveness analysis, dead-code elimination, global linear-scan register allocation, linear-scan based stack slot coalescing, and compare/branch fusing. One optimization that is present is simple coalescing of stack slots for variables that are only live within a single basic block. There are also some fairly comprehensive cross tests. This testing infrastructure translates bitcode using both Subzero and llc, and a testing harness calls both versions with a variety of "interesting" inputs and compares the results. Specifically, Arithmetic, Icmp, Fcmp, and Cast instructions are tested this way, across all PNaCl primitive types. BUG= R=jvoung@chromium.org Review URL: https://codereview.chromium.org/265703002
-