- 22 May, 2015 2 commits
-
-
Jan Voung authored
Allow instructions to be predicated and use that in lower icmp and branch. Tracking the predicate for almost every instruction is a bit overkill, but technically possible. Add that to most of the instruction constructors except ret and call for now. This doesn't yet do compare + branch fusing, but it does handle the branch fallthrough to avoid branching twice. I can't yet test 8bit and 16bit, since those come from "trunc" and "trunc" is not lowered yet (or load, which also isn't handled yet). Adds basic "call(void)" lowering, just to get the call markers showing up in tests. 64bit.pnacl.ll no longer explodes with liveness consistency errors, so risk running that and backfill some of the 64bit arith tests. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1151663004
-
Jim Stichnoth authored
For C/C++ semantics, this applies to all the FP comparisons except == and != which require two comparisons due to ordered/unordered requirements. For == and !=, two comparisons and control flow are still used. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095 TEST= crosstest/test_fcmp R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1148023003
-
- 19 May, 2015 2 commits
-
-
Jan Voung authored
Do basic lowering for add, sub, and, or, xor, mul. We don't yet take advantage of commuting immediate operands (e.g., use rsb to reverse subtract instead of sub) or inverting immediate operands (use bic to bit clear instead of using and). The binary operations can set the flags register (e.g., to have the carry bit for use with a subsequent adc instruction). That is optional for the "data processing" instructions. I'm not yet able to compile 8bit.pnacl.ll and 64bit.pnacl.ll so 8-bit and 64-bit are not well tested yet. Only tests are in the arith.ll file (like arith-opt.ll, but assembled instead of testing the "verbose inst" output). Not doing divide yet. ARM divide by 0 does not trap, but PNaCl requires uniform behavior for such bad code. Thus, in LLVM we insert a 0 check and would have to do the same. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1127003003
-
Jim Stichnoth authored
This is instead of explicit control flow which may interfere with branch prediction. However, explicit control flow is still needed for types other than i16 and i32, due to cmov limitations. The assembler for cmov is extended to allow the non-dest operand to be a memory operand. The select lowering is getting large enough that it was in our best interest to combine the default lowering with the bool-folding optimization. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1125323004
-
- 18 May, 2015 1 commit
-
-
Jan Voung authored
Adds basic assignment instructions, mov, movn, movw, movt, ldr, etc. in order to copy around the first few integer (i32, i64) arguments out of r0 - r3, and then return them. The "mov" instruction is a bit special and can actually be a "str" when the dest is a stack slot. Model the Memory operand types, and the "flexible Operand2". Add a few tests demonstrating the flexibility of the immediate encoding. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1127963004
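As a rough illustration of the "flexible Operand2" immediate encoding mentioned above: an ARM data-processing immediate is an 8-bit value rotated right by an even amount. The sketch below checks whether a 32-bit constant fits that form. The names `rotateLeft32` and `canEncodeAsFlexImm` are hypothetical and are not taken from the Subzero sources.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value left by N bits (N in [0, 31]).
static uint32_t rotateLeft32(uint32_t Value, unsigned N) {
  return N == 0 ? Value : (Value << N) | (Value >> (32 - N));
}

// Hypothetical helper: returns true if Value can be written as an 8-bit
// immediate rotated right by an even amount (0, 2, ..., 30).
bool canEncodeAsFlexImm(uint32_t Value) {
  for (unsigned Rot = 0; Rot < 32; Rot += 2) {
    // Rotating left by Rot undoes a rotate-right by Rot. If what remains
    // fits in 8 bits, the constant is encodable with that rotation.
    if (rotateLeft32(Value, Rot) <= 0xFFu)
      return true;
  }
  return false;
}
```

For instance, 0xFF000000 and 0x3FC fit this form, while 0x101 and 0x1FE do not (the latter would need an odd rotation).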
-
- 17 May, 2015 1 commit
-
-
Jim Stichnoth authored
Originally there was a peephole-style optimization in lowerIcmp() that looks ahead to see if the next instruction is a conditional branch with the right properties, and if so, folds the icmp and br into a single lowering sequence. However, sometimes extra instructions come between the icmp and br instructions, disabling the folding even though it would still be possible. One thought is to do the folding inside lowerBr() instead of lowerIcmp(), by looking backward for a suitable icmp instruction. The problem here is that the icmp lowering code may leave lowered instructions that can't easily be dead-code eliminated, e.g. instructions lacking a dest variable. Instead, before lowering a basic block, we do a prepass on the block to identify folding candidates. For the icmp/br example, the prepass would tentatively delete the icmp instruction and then the br lowering would fold in the icmp. This folding can also be extended to several producers: icmp (i32 operands), icmp (i64 operands), fcmp, trunc .. to i1; and several consumers: br, select, sext, zext. This CL starts with 2 combinations: icmp32 paired with br & select. Other combinations will be added in later CLs. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4162 BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1141213004
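The prepass described above can be sketched on a toy instruction model. This is only an illustration under simplifying assumptions: `Inst`, `Kind`, and `markFoldingCandidates` are hypothetical names rather than Subzero's classes, and the sketch assumes the icmp result is not used outside the block.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction model (hypothetical, not Subzero's Inst classes).
enum class Kind { Icmp32, Br, Select, Other };

struct Inst {
  Kind K;
  std::string Dest;              // empty if the instruction defines nothing
  std::vector<std::string> Srcs; // variables read by the instruction
  bool Deleted;                  // set when the producer is tentatively deleted
};

static bool isConsumer(Kind K) { return K == Kind::Br || K == Kind::Select; }

// Prepass over one basic block: tentatively delete each 32-bit icmp whose
// result is read exactly once in the block, by a br or select. Returns the
// number of producers marked for folding.
size_t markFoldingCandidates(std::vector<Inst> &Block) {
  size_t NumFolded = 0;
  for (size_t I = 0; I < Block.size(); ++I) {
    if (Block[I].K != Kind::Icmp32)
      continue;
    size_t Uses = 0;
    bool ConsumerOk = false;
    for (size_t J = I + 1; J < Block.size(); ++J) {
      for (const std::string &Src : Block[J].Srcs) {
        if (Src == Block[I].Dest) {
          ++Uses;
          ConsumerOk = isConsumer(Block[J].K);
        }
      }
    }
    // With exactly one use, ConsumerOk describes that single user.
    if (Uses == 1 && ConsumerOk) {
      Block[I].Deleted = true;
      ++NumFolded;
    }
  }
  return NumFolded;
}
```

Unlike the original look-ahead peephole, unrelated instructions sitting between the icmp and its consumer do not block the fold here, because the scan covers the whole block.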
-
- 16 May, 2015 1 commit
-
-
Jan Voung authored
Previously it would print both the targets compiled-in and the target requested on the commandline, but we really only care about what's compiled-in. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1132883003
-
- 14 May, 2015 1 commit
-
-
Jan Voung authored
Wasn't sure how to allow TargetX8632 and TargetARM32 to both define "ConstantInteger32::emit(GlobalContext *)", and define them differently if both targets happen to be ifdef'ed into the code. Rearranged things so that it's now "TargetFoo::emit(ConstantInteger32 *)", so that each TargetFoo can have a separate definition. Some targets may allow emitting some types of constants while other targets do not (64-bit int for x86-64?). Also they emit constants with a different style. E.g., the prefix for x86 is "$" while the prefix for ARM is "#" and there isn't a prefix for mips(?). Renamed emitWithoutDollar to emitWithoutPrefix. Did this sort of multi-method dispatch via a visitor pattern, which is a bit verbose though. We may be able to remove the emitWithoutDollar/Prefix for ConstantPrimitive by just inlining that into the few places that need it (only needed for ConstantInteger32). This undoes the unreachable methods added by: https://codereview.chromium.org/1017373002/diff/60001/src/IceTargetLoweringX8632.cpp The only place extra was for emitting calls to constants. There was already an inlined instance for OperandX8632Mem. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1129263005
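The visitor-style double dispatch described above might look roughly like the following. It is a simplified sketch: the class names mirror those in the commit message, but the method signatures (for example, taking a `std::ostream`) and the helper `emitToString` are assumptions made for illustration, not the real Subzero interfaces.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

class ConstantInteger32;

// Each target decides how a constant is printed (simplified stand-in).
class Target {
public:
  virtual ~Target() = default;
  virtual void emit(const ConstantInteger32 &C, std::ostream &Str) const = 0;
};

class Constant {
public:
  virtual ~Constant() = default;
  // First dispatch: on the dynamic type of the constant.
  virtual void emit(const Target &T, std::ostream &Str) const = 0;
};

class ConstantInteger32 : public Constant {
public:
  explicit ConstantInteger32(int32_t V) : Value(V) {}
  int32_t getValue() const { return Value; }
  // Second dispatch: on the dynamic type of the target.
  void emit(const Target &T, std::ostream &Str) const override {
    T.emit(*this, Str);
  }

private:
  int32_t Value;
};

class TargetX8632 : public Target {
public:
  void emit(const ConstantInteger32 &C, std::ostream &Str) const override {
    Str << "$" << C.getValue(); // x86 immediate prefix
  }
};

class TargetARM32 : public Target {
public:
  void emit(const ConstantInteger32 &C, std::ostream &Str) const override {
    Str << "#" << C.getValue(); // ARM immediate prefix
  }
};

// Hypothetical convenience wrapper used only to demonstrate the dispatch.
std::string emitToString(const Constant &C, const Target &T) {
  std::ostringstream Str;
  C.emit(T, Str);
  return Str.str();
}
```

Because each target owns its own `emit` overload, both targets can be compiled into the same binary without clashing definitions, at the cost of the extra boilerplate the commit message mentions.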
-
- 12 May, 2015 2 commits
-
-
Jan Voung authored
Modify run-pnacl-sz to pass in the correct assembler/disassembler flags for ARM when not using the integrated assembler. Model the "ret" pseudo instruction (special form of "bx" inst). Separate from "bx" to allow epilogue insertion to find the terminator. Add a flag "--skip-unimplemented" to skip through all of the "Not yet implemented" assertions, and use that in the test. Set up a stack trace printer when ALLOW_DUMP so that the UnimplementedError prints out some useful information of *which* case is unimplemented. Change the .type ...,@function from @function to %function. ARM assembler seems to only like %function because "@" is a comment character. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1136793002
-
Karl Schimpf authored
Dependent on https://codereview.chromium.org/1122423005 being committed first. BUG=https://code.google.com/p/nativeclient/issues/detail?id=4164 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1130313002
-
- 07 May, 2015 1 commit
-
-
Jim Stichnoth authored
The original code for 64-bit icmp lowering had separate cases for eq/ne versus other conditions, mostly because eq/ne need two 32-bit comparisons while the others need three. However, with small changes, we can handle everything uniformly, simplifying the code. This gets thoroughly tested by the test_icmp cross test. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4162 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1130023002
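To make the "two versus three 32-bit comparisons" point concrete, here is a source-level sketch of comparing 64-bit values through their 32-bit halves. It illustrates the idea only and is not the x86 instruction sequence Subzero emits; all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

static uint32_t hi32(uint64_t V) { return static_cast<uint32_t>(V >> 32); }
static uint32_t lo32(uint64_t V) { return static_cast<uint32_t>(V); }

// eq needs two 32-bit comparisons: upper halves and lower halves.
bool icmpEq64(uint64_t A, uint64_t B) {
  return hi32(A) == hi32(B) && lo32(A) == lo32(B);
}

// Unsigned less-than needs up to three 32-bit comparisons.
bool icmpUlt64(uint64_t A, uint64_t B) {
  if (hi32(A) < hi32(B))
    return true;
  if (hi32(A) > hi32(B))
    return false;
  return lo32(A) < lo32(B);
}

// Signed less-than: the upper halves compare as signed values, while the
// lower halves always compare as unsigned values.
bool icmpSlt64(int64_t A, int64_t B) {
  const uint64_t UA = static_cast<uint64_t>(A);
  const uint64_t UB = static_cast<uint64_t>(B);
  const int32_t HiA = static_cast<int32_t>(hi32(UA));
  const int32_t HiB = static_cast<int32_t>(hi32(UB));
  if (HiA < HiB)
    return true;
  if (HiA > HiB)
    return false;
  return lo32(UA) < lo32(UB);
}
```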
-
- 04 May, 2015 1 commit
-
-
Jim Stichnoth authored
For an example like:
  %a = icmp eq i32 %b, %c
The original icmp lowering sequence for i8/i16/i32 was something like:
  cmpl b, c
  movb 1, a
  je label
  movb 0, a
label:
The improved sequence is:
  cmpl b, c
  sete a
In O2 mode, this doesn't help when successive compare/branch instructions are fused, but it does help when the boolean result needs to be saved and later used. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1118353005
-
- 30 Apr, 2015 3 commits
-
-
Karl Schimpf authored
Fixes code to follow new editing constants in class NaClMungedBitcode rather than obsolete editing constants in class NaClBitcodeMunger. BUG=None R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1120853002
-
Jan Voung authored
This is to conditionally (ifdef) include only the enabled target assemblers. Also rename the assembler's "x86" namespace to "X8632" for similar reasons. The namespace was created to hide generic sounding classes like "Address" which are used all over the assembler. Plop the somewhat empty AssemblerARM32 in an ARM32 namespace for consistency. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1114223002
-
Jim Stichnoth authored
It's sometimes useful to know whether a use of a stack variable (as opposed to a physical register) is the last use of that variable. For example, in a code sequence like: movl %edx, 24(%esp) movl 24(%esp), %edx it would be nice to know whether the code sequence is merely bad (i.e., 24(%esp) will be used later), or horrible (i.e., this ends 24(%esp)'s live range). We add stack variables to the per-instruction live-range-end annotation, but not to the per-block live-in and live-out annotations, because the latter would clutter the output greatly while adding very little actionable information. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4135 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1113133002
-
- 29 Apr, 2015 1 commit
-
-
Jim Stichnoth authored
The "pnacl-sz --asm-verbose=1" mode annotates the asm output with physical register liveness information, including which registers are live at the beginning and end of each basic block, and which registers' live ranges end at each instruction. Computing this information requires a final liveness analysis pass. One of the side effects of liveness analysis is to remove dead instructions, which happens when the instruction's dest variable is not live and the instruction lacks important side effects. In some cases, direct manipulation of physical registers was missing extra fakedef/fakeuse/etc., and as a result these instructions could be eliminated, leading to incorrect code. Without --asm-verbose, these instructions were being created after the last run of liveness analysis, so they had no chance of being eliminated and everything was fine. But with --asm-verbose, some instructions would be eliminated. This CL fixes the omissions so that the resulting code is runnable. An alternative would be to add a flag to liveness analysis directing it not to dead-code eliminate any more instructions. However, it's better to get the liveness right in case future late-stage optimizations rely on it. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4135 TEST= pydir/szbuild_spec2k.py --filetype=asm -v --sz=--asm-verbose=1 --force R=jvoung@chromium.org, kschimpf@google.com Review URL: https://codereview.chromium.org/1113683002
-
- 28 Apr, 2015 1 commit
-
-
Jim Stichnoth authored
In an earlier version of Subzero, the text output stream object was stack-allocated within main. A later refactoring moved its allocation into a helper function, but it was still being stack-allocated, which was bad when the helper function returned. This change allocates the object via "new", which fixes that problem, but reveals another problem: the raw_ostream object for some reason doesn't finish writing everything to disk, yielding a truncated output file. This is solved in the style of the ELF streamer, by using raw_fd_ostream instead. BUG= none R=kschimpf@google.com Review URL: https://codereview.chromium.org/1111603003
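The lifetime problem described above can be sketched with standard-library streams standing in for LLVM's raw_ostream classes (an assumption made to keep the example self-contained). The helper names are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>

// Buggy shape (shown only as a comment): the stream lives on the helper's
// stack, so the returned pointer dangles once the helper returns.
//   std::ostream *makeStreamBroken() { std::ostringstream S; return &S; }

// Fixed shape: heap-allocate the stream and hand ownership to the caller.
std::unique_ptr<std::ostringstream> makeStream() {
  return std::unique_ptr<std::ostringstream>(new std::ostringstream());
}

// The stream outlives the helper that created it, so writing is safe.
std::string writeGreeting() {
  std::unique_ptr<std::ostringstream> Str = makeStream();
  *Str << "hello";
  return Str->str();
}
```

This sketch covers only the allocation fix, not the truncated-output issue that the switch to raw_fd_ostream addresses.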
-
- 22 Apr, 2015 2 commits
-
-
Karl Schimpf authored
Adds a notion of an (optional) error stream to the existing log and emit streams. If not specified, the log stream is used. Error messages in parser/translation are sent to this new error stream. In the browser compiler server, a separate error (string) stream is created to capture errors. Method onEndCallBack returns the contents of the error stream (if non-empty) instead of a generic error message. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4138 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1052833003
-
Jan Voung authored
Later commits will add more information, but this tests the conditional compilation and build setup. One way to do conditional compilation: determine this early, at LLVM configure/CMake time. Configure will fill in the template of SZTargets.def.in to get a SZTargets.def file. LLVM change: https://codereview.chromium.org/1084753002/ NaCl change: https://codereview.chromium.org/1082953002/ I suppose an alternative is to fill in the .def file via -D flags in CXXFLAGS. For conditional lit testing, pnacl-sz dumps the attributes when given the --build-atts so we just build on top of that. We do that instead of going the LLVM way of filling in a lit.site.cfg.in -> lit.site.cfg at configure/CMake time. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1075363002
-
- 21 Apr, 2015 2 commits
-
-
Jim Stichnoth authored
If you switch between "cmake" and "autoconf" toolchain builds, and neglect to clean out pnacl_newlib_raw/ in between, the wrong libgtest and libgtest_main may get pulled in for the autoconf build, leading to an assertion failure in "make check-unit". This tweak fixes that problem by rejiggering the lib search path. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1099093005
-
Jim Stichnoth authored
The CMAKE=1 option is no longer needed. Pretty much all the tools we need are now in pnacl_newlib_raw/bin, so use PNACL_BIN_PATH set to that instead of using LLVM_BIN_PATH and BINUTILS_BIN_PATH. However, for the autoconf build, libgtest and libgtest_main and clang-format are only under the llvm_x86_64_linux_work directory, so they need special casing. This also means that you have to actually do an LLVM build and not blow away the work directory in order to "make check-unit" or "make format". BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1085733002
-
- 16 Apr, 2015 5 commits
-
-
Karl Schimpf authored
Same as CL https://codereview.chromium.org/1071423003 (which has LGTM). BUG= https://code.google.com/p/nativeclient/issues/detail?id=4138 Review URL: https://codereview.chromium.org/1097563003
-
Karl Schimpf authored
This reverts commit 187b3dfa. A unit test fails when it shouldn't. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4138 Review URL: https://codereview.chromium.org/1071423003
-
Karl Schimpf authored
This reverts commit a7340883. I reverted the wrong patch! BUG= https://code.google.com/p/nativeclient/issues/detail?id=4138 Review URL: https://codereview.chromium.org/1089323005
-
Karl Schimpf authored
This reverts commit d8fb3d33. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4138 Review URL: https://codereview.chromium.org/1091123002
-
Karl Schimpf authored
The method TopLevelParser::ErrorAt applies a lock to print the error message. Unfortunately, it keeps the lock longer than necessary, resulting in deadlock (on following fatal message) if error recovery is not allowed. Fixed by limiting scope of lock to only apply to the printing of the error message. Modified ClFlags to allow a "reset", and made ClFlags modifiable by bitcode munge tests. This allowed us to test this problem as a unit test. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4138 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1091023002
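The deadlock and its fix can be sketched with a non-recursive `std::mutex`. This is a minimal model, not the actual TopLevelParser code: `ErrorReporter`, `errorAt`, `fatal`, and `contents` are hypothetical names, and `fatal` only logs here instead of terminating.

```cpp
#include <cassert>
#include <mutex>
#include <sstream>
#include <string>

class ErrorReporter {
public:
  // Reports an error. If recovery is not allowed, follows up with a fatal
  // message, which needs the same lock.
  bool errorAt(const std::string &Message, bool AllowRecovery) {
    {
      // Hold the lock only while printing the error message.
      std::lock_guard<std::mutex> Guard(StreamLock);
      Log << "error: " << Message << "\n";
    } // Lock released here, before fatal() tries to take it again.
    if (!AllowRecovery)
      fatal("unable to continue");
    return false;
  }

  // Would deadlock if called while StreamLock were still held by errorAt().
  void fatal(const std::string &Message) {
    std::lock_guard<std::mutex> Guard(StreamLock);
    Log << "fatal: " << Message << "\n";
  }

  std::string contents() {
    std::lock_guard<std::mutex> Guard(StreamLock);
    return Log.str();
  }

private:
  std::mutex StreamLock;
  std::ostringstream Log;
};
```

If the guard in `errorAt` covered the whole function body, the nested `fatal` call would block forever on the same mutex, which is the deadlock the commit describes.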
-
- 10 Apr, 2015 1 commit
-
-
Jan Voung authored
This is to go with toolchain_build changes which make LLVM cmake also use libc++: https://codereview.chromium.org/978963002/ May help with the memory sanitizer build, which wants most code to be built with memory sanitizer (e.g., make a special build of libc++). BUG= https://code.google.com/p/nativeclient/issues/detail?id=4119 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1074253002
-
- 09 Apr, 2015 2 commits
-
-
Jim Stichnoth authored
To make this work, Subzero provides its own RandomShuffle() as a replacement for std::random_shuffle(), and the Subzero implementation doesn't depend on the stdlib implementation. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4129 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1072913002
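A shuffle that does not depend on the standard library's implementation can be sketched as a Fisher-Yates loop driven by a self-contained generator. The `SimpleRng` class (a basic linear congruential generator) and this `randomShuffle` signature are assumptions for illustration and may differ from Subzero's actual RandomShuffle().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Small linear congruential generator, so results do not depend on which
// standard library the program was built against.
class SimpleRng {
public:
  explicit SimpleRng(uint64_t Seed) : State(Seed) {}
  // Returns a value in [0, Bound); Bound must be nonzero.
  uint64_t next(uint64_t Bound) {
    State = State * 6364136223846793005ULL + 1442695040888963407ULL;
    return (State >> 33) % Bound;
  }

private:
  uint64_t State;
};

// Fisher-Yates shuffle: walk from the back, swapping each element with a
// randomly chosen element at or before it.
template <typename T>
void randomShuffle(std::vector<T> &Items, SimpleRng &Rng) {
  for (size_t I = Items.size(); I > 1; --I)
    std::swap(Items[I - 1], Items[Rng.next(I)]);
}
```

With the same seed, this produces the same permutation on every platform, which is the property the change is after.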
-
Jim Stichnoth authored
Otherwise, constant pools are emitted in hash table order, which can vary across systems. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4129 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1069453003
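One way to get a stable emission order from a hash table is to copy the entries out and sort them before emitting. The sketch below assumes a hypothetical pool that maps a 32-bit constant value to its label; the real Subzero constant pool types differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical pool: maps a constant's value to its label.
using ConstantPool = std::unordered_map<uint32_t, std::string>;
using PoolEntry = std::pair<uint32_t, std::string>;

// Returns the pool entries ordered by value, independent of how the hash
// table happens to iterate on a given system.
std::vector<PoolEntry> sortedPoolEntries(const ConstantPool &Pool) {
  std::vector<PoolEntry> Entries(Pool.begin(), Pool.end());
  std::sort(Entries.begin(), Entries.end());
  return Entries;
}
```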
-
- 07 Apr, 2015 1 commit
-
-
Jim Stichnoth authored
Autoconf is the default. Use "make -f Makefile.standalone CMAKE=1" to use the cmake build. BUG= none R=jvoung@chromium.org, mtrofin@chromium.org Review URL: https://codereview.chromium.org/998863002
-
- 06 Apr, 2015 1 commit
-
-
Mircea Trofin authored
- redundant ';' after namespace decls - mix of enums and integer values - use of && instead of & for bitwise operations BUG=NONE R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1062803005
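The last item in the list above is an easy bug to miss, so here is a small illustration of how `&&` and `&` differ when testing flag bits. The flag names and helper functions are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum : uint32_t { FlagA = 1u << 0, FlagB = 1u << 1 };

// Buggy: logical AND only asks whether both operands are nonzero, so it
// reports a match even when the masks share no bits.
bool hasFlagBuggy(uint32_t Flags, uint32_t Mask) { return Flags && Mask; }

// Fixed: bitwise AND tests whether any bit of Mask is set in Flags.
bool hasFlag(uint32_t Flags, uint32_t Mask) { return (Flags & Mask) != 0; }
```

With `Flags == FlagA` and `Mask == FlagB`, the buggy version returns true while the fixed version returns false.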
-
- 31 Mar, 2015 1 commit
-
-
Jan Voung authored
The \0 delimited string array that the browser sends doesn't have the program name and the IRT only tokenizes that and forwards it along. We need argv[0] to make the llvm CL parser happy (used for -help message, etc). Alternatively, we could have the IRT fill in a program name so that the argv is a real argv. That will involve less copying since the argv will be the right size to begin with, but prevents each app from customizing its argv[0] =/ BUG= https://code.google.com/p/nativeclient/issues/detail?id=4091 TEST= manual for now (construct the sel_universal script to only pass the "--build-atts" flag and see it exits without being swallowed, or pass "-Ofoo" and see an error + exit) R=mtrofin@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/1041843003
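A minimal sketch of the approach described above: tokenize the NUL-delimited buffer and prepend a program name so the result looks like a conventional argv. The function name `buildArgv` and its parameters are hypothetical, and the sketch assumes every argument in the buffer, including the last, is NUL-terminated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits a buffer of NUL-terminated strings into arguments, placing
// ProgramName first so that option parsers find a valid argv[0].
std::vector<std::string> buildArgv(const char *Buf, size_t Len,
                                   const std::string &ProgramName) {
  std::vector<std::string> Argv;
  Argv.push_back(ProgramName);
  size_t Start = 0;
  for (size_t I = 0; I < Len; ++I) {
    if (Buf[I] == '\0') {
      Argv.emplace_back(Buf + Start, I - Start);
      Start = I + 1;
    }
  }
  return Argv;
}
```

This copies each argument, which is the extra copying the commit message mentions as the cost of letting each app choose its own argv[0].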
-
- 30 Mar, 2015 1 commit
-
-
Jim Stichnoth authored
BUG= none R=mtrofin@chromium.org Review URL: https://codereview.chromium.org/1041303002
-
- 29 Mar, 2015 1 commit
-
-
Jim Stichnoth authored
When trying to do bisection debugging, the pnacl-llc translation was happening every time even if the pexe didn't change. This is because it was checking for a binary called 'llc' in the current directory, instead of an absolute path to pnacl-llc. (This check is done so that updating pnacl-llc triggers a rebuild of the bisection binary, similar to the check for an update of pnacl-sz.) BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1044623003
-
- 27 Mar, 2015 1 commit
-
-
Jan Voung authored
Handlers are represented as a "compile server" even though right now it can really only handle a single compile request. Then there can be a commandline-based server and a browser-based server. This server takes over the main thread. In the browser-based case the server can block, waiting on bytes to be pushed. This becomes a producer of bitcode bytes. The original main thread which did bitcode reading is now shifted to yet another worker thread, which is then the consumer of bitcode bytes. This uses an IRT interface for listening to messages from the browser: https://codereview.chromium.org/984713003/ TEST=Build the IRT core nexe w/ the above patch and compile w/ something like: echo """ readwrite_file objfile /tmp/temp.nexe---gcc.opt.stripped.pexe---.o rpc StreamInitWithSplit i(4) h(objfile) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) C(4,-O2\x00) * s() stream_file /usr/local/google/home/jvoung/pexe_tests/gcc.opt.stripped.pexe 65536 1000000000 rpc StreamEnd * i() s() s() s() echo "pnacl-sz complete" """ | scons-out/opt-linux-x86-32/staging/sel_universal \ -a -B scons-out/nacl_irt-x86-32/staging/irt_core.nexe \ --abort_on_error \ -- toolchain/linux_x86/pnacl_translator/translator/x86-32/bin/pnacl-sz.nexe echo """ readwrite_file nexefile /tmp/temp.nexe.tmp readonly_file objfile0 /tmp/temp.nexe---gcc.opt.stripped.pexe---.o rpc RunWithSplit i(1) h(objfile0) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(invalid) h(nexefile) * echo "ld complete" """ | /usr/local/google/home/nacl3/native_client/scons-out/opt-linux-x86-32/staging/sel_universal \ --abort_on_error \ -a -B \ scons-out/nacl_irt-x86-32/staging/irt_core.nexe \ -E NACL_IRT_OPEN_RESOURCE_BASE=toolchain/linux_x86/pnacl_translator/translator/x86-32/lib/ \ -E 
NACL_IRT_OPEN_RESOURCE_REMAP=libpnacl_irt_shim.a:libpnacl_irt_shim_dummy.a \ -- toolchain/linux_x86/pnacl_translator/translator/x86-32/bin/ld.nexe BUG= https://code.google.com/p/nativeclient/issues/detail?id=4091 R=kschimpf@google.com, stichnot@chromium.org Review URL: https://codereview.chromium.org/997773002
-
- 24 Mar, 2015 2 commits
-
-
Jim Stichnoth authored
The x86-32 xchg and xadd instructions are modeled using two source operands, one of which is a memory operand and the other ultimately a physical register. These instructions have a side effect of modifying both operands. During lowering, we need to specially express that the instruction modifies the Variable operand (since it doesn't appear as the instruction's Dest variable). This makes the register allocator aware of the Variable being multi-def, and prevents it from sharing a register with an overlapping live range. This was being partially expressed by adding a FakeDef instruction. However, FakeDef instructions are still allowed to be dead-code eliminated, and if this happens, the Variable may appear to be single-def, triggering the unsafe register sharing. The solution is to prevent the FakeDef instruction from being eliminated, via a FakeUse instruction. It turns out that the current register allocator isn't aggressive enough to manifest the bug with cmpxchg instructions, but the fix and tests are there just in case. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1020853011
-
Jan Voung authored
Otherwise you get: In file included from src/IceGlobalContext.cpp:21: In file included from src/IceCfg.h:21: src/IceGlobalContext.h:257:44: error: variable 'TLS' is uninitialized when used within its own initialization [-Werror,-Wuninitialized] ThreadContext *TLS = ICE_TLS_GET_FIELD(TLS); ~~~ ^~~ src/IceTLS.h:95:39: note: expanded from macro 'ICE_TLS_GET_FIELD' ^ So rename the local var to Tls. BUG=none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1030793002
-
- 23 Mar, 2015 3 commits
-
-
Jim Stichnoth authored
The non-mov-like SSE instructions generally require 16-byte aligned memory operands. The PNaCl bitcode ABI only guarantees 4-byte alignment or less on vector loads and stores. Subzero maintains stack alignment so stack memory operands are fine. We handle this by legalizing memory operands into a register wherever there is doubt. This bug was first discovered on the vector_align scons test. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4083 BUG= https://code.google.com/p/nativeclient/issues/detail?id=4133 R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1024253003
-
Jim Stichnoth authored
The gcc torture test suite has examples where there is a function call (to a routine that throws an exception or aborts or something), followed by an "unreachable" instruction, followed by more code that may e.g. return a value to the caller. In these examples, the code following the unreachable is itself unreachable. Problems arise when the unreachable code references a variable defined in the reachable code. This triggers a liveness consistency error because the use of the variable has no reaching definition. It's a bit surprising that LLVM actually allows this, but it does so we need to deal with it. The solution is, after initial CFG construction, do a traversal starting from the entry node and then delete any undiscovered nodes. There is code in Subzero that assumes Cfg::Nodes[i]->Number == i, so the nodes need to be renumbered after pruning. The alternative was to set Nodes[i]=nullptr and not change the node number, but that would mean peppering the code base with CfgNode null checks. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1027933002
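The pruning and renumbering step can be sketched on a toy CFG. This is an illustration under simplifying assumptions: `Node` and `pruneUnreachable` are hypothetical names, nodes are stored by value, and node 0 is taken to be the entry node.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
  size_t Number;             // invariant: Nodes[i].Number == i
  std::vector<size_t> Succs; // successor node numbers
};

// Removes nodes not reachable from the entry node (Nodes[0]) and renumbers
// the survivors so that Nodes[i].Number == i still holds afterwards.
void pruneUnreachable(std::vector<Node> &Nodes) {
  if (Nodes.empty())
    return;
  // Traversal from the entry node to discover reachable nodes.
  std::vector<bool> Reachable(Nodes.size(), false);
  std::vector<size_t> Worklist = {0};
  Reachable[0] = true;
  while (!Worklist.empty()) {
    size_t N = Worklist.back();
    Worklist.pop_back();
    for (size_t S : Nodes[N].Succs) {
      if (!Reachable[S]) {
        Reachable[S] = true;
        Worklist.push_back(S);
      }
    }
  }
  // Keep reachable nodes in their original order and record new numbers.
  std::vector<size_t> NewNumber(Nodes.size(), 0);
  std::vector<Node> Kept;
  for (size_t I = 0; I < Nodes.size(); ++I) {
    if (!Reachable[I])
      continue;
    NewNumber[I] = Kept.size();
    Kept.push_back(Nodes[I]);
  }
  // Rewrite node numbers and successor edges. Every successor of a kept
  // node is itself reachable, so the lookup is always valid.
  for (Node &N : Kept) {
    N.Number = NewNumber[N.Number];
    for (size_t &S : N.Succs)
      S = NewNumber[S];
  }
  Nodes.swap(Kept);
}
```

Renumbering keeps the index invariant intact, which avoids the null checks that the alternative (leaving holes in the node list) would require.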
-
Jim Stichnoth authored
When lowering a couple of atomic intrinsics down to a loop structure, a FakeUse on the memory address's base variable is created. However, if the memory address is a global constant, there is no base variable. So check for that and don't create a FakeUse if there is none. BUG= none TEST=synchronization_sync (scons test) R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1023673007
-