- 15 Sep, 2015 2 commits
-
-
Jim Stichnoth authored
The problem is that the block profiling pass runs at the very beginning and commits to particular label strings, but the actual label names might change by emission time because of node reordering. There was actually something of a workaround - given a label string from the profile output, inspect the *profiled* asm code and search for the block containing the increment of the counter location, as the name of the counter location label is related to the label string in the profile output. However, it's tedious to mentally filter out the counter update code, and the counter update code has a huge impact on register allocation. The solution is to use a persistent number in CfgNode for constructing the label string, which doesn't change when the nodes are reordered. One note (independent of this change): Without block profiling, empty basic blocks are deleted and don't appear in the asm output. But with block profiling, these blocks are never empty because they contain profile update instructions. This means the profile output may contain labels that don't exist in the non-profiled asm. Another note: New nodes created as a result of edge splitting from advanced phi lowering are not profiled. BUG= none R=ascull@google.com, jpp@chromium.org Review URL: https://codereview.chromium.org/1341613002 .
-
Jim Stichnoth authored
BUG= https://code.google.com/p/nativeclient/issues/detail?id=4307 R=ascull@google.com Review URL: https://codereview.chromium.org/1337113008 .
-
- 14 Sep, 2015 2 commits
-
-
John Porto authored
BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1343783003 .
-
Andrew Scull authored
The C++11 syntax is more powerful as it also allows templating. This commit converts all type aliasing to the same syntax for consistency. BUG= R=jpp@chromium.org Review URL: https://codereview.chromium.org/1343843003 .
-
- 11 Sep, 2015 1 commit
-
-
John Porto authored
64-bit ints (and vectors) are future work. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=kschimpf@google.com, stichnot@chromium.org Review URL: https://codereview.chromium.org/1330933002 .
-
- 09 Sep, 2015 2 commits
-
-
Andrew Scull authored
1. The assembler tried to write to unaligned addresses but memcpy fixes this. 2. The InstKind and OperandKind enums allowed target specific kinds that did not fall in the defined range. Defining the maximum target kind in the enum sorts this problem. BUG= R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1311653003 .
-
Andrew Scull authored
3.8 has stricter warnings which LLVM doesn't pass so we turn them off when including the offending headers. BUG= R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1318883007 .
-
- 08 Sep, 2015 2 commits
-
-
Jim Stichnoth authored
Scraped non-pragma related warnings from a recent trybot run: wget -O - 'http://build.chromium.org/p/tryserver.nacl/builders/nacl-toolchain-win7-pnacl-x86_64/builds/3221/steps/llvm_i686_w64_mingw32%20%28build%29/logs/stdio' | grep subzero | grep warning | grep -v pragma BUG= none R=ascull@google.com, jpp@chromium.org Review URL: https://codereview.chromium.org/1315193020.
-
John Porto authored
BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1310863007.
-
- 04 Sep, 2015 3 commits
-
-
John Porto authored
aliases. BUG= R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/1319203005.
-
Jim Stichnoth authored
A NOASSERT/MINIMAL build complains that private field Func is unused, which turned out to be true because of shadowing. BUG= none R=ascull@google.com Review URL: https://codereview.chromium.org/1312903015
-
Andrew Scull authored
further improves spec2k performance and fixes the regression in ammp. Loops are identified using an extension to Tarjan's algorithm. BUG= R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1318553003.
-
- 03 Sep, 2015 1 commit
-
-
Jim Stichnoth authored
This is a reStructuredText version of https://docs.google.com/a/google.com/document/d/1DmLVyZqqwWSZ0is91lipTm4xsfjInSba2KBOOxatYhg/edit?usp=sharing which for technical reasons is only visible to @google.com accounts. Also update README.rst to be more accurate. BUG= none R=jfb@chromium.org, jpp@chromium.org, jvoung@chromium.org Review URL: https://codereview.chromium.org/1309073003.
-
- 31 Aug, 2015 1 commit
-
-
John Porto authored
This makes it easier and less error-prone to implement the relatively common pattern of looking at all the Variable operands contained within an instruction. BUG= none R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1323693002.
-
- 28 Aug, 2015 1 commit
-
-
Andrew Scull authored
Count the number of instructions that use a variable following the heuristic that more uses implies higher register priority. This is currently very simple but is precursor work for weighting variables by loop nest depth. BUG= R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1312433004.
-
- 25 Aug, 2015 2 commits
-
-
Karl Schimpf authored
This encorporate the changes introduced by llvm CL https://codereview.chromium.org/1310883003 (see for details). BUG=None R=dschuff@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/1312473006 .
-
Andrew Scull authored
BUG= R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/1310833003.
-
- 21 Aug, 2015 2 commits
-
-
Karl Schimpf authored
The bitcode reader for the switch insruction did not check if the branch labels were defined. This patch fixes the problem. Includes test for such a case. BUG=None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1303003002 .
-
Qining Lu authored
BUG= R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1306713002.
-
- 20 Aug, 2015 3 commits
-
-
Qining Lu authored
This removes random number generator from GlobalContext class and decouples different randomization passes 1. Add a new constructor for random number generator which merge three arguments to into one seed for the underlying implementation of random number generator. RandomNumberGenerator(uint64_t Seed, RandomizationPassesEnum RandomizationPassID, uint64_t Salt=0) param Seed: Should be the global random number seed passed through command line. param RandomizationPassID: Should be the ID for different randomization passes. param Salt: Should be an additional integer salt, default to be 0. 2. Move the creation of random number generators to the call sites of randomization passes. Each randomization pass create its own random number generator with specific salt value. Function reordering: Salt = 0 (default) Basic Block reordering: Salt = Function Sequence Number Global Variable reordering: Salt = 0 (default) Pooled Constants reordering: Salt = Constants' Kind value (return of getKind()) *Jump Tables: Salt = 0 Nop Insertion: Salt = Function Sequence Number Register Alloc Randomization: Salt = (Function Sequence Number << 1) ^ (Kind == RAK_Phi ? 0u : 1u) Constants Blinding: Salt = Function Sequence Number *Jump tables are treated as pooled constants, but without Kind value as salt. BUG= R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1300993002. -
Andrew Scull authored
The memory intrinsics are only optimized at -O1 and higher unless the -fmem-intrin-opt flag is set to force to optimization to take place. This change also introduces the xchg instruction for two register operands. This is no longer used in the memory intrinsic lowering (or by anything else) but the implementation is left for future use. BUG= R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/1278173009.
-
Karl Schimpf authored
Changes to use arena allocator of the CFG associated with function, for vectors in the function parser. BUG=None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1293343003 .
-
- 17 Aug, 2015 1 commit
-
-
Karl Schimpf authored
CL 1282523002 changed the bitcode parser from using a vector, to using an unordered map. This was done because one could forward reference a local variable, and would freeze the computer trying to allocate a vector large enough to contain the index. This patch goes back to using vectors. To fix the forward variable reference, we use the number of bytes in the function to determine if the index is possible. This stops very large (probematic) vector resizes from happening. BUG=None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1293923002 .
-
- 14 Aug, 2015 1 commit
-
-
Karl Schimpf authored
Changing the code to "preallocate" basic blocks in a vector, rather than dynamically creating on demand. This has the advantage of not requiring basic blocks to be sorted after the bitcode is parsed. This also means that the name of the basic blocks remain constant, even during parsing, making debugging easier. The drawback is that the DECLAREBLOCKS bitcode record of a function block can define a very large number of basic blocks. To control this, we look at the function block size (within the bitstream) to determine the maximal number of basic blocks that could be defined. If the DECLAREBLOCKS record specifies a number larger than this, we generate an error and recover (if applicable). We also add an cleanup test that confirms the number of declared basic blocks correspond to the number of basic blocks defined in the function. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4261 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1297433002 .
-
- 12 Aug, 2015 2 commits
-
-
John Porto authored
This CL modifies the x86 instruction selection template to allow native 64-bit GPR support. It also enables x86-64 crosstests. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4077 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1273153002.
-
Karl Schimpf authored
The bitcode parser contained a number of TODO commments about removing error recovery once the parser was implemented. This was based on the fact that (1) pnacl-sz in the browser didn't need it, and (2) it would be trivial to remove. The advantage of leaving error recovery in is that multiple errors can be found at once. It turns out it isn't so easy to remove, since several methods assume that recovery can be applied, and hence do not need propagate back up an (optional) error code. Parsing backs out relatively quickly anyway, since the code exits between bitcode records anyway. BUG=None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1286513002 .
-
- 10 Aug, 2015 2 commits
-
-
Jim Stichnoth authored
1. Fix MINIMAL build. (a) Add a void cast to a var only used in asserts. (b) Use "REQUIRES:" instead of "REQUIRES" in a .ll file. 2. Use StrError instead of StrDump for errors. 3. Use a lambda instead of a functor because C++11. 4. Explicit check for -filetype=obj in a non-dump-enabled build, to avoid cryptic downstream error messages. 5. Run "make format" which was neglected earlier. BUG= none R=kschimpf@google.com Review URL: https://codereview.chromium.org/1284493003.
-
Karl Schimpf authored
The previous code used a vector to hold local values associated with indices in the bitcode file. The problem was that the vector would be expanded to match the index of a "variable index forward reference". If the index was very large, the program would freeze the computer trying to allocate an array large enough to contain the index. This patch fixes this by using a local unordered map instead of a vector. Hence, forward index references just add a sinle entry into the map. Note that this fix doesn't have a corresponding issue. However, the problem was made apparent from the problems noted in issues 4257 and 4261. BUG=None R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1282523002 .
-
- 09 Aug, 2015 1 commit
-
-
Jan Voung authored
Lower some instructions, without much guarantee of correctness. *Running* generated code will be risky because the register allocator isn't aware of register aliasing. Fill in v{add,div,mul,sub}.f{32,64}, vmov, vldr and vsqrt.f{32,64}. I tried to make the nacl-other-intrinsics test not explode, so added vsqrt too. That was pretty easy for sqrt, but then fabs tests also exploded. Those are not truly fixed but are currently "fixed" by adding a FakeDef to satisfy liveness. Propagate float/double arguments to the right register in lowerArguments, lowerCall, and propagate to s0/d0/q0 for lowerReturn. May need to double check the calling convention. Currently can't test call-ret because vpush/vpop for prologues and epilogues isn't done. Legalize FP immediates to make the nacl-other-intrinsics sqrt test happy. Use the correct type of load (vldr (.32 and .64 are optional) instead of ldr{b,h,,d}). Whether or not the float/vector instructions can be predicated is a bit interesting. The float/double ones can, but the SIMD versions cannot. E.g. vadd<cond>.f32 s0, s0, s1 is okay vadd<cond>.f32 q0, q0, q1 is not okay. For now, just omit conditions from instructions that may end up being reused for SIMD. Split up the fp.pnacl.ll test into multiple ones so that parts of lowering can be tested incrementally. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4076 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1266263003 .
-
- 08 Aug, 2015 1 commit
-
-
Jim Stichnoth authored
The problem was that Translator and subclasses need to have virtual destructors since they are used as unique_ptr<>. As a result, only the Translator base class destructor was being invoked. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4290 TEST= make -f Makefile.standalone ASAN=1 check-lit R=ascull@google.com Review URL: https://codereview.chromium.org/1281003003.
-
- 07 Aug, 2015 3 commits
-
-
Karl Schimpf authored
The code used to use a vector to hold global variables associated with indices. The problem was that the count record in the global vars block would generate variables for the given count (even if very large). To fix this, we created a local unordered map to associate indices with defined/referenced globals. After processing the global vars block, this unordered map is used to verify the size makes sense, and then install the recognized global variables into the (top-level) contents. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4257 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1278793006 .
-
Andrew Scull authored
Combined with memset inlining, this has shown an improvement of over 11% on the eon benchmark. This the only C++ program in spec2k and it seems C++ programs have a significantly larger number of memset/memcpy calls. Other benchmarks also showed improvement of ~5% (perlbmk, parser) while most had a 1-2% improvement. This commit also includes a refactoring of lowerMemset which is much more readable and also removed the fake use of the destination pointer register. BUG= R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/1279833005.
-
Jim Stichnoth authored
Specifically, if a variable is marked with IgnoreLiveness=true, then: 1. Completely avoid adding any segments to its live range 2. Assert that no one tries to add segments to its live range This is done in part by incorporating Variable::IgnoreLiveness into Liveness::RangeMask. Also, change a functor into a lambda because C++11. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1273823003.
-
- 06 Aug, 2015 4 commits
-
-
Jim Stichnoth authored
The symbol __Sz_block_profile_info needs to be explicitly globalized when combining llc and Subzero .o output, i.e. when the --include or --exclude options are used. BUG= none R=jpp@chromium.org Review URL: https://codereview.chromium.org/1275983003.
-
Karl Schimpf authored
The code used to use a vector to hold basic blocks associated with indices. This had two problem: 1) The "number of blocks" record would generate a vector of the given size (even if very large); and 2) Indices would expand the vector to define the index (even if very large). In most cases, such large values are incorrect. To fix this, this patch uses a map from block index, to the corresponding basic block. Only after all basic block indices have been processed, we check that the size makes sense, and convert it to a vector. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4261 R=jpp@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/1275953002 .
-
Andrew Scull authored
The IACI marks identify the code which should be analyzed with the IACA. The generated binaries are not executable due to the marks. This feature should only be used during develpoment when analyzing generated code so it is protected behind the --allow-iaca-marks flag. ScopedIacaMark is a helper class which opens mark and closes it at the end of the scope. This is useful when there are many returns as you don't have to write `_iaca_end()` before them all. BUG= R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1260093003.
-
John Porto authored
Specifically, it moves lowerArguments lowerRet addProlog addEpilog from the x86 lowering template to the concrete lowering implementations. BUG= https://code.google.com/p/nativeclient/issues/detail?id=4077 R=stichnot@chromium.org Review URL: https://codereview.chromium.org/1261383002.
-
- 05 Aug, 2015 3 commits
-
-
Jim Stichnoth authored
With a certain combination of memset arguments, legalizeToReg() is called but its result is unused. Om1 register allocation does not like this because it sees an infinite-weight variable with a definition but no uses. The simplest fix is to add a fake use. The problem shows up when building spec2k with -Om1. BUG= none R=ascull@google.com Review URL: https://codereview.chromium.org/1272823004.
-
Jim Stichnoth authored
When doing post phi lowering register allocation, the original code limited register allocation to pre-colored or infinite-weight variables with a non-empty live range within the new edge-split nodes. This limitation ends up missing some opportunities. Specifically, when a temporary is introduced to break a dependency cycle, e.g.: // a = phi(b) // b = phi(a) t = a a = b b = t then t was always stack-allocated, even if a physical register was available. In the new design, the RangeMask bitvector specifies which variables should have their live ranges tracked and computed. For normal liveness analysis, all variables are tracked. For post phi lowering liveness analysis, all variables created from phi lowering, plus all pre-colored variables, plus all infinite-weight variables, are tracked. The result is slightly better code quality, and sometimes the frame size is 1 or 2 words smaller. The hope was to narrow the 10% translation-time degradation in pnacl-llc.pexe compared to the old, hackish way of phi lowering, but that goal still proves to be elusive. BUG= none R=jvoung@chromium.org Review URL: https://codereview.chromium.org/1271923002.
-
John Porto authored
BUG= https://code.google.com/p/nativeclient/issues/detail?id=4077 R=jvoung@chromium.org, stichnot@chromium.org Review URL: https://codereview.chromium.org/1266673003.
-