1. 23 Oct, 2014 1 commit
    • Subzero: Improve debugging controls, plus minor refactoring. · 088b2be2
      Jim Stichnoth authored
      1. Decorate the list of live-in and live-out variables with register assignments in the dump() output.  This helps one to assess register pressure.
      
      2. Fix a bug where the DisableInternal flag wasn't being honored for function definitions.
      
      3. Add a -translate-only=<symbol> option to limit translation to a single function or global variable.  This makes it easier to focus on debugging a single function.
      
      4. Change the -no-phi-edge-split option to -phi-edge-split and invert the meaning, to avoid double negatives.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/673783002
  2. 21 Oct, 2014 1 commit
  3. 20 Oct, 2014 2 commits
  4. 16 Oct, 2014 2 commits
  5. 15 Oct, 2014 5 commits
    • emitIAS for movsx and movzx. · 39d4aca3
      Jan Voung authored
      Force dest to be the full 32-bit reg instead of sometimes being
      a 16-bit reg. This is to save on an operand size prefix (and
      avoid passing the DestTy down to the dispatchers).
      
      BUG=none
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/647223004
    • Subzero: Speed up VariablesMetadata initialization. · 877b04e4
      Jim Stichnoth authored
      Currently, O2 calls VariablesMetadata::init() 4 times:
      
      - Twice for liveness analysis, where only multi-block use information is needed for dealing with sparse bit vectors.
      
      - Once for address mode inference, where single-definition information is needed.
      
      - Once for register allocation, where all information is needed, including the set of all definitions, which is needed for determining AllowOverlap.
      
      So we limit the amount of data we gather based on the actual need.
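A minimal sketch of the idea of gathering only what each caller needs. The `TrackingKind` levels, the `Access` record, and the `VarMetadataSketch` class are hypothetical names made up for illustration; they are not taken from the Subzero sources.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tracking levels, ordered from cheapest to most expensive.
enum class TrackingKind {
  MultiBlockOnly, // liveness: only "is this variable used in several blocks?"
  SingleDef,      // address mode inference: also the single definition
  AllDefs         // register allocation: also the full list of definitions
};

// One definition or use of a variable (illustrative record).
struct Access {
  std::size_t Var;
  int Block;
  bool IsDef;
  int InstNumber;
};

class VarMetadataSketch {
public:
  VarMetadataSketch(std::size_t NumVars, TrackingKind Kind)
      : Kind(Kind), FirstBlock(NumVars, -1), MultiBlock(NumVars, false),
        DefCount(NumVars, 0), FirstDef(NumVars, -1), Defs(NumVars) {}

  void record(const Access &A) {
    // Multi-block detection is needed at every level.
    if (FirstBlock[A.Var] == -1)
      FirstBlock[A.Var] = A.Block;
    else if (FirstBlock[A.Var] != A.Block)
      MultiBlock[A.Var] = true;
    // The cheapest level stops here and never touches definition data.
    if (Kind == TrackingKind::MultiBlockOnly || !A.IsDef)
      return;
    if (DefCount[A.Var]++ == 0)
      FirstDef[A.Var] = A.InstNumber;
    // Only the most expensive level keeps every definition.
    if (Kind == TrackingKind::AllDefs)
      Defs[A.Var].push_back(A.InstNumber);
  }

  bool isMultiBlock(std::size_t Var) const { return MultiBlock[Var]; }
  // Instruction number of the sole definition, or -1 if there is not
  // exactly one definition (or definitions were not tracked).
  int singleDef(std::size_t Var) const {
    return DefCount[Var] == 1 ? FirstDef[Var] : -1;
  }
  const std::vector<int> &allDefs(std::size_t Var) const { return Defs[Var]; }

private:
  TrackingKind Kind;
  std::vector<int> FirstBlock;
  std::vector<bool> MultiBlock;
  std::vector<int> DefCount;
  std::vector<int> FirstDef;
  std::vector<std::vector<int>> Defs;
};
```

In this sketch a liveness pass would construct the object with `TrackingKind::MultiBlockOnly` and skip all definition bookkeeping, while register allocation would ask for `TrackingKind::AllDefs`.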
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/650613003
    • Subzero: Class definition cleanup. · 7b451a92
      Jim Stichnoth authored
      For consistency, put deleted ctors at the beginning of the class
      definition.
      
      If the default copy ctor or assignment operator is not deleted,
      and the default implementation is used, leave it commented out to
      indicate it is intentional.
      
      Also, fixed one C++11 related TODO.
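The layout convention described in this message might look roughly like the following. Both class names are hypothetical and exist only to illustrate the two cases (deleted members first, or defaulted members left as a comment).

```cpp
#include <cassert>
#include <type_traits>

// Case 1: copying is forbidden, so the deleted members come first.
class NonCopyableSketch {
  NonCopyableSketch(const NonCopyableSketch &) = delete;
  NonCopyableSketch &operator=(const NonCopyableSketch &) = delete;

public:
  explicit NonCopyableSketch(int Value) : Value(Value) {}
  int getValue() const { return Value; }

private:
  int Value;
};

// Case 2: the compiler-generated copy operations are wanted, so they are
// left commented out to show that the omission is intentional.
class CopyableSketch {
  // CopyableSketch(const CopyableSketch &) = default;
  // CopyableSketch &operator=(const CopyableSketch &) = default;

public:
  explicit CopyableSketch(int Value) : Value(Value) {}
  int getValue() const { return Value; }

private:
  int Value;
};

static_assert(!std::is_copy_constructible<NonCopyableSketch>::value,
              "copy construction should be rejected at compile time");
static_assert(!std::is_copy_assignable<NonCopyableSketch>::value,
              "copy assignment should be rejected at compile time");
```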
      
      BUG= none
      R=jvoung@chromium.org, kschimpf@google.com
      
      Review URL: https://codereview.chromium.org/656123003
    • Subzero: Register allocator performance improvements and simplifications. · 5ce0abb8
      Jim Stichnoth authored
      This removes the redundancy between live ranges stored in the Variable and those stored in Liveness, by removing the Liveness copy.  After liveness analysis, live ranges are constructed directly into the Variable.
      
      Also, the LiveRangeWrapper is removed and Variable * is directly used instead.  The original thought behind LiveRangeWrapper was that it could be extended to include live range splitting.  However, when/if live range splitting is implemented, it will probably involve creating a new variable with its own live range, and carrying around some extra bookkeeping until the split is committed, so such a wrapper probably won't be needed.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/656023002
    • emitIAS for Shld and Shrd and the ternary and three-address ops. · 962befa4
      Jan Voung authored
      Give a different name to the crosstest .s and .o files depending on the
      CPU features as well. That way the SSE2 and SSE4.1 .s and .o are separate.
      
      The encodings for Pextrw and Pextrb/d... make me sad.
      
      BUG=none
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/656983002
  6. 14 Oct, 2014 2 commits
    • Subzero: Enhance the timer dump format. · abce6e56
      Jim Stichnoth authored
      This adds update counts to the output, e.g.:
      
      Total across all functions - Flat times:
          0.262297 (13.0%): [ 1287] linearScan
          0.243965 (12.1%): [ 1287] emit
      ...
      
      This is useful because some passes are called once per function and others are called several times per function.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/655563005
    • Subzero: Improve performance of liveness analysis and live range construction. · 4775255d
      Jim Stichnoth authored
      The key performance problem was that the per-block LiveBegin and LiveEnd vectors were dense with respect to the multi-block "global" variables, even though very few of the global variables are ever live within the block.  This led to large vectors needlessly initialized and iterated over.
      
      The new approach is to accumulate two small vectors of <variable,instruction_number> tuples (LiveBegin and LiveEnd) as each block is processed, then sort the vectors and iterate over them in parallel to construct the live ranges.
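The sort-and-merge step can be sketched as follows. This is an illustration under stated assumptions, not the code from the change: the function name, the `Event` and `Segment` aliases, and the treatment of live-in and live-out variables (extending to the block's first instruction or one past its last) are all made up for the example. It also assumes each variable has at most one begin and one end per block, and it ignores variables that are live through the whole block without any event.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using VarIndex = int;
using InstNumber = int;
using Event = std::pair<VarIndex, InstNumber>;     // <variable, instruction>
using Segment = std::pair<InstNumber, InstNumber>; // [begin, end]

// Turns one block's sparse begin/end events into live range segments.
void addBlockSegments(std::vector<Event> LiveBegin, std::vector<Event> LiveEnd,
                      InstNumber BlockFirst, InstNumber BlockLast,
                      std::map<VarIndex, std::vector<Segment>> &Ranges) {
  // Sorting by variable lets both vectors be walked in parallel.
  std::sort(LiveBegin.begin(), LiveBegin.end());
  std::sort(LiveEnd.begin(), LiveEnd.end());
  std::size_t B = 0, E = 0;
  while (B < LiveBegin.size() || E < LiveEnd.size()) {
    const bool HasB = B < LiveBegin.size();
    const bool HasE = E < LiveEnd.size();
    if (HasB && HasE && LiveBegin[B].first == LiveEnd[E].first) {
      const VarIndex Var = LiveBegin[B].first;
      const InstNumber Begin = LiveBegin[B].second;
      const InstNumber End = LiveEnd[E].second;
      if (Begin <= End) {
        // Defined and killed inside the block.
        Ranges[Var].push_back({Begin, End});
      } else {
        // Assumed case: a live-in range ends, then a live-out range begins.
        Ranges[Var].push_back({BlockFirst, End});
        Ranges[Var].push_back({Begin, BlockLast + 1});
      }
      ++B;
      ++E;
    } else if (HasB && (!HasE || LiveBegin[B].first < LiveEnd[E].first)) {
      // Begin without end: the variable is live-out of the block.
      Ranges[LiveBegin[B].first].push_back({LiveBegin[B].second, BlockLast + 1});
      ++B;
    } else {
      // End without begin: the variable is live-in to the block.
      Ranges[LiveEnd[E].first].push_back({BlockFirst, LiveEnd[E].second});
      ++E;
    }
  }
}
```

The work per block is proportional to the number of variables that actually begin or end there, rather than to the total number of multi-block variables.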
      
      Some of the anomalies in the original liveness analysis code have been straightened out:
      
      1. Variables have an IgnoreLiveness attribute to suppress analysis.  This is currently used only on the esp register.
      
      2. Instructions have a DestNonKillable attribute which causes the Dest variable not to be marked as starting a new live range at that instruction.  This is used when a variable is non-SSA and has more than one assignment within a block, but we want to treat it as a single live range.  This lets the variable have zero or one live range begins or ends within a block.  DestNonKillable is derived automatically for two-address instructions, and annotated manually in a few other cases.
      
      This is tested by comparing the O2 asm output in each Spec2K component.  In theory, the output should be the same except for some differences in pseudo-instructions output as comments.  However, some actual differences showed up, related to the i64 shl instruction followed by trunc to i32.  This turned out to be a liveness bug that was accidentally fixed.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/652633002
  7. 13 Oct, 2014 4 commits
  8. 09 Oct, 2014 1 commit
  9. 08 Oct, 2014 5 commits
  10. 07 Oct, 2014 4 commits
  11. 06 Oct, 2014 1 commit
    • emitIAS for icmp, and test, movss-reg, movq, movups, storep, storeq, tighten some of the Xmm ops · e4dc61bf
      Jan Voung authored
      The "test" instruction is used in very limited situations. I've made a best effort
      to fill in the possible forms (address for the first operand), but it's not tested,
      so I put the *untested* parts behind an assert. Otherwise it's very similar to
      icmp, so if it starts to be used and tested then the asserts can be taken out,
      and the code shared with icmp.
      
      Tighten some of the XMM dispatch/emitters. Most of those XMM instructions
      can only encode the variant where dest is a register. Rather than waste
      a slot for a NULL method pointer, just make the struct type have two variants
      instead of three.
      
      Fill out a couple of XMM instructions which *do* allow mem-ops as dest
      (mov instructions).
      
      BUG=none
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/624263002
  12. 04 Oct, 2014 2 commits
    • Subzero: Optimize a common live range overlap calculation. · df861f73
      Jim Stichnoth authored
      Call instruction lowering includes the FakeKill instruction, which creates several precolored variables, one for each scratch register.  The live range for each of these variables consists of a set of "point" ranges, one point for every FakeKill instruction.  The overlaps() logic is such that a point range never overlaps with an individual instruction, but it can overlap with a normal non-point range.
      
      It turns out that during register allocation, usually most of the variables on the Inactive list are these FakeKill variables.  The live range representation can be quite large if there are many calls in the function.  In the "Check for inactive ranges that have expired or reactivated" section, a lot of time was spent on overlapsStart() calls that were doomed to return false.
      
      This change lets the live range keep track of whether it contains non-point segments, and if not, optimize the overlaps(InstNumberT) method.
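A rough sketch of the shortcut, assuming segments are half-open intervals `[Start, End)` so that a point segment `[N, N)` covers no instruction. The class and member names are hypothetical, not the ones used in the change.

```cpp
#include <cassert>
#include <utility>
#include <vector>

class LiveRangeSketch {
public:
  void addSegment(int Start, int End) {
    Segments.push_back({Start, End});
    // Remember whether anything other than point segments was ever added.
    if (Start != End)
      HasNonPoint = true;
  }

  bool containsNonPoint() const { return HasNonPoint; }

  // Does the range cover the single instruction number Inst?
  bool overlapsInst(int Inst) const {
    // Fast path: a range made only of points can never cover an instruction,
    // so the (possibly long) segment list is not scanned at all.
    if (!HasNonPoint)
      return false;
    for (const auto &S : Segments)
      if (S.first <= Inst && Inst < S.second)
        return true;
    return false;
  }

private:
  std::vector<std::pair<int, int>> Segments;
  bool HasNonPoint = false;
};
```

For a variable whose range has one point per call site, the query becomes constant time regardless of how many calls the function contains.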
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/631483002
    • Handle GPR and vector shift ops. Handle pmull also. · 8bcca041
      Jan Voung authored
      For the integer shift ops, since the Src1 operand is forced
      to be an immediate or register (cl), it should be legal to
      have Dest+Src0 be either register or memory. However, we
      are currently only using the register form. It might be the
      case that shifts w/ Dest+Src0 as mem are less optimized
      on some micro-architectures though, since it has to load,
      shift, and store all in one operation, but I'm not sure.
      
      BUG=none
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/622113002
  13. 02 Oct, 2014 1 commit
  14. 01 Oct, 2014 5 commits
  15. 30 Sep, 2014 3 commits
    • Subzero: Rewrite the pass timing infrastructure. · c4554d78
      Jim Stichnoth authored
      This makes it much more useful for individual analysis and long-term translation performance tracking.
      
      1. Collect and report timings aggregated across the entire translation, instead of function-by-function.  If you really care about a single function, just extract it and translate it separately for analysis.
      
      2. Remove "-verbose time" and just use -timing.
      
      3. Collect two kinds of timings: cumulative and flat.  Cumulative measures the total time, even if a callee also times itself.  Flat only measures the currently active timer at the top of the stack.  The flat times should add up to 100%, but cumulative will usually add up to much more than 100%.
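The difference between the two kinds of timings can be illustrated with a small timer stack. This is a sketch, not the infrastructure from the change: the class name is hypothetical, timestamps are passed in explicitly to keep the example deterministic, and recursive timers with the same name are not handled.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

class TimerStackSketch {
public:
  void push(const std::string &Name, double Now) {
    chargeFlat(Now); // the previous top of stack stops accruing flat time
    Stack.push_back({Name, Now});
  }

  void pop(double Now) {
    assert(!Stack.empty());
    chargeFlat(Now);
    // Cumulative time spans the whole push..pop interval, callees included.
    Cumulative[Stack.back().Name] += Now - Stack.back().Start;
    Stack.pop_back();
  }

  double flat(const std::string &Name) const { return lookup(Flat, Name); }
  double cumulative(const std::string &Name) const {
    return lookup(Cumulative, Name);
  }

private:
  struct Frame {
    std::string Name;
    double Start;
  };

  // Flat time since the last push/pop goes only to the top of the stack.
  void chargeFlat(double Now) {
    if (!Stack.empty())
      Flat[Stack.back().Name] += Now - LastEvent;
    LastEvent = Now;
  }

  static double lookup(const std::map<std::string, double> &M,
                       const std::string &Name) {
    auto It = M.find(Name);
    return It == M.end() ? 0.0 : It->second;
  }

  std::vector<Frame> Stack;
  std::map<std::string, double> Flat, Cumulative;
  double LastEvent = 0.0;
};
```

If an outer timer runs from 0 to 10 and contains two inner timers of 3 units each, the flat times are 4, 3 and 3 (summing to the total of 10), while the cumulative times are 10, 3 and 3 (summing to 16).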
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/610813002
    • Subzero: Change the echoing in shellcmd(). · 118ca798
      Jim Stichnoth authored
      This makes it much easier to copy/paste the output.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/611983003
    • Handle imul, pcmpeq, pcmpgt. · 0ac50dcf
      Jan Voung authored
      Be sure to legalize 8-bit imul immediates (there is only the r/m form).
      Add a test for that, and cover a couple of other ops too...
      
      There is a one-byte-shorter form when Dest/Src0 == EAX and Src1 is not
      an immediate, but that isn't taken advantage of.
      
      Go ahead and add the optimization for 8-bit immediates for i16/i32
      (not allowed for i8). It shows up sometimes in spec, e.g., to multiply by 10.
      There is a lot of multiply by 4 as well, that we could strength-reduce.
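A minimal sketch of the short-immediate choice for the 32-bit register-to-register case only. The function name is hypothetical; the opcodes are the standard x86 ones (`6B /r ib` for a sign-extended 8-bit immediate, `69 /r id` for a full 32-bit immediate). The i16 form, memory operands, and the EAX-specific form mentioned above are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes `imul Dst, Src, Imm` for 32-bit registers numbered 0-7.
std::vector<uint8_t> encodeImul32(uint8_t Dst, uint8_t Src, int32_t Imm) {
  assert(Dst < 8 && Src < 8);
  std::vector<uint8_t> Bytes;
  // ModRM with mod=11 (register direct), reg=Dst, r/m=Src.
  const uint8_t ModRM = static_cast<uint8_t>(0xC0 | (Dst << 3) | Src);
  if (Imm >= -128 && Imm <= 127) {
    // Short form: the immediate fits in a sign-extended byte.
    Bytes.push_back(0x6B);
    Bytes.push_back(ModRM);
    Bytes.push_back(static_cast<uint8_t>(Imm));
  } else {
    // Long form: full 32-bit immediate, little-endian.
    Bytes.push_back(0x69);
    Bytes.push_back(ModRM);
    const uint32_t U = static_cast<uint32_t>(Imm);
    for (int I = 0; I < 4; ++I)
      Bytes.push_back(static_cast<uint8_t>(U >> (8 * I)));
  }
  return Bytes;
}
```

With this sketch, multiplying by 10 takes three bytes instead of six.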
      
      BUG=none
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/617593002
  16. 29 Sep, 2014 1 commit
    • Change some explicit type checks into using helper functions. · 3a569183
      Jan Voung authored
      For some arithmetic assembler methods, instead of checking
      IceType_i8 || IceType_i1, only allow IceType_i8 and assert if
      an i1 leaked to that stage (should have been vetted earlier
      by the bitcode reader / ABI checks). Could have looked up the
      type width and isIntegerArithmeticType, etc. in the property table,
      but that seemed a bit heavy for just checking one type
      (or one of two types).
      
      Also changed some f32 || f64 checks into just using
      isScalarFloatingType() which looks things up in a property table.
      Could alternatively just keep it as a simple f32 || f64 check,
      and I could change isScalarFloatingType()'s implementation.
      
      In some places where we assume something is either i32 or i64
      and do a select, change that into using a helper function
      so that we can do one compare, and then assert. Some of the
      asserts are really redundant (already within a branch which
      already checked that), but hopefully that disappears if
      we compile in release mode.
      Similar for f32 or f64 (which happened a lot in the assembler).
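The "one compare, then assert" pattern might be sketched like this. The `Type` enum and the helper names are illustrative stand-ins and should not be read as the exact declarations in the change.

```cpp
#include <cassert>

// Illustrative stand-in for the translator's type enum.
enum class Type { i1, i8, i16, i32, i64, f32, f64 };

// Returns true for i32 and false for i64. Any other type is a caller bug,
// which the assert catches in debug builds and which costs nothing when
// asserts are compiled out.
inline bool isInt32Asserting32Or64(Type Ty) {
  const bool Result = Ty == Type::i32;
  assert(Result || Ty == Type::i64);
  return Result;
}

// Same pattern for the scalar floating-point types.
inline bool isFloat32Asserting32Or64(Type Ty) {
  const bool Result = Ty == Type::f32;
  assert(Result || Ty == Type::f64);
  return Result;
}
```

A caller can then write something like `isInt32Asserting32Or64(Ty) ? emit32() : emit64()` with a single comparison on the hot path, where `emit32` and `emit64` are placeholders for whatever the two branches do.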
      
      BUG=none
      R=kschimpf@google.com, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/613483002