  1. 04 Oct, 2014 1 commit
    • Handle GPR and vector shift ops. Handle pmull also. · 8bcca041
      Jan Voung authored
      For the integer shift ops, since the Src1 operand is forced
      to be an immediate or a register (cl), it should be legal for
      Dest+Src0 to be either a register or memory. However, we
      currently use only the register form. Shifts with Dest+Src0
      in memory may be less optimized on some micro-architectures,
      since they have to load, shift, and store all in one
      operation, but I'm not sure.
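
      As a hedged illustration of the operand forms discussed above, the
      sketch below encodes a 32-bit shl with Src1 as an 8-bit immediate
      (C1 /4 ib) or as cl (D3 /4), and with Dest+Src0 as either a register
      or a bare [base] memory operand. RegOrMem, modRM, encodeShlImm, and
      encodeShlCl are hypothetical names for illustration, not Subzero's
      assembler API, and the memory form assumes a base register other
      than esp or ebp.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical operand: either a register or a [base] memory reference.
struct RegOrMem {
  bool IsMem;
  uint8_t Reg; // x86-32 register number: eax=0, ecx=1, edx=2, ebx=3, ...
};

// Builds a ModRM byte. Mod 3 selects a register operand; Mod 0 selects
// [base] with no displacement (not valid for esp=4 or ebp=5 as the base).
inline uint8_t modRM(uint8_t Mod, uint8_t RegField, uint8_t RM) {
  return static_cast<uint8_t>((Mod << 6) | (RegField << 3) | RM);
}

// shl r/m32, imm8  ->  C1 /4 ib
inline std::vector<uint8_t> encodeShlImm(RegOrMem Dest, uint8_t Imm) {
  assert(!Dest.IsMem || (Dest.Reg != 4 && Dest.Reg != 5));
  return {0xC1, modRM(Dest.IsMem ? 0 : 3, 4, Dest.Reg), Imm};
}

// shl r/m32, cl  ->  D3 /4
inline std::vector<uint8_t> encodeShlCl(RegOrMem Dest) {
  assert(!Dest.IsMem || (Dest.Reg != 4 && Dest.Reg != 5));
  return {0xD3, modRM(Dest.IsMem ? 0 : 3, 4, Dest.Reg)};
}
```

      The register and memory forms share an opcode and differ only in the
      ModRM byte, which is why both are legal for the same instruction.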
      
      BUG=none
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/622113002
  2. 02 Oct, 2014 1 commit
  3. 01 Oct, 2014 5 commits
  4. 30 Sep, 2014 3 commits
    • Subzero: Rewrite the pass timing infrastructure. · c4554d78
      Jim Stichnoth authored
      This makes it much more useful for individual analysis and long-term translation performance tracking.
      
      1. Collect and report timings aggregated across the entire translation, instead of function by function.  If you really care about a single function, just extract it and translate it separately for analysis.
      
      2. Remove "-verbose time" and just use -timing.
      
      3. Collect two kinds of timings: cumulative and flat.  Cumulative measures the total time spent under a timer, even if a callee also times itself.  Flat attributes time only to the currently active timer at the top of the stack.  The flat times should add up to 100%, but cumulative will usually add up to much more than 100%.
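
      The difference between cumulative and flat timings can be sketched
      with a small timer stack. This is a minimal illustration, not the
      infrastructure added by this commit: the TimerStack class and its
      methods are hypothetical, and it takes explicit tick counts instead
      of reading a clock so that the arithmetic is easy to follow.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical timer stack driven by explicit tick counts so the example is
// deterministic; a real implementation would read a clock instead.
class TimerStack {
public:
  void push(const std::string &Name, long long Now) {
    chargeTop(Now);
    Stack.push_back({Name, Now});
  }

  void pop(long long Now) {
    assert(!Stack.empty());
    chargeTop(Now);
    // Cumulative: everything between this timer's push and pop,
    // including time spent in nested timers.
    Cumulative[Stack.back().Name] += Now - Stack.back().Start;
    Stack.pop_back();
  }

  long long flat(const std::string &Name) const { return lookup(Flat, Name); }
  long long cumulative(const std::string &Name) const {
    return lookup(Cumulative, Name);
  }

private:
  struct Entry {
    std::string Name;
    long long Start;
  };

  // Flat: charge the elapsed interval only to the timer on top of the stack.
  void chargeTop(long long Now) {
    if (!Stack.empty())
      Flat[Stack.back().Name] += Now - LastEvent;
    LastEvent = Now;
  }

  static long long lookup(const std::map<std::string, long long> &M,
                          const std::string &Name) {
    auto It = M.find(Name);
    return It == M.end() ? 0 : It->second;
  }

  std::vector<Entry> Stack;
  std::map<std::string, long long> Flat, Cumulative;
  long long LastEvent = 0;
};
```

      With an outer timer running from tick 0 to 10 and an inner timer from
      tick 2 to 7, the flat times are 5 and 5 (summing to the 10 elapsed
      ticks), while the cumulative times are 10 and 5 (summing to 15). The
      sketch does not special-case a timer nested inside itself, which
      would be counted twice in the cumulative total.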
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/610813002
    • Subzero: Change the echoing in shellcmd(). · 118ca798
      Jim Stichnoth authored
      This makes it much easier to copy/paste the output.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/611983003
    • Handle imul, pcmpeq, pcmpgt. · 0ac50dcf
      Jan Voung authored
      Be sure to legalize 8-bit imul immediates (there is only the r/m form).
      Add a test for that, and cover a couple of other ops too...
      
      There is a one-byte-shorter form when Dest/Src0 == EAX and Src1 is not
      an immediate, but that isn't taken advantage of.
      
      Go ahead and add the optimization for 8-bit immediates for i16/i32
      (not allowed for i8). It shows up sometimes in SPEC, e.g., to multiply by 10.
      There are also many multiplies by 4 that we could strength-reduce.
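
      A minimal sketch of the 8-bit immediate optimization for the i32
      case, assuming register-to-register operands only: use the short
      form 6B /r ib when the immediate fits in a signed byte, and the long
      form 69 /r id otherwise. The function name encodeImulImm32 is
      hypothetical and is not the assembler method touched by this commit;
      the i16 case (which also needs an operand-size prefix) is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes "imul Dest, Src, Imm" for 32-bit registers (register numbers 0-7).
// Uses the short form 6B /r ib when Imm fits in a signed byte, otherwise
// the long form 69 /r id with a little-endian 32-bit immediate.
inline std::vector<uint8_t> encodeImulImm32(uint8_t Dest, uint8_t Src,
                                            int32_t Imm) {
  const uint8_t ModRM = static_cast<uint8_t>(0xC0 | (Dest << 3) | Src);
  if (Imm >= -128 && Imm <= 127)
    return {0x6B, ModRM, static_cast<uint8_t>(Imm)};
  const uint32_t U = static_cast<uint32_t>(Imm);
  return {0x69,
          ModRM,
          static_cast<uint8_t>(U),
          static_cast<uint8_t>(U >> 8),
          static_cast<uint8_t>(U >> 16),
          static_cast<uint8_t>(U >> 24)};
}
```

      Multiplying eax by 10 takes 3 bytes in the short form, versus 6 bytes
      when the immediate needs the full 32-bit encoding.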
      
      BUG=none
      R=stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/617593002
  5. 29 Sep, 2014 3 commits
    • Change some explicit type checks into using helper functions. · 3a569183
      Jan Voung authored
      For some arithmetic assembler methods, instead of checking
      IceType_i8 || IceType_i1, only allow IceType_i8 and assert if
      an i1 leaked to that stage (should have been vetted earlier
      by the bitcode reader / ABI checks). Could have looked up the
      type width and isIntegerArithmeticType, etc. in the property table,
      but that seemed a bit heavy for just checking one type
      (or one of two types).
      
      Also changed some f32 || f64 checks into just using
      isScalarFloatingType() which looks things up in a property table.
      Could alternatively just keep it as a simple f32 || f64 check,
      and I could change isScalarFloatingType()'s implementation.
      
      In some places where we assume something is either i32 or i64
      and do a select, change that into using a helper function
      so that we can do one compare, and then assert. Some of the
      asserts are really redundant (already within a branch which
      already checked that), but hopefully that disappears if
      we compile in release mode.
      Similarly for f32 or f64 (which happened a lot in the assembler).
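
      The "one compare, then assert" pattern described above might look
      roughly like the sketch below. The reduced Type enum and the helper
      names isInt32Asserting32Or64 and isFloat32Asserting32Or64 are
      assumptions made for illustration and may not match the helpers in
      this change.

```cpp
#include <cassert>

// Hypothetical subset of the type enum; the real one has more members.
enum Type {
  IceType_i1,
  IceType_i8,
  IceType_i16,
  IceType_i32,
  IceType_i64,
  IceType_f32,
  IceType_f64
};

// One compare decides the answer; the assert documents (and, in debug
// builds, checks) that only the two expected types reach this point.
inline bool isInt32Asserting32Or64(Type Ty) {
  const bool Result = Ty == IceType_i32;
  assert(Result || Ty == IceType_i64);
  return Result;
}

// Same pattern for the scalar floating-point types.
inline bool isFloat32Asserting32Or64(Type Ty) {
  const bool Result = Ty == IceType_f32;
  assert(Result || Ty == IceType_f64);
  return Result;
}
```

      Because assert() expands to nothing when NDEBUG is defined, the
      redundant checks cost nothing in a release build.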
      
      BUG=none
      R=kschimpf@google.com, stichnot@chromium.org
      
      Review URL: https://codereview.chromium.org/613483002
    • Subzero: Update link path after recent newlib install changes. · ed178a6d
      Jim Stichnoth authored
      BUG= none
      R=dschuff@chromium.org, jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/610273004
    • Subzero: Fix emission of 16-bit immediates. · 94c4c8e7
      Jim Stichnoth authored
      The operand type needs to be propagated into EmitImmediate() and EmitComplex() so that we know whether to emit the 2-byte or 4-byte form.
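
      A hedged sketch of why the operand type matters: the same value is
      emitted as 2 or 4 little-endian bytes depending on the type. The
      OperandType enum and the emitImmediate signature below are
      hypothetical stand-ins, not the actual EmitImmediate() interface.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical operand-type tag; only the widths relevant here.
enum class OperandType { I16, I32 };

// Appends Value in little-endian order, using 2 bytes for I16 and 4 for I32.
inline void emitImmediate(OperandType Ty, uint32_t Value,
                          std::vector<uint8_t> &Out) {
  const int NumBytes = (Ty == OperandType::I16) ? 2 : 4;
  for (int I = 0; I < NumBytes; ++I)
    Out.push_back(static_cast<uint8_t>(Value >> (8 * I)));
}
```

      Emitting 4 bytes where the instruction expects a 16-bit immediate
      would leave 2 stray bytes to be decoded as the start of the next
      instruction.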
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/607353002
  6. 27 Sep, 2014 1 commit
    • Subzero: Build both Debug and Release version of llvm2ice. · fddef241
      Jim Stichnoth authored
      Separate objects are built with -O0 and -O2.
      
      Separate executables are built:
        build/Release/llvm2ice - Release build
        build/Debug/llvm2ice - Debug build
      
      The executable built depends on whether the DEBUG make variable is set:
        make -f Makefile.standalone
        make -f Makefile.standalone DEBUG=1
      
      The llvm2ice file in the top-level directory is always removed and symlinked to the appropriate build.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/605093002
  7. 26 Sep, 2014 6 commits
  8. 25 Sep, 2014 3 commits
    • Fix bug in Subzero bitcode reader for insertelement instruction. · f0657dd8
      Karl Schimpf authored
      Instruction insertelement was incorrectly generating a result
      corresponding to the element type, instead of the updated
      vector type.
      
      BUG= None
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/604023003
    • Subzero: Automatically infer regalloc preferences and overlap. · ad403539
      Jim Stichnoth authored
      Originally, for a given Variable, register preference and overlap were manually specified.  That is, when choosing a free register for a Variable, it would be manually specified which (if any) related Variable would be a good choice for register selection, all things being equal.  Also, it allowed the rather dangerous "AllowOverlap" specification which let the Variable use its preferred Variable's register, even if their live ranges overlap.
      
      Now, all this selection is automatic, and the machinery for manual specification is removed.
      
      A few other changes in this CL:
      
      - Address mode inference leverages the more precise
      
      - Better regalloc dump messages to follow the logic
      
      - "-verbose most" enables all verbose options except regalloc and time
      
      - "-ias" is an alias for "-integrated-as"
      
      - Bug fix: prevent 8-bit register ah from being used in register allocation, unless it is pre-colored
      
      - Bug fix: the _mov helper where Dest is NULL wasn't always actually creating a new Variable
      
      - A few tests are updated based on slightly different O2 register allocation decisions
      
      The static stats actually improve slightly across the board (around 1%), except that frame size improves by 6-10%.  This is probably from smarter register allocation decisions, particularly involving phi lowering temporaries, where the manual hints weren't too good to start with.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/597003004
    • Clean up run script to use for testing Subzero. · 2a5324a1
      Karl Schimpf authored
      Adds the python script run-llvm2ice.py (was llvm2iceinsts.py) that
      automatically handles conversion of LLVM source to a PEXE file,
      and then runs llvm2ice on the corresponding PEXE file.
      
      Also, defines three paths in tests, based on the executable chosen:
      
        %lc2i - Directly reads from LLVM source, and converts to Subzero.
        %l2i  - Parses a PEXE file into LLVM IR, and converts to Subzero.
        %p2i  - Parses a PEXE directly into Subzero.
      
      Note that for all three executables, the same arguments can be used,
      making it easy to change how the input is handled.
      
      Also moves tests to use %p2i whenever possible.
      
      BUG= https://code.google.com/p/nativeclient/issues/detail?id=3892
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/600043002
  9. 24 Sep, 2014 1 commit
  10. 23 Sep, 2014 2 commits
  11. 22 Sep, 2014 3 commits
  12. 20 Sep, 2014 1 commit
    • Subzero: Change the way bitcast stack slot lowering is handled. · 800dab29
      Jim Stichnoth authored
      When doing a bitcast between int and FP types, the way lowering works
      is that a spill temporary is created, with regalloc weight of zero to
      inhibit register allocation, and this spill temporary is used for the
      cvt instruction.  If the other variable does not get
      register-allocated, then addProlog() forces the spill temporary to
      share the same stack slot as the other variable.
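
      The idea of moving bits between int and FP through a shared memory
      slot can be illustrated at the source level. This is only an analogy
      for the lowering described above, not the lowering code itself; the
      function names are hypothetical, and the expected bit patterns
      assume IEEE-754 single precision.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterprets the bits of a float as a 32-bit integer by going through
// memory: write the value as one type, read the same bytes as the other.
inline uint32_t bitcastFloatToInt(float F) {
  static_assert(sizeof(float) == sizeof(uint32_t), "sizes must match");
  uint32_t Slot; // plays the role of the shared stack slot
  std::memcpy(&Slot, &F, sizeof(Slot));
  return Slot;
}

// The reverse direction: integer bits reinterpreted as a float.
inline float bitcastIntToFloat(uint32_t I) {
  static_assert(sizeof(float) == sizeof(uint32_t), "sizes must match");
  float Slot;
  std::memcpy(&Slot, &I, sizeof(Slot));
  return Slot;
}
```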
      
      Currently, the lowering code passes this information to addProlog()
      by using the setPreferredRegister() mechanism.
      
      This is changed by creating a target-specific subclass of Variable, so
      that only the spill temporaries need to carry this extra information.
      
      Ultimately, many of the existing Variable fields will be refactored
      into a separate structure, and only generated/used as needed by
      various optimization passes.  The spill temporary linkage is the one
      thing that is still needed with Om1 when no optimizations are enabled,
      motivating this change.
      
      A couple of other minor cleanups are also done here.
      
      The key test is that the cast cross tests continue to work,
      specifically the bitcast tests.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/586943003
  13. 19 Sep, 2014 3 commits
  14. 18 Sep, 2014 2 commits
    • Subzero: Allow extra args to be passed to llc and Subzero. · 89906a5e
      Jim Stichnoth authored
      Use --llc to pass extra arguments to pnacl-translate.
      
      Use --sz to pass extra arguments to llvm2ice.
      
      The --stats argument is removed from the script because it is Subzero-only, and can now be done with --sz=--stats .
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/582593002
    • Subzero: Add branch optimization. · ff9c7063
      Jim Stichnoth authored
      1. Unconditional branch to the next basic block is removed.
      
      2. For a conditional branch with a "false" edge to the next basic block, remove the unconditional branch to the fallthrough block.
      
      3. For a conditional branch with a "true" edge to the next basic block, invert the condition and then handle it as in #2.
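
      The three rules above can be sketched as a single function over a
      simplified branch record. This is a minimal sketch, not the code in
      this commit: the Branch struct, the Cond enum, NoTarget, and
      optimizeBranch are hypothetical, blocks are identified by integer
      index, and an unconditional branch is assumed to store its target in
      TargetFalse.

```cpp
#include <cassert>
#include <utility>

enum class Cond { Eq, Ne, Lt, Ge };

inline Cond invert(Cond C) {
  switch (C) {
  case Cond::Eq: return Cond::Ne;
  case Cond::Ne: return Cond::Eq;
  case Cond::Lt: return Cond::Ge;
  case Cond::Ge: return Cond::Lt;
  }
  return C; // not reached for valid input
}

constexpr int NoTarget = -1;

struct Branch {
  bool IsConditional;
  Cond Condition;  // ignored when !IsConditional
  int TargetTrue;  // NoTarget when !IsConditional
  int TargetFalse; // unconditional target, or the "false" edge
};

// Applies the rules relative to NextBlock, the block laid out immediately
// after this branch. Returns false when the whole branch can be deleted.
inline bool optimizeBranch(Branch &Br, int NextBlock) {
  if (!Br.IsConditional)
    return Br.TargetFalse != NextBlock; // rule 1: drop jump to next block
  if (Br.TargetTrue == NextBlock && Br.TargetFalse != NextBlock) {
    // Rule 3: invert so that the fallthrough becomes the "false" edge.
    Br.Condition = invert(Br.Condition);
    std::swap(Br.TargetTrue, Br.TargetFalse);
  }
  if (Br.TargetFalse == NextBlock)
    Br.TargetFalse = NoTarget; // rule 2: no jump needed for the fallthrough
  return true;
}
```

      After the call, a conditional branch whose TargetFalse is NoTarget
      needs only the conditional jump; execution falls through to the next
      block otherwise.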
      
      This is enabled only for O2, particularly because inverting the branch condition is a marginally risky operation.
      
      This decreases the instruction count by about 5-6%.
      
      Also, --stats prints a final tally to make it easier to post-process the output.
      
      BUG= none
      R=jvoung@chromium.org
      
      Review URL: https://codereview.chromium.org/580903005
  15. 17 Sep, 2014 5 commits