Commit a3f57b9a by Jim Stichnoth

Subzero: Cleanly implement register allocation after phi lowering.

After finding a valid linearization of phi assignments, the old approach called a complicated target-specific method that lowered the phi assignments and did ad-hoc register allocation for them. The new approach uses existing target lowering to lower the assignments into mov/whatever instructions, and enhances the register allocator to forcibly spill and reload a register when one is needed but none are available. The new approach incrementally updates liveness and live ranges for newly added nodes and variable uses, to avoid having to expensively recompute it all.

Advanced phi lowering is now enabled on ARM, and constant blinding no longer needs to be disabled during phi lowering. Some of the metadata recording which CfgNode a local variable belongs to needed to be made non-const, in order to add spill/fill instructions to a CfgNode during register allocation.

Most of the testing came from spec2k. There are some minor differences in the output regarding stack frame offsets, probably related to the order in which new nodes are phi-lowered. The changes related to constant blinding were tested by running spec with "-randomize-pool-immediates=randomize -randomize-pool-threshold=8". Unfortunately, this appears to add about 10% to the translation time for 176.gcc. The cost is clear in the -timing output, so it can be investigated later; there is a TODO suggesting the possible cause and solution.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/1253833002 .
parent 448c16f0
......@@ -248,11 +248,50 @@ void Cfg::deletePhis() {
void Cfg::advancedPhiLowering() {
TimerMarker T(TimerStack::TT_advancedPhiLowering, this);
// Clear all previously computed live ranges (but not live-in/live-out bit
// vectors or last-use markers), because the follow-on register allocation is
// only concerned with live ranges across the newly created blocks.
for (Variable *Var : Variables) {
Var->getLiveRange().reset();
}
// This splits edges and appends new nodes to the end of the node
// list. This can invalidate iterators, so don't use an iterator.
SizeT NumNodes = getNumNodes();
SizeT NumVars = getNumVariables();
for (SizeT I = 0; I < NumNodes; ++I)
Nodes[I]->advancedPhiLowering();
TimerMarker TT(TimerStack::TT_lowerPhiAssignments, this);
if (true) {
// The following code does an in-place update of liveness and live ranges as
// a result of adding the new phi edge split nodes.
getLiveness()->initPhiEdgeSplits(Nodes.begin() + NumNodes,
Variables.begin() + NumVars);
TimerMarker TTT(TimerStack::TT_liveness, this);
// Iterate over the newly added nodes to add their liveness info.
for (auto I = Nodes.begin() + NumNodes, E = Nodes.end(); I != E; ++I) {
InstNumberT FirstInstNum = getNextInstNumber();
(*I)->renumberInstructions();
InstNumberT LastInstNum = getNextInstNumber() - 1;
// TODO(stichnot): May be able to speed up liveness and live range
// calculation by having it consider only pre-colored or infinite-weight
// variables. Could also simplify LinearScan::initForInfOnly() that way.
(*I)->liveness(getLiveness());
(*I)->livenessAddIntervals(getLiveness(), FirstInstNum, LastInstNum);
}
} else {
// The following code does a brute-force recalculation of live ranges as a
// result of adding the new phi edge split nodes. The liveness calculation
// is particularly expensive because the new nodes are not yet in a proper
// topological order and so convergence is slower.
//
// This code is kept here for reference and can be temporarily enabled in
// case the incremental code is under suspicion.
renumberInstructions();
liveness(Liveness_Intervals);
getVMetadata()->init(VMK_All);
}
Target->regAlloc(RAK_Phi);
}
// Find a reasonable placement for nodes that have not yet been
......
......@@ -270,33 +270,44 @@ bool sameVarOrReg(const Variable *Var, const Operand *Opnd) {
} // end of anonymous namespace
// This is the "advanced" version of Phi lowering for a basic block, in
// contrast to the simple version that lowers through assignments
// involving temporaries.
// This is the "advanced" version of Phi lowering for a basic block, in contrast
// to the simple version that lowers through assignments involving temporaries.
//
// All Phi instructions in a basic block are conceptually executed in
// parallel. However, if we lower Phis early and commit to a
// sequential ordering, we may end up creating unnecessary
// interferences which lead to worse register allocation. Delaying
// Phi scheduling until after register allocation can help unless
// there are no free registers for shuffling registers or stack slots
// and spilling becomes necessary.
// All Phi instructions in a basic block are conceptually executed in parallel.
// However, if we lower Phis early and commit to a sequential ordering, we may
// end up creating unnecessary interferences which lead to worse register
// allocation. Delaying Phi scheduling until after register allocation can help
// unless there are no free registers for shuffling registers or stack slots and
// spilling becomes necessary.
//
// The advanced Phi lowering starts by finding a topological sort of
// the Phi instructions, where "A=B" comes before "B=C" due to the
// anti-dependence on B. If a topological sort is not possible due to
// a cycle, the cycle is broken by introducing a non-parallel
// temporary. For example, a cycle arising from a permutation like
// "A=B;B=C;C=A" can become "T=A;A=B;B=C;C=T". All else being equal,
// prefer to schedule assignments with register-allocated Src operands
// earlier, in case that register becomes free afterwards, and prefer
// to schedule assignments with register-allocated Dest variables
// later, to keep that register free for longer.
// The advanced Phi lowering starts by finding a topological sort of the Phi
// instructions, where "A=B" comes before "B=C" due to the anti-dependence on B.
// Preexisting register assignments are considered in the topological sort. If
// a topological sort is not possible due to a cycle, the cycle is broken by
// introducing a non-parallel temporary. For example, a cycle arising from a
// permutation like "A=B;B=C;C=A" can become "T=A;A=B;B=C;C=T". All else being
// equal, prefer to schedule assignments with register-allocated Src operands
// earlier, in case that register becomes free afterwards, and prefer to
// schedule assignments with register-allocated Dest variables later, to keep
// that register free for longer.
//
// Once the ordering is determined, the Cfg edge is split and the
// assignment list is lowered by the target lowering layer. The
// specific placement of the new node within the Cfg node list is
// deferred until later, including after empty node contraction.
// Once the ordering is determined, the Cfg edge is split and the assignment
// list is lowered by the target lowering layer. Since the assignment lowering
// may create new infinite-weight temporaries, a follow-on register allocation
// pass will be needed. To prepare for this, liveness (including live range
// calculation) of the split nodes needs to be calculated, and liveness of the
// original node needs to be updated to "undo" the effects of the phi
// assignments.
// The specific placement of the new node within the Cfg node list is deferred
// until later, including after empty node contraction.
//
// After phi assignments are lowered across all blocks, another register
// allocation pass is run, focusing only on pre-colored and infinite-weight
// variables, similar to Om1 register allocation (except without the need to
// specially compute these variables' live ranges, since they have already been
// precisely calculated). The register allocator in this mode needs the ability
// to forcibly spill and reload registers in case none are naturally available.
void CfgNode::advancedPhiLowering() {
if (getPhis().empty())
return;
......@@ -314,11 +325,18 @@ void CfgNode::advancedPhiLowering() {
size_t NumPhis = 0;
for (Inst &I : Phis) {
auto Inst = llvm::dyn_cast<InstPhi>(&I);
auto *Inst = llvm::dyn_cast<InstPhi>(&I);
if (!Inst->isDeleted()) {
Variable *Dest = Inst->getDest();
Desc[NumPhis].Phi = Inst;
Desc[NumPhis].Dest = Inst->getDest();
Desc[NumPhis].Dest = Dest;
++NumPhis;
// Undo the effect of the phi instruction on this node's live-in set by
// marking the phi dest variable as live on entry.
SizeT VarNum = Func->getLiveness()->getLiveIndex(Dest->getIndex());
assert(!Func->getLiveness()->getLiveIn(this)[VarNum]);
Func->getLiveness()->getLiveIn(this)[VarNum] = true;
Inst->setDeleted();
}
}
if (NumPhis == 0)
......@@ -327,7 +345,6 @@ void CfgNode::advancedPhiLowering() {
SizeT InEdgeIndex = 0;
for (CfgNode *Pred : InEdges) {
CfgNode *Split = splitIncomingEdge(Pred, InEdgeIndex++);
AssignList Assignments;
SizeT Remaining = NumPhis;
// First pass computes Src and initializes NumPred.
......@@ -348,7 +365,7 @@ void CfgNode::advancedPhiLowering() {
// be redundant, and if Dest/Src are on the stack, the
// target lowering may naively decide to lower it using a
// temporary register.
Assignments.push_back(InstAssign::create(Func, Dest, Src));
Split->appendInst(InstAssign::create(Func, Dest, Src));
}
}
// Second pass computes NumPred by comparing every pair of Phi
......@@ -374,12 +391,12 @@ void CfgNode::advancedPhiLowering() {
// Another pass to compute initial Weight values.
// Always pick NumPred=0 over NumPred>0.
const int32_t WeightNoPreds = 4;
constexpr int32_t WeightNoPreds = 4;
// Prefer Src as a register because the register might free up.
const int32_t WeightSrcIsReg = 2;
constexpr int32_t WeightSrcIsReg = 2;
// Prefer Dest not as a register because the register stays free
// longer.
const int32_t WeightDestNotReg = 1;
constexpr int32_t WeightDestNotReg = 1;
for (size_t I = 0; I < NumPhis; ++I) {
if (Desc[I].Processed)
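Taken together, the constants in the hunk above define a simple priority score. The sketch below is a standalone restatement (the `phiWeight` name is invented; the real code accumulates Weight in the Desc array):

```cpp
#include <cstdint>

// Score a pending phi assignment for scheduling. Higher scores are picked
// first: prefer assignments nothing depends on, whose Src is in a register
// (emitting may free it), and whose Dest is not (keeping a register free).
int32_t phiWeight(int32_t NumPred, bool SrcIsReg, bool DestIsReg) {
  constexpr int32_t WeightNoPreds = 4;    // dominates the other two bonuses
  constexpr int32_t WeightSrcIsReg = 2;
  constexpr int32_t WeightDestNotReg = 1;
  int32_t Weight = 0;
  if (NumPred == 0)
    Weight += WeightNoPreds;
  if (SrcIsReg)
    Weight += WeightSrcIsReg;
  if (!DestIsReg)
    Weight += WeightDestNotReg;
  return Weight;
}
```

Because 4 > 2 + 1, an assignment with NumPred == 0 always outranks any assignment that another pending assignment still depends on.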
......@@ -434,7 +451,7 @@ void CfgNode::advancedPhiLowering() {
Variable *Tmp = Func->makeVariable(OtherSrc->getType());
if (BuildDefs::dump())
Tmp->setName(Func, "__split_" + std::to_string(VarNum));
Assignments.push_back(InstAssign::create(Func, Tmp, OtherSrc));
Split->appendInst(InstAssign::create(Func, Tmp, OtherSrc));
Desc[J].Src = Tmp;
Found = true;
}
......@@ -443,7 +460,7 @@ void CfgNode::advancedPhiLowering() {
}
// Now that a cycle (if any) has been broken, create the actual
// assignment.
Assignments.push_back(InstAssign::create(Func, Dest, Src));
Split->appendInst(InstAssign::create(Func, Dest, Src));
// Update NumPred for all Phi assignments using this Phi's Src
// as their Dest variable. Also update Weight if NumPred
// dropped from 1 to 0.
......@@ -459,18 +476,11 @@ void CfgNode::advancedPhiLowering() {
}
Desc[BestIndex].Processed = true;
}
Split->appendInst(InstBr::create(Func, this));
Func->getTarget()->lowerPhiAssignments(Split, Assignments);
// Renumber the instructions to be monotonically increasing so
// that addNode() doesn't assert when multi-definitions are added
// out of order.
Split->renumberInstructions();
Split->genCode();
Func->getVMetadata()->addNode(Split);
}
for (Inst &I : Phis)
I.setDeleted();
}
// Does address mode optimization. Pass each instruction to the
......@@ -896,7 +906,7 @@ void CfgNode::emit(Cfg *Func) const {
// inference allows it to be greater than 1 for short periods.
std::vector<SizeT> LiveRegCount(Func->getTarget()->getNumRegisters());
if (DecorateAsm) {
const bool IsLiveIn = true;
constexpr bool IsLiveIn = true;
emitRegisterUsage(Str, Func, this, IsLiveIn, LiveRegCount);
}
......@@ -931,7 +941,7 @@ void CfgNode::emit(Cfg *Func) const {
updateStats(Func, &I);
}
if (DecorateAsm) {
const bool IsLiveIn = false;
constexpr bool IsLiveIn = false;
emitRegisterUsage(Str, Func, this, IsLiveIn, LiveRegCount);
}
}
......
......@@ -401,15 +401,7 @@ void ClFlags::getParsedClFlags(ClFlags &OutFlags) {
OutFlags.setFunctionSections(::FunctionSections);
OutFlags.setNumTranslationThreads(::NumThreads);
OutFlags.setOptLevel(::OLevel);
if (::TargetArch == Target_ARM32) {
// TODO(jvoung): We need lowerPhiAssignments to handle spilling
// more than one register, since some ARM lowerAssign sequences
// may require more than one register. For now, disable PhiEdgeSplit
// to avoid requiring lowerPhiAssignments.
OutFlags.setPhiEdgeSplit(false);
} else {
OutFlags.setPhiEdgeSplit(::EnablePhiEdgeSplit);
}
OutFlags.setPhiEdgeSplit(::EnablePhiEdgeSplit);
OutFlags.setRandomSeed(::RandomSeed);
OutFlags.setShouldDoNopInsertion(::ShouldDoNopInsertion);
OutFlags.setShouldRandomizeRegAlloc(::RandomizeRegisterAllocation);
......
......@@ -197,7 +197,9 @@ enum LivenessMode {
};
enum RegAllocKind {
RAK_Unknown,
RAK_Global, /// full, global register allocation
RAK_Phi, /// infinite-weight Variables with active spilling/filling
RAK_InfOnly /// allocation only for infinite-weight Variables
};
......
......@@ -30,7 +30,13 @@
namespace Ice {
void Liveness::init() {
// Initializes the basic liveness-related data structures for full liveness
// analysis (IsFullInit=true), or for incremental update after phi lowering
// (IsFullInit=false). In the latter case, FirstNode points to the first node
// added since starting phi lowering, and FirstVar points to the first Variable
// added since starting phi lowering.
void Liveness::initInternal(NodeList::const_iterator FirstNode,
VarList::const_iterator FirstVar, bool IsFullInit) {
// Initialize most of the container sizes.
SizeT NumVars = Func->getVariables().size();
SizeT NumNodes = Func->getNumNodes();
......@@ -38,32 +44,38 @@ void Liveness::init() {
Nodes.resize(NumNodes);
VarToLiveMap.resize(NumVars);
// Count the number of globals, and the number of locals for each
// block.
for (SizeT i = 0; i < NumVars; ++i) {
Variable *Var = Func->getVariables()[i];
// Count the number of globals, and the number of locals for each block.
SizeT TmpNumGlobals = 0;
for (auto I = FirstVar, E = Func->getVariables().end(); I != E; ++I) {
Variable *Var = *I;
if (VMetadata->isMultiBlock(Var)) {
++NumGlobals;
++TmpNumGlobals;
} else {
SizeT Index = VMetadata->getLocalUseNode(Var)->getIndex();
++Nodes[Index].NumLocals;
}
}
if (IsFullInit)
NumGlobals = TmpNumGlobals;
else
assert(TmpNumGlobals == 0);
// Resize each LivenessNode::LiveToVarMap, and the global
// LiveToVarMap. Reset the counts to 0.
for (SizeT i = 0; i < NumNodes; ++i) {
Nodes[i].LiveToVarMap.assign(Nodes[i].NumLocals, nullptr);
Nodes[i].NumLocals = 0;
Nodes[i].NumNonDeadPhis = 0;
// Resize each LivenessNode::LiveToVarMap, and the global LiveToVarMap. Reset
// the counts to 0.
for (auto I = FirstNode, E = Func->getNodes().end(); I != E; ++I) {
LivenessNode &N = Nodes[(*I)->getIndex()];
N.LiveToVarMap.assign(N.NumLocals, nullptr);
N.NumLocals = 0;
N.NumNonDeadPhis = 0;
}
LiveToVarMap.assign(NumGlobals, nullptr);
if (IsFullInit)
LiveToVarMap.assign(NumGlobals, nullptr);
// Sort each variable into the appropriate LiveToVarMap. Also set
// VarToLiveMap.
SizeT TmpNumGlobals = 0;
for (SizeT i = 0; i < NumVars; ++i) {
Variable *Var = Func->getVariables()[i];
TmpNumGlobals = 0;
for (auto I = FirstVar, E = Func->getVariables().end(); I != E; ++I) {
Variable *Var = *I;
SizeT VarIndex = Var->getIndex();
SizeT LiveIndex;
if (VMetadata->isMultiBlock(Var)) {
......@@ -77,11 +89,11 @@ void Liveness::init() {
}
VarToLiveMap[VarIndex] = LiveIndex;
}
assert(NumGlobals == TmpNumGlobals);
assert(TmpNumGlobals == (IsFullInit ? NumGlobals : 0));
// Process each node.
for (const CfgNode *LNode : Func->getNodes()) {
LivenessNode &Node = Nodes[LNode->getIndex()];
for (auto I = FirstNode, E = Func->getNodes().end(); I != E; ++I) {
LivenessNode &Node = Nodes[(*I)->getIndex()];
// NumLocals, LiveToVarMap already initialized
Node.LiveIn.resize(NumGlobals);
Node.LiveOut.resize(NumGlobals);
......@@ -90,6 +102,19 @@ void Liveness::init() {
}
}
void Liveness::init() {
constexpr bool IsFullInit = true;
NodeList::const_iterator FirstNode = Func->getNodes().begin();
VarList::const_iterator FirstVar = Func->getVariables().begin();
initInternal(FirstNode, FirstVar, IsFullInit);
}
void Liveness::initPhiEdgeSplits(NodeList::const_iterator FirstNode,
VarList::const_iterator FirstVar) {
constexpr bool IsFullInit = false;
initInternal(FirstNode, FirstVar, IsFullInit);
}
Variable *Liveness::getVariable(SizeT LiveIndex, const CfgNode *Node) const {
if (LiveIndex < NumGlobals)
return LiveToVarMap[LiveIndex];
......
......@@ -63,6 +63,8 @@ class Liveness {
public:
Liveness(Cfg *Func, LivenessMode Mode) : Func(Func), Mode(Mode) {}
void init();
void initPhiEdgeSplits(NodeList::const_iterator FirstNode,
VarList::const_iterator FirstVar);
Cfg *getFunc() const { return Func; }
LivenessMode getMode() const { return Mode; }
Variable *getVariable(SizeT LiveIndex, const CfgNode *Node) const;
......@@ -96,6 +98,8 @@ public:
}
private:
void initInternal(NodeList::const_iterator FirstNode,
VarList::const_iterator FirstVar, bool IsFullInit);
/// Resize Nodes so that Nodes[Index] is valid.
void resize(SizeT Index) {
if (Index >= Nodes.size())
......
......@@ -146,8 +146,7 @@ Variable *Variable::asType(Type Ty) {
}
void VariableTracking::markUse(MetadataKind TrackingKind, const Inst *Instr,
const CfgNode *Node, bool IsFromDef,
bool IsImplicit) {
CfgNode *Node, bool IsImplicit) {
(void)TrackingKind;
if (MultiBlock == MBS_MultiBlock)
return;
......@@ -165,7 +164,7 @@ void VariableTracking::markUse(MetadataKind TrackingKind, const Inst *Instr,
// is because there can be additional control flow before branching
// back to this node, and the variable is live throughout those
// nodes.
if (!IsFromDef && Instr && llvm::isa<InstPhi>(Instr))
if (Instr && llvm::isa<InstPhi>(Instr))
MakeMulti = true;
if (!MakeMulti) {
......@@ -190,7 +189,7 @@ void VariableTracking::markUse(MetadataKind TrackingKind, const Inst *Instr,
}
void VariableTracking::markDef(MetadataKind TrackingKind, const Inst *Instr,
const CfgNode *Node) {
CfgNode *Node) {
// TODO(stichnot): If the definition occurs in the last instruction
// of the block, consider not marking this as a separate use. But
// be careful not to omit all uses of the variable if markDef() and
......@@ -205,9 +204,8 @@ void VariableTracking::markDef(MetadataKind TrackingKind, const Inst *Instr,
Instr->getNumber() >= LastInstruction->getNumber());
}
#endif
const bool IsFromDef = true;
const bool IsImplicit = false;
markUse(TrackingKind, Instr, Node, IsFromDef, IsImplicit);
constexpr bool IsImplicit = false;
markUse(TrackingKind, Instr, Node, IsImplicit);
if (TrackingKind == VMK_Uses)
return;
if (FirstOrSingleDefinition == nullptr)
......@@ -276,12 +274,10 @@ void VariablesMetadata::init(MetadataKind TrackingKind) {
// Mark implicit args as being used in the entry node.
for (Variable *Var : Func->getImplicitArgs()) {
const Inst *NoInst = nullptr;
const CfgNode *EntryNode = Func->getEntryNode();
const bool IsFromDef = false;
const bool IsImplicit = true;
Metadata[Var->getIndex()].markUse(Kind, NoInst, EntryNode, IsFromDef,
IsImplicit);
constexpr Inst *NoInst = nullptr;
CfgNode *EntryNode = Func->getEntryNode();
constexpr bool IsImplicit = true;
Metadata[Var->getIndex()].markUse(Kind, NoInst, EntryNode, IsImplicit);
}
for (CfgNode *Node : Func->getNodes())
......@@ -304,9 +300,8 @@ void VariablesMetadata::addNode(CfgNode *Node) {
if (const Variable *Var = llvm::dyn_cast<Variable>(I.getSrc(SrcNum))) {
SizeT VarNum = Var->getIndex();
assert(VarNum < Metadata.size());
const bool IsFromDef = false;
const bool IsImplicit = false;
Metadata[VarNum].markUse(Kind, &I, Node, IsFromDef, IsImplicit);
constexpr bool IsImplicit = false;
Metadata[VarNum].markUse(Kind, &I, Node, IsImplicit);
}
}
}
......@@ -328,9 +323,8 @@ void VariablesMetadata::addNode(CfgNode *Node) {
const Variable *Var = Src->getVar(J);
SizeT VarNum = Var->getIndex();
assert(VarNum < Metadata.size());
const bool IsFromDef = false;
const bool IsImplicit = false;
Metadata[VarNum].markUse(Kind, &I, Node, IsFromDef, IsImplicit);
constexpr bool IsImplicit = false;
Metadata[VarNum].markUse(Kind, &I, Node, IsImplicit);
}
}
}
......@@ -382,7 +376,7 @@ VariablesMetadata::getLatterDefinitions(const Variable *Var) const {
return Metadata[VarNum].getLatterDefinitions();
}
const CfgNode *VariablesMetadata::getLocalUseNode(const Variable *Var) const {
CfgNode *VariablesMetadata::getLocalUseNode(const Variable *Var) const {
if (!isTracked(Var))
return nullptr; // conservative answer
SizeT VarNum = Var->getIndex();
......
......@@ -377,6 +377,9 @@ public:
InstNumberT getStart() const {
return Range.empty() ? -1 : Range.begin()->first;
}
InstNumberT getEnd() const {
return Range.empty() ? -1 : Range.rbegin()->second;
}
void untrim() { TrimmedBegin = Range.begin(); }
void trim(InstNumberT Lower);
......@@ -577,17 +580,16 @@ public:
const Inst *getFirstDefinition() const;
const Inst *getSingleDefinition() const;
const InstDefList &getLatterDefinitions() const { return Definitions; }
const CfgNode *getNode() const { return SingleUseNode; }
void markUse(MetadataKind TrackingKind, const Inst *Instr,
const CfgNode *Node, bool IsFromDef, bool IsImplicit);
void markDef(MetadataKind TrackingKind, const Inst *Instr,
const CfgNode *Node);
CfgNode *getNode() const { return SingleUseNode; }
void markUse(MetadataKind TrackingKind, const Inst *Instr, CfgNode *Node,
bool IsImplicit);
void markDef(MetadataKind TrackingKind, const Inst *Instr, CfgNode *Node);
private:
MultiDefState MultiDef = MDS_Unknown;
MultiBlockState MultiBlock = MBS_Unknown;
const CfgNode *SingleUseNode = nullptr;
const CfgNode *SingleDefNode = nullptr;
CfgNode *SingleUseNode = nullptr;
CfgNode *SingleDefNode = nullptr;
/// All definitions of the variable are collected here, in increasing
/// order of instruction number.
InstDefList Definitions; /// Only used if Kind==VMK_All
......@@ -645,7 +647,7 @@ public:
bool isMultiBlock(const Variable *Var) const;
/// Returns the node that the given Variable is used in, assuming
/// isMultiBlock() returns false. Otherwise, nullptr is returned.
const CfgNode *getLocalUseNode(const Variable *Var) const;
CfgNode *getLocalUseNode(const Variable *Var) const;
private:
const Cfg *Func;
......
......@@ -39,6 +39,9 @@ private:
void initForGlobal();
void initForInfOnly();
/// Free up a register for infinite-weight Cur by spilling and reloading some
/// register that isn't used during Cur's live range.
void addSpillFill(Variable *Cur, llvm::SmallBitVector RegMask);
/// Move an item from the From set to the To set. From[Index] is
/// pushed onto the end of To[], then the item is efficiently removed
/// from From[] by effectively swapping it with the last item in
......@@ -58,12 +61,9 @@ private:
OrderedRanges UnhandledPrecolored;
UnorderedRanges Active, Inactive, Handled;
std::vector<InstNumberT> Kills;
RegAllocKind Kind = RAK_Unknown;
bool FindPreference = false;
bool FindOverlap = false;
// TODO(stichnot): We're not really using FindOverlap yet, but we
// may want a flavor of register allocation where FindPreference is
// useful but we didn't want to initialize VMetadata with VMK_All
// and therefore we can't safely allow overlap.
};
} // end of namespace Ice
......
......@@ -211,6 +211,20 @@ void TargetLowering::lower() {
Context.advanceNext();
}
void TargetLowering::lowerInst(CfgNode *Node, InstList::iterator Next,
InstHighLevel *Instr) {
// TODO(stichnot): Consider modifying the design/implementation to avoid
// multiple init() calls when using lowerInst() to lower several instructions
// in the same node.
Context.init(Node);
Context.setNext(Next);
Context.insert(Instr);
--Next;
assert(&*Next == Instr);
Context.setCur(Next);
lower();
}
void TargetLowering::lowerOther(const Inst *Instr) {
(void)Instr;
Func->setError("Can't lower unsupported instruction type");
......
......@@ -63,6 +63,8 @@ public:
Inst *getLastInserted() const;
void advanceCur() { Cur = Next; }
void advanceNext() { advanceForward(Next); }
void setCur(InstList::iterator C) { Cur = C; }
void setNext(InstList::iterator N) { Next = N; }
void rewind();
void setInsertPoint(const InstList::iterator &Position) { Next = Position; }
......@@ -136,16 +138,15 @@ public:
void doNopInsertion();
/// Lowers a single non-Phi instruction.
void lower();
/// Inserts and lowers a single high-level instruction at a specific insertion
/// point.
void lowerInst(CfgNode *Node, InstList::iterator Next, InstHighLevel *Instr);
/// Does preliminary lowering of the set of Phi instructions in the
/// current node. The main intention is to do what's needed to keep
/// the unlowered Phi instructions consistent with the lowered
/// non-Phi instructions, e.g. to lower 64-bit operands on a 32-bit
/// target.
virtual void prelowerPhis() {}
/// Lowers a list of "parallel" assignment instructions representing
/// a topological sort of the Phi instructions.
virtual void lowerPhiAssignments(CfgNode *Node,
const AssignList &Assignments) = 0;
/// Tries to do branch optimization on a single instruction. Returns
/// true if some optimization was done.
virtual bool doBranchOpt(Inst * /*I*/, const CfgNode * /*NextNode*/) {
......
......@@ -1470,26 +1470,22 @@ void TargetARM32::lowerAssign(const InstAssign *Inst) {
_mov(T_Hi, Src0Hi);
_mov(DestHi, T_Hi);
} else {
Operand *SrcR;
Operand *NewSrc;
if (Dest->hasReg()) {
// If Dest already has a physical register, then legalize the
// Src operand into a Variable with the same register
// assignment. This is mostly a workaround for advanced phi
// lowering's ad-hoc register allocation which assumes no
// register allocation is needed when at least one of the
// operands is non-memory.
// TODO(jvoung): check this for ARM.
SrcR = legalize(Src0, Legal_Reg, Dest->getRegNum());
// If Dest already has a physical register, then legalize the Src operand
// into a Variable with the same register assignment. This especially
// helps allow the use of Flex operands.
NewSrc = legalize(Src0, Legal_Reg | Legal_Flex, Dest->getRegNum());
} else {
// Dest could be a stack operand. Since we could potentially need
// to do a Store (and store can only have Register operands),
// legalize this to a register.
SrcR = legalize(Src0, Legal_Reg);
NewSrc = legalize(Src0, Legal_Reg);
}
if (isVectorType(Dest->getType())) {
UnimplementedError(Func->getContext()->getFlags());
} else {
_mov(Dest, SrcR);
_mov(Dest, NewSrc);
}
}
}
......@@ -2415,15 +2411,6 @@ void TargetARM32::prelowerPhis() {
PhiLowering::prelowerPhis32Bit<TargetARM32>(this, Context.getNode(), Func);
}
// Lower the pre-ordered list of assignments into mov instructions.
// Also has to do some ad-hoc register allocation as necessary.
void TargetARM32::lowerPhiAssignments(CfgNode *Node,
const AssignList &Assignments) {
(void)Node;
(void)Assignments;
UnimplementedError(Func->getContext()->getFlags());
}
Variable *TargetARM32::makeVectorOfZeros(Type Ty, int32_t RegNum) {
Variable *Reg = makeReg(Ty, RegNum);
UnimplementedError(Func->getContext()->getFlags());
......
......@@ -132,8 +132,6 @@ protected:
void lowerSwitch(const InstSwitch *Inst) override;
void lowerUnreachable(const InstUnreachable *Inst) override;
void prelowerPhis() override;
void lowerPhiAssignments(CfgNode *Node,
const AssignList &Assignments) override;
void doAddressOptLoad() override;
void doAddressOptStore() override;
void randomlyInsertNop(float Probability) override;
......
......@@ -648,15 +648,6 @@ void TargetMIPS32::prelowerPhis() {
UnimplementedError(Func->getContext()->getFlags());
}
// Lower the pre-ordered list of assignments into mov instructions.
// Also has to do some ad-hoc register allocation as necessary.
void TargetMIPS32::lowerPhiAssignments(CfgNode *Node,
const AssignList &Assignments) {
(void)Node;
(void)Assignments;
UnimplementedError(Func->getContext()->getFlags());
}
void TargetMIPS32::postLower() {
if (Ctx->getFlags().getOptLevel() == Opt_m1)
return;
......
......@@ -112,8 +112,6 @@ protected:
void lowerSwitch(const InstSwitch *Inst) override;
void lowerUnreachable(const InstUnreachable *Inst) override;
void prelowerPhis() override;
void lowerPhiAssignments(CfgNode *Node,
const AssignList &Assignments) override;
void doAddressOptLoad() override;
void doAddressOptStore() override;
void randomlyInsertNop(float Probability) override;
......
......@@ -144,8 +144,6 @@ protected:
void lowerOther(const Inst *Instr) override;
void lowerRMW(const typename Traits::Insts::FakeRMW *RMW);
void prelowerPhis() override;
void lowerPhiAssignments(CfgNode *Node,
const AssignList &Assignments) override;
void doAddressOptLoad() override;
void doAddressOptStore() override;
void randomlyInsertNop(float Probability) override;
......
......@@ -387,13 +387,7 @@ template <class Machine> void TargetX86Base<Machine>::translateO2() {
Func->dump("After linear scan regalloc");
if (Ctx->getFlags().getPhiEdgeSplit()) {
// We need to pause constant blinding or pooling during advanced phi
// lowering, unless the lowering assignment has a physical register for the
// dest Variable.
{
BoolFlagSaver B(RandomizationPoolingPaused, true);
Func->advancedPhiLowering();
}
Func->advancedPhiLowering();
Func->dump("After advanced Phi lowering");
}
......@@ -2064,32 +2058,21 @@ void TargetX86Base<Machine>::lowerAssign(const InstAssign *Inst) {
_mov(T_Hi, Src0Hi);
_mov(DestHi, T_Hi);
} else {
Operand *RI;
Operand *Src0Legal;
if (Dest->hasReg()) {
// If Dest already has a physical register, then legalize the
// Src operand into a Variable with the same register
// assignment. This is mostly a workaround for advanced phi
// lowering's ad-hoc register allocation which assumes no
// register allocation is needed when at least one of the
// operands is non-memory.
// If we have a physical register for the dest variable, we can
// enable our constant blinding or pooling again. Note this is
// only for advancedPhiLowering(), the flag flip should leave
// no other side effect.
{
BoolFlagSaver B(RandomizationPoolingPaused, false);
RI = legalize(Src0, Legal_Reg, Dest->getRegNum());
}
// If Dest already has a physical register, then only basic legalization
// is needed, as the source operand can be a register, immediate, or
// memory.
Src0Legal = legalize(Src0);
} else {
// If Dest could be a stack operand, then RI must be a physical
// register or a scalar integer immediate.
RI = legalize(Src0, Legal_Reg | Legal_Imm);
Src0Legal = legalize(Src0, Legal_Reg | Legal_Imm);
}
if (isVectorType(Dest->getType()))
_movp(Dest, RI);
_movp(Dest, Src0Legal);
else
_mov(Dest, RI);
_mov(Dest, Src0Legal);
}
}
......@@ -4938,164 +4921,6 @@ template <class Machine> void TargetX86Base<Machine>::prelowerPhis() {
this, Context.getNode(), Func);
}
inline bool isMemoryOperand(const Operand *Opnd) {
if (const auto Var = llvm::dyn_cast<Variable>(Opnd))
return !Var->hasReg();
// We treat vector undef values the same as a memory operand,
// because they do in fact need a register to materialize the vector
// of zeroes into.
if (llvm::isa<ConstantUndef>(Opnd))
return isScalarFloatingType(Opnd->getType()) ||
isVectorType(Opnd->getType());
if (llvm::isa<Constant>(Opnd))
return isScalarFloatingType(Opnd->getType());
return true;
}
/// Lower the pre-ordered list of assignments into mov instructions.
/// Also has to do some ad-hoc register allocation as necessary.
template <class Machine>
void TargetX86Base<Machine>::lowerPhiAssignments(
CfgNode *Node, const AssignList &Assignments) {
// Check that this is a properly initialized shell of a node.
assert(Node->getOutEdges().size() == 1);
assert(Node->getInsts().empty());
assert(Node->getPhis().empty());
CfgNode *Succ = Node->getOutEdges().front();
getContext().init(Node);
// Register set setup similar to regAlloc().
RegSetMask RegInclude = RegSet_All;
RegSetMask RegExclude = RegSet_StackPointer;
if (hasFramePointer())
RegExclude |= RegSet_FramePointer;
llvm::SmallBitVector Available = getRegisterSet(RegInclude, RegExclude);
bool NeedsRegs = false;
// Initialize the set of available registers to the set of what is
// available (not live) at the beginning of the successor block,
// minus all registers used as Dest operands in the Assignments. To
// do this, we start off assuming all registers are available, then
// iterate through the Assignments and remove Dest registers.
// During this iteration, we also determine whether we will actually
// need any extra registers for memory-to-memory copies. If so, we
// do the actual work of removing the live-in registers from the
// set. TODO(stichnot): This work is being repeated for every split
// edge to the successor, so consider updating LiveIn just once
// after all the edges are split.
for (const Inst &I : Assignments) {
Variable *Dest = I.getDest();
if (Dest->hasReg()) {
Available[Dest->getRegNum()] = false;
} else if (isMemoryOperand(I.getSrc(0))) {
NeedsRegs = true; // Src and Dest are both in memory
}
}
if (NeedsRegs) {
LivenessBV &LiveIn = Func->getLiveness()->getLiveIn(Succ);
for (int i = LiveIn.find_first(); i != -1; i = LiveIn.find_next(i)) {
Variable *Var = Func->getLiveness()->getVariable(i, Succ);
if (Var->hasReg())
Available[Var->getRegNum()] = false;
}
}
// Iterate backwards through the Assignments. After lowering each
// assignment, add Dest to the set of available registers, and remove
// Src from the set of available registers. Iterating backwards is what
// enables these incremental updates of the available register set. The
// lowered instruction numbers may end up out of order, but that can be
// fixed by renumbering the block afterwards if necessary.
for (const Inst &I : reverse_range(Assignments)) {
Context.rewind();
    auto *Assign = llvm::cast<InstAssign>(&I);
Variable *Dest = Assign->getDest();
    // If the source operand is ConstantUndef, do not legalize it. In function
    // test_split_undef_int_vec, the advanced phi lowering process finds an
    // assignment of an undefined vector. This vector, as the Src here, would
    // crash if it went through legalize(): legalize() creates a new variable
    // with makeVectorOfZeros(), but this new variable is assigned a stack
    // slot, which fails with pxor(Var, Var) because that is an illegal
    // instruction form. Note that this failure is unrelated to randomization
    // or pooling of constants. So, we do not call legalize() to add a pool
    // label for the src operands of phi assignment instructions. Instead, we
    // manually add a pool label for constant float and constant double values
    // here. Note that going through legalize() does not affect the testing
    // results of SPEC2K and xtests.
Operand *Src = Assign->getSrc(0);
if (!llvm::isa<ConstantUndef>(Assign->getSrc(0))) {
Src = legalize(Src);
}
Variable *SrcVar = llvm::dyn_cast<Variable>(Src);
// Use normal assignment lowering, except lower mem=mem specially
// so we can register-allocate at the same time.
if (!isMemoryOperand(Dest) || !isMemoryOperand(Src)) {
lowerAssign(Assign);
} else {
assert(Dest->getType() == Src->getType());
const llvm::SmallBitVector &RegsForType =
getRegisterSetForType(Dest->getType());
llvm::SmallBitVector AvailRegsForType = RegsForType & Available;
Variable *SpillLoc = nullptr;
Variable *Preg = nullptr;
// TODO(stichnot): Opportunity for register randomization.
int32_t RegNum = AvailRegsForType.find_first();
bool IsVector = isVectorType(Dest->getType());
bool NeedSpill = (RegNum == -1);
if (NeedSpill) {
// Pick some register to spill and update RegNum.
// TODO(stichnot): Opportunity for register randomization.
RegNum = RegsForType.find_first();
Preg = getPhysicalRegister(RegNum, Dest->getType());
SpillLoc = Func->makeVariable(Dest->getType());
// Create a fake def of the physical register to avoid
// liveness inconsistency problems during late-stage liveness
// analysis (e.g. asm-verbose mode).
Context.insert(InstFakeDef::create(Func, Preg));
if (IsVector)
_movp(SpillLoc, Preg);
else
_mov(SpillLoc, Preg);
}
assert(RegNum >= 0);
if (llvm::isa<ConstantUndef>(Src))
      // Materialize an actual constant instead of undef. RegNum is
      // passed in for vector types because undef vectors are lowered
      // to a vector register of zeroes.
Src =
legalize(Src, Legal_All, IsVector ? RegNum : Variable::NoRegister);
Variable *Tmp = makeReg(Dest->getType(), RegNum);
if (IsVector) {
_movp(Tmp, Src);
_movp(Dest, Tmp);
} else {
_mov(Tmp, Src);
_mov(Dest, Tmp);
}
if (NeedSpill) {
// Restore the spilled register.
if (IsVector)
_movp(Preg, SpillLoc);
else
_mov(Preg, SpillLoc);
// Create a fake use of the physical register to keep it live
// for late-stage liveness analysis (e.g. asm-verbose mode).
Context.insert(InstFakeUse::create(Func, Preg));
}
}
// Update register availability before moving to the previous
// instruction on the Assignments list.
if (Dest->hasReg())
Available[Dest->getRegNum()] = true;
if (SrcVar && SrcVar->hasReg())
Available[SrcVar->getRegNum()] = false;
}
// Add the terminator branch instruction to the end.
Context.setInsertPoint(Context.getEnd());
_br(Succ);
}
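The two ideas at the heart of lowerPhiAssignments() — walking the linearized assignments in reverse while incrementally updating register availability, and forcibly spilling some register around a mem-to-mem copy when none is free — can be sketched in isolation. Everything below (Assign, Choice, allocate, the fixed register count) is hypothetical scaffolding for illustration, not Subzero's API:

```cpp
#include <bitset>
#include <vector>

constexpr int NumRegs = 4;
constexpr int NoReg = -1;

// One linearized phi assignment; NoReg means the operand lives on the stack.
struct Assign {
  int DestReg;
  int SrcReg;
};

// Register chosen for each mem-to-mem copy (NoReg otherwise), and whether
// that choice required a spill/restore around the copy.
struct Choice {
  int Reg;
  bool Spilled;
};

std::vector<Choice> allocate(const std::vector<Assign> &Assignments,
                             std::bitset<NumRegs> Available) {
  std::vector<Choice> Out(Assignments.size(), {NoReg, false});
  // Walk backwards so that freeing Dest and pinning Src incrementally keeps
  // Available correct for the earlier (not-yet-visited) assignments.
  for (int i = static_cast<int>(Assignments.size()) - 1; i >= 0; --i) {
    const Assign &A = Assignments[i];
    if (A.DestReg == NoReg && A.SrcReg == NoReg) {
      // mem = mem: a scratch register is needed for the copy.
      int Reg = NoReg;
      for (int r = 0; r < NumRegs; ++r)
        if (Available[r]) {
          Reg = r;
          break;
        }
      bool Spilled = (Reg == NoReg);
      if (Spilled)
        Reg = 0; // None free: pick one and spill/restore it around the copy.
      Out[i] = {Reg, Spilled};
    }
    // Before this copy executes, Dest's old value is dead, so its register
    // is free for earlier assignments; Src must still hold its value
    // earlier in the block, so its register is pinned.
    if (A.DestReg != NoReg)
      Available[A.DestReg] = true;
    if (A.SrcReg != NoReg)
      Available[A.SrcReg] = false;
  }
  return Out;
}
```

In this model, a mem-to-mem copy that precedes an assignment into register 0 can borrow register 0 without spilling, because the backwards walk has already marked it free at that point.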
// There is no support for loading or emitting vector constants, so the
// vector values returned from makeVectorOfZeros, makeVectorOfOnes,
// etc. are initialized with register operations.
......
......@@ -36,6 +36,7 @@
X(liveness) \
X(livenessLightweight) \
X(llvmConvert) \
X(lowerPhiAssignments) \
X(parse) \
X(parseConstants) \
X(parseFunctions) \
......