Commit 3018cf2b by Karl Schimpf

Reduce wait times for very large PEXE files.

Investigated how many parser waits occur when the OptQ fills up. The current implementation has 64k entries, which for 10MB examples never fill up (but do come close to filling up). To test, I dropped the queue size down. The numbers I got were that the queue size plus the number of parse waits was within 2% of the total number of function blocks. Hence, once the OptQ fills up, a lot of slow notifies get applied. Hence, for scaling, I modified the code to not wake up the parse thread (during a pop) until the OptQ got half empty. The results were that once the OptQ got up to size 1024, fewer than 100 notifies would be issued. From 1024 on, as the queue size doubled, the number of notifies would drop roughly in half. Based on this, I decided to add the feature that the OptQ does not wake up the waiting parse thread until it is half empty. Since the queue size was not shrunk, this CL shouldn't add any overhead for the PEXEs we have, and should incur very few waits even for PEXEs significantly larger than the current (10MB) ones. BUG=None R=jpp@chromium.org Review URL: https://codereview.chromium.org/1877873002 .
parent b627f094
...@@ -289,15 +289,25 @@ void GlobalContext::CodeStats::dump(const Cfg *Func, GlobalContext *Ctx) { ...@@ -289,15 +289,25 @@ void GlobalContext::CodeStats::dump(const Cfg *Func, GlobalContext *Ctx) {
} }
} }
namespace {
// By default, wake up the main parser thread only once the OptQ has drained
// to half of its capacity (rather than on every pop); this keeps the number
// of slow condition-variable notifies small when translating large inputs.
// Note: `static` is redundant inside an anonymous namespace (internal
// linkage is already implied), so it is omitted here.
constexpr size_t DefaultOptQWakeupSize = GlobalContext::MaxOptQSize >> 1;
} // end of anonymous namespace
GlobalContext::GlobalContext(Ostream *OsDump, Ostream *OsEmit, Ostream *OsError, GlobalContext::GlobalContext(Ostream *OsDump, Ostream *OsEmit, Ostream *OsError,
ELFStreamer *ELFStr) ELFStreamer *ELFStr)
: Strings(new StringPool()), ConstPool(new ConstantPool()), ErrorStatus(), : Strings(new StringPool()), ConstPool(new ConstantPool()), ErrorStatus(),
StrDump(OsDump), StrEmit(OsEmit), StrError(OsError), IntrinsicsInfo(this), StrDump(OsDump), StrEmit(OsEmit), StrError(OsError), IntrinsicsInfo(this),
ObjectWriter(), OptQ(/*Sequential=*/getFlags().isSequential(), ObjectWriter(),
/*MaxSize=*/ OptQWakeupSize(std::max(DefaultOptQWakeupSize,
getFlags().isParseParallel() size_t(getFlags().getNumTranslationThreads()))),
? MaxOptQSize OptQ(/*Sequential=*/getFlags().isSequential(),
: getFlags().getNumTranslationThreads()), /*MaxSize=*/
getFlags().isParseParallel()
? MaxOptQSize
: getFlags().getNumTranslationThreads()),
// EmitQ is allowed unlimited size. // EmitQ is allowed unlimited size.
EmitQ(/*Sequential=*/getFlags().isSequential()), EmitQ(/*Sequential=*/getFlags().isSequential()),
DataLowering(TargetDataLowering::createLowering(this)) { DataLowering(TargetDataLowering::createLowering(this)) {
...@@ -939,7 +949,7 @@ void GlobalContext::optQueueBlockingPush(std::unique_ptr<OptWorkItem> Item) { ...@@ -939,7 +949,7 @@ void GlobalContext::optQueueBlockingPush(std::unique_ptr<OptWorkItem> Item) {
std::unique_ptr<OptWorkItem> GlobalContext::optQueueBlockingPop() { std::unique_ptr<OptWorkItem> GlobalContext::optQueueBlockingPop() {
TimerMarker _(TimerStack::TT_qTransPop, this); TimerMarker _(TimerStack::TT_qTransPop, this);
return std::unique_ptr<OptWorkItem>(OptQ.blockingPop()); return OptQ.blockingPop(OptQWakeupSize);
} }
void GlobalContext::emitQueueBlockingPush( void GlobalContext::emitQueueBlockingPush(
......
...@@ -477,6 +477,11 @@ public: ...@@ -477,6 +477,11 @@ public:
return LockedPtr<StringPool>(Strings.get(), &StringsLock); return LockedPtr<StringPool>(Strings.get(), &StringsLock);
} }
/// Number of function blocks that can be queued before waiting for
/// translation threads to consume them.
static constexpr size_t MaxOptQSize = 1 << 16;
private: private:
// Try to ensure mutexes are allocated on separate cache lines. // Try to ensure mutexes are allocated on separate cache lines.
...@@ -543,7 +548,8 @@ private: ...@@ -543,7 +548,8 @@ private:
Intrinsics IntrinsicsInfo; Intrinsics IntrinsicsInfo;
// TODO(jpp): move to EmitterContext. // TODO(jpp): move to EmitterContext.
std::unique_ptr<ELFObjectWriter> ObjectWriter; std::unique_ptr<ELFObjectWriter> ObjectWriter;
static constexpr size_t MaxOptQSize = 1 << 16; // Value defining when to wake up the main parse thread.
const size_t OptQWakeupSize;
BoundedProducerConsumerQueue<OptWorkItem, MaxOptQSize> OptQ; BoundedProducerConsumerQueue<OptWorkItem, MaxOptQSize> OptQ;
BoundedProducerConsumerQueue<EmitterWorkItem> EmitQ; BoundedProducerConsumerQueue<EmitterWorkItem> EmitQ;
// DataLowering is only ever used by a single thread at a time (either in // DataLowering is only ever used by a single thread at a time (either in
......
...@@ -67,7 +67,7 @@ public: ...@@ -67,7 +67,7 @@ public:
} }
GrewOrEnded.notify_one(); GrewOrEnded.notify_one();
} }
std::unique_ptr<T> blockingPop() { std::unique_ptr<T> blockingPop(size_t NotifyWhenDownToSize = MaxStaticSize) {
std::unique_ptr<T> Item; std::unique_ptr<T> Item;
bool ShouldNotifyProducer = false; bool ShouldNotifyProducer = false;
{ {
...@@ -75,7 +75,7 @@ public: ...@@ -75,7 +75,7 @@ public:
GrewOrEnded.wait(L, [this] { return IsEnded || !empty() || Sequential; }); GrewOrEnded.wait(L, [this] { return IsEnded || !empty() || Sequential; });
if (!empty()) { if (!empty()) {
Item = pop(); Item = pop();
ShouldNotifyProducer = !IsEnded; ShouldNotifyProducer = (size() < NotifyWhenDownToSize) && !IsEnded;
} }
} }
if (ShouldNotifyProducer) if (ShouldNotifyProducer)
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment