Commit a7ff7df2 by Shahbaz Youssefi Committed by Commit Bot

Vulkan: Improve wording of PresentSemaphores.md

Bug: angleproject:3450
Change-Id: Iee5360a7b9cced403c08b7883fa11420e250244f
Reviewed-on: https://chromium-review.googlesource.com/c/angle/angle/+/1784065
Reviewed-by: Ian Elliott <ianelliott@google.com>
Commit-Queue: Shahbaz Youssefi <syoussefi@chromium.org>
parent 32d6006b
@@ -28,8 +28,13 @@ following:
GPU: <------------ R -----------> GPU: <------------ R ----------->
PE: <-------- P ------> PE: <-------- P ------>
That is, the GPU starts rendering after submission, and the presentation is done when rendering is That is, the GPU starts rendering after submission, and the presentation is started when rendering is
finished. With multiple frames, the pipeline looks different based on present mode. Let's focus on finished. Note that Vulkan tries to abstract a large variety of PE architectures, some of which do
not behave in a straight-forward manner. As such, ANGLE cannot know what the PE is exactly doing
with the images or when the images are visible on the screen. The only signal out of the PE is
received through the semaphore that's used in ANI.
With multiple frames, the pipeline looks different based on present mode. Let's focus on
FIFO (the arguments in this document translate to all modes) with 3 images: FIFO (the arguments in this document translate to all modes) with 3 images:
CPU: QS QP QS QP QS QP QS QP CPU: QS QP QS QP QS QP QS QP
@@ -38,26 +43,29 @@ FIFO (the arguments in this document translate to all modes) with 3 images:
PE: <----- P I1 -----><----- P I2 -----><----- P I3 -----><----- P I1 -----> PE: <----- P I1 -----><----- P I2 -----><----- P I3 -----><----- P I1 ----->
First, an issue is evident here. The CPU is submitting jobs and queuing images for presentation First, an issue is evident here. The CPU is submitting jobs and queuing images for presentation
faster than the GPU can render them or the PE can view them. This causes the length of the PE queue faster than the GPU can render them or the PE can view them. This can cause the length of the
to grow indefinitely, resulting in larger and larger input lag. submit queue to grow indefinitely, resulting in larger and larger input lag. In FIFO mode, the PE
present queue also grows indefinitely.
To address this issue, ANGLE paces the CPU such that the length of the PE queue is kept at a maximum To address this issue, ANGLE paces the CPU such that the length of the submit queue is kept at a
of 1 image (i.e. one image is being presented, and another one is in queue): maximum of 1 image (i.e. submission with one image is being processed, and another one is in queue):
CPU: QS QS W:F1 QS W:F2 QS CPU: QS QS W:F1 QS W:F2 QS
I1 I2 I3 I1 I1 I2 I3 I1
S:F1 S:F2 S:F3 S:F4 S:F1 S:F2 S:F3 S:F4
GPU: <---- R I1 ----><---- R I2 ----><---- R I3 ----><---- R I1 ----> GPU: <---- R I1 ----><---- R I2 ----><---- R I3 ----><---- R I1 ---->
> Note: While this works in heavy applications (as the rendering time is almost as long as the frame > Note: Ideally, the length of the PE present queue should also be kept at a maximum of 1 (i.e. one
> (i.e. present time), in which case pacing the submissions similarly paces the presentation), it's > image being presented, and another in queue). However, the Vulkan WSI extension doesn't provide
> not technically keeping the PE queue length 1, but rather below n+2 where n is the number of > enough control to achieve this. In heavy application, the length of the PE present queue is
> swapchain images. > probably 1 anyway (as the rendering time is almost as long as the frame (i.e. present time), in
> which case pacing the submissions similarly paces the presentation). In theory, in FIFO mode, the
> length of the PE present queue is below n+2 where n is the number of swapchain images.
> >
> To understand why, imagine a FIFO swapchain with 1000 images and submissions that are > To understand why, imagine a FIFO swapchain with 1000 images and submissions that are
> infinitesimally short. In this case, the CPU pacing is effectively a no-op (as the GPU instantly > infinitesimally short. In this case, the CPU pacing is effectively a no-op (as the GPU instantly
> finishes jobs) for the first 1002 submissions. The 1003rd submission waits for F1001 (which uses > finishes jobs) for the first 1002 submissions. The 1003rd submission waits for F1001 (which uses
> I1). However, the 1001st submission will not start until the PE is finished presenting I1 (at the > I1). However, the 1001st submission will not start until the PE switches to presenting I2 (at the
> next V-Sync). The CPU then waits for V-Sync before the 1003rd submission. The CPU waits for one > next V-Sync). The CPU then waits for V-Sync before the 1003rd submission. The CPU waits for one
> V-Sync for every subsequent submission, keeping the length of the queue 1002. > V-Sync for every subsequent submission, keeping the length of the queue 1002.
> [`VK_GOOGLE_display_timing`][DisplayTimingGOOGLE] is likely a solution to this problem. > [`VK_GOOGLE_display_timing`][DisplayTimingGOOGLE] is likely a solution to this problem.
@@ -69,6 +77,11 @@ semaphore! This means that the application cannot generally know when to destro
semaphore. However, taking ANGLE's CPU pacing into account, we are able to destroy (or rather semaphore. However, taking ANGLE's CPU pacing into account, we are able to destroy (or rather
reuse) semaphores when they are provably unused. reuse) semaphores when they are provably unused.
This document describes an approach for destroying semaphores that should work with all valid PE
architectures, but will be described in terms of more common PE architectures (e.g. where the PE
only backs each VkImage and VkSemaphore handle with one actual memory object, and where the PE
cycles between the swapchain images in a straight-forward manner).
The interested reader may follow the discussion in this abandoned [gerrit CL][CL1757018] for more The interested reader may follow the discussion in this abandoned [gerrit CL][CL1757018] for more
background and ideas. background and ideas.
@@ -100,8 +113,9 @@ Say we are at frame Y+2. There's therefore a wait on FY. The following holds:
FY is signaled FY is signaled
=> SAY is signaled => SAY is signaled
=> Previous presentation of I1 (corresponding to SPX) is finished => The PE has handed I1 back to the application
=> SPX is waited => The PE has already processed the *previous* QP of I1
=> SPX is waited on
At this point, we can destroy SPX. In other words, in frame Y+2, we can destroy SPX (note that 2 is At this point, we can destroy SPX. In other words, in frame Y+2, we can destroy SPX (note that 2 is
the number of frames the CPU pacing code uses). If frame Y+1 is not using I1, this means the the number of frames the CPU pacing code uses). If frame Y+1 is not using I1, this means the
@@ -114,15 +128,16 @@ present semaphores for each image (again, 3 is H+1 where H is the swap history s
pacing) and always reuse (instead of destroy) the oldest semaphore of the image that is about to be pacing) and always reuse (instead of destroy) the oldest semaphore of the image that is about to be
presented. presented.
To summarize, we use the completion of a submission using an image to provably when the *previous* To summarize, we use the completion of a submission using an image to prove when the semaphore used
presentation of that image was finished. for the *previous* presentation of that image is no longer in use (and can be safely destroyed or
reused).
## Swapchain recreation ## Swapchain recreation
When recreating the swapchain, all images are freed and new ones are created, possibly with a When recreating the swapchain, all images are eventually freed and new ones are created, possibly
different count and present mode. For the old swapchain, we can no longer rely on the completion of with a different count and present mode. For the old swapchain, we can no longer rely on the
a future submission to know when a previous presentation is done, as there won't be any more completion of a future submission to know when a previous presentation's semaphore can be destroyed,
submissions using images from the old swapchain. as there won't be any more submissions using images from the old swapchain.
> For example, imagine the old swapchain was created in FIFO mode, and one image is being presented > For example, imagine the old swapchain was created in FIFO mode, and one image is being presented
> until the next V-Sync. Furthermore, imagine the new swapchain is created in MAILBOX mode. Since > until the next V-Sync. Furthermore, imagine the new swapchain is created in MAILBOX mode. Since
@@ -134,14 +149,14 @@ submissions using images from the old swapchain.
ANGLE resolves this issue by deferring the destruction of the old swapchain and its remaining ANGLE resolves this issue by deferring the destruction of the old swapchain and its remaining
present semaphores to the time when the semaphore corresponding to the first present of the new present semaphores to the time when the semaphore corresponding to the first present of the new
swapchain can be destroyed. In the example in the previous section, if SPX is the present semaphore swapchain can be destroyed. In the example in the previous section, if SPX is the present semaphore
of the first QP done on the new swapchain, at frame Y+2, when we know SPX can be destroyed, we know of the first QP performed on the new swapchain, at frame Y+2, when we know SPX can be destroyed, we
that the first image of the new swapchain has already been presented. This proves that all previous know that the first image of the new swapchain has already been presented. This proves that all
presentations of the old swapchain have finished. previous QPs of the old swapchain have been processed.
> Note: the swapchain can potentially be destroyed much earlier, but with no feedback from the > Note: the swapchain can potentially be destroyed much earlier, but with no feedback from the
> presentation engine, we cannot know that. This delays means that the swapchain could be recreated > presentation engine, we cannot know that. This delays means that the swapchain could be recreated
> while there are pending old swapchains to be destroyed. The destruction of both old swapchains > while there are pending old swapchains to be destroyed. The destruction of both old swapchains
> must now be deferred to when the first present of the new swapchain has finished. If an > must now be deferred to when the first QP of the new swapchain has been processed. If an
> application resizes the window constantly and at a high rate, ANGLE would keep accumulating old > application resizes the window constantly and at a high rate, ANGLE would keep accumulating old
> swapchains and not free them until it stops. While a user will likely not be able to do this (as > swapchains and not free them until it stops. While a user will likely not be able to do this (as
> the rate of window system events is lower than the framerate), this can be programmatically done > the rate of window system events is lower than the framerate), this can be programmatically done
......
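
For readers following the semaphore-reuse argument in the document, the per-image bookkeeping can be sketched as a small ring of H+1 present semaphores, where H is the swap history size used by the CPU pacing (H = 2 in the document, giving 3 semaphores per image). This is only an illustrative model, not ANGLE's implementation: the class name, the method name, and the string "semaphores" are all hypothetical, and the sketch assumes the CPU pacing described in the document is already in place, since that pacing is what makes the oldest semaphore provably unused.

```python
from collections import deque


class ImagePresentSemaphores:
    """Hypothetical per-image ring of present semaphores (not ANGLE code).

    Assumes CPU pacing with a swap history of `history_size` frames, so that
    once history_size + 1 semaphores exist for an image, the oldest one is no
    longer in use and may be reused.
    """

    def __init__(self, history_size=2):
        self.capacity = history_size + 1  # H + 1 semaphores per image
        self.ring = deque()               # oldest semaphore on the left
        self.created = 0                  # number of semaphores ever created

    def acquire_for_present(self):
        """Return the semaphore to use for the next present of this image."""
        if len(self.ring) < self.capacity:
            # Still filling the ring: create a new (stand-in) semaphore.
            sem = f"sem{self.created}"
            self.created += 1
        else:
            # Ring is full: reuse the oldest semaphore instead of destroying it.
            sem = self.ring.popleft()
        self.ring.append(sem)
        return sem


image = ImagePresentSemaphores(history_size=2)
used = [image.acquire_for_present() for _ in range(5)]
print(used)
```

After the first three presents of an image, no further semaphores are created; each later present cycles through the same three in oldest-first order.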