Picture-in-Picture Mode On iOS: Implementation and Peculiarities

Fora Soft
4 min readMay 25, 2023

by Alexander G.

The Picture-in-Picture (PiP) mode allows users to watch videos in a floating window on top of other windows. They can move it around the screen and place in any convenient spot. This feature enables users to keep an eye on what they are watching while interacting with other websites or applications.

We have previously covered the PiP implementation on Android with code examples. In this article, we will focus on iOS.

Picture in Picture is a must-have feature for modern multimedia applications

Here’s why:

1. Enhances multitasking. PiP allows users to simultaneously watch videos or view images in a small window while maintaining access to the main content or application interface. This enables users to multitask, e.g. watch a video while checking emails, sending messages, or browsing social media.

2. Improves user experience. The mode provides more flexible and convenient app navigation. This significantly enhances the user experience by eliminating the need to interrupt content playback or switch contexts completely.

3. Minimizes session interruptions. PiP enables users to continue watching or tracking content while performing other tasks. This helps reduce interruptions and ensures a smoother and uninterrupted workflow. For example, a user can watch a tutorial or a YouTube livestream while searching for information on the Internet or taking notes.

All these factors help retain users within the application and increase the duration of app usage sessions.

Peculiarities and difficulties of PiP on iOS

Apple envisages two scenarios for using PiP on iOS:

1. For video content playback

2. For video calls

The main issue is that for the video call scenario, prior to iOS 16 and for iPads that do not support Stage Manager, it is necessary to request special permission from Apple to access the camera in multitasking mode (com.apple.developer.avfoundation.multitasking-camera-access). But even after waiting for several months, as in our case, Apple may still not grant these permissions.

Therefore, in our mobile video chat app, Tunnel Video Calls, we decided not to use such a scenario. Instead, we adopted the approach where a video call and its content are considered as video playback.

PiP lifecycle on iOS

The Picture-in-Picture mode is essentially the content exchange between a full-screen app and PiP content from another app. The lifecycle of this exchange can be schematically represented as follows:

1. Video is playing in full-screen mode.

2. The user initiates an event that triggers the transition to PiP mode, such as pressing a specific button or minimizing the app.

3. An animation is launched to transition the video to PiP mode — the full-screen video shrinks into a thumbnail and moves to the corner of the screen.

4. The transition process completes, and the application changes its state to the background state.

Then, when it is necessary to bring the video back to full-screen mode from PiP mode, the following steps occur:

1. The app is in the background state and displays PiP.

2. An event occurs that initiates the transition from PiP to full-screen mode and stops the Picture in Picture mode — such as pressing a button or expanding the app. The app enters the foreground.

3. An animation is launched to transition the video to full-screen mode. The app enters the state of displaying the video in full-screen.

Here’s how it looks:

Implementing PiP for video playback

To enable PiP, you need to create an AVPictureInPictureController(playerLayer: AVPlayerLayer) object, and it must have a strong reference.

if AVPictureInPictureController.isPictureInPictureSupported() { 
// Create a new controller, passing the reference to the AVPlayerLayer.
pipController = AVPictureInPictureController(playerLayer: playerLayer)
pipController.delegate = self
pipController.canStartPictureInPictureAutomaticallyFromInline = true
}

Next, you need to start playing the video content.

func publishNowPlayingMetadata() { 
nowPlayingSession.nowPlayingInfoCenter.nowPlayingInfo = nowPlayingInfo
nowPlayingSession.becomeActiveIfPossible()
}

After this, when the button is pressed or the app is minimized/expanded, the PiP will activate:

func togglePictureInPictureMode(_ sender: UIButton) { 
if pipController.isPictureInPictureActive {
pipController.stopPictureInPicture()
} else {
pipController.startPictureInPicture()
}
}

Implementing Picture in Picture mode in an iOS app for video calls using WebRTC technology is perhaps the most challenging part of the work. We would be happy to help you with it, so please reach out to us to discuss the details. Conceptually:

In this implementation, the camera will not capture the user’s image, and you will only be able to see the conversation partner.

To achieve this, you need to:

1. Create an AVPictureInPictureController object.

2. Obtain the RTCVideoFrame.

3. Retrieve and populate CMSampleBuffer based on RTCVideoFrame.

4. Pass the CMSampleBuffer and display it using AVSampleBufferDisplayLayer.

Here’s a sequence diagram illustrating the process:

--

--