How many seconds does it take to switch between ARKit and AVFoundation?

Sep 25, 2020


I want to benefit from both ARKit and AVFoundation

Pause ARKit, capture with AVFoundation, and return to AR immediately

You may want to capture high-quality images while using ARKit’s world or face tracking.
1920 * 1440 is the largest size ARKit offers, so I wondered: “Couldn’t I capture with AVFoundation instead?” and “Can I use ARKit and AVFoundation at the same time?”

When I looked it up, I found that
ARKit and AVFoundation sessions cannot run at the same time.
So how many seconds does it take to switch between them?

I tried it.
* Tested on an iPhone 11 running iOS 14.

Experimental procedure

Save the tracking status, pause the AR session, and start the AVCaptureSession.

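sceneView.session.getCurrentWorldMap { [self] worldMap, error in
    // A Timer increments `time` in the background. Reset it to 0 and measure from here.
    time = 0.0
    // Keep the world map so tracking can be restored later.
    map = worldMap
    sceneView.session.pause()
    // Start the AVCaptureSession here.
}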
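
The AVFoundation side of the switch is not shown in the article, so here is a minimal sketch of what it might look like. It assumes the back wide-angle camera and the 4K preset (to match the 3840 * 2160 images mentioned later); `makeCaptureSession` is a hypothetical helper name, not code from the original experiment.

```swift
import AVFoundation

/// Hypothetical helper: builds a 4K capture session that delivers video frames to `delegate`.
/// Returns nil if the camera, the preset, or the output cannot be configured.
func makeCaptureSession(delegate: AVCaptureVideoDataOutputSampleBufferDelegate,
                        queue: DispatchQueue) -> AVCaptureSession? {
    let session = AVCaptureSession()
    session.beginConfiguration()
    defer { session.commitConfiguration() }

    // Assumption: the back wide-angle camera is the one being used.
    guard let camera = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: .back),
          let input = try? AVCaptureDeviceInput(device: camera),
          session.canAddInput(input) else { return nil }
    session.addInput(input)

    // Ask for 3840 x 2160 frames, if the device supports them.
    guard session.canSetSessionPreset(.hd4K3840x2160) else { return nil }
    session.sessionPreset = .hd4K3840x2160

    // Frames arrive in captureOutput(_:didOutput:from:) on `queue`.
    let output = AVCaptureVideoDataOutput()
    output.setSampleBufferDelegate(delegate, queue: queue)
    guard session.canAddOutput(output) else { return nil }
    session.addOutput(output)

    return session
}
```

After pausing the AR session, you would call `startRunning()` on the returned session. That call blocks until the session is up, so it is usually made from a background queue.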

Experiment 1: Get the image with AVCaptureVideoDataOutput

Get the image in the captureOutput delegate method, stop the AVCaptureSession as soon as one frame has been captured, and restart the AR session.

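func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    // Skip frames that carry no image data instead of force-unwrapping.
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    let image = UIImage(ciImage: CIImage(cvImageBuffer: pixelBuffer))
    // Stop the AVCaptureSession here, then restart AR from the saved world map.
    let configuration = ARWorldTrackingConfiguration()
    configuration.initialWorldMap = map
    sceneView.session.run(configuration, options: [])
}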

Captured in 0.15 seconds.
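
The timings in this article come from a `time` variable incremented by a Timer. As an alternative, here is a small sketch that reads a monotonic clock directly, which avoids depending on how often the Timer fires. `Stopwatch` is a hypothetical name and was not part of the original experiment.

```swift
import Dispatch

/// Hypothetical stopwatch based on a monotonic clock.
struct Stopwatch {
    private var start = DispatchTime.now()

    init() {}

    /// Restart the measurement, e.g. right before pausing the AR session.
    mutating func reset() {
        start = DispatchTime.now()
    }

    /// Seconds elapsed since creation or the last reset.
    var elapsedSeconds: Double {
        let nanoseconds = DispatchTime.now().uptimeNanoseconds - start.uptimeNanoseconds
        return Double(nanoseconds) / 1_000_000_000
    }
}
```

You would call `reset()` where the article sets `time = 0.0` and read `elapsedSeconds` inside the capture delegate.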

However, the image captured by AVFoundation is dark because the session has only just started.

So I dropped the first 5 frames and then captured.
It took 0.29 seconds to get a clean 3840 * 2160 picture.
The screen froze for about 0.5 seconds (just like releasing a shutter).
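
Dropping the first frames can be done with a simple counter. The sketch below is one way to do it; `FrameSkipper` is a hypothetical name, and the count of 5 is the value used in this experiment rather than a general recommendation.

```swift
/// Hypothetical helper: counts delivered frames and signals the first one
/// that arrives after `framesToDrop` frames have been discarded.
struct FrameSkipper {
    let framesToDrop: Int
    private(set) var seen = 0

    init(framesToDrop: Int) {
        self.framesToDrop = framesToDrop
    }

    /// Call once per delivered frame. Returns true exactly once,
    /// for the frame that follows the dropped ones.
    mutating func shouldCapture() -> Bool {
        seen += 1
        return seen == framesToDrop + 1
    }
}
```

Inside `captureOutput`, you would return early whenever `shouldCapture()` is false, and only convert the frame and switch back to ARKit when it returns true.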

Experiment 2: Get the image with AVCapturePhotoOutput

This time I captured with the output intended for still photos.

func photoOutput(_ output: AVCapturePhotoOutput, didFinishProcessingPhoto photo: AVCapturePhoto, error: Error?) {
    if let imageData = photo.fileDataRepresentation() {
        let uiImage = UIImage(data: imageData)
        // 0.50 seconds have elapsed at this point.
        // Restart AR from the saved world map.
        let configuration = ARWorldTrackingConfiguration()
        configuration.initialWorldMap = map
        sceneView.session.run(configuration, options: [])
    }
}
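
The delegate method above only fires after a photo has been requested. The article does not show that step, so this is a minimal sketch of it with default settings; `requestPhoto` is a hypothetical helper name, and the photo output is assumed to be already attached to a running AVCaptureSession.

```swift
import AVFoundation

/// Hypothetical helper: requests one still photo with default settings.
/// The result is delivered to photoOutput(_:didFinishProcessingPhoto:error:) on `delegate`.
func requestPhoto(from photoOutput: AVCapturePhotoOutput,
                  delegate: AVCapturePhotoCaptureDelegate) {
    // A fresh settings object is needed for every capture request.
    let settings = AVCapturePhotoSettings()
    photoOutput.capturePhoto(with: settings, delegate: delegate)
}
```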

It took 0.5 seconds to get a clean picture.

However, after the photo was taken, ARKit was slow to come back, and the screen stayed frozen for about 7 seconds.


AVCaptureVideoDataOutput takes 0.3 seconds to capture, and the screen stays frozen for about 0.5 seconds.

AVCapturePhotoOutput takes 0.5 seconds to capture, and the screen stays frozen for about 7.0 seconds.


In my opinion, AVCaptureVideoDataOutput is the better choice. It can be used like a normal shutter.

By the way, when I read the device tilt from ARKit world tracking, it differed by about 0.05 radians before and after the session switch. With the device fixed in place there was no shift, so that difference is how much my hand moved during the 0.3 seconds of capturing.
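
For reference, a comparison like this can be computed from two tilt readings. The sketch below assumes each reading is a (pitch, yaw, roll) triple in radians, for example taken from the camera's Euler angles before pausing and after resuming; the function names are hypothetical.

```swift
/// Smallest signed difference between two angles in radians, wrapped to [-π, π].
func angleDifference(_ a: Double, _ b: Double) -> Double {
    let d = (b - a).truncatingRemainder(dividingBy: 2 * Double.pi)
    if d > Double.pi { return d - 2 * Double.pi }
    if d < -Double.pi { return d + 2 * Double.pi }
    return d
}

/// Largest absolute per-axis change between two (pitch, yaw, roll) readings.
func maxTiltChange(before: [Double], after: [Double]) -> Double {
    zip(before, after).map { abs(angleDifference($0, $1)) }.max() ?? 0
}
```

The wrap-around step matters for yaw, where a reading can jump from just under π to just above -π even though the device barely moved.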


