When we think of sound and how we experience it, most of the time it’s pretty limited: we stream music in stereo, some apps pan sound to give it a spatial effect, and some games put little emphasis on the sound experience because graphics take higher priority. Before there were frameworks to process and manipulate sound, companies used to dedicate a whole room to creating surround sound, and the price of the equipment was typically beyond what an average person would be willing to pay. Luckily, today’s technology offers that at a fraction of the price.
 
Bose offers AR-enabled hardware equipped with a magnetometer, accelerometer, gyroscope, and a microphone. There are also underlying virtual sensors that can be activated, such as game rotation and absolute rotation. In order to build this app, you’ll need the Frames, QC 35 II’s, or NC 700’s. If you don’t have one, be sure to join us at conferences we attend, meetups we hold, and code jams to play with the hardware, or even possibly win one!
 
This is a technical blog post that covers a few topics in iOS: AVFoundation, the Bose AR iOS SDK, and how to build a spatialized audio app using Bose AR and AVFoundation.
 


Using AVFoundation with Core Audio 

If you have gone through the Apple docs, you probably struggled for a day or two to string the concepts together. Some of you may have tossed in the towel and looked at Stack Overflow. In an effort to understand AVFoundation, I wrote the technical docs that I wish I’d had.

AVFoundation provides Obj-C classes to play and create time-based audiovisual media. We’re focused on audio, so we won’t go into visual media and animation. For audio, AVFoundation provides classes to play, record, and manage audio, and it sits on top of Core Audio, a set of frameworks designed to handle audio.
 
AVFoundation has several classes some of us may have seen in passing, such as AVAudioPlayer (plays audio) and AVAudioRecorder (records audio). Playing and recording audio by itself can be written in just a few lines, as seen below. But what if we want to spatialize audio? Where do we begin? Searching endlessly on Stack Overflow and GitHub can probably resolve that in a few hours (or you can skip that and check out my GitHub).
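To make that concrete, here’s a minimal playback sketch with AVAudioPlayer. The file name “song.mp3” is just a placeholder for an asset in your bundle:

```swift
import AVFoundation

// A minimal playback sketch: "song.mp3" is a placeholder asset name.
guard let url = Bundle.main.url(forResource: "song", withExtension: "mp3") else {
    fatalError("Audio file not found in the app bundle")
}

do {
    // In a real app, keep a strong reference to the player (e.g. a property on
    // your view controller), otherwise it gets deallocated and playback stops.
    let player = try AVAudioPlayer(contentsOf: url)
    player.prepareToPlay()
    player.play()
} catch {
    print("Could not create the player: \(error)")
}
```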
 
To really understand how to spatialize audio and build more complex audio-based apps, we need to understand how the concepts are all strung together. The best way to visualize the concepts is to draw them. Once we understand these concepts, the code in this app and all your future audio-based apps will be 10x easier to understand and build.

AVFoundation Classes

Some classes include:
  • AVAudioEngine
  • AVAudioPlayer
  • AVAudioNode
  • AVAudioEnvironmentNode
  • AVAudioBuffer
  • AVAudioFormat
  • AVAudioSession
  • AVAudioRecorder
 
A list of all the classes is available here. We’re going to take a deep dive into AVAudioEngine and the AVAudioNode subclasses we’ll use: AVAudioPlayerNode, AVAudioEnvironmentNode, and AVAudioOutputNode.
AVAudioEngine is an object that maintains a graph of audio nodes used to generate and process audio signals and perform audio input and output. The way we attach audio nodes to the audio engine is described below. This is a high-level, incomplete overview; we’ll go into the details and give a complete picture later in this blog post.
 
  1. Create an instance of the engine
  2. Create instances of the nodes
  3. Attach nodes (e.g. AVAudioPlayerNode) to the engine
  4. Use the engine to connect different nodes together (e.g. AVAudioPlayerNode and AVAudioEnvironmentNode) and create a chain
  5. Start the engine so audio can flow through the chains

We first attach the AVAudioPlayerNode and AVAudioEnvironmentNode to the AVAudioEngine:
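Here’s a minimal sketch of that setup, covering steps 1 through 5 above. The engine and node names are illustrative, and the mono format is an assumption based on the spatialization requirement we’ll get to shortly:

```swift
import AVFoundation

// 1. Create an instance of the engine.
let engine = AVAudioEngine()

// 2. Create instances of the nodes.
let playerNode = AVAudioPlayerNode()
let environmentNode = AVAudioEnvironmentNode()

// 3. Attach the nodes to the engine.
engine.attach(playerNode)
engine.attach(environmentNode)

// 4. Connect the nodes into a chain:
//    player -> environment -> output (the engine's built-in output node).
let monoFormat = AVAudioFormat(standardFormatWithSampleRate: 44_100, channels: 1)
engine.connect(playerNode, to: environmentNode, format: monoFormat)
engine.connect(environmentNode, to: engine.outputNode, format: nil)

// 5. Start the engine so audio can flow through the chain.
do {
    try engine.start()
} catch {
    print("Could not start the engine: \(error)")
}
```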

 

For this app, you’ll attach an AVAudioPlayerNode and an AVAudioEnvironmentNode to an instance of the AVAudioEngine. Visually, it’ll look like this:

With the AVAudioEngine, we can dynamically reconfigure the nodes.
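For example (reusing the engine and nodes from the sketch above, with a hypothetical second player), nodes can be attached, connected, and detached even while the engine is running:

```swift
// Wire in a second player on the fly; the environment node accepts multiple
// inputs, so both players are spatialized and mixed together.
let secondPlayer = AVAudioPlayerNode()
engine.attach(secondPlayer)
engine.connect(secondPlayer, to: environmentNode, format: monoFormat)

// When it's no longer needed, tear it back out.
engine.disconnectNodeOutput(secondPlayer)
engine.detach(secondPlayer)
```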

The AVAudioNodes we’ll be using are: 
  • AVAudioPlayerNode
  • AVAudioEnvironmentNode
  • AVAudioOutputNode
 
These nodes are part of the three different classifications of AVAudioNodes shown below:
Source Node: Where audio comes from, either from a file or a microphone.
 
Processing Node: How we process the audio, such as creating effects and/or spatializing it. 
 
Destination Node: The output for the audio, such as speakers or headphones. The AVAudioOutputNode is a property on the AVAudioEngine. What we’ll do (and not shown in the diagram above) is connect the AVAudioEnvironmentNode to the AVAudioOutputNode (I’ll show this diagram later).
 
AVAudioNodes have input and output busses attached to them. The busses have formats that you specify in terms of sample rate and channel count. For our purposes, the AVAudioNodes will use bus 0 for both input and output.
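Sketched in code (reusing the nodes from earlier), you can inspect a bus’s format and make a connection with the busses and format spelled out explicitly:

```swift
// Inspect the format on the player's output bus 0.
let outputFormat = playerNode.outputFormat(forBus: 0)
print(outputFormat.sampleRate, outputFormat.channelCount)

// Connect output bus 0 of the player to input bus 0 of the environment node,
// specifying a mono format explicitly (an assumption for spatialization).
engine.connect(playerNode,
               to: environmentNode,
               fromBus: 0,
               toBus: 0,
               format: AVAudioFormat(standardFormatWithSampleRate: 44_100,
                                     channels: 1))
```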

 

You’ll download a music file that is mono; for audio to be spatialized, it needs to be a single channel. The AVAudioPlayerNode is going to schedule the file to play in an infinite loop. The AVAudioEnvironmentNode has an implicit listener whose position and orientation you can control, and we’re going to update that listener with data from the Bose AR hardware. More specifically, we’ll take the game rotation’s pitch, roll, and yaw and plug them into the listener’s orientation. The AVAudioPlayerNode will have a fixed position (see the diagram below). The audio processed by the AVAudioEnvironmentNode will be output to the Bose AR hardware.
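Sketched in code (reusing the engine and nodes from earlier; the file name is a placeholder and the Bose AR sensor wiring is only hinted at with a hypothetical update function), that flow looks roughly like this:

```swift
// A placeholder mono file bundled with the app; in the real app you'd
// download it instead.
guard let fileURL = Bundle.main.url(forResource: "track-mono", withExtension: "wav"),
      let file = try? AVAudioFile(forReading: fileURL) else {
    fatalError("Could not open the mono audio file")
}

// Give the player node a fixed position in the 3D environment.
playerNode.position = AVAudio3DPoint(x: 0, y: 0, z: -2)

// Schedule the file, and re-schedule it from the completion handler so it
// loops indefinitely.
func scheduleLoop() {
    playerNode.scheduleFile(file, at: nil) {
        scheduleLoop()
    }
}
scheduleLoop()
playerNode.play()

// Hypothetical callback: call this whenever the Bose AR SDK reports new
// game-rotation data. AVAudio3DAngularOrientation expects degrees, so convert
// from the sensor's units if necessary.
func updateListener(pitch: Float, roll: Float, yaw: Float) {
    environmentNode.listenerAngularOrientation =
        AVAudio3DAngularOrientation(yaw: yaw, pitch: pitch, roll: roll)
}
```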
 
Here's the diagram of the objects we'll be using and how they connect to each other:
 
Here's a diagram of how the listener position is changing position relative to the AVAudioPlayerNode:
I hope this overview of AVFoundation serves as a building block for understanding how the framework works and as a preface to the spatial audio app we’ll build together. It’s not meant to replace the Apple docs, but to serve as a guide alongside them.