Capturing the Soundfield: Recording Ambisonics for VR

When building experiences in virtual reality we’re confronted with the challenge of mimicking how sounds hit us in the real world from all directions. One useful tool for attempting this mimicry is the soundfield microphone. We tested one of these microphones to explore how audio plays into building immersive experiences for virtual reality.

Approaching ambisonics with the soundfield microphone has become popular in VR development, particularly for 360 video. With it, we can record from one fixed point and still match sounds accurately in all directions. This makes editing your video more convenient, as only one microphone needs to be removed from the visuals rather than several placed around the scene. Ambisonic audio is also useful as background ambience for VR experiences rendered in game engines like Unity. The technology has been with us since the 1970s, courtesy of Michael Gerzon and Peter Craven, but was largely a commercial failure. Its applicability to VR, however, has revived interest in the approach.

Core Sound TetraMic

We tested Core Sound’s TetraMic and processed the audio gathered from this soundfield microphone into an ambisonic format usable for virtual reality. It follows the conventional soundfield design: four cardioid microphone capsules arranged in a tetrahedron. You can picture each of these capsules picking up all the sound directly in front of it in a kind of dome. When all four record simultaneously, we get audio information for the entire space, with a moderate degree of overlap between the capsules. At this stage we simply have four distinct audio signals, one from the area each capsule is facing. We need to bring this audio data together so that it can be relayed over two channels in headphones and shift as we move our head about the scene. Our format conversion to ambisonics will permit this.

A tetrahedron (image created with Stella software)

These four unprocessed signals are the A-Format. They need to be converted, usually by software provided by the microphone’s manufacturer, into the Ambisonic B-Format standard. The processing transforms the four separate signals into a representation of the three-dimensional soundfield, creating a kind of virtual sphere around the microphone. This gives us the flexibility to decode the format into any kind of polar pattern, to any number of audio signals we wish. The format consists of four signals: W, X, Y and Z (a minimal sketch of the conversion follows the list below). The naming convention is derived from spherical harmonics, the functions which define the surface of a sphere. They’re simply ordered the same way our axes would be in a right-hand coordinate system, so don’t read too much into the letters themselves.

  • W: simulates the output of an omnidirectional microphone, or full sphere
  • X: front and back directionality, simulating a forward-facing "figure of eight" microphone
  • Y: left to right directionality, simulating a horizontal "figure of eight" microphone
  • Z: up and down directionality, simulating a vertical "figure of eight" microphone
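
To make the relationship concrete, here is a minimal sketch of the idealized A-Format-to-B-Format matrix in Python. The capsule ordering is an assumption for illustration, and real converters such as VVMic also apply per-capsule calibration filters, so treat this as the conceptual core rather than a drop-in replacement:

```python
import numpy as np

def a_to_b_format(a_signals):
    """Idealized A-Format -> B-Format conversion for a tetrahedral array.

    a_signals: array of shape (num_samples, 4) holding the four capsule
    signals, assumed here to be ordered front-left-up (FLU),
    front-right-down (FRD), back-left-down (BLD), back-right-up (BRU).
    Real converters also apply calibration filters; this sketch shows
    only the basic sum/difference matrix.
    """
    flu, frd, bld, bru = a_signals.T
    w = flu + frd + bld + bru   # omnidirectional pressure component
    x = flu + frd - bld - bru   # front-back figure of eight
    y = flu - frd + bld - bru   # left-right figure of eight
    z = flu - frd - bld + bru   # up-down figure of eight
    return np.stack([w, x, y, z], axis=1)
```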

You can think of the overlapping signals in Figure 1 as the totality of our soundfield in the B-Format, and the rotating polar patterns in Figure 2 as the ways in which we can manipulate the directionality of audio contained in that soundfield by morphing the signals.

Figure 1. The overlapping signals in this diagram represent the totality of the soundfield in the B-Format
Virtual Microphone Animation by Nettings at English Wikipedia
Figure 2. These rotating polar patterns represent the ways we can manipulate the directionality of audio in our B-Format soundfield.

Conceivably this format can be decoded into any signal pattern, which is what makes ambisonics so flexible. Picture a standard microphone, capturing sound only in the direction it’s pointed. With the information gathered from that mic, all we have is what’s directly in front of the capsule. With the soundfield microphone, because we’re now working with sound in all directions, we can use software to create virtual microphones. If I want to process the audio to focus on what’s directly behind me, I can use the software to isolate that portion of the soundfield and render an audio file with that focused directionality. We never had a microphone pointing exclusively in that direction, but because I have the entire soundfield captured, I can isolate that source by creating this "virtual microphone."
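
Under the hood, a virtual microphone is just a weighted mix of the four B-Format signals. The sketch below is a hypothetical illustration (assuming SN3D-normalized W, a convention discussed later), not how VVMic works internally:

```python
import numpy as np

def virtual_mic(b_signals, azimuth_deg, elevation_deg, p=0.5):
    """Render a 'virtual microphone' from B-Format signals.

    b_signals: array of shape (num_samples, 4) ordered W, X, Y, Z
    (SN3D normalization assumed, so no extra gain on W).
    p blends between pick-up patterns: 1.0 = omni, 0.5 = cardioid,
    0.0 = figure of eight.
    """
    az = np.radians(azimuth_deg)
    el = np.radians(elevation_deg)
    w, x, y, z = b_signals.T
    # Directional component pointing where the virtual capsule faces.
    direction = (np.cos(az) * np.cos(el) * x
                 + np.sin(az) * np.cos(el) * y
                 + np.sin(el) * z)
    return p * w + (1.0 - p) * direction

# e.g. a virtual cardioid aimed directly behind the original mic:
# rear = virtual_mic(b, azimuth_deg=180, elevation_deg=0, p=0.5)
```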

While soundfield microphones get the process started, it’s not as simple as hitting record and there’s your virtual sphere. Most portable multi-channel recorders write audio as stereo (two-channel) files. Using the Core Sound TetraMic with the TASCAM DR-701D, our four signals were captured as two stereo files: the first and second capsule signals rendered to one stereo file, the third and fourth to another. It’s important to log at the beginning of a recording which direction each microphone capsule is facing, for accurate positioning once you’re working with the converted B-Format file.

A Reaper project
Using a digital audio workstation such as Reaper, you can route the two stereo files from your recording onto a master track with four channels. We do this because our processing software will only take one file, with each signal mapped to its own channel. Rendering the master track to a multichannel file provides the A-Format needed to then convert to Ambisonic B-Format.
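
If you’d rather script this step than route it in a DAW, the same interleaving can be done in a few lines of Python with the soundfile library; the filenames here are hypothetical stand-ins for the DR-701D’s two stereo WAVs:

```python
import numpy as np
import soundfile as sf  # pip install soundfile

# Hypothetical filenames: the recorder writes capsules 1/2 and 3/4
# to separate stereo WAVs.
pair_12, rate = sf.read("capsules_1_2.wav")      # shape: (samples, 2)
pair_34, rate_34 = sf.read("capsules_3_4.wav")
assert rate == rate_34, "both recordings must share a sample rate"

# Interleave into a single 4-channel A-Format file, one channel per
# capsule, trimmed to a common length in case the files differ slightly.
n = min(len(pair_12), len(pair_34))
a_format = np.hstack([pair_12[:n], pair_34[:n]])  # shape: (n, 4)
sf.write("a_format_4ch.wav", a_format, rate)
```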

The VVMic application

The Core Sound TetraMic includes a Windows application calibrated specifically to the microphone, called VVMic. This software takes our four-channel file (A-Format) as input and renders an ambisonic file as output (B-Format). Getting from A to B may vary depending on the microphone model used for the recording.

There has been some variance in the standards for the calculations and formulas generating B-Format audio, but software developers have largely settled on ACN/SN3D. Simply put, this describes the audio’s channel ordering and normalization. Audio playback for VR in game engines like Unity, or in 360 video players on YouTube or Facebook, accepts this standard. Their software reads this format and decodes it in real time to your headphones, or two channels, based on where you’re looking.
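
For first-order material, moving from the older FuMa convention to ACN/SN3D ("ambiX") is just a channel reorder plus one gain change. A small sketch, assuming the input is ordered W, X, Y, Z:

```python
import numpy as np

def fuma_to_ambix(fuma):
    """Convert first-order FuMa B-Format (W, X, Y, Z) to the
    ACN/SN3D ("ambiX") convention expected by Unity, YouTube, etc.

    fuma: array of shape (num_samples, 4) ordered W, X, Y, Z.
    Returns channels reordered to ACN (W, Y, Z, X), undoing FuMa's
    -3 dB gain on W to match SN3D normalization.
    """
    w, x, y, z = fuma.T
    return np.stack([w * np.sqrt(2.0), y, z, x], axis=1)
```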

You’ll notice the decode tab in the VVMic software. This gives us the option to decode our B-Format file to a specific number of outputs, or channels, depending on our production needs… and here lies the unique flexibility that makes the soundfield microphone so interesting. With the soundfield contained in one file, we can highlight sounds from any direction, creating "virtual microphones" from our capsule array, and output them to any number of signals.

The 'decode' interface in VVMic provides the unique flexibility to simulate 'virtual microphones' and highlight sounds from any direction.
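
As a toy stand-in for what a decode stage does, the sketch below points a virtual cardioid at each loudspeaker position in a horizontal ring. This is a basic "sampling" decoder under the same SN3D assumption as before; production decoders use carefully optimized matrices:

```python
import numpy as np

def decode_to_ring(b_signals, speaker_azimuths_deg):
    """Basic 'sampling' decode of horizontal B-Format to a ring of
    loudspeakers.

    b_signals: (num_samples, 4) ordered W, X, Y, Z (SN3D assumed).
    Returns an array of shape (num_samples, num_speakers).
    """
    w, x, y, _ = b_signals.T  # Z is ignored for a horizontal-only ring.
    outputs = []
    for az_deg in speaker_azimuths_deg:
        az = np.radians(az_deg)
        # Point a virtual cardioid at each speaker position.
        outputs.append(0.5 * (w + np.cos(az) * x + np.sin(az) * y))
    return np.stack(outputs, axis=1)

# e.g. a square of four speakers at 45, 135, 225 and 315 degrees:
# quad = decode_to_ring(b, [45, 135, 225, 315])
```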

So, to review: the soundfield microphone gives us a way to record ambisonic audio that reproduces surround sound in full 360 degrees. We process the audio gathered, first to one multichannel file and then to the ambisonic format. From there, we have audio which can be placed into a 360 video or a VR game engine, carrying spatialized sound over two channels to your headphones and shifting as you move your head. We’re just beginning our exploration of immersive audio for VR and would love to hear your thoughts on the subject. Please reach out to get the conversation going at @knightlab, or contact me directly at @jwhitesu.

Pick-up Patterns

A big part of understanding spatial sound is understanding what different microphones actually "hear." Sound engineers know the polar (pick-up) patterns for all of the microphones in their kit. The pick-up pattern describes how sensitive a microphone is to sounds hitting it at different angles from its center. A diversity of polar patterns means that recording engineers have a variety of approaches to capturing audio in different directions.
Omnidirectional
equal sensitivity at every angle, picking up sound from all directions. This is ideal for capturing the environmental ambience from around a dominant sound source.
Cardioid
best for capturing sound directly in front of where it’s pointed. Positioning the microphone incorrectly relative to the sound source results in "off-axis" coloration, a dull, muted effect that appears when the capsule is angled away from the source.
Figure of Eight
picks up sound from the front and the rear of where it’s been placed, muting sounds on either side. Useful for recording two sound sources at once, or as a component of stereo recording techniques.
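
All three of these patterns are points on a single first-order family, sensitivity(θ) = p + (1 − p)·cos θ, where θ is the angle between the microphone’s facing direction and the sound source. A small sketch:

```python
import numpy as np

def sensitivity(theta_deg, p):
    """First-order polar pattern: p + (1 - p) * cos(theta).

    p = 1.0 gives omnidirectional, p = 0.5 cardioid,
    p = 0.0 figure of eight.
    """
    theta = np.radians(theta_deg)
    return p + (1.0 - p) * np.cos(theta)

# A cardioid is most sensitive on-axis and rejects sound from behind:
# sensitivity(0, 0.5) -> 1.0, sensitivity(180, 0.5) -> 0.0
```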

A Spatial Sound Glossary

We know it can be challenging picking up the new vocab that goes with new technology, but it pays off to know the language. Here's a cribsheet.

A-Format
the first set of signals produced by a soundfield microphone, one from each of the capsules in the microphone array. These four signals give us the full scope of directional information necessary to create a "soundfield".
B-Format
the standard audio format for ambisonics, consisting of a spherical wave field around the microphone itself. Derived by processing the initial recording from the soundfield microphone through software that phases the A-Format signals together to produce directionality and depth. The typical naming convention for these signals is W, X, Y and Z. W is a sound pressure signal that mimics an omnidirectional microphone recording in all directions, while X, Y and Z mimic figure-of-eight mics rigged along our three spatial axes (front-back, left-right, up-down).
ACN/SN3D
method of channel ordering and normalization, currently the most common data exchange format for ambisonic audio. Stands for "Ambisonic Channel Number" and “Schmidt semi-normalization.”
Ambisonics
a "full-sphere" surround sound technique that covers sound sources in three dimensions. Carries no distinct audio signals, but rather a representation of a sound field encoded into “B-format” audio. This audio can be decoded to derive “virtual microphones,” pick-up patterns which localize sounds in any direction.
Audio Signal
what is passed from the microphone to a loudspeaker or recording device, carried over an audio channel. In stereo, for instance, we receive two distinct audio signals over left and right audio channels.
Soundfield Microphone
a microphone which typically has four closely spaced cardioid capsules arranged in a triangular pyramid or tetrahedron. Captures sound in all directions by using these four distinct signals to recreate a three-dimensional soundfield.
Surround sound
most often, a setup in which multiple channels send audio signals to multiple speakers. To create the sensation of sound hitting the listener from all directions, signals are routed to designated speakers to achieve a fixed listening perspective, or "sweet spot."

The featured image for this article is CC BY 3.0 and originally appeared on "Resonance Audio: Fundamental Concepts", published on the Google Developers website.

About the author

J Kyle White-Sullivan

Device Lab Fellow
