Technology

Ambisonics

Ambisonics is a system for capturing, manipulating, and rendering a full sphere sound field, originally developed by Michael Gerzon in the 1970s. The Wikipedia article on ambisonics is a good place to start for a basic understanding. Ambisonics differs from most surround systems in that its channels do not represent speakers; instead, they represent the sound field via spherical harmonics, as illustrated in the image to the right.
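To make the spherical harmonic idea concrete, here is a minimal sketch of how a single mono sample could be placed into first order B-format. It assumes the traditional convention in which W carries a 1/sqrt(2) weight, azimuth is measured counterclockwise from straight ahead, and elevation is measured upward from the horizontal plane. The function name is hypothetical and is not part of any VVAudio product.

```python
import math

def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Pan one mono sample into first order B-format (W, X, Y, Z).

    Assumed convention: W weighted by 1/sqrt(2), azimuth counterclockwise
    from the front, elevation upward from the horizontal plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)               # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)  # front-back figure-of-eight
    y = sample * math.sin(az) * math.cos(el)  # left-right figure-of-eight
    z = sample * math.sin(el)                 # up-down figure-of-eight
    return w, x, y, z
```

Note that none of the four channels corresponds to a speaker; a source directly to the left simply shows up as energy in W and Y.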

Higher order ambisonics, once the realm of academics, is now a normal part of VR audio workflows. Using higher orders allows more spatial resolution in the sound field, which becomes more important as the number of speakers, or virtual speakers, increases.

Encoding

Encoding is the process of converting A-format, the raw signals from a surround microphone, into a B-format signal. This process was published by Gerzon in his 1975 paper "The Design of Precisely Coincident Microphone Arrays for Stereo and Surround Sound." When the first commercial tetrahedral microphone was built by SoundField, encoding had to be done with complex analog circuits. Modern digital signal processing allows us to do the job more easily and more accurately. The strange shapes to the right show the effects of spatial aliasing in a tetrahedral mic, just one of the effects that surround microphone encoding systems must account for.
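The core of tetrahedral encoding can be sketched as a sum and difference matrix. The example below assumes the usual capsule layout (front-left-up, front-right-down, back-left-down, back-right-up) and deliberately leaves out the frequency-dependent correction that a real encoder applies afterward, so it illustrates the principle rather than any product's signal chain. The function and parameter names are hypothetical.

```python
def a_to_b_format(flu, frd, bld, bru):
    """Convert one A-format sample frame to uncorrected B-format.

    Assumed capsule names: front-left-up, front-right-down,
    back-left-down, back-right-up. Correction filters are omitted.
    """
    w = flu + frd + bld + bru   # all capsules in phase: omni
    x = flu + frd - bld - bru   # front minus back
    y = flu - frd + bld - bru   # left minus right
    z = flu - frd - bld + bru   # up minus down
    return w, x, y, z
```

A sound that reaches only the two front capsules, for example, produces equal W and X and no Y or Z.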

Starting in 2007, VVAudio worked with Core Sound and Richard Lee, a real ambisonics pioneer, to develop the signal chain for processing the new Core Sound TetraMic. The results were implemented in VVAudio's VVMic for TetraMic and then in VVTetraVST, the first-ever commercial implementation of A-format recording. With the release of VVEncode, TetraMic encoding can be done natively in Pro Tools as well.

VVEncode also supports the Brahma tetrahedral microphone. These mics use a very different approach to calibration and correction, which results in a matrix of sixteen convolution-based filters: one for each pairing of the four capsules with the four B-format channels. In fact, any tetrahedral microphone whose calibration can be converted to this format can be processed by VVEncode.
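A filter matrix of this kind can be sketched as follows. Each B-format channel is the sum of the four capsule signals, each convolved with its own impulse response. This is only an illustration under stated assumptions: the function names, the W, X, Y, Z output order, and the direct-form convolution are choices made for the example, and real calibration data would supply the sixteen impulse responses.

```python
def convolve(signal, kernel):
    """Direct-form FIR convolution; output has len(signal) + len(kernel) - 1 samples."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def filter_matrix_encode(a_format, filters):
    """Apply a 4x4 matrix of FIR filters to four capsule signals.

    a_format: list of 4 equal-length capsule signals.
    filters:  filters[out][inp] is the impulse response from capsule
              `inp` to output channel `out`; all 16 share one length.
    Returns 4 output signals (assumed order: W, X, Y, Z).
    """
    n_out = len(a_format[0]) + len(filters[0][0]) - 1
    b_format = []
    for row in filters:
        acc = [0.0] * n_out
        for capsule, kernel in zip(a_format, row):
            for n, v in enumerate(convolve(capsule, kernel)):
                acc[n] += v
        b_format.append(acc)
    return b_format
```

With single-tap filters of +1 and -1 this reduces to the plain sum and difference matrix; longer impulse responses let each path carry its own frequency-dependent correction.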

Now, in 2018, Core Sound has released the OctoMic, the first commercial second order microphone. Together with Fons Adriaensen, they developed the calibration system and signal chain for the OctoMic. VVAudio implements this signal chain in the plugin VVOctoEncode, part of our new higher order ambisonics suite, VVHOA.

Decoding

Decoding is the process of converting B-format into speaker feeds. At its most basic level, decoding is just a few sums and differences, much like MS recording but in 3D. Several refinements have been developed over the years, including shelf filters to accommodate the differences in how we hear at high and low frequencies, near field correction (NFC) to correct for the fact that the speakers are not producing plane waves, and a matrix pseudo-inverse method for calculating the best decode coefficients. VVAudio's plugin VVDecode implements all of these methods as described in the classic paper "Is My Decoder Ambisonic?" by Benjamin, Lee, and Heller.
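The pseudo-inverse idea can be sketched in a few lines. If column j of a matrix K holds the B-format encoding of speaker j's direction, then the decode matrix D = K^T (K K^T)^-1 produces speaker feeds that re-encode to the original B-format signal. The sketch below is first order only, assumes W is weighted by 1/sqrt(2) with azimuth counterclockwise from the front, and omits shelf filters and near field correction. All names are hypothetical and this is not VVDecode's implementation.

```python
import math

def encode_direction(azimuth_deg, elevation_deg):
    """First order gains (W, X, Y, Z) for one direction (assumed convention)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return [1.0 / math.sqrt(2.0),
            math.cos(az) * math.cos(el),
            math.sin(az) * math.cos(el),
            math.sin(el)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def invert(m):
    """Gauss-Jordan inversion with partial pivoting for a small square matrix."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("speaker layout does not span all channels")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def pseudo_inverse_decoder(speaker_directions):
    """Decode matrix D (speakers x 4) = K^T (K K^T)^-1.

    speaker_directions: list of (azimuth_deg, elevation_deg) pairs.
    Requires a layout whose encodings span all four channels.
    """
    k = transpose([encode_direction(az, el) for az, el in speaker_directions])
    kt = transpose(k)
    return matmul(kt, invert(matmul(k, kt)))

def decode(decoder, b_frame):
    """Speaker feeds for one B-format frame (W, X, Y, Z)."""
    return [sum(g * c for g, c in zip(row, b_frame)) for row in decoder]
```

For a regular layout such as a cube, K K^T is diagonal and the result collapses to the familiar sums and differences with a fixed gain per channel. A purely horizontal layout cannot reproduce Z, so this sketch raises an error for it rather than guessing.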

VVAudio has recently released a higher order decoder, VVDecodeH, to go with VVOctoEncode. In order to optimize higher order decoding, the simple shelf filter switch of first order becomes a pair of knobs for source and speaker distance. This implements Near Field Corrected Higher Order Ambisonics (NFC-HOA) as described in Jérôme Daniel's "Spatial Sound Encoding Including Near Field Effect: Introducing Distance Coding Filters and a Viable, New Ambisonic Format". Note how the second order pseudo-inverse response pictured at right has smaller main lobes than the first order response above and how the negative tail points to the side.

Parametric Decoding

To get better localization, especially when using large numbers of speakers, advanced decoding techniques have been developed that process the signal in the frequency domain. Such decoders are called parametric. VVAudio has developed a novel parametric decoding technology, available now for custom projects and soon in other formats.

Binaural Decoding

Binaural recording techniques use knowledge about how our ears, head, and body change the sound on the way to our eardrums to deliver a surround sound image using headphones. Binaural has become very popular lately with the rise of virtual reality, since it can deliver the same sense of immersion that the visuals strive to achieve. An ambisonic recording can be converted to binaural by rendering several virtual speakers and then processing each with the appropriate head-related transfer function (HRTF). VVAudio has been working on binaural technologies; the image shows an HRTF from the Listen database compared to a synthetically generated HRTF.
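The virtual speaker approach can be sketched as a per-speaker convolution followed by a mix. The example assumes the speaker feeds have already been produced by some ambisonic decoder and that each virtual speaker has a pair of time-domain impulse responses (the time-domain form of its HRTF), one per ear. The function names are hypothetical, and real impulse responses would come from a measured or synthesized HRTF set.

```python
def convolve(signal, kernel):
    """Direct-form FIR convolution; output has len(signal) + len(kernel) - 1 samples."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def render_binaural(speaker_feeds, hrirs):
    """Mix virtual speaker feeds down to a (left, right) headphone pair.

    speaker_feeds: one signal per virtual speaker, all the same length.
    hrirs: one (left_ir, right_ir) pair per virtual speaker, in the same
           order; all impulse responses share one length.
    """
    n_out = len(speaker_feeds[0]) + len(hrirs[0][0]) - 1
    left = [0.0] * n_out
    right = [0.0] * n_out
    for feed, (left_ir, right_ir) in zip(speaker_feeds, hrirs):
        for n, v in enumerate(convolve(feed, left_ir)):
            left[n] += v
        for n, v in enumerate(convolve(feed, right_ir)):
            right[n] += v
    return left, right
```

Because the virtual speakers never move relative to the head, the impulse responses stay fixed; head tracking can then be handled by rotating the B-format signal before decoding.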

Ambisonic Processing

By far the most common form of ambisonic processing is a simple rotate, but others like zoom, reverb, and echo are also possible. Non-ambisonic plugins can be used on ambisonic signals as long as care is taken to process all channels the same way. Sometimes special processing is needed; for example, VVAudio's rotate library actually has three different algorithms, chosen according to how fast the sound field is rotating. The symmetrical nature of ambisonics makes surround-specific processes like rotation or spatial EQ practical and well behaved. The shape to the left shows an example of a ninth order spatial EQ.
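As an illustration of why rotation is so well behaved, here is a minimal sketch of a fixed rotation of a first order frame about the vertical axis. It assumes azimuth is measured counterclockwise from the front, so a positive angle turns the sound field to the left. The function name is hypothetical, and this static version does not attempt the handling of fast or time-varying rotation mentioned above.

```python
import math

def rotate_yaw(w, x, y, z, angle_deg):
    """Rotate one first order B-format frame about the vertical axis.

    W and Z are unaffected; X and Y mix through a 2D rotation.
    Assumes azimuth is counterclockwise from the front.
    """
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return w, x * c - y * s, x * s + y * c, z
```

A source encoded straight ahead (X = 1, Y = 0) ends up directly to the left (X = 0, Y = 1) after a 90 degree rotation, with no change in level.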

VVSDK

You can use the same ambisonic processing library in your project that VVAudio's plugins use, ensuring consistent results from standard DAWs to your custom code. It's called VVSDK, and its modules can include encoding for any of the microphones that VVAudio supports, as well as rotate and various decoding options. Most modules are available as C++ source code, though a few are binary only to protect proprietary algorithms. Contact us at info@vvaudio.com to include VVSDK in your next project.

VVUnity

Unity 3D offers a great platform for creating VR games and other applications, and as such it makes a great platform for demoing ambisonics. VVAudio has ported all of our best algorithms into a set of C# and native classes for use in the Unity environment. The most common decoding methods are available everywhere, including native Android and pure C# versions. Output can be binaural or speaker-compatible, and CPU use is carefully managed. See VVUnity for more information.