A Field Guide to Immersive Sound: How to Find Your Way Through the Different Formats, Technologies and Sound Experiences

2023-04-11 | 6 min read

Talking Immersive Sound – The Glossary

How did we come up with today’s immersive sound? How did we create this spatialization of sound that immerses us in an ever more elaborate and compelling auditory environment? It is the result of several advanced technologies, developed over the years.

To fully understand immersive audio and its implications, we must first understand a few seemingly complex terms.

Formats: The Key to Creating Immersion

First of all, immersive sound is defined by its format. Several techniques make it possible to shape and transform sound so that it better matches the way the human ear works. We are used to hearing sounds arrive from many directions at once, and the sound is processed to reproduce that impression.

What’s the Difference Between Mono and Stereo?

Mono sound is sound carried on a single channel, typically played through a single speaker. Stereo widens the sound stage by using two speakers that form a triangle with the listener. The aim is to create the illusion of virtual sources placed between the two speakers. This is the beginning of spatial sound.
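The illusion of a source between two speakers comes from sending the same signal to both with different gains. A minimal sketch, assuming the common constant-power (sine/cosine) pan law; the function name and the -1 to +1 position scale are illustrative choices, not something the formats above prescribe:

```python
import math

def constant_power_pan(position):
    """Return (left_gain, right_gain) for a pan position in [-1.0, 1.0].

    -1.0 is hard left, 0.0 is the centre, +1.0 is hard right.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be between -1.0 and 1.0")
    # Map the pan position onto an angle between 0 and 90 degrees.
    angle = (position + 1.0) * math.pi / 4.0
    # cos^2 + sin^2 = 1, so the total power stays constant while panning.
    return math.cos(angle), math.sin(angle)

# Equal gains on both speakers place the virtual source in the centre.
left, right = constant_power_pan(0.0)
print(round(left, 3), round(right, 3))
```

Moving the position towards -1 or +1 shifts the virtual source towards the left or right speaker without changing the overall loudness.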

Generally speaking, we prefer stereo sound because we naturally perceive sound from several places at the same time. It is more familiar to our ears and simply requires less listening effort from our brain. With stereo playback, it is possible to create an illusion close to reality. But this technique still falls short of the quality of immersive sound.

Quadraphonic Sound

Quadraphonic sound is a recording technique. Four pickup channels (microphones) are placed at different locations, angled at 90 degrees to each other. It is also a playback technique, with four speakers placed at four corners of a square or rectangle, surrounding the audience.

Quadraphonic sound is a type of multichannel sound that has gone far beyond the realm of research. It has been used notably in mainstream music, with artists such as Pink Floyd among the first to use it in concert, in the late 1960s.

5.1 and 7.1 Sound

To go a little further, there are several formats for audio playback on a horizontal plane. So-called 5.1 sound and 7.1 sound are multichannel listening configurations. 5.1 sound comprises five speakers arranged around the listener, who is placed in the center, plus a subwoofer for low frequencies (the ".1"). 7.1 sound adds two more speakers, one on each side of the listener. This allows for better sound immersion, without having to be placed precisely in the center. In both cases, the speakers are placed around the listener; no playback channel is positioned above or below.

Wave Field Synthesis (WFS)

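As part of the ongoing work to make sound more natural for human beings, Wave Field Synthesis was developed in the late 1980s, notably at Delft University of Technology. It builds on the principle formulated in the 17th century by the physicist and mathematician Christiaan Huygens, according to which any wavefront can be rebuilt from many small secondary sources. By driving a dense array of speakers in this way, WFS reproduces sound based on physical rather than psycho-acoustic principles. It thus makes it possible to create virtual sound sources that every listener perceives as being in the same place, wherever they stand in the listening area. WFS sound is a sonic hologram.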
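The core idea of placing a virtual source with an array of speakers can be sketched as a set of per-speaker delays. This is a deliberately simplified illustration: real WFS driving functions also apply filtering and amplitude weighting, and the array geometry, function name and speed-of-sound value below are assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, approximate value in air at 20 °C

def wfs_delays(source_xy, speaker_positions):
    """Return one delay in seconds per speaker for a virtual point source.

    Each speaker fires when the wavefront from the virtual source
    would have reached it, so the delay is distance / speed of sound.
    """
    delays = []
    for sx, sy in speaker_positions:
        distance = math.hypot(sx - source_xy[0], sy - source_xy[1])
        delays.append(distance / SPEED_OF_SOUND)
    # Only relative timing matters, so subtract the smallest delay.
    earliest = min(delays)
    return [d - earliest for d in delays]

# Hypothetical array: 5 speakers spaced 0.5 m apart along the x axis.
speakers = [(x * 0.5, 0.0) for x in range(-2, 3)]
# Virtual source placed 1 m behind the middle of the array.
delays = wfs_delays((0.0, -1.0), speakers)
for position, delay in zip(speakers, delays):
    print(position, round(delay * 1000, 3), "ms")
```

The speaker closest to the virtual source plays first and the outer speakers play slightly later, which approximates the curved wavefront a real source at that position would produce.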

3D (or 360) Spatial Audio

3D audio is all about perception. The sound is played back in a particular configuration, with sound sources placed around the listener (next to, above, behind…) to create a sensation of a totally immersive environment. The only requirement is that the sound sources must be correctly placed around the audience to create a sound sphere.

The advantage of 3D audio is that it allows a person to perceive sounds coming from all directions. Of course, we hear sounds from all directions, all the time. So it is a system that allows us to reproduce the natural way we hear.

Ambisonics

Developed in the 1970s, ambisonics is a technique that enables the capture, synthesis and reproduction of a 3D sound environment. Ambisonic sound is thought of as a sphere that envelops the listener.

Ambisonic sound can be of several orders: a higher order improves the precision with which the origin of each sound can be located in space. For orders higher than 1, we speak of HOA (Higher Order Ambisonics). The direction of each sound is encoded through precise mathematical calculations. This technique allows the use of a large number of sound channels, both in capture and playback, for a more complete immersion. Today these technologies are used particularly in video games and virtual reality.
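To make the idea of orders and encoding concrete, here is a minimal sketch. It assumes full 3D ambisonics, where an order N uses (N + 1)² channels, and the traditional first-order B-format convention (W, X, Y, Z with W scaled by 1/√2, azimuth measured counter-clockwise from the front). Other conventions order and scale the channels differently, and the function names are illustrative.

```python
import math

def ambisonic_channel_count(order):
    """Number of channels needed for full 3D ambisonics of a given order."""
    return (order + 1) ** 2

def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order B-format (W, X, Y, Z)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)                # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)   # front/back
    y = sample * math.sin(az) * math.cos(el)   # left/right
    z = sample * math.sin(el)                  # up/down
    return w, x, y, z

for order in (1, 2, 3):
    print(order, ambisonic_channel_count(order))

# A source straight ahead, at ear level, only excites W and X.
print(encode_first_order(1.0, 0.0, 0.0))
```

The channel count grows quickly with the order, which is why higher orders give more precise localization at the cost of more channels to capture, store and decode.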

Dolby Atmos

Dolby Atmos is a technology developed by Dolby Laboratories for reproducing 3D sound. Dolby Atmos supports mixing and monitoring procedures that are compatible with consumer systems. The technology combines speakers positioned in a predefined layout with sound objects that can be played back from any direction.

Technologies to Provide Even Greater Immersion

Deeply linked to the evolution of audio formats, technologies are constantly pushing forward to allow the most convincing recording and playback.

HRTF

HRTF (head-related transfer function) describes how a sound is altered on its way to each eardrum, depending on the direction it comes from. The head, the outer ears and the torso reflect and filter the sound, and our brain interprets these variations to determine where the source is.

In audio, HRTF synthesis engines recreate these sound patterns, making it possible to create sound environments that are tailored to our brain. This technique is used to enable a tailored experience for each user, on headphones. It is notably a technique used in video games to enhance the immersive experience.
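One of the cues that such an engine has to reproduce is the small difference in arrival time between the two ears. A rough sketch, assuming the classic spherical-head (Woodworth) approximation and an average head radius of 8.75 cm; a real HRTF also captures level and spectral differences that this example ignores:

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, approximate value in air
HEAD_RADIUS = 0.0875    # metres, an assumed average head radius

def interaural_time_difference(azimuth_deg):
    """Approximate arrival-time difference between the ears, in seconds.

    azimuth_deg runs from 0 (straight ahead) to 90 (directly to one side).
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be between 0 and 90 degrees")
    theta = math.radians(azimuth_deg)
    # Spherical-head approximation: path difference = r * (theta + sin(theta)).
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

for azimuth in (0, 30, 60, 90):
    itd_ms = interaural_time_difference(azimuth) * 1000.0
    print(azimuth, "degrees:", round(itd_ms, 3), "ms")
```

The difference is zero for a source straight ahead and grows to well under a millisecond for a source directly to one side, which is enough for the brain to tell left from right.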

Binaural Sound

Binaural sound is most often used to describe listening with headphones. Literally, the term binaural means "relating to hearing through both ears". The term refers to recording and mixing techniques for listening via a playback system capable of sending separate sound to each of the listener's ears. Most of the time, listening is done through headphones, giving the sensation of perceiving sounds from different sources.

HRTF synthesis engines are used in particular in these cases, to recreate the illusion of sounds coming from any direction.
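In practice, a binaural renderer filters the same mono signal with one impulse response per ear. A minimal sketch of that step, using plain convolution; the two impulse responses below are made-up toy values, not measured HRTF data, and the function names are illustrative:

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two lists of samples."""
    output = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            output[i + j] += s * h
    return output

def render_binaural(mono, hrir_left, hrir_right):
    """Return (left_ear, right_ear) signals for a mono source."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy impulse responses for a source on the listener's left:
# the right ear receives the sound two samples later and quieter.
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.6]

mono = [1.0, 0.5]
left, right = render_binaural(mono, hrir_left, hrir_right)
print(left)
print(right)
```

Measured impulse responses are much longer and differ for every direction, but the principle is the same: two filters, two ears, one separate signal for each.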

Perception and Emotion: Making Immersive Sound Even More Accessible

The human experience lies at the heart of sound technology. Immersive sound is consequently deeply linked to our emotions and to its cognitive impact.

Anechoic (or Echo-Free) Chamber

This term is used in relation to a space built in such a way that the walls, floors and ceilings absorb sound and echoes. These spaces are also called "deaf rooms". They produce the most complete silence and allow the study of sound itself by observing its path in space and the impact on our perception.

Immersion in Sound

Originally immersion was a term that referred to a liquid environment. When we talk about sound immersion, we are indeed talking about being plunged into sound.

Such experiences are designed to have an impact on the emotional state of individuals, to provoke sensations by bringing sound to the brain in the most natural way possible.

Phonographic Projection

Following the same model as cinematographic projection, phonographic projection is the playback of audio recordings in a specially designed setting, for a collective experience. Rooms such as the EsPro, opened by IRCAM, allow people to participate in phonographic projections, in spaces that can be modulated and adapted to influence the audio output.

Spatialized Sound

This term refers to many of the computing, technological, and mathematical techniques we’ve discussed in this glossary for creating a sound that comes from multiple sources (or at least, that our brains hear as coming from multiple sources).

Spatial audio differs from 3D sound. The former adapts to the listener: if the person turns his or her head, the sound adapts to give the sensation of coming closer to a particular part of the sound. The latter is positioned around the listener: it doesn't matter how the objects are positioned, but the listener must be in the center. Spatial sound does not need an object to work; 3D audio is very much related to playback through speakers.
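The way the sound adapts when the listener turns his or her head can be sketched as a simple change of reference frame. This assumes a head tracker that reports yaw in degrees and an angle convention where positive values are to the listener's left; both the function name and the convention are illustrative assumptions:

```python
def relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Direction of a fixed source as heard from a rotated head.

    Both angles are in degrees in the room's frame of reference.
    The result is wrapped into the range [-180, 180).
    """
    return (source_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0

# A source 30 degrees to the left, with the head facing forward.
print(relative_azimuth(30.0, 0.0))
# The listener turns to face the source: it is now straight ahead.
print(relative_azimuth(30.0, 30.0))
# The listener turns 90 degrees left: a frontal source is heard on the right.
print(relative_azimuth(0.0, 90.0))
```

The renderer then uses this relative direction, rather than the fixed room direction, each time the tracker reports a new head orientation.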

Spatialized sound is the term used to mean sounds designed for a different cognitive impact, acting on the perception and emotion of listeners.

Human perception is at the heart of the work of sound technologies. Our ability to reproduce artificial sounds is only viable if they are adapted to our listening experience.

– Mathilde Neu and Antoine Petroff
