Ⅰ. What is spatial audio?
Ⅱ. The development of spatial audio
Ⅲ. How does spatial audio work?
Ⅳ. Spatial audio recording method
Ⅴ. Spatial audio collection
Ⅵ. Spatial audio music platform
Ⅶ. Static spatial audio VS dynamic spatial audio
Ⅷ. Spatial audio processing method
Ⅰ. What is spatial audio?
Spatial audio, often called 3D audio, lets listeners immerse themselves in a virtual three-dimensional sound space. Most smartphones can already play spatial audio. To make full use of the technology in music, movies, games and other types of content, however, more spatial audio content has to be produced and more new technologies adopted. In the headphone and true wireless product category, for example, spatial audio with head tracking can provide a fully immersive surround sound experience.
Ⅱ. The development of spatial audio
Audio has developed from monophonic to stereophonic sound, then to multi-channel surround sound, and finally to spatial audio. Mono uses a single microphone to pick up the sound and a single speaker to play it back. Audio signals arriving from different directions are recorded together and played through one speaker. As a result, the audience can perceive the timbre, pitch, loudness and depth (front-to-rear position) of the sound, but not its lateral movement. Stereo consists of two channels with a phase (and usually level) difference between them. Compared with mono, it conveys the direction and layering of sound and gives a sense of space. Common stereo coding techniques include parametric stereo (PS), intensity stereo (IS), left/right (L/R), mid/side (M/S) and joint stereo (JS), among others. Multichannel surround sound carries the audio signal over more than two channels and, compared with stereo, gives a better spatial effect.
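As an illustration of one of the stereo coding techniques listed above, here is a minimal sketch of mid/side (M/S) coding. The function names are hypothetical and the samples are plain Python lists; a real codec would add quantization and bit allocation on top of this sum/difference step.

```python
def ms_encode(left, right):
    """Turn left/right sample lists into mid (half-sum) and side (half-difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right from mid/side (the inverse of ms_encode)."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [0.5, 0.25, -0.5]
right = [0.5, -0.25, 0.0]
mid, side = ms_encode(left, right)
restored = ms_decode(mid, side)
```

When the two channels are similar, the side signal stays close to zero, which is why this representation can be cheaper to encode than separate left and right channels.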
Ⅲ. How does spatial audio work?
Before spatial audio, surround sound was encoded so that each sound was assigned to a specific speaker. Dialogue usually comes from the center speaker, and background music and effects usually come from the rear speakers. Instead of assigning sounds to specific speakers, spatial audio places them in a space. For example, an effect can be positioned above and to the right of center. Based on the number of speakers and their layout, the playback system works out how best to make the sound appear to come from that location.
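How a system might map a position onto whatever speakers are available can be sketched with simple pairwise panning on a horizontal ring of speakers. This is only an assumed illustration, not the method of any particular product: the function name `pan_to_ring`, the angle convention (degrees, 0 = front) and the constant-power sine/cosine law are choices made for the example.

```python
import math


def pan_to_ring(source_deg, speaker_degs):
    """Return one gain per speaker for a source at azimuth source_deg.

    Only the two speakers on either side of the source receive signal,
    weighted with a constant-power (sine/cosine) law.
    """
    n = len(speaker_degs)
    if n < 2:
        raise ValueError("need at least two speakers")
    # Visit the speakers in order of increasing azimuth around the ring.
    order = sorted(range(n), key=lambda i: speaker_degs[i] % 360)
    gains = [0.0] * n
    src = source_deg % 360
    for k in range(n):
        a = order[k]
        b = order[(k + 1) % n]
        start = speaker_degs[a] % 360
        span = (speaker_degs[b] - speaker_degs[a]) % 360 or 360.0
        offset = (src - start) % 360
        if offset <= span:
            frac = offset / span  # 0 at speaker a, 1 at speaker b
            gains[a] = math.cos(frac * math.pi / 2)
            gains[b] = math.sin(frac * math.pi / 2)
            break
    return gains


# Four speakers in a square; a source at 90 degrees sits between the first two.
quad = [45, 135, 225, 315]
gains = pan_to_ring(90, quad)
```

The same source position yields different gains for a different speaker list, which is the point made above: the position is fixed in the content and the layout-specific gains are computed at playback time.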
Spatial audio also adds height, making sound domes possible. You can hear the helicopter lift off and fly over your head, or feel the bullets whizzing past your ears.
Applying spatial audio to music is similar, but the impact on your experience is different. When a song is mixed in spatial audio, the music can actually surround you, and you feel as if you are in the middle of the band. The vocals might be in front of you, the guitars to your right, and the harmonies behind you. This can be a fun experience or a completely disorienting one. If you are used to the stereo mix of your favorite songs, a spatial audio mix can bring out something you have never heard before, or it can make you wish you were hearing the song the way you have always known and loved it.
Ⅳ. Spatial audio recording method
1. Single point sound source recording
We can record the source with a single-point mono microphone, pan it within the sound field in the traditional way, and then use an algorithm to render it for a full 3D speaker setup.
2. Microphone array
A multi-channel microphone array uses several mono microphones to record the sound sources. The recorded signals are then panned in post-production software to place them in a three-dimensional scene.
3. Dummy head
Dummy head binaural recording uses two omnidirectional microphones placed in the ears of a dummy head to simulate human perception of sound. The recording captures important auditory information about source distance, sense of space, timbre and direction, reflecting the sound that would reach a listener's ears in the real environment.
4. Ambisonic microphones
Ambisonics is a multi-channel technology that captures sound spherically, from all directions, at a single point in a scene. This is done with a dedicated Ambisonics microphone. These specially designed microphones contain four cardioid capsules pointing in different directions, an arrangement called a tetrahedral array. The microphone signal is in Ambisonics A-format and needs to be converted to B-format for post-processing.
A-format is the raw audio from an Ambisonics microphone, with one audio channel per capsule. B-format is the standard multi-channel format for Ambisonics audio. Whatever the microphone model, its native A-format recording must be converted to standard B-format for post-production and compatibility. Today, the major post-production and playback tools on the market support Ambisonics, which makes it a suitable tool for virtual reality and other productions involving 3D spatial sound.
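The A-format to B-format conversion can be sketched as a set of sums and differences of the four capsule signals. The sketch below assumes the common tetrahedral capsule order front-left-up (FLU), front-right-down (FRD), back-left-down (BLD) and back-right-up (BRU), and the usual B-format channel names W (omnidirectional), X (front-back), Y (left-right) and Z (up-down). The 0.5 scaling is a choice for the example, and real converters also apply microphone-specific correction filters that are omitted here.

```python
def a_to_b(flu, frd, bld, bru):
    """Convert one frame of A-format capsule samples to first-order B-format."""
    w = 0.5 * (flu + frd + bld + bru)  # sum of all capsules: omnidirectional
    x = 0.5 * (flu + frd - bld - bru)  # front capsules minus back capsules
    y = 0.5 * (flu - frd + bld - bru)  # left capsules minus right capsules
    z = 0.5 * (flu - frd - bld + bru)  # upward capsules minus downward capsules
    return w, x, y, z


# A sound picked up only by the two front capsules ends up in W and X only.
front_only = a_to_b(1.0, 1.0, 0.0, 0.0)
# A sound picked up only by the two left capsules ends up in W and Y only.
left_only = a_to_b(1.0, 0.0, 1.0, 0.0)
```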
Ⅴ. Spatial audio collection
1. In-ear microphone and artificial head
If we want to preserve the sound exactly as the human ear hears it, we can use in-ear microphones to record the audio arriving at the left and right ear canals directly. Alternatively, we can use the artificial head method: the human head, auricles, ear canals and other parts are reproduced in a bionic model, and spatial audio is collected through microphones built into the artificial ears.
The difference between the two methods is straightforward. Audio collected with in-ear microphones and played back over in-ear headphones restores the original sound almost perfectly for the person who recorded it. With an artificial head, the shape of the pinna and of the head differs from your own, so although the space is restored to a large degree, the result is still somewhat different from listening at the scene yourself. In practice, everyone's ears and head are shaped differently, but the general shape and position are the same. For this reason, artificial heads are widely used for audio recording in film, television and game productions.
2. Quad binaural
Quad Binaural can be understood simply as a four-way dummy head microphone. It captures sound fields with HRTF information in four horizontal directions: 0°, 90°, 180° and 270°. If the listener faces an angle other than these four, such as 120°, the two neighboring sets of data, here 90° and 180°, are combined by an interpolation algorithm. The main microphone using Quad Binaural technology is 3Dio's Omni.
The advantage of Quad Binaural is that it collects natural HRTF information, so the later algorithm and decoding are very simple, and its horizontal panning is better than that of the usual low-order Ambisonic approach. The disadvantage is equally clear. Since the HRTFs in the four directions all lie in the same horizontal plane, height information cannot respond to head rotation. In other words, when you turn your head left and right, the sound changes with your direction, but when you look up or down, the sound does not change.
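The interpolation between neighboring directions can be sketched as a crossfade between the two nearest binaural recordings. This is an assumed, simplified illustration: the function names are hypothetical, and a linear crossfade is used because the text does not say which algorithm real players apply.

```python
def quad_binaural_weights(yaw_deg):
    """Return {direction_deg: weight} for the two recordings nearest to yaw_deg."""
    yaw = yaw_deg % 360.0
    lower = int(yaw // 90) * 90       # nearest recorded direction at or below yaw
    upper = (lower + 90) % 360        # next recorded direction
    frac = (yaw - lower) / 90.0       # 0 at lower, approaching 1 at upper
    return {lower: 1.0 - frac, upper: frac}


def mix_pairs(pairs, yaw_deg):
    """Blend (left, right) samples recorded at 0, 90, 180 and 270 degrees."""
    weights = quad_binaural_weights(yaw_deg)
    left = sum(w * pairs[d][0] for d, w in weights.items())
    right = sum(w * pairs[d][1] for d, w in weights.items())
    return left, right


# Facing 120 degrees: mostly the 90-degree recording, partly the 180-degree one.
weights_120 = quad_binaural_weights(120)

# One (left, right) sample per recorded direction, made up for the example.
pairs = {0: (1.0, 0.0), 90: (0.0, 1.0), 180: (0.0, 0.0), 270: (0.0, 0.0)}
blended = mix_pairs(pairs, 45)
```

Only the yaw angle enters the calculation, which mirrors the limitation described above: looking up or down has no effect on the result.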
3. Ambisonics
Audio collected with an artificial head or in-ear microphones is only a fixed-direction stereo reproduction: it restores the sound for the direction the head was facing at the time of collection. If you want to record the sound field of the entire space, so that you can turn your head and listen in any direction during playback, you need another technology called Ambisonics (high-fidelity stereo image reproduction).
Ambisonics originated from research on three-dimensional sound field reconstruction at Oxford University in the 1970s. The core of the technology is to record the sound field with a special microphone, such as a first-order Ambisonics microphone (a tetrahedral array of four identical microphone capsules), so that the sound heard at the recording position can be reproduced elsewhere.
The raw data collected by a first-order Ambisonics microphone is called A-format. Its four cardioid capsules point in four directions: left front (LF), left rear (LB), right front (RF) and right rear (RB). A-format cannot be played back directly. It first has to be transcoded to 4-channel B-format, also called first-order B-format, whose four channels are named W, X, Y and Z. Put simply, W is the omnidirectional (center) component of a spherical sound field, X is the front-rear component, Y is the left-right component and Z is the up-down component. Software can render B-format data into any format supported by the playback device, such as stereo, 2.1, 5.1 or 7.1.
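Rendering B-format to stereo can be sketched with two virtual cardioid microphones aimed to the left and right of front. The sketch makes several assumptions that are not from the article: W carries the unattenuated omnidirectional signal, positive X points to the front, positive Y points to the left, and the virtual microphones sit at plus and minus half of `spread_deg`. Z is ignored because stereo has no height.

```python
import math


def b_to_stereo(w, x, y, spread_deg=90.0):
    """Decode one horizontal B-format frame to (left, right) samples."""
    half = math.radians(spread_deg / 2)
    # A virtual cardioid aimed at azimuth a picks up 0.5 * (W + X cos a + Y sin a).
    left = 0.5 * (w + x * math.cos(half) + y * math.sin(half))
    right = 0.5 * (w + x * math.cos(half) - y * math.sin(half))
    return left, right


# A source straight ahead (W=1, X=1, Y=0) is equally loud in both channels.
front = b_to_stereo(1.0, 1.0, 0.0)
# A source hard left (W=1, X=0, Y=1) is much louder in the left channel.
hard_left = b_to_stereo(1.0, 0.0, 1.0)
```

Decoding to 5.1 or 7.1 follows the same idea with one virtual microphone per speaker direction, although production decoders are usually more elaborate.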
Low-order Ambisonics microphones can reproduce only a relatively small sound field. For scenes such as an airport or a large concert, a higher-order Ambisonics microphone may be needed, and the higher the order, the more microphone capsules are required. For example, the Audio Camera from VisiSonics uses 7th-order Ambisonics with 64 channels. Ambisonics restores the listening impression of the entire sound field well in AR, VR and other scenes where the viewpoint rotates, so it is widely used.
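The relationship between order and channel count follows a simple rule: a full-sphere Ambisonics signal of order N has (N + 1)² channels. The small helper below (a hypothetical name) reproduces the figures mentioned in the text.

```python
def ambisonic_channels(order):
    """Number of channels in a full-sphere Ambisonics signal of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


first_order = ambisonic_channels(1)    # the four channels W, X, Y, Z
seventh_order = ambisonic_channels(7)  # matches the 64 channels cited above
```

A microphone array needs at least as many capsules as the channels it is meant to deliver, which is why capsule counts grow quickly with the order.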
Ⅵ. Spatial audio music platform
1. Apple Music
Apple Music is a streaming service that lets you listen to tens of millions of great songs. It has many wonderful features, including downloading songs and playing offline, displaying lyrics in real time, listening across devices, recommending new songs based on your preferences, and curated playlists from editors, etc. Plus, it has exclusive content and original programming to enjoy.
Dolby Atmos brings you spatial audio that surrounds you, and lossless audio lets you hear fine details clearly. Dolby Atmos is an audio technology for an immersive listening experience. Music mixed in stereo can only be presented through the left and right channels, but music recorded in Dolby Atmos is not limited by channels, so sound can be placed all around the listener. In addition, musicians can adjust the volume, balance and intensity of each instrument to bring out the subtleties of the work.
Apple Music subscribers can listen to thousands of Dolby Atmos-enabled songs on any headphones, as long as they are running the latest version of Apple Music on their iPhone, iPad or Mac. Music that supports Dolby Atmos plays in this mode automatically when you listen with compatible Apple or Beats headphones.
2. NetEase Cloud Music
NetEase Immersive Sound uses a self-developed algorithm to separate the two-channel sound source into its different sound elements, then applies a spatial position transfer function to create an immersive spatial experience. It applies to all content on NetEase Cloud Music.
The NetEase Cloud Music mobile app has also launched a Dolby Atmos music service. Its built-in Dolby Atmos zone will offer a rich catalog of music content and a new listening experience.
3. Huawei Music
Huawei Music has launched a spatial audio experience zone. In the Huawei Music spatial audio zone, you can listen to spatial audio versions of popular songs by Cai Jianya, Chen Linong, Chen Zitong TIFA, Gina Alice, Sunnnee and Xu Wei, feel the music all around you and experience an immersive sound space, with Audio Vivid redefining "good sound".
4. QQ Music
On July 6, 2022, Tencent Music Entertainment Group (TME) and Dolby Laboratories (Dolby), a leader in immersive entertainment, jointly announced the launch of the Dolby Atmos music function on QQ Music, making it the first domestic music platform to support Dolby Atmos. Super member users can now enjoy the immersive, high-quality music experience of Dolby Atmos on Dolby Atmos-enabled Android phones. The launch marks the beginning of the strategic cooperation between TME and Dolby, through which the two parties will further promote the popularization of Dolby Atmos music in China.
Dolby Atmos Music is a new way to create and experience music that maximizes artistic expression and creates a deeper connection between musicians and their fans. Music in Dolby Atmos goes beyond the ordinary listening experience, immersing you in the song and delivering rich detail with great clarity and depth. It gives musicians more creative space and freedom to realize their creative vision, and it opens up a new way for fans to feel the emotion in music. Whether you are listening to layers of instrumentation swirling around you, catching a singer's tiny breaths between lyrics, or feeling a melody wash over you, nothing brings you into the music like Dolby Atmos.
Ⅶ. Static spatial audio VS dynamic spatial audio
In traditional static spatial audio, the content being played keeps its position relative to the listener's head when the head turns: audio on the right stays on the right, and audio on the left stays on the left. This is because the spatialization is achieved without head tracking, so the spatial audio is locked to a fixed position. Listening to this type of spatial audio still gives the feeling that sounds are nearer or farther away.
With head tracking, dynamic spatial audio can give users a more immersive experience. For example, if you turn your head to the right, the entire sound field rotates to the left by an equal amount relative to your head, so sounds stay anchored in the space around you, and the signal sent to each ear depends on the position of the head. As you move, the parts of the soundstage around you are updated quickly, so you stay immersed in a full 360-degree soundstage at all times. To benefit from spatial audio, songs, games, movies and other media and programming must still support 5.1, 7.1 or Dolby Atmos formats. Only then can users experience static or dynamic spatial audio content.
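The counter-rotation used by head tracking can be sketched on a first-order Ambisonics (B-format) frame. The conventions are assumptions for the example: positive X is front, positive Y is left, and a positive `head_yaw_deg` means the head has turned to the left. Actual products may track pitch and roll as well and work on other signal representations.

```python
import math


def rotate_for_head_yaw(w, x, y, z, head_yaw_deg):
    """Rotate the sound field opposite to the head so sources stay fixed in the room."""
    yaw = math.radians(head_yaw_deg)
    # A source at azimuth a is heard at azimuth (a - yaw) after the head turns.
    x_rot = x * math.cos(yaw) + y * math.sin(yaw)
    y_rot = y * math.cos(yaw) - x * math.sin(yaw)
    return w, x_rot, y_rot, z  # W and Z do not depend on horizontal rotation


# A source straight ahead; the listener turns the head 90 degrees to the right.
w, x, y, z = rotate_for_head_yaw(1.0, 1.0, 0.0, 0.0, -90.0)
# The source now lies on the positive Y axis, that is, to the listener's left.
```

Static spatial audio corresponds to skipping this step, so the sound field turns with the head.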
Ⅷ. Spatial audio processing method
From the perspective of production technology, spatial audio processing can be divided into three approaches: object-based, scene-based and channel-based. The three are briefly introduced below.
1. Object-based
This approach can overcome the obstacles of the channel-based approach described in item 3 below. The direction and loudness of each sound object are encoded independently, and the playback system tries to position the audio where it needs to be. This gives the flexibility to adapt to factors such as the user's environment and platform. The format can reproduce audio on anything from mono to a full 360-degree sphere.
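The object-based idea can be sketched as a list of objects carrying their own direction and gain, rendered differently depending on the playback layout. Everything here is illustrative: the `AudioObject` fields, the `render` function and the constant-power stereo pan are assumptions for the example, not the data model of any specific format.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    name: str
    azimuth_deg: float  # assumed convention: 0 = front, +90 = hard left, -90 = hard right
    gain: float
    sample: float       # one sample of the object's mono signal


def render(objects, layout):
    """Render the same object list for a mono or a stereo playback system."""
    if layout == "mono":
        return (sum(o.gain * o.sample for o in objects),)
    if layout == "stereo":
        left = right = 0.0
        for o in objects:
            az = max(-90.0, min(90.0, o.azimuth_deg))
            pan = (90.0 - az) / 180.0  # 0 = hard left, 1 = hard right
            left += o.gain * o.sample * math.cos(pan * math.pi / 2)
            right += o.gain * o.sample * math.sin(pan * math.pi / 2)
        return (left, right)
    raise ValueError("unsupported layout: " + layout)


scene = [
    AudioObject("vocal", 0.0, 1.0, 1.0),
    AudioObject("guitar", -90.0, 0.5, 1.0),
]
mono_out = render(scene, "mono")
stereo_out = render(scene, "stereo")
```

The scene description stays the same for both calls; only the renderer changes, which is what allows the format to adapt to the listener's equipment.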
2. Scene-based
This approach captures complete scene information from the very center of the scene, most commonly with Ambisonics. The result is a full 360-degree sphere that can be captured from a single point with an Ambisonics microphone or created artificially in post-production. Ambisonics comes in two flavors: FOA (first-order Ambisonics) and HOA (higher-order Ambisonics). FOA contains four channels: omnidirectional, left-right, front-back and up-down. HOA uses more channels. More channels mean higher spatial resolution, and higher resolution means better localization.
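Creating a scene artificially in post-production amounts to encoding each mono source into the four FOA channels according to its direction. The sketch below assumes azimuth measured counter-clockwise from the front, elevation measured upward from the horizontal plane, and a W channel that carries the unattenuated signal; other channel scalings and orderings exist, and the function name is hypothetical.

```python
import math


def encode_foa(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order Ambisonics channels (W, X, Y, Z)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                               # omnidirectional
    x = sample * math.cos(az) * math.cos(el)  # front-back
    y = sample * math.sin(az) * math.cos(el)  # left-right
    z = sample * math.sin(el)                 # up-down
    return w, x, y, z


front = encode_foa(1.0, 0.0)         # energy in W and X
left = encode_foa(1.0, 90.0)         # energy in W and Y
above = encode_foa(1.0, 0.0, 90.0)   # energy in W and Z
```

Several sources are combined by adding their encoded channels together, so the channel count stays at four no matter how many sources the scene contains.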
3. Channel-based
This is the most traditional and most mature way of positioning sound, and its production framework is tied to the reproduction format. The various sound sources are mixed in a digital audio workstation (DAW) to create the final channel-based mix. The mix usually targets a specific loudspeaker layout: each channel in the final product must be reproduced by a loudspeaker at a well-defined position, and the result is delivered to the end user as a fixed audio mix. The familiar mono, stereo, 5.1 and 7.1 formats are all of this type.
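Because a channel-based mix is tied to its target layout, playing it on a different layout requires a fixed downmix. The sketch below folds one frame of 5.1 into stereo using coefficients of about 0.707 (-3 dB) for the center and surround channels, values commonly used for this purpose (for example in ITU-R BS.775). Dropping the LFE channel is an assumption for the example, and the function name is hypothetical.

```python
import math


def downmix_51_to_stereo(l, r, c, lfe, ls, rs, k=math.sqrt(0.5)):
    """Fold one 5.1 sample frame into stereo; the LFE channel is discarded here."""
    left = l + k * c + k * ls    # center and left surround feed the left output
    right = r + k * c + k * rs   # center and right surround feed the right output
    return left, right


# Dialogue present only in the center channel appears equally in both outputs.
center_only = downmix_51_to_stereo(0.0, 0.0, 1.0, 0.0, 0.0, 0.0)
# A sound present only in the left front channel stays in the left output.
left_only = downmix_51_to_stereo(1.0, 0.0, 0.0, 0.0, 0.0, 0.0)
```

The positions baked into the six channels cannot be recovered or changed at this stage, which is the limitation the object-based and scene-based approaches try to avoid.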