Ⅰ. What is audio?
Ⅱ. The form of audio
Ⅲ. Audio frequency range
Ⅳ. Three elements of audio: pitch, timbre, loudness
Ⅴ. Audio compression technology
Ⅵ. Classification of audio signals
Ⅶ. Audio related parameters
Ⅷ. What are the audio formats?
Ⅰ. What is audio?
Audio refers to sound in the frequency range humans can hear, roughly 20Hz-20kHz. As a technical term, audio also describes devices that operate in this range and what they do. Every sound a person can hear, including noise, counts as audio. Once sound is recorded, whether speech, singing, or musical instruments, it can be processed with digital audio software or burned to a CD; the sound itself does not change, because a CD already stores audio data. In short, audio is simply sound stored on a computer. With a computer and a suitable audio card (sound card), we can record any sound and store its acoustic characteristics as files on the hard disk.
Ⅱ. The form of audio
At present, audio in multimedia computers mainly has three forms: CD audio, MIDI audio and waveform audio.
1. CD-Audio
CD-Audio is digital audio stored on music CD discs. It can be read and collected into a multimedia computer system through a CD-ROM drive, and stored and processed in the corresponding form of waveform audio.
2. MIDI audio
MIDI audio represents music symbolically and saves it in MIDI files; a music synthesizer then generates the corresponding sound waveforms to play it back.
3. Waveform Audio
Waveform audio is any sound captured from an external source and brought into a multimedia computer through digitization. Speech is waveform sound produced by the human voice, and it carries inherent linguistic and phonetic content. A multimedia computer can analyze speech, extract its relevant features, distinguish and recognize different utterances, and synthesize speech waveforms from text.
Ⅲ. Audio frequency range
1. Low frequency range (20-150Hz)
This band carries the low-frequency components of audio and gives the listener a sense of power and impact.
2. Intermediate frequency band (150-500Hz)
This band conveys the expressiveness of individual percussion instruments in music; it is the part of the low range that carries strength.
3. Medium to high frequency range (500-5000Hz)
This band mainly determines the clarity of a singer's voice or of speech, and the expressiveness of string instruments.
Ⅳ. Three elements of audio: pitch, timbre, loudness
1. Pitch
Pitch is the human ear's subjective perception of how high or low a sound is. It depends mainly on the fundamental frequency of the sound wave: the higher the frequency, the higher the pitch, and vice versa. Frequency is measured in hertz (Hz).
2. Timbre
Timbre is determined by the harmonic spectrum and the envelope of the sound waveform. The clearly audible sound produced by the waveform's fundamental frequency is called the fundamental, and the sounds produced by the smaller vibrations of each harmonic are called overtones. A tone of a single frequency is a pure tone; a tone containing harmonics is a complex tone. Every musical tone has a fundamental frequency plus overtones of various loudness levels, and it is these that let us tell apart sounds of the same pitch and loudness. The proportion of each harmonic in the waveform, and how quickly each decays over time, determine the characteristic timbre of a sound source. The envelope is the line connecting the peaks of successive cycles, and its steepness affects the transient behavior of the sound's intensity. The goal of High Fidelity (Hi-Fi) audio is to transmit, restore, and reconstruct all the features of the original sound field as accurately as possible, so that listeners genuinely perceive three-dimensional effects such as source localization, spatial surround, and depth of layering.
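As a rough illustration, a complex tone is just a fundamental plus harmonics at chosen relative amplitudes. The harmonic mix below is invented for the sketch, not measured from any real instrument:

```python
import math

def complex_tone(t, f0, partials):
    """Sample a complex tone at time t: fundamental f0 plus harmonics.

    partials: list of (harmonic_number, relative_amplitude) pairs.
    """
    return sum(a * math.sin(2 * math.pi * n * f0 * t) for n, a in partials)

# A made-up timbre: strong fundamental, weaker 2nd and 3rd harmonics.
partials = [(1, 1.0), (2, 0.5), (3, 0.25)]

sample_rate = 8000
samples = [complex_tone(i / sample_rate, 440.0, partials)
           for i in range(sample_rate)]  # one second of audio
```

Changing the amplitude mix (and how each partial decays over time) changes the perceived timbre without changing the pitch.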
3. Loudness
Loudness, also called volume, is the perceived magnitude of a sound and corresponds to sound intensity. Sound intensity is an objective physical quantity, while loudness is a subjective psychological one. Loudness depends not only on intensity but also on frequency: sounds of different frequencies at the same intensity can have different loudness levels. In general, as the sound intensity level increases, the loudness level increases as well.
Loudness is measured against pure sine tones. To quantify the loudness of a pure tone, we compare it with a 1000Hz pure tone at some sound intensity level: when the two are judged equally loud, the sound intensity level of the 1000Hz tone is defined as the loudness level of the tone at the other frequency. The unit of loudness level is the phon. For example, for a 1000Hz pure tone to reach a loudness level of 40 phons, its sound pressure level must, according to the equal-loudness curves, reach 40dB SPL.
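The dB SPL figures above are logarithmic values referenced to a sound pressure of 20 µPa; a minimal sketch of the standard formula:

```python
import math

P_REF = 20e-6  # reference sound pressure in pascals (20 micropascals)

def spl_db(pressure_pa):
    """Sound pressure level in dB SPL: 20 * log10(p / p_ref)."""
    return 20 * math.log10(pressure_pa / P_REF)

print(spl_db(20e-6))  # the reference pressure itself: 0 dB SPL
print(spl_db(0.02))   # 1000x the reference pressure: about 60 dB SPL
```

Each tenfold increase in pressure adds 20 dB, which is why the scale compresses the ear's enormous dynamic range into manageable numbers.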
Ⅴ. Audio compression technology
Audio compression falls into two broad approaches: lossless coding, such as Huffman coding, and elimination of redundant data, which is lossy.
1. Huffman lossless coding
After inaudible signal components have been removed, the remaining data is compressed with entropy codes such as Huffman codes. This stage itself is lossless: decoding reproduces the coded data exactly.
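Huffman coding assigns shorter bit patterns to more frequent symbols and is fully reversible; a compact sketch using Python's heapq:

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a Huffman code table {symbol: bit string} for `data`."""
    freq = Counter(data)
    # Heap entries: (frequency, tiebreaker, partial code table).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # Prefix the two subtrees' codes with 0 and 1 and merge them.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

codes = huffman_codes("aaaabbc")                # 'a' is most frequent
encoded = "".join(codes[s] for s in "aaaabbc")  # 10 bits vs. 14 at 2 bits/symbol
```

The frequent symbol gets a one-bit code, so the encoded string is shorter than fixed-width coding, yet every symbol is recoverable exactly.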
2. Eliminate redundant data
This approach removes redundant information from the captured audio. The deleted signal components cannot be recovered, so it is called lossy compression. Redundant information includes audio outside the range of human hearing as well as masked audio signals. Masking is divided into temporal masking and frequency-domain masking.
(1) Temporal masking effect
Besides masking between simultaneous sounds, masking also occurs between sounds that are adjacent in time; this is called temporal masking. It is further divided into leading masking (pre-masking) and lagging masking (post-masking). The main cause is that the human brain needs a certain amount of time to process information. Leading masking is very short, only about 5-20ms, while lagging masking can last 50-200ms.
(2) Frequency domain masking effect
The range of human hearing is 20-20000Hz, but that does not mean every sound in this range can be heard: audibility also depends on the sound's level. Hearing has a threshold; only sounds above it can be heard, and sounds below it cannot. This threshold varies with frequency. A second effect occurs when two sounds of similar frequency happen at the same time, one loud and one quiet: the quiet one is masked by the loud one and cannot be heard.
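The frequency-dependent threshold can be approximated in closed form; the formula below is Terhardt's widely cited fit for the threshold in quiet, used in perceptual-coding literature (an empirical approximation, not an exact physiological law):

```python
import math

def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute hearing threshold (dB SPL)."""
    k = f_hz / 1000.0  # frequency in kHz
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)

# The ear is most sensitive around 3-4 kHz; thresholds rise steeply
# toward both ends of the 20 Hz - 20 kHz range.
for f in (100, 1000, 3500, 15000):
    print(f, round(threshold_in_quiet_db(f), 1))
```

A lossy coder can discard any component that falls below this curve, since no listener would have heard it anyway.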
Ⅵ. Classification of audio signals
Audio signals can be divided into two categories: speech signals and non-speech signals.
1. Speech is the material carrier of language and the vehicle of human social communication. It is rich in linguistic content and is a form of information exchange unique to humans.
2. Non-speech signals mainly include music and the other sounds found in nature. They carry no complex semantic or grammatical information, have a lower information content, and are easier to identify.
Ⅶ. Audio related parameters
1. Audio
Sound waves with frequencies between 20Hz and 20kHz that the human ear can hear.
2. Interleaved mode
Interleaved mode stores digital audio frame by frame: the left-channel sample and right-channel sample of frame 1 are recorded first, then those of frame 2, and so on, with the data laid out in consecutive frames.
3. Non-interleaved mode
The left-channel samples of all frames in a period are recorded first, followed by all the right-channel samples.
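The two layouts can be sketched in a few lines; the sample values here are arbitrary placeholders:

```python
def interleave(left, right):
    """Interleaved mode: L0, R0, L1, R1, ... (one sample pair per frame)."""
    out = []
    for l, r in zip(left, right):
        out.extend((l, r))
    return out

def deinterleave(data):
    """Split interleaved data into per-channel buffers (non-interleaved layout)."""
    return data[0::2], data[1::2]

left, right = [10, 11, 12], [20, 21, 22]  # placeholder sample values
assert interleave(left, right) == [10, 20, 11, 21, 12, 22]
assert deinterleave([10, 20, 11, 21, 12, 22]) == (left, right)
```

Interleaved storage keeps each frame's channels together, which suits streaming playback; non-interleaved storage keeps each channel contiguous, which suits per-channel processing.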
4. Cycle
A period is the number of frames handled in one processing pass; audio devices access data, and audio data is stored, in units of periods.
5. Number of sampling bits
This parameter measures how finely amplitude variations are resolved and can be thought of as the sound card's resolution. Each sample records the waveform's amplitude at an instant, so sampling precision depends on the number of bits: the larger the value, the higher the resolution and the more faithfully the sound is reproduced.
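The effect of sample depth can be sketched by quantizing a value to a signed integer of a given bit width (a toy quantizer for illustration, not any particular sound card's behavior):

```python
def quantize(x, bits):
    """Toy quantizer: map x in [-1.0, 1.0] to a signed integer of `bits` width."""
    levels = 2 ** (bits - 1)              # e.g. 32768 half-levels for 16-bit
    q = round(x * (levels - 1))
    return max(-levels, min(levels - 1, q))

# More bits -> finer amplitude steps -> smaller quantization error.
x = 0.3333
err8 = abs(x - quantize(x, 8) / (2 ** 7 - 1))
err16 = abs(x - quantize(x, 16) / (2 ** 15 - 1))
assert err16 < err8
```

Each extra bit halves the step size, so 16-bit audio represents amplitude far more accurately than 8-bit.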
6. Sampling frequency
This is the number of samples taken from the sound per second. The higher the sampling frequency, the better the sound quality and the more faithful the reproduction, but the more resources it consumes. Because the resolving power of human hearing is limited, frequencies above a certain point cannot be distinguished, so arbitrarily high sampling rates bring no audible benefit.
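The limit on useful sampling rates follows from the Nyquist criterion: a rate of fs can only represent frequencies below fs/2, and anything above that folds back (aliases) onto a lower frequency. A small sketch:

```python
import math

def sample(freq_hz, rate_hz, n=8):
    """First n samples of a sine at freq_hz, taken rate_hz times per second."""
    return [round(math.sin(2 * math.pi * freq_hz * i / rate_hz), 6)
            for i in range(n)]

rate = 8000
# Only frequencies below rate/2 (the Nyquist frequency) are representable.
# A 7 kHz tone sampled at 8 kHz yields the same samples as a -1 kHz tone:
assert sample(7000, rate) == sample(-1000, rate)
```

This is why CD audio's 44.1 kHz rate comfortably covers the ~20 kHz upper limit of hearing.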
7. Bit rate
This is an important measure of sound quality. It is the amount of data streamed per second, expressed in kbps. Within the same compression format, a higher bit rate generally means higher fidelity, but the relationship is not simply proportional and varies with the compression algorithm.
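For uncompressed PCM, the bit rate follows directly from the parameters above:

```python
def pcm_bitrate_kbps(sample_rate, bits_per_sample, channels):
    """Uncompressed PCM data rate in kilobits per second."""
    return sample_rate * bits_per_sample * channels / 1000

# CD audio: 44.1 kHz, 16-bit, stereo.
cd_rate = pcm_bitrate_kbps(44100, 16, 2)  # 1411.2 kbps
# A 128 kbps compressed stream is therefore roughly an 11:1 reduction.
ratio = cd_rate / 128
```

The gap between the raw PCM rate and a typical compressed rate is what the lossy coders discussed earlier have to close.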
8. Number of channels
This is the number of sound channels, commonly mono or stereo. Mono plays through a single speaker, while stereo plays through two, giving a much stronger sense of space. Other channel counts exist as well.
Ⅷ. What are the audio formats?
1. AIFF (Audio Interchange File Format)
AIFF is an audio file format developed by Apple for exchanging audio between different platforms.
2. MPC
Like OGG, MPC competes with MP3. At medium and high bit rates, MPC can deliver better sound quality than its competitors, and at medium bit rates it is in no way inferior to OGG. Its advantage shows mainly in the high frequencies, which are far more delicate than MP3's and free of OGG's metallic tinge; it is currently among the lossy codecs best suited to music listening. Being a young format, however, it, like OGG, lacks broad software and hardware support. MPC encodes efficiently, with encoding times much shorter than OGG's or LAME's.
3. OGG
OGG is a very promising codec with impressive performance at all bit rates. Besides good sound quality, OGG is completely free. Its algorithm is strong enough to achieve good quality at low bit rates; 128kbps OGG can rival MP3 at 192kbps or even higher. OGG's treble carries a slight metallic tinge, a flaw exposed when encoding solo instruments with demanding high-frequency content. OGG has the basic characteristics of a streaming format, but without media-server software to support it, OGG-based digital broadcasting cannot yet be realized. Overall support for OGG, in both hardware and software, still falls short of MP3's.
4. VQF
Another format, from Yamaha, is *.vqf. Its core idea is to achieve a higher compression ratio by reducing the data rate while maintaining sound quality. It is technically advanced, but poor promotion has kept the format from wide use. *.vqf files play in Yamaha's player, and Yamaha also provides software for converting *.wav files to *.vqf.
5. WMA (Windows Media Audio)
The WMA format comes from Microsoft. Its sound quality surpasses MP3 and far exceeds RA. Like the VQF format developed by Japan's YAMAHA, it achieves a higher compression rate than MP3 by reducing the data rate while preserving quality; WMA's compression rate can generally reach about 1:18. Another advantage is that content providers can add copy protection through DRM (Digital Rights Management) schemes such as Windows Media Rights Manager 7. This built-in copyright protection can limit the number of plays, restrict playback to particular machines, and so on.
WMA also supports audio streaming, making it well suited to online playback. As the spearhead of Microsoft's push into online music, it needs no extra player to be installed, unlike MP3: the tight binding between Windows and Windows Media Player means any system with Windows installed can play WMA directly. Windows Media Player 7.0 added the ability to rip CDs straight to WMA, and in the newly released Windows XP, WMA is the default encoding format. The WMA format also lets you adjust sound quality at recording time.
6. APE
APE is an emerging lossless audio codec that compresses files to roughly 50-70% of their original size. APE is truly lossless, and its compression ratio beats comparable lossless formats.
7. MIDI (Musical Instrument Digital Interface)
MIDI is a term we hear often. MIDI lets digital synthesizers and other devices exchange data, and the MID file format derives from it. A MID file is not recorded sound but a set of instructions describing the music, which tell the sound card how to reproduce it; a minute of music takes only about 5-10KB. Today MID files are used mostly for original instrumental compositions, amateur renditions of popular songs, game soundtracks, and electronic greeting cards. The playback quality of a *.mid file depends entirely on the grade of the sound card. The format's biggest use is in computer composition: *.mid files can be written with music software, or music played on an external sequencer can be captured through the sound card's MIDI port.
8. Real audio
Real audio is aimed mainly at listening to music on the Internet. Most users still connect with 56kbps or slower modems, so typical playback is not at the best possible quality. Some download sites let you choose the Real file best suited to your modem speed. Several Real formats now exist: RA (Real Audio), RM (Real Media, Real Audio G2), RMX (Real Audio Secured), and so on. Their common trait is that sound quality adapts to network bandwidth: while ensuring that most listeners hear smooth audio, they give listeners with more bandwidth better quality.
Recently, as network bandwidth has generally improved, Real has been introducing a CD-quality format for Internet broadcasting. If your Real Player software cannot handle it, you are prompted to download a free update. Many music sites offer demo songs in Real format; the latest version is Real Player 9.0.
9. MP3
The MP3 format originated in Germany in the late 1980s. MP3 refers to the audio part of the MPEG standard, the MPEG audio layers. By compression quality and encoding complexity, the standard defines three layers, corresponding to the three file types *.mp1, *.mp2, and *.mp3.
Note that MPEG audio compression is lossy. MPEG Layer 3 coding reaches a high compression ratio of 10:1 to 12:1 while keeping the lower frequencies essentially undistorted, but it trades away quality in the 12kHz-16kHz high-frequency band in exchange for file size. Music stored as *.mp3 is generally only about 1/10 the size of the same music as *.wav, at some cost in quality compared with CD or WAV audio. At its debut, no other format matched its combination of small size and sound quality, which gave *.mp3 ideal conditions to spread; even now its position as the mainstream audio format is hard to shake.
MP3 supports a wide range of bit rates: 64kbps or lower saves space, while the 320kbps setting delivers very high sound quality.
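The resulting file size is just bit rate times duration; the figures below use a hypothetical four-minute track:

```python
def file_size_mb(bitrate_kbps, seconds):
    """Approximate file size: bit rate (bits/s) * duration, converted to MB."""
    return bitrate_kbps * 1000 * seconds / 8 / 1e6

# A hypothetical four-minute (240 s) track:
size_128 = file_size_mb(128, 240)  # 3.84 MB
size_320 = file_size_mb(320, 240)  # 9.6 MB
```

Doubling the bit rate doubles the size, which is the trade-off between space savings and fidelity described above.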
10. WAV
WAV files with PCM encoding offer the best sound quality. On the Windows platform, every audio application supports the format, and many programs bundled with Windows can play wav files directly, so multimedia developers use wav extensively for event sound effects and background music. At the same sampling rate and sample size, PCM-encoded wav achieves the best quality, which is why it is also widely used in audio editing, non-linear editing, and similar fields.
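Writing a PCM wav file needs nothing beyond Python's standard library; this sketch generates one second of a 440 Hz tone as 16-bit mono PCM, written to an in-memory buffer rather than disk:

```python
import io
import math
import struct
import wave

# One second of a 440 Hz sine at half amplitude, as 16-bit signed samples.
rate = 8000
frames = b"".join(
    struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * 440 * i / rate)))
    for i in range(rate)
)

buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)      # mono
    w.setsampwidth(2)      # 2 bytes = 16-bit samples
    w.setframerate(rate)
    w.writeframes(frames)

wav_bytes = buf.getvalue()  # a complete RIFF/WAVE file in memory
```

Replacing `buf` with an ordinary filename writes a playable *.wav file to disk.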