Do Lossy and Lossless Sound Different?

- Mar 03, 2021-

We are talking about "lossy", but where does the loss come from? Before the advent of MP3, the uncompressed music files we usually came into contact with were files such as WAV, produced by ripping CDs. A CD, or a WAV file, is a digital recording: the familiar stream of 0s and 1s. What does it record? Sound, which is an analog signal. Obviously, these two are not the same thing.

How do you record it? It's very simple, through "sampling".

Simply put, sampling means picking a large number of evenly spaced points in a period of time, recording the value of the sound waveform at each of those points, and then reconstructing the analog signal on playback by connecting the recorded points in sequence. Sampling is good, but how many samples are appropriate? After all, for the same waveform, there must be a difference between collecting 3 samples per second and 80 samples per second.

Fortunately, this question had already been studied. According to the Nyquist-Shannon theorem, if the signal bandwidth is less than half of the sampling frequency, the discrete sample points are guaranteed to completely represent the original signal. In other words, to record an analog signal, the sampling frequency must be greater than 2 times the highest frequency in the signal's spectrum. The frequency range that the human ear can hear is about 20 Hz to 20000 Hz. So, to completely record the sound heard by the human ear, the sampling rate must be greater than 40000 Hz, that is, more than 40000 data points recorded per second.

After a dispute between the inventors of the CD, Sony and Philips, the sampling rate of the CD was set at 44100 Hz. (Why it is exactly 44100 has already been answered elsewhere on Zhihu, so I won't explain it in detail here.) At the same time, the 16-bit sample depth of CD recording gives it a signal-to-noise ratio (dynamic range) of about 96 dB, which is sufficient for most music recording and playback. The problem, however, is that a WAV file of the CD content is too large.
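The two figures above (the "more than 40000 Hz" bound and the "about 96 dB" for 16 bits) can be checked with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and the 30 kHz test tone is an assumed example, not something from the CD specification. It also shows what goes wrong when a tone above half the sampling rate is sampled anyway: its samples become indistinguishable (up to sign) from those of a lower tone.

```python
import math

def min_sample_rate(max_freq_hz):
    """Nyquist bound: the sampling rate must exceed this value."""
    return 2 * max_freq_hz

def dynamic_range_db(bit_depth):
    """Ratio of full scale to one quantization step, in decibels."""
    return 20 * math.log10(2 ** bit_depth)

def sample_tone(freq_hz, sample_rate_hz, count):
    """Sample a sine tone of the given frequency at regular intervals."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]

CD_RATE = 44100

# A 30 kHz tone is above CD_RATE / 2, so it cannot be represented.
# Sampled at 44.1 kHz, it yields exactly the samples of a
# 44100 - 30000 = 14100 Hz tone with the sign flipped (aliasing).
too_high = sample_tone(30000, CD_RATE, 64)
alias = sample_tone(14100, CD_RATE, 64)
is_aliased = all(abs(a + b) < 1e-9 for a, b in zip(too_high, alias))

print(min_sample_rate(20000))           # lower bound for 20 kHz hearing
print(round(dynamic_range_db(16), 2))   # dynamic range of 16-bit samples
print(is_aliased)
```

The 44100 Hz rate of the CD sits comfortably above the 40000 Hz bound, which leaves some room for the filter that removes content above the audible range before sampling.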

How big is it? We can make a calculation. Generally, CDs record 2 channels at 16-bit precision and a sampling rate of 44100 Hz, so the bit rate of a CD or WAV file is 2×16×44100 = 1411200 bps = 1411.2 kbps. In other words, a WAV file carries about 1411 kilobits of data per second, and a 3-minute WAV file takes about 3×60×1411.2/8 = 31752 kB, which is roughly 30 MB. Too big, too troublesome, too unnecessary.

At this point, "compression" became a top priority. There is, of course, lossless compression, which can perfectly restore the WAV content from the compressed file. But in order to guarantee that "lossless" result, the compressed file cannot be made very small, which limits its usefulness. After all, between 20 MB for a song and 30 MB for a song, the difference does not seem very big. So, for the sake of much smaller files, lossy compression came into being. Its best-known representative is MP3, whose full name is MPEG-1 Audio Layer III.

"Lossy", as the name implies, means that this compression format loses some of the information in the original file. The question is, how do you decide what to lose? This is where MP3 is smart. To explain this, we first have to talk about some psychoacoustics. Our perception of sound comes largely from our ears: sound travels through the ear canal to the eardrum and makes it vibrate, the ossicles conduct that vibration to the cochlea of the inner ear, and the hair cells inside sense the vibration and convert it into electrical signals, which are then processed by the brain to complete our perception of sound. That all sounds fine. The problem is that our ears do not pass on every auditory signal. In other words, our ears and our brains simply do not care about some of the information in a sound.

One very important psychoacoustic phenomenon here is called the "masking effect". To give a simple example: at Christmas, you are at home talking on the phone with your girlfriend, and loud celebratory cheering breaks out outside. Question: can you still hear what your girlfriend is saying? Most likely you can't. (If you can, that involves another important psychoacoustic phenomenon, the "cocktail party effect".) When sounds occur at the same time and a louder sound covers a quieter one, that is the so-called "masking effect".

(Of course, masking includes more than this one case. There is also time-domain masking between sounds that do not occur at the same time, and the masking behavior of tonal sounds and of noise has to be distinguished as well. I won't go into it here; it is enough to know that these exist.) The masking phenomenon is directly related to this "don't care" attitude: the hair cells in the human ear respond only to the strongest sound stimulus in a given region, and do not care about the rest at all.

Our "lossy compression" is built on this basis. To put it simply, a very important step in MP3 compression is to build a psychoacoustic model, use it to find the parts of the original sound file that the human ear and brain do not care about, and then delete or rewrite them. (This process involves many aspects, such as the signal-to-mask ratio (SMR), sub-band signals, filter selection, and so on. It is quite complicated.) In this way, MP3 can reduce the audible impact of compression as much as possible while still making the file small enough. After this step is completed, MP3 applies a lossless compression algorithm to further compress the audio data produced by the first stage. The result is a file far smaller than ordinary lossless compression can achieve, reaching a compression ratio of 1:10 or even higher.
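To make the idea of "finding the parts the ear does not care about" more concrete, here is a deliberately simplified toy in Python. It is not the MP3 psychoacoustic model: a real encoder works on sub-band signals with frequency-dependent masking thresholds. The function name `audible_components` and the parameters `bandwidth_hz` and `margin_db` are hypothetical, chosen only to illustrate simultaneous masking: a component is dropped when a much louder component sits close to it in frequency.

```python
def audible_components(components, bandwidth_hz=200.0, margin_db=20.0):
    """Toy simultaneous-masking filter (illustrative assumption only).

    components: list of (frequency_hz, level_db) pairs.
    A component counts as masked when another component lies within
    bandwidth_hz of it and is at least margin_db louder.
    """
    kept = []
    for freq, level in components:
        masked = any(
            abs(freq - other_freq) <= bandwidth_hz
            and other_level - level >= margin_db
            for other_freq, other_level in components
        )
        if not masked:
            kept.append((freq, level))
    return kept

# A loud 1000 Hz tone, a quiet tone right next to it, and an equally
# quiet tone far away in frequency.
tones = [(1000.0, 80.0), (1100.0, 40.0), (5000.0, 40.0)]
kept = audible_components(tones)
print(kept)
```

In this toy, the quiet tone at 1100 Hz is discarded because the loud tone at 1000 Hz covers it, while the equally quiet tone at 5000 Hz survives because nothing loud is nearby. The information that gets thrown away is exactly the part a listener would be least likely to notice.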

In practice, the original 1411 kbps WAV file can be brought down by MP3 compression to 192 kbps, 128 kbps, or even lower bit rates. A music file that originally needed about 30 MB now needs only about 2.8 MB (at 128 kbps), which is easy to store. On top of this, the MP3 developers also used methods such as a further improved psychoacoustic model to create the AAC format, which sounds better and can achieve a smaller size at the same perceived quality. Unfortunately, for various reasons, AAC has never been as popular as MP3. Many people have already covered this point in popular-science write-ups, so I won't elaborate on it here.

Back to the beginning. The reason I said the question asker is "right not to hear a difference" has now basically been explained: from the very beginning, MP3 was designed with the goal of making the difference impossible for you to hear.
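The sizes quoted in this article follow directly from the bit rates, and the arithmetic can be reproduced with a short Python sketch. The helper names below are made up for illustration, and the sizes are raw audio data only: real files add a small amount of header and metadata overhead.

```python
def pcm_bit_rate(channels, bit_depth, sample_rate_hz):
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return channels * bit_depth * sample_rate_hz

def size_bytes(bit_rate_bps, seconds):
    """Size of the audio data in bytes (8 bits per byte)."""
    return bit_rate_bps * seconds / 8

CD_BPS = pcm_bit_rate(channels=2, bit_depth=16, sample_rate_hz=44100)
SONG_SECONDS = 3 * 60  # the 3-minute song used in the text

wav_bytes = size_bytes(CD_BPS, SONG_SECONDS)
print(f"WAV: {CD_BPS / 1000} kbps, {wav_bytes / 1e6:.2f} MB")

for mp3_kbps in (192, 128):
    mp3_bps = mp3_kbps * 1000
    mp3_bytes = size_bytes(mp3_bps, SONG_SECONDS)
    ratio = CD_BPS / mp3_bps
    print(f"MP3 {mp3_kbps} kbps: {mp3_bytes / 1e6:.2f} MB, "
          f"about {ratio:.1f} times smaller than WAV")
```

At 128 kbps the file is about 11 times smaller than the WAV, which matches the "1:10 or even higher" ratio, while 192 kbps gives a ratio of a little over 7.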

But the explanation is not finished yet. As mentioned earlier, the MP3 format lowers the bit rate by deleting the parts of the original file that the human ear and brain do not care about. The more is deleted, the lower the bit rate and the smaller the resulting MP3 file. But deleting has a cost: as the bit rate gets lower and lower, our brains begin to notice, and we start to hear a difference between the MP3 and the original file.

This is the question the asker is probably most concerned about: how far does an MP3 have to be compressed before it can be told apart from the CD in a direct comparison?