Repost: a University of Miami project on MPEG Layer II encoding at 28.8 kbps

Title: Methods to Reduce the Bandwidth Requirements of MPEG-1 Layer II Audio Data for Transmission Speeds of Less Than 28.8 kbps

http://www.music.miami.edu/programs/mue/research/klampert/cover.html

Author: Kirk Lampert

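The paper only proposes some design approaches and does not arrive at reliable results, but some of its ideas for compressing MPEG Layer II audio further are worth thinking about, notably its Delta Coding (the VLBR algorithm).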

The conclusions, quoted here for reflection

“

7. Conclusions

The VLBR algorithms were tested against Progressive Networks' Real Audio encoder and VocalTec's I-Wave encoder. Within the limits of MPEG-1 Layer II the encoder managed to reach a level of performance almost equivalent to, and sometimes matching that of the I-Wave encoder while using less bandwidth. Earlier in this project the VLBR files were tested against Real Audio version 2.0. The quality of almost all of the VLBR files is equivalent to a 28.8 kbps Real Audio 2.0 file and much better than a 14.4 kbps Real Audio 2.0 file. Real Audio 3.0 provides much greater sound quality, surpassing both Real Audio 2.0 and, as was shown in this project, the VLBR encoding scheme.

This project demonstrated that, for certain music types, further bandwidth reduction is possible while retaining an acceptable quality level; meaning the quality is good enough to allow enjoyment by the listener without becoming too distracting. This is done by using the three controls developed in this project to tweak the encoding process until the resulting files sound acceptable at the desired bitrate. In most cases the preferences of the listening panel (shown in Figure 7) did not match those of the author. This indicates that a better method of testing is to have the panel listen to the samples a number of times throughout the process until the quality results are as high as possible. Figure 7 shows that the panel preferred high frequency content with more noise over the alternative of less noise with less high frequencies. Higher frequencies could have been added (at the cost of more noise) with the three controls and may have yielded higher quality ratings.

The MPEG-1 Layer II compatible tools that were created are unavailable in other encoders. These tools give the user more control over the very low bitrate encoding process, allowing the user to ensure the encoded files sound as good as possible. Without these tools it is neither possible to encode MPEG-1 files with a bandwidth of less than 32 kbps, nor is it possible to actively control how the encoder chooses to allocate bits.

The most useful of these tools was the four sets of subband bit allocation scalars. The encoder typically keeps the quantization noise at a constant level across the eight subbands. These scalars allow the noise level of a particular subband to be raised or lowered by a constant amount. Without fluctuations in the noise level among individual subbands the listener is more readily able to begin ignoring it and focusing on the music instead.

The temporal capabilities of the four sets of subband bit allocation scalars were not as useful as anticipated. Because one MPEG-1 Layer II frame is 24 ms to 28 ms long it is easy for the listener to perceive the frequency changes imposed by a subband that is essentially being turned on and off at a fixed frequency. To remedy this the "on / off" frequency must be raised or lowered so as not to be picked up by the listener. This frequency cannot be raised because a subband cannot be turned on and off more than one frame at a time. It can be lowered by leaving a subband on for a specified number of frames and off for the same number of frames. The slowness by which the subbands are switched in and out, however, prevents the ear from being tricked into not noticing the missing subbands. In some cases, such as the voice sample, the switching in and out of a subband seemed to match up nicely with the source material. The temporal timing can be set somewhat so that the "on cycle" of certain subbands coincides with syllables where that subband is preferable. Obviously this takes much patience and luck and is not an effective way to encode audio files.

The maximum bit allocation control was useful in certain cases. Unlike the four sets of subband bit allocation scalars this control can hard limit the number of bits that are allocated to a subband. This "headroom" setting may or may not be effective depending on whether the encoder is actually allocating more than the number of bits the limit is set to. The advantage to this control is that if the limit for a subband is reached then the remaining bits in the bit pool can be allocated to another subband. In this way a certain subband can be "doped" where it wouldn't be otherwise. Thus if the user knows a certain subband should receive more bits, the user has the ability to force the encoder to do so.

The least useful of all the controls was the bit pool scalar. This was unexpected at first since the MPEG encoder is supposed to use whatever bits it has as efficiently as possible. In every case where the bit pool was scaled the files contained a large number of artifacts making them unusable. Closer inspection, however, reveals why this happens. As noted, a 32 kbps MPEG-1 file only uses the lower eight subbands because there are not enough bits to efficiently encode more. The same applies for the VLBR files. If the number of bits in the bit pool is reduced to half, the encoder is not able to efficiently encode all eight subbands. This provides some validation for the other two controls which can be used to more intelligently allocate the bits. One could simply remove the upper five subbands to achieve 18 kbps. Five must be removed rather than four because the bit allocation is not even; subbands one and two typically receive the most bits. While this would work, the resulting files would lack sufficient high frequency content. The algorithms developed in this project use more than the lower three subbands to gain more high frequencies. The cost of this is audible quantization noise in the lower subbands (where bits were taken from) and in the upper subbands (where only a few bits could be added).

In summary this project created three new tools that allow standard MPEG-1 Layer II stream decoders to receive moderate quality low bandwidth files over standard 28.8 kbps modem connections. All three tools, when used in conjunction with the coolmpeg.txt output file, can also be used as a learning tool to gain a greater understanding of the MPEG-1 encoding process.

”
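
To make the three controls in the quoted conclusions easier to picture, here is a rough Python sketch. It is not the paper's code. The greedy allocator, the `demand` scores (a stand-in for what the psychoacoustic model would supply), the function names, and the 48 kHz sample rate in the usage note are all my own assumptions. The only facts taken from the format itself are that a Layer II frame holds 1152 samples per channel and that the bit pool is shared among the subbands.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer II frame length, samples per channel


def bits_per_frame(bitrate_bps, sample_rate_hz):
    """Average size of the per-frame bit pool (headers and side info ignored)."""
    return bitrate_bps * SAMPLES_PER_FRAME / sample_rate_hz


def allocate(demand, pool, scalars=None, max_bits=None, pool_scalar=1.0):
    """Toy greedy allocator (hypothetical, NOT the real Layer II procedure).

    demand      -- per-subband "need" scores, an assumed stand-in for the
                   psychoacoustic model output
    pool        -- bits available for this frame
    scalars     -- per-subband multipliers on demand (subband scalars)
    max_bits    -- per-subband hard caps (maximum bit allocation control)
    pool_scalar -- multiplier applied to the pool (bit pool scalar)
    """
    n = len(demand)
    scalars = scalars or [1.0] * n
    max_bits = max_bits or [pool] * n
    remaining = int(pool * pool_scalar)
    need = [d * s for d, s in zip(demand, scalars)]
    alloc = [0] * n

    while remaining > 0:
        # Only subbands that still want bits and sit below their cap compete.
        candidates = [i for i in range(n)
                      if need[i] > 0 and alloc[i] < max_bits[i]]
        if not candidates:
            break
        best = max(candidates, key=lambda i: need[i])
        alloc[best] += 1
        need[best] -= 1.0  # assumption: each bit lowers the need by one unit
        remaining -= 1
    return alloc


if __name__ == "__main__":
    demand = [6, 5, 3, 2]  # made-up scores for four subbands
    print("baseline   ", allocate(demand, 10))
    print("scaled     ", allocate(demand, 10, scalars=[1, 1, 1, 2]))
    print("capped     ", allocate(demand, 10, max_bits=[3, 10, 10, 10]))
    print("half pool  ", allocate(demand, 10, pool_scalar=0.5))
    print("32 kbps @ 48 kHz:", bits_per_frame(32000, 48000), "bits/frame")
    print("18 kbps @ 48 kHz:", bits_per_frame(18000, 48000), "bits/frame")
```

In this toy model, capping the first subband pushes its leftover bits into the higher subbands, which is the "doping" effect the author describes. Halving the pool simply starves every subband, which is at least consistent with the bit pool scalar being the least useful control.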