Development of a wideband scalable speech codec and contribution towards ITU-T Standard

Along with the transition from the conventional public switched telephone network (PSTN) to broadband networks using xDSL and optical fibres, there is a growing need to provide high-quality speech communication services on the Next Generation Network (NGN).

The research began by asking how speech coding, where the traditional emphasis has been solely on compression efficiency, should be designed for communication over broadband IP networks. The design goals were (a) high-quality speech for next-generation communication services, (b) high interoperability with the widely used G.711, and (c) low algorithmic delay and low processing complexity. Based on these goals, the outcome was a scalable wideband speech coding scheme that embeds G.711 as its core bitstream. Moreover, with wide deployment in the international market in mind, the codec became the basis for an international technical specification, International Telecommunication Union Telecommunication Standardization Sector (ITU-T) Recommendation G.711.1. This codec offers the best wideband speech quality ever standardized at ITU-T and has helped launch high-quality speech telecommunication over IP networks such as NGN.

On top of the most widely deployed G.711 bitstream, the original algorithm developed in this research adds two extra bitstream layers: one enhances the lower band (the narrowband part, 50-4000 Hz) and the other extends the bandwidth up to 7000 Hz. This enables wideband (50-7000 Hz) speech communication while maintaining high interoperability with conventional G.711. This bitstream structure is also advantageous for audio mixing applications, such as teleconferencing that connects multiple remote locations, because it allows partial mixing. With a conventional wideband codec, mixing audio streams is usually computationally expensive, but when G.711.1 is used in conjunction with partial mixing, the complexity of wideband audio mixing is kept as low as that of ordinary narrowband G.711 mixing.
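The layered structure described above can be illustrated with a minimal Python sketch. It assumes nominal layer bitrates of 64, 16, and 16 kbit/s for the core, the lower-band enhancement, and the higher-band layer, a 5 ms frame, and layers stored back to back with all three present. None of these values are stated in this text, and the names `LAYER_KBPS`, `layer_bytes`, `split_frame`, and `to_g711` are hypothetical. The point is only that a narrowband G.711 stream can be obtained by truncating each frame rather than by transcoding.

```python
# Sketch only: bitrates, frame length, and layer order are assumptions,
# not taken from this text. Layers are treated as opaque byte strings.
FRAME_MS = 5
LAYER_KBPS = {"core": 64, "lower_band_enh": 16, "higher_band": 16}


def layer_bytes(kbps: int, frame_ms: int = FRAME_MS) -> int:
    """Bytes per frame for one layer (kbit/s times ms gives bits)."""
    return kbps * frame_ms // 8


def split_frame(frame: bytes) -> dict:
    """Split one frame into its layers, in the assumed order."""
    sizes = [layer_bytes(kbps) for kbps in LAYER_KBPS.values()]
    if len(frame) != sum(sizes):
        raise ValueError("unexpected frame length")
    layers, pos = {}, 0
    for name, size in zip(LAYER_KBPS, sizes):
        layers[name] = frame[pos:pos + size]
        pos += size
    return layers


def to_g711(frame: bytes) -> bytes:
    """Keep only the embedded core layer; no re-encoding is needed."""
    return frame[:layer_bytes(LAYER_KBPS["core"])]
```

Under these assumptions a full frame is 60 bytes, of which the first 40 form the G.711-compatible core.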

The research resulted in an international speech coding standard with a high-quality, low-complexity profile. The processing frame length was limited to 5 ms, which keeps the algorithmic delay very low, and the reduced complexity means that the codec can be implemented on low-end, inexpensive processors. The standard also includes a packet-loss concealment algorithm suited to transport over IP networks and is accompanied by a low-complexity partial mixing algorithm, adding a new dimension to speech codec capabilities beyond compression.
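One plausible reading of partial mixing is sketched below in Python. It is an illustration under stated assumptions, not the algorithm of the Recommendation: only the decoded core (narrowband) signals are summed, while the higher-band layer of a single participant is forwarded untouched. Selecting that participant by core-signal energy is an assumption, G.711 decoding and re-encoding are left out, and the names `energy`, `clip16`, and `partial_mix` are hypothetical.

```python
# Sketch only: the selection rule (loudest core signal) is an assumption.
# Core signals are already-decoded 16-bit PCM; higher-band layers are
# opaque bytes that are never decoded here.

def energy(pcm):
    """Sum of squared samples for one frame."""
    return sum(s * s for s in pcm)


def clip16(x):
    """Saturate a sample to the signed 16-bit range."""
    return max(-32768, min(32767, x))


def partial_mix(participants):
    """Mix one frame from several participants.

    participants: list of (core_pcm, higher_band_layer) tuples, where
    core_pcm is a list of ints and higher_band_layer is bytes.
    Returns (mixed_core_pcm, forwarded_higher_band_layer).
    """
    if not participants:
        raise ValueError("no participants")
    n = len(participants[0][0])
    # Only the narrowband core is mixed sample by sample.
    mixed = [clip16(sum(p[0][i] for p in participants)) for i in range(n)]
    # The higher-band bits of the selected participant are passed through.
    loudest = max(participants, key=lambda p: energy(p[0]))
    return mixed, loudest[1]
```

Because the wideband layer is selected rather than decoded and summed, the per-frame arithmetic in this sketch is the same as for a narrowband mixer plus one energy comparison per participant.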

For this contribution, the IEICE awarded Yusuke Hiwasaki, Shigeaki Sasaki, and Hitoshi Ohmuro the Achievement Award in 2009.


[1] Recommendation ITU-T G.711.1: "Wideband Embedded Extension for G.711 Pulse Code Modulation," Mar. 2008.
[2] Y. Hiwasaki and H. Ohmuro: "ITU-T G.711.1: Extending G.711 to Higher-Quality Wideband Speech," IEEE Communications Magazine, Vol.47, No.10, pp.110-116, 2009.
