The CODEC Translator: Can You Hear Me Now?

POLQA is the new ITU-T standard for speech quality measurement, and it embraces wideband, or high-definition, telephony.

“Can you hear me now?”

We’ve all heard the refrain. How often have you been on a mobile phone and not been able to hear the other party? How often have you experienced drop-outs on a VoIP call and missed that vital clue, that important piece of information the caller mentioned which would have allowed you to understand their needs? Maybe you lost the business as a result. Good, clear speech quality means productivity, both in business and in personal life. Everyone is critically busy these days, and if you have to ask folks to repeat themselves, you waste time, frustrate meaningful conversation and miss information.

The existing telephony network uses an analog bandwidth of 300-3400 Hz, digitized at a sampling rate of 8 kHz. 8 bits of vertical resolution multiplied by 8,000 samples per second gives the traditional 64 kbps bandwidth required for a voice channel. Compression by codecs such as G.729 and iLBC for VoIP, iSAC specifically for Skype, and GSM-FR and EVRC for wireless allows traditional narrowband telephony to be transmitted at data rates as low as 4 kbps.
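
The arithmetic behind that 64 kbps figure can be sketched in a few lines of Python. This is only an illustration: the function name and constants are made up for this post, and the 4 kbps value is simply the low-end figure quoted above, not the rate of any one particular codec.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample


# Narrowband telephony: 8,000 samples per second, 8 bits per sample.
NARROWBAND_BPS = pcm_bit_rate(8_000, 8)

# Low-end compressed rate quoted in the text (assumption: used here only
# to show the ratio, not tied to a specific codec).
LOW_RATE_BPS = 4_000

# How many times smaller the compressed stream is than the PCM channel.
COMPRESSION_RATIO = NARROWBAND_BPS / LOW_RATE_BPS

if __name__ == "__main__":
    print(f"PCM voice channel: {NARROWBAND_BPS} bit/s")
    print(f"Compression ratio at {LOW_RATE_BPS} bit/s: {COMPRESSION_RATIO:.0f}:1")
```

In other words, a codec running at the low end of that range is squeezing the original voice channel by a factor of sixteen.
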
So now we can compress voice to very low bit rates, and at the same time we have broadband Internet. So what can we do to improve speech quality?

Wideband, or high-definition, telephony technology is now appearing in VoIP and wireless networks using voice codecs such as G.722 and AMR-WB. This provides speech with an analog bandwidth of up to 7 kHz and gives a richer listening experience. The trouble you currently have recognizing which of your young nieces or nephews is speaking to you is due to the high frequencies that narrowband telephony filters out. Wideband telephony will reinstate these, enriching your telephone conversation experience and improving productivity through speech clarity. This technology will eventually carry telephony speech all the way up to 20 kHz, the limit of human hearing, equivalent to hi-fi music systems.

3GPP Release 5 introduces the AMR-WB codec, which gives enhanced speech quality using data rates of only 16 kbps. So wideband, or high-definition, telephony is being made available on wireless cellular networks.

Tools to Automatically Measure Speech Quality

Determining the subjective speech quality of a transmission system has always been an expensive and laborious process. The tool described in ITU-T Rec. P.862, Perceptual Evaluation of Speech Quality (PESQ), provides a rapid and repeatable result in a few moments. PESQ is an objective measurement tool, i.e. a computer measures the quality of the received audio in relation to the audio that was transmitted. PESQ scores correlate very closely with the results of subjective listening tests (i.e. human beings listening to speech files) on telephony systems. The resulting quality score is analogous to the subjective “Mean Opinion Score” (MOS) measured using panel tests according to ITU-T P.800. Strictly speaking, MOS is a score derived from human subjective testing; the PESQ scores are calibrated using a large database of subjective tests.

The ITU-T selection process that resulted in the standardization of PESQ involved a wide range of conditions, with demanding correlation requirements set to ensure that it has good performance in assessing conventional fixed and mobile networks and packet-based transmission systems.
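
To make "correlation" concrete, here is a rough sketch of how agreement between objective scores and subjective MOS values might be checked with a Pearson correlation coefficient. This is not the ITU-T evaluation procedure; the function name is my own, and the scores in the usage part are invented placeholders, not measurements.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical values for illustration only: one pair per test condition.
    subjective_mos = [1.8, 2.6, 3.1, 3.9, 4.3]
    objective_score = [1.9, 2.4, 3.3, 3.8, 4.2]
    print(f"correlation: {pearson(subjective_mos, objective_score):.3f}")
```

A value close to 1 means the objective tool ranks and spaces the conditions much as the human listeners did.
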

Since ITU-T Rec. P.862 was originally released in 2000, further mappings of the PESQ score have been created. PESQ-LQ modified the score to improve correlation with subjective test results at the high and low ends of the scale, where the raw PESQ score was found to be less accurate. A newer mapping, described in ITU-T Rec. P.862.1, has since been released; it further modifies the raw score and correlates better with subjective testing.
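
For the curious, the P.862.1 mapping is a simple logistic curve applied to the raw PESQ score. The sketch below uses the coefficients as I understand them from the Recommendation; the function name is my own, and the numbers should be checked against the published text before being relied on.

```python
from math import exp


def pesq_to_mos_lqo(raw_pesq: float) -> float:
    """Map a raw P.862 score to MOS-LQO using the P.862.1 logistic curve.

    Coefficients are quoted from memory of ITU-T Rec. P.862.1 (assumption:
    verify against the Recommendation).
    """
    return 0.999 + 4.0 / (1.0 + exp(-1.4945 * raw_pesq + 4.6607))


if __name__ == "__main__":
    # Raw PESQ scores run from -0.5 to 4.5; print the mapped value for a few.
    for raw in (-0.5, 1.0, 2.0, 3.0, 4.0, 4.5):
        print(f"raw {raw:4.1f} -> MOS-LQO {pesq_to_mos_lqo(raw):.2f}")
```

The curve compresses the ends of the scale, so the mapped score stays within roughly 1.0 to 4.5, which is closer to the range that real listening panels actually produce.
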

PESQ Shortcomings - Time Warping

PESQ takes into account coding distortions, errors, packet loss, delay and variable delay, and filtering in analog network components. The user interfaces have been designed to provide simple access to this powerful algorithm, either directly from the analog connection or from speech files recorded elsewhere.

PESQ Shortcomings - Noise Reduction (Subjective > PESQ)

The performance of a network or a network element can be fully characterized using high-quality analog test equipment and PESQ. High-quality analog interfaces are needed because the test equipment itself can very easily introduce impairments, which are included in the measurement and drag the PESQ score lower than it should be for the system under test or network element. Whilst it is possible to use phonetically balanced sentences and other test patterns, accurate and repeatable measurements of the active speech level, activity, delay, echo, noise and speech quality can be obtained quickly using an artificial speech test stimulus in different languages. This comprehensively tests all the voice sounds the codec may be presented with, and does so in a time-efficient way. A graphical mapping of the errors provides a useful insight into how the signal has been degraded and exactly what kind of sounds cause problems for the codec or system under test.

Since the launch of PESQ in 2000, there have been many advances in codec design. Unfortunately, PESQ was not trained on these later designs and can produce scores that are lower than expected from subjective tests. Time-warping and voice quality enhancement techniques are particularly difficult for PESQ. The ITU agreed on a new standard, P.863 POLQA, in 2010. POLQA addresses many of these issues and produces reliable scores for codecs both old and new. POLQA is now available on a couple of speech quality measurement platforms, but Malden is the only platform that provides a quiet, high-quality analog front-end and the only platform to be recommended.

Tuesday, October 25, 2011
