In the world of communications systems testing and monitoring, MOS is an acronym for Mean Opinion Score. A MOS rates the speech quality that listeners perceive over a telecommunications system, which makes it a way to evaluate the system's transmission characteristics.
If you are not using an automated system to calculate MOS, you need the following things:
- A group of people to serve as test subjects
- A communications system
- A sound-proof room
The MOS Process
The process for creating a MOS value is as follows:
Test subjects enter a sound-proof room one at a time. Each person is instructed to rate the quality of what he or she hears through a handset or other listening device. The test subjects either listen to pre-recorded voice clips or carry on a conversation with someone outside the sound-proof room. Either way, the test subjects rate each voice clip or conversation with one of the following terms: Excellent, Good, Fair, Poor, or Bad.
After each test subject has had a turn in the room, the test administrator assigns a numeric value to each answer (Excellent=5, Good=4, Fair=3, Poor=2, Bad=1), adds up the results and calculates the average. That average is the Mean Opinion Score.
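The arithmetic in this step can be sketched in a few lines of Python. This is only an illustration of the averaging described above; the function name `mean_opinion_score` and the sample answers are hypothetical and do not come from any standard or tool.

```python
from statistics import mean

# Numeric values for the five rating terms used by the test subjects.
RATING_VALUES = {"Excellent": 5, "Good": 4, "Fair": 3, "Poor": 2, "Bad": 1}


def mean_opinion_score(ratings):
    """Average the numeric values of a list of rating terms."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return mean(RATING_VALUES[term] for term in ratings)


# Hypothetical answers from five test subjects for one voice clip.
answers = ["Good", "Excellent", "Fair", "Good", "Good"]
print(mean_opinion_score(answers))
```

An unknown rating term raises a `KeyError`, which is probably what a test administrator would want rather than a silently skewed average.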
The MOS Standards
MOS is defined by ITU-T recommendation P.800. The recommendation was adopted by the ITU in 1996, largely in response to the rapid deployment of digital technology at the time. The timing was fortunate because as development of VoIP technology took off in the late ’90s, the industry really needed a common language to fully understand the speech quality levels produced by that early equipment.
Automatic Creation of MOS Values
You may be wondering how automated testing and monitoring systems can produce so many MOS values when there isn’t a single sound-proof room on the premises. The reason is that automated systems use objective methods to measure voice quality: they calculate a score from the signal or from network measurements instead of polling human listeners. The objective methods can be used for both active and passive monitoring systems, and they’ve evolved over the years.
Below are a few of the objective methods we’ve seen here at Empirix.
|Method|Score range|Interpretation|
|PSQM (ITU-T P.861)|0.0 – 6.5|Lower is better|
|R-Factor|0.0 – 120|Higher is better|
|PESQ (ITU-T P.862)|-0.5 – 4.5|Higher is better|
Regardless of which method is used, the result is typically mapped to MOS.
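As one example of such a mapping, here is a minimal sketch of the R-Factor to MOS conversion published in the E-model (ITU-T G.107). It assumes the narrowband form of the formula, which treats any R of 100 or more as the maximum score of 4.5. The function name `r_factor_to_mos` is hypothetical, and the lower clamp at 1.0 is an added assumption because the raw polynomial dips slightly below 1 for very small R values. Individual products may use a different or proprietary mapping.

```python
def r_factor_to_mos(r):
    """Map an R-Factor to an estimated MOS (narrowband E-model formula)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6
    # Assumption: keep the result on the 1-to-5 MOS scale for very low R.
    return max(1.0, mos)


# 93.2 is the default R value the E-model yields with all default inputs.
for r in (50, 70, 93.2):
    print(r, round(r_factor_to_mos(r), 2))
```

Note that the curve is not linear: a drop of 10 points in R costs more MOS in the middle of the range than near the top.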
Why do people continue to use MOS as a standard?
MOS is easy to understand, and it avoids apples-to-oranges comparisons. It’s true that scoring has become a bit more complicated over the past 20+ years, largely due to the need to map objective measurements to a subjective scale (see ITU-T P.800.1 for the 6 variations of MOS). However, you can still rely on the fact that when you’re talking about MOS, 1=bad and 5=excellent.