
Wednesday, April 11, 2012

Solutions for SIP NAT/Firewall Traversal for VoIP


SIP-based communication does not automatically reach users on a local area network (LAN) behind firewalls and Network Address Translation (NAT) routers. Firewalls are designed to block unsolicited inbound traffic, and NAT prevents users on a LAN from being directly addressable from the outside. Firewalls are almost always combined with NAT, and most still do not handle the SIP protocol properly.


The issue of SIP traffic not traversing enterprise firewalls or NAT is critical to any VoIP implementation. At some point, all firewalls will need to be SIP capable in order to support the wide-scale deployment of enterprise person-to-person communications. In the short term, however, several solutions have been proposed to work around the firewall/NAT traversal problem. The bad news is that many of these solutions have serious security implications. The good news: there are other solutions that allow you to remain in control to varying degrees. When choosing a NAT/firewall traversal solution for your network, it is important to consider how much control of your corporate or carrier infrastructure you are prepared to surrender.
As you envision your solution, you should consider these questions:
  • Who should be in control of my security infrastructure: the firewall administrator, the user or a peering service provider?
  • Do we want a solution that is predictable and functions reliably with standards-compliant SIP equipment, or is a best-effort solution that works only in certain scenarios, and perhaps only with specific operators, sufficient?
Universal Plug-and-Play (UPnP) – The SIP client or Windows is in control


Universal Plug-and-Play (UPnP) for NAT control allows Microsoft Windows or a UPnP-capable SIP client to take control of the firewall. Both the client and the firewall must support UPnP. This is a viable alternative only where you can be certain there will never be anything malevolent on the LAN. Few firewalls and SIP clients support UPnP, and because of the inherent security risk of letting third-party software control the firewall, this method is rarely used and in practice only by home users.
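
To make the risk concrete, here is a minimal sketch of how any program on the LAN could open a pinhole through a UPnP-enabled gateway, using the Python bindings of the miniupnpc library; the ports and description are arbitrary examples, not a recommended configuration.

    import miniupnpc

    upnp = miniupnpc.UPnP()
    upnp.discoverdelay = 200              # ms to wait for gateway discovery
    if upnp.discover() > 0:               # number of UPnP devices found
        upnp.selectigd()                  # select the Internet Gateway Device
        # Map external UDP port 5060 to this host's port 5060 (example values)
        upnp.addportmapping(5060, "UDP", upnp.lanaddr, 5060,
                            "SIP client pinhole (example)", "")
        print("External IP:", upnp.externalipaddress())
    else:
        print("No UPnP gateway found")

Note that nothing here authenticates the requester; any process on the LAN can do the same, which is precisely the security concern described above.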

STUN, TURN, ICE – The SIP client is in control

These are all protocols proposed by the IETF for solving the firewall/NAT traversal issue with intelligence in the clients, assisted by external servers. With these methods, pinholes are created in the NAT/firewall for SIP signaling and media to pass through, and the SIP client is responsible for emulating what the protocol would have looked like outside the firewall. These methods assume certain behavior from the NAT/firewall and cannot work in all scenarios. In addition, they remove control from the firewall, which must be sufficiently open to allow users to create the necessary pinholes.
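
To illustrate the client-side intelligence involved, here is a minimal sketch of a STUN Binding Request (RFC 5389) in Python: the client asks an outside server how its traffic appears from the public Internet. The server address is a commonly cited public example, and the parsing covers only the IPv4 XOR-MAPPED-ADDRESS case.

    import os
    import socket
    import struct

    MAGIC_COOKIE = 0x2112A442

    def stun_mapped_address(server=("stun.l.google.com", 19302), timeout=2.0):
        """Ask a STUN server how our traffic appears from the public side."""
        txn_id = os.urandom(12)
        # Header: type 0x0001 (Binding Request), length 0, cookie, txn id
        request = struct.pack("!HHI12s", 0x0001, 0, MAGIC_COOKIE, txn_id)
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(timeout)
        try:
            sock.sendto(request, server)
            data, _ = sock.recvfrom(2048)
        finally:
            sock.close()
        _, msg_len, _, resp_txn = struct.unpack("!HHI12s", data[:20])
        if resp_txn != txn_id:
            raise ValueError("transaction ID mismatch")
        offset = 20
        while offset < 20 + msg_len:
            attr_type, attr_len = struct.unpack("!HH", data[offset:offset + 4])
            value = data[offset + 4:offset + 4 + attr_len]
            if attr_type == 0x0020:  # XOR-MAPPED-ADDRESS (IPv4 case only)
                port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
                raw_ip = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
                return socket.inet_ntoa(struct.pack("!I", raw_ip)), port
            offset += 4 + attr_len + (-attr_len % 4)  # 32-bit alignment
        raise ValueError("no XOR-MAPPED-ADDRESS in response")

    print(stun_mapped_address())  # e.g. ('203.0.113.7', 54321)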

Session Border Controllers at Service Provider – The service provider is in control

Most service providers use some sort of session border controller (SBC) in their core network to perform a number of tasks related to their SIP services. One of these tasks is to make sure that the SIP services can be delivered to their customers. They may use STUN, TURN, and ICE for this by acting as the server component for these protocols. However, not all clients support these protocols, so the SBC may also use far-end NAT traversal (FENT) technology. The FENT function aids remote SIP clients by rewriting the relevant information in SIP messages, relaying media, and keeping clients on the NATed network reachable. This solution only works with firewalls that are open from the inside, and may not work with all equipment and in all call scenarios. FENT is best suited for road warriors working at a hotel or a conference; at a fixed location there are more reliable and secure solutions. FENT also removes control from the firewall, which must be sufficiently open for FENT from the service provider's SBC to work.


SIP-capable Firewalls or enterprise SBC – The firewall administrator is in control

This is a long-term solution in which the problem is solved where it occurs: at the firewall, or in tandem with an existing firewall using an enterprise session border controller. When deployed at the enterprise edge, the SBC offers the same security and control as it does in the service provider's core network. The enterprise SBC typically has built-in SIP proxy and/or back-to-back user agent (B2BUA) functionality to give unparalleled flexibility in real-life enterprise deployments. Most vendors of SBCs for service providers have products that can be deployed at the enterprise, while other companies have developed products for the enterprise market from the very beginning.

For an enterprise, there are special security and functional requirements that make the SIP-capable firewall or enterprise SBC the solution of choice. First, it is the only solution that allows the firewall to maintain control of what traverses between the LAN and the outside world. In addition, it is becoming more and more common to have a SIP server on the LAN; in fact, all SIP-based IP PBXs are SIP servers. In order for these SIP servers to communicate over IP with the outside world, the firewall simply must be SIP-enabled. Because many IP PBXs implement their own SIP extensions beyond the standard, it is very important that the firewall or enterprise SBC support these extensions.


Due to the complexity of real-life installations, it is highly recommended that you deploy specialized SIP-enabling devices such as SIP-proxy firewalls even in the SOHO and SMB markets, even where a security policy alone might not justify them.


With the explosion of these various solutions and protocols comes another potential headache. What do I do if something goes wrong? How do I find the offending service or equipment if jitter occurs or my users start experiencing one-way audio? What is causing echo on the conversation? Before committing to a large-scale enterprise or service provider deployment, you should also consider a robust troubleshooting package. You might determine that additional test equipment is needed, but you will likely find that hosted troubleshooting packages are much more cost effective and future-proof.

Wednesday, April 4, 2012

Packet Voice Quality Problems and Answers


Almost all packet-based voice quality issues are attributable to some type of degradation on the packet network that the voice traffic's RTP stream traverses. Voice traffic brings to light network problems that might otherwise go unnoticed when carrying only normal data traffic. This is because voice coder-decoders (CODECs) are sensitive to packet loss and variable delay, both of which must be minimized in an IP telephony network. Let's explore some common network issues that result in poor voice quality and what you can do about them:

Packet Drops
Packet-based telephony demands that speech packets find their destination within a predictable amount of time. There is very little tolerance for them being dropped somewhere along the way from source to destination. In a properly designed network with Quality of Service (QoS) provisioning in place, packet loss should be nearly zero. Voice CODECs can tolerate some degree of packet loss without adversely affecting voice quality: upon detecting a missing packet, the CODEC decoder on the receiving device makes a best guess as to what the waveform during the missing period should have been. Most CODECs can tolerate up to five percent random packet loss without noticeable voice quality degradation. This assumes that the five percent of packets being lost are not lost at the same time, but rather are dropped randomly in groups of one or two packets. Losing multiple consecutive packets, even as a low percentage of total packets, can cause noticeable voice quality problems.

Note: You should design your network for zero packet loss for packets that are tagged as voice packets. A converged voice/data network should be engineered to ensure that only a specific number of calls are allowed over a limited-bandwidth link. You should guarantee the bandwidth for those calls by giving priority treatment to voice traffic over all other traffic.
There are various tools and procedures that you can use to determine whether you are experiencing packet loss in your network and where in the network the packets are getting dropped.

1. Examine IP phone statistics:
  • If you are troubleshooting at the phone experiencing the problem, access the phone statistics by pressing the help (i or ?) button on the IP phone twice in quick succession during an active call.
  • If you are working with a remote user, open a web browser on your computer and enter the IP address of the user's phone. During an active call, choose the Streaming Statistics > Stream 1 options from the display.
2. Examine the counters RxDisc and RxLost shown on the IP phone (or Rcvr Lost Packets if you are viewing the statistics remotely using a web browser).
  • RxLost measures the number of packets that were never received because they were dropped in the network somewhere. By detecting a missing RTP sequence number, the IP phone can determine that a packet has been lost.
  • RxDisc corresponds to packets that were received but were discarded because they could not be used at the time they arrived. RxDisc can come from an out-of-order packet or a packet that arrived too late. 
3. If either of these two counters increments, you should investigate to learn why packets are being lost or discarded.
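To make these two counters concrete, here is a Python sketch of how an endpoint could derive RxLost/RxDisc-style values from RTP sequence numbers. It illustrates the principle; it is not the phone vendor's actual algorithm.

    class RtpLossCounter:
        def __init__(self):
            self.highest_seq = None
            self.rx_lost = 0   # gaps: packets that never arrived in order
            self.rx_disc = 0   # arrived, but late/out of order, so unusable

        def on_packet(self, seq):
            if self.highest_seq is None:
                self.highest_seq = seq
                return
            delta = (seq - self.highest_seq) & 0xFFFF  # 16-bit wraparound math
            if 0 < delta < 0x8000:           # packet advances the stream
                self.rx_lost += delta - 1    # skipped sequence numbers = losses
                self.highest_seq = seq
            else:                            # duplicate or late packet
                self.rx_disc += 1
                if self.rx_lost:
                    self.rx_lost -= 1        # a "lost" packet arrived after all

    counter = RtpLossCounter()
    for seq in (1, 2, 5, 3):                 # packet 4 is truly lost, 3 is late
        counter.on_packet(seq)
    print(counter.rx_lost, counter.rx_disc)  # -> 1 1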
Regardless of how low your packet loss is, if it is not zero you should investigate the root cause. It might be a sign of a bigger problem that will get worse with higher call volume. Always remember that packet loss can occur at any layer of the OSI model, so be sure to check every possible layer at each hop. For example, if there is a Frame Relay connection over a T1 between two sites, you should:
  • Make certain that there are no errors at the physical layer on the T1.
  • Determine if you are exceeding your committed information rate (CIR) on the Frame Relay connection.
  • Verify that you are not dropping the packets at the IP layer because you are exceeding your buffer sizes.
  • Verify that your QoS is configured correctly.
  • Ensure that your service provider not only guarantees packet delivery but also guarantees a low-jitter link. Some service providers may tell you that they do not provide a CIR but guarantee that they will not drop any packets. In a voice environment, delay is as important as packet loss. Many service providers' switches can buffer a large amount of data, thereby causing a large amount of jitter.
Another common cause of drops in an Ethernet environment is a duplex mismatch, when one side of a connection is set to full duplex and the other side is set to half duplex. To determine if this is the case, perform the following steps:
  1. Check all the switch ports through which a given call must travel and ensure that there are no alignment or frame check sequence (FCS) errors. Poor cabling or connectors can also contribute to such errors; however, duplex mismatches are a far more common cause of this kind of problem.
  2. Examine each link between the two endpoints that are experiencing packet loss and verify that the speed and duplex settings match on either side.
Although duplex mismatches are responsible for a large number of packet loss problems, there are many other opportunities for packet loss in other places in the network as well. When voice traffic must traverse a WAN, there are several places to look. First, check each interface between the two endpoints, and look for packet loss. If you are seeing dropped packets on any interface, there is a good chance that you are oversubscribing the link. This could also be indicative of some other traffic that you are not expecting on your network. The best solution in this case is to take a trace to examine which traffic is congesting the link.
Hosted/Web-based analysis tools are invaluable in troubleshooting voice quality problems. With these tools, you can examine each packet in an RTP stream to see if packets are really being lost and where in the network they are being lost. With such a tool, perform the following steps:
  1. Start at the endpoint that is experiencing the poor-quality audio where you suspect packet loss.
  2. Take a trace of a poor-quality call and filter it so that it shows you only packets from the far end to the endpoint that is hearing the problem. The packets should be equally spaced, and the sequence numbers should be consecutive with no gaps.
  3. If you are seeing all the packets in the trace, continue taking traces after each hop until you get a trace where packets are missing.
  4. When you have isolated the point in the network where the packet loss is occurring, look for any counters on that device that might indicate where the packets are being lost.
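If you would rather script step 2 than eyeball the trace, a sketch along these lines flags sequence-number gaps in a capture file. It assumes the scapy library is installed; the UDP port and file name are placeholders you must adjust to your capture.

    from scapy.all import rdpcap, IP, UDP

    RTP_PORT = 16384  # placeholder: set to the RTP stream's destination port

    def find_gaps(pcap_path):
        last_seq = None
        for pkt in rdpcap(pcap_path):
            if not (IP in pkt and UDP in pkt and pkt[UDP].dport == RTP_PORT):
                continue
            payload = bytes(pkt[UDP].payload)
            if len(payload) < 12:
                continue  # too short to carry an RTP header
            seq = int.from_bytes(payload[2:4], "big")  # RTP sequence number
            if last_seq is not None:
                expected = (last_seq + 1) & 0xFFFF
                if seq != expected:
                    print(f"gap: expected {expected}, got {seq}")
            last_seq = seq

    find_gaps("poor_quality_call.pcap")  # placeholder file name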
Queuing Problems
Queuing delay can be a significant contributor to variable delay (jitter). When you have too much jitter end-to-end, you encounter voice quality problems. A voice sample delayed beyond the size of the receiving device's jitter buffer is no better than a packet dropped in the network, because the delay still causes a noticeable break in the audio stream. In fact, high jitter is actually worse than a small amount of packet loss, because most CODECs can compensate for small amounts of packet loss. The only way to compensate for high jitter is to make the jitter buffer larger, but as the jitter buffer grows, the voice stream is held longer in the buffer. If the jitter buffer gets large enough that the end-to-end delay exceeds 200 ms, the two parties on the call feel that the conversation is not interactive and start talking over each other.
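
For reference, the jitter figure most tools report is the interarrival jitter estimator from RFC 3550 (section 6.4.1), which smooths the variation in packet spacing. A minimal sketch, with assumed example timestamps in milliseconds:

    def update_jitter(jitter, prev_arrival, prev_rtp_ts, arrival, rtp_ts):
        # D: how much the receiver-side spacing differs from the sender-side
        d = (arrival - prev_arrival) - (rtp_ts - prev_rtp_ts)
        # Exponential smoothing with gain 1/16, per the RFC
        return jitter + (abs(d) - jitter) / 16.0

    # Example: packets sent every 20 ms but arriving with varying spacing
    arrivals = [0, 21, 45, 62, 84]   # ms, assumed measurements
    jitter = 0.0
    for i in range(1, len(arrivals)):
        jitter = update_jitter(jitter, arrivals[i - 1], (i - 1) * 20,
                               arrivals[i], i * 20)
    print(f"jitter estimate: {jitter:.2f} ms")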

Remember that every network device between the two endpoints involved in a call (switches, routers, firewalls, and so on) is a potential source of queuing or buffering delays. The ideal way to troubleshoot a problem in which the symptoms point to delayed or jittered packets is to use a hosted diagnostic tool at each network hop to see where the delay or jitter is being introduced.


Also remember that once you've made changes to your network elements and call paths, it is always appropriate to perform regression testing across your network with some type of automated voice quality test package.

Saturday, March 3, 2012

MOS Results are Good but Customers Still Complain of Poor Speech Quality

When monitoring the speech packets of a VoIP network, i.e. the RTP stream, protocol analyzers or VoIP service monitoring systems may analyze the packets and calculate a Mean Opinion Score (MOS) that describes the quality of the speech. If customers call your call center or NOC and complain about speech quality, this is the first place you would go to analyze the problem. Finding the bad call the customer is referring to takes only a few seconds with a good voice service assurance system. Sometimes when you find the call, the MOS may be 4.0 or greater, indicating good speech quality.

How can we explain this discrepancy?

The method used to calculate MOS by a monitoring system, which must monitor thousands of active calls and produce voice quality (QoS) measurements for all of them, is the R-factor based on the E-model (ITU-T G.107). The R-factor takes into account the codecs used by the endpoints (VoIP phones) in the call, but most of the calculation is derived from what happens to the packet stream as it transits the IP network. The R-factor therefore requires measurements of packet loss and jitter in the stream of VoIP (RTP) packets.
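
For reference, G.107 also defines the standard mapping from R-factor to MOS that monitoring systems typically apply; transcribed directly into Python:

    def r_to_mos(r):
        """Map an E-model R-factor to estimated MOS per ITU-T G.107."""
        if r <= 0:
            return 1.0
        if r >= 100:
            return 4.5
        return 1.0 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6

    print(r_to_mos(93.2))  # ~4.41, roughly the ceiling for G.711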

How are packet loss and jitter detected in an RTP stream? RTP packets contain a sequence number and a timestamp, from which packet loss and jitter can be calculated. It is, however, important to understand that these measurements can only be made on the received RTP streams; that is, packet loss and jitter are only measured up to the point in the network where the monitoring system's probe is connected.

RTP streams leaving your network toward your customer's premises will not be measured unless you have a probe on the customer's premises or their VoIP phones support RTCP. A good voice service assurance system will also read the RTCP packets coming back from the customer's side so that leg of the call can be included in the R-factor measurement. Some monitoring systems will even show you in which leg of the call the packet impairments are introduced.

If call quality is bad but MOS is good, perhaps you are not monitoring all legs of the call.

Another reason may account for this discrepancy. MOS, or perceptual speech quality, does not include echo or overall end-to-end delay. So if the customer is experiencing echo, it will not degrade the MOS values. Similarly, constant delay in the network (as opposed to jitter, which is short-term varying delay) will not affect the MOS value. Network delay in VoIP makes conversation stilted, so this could be the reason your customer calls you.

If customers complain of poor speech quality but the MOS results are good, use your voice service assurance system to record calls from these customers (with the necessary disclaimers and consent, of course) and listen to a sample for echo or delay.

Thursday, January 5, 2012

Attacking VoIP Call-Quality Issues

Call-quality issues are a bit trickier with VoIP than with traditional telephony.

One of the main problems with VoIP is the negative effect of delays in packet transmission. Some latency is inherent in converting a call to and from VoIP, and no matter how quickly each conversion occurs, if there are several of them the cumulative latency can noticeably degrade quality. An end-to-end cumulative latency greater than 200 milliseconds (ms) is perceptible to humans and results in conversations in which participants repeatedly interrupt each other.
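
A quick back-of-the-envelope budget shows how easily the components add up toward that threshold; the values below are illustrative assumptions, not measurements:

    budget_ms = {
        "codec encode (framing + look-ahead)": 25,
        "packetization (2 frames/packet)":     20,
        "network propagation + queuing":       80,
        "receiver jitter buffer":              60,
        "codec decode + playout":              10,
    }
    total = sum(budget_ms.values())
    print(f"one-way latency ~{total} ms "
          f"-> {'acceptable' if total <= 200 else 'perceptible delay'}")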

Fluctuations in latency are also common on VoIP networks; this is referred to as jitter. Jitter is in most cases caused by a leg in the call path that shares capacity with other applications or calls. The most common symptom of jitter is called "clipping."

A normal conversation sounds like this:
"Hi dear, shall I stop at the store to pick up some milk?"
but clipping produces:
"I ear, all op de ore ick om ilk"

In future posts we'll explore how to isolate latency and jitter to one or many legs of a VoIP call.

Friday, November 18, 2011

The True Customer Experience of a Voice Service

Customers perceive the quality and reliability of a voice service through many different experiences. Such experiences may be good or bad, but very often customers notice only the bad ones – understandably, as they buy a service that is meant to work 100% of the time, from anywhere.

Bad experiences are often caused by network related problems:

• Calls with degraded voice quality
• Interrupted or dropped calls
• Unsuccessful call attempts
• Missed calls that did not ring or calls that were not signaled

End-device-related problems also create a negative user experience:

• Empty smartphone battery
• End device or VoIP endpoint crashes
• Inability to set up a three-party conference call

For all voice operators, over-the-top (OTT) and legacy alike, these end-device-related troubles are very important, because in the end the customer does not care at all why something is not working. He will blame it on his voice service provider anyway.

As a result, when considering the management of customer experience in next-generation voice networks, it is very important to look beyond your own network infrastructure. Internet connections, utilization and network problems on the customers' corporate LANs, and end devices cause a large portion of the problems. So it would definitely not be a good idea to consider a service to be running well simply because, in your own network, all systems are up and running.

Instead operators also need to manage:

• End device & VoIP Endpoint firmware versions
• App/softphone software versions
• IP link quality characteristics
• Performance and problems at your Customers’ premises
• Error logs from the end device


Very often, problems can be solved by optimizing configurations or codec settings, running centralized firmware updates, or proactively contacting the customer about upcoming problems with his device or software. So a tool is needed that sees the endpoint or VoIP device as well as the network. For all of that, a customer experience management system must provide the full picture of how the customer truly perceives the service end-to-end.

When looking at customer experience management for VoIP networks, operators should choose a holistic approach and carefully select software solutions that are designed to look far beyond the usual scope and take care of the most important stakeholder in this game – the customer.

Friday, October 21, 2011

Network Monitoring - It's Your Livelihood


What are your daily duties as the VoIP network operations director? You have to keep your network up and running. You have to answer calls about situations like "X location is down" or "Y location is slow." May we suggest that you proactively monitor your network as described below and perform tasks like:


  1. Monitor your network and take actions with respect to situations like device and line failures.
  2. Analyze line/physical facility utilization and errors on the facility, and verify network performance and conformance to SLAs.
  3. Be aware of what "talks to what" and when. Know how much bandwidth is needed for every application riding your network (and the networks you traverse).
  4. Know your exact data flows over your networks.
If you have all this information at your disposal, people will think twice before they point fingers at you.

But how can you achieve this?

You need a phased approach to network monitoring. I am not talking about network layers, but network monitoring layers. We have to dig into these monitoring layers before deciding what network monitoring software we need. A simple summary could include these:

  • Preconditions of network monitoring
  • Up/Down monitoring
  • Performance Monitoring / SNMP monitoring
  • Who talks with whom? / Netflow monitoring
  • Data capture / Data sniffing
Preconditions of Network Monitoring
Network documentation is essential for monitoring a network. Trying to set up network monitoring tools before going through the documentation is a complete waste of time. You may see everything green on the screen while one of your redundant lines is actually down, and you will sit staring without knowing what is happening. Always remember: documentation comes first and everything else follows.
Suggested Network documentation tools: Powerpoint/Visio, NetViz


Up/Down monitoring
Design a map in which you can see red and green lights glowing: green means up and red means down. It is simple yet powerful; you will know immediately that there is a problem when a red light glows. This is based on ping. Almost every IP device supports echo/echo reply, so you can monitor all the IP devices in your network using ping. Go one step further by monitoring individual applications on a device instead of the whole device. All network applications use TCP/UDP ports, so you can monitor an application by trying to reach its TCP/UDP port, for example with telnet. The port being open suggests that the application is running; see the sketch after the tool list below.
Suggested monitoring tools: WhatsupGold, nmap
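
As a sketch of the idea, the following Python fragment does a TCP connect to each application's port and reports up/down; the hosts, ports, and names are placeholder examples (and note that UDP-only services need a different probe).

    import socket

    def port_is_open(host, port, timeout=2.0):
        """Return True if a TCP connection to host:port succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    # Placeholder inventory of services to watch
    for host, port, name in [("10.0.0.10", 5060, "SIP proxy"),
                             ("10.0.0.20", 80, "intranet web")]:
        state = "UP" if port_is_open(host, port) else "DOWN"
        print(f"{name} ({host}:{port}): {state}")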


Performance monitoring / SNMP monitoring
The lines are up, the devices are up, but life is not perfect. People may complain about the performance of data lines, but are the lines saturated, or do they have plenty of spare bandwidth? Is there packet loss on the lines? Are routers running out of memory? We need SNMP to monitor the heartbeat of the network.
Suggested monitoring tools: MRTG, Solarwinds Orion, PRTG
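
As a small illustration, the pysnmp library can poll a single counter such as an interface's inbound octets; the router address, community string, and OID instance below are assumed examples.

    from pysnmp.hlapi import (getCmd, SnmpEngine, CommunityData,
                              UdpTransportTarget, ContextData,
                              ObjectType, ObjectIdentity)

    # Poll IF-MIB::ifInOctets.1 (inbound octets on interface index 1)
    error_indication, error_status, _, var_binds = next(getCmd(
        SnmpEngine(),
        CommunityData("public"),                # assumed community string
        UdpTransportTarget(("10.0.0.1", 161)),  # assumed router address
        ContextData(),
        ObjectType(ObjectIdentity("1.3.6.1.2.1.2.2.1.10.1")),
    ))
    if error_indication or error_status:
        print("SNMP error:", error_indication or error_status)
    else:
        for oid, value in var_binds:
            print(oid, "=", value)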


What is "talking" with what? / Netflow monitoring
You may realize that the line is full, but is someone, or some application, increasing the traffic load enormously? Who are they? Is the traffic necessary? On some devices, the "ip accounting" command can give you an idea of current traffic sources and destinations. Nevertheless, to analyze and optimize the traffic we need flow monitoring: we need to know source and destination IP addresses, TCP/UDP ports, and the number of packets/bytes.


Everyone blames the network speed until you publish a network usage report that clearly shows that only 15% of the traffic is ERP traffic and the rest comes from Internet access. Be aware that flow monitoring tools require more server resources, since they collect enormous amounts of data.
Suggested monitoring tools: Fluke NetFlow monitor, Paessler


Data capture / RMON – Sniffer tools
Sometimes you need to observe the exact data flowing on the line, not just information about it. Consider this sample scenario: after you find that a web service is causing inappropriately high network traffic, the owner of the application may simply say, "No, we are not pushing that much data to the network. We just respond yes or no in this web service, and it is just 100 bytes." Therefore, you should sniff the data flow on the line. You may find that the web service responds with yes or no (100 bytes) plus the full definition of the web service (6 kilobytes).
Suggested monitoring tools: Wireshark, Palladion
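
As a sketch, scapy can measure actual payload sizes on the wire to settle exactly this kind of argument; the capture filter and port are placeholder examples, and sniffing typically requires root privileges.

    from scapy.all import sniff, IP, TCP, Raw

    def report(pkt):
        if IP in pkt and TCP in pkt and Raw in pkt:
            print(f"{pkt[IP].src} -> {pkt[IP].dst}: "
                  f"{len(pkt[Raw].load)} payload bytes")

    # Capture 20 packets to/from the suspect web service (assumed port 8080)
    sniff(filter="tcp port 8080", prn=report, count=20)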


Have a look at Network Monitoring Tools on the Stanford University web site for a superb list of network monitoring tools. You can find another tidy list at Network Traffic Monitoring on Alan Kennington's topology.org.