Relevant mostly to OS X admins
In regret of Auto-VoIP
May 20, 2016
Due to a remodeling project at work, I needed to provide temporary Ethernet drops to a lot of areas that weren’t designed to have a human and a VoIP phone sitting there. To make this happen, we added 8 Netgear GS110TP switches to our network: PoE, managed, endorsed by a friend, and inexpensive, since these are a temporary fix rather than years of infrastructure to rely on. Configuration was not complicated. Each switch had to handle just the main wired client vLAN and the VoIP vLAN, so the task list boiled down to:
- Bring firmware current (who wants to troubleshoot something potentially fixed in last week’s update?)
- Add the OUI for our Polycom phones, since Polycom was not a vendor the switch recognized out of the box
- Enable LLDP on ports 2-8 (port 1 was designated as the uplink to the core stack)
- Add the vLANs to the switch, using the Voice VLAN option to match our VoIP vLAN and applying it to all ports.
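If you’re rolling out more than a couple of these switches, it can help to write the checklist down as data and audit each switch against it. The sketch below is only that: a hypothetical Python model that assumes you’ve already collected each port’s settings somehow (nothing here talks to the switch). The `PortConfig` fields are illustrative names, not Netgear settings, and the `auto_voip` check is not part of the checklist above; it’s flagged because of the trouble described in the rest of this post.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical layout matching the checklist: port 1 is the uplink,
# ports 2-8 face phones and need LLDP.
UPLINK_PORT = 1
ACCESS_PORTS = range(2, 9)


@dataclass
class PortConfig:
    """Settings for one switch port, gathered by hand or from some export (assumed)."""
    port: int
    lldp: bool
    voice_vlan_enabled: bool
    auto_voip: bool


def audit(ports: list[PortConfig]) -> list[str]:
    """Return deviations from the checklist, one string per problem."""
    problems = []
    for p in ports:
        # LLDP is only required on the phone-facing ports.
        if p.port in ACCESS_PORTS and not p.lldp:
            problems.append(f"port {p.port}: LLDP disabled")
        # The Voice VLAN option was applied to every port.
        if not p.voice_vlan_enabled:
            problems.append(f"port {p.port}: Voice VLAN not applied")
        # Not in the original checklist; flagged because of this post.
        if p.auto_voip:
            problems.append(f"port {p.port}: Auto-VoIP enabled")
    return problems


# Example: a switch where someone turned on Auto-VoIP for port 5.
ports = [PortConfig(UPLINK_PORT, lldp=False, voice_vlan_enabled=True, auto_voip=False)]
ports += [
    PortConfig(n, lldp=True, voice_vlan_enabled=True, auto_voip=(n == 5))
    for n in ACCESS_PORTS
]
print(audit(ports))  # ['port 5: Auto-VoIP enabled']
```

Keeping the desired state as plain data means the same audit can be run against all eight switches after every firmware update or config change.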
Soon we had streams of Cat5e running in all sorts of ways that would make any self-respecting admin hang his head in shame.
During the setup, one other option caught my eye: “Auto-VoIP”. Per the Netgear documentation:
The Auto-VoIP automatically makes sure that time-sensitive voice traffic is given priority over data traffic on ports that have this feature enabled. Auto-VoIP checks for packets carrying the following VoIP protocols:
• Session Initiation Protocol (SIP)
• Signalling Connection Control Part (SCCP)
• Media Gateway Control Protocol (MGCP)
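To make the idea concrete, here’s a rough Python model of what “checks for packets carrying” these protocols could look like if it were done by well-known port numbers. This is an assumption on my part, not Netgear’s implementation: I’ve used the conventional default ports (SIP on 5060/5061, Skinny on TCP 2000, MGCP on UDP 2427/2727) and treated SCCP as Cisco’s Skinny Client Control Protocol, which is what the acronym usually means in VoIP. Keep in mind these ports only carry call setup; the audio itself (RTP) runs on ports negotiated per call, so a real implementation would also have to learn those from the signalling.

```python
from typing import Optional

# Conventional default signalling ports. Assumes SCCP means Cisco's Skinny
# protocol; Netgear does not document how Auto-VoIP actually matches packets.
VOIP_SIGNALLING_PORTS = {
    ("udp", 5060): "SIP",
    ("tcp", 5060): "SIP",
    ("tcp", 5061): "SIP over TLS",
    ("tcp", 2000): "SCCP (Skinny)",
    ("udp", 2427): "MGCP (gateway)",
    ("udp", 2727): "MGCP (call agent)",
}


def classify(protocol: str, src_port: int, dst_port: int) -> Optional[str]:
    """Return the VoIP signalling protocol a packet appears to carry, or None."""
    proto = protocol.lower()
    # Check both directions, since replies come back from the well-known port.
    for port in (dst_port, src_port):
        name = VOIP_SIGNALLING_PORTS.get((proto, port))
        if name is not None:
            return name
    return None


def queue_for(protocol: str, src_port: int, dst_port: int) -> str:
    """Pick an egress queue: recognized VoIP signalling jumps ahead of data."""
    return "voice" if classify(protocol, src_port, dst_port) else "best-effort"


print(classify("UDP", 49152, 5060))  # SIP
print(queue_for("tcp", 443, 51000))  # best-effort
```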
Reading this, enabling the option sounded like a fine idea, so we did. With the above configuration in place, we started testing switches and plugging in phones, and everything worked as expected. LLDP let the switches and phones establish that a device with a qualifying OUI was attached to a port, so the switch put its traffic in the voice vLAN, and despite the cabling mess, all seemed well with the world.
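For anyone who hasn’t worked with OUI-based voice VLANs: the first three bytes of a MAC address identify the vendor, and the switch compares them against its OUI list. Below is a minimal Python sketch of that decision, assuming hypothetical VLAN IDs 10 (data) and 20 (voice) and using 00:04:F2, one of Polycom’s registered OUIs, as the only list entry; check the IEEE registry for the OUIs your own phones actually use. It models only the MAC match; LLDP-MED can additionally advertise the voice VLAN to the phone, which this sketch ignores.

```python
# Hypothetical VLAN IDs; substitute your own.
DATA_VLAN = 10
VOICE_VLAN = 20

# OUIs the switch treats as voice devices. 00:04:F2 is a Polycom OUI;
# the real list on your switch may contain more entries.
VOICE_OUIS = {"00:04:F2"}

HEX_DIGITS = set("0123456789abcdefABCDEF")


def oui(mac: str) -> str:
    """Return the vendor prefix of a MAC in 'AA:BB:CC' form.

    Accepts common notations such as '00:04:f2:12:34:56',
    '00-04-F2-12-34-56' and '0004.f212.3456'.
    """
    digits = "".join(c for c in mac if c in HEX_DIGITS).upper()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 6, 2))


def assign_vlan(src_mac: str) -> int:
    """Voice VLAN for a recognized phone vendor, data VLAN for everything else."""
    return VOICE_VLAN if oui(src_mac) in VOICE_OUIS else DATA_VLAN


print(assign_vlan("0004.f212.3456"))     # 20
print(assign_vlan("a4:5e:60:00:00:01"))  # 10
```

This is also why the OUI step in the checklist mattered: a phone whose OUI isn’t on the list simply lands in the data vLAN.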
Then the tickets started trickling in, only from staff whose phones were attached to the Netgears:
- Audio lost in one direction mid-call (the call stayed connected, but voice suddenly flowed in only ONE direction)
- No dial tone
- “URL calling disabled” shown on the Polycom screen
- More one-way audio issues
These events were unlike any other networking oddity I’ve tackled: sometimes they’d magically fix themselves before my fellow IT staff or I could get down to witness them. We configured our PRTG monitoring to scan the VoIP subnet and track whether each phone responded to pings, and the resulting two-day graphs showed that at roughly 24-hour intervals we’d lose connectivity with phones in clusters, all attached to the same Netgear. They didn’t all go offline at the same moment; instead, a wave of failure would wash over the group. We might lose G2 at 2 PM, G3 at 2:04, and G5 at 2:07; then G3 would work again, G4 would drop pings, G2 would start working… There was no pattern we could see, just a wave of “nope, no traffic going to/from that phone” lasting anywhere from 2 to 20+ minutes that eventually resolved without our input. Naturally, this never happened in the dark of night: there was the 2 PM cluster, the 3:45 PM cluster, and the 6 PM cluster.
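Grouping the ping failures by switch is what made the Netgears stand out. Here’s a small Python sketch of that grouping, assuming the results have been exported as (time, switch, phone, reachable) rows; that is a made-up format, not PRTG’s actual export, and the sample rows below are invented to mimic the wave pattern described above, not our real monitoring data. Failures on the same switch that are no more than `max_gap` apart (10 minutes here, an arbitrary choice) are merged into one cluster.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import NamedTuple


class Sample(NamedTuple):
    time: datetime
    switch: str
    phone: str
    reachable: bool


class Cluster(NamedTuple):
    switch: str
    start: datetime
    end: datetime
    phones: frozenset


def failure_clusters(samples, max_gap=timedelta(minutes=10)):
    """Merge failed pings on the same switch into clusters separated by > max_gap."""
    failures = defaultdict(list)
    for s in samples:
        if not s.reachable:
            failures[s.switch].append(s)

    clusters = []
    for switch, fails in failures.items():
        fails.sort(key=lambda s: s.time)
        start = prev = fails[0].time
        phones = {fails[0].phone}
        for s in fails[1:]:
            if s.time - prev > max_gap:
                # Gap too large: close the current cluster and start a new one.
                clusters.append(Cluster(switch, start, prev, frozenset(phones)))
                start, phones = s.time, set()
            phones.add(s.phone)
            prev = s.time
        clusters.append(Cluster(switch, start, prev, frozenset(phones)))
    return sorted(clusters, key=lambda c: c.start)


# Invented example rows for a hypothetical switch named "netgear-3".
day = datetime(2016, 5, 18)


def at(h, m):
    return day.replace(hour=h, minute=m)


samples = [
    Sample(at(14, 0), "netgear-3", "G2", False),
    Sample(at(14, 4), "netgear-3", "G3", False),
    Sample(at(14, 7), "netgear-3", "G5", False),
    Sample(at(14, 9), "netgear-3", "G3", True),
    Sample(at(14, 10), "netgear-3", "G4", False),
    Sample(at(15, 45), "netgear-3", "G6", False),
]
for c in failure_clusters(samples):
    print(c.switch, c.start.strftime("%H:%M"), c.end.strftime("%H:%M"), sorted(c.phones))
# netgear-3 14:00 14:10 ['G2', 'G3', 'G4', 'G5']
# netgear-3 15:45 15:45 ['G6']
```

Clusters that line up with one switch while the rest of the VoIP subnet stays quiet point at that switch, rather than at the phones or the provider.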
With some guidance from our VoIP provider, we finally identified the culprit: Auto-VoIP. While it might improve call quality under heavy traffic when voice devices aren’t in a prioritized vLAN of their own (for example, a small deployment where this 8-port switch is the only switch), it’s no benefit when there’s a dedicated voice vLAN with its own prioritization rules. Worse than “not a benefit”, enabling it caused one of the most unusual network issues I’ve ever encountered. Since we disabled Auto-VoIP on all ports, the issue has not returned.