Relevant mostly to OS X admins
Due to a remodeling project at work, it came to be that I needed to provide temporary Ethernet drops to a lot of areas that weren’t designed to have a human and a VoIP phone sitting there. To make this happen, we added 8 Netgear GSTP110TP switches to our network- PoE, managed, endorsed by a friend, and not expensive- as these are a temporary fix, not years of infrastructure to rely on. Configuration was not complicated: each of these had to handle just the main wired client vLAN and the VoIP vLAN, so the task list boiled down to
Soon we had streams of Cat5e running in all sorts of ways that would make any self-respecting admin hang his head in shame.
During the setup, one other option caught my eye: “auto-VoIP”. Per the Netgear documentation:
The Auto-VoIP automatically makes sure that time-sensitive voice traffic is given priority over data traffic on ports that have this feature enabled. Auto-VoIP checks for packets carrying the following VoIP protocols:
• Session Initiation Protocol (SIP)
• Signalling Connection Control Part (SCCP)
• Media Gateway Control Protocol (MGCP)
Reading this, it sounded like a fine idea to enable this option, and that was done. With the above configuration set, we started testing switches and plugging phones in, and all worked as expected. LLDP allowed the switches and phones to establish that there was a device with a qualifying OUI attached to a port, and therefore put its traffic in the voice vLAN, and despite being a cabling mess, all seemed well with the world.
Then the tickets started trickling in- only from staff using phones attached to the Netgears:
The events were unlike any other networking oddities I’ve tackled: sometimes they’d be magically fixed before my fellow IT staff or I could get down to witness them. We configured our PRTG monitoring to scan the VoIP subnet and start tracking if phones were pingable or not, and we ended up with 2 day graphs that showed that at approximately 24 hour-ish intervals, we’d loose connectivity with phones, in clusters, all members of the same Netgear. They didn’t all go offline at the same moment, but a wave of failure would wash over the group: it might loose G2 at 2P, G3 at 2:04, G5 at 2:07, then G3 would work again, G4 would drop pings, G2 would start working… no pattern that we could see, just a wave of “nope, no traffic going to/from that phone” ranging from 2 to 20+ minutes, that would eventually resolve without our input. Naturally, this never happened in the dark of night: there was the 2P cluster, the 3:45P cluster, and the 6P cluster.
With some guidance from our VoIP provider, we finally determined the culprit: Auto-VoIP. While this might help improve the experience in high-traffic conditions where the voice device isn’t in a prioritized vLAN of its own (such as a small deployment, where this 8 port switch is the only switch), it’s not a benefit when there’s a dedicated voice vLAN that has its own prioritization rules. Not only “not a benefit”, but enabling it caused one of the most unique network issues I’ve ever met. Since disabling auto-VoIP on all ports, this issue has not returned.