When SIP messages are sent over UDP, several things can go wrong. The first problem is that a large SIP message may be fragmented by the IP stack, and if any one fragment is not delivered, the whole message is lost.
When ICE is used, the SIP message carries a larger SDP body to hold the ICE candidates. When a softphone attempts a video call, the combination of the video and audio media descriptions enlarges the SDP further. In modern RTC deployments, it is increasingly common for a SIP message sent over UDP to exceed the maximum transmission unit (MTU) and be fragmented. Intermediate network components do not always forward IP packet fragments correctly. This was not a problem in the early days of SIP, when the vast majority of devices supported only a limited number of audio codecs and overall packet sizes were well under one kilobyte.
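The size problem can be checked mechanically. The following is a minimal sketch, not code from any particular SIP stack: it applies the rule from RFC 3261, section 18.1.1, which requires a congestion-controlled transport such as TCP when a request is within 200 bytes of the path MTU, or larger than 1300 bytes when the path MTU is unknown. The function name and constant names are hypothetical, and treating "within 200 bytes" as "greater than or equal to MTU minus 200" is one reading of the text.

```python
from typing import Optional

# Thresholds taken from RFC 3261, section 18.1.1.
MTU_MARGIN = 200          # headroom required below a known path MTU, in bytes
UNKNOWN_MTU_LIMIT = 1300  # size limit when the path MTU is not known, in bytes


def needs_reliable_transport(message: bytes, path_mtu: Optional[int] = None) -> bool:
    """Return True if this SIP request is too large to send safely over UDP.

    Hypothetical helper for illustration; `message` is the complete
    serialized request, including headers and the SDP body.
    """
    size = len(message)
    if path_mtu is None:
        # Path MTU unknown: fall back to the fixed 1300-byte limit.
        return size > UNKNOWN_MTU_LIMIT
    # Path MTU known: the request must stay at least 200 bytes below it.
    return size >= path_mtu - MTU_MARGIN


if __name__ == "__main__":
    # Illustrative placeholder payloads, not real SIP messages.
    audio_only = b"x" * 800
    video_with_ice = b"x" * 2400
    for label, msg in (("audio only", audio_only), ("video + ICE", video_with_ice)):
        print(label, len(msg), needs_reliable_transport(msg))
```

A client that sends everything over TLS never needs this check, because it does not use UDP for signalling at all.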
A more obscure issue is the presence of routers in homes and small offices that claim to have SIP helper capabilities, often labelled SIP ALG. These routers rewrite the SIP messages passing through them in an attempt to help them traverse NAT. In practice, the modifications made by the router can clash with ICE or other NAT discovery techniques used by the phone or the server.
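Sending all the SIP messages over a TLS connection eliminates all of these problems. TLS runs over TCP, so the size of a message is no longer constrained by the MTU, and because the traffic is encrypted, a router cannot read or rewrite the SIP messages. While creating a certificate for the server takes slightly more effort, it saves an enormous amount of ongoing support effort.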
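As a rough illustration of what the client side involves, the sketch below opens a verified TLS connection to a SIP server using only the Python standard library. It is an assumption-laden example rather than a complete SIP client: the function names are hypothetical, the host name is supplied by the caller, and port 5061 is the conventional default for SIP over TLS.

```python
import socket
import ssl

SIPS_PORT = 5061  # conventional default port for SIP over TLS


def make_client_context() -> ssl.SSLContext:
    """Build a TLS client context that verifies the server certificate.

    ssl.create_default_context() enables both certificate validation
    and host name checking.
    """
    return ssl.create_default_context()


def open_sip_tls(host: str, port: int = SIPS_PORT, timeout: float = 5.0) -> ssl.SSLSocket:
    """Connect to `host` over TCP and wrap the socket in TLS.

    Hypothetical helper; the caller is responsible for closing the
    returned socket.
    """
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        # server_hostname is needed for SNI and for host name verification.
        return make_client_context().wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()  # do not leak the TCP socket if the handshake fails
        raise
```

The handshake only succeeds if the server presents a certificate that matches its host name and chains to a trusted authority, which is why the certificate has to be created with care.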
See Chapter 9, TLS certificate creation, for details about creating the TLS certificates for SIP and XMPP.