In January 2019, a critical flaw was reported in Apple's FaceTime group chats feature that made it possible for users to initiate a FaceTime video call and eavesdrop on targets by adding their own number as a third person in a group chat even before the person on the other end accepted the incoming call.
The vulnerability was deemed so severe that the iPhone maker removed the FaceTime group chats feature altogether before the issue was resolved in a subsequent iOS update.
Since then, a number of similar shortcomings have been discovered in multiple video chat apps such as Signal, JioChat, Mocha, Google Duo, and Facebook Messenger — all thanks to the work of Google Project Zero researcher Natalie Silvanovich.
"While [the Group FaceTime] bug was soon fixed, the fact that such a serious and easy to reach vulnerability had occurred due to a logic bug in a calling state machine — an attack scenario I had never seen considered on any platform — made me wonder whether other state machines had similar vulnerabilities as well," Silvanovich wrote in a Tuesday deep-dive of her year-long investigation.
Although a majority of today's messaging apps rely on WebRTC for communication, the connections themselves are created by exchanging call set-up information between peers using the Session Description Protocol (SDP), a process called signaling. Signaling typically works by sending an SDP offer from the caller's end, to which the callee responds with an SDP answer.
Put differently, when a user starts a WebRTC call to another user, a session description called an "offer" is created containing all the information necessary for setting up a connection: the kind of media being sent, its format, the transfer protocol used, and the endpoint's IP address and port, among others. The recipient then responds with an "answer," including a description of its endpoint.
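To make the contents of an offer concrete, here is a minimal sketch that pulls the media kinds, ports, transport protocol, codecs, and endpoint address out of an SDP blob. The sample offer is made up for illustration (the address comes from a documentation-only range), and `summarize_sdp` is a hypothetical helper, not part of any WebRTC library.

```python
# Hypothetical SDP offer; addresses, ports and payload types are invented examples.
SAMPLE_OFFER = "\r\n".join([
    "v=0",
    "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
    "s=-",
    "t=0 0",
    "c=IN IP4 203.0.113.5",
    "m=audio 49170 UDP/TLS/RTP/SAVPF 111",
    "a=rtpmap:111 opus/48000/2",
    "m=video 51372 UDP/TLS/RTP/SAVPF 96",
    "a=rtpmap:96 VP8/90000",
]) + "\r\n"


def summarize_sdp(sdp: str) -> dict:
    """Extract the connection address and media sections from an SDP blob."""
    summary = {"address": None, "media": []}
    for line in sdp.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key == "c":
            # c=<nettype> <addrtype> <connection-address>
            summary["address"] = value.split()[2]
        elif key == "m":
            # m=<media> <port> <proto> <fmt> ...
            media, port, proto, *formats = value.split()
            summary["media"].append({
                "kind": media,
                "port": int(port),
                "proto": proto,
                "formats": formats,
                "codecs": {},
            })
        elif key == "a" and value.startswith("rtpmap:") and summary["media"]:
            # a=rtpmap:<payload type> <encoding name>/<clock rate>[/<params>]
            payload, encoding = value[len("rtpmap:"):].split(" ", 1)
            summary["media"][-1]["codecs"][payload] = encoding
    return summary


if __name__ == "__main__":
    for section in summarize_sdp(SAMPLE_OFFER)["media"]:
        print(section["kind"], section["port"], section["codecs"])
```

An answer has the same shape, which is why both sides can describe their endpoints with the same format.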
The entire process is a state machine, which indicates "where in the process of signaling the exchange of offer and answer the connection currently is."
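As a rough illustration, the offer/answer portion of that state machine can be modeled as a transition table. The state names below follow WebRTC's signaling states, but the table is a deliberately simplified subset (no provisional answers, rollback, or closing), and the class and action names are assumptions made for this sketch.

```python
# Simplified subset of WebRTC signaling-state transitions.
# Keys are (current state, action, description type); values are the next state.
TRANSITIONS = {
    ("stable", "set_local", "offer"): "have-local-offer",
    ("stable", "set_remote", "offer"): "have-remote-offer",
    ("have-local-offer", "set_remote", "answer"): "stable",
    ("have-remote-offer", "set_local", "answer"): "stable",
}


class SignalingStateMachine:
    """Tracks where one peer is in the offer/answer exchange."""

    def __init__(self) -> None:
        self.state = "stable"

    def apply(self, action: str, sdp_type: str) -> str:
        key = (self.state, action, sdp_type)
        if key not in TRANSITIONS:
            # Rejecting unexpected steps is the whole point of the state machine.
            raise ValueError(
                f"{action}({sdp_type}) is not allowed in state {self.state!r}"
            )
        self.state = TRANSITIONS[key]
        return self.state


if __name__ == "__main__":
    caller, callee = SignalingStateMachine(), SignalingStateMachine()
    print(caller.apply("set_local", "offer"))    # caller creates the offer
    print(callee.apply("set_remote", "offer"))   # callee receives it
    print(callee.apply("set_local", "answer"))   # callee answers
    print(caller.apply("set_remote", "answer"))  # caller receives the answer
```

Applications layer their own call states (ringing, accepted, ended) on top of this exchange, and the bugs described in this article live in how those states are handled.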
Optionally, the offer/answer exchange also lets the two peers trade candidates with each other to negotiate the actual connection between them. Each candidate details a method the peers can use to communicate, regardless of the network topology, under a framework called Interactive Connectivity Establishment (ICE).
Once the two peers agree upon a mutually compatible candidate, that candidate's SDP is used by each peer to construct and open a connection, through which media then begins to flow.
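The candidate negotiation can be sketched roughly as follows. The code parses the leading fields of SDP `candidate:` attributes and orders local/remote pairs using the pair-priority formula from RFC 8445. The candidate lines are invented (documentation-range addresses), the local side is assumed to be the controlling agent, and real ICE also runs connectivity checks on each pair, which this sketch skips entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IceCandidate:
    foundation: str
    component: int
    transport: str
    priority: int
    address: str
    port: int
    kind: str  # "host", "srflx", "prflx" or "relay"


def parse_candidate(line: str) -> IceCandidate:
    """Parse the leading fields of an SDP 'candidate:' attribute."""
    fields = line.split(":", 1)[1].split()
    foundation, component, transport, priority, address, port, typ_kw, kind = fields[:8]
    if typ_kw != "typ":
        raise ValueError("expected the 'typ' keyword")
    return IceCandidate(foundation, int(component), transport.lower(),
                        int(priority), address, int(port), kind)


def pair_priority(controlling: int, controlled: int) -> int:
    """Candidate-pair priority formula from RFC 8445."""
    g, d = controlling, controlled
    return (2 ** 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def rank_pairs(local, remote):
    """Order candidate pairs from most to least preferred (local side controlling)."""
    pairs = [(l, r) for l in local for r in remote
             if l.component == r.component and l.transport == r.transport]
    return sorted(pairs,
                  key=lambda p: pair_priority(p[0].priority, p[1].priority),
                  reverse=True)


# Invented candidates: one direct (host) and one NAT-mapped (srflx) address per side.
LOCAL = [parse_candidate(c) for c in (
    "candidate:1 1 udp 2130706431 192.0.2.10 50000 typ host",
    "candidate:2 1 udp 1694498815 203.0.113.5 46154 typ srflx raddr 192.0.2.10 rport 50000",
)]
REMOTE = [parse_candidate(c) for c in (
    "a=candidate:1 1 udp 2130706431 198.51.100.7 40000 typ host",
    "a=candidate:2 1 udp 1694498815 198.51.100.99 41000 typ srflx raddr 198.51.100.7 rport 40000",
)]

if __name__ == "__main__":
    for l, r in rank_pairs(LOCAL, REMOTE):
        print(l.kind, l.address, "<->", r.kind, r.address)
```

In practice the highest-ranked pair that also passes its connectivity check is the one used to carry media.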
In this way, both devices share with one another the information needed in order to exchange audio or video over the peer-to-peer connection. But before this relay can happen, the captured media data has to be attached to the connection using a feature called tracks.
While it's expected that callee consent is ensured ahead of audio or video transmission and that no data is shared until the receiver has interacted with the application to answer the call (i.e., no tracks are added to the connection before that point), Silvanovich observed behavior to the contrary.
Not only did the flaws in the apps allow calls to be connected without interaction from the callee, but they also potentially permitted the caller to force a callee device to transmit audio or video data.
The common root cause? Logic bugs in the signaling state machines, which Silvanovich said "are a concerning and under-investigated attack surface of video conferencing applications."
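To illustrate the class of bug, consider a hypothetical callee-side call model in which media tracks may only be attached after a local user action. Nothing here is taken from any of the affected apps; the state names, message kinds, and method names are assumptions chosen for the sketch.

```python
from enum import Enum, auto


class CallState(Enum):
    IDLE = auto()
    RINGING = auto()
    ACCEPTED = auto()
    ENDED = auto()


class CalleeCall:
    """Hypothetical callee-side call model; not taken from any real app."""

    def __init__(self) -> None:
        self.state = CallState.IDLE
        self.tracks = []

    def on_offer(self) -> None:
        # An incoming offer only makes the device ring; no media is attached yet.
        if self.state is not CallState.IDLE:
            raise RuntimeError("unexpected offer")
        self.state = CallState.RINGING

    def on_user_accept(self) -> None:
        # Only a local UI action may move the call to ACCEPTED.
        if self.state is not CallState.RINGING:
            raise RuntimeError("nothing to accept")
        self.state = CallState.ACCEPTED
        self._attach_tracks()

    def on_remote_message(self, kind: str) -> None:
        # Messages from the caller must never imply consent. A buggy variant
        # would attach tracks here in response to a caller-controlled message.
        if kind == "hangup":
            self.state = CallState.ENDED
            self.tracks.clear()
        # Every other remote message is ignored in this sketch.

    def _attach_tracks(self) -> None:
        # Final guard: media capture is tied to the accepted state.
        if self.state is not CallState.ACCEPTED:
            raise RuntimeError("refusing to attach media before the user accepts")
        self.tracks = ["audio", "video"]


if __name__ == "__main__":
    call = CalleeCall()
    call.on_offer()
    call.on_remote_message("connected")  # caller-controlled, must not start media
    print(call.state, call.tracks)
    call.on_user_accept()
    print(call.state, call.tracks)
```

The design choice worth noting is that the guard sits inside `_attach_tracks` itself, so a mistaken call from any message handler fails loudly instead of silently transmitting audio or video.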
Other messaging apps like Telegram and Viber were found to have none of the above flaws, although Silvanovich noted that significant reverse engineering challenges when analyzing Viber made the investigation "less rigorous" than the others.
"The majority of calling state machines I investigated had logic vulnerabilities that allowed audio or video content to be transmitted from the callee to the caller without the callee's consent," Silvanovich concluded. "This is clearly an area that is often overlooked when securing WebRTC applications."
"The majority of the bugs did not appear to be due to developer misunderstanding of WebRTC features. Instead, they were due to errors in how the state machines are implemented. That said, a lack of awareness of these types of issues was likely a factor," she added.
"It is also concerning to note that I did not look at any group calling features of these applications, and all the vulnerabilities reported were found in peer-to-peer calls. This is an area for future work that could reveal additional problems."