SIP SDP RTSP RTP RTCP WebRTC

rfc1889 rfc2326 rfc3261 rfc3550 rfc3856 rfc6120.

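SIP, SDP, RTSP, RTP, RTCP: the order in which they are listed here is also roughly the order in which they come into play in practice. SIP (carried over UDP or TCP) handles address management and discovery for devices or users (more precisely, Internet endpoints), initiates a session, and carries the SDP payload. SDP is a resource description format that is independent of transport; most of the time it can only be embedded in another protocol as the resource description, so it is closer to a specification than a protocol. An SDP body describes which media a session contains, who is invited, and so on. Once every invitee has been notified through their own endpoint, RTSP (a streaming protocol, normally carried over TCP, whose main job is to use SDP to describe streams rather than devices) can be used to control the delivery of a particular media stream. Finally, if an RTSP control message asks for video playback to start, the data is sent in real time using RTP (typically over UDP, sometimes over TCP), and RTCP takes care of QoS feedback during the transfer.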

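So the general flow is: first register, discover, and describe devices or users (more precisely, Internet endpoints) with SIP, then describe the media stream with RTSP, and finally carry and control the payload with RTP/RTCP.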

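SIP is known as the signaling protocol of VoIP. It handles call setup, call control, and call teardown, and SIP servers can produce call detail records (CDRs) for billing. SIP provides for establishing, modifying, and terminating connections. SIP transparently supports name mapping and redirection services, so it supports personal mobility.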

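The difference between SIP and RTSP: SIP is a session management (conferencing) protocol, with requests such as INVITE and BYE (hang up) and responses such as 100 Trying, 180 Ringing, and 486 Busy Here; RTSP is a streaming protocol, with methods such as OPTIONS, DESCRIBE, SETUP, and PLAY.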

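SIP signaling and messaging are text based and use a flat data representation. Parsing is less regular, new message bodies cannot inherit much from existing ones, and new code has to be written to encode and parse them, so existing code is hard to reuse. XMPP, by contrast, uses open, standard XML, a structured message format that can conveniently express hierarchical content and the internal logic of that content. This XML structure makes it much easier to extend applications and parse content, and a large amount of code can be reused.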

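SIP and XMPP are both application-layer protocols, used mainly to carry voice and instant messaging (IM) over the Internet. RFC 3261 defines SIP, and RFC 3920 defines XMPP (RFC 3920 was later obsoleted by RFC 6120). XMPP comes from instant messaging systems, while SIP comes from voice and video communication. XMPP added the Jingle extension to support connection-oriented services such as voice and video, and SIP added SIMPLE to support instant messaging.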

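SIP is symmetric in both directions: client and server can each initiate a connection request and respond to one. This symmetry adds a great deal of complexity when traversing NAT and firewalls, and NAT traversal cannot be guaranteed. XMPP connections are one-way: only the client can initiate a connection to the server, and the server never initiates a connection to the client. This makes NAT and firewall traversal easier.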

The SIMPLE (SIP for Instant Messaging and Presence Leveraging Extensions) protocol suite, defined by the IETF, extends SIP to support IM services. SIMPLE also adds some logical entities and functions for the presence service.

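The main goal of WebRTC is to add real-time audio and video communication to browser-based web applications. It includes the powerful IP voice-processing technology from GIPS, such as echo cancellation, automatic gain control (AGC), and noise suppression, areas in which GIPS was among the very best in the world. Google acquired GIPS and open-sourced the technology; components such as VoiceEngine, VideoEngine, NetEQ, and AEC all stem from the GIPS acquisition. WebRTC has to work together with a signaling protocol such as SIP to provide real-time communication built on this voice processing. SIP alone can also deliver real-time voice and video, but then you lose the top-tier voice processing unless you pull it out of WebRTC and integrate it yourself. Building on this core, WebRTC later added general-purpose audio and video communication modules, providing the core technology for video conferencing: audio and video capture, encoding and decoding, network transport (it also uses ordinary RTP/RTCP), jitter buffering, and rendering, and it is cross-platform as well. This page introduces WebRTC in considerable detail: http://www.cnblogs.com/lingyunhu/p/3621057.html

What is Kurento? Video conferencing involves one-to-many, many-to-many, broadcast, transcoding, audio mixing, video compositing, and recording, all of which need a media server, and Kurento provides these functions. It is mainly used as a media server for WebRTC. Because it has many bugs, it is not suitable for commercial use at the time of writing, but it looks promising.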

Books and resources on WebRTC:

WebRTC codelab: step-by-step instructions showing how to build a text and video chat application, using a Socket.io signaling service running on Node.

2013 Google I/O WebRTC presentation with WebRTC tech lead, Justin Uberti.

Chris Wilson's SFHTML5 presentation: Introduction to WebRTC Apps.

The WebRTC Book gives a detailed introduction to data and signaling channels, including details of network topologies.

WebRTC and Signaling: What Two Years Has Taught Us: a TokBox blog post explaining why leaving signaling out of the specification was a good idea.

Ben Strong's presentation A Practical Guide to Building WebRTC Apps covers many WebRTC topologies and infrastructure architectures.

The WebRTC chapter in Ilya Grigorik's High Performance Browser Networking goes deep into WebRTC architecture, use cases and performance.

http://www.informit.com/articles/article.aspx?p=169578&seqNum=3

Real Time Streaming Protocol (RTSP)

In the modern Internet, applications are required to deliver value. One of the biggest conundrums in recent years has been the battle to actually make the Internet a viable platform for making money. As we'll see throughout the course of this book, one of the biggest drivers for delivering on the "Gold Rush" promise of Internet technologies is content. Making content attractive to end consumers to the point where they are willing to pay is a big challenge and one that has been aided by the delivery of Application layer protocols such as RTSP, which enables the delivery of real-time video and audio in variable qualities. The other Application layer protocols we've looked at so far in this chapter work in a request/response manner, whereby the client asks for some piece of content, the content is delivered using TCP or UDP, and then the client application can display the content to the user. While these mechanisms are suitable for a large number of applications in the Internet, there also exists a requirement to deliver content, be it images, audio, video, or a combination of all three, in real time. Imagine if a user were to try to watch a full-screen video file of a one-hour movie using HTTP or FTP as the Application layer protocol. The movie file might be several hundred megabytes, if not several gigabytes, in size. Even with modern broadband services deliverable to the home, this type of large file size does not fit well in the "download then play" model we saw previously.

RTSP uses a combination of reliable transmission over TCP (used for control) and best-efforts delivery over UDP (used for content) to stream content to users. By this, we mean that the file delivery can start and the client-side application can begin displaying the audio and video content before the complete file has arrived. In terms of our one-hour movie example, this means that the client can request a movie file and watch a "live" feed similar to how one would watch a TV. Along with this "on demand" type service, RTSP also enables the delivery of live broadcast content that would not be possible with traditional download and play type mechanisms.

The Components of RTSP Delivery

During our look at RTSP, we'll use the term to describe a number of protocols that work together in delivering content to the user.

RTSP

RTSP is the control protocol for the delivery of multimedia content across IP networks. It is based typically on TCP for reliable delivery and has a very similar operation and syntax to HTTP. RTSP is used by the client application to communicate to the server information such as the media file being requested, the type of application the client is using, the mechanism of delivery of the file (unicast or multicast, UDP or TCP), and other important control information commands such as DESCRIBE, SETUP, and PLAY. The actual multimedia content is not typically delivered over the RTSP connection(s), although it can be interleaved if required. RTSP is analogous to the remote control of the streaming protocols.

Real Time Transport Protocol (RTP)

RTP is the protocol used for the actual transport and delivery of the real-time audio and video data. As the delivery of the actual data for audio and video is typically delay sensitive, the lighter weight UDP protocol is used as the Layer 4 delivery mechanism, although TCP might also be used in environments that suffer higher packet loss. The RTP flow when delivering the content is unidirectional from the server to the client. One interesting part of the RTP operation is that the source port used by the server when sending the UDP data is always even—although it is dynamically assigned. The destination port (i.e., the UDP port on which the client is listening) is chosen by the client and communicated over the RTSP control connection.

Real Time Control Protocol (RTCP)

RTCP is a complementary protocol to RTP and is a bidirectional UDP-based mechanism to allow the client to communicate stream-quality information back to the object server. The RTCP UDP communication always uses the next UDP source port up from that used by the RTP stream, and consequently is always odd. Figure 3-7 shows how the three protocols work together.

Figure 3-7. The three main application protocols used in real-time streaming.

RTSP Operation

The RTSP protocol is very similar in structure, and specifically in syntax, to HTTP. Both use the same URL structure to describe an object, with RTSP using the rtsp:// scheme rather than http://. RTSP, however, introduces a number of additional methods (such as DESCRIBE, SETUP, and PLAY) and also allows data transport out-of-band and over a different protocol, such as RTP described earlier. The best way to understand how the components described previously work together to deliver an audio/video stream is to look at an example. The basic steps involved in the process are as follows:

1. The client establishes a TCP connection to the server, typically on TCP port 554, the well-known port for RTSP.
2. The client then issues a series of RTSP commands that have a similar format to HTTP, each of which is acknowledged by the server. Within these RTSP commands, the client describes to the server the details of the session requirements, such as the version of RTSP it supports, the transport to be used for the data flow, and any associated UDP or TCP port information. This information is passed using the DESCRIBE and SETUP methods and is augmented in the server response with a Session ID that the client, and any intermediate proxy devices, can use to identify the stream in further exchanges.
3. Once the negotiation of transport parameters has been completed, the client issues a PLAY command to instruct the server to commence delivery of the RTP data stream.
4. Once the client decides to close the stream, a TEARDOWN command is issued along with the Session ID, instructing the server to cease the RTP delivery associated with that ID.
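
The steps above can be sketched as a sequence of request messages. This is only an illustration of the message shape, not a working client: nothing is sent over the network, the helper name build_rtsp_request is made up for this note, the URL is borrowed from the example further down, the Session value would really come from the server's SETUP response, and the Transport value is written in the RTP/AVP form used by RFC 2326.

```python
def build_rtsp_request(method, url, cseq, headers=None):
    """Format one RTSP/1.0 request as text with CRLF line endings."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line terminates the header section, as in HTTP.
    return "\r\n".join(lines) + "\r\n\r\n"


url = "rtsp://video.foocorp.com:554/streams/example.rm"
session = "12345678"  # assumption: in practice taken from the SETUP response

# One entry per step; CSeq simply counts the requests.
steps = [
    ("OPTIONS", {}),
    ("DESCRIBE", {"Accept": "application/sdp"}),
    ("SETUP", {"Transport": "RTP/AVP;unicast;client_port=5067-5068"}),
    ("PLAY", {"Session": session}),
    ("TEARDOWN", {"Session": session}),
]
requests = [
    build_rtsp_request(method, url, cseq, headers)
    for cseq, (method, headers) in enumerate(steps, start=1)
]

for request in requests:
    print(request, end="")
```

A real client would send each request on the TCP connection from step 1 and wait for the matching response before sending the next.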

Example—RTSP with UDP-Based RTP Delivery

Let's consider an example interaction where the client and server will use a combination of TCP-based RTSP and UDP-based RTP and RTCP to deliver and view a video stream. In the first step, the client will establish a TCP connection to port 554 on the server and issue an OPTIONS command showing the protocol version used for the session. The server acknowledges this with a 200 OK message, similar to HTTP.

C->S  OPTIONS rtsp://video.foocorp.com:554 RTSP/1.0
      Cseq: 1

S->C  RTSP/1.0 200 OK
      Cseq: 1

Next, the client issues a DESCRIBE command that indicates to the server the URL of the media file being requested. The server responds with another 200 OK acknowledgment and includes a full media description of the content, which is presented in either Session Description Protocol (SDP) or Multimedia and Hypermedia Experts Group (MHEG) format.

C->S  DESCRIBE rtsp://video.foocorp.com:554/streams/example.rm RTSP/1.0
      Cseq: 2

S->C  RTSP/1.0 200 OK
      Cseq: 2
      Content-Type: application/sdp
      Content-Length: 210
      <SDP Data...>
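
On the client side, a response like this has to be split into a status line, headers, and a body whose size is given by Content-Length. The following is a minimal sketch of that split, assuming the whole response is already in memory; the function name and the tiny SDP body are invented for the illustration.

```python
def parse_rtsp_response(raw: bytes):
    """Split an RTSP response into (status, reason, headers, body)."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    _version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them.
        headers[name.strip().lower()] = value.strip()
    # The body length comes from Content-Length; no header means no body.
    length = int(headers.get("content-length", "0"))
    return int(code), reason, headers, rest[:length]


sdp = b"v=0\r\nm=video 0 RTP/AVP 96\r\n"  # made-up body for the demo
raw = (
    b"RTSP/1.0 200 OK\r\n"
    b"CSeq: 2\r\n"
    b"Content-Type: application/sdp\r\n"
    b"Content-Length: " + str(len(sdp)).encode("ascii") + b"\r\n\r\n" + sdp
)
status, reason, headers, body = parse_rtsp_response(raw)
print(status, reason, headers["content-type"], len(body))
```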

In the third stage of the RTSP negotiation, the client issues a SETUP command that identifies to the server the transport mechanisms, in order of preference, the client wants to use. We won't list all of the available transport options here (the RFC obviously contains an exhaustive list), but we'll see the client request RTP over UDP on ports 5067 and 5068 for the data transport. The server responds with confirmation of the RTP over UDP transport mechanism and the client-side ports and includes the unique Session ID and server port information.

C->S  SETUP rtsp://video.foocorp.com:554/streams/example.rm RTSP/1.0
      Cseq: 3
      Transport: rtp/udp;unicast;client_port=5067-5068

S->C  RTSP/1.0 200 OK
      Cseq: 3
      Session: 12345678
      Transport: rtp/udp;client_port=5067-5068;server_port=6023-6024
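
The Transport header is a semicolon-separated list: a transport specifier followed by flags and key=value parameters. A small sketch of pulling the negotiated ports out of it is shown here; parse_transport is a hypothetical helper, and it only understands the parameter shapes that appear in this example.

```python
def parse_transport(value: str):
    """Parse an RTSP Transport header value into a dictionary."""
    parts = [p.strip() for p in value.split(";") if p.strip()]
    result = {"spec": parts[0], "flags": []}
    for part in parts[1:]:
        key, sep, val = part.partition("=")
        if not sep:
            # Bare words such as "unicast" carry no value.
            result["flags"].append(key)
        elif "-" in val and all(x.isdigit() for x in val.split("-")):
            # Ranges such as 5067-5068 become integer tuples.
            result[key] = tuple(int(x) for x in val.split("-"))
        else:
            result[key] = val
    return result


reply = parse_transport("rtp/udp;client_port=5067-5068;server_port=6023-6024")
print(reply["client_port"], reply["server_port"])
```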

Finally, the client is now ready to commence the receipt of the data stream and issues a PLAY command. This simply contains the URL and Session ID previously provided by the server. The server acknowledges this PLAY command, and the RTP stream from the server to client will begin.

C->S  PLAY rtsp://video.foocorp.com:554/streams/example.rm RTSP/1.0
      Cseq: 4
      Session: 12345678

S->C  RTSP/1.0 200 OK
      Cseq: 4

Once the client decides that the stream can be stopped, a TEARDOWN command is issued over the RTSP connection referenced only by the Session ID. The server again acknowledges this and the RTP delivery will cease.

C->S  TEARDOWN rtsp://video.foocorp.com:554/streams/example.rm RTSP/1.0
      Cseq: 5
      Session: 12345678

S->C  RTSP/1.0 200 OK
      Cseq: 5

Figure 3-8 shows this example in a simplified graphic form.

Figure 3-8. An example of RTSP in action with the video and audio data being delivered over a separate UDP-based RTP stream.

Other Options for Data Delivery

In certain scenarios, the best-effort, dynamic port methods of UDP-based RTP, as described previously, are not suitable. Some environments might consider the allocation of dynamic source and destination UDP ports through firewalls to be something they can live happily without. Moreover, just the nature of the Layer 1 and Layer 2 transport mechanisms underlying the data delivery might not be suited to nonguaranteed UDP traffic. In either instance, RTSP allows for the negotiation of the RTP delivery of the media data to be interleaved into the existing TCP connection.

When interleaving, the client-to-server SETUP command has the following format:

C->S  SETUP rtsp://video.foocorp.com:554/streams/example.rm RTSP/1.0
      Cseq: 3
      Transport: rtp/avp/tcp; interleaved=0-1

The changeover in the preceding example is in the transport description. First, the transport mechanisms have changed to show that the RTP delivery must be over TCP rather than UDP. Second, the addition of the interleaved option shows that the RTP data should be interleaved and use channel identifiers 0 and 1—0 will be used for the RTP data and 1 will be used for the RTCP messages. To confirm the transport setup, the server will respond with confirmation and a Session ID as before:

S->C  RTSP/1.0 200 OK
      Cseq: 3
      Session: 12345678
      Transport: rtp/avp/tcp; interleaved=0-1

The RTP and RTCP data can now be transmitted over the existing RTSP TCP connection with the server using the 0 and 1 identifiers to represent the relevant channel.
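
In RFC 2326 each interleaved block is framed with a four-byte prefix: an ASCII dollar sign, a one-byte channel identifier, and a two-byte length in network byte order, followed by the RTP or RTCP packet. A rough sketch of writing and reading such frames follows; the function names are made up, and the payloads are placeholders rather than real RTP packets.

```python
import struct


def frame_interleaved(channel: int, payload: bytes) -> bytes:
    """Wrap one RTP or RTCP packet for the RTSP TCP connection."""
    return b"$" + struct.pack("!BH", channel, len(payload)) + payload


def read_interleaved(buffer: bytes):
    """Return (channel, payload, remaining), or None if the frame is incomplete."""
    if len(buffer) < 4:
        return None
    if buffer[0:1] != b"$":
        raise ValueError("not an interleaved frame")
    channel, length = struct.unpack("!BH", buffer[1:4])
    if len(buffer) < 4 + length:
        return None
    return channel, buffer[4:4 + length], buffer[4 + length:]


# Channel 0 carries RTP and channel 1 carries RTCP, as negotiated above.
stream = frame_interleaved(0, b"rtp-data") + frame_interleaved(1, b"rtcp")
channel, payload, rest = read_interleaved(stream)
print(channel, payload)
print(read_interleaved(rest)[:2])
```

Because RTSP replies travel on the same connection, a real reader also has to tell frames starting with the dollar sign apart from text responses starting with "RTSP/".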

One further delivery option for RTP and RTCP under RTSP is to wrap the delivery of all media streaming components inside traditional HTTP frame formats. This removes most barriers presented when using streaming media through firewalled environments, as even the most stringent administrator will typically allow HTTP traffic to traverse perimeter security. While HTTP and RTSP interleaved delivery will make the content available to the widest possible audience, the overhead of wrapping all RTP data inside either an existing TCP stream or, worse still, inside HTTP makes it the least efficient method of delivery.

To cope with the different options described previously, most streaming media clients let users configure their preferred delivery mechanism or mechanisms, and the timeout that should be imposed before failing over between them. What you will see from a client perspective is that the client application will first request that the stream be delivered using RTP in UDP, and if the stream does not arrive within x seconds (as it is potentially being blocked by an intermediate firewall), it will fall back to using RTP interleaved in the existing RTSP connection.
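
The try-UDP-then-fall-back behaviour can be pictured as a loop over an ordered list of transports. The sketch below is purely illustrative: open_stream and the two stub functions are hypothetical, and the stubs only pretend to attempt a delivery instead of touching the network.

```python
def open_stream(transports, timeout_s):
    """Return the name of the first transport that delivers media.

    `transports` is an ordered list of (name, attempt) pairs, and
    `attempt(timeout_s)` is a hypothetical callable that returns True
    if media arrived within the timeout.
    """
    for name, attempt in transports:
        if attempt(timeout_s):
            return name
    raise ConnectionError("no configured transport delivered media")


# Stubs standing in for real network attempts (assumptions for the demo).
def udp_blocked_by_firewall(timeout_s):
    return False  # pretend no RTP packet arrived before the timeout


def tcp_interleaved(timeout_s):
    return True  # pretend the interleaved stream started


chosen = open_stream(
    [
        ("rtp-over-udp", udp_blocked_by_firewall),
        ("rtp-interleaved-in-rtsp", tcp_interleaved),
    ],
    timeout_s=5.0,
)
print(chosen)
```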

RTSP and RTP—Further Reading

For further information on the RTSP and RTP protocols, RFCs 2326 and 1889, respectively, are a good source (RFC 1889 has since been obsoleted by RFC 3550).

http://www.cnblogs.com/whyandinside/archive/2009/08/30/1556572.html

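RTP
RTP (Real-time Transport Protocol) is a transport protocol for multimedia data streams on the Internet. The RTP specification details a standard packet format for delivering audio and video over the Internet. RTP is commonly used in streaming media systems (together with RTCP) and in video conferencing and push-to-talk systems (together with H.323 or SIP), which makes it the technical foundation of the IP telephony industry. RTP is used together with the RTP Control Protocol (RTCP) and is usually built on UDP.
RTP itself provides no mechanism for timely delivery and no other quality-of-service (QoS) guarantees; it relies on lower-layer services for that. RTP does not guarantee delivery or prevent out-of-order delivery, nor does it assume that the underlying network is reliable. What RTP does provide is sequence numbers: they let the receiver reconstruct the sender's packet order, and they can also be used to determine a packet's proper position, for example in video decoding, where packets need not be decoded in sequence.
RTP consists of two closely linked parts: RTP, which carries data with real-time properties, and the RTP Control Protocol (RTCP), which monitors quality of service and conveys information about the participants in an ongoing session.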
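
To make the role of the sequence number concrete, here is a small sketch that decodes the 12-byte fixed RTP header defined in RFC 3550 and puts packets that arrived out of order back in sequence. The helper names, the SSRC value, and payload type 96 are arbitrary choices for the demo, and the sort ignores 16-bit sequence number wraparound, which a real receiver has to handle.

```python
import struct


def parse_rtp_header(packet: bytes):
    """Decode the 12-byte fixed RTP header."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the fixed header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def make_packet(seq, timestamp, payload=b""):
    """Build a demo packet: version 2, payload type 96, fixed SSRC."""
    return struct.pack("!BBHII", 0x80, 96, seq, timestamp, 0x1234ABCD) + payload


# Packets arrive out of order; the sequence number restores the order.
arrived = [make_packet(seq, seq * 3000) for seq in (3, 1, 2)]
ordered = sorted(arrived, key=lambda p: parse_rtp_header(p)["sequence"])
print([parse_rtp_header(p)["sequence"] for p in ordered])
```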

RTCP
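The Real-time Transport Control Protocol (RTP Control Protocol, RTCP) is a sister protocol of the Real-time Transport Protocol (RTP). RTCP provides out-of-band control for RTP media streams. RTCP itself carries no media data, but works alongside RTP in packaging and delivering multimedia data. RTCP periodically transmits control data between the participants of a streaming multimedia session. Its main function is to provide feedback on the quality of service delivered by RTP.

RTCP gathers statistics about the media connection, such as bytes sent, packets sent, packets lost, jitter, and one-way and round-trip network delay. An application can use the information RTCP provides to try to improve quality of service, for example by throttling the flow or switching to a different codec. RTCP itself provides no encryption or authentication; SRTCP can be used for that purpose.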
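
As one example of these statistics, RFC 3550 defines interarrival jitter as a running estimate: for each packet, the change in transit time compared with the previous packet is folded in with a gain of 1/16. A minimal sketch, assuming send and arrival times are already expressed in the same clock units (the function name and sample values are invented):

```python
def interarrival_jitter(packets):
    """Estimate jitter from (rtp_timestamp, arrival_time) pairs.

    Both values must use the same clock units. Follows the running
    estimate J = J + (|D| - J) / 16 from RFC 3550.
    """
    jitter = 0.0
    previous_transit = None
    for sent, arrived in packets:
        transit = arrived - sent
        if previous_transit is not None:
            d = abs(transit - previous_transit)
            jitter += (d - jitter) / 16.0
        previous_transit = transit
    return jitter


steady = [(0, 10), (160, 170), (320, 330)]  # transit time never changes
bursty = [(0, 10), (160, 186)]              # transit time jumps by 16 units
print(interarrival_jitter(steady), interarrival_jitter(bursty))
```
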
SRTP & SRTCP
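The Secure Real-time Transport Protocol (SRTP) is a protocol defined on top of the Real-time Transport Protocol (RTP), intended to provide encryption, message authentication, integrity, and replay protection for RTP data in both unicast and multicast applications. It was developed by David Oran (Cisco) and Rolf Blom (Ericsson), and was first published by the IETF in March 2004 as RFC 3711.

Because RTP is closely tied to the RTP Control Protocol (RTCP), which can be used to control an RTP session, SRTP likewise has a companion protocol, Secure RTCP (SRTCP). SRTCP provides RTCP with the same kind of security features that SRTP provides for RTP.

Using SRTP or SRTCP is optional when using RTP or RTCP. Even when SRTP or SRTCP is used, all the features they provide (such as encryption and authentication) are optional and can be enabled or disabled independently. The one exception is message authentication, which is mandatory when SRTCP is used.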

RTSP
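RTSP (Real Time Streaming Protocol) is a protocol for controlling audio or video multimedia streams, and it allows several stream requests to be controlled at the same time. The network protocol used for transport is outside its scope: the server can choose TCP or UDP to deliver the stream content. Its syntax and operation are similar to HTTP/1.1, but it places no special emphasis on time synchronization, so it tolerates network delay fairly well. The ability to serve multiple stream requests at once (multicast) not only reduces network usage on the server side but also supports multi-party video conferencing. Because it operates much like HTTP/1.1, proxy server caching also applies to RTSP, and because RTSP supports redirection, the serving server can be switched according to actual load, so that excessive load is not concentrated on a single server and does not cause delay.
Relationship between RTSP and RTP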

SIP

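A SIP session uses up to four main components: SIP user agents, SIP registrar servers, SIP proxy servers, and SIP redirect servers. These systems complete a SIP session by exchanging messages that include SDP (which defines the content and characteristics of the message). The following is a brief overview of each SIP component and its role in the process.

SIP user agents (UAs) are end-user devices used to create and manage SIP sessions, such as mobile phones, multimedia handhelds, PCs, and PDAs. The user agent client sends messages. The user agent server responds to messages.
SIP registrar servers are databases that contain the location of every user agent in a domain. In SIP communication, these servers retrieve the participants' IP addresses and other relevant information and send them to the SIP proxy server.
SIP proxy servers accept session requests from a SIP UA and query the SIP registrar server for the address of the recipient UA. The proxy then forwards the session invitation either directly to the recipient UA (if it is in the same domain) or to another proxy server (if the UA is in a different domain).
SIP redirect servers allow SIP proxy servers to direct SIP session invitations to external domains. A SIP redirect server can share hardware with a SIP registrar server and a SIP proxy server.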

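A typical SIP session looks like this:

The following scenarios show how the SIP components cooperate to establish SIP sessions between UAs in the same domain and in different domains: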

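Establishing a SIP session within the same domain

This scenario describes how a SIP session is established between two users who subscribe to the same ISP and therefore use the same domain. User A uses a SIP phone. User B has a PC running a soft client that supports voice and video. After powering up, both users register their availability and IP address with the SIP proxy server in the ISP's network. User A initiates the call by telling the SIP proxy server that it wants to contact user B. The SIP proxy server then asks the SIP registrar server for user B's IP address and receives it. The SIP proxy server forwards user A's invitation to communicate with user B (using SDP), including the media user A wants to use. User B informs the SIP proxy server that it accepts user A's invitation and is ready to receive. The SIP proxy server relays this message to user A, which establishes the SIP session. The users then create a point-to-point RTP connection for the actual exchange between them.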
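
The same-domain flow can be mimicked with a toy in-memory model. This is only a sketch of who asks whom: the Registrar and Proxy classes are invented for illustration, no SIP messages are actually built or parsed, the callee is assumed to accept every invitation, and the addresses come from a documentation-only range.

```python
class Registrar:
    """Toy location database: maps a user name to a contact address."""

    def __init__(self):
        self._locations = {}

    def register(self, user, address):
        self._locations[user] = address

    def lookup(self, user):
        return self._locations.get(user)


class Proxy:
    """Toy proxy: resolves the callee through the registrar."""

    def __init__(self, registrar):
        self.registrar = registrar

    def invite(self, caller, callee, offered_media):
        address = self.registrar.lookup(callee)
        if address is None:
            return {"status": 404, "reason": "Not Found"}
        # Assumption: the callee always accepts the offered media.
        return {
            "status": 200,
            "reason": "OK",
            "from": caller,
            "contact": address,
            "media": offered_media,
        }


registrar = Registrar()
registrar.register("userA", "192.0.2.10:5060")
registrar.register("userB", "192.0.2.20:5060")

proxy = Proxy(registrar)
answer = proxy.invite("userA", "userB", ["audio", "video"])
print(answer["status"], answer["contact"])
```

After a successful answer, the two endpoints would exchange RTP directly using the contact and media information, without going through the proxy.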

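Establishing a SIP session across different domains

This scenario differs from the first one as follows. When user A invites user B, who is using a multimedia handheld, to a SIP session, the SIP proxy server in domain A recognizes that user B is not in the same domain. The SIP proxy server then queries a SIP redirect server for user B's IP address. The SIP redirect server can be in domain A, in domain B, or in both. The SIP redirect server returns user B's contact information to the SIP proxy server, which forwards the SIP session invitation to the SIP proxy server in domain B. The SIP proxy server in domain B delivers user A's invitation to user B. User B then sends its acceptance back along the same path the invitation took.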

SDP
SDP is intended for describing multimedia communication sessions for the purposes of session announcement, session invitation, and parameter negotiation. SDP does not deliver media itself but is used for negotiation between end points of media type, format, and all associated properties. The set of properties and parameters are often called a session profile. SDP is designed to be extensible to support new media types and formats.
The Session Description Protocol (SDP) is a format for describing streaming media initialization parameters in an ASCII string.
SDP started off as a component of the Session Announcement Protocol (SAP), but found other uses in conjunction with Real-time Transport Protocol (RTP), Real-time Streaming Protocol (RTSP), Session Initiation Protocol (SIP) and even as a standalone format for describing multicast sessions.
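
Since an SDP description is just lines of the form type=value, a rough parser fits in a few lines. The sketch below groups fields into a session section and one section per m= line; parse_sdp and the sample description are made up for illustration, and the sketch ignores less common syntax such as a port count written as port/count.

```python
def parse_sdp(text):
    """Group SDP lines into session-level fields and media sections."""
    session = {"attributes": [], "media": []}
    current = session
    for line in text.splitlines():
        if len(line) < 2 or line[1] != "=":
            continue  # skip blank or malformed lines
        key, value = line[0], line[2:]
        if key == "m":
            kind, port, proto, *formats = value.split()
            current = {
                "type": kind,
                "port": int(port),
                "proto": proto,
                "formats": formats,
                "attributes": [],
            }
            session["media"].append(current)
        elif key == "a":
            # Attributes attach to the section they appear in.
            current["attributes"].append(value)
        else:
            current[key] = value
    return session


example = """
v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=Example session
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
m=video 51372 RTP/AVP 96
a=rtpmap:96 H264/90000
"""
description = parse_sdp(example)
for media in description["media"]:
    print(media["type"], media["port"], media["attributes"])
```
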
Summary
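As their names suggest, SIP initiates a session and carries the SDP payload. The SDP body describes which media the session contains, who is invited, and so on. Once every invitee has been notified through their own endpoint, RTSP can be used to control the delivery of a particular media stream; for example, if an RTSP control message asks for video playback to start, the data is sent in real time using RTP (typically over UDP, sometimes over TCP), and RTCP takes care of QoS feedback during the transfer.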