The Road to SDN: An Intellectual History of Programmable Networks (Part 3)

2.2 Separating Control and Data Planes

In the early 2000s, increasing traffic volumes and a greater emphasis on network reliability, predictability, and performance led network operators to seek better approaches to certain network-management functions, such as control over the paths used to deliver traffic (a practice commonly known as traffic engineering). The means for performing traffic engineering using conventional routing protocols were primitive at best. Operators’ frustration with these approaches was recognized by a small, well-situated community of researchers who either worked for or interacted regularly with backbone network operators. These researchers explored pragmatic, near-term approaches that were either standards-driven or imminently deployable using existing protocols.

Specifically, conventional routers and switches embody a tight integration between the control and data planes. This coupling made various network-management tasks, such as debugging configuration problems and predicting or controlling routing behavior, exceedingly challenging. To address these challenges, various efforts to separate the data and control planes began to emerge.

Technology push and use pull. As the Internet flourished in the 1990s, the link speeds in backbone networks grew rapidly, leading equipment vendors to implement packet-forwarding logic directly in hardware, separate from the control-plane software. In addition, Internet Service Providers (ISPs) were struggling to manage the increasing size and scope of their networks, and the demands for greater reliability and new services (such as virtual private networks). In parallel with these two trends, the rapid advances in commodity computing platforms meant that servers often had substantially more memory and processing resources than the control-plane processor of a router deployed just one or two years earlier. These trends catalyzed two innovations:

• an open interface between the control and data planes, such as the ForCES (Forwarding and Control Element Separation) [86] interface standardized by the Internet Engineering Task Force (IETF) and the Netlink interface to the kernel-level packet-forwarding functionality in Linux [65]; and

• logically centralized control of the network, as seen in the Routing Control Platform (RCP) [12, 26] and SoftRouter [47] architectures, as well as the Path Computation Element (PCE) [25] protocol at the IETF.
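
The two innovations listed above can be pictured together in a few lines of Python. This is only a sketch under stated assumptions: `ForwardingElement`, `Controller`, `install`, and `push_routes` are hypothetical names invented for illustration, and they do not correspond to the ForCES, Netlink, RCP, or PCE interfaces. The sketch shows a data-plane device that exposes a single narrow interface, and one control program that decides for every device.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ForwardingElement:
    """Hypothetical data-plane device: it only stores entries it is given."""
    name: str
    table: Dict[str, str] = field(default_factory=dict)  # prefix -> next hop

    def install(self, prefix: str, next_hop: str) -> None:
        # The "open interface": the only way control logic touches the device.
        self.table[prefix] = next_hop


class Controller:
    """Hypothetical logically centralized controller for a set of elements."""

    def __init__(self, elements: List[ForwardingElement]) -> None:
        self.elements = {e.name: e for e in elements}

    def push_routes(self, routes: Dict[str, Dict[str, str]]) -> None:
        # routes maps element name -> {prefix: next hop}
        for name, entries in routes.items():
            for prefix, next_hop in entries.items():
                self.elements[name].install(prefix, next_hop)


r1, r2 = ForwardingElement("r1"), ForwardingElement("r2")
controller = Controller([r1, r2])
controller.push_routes({
    "r1": {"10.0.0.0/8": "r2"},
    "r2": {"10.0.0.0/8": "host-a"},
})
```

In this toy model the forwarding elements contain no route-selection logic at all; everything they know arrives through `install`, which is the property an open control/data-plane interface is meant to provide.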

These innovations were driven by industry’s demands for technologies to manage routing within an ISP network. Some early proposals for separating the data and control planes also came from academic circles, in both ATM networks [10, 30, 78] and active networks.

Compared to earlier research on active networking, these projects focused on pressing problems in network management, with an emphasis on: innovation by and for network administrators (rather than end users and researchers); programmability in the control plane (rather than the data plane); and network-wide visibility and control (rather than device-level configuration).

Network-management applications included selecting better network paths based on the current traffic load, minimizing transient disruptions during planned routing changes, giving customer networks more control over the flow of traffic, and redirecting or dropping suspected attack traffic. Several control applications ran in operational ISP networks using legacy routers, including the Intelligent Route Service Control Point (IRSCP) deployed to offer value-added services for virtual private network customers in AT&T’s tier-1 backbone network [77]. Although much of the work during this time focused on managing routing within a single ISP, some work [25, 26] also proposed ways to enable flexible route control across multiple administrative domains.

Moving control functionality off of network equipment and into separate servers made sense because network management is, by definition, a network-wide activity. Logically centralized routing controllers [12, 47, 77] were enabled by the emergence of open-source routing software [9, 40, 64] that lowered the barrier to creating prototype implementations. The advances in server technology meant that a single commodity server could store all of the routing state and compute all of the routing decisions for a large ISP network [12, 79]. This, in turn, enabled simple primary-backup replication strategies, where backup servers store the same state and perform the same computation as the primary server, to ensure controller reliability.

Intellectual contributions. The initial attempts to separate the control and data planes were relatively pragmatic, but they represented a significant conceptual departure from the Internet’s conventionally tight coupling of path computation and packet forwarding. The efforts to separate the network’s control and data planes resulted in several concepts that have been carried forward in subsequent SDN designs.

• Logically centralized control using an open interface to the data plane. The ForCES working group at the IETF proposed a standard, open interface to the data plane to enable innovation in control-plane software. The SoftRouter [47] used the ForCES API to allow a separate controller to install forwarding-table entries in the data plane, enabling the complete removal of control functionality from the routers. Unfortunately, ForCES was not adopted by the major router vendors, which hampered incremental deployment. Rather than waiting for new, open APIs to emerge, the RCP [12, 26] used an existing standard control-plane protocol (the Border Gateway Protocol) to install forwarding-table entries in legacy routers, enabling immediate deployment. OpenFlow also faced similar backwards compatibility challenges and constraints: in particular, the initial OpenFlow specification relied on backwards compatibility with hardware capabilities of commodity switches.

• Distributed state management. Logically centralized route controllers faced challenges involving distributed state management. A logically centralized controller must be replicated to cope with controller failure, but replication introduces the potential for inconsistent state across replicas. Researchers explored the likely failure scenarios and consistency requirements. At least in the case of routing control, the controller replicas did not need a general state management protocol, since each replica would eventually compute the same routes (after learning the same topology and routing information) and transient disruptions during routing-protocol convergence were acceptable even with legacy protocols [12]. For better scalability, each controller instance could be responsible for a separate portion of the topology. These controller instances could then exchange routing information with each other to ensure consistent decisions [79]. The challenges of building distributed controllers would arise again several years later in the context of distributed SDN controllers [46, 55]. Distributed SDN controllers face the far more general problem of supporting arbitrary controller applications, requiring more sophisticated solutions for distributed state management.
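
The claim that replicas "would eventually compute the same routes" rests on route computation being a deterministic function of the learned topology. The sketch below illustrates that idea with a shortest-path computation that breaks ties by node name. It is an assumption-laden illustration, not the algorithm used by the RCP or any other system cited here; `compute_routes` and the link format are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Link = Tuple[str, str, int]  # (node, node, cost), assumed bidirectional


def compute_routes(links: List[Link], source: str) -> Dict[str, str]:
    """Return {destination: first hop} for shortest paths from `source`.

    Ties are broken by node name, so the result depends only on the set of
    links, not on the order in which a replica happened to learn them.
    """
    graph: Dict[str, List[Tuple[str, int]]] = {}
    for a, b, cost in links:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    first_hop: Dict[str, str] = {}
    done = {source}
    # Heap entries are (path cost, node, first hop); tuples compare field by
    # field, which gives the deterministic tie-break.
    heap = [(cost, nbr, nbr) for nbr, cost in graph.get(source, [])]
    heapq.heapify(heap)
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        first_hop[node] = hop
        for nbr, link_cost in graph[node]:
            if nbr not in done:
                heapq.heappush(heap, (cost + link_cost, nbr, hop))
    return first_hop


links = [("a", "b", 1), ("b", "c", 1), ("a", "c", 2)]
primary = compute_routes(links, "a")
backup = compute_routes(list(reversed(links)), "a")  # same links, other order
```

Here `primary` and `backup` agree even though the two replicas learned the links in different orders, so a backup can take over without running a separate state-synchronization protocol. Arbitrary controller applications offer no such guarantee, which is why distributed SDN controllers need more general mechanisms.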

Myths and misconceptions. When these new architectures were proposed, critics viewed them with healthy skepticism, often vehemently arguing that logically centralized route control would violate “fate sharing”, since the controller could fail independently from the devices responsible for forwarding traffic. Many network operators and researchers viewed separating the control and data planes as an inherently bad idea, as initially there was no clear articulation of how these networks would continue to operate correctly if a controller failed. Skeptics also worried that logically centralized control moved away from the conceptually simple model of the routers achieving distributed consensus, where they all (eventually) have a common view of network state (e.g., through flooding). In logically centralized control, each router has only a purely local view of the outcome of the route-selection process.

In fact, by the time these projects took root, even the traditional distributed routing solutions already violated these principles. Moving packet-forwarding logic into hardware meant that a router’s control-plane software could fail independently from the data plane. Similarly, distributed routing protocols adopted scaling techniques, such as OSPF areas and BGP route reflectors, where routers in one region of a network had limited visibility into the routing information in other regions. As we discuss in the next section, the separation of the control and data planes somewhat paradoxically enabled researchers to think more clearly about distributed state management: the decoupling of the control and data planes catalyzed the emergence of a state management layer that maintains a consistent view of network state.

In search of generality. Dominant equipment vendors had little incentive to adopt standard data-plane APIs like ForCES, since open APIs could enable new entrants into the marketplace. The resulting need to rely on existing routing protocols to control the data plane imposed significant limitations on the range of applications that programmable controllers could support. Conventional IP routing protocols compute routes for destination IP address blocks, rather than providing a wider range of functionality (e.g., dropping, flooding, or modifying packets) based on a wider range of header fields (e.g., MAC and IP addresses, TCP and UDP port numbers), as OpenFlow does. In the end, although the industry prototypes and standardization efforts made some progress, widespread adoption remained elusive.
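
The contrast between destination-based routing and OpenFlow-style matching can be made concrete with a small sketch. Everything here is an illustrative assumption: `lookup_next_hop`, `apply_flow_rules`, the dict-based packet, and the field names `ip_dst` and `tcp_dst` are hypothetical, rule order stands in for rule priority, and none of this reflects the OpenFlow message format.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple


def lookup_next_hop(routes: Dict[str, str], dst: str) -> Optional[str]:
    """Destination-based forwarding: the longest matching prefix wins."""
    addr = ipaddress.ip_address(dst)
    best: Optional[Tuple[int, str]] = None
    for prefix, next_hop in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, next_hop)
    return best[1] if best else None


def apply_flow_rules(rules: List[Tuple[dict, str]], packet: dict) -> str:
    """Match-action forwarding: the first rule whose fields all match decides."""
    for match, action in rules:
        if all(packet.get(key) == value for key, value in match.items()):
            return action
    return "drop"  # assumed default when no rule matches


routes = {"10.0.0.0/8": "r2", "10.1.0.0/16": "r3"}
rules = [
    ({"ip_dst": "10.1.2.3", "tcp_dst": 22}, "drop"),
    ({"ip_dst": "10.1.2.3"}, "forward:r3"),
]
ssh_packet = {"ip_dst": "10.1.2.3", "tcp_dst": 22}
web_packet = {"ip_dst": "10.1.2.3", "tcp_dst": 80}
```

With these inputs, `lookup_next_hop(routes, "10.1.2.3")` can only answer with a next hop for the destination, whereas `apply_flow_rules` treats `ssh_packet` and `web_packet` differently even though they share a destination address. That difference is the extra range of functionality the paragraph attributes to OpenFlow.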

To broaden the vision of control and data plane separation, researchers started exploring clean-slate architectures for logically centralized control. The 4D project [35] advocated four main layers: the data plane (for processing packets based on configurable rules), the discovery plane (for collecting topology and traffic measurements), the dissemination plane (for installing packet-processing rules), and a decision plane (consisting of logically centralized controllers that convert network-level objectives into packet-handling state). Several groups proceeded to design and build systems that applied this high-level approach to new application areas [16, 85], beyond route control. In particular, the Ethane project [16] (and its direct predecessor, SANE [17]) created a logically centralized, flow-level solution for access control in enterprise networks. Ethane reduces the switches to flow tables that are populated by the controller based on high-level security policies. The Ethane project, and its operational deployment in the Stanford computer science department, set the stage for the creation of OpenFlow. In particular, the simple switch design in Ethane became the basis of the original OpenFlow API.
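
The idea of switches reduced to flow tables that a controller populates from a high-level policy can be sketched as follows. This is a toy model and not Ethane's actual design: `PolicyController`, `FlowSwitch`, the (source, destination) flow key, and the choice to consult the controller on the first packet of each unknown flow are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

Flow = Tuple[str, str]  # (source host, destination host), a simplified flow key


class PolicyController:
    """Hypothetical controller holding a network-wide access policy."""

    def __init__(self, allowed: Set[Flow]) -> None:
        self.allowed = allowed

    def decide(self, flow: Flow) -> str:
        return "forward" if flow in self.allowed else "drop"


class FlowSwitch:
    """Hypothetical switch: a flow table plus a fallback to the controller."""

    def __init__(self, controller: PolicyController) -> None:
        self.controller = controller
        self.flow_table: Dict[Flow, str] = {}
        self.misses = 0  # number of times the controller was consulted

    def handle(self, src: str, dst: str) -> str:
        flow = (src, dst)
        if flow not in self.flow_table:
            # Table miss: ask the controller once, then cache its decision.
            self.misses += 1
            self.flow_table[flow] = self.controller.decide(flow)
        return self.flow_table[flow]


switch = FlowSwitch(PolicyController({("laptop", "printer")}))
first = switch.handle("laptop", "printer")
second = switch.handle("laptop", "printer")  # served from the flow table
blocked = switch.handle("guest", "database")
```

The switch itself contains no policy. It caches whatever the controller decided, so changing the `allowed` set in one place would change behavior network-wide for new flows.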
