Source Code Analysis of Linux Network Namespaces

I. Creating a network namespace

Reading the iproute2 source shows that running `ip netns add ns1` essentially calls `unshare(CLONE_NEWNET)` to create a new network namespace. From there, we look at how the kernel implements the unshare system call to see how a network namespace is actually created.
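As a rough user-space illustration of that call, here is a minimal sketch. The helper name `enter_new_netns` is our own and does not come from iproute2. The call needs CAP_SYS_ADMIN, so for an unprivileged user it normally fails with EPERM.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical helper (not part of iproute2): detach the calling process
 * from its current network namespace and place it in a freshly created one.
 * Returns 0 on success, or a negative errno value (typically -EPERM when
 * the caller lacks CAP_SYS_ADMIN).
 */
int enter_new_netns(void)
{
	if (unshare(CLONE_NEWNET) == -1)
		return -errno;
	return 0;
}
```

A caller would check the return value and, on success, continue running inside the new namespace, which at that point contains only its own loopback device.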

1. The kernel implements unshare() in two steps. The first step calls unshare_nsproxy_namespaces to create a new nsproxy. The nsproxy data structure is as follows:

struct nsproxy {
	atomic_t count;
	struct uts_namespace *uts_ns;
	struct ipc_namespace *ipc_ns;
	struct mnt_namespace *mnt_ns;
	struct pid_namespace *pid_ns;
	struct net 	     *net_ns;
};

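An nsproxy instance holds pointers to five kinds of namespace structures. Each process has one nsproxy, which represents the namespaces the process lives in. When a process calls unshare(), the kernel allocates a new nsproxy for it, creates the namespaces requested by the flags, and carries over the remaining ones from the old nsproxy. For example, for `unshare(CLONE_NEWNET)` the kernel creates a new network namespace for the current process and leaves the other namespaces unchanged.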

int unshare_nsproxy_namespaces(unsigned long unshare_flags, struct nsproxy **new_nsp, struct cred *new_cred, struct fs_struct *new_fs)

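  1. If unshare_flags contains none of the CLONE_NEW* flags, no namespace needs to be created, so return immediately.

  2. Check whether the caller has CAP_SYS_ADMIN in its user namespace; if not, return an error.

  3. Call *new_nsp = create_new_namespaces(unshare_flags, current, user_ns, new_fs ? new_fs : current->fs) to create the new namespaces.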

static struct nsproxy *create_new_namespaces(unsigned long flags, struct task_struct *tsk, struct user_namespace *user_ns, struct fs_struct *new_fs)

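  1. Call new_nsp = create_nsproxy() to allocate a new nsproxy structure.

  2. Call a series of functions such as new_nsp->mnt_ns = copy_mnt_ns(...) to initialize each namespace pointer in the new nsproxy. If flags asks for a new namespace of a given type, the corresponding copy_* function creates one; otherwise it keeps using the existing namespace.

  3. Finally, call new_nsp->net_ns = copy_net_ns(flags, user_ns, tsk->nsproxy->net_ns). In the same way, the corresponding bit in flags decides whether a new network namespace is created or the existing one is reused.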
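The copy-or-create pattern in the steps above can be sketched as a small user-space model. This is not kernel code: every `model_*` name and `MODEL_*` flag is made up for illustration, each namespace is reduced to a reference count, and allocation failure is handled with an assert rather than the unwinding a real implementation would need.

```c
#include <assert.h>
#include <stdlib.h>

/* Model-only flag bits; the kernel uses the CLONE_NEW* constants instead. */
enum {
	MODEL_NEWUTS = 1 << 0,
	MODEL_NEWIPC = 1 << 1,
	MODEL_NEWNS  = 1 << 2,	/* mount namespace */
	MODEL_NEWPID = 1 << 3,
	MODEL_NEWNET = 1 << 4,
};

/* Stand-in for any of the five namespace types: just a reference count. */
struct model_ns {
	int refcount;
};

/* Simplified counterpart of struct nsproxy. */
struct model_nsproxy {
	struct model_ns *uts_ns;
	struct model_ns *ipc_ns;
	struct model_ns *mnt_ns;
	struct model_ns *pid_ns;
	struct model_ns *net_ns;
};

/*
 * Common shape of the copy_*_ns() helpers: create a fresh namespace when
 * the flag bit is set, otherwise take another reference on the old one.
 */
struct model_ns *model_copy_ns(unsigned long flags, unsigned long bit,
			       struct model_ns *old)
{
	struct model_ns *ns;

	if (!(flags & bit)) {
		old->refcount++;
		return old;
	}
	ns = calloc(1, sizeof(*ns));
	assert(ns != NULL);	/* model only: no error unwinding */
	ns->refcount = 1;
	return ns;
}

/* Build a new proxy from an old one, sharing whatever is not flagged. */
struct model_nsproxy *model_create_new_namespaces(unsigned long flags,
					const struct model_nsproxy *old)
{
	struct model_nsproxy *p = calloc(1, sizeof(*p));

	assert(p != NULL);	/* model only: no error unwinding */
	p->uts_ns = model_copy_ns(flags, MODEL_NEWUTS, old->uts_ns);
	p->ipc_ns = model_copy_ns(flags, MODEL_NEWIPC, old->ipc_ns);
	p->mnt_ns = model_copy_ns(flags, MODEL_NEWNS, old->mnt_ns);
	p->pid_ns = model_copy_ns(flags, MODEL_NEWPID, old->pid_ns);
	p->net_ns = model_copy_ns(flags, MODEL_NEWNET, old->net_ns);
	return p;
}
```

Calling `model_create_new_namespaces(MODEL_NEWNET, &old)` yields a proxy whose `net_ns` is a new object while the other four pointers are shared with `old`, which mirrors what `unshare(CLONE_NEWNET)` does to a process.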

struct net *copy_net_ns(unsigned long flags, struct user_namespace *user_ns, struct net *old_net)

  1. If flags does not contain CLONE_NEWNET, return old_net; otherwise a new network namespace has to be created.

  2. Call net = net_alloc() to allocate a new struct net.

  3. Call rv = setup_net(net, user_ns) to initialize the newly allocated net structure.

  4. Call list_add_tail_rcu(&net->list, &net_namespace_list) to add the new network namespace to net_namespace_list, the global list of network namespaces.

static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)

  1. Initialize some of the fields in net.

  2. Walk pernet_list and, for each struct pernet_operations on it, call err = ops_init(ops, net) so that every module performs its own per-namespace initialization.

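Every newly created network namespace contains a loopback device by default. How does that come about? When the DEV (network device) module is initialized, it calls register_pernet_device(&loopback_net_ops), which adds loopback_net_ops to pernet_list. The loopback_net_ops structure looks like this:

struct pernet_operations __net_initdata loopback_net_ops = {
	.init = loopback_net_init,
};

ops_init(ops, net) then calls loopback_net_init(), which creates the loopback device that belongs to this network namespace. Other network resources, such as the routing tables, are initialized in the same way.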
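To make the registration-then-iteration mechanism concrete, here is a small user-space model. It is only a sketch under our own assumptions: the `model_*` names are invented, pernet_list is reduced to a fixed-size array, a namespace only records device names, and the cleanup the kernel performs when an init hook fails is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_DEVS 4
#define MODEL_MAX_OPS  8

/* Simplified stand-in for struct net: it only tracks device names. */
struct model_net {
	const char *dev_names[MODEL_MAX_DEVS];
	int ndevs;
};

/* Simplified stand-in for struct pernet_operations. */
struct model_pernet_ops {
	int (*init)(struct model_net *net);
};

/* Model of pernet_list: registered subsystems, in registration order. */
static struct model_pernet_ops *model_pernet_list[MODEL_MAX_OPS];
static int model_pernet_count;

/* Model of register_pernet_device(): remember the ops for later use. */
int model_register_pernet(struct model_pernet_ops *ops)
{
	assert(ops != NULL);
	if (model_pernet_count == MODEL_MAX_OPS)
		return -1;
	model_pernet_list[model_pernet_count++] = ops;
	return 0;
}

/* Plays the role of loopback_net_init(): give the namespace its own "lo". */
int model_loopback_init(struct model_net *net)
{
	if (net->ndevs == MODEL_MAX_DEVS)
		return -1;
	net->dev_names[net->ndevs++] = "lo";
	return 0;
}

struct model_pernet_ops model_loopback_ops = {
	.init = model_loopback_init,
};

/* Model of setup_net(): run every registered init hook on the namespace. */
int model_setup_net(struct model_net *net)
{
	int i, err;

	memset(net, 0, sizeof(*net));
	for (i = 0; i < model_pernet_count; i++) {
		if (!model_pernet_list[i]->init)
			continue;
		err = model_pernet_list[i]->init(net);
		if (err < 0)
			return err;	/* the kernel would also undo earlier inits */
	}
	return 0;
}
```

After `model_register_pernet(&model_loopback_ops)` has run once, every namespace passed to `model_setup_net()` ends up with its own "lo" entry, without the setup code knowing anything about loopback devices.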

2. Once the nsproxy has been created, the second step calls switch_task_namespaces(current, new_nsproxy) to replace the current process's nsproxy.

II. Moving network devices between network namespaces

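The command `ip link set eth0 netns ns1` moves eth0 into the network namespace ns1. Note that a device flagged NETIF_F_NETNS_LOCAL cannot be moved between namespaces. Physical devices start out in the root namespace. When a namespace is deleted, devices without NETIF_F_NETNS_LOCAL are moved back to the root namespace, while NETIF_F_NETNS_LOCAL devices are deleted.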

int dev_change_net_namespace(struct net_device *dev, struct net *net, const char *pat)

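  1. If NETIF_F_NETNS_LOCAL is set in dev->features, or the device's state is not NETREG_REGISTERED, return with an error.

  2. Run a series of checks, close the device, unlink it from its current device chain, and send the notifications that the device is being removed.

  3. Call dev_net_set(dev, net) to place the device in the new network namespace. This simply sets the nd_net field of struct net_device to net.

  4. Call __dev_get_by_index(net, dev->ifindex) to check whether the device's ifindex is already in use in the target namespace; if there is a conflict, allocate a new index.

  5. Finally, send the notifications that a new device has been added.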
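The five steps above can be sketched as a user-space model. Again, this is a simplification under our own assumptions rather than kernel code: all `model_*` names are invented, a namespace is a small array of device pointers, notifications are omitted, and the replacement index is simply the smallest unused one, which is not necessarily how the kernel chooses it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_DEVS 8

struct model_net;

/* Simplified stand-in for struct net_device. */
struct model_dev {
	int ifindex;
	int netns_local;	/* models the NETIF_F_NETNS_LOCAL feature bit */
	int registered;		/* models reg_state == NETREG_REGISTERED */
	struct model_net *net;	/* models the nd_net field */
};

/* Simplified stand-in for struct net: a per-namespace device table. */
struct model_net {
	struct model_dev *devs[MODEL_MAX_DEVS];
	int ndevs;
};

struct model_dev *model_get_by_index(struct model_net *net, int ifindex)
{
	for (int i = 0; i < net->ndevs; i++)
		if (net->devs[i]->ifindex == ifindex)
			return net->devs[i];
	return NULL;
}

/* Model choice: pick the smallest positive index unused in this namespace. */
int model_new_index(struct model_net *net)
{
	int idx = 1;

	while (model_get_by_index(net, idx))
		idx++;
	return idx;
}

/* Register a device in a namespace for the first time. */
int model_list_dev(struct model_net *net, struct model_dev *dev)
{
	if (net->ndevs == MODEL_MAX_DEVS)
		return -ENOSPC;
	net->devs[net->ndevs++] = dev;
	dev->net = net;
	return 0;
}

/* Remove a device from its current namespace's table. */
void model_unlist_dev(struct model_dev *dev)
{
	struct model_net *net = dev->net;

	assert(net != NULL);
	for (int i = 0; i < net->ndevs; i++) {
		if (net->devs[i] == dev) {
			net->devs[i] = net->devs[--net->ndevs];
			break;
		}
	}
}

/* Model of dev_change_net_namespace(), minus notifications. */
int model_change_netns(struct model_dev *dev, struct model_net *net)
{
	if (dev->netns_local || !dev->registered)
		return -EINVAL;			/* step 1 */
	if (dev->net == net)
		return 0;
	if (net->ndevs == MODEL_MAX_DEVS)
		return -ENOSPC;
	model_unlist_dev(dev);			/* step 2 */
	dev->net = net;				/* step 3 */
	if (model_get_by_index(net, dev->ifindex))
		dev->ifindex = model_new_index(net);	/* step 4 */
	net->devs[net->ndevs++] = dev;		/* step 5 */
	return 0;
}
```

In this model, moving a device whose ifindex collides with one already present in the target namespace succeeds but renumbers the device, while a device with `netns_local` set is rejected with -EINVAL.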

III. Summary

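Having gone through the network-namespace-related kernel source, we can see that adding the network namespace feature did not require sweeping changes to the code as a whole. In effect, it wraps network resources that used to be unique globals, such as the device list and the routing tables, in a struct net. Creating several net structures therefore amounts to having several copies of the original network space. In essence, network namespaces can be seen as a generalization of the network space from the special case to the general one: the original, globally unique network space is just one particular instance of it.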