Adding and Deleting Routes

Routing Table Structure

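To make lookups fast for the different operations, the kernel defines several hash tables that index the same underlying structures in different ways.

1. Hash tables keyed by netmask length

The kernel defines an array of 33 entries, one for each netmask length from 0 to 32; netmask length 0 represents the default route. The tb_data member of fib_table is an fn_hash structure: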

struct fn_hash {
    struct fn_zone    *fn_zones[33];
    struct fn_zone    *fn_zone_list;
};

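It contains an array of 33 fn_zone pointers, one per netmask length from 0 to 32. All non-empty fn_zone entries are linked together through fn_zone_list.

For netmask length 0, that is, the default-route fn_zone, the hash table has size 1 and so degenerates into a single linked list, because there are normally very few default routes.

fib_node represents a unique subnet, and its fn_key member distinguishes one fib_node from another. Note that a fib_node represents a route to a subnet (a host route of length 32 can be seen as a subnet that contains a single host).

fib_alias represents the different routes to the same subnet, which differ in their TOS value. The fib_alias entries of a fib_node are kept sorted by IP TOS. Each fib_alias points to a fib_info, and it is the fib_info that stores the actual routing information.

When the number of entries in an fn_zone exceeds twice the size of its hash table, the hash table is enlarged to keep lookup time down.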
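How a destination and a netmask length map onto a zone and a key can be sketched as follows. This is a minimal user-space illustration, not the kernel code: the helper names make_mask, prefix_is_valid, and zone_key are hypothetical, and addresses are held in host byte order for readability, whereas the kernel works on network-order __be32 values.

```c
#include <stdint.h>
#include <stdbool.h>

/* Netmask for a prefix length 0..32, in host byte order (an assumption
 * of this sketch; the kernel keeps addresses in network byte order). */
uint32_t make_mask(unsigned int len)
{
    /* Shifting a 32-bit value by 32 is undefined, so treat 0 apart. */
    return len == 0 ? 0 : ~(uint32_t)0 << (32 - len);
}

/* A destination is a valid prefix only if none of its host bits are set. */
bool prefix_is_valid(uint32_t dst, unsigned int len)
{
    return (dst & ~make_mask(len)) == 0;
}

/* Key that identifies the subnet inside the zone for this prefix length. */
uint32_t zone_key(uint32_t dst, unsigned int len)
{
    return dst & make_mask(len);
}
```

The prefix length selects the zone (the index into the 33-entry array), and the key selects the node inside that zone's hash table.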

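2. Hash tables for looking up fib_info structures directly

The kernel defines two hash tables through which fib_info structures can be found directly:

fib_info_hash: every fib_info structure is linked into this table.

fib_info_laddrhash: routes with a preferred source address are also linked into this table.

fib_create_info checks whether the number of fib_info entries, fib_info_cnt, has reached the hash table size, fib_hash_size. If it has, both fib_info_hash and fib_info_laddrhash are reallocated at twice their previous size, and the old fib_info entries are moved into the new tables.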
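The growth rule can be sketched in isolation. The function names table_needs_grow and next_table_size are hypothetical; the logic mirrors the check in fib_create_info listed later in this article, where an empty table starts at size 1 and otherwise doubles.

```c
#include <stdbool.h>

/* True when the table must grow before another entry is added. */
bool table_needs_grow(unsigned int count, unsigned int size)
{
    return count >= size;
}

/* Next table size: double it, starting from 1 when the table is empty. */
unsigned int next_table_size(unsigned int size)
{
    unsigned int new_size = size << 1;

    return new_size ? new_size : 1;
}
```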

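3. A hash table indexed by net_device, used to find next-hop information

A fib_info may contain several fib_nh entries (when the multipath build option is enabled). Each fib_nh holds a net_device, the device through which packets for that next hop are sent. When a device is shut down, all routes associated with it must be removed (fib_sync_down); when a device is brought up, the routes reachable through it must be re-enabled.

fib_alias and fib_info are not in a one-to-one relationship: several fib_alias entries can share one fib_info, and the fib_treeref reference count in fib_info records how many fib_alias entries use it.

For example, in the figure below:

1. There are 4 routes, because there are 4 fib_alias structures.

2. These 4 routes lead to 3 different subnets (3 fib_node structures), one of which holds two fib_alias entries.

3. Two of the routes have the same next hop, so the fa_info of the two fib_alias entries points to the same fib_info.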
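The sharing of one fib_info by several fib_alias entries can be illustrated with a small reference-counting sketch. The struct and function names here (info, alias, alias_attach, alias_detach) are hypothetical stand-ins; only the idea of a per-info counter of referencing aliases comes from the text above.

```c
#include <stddef.h>

/* Simplified stand-ins for fib_info and fib_alias. */
struct info  { int treeref; };        /* number of aliases using this entry */
struct alias { struct info *fa_info; };

/* Point an alias at a (possibly shared) info entry and count the reference. */
void alias_attach(struct alias *fa, struct info *fi)
{
    fa->fa_info = fi;
    fi->treeref++;
}

/* Drop the alias's reference.  Returns 1 when this was the last user,
 * meaning the info entry can now be unlinked and released. */
int alias_detach(struct alias *fa)
{
    struct info *fi = fa->fa_info;

    fa->fa_info = NULL;
    return --fi->treeref == 0;
}
```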

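[Figure: example with four fib_alias entries, three fib_node entries, and one fib_info shared by two fib_alias entries]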

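Route Scope

RT_SCOPE_NOWHERE

The route cannot reach any host; in other words, there is no route to the destination address.

RT_SCOPE_HOST

The destination is an interface address of the local host; routes of this scope are added automatically by fib_add_ifaddr.

RT_SCOPE_LINK

The destination is on a directly connected (local) network.

RT_SCOPE_UNIVERSE

The destination is on a network that is not directly connected, so at least one next-hop gateway is required.

The scope of a route and the scope of a locally configured address can be set explicitly by the user or left to the kernel's default, whereas the scope of a next hop (fib_nh) can only be set by fib_check_nh. Given a route and its next hop, the scope of the fib_nh is the scope of the route used to reach that next hop. Every time a host forwards a packet, the packet should get closer to its final destination; therefore the scope of a route must be wider than or equal to the scope of the route to its next hop. (The numeric values run the other way: RT_SCOPE_UNIVERSE, the widest scope, is 0.)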

[Figure: host A reaching host C through router RT]

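A wants to send a packet to C. The scope of A's route to C is RT_SCOPE_UNIVERSE and its next hop is RT; the scope of A's route to RT is RT_SCOPE_LINK, which is narrower than RT_SCOPE_UNIVERSE.

A wants to send a packet to itself. The scope of the route from A to A is RT_SCOPE_HOST; there is no next hop, so the next-hop scope is RT_SCOPE_NOWHERE.

Route resolution ends when a lookup returns RT_SCOPE_HOST or RT_SCOPE_LINK: RT_SCOPE_HOST means the destination is the local host, and RT_SCOPE_LINK means the destination is directly connected and can be reached using the L2 protocol.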
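The rule that a next hop must be reached through a narrower scope than the route itself can be sketched numerically. The enum repeats the scope values from linux/rtnetlink.h so the example builds without kernel headers; the helper name nexthop_lookup_scope is hypothetical, and its body mirrors the scope that fib_check_nh (listed later in this article) passes to fib_lookup when it resolves a gateway.

```c
/* Scope values as defined in <linux/rtnetlink.h>.  A wider scope has a
 * smaller number, so "narrower" means numerically greater. */
enum rt_scope {
    RT_SCOPE_UNIVERSE = 0,
    RT_SCOPE_SITE     = 200,
    RT_SCOPE_LINK     = 253,
    RT_SCOPE_HOST     = 254,
    RT_SCOPE_NOWHERE  = 255
};

/* Hypothetical helper: the minimum (numeric) scope that the route to the
 * gateway must have, given the scope of the route being added.  It is
 * one step narrower than the route, and never wider than RT_SCOPE_LINK. */
int nexthop_lookup_scope(int route_scope)
{
    int scope = route_scope + 1;

    if (scope < RT_SCOPE_LINK)
        scope = RT_SCOPE_LINK;
    return scope;
}
```

For a RT_SCOPE_UNIVERSE route, the gateway must therefore be reachable with RT_SCOPE_LINK or narrower, which matches the A, RT, C example above.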

Routing Table Initialization

#ifdef CONFIG_IP_MULTIPLE_TABLES
/* May be called at any time */
struct fib_table * fib_hash_init(u32 id)
#else
/* Only called during initialization, to create the local and main routing tables */
struct fib_table * __init fib_hash_init(u32 id)
#endif
{
    struct fib_table *tb;

    if (fn_hash_kmem == NULL)
        fn_hash_kmem = kmem_cache_create("ip_fib_hash",
                         sizeof(struct fib_node),
                         0, SLAB_HWCACHE_ALIGN,
                         NULL, NULL);

    if (fn_alias_kmem == NULL)
        fn_alias_kmem = kmem_cache_create("ip_fib_alias",
                          sizeof(struct fib_alias),
                          0, SLAB_HWCACHE_ALIGN,
                          NULL, NULL);

    tb = kmalloc(sizeof(struct fib_table) + sizeof(struct fn_hash),
             GFP_KERNEL);
    if (tb == NULL)
        return NULL;

    tb->tb_id = id;
    tb->tb_lookup = fn_hash_lookup;
    tb->tb_insert = fn_hash_insert;
    tb->tb_delete = fn_hash_delete;
    tb->tb_flush = fn_hash_flush;
    tb->tb_select_default = fn_hash_select_default;
    tb->tb_dump = fn_hash_dump;
    memset(tb->tb_data, 0, sizeof(struct fn_hash));
    return tb;
}

Route Insertion

When a route is added or deleted, the flag combinations are as follows:

CLI keyword     Operation      Flags                          Kernel handler
add             RTM_NEWROUTE   NLM_F_EXCL | NLM_F_CREATE      inet_rtm_newroute
change          RTM_NEWROUTE   NLM_F_REPLACE                  inet_rtm_newroute
replace         RTM_NEWROUTE   NLM_F_CREATE | NLM_F_REPLACE   inet_rtm_newroute
prepend         RTM_NEWROUTE   NLM_F_CREATE                   inet_rtm_newroute
append          RTM_NEWROUTE   NLM_F_CREATE | NLM_F_APPEND    inet_rtm_newroute
test            RTM_NEWROUTE   NLM_F_EXCL                     inet_rtm_newroute
delete          RTM_DELROUTE   None                           inet_rtm_delroute
list/lst/show   RTM_GETROUTE   None                           inet_dump_fib
get             RTM_GETROUTE   NLM_F_REQUEST                  inet_rtm_getroute
flush           RTM_GETROUTE   None                           None
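The RTM_NEWROUTE rows of this listing can be expressed as a lookup table. This is only an illustrative sketch: the SK_F_* macros are local copies of the NLM_F_* values from linux/netlink.h (defined here so the example builds without kernel headers), and route_cmds and route_cmd_flags are hypothetical names, not part of iproute2 or the kernel.

```c
#include <string.h>

/* Local copies of the NLM_F_* modifier values from <linux/netlink.h>. */
#define SK_F_REPLACE 0x100
#define SK_F_EXCL    0x200
#define SK_F_CREATE  0x400
#define SK_F_APPEND  0x800

struct route_cmd {
    const char  *keyword;
    unsigned int flags;
};

/* One entry per CLI keyword that sends RTM_NEWROUTE. */
static const struct route_cmd route_cmds[] = {
    { "add",     SK_F_EXCL | SK_F_CREATE },
    { "change",  SK_F_REPLACE },
    { "replace", SK_F_CREATE | SK_F_REPLACE },
    { "prepend", SK_F_CREATE },
    { "append",  SK_F_CREATE | SK_F_APPEND },
    { "test",    SK_F_EXCL },
};

/* Returns the flag combination for a keyword, or -1 if it is unknown. */
int route_cmd_flags(const char *keyword)
{
    size_t i;

    for (i = 0; i < sizeof(route_cmds) / sizeof(route_cmds[0]); i++)
        if (strcmp(route_cmds[i].keyword, keyword) == 0)
            return (int)route_cmds[i].flags;
    return -1;
}
```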


Insertion into the routing table is performed by fn_hash_insert:
static int fn_hash_insert(struct fib_table *tb, struct fib_config *cfg)
{
    struct fn_hash *table = (struct fn_hash *) tb->tb_data;
    struct fib_node *new_f, *f;
    struct fib_alias *fa, *new_fa;
    struct fn_zone *fz;
    struct fib_info *fi;
    u8 tos = cfg->fc_tos;
    __be32 key;
    int err;

    /* The netmask length cannot exceed 32 */
    if (cfg->fc_dst_len > 32)
        return -EINVAL;

    /* Get the fn_zone for this netmask length; if it does not exist yet,
     * create it and link it into table->fn_zone_list */
    fz = table->fn_zones[cfg->fc_dst_len];
    if (!fz && !(fz = fn_new_zone(table, cfg->fc_dst_len)))
        return -ENOBUFS;

    key = 0;
    if (cfg->fc_dst) {
        /* As noted above, a fib_node represents a route to a subnet; a /32
         * can be seen as a subnet with a single IP address */
        if (cfg->fc_dst & ~FZ_MASK(fz))
            return -EINVAL;
        key = fz_key(cfg->fc_dst, fz);
    }

    /* Create the fib_info and initialize nh_oif, nh_gw, nh_flags, nh_scope,
     * nh_dev and so on; finally link the fib_info into fib_info_hash and
     * fib_info_laddrhash, and link every fib_nh that has a device into
     * fib_info_devhash */
    fi = fib_create_info(cfg);
    if (IS_ERR(fi))
        return PTR_ERR(fi);

    /* When the zone holds more than twice as many entries as its hash table
     * has buckets, grow the table and move the existing fib_node entries
     * into the new one */
    if (fz->fz_nent > (fz->fz_divisor<<1) &&
        fz->fz_divisor < FZ_MAX_DIVISOR &&
        (cfg->fc_dst_len == 32 ||
         (1 << cfg->fc_dst_len) > fz->fz_divisor))
        fn_rehash_zone(fz);

    /* Check whether a route to this subnet already exists */
    f = fib_find_node(fz, key);

    if (!f)
        fa = NULL;
    else
        fa = fib_find_alias(&f->fn_alias, tos, fi->fib_priority);

    /* Now fa, if non-NULL, points to the first fib alias
     * with the same keys [prefix,tos,priority], if such key already
     * exists or to the node before which we will insert new one.
     *
     * If fa is NULL, we will need to allocate a new one and
     * insert to the head of f.
     *
     * If f is NULL, no fib node matched the destination key
     * and we need to allocate a new one of those as well.
     */

    if (fa && fa->fa_tos == tos &&
        fa->fa_info->fib_priority == fi->fib_priority) {
        struct fib_alias *fa_orig;

        err = -EEXIST;
        /* Already exists: do not touch it */
        if (cfg->fc_nlflags & NLM_F_EXCL)
            goto out;

        /* Replace this fib_alias */
        if (cfg->fc_nlflags & NLM_F_REPLACE) {
            struct fib_info *fi_drop;
            u8 state;

            write_lock_bh(&fib_hash_lock);
            fi_drop = fa->fa_info;
            fa->fa_info = fi;
            fa->fa_type = cfg->fc_type;
            fa->fa_scope = cfg->fc_scope;
            state = fa->fa_state;
            fa->fa_state &= ~FA_S_ACCESSED;
            fib_hash_genid++;
            write_unlock_bh(&fib_hash_lock);
            
            /* Drop the reference to the old fib_info */
            fib_release_info(fi_drop);

            /* If the entry was used by the route cache, flush the cache */
            if (state & FA_S_ACCESSED)
                rt_cache_flush(-1);
            return 0;
        }

        /* append (CREATE | APPEND) or prepend (CREATE) */
        /* Error if we find a perfect match which
         * uses the same scope, type, and nexthop
         * information.
         */
        fa_orig = fa;
        fa = list_entry(fa->fa_list.prev, struct fib_alias, fa_list);
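        /* Among the aliases with the same tos and priority, an entry with
         * the same type, scope and fib_info is the same route; if one
         * exists, the insert fails */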
        list_for_each_entry_continue(fa, &f->fn_alias, fa_list) {
            if (fa->fa_tos != tos)
                break;
            if (fa->fa_info->fib_priority != fi->fib_priority)
                break;
            if (fa->fa_type == cfg->fc_type &&
                fa->fa_scope == cfg->fc_scope &&
                fa->fa_info == fi)
                goto out;
        }
        /* prepend: insert before the first matching fa; otherwise (append)
         * insert after the last fa with the same tos and priority */
        if (!(cfg->fc_nlflags & NLM_F_APPEND))
            fa = fa_orig;
    }

    err = -ENOENT;
    /* A new fib_alias has to be created, which requires NLM_F_CREATE */
    if (!(cfg->fc_nlflags & NLM_F_CREATE))
        goto out;

    err = -ENOBUFS;
    new_fa = kmem_cache_alloc(fn_alias_kmem, GFP_KERNEL);
    if (new_fa == NULL)
        goto out;

    new_f = NULL;
    if (!f) {
        new_f = kmem_cache_alloc(fn_hash_kmem, GFP_KERNEL);
        if (new_f == NULL)
            goto out_free_new_fa;

        INIT_HLIST_NODE(&new_f->fn_hash);
        INIT_LIST_HEAD(&new_f->fn_alias);
        new_f->fn_key = key;
        f = new_f;
    }

    new_fa->fa_info = fi;
    new_fa->fa_tos = tos;
    /* The scope of the fib_alias comes from the configuration */
    new_fa->fa_type = cfg->fc_type;
    new_fa->fa_scope = cfg->fc_scope;
    new_fa->fa_state = 0;

    /*
     * Insert new entry to the list.
     */

    write_lock_bh(&fib_hash_lock);
    if (new_f)
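        /* A new fib_node was created: link it into the fn_zone hash table */
        fib_insert_node(fz, new_f);
    /* If an insertion point fa was found, link the new alias in front of it;
     * otherwise add it at the tail of the fib_node's fn_alias list */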
    list_add_tail(&new_fa->fa_list,
         (fa ? &fa->fa_list : &f->fn_alias));
    fib_hash_genid++;
    write_unlock_bh(&fib_hash_lock);

    if (new_f)
        fz->fz_nent++;
    rt_cache_flush(-1);

    /* Notify interested applications of the new route */
    rtmsg_fib(RTM_NEWROUTE, key, new_fa, cfg->fc_dst_len, tb->tb_id,
          &cfg->fc_nlinfo);
    return 0;

out_free_new_fa:
    kmem_cache_free(fn_alias_kmem, new_fa);
out:
    fib_release_info(fi);
    return err;
}
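
The way the netlink flags steer fn_hash_insert can be condensed into a small decision function. This is a simplified sketch under stated assumptions: the SK_F_* macros are local copies of the NLM_F_* values from linux/netlink.h, and insert_decision, key_match, and exact_dup are hypothetical names that stand in for the alias search performed in the listing above.

```c
#include <errno.h>

/* Local copies of the NLM_F_* modifier values from <linux/netlink.h>. */
#define SK_F_REPLACE 0x100
#define SK_F_EXCL    0x200
#define SK_F_CREATE  0x400
#define SK_F_APPEND  0x800

enum insert_action { INSERT_NEW = 1, REPLACE_EXISTING = 2 };

/*
 * key_match: an alias with the same prefix, tos and priority exists.
 * exact_dup: one of those aliases also has the same type, scope and
 *            fib_info as the new route.
 * Returns a positive insert_action, or a negative errno value.
 */
int insert_decision(unsigned int flags, int key_match, int exact_dup)
{
    if (key_match) {
        if (flags & SK_F_EXCL)
            return -EEXIST;          /* "add" and "test" refuse duplicates */
        if (flags & SK_F_REPLACE)
            return REPLACE_EXISTING; /* "change" and "replace" */
        if (exact_dup)
            return -EEXIST;          /* an identical route is never doubled */
    }
    if (!(flags & SK_F_CREATE))
        return -ENOENT;              /* nothing to modify, not allowed to create */
    return INSERT_NEW;
}
```

Whether the new alias goes before or after the existing ones (prepend versus append) is decided separately by NLM_F_APPEND and is left out of this sketch.
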
Initialization of fn_zone:
static struct fn_zone *
fn_new_zone(struct fn_hash *table, int z)
{
    int i;
    struct fn_zone *fz = kzalloc(sizeof(struct fn_zone), GFP_KERNEL);
    if (!fz)
        return NULL;

    if (z) {
        fz->fz_divisor = 16;
    } else {
        /* Default-route zone: hash table of size 1, i.e. a single linked list */
        fz->fz_divisor = 1;
    }
    fz->fz_hashmask = (fz->fz_divisor - 1);
    fz->fz_hash = fz_hash_alloc(fz->fz_divisor);
    if (!fz->fz_hash) {
        kfree(fz);
        return NULL;
    }
    memset(fz->fz_hash, 0, fz->fz_divisor * sizeof(struct hlist_head *));
    /* Netmask length */
    fz->fz_order = z;
    /* Netmask */
    fz->fz_mask = inet_make_mask(z);

    /* Link into the fib_table: find the first non-empty zone with a more specific mask */
    for (i=z+1; i<=32; i++)
        if (table->fn_zones[i])
            break;
    write_lock_bh(&fib_hash_lock);
    if (i>32) {
        /* No more specific masks, we are the first. */
        fz->fz_next = table->fn_zone_list;
        table->fn_zone_list = fz;
    } else {
        fz->fz_next = table->fn_zones[i]->fz_next;
        table->fn_zones[i]->fz_next = fz;
    }
    table->fn_zones[z] = fz;
    fib_hash_genid++;
    write_unlock_bh(&fib_hash_lock);
    return fz;
}
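
The effect of the linking step in fn_new_zone is that fn_zone_list stays ordered from the longest netmask to the shortest, so a walk over the list tries the most specific prefixes first. The following stripped-down sketch reproduces just that step; struct zone, struct table, and table_add_zone are hypothetical simplifications without hashing or locking.

```c
#include <stddef.h>

struct zone {
    int          order;      /* netmask length, 0..32 */
    struct zone *next;
};

struct table {
    struct zone *zones[33];  /* indexed by netmask length */
    struct zone *zone_list;  /* non-empty zones, longest mask first */
};

/* Link a new zone into the table, keeping zone_list in descending
 * netmask-length order. */
void table_add_zone(struct table *t, struct zone *z)
{
    int i;

    /* Find the nearest existing zone with a more specific mask. */
    for (i = z->order + 1; i <= 32; i++)
        if (t->zones[i])
            break;

    if (i > 32) {
        /* Nothing more specific exists: the new zone becomes the head. */
        z->next = t->zone_list;
        t->zone_list = z;
    } else {
        /* Otherwise it goes right after that more specific zone. */
        z->next = t->zones[i]->next;
        t->zones[i]->next = z;
    }
    t->zones[z->order] = z;
}
```
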
Initialization of fib_info:
struct fib_info *fib_create_info(struct fib_config *cfg)
{
    int err;
    struct fib_info *fi = NULL;
    struct fib_info *ofi;
    int nhs = 1;

    /* Fast check to catch the most weird cases: every route type has a
     * predefined limit on its scope */
    if (fib_props[cfg->fc_type].scope > cfg->fc_scope)
        goto err_inval;

#ifdef CONFIG_IP_ROUTE_MULTIPATH
    if (cfg->fc_mp) {
        /* Count the next hops */
        nhs = fib_count_nexthops(cfg->fc_mp, cfg->fc_mp_len);
        if (nhs == 0)
            goto err_inval;
    }
#endif
#ifdef CONFIG_IP_ROUTE_MULTIPATH_CACHED
    if (cfg->fc_mp_alg) {
        if (cfg->fc_mp_alg < IP_MP_ALG_NONE ||
            cfg->fc_mp_alg > IP_MP_ALG_MAX)
            goto err_inval;
    }
#endif

    err = -ENOBUFS;
    if (fib_info_cnt >= fib_hash_size) {
        /* Grow the tables once the number of fib_info entries reaches the
         * size of the fib_info hash table */
        unsigned int new_size = fib_hash_size << 1;
        struct hlist_head *new_info_hash;
        struct hlist_head *new_laddrhash;
        unsigned int bytes;

        if (!new_size)
            new_size = 1;
        bytes = new_size * sizeof(struct hlist_head *);
        new_info_hash = fib_hash_alloc(bytes);
        new_laddrhash = fib_hash_alloc(bytes);
        if (!new_info_hash || !new_laddrhash) {
            fib_hash_free(new_info_hash, bytes);
            fib_hash_free(new_laddrhash, bytes);
        } else {
            memset(new_info_hash, 0, bytes);
            memset(new_laddrhash, 0, bytes);
            
            /* Move the fib_info entries into the new tables */
            fib_hash_move(new_info_hash, new_laddrhash, new_size);
        }

        if (!fib_hash_size)
            goto failure;
    }

    fi = kzalloc(sizeof(*fi)+nhs*sizeof(struct fib_nh), GFP_KERNEL);
    if (fi == NULL)
        goto failure;
    fib_info_cnt++;

    fi->fib_protocol = cfg->fc_protocol;
    fi->fib_flags = cfg->fc_flags;
    fi->fib_priority = cfg->fc_priority;
    fi->fib_prefsrc = cfg->fc_prefsrc;

    fi->fib_nhs = nhs;
    change_nexthops(fi) {
        nh->nh_parent = fi;
    } endfor_nexthops(fi)

    if (cfg->fc_mx) {
        struct nlattr *nla;
        int remaining;

        /* Other attributes (metrics) */
        nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
            int type = nla->nla_type;

            if (type) {
                if (type > RTAX_MAX)
                    goto err_inval;
                fi->fib_metrics[type - 1] = nla_get_u32(nla);
            }
        }
    }

    if (cfg->fc_mp) {
#ifdef CONFIG_IP_ROUTE_MULTIPATH
        err = fib_get_nhs(fi, cfg->fc_mp, cfg->fc_mp_len, cfg);
        if (err != 0)
            goto failure;
        if (cfg->fc_oif && fi->fib_nh->nh_oif != cfg->fc_oif)
            goto err_inval;
        if (cfg->fc_gw && fi->fib_nh->nh_gw != cfg->fc_gw)
            goto err_inval;
#ifdef CONFIG_NET_CLS_ROUTE
        if (cfg->fc_flow && fi->fib_nh->nh_tclassid != cfg->fc_flow)
            goto err_inval;
#endif
#else
        goto err_inval;
#endif
    } else {
        struct fib_nh *nh = fi->fib_nh;

        /* Initialize the route's output device, gateway address, etc. */
        nh->nh_oif = cfg->fc_oif;
        nh->nh_gw = cfg->fc_gw;
        nh->nh_flags = cfg->fc_flags;
#ifdef CONFIG_NET_CLS_ROUTE
        nh->nh_tclassid = cfg->fc_flow;
#endif
#ifdef CONFIG_IP_ROUTE_MULTIPATH
        nh->nh_weight = 1;
#endif
    }

#ifdef CONFIG_IP_ROUTE_MULTIPATH_CACHED
    fi->fib_mp_alg = cfg->fc_mp_alg;
#endif

    /* Special route types such as blackhole or prohibit */
    if (fib_props[cfg->fc_type].error) {
        /* No next hop or output device may be configured for them */
        if (cfg->fc_gw || cfg->fc_oif || cfg->fc_mp)
            goto err_inval;
        goto link_it;
    }

    if (cfg->fc_scope > RT_SCOPE_HOST)
        goto err_inval;

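    /* The scope of a next hop is decided by the kernel */
    if (cfg->fc_scope == RT_SCOPE_HOST) {
        /* Route to a local address */
        struct fib_nh *nh = fi->fib_nh;

        /* Local address is added. Such a route needs no next-hop gateway */
        if (nhs != 1 || nh->nh_gw)
            goto err_inval;
        /* Numerically greater (narrower) than cfg->fc_scope, which is RT_SCOPE_HOST */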
        nh->nh_scope = RT_SCOPE_NOWHERE;
        nh->nh_dev = dev_get_by_index(fi->fib_nh->nh_oif);
        err = -ENODEV;
        if (nh->nh_dev == NULL)
            goto failure;
    } else {
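        /* Other routes: either a next-hop gateway is needed, or the
         * destination is reached through a local interface */
        change_nexthops(fi) {
            /* Initialize the scope of each fib_nh according to the rules above */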
            if ((err = fib_check_nh(cfg, fi, nh)) != 0)
                goto failure;
        } endfor_nexthops(fi)
    }

    if (fi->fib_prefsrc) {
        if (cfg->fc_type != RTN_LOCAL || !cfg->fc_dst ||
            fi->fib_prefsrc != cfg->fc_dst)
            if (inet_addr_type(fi->fib_prefsrc) != RTN_LOCAL)
                goto err_inval;
    }

link_it:
    if ((ofi = fib_find_info(fi)) != NULL) {
        /* An identical fib_info already exists: return it instead */
        fi->fib_dead = 1;
        free_fib_info(fi);
        ofi->fib_treeref++;
        return ofi;
    }

    /* Reference held by a fib_alias */
    fi->fib_treeref++;
    /* Reference count on the fib_info itself */
    atomic_inc(&fi->fib_clntref);
    spin_lock_bh(&fib_info_lock);
    /* Link into fib_info_hash */
    hlist_add_head(&fi->fib_hash,
               &fib_info_hash[fib_info_hashfn(fi)]);
    /* If a preferred source is set, also link into fib_info_laddrhash */
    if (fi->fib_prefsrc) {
        struct hlist_head *head;

        head = &fib_info_laddrhash[fib_laddr_hashfn(fi->fib_prefsrc)];
        hlist_add_head(&fi->fib_lhash, head);
    }
    change_nexthops(fi) {
        struct hlist_head *head;
        unsigned int hash;

        if (!nh->nh_dev)
            continue;
        
        /* Link the fib_nh into fib_info_devhash */
        hash = fib_devindex_hashfn(nh->nh_dev->ifindex);
        head = &fib_info_devhash[hash];
        hlist_add_head(&nh->nh_hash, head);
    } endfor_nexthops(fi)
    spin_unlock_bh(&fib_info_lock);
    return fi;

err_inval:
    err = -EINVAL;

failure:
    if (fi) {
        fi->fib_dead = 1;
        free_fib_info(fi);
    }

    return ERR_PTR(err);
}

static int fib_check_nh(struct fib_config *cfg, struct fib_info *fi,
            struct fib_nh *nh)
{
    int err;

    if (nh->nh_gw) {
        /* Route with a next-hop gateway */
        struct fib_result res;

#ifdef CONFIG_IP_ROUTE_PERVASIVE
        if (nh->nh_flags&RTNH_F_PERVASIVE)
            return 0;
#endif
        /* The gateway is flagged as directly connected (onlink) */
        if (nh->nh_flags&RTNH_F_ONLINK) {
            struct net_device *dev;

            /* The route's scope must be wider than the next hop's scope, so
             * it is normally RT_SCOPE_UNIVERSE (the numeric values run the
             * other way: RT_SCOPE_UNIVERSE is 0) */
            if (cfg->fc_scope >= RT_SCOPE_LINK)
                return -EINVAL;
            if (inet_addr_type(nh->nh_gw) != RTN_UNICAST)
                return -EINVAL;
            if ((dev = __dev_get_by_index(nh->nh_oif)) == NULL)
                return -ENODEV;
            if (!(dev->flags&IFF_UP))
                return -ENETDOWN;
            nh->nh_dev = dev;
            dev_hold(dev);
            nh->nh_scope = RT_SCOPE_LINK;
            return 0;
        }
        /* The gateway is not directly connected: the route that reaches it
         * must be looked up */
        {
            struct flowi fl = {
                .nl_u = {
                    .ip4_u = {
                        .daddr = nh->nh_gw,
                        .scope = cfg->fc_scope + 1,
                    },
                },
                .oif = nh->nh_oif,
            };

            /* It is not necessary, but requires a bit of thinking */
            if (fl.fl4_scope < RT_SCOPE_LINK)
                fl.fl4_scope = RT_SCOPE_LINK;
            if ((err = fib_lookup(&fl, &res)) != 0)
                return err;
        }
        err = -EINVAL;
        /* The route to the next hop must be a unicast or local route */
        if (res.type != RTN_UNICAST && res.type != RTN_LOCAL)
            goto out;
        /* The scope of the next hop is the scope of the route that reaches it */
        nh->nh_scope = res.scope;
        nh->nh_oif = FIB_RES_OIF(res);
        if ((nh->nh_dev = FIB_RES_DEV(res)) == NULL)
            goto out;
        dev_hold(nh->nh_dev);
        err = -ENETDOWN;
        if (!(nh->nh_dev->flags & IFF_UP))
            goto out;
        err = 0;
out:
        fib_res_put(&res);
        return err;
    } else {
        /* No gateway: the destination is reached directly through a local interface */
        struct in_device *in_dev;

        if (nh->nh_flags&(RTNH_F_PERVASIVE|RTNH_F_ONLINK))
            return -EINVAL;

        in_dev = inetdev_by_index(nh->nh_oif);
        if (in_dev == NULL)
            return -ENODEV;
        if (!(in_dev->dev->flags&IFF_UP)) {
            in_dev_put(in_dev);
            return -ENETDOWN;
        }
        nh->nh_dev = in_dev->dev;
        dev_hold(nh->nh_dev);
        /* The local interface address itself serves as the next hop */
        nh->nh_scope = RT_SCOPE_HOST;
        in_dev_put(in_dev);
    }
    return 0;
}

Route Deletion
static int fn_hash_delete(struct fib_table *tb, struct fib_config *cfg)
{
    struct fn_hash *table = (struct fn_hash*)tb->tb_data;
    struct fib_node *f;
    struct fib_alias *fa, *fa_to_delete;
    struct fn_zone *fz;
    __be32 key;

    if (cfg->fc_dst_len > 32)
        return -EINVAL;

    if ((fz  = table->fn_zones[cfg->fc_dst_len]) == NULL)
        return -ESRCH;

    key = 0;
    if (cfg->fc_dst) {
        if (cfg->fc_dst & ~FZ_MASK(fz))
            return -EINVAL;
        key = fz_key(cfg->fc_dst, fz);
    }

    /* Find the matching fib_node */
    f = fib_find_node(fz, key);

    if (!f)
        fa = NULL;
    else
        /* Find the first fib_alias with the requested TOS */
        fa = fib_find_alias(&f->fn_alias, cfg->fc_tos, 0);
    if (!fa)
        return -ESRCH;

    fa_to_delete = NULL;
    fa = list_entry(fa->fa_list.prev, struct fib_alias, fa_list);
    /* Search for the fib_alias to delete */
    list_for_each_entry_continue(fa, &f->fn_alias, fa_list) {
        struct fib_info *fi = fa->fa_info;

        if (fa->fa_tos != cfg->fc_tos)
            break;

        if ((!cfg->fc_type ||
             fa->fa_type == cfg->fc_type) &&
            (cfg->fc_scope == RT_SCOPE_NOWHERE ||
             fa->fa_scope == cfg->fc_scope) &&
            (!cfg->fc_protocol ||
             fi->fib_protocol == cfg->fc_protocol) &&
            fib_nh_match(cfg, fi) == 0) {
            fa_to_delete = fa;
            break;
        }
    }

    if (fa_to_delete) {
        int kill_fn;

        fa = fa_to_delete;
        rtmsg_fib(RTM_DELROUTE, key, fa, cfg->fc_dst_len,
              tb->tb_id, &cfg->fc_nlinfo);

        kill_fn = 0;
        write_lock_bh(&fib_hash_lock);
        list_del(&fa->fa_list);
        /* Does the fib_node still hold any fib_alias? If not, remove the fib_node too */
        if (list_empty(&f->fn_alias)) {
            hlist_del(&f->fn_hash);
            kill_fn = 1;
        }
        fib_hash_genid++;
        write_unlock_bh(&fib_hash_lock);

        /* Flush the route cache if the entry was used by it */
        if (fa->fa_state & FA_S_ACCESSED)
            rt_cache_flush(-1);
        /* Drop this fib_alias's reference to its fib_info */
        fn_free_alias(fa);
        if (kill_fn) {
            fn_free_node(f);
            fz->fz_nent--;
        }

        return 0;
    }
    return -ESRCH;
}
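
The matching loop in fn_hash_delete treats some request fields as wildcards: a type of 0, a scope of RT_SCOPE_NOWHERE, or a protocol of 0 means "any". A minimal sketch of just that predicate follows; the struct and function names are hypothetical, SK_SCOPE_NOWHERE is a local copy of the RT_SCOPE_NOWHERE value, and the TOS comparison and the fib_nh_match next-hop comparison of the real code are left out.

```c
#include <stdbool.h>

#define SK_SCOPE_NOWHERE 255   /* value of RT_SCOPE_NOWHERE */

/* Fields of a delete request; special values act as wildcards. */
struct del_request {
    unsigned char type;      /* 0 matches any type */
    unsigned char scope;     /* SK_SCOPE_NOWHERE matches any scope */
    unsigned char protocol;  /* 0 matches any protocol */
};

/* The same fields as stored for an installed route. */
struct route_entry {
    unsigned char type;
    unsigned char scope;
    unsigned char protocol;
};

/* True when the installed route satisfies every non-wildcard field. */
bool delete_matches(const struct del_request *req,
                    const struct route_entry *rt)
{
    return (req->type == 0 || rt->type == req->type) &&
           (req->scope == SK_SCOPE_NOWHERE || rt->scope == req->scope) &&
           (req->protocol == 0 || rt->protocol == req->protocol);
}
```

Only the first matching alias is deleted; the loop in the listing above stops as soon as the predicate holds.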

void fib_release_info(struct fib_info *fi)
{
    spin_lock_bh(&fib_info_lock);
    if (fi && --fi->fib_treeref == 0) {
        /* Nothing references this fib_info any more: unlink it from
         * fib_info_hash, fib_info_laddrhash and fib_info_devhash */
        hlist_del(&fi->fib_hash);
        if (fi->fib_prefsrc)
            hlist_del(&fi->fib_lhash);
        change_nexthops(fi) {
            if (!nh->nh_dev)
                continue;
            hlist_del(&nh->nh_hash);
        } endfor_nexthops(fi)
        fi->fib_dead = 1;
        /* Drop one use of the fib_info; its memory is reclaimed when the count reaches 0 */
        fib_info_put(fi);
    }
    spin_unlock_bh(&fib_info_lock);
}
Original article: https://www.cnblogs.com/chanwai1219/p/2788760.html