Kubernetes Source Code Reading Notes: kube-proxy (Part 1)

kube-proxy is the Kubernetes component that forwards traffic between Services and Pods. When we send a packet to a Service, the actual receivers are the backend Pods that the Service fronts; this redirection is what kube-proxy implements.

Like the other components, kube-proxy's entry point lives under cmd, specifically in cmd/kube-proxy/proxy.go, and it likewise uses the Cobra library:

cmd/kube-proxy/proxy.go

func main() {
    rand.Seed(time.Now().UnixNano())

    command := app.NewProxyCommand()

    // TODO: once we switch everything over to Cobra commands, we can go back to calling
    // utilflag.InitFlags() (by removing its pflag.Parse() call). For now, we have to set the
    // normalize func and add the go flag set by hand.
    pflag.CommandLine.SetNormalizeFunc(utilflag.WordSepNormalizeFunc)
    pflag.CommandLine.AddGoFlagSet(goflag.CommandLine)
    // utilflag.InitFlags()
    logs.InitLogs()
    defer logs.FlushLogs()

    if err := command.Execute(); err != nil {
        fmt.Fprintf(os.Stderr, "error: %v\n", err)
        os.Exit(1)
    }
}

Stepping into NewProxyCommand, its core is again the Run method.
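Before moving on, here is a self-contained sketch of the Cobra pattern this follows (the Options type and the --config flag below are simplified stand-ins, not kube-proxy's real ones; the real NewProxyCommand also completes and validates the options first): build an Options struct, bind its flags to the command, and have the command's Run callback delegate to Options.Run().

package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

// Options is a hypothetical, trimmed-down stand-in for kube-proxy's Options.
type Options struct {
	ConfigFile string
}

// Run is where the real work would happen once flags are parsed.
func (o *Options) Run() error {
	fmt.Printf("running with config %q\n", o.ConfigFile)
	return nil
}

// NewProxyCommand wires the Options into a Cobra command.
func NewProxyCommand() *cobra.Command {
	opts := &Options{}
	cmd := &cobra.Command{
		Use: "kube-proxy",
		Run: func(cmd *cobra.Command, args []string) {
			if err := opts.Run(); err != nil {
				fmt.Fprintf(os.Stderr, "error: %v\n", err)
				os.Exit(1)
			}
		},
	}
	cmd.Flags().StringVar(&opts.ConfigFile, "config", "", "path to the configuration file")
	return cmd
}

func main() {
	if err := NewProxyCommand().Execute(); err != nil {
		os.Exit(1)
	}
}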

Stepping into Run:

cmd/kube-proxy/app/server.go

func (o *Options) Run() error {
    if len(o.WriteConfigTo) > 0 {
        return o.writeConfigFile()
    }

    proxyServer, err := NewProxyServer(o)
    if err != nil {
        return err
    }

    return proxyServer.Run()
}

The method is concise: it builds a ProxyServer object and runs it.

1. NewProxyServer

NewProxyServer calls the private newProxyServer function. It lives in the app package; depending on the operating system, the version in server_windows.go or server_others.go is compiled in. Here we only analyze the Linux one:
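This per-platform split relies on Go build constraints: server_windows.go is only compiled on Windows because of its _windows filename suffix, while server_others.go is excluded from Windows builds with a build tag. A minimal illustration of the mechanism follows (the file and function names here are hypothetical, not the real kube-proxy sources):

// file: newproxyserver_others.go
// +build !windows

package app

// newProxyServerForUnix is compiled on every platform except Windows; a
// sibling file named newproxyserver_windows.go would automatically be
// restricted to Windows by its filename suffix.
func newProxyServerForUnix() string { return "non-Windows implementation" }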

cmd/kube-proxy/app/server_others.go

func newProxyServer(
    config *proxyconfigapi.KubeProxyConfiguration,
    cleanupAndExit bool,
    cleanupIPVS bool,
    scheme *runtime.Scheme,
    master string) (*ProxyServer, error) {

    if config == nil {
        return nil, errors.New("config is required")
    }

    ...

    var iptInterface utiliptables.Interface
    var ipvsInterface utilipvs.Interface
    var kernelHandler ipvs.KernelHandler
    var ipsetInterface utilipset.Interface
    var dbus utildbus.Interface

    // Create a iptables utils.
    execer := exec.New()

    dbus = utildbus.New()
    iptInterface = utiliptables.New(execer, dbus, protocol)
    kernelHandler = ipvs.NewLinuxKernelHandler()
    ipsetInterface = utilipset.New(execer)
    canUseIPVS, _ := ipvs.CanUseIPVSProxier(kernelHandler, ipsetInterface)
    if canUseIPVS {
        ipvsInterface = utilipvs.New(execer)
    }

    ...

    client, eventClient, err := createClients(config.ClientConnection, master)
    if err != nil {
        return nil, err
    }

    ...

    var proxier proxy.ProxyProvider
    var serviceEventHandler proxyconfig.ServiceHandler
    var endpointsEventHandler proxyconfig.EndpointsHandler

    proxyMode := getProxyMode(string(config.Mode), iptInterface, kernelHandler, ipsetInterface, iptables.LinuxKernelCompatTester{})
    nodeIP := net.ParseIP(config.BindAddress)
    if nodeIP.IsUnspecified() {
        nodeIP = utilnode.GetNodeIP(client, hostname)
    }
    if proxyMode == proxyModeIPTables {
        klog.V(0).Info("Using iptables Proxier.")
        if config.IPTables.MasqueradeBit == nil {
            // MasqueradeBit must be specified or defaulted.
            return nil, fmt.Errorf("unable to read IPTables MasqueradeBit from config")
        }

        // TODO this has side effects that should only happen when Run() is invoked.
        proxierIPTables, err := iptables.NewProxier(
            iptInterface,
            utilsysctl.New(),
            execer,
            config.IPTables.SyncPeriod.Duration,
            config.IPTables.MinSyncPeriod.Duration,
            config.IPTables.MasqueradeAll,
            int(*config.IPTables.MasqueradeBit),
            config.ClusterCIDR,
            hostname,
            nodeIP,
            recorder,
            healthzUpdater,
            config.NodePortAddresses,
        )
        if err != nil {
            return nil, fmt.Errorf("unable to create proxier: %v", err)
        }
        metrics.RegisterMetrics()
        proxier = proxierIPTables
        serviceEventHandler = proxierIPTables
        endpointsEventHandler = proxierIPTables
        // No turning back. Remove artifacts that might still exist from the userspace Proxier.
        klog.V(0).Info("Tearing down inactive rules.")
        // TODO this has side effects that should only happen when Run() is invoked.
        userspace.CleanupLeftovers(iptInterface)
        // IPVS Proxier will generate some iptables rules, need to clean them before switching to other proxy mode.
        // Besides, ipvs proxier will create some ipvs rules as well. Because there is no way to tell if a given
        // ipvs rule is created by IPVS proxier or not. Users should explicitly specify `--clean-ipvs=true` to flush
        // all ipvs rules when kube-proxy start up. Users do this operation should be with caution.
        if canUseIPVS {
            ipvs.CleanupLeftovers(ipvsInterface, iptInterface, ipsetInterface, cleanupIPVS)
        }
    } else if proxyMode == proxyModeIPVS {
        klog.V(0).Info("Using ipvs Proxier.")
        ......
    } else {
        klog.V(0).Info("Using userspace Proxier.")
        ......
    }

    iptInterface.AddReloadFunc(proxier.Sync)

    return &ProxyServer{
        Client:                 client,
        EventClient:            eventClient,
        IptInterface:           iptInterface,
        IpvsInterface:          ipvsInterface,
        IpsetInterface:         ipsetInterface,
        execer:                 execer,
        Proxier:                proxier,
        Broadcaster:            eventBroadcaster,
        Recorder:               recorder,
        ConntrackConfiguration: config.Conntrack,
        Conntracker:            &realConntracker{},
        ProxyMode:              proxyMode,
        NodeRef:                nodeRef,
        MetricsBindAddress:     config.MetricsBindAddress,
        EnableProfiling:        config.EnableProfiling,
        OOMScoreAdj:            config.OOMScoreAdj,
        ResourceContainer:      config.ResourceContainer,
        ConfigSyncPeriod:       config.ConfigSyncPeriod.Duration,
        ServiceEventHandler:    serviceEventHandler,
        EndpointsEventHandler:  endpointsEventHandler,
        HealthzServer:          healthzServer,
    }, nil
}

The earlier part of the function matters less; the key step is determining which mode kube-proxy runs in, handling it accordingly, and building the corresponding ProxyServer struct. Today the iptables mode is the most widely used in Kubernetes clusters, so we take iptables as the example and skip the other two modes.
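As a rough sketch of the selection logic (not the real getProxyMode, which also probes kernel modules and iptables versions), the fallback order can be pictured like this: prefer the configured mode, degrade from ipvs to iptables when the kernel prerequisites are missing, and fall back to userspace as a last resort.

// chooseProxyMode is a simplified, hypothetical stand-in for getProxyMode.
func chooseProxyMode(configured string, canUseIPVS, canUseIPTables bool) string {
	switch configured {
	case "ipvs":
		if canUseIPVS {
			return "ipvs"
		}
		fallthrough // required kernel support missing: degrade to iptables
	case "iptables", "":
		if canUseIPTables {
			return "iptables"
		}
	}
	return "userspace" // last resort
}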

As the code shows, in iptables mode the NewProxier function from the iptables package is called to build an iptables-specific proxier. The service and endpoints event handlers are also set to this same proxier. Finally, the assembled fields are filled into a ProxyServer struct, which is returned.
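For orientation, the handler contract the proxier satisfies looks roughly like the following (sketched from pkg/proxy/config; the exact method signatures may differ between versions, so treat this as an approximation and check config.go in your tree):

// ServiceHandler is the interface a consumer of Service events implements;
// the iptables proxier implements all four methods and turns each call
// into a (rate-limited) rules sync. EndpointsHandler is analogous.
type ServiceHandler interface {
	OnServiceAdd(service *v1.Service)
	OnServiceUpdate(oldService, service *v1.Service)
	OnServiceDelete(service *v1.Service)
	OnServiceSynced()
}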

NewProxier itself is fairly direct: it constructs a proxier and returns it. As the comment says, once created, the proxier keeps iptables up to date in the background so the rules always reflect the latest state:

pkg/proxy/iptables/proxier.go

// NewProxier returns a new Proxier given an iptables Interface instance.
// Because of the iptables logic, it is assumed that there is only a single Proxier active on a machine.
// An error will be returned if iptables fails to update or acquire the initial lock.
// Once a proxier is created, it will keep iptables up to date in the background and
// will not terminate if a particular iptables call fails.
func NewProxier(ipt utiliptables.Interface,
    sysctl utilsysctl.Interface,
    exec utilexec.Interface,
    syncPeriod time.Duration,
    minSyncPeriod time.Duration,
    masqueradeAll bool,
    masqueradeBit int,
    clusterCIDR string,
    hostname string,
    nodeIP net.IP,
    recorder record.EventRecorder,
    healthzServer healthcheck.HealthzUpdater,
    nodePortAddresses []string,
) (*Proxier, error) {
    // Set the route_localnet sysctl we need for
    if val, _ := sysctl.GetSysctl(sysctlRouteLocalnet); val != 1 {
        if err := sysctl.SetSysctl(sysctlRouteLocalnet, 1); err != nil {
            return nil, fmt.Errorf("can't set sysctl %s: %v", sysctlRouteLocalnet, err)
        }
    }

    // Proxy needs br_netfilter and bridge-nf-call-iptables=1 when containers
    // are connected to a Linux bridge (but not SDN bridges). Until most
    // plugins handle this, log when config is missing
    if val, err := sysctl.GetSysctl(sysctlBridgeCallIPTables); err == nil && val != 1 {
        klog.Warning("missing br-netfilter module or unset sysctl br-nf-call-iptables; proxy may not work as intended")
    }

    // Generate the masquerade mark to use for SNAT rules.
    masqueradeValue := 1 << uint(masqueradeBit)
    masqueradeMark := fmt.Sprintf("%#08x/%#08x", masqueradeValue, masqueradeValue)

    ...

    proxier := &Proxier{
        portsMap:                 make(map[utilproxy.LocalPort]utilproxy.Closeable),
        serviceMap:               make(proxy.ServiceMap),
        serviceChanges:           proxy.NewServiceChangeTracker(newServiceInfo, &isIPv6, recorder),
        endpointsMap:             make(proxy.EndpointsMap),
        endpointsChanges:         proxy.NewEndpointChangeTracker(hostname, newEndpointInfo, &isIPv6, recorder),
        iptables:                 ipt,
        masqueradeAll:            masqueradeAll,
        masqueradeMark:           masqueradeMark,
        exec:                     exec,
        clusterCIDR:              clusterCIDR,
        hostname:                 hostname,
        nodeIP:                   nodeIP,
        portMapper:               &listenPortOpener{},
        recorder:                 recorder,
        healthChecker:            healthChecker,
        healthzServer:            healthzServer,
        precomputedProbabilities: make([]string, 0, 1001),
        iptablesData:             bytes.NewBuffer(nil),
        existingFilterChainsData: bytes.NewBuffer(nil),
        filterChains:             bytes.NewBuffer(nil),
        filterRules:              bytes.NewBuffer(nil),
        natChains:                bytes.NewBuffer(nil),
        natRules:                 bytes.NewBuffer(nil),
        nodePortAddresses:        nodePortAddresses,
        networkInterfacer:        utilproxy.RealNetwork{},
    }
    burstSyncs := 2
    klog.V(3).Infof("minSyncPeriod: %v, syncPeriod: %v, burstSyncs: %d", minSyncPeriod, syncPeriod, burstSyncs)
    proxier.syncRunner = async.NewBoundedFrequencyRunner("sync-runner", proxier.syncProxyRules, minSyncPeriod, syncPeriod, burstSyncs)
    return proxier, nil
}

Note the second-to-last statement: it sets the proxier's syncRunner field, which drives the proxier's actual sync logic (syncProxyRules, run by a bounded-frequency runner). We will return to it in detail later.
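To make the syncRunner's role concrete, here is a small, self-contained sketch of the idea behind a bounded-frequency runner (this is not the real pkg/util/async implementation, which also supports burst syncs via a token bucket): callers ask for a sync with Run(), duplicate requests are coalesced, the sync function runs no more often than minInterval apart, and at least once every maxInterval even when nothing has changed.

package main

import (
	"fmt"
	"time"
)

// boundedRunner is a toy stand-in for async.BoundedFrequencyRunner.
type boundedRunner struct {
	fn          func()        // the work to do, e.g. syncProxyRules
	minInterval time.Duration // never run more often than this
	maxInterval time.Duration // always run at least this often
	requests    chan struct{} // coalesced "please sync" signals
}

func newBoundedRunner(fn func(), minInterval, maxInterval time.Duration) *boundedRunner {
	return &boundedRunner{fn: fn, minInterval: minInterval, maxInterval: maxInterval, requests: make(chan struct{}, 1)}
}

// Run requests a sync; if one is already pending, the request is coalesced.
func (r *boundedRunner) Run() {
	select {
	case r.requests <- struct{}{}:
	default:
	}
}

// Loop blocks until stop is closed, invoking fn on demand and periodically.
func (r *boundedRunner) Loop(stop <-chan struct{}) {
	for {
		select {
		case <-stop:
			return
		case <-r.requests:
			// something changed, sync now
		case <-time.After(r.maxInterval):
			// nothing changed for a full period, sync anyway
		}
		r.fn()
		time.Sleep(r.minInterval) // crude lower bound between two runs
	}
}

func main() {
	runner := newBoundedRunner(func() { fmt.Println("syncProxyRules") }, time.Second, 30*time.Second)
	stop := make(chan struct{})
	go runner.Loop(stop)
	runner.Run() // in kube-proxy, proxier.Sync() boils down to a call like this
	time.Sleep(2 * time.Second)
	close(stop)
}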

In short, NewProxyServer is straightforward: it creates a ProxyServer for the configured mode (here iptables). The actual work happens in the Run method that follows.

2. Run

Stepping into the Run method in server.go:

cmd/kube-proxy/app/server.go

// Run runs the specified ProxyServer.  This should never exit (unless CleanupAndExit is set).
func (s *ProxyServer) Run() error {
    ...

    // Start up a healthz server if requested
    ...

    // Start up a metrics server if requested
    ...

    // Tune conntrack, if requested
    // Conntracker is always nil for windows
    ...

    informerFactory := informers.NewSharedInformerFactoryWithOptions(s.Client, s.ConfigSyncPeriod,
        informers.WithTweakListOptions(func(options *v1meta.ListOptions) {
            options.LabelSelector = "!service.kubernetes.io/service-proxy-name"
        }))

    // Create configs (i.e. Watches for Services and Endpoints)
    // Note: RegisterHandler() calls need to happen before creation of Sources because sources
    // only notify on changes, and the initial update (on process start) may be lost if no handlers
    // are registered yet.
    serviceConfig := config.NewServiceConfig(informerFactory.Core().V1().Services(), s.ConfigSyncPeriod)
    serviceConfig.RegisterEventHandler(s.ServiceEventHandler)
    go serviceConfig.Run(wait.NeverStop)

    endpointsConfig := config.NewEndpointsConfig(informerFactory.Core().V1().Endpoints(), s.ConfigSyncPeriod)
    endpointsConfig.RegisterEventHandler(s.EndpointsEventHandler)
    go endpointsConfig.Run(wait.NeverStop)

    // This has to start after the calls to NewServiceConfig and NewEndpointsConfig because those
    // functions must configure their shared informer event handlers first.
    go informerFactory.Start(wait.NeverStop)

    // Birth Cry after the birth is successful
    s.birthCry()

    // Just loop forever for now...
    s.Proxier.SyncLoop()
    return nil
}

The first part is again preparatory work such as the healthz server, the metrics server, and conntrack tuning, which we skip. The interesting part is the second half:

(1) Create and run informers for Services and Endpoints, so that kube-proxy learns about changes to Service and Endpoints resources in the cluster as they happen (a standalone sketch of this informer pattern follows after this list).

(2) Call birthCry. Nothing special here: it simply records an event saying that kube-proxy has started.

(3) Call SyncLoop to keep the Proxier running.
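As a reference for (1), here is a self-contained sketch of the same informer pattern written directly against client-go (it does not use kube-proxy's ServiceConfig/EndpointsConfig wrappers, and the kubeconfig handling is only an assumption for the example): a shared informer factory watches Services and delivers add/update/delete events to registered handlers, which in kube-proxy ultimately trigger the proxier's rule sync.

package main

import (
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from ~/.kube/config; kube-proxy instead uses its
	// ClientConnection configuration via createClients.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// One factory, resynced periodically, shared by all watched resources.
	factory := informers.NewSharedInformerFactory(client, 15*time.Minute)
	factory.Core().V1().Services().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			svc := obj.(*v1.Service)
			fmt.Printf("service added: %s/%s\n", svc.Namespace, svc.Name)
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			fmt.Println("service updated")
		},
		DeleteFunc: func(obj interface{}) {
			fmt.Println("service deleted")
		},
	})

	stopCh := make(chan struct{})
	factory.Start(stopCh)            // kick off the watch loops
	factory.WaitForCacheSync(stopCh) // wait for the initial List to complete
	select {}                        // block forever, much like SyncLoop
}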

The next article (https://www.cnblogs.com/00986014w/p/11018314.html) looks at (1) and (3) in turn, using Services as the example; the Endpoints logic is similar.

Original post: https://www.cnblogs.com/00986014w/p/11011496.html