Analyzing Kubernetes Service Load Balancing with the iptables LOG Target

Environment

kubernetes 1.12.1

kernel 3.10.0-1062.18.1.el7.x86_64

docker 1.13.1-162

kube-proxy running in iptables mode

pod network: flannel (vxlan)

The cluster has two Services, prometheus and kube-dns. prometheus has one pod on node01; kube-dns has two pods, one on node01 and one on the master:

Service      ClusterIP:Port        Pod                           Pod IP:Port       Node
prometheus   10.110.238.91:9090    prometheus-546bd96fc5-h4gtg   10.244.1.6:9090   k8s-node01.com (172.21.0.13)
kube-dns     10.96.0.10:53         coredns-576cbf47c7-57xd7      10.244.1.8:53     k8s-node01.com (172.21.0.13)
                                   coredns-576cbf47c7-q6p2h      10.244.0.4:53     k8s-master.com (172.16.0.2)

Procedure

The idea: use a script to insert, above every existing iptables rule, a new rule with identical match conditions but a LOG target. The LOG entries then show exactly which rules a request matched.

Pick the master node and stop the kube-proxy pod on it, so that the node's iptables rules do not change with cluster state during the experiment.

1. Stop kube-proxy on the master node

# "kc" below is an alias for kubectl
kc get ds -n kube-system kube-proxy -o yaml > /appdata/kube-proxy.yaml
kc patch ds kube-proxy -n kube-system --type=json -p='[{"op":"replace", "path": "/spec/template/spec/tolerations", "value":null}]'
kc edit ds kube-proxy -n kube-system  # add nodeSelector "kubernetes.io/hostname: k8s-node01.com"

Checking again, the kube-proxy pod on the master is gone, but the node's iptables rules remain in place.

2. Back up the original iptables rules

iptables-save >> /appdata/iptables.bak

3. Configure rsyslog on the node to write kernel debug logs to /var/log/iptables

echo "kern.debug    /var/log/iptables" >> /etc/rsyslog.conf
systemctl restart rsyslog
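Why kern.debug: the iptables LOG target logs through the kernel, i.e. syslog facility kern (0), and the --log-level debug used by the script below is severity 7, which is exactly the facility/severity pair the selector above matches. As a quick sanity check, the syslog PRI value works out as:

```shell
#!/bin/bash
# syslog PRI = facility * 8 + severity (RFC 5424)
facility=0   # kern: the iptables LOG target always logs as the kernel
severity=7   # debug, i.e. --log-level debug / --log-level 7
pri=$((facility * 8 + severity))
echo "<$pri>"   # prints "<7>", the PRI value matched by the kern.debug selector
```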

4. Write the LOG script. For every rule, it inserts immediately above a rule with identical match conditions but a LOG target. Each LOG rule also uses the limit module to rate-limit logging; without it the volume of log output would make observation impractical.

(This manipulates live iptables rules. Run it only on a disposable test node in a test environment; the consequences of a mistake are severe.)

#!/bin/bash
# Author: JianlongZ
# Date:   2020-10-08
#
# For every rule in the iptables-save output, emit an "iptables -I" command
# that inserts, directly above it, a LOG rule with identical match conditions.
set -e

rulefile="./iptables-rule.log"
bakfile="./iptables-save-$(date +%Y%m%d%H%M%S)"
resultfile="./run-iptables-$(date +%Y%m%d%H%M%S)"

# keep only table headers (*nat, *filter, ...) and rule lines (-A ...)
iptables-save | grep -E "^-|^\*" > ${rulefile}

# full backup of the current ruleset
iptables-save > $bakfile

position=1   # insert position within the current chain
table=""     # table the current rule belongs to
prerule=""   # previous rule, used to detect chain changes

while read -r rule
do
    # table header line such as "*nat"
    if [[ $rule == \** ]]; then
        table=$(awk -F'*' '{print $2}' <<< "$rule")
        continue
    fi

    chain=$(awk '{print $2}' <<< "$rule")
    prechain=$(awk '{print $2}' <<< "$prerule")
    # everything between "-A CHAIN" and "-j TARGET" is the match condition
    condition=$(echo "$rule" | awk '{$1=$2=""; print}' | awk -F'-j' '{print $1}')
    position=$((position + 2))

    # restart numbering at the top of each new chain
    if [[ $chain != "$prechain" ]]; then
        position=1
    fi

    # note: iptables truncates --log-prefix to 29 characters
    res="iptables -t $table -I $chain $position $condition -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix \"$table|$position|$chain\" --log-level debug"
    echo "$res" >> $resultfile

    prerule=$rule

done < $rulefile

echo "Finish."
echo "Please run $resultfile to insert the log rules."
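As a sanity check, the same transformation can be run by hand on a single sample rule. This is pure text processing with no live iptables involved; the awk pipeline mirrors the script above:

```shell
#!/bin/bash
# Given one iptables-save rule line, build the matching LOG insert command
# the same way the script does.
rule='-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES'
table="nat"
position=1

chain=$(printf '%s\n' "$rule" | awk '{print $2}')
# everything between "-A CHAIN" and "-j TARGET" is the match condition
condition=$(printf '%s\n' "$rule" | awk '{$1=$2=""; print}' | awk -F'-j' '{print $1}')

# prints the insert command for this rule, with prefix "nat|1|PREROUTING"
echo "iptables -t $table -I $chain $position $condition -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix \"$table|$position|$chain\" --log-level debug"
```

The printed command corresponds to the first LOG rule visible in the nat table dump in the next section (nat|1|PREROUTING).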

Conclusion

Filter the entries in /var/log/iptables by svc IP and pod IP, then use each entry's log prefix to locate, in the complete instrumented ruleset below, the exact rules the request traversed:

# Generated by iptables-save v1.4.21 on Fri Oct  9 15:39:58 2020
*mangle
:PREROUTING ACCEPT [512762:115269776]
:INPUT ACCEPT [512762:115269776]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [518304:132706339]
:POSTROUTING ACCEPT [518304:132706339]
COMMIT
# Completed on Fri Oct  9 15:39:58 2020
# Generated by iptables-save v1.4.21 on Fri Oct  9 15:39:58 2020
*nat
:PREROUTING ACCEPT [2:112]
:INPUT ACCEPT [2:112]
:OUTPUT ACCEPT [1:60]
:POSTROUTING ACCEPT [1:60]
:DOCKER - [0:0]
:KUBE-MARK-DROP - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-POSTROUTING - [0:0]
:KUBE-SEP-6FRGWTS5YGV54XWV - [0:0]
:KUBE-SEP-HPQF756YQTNK43WA - [0:0]
:KUBE-SEP-KZMEYJZBDY4HFAEO - [0:0]
:KUBE-SEP-MXQMVNGFUQPLZSHS - [0:0]
:KUBE-SEP-NWYX6ZRA4HKJWFJ6 - [0:0]
:KUBE-SEP-YC5G23GHTZAZPNO5 - [0:0]
:KUBE-SERVICES - [0:0]
:KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0]
:KUBE-SVC-FNI7RW7PEKOXZDFO - [0:0]
:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
:KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|PREROUTING" --log-level 7
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -m comment --comment "kubernetes service portals" -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|OUTPUT" --log-level 7
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|POSTROUTING" --log-level 7
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -s 10.244.0.0/16 -d 10.244.0.0/16 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|3|POSTROUTING" --log-level 7
-A POSTROUTING -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
-A POSTROUTING -s 10.244.0.0/16 ! -d 224.0.0.0/4 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|5|POSTROUTING" --log-level 7
-A POSTROUTING -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
-A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.0.0/24 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|7|POSTROUTING" --log-level 7
-A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.0.0/24 -j RETURN
-A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.0.0/16 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|9|POSTROUTING" --log-level 7
-A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
-A KUBE-MARK-DROP -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-MARK-DROP" --log-level 7
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A KUBE-MARK-MASQ -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-MARK-MASQ" --log-level 7
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-POSTROUTING" --log-level 7
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A KUBE-SEP-6FRGWTS5YGV54XWV -s 10.244.1.6/32 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-SEP-6FRGWTS5YGV54X" --log-level 7
-A KUBE-SEP-6FRGWTS5YGV54XWV -s 10.244.1.6/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-6FRGWTS5YGV54XWV -p tcp -m tcp -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|3|KUBE-SEP-6FRGWTS5YGV54X" --log-level 7
-A KUBE-SEP-6FRGWTS5YGV54XWV -p tcp -m tcp -j DNAT --to-destination 10.244.1.6:9090
-A KUBE-SEP-HPQF756YQTNK43WA -s 10.244.1.9/32 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-SEP-HPQF756YQTNK43" --log-level 7
-A KUBE-SEP-HPQF756YQTNK43WA -s 10.244.1.9/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-HPQF756YQTNK43WA -p tcp -m tcp -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|3|KUBE-SEP-HPQF756YQTNK43" --log-level 7
-A KUBE-SEP-HPQF756YQTNK43WA -p tcp -m tcp -j DNAT --to-destination 10.244.1.9:53
-A KUBE-SEP-KZMEYJZBDY4HFAEO -s 10.244.1.8/32 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-SEP-KZMEYJZBDY4HFA" --log-level 7
-A KUBE-SEP-KZMEYJZBDY4HFAEO -s 10.244.1.8/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-KZMEYJZBDY4HFAEO -p tcp -m tcp -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|3|KUBE-SEP-KZMEYJZBDY4HFA" --log-level 7
-A KUBE-SEP-KZMEYJZBDY4HFAEO -p tcp -m tcp -j DNAT --to-destination 10.244.1.8:53
-A KUBE-SEP-MXQMVNGFUQPLZSHS -s 10.244.1.8/32 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-SEP-MXQMVNGFUQPLZS" --log-level 7
-A KUBE-SEP-MXQMVNGFUQPLZSHS -s 10.244.1.8/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-MXQMVNGFUQPLZSHS -p udp -m udp -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|3|KUBE-SEP-MXQMVNGFUQPLZS" --log-level 7
-A KUBE-SEP-MXQMVNGFUQPLZSHS -p udp -m udp -j DNAT --to-destination 10.244.1.8:53
-A KUBE-SEP-NWYX6ZRA4HKJWFJ6 -s 10.244.1.9/32 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-SEP-NWYX6ZRA4HKJWF" --log-level 7
-A KUBE-SEP-NWYX6ZRA4HKJWFJ6 -s 10.244.1.9/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-NWYX6ZRA4HKJWFJ6 -p udp -m udp -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|3|KUBE-SEP-NWYX6ZRA4HKJWF" --log-level 7
-A KUBE-SEP-NWYX6ZRA4HKJWFJ6 -p udp -m udp -j DNAT --to-destination 10.244.1.9:53
-A KUBE-SEP-YC5G23GHTZAZPNO5 -s 172.16.0.2/32 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-SEP-YC5G23GHTZAZPN" --log-level 7
-A KUBE-SEP-YC5G23GHTZAZPNO5 -s 172.16.0.2/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-YC5G23GHTZAZPNO5 -p tcp -m tcp -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|3|KUBE-SEP-YC5G23GHTZAZPN" --log-level 7
-A KUBE-SEP-YC5G23GHTZAZPNO5 -p tcp -m tcp -j DNAT --to-destination 172.16.0.2:6443
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-SERVICES" --log-level 7
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|3|KUBE-SERVICES" --log-level 7
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|5|KUBE-SERVICES" --log-level 7
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|7|KUBE-SERVICES" --log-level 7
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|9|KUBE-SERVICES" --log-level 7
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|11|KUBE-SERVICES" --log-level 7
-A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.110.238.91/32 -p tcp -m comment --comment "default/prometheus:tcp cluster IP" -m tcp --dport 9090 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|13|KUBE-SERVICES" --log-level 7
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.110.238.91/32 -p tcp -m comment --comment "default/prometheus:tcp cluster IP" -m tcp --dport 9090 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.110.238.91/32 -p tcp -m comment --comment "default/prometheus:tcp cluster IP" -m tcp --dport 9090 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|15|KUBE-SERVICES" --log-level 7
-A KUBE-SERVICES -d 10.110.238.91/32 -p tcp -m comment --comment "default/prometheus:tcp cluster IP" -m tcp --dport 9090 -j KUBE-SVC-FNI7RW7PEKOXZDFO
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|17|KUBE-SERVICES" --log-level 7
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m statistic --mode random --probability 0.50000000000 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-SVC-ERIFXISQEP7F7O" --log-level 7
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-KZMEYJZBDY4HFAEO
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|3|KUBE-SVC-ERIFXISQEP7F7O" --log-level 7
-A KUBE-SVC-ERIFXISQEP7F7OF4 -j KUBE-SEP-HPQF756YQTNK43WA
-A KUBE-SVC-FNI7RW7PEKOXZDFO -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-SVC-FNI7RW7PEKOXZD" --log-level 7
-A KUBE-SVC-FNI7RW7PEKOXZDFO -j KUBE-SEP-6FRGWTS5YGV54XWV
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-SVC-NPX46M4PTMTKRN" --log-level 7
-A KUBE-SVC-NPX46M4PTMTKRN6Y -j KUBE-SEP-YC5G23GHTZAZPNO5
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m statistic --mode random --probability 0.50000000000 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-SVC-TCOU7JCQXEZGVU" --log-level 7
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-MXQMVNGFUQPLZSHS
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|3|KUBE-SVC-TCOU7JCQXEZGVU" --log-level 7
-A KUBE-SVC-TCOU7JCQXEZGVUNU -j KUBE-SEP-NWYX6ZRA4HKJWFJ6
COMMIT
# Completed on Fri Oct  9 15:39:58 2020
# Generated by iptables-save v1.4.21 on Fri Oct  9 15:39:58 2020
*filter
:INPUT ACCEPT [505840:113853465]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [511458:131041314]
:DOCKER - [0:0]
:DOCKER-ISOLATION - [0:0]
:KUBE-EXTERNAL-SERVICES - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-FORWARD - [0:0]
:KUBE-SERVICES - [0:0]
-A INPUT -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "filter|1|INPUT" --log-level 7
-A INPUT -j KUBE-FIREWALL
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "filter|3|INPUT" --log-level 7
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A FORWARD -m comment --comment "kubernetes forwarding rules" -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "filter|1|FORWARD" --log-level 7
-A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
-A FORWARD -s 10.244.0.0/16 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "filter|3|FORWARD" --log-level 7
-A FORWARD -s 10.244.0.0/16 -j ACCEPT
-A FORWARD -d 10.244.0.0/16 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "filter|5|FORWARD" --log-level 7
-A FORWARD -d 10.244.0.0/16 -j ACCEPT
-A OUTPUT -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "filter|1|OUTPUT" --log-level 7
-A OUTPUT -j KUBE-FIREWALL
-A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "filter|3|OUTPUT" --log-level 7
-A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "filter|1|KUBE-FIREWALL" --log-level 7
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "filter|1|KUBE-FORWARD" --log-level 7
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A KUBE-FORWARD -s 10.244.0.0/16 -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "filter|3|KUBE-FORWARD" --log-level 7
-A KUBE-FORWARD -s 10.244.0.0/16 -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FORWARD -d 10.244.0.0/16 -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "filter|5|KUBE-FORWARD" --log-level 7
-A KUBE-FORWARD -d 10.244.0.0/16 -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
COMMIT
# Completed on Fri Oct  9 15:39:58 2020

For example, the log line Oct  9 15:34:59 k8s-master kernel: nat|1|OUTPUT IN= OUT=eth0 SRC=172.16.0.2 DST=10.110.238.91 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6879 DF PROTO=TCP SPT=49620 DPT=9090 WINDOW=29200 RES=0x00 SYN URGP=0 means the packet hit rule 1 of the OUTPUT chain in the nat table. The rule we actually care about is the one immediately after that LOG rule, namely -A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES.
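This prefix-to-rule mapping can be automated: the LOG rule records its own position in the instrumented ruleset, and the matched rule is rule number (position+1)/2 of that chain in the original, uninstrumented backup. A sketch against an iptables-save backup file (the helper name lookup and the trimmed sample ruleset are illustrative, not from the original post):

```shell
#!/bin/bash
# Resolve a LOG prefix "table|position|chain" back to the matched rule.
# In the instrumented ruleset the real rule sits right below its LOG rule,
# which is rule number (position+1)/2 of that chain in the original backup.
lookup() {
  local table=$1 position=$2 chain=$3 bakfile=$4
  local n=$(( (position + 1) / 2 ))
  sed -n "/^[*]$table\$/,/^COMMIT/p" "$bakfile" |
    grep "^-A $chain " | sed -n "${n}p"
}

# demo against a trimmed iptables-save fragment (rules taken from this post)
bak=$(mktemp)
cat > "$bak" <<'EOF'
*nat
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES -d 10.110.238.91/32 -p tcp -m tcp --dport 9090 -j KUBE-SVC-FNI7RW7PEKOXZDFO
COMMIT
EOF
lookup nat 1 OUTPUT "$bak"          # the KUBE-SERVICES jump
lookup nat 3 KUBE-SERVICES "$bak"   # the prometheus rule (in this fragment)
rm -f "$bak"
```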

1. Accessing the svc directly from the master node: 172.16.0.2 -> 10.110.238.91

[root@VM-0-2-centos appdata]# cat /var/log/iptables | grep -E  "10.244.1.6|10.110.238.91"
Oct  9 15:34:59 k8s-master kernel: nat|1|OUTPUTIN= OUT=eth0 SRC=172.16.0.2 DST=10.110.238.91 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6879 DF PROTO=TCP SPT=49620 DPT=9090 WINDOW=29200 RES=0x00 SYN URGP=0 
Oct  9 15:34:59 k8s-master kernel: nat|13|KUBE-SERVICESIN= OUT=eth0 SRC=172.16.0.2 DST=10.110.238.91 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6879 DF PROTO=TCP SPT=49620 DPT=9090 WINDOW=29200 RES=0x00 SYN URGP=0 
Oct  9 15:34:59 k8s-master kernel: nat|1|KUBE-MARK-MASQIN= OUT=eth0 SRC=172.16.0.2 DST=10.110.238.91 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6879 DF PROTO=TCP SPT=49620 DPT=9090 WINDOW=29200 RES=0x00 SYN URGP=0 
Oct  9 15:34:59 k8s-master kernel: nat|15|KUBE-SERVICESIN= OUT=eth0 SRC=172.16.0.2 DST=10.110.238.91 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6879 DF PROTO=TCP SPT=49620 DPT=9090 WINDOW=29200 RES=0x00 SYN URGP=0 MARK=0x4000 
Oct  9 15:34:59 k8s-master kernel: nat|1|KUBE-SVC-FNI7RW7PEKOXZDIN= OUT=eth0 SRC=172.16.0.2 DST=10.110.238.91 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6879 DF PROTO=TCP SPT=49620 DPT=9090 WINDOW=29200 RES=0x00 SYN URGP=0 MARK=0x4000 
Oct  9 15:34:59 k8s-master kernel: nat|3|KUBE-SEP-6FRGWTS5YGV54XIN= OUT=eth0 SRC=172.16.0.2 DST=10.110.238.91 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6879 DF PROTO=TCP SPT=49620 DPT=9090 WINDOW=29200 RES=0x00 SYN URGP=0 MARK=0x4000 
Oct  9 15:34:59 k8s-master kernel: filter|3|OUTPUTIN= OUT=eth0 SRC=172.16.0.2 DST=10.244.1.6 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6879 DF PROTO=TCP SPT=49620 DPT=9090 WINDOW=29200 RES=0x00 SYN URGP=0 MARK=0x4000 
Oct  9 15:34:59 k8s-master kernel: nat|1|POSTROUTINGIN= OUT=flannel.1 SRC=172.16.0.2 DST=10.244.1.6 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6879 DF PROTO=TCP SPT=49620 DPT=9090 WINDOW=29200 RES=0x00 SYN URGP=0 MARK=0x4000 
Oct  9 15:34:59 k8s-master kernel: nat|1|KUBE-POSTROUTINGIN= OUT=flannel.1 SRC=172.16.0.2 DST=10.244.1.6 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6879 DF PROTO=TCP SPT=49620 DPT=9090 WINDOW=29200 RES=0x00 SYN URGP=0 MARK=0x4000 
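One detail visible above: the prefix runs straight into IN= because it was written without a trailing space. The table|position|chain prefix can still be cut out mechanically, for example with sed (a sketch, assuming the log format shown above):

```shell
#!/bin/bash
# Pull the "table|position|chain" prefix out of a kernel LOG line.
line='Oct  9 15:34:59 k8s-master kernel: nat|1|OUTPUTIN= OUT=eth0 SRC=172.16.0.2 DST=10.110.238.91 PROTO=TCP DPT=9090'

# the prefix has no trailing space, so anchor on the literal "IN=" after it
printf '%s\n' "$line" |
  sed -n 's/.*kernel: \([a-z]*|[0-9]*|[A-Z-]*\)IN=.*/\1/p'   # prints nat|1|OUTPUT
```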
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.110.238.91/32 -p tcp -m comment --comment "default/prometheus:tcp cluster IP" -m tcp --dport 9090 -j KUBE-MARK-MASQ
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-SERVICES -d 10.110.238.91/32 -p tcp -m comment --comment "default/prometheus:tcp cluster IP" -m tcp --dport 9090 -j KUBE-SVC-FNI7RW7PEKOXZDFO
-A KUBE-SVC-FNI7RW7PEKOXZDFO -j KUBE-SEP-6FRGWTS5YGV54XWV
-A KUBE-SEP-6FRGWTS5YGV54XWV -p tcp -m tcp -j DNAT --to-destination 10.244.1.6:9090
-A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE

Since the request is issued by curl on the node itself, i.e. by a local process, the first matching rule is in the OUTPUT chain and the packet never traverses FORWARD.

Interestingly, on an OpenShift cluster, node-to-svc traffic is not masqueraded. Recall that before a locally generated packet reaches the OUTPUT chain, a routing decision based on the destination IP determines which interface it leaves from. OpenShift installs a per-node route such as 172.30.0.0/16 dev tun0 (tun0 effectively being the gateway for all pods on the node), so by the time the packet hits the iptables rules its source address is tun0's, i.e. within the pod subnet, and the MASQ rule no longer matches. Whether node-to-svc traffic gets masqueraded therefore depends on the routes the network plugin installs on the node.

2. A pod on the master node accessing the svc: 10.244.0.4 -> 10.110.238.91

# enter the coredns pod's network namespace before curling the svc
cid=$(docker ps | grep -i coredns | awk '{print $1}' | head -1)
pid=$(docker inspect "$cid" --format '{{.State.Pid}}')
nsenter -t "$pid" -n
Oct 10 14:43:44 k8s-master kernel: nat|1|PREROUTINGIN=cni0 OUT= PHYSIN=veth5082b103 MAC=0a:58:0a:f4:00:01:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.110.238.91 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=55653 DF PROTO=TCP SPT=51398 DPT=9090 WINDOW=28200 RES=0x00 SYN URGP=0 
Oct 10 14:43:44 k8s-master kernel: nat|15|KUBE-SERVICESIN=cni0 OUT= PHYSIN=veth5082b103 MAC=0a:58:0a:f4:00:01:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.110.238.91 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=55653 DF PROTO=TCP SPT=51398 DPT=9090 WINDOW=28200 RES=0x00 SYN URGP=0 
Oct 10 14:43:44 k8s-master kernel: nat|1|KUBE-SVC-FNI7RW7PEKOXZDIN=cni0 OUT= PHYSIN=veth5082b103 MAC=0a:58:0a:f4:00:01:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.110.238.91 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=55653 DF PROTO=TCP SPT=51398 DPT=9090 WINDOW=28200 RES=0x00 SYN URGP=0 
Oct 10 14:43:44 k8s-master kernel: nat|3|KUBE-SEP-6FRGWTS5YGV54XIN=cni0 OUT= PHYSIN=veth5082b103 MAC=0a:58:0a:f4:00:01:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.110.238.91 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=55653 DF PROTO=TCP SPT=51398 DPT=9090 WINDOW=28200 RES=0x00 SYN URGP=0 
Oct 10 14:43:44 k8s-master kernel: filter|1|FORWARDIN=cni0 OUT=flannel.1 PHYSIN=veth5082b103 MAC=0a:58:0a:f4:00:01:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.244.1.6 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=55653 DF PROTO=TCP SPT=51398 DPT=9090 WINDOW=28200 RES=0x00 SYN URGP=0 
Oct 10 14:43:44 k8s-master kernel: filter|3|FORWARDIN=cni0 OUT=flannel.1 PHYSIN=veth5082b103 MAC=0a:58:0a:f4:00:01:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.244.1.6 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=55653 DF PROTO=TCP SPT=51398 DPT=9090 WINDOW=28200 RES=0x00 SYN URGP=0 
Oct 10 14:43:44 k8s-master kernel: nat|3|POSTROUTINGIN= OUT=flannel.1 PHYSIN=veth5082b103 SRC=10.244.0.4 DST=10.244.1.6 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=55653 DF PROTO=TCP SPT=51398 DPT=9090 WINDOW=28200 RES=0x00 SYN URGP=0 
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A KUBE-SERVICES -d 10.110.238.91/32 -p tcp -m comment --comment "default/prometheus:tcp cluster IP" -m tcp --dport 9090 -j KUBE-SVC-FNI7RW7PEKOXZDFO
-A KUBE-SVC-FNI7RW7PEKOXZDFO -j KUBE-SEP-6FRGWTS5YGV54XWV
-A KUBE-SEP-6FRGWTS5YGV54XWV -p tcp -m tcp -j DNAT --to-destination 10.244.1.6:9090
-A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
-A FORWARD -s 10.244.0.0/16 -j ACCEPT
-A POSTROUTING -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN

When a pod on the node accesses a svc, the packet enters the host through the veth pair, so matching starts at PREROUTING, which corresponds to the first log entry.

With flannel vxlan, if both the source and destination IP are within the pod subnet, the request must have originated from a pod, so no SNAT is needed here.
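That subnet test is plain CIDR membership, the same check performed by -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN. A purely illustrative re-implementation in bash (iptables of course does this natively):

```shell
#!/bin/bash
# CIDR membership check mirroring the pod-to-pod RETURN rule in POSTROUTING.
ip2int() {
  local IFS=. ; set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}
in_cidr() {  # in_cidr IP NET/PREFIX
  local net=${2%/*} bits=${2#*/}
  local mask=$(( 0xFFFFFFFF << (32 - bits) & 0xFFFFFFFF ))
  [ $(( $(ip2int "$1") & mask )) -eq $(( $(ip2int "$net") & mask )) ]
}

src=10.244.0.4 dst=10.244.1.6
if in_cidr "$src" 10.244.0.0/16 && in_cidr "$dst" 10.244.0.0/16; then
  echo "pod-to-pod: RETURN, no MASQUERADE"   # this path is taken here
else
  echo "would fall through to MASQUERADE"
fi
```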

3. A pod accessing its own svc (and being selected as the backend): 10.244.0.4 -> 10.96.0.10

Oct 10 15:19:02 k8s-master kernel: nat|1|PREROUTINGIN=cni0 OUT= PHYSIN=veth5082b103 MAC=0a:58:0a:f4:00:01:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.96.0.10 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27522 DF PROTO=TCP SPT=57672 DPT=53 WINDOW=28200 RES=0x00 SYN URGP=0 
Oct 10 15:19:02 k8s-master kernel: nat|11|KUBE-SERVICESIN=cni0 OUT= PHYSIN=veth5082b103 MAC=0a:58:0a:f4:00:01:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.96.0.10 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27522 DF PROTO=TCP SPT=57672 DPT=53 WINDOW=28200 RES=0x00 SYN URGP=0 
Oct 10 15:19:02 k8s-master kernel: nat|1|KUBE-SEP-SF3LG62VAE5ALYIN=cni0 OUT= PHYSIN=veth5082b103 MAC=0a:58:0a:f4:00:01:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.96.0.10 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27522 DF PROTO=TCP SPT=57672 DPT=53 WINDOW=28200 RES=0x00 SYN URGP=0 
Oct 10 15:19:02 k8s-master kernel: nat|1|KUBE-MARK-MASQIN=cni0 OUT= PHYSIN=veth5082b103 MAC=0a:58:0a:f4:00:01:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.96.0.10 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27522 DF PROTO=TCP SPT=57672 DPT=53 WINDOW=28200 RES=0x00 SYN URGP=0 
Oct 10 15:19:02 k8s-master kernel: nat|3|KUBE-SEP-SF3LG62VAE5ALYIN=cni0 OUT= PHYSIN=veth5082b103 MAC=0a:58:0a:f4:00:01:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.96.0.10 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27522 DF PROTO=TCP SPT=57672 DPT=53 WINDOW=28200 RES=0x00 SYN URGP=0 MARK=0x4000 
Oct 10 15:19:02 k8s-master kernel: filter|1|FORWARDIN=cni0 OUT=cni0 PHYSIN=veth5082b103 PHYSOUT=veth5082b103 MAC=0a:58:0a:f4:00:04:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.244.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27522 DF PROTO=TCP SPT=57672 DPT=53 WINDOW=28200 RES=0x00 SYN URGP=0 MARK=0x4000 
Oct 10 15:19:02 k8s-master kernel: filter|1|KUBE-FORWARDIN=cni0 OUT=cni0 PHYSIN=veth5082b103 PHYSOUT=veth5082b103 MAC=0a:58:0a:f4:00:04:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.244.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27522 DF PROTO=TCP SPT=57672 DPT=53 WINDOW=28200 RES=0x00 SYN URGP=0 MARK=0x4000 
Oct 10 15:19:02 k8s-master kernel: nat|1|POSTROUTINGIN= OUT=cni0 PHYSIN=veth5082b103 PHYSOUT=veth5082b103 SRC=10.244.0.4 DST=10.244.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27522 DF PROTO=TCP SPT=57672 DPT=53 WINDOW=28200 RES=0x00 SYN URGP=0 MARK=0x4000 
Oct 10 15:19:02 k8s-master kernel: nat|1|KUBE-POSTROUTINGIN= OUT=cni0 PHYSIN=veth5082b103 PHYSOUT=veth5082b103 SRC=10.244.0.4 DST=10.244.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27522 DF PROTO=TCP SPT=57672 DPT=53 WINDOW=28200 RES=0x00 SYN URGP=0 MARK=0x4000 
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-SF3LG62VAE5ALYDV
-A KUBE-SEP-SF3LG62VAE5ALYDV -s 10.244.0.4/32 -j KUBE-MARK-MASQ
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-SEP-SF3LG62VAE5ALYDV -p tcp -m tcp -j DNAT --to-destination 10.244.0.4:53
-A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE

Again the packet enters via PREROUTING. The service has two backends, so with probability 0.5 the pod selects itself. The source IP is then the pod's own address, so MASQ is required: otherwise the pod would see a packet apparently sent from itself to itself (10.244.0.4 -> 10.244.0.4) rather than from the service (10.96.0.10 -> 10.244.0.4), and the handshake would fail. With MASQUERADE, the reply first returns to the host node, where the conntrack entry is used to rewrite the source IP back to the svc IP.

Similarly, when an external client accesses the cluster through a NodePort, SNAT is also needed; otherwise the backend's reply would go straight back to the client and break the handshake.

Open issues

When filtering /var/log/iptables by IP, the recorded rule trail is often incomplete — most likely because the limit module (1/sec with burst 1) suppresses LOG entries once traffic exceeds that rate.

Original post (in Chinese): https://www.cnblogs.com/orchidzjl/p/13784264.html