Kubernetes 极简备忘录
Table of Contents
- 1. 应用组件
- 2. Volume
- 3. Taint/Toleration
- 4. 网络
- 5. Service Account
- 6. Role&RoleBinding
- 7. Tips
- 7.1. 安装 master/worker
- 7.2. YAML 模板
- 7.3. 查看资源版本信息
- 7.4. 更新 join command
- 7.5. 删除 label
- 7.6. 创建 KUBECONFIG
- 7.7. 创建用于测试的容器
- 7.8. Helm 简单使用
- 7.9. 更新证书
- 7.10. 查看 etcd
- 7.11. 安装私有仓库
- 7.12. 自定义输出格式
- 7.13. 暂停调度
- 7.14. 查看 Requests/Limits 使用情况
- 7.15. drain/cordon Node
- 7.16. 升级流程
- 7.17. 备份所有应用配置
- 7.18. 查看所有 Named Group 【非 Core Group】
- 7.19. 本地端口转发
- 7.20. 获取所有节点 IP 【jsonpath用法】
- 7.21. 强制删除 Pod
- 7.22. 拷贝本地文件
1 应用组件
Figure 1: 应用组件概览
1.1 Pod
Pod 是一切资源的中心,Kubernetes 可以看做是 Pod 编排系统,大多数 Kubernetes 资源都是为了创建,监控,更新 Pod 而设计的。
Figure 2: 以 Pod 为中心的编排系统
1.1.1 Command/Argument
Figure 3: Pod command/args 与 Dockerfile ENTRYPOINT/CMD 对应关系
1.1.1.1 ENTRYPOINT/CMD
nginx Dockerfile:【实际运行的是: /docker-entrypoint.sh nginx -g 'daemon off;' 】
...
ENTRYPOINT ["/docker-entrypoint.sh"]   # ENTRYPOINT 是必须要执行的,不能被覆盖
EXPOSE 80
STOPSIGNAL SIGTERM
CMD ["nginx", "-g", "daemon off;"]     # 如果有指定 ENTRYPOINT ,CMD 则作为 ENTRYPOINT 的参数
# CMD 是可以被覆盖的,例如 docker run nginx my-command ,则 my-command 就把 CMD 覆盖掉了
busybox Dockerfile:【实际运行的就是: sh 】
ADD file ... in /
CMD ["sh"] # busybox 没有定义 ENTRYPOINT ,因此运行容器时可以指定任意命令,从而将 CMD 覆盖掉
1.1.2 设计模式
1.1.3 Static Pod
Static Pod 由 kubelet 直接管理 。【没有在 kube-apiserver 中注册过,但是 apiserver 可以检测到 static pod 的存在】
默认情况下,static pod 由 /etc/kubernetes/manifests/ 目录下的 Pod 定义文件所定义并由 kubelet 进行创建【Pod naming pattern: <pod-name>-<node-name> 】。
想要删除 static pod ,只能通过删除其定义文件来实现。
# 如何找到 manifests 目录:
$> ps -ef | grep [/]usr/bin/kubelet
root  854  1  2 11:49 ?  00:07:23 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.6
$> grep static /var/lib/kubelet/config.yaml
staticPodPath: /etc/kubernetes/manifests
控制平面的 Pods ,如 etcd, kube-apiserver, kube-controller-manager 以及 kube-scheduler ,它们都是 static Pod 。
(master)$> ls -l /etc/kubernetes/manifests/
total 16
-rw------- 1 root root 2209 May 12 22:42 etcd.yaml
-rw------- 1 root root 4004 May 12 22:42 kube-apiserver.yaml
-rw------- 1 root root 3516 May 12 22:42 kube-controller-manager.yaml
-rw------- 1 root root 1435 May 12 22:42 kube-scheduler.yaml
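创建/删除 static pod 的完整过程大致如下(文件名与镜像仅作举例,路径以 kubelet 配置中的 staticPodPath 为准):
# 示意:在某个节点上创建 static pod
cat <<'EOF' > /etc/kubernetes/manifests/static-web.yaml
apiVersion: v1
kind: Pod
metadata:
  name: static-web
spec:
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: 80
EOF
# 稍等片刻后,可在 apiserver 中看到名为 static-web-<node-name> 的镜像 Pod
kubectl get pod
# 删除 static pod:只能删除定义文件
rm /etc/kubernetes/manifests/static-web.yaml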
1.2 Deployment
1.2.1 RollingUpdate/rollout
滚动升级与回退示例
$> k create deployment my-deployment --image=ruanhao/kubia:v1 --replicas=3
deployment.apps/my-deployment created
$> k set image deployment/my-deployment kubia=ruanhao/kubia:v2 --record
Flag --record has been deprecated, --record will be removed in the future
deployment.apps/my-deployment image updated
$> k get all
NAME                                 READY   STATUS    RESTARTS   AGE
pod/my-deployment-7df8999569-b9fwq   1/1     Running   0          14s
pod/my-deployment-7df8999569-j6226   1/1     Running   0          16s
pod/my-deployment-7df8999569-wm5g8   1/1     Running   0          12s

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   3m25s

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/my-deployment   3/3     3            3           2m16s

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/my-deployment-6dcbbf6fb4   0         0         0       2m16s   # old version of rs
replicaset.apps/my-deployment-7df8999569   3         3         3       16s     # new version of rs

$> k rollout history deployment my-deployment   # 查看历史
deployment.apps/my-deployment
REVISION  CHANGE-CAUSE
1         <none>
2         kubectl set image deployment/my-deployment kubia=ruanhao/kubia:v2 --record=true

$> k describe deployment/my-deployment | grep Image
    Image:        ruanhao/kubia:v2
$> k rollout undo deployment my-deployment   # 回退
deployment.apps/my-deployment rolled back
$> k describe deployment/my-deployment | grep Image
    Image:        ruanhao/kubia:v1
$> k get all
NAME                                 READY   STATUS    RESTARTS   AGE
pod/my-deployment-6dcbbf6fb4-2h2rs   1/1     Running   0          118s
pod/my-deployment-6dcbbf6fb4-2hnnp   1/1     Running   0          2m
pod/my-deployment-6dcbbf6fb4-r775p   1/1     Running   0          117s

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   10m

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/my-deployment   3/3     3            3           9m19s

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/my-deployment-6dcbbf6fb4   3         3         3       9m19s
replicaset.apps/my-deployment-7df8999569   0         0         0       7m19s   # 仍然存在
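滚动升级的节奏可以通过 Deployment 的 strategy 字段控制,下面是一个假设的片段(数值仅作示意),配合 rollout status / undo --to-revision 使用:
# 示意:通过 maxSurge/maxUnavailable 控制滚动升级节奏,并按 revision 回退
kubectl patch deployment my-deployment --patch '
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # 升级期间最多额外创建 1 个 Pod
      maxUnavailable: 0    # 升级期间不允许出现不可用的 Pod
'
kubectl rollout status deployment/my-deployment                  # 观察滚动进度
kubectl rollout undo deployment/my-deployment --to-revision=1    # 回退到指定 revision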
1.3 Service
Figure 9: ClusterIP 是默认的 Service 类型
Figure 10: 端口对应关系
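结合 Figure 10 ,下面用一个假设的 NodePort Service 说明 port / targetPort / nodePort 三者的对应关系(端口值仅作示意):
# 示意:port 是 Service 自身端口,targetPort 是 Pod 容器端口,nodePort 是节点上暴露的端口
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: kubia-nodeport
spec:
  type: NodePort            # 不写 type 时默认为 ClusterIP
  selector:
    app: kubia
  ports:
  - port: 80                # 访问 ClusterIP:80
    targetPort: 8080        # 转发到 Pod 的 8080
    nodePort: 30123         # 节点上的 <NodeIP>:30123
EOF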
1.4 Ingress
Ingress 可以看成提供 NodePort/LoadBalancer 服务的 HTTP 反向代理。
Figure 11: Ingress 与 Service 的比较
1.4.1 安装 (Bare metal/NodePort)
# k8s version must > 1.19
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.2.0/deploy/static/provider/baremetal/deploy.yaml
1.4.2 部署示例
1.4.2.1 基于主机名
$ k get service -n ingress-nginx
NAME                                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx-controller             NodePort    10.110.205.94   <none>        80:30590/TCP,443:32265/TCP   11h
ingress-nginx-controller-admission   ClusterIP   10.110.98.216   <none>        443/TCP                      11h

$ curl v1.finditnm.com:30590
Demo(v1) [POD: kubia-v1-56567467d5-qcwvh]
$ curl v2.finditnm.com:30590
Demo(v2) [POD: kubia-v2-8d64b5745-78wq4]
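上面基于主机名的路由,对应的 Ingress 资源大致如下(域名与 Service 名取自示例,其余细节为假设):
# 示意:基于主机名的 Ingress(v1.finditnm.com -> kubia-v1 ,v2 同理)
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kubia-by-host
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: v1.finditnm.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: kubia-v1
            port:
              number: 80
  - host: v2.finditnm.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: kubia-v2
            port:
              number: 80
EOF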
1.4.2.2 基于路径
# 留意两种不同的 rewrite 写法的效果
$ curl www.finditnm.com:30590/kubia/v1/hello/world
demo(v1), uri: /, pod: kubia-v1-56567467d5-x9rtn, ip: 192.168.1.25
$ curl www.finditnm.com:30590/kubia/v2/hello/world
demo(v2), uri: /hello/world, pod: kubia-v2-8d64b5745-6h5zd, ip: 192.168.1.24
2 Volume
在 .spec.volumes 中定义 Volume ,并在 .spec.containers[].volumeMounts 中完成挂载。
2.1 emptyDir
emptyDir 会随 Pod 一起删除,通常用于在 多个容器间 共享存储。
apiVersion: v1
kind: Pod
metadata:
  name: multi-pod
spec:
  containers:
  - name: nginx-container
    image: nginx:latest
    volumeMounts:
    - name: log-dir
      mountPath: /var/log/nginx
  - name: fluentd-container
    image: fluentd:latest
    volumeMounts:
    - name: log-dir
      mountPath: /log
  volumes:
  - name: log-dir
    emptyDir: {}
emptyDir 类型的卷,实际上是在承载 Pod 的工作节点的 实际磁盘 上创建出来的。可以设置 medium: Memory 使用内存(tmpfs)来创建卷。
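使用内存介质时的写法大致如下(sizeLimit 为假设值):
# 示意:emptyDir 使用 tmpfs(内存),并限制大小
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: tmpfs-demo
spec:
  containers:
  - name: main
    image: busybox
    command: ["sh", "-c", "df -h /cache && sleep 3600"]
    volumeMounts:
    - name: cache
      mountPath: /cache
  volumes:
  - name: cache
    emptyDir:
      medium: Memory      # 使用 tmpfs
      sizeLimit: 64Mi     # 限制大小(假设值)
EOF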
2.2 PV/PVC
Figure 15: PV 是 集群级别 的资源,PVC 是 应用级别 的,且 PV 和 PVC 是 一对一 的绑定
Figure 16: PV 本质是定义如何创建挂载点,PVC 则是使用挂载点,PVC 可以被不同 Pod 共享
2.2.1 Reclaim Policy
- Retain (PVC 删除后,PV 及 Volume 中的内容保留,需手动回收)
- Recycle (PVC 删除后,Volume 中内容全部被清除,PV 可再次被绑定)
- Delete (通常用于云服务提供商所提供的 Volume ,PVC 删除后自动将 AWS EBS/GCE PD/Azure Disk 等底层存储资源删除)
2.2.2 Access Mode
Access Mode 对于 PV 而言是定义 Volume 的访问权限的,而对于 PVC 而言,则是用于 匹配 PV 的。
ReadWriteOnce , ReadOnlyMany 和 ReadWriteMany 是针对 Node 而言的,即一个 PV 是否能在多个物理主机上创建挂载点。比如 PV1(ReadWriteOnce) 一开始与 PVC1 绑定并在 Node1 上创建了挂载点,解绑后又与另一个 PVC2 绑定了,那么使用 PVC2 的 Pod 仍然需要位于 Node1 上。
ReadWriteOncePod 表示 PVC 不能被多个 Pod 共享。
针对不同的 Volume 类型,支持的 Access Mode 也不相同,使用时需要查阅文档。1
2.2.3 绑定/匹配
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
  labels:
    type: nfs
spec:
  storageClassName: nfs
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    path: /nfsdata
    server: 10.74.107.7

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  storageClassName: nfs
  resources:
    requests:
      storage: 1Gi
  accessModes:
  - ReadWriteOnce
  selector:
    matchLabels:
      type: nfs
2.2.4 静态 PV 配置示例 2
Figure 17: 每个节点都要安装 nfs-common
YAML:
$ k get pod -o wide
NAME                 READY   STATUS    RESTARTS   AGE   IP             NODE      NOMINATED NODE   READINESS GATES
static-pv-demo-pod   1/1     Running   0          53s   192.168.2.63   worker2   <none>           <none>

$ k get pvc -o wide
NAME     STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE   VOLUMEMODE
nfspvc   Bound    nfspv    10Gi       RWO            nfs            60s   Filesystem

$ k get pv -o wide
NAME    CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM            STORAGECLASS   REASON   AGE   VOLUMEMODE
nfspv   10Gi       RWO            Recycle          Bound    default/nfspvc   nfs                     63s   Filesystem

$ k describe pv nfspv
Name:            nfspv
Labels:          <none>
Annotations:     pv.kubernetes.io/bound-by-controller: yes
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    nfs
Status:          Bound
Claim:           default/nfspvc
Reclaim Policy:  Recycle
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        10Gi
Node Affinity:   <none>
Message:
Source:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    10.74.107.7
    Path:      /nfsdata
    ReadOnly:  false
Events:          <none>
3 Taint/Toleration
通过 为 Node 设置 Taint 与 为 Pod 设置 Toleration 来实现调度策略。
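一个最小示意如下(taint 的 key/value 与节点名仅作举例):
# 示意:为节点打 taint ,并为 Pod 添加对应的 toleration
kubectl taint nodes worker1 dedicated=gpu:NoSchedule       # 不容忍该 taint 的 Pod 不会被调度到 worker1
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: main
    image: nginx
  tolerations:
  - key: dedicated
    operator: Equal
    value: gpu
    effect: NoSchedule
EOF
kubectl taint nodes worker1 dedicated=gpu:NoSchedule-      # 删除 taint(结尾加 -)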
3.1 Node Affinity
Taint/Toleration 适用于基于标签的简单选择,Node Affinity 适用于节点选择逻辑较为复杂的情况。
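例如,下面的 Pod 要求节点必须带有 disktype=ssd 标签,并尽量调度到 zone=zone-a 的节点(标签均为假设):
# 示意:requiredDuringScheduling 为硬性要求,preferredDuringScheduling 为软性倾向
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: affinity-demo
spec:
  containers:
  - name: main
    image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd"]
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: zone
            operator: In
            values: ["zone-a"]
EOF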
4 网络
4.1 基于网桥的三层网络模型(理论) 3
Figure 20: 仅使用三层网络构建的网络模型
此网络模型在不使用 Overlay 技术的情况下实现 Pod 间的通信,是最简单的网络模型。 结合 iptables 的 DNAT 功能实现 Pod 与 Service 之间的通信。
此模型中物理网卡作为网桥的 Slave 设备,节点之间只能通过二层设备互联,不能跨路由器,因此多用于局域网,无法跨机房、跨区域大面积部署。
在 Host1 中需要设置路由:(Host2 同理)
echo 1 > /proc/sys/net/ipv4/ip_forward
ip route add 172.17.2.0/24 via 10.10.103.92 dev eth0
ip route add 172.17.1.0/24 dev docker0
在 Pod-1/Pod-2 中 也需要做配置:(Pod-3/Pod-4 同理)
ip route add 10.10.103.91 dev eth0
ip route add default via 10.10.103.91 dev eth0
4.2 CNI
CNI 是一个用于构建虚拟化网络的插件机制,它有两方面的含义:① 制定操作规范 (Specifications);② 提供工具命令(位于 /opt/cni/bin/ 目录下)。
创建容器网络的简单示例
Figure 21: 创建网桥
Figure 22: 创建 Network Namespace
Figure 23: 创建 VETH PAIR
Figure 24: 将 VETH PAIR 连接到 NS 和 Bridge
Figure 25: 为 NS 中的接口添加地址
Figure 26: 在根 NS 中配置 iptables (DNAT)
以下面的脚本为例,CNI 就是制定了 ADD/DELETE/CHECK/VERSION 这些动作(接口),而 Calico, Flannel 等则负责具体的实现,就像上述的示例。
#!/bin/bash
case $1 in
  "ADD")
    # Add a network to a container
    ip l add br0 type bridge
    ip l set br0 up
    ip netns add ns1
    ip l add name veth123 type veth peer name veth456
    ip l set veth123 netns ns1 name eth0      # 将 veth 一端放入 ns1 并命名为 eth0
    ip netns exec ns1 ip l set eth0 up
    ip l set dev veth456 master br0           # 另一端挂到网桥上
    ip l set veth456 up
    ip netns exec ns1 ip a add 172.32.2.1/24 dev eth0
    iptables -t nat -A DOCKER -p tcp --dport 8080 -j DNAT --to-destination 172.32.2.1:80
    ;;
  "DELETE")
    # Remove a network from a container
    ;;
  "CHECK")
    # Get network information for a container
    ;;
  "VERSION")
    # CNI version
    ;;
esac
Figure 27: CNI plugin 负责生成对容器网络的描述
Figure 28: kubelet 负责 CNI plugin 的加载
4.2.1 Flannel 4
Figure 29: Flannel 网络示意图
①
(pod1) # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth0@if11: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue
    link/ether 6a:97:3e:20:55:e6 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.7/24 brd 192.168.1.255 scope global eth0
       valid_lft forever preferred_lft forever
(pod1) # ip r
default via 192.168.1.1 dev eth0
192.168.0.0/16 via 192.168.1.1 dev eth0
192.168.1.0/24 dev eth0 scope link  src 192.168.1.7
#################################################################
(worker-1) $ ip a
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:de:f0:74:4f brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 7e:4f:1b:41:c7:7a brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::7c4f:1bff:fe41:c77a/64 scope link
       valid_lft forever preferred_lft forever
5: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether e2:b0:c7:1e:8e:7a brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.1/24 brd 192.168.1.255 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 fe80::e0b0:c7ff:fe1e:8e7a/64 scope link
       valid_lft forever preferred_lft forever
11: veth6b3caf45@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP group default
    link/ether de:9c:43:da:8b:ce brd ff:ff:ff:ff:ff:ff link-netnsid 3
    inet6 fe80::dc9c:43ff:feda:8bce/64 scope link
       valid_lft forever preferred_lft forever
(worker-1)$ brctl show
bridge name     bridge id               STP enabled     interfaces
cni0            8000.e2b0c71e8e7a       no              veth100d66d7
                                                        veth6b3caf45
                                                        vetheed12777
docker0         8000.0242def0744f       no
② cni0 收到 IP 包,因为 network namespace 发生了切换,即 root namespace 收到一个数据包,如果 pod1 访问的是 service url (cluster ip) ,则在这一步,会根据 iptables NAT 表 PREROUTING 链,修改目标 ip/port 。因此无论是访问 service 还是直接访问 pod ,目标 ip/port 指向的永远是 endpoint 。
由于 endpoint 为 192.168.2.12:80 ,根据路由表,数据将被交给 flannel.1 设备:
(worker-1)$ ip r
default via 10.74.107.154 dev eth0 proto static
10.74.107.0/24 dev eth0 proto kernel scope link src 10.74.107.63
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.0.0/24 via 192.168.0.0 dev flannel.1 onlink
192.168.1.0/24 dev cni0 proto kernel scope link src 192.168.1.1
192.168.2.0/24 via 192.168.2.0 dev flannel.1 onlink
③ flannel.1 实际上是 vxlan 设备,驱动程序会依据 etcd 中记录的转发规则确定 vxlan header ,即目标 ip 为 worker-2 。
④ 当数据包即将从物理网卡发出时,会根据 iptables nat 表的 POSTROUTING 链执行 SNAT 转换。
在 worker-2 上进行抓包: PCAP 💾
Frame 361: 192 bytes on wire (1536 bits), 192 bytes captured (1536 bits)
Ethernet II, Src: VMware_82:51:0c (00:50:56:82:51:0c), Dst: VMware_82:a9:01 (00:50:56:82:a9:01)
Internet Protocol Version 4, Src: 10.74.107.63, Dst: 10.74.107.62        外层 IP
User Datagram Protocol, Src Port: 35115, Dst Port: 8472
Virtual eXtensible Local Area Network
Ethernet II, Src: 7e:4f:1b:41:c7:7a (7e:4f:1b:41:c7:7a), Dst: a6:0d:d2:cd:65:14 (a6:0d:d2:cd:65:14)
Internet Protocol Version 4, Src: 192.168.1.7, Dst: 192.168.2.12         内层 IP
Transmission Control Protocol, Src Port: 50058, Dst Port: 80, Seq: 1, Ack: 1, Len: 76
Hypertext Transfer Protocol
    GET / HTTP/1.1\r\n
    Host: 192.168.2.12\r\n
    User-Agent: curl/7.64.1\r\n
    Accept: */*\r\n
    \r\n
    [Full request URI: http://192.168.2.12/]
    [HTTP request 1/1]
    [Response in frame: 363]
⑤ 当数据包到达 worker-2 后,由于内核 vxlan 驱动(即 flannel.1 设备,由 flanneld 负责配置)监听 udp:8472 端口,vxlan 报文被解封装,再根据路由表可知,数据包需要转交给 cni0 ,最终到达 pod2 。
root@worker-2(10.74.107.62):/ $ ps -ef | grep [f]lannel
root  16763  16743  0 11:41 ?  00:00:07 /opt/bin/flanneld --ip-masq --kube-subnet-mgr
root@worker-2(10.74.107.62):/ $ ss -ulpne
State    Recv-Q   Send-Q   Local Address:Port    Peer Address:Port   Process
UNCONN   0        0        127.0.0.53%lo:53      0.0.0.0:*           users:(("systemd-resolve",pid=629,fd=12)) uid:101 ino:22660 sk:8b <->
UNCONN   0        0        0.0.0.0:8472          0.0.0.0:*           ino:184269 sk:8c <->
root@worker-2(10.74.107.62):/ $ ip r
default via 10.74.107.154 dev eth0 proto static
10.74.107.0/24 dev eth0 proto kernel scope link src 10.74.107.62
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.0.0/24 via 192.168.0.0 dev flannel.1 onlink
192.168.1.0/24 via 192.168.1.0 dev flannel.1 onlink
192.168.2.0/24 dev cni0 proto kernel scope link src 192.168.2.1
4.2.2 Calico
Figure 30: Calico 网络示意
网络原理和 Flannel 基本类似,不同的是 calico 不使用网桥设备作为容器的网关,而是将网关直接指向 veth pair 。
①
(pod1)/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth0@if16: mtu 1450 qdisc noqueue
    link/ether 7a:1d:70:95:60:29 brd ff:ff:ff:ff:ff:ff
    inet 192.168.226.74/32 scope global eth0
       valid_lft forever preferred_lft forever
(pod1)/ # ip r
default via 169.254.1.1 dev eth0                                          # 网关指向一个虚拟的本地链路地址
169.254.1.1 dev eth0 scope link
(pod1)/ # ip n
169.254.1.1 dev eth0 lladdr ee:ee:ee:ee:ee:ee used 0/0/0 probes 1 STALE   # 就是 veth 对端接口的物理地址
##############
(worker-1)$> ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:50:56:82:6b:75 brd ff:ff:ff:ff:ff:ff
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
    link/ether 02:42:20:9d:01:22 brd ff:ff:ff:ff:ff:ff
6: cali9558f23398a@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
7: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/ether 66:93:7f:54:d5:53 brd ff:ff:ff:ff:ff:ff
15: cali5d30ef99766@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
16: cali6abd5b4bbeb@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
(worker-1)$> ip r                                                         # 路由信息由 BIRD/felix 负责维护
default via 10.74.107.154 dev eth0 proto static
10.74.107.0/24 dev eth0 proto kernel scope link src 10.74.107.72
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.133.192/26 via 192.168.133.192 dev vxlan.calico onlink            # 去往位于 worker-2 上的 pod 的流量需经过 vxlan 设备进行封装
192.168.219.64/26 via 192.168.219.65 dev vxlan.calico onlink
blackhole 192.168.226.64/26 proto 80
192.168.226.65 dev cali9558f23398a scope link
192.168.226.73 dev cali5d30ef99766 scope link                             # 去往位于本机上 pod 的流量,直接走 cali 接口即可
192.168.226.74 dev cali6abd5b4bbeb scope link
⑤ PCAP 💾
Frame 54: 195 bytes on wire (1560 bits), 195 bytes captured (1560 bits)
Ethernet II, Src: VMware_82:6b:75 (00:50:56:82:6b:75), Dst: VMware_82:2a:53 (00:50:56:82:2a:53)
Internet Protocol Version 4, Src: 10.74.107.72, Dst: 10.74.107.73
User Datagram Protocol, Src Port: 59709, Dst Port: 4789
Virtual eXtensible Local Area Network
Ethernet II, Src: 66:93:7f:54:d5:53 (66:93:7f:54:d5:53), Dst: 66:e2:6c:ca:0c:ad (66:e2:6c:ca:0c:ad)
Internet Protocol Version 4, Src: 192.168.226.74, Dst: 192.168.133.200
Transmission Control Protocol, Src Port: 37570, Dst Port: 80, Seq: 1, Ack: 1, Len: 79
Hypertext Transfer Protocol
GET / HTTP/1.1\r\n
Host: 192.168.133.200\r\n
User-Agent: curl/7.64.1\r\n
Accept: */*\r\n
\r\n
[Full request URI: http://192.168.133.200/]
[HTTP request 1/1]
[Response in frame: 56]
4.2.2.1 组件
每个节点都会运行 BGP 客户端( bird 进程),节点上 pod 的网络信息通过 BGP 协议分发。
Figure 31: Calico 使用的几大组件
4.3 Network Policy 5
4.3.1 Deny all incoming traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-incoming
spec:
  podSelector: {}
  policyTypes:
  - Ingress

$> k describe networkpolicy deny-all-incoming
Name:         deny-all-incoming
Namespace:    default
Created on:   2022-05-16 18:33:57 +0800 CST
Labels:       <none>
Annotations:  <none>
Spec:
  PodSelector:     <none> (Allowing the specific traffic to all pods in this namespace)
  Allowing ingress traffic:
    <none> (Selected pods are isolated for ingress connectivity)
  Not affecting egress traffic
  Policy Types: Ingress
4.3.2 Deny all outgoing traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-outgoing
spec:
  podSelector: {}
  policyTypes:
  - Egress

$ k describe networkpolicy deny-all-outgoing
Name:         deny-all-outgoing
Namespace:    default
Created on:   2022-05-16 18:33:57 +0800 CST
Labels:       <none>
Annotations:  <none>
Spec:
  PodSelector:     <none> (Allowing the specific traffic to all pods in this namespace)
  Not affecting ingress traffic
  Allowing egress traffic:
    <none> (Selected pods are isolated for egress connectivity)
  Policy Types: Egress
4.3.3 Allow all incoming traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-incoming
spec:
  podSelector: {}
  ingress:
  - {}
  policyTypes:
  - Ingress

$ k describe networkpolicy allow-all-incoming
Name:         allow-all-incoming
Namespace:    default
Created on:   2022-05-16 18:33:57 +0800 CST
Labels:       <none>
Annotations:  <none>
Spec:
  PodSelector:     <none> (Allowing the specific traffic to all pods in this namespace)
  Allowing ingress traffic:
    To Port: <any> (traffic allowed to all ports)
    From: <any> (traffic not restricted by source)
  Not affecting egress traffic
  Policy Types: Ingress
4.3.4 Allow all outgoing traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-outgoing
spec:
  podSelector: {}
  egress:
  - {}
  policyTypes:
  - Egress

$ k describe networkpolicy allow-all-outgoing
Name:         allow-all-outgoing
Namespace:    default
Created on:   2022-05-16 17:55:21 +0800 CST
Labels:       <none>
Annotations:  <none>
Spec:
  PodSelector:     <none> (Allowing the specific traffic to all pods in this namespace)
  Not affecting ingress traffic
  Allowing egress traffic:
    To Port: <any> (traffic allowed to all ports)
    To: <any> (traffic not restricted by destination)
  Policy Types: Egress
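实际使用中更常见的是选择性放行,例如只允许带特定标签的 Pod 访问某个端口(标签与端口均为假设):
# 示意:只允许 app=frontend 的 Pod 访问 app=backend 的 80 端口
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 80
EOF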
5 Service Account
Figure 32: 基于 Token 的 SA
- 每个 Namespace 下都有一个默认的 SA ,在创建 NS 时会自动创建
- 默认情况下,SA 没有任何权限,通常需要手动授权【除非集群中没有启用 RBAC 插件,此时 SA 拥有所有权限】
- Pod 的 SA 必须在 Pod 创建时指定,后续无法修改(见下面的示意)
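创建 SA 并在 Pod 中使用的基本写法大致如下(名称仅作示意):
# 示意:创建 SA ,并在 Pod 创建时通过 serviceAccountName 指定
kubectl create serviceaccount my-sa
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: sa-demo
spec:
  serviceAccountName: my-sa     # 只能在创建时指定,之后不可修改
  containers:
  - name: main
    image: nginx
EOF
# Token 默认挂载在容器内 /var/run/secrets/kubernetes.io/serviceaccount/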
5.1 示例
5.1.1 允许访问 NS 下某些资源
资源清单
$> k exec -it test-namespace-view-sa -c main -- sh
/ # curl -s -D - localhost:8001/api/v1/namespaces/default/pods -o nul
HTTP/1.1 200 OK
Audit-Id: b975720a-3bbf-4ce4-a2c1-deeadedb549d
Cache-Control: no-cache, private
Content-Type: application/json
Date: Mon, 16 May 2022 08:53:49 GMT
X-Kubernetes-Pf-Flowschema-Uid: 03dfc3e7-af25-4320-b6e6-fd653c811b78
X-Kubernetes-Pf-Prioritylevel-Uid: b704a131-e510-4db4-97a5-2241ba7d48c3
Transfer-Encoding: chunked

/ # curl -s -D - localhost:8001/api/v1/namespaces/default/services -o nul
HTTP/1.1 403 Forbidden
Audit-Id: 5ba79ae8-d134-405c-b47c-b8238db4eb4a
Cache-Control: no-cache, private
Content-Length: 345
Content-Type: application/json
Date: Mon, 16 May 2022 08:53:57 GMT
X-Content-Type-Options: nosniff
X-Kubernetes-Pf-Flowschema-Uid: 03dfc3e7-af25-4320-b6e6-fd653c811b78
X-Kubernetes-Pf-Prioritylevel-Uid: b704a131-e510-4db4-97a5-2241ba7d48c3
/ #
5.1.2 允许访问 NS 下所有资源
资源清单
$ k exec -it test-namespace-view-sa -c main -- sh
/ # curl -s -D - localhost:8001/api/v1/namespaces/default/pods -o nul
HTTP/1.1 200 OK
Audit-Id: 00e19919-bee1-4ef8-a630-6617846a0cb4
Cache-Control: no-cache, private
Content-Type: application/json
Date: Mon, 16 May 2022 08:57:57 GMT
X-Kubernetes-Pf-Flowschema-Uid: 03dfc3e7-af25-4320-b6e6-fd653c811b78
X-Kubernetes-Pf-Prioritylevel-Uid: b704a131-e510-4db4-97a5-2241ba7d48c3
Transfer-Encoding: chunked

/ # curl -s -D - localhost:8001/api/v1/namespaces/default/services -o nul
HTTP/1.1 200 OK
Audit-Id: ffe4ea4e-d512-4ea0-ba3a-8ba3bd51b3ef
Cache-Control: no-cache, private
Content-Length: 2058
Content-Type: application/json
Date: Mon, 16 May 2022 08:58:03 GMT
X-Kubernetes-Pf-Flowschema-Uid: 03dfc3e7-af25-4320-b6e6-fd653c811b78
X-Kubernetes-Pf-Prioritylevel-Uid: b704a131-e510-4db4-97a5-2241ba7d48c3
5.1.3 赋予 SA 最高权限
资源清单
$ k exec -it test-admin-sa -c main -- sh
/ # curl -s -D - localhost:8001/api/v1/nodes -o nul
HTTP/1.1 200 OK
Audit-Id: 98bc3732-55de-4be3-8b69-f1c763e2b4a1
Cache-Control: no-cache, private
Content-Type: application/json
Date: Mon, 16 May 2022 09:05:55 GMT
X-Kubernetes-Pf-Flowschema-Uid: 03dfc3e7-af25-4320-b6e6-fd653c811b78
X-Kubernetes-Pf-Prioritylevel-Uid: b704a131-e510-4db4-97a5-2241ba7d48c3
Transfer-Encoding: chunked
6 Role&RoleBinding
Role 可以定义访问:
- 特定 namespace 下的资源
ClusterRole 可以定义访问:
- 集群级别的资源,如 node
- 非资源型 endpoints ,如 /healthz
- 所有 namespace 下的资源
Role/ClusterRole 只是定义可以访问的 资源范围 ,实际生效的授权范围还取决于与之搭配的 RoleBinding/ClusterRoleBinding 。
Figure 34: per namespace vs per cluster
6.1 仅当前命名空间内【Role/RoleBinding】【ClusterRole/RoleBinding】
Figure 35: Role + RoleBinding
Figure 36: ClusterRole + RoleBinding
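与 Figure 35 中 Role + RoleBinding 的组合对应的资源清单大致如下(名称仅作示意),也可以直接用 7.2.6 中的 kubectl create role/rolebinding 命令生成:
# 示意:Role + RoleBinding ,只允许某个 SA 在当前命名空间内读取 Pod
kubectl apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: ServiceAccount
  name: my-sa
  namespace: default
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
EOF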
6.2 允许访问所有命名空间【ClusterRole/ClusterRoleBinding】
Figure 37: ClusterRole + ClusterRoleBinding
6.3 允许访问集群资源【ClusterRole/ClusterRoleBinding】
Figure 38: ClusterRole + ClusterRoleBinding
7 Tips
7.1 安装 master/worker
Figure 39: 需要安装的组件
Figure 40: 安装步骤
$ corgi_configure k8s-bootstrap-master --help
Usage: corgi_configure k8s-bootstrap-master [OPTIONS]

Options:
  --kubernetes-version TEXT
  --pod-network TEXT
  --helm-version TEXT
  --metrics-server-version TEXT
  --dry
  --help                         Show this message and exit.

$ corgi_configure k8s-bootstrap-worker --help
Usage: corgi_configure k8s-bootstrap-worker [OPTIONS]

Options:
  --kubernetes-version TEXT
  --ip TEXT                  Master node global IP address  [required]
  --dry
  --help                     Show this message and exit.
7.2 YAML 模板 6
7.2.1 Pod
$> k run my-busybox --image=busybox -o yaml --dry-run=client --command -- sh -c 'while true; do sleep 10; done'
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: my-busybox
  name: my-busybox
spec:
  containers:
  - command:
    - sh
    - -c
    - while true; do sleep 10; done
    image: busybox
    name: my-busybox
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}
7.2.2 Deployment
k create deployment NAME --image=IMAGE --port=PORT --replicas=N -o yaml --dry-run=client
# k create deployment kubia --image=ruanhao/kubia:v1 --port=80 --replicas=3 -o yaml --dry-run=client
7.2.3 DaemonSet
先创建 Deployment 模板,然后再做修改。
k create deploy my-daemon-set --image=ruanhao/kubia:v1 --dry-run=client -o yaml
apiVersion: apps/v1
kind: DaemonSet               # 将 Deployment 改为 DaemonSet
metadata:
  creationTimestamp: null
  labels:
    app: my-daemon-set
  name: my-daemon-set
spec:                         # 删除 replicas 、strategy 以及末尾的 status: {}
  selector:
    matchLabels:
      app: my-daemon-set
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: my-daemon-set
    spec:
      containers:
      - image: ruanhao/kubia:v1
        name: kubia
        resources: {}
7.2.4 Service
k create service [clusterip|nodeport|externalname|loadbalancer] NAME --tcp=PORT:TARGETPORT --dry-run=client -o yaml
# k create service clusterip kubia-v1 --tcp=80:80 --dry-run=client -o yaml
7.2.5 Ingress
kubectl create ingress NAME --rule=host/path=service:port[,tls[=secret]] --dry-run=client -o yaml
# k create ingress kubia-v1 --rule=www1.finditnm.com/*=kubia-v1:80 -o yaml --dry-run=client --annotation=kubernetes.io/ingress.class=nginx
# k create ingress kubia-v1 --rule=/v1*=kubia-v1:80 -o yaml --dry-run=client --annotation=kubernetes.io/ingress.class=nginx --annotation=nginx.ingress.kubernetes.io/rewrite-target=/
# k create ingress kubia-v1 --rule='/kubia/v2(/|$)(.*)*'=kubia-v2:80 -o yaml --dry-run=client --annotation=kubernetes.io/ingress.class=nginx --annotation=nginx.ingress.kubernetes.io/rewrite-target='/$2'
7.2.6 Role,ClusterRole/RoleBinding,ClusterRoleBinding
# 在某个 namespace 中限定对 Pod 的访问操作
k create role pod-reader-role --verb=get,list,watch --resource=pods --namespace=dev
# 在所有 namespace 中限定对 Pod 的访问操作(需要使用 ClusterRole)
k create clusterrole pod-reader-role --verb=get,list,watch --resource=pods
# 限定 Resource name
k create role pod-reader-role --verb=get --resource=pods --resource-name=one-pod --resource-name=another-pod --namespace=dev
# 限定 API Group
k create role only-rs-role --verb=get,list,watch --resource=replicasets.apps

# 将 Role 与用户关联
k create rolebinding bob-admin-binding --clusterrole=admin --user=bob --namespace=dev
# 将 Role 与 Service Account 关联
k create rolebinding myapp-view-binding --clusterrole=view --serviceaccount=dev:myapp --namespace=dev
# 赋予用户最高权限
k create clusterrolebinding root-cluster-admin-binding --clusterrole=cluster-admin --user=root
# 赋予 Service Account (位于 namespace dev 中) 集群级别权限
k create clusterrolebinding myapp-view-binding --clusterrole=view --serviceaccount=dev:myapp
7.3 查看资源版本信息
k api-resources
7.4 更新 join command
(master)$> kubeadm token create $( kubeadm token generate ) --print-join-command
7.5 删除 label
k label pods PodName LabelName-
7.6 创建 KUBECONFIG
Figure 41: k config view
(Context 将 User, NS 与 Cluster 绑定在一起)【可以通过 k config get-contexts 查看 Context 信息】
$ corgi_configure k8s-user-conf --help
Usage: corgi_configure k8s-user-conf [OPTIONS]

  Generate KUBECONFIG file which can be used to authenticate/authorize user
  with default cluster-admin role

Options:
  -s, --api-server TEXT      API Server address  [required]
  --api-server-port INTEGER  API Server port
  --pki-path TEXT            Directory where CA resides
  --cfssl-version TEXT       CFSSL version
  -u, --user TEXT            [required]
  -g, --group TEXT           [required]
  -ns, --namespace TEXT
  --dry
  --userspace                Namespace based or cluster based access level
  -e, --expiry-days INTEGER  Expiry days for certificate
  --help                     Show this message and exit.
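如果不使用上述脚本,也可以用 kubectl config 手动拼装 KUBECONFIG(地址、证书路径与名称均为假设):
# 示意:手动创建 cluster / user / context 并切换
kubectl config set-cluster my-cluster \
  --server=https://10.74.107.43:6443 \
  --certificate-authority=/etc/kubernetes/pki/ca.crt \
  --embed-certs=true --kubeconfig=my.kubeconfig
kubectl config set-credentials alice \
  --client-certificate=alice.crt --client-key=alice.key \
  --embed-certs=true --kubeconfig=my.kubeconfig
kubectl config set-context alice@my-cluster \
  --cluster=my-cluster --user=alice --namespace=default \
  --kubeconfig=my.kubeconfig
kubectl config use-context alice@my-cluster --kubeconfig=my.kubeconfig
kubectl config get-contexts --kubeconfig=my.kubeconfig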
7.7 创建用于测试的容器
k run cirror-$RANDOM --rm -it --image=cirros -- /bin/sh
7.8 Helm 简单使用
################ Install Redis #######################
helm repo add bitnami https://charts.bitnami.com/bitnami
helm pull bitnami/redis
tar xfvz redis-16.9.2.tgz
cd redis
sed -i '/storageClass: ""/ d' values.yaml   # in order to use dynamic pv
helm install -f values.yaml --namespace=default my-redis .
# if only modified values.yaml, can just:
# helm install my-redis bitnami/redis -f ./values.yaml

# BTW, show you how to get redis password:
# kubectl get secret --namespace default my-redis -o jsonpath="{.data.redis-password}" | base64 --decode
# kubectl exec my-redis-master-0 -it -- /bin/sh
# $ redis-cli -a **********
7.9 更新证书
################## Extend Certificates' expiry date by one year ########################
# kubeadm alpha certs renew all
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed
7.10 查看 etcd
> cat /etc/kubernetes/manifests/etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://10.74.107.43:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://10.74.107.43:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://10.74.107.43:2380
    - --initial-cluster=master=https://10.74.107.43:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://10.74.107.43:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://10.74.107.43:2380
    - --name=master
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: k8s.gcr.io/etcd:3.5.1-0
    ...

> etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key get / --prefix --keys-only
/registry/apiregistration.k8s.io/apiservices/v1.
/registry/apiregistration.k8s.io/apiservices/v1.admissionregistration.k8s.io
/registry/apiregistration.k8s.io/apiservices/v1.apiextensions.k8s.io
/registry/apiregistration.k8s.io/apiservices/v1.apps
/registry/apiregistration.k8s.io/apiservices/v1.authentication.k8s.io
...
7.11 安装私有仓库
安装脚本
安装完后,为了使节点有权限访问仓库,需要在所有 Master/Worker 节点上运行(也可以使用其他方法,比如 Secret ):
如使用 Docker 作为 cri :
或使用通用的命令:
scp cisco@docker.finditnm.com:~/certs/docker-registry-root-ca.crt ./
cp docker-registry-root-ca.crt /usr/local/share/ca-certificates
update-ca-certificates
nerdctl login docker.finditnm.com
7.12 自定义输出格式
$ k get pods -o=custom-columns='NAME:metadata.name,IMAGES:spec.containers[*].image'
NAME IMAGES
kubia-7c845994ff-jw57c ruanhao/kubia:v1
kubia-7c845994ff-mg6f5 ruanhao/kubia:v1
kubia-7c845994ff-nm5x6 ruanhao/kubia:v1
my-deployment-7df8999569-6qgmp ruanhao/kubia:v2
my-deployment-7df8999569-7xrdb ruanhao/kubia:v2
my-deployment-7df8999569-k7h2h ruanhao/kubia:v2
my-deployment-7df8999569-mg644 ruanhao/kubia:v2
my-deployment-7df8999569-zlcjh ruanhao/kubia:v2
my-pod-v3 ruanhao/kubia:v3
test-pod ruanhao/kubia:v1
test-pod-v2 ruanhao/kubia:v2
test-pod-v3 ruanhao/kubia:v3
7.13 暂停调度
$> mv /etc/kubernetes/manifests/kube-scheduler.yaml /tmp
$> k run my-pod-v2 --image=ruanhao/kubia:v2 --port=80 --expose
service/my-pod-v2 created
pod/my-pod-v2 created
$> k get po my-pod-v2
NAME        READY   STATUS    RESTARTS   AGE
my-pod-v2   1/1     Pending   0          10s
$> mv /tmp/kube-scheduler.yaml /etc/kubernetes/manifests/
# wait seconds
$> k get po my-pod-v2
NAME        READY   STATUS    RESTARTS   AGE
my-pod-v2   1/1     Running   0          101s
7.14 查看 Requests/Limits 使用情况
$> k describe no worker2
...
Non-terminated Pods:          (4 in total)       # Resource usage of all pods that are allocated on this node
  Namespace     Name                     CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------     ----                     ------------  ----------  ---------------  -------------  ---
  kube-system   glusterfs-4bptc          0 (0%)        0 (0%)      0 (0%)           0 (0%)         2d16h
  kube-system   heketi-7857b5964-v7g9m   0 (0%)        0 (0%)      0 (0%)           0 (0%)         2d16h
  kube-system   kube-flannel-ds-tzdxh    100m (2%)     100m (2%)   50Mi (0%)        50Mi (0%)      2d16h
  kube-system   kube-proxy-7v5ll         0 (0%)        0 (0%)      0 (0%)           0 (0%)         2d16h
Allocated resources:                             # summary of node resource usage
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                100m (2%)   100m (2%)
  memory             50Mi (0%)   50Mi (0%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
...
7.15 drain/cordon Node
# Remove all pods on the node, and put the node in Unschedulable state
k drain <node> --force --ignore-daemonsets
# Put the node in Unschedulable state (without deleting existing pods)
k cordon <node>
# Uncordon node
k uncordon <node>
7.16 升级流程
- Upgrade kubeadm on master node
- Use kubeadm tool to upgrade master node
- Upgrade kubelet on master node
  - 这步完成后,记得执行 apt-mark hold kubelet kubeadm ,即禁用自动升级
- Drain worker node (will move pods to other nodes)
- Upgrade kubeadm on worker node
- Use kubeadm tool to upgrade worker node
- Upgrade kubelet on worker node
  - 这步完成后,记得执行 apt-mark hold kubelet kubeadm ,即禁用自动升级
- uncordon worker node
- Repeat the same steps for other worker nodes
Figure 42: 升级 Kubernetes 流程
查看升级推荐:
(master)$> kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.23.6
[upgrade/versions] kubeadm version: v1.23.6
I0515 21:24:35.981037 261265 version.go:255] remote version is much newer: v1.24.0; falling back to: stable-1.23
[upgrade/versions] Target version: v1.23.6
[upgrade/versions] Latest version in the v1.23 series: v1.23.6
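以 apt 安装为例,master 节点上的典型操作大致如下(版本号仅作示意;worker 节点类似,只是执行 kubeadm upgrade node):
# 示意:升级 master 节点(Debian/Ubuntu)
apt-mark unhold kubeadm
apt-get update && apt-get install -y kubeadm=1.23.6-00
apt-mark hold kubeadm
kubeadm upgrade plan
kubeadm upgrade apply v1.23.6           # worker 节点上则执行: kubeadm upgrade node
kubectl drain <node> --ignore-daemonsets
apt-mark unhold kubelet kubectl
apt-get install -y kubelet=1.23.6-00 kubectl=1.23.6-00
apt-mark hold kubelet kubectl
systemctl daemon-reload && systemctl restart kubelet
kubectl uncordon <node>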
7.17 备份所有应用配置
kubectl get all -A -o yaml >apps-configuration.yaml
【并非包含所有资源】
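如果需要更完整的备份,可以直接对 etcd 做快照(证书路径与 7.10 中一致,文件名仅作示意):
# 示意:备份并查看 etcd 快照(etcdctl v3)
ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
ETCDCTL_API=3 etcdctl snapshot status /tmp/etcd-backup.db --write-out=table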
7.18 查看所有 Named Group 【非 Core Group】
$> k get --raw /apis/ | jq -r '.groups[].name'
apiregistration.k8s.io
apps
events.k8s.io
authentication.k8s.io
authorization.k8s.io
autoscaling
batch
certificates.k8s.io
networking.k8s.io
policy
rbac.authorization.k8s.io
storage.k8s.io
admissionregistration.k8s.io
apiextensions.k8s.io
scheduling.k8s.io
coordination.k8s.io
node.k8s.io
discovery.k8s.io
flowcontrol.apiserver.k8s.io
metrics.k8s.io
7.19 本地端口转发
An effective way to test an individual pod.
kubectl port-forward <pod-name> <local-port>:<pod-port>
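例如(端口仅作举例,Pod 名取自 7.12 的输出):
# 示意:把本地 8080 转发到 Pod 的 80 端口,然后在另一个终端测试
kubectl port-forward kubia-7c845994ff-jw57c 8080:80
curl localhost:8080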
7.20 获取所有节点 IP 【jsonpath用法】
kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}'
更多用法参考:JSONPath Support
7.21 强制删除 Pod
kubectl delete po mypod --grace-period=0 --force
7.22 拷贝本地文件
kubectl cp foo-pod:/var/log/foo.log foo.log
kubectl cp localfile foo-pod:/etc/remotefile