Kubernetes Container Orchestration in the Enterprise: From Getting Started to Production Architecture

Author: Cloud-Native Architect
Tech stack: Kubernetes, Docker, Helm, Istio, Prometheus
Difficulty: ★★★★★ (expert)
Estimated reading time: 80 minutes


Table of Contents

  1. Introduction: Why Kubernetes
  2. Kubernetes Architecture Deep Dive
  3. Deploying a Kubernetes Cluster
  4. Core Resource Objects
  5. Application Deployment in Practice
  6. Service Discovery and Load Balancing
  7. Storage Management
  8. Configuration and Secret Management
  9. Autoscaling
  10. Monitoring and Logging
  11. Security Hardening
  12. Production Best Practices

1. Introduction: Why Kubernetes

1.1 The Evolution of Container Orchestration

┌─────────────────────────────────────────────────────┐
│  A Brief History of Container Orchestration         │
├─────────────────────────────────────────────────────┤
│  2013-2014: Manual deployment                       │
│  - Docker Compose (single host)                     │
│  - Scripted management (Ansible, Puppet)            │
│  Problems: no self-healing, no elastic scaling      │
│                                                     │
│  2014-2016: The orchestrator wars                   │
│  - Docker Swarm (Docker native)                     │
│  - Kubernetes (backed by Google)                    │
│  - Mesos (from Apache)                              │
│                                                     │
│  2016-2018: Kubernetes wins                         │
│  - CNCF graduated project (2018)                    │
│  - De facto standard (~78% market share)            │
│  - Mature ecosystem (Helm, Istio, Prometheus)       │
│                                                     │
│  2018-present: The cloud-native era                 │
│  - Serverless Kubernetes                            │
│  - Service Mesh                                     │
│  - GitOps                                           │
└─────────────────────────────────────────────────────┘

1.2 Kubernetes vs Docker Swarm

Feature               | Kubernetes                    | Docker Swarm
----------------------|-------------------------------|------------------------
Architecture          | Complex (Control Plane/Node)  | Simple (Manager/Worker)
Learning curve        | Steep                         | Gentle
Feature richness      | ⭐⭐⭐⭐⭐                    | ⭐⭐⭐
Self-healing          | Yes                           | Yes
Autoscaling           | HPA/VPA                       | Manual
Service discovery     | Built-in DNS                  | Built-in DNS
Load balancing        | Advanced (Ingress)            | Basic
Storage orchestration | CSI standard                  | Basic
Config management     | ConfigMap/Secret              | Config
Ecosystem maturity    | Very mature                   | Moderate
Enterprise adoption   | ~78%                          | ~15%

Conclusion: Kubernetes is the first choice for production.


2. Kubernetes Architecture Deep Dive

2.1 Cluster Architecture

Control Plane
  ├── API Server
  ├── Scheduler
  ├── Controller Manager
  └── etcd

Worker Node 1
  ├── Kubelet
  ├── Kube Proxy
  └── Container Runtime
        ├── Pod 1
        └── Pod 2

Worker Node 2
  ├── Kubelet
  ├── Kube Proxy
  └── Container Runtime
        ├── Pod 3
        └── Pod 4

2.2 Core Components

Control Plane

  1. API Server

    • Single entry point for the cluster
    • RESTful API
    • Authentication and authorization
    • Request validation
  2. Scheduler

    • Pod scheduling decisions
    • Resource optimization
    • Affinity/anti-affinity
  3. Controller Manager

    • Node controller
    • Replication controller
    • Endpoints controller
    • Service account controller
  4. etcd

    • Distributed key-value store
    • Stores cluster state
    • Strong consistency (Raft protocol)
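
Raft's strong consistency rests on majority quorum, which is why etcd is run with an odd number of members. The arithmetic is simple enough to sketch in plain shell (for illustration only):

```shell
# Quorum for an n-member etcd cluster: floor(n/2) + 1
quorum() { echo $(( $1 / 2 + 1 )); }
# Members the cluster can lose while remaining available
tolerated() { echo $(( $1 - ($1 / 2 + 1) )); }

echo "3 members: quorum $(quorum 3), tolerates $(tolerated 3) failure(s)"
echo "5 members: quorum $(quorum 5), tolerates $(tolerated 5) failure(s)"
# An even member count buys nothing: 4 members still tolerate only 1 failure
echo "4 members: quorum $(quorum 4), tolerates $(tolerated 4) failure(s)"
```

This is why 3- and 5-member etcd clusters are the common production sizes.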

Worker Node

  1. Kubelet

    • Node agent
    • Pod lifecycle management
    • Health checks
  2. Kube Proxy

    • Network proxy
    • Load balancing
    • iptables/IPVS
  3. Container Runtime

    • Docker (via cri-dockerd since v1.24)
    • containerd
    • CRI-O

3. Deploying a Kubernetes Cluster

3.1 Deployment with kubeadm

Environment Preparation

# Run on all nodes
# Disable swap
swapoff -a
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

# Load the kernel modules required by the bridge sysctls below
cat > /etc/modules-load.d/k8s.conf << EOF
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter

# Configure kernel parameters
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF

sysctl --system

# Install Docker (this also installs containerd, which serves as the CRI runtime)
yum install -y yum-utils
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum install -y docker-ce docker-ce-cli containerd.io
systemctl enable docker && systemctl start docker

# Since v1.24 the kubelet talks to containerd directly (dockershim was removed);
# regenerate the default containerd config so the CRI plugin is enabled
containerd config default > /etc/containerd/config.toml
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
systemctl restart containerd

# Install kubeadm, kubelet, kubectl
# (the legacy packages.cloud.google.com repo is frozen; use the community repo)
cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.29/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.29/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF

yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
systemctl enable --now kubelet

Initialize the Master Node

# Run on the master node
kubeadm init \
  --pod-network-cidr=10.244.0.0/16 \
  --service-cidr=10.96.0.0/12 \
  --kubernetes-version=v1.29.0 \
  --apiserver-advertise-address=192.168.1.100

# Configure kubectl
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Install a network plugin (Calico)
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
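
The two CIDRs passed to kubeadm init bound the cluster's size. Assuming the default /24 pod subnet handed to each node by kube-controller-manager, the address arithmetic works out as follows (a back-of-envelope sketch, not an official tool):

```shell
# Addresses in a /N IPv4 block: 2^(32-N)
addrs() { echo $(( 1 << (32 - $1) )); }

pod_cidr=16      # --pod-network-cidr=10.244.0.0/16
node_mask=24     # per-node pod subnet (controller-manager default)

echo "pod addresses total: $(addrs $pod_cidr)"
echo "addresses per node:  $(addrs $node_mask)"
# Number of /24 node subnets that fit into the /16
echo "max nodes:           $(( $(addrs $pod_cidr) / $(addrs $node_mask) ))"
```

So the /16 above supports roughly 256 nodes with up to ~250 usable pod IPs each; pick a larger pod CIDR up front if you expect more, since it cannot be changed easily later.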

Join Worker Nodes

# Run on each worker node (use the command printed by kubeadm init)
kubeadm join 192.168.1.100:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>

Verify the Cluster

# List nodes
kubectl get nodes

# Output:
# NAME           STATUS   ROLES           AGE   VERSION
# master         Ready    control-plane   10m   v1.29.0
# worker1        Ready    <none>          5m    v1.29.0
# worker2        Ready    <none>          5m    v1.29.0

# Check component status (deprecated since v1.19, but still informative)
kubectl get componentstatuses

# Output:
# NAME                 STATUS    MESSAGE             ERROR
# controller-manager   Healthy   ok
# scheduler            Healthy   ok
# etcd-0               Healthy   {"health":"true"}

3.2 High-Availability Cluster Deployment

Architecture

┌─────────────────────────────────────────────────────┐
│  Kubernetes HA Cluster                              │
├─────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │   Master 1  │  │   Master 2  │  │   Master 3  │  │
│  │ 192.168.1.10│  │ 192.168.1.11│  │ 192.168.1.12│  │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  │
│         │                │                │         │
│         └────────────────┼────────────────┘         │
│                          │                          │
│              ┌───────────▼───────────┐              │
│              │   Load Balancer       │              │
│              │   (HAProxy/Nginx)     │              │
│              │   192.168.1.100:6443  │              │
│              └───────────┬───────────┘              │
│                          │                          │
│         ┌────────────────┼────────────────┐         │
│         │                │                │         │
│  ┌──────▼──────┐  ┌──────▼──────┐  ┌──────▼──────┐  │
│  │  Worker 1   │  │  Worker 2   │  │  Worker 3   │  │
│  │192.168.1.20 │  │192.168.1.21 │  │192.168.1.22 │  │
│  └─────────────┘  └─────────────┘  └─────────────┘  │
└─────────────────────────────────────────────────────┘

HAProxy Configuration

# /etc/haproxy/haproxy.cfg
global
    log         127.0.0.1 local2
    maxconn     4000
    ulimit-n    4160

defaults
    mode                    tcp
    log                     global
    option                  tcplog
    option                  dontlognull
    timeout connect         5000
    timeout client          50000
    timeout server          50000

frontend kubernetes
    bind *:6443
    option tcplog
    default_backend kubernetes-master

backend kubernetes-master
    # The API server speaks HTTPS on 6443, so the health check must go
    # over TLS (certificate verification is skipped for the probe)
    option httpchk GET /healthz
    http-check expect status 200
    balance roundrobin
    server master1 192.168.1.10:6443 check check-ssl verify none
    server master2 192.168.1.11:6443 check check-ssl verify none
    server master3 192.168.1.12:6443 check check-ssl verify none

4. Core Resource Objects

4.1 Pod

Pod Definition

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: nginx
    version: "1.25"
  annotations:
    description: "Nginx web server"
spec:
  containers:
  - name: nginx
    image: nginx:1.25.3-alpine
    ports:
    - containerPort: 80
      protocol: TCP
    resources:
      requests:
        cpu: "100m"
        memory: "128Mi"
      limits:
        cpu: "500m"
        memory: "512Mi"
    env:
    - name: TZ
      value: "Asia/Shanghai"
    volumeMounts:
    - name: html
      mountPath: /usr/share/nginx/html
    - name: config
      mountPath: /etc/nginx/conf.d
      readOnly: true
    livenessProbe:
      httpGet:
        path: /health
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /ready
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 3
      failureThreshold: 3
  volumes:
  - name: html
    emptyDir: {}
  - name: config
    configMap:
      name: nginx-config
  restartPolicy: Always
  nodeSelector:
    disktype: ssd
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "web"
    effect: "NoSchedule"
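
The liveness settings above determine how long a hung container can go unnoticed: in the worst case the kubelet restarts it only after the initial delay plus failureThreshold failed probes. A rough back-of-envelope (ignoring probe timeouts and jitter):

```shell
# Approximate worst-case seconds from container start until a liveness restart:
# initialDelaySeconds + failureThreshold * periodSeconds
restart_deadline() {
  local initial=$1 period=$2 failures=$3
  echo $(( initial + failures * period ))
}

# Values from the Pod above: initialDelaySeconds=10, periodSeconds=10, failureThreshold=3
restart_deadline 10 10 3   # prints 40
```

Tune these numbers against your app's real startup and failure profile; too aggressive and healthy-but-slow pods get restart-looped, too lax and outages linger.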

4.2 Deployment

Deployment Definition

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: nginx
        version: "1.25"
    spec:
      containers:
      - name: nginx
        image: nginx:1.25.3-alpine
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
        livenessProbe:
          httpGet:
            path: /health
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
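
With maxSurge: 1 and maxUnavailable: 0, the rollout never drops below the desired replica count and adds at most one extra pod at a time. The pod-count bounds can be sketched as:

```shell
# Pod-count bounds during a RollingUpdate (absolute values assumed here;
# percentage settings are first resolved against the replica count)
rollout_max() { echo $(( $1 + $2 )); }   # replicas + maxSurge
rollout_min() { echo $(( $1 - $2 )); }   # replicas - maxUnavailable

replicas=3; surge=1; unavailable=0
echo "pods during rollout: between $(rollout_min $replicas $unavailable) and $(rollout_max $replicas $surge)"
```

maxUnavailable: 0 is the zero-downtime setting; the price is that the cluster must have headroom for the surge pod.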

Rolling Updates

# Update the image
kubectl set image deployment/nginx-deployment nginx=nginx:1.25.4

# Watch rollout status
kubectl rollout status deployment/nginx-deployment

# View rollout history
kubectl rollout history deployment/nginx-deployment

# Roll back to the previous revision
kubectl rollout undo deployment/nginx-deployment

# Roll back to a specific revision
kubectl rollout undo deployment/nginx-deployment --to-revision=2

4.3 StatefulSet

StatefulSet Definition

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        ports:
        - containerPort: 3306
          name: mysql
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
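
Unlike a Deployment, each StatefulSet pod gets a stable ordinal name and, through the headless service named in serviceName, a stable DNS record. For the manifest above the addresses follow this pattern (a sketch; assumes a matching headless service, the default namespace, and the default cluster.local domain):

```shell
# <statefulset>-<ordinal>.<serviceName>.<namespace>.svc.<cluster-domain>
sts_dns() {
  local sts=$1 svc=$2 ns=$3 replicas=$4
  local i=0
  while [ $i -lt $replicas ]; do
    echo "${sts}-${i}.${svc}.${ns}.svc.cluster.local"
    i=$((i + 1))
  done
}

sts_dns mysql mysql default 3
```

These per-pod names are what make primary/replica topologies possible: clients can always address mysql-0 as the primary regardless of rescheduling.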

4.4 DaemonSet

DaemonSet Definition

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true
      hostPID: true
      containers:
      - name: node-exporter
        image: prom/node-exporter:v1.7.0  # pin a release tag; avoid :latest in production
        ports:
        - containerPort: 9100
          hostPort: 9100
        volumeMounts:
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
        - name: root
          mountPath: /rootfs
          readOnly: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
      - name: root
        hostPath:
          path: /

5. Application Deployment in Practice

5.1 Deploying a Microservice Application

Complete YAML

# Microservice deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
        version: v1
    spec:
      containers:
      - name: user-service
        image: myregistry/user-service:v1.0.0
        ports:
        - containerPort: 8080
        env:
        - name: DB_HOST
          value: "mysql.default.svc.cluster.local"
        - name: REDIS_HOST
          value: "redis.default.svc.cluster.local"
        - name: JAVA_OPTS
          value: "-Xms512m -Xmx1024m"
        resources:
          requests:
            cpu: "200m"
            memory: "512Mi"
          limits:
            cpu: "1000m"
            memory: "2Gi"
        livenessProbe:
          httpGet:
            path: /actuator/health
            port: 8080
          initialDelaySeconds: 60
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /actuator/health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 5
---
# Expose the service
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  selector:
    app: user-service
  ports:
  - port: 80
    targetPort: 8080
  type: ClusterIP
---
# Horizontal pod autoscaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: user-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: user-service
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
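
The DB_HOST and REDIS_HOST values above follow the in-cluster DNS scheme `<service>.<namespace>.svc.<cluster-domain>`. A tiny helper to compose such names (illustrative only; assumes the default cluster.local domain):

```shell
# Fully qualified in-cluster DNS name for a Service
svc_dns() { echo "$1.$2.svc.cluster.local"; }   # args: service, namespace

svc_dns mysql default   # mysql.default.svc.cluster.local
svc_dns redis default   # redis.default.svc.cluster.local
```

Within the same namespace the short name (just `mysql`) also resolves, but the fully qualified form is unambiguous across namespaces.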

5.2 Database Deployment

MySQL StatefulSet

Note: with replicas: 3 and a plain mysql image, the three pods are independent MySQL instances, not a replicated cluster; real replication requires an operator or explicit primary/replica configuration.

apiVersion: v1
kind: Secret
metadata:
  name: mysql-secret
type: Opaque
stringData:
  root-password: "secure-password-123"
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        ports:
        - containerPort: 3306
          name: mysql
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        - name: MYSQL_DATABASE
          value: "appdb"
        - name: MYSQL_USER
          value: "appuser"
        - name: MYSQL_PASSWORD
          value: "apppassword"
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "2000m"
            memory: "4Gi"
        livenessProbe:
          exec:
            command:
            - mysqladmin
            - ping
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command:
            - mysqladmin
            - ping
          initialDelaySeconds: 10
          periodSeconds: 5
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi
      storageClassName: nfs-storage
---
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306
  clusterIP: None

6. Service Discovery and Load Balancing

6.1 Service Types

ClusterIP (default):

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
  type: ClusterIP

NodePort

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30080
  type: NodePort
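
The nodePort must fall inside the API server's --service-node-port-range, which defaults to 30000-32767; 30080 above is valid, and anything outside the range is rejected at admission. A quick validity check:

```shell
# Default --service-node-port-range is 30000-32767
valid_nodeport() {
  if [ "$1" -ge 30000 ] && [ "$1" -le 32767 ]; then
    echo valid
  else
    echo invalid
  fi
}

valid_nodeport 30080   # valid
valid_nodeport 8080    # invalid
```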

LoadBalancer

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
  type: LoadBalancer

6.2 Ingress

Ingress Configuration

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    # Note: rewrite-target: / would strip the request path (including /api)
    # before it reaches the backend; omit it unless paired with regex paths
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - example.com
    secretName: example-tls
  rules:
  - host: example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-service
            port:
              number: 80
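
With pathType: Prefix the controller picks the longest matching prefix, so /api traffic reaches api-service while everything else falls through to the / rule. A simplified simulation of that matching (not the controller's actual algorithm; Prefix matches on whole path elements):

```shell
# Longest-prefix routing over the two rules defined above
route() {
  case "$1" in
    /api|/api/*) echo api-service ;;   # path: /api, pathType: Prefix
    *)           echo web-service ;;   # path: /,   pathType: Prefix
  esac
}

route /api/users    # api-service
route /index.html   # web-service
```

Note that /apiv2 would NOT match the /api rule: Prefix matching splits the path on / and compares element by element.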

7. Storage Management

7.1 PersistentVolume

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs
  nfs:
    path: /data/nfs
    server: 192.168.1.100

7.2 PersistentVolumeClaim

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs
  resources:
    requests:
      storage: 10Gi
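
A PVC binds to a PV only when the storage classes match, the requested access mode is offered, and the PV's capacity covers the request; the two manifests above line up on all three (nfs / ReadWriteMany / 10Gi). A toy predicate for that matching (illustrative; the real binder also considers selectors and volume modes):

```shell
# args: pv_class pv_mode pv_capacity_gi  pvc_class pvc_mode pvc_request_gi
can_bind() {
  if [ "$1" = "$4" ] && [ "$2" = "$5" ] && [ "$3" -ge "$6" ]; then
    echo bound
  else
    echo pending
  fi
}

can_bind nfs ReadWriteMany 10  nfs ReadWriteMany 10   # bound
can_bind nfs ReadWriteMany 10  nfs ReadWriteOnce 10   # pending
```

A PVC that finds no matching PV simply stays Pending until one appears (or a StorageClass provisions one dynamically).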

8. Configuration and Secret Management

8.1 ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  database.properties: |
    db.host=mysql.default.svc.cluster.local
    db.port=3306
    db.name=appdb
  application.yml: |
    server:
      port: 8080
    spring:
      datasource:
        url: jdbc:mysql://${DB_HOST}:3306/appdb
        username: appuser
        password: apppassword

8.2 Secret

apiVersion: v1
kind: Secret
metadata:
  name: app-secret
type: Opaque
stringData:
  db-password: "secure-password"
  api-key: "your-api-key"
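
stringData lets you write plaintext; the API server stores it under data as base64. The encoding is only an encoding, not encryption, which is easy to demonstrate:

```shell
# What the API server stores for the stringData value "secure-password"
echo -n "secure-password" | base64
# ...and it decodes right back -- base64 is not encryption
echo -n "secure-password" | base64 | base64 -d
```

Anyone who can read the Secret object can read the value, so pair Secrets with RBAC restrictions and, ideally, etcd encryption at rest.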

9. Autoscaling

9.1 HPA (Horizontal Pod Autoscaler)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
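
The HPA derives its target from desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), then clamps to minReplicas/maxReplicas. Plugging in this manifest's numbers (a sketch of the core formula; the real controller also applies a tolerance band and the behavior policies above):

```shell
# ceil(current * metric / target), clamped to [min, max]
hpa_desired() {
  local current=$1 metric=$2 target=$3 min=$4 max=$5
  local desired=$(( (current * metric + target - 1) / target ))  # integer ceil
  [ $desired -lt $min ] && desired=$min
  [ $desired -gt $max ] && desired=$max
  echo $desired
}

# 4 replicas averaging 140% CPU against the 70% target -> scale to 8
hpa_desired 4 140 70 2 10
# Same load with maxReplicas 6 -> clamped to 6
hpa_desired 4 140 70 2 6
```

The scaleDown stabilization window of 300s above means the controller uses the highest desired count seen over the last 5 minutes before shrinking, which damps flapping.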

10. Monitoring and Logging

10.1 Deploying Prometheus

The manifest below uses the Prometheus Operator CRD (monitoring.coreos.com/v1), so the operator must be installed in the cluster first.

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
spec:
  version: v2.45.0
  replicas: 2
  retention: 15d
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
  serviceAccountName: prometheus
  serviceMonitorSelector: {}
  podMonitorSelector: {}

10.2 Grafana Dashboards

Import these community dashboards by ID:

  • Kubernetes Cluster: 6417
  • Node Exporter: 1860
  • Prometheus: 2

11. Security Hardening

11.1 RBAC Configuration

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
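
A Role is just a list of (apiGroup, resource, verb) tuples, and a request is allowed only if some rule matches all three. A toy evaluator for the pod-reader role above (the real authorizer also handles resourceNames, wildcards, and role aggregation):

```shell
# Does the pod-reader role allow <verb> on <resource>?
pod_reader_allows() {
  local verb=$1 resource=$2
  # Single rule: apiGroup "", resources ["pods"], verbs [get, watch, list]
  [ "$resource" = "pods" ] || { echo deny; return; }
  case "$verb" in
    get|watch|list) echo allow ;;
    *)              echo deny ;;
  esac
}

pod_reader_allows list pods      # allow
pod_reader_allows delete pods    # deny  (verb not granted)
pod_reader_allows get secrets    # deny  (resource not granted)
```

RBAC is purely additive: there are no deny rules, so the safest posture is granting the narrowest verbs and resources that the workload actually needs.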

11.2 Network Policy

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-nginx
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: nginx
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 80

12. Production Best Practices

12.1 Resource Management

resources:
  requests:
    cpu: "500m"      # guaranteed baseline
    memory: "512Mi"
  limits:
    cpu: "1000m"     # hard ceiling
    memory: "1Gi"
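
CPU quantities are expressed in millicores: 1000m is one full core, so the snippet above reserves half a core and caps usage at one. A converter for sanity-checking (illustrative):

```shell
# Convert a millicore string like "500m" to cores
millicores_to_cores() {
  echo "${1%m}" | awk '{ printf "%.2f\n", $1 / 1000 }'
}

millicores_to_cores 500m    # 0.50
millicores_to_cores 1000m   # 1.00
```

Memory works differently: exceeding a CPU limit only throttles the container, while exceeding a memory limit gets it OOM-killed, so size memory limits with real headroom.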

12.2 Health Checks

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60  # start probing 60s after container start
  periodSeconds: 10        # probe every 10s
  timeoutSeconds: 5        # each probe times out after 5s
  failureThreshold: 3      # restart after 3 consecutive failures

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3      # removed from Service endpoints after 3 failures

12.3 Labeling Conventions

metadata:
  labels:
    app: user-service         # application name
    version: v1.0.0           # version
    component: backend        # component type
    team: platform            # owning team
    environment: production   # environment

Copyright notice: this article is original work; please credit the source when republishing.

