Enterprise Kubernetes Deployment: A Helm + Kustomize Hybrid Strategy for Zero Configuration Drift and 10-Minute Multi-Environment Releases
Scope & Prerequisites
Applicable scenarios:
- Microservice architectures that manage 3+ environments (dev/staging/prod)
- Configuration that must be standardized, with controlled per-environment differences and traceable changes
- Teams of 5+ developers that need a unified deployment convention
Prerequisites:
- Kubernetes 1.24+ cluster (with multi-namespace isolation)
- Helm 3.10+ and Kustomize 5.0+ (bundled with kubectl 1.27+)
- Git repository admin rights, with branch policies and PR approvals
- Cluster RBAC enabled, with namespace-admin or higher permissions
Environment & Version Matrix

| Component | Version | Key feature dependencies | Minimum resources |
|---|---|---|---|
| Kubernetes | 1.24~1.30 | PodSecurity admission, improved field validation | 3 nodes (4C/8G) |
| Helm | 3.10+ | values schema validation, chart dependency locking | - |
| Kustomize | 5.0+ | replacements, HelmChartInflationGenerator | - |
| kubectl | 1.27+ | bundled Kustomize 5.x, apply --server-side | - |
| Git | 2.30+ | branch protection, CODEOWNERS, signed commits | - |
| OS | RHEL 8+ / Ubuntu 22.04+ | - | - |
Quick Checklist
- [ ] Step 1: Initialize the Git repository layout (base/overlays/charts)
- [ ] Step 2: Write the Helm Chart template (minimal viable set)
- [ ] Step 3: Configure the Kustomize base (shared resource definitions)
- [ ] Step 4: Create per-environment overlays (dev/staging/prod differences)
- [ ] Step 5: Integrate Helm + Kustomize (helmCharts generator)
- [ ] Step 6: Validate rendered output and dry-run the deployment
- [ ] Step 7: Roll out gradually with health checks
- [ ] Step 8: Configure GitOps sync and automatic rollback
Implementation Steps
Step 1: Initialize a standardized repository layout
Goal: create a GitOps repository skeleton that matches enterprise conventions.
# Works on RHEL/CentOS/Ubuntu
mkdir -p k8s-manifests/{base,overlays/{dev,staging,prod},charts}
cd k8s-manifests
# Initialize Git and protect the main branch
git init
git checkout -b main
cat <<EOF > .gitignore
*.swp
.DS_Store
secrets/
EOF
# Directory layout
tree -L 3
# ├── base/        # shared baseline configuration
# ├── overlays/
# │   ├── dev/     # dev-specific differences
# │   ├── staging/ # staging-specific differences
# │   └── prod/    # prod-specific differences
# └── charts/      # custom Helm Charts
Parameters:
- base/: environment-agnostic resources (Deployment/Service/ConfigMap base templates)
- overlays/: per-environment differences (replica counts, resource limits, image tags, env vars)
- charts/: local cache of in-house or third-party charts (offline installs, version pinning)
Post-step verification:
ls -R | grep -E "base|overlays|charts"
# Output should show the three top-level directories and the per-environment subdirectories
Step 2: Create a minimal viable Helm Chart
Goal: encapsulate the shared deployment logic of a microservice behind parameters.
# Create a standard chart under charts/
cd charts
helm create app-template
# Trim the chart (drop example files, keep core templates)
cd app-template
rm -rf templates/tests templates/NOTES.txt
Edit values.yaml (key parameters):
# charts/app-template/values.yaml
replicaCount: 1
image:
  repository: nginx
  tag: "1.25.3"
  pullPolicy: IfNotPresent
service:
  type: ClusterIP
  port: 80
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi
livenessProbe:
  httpGet:
    path: /health
    port: 80
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 5
Key parameters:
- image.tag: pin an explicit version; never rely on latest, which drifts
- resources: in production, keep limits at roughly 2× requests
- probe initialDelaySeconds: tune to the app's actual startup time (60+ for Java apps)
Render test:
helm template my-app . --values values.yaml > /tmp/rendered.yaml
grep -E "replicas|image:|cpu:|memory:" /tmp/rendered.yaml
# Confirm the output contains the values defined in values.yaml
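Helm 3.10's values schema validation (listed in the version matrix above) can reject bad values at install time: place a values.schema.json next to values.yaml. A minimal sketch covering the fields above — the schema shape is an assumption you should extend for your own chart:

```json
{
  "$schema": "https://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["image"],
  "properties": {
    "replicaCount": { "type": "integer", "minimum": 1 },
    "image": {
      "type": "object",
      "required": ["repository", "tag"],
      "properties": {
        "repository": { "type": "string" },
        "tag": { "type": "string", "minLength": 1 }
      }
    }
  }
}
```

With this file present, `helm template` and `helm install` fail fast when, for example, replicaCount is set to 0 or image.tag is omitted.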
Step 3: Configure the Kustomize base layer
Goal: define the environment-agnostic resource baseline.
cd ../../base
cat <<EOF > kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: default
commonLabels:
app.kubernetes.io/managed-by: kustomize
resources:
- deployment.yaml
- service.yaml
configMapGenerator:
- name: app-config
literals:
- LOG_LEVEL=info
- TIMEZONE=Asia/Shanghai
EOF
Create the Deployment baseline:
# base/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: nginx:1.25.3
          ports:
            - containerPort: 80
          envFrom:
            - configMapRef:
                name: app-config
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /health
              port: 80
            initialDelaySeconds: 30
          readinessProbe:
            httpGet:
              path: /ready
              port: 80
            initialDelaySeconds: 5
Create the Service baseline:
# base/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: app
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: 80
  selector:
    app: myapp
Verify the baseline:
kubectl kustomize . | grep -E "kind:|name:|replicas:"
# Output should include the Deployment (replicas: 1), Service, and ConfigMap
Step 4: Create per-environment overlays
Goal: customize per-environment differences (replicas, images, resources).
Dev environment (minimal resources)
cd ../overlays/dev
cat <<EOF > kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: dev
bases:
- ../../base
patchesStrategicMerge:
- replica-patch.yaml
- resource-patch.yaml
images:
- name: nginx
newTag: 1.25.3-dev
EOF
# Replica count patch
cat <<EOF > replica-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: app
spec:
replicas: 1
EOF
# Resource patch (lower limits for dev)
cat <<EOF > resource-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: app
spec:
template:
spec:
containers:
- name: app
resources:
requests:
cpu: 50m
memory: 64Mi
limits:
cpu: 200m
memory: 256Mi
EOF
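Note: `bases` and `patchesStrategicMerge` are deprecated in Kustomize 5 — they still work but emit warnings. An equivalent overlay using the current `resources`/`patches` fields (same files as above) looks like this:

```yaml
# overlays/dev/kustomization.yaml -- Kustomize 5 field names
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: dev
resources:                    # replaces the deprecated `bases`
  - ../../base
patches:                      # replaces the deprecated `patchesStrategicMerge`
  - path: replica-patch.yaml
  - path: resource-patch.yaml
images:
  - name: nginx
    newTag: 1.25.3-dev
```

The patch files themselves are unchanged; only the kustomization.yaml field names differ.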
Prod environment (high availability + HPA)
cd ../prod
cat <<EOF > kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: prod
bases:
- ../../base
patchesStrategicMerge:
- replica-patch.yaml
images:
- name: nginx
newTag: 1.25.3
resources:
- hpa.yaml
EOF
# Production replica count
cat <<EOF > replica-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: app
spec:
replicas: 3
EOF
# HPA configuration
cat <<EOF > hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: app-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: app
minReplicas: 3
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
EOF
Compare the rendered output:
# Dev environment
kubectl kustomize ../dev | grep -A2 "replicas:"
# Output: replicas: 1, cpu: 50m
# Prod environment
kubectl kustomize . | grep -A2 "replicas:"
# Output: replicas: 3, cpu: 100m, plus the HPA
Step 5: Integrate Helm + Kustomize (the core capability)
Goal: manage third-party charts with the Kustomize helmCharts generator.
cd ../staging
cat <<EOF > kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: staging
helmCharts:
- name: redis
repo: https://charts.bitnami.com/bitnami
version: 18.1.0
releaseName: redis
namespace: staging
valuesInline:
auth:
enabled: true
password: "StagingRedisPass123"
master:
resources:
limits:
cpu: 1000m
memory: 1Gi
requests:
cpu: 200m
memory: 256Mi
replica:
replicaCount: 2
bases:
- ../../base
patchesStrategicMerge:
- env-patch.yaml
EOF
# Environment variables so the app can reach Redis
cat <<EOF > env-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: app
spec:
template:
spec:
containers:
- name: app
env:
- name: REDIS_HOST
value: redis-master.staging.svc.cluster.local
- name: REDIS_PASSWORD
valueFrom:
secretKeyRef:
name: redis
key: redis-password
EOF
Parameters:
- helmCharts.repo: must be an HTTPS URL; private Helm repositories are supported (with authentication configured)
- valuesInline: embeds the values directly, so there is no separate values file to manage
- releaseName: the Helm release name, as seen by helm list
Render the Helm chart (requires kubectl 1.27+ or the kustomize CLI, plus the helm binary on PATH):
# Option 1: kustomize bundled with kubectl
kubectl kustomize . --enable-helm > /tmp/staging-with-helm.yaml
# Option 2: standalone kustomize CLI
kustomize build . --enable-helm > /tmp/staging-with-helm.yaml
# Verify the Redis resources were generated
grep -E "kind: StatefulSet|name: redis" /tmp/staging-with-helm.yaml
# Should print the Redis StatefulSet and Service
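Besides `valuesInline`, the helmCharts generator also accepts a `valuesFile`, which keeps large value sets out of kustomization.yaml and makes them individually reviewable. A sketch — the redis-values.yaml filename is illustrative:

```yaml
# overlays/staging/kustomization.yaml (excerpt)
helmCharts:
  - name: redis
    repo: https://charts.bitnami.com/bitnami
    version: 18.1.0
    releaseName: redis
    namespace: staging
    valuesFile: redis-values.yaml   # values live in a separate file next to this one
```

This also lets per-environment values files diverge while the chart name, repo, and version stay pinned in one place.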
Step 6: Dry-run validation and diff
Goal: catch configuration errors and resource conflicts before applying.
# Production dry-run (server-side validation)
kubectl apply -k overlays/prod --dry-run=server --validate=true
# Example output
# deployment.apps/app created (dry run)
# service/app unchanged (dry run)
# horizontalpodautoscaler.autoscaling/app-hpa created (dry run)
# Diff the new configuration against the live cluster (kubectl diff is built in)
kubectl diff -k overlays/prod
# Prints a git-diff-style listing of added/removed/changed fields
Verify the resource quota is not exceeded:
# Check the namespace ResourceQuota
kubectl get resourcequota -n prod
# Sum the requests of the new configuration
# (assumes CPU in "m" and memory in "Mi"; normalize units first if they are mixed)
kubectl kustomize overlays/prod | \
  yq eval 'select(.kind == "Deployment") | .spec.template.spec.containers[].resources.requests' - | \
  awk '/cpu:/ {cpu+=$2} /memory:/ {mem+=$2} END {print "Total CPU:", cpu "m", "Memory:", mem "Mi"}'
Step 7: Gradual rollout with health checks
Goal: replace Pods incrementally and go to full rollout only once the new version is stable.
# Apply the staging environment; note that `kubectl apply -k` does not render
# helmCharts, so pipe the rendered output instead
kubectl kustomize overlays/staging --enable-helm | \
  kubectl apply --server-side --force-conflicts -f -
# Watch the rolling update
kubectl rollout status deployment/app -n staging --timeout=5m
# Example output
# Waiting for deployment "app" rollout to finish: 1 out of 3 new replicas have been updated...
# deployment "app" successfully rolled out
# Verify the new Pods are ready
kubectl get pods -n staging -l app=myapp -o wide
kubectl logs -n staging -l app=myapp --tail=50 | grep -E "ERROR|WARN"
# Health check (simulated traffic)
POD_IP=$(kubectl get pod -n staging -l app=myapp -o jsonpath='{.items[0].status.podIP}')
kubectl run curl-test --image=curlimages/curl --rm -it --restart=Never -- \
  curl -s http://$POD_IP/health -w "\nHTTP_CODE: %{http_code}\n"
# Expected output: HTTP_CODE: 200
Batched production rollout:
# Patch the Deployment strategy for a batched rollout
cd overlays/prod
cat <<EOF > rollout-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most 1 Pod above the desired count
      maxUnavailable: 0    # guarantees zero downtime
EOF
# Register the patch in kustomization.yaml
yq eval '.patchesStrategicMerge += ["rollout-patch.yaml"]' -i kustomization.yaml
# Apply to production
kubectl apply -k . --server-side
# Watch Pods being replaced one by one
watch kubectl get pods -n prod -l app=myapp
Step 8: GitOps auto-sync and rollback
Goal: integrate ArgoCD so Git becomes the configuration source (sample configuration only; not applied here).
# argocd-application.yaml (example; not actually deployed)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-prod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/k8s-manifests.git
    targetRevision: main
    path: overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: prod
  syncPolicy:
    automated:
      prune: true      # delete resources that no longer exist in Git
      selfHeal: true   # revert manual cluster changes automatically
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
Manual rollback to the previous version:
# Inspect the Deployment revision history
kubectl rollout history deployment/app -n prod
# Example output
# REVISION  CHANGE-CAUSE
# 1         Initial deployment
# 2         Update to v1.25.3
# 3         Scale to 5 replicas
# Roll back to revision 2
kubectl rollout undo deployment/app -n prod --to-revision=2
# Verify the rollback
kubectl rollout status deployment/app -n prod
kubectl describe deployment/app -n prod | grep Image:
Monitoring & Alerting
Key Prometheus metrics
# ServiceMonitor example (requires the Prometheus Operator)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-metrics
  namespace: prod
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
Core PromQL queries:
# Pod availability (target >99.9%)
sum(kube_deployment_status_replicas_available{deployment="app", namespace="prod"})
/
sum(kube_deployment_spec_replicas{deployment="app", namespace="prod"}) * 100
# Container restarts (alert when more than 3 in the last hour)
increase(kube_pod_container_status_restarts_total{namespace="prod", pod=~"app-.*"}[1h]) > 3
# Rollout stuck: the observed generation should converge to the spec generation within minutes
kube_deployment_metadata_generation{deployment="app"}
- kube_deployment_status_observed_generation{deployment="app"} > 0
Native monitoring commands
# Actual Pod resource usage
kubectl top pods -n prod -l app=myapp --containers
# Example output
# POD          NAME   CPU(cores)   MEMORY(bytes)
# app-xxx-yyy  app    150m         256Mi
# Stream Pod events (abnormal restarts, scheduling failures)
kubectl get events -n prod --field-selector involvedObject.name=app --watch
# Verify the Service has healthy endpoints
kubectl get endpoints -n prod app -o yaml | grep -A5 addresses:
Performance & Capacity
Benchmark commands
# Throughput test with wrk
kubectl run wrk-bench --image=williamyeh/wrk --rm -it --restart=Never -- \
  -t4 -c100 -d30s --latency http://app.prod.svc.cluster.local/api/test
# Target figures
# Requests/sec: 5000+
# Latency (P99): <50ms
# Time a full apply to measure deployment speed
time kubectl apply -k overlays/prod --server-side
# Target: full update of a 100+ Pod cluster within 10 minutes
Tuning parameters
At the Kubernetes level:
# Deployment tuning
spec:
  progressDeadlineSeconds: 600   # 10-minute timeout
  revisionHistoryLimit: 5        # keep 5 historical revisions
  strategy:
    rollingUpdate:
      maxSurge: 25%              # in prod, prefer a fixed count of 1-2
      maxUnavailable: 0          # zero downtime
  template:
    spec:
      terminationGracePeriodSeconds: 30   # graceful shutdown window
      priorityClassName: high-priority    # high-priority Pods
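With maxUnavailable: 0 protecting rolling updates, a PodDisruptionBudget is the matching guard against voluntary disruptions such as node drains. A sketch, assuming the app: myapp label and the 3-replica prod Deployment from earlier steps:

```yaml
# overlays/prod/pdb.yaml -- illustrative; add it to the prod overlay's resources
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
  namespace: prod
spec:
  minAvailable: 2        # keep at least 2 of 3 replicas up during node drains
  selector:
    matchLabels:
      app: myapp
```

Without a PDB, `kubectl drain` during node maintenance can evict all replicas at once and undo the zero-downtime guarantees of the rollout strategy.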
At the system level (cluster nodes):
# Raise file descriptor and inotify limits (RHEL/CentOS)
cat <<EOF > /etc/sysctl.d/k8s-perf.conf
fs.file-max = 2097152
fs.inotify.max_user_watches = 524288
fs.inotify.max_user_instances = 8192
EOF
sysctl -p /etc/sysctl.d/k8s-perf.conf
# Same on Ubuntu/Debian
Capacity planning formulas:
Total node CPU    = (Pod count × requests.cpu) / target utilization (0.7-0.8)
Total node memory = (Pod count × requests.memory) / target utilization (0.7-0.8) + system reserve (2-4 GB)
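The formulas can be turned into a quick sizing helper. A sketch in Python — units are millicores and MiB, the example numbers are illustrative, and the system reserve is treated as a single lump, exactly as the formula above is written:

```python
import math

def nodes_needed(pods, cpu_req_m, mem_req_mi, node_cpu_m, node_mem_mi,
                 utilization=0.75, system_reserve_mi=3072):
    """Apply the capacity formulas: inflate total requests by the target
    utilization headroom, add the system reserve to memory, and size by
    the tighter of the CPU and memory constraints."""
    cpu_total = pods * cpu_req_m / utilization                       # millicores
    mem_total = pods * mem_req_mi / utilization + system_reserve_mi  # MiB
    return max(math.ceil(cpu_total / node_cpu_m),
               math.ceil(mem_total / node_mem_mi))

# 100 Pods at 100m/128Mi on 4-core/8Gi nodes -> CPU is the binding constraint
print(nodes_needed(100, 100, 128, 4000, 8192))  # -> 4
```

Run it with your real per-Pod requests to sanity-check node counts before a ResourceQuota blocks an apply.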
Security & Compliance
Least-privilege RBAC
# Dedicated ServiceAccount for deployments
apiVersion: v1
kind: ServiceAccount
metadata:
  name: deployer
  namespace: prod
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-deployer
  namespace: prod
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "patch", "update"]
  - apiGroups: [""]
    resources: ["services", "configmaps"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployer-binding
  namespace: prod
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: app-deployer
subjects:
  - kind: ServiceAccount
    name: deployer
    namespace: prod
Secrets management
# Sealed Secrets: encrypted Secrets that are safe to store in Git
kubectl create secret generic redis-auth \
  --from-literal=password=Prod_Redis_Pass_2024 \
  --dry-run=client -o yaml | \
kubeseal --controller-namespace=kube-system \
  --controller-name=sealed-secrets \
  --format yaml > overlays/prod/redis-secret-sealed.yaml
# Verify the encrypted Secret is safe to commit to Git
grep "encryptedData:" overlays/prod/redis-secret-sealed.yaml
Audit & compliance
# Enable Kustomize build metadata to trace configuration origin
cd overlays/prod
kustomize edit add buildmetadata originAnnotations
# Verify generated resources carry origin information
kubectl kustomize . | grep -E "config.kubernetes.io/origin"
# Output: config.kubernetes.io/origin: |
#           path: overlays/prod/kustomization.yaml
#           repo: https://github.com/myorg/k8s-manifests
Common Failures & Troubleshooting

| Symptom | Diagnostic command | Likely root cause | Quick fix | Permanent fix |
|---|---|---|---|---|
| kubectl apply -k fails | kubectl kustomize . \| kubectl apply --dry-run=server -f - | YAML syntax error or resource conflict | Fix the resource flagged by the dry run | Add yamllint + kubeval validation to CI |
| Helm chart not rendered | kubectl kustomize . --enable-helm | kubectl older than 1.27 or helm binary missing | Upgrade kubectl or install kustomize 5.x | Declare the helmCharts generator explicitly in kustomization.yaml |
| Image pull failures | kubectl describe pod <pod> -n <ns> \| grep -A5 Events | Image tag missing or private-registry auth failure | Roll back to the previous version | Configure imagePullSecrets and verify the image exists before release |
| Pod stuck Pending | kubectl get events --field-selector involvedObject.name=<pod> | Insufficient resources or over-strict node affinity | Add nodes or lower requests | Use ResourceQuota and LimitRange to prevent overcommit |
| ConfigMap change not picked up | kubectl rollout restart deployment/<name> -n <ns> | subPath-mounted ConfigMaps do not hot-reload | Restart the Pods manually | Switch to envFrom or add Reloader for automatic restarts |
| Rolling update stuck | kubectl rollout status deploy/<name> --timeout=0 | New Pods fail the readinessProbe | Check the health endpoint and timeouts | Tune probe parameters or fix the app's startup logic |
Change & Rollback Playbooks
Standard production change flow
#!/bin/bash
# prod-deploy.sh - production release playbook
set -euo pipefail
NAMESPACE="prod"
OVERLAY_PATH="overlays/prod"
BACKUP_DIR="backups/$(date +%Y%m%d-%H%M%S)"
echo "==> 1. Back up the current configuration"
mkdir -p "$BACKUP_DIR"
kubectl get all,cm,secret -n "$NAMESPACE" -o yaml > "$BACKUP_DIR/pre-deploy-snapshot.yaml"
echo "==> 2. Validate the new configuration"
kubectl kustomize "$OVERLAY_PATH" --enable-helm | kubectl apply --dry-run=server -f -
echo "==> 3. Apply the change (canary strategy: surge 1 Pod at a time)"
kubectl patch deployment app -n "$NAMESPACE" -p '{"spec":{"strategy":{"rollingUpdate":{"maxSurge":1,"maxUnavailable":0}}}}'
kubectl apply -k "$OVERLAY_PATH" --server-side
echo "==> 4. Health check (wait 2 minutes)"
sleep 120
kubectl rollout status deployment/app -n "$NAMESPACE" --timeout=5m
echo "==> 5. Verify the new Pod logs are clean"
NEW_POD=$(kubectl get pod -n "$NAMESPACE" -l app=myapp --sort-by=.metadata.creationTimestamp -o name | tail -1)
# Use an explicit if: under `set -e`, a bare `grep ... && { ... }` would abort
# the script whenever grep finds no match
if kubectl logs -n "$NAMESPACE" "$NEW_POD" --tail=100 | grep -qi error; then
  echo "Errors found in logs; rolling back"
  kubectl rollout undo deployment/app -n "$NAMESPACE"
  exit 1
fi
echo "==> 6. Scale to full capacity"
kubectl scale deployment/app -n "$NAMESPACE" --replicas=10
echo "==> Deployment complete; backup saved at $BACKUP_DIR"
Quick rollback script
#!/bin/bash
# rollback.sh - one-command rollback to the previous stable version
NAMESPACE=${1:-prod}
DEPLOYMENT=${2:-app}
echo "==> Rolling back $NAMESPACE/$DEPLOYMENT to the previous revision"
kubectl rollout undo deployment/$DEPLOYMENT -n $NAMESPACE
echo "==> Waiting for the rollback to finish"
kubectl rollout status deployment/$DEPLOYMENT -n $NAMESPACE --timeout=3m
echo "==> Verifying Pod status after rollback"
kubectl get pods -n $NAMESPACE -l app=myapp
kubectl logs -n $NAMESPACE -l app=myapp --tail=50 | grep -i "started successfully"
echo "==> Rollback complete"
Best Practices (10 Decision Points)
1. Layered configuration: base holds only environment-agnostic resources; all environment differences live in overlays. Never hard-code environment-specific values into base.
2. Image version pinning: production must never use latest or floating tags (such as v1.2); use a full SHA256 digest or a complete semantic version (v1.2.3).
3. Mandatory resource limits: every Deployment declares requests and limits, with limits at 1.5-2× requests, to prevent OOM kills and resource contention.
4. Rolling-update parameters: in production, use maxUnavailable=0 plus maxSurge=1 (or 25%) for zero downtime, and progressDeadlineSeconds=600 to avoid stuck rollouts.
5. Probe tuning: set readinessProbe.initialDelaySeconds to about 1.2× the app's real startup time; keep dependency checks out of livenessProbe to avoid cascading restarts.
6. Helm via Kustomize: manage third-party components (Redis/MySQL/Kafka) through the helmCharts generator rather than manual helm template runs, keeping a single source of truth.
7. Atomic GitOps changes: one change per Git commit, with the ticket ID and a change summary in the commit message; use signed commits to prevent tampering.
8. Environment promotion: promote dev→staging→prod via Git branches or tags; staging must pass automated tests (smoke tests + performance baselines); production changes require 2-person PR approval.
9. Encrypted secrets: inject sensitive data with Sealed Secrets, External Secrets Operator, or Vault; never commit plaintext Secrets to Git.
10. Rollback window: keep a 30-minute observation window after each production deploy, watching error rate, P99 latency, and restart count; roll back immediately on anomalies.
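The two-person approval in point 8 can be enforced in the repository itself. A hypothetical CODEOWNERS sketch — the team handles are placeholders for your own organization:

```text
# .github/CODEOWNERS -- review required from the matching team on every PR
/base/            @myorg/platform-team
/charts/          @myorg/platform-team
/overlays/prod/   @myorg/platform-leads
```

Combine this with branch protection on main (2 required approvals, required status checks, signed commits) so no change to overlays/prod can merge without the designated reviewers.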
Appendix: Complete Configuration Samples
Full production kustomization.yaml
# overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: prod
# Reference the baseline
bases:
  - ../../base
# Helm chart integration (Redis cluster)
helmCharts:
  - name: redis
    repo: https://charts.bitnami.com/bitnami
    version: 18.1.0
    releaseName: redis-ha
    namespace: prod
    valuesInline:
      auth:
        enabled: true
        existingSecret: redis-auth-sealed
      master:
        resources:
          requests: {cpu: 500m, memory: 1Gi}
          limits: {cpu: 2000m, memory: 4Gi}
        persistence:
          enabled: true
          size: 20Gi
      replica:
        replicaCount: 3
        resources:
          requests: {cpu: 200m, memory: 512Mi}
          limits: {cpu: 1000m, memory: 2Gi}
# Image override (by SHA256 digest)
images:
  - name: nginx
    newName: myregistry.com/myapp
    newTag: sha256:abcd1234...
# Common labels
commonLabels:
  env: production
  team: platform
  cost-center: "12345"
# Common annotations
commonAnnotations:
  deployed-by: argocd
  git-commit: ${GIT_COMMIT}
# ConfigMap generator (merge extra keys)
configMapGenerator:
  - name: app-config
    behavior: merge
    literals:
      - MAX_CONNECTIONS=1000
      - CACHE_TTL=3600
# Secret generator (Sealed Secret inputs)
secretGenerator:
  - name: app-secrets
    files:
      - secrets/database-creds.txt
      - secrets/api-keys.txt
# Resource patches
patchesStrategicMerge:
  - replica-patch.yaml
  - rollout-strategy-patch.yaml
  - monitoring-patch.yaml
# JSON6902 patches (surgical field edits)
patchesJson6902:
  - target:
      group: apps
      version: v1
      kind: Deployment
      name: app
    patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/cpu
        value: "1000m"
# Transformers (set Pod priority)
transformers:
  - priority-transformer.yaml
# Build metadata (trace configuration origin)
buildMetadata:
  - originAnnotations
Sample Ansible deployment tasks
# ansible/deploy-k8s-app.yml
---
- name: Deploy application using Kustomize
  hosts: localhost
  gather_facts: no
  vars:
    overlay: "overlays/{{ env }}"
    namespace: "{{ env }}"
  tasks:
    - name: Validate kustomization
      command: kubectl kustomize {{ overlay }} --enable-helm
      register: kustomize_output
      changed_when: false
      check_mode: no
    - name: Dry-run apply
      command: kubectl apply -k {{ overlay }} --dry-run=server --validate=true
      register: dryrun_result
      failed_when: "'error' in dryrun_result.stderr"
    - name: Back up current state
      shell: |
        kubectl get all,cm,secret -n {{ namespace }} -o yaml > \
          backups/{{ namespace }}-$(date +%Y%m%d-%H%M%S).yaml
    - name: Apply kustomization
      command: kubectl apply -k {{ overlay }} --server-side --force-conflicts
      when: not ansible_check_mode
    - name: Wait for rollout
      command: kubectl rollout status deployment/app -n {{ namespace }} --timeout=5m
      when: not ansible_check_mode
    - name: Verify deployment
      command: kubectl get pods -n {{ namespace }} -l app=myapp -o jsonpath='{.items[*].status.phase}'
      register: pod_status
      failed_when: "'Running' not in pod_status.stdout"
      when: not ansible_check_mode