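The Complete Guide to Kubernetes Container Orchestration
An in-depth guide to Kubernetes container orchestration, from basic concepts to production deployment
September 18, 2025
DocsLib Team
Kubernetes, Container Orchestration, DevOps, Microservices, Cloud Native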
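Kubernetes (K8s for short) is currently the most widely used container orchestration platform. It automates the deployment, scaling, and management of containerized applications. This guide starts with the basic concepts and works up to practical use in production.
1. Introduction to Kubernetes
1.1 What is Kubernetes
Kubernetes is an open-source container orchestration engine that automates the deployment, scaling, and management of containerized applications. It was originally developed by Google and is now maintained by the Cloud Native Computing Foundation (CNCF).
1.2 Key advantages
- Automated rollouts and rollbacks: declarative configuration with automated deployment
- Service discovery and load balancing: Pods and Services are assigned IP addresses and DNS names automatically
- Storage orchestration: storage systems are mounted automatically
- Self-healing: failed containers are restarted, and Pods are replaced and rescheduled when a node fails
- Secret and configuration management: sensitive information is managed separately from application images
- Horizontal scaling: applications scale automatically based on CPU utilization or other metrics
1.3 Core concepts
# Basic architecture components
Control plane (historically called the master node):
- API Server: the single entry point to the cluster
- etcd: distributed key-value store that holds the cluster state
- Controller Manager: runs the controllers that reconcile the cluster state
- Scheduler: assigns Pods to nodes
Worker node:
- kubelet: node agent that manages the Pods on its node
- kube-proxy: network proxy that implements Service routing
- Container Runtime: runs the containers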
2. Environment setup
2.1 Local development environment
Using Minikube
# Install Minikube (Windows, via Chocolatey)
choco install minikube
# Start the cluster
minikube start --driver=docker
# Check the cluster status
kubectl cluster-info
kubectl get nodes
# Enable add-ons
minikube addons enable dashboard
minikube addons enable ingress
Using Docker Desktop
# Enable Kubernetes in Docker Desktop
# Settings -> Kubernetes -> Enable Kubernetes
# Verify the installation
kubectl version --client
kubectl cluster-info
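2.2 Production environment setup
Using kubeadm
# On every node: install the prerequisite packages
# (a container runtime such as containerd must also be installed on each node)
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
# Add the Kubernetes APT repository (pkgs.k8s.io; the legacy apt.kubernetes.io repository is deprecated and frozen)
# Replace <k8s-minor> with the minor release to install, for example v1.31
sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/<k8s-minor>/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/<k8s-minor>/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
# Install kubelet, kubeadm, and kubectl
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
# Initialize the cluster on the control-plane node
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
# Configure kubectl
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Install the network plugin (Flannel)
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
# Join the worker nodes
# On each worker node, run the join command printed by kubeadm init
sudo kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash <hash>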
3. Core resource objects
3.1 Pod
A Pod is the smallest deployable unit in Kubernetes and contains one or more containers.
# simple-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.20
    ports:
    - containerPort: 80
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
    env:
    - name: ENV_VAR
      value: "production"
    volumeMounts:
    - name: config-volume
      mountPath: /etc/nginx/conf.d
  volumes:
  - name: config-volume
    configMap:
      name: nginx-config  # defined in section 3.4; the Pod cannot start until it exists
  restartPolicy: Always
# Deploy and manage the Pod
kubectl apply -f simple-pod.yaml
kubectl get pods
kubectl describe pod nginx-pod
kubectl logs nginx-pod
kubectl exec -it nginx-pod -- /bin/bash
kubectl delete pod nginx-pod
3.2 Deployment
A Deployment provides declarative updates for Pods and ReplicaSets.
# nginx-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.20
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
# Deploy and manage the Deployment
kubectl apply -f nginx-deployment.yaml
kubectl get deployments
kubectl get replicasets
kubectl get pods
# Scale the number of replicas
kubectl scale deployment nginx-deployment --replicas=5
# Update the image
kubectl set image deployment/nginx-deployment nginx=nginx:1.21
# Watch the rolling update status
kubectl rollout status deployment/nginx-deployment
# View the rollout history
kubectl rollout history deployment/nginx-deployment
# Roll back to the previous revision
kubectl rollout undo deployment/nginx-deployment
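To make the rolling update settings above concrete, the following shell sketch computes the bounds they imply during a rollout: how many Pods must stay available and how many may exist at once. The function name rolling_update_bounds is hypothetical, and only absolute values are handled (Kubernetes also accepts percentages for both fields).

```shell
# rolling_update_bounds REPLICAS MAX_UNAVAILABLE MAX_SURGE
# Prints the minimum number of available Pods and the maximum total number
# of Pods that a RollingUpdate may reach. Absolute values only.
rolling_update_bounds() (
  replicas=$1
  max_unavailable=$2
  max_surge=$3
  min_available=$((replicas - max_unavailable))
  max_total=$((replicas + max_surge))
  echo "min_available=${min_available} max_total=${max_total}"
)

# Values taken from nginx-deployment.yaml: 3 replicas, maxUnavailable 1, maxSurge 1
rolling_update_bounds 3 1 1   # prints: min_available=2 max_total=4
```

With these settings, at least 2 of the 3 replicas keep serving traffic during an update and at most 4 Pods exist at any moment.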
3.3 Service
A Service gives a set of Pods a stable network endpoint.
# nginx-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
    nodePort: 30080
  type: NodePort
---
# ClusterIP Service (in-cluster access only)
apiVersion: v1
kind: Service
metadata:
  name: nginx-clusterip
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: ClusterIP
---
# LoadBalancer Service (cloud environments)
apiVersion: v1
kind: Service
metadata:
  name: nginx-loadbalancer
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: LoadBalancer
# Deploy and test the Service
kubectl apply -f nginx-service.yaml
kubectl get services
kubectl describe service nginx-service
# Test access to the service
kubectl get nodes -o wide
curl http://<node-ip>:30080
# View the service endpoints
kubectl get endpoints nginx-service
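Inside the cluster, a Service is usually reached through its DNS name rather than its IP address. The sketch below shows how the fully qualified name is composed. It assumes the default cluster domain cluster.local and falls back to the default namespace; service_fqdn is a hypothetical helper, not a kubectl feature.

```shell
# service_fqdn SERVICE [NAMESPACE] [CLUSTER_DOMAIN]
# Builds the in-cluster DNS name <service>.<namespace>.svc.<cluster-domain>.
# The namespace defaults to "default" and the domain to "cluster.local".
service_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "${2:-default}" "${3:-cluster.local}"
}

service_fqdn nginx-service              # prints: nginx-service.default.svc.cluster.local
service_fqdn mysql-service production   # prints: mysql-service.production.svc.cluster.local
```

Pods in the same namespace can use the short name (nginx-service), while Pods in other namespaces need at least the service.namespace form.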
3.4 ConfigMap and Secret
ConfigMap
# nginx-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
data:
  nginx.conf: |
    server {
        listen 80;
        server_name localhost;
        location / {
            root /usr/share/nginx/html;
            index index.html index.htm;
        }
        location /api {
            proxy_pass http://backend-service:8080;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
  app.properties: |
    database.host=mysql-service
    database.port=3306
    database.name=myapp
    log.level=INFO
Secret
# app-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: app-secret
type: Opaque
data:
  # base64-encoded values
  database-username: bXl1c2Vy # myuser
  database-password: bXlwYXNzd29yZA== # mypassword
  api-key: YWJjZGVmZ2hpams= # abcdefghijk
# Other ways to create the Secret (alternatives to the manifest above)
kubectl create secret generic app-secret \
  --from-literal=database-username=myuser \
  --from-literal=database-password=mypassword
# Create from a file
kubectl create secret generic app-secret --from-file=./secret-file.txt
# View Secrets
kubectl get secrets
kubectl describe secret app-secret
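The values under data: in a Secret manifest are base64-encoded, which is an encoding and not encryption. A minimal sketch of producing and checking such values from the shell follows. The helper names secret_encode and secret_decode are hypothetical, and the decode flag -d assumes GNU coreutils or BusyBox base64.

```shell
# Encode a literal for the data: field of a Secret.
# printf avoids the trailing newline that echo would add to the encoded value.
secret_encode() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Decode a value copied from a Secret manifest (assumes base64 supports -d).
secret_decode() {
  printf '%s' "$1" | base64 -d
}

secret_encode 'myuser'; echo             # prints: bXl1c2Vy
secret_decode 'bXlwYXNzd29yZA=='; echo   # prints: mypassword
```

Anyone who can read the Secret object can decode its values this way, so access to Secrets should be restricted with RBAC (section 7.1).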
3.5 Ingress
An Ingress defines rules for routing HTTP and HTTPS traffic to Services inside the cluster.
# nginx-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
spec:
  ingressClassName: nginx
  rules:
  - host: myapp.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx-service
            port:
              number: 80
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: backend-service
            port:
              number: 8080
  - host: admin.myapp.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: admin-service
            port:
              number: 3000
# Deploy the Ingress Controller (NGINX)
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.1.1/deploy/static/provider/cloud/deploy.yaml
# Wait for the Ingress Controller to become ready
kubectl wait --namespace ingress-nginx \
  --for=condition=ready pod \
  --selector=app.kubernetes.io/component=controller \
  --timeout=120s
# Deploy the Ingress rules
kubectl apply -f nginx-ingress.yaml
kubectl get ingress
kubectl describe ingress nginx-ingress
# Add the hostnames to the local hosts file (writing to /etc/hosts requires root)
echo "127.0.0.1 myapp.local admin.myapp.local" | sudo tee -a /etc/hosts
4. Storage management
4.1 Volume types
# storage-examples.yaml
apiVersion: v1
kind: Pod
metadata:
  name: storage-pod
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: empty-dir-volume
      mountPath: /tmp/empty
    - name: host-path-volume
      mountPath: /tmp/host
    - name: config-volume
      mountPath: /etc/config
    - name: secret-volume
      mountPath: /etc/secrets
  volumes:
  # emptyDir: temporary storage that lives as long as the Pod
  - name: empty-dir-volume
    emptyDir: {}
  # hostPath: a path on the node's filesystem
  - name: host-path-volume
    hostPath:
      path: /var/log
      type: Directory
  # ConfigMap
  - name: config-volume
    configMap:
      name: app-config
  # Secret
  - name: secret-volume
    secret:
      secretName: app-secret
4.2 PersistentVolume and PersistentVolumeClaim
# persistent-volume.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual
  hostPath:
    path: /var/lib/mysql-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: manual
4.3 StorageClass
# storage-class.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
# In-tree AWS EBS provisioner; newer clusters use the EBS CSI driver (ebs.csi.aws.com) instead
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# PVC that uses the StorageClass
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: fast-ssd
5. Application deployment in practice
5.1 Deploying a complete web application
# web-app-complete.yaml
# MySQL database
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        - name: MYSQL_DATABASE
          value: "webapp"
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: mysql-storage
          mountPath: /var/lib/mysql
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
      volumes:
      - name: mysql-storage
        persistentVolumeClaim:
          claimName: mysql-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: mysql-service
spec:
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306
  type: ClusterIP
---
# Backend API service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: backend-api
  template:
    metadata:
      labels:
        app: backend-api
    spec:
      containers:
      - name: api
        image: myapp/backend:v1.0.0
        ports:
        - containerPort: 8080
        env:
        - name: DB_HOST
          value: "mysql-service"
        - name: DB_PORT
          value: "3306"
        - name: DB_NAME
          value: "webapp"
        - name: DB_USER
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: username
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: password
        volumeMounts:
        - name: config-volume
          mountPath: /app/config
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
      volumes:
      - name: config-volume
        configMap:
          name: backend-config
---
apiVersion: v1
kind: Service
metadata:
  name: backend-service
spec:
  selector:
    app: backend-api
  ports:
  - port: 8080
    targetPort: 8080
  type: ClusterIP
---
# Frontend application
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: myapp/frontend:v1.0.0
        ports:
        - containerPort: 80
        env:
        - name: API_URL
          value: "http://backend-service:8080"
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "200m"
---
apiVersion: v1
kind: Service
metadata:
  name: frontend-service
spec:
  selector:
    app: frontend
  ports:
  - port: 80
    targetPort: 80
  type: ClusterIP
5.2 Configuration and secret management
# configs-and-secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: mysql-secret
type: Opaque
data:
  root-password: cm9vdHBhc3N3b3Jk # rootpassword
  username: d2ViYXBw # webapp
  password: d2ViYXBwcGFzcw== # webapppass
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: backend-config
data:
  application.yml: |
    server:
      port: 8080
    spring:
      datasource:
        url: jdbc:mysql://mysql-service:3306/webapp
        driver-class-name: com.mysql.cj.jdbc.Driver
      jpa:
        hibernate:
          ddl-auto: update
        show-sql: true
    logging:
      level:
        com.myapp: DEBUG
        org.springframework.web: INFO
  logback.xml: |
    <configuration>
      <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
          <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
      </appender>
      <root level="INFO">
        <appender-ref ref="STDOUT" />
      </root>
    </configuration>
6. Monitoring and logging
6.1 Resource monitoring
# monitoring.yaml
# Metrics Server (excerpt of the manifest)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      containers:
      - name: metrics-server
        image: registry.k8s.io/metrics-server/metrics-server:v0.6.1
        args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        ports:
        - name: main-port
          containerPort: 4443
          protocol: TCP
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
      volumes:
      - name: tmp-dir
        emptyDir: {}
# Install Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# View resource usage
kubectl top nodes
kubectl top pods
kubectl top pods --all-namespaces
# View resource usage in a specific namespace
kubectl top pods -n kube-system
6.2 Horizontal Pod Autoscaler (HPA)
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
# Deploy the HPA
kubectl apply -f hpa.yaml
# Check the HPA status
kubectl get hpa
kubectl describe hpa backend-hpa
# Simulate load
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh
# Run inside the container
while true; do wget -q -O- http://backend-service:8080/api/test; done
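The replica count the HPA aims for follows the formula given in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to minReplicas and maxReplicas. The sketch below reproduces only that core calculation with integer arithmetic. The function name hpa_desired is hypothetical, and the controller's tolerance band, stabilization windows, and behavior policies are deliberately left out.

```shell
# hpa_desired CURRENT_REPLICAS CURRENT_UTILIZATION TARGET_UTILIZATION MIN MAX
# Prints ceil(current * usage / target), clamped to the range [MIN, MAX].
hpa_desired() (
  current=$1
  usage=$2
  target=$3
  min=$4
  max=$5
  # Integer ceiling division
  desired=$(( (current * usage + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then
    desired=$min
  fi
  if [ "$desired" -gt "$max" ]; then
    desired=$max
  fi
  echo "$desired"
)

# backend-hpa: CPU target 70%, minReplicas 2, maxReplicas 10
hpa_desired 3 90 70 2 10    # prints 4  (ceil(3 * 90 / 70) = ceil(3.86))
hpa_desired 4 200 70 2 10   # prints 10 (12 before clamping to maxReplicas)
hpa_desired 5 10 70 2 10    # prints 2  (1 before clamping to minReplicas)
```

When several metrics are configured, as with CPU and memory above, the HPA evaluates each one and uses the largest resulting replica count.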
6.3 Log collection
# logging.yaml
# Fluentd DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: fluentd
  template:
    metadata:
      labels:
        name: fluentd
    spec:
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule
      - key: node-role.kubernetes.io/master  # legacy taint name, kept for older clusters
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch-service"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        - name: FLUENT_ELASTICSEARCH_SCHEME
          value: "http"
        resources:
          limits:
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
7. Security management
7.1 RBAC (role-based access control)
# rbac.yaml
# Create the namespace
apiVersion: v1
kind: Namespace
metadata:
  name: development
---
# Create the ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dev-user
  namespace: development
---
# Create the Role
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: development
  name: dev-role
rules:
- apiGroups: [""]
  resources: ["pods", "services", "configmaps", "secrets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# Create the RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-binding
  namespace: development
subjects:
- kind: ServiceAccount
  name: dev-user
  namespace: development
roleRef:
  kind: Role
  name: dev-role
  apiGroup: rbac.authorization.k8s.io
---
# Cluster-wide ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-reader
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]
---
# ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-reader-binding
subjects:
- kind: ServiceAccount
  name: dev-user
  namespace: development
roleRef:
  kind: ClusterRole
  name: node-reader
  apiGroup: rbac.authorization.k8s.io
7.2 Network Policies
# network-policy.yaml
# Deny all ingress traffic by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
# Allow the frontend to reach the backend
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend-api
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
---
# Allow the backend to reach the database
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: mysql
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: backend-api
    ports:
    - protocol: TCP
      port: 3306
7.3 Pod Security Standards
# pod-security.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: secure-namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
---
# Hardened Pod configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-app
  namespace: secure-namespace
spec:
  replicas: 2
  selector:
    matchLabels:
      app: secure-app
  template:
    metadata:
      labels:
        app: secure-app
    spec:
      serviceAccountName: secure-app-sa  # must already exist in secure-namespace
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 3000
        fsGroup: 2000
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: app
        # The unprivileged NGINX image listens on 8080 and does not need root. The stock nginx
        # image listens on port 80 and writes its PID file under /var/run, which conflicts with
        # runAsNonRoot and readOnlyRootFilesystem.
        image: nginxinc/nginx-unprivileged:1.20
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
          capabilities:
            drop:
            - ALL
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
        volumeMounts:
        - name: tmp-volume
          mountPath: /tmp
        - name: cache-volume
          mountPath: /var/cache/nginx
      volumes:
      - name: tmp-volume
        emptyDir: {}
      - name: cache-volume
        emptyDir: {}
8. Troubleshooting
8.1 Common debugging commands
# Check the cluster status
kubectl cluster-info
kubectl get nodes
kubectl describe node <node-name>
# Check Pod status
kubectl get pods --all-namespaces
kubectl describe pod <pod-name>
kubectl logs <pod-name>
kubectl logs <pod-name> -c <container-name>
kubectl logs <pod-name> --previous
# Open a shell in a Pod for debugging
kubectl exec -it <pod-name> -- /bin/bash
kubectl exec -it <pod-name> -c <container-name> -- /bin/sh
# View events
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl get events --field-selector involvedObject.name=<pod-name>
# View resource usage
kubectl top nodes
kubectl top pods
# Network debugging
kubectl run debug-pod --image=nicolaka/netshoot -it --rm
# Inside the debug Pod
nslookup kubernetes.default
ping <service-name>
curl http://<service-name>:<port>
# View service endpoints
kubectl get endpoints
kubectl describe service <service-name>
# Port forwarding
kubectl port-forward pod/<pod-name> 8080:80
kubectl port-forward service/<service-name> 8080:80
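8.2 Common problems and fixes
Pod fails to start
# Inspect the Pod's status and events
kubectl describe pod <pod-name>
# Common states and what to check:
# ImagePullBackOff: the image could not be pulled
# - Check the image name and tag
# - Check access permissions for the image registry
# - Check network connectivity
# CrashLoopBackOff: the container exits shortly after starting and is restarted repeatedly
# - Check the container logs
# - Check the application configuration
# - Check the health probe configuration
# Pending: the Pod cannot be scheduled
# - Check whether the resource requests are too large
# - Check node labels and taints
# - Check whether the PVC is bound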
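The checklist above can be kept at hand as a small lookup helper. This is only a sketch: pod_status_hint is a hypothetical function, the hints simply restate the list, and ErrImagePull is included because it is the state that usually precedes ImagePullBackOff.

```shell
# pod_status_hint STATUS
# Prints the first things to check for a given Pod status.
pod_status_hint() {
  case "$1" in
    ImagePullBackOff|ErrImagePull)
      echo "check image name and tag, registry access permissions, network connectivity" ;;
    CrashLoopBackOff)
      echo "check container logs (kubectl logs --previous), application configuration, probe settings" ;;
    Pending)
      echo "check resource requests, node labels and taints, PVC binding" ;;
    *)
      echo "run kubectl describe pod and review the events" ;;
  esac
}

pod_status_hint CrashLoopBackOff
pod_status_hint Pending
```

The status string is the one shown in the STATUS column of kubectl get pods.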
Service access problems
# Check the Service configuration
kubectl describe service <service-name>
kubectl get endpoints <service-name>
# Check the label selectors
kubectl get pods --show-labels
# Test connectivity to the service
kubectl run test-pod --image=busybox -it --rm -- /bin/sh
# Inside the test Pod
wget -qO- http://<service-name>:<port>
nslookup <service-name>
9. Production best practices
9.1 Resource management
# resource-management.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: resource-limits
  namespace: production
spec:
  limits:
  # A LimitRange may contain only one entry per type, so the defaults and the min/max bounds share one item
  - type: Container
    default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    max:
      cpu: "2"
      memory: "2Gi"
    min:
      cpu: "50m"
      memory: "64Mi"
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: resource-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    persistentvolumeclaims: "10"
    pods: "50"
    services: "20"
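CPU quantities in these manifests mix two notations: whole or fractional cores ("2", "0.5") and millicores ("500m"). The sketch below normalizes both to millicores and checks a value against the LimitRange bounds above (min 50m, max 2). The helper names cpu_to_millicores and cpu_within_range are hypothetical, and awk is assumed to be available for the fractional case.

```shell
# cpu_to_millicores QUANTITY
# Converts a CPU quantity such as "250m", "2" or "0.5" to millicores.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v c="$1" 'BEGIN { printf "%.0f\n", c * 1000 }' ;;
  esac
}

# cpu_within_range VALUE MIN MAX
# Succeeds when VALUE lies between MIN and MAX (inclusive).
cpu_within_range() (
  value=$(cpu_to_millicores "$1")
  min=$(cpu_to_millicores "$2")
  max=$(cpu_to_millicores "$3")
  [ "$value" -ge "$min" ] && [ "$value" -le "$max" ]
)

cpu_to_millicores 250m   # prints 250
cpu_to_millicores 2      # prints 2000
if cpu_within_range 500m 50m 2; then
  echo "500m is within the LimitRange"
fi
```

A container that requests less than the min or more than the max is rejected at admission time, so a check like this can catch mistakes before applying a manifest.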
9.2 Backup and restore
# Back up etcd
ETCDCTL_API=3 etcdctl snapshot save backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
# Verify the backup
ETCDCTL_API=3 etcdctl --write-out=table snapshot status backup.db
# Restore etcd
ETCDCTL_API=3 etcdctl snapshot restore backup.db \
  --name m1 \
  --initial-cluster m1=https://127.0.0.1:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-advertise-peer-urls https://127.0.0.1:2380
# Back up application configuration (note: "get all" covers only a subset of resource types)
kubectl get all --all-namespaces -o yaml > cluster-backup.yaml
# Back up a specific namespace
kubectl get all -n production -o yaml > production-backup.yaml
# Back up Secrets and ConfigMaps
kubectl get secrets --all-namespaces -o yaml > secrets-backup.yaml
kubectl get configmaps --all-namespaces -o yaml > configmaps-backup.yaml
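9.3 Upgrade strategy
# Pre-upgrade checks
kubectl version
kubeadm version
# Upgrade kubeadm first: kubeadm upgrade apply needs a kubeadm binary of the target version
sudo apt-mark unhold kubeadm && \
sudo apt-get update && sudo apt-get install -y kubeadm='1.25.0-*' && \
sudo apt-mark hold kubeadm
sudo kubeadm upgrade plan
# Upgrade the control plane
sudo kubeadm upgrade apply v1.25.0
# Upgrade kubelet and kubectl
sudo apt-mark unhold kubelet kubectl && \
sudo apt-get update && sudo apt-get install -y kubelet='1.25.0-*' kubectl='1.25.0-*' && \
sudo apt-mark hold kubelet kubectl
# Restart kubelet
sudo systemctl daemon-reload
sudo systemctl restart kubelet
# Upgrade the worker nodes
# Run on each worker node, one node at a time, after upgrading its kubeadm package as above
# (drain the node first from a machine with cluster access: kubectl drain <node-name> --ignore-daemonsets)
sudo kubeadm upgrade node
sudo apt-mark unhold kubelet kubectl && \
sudo apt-get update && sudo apt-get install -y kubelet='1.25.0-*' kubectl='1.25.0-*' && \
sudo apt-mark hold kubelet kubectl
sudo systemctl daemon-reload
sudo systemctl restart kubelet
# Make the node schedulable again: kubectl uncordon <node-name>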
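kubeadm supports upgrading within a minor release or to the next minor release; skipping a minor version is not supported. A small pre-flight check along those lines might look like the sketch below. The helper names version_part and upgrade_step_ok are hypothetical, only the major and minor numbers are compared, and patch levels are ignored.

```shell
# version_part VERSION FIELD
# Prints the major (FIELD=1) or minor (FIELD=2) number of a version such as v1.25.0.
version_part() {
  echo "${1#v}" | cut -d. -f"$2"
}

# upgrade_step_ok CURRENT TARGET
# Succeeds when TARGET has the same major version as CURRENT and is either
# the same minor release or exactly one minor release newer.
upgrade_step_ok() (
  cur_major=$(version_part "$1" 1)
  cur_minor=$(version_part "$1" 2)
  tgt_major=$(version_part "$2" 1)
  tgt_minor=$(version_part "$2" 2)
  if [ "$cur_major" -ne "$tgt_major" ]; then
    exit 1
  fi
  step=$((tgt_minor - cur_minor))
  [ "$step" -ge 0 ] && [ "$step" -le 1 ]
)

if upgrade_step_ok v1.24.3 v1.25.0; then
  echo "ok: v1.24.3 -> v1.25.0"
fi
if ! upgrade_step_ok v1.23.5 v1.25.0; then
  echo "refused: v1.23.5 -> v1.25.0 skips a minor version"
fi
```

To move across several minor versions, repeat the whole upgrade procedure once per minor release.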
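Summary
Kubernetes is a powerful container orchestration platform. After working through this guide, you should be able to:
- Understand the core concepts: Pods, Deployments, Services, and the other basic resource objects
- Set up environments: build Kubernetes clusters for development and production
- Deploy applications: deploy a complete multi-tier application
- Manage storage: configure and use the different storage options
- Monitor and log: set up monitoring for applications and the cluster
- Manage security: configure RBAC, network policies, and other security measures
- Troubleshoot: diagnose and resolve common problems
- Apply production practices: put the production best practices to use
Kubernetes has a steep learning curve, but once you have mastered it, managing containerized applications becomes much easier. Start with simple applications and work your way up to complex production deployments. Continuous practice and learning are the key to mastering Kubernetes.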