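The Complete Guide to Kubernetes Container Orchestration

An in-depth guide to Kubernetes container orchestration, from basic concepts to production deployment.

September 18, 2025
DocsLib Team
Tags: Kubernetes, Container Orchestration, DevOps, Microservices, Cloud Native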

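Kubernetes (K8s for short) is the most widely used container orchestration platform. It automates the deployment, scaling, and management of containerized applications. This guide starts with the basic concepts and works up to practical use in production.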

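1. Introduction to Kubernetes

1.1 What Is Kubernetes

Kubernetes is an open-source container orchestration engine that automates the deployment, scaling, and management of containerized applications. It was originally developed by Google and is now maintained by the Cloud Native Computing Foundation (CNCF).

1.2 Key Advantages

  • Automated rollouts and rollbacks: declarative configuration drives deployments, and a failed rollout can be reverted
  • Service discovery and load balancing: Services get a stable IP address and DNS name, and traffic is balanced across their Pods
  • Storage orchestration: storage systems are mounted automatically
  • Self-healing: failed containers are restarted, and Pods on failed nodes are replaced and rescheduled onto healthy nodes
  • Secret and configuration management: sensitive information is managed securely
  • Horizontal scaling: applications scale automatically based on CPU utilization or other metrics

1.3 Core Concepts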

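# Basic architecture components
Control Plane (formerly called the master node):
  - API Server: the single entry point to the cluster
  - etcd: distributed key-value store that holds cluster state
  - Controller Manager: runs the controllers that reconcile desired and actual state
  - Scheduler: assigns Pods to nodes

Worker Node:
  - kubelet: the node agent
  - kube-proxy: the network proxy
  - Container Runtime: runs the containers (for example containerd)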

2. Environment Setup

2.1 Local Development Environment

Using Minikube

# Install Minikube (Windows, via Chocolatey)
choco install minikube

# Start the cluster
minikube start --driver=docker

# Check cluster status
kubectl cluster-info
kubectl get nodes

# Enable add-ons
minikube addons enable dashboard
minikube addons enable ingress

Using Docker Desktop

# Enable Kubernetes in Docker Desktop
# Settings -> Kubernetes -> Enable Kubernetes

# Verify the installation
kubectl version --client
kubectl cluster-info

2.2 Production Environment Setup

Using kubeadm

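# Install prerequisites on every node
# (a container runtime such as containerd must also be installed on each node)
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gpg

# Add the Kubernetes APT repository
# The legacy apt.kubernetes.io repository has been shut down; packages now live on pkgs.k8s.io.
# v1.30 below is only an example; substitute the minor release you want to install.
sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /" | sudo tee /etc/apt/sources.list.d/kubernetes.list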

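# Install kubelet, kubeadm, and kubectl, and pin their versions
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

# Initialize the cluster on the control-plane node
# (10.244.0.0/16 is the Pod CIDR that Flannel expects by default)
sudo kubeadm init --pod-network-cidr=10.244.0.0/16

# Configure kubectl for the current user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config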

# Install a network plugin (Flannel; the project now lives under the flannel-io organization)
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

# Join the worker nodes
# On each worker node, run the join command printed by kubeadm init
sudo kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash <hash>

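3. Core Resource Objects

3.1 Pod

A Pod is the smallest deployable unit in Kubernetes. It holds one or more containers that share a network namespace and storage volumes.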

# simple-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.20
    ports:
    - containerPort: 80
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
    env:
    - name: ENV_VAR
      value: "production"
    volumeMounts:
    - name: config-volume
      mountPath: /etc/nginx/conf.d
  volumes:
  - name: config-volume
    configMap:
      name: nginx-config
  restartPolicy: Always
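
# Deploy and manage the Pod
# (create the nginx-config ConfigMap from section 3.4 first; the Pod cannot start without it)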
kubectl apply -f simple-pod.yaml
kubectl get pods
kubectl describe pod nginx-pod
kubectl logs nginx-pod
kubectl exec -it nginx-pod -- /bin/bash
kubectl delete pod nginx-pod
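
The requests and limits in simple-pod.yaml also decide the Pod's QoS class, which influences the eviction order when a node runs short of resources. The sketch below expresses that rule in shell. It is a simplification: qos_class is a name made up for illustration, it handles a single container only, and it compares the values as literal strings (so "500m" and "0.5" would not be treated as equal).

```shell
# qos_class CPU_REQUEST CPU_LIMIT MEM_REQUEST MEM_LIMIT
# Prints the QoS class of a single-container Pod. Pass "" for a value that is not set.
# Hypothetical helper for illustration; not part of kubectl.
qos_class() (
  cpu_req=$1; cpu_lim=$2; mem_req=$3; mem_lim=$4

  # A request that is not set defaults to the limit, if a limit is given.
  : "${cpu_req:=$cpu_lim}"
  : "${mem_req:=$mem_lim}"

  if [ -z "$cpu_req$mem_req" ]; then
    # No requests and no limits at all.
    echo "BestEffort"
  elif [ -n "$cpu_lim" ] && [ -n "$mem_lim" ] \
       && [ "$cpu_req" = "$cpu_lim" ] && [ "$mem_req" = "$mem_lim" ]; then
    # Limits set for both resources, each equal to its request.
    echo "Guaranteed"
  else
    echo "Burstable"
  fi
)
```

With the values from simple-pod.yaml, where requests are lower than limits, qos_class 250m 500m 64Mi 128Mi prints Burstable. On a live cluster the actual value can be read from the Pod's status.qosClass field.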

3.2 Deployment

A Deployment provides declarative updates for Pods and ReplicaSets.

# nginx-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.20
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1

# Deploy and manage the Deployment
kubectl apply -f nginx-deployment.yaml
kubectl get deployments
kubectl get replicasets
kubectl get pods

# Scale the number of replicas
kubectl scale deployment nginx-deployment --replicas=5

# Update the image
kubectl set image deployment/nginx-deployment nginx=nginx:1.21

# Watch the rolling update
kubectl rollout status deployment/nginx-deployment

# View the rollout history
kubectl rollout history deployment/nginx-deployment

# Roll back to the previous revision
kubectl rollout undo deployment/nginx-deployment
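
The maxUnavailable and maxSurge settings in nginx-deployment.yaml bound how many Pods exist while a rolling update is in progress. Here is a small sketch of that arithmetic, assuming absolute counts rather than percentages. rollout_bounds is a hypothetical helper, not a kubectl feature.

```shell
# rollout_bounds REPLICAS MAX_UNAVAILABLE MAX_SURGE
# Prints the fewest available Pods and the most total Pods during a rolling update.
# Absolute counts only; percentage values are not handled in this sketch.
rollout_bounds() {
  echo "min_available=$(( $1 - $2 )) max_total=$(( $1 + $3 ))"
}

# Values from nginx-deployment.yaml: replicas 3, maxUnavailable 1, maxSurge 1
rollout_bounds 3 1 1   # prints: min_available=2 max_total=4
```

So with three replicas, at least two Pods keep serving traffic and at most four exist at any moment during the update.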

3.3 Service

A Service gives a set of Pods a stable network endpoint.

# nginx-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
    nodePort: 30080
  type: NodePort
---
# ClusterIP Service (reachable only from inside the cluster)
apiVersion: v1
kind: Service
metadata:
  name: nginx-clusterip
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: ClusterIP
---
# LoadBalancer Service (cloud environments)
apiVersion: v1
kind: Service
metadata:
  name: nginx-loadbalancer
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: LoadBalancer

# Deploy and test the Services
kubectl apply -f nginx-service.yaml
kubectl get services
kubectl describe service nginx-service

# Test access through the NodePort
kubectl get nodes -o wide
curl http://<node-ip>:30080

# List the Service endpoints
kubectl get endpoints nginx-service
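
Inside the cluster, a Service is normally reached by its DNS name rather than by a node IP. The helper below is a sketch that builds the fully qualified name, assuming the default cluster domain cluster.local. service_fqdn is an illustrative name, not an existing command.

```shell
# service_fqdn SERVICE [NAMESPACE] [CLUSTER_DOMAIN]
# Builds the in-cluster DNS name of a Service.
# Assumptions: namespace defaults to "default", cluster domain to "cluster.local".
service_fqdn() {
  echo "$1.${2:-default}.svc.${3:-cluster.local}"
}

service_fqdn nginx-service   # prints: nginx-service.default.svc.cluster.local
```

Pods in the same namespace can use the short name (nginx-service). From another namespace, the namespace has to be part of the name, for example nginx-service.default.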

3.4 ConfigMap and Secret

ConfigMap

# nginx-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
data:
  nginx.conf: |
    server {
        listen 80;
        server_name localhost;
        
        location / {
            root /usr/share/nginx/html;
            index index.html index.htm;
        }
        
        location /api {
            proxy_pass http://backend-service:8080;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
  app.properties: |
    database.host=mysql-service
    database.port=3306
    database.name=myapp
    log.level=INFO

Secret

# app-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: app-secret
type: Opaque
data:
  # Base64-encoded values (encoding is not encryption)
  database-username: bXl1c2Vy  # myuser
  database-password: bXlwYXNzd29yZA==  # mypassword
  api-key: YWJjZGVmZ2hpams=  # abcdefghijk
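
# Other ways to create the Secret
kubectl create secret generic app-secret \
  --from-literal=database-username=myuser \
  --from-literal=database-password=mypassword

# Create it from a file instead (an alternative to the command above; the name app-secret can exist only once)
kubectl create secret generic app-secret --from-file=./secret-file.txt

# Inspect Secrets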
kubectl get secrets
kubectl describe secret app-secret
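
The values under data: in app-secret.yaml must be base64-encoded by hand. A minimal sketch of two helpers for that follows. The names b64_encode and b64_decode are made up here, and the decode flag assumes a base64 implementation that accepts -d, such as GNU coreutils.

```shell
# b64_encode VALUE: encode VALUE exactly as given, with no trailing newline added
b64_encode() {
  # printf '%s' avoids the newline that echo would append before encoding
  printf '%s' "$1" | base64 | tr -d '\n'
}

# b64_decode VALUE: decode a base64 string back to plain text
b64_decode() {
  printf '%s' "$1" | base64 -d
}

b64_encode myuser                 # prints: bXl1c2Vy
echo
b64_decode bXlwYXNzd29yZA==       # prints: mypassword
echo
```

A common mistake is to run echo "myuser" | base64, which encodes the trailing newline as well and yields bXl1c2VyCg== instead of bXl1c2Vy. The application then receives a username that ends in a newline.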

3.5 Ingress

An Ingress defines rules that route HTTP and HTTPS traffic to Services inside the cluster.

# nginx-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
spec:
  rules:
  - host: myapp.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx-service
            port:
              number: 80
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: backend-service
            port:
              number: 8080
  - host: admin.myapp.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: admin-service
            port:
              number: 3000

# Deploy the Ingress controller (ingress-nginx)
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.1.1/deploy/static/provider/cloud/deploy.yaml

# Wait for the Ingress controller to become ready
kubectl wait --namespace ingress-nginx \
  --for=condition=ready pod \
  --selector=app.kubernetes.io/component=controller \
  --timeout=120s

# Apply the Ingress rules
kubectl apply -f nginx-ingress.yaml
kubectl get ingress
kubectl describe ingress nginx-ingress

# Add the test hostnames to the local hosts file (writing to /etc/hosts requires root)
echo "127.0.0.1 myapp.local admin.myapp.local" | sudo tee -a /etc/hosts

4. Storage Management

4.1 Volume Types

# storage-examples.yaml
apiVersion: v1
kind: Pod
metadata:
  name: storage-pod
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: empty-dir-volume
      mountPath: /tmp/empty
    - name: host-path-volume
      mountPath: /tmp/host
    - name: config-volume
      mountPath: /etc/config
    - name: secret-volume
      mountPath: /etc/secrets
  volumes:
  # emptyDir: scratch storage that is deleted together with the Pod
  - name: empty-dir-volume
    emptyDir: {}

  # hostPath: a directory on the node's filesystem
  - name: host-path-volume
    hostPath:
      path: /var/log
      type: Directory
  
  # ConfigMap
  - name: config-volume
    configMap:
      name: app-config
  
  # Secret
  - name: secret-volume
    secret:
      secretName: app-secret

4.2 PersistentVolume and PersistentVolumeClaim

# persistent-volume.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual
  hostPath:
    path: /var/lib/mysql-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: manual

4.3 StorageClass

# storage-class.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
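# In-tree EBS provisioner name; recent Kubernetes releases hand this off to the AWS EBS CSI driver (ebs.csi.aws.com), which must be installed
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2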
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# A PVC that uses the StorageClass
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: fast-ssd

5. Hands-On Application Deployment

5.1 Deploying a Complete Web Application

# web-app-complete.yaml
# MySQL database
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        - name: MYSQL_DATABASE
          value: "webapp"
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: mysql-storage
          mountPath: /var/lib/mysql
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
      volumes:
      - name: mysql-storage
        persistentVolumeClaim:
          claimName: mysql-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: mysql-service
spec:
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306
  type: ClusterIP
---
# Backend API service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: backend-api
  template:
    metadata:
      labels:
        app: backend-api
    spec:
      containers:
      - name: api
        image: myapp/backend:v1.0.0
        ports:
        - containerPort: 8080
        env:
        - name: DB_HOST
          value: "mysql-service"
        - name: DB_PORT
          value: "3306"
        - name: DB_NAME
          value: "webapp"
        - name: DB_USER
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: username
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: password
        volumeMounts:
        - name: config-volume
          mountPath: /app/config
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
      volumes:
      - name: config-volume
        configMap:
          name: backend-config
---
apiVersion: v1
kind: Service
metadata:
  name: backend-service
spec:
  selector:
    app: backend-api
  ports:
  - port: 8080
    targetPort: 8080
  type: ClusterIP
---
# Frontend application
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: myapp/frontend:v1.0.0
        ports:
        - containerPort: 80
        env:
        - name: API_URL
          value: "http://backend-service:8080"
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "200m"
---
apiVersion: v1
kind: Service
metadata:
  name: frontend-service
spec:
  selector:
    app: frontend
  ports:
  - port: 80
    targetPort: 80
  type: ClusterIP

5.2 Configuration and Secret Management

# configs-and-secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: mysql-secret
type: Opaque
data:
  root-password: cm9vdHBhc3N3b3Jk  # rootpassword
  username: d2ViYXBw  # webapp
  password: d2ViYXBwcGFzcw==  # webapppass
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: backend-config
data:
  application.yml: |
    server:
      port: 8080
    spring:
      datasource:
        url: jdbc:mysql://mysql-service:3306/webapp
        driver-class-name: com.mysql.cj.jdbc.Driver
      jpa:
        hibernate:
          ddl-auto: update
        show-sql: true
    logging:
      level:
        com.myapp: DEBUG
        org.springframework.web: INFO
  logback.xml: |
    <configuration>
      <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
          <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
      </appender>
      <root level="INFO">
        <appender-ref ref="STDOUT" />
      </root>
    </configuration>

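6. Monitoring and Logging

6.1 Resource Monitoring

# monitoring.yaml
# Metrics Server (excerpt; the official components.yaml also creates the RBAC rules, Service, and APIService)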
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      containers:
      - name: metrics-server
        image: registry.k8s.io/metrics-server/metrics-server:v0.6.1
        args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        ports:
        - name: main-port
          containerPort: 4443
          protocol: TCP
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
      volumes:
      - name: tmp-dir
        emptyDir: {}

# Install Metrics Server from the official manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# View resource usage
kubectl top nodes
kubectl top pods
kubectl top pods --all-namespaces

# View resource usage in a specific namespace
kubectl top pods -n kube-system

6.2 Horizontal Pod Autoscaler (HPA)

# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max

# Deploy the HPA
kubectl apply -f hpa.yaml

# Check HPA status
kubectl get hpa
kubectl describe hpa backend-hpa

# Generate load for testing
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh
# Inside the container, run:
while true; do wget -q -O- http://backend-service:8080/api/test; done
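
While the load generator runs, the HPA recomputes the replica count from the ratio of observed to target utilization: desired = ceil(current replicas × current utilization / target utilization), clamped to minReplicas and maxReplicas. The sketch below reproduces only that core formula. hpa_desired is a hypothetical helper, and it deliberately ignores the controller's tolerance band, the behavior policies in hpa.yaml, and the rule that the largest result wins when several metrics are configured.

```shell
# hpa_desired CURRENT_REPLICAS CURRENT_UTILIZATION TARGET_UTILIZATION MIN_REPLICAS MAX_REPLICAS
# Utilization values are whole percentages, for example 90 for 90%.
# Hypothetical helper; a simplified model of the HPA calculation.
hpa_desired() (
  # ceil(current * utilization / target) using integer arithmetic
  desired=$(( ($1 * $2 + $3 - 1) / $3 ))
  # Clamp to the configured bounds
  if [ "$desired" -lt "$4" ]; then desired=$4; fi
  if [ "$desired" -gt "$5" ]; then desired=$5; fi
  echo "$desired"
)

# 3 replicas at 90% CPU against the 70% target from hpa.yaml (min 2, max 10)
hpa_desired 3 90 70 2 10   # prints: 4
```

In the other direction, four replicas idling at 10% CPU would compute to one replica, but minReplicas: 2 keeps two Pods running.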

6.3 Log Collection

# logging.yaml
# Fluentd DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: fluentd
  template:
    metadata:
      labels:
        name: fluentd
    spec:
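      tolerations:
      # Control-plane nodes carry the "control-plane" taint (Kubernetes 1.24 and later) or the "master" taint (older releases)
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule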
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch-service"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        - name: FLUENT_ELASTICSEARCH_SCHEME
          value: "http"
        resources:
          limits:
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers

7. Security Management

7.1 RBAC (Role-Based Access Control)

# rbac.yaml
# Create the namespace
apiVersion: v1
kind: Namespace
metadata:
  name: development
---
# Create the ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dev-user
  namespace: development
---
# Create the Role
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: development
  name: dev-role
rules:
- apiGroups: [""]
  resources: ["pods", "services", "configmaps", "secrets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# Create the RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-binding
  namespace: development
subjects:
- kind: ServiceAccount
  name: dev-user
  namespace: development
roleRef:
  kind: Role
  name: dev-role
  apiGroup: rbac.authorization.k8s.io
---
# Cluster-wide ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-reader
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]
---
# ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-reader-binding
subjects:
- kind: ServiceAccount
  name: dev-user
  namespace: development
roleRef:
  kind: ClusterRole
  name: node-reader
  apiGroup: rbac.authorization.k8s.io

7.2 Network Policies

# network-policy.yaml
# Deny all inbound traffic by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
# Allow the frontend to reach the backend
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend-api
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
---
# Allow the backend to reach the database
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: mysql
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: backend-api
    ports:
    - protocol: TCP
      port: 3306

7.3 Pod Security Standards

# pod-security.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: secure-namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
---
# A Pod configuration that meets the restricted profile
apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-app
  namespace: secure-namespace
spec:
  replicas: 2
  selector:
    matchLabels:
      app: secure-app
  template:
    metadata:
      labels:
        app: secure-app
    spec:
      serviceAccountName: secure-app-sa  # this ServiceAccount must already exist in secure-namespace
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 3000
        fsGroup: 2000
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: app
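        # The stock nginx image runs as root and binds port 80, which this securityContext forbids;
        # the unprivileged build runs as a non-root user and listens on 8080
        image: nginxinc/nginx-unprivileged:1.20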
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
          capabilities:
            drop:
            - ALL
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
        volumeMounts:
        - name: tmp-volume
          mountPath: /tmp
        - name: cache-volume
          mountPath: /var/cache/nginx
      volumes:
      - name: tmp-volume
        emptyDir: {}
      - name: cache-volume
        emptyDir: {}

8. Troubleshooting

8.1 Common Debugging Commands

# Check cluster status
kubectl cluster-info
kubectl get nodes
kubectl describe node <node-name>

# Check Pod status
kubectl get pods --all-namespaces
kubectl describe pod <pod-name>
kubectl logs <pod-name>
kubectl logs <pod-name> -c <container-name>
kubectl logs <pod-name> --previous   # logs from the previous container instance, useful after a crash

# Open a shell inside a Pod
kubectl exec -it <pod-name> -- /bin/bash
kubectl exec -it <pod-name> -c <container-name> -- /bin/sh

# View events
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl get events --field-selector involvedObject.name=<pod-name>

# View resource usage
kubectl top nodes
kubectl top pods

# Network debugging
kubectl run debug-pod --image=nicolaka/netshoot -it --rm
# Inside the debug Pod
nslookup kubernetes.default
ping <service-name>   # a ClusterIP often does not answer ICMP, so a failed ping alone is not conclusive
curl http://<service-name>:<port>

# List Service endpoints
kubectl get endpoints
kubectl describe service <service-name>

# Port forwarding
kubectl port-forward pod/<pod-name> 8080:80
kubectl port-forward service/<service-name> 8080:80

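8.2 Solving Common Problems

Pod Fails to Start

# Check the Pod's status and events
kubectl describe pod <pod-name>

# Common states and what to check:
# ImagePullBackOff: the image could not be pulled
# - Check the image name and tag
# - Check access permissions for the image registry
# - Check network connectivity

# CrashLoopBackOff: the container keeps exiting shortly after it starts
# - Check the container logs
# - Check the application configuration
# - Check the health probe configuration

# Pending: the Pod cannot be scheduled
# - Check whether the resource requests are too large
# - Check node labels and taints
# - Check whether the PVC is bound

Service Access Problems

# Check the Service configuration
kubectl describe service <service-name>
kubectl get endpoints <service-name>

# Check the label selector
kubectl get pods --show-labels

# Test connectivity to the Service
kubectl run test-pod --image=busybox -it --rm -- /bin/sh
# Inside the test Pod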
wget -qO- http://<service-name>:<port>
nslookup <service-name>

9. Production Best Practices

9.1 Resource Management

# resource-management.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: resource-limits
  namespace: production
spec:
  limits:
  - default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    type: Container
  - max:
      cpu: "2"
      memory: "2Gi"
    min:
      cpu: "50m"
      memory: "64Mi"
    type: Container
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: resource-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    persistentvolumeclaims: "10"
    pods: "50"
    services: "20"
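
Taken together, the LimitRange defaults and the ResourceQuota above cap how many default-sized containers the production namespace can hold. Here is a back-of-the-envelope sketch, assuming single-container Pods that rely entirely on the LimitRange defaults. min_of is a helper defined here, not a standard command.

```shell
# min_of N...: print the smallest of the integer arguments
min_of() (
  smallest=$1
  for n in "$@"; do
    if [ "$n" -lt "$smallest" ]; then smallest=$n; fi
  done
  echo "$smallest"
)

# Quota divided by the per-container default, in millicores and Mi
by_cpu_requests=$(( 10000 / 100 ))   # requests.cpu 10       / defaultRequest cpu 100m
by_mem_requests=$(( 20480 / 128 ))   # requests.memory 20Gi  / defaultRequest memory 128Mi
by_cpu_limits=$(( 20000 / 500 ))     # limits.cpu 20         / default cpu 500m
by_mem_limits=$(( 40960 / 512 ))     # limits.memory 40Gi    / default memory 512Mi
by_pod_count=50                      # pods: "50"

max_default_pods=$(min_of "$by_cpu_requests" "$by_mem_requests" "$by_cpu_limits" "$by_mem_limits" "$by_pod_count")
echo "$max_default_pods"             # prints: 40
```

Under these assumptions the binding constraint is limits.cpu (20 CPUs / 500m = 40), so the pods: "50" quota is never reached with default-sized containers. Raising limits.cpu or lowering the default CPU limit changes that balance.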

9.2 Backup and Restore

# Back up etcd
ETCDCTL_API=3 etcdctl snapshot save backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key

# Verify the backup
ETCDCTL_API=3 etcdctl --write-out=table snapshot status backup.db

# Restore etcd
ETCDCTL_API=3 etcdctl snapshot restore backup.db \
  --name m1 \
  --initial-cluster m1=https://127.0.0.1:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-advertise-peer-urls https://127.0.0.1:2380

# Back up application resources
# Note: "get all" covers only common workload types; it omits ConfigMaps, Secrets, Ingresses, PVCs, and custom resources
kubectl get all --all-namespaces -o yaml > cluster-backup.yaml

# Back up a single namespace
kubectl get all -n production -o yaml > production-backup.yaml

# Back up Secrets and ConfigMaps (the Secret values are only base64-encoded, so store these files securely)
kubectl get secrets --all-namespaces -o yaml > secrets-backup.yaml
kubectl get configmaps --all-namespaces -o yaml > configmaps-backup.yaml

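9.3 Upgrade Strategy

# Pre-upgrade checks
kubectl version
kubeadm version

# Upgrade kubeadm itself first, then review the plan
# (with the pkgs.k8s.io repositories, first point /etc/apt/sources.list.d/kubernetes.list at the target minor release)
sudo apt-mark unhold kubeadm && \
sudo apt-get update && sudo apt-get install -y kubeadm='1.25.0-*' && \
sudo apt-mark hold kubeadm
sudo kubeadm upgrade plan

# Upgrade the control plane
sudo kubeadm upgrade apply v1.25.0

# Upgrade kubelet and kubectl
sudo apt-mark unhold kubelet kubectl && \
sudo apt-get update && sudo apt-get install -y kubelet='1.25.0-*' kubectl='1.25.0-*' && \
sudo apt-mark hold kubelet kubectl

# Restart kubelet
sudo systemctl daemon-reload
sudo systemctl restart kubelet

# Upgrade the worker nodes, one node at a time
# From a machine with kubectl access: evict workloads before touching the node
kubectl drain <node-name> --ignore-daemonsets
# On the worker node itself
sudo apt-mark unhold kubeadm kubelet kubectl && \
sudo apt-get update && sudo apt-get install -y kubeadm='1.25.0-*' && \
sudo kubeadm upgrade node && \
sudo apt-get install -y kubelet='1.25.0-*' kubectl='1.25.0-*' && \
sudo apt-mark hold kubeadm kubelet kubectl
sudo systemctl daemon-reload
sudo systemctl restart kubelet
# Back on the kubectl machine: make the node schedulable again
kubectl uncordon <node-name>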
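
kubeadm does not support skipping minor versions, so a cluster on 1.23 has to pass through 1.24 before it can reach 1.25. The sketch below checks that rule before an upgrade is attempted. minor_of and check_upgrade are hypothetical helpers, and the check assumes both versions share the same major number.

```shell
# minor_of VERSION: print the minor number of a version such as v1.25.0 or 1.25.0
minor_of() {
  echo "${1#v}" | cut -d. -f2
}

# check_upgrade CURRENT TARGET
# Prints "ok" for a patch upgrade or a step to the next minor release, otherwise "unsupported".
# Assumption: both versions have the same major number.
check_upgrade() (
  step=$(( $(minor_of "$2") - $(minor_of "$1") ))
  if [ "$step" -eq 0 ] || [ "$step" -eq 1 ]; then
    echo "ok"
  else
    echo "unsupported"
  fi
)

check_upgrade v1.24.3 v1.25.0   # prints: ok
check_upgrade v1.23.0 v1.25.0   # prints: unsupported
```

The same check also rejects downgrades, since a negative step is neither 0 nor 1.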

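Summary

Kubernetes is a powerful container orchestration platform. After working through this guide, you should be able to:

  1. Understand the core concepts: Pods, Deployments, Services, and the other basic resource objects
  2. Set up environments: build Kubernetes clusters for development and production
  3. Deploy applications: roll out a complete multi-tier application
  4. Manage storage: configure and use the various storage options
  5. Monitor and log: set up monitoring for applications and the cluster
  6. Manage security: configure RBAC, network policies, and other safeguards
  7. Troubleshoot: diagnose and resolve common problems
  8. Apply production practices: follow best practices for production environments

Kubernetes has a steep learning curve, but mastering it greatly improves your ability to manage containerized applications. Start with simple applications and work up to complex production deployments. Steady practice is the key to mastering Kubernetes.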
