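# One-Click Kubernetes Cluster Deployment: A Practical Guide to Shell Script Automation

## Why Automate Kubernetes Cluster Deployment

In the cloud-native era, Kubernetes has become the de facto standard for container orchestration. However, manually building a production-ready Kubernetes cluster involves dozens of steps:

- System initialization (disabling swap, setting firewall rules, and so on)
- Certificate generation and management (etcd, kube-apiserver, and other components all need TLS certificates)
- Deploying and configuring the binaries for each component
- Installing and configuring the network plugin
- Deploying core add-ons (DNS, Dashboard, and so on)

Doing this by hand is slow and error-prone. A single misconfigured option can leave the whole cluster unusable. Worse, a manual process is hard to reproduce, which makes it difficult to keep test and production environments consistent.

## Automated Deployment Design

Our Shell script approach has the following core features:

- **Fully automated workflow**: everything from system initialization to cluster verification runs from one command
- **Idempotent design**: the script can be re-run, and a partially successful run does not cause later runs to fail
- **Modular architecture**: each functional module is independent, which makes it easy to maintain and extend
- **Production-grade configuration**: includes the necessary security settings and tuning parameters

### Main Functional Modules

```bash
#!/bin/bash
# Main script structure
main() {
    init_system
    install_dependencies
    generate_certs
    deploy_etcd
    deploy_control_plane
    deploy_network
    deploy_addons
    verify_cluster
}
```

## System Initialization

Every node in the cluster needs the same basic environment preparation:

```bash
init_system() {
    # Disable swap (comment out only lines that are not already commented,
    # so that re-running the script is safe)
    swapoff -a
    sed -i '/ swap / s/^[^#]/#&/' /etc/fstab

    # Switch SELinux to permissive mode
    setenforce 0
    sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

    # Set kernel parameters (the net.bridge.* keys need the br_netfilter module)
    modprobe br_netfilter
    cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
    sysctl --system

    # Set up hostname resolution
    echo "${MASTER_IP} ${MASTER_HOSTNAME}" >> /etc/hosts
    echo "${NODE1_IP} ${NODE1_HOSTNAME}" >> /etc/hosts
    echo "${NODE2_IP} ${NODE2_HOSTNAME}" >> /etc/hosts
}
```

## Automated Certificate Management

A Kubernetes cluster depends on TLS certificates to run securely. We use the cfssl tool to generate all the required certificates automatically. The function below expects `ca-csr.json` and one `<component>-csr.json` file per component in the working directory:

```bash
generate_certs() {
    # CA signing configuration
    cat <<EOF > ca-config.json
{
  "signing": {
    "default": {
      "expiry": "8760h"
    },
    "profiles": {
      "kubernetes": {
        "usages": ["signing", "key encipherment", "server auth", "client auth"],
        "expiry": "8760h"
      }
    }
  }
}
EOF

    # Generate the CA certificate
    cfssl gencert -initca ca-csr.json | cfssljson -bare ca

    # Generate a certificate for each component
    for component in kube-apiserver kube-controller-manager kube-scheduler kubelet kube-proxy; do
        cfssl gencert \
            -ca=ca.pem -ca-key=ca-key.pem \
            -config=ca-config.json \
            -profile=kubernetes \
            "${component}-csr.json" | cfssljson -bare "${component}"
    done
}
```

## Control Plane Deployment

The master node needs the following core components:

| Component | Function | Key configuration parameters |
| --- | --- | --- |
| etcd | Distributed key-value store | `--listen-client-urls`, `--advertise-client-urls` |
| kube-apiserver | API service entry point | `--etcd-servers`, `--service-cluster-ip-range` |
| kube-controller-manager | Controller manager | `--cluster-cidr`, `--service-account-private-key-file` |
| kube-scheduler | Scheduler | `--leader-elect`, `--kubeconfig` |

Example deployment script:

```bash
deploy_control_plane() {
    # Deploy etcd as a systemd service
    cat <<EOF > /etc/systemd/system/etcd.service
[Unit]
Description=etcd
Documentation=https://github.com/coreos/etcd

[Service]
ExecStart=/usr/local/bin/etcd \\
  --name=${NODE_NAME} \\
  --cert-file=/etc/kubernetes/pki/etcd.pem \\
  --key-file=/etc/kubernetes/pki/etcd-key.pem \\
  --peer-cert-file=/etc/kubernetes/pki/etcd.pem \\
  --peer-key-file=/etc/kubernetes/pki/etcd-key.pem \\
  --trusted-ca-file=/etc/kubernetes/pki/ca.pem \\
  --peer-trusted-ca-file=/etc/kubernetes/pki/ca.pem \\
  --initial-advertise-peer-urls=https://${MASTER_IP}:2380 \\
  --listen-peer-urls=https://${MASTER_IP}:2380 \\
  --listen-client-urls=https://${MASTER_IP}:2379,https://127.0.0.1:2379 \\
  --advertise-client-urls=https://${MASTER_IP}:2379 \\
  --initial-cluster-token=etcd-cluster-1 \\
  --initial-cluster=${ETCD_CLUSTER} \\
  --initial-cluster-state=new \\
  --data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

    systemctl daemon-reload
    systemctl enable etcd
    systemctl start etcd
}
```

## Network Plugin Integration

We chose Calico as the network plugin because it performs well and supports network policies:

```bash
deploy_network() {
    kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

    # Wait for the Calico pods to be ready
    while ! kubectl get pods -n kube-system | grep calico-node | grep Running; do
        echo "Waiting for Calico to start..."
        sleep 5
    done
}
```

## Core Add-on Deployment

A production cluster needs the following core add-ons:

- **CoreDNS**: in-cluster service discovery
- **Dashboard**: web management interface
- **Metrics Server**: resource monitoring

```bash
deploy_addons() {
    # CoreDNS
    kubectl apply -f https://storage.googleapis.com/kubernetes-the-hard-way/coredns.yaml

    # Metrics Server
    kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

    # Dashboard
    kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.3.1/aio/deploy/recommended.yaml

    # Create an admin user
    kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
EOF

    kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
EOF
}
```

## Cluster Verification and Testing

Once deployment finishes, the cluster should be verified end to end:

```bash
verify_cluster() {
    # Check node status
    kubectl get nodes

    # Check core component status
    # (componentstatus has been deprecated since Kubernetes v1.19)
    kubectl get componentstatus

    # Deploy a test application
    kubectl create deployment nginx --image=nginx
    kubectl expose deployment nginx --port=80 --type=NodePort

    # Get the access address
    NODE_PORT=$(kubectl get svc nginx -o jsonpath='{.spec.ports[0].nodePort}')
    echo "Test application deployed. It can be reached at:"
    echo "http://<any-node-IP>:${NODE_PORT}"
}
```

## Advanced Features

### Calico Network Policy

The following is a typical network policy. It denies all ingress and egress traffic for every pod in the `production` namespace, which also blocks communication with other namespaces until more specific policies allow it:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
```

### Integration with CI/CD Tools

The cluster deployment script can be integrated into a CI/CD pipeline:

```groovy
pipeline {
    agent any
    stages {
        stage('Deploy K8s cluster') {
            steps {
                sh '''
                    chmod +x deploy-k8s-cluster.sh
                    ./deploy-k8s-cluster.sh
                '''
            }
        }
        stage('Verify cluster') {
            steps {
                sh '''
                    kubectl get nodes
                    kubectl cluster-info
                '''
            }
        }
    }
}
```

## Performance Tuning Recommendations

**etcd tuning**

```
--auto-compaction-retention=1h    # automatic compaction
--quota-backend-bytes=8589934592  # 8 GiB storage limit
```

**kubelet configuration**

```
--max-pods=100                        # maximum number of pods per node
--kube-reserved=cpu=500m,memory=1Gi   # reserve resources for Kubernetes system daemons
```

**API Server parameters**

```
--max-requests-inflight=800
--max-mutating-requests-inflight=400
```

## Troubleshooting Guide

Common problems and how to resolve them:

**Expired certificates**

```bash
# Check the certificate validity period
openssl x509 -noout -dates -in /etc/kubernetes/pki/apiserver.crt

# Renew certificates (applies to kubeadm-managed clusters only;
# certificates issued with cfssl must be re-issued with cfssl)
kubeadm certs renew all
```

**Node is NotReady**

```bash
# Check the kubelet logs
journalctl -u kubelet -f

# Check the network plugin status
kubectl get pods -n kube-system
```

**Pod creation fails**

```bash
# View events
kubectl describe pod <pod-name>

# Check resource quotas
kubectl describe quota
```

## Security Hardening Recommendations

**Enable RBAC**

```
--authorization-mode=Node,RBAC
```

**Pod security policy**

PodSecurityPolicy was deprecated in Kubernetes v1.21 and removed in v1.25, so the manifest below applies only to older clusters; newer clusters use Pod Security Admission instead. The manifest shows only the privilege-related fields; a complete PodSecurityPolicy also requires `seLinux`, `runAsUser`, `supplementalGroups`, and `fsGroup` rules.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false
  allowPrivilegeEscalation: false
```

**Network policy**

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
```

With the automation scripts and best practices in this article, you can deploy a production-ready Kubernetes cluster in a short time, significantly improving operational efficiency and reducing the risk of human error.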
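
One detail worth illustrating is the idempotent design goal listed among the core features. The system initialization step appends hostname entries to `/etc/hosts` with plain `echo`, so every re-run would add duplicate lines. The following is a minimal sketch of one way to guard such appends. The `ensure_line` helper, the temporary file, and the sample IP addresses and hostnames are all hypothetical and are not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): append a line to a file
# only when that exact line is not already present.
ensure_line() {
    local line="$1" file="$2"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo against a temporary file instead of the real /etc/hosts.
# The addresses and hostnames below are made-up sample values.
HOSTS_FILE="$(mktemp)"
ensure_line "192.168.1.10 k8s-master" "$HOSTS_FILE"
ensure_line "192.168.1.10 k8s-master" "$HOSTS_FILE"   # repeated call adds nothing
ensure_line "192.168.1.11 k8s-node1" "$HOSTS_FILE"

LINE_COUNT="$(wc -l < "$HOSTS_FILE" | tr -d ' ')"
echo "entries: ${LINE_COUNT}"
```

In a real run, the same helper could be called as `ensure_line "${MASTER_IP} ${MASTER_HOSTNAME}" /etc/hosts`. Matching the whole line as a fixed string avoids false positives when one IP address or hostname is a prefix of another.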