cat <<EOF > /etc/apt/sources.list.d/kubernetes.list
deb http://mirrors.ustc.edu.cn/kubernetes/apt kubernetes-xenial main
EOF
Update the package index:
apt-get update
But this fails:
Ign:7 http://mirrors.ustc.edu.cn/kubernetes/apt kubernetes-xenial/main amd64 Packages
Get:7 http://mirrors.ustc.edu.cn/kubernetes/apt kubernetes-xenial/main amd64 Packages [31.3 kB]
Err:7 http://mirrors.ustc.edu.cn/kubernetes/apt kubernetes-xenial/main amd64 Packages
  Hash Sum mismatch
Fetched 38.9 kB in 1s (20.2 kB/s)
Reading package lists... Done
E: Failed to fetch http://mirrors.ustc.edu.cn/kubernetes/apt/dists/kubernetes-xenial/main/binary-amd64/Packages.gz  Hash Sum mismatch
E: Some index files failed to download. They have been ignored, or old ones used instead.

A GPG warning also appeared, here from the aliyun mirror:
W: GPG error: https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6A030B21BA07F4FB
W: The repository 'https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial InRelease' is not signed.
N: Data from such a repository can't be authenticated and is therefore potentially dangerous to use.
N: See apt-secure(8) manpage for repository creation and user configuration details.
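The GPG warning names the missing key after NO_PUBKEY. As a minimal sketch, the helper below pulls those key IDs out of saved or piped apt output. The function name `missing_apt_keys` is made up for illustration, and the import command in the comment assumes the key is available from a public keyserver, which may not hold for every mirror.

```shell
#!/usr/bin/env bash
# Print each key ID that apt reported as NO_PUBKEY, one per line, without duplicates.
# Reads apt-get output on stdin. The function name is illustrative, not part of apt.
missing_apt_keys() {
    grep -oE 'NO_PUBKEY [0-9A-Fa-f]+' | awk '{print $2}' | sort -u
}

# Example (not run here):
#   apt-get update 2>&1 | missing_apt_keys
# Each ID could then be imported, for instance with
#   apt-key adv --keyserver keyserver.ubuntu.com --recv-keys <ID>
# The keyserver is an assumption, and apt-key is deprecated on newer releases.
```

Another option is to download the key file published by the mirror itself, if it provides one.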
Querying the packages that k8s requires:
W1214 08:46:14.303158 8461 version.go:101] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get https://dl.k8s.io/release/stable-1.txt: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
W1214 08:46:14.303772 8461 version.go:102] falling back to the local client version: v1.17.0
W1214 08:46:14.304223 8461 validation.go:28] Cannot validate kube-proxy config - no validator is available
W1214 08:46:14.304609 8461 validation.go:28] Cannot validate kubelet config - no validator is available
[ERROR Swap]: running with swap on is not supported. Please disable swap
Cause and fix: running with swap enabled is not supported, so swap must be disabled.
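Turning swap off usually takes two steps: `swapoff -a` for the running system, and commenting out the swap entry in /etc/fstab so it stays off after a reboot. The sketch below covers the second step. The helper name `disable_swap_in_fstab` is hypothetical, and it assumes GNU sed (as on Ubuntu) and a conventional fstab layout.

```shell
#!/usr/bin/env bash
# Comment out active swap entries in an fstab-style file so swap stays off after reboot.
# $1: path to the file (on a real host this would be /etc/fstab).
# A backup copy is written next to it with a .bak suffix.
disable_swap_in_fstab() {
    local fstab="$1"
    cp "$fstab" "$fstab.bak"
    # Prefix "#" to uncommented lines that contain a whitespace-delimited "swap" field.
    sed -i -E '/^[[:space:]]*[^#[:space:]].*[[:space:]]swap[[:space:]]/ s/^/#/' "$fstab"
}

# On the node itself (as root, not run here):
#   swapoff -a                          # turn swap off immediately
#   disable_swap_in_fstab /etc/fstab    # keep it off across reboots
```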
Message:
[ERROR Port-10250]: Port 10250 is in use
The running kubelet has to be stopped: systemctl stop kubelet.
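To confirm the port is occupied before stopping the kubelet, and free afterwards, the listening sockets can be inspected. A small sketch, assuming the `ss -ltn` column layout from iproute2 (local address in the fourth column); the function name `port_in_listen_table` is made up for illustration.

```shell
#!/usr/bin/env bash
# Succeeds if the given TCP port shows up as a listening socket in `ss -ltn`
# output read from stdin. $1: port number.
port_in_listen_table() {
    local port="$1"
    # Skip the header row; the port is the last ":"-separated part of column 4.
    awk -v p="$port" 'NR > 1 { n = split($4, a, ":"); if (a[n] == p) found = 1 }
                      END { exit !found }'
}

# Example (not run here):
#   ss -ltn | port_in_listen_table 10250 && echo "10250 is still in use"
```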
The warning IsDockerSystemdCheck is shown:
[init] Using Kubernetes version: v1.17.0
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR NumCPU]: the number of available CPUs 1 is less than the required 2
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
Cause and fix: Docker is using the cgroupfs driver, which does not match k8s. Check first:
# docker info | grep -i cgroup
Cgroup Driver: cgroupfs    // !!! cgroupfs here
WARNING: No swap limit support
This needs to be changed. Stop docker first:
systemctl stop docker
Edit /etc/docker/daemon.json and add:
"exec-opts": ["native.cgroupdriver=systemd"]
Start docker again:
systemctl start docker
Check the cgroup driver:
# docker info | grep -i cgroup
Cgroup Driver: systemd
It has changed. Note: also modify the kubeadm configuration file:
vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
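The daemon.json edit and the before/after check can be sketched as two small helpers. Both function names are hypothetical. The writer only handles the case where no daemon.json exists yet; merging the key into an existing file is left to manual editing, since doing that safely needs a JSON-aware tool.

```shell
#!/usr/bin/env bash
# Write a minimal Docker daemon.json that selects the systemd cgroup driver.
# $1: target path (on a real host, /etc/docker/daemon.json).
# Refuses to overwrite an existing file.
write_systemd_cgroup_config() {
    local target="$1"
    if [ -e "$target" ]; then
        echo "refusing to overwrite $target; add the exec-opts key by hand" >&2
        return 1
    fi
    cat > "$target" <<'JSON'
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
JSON
}

# Print the driver name found in `docker info` output read from stdin.
cgroup_driver_from_info() {
    sed -n 's/^[[:space:]]*Cgroup Driver:[[:space:]]*//p'
}

# Example (as root, not run here):
#   systemctl stop docker
#   write_systemd_cgroup_config /etc/docker/daemon.json
#   systemctl start docker
#   docker info 2>/dev/null | cgroup_driver_from_info
```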
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR NumCPU]: the number of available CPUs 1 is less than the required 2
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
Cause and fix: at least two CPU cores are required. Give the virtual machine 2 or more cores.
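The same condition can be checked before running kubeadm. A minimal sketch: the function name `has_min_cpus` is made up, and `nproc` (from coreutils) is assumed to be installed.

```shell
#!/usr/bin/env bash
# Succeeds if the CPU count is at least the required minimum.
# $1: minimum (the preflight error above reports a requirement of 2)
# $2: CPU count to test (defaults to what nproc reports on this machine)
has_min_cpus() {
    local min="$1" count="${2:-$(nproc)}"
    [ "$count" -ge "$min" ]
}

# Example (not run here):
#   has_min_cpus 2 || echo "add more cores to this VM before kubeadm init"
```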
At runtime
Check the status:
kubectl get pods -n kube-system
Error:
The connection to the server localhost:8080 was refused - did you specify the right host or port?
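This message usually means kubectl found no kubeconfig and fell back to localhost:8080. On a cluster created with kubeadm, the admin kubeconfig is written to /etc/kubernetes/admin.conf on the control-plane node. A minimal sketch of copying it into place, assuming that situation applies here; the function name `install_kubeconfig` is hypothetical.

```shell
#!/usr/bin/env bash
# Copy a kubeadm admin kubeconfig to the place kubectl reads by default.
# $1: source file (default /etc/kubernetes/admin.conf)
# $2: destination (default $HOME/.kube/config)
install_kubeconfig() {
    local src="${1:-/etc/kubernetes/admin.conf}"
    local dest="${2:-$HOME/.kube/config}"
    [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
    mkdir -p "$(dirname "$dest")"
    cp "$src" "$dest"
    chmod 600 "$dest"   # the file holds cluster-admin credentials
}

# Example (not run here):
#   sudo cat /etc/kubernetes/admin.conf > /tmp/admin.conf && install_kubeconfig /tmp/admin.conf
#   kubectl get pods -n kube-system
```

Setting `KUBECONFIG=/etc/kubernetes/admin.conf` for a root shell is an alternative that avoids the copy.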
[FATAL] plugin/loop: Loop (127.0.0.1:60825 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 7805087528265218508.4857814245207702505."
# kubectl logs kube-flannel-ds-amd64-n55rf -n kube-system
Error from server (BadRequest): container "kube-flannel" in pod "kube-flannel-ds-amd64-n55rf" is waiting to start: PodInitializing
Inspect it with kubectl describe pod:
# kubectl describe pod kube-flannel-ds-amd64-n55rf -n kube-system
...
  Normal   Scheduled  13m                  default-scheduler  Successfully assigned kube-system/kube-flannel-ds-amd64-n55rf to ubuntu
  Normal   Pulling    4m21s (x4 over 13m)  kubelet, ubuntu    Pulling image "quay.io/coreos/flannel:v0.11.0-amd64"
  Warning  Failed     3m6s (x4 over 10m)   kubelet, ubuntu    Failed to pull image "quay.io/coreos/flannel:v0.11.0-amd64": rpc error: code = Unknown desc = context canceled
  Warning  Failed     3m6s (x4 over 10m)   kubelet, ubuntu    Error: ErrImagePull
  Normal   BackOff    2m38s (x7 over 10m)  kubelet, ubuntu    Back-off pulling image "quay.io/coreos/flannel:v0.11.0-amd64"
  Warning  Failed     2m27s (x8 over 10m)  kubelet, ubuntu    Error: ImagePullBackOff
# kubectl logs coredns-6955765f44-4csvn -n kube-system
Error from server (BadRequest): container "coredns" in pod "coredns-6955765f44-r96qk" is waiting to start: ContainerCreating
# kubectl describe pod coredns-6955765f44-4csvn -n kube-system
Name:                 coredns-6955765f44-r96qk
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 ubuntu/192.168.0.102
Start Time:           Sun, 15 Dec 2019 22:45:15 +0800
Labels:               k8s-app=kube-dns
                      pod-template-hash=6955765f44
Annotations:          <none>
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/coredns-6955765f44
Containers:
  coredns:
    Container ID:
    Image:         k8s.gcr.io/coredns:1.6.5
    Image ID:
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-qq7qf (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-qq7qf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-qq7qf
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                    From               Message
  ----     ------                  ----                   ----               -------
  Warning  FailedScheduling        7m21s (x3 over 8m32s)  default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Normal   Scheduled               6m55s                  default-scheduler  Successfully assigned kube-system/coredns-6955765f44-r96qk to ubuntu
  Warning  FailedCreatePodSandBox  6m52s                  kubelet, ubuntu    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "9a2d45536097d22cc6b10f338b47f1789869f45f4b12f8a202aa898295dc80a4" network for pod "coredns-6955765f44-r96qk": networkPlugin cni failed to set up pod "coredns-6955765f44-r96qk_kube-system" network: failed to set bridge addr: "cni0" already has an IP address different from 10.244.0.1/24
After installing flannel, delete the failing pod:
kubectl delete pod coredns-6955765f44-4csvn -n kube-system
A new pod is started automatically, but the problem remains. ifconfig shows that a cni0 interface exists. A fix found online:
# Run on the nodes other than the master
kubeadm reset
systemctl stop kubelet
systemctl stop docker
rm -rf /var/lib/cni/
rm -rf /var/lib/kubelet/*
rm -rf /etc/cni/
ifconfig cni0 down
ifconfig flannel.1 down
ifconfig docker0 down
ip link delete cni0
ip link delete flannel.1
## Restart kubelet
systemctl restart kubelet
## Restart docker
systemctl restart docker
Tried it. It failed!
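Before deleting interfaces, the mismatch reported in the sandbox error can be confirmed by comparing the address on cni0 with the subnet flannel assigned to the node. A minimal sketch, assuming flannel has written its usual /run/flannel/subnet.env file with a FLANNEL_SUBNET entry; both function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Print the FLANNEL_SUBNET value from a flannel subnet.env file.
# $1: path to the file (normally /run/flannel/subnet.env on the node).
flannel_subnet() {
    sed -n 's/^FLANNEL_SUBNET=//p' "$1"
}

# Print the first IPv4 address/prefix found in `ip -o -4 addr show dev <iface>`
# output read from stdin.
iface_ipv4() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "inet") { print $(i + 1); exit } }'
}

# Example (not run here):
#   want=$(flannel_subnet /run/flannel/subnet.env)
#   have=$(ip -o -4 addr show dev cni0 | iface_ipv4)
#   [ "$want" = "$have" ] || echo "cni0 has $have but flannel expects $want"
```

If the two values differ, the bridge is most likely left over from an earlier deployment with a different pod subnet.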
Messages from another deployment attempt:
  Warning  FailedScheduling        77s (x5 over 5m53s)  default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Normal   Scheduled               76s                  default-scheduler  Successfully assigned kube-system/coredns-9d85f5447-4jwf2 to ubuntu
  Warning  FailedCreatePodSandBox  73s                  kubelet, ubuntu    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "5c109baa51b8d97e75c6b35edf108ca4f2f56680b629140c8b477b9a8a03d97c" network for pod "coredns-9d85f5447-4jwf2": networkPlugin cni failed to set up pod "coredns-9d85f5447-4jwf2_kube-system" network: open /run/flannel/subnet.env: no such file or directory
  Warning  FailedCreatePodSandBox  71s                  kubelet, ubuntu    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "3f8c5b704fb1dc4584a2903b2ecff329e717e5c2558c9f761501fab909d32133" network for pod "coredns-9d85f5447-4jwf2": networkPlugin cni failed to set up pod "coredns-9d85f5447-4jwf2_kube-system" network: open /run/flannel/subnet.env: no such file or directory
  Normal   SandboxChanged          70s (x2 over 72s)    kubelet, ubuntu    Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled                  29s (x4 over 69s)    kubelet, ubuntu    Container image "registry.aliyuncs.com/google_containers/coredns:1.6.5" already present on machine
  Normal   Created                 29s (x4 over 69s)    kubelet, ubuntu    Created container coredns
  Normal   Started                 29s (x4 over 69s)    kubelet, ubuntu    Started container coredns
  Warning  BackOff                 10s (x9 over 67s)    kubelet, ubuntu    Back-off restarting failed container

The same pod about an hour later:
  Warning  FailedScheduling        56m (x5 over 60m)    default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Normal   Scheduled               56m                  default-scheduler  Successfully assigned kube-system/coredns-9d85f5447-4jwf2 to ubuntu
  Warning  FailedCreatePodSandBox  56m                  kubelet, ubuntu    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "5c109baa51b8d97e75c6b35edf108ca4f2f56680b629140c8b477b9a8a03d97c" network for pod "coredns-9d85f5447-4jwf2": networkPlugin cni failed to set up pod "coredns-9d85f5447-4jwf2_kube-system" network: open /run/flannel/subnet.env: no such file or directory
  Warning  FailedCreatePodSandBox  55m                  kubelet, ubuntu    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "3f8c5b704fb1dc4584a2903b2ecff329e717e5c2558c9f761501fab909d32133" network for pod "coredns-9d85f5447-4jwf2": networkPlugin cni failed to set up pod "coredns-9d85f5447-4jwf2_kube-system" network: open /run/flannel/subnet.env: no such file or directory
  Normal   SandboxChanged          55m (x2 over 55m)    kubelet, ubuntu    Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled                  55m (x4 over 55m)    kubelet, ubuntu    Container image "registry.aliyuncs.com/google_containers/coredns:1.6.5" already present on machine
  Normal   Created                 55m (x4 over 55m)    kubelet, ubuntu    Created container coredns
  Normal   Started                 55m (x4 over 55m)    kubelet, ubuntu    Started container coredns
  Warning  BackOff                 59s (x270 over 55m)  kubelet, ubuntu    Back-off restarting failed container
Log output:
.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.5
linux/amd64, go1.13.4, c2fd1b2
[FATAL] plugin/loop: Loop (127.0.0.1:48100 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 639535139534040434.6569166625322327450."
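The troubleshooting page linked in the FATAL line describes a common cause: CoreDNS forwards queries to a resolv.conf that points at a loopback address, so the queries come straight back to it. On hosts running systemd-resolved, /etc/resolv.conf typically lists 127.0.0.53. A minimal sketch of a check for that condition follows. The function name `has_loopback_nameserver` is hypothetical, and whether this is the cause on this particular node is an assumption.

```shell
#!/usr/bin/env bash
# Succeeds if a resolv.conf-style file names a loopback resolver (127.x.x.x or ::1).
# $1: path to the file, e.g. /etc/resolv.conf on the node.
has_loopback_nameserver() {
    grep -Eq '^[[:space:]]*nameserver[[:space:]]+(127\.[0-9]+\.[0-9]+\.[0-9]+|::1)[[:space:]]*$' "$1"
}

# Example (not run here):
#   has_loopback_nameserver /etc/resolv.conf && echo "loopback resolver: CoreDNS may loop"
```

If the check succeeds, one commonly suggested remedy is to point the kubelet at the real upstream list with its `--resolv-conf` option (on systemd-resolved hosts that file is usually /run/systemd/resolve/resolv.conf) and then recreate the CoreDNS pods.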
[preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.17" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
Cause and fix: my guess is that the node's hostname being identical to the master's causes this, but I have no evidence for it.
TLS timeout
When running kubectl apply -f xxx:
Unable to connect to the server: net/http: TLS handshake timeout
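Possible cause: too little memory allocated to the master; increasing it should help. (I raised it to 4 GB and the error persisted; after one reboot everything was normal.)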
Collected from the web
WARNING FileExisting-socat
socat is a network tool that k8s uses to exchange data with pods. If this warning appears, simply install socat:
apt-get install socat
Worker node fails to join
After running kubeadm join on a worker node, a timeout error is returned:
root@worker2:~# kubeadm join 192.168.56.11:6443 --token wbryr0.am1n476fgjsno6wa --discovery-token-ca-cert-hash sha256:7640582747efefe7c2d537655e428faa6275dbaff631de37822eb8fd4c054807
[preflight] Running pre-flight checks
error execution phase preflight: couldn't validate the identity of the API Server: abort connecting to API servers after timeout of 5m0s
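A reasonable first diagnostic for this timeout is to check from the worker whether the API server address and port in the join command are reachable at all. A minimal sketch using bash's /dev/tcp redirection. The function name `tcp_reachable` is made up, and `timeout` from coreutils is assumed to be available. A successful connection only rules out basic network or firewall problems; an expired or mistyped token remains a separate possibility.

```shell
#!/usr/bin/env bash
# Succeeds if a TCP connection to host:port can be opened within the timeout.
# Relies on bash's /dev/tcp redirection, so it needs bash rather than plain sh.
# $1: host, $2: port, $3: timeout in seconds (default 3)
tcp_reachable() {
    local host="$1" port="$2" secs="${3:-3}"
    # The inner bash receives host as $0 and port as $1.
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example, using the endpoint from the join command above (not run here):
#   tcp_reachable 192.168.56.11 6443 || echo "cannot reach the API server from this node"
```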