Deploying KubeEdge on an ARM Platform

This article documents the process of deploying KubeEdge to an armv7l platform.

Docker environment

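The ARM system needs a working Docker environment, which in turn requires adding the relevant driver support. The process is fairly involved and is not covered here.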

Cross-compilation

Only the edge side needs to be compiled.
Install the cross compiler:

sudo apt-get install gcc-arm-linux-gnueabihf

Set the environment variables and build:

export GOARCH=arm
export GOOS="linux"
export GOARM=7
export CGO_ENABLED=1
export CC=arm-linux-gnueabihf-gcc
export GO111MODULE=off
make all WHAT=edgecore

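Note: if the build machine has too little memory, the Go build fails with a Killed error. (Memory usage observed during the build peaked at about 2 GB, which is also why the project advises against building on the target board.)
Alternatively, the non-hf version of the cross compiler can be used: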

sudo apt-get install gcc-arm-linux-gnueabi
export CC=arm-linux-gnueabi-gcc
# everything else is the same

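Note: the cross compiler should ideally be the same version as the toolchain used to build the target board's system; if it is not, the result needs to be tested. The target board here was built with gcc 8, yet binaries built with the compiler packaged by Ubuntu also run on it.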

Deployment

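The deployment procedure is the same as on x86.
Note, however, that Docker needs to be adjusted and the container settings in edge.yaml need to be modified. The main point is the node name: wherever it appears (url, node-id, hostname-override, and so on) it must be identical, and it must differ from every other node in the cluster. In addition, podsandbox-image must use the ARM version.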

Check from the cloud side:

# kubectl get nodes                
NAME STATUS ROLES AGE VERSION
edge-node-arm Ready edge 46h v1.15.3-kubeedge-v1.1.0-beta.0.323+52dd841358b292
ubuntu Ready master 7d23h v1.17.0

On the edge side:

# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
70fa72886761 nginx "nginx -g 'daemon ..." 23 seconds ago Up 17 seconds k8s_nginx_nginx-deployment-77698bff7d-q5shs_default_236ddc3e-17de-4d61-92bc-4dd86d9dad92_0
4a4013a1a488 kubeedge/pause-arm:3.1 "/pause" 3 minutes ago Up 3 minutes 0.0.0.0:80->80/tcp k8s_POD_nginx-deployment-77698bff7d-q5shs_default_236ddc3e-17de-4d61-92bc-4dd86d9dad92_0

Problems encountered

Official ARM build

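The officially built ARM release does not run on the board; it crashes with a core dump. This looks like a compatibility problem with the compiler or the linked library versions, so it is necessary to cross-compile it yourself.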

Cross-compilation

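By default GO111MODULE=auto, so the build tries to download dependency packages; some of them cannot be downloaded and the build fails.
With it turned off, the build succeeds. (Open question: no dependencies appear to be downloaded in this case, yet the build still passes. The reason is not confirmed; one likely explanation is that the repository vendors its dependencies, as the kubeedge/vendor/ paths in the stack traces further down suggest.)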

Runtime problems

Build of Dec 31:
Output on the edge side:

RoundRobin.
W1231 17:01:33.736283 625 proxy.go:78] [L4 Proxy] create Device is failed : operation not supported

Cause and fix:
The log suggests a permission problem, but edgecore was already running as root. See the fix further down.

Building from the master branch as of Jan 6 produces the following error:

[L4 Proxy] Add ip is failed : [L4 Proxy] Device edge0 is not exist!! please checkout the env
panic: runtime error: index out of range

goroutine 119 [running]:
github.com/kubeedge/kubeedge/edgemesh/pkg/proxy.addServer(0x3f61120, 0x14, 0x427e300, 0x2, 0x2)
/home/ubuntu/kubeedge/src/github.com/kubeedge/kubeedge/edgemesh/pkg/proxy/proxy.go:388 +0x8f4
github.com/kubeedge/kubeedge/edgemesh/pkg/proxy.updateServer(0x3f61120, 0x14, 0x427e300, 0x2, 0x2, 0x427e308, 0x0)
/home/ubuntu/kubeedge/src/github.com/kubeedge/kubeedge/edgemesh/pkg/proxy/proxy.go:443 +0x508
github.com/kubeedge/kubeedge/edgemesh/pkg/proxy.MsgProcess(0x3e96ba0, 0x24, 0x0, 0x0, 0x7a370c50, 0x16f, 0x0, 0x3ef3b50, 0xe, 0x3ef3b60, ...)
/home/ubuntu/kubeedge/src/github.com/kubeedge/kubeedge/edgemesh/pkg/proxy/proxy.go:359 +0x520
github.com/kubeedge/kubeedge/edgemesh/pkg.(*EdgeMesh).Start(0x3b0f934)
/home/ubuntu/kubeedge/src/github.com/kubeedge/kubeedge/edgemesh/pkg/module.go:51 +0x190
created by github.com/kubeedge/kubeedge/vendor/github.com/kubeedge/beehive/pkg/core.StartModules
/home/ubuntu/kubeedge/src/github.com/kubeedge/kubeedge/vendor/github.com/kubeedge/beehive/pkg/core/core.go:23 +0x11c

Cause and fix:
Looking for the edge0 device on the Ubuntu edge node gives:

# find / -name "edge0"
/sys/devices/virtual/net/edge0
/sys/class/net/edge0

The guess is that the error comes from the edge0 device not being created on the ARM board.
The out-of-range panic is in addServer in proxy.go:

	} else {
		if len(ports) == 0 {
			return
		}
		if len(unused) == 0 {
			expandIpPool()
		}
		ip = unused[0]
		unused = unused[1:]
	}

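The panic occurs because the unused slice has length 0, so indexing it goes out of range. The fix is to read from it only when its length is non-zero.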
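The guard described above can be sketched as follows. This is a minimal, self-contained illustration and not the actual KubeEdge patch: the ipPool type, the take method, the errPoolEmpty error and the example address are all hypothetical names introduced here.

```go
package main

import (
	"errors"
	"fmt"
)

// errPoolEmpty is a hypothetical error returned when no unused address is left.
var errPoolEmpty = errors.New("ip pool is empty")

// ipPool is a hypothetical stand-in for the proxy's list of unused addresses.
type ipPool struct {
	unused []string
}

// take pops the first unused address, checking the length before indexing
// so that an empty pool yields an error instead of an index-out-of-range panic.
func (p *ipPool) take() (string, error) {
	if len(p.unused) == 0 {
		return "", errPoolEmpty
	}
	ip := p.unused[0]
	p.unused = p.unused[1:]
	return ip, nil
}

func main() {
	pool := &ipPool{unused: []string{"10.0.0.1"}} // example address only
	for i := 0; i < 2; i++ {
		ip, err := pool.take()
		if err != nil {
			fmt.Println("take failed:", err)
			continue
		}
		fmt.Println("allocated", ip)
	}
}
```

Returning an error lets the caller decide whether to retry or skip the service, rather than crashing the whole edgecore process.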
After this change the out-of-range panic is gone, but there is still an error:

I0106 22:44:28.682395    3562 generic.go:81] GenericLifecycle: Relisting
W0106 22:44:29.026038 3562 proxy.go:76] [L4 Proxy] create Device is failed : operation not supported
I0106 22:44:29.689864 3562 generic.go:81] GenericLifecycle: Relisting
I0106 22:44:30.697860 3562 generic.go:81] GenericLifecycle: Relisting
W0106 22:44:31.068954 3562 proxy.go:76] [L4 Proxy] create Device is failed : operation not supported
I0106 22:44:31.705060 3562 generic.go:81] GenericLifecycle: Relisting
I0106 22:44:32.724408 3562 generic.go:81] GenericLifecycle: Relisting
W0106 22:44:33.117475 3562 proxy.go:76] [L4 Proxy] create Device is failed : operation not supported
I0106 22:44:33.307057 3562 communicate.go:148] CheckConfirm

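The cause is the same as before: what looks like a permission problem is the virtual network device edge0 failing to be created.
The node is nevertheless in the Ready state, so something else is probably still wrong: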

# kubectl get nodes
NAME STATUS ROLES AGE VERSION
edge-node Ready edge 6d5h v1.15.3-kubeedge-v1.1.0-beta.0.323+52dd841358b292
edge-node-arm Ready edge 6h6m v1.15.3-kubeedge-v1.1.0-beta.0.323+52dd841358b292-dirty
ubuntu Ready master 6d7h v1.17.0

Log analysis:

cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d

hostport_manager.go:68] The binary conntrack is not installed, this can cause failures in network connection cleanup.
--> http://conntrack-tools.netfilter.org/downloads.html

container_manager_linux.go:295] Creating device plugin manager: false
cpu_manager.go:135] [cpumanager] Unknown policy "", falling back to default policy "none"

csi_plugin.go:222] kubernetes.io/csi: kubeclient not set, assuming standalone kubelet

plugin_watcher.go:81] failed to traverse deprecated plugin socket path "/var/lib/edged/plugins", err: error accessing path: /var/lib/edged/plugins error: lstat /var/lib/edged/plugins: no such file or directory

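Analysis of the vishvananda source code:
/dev/net/tun does not exist on the board, whereas on Ubuntu it does: crw-rw-rw- 1 root root 10, 200 Jan 6 16:25 /dev/net/tun
Cause: CONFIG_TUN is not enabled in the kernel configuration. The relevant menu entries are: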

Device Drivers  --->   
[*] Network device support --->
--- Network device support
[*] Network core driver support
<M> Bonding driver support
<M> Dummy net driver support
<M> EQL (serial line load balancing) support
<M> Ethernet team driver support ---> // note: everything under it is set to M
<M> MAC-VLAN support
<M> MAC-VLAN based tap driver
<M> Virtual eXtensible Local Area Network (VXLAN)
<M> Generic Network Virtualization Encapsulation
<M> GPRS Tunneling Protocol datapath (GTP-U)
<M> IEEE 802.1AE MAC-level encryption (MACsec)
<M> Network console logging support
[*] Dynamic reconfiguration of logging targets
<*> Universal TUN/TAP device driver support // !! this is CONFIG_TUN
[ ] Support for cross-endian vnet headers on little-endian kernels
<*> Virtual ethernet pair device
<M> Virtual netlink monitoring device

3.17, segmentation fault:

I0130 05:44:31.614058    9380 client.go:143] finish hub-client pub
I0130 05:44:31.659321 9380 eventbus.go:61] Init Sub And Pub Client for externel mqtt broker tcp://127.0.0.1:1883 successfully
I0130 05:44:31.658978 9380 client.go:86] edge-hub-cli subscribe topic to $hw/events/device/+/twin/+
I0130 05:44:31.693166 9380 client.go:86] edge-hub-cli subscribe topic to $hw/events/node/+/membership/get
I0130 05:44:31.739451 9380 client.go:86] edge-hub-cli subscribe topic to SYS/dis/upload_records
I0130 05:44:31.781856 9380 proxy.go:92] [L4 Proxy] proxy is running now
I0130 05:44:31.868529 9380 cpu_manager.go:173] [cpumanager] starting with none policy
I0130 05:44:31.868783 9380 cpu_manager.go:174] [cpumanager] reconciling every 0s
I0130 05:44:31.868938 9380 policy_none.go:43] [cpumanager] none policy: Start
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x14 pc=0x2050dd4]

goroutine 93 [running]:
github.com/kubeedge/kubeedge/vendor/k8s.io/kubernetes/pkg/kubelet/cm.(*containerManagerImpl).enforceNodeAllocatableCgroups(0x44c23c0, 0x6, 0x0)
/home/ubuntu/kubeedge/src/github.com/kubeedge/kubeedge/vendor/k8s.io/kubernetes/pkg/kubelet/cm/node_container_manager_linux.go:78 +0x144
github.com/kubeedge/kubeedge/vendor/k8s.io/kubernetes/pkg/kubelet/cm.(*containerManagerImpl).setupNode(0x44c23c0, 0x47165d0, 0x0, 0x11)
/home/ubuntu/kubeedge/src/github.com/kubeedge/kubeedge/vendor/k8s.io/kubernetes/pkg/kubelet/cm/container_manager_linux.go:452 +0xa8
github.com/kubeedge/kubeedge/vendor/k8s.io/kubernetes/pkg/kubelet/cm.(*containerManagerImpl).Start(0x44c23c0, 0x0, 0x47165d0, 0x2c64348, 0x4144500, 0xa5334f80, 0x4827c80, 0x2cb2be8, 0x4664720, 0x46cf960, ...)
/home/ubuntu/kubeedge/src/github.com/kubeedge/kubeedge/vendor/k8s.io/kubernetes/pkg/kubelet/cm/container_manager_linux.go:600 +0xd4
github.com/kubeedge/kubeedge/edge/pkg/edged.(*edged).initializeModules(0x44dc600, 0x4827c80, 0x461c440)
/home/ubuntu/kubeedge/src/github.com/kubeedge/kubeedge/edge/pkg/edged/edged.go:564 +0xf4
github.com/kubeedge/kubeedge/edge/pkg/edged.(*edged).Start(0x44dc600)
/home/ubuntu/kubeedge/src/github.com/kubeedge/kubeedge/edge/pkg/edged/edged.go:264 +0x1f8
created by github.com/kubeedge/kubeedge/vendor/github.com/kubeedge/beehive/pkg/core.StartModules
/home/ubuntu/kubeedge/src/github.com/kubeedge/kubeedge/vendor/github.com/kubeedge/beehive/pkg/core/core.go:23 +0x11c

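Fix: add a default gateway.
Without a default gateway, the output of edgecore --minconfig contains no IP address, and the segmentation fault occurs. The program obtains its IP through the default gateway; when there is none, no IP can be obtained and the node struct is left empty (nil), but the code does not check for this, hence the crash.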

E0129 00:02:12.582898     520 edged_status.go:371] register node failed, error: <nil>

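Cause: the host name is invalid. It may contain only lowercase letters and digits, plus the characters - and . (underscores are not allowed either), and it must start and end with a lowercase letter or digit. (Note: this is the Kubernetes DNS naming convention.)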
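The naming rule can be checked up front with a regular expression. The sketch below assumes the DNS subdomain form of the rule (lowercase alphanumerics, - and ., alphanumeric at both ends of each label, at most 253 characters); isValidNodeName is a hypothetical helper for illustration and is not part of KubeEdge.

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123Subdomain matches dot-separated labels made of lowercase
// alphanumerics and '-', each starting and ending with an alphanumeric.
var dns1123Subdomain = regexp.MustCompile(
	`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// isValidNodeName reports whether name satisfies the pattern above and
// the 253-character length limit.
func isValidNodeName(name string) bool {
	return len(name) <= 253 && dns1123Subdomain.MatchString(name)
}

func main() {
	for _, name := range []string{"edge-node-arm", "edge_node_arm", "Edge-Node"} {
		fmt.Printf("%q valid: %v\n", name, isValidNodeName(name))
	}
}
```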

References