迟思堂随笔

My Docker Notes 27: Testing SQLite in Containers

2020-06-17T16:25:05.000Z

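Requirements:
SQLite is widely used on embedded platforms. This article tests it inside containers, using Node.js and Python environments. The main goal is to see how SQLite behaves when run in a container and how the data can be shared. The test code comes from the internet, with modifications.

Node.js environment

Create a project directory.
Run the container: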

sudo docker run -itd --rm --name node -v $PWD:/home/node node:alpine sh

sudo docker exec -it node sh

The container's Node.js version is 14.2.0. Install the same version on the host:

sudo npm i -g n
sudo n 14.2.0

Install the sqlite3 module:

sudo npm install sqlite3

Core code for writing the database:

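/*
Note: sqlite3 calls are asynchronous. Without db.serialize(), the insert can
run before the table has been created and fail because the user table does
not exist.
*/
var sqlite3 = require('sqlite3').verbose();

var db = new sqlite3.Database("db.db", function(err) {
  if (err) throw err;
});

console.log("db: ", db);

// serialize() runs the queued statements one after another.
db.serialize(function() {
  db.run(`create table IF NOT EXISTS user (id INT,name VARCHAR,password VARCHAR)`, function(err) {
    if (err) throw err;
    console.log("Create Table Success!");
  });

  // Run Insert Data
  db.run(`insert into user values (666,'admin','admin')`, function(err) {
    if (err) throw err;
    console.log("Insert Data Success!");
  });
});

db.close(function(err) {
  if (err) throw err;
});

Note: sqlite3 statements are queued asynchronously, so without db.serialize() the insert may run before the table exists and report that the user table does not exist (running the script again then succeeds). This is a demonstration only, not practical guidance.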

Core code for querying the database:

var sqlite3 = require('sqlite3').verbose();

var db;
db = new sqlite3.Database("db.db", function(err) {
  if (err) throw err;
});

function show() {
  console.log("inside....");
  db.all("select * from user", function(err, rows) {
    if (err) throw err;
    console.log(rows);
  });

  setTimeout(show, 1000);
}

show();

/*
db.close(function(err) {
  if (err) throw err;
});
*/

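Explanation: the script queries the database once per second and prints the rows.

Test results:
Host writes and reads the database: passed.
Host writes the database, container reads it: passed.
Container writes the database, host queries it: failed. Running the write script in the container ends with "Segmentation fault". Running it in a directory that lives purely inside the container gives the same result.

Node.js ARM environment

This part only describes how the environment is set up. Requirement: a Node.js container for the ARM platform that includes koa and sqlite3.
The ARM container is run on an x86 host.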

docker pull arm32v7/node:10-slim

docker run -itd --rm -v $PWD:/home/node -p 3000:3000 --name nodejsbuild arm32v7/node:10-slim sh  
docker exec -it nodejsbuild sh

Install:

npm install sqlite3

The install fails:

npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! sqlite3@4.2.0 install: `node-pre-gyp install --fallback-to-build`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the sqlite3@4.2.0 install script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

Analysis:

node-pre-gyp WARN Using needle for node-pre-gyp https download 
node-pre-gyp WARN Tried to download(403): https://mapbox-node-binary.s3.amazonaws.com/sqlite3/v4.2.0/node-v64-linux-arm.tar.gz 
node-pre-gyp WARN Pre-built binaries not found for sqlite3@4.2.0 and node@10.20.1 (node-v64 ABI, glibc) (falling back to source compile with node-gyp)  // !! no prebuilt binary for ARM; must build from source
gyp ERR! find Python   // !! Python is not installed
gyp ERR! find Python Python is not set from command line or npm configuration
gyp ERR! find Python Python is not set from environment variable PYTHON

gyp ERR! build error 
gyp ERR! stack Error: not found: make // !! build tools are not installed

Fix:

apt-get update
apt-get install python3
ln -s /usr/bin/python3 /usr/bin/python  # create the python symlink
apt-get install build-essential

Build memo:

cd /home/latelee/nodejs/node_arm/node
docker build -t nodejsapp .

docker tag nodejsapp registry.cn-hangzhou.aliyuncs.com/latelee/nodejsapp:armweb
docker push registry.cn-hangzhou.aliyuncs.com/latelee/nodejsapp:armweb

docker run -itd --rm -p 9000:3000 --name nodejsapp1 nodejsapp

docker run -itd --rm -p 3000:3000 --name nodejsapp1 registry.cn-hangzhou.aliyuncs.com/latelee/nodejsapp:armweb

docker load -i nodejsapp.img 
docker run -itd --rm -p 3000:3000 -v /mnt/data:/mnt/data --name nodejsapp1 registry.cn-hangzhou.aliyuncs.com/latelee/nodejsapp:armweb


docker run -itd --rm -p 3000:3000 -v /mnt/aaa/nodejsapp/node:/home/node -v /mnt/data:/mnt/data --name  nodejs nodebase

docker run -itd --rm -p 3000:3000 -v /mnt/data:/mnt/data --name nodejsapp nodejsapp

docker run -itd --rm -p 9000:3000 -v $PWD/data:/mnt/data --name nodejsapp nodejsapp
(Note: port 3000 is published so that other hosts can reach the service; /mnt/data is mounted so that the database is accessible.)

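Python environment

Run the container:

docker run -itd --rm --name python -v $PWD:/home/python python:3.5-slim-stretch sh
docker exec -it python sh

Note: this image already includes the sqlite3 library, so nothing extra needs to be installed.

Core code for writing the database: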

import sqlite3

conn = sqlite3.connect("db.db")
cursor = conn.cursor()
cursor.execute("insert into user values (777,\"python11\",\"adminpython\")")

cursor.close()
conn.commit()
conn.close()
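
The write script above assumes the user table already exists (in this article it is created by the Node.js script). A minimal sketch, assuming the same table layout, that creates the table when it is missing and binds the values as parameters instead of escaping them by hand. The helper name insert_user is hypothetical:

```python
import sqlite3

def insert_user(db_path, user_id, name, password):
    """Insert one row into the user table, creating the table if needed."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "create table if not exists user (id INT, name VARCHAR, password VARCHAR)"
        )
        # "?" placeholders let sqlite3 handle the quoting of the values.
        conn.execute("insert into user values (?, ?, ?)", (user_id, name, password))
        conn.commit()
    finally:
        conn.close()
```

Calling insert_user("db.db", 777, "python11", "adminpython") would then write the same row as the script above.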

Query the database:

import sqlite3

conn = sqlite3.connect("db.db")
cursor = conn.cursor()
cursor.execute("select * from user")
values = cursor.fetchall()
print(values)

cursor.close()
conn.commit()
conn.close()

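Results:
Same as above, except that writing the database from inside the container works.
In addition, writing the database from the Python container and querying it from the Node.js container works correctly.

Summary

SQLite needs nothing more than the database file, which makes it a good fit for small systems. In these tests the Node.js container could not write the database (including creating tables).
As long as they use the same database file, different containers can operate on the same database.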
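
The cross-container result comes down to independent processes opening the same file. A small sketch of that idea, assuming a temporary directory stands in for the bind-mounted project directory and two separate connections stand in for the two containers. It does not reproduce the container setup itself:

```python
import os
import sqlite3
import tempfile

def write_then_read(db_path):
    """Write through one connection, then read back through another."""
    writer = sqlite3.connect(db_path)   # plays the role of the writing container
    writer.execute("create table if not exists user (id INT, name VARCHAR, password VARCHAR)")
    writer.execute("insert into user values (?, ?, ?)", (777, "python11", "adminpython"))
    writer.commit()                     # committed rows are visible to other connections
    writer.close()

    reader = sqlite3.connect(db_path)   # plays the role of the reading container
    rows = reader.execute("select * from user").fetchall()
    reader.close()
    return rows

shared_dir = tempfile.mkdtemp()         # stand-in for the directory mounted with -v
rows = write_then_read(os.path.join(shared_dir, "db.db"))
print(rows)
```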

My Docker Notes 26: Building a python-pandas Image for ARM

2020-06-17T16:24:24.000Z

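Requirements:
Build a Python image for the ARM (linux_armv7l) platform for machine learning experiments, including numpy, pandas, sklearn and so on.
The build host runs Ubuntu 16.04 64-bit (4 GB, dual core). Dependencies are installed interactively inside a container rather than through a Dockerfile, because a real build is likely to run into all sorts of problems along the way.

Technical summary

Run the ARM image as a container on a PC, using arm32v7/python. This is for convenience; the image could also be built directly on an ARM system.
The image tag is slim, which is based on Debian buster. The container shell has no command completion and no command history, which makes it somewhat awkward to use.
Install the build tools and libraries, because some Python libraries must be compiled locally (as far as I could find, there are no prebuilt packages for this platform).
Install numpy and the other libraries. Because there are no official prebuilt packages, they are compiled locally, which takes a long time.
pip also installs the required dependency packages.
Write a program to verify the result (omitted here).
The mirrors in China are used to speed up downloads. Build time depends on the machine's performance.

Key points

Running ARM containers on x86.
Compiling and installing Python libraries from scratch.
Turning a container into an image.

Steps

Run the container

Set up the environment for running ARM containers on the PC:

docker run --rm --privileged multiarch/qemu-user-static --reset -p yes

Note: in my tests, the Ubuntu kernel has to be 4.8 or later.

Run the base image:

docker run -itd --name pythonslim arm32v7/python:3.7-slim sh

In the commands below, installation and testing are done inside the container; anything Docker-related is done on the host. The reader is assumed to be able to tell them apart.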

Add mirrors

Add a Debian mirror in China by writing this file:

cat > /etc/apt/sources.list <<-EOF
deb http://mirrors.aliyun.com/debian/ buster main non-free contrib
deb-src http://mirrors.aliyun.com/debian/ buster main non-free contrib
deb http://mirrors.aliyun.com/debian-security buster/updates main
deb-src http://mirrors.aliyun.com/debian-security buster/updates main
deb http://mirrors.aliyun.com/debian/ buster-updates main non-free contrib
deb-src http://mirrors.aliyun.com/debian/ buster-updates main non-free contrib
deb http://mirrors.aliyun.com/debian/ buster-backports main non-free contrib
deb-src http://mirrors.aliyun.com/debian/ buster-backports main non-free contrib
EOF

The original content was:

# deb http://snapshot.debian.org/archive/debian/20200414T000000Z buster main
deb http://deb.debian.org/debian buster main
# deb http://snapshot.debian.org/archive/debian-security/20200414T000000Z buster/updates main
deb http://security.debian.org/debian-security buster/updates main
# deb http://snapshot.debian.org/archive/debian/20200414T000000Z buster-updates main
deb http://deb.debian.org/debian buster-updates main

Add a pip mirror in China:

mkdir -p ~/.pip/
cat > ~/.pip/pip.conf <<-EOF
[global]
index-url = https://pypi.tuna.tsinghua.edu.cn/simple
[install]
trusted-host = pypi.tuna.tsinghua.edu.cn
EOF

Note 1: the slim image has no vi editor, hence the heredoc.
Note 2: alternatively, pass -i to pip install to specify the index for a single command.

Install the build environment

apt-get install gcc g++ gfortran python-dev libopenblas-dev libblas-dev liblapack-dev cython -y

apt-get install libfreetype6-dev libpng-dev -y
apt-get install pkg-config -y  # Note: this tool is needed to locate freetype
apt-get install libfontconfig1-dev -y

Install the packages

pip install numpy==1.18.1 -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install pandas==0.23.4 -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install scipy==1.4.1 -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install Cython -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install sklearn -i https://pypi.tuna.tsinghua.edu.cn/simple   # Note: depends on scipy and Cython

pip install six -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install xlrd -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install pyparsing -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install python-dateutil -i https://pypi.tuna.tsinghua.edu.cn/simple
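pip install matplotlib==3.2.1 -i https://pypi.tuna.tsinghua.edu.cn/simple  # Note: needs freetype; hold off until freetype is installed
pip install pyhht -i https://pypi.tuna.tsinghua.edu.cn/simple  # Note: needs scipy and matplotlib

Note 1: use pip list to see the installed libraries and their versions.
Note 2: installing (compiling) numpy, pandas, scipy, sklearn and the like is slow; each package can take hours (the slim container has no time command, so the exact duration is unknown).

Installed packages

Packages installed in this container: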

# pip list
Package         Version     
--------------- ------------
cycler          0.10.0      
Cython          0.29.16     
freetype-py     2.1.0.post1 
joblib          0.14.1      
kiwisolver      1.2.0       
matplotlib      3.2.1       
numpy           1.18.1      
pandas          0.23.4      
pip             20.0.2      
pyhht           0.1.0       
pyparsing       2.4.7       
python-dateutil 2.8.1       
pytz            2019.3      
scikit-learn    0.22.2.post1
scipy           1.4.1       
setuptools      46.1.3      
six             1.14.0      
sklearn         0.0         
wheel           0.34.2      
xlrd            1.2.0

Verification

# python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets,linear_model

If there is no error output, the installation succeeded.
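
The same check can be scripted so that it reports every library in one pass. A minimal sketch, assuming the module names used in this article; the helper name check_imports is hypothetical:

```python
import importlib

def check_imports(module_names):
    """Return {module name: version string, or None if the import fails}."""
    result = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            result[name] = None
            continue
        # Not every module exposes __version__; fall back to "unknown".
        result[name] = getattr(module, "__version__", "unknown")
    return result

report = check_imports(["numpy", "pandas", "scipy", "sklearn", "matplotlib"])
for name, version in report.items():
    print(name, version if version is not None else "MISSING")
```

A library that failed to build shows up as MISSING instead of stopping the check at the first failed import.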

Build the image

Check the container's size:

# du -h --max-depth=1
136M    ./tmp
26M     ./var
2.5M    ./sbin
3.2M    ./bin
220M    ./root
1.5M    ./etc
835M    ./usr
7.1M    ./lib

1.2G
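
As a quick cross-check of the listing above, the per-directory sizes can be added up. The sketch below assumes the sizes use only the K, M and G suffixes printed by du -h, and copies the listing by hand:

```python
def parse_size_mb(text):
    """Convert a du -h size such as '136M' or '1.2G' to megabytes."""
    units = {"K": 1 / 1024, "M": 1, "G": 1024}
    return float(text[:-1]) * units[text[-1]]

du_lines = """136M    ./tmp
26M     ./var
2.5M    ./sbin
3.2M    ./bin
220M    ./root
1.5M    ./etc
835M    ./usr
7.1M    ./lib"""

total_mb = sum(parse_size_mb(line.split()[0]) for line in du_lines.splitlines())
print(f"{total_mb:.1f} MB = {total_mb / 1024:.2f} GB")
```

The sum is about 1231 MB, which matches the 1.2G total, and /usr, /root and /tmp account for nearly all of it.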

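Save the original container as an image:

docker commit pythonslim python-pandas-build:arm

This step preserves the build environment, which makes later rebuilds easier.

Remove unnecessary files:

apt-get autoremove python2   # this also removes bzip2 and file
apt-get autoremove gcc g++ gfortran # this also removes libgomp1, binutils and binutils-arm-linux-gnueabihf

apt-get autoremove cython
apt-get autoremove perl

apt-get autoremove openssl

rm /usr/bin/perl  /usr/bin/perl5.28.1

Reinstall the packages that were removed but are still needed:

apt-get install libgomp1 # Note: sklearn depends on this package
apt-get install bzip2

Clear the caches: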

apt-get clean && rm -rf /var/lib/apt/lists/*

rm -rf /root/.cache

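On inspection, the slim image itself is already over 100 MB; with the main Python libraries added, the result is still close to 600 MB after cleanup. The directory that takes the most space is /usr/local/lib/python3.7/site-packages.

Save the slimmed-down container as an image:

docker commit pythonslim python-pandas:arm

Tag and push (to my Alibaba Cloud registry):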

docker tag python-pandas:arm registry.cn-hangzhou.aliyuncs.com/latelee/python-pandas:arm

docker push registry.cn-hangzhou.aliyuncs.com/latelee/python-pandas:arm

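Note: Docker images are built in layers, so the cleanup cannot be done on top of the python-pandas-build:arm image: that image is already over 1 GB, and deleting files in a later layer does not make the image any smaller. The cleanup therefore has to be done in the pythonslim container.

Run:

docker run -itd --name pandas -v $PWD:/work registry.cn-hangzhou.aliyuncs.com/latelee/python-pandas:arm sh

Problems and fixes

After installing libfreetype6-dev, compiling matplotlib still complained that the freetype version was too low (that is, the library could not be found). After adding pkg-config, the build succeeded.

Usage

registry.cn-hangzhou.aliyuncs.com/latelee/python-pandas:arm is a public image (accessible at the time of writing; no guarantee for the future). Run apt-get update before installing software in it.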

References

References for building scipy images:
https://github.com/publysher/docker-alpine-numpy
https://github.com/publysher/docker-alpine-scipy
https://github.com/publysher/docker-alpine-sklearn
https://github.com/amancevice/docker-pandas

scipy build guide:
https://docs.scipy.org/doc/scipy-1.1.0/reference/building/linux.html

Problems with Alpine images for Python:
https://pythonspeed.com/articles/alpine-docker-python/

Problems installing scipy on Alpine:
https://github.com/scipy/scipy/issues/9481
https://github.com/scipy/scipy/issues/9338

Appendix

Error output with the slim image:

 ERROR: Command errored out with exit status 1:
     command: /usr/local/bin/python /usr/local/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py prepare_metadata_for_build_wheel /tmp/tmp5_jvogwa
         cwd: /tmp/pip-install-kyd6kagr/scipy
    Complete output (137 lines):
    lapack_opt_info:
    lapack_mkl_info:
    customize UnixCCompiler
      libraries mkl_rt not found in ['/usr/local/lib', '/usr/lib', '/usr/lib/']
      NOT AVAILABLE
    
    openblas_lapack_info:
    customize UnixCCompiler
    customize UnixCCompiler
      libraries openblas not found in ['/usr/local/lib', '/usr/lib', '/usr/lib/']
      NOT AVAILABLE
    
    openblas_clapack_info:
    customize UnixCCompiler
    customize UnixCCompiler
      libraries openblas,lapack not found in ['/usr/local/lib', '/usr/lib', '/usr/lib/']
      NOT AVAILABLE
    
    atlas_3_10_threads_info:
    Setting PTATLAS=ATLAS
    customize UnixCCompiler
      libraries tatlas,tatlas not found in /usr/local/lib
    customize UnixCCompiler
      libraries lapack_atlas not found in /usr/local/lib
    customize UnixCCompiler
      libraries tatlas,tatlas not found in /usr/lib
    customize UnixCCompiler
      libraries lapack_atlas not found in /usr/lib
    customize UnixCCompiler
      libraries tatlas,tatlas not found in /usr/lib/
    customize UnixCCompiler
      libraries lapack_atlas not found in /usr/lib/
    
      NOT AVAILABLE
    
    atlas_3_10_info:
    customize UnixCCompiler
      libraries satlas,satlas not found in /usr/local/lib
    customize UnixCCompiler
      libraries lapack_atlas not found in /usr/local/lib
    customize UnixCCompiler
      libraries satlas,satlas not found in /usr/lib
    customize UnixCCompiler
      libraries lapack_atlas not found in /usr/lib
    customize UnixCCompiler
      libraries satlas,satlas not found in /usr/lib/
    customize UnixCCompiler
      libraries lapack_atlas not found in /usr/lib/
    
      NOT AVAILABLE
    
    atlas_threads_info:
    Setting PTATLAS=ATLAS
    customize UnixCCompiler
      libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib
    customize UnixCCompiler
      libraries lapack_atlas not found in /usr/local/lib
    customize UnixCCompiler
      libraries ptf77blas,ptcblas,atlas not found in /usr/lib
    customize UnixCCompiler
      libraries lapack_atlas not found in /usr/lib
    customize UnixCCompiler
      libraries ptf77blas,ptcblas,atlas not found in /usr/lib/
    customize UnixCCompiler
      libraries lapack_atlas not found in /usr/lib/
    
      NOT AVAILABLE
    
    atlas_info:
    customize UnixCCompiler
      libraries f77blas,cblas,atlas not found in /usr/local/lib
    customize UnixCCompiler
      libraries lapack_atlas not found in /usr/local/lib
    customize UnixCCompiler
      libraries f77blas,cblas,atlas not found in /usr/lib
    customize UnixCCompiler
      libraries lapack_atlas not found in /usr/lib
    customize UnixCCompiler
      libraries f77blas,cblas,atlas not found in /usr/lib/
    customize UnixCCompiler
      libraries lapack_atlas not found in /usr/lib/
    
      NOT AVAILABLE
    
    lapack_info:
    customize UnixCCompiler
      libraries lapack not found in ['/usr/local/lib', '/usr/lib', '/usr/lib/']
      NOT AVAILABLE
    
    lapack_src_info:
      NOT AVAILABLE
    
      NOT AVAILABLE
    
    setup.py:420: UserWarning: Unrecognized setuptools command ('dist_info --egg-base /tmp/pip-modern-metadata-n429kcrr'), proceeding with generating Cython sources and expanding templates
      ' '.join(sys.argv[1:])))
    Running from scipy source directory.
    /tmp/pip-build-env-dwf79396/overlay/lib/python3.7/site-packages/numpy/distutils/system_info.py:624: UserWarning:
        Atlas (http://math-atlas.sourceforge.net/) libraries not found.
        Directories to search for the libraries can be specified in the
        numpy/distutils/site.cfg file (section [atlas]) or by setting
        the ATLAS environment variable.
      self.calc_info()
    /tmp/pip-build-env-dwf79396/overlay/lib/python3.7/site-packages/numpy/distutils/system_info.py:624: UserWarning:
        Lapack (http://www.netlib.org/lapack/) libraries not found.
        Directories to search for the libraries can be specified in the
        numpy/distutils/site.cfg file (section [lapack]) or by setting
        the LAPACK environment variable.
      self.calc_info()
    /tmp/pip-build-env-dwf79396/overlay/lib/python3.7/site-packages/numpy/distutils/system_info.py:624: UserWarning:
        Lapack (http://www.netlib.org/lapack/) sources not found.
        Directories to search for the sources can be specified in the
        numpy/distutils/site.cfg file (section [lapack_src]) or by setting
        the LAPACK_SRC environment variable.
      self.calc_info()
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 257, in <module>
        main()
      File "/usr/local/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 240, in main
        json_out['return_val'] = hook(**hook_input['kwargs'])
      File "/usr/local/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 110, in prepare_metadata_for_build_wheel
        return hook(metadata_directory, config_settings)
      File "/tmp/pip-build-env-dwf79396/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 158, in prepare_metadata_for_build_wheel
        self.run_setup()
      File "/tmp/pip-build-env-dwf79396/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 250, in run_setup
        self).run_setup(setup_script=setup_script)
      File "/tmp/pip-build-env-dwf79396/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 143, in run_setup
        exec(compile(code, __file__, 'exec'), locals())
      File "setup.py", line 540, in <module>
        setup_package()
      File "setup.py", line 536, in setup_package
        setup(**metadata)
      File "/tmp/pip-build-env-dwf79396/overlay/lib/python3.7/site-packages/numpy/distutils/core.py", line 135, in setup
        config = configuration()
      File "setup.py", line 435, in configuration
        raise NotFoundError(msg)
    numpy.distutils.system_info.NotFoundError: No lapack/blas resources found.
    ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/local/bin/python /usr/local/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py prepare_metadata_for_build_wheel /tmp/tmp5_jvogwa Check the logs for full command output.
WARNING: You are using pip version 19.3.1; however, version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

Cause: the libraries needed for compilation are missing.

ModuleNotFoundError: No module named 'Cython'
raise ModuleNotFoundError(message)
ModuleNotFoundError: Please install Cython with a version >= 0.28.5 in order to build a scikit-learn from source.

Cause: Cython is not installed.

raise ReadTimeoutError(self._pool, None, "Read timed out.")
pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.

Cause: network timeout; retry.

src/checkdep_freetype2.c:3:6: error: #error "FreeType version 2.3 or higher is required. You may set the MPLLOCALFREETYPE environment variable to 1 to let Matplotlib download it."
       #error "FreeType version 2.3 or higher is required. \
        ^~~~~
  src/checkdep_freetype2.c:10:10: error: #include expects "FILENAME" or <FILENAME>
   #include FT_FREETYPE_H
            ^~~~~~~~~~~~~
  src/checkdep_freetype2.c:15:9: note: #pragma message: Compiling with FreeType version FREETYPE_MAJOR.FREETYPE_MINOR.FREETYPE_PATCH.
   #pragma message("Compiling with FreeType version " \
           ^~~~~~~
  src/checkdep_freetype2.c:18:4: error: #error "FreeType version 2.3 or higher is required. You may set the MPLLOCALFREETYPE environment variable to 1 to let Matplotlib download it."
     #error "FreeType version 2.3 or higher is required. \
      ^~~~~
  error: command 'gcc' failed with exit status 1

Fix: install freetype (libfreetype6-dev, plus pkg-config so that the build can find it).

Alpine image:

libraries lapack_atlas not found in /usr/local/lib
      libraries tatlas,tatlas not found in /usr/local/lib
      libraries lapack_atlas not found in /usr/lib
      libraries tatlas,tatlas not found in /usr/lib
    
      NOT AVAILABLE
    
    atlas_3_10_info:
      libraries lapack_atlas not found in /usr/local/lib
      libraries satlas,satlas not found in /usr/local/lib
      libraries lapack_atlas not found in /usr/lib
      libraries satlas,satlas not found in /usr/lib
    
      NOT AVAILABLE
    
    atlas_threads_info:
    Setting PTATLAS=ATLAS
      libraries lapack_atlas not found in /usr/local/lib
      libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib
      libraries lapack_atlas not found in /usr/lib
      libraries ptf77blas,ptcblas,atlas not found in /usr/lib
    
      NOT AVAILABLE
    
    atlas_info:
      libraries lapack_atlas not found in /usr/local/lib
      libraries f77blas,cblas,atlas not found in /usr/local/lib
      libraries lapack_atlas not found in /usr/lib
      libraries f77blas,cblas,atlas not found in /usr/lib
    
      NOT AVAILABLE
    
    lapack_info:
      libraries lapack not found in ['/usr/local/lib', '/usr/lib']
      NOT AVAILABLE
    
    lapack_src_info:
      NOT AVAILABLE
    
      NOT AVAILABLE
    
    running dist_info
    running build_src
    build_src
    building py_modules sources
    creating build
    creating build/src.linux-armv7l-3.5
    creating build/src.linux-armv7l-3.5/numpy
    creating build/src.linux-armv7l-3.5/numpy/distutils
    building library "npymath" sources
    Could not locate executable gfortran
    Could not locate executable f95
    Could not locate executable ifort
    Could not locate executable ifc
    Could not locate executable lf95
    Could not locate executable pgfortran
    Could not locate executable f90
    Could not locate executable f77
    Could not locate executable fort
    Could not locate executable efort
    Could not locate executable efc
    Could not locate executable g77
    Could not locate executable g95
    Could not locate executable pathf95
    Could not locate executable nagfor
    don't know how to compile Fortran code on platform 'posix'
    Running from numpy source directory.
    setup.py:461: UserWarning: Unrecognized setuptools command, proceeding with generating Cython sources and expanding templates
      run_build = parse_setuppy_commands()
    /tmp/pip-install-lpg1hl36/numpy/numpy/distutils/system_info.py:1896: UserWarning:
        Optimized (vendor) Blas libraries are not found.
        Falls back to netlib Blas library which has worse performance.
        A better performance should be easily gained by switching
        Blas library.
      if self._calc_info(blas):
    /tmp/pip-install-lpg1hl36/numpy/numpy/distutils/system_info.py:1896: UserWarning:
        Blas (http://www.netlib.org/blas/) libraries not found.
        Directories to search for the libraries can be specified in the
        numpy/distutils/site.cfg file (section [blas]) or by setting
        the BLAS environment variable.
      if self._calc_info(blas):
    /tmp/pip-install-lpg1hl36/numpy/numpy/distutils/system_info.py:1896: UserWarning:
        Blas (http://www.netlib.org/blas/) sources not found.
        Directories to search for the sources can be specified in the
        numpy/distutils/site.cfg file (section [blas_src]) or by setting
        the BLAS_SRC environment variable.
      if self._calc_info(blas):
    /tmp/pip-install-lpg1hl36/numpy/numpy/distutils/system_info.py:1730: UserWarning:
        Lapack (http://www.netlib.org/lapack/) libraries not found.
        Directories to search for the libraries can be specified in the
        numpy/distutils/site.cfg file (section [lapack]) or by setting
        the LAPACK environment variable.
      return getattr(self, '_calc_info_{}'.format(name))()
    /tmp/pip-install-lpg1hl36/numpy/numpy/distutils/system_info.py:1730: UserWarning:
        Lapack (http://www.netlib.org/lapack/) sources not found.
        Directories to search for the sources can be specified in the
        numpy/distutils/site.cfg file (section [lapack_src]) or by setting
        the LAPACK_SRC environment variable.
      return getattr(self, '_calc_info_{}'.format(name))()
    /usr/local/lib/python3.5/distutils/dist.py:261: UserWarning: Unknown distribution option: 'define_macros'
      warnings.warn(msg)
    Traceback (most recent call last):
      File "/usr/local/lib/python3.5/site-packages/pip/_vendor/pep517/_in_process.py", line 257, in <module>
        main()
      File "/usr/local/lib/python3.5/site-packages/pip/_vendor/pep517/_in_process.py", line 240, in main
        json_out['return_val'] = hook(**hook_input['kwargs'])
      File "/usr/local/lib/python3.5/site-packages/pip/_vendor/pep517/_in_process.py", line 110, in prepare_metadata_for_build_wheel
        return hook(metadata_directory, config_settings)
      File "/tmp/pip-build-env-35qyjrc3/overlay/lib/python3.5/site-packages/setuptools/build_meta.py", line 158, in prepare_metadata_for_build_wheel
        self.run_setup()
      File "/tmp/pip-build-env-35qyjrc3/overlay/lib/python3.5/site-packages/setuptools/build_meta.py", line 250, in run_setup
        self).run_setup(setup_script=setup_script)
      File "/tmp/pip-build-env-35qyjrc3/overlay/lib/python3.5/site-packages/setuptools/build_meta.py", line 143, in run_setup
        exec(compile(code, __file__, 'exec'), locals())
      File "setup.py", line 488, in <module>
        setup_package()
      File "setup.py", line 480, in setup_package
        setup(**metadata)
      File "/tmp/pip-install-lpg1hl36/numpy/numpy/distutils/core.py", line 171, in setup
        return old_setup(**new_attr)
      File "/tmp/pip-build-env-35qyjrc3/overlay/lib/python3.5/site-packages/setuptools/__init__.py", line 144, in setup
        return distutils.core.setup(**attrs)
      File "/usr/local/lib/python3.5/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/usr/local/lib/python3.5/distutils/dist.py", line 955, in run_commands
        self.run_command(cmd)
      File "/usr/local/lib/python3.5/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "/tmp/pip-build-env-35qyjrc3/overlay/lib/python3.5/site-packages/setuptools/command/dist_info.py", line 31, in run
        egg_info.run()
      File "/tmp/pip-install-lpg1hl36/numpy/numpy/distutils/command/egg_info.py", line 26, in run
        self.run_command("build_src")
      File "/usr/local/lib/python3.5/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/usr/local/lib/python3.5/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "/tmp/pip-install-lpg1hl36/numpy/numpy/distutils/command/build_src.py", line 146, in run
        self.build_sources()
      File "/tmp/pip-install-lpg1hl36/numpy/numpy/distutils/command/build_src.py", line 157, in build_sources
        self.build_library_sources(*libname_info)
      File "/tmp/pip-install-lpg1hl36/numpy/numpy/distutils/command/build_src.py", line 290, in build_library_sources
        sources = self.generate_sources(sources, (lib_name, build_info))
      File "/tmp/pip-install-lpg1hl36/numpy/numpy/distutils/command/build_src.py", line 380, in generate_sources
        source = func(extension, build_dir)
      File "numpy/core/setup.py", line 661, in get_mathlib_info
        raise RuntimeError("Broken toolchain: cannot link a simple C program")
    RuntimeError: Broken toolchain: cannot link a simple C program
    ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/local/bin/python /usr/local/lib/python3.5/site-packages/pip/_vendor/pep517/_in_process.py prepare_metadata_for_build_wheel /tmp/tmpxxww43e_ Check the logs for full command output.
The command '/bin/sh -c pip install $(grep numpy requirements.txt) &&     pip install -r requirements.txt' returned a non-zero code: 1

Cause: missing libraries, the same as with the slim image. The Alpine image is not used in this article.

My Docker Notes 25: Building an Image for Testing

2020-06-17T16:23:28.000Z

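This article records how a test image is built: first an image that can run static programs, then, on top of it, a web server written in Go that reports the container's hostname, kernel version and similar information. The image can be used for testing k8s and KubeEdge clusters.

Environment

Install Docker and log in to Docker Hub.
Install the Go compiler to build the source code.
Install qemu to run ARM containers on the x86 platform. Skip this if you do not need it.

sudo apt install qemu-user-static

The images are built with a manifest and work on both x86 and ARM.
Note that "x86" here actually means a 64-bit system, which should properly be called amd64; "x86" is used only out of habit. The ARM platform here means a 32-bit system. I do not have a 64-bit ARM system yet, so that will be added later.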

Image design

As follows:

latelee/busybox  the public image name; the variant matching the platform is pulled automatically
latelee/busybox-arm
latelee/busybox-amd64

latelee/webgin 
latelee/webgin-arm
latelee/webgin-amd64
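
The naming scheme above can be written down as a small lookup. With a manifest, Docker does this selection itself, so the sketch below only illustrates the convention. The mapping table and the helper name platform_image are assumptions, not part of the original setup:

```python
import platform

# Machine names (as reported by uname -m) mapped to the image suffixes used
# in this article. The table is an assumption made for illustration.
ARCH_SUFFIX = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "armv7l": "arm",
}

def platform_image(base, machine=None):
    """Return the per-platform image name, e.g. latelee/busybox-amd64."""
    machine = (machine or platform.machine()).lower()
    try:
        return f"{base}-{ARCH_SUFFIX[machine]}"
    except KeyError:
        raise ValueError(f"no image variant for machine type {machine!r}")

print(platform_image("latelee/busybox", "x86_64"))
print(platform_image("latelee/webgin", "armv7l"))
```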

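Base image

The official busybox image supports many platforms, but the default variant lacks some files that programs depend on. The glibc variant has them.
The steps below describe how to build the image in practice.

Build the x86 base image

Pull:

docker pull busybox

Build steps:

Run:
docker run -itd --name busybox busybox
Create the directories:
docker exec -it busybox mkdir -p /lib/x86_64-linux-gnu /lib64
Copy the runtime libraries and the dynamic linker: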
docker cp -a /lib/x86_64-linux-gnu/libpthread.so.0 busybox:/lib/x86_64-linux-gnu
docker cp -a /lib/x86_64-linux-gnu/libpthread-2.23.so busybox:/lib/x86_64-linux-gnu
docker cp -a /lib/x86_64-linux-gnu/libc-2.23.so  busybox:/lib/x86_64-linux-gnu
docker cp -a /lib/x86_64-linux-gnu/libc.so.6  busybox:/lib/x86_64-linux-gnu
docker cp -a /lib64/ld-linux-x86-64.so.2 busybox:/lib64/
docker cp -a /lib/x86_64-linux-gnu/ld-2.23.so busybox:/lib/x86_64-linux-gnu/

Save as an image:
docker commit busybox latelee/busybox-amd64

Test (the output should list the files above):
docker run -it --rm latelee/busybox-amd64 ls -lh /lib/x86_64-linux-gnu /lib64

Push:

docker push latelee/busybox-amd64

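Build the ARM base image

On an ARM board with Docker installed, run:

docker pull busybox

Note: this command is identical to the one above; because the system differs, Docker Hub automatically selects and downloads the matching image. Building on real hardware helps ensure that the image is reliable.

Build steps:

Run:
docker run -itd --name busybox busybox
Create the directories:
docker exec -it busybox mkdir -p /usr/lib/ /lib
Copy the runtime libraries and the dynamic linker: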
docker cp /lib/ld-2.25.so busybox:/lib/
docker cp /lib/ld-linux-armhf.so.3 busybox:/lib/
docker cp /usr/lib/libpthread-2.25.so busybox:/usr/lib
docker cp /usr/lib/libpthread.so.0 busybox:/usr/lib
docker cp /usr/lib/libc.so.6 busybox:/usr/lib/
docker cp /usr/lib/libc-2.25.so busybox:/usr/lib/

Save as an image:
docker commit busybox latelee/busybox-arm

Test (the output should list the files above):
docker run -it --rm latelee/busybox-arm ls -lh /usr/lib/ /lib

Push:

docker push latelee/busybox-arm

Using the glibc variant

Build directly from busybox:glibc; no extra files need to be copied. On x86:

docker pull busybox:glibc
docker tag busybox:glibc latelee/busybox-amd64

On ARM:

docker pull busybox:glibc
docker tag busybox:glibc latelee/busybox-arm

Note: I use the method from the previous sections; the glibc variant may be updated later.

Running the ARM container on x86

When running on an ARM board is inconvenient, it can be emulated on x86.
Mount the qemu-arm-static binary:

docker run -it --rm -v /usr/bin/qemu-arm-static:/usr/bin/qemu-arm-static latelee/busybox-arm ls -lh /usr/lib/ /lib

Alternatively, run the qemu-user-static container first and then the ARM container:

docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
docker run -it --rm latelee/busybox-arm ls -lh /usr/lib/ /lib

Multi-platform support

Key points: enable Docker's experimental CLI features, push the per-platform images to Docker Hub beforehand, then create the manifest and push it.

export DOCKER_CLI_EXPERIMENTAL=enabled

docker manifest create latelee/busybox latelee/busybox-amd64 latelee/busybox-arm

docker manifest annotate latelee/busybox latelee/busybox-amd64 --os linux --arch amd64
docker manifest annotate latelee/busybox latelee/busybox-arm --os linux --arch arm

Inspect:
docker manifest inspect latelee/busybox

Push:
docker manifest push latelee/busybox

webgin

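webgin is a web service written with the gin framework. It listens on port 80 and returns information about the host. It is built the same way as the images above, so the details are not repeated. The source of webgin.go: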

package main

import (
    "fmt"
    "runtime"
    "os"
    "time"
    "github.com/gin-gonic/gin"
    "net/http"
)

// uname

/*
#include <stdio.h>
#include <sys/utsname.h>

// GetName formats the uname fields into a static buffer and returns it.
char* GetName()
{
    struct utsname myname;
    static char buffer[128] = {0};
    uname(&myname);
    
    snprintf(buffer, 128, "uname: %s %s %s %s %s\r\n", myname.sysname, 
                    myname.nodename, myname.release, 
                    myname.version, myname.machine);
    return buffer;
}
*/
import "C"

var version = "v1.0"

func myIndex (c *gin.Context) {
    uname := C.GetName()
    name := C.GoString(uname)
    hostname, _ := os.Hostname()
    arch := "arch: " + runtime.GOARCH + " os: " + runtime.GOOS + " hostname: " + hostname + "\r\n";
    timeStr := "Now: " + time.Now().Format("2006-01-02 15:04:05") + "\r\n"
    c.String(http.StatusOK, "Hello World " + version + "\r\n" + arch + name + timeStr)
}

func main(){
    router := gin.Default()
    router.GET("/", myIndex)
    fmt.Println("gin server start...")
    router.Run(":80")
}

Build script:

#!/bin/sh

export GOARCH=amd64
export GOOS="linux"
export GOARM= 
export CGO_ENABLED=1
export CC=gcc
GO111MODULE=off go build
strip webgin
docker build -t latelee/webgin-amd64 . -f Dockerfile

export GOARCH=arm
export GOOS="linux"
export GOARM=7 
export CGO_ENABLED=1
export CC=arm-linux-gnueabihf-gcc
GO111MODULE=off go build
arm-linux-gnueabihf-strip webgin
docker build -t latelee/webgin-arm . -f Dockerfile.arm

Dockerfile:

FROM latelee/busybox-amd64

LABEL maintainer="Late Lee"

COPY webgin /

EXPOSE 80

CMD ["/webgin"]

Run:

docker run -it --name webgin --rm -p 80:80 latelee/webgin

Test:

# curl localhost:80
Hello World v1.0
arch: amd64 os: linux hostname: 60acfd65857a
uname: Linux 60acfd65857a 4.4.0-174-generic #204-Ubuntu SMP Wed Jan 29 06:41:01 UTC 2020 x86_64
Now: 2020-03-26 23:10:36

Checking the dependency files

When the dynamic linker is missing:
/ # ./webgin 
sh: ./webgin: not found
When other libraries are missing:
/ # /webgin 
/webgin: error while loading shared libraries: libpthread.so.0: cannot open shared object file: No such file or directory

/ # /webgin 
/webgin: error while loading shared libraries: libc.so.6: cannot open shared object file: No such file or directory

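Appendix

If the official Docker Hub is slow, the Alibaba Cloud container registry service is an alternative. Its enterprise edition became a paid product in mid-March 2020; I am not sure about the personal edition.
Log in to the Alibaba Cloud registry: sudo docker login --username=li@latelee.org registry.cn-hangzhou.aliyuncs.com
Published images:

registry.cn-hangzhou.aliyuncs.com/latelee/webgin (versions: v1.0 v1.1 v1.2)
registry.cn-hangzhou.aliyuncs.com/latelee/busybox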

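My Docker Notes 24: Generating coredump files in Docker

2020-06-16T16:51:54.000Z

A C++ program running in a Docker container sometimes crashes without leaving any log or message. Having learned coredump debugging earlier, I try it inside a container here.

Procedure

First check the core file size limit:

$ ulimit -a | grep core
core file size (blocks, -c) 0

Remove the limit:

$ ulimit -c unlimited

Check again:

$ ulimit -a | grep core
core file size (blocks, -c) unlimited

Set the core file name pattern:

$ echo 'core.%t.%e.%p' | sudo tee /proc/sys/kernel/core_pattern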

Run the image:

docker run -v /home:/home -it latelee/myserver bash

Change to the program directory:

# cd /home/latelee/docker/test/myserver/

Run the test program, which contains a segmentation fault:

# ./myserver
Segmentation fault (core dumped)

Check whether a core file was produced:

# ls
Dockerfile core.1535079291.myserver.11 entrypoint.sh config.ini myserver

The generated coredump file is core.1535079291.myserver.11.
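
The file name follows the pattern core.%t.%e.%p set earlier: %t is the dump time in seconds since the Epoch, %e the executable name and %p the process ID. A minimal sketch of splitting such a name back into its fields is shown below; the function name `parse_core_name` is made up for illustration.

```python
from datetime import datetime, timezone


def parse_core_name(name):
    """Split a file named by the pattern core.%t.%e.%p into its fields."""
    prefix, stamp, rest = name.split(".", 2)
    if prefix != "core":
        raise ValueError("not a core.%t.%e.%p name: " + name)
    exe, pid = rest.rsplit(".", 1)  # %e may itself contain dots
    return {
        "time": datetime.fromtimestamp(int(stamp), tz=timezone.utc),
        "exe": exe,
        "pid": int(pid),
    }


info = parse_core_name("core.1535079291.myserver.11")
print(info["exe"], info["pid"], info["time"].isoformat())
```

Note that the PID is the one seen inside the container's PID namespace, which is why it is as small as 11 here.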

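Making the core settings permanent

Edit /etc/security/limits.conf and change the core-related entries as follows:

* soft core unlimited
root hard core unlimited

Edit /etc/sysctl.conf and append at the end:

kernel.core_pattern = core.%t.%e.%p

Note: both files require root privileges to edit.

Summary

0. The program must be compiled with -g so that it carries debug information; otherwise the coredump is of little help in locating the problem.
1. On the host, run ulimit -c unlimited and set the coredump file name pattern first.
2. Then run the program inside the Docker container.

Lessons learned

1. Some sources suggest passing --ulimit core=-1 --security-opt seccomp=unconfined to docker run. In my tests a coredump file was generated with or without these options. I usually orchestrate containers with docker-compose and have not yet worked out how to express these options in docker-compose.yml, so I do not use them for now.
2. For the coredump location, /tmp or a separately mounted directory is recommended. The pattern above is only a demonstration and is not meant as practical guidance.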

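My Docker Notes 23: Changing a container's time zone and adding Chinese support

2020-06-16T16:49:09.000Z

Many Docker images carry no time zone data and default to UTC, which makes log timestamps awkward to read. Some images also cannot display Chinese text. This article uses busybox as an example to address both issues.

Time zone support

Run busybox:

docker run -itd --rm --name busybox latelee/busybox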

docker exec -it busybox date
Fri Mar 20 05:13:50 UTC 2020

docker exec -it busybox cat /etc/localtime
TZif2UTCTZif2UTC
UTC0

Check the host's time zone file:

ls -l /etc/localtime 
lrwxrwxrwx 1 root root 33 Dec 17 21:50 /etc/localtime -> /usr/share/zoneinfo/Asia/Shanghai
ls -l /usr/share/zoneinfo/Asia/Shanghai
lrwxrwxrwx 1 root root 6 Oct  3 05:06 /usr/share/zoneinfo/Asia/Shanghai -> ../PRC
cat /usr/share/zoneinfo/PRC 
TZif2
(binary data omitted)
CST-8

Copy the host's time zone file into the container:

docker cp /usr/share/zoneinfo/PRC busybox:/etc/localtime

Check again:

docker exec -it busybox date
Fri Mar 20 13:14:27 CST 2020
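
The effect of the copied zone file is simply an eight-hour shift relative to UTC. The sketch below reproduces the conversion with a fixed UTC+8 offset; it is an illustration only and does not read the zone file. The input time is the container's second `date` reading expressed in UTC.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+8 offset, labelled CST as in the container output above.
CST = timezone(timedelta(hours=8), "CST")


def utc_to_cst(utc_time):
    """Convert an aware UTC datetime to China Standard Time (UTC+8)."""
    return utc_time.astimezone(CST)


utc = datetime(2020, 3, 20, 5, 14, 27, tzinfo=timezone.utc)
local = utc_to_cst(utc)
print(local.strftime("%Y-%m-%d %H:%M:%S %Z"))
```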

In k8s, the host's /etc/localtime can be mounted into the pod instead:

apiVersion: v1
kind: Pod
metadata:
  name: busybox-pod1
  labels:
    app: busybox
spec:
  containers:
  - name: busybox1
    image: latelee/busybox
    imagePullPolicy: IfNotPresent
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - mountPath: /test111
      name: host-volume
    - mountPath: /etc/localtime
      name: time-zone
  volumes:
  - name: host-volume
    hostPath: 
      path: /data
  - name: time-zone
    hostPath: 
      path: /etc/localtime

Character encoding

Enter the container and set the environment variables:

export LANG=C.UTF-8 
export LANGUAGE=C.UTF-8
export LC_ALL=C.UTF-8

In a Dockerfile the same can be written as:

ENV LANG C.UTF-8 
ENV LANGUAGE C.UTF-8
ENV LC_ALL C.UTF-8

Output before and after the setting:

/ # æ?????
sh: æ??是中文: not found


/ # 我是中文
sh: 我是中文: not found

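Building the image

Save the container as a new image. Alternatively, build the image with a Dockerfile.

Summary

Whether to add this support depends on actual needs. If you maintain all base images yourself, adding it is recommended.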

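My Docker Notes 22: Hosting multiple domains on one machine

2020-06-16T16:48:28.000Z

Requirements:
There is only one cloud host but several websites with different domain names, including subdomains. Sites must be reachable by domain name (or subdomain) only, not by port. HTTPS must be enabled, with certificates renewed automatically on expiry.
The sites consist of static files (for now) and are deployed with Docker, using the stock httpd image with the site files mounted in.
Site sources are managed in gitlab; CI builds the static files and pushes them to the cloud host automatically.

Cloud host

Each subdomain needs a DNS record mapping it to the host IP in the domain console; otherwise it cannot be reached.
Ports 80 and 443 must be open on the cloud host.

Deployment

docker-compose is used for easier management.
Reverse proxy: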

proxy:
    image: jwilder/nginx-proxy
    container_name: nginx-proxy
    restart: always
    ports:
      - 80:80
      - 443:443
    labels:
      com.github.jrcs.letsencrypt_nginx_proxy_companion.nginx_proxy: "true"
    volumes:
      - ./certs:/etc/nginx/certs:ro
      - ./acme:/acmecerts
      - ./vhost.d:/etc/nginx/vhost.d
      - ./html:/usr/share/nginx/html
      - /var/run/docker.sock:/tmp/docker.sock:ro
    networks:
      - nginxp-net

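The image is jwilder/nginx-proxy; see its project page for documentation.
Ports 80 and 443 must be mapped, and the label com.github.jrcs.letsencrypt_nginx_proxy_companion.nginx_proxy is required; without it docker-letsencrypt-nginx-proxy-companion cannot find the proxy container. For easier management most mounted directories live under the current directory. Note that certs is mounted read-only.

HTTPS certificates: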

letsencrypt-companion:
    image: jrcs/letsencrypt-nginx-proxy-companion
    container_name: letsencrypt
    restart: always
    volumes:
      - ./certs:/etc/nginx/certs
      - ./vhost.d:/etc/nginx/vhost.d
      - ./html:/usr/share/nginx/html
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - nginxp-net
    depends_on:
      - proxy

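The image is jrcs/letsencrypt-nginx-proxy-companion; see its project page for details. The mounts are required as well; automatically created certificates are stored in the certs directory. (Open question: certificates are valid for three months; are they renewed automatically on expiry?)

Website containers: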

http1:
  image: httpd
  container_name: httpd1
  volumes:
    - ./html1:/usr/local/apache2/htdocs/
  environment:
    - VIRTUAL_HOST=latelee.org,www.latelee.org
    - LETSENCRYPT_HOST=latelee.org,www.latelee.org
    - LETSENCRYPT_EMAIL=test@latelee.org
    - ENABLE_ACME=true
  networks:
    - nginxp-net

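The html1 directory under the current directory is mounted as the apache document root and holds the static site files. Because httpd exposes only port 80, VIRTUAL_PORT does not need to be set; the reverse proxy container picks this port automatically. VIRTUAL_HOST and LETSENCRYPT_HOST give the host names; several can be listed, separated by commas. Typically both the bare domain and the www form are given. LETSENCRYPT_EMAIL is the address that receives mail from letsencrypt.

This is a template that can be repeated for as many sites as needed. For example: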

http2:
  image: httpd
  container_name: httpd2
  volumes:
    - ./html1:/usr/local/apache2/htdocs/
  environment:
    - VIRTUAL_HOST=i.latelee.org
    - LETSENCRYPT_HOST=i.latelee.org
    - LETSENCRYPT_EMAIL=test@latelee.org
    - ENABLE_ACME=true
  networks:
    - nginxp-net
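
Since every site entry has the same shape, the stanza can also be generated. The sketch below renders one service entry from the template above; the function name `site_service` and the `http3`/`html3` example values are hypothetical, and the output is plain text to be pasted under `services:` rather than a complete compose file.

```python
def site_service(name, html_dir, hosts, email, network="nginxp-net"):
    """Render one compose service entry following the http1/http2 template."""
    host_list = ",".join(hosts)  # several domains are separated by commas
    lines = [
        f"{name}:",
        "  image: httpd",
        f"  container_name: {name}",
        "  volumes:",
        f"    - {html_dir}:/usr/local/apache2/htdocs/",
        "  environment:",
        f"    - VIRTUAL_HOST={host_list}",
        f"    - LETSENCRYPT_HOST={host_list}",
        f"    - LETSENCRYPT_EMAIL={email}",
        "    - ENABLE_ACME=true",
        "  networks:",
        f"    - {network}",
    ]
    return "\n".join(lines)


print(site_service("http3", "./html3", ["blog.latelee.org"], "test@latelee.org"))
```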

Starting

Start everything with:

docker-compose up -d

Individual services can be started and stopped separately, for example:

docker-compose up -d http2
docker-compose stop http2
docker-compose start http2

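References

https://www.v2ex.com/amp/t/365967

Generating pages with jekyll
Install ruby first, then open cmd with administrator privileges and run gem install jekyll.
Build:
bundle exec jekyll b

Serve:
bundle exec jekyll serve --incremental -H 0.0.0.0 -P 80

Site backups: everything is on github.

Domain console: remove the site's CNAME record, then log in to github.com and remove the CNAME there as well.

CI automation:
All sources are managed under a personal account. CI generates the static files and pushes them both to the cloud host and to the corresponding github.com repository.
Pushing to the cloud host: log in, delete the contents of the target directory, then copy the new files.

After files are updated the site does not refresh automatically; restarting the container works. Is there a better way?

Sites:
Studio: www.cststudio.com

Personal: www.latelee.org blog.latelee.org i.latelee.org (jekyll)
大锤: jekyll
真实: jekyll

DNS records (backup):
www CNAME default

@ A default 120.79.237.54

Error encountered: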
ACME server returned an error: urn:ietf:params:acme:error:rateLimited :: There were too many requests of a given type :: Error creating new account :: too many registrations for this IP: see https://letsencrypt.org/docs/rate-limits/

networks:
  mongocluster_default:
    external: true

docker-compose file:
version: "2"
services:
  proxy:
    image: jwilder/nginx-proxy
    container_name: nginx-proxy
    restart: always
    ports:
      - 80:80
      - 443:443
    labels:
      com.github.jrcs.letsencrypt_nginx_proxy_companion.nginx_proxy: "true"
    volumes:
      - ./certs:/etc/nginx/certs:ro
      - ./acme:/acmecerts
      - ./vhost.d:/etc/nginx/vhost.d
      - ./html:/usr/share/nginx/html
      - /var/run/docker.sock:/tmp/docker.sock:ro
    networks:
      - nginxp-net

  letsencrypt:
    image: jrcs/letsencrypt-nginx-proxy-companion
    container_name: letsencrypt
    restart: always
    volumes:
      - ./certs:/etc/nginx/certs
      - ./vhost.d:/etc/nginx/vhost.d
      - ./html:/usr/share/nginx/html
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - nginxp-net
    depends_on:
      - proxy

  latelee:
    image: registry.cn-hangzhou.aliyuncs.com/latelee/lidch
    container_name: latelee
    volumes:
      - ./html_latelee:/usr/local/apache2/htdocs/
    environment:
      - VIRTUAL_HOST=www.latelee.org
      - LETSENCRYPT_HOST=www.latelee.org
      - LETSENCRYPT_EMAIL=li@latelee.org
      - ENABLE_ACME=true
    networks:
      - nginxp-net

  ilatelee:
    image: registry.cn-hangzhou.aliyuncs.com/latelee/lidch
    container_name: ilatelee
    volumes:
      - ./html_i.latelee:/usr/local/apache2/htdocs/
    environment:
      - VIRTUAL_HOST=i.latelee.org
      - LETSENCRYPT_HOST=i.latelee.org
      - LETSENCRYPT_EMAIL=li@latelee.org
      - ENABLE_ACME=true
    networks:
      - nginxp-net

  bloglatelee:
    image: registry.cn-hangzhou.aliyuncs.com/latelee/lidch
    container_name: bloglatelee
    volumes:
      - ./html_blog.latelee:/usr/local/apache2/htdocs/
    environment:
      - VIRTUAL_HOST=blog.latelee.org
      - LETSENCRYPT_HOST=blog.latelee.org
      - LETSENCRYPT_EMAIL=li@latelee.org
      - ENABLE_ACME=true
    networks:
      - nginxp-net

  cst:
    image: registry.cn-hangzhou.aliyuncs.com/latelee/lidch
    container_name: cst
    volumes:
      - ./html_cst:/usr/local/apache2/htdocs/
    environment:
      - VIRTUAL_HOST=cststudio.com.cn,www.cststudio.com.cn
      - LETSENCRYPT_HOST=cststudio.com.cn,www.cststudio.com.cn
      - LETSENCRYPT_EMAIL=li@latelee.org
      - ENABLE_ACME=true
    networks:
      - nginxp-net

  lijj:
    image: registry.cn-hangzhou.aliyuncs.com/latelee/lidch
    container_name: lijj
    volumes:
      - ./html_lijj:/usr/local/apache2/htdocs/
    environment:
      - VIRTUAL_HOST=lijiangjin.cn,www.lijiangjin.cn
      - LETSENCRYPT_HOST=lijiangjin.cn,www.lijiangjin.cn
      - LETSENCRYPT_EMAIL=li@latelee.org
      - ENABLE_ACME=true
    networks:
      - nginxp-net

  lidch:
    image: registry.cn-hangzhou.aliyuncs.com/latelee/lidch
    container_name: lidch
    volumes:
      - ./html_lidch:/usr/local/apache2/htdocs/
    environment:
      - VIRTUAL_HOST=lidachui.cn,www.lidachui.cn
      - LETSENCRYPT_HOST=lidachui.cn,www.lidachui.cn
      - LETSENCRYPT_EMAIL=li@latelee.org
      - ENABLE_ACME=true
    networks:
      - nginxp-net

networks:
  nginxp-net:
    driver: bridge

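My Docker Notes 21: Deploying web servers

2020-06-16T16:46:04.000Z

This article records the deployment of several different web servers; the procedures are largely the same.

Key points:
1. Choose an image, the smaller the better. Decide on a host directory and mount it onto the container directory that holds the html files.
2. The html files can also be copied into the image and a new container run from it, but this is less convenient.
3. Document directories inside the containers:

nginx: /usr/share/nginx/html
httpd: /usr/local/apache2/htdocs/
tomcat: /usr/local/tomcat/webapps/ROOT
php: /var/www/

Deploying nginx

Key point: choose the nginx:alpine tag for its small size.

docker-compose file: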

version: "2"
services:
  nginx_all:
    image: nginx:alpine
    container_name: nginx_all
    volumes:
      - $PWD/nginx:/etc/nginx
    ports:
      - 8080:80
    networks:
      - mywebsite
  web1:
    image: latelee/nginx
    container_name: web1
    volumes:
      - $PWD/html1:/usr/share/nginx/html
    ports:
      - 8081:80
    networks:
      - mywebsite

  web2:
    image: latelee/nginx
    container_name: web2
    volumes:
      - $PWD/html2:/usr/share/nginx/html
    ports:
      - 8082:80
    networks:
      - mywebsite
networks:
    mywebsite:
      driver: bridge

Sample home page:

cat XX/index.html

  
    hello world

httpd

httpd is the Apache HTTP Server.

version: "2"
services:
  web1:
    image: latelee/httpd
    container_name: web1
    volumes:
      - $PWD/html1:/usr/local/apache2/htdocs/
    ports:
      - 8081:80
    networks:
      - mywebsite
  web2:
    image: latelee/httpd
    container_name: web2
    volumes:
      - $PWD/html2:/usr/local/apache2/htdocs/
    ports:
      - 8082:80
    networks:
      - mywebsite
networks:
    mywebsite:
      driver: bridge

tomcat

version: '2'
services:
    tomcat:
        image: tomcat:8.0.51-jre8-slim
        container_name: tomcat
        #restart: always
        volumes:
            - $PWD/webapps:/usr/local/tomcat/webapps/ROOT
        ports:
            - "8080:8080"

$ cat webapps/index.php

  
    THis is tomcat test
     PHP 
     2018 5 5

php

version: "2"
services:
  php:
    image: php:7.2.7-apache
    container_name: php
    restart: always
    volumes:
      - ./www:/var/www/
    ports:
      - 5000:80

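My Docker Notes 20: Pulling multi-platform Docker images

2020-06-16T16:43:47.000Z

Docker Hub hosts many official images that support multiple platforms. This article gives a brief introduction.

Pulling

Take busybox as an example. The official image page is https://hub.docker.com/_/busybox?tab=tags , which lists the available versions for each platform, as shown in Figure 1.

Pull an arm v7 build:

docker pull busybox:glibc@sha256:783d05e2c73f48d4499387b807caf11b0b3afef5e17e225643b4b4558b21e221

docker images | grep busybox shows its image ID as 2128ff41e8e1:

busybox <none> 2128ff41e8e1 12 days ago 2.68 MB

The image has an ID but no tag, so tag it:

docker tag 2128ff41e8e1 latelee/armbusybox:v7

This image cannot run on x86. The error is:

standard_init_linux.go:178: exec user process caused "exec format error"

Docker 19.03 already supports building multi-platform images; I have not tried it yet.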

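Pushing

In the previous section the sha256 digest was given explicitly to pick a specific image. In practice docker pull selects the image that matches the current system. Take nginx as an example:

docker pull nginx

Run on different systems (arm, x86, x86_64), this command fetches different images, each matching the local system, because the official nginx image is published for multiple platforms; see https://hub.docker.com/_/nginx?tab=tags .
This is made possible by the docker manifest mechanism.

The following builds a multi-platform image: users see a single image name, and docker chooses the right variant based on the architecture of the machine that runs the pull.

1. Enable the experimental CLI features:

export DOCKER_CLI_EXPERIMENTAL=enabled

2. Build the per-platform images.

3. Push the images:

docker push latelee/amd64webgin
docker push latelee/armwebgin

4. Create the manifest list.
Create it first:

# docker manifest create latelee/webgin latelee/amd64webgin latelee/armwebgin
Created manifest list docker.io/latelee/webgin:latest

Then annotate each platform:

docker manifest annotate latelee/webgin latelee/amd64webgin --os linux --arch x86_64
docker manifest annotate latelee/webgin latelee/armwebgin --os linux --arch armv7l

Inspect the result:

docker manifest inspect latelee/webgin

Finally push it:

docker manifest push latelee/webgin

Note 1: In experiments on different systems the accepted arch value differed: sometimes x86_64 worked, sometimes amd64. Other example values: arm, ppc64le, arm64.
Note 2: If a manifest was created incorrectly (for example from two differently named images with identical content, which count as one image rather than two), updating it afterwards does not seem to work. Workaround: repeat the steps on a fresh machine.
Note 3: The images need not be present locally, but they do have to exist on Docker Hub.
Note 4: Docker Hub connectivity is not very stable (for well-known reasons), so several attempts may be needed.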
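
Regarding Note 1: image platform fields use Go-style architecture names (amd64, arm, arm64), whereas uname -m reports kernel names such as x86_64 or armv7l. The sketch below shows one way to translate between the two. The table is an illustrative subset rather than an exhaustive list, and the name `manifest_arch` is hypothetical.

```python
import platform

# uname -m value -> (architecture, variant) as used in manifest platform fields.
# Illustrative subset only.
MACHINE_TO_PLATFORM = {
    "x86_64": ("amd64", ""),
    "amd64": ("amd64", ""),
    "i386": ("386", ""),
    "i686": ("386", ""),
    "armv6l": ("arm", "v6"),
    "armv7l": ("arm", "v7"),
    "aarch64": ("arm64", ""),
    "arm64": ("arm64", ""),
    "ppc64le": ("ppc64le", ""),
    "s390x": ("s390x", ""),
}


def manifest_arch(machine=None):
    """Map a machine string (default: the local one) to (arch, variant)."""
    machine = (machine or platform.machine()).lower()
    try:
        return MACHINE_TO_PLATFORM[machine]
    except KeyError:
        raise ValueError("unknown machine type: " + machine)


for m in ("x86_64", "armv7l", "aarch64"):
    print(m, manifest_arch(m))
```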

If no matching variant exists the pull fails, for example when pulling on an arm system although no arm variant was published:

docker pull latelee/webgin
latest: Pulling from latelee/webgin
latest: Pulling from latelee/webgin
latest: Pulling from latelee/webgin
no matching manifest for linux/arm in the manifest list entries

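Notes on a cryptomining trojan attack on an Ubuntu virtual machine

2020-06-04T15:24:40.000Z

Background:
Today the virtual machine felt sluggish. top showed two unknown processes consuming a lot of CPU; investigation revealed a cryptomining trojan.

Locating the problem

Check with top:

top
96058 root 20 0 5263372 60460 916 S 56.5 1.7 66:34.21 tsm

Show the details: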

 ps aux | grep tsm
root      96052  0.0  0.0  11552    88 ?        S    11:14   0:00 timeout 3h ./tsm -t 515 -f 1 -s 12 -S 10 -p 0 -d 1 p ip
root      96053  0.0  0.0  12512   196 ?        S    11:14   0:00 /bin/bash ./tsm -t 515 -f 1 -s 12 -S 10 -p 0 -d 1 p ip
root      96058 43.7  1.7 5263372 60524 ?       Sl   11:14  66:54 /tmp/.X25-unix/.rsync/c/lib/64/tsm --library-path /tmp/.X25-unix/.rsync/c/lib/64/ /usr/sbin/httpd rsync/c/tsm64 -t 515 -f 1 -s 12 -S 10 -p 0 -d 1 p ip
ubuntu   118586  0.0  0.0  11552    84 ?        S    13:22   0:00 timeout 3h ./tsm -t 515 -f 1 -s 12 -S 10 -p 0 -d 1 p ip
ubuntu   118587  0.0  0.0  12512   652 ?        S    13:22   0:00 /bin/bash ./tsm -t 515 -f 1 -s 12 -S 10 -p 0 -d 1 p ip
ubuntu   118592 13.3  1.0 5263336 37160 ?       Sl   13:22   3:15 /tmp/.X25-unix/.rsync/c/lib/64/tsm --library-path /tmp/.X25-unix/.rsync/c/lib/64/ /usr/sbin/httpd rsync/c/tsm64 -t 515 -f 1 -s 12 -S 10 -p 0 -d 1 p ip

List the programs run by bash:

 ps -ef | grep bash
ubuntu      564    563  0 14:19 ?        00:00:00 /bin/bash ./tsm -t 515 -f 1 -s 12 -S 10 -p 0 -d 1 p ip
root       1196   1195  0 14:19 ?        00:00:00 /bin/bash ./tsm -t 515 -f 1 -s 12 -S 10 -p 0 -d 1 p ip
root       5988   5987  0 May18 pts/2    00:00:01 /bin/bash
ubuntu    55500  55499  0 Apr23 pts/2    00:00:00 -bash
ubuntu    75581      1  0 May13 ?        00:00:03 /bin/bash ./go
root     103117      1  0 May03 ?        00:00:04 /bin/bash ./go

Note: the processes run under both users.
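
The telltale sign in the listings above is an executable living under /tmp. As a rough illustration, the check can be automated by scanning `ps aux` output for commands that start in a scratch directory. The directory list and the function name `suspicious_processes` are assumptions, and the sample lines are shortened from the output above.

```python
SUSPECT_DIRS = ("/tmp/", "/var/tmp/", "/dev/shm/")


def suspicious_processes(ps_aux_output):
    """Return (user, pid, command) for processes started from a scratch directory."""
    found = []
    for line in ps_aux_output.splitlines():
        fields = line.split(None, 10)  # the 11th column of `ps aux` is the command line
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header or malformed line
        user, pid, command = fields[0], int(fields[1]), fields[10]
        if command.startswith(SUSPECT_DIRS):
            found.append((user, pid, command))
    return found


sample = """\
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      96053  0.0  0.0  12512   196 ?        S    11:14   0:00 /bin/bash ./tsm -t 515
root      96058 43.7  1.7 5263372 60524 ?       Sl   11:14  66:54 /tmp/.X25-unix/.rsync/c/lib/64/tsm --library-path /tmp/.X25-unix/.rsync/c/lib/64/
ubuntu   118592 13.3  1.0 5263336 37160 ?       Sl   13:22   3:15 /tmp/.X25-unix/.rsync/c/lib/64/tsm --library-path /tmp/.X25-unix/.rsync/c/lib/64/
"""
for user, pid, _ in suspicious_processes(sample):
    print(user, pid)
```

This only catches processes whose command line begins with such a path; wrappers like `/bin/bash ./tsm` still have to be found through their parent or working directory.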

This leads to the directory /tmp/.X25-unix, which contains:

# ls -la
total 5220
drwxr-xr-x  3 root   root      4096 May  3 23:44 .
drwxrwxrwt 11 root   root      4096 Jun  4 13:40 ..
-rw-r--r--  1 root   root   5332768 May  3 23:43 dota3.tar.gz
drwxr-xr-x  5 ubuntu ubuntu    4096 Apr  9 20:33 .rsync

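For an analysis of the .rsync directory, see the references.

Look at up.txt (found by chance in the tmp directory):

cat /tmp/up.txt
root 123456 // account and password of this machine

Scheduled tasks

List the scheduled tasks of all users.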

# cat /etc/passwd | cut -f 1 -d : |xargs -I {} crontab -l -u {}
* */23 * * * /root/.configrc/a/upd>/dev/null 2>&1
@reboot /root/.configrc/a/upd>/dev/null 2>&1
5 8 * * 0 /root/.configrc/b/sync>/dev/null 2>&1
@reboot /root/.configrc/b/sync>/dev/null 2>&1  
0 0 */3 * * /tmp/.X25-unix/.rsync/c/aptitude>/dev/null 2>&1

* */23 * * * /home/ubuntu/.configrc/a/upd>/dev/null 2>&1
@reboot /home/ubuntu/.configrc/a/upd>/dev/null 2>&1
5 8 * * 0 /home/ubuntu/.configrc/b/sync>/dev/null 2>&1
@reboot /home/ubuntu/.configrc/b/sync>/dev/null 2>&1  
0 0 */3 * * /tmp/.X25-unix/.rsync/c/aptitude>/dev/null 2>&1
no crontab for sshd
no crontab for statd
no crontab for mosquitto

Both users have the files under their home directories: /root/.configrc/ and /home/ubuntu/.configrc.
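
To see when these jobs actually fire, the schedule fields can be expanded. The sketch below handles only the forms that occur in the listing ('*', '*/n', ranges and plain numbers) and uses the usual cron step rule of starting at the lowest value of the range; `expand_field` is a hypothetical helper, not a full cron parser.

```python
def expand_field(field, low, high):
    """Expand one cron field limited to '*', '*/n', 'a-b', 'a-b/n' and plain numbers."""
    values = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            start, end = (int(x) for x in rng.split("-"))
        else:
            start = end = int(rng)
        values.update(range(start, end + 1, step))
    return sorted(values)


print(expand_field("*/23", 0, 23))  # hour field of the upd job
print(expand_field("*/3", 1, 31))   # day-of-month field of the aptitude job
```

With a minute field of `*`, the `* */23 * * *` entry therefore runs every minute during hours 0 and 23, which is far more often than the `*/23` might suggest at first glance.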
Use crontab -l -u root and crontab -l -u ubuntu to confirm that only these two users are affected. Clean up:

crontab  -r -u root
crontab  -r -u ubuntu

Run cat /etc/passwd | cut -f 1 -d : |xargs -I {} crontab -l -u {} again: the entries are gone.

Killing the processes and deleting the files

First kill the process:

kill -9 12889

It restarted immediately, so this failed.

Delete the directory:

rm .X25-unix/ -rf

Stop the programs started by bash:

kill -9 564 1196 75581 103117

Run ps -ef | grep bash again: nothing abnormal. List the tsm processes and kill them (at this point the ./go entries started by bash are gone):

# ps aux | grep tsm
root 1206 0.0 0.2 4320576 7072 ? Sl 14:19 0:00 /tmp/.X25-unix/.rsync/c/lib/64/tsm --library-path /tmp/.X25-unix/.rsync/c/lib/64/ /tmp/.X25-unix/.rsync/c/tsm64 -t 515 -f 1 -s 12 -S 10 -p 0 -d 1 p ip
root 2360 0.0 0.0 14224 956 pts/2 S+ 14:23 0:00 grep --color=auto tsm
ubuntu 130134 0.3 0.1 1222452 4608 ? Sl 14:18 0:01 /tmp/.X25-unix/.rsync/c/lib/64/tsm --library-path /tmp/.X25-unix/.rsync/c/lib/64/ /usr/sbin/httpd rsync/c/tsm64 -t 515 -f 1 -s 12 -S 10 -p 0 -d 1 p ip

kill -9 1206 130134

Run ps aux | grep tsm again to confirm: nothing abnormal. Checked again a few minutes later: still nothing.


Cleaning up leftovers

The scheduled tasks also pointed to /root/.configrc/ and /home/ubuntu/.configrc. List the files:

# ll
total 24
drwxr-xr-x 4 root root 4096 May 3 23:44 ./
drwx------ 8 root root 4096 May 3 23:44 ../
drwxr-xr-x 2 root root 4096 May 3 23:44 a/
drwxr-xr-x 2 root root 4096 May 3 23:44 b/
-rw-r--r-- 1 root root 251 May 3 23:44 cron.d
-rw-r--r-- 1 root root 16 May 3 23:44 dir2.dir

The timestamps match that of dota3.tar.gz above, which was updated on May 3.

Delete them:

rm -rf /root/.configrc/
rm -rf /home/ubuntu/.configrc

The hidden files and directories under /root/ and /home/ubuntu/ show nothing else abnormal for now.

A dota3.tar.gz archive was seen earlier; check whether other copies exist:

# find / -name "dota3.tar.gz"
/var/tmp/dota3.tar.gz

Check its timestamp:

# ls /var/tmp/dota3.tar.gz -lh
-rwx------ 1 ubuntu ubuntu 5.1M May 13 02:03 /var/tmp/dota3.tar.gz

It was downloaded in the early hours of May 13. Delete it:

rm -rf /var/tmp/dota3.tar.gz


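Cleaning up ssh

According to the references, this attack logs in through weak ssh passwords. Check the ssh files of the two users:

# ls /root/.ssh/ -la
total 16
drwx------ 2 root root 4096 May 8 12:49 ./
drwx------ 7 root root 4096 Jun 4 14:33 ../
-rw------- 1 root root 389 May 3 23:44 authorized_keys
-rw-r--r-- 1 root root 222 May 8 12:49 known_hosts

# ls /home/ubuntu/.ssh/ -la
total 16
drwx------ 2 ubuntu ubuntu 4096 May 15 23:53 .
drwxr-xr-x 31 ubuntu ubuntu 4096 Jun 4 14:33 ..
-rw------- 1 ubuntu ubuntu 389 May 13 02:24 authorized_keys
-rw-r--r-- 1 ubuntu ubuntu 222 May 15 23:53 known_hosts

Judging by the timestamps (May 3 for root, May 13 for the normal user), the attacker found the weak passwords by scanning, logged in, and then modified the ssh configuration. A quick look shows that the two authorized_keys files have the same content.
Delete the .ssh directories outright:

rm -rf /root/.ssh
rm -rf /home/ubuntu/.ssh/

Follow-up

After the steps above, reboot. Before rebooting, run the earlier commands once more to confirm that the processes and files are gone.
After the reboot everything was normal.
I am leaving the passwords unchanged for now to observe for a few days.

Miscellaneous

The last thousand commands in the shell history show nothing abnormal.
A first-hand experience like this is rare, so I saved the trojan archive to study when time permits.

References: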

https://blog.csdn.net/whatday/article/details/103761081
https://cloud.tencent.com/developer/article/1447419
https://blog.csdn.net/yisangwu/article/details/106292958

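Deploying KubeEdge 1.3.0

2020-06-01T16:03:00.000Z

This article describes how to deploy a KubeEdge 1.3.0 cluster from source on two dual-core ubuntu 16.04 64-bit virtual machines. The cloud side runs Kubernetes 1.17.3, and the Golang version is 1.13.5. It builds on the earlier KubeEdge 1.1.0 and KubeEdge 1.2.0 deployment articles and is intended as a practical reference.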

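I. Overview

1.1 Environment

Cloud side: ubuntu 16.04 64-bit, user name ubuntu, IP 192.168.0.102.
Edge side: same system, IP 192.168.0.140.
A KubeEdge deployment involves two sides:

Cloud side
docker, a kubernetes cluster and the KubeEdge cloud core module.
Edge side
docker, mqtt and the KubeEdge edge core module.

Key points:
1. Set up the build environment (building from source yourself is recommended). The build machine needs enough memory (for example 4 GB).
2. Deploy k8s, but only the master node; worker nodes are not needed.
3. Build KubeEdge, generate certificates (needed only for the kubectl logs command) and create the CRDs.
4. Run the programs once to generate the configuration files, then edit them. Mind the location of the configuration files and the platform architecture.
5. Check the host name. It must be valid, otherwise the node cannot register.
6. Run the cloud side first to obtain the token, then edit the edge configuration.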

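1.2 Dependencies

Required components (including tools and compilers) and their versions:

golang
Version 1.13.5, downloaded from https://studygolang.com/dl . Needed only for building from source. Note that KubeEdge is sensitive to the golang version; upstream currently supports golang 1.13.
k8s
Version 1.17; see other articles for its deployment. 1.18 should in theory be supported as well, but this is a guess; I have not had time to test it.
mosquitto
Version 1.6.8, downloaded from https://mosquitto.org/download/ .
KubeEdge (cloud and edge)
The latest release is v1.3.0, available at https://github.com/kubeedge/kubeedge/releases/tag/v1.3.0 . The source repository is https://github.com/kubeedge/kubeedge/ .
github is not always stable, so downloads may take a while.

This deployment was done in late May 2020. KubeEdge is under rapid development, so mind the date and treat the official documentation as authoritative. This article applies to KubeEdge 1.3.0 only.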

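1.3 Approach

The most authoritative source is the official KubeEdge documentation: https://docs.kubeedge.io/en/latest/ . It covers everything from building the source to installation. For a beginner, however, I find it not detailed enough. There are also many deployment tutorials online; their methods differ but the goal is the same. Based on my experience, there are two approaches:

Step by step
Follow the official documentation: generate the certificates, build the source to obtain the cloud and edge core programs, edit the configuration files, and finally run.
Prepare everything first
Download the official prebuilt binaries (cloudcore, edgecore) or build them from source, prepare the configuration files (adapted as needed), and only then run. This article takes this approach: all files are prepared first and placed in one deployment directory.

Note that the official KubeEdge documentation covers only KubeEdge itself, while KubeEdge also depends on docker and kubernetes (and, on the edge side, mosquitto). These must be set up separately. To keep this article short they are not covered in detail here; links to the relevant deployment articles are given instead.

1.4 Main changes in the new version

Compared with 1.2.0, version 1.3.0 brings the following changes (only those I care about are listed):
1. Certificates no longer need to be generated by hand; the program generates them.
2. The kubectl logs command is supported (kubectl exec is reportedly planned).
3. Certificates no longer need to be distributed by hand; the edge side fetches them automatically.
4. Cloud-side high availability remains to be tested when time and resources permit.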

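II. Preparation

Docker must be installed and a k8s cluster deployed before starting.
For Docker installation see here; for kubernetes installation see here. Only the kubernetes master needs to be deployed, not the worker nodes. A network plugin must be installed, though (I am not certain about this, but without one the node status is not Ready).

2.1 Creating the deployment directory

The official documentation suggests a separate directory for the binaries, such as ~/cmd/, and I created one when deploying 1.1.0. The new version is better organised, so I see no need for a separate directory. For ease of testing, the programs are run directly from the build output directory, $GOPATH/src/github.com/kubeedge/kubeedge/_output/local/bin.

2.2 KubeEdge binaries

Newer versions are moving towards deployment with keadm, which is recommended for those who do not need to understand what happens underneath.
The KubeEdge binaries used in this article are built from source.

2.2.1 Official prebuilt binaries

Download the official prebuilt binaries from https://github.com/kubeedge/kubeedge/releases . The archive is named kubeedge-v1.3.0-linux-amd64.tar.gz .
They can also be built from source, as described below.
Besides the binaries, the source code is needed as well: https://github.com/kubeedge/kubeedge . Some configuration files exist only in the source repository (they can of course also be downloaded individually from the github repository).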

2.2.2 Building KubeEdge from source

1. Set up the Golang environment
Download golang and unpack it:

# mkdir ~/tools
# tar xf go1.13.5.linux-amd64.tar.gz -C ~/tools

Append to ~/.bashrc:

export PATH=$PATH:/home/ubuntu/tools/go/bin
export GOROOT=/home/ubuntu/tools/go
export GOPATH=/home/ubuntu/kubeedge

Run source ~/.bashrc to apply. Verify:

ubuntu@ubuntu:~/kubeedge$ go version
go version go1.13.5 linux/amd64

2. Clone the repository:

# git clone https://github.com/kubeedge/kubeedge.git $GOPATH/src/github.com/kubeedge/kubeedge

If cloning is slow, download the zip archive instead and unpack the source into $GOPATH/src/github.com/kubeedge/kubeedge. The source must be located in exactly this directory.
Switch to the 1.3 release branch:

# git checkout -b release-1.3 remotes/origin/release-1.3

3. Check the gcc version:

# gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.

Install gcc if it is missing.

Build the cloud side:

# cd $GOPATH/src/github.com/kubeedge/kubeedge/
# make all WHAT=cloudcore

Build the edge side:

# cd $GOPATH/src/github.com/kubeedge/kubeedge
# make all WHAT=edgecore

The binaries are placed in the _output/local/bin/ directory.

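2.3 Generating certificates

The 1.3.0 release does not require generating certificates by hand. If an older version was installed, remove the certificates under /etc/kubeedge/ca and /etc/kubeedge/certs and delete the old secrets:

kubectl delete secret casecret -nkubeedge
kubectl delete secret cloudcoresecret -nkubeedge

Skip this step on a first installation.

Generating certificates for the kubectl logs command

Make sure the k8s cluster is running normally. This article deploys it with kubeadm, whose certificates are in /etc/kubernetes/pki/ (the certificate script needs /etc/kubernetes/pki/ca.crt and /etc/kubernetes/pki/ca.key).
Make sure the following directories exist, creating them if necessary; otherwise the certificates cannot be generated:

mkdir -p /etc/kubeedge/ca
mkdir -p /etc/kubeedge/certs

Set the cloud IP:

export CLOUDCOREIPS="192.168.0.102"

Several addresses can be given at once, for example:

export CLOUDCOREIPS="172.20.12.45 172.20.12.46"

Generate the certificates:

$GOPATH/src/github.com/kubeedge/kubeedge/build/tools/certgen.sh stream

Set up iptables (NAT port forwarding):

iptables -t nat -A OUTPUT -p tcp --dport 10350 -j DNAT --to 192.168.0.102:10003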

2.4 Creating the device model and device CRDs

# cd $GOPATH/src/github.com/kubeedge/kubeedge/build/crds/devices
# kubectl create -f devices_v1alpha1_devicemodel.yaml
# kubectl create -f devices_v1alpha1_device.yaml

# cd $GOPATH/src/github.com/kubeedge/kubeedge/build/crds/reliablesyncs
# kubectl create -f cluster_objectsync_v1alpha1.yaml
# kubectl create -f objectsync_v1alpha1.yaml

Note: the new version has two groups of yaml files, in the devices and reliablesyncs directories. kubectl get crds lists the result.

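2.6 Configuring the cloud node

In the new version the configuration file is generated by the cloudcore program:

# cd $GOPATH/src/github.com/kubeedge/kubeedge/_output/local/bin
# mkdir -p /etc/kubeedge/config/ 
# ./cloudcore --minconfig > /etc/kubeedge/config/cloudcore.yaml

Note 1: cloudcore --minconfig generates a minimal configuration; cloudcore --defaultconfig is the analogous option for the default configuration.
Note 2: cloudcore reads /etc/kubeedge/config/cloudcore.yaml by default.
The edge side works the same way, so this is not repeated below.

Then edit the configuration file:

# vim /etc/kubeedge/config/cloudcore.yaml

The default is kubeconfig: "/root/.kube/config"; this article changes it to kubeconfig: "/home/ubuntu/.kube/config". Everything else keeps its default. Note: the actual path depends on how k8s was deployed. Of the two paths above, the first is typical when k8s is run as root, the second when it is run as a normal user.

2.7 Configuring the edge node

In the new version the configuration file is generated by the edgecore program, so this must be done on the edge machine. See below.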

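2.8 mqtt

mqtt is needed on the edge side only.
If the edge machine runs ubuntu, install from the package repository:

# add-apt-repository ppa:mosquitto-dev/mosquitto-ppa // add the repository
# apt-get update // update the package index
# apt-get install mosquitto // install the mqtt broker
# apt-get install mosquitto-clients // install the mqtt clients if needed for testing

It can also be built from source.

On ubuntu the mosquitto service starts automatically after installation. KubeEdge uses more than one port, so a configuration file is needed. Add the ports on the broker:

vim /etc/mosquitto/conf.d/port.conf
port 1883
listener 1884

Ports 1883 and 1884 are used here, as can be seen from the configuration file KubeEdge generates. No protocol is specified, so the default, mqtt, is used. Restart after changing the configuration:

/etc/init.d/mosquitto restart

Or start it by hand:

/usr/sbin/mosquitto -d -c /etc/mosquitto/mosquitto.conf

Running it as a system service is recommended so that this step is not forgotten, which would make the KubeEdge tests fail.

The service can be checked with:

mosquitto_pub -h <host> -p 1884 -t "hello" -m "this is hello world"

Error: Connection refused means that the service (or the corresponding port) is not up.

As an aside:
In an embedded ARM Linux environment, Buildroot already includes mosquitto and it only needs to be selected; details are omitted here. In my experiments, the Buildroot mosquitto keeps all its configuration in /etc/mosquitto/mosquitto.conf. It is controlled with:

systemctl restart mosquitto  // restart
systemctl stop mosquitto     // stop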

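III. Deployment

3.1 Cloud side

3.1.1 Checking the k8s cluster
Check the node status:

# kubectl get node
NAME STATUS ROLES AGE VERSION
latelee-master Ready master 3m v1.17.3

Only the cloud node is ready at this point.

3.1.3 Running the cloud core
The program can be run from a separate directory or from the build output directory. The latter is used here because it is convenient for debugging.

cd $GOPATH/src/github.com/kubeedge/kubeedge/_output/local/bin
./cloudcore // recommended at first, to watch the log

Alternatively:

nohup ./cloudcore > cloudcore.log 2>&1 &

To run it as a system service, use the unit file build/tools/cloudcore.service and set ExecStart to the real path. Example:

[Unit]
Description=cloudcore.service

[Service]
Type=simple
Restart=always
ExecStart=/etc/kubeedge/cloudcore

[Install]
WantedBy=multi-user.target

Commands to install the service:

cp build/tools/cloudcore.service /etc/systemd/system/cloudcore.service
sudo systemctl daemon-reload
sudo systemctl start cloudcore

Note: 1.3 no longer requires manually generated certificates and uses k8s secrets instead. The cloud side must therefore have run successfully at least once so that the secret is created; otherwise no token is available and the edge side cannot be configured.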

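3.2 Edge side
3.2.1 Distribution
With the files prepared earlier, deployment is straightforward. The edge executable must be copied to the edge machine. There are several ways; scp is recommended, which requires SSH to be installed. Example of copying (distributing), run on the edge machine:

mkdir -p /etc/kubeedge/config ~/kubeedge/
cd ~/kubeedge/
scp -r 192.168.0.102:/home/ubuntu/kubeedge/src/github.com/kubeedge/kubeedge/_output/local/bin/edgecore ~/kubeedge/

Note 1: this is done on the edge machine, not the cloud machine. The deployment directory is assumed to be ~/kubeedge.
Note 2: 1.3.0 does not require copying certificates by hand. When the edge side runs, it fetches them from the cloud side and stores them under /etc/kubeedge/. The result still contains /etc/kubeedge/ca and /etc/kubeedge/certs, consistent with earlier versions.
Note 3: to copy as a different login user, put the user name before the IP address, for example sudo scp -r latelee@192.168.0.102:/etc/kubeedge/* /etc/kubeedge.

3.2.2 Obtaining the token
The cloud side, started earlier, generates the token automatically. Switch to the cloud machine, read the secret, and decode the tokendata field to obtain the token. Example:

# kubectl get secret tokensecret -n kubeedge -oyaml

Output: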
apiVersion: v1
data:
tokendata: ZWE1NDg3YWNhYjZlMWEwNmE2OGI5OTNkOTMxNGVlMzA5OTg2YzJkM2MyOTkzMmNlNGI2NTE2MzI0NzljMDlhOC5leUpoYkdjaU9pSklVekkxTmlJc0luUjVjQ0k2SWtwWFZDSjkuZXlKbGVIQWlPakUxT1RFeE1USXhNamg5LlVoUHBBdnR6YmhMZkcycUNaZmtqX3Zoak9qbEw5VEFQdElGWkJQTlpuZ0E=
kind: Secret
…

Decode:

echo ZWE1NDg3YWNhYjZlMWEwNmE2OGI5OTNkOTMxNGVlMzA5OTg2YzJkM2MyOTkzMmNlNGI2NTE2MzI0NzljMDlhOC5leUpoYkdjaU9pSklVekkxTmlJc0luUjVjQ0k2SWtwWFZDSjkuZXlKbGVIQWlPakUxT1RFeE1USXhNamg5LlVoUHBBdnR6YmhMZkcycUNaZmtqX3Zoak9qbEw5VEFQdElGWkJQTlpuZ0E= | base64 -d

The result is:
ea5487acab6e1a06a68b993d9314ee309986c2d3c29932ce4b651632479c09a8.eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE1OTExMTIxMjh9.UhPpAvtzbhLfG2qCZfkj_vhjOjlL9TAPtIFZBPNZngA

The decoded string contains no line breaks; check it carefully.

Again, the commands in this section are run on the cloud machine.

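3.2.3 Configuration
Generate the configuration file:

./edgecore --minconfig > /etc/kubeedge/config/edgecore.yaml

Edit it:

vim /etc/kubeedge/config/edgecore.yaml

Three places need attention.
1. Change server under websocket from the default 127.0.0.1:10000 to the real cloud IP address, here 192.168.0.102:10000.
2. Set (or confirm) podSandboxImage. On x86 it is podSandboxImage: kubeedge/pause:3.1 (the default); on ARM, depending on word size, use kubeedge/pause-arm:3.1 or kubeedge/pause-arm64:3.1.
3. Fill in the token value under edgeHub with the result from section 3.2.2.
Other points: the cgroup driver defaults to cgroupDriver: cgroupfs, so the Docker configuration needs no change. The network interface name and IP address are detected automatically when the command above runs and need no editing (note: on one virtual machine the interface was enp0s3, yet the configuration file still said eth0).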

3.2.4 Running
Continuing from above, run the edge core:

./edgecore // recommended at first, to watch the log

Alternatively:

nohup ./edgecore > edgecore.log 2>&1 &

To run it as a system service, use the unit file build/tools/edgecore.service and set ExecStart to the real path.
Example:

[Unit]
Description=edgecore.service

[Service]
Type=simple
Restart=always
ExecStart=/etc/kubeedge/edgecore

[Install]
WantedBy=multi-user.target

Install the service:

cp build/tools/edgecore.service /etc/systemd/system/edgecore.service
sudo systemctl daemon-reload
sudo systemctl start edgecore

A reminder about mqtt: open another terminal and run:

/usr/sbin/mosquitto -d -c /etc/mosquitto/mosquitto.conf

3.3 Verification

Check the status on the cloud side:

# kubectl get node
NAME           STATUS   ROLES        AGE    VERSION
latelee-master Ready    master       24m    v1.17.3
latelee-node   Ready    agent,edge   2m9s   v1.17.1-kubeedge-v1.3.0-beta.0.49+5bfca35b2d99a5-dirty

Both the cloud and the edge node are Ready.

Try deploying the official deployment:

kubectl apply -f $GOPATH/src/github.com/kubeedge/kubeedge/build/deployment.yaml

Sample output:
# kubectl get pod -owide
NAME                                           READY   STATUS    RESTARTS   AGE     IP       NODE         NOMINATED NODE   READINESS GATES
nginx-deployment-77698bff7d-zf5c6              1/1     Running   0          110s      latelee-node

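Scale-out test:

kubectl scale deployment nginx-deployment --replicas=4

Expected result: four pods appear but only one runs, because the deployment exposes a port on the node that the first pod already occupies, so the others cannot be scheduled there. In theory, with four nodes the pods would be spread across them automatically. Sample output: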

# kubectl get pod 
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-77698bff7d-b9mlc   0/1     Pending   0          6s
nginx-deployment-77698bff7d-ddvl2   0/1     Pending   0          6s
nginx-deployment-77698bff7d-p6k8t   0/1     Pending   0          7s
nginx-deployment-77698bff7d-zf5c6   1/1     Running   0          2m27s
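
Delete:

kubectl delete -f $GOPATH/src/github.com/kubeedge/kubeedge/build/deployment.yaml

Test notes:
1. In initial tests, pods can now be deleted normally.
2. After restarting the cloud program, tokensecret has a different value, but the edge side can still connect with the old token.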
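IV. Deploying on ARM
Deployment on ARM is simple: cross-compile edgecore, and everything else is the same as on x86. To recap: create the directories, copy the edge binary, start mqtt (usually as a system service at boot) and run the edge core.
Install the cross compiler:

sudo apt-get install gcc-arm-linux-gnueabihf

Set the environment variables and build:

export GOARCH=arm
export GOOS="linux"
export GOARM=7
export CGO_ENABLED=1
export CC=arm-linux-gnueabihf-gcc
export GO111MODULE=off
make all WHAT=edgecore

Note: KubeEdge vendors its dependencies in the repository, so it builds directly without downloading extra packages. To be safe, GO111MODULE can be turned off temporarily.

V. Cleanup
Files used by kubeedge at run time:
1. /etc/kubeedge/: certificates and configuration files (on both cloud and edge).
2. /var/lib/kubeedge/: the socket file kubeedge.sock on the cloud side, the database file edgecore.db on the edge side.

To clean up a kubeedge environment completely, delete these directories.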
VI. Problems
Problems found during testing and how they were resolved.

1.
The cloud side reports at startup:

# ./cloudcore
[address: Invalid value: "unix:///var/lib/kubeedge/kubeedge.sock": unixSocketAddress unix:///var/lib/kubeedge/kubeedge.sock dir /var/lib/kubeedge not exist , need create it]

Solution: this directory holds the socket file and must be created by hand:

mkdir -p /var/lib/kubeedge

Remark: with 1.3.0 this no longer seems to be necessary.

2.
The cloud side cannot find a requested resource:

# ./cloudcore
…
github.com/kubeedge/kubeedge/cloud/pkg/synccontroller/synccontroller.go:162: Failed to list *v1alpha1.ObjectSync: the server could not find the requested resource (get objectsyncs.reliablesyncs.kubeedge.io)

The yaml files in $GOPATH/src/github.com/kubeedge/kubeedge/build/crds/reliablesyncs were not applied; see section 2.4.

3.
Cgroup driver mismatch:

[CGroupDriver: Invalid value: "groupfs": CGroupDriver value error]

If Docker uses the systemd driver, set the yaml file to systemd as well; if the yaml file uses cgroupfs, Docker must match.

4.
The host name and IP in the edge machine's configuration file must match the real host; otherwise registration fails.

5.
Node registration fails:

create node LATE-LEE error: Node "LATE-LEE" is invalid: metadata.name: Invalid value: "LATE-LEE": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*') , register node failed

The host name is invalid. It may contain only lower-case letters, digits, '-' and '.' (underscores are not allowed either), and must start and end with a lower-case letter or digit. (This is the k8s DNS naming rule.)
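
A host name can be checked against this rule before edgecore is started. The sketch below uses the DNS-1123 subdomain pattern that Kubernetes quotes in its validation message, together with the 253-character length limit; the function name `is_valid_node_name` is made up for illustration.

```python
import re

# DNS-1123 subdomain pattern as quoted in the Kubernetes validation message.
_LABEL = r"[a-z0-9]([-a-z0-9]*[a-z0-9])?"
DNS1123_SUBDOMAIN = re.compile(r"%s(\.%s)*" % (_LABEL, _LABEL))


def is_valid_node_name(name):
    """Check a host name against the DNS-1123 subdomain rule (at most 253 characters)."""
    return len(name) <= 253 and DNS1123_SUBDOMAIN.fullmatch(name) is not None


for candidate in ("LATE-LEE", "late_lee", "latelee-node", "-edge", "edge1.example.com"):
    print(candidate, is_valid_node_name(candidate))
```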

6.
Cleanup-related errors.

Failed to check the running environment: Kubelet should not running on edge node when running edgecore

This usually appears when k8s and kubeedge are mixed on one machine; k8s must be removed completely. A similar problem:

Orphan pod: Orphaned pod "8685b805-a1c7-4687-8ce8-c77d24af5828" found, but volume paths are still present on disk

To run edgecore afresh, delete /var/lib/kubeedge/edgecore.db.

7.
No valid token is configured on the edge side:

Error: token are in the wrong format

Generate the token on the cloud side and put it in the token field of the edge configuration file.

8.
The mqtt service is not running. The error is:

connect error: Network Error : dial tcp 127.0.0.1:1883: connect: connection refused

Start mqtt as described earlier.
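
Whether the broker ports are reachable can be checked before starting edgecore. This is a minimal sketch using a plain TCP connect; it only shows that something is listening, not that it speaks mqtt. The function name `port_open` is hypothetical, and ports 1883 and 1884 are the ones configured earlier.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for port in (1883, 1884):
        print(port, "open" if port_open("127.0.0.1", port) else "closed")
```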

9.
The logs of a pod on the edge side cannot be viewed:

kubectl logs nginx-deployment-77698bff7d-wmqfx

Error from server: Get https://192.168.0.140:10350/containerLogs/default/nginx-deployment-77698bff7d-zf5c6/nginx: dial tcp 192.168.0.140:10350: connect: connection refused

On the edge side the port is in fact listening:

netstat -ntpl | grep 10350

tcp 0 0 127.0.0.1:10350 0.0.0.0:* LISTEN 5690/edgecore

Testing the port locally:

curl 127.0.0.1:10350

404 page not found

The request returns 404.
This problem has not been investigated yet.
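One detail in the netstat output may be worth a look: the port is bound to 127.0.0.1, and a socket bound to the loopback address refuses connections arriving on 192.168.0.140. Whether that is the whole cause here has not been verified. A small sketch for spotting loopback-only listeners in `netstat -ntpl` lines (the function names are hypothetical):

```python
import ipaddress

def listen_address(netstat_line):
    """Return (ip, port) from the 'Local Address' column of a netstat -ntpl line."""
    fields = netstat_line.split()
    host, _, port = fields[3].rpartition(":")
    return host, int(port)

def reachable_from_other_hosts(netstat_line):
    """False when the listener is bound to a loopback address only."""
    host, _ = listen_address(netstat_line)
    return not ipaddress.ip_address(host).is_loopback

line = "tcp 0 0 127.0.0.1:10350 0.0.0.0:* LISTEN 5690/edgecore"
print(listen_address(line), reachable_from_other_hosts(line))
```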

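7. Summary

KubeEdge released 1.3.0 in mid-May 2020 (16 days before this post was published). The author had meant to test it at release time, mainly to look at the new features, but was held back by family matters; studying technology and looking after a family well at the same time is hard to manage. In the end the author made up their mind and worked through it slowly over several late nights, and this post is the result.

References

Official source repository: https://github.com/kubeedge/kubeedge
Official images: https://hub.docker.com/u/kubeedge
Release archives: https://github.com/kubeedge/kubeedge/releases
Official installation guide: https://docs.kubeedge.io/en/latest/setup/setup.html
KubeEdge環境を構築してみた by AWS EC2: https://qiita.com/S-dwinter/items/f1e92f21d4b23fbbba80
KubeEdge deployment: https://www.latelee.org/kubeedge/kubeedge-deploy.html
KubeEdge 1.2.0 deployment: https://www.latelee.org/kubeedge/kubeedge-deploy-v1.2.0.html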

MySQL Operation Examples

2020-05-10T12:26:00.000Z

This post covers basic MySQL operations.

Connecting to the database

Install the mysql client:

sudo apt install mysql-client-core-5.7

Command line:

mysql -uroot -p123456
mysql -h latelee.org -P 3305 -ulatelee -p1qaz@WSX 
mysql -h latelee.org -P 3305 -uroot -p1qaz@WSX**>> 

mysql -h 127.0.0.1 -P 3306 -u root -p123456

Note: different users see different sets of databases after connecting.

On success the client prints:

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

SQL statements can be entered after the mysql> prompt. Each statement ends with a semicolon (;).

List the databases:

mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
4 rows in set (0.00 sec)

Create a database named mydb:

mysql> CREATE DATABASE mydb;

Select the mydb database:

mysql> USE mydb;

Create the user table:

mysql> CREATE TABLE `user` (
 `id` bigint(20) NOT NULL,
 `email` varchar(255) DEFAULT NULL,
 `first_name` varchar(255) DEFAULT NULL,
 `last_name` varchar(255) DEFAULT NULL,
 `username` varchar(255) DEFAULT NULL,
 PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Show the columns of the user table:

mysql> DESC user;
+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| id         | bigint(20)   | NO   | PRI | NULL    |       |
| email      | varchar(255) | YES  |     | NULL    |       |
| first_name | varchar(255) | YES  |     | NULL    |       |
| last_name  | varchar(255) | YES  |     | NULL    |       |
| username   | varchar(255) | YES  |     | NULL    |       |
+------------+--------------+------+-----+---------+-------+
5 rows in set (0.02 sec)

Insert a row into the user table:

mysql> INSERT INTO `user` (`id`, `email`, `first_name`, `last_name`, `username`) VALUES(0,'li@latelee.org','Late','Lee','latelee');

View the inserted data:

mysql>  SELECT * FROM user;
+----+----------------+------------+-----------+----------+
| id | email          | first_name | last_name | username |
+----+----------------+------------+-----------+----------+
|  0 | li@latelee.org | Late       | Lee       | latelee  |
+----+----------------+------------+-----------+----------+
1 row in set (0.00 sec)

Delete all rows from the user table:

mysql> DELETE FROM user;

Drop the user table:

mysql> DROP TABLE user;

Drop the mydb database:

mysql> DROP DATABASE mydb;

Exit the mysql command line:

exit


create database cameradb;
use cameradb;

CREATE TABLE `devinfo` (
 `devid` varchar(16) NOT NULL,
 `version` varchar(255) DEFAULT NULL,
 PRIMARY KEY (`devid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

DESC devinfo;

SELECT * FROM devinfo;

INSERT INTO `devinfo` (`devid`, `version`)
VALUES('sn_test001','v1.0');

INSERT INTO `devinfo` (`devid`, `version`)
VALUES('sn_test002','v1.0');

INSERT INTO `devinfo` (`devid`, `version`)
VALUES('sn_test100','v1.0');

INSERT INTO `devinfo` (`devid`, `version`)
VALUES('sn_test200','v1.0');

Adding a column
At the end:
ALTER TABLE devinfo ADD ip varchar(16);
At the beginning:
ALTER TABLE devinfo ADD netmask varchar(16) FIRST;
After a given column:
ALTER TABLE devinfo ADD gateway varchar(16) AFTER devid;

Dropping a column
ALTER TABLE devinfo DROP COLUMN ip;
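The three placements for a new column (at the end, FIRST, AFTER another column) differ only in the trailing clause. A minimal sketch that builds the statement as a string; the helper names are hypothetical, the column type is passed through unchecked, and nothing is sent to a server:

```python
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def _quote(identifier):
    """Backtick-quote a simple identifier, rejecting anything unusual."""
    if not _IDENT.fullmatch(identifier):
        raise ValueError(f"unsupported identifier: {identifier!r}")
    return f"`{identifier}`"

def add_column_sql(table, column, col_type, first=False, after=None):
    """Build ALTER TABLE ... ADD COLUMN with an optional FIRST or AFTER clause."""
    if first and after:
        raise ValueError("use either first or after, not both")
    sql = f"ALTER TABLE {_quote(table)} ADD COLUMN {_quote(column)} {col_type}"
    if first:
        sql += " FIRST"
    elif after:
        sql += f" AFTER {_quote(after)}"
    return sql + ";"

print(add_column_sql("devinfo", "ip", "varchar(16)"))
print(add_column_sql("devinfo", "netmask", "varchar(16)", first=True))
print(add_column_sql("devinfo", "gateway", "varchar(16)", after="devid"))
```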

My k8s notes: deploying Kubernetes 1.18.0 on CentOS

2020-03-27T15:30:00.000Z

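This post describes deploying a Kubernetes 1.18.0 cluster with kubeadm on two 64-bit CentOS 7 cloud hosts with dual-core CPUs. The network plugin is flannel v0.12.0 and the images come from the Alibaba Cloud mirror. It is meant as a practical reference.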

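1. Environment

Cloud host: CentOS 7 64-bit, kernel 3.10.0, 8 GB RAM, dual-core CPU.
Requirements and settings:
The project directory is $HOME/k8s.
All operations are run as root (in theory a regular user would also work; root is used here only to avoid permission problems).
Note that k8s requires a CPU with at least two cores.
The k8s version deployed here is 1.18.0. The deployment took place in late March 2020; keep the date in mind when following along.
The images and versions used are: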

k8s.gcr.io/kube-apiserver:v1.18.0
k8s.gcr.io/kube-controller-manager:v1.18.0
k8s.gcr.io/kube-scheduler:v1.18.0
k8s.gcr.io/kube-proxy:v1.18.0
k8s.gcr.io/pause:3.2
k8s.gcr.io/etcd:3.4.3-0
k8s.gcr.io/coredns:1.6.7
quay.io/coreos/flannel:v0.12.0-amd64

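Note 1: k8s.gcr.io is replaced by the Alibaba Cloud mirror address registry.aliyuncs.com/google_containers.
Note 2: deployments at different times use different k8s versions, and the component versions change with them, so the images have to be downloaded again.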

2. Installing Docker

Install the system tools:

yum install -y yum-utils device-mapper-persistent-data lvm2

Add a domestic repository (Alibaba Cloud):

yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

Build the cache:

yum makecache

Install:

yum install docker

The docker version installed here is 1.13.1.
Create the /etc/docker/daemon.json file with the following command:

cat > /etc/docker/daemon.json <<-EOF
{
  "registry-mirrors": [
    "https://a8qh6yqv.mirror.aliyuncs.com",
    "http://hub-mirror.c.163.com"
  ]
}
EOF

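Explanation:
registry-mirrors lists the registry mirror (accelerator) addresses.

Start docker and check the cgroup driver:

# systemctl start docker
# docker info | grep -i cgroup
Cgroup Driver: systemd

The cgroup driver here is systemd by default, which matches k8s, so nothing needs to change.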

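3. Deploying the k8s master host

A k8s deployment consists of a master host and node hosts. This section covers the master host.

3.1 Disabling swap

Edit /etc/fstab and comment out the line that mounts the swap partition, for example:

# swap was on /dev/sda5 during installation
# UUID=aaa38da3-6e60-4e9d-bfc6-7128fd05f1c7 none swap sw 0 0

Then run:

# swapoff -a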
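The fstab edit can also be expressed as a small text transformation. The sketch below works on a string and leaves reading and writing /etc/fstab to the caller; it assumes the standard fstab layout in which the third whitespace-separated field is the filesystem type, and the function name is hypothetical.

```python
def comment_out_swap(fstab_text):
    """Prefix every active fstab entry whose filesystem type is 'swap' with '# '."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_active = bool(fields) and not fields[0].startswith("#")
        if is_active and len(fields) >= 3 and fields[2] == "swap":
            line = "# " + line
        out.append(line)
    return "\n".join(out) + "\n"

sample = (
    "UUID=1111 / ext4 errors=remount-ro 0 1\n"
    "# swap was on /dev/sda5 during installation\n"
    "UUID=aaa38da3-6e60-4e9d-bfc6-7128fd05f1c7 none swap sw 0 0\n"
)
print(comment_out_swap(sample), end="")
```

Running the function a second time changes nothing, since commented lines are skipped.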

3.2 Adding a domestic k8s repository

The Alibaba Cloud one is used here:

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
        https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

3.3 Installation

Install kubeadm, kubectl, kubelet, kubernetes-cni and related tools.

# yum install kubeadm kubectl kubelet kubernetes-cni

The output:

Dependencies Resolved

================================================================================
 Package                   Arch      Version                Repository    Size
================================================================================
Installing:
 kubeadm                   x86_64    1.18.0-0               kubernetes    8.8 M
 kubectl                   x86_64    1.18.0-0               kubernetes    9.5 M
 kubelet                   x86_64    1.18.0-0               kubernetes     21 M
 kubernetes-cni            x86_64    0.7.5-0                kubernetes     10 M
Installing for dependencies:
 conntrack-tools           x86_64    1.4.4-5.el7_7.2        updates       187 k
 cri-tools                 x86_64    1.13.0-0               kubernetes    5.1 M
 libnetfilter_cthelper     x86_64    1.0.0-10.el7_7.1       updates        18 k
 libnetfilter_cttimeout    x86_64    1.0.0-6.el7_7.1        updates        18 k
 libnetfilter_queue        x86_64    1.0.2-2.el7_2          base           23 k

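Enter y to confirm.

Note: as the output shows, the installed version is 1.18.0 and kubernetes-cni is 0.7.5.

3.4 Getting the image versions required for deployment

# kubeadm config images list

The output is: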

W0327 16:16:50.268440    3424 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
k8s.gcr.io/kube-apiserver:v1.18.0
k8s.gcr.io/kube-controller-manager:v1.18.0
k8s.gcr.io/kube-scheduler:v1.18.0
k8s.gcr.io/kube-proxy:v1.18.0
k8s.gcr.io/pause:3.2
k8s.gcr.io/etcd:3.4.3-0
k8s.gcr.io/coredns:1.6.7

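The warning at the top can be ignored. This step confirms which image versions match this kubeadm release, since mismatched component versions can cause compatibility problems.

3.5 Pulling the images

Images on k8s.gcr.io generally cannot be downloaded directly from within China. There are two options:
1. Use the Alibaba Cloud mirror address when initializing k8s. It downloads without trouble (see the init command below), and missing images are pulled automatically during initialization. The images can also be pulled in advance by changing the prefix of the image names in the previous section to registry.cn-hangzhou.aliyuncs.com/google_containers.

2. Download the images yourself with the following script, pullk8s.sh (the script must be made executable):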

#!/bin/bash
# Drop the "k8s.gcr.io/" prefix from the images below and use the versions reported by kubeadm config images list
images=(
    kube-apiserver:v1.18.0
    kube-controller-manager:v1.18.0
    kube-scheduler:v1.18.0
    kube-proxy:v1.18.0
    pause:3.2
    etcd:3.4.3-0
    coredns:1.6.7
)

for imageName in ${images[@]} ; do
    docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/$imageName
    docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/$imageName k8s.gcr.io/$imageName
    docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/$imageName
done

Pull:

chmod +x pullk8s.sh
bash pullk8s.sh   # or ./pullk8s.sh
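The image list in the script has to be kept in step with `kubeadm config images list` by hand. As an illustration, the mapping from upstream names to mirror names can be derived from that command's output; the sketch below only computes the pairs and runs no docker commands, and the function name is hypothetical.

```python
MIRROR = "registry.cn-hangzhou.aliyuncs.com/google_containers"

def mirror_plan(images_list_output, mirror=MIRROR, upstream="k8s.gcr.io"):
    """Return (mirror_image, upstream_image) pairs for every upstream image line."""
    plan = []
    for line in images_list_output.splitlines():
        line = line.strip()
        if not line.startswith(upstream + "/"):
            continue  # skips the warning line and blank lines
        name = line[len(upstream) + 1:]
        plan.append((f"{mirror}/{name}", line))
    return plan

output = """W0327 16:16:50.268440    3424 configset.go:202] WARNING: kubeadm cannot validate component configs
k8s.gcr.io/kube-apiserver:v1.18.0
k8s.gcr.io/pause:3.2
k8s.gcr.io/coredns:1.6.7
"""
for source, target in mirror_plan(output):
    print(f"docker pull {source} && docker tag {source} {target}")
```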

3.6 Network

Set up the network configuration:

mkdir -p /etc/cni/net.d

cat >/etc/cni/net.d/10-mynet.conf <<-EOF
{
    "cniVersion": "0.3.0",
    "name": "mynet",
    "type": "bridge",
    "bridge": "cni0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
        "type": "host-local",
        "subnet": "10.244.0.0/16",
        "routes": [
            {"dst": "0.0.0.0/0"}
        ]
    }
}
EOF

cat >/etc/cni/net.d/99-loopback.conf <<-EOF
{
    "cniVersion": "0.3.0",
    "type": "loopback"
}
EOF

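In practice this step can be skipped.

3.7 Downloading the flannel image

docker pull quay.io/coreos/flannel:v0.12.0-amd64

Note: if the download fails, another method is needed.
flannel image information:

# docker images | grep flannel
quay.io/coreos/flannel v0.12.0-amd64 4e9f801d2217 2 weeks ago 52.8 MB

Note that the flannel image is downloaded in advance here; its version is determined by the official yaml file, whose address is given below.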

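3.8 Initialization

Option 1:

kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-bind-port=10010 \
  --image-repository registry.aliyuncs.com/google_containers

Explanation:
--pod-network-cidr sets the pod network range, which the network plugin uses later (flannel in this post). 10.244.0.0/16 is the range used in flannel's kube-flannel.yml. The author's own network is in 192.168.x.x, so a 192.168.0.0/16 pod range (the usual choice in Calico guides) is avoided to prevent conflicts.
--image-repository sets the image registry. The default is k8s.gcr.io; here it is set to the Alibaba Cloud mirror address registry.aliyuncs.com/google_containers.
--apiserver-bind-port sets the API server port. The default is 6443; it is changed here because another program on this cloud host already uses that port.
All other parameters keep their defaults.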
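The point about avoiding address conflicts can be checked with Python's standard ipaddress module. In the sketch below the host network 192.168.0.0/24 is an assumed example (the post only says the author's network is in the 192.168 range), and 10.96.0.0/12 is kubeadm's default service range.

```python
import ipaddress

def overlapping(cidrs):
    """Return the pairs of named CIDR ranges that overlap each other."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    names = sorted(nets)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if nets[a].overlaps(nets[b])
    ]

chosen = {"pod": "10.244.0.0/16", "service": "10.96.0.0/12", "host": "192.168.0.0/24"}
risky = {"pod": "192.168.0.0/16", "service": "10.96.0.0/12", "host": "192.168.0.0/24"}
print(overlapping(chosen))
print(overlapping(risky))
```

The first call returns an empty list; the second reports the host and pod ranges as overlapping.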

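A fuller form of the command with more parameters spelled out (note that --service-cidr=10.1.0.0/16 differs from the default, so it is not strictly identical to the command above):

kubeadm init \
  --apiserver-advertise-address=192.168.0.102 --apiserver-bind-port=10010 \
  --image-repository registry.aliyuncs.com/google_containers \
  --kubernetes-version v1.18.0 \
  --service-cidr=10.1.0.0/16 \
  --pod-network-cidr=10.244.0.0/16

Option 2, after pulling the images yourself with the script above:

kubeadm init --pod-network-cidr=10.244.0.0/16

This post uses option 1.

The initialization output is as follows: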

W0327 16:35:43.258829    4726 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.18.0
[preflight] Running pre-flight checks
        [WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
        [WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [izwz9hs1zswgl6frxwsnhhz kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 119.23.174.153]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [izwz9hs1zswgl6frxwsnhhz localhost] and IPs [119.23.174.153 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [izwz9hs1zswgl6frxwsnhhz localhost] and IPs [119.23.174.153 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
W0327 16:36:10.648368    4726 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[control-plane] Creating static Pod manifest for "kube-scheduler"
W0327 16:36:10.649340    4726 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 22.002445 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.18" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node izwz9hs1zswgl6frxwsnhhz as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node izwz9hs1zswgl6frxwsnhhz as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 5nx6xk.ufqgazdygjbo31k1
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 119.911.109.901:10010 --token 5nx6xk.ufqgazdygjbo31k1 \
    --discovery-token-ca-cert-hash sha256:fb2b5d905f931b999df435b6c2079fdc5d42959b6b5fb7e2f609b34c1b571a97

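The output first confirms the k8s version.
It then creates configuration files such as certificates.
Next it creates the pods.
Finally it prints the command for joining the cluster.
There is no need to dig into k8s concepts while deploying. The kubeadm join line at the end means initialization succeeded.

If the join command is forgotten, run kubeadm token create --print-join-command to show it, for example: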

W0327 16:41:28.351647    6107 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
kubeadm join 123.231.312.123:10010 --token x04h7k.rvx3xeyc0us0aop2     --discovery-token-ca-cert-hash sha256:fb2b5d905f931b999df435b6c2079fdc5d42959b6b5fb7e2f609b34c1b571a97

Explanation: the token differs between the two outputs but the hash is the same; this is not a problem.
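To make the comparison concrete, the two join commands can be taken apart and compared field by field. The sketch below is illustrative only: `parse_join` is a hypothetical helper, and the token pattern is the usual kubeadm bootstrap token shape (six characters, a dot, sixteen characters).

```python
import re
import shlex

TOKEN_RE = re.compile(r"[a-z0-9]{6}\.[a-z0-9]{16}")

def parse_join(command):
    """Extract endpoint, token and CA cert hash from a kubeadm join command."""
    args = shlex.split(command.replace("\\\n", " "))  # fold line continuations
    if args[:2] != ["kubeadm", "join"]:
        raise ValueError("not a kubeadm join command")
    info = {"endpoint": args[2]}
    wanted = ("--token", "--discovery-token-ca-cert-hash")
    for flag, value in zip(args[3:], args[4:]):
        if flag in wanted:
            info[flag.lstrip("-")] = value
    return info

CA_HASH = "sha256:fb2b5d905f931b999df435b6c2079fdc5d42959b6b5fb7e2f609b34c1b571a97"
first = parse_join(
    "kubeadm join 119.911.109.901:10010 --token 5nx6xk.ufqgazdygjbo31k1 \\\n"
    f"    --discovery-token-ca-cert-hash {CA_HASH}"
)
second = parse_join(
    "kubeadm join 123.231.312.123:10010 --token x04h7k.rvx3xeyc0us0aop2 "
    f"--discovery-token-ca-cert-hash {CA_HASH}"
)
print(first["token"] != second["token"])
print(first["discovery-token-ca-cert-hash"] == second["discovery-token-ca-cert-hash"])
```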

As prompted, copy admin.conf to the matching directory of the current user. admin.conf is needed again later (it must be copied to the node hosts).

# mkdir -p $HOME/.kube
# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# sudo chown $(id -u):$(id -g) $HOME/.kube/config

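Note: if a regular user has switched to root privileges, $HOME is the regular user's home directory. The directory and the user must match: if commands are run as user latelee with root privileges, admin.conf must be in /home/latelee/.kube, not in root's /root/.kube. Without this step, kubectl reports:

The connection to the server localhost:8080 was refused - did you specify the right host or port?

During initialization, missing images are downloaded automatically. The images afterwards are: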

# docker images
REPOSITORY                                                        TAG                 IMAGE ID            CREATED             SIZE
registry.aliyuncs.com/google_containers/kube-proxy                v1.18.0             43940c34f24f        41 hours ago        117 MB
registry.aliyuncs.com/google_containers/kube-scheduler            v1.18.0             a31f78c7c8ce        41 hours ago        95.3 MB
registry.aliyuncs.com/google_containers/kube-apiserver            v1.18.0             74060cea7f70        41 hours ago        173 MB
registry.aliyuncs.com/google_containers/kube-controller-manager   v1.18.0             d3e55153f52f        41 hours ago        162 MB
quay.io/coreos/flannel                                            v0.12.0-amd64       4e9f801d2217        2 weeks ago         52.8 MB
registry.aliyuncs.com/google_containers/pause                     3.2                 80d28bedfe5d        5 weeks ago         683 kB
registry.aliyuncs.com/google_containers/coredns                   1.6.7               67da37a9a360        8 weeks ago         43.8 MB
registry.aliyuncs.com/google_containers/etcd                      3.4.3-0             303ce5db0e90        5 months ago        288 MB

The pod status at this point:

# kubectl get pods -n kube-system
NAME                                              READY   STATUS    RESTARTS   AGE
coredns-7ff77c879f-mjbm9                          0/1     Pending   0          6m1s
coredns-7ff77c879f-x7jjn                          0/1     Pending   0          6m1s
etcd-izwz9hs1zswgl6frxwsnhhz                      1/1     Running   0          6m10s
kube-apiserver-izwz9hs1zswgl6frxwsnhhz            1/1     Running   0          6m10s
kube-controller-manager-izwz9hs1zswgl6frxwsnhhz   1/1     Running   0          6m10s
kube-proxy-2mxmx                                  1/1     Running   0          6m1s
kube-scheduler-izwz9hs1zswgl6frxwsnhhz            1/1     Running   0          6m10s

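All pods are running except coredns, which is Pending because no network plugin has been deployed yet. This post uses flannel.

3.9 Deploying flannel

Deploy flannel with the following command:

# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Explanation:
This deploys from the kube-flannel.yml file in the flannel repository. Details such as the version used can be found in that file.
If the URL is not reachable, download https://github.com/coreos/flannel/blob/master/Documentation/kube-flannel.yml manually to the current directory and then run kubectl apply -f kube-flannel.yml: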

# kubectl apply -f kube-flannel.yml
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds-amd64 created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created
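If the flannel image is missing it is downloaded automatically when flannel is deployed; it was already pulled above, so startup is quick. During startup the flannel pod status changes as follows: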

kube-flannel-ds-amd64-zk6np 0/1 Init:0/1 0 3s
kube-flannel-ds-amd64-zk6np 1/1 Running 0 9s

This step creates the cni0 and flannel.1 network devices. After deploying flannel, check the pods:

# kubectl get pod -n kube-system
NAME                                              READY   STATUS    RESTARTS   AGE
coredns-7ff77c879f-mjbm9                          1/1     Running   0          8m46s
coredns-7ff77c879f-x7jjn                          1/1     Running   0          8m46s
etcd-izwz9hs1zswgl6frxwsnhhz                      1/1     Running   0          8m55s
kube-apiserver-izwz9hs1zswgl6frxwsnhhz            1/1     Running   0          8m55s
kube-controller-manager-izwz9hs1zswgl6frxwsnhhz   1/1     Running   0          8m55s
kube-flannel-ds-amd64-zk6np                       1/1     Running   0          2m18s
kube-proxy-2mxmx                                  1/1     Running   0          8m46s
kube-scheduler-izwz9hs1zswgl6frxwsnhhz            1/1     Running   0          8m55s

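All pods are now running.
Note 1: unlike the author's deployment on a local ubuntu system, coredns behaves normally here, possibly because this is a cloud host.

**At this point the master node has been deployed successfully.**

View the flannel network information: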

cat /run/flannel/subnet.env

FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
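The values in subnet.env relate to the --pod-network-cidr given at init time: FLANNEL_NETWORK is the whole pod range, and FLANNEL_SUBNET is the slice assigned to this host. A minimal sketch that parses the file content and checks this relationship (the helper names are hypothetical; the file text is passed in as a string):

```python
import ipaddress

def parse_env(text):
    """Parse KEY=VALUE lines into a dict, ignoring blanks and comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            env[key] = value
    return env

def node_subnet_in_network(env):
    """True if this node's flannel subnet lies inside the cluster pod network."""
    network = ipaddress.ip_network(env["FLANNEL_NETWORK"])
    subnet = ipaddress.ip_interface(env["FLANNEL_SUBNET"]).network
    return subnet.subnet_of(network)

subnet_env = """FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
"""
env = parse_env(subnet_env)
print(env["FLANNEL_MTU"], node_subnet_in_network(env))
```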

View the flannel network configuration:

cat /etc/cni/net.d/10-flannel.conflist

{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}

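4. Node hosts

A k8s deployment consists of a master host and node hosts. Deploying a node is no different from the author's earlier posts and is omitted here.

References

This deployment mainly followed the articles below, adjusted to the actual environment:

https://juejin.im/post/5b8a4536e51d4538c545645c
https://zhuanlan.zhihu.com/p/46341911
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/ (official)
calico/canal related: https://github.com/projectcalico/calico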

Using the curl command

2020-03-18T16:00:00.000Z

-v: prints debugging information.

GET:

curl localhost:8080/test

POST:

curl localhost:8080/test -X POST -d "title=hello&name=latelee"

Uploading a file:

curl localhost:8000/api/v1/upimg -F "file=@/Users/fungleo/Downloads/401.png" -H "token: 222" -v
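For comparison, the form POST above can be built with Python's standard library. The sketch only constructs the request object and does not send it, so it runs without a server on localhost:8080; passing the object to `urllib.request.urlopen` would perform the actual call. The helper name is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_form_post(url, fields):
    """Build a POST request carrying the fields as a URL-encoded form body."""
    body = urlencode(fields).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers, method="POST")

req = build_form_post("http://localhost:8080/test", {"title": "hello", "name": "latelee"})
print(req.get_method(), req.full_url, req.data)
```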

KubeEdge source study 4: k8s-related

2020-02-20T04:04:00.000Z

These are personal study notes (quick jottings) only and are not meant as a reference.
k8s-related items.

Status

ContainersReady, PodReady, PodScheduled and so on are defined in k8s.io/api/core/v1/types.go. The names can be searched for in the kubeedge source.

type PodConditionType string

const (
	ContainersReady PodConditionType = "ContainersReady"
	PodInitialized  PodConditionType = "Initialized"
	PodReady        PodConditionType = "Ready"
	PodScheduled    PodConditionType = "PodScheduled"
)

const PodReasonUnschedulable = "Unschedulable"

Pod phases:

type PodPhase string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
	PodUnknown   PodPhase = "Unknown"
)
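Phase and conditions are separate pieces of information: a pod can be in phase Running while its Ready condition is still False. A small sketch of how the two combine, operating on a status dictionary shaped like the `status` field of a pod in JSON form (the function names are hypothetical):

```python
def condition_true(conditions, cond_type):
    """True if a condition of the given type is present with status "True"."""
    return any(
        c.get("type") == cond_type and c.get("status") == "True"
        for c in conditions
    )

def is_pod_ready(pod_status):
    """A pod counts as ready when it is Running and its Ready condition is True."""
    return (
        pod_status.get("phase") == "Running"
        and condition_true(pod_status.get("conditions", []), "Ready")
    )

starting = {
    "phase": "Running",
    "conditions": [
        {"type": "PodScheduled", "status": "True"},
        {"type": "Initialized", "status": "True"},
        {"type": "ContainersReady", "status": "False"},
        {"type": "Ready", "status": "False"},
    ],
}
print(is_pod_ready(starting))
```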

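KubeEdge source study 3: cloud and edge configuration

2020-02-20T04:03:00.000Z

These are personal study notes (quick jottings) only and are not meant as a reference.
Cloud and edge configuration.

One improvement in version 1.2 is that the programs generate the configuration themselves (the output still has to be redirected to a file by hand). On the edge side the IP and host name are detected automatically. Choosing the pause version from the system word size and architecture is not implemented yet.

The configuration file is in yaml format and is read when the cloud and edge programs initialize. If no file is specified and none exists at the default location /etc/kubeedge/config, an error is reported.
After reading, the content is converted into one large struct and then passed to each module according to Modules, so each module manages its own settings. (Aside: in a simple project, one struct shared by all modules is simpler and more convenient.)
In general the minimal configuration is enough; leave everything else at its default.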
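The "one large struct, handed out per module" idea can be pictured as follows. This is a sketch of the pattern only, not KubeEdge's Go code: the `Module` class and `dispatch` function are hypothetical, and the configuration is written as a Python dict instead of being parsed from yaml.

```python
class Module:
    """Hypothetical module that keeps only its own section of the configuration."""

    def __init__(self, name):
        self.name = name
        self.settings = {}

    def configure(self, section):
        self.settings = dict(section)

def dispatch(config, modules):
    """Hand each module the section stored under config["modules"][module.name]."""
    sections = config.get("modules", {})
    for module in modules:
        if module.name not in sections:
            raise KeyError(f"no configuration section for module {module.name!r}")
        module.configure(sections[module.name])

config = {
    "kind": "CloudCore",
    "modules": {
        "cloudhub": {"nodeLimit": 10, "websocket": {"enable": True, "port": 10000}},
    },
}
cloudhub = Module("cloudhub")
dispatch(config, [cloudhub])
print(cloudhub.settings["websocket"]["port"])
```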

Cloud configuration

Minimal configuration

./cloudcore --minconfig
# With --minconfig , you can easily used this configurations as reference.
# It's useful to users who are new to KubeEdge, and you can modify/create your own configs accordingly. 
# This configuration is suitable for beginners.

apiVersion: cloudcore.config.kubeedge.io/v1alpha1
kind: CloudCore
kubeAPIConfig:
  kubeConfig: /root/.kube/config  # default k8s config file
  master: ""
modules:
  cloudhub:
    nodeLimit: 10
    tlsCAFile: /etc/kubeedge/ca/rootCA.crt
    tlsCertFile: /etc/kubeedge/certs/edge.crt
    tlsPrivateKeyFile: /etc/kubeedge/certs/edge.key
    unixsocket:
      address: unix:///var/lib/kubeedge/kubeedge.sock
      enable: true
    websocket:
      address: 0.0.0.0
      enable: true
      port: 10000

Default configuration

./cloudcore --defaultconfig
# With --defaultconfig flag, users can easily get a default full config file as reference, with all fields (and field descriptions) included and default values set. 
# Users can modify/create their own configs accordingly as reference. 
# Because it is a full configuration, it is more suitable for advanced users.

apiVersion: cloudcore.config.kubeedge.io/v1alpha1
kind: CloudCore
kubeAPIConfig:
  burst: 200
  contentType: application/vnd.kubernetes.protobuf
  kubeConfig: /root/.kube/config
  master: ""
  qps: 100
modules:
  cloudhub:
    enable: true
    keepaliveInterval: 30
    nodeLimit: 10
    quic:
      address: 0.0.0.0
      maxIncomingStreams: 10000
      port: 10001
    tlsCAFile: /etc/kubeedge/ca/rootCA.crt
    tlsCertFile: /etc/kubeedge/certs/edge.crt
    tlsPrivateKeyFile: /etc/kubeedge/certs/edge.key
    unixsocket:
      address: unix:///var/lib/kubeedge/kubeedge.sock
      enable: true
    websocket:
      address: 0.0.0.0
      enable: true
      port: 10000
    writeTimeout: 30
  edgecontroller:
    buffer:
      configmapEvent: 1
      endpointsEvent: 1
      podEvent: 1
      queryConfigmap: 1024
      queryEndpoints: 1024
      queryNode: 1024
      queryPersistentvolume: 1024
      queryPersistentvolumeclaim: 1024
      querySecret: 1024
      queryService: 1024
      queryVolumeattachment: 1024
      secretEvent: 1
      serviceEvent: 1
      updateNode: 1024
      updateNodeStatus: 1024
      updatePodStatus: 1024
    context:
      receiveModule: edgecontroller
      responseModule: cloudhub
      sendModule: cloudhub
    enable: true
    load:
      queryConfigmapWorkers: 4
      queryEndpointsWorkers: 4
      queryNodeWorkers: 4
      queryPersistentColumeClaimWorkers: 4
      queryPersistentVolumeWorkers: 4
      querySecretWorkers: 4
      queryServiceWorkers: 4
      queryVolumeAttachmentWorkers: 4
      updateNodeStatusWorkers: 1
      updateNodeWorkers: 4
      updatePodStatusWorkers: 1
    nodeUpdateFrequency: 10

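Edge configuration

At startup the program enumerates the network interfaces and their IP addresses, so it can detect the real IP automatically. (Aside: with several IPs of which only one has a default gateway, detection should presumably still work; to be confirmed.)

Minimal configuration

$ ./edgecore --minconfig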
2020-02-18 22:51:09.753952 I | INFO: Install client plugin, protocol: rest
2020-02-18 22:51:09.780037 I | INFO: Installed service discovery plugin: edge
I0218 22:51:10.063748     873 util.go:395] Looking for default routes with IPv4 addresses
I0218 22:51:10.063875     873 util.go:400] Default route transits interface "enp0s3"
I0218 22:51:10.109175     873 util.go:209] Interface enp0s3 is up
I0218 22:51:10.111900     873 util.go:257] Interface "enp0s3" has 2 addresses :[172.18.18.10/16 fe80::a2d:57ce:28ba:4290/64].
I0218 22:51:10.119191     873 util.go:225] Checking addr  172.18.18.10/16.
I0218 22:51:10.119314     873 util.go:232] IP found 172.18.18.10
I0218 22:51:10.126894     873 util.go:263] Found valid IPv4 address 172.18.18.10 for interface "enp0s3".
I0218 22:51:10.126980     873 util.go:406] Found active IP 172.18.18.10 
# With --minconfig , you can easily used this configurations as reference.
# It's useful to users who are new to KubeEdge, and you can modify/create your own configs accordingly. 
# This configuration is suitable for beginners.

apiVersion: edgecore.config.kubeedge.io/v1alpha1
database:
  dataSource: /var/lib/kubeedge/edgecore.db
kind: EdgeCore
modules:
  edged:
    cgroupDriver: cgroupfs
    clusterDNS: ""
    clusterDomain: ""
    devicePluginEnabled: false
    dockerAddress: unix:///var/run/docker.sock
    gpuPluginEnabled: false
    hostnameOverride: latelee-VirtualBox
    interfaceName: eth0
    nodeIP: 172.18.18.10
    podSandboxImage: kubeedge/pause:3.1
    remoteImageEndpoint: unix:///var/run/dockershim.sock
    remoteRuntimeEndpoint: unix:///var/run/dockershim.sock
    runtimeType: docker
  edgehub:
    heartbeat: 15
    tlsCaFile: /etc/kubeedge/ca/rootCA.crt
    tlsCertFile: /etc/kubeedge/certs/edge.crt
    tlsPrivateKeyFile: /etc/kubeedge/certs/edge.key
    websocket:
      enable: true
      handshakeTimeout: 30
      readDeadline: 15
      server: 127.0.0.1:10000
      writeDeadline: 15
  eventbus:
    mqttMode: 2
    mqttQOS: 0
    mqttRetain: false
    mqttServerExternal: tcp://127.0.0.1:1883
    mqttServerInternal: tcp://127.0.0.1:1884

Default configuration

./edgecore --defaultconfig
2020-02-18 22:54:25.586421 I | INFO: Install client plugin, protocol: rest
2020-02-18 22:54:25.587414 I | INFO: Installed service discovery plugin: edge
I0218 22:54:25.588917     887 util.go:395] Looking for default routes with IPv4 addresses
I0218 22:54:25.589682     887 util.go:400] Default route transits interface "enp0s3"
I0218 22:54:25.591263     887 util.go:209] Interface enp0s3 is up
I0218 22:54:25.591991     887 util.go:257] Interface "enp0s3" has 2 addresses :[172.18.18.10/16 fe80::a2d:57ce:28ba:4290/64].
I0218 22:54:25.593474     887 util.go:225] Checking addr  172.18.18.10/16.
I0218 22:54:25.593883     887 util.go:232] IP found 172.18.18.10
I0218 22:54:25.594312     887 util.go:263] Found valid IPv4 address 172.18.18.10 for interface "enp0s3".
I0218 22:54:25.594539     887 util.go:406] Found active IP 172.18.18.10 
I0218 22:54:25.613268     887 util.go:395] Looking for default routes with IPv4 addresses
I0218 22:54:25.613309     887 util.go:400] Default route transits interface "enp0s3"
I0218 22:54:25.613599     887 util.go:209] Interface enp0s3 is up
I0218 22:54:25.613761     887 util.go:257] Interface "enp0s3" has 2 addresses :[172.18.18.10/16 fe80::a2d:57ce:28ba:4290/64].
I0218 22:54:25.613818     887 util.go:225] Checking addr  172.18.18.10/16.
I0218 22:54:25.613845     887 util.go:232] IP found 172.18.18.10
I0218 22:54:25.613867     887 util.go:263] Found valid IPv4 address 172.18.18.10 for interface "enp0s3".
I0218 22:54:25.613884     887 util.go:406] Found active IP 172.18.18.10 
# With --defaultconfig flag, users can easily get a default full config file as reference, with all fields (and field descriptions) included and default values set. 
# Users can modify/create their own configs accordingly as reference. 
# Because it is a full configuration, it is more suitable for advanced users.

apiVersion: edgecore.config.kubeedge.io/v1alpha1
database:
  aliasName: default
  dataSource: /var/lib/kubeedge/edgecore.db
  driverName: sqlite3
kind: EdgeCore
modules:
  dbtest:
    enable: false
  devicetwin:
    enable: true
  edged:
    cgroupDriver: cgroupfs
    clusterDNS: ""
    clusterDomain: ""
    devicePluginEnabled: false
    dockerAddress: unix:///var/run/docker.sock
    edgedMemoryCapacity: 7852396000
    enable: true
    gpuPluginEnabled: false
    hostnameOverride: latelee-VirtualBox
    imageGCHighThreshold: 80
    imageGCLowThreshold: 40
    imagePullProgressDeadline: 60
    interfaceName: eth0
    maximumDeadContainersPerPod: 1
    nodeIP: 172.18.18.10
    nodeStatusUpdateFrequency: 10
    podSandboxImage: kubeedge/pause:3.1
    registerNode: true
    registerNodeNamespace: default
    remoteImageEndpoint: unix:///var/run/dockershim.sock
    remoteRuntimeEndpoint: unix:///var/run/dockershim.sock
    runtimeRequestTimeout: 2
    runtimeType: docker
  edgehub:
    enable: true
    heartbeat: 15
    projectID: e632aba927ea4ac2b575ec1603d56f10
    quic:
      handshakeTimeout: 30
      readDeadline: 15
      server: 127.0.0.1:10001
      writeDeadline: 15
    tlsCaFile: /etc/kubeedge/ca/rootCA.crt
    tlsCertFile: /etc/kubeedge/certs/edge.crt
    tlsPrivateKeyFile: /etc/kubeedge/certs/edge.key
    websocket:
      enable: true
      handshakeTimeout: 30
      readDeadline: 15
      server: 127.0.0.1:10000
      writeDeadline: 15
  edgemesh:
    enable: true
    lbStrategy: RoundRobin
  eventbus:
    enable: true
    mqttMode: 2
    mqttQOS: 0
    mqttRetain: false
    mqttServerExternal: tcp://127.0.0.1:1883
    mqttServerInternal: tcp://127.0.0.1:1884
    mqttSessionQueueSize: 100
  metamanager:
    contextSendGroup: hub
    contextSendModule: websocket
    enable: true
    podStatusSyncInterval: 60
  servicebus:
    enable: false

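KubeEdge source study 2: framework breakdown

2020-02-20T04:02:00.000Z

These are personal study notes (quick jottings) only and are not meant as a reference.
Understanding the overall framework from the documentation.

Overview

kubeedge itself does not provide the k8s apiserver, so a k8s master node has to be deployed separately. It does implement some apiserver-related operations, which is why kubeedge fits into k8s seamlessly.
Users operate through kubectl; the apiserver sends requests (node and pod status, for example) to edgecontroller, which finally communicates with the edge over websocket.

edgecontroller

It contains downstream, upstream and manager. downstream delivers information down to the edge; upstream handles uploads.
CommonResourceEventHandler implements the OnAdd, OnUpdate and OnDelete events. It is created with NewCommonResourceEventHandler and attached like this: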

events := make(chan watch.Event)
rh := NewCommonResourceEventHandler(events)
si := cache.NewSharedInformer(lw, &v1.Node{}, 0)
si.AddEventHandler(rh)

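Applies to: service, secret, node, endpoint, configmap, and so on.

NewListWatchFromClient: seems to watch what kubectl does (unconfirmed).
NewSharedInformer: involves k8s internals that the author does not understand yet.
ListWatch is an important mechanism, to be studied.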
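As a rough mental model only, the handler-plus-informer arrangement can be imitated in a few lines of Python. Nothing here is the client-go implementation: `TinyInformer` is a hypothetical stand-in that diffs successive list results against a cache, and the event type names ADDED, MODIFIED and DELETED are assumed to mirror the k8s watch event types.

```python
import queue

class CommonResourceEventHandler:
    """Turns add/update/delete callbacks into events on a shared queue."""

    def __init__(self, events):
        self.events = events

    def on_add(self, obj):
        self.events.put(("ADDED", obj))

    def on_update(self, old, new):
        self.events.put(("MODIFIED", new))

    def on_delete(self, obj):
        self.events.put(("DELETED", obj))

class TinyInformer:
    """Hypothetical informer: compares each listing with its cache and notifies handlers."""

    def __init__(self):
        self.cache = {}
        self.handlers = []

    def add_event_handler(self, handler):
        self.handlers.append(handler)

    def sync(self, objects):
        for name, obj in objects.items():
            for h in self.handlers:
                if name not in self.cache:
                    h.on_add(obj)
                elif self.cache[name] != obj:
                    h.on_update(self.cache[name], obj)
        for name, obj in self.cache.items():
            if name not in objects:
                for h in self.handlers:
                    h.on_delete(obj)
        self.cache = dict(objects)

events = queue.Queue()
informer = TinyInformer()
informer.add_event_handler(CommonResourceEventHandler(events))
informer.sync({"node1": {"ready": False}})
informer.sync({"node1": {"ready": True}})
informer.sync({})
while not events.empty():
    print(events.get())
```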

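KubeEdge source study 1: miscellaneous notes

2020-02-20T04:01:00.000Z

These are personal study notes (quick jottings) only and are not meant as a reference.
The main entry point, plus notes on other major patterns and mechanisms.

Open questions while reading

Modules are registered; how should a "module" be understood?

Do modules also have message sources?

Sending and receiving messages form one family of functions.
Creating messages and creating reply messages form another.

Main entry point

The cobra framework is used. cobra appears in many golang projects; command-line flags and subcommands are added in the form of "commands", in the style of docker, k8s, git and so on.

Objects are mostly created through NewXXX functions. Both cloud and edge use server.go as the entry source file.

Cloud main function (cloud/cmd/cloudcore.go):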

func main() {
	command := app.NewCloudCoreCommand()
	logs.InitLogs()
	defer logs.FlushLogs()

	if err := command.Execute(); err != nil {
		os.Exit(1)
	}
}

Creating the "command" (cloud/cmd/cloudcore/app/server.go):

func NewCloudCoreCommand() *cobra.Command {
	opts := options.NewCloudCoreOptions()
	cmd := &cobra.Command{
		...
		Run: func(cmd *cobra.Command, args []string) {
			config, err := opts.Config() // configuration
			registerModules(config)      // register modules
			core.Run()                   // run modules
		}

注册的模块：

// registerModules register all the modules started in cloudcore
func registerModules(c *v1alpha1.CloudCoreConfig) {
	cloudhub.Register(c.Modules.CloudHub, c.KubeAPIConfig)
	edgecontroller.Register(c.Modules.EdgeController, c.KubeAPIConfig, "", false)
	devicecontroller.Register(c.Modules.DeviceController, c.KubeAPIConfig)
	synccontroller.Register(c.Modules.SyncController, c.KubeAPIConfig)
}

Edge main function (edge/cmd/edgecore.go):

func main() {
	command := app.NewEdgeCoreCommand()
	logs.InitLogs()
	defer logs.FlushLogs()

	if err := command.Execute(); err != nil {
		os.Exit(1)
	}
}

The NewEdgeCoreCommand function:

// NewEdgeCoreCommand create edgecore cmd
func NewEdgeCoreCommand() *cobra.Command {
	opts := options.NewEdgeCoreOptions()
	cmd := &cobra.Command{
		Run: func(cmd *cobra.Command, args []string) {
			config, err := opts.Config()
			registerModules(config) // register the modules
			core.Run()              // run
		},
	}
	...
	return cmd
}

Registering the modules:


// registerModules register all the modules started in edgecore
func registerModules(c *v1alpha1.EdgeCoreConfig) {
	devicetwin.Register(c.Modules.DeviceTwin, c.Modules.Edged.HostnameOverride)
	edged.Register(c.Modules.Edged)
	edgehub.Register(c.Modules.EdgeHub, c.Modules.Edged.HostnameOverride)
	eventbus.Register(c.Modules.EventBus, c.Modules.Edged.HostnameOverride)
	edgemesh.Register(c.Modules.EdgeMesh)
	metamanager.Register(c.Modules.MetaManager)
	servicebus.Register(c.Modules.ServiceBus)
	test.Register(c.Modules.DBTest)
	// Note: Need to put it to the end, and wait for all models to register before executing
	dbm.InitDBConfig(c.DataBase.DriverName, c.DataBase.AliasName, c.DataBase.DataSource)
}

Other similar ones:

func NewAdmissionCommand() *cobra.Command {
func NewCSIDriverCommand() *cobra.Command {
func NewEdgeSiteCommand() *cobra.Command {

keadm is not among them.

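Initializing the options

The --minconfig or --defaultconfig option generates default settings. They are printed to the terminal (because fmt.Print is used), so they have to be redirected to a file by hand. The default configuration files are /etc/kubeedge/config/cloudcore.yaml and /etc/kubeedge/config/edgecore.yaml; another file can be given with --config, but keeping the default is recommended.
Option initialization in brief:
Print the configuration, then exit:

flag.PrintMinConfigAndExitIfRequested(v1alpha1.NewMinCloudCoreConfig())
flag.PrintDefaultConfigAndExitIfRequested(v1alpha1.NewDefaultCloudCoreConfig())

Check validity with opts.Validate(), obtain the configuration struct with opts.Config(), then validate once more with ValidateCloudCoreConfiguration(config).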

beehive

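The modules on both cloud and edge are managed by, and communicate through, the beehive framework; I do not know whether it is closely related to Alibaba's. As I understand it, the framework serves two purposes. First, it makes registering and running modules convenient, and the code structure stays clear even if the communication feature is not used. Second, it carries the communication between modules (over channels; a socket mode exists but is not implemented).
Usage: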

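// Create a struct. Fields are up to you, but an enable field is required.
type cloudHub struct {
	enable bool
}

func newCloudHub(enable bool) *cloudHub {
	return &cloudHub{
		enable: enable,
	}
}

// Register the module; other parameters of this module can be initialized here.
func Register(hub *v1alpha1.CloudHub, kubeAPIConfig *v1alpha1.KubeAPIConfig) {
	hubconfig.InitConfigure(hub, kubeAPIConfig)
	core.Register(newCloudHub(hub.Enable))
}

// The following methods are mandatory.
// Categories used for communication (this module's name and the group it belongs to).
func (a *cloudHub) Name() string {
	return "cloudhub"
}

func (a *cloudHub) Group() string {
	return "cloudhub"
}

// Enable switch; it used to come from the config file and is now passed in as an argument.
// Enable indicates whether enable this module
func (a *cloudHub) Enable() bool {
	return a.enable
}

// Start this module. It used to take parameters and now takes none, and a cleanup function no longer has to be implemented.
func (a *cloudHub) Start() {
	...
}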

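A brief description of the beehive core mechanism:
There are global tables: the module tables modules and disabledModules. Register checks whether the module is enabled and, if so, adds it to modules. The run function initializes the beehiveContext, then walks the module table and starts each module in a goroutine with go. Exit is decided by watching for system signals.

Core run function: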
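
The mechanism described above can be sketched with the standard library alone. This is a simplified illustration, not the beehive implementation: the `Module` interface mirrors the four methods shown earlier, `demoModule` is a hypothetical module, and the `sync.WaitGroup` is my addition so the demo can wait for the goroutines (beehive blocks on OS signals instead).

```go
package main

import (
	"fmt"
	"sync"
)

// Module mirrors the methods every module has to provide.
type Module interface {
	Name() string
	Group() string
	Enable() bool
	Start()
}

// Global tables: enabled modules and disabled modules.
var (
	modules         = map[string]Module{}
	disabledModules = map[string]Module{}
)

// Register files the module into one of the two tables.
func Register(m Module) {
	if m.Enable() {
		modules[m.Name()] = m
	} else {
		disabledModules[m.Name()] = m
	}
}

// StartModules launches every enabled module in its own goroutine
// and, for this demo only, waits until all Start calls return.
func StartModules() {
	var wg sync.WaitGroup
	for _, m := range modules {
		wg.Add(1)
		go func(m Module) {
			defer wg.Done()
			m.Start()
		}(m)
	}
	wg.Wait()
}

// demoModule is a hypothetical module used only for illustration.
type demoModule struct {
	name    string
	enable  bool
	started chan string
}

func (d *demoModule) Name() string  { return d.name }
func (d *demoModule) Group() string { return d.name }
func (d *demoModule) Enable() bool  { return d.enable }
func (d *demoModule) Start()        { d.started <- d.name }

func main() {
	started := make(chan string, 2)
	Register(&demoModule{name: "hub", enable: true, started: started})
	Register(&demoModule{name: "servicebus", enable: false, started: started})

	StartModules()
	close(started)
	for name := range started {
		fmt.Println("started:", name)
	}
	fmt.Println("disabled modules:", len(disabledModules))
}
```

Keeping disabled modules in a separate table, instead of dropping them, makes it possible to report what was configured but switched off.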

// Run starts the modules and in the end does module cleanup
func Run() {
	// Address the module registration and start the core
	StartModules()
	// monitor system signal and shutdown gracefully
	GracefulShutdown()
}

The StartModules function:

// StartModules starts modules that are registered
func StartModules() {
	// MsgCtxTypeChannel has the value "channel"; it is the only type supported for now
	beehiveContext.InitContext(beehiveContext.MsgCtxTypeChannel)

	// fetch all modules, then launch each module's Start function with go
	modules := GetModules()
	for name, module := range modules {
		//Init the module
		beehiveContext.AddModule(name)
		//Assemble typeChannels for sendToGroup
		beehiveContext.AddModuleGroup(name, module.Group())
		go module.Start()
		klog.Infof("Starting module %v", name)
	}
}

The GracefulShutdown function:

// GracefulShutdown is if it gets the special signals it does modules cleanup
func GracefulShutdown() {
	c := make(chan os.Signal)
	signal.Notify(c, syscall.SIGINT, syscall.SIGHUP, syscall.SIGTERM,
		syscall.SIGQUIT, syscall.SIGILL, syscall.SIGTRAP, syscall.SIGABRT)
	select {
	case s := <-c:
		klog.Infof("Get os signal %v", s.String())
		//Cleanup each modules
		beehiveContext.Cancel()
		modules := GetModules()
		for name := range modules {
			klog.Infof("Cleanup module %v", name)
			beehiveContext.Cleanup(name)
		}
	}
}

viaduct

The communication framework. I have not studied the details yet; the edge sends its messages through it, for example the automatic node registration in version 1.2.

Messages

Message name: ModuleNameEdgeHub

Sent by edged:
resource := fmt.Sprintf("%s/%s/%s", e.namespace, model.ResourceTypeNodeStatus, e.nodeName)
nodeInfoMsg := message.BuildMsg(modules.MetaGroup, "", modules.EdgedModuleName, resource, model.InsertOperation, node)
res, err := beehiveContext.SendSync(edgehub.ModuleNameEdgeHub, *nodeInfoMsg, syncMsgRespTimeout)

Received by edgehub:
message, err := beehiveContext.Receive(ModuleNameEdgeHub)
Forwarded to the cloud:
eh.sendToCloud(message)

===============

Message name: EdgeControllerModuleName
Sending:
msg.BuildRouter(constants.EdgeControllerModuleName, constants.GroupResource, resource, operation)
msg.Content = secret
err = dc.messageLayer.Send(*msg)

Receiving:
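
The resource string in the edged snippet is a three-part path, namespace/type/name. A minimal sketch of building and splitting such a string follows; `buildResource` and `parseResource` are hypothetical helper names for illustration and are not KubeEdge functions.

```go
package main

import (
	"fmt"
	"strings"
)

// buildResource joins the three parts the same way the edged snippet does.
func buildResource(namespace, resourceType, name string) string {
	return fmt.Sprintf("%s/%s/%s", namespace, resourceType, name)
}

// parseResource splits "namespace/type/name" back into its parts.
// SplitN with a limit of 3 keeps any further slashes inside the name.
func parseResource(resource string) (namespace, resourceType, name string, err error) {
	parts := strings.SplitN(resource, "/", 3)
	if len(parts) != 3 {
		return "", "", "", fmt.Errorf("malformed resource %q", resource)
	}
	return parts[0], parts[1], parts[2], nil
}

func main() {
	r := buildResource("default", "nodestatus", "edge-node2")
	fmt.Println(r)

	ns, typ, name, err := parseResource(r)
	fmt.Println(ns, typ, name, err)
}
```

The same shape shows up in the cloud log quoted in the practice notes, for example resource: default/pod/nginx-deployment-77698bff7d-jdm8k.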

Other

Constant definitions:

// Constants for database operations and resource type settings
const (
	InsertOperation        = "insert"
	DeleteOperation        = "delete"
	QueryOperation         = "query"
	UpdateOperation        = "update"
	ResponseOperation      = "response"
	ResponseErrorOperation = "error"

	ResourceTypePod        = "pod"
	ResourceTypeConfigmap  = "configmap"
	ResourceTypeSecret     = "secret"
	ResourceTypeNode       = "node"
	ResourceTypePodlist    = "podlist"
	ResourceTypePodStatus  = "podstatus"
	ResourceTypeNodeStatus = "nodestatus"
)


// constants for resource types
const (
	ResNode   = "node"
	ResMember = "membership"
	ResTwin   = "twin"
	ResAuth   = "auth_info"
	ResDevice = "device"
)

// constants for resource operations
const (
	OpGet        = "get"
	OpResult     = "get_result"
	OpList       = "list"
	OpDetail     = "detail"
	OpDelta      = "delta"
	OpDoc        = "document"
	OpUpdate     = "updated"
	OpInsert     = "insert"
	OpDelete     = "deleted"
	OpConnect    = "connected"
	OpDisConnect = "disconnected"
	OpKeepalive  = "keepalive"
)

// constants for message source
const (
	SrcCloudHub         = "cloudhub"
	SrcEdgeController   = "edgecontroller"
	SrcDeviceController = "devicecontroller"
	SrcManager          = "edgemgr"
)


// constants for identifier information for edge hub
const (
	ProjectID = "project_id"
	NodeID    = "node_id"
)

KubeEdge Source Code Study 0: Architecture Notes

2020-02-20T04:00:00.000Z

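These are only my personal study notes (quick jottings) and are not meant as a reference.
The material was gathered from fragments found online and from the official videos, but written down in my own words.

Cloud part

It attaches to the k8s master as a bypass and does not affect existing k8s functionality.
It sends k8s operation commands to the edge.
It also syncs the status and events of the edge back into k8s.
Note: k8s only knows about resources; it does not know where the real machines are.

EdgeController

Manages edge nodes. Handles cloud-edge coordination of application state data.

DeviceController

Connects and manages edge devices. Handles cloud-edge coordination of device data.

SyncController

Introduced in version 1.2. I do not understand it well yet; to be written.

CSI driver

Syncs storage data to the edge; an adapter written for compatibility with standard CSI.

Admission webhook

Validates what enters kubeedge.

Edge part

Manages all operations on the edge.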

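EdgeHub

Communicates with CloudHub over websocket and provides reliable cloud-edge synchronization (it can be seen as the only cloud-edge channel). It is the peer of cloudhub on the cloud side. It syncs cloud resources to the edge and reports edge status to the cloud.

MetaManager

Persists metadata locally. configmap, secret and the like are written to local storage through this component, using sqlite (lightweight enough).
Is this the offline autonomy feature?
The MetaManager module is backed by a local database (sqlite). Everything the other modules need to exchange with the cloud is also saved to the local DB. When data is queried and the local DB has it, it is served locally, which avoids frequent network round trips to the cloud. When the network is down, the locally cached data also keeps the node running stably (for example when your smart car enters a tunnel with no wireless signal), and the data is synced again once communication is restored. This is the key to edge node autonomy.

Edged

Similar to kubelet. It implements pod lifecycle management but drops part of kubelet's functionality. It manages the lifecycle of Kubernetes resource objects such as Pod, Volume and Node. docker, containerd and cri-o can be plugged in.

DeviceTwin

Persists device management data (switch state, status) and syncs device information to the cloud. (This is the device twin concept, hence the name devicetwin.)

EventBus

Effectively an mqtt client that provides publish and subscribe to the other components.

ServiceBus

Effectively an http client. Similar to EventBus; only the protocol differs.

Edgemesh

An Istio-based service mesh solution spanning Cloud and Edge.

EdgeController

Manages edge nodes. It is an extended Kubernetes controller that manages edge node and pod metadata so that data can be targeted at a specific edge node.

Edgesite

A customized complete-cluster solution for scenarios that need full cluster functionality at the edge: it can manage, orchestrate and run workloads there.

mappers

IoT protocol implementation packages.
The device information management module talks to the devices connected to the edge mainly over MQTT.
Supported protocols: MQTT, BlueTooth, OPC UA, Modbus.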
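
The lookup behavior described for MetaManager (serve from the local store when possible, otherwise ask the cloud and keep a copy) can be sketched as follows. This is a toy illustration under my own assumptions: `metaStore`, `fetch` and `errOffline` are hypothetical names, and an in-memory map stands in for the sqlite database.

```go
package main

import (
	"errors"
	"fmt"
)

var errOffline = errors.New("cloud unreachable")

// metaStore keeps a local copy of everything fetched from the cloud.
// The map stands in for the local sqlite database.
type metaStore struct {
	local map[string]string
	fetch func(key string) (string, error) // asks the cloud
}

// Get serves from the local copy first and only falls back to the cloud.
func (s *metaStore) Get(key string) (string, error) {
	if v, ok := s.local[key]; ok {
		return v, nil
	}
	v, err := s.fetch(key)
	if err != nil {
		return "", err
	}
	s.local[key] = v // keep a copy for later offline use
	return v, nil
}

func main() {
	online := true
	s := &metaStore{
		local: map[string]string{},
		fetch: func(key string) (string, error) {
			if !online {
				return "", errOffline
			}
			return "value-of-" + key, nil
		},
	}

	fmt.Println(s.Get("default/configmap/demo")) // fetched from the cloud, then cached
	online = false
	fmt.Println(s.Get("default/configmap/demo")) // still served while offline
	fmt.Println(s.Get("default/secret/other"))   // never cached, so this fails offline
}
```

What the sketch leaves out is the resynchronization after the connection comes back, which the notes above name as part of the real module's job.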

Modules, their names, and the groups they belong to

Module        Name          Group
DeviceTwin    twin          twin
edged         edged         edged
EdgeHub       websocket     hub
eventbus      eventbus      bus
edgemesh      edgemesh      mesh
metaManager   metaManager   meta
servicebus    servicebus    bus
test          testManager   meta

Notes from Practicing with KubeEdge

2020-02-19T15:03:00.000Z

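This article collects notes from practicing with KubeEdge, including open questions and solutions. It is updated from time to time.

Miscellaneous

Building kubeedge fails with 2 GB of memory; 4 GB works.
If the pods of one deployment export the same node port, scaling out fails because the node port is already taken.
Run the program once to obtain the configuration file, then edit it. Mind the location of the configuration file and the platform architecture: on an ARM platform, not using kubeedge/pause-arm:3.1 as the pause image causes an error.
Check the host name. It must be valid (lowercase letters, digits, hyphen -, dot .), otherwise registration fails; sometimes the returned message is just err:, which gives nothing to debug with.
The edge system needs a default gateway, otherwise edgecore segfaults. According to the issue this has been fixed, but I still see it.
KubeEdge is not fully equivalent to k8s; some k8s commands are not implemented yet. For example, the commands for viewing and running things in containers (kubectl logs, kubectl exec) are missing.

Bugs I have collected

2020.4.27:
led example: creating the crds also creates a configmap, but sometimes it has no Data, i.e. none of the fields from the yaml file. After deleting the cm by hand and creating the crds again, the problem may reappear. Without the cm, docker on the edge reports that the json file cannot be found.

2020.4.19:
I built a test image locally (compiled the Demo on the edge machine and built the image right there, to keep testing simple). Creating the deployment from the cloud worked; after deleting it, the pod on the cloud side stayed in Terminating. A little later the test image had been deleted, and the edge log showed nothing. I saw the same thing last month.
Cause found: the machine was short on disk space, below 80%. (Note: the root directory was at 7% usage and a separately mounted Windows directory at 90%; I do not know why that counts as insufficient.)

2020.3.30:
The ARM edge node ran for about 1.5 days, then crashed with a segmentation fault.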

panic: runtime error: index out of range

goroutine 100 [running]:
github.com/kubeedge/kubeedge/edgemesh/pkg/proxy.updateServer(0x4cbb180, 0x12, 0x4c47dc8, 0x0, 0x2, 0x4c47dd0, 0x0)
        /home/ubuntu/kubeedge/src/github.com/kubeedge/kubeedge/edgemesh/pkg/proxy/proxy.go:457 +0x528

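The cloud showed NotReady while the pods on the edge were still there. After restarting and reconnecting, new pods were created.
Note: I added an x86 machine and scaled the deployment to 2 for comparison.
Follow-up: after one night, edgecore was still running normally in the morning, but the cloud showed NotReady. The edge log showed nothing abnormal and contained entries saying that status was reported. I stopped the edge and started it again; it reported timeouts and connected after a few minutes. At that point the docker containers on the edge were running while the cloud showed Pending or Terminating. Force-deleting from the cloud worked. Stopping a container on the edge with docker stop made the pod start again automatically, and the cloud did not notice. The state seemed to be out of sync by then.
I stopped edgecore, removed all docker containers and started the edge again. Once it connected to the cloud, the containers on the edge started automatically. It seems the edge remembered this state, but the cloud did not know about it.

2020.3.19:
kubectl exec and kubectl logs are not supported; officially they will be supported later. To be watched.
Scheduling information is insufficient. kubectl describe only shows that the pod was scheduled to some node; whether it actually succeeded or failed is unknown. The only way is to go to the node machine and check with docker logs.

Some of my ideas

As things stand, a mapper configured on the cloud targets only one node, i.e. one device, because k8s scheduling selects by node. That makes it poorly suited for batch deployment. Whether it can be changed, and whether that would conflict with KubeEdge's design, I do not know.

Problems

Scheduling fails

Environment: 3 hosts that had k8s deployed; k8s was cleaned up.
I deployed a deployment the k8s way. The pod showed Pending; deleting it showed Terminating. On another attempt one pod ran on one of the nodes. After scaling out, that node could run pods and the other node stayed Pending. Still the same after a night.
After force-stopping cloudcore and edgecore, the nodes in k8s showed NotReady. The containers on the nodes kept running.

Questions:
Why does scheduling fail? What about shutting the pods down gracefully before stopping cloudcore? I have found no way so far.

Cloud output: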

messagehandler.go:448] write error, connection for node edge-node2 will be closed, affected event id: dba8d7ec-ffa4-4c6f-ac6e-accfa527a366, parent_id: , group: resource, source: edgecontroller, resource: default/pod/nginx-deployment-77698bff7d-jdm8k, operation: update, reason tls: use of closed connection

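Edge output:

process.go:130] failed to send message: tls: use of closed connection
process.go:196] websocket write error: failed to send message, error: tls: use of closed connection

Guess: the connection dropped, yet the node status showed Ready; I do not know why.
Follow-up: after deleting, waiting a while and deploying again, it succeeded.

With a normal connection, after running overnight the node was NotReady. Pods were destroyed and recreated continuously.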

# kubectl get pod
NAME                                         READY   STATUS        RESTARTS   AGE
led-light-mapper-deployment-94bbdf88-26h2d   0/1     Terminating   0          14h
led-light-mapper-deployment-94bbdf88-2hwxq   0/1     Terminating   0          90m
led-light-mapper-deployment-94bbdf88-4f8pd   0/1     Terminating   0          80m
led-light-mapper-deployment-94bbdf88-52p9w   0/1     Terminating   0          15m
led-light-mapper-deployment-94bbdf88-8t9cl   0/1     Terminating   0          30m
led-light-mapper-deployment-94bbdf88-9bpt7   0/1     Terminating   0          95m
led-light-mapper-deployment-94bbdf88-9nfk6   0/1     Terminating   0          65m
led-light-mapper-deployment-94bbdf88-c8wtb   0/1     Terminating   0          85m
led-light-mapper-deployment-94bbdf88-kpcx4   0/1     Terminating   0          75m
led-light-mapper-deployment-94bbdf88-kwgqs   0/1     Terminating   0          35m
led-light-mapper-deployment-94bbdf88-l6hn2   0/1     Terminating   0          55m
led-light-mapper-deployment-94bbdf88-pk6fx   0/1     Terminating   0          5m1s
led-light-mapper-deployment-94bbdf88-qk9gj   0/1     Terminating   0          60m
led-light-mapper-deployment-94bbdf88-sgns2   0/1     Terminating   0          100m
led-light-mapper-deployment-94bbdf88-sk8gf   0/1     Terminating   0          20m
led-light-mapper-deployment-94bbdf88-svkgr   0/1     Terminating   0          50m
led-light-mapper-deployment-94bbdf88-tjz7z   0/1     Terminating   0          45m
led-light-mapper-deployment-94bbdf88-vwx7w   0/1     Pending       0          1s
led-light-mapper-deployment-94bbdf88-xfsc8   0/1     Terminating   0          10m
led-light-mapper-deployment-94bbdf88-xpq8k   0/1     Terminating   0          40m
led-light-mapper-deployment-94bbdf88-zhj24   0/1     Terminating   0          25m
led-light-mapper-deployment-94bbdf88-zncjg   0/1     Terminating   0          70m

On the edge:

I0319 09:17:05.425874    2147 communicate.go:151] has msg
I0319 09:17:05.426062    2147 communicate.go:155] redo task due to no recv
I0319 09:17:05.427233    2147 communicate.go:151] has msg
I0319 09:17:05.427416    2147 communicate.go:155] redo task due to no recv
I0319 09:17:05.428657    2147 dtcontext.go:69] CommModule is healthy 1584580625

context_channel.go:175] the message channel is full, message: {Header:{ID:5f072fe2-b8cf-411e-8aee-16e927f27433 ParentID: Timestamp:1584580605260 ResourceVersion:391570 Sync:false} Router:{Source:edgecontroller Group:resource Operation:update Resource:default/pod/led-light-mapper-deployment-94bbdf88-26h2d} Content:map[metadata:map[creationTimestamp:2020-03-18T10:23:50Z deletionGracePeriodSeconds:30 deletionTimestamp:2020-03-18T23:40:09Z generateName:led-light-mapper-deployment-94bbdf88- labels:map[app:led-light-mapper pod-template-hash:94bbdf88] name:led-light-mapper-deployment-94bbdf88-26h2d namespace:default ownerReferences:[map[apiVersion:apps/v1 blockOwnerDeletion:true controller:true kind:ReplicaSet name:led-light-mapper-deployment-94bbdf88 uid:52c44b48-1214-4b10-9007-23093a953a40]] resourceVersion:391570 selfLink:/api/v1/namespaces/default/pods/led-light-mapper-deployment-94bbdf88-26h2d uid:12002c7e-69fe-4a31-bf66-759d78380abe] spec:map[containers:[map[image:latelee/led-light-mapper:v1.1 imagePullPolicy:IfNotPresent name:led-light-mapper-container resources:map[] securityContext:map[privileged:true] terminationMessagePath:/dev/termination-log terminationMessagePolicy:File volumeMounts:[map[mountPath:/opt/kubeedge/ name:config-volume] map[mountPath:/var/run/secrets/kubernetes.io/serviceaccount name:default-token-gb4kq readOnly:true]]]] dnsPolicy:ClusterFirst enableServiceLinks:true hostNetwork:true nodeName:latelee.org.ttucon-2142ec priority:0 restartPolicy:Always schedulerName:default-scheduler securityContext:map[] serviceAccount:default serviceAccountName:default terminationGracePeriodSeconds:30 tolerations:[map[effect:NoExecute key:node.kubernetes.io/not-ready operator:Exists tolerationSeconds:300] map[effect:NoExecute key:node.kubernetes.io/unreachable operator:Exists tolerationSeconds:300]] volumes:[map[configMap:map[defaultMode:420 name:device-profile-config-edge-node2] name:config-volume] map[name:default-token-gb4kq secret:map[defaultMode:420 
secretName:default-token-gb4kq]]]] status:map[phase:Pending qosClass:BestEffort]]}

DNS warnings:

I0319 16:25:18.563472   17947 record.go:24] Warning MissingClusterDNS kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
I0319 16:25:18.563724   17947 record.go:24] Warning MissingClusterDNS pod: "webgin-deployment-747c6887f5-dwmtb_default(1ceb1dd6-6dae-4aff-a2c6-d0de64373031)". kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
I0319 16:25:18.563902   17947 record.go:19] Warning DNSConfigForming Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 8.8.8.8 8.8.4.4 2001:4860:4860::8888
E0319 16:25:18.564035   17947 dns.go:135] Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 8.8.8.8 8.8.4.4 2001:4860:4860::8888

I0319 16:30:09.037479   17947 edged.go:808] consume added pod [webgin-deployment-7ccff86d8b-s227c] successfully
I0319 16:30:10.506631   17947 record.go:19] Normal Started Started container webgin
E0319 16:30:10.507199   17947 kuberuntime_container.go:172] Failed to create legacy symbolic link "/var/log/containers/webgin-deployment-747c6887f5-f6547_default_webgin-1772b70cd7725f77c30b9cf47e3ce57159d9fdccf47c0c19aed8edf779c52c16.log" to container "1772b70cd7725f77c30b9cf47e3ce57159d9fdccf47c0c19aed8edf779c52c16" log "/var/log/pods/default_webgin-deployment-747c6887f5-f6547_abc27c3c-50f1-49e9-9f2e-b00fa802dc7f/webgin/0.log": symlink /var/log/pods/default_webgin-deployment-747c6887f5-f6547_abc27c3c-50f1-49e9-9f2e-b00fa802dc7f/webgin/0.log /var/log/containers/webgin-deployment-747c6887f5-f6547_default_webgin-1772b70cd7725f77c30b9cf47e3ce57159d9fdccf47c0c19aed8edf779c52c16.log: no such file or directory
I0319 16:30:10.507557   17947 edged.go:808] consume added pod [webgin-deployment-747c6887f5-f6547] successfully
I0319 16:30:10.667156   17947 edged.go:648] sync loop ignore event: [ContainerDied], with pod [1ceb1dd6-6dae-4aff-a2c6-d0de64373031] not found
W0319 16:30:10.685178   17947 docker_sandbox.go:394] failed to read pod IP from plugin/docker: Couldn't find network status for default/webgin-deployment-747c6887f5-f6547 through plugin: invalid network status for
W0319 16:30:10.871129   17947 docker_sandbox.go:394] failed to read pod IP from plugin/docker: Couldn't find network status for default/webgin-deployment-747c6887f5-f6547 through plugin: invalid network status for
I0319 16:30:10.914857   17947 container_manager_linux.go:880] Found 44 PIDs in root, 44 of them are not to be moved
I0319 16:30:11.088286   17947 edged.go:645] sync loop get event [ContainerStarted], ignore it now.
I0319 16:30:11.327738   17947 edged.go:645] sync loop get event [ContainerStarted], ignore it now.
W0319 16:30:12.413498   17947 docker_sandbox.go:394] failed to read pod IP from plugin/docker: Couldn't find network status for default/webgin-deployment-747c6887f5-f6547 through plugin: invalid network status for
W0319 16:30:12.543879   17947 docker_sandbox.go:394] failed to read pod IP from plugin/docker: Couldn't find network status for default/webgin-deployment-747c6887f5-f6547 through plugin: invalid network status for

A pod that deployed successfully:

I0319 16:25:18.564503   17947 edged.go:808] consume added pod [webgin-deployment-747c6887f5-dwmtb] successfully
I0319 16:25:18.564974   17947 proxy.go:318] [L4 Proxy] process other resource: kube-system/endpoints/kube-scheduler
I0319 16:25:18.688263   17947 edged_volumes.go:54] Using volume plugin "kubernetes.io/empty-dir" to mount wrapped_default-token-gb4kq

KubeEdge temperature Deployment

2020-02-19T15:03:00.000Z

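This article experiments with the official temperature example.

What it does

The example mainly demonstrates reading the state of an edge device from the cloud.

Build

I modified the official example. The changes are described here; see the modified source for details.
1. Following the led example, add a Makefile and modify the Dockerfile.
2. Modify the source: remove the hardware-related functions and increment the sampled temperature value so that its change can be observed.
3. Build the image and push it.
4. Modify the crds and deployment.yaml files to specify the name of the target node.

Experiment

Deploy:

kubectl apply -f crds/
kubectl apply -f deployment.yaml

List the pods: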

# kubectl get pod
NAME                                READY   STATUS    RESTARTS   AGE
temperature-mapper-77fb74f5-vzztl   1/1     Running   0          5m10s

View the log on the edge:

# docker logs ecc3ba6a29dc
2020-03-22T09:11:51.777 [    main] INFO  Sensor = DHT11: Temperature = 74*C, Humidity = 85% (retried 1 times)
2020-03-22T09:11:56.778 [    main] INFO  Sensor = DHT11: Temperature = 75*C, Humidity = 85% (retried 1 times)
2020-03-22T09:12:01.778 [    main] INFO  Sensor = DHT11: Temperature = 76*C, Humidity = 85% (retried 1 times)

Watch from the cloud:

kubectl get device temperature1 -oyaml -w

Example output:
apiVersion: devices.kubeedge.io/v1alpha1
kind: Device
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"devices.kubeedge.io/v1alpha1","kind":"Device","metadata":{"annotations":{},"labels":{"description":"temperature","manufacturer":"test"},"name":"temperature1","namespace":"default"},"spec":{"deviceModelRef":{"name":"temperature-model"},"nodeSelector":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"","operator":"In","values":["latelee1"]}]}]}},"status":{"twins":[{"desired":{"metadata":{"type":"string"},"value":""},"propertyName":"temperature-status"}]}}
  creationTimestamp: "2020-03-22T09:04:18Z"
  generation: 77
  labels:
    description: temperature
    manufacturer: test
  name: temperature1
  namespace: default
  resourceVersion: "29280"
  selfLink: /apis/devices.kubeedge.io/v1alpha1/namespaces/default/devices/temperature1
  uid: e9869339-6d9b-4bf3-bf9f-c6191efeedc7
spec:
  deviceModelRef:
    name: temperature-model
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: ""
        operator: In
        values:
        - latelee1
status:
  twins:
  - desired:
      metadata:
        type: string
      value: ""
    propertyName: temperature-status
    reported:
      metadata:
        timestamp: "1584868316781"
        type: string
      value: 75C   # !!!! this value changes

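Source code study

Flow:
1. Connect to mqtt: connectToMqtt. The edge therefore needs a running mqtt service listening on port 1883.
2. Sample the temperature: ReadDHTxxWithContextAndRetry, commented out in this example.
3. Publish the temperature to mqtt: publishToMqtt.
4. From there the value enters the KubeEdge system and the state can be viewed from the cloud.

Other notes:
The publish topic is specified as follows:

deviceTwinUpdate := "$hw/events/device/" + "temperature" + "/twin/update"

temperature is the device name. It must match metadata.name in the Device, which is also the name shown by kubectl get device. You can change it so the two differ and observe the effect.

Before publishing, the message body is created with createActualUpdateMessage. The struct is DeviceTwinUpdate, specifically:

map[string]*MsgTwin{"temperature-status": {Actual: &TwinValue{Value: &actualValue}, Metadata: &TypeMetadata{Type: "Updated"}}}

Looking at device.yaml: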
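
To make the shape of that message easier to see, here is a minimal sketch that builds the topic and serializes a twin update to JSON. The struct definitions are trimmed-down assumptions that carry only the fields used in the literal above; the JSON tags are my guess at the wire format, and the real KubeEdge structs have more fields. `twinUpdateTopic` and `buildTwinUpdate` are hypothetical helper names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed-down stand-ins for the example's types (assumed, not copied).
type TwinValue struct {
	Value *string `json:"value,omitempty"`
}

type TypeMetadata struct {
	Type string `json:"type,omitempty"`
}

type MsgTwin struct {
	Actual   *TwinValue    `json:"actual,omitempty"`
	Metadata *TypeMetadata `json:"metadata,omitempty"`
}

type DeviceTwinUpdate struct {
	Twin map[string]*MsgTwin `json:"twin"`
}

// twinUpdateTopic builds the mqtt topic for a device's twin update.
func twinUpdateTopic(deviceName string) string {
	return "$hw/events/device/" + deviceName + "/twin/update"
}

// buildTwinUpdate wraps one reported property value into the message body.
func buildTwinUpdate(property, actualValue string) ([]byte, error) {
	msg := DeviceTwinUpdate{Twin: map[string]*MsgTwin{
		property: {
			Actual:   &TwinValue{Value: &actualValue},
			Metadata: &TypeMetadata{Type: "Updated"},
		},
	}}
	return json.Marshal(msg)
}

func main() {
	body, err := buildTwinUpdate("temperature-status", "75C")
	if err != nil {
		panic(err)
	}
	fmt.Println(twinUpdateTopic("temperature"))
	fmt.Println(string(body))
}
```

The map key is the propertyName from device.yaml, which is how the reported value ends up under the matching twin in the kubectl output.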

status:
  twins:
    - propertyName: temperature-status
      desired:
        metadata:
          type: string
        value: ''

So far the two partly correspond; the deeper theory still needs study.

KubeEdge 1.2.0 Deployment

2020-02-19T15:03:00.000Z

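This article describes deploying a KubeEdge 1.2.0 cluster from source on two ubuntu 16.04 64-bit dual-core virtual machines. The cloud side runs Kubernetes 1.17.3 and the Golang version is 1.12.14. It builds on my KubeEdge 1.1.0 deployment article and should be of some practical use.

I. Overview

1.1 Environment

Cloud: ubuntu 16.04 64 bit, user name ubuntu, IP 192.168.0.102.
Edge: same as above, IP 192.168.0.140.
A KubeEdge deployment involves two sides:

Cloud
docker, a kubernetes cluster and the KubeEdge cloud core module.
Edge
docker, mqtt and the KubeEdge edge core module.

Technical summary:
1. Set up a build environment (building from source yourself is recommended); the build machine needs enough memory (e.g. 4 GB).
2. Deploy k8s. This requires installing docker and the k8s master node; worker nodes are not needed.
3. Build KubeEdge, generate the certificates and distribute them. Create the crds.
4. Run the programs once to obtain the configuration files, then edit them. Mind the location of the configuration files and the platform architecture.
5. Check the host name; it must be valid, otherwise registration fails.
6. Run.
7. KubeEdge is not fully equivalent to k8s; some k8s commands are not implemented yet.

Experience with version 1.2:
1. The edge system must have a default gateway, otherwise a segmentation fault occurs. PC systems usually have one, but ARM platforms may not, so the problem can show up there. It was fixed after the 1.2 release.
2. When an edge node that was connected to the cloud is stopped and started again, it takes a few minutes to reconnect.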

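1.2 Dependencies

Required components (including tools and compilers) and their versions:

golang
Version 1.12.14, downloaded from https://studygolang.com/dl . It is needed to build the source; if you do not build, there is no need to install it. Note that KubeEdge is sensitive to the golang version: the master branch of the official source already supports golang 1.13, but this article still uses 1.12.
k8s
1.17; see other articles for its deployment. According to the current official documentation, 1.17 is supported.
mosquitto
1.6.8, downloaded from https://mosquitto.org/download/ .
KubeEdge (cloud and edge)
The latest release is v1.2.0, downloaded from https://github.com/kubeedge/kubeedge/releases/tag/v1.2.0 . The code repository is https://github.com/kubeedge/kubeedge/ .
Note that github is not always stable, so downloads may take a while.

This deployment was done around mid to late February 2020. KubeEdge is under rapid development, so mind how current this is and defer to the official documentation; this article applies to KubeEdge 1.2.0 only.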

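1.3 Approach

The most authoritative source is the official KubeEdge setup document: https://docs.kubeedge.io/en/latest/setup/setup.html . It covers everything from building the source to installation. For a beginner, though, I find the official document not detailed enough. There are also many installation tutorials online; the methods differ but the goal is the same. Based on my experience they fall into two styles:

Step by step
Follow the official document: generate the certificates first, then build the source to get the cloud and edge core programs, then edit the configuration files, and finally run.
Prepare everything first
Download the official prebuilt binaries (cloudcore, edgecore), or build the source yourself to get them. Prepare the configuration files (and adapt them). Then run. This article uses this style: prepare all files first and put them together in the deployment directory.

1.4 Main changes in the new version

In 1.2.0 the cloud and edge programs generate their own default configuration, which is then edited by hand. This is much more convenient than in 1.1.0.
The edge now registers with the cloud automatically by default (the cloud IP has to be set in the configuration file).
The new version strengthens asynchronous communication.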

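II. Preparation

2.1 Creating the deployment directory

The official document suggests a separate directory for the binaries, such as ~/cmd/. I created one when deploying 1.1.0, but the new version has been streamlined and I consider a deployment directory unnecessary, so it is omitted.

2.2 KubeEdge binaries

The KubeEdge used in this article is built from source.

2.2.1 Official prebuilt files

Download the official prebuilt binaries from https://github.com/kubeedge/kubeedge/releases . The archive is named kubeedge-v1.2.0-linux-amd64.tar.gz .
They can also be built from source, as described below.
Besides the prebuilt binaries, the source is needed as well, from https://github.com/kubeedge/kubeedge , because some configuration files exist only in the source repository (they can of course also be downloaded individually from the github repository).

2.2.2 Building KubeEdge from source

1. Golang environment
Download golang and unpack it:

# mkdir ~/tools
# tar xf go1.12.14.linux-amd64.tar.gz -C ~/tools

Append to ~/.bashrc: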

export PATH=$PATH:/home/ubuntu/tools/go/bin
export GOROOT=/home/ubuntu/tools/go
export GOPATH=/home/ubuntu/kubeedge

Run source ~/.bashrc to apply. Verify:

ubuntu@ubuntu:~/kubeedge$ go version
go version go1.12.14 linux/amd64

2. Clone the repository:

# git clone https://github.com/kubeedge/kubeedge.git $GOPATH/src/github.com/kubeedge/kubeedge

If cloning is slow, download the zip archive instead and unpack the source into $GOPATH/src/github.com/kubeedge/kubeedge. Note that the source must be in exactly this directory.
Switch to the 1.2 release branch:

# git checkout -b release-1.2 remotes/origin/release-1.2

3. Check the gcc version:

# gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.

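If gcc is not installed, install it yourself.

Build the cloud side:

# cd $GOPATH/src/github.com/kubeedge/kubeedge/
# make all WHAT=cloudcore
(Note: entering the cloud directory and running make cloudcore also works; the same applies below.)

Build the edge side:

# cd $GOPATH/src/github.com/kubeedge/kubeedge
# make all WHAT=edgecore

The resulting binaries are in the _output/local/bin/ directory.

2.3 Generating certificates

# $GOPATH/src/github.com/kubeedge/kubeedge/build/tools/certgen.sh genCertAndKey edge

The ca and certs end up in /etc/kubeedge/ca and /etc/kubeedge/certs respectively.
Note: the generated certificates can in fact be reused, which is convenient when migrating, but separate projects should use separate certificates.

2.4 Creating the device model and device CRDs from the yaml files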

# cd $GOPATH/src/github.com/kubeedge/kubeedge/build/crds/devices
# kubectl create -f devices_v1alpha1_devicemodel.yaml
# kubectl create -f devices_v1alpha1_device.yaml

# cd $GOPATH/src/github.com/kubeedge/kubeedge/build/crds/reliablesyncs
# kubectl create -f cluster_objectsync_v1alpha1.yaml
# kubectl create -f objectsync_v1alpha1.yaml

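Note: the new version has two kinds of yaml files, in the devices and reliablesyncs directories. Use kubectl get crds to list them.

2.6 Configuring the cloud node

In the new version the configuration file is generated by the cloudcore program. Run: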

# cd $GOPATH/src/github.com/kubeedge/kubeedge/cloud
# mkdir -p /etc/kubeedge/config/ 
# ./cloudcore --minconfig > /etc/kubeedge/config/cloudcore.yaml

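Then edit the configuration file:

# vim /etc/kubeedge/config/cloudcore.yaml

The official default is kubeconfig: "/root/.kube/config"; this article changes it to kubeconfig: "/home/ubuntu/.kube/config". Everything else stays at the default.

2.7 Configuring the edge node

In the new version the configuration file is generated by the edgecore program, so this has to be done on the edge machine. See below.

2.8 mqtt

mqtt is only needed on the edge.
If the edge runs ubuntu, install directly from the package source: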

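# add-apt-repository ppa:mosquitto-dev/mosquitto-ppa   # add the repository
# apt-get update   # update the package index
# apt-get install mosquitto   # install the mqtt server
# apt-get install mosquitto-clients   # install the mqtt clients if testing is needed

Building from source is also possible.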

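On ubuntu the service starts automatically once mosquitto is installed. KubeEdge uses several ports, so a configuration file is needed. Add the extra port on the server:

vim /etc/mosquitto/conf.d/port.conf
port 1883
listener 1884

This specifies ports 1883 and 1884, as can be seen from the configuration file KubeEdge generates. No protocol is specified, so mqtt is used by default. Restart after changing the configuration:

/etc/init.d/mosquitto restart

Or start it by hand:

/usr/sbin/mosquitto -d -c /etc/mosquitto/mosquitto.conf

Using the system service is recommended, so that this step is not forgotten and KubeEdge tests do not fail because of it.

The following command verifies that the service works (replace <host> with the address of the mqtt server):

mosquitto_pub -h <host> -p 1884 -t "hello" -m "this is hello world"

If Error: Connection refused appears, the service (or the corresponding port) is not up.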

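In an embedded ARM Linux environment, Buildroot already includes mosquitto and it only has to be selected; details are omitted here. In my experiments, Buildroot's mosquitto keeps all of its configuration in /etc/mosquitto/mosquitto.conf. It is controlled with:

systemctl restart mosquitto   # restart
systemctl stop mosquitto      # stop

III. Deployment

3.1 Cloud

3.1.1 Checking the k8s cluster
Check the node status:

kubectl get node

NAME             STATUS   ROLES    AGE   VERSION
latelee-master   Ready    master   2m    v1.17.0

At this point only the cloud node is ready.

3.1.3 Running the cloud core
The program can be run from a separate directory or from the source directory. The latter is used here because it is convenient for debugging.

cd $GOPATH/src/github.com/kubeedge/kubeedge/cloud

./cloudcore   # recommended at first, so the log can be watched

Alternatively:

nohup ./cloudcore > cloudcore.log 2>&1 &

To run it as a system service, the unit file is build/tools/cloudcore.service; ExecStart has to be changed to the real path. Example: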

[Unit]
Description=cloudcore.service

[Service]
Type=simple
Restart=always
ExecStart=/etc/kubeedge/cloudcore

[Install]
WantedBy=multi-user.target

Commands to add the service:

cp build/tools/cloudcore.service /etc/systemd/system/cloudcore.service
sudo systemctl daemon-reload
sudo systemctl start cloudcore


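3.2 Edge
3.2.1 Distribution
With the files prepared earlier, deployment is straightforward. Note that the certificates and the edge files have to be copied to the edge machine. There are many ways; scp is recommended, provided SSH is installed. Example of copying (also called distributing), run on the edge machine: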

mkdir -p /etc/kubeedge/config ~/kubeedge/

cd ~/kubeedge/

scp -r 192.168.0.102:/etc/kubeedge/ca /etc/kubeedge

scp -r 192.168.0.102:/etc/kubeedge/certs /etc/kubeedge

scp -r 192.168.0.102:/home/ubuntu/kubeedge/src/github.com/kubeedge/kubeedge/edge/edgecore ~/kubeedge/

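Note 1: this is done on the edge machine, not on the cloud. The deployment directory is assumed to be ~/kubeedge.
Note 2: the certificates under /etc on the cloud are copied straight into /etc/kubeedge on the edge machine, and the edge files are copied to ~/kubeedge.
I configured SSH to allow root login and added a public key, so no password is needed. Without such a setup, copy the files by hand.
Note 3: following note 2, to copy as another login user, put the user name before the IP address, e.g. sudo scp -r latelee@192.168.0.102:/etc/kubeedge/* /etc/kubeedge.

3.2.2 Configuration
Generate the configuration file:

./edgecore --minconfig > /etc/kubeedge/config/edgecore.yaml

Edit the configuration file:

vim /etc/kubeedge/config/edgecore.yaml

Two places need attention. The first is server under websocket: the default is 127.0.0.1:10000 and it must be changed to the actual cloud IP address, here 192.168.0.102:10000. The second is podSandboxImage: on X86 it is podSandboxImage: kubeedge/pause:3.1; on ARM, depending on word size, it can be set to kubeedge/pause-arm:3.1 or kubeedge/pause-arm64:3.1.
Other points: the cgroup driver defaults to cgroupDriver: cgroupfs, so the Docker configuration does not have to be changed. The network interface name and IP address are detected automatically when the command above runs and need no editing.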
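
The choice of podSandboxImage can be written down as a small lookup. The sketch below maps Go architecture names to the three images named above; `pauseImage` is a hypothetical helper, and mapping GOARCH values amd64, arm and arm64 to X86, 32-bit ARM and 64-bit ARM is my assumption about how one would pick them.

```go
package main

import (
	"fmt"
	"runtime"
)

// pauseImage returns the pause image named in the text for an architecture.
func pauseImage(goarch string) (string, error) {
	switch goarch {
	case "amd64":
		return "kubeedge/pause:3.1", nil
	case "arm":
		return "kubeedge/pause-arm:3.1", nil
	case "arm64":
		return "kubeedge/pause-arm64:3.1", nil
	default:
		return "", fmt.Errorf("no pause image known for GOARCH %q", goarch)
	}
}

func main() {
	// runtime.GOARCH is the architecture this program was built for.
	img, err := pauseImage(runtime.GOARCH)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("podSandboxImage:", img)
}
```

Returning an error for unknown architectures is deliberate: a wrong pause image makes pods fail on the edge node.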

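3.2.3 Running
Next, run the edge core:

./edgecore   # recommended at first, so the log can be watched

Alternatively:

nohup ./edgecore > edgecore.log 2>&1 &

To run it as a system service, the unit file is build/tools/edgecore.service; ExecStart has to be changed to the real path.
Example: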

[Unit]
Description=edgecore.service

[Service]
Type=simple
Restart=always
ExecStart=/etc/kubeedge/edgecore

[Install]
WantedBy=multi-user.target

Add the service:

cp build/tools/edgecore.service /etc/systemd/system/edgecore.service
sudo systemctl daemon-reload
sudo systemctl start edgecore


A reminder about mqtt once more. Open another terminal and run:

/usr/sbin/mosquitto -d -c /etc/mosquitto/mosquitto.conf

3.3 Verification

Check the status from the cloud:
# kubectl get nodes
NAME           STATUS   ROLES    AGE   VERSION
latelee-master Ready    master   49m   v1.17.3
latelee-node   Ready    edge     4m8s  v1.17.1-kubeedge-v0.0.0-master+$Format:%h$

Both the cloud node and the edge node are Ready.

Try deploying the official deployment:

kubectl apply -f $GOPATH/src/github.com/kubeedge/kubeedge/build/deployment.yaml

Example output:

# kubectl get pod -owide
NAME                                           READY   STATUS    RESTARTS   AGE     IP       NODE         NOMINATED NODE   READINESS GATES
nginx-deployment-77698bff7d-t4pkg              1/1     Running   0          3m11s      latelee-node

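Scale-out test:

kubectl scale deployment nginx-deployment --replicas=4

Expected result: 4 pods appear but only 1 runs, because this deployment exports a node port that the first pod already occupies, so it cannot be allocated again. In theory, with 4 nodes the deployment would be scheduled across all 4 automatically.

Delete:

kubectl delete -f $GOPATH/src/github.com/kubeedge/kubeedge/build/deployment.yaml

Note: as of mid to late March 2020, kubectl logs and kubectl exec are not supported in a KubeEdge environment; officially they will be supported in the future.

IV. ARM Deployment

Deploying on ARM is very simple: cross-compile edgecore, and everything else is the same as on X86. To list the steps once more: create the directories, distribute the certificates, start mqtt, run.
Install the cross compiler:

sudo apt-get install gcc-arm-linux-gnueabihf

Set the environment variables and build: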

export GOARCH=arm
export GOOS="linux"
export GOARM=7 
export CGO_ENABLED=1
export CC=arm-linux-gnueabihf-gcc
export GO111MODULE=off
make all WHAT=edgecore

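Note: KubeEdge vendors its dependencies in the repository, so it builds directly without downloading extra packages. To be safe, GO111MODULE can be turned off temporarily.

V. Cleanup

Files used by kubeedge at run time:
1. /etc/kubeedge/: certificates and configuration files (on both cloud and edge).
2. /var/lib/kubeedge/: the cloud has the socket file kubeedge.sock, the edge has the database file edgecore.db.

To clean a kubeedge environment completely, these directories have to be deleted.

VI. Problems

Problems found during testing, and how they were solved.

1.
The cloud side prints at startup:

# ./cloudcore
[address: Invalid value: "unix:///var/lib/kubeedge/kubeedge.sock": unixSocketAddress unix:///var/lib/kubeedge/kubeedge.sock dir /var/lib/kubeedge not exist , need create it]

Solution: this directory holds the socket file and has to be created by hand:

mkdir -p /var/lib/kubeedge

2.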
The cloud cannot find the requested resource:

./cloudcore
...
github.com/kubeedge/kubeedge/cloud/pkg/synccontroller/synccontroller.go:162: Failed to list *v1alpha1.ObjectSync: the server could not find the requested resource (get objectsyncs.reliablesyncs.kubeedge.io)

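The yaml files in $GOPATH/src/github.com/kubeedge/kubeedge/build/crds/reliablesyncs were not applied; see section 2.4.

3.
Cgroup driver mismatch:

[CGroupDriver: Invalid value: "groupfs": CGroupDriver value error]

If Docker uses the systemd driver, the yaml file has to be changed to systemd; if cgroupfs is used, Docker must match as well.

4.
The host name and IP in the edge machine's configuration file must match the real host, otherwise registration fails.

5.
Node registration fails: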

create node LATE-LEE error: Node "LATE-LEE" is invalid: metadata.name: Invalid value: "LATE-LEE": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*') , register node failed

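The hostname is invalid. It may contain only lower-case letters and digits, plus - or . as the only other characters (underscore is not allowed either), and it must start and end with a lower-case letter or a digit. (Note: this is the k8s DNS naming rule.)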
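The rule in the error message can be checked before registering a node. The following is a minimal sketch of such a check, not the validation code Kubernetes itself uses; the function name is_dns1123_subdomain is made up for this example, and the 253-character limit is the maximum length Kubernetes applies to DNS-1123 subdomains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if c is a lower-case ASCII letter or a digit. */
static bool is_lower_alnum(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
}

/*
 * Checks a name against the DNS-1123 subdomain rule quoted in the error:
 * dot-separated labels, each made of [a-z0-9-] and starting and ending
 * with [a-z0-9]. (Hypothetical helper, written for illustration.)
 */
bool is_dns1123_subdomain(const char *name)
{
    assert(name != NULL);

    size_t len = strlen(name);
    if (len == 0 || len > 253)
        return false;

    bool at_label_start = true;
    for (size_t i = 0; i < len; i++) {
        char c = name[i];
        if (c == '.') {
            /* A label may not be empty and may not end with '-'. */
            if (at_label_start || name[i - 1] == '-')
                return false;
            at_label_start = true;
        } else if (c == '-') {
            /* A label may not start with '-'. */
            if (at_label_start)
                return false;
        } else if (is_lower_alnum(c)) {
            at_label_start = false;
        } else {
            return false;   /* upper case, '_' and anything else */
        }
    }
    /* The whole name must end with an alphanumeric character. */
    return is_lower_alnum(name[len - 1]);
}
```

With this check, LATE-LEE is rejected because of the upper-case letters, while late-lee and latelee.org.ttucon-2142ec pass.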

6.
Cleanup-related errors.

Failed to check the running environment: Kubelet should not running on edge node when running edgecore

This usually appears when k8s and KubeEdge are mixed on the same machine; k8s has to be removed completely. A similar problem:

Orphan pod: Orphaned pod "8685b805-a1c7-4687-8ce8-c77d24af5828" found, but volume paths are still present on disk

To run edgecore again from scratch, delete /var/lib/kubeedge/edgecore.db.

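VII. Summary

KubeEdge released 1.2.0 on February 10, 2020, nine days before this article was published. I spent a few days of the Spring Festival reading the source, learned a fair bit, and even drew flow charts in Visio and wrote down struct parameters. After getting back online and updating the code, I found it had changed heavily and had to read it all over again. I have not gone through the CHANGELOG in detail yet. To keep studying the code I deploy and run it first, then trace it through its log output, which is one of my habits when reading code.
The beehive framework directory in KubeEdge uses symbolic links, so the source tree cannot be stored on a Windows file system.

References

Official source repository: https://github.com/kubeedge/kubeedge
Official images: https://hub.docker.com/u/kubeedge
Release archives: https://github.com/kubeedge/kubeedge/releases
Official installation guide: https://docs.kubeedge.io/en/latest/setup/setup.html
KubeEdge環境を構築してみた by AWS EC2: https://qiita.com/S-dwinter/items/f1e92f21d4b23fbbba80
KubeEdge deployment: https://www.latelee.org/kubeedge/kubeedge-deploy.html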

KubeEdge LED deployment

2020-02-19T15:03:00.000Z

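This article tests KubeEdge's LED light example.

The official KubeEdge examples repository is https://github.com/kubeedge/examples . Download it into $GOPATH/src/github.com/kubeedge/ ; the directory used here is led-raspberrypi.
All my modifications are in https://github.com/latelee/kube-examples/tree/master/led-raspberrypi .

The test steps are described below.

Technical summary

General steps:
1. Write the program, build it, and adapt it to the target platforms as needed.
2. Build the image. This can be folded into the Makefile or a script together with the build.
3. Publish the image. Because the image is scheduled onto different nodes, it needs to be pushed to Docker Hub. In practice the image can also be copied to the node machines with docker commands, which is convenient while debugging. (Note: I often debug this way.)
4. Modify and create the device model.
5. Deploy with kubectl, usually as a deployment, sometimes together with other objects such as a ConfigMap.
6. Change the state and observe the real effect; read the state back and observe the feedback.

Source overview

sample-crds: the crds configuration, which specifies the target node, the GPIO number, the default LED state, and so on. Called the device model below.
configuration: configuration handling; it reads deviceProfile.json (generated at run time) and config.yaml.
light_mapper.go: the main program. It matches the crds device model and talks to the real hardware, mainly through the github.com/stianeikeland/go-rpio/ package. Since I have no hardware at hand, the GPIO code is commented out and the program serves only as an example.

Dockerfile: builds the docker image. I added an arm variant (note: with the hardware operations removed, the code runs on different platforms).
deployment.yaml: the deployment configuration, which specifies the target node, the image and the configMap (without it, the json file cannot be generated).

In theory different hardware is operated differently and therefore needs different crds to match, which is why the crds name a node. In practice, though, I expect batch cases: a batch of devices with identical hardware and function, running the same program, such as temperature collection. In that case the target nodes could be matched by node label. That is not the focus of this article.

Build

I modified the Makefile as follows: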

# make led_light_mapper
.PHONY: default led_light_mapper
led_light_mapper:
        export GOARCH=amd64; export GOOS="linux"; export GOARM=""; export CGO_ENABLED=1; export CC=cc; \
    go build light_mapper.go
        docker build -t latelee/led-light-mapper-x86:v1.1 . -f Dockerfile 
        export GOARCH=arm; export GOOS="linux"; export GOARM=7; export CGO_ENABLED=1; export CC=arm-linux-gnueabihf-gcc; \
    go build light_mapper.go
        docker build -t latelee/led-light-mapper-arm:v1.1 . -f Dockerfile-arm

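Each platform is built with its own compiler and tagged with its own docker image name. Next, push the images and merge them into a single multi-arch manifest: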


docker push latelee/led-light-mapper-x86:v1.1
docker push latelee/led-light-mapper-arm:v1.1

export DOCKER_CLI_EXPERIMENTAL=enabled

docker manifest create latelee/led-light-mapper:v1.1 latelee/led-light-mapper-x86:v1.1 latelee/led-light-mapper-arm:v1.1

docker manifest annotate latelee/led-light-mapper:v1.1 latelee/led-light-mapper-x86:v1.1 --os linux --arch amd64
docker manifest annotate latelee/led-light-mapper:v1.1 latelee/led-light-mapper-arm:v1.1 --os linux --arch arm --variant v7

docker manifest push latelee/led-light-mapper:v1.1

Creating the device model

KubeEdge already created its crds during deployment. Check them:

# kubectl get crds
NAME                                           CREATED AT
clusterobjectsyncs.reliablesyncs.kubeedge.io   2020-02-20T08:28:32Z
devicemodels.devices.kubeedge.io               2019-12-31T08:41:34Z
devices.devices.kubeedge.io                    2019-12-31T08:41:34Z
objectsyncs.reliablesyncs.kubeedge.io          2020-02-20T08:28:32Z

Now create the crds for the LED.

cd $GOPATH/src/github.com/kubeedge/examples/led-raspberrypi
cd sample-crds
vim led-light-device-instance.yaml

Edit led-light-device-instance.yaml and change the node to latelee.org.ttucon-2142ec.
Create:

# kubectl apply -f .

Check:

# kubectl get deviceModel
NAME        AGE
led-light   28h

# kubectl get device
NAME                    AGE
led-light-instance-01   28h

View the details:

kubectl describe devices.devices.kubeedge.io led-light-instance-01

Deployment

Modify deployment.yaml as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: led-light-mapper-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: led-light-mapper
  template:
    metadata:
      labels:
        app: led-light-mapper
    spec:
      nodeName: latelee.org.ttucon-2142ec #edge-node2
      hostNetwork: true
      containers:
      - name: led-light-mapper-container
        image: latelee/led-light-mapper:v1.1
        imagePullPolicy: IfNotPresent
        securityContext:
          privileged: true
        volumeMounts:
        - name: config-volume
          mountPath: /opt/kubeedge/
      volumes:
      - name: config-volume
        configMap:
          name: device-profile-config-edge-node2
      restartPolicy: Always

Deploy:

kubectl apply -f deployment.yaml

Wait for scheduling to finish.

Testing

Change the LED state value from OFF to ON:

vim led-light-device-instance.yaml

Then apply the updated configuration.

Or edit it live:

kubectl edit device led-light-instance-01

Note: changing the pin number in led-light-device-model.yaml at this point and applying it again does not take effect.

Checking

On the node machine, run docker ps to find the container, then read the json file inside it:

docker exec -it 4bc0f93a0174 cat /opt/kubeedge/deviceProfile.json

{"deviceInstances":[{"id":"led-light-instance-01","name":"led-light-instance-01","model":"led-light"}],"deviceModels":[{"name":"led-light","properties":[{"name":"power-status","dataType":"string","description":"Indicates whether the led light is ON/OFF","accessMode":"ReadWrite","defaultValue":"OFF"},{"name":"gpio-pin-number","dataType":"int","description":"Indicates whether the GPIO pin to which LED is connected","accessMode":"ReadOnly","defaultValue":18}]}],"protocols":[{"protocol_config":null}]}

View the log (with -f):

docker logs -f 4bc0f93a0174
I0318 03:54:21.558189 1 light_mapper.go:242] Watching on the device twin values for device: led-light-instance-01
I0318 03:54:22.559353 1 light_mapper.go:272] Actual values are in sync with Expected value
I0318 03:54:22.559374 1 light_mapper.go:242] Watching on the device twin values for device: led-light-instance-01
I0318 03:54:23.560669 1 light_mapper.go:272] Actual values are in sync with Expected value
I0318 03:54:23.560695 1 light_mapper.go:242] Watching on the device twin values for device: led-light-instance-01
I0318 03:54:24.561883 1 light_mapper.go:248] Expected Value : ON
I0318 03:54:24.561909 1 light_mapper.go:252] Actual Value: OFF
I0318 03:54:24.561913 1 light_mapper.go:254] Equating the actual value to expected value
I0318 03:54:24.561918 1 light_mapper.go:257] Turning ON the light
I0318 03:54:24.561922 1 light_driver.go:11] TurnON pin: 18
I0318 03:54:24.562033 1 light_mapper.go:242] Watching on the device twin values for device: led-light-instance-01
I0318 03:54:25.563141 1 light_mapper.go:248] Expected Value : ON
I0318 03:54:25.563164 1 light_mapper.go:252] Actual Value: OFF
I0318 03:54:25.563168 1 light_mapper.go:254] Equating the actual value to expected value
I0318 03:54:25.563172 1 light_mapper.go:257] Turning ON the light
I0318 03:54:25.563195 1 light_driver.go:11] TurnON pin: 18
I0318 03:54:25.563281 1 light_mapper.go:242] Watching on the device twin values for device: led-light-instance-01

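The log shows the GPIO pin being switched (the "TurnON pin: 18" lines).

Check the state from the cloud side:

kubectl get device led-light-instance-01 -oyaml -w

View the configmap details:

kubectl get cm device-profile-config-edge-node2 -oyaml

Troubleshooting

If the mapper is not deployed through KubeEdge, the container reports: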

Error while reading from config map Error while reading from config map open /opt/kubeedge/deviceProfile.json: no such file or directory

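Sometimes the same error appears even when deploying through KubeEdge; the cause is unknown.
Note: checking the configmaps showed that none had been created. This raised a question: only the deployment was applied, and it merely names a configmap, so who creates the configmap, and when?
After exchanging emails and GitHub comments with one of the project developers, I learned that KubeEdge creates the configmap automatically when the device is created. Something may have gone wrong so that the creation failed. Others online have run into the same problem.

Workaround (not the proper route): using the json content above, create a file named deviceProfile.json (the name must be exactly this, because the LED program uses this file name) and paste the content in. Note that the json must stay a single-line string and must not be pretty-printed.

kubectl create configmap led-config --from-file=deviceProfile.json

Then check it with kubectl get cm -oyaml.

Testing also showed that a docker image loaded from an exported archive disappears after a while; when that happens, the pod stays stuck in ContainerCreating.

It appears that the automatically created configmap has the same name for every device on the same node.
Test: create the LED device model first, then the temp device model. The result: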

kubectl describe cm device-profile-config-latelee1

Name: device-profile-config-latelee1
Namespace: default
Labels:
Annotations:

Data

deviceProfile.json:

{"deviceInstances":[{"id":"led-light-instance-01","name":"led-light-instance-01","model":"led-light"},{"id":"temperature1","name":"temperature1","model":"temperature-model"}],"deviceModels":[{"name":"led-light","properties":[{"name":"power-status","dataType":"string","description":"Indicates whether the led light is ON/OFF","accessMode":"ReadWrite","defaultValue":"OFF"},{"name":"gpio-pin-number","dataType":"int","description":"Indicates whether the GPIO pin to which LED is connected","accessMode":"ReadOnly","defaultValue":168}]},{"name":"temperature-model","properties":[{"name":"temperature-status","dataType":"string","description":"Temperature collected from the edge device","accessMode":"ReadOnly","defaultValue":""}]}],"protocols":[{"protocol_config":null},{"protocol_config":null}]}
Events:

Commands used along the way:

docker save -o led.docker latelee/led-light-mapper-x86:v1.1
docker load -i led.docker
kubectl delete -f deployment.yaml
kubectl apply -f deployment.yaml
docker save -o temp.docker latelee/temp-mapper-x86
docker load -i temp.docker
kubectl delete pod --all --force --grace-period=0
kubectl get device temperature -oyaml -w
kubectl create configmap led-config --from-file=device.json

Container output:

I0419 03:05:04.308734 1 led_demo.go:146] read configmap ok
I0419 03:05:04.308818 1 led_demo.go:148] device: configuration.DeviceInstance{ID:"led-demo-instance-01", Name:"led-demo-instance-01", Protocol:"", Model:"led-demo"}
I0419 03:05:04.308858 1 led_demo.go:151] device id: led-demo-instance-01
I0419 03:05:04.308871 1 led_demo.go:170] Changing the state of the device to online 111
I0419 03:05:35.354202 1 led_demo.go:250] Expected Value : ON
I0419 03:05:35.354246 1 led_demo.go:254] Actual Value: OFF
I0419 03:05:35.354278 1 led_demo.go:256] Equating the actual value to expected value
I0419 03:05:35.354297 1 led_demo.go:259] Turning ON the light
I0419 03:05:36.356098 1 led_demo.go:250] Expected Value : ON
I0419 03:05:36.356130 1 led_demo.go:254] Actual Value: OFF
I0419 03:05:36.356141 1 led_demo.go:256] Equating the actual value to expected value
I0419 03:05:36.356154 1 led_demo.go:259] Turning ON the light

KubeEdge first test

2020-02-19T15:03:00.000Z

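This article runs practical tests on a cluster where KubeEdge has already been deployed successfully. The goal is to see where KubeEdge and k8s behave the same and where they differ. It targets version 1.2.

Some notes

Because KubeEdge implements part of kubelet's functionality in edgecore, the two should in theory fit together seamlessly.
All tests use a single image, registry.cn-hangzhou.aliyuncs.com/latelee/webgin, which serves a web page reporting the CPU architecture, OS and hostname it runs on. I published it with docker manifest so that each CPU architecture pulls its own image; the yaml files can therefore use one image name that resolves automatically on every platform.

The webgin image has several versions: v1.0, v1.1 and v1.2.

Check the cluster from the master node: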

# kubectl get node
NAME                        STATUS   ROLES     AGE    VERSION
edge-node                   Ready    edge      9d     v1.17.1-kubeedge-v1.2.1-dirty
edge-node2                  Ready    k8snode   105m   v1.17.0
latelee.org.ttucon-2142ec   Ready    edge      9d     v1.17.1-kubeedge-v1.2.1-dirty
ubuntu                      Ready    master    9d     v1.17.4

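Here edge-node2 is a k8s node, version v1.17.0. edge-node and latelee.org.ttucon-2142ec are KubeEdge edge nodes; the latter is an arm board.

KubeEdge already created its crds during deployment. Check them (they play little part in this article):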

# kubectl get crds
NAME                                           CREATED AT
clusterobjectsyncs.reliablesyncs.kubeedge.io   2020-03-17T06:45:08Z
devicemodels.devices.kubeedge.io               2020-03-17T06:44:50Z
devices.devices.kubeedge.io                    2020-03-17T06:44:55Z
objectsyncs.reliablesyncs.kubeedge.io          2020-03-17T06:45:08Z

测试

The test yaml file, webgin-service.yaml, is as follows:

apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: webgin-deployment
  labels:
   app: webgin
spec:
  replicas: 3 # tells deployment to run 3 pods matching the template
  selector:
    matchLabels:
      app: webgin
  template:
    metadata:
      labels:
        app: webgin
    spec:
      containers:
      - name: webgin
        image: registry.cn-hangzhou.aliyuncs.com/latelee/webgin:v1.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: /etc/localtime
          name: time-zone
      volumes:
      - name: time-zone
        hostPath: 
          path: /etc/localtime
      hostNetwork: true
---

apiVersion: v1
kind: Service # declare a Service
metadata:
  labels:
    run: webgin
  name: webgin
  namespace: default
spec:
  ports:
  - port: 10080 # externally exposed port (the output captured below used 88)
    targetPort: 80
  selector:
    app: webgin
  type: LoadBalancer

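Explanation: a deployment plus service combination with 3 replicas (there are 3 node machines), running in hostNetwork mode. The localtime file is mounted so that the output shows the real local time.

Create the deployment on the master node:

kubectl apply -f webgin-service.yaml

List the pods: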

# kubectl get pod -owide
NAME                                 READY   STATUS    RESTARTS   AGE    IP              NODE                        NOMINATED NODE   READINESS GATES
webgin-deployment-57cbd68f7f-f8hq4   1/1     Running   0          5m2s   192.168.0.153   edge-node2                             
webgin-deployment-57cbd68f7f-ktmfq   1/1     Running   0          5m2s   192.168.0.220   latelee.org.ttucon-2142ec              
webgin-deployment-57cbd68f7f-xt7hf   1/1     Running   0          5m2s             edge-node

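The pods on all three nodes are running. The last one shows no IP, however, and that node never appears in the responses fetched below.

List the services: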

# kubectl get svc
NAME         TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP      10.96.0.1              443/TCP        9d
webgin       LoadBalancer   10.97.98.218        88:31059/TCP   12m

Access the web service:

root@ubuntu:mytest# curl 10.97.98.218:88
Hello World v1.1
arch: arm os: linux hostname: latelee.org.ttucon-2142ec
uname: Linux latelee.org.ttucon-2142ec 4.14.67 #7 SMP PREEMPT Fri Feb 28 11:35:26 CST 2020 armv7l
Now: 2020-03-27 13:13:10

root@ubuntu:mytest# 
root@ubuntu:mytest# curl 10.97.98.218:88
Hello World v1.1
arch: amd64 os: linux hostname: edge-node2
uname: Linux edge-node2 4.4.0-174-generic #204-Ubuntu SMP Wed Jan 29 06:41:01 UTC 2020 x86_64
Now: 2020-03-27 13:13:21

root@ubuntu:mytest# 
root@ubuntu:mytest# curl 10.97.98.218:88
Hello World v1.1
arch: arm os: linux hostname: latelee.org.ttucon-2142ec
uname: Linux latelee.org.ttucon-2142ec 4.14.67 #7 SMP PREEMPT Fri Feb 28 11:35:26 CST 2020 armv7l
Now: 2020-03-27 13:13:29

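Result: of the three nodes, only two (on different platforms) ever answer, and repeated requests land on different ones. In theory all three should appear.

Other tests

Compare the kubectl describe command.

k8s: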
# kubectl describe pod webgin-deployment-57cbd68f7f-f8hq4
Events:
  Type    Reason     Age    From                 Message
  ----    ------     ----   ----                 -------
  Normal  Scheduled  9m3s   default-scheduler    Successfully assigned default/webgin-deployment-57cbd68f7f-f8hq4 to edge-node2
  Normal  Pulling    9m3s   kubelet, edge-node2  Pulling image "registry.cn-hangzhou.aliyuncs.com/latelee/webgin:v1.1"
  Normal  Pulled     8m57s  kubelet, edge-node2  Successfully pulled image "registry.cn-hangzhou.aliyuncs.com/latelee/webgin:v1.1"
  Normal  Created    8m57s  kubelet, edge-node2  Created container webgin
  Normal  Started    8m57s  kubelet, edge-node2  Started container webgin

KubeEdge：
# kubectl describe pod webgin-deployment-57cbd68f7f-ktmfq

Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  9m25s  default-scheduler  Successfully assigned default/webgin-deployment-57cbd68f7f-ktmfq to latelee.org.ttucon-2142ec

Result: the k8s output is more complete.

Compare the kubectl logs command.

k8s:
# kubectl logs webgin-deployment-57cbd68f7f-f8hq4
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:   export GIN_MODE=release
 - using code:  gin.SetMode(gin.ReleaseMode)

[GIN-debug] GET    /                         --> main.myIndex (3 handlers)
gin server start...
[GIN-debug] Listening and serving HTTP on :80
[GIN] 2020/03/27 - 13:13:21 | 200 |     171.926µs |   192.168.0.102 | GET      "/"

KubeEdge：
# kubectl logs webgin-deployment-57cbd68f7f-ktmfq
Error from server: Get https://192.168.0.220:10250/containerLogs/default/webgin-deployment-57cbd68f7f-ktmfq/webgin: dial tcp 192.168.0.220:10250: connect: connection refuse

Result: KubeEdge does not support this command.

Compare the kubectl exec command.

k8s:
# kubectl exec -it webgin-deployment-57cbd68f7f-f8hq4 -- uname -a
Linux edge-node2 4.4.0-174-generic #204-Ubuntu SMP Wed Jan 29 06:41:01 UTC 2020 x86_64 GNU/Linux

KubeEdge：
# kubectl exec -it webgin-deployment-57cbd68f7f-ktmfq -- uname -a
Error from server: error dialing backend: dial tcp 192.168.0.220:10250: connect: connection refused

Result: KubeEdge does not support this command.

For hostNetwork mode, check the container's IP on the arm node (only the main content is shown):

# docker exec -it 71605a5e17a3 ifconfig
docker0   Link encap:Ethernet  HWaddr 02:42:00:00:00:94  
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0

eth0      Link encap:Ethernet  HWaddr 4C:00:00:00:00:EC  
          inet addr:192.168.0.220  Bcast:192.168.0.255  Mask:255.255.255.0

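On the arm node, no network device with a name like veth216ffbc7 was created.

Upgrade

A brief description only.
1.
Edit the yaml file and change the image version, i.e. change registry.cn-hangzhou.aliyuncs.com/latelee/webgin:v1.1 to registry.cn-hangzhou.aliyuncs.com/latelee/webgin:v1.2.
Then apply it: kubectl apply -f webgin-service.yaml. Several attempts all failed, leaving one pod Pending and one Running.

2. Change the image from the command line

kubectl set image deployment webgin-deployment webgin=registry.cn-hangzhou.aliyuncs.com/latelee/webgin:v1.2

Over several attempts this sometimes succeeded and sometimes failed. On failure, one pod was Pending and one Terminating. (Note: it succeeds on the k8s node.)

Check the rollout status:

kubectl rollout status deployment webgin-deployment

Add the following under spec: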

minReadySeconds: 5
strategy:
# indicate which strategy we want for rolling update
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 1

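With this change, applying again works.

View the revision history:

kubectl rollout history deployment webgin-deployment

Roll back to revision 2 (2 is the second upgrade recorded by k8s):

kubectl rollout undo deployment webgin-deployment --to-revision=2

Another case tried: upgrade to an image that does not exist, that is, make the upgrade fail on purpose.

Applying with kubectl apply -f webgin-service.yaml --record=true
records the revision information, though it does not seem very useful: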

# kubectl rollout history deployment webgin-deployment
deployment.apps/webgin-deployment 
REVISION  CHANGE-CAUSE
1         kubectl apply --filename=webgin-service.yaml --record=true
2         kubectl apply --filename=webgin-service.yaml --record=true
3         kubectl apply --filename=webgin-service.yaml --record=true

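Open question: how can revisions be maintained more conveniently, so that one can tell what revision 1 changed and what revision 2 changed?

Conclusion

The two still differ: some commands are not supported, and communication is unstable.

The images used in this article really exist but may be updated at any time. Everything described here is simply what I observed in my own tests and may not generalize.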

USB study: using the hidapi library

2020-02-12T15:04:00.000Z

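hidapi is an open-source library for working with HID devices. It is written in C and runs on Windows, Linux and Mac OSX. Note that it targets HID devices; it is not necessarily suitable for other USB devices such as flash drives.

Introduction to hidapi

The hidapi source repository is https://github.com/libusb/hidapi . The main directories are:

hidapi: the header file hidapi.h (one header shared by all platforms)
libusb: Linux implementation hid.c, built on the libusb library
linux: Linux implementation hid.c, built on the kernel interface
windows: Windows implementation hid.c
mac: Mac OSX implementation hid.c
hidtest: test code hidtest.c

Note that Linux has two implementations, each with its own strengths; choose according to your needs. I use the kernel-interface one.

Cross-platform use

hidapi is implemented separately for each platform, but the public header is the same, so you simply use the implementation file for your platform. On Windows the project recommends using the dll, but I find that compiling the source file in directly works better, because you can step into the source (handy while debugging).
If your own project must be cross-platform, macros such as _WIN32 or __linux__ can distinguish the platform and select the right source file.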

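Main interfaces

This article does not walk through the test example in detail. Based on my own experience, it lists the main interfaces in the order they are used.

Initialization and exit:
1. hid_init: initialization, no parameters. Calling it is optional, because later interfaces check and call it automatically.
2. hid_exit: exit; it frees the library's internal structures. Not calling it leaks memory.

Enumeration:
1. hid_enumerate: enumerates devices and returns a linked list of hid_device_info. hid_enumerate(0, 0) enumerates all devices. Enumeration is typically used to obtain a device ID or device path; if you already know them, you can skip it.
2. hid_free_enumeration: frees the list used by enumeration.

Opening and closing a device:
1. hid_open: opens the device with the given VID and PID and returns a device handle pointer, e.g. hid_device *handle; handle = hid_open(0x4d8, 0x3f, NULL). The read, write and close functions all take this pointer.
2. hid_open_path: opens a device by path, as obtained from hid_enumerate, for example /dev/hidraw0 on Linux.
3. hid_close: closes the device.

Feature report transfer:
1. hid_get_feature_report: gets a feature report.
2. hid_send_feature_report: sends a feature report.

Read and write:
1. hid_read: reads data (an Input report).
2. hid_write: writes data (an Output report).

Error retrieval:
hid_error: returns the last error as a wchar_t string. Note that the caller must not free this string.

Others, such as hid_get_manufacturer_string and hid_get_product_string, seem less important and are not listed.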

Using it on Windows

On Windows the setupapi.lib library must be linked. One way to add it:


#ifdef _WIN32
#pragma comment (lib,"setupapi.lib")
#endif

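Alternatively, add the library under Project Properties -> Linker -> Input, in the additional dependencies.

Using it in an MFC project

Because hidapi is written in C, the source file's extension must be changed from .c to .cpp (hidapi.c to hidapi.cpp here), and the following include added at the very top of that file:

#include "stdafx.h"

setupapi.lib is needed here as well.

Using it in a Qt project on Windows

Add the hidapi files, then add the dependency to the .pro project file:

win32: LIBS += -lsetupapi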

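Development notes

Almost every introduction to hidapi online is based on the official test code, and so is this article. This section adds some experience from my own development work.
1. Some USB HID devices show up in the system as several devices. In that case the PID and VID are identical, so hid_open cannot tell them apart, and hid_open_path must be used to open the right one.

2. Some devices fail to open in read/write mode, in which case the library falls back to opening them read-only; see the hid_open_path function in hid.c. That does not mean the device cannot be written to, at least not in the case I ran into. To verify this I tested on Windows 7 and on Linux and spent about three evenings debugging and experimenting before reaching this conclusion. The library source does not need to be changed.

3. Generally speaking, feature reports are used much like the command channel of a protocol (a command may of course carry data), while plain read and write move bulk data. What actually happens depends on the device firmware. For example, a firmware update can be implemented over feature reports.

4. The first byte of a feature report must be the report ID, and it must match the one used by the device firmware, otherwise no transfer takes place; from hidapi's point of view the read or write simply fails. The transferred data must also be correct.
5. When sending a feature report, one extra byte for the ID must be added at position 0, otherwise the call fails. When receiving, the real data follows the ID, so skip the first byte.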
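Points 4 and 5 can be sketched as a small helper that lays out the buffer before it is handed to hid_send_feature_report. This is an illustration under assumptions, not code from hidapi: the 32-byte payload size is specific to one device and will differ elsewhere, and the names build_feature_report and feature_report_data are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REPORT_PAYLOAD_LEN 32                       /* assumed, device-specific */
#define REPORT_BUF_LEN     (REPORT_PAYLOAD_LEN + 1) /* plus the report ID byte */

/*
 * Fills buf (REPORT_BUF_LEN bytes) as [report ID][payload][zero padding].
 * Returns the number of bytes to send, or 0 if the payload does not fit.
 */
size_t build_feature_report(uint8_t buf[REPORT_BUF_LEN], uint8_t report_id,
                            const uint8_t *payload, size_t payload_len)
{
    assert(buf != NULL);
    if (payload_len > REPORT_PAYLOAD_LEN || (payload == NULL && payload_len > 0))
        return 0;

    memset(buf, 0, REPORT_BUF_LEN);
    buf[0] = report_id;                 /* the ID always sits at index 0 */
    if (payload_len > 0)
        memcpy(buf + 1, payload, payload_len);
    return REPORT_BUF_LEN;              /* always the full, fixed length */
}

/* Returns a pointer to the data of a received report, skipping the ID byte. */
const uint8_t *feature_report_data(const uint8_t *buf)
{
    assert(buf != NULL);
    return buf + 1;
}
```

The returned length is what would be passed as the length argument of hid_send_feature_report. For hid_get_feature_report, the caller sets buf[0] to the report ID before the call and reads the data starting at feature_report_data(buf).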

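Summary

This article is only a brief look at hidapi with some testing. Its interfaces are friendly and cover the needs of a typical project.
The test code has been uploaded to a GitHub repository: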

USB HID study: some development notes

2020-02-12T14:00:00.000Z

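One day a former colleague contacted me about a USB project. Knowing that I had left my job and was looking for work, he passed it on and introduced the client to me.

After understanding the requirements I did a quick analysis. The job was mainly a PC-side program that talks to a USB device. I know MFC well; for USB I found an open-source, cross-platform library, hidapi, downloaded it, wrote some code, and managed to read information from a mouse. That convinced me the job was feasible, so I accepted. After digging further into the requirements, I judged the hard parts to be MFC drawing and the HID protocol exchange. We agreed to deliver in phases, with payment in phases as well.

Phase one

I started with the UI, produced two versions and sent them to the client, who picked one. Meanwhile the client shipped the hardware.
Before the board arrived I finished the main window and the main layout, and looked into launching an MFC program at boot, the system tray, and similar features.

Once the board arrived I started on HID communication. The client sent a tool that reads parameters and communicates with the device normally, but my own code could not read them. I went back to the HID protocol, installed Bus Hound to capture USB packets, and compared the packets against the protocol, which taught me a bit about HID. Stepping through the hidapi source showed an error when opening the USB device. Specifically, enumeration opens the device in "read mode" (in the code, desired access 0); the real open then tries read/write, which failed with ERROR_ACCESS_DENIED (error code 5L), so the library fell back to "read mode", which succeeded. I therefore suspected the failed read/write open was the cause. Posts online claimed that Windows 10 does not allow HID devices to be opened read/write, so I switched to a Windows 7 virtual machine: same result. On Linux, the same code run as root worked. For a while I was stuck.

After several days without progress I told the client about the difficulty. The client dug up some C# code. Judging by the timestamps in it, it was fairly old, but I ran it anyway and it worked. In other words, on the same Windows 10 system it is possible to communicate with the HID device normally (the earlier tool reading parameters had already proved as much). Analyzing the C# code showed that my problem was in sending and receiving feature reports. A feature report has a fixed size; as far as I know it is 32 bytes for this device (with hidapi and similar code, one extra byte for the ID must be added, 33 bytes in total). If the length does not match, the call fails with error code 87L (ERROR_INVALID_PARAMETER). After fixing the length everything worked, and the problem was solved.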

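Next came writing data over USB to the flash on the board, which also ran into trouble. Bus Hound showed that part of the written data did not match the binary of the file being programmed. After a long search I found that the range of the offset variable in my code was the problem. A feature report carries only 32 bytes and each write covers 512 bytes, so the code loops several times. The offset variable was a uint8, whose range tops out at 255, so it wrapped around and the 512 bytes ended up repeating earlier content (in my case the first 128 bytes) instead of the intended data. With wrong content the write naturally failed. After the fix everything worked.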
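The wrap-around is easy to reproduce in isolation. The sketch below is not the project code; it only assumes 32-byte chunks and a running byte offset, and the function names are made up. How much data repeats in a real program depends on how the offset is used there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_LEN 32    /* bytes carried by one feature report */

/* Buggy version: the running offset is kept in 8 bits and wraps at 256. */
size_t chunk_offset_u8(size_t chunk_index)
{
    uint8_t offset = 0;
    for (size_t i = 0; i < chunk_index; i++)
        offset = (uint8_t)(offset + CHUNK_LEN);     /* 224 + 32 wraps to 0 */
    return offset;
}

/* Fixed version: a wide offset covers the whole 512-byte block. */
size_t chunk_offset(size_t chunk_index)
{
    assert(chunk_index <= SIZE_MAX / CHUNK_LEN);    /* no overflow */
    return chunk_index * CHUNK_LEN;
}
```

For chunk 8 the fixed version yields offset 256, while the 8-bit version yields 0, so the second half of a 512-byte block would be read from the start of the buffer again.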

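That ended phase one: UI layout, language switching and flash operations were all done, three weeks from first contact to completion. To leave a good impression on the client I pushed the schedule, usually working until 1 or 2 a.m. and through most of the weekends. Although I was at home without a job, looking after a child, cooking and shopping took a lot of time too. Fortunately the client paid on schedule, which eased the immediate pressure.

Later phases

The later phases held no real technical difficulty. What took time was the uncertainty of the requirements. Since I knew little about the background and the industry, requirements that the client described as very simple were often not simple for me, and it took many rounds of communication.
The client asked for Windows XP and Windows 7 support. This had not been confirmed up front, so I had chosen VS 2015 for development. Testing showed the program would not run on XP, so I reported back, and we finally agreed that XP support was not needed.
The calibration process has to be displayed, so a cursor must be drawn and made to blink, with up to 9 points. I first came across the tslib library about ten years ago, and seeing tslib's crosshair icon on GitHub felt very familiar. I reproduced exactly the same pattern in MFC. I am not sure whether that counts as copying or as a tribute.
The other drawing task was the coordinate and bar chart display. However I tried, I could not find a way to zoom with the mouse. That is also why I started learning Qt, to see what another GUI framework offers. But that is another topic.

Program design

I value simplicity, so in principle the final program is a single exe with no other files to depend on (system-level dlls aside). The hidapi source file is therefore added directly to the project and wrapped in a class. For switching between Chinese and English I first considered po files for translation, but that proved too complicated in practice: language files must be written and then compiled, so I dropped it. In the end I took the crude route: the UI is designed in Chinese and the text is switched in code. Flags such as the language setting and launch-at-boot are stored in the registry.
The project uses the following techniques: HID data read/write, USB plug/unplug detection, parent/child window communication, launch at boot, system tray, registry read/write, and custom control drawing.

Problem summary

1. Opening the HID device returns ERROR_ACCESS_DENIED (error code 5L).
The cause is that Windows considers that mice and keyboards should not be opened in read/write mode (what really matters is presumably "write"). For safety it therefore provides no write access. The hidapi author mentions this in a GitHub issue as well.

The function that opens the device is open_device: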

static HANDLE open_device(const char *path, BOOL open_rw)
{
    HANDLE handle;
    DWORD desired_access = (open_rw)? (GENERIC_WRITE | GENERIC_READ): 0;
    DWORD share_mode = FILE_SHARE_READ|FILE_SHARE_WRITE;

    handle = CreateFileA(path,
        desired_access,
        share_mode,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED,/*FILE_ATTRIBUTE_NORMAL,*/
        0);

    return handle;
}

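It ultimately calls CreateFileA with two inputs that matter: the path and the access rights (read/write or not). During enumeration no read/write access is requested, i.e. the second argument of CreateFileA, dwDesiredAccess, is 0. For the real open, read/write is tried first (this usually fails), then 0, which succeeds. I went down a dead end believing that sending data requires a read/write open. The key statement:

desired_access = (open_rw)? (GENERIC_WRITE | GENERIC_READ): 0;

I spent a long time on this and also tried STANDARD_RIGHTS_READ and STANDARD_RIGHTS_WRITE, without success.
Later I found a forum post about a similar problem. The problem it describes looks similar but is different in nature: in the post, one USB device was recognized as two devices, so the HID class one could be picked out by its path. The poster did not succeed, though, and since the symptoms differed I did not read the post closely (I tend to skim English material).

2. Sending feature reports.
Sending a report failed in hid_send_feature_report, i.e. in HidD_SetFeature; GetLastError returned 87L (ERROR_INVALID_PARAMETER). After changing the report size to 0x20+1 it succeeded, and Bus Hound captured the packet. (From the Bus Hound traces and some packet analyses online, where the length is 0x20, together with ERROR_INVALID_PARAMETER, I guessed that the length might be the problem. Changing it worked.)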

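Lessons learned

At the start, build the overall framework first and refine it gradually: outline first, details later. With the whole picture in mind you need not fear spending time on changes. Otherwise you end up waiting for some prerequisite resource, and if it does not arrive you can only wait while everything stays theoretical. Also, once the overall structure is in place, requirements need not be implemented in order. Switching between them and staggering them lets the mind work on problems in the background during idle time. Put simply, when you hit an obstacle, skip it; a solution may come to you a few days later. The same goes for other work, such as writing a book.

For structs, bit shifts, CRC calculations and the like, the width of variables needs to be fixed. For lengths, return values, offsets and so on, just use int; in PC development there is no need to save bytes. I did not pay enough attention to this during development, and it cost me some time.

As for requirements, nobody can guarantee they are accurate and complete from the start; they are filled in gradually. Sometimes the client does not know which features are needed or what the result should look like. Then we can guide the client, or even finish a first version our own way and let the client evaluate it. If we wait for the client and the client waits for us, time is wasted and the project gains nothing. If nobody proposes a plan, propose one yourself; if nobody gives feedback, go with your own judgment. That is how I prefer to work.

On this project I applied earlier lessons: start from the actual problems, aim for speed, and do not wander off to explore technology. Finish first, then review and summarize. After many years of programming I have my own rules for development. As I see it now, development must have git version control, a coding style with comments, and short development notes. Without them I do not feel at ease.

Project summary

To be honest, looking back, the initial assessment was somewhat risky. I had only used hidapi to read information from a mouse and had not tested writing (note: I tried writing data to the mouse, and it failed). So the very first hardware debugging session hit a major problem, and for a while I thought the job would fail. Fortunately the client supplied the C# code, and fortunately I had a little C# background; after tracing and debugging, the problem was finally solved. Along the way I read a lot of material online and studied the hidapi source.
In any case, going from the tension and worry during development to the relief now, there is some sense of achievement, because I gained experience in a new area. Apart from the ill effects of the late nights, everything else was good.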

USB HID study: detecting USB plug/unplug events in MFC

2020-02-11T15:00:00.000Z

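MFC has a message for device changes, but it has to be added by hand. For USB devices, the corresponding GUID must also be registered. This article is a brief record of that.
It omits the description of MFC mechanisms and covers only the main pieces of code.

I. Steps

Including the Dbt.h header

Add the Dbt.h include to stdafx.h (or another suitable header):

#include <Dbt.h>

Registering the USB device GUID

Register in the dialog initialization function: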

BOOL CFooDlg::OnInitDialog()
{
    CDialogEx::OnInitDialog();

    // Add the "About..." menu item to the system menu.

    // ...

    // Register for HID events
    DEV_BROADCAST_DEVICEINTERFACE DevBroadcastDeviceInterface;

    memset(&DevBroadcastDeviceInterface, 0, sizeof(DEV_BROADCAST_DEVICEINTERFACE));
    DevBroadcastDeviceInterface.dbcc_size = sizeof(DEV_BROADCAST_DEVICEINTERFACE);
    DevBroadcastDeviceInterface.dbcc_devicetype = DBT_DEVTYP_DEVICEINTERFACE;
    // GUID of the HID device as shown in Device Manager:
    // {745a17a0-74d3-11d0-b6fe-00a0c90f57da}
    // Note: registering that GUID detects nothing, while the other GUIDs work
    // (the USB device GUID reports every USB event)
    const GUID GUID_DEVINTERFACE_LIST[] = {
        { 0xA5DCBF10, 0x6530, 0x11D2,{ 0x90, 0x1F, 0x00, 0xC0, 0x4F, 0xB9, 0x51, 0xED } }, // USB devices
        { 0x53f56307, 0xb6bf, 0x11d0,{ 0x94, 0xf2, 0x00, 0xa0, 0xc9, 0x1e, 0xfb, 0x8b } }, // disks (USB flash drives)
        { 0x4D1E55B2, 0xF16F, 0x11CF,{ 0x88, 0xCB, 0x00, 0x11, 0x11, 0x00, 0x00, 0x30 } }, // HID
        { 0x745A17A0, 0x74D3, 0x11D0,{ 0xB6, 0xFE, 0x00, 0xA0, 0xC9, 0x0F, 0x57, 0xDA } }, // another HID GUID
        { 0xad498944, 0x762f, 0x11d0,{ 0x8d, 0xcb, 0x00, 0xc0, 0x4f, 0xc3, 0x35, 0x8c } } }; // network adapters
    // All listed GUIDs could be registered in a loop; only one is used here
    DevBroadcastDeviceInterface.dbcc_classguid = GUID_DEVINTERFACE_LIST[2];

    RegisterDeviceNotification(this->GetSafeHwnd(), &DevBroadcastDeviceInterface, DEVICE_NOTIFY_WINDOW_HANDLE);

    return TRUE;
}

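Note 1: different kinds of USB devices are identified by different GUIDs. Registration must name the class to detect. This article targets HID; interested readers can try the others.
Note 2: my keyboard exposes several USB devices, one of which is an HID device. Device Manager shows its class GUID as 745a17a0-74d3-11d0-b6fe-00a0c90f57da.
Note 3: the GUID shown there and the GUID struct in the code are the same thing in different forms. See the struct definition for details.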
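To make Note 3 concrete, the sketch below converts the textual form shown in Device Manager into the four-field layout used by the struct initializers. It is a portable illustration only: guid_t mirrors the Windows GUID layout but is redefined here, guid_from_string is a made-up helper, and the parsing is lenient (it does not insist on exact field widths).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Same field layout as the Windows GUID struct, redefined for portability. */
typedef struct {
    uint32_t data1;
    uint16_t data2;
    uint16_t data3;
    uint8_t  data4[8];
} guid_t;

/*
 * Parses "{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}" (braces optional).
 * Returns 1 on success, 0 on malformed input.
 */
int guid_from_string(const char *text, guid_t *out)
{
    unsigned int d1, d2, d3, b[8];
    assert(text != NULL && out != NULL);

    if (*text == '{')
        text++;
    /* The fourth group supplies data4[0..1], the fifth data4[2..7]. */
    if (sscanf(text, "%8x-%4x-%4x-%2x%2x-%2x%2x%2x%2x%2x%2x",
               &d1, &d2, &d3, &b[0], &b[1], &b[2], &b[3],
               &b[4], &b[5], &b[6], &b[7]) != 11)
        return 0;

    out->data1 = (uint32_t)d1;
    out->data2 = (uint16_t)d2;
    out->data3 = (uint16_t)d3;
    for (int i = 0; i < 8; i++)
        out->data4[i] = (uint8_t)b[i];
    return 1;
}
```

Parsing {745a17a0-74d3-11d0-b6fe-00a0c90f57da} this way gives the same values as the initializer { 0x745A17A0, 0x74D3, 0x11D0,{ 0xB6, 0xFE, 0x00, 0xA0, 0xC9, 0x0F, 0x57, 0xDA } } in the list above.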

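Declaring the message handler

Declare the handler in the dialog header file:

afx_msg BOOL OnDeviceChange(UINT nEventType, DWORD dwData);

Declaring the message

Add the ON_WM_DEVICECHANGE message in the dialog implementation file: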

BEGIN_MESSAGE_MAP(CFooDlg, CDialogEx)
    ON_WM_SYSCOMMAND()
    ON_WM_PAINT()
    ON_WM_QUERYDRAGICON()
    ON_MESSAGE(WM_SHOWTASK, OnSystemtray)
    // ...
    ON_WM_SIZE()
    ON_WM_DESTROY()
    ON_WM_DEVICECHANGE() // USB HID device detection message
END_MESSAGE_MAP()

Implementing the message handler

The OnDeviceChange function is implemented as follows:

BOOL CFooDlg::OnDeviceChange(UINT nEventType, DWORD dwData)
{
    DEV_BROADCAST_DEVICEINTERFACE* dbd = (DEV_BROADCAST_DEVICEINTERFACE*)dwData;
    
    wchar_t vid[32] = { 0 };
    int offset = 4 * 3 + sizeof(GUID)+10;
    CString szInfo;
    int sendtype = 0;
    switch (nEventType)
    {
    case DBT_DEVICEARRIVAL:
    {
        memcpy(vid, (char*)dwData + offset, 32);
        wchar_t* rr = wcsstr(vid, L"VID_AA55"); // !! filters on a specific device ID, same below
        if (rr == NULL)
        {
            return FALSE;
        }

        szInfo.Format(L"Info: device inserted.\n");

        this->GetDlgItem(IDC_STC_DEVINFO)->SetWindowText(szInfo);

    }
    break;
    case DBT_DEVICEREMOVECOMPLETE:
    {
        // Note: dbd->dbcc_name is declared with a single element, so it is not
        // used as the copy source here; a fixed offset is used instead, as above
        //wmemcpy(vid, (wchar_t*)dwData + offset, 32);
        memcpy(vid, (char*)dwData + offset, 32);
        wchar_t* rr = wcsstr(vid, L"VID_AA55");
        if (rr == NULL)
        {
            return FALSE;
        }

        szInfo.Format(L"Info: device removed.\n");

        this->GetDlgItem(IDC_STC_DEVINFO)->SetWindowText(szInfo);

    }
    break;

    default:
    {
        //szInfo.Format(L"[%d]got event: %d\n", cnt, nEventType);
        //this->GetDlgItem(IDC_STC_DEVINFO)->SetWindowText(szInfo);
    }
    break;
    }

    return TRUE;
}

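Note 1: only for registered devices does nEventType take the values DBT_DEVICEARRIVAL and DBT_DEVICEREMOVECOMPLETE (there are other values too, not covered here). Without registration, nEventType is 7 (DBT_DEVNODES_CHANGED).
Note 2: some sources say that dwData differs depending on the value of nEventType. This article did not look into that further.
Note 3: to target one particular device, for example an HID device from a specific vendor, filter by searching for the VID keyword. The code here does this with a fixed offset plus a string search. It is only an example and somewhat roundabout, but it works.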
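The filtering idea in Note 3 can also be written as a small parser over the device path string. The sketch below is a portable illustration with narrow characters, not the MFC handler itself: device_path_id is a made-up helper, and the path used in the usage note is a hypothetical example of what such a string can look like. In the real handler the path is the null-terminated string that begins at the dbcc_name member of DEV_BROADCAST_DEVICEINTERFACE, wide in a Unicode build, and its letter case may vary.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extracts the 4-digit hex value that follows key (e.g. "VID_" or "PID_")
 * in a device interface path. Returns the value, or -1 if the key is absent
 * or is not followed by a hex number. The key match is case-sensitive.
 */
long device_path_id(const char *path, const char *key)
{
    assert(path != NULL && key != NULL);

    const char *p = strstr(path, key);
    if (p == NULL)
        return -1;
    p += strlen(key);

    char digits[5] = {0};
    strncpy(digits, p, 4);      /* copy at most the 4 ID digits */

    char *end = NULL;
    long value = strtol(digits, &end, 16);
    if (end == digits)          /* no hex digit was consumed */
        return -1;
    return value;
}
```

For a hypothetical path such as \\?\HID#VID_AA55&PID_0001#..., device_path_id(path, "VID_") returns 0xAA55, which can then be compared against the vendor ID of interest.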

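Later, when building with VS 2015 on Windows 7, ON_WM_DEVICECHANGE produced an error. Investigation traced it to the declaration

afx_msg BOOL OnDeviceChange(UINT nEventType, DWORD dwData);

being incompatible. Changing it to:

afx_msg BOOL OnDeviceChange(UINT nEventType, DWORD_PTR dwData);

fixes the build, and it also compiles fine on Win10.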

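II. Testing

Using

0xA5DCBF10, 0x6530, 0x11D2,{ 0x90, 0x1F, 0x00, 0xC0, 0x4F, 0xB9, 0x51, 0xED }

detects events from all USB devices, including flash drives, keyboards and so on.
Using

0x53f56307, 0xb6bf, 0x11d0,{ 0x94, 0xf2, 0x00, 0xa0, 0xc9, 0x1e, 0xfb, 0x8b }

detects only flash drive events.
Using

0x745A17A0, 0x74D3, 0x11D0,{ 0xB6, 0xFE, 0x00, 0xA0, 0xC9, 0x0F, 0x57, 0xDA }

detects no HID events. I did not find the cause at the time. A likely explanation is that this value is the HIDClass device setup class GUID shown by Device Manager, whereas registration with DBT_DEVTYP_DEVICEINTERFACE filters on device interface class GUIDs.
However, using

0x4D1E55B2, 0xF16F, 0x11CF,{ 0x88, 0xCB, 0x00, 0x11, 0x11, 0x00, 0x00, 0x30 }

which is the HID device interface class GUID (GUID_DEVINTERFACE_HID), does detect HID events.

III. Summary

This article does not go into the detection mechanism; the code has been tested and works.
It is worth pointing out that detecting USB events in a Qt program on Windows uses the same technique described here, including registration and event handling. After all, MFC and Qt programs both run on Windows, so they arrive at the same place by different routes. On Linux or MacOS the mechanism is different and is outside the scope of this article.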

USB HID study: GUID list

2020-02-11T12:02:00.000Z

This article lists common device classes on Windows and their GUIDs (USB-related and others). All content comes from the web.

1394 Host Bus Controller
Class = 1394
ClassGuid = {6bdd1fc1-810f-11d0-bec7-08002be2092f}
This class includes system-supplied drivers of 1394 host controllers connected on a PCI bus, but not drivers of 1394 peripherals.

Battery Devices
Class = Battery
ClassGuid = {72631e54-78a4-11d0-bcf7-00aa00b7b32a}
This class includes drivers of battery devices and UPSes.

 

CD-ROM Drives
Class = CDROM
ClassGuid = {4d36e965-e325-11ce-bfc1-08002be10318}
This class includes drivers of CD-ROM drives, including SCSI CD-ROM drives. By default, the system's CD-ROM class installer also installs a system-supplied CD audio driver and CD-ROM changer driver as PnP filters.

 

Disk Drives
Class = DiskDrive
ClassGuid = {4d36e967-e325-11ce-bfc1-08002be10318}
This class includes drivers of hard disk drives. See also the HDC and SCSIAdapter classes.

 

Display Adapters
Class = Display
ClassGuid = {4d36e968-e325-11ce-bfc1-08002be10318}
This class includes drivers of video adapters, including display drivers and video miniports.

 

Floppy Disk Controllers 
Class = FDC
ClassGuid = {4d36e969-e325-11ce-bfc1-08002be10318}
This class includes drivers of floppy disk drive controllers.

 

Floppy Disk Drives
Class= FloppyDisk
ClassGuid= {4d36e980-e325-11ce-bfc1-08002be10318}
This class includes drivers of floppy drives.

 

Hard Disk Controllers
Class = HDC
ClassGuid = {4d36e96a-e325-11ce-bfc1-08002be10318}
This class includes drivers of hard disk controllers, including ATA/ATAPI controllers but not SCSI and RAID disk controllers.

 

Human Input Devices (HID)
Class = HIDClass
ClassGuid = {745a17a0-74d3-11d0-b6fe-00a0c90f57da}
This class includes devices that export interfaces of the HID class, including HID keyboard and mouse devices, which the installed HID device drivers enumerate as their respective "child" devices. (See also the Keyboard or Mouse classes later in this list.)

 

Imaging Device
Class = Image
ClassGuid = {6bdd1fc6-810f-11d0-bec7-08002be2092f}
This class includes drivers of still-image capture devices, digital cameras, and scanners.

 

IrDA Devices
Class = Infrared
ClassGuid = {6bdd1fc5-810f-11d0-bec7-08002be2092f}
This class includes Serial-IR and Fast-IR NDIS miniports, but see also the Network Adapter class for other NDIS NIC miniports.

 

Keyboard
Class = Keyboard
ClassGuid = {4d36e96b-e325-11ce-bfc1-08002be10318}
This class includes all keyboards. That is, it also must be specified in the (secondary) INF for an enumerated "child" HID keyboard device.


Medium Changers
Class = MediumChanger
ClassGuid = {ce5939ae-ebde-11d0-b181-0000f8753ec4}
This class includes drivers of SCSI media changer devices.

Memory Technology Driver
Class = MTD
ClassGUID = {4d36e970-e325-11ce-bfc1-08002be10318}
This class includes drivers for memory devices, such as flash memory cards.

Multimedia
Class = Media
ClassGuid = {4d36e96c-e325-11ce-bfc1-08002be10318}
This class includes Audio and DVD multimedia devices, joystick ports, and full-motion video-capture devices.

Modem
Class = Modem
ClassGuid = {4d36e96d-e325-11ce-bfc1-08002be10318}
This class installs modems. An INF for a device of this class installs no device driver(s), but rather specifies the features and configuration information of a particular modem and stores this information in the registry. See also the Multifunction class.

Monitor
Class = Monitor
ClassGuid = {4d36e96e-e325-11ce-bfc1-08002be10318}
This class includes display monitors. An INF for a device of this class installs no device driver(s), but rather specifies the features of a particular monitor to be stored in the registry for use by drivers of video adapters. (Monitors are enumerated as the child devices of display adapters.)

Mouse
Class = Mouse
ClassGuid = {4d36e96f-e325-11ce-bfc1-08002be10318}
This class includes all mice and other kinds of pointing devices, such as trackballs. That is, it also must be specified in the (secondary) INF for an enumerated "child" HID mouse device.

Multifunction Devices
Class = Multifunction
ClassGuid = {4d36e971-e325-11ce-bfc1-08002be10318}
This class includes combo cards, such as a PCMCIA modem and netcard adapter. The driver for such a PnP multifunction device is installed under this class and enumerates the modem and netcard separately as its "child" devices.

Multi-port Serial Adapters
Class = MultiportSerial
ClassGuid = {50906cb8-ba12-11d1-bf5d-0000f805f530}
This class includes intelligent multiport serial cards, but not peripheral devices that connect to their ports. It does not include unintelligent (16550-type) multiport serial controllers or single-port serial controllers (see the Ports class).

Network Adapter
Class = Net
ClassGuid = {4d36e972-e325-11ce-bfc1-08002be10318}
This class includes NDIS NIC miniports excluding Fast-IR miniports, NDIS intermediate drivers (of "virtual adapters"), and CoNDIS MCM miniports.

Network Client
Class = NetClient
ClassGuid = {4d36e973-e325-11ce-bfc1-08002be10318}
This class includes network and/or print providers.

 

Network Service
Class = NetService
ClassGuid = {4d36e974-e325-11ce-bfc1-08002be10318}
This class includes network services, such as redirectors and servers.

 

Network Transport
Class = NetTrans
ClassGuid = {4d36e975-e325-11ce-bfc1-08002be10318}
This class includes NDIS protocols, CoNDIS stand-alone call managers, and CoNDIS clients, as well as higher level drivers in transport stacks.

 

PCMCIA Adapters
Class = PCMCIA
ClassGuid = {4d36e977-e325-11ce-bfc1-08002be10318}
This class includes system-supplied drivers of PCMCIA and CardBus host controllers, but not drivers of PCMCIA or CardBus peripherals.

 

Ports (COM & LPT serial ports)
Class = Ports
ClassGuid = {4d36e978-e325-11ce-bfc1-08002be10318}
This class includes drivers of serial or parallel port devices, but see also the MultiportSerial class.

 

Printer
Class = Printer
ClassGuid = {4d36e979-e325-11ce-bfc1-08002be10318}
This class includes printers.

 

SCSI and RAID Controllers
Class = SCSIAdapter
ClassGuid = {4d36e97b-e325-11ce-bfc1-08002be10318}
This class includes SCSI HBA miniports and disk-array controller drivers.

 

Smart Card Readers
Class = SmartCardReader
ClassGuid = {50dd5230-ba8a-11d1-bf5d-0000f805f530}
This class includes drivers for smart card readers.

 

Storage Volumes
Class = Volume
ClassGuid = {71a27cdd-812a-11d0-bec7-08002be2092f}
This class includes storage volumes as defined by the system-supplied logical volume manager and class drivers that create device objects to represent storage volumes, such as the system disk class driver.

 

System Devices
Class = System
ClassGuid = {4d36e97d-e325-11ce-bfc1-08002be10318}
This class includes the Windows® 2000 HALs, system bus drivers, the system ACPI driver, and the system volume-manager driver. It also includes battery drivers and UPS drivers.

 

Tape Drives
Class = TapeDrive
ClassGuid = {6d807884-7d21-11cf-801c-08002be10318}
This class includes drivers of tape drives, including all tape miniclass drivers.

 

USB
Class = USB
ClassGuid = {36fc9e60-c465-11cf-8056-444553540000}
This class includes system-supplied (bus) drivers of USB host controllers and drivers of USB hubs, but not drivers of USB peripherals.

The following classes and GUIDs should not be used to install devices (or drivers) on Windows 2000 platforms:

 

Adapter
Class = Adapter
ClassGUID = {4d36e964-e325-11ce-bfc1-08002be10318}
This class is obsolete.

 

APM
Class = APMSupport
ClassGUID = {d45b1c18-c8fa-11d1-9f77-0000f805f530}
This class is reserved for system use.

 

Computer
Class = Computer
ClassGUID = {4d36e966-e325-11ce-bfc1-08002be10318}
This class is reserved for system use.


Decoders
Class = Decoder
ClassGUID = {6bdd1fc2-810f-11d0-bec7-08002be2092f}
This class is reserved for future use.

 

Global Positioning System
Class = GPS
ClassGUID = {6bdd1fc3-810f-11d0-bec7-08002be2092f}
This class is reserved for future use.

No driver
Class = NoDriver
ClassGUID = {4d36e976-e325-11ce-bfc1-08002be10318}
This class is obsolete.


Non-Plug and Play Drivers
Class = LegacyDriver
ClassGUID = {8ecc055d-047f-11d1-a537-0000f8753ed1}
This class is reserved for system use.

Other Devices
Class = Unknown
ClassGUID = {4d36e97e-e325-11ce-bfc1-08002be10318}
This class is reserved for system use. Enumerated devices for which the system cannot determine the type are installed under this class. Do not use this class if you're unsure in which class your device belongs; either determine the correct device setup class or create a new class.

Printer Upgrade
Class = Printer Upgrade
ClassGUID = {4d36e97a-e325-11ce-bfc1-08002be10318}
This class is reserved for system use.

Sound
Class = Sound
ClassGUID = {4d36e97c-e325-11ce-bfc1-08002be10318}
This class is obsolete.

USB Mass Storage Device
ClassGUID = a5dcbf10-6530-11d2-901f-00c04fb951ed

USB HID Study: Packet Analysis

2020-02-09T15:40:00.000Z

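This article uses the Bus Hound tool to analyze the packets of USB HID devices, organized with the help of the official manuals and articles found online. For topics not covered here, please see the reference resources.
In my experience, reading a protocol directly does not give an intuitive understanding; it is better to capture packets with a tool and analyze the real data against the protocol documents. This holds for the ONVIF protocol, for IEEE 802.3 (802.11), and for the USB protocol as well.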

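1. Prerequisites

1.1 Descriptors

A USB host identifies a device through its descriptors: device, interface, endpoint, string, and report descriptors, among others.
Back to HID: when the USB host requests the configuration descriptor of an HID device, the descriptors the device returns are, in order, the device descriptor, configuration descriptor, interface descriptor, HID descriptor, and endpoint descriptor. The data analyzed below largely follows this order.
A USB HID device transfers data through reports. There are three kinds: input reports, output reports, and feature reports.
Input report: sent by the device to the host. For example, a USB mouse reports movement and clicks to the computer, and a keyboard reports key data. Input reports travel through the interrupt IN endpoint.
Output report: sent by the host to the USB device, for example to control the Num Lock and Caps Lock LEDs on a keyboard. A report is a data packet that carries the data to be transferred.
Report descriptor: describes a report and what the data in it is for. With it, the USB host can work out what the data in a report means.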

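1.2 Data format

A request (setup packet) is 8 bytes long. In Bus Hound, setup data is shown with the CTL label. Its format is shown in Figure 1:

bmRequestType is the request type. From the figure its composition looks complicated, but in real data only a few specific values turn up, such as 0x80, 0x81, 0x20, and 0x21.
wValue is a 2-byte value whose meaning depends on the request. Note that USB uses little-endian byte order, so the bytes of every multi-byte field must be swapped when reading a hex dump.
wIndex is similar, but usually carries an index or offset.
bRequest is a single byte holding the specific request code, as shown in Figure 2.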
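
The 8-byte layout described above can be decoded in a few lines of Python. This is only a sketch: the function name `parse_setup` and the dictionary keys are my own choices for illustration and do not come from any library. The bit layout of bmRequestType (bit 7 direction, bits 6..5 type, bits 4..0 recipient) follows the USB specification.

```python
import struct

def parse_setup(packet: bytes) -> dict:
    """Decode an 8-byte USB setup packet; all multi-byte fields are little-endian."""
    if len(packet) != 8:
        raise ValueError("a setup packet is exactly 8 bytes")
    bm_request_type, b_request, w_value, w_index, w_length = struct.unpack("<BBHHH", packet)
    return {
        # bit 7: 1 = device-to-host, 0 = host-to-device
        "direction": "device-to-host" if bm_request_type & 0x80 else "host-to-device",
        # bits 6..5: request type
        "type": ("standard", "class", "vendor", "reserved")[(bm_request_type >> 5) & 0x03],
        # bits 4..0: recipient (0 = device, 1 = interface, 2 = endpoint)
        "recipient": bm_request_type & 0x1F,
        "bRequest": b_request,
        "wValue": w_value,
        "wIndex": w_index,
        "wLength": w_length,
    }

# The first request of the keyboard capture analyzed in section 3.1.
setup = parse_setup(bytes.fromhex("80 06 00 01 00 00 12 00"))
print(setup)
```

Because `struct` does the byte swapping, wValue comes out as 0x0100 and wLength as 18 without any manual reordering.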

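1.3 Miscellaneous

bDescriptorType is the descriptor type. The main values are:

1: device descriptor
2: configuration descriptor
3: string descriptor
4: interface descriptor
5: endpoint descriptor
0x21: HID descriptor
0x22: report descriptor
0x23: physical descriptor

Device class, bDeviceClass:

0x00: the class is given in the interface descriptors
0x02: communications class
0x09: hub class
0xDC: diagnostic device class
0xE0: wireless communication device class
0xFF: vendor-defined device class

Interface class, bInterfaceClass:

0x01: audio class
0x02: CDC control class
0x03: HID (human interface device) class
0x05: physical class
0x06: image class
0x07: printer class
0x08: mass storage class
0x09: hub class
0x0A: CDC data class
0x0B: smart card class
0x0D: security class
0xDC: diagnostic device class
0xE0: wireless controller class
0xFE: application-specific class (including infrared bridges, etc.)
0xFF: vendor-defined device

bInterfaceProtocol in the HID interface descriptor:

0: none
1: keyboard
2: mouse
3-255: reserved

HID report types:

1: input report
2: output report
3: feature report
0x04-0xFF: reserved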

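2. Using Bus Hound

Downloading and installing the software is omitted here.
Operations are performed through the icons on the toolbar.
"Capture" is the capture view; the start and stop buttons are in its lower-right corner.
"Save" writes the captured data to a file for later analysis.
In "Settings", increase the packet length; otherwise the captured data is truncated on display, which can be misleading. The parameters I use are shown in Figure 3:

"Devices" lists the USB devices and lets you select the ones to capture, as shown in Figure 4:

3. Capture analysis

This article mainly analyzes data from a USB keyboard. Start the capture in the tool, then plug in the keyboard to obtain the data. The capture is shown in Figure 5:

A sample of the saved data follows: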

  Device - Device ID (followed by the endpoint for USB devices)
            (31) USB Composite Device
            (34) USB Input Device
            (35) USB Input Device
            (36) HID Keyboard Device
            (37) HID-compliant consumer control device
            (38) HID Keyboard Device
            (39) HID Keyboard Device
  Phase  - Phase Type
            CTL   USB control transfer       
            IN    Data in transfer           
            OUT   Data out transfer          
  Data   - Hex dump of the data transferred
  Descr  - Description of the phase
  Cmd... - Position in the captured data


Device  Phase  Data                      Description       Cmd.Phase.Ofs(rep)
------  -----  ------------------------  ----------------  ------------------
  31.0  CTL    80 06 00 01  00 00 12 00  GET DESCRIPTOR           1.1.0        
  31.0  IN     12 01 10 01  00 00 00 08  ........                 1.2.0        
               6d 02 02 00  01 01 00 00  m.......                 1.2.8        
               00 01                     ..                       1.2.16

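The first part describes the format of the saved data; the second part is the actual data. The second column, Phase, is the type of data, and the last column is the sequence number (1.x is the first request/response exchange, 2.x the second, and so on). To follow the USB request flow, the analysis below starts from the first packets, that is, from the moment the USB device is plugged into the USB host (where the last column starts at 1.x).

3.1 Device descriptor

A USB device has exactly one device descriptor. It records mainly: the USB specification version the device uses, the device class, the maximum packet size of endpoint 0, the vendor ID (VID, assigned by the USB organization) and product ID (PID), the device release number, the manufacturer string index, the product string index, the serial number string index, and the number of possible configurations.

The data: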

31.0  CTL    80 06 00 01  00 00 12 00  GET DESCRIPTOR           1.1.0        
31.0  IN     12 01 10 01  00 00 00 08  ........                 1.2.0        
             6d 02 02 00  01 01 00 00  m.......                 1.2.8        
             00 01                     ..                       1.2.16

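The request:

31.0 CTL 80 06 00 01 00 00 12 00

Parsed:

bmRequestType 80: data direction is device to host; standard request; the recipient is the device
bRequest 06: the request is GET_DESCRIPTOR
wValue 00 01: 0x0100; the high byte is the descriptor type (1, device descriptor) and the low byte is the descriptor index (0)
wIndex 00 00: 0 (for GET_DESCRIPTOR this field is zero, or a language ID when a string descriptor is requested)
wLength 12 00: the data stage is 18 bytes long (little-endian; the value is 0x0012, i.e. 18)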

The fields of the returned data are described in Figure 6.

The data:

31.0  IN     12 01 10 01  00 00 00 08  ........                 1.2.0        
             6d 02 02 00  01 01 00 00  m.......                 1.2.8        
             00 01                     ..                       1.2.16

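Parsed:

12 length is 18
01 device descriptor
1001 byte-swapped to 0110: USB specification version 1.1 (note: the version is BCD-encoded)
00 device class (assigned by the USB organization); 0 means the class is given per interface
00 device subclass
00 protocol code
08 maximum packet size of endpoint 0 is 8 (note: only 8, 16, 32, and 64 are valid)
6d02 VID, byte-swapped to 0x026d
0200 PID, byte-swapped to 0x0002
0101 device release number
000000 manufacturer, product, and serial number string indexes (0 means none)
01 number of configurations is 1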
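
The same 18 bytes can be unpacked programmatically. The sketch below uses the field names from the USB specification; the helper `bcd_version` is a name I made up for illustration.

```python
import struct

# The 18 bytes returned in phase 1.2 of the capture.
raw = bytes.fromhex("12 01 10 01 00 00 00 08 6d 02 02 00 01 01 00 00 00 01")

FIELDS = (
    "bLength", "bDescriptorType", "bcdUSB", "bDeviceClass",
    "bDeviceSubClass", "bDeviceProtocol", "bMaxPacketSize0",
    "idVendor", "idProduct", "bcdDevice",
    "iManufacturer", "iProduct", "iSerialNumber", "bNumConfigurations",
)

# "<" selects little-endian with no padding; H is a 2-byte field, B a 1-byte field.
device = dict(zip(FIELDS, struct.unpack("<BBHBBBBHHHBBBB", raw)))

def bcd_version(value: int) -> str:
    """Render a BCD version word such as 0x0110 as '1.10'."""
    return f"{value >> 8:x}.{value & 0xFF:02x}"

print(f"USB {bcd_version(device['bcdUSB'])}, "
      f"VID 0x{device['idVendor']:04x}, PID 0x{device['idProduct']:04x}")
```

The last four bytes, which the list above covers only briefly, come out as three string indexes of 0 and a configuration count of 1.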

The keyboard used in this article is shown in Figure 7 (mainly to check the VID and PID):

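3.2 Configuration descriptor

The device descriptor states how many configurations the device has, and each configuration has its own configuration descriptor. It records mainly: the number of interfaces in the configuration, the configuration value, the power source, whether remote wakeup is supported, and the current requirement.
The data: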

31.0  CTL    80 06 00 02  00 00 3b 00  GET DESCRIPTOR           3.1.0        
31.0  IN     09 02 3b 00  02 01 00 a0  ..;.....                 3.2.0        
             32 09 04 00  00 01 03 01  2.......                 3.2.8        
             01 00 09 21  11 01 00 01  ...!....                 3.2.16       
             22 41 00 07  05 81 03 08  "A......                 3.2.24       
             00 0c 09 04  01 00 01 03  ........                 3.2.32       
             00 00 00 09  21 11 01 00  ....!...                 3.2.40       
             01 22 5b 00  07 05 82 03  ."[.....                 3.2.48       
             08 00 0c                  ...                      3.2.56

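(Note: the 2.x data duplicates part of 3.x, so it is omitted.)

The request:

31.0  CTL    80 06 00 02  00 00 09 00

bmRequestType 80: data direction is device to host; standard request; the recipient is the device
bRequest 06: the request is GET_DESCRIPTOR
wValue 00 02: 0x0200; descriptor type 2 (configuration descriptor), index 0
wIndex 00 00: 0
wLength 09 00: the data stage is 9 bytes long (little-endian; the value is 0x0009), just the configuration descriptor itself. The 3.x request in the capture above differs only in wLength, 3b 00 (0x003b, 59 bytes).

The fields of the returned data are described in Figure 8.

The data: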

  31.0  IN     09 02 3b 00  02 01 00 a0 
               32 09 04 00  00 01 03 01 
               01 00 09 21  11 01 00 01 
               22 41 00 07  05 81 03 08 
               00 0c 09 04  01 00 01 03 
               00 00 00 09  21 11 01 00 
               01 22 5b 00  07 05 82 03 
               08 00 0c                
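09 length of this descriptor
02 type: configuration descriptor
3b00 i.e. 0x003b (59): total length of the data returned, covering the configuration descriptor and the interface, HID, and endpoint descriptors that follow it
02 the configuration has 2 interfaces
01 configuration value, the argument to Set Configuration
00 string descriptor index; 0 means none
a0 power and wakeup attributes; a0 means bus powered with remote wakeup
32 maximum current in units of 2 mA; here 50 (0x32) * 2 = 100 mA

Note that 6 other descriptors are returned together with the configuration descriptor. The key markers are 09 04 (twice), 09 21 (twice), and 07 05 (twice). The following subsections analyze them.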
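
Since every descriptor starts with its own length and type, the 59-byte block can be split mechanically. The sketch below is one way to do it; `split_descriptors` and `TYPE_NAMES` are names chosen here for illustration only.

```python
# The 59 bytes returned in phase 3.2 of the capture.
blob = bytes.fromhex(
    "09 02 3b 00 02 01 00 a0 32 09 04 00 00 01 03 01 "
    "01 00 09 21 11 01 00 01 22 41 00 07 05 81 03 08 "
    "00 0c 09 04 01 00 01 03 00 00 00 09 21 11 01 00 "
    "01 22 5b 00 07 05 82 03 08 00 0c"
)

TYPE_NAMES = {0x02: "configuration", 0x04: "interface", 0x05: "endpoint", 0x21: "HID"}

def split_descriptors(data: bytes) -> list:
    """Walk the block using byte 0 (bLength) of each descriptor."""
    descriptors = []
    offset = 0
    while offset < len(data):
        length = data[offset]
        if length < 2 or offset + length > len(data):
            raise ValueError(f"bad bLength {length} at offset {offset}")
        descriptors.append(data[offset:offset + length])
        offset += length
    return descriptors

descriptors = split_descriptors(blob)
total_length = int.from_bytes(blob[2:4], "little")  # wTotalLength
attributes = blob[7]                                # bmAttributes
max_power_ma = blob[8] * 2                          # bMaxPower is in 2 mA units

for d in descriptors:
    print(f"{TYPE_NAMES.get(d[1], 'unknown'):13s} {d.hex()}")
```

The walk yields 7 descriptors, the configuration descriptor plus the 6 mentioned above, and their lengths add up to wTotalLength.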

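3.3 Interface descriptor

The interface descriptors follow the configuration descriptor. An interface descriptor gives the class of the interface and the number of endpoints it uses.
Each configuration descriptor states how many interfaces the configuration has, and each interface has its own interface descriptor. It records mainly: the interface number, the number of endpoints, and the class, subclass, and protocol the interface uses.
The fields of this descriptor are described in Figure 9.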

09 04 00 00 01 03 01 01 00    (1)
09 04 01 00 01 03 00 00 00    (2)

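09 length of this descriptor
04 type: interface descriptor
00 interface number (zero-based): 0 in (1), 1 in (2)
00 alternate setting number
01 number of endpoints is 1 (not counting endpoint 0)
03 interface class; 3 means HID (assigned by the USB organization)
01 subclass
01 protocol code; 1 is keyboard, 2 is mouse, 0 is none
00 string descriptor index for this interface

As shown, the interface class given here is HID. Note that there are 2 descriptors: (1) identifies itself as a keyboard, but (2) does not. I have not looked into why.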

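3.4 HID descriptor

The HID descriptor gives the HID specification version and the types of the HID-related descriptors (note: the physical descriptor is optional).
The fields of this descriptor are described in Figure 10.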

09 21 11 01 00 01 22 41 00
09 21 11 01 00 01 22 5b 00

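09 length of this descriptor
21 type: 0x21 is the HID descriptor
1101 byte-swapped to 0111: HID specification version 1.11 (BCD)
00 country code
01 number of subordinate class descriptors is 1
22 type of that descriptor: 0x22 is a report descriptor, 0x21 an HID descriptor, 0x23 a physical descriptor
4100 length of the report descriptor, here 0x0041 (0x005b in the second HID descriptor)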
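
As a cross-check of the field list above, here is a small sketch that decodes both 9-byte HID descriptors. It assumes, as in this capture, that each HID descriptor lists exactly one subordinate descriptor; `parse_hid_descriptor` is a name I chose for illustration.

```python
import struct

def parse_hid_descriptor(desc: bytes) -> dict:
    """Decode a 9-byte HID descriptor that lists a single subordinate descriptor."""
    (length, desc_type, bcd_hid, country,
     num_descriptors, sub_type, sub_length) = struct.unpack("<BBHBBBH", desc)
    if desc_type != 0x21:
        raise ValueError("not an HID descriptor")
    return {
        "bLength": length,
        "bcdHID": f"{bcd_hid >> 8:x}.{bcd_hid & 0xFF:02x}",
        "bCountryCode": country,
        "bNumDescriptors": num_descriptors,
        "subDescriptorType": sub_type,    # 0x22 = report descriptor
        "subDescriptorLength": sub_length,
    }

hid1 = parse_hid_descriptor(bytes.fromhex("09 21 11 01 00 01 22 41 00"))
hid2 = parse_hid_descriptor(bytes.fromhex("09 21 11 01 00 01 22 5b 00"))
print(hid1)
print(hid2)
```

The two interfaces therefore announce report descriptors of 65 and 91 bytes respectively.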

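3.5 Endpoint descriptor

The endpoint descriptor gives the transfer type, the transfer direction, the packet size, and the endpoint number (also called the endpoint address).
The fields of this descriptor are described in Figure 11.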

07 05 81 03 08 00 0c
07 05 82 03 08 00 0c

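07 length of this descriptor
05 type: 5 is the endpoint descriptor
81 endpoint address; bit 7 is the direction (1 = IN, 0 = OUT) and the low 4 bits are the endpoint number. 81 is IN endpoint 1, 82 is IN endpoint 2
03 endpoint attributes: 00 is control, 01 isochronous, 02 bulk, 03 interrupt
0800 byte-swapped to 0x0008: the maximum packet size is 8
0c polling interval in ms (0x0c = 12)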
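
The bit fields of the endpoint descriptor are easy to misread by eye, so here is a short decoding sketch. The function name `parse_endpoint` and the dictionary keys are hypothetical; the bit positions are the ones listed above.

```python
import struct

TRANSFER_TYPES = ("control", "isochronous", "bulk", "interrupt")

def parse_endpoint(desc: bytes) -> dict:
    """Decode a 7-byte endpoint descriptor."""
    length, desc_type, address, attributes, max_packet, interval = struct.unpack("<BBBBHB", desc)
    if desc_type != 0x05:
        raise ValueError("not an endpoint descriptor")
    return {
        "direction": "IN" if address & 0x80 else "OUT",  # bit 7 of bEndpointAddress
        "number": address & 0x0F,                        # low 4 bits
        "transfer": TRANSFER_TYPES[attributes & 0x03],   # low 2 bits of bmAttributes
        "wMaxPacketSize": max_packet,
        "bInterval": interval,
    }

ep1 = parse_endpoint(bytes.fromhex("07 05 81 03 08 00 0c"))
ep2 = parse_endpoint(bytes.fromhex("07 05 82 03 08 00 0c"))
print(ep1)
print(ep2)
```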

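4. Report requests

HID defines 6 class-specific requests. This section analyzes 2 of them (I could not obtain data for the others); the captured data comes from an HID device.
The request data follows the format in Figure 1; the details are shown in Figure 12.

As the figure shows, the request type of an HID request is either 0x21 or 0xa1. The 6 requests are shown in Figure 13.

4.1 Set_Report

The request and its analysis: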

  47.0  CTL    21 09 00 03  00 00 20 00  SET REPORT              19.1.0 
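21 request type; the top bit of 0x21 is 0, so the data goes from host to device (output)
09 request; 9 is Set_Report
0003 the low byte is the report ID, 0 here; the high byte is the report type: 3 is feature, 1 is input, 2 is output
0000 index; for HID class requests this is the interface number
2000 byte-swapped to 0x0020: the report data is 32 bytes long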
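
The split of wValue into report type and report ID can be written out as a tiny helper. This is a sketch only; `split_report_wvalue` is a name made up for this example.

```python
REPORT_TYPES = {1: "input", 2: "output", 3: "feature"}

def split_report_wvalue(w_value_bytes: bytes) -> tuple:
    """Take the two wValue bytes as they appear on the wire (little-endian)."""
    w_value = int.from_bytes(w_value_bytes, "little")
    report_type = REPORT_TYPES.get(w_value >> 8, "reserved")  # high byte
    report_id = w_value & 0xFF                                # low byte
    return report_type, report_id

# wValue bytes taken from the two SET REPORT requests in this article.
print(split_report_wvalue(bytes.fromhex("00 03")))
print(split_report_wvalue(bytes.fromhex("09 03")))
```

Both requests address a feature report; the first uses report ID 0 and the second report ID 9.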

The fields are described in Figure 14.

An example of the output data that was set:

47.0  OUT    55 55 01 4c  61 74 65 01  UU.Late.                19.2.0 
             02 c2 00 00  00 00 00 00  ........                19.2.8 
             00 00 00 00  00 00 00 00  ........                19.2.16
             00 00 00 00  00 00 00 00  ........                19.2.24

4.2 Get_Report

The request and its analysis:

  47.0  CTL    a1 01 00 03  00 00 20 00  GET REPORT              20.1.0 
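a1 request type; the top bit of 0xa1 is 1, so the data goes from device to host (input)
01 request; 1 is Get_Report
The remaining fields are the same as above

The fields are described in Figure 15.

An example of the input data: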

47.0  IN     55 55 01 4c  61 74 65 01  UU.Late.                20.2.0 
             03 c3 00 00  00 00 00 00  ........                20.2.8 
             00 00 00 00  00 00 00 00  ........                20.2.16
             00 00 00 00  00 00 00 00  ........                20.2.24

The report ID matters a great deal in development. The examples above use ID 0; below is data for report ID 9:

47.0  CTL    21 09 09 03  00 00 21 00  SET REPORT               3.1.0        
47.0  OUT    09 55 55 04  30 00 bb 00  .UU.....                 3.2.0        
             f6 00 77 00  00 00 00 00  ..w.....                 3.2.8        
             00 00 00 00  00 00 00 00  ........                 3.2.16       
             00 00 00 00  00 00 00 00  ........                 3.2.24           
47.0  CTL    a1 01 09 03  00 00 21 00  GET REPORT               5.1.0(3)     
47.0  IN     09 3c 3c 3c  3c 3c 3c 3c  .<<<<<<<                 5.2.0        
             3c 3c 3c 3c  3c 3c 3c 3c  <<<<<<<<                 5.2.8        
             3c 3c 3c 3c  3c 3c 3c 3c  <<<<<<<<                 5.2.16       
             3c 3c 3c 3c  3c 3c 3c 3c  <<<<<<<<                 5.2.24       
             3c                        <                        5.2.32

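As shown, when the ID is 9 the data is prefixed with the ID, whereas with ID 0 it is not. Even so, when setting a feature report through the hidapi library you must always prepend 1 byte for the ID, or the call fails. When getting a report, the real data comes after the ID, so skip 1 byte.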
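
The buffer handling described above can be sketched without touching real hardware. The two helpers below are hypothetical names of my own and do not call hidapi; they only show how the buffer passed to hidapi's `hid_send_feature_report`, or filled by `hid_get_feature_report`, relates to the report payload.

```python
def to_hidapi_buffer(report_id: int, payload: bytes) -> bytes:
    """Prepend the report ID byte; use 0 when the device has no numbered reports."""
    if not 0 <= report_id <= 0xFF:
        raise ValueError("report ID must fit in one byte")
    return bytes([report_id]) + payload

def from_hidapi_buffer(buffer: bytes) -> tuple:
    """Split a buffer filled by a get-feature call into (report ID, payload)."""
    if not buffer:
        raise ValueError("empty buffer")
    return buffer[0], buffer[1:]

# ID 0: the 32 payload bytes of phase 19.2 of the capture.
payload = bytes.fromhex("55 55 01 4c 61 74 65 01 02 c2") + bytes(22)
send_buffer = to_hidapi_buffer(0, payload)
print(len(payload), len(send_buffer))

# ID 9: the 33 bytes of phase 5.2 already start with the ID.
received = bytes([0x09]) + bytes([0x3C]) * 32
report_id, data = from_hidapi_buffer(received)
print(report_id, len(data))
```

With ID 0 the buffer handed to the library is one byte longer than what appears on the wire, which is the difference the capture makes visible.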

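5. Summary

In fact, this article analyzes only some kinds of data; my setup did not allow me to analyze all of them. Since this is enough for my purposes, I will not dig deeper into HID.
As for report descriptors, I have read the official documents and online articles several times and still cannot make sense of them. From my limited experience, one application may be as a way to carry custom data between a USB host and a USB device: ignore the complicated descriptions in the official documents and treat the report as data in a fixed format, which of course both sides must agree on beforehand.
Thanks to the many USB articles online, which I consulted heavily and combined with my own notes and understanding. Mistakes are inevitable, and corrections are welcome.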

6. Reference resources

USB之（三）USB描述符和命令（请求）
USB之（四）HID设备类协议
USB描述符解析

Late Sunday night, 2020.2.9

USB HID Study: First Look

2020-02-09T08:00:00.000Z

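The USB protocol itself is very complex, and anyone who dives straight into the specification may never find the way out. Many years ago I gave up studying it precisely because it was so complex, and came away with only a few concepts such as "EndPoint" and "interrupt transfer".
Starting with this article, the next few posts introduce what I know about USB. They are limited to HID devices, but still give a glimpse of how USB works. Aiming for what is practical and fits the job, I set the USB protocol itself aside for now and approach USB HID development as an outsider would.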

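1. The HID protocol

Put simply, USB involves a USB host and a USB device. The two terms are relative; usually the PC is the host, and anything with a USB connector, such as a USB keyboard, USB mouse, USB joystick, USB serial cable, or USB flash drive, is a USB device.

In USB, the host identifies a device through its descriptors, mainly the following kinds: device descriptor, interface descriptor, endpoint descriptor, string descriptor, and report descriptor.

When the host queries an HID device (that is, when the device is plugged into the host), the descriptors the device returns are, in order, the device descriptor, configuration descriptor, interface descriptor, HID descriptor, and endpoint descriptor. Later articles analyze these descriptors.
HID also adds its own class-specific requests. There are 6 of them: Get_Report, Get_Idle, Get_Protocol, Set_Report, Set_Idle, and Set_Protocol. Later articles focus on the Report requests.

As for development, the book 《圈圈教你玩 USB》 is worth reading. USB is fairly easy to implement on the STM32 platform, which is also one of the more widely used platforms.
For the HID protocol, the main reference is "Device Class Definition for Human Interface Devices (HID)", version 1.11.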

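2. Tools

The usual first stop for learning about a USB device is the Windows Device Manager: click the device to look up a lot of information, such as the VID and PID, the device path, and the class GUID. This information matters in development. There are also many third-party tools: Bus Hound captures USB packets and is very useful for protocol analysis, and USBlyzer is similar.
When learning USB for the first time, I suggest capturing keyboard or mouse data with Bus Hound and reading it side by side with the manual; the tool output is only useful in combination with the documentation. Figure 1 shows a USB keyboard selected in Bus Hound as the device to analyze.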

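3. Development

Code can operate on a USB device either through the APIs the system provides or through a third-party library such as https://github.com/libusb/hidapi . That library is written in C, implements the same interface on each platform, and has a friendly API; I recommend it.
In many applications, USB is just the transport, with a custom application-layer protocol built on top. It is much like using a socket for network transfer while the data itself follows a custom protocol.

In practice, once USB reads and writes (including feature reports) are wrapped up, you can focus on the upper-layer protocol rather than on USB itself. This is an important way to approach development.

Later articles will explore further, based on my limited experience and the material I have collected.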

Qt in Practice: Notes and Examples on UI Design

2020-02-03T12:00:00.000Z

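This article records examples of implementing a few small features with Qt.

Showing a widget only under certain conditions

Sometimes part of a UI should be hidden and offered only to certain people. A maintenance and upgrade tool, for example, serves both field support staff and developers, who need different features; if you do not want to maintain several tools, you can hide some of the features.
This section shows how double-clicking a hint label (a QLabel) reveals another button.
0. Design
The UI has 2 widgets: a QLabel named lbShow that displays the hint (the text can be empty, and the label can sit in some corner), and a QPushButton named btnRegister that stands for some feature.
1. Declare (in fact, override) the event filter function eventFilter:

bool eventFilter(QObject *watched, QEvent *event) override;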

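2. Hide the button and install the event filter on lbShow:

// Hide the register button first
ui->btnRegister->hide();
// Trigger for showing the register button: double-click the widget at the top left
ui->lbShow->installEventFilter(this);

3. Implement the event filter: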


bool DialogDemo::eventFilter(QObject *watched, QEvent *event)
{
    // Only handle the lbShow widget
    if (watched == ui->lbShow)
    {
        // Check for a double-click event
        if (event->type() == QEvent::MouseButtonDblClick)
        {
            // Show the button on double-click
            ui->btnRegister->show();
            return true;
        }
        else
        {
            return false;
        }
    }
    else
    {
        return QWidget::eventFilter(watched, event);
    }
}

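For the form and return value of eventFilter, see Qt Assistant. In general, return true once your custom handling of the event is complete, and hand events for widgets you are not watching back to the base class.

Extension 1: eventFilter can do far more than shown above. In principle it can monitor all kinds of events on many widgets, such as a click on a QLabel in the status bar or painting on a QLabel. (I have not experimented with this much yet.)
Extension 2: this section offers only one simple way to hide a feature. You could just as well require three, four, or even five or six clicks on some spot (an icon, say, or the area next to a button). The implementation comes down to monitoring the triggering events and then showing the button, and it can also be combined with password authorization. Of course, security and convenience pull against each other, so choose a sensible trade-off.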

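Status bar

For a MainWindow, Qt Designer provides a default layout: the central widget in the middle, the title bar at the very top, the toolbar below it, and the status bar at the bottom, which is how most applications look. Widgets cannot be dragged onto the status bar in Designer. That is a drawback but also an advantage, because the status bar can be built in a wrapper function and reused directly across projects, with adjustments to suit each case. This section gives a simple template with the following content:
On the left are informational messages, followed by data readouts (such as send/receive counters, word count, or line number); on the right are the system time, then a copyright notice or hint, then an exit icon button.
All of this can be customized freely; whatever fits the actual requirements is best.
A few technical points about the status bar:
1. MainWindow has a QStatusBar object by default, named statusbar.
2. Use showMessage for hint messages; it can take a display duration.
3. Add widgets with addPermanentWidget; otherwise temporary messages will cover them.
Example code: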

    // Status bar members
    QLabel* m_stsDebugInfo1;
    QLabel* m_stsDebugInfo2;
    QLabel* m_stsSysTime;
    QLabel* m_stsCopyright;
    QLabel* m_stsExit;
    
void MainWindow::initStatusBar2()
{
    // Status bar contents, in order:
    // temporary message
    // info labels (there can be several)
    // system time
    // version info (or copyright notice)
    // exit icon
    m_stsDebugInfo1 = new QLabel();
    m_stsDebugInfo2 = new QLabel();
    m_stsSysTime = new QLabel();
    m_stsCopyright = new QLabel();
    m_stsExit = new QLabel();

    m_stsDebugInfo1->setMinimumWidth(100);
    ui->statusbar->addPermanentWidget(m_stsDebugInfo1);
    m_stsDebugInfo1->setText("cnt: 250");

    m_stsDebugInfo2->setMinimumWidth(100);
    ui->statusbar->addPermanentWidget(m_stsDebugInfo2);

    ui->statusbar->showMessage(tr("Temporary message!"), 500);

    QDateTime dateTime(QDateTime::currentDateTime());
    QString timeStr = dateTime.toString("yyyy-MM-dd HH:mm:ss.zzz");
    m_stsSysTime->setText(timeStr);
    ui->statusbar->addPermanentWidget(m_stsSysTime);

    // Copyright info
    m_stsCopyright->setFrameStyle(QFrame::NoFrame);
    m_stsCopyright->setText(tr("  Tech homepage  "));
    m_stsCopyright->setOpenExternalLinks(true);
    ui->statusbar->addPermanentWidget(m_stsCopyright);

    // Exit icon
    m_stsExit->installEventFilter(this); // install an event filter to catch its click event
    m_stsExit->setToolTip("Exit App");
    // Icon image
    QPixmap exitIcon(":/images/exit.jpg");
    m_stsExit->setMinimumWidth(22);
    m_stsExit->setPixmap(exitIcon);
    ui->statusbar->addPermanentWidget(m_stsExit);

    connect(this, &MainWindow::sig_exit, qApp, &QApplication::quit); // connect directly to the application's quit slot
}