Running into a bug in Valgrind itself

Background

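Our company's C++ project uses cppcheck for static code analysis and Valgrind to check for memory leaks. I have stressed many times that we should reach zero warnings. Jenkins CI/CD runs the checks automatically and mails the results to the project members, yet some people still don't fix their warnings. Since it isn't my area of responsibility, there is not much I can say.
Recently, while testing with Valgrind, I ran into an "unrecognised instruction" error: Valgrind reported that the program executed an illegal instruction. After some digging, it turned out to be caused by an outdated Valgrind version.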

The problem

The command was:

```bash
valgrind --leak-check=full --show-leak-kinds=all ./a.out
```
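The post doesn't show the source of `a.out`. The backtrace shows `GetRandomNum()` calling `GetRandomC11()`, which calls `Init()`, which calls `std::random_device::operator()`. A minimal program that takes the same path might look like the sketch below. `random_in_range` is a hypothetical name, not the author's code.

```cpp
#include <cassert>
#include <random>

// Hypothetical reproducer (not the author's source): it only mirrors the call
// chain in the backtrace, i.e. user code -> std::random_device::operator().
int random_in_range(int lo, int hi) {
    std::random_device rd;            // default source; libstdc++ may back it with RDRAND
    std::mt19937 gen(rd());           // rd() -> _M_getval(), the frame where old Valgrind raised SIGILL
    std::uniform_int_distribution<int> dist(lo, hi);
    return dist(gen);                 // uniform value in [lo, hi]
}
```

To reproduce, call it from `main()`, build, and run the binary with the command above.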

The error output was:

```
vex amd64->IR: unhandled instruction bytes: 0xF 0xC7 0xF0 0x89 0x6 0xF 0x42 0xC1
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
==3562== valgrind: Unrecognised instruction at address 0x4ef1b15.
==3562== at 0x4EF1B15: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==3562== by 0x4EF1CB1: std::random_device::_M_getval() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==3562== by 0x400D8B: std::random_device::operator()() (in /home/latelee/test/mytest/warningtest/a.out)
==3562== by 0x400FA1: Init() (in /home/latelee/test/mytest/warningtest/a.out)
==3562== by 0x400DD7: GetRandomC11() (in /home/latelee/test/mytest/warningtest/a.out)
==3562== by 0x400B55: GetRandomNum() (in /home/latelee/test/mytest/warningtest/a.out)
==3562== by 0x400D0A: main (in /home/latelee/test/mytest/warningtest/a.out)

==26759== Process terminating with default action of signal 4 (SIGILL)
==26759== Illegal opcode at address 0x4EF1B15
==26759== at 0x4EF1B15: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==26759== by 0x4EF1CB1: std::random_device::_M_getval() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==26759== by 0x400D8B: std::random_device::operator()() (in /home/latelee/test/mytest/warningtest/a.out)
==26759== by 0x400FA1: Init() (in /home/latelee/test/mytest/warningtest/a.out)
==26759== by 0x400DD7: GetRandomC11() (in /home/latelee/test/mytest/warningtest/a.out)
==26759== by 0x400B55: GetRandomNum() (in /home/latelee/test/mytest/warningtest/a.out)
==26759== by 0x400D0A: main (in /home/latelee/test/mytest/warningtest/a.out)
```
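The first line of the log lists the bytes Valgrind could not translate, and they can be decoded by hand. `0F C7` is an opcode group whose ModRM `reg` field selects the instruction. `0xF0` is `11 110 000` in binary: mod=3 (register operand), reg=6, rm=0 (EAX). `0F C7 /6` is RDRAND, so the instruction is `rdrand eax`, which fits the `std::random_device::_M_getval()` frame. The helper below is a hypothetical sketch (`decode_0f_c7` is a made-up name). It covers only the two register forms relevant here, RDRAND (/6) and RDSEED (/7), and assumes no prefix bytes, as the log reports.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decodes only the register forms of the 0F C7 opcode
// group that matter here (/6 = RDRAND, /7 = RDSEED). Assumes the bytes start
// at the 0F escape with no prefixes (the log shows REX=0 and no 66/F2/F3),
// so the operand is reported as a 32-bit register.
std::string decode_0f_c7(const std::uint8_t* bytes, std::size_t n) {
    if (n < 3 || bytes[0] != 0x0F || bytes[1] != 0xC7) {
        return "";                                   // not the 0F C7 group
    }
    const unsigned modrm = bytes[2];
    const unsigned mod = modrm >> 6;                 // bits 7..6: 3 means register operand
    const unsigned reg = (modrm >> 3) & 0x7;         // bits 5..3: selects the instruction
    const unsigned rm  = modrm & 0x7;                // bits 2..0: which register
    if (mod != 3) {
        return "";                                   // memory forms are other instructions
    }
    static const char* const regs32[8] = {"eax", "ecx", "edx", "ebx",
                                          "esp", "ebp", "esi", "edi"};
    if (reg == 6) return std::string("rdrand ") + regs32[rm];
    if (reg == 7) return std::string("rdseed ") + regs32[rm];
    return "";                                       // other /r values not handled here
}
```

For the eight bytes from the log, `decode_0f_c7` returns `"rdrand eax"`.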

Valgrind also printed this explanation:

```
==2636== Your program just tried to execute an instruction that Valgrind
==2636== did not recognise. There are two possible reasons for this.
==2636== 1. Your program has a bug and erroneously jumped to a non-code
==2636== location. If you are running Memcheck and you just saw a
==2636== warning about a bad jump, it's probably your program's fault.
==2636== 2. The instruction is legitimate but Valgrind doesn't handle it,
==2636== i.e. it's Valgrind's fault. If you think this is the case or
==2636== you are not sure, please let us know and we'll try to fix it.
==2636== Either way, Valgrind will now raise a SIGILL signal which will
==2636== probably kill your program.
```

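In short: either the program really has a bug, or Valgrind itself does (in which case it should be reported to the developers). I read through the code several times and counted the occurrences of new and delete. Everything matched.
Later I came across a post explaining that Valgrind did not support `_M_getval()`. The post included a patch and also recommended upgrading to a newer version. For this article, I simply tested with a newer Valgrind.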
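RDRAND (the instruction encoded by the bytes `0F C7 F0` in the log) is an optional CPU feature. Code that uses it has to check CPUID first, which is why this failure depends on the machine. Below is a sketch of such a check using GCC/Clang's `<cpuid.h>`. It is not libstdc++'s actual code, whose conditions vary by version (e.g. a CPU vendor check), and `cpu_reports_rdrand` is a made-up name. Under Valgrind, the program sees the CPUID values that Valgrind presents. Presumably the old version advertised RDRAND without being able to translate it, which would produce exactly this SIGILL.

```cpp
#include <cassert>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   // GCC/Clang helper header for the CPUID instruction (x86 only)
#endif

// Hypothetical sketch: true if CPUID leaf 1 reports RDRAND support (ECX bit 30).
// Real library code may apply further checks before using RDRAND.
bool cpu_reports_rdrand() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return false;                  // leaf 1 not available
    }
    return (ecx & (1u << 30)) != 0;    // ECX bit 30 = RDRAND
#else
    return false;                      // non-x86 targets have no RDRAND
#endif
}
```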

Building and testing a newer version

First, check the current version:

```bash
$ valgrind --version
valgrind-3.11.0
```
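The root cause here was an old Valgrind. A CI job, such as the Jenkins setup mentioned in the background, could therefore check the version before running Memcheck. Below is a hypothetical helper that parses the `valgrind --version` string shown above. The function names are made up. The 3.13.0 minimum is only the version that worked in this test, not an officially documented threshold.

```cpp
#include <array>
#include <cassert>
#include <cstdio>
#include <string>

using Version = std::array<int, 3>;   // {major, minor, patch}

// Parses strings like "valgrind-3.11.0" (the output of `valgrind --version`).
// Any text after the third number is ignored by sscanf.
bool parse_valgrind_version(const std::string& text, Version& out) {
    int major = 0, minor = 0, patch = 0;
    if (std::sscanf(text.c_str(), "valgrind-%d.%d.%d", &major, &minor, &patch) != 3) {
        return false;
    }
    out = Version{{major, minor, patch}};
    return true;
}

// True when the reported version is >= minimum; unparsable output counts as too old.
bool valgrind_at_least(const std::string& text, const Version& minimum) {
    Version v{};
    if (!parse_valgrind_version(text, v)) {
        return false;
    }
    return v >= minimum;   // std::array compares element by element (lexicographically)
}
```

Comparing parsed integers rather than strings avoids the trap where `"3.9"` sorts after `"3.13"`.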

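The official website listed 3.13 as the latest version, available from its download page. The build and installation are the usual steps: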

```bash
tar jxf valgrind-3.13.0.tar.bz2
cd valgrind-3.13.0

./configure --prefix=/home/latelee/bin/valgrind
make -j$(nproc)
make install
```

Then run the test with the new version:

```bash
$ /home/latelee/bin/valgrind/bin/valgrind --leak-check=full --show-leak-kinds=all ./a.out
```

This time the output was:

```
==30979== Memcheck, a memory error detector
==30979== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==30979== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==30979== Command: ./a.out
==30979==
==30979== Conditional jump or move depends on uninitialised value(s)
==30979== at 0x400E36: Uninit() (in /home/latelee/test/mytest/warningtest/a.out)
==30979== by 0x400B65: GetRandomNum() (in /home/latelee/test/mytest/warningtest/a.out)
==30979== by 0x400D0A: main (in /home/latelee/test/mytest/warningtest/a.out)
==30979==
-316985884
==30979==
==30979== HEAP SUMMARY:
==30979== in use at exit: 72,704 bytes in 1 blocks
==30979== total heap usage: 7 allocs, 6 frees, 84,856 bytes allocated
==30979==
==30979== LEAK SUMMARY:
==30979== definitely lost: 0 bytes in 0 blocks
==30979== indirectly lost: 0 bytes in 0 blocks
==30979== possibly lost: 0 bytes in 0 blocks
==30979== still reachable: 72,704 bytes in 1 blocks
==30979== suppressed: 0 bytes in 0 blocks
==30979== Rerun with --leak-check=full to see details of leaked memory
==30979==
==30979== For counts of detected and suppressed errors, rerun with: -v
==30979== Use --track-origins=yes to see where uninitialised values come from
==30979== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
```

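As you can see, there is still one error, but the "valgrind: Unrecognised instruction at address" message is gone. The remaining error is an ordinary Memcheck report about an uninitialised value in the program's own `Uninit()` function.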
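The post doesn't show `Uninit()`, so the following is only a hypothetical example of the kind of code that produces "Conditional jump or move depends on uninitialised value(s)". It also shows the usual fix. As the log suggests, rerunning with `--track-origins=yes` shows where the uninitialised value came from. `Counter` is a made-up type.

```cpp
#include <cassert>

// Hypothetical example (not the author's Uninit()). The buggy shape Memcheck
// flags would be a member declared as `bool ready;` with no initializer,
// followed by `if (ready)` before anything has assigned it.
struct Counter {
    bool ready = false;   // C++11 default member initializer: never indeterminate
    int hits = 0;

    void arm() { ready = true; }

    int record() {
        if (ready) {      // this branch now reads only initialised data
            ++hits;
        }
        return hits;
    }
};
```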

Summary

I recommend building a recent Valgrind from source, to reduce false alarms caused by bugs in Valgrind itself.
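If upgrading Valgrind isn't possible, another option for test builds is to keep `std::random_device` away from RDRAND. This sketch rests on one assumption: that the standard library accepts the token `"/dev/urandom"`, which libstdc++ and libc++ do. With that token, the device reads from the file instead of executing the CPU instruction. Token support is implementation-defined, and unsupported tokens may throw. It is also an untested assumption that this keeps Valgrind 3.11 from hitting the unhandled instruction. `seed_from_urandom` is a hypothetical name.

```cpp
#include <cassert>
#include <random>
#include <string>

// Assumption: "/dev/urandom" is an accepted std::random_device token, so the
// value is read from that device file rather than produced by RDRAND.
unsigned int seed_from_urandom() {
    std::random_device rd("/dev/urandom");
    return rd();
}
```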

李迟 (Li Chi), Thursday night, 2018-08-30

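  • Author: 李迟 (Li Chi)
  • Copyright: original article; copyright belongs to the named author. Please credit the source when reposting (though it is fine if you don't).
  • Link: /my-study/a-tittle-valgrind-bug.html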