An introduction to memory-issue diagnosis tools, with hands-on experiments: Valgrind Massif, AddressSanitizer, Memory Leak and Growth flame graphs, and the memory-related tools in bcc.
1. Background
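This post runs memory-related experiments with Valgrind Massif and AddressSanitizer, and covers the Memory Leak and Growth flame graphs that were not expanded on in Concurrency and Asynchronous Programming (3) – Profiling Tools: gperftools and Flame Graphs.
It also introduces the memory-related tools in bcc tools.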
2. Test program demo
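Create a test demo (also available as leak_test.cpp):
- Simulate memory leaks: in a separate thread, repeatedly allocate a random block of 1 KB~1 MB and leave it unfreed with 50% probability
- Simulate dangling and wild pointers
Compile: g++ -o leak_test leak_test.cpp -g -pthread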
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <time.h>
#include <pthread.h>

#define MAX_CHUNK_SIZE (1024 * 1024) // maximum chunk size: 1 MB
#define MIN_CHUNK_SIZE 1024          // minimum chunk size: 1 KB

// Simulates random memory leaks
void* random_leak_memory(void* arg) {
    int max_chunks = *(int*)arg;
    srand(time(NULL)); // seed the random number generator
    for (int i = 0; i < max_chunks; i++) {
        // Pick a random chunk size (1 KB to 1 MB)
        size_t chunk_size = (rand() % (MAX_CHUNK_SIZE - MIN_CHUNK_SIZE + 1)) + MIN_CHUNK_SIZE;
        // Allocate the memory
        void *ptr = malloc(chunk_size);
        if (ptr == NULL) {
            perror("malloc failed");
            exit(EXIT_FAILURE);
        }
        // Fill it with data so the memory is actually used
        memset(ptr, 0, chunk_size);
        printf("Allocated chunk %d of size %zu bytes\n", i + 1, chunk_size);
        // Randomly decide whether to free it (50% chance of not freeing)
        if (rand() % 2 == 0) {
            printf("Freeing chunk %d\n", i + 1);
            free(ptr);
        } else {
            printf("Leaking chunk %d\n", i + 1);
        }
        // Sleep to mimic memory use in a real workload
        sleep(1);
    }
    return NULL;
}

// Simulates a dangling pointer
void simulate_dangling_pointer() {
    // Allocate and initialize memory
    int *ptr = (int*)malloc(sizeof(int));
    if (ptr == NULL) {
        perror("malloc failed");
        exit(EXIT_FAILURE);
    }
    *ptr = 42;
    printf("Allocated memory and initialized with value: %d\n", *ptr);
    // Free the memory
    free(ptr);
    printf("Memory freed, but ptr is still accessible.\n");
    // Access the freed memory (dangling pointer)
    printf("Dangling pointer triggered: Accessing freed memory...\n");
    // Deliberately read memory that has already been freed
    printf("Value at dangling pointer: %d\n", *ptr); // undefined behavior
}

// Simulates a wild pointer
void simulate_wild_pointer() {
    int *wild_ptr; // declared but never initialized
    printf("Wild pointer declared but not initialized.\n");
    // Use the uninitialized pointer (wild pointer)
    printf("Wild pointer triggered: Accessing uninitialized memory...\n");
    // Deliberately dereference the uninitialized pointer
    printf("Value at wild pointer: %d\n", *wild_ptr); // may cause a segfault
}

int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <max_number_of_chunks>\n", argv[0]);
        return EXIT_FAILURE;
    }
    // getenv() returns NULL when the variable is unset; never pass NULL to %s
    const char *asan_options = getenv("ASAN_OPTIONS");
    printf("ASAN_OPTIONS=%s\n", asan_options ? asan_options : "");
    int max_chunks = atoi(argv[1]);
    if (max_chunks <= 0) {
        fprintf(stderr, "Please provide a positive number of chunks.\n");
        return EXIT_FAILURE;
    }
    pthread_t leak_thread;
    pthread_create(&leak_thread, NULL, random_leak_memory, &max_chunks);
    // The main thread simulates the dangling and wild pointers
    printf("\nTesting Dangling Pointer:\n");
    simulate_dangling_pointer();
    printf("\nTesting Wild Pointer:\n");
    simulate_wild_pointer();
    printf("\nRandom memory issue simulation completed. Check memory usage with tools like Valgrind or memleak.\n");
    // Wait for the leak thread to finish
    pthread_join(leak_thread, NULL);
    printf("\nAll Tests Done.\n");
    // Deliberately loop forever so memory usage is easy to observe
    while (1) {
        sleep(1);
    }
    return 0;
}
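The demo memsets every chunk right after malloc() so that the memory is really used. On Linux, anonymous memory is normally backed by physical pages lazily, on the first write to each page, which is also why the page-fault approach in section 5 follows real memory growth rather than mere address-space reservations. A minimal sketch that makes this visible, assuming Linux and glibc: it counts minor page faults (getrusage()'s ru_minflt) while writing to freshly mmap()ed pages. touch_and_count_faults is a hypothetical helper, not part of leak_test.cpp:

```cpp
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Minor page faults taken by this process so far (Linux getrusage()).
static long minor_faults() {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

// Hypothetical helper: map `pages` fresh anonymous pages, optionally write one
// byte to each, and return the minor faults this caused (-1 if mmap fails).
long touch_and_count_faults(std::size_t pages, bool touch) {
    assert(pages > 0);
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t len = pages * page;
    // mmap() only reserves address space; no physical pages are assigned yet.
    void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    // volatile keeps the compiler from dropping the writes below.
    volatile char* p = static_cast<volatile char*>(mem);
    const long before = minor_faults();
    if (touch) {
        for (std::size_t off = 0; off < len; off += page) {
            p[off] = 1;  // the first write to each page triggers a page fault
        }
    }
    const long after = minor_faults();
    munmap(mem, len);
    return after - before;
}
```

On a typical Linux kernel, touch_and_count_faults(64, true) should report roughly one minor fault per page while touch_and_count_faults(64, false) stays near zero; exact counts depend on the kernel and its huge-page settings, and some sandboxed environments may not report ru_minflt at all.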
3. Valgrind Massif
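Valgrind is most commonly used with memcheck to check for memory leaks; memcheck is also the default tool when none is specified. However, memcheck cannot clearly identify memory that is allocated but never put to effective use, and in that case massif can help with the analysis.
See the official documentation: Valgrind Massif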
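- massif is the heap profiler provided by Valgrind. It monitors a program's heap usage and helps identify memory leaks and unnecessary memory usage.
- massif tracks allocations of different block sizes, which helps analyze the memory distribution, e.g. whether there are many small blocks or a few large ones.
- massif reports the peak memory usage during the run, including the maximum heap allocation, which is very useful for estimating a program's memory requirements.
- Performance impact: the program runs about 20x slower (10~30x).
- Usage: compile with -g; run valgrind --tool=massif xxx, which writes a profile file; then print the report with ms_print ./massif.out.18042
- Useful options:
  - --time-unit=B: massif takes snapshots periodically, and the default time unit is instructions executed. --time-unit=B measures time in bytes allocated/freed instead, which better reflects how allocation and deallocation relate to time (it is still periodic sampling; not every allocation gets its own snapshot).
  - --pages-as-heap=yes: treat memory-mapped pages as heap memory, which captures more kinds of allocations, e.g. memory obtained through the mmap system call.
  - --depth=N: limit the depth of the recorded call stacks.
  - --threshold=N: allocation-tree entries below N% of total memory are aggregated instead of listed individually.
  - --max-snapshots=N: limit the number of snapshots kept, reducing disk usage and analysis time.
  - --detailed-freq=N: make every Nth snapshot a detailed one (default 10).
- Example: valgrind --tool=massif --time-unit=B --pages-as-heap=yes ./leak_test 5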
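To make the "allocated but not effectively used" case concrete, here is a minimal hypothetical sketch (GrowingCache is an illustrative name, not from the demo): buffers pile up in a cache that stays reachable and is released when the object is destroyed, so memcheck has nothing to report as lost, while massif's snapshots would be expected to show the heap climbing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cache that only ever grows. Every buffer stays reachable
// through `buffers`, so memcheck sees no leak, yet heap usage keeps climbing.
struct GrowingCache {
    std::vector<std::vector<char>> buffers;

    // Add `rounds` buffers of `chunk` bytes; returns the total bytes held.
    std::size_t grow(int rounds, std::size_t chunk) {
        assert(rounds >= 0);
        for (int i = 0; i < rounds; ++i) {
            // vector<char>(chunk) zero-fills its bytes, so the pages are written.
            buffers.emplace_back(chunk);
        }
        return bytes_held();
    }

    std::size_t bytes_held() const {
        std::size_t total = 0;
        for (const auto& b : buffers) total += b.size();
        return total;
    }
};
// Destroying the cache (e.g. when main() returns) frees every buffer, so
// memcheck's LEAK SUMMARY stays at 0 bytes even though the heap kept growing.
```

Calling grow() in a loop from a small main() and running it under valgrind --tool=massif should produce a steadily rising chart, whereas valgrind --tool=memcheck --leak-check=full would report no definitely lost bytes.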
3.1. memcheck test
Run memcheck with valgrind --tool=memcheck --leak-check=full ./leak_test 5, then interrupt the program with Ctrl+C.
[CentOS-root@xdlinux ➜ leak git:(main) ✗ ]$ valgrind --tool=memcheck --leak-check=full ./leak_test 5
==23944== Memcheck, a memory error detector
...
# random leak pattern of this run
Allocated chunk 1 of size 380016 bytes
Freeing chunk 1
Allocated chunk 2 of size 67379 bytes
Leaking chunk 2
Allocated chunk 3 of size 594641 bytes
Leaking chunk 3
Allocated chunk 4 of size 510578 bytes
Leaking chunk 4
Allocated chunk 5 of size 869036 bytes
Leaking chunk 5
...
==23944== 2,041,634 bytes in 4 blocks are definitely lost in loss record 1 of 1
==23944== at 0x4C360A5: malloc (vg_replace_malloc.c:380)
==23944== by 0x400AEC: random_leak_memory(void*) (leak_test.cpp:21)
==23944== by 0x577D179: start_thread (pthread_create.c:479)
==23944== by 0x5A91DC2: clone (clone.S:95)
==23944==
==23944== LEAK SUMMARY:
==23944== definitely lost: 2,041,634 bytes in 4 blocks
==23944== indirectly lost: 0 bytes in 0 blocks
==23944== possibly lost: 0 bytes in 0 blocks
==23944== still reachable: 0 bytes in 0 blocks
==23944== suppressed: 0 bytes in 0 blocks
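The definitely-lost total can be cross-checked against the demo's own output: the leaked chunks 2 to 5 in this run were 67,379 + 594,641 + 510,578 + 869,036 = 2,041,634 bytes, exactly the number memcheck reports. A minimal sketch of that check, assuming the "Allocated chunk N of size X bytes" / "Leaking chunk N" lines that leak_test prints; expected_leak_bytes is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Sum the sizes of the chunks that leak_test reported as "Leaking".
std::size_t expected_leak_bytes(std::istream& log) {
    std::map<int, std::size_t> sizes;  // chunk number -> size in bytes
    std::size_t leaked = 0;
    std::string line;
    while (std::getline(log, line)) {
        int id = 0;
        std::size_t size = 0;
        if (std::sscanf(line.c_str(), "Allocated chunk %d of size %zu bytes", &id, &size) == 2) {
            sizes[id] = size;
        } else if (std::sscanf(line.c_str(), "Leaking chunk %d", &id) == 1) {
            assert(sizes.count(id) == 1);  // a leak line follows its allocation line
            leaked += sizes[id];
        }
        // Other lines ("Freeing chunk N", pointer tests, ...) are ignored.
    }
    return leaked;
}

// Convenience overload for captured stdout held in a string.
std::size_t expected_leak_bytes(const std::string& text) {
    std::istringstream in(text);
    return expected_leak_bytes(in);
}
```

Fed with the stdout of the later ASan run, the same helper should give 1,652,500 bytes, matching the LeakSanitizer summary there.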
3.2. massif test
3.2.1. Generating the data file
Run massif with valgrind --tool=massif --time-unit=B --pages-as-heap=yes ./leak_test 5, then interrupt the program with Ctrl+C.
Analyze the generated data file massif.out.pid with ms_print massif.out.24000 (ms_print also ships with Valgrind).
[CentOS-root@xdlinux ➜ leak git:(main) ✗ ]$ valgrind --tool=massif --time-unit=B --pages-as-heap=yes ./leak_test 5
==24161== Massif, a heap profiler
...
# random leak pattern of this run
Allocated chunk 1 of size 816446 bytes
Leaking chunk 1
Allocated chunk 2 of size 281989 bytes
Freeing chunk 2
Allocated chunk 3 of size 692444 bytes
Leaking chunk 3
Allocated chunk 4 of size 676581 bytes
Leaking chunk 4
Allocated chunk 5 of size 714673 bytes
Freeing chunk 5
...
# after interrupting the program, the data file massif.out.24161 has been generated
[CentOS-root@xdlinux ➜ leak git:(main) ✗ ]$ ls -ltrh
total 56K
-rw-r--r-- 1 root root 3.3K Apr 4 16:31 leak_test.cpp
-rwxr-xr-x 1 root root 29K Apr 4 16:31 leak_test
-rw-r--r-- 1 root root 18K Apr 4 17:18 massif.out.24161
3.2.2. Analyzing and interpreting the results
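Analysis of the ms_print massif.out.24161 output; the full result is in 1_ms_print_24161.result (the same directory also holds results for other options, such as pages-as-heap=no and stacks=yes).
- The ms_print output starts with a bar chart drawn with characters, where different characters mean different things:
  - Each vertical bar is one snapshot showing the memory usage at that moment; Number of snapshots: 47 below the chart means 47 snapshots were taken:
    - a bar of : is a normal snapshot
    - a bar of @ is a detailed snapshot, which also records the allocation call stacks
    - # marks the snapshot with the highest memory usage (the peak), which is also a detailed snapshot
    - The peak snapshot is only sampled when memory is freed, so it may be inaccurate: it is not necessarily the exact moment of maximum usage and is for reference only
  - Detailed snapshots: [9, 19, 29, 31, 36 (peak), 46] means there were 6 detailed snapshots
- Next comes the information for each snapshot, and detailed snapshots also print their call stacks. For example, of [9, 19, 29, 31, 36 (peak), 46], snapshot 36 is the memory peak. The columns are:
  - n: snapshot number
  - time(B): time, in the chosen unit (bytes here)
  - total(B): total memory consumed
  - useful-heap(B): heap bytes the program requested
  - extra-heap(B): bytes allocated beyond the requested amount (allocator bookkeeping and alignment)
  - stacks(B): memory used by stacks
- Stack profiling is off by default because it slows massif down considerably; enable it with --stacks=yes (it cannot be combined with --pages-as-heap=yes)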
[CentOS-root@xdlinux ➜ leak git:(main) ✗ ]$ ms_print massif.out.24161
--------------------------------------------------------------------------------
Command: ./leak_test 5
Massif arguments: --time-unit=B --pages-as-heap=yes
ms_print arguments: massif.out.24161
--------------------------------------------------------------------------------
MB
153.6^ #
| #:::::::::
| #
| #
| # :::::::::::
| # :
| # :
| # :
| # : :
| # : :@
| # : :@
| # : :@
| # : :@
| # : :@
| # : :@
| # : :@
| # : :@
| :::::::::::::::::::::::::::::::::::::::::# : :@
| @::: # : :@
| @:@ : # : :@
0 +----------------------------------------------------------------------->MB
0 221.7
Number of snapshots: 47
Detailed snapshots: [9, 19, 29, 31, 36 (peak), 46]
--------------------------------------------------------------------------------
n time(B) total(B) useful-heap(B) extra-heap(B) stacks(B)
--------------------------------------------------------------------------------
0 8,192 8,192 8,192 0 0
1 16,384 16,384 16,384 0 0
2 196,608 196,608 196,608 0 0
3 208,896 208,896 208,896 0 0
4 212,992 212,992 212,992 0 0
5 217,088 217,088 217,088 0 0
6 225,280 225,280 225,280 0 0
7 229,376 229,376 229,376 0 0
8 233,472 233,472 233,472 0 0
9 233,472 233,472 233,472 0 0
100.00% (233,472B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
->98.25% (229,376B) 0x0: ???
|
->01.75% (4,096B) 0x4000FA0: ??? (in /usr/lib64/ld-2.28.so)
...
--------------------------------------------------------------------------------
n time(B) total(B) useful-heap(B) extra-heap(B) stacks(B)
--------------------------------------------------------------------------------
32 18,378,752 18,362,368 18,362,368 0 0
33 18,550,784 18,460,672 18,460,672 0 0
36 161,161,216 161,071,104 161,071,104 0 0
100.00% (161,071,104B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
->88.54% (142,610,432B) 0x5877707: __mmap64 (mmap64.c:52)
| ->88.54% (142,610,432B) 0x5877707: mmap (mmap64.c:40)
| ->83.33% (134,217,728B) 0x58016F6: new_heap (arena.c:489)
| | ->83.33% (134,217,728B) 0x58022A1: _int_new_arena (arena.c:694)
| | ->83.33% (134,217,728B) 0x58022A1: arena_get2.part.6 (arena.c:913)
| | ->83.33% (134,217,728B) 0x5804E5C: arena_get2 (arena.c:881)
| | ->83.33% (134,217,728B) 0x5804E5C: tcache_init.part.7 (malloc.c:2995)
| | ->83.33% (134,217,728B) 0x5805B85: tcache_init (malloc.c:2992)
| | ->83.33% (134,217,728B) 0x5805B85: malloc (malloc.c:3051)
| | ->83.33% (134,217,728B) 0x400AEC: random_leak_memory(void*) (leak_test.cpp:21)
| | ->83.33% (134,217,728B) 0x5568179: start_thread (pthread_create.c:479)
| | ->83.33% (134,217,728B) 0x587CDC2: clone (clone.S:95)
...
--------------------------------------------------------------------------------
n time(B) total(B) useful-heap(B) extra-heap(B) stacks(B)
--------------------------------------------------------------------------------
37 161,165,312 161,067,008 161,067,008 0 0
38 191,287,296 130,945,024 130,945,024 0 0
39 229,089,280 94,781,440 94,781,440 0 0
40 229,371,904 95,064,064 95,064,064 0 0
41 229,376,000 95,059,968 95,059,968 0 0
42 230,350,848 95,477,760 95,477,760 0 0
43 231,030,784 96,157,696 96,157,696 0 0
44 231,747,584 96,874,496 96,874,496 0 0
45 231,751,680 96,870,400 96,870,400 0 0
46 232,468,480 96,153,600 96,153,600 0 0
100.00% (96,153,600B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
->80.80% (77,692,928B) 0x5877707: __mmap64 (mmap64.c:52)
| ->80.80% (77,692,928B) 0x5877707: mmap (mmap64.c:40)
...
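The figures ms_print shows can also be read from the raw massif.out.<pid> file, where every snapshot header is a group of key=value lines (snapshot=, time=, mem_heap_B=, mem_heap_extra_B=, mem_stacks_B=, heap_tree=empty|detailed|peak). ms_print's total(B) is the sum of the three mem_* values, e.g. 161,071,104 + 0 + 0 for snapshot 36 above. A minimal sketch, assuming that file layout, which picks the snapshot with the largest total; Snapshot, find_peak and find_peak_in_text are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Counters from one massif snapshot header.
struct Snapshot {
    int index = -1;
    std::uint64_t heap = 0;    // mem_heap_B       -> ms_print "useful-heap(B)"
    std::uint64_t extra = 0;   // mem_heap_extra_B -> ms_print "extra-heap(B)"
    std::uint64_t stacks = 0;  // mem_stacks_B     -> ms_print "stacks(B)"
    std::uint64_t total() const { return heap + extra + stacks; }  // "total(B)"
};

// Scan a massif.out.<pid> stream and return the snapshot with the largest total.
Snapshot find_peak(std::istream& in) {
    Snapshot cur, best;
    std::string line;
    // Parse the number after '=' on the current line.
    auto value = [&line]() {
        std::string::size_type eq = line.find('=');
        assert(eq != std::string::npos);
        return std::stoull(line.substr(eq + 1));
    };
    while (std::getline(in, line)) {
        if (line.rfind("snapshot=", 0) == 0) {
            cur = Snapshot{};
            cur.index = static_cast<int>(value());
        } else if (line.rfind("mem_heap_B=", 0) == 0) {
            cur.heap = value();
        } else if (line.rfind("mem_heap_extra_B=", 0) == 0) {
            cur.extra = value();
        } else if (line.rfind("mem_stacks_B=", 0) == 0) {
            cur.stacks = value();
        } else if (line.rfind("heap_tree=", 0) == 0) {
            // heap_tree= is the last line of a snapshot header.
            if (best.index < 0 || cur.total() > best.total()) best = cur;
        }
    }
    return best;
}

// Convenience wrapper for file contents already loaded into a string.
Snapshot find_peak_in_text(const std::string& text) {
    std::istringstream in(text);
    return find_peak(in);
}
```

For example, std::ifstream f("massif.out.24161"); find_peak(f).index should give 36 for this run. It returns the largest recorded snapshot, so it shares the sampling caveat of ms_print's # marker.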
4. Sanitizer
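4.1. Overview of the Sanitizer tools
Google's Sanitizer tools are built into both GCC and Clang and are enabled with -fsanitize=; some of their checks can also be switched on or off at run time through environment variables such as ASAN_OPTIONS.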
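1) AddressSanitizer (ASan): detects memory access errors such as out-of-bounds accesses, use of freed memory (dangling pointers), and double frees
- Enable with -fsanitize=address
  - To keep running after an error is reported, also compile with -fsanitize-recover=address and set ASAN_OPTIONS=halt_on_error=0 at run time
  - Requires the runtime library: yum install libasan; otherwise the build complains that libasan.so is missing
- Performance impact (for reference only): roughly 2x slower (depending on code complexity) and roughly 2x more memory
- Details: AddressSanitizer
  - If something is unclear, first check whether the FAQ already covers it
  - For the related compile-time and runtime options, see: Flags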
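2) LeakSanitizer (LSan): detects memory leaks
- Enable with -fsanitize=leak
  - Requires the runtime library: yum install liblsan; otherwise the build complains that liblsan.so is missing
- Performance impact: very low runtime overhead, and small when combined with ASan (about 1-2x); memory grows slightly
- Details: AddressSanitizerLeakSanitizer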
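3) MemorySanitizer (MSan): detects reads of uninitialized memory
- Enable with -fsanitize=memory (supported by Clang only; GCC does not implement MSan)
- Performance impact: about 3x slower; shadow memory is 1:1 with program memory, so memory usage grows significantly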
4) ThreadSanitizer (TSan): detects data races and deadlocks in multithreaded programs
- Enable with -fsanitize=thread
- Performance impact: about 5-15x slower; 5-10x more memory
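5) UndefinedBehaviorSanitizer (UBSan): detects undefined behavior such as signed integer overflow, null pointer dereferences, and invalid type conversions
- Enable with -fsanitize=undefined
- Performance impact: usually under 10% overhead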
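Comparison summary:
Tool | Detects | Slowdown (time) | Memory usage |
---|---|---|---|
AddressSanitizer (ASan) | Memory access errors, leaks | 2× | High (virtual memory) |
ThreadSanitizer (TSan) | Data races, deadlocks | 5-15× | Very high |
MemorySanitizer (MSan) | Use of uninitialized memory | 3× | High |
UndefinedBehaviorSanitizer (UBSan) | Undefined behavior | <10% | Low |
Valgrind | General checking (memory errors, performance) | 20× | Very high |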
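Recommendations:
- Everyday development and debugging: prefer ASan or UBSan (low overhead, covers common problems).
  - ASan (memory errors) + UBSan (undefined behavior) + LSan (leak detection) cover most common problems
- Multithreading issues: use TSan.
- Uninitialized memory: use MSan (all dependent libraries must also be built with MSan instrumentation).
- Thorough checking: use different tools in separate stages and avoid enabling several at once (e.g. ASan and TSan conflict).
  - LSan and TSan cannot be enabled together (-fsanitize=thread and -fsanitize=leak conflict)
  - Alternative: run TSan first to find data races, then LSan to find leaks (two separate builds and runs)
- Performance-sensitive scenarios: disable the sanitizers or enable them only in key modules.
- Legacy systems: Valgrind remains the fallback when the code cannot be rebuilt from source.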
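When using the Sanitizer tools, the program must use the system's default memory allocator so that the standard memory-management functions can be intercepted; RocksDB, for example, disables jemalloc in sanitizer builds.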
4.2. Experiments
The experiments again use the demo above; compiler: gcc version 8.5.0 20210514 (Red Hat 8.5.0-4) (GCC)
Makefile:
# Compiler and flags
CC = g++
CFLAGS = -Wall -g
LDFLAGS = -lpthread
# Program name and source files
TARGET = leak_test
SRCS = leak_test.cpp
# Default target
all: $(TARGET)
# Build the normal version (recipe lines must start with a tab)
$(TARGET): $(SRCS)
	$(CC) $(CFLAGS) -o $(TARGET) $(SRCS) $(LDFLAGS)
# Build with AddressSanitizer
# To keep running after an error is reported, add `-fsanitize-recover=address` and set `ASAN_OPTIONS=halt_on_error=0` at run time
asan: CFLAGS += -fsanitize=address -fsanitize-recover=address
asan: clean $(TARGET)
# Build with LeakSanitizer alone (usually enabled together with AddressSanitizer)
# AddressSanitizer already includes LeakSanitizer, so the asan target does not need to specify it
lsan: CFLAGS += -fsanitize=leak
lsan: clean $(TARGET)
# MemorySanitizer (Clang only; g++ rejects -fsanitize=memory)
msan: CFLAGS += -fsanitize=memory
msan: clean $(TARGET)
# Build with UndefinedBehaviorSanitizer
ubsan: CFLAGS += -fsanitize=undefined
ubsan: clean $(TARGET)
# ThreadSanitizer
tsan: CFLAGS += -fsanitize=thread
tsan: clean $(TARGET)
# Remove generated files
clean:
	rm -f $(TARGET)
.PHONY: all asan lsan msan ubsan tsan clean
4.2.1. AddressSanitizer
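By default, ASan terminates the program when it detects an error (such as a memory leak or a dangling-pointer access). To keep running after an error is reported, compile with -fsanitize-recover=address and set ASAN_OPTIONS=halt_on_error=0 at run time.
Build: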
# with -Wall, the compiler warns about the improper usage
[CentOS-root@xdlinux ➜ leak git:(main) ✗ ]$ make asan
rm -f leak_test
g++ -Wall -g -fsanitize=address -o leak_test leak_test.cpp -lpthread
leak_test.cpp: In function ‘void simulate_wild_pointer()’:
leak_test.cpp:81:11: warning: ‘wild_ptr’ is used uninitialized in this function [-Wuninitialized]
printf("Value at wild pointer: %d\n", *wild_ptr); // may cause a segfault
~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Run: pass options through ASAN_OPTIONS=, separating multiple options with :
[CentOS-root@xdlinux ➜ leak git:(main) ✗ ]$ ASAN_OPTIONS="halt_on_error=0:detect_leaks=1:log_path=AddressSanitizer/asan.log" ./leak_test 5
ASAN_OPTIONS=halt_on_error=0:detect_leaks=1:log_path=AddressSanitizer/asan.log
Testing Dangling Pointer:
Allocated memory and initialized with value: 42
Memory freed, but ptr is still accessible.
Dangling pointer triggered: Accessing freed memory...
Allocated chunk 1 of size 43233 bytes
Freeing chunk 1
Value at dangling pointer: 260046849
Testing Wild Pointer:
Wild pointer declared but not initialized.
Wild pointer triggered: Accessing uninitialized memory...
Value at wild pointer: 1102416563
Random memory issue simulation completed. Check memory usage with tools like Valgrind or memleak.
Allocated chunk 2 of size 53407 bytes
Leaking chunk 2
Allocated chunk 3 of size 53586 bytes
Freeing chunk 3
Allocated chunk 4 of size 5692 bytes
Freeing chunk 4
Allocated chunk 5 of size 22918 bytes
Freeing chunk 5
^C
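Results:
- Both the dangling-pointer access (heap-use-after-free below) and the wild-pointer access (stack-buffer-underflow below) were detected.
- The memory leak was not reported. This is not because -fsanitize=leak was missing: the main thread spins in a while(1) loop and the program was ended with Ctrl+C, but LSan only checks for leaks when the program exits (atexit).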
=================================================================
==38822==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000000010 at pc 0x00000040111b bp 0x7fff7ee56650 sp 0x7fff7ee56640
READ of size 4 at 0x602000000010 thread T0
#0 0x40111a in simulate_dangling_pointer() /home/workspace/prog-playground/memory/leak/leak_test.cpp:64
#1 0x4013ca in main /home/workspace/prog-playground/memory/leak/leak_test.cpp:103
#2 0x7f32aa24b492 in __libc_start_main ../csu/libc-start.c:314
#3 0x400e0d in _start (/home/workspace/prog-playground/memory/leak/leak_test+0x400e0d)
...
=================================================================
==38822==ERROR: AddressSanitizer: stack-buffer-underflow on address 0x7fff7ee56690 at pc 0x000000401188 bp 0x7fff7ee56650 sp 0x7fff7ee56640
READ of size 4 at 0x7fff7ee56690 thread T0
#0 0x401187 in simulate_wild_pointer() /home/workspace/prog-playground/memory/leak/leak_test.cpp:81
#1 0x4013d9 in main /home/workspace/prog-playground/memory/leak/leak_test.cpp:106
#2 0x7f32aa24b492 in __libc_start_main ../csu/libc-start.c:314
#3 0x400e0d in _start (/home/workspace/prog-playground/memory/leak/leak_test+0x400e0d)
...
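Since the leak went unreported only because Ctrl+C killed the process before it could exit normally, one alternative to the two approaches below is to catch SIGINT, leave the loop, and return from main() so that LSan's exit-time check still runs. A minimal sketch, assuming a POSIX system; g_stop, on_sigint, install_sigint_handler and wait_until_interrupted are hypothetical names. The handler only sets a volatile sig_atomic_t flag, one of the few things that are safe to do inside a signal handler:

```cpp
#include <cassert>
#include <csignal>
#include <unistd.h>

// Set from the signal handler; writing a volatile sig_atomic_t is async-signal-safe.
static volatile std::sig_atomic_t g_stop = 0;

extern "C" void on_sigint(int) { g_stop = 1; }

// Install the Ctrl+C (SIGINT) handler; returns false if signal() fails.
bool install_sigint_handler() {
    return std::signal(SIGINT, on_sigint) != SIG_ERR;
}

// Drop-in for the demo's `while (1) { sleep(1); }`: returns after Ctrl+C so
// main() can return normally and the exit-time LeakSanitizer check runs.
void wait_until_interrupted() {
    while (!g_stop) {
        sleep(1);  // the flag is re-checked at least once per second
    }
}
```

In the demo, main() would call install_sigint_handler() early and replace the while (1) loop with wait_until_interrupted(); return 0;, keeping the existing pthread_join() so the leak thread has finished before the check.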
4.2.2. LeakSanitizer
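AddressSanitizer already includes LeakSanitizer, so -fsanitize=leak does not need to be specified when compiling. (liblsan, installed with yum install liblsan, is only needed for a standalone -fsanitize=leak build; the ASan runtime libasan already contains the leak checker.)
Moreover, detect_leaks=1 is on by default in AddressSanitizer, so it does not need to be set in ASAN_OPTIONS. In practice it is therefore recommended to use AddressSanitizer directly rather than LeakSanitizer on its own.
The leak went unreported earlier not because -fsanitize=leak was missing, but because the main thread spins in a while(1) loop and was ended with Ctrl+C, whereas LSan checks for leaks only when the program exits (atexit). There are 2 ways to trigger the check: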
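- 1. Remove the infinite loop and let the program exit normally
- 2. Trigger the leak check explicitly
  - For example, when debugging a long-running service, call __lsan_do_leak_check(); in the code
  - and include the header: #include <sanitizer/lsan_interface.h>
About the two approaches:
- Both were verified to work in the runs below
- However, with approach 2 the program still exited on its own after printing the leaks, because LeakSanitizer terminates the program by default when it detects leaks, even with ASAN_OPTIONS="halt_on_error=0" set:
  - LSan is independent of ASan: ASAN_OPTIONS controls whether ASan errors (such as out-of-bounds accesses) terminate the program, but it does not control LSan's behavior
  - LSan's default behavior: when it detects leaks, it prints a report and terminates the program (default exit code 1), whether or not __lsan_do_leak_check() was called
  - Setting LSan's exit code to exitcode=0 in LSAN_OPTIONS keeps the program running: LSAN_OPTIONS="exitcode=0" ASAN_OPTIONS="halt_on_error=0" ./leak_test 5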
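// Approach 1:
int main(int argc, char *argv[]) {
    ...
    // The infinite loop (used for observing memory usage) is commented out
    // while (1) {
    //     sleep(1);
    // }
}
// Approach 2:
#include <sanitizer/lsan_interface.h>
int main(int argc, char *argv[]) {
    ...
    // Trigger the leak check manually; otherwise interrupting the while loop
    // below with Ctrl+C would skip the check
    // In a real project this can be wrapped in a conditional-compilation macro
    __lsan_do_leak_check();
    // Deliberately loop forever so memory usage is easy to observe
    while (1) {
        sleep(1);
    }
}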
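The comment in approach 2 suggests guarding the call with a conditional-compilation macro. A minimal sketch of such a guard, assuming GCC (which defines __SANITIZE_ADDRESS__ under -fsanitize=address) or Clang (__has_feature(address_sanitizer)); in other builds it compiles to a no-op, and a standalone -fsanitize=leak build is not detected by these macros. Instead of __lsan_do_leak_check() it calls __lsan_do_recoverable_leak_check(), which the same <sanitizer/lsan_interface.h> header documents as printing a leak report without terminating the process and as safe to call repeatedly; it is worth confirming this in your toolchain's copy of the header. check_leaks_now is a hypothetical wrapper name:

```cpp
#include <cassert>

// Detect an ASan build (the ASan runtime also provides the LSan interface).
#if defined(__SANITIZE_ADDRESS__)
#define LEAK_CHECK_AVAILABLE 1
#elif defined(__has_feature)
#if __has_feature(address_sanitizer)
#define LEAK_CHECK_AVAILABLE 1
#endif
#endif

#ifdef LEAK_CHECK_AVAILABLE
#include <sanitizer/lsan_interface.h>
#endif

// Returns non-zero if leaks were found; always 0 in non-ASan builds.
int check_leaks_now() {
#ifdef LEAK_CHECK_AVAILABLE
    // Prints a LeakSanitizer report if needed but keeps the process running,
    // so it can be called periodically from a long-running service.
    return __lsan_do_recoverable_leak_check();
#else
    return 0;
#endif
}
```

Calling check_leaks_now() before the demo's while (1) loop (or periodically inside it) should print the LeakSanitizer report while the process keeps running, without needing LSAN_OPTIONS="exitcode=0".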
Rebuild with make asan (no separate LeakSanitizer build needed) and run:
[CentOS-root@xdlinux ➜ leak git:(main) ✗ ]$ ASAN_OPTIONS="halt_on_error=0:log_path=AddressSanitizer/asan_no_while.log" ./leak_test 5
ASAN_OPTIONS=halt_on_error=0:log_path=AddressSanitizer/asan_no_while.log
Testing Dangling Pointer:
Allocated memory and initialized with value: 42
Memory freed, but ptr is still accessible.
Dangling pointer triggered: Accessing freed memory...
Allocated chunk 1 of size 516871 bytes
Leaking chunk 1
Value at dangling pointer: 67108865
Testing Wild Pointer:
Wild pointer declared but not initialized.
Wild pointer triggered: Accessing uninitialized memory...
Value at wild pointer: 1102416563
Random memory issue simulation completed. Check memory usage with tools like Valgrind or memleak.
Allocated chunk 2 of size 10176 bytes
Leaking chunk 2
Allocated chunk 3 of size 288425 bytes
Leaking chunk 3
Allocated chunk 4 of size 652067 bytes
Leaking chunk 4
Allocated chunk 5 of size 184961 bytes
Leaking chunk 5
All Tests Done.
Checking the log: besides the two memory errors above, the memory leak is now also detected at the end:
# AddressSanitizer/asan_no_while.log.41093
...
=================================================================
==41093==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 1652500 byte(s) in 5 object(s) allocated from:
#0 0x7f0c4e316ba8 in __interceptor_malloc (/lib64/libasan.so.5+0xefba8)
#1 0x400f83 in random_leak_memory(void*) /home/workspace/prog-playground/memory/leak/leak_test.cpp:21
#2 0x7f0c4e00f179 in start_thread /usr/src/debug/glibc-2.28/nptl/pthread_create.c:479
SUMMARY: AddressSanitizer: 1652500 byte(s) leaked in 5 allocation(s).
Setting LSAN_OPTIONS (note that log_path must also be set here):
[CentOS-root@xdlinux ➜ leak git:(main) ✗ ]$ LSAN_OPTIONS="exitcode=0:log_path=AddressSanitizer/asan-with-LSAN_OPTIONS.log" ASAN_OPTIONS="halt_on_error=0" ./leak_test 5
ASAN_OPTIONS=halt_on_error=0
LSAN_OPTIONS=exitcode=0:log_path=AddressSanitizer/asan-with-LSAN_OPTIONS.log
Testing Dangling Pointer:
Allocated memory and initialized with value: 42
Memory freed, but ptr is still accessible.
Dangling pointer triggered: Accessing freed memory...
Allocated chunk 1 of size 946309 bytes
Freeing chunk 1
Value at dangling pointer: 1317011457
Testing Wild Pointer:
Wild pointer declared but not initialized.
Wild pointer triggered: Accessing uninitialized memory...
Value at wild pointer: 1102416563
Random memory issue simulation completed. Check memory usage with tools like Valgrind or memleak.
Allocated chunk 2 of size 505380 bytes
Leaking chunk 2
Allocated chunk 3 of size 5840 bytes
Freeing chunk 3
Allocated chunk 4 of size 1029577 bytes
Freeing chunk 4
Allocated chunk 5 of size 922868 bytes
Freeing chunk 5
All Tests Done.
^C
5. Memory flame graphs
Reference article: Memory Leak and Growth Flame Graphs
It builds memory flame graphs with perf and eBPF and describes 4 approaches:
- 1. Tracing user-space malloc() and free()
  - Use the bcc stackcount tool to capture user-space allocations: /usr/share/bcc/tools/stackcount -p $(pidof mysqld) -U c:malloc > out_mysqld.stack
  - Generate the flame graph: stackcollapse.pl < out_mysqld.stack | flamegraph.pl --color=mem --title="malloc() Flame Graph" --countname="calls" > out_mysqld.svg
  - memleak: checking for leaks, however, requires tracing malloc, realloc, calloc, posix_memalign and other library calls together; bcc's memleak already implements this and can be used directly
    - Capture: /usr/share/bcc/tools/memleak -p $(pidof mysqld) > memleak_mysqld.stack, while a client connects to MySQL and runs some queries to trigger allocations
  - Performance comparison:
    - Tracing with uprobes made the program about 4x slower (4.15 kernel)
    - libtcmalloc's heap profiling made it about 6x slower
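- 2. Tracing the brk() system call
  - Applications rarely call brk() directly, and it fires far less often than user-space malloc: memory handed out by malloc/calloc usually comes from the allocator's cached pools, so brk is not called frequently to grow the heap.
  - It can therefore be captured with perf: perf record -e syscalls:sys_enter_brk -a -g -- sleep 10, then perf script | stackcollapse-perf.pl | flamegraph.pl --color=mem --title="Heap Expansion Flame Graph" --countname="calls" > brk.svg
  - bcc works too: /usr/share/bcc/tools/stackcount __x64_sys_brk
    - SyS_brk is probably the symbol name on older kernels
    - Filter the available probes with bpftrace -l | grep brk
    - Or check the kernel symbols: grep -i sys_brk /proc/kallsyms
    - Or look under the tracing filesystem: grep brk /sys/kernel/tracing/available_*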
- 3. Tracing the mmap() system call
  - perf record -e syscalls:sys_enter_mmap -a -g -- sleep 10
  - /usr/share/bcc/tools/stackcount __x64_sys_mmap
- 4. Tracing page faults (page-faults)
  - perf record -e page-faults -a -g -- sleep 30
  - /usr/share/bcc/tools/stackcount 't:exceptions:page_fault_*'
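Each pipeline above turns raw stacks into the "folded" format that flamegraph.pl reads: one line per unique stack, frames from root to leaf joined by ';', followed by a space and the count. A simplified sketch of that folding step, assuming the stacks are already split into frames listed leaf-first (the order bcc's stackcount prints them in); the real stackcollapse*.pl scripts additionally parse each tool's text output. Sample and fold_stacks are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One sampled stack: frames listed leaf-first, plus how often it was seen.
using Sample = std::pair<std::vector<std::string>, std::uint64_t>;

// Fold samples into flamegraph.pl's input format: "root;...;leaf count",
// one line per unique stack, with identical stacks merged.
std::vector<std::string> fold_stacks(const std::vector<Sample>& samples) {
    std::map<std::string, std::uint64_t> folded;  // std::map keeps lines sorted by stack
    for (const Sample& s : samples) {
        std::string key;
        // Walk the leaf-first frames backwards so the root comes first.
        for (auto it = s.first.rbegin(); it != s.first.rend(); ++it) {
            if (it != s.first.rbegin()) key += ';';
            key += *it;
        }
        folded[key] += s.second;
    }
    std::vector<std::string> lines;
    for (const auto& entry : folded) {
        lines.push_back(entry.first + " " + std::to_string(entry.second));
    }
    return lines;
}
```

Writing the resulting lines to a file and passing it to flamegraph.pl --color=mem would draw the same kind of graph as the commands above.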