Notes on using dmalloc to debug memory leaks

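Programs written in C/C++ inevitably run into memory leaks. In our project, our own code has mostly moved to boost::xxx_ptr, but third-party libraries still force us to use raw pointers, and that leads to all sorts of odd memory leaks.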

dmalloc is an excellent tool for this kind of problem. It has many features, but this post only records how to use it to find memory leaks.

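  • Environment: Fedora 11, gcc 4.4.1
  • Setup:
    1. Get dmalloc from http://dmalloc.com/ (latest version 5.5.2)
    2. Apply the cxx return value patch; without it, the C++ function addresses in the dmalloc log appear to be wrong
    3. make && make install. Note that the cxx and threads variants have to be built separately (make cxx, make threads), and the same goes for installing them
    4. Set the environment variable. We only need the log messages printed, so run
    dmalloc runtime -p print-messages
    and then set the environment variable (DMALLOC_OPTIONS) as the command's output indicates
  • Debugging:
    Two examples follow. (The code is in dmalloc_share.tar.gz)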

    1) The simplest example: basic_usage

    $ ./basic_usage
    1361264794: 6: Dumping Chunk Statistics:
    1361264794: 6: basic-block 4096 bytes, alignment 8 bytes
    1361264794: 6: heap address range: 0x7f7840c63000 to 0x7f7840c69000, 24576 bytes
    1361264794: 6:     user blocks: 3 blocks, 11216 bytes (45%)
    1361264794: 6:    admin blocks: 3 blocks, 12288 bytes (50%)
    1361264794: 6:    total blocks: 6 blocks, 24576 bytes
    1361264794: 6: heap checked 0
    1361264794: 6: alloc calls: malloc 4, calloc 0, realloc 0, free 2
    1361264794: 6: alloc calls: recalloc 0, memalign 0, valloc 0
    1361264794: 6: alloc calls: new 0, delete 0
    1361264794: 6:   current memory in use: 2000 bytes (2 pnts)
    1361264794: 6:  total memory allocated: 3600 bytes (4 pnts)
    1361264794: 6:  max in use at one time: 2000 bytes (2 pnts)
    1361264794: 6: max alloced with 1 call: 1200 bytes
    1361264794: 6: max unused memory space: 1072 bytes (34%)
    1361264794: 6: top 10 allocations:
    1361264794: 6:  total-size  count in-use-size  count  source
    1361264794: 6:        1200      1        1200      1  ra=0x400ede
    1361264794: 6:        1200      1           0      0  ra=0x400eb3
    1361264794: 6:         800      1         800      1  ra=0x400e9b
    1361264794: 6:         400      1           0      0  ra=0x400e70
    1361264794: 6:        3600      4        2000      2  Total of 4
    1361264794: 6: Dumping Not-Freed Pointers Changed Since Start:
    1361264794: 6:  not freed: '0x7f7840c63008|s1' (1200 bytes) from 'ra=0x400ede'
    1361264794: 6:  not freed: '0x7f7840c65c08|s1' (800 bytes) from 'ra=0x400e9b'
    1361264794: 6:  total-size  count  source
    1361264794: 6:        1200      1  ra=0x400ede
    1361264794: 6:         800      1  ra=0x400e9b
    1361264794: 6:        2000      2  Total of 2
    1361264794: 6: ending time = 1361264794, elapsed since start = 0:00:00
    

    The two return addresses 0x400ede and 0x400e9b are clearly where the leaks come from. Look them up with addr2line:

    $ addr2line -e ./basic_usage 0x400ede
    /xxx/basic_usage.cpp:18
    $ addr2line -e ./basic_usage 0x400e9b
    /xxx/basic_usage.cpp:12
    

    That points straight at lines 18 and 12 of basic_usage.cpp.
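    For readers without the tarball, here is a minimal sketch of an allocation pattern that would produce a report like the one above: four malloc calls (400, 800, 1200, 1200 bytes), of which only the first and third are freed. The function name run_basic_usage and its return value are made up for illustration, and this is not the actual basic_usage.cpp.

```cpp
// Hypothetical reconstruction of the allocation pattern seen in the log;
// the real basic_usage.cpp lives in the tarball and may look different.
#include <stdlib.h>

// Makes four allocations and frees only two of them.
// Returns the number of bytes that are deliberately left unfreed.
size_t run_basic_usage() {
    size_t leaked = 0;

    void *a = malloc(400);   // freed: in-use-size 0 in the "top 10" table
    free(a);

    void *b = malloc(800);   // never freed: would be listed as "not freed"
    if (b != NULL) leaked += 800;

    void *c = malloc(1200);  // freed
    free(c);

    void *d = malloc(1200);  // never freed: would be listed as "not freed"
    if (d != NULL) leaked += 1200;

    return leaked;           // 2000 when both leaked allocations succeeded
}
```

    Call run_basic_usage() from main and link the program against the dmalloc library. Building with -g should be needed for addr2line to map the ra= addresses back to file and line.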

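    2) tricky_usage, a slightly trickier way of debugging. A brief explanation:
    a. Register handlers for SIGHUP and SIGINT;
    b. On the first SIGHUP, call dmalloc_mark() to mark the current memory state;
    c. On the next SIGHUP, call dmalloc_log_changed() to log the memory changes since the last mark;
    d. On SIGINT, call dmalloc_log_unfreed() to print all memory that has not been freed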
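    Steps a to d could be sketched roughly as follows. This is not the code from the tarball: the names on_signal, install_handlers, process_pending and the Action enum are invented here, and the sketch only sets flags inside the handler and does the dmalloc calls later from the main loop, on the assumption that the dmalloc logging functions are not safe to call from a signal handler. The stubs in the #else branch exist only so the sketch compiles without dmalloc installed.

```cpp
#include <signal.h>

#ifdef DMALLOC
#include <dmalloc.h>
#else
// Stand-ins (assumption: dmalloc not installed) with the same shape
// as the dmalloc calls used below.
static unsigned long dmalloc_mark(void) { return 0; }
static void dmalloc_log_changed(unsigned long, int, int, int) {}
static void dmalloc_log_unfreed(void) {}
#endif

static volatile sig_atomic_t g_hup_pending = 0;
static volatile sig_atomic_t g_int_pending = 0;

// The handler only records which signal arrived.
extern "C" void on_signal(int signo) {
    if (signo == SIGHUP) g_hup_pending = 1;
    else if (signo == SIGINT) g_int_pending = 1;
}

// a. register handlers for SIGHUP and SIGINT
bool install_handlers() {
    return signal(SIGHUP, on_signal) != SIG_ERR &&
           signal(SIGINT, on_signal) != SIG_ERR;
}

enum Action { NONE, MARKED, LOGGED_CHANGED, LOGGED_UNFREED };

static bool g_have_mark = false;
static unsigned long g_mark = 0;

// Call this from the main loop (for example after pause() returns).
Action process_pending() {
    if (g_hup_pending) {
        g_hup_pending = 0;
        if (!g_have_mark) {
            g_mark = dmalloc_mark();  // b. remember the current state
            g_have_mark = true;
            return MARKED;
        }
        // c. log pointers changed since the mark:
        //    not-freed = 1, freed = 1, per-pointer details = 1
        dmalloc_log_changed(g_mark, 1, 1, 1);
        g_have_mark = false;
        return LOGGED_CHANGED;
    }
    if (g_int_pending) {
        g_int_pending = 0;
        dmalloc_log_unfreed();        // d. dump everything still allocated
        return LOGGED_UNFREED;
    }
    return NONE;
}
```

    A main function would call install_handlers() once and then loop on pause() followed by process_pending(), running the code under test after each mark.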

    $ ./tricky_usage&  # get the pid
    $ kill -1 pid  # mark, and call testFunc()
    $ kill -1 pid  # log_changed memory
    Asking dmalloc to log changed
    1361264378: 5: Dumping Not-Freed and Freed Pointers Changed Since Start:
    1361264378: 5: not freed: '0x7f67d2384808|s1' (1200 bytes) from 'ra=0x4010ab'
    1361264378: 5: not freed: '0x7f67d2386c08|s1' (800 bytes) from 'ra=0x40105b'
    1361264378: 5: freed: '0x7f67d2389e08|s2' (400 bytes) from 'ra=0x401046'
    1361264378: 5: total-size count source
    1361264378: 5: 1200 1 ra=0x4010ab
    1361264378: 5: 800 1 ra=0x40105b
    1361264378: 5: 400 1 ra=0x401046
    1361264378: 5: 2400 3 Total of 3
    

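    Note the "not freed" and "freed" entries; the difference is obvious 🙂
    Send kill -1 a few more times and you can see that the freedAndNew variable gets freed on the following round, so it does not count as a leak. The real leak is the notFreed variable.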
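    To make the difference concrete, here is a guess at what a function like testFunc() might do, inferred only from the sizes in the log: a 400-byte scratch buffer that is freed right away, an 800-byte block that is never freed, and a 1200-byte block that is held until the next call. The name test_func, the byte counter and the exact sizes per variable are assumptions, not the code from the tarball.

```cpp
// Hypothetical sketch of the workload; sizes are inferred from the log.
#include <stdlib.h>

static void *freedAndNew = NULL;    // held across calls, freed on the next call
static size_t g_outstanding = 0;    // bytes this function has left allocated

// Returns the number of bytes still allocated after this call.
size_t test_func() {
    void *scratch = malloc(400);    // allocated and freed in the same call
    free(scratch);

    void *notFreed = malloc(800);   // pointer is dropped: a real leak
    if (notFreed != NULL) g_outstanding += 800;

    if (freedAndNew != NULL) {      // release the block from the previous call
        free(freedAndNew);
        g_outstanding -= 1200;
    }
    freedAndNew = malloc(1200);     // shows as "not freed" until the next call
    if (freedAndNew != NULL) g_outstanding += 1200;

    return g_outstanding;
}
```

    Under these assumptions, three calls leave 3 x 800 + 1 x 1200 = 3600 bytes allocated: the 800-byte blocks pile up while only one 1200-byte block is ever outstanding.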

    Then send kill -2 and look at the result.

    $ kill -2 pid  # log all unfreed variables
    1361264409: 18: dumping the unfreed pointers
    1361264409: 18: Dumping Not-Freed Pointers Changed Since Start:
    1361264409: 18: not freed: '0x7f67d2383808|s1' (1200 bytes) from 'ra=0x4010ab'
    1361264409: 18: not freed: '0x7f67d2386408|s1' (800 bytes) from 'ra=0x40105b'
    1361264409: 18: not freed: '0x7f67d2386808|s1' (800 bytes) from 'ra=0x40105b'
    1361264409: 18: not freed: '0x7f67d2386c08|s1' (800 bytes) from 'ra=0x40105b'
    1361264409: 18: total-size count source
    1361264409: 18: 2400 3 ra=0x40105b
    1361264409: 18: 1200 1 ra=0x4010ab
    1361264409: 18: 3600 4 Total of 2
    

    No explanation needed here.
    Since SIGINT is caught by the handler, use kill -9 to actually kill the test process.

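    Summary:
    1) Set the dmalloc environment variable and link against libdmallocxxx; when the program finishes, dmalloc writes out its analysis;
    2) dmalloc_mark() and dmalloc_log_changed() can be used to analyze the memory changes caused by a single event or over a period of time;
    3) dmalloc_log_unfreed() prints everything that is currently not freed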
