Brillo: Debugging Native Processes with gdb

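When a native process crashes and debuggerd is running on the system, logcat shows the CPU register values at the time of the crash along with the backtrace of the crashing thread (the call stack, given as module names and relative addresses). From there we can use addr2line (arm-linux-androideabi-addr2line) to map each address in a module back to its source line. Alternatively, we can attach gdb directly to the crashing process and print the call stack of the offending thread with the bt command.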

 

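While recently working on a problem where a USB network adapter did not work, the shill process kept crashing. To pin down the cause, I built debuggerd (system/core/debuggerd) into system.img. With that in place, a shill crash produces logcat output like the following: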

12-27 06:19:36.098   520   520 F libc    : Fatal signal 6 (SIGABRT), code -6 in tid 520 (shill)
12-27 06:19:36.150    92    92 F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
12-27 06:19:36.150    92    92 F DEBUG   : Build fingerprint: 'unknown'
12-27 06:19:36.150    92    92 F DEBUG   : Revision: '0'
12-27 06:19:36.151    92    92 F DEBUG   : ABI: 'arm'
12-27 06:19:36.151    92    92 F DEBUG   : pid: 520, tid: 520, name: shill  >>> /system/bin/shill <<<
12-27 06:19:36.151    92    92 F DEBUG   : signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
12-27 06:19:36.162    92    92 F DEBUG   : Abort message: '[FATAL:chromeos_supplicant_process_proxy.cc(168)] Failed to get interface wlan0_sta: fi.w1.wpa_supplicant1.InterfaceUnknown wpa_supplicant knows nothing about this interface.
12-27 06:19:36.162    92    92 F DEBUG   : '
12-27 06:19:36.162    92    92 F DEBUG   :     r0 00000000  r1 00000208  r2 00000006  r3 00000000
12-27 06:19:36.162    92    92 F DEBUG   :     r4 76f07acc  r5 00000006  r6 76f07a74  r7 0000010c
12-27 06:19:36.162    92    92 F DEBUG   :     r8 7eb85d0c  r9 7eb85d24  sl fffffcd8  fp 00002188
12-27 06:19:36.162    92    92 F DEBUG   :     ip 0000000b  sp 7eb85ca0  lr 75dda0e3  pc 75ddc85c  cpsr 80000010
12-27 06:19:36.182    92    92 F DEBUG   : 
12-27 06:19:36.182    92    92 F DEBUG   : backtrace:
12-27 06:19:36.182    92    92 F DEBUG   :     #00 pc 0004885c  /system/lib/libc.so (tgkill+12)
12-27 06:19:36.182    92    92 F DEBUG   :     #01 pc 000460df  /system/lib/libc.so (pthread_kill+34)
12-27 06:19:36.182    92    92 F DEBUG   :     #02 pc 0001afe7  /system/lib/libc.so (raise+10)
12-27 06:19:36.182    92    92 F DEBUG   :     #03 pc 00017715  /system/lib/libc.so (__libc_android_abort+34)
12-27 06:19:36.182    92    92 F DEBUG   :     #04 pc 00016e8c  /system/lib/libc.so (abort+4)
12-27 06:19:36.183    92    92 F DEBUG   :     #05 pc 0006cdcf  /system/lib/libchrome.so (base::debug::BreakDebugger()+2)
12-27 06:19:36.183    92    92 F DEBUG   :     #06 pc 00079613  /system/lib/libchrome.so (logging::LogMessage::~LogMessage()+590)
12-27 06:19:36.183    92    92 F DEBUG   :     #07 pc 000dba2f  /system/bin/shill
12-27 06:19:36.183    92    92 F DEBUG   :     #08 pc 000bfe65  /system/bin/shill
12-27 06:19:36.183    92    92 F DEBUG   :     #09 pc 000be8c7  /system/bin/shill
12-27 06:19:36.183    92    92 F DEBUG   :     #10 pc 0006d77f  /system/lib/libchrome.so (base::debug::TaskAnnotator::RunTask(char const*, char const*, base::PendingTask const&)+130)
12-27 06:19:36.183    92    92 F DEBUG   :     #11 pc 0007ba77  /system/lib/libchrome.so (base::MessageLoop::RunTask(base::PendingTask const&)+98)
12-27 06:19:36.183    92    92 F DEBUG   :     #12 pc 0007bb1d  /system/lib/libchrome.so (base::MessageLoop::DeferOrRunPendingTask(base::PendingTask const&)+18)
12-27 06:19:36.183    92    92 F DEBUG   :     #13 pc 0007bc63  /system/lib/libchrome.so (base::MessageLoop::DoWork()+194)
12-27 06:19:36.183    92    92 F DEBUG   :     #14 pc 0007d0eb  /system/lib/libchrome.so (base::MessagePumpLibevent::Run(base::MessagePump::Delegate*)+338)
12-27 06:19:36.183    92    92 F DEBUG   :     #15 pc 00085eb9  /system/lib/libchrome.so (base::RunLoop::Run()+58)
12-27 06:19:36.184    92    92 F DEBUG   :     #16 pc 00013909  /system/lib/libbrillo.so (brillo::BaseMessageLoop::Run()+18)
12-27 06:19:36.184    92    92 F DEBUG   :     #17 pc 0001814f  /system/lib/libbrillo.so (brillo::Daemon::Run()+20)
12-27 06:19:36.184    92    92 F DEBUG   :     #18 pc 00021c41  /system/bin/shill
12-27 06:19:36.184    92    92 F DEBUG   :     #19 pc 000162dd  /system/lib/libc.so (__libc_init+52)
12-27 06:19:36.184    92    92 F DEBUG   :     #20 pc 0002168c  /system/bin/shill
12-27 06:19:36.358    92    92 F DEBUG   : 
12-27 06:19:36.358    92    92 F DEBUG   : Tombstone written to: /data/tombstones/tombstone_00
12-27 06:19:36.653   525   525 W crash_reporter: Received crash notification for shill[520] sig 6, user 0 (ignoring - no consent)
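
Each backtrace line above gives a frame as a relative pc plus the module it belongs to, which is the input addr2line needs. As a rough sketch (not part of the original workflow), the Python below pulls those pairs out of logcat text and builds one arm-linux-androideabi-addr2line command per shill frame. The helper names and the symbols_dir parameter are made up for illustration, and the sketch assumes the build keeps unstripped binaries in a symbols directory that mirrors the device paths (for example <symbols_dir>/system/bin/shill).

```python
import re

# Matches debuggerd backtrace lines such as:
#   "... F DEBUG   :     #07 pc 000dba2f  /system/bin/shill"
#   "... F DEBUG   :     #00 pc 0004885c  /system/lib/libc.so (tgkill+12)"
FRAME_RE = re.compile(r"#(\d+)\s+pc\s+([0-9a-fA-F]+)\s+(\S+)")


def parse_frames(log_text):
    """Return (frame_number, pc, module) tuples found in logcat text."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((int(match.group(1)), match.group(2), match.group(3)))
    return frames


def addr2line_commands(frames, symbols_dir, module="/system/bin/shill"):
    """Build one addr2line command (as an argv list) per frame in `module`."""
    commands = []
    for _, pc, frame_module in frames:
        if frame_module != module:
            continue
        commands.append([
            "arm-linux-androideabi-addr2line",
            "-C",  # demangle C++ names
            "-f",  # also print the function name
            "-e", symbols_dir + frame_module,  # assumed unstripped copy
            pc,
        ])
    return commands


if __name__ == "__main__":
    sample = (
        "12-27 06:19:36.183    92    92 F DEBUG   :     #07 pc 000dba2f  /system/bin/shill\n"
        "12-27 06:19:36.183    92    92 F DEBUG   :     #08 pc 000bfe65  /system/bin/shill\n"
    )
    # SYMBOLS_DIR is a placeholder; substitute your build's symbols directory.
    for argv in addr2line_commands(parse_frames(sample), "SYMBOLS_DIR"):
        print(" ".join(argv))
```

The commands are only printed here, so they can be checked before being run on the host that holds the build output.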

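These log messages alone do not tell us which file and which line caused the problem. They do give us enough to run addr2line, using the addresses and the corresponding modules listed in the log, and locate the offending code that way. If gdb is available, we can also attach directly to the process and print the current call stack with gdb's bt (backtrace) command. The steps are as follows: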

Set the system property debug.debuggerd.wait_for_gdb to true:

$ adb shell setprop debug.debuggerd.wait_for_gdb true

Once this property is set, the next shill crash produces the following additional output in logcat:

12-27 06:19:41.073    92    92 F DEBUG   :     #19 pc 000162dd  /system/lib/libc.so (__libc_init+52)
12-27 06:19:41.073    92    92 F DEBUG   :     #20 pc 0002168c  /system/bin/shill
12-27 06:19:41.248    92    92 F DEBUG   : 
12-27 06:19:41.248    92    92 F DEBUG   : Tombstone written to: /data/tombstones/tombstone_00
12-27 06:19:41.248    92    92 I         : ***********************************************************
12-27 06:19:41.248    92    92 I         : * Process 528 has been suspended while crashing.
12-27 06:19:41.248    92    92 I         : * To attach gdbserver and start gdb, run this on the host:
12-27 06:19:41.248    92    92 I         : *
12-27 06:19:41.248    92    92 I         : *     gdbclient 528
12-27 06:19:41.248    92    92 I         : *
12-27 06:19:41.248    92    92 I         : * Wait for gdb to start, then press the VOLUME DOWN key
12-27 06:19:41.248    92    92 I         : * to let the process continue crashing.
12-27 06:19:41.248    92    92 I         : ***********************************************************
12-27 06:19:58.621   106   106 I tlsdated: [event:action_resolve_proxy] no dynamic proxy for google.com:443

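At this point we would normally just type gdbclient 528 on the command line to attach to the process and debug it. In the brillo-m8-release branch, however, that does not work as written: gdbclient used to be a function in build/envsetup.sh, and the commit log shows that on 2015-02-12 it was turned into a standalone script: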

commit f9631fd9db447a32135f4060a0d5b51bb799f2d4
Author: Dan Albert <danalbert@google.com>
Date:   Thu Feb 12 11:14:39 2015 -0800

    Remove gdbclient from envsetup.
    
    gdbclient is being promoted to a real script:
    https://android-review.googlesource.com/#/c/131831/
    
    Change-Id: I4bb70ad44cec0ebf62d9e8e355c22ed8b708868b

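After locating the script and reading its help output, we find that the process can be attached to as shown below (it looks like debuggerd (system/core/debuggerd) will have to update its hint message again):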

$ tools/bdk/debugging/gdbclient.py -p 528
Redirecting gdbclient output to /tmp/gdbclient-8536
WARNING:root:couldn't find /local/brillo-m8-dev/development/scripts/gdb/dalvik.gdb - ART debugging options will not be available

GNU gdb (GDB) 7.10
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Failed to read a valid object file image from memory.
tgkill () at bionic/libc/arch-arm/syscalls/tgkill.S:9
9	    mov     r7, ip
(gdb) 
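
Since the PID changes on every crash, the attach step can be scripted. The following is a minimal sketch, assuming the hint text keeps the wording shown in the logcat output above and that the script lives at tools/bdk/debugging/gdbclient.py relative to the source tree. The function names are hypothetical; the code only builds the command and does not run it.

```python
import re

# The hint debuggerd prints while the crashing process is suspended.
SUSPENDED_RE = re.compile(r"Process (\d+) has been suspended while crashing")

# Script path as used in this post, relative to the source tree root.
GDBCLIENT = "tools/bdk/debugging/gdbclient.py"


def suspended_pid(log_text):
    """Return the PID from the most recent 'suspended' hint, or None."""
    pids = SUSPENDED_RE.findall(log_text)
    return int(pids[-1]) if pids else None


def attach_command(log_text):
    """Build the gdbclient.py argv for the suspended process, or None."""
    pid = suspended_pid(log_text)
    if pid is None:
        return None
    return [GDBCLIENT, "-p", str(pid)]


if __name__ == "__main__":
    sample = (
        "12-27 06:19:41.248    92    92 I         : "
        "* Process 528 has been suspended while crashing.\n"
    )
    print(attach_command(sample))
```

Taking the last match rather than the first means that, if logcat contains several crashes, the command targets the process that is suspended right now.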

Use bt in gdb to view the call stack:

(gdb) bt
#0  tgkill () at bionic/libc/arch-arm/syscalls/tgkill.S:9
#1  0x75dfb0e2 in pthread_kill (t=<optimized out>, sig=6) at bionic/libc/bionic/pthread_kill.cpp:45
#2  0x75dcffea in raise (sig=528) at bionic/libc/bionic/raise.cpp:34
#3  0x75dcc718 in __libc_android_abort () at bionic/libc/bionic/abort.cpp:47
#4  0x75dcbe90 in abort () at bionic/libc/arch-arm/bionic/abort_arm.S:43
#5  0x75fd8dd2 in base::debug::BreakDebugger () at external/libchrome/base/debug/debugger_posix.cc:240
#6  0x75fe5616 in logging::LogMessage::~LogMessage (this=0x7ea4115c) at external/libchrome/base/logging.cc:652
#7  0x54b96a32 in shill::ChromeosSupplicantProcessProxy::GetInterface (this=<optimized out>, ifname=..., rpc_identifier=<optimized out>)
    at system/connectivity/shill/dbus/chromeos_supplicant_process_proxy.cc:168
#8  0x54b7ae66 in shill::WiFi::ConnectToSupplicant (this=0x75c36c00) at system/connectivity/shill/wifi/wifi.cc:2312
#9  0x54b798ca in shill::WiFi::OnSupplicantAppear (this=0x75c36c00) at system/connectivity/shill/wifi/wifi.cc:2216
#10 0x75fd9780 in base::Callback<void ()>::Run() const (this=<optimized out>) at external/libchrome/base/callback.h:396
#11 base::debug::TaskAnnotator::RunTask (this=<optimized out>, queue_function=<optimized out>, run_function=<optimized out>, 
    pending_task=...) at external/libchrome/base/debug/task_annotator.cc:62
#12 0x75fe7a7a in base::MessageLoop::RunTask (this=0x75cb1028, pending_task=...)
    at external/libchrome/base/message_loop/message_loop.cc:458
#13 0x75fe7b20 in base::MessageLoop::DeferOrRunPendingTask (this=0x0, pending_task=...)
    at external/libchrome/base/message_loop/message_loop.cc:468
#14 0x75fe7c66 in base::MessageLoop::DoWork (this=0x75cb1028) at external/libchrome/base/message_loop/message_loop.cc:580
#15 0x75fe90ec in base::MessagePumpLibevent::Run (this=0x75c97080, delegate=0x75cb1028)
    at external/libchrome/base/message_loop/message_pump_libevent.cc:240
#16 0x75ff1ebc in base::RunLoop::Run (this=0x7ea419d8) at external/libchrome/base/run_loop.cc:55
#17 0x7626190c in brillo::BaseMessageLoop::Run (this=0x75cb10e0) at external/libbrillo/brillo/message_loops/base_message_loop.cc:161
#18 0x76266152 in brillo::Daemon::Run (this=0x75cb1000) at external/libbrillo/brillo/daemons/daemon.cc:29
#19 0x54adcc42 in main (argc=<optimized out>, argv=0x7ea41dd4) at system/connectivity/shill/shill_main.cc:252
(gdb) 

Now we can see that the problem is triggered by this line of code:

#7  0x54b96a32 in shill::ChromeosSupplicantProcessProxy::GetInterface (this=<optimized out>, ifname=..., rpc_identifier=<optimized out>)
    at system/connectivity/shill/dbus/chromeos_supplicant_process_proxy.cc:168

From here we can open the corresponding file and continue the analysis.
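
Frame #7 stands out because it is the first frame whose source file belongs to shill itself; everything above it is abort and logging machinery in bionic and libchrome. That selection can be sketched in Python as below. This is only an illustration under stated assumptions: the function names are hypothetical, the heuristic "first frame under a given source prefix" is my reading of how the frame was picked, and the parser relies on gdb wrapping long frames onto indented continuation lines as in the output above.

```python
import re

# After wrapped lines are merged, a frame with source info ends in
# " at <file>:<line>".
LOCATION_RE = re.compile(r"^#(\d+)\s.* at (\S+):(\d+)$")


def join_frames(bt_text):
    """Merge wrapped continuation lines so each frame is a single string."""
    frames = []
    for line in bt_text.splitlines():
        if line.startswith("#"):
            frames.append(line.strip())
        elif frames and line.strip() and line[0].isspace():
            # Indented line: continuation of the previous frame.
            frames[-1] += " " + line.strip()
    return frames


def first_frame_under(bt_text, source_prefix):
    """Return (frame_number, file, line) for the first frame whose source
    path starts with source_prefix, or None if there is no such frame."""
    for frame in join_frames(bt_text):
        match = LOCATION_RE.match(frame)
        if match and match.group(2).startswith(source_prefix):
            return int(match.group(1)), match.group(2), int(match.group(3))
    return None


if __name__ == "__main__":
    sample = (
        "#6  0x75fe5616 in logging::LogMessage::~LogMessage (this=0x7ea4115c)"
        " at external/libchrome/base/logging.cc:652\n"
        "#7  0x54b96a32 in shill::ChromeosSupplicantProcessProxy::GetInterface"
        " (this=<optimized out>, ifname=..., rpc_identifier=<optimized out>)\n"
        "    at system/connectivity/shill/dbus/chromeos_supplicant_process_proxy.cc:168\n"
    )
    print(first_frame_under(sample, "system/connectivity/shill"))
```

Lines that are neither frame headers nor indented continuations, such as the (gdb) prompt, are ignored, so the text of a gdb session can be pasted in unmodified.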
