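When a native process crashes and debuggerd is running on the system, logcat shows the CPU register values at the time of the crash together with a backtrace of the crashing thread (the call stack, listing each module name and the address relative to that module). From there we can use addr2line (arm-linux-androideabi-addr2line) to map each address in a module to its source line. Alternatively, we can attach gdb directly to the crashing process and print the offending thread's call stack with the bt command.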
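Recently, while working on a problem where the USB network adapter would not work, the shill process kept crashing. To pin down the cause, I built debuggerd (system/core/debuggerd) into system.img. When shill crashes, logcat then shows output like the following: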
12-27 06:19:36.098 520 520 F libc : Fatal signal 6 (SIGABRT), code -6 in tid 520 (shill)
12-27 06:19:36.150 92 92 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
12-27 06:19:36.150 92 92 F DEBUG : Build fingerprint: 'unknown'
12-27 06:19:36.150 92 92 F DEBUG : Revision: '0'
12-27 06:19:36.151 92 92 F DEBUG : ABI: 'arm'
12-27 06:19:36.151 92 92 F DEBUG : pid: 520, tid: 520, name: shill >>> /system/bin/shill <<<
12-27 06:19:36.151 92 92 F DEBUG : signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
12-27 06:19:36.162 92 92 F DEBUG : Abort message: '[FATAL:chromeos_supplicant_process_proxy.cc(168)] Failed to get interface wlan0_sta: fi.w1.wpa_supplicant1.InterfaceUnknown wpa_supplicant knows nothing about this interface.
12-27 06:19:36.162 92 92 F DEBUG : '
12-27 06:19:36.162 92 92 F DEBUG : r0 00000000 r1 00000208 r2 00000006 r3 00000000
12-27 06:19:36.162 92 92 F DEBUG : r4 76f07acc r5 00000006 r6 76f07a74 r7 0000010c
12-27 06:19:36.162 92 92 F DEBUG : r8 7eb85d0c r9 7eb85d24 sl fffffcd8 fp 00002188
12-27 06:19:36.162 92 92 F DEBUG : ip 0000000b sp 7eb85ca0 lr 75dda0e3 pc 75ddc85c cpsr 80000010
12-27 06:19:36.182 92 92 F DEBUG :
12-27 06:19:36.182 92 92 F DEBUG : backtrace:
12-27 06:19:36.182 92 92 F DEBUG : #00 pc 0004885c /system/lib/libc.so (tgkill+12)
12-27 06:19:36.182 92 92 F DEBUG : #01 pc 000460df /system/lib/libc.so (pthread_kill+34)
12-27 06:19:36.182 92 92 F DEBUG : #02 pc 0001afe7 /system/lib/libc.so (raise+10)
12-27 06:19:36.182 92 92 F DEBUG : #03 pc 00017715 /system/lib/libc.so (__libc_android_abort+34)
12-27 06:19:36.182 92 92 F DEBUG : #04 pc 00016e8c /system/lib/libc.so (abort+4)
12-27 06:19:36.183 92 92 F DEBUG : #05 pc 0006cdcf /system/lib/libchrome.so (base::debug::BreakDebugger()+2)
12-27 06:19:36.183 92 92 F DEBUG : #06 pc 00079613 /system/lib/libchrome.so (logging::LogMessage::~LogMessage()+590)
12-27 06:19:36.183 92 92 F DEBUG : #07 pc 000dba2f /system/bin/shill
12-27 06:19:36.183 92 92 F DEBUG : #08 pc 000bfe65 /system/bin/shill
12-27 06:19:36.183 92 92 F DEBUG : #09 pc 000be8c7 /system/bin/shill
12-27 06:19:36.183 92 92 F DEBUG : #10 pc 0006d77f /system/lib/libchrome.so (base::debug::TaskAnnotator::RunTask(char const*, char const*, base::PendingTask const&)+130)
12-27 06:19:36.183 92 92 F DEBUG : #11 pc 0007ba77 /system/lib/libchrome.so (base::MessageLoop::RunTask(base::PendingTask const&)+98)
12-27 06:19:36.183 92 92 F DEBUG : #12 pc 0007bb1d /system/lib/libchrome.so (base::MessageLoop::DeferOrRunPendingTask(base::PendingTask const&)+18)
12-27 06:19:36.183 92 92 F DEBUG : #13 pc 0007bc63 /system/lib/libchrome.so (base::MessageLoop::DoWork()+194)
12-27 06:19:36.183 92 92 F DEBUG : #14 pc 0007d0eb /system/lib/libchrome.so (base::MessagePumpLibevent::Run(base::MessagePump::Delegate*)+338)
12-27 06:19:36.183 92 92 F DEBUG : #15 pc 00085eb9 /system/lib/libchrome.so (base::RunLoop::Run()+58)
12-27 06:19:36.184 92 92 F DEBUG : #16 pc 00013909 /system/lib/libbrillo.so (brillo::BaseMessageLoop::Run()+18)
12-27 06:19:36.184 92 92 F DEBUG : #17 pc 0001814f /system/lib/libbrillo.so (brillo::Daemon::Run()+20)
12-27 06:19:36.184 92 92 F DEBUG : #18 pc 00021c41 /system/bin/shill
12-27 06:19:36.184 92 92 F DEBUG : #19 pc 000162dd /system/lib/libc.so (__libc_init+52)
12-27 06:19:36.184 92 92 F DEBUG : #20 pc 0002168c /system/bin/shill
12-27 06:19:36.358 92 92 F DEBUG :
12-27 06:19:36.358 92 92 F DEBUG : Tombstone written to: /data/tombstones/tombstone_00
12-27 06:19:36.653 525 525 W crash_reporter: Received crash notification for shill[520] sig 6, user 0 (ignoring - no consent)
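These log messages alone do not tell us which source file and line triggered the crash. They do give us the faulting addresses and the modules they belong to, so addr2line can map them to source locations. If gdb is available, we can instead attach to the process and print the current call stack with gdb's bt (backtrace) command. The steps are as follows: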
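As a rough illustration of the addr2line route, the sketch below pulls the frame entries out of a debuggerd backtrace and builds one addr2line command line per frame. It only prints the commands and does not run them. The helper names and the symbols directory passed in the example are assumptions for illustration; the directory is expected to hold unstripped copies of the device binaries laid out like the device filesystem.

```python
import re

# Matches debuggerd backtrace entries such as:
#   "#07 pc 000dba2f /system/bin/shill"
#   "#00 pc 0004885c /system/lib/libc.so (tgkill+12)"
FRAME_RE = re.compile(r"#(\d+)\s+pc\s+([0-9a-fA-F]+)\s+(/\S+)")


def parse_frames(log_text):
    """Return (frame_number, pc_offset, module_path) for each backtrace entry."""
    return [(int(n), pc, module) for n, pc, module in FRAME_RE.findall(log_text)]


def addr2line_command(pc, module, symbols_dir):
    """Build the addr2line argv for one frame.

    symbols_dir is an assumed location of unstripped binaries, mirroring the
    device layout (for example <symbols_dir>/system/bin/shill).
    """
    return [
        "arm-linux-androideabi-addr2line",
        "-C",  # demangle C++ names
        "-f",  # print the function name as well as file:line
        "-e", symbols_dir.rstrip("/") + module,
        "0x" + pc,
    ]


if __name__ == "__main__":
    sample = (
        "12-27 06:19:36.183 92 92 F DEBUG : #06 pc 00079613 "
        "/system/lib/libchrome.so (logging::LogMessage::~LogMessage()+590)\n"
        "12-27 06:19:36.183 92 92 F DEBUG : #07 pc 000dba2f /system/bin/shill\n"
    )
    for number, pc, module in parse_frames(sample):
        # "out/target/product/DEVICE/symbols" is a placeholder path.
        cmd = addr2line_command(pc, module, "out/target/product/DEVICE/symbols")
        print(number, " ".join(cmd))
```

The pc values in the backtrace are offsets relative to the module, which is why they can be handed to addr2line together with the matching unstripped file.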
Set the system property debug.debuggerd.wait_for_gdb to true:
$ adb shell setprop debug.debuggerd.wait_for_gdb true
With this property set, the next time shill crashes, logcat contains the following additional output:
12-27 06:19:41.073 92 92 F DEBUG : #19 pc 000162dd /system/lib/libc.so (__libc_init+52)
12-27 06:19:41.073 92 92 F DEBUG : #20 pc 0002168c /system/bin/shill
12-27 06:19:41.248 92 92 F DEBUG :
12-27 06:19:41.248 92 92 F DEBUG : Tombstone written to: /data/tombstones/tombstone_00
12-27 06:19:41.248 92 92 I : ***********************************************************
12-27 06:19:41.248 92 92 I : * Process 528 has been suspended while crashing.
12-27 06:19:41.248 92 92 I : * To attach gdbserver and start gdb, run this on the host:
12-27 06:19:41.248 92 92 I : *
12-27 06:19:41.248 92 92 I : * gdbclient 528
12-27 06:19:41.248 92 92 I : *
12-27 06:19:41.248 92 92 I : * Wait for gdb to start, then press the VOLUME DOWN key
12-27 06:19:41.248 92 92 I : * to let the process continue crashing.
12-27 06:19:41.248 92 92 I : ***********************************************************
12-27 06:19:58.621 106 106 I tlsdated: [event:action_resolve_proxy] no dynamic proxy for google.com:443
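The pid to attach to is the one named in the "has been suspended while crashing" line. When scanning a long logcat capture, a small helper can pick it out. The following is a minimal sketch; the function name is hypothetical, and the message wording is taken from the log above, so it may differ on other debuggerd versions.

```python
import re

# Wording copied from the debuggerd message shown in the log above.
SUSPENDED_RE = re.compile(r"Process (\d+) has been suspended while crashing")


def suspended_pids(log_text):
    """Return the pids that debuggerd reports as suspended, in log order."""
    return [int(pid) for pid in SUSPENDED_RE.findall(log_text)]


if __name__ == "__main__":
    line = "12-27 06:19:41.248 92 92 I : * Process 528 has been suspended while crashing."
    print(suspended_pids(line))
```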
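At this point we would normally just run gdbclient 528 on the host to attach to the process and debug it. On the brillo-m8-release branch, however, that does not work as printed: gdbclient used to be a function in build/envsetup.sh, and according to the commit log it was turned into a standalone script on 2015-02-12: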
commit f9631fd9db447a32135f4060a0d5b51bb799f2d4
Author: Dan Albert <danalbert@google.com>
Date:   Thu Feb 12 11:14:39 2015 -0800

    Remove gdbclient from envsetup.

    gdbclient is being promoted to a real script:
    https://android-review.googlesource.com/#/c/131831/

    Change-Id: I4bb70ad44cec0ebf62d9e8e355c22ed8b708868b
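After locating the script and reading its help output, we find that we can attach to the process as follows (it looks like debuggerd (system/core/debuggerd) needs to update the hint it prints):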
$ tools/bdk/debugging/gdbclient.py -p 528
Redirecting gdbclient output to /tmp/gdbclient-8536
WARNING:root:couldn't find /local/brillo-m8-dev/development/scripts/gdb/dalvik.gdb - ART debugging options will not be available
GNU gdb (GDB) 7.10
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Failed to read a valid object file image from memory.
tgkill () at bionic/libc/arch-arm/syscalls/tgkill.S:9
9 mov r7, ip
(gdb)
Use bt in gdb to view the call stack:
(gdb) bt
#0 tgkill () at bionic/libc/arch-arm/syscalls/tgkill.S:9
#1 0x75dfb0e2 in pthread_kill (t=<optimized out>, sig=6) at bionic/libc/bionic/pthread_kill.cpp:45
#2 0x75dcffea in raise (sig=528) at bionic/libc/bionic/raise.cpp:34
#3 0x75dcc718 in __libc_android_abort () at bionic/libc/bionic/abort.cpp:47
#4 0x75dcbe90 in abort () at bionic/libc/arch-arm/bionic/abort_arm.S:43
#5 0x75fd8dd2 in base::debug::BreakDebugger () at external/libchrome/base/debug/debugger_posix.cc:240
#6 0x75fe5616 in logging::LogMessage::~LogMessage (this=0x7ea4115c) at external/libchrome/base/logging.cc:652
#7 0x54b96a32 in shill::ChromeosSupplicantProcessProxy::GetInterface (this=<optimized out>, ifname=..., rpc_identifier=<optimized out>) at system/connectivity/shill/dbus/chromeos_supplicant_process_proxy.cc:168
#8 0x54b7ae66 in shill::WiFi::ConnectToSupplicant (this=0x75c36c00) at system/connectivity/shill/wifi/wifi.cc:2312
#9 0x54b798ca in shill::WiFi::OnSupplicantAppear (this=0x75c36c00) at system/connectivity/shill/wifi/wifi.cc:2216
#10 0x75fd9780 in base::Callback<void ()>::Run() const (this=<optimized out>) at external/libchrome/base/callback.h:396
#11 base::debug::TaskAnnotator::RunTask (this=<optimized out>, queue_function=<optimized out>, run_function=<optimized out>, pending_task=...) at external/libchrome/base/debug/task_annotator.cc:62
#12 0x75fe7a7a in base::MessageLoop::RunTask (this=0x75cb1028, pending_task=...) at external/libchrome/base/message_loop/message_loop.cc:458
#13 0x75fe7b20 in base::MessageLoop::DeferOrRunPendingTask (this=0x0, pending_task=...) at external/libchrome/base/message_loop/message_loop.cc:468
#14 0x75fe7c66 in base::MessageLoop::DoWork (this=0x75cb1028) at external/libchrome/base/message_loop/message_loop.cc:580
#15 0x75fe90ec in base::MessagePumpLibevent::Run (this=0x75c97080, delegate=0x75cb1028) at external/libchrome/base/message_loop/message_pump_libevent.cc:240
#16 0x75ff1ebc in base::RunLoop::Run (this=0x7ea419d8) at external/libchrome/base/run_loop.cc:55
#17 0x7626190c in brillo::BaseMessageLoop::Run (this=0x75cb10e0) at external/libbrillo/brillo/message_loops/base_message_loop.cc:161
#18 0x76266152 in brillo::Daemon::Run (this=0x75cb1000) at external/libbrillo/brillo/daemons/daemon.cc:29
#19 0x54adcc42 in main (argc=<optimized out>, argv=0x7ea41dd4) at system/connectivity/shill/shill_main.cc:252
(gdb)
Now we can see that this line of code caused the problem:
#7 0x54b96a32 in shill::ChromeosSupplicantProcessProxy::GetInterface (this=<optimized out>, ifname=..., rpc_identifier=<optimized out>) at system/connectivity/shill/dbus/chromeos_supplicant_process_proxy.cc:168
From here we can open the relevant file for further analysis.