SIGSEGV or SIGBUS while returning from native method under gdbserver #628

Closed
dantipov opened this issue Jan 25, 2018 · 12 comments

@dantipov

An application process is terminated with SIGBUS or SIGSEGV when control returns from a JNI method defined in an ARMv7 ELF module, if the application is running under prebuilt/android-arm/gdbserver/gdbserver from NDK r16b.

This behavior is observed on Nexus 6P and Pixel XL devices running Android 8.1 with the January 5 OTA security update.

The bug seems to be ARMv7-specific, as an ARMv8 build of the sample works fine when it’s debugged by the same GDB/gdbserver combo.

On target, gdbserver startup command line is:

$ run-as com.example.twolibs /data/data/com.example.twolibs/gdbserver --attach tcp:9999 [pid]

On host, debugging session is:

$ /home/dantipov/android/android-ndk-r16b/prebuilt/linux-x86_64/bin/gdb -q
(gdb) target extended-remote :9999
Remote debugging using :9999
Reading /system/bin/app_process32 from remote target...
[...a lot of library transfer...]
(gdb) info dll
From To Syms Read Shared Object Library
0xf2647fa0 0xf26bbb40 Yes () target:/system/bin/linker
...
[...a lot of libraries shown...]
0xd53ea698 0xd53eb576 Yes () target:/data/app/com.example.twolibs-vArWzXBjlMD4efey3B7_-w==/lib/arm/libtwolib-second.so
(gdb) add-symbol-file /tmp/libtwolib-second.so 0xd53ea698
add symbol table from file "/tmp/libtwolib-second.so" at
.text_addr = 0xd53ea698
(y or n) y
Reading symbols from /tmp/libtwolib-second.so...done.
(gdb) b second.c:27
Breakpoint 1 at 0xd53ea712: file /home/dantipov/android/twolibs/app/src/main/cpp/second.c, line 27.
(gdb) c
Continuing.

Thread 1 "example.twolibs" hit Breakpoint 1, Java_com_example_twolibs_TwoLibs_add (env=0xeee312a0, this=0xffc7c0ac, x=1000, y=42)
at /home/dantipov/android/twolibs/app/src/main/cpp/second.c:27
27 return first(x, y);
(gdb) finish
Run till exit from #0 Java_com_example_twolibs_TwoLibs_add (env=0xeee312a0, this=0xffc7c0ac, x=1000, y=42) at /home/dantipov/android/twolibs/app/src/main/cpp/second.c:27

Thread 1 "example.twolibs" received signal SIGSEGV, Segmentation fault.
0xd5599052 in oatexec () from target:/data/app/com.example.twolibs-vArWzXBjlMD4efey3B7_-w==/oat/arm/base.odex
(gdb) bt
#0 0xd5599052 in oatexec () from target:/data/app/com.example.twolibs-vArWzXBjlMD4efey3B7_-w==/oat/arm/base.odex
#1 0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) set arm force-mode thumb
(gdb) disassemble
Dump of assembler code for function oatexec:
0xd5599000 <+0>: movs r0, r0
0xd5599002 <+2>: movs r0, r0
0xd5599004 <+4>: movs r0, r0
0xd5599006 <+6>: movs r0, r0
0xd5599008 <+8>: lsls r0, r0, #2
0xd559900a <+10>: movs r0, r0
0xd559900c <+12>: ldr r5, [pc, #896] ; (0xd5599390)
0xd559900e <+14>: movs r0, r0
0xd5599010 <+16>: movs r0, r0
0xd5599012 <+18>: vaddl.u<illegal width 64> q8, d31, d4
0xd5599016 <+22>: movs r0, r0
0xd5599018 <+24>: stmdb sp!, {r5, r6, r7, r8, r10, r11, lr}
0xd559901c <+28>: vpush {s16-s31}
0xd5599020 <+32>: sub sp, #36 ; 0x24
0xd5599022 <+34>: str r0, [sp, #0]
0xd5599024 <+36>: str r1, [sp, #132] ; 0x84
0xd5599026 <+38>: str r2, [sp, #136] ; 0x88
0xd5599028 <+40>: str r3, [sp, #140] ; 0x8c
0xd559902a <+42>: mov.w r12, #1
0xd559902e <+46>: str.w r12, [sp, #8]
0xd5599032 <+50>: ldr.w r12, [r9, #204] ; 0xcc
0xd5599036 <+54>: str.w r12, [sp, #4]
0xd559903a <+58>: add.w r12, sp, #4
0xd559903e <+62>: str.w r12, [r9, #204] ; 0xcc
0xd5599042 <+66>: ldr.w r12, [sp, #132] ; 0x84
0xd5599046 <+70>: str.w r12, [sp, #12]
0xd559904a <+74>: str.w sp, [r9, #148] ; 0x94
0xd559904e <+78>: mov r0, r9
0xd5599050 <+80>: ldr.w r12, [r0, #460] ; 0x1cc
0xd5599054 <+84>: blx r12
0xd5599056 <+86>: str r0, [sp, #16]
0xd5599058 <+88>: ldr r3, [sp, #140] ; 0x8c
0xd559905a <+90>: ldr r2, [sp, #136] ; 0x88
0xd559905c <+92>: add r1, sp, #12
0xd559905e <+94>: ldr.w r0, [r9, #164] ; 0xa4
0xd5599062 <+98>: ldr.w r12, [sp]
0xd5599066 <+102>: ldr.w r12, [r12, #24]
0xd559906a <+106>: blx r12
0xd559906c <+108>: str r0, [sp, #20]
0xd559906e <+110>: ldr r0, [sp, #16]
0xd5599070 <+112>: mov r1, r9
0xd5599072 <+114>: ldr.w r12, [r1, #472] ; 0x1d8
0xd5599076 <+118>: blx r12
0xd5599078 <+120>: ldr r0, [sp, #20]
0xd559907a <+122>: ldr.w r12, [r9, #140] ; 0x8c
0xd559907e <+126>: cmp.w r12, #0
0xd5599082 <+130>: bne.n 0xd5599094 <oatexec+148>
0xd5599084 <+132>: add sp, #36 ; 0x24
0xd5599086 <+134>: vpop {s16-s31}
0xd559908a <+138>: ldmia.w sp!, {r5, r6, r7, r8, r10, r11, lr}
0xd559908e <+142>: ldr.w r8, [r9, #52] ; 0x34
0xd5599092 <+146>: bx lr
0xd5599094 <+148>: mov r0, r12
0xd5599096 <+150>: ldr.w r12, [r9, #704] ; 0x2c0
0xd559909a <+154>: blx r12
End of assembler dump.
getprop.txt
twolibs.zip

@rprichard
Collaborator

The instructions weren't completely clear to me. When I set a breakpoint on second.c:27, the app is already loaded, and its activity has already called add, so the breakpoint isn't hit when I continue. Am I missing something?

I managed to reproduce the segfault anyway by moving the call into an onResume override. After I set the breakpoint, I suspend and resume the app. It segfaults after running finish.

@rprichard
Collaborator

The oatexec function is Thumb code, but its symbol doesn't appear to be marked as a Thumb symbol. i.e. The lowest bit of its address in the symbol table is 0. Ordinarily, that bit in an armv7 ELF file indicates ARM/Thumb mode (0 == ARM, 1 == Thumb).

I use readelf -sW to check the symbol address. nm and objdump strip the Thumb flag from the addresses.

The set arm force-mode thumb command above forces GDB to assume every function is Thumb, even if the symbol table says it's ARM. If I use that command early, then the program doesn't segfault.
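
As a rough illustration of the low-bit convention described in this comment, here is a minimal Python sketch. The helper name `split_arm_symbol_value` is hypothetical, and 0x9019 is an invented value showing what a Thumb-marked function symbol would look like; 0x9000 matches the .text address reported for base.odex later in the thread. The convention applies to function symbols in ARM ELF files.

```python
from typing import Tuple


def split_arm_symbol_value(st_value: int) -> Tuple[int, str]:
    """Split a function symbol's st_value into (real address, mode).

    In 32-bit ARM ELF files, bit 0 of a function symbol's value
    selects the instruction set: 0 == ARM, 1 == Thumb.
    """
    mode = "thumb" if st_value & 1 else "arm"
    return st_value & ~1, mode


# Hypothetical Thumb function symbol: low bit set.
print(split_arm_symbol_value(0x9019))
# A symbol whose low bit is clear reads as ARM, as oatexec does here.
print(split_arm_symbol_value(0x9000))
```

This is also why a tool that clears the low bit before printing hides the information needed to tell the two modes apart.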

@enh
Contributor

enh commented Jan 26, 2018

seems like an ART bug?

    if (text_size != 0u) {
      Elf_Word oatexec = dynstr_.Add("oatexec");
      dynsym_.Add(oatexec, &text_, text_.GetAddress(), text_size, STB_GLOBAL, STT_OBJECT);

@dantipov
Author

I managed to call add by rotating the phone :-). And yes, enforcing Thumb before finish helps me to prevent SIGSEGV as well. Nice.

@dsrbecky

The oatexec symbol is not a function. It is an ART-internal symbol added at the first byte of the .text section (as there is no other good way to find where .text is in memory). It is marked STT_OBJECT, so I am not sure why gdb still uses it as if it were a function.

Nonetheless, I guess gdb cannot find the proper function name, so it just uses any previous symbol as the reference point.

Is there .gnu_debugdata in the ELF file? (that should generally contain the compressed proper symbols)
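
To make the STT_OBJECT point concrete, here is a small sketch of how a 32-bit ELF symbol table entry encodes binding and type in its `st_info` byte. The function name `parse_elf32_sym` and the packed sample entry are illustrative assumptions: the value, size, and section index are taken from the .text row of the base.odex section listing in this thread, while the name offset is made up.

```python
import struct

# Constants from the ELF specification.
STB_GLOBAL = 1
STT_OBJECT = 1
STT_FUNC = 2


def parse_elf32_sym(entry: bytes) -> dict:
    """Decode one 16-byte little-endian Elf32_Sym entry."""
    st_name, st_value, st_size, st_info, st_other, st_shndx = struct.unpack(
        "<IIIBBH", entry
    )
    return {
        "name_offset": st_name,
        "value": st_value,
        "size": st_size,
        "bind": st_info >> 4,   # ELF32_ST_BIND
        "type": st_info & 0xF,  # ELF32_ST_TYPE
        "shndx": st_shndx,
    }


# Hypothetical oatexec-like entry: global, typed as an object, covering .text.
entry = struct.pack("<IIIBBH", 1, 0x9000, 0x9C, (STB_GLOBAL << 4) | STT_OBJECT, 0, 2)
sym = parse_elf32_sym(entry)
print(sym["type"] == STT_OBJECT, sym["type"] == STT_FUNC)
```

A consumer that only looks at `value` and `size` and never checks `type` would treat such an entry like any other symbol covering that address range.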

@dantipov
Author

dantipov commented Jan 26, 2018

base.odex looks stripped:

  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .rodata           PROGBITS        00001000 001000 008000 00   A  0   0 4096
  [ 2] .text             PROGBITS        00009000 009000 00009c 00  AX  0   0 4096
  [ 3] .bss              NOBITS          0000a000 000000 00a000 00   A  0   0 4096
  [ 4] .dynstr           STRTAB          00014000 00a000 00003d 00   A  0   0 4096
  [ 5] .dynsym           DYNSYM          00014040 00a040 000060 10   A  4   0  4
  [ 6] .hash             HASH            000140a0 00a0a0 000024 04   A  5   0  4
  [ 7] .dynamic          DYNAMIC         00015000 00b000 000038 08   A  4   0 4096
  [ 8] .shstrtab         STRTAB          00000000 00c000 00003d 00      0   0 4096

libtwolib-second.so isn't stripped, but there is no .gnu_debugdata section:

  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .note.android.ide NOTE            00000134 000134 000098 00   A  0   0  4
  [ 2] .note.gnu.build-i NOTE            000001cc 0001cc 000024 00   A  0   0  4
  [ 3] .dynsym           DYNSYM          000001f0 0001f0 000130 10   A  4   1  4
  [ 4] .dynstr           STRTAB          00000320 000320 00012e 00   A  0   0  1
  [ 5] .hash             HASH            00000450 000450 000098 04   A  3   0  4
  [ 6] .gnu.version      VERSYM          000004e8 0004e8 000026 02   A  3   0  2
  [ 7] .gnu.version_d    VERDEF          00000510 000510 00001c 00   A  4   1  4
  [ 8] .gnu.version_r    VERNEED         0000052c 00052c 000020 00   A  4   1  4
  [ 9] .rel.dyn          REL             0000054c 00054c 000048 08   A  3   0  4
  [10] .rel.plt          REL             00000594 000594 000060 08   A  3   0  4
  [11] .plt              PROGBITS        000005f4 0005f4 0000a4 00  AX  0   0  4
  [12] .text             PROGBITS        00000698 000698 000ede 00  AX  0   0  4
  [13] .ARM.extab        PROGBITS        00001578 001578 00003c 00   A  0   0  4
  [14] .ARM.exidx        ARM_EXIDX       000015b4 0015b4 000128 08  AL 12   0  4
  [15] .rodata           PROGBITS        000016dc 0016dc 000029 01 AMS  0   0  1
  [16] .fini_array       FINI_ARRAY      00002e80 001e80 000008 04  WA  0   0  4
  [17] .init_array       INIT_ARRAY      00002e88 001e88 000004 04  WA  0   0  1
  [18] .dynamic          DYNAMIC         00002e8c 001e8c 000118 08  WA  4   0  4
  [19] .got              PROGBITS        00002fa4 001fa4 00005c 00  WA  0   0  4
  [20] .data             PROGBITS        00003000 002000 0919e4 00  WA  0   0  4
  [21] .bss              NOBITS          000949e4 0939e4 000001 00  WA  0   0  1
  [22] .comment          PROGBITS        00000000 0939e4 000065 01  MS  0   0  1
  [23] .debug_str        PROGBITS        00000000 093a49 002364 01  MS  0   0  1
  [24] .debug_loc        PROGBITS        00000000 095dad 0019e1 00      0   0  1
  [25] .debug_abbrev     PROGBITS        00000000 09778e 00088c 00      0   0  1
  [26] .debug_info       PROGBITS        00000000 09801a 004e9f 00      0   0  1
  [27] .debug_ranges     PROGBITS        00000000 09ceb9 0000b8 00      0   0  1
  [28] .debug_macinfo    PROGBITS        00000000 09cf71 000003 00      0   0  1
  [29] .debug_pubnames   PROGBITS        00000000 09cf74 000099 00      0   0  1
  [30] .debug_pubtypes   PROGBITS        00000000 09d00d 000362 00      0   0  1
  [31] .debug_line       PROGBITS        00000000 09d36f 000a36 00      0   0  1
  [32] .debug_frame      PROGBITS        00000000 09dda8 000434 00      0   0  4
  [33] .debug_aranges    PROGBITS        00000000 09e1e0 000060 00      0   0  8
  [34] .note.gnu.gold-ve NOTE            00000000 09e240 00001c 00      0   0  4
  [35] .ARM.attributes   ARM_ATTRIBUTES  00000000 09e25c 00003b 00      0   0  1
  [36] .symtab           SYMTAB          00000000 09e298 0008e0 10     37 124  4
  [37] .strtab           STRTAB          00000000 09eb78 0007c4 00      0   0  1
  [38] .shstrtab         STRTAB          00000000 09f33c 0001af 00      0   0  1

@rprichard
Collaborator

The oatexec symbol is not a function. It is an ART-internal symbol added at the first byte of the .text section (as there is no other good way to find where .text is in memory). It is marked STT_OBJECT, so I am not sure why gdb still uses it as if it were a function.

When gdb's arm_pc_is_thumb function finds the symbol containing an address, it checks the MSYMBOL_IS_SPECIAL flag but not the minimal_symbol_type, which can be mst_text, mst_text_gnu_ifunc, mst_file_text, mst_data, mst_bss, ... That's probably the explanation.

Maybe gdb should ignore data symbols. If I made that change, I think gdb would fall back to the detected ARM/Thumb mode of the "current frame". That'll work as long as every function is Thumb.
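
The decision described above can be modeled in a few lines. This is a simplified sketch, not gdb's actual code: the `MinSym` class, the `kind` strings, and the `ignore_data` switch are all hypothetical stand-ins, and the special flag is modeled as "this symbol is marked Thumb". The addresses come from the disassembly at the top of the thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MinSym:
    name: str
    start: int
    size: int
    kind: str         # "text" or "data"; stands in for minimal_symbol_type
    is_special: bool  # stands in for MSYMBOL_IS_SPECIAL (modeled as "Thumb")


def containing(symbols: List[MinSym], pc: int) -> Optional[MinSym]:
    """Return the first symbol whose [start, start + size) range holds pc."""
    for s in symbols:
        if s.start <= pc < s.start + s.size:
            return s
    return None


def pc_is_thumb(symbols: List[MinSym], pc: int,
                frame_is_thumb: bool, ignore_data: bool) -> bool:
    """Pick ARM vs Thumb for pc; fall back to the current frame's mode."""
    sym = containing(symbols, pc)
    if sym is not None and not (ignore_data and sym.kind == "data"):
        return sym.is_special
    return frame_is_thumb


# oatexec modeled as a data symbol with no Thumb marking.
syms = [MinSym("oatexec", 0xD5599000, 0x9C, "data", False)]
pc = 0xD559906C  # address following the blx at oatexec+106

print(pc_is_thumb(syms, pc, frame_is_thumb=True, ignore_data=False))
print(pc_is_thumb(syms, pc, frame_is_thumb=True, ignore_data=True))
```

With `ignore_data=False` the model trusts the unmarked symbol and reports ARM; with `ignore_data=True` it falls through to the frame's mode, which is only right while the surrounding code really is Thumb.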

If a file has ARM mapping symbols ($a.nn / $t.nn), then gdb prefers to use those over the symbol table. I see them in the platform binaries, but only in .symtab, not .dynsym. base.odex only has .dynsym.
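
For readers unfamiliar with mapping symbols, the lookup they enable can be sketched as follows. The ARM ELF specification also defines `$d` for data, which is included here; the function name `mode_at` and the sample addresses are hypothetical. Each mapping symbol marks the start of a run that lasts until the next mapping symbol.

```python
import bisect
from typing import List, Optional, Tuple

KINDS = {"a": "arm", "t": "thumb", "d": "data"}


def mode_at(mapping: List[Tuple[int, str]], addr: int) -> Optional[str]:
    """Return the mode in effect at addr.

    mapping is a list of (address, kind) pairs sorted by address, where
    kind is "a", "t" or "d" (from $a, $t, $d mapping symbols).
    """
    addrs = [a for a, _ in mapping]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # addr precedes the first mapping symbol
    return KINDS[mapping[i][1]]


# Hypothetical layout: ARM code, then Thumb code, then a literal pool.
mapping = [(0x1000, "a"), (0x1040, "t"), (0x10A0, "d")]
print(mode_at(mapping, 0x1050))
```

When a file carries no mapping symbols at all, as reported for base.odex, every lookup of this kind comes back empty and the debugger has to rely on something else.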

I don't see a .gnu_debugdata section in the app's odex file, but I do see one in /system/framework/arm/boot-framework.oat:

$ adb pull /system/framework/arm/boot-framework.oat && readelf -SW boot-framework.oat 
/system/framework/arm/boot-framework.oat: 1 file pulled. 20.0 MB/s (21229948 bytes in 1.013s)
There are 11 section headers, starting at offset 0x143efc4:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .note.gnu.build-id NOTE            00000234 000234 000024 00   A  0   0  4
  [ 2] .rodata           PROGBITS        00001000 001000 5a0000 00   A  0   0 4096
  [ 3] .text             PROGBITS        005a1000 5a1000 df086c 00  AX  0   0 4096
  [ 4] .bss              NOBITS          01392000 000000 0063e4 00   A  0   0 4096
  [ 5] .dynstr           STRTAB          01399000 1392000 000060 00   A  0   0 4096
  [ 6] .dynsym           DYNSYM          01399060 1392060 000080 10   A  5   0  4
  [ 7] .hash             HASH            013990e0 13920e0 00002c 04   A  6   0  4
  [ 8] .dynamic          DYNAMIC         0139a000 1393000 000038 08   A  5   0 4096
  [ 9] .gnu_debugdata    PROGBITS        00000000 1394000 0aaf64 00      0   0 4096
  [10] .shstrtab         STRTAB          00000000 143ef64 00005f 00      0   0  1
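
Checking several files for a .gnu_debugdata section by eye gets tedious, so here is a minimal sketch that scans `readelf -SW` text output for section names. The function name `section_names` and the regular expression are assumptions about the table layout shown above, not a documented interface; the sample rows are copied from the boot-framework.oat listing.

```python
import re
from typing import List

# Matches rows such as "  [ 9] .gnu_debugdata    PROGBITS ...".
SECTION_ROW = re.compile(r"^\s*\[\s*(\d+)\]\s+(\S+)")


def section_names(readelf_output: str) -> List[str]:
    """Collect section names from `readelf -SW` output."""
    names = []
    for line in readelf_output.splitlines():
        m = SECTION_ROW.match(line)
        # Skip index 0: its name is empty, so the regex would grab "NULL".
        if m and int(m.group(1)) != 0:
            names.append(m.group(2))
    return names


SAMPLE = """\
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 8] .dynamic          DYNAMIC         0139a000 1393000 000038 08   A  5   0 4096
  [ 9] .gnu_debugdata    PROGBITS        00000000 1394000 0aaf64 00      0   0 4096
  [10] .shstrtab         STRTAB          00000000 143ef64 00005f 00      0   0  1
"""

print(".gnu_debugdata" in section_names(SAMPLE))
```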

I see that LLDB knows about oat files. It looks for the .oat and .odex extensions as well as the oatexec and oatdata symbols. https://github.com/llvm-mirror/lldb/blob/d0c2df1fedf5dad964c54a954a309472795e886f/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp#L2122

@DanAlbert DanAlbert added this to the r18 milestone Mar 15, 2018
@rprichard
Collaborator

I had come up with a hack that worked around this issue: https://android-review.googlesource.com/c/toolchain/gdb/+/602241. I'm not sure it's a good idea.

IIRC, gdb seemed to think oatexec was one giant function. Maybe that could cause trouble in a real app, where oatexec is much larger than a typical function.

@DanAlbert
Member

Sounds like the non-hack fix would be a more invasive gdb change. Given that this already works in lldb, I think our answer for a Java interoperable debugger should just be to use lldb.

Related to that, @dantipov (and others): would including lldb directly in the NDK and adding ndk-gdb support for it help you? aiui it isn't actually that much work for us to get lldb into the NDK proper now, we're just unsure of the value since we don't have a good idea of how many people use the NDK in absence of the rest of the SDK.

@dsrbecky
Copy link

PS: Is this reproducible in recent aosp builds? (I made the symbol size 0 - aka unknown).

@rprichard
Collaborator

(I made the symbol size 0 - aka unknown).

Thanks for letting me know.

When I last looked at this, in this situation, I think the only information gdb had about addresses in the oatexec object was the oatexec symbol itself. With the symbol having 0 size, I'd expect gdb to stop assuming OAT PCs were ARM-mode, but I'm guessing it would use a heuristic to decide ARM-vs-Thumb, and that could still fail.

Maybe there's some other debug/symbol information gdb would or could find, though. I'd have to retest.

@rprichard
Collaborator

I retested with a recent P build, and the sizes of the oatexec and oatdex symbols were 0. The oatlastword and oatdexlastword symbols still had a size of 4, so I wonder if there's a potential for trouble near the end of the executable area. The app's base.odex file also had a .gnu_debugdata section, and using gdb, I was able to disassemble a com.example.rprichard.test30ndkbuild.MainActivity.stringFromJNI function and step through it one instruction at a time.
