-
Notifications
You must be signed in to change notification settings - Fork 30.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Segmentation fault during tests of Node 10.15.3 on PowerPC musl #27068
Comments
very interesting! when you say every other iteration it miscalculates If there are JS involvement, taking compilation traces with |
Since this seems to affect only V8 itself: It would be interesting to see if you could reproduce this with newer versions of Node or V8, maybe try to find a pure-JS/V8-only reproduction (using |
Did you try the current node master branch to see if the situation is any different? |
@gireeshpunathil I'm not experienced enough with V8 internals to tell if it is going back to "JavaScript land" or if it is simply JITing the regex itself (ala PCRE-JIT). It is definitely leaving C++ for generated code, though, as some other watchpoints hit ::Execute and some were in generated code. @addaleax
Even if it did, I don't have a Google account, so I don't think I'd be able to file it. @mscdex I'm going to try master later this morning to see if anything is different. I just wanted to report the bug while all of this was still fresh in my mind, before I started working on other things and forgot all the details. (Our goal is obviously to package a release, not master - hence my tests were done with 10.15.3.) |
Yep, master sure is different. Lots more crashes :(
|
@awilfox thanks for reporting once we see if it is still reproduce-able on master or not we'll make sure to get the team that works on V8 for PPC engaged as well. |
@mhdawson As noted, it is reproducable on master. (See my last comment, directly above yours.) I will note that if you would like to reproduce it yourself, you will need to apply a single change, because musl uses the ELFv2 ABI: diff --git a/deps/v8/src/ppc/constants-ppc.h b/deps/v8/src/ppc/constants-ppc.h
index 016bc71d26..930e755054 100644
--- a/deps/v8/src/ppc/constants-ppc.h
+++ b/deps/v8/src/ppc/constants-ppc.h
@@ -21,7 +21,7 @@
#endif
#if V8_HOST_ARCH_PPC && \
- (V8_OS_AIX || (V8_TARGET_ARCH_PPC64 && V8_TARGET_BIG_ENDIAN))
+ (V8_OS_AIX || (V8_TARGET_ARCH_PPC64 && _CALL_ELF == 1))
#define ABI_USES_FUNCTION_DESCRIPTORS 1
#else
#define ABI_USES_FUNCTION_DESCRIPTORS 0
@@ -33,13 +33,13 @@
#define ABI_PASSES_HANDLES_IN_REGS 0
#endif
-#if !V8_HOST_ARCH_PPC || !V8_TARGET_ARCH_PPC64 || V8_TARGET_LITTLE_ENDIAN
+#if !V8_HOST_ARCH_PPC || !V8_TARGET_ARCH_PPC64 || _CALL_ELF == 2
#define ABI_RETURNS_OBJECT_PAIRS_IN_REGS 1
#else
#define ABI_RETURNS_OBJECT_PAIRS_IN_REGS 0
#endif
-#if !V8_HOST_ARCH_PPC || (V8_TARGET_ARCH_PPC64 && V8_TARGET_LITTLE_ENDIAN)
+#if !V8_HOST_ARCH_PPC || (V8_TARGET_ARCH_PPC64 && _CALL_ELF == 2)
#define ABI_CALL_VIA_IP 1
#else
#define ABI_CALL_VIA_IP 0 |
Hello @awilfox, I am on V8 on pLinux. Thanks for your very detail report. Here is my question:
note: Please check ABI related constants on src/regexp/ppc/regexp-macro-assembler-ppc.h // Offsets from frame_pointer() of function parameters and stored registers.
static const int kFramePointer = 0;
// Above the frame pointer - Stored registers and stack passed parameters.
// Register 25..31.
static const int kStoredRegisters = kFramePointer;
// Return address (stored from link register, read into pc on return).
static const int kReturnAddress = kStoredRegisters + 7 * kPointerSize;
static const int kCallerFrame = kReturnAddress + kPointerSize;
// Stack parameters placed by caller.
static const int kIsolate =
kCallerFrame + kStackFrameExtraParamSlot * kPointerSize;
// Below the frame pointer.
// Register parameters stored by setup code.
static const int kDirectCall = kFramePointer - kPointerSize;
static const int kStackHighEnd = kDirectCall - kPointerSize;
static const int kNumOutputRegisters = kStackHighEnd - kPointerSize;
static const int kRegisterOutput = kNumOutputRegisters - kPointerSize;
static const int kInputEnd = kRegisterOutput - kPointerSize;
static const int kInputStart = kInputEnd - kPointerSize;
static const int kStartIndex = kInputStart - kPointerSize;
static const int kInputString = kStartIndex - kPointerSize;
// When adding local variables remember to push space for them in
// the frame in GetCode.
static const int kSuccessfulCaptures = kInputString - kPointerSize;
static const int kStringStartMinusOne = kSuccessfulCaptures - kPointerSize;
// First register address. Following registers are below it on the stack.
static const int kRegisterZero = kStringStartMinusOne - kPointerSize;
|
Further, I recommend to build v8 and run all v8 tests to make sure it works on your distribution. The steps to build/test v8 master could be found here: https://github.com/john-yan/Docker-V8-Build. But you might want to test v8 7.3, which is the stable version. Just need to run cd v8 && git checkout branch-heads/7.3 && gclient sync Right after |
It looks like Correcting it to use I'll have one of our team members that has signed Google's CLA write and submit better patches upstream. Thank you again very much for the pointer! |
@awilfox As we already drop Linux PPC BE support on V8, some of the features still might not work. |
FreeBSD PowerPC64 is also working on moving from AIX (ELFv1) to ELFv2 BE. _CALL_ELF is the correct way to detect the ABI in use and is not directly related to the endianness. |
@bdragon28 I agree with you. I am just saying endianness might be a concern down the road in terms of v8 support. |
I've been doing some really deep digging on this issue to attempt to make it as easy as possible for an expert to fix, but sadly, I am not that expert; I've never done anything with V8 before.
I'm the project lead of Adélie Linux, and we're hoping to bring Node.js to our distribution. We are using the musl libc instead of glibc (like Void/musl and others). One of our primary CPU architectures is the 64-bit PowerPC. Eight tests fail on this platform:
I believe the
EMFILE
ones are possibly related to customisations in the musl library that we have done and this issue does not cover them. This issue is abouttest-querywrap
.This crash happens every time. A core dump, from Release or Debug
node
binaries, can be provided if helpful. The crash happens with the following backtrace:Breaking at the CheckStackGuardState function:
This shows that the
isolate_
pointer in the frame is being corrupted. Next I tried using the Debug build. I had to merge a single line patch from Node master (which removed the ra != r0 assertion from D_TYPE instructions in the PPC assembler), and also correct the V8 memory allocator to align properly on PPC (thehint
atheap/spaces.cc:130
had to be changed fromGetRandomMmapAddr()
to(GetRandomMmapAddr() & ~(alignment-1))
for debugging purposes only). Using this new Debug build and hardware watchpoints, I was able to find:It appears that the frame pointer is being improperly adjusted at some point. The
Isolate
pointer is being written atre_frame+16
, when it belongs atre_frame+176
.In a Release build, the object being created at
re_frame+176
is aRegExpStackScope
; its first member iscapacity
, which is set to 0; trying to use this object as anIsolate
is causing the NULL pointer dereference. The Debug build crashes with a different backtrace, because the object is aRegExpStack
, which has valid pointers (but not correct pointers):If I manually set the Isolate pointer at re_frame+176 to the correct value, and continue, then the third time it checks the StackGuard state, the Isolate point is correct when it enters the method. The fourth time, it is incorrect again. This repeats every even numbered time the StackGuard state is checked. If I manually correct the value on each even-numered attempt, the test passes correctly.
I am unsure on how to proceed to actually fix this bug. The V8 regexp code is, frankly, a twisty maze that is very hard for me to follow; it is spread out amongst multiple files and even some headers. I am hopeful that the Node community may know of a way to find the root issue so that we may fix it.
The text was updated successfully, but these errors were encountered: