Bug 120351
Summary: | Wrong Debugger Code - UBSAN splat in drivers/acpi/acpica/dsutils.c:641:16 | ||
---|---|---|---|
Product: | ACPI | Reporter: | Wilfried Klaebe (linux-kernel) |
Component: | ACPICA-Core | Assignee: | Lv Zheng (lv.zheng) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | jirislaby, kilobyte, lv.zheng, navinp1912, Robert.Moore, rui.zhang, rwarsow |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.7.0-rc3 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
.config
acpidump -b generated .dat files tentative patch [PATCH] ACPICA: Dispatcher: Add check to avoid accessing non-existing operands [PATCH] ACPICA: Dispatcher: Remove possible useless debugger message serial console log |
Description
Wilfried Klaebe
2016-06-15 12:24:32 UTC
Created attachment 220091 [details]
.config
Please provide the acpidump of the machine. BTW, is this a regression? If so, could you give it a "git bisect"? Thanks and best regards -Lv I don't know whether this is a regression, I only activated the UB checks in 4.7-rc3. Attaching a tar file of "acpidump -b" generated .dat files. Created attachment 220411 [details]
acpidump -b generated .dat files
Hopefully, we can reproduce it using the acpidump output locally. Thanks -Lv Please upload the dmidecode of the machine. My code may not be the same as yours, please also upload /usr/local/src/kernel/linux-git/drivers/acpi/acpica/dsutils.c here. This change makes it go away for me. walk_state->num_operands is 0 in my case when i booting the host and the array index is hence negative since we do num_operands -1.So i check whether num_operands > 0 before accessing that. I also tried to do return AE_BAD_ADDRESS when walk_state->num_operands==0 but that made my mouse and other usb devices to not work and dmesg full of exceptions. The below change silents doesn't access and it gets rid of it. My kernel version is 4.7.0-rc7+ diff --git a/drivers/acpi/acpica/dsutils.c b/drivers/acpi/acpica/dsutils.c index f393de9..71bae80 100644 --- a/drivers/acpi/acpica/dsutils.c +++ b/drivers/acpi/acpica/dsutils.c @@ -637,11 +637,13 @@ acpi_ds_create_operand(struct acpi_walk_state *walk_state, ACPI_DEBUG_PRINT((ACPI_DB_DISPATCH, "Argument previously created, already stacked\n")); - acpi_db_display_argument_object(walk_state-> + if(walk_state->num_operands>0) { + acpi_db_display_argument_object(walk_state-> operands[walk_state-> num_operands - 1], walk_state); + } /* * Use value that was already previously returned This reliably pops up on my home box, too. And Navin's patch makes it go away (tested on 4.7.0). At a brief look at the code around, it appears to be a sane fix (although I don't know these parts). As for bisecting, the warning happens on the earliest kernel that had the UBSAN machinery; backporting it to test earlier kernels would be a lot of work so I stopped at that. (In reply to Adam Borowski from comment #9) > This reliably pops up on my home box, too. And Navin's patch makes it go > away (tested on 4.7.0). At a brief look at the code around, it appears to > be a sane fix (although I don't know these parts). > > As for bisecting, the warning happens on the earliest kernel that had the > UBSAN machinery; backporting it to test earlier kernels would be a lot of > work so I stopped at that. Yes, no bisect is needed. This looks like a bug recently introduced by me when I was trying to enable debugger code in Linux. I'll check if such kind of bugs can be seen in other debugger enabling code. Thanks for the report. Looks like a bug related to this commit: https://github.com/acpica/acpica/commit/689fefd We need to check if this fix is proper. Thanks Any updates? I have just hit it on 4.8-rc7 too. Created attachment 239421 [details] tentative patch The patch works for me, and looks sane: if there's nothing on stack, there's nothing to display. And trying to display that element (currently only when single stepping) would use an out-of-bounds piece of memory; some types of values (string, buffer) would dereference an invalid pointer. If that's ok with you, all that's needed is patch metadata. The patch's author is navinp1912, the only use of that name in the kernel is "Navin P.S" <navinp1912@gmail.com> -- how should this be written? According to the comment 11, ACPI_PARSEOP_IN_STACK is designed that it ensures there is always an AML operand object in the stack. Why the triggered bug shows there is no object in the stack. So we believe this is not the right fix. And this issue still needs to be root caused. *** Bug 194845 has been marked as a duplicate of this bug. *** This may fix the issue: https://github.com/acpica/acpica/pull/245 Upon applying the obvious adaptation of acpica pull 245 to the kernel, the UBSAN warning is still there. With the if statement guarding it, this implies that ACPI_PARSEOP_IN_STACK is there despite the operand stack being empty. Try this one. A static code analysis may flag it, but the code is using the NumOperands as an index into the operands array. /* Debugger single-step support - display operand */ if (WalkState->NumOperands) { AcpiDbDisplayArgumentObject ( WalkState->Operands [WalkState->NumOperands - 1], WalkState); } Which is essentially the same patch as posted in #18 Sorry, #13 Re-checked: both #18 and #13 do the trick atop 4.11-rc4. ok, thanks. We'll commit it. While we are at it, can you tell us the steps we need to add UBSAN to our test builds? It sounds like this could help us a lot. thanks, bob CONFIG_UBSAN, it's in: Kernel hacking/Undefined behaviour sanity checker; make sure UBSAN_SANITIZE_ALL is not disabled. It needs a not too old version of gcc, 4.9 is good enough, maybe even 4.8. For non-kernel code: gcc -fsanitize=undefined; it's a nice checker you might want to try on all your projects. Marking this as solved, will close upon being merged by upstream. To Bob The problem is: 1. The purpose the message is not clear to me. I managed to get the 2 lines running on debugger window. And it actually always dumps an operand that has been dumped previously by another create_operand call. 2. So I guess purpose of the message may be related to PARSEOP_IN_STACK. And I confirmed, when NumOperands == 0, PARSEOP_IN_STACK is always unset but AML_HAS_RETVAL is always set. So if (NumOperands) fix is equivelent to if (PARSEOP_IN_STACK). That's why I provide the 2 kinds of fix to you. However if I don't modify code, there is no chance the operand will be dumped on the debugger window, so this looks more like a dead untested code. And thus I prefer the fix provided in this pull request: https://github.com/acpica/acpica/pull/245 Created attachment 255657 [details]
[PATCH] ACPICA: Dispatcher: Add check to avoid accessing non-existing operands
Created attachment 255659 [details]
[PATCH] ACPICA: Dispatcher: Remove possible useless debugger message
Adam: Can attachment 255657 [details] or attachment 255659 [details] fix this issue? Thanks Lv So IMO, the purpose of this debugger message is either related to PARSEOP_IN_STACK or nothing. We should either make it related to PARSEOP_IN_STACK or remove it. Just adding a coverity sanity check will make the purpose of the debugger message more implicit than it's used to be. As mentioned in #17, 255657 doesn't fix the warning. This seems wrong, but the only explanation I see is that ACPI_PARSEOP_IN_STACK is on despite the stack being empty. 255659 obviously does the trick. A call that's gone can't misbehave :p (In reply to Adam Borowski from comment #17) > Upon applying the obvious adaptation of acpica pull 245 to the kernel, the > UBSAN warning is still there. > > With the if statement guarding it, this implies that ACPI_PARSEOP_IN_STACK > is there despite the operand stack being empty. On acpiexec, after applying it, the warning is unreproducible. I think we are using same DSDT. I was using 220411. (In reply to Adam Borowski from comment #30) > As mentioned in #17, 255657 doesn't fix the warning. This seems wrong, but > the only explanation I see is that ACPI_PARSEOP_IN_STACK is on despite the > stack being empty. If this is true, the purpose of this debugger message is more confusing... We should delete it. Yes, I believe that the call to AcpiDbDisplayArgumentObject is in fact wrong. Probably "obsolete" would be a better term. The debug message is just that, a debug trace message. I don't see why anyone cares about this. Created attachment 255669 [details]
serial console log
In case it's of some use: here's a log with the following debug patch added:
diff --git a/drivers/acpi/acpica/dsutils.c b/drivers/acpi/acpica/dsutils.c
index 049fbab4e5a6..892b1f8e502b 100644
--- a/drivers/acpi/acpica/dsutils.c
+++ b/drivers/acpi/acpica/dsutils.c
@@ -633,6 +633,8 @@ acpi_ds_create_operand(struct acpi_walk_state *walk_state,
if ((op_info->flags & AML_HAS_RETVAL) ||
(arg->common.flags & ACPI_PARSEOP_IN_STACK)) {
+ printk("already stacked, op flags = %x, arg flags = %x",
+ op_info->flags, arg->common.flags);
ACPI_DEBUG_PRINT((ACPI_DB_DISPATCH,
"Argument previously created, already stacked\n"));
Everything works fine here by simply removing the DisplayArgument call. In the -1 case, DisplayArgument outputs a "null" object which is wrong. Hi, Adam: > As mentioned in #17, 255657 doesn't fix the warning. > This seems wrong, but the only explanation I see is that > ACPI_PARSEOP_IN_STACK is on despite the stack being empty. See op flag HAS_RET_VAL is 0x400, op flag ACPI_PARSEOP_IN_STACK is 0x10 After applying attachment 255657 [details], the logic is: if ((op_info->flags & AML_HAS_RETVAL) || (arg->common.flags & ACPI_PARSEOP_IN_STACK)) { if (arg->common.flags & ACPI_PARSEOP_IN_STACK)) { ACPI_DEBUG_PRINT((ACPI_DB_DISPATCH, ... According to your log: [ 1.709477] already stacked, op flags = 1400, arg flags = 1 So first "if" matches but second "if" doesn't. The buggy statement won't be reached. Why can you still trigger the issue? Did you apply the right patch? Thanks Lv It was unmodified mainline with just that printk patch. Then attachment 255657 [details] actually should work... Since ACPI_PARSEOP_IN_STACK always ensures that an operand is stacked. 1. The if (num > 0) fix makes login in an even hidden style. We shouldn't chose this fix. 2. The debugging log should be applied to if (ACPI_PARSEOP_IN_STACK). While it is under a wrong condition of if (AML_HAS_RETVAL). we can further fix it by adding additional if (ACPI_PARSEOP_IN_STACK). That's why attachment 255657 [details]. Anyway, I managed to enable the debugging message. It seems to be a wrong result of ACPI_PARSEOP_IN_STACK support commit, which adds redundant logs and messes up the debugging output. So finally we chose the removal one of attachment 255659 [details]. :) Thanks for the help. I'm sorry to having your name missing in the ACPICA release patch. I just copied names from 1st comment of buglinks to the final release patch. Now I've asked Rafael to help to add it when he merges the patch internally. Hope it can work. Best regards Lv |