Bug 120351

Summary: Wrong Debugger Code - UBSAN splat in drivers/acpi/acpica/dsutils.c:641:16
Product: ACPI Reporter: Wilfried Klaebe (linux-kernel)
Component: ACPICA-CoreAssignee: Lv Zheng (lv.zheng)
Status: CLOSED CODE_FIX    
Severity: normal CC: jirislaby, kilobyte, lv.zheng, navinp1912, Robert.Moore, rui.zhang, rwarsow
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.7.0-rc3 Subsystem:
Regression: No Bisected commit-id:
Attachments: .config
acpidump -b generated .dat files
tentative patch
[PATCH] ACPICA: Dispatcher: Add check to avoid accessing non-existing operands
[PATCH] ACPICA: Dispatcher: Remove possible useless debugger message
serial console log

Description Wilfried Klaebe 2016-06-15 12:24:32 UTC
While booting, UBSAN reports an index out of range use in drivers/acpi/acpica/dsutils.c:641:16:

[    0.568951] ================================================================================
[    0.568984] UBSAN: Undefined behaviour in /usr/local/src/kernel/linux-git/drivers/acpi/acpica/dsutils.c:641:16
[    0.569015] index -1 is out of range for type 'acpi_operand_object *[9]'
[    0.569045] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc3-00002-g44c5afa #1
[    0.569077] Hardware name: Apple Inc. MacBookPro8,2/Mac-94245A3940C91C80, BIOS    MBP81.88Z.0047.B27.1201241646 01/24/12
[    0.569109]  0000000000000036 ffff88026610baf8 ffffffff815c8665 ffffffff811b1994
[    0.569141]  ffff88026610bb20 ffffffffffffffff ffff88026610bb10 ffffffff8163e77d
[    0.569172]  ffffffff8254e740 ffff88026610bb68 ffffffff8163ee95 0000000000000296
[    0.569203] Call Trace:
[    0.569233]  [<ffffffff815c8665>] dump_stack+0x68/0xa3
[    0.569262]  [<ffffffff811b1994>] ? console_unlock+0x284/0x6a0
[    0.569293]  [<ffffffff8163e77d>] ubsan_epilogue+0xd/0x40
[    0.569322]  [<ffffffff8163ee95>] __ubsan_handle_out_of_bounds+0x75/0xa0
[    0.569353]  [<ffffffff816c4180>] acpi_ds_create_operand+0x244/0x2e4
[    0.569382]  [<ffffffff816c4348>] acpi_ds_create_operands+0x128/0x179
[    0.569410]  [<ffffffff816c49f3>] acpi_ds_exec_end_op+0x109/0x56a
[    0.569440]  [<ffffffff816dd65d>] acpi_ps_parse_loop+0x722/0x774
[    0.569469]  [<ffffffff816de6b9>] acpi_ps_parse_aml+0xa7/0x2d2
[    0.569499]  [<ffffffff816df277>] acpi_ps_execute_method+0x1e3/0x21d
[    0.569527]  [<ffffffff816d73c0>] acpi_ns_evaluate+0x228/0x2b8
[    0.569556]  [<ffffffff816d7aa1>] acpi_ns_initialize_devices+0x150/0x1e4
[    0.569587]  [<ffffffff8292f05c>] ? acpi_sleep_proc_init+0x28/0x28
[    0.569615]  [<ffffffff82930a72>] acpi_initialize_objects+0x33/0x46
[    0.569645]  [<ffffffff8292f10b>] acpi_init+0xaf/0x2d5
[    0.569673]  [<ffffffff81775517>] ? __class_create+0x47/0x80
[    0.569702]  [<ffffffff8292c6cd>] ? video_setup+0x123/0x123
[    0.569731]  [<ffffffff8100044c>] do_one_initcall+0x5c/0x1e0
[    0.569761]  [<ffffffff828f270b>] kernel_init_freeable+0x33b/0x3d1
[    0.569791]  [<ffffffff81b2642a>] kernel_init+0xa/0x120
[    0.569820]  [<ffffffff81b3232f>] ret_from_fork+0x1f/0x40
[    0.569848]  [<ffffffff81b26420>] ? rest_init+0x170/0x170
[    0.569875] ================================================================================
Comment 1 Wilfried Klaebe 2016-06-15 12:27:56 UTC
Created attachment 220091 [details]
.config
Comment 2 Lv Zheng 2016-06-16 01:20:33 UTC
Please provide the acpidump of the machine.
BTW, is this a regression?
If so, could you give it a "git bisect"?

Thanks and best regards
-Lv
Comment 3 Wilfried Klaebe 2016-06-17 04:37:02 UTC
I don't know whether this is a regression, I only activated the UB checks in  4.7-rc3.

Attaching a tar file of "acpidump -b" generated .dat files.
Comment 4 Wilfried Klaebe 2016-06-17 04:37:54 UTC
Created attachment 220411 [details]
acpidump -b generated .dat files
Comment 5 Lv Zheng 2016-06-20 02:29:01 UTC
Hopefully, we can reproduce it using the acpidump output locally.

Thanks
-Lv
Comment 6 Lv Zheng 2016-07-06 07:26:44 UTC
Please upload the dmidecode of the machine.
Comment 7 Lv Zheng 2016-07-06 07:29:52 UTC
My code may not be the same as yours, please also upload /usr/local/src/kernel/linux-git/drivers/acpi/acpica/dsutils.c here.
Comment 8 Navin 2016-07-19 05:50:37 UTC
This change makes it go away for me. walk_state->num_operands is 0 in my case when i booting the host and the array index is hence negative since we do num_operands -1.So i check whether num_operands > 0 before accessing that.

I also tried to do return AE_BAD_ADDRESS when walk_state->num_operands==0 but that made my mouse and other usb devices to not work and dmesg full of exceptions. The below change silents doesn't access and it gets rid of it.

My kernel version is 4.7.0-rc7+ 


diff --git a/drivers/acpi/acpica/dsutils.c b/drivers/acpi/acpica/dsutils.c
index f393de9..71bae80 100644
--- a/drivers/acpi/acpica/dsutils.c
+++ b/drivers/acpi/acpica/dsutils.c
@@ -637,11 +637,13 @@ acpi_ds_create_operand(struct acpi_walk_state *walk_state,
                        ACPI_DEBUG_PRINT((ACPI_DB_DISPATCH,
                                          "Argument previously created, already stacked\n"));
 
-                       acpi_db_display_argument_object(walk_state->
+                       if(walk_state->num_operands>0) {
+                               acpi_db_display_argument_object(walk_state->
                                                        operands[walk_state->
                                                                 num_operands -
                                                                 1],
                                                        walk_state);
+                       }
 
                        /*
                         * Use value that was already previously returned
Comment 9 Adam Borowski 2016-08-06 03:53:49 UTC
This reliably pops up on my home box, too.  And Navin's patch makes it go away (tested on 4.7.0).  At a brief look at the code around, it appears to be a sane fix (although I don't know these parts).

As for bisecting, the warning happens on the earliest kernel that had the UBSAN machinery; backporting it to test earlier kernels would be a lot of work so I stopped at that.
Comment 10 Lv Zheng 2016-08-08 01:12:40 UTC
(In reply to Adam Borowski from comment #9)
> This reliably pops up on my home box, too.  And Navin's patch makes it go
> away (tested on 4.7.0).  At a brief look at the code around, it appears to
> be a sane fix (although I don't know these parts).
> 
> As for bisecting, the warning happens on the earliest kernel that had the
> UBSAN machinery; backporting it to test earlier kernels would be a lot of
> work so I stopped at that.

Yes, no bisect is needed.
This looks like a bug recently introduced by me when I was trying to enable debugger code in Linux.
I'll check if such kind of bugs can be seen in other debugger enabling code.

Thanks for the report.
Comment 11 Lv Zheng 2016-08-08 01:41:55 UTC
Looks like a bug related to this commit:
https://github.com/acpica/acpica/commit/689fefd
We need to check if this fix is proper.

Thanks
Comment 12 Jiri Slaby 2016-09-22 12:24:16 UTC
Any updates? I have just hit it on 4.8-rc7 too.
Comment 13 Adam Borowski 2016-09-22 14:56:02 UTC
Created attachment 239421 [details]
tentative patch

The patch works for me, and looks sane: if there's nothing on stack, there's nothing to display.  And trying to display that element (currently only when single stepping) would use an out-of-bounds piece of memory; some types of values (string, buffer) would dereference an invalid pointer.

If that's ok with you, all that's needed is patch metadata.  The patch's author is navinp1912, the only use of that name in the kernel is "Navin P.S" <navinp1912@gmail.com> -- how should this be written?
Comment 14 Lv Zheng 2016-09-23 00:07:00 UTC
According to the comment 11, ACPI_PARSEOP_IN_STACK is designed that it ensures there is always an AML operand object in the stack.
Why the triggered bug shows there is no object in the stack.
So we believe this is not the right fix.
And this issue still needs to be root caused.
Comment 15 Lv Zheng 2017-03-29 07:37:28 UTC
*** Bug 194845 has been marked as a duplicate of this bug. ***
Comment 16 Lv Zheng 2017-03-29 07:38:20 UTC
This may fix the issue:
https://github.com/acpica/acpica/pull/245
Comment 17 Adam Borowski 2017-03-29 13:11:13 UTC
Upon applying the obvious adaptation of acpica pull 245 to the kernel, the UBSAN warning is still there.

With the if statement guarding it, this implies that ACPI_PARSEOP_IN_STACK is there despite the operand stack being empty.
Comment 18 Robert Moore 2017-03-29 15:00:44 UTC
Try this one. A static code analysis may flag it, but the code is using the NumOperands as an index into the operands array.

            /* Debugger single-step support - display operand */

            if (WalkState->NumOperands)
            {
                AcpiDbDisplayArgumentObject (
                    WalkState->Operands [WalkState->NumOperands - 1],
                    WalkState);
            }
Comment 19 Robert Moore 2017-03-29 15:03:43 UTC
Which is essentially the same patch as posted in #18
Comment 20 Robert Moore 2017-03-29 15:05:22 UTC
Sorry, #13
Comment 21 Adam Borowski 2017-03-29 17:51:07 UTC
Re-checked: both #18 and #13 do the trick atop 4.11-rc4.
Comment 22 Robert Moore 2017-03-29 17:54:47 UTC
ok, thanks. We'll commit it.

While we are at it, can you tell us the steps we need to add UBSAN to our test builds? It sounds like this could help us a lot.
thanks,
bob
Comment 23 Adam Borowski 2017-03-29 18:06:16 UTC
CONFIG_UBSAN, it's in: Kernel hacking/Undefined behaviour sanity checker; make sure UBSAN_SANITIZE_ALL is not disabled.  It needs a not too old version of gcc, 4.9 is good enough, maybe even 4.8.

For non-kernel code: gcc -fsanitize=undefined; it's a nice checker you might want to try on all your projects.
Comment 24 Lv Zheng 2017-03-31 05:21:05 UTC
Marking this as solved, will close upon being merged by upstream.
Comment 25 Lv Zheng 2017-03-31 05:41:22 UTC
To Bob

The problem is:
1. The purpose the message is not clear to me.
   I managed to get the 2 lines running on debugger window.
   And it actually always dumps an operand that has been dumped previously by another create_operand call.
2. So I guess purpose of the message may be related to PARSEOP_IN_STACK.
   And I confirmed, when NumOperands == 0, PARSEOP_IN_STACK is always unset but AML_HAS_RETVAL is always set.
   So if (NumOperands) fix is equivelent to if (PARSEOP_IN_STACK).
That's why I provide the 2 kinds of fix to you.

However if I don't modify code, there is no chance the operand will be dumped on the debugger window, so this looks more like a dead untested code.
And thus I prefer the fix provided in this pull request:
https://github.com/acpica/acpica/pull/245
Comment 26 Lv Zheng 2017-03-31 05:45:39 UTC
Created attachment 255657 [details]
[PATCH] ACPICA: Dispatcher: Add check to avoid accessing non-existing operands
Comment 27 Lv Zheng 2017-03-31 05:46:19 UTC
Created attachment 255659 [details]
[PATCH] ACPICA: Dispatcher: Remove possible useless debugger message
Comment 28 Lv Zheng 2017-03-31 05:47:02 UTC
Adam:

Can attachment 255657 [details] or attachment 255659 [details] fix this issue?

Thanks
Lv
Comment 29 Lv Zheng 2017-03-31 05:50:26 UTC
So IMO, the purpose of this debugger message is either related to PARSEOP_IN_STACK or nothing.
We should either make it related to PARSEOP_IN_STACK or remove it.
Just adding a coverity sanity check will make the purpose of the debugger message more implicit than it's used to be.
Comment 30 Adam Borowski 2017-03-31 06:40:46 UTC
As mentioned in #17, 255657 doesn't fix the warning.  This seems wrong, but the only explanation I see is that ACPI_PARSEOP_IN_STACK is on despite the stack being empty.

255659 obviously does the trick.  A call that's gone can't misbehave :p
Comment 31 Lv Zheng 2017-03-31 09:34:14 UTC
(In reply to Adam Borowski from comment #17)
> Upon applying the obvious adaptation of acpica pull 245 to the kernel, the
> UBSAN warning is still there.
> 
> With the if statement guarding it, this implies that ACPI_PARSEOP_IN_STACK
> is there despite the operand stack being empty.

On acpiexec, after applying it, the warning is unreproducible.
I think we are using same DSDT.
I was using 220411.
Comment 32 Lv Zheng 2017-03-31 09:35:17 UTC
(In reply to Adam Borowski from comment #30)
> As mentioned in #17, 255657 doesn't fix the warning.  This seems wrong, but
> the only explanation I see is that ACPI_PARSEOP_IN_STACK is on despite the
> stack being empty.

If this is true, the purpose of this debugger message is more confusing...
We should delete it.
Comment 33 Robert Moore 2017-03-31 16:20:02 UTC
Yes, I believe that the call to AcpiDbDisplayArgumentObject is in fact wrong. Probably "obsolete" would be a better term.

The debug message is just that, a debug trace message. I don't see why anyone cares about this.
Comment 34 Adam Borowski 2017-03-31 19:17:17 UTC
Created attachment 255669 [details]
serial console log

In case it's of some use: here's a log with the following debug patch added:

diff --git a/drivers/acpi/acpica/dsutils.c b/drivers/acpi/acpica/dsutils.c
index 049fbab4e5a6..892b1f8e502b 100644
--- a/drivers/acpi/acpica/dsutils.c
+++ b/drivers/acpi/acpica/dsutils.c
@@ -633,6 +633,8 @@ acpi_ds_create_operand(struct acpi_walk_state *walk_state,
 
                if ((op_info->flags & AML_HAS_RETVAL) ||
                    (arg->common.flags & ACPI_PARSEOP_IN_STACK)) {
+                       printk("already stacked, op flags = %x, arg flags = %x",
+                               op_info->flags, arg->common.flags);
                        ACPI_DEBUG_PRINT((ACPI_DB_DISPATCH,
                                          "Argument previously created, already stacked\n"));
Comment 35 Robert Moore 2017-03-31 19:23:10 UTC
Everything works fine here by simply removing the DisplayArgument call. In the -1 case, DisplayArgument outputs a "null" object which is wrong.
Comment 36 Lv Zheng 2017-06-09 01:06:40 UTC
Hi, Adam:

> As mentioned in #17, 255657 doesn't fix the warning.
> This seems wrong, but the only explanation I see is that
> ACPI_PARSEOP_IN_STACK is on despite the stack being empty.

See op flag HAS_RET_VAL is 0x400, op flag ACPI_PARSEOP_IN_STACK is 0x10

After applying attachment 255657 [details], the logic is:
if ((op_info->flags & AML_HAS_RETVAL) ||
    (arg->common.flags & ACPI_PARSEOP_IN_STACK)) {
    if (arg->common.flags & ACPI_PARSEOP_IN_STACK)) {
		ACPI_DEBUG_PRINT((ACPI_DB_DISPATCH,
...

 
According to your log:
[    1.709477] already stacked, op flags = 1400, arg flags = 1

So first "if" matches but second "if" doesn't.
The buggy statement won't be reached.
Why can you still trigger the issue?
Did you apply the right patch?

Thanks
Lv
Comment 37 Adam Borowski 2017-06-09 01:10:30 UTC
It was unmodified mainline with just that printk patch.
Comment 38 Lv Zheng 2017-06-09 02:15:18 UTC
Then attachment 255657 [details] actually should work...
Since ACPI_PARSEOP_IN_STACK always ensures that an operand is stacked.
1. The if (num > 0) fix makes login in an even hidden style.
   We shouldn't chose this fix.
2. The debugging log should be applied to if (ACPI_PARSEOP_IN_STACK).
   While it is under a wrong condition of if (AML_HAS_RETVAL).
   we can further fix it by adding additional if (ACPI_PARSEOP_IN_STACK).
   That's why attachment 255657 [details].

Anyway, I managed to enable the debugging message.
It seems to be a wrong result of ACPI_PARSEOP_IN_STACK support commit, which adds redundant logs and messes up the debugging output.

So finally we chose the removal one of attachment 255659 [details]. :)

Thanks for the help.
I'm sorry to having your name missing in the ACPICA release patch.
I just copied names from 1st comment of buglinks to the final release patch.

Now I've asked Rafael to help to add it when he merges the patch internally.
Hope it can work.

Best regards
Lv