Bug 6810 - ACPI sleeping with interrupts disabled
Summary: ACPI sleeping with interrupts disabled
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: ACPICA-Core (show other bugs)
Hardware: i386 Linux
: P2 normal
Assignee: Len Brown
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-07-10 15:52 UTC by Alan Stern
Modified: 2006-09-28 13:18 UTC (History)
1 user (show)

See Also:
Kernel Version: 2.6.18-rc1
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
dmesg log of bootup (29.03 KB, text/plain)
2006-07-10 15:53 UTC, Alan Stern
Details
dmesg log of suspend-to-disk and resume (16.65 KB, text/plain)
2006-07-10 15:55 UTC, Alan Stern
Details
dmesg log of suspend-to-disk and resume with timeout set to 0 (16.46 KB, text/plain)
2006-07-10 15:57 UTC, Alan Stern
Details
patch vs 2.6.18-rc2 (701 bytes, patch)
2006-07-26 18:16 UTC, Len Brown
Details | Diff
patch vs 2.6.18-rc4 (1.07 KB, patch)
2006-08-16 16:07 UTC, Len Brown
Details | Diff

Description Alan Stern 2006-07-10 15:52:30 UTC
Motherboard is K7S5A with AMD Duron processor.

As discussed elsewhere, during resume the system faults in
acpi_os_wait_semaphore() at a time when interrupts are disabled.  When I patch
the routine by setting timeout to 0 if irq_disabled(), the same thing happens in
acpi_evaluate_integer().

One bright spot is that with 2.6.18-rc1 these faults no longer cause an
interrupt storm, as happens in 2.6.17.

I will attach dmesg logs of the bootup and the two suspend-to-disk attempts.
Comment 1 Alan Stern 2006-07-10 15:53:59 UTC
Created attachment 8524 [details]
dmesg log of bootup
Comment 2 Alan Stern 2006-07-10 15:55:51 UTC
Created attachment 8525 [details]
dmesg log of suspend-to-disk and resume

Note: This test was made using 2.6.18-rc1 plus the changes in Greg KH's
gregkh-all-2.6.18-rc1 patch plus the patch attached to bug #3469.
Comment 3 Alan Stern 2006-07-10 15:57:21 UTC
Created attachment 8526 [details]
dmesg log of suspend-to-disk and resume with timeout set to 0

This test was run with the same kernel as before, plus this patch:

--- usb-2.6.orig/drivers/acpi/osl.c
+++ usb-2.6/drivers/acpi/osl.c
@@ -758,6 +758,9 @@ acpi_status acpi_os_wait_semaphore(acpi_
	ACPI_DEBUG_PRINT((ACPI_DB_MUTEX, "Waiting for semaphore[%p|%d|%d]\n",
			  handle, units, timeout));
 
+	if (irqs_disabled())
+		timeout = 0;
+
	switch (timeout) {
		/*
		 * No Wait:
Comment 4 Len Brown 2006-07-26 18:16:00 UTC
Created attachment 8623 [details]
patch vs 2.6.18-rc2

seems devious to nuke the timeout if irqs disabled.
Perhaps this patch to simply do a lock_try() upfront
is more understandable?
Comment 5 Alan Stern 2006-07-27 07:11:47 UTC
I can't try your patch right now, but it looks like it should solve the problem
involving acpi_os_wait_semaphore.  But what about the oops shown in attachment
8526 [details] (comment #3)?  Are acpi_pci_link_set() and its subordinates supposed to be
able to run with interrupts disabled?
Comment 6 Len Brown 2006-08-16 12:41:49 UTC
re: comment #3 and comment #5, 
yes, there is a different patch for acpi_evaluate_integer(). 
 
The direct answer to your question is NO, 
acpi_pci_link_set() is NOT designed to be called with interrupts off -- 
because it can call the AML in interpreter, which needs to allocate. 
 
It magically works at boot time due to system_state. 
 
Although suspend/resume is analogous to boot time, it was 
deemed preferable to hack all the invocations that might 
provoke might_sleep() rather than to deal with it again in once place. 
This route wasn't my 1st choice, can you tell?:-) 
 
Comment 7 Len Brown 2006-08-16 16:07:17 UTC
Created attachment 8805 [details]
patch vs 2.6.18-rc4

this consolidated patch handles both the acpi_os_acquire_semaphore() case
and the acpi_evaluate_integer() case.
Comment 8 Alan Stern 2006-08-20 14:11:44 UTC
The attachment in comment #7 works okay.  In fact, when I tried it without the
hunk for acpi_evaluate_integer() it still worked!  Apparently something else has
changed in the interim, and as a result that routine is no longer called at a
bad time.  Leaving out the original change to acpi_os_acquire_semaphore() did
cause the error to re-appear, so that much _is_ necessary.

By the way, I noticed that this patch could go ahead to simplify the timeout==0
case in the big "switch" statement.
Comment 9 Len Brown 2006-08-27 19:36:13 UTC
shipped upstream post 2.6.18-rc4. 
closed. 

Note You need to log in before you can comment on or make changes to this bug.