Bug 9121 - BUG: sleeping function called from invalid context at kernel/rwsem.c:20
Summary: BUG: sleeping function called from invalid context at kernel/rwsem.c:20
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: EDAC
Hardware: All Linux
Importance: P1 normal
Assignee: doug thompson
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-10-04 03:40 UTC by Christopher Brown
Modified: 2008-01-23 23:51 UTC (History)
CC List: 1 user

See Also:
Kernel Version: 2.6.22.9
Subsystem:
Regression: ---
Bisected commit-id:



Description Christopher Brown 2007-10-04 03:40:07 UTC
Most recent kernel where this bug did not occur: Unknown
Distribution: Fedora 7
Hardware Environment: 

http://snecker.fedorapeople.org/lshal.txt
http://snecker.fedorapeople.org/dmidecode.txt
http://snecker.fedorapeople.org/lspci.txt
http://snecker.fedorapeople.org/dmesg.txt

Software Environment: 2.6.22.9-91.fc7

Problem Description:

BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
 [<c043b796>] down_read+0x12/0x28
 [<c04f7522>] pci_get_subsys+0x71/0xf3
 [<c042ff4f>] process_timeout+0x0/0x5
 [<c04f75ba>] pci_get_device+0x16/0x19
 [<f8c518cf>] edac_kernel_thread+0x94/0xef [edac_mc]
 [<f8c5183b>] edac_kernel_thread+0x0/0xef [edac_mc]
 [<c0438682>] kthread+0x38/0x5e
 [<c043864a>] kthread+0x0/0x5e
 [<c0405b6b>] kernel_thread_helper+0x7/0x10
 =======================
EDAC PCI: Signaled System Error on 0000:00:19.0
EDAC PCI: Bridge Signaled System Error on 0000:00:19.0
EDAC PCI: Bridge Detected Parity Error on 0000:00:19.0

Full log at:

http://snecker.fedorapeople.org/dmesg_with_edac_loaded.txt

Steps to reproduce:

# /sbin/modprobe edac_mc
# echo 1 > /sys/devices/system/edac/pci/check_pci_parity

This is also seen on x86_64 as per:

https://bugzilla.redhat.com/show_bug.cgi?id=299821

This was also reported on:

http://lkml.org/lkml/2007/9/21/537

without response.

System stability does not appear to be affected, and the driver can be unloaded without error, which stops the above messages.
Comment 1 doug thompson 2007-10-04 11:14:20 UTC
Thanks for the find.

Leave PCI scanning OFF

echo 0 > /sys/devices/system/edac/pci/check_pci_parity

to work around this. Memory scanning will still work.
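The workaround is a single write to a sysfs attribute. If it has to be applied from a program rather than a shell, a minimal C sketch could look like this. `write_sysfs_attr` is a hypothetical helper name; only the attribute path comes from this report.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write a short string to a sysfs attribute file.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
static int write_sysfs_attr(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    /* sysfs attributes accept the same text a shell "echo" would send */
    int rc = (fputs(value, f) == EOF) ? -1 : 0;
    /* fclose() flushes the buffer, so a write error may only surface here */
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

For example, `write_sysfs_attr("/sys/devices/system/edac/pci/check_pci_parity", "0\n")` mirrors the `echo 0` above; like the shell command, it needs root privileges.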

I believe I fixed this in the 2.6.23 EDAC code.

I will prepare a patch for 2.6.22.

I don't know why I didn't get an email for this bug entry.
I did find a mail filtering error, which is why I didn't see the posting on LKML. I should see such postings now when EDAC is in the subject line.
Comment 2 doug thompson 2007-10-04 11:43:24 UTC
As for the output:

EDAC PCI: Signaled System Error on 0000:00:19.0
EDAC PCI: Bridge Signaled System Error on 0000:00:19.0
EDAC PCI: Bridge Detected Parity Error on 0000:00:19.0

The PCI scanning is finding bad status on device 19.0.

Whether that is a false positive (bad hardware) or a valid error remains to be seen at this point.
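The messages correspond to error bits in the 16-bit PCI Status register. The sketch below decodes a status word using the bit values defined in `<linux/pci_regs.h>`. The message strings are modeled on the log lines in this report. The helper name `decode_parity_status` and the assumption that EDAC checks exactly these three bits are illustrative, not taken from the EDAC source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PCI Status register error bits; values as in <linux/pci_regs.h> */
#define PCI_STATUS_PARITY           0x0100 /* master data parity error */
#define PCI_STATUS_SIG_SYSTEM_ERROR 0x4000 /* device signaled SERR# */
#define PCI_STATUS_DETECTED_PARITY  0x8000 /* device detected a parity error */

/* One error bit and the log text modeled on the EDAC messages above */
struct status_bit {
    uint16_t mask;
    const char *msg;
};

static const struct status_bit parity_bits[] = {
    { PCI_STATUS_SIG_SYSTEM_ERROR, "Signaled System Error" },
    { PCI_STATUS_PARITY,           "Master Data Parity Error" },
    { PCI_STATUS_DETECTED_PARITY,  "Detected Parity Error" },
};

/* Hypothetical decoder: stores up to max messages for the error bits set
 * in status and returns how many were stored. Other bits are ignored. */
static size_t decode_parity_status(uint16_t status, const char **msgs,
                                   size_t max)
{
    size_t n = 0;
    assert(msgs != NULL || max == 0);
    for (size_t i = 0; i < sizeof parity_bits / sizeof parity_bits[0]; i++) {
        if ((status & parity_bits[i].mask) && n < max)
            msgs[n++] = parity_bits[i].msg;
    }
    return n;
}
```

Judging from the log, EDAC appears to run the same checks on a bridge's secondary status and prefix the messages with "Bridge". That mapping is inferred from the output, not confirmed here.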

FYI: to SKIP that particular device, set the PCI device attribute 'broken_parity_status' to 1. It is located in /sys/devices/pci0000:00/0000:00:19.0 (or whichever bus the device is on):

echo 1 > /sys/devices/pci0000:00/0000:00:19.0/broken_parity_status

This will cause EDAC PCI scanning to skip that device.
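Given the workaround above, the scan evidently checks a per-device flag before reporting. Below is a userspace model of that filtering step. `struct fake_pci_dev` and `count_reportable()` are hypothetical stand-ins for `struct pci_dev` and the EDAC scan loop. The exact point where the real code tests `broken_parity_status` is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PCI_STATUS_PARITY | PCI_STATUS_SIG_SYSTEM_ERROR | PCI_STATUS_DETECTED_PARITY
 * (values as in <linux/pci_regs.h>) */
#define PARITY_ERROR_MASK (0x0100 | 0x4000 | 0x8000)

/* Hypothetical stand-in for the few struct pci_dev fields used here */
struct fake_pci_dev {
    const char *name;               /* e.g. "0000:00:19.0" */
    uint16_t status;                /* value read from the Status register */
    unsigned broken_parity_status;  /* set to 1 via sysfs to suppress reports */
};

/* Count devices that would be reported, skipping flagged ones. */
static size_t count_reportable(const struct fake_pci_dev *devs, size_t n)
{
    size_t reported = 0;
    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (devs[i].broken_parity_status)
            continue;  /* operator marked this device's status bits as untrustworthy */
        if (devs[i].status & PARITY_ERROR_MASK)
            reported++;
    }
    return reported;
}
```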

doug t
Comment 3 Christopher Brown 2007-10-04 11:53:04 UTC
Okay, thanks for the update Doug. On my laptop I get the following:

BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
 [<c043b796>] down_read+0x12/0x28
 [<c04f7522>] pci_get_subsys+0x71/0xf3
 [<c042ff4f>] process_timeout+0x0/0x5
 [<c04f75ba>] pci_get_device+0x16/0x19
 [<f8cee8cf>] edac_kernel_thread+0x94/0xef [edac_mc]
 [<f8cee83b>] edac_kernel_thread+0x0/0xef [edac_mc]
 [<c0438682>] kthread+0x38/0x5e
 [<c043864a>] kthread+0x0/0x5e
 [<c0405b6b>] kernel_thread_helper+0x7/0x10
 =======================
EDAC PCI: Detected Parity Error on 0000:00:1e.0

If you can generate a patch for 2.6.22 I have access to the Fedora build system and can happily test this for you.
Comment 4 Christopher Brown 2007-10-04 11:57:22 UTC
Running:

# echo 1 > /sys/devices/pci0000:00/0000:00:1e.0/broken_parity_status

does indeed stop the error messages from appearing when the module is reloaded.
Comment 5 doug thompson 2007-12-17 22:46:12 UTC
Christopher,

The patch below moves the IRQ save to a safer location and keeps interrupts disabled for only a short period.

I found the change in the 2.6.23 code base and made the modifications for this 2.6.22 kernel snapshot.

See how this does for you.

doug thompson



Index: linux-2.6.22.5/drivers/edac/edac_mc.c
===================================================================
--- linux-2.6.22.5.orig/drivers/edac/edac_mc.c
+++ linux-2.6.22.5/drivers/edac/edac_mc.c
@@ -478,12 +478,19 @@ static void edac_pci_dev_parity_clear(st
  */
 static void edac_pci_dev_parity_test(struct pci_dev *dev)
 {
+	unsigned long flags;
 	u16 status;
 	u8  header_type;
 
-	/* read the STATUS register on this device
-	 */
+	/* stop any interrupts until we can acquire the status */
+	local_irq_save(flags);
+
+	/* read the STATUS register on this device */
 	status = get_pci_parity_status(dev, 0);
+	pci_read_config_byte(dev, PCI_HEADER_TYPE, &header_type);
+
+	/* Restore interrupts */
+	local_irq_restore(flags);
 
 	debugf2("PCI STATUS= 0x%04x %s\n", status, dev->dev.bus_id );
 
@@ -511,9 +518,6 @@ static void edac_pci_dev_parity_test(str
 		}
 	}
 
-	/* read the device TYPE, looking for bridges */
-	pci_read_config_byte(dev, PCI_HEADER_TYPE, &header_type);
-
 	debugf2("PCI HEADER TYPE= 0x%02x %s\n", header_type, dev->dev.bus_id );
 
 	if ((header_type & 0x7F) == PCI_HEADER_TYPE_BRIDGE) {
@@ -569,7 +573,6 @@ static inline void edac_pci_dev_parity_i
 
 static void do_pci_parity_check(void)
 {
-	unsigned long flags;
 	int before_count;
 
 	debugf3("%s()\n", __func__);
@@ -580,11 +583,9 @@ static void do_pci_parity_check(void)
 	before_count = atomic_read(&pci_parity_count);
 
 	/* scan all PCI devices looking for a Parity Error on devices and
-	 * bridges
+	 * bridges.
 	 */
-	local_irq_save(flags);
 	edac_pci_dev_parity_iterator(edac_pci_dev_parity_test);
-	local_irq_restore(flags);
 
 	/* Only if operator has selected panic on PCI Error */
 	if (panic_on_pci_parity) {
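The trace shows `in_atomic():0, irqs_disabled():1` with `down_read` called from `pci_get_device`. The device lookup takes a read-write semaphore and may sleep, so it must not run with interrupts disabled. Before the patch, `local_irq_save()` wrapped the whole iterator. After it, interrupts are off only around the config-space reads. The userspace model below reproduces that reasoning. Every `fake_*` name is a hypothetical stand-in for the corresponding kernel primitive, not a kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Model state: the CPU interrupt flag and a count of sleep-check failures */
static int fake_irqs_disabled;
static int sleep_violations;

static void fake_local_irq_save(int *flags) { *flags = fake_irqs_disabled; fake_irqs_disabled = 1; }
static void fake_local_irq_restore(int flags) { fake_irqs_disabled = flags; }

/* Models might_sleep(): record a violation if called with IRQs off */
static void fake_might_sleep(void)
{
    if (fake_irqs_disabled)
        sleep_violations++;
}

#define NUM_DEVS 3

/* Models pci_get_device(): down_read() on a semaphore may sleep.
 * Returns the next device index, or -1 when the list is exhausted. */
static int fake_pci_get_device(int prev)
{
    fake_might_sleep();
    return (prev + 1 < NUM_DEVS) ? prev + 1 : -1;
}

/* Patched per-device test: IRQs are disabled only around the reads */
static void parity_test_patched(int dev)
{
    int flags;
    (void)dev;
    fake_local_irq_save(&flags);
    /* status and header-type config reads would happen here; they do not sleep */
    fake_local_irq_restore(flags);
}

/* Unpatched per-device test: no IRQ handling of its own */
static void parity_test_unpatched(int dev) { (void)dev; }

static void iterate(void (*fn)(int))
{
    for (int dev = fake_pci_get_device(-1); dev >= 0; dev = fake_pci_get_device(dev))
        fn(dev);
}

/* Old do_pci_parity_check(): IRQs disabled around the whole scan */
static int check_unpatched(void)
{
    int flags;
    sleep_violations = 0;
    fake_local_irq_save(&flags);
    iterate(parity_test_unpatched);
    fake_local_irq_restore(flags);
    return sleep_violations;
}

/* New do_pci_parity_check(): no IRQ save around the iterator */
static int check_patched(void)
{
    sleep_violations = 0;
    iterate(parity_test_patched);
    return sleep_violations;
}
```

With three modeled devices, `check_unpatched()` records a violation on each of the four lookups (three hits plus the final end-of-list call). `check_patched()` records none, because every lookup now runs with interrupts enabled.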
Comment 6 Christopher Brown 2007-12-18 04:14:57 UTC
Hi Doug,

I have updated the systems here to 2.6.23 and they no longer have the issue, though it looks like EDAC has undergone substantial change between the two. For some insane reason the Fedora project doesn't keep old kernel versions around to test in situations such as this, so unless I can find somewhere to get it from, my hands are tied. It might be better to let the Fedora kernel maintainer cc'd on this send it to the RHEL backports folks...?

Anyway, thanks for looking at it. My bad for not updating this bug report to say I'm now a happy customer. Insert usual poor excuse here. I'll let you decide on the best resolution for this. If someone does know where I can get my hands on the source for 2.6.22, I'm happy to build and test.

Regards
Chris
Comment 7 doug thompson 2007-12-18 08:10:08 UTC
Well, I let it sit for two months myself, as I was battling other fires.

Yes, 2.6.23 is where I took the core of the patch from, as I had already fixed it there.

2.6.23 had some major improvements, etc.

thanks

doug t

W1DUG