Bug 1258
Summary: | sbp2 reports slab corruption of hpsb_packet in 2.6.0-test5-mm3 | ||
---|---|---|---|
Product: | Drivers | Reporter: | Alastair Tse (liquidx) |
Component: | IEEE1394 | Assignee: | Ben Collins (bcollins) |
Status: | CLOSED PATCH_ALREADY_AVAILABLE | ||
Severity: | high | CC: | stefanr, zwane |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.0-test5-mm3 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: | output of /proc/config.gz; dmesg output; 'dmesg' showing configuration and backtrace; .config for linux-2.6.0-test7 | ||
Description
Alastair Tse
2003-09-22 07:38:59 UTC
Created attachment 919: output of /proc/config.gz
Created attachment 920: dmesg output
I've got a little update on the situation. I've tried applying a small patch from linux1394.org's SVN repository to fix a memory leak in ieee1394_core.c, because I found that the problem had something to do with freeing hpsb packets at the wrong time.

http://www.linux1394.org/viewcvs/trunk/ieee1394_core.c?r1=1047&r2=1063

This improved the situation a great deal as I can now access my sbp2 device properly. I am however getting another error in dmesg, although it appears to be as fatal as the one I was getting before:

ohci1394: $Rev: 1023 $ Ben Collins <bcollins@debian.org>
PCI: Found IRQ 9 for device 0000:00:08.0
PCI: Sharing IRQ 9 with 0000:00:07.2
ohci1394_0: OHCI-1394 1.0 (PCI): IRQ=[9] MMIO=[fedf7000-fedf77ff] Max Packet=[2048]
Debug: sleeping function called from invalid context at mm/slab.c:1817
Call Trace:
[<c0119781>] __might_sleep+0x61/0x80
[<c013ce24>] __kmalloc+0xa4/0xb0
[<c8b8d419>] hpsb_create_hostinfo+0x39/0xd0 [ieee1394]
[<c8b92294>] nodemgr_add_host+0x24/0x130 [ieee1394]
[<c8b9d82d>] ohci_initialize+0x21d/0x230 [ohci1394]
[<c8b8dd36>] highlevel_add_host+0x86/0xa0 [ieee1394]
[<c8b8d20d>] hpsb_add_host+0x6d/0xa0 [ieee1394]
[<c8ba1b32>] ohci1394_pci_probe+0x3f2/0x590 [ohci1394]
[<c8b9f970>] ohci_irq_handler+0x0/0x7d0 [ohci1394]
[<c026fdbd>] pci_device_probe_static+0x4d/0x70
[<c026ff4c>] __pci_device_probe+0x3c/0x50
[<c026ff8c>] pci_device_probe+0x2c/0x50
[<c02b4cbd>] bus_match+0x3d/0x70
[<c02b4e20>] driver_attach+0x70/0xb0
[<c02b5134>] bus_add_driver+0xb4/0xd0
[<c02b55a1>] driver_register+0x31/0x40
[<c027020b>] pci_register_driver+0x5b/0x80
[<c8b2a015>] ohci1394_init+0x15/0x3e [ohci1394]
[<c01323d2>] sys_init_module+0x132/0x280
[<c01093fb>] syscall_call+0x7/0xb
PM: Adding info for ieee1394:fw-host0
PM: Adding info for ieee1394:0001d200e003ddcb
PM: Adding info for ieee1394:0001d200e003ddcb-0
ieee1394: Node added: ID:BUS[0-00:1023] GUID[0001d200e003ddcb]
PM: Adding info for ieee1394:08004603002e6128
ieee1394: Host added: ID:BUS[0-01:1023] GUID[08004603002e6128]
sbp2: $Rev: 1018 $ Ben Collins <bcollins@debian.org>
scsi0 : SCSI emulation for IEEE-1394 SBP-2 Devices
PM: Adding info for No Bus:host0
ieee1394: sbp2: Logged into SBP-2 device
ieee1394: sbp2: Node 0-00:1023: Max speed [S400] - Max payload [2048]
Vendor: ST312002 Model: 3A Rev:
Type: Direct-Access ANSI SCSI revision: 06
SCSI device sda: 234441648 512-byte hdwr sectors (120034 MB)
sda: asking for cache data failed
sda: assuming drive cache: write through
/dev/scsi/host0/bus0/target0/lun0: p1 p2
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
PM: Adding info for scsi:0:0:0:0
Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0, type 0

oops, i mean it appears to be _not_ as fatal as the one before.

sorry, please ignore comment #3. i discovered that the message only appeared when i booted into an older kernel. the problem with the slab error is still present after the patch in comment #3.

another update on this bug. i've discovered that it is triggered by CONFIG_DEBUG_SLAB for both -test4-bk3 and -test5-mm3. compiling the kernel without CONFIG_DEBUG_SLAB makes the symptoms go away, but I'm not sure whether that is the real solution or whether there is some memory corruption going on in the ieee1394 code.

This bug (or one rather similar to it) still occurs on 2.6.0-test7 and has been plaguing me on any kernel i've tried since Linux-2.4.19 (so i have simply used that kernel to write data CDs and left it otherwise alone). Roughly the same procedure, except for me, 'sbp2' gets auto-loaded, and the backtrace typically happens several seconds later.

tvr-vaio:~# modprobe ohci1394
ohci1394: $Rev: 1045 $ Ben Collins <bcollins@debian.org>
ohci1394_0: OHCI-1394 1.1 (PCI): IRQ=[9] MMIO=[e0205000-e02057ff] Max Packet=[2048]
tvr-vaio:~# sbp2: $Rev: 1034 $ Ben Collins <bcollins@debian.org>
scsi1 : SCSI emulation for IEEE-1394 SBP-2 Devices
ieee1394: sbp2: Logged into SBP-2 device
Slab corruption: start=cd594718, expend=cd594777, problemat=cd594748
Last user: [<d0b7314c>](free_hpsb_packet+0x2c/0x40 [ieee1394])
Data: ************************************************D5 D6 D6 D6 01 00 00 00 ***************************************A5
Next: 71 F0 2C .4C 31 B7 D0 71 F0 2C .....................
slab error in check_poison_obj(): cache `hpsb_packet': object was modified after freeing
Call Trace:
[<c013abfb>] check_poison_obj+0x10b/0x1a0
[<c013ae3d>] slab_destroy+0x1ad/0x1c0
[<c013d498>] reap_timer_fnc+0x148/0x220
[<c013d350>] reap_timer_fnc+0x0/0x220
[<c01225c0>] run_timer_softirq+0xb0/0x170
[<c011e465>] do_softirq+0xa5/0xb0
[<c010bd45>] do_IRQ+0xe5/0x120
[<c010a35c>] common_interrupt+0x18/0x20
[<c01bc0f6>] acpi_processor_idle+0xe8/0x1e3
[<c0105000>] _stext+0x0/0x30
[<c01080f4>] cpu_idle+0x34/0x40
[<c0312765>] start_kernel+0x145/0x150
[<c03124e0>] unknown_bootoption+0x0/0x110
tvr-vaio:~# cat > /tmp/console.log

I'm running a Sony R505EL with Debian LINUX (testing/unstable branch), and i will attach 'dmesg' output and a .config file. Here's my 'lspci':

00:00.0 Host bridge: Intel Corp. 82830 830 Chipset Host Bridge (rev 04)
00:02.0 VGA compatible controller: Intel Corp. 82830 CGC [Chipset Graphics Controller] (rev 04)
00:02.1 Display controller: Intel Corp. 82830 CGC [Chipset Graphics Controller]
00:1d.0 USB Controller: Intel Corp. 82801CA/CAM USB (Hub #1) (rev 02)
00:1d.1 USB Controller: Intel Corp. 82801CA/CAM USB (Hub #2) (rev 02)
00:1d.2 USB Controller: Intel Corp. 82801CA/CAM USB (Hub #3) (rev 02)
00:1e.0 PCI bridge: Intel Corp. 82801BAM/CAM PCI Bridge (rev 42)
00:1f.0 ISA bridge: Intel Corp. 82801CAM ISA Bridge (LPC) (rev 02)
00:1f.1 IDE interface: Intel Corp. 82801CAM IDE U100 (rev 02)
00:1f.3 SMBus: Intel Corp. 82801CA/CAM SMBus (rev 02)
00:1f.5 Multimedia audio controller: Intel Corp. 82801CA/CAM AC'97 Audio (rev 02)
00:1f.6 Modem: Intel Corp. 82801CA/CAM AC'97 Modem (rev 02)
02:02.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
02:05.0 CardBus bridge: Ricoh Co Ltd RL5c475 (rev 80)
02:08.0 Ethernet controller: Intel Corp. 82801CAM (ICH3) PRO/100 VE (LOM) Ethernet Controller (rev 42)

Thank you for your efforts. Feel free to write for more information.

Created attachment 1023: 'dmesg' showing configuration and backtrace
Created attachment 1024: .config for linux-2.6.0-test7
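
The "object was modified after freeing" report above comes from slab poisoning: with CONFIG_DEBUG_SLAB a freed object is filled with a known pattern, and the pattern is checked again later. The following is only a userspace sketch of that idea in Python, not the kernel's check_poison_obj(). The names poison() and first_modified_offset() are made up for illustration, the filler value 0x6b is assumed to match the kernel's, and expend is assumed to be the address of the object's last byte.

```python
from typing import Optional

POISON_FREE = 0x6B  # filler byte for a freed object (assumed to match the kernel's value)
POISON_END = 0xA5   # marker in the last byte of a freed object (the A5 ending the Data line)


def poison(size: int) -> bytearray:
    """Return the pattern a freed object of `size` bytes is expected to hold."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return bytearray([POISON_FREE] * (size - 1) + [POISON_END])


def first_modified_offset(obj: bytes) -> Optional[int]:
    """Return the offset of the first byte that differs from the poison pattern."""
    expected = poison(len(obj))
    for offset, (got, want) in enumerate(zip(obj, expected)):
        if got != want:
            return offset
    return None  # pattern intact: nothing wrote to the object after the free


if __name__ == "__main__":
    start, expend = 0xCD594718, 0xCD594777  # from the "Slab corruption" line
    obj = poison(expend - start + 1)        # 0x60-byte object, freshly "freed"
    # Simulate the stray write using the eight bytes shown in the Data line.
    obj[0x30:0x38] = bytes([0xD5, 0xD6, 0xD6, 0xD6, 0x01, 0x00, 0x00, 0x00])
    print(hex(start + first_modified_offset(obj)))  # 0xcd594748, i.e. problemat
```

This would also explain why building without CONFIG_DEBUG_SLAB only hides the symptom: without the poison check the late write presumably still happens, it just goes unreported.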
0xcd594748 - 0xcd594718 = 0x30

(gdb) p ((struct hpsb_packet *)0)->state_change
Cannot access memory at address 0x30

So someone is doing a down/up on the state_change semaphore after it's freed...

<bcollins@debian.org> privately suggested getting the tree from linux1394.org from the kernel mailing list, which i did by getting the 'tarball' from: http://www.linux1394.org/viewcvs/ and using its directory 'ieee1394/trunk/' in place of '.../drivers/ieee1394'. That suggestion proved to be very helpful (thank you very much) and indeed it allows 'modprobe ohci1394' to succeed and CD/RW operations to occur. There are still glitches, though, as 'rmmod sbp2' or 'rmmod ohci1394' give me backtrace(s), bug reports of which have been sent privately (and will be cheerfully provided upon request). (Software suspend also does not work properly with the associated device, but i have no idea whether that ever worked, and it does not appear to affect other non-CD/RW operations.)

Ben, could you change the status of this bug to CLOSED to get it out of sight?
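
For reference, the offset arithmetic in the gdb comment above (problemat minus start gives the offset of the corrupted field inside the object) can be sketched in Python with ctypes. FakePacket is a hypothetical stand-in and NOT the real struct hpsb_packet layout: the placeholder arrays exist only so that state_change lands at offset 0x30 as gdb reported, and field_at() is an illustrative helper.

```python
import ctypes
from typing import Optional


class FakePacket(ctypes.Structure):
    """Hypothetical 0x60-byte stand-in for struct hpsb_packet (layout is assumed)."""
    _fields_ = [
        ("earlier_fields", ctypes.c_uint8 * 0x30),  # placeholder for whatever precedes it
        ("state_change", ctypes.c_uint32 * 4),      # placeholder for the semaphore
        ("later_fields", ctypes.c_uint8 * 0x20),    # placeholder for the rest
    ]


def field_at(struct_type, offset: int) -> Optional[str]:
    """Return the name of the field that covers byte `offset`, if any."""
    for name, _ctype in struct_type._fields_:
        desc = getattr(struct_type, name)  # field descriptor with .offset and .size
        if desc.offset <= offset < desc.offset + desc.size:
            return name
    return None


if __name__ == "__main__":
    start, problemat = 0xCD594718, 0xCD594748  # from the "Slab corruption" line
    offset = problemat - start                 # 0x30
    print(hex(offset), field_at(FakePacket, offset))
```

With the real structure definition, gdb's `p &((struct hpsb_packet *)0)->state_change` gives the same mapping directly.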