Distribution: Gentoo Linux GCC 3.3.1

Hardware Environment:
Sony VAIO N505VE
Intel Pentium Celeron 333MHz
Intel 440BX Controller
Intel PIIX4 IDE/USB controller
FireWire (IEEE 1394): Sony Corporation CXD3222 i.LINK Controller (rev 02)
External Firewire Hard Disk with Oxford Chipset using SBP2

mcvaio ~ % lspci
00:00.0 Host bridge: Intel Corp. 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (AGP disabled) (rev 03)
00:07.0 ISA bridge: Intel Corp. 82371AB/EB/MB PIIX4 ISA (rev 02)
00:07.1 IDE interface: Intel Corp. 82371AB/EB/MB PIIX4 IDE (rev 01)
00:07.2 USB Controller: Intel Corp. 82371AB/EB/MB PIIX4 USB (rev 01)
00:07.3 Bridge: Intel Corp. 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:08.0 FireWire (IEEE 1394): Sony Corporation CXD3222 i.LINK Controller (rev 02)
00:09.0 Multimedia audio controller: Yamaha Corporation YMF-744B [DS-1S Audio Controller] (rev 02)
00:0a.0 VGA compatible controller: Neomagic Corporation NM2200 [MagicGraph 256AV] (rev 20)
00:0b.0 Communication controller: Rockwell International HCF 56k Data/Fax Modem (rev 01)
00:0c.0 CardBus bridge: Ricoh Co Ltd RL5c475 (rev 80)
01:00.0 Ethernet controller: 3Com Corporation 3c575 [Megahertz] 10/100 LAN CardBus (rev 01)

mcvaio ~ % cat /proc/interrupts
           CPU0
  0:    2605066          XT-PIC  timer
  1:         13          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  8:          2          XT-PIC  rtc
  9:      35259          XT-PIC  yenta, uhci-hcd, YMFPCI, eth0, ohci1394
 11:          2          XT-PIC  sonypi
 12:         52          XT-PIC  i8042
 14:       4625          XT-PIC  ide0
NMI:          0
LOC:          0
ERR:          0

Software Environment:
Linux mcvaio 2.6.0-test5-mm3 #3 Mon Sep 22 14:18:04 BST 2003 i686 Celeron (Mendocino) GenuineIntel GNU/Linux

Gnu C                  3.3.1
Gnu make               3.80
util-linux             2.12
mount                  2.12
e2fsprogs              1.34
jfsutils               1.1.3
xfsprogs               2.3.9
pcmcia-cs              3.2.4
PPP                    2.4.1
nfs-utils              1.0.5
Linux C Library        2.3.2
Dynamic linker (ldd)   2.3.2
Procps                 3.1.11
Net-tools              1.60
Kbd                    1.08
Sh-utils               5.0
Modules Loaded         sd_mod sbp2 ohci1394 ieee1394 ipt_ULOG ipt_TOS ipt_state ipt_REJECT ipt_LOG ipt_limit iptable_mangle iptable_nat ip_conntrack iptable_filter ip_tables md5 ipv6 autofs snd_seq_midi snd_opl3_synth snd_seq_instr snd_seq_midi_emul snd_ainstr_fm snd_ymfpci snd_ac97_codec snd_opl3_lib snd_hwdep snd_mpu401_uart snd_rawmidi snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_pcm snd_page_alloc snd_timer snd_mixer_oss snd soundcore rtc bnep sco l2cap hci_usb bluetooth visor usbserial usbmouse hid uhci_hcd usbcore sonypi 3c59x ds yenta_socket pcmcia_core sg

Problem Description:
This problem only started occurring after 2.6.0-test4-bk3. I've tried 2.6.0-test5 and 2.6.0-test5-mm3. Basically, when I load the module "ohci1394", which autoloads "sbp2", it reports a "Slab Error" with something accessing hpsb_packet after a couple of seconds. The full error from dmesg is attached. Once this error occurs, the machine cannot get access to the sbp2 device. However, this works fine with 2.6.0-test4-bk3, which I have been using for the last 2 weeks.

Another error I encountered, which is unrelated to this bug report, was with USB. It was fixed by reverting this patch from -test4-bk3 to -test4-bk4:
http://marc.theaimsgroup.com/?l=linux-kernel&m=106421269204054&w=2

I'm not too sure what else you might need. I do have CONFIG_DEBUG_SLAB enabled in my kernel.

Steps to reproduce:
1. Boot up kernel
2. modprobe ieee1394
3. modprobe ohci1394 - here it produces the slab error.
Created attachment 919: output of /proc/config.gz
Created attachment 920: dmesg output
I've got a little update on the situation. I tried applying a small patch from linux1394.org's SVN repository that fixes a memory leak in ieee1394_core.c, because I found that the problem had something to do with freeing hpsb packets at the wrong time.

http://www.linux1394.org/viewcvs/trunk/ieee1394_core.c?r1=1047&r2=1063

This improved the situation a great deal, as I can now access my sbp2 device properly. I am however getting another error in dmesg, although it appears to be as fatal as the one I was getting before:

ohci1394: $Rev: 1023 $ Ben Collins <bcollins@debian.org>
PCI: Found IRQ 9 for device 0000:00:08.0
PCI: Sharing IRQ 9 with 0000:00:07.2
ohci1394_0: OHCI-1394 1.0 (PCI): IRQ=[9] MMIO=[fedf7000-fedf77ff] Max Packet=[2048]
Debug: sleeping function called from invalid context at mm/slab.c:1817
Call Trace:
 [<c0119781>] __might_sleep+0x61/0x80
 [<c013ce24>] __kmalloc+0xa4/0xb0
 [<c8b8d419>] hpsb_create_hostinfo+0x39/0xd0 [ieee1394]
 [<c8b92294>] nodemgr_add_host+0x24/0x130 [ieee1394]
 [<c8b9d82d>] ohci_initialize+0x21d/0x230 [ohci1394]
 [<c8b8dd36>] highlevel_add_host+0x86/0xa0 [ieee1394]
 [<c8b8d20d>] hpsb_add_host+0x6d/0xa0 [ieee1394]
 [<c8ba1b32>] ohci1394_pci_probe+0x3f2/0x590 [ohci1394]
 [<c8b9f970>] ohci_irq_handler+0x0/0x7d0 [ohci1394]
 [<c026fdbd>] pci_device_probe_static+0x4d/0x70
 [<c026ff4c>] __pci_device_probe+0x3c/0x50
 [<c026ff8c>] pci_device_probe+0x2c/0x50
 [<c02b4cbd>] bus_match+0x3d/0x70
 [<c02b4e20>] driver_attach+0x70/0xb0
 [<c02b5134>] bus_add_driver+0xb4/0xd0
 [<c02b55a1>] driver_register+0x31/0x40
 [<c027020b>] pci_register_driver+0x5b/0x80
 [<c8b2a015>] ohci1394_init+0x15/0x3e [ohci1394]
 [<c01323d2>] sys_init_module+0x132/0x280
 [<c01093fb>] syscall_call+0x7/0xb

PM: Adding info for ieee1394:fw-host0
PM: Adding info for ieee1394:0001d200e003ddcb
PM: Adding info for ieee1394:0001d200e003ddcb-0
ieee1394: Node added: ID:BUS[0-00:1023] GUID[0001d200e003ddcb]
PM: Adding info for ieee1394:08004603002e6128
ieee1394: Host added: ID:BUS[0-01:1023] GUID[08004603002e6128]
sbp2: $Rev: 1018 $ Ben Collins <bcollins@debian.org>
scsi0 : SCSI emulation for IEEE-1394 SBP-2 Devices
PM: Adding info for No Bus:host0
ieee1394: sbp2: Logged into SBP-2 device
ieee1394: sbp2: Node 0-00:1023: Max speed [S400] - Max payload [2048]
  Vendor: ST312002  Model: 3A  Rev:
  Type: Direct-Access  ANSI SCSI revision: 06
SCSI device sda: 234441648 512-byte hdwr sectors (120034 MB)
sda: asking for cache data failed
sda: assuming drive cache: write through
 /dev/scsi/host0/bus0/target0/lun0: p1 p2
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
PM: Adding info for scsi:0:0:0:0
Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0, type 0
Oops, I mean it appears to be _not_ as fatal as the one before.
Sorry, please ignore comment #3. I discovered that the message only appeared when I booted into an older kernel. The problem with the slab error is still present after the patch in comment #3.
Another update on this bug. I've discovered that it is triggered by CONFIG_DEBUG_SLAB for both -test4-bk3 and -test5-mm3. Compiling the kernel without CONFIG_DEBUG_SLAB makes the symptoms go away, but I'm not sure whether that is the real solution or whether there is some memory corruption going on in the ieee1394 code.
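To illustrate why turning off slab debugging may only hide the symptom, here is a minimal user-space sketch of the poisoning idea. It is not kernel code: the byte values, the helper names and the 0x60-byte object size are assumptions chosen for illustration. A freed object is filled with a known pattern, and a later check reports the first byte that no longer matches. Without the check, the stray write would still happen; it would simply go unreported.

```python
# Toy model of slab poisoning (illustration only, not the kernel implementation).
POISON_FREE = 0x6B  # assumed fill byte for a freed object
POISON_END = 0xA5   # assumed marker for the last byte of a freed object


def poison(size):
    """Return the byte pattern a freed object of `size` bytes should hold."""
    buf = bytearray([POISON_FREE] * size)
    buf[-1] = POISON_END
    return buf


def first_modified_offset(buf):
    """Return the offset of the first byte that differs from the poison pattern."""
    expected = poison(len(buf))
    for offset, (got, want) in enumerate(zip(buf, expected)):
        if got != want:
            return offset
    return None


freed = poison(0x60)                 # object has just been "freed"
untouched = first_modified_offset(freed)

freed[0x30] = 0xD5                   # stray write through a stale pointer
modified = first_modified_offset(freed)

print(untouched, hex(modified))
```

In this model the write at offset 0x30 is the bug; the check is only what makes it visible.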
This bug (or one rather similar to it) still occurs on 2.6.0-test7 and has been plaguing me on every kernel I've tried since Linux-2.4.19 (so I have simply used that kernel to write data CDs and left it otherwise alone). Roughly the same procedure, except that for me 'sbp2' gets auto-loaded, and the backtrace typically happens several seconds later.

tvr-vaio:~# modprobe ohci1394
ohci1394: $Rev: 1045 $ Ben Collins <bcollins@debian.org>
ohci1394_0: OHCI-1394 1.1 (PCI): IRQ=[9] MMIO=[e0205000-e02057ff] Max Packet=[2048]
tvr-vaio:~# sbp2: $Rev: 1034 $ Ben Collins <bcollins@debian.org>
scsi1 : SCSI emulation for IEEE-1394 SBP-2 Devices
ieee1394: sbp2: Logged into SBP-2 device
Slab corruption: start=cd594718, expend=cd594777, problemat=cd594748
Last user: [<d0b7314c>](free_hpsb_packet+0x2c/0x40 [ieee1394])
Data: ************************************************D5 D6 D6 D6 01 00 00 00 ***************************************A5
Next: 71 F0 2C .4C 31 B7 D0 71 F0 2C .....................
slab error in check_poison_obj(): cache `hpsb_packet': object was modified after freeing
Call Trace:
 [<c013abfb>] check_poison_obj+0x10b/0x1a0
 [<c013ae3d>] slab_destroy+0x1ad/0x1c0
 [<c013d498>] reap_timer_fnc+0x148/0x220
 [<c013d350>] reap_timer_fnc+0x0/0x220
 [<c01225c0>] run_timer_softirq+0xb0/0x170
 [<c011e465>] do_softirq+0xa5/0xb0
 [<c010bd45>] do_IRQ+0xe5/0x120
 [<c010a35c>] common_interrupt+0x18/0x20
 [<c01bc0f6>] acpi_processor_idle+0xe8/0x1e3
 [<c0105000>] _stext+0x0/0x30
 [<c01080f4>] cpu_idle+0x34/0x40
 [<c0312765>] start_kernel+0x145/0x150
 [<c03124e0>] unknown_bootoption+0x0/0x110

tvr-vaio:~# cat > /tmp/console.log

I'm running a Sony R505EL with Debian LINUX (testing/unstable branch), and I will attach 'dmesg' output and a .config file. Here's my 'lspci':

00:00.0 Host bridge: Intel Corp. 82830 830 Chipset Host Bridge (rev 04)
00:02.0 VGA compatible controller: Intel Corp. 82830 CGC [Chipset Graphics Controller] (rev 04)
00:02.1 Display controller: Intel Corp. 82830 CGC [Chipset Graphics Controller]
00:1d.0 USB Controller: Intel Corp. 82801CA/CAM USB (Hub #1) (rev 02)
00:1d.1 USB Controller: Intel Corp. 82801CA/CAM USB (Hub #2) (rev 02)
00:1d.2 USB Controller: Intel Corp. 82801CA/CAM USB (Hub #3) (rev 02)
00:1e.0 PCI bridge: Intel Corp. 82801BAM/CAM PCI Bridge (rev 42)
00:1f.0 ISA bridge: Intel Corp. 82801CAM ISA Bridge (LPC) (rev 02)
00:1f.1 IDE interface: Intel Corp. 82801CAM IDE U100 (rev 02)
00:1f.3 SMBus: Intel Corp. 82801CA/CAM SMBus (rev 02)
00:1f.5 Multimedia audio controller: Intel Corp. 82801CA/CAM AC'97 Audio (rev 02)
00:1f.6 Modem: Intel Corp. 82801CA/CAM AC'97 Modem (rev 02)
02:02.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
02:05.0 CardBus bridge: Ricoh Co Ltd RL5c475 (rev 80)
02:08.0 Ethernet controller: Intel Corp. 82801CAM (ICH3) PRO/100 VE (LOM) Ethernet Controller (rev 42)

Thank you for your efforts. Feel free to write for more information.
Created attachment 1023: 'dmesg' showing configuration and backtrace
Created attachment 1024: .config for linux-2.6.0-test7
0xcd594748 - 0xcd594718 = 0x30

(gdb) p ((struct hpsb_packet *)0)->state_change
Cannot access memory at address 0x30

So someone is doing a down/up on the state_change semaphore after it's freed...
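For anyone wanting to repeat the arithmetic on their own log, here is a small sketch that pulls the addresses out of the "Slab corruption" line quoted earlier in this report and computes the offset of the modified byte within the object. The helper name is made up for illustration, and the object size is computed on the assumption that "expend" is the address of the object's last byte.

```python
import re

# The line as it appears in the dmesg output quoted in this report.
LINE = "Slab corruption: start=cd594718, expend=cd594777, problemat=cd594748"


def parse_slab_corruption(line):
    """Extract the hex addresses from a 'Slab corruption' line as integers."""
    fields = dict(re.findall(r"(\w+)=([0-9a-f]+)", line))
    return {name: int(value, 16) for name, value in fields.items()}


addrs = parse_slab_corruption(LINE)

# Offset of the first modified byte relative to the start of the object.
offset = addrs["problemat"] - addrs["start"]

# Object size, assuming "expend" is inclusive (an assumption, see above).
size = addrs["expend"] - addrs["start"] + 1

print(hex(offset), size)
```

The offset can then be matched against the struct layout, as done with gdb above, to find which field was written after the free.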
<bcollins@debian.org> privately suggested getting the tree from linux1394.org from the kernel mailing list, which I did by getting the 'tarball' from:

http://www.linux1394.org/viewcvs/

and using its directory 'ieee1394/trunk/' in place of '.../drivers/ieee1394'. That suggestion proved to be very helpful (thank you very much), and indeed it allows 'modprobe ohci1394' to succeed and CD/RW operations to occur.

There are still glitches, though, as 'rmmod sbp2' or 'rmmod ohci1394' give me backtrace(s), bug reports of which have been sent privately (and will be cheerfully provided upon request). (Software suspend also does not work properly with the associated device, but I have no idea whether that ever worked, and it does not appear to affect other non-CD/RW operations.)
Ben, could you change the status of this bug to CLOSED to get it out of sight?