Bug 1258 - sbp2 reports slab corruption of hpsb_packet in 2.6.0-test5-mm3
Summary: sbp2 reports slab corruption of hpsb_packet in 2.6.0-test5-mm3
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: IEEE1394
Hardware: i386 Linux
Importance: P2 high
Assignee: Ben Collins
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-09-22 07:38 UTC by Alastair Tse
Modified: 2007-07-13 11:51 UTC
2 users

See Also:
Kernel Version: 2.6.0-test5-mm3
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
output of /proc/config.gz (31.58 KB, text/plain)
2003-09-22 07:40 UTC, Alastair Tse

dmesg output (10.69 KB, text/plain)
2003-09-22 07:40 UTC, Alastair Tse

'dmesg' showing configuration and backtrace (10.52 KB, text/plain)
2003-10-09 16:09 UTC, John Mock

.config for linux-2.6.0-test7 (30.00 KB, text/plain)
2003-10-09 16:11 UTC, John Mock

Description Alastair Tse 2003-09-22 07:38:59 UTC
Distribution:

Gentoo Linux
GCC 3.3.1

Hardware Environment:

Sony VAIO N505VE
Intel Pentium Celeron 333MHz
Intel 440BX Controller
Intel PIIX4 IDE/USB controller
FireWire (IEEE 1394): Sony Corporation CXD3222 i.LINK Controller (rev 02)

External Firewire Hard Disk with Oxford Chipset using SBP2


mcvaio ~ % lspci
00:00.0 Host bridge: Intel Corp. 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (AGP disabled) (rev 03)
00:07.0 ISA bridge: Intel Corp. 82371AB/EB/MB PIIX4 ISA (rev 02)
00:07.1 IDE interface: Intel Corp. 82371AB/EB/MB PIIX4 IDE (rev 01)
00:07.2 USB Controller: Intel Corp. 82371AB/EB/MB PIIX4 USB (rev 01)
00:07.3 Bridge: Intel Corp. 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:08.0 FireWire (IEEE 1394): Sony Corporation CXD3222 i.LINK Controller (rev 02)
00:09.0 Multimedia audio controller: Yamaha Corporation YMF-744B [DS-1S Audio Controller] (rev 02)
00:0a.0 VGA compatible controller: Neomagic Corporation NM2200 [MagicGraph 256AV] (rev 20)
00:0b.0 Communication controller: Rockwell International HCF 56k Data/Fax Modem (rev 01)
00:0c.0 CardBus bridge: Ricoh Co Ltd RL5c475 (rev 80)
01:00.0 Ethernet controller: 3Com Corporation 3c575 [Megahertz] 10/100 LAN CardBus (rev 01)

mcvaio ~ % cat /proc/interrupts
           CPU0
  0:    2605066          XT-PIC  timer
  1:         13          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  8:          2          XT-PIC  rtc
  9:      35259          XT-PIC  yenta, uhci-hcd, YMFPCI, eth0, ohci1394
 11:          2          XT-PIC  sonypi
 12:         52          XT-PIC  i8042
 14:       4625          XT-PIC  ide0
NMI:          0
LOC:          0
ERR:          0

Software Environment:

Linux mcvaio 2.6.0-test5-mm3 #3 Mon Sep 22 14:18:04 BST 2003 i686 Celeron (Mendocino) GenuineIntel GNU/Linux
  
Gnu C                  3.3.1
Gnu make               3.80
util-linux             2.12
mount                  2.12
e2fsprogs              1.34
jfsutils               1.1.3
xfsprogs               2.3.9
pcmcia-cs              3.2.4
PPP                    2.4.1
nfs-utils              1.0.5
Linux C Library        2.3.2
Dynamic linker (ldd)   2.3.2
Procps                 3.1.11
Net-tools              1.60
Kbd                    1.08
Sh-utils               5.0
Modules Loaded         sd_mod sbp2 ohci1394 ieee1394 ipt_ULOG ipt_TOS ipt_state
ipt_REJECT ipt_LOG ipt_limit iptable_mangle iptable_nat ip_conntrack
iptable_filter ip_tables md5 ipv6 autofs snd_seq_midi snd_opl3_synth
snd_seq_instr snd_seq_midi_emul snd_ainstr_fm snd_ymfpci snd_ac97_codec
snd_opl3_lib snd_hwdep snd_mpu401_uart snd_rawmidi snd_seq_oss
snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_pcm snd_page_alloc
snd_timer snd_mixer_oss snd soundcore rtc bnep sco l2cap hci_usb bluetooth visor
usbserial usbmouse hid uhci_hcd usbcore sonypi 3c59x ds yenta_socket pcmcia_core sg

Problem Description:

This problem only started occurring after 2.6.0-test4-bk3. I've tried 2.6.0-test5
and 2.6.0-test5-mm3. Basically, when I load the module "ohci1394", which
autoloads "sbp2", it reports a "Slab Error" from something accessing an
hpsb_packet a couple of seconds later. The full dmesg output is attached.

Once this error occurs, the machine cannot get access to the sbp2 device.
However, this works fine with 2.6.0-test4-bk3, which I have been using for the
last 2 weeks. Another error I encountered, unrelated to this bug report, was
with USB; it was fixed by reverting this patch, which went in between -test4-bk3
and -test4-bk4:

http://marc.theaimsgroup.com/?l=linux-kernel&m=106421269204054&w=2

I'm not too sure what else you might need. I do have CONFIG_DEBUG_SLAB enabled
in my kernel.

Steps to reproduce:

1. Boot up kernel
2. modprobe ieee1394
3. modprobe ohci1394
- here it produces the slab error.
Comment 1 Alastair Tse 2003-09-22 07:40:09 UTC
Created attachment 919
output of /proc/config.gz
Comment 2 Alastair Tse 2003-09-22 07:40:35 UTC
Created attachment 920
dmesg output
Comment 3 Alastair Tse 2003-09-22 09:15:06 UTC
I've got a little update on the situation. I've tried applying a small patch
from linux1394.org's SVN repository that fixes a memory leak in ieee1394_core.c,
because I found that the problem had something to do with freeing hpsb packets
at the wrong time.

http://www.linux1394.org/viewcvs/trunk/ieee1394_core.c?r1=1047&r2=1063

This improved the situation a great deal as I can now access my sbp2 device
properly. I am however getting another error in dmesg, although it appears to be
as fatal as the one I was getting before:

ohci1394: $Rev: 1023 $ Ben Collins <bcollins@debian.org>
PCI: Found IRQ 9 for device 0000:00:08.0
PCI: Sharing IRQ 9 with 0000:00:07.2
ohci1394_0: OHCI-1394 1.0 (PCI): IRQ=[9]  MMIO=[fedf7000-fedf77ff]  Max Packet=[2048]
Debug: sleeping function called from invalid context at mm/slab.c:1817
Call Trace:
 [<c0119781>] __might_sleep+0x61/0x80
 [<c013ce24>] __kmalloc+0xa4/0xb0
 [<c8b8d419>] hpsb_create_hostinfo+0x39/0xd0 [ieee1394]
 [<c8b92294>] nodemgr_add_host+0x24/0x130 [ieee1394]
 [<c8b9d82d>] ohci_initialize+0x21d/0x230 [ohci1394]
 [<c8b8dd36>] highlevel_add_host+0x86/0xa0 [ieee1394]
 [<c8b8d20d>] hpsb_add_host+0x6d/0xa0 [ieee1394]
 [<c8ba1b32>] ohci1394_pci_probe+0x3f2/0x590 [ohci1394]
 [<c8b9f970>] ohci_irq_handler+0x0/0x7d0 [ohci1394]
 [<c026fdbd>] pci_device_probe_static+0x4d/0x70
 [<c026ff4c>] __pci_device_probe+0x3c/0x50
 [<c026ff8c>] pci_device_probe+0x2c/0x50
 [<c02b4cbd>] bus_match+0x3d/0x70
 [<c02b4e20>] driver_attach+0x70/0xb0
 [<c02b5134>] bus_add_driver+0xb4/0xd0
 [<c02b55a1>] driver_register+0x31/0x40
 [<c027020b>] pci_register_driver+0x5b/0x80
 [<c8b2a015>] ohci1394_init+0x15/0x3e [ohci1394]
 [<c01323d2>] sys_init_module+0x132/0x280
 [<c01093fb>] syscall_call+0x7/0xb
 
PM: Adding info for ieee1394:fw-host0
PM: Adding info for ieee1394:0001d200e003ddcb
PM: Adding info for ieee1394:0001d200e003ddcb-0
ieee1394: Node added: ID:BUS[0-00:1023]  GUID[0001d200e003ddcb]
PM: Adding info for ieee1394:08004603002e6128
ieee1394: Host added: ID:BUS[0-01:1023]  GUID[08004603002e6128]
sbp2: $Rev: 1018 $ Ben Collins <bcollins@debian.org>
scsi0 : SCSI emulation for IEEE-1394 SBP-2 Devices
PM: Adding info for No Bus:host0
ieee1394: sbp2: Logged into SBP-2 device
ieee1394: sbp2: Node 0-00:1023: Max speed [S400] - Max payload [2048]
  Vendor: ST312002  Model: 3A                Rev:
  Type:   Direct-Access                      ANSI SCSI revision: 06
SCSI device sda: 234441648 512-byte hdwr sectors (120034 MB)
sda: asking for cache data failed
sda: assuming drive cache: write through
 /dev/scsi/host0/bus0/target0/lun0: p1 p2
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
PM: Adding info for scsi:0:0:0:0
Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0,  type 0
Comment 4 Alastair Tse 2003-09-22 09:16:02 UTC
oops, i mean it appears to be _not_ as fatal as the one before.
Comment 5 Alastair Tse 2003-09-22 14:50:38 UTC
Sorry, please ignore comment #3. I discovered that message only appeared when I
booted into an older kernel. The problem with the slab error is still present
after the patch in comment #3.
Comment 6 Alastair Tse 2003-09-23 12:36:10 UTC
Another update on this bug: I've discovered that it is triggered by
CONFIG_DEBUG_SLAB for both -test4-bk3 and -test5-mm3. Compiling the kernel
without CONFIG_DEBUG_SLAB makes the symptoms go away, but I'm not sure whether
that is the real solution or whether there is some memory corruption going on in
the ieee1394 code.
Comment 7 John Mock 2003-10-09 16:04:07 UTC
This bug (or one rather similar to it) still fails on 2.6.0-test7 and has
been plaguing me on any kernel i've tried since Linux-2.4.19 (so i have
simply used that kernel to write data CDs and left it otherwise alone).

Roughly the same procedure, except for me, 'sbp2' gets auto-loaded, and
the backtrace typically happens several seconds later.

    tvr-vaio:~# modprobe ohci1394
    ohci1394: $Rev: 1045 $ Ben Collins <bcollins@debian.org>
    ohci1394_0: OHCI-1394 1.1 (PCI): IRQ=[9]  MMIO=[e0205000-e02057ff]  Max Packet=[2048]
    tvr-vaio:~# sbp2: $Rev: 1034 $ Ben Collins <bcollins@debian.org>
    scsi1 : SCSI emulation for IEEE-1394 SBP-2 Devices
    ieee1394: sbp2: Logged into SBP-2 device
    Slab corruption: start=cd594718, expend=cd594777, problemat=cd594748
    Last user: [<d0b7314c>](free_hpsb_packet+0x2c/0x40 [ieee1394])
    Data: ************************************************D5 D6 D6 D6 01 00 00 00 ***************************************A5
    Next: 71 F0 2C .4C 31 B7 D0 71 F0 2C .....................
    slab error in check_poison_obj(): cache `hpsb_packet': object was modified after freeing
    Call Trace:
     [<c013abfb>] check_poison_obj+0x10b/0x1a0
     [<c013ae3d>] slab_destroy+0x1ad/0x1c0
     [<c013d498>] reap_timer_fnc+0x148/0x220
     [<c013d350>] reap_timer_fnc+0x0/0x220
     [<c01225c0>] run_timer_softirq+0xb0/0x170
     [<c011e465>] do_softirq+0xa5/0xb0
     [<c010bd45>] do_IRQ+0xe5/0x120
     [<c010a35c>] common_interrupt+0x18/0x20
     [<c01bc0f6>] acpi_processor_idle+0xe8/0x1e3
     [<c0105000>] _stext+0x0/0x30
     [<c01080f4>] cpu_idle+0x34/0x40
     [<c0312765>] start_kernel+0x145/0x150
     [<c03124e0>] unknown_bootoption+0x0/0x110


    tvr-vaio:~# cat > /tmp/console.log

I'm running a Sony R505EL with Debian LINUX (testing/unstable branch), and i
will attach 'dmesg' output and a .config file.  Here's my 'lspci':

    00:00.0 Host bridge: Intel Corp. 82830 830 Chipset Host Bridge (rev 04)
    00:02.0 VGA compatible controller: Intel Corp. 82830 CGC [Chipset Graphics Controller] (rev 04)
    00:02.1 Display controller: Intel Corp. 82830 CGC [Chipset Graphics Controller]
    00:1d.0 USB Controller: Intel Corp. 82801CA/CAM USB (Hub #1) (rev 02)
    00:1d.1 USB Controller: Intel Corp. 82801CA/CAM USB (Hub #2) (rev 02)
    00:1d.2 USB Controller: Intel Corp. 82801CA/CAM USB (Hub #3) (rev 02)
    00:1e.0 PCI bridge: Intel Corp. 82801BAM/CAM PCI Bridge (rev 42)
    00:1f.0 ISA bridge: Intel Corp. 82801CAM ISA Bridge (LPC) (rev 02)
    00:1f.1 IDE interface: Intel Corp. 82801CAM IDE U100 (rev 02)
    00:1f.3 SMBus: Intel Corp. 82801CA/CAM SMBus (rev 02)
    00:1f.5 Multimedia audio controller: Intel Corp. 82801CA/CAM AC'97 Audio (rev 02)
    00:1f.6 Modem: Intel Corp. 82801CA/CAM AC'97 Modem (rev 02)
    02:02.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
    02:05.0 CardBus bridge: Ricoh Co Ltd RL5c475 (rev 80)
    02:08.0 Ethernet controller: Intel Corp. 82801CAM (ICH3) PRO/100 VE (LOM) Ethernet Controller (rev 42)


Thank you for your efforts.  Feel free to write for more information.
Comment 8 John Mock 2003-10-09 16:09:23 UTC
Created attachment 1023
'dmesg' showing configuration and backtrace
Comment 9 John Mock 2003-10-09 16:11:10 UTC
Created attachment 1024
.config for linux-2.6.0-test7
Comment 10 Zwane Mwaikambo 2003-10-10 19:09:08 UTC
0xcd594748 - 0xcd594718 = 0x30

(gdb) p ((struct hpsb_packet *)0)->state_change
Cannot access memory at address 0x30

So someone is doing a down/up on the state_change semaphore after it's
freed...
Comment 11 John Mock 2003-10-13 22:15:24 UTC
<bcollins@debian.org> privately suggested getting the tree from linux1394.org
from the kernel mailing list, which i did by getting the 'tarball' from:

	http://www.linux1394.org/viewcvs/

and using its directory 'ieee1394/trunk/' in place of '.../drivers/ieee1394'.
That suggestion proved to be very helpful (thanks very much): it indeed
allows 'modprobe ohci1394' to succeed and CD/RW operations to work.

There are still glitches, though, as 'rmmod sbp2' or 'rmmod ohci1394' give 
me backtrace(s), bug reports of which have been sent privately (and will 
be cheerfully provided upon request).  (Software suspend also does not work
properly with the associated device, but i have no idea whether that ever
worked and does not appear to affect other non-CD/RW operations.)
Comment 12 Stefan Richter 2006-12-01 02:01:39 UTC
Ben, could you change the status of this bug to CLOSED to get it out of sight?
