Bug 6114
Summary: | Initio sbp2 causes: "slab error in cache_free_debugcheck(): cache `size-512(DMA)': double free, or memory outside" object was overwritten | ||
---|---|---|---|
Product: | SCSI Drivers | Reporter: | Bernhard Kaindl (bk) |
Component: | Other | Assignee: | Stefan Richter (stefanr) |
Status: | CLOSED CODE_FIX | ||
Severity: | high | ||
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.16-rc4 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
dmesg from 2.6.14
dmesg of sbp2 and modprobe sd_mod while the problematic TYPE_RBC disk was already connected to the bus
that's diff between this debug log and the one where it works with Al Viro's patch |
Description
Bernhard Kaindl
2006-02-21 03:58:26 UTC
Created attachment 7427 [details]
dmesg from 2.6.14
Full dmesg from 2.6.14-rc5 (2.6.14-rc4 looks the same) attached.
This is a cut and paste from the lines around the first
cache_free_debugcheck():
ohci1394: fw-host0: IntEvent: 00000010
ohci1394: fw-host0: Got RQPkt interrupt status=0x00008451
ohci1394: fw-host0: Single packet rcv'd
ohci1394: fw-host0: Packet received from node 0 ack=0x11 spd=2 tcode=0x1 length=28 ctx=0 tlabel=44
ieee1394: received packet: ffc1b110 ffc0fffe 00000000 00080000
slab error in cache_free_debugcheck(): cache `sgpool-8': double free, or memory outside object was overwritten
[<c0142d9d>] cache_free_debugcheck+0x1ad/0x250
[<c0143789>] kmem_cache_free+0x39/0x90
[<c0289c3d>] scsi_io_completion+0x24d/0x540
[<c0289c3d>] scsi_io_completion+0x24d/0x540
[<c028a1c7>] scsi_generic_done+0x37/0x50
[<c0283eff>] scsi_softirq+0x16f/0x1a0
[<c011d9c2>] __do_softirq+0x42/0xa0
[<c011da46>] do_softirq+0x26/0x30
[<c01049be>] do_IRQ+0x1e/0x30
[<c01032c2>] common_interrupt+0x1a/0x20
[<c014007b>] test_clear_page_writeback+0x9b/0xe0
[<c0141822>] check_poison_obj+0x72/0x1a0
[<c014320f>] cache_alloc_debugcheck_after+0x8f/0x180
[<c0143711>] __kmalloc+0x91/0xd0
[<c01a4825>] get_mem_for_virtual_node+0x65/0xe0
[<c01a4825>] get_mem_for_virtual_node+0x65/0xe0
[<c01a4cba>] fix_nodes+0xda/0x3e0
[<c01b2a21>] reiserfs_paste_into_item+0x141/0x220
[<c019f2e2>] reiserfs_allocate_blocks_for_region+0x10a2/0x1660
[<c01985bf>] make_cpu_key+0x4f/0x60
[<c01aef2f>] pathrelse+0x2f/0x50
[<c01a0e8c>] reiserfs_file_write+0x53c/0x790
[<c013e066>] __alloc_pages+0x1c6/0x4a0
[<c016533c>] pipe_poll+0x3c/0xd0
[<c01415fe>] poison_obj+0x2e/0x60
[<c0158aa1>] vfs_write+0xb1/0x130
[<c0158beb>] sys_write+0x4b/0x80
[<c01030f5>] syscall_call+0x7/0xb
c14a5af0: redzone 1: 0x170fc2a5, redzone 2: 0xc013cb9a.
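The redzone line above can be read mechanically. The following is a minimal sketch, not kernel code: it assumes the slab debug marker values `RED_ACTIVE = 0x170FC2A5` and `RED_INACTIVE = 0x5A2CF071` as defined in 2.6-era `mm/slab.c`, and the helper names `parse_redzones` and `describe` are made up for illustration.

```python
import re

# Assumed slab debug markers (2.6-era mm/slab.c); not taken from this report.
RED_ACTIVE = 0x170FC2A5    # redzone value while the object is allocated
RED_INACTIVE = 0x5A2CF071  # redzone value after the object has been freed

def parse_redzones(line):
    """Return (object address, redzone 1, redzone 2) from a slab error trailer."""
    m = re.search(
        r"([0-9a-f]+): redzone 1: (0x[0-9a-f]+), redzone 2: (0x[0-9a-f]+)", line
    )
    if m is None:
        raise ValueError("not a redzone line")
    return tuple(int(g, 16) for g in m.groups())

def describe(value):
    """Classify a redzone word against the two assumed marker values."""
    if value == RED_ACTIVE:
        return "active (intact)"
    if value == RED_INACTIVE:
        return "inactive (object already freed)"
    return "overwritten"

line = "c14a5af0: redzone 1: 0x170fc2a5, redzone 2: 0xc013cb9a."
addr, rz1, rz2 = parse_redzones(line)
print(hex(addr), "| redzone 1:", describe(rz1), "| redzone 2:", describe(rz2))
```

Under these assumptions the leading redzone is intact and only the trailing one is clobbered, which points to a write past the end of the object rather than a double free (a double free would be expected to show the inactive marker).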
Didn't trigger with 2.6.13.1 so far, but I will continue testing older versions.

Verified that 2.6.13/2.6.14-rc1 is the window where it triggers. Tests so far:
Doesn't trigger so far with 2.6.13
Triggers reliably with 2.6.14-rc{1,2,3,4,5}, 2.6.1{4,5}, 2.6.16-rc{1,2,3,4}
With the 2.6.16-rc's I can also trigger it on AMD64 running x86_64.
Triggered on <= 2.6.15 only with Samsung X20 (Pentium M) and Initio 6L200P0 Rev: 2.35 (200 GB). Didn't trigger it with the other Firewire disk.
I'll try to find out which changes between 2.6.13 and 2.6.14-rc1 make the message appear on the X20.

Fixed by patch "sbp2: update 36byte inquiry workaround (fix compatibility regression)", available in the 1394 development git tree:
http://www.kernel.org/git/?p=linux/kernel/git/scjody/ieee1394.git;a=commit;h=99496037c6744fd938ffb8ccfc8fc91762322ff8
or here:
http://me.in-berlin.de/~s5r6/linux1394/updates/
and also in 2.6.16-rc4-mm1:
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc4/2.6.16-rc4-mm1/broken-out/git-ieee1394.patch
Please verify.

Created attachment 7432 [details]
dmesg of sbp2 and modprobe sd_mod while the problematic TYPE_RBC disk was already connected to the bus
I get the attached dmesg with the patch ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc4/2.6.16-rc4-mm1/broken-out/git-ieee1394.patch applied.
The log also contains output from some debug printk("force_inquiry_hack = %d", force_inquiry_hack); calls as evidence of the value of the variable, which I forced to 1 to be sure it's active.
The patch in http://sourceforge.net/mailarchive/message.php?msg_id=14879016 from Al Viro in the thread "Re: TYPE_RBC cache fixes (sbp2.c affected)" fixed it for me, and for Stefan Richter as well; see his reply: http://sourceforge.net/mailarchive/message.php?msg_id=14879017
The next attachment will be the diff between this debug log and the one where it works with Al Viro's patch (linked above).

Created attachment 7433 [details]
that's diff between this debug log and the one where it works with Al Viro's patch
notable in the diff is:
+sda: missing header in MODE_SENSE response
-ieee1394: sbp2: SCSI transfer size = 17d9
+ieee1394: sbp2: SCSI transfer size = d
-sda: got wrong page
-sda: assuming drive cache: write through
-slab error in cache_free_debugcheck(): cache `size-512(DMA)': double free, or memory outside object was overwritten
- [<c0151126>] cache_free_debugcheck+0x186/0x210
- [<e099095b>] sd_revalidate_disk+0x5bb/0xe10 [sd_mod]
...
- [<c0102ebd>] syscall_call+0x7/0xb
-c009ec80: redzone 1: 0x170fc2a5, redzone 2: 0x3e881a4.
+SCSI device sda: drive cache: write back
(and repeated a second time)
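The two "SCSI transfer size" values in the diff are worth converting. This is a rough sketch under two assumptions that the report does not state outright: that the sbp2 debug message prints the size in hexadecimal (the digit "d" suggests so), and that the buffer involved is 512 bytes, as the cache name `size-512(DMA)` implies. `SD_BUF_SIZE` and `transfer_overruns` are illustrative names only.

```python
# Assumption: buffer size inferred from the slab cache name size-512(DMA).
SD_BUF_SIZE = 512

def transfer_overruns(size_hex, buf_size=SD_BUF_SIZE):
    """Convert a hex transfer size from the log and compare it to the buffer."""
    size = int(size_hex, 16)
    return size, size > buf_size

before = transfer_overruns("17d9")  # failing log (the '-' line)
after = transfer_overruns("d")      # log with Al Viro's patch (the '+' line)
print("before:", before)
print("after: ", after)
```

If those assumptions hold, the failing case requested 6105 bytes into a 512-byte buffer, while the patched case requested only 13 bytes, which would be consistent with the overwritten trailing redzone.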
Looks good here so far, hdparm -t says 25MB/s,
dd_rescue from /dev/zero to disk ~14MB/s, both with default serialize_io=1,
with serialize_io=0, hdparm -t says 27 MB/s.
this fixes the bug for me, great!
Just for reference, this is exactly the disk which triggered the slab message:
http://www.ciao.de/Avalon_Mobile_Festplatte_200_GB__2138203
It was sold in the supermarkets of the German 'Soft'-Discounter plus.de a few weeks ago. It can also be found uniquely with the search words "Plus Avalon 200GB" in some search engines.

Bug category changed to "SCSI Drivers".

I split Al Viro's patch into the general critical part and the special optional part and submitted it for inclusion into 2.6.16 and 2.6.15.x.
http://marc.theaimsgroup.com/?t=114065708500001
http://marc.theaimsgroup.com/?t=114065708500002

Patch was merged by Linus.
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=489708007785389941a89fa06aedc5ec53303c96

Tested a kernel with the fix (based on 2.6.16-rc4-git10) and could not reproduce the problem anymore. The kernel messages with my disk (I didn't pass any special options to modprobe or make any other modifications) were:
scsi0 : SBP-2 IEEE-1394
ieee1394: sbp2: Node 0-00:1023: Using 36byte inquiry workaround
ieee1394: sbp2: Logged into SBP-2 device
ieee1394: Node 0-00:1023: Max speed [S400] - Max payload [2048]
Vendor: Initio Model: 6L200P0 Rev: 2.35
Type: Direct-Access ANSI SCSI revision: 00
SCSI device sda: 398297088 512-byte hdwr sectors (203928 MB)
sda: Write Protect is off
sda: Mode Sense: 86 0b 00 02
sda: assuming drive cache: write through
SCSI device sda: 398297088 512-byte hdwr sectors (203928 MB)
sda: Write Protect is off
sda: Mode Sense: 86 0b 00 02
sda: assuming drive cache: write through
sda: sda1 sda2 sda3
sd 0:0:0:0: Attached scsi disk sda
Performance using the simple tests which I did previously was practically the same.
I can confirm that the fixes in mainline work, many thanks! |
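As a sanity check on the final log, the reported capacity can be recomputed from the sector count. A small sketch, assuming the "MB" in the kernel message means decimal megabytes (10^6 bytes); the variable names are illustrative.

```python
SECTORS = 398297088   # from "SCSI device sda: 398297088 512-byte hdwr sectors"
SECTOR_SIZE = 512     # bytes per sector, from the same message

total_bytes = SECTORS * SECTOR_SIZE
decimal_mb = total_bytes // 10**6   # megabytes as assumed to be used in the log
binary_mib = total_bytes // 2**20   # mebibytes, for comparison

print(total_bytes, decimal_mb, binary_mib)
```

The decimal figure matches the 203928 MB printed by the kernel, so the patched kernel is reporting the full size of the nominal 200 GB disk.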