Bug 8672 - kernel panic when using software raid1 on top of cciss
Summary: kernel panic when using software raid1 on top of cciss
Status: REJECTED INSUFFICIENT_DATA
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: MD (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Neil Brown
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-06-25 07:31 UTC by leo weppelman
Modified: 2009-03-23 11:12 UTC (History)
4 users (show)

See Also:
Kernel Version: 2.6.21
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
tracing patch (3.35 KB, patch)
2007-11-22 18:00 UTC, Neil Brown
Details | Diff

Description leo weppelman 2007-06-25 07:31:49 UTC
Most recent kernel where this bug did not occur:
Distribution: Debian unstable
Hardware Environment: HP ML-350 G5
Software Environment:
Problem Description:
The trace written to the console:
=================================

kernel BUG at drivers/block/cciss.c:2479!
invalid opcode: 0000 [#1]
SMP 
Modules linked in: mptctl sg nfsd exportfs lockd nfs_acl sunrpc ipv6 8021q raw dm_snapshot shpchp pci_hotplug psmouse serio_raw pcspkr ext3 jbd mbcache raid1 md_mod dm_mod ide_generic ide_cd usbhid hid piix tg3 cciss bnx2 generic ehci_hcd ide_core uhci_hcd usbcore thermal processor fan sr_mod cdrom mptscsih mptbase aic7xxx sym53c8xx scsi_transport_spi BusLogic scsi_mod
CPU:    0
EIP:    0060:[<f894c119>]    Not tainted VLI
EFLAGS: 00010012   (2.6.21-1-686 #1)
EIP is at do_cciss_request+0x44/0x349 [cciss]
eax: f693b350   ebx: dfb7dbac   ecx: 00000000   edx: 00000000
esi: 00000800   edi: f5ff99c0   ebp: f693b350   esp: dfcafbec
ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
Process md0_raid1 (pid: 6108, ti=dfcae000 task=df981a90 task.ti=dfcae000)
Stack: 00000000 dfd6004c dfb7dbac 00000c00 dfd10000 00000001 00000001 00000001 
       dfcafc2c c17430e0 00000c00 00000000 00000200 c16bad80 00000000 00000000 
       c195a24c dfebaadc 00000000 00000040 00000040 dfcafc9c dfd60250 dfc97bec 
Call Trace:
 [<c01bd25b>] elv_next_request+0x10d/0x11c
 [<f894a105>] start_io+0x7b/0xe1 [cciss]
 [<f894c413>] do_cciss_request+0x33e/0x349 [cciss]
 [<c016438a>] cache_alloc_refill+0x58/0x466
 [<c01c6dfc>] cfq_set_request+0x299/0x315
 [<c01bd54a>] elv_rb_add+0x65/0x6d
 [<c01c63e8>] cfq_add_rq_rb+0x5c/0x6b
 [<c01c6421>] cfq_insert_request+0x2a/0x3ae
 [<c0129e5f>] lock_timer_base+0x15/0x2f
 [<c012a15d>] del_timer+0x48/0x4e
 [<c01bfa97>] blk_remove_plug+0x57/0x63
 [<c0129f73>] __mod_timer+0x9c/0xa6
 [<c01bfac0>] __generic_unplug_device+0x1d/0x1f
 [<c01c0c02>] __make_request+0x34c/0x46c
 [<c01bedfa>] generic_make_request+0x1a9/0x1b9
 [<c012a15d>] del_timer+0x48/0x4e
 [<c01bfa97>] blk_remove_plug+0x57/0x63
 [<f8961ba9>] raid1d+0xbf/0xd0e [raid1]
 [<c0102ff1>] __switch_to+0xfe/0x131
 [<c011b0fe>] __activate_task+0x1c/0x29
 [<c029d6d9>] schedule_timeout+0x13/0x8d
 [<c0124b9c>] do_exit+0x6c2/0x6c6
 [<f89a1a71>] md_thread+0xc6/0xdd [md_mod]
 [<c01328e5>] autoremove_wake_function+0x0/0x35
 [<f89a19ab>] md_thread+0x0/0xdd [md_mod]
 [<c013281a>] kthread+0xb2/0xdc
 [<c0132768>] kthread+0x0/0xdc
 [<c01049a7>] kernel_thread_helper+0x7/0x10
 =======================
Code: 44 24 10 8b 82 dc 00 00 00 84 c0 0f 88 0b 03 00 00 8b 44 24 08 e8 46 10 87 c7 85 c0 89 c5 0f 84 f8 02 00 00 66 83 78 68 1f 76 04 <0f> 0b eb fe 8b 44 24 10 ba 01 00 00 00 e8 a2 f9 ff ff 85 c0 89 
EIP: [<f894c119>] do_cciss_request+0x44/0x349 [cciss] SS:ESP 0068:dfcafbec

The bug is triggered on the line:
   BUG_ON(creq->nr_phys_segments > MAXSGENTRIES);
in the function drivers/block/cciss.c:do_cciss_request()

It looks like the raid code doen not hounour the limitations of the cciss
driver. I have added a printk to dm_table_set_restrictions() and I saw
the value of 31 (MAXSGENTRIES) pass by.

I filed this bug at first as  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=426705, but as it seems to be
de dm-driver, it looks more at place here. An other bug that looks related is: http://bugzilla.kernel.org/show_bug.cgi?id=7763, but it is in dm_crypt. But from
the data in the attachment in #13, it seems to have the same cause.

To add another data-point, I compiled a kernel with 
 MAX_PHYS_SEGMENTS = MAX_HW_SEGMENTS = 16
This make the problem go away (as expected).

Steps to reproduce:
I have an ML-350-G5 with an E200i raid controller. There are 2 logical drives
defined that map 1-1 on a physical drive. Those disks are part of a software
RAID-1 array. When initializing an oracle database on the system, the
system panics.
Comment 1 Alasdair G Kergon 2007-11-13 04:19:48 UTC
Please change 'component' field to 'MD'.
Comment 2 Neil Brown 2007-11-15 16:24:17 UTC
Hi Leo,
 Your stack trace suggests that you are using 'md' raid1, not 'dm' raid1.
 However the fact that a printk in dm_table_set_restrictions printed something
 suggests that you are using dm as well.

 Could you please explain exactly the configuration of devices, raid/lvm and filesystems.

 Both dm and md to attempt to honour the max_phys_segments limitation, so
 something odd must be happening.  A description of your configuration might
 help me work out where to look.

Thanks,
NeilBrown
Comment 3 Neil Brown 2007-11-15 16:25:21 UTC
Setting to Need Info for configuration description

(I cannot seem to both accept a bug, and set needinfo at the same time...)
Comment 4 leo weppelman 2007-11-18 23:33:44 UTC
On Thu, Nov 15, 2007 at 04:24:18PM -0800, bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=8672
> 
> 
> neilb@suse.de changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>          AssignedTo|mike.miller@hp.com          |neilb@suse.de
>              Status|NEW                         |ASSIGNED
> 
> 
> 
> 
> ------- Comment #2 from neilb@suse.de  2007-11-15 16:24 -------
> Hi Leo,
>  Your stack trace suggests that you are using 'md' raid1, not 'dm' raid1.
>  However the fact that a printk in dm_table_set_restrictions printed
>  something
>  suggests that you are using dm as well.
> 
>  Could you please explain exactly the configuration of devices, raid/lvm and
> filesystems.
> 
>  Both dm and md to attempt to honour the max_phys_segments limitation, so
>  something odd must be happening.  A description of your configuration might
>  help me work out where to look.

Neil,

We have servers with 2 physical disks. Those disks are partitioned alike in
a data, a swap and a boot partition. The data and the swap partitions are
both mirrored as raid1 devices using mdadm(1). So here you have md on top
of cciss. On top of the raid1 data partition, we use lvm2 to create the
logical volumes for the filesystems (pvcreate -> vgcreat -> lvcreate ).

Leo.
Comment 5 Neil Brown 2007-11-22 18:00:05 UTC
Hmmm.. nothing obvious there that would trigger a problem.

Could you try the patch which I will attach?  It just prints out some
relevant values at various places so we can see where something
goes wrong.

Thanks.
Comment 6 Neil Brown 2007-11-22 18:00:44 UTC
Created attachment 13702 [details]
tracing patch

Please try this patch and report what is printed.
Comment 7 leo weppelman 2007-11-26 04:49:39 UTC
Ok, I tried it:

<kernel boots>
   ...
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before set_limits : rs=31 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
In table_set_restrictions: 31
  Logical volume "snap" created
  27 logical volume(s) in volume group "ahraid" now active
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
  Logical volume "snap" successfully removed
Starting raid devices: done.
Starting raid devices: done.
    ...
  <starting database init>
    ...
kernel BUG at drivers/block/cciss.c:2482!
invalid opcode: 0000 [#1]
SMP
Modules linked in: sg ipmi_devintf ipmi_watchdog ipmi_poweroff ipmi_si ipmi_msghandler 8021q nfsd exportfs lockd nfs_acl sunrpc ipv6 raw dm_snapshot tsdev usbhid hid ehci_hcd uhci_hcd usbcore evdev psmouse serio_raw pcspkr shpchp pci_hotplug ext3 jbd mbcache dm_mod ide_generic raid1 md_mod ide_cd ata_piix libata piix generic bnx2 ide_core tg3 cciss thermal processor fan sr_mod cdrom mptscsih mptbase aic7xxx sym53c8xx scsi_transport_spi BusLogic scsi_mod
CPU:    0
EIP:    0060:[<f88e9652>]    Not tainted VLI
EFLAGS: 00010012   (2.6.21.5-ah2 #1)
EIP is at do_cciss_request+0x75/0x36c [cciss]
eax: 0000001b   ebx: dfe92bac   ecx: 00000082   edx: 00000000
esi: 00000800   edi: f2bc0bc0   ebp: f75ace18   esp: dfef9bf0
ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
Process md0_raid1 (pid: 1131, ti=dfef8000 task=dfa7a580 task.ti=dfef8000)
Stack: f88ecfe2 00000020 0000001f 0000001f dff5d000 00000001 dfe92bac 00000c00
       dfb70000 00000001 00000001 dfef9c3c c16dff00 00000c00 00000000 00000200
       00000001 00000040 dfbe0250 dfb6cbec 00000000 f75a9ed0 c01bd4a9 00000001
Call Trace:
 [<c01bd4a9>] elv_next_request+0x10d/0x11c
 [<f88e6105>] start_io+0x7b/0xe1 [cciss]
 [<f88e993e>] do_cciss_request+0x361/0x36c [cciss]
 [<c01645fa>] cache_alloc_refill+0x58/0x465
 [<c01c700d>] cfq_set_request+0x299/0x315
 [<c01bd798>] elv_rb_add+0x65/0x6d
 [<c01c65ec>] cfq_add_rq_rb+0x5c/0x6b
 [<c01c6625>] cfq_insert_request+0x2a/0x3ae
 [<c0129efc>] lock_timer_base+0x15/0x2f
 [<c012a1fa>] del_timer+0x48/0x4e
 [<c01bfcb9>] blk_remove_plug+0x57/0x63
 [<c012a010>] __mod_timer+0x9c/0xa6
 [<c01bfce2>] __generic_unplug_device+0x1d/0x1f
 [<c01c0e15>] __make_request+0x337/0x456
 [<c01bf02a>] generic_make_request+0x1a9/0x1b9
 [<c012a1fa>] del_timer+0x48/0x4e
 [<c01bfcb9>] blk_remove_plug+0x57/0x63
 [<f896abb5>] raid1d+0xbe/0xd02 [raid1]
 [<c0102fee>] __switch_to+0xfe/0x130
 [<c029d990>] schedule_timeout+0x13/0x8d
 [<c010475b>] common_interrupt+0x23/0x28
 [<c029007b>] __find_acq_core+0xa7/0x30d
 [<f89dba6a>] md_thread+0xc6/0xdc [md_mod]
 [<c0132971>] autoremove_wake_function+0x0/0x35
 [<f89db9a4>] md_thread+0x0/0xdc [md_mod]
 [<c01328a6>] kthread+0xb2/0xdc
 [<c01327f4>] kthread+0x0/0xdc
 [<c01049a3>] kernel_thread_helper+0x7/0x10
 =======================
Code: 18 0f b7 81 40 01 00 00 c7 44 24 08 1f 00 00 00 89 54 24 04 c7 04 24 e2 cf 8e f8 89 44 24 0c e8 b8 8f 83 c7 66 83 7d 68 1f 76 04 <0f> 0b eb fe 8b 44 24 20 ba 01 00 00 00 e8 ce cc ff ff 85 c0 89
EIP: [<f88e9652>] do_cciss_request+0x75/0x36c [cciss] SS:ESP 0068:dfef9bf0
Comment 8 Neil Brown 2007-11-28 19:16:43 UTC
The patch should have cause a message to be printed immediate before the
  kernel BUG at ...

message.  I don't see it in the output you provide.  Did you clip it, or 
was it really not there.
If you did leave it out by mistake, please provide it.

Thanks.
Comment 9 leo weppelman 2007-11-28 23:07:32 UTC
It really isn't there. And it _is_ in the source, just checked it. Can it be
that it has not been flushed? Is there a way to make sure that the message is
printed before running into the BUG_ON() macro?
Comment 10 Neil Brown 2007-11-28 23:58:44 UTC
On Wednesday November 28, bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=8672

> It really isn't there. And it _is_ in the source, just checked it. Can it be
> that it has not been flushed? Is there a way to make sure that the message is
> printed before running into the BUG_ON() macro?

The BUG_ON macro uses printk to print messages, so there is no way it
would interfere with any previous printk.

It is possible that cciss is a module, and an old module is being
loaded?

The patch will have changed the line number of the BUG, and between
the oops it changes from 2479 to 2482, which is a 3 line change to
match the 3 lines that were added, so it seems to be OK.

So that tells us that .... the BUG_ON macro is being miscompiled???.

How about changing the code to read:

	if (creq->nr_phys_segments > MAXSGENTRIES) {
		printk("nps = %d, MAX=%d mps=%d\n", creq->nr_phys_segments, MAXSGENTRIES,
		       q->max_phys_segments);
	 	BUG();
	}

Then we will definitely get a printk before the BUG.

NeilBrown

 
Comment 11 leo weppelman 2007-11-29 05:56:08 UTC
I am out of ideas....
  - I tried your suggestion above, but the was no output from the printk(). I
    even added an extra comment line before BUG(), so the line number would         
    change.
  - I tried printk(KERN_CRIT "nps = %d....
    to see if it would make a difference. It didn't.
  - I added an udelay() after the printk, see if that would help... No.
  - I added
         int i; for (i = 0; i < 1000; i++) usleep(1000);
    just before BUG(), but it does not help. Allthough it hints at the fact that
    our code is used, since the BUG has moved to CPU#1 and I now have a soft
    lockup on CPU#0. And, oh, adding 'nosmp' to the kernel bootparms does
    not work either, since that will make the disk probe fail...
Comment 12 Neil Brown 2007-11-29 15:31:37 UTC
Weird....

How about changing the code to

 if (creq->nr_phys_segments > MAXSGENTRIES) {
      printk(".....", ......);
      end_that_request_last(creq, -EIO);
      goto queue;
}

so that instead of calling BUG, we just fail the request...
Comment 13 leo weppelman 2007-12-03 00:02:02 UTC
OK, tried that one, but no printf output. I now get:
------------[ cut here ]------------
kernel BUG at block/ll_rw_blk.c:2749!
invalid opcode: 0000 [#1]
SMP
Modules linked in: sg ipmi_devintf ipmi_watchdog ipmi_poweroff ipmi_si ipmi_msghandler 8021q nfsd exportfs lockd nfs_acl sunrpc ipv6 raw dm_snapshot tsdev usbhid hid ehci_hcd uhci_hcd evdev psmouse usbcore shpchp pci_hotplug serio_raw pcspkr ext3 jbd mbcache dm_mod ide_generic raid1 md_mod ide_cd ata_piix libata piix bnx2 generic ide_core tg3 cciss thermal processor fan sr_mod cdrom mptscsih mptbase aic7xxx sym53c8xx scsi_transport_spi BusLogic scsi_mod
CPU:    0
EIP:    0060:[<c01bf118>]    Not tainted VLI
EFLAGS: 00010083   (2.6.21.5-ah2 #2)
EIP is at __blk_put_request+0x33/0x75
eax: 00000001   ebx: dfdef858   ecx: 00000000   edx: 00000000
esi: 00018485   edi: dfbe8bac   ebp: dfdef858   esp: c1b6bbbc
ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
Process md0_raid1 (pid: 1132, ti=c1b6a000 task=dfe0da90 task.ti=c1b6a000)
Stack: dfbe8bac 00000800 dfbe8bac dfdef858 f88e9657 f88ecfe2 00000020 0000001f
       0000001f c1b22750 c1b22750 dfbe8bac f88e9941 dfbb0000 dfe84e00 c01c6134
       dfba747c dfe84e00 00000001 dfbe87ec 00000000 dfbb0000 00000005 00000005
Call Trace:
 [<f88e9657>] do_cciss_request+0x7a/0x36f [cciss]
 [<f88e9941>] do_cciss_request+0x364/0x36f [cciss]
 [<c01c6134>] cfq_dispatch_insert+0x34/0x53
 [<c01c700d>] cfq_set_request+0x299/0x315
 [<c01bd798>] elv_rb_add+0x65/0x6d
 [<c0129efc>] lock_timer_base+0x15/0x2f
 [<c012a1fa>] del_timer+0x48/0x4e
 [<c0129efc>] lock_timer_base+0x15/0x2f
 [<c012a1fa>] del_timer+0x48/0x4e
 [<c01bfcb9>] blk_remove_plug+0x57/0x63
 [<c01bfce2>] __generic_unplug_device+0x1d/0x1f
 [<c01bd55e>] elv_insert+0xa6/0x146
 [<c012a010>] __mod_timer+0x9c/0xa6
 [<c01c0e07>] __make_request+0x329/0x456
 [<c01bf02a>] generic_make_request+0x1a9/0x1b9
 [<c012a1fa>] del_timer+0x48/0x4e
 [<c01bfcb9>] blk_remove_plug+0x57/0x63
 [<f88aebb5>] raid1d+0xbe/0xd02 [raid1]
 [<c0102fee>] __switch_to+0xfe/0x130
 [<c029d990>] schedule_timeout+0x13/0x8d
 [<c010475b>] common_interrupt+0x23/0x28
 [<f89dba6a>] md_thread+0xc6/0xdc [md_mod]
 [<c0132971>] autoremove_wake_function+0x0/0x35
 [<f89db9a4>] md_thread+0x0/0xdc [md_mod]
 [<c01328a6>] kthread+0xb2/0xdc
 [<c01327f4>] kthread+0x0/0xdc
 [<c01049a3>] kernel_thread_helper+0x7/0x10
 =======================
Code: 89 d3 74 64 8b 82 80 00 00 00 48 85 c0 89 82 80 00 00 00 75 53 89 f8 e8 25 e2 ff ff 8b 73 14 f7 c6 00 00 01 00 74 41 39 1b 74 04 <0f> 0b eb fe 83 7b 48 00 74 04 0f 0b eb fe 89 f5 81 e5 00 04 00
EIP: [<c01bf118>] __blk_put_request+0x33/0x75 SS:ESP 0068:c1b6bbbc

Note that block/ll_rw_blk.c:2749 contains:
      BUG_ON(!list_empty(&req->queuelist));
Comment 14 Neil Brown 2007-12-05 14:54:51 UTC
I'm afraid I don't know what to suggest next.
The fact that the printk isn't working suggests that something very
odd is going on, and until that is resolve, there is not a lot of 
point pursuing the original symptom.

I would probably start sprinkling printks all over the place and trying to
figure out why some work  and some don't.
Comment 15 leo weppelman 2007-12-06 02:21:15 UTC
OK, got it!
I have modified lib/bug.c. I added:
 char leo_msg[1024];
 EXPORT_SYMBOL(leo_msg);

and to report_bug(), I added just below the 'cut here' print:
  if (leo_msg[0] != '\0') {
   printk(KERN_EMERG "%s", leo_msg);
  }

In cciss.c I changed the debugging code block to:
   if (creq->nr_phys_segments > MAXSGENTRIES) {
      extern char leo_msg[];
      printk(KERN_EMERG "nps = %d, MAX=%d mps=%d\n", creq->nr_phys_segments,
                  MAXSGENTRIES, q->max_phys_segments);
      sprintf(leo_msg, "nps = %d, MAX=%d mps=%d\n", creq->nr_phys_segments,
                  MAXSGENTRIES, q->max_phys_segments);
      BUG();
   }

And now I've got both messages... But alas, finally the needed data ;-)
I pasted it from the riloe serial output:

[ .... First part of kernel boot msg's deleted .... ]]
hda: ATAPI 48X DVD-ROM CD-R/RW drive, 1536kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
cciss0: <0x3238> at PCI 0000:13:08.0 IRQ 216 using DAC
      blocks= 143305920 block_size= 512
      heads=255, sectors=32, cylinders=17562

      blocks= 143305920 block_size= 512
      heads=255, sectors=32, cylinders=17562

      blocks= 143305920 block_size= 512
      heads=255, sectors=32, cylinders=17562

 cciss/c0d0: p1 p2 p3
      blocks= 143305920 block_size= 512
      heads=255, sectors=32, cylinders=17562

 cciss/c0d1: p1 p2 p3
tg3.c:v3.75.2 (June 5, 2007)
ACPI: PCI Interrupt 0000:14:01.0[A] -> GSI 19 (level, low) -> IRQ 18
eth1: Tigon3 [partno(407709-001) rev 1100 PHY(5703)] (PCIX:133MHz:64-bit) 10/100/1000Base-T Ethernet 00:18:fe:83:a5:ec
eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] WireSpeed[1] TSOcap[1]
eth1: dma_rwctrl[769c4000] dma_mask[64-bit]
Done.
Begin: Mounting root file system... ...
Begin: Running /scripts/local-top ...
md: raid1 personality registered for level 1
md: md1 stopped.
md: bind<cciss/c0d1p1>
md: bind<cciss/c0d0p1>
Before stack: t=128 b=31
After stack: t=31
Before stack: t=31 b=31
After stack: t=31
raid1: raid set md1 active with 2 out of 2 mirrors
mdadm: /dev//md1 has been started with 2 drives.
md: md0 stopped.
md: bind<cciss/c0d1p3>
md: bind<cciss/c0d0p3>
md: md0: raid array is not clean -- starting background reconstruction
Before stack: t=128 b=31
After stack: t=31
raid1: raid set md0 active with 1 out of 2 mirrors
Before stack: t=31 b=31
After stack: t=31
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:cciss/c0d0p3
 disk 1, wo:1, o:1, dev:cciss/c0d1p3
md: recovery of RAID array md0
md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
mdadm: /dev//md0md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
 has been started with 1 drive (md: using 128k window, over a total of 33794560 blocks.
out of 2) and 1 spare.
device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: dm-devel@redhat.com
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
  27 logical volume(s) in volume group "ahraid" now active
Done.
Begin: Running /scripts/local-premount ...
Done.
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting.  Commit interval 5 seconds
[ .... Initrd does his thing .... ]
Loading kernel module raid1.
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
Before set_limits : rs=0 q=31
After set_limits: 31
Before set_limits : rs=31 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
  Logical volume "snap" created
  27 logical volume(s) in volume group "ahraid" now active
Before set_limits : rs=0 q=31
After set_limits: 31
Before combine_low : lhs=0 rhs=31
After combine_low: 31
Before valid: 31
After valid: 31
In table_set_restrictions: 31
  Logical volume "snap" successfully removed
Starting raid devices: done.
Setting up LVM Volume Groups...
[ .... Rest of boot msg's deleted .... ]

             nps = 32, MAX=31 mps=31
------------[ cut here ]------------
nps = 32, MAX=31 mps=31
kernel BUG at drivers/block/cciss.c:2485!
invalid opcode: 0000 [#1]
SMP
Modules linked in: sg ipmi_devintf ipmi_watchdog ipmi_poweroff ipmi_si ipmi_msghandler 8021q nfsd exportfs lockd nfs_acl sunrpc ipv6 raw dm_snapshot tsdev usbhid hid ehci_hcd uhci_hcd shpchp pci_hotplug usbcore psmouse serio_raw evdev pcspkr ext3 jbd mbcache dm_mod ide_generic raid1 md_mod ide_cd ata_piix libata piix bnx2 generic ide_core tg3 cciss thermal processor fan sr_mod cdrom mptscsih mptbase aic7xxx sym53c8xx scsi_transport_spi BusLogic scsi_mod
CPU:    1
EIP:    0060:[<f88b067e>]    Not tainted VLI
EFLAGS: 00010086   (2.6.21.5-ah2 #5)
EIP is at do_cciss_request+0xa1/0x398 [cciss]
eax: 00000018   ebx: dfe34bac   ecx: c1b41bd4   edx: c1b41bd4
esi: 00000800   edi: dfe34bac   ebp: f7b7be18   esp: c1b41bcc
ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
Process md0_raid1 (pid: 1128, ti=c1b40000 task=dfb3c580 task.ti=c1b40000)
Stack: c03bba00 f88b401e 00000020 0000001f 0000001f dfbe2750 029ae960 dfe34bac
       dfeac59c dfbd0000 c01c6134 dfeac59c dfe8ca00 dff517a0 00000001 dfe347ec
       00000000 dfbd0000 00000009 00000009 c1b41cbc c1677660 00000000 00000000
Call Trace:
 [<c01c6134>] cfq_dispatch_insert+0x34/0x53
 [<c01c700d>] cfq_set_request+0x299/0x315
 [<c01bd798>] elv_rb_add+0x65/0x6d
 [<c0129efc>] lock_timer_base+0x15/0x2f
 [<c012a1fa>] del_timer+0x48/0x4e
 [<c0129efc>] lock_timer_base+0x15/0x2f
 [<c012a1fa>] del_timer+0x48/0x4e
 [<c01bfcb9>] blk_remove_plug+0x57/0x63
 [<c01bfce2>] __generic_unplug_device+0x1d/0x1f
 [<c01bd55e>] elv_insert+0xa6/0x146
 [<c012a010>] __mod_timer+0x9c/0xa6
 [<c01c0e07>] __make_request+0x329/0x456
 [<c01bf02a>] generic_make_request+0x1a9/0x1b9
 [<c012a1fa>] del_timer+0x48/0x4e
 [<c01bfcb9>] blk_remove_plug+0x57/0x63
 [<f89a1bb5>] raid1d+0xbe/0xd02 [raid1]
 [<c0102fee>] __switch_to+0xfe/0x130
 [<c029d9b0>] schedule_timeout+0x13/0x8d
 [<c010475b>] common_interrupt+0x23/0x28
 [<f89bea6a>] md_thread+0xc6/0xdc [md_mod]
 [<c0132971>] autoremove_wake_function+0x0/0x35
 [<f89be9a4>] md_thread+0x0/0xdc [md_mod]
 [<c01328a6>] kthread+0xb2/0xdc
 [<c01327f4>] kthread+0x0/0xdc
 [<c01049a3>] kernel_thread_helper+0x7/0x10
 =======================
Code: 01 00 00 0f b7 55 68 c7 44 24 0c 1f 00 00 00 c7 44 24 04 1e 40 8b f8 89 44 24 10 89 54 24 08 c7 04 24 00 ba 3b c0 e8 7f b6 91 c7 <0f> 0b eb fe 8b 44 24 24 ba 01 00 00 00 e8 a2 cc ff ff 85 c0 89
EIP: [<f88b067e>] do_cciss_request+0xa1/0x398 [cciss] SS:ESP 0068:c1b41bcc
Comment 16 Neil Brown 2007-12-06 22:54:41 UTC
So, despite max_phys_segments always being 31, we have a case where
nr_phys_segments is 32.

I suspect the most likely cause of this is that blk_recalc_rq_segments makes
different BIOVEC_PHYS_MERGEABLE etc decisions at the cciss level than
at the md or dm level.

I wonder if it could be seg_boundary_mask.  dm default it to '-1' and then
uses min_non_zero, so it might stay at -1 which would seem wrong.

Maybe try changing the -1 in 'check_for_valid_limits' in driver/md/dm-table.c
to 0xffffffff (8 f's).

It is just a hunch and I will try to look closer and see if there are other
possibilities, but it is worth trying I think.
Comment 17 Milan Broz 2007-12-07 00:11:19 UTC
Neil,

I recently found similar problem in dm-crypt, (bug 7763) see patch
http://www2.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-crypt-use-bio_add_page.patch

In raid1.c are similar code patterns directly manipulating with bvec,
isn't this source of problems too ?

Milan
Comment 18 Neil Brown 2007-12-09 16:16:56 UTC
On Friday December 7, bugme-daemon@bugzilla.kernel.org wrote:
> Neil,
> 
> I recently found similar problem in dm-crypt, (bug 7763) see patch
>
> http://www2.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-crypt-use-bio_add_page.patch
> 
> In raid1.c are similar code patterns directly manipulating with bvec,
> isn't this source of problems too ?

Hmmm. that's a nasty problem.

It could affect raid1 if --write-behind is being used, which I don't
think is the case here.  Fixing the --write-behind case would be
rather ugly.  Probably the best fix would be to set the
seb_boundary_mask very low if --write-behind is in effect.

NeilBrown
Comment 19 Alasdair G Kergon 2008-12-02 03:17:33 UTC
If anyone's still watching this, please try the patch in:
  http://lkml.org/lkml/2008/11/30/108
  Subject: [PATCH] block: fix setting of max_segment_size and seg_boundary mask

Note You need to log in before you can comment on or make changes to this bug.