Bug 16275

Summary: Kernel panic when mounting filesystem
Product: IO/Storage Reporter: Pawel Staszewski (pstaszewski)
Component: SCSI Assignee: linux-scsi (linux-scsi)
Status: RESOLVED OBSOLETE    
Severity: normal CC: alan, drew.kay, maciej.rutecki, neilb, pstaszewski
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.35-rc3-next-20100622 Subsystem:
Regression: No Bisected commit-id:
Attachments: Call_Trace
Panic first screen
Panic second screen
panic after change bool to unsigned long
panic second screen #2

Description Pawel Staszewski 2010-06-23 06:55:14 UTC
Created attachment 26910 [details]
Call_Trace

Kernel panic when mounting filesystem.

cat /etc/fstab
/dev/md1                /boot           ext2            noauto,noatime  1 2
/dev/md2                /               reiserfs        noatime         0 1
/dev/sda3               none            swap            sw,pri=1        0 0
/dev/sdb3               none            swap            sw,pri=1        0 0


Call Trace in attached file.
Comment 1 Neil Brown 2010-06-23 07:18:18 UTC
Looks more like a SCSI problem than an md problem.

I'll reassign it.
Comment 2 Neil Brown 2010-06-23 07:20:21 UTC
The traceback in the attachments looks like this might be a SCSI problem.

Anyone able to look??
Thanks.
Comment 3 Pawel Staszewski 2010-06-23 07:26:47 UTC
I have no SCSI in my system.
Here is lspci:
 lspci
00:00.0 Host bridge: Intel Corporation 5100 Chipset Memory Controller Hub (rev 90)
00:02.0 PCI bridge: Intel Corporation 5100 Chipset PCI Express x8 Port 2-3 (rev 90)
00:04.0 PCI bridge: Intel Corporation 5100 Chipset PCI Express x16 Port 4-7 (rev 90)
00:08.0 System peripheral: Intel Corporation 5100 Chipset DMA Engine (rev 90)
00:10.0 Host bridge: Intel Corporation 5100 Chipset FSB Registers (rev 90)
00:10.1 Host bridge: Intel Corporation 5100 Chipset FSB Registers (rev 90)
00:10.2 Host bridge: Intel Corporation 5100 Chipset FSB Registers (rev 90)
00:11.0 Host bridge: Intel Corporation 5100 Chipset Reserved Registers (rev 90)
00:13.0 Host bridge: Intel Corporation 5100 Chipset Reserved Registers (rev 90)
00:15.0 Host bridge: Intel Corporation 5100 Chipset DDR Channel 0 Registers (rev 90)
00:16.0 Host bridge: Intel Corporation 5100 Chipset DDR Channel 1 Registers (rev 90)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 02)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
01:00.0 Ethernet controller: Intel Corporation 82598EB 10 Gigabit AT CX4 Network Connection (rev 01)
04:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet Controller (Copper) (rev 03)
05:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
06:01.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)

And .config
cat .config | grep SCSI
# SCSI device support
CONFIG_SCSI_MOD=y
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
# CONFIG_SCSI_TGT is not set
CONFIG_SCSI_NETLINK=y
# CONFIG_SCSI_PROC_FS is not set
# SCSI support type (disk, tape, CD-ROM)
# CONFIG_SCSI_MULTI_LUN is not set
CONFIG_SCSI_CONSTANTS=y
# CONFIG_SCSI_LOGGING is not set
# CONFIG_SCSI_SCAN_ASYNC is not set
CONFIG_SCSI_WAIT_SCAN=m
# SCSI Transports
CONFIG_SCSI_SPI_ATTRS=y
CONFIG_SCSI_FC_ATTRS=y
# CONFIG_SCSI_ISCSI_ATTRS is not set
# CONFIG_SCSI_SAS_LIBSAS is not set
# CONFIG_SCSI_SRP_ATTRS is not set
CONFIG_SCSI_LOWLEVEL=y
# CONFIG_ISCSI_TCP is not set
# CONFIG_SCSI_CXGB3_ISCSI is not set
# CONFIG_SCSI_BNX2_ISCSI is not set
# CONFIG_BE2ISCSI is not set
# CONFIG_SCSI_HPSA is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_3W_SAS is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC7XXX_OLD is not set
CONFIG_SCSI_AIC79XX=y
# CONFIG_SCSI_AIC94XX is not set
# CONFIG_SCSI_MVSAS is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_SCSI_MPT2SAS is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_VMWARE_PVSCSI is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_GDTH is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_DEBUG is not set
# CONFIG_SCSI_PMCRAID is not set
# CONFIG_SCSI_PM8001 is not set
# CONFIG_SCSI_SRP is not set
# CONFIG_SCSI_BFA_FC is not set
# CONFIG_SCSI_DH is not set
# CONFIG_SCSI_OSD_INITIATOR is not set
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
# CONFIG_ISCSI_IBFT_FIND is not set
# CONFIG_ISCSI_BOOT_SYSFS is not set
Comment 4 Neil Brown 2010-06-23 07:35:54 UTC
Yep... SATA uses the same infrastructure as SCSI so a number of the
function names still contain "SCSI" even though it isn't really working with
a SCSI disk.
Still - the SCSI people have a better chance of understanding the stack trace.
Comment 5 Pawel Staszewski 2010-06-25 08:13:33 UTC
Any news?
I can test if someone gives me a patch that fixes this issue.
Comment 6 Anonymous Emailer 2010-06-27 21:06:55 UTC
Reply-To: James.Bottomley@suse.de

On Fri, 2010-06-25 at 08:13 +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> any news ?
> I can test if someone give - patch that fix this issue.

For those who don't have access to bugzilla, there's a gif image of the
crash:

https://bugzilla.kernel.org/attachment.cgi?id=26910

Unfortunately, the crucial information has scrolled off the top of the
screen, but the presentation is some type of crash in
scsi_setup_fs_cmnd().  Without the missing information, it's hard to
say, but it would be a reasonable guess that it's this BUG_ON:

	/*
	 * Filesystem requests must transfer data.
	 */
	BUG_ON(!req->nr_phys_segments);

Which would point to the filesystem.  First order of business would be
to validate the guess by getting the missing part of the trace, and then
start looking into the relevant filesystem.

James
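
As a rough illustration of the check quoted above, here is a user-space sketch, not the kernel source. The struct fake_request and the helper fs_request_is_valid are hypothetical stand-ins; only the field name nr_phys_segments comes from the quoted snippet.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's request structure; only the
 * nr_phys_segments field name is taken from the snippet quoted above. */
struct fake_request {
	unsigned short nr_phys_segments; /* number of data segments attached */
};

/* Returns true when the request carries data, i.e. when a check of the
 * form BUG_ON(!req->nr_phys_segments) would NOT fire. */
static bool fs_request_is_valid(const struct fake_request *req)
{
	return req->nr_phys_segments != 0;
}
```

A request with zero segments fails this test, which is the situation the guess above describes: a filesystem request reaching the SCSI layer with no data to transfer.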
Comment 7 Pawel Staszewski 2010-07-05 16:01:26 UTC
The same problem is with 2.6.35-rc4.

Sorry, I can't provide the full panic log from the console right now because the server is 1000 km from me, and there is limited access to the monitor and console.

The last good working kernel was 2.6.35-rc1.
Comment 8 Pawel Staszewski 2010-07-20 09:56:32 UTC
Today I tested kernel 2.6.25-rc5 /next-20100720.
Photos of the panic are in the attached files.
Comment 9 Pawel Staszewski 2010-07-20 09:57:34 UTC
Created attachment 27167 [details]
Panic first screen
Comment 10 Pawel Staszewski 2010-07-20 09:58:19 UTC
Created attachment 27168 [details]
Panic second screen
Comment 11 Pawel Staszewski 2010-07-21 09:02:10 UTC
Any news ?
Comment 12 Pawel Staszewski 2010-07-21 20:08:49 UTC
I checked now, and the latest working kernel is 2.6.25-rc5.
Comment 13 Anonymous Emailer 2010-07-21 20:44:23 UTC
Reply-To: James.Bottomley@suse.de

On Tue, 2010-07-20 at 09:57 +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> --- Comment #9 from Pawel Staszewski <pstaszewski@artcom.pl>  2010-07-20
> 09:57:34 ---
> Created an attachment (id=27167)
>  --> (https://bugzilla.kernel.org/attachment.cgi?id=27167)
> Panic first screen

Well, this validates the guess that it's the zero segment check, thanks!
So it's clearly a filesystem bug.

Which filesystem is this?  I tried looking for the 'Checking internal
tree' message, but couldn't find it in the sources.

James
Comment 14 Andrew Kay 2010-07-22 03:09:20 UTC
ReiserFS most likely. The "checking internal tree" message is consistent with a reiserfs filesystem during boot.
Comment 15 Pawel Staszewski 2010-07-22 06:40:36 UTC
Yes, this is reiserfs.
Comment 16 Neil Brown 2010-07-22 07:41:04 UTC
It looks like the problem is commit 74450be123b6f3cb480c358a056be398cce6aa6e

This was recently reported on linux-kernel (and other mailing lists).

It is probable that the 'bool' variables (do_sync and do_barrier) in
drivers/md/raid1.c need to be changed to 'unsigned long'. 
Can you please test that and report?

Thanks,
NeilBrown
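
One way to see why the variable type might matter is the sketch below. It is written under assumptions and is not the actual raid1.c code. If a flag is extracted with a bitwise AND and stored in a bool, C converts any non-zero result to 1, so OR-ing it back into a flags word sets bit 0 instead of the original bit. The flag values FLAG_WRITE and FLAG_SYNC and both helper functions are hypothetical, chosen only for illustration.

```c
#include <stdbool.h>

/* Hypothetical flag layout, chosen only for illustration. */
#define FLAG_WRITE (1UL << 0)
#define FLAG_SYNC  (1UL << 4)

/* Stores the masked flag in a bool: any non-zero value becomes 1. */
static unsigned long rebuild_flags_bool(unsigned long rw)
{
	const bool do_sync = (rw & FLAG_SYNC);
	return 0UL | do_sync; /* intended: a read request plus the sync flag */
}

/* Stores the masked flag in an unsigned long: the bit position survives. */
static unsigned long rebuild_flags_ulong(unsigned long rw)
{
	const unsigned long do_sync = (rw & FLAG_SYNC);
	return 0UL | do_sync;
}
```

With this layout, rebuild_flags_bool(FLAG_SYNC) yields FLAG_WRITE rather than FLAG_SYNC, while rebuild_flags_ulong(FLAG_SYNC) keeps the sync bit. Whether this is what happens in the reported panic is not established here; the sketch only shows the C conversion rule behind the suggestion.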
Comment 17 Pawel Staszewski 2010-07-22 08:22:17 UTC
You mean reverting this commit 74450be123b6f3cb480c358a056be398cce6aa6e?

Thanks
Pawel
Comment 18 Neil Brown 2010-07-22 08:36:58 UTC
No, I mean replacing 'bool' with 'unsigned long'.

Reverting the commit almost certainly will work, so won't tell
us much.

Thanks.
Comment 19 Maciej Rutecki 2010-07-23 19:21:43 UTC
*** Bug 16447 has been marked as a duplicate of this bug. ***
Comment 20 Pawel Staszewski 2010-07-24 10:20:16 UTC
I think this needs a little more work than replacing 'bool' with 'unsigned long'.
I made this change and still get a panic.

Sorry for no trace output; I forgot to capture the console, and I also have difficult access to this machine.

Thanks
Pawel
Comment 21 Pawel Staszewski 2010-07-26 14:44:14 UTC
OK, I reproduced this on VirtualBox with md and raid1.
Changes that I made:

const bool -> const unsigned long (where do_sync was declared)
and
bool do_barriers; -> unsigned long do_barriers;

Now I have:
Kernel BUG at fs/inode.c:298!

Picture of the trace is in the attached file: panic_unsigned_long.jpg
Comment 22 Pawel Staszewski 2010-07-26 14:45:20 UTC
Created attachment 27260 [details]
panic after change bool to unsigned long
Comment 23 Pawel Staszewski 2010-07-26 14:54:36 UTC
Created attachment 27262 [details]
panic second screen #2
Comment 24 Pawel Staszewski 2010-08-03 10:29:42 UTC
Any news ?
Comment 25 Neil Brown 2010-08-03 11:59:34 UTC
Looks like a totally different bug - something to do with Al Viro's VFS changes
by the look of it.
I suggest opening a separate bug as it really is a different issue.