Bug 16275
Summary: | Kernel panic when mounting filesystem | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Pawel Staszewski (pstaszewski) |
Component: | SCSI | Assignee: | linux-scsi (linux-scsi) |
Status: | RESOLVED OBSOLETE | ||
Severity: | normal | CC: | alan, drew.kay, maciej.rutecki, neilb, pstaszewski |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.35-rc3-next-20100622 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | Call_Trace; Panic first screen; Panic second screen; panic after change bool to unsigned long; panic second screen #2 |
Looks more like a SCSI problem than an md problem. I'll reassign it.

The traceback in the attachments looks like this might be a SCSI problem. Anyone able to look? Thanks.

I have no SCSI in my system. Here is lspci:

lspci
00:00.0 Host bridge: Intel Corporation 5100 Chipset Memory Controller Hub (rev 90)
00:02.0 PCI bridge: Intel Corporation 5100 Chipset PCI Express x8 Port 2-3 (rev 90)
00:04.0 PCI bridge: Intel Corporation 5100 Chipset PCI Express x16 Port 4-7 (rev 90)
00:08.0 System peripheral: Intel Corporation 5100 Chipset DMA Engine (rev 90)
00:10.0 Host bridge: Intel Corporation 5100 Chipset FSB Registers (rev 90)
00:10.1 Host bridge: Intel Corporation 5100 Chipset FSB Registers (rev 90)
00:10.2 Host bridge: Intel Corporation 5100 Chipset FSB Registers (rev 90)
00:11.0 Host bridge: Intel Corporation 5100 Chipset Reserved Registers (rev 90)
00:13.0 Host bridge: Intel Corporation 5100 Chipset Reserved Registers (rev 90)
00:15.0 Host bridge: Intel Corporation 5100 Chipset DDR Channel 0 Registers (rev 90)
00:16.0 Host bridge: Intel Corporation 5100 Chipset DDR Channel 1 Registers (rev 90)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 02)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
01:00.0 Ethernet controller: Intel Corporation 82598EB 10 Gigabit AT CX4 Network Connection (rev 01)
04:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet Controller (Copper) (rev 03)
05:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
06:01.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)

And .config:

cat .config | grep SCSI
# SCSI device support
CONFIG_SCSI_MOD=y
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
# CONFIG_SCSI_TGT is not set
CONFIG_SCSI_NETLINK=y
# CONFIG_SCSI_PROC_FS is not set
# SCSI support type (disk, tape, CD-ROM)
# CONFIG_SCSI_MULTI_LUN is not set
CONFIG_SCSI_CONSTANTS=y
# CONFIG_SCSI_LOGGING is not set
# CONFIG_SCSI_SCAN_ASYNC is not set
CONFIG_SCSI_WAIT_SCAN=m
# SCSI Transports
CONFIG_SCSI_SPI_ATTRS=y
CONFIG_SCSI_FC_ATTRS=y
# CONFIG_SCSI_ISCSI_ATTRS is not set
# CONFIG_SCSI_SAS_LIBSAS is not set
# CONFIG_SCSI_SRP_ATTRS is not set
CONFIG_SCSI_LOWLEVEL=y
# CONFIG_ISCSI_TCP is not set
# CONFIG_SCSI_CXGB3_ISCSI is not set
# CONFIG_SCSI_BNX2_ISCSI is not set
# CONFIG_BE2ISCSI is not set
# CONFIG_SCSI_HPSA is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_3W_SAS is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC7XXX_OLD is not set
CONFIG_SCSI_AIC79XX=y
# CONFIG_SCSI_AIC94XX is not set
# CONFIG_SCSI_MVSAS is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_SCSI_MPT2SAS is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_VMWARE_PVSCSI is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_GDTH is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_DEBUG is not set
# CONFIG_SCSI_PMCRAID is not set
# CONFIG_SCSI_PM8001 is not set
# CONFIG_SCSI_SRP is not set
# CONFIG_SCSI_BFA_FC is not set
# CONFIG_SCSI_DH is not set
# CONFIG_SCSI_OSD_INITIATOR is not set
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
# CONFIG_ISCSI_IBFT_FIND is not set
# CONFIG_ISCSI_BOOT_SYSFS is not set

Yep... SATA uses the same infrastructure as SCSI, so a number of the function names still contain "SCSI" even though it isn't really working with a SCSI disk. Still, the SCSI people have a better chance of understanding the stack trace.

Any news? I can test if someone gives me a patch that fixes this issue.

Reply-To: James.Bottomley@suse.de

On Fri, 2010-06-25 at 08:13 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> Any news?
> I can test if someone gives me a patch that fixes this issue.

For those who don't have access to bugzilla, there's a gif image of the crash:

https://bugzilla.kernel.org/attachment.cgi?id=26910

Unfortunately, the crucial information has scrolled off the top of the screen, but the presentation is some type of crash in scsi_setup_fs_cmnd(). Without the missing information, it's hard to say, but it would be a reasonable guess that it's this BUG_ON:

    /*
     * Filesystem requests must transfer data.
     */
    BUG_ON(!req->nr_phys_segments);

Which would point to the filesystem. First order of business would be to validate the guess by getting the missing part of the trace, and then start looking into the relevant filesystem.

James

The same problem is with 2.6.35-rc4. Sorry, I can't provide the full panic log from the console right now because the server is 1000 km away from me, and there is limited access to the monitor and console. The last good working kernel was 2.6.35-rc1.

Today I tested kernel 2.6.25-rc5 /next-20100720. Photos of the panic are in the attached files.

Created attachment 27167 [details]
Panic first screen
Created attachment 27168 [details]
Panic second screen
Any news? I checked now and the latest working kernel is 2.6.25-rc5.

Reply-To: James.Bottomley@suse.de

On Tue, 2010-07-20 at 09:57 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #9 from Pawel Staszewski <pstaszewski@artcom.pl> 2010-07-20 09:57:34 ---
> Created an attachment (id=27167)
> --> (https://bugzilla.kernel.org/attachment.cgi?id=27167)
> Panic first screen

Well, this validates the guess that it's the zero segment check, thanks! So it's clearly a filesystem bug. Which filesystem is this? I tried looking for the 'Checking internal tree' message, but couldn't find it in the sources.

James

ReiserFS most likely. The "checking internal tree" message is consistent with a reiserfs filesystem during boot.

Yes, this is reiserfs.

It looks like the problem is commit 74450be123b6f3cb480c358a056be398cce6aa6e. This was recently reported on linux-kernel (and other mailing lists). It is probable that the 'bool' variables (do_sync and do_barrier) in drivers/md/raid1.c need to be changed to 'unsigned long'. Can you please test that and report?

Thanks,
NeilBrown

You mean reverting this commit, 74450be123b6f3cb480c358a056be398cce6aa6e?

Thanks
Pawel

No, I mean replacing 'bool' with 'unsigned long'. Reverting the commit almost certainly will work, so it won't tell us much. Thanks.

*** Bug 16447 has been marked as a duplicate of this bug. ***

I think this needs a little more work than replacing 'bool' with 'unsigned long'. I made this change and still get a panic. Sorry for no trace output, but I forgot to capture the console, and this machine is also hard for me to access.

Thanks
Pawel

OK, I reproduced this on VirtualBox with md and raid1. Changes that I made:

const bool -> const unsigned long (where do_sync is declared)
bool do_barriers; -> unsigned long do_barriers;

Now I have: Kernel BUG at fs/inode.c:298!

Picture of the trace is in the attached file: panic_unsigned_long.jpg

Created attachment 27260 [details]
panic after change bool to unsigned long
Created attachment 27262 [details]
panic second screen #2
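
The 'bool' versus 'unsigned long' suggestion made earlier in this thread can be illustrated with a small user-space sketch. It assumes, as one plausible reading of that suggestion, that do_sync holds a masked flag bit that is later OR-ed back into a request's flags; the actual code in drivers/md/raid1.c is not shown in this report. The names DEMO_WRITE, DEMO_SYNC, build_rw_bool and build_rw_ulong are hypothetical, and the bit positions are not the kernel's real values.

```c
#include <stdbool.h>

/* Hypothetical flag layout for illustration only; not the kernel's values. */
#define DEMO_WRITE (1UL << 0)
#define DEMO_SYNC  (1UL << 4)

/*
 * Stores the masked flag in a bool. Conversion to bool turns any nonzero
 * value into 1, so the original bit position of DEMO_SYNC is lost and the
 * value OR-ed into the result is bit 0 instead.
 */
static unsigned long build_rw_bool(unsigned long bi_rw)
{
    const bool do_sync = (bi_rw & DEMO_SYNC);

    return DEMO_WRITE | do_sync;
}

/*
 * Stores the masked flag in an unsigned long. The bit keeps its position,
 * so OR-ing it back into the result preserves the DEMO_SYNC flag.
 */
static unsigned long build_rw_ulong(unsigned long bi_rw)
{
    const unsigned long do_sync = (bi_rw & DEMO_SYNC);

    return DEMO_WRITE | do_sync;
}
```

With these demo values, build_rw_bool(DEMO_WRITE | DEMO_SYNC) returns only DEMO_WRITE, while build_rw_ulong(DEMO_WRITE | DEMO_SYNC) returns both bits. Whether this is the whole story for this bug is not established here: the reporter still saw a panic after making the type change.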
Any news?

Looks like a totally different bug, something to do with Al Viro's VFS changes by the look of it. I suggest opening a separate bug, as it really is a different issue.
Created attachment 26910 [details]
Call_Trace

Kernel panic when mounting filesystem.

cat /etc/fstab
/dev/md1 /boot ext2 noauto,noatime 1 2
/dev/md2 / reiserfs noatime 0 1
/dev/sda3 none swap sw,pri=1 0 0
/dev/sdb3 none swap sw,pri=1 0 0

Call Trace in attached file.