Distribution: Gentoo 2005.0 with both gentoo-sources and vanilla sources; tested versions 2.6.11-gentoo-r6 and 2.6.12-rc3. Hardware Environment: 2 different machines, both brand new N
Created attachment 5023 [details] lspci output, kernel config, dmesg output. Useful information.
I installed a 2.4 series kernel on the same machines and I don't have any problems with that series, so I think it's a 2.6-specific bug, and pretty serious...
I was wrong... with 2.4.28-gentoo-r8 the problem appears as well. The only difference is that the 2.4 kernel doesn't remount the partition read-only, but I guess that's normal: 2.6 is better prepared for this kind of situation and remounts read-only to prevent data loss.
Downstream bug report: http://bugs.gentoo.org/show_bug.cgi?id=91586
bugme-daemon@kernel-bugs.osdl.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=4585

Has anyone done much testing on ext3 on >1TB devices?
The only place I can find any testing of ext3 > 1TB is at https://listman.redhat.com/archives/ext3-users/
I have the same problem with "just" 936 GB.

EXT3-fs error (device dm-0): ext3_new_block: Allocating block in system zone - block = 78676529
Aborting journal on device dm-0.
EXT3-fs error (device dm-0) in ext3_prepare_write: Journal has aborted
ext3_abort called.
EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
__journal_remove_journal_head: freeing b_frozen_data
__journal_remove_journal_head: freeing b_frozen_data
__journal_remove_journal_head: freeing b_frozen_data
__journal_remove_journal_head: freeing b_committed_data

This happens when I run e2fsck:

Group 2187's inode table at 71663664 conflicts with some other fs block.
Relocate<y>? yes
Group 2187's inode table at 71663665 conflicts with some other fs block.
Relocate<y>? yes
............. lots of these ...............
Error allocating 512 contiguous block(s) in block group 2187 for inode table: Could not allocate block in ext2 filesystem
Error allocating 512 contiguous block(s) in block group 3125 for inode table: Could not allocate block in ext2 filesystem
Restarting e2fsck from the beginning...

e2fsck then restarts over and over again.
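One commonly suggested way to break an e2fsck restart loop like this (just a sketch, assuming a 4 KB block size; whether it helps depends on how widespread the corruption is) is to run the check against a backup superblock:

    # list the backup superblock locations without writing anything (-n = dry run)
    mke2fs -n -b 4096 /dev/dm-0
    # check using a backup superblock instead of the primary one
    e2fsck -f -b 32768 -B 4096 /dev/dm-0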
Just as a point of reference, I am aware of many hundreds of ext3 filesystems over 1TB, up to 2TB, with 2.4.21 RHEL3, 2.6.9 RHEL4, and 2.6.5 SLES9, and at least some with vanilla 2.6.12.6, without any problems. All of these systems use external hardware RAID devices, which eliminates a lot of potential sources of problems (LVM/dm, MD), and usually have the same SCSI controller (qla2300). I suspect that if the problem is intermittent (is comment #7 intermittent?) it may be hardware related. There have been intermittent reports of problems using ext3 filesystems over 2TB (some mentioning that 64-bit hosts are OK), even though RHEL 4 claims support up to 8TB and I can't find any caveats listed for this. In most of the > 2TB problem reports the superblock, group descriptors, and large portions of the start of the filesystem are completely corrupted and overwritten. In the next few months I should be getting unrestricted access to a system with 24TB of storage that I can reformat and data-check as desired.
Just a quick thought: have you run memtest on these machines over a weekend?
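If booting memtest86+ isn't convenient, a userspace alternative is to leave memtester running for a long stretch (an illustrative invocation, not something from this report; adjust the size to the machine):

    # lock and repeatedly test 2 GB of RAM for 20 passes
    memtester 2048M 20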
I believe the root cause of this bug is the lack of CONFIG_LBD in some custom-built kernels. Without CONFIG_LBD, a 32-bit platform silently truncates block numbers beyond 2TB (even for a partition smaller than 2TB that sits on a device larger than 2TB). This was fixed in 2.6.17.
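A quick way to check for this on a running 32-bit system (a sketch only; /proc/config.gz exists only if the kernel was built with CONFIG_IKCONFIG_PROC, the /boot path depends on the distribution, and /dev/sdb stands in for whatever the affected device is):

    # check the config of the running kernel, if exported
    zgrep CONFIG_LBD /proc/config.gz
    # or look at the installed config file
    grep CONFIG_LBD /boot/config-$(uname -r)
    # size of the device as the kernel actually sees it, in bytes
    blockdev --getsize64 /dev/sdb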
Created attachment 12239 [details] copy of errors from the /var/log/messages file
Sorry, my comments when I posted the file seem to have been dropped. This bug seems to have re-appeared.

I have a 3TB RAID array running on an Areca RAID controller. Previously the controller was running the array as a 1.8TB partition and a 1.2TB partition; the 1.8TB partition had to be expanded to the full 3TB (less overhead). I migrated all the data off the array (which had been working fine for 8 months) and recreated it as follows:

parted /dev/sdb
1. rm 1 (removed the remaining old 1.8TB partition)
2. unit GB
3. mklabel - gpt
4. mkpartfs - primary - ext2 - 0 - 3000
5. q

The partition was created without problems and was converted to ext3 using tune2fs -j /dev/sdb1, again with no problems reported. As soon as I started to migrate the data back, the messages log started to fill with the messages shown in the earlier attachment. The partition has already gone into read-only mode once, and although it appears stable right now (even though the log keeps filling with the error messages), I'm not happy moving 2.2TB of data back onto the array.

I'm currently running 2.6.21-1.3228-fc7 x86_64. Any ideas?
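Not an answer, but a first thing worth verifying (a sketch; /dev/sdb1 is taken from the steps above, and nothing here writes to the disk): check that the partition size the kernel reports and the block count recorded in the ext3 superblock actually agree.

    # partition size as the kernel sees it, in bytes
    blockdev --getsize64 /dev/sdb1
    # block count and block size recorded in the superblock
    dumpe2fs -h /dev/sdb1 | grep -i 'block'
    # block count * block size should not exceed the partition size reported above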