Latest working kernel version: 2.6.28.1
Earliest failing kernel version: 2.6.29-rc2-git3
Distribution: Debian/testing
Hardware Environment:
Software Environment:
Problem Description:
Steps to reproduce:

With 2.6.29-rc2-git3, I suddenly get:

[ 73.306279] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 80.047614] end_request: I/O error, dev cciss/c0d0, sector 52257960
[ 80.047689] JBD: barrier-based sync failed on cciss!c0d0p8:8 - disabling barriers
[ 80.089865] end_request: I/O error, dev cciss/c0d0, sector 87435720
[ 80.089931] JBD: barrier-based sync failed on cciss!c0d0p9:8 - disabling barriers
[ 80.141581] end_request: I/O error, dev cciss/c0d0, sector 105586480
[ 80.141643] JBD: barrier-based sync failed on cciss!c0d0p10:8 - disabling barriers

With 2.6.28.1 I didn't get those.
[ 5.928545] EXT4-fs: barriers enabled
[ 5.937105] kjournald2 starting: pid 703, dev cciss!c0d0p8:8, commit interval 5 seconds
[ 5.946272] EXT4 FS on cciss!c0d0p8, internal journal on cciss!c0d0p8:8
[ 5.946331] EXT4-fs: delayed allocation enabled
[ 5.946392] EXT4-fs: file extents enabled
[ 5.946592] EXT4-fs: mballoc enabled
[ 5.946646] EXT4-fs: mounted filesystem cciss!c0d0p8 with ordered data mode
[ 5.976729] EXT4-fs: barriers enabled
[ 5.985294] kjournald2 starting: pid 704, dev cciss!c0d0p9:8, commit interval 5 seconds
[ 5.989773] EXT4 FS on cciss!c0d0p9, internal journal on cciss!c0d0p9:8
[ 5.989833] EXT4-fs: delayed allocation enabled
[ 5.989889] EXT4-fs: file extents enabled
[ 5.990093] EXT4-fs: mballoc enabled
[ 5.990148] EXT4-fs: mounted filesystem cciss!c0d0p9 with ordered data mode
[ 6.011721] EXT4-fs: barriers enabled
[ 6.012150] kjournald2 starting: pid 705, dev cciss!c0d0p10:8, commit interval 5 seconds
[ 6.020273] EXT4 FS on cciss!c0d0p10, internal journal on cciss!c0d0p10:8
[ 6.020332] EXT4-fs: delayed allocation enabled
[ 6.020387] EXT4-fs: file extents enabled
[ 6.020604] EXT4-fs: mballoc enabled
[ 6.020659] EXT4-fs: mounted filesystem cciss!c0d0p10 with ordered data mode
Looks like the cciss driver is returning an error when we try to do a write with barriers enabled. If we get a failure return from the device driver, we fall back to writing the commit block w/o barriers, and that is apparently succeeding. So this looks like a cciss issue. I've added Mike Miller, the maintainer of the cciss driver, to the cc list. Mike, does this ring any bells? Have there been any changes between 2.6.28.1 and 2.6.29-rc2-git3 that might account for this?
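The fallback described above can be sketched as a small user-space simulation. This is not kernel code: `FakeDevice`, `Journal`, and `commit` are hypothetical names, and the device that rejects barrier writes only stands in for the behaviour seen in the log.

```python
class BarrierNotSupported(Exception):
    """Raised by the simulated device when it rejects a barrier write."""


class FakeDevice:
    """Hypothetical block device; not a model of the real cciss driver."""

    def __init__(self, name, supports_barriers):
        self.name = name
        self.supports_barriers = supports_barriers
        self.writes = []  # (block, used_barrier) for each completed write

    def write(self, block, barrier=False):
        if barrier and not self.supports_barriers:
            raise BarrierNotSupported(self.name)
        self.writes.append((block, barrier))


class Journal:
    """Hypothetical journal that writes commit blocks to a device."""

    def __init__(self, device):
        self.device = device
        self.barriers_enabled = True
        self.log = []

    def commit(self, block):
        if self.barriers_enabled:
            try:
                self.device.write(block, barrier=True)
                return
            except BarrierNotSupported:
                # Same wording as the JBD message quoted in the report.
                self.log.append(
                    "JBD: barrier-based sync failed on %s - disabling barriers"
                    % self.device.name
                )
                self.barriers_enabled = False
        # Fallback path: write the commit block again without a barrier.
        self.device.write(block, barrier=False)


journal = Journal(FakeDevice("cciss!c0d0p8:8", supports_barriers=False))
journal.commit("commit-1")
journal.commit("commit-2")
```

In this sketch the first commit fails once, logs the message, and is retried without a barrier; later commits skip the barrier attempt entirely, which would match a message appearing only once per journal.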
cciss does not support write barriers at this time. Seems that someone in the community requested updated specs so they might implement the support, but I can't find that particular mail. I'll have to look at how other drivers implement write barriers.
Mike, thanks. So was cciss silently ignoring the barrier request in 2.6.28?
Yes, that's correct. We've always ignored the request but you had to be looking for the failure in the kernel logs. I don't know of any problems related to the lack of support.
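Since the failure only shows up in the kernel log, a short script can look for it. This is a minimal sketch, assuming the message format matches the lines quoted at the top of this report; `find_barrier_failures` is a hypothetical helper, and the regular expression is an assumption rather than something taken from the kernel source.

```python
import re

# Assumed message format, copied from the log lines quoted in this report.
BARRIER_FAILURE = re.compile(
    r"JBD: barrier-based sync failed on (?P<dev>\S+) - disabling barriers"
)


def find_barrier_failures(log_text):
    """Return the journal device names that reported a barrier failure."""
    return [m.group("dev") for m in BARRIER_FAILURE.finditer(log_text)]


sample = """\
[ 80.047614] end_request: I/O error, dev cciss/c0d0, sector 52257960
[ 80.047689] JBD: barrier-based sync failed on cciss!c0d0p8:8 - disabling barriers
[ 80.089865] end_request: I/O error, dev cciss/c0d0, sector 87435720
[ 80.089931] JBD: barrier-based sync failed on cciss!c0d0p9:8 - disabling barriers
"""

print(find_barrier_failures(sample))
```

The same function could be given the text output of `dmesg` instead of the sample string.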
Also, I just noticed (DUH!) that this is EXT4. In the past I've seen the failure on reiserfs. I don't recall offhand the I/O error, however.
I have a couple of questions about write barriers. It seems that write barriers are used to ensure the proper ordering of data being written to disk from within the drive's write cache, not the controller cache. Is this accurate? If this is correct, then there may be no need for the support on Smart Array controllers. All SCSI and SAS disks shipped by HP have the drive write cache disabled, and we do not provide a mechanism to enable that cache. NOTE: Some SATA configurations do allow the drive write cache to be enabled. When using the Battery Backed Write Cache (BBWC) on the controller, there is no way to flush the data for a particular logical volume. It's all or nothing. If the user does not have the BBWC, then all data is written directly to each disk in the logical volume. Given this information, does it make sense to implement write barriers for cciss?
Something I should have mentioned: we do plan to investigate the I/O error and correct the problem. But since we've never supported write barriers, is it possible something in EXT4 changed and is now producing the error? I've been negligent and haven't looked at EXT4. :(
ext4 is now defaulting to barriers on if the storage supports it; ext3 can be mounted with barriers, but it's not the default, so the error would be pretty uncommon with ext3. That may be the difference. FWIW xfs has had barriers on by default for a long time, so you likely would have seen a similar message if xfs were used.
It was ext4 before, it's ext4 now, so that's not it :)
Here are the results in my lab using 2.6.29-rc2:

cciss2: <0x3230> at PCI 0000:02:00.0 IRQ 80 using DAC
blocks= 213196320 block_size= 512
heads=255, sectors=32, cylinders=26127
blocks= 213196320 block_size= 512
heads=255, sectors=32, cylinders=26127
cciss/c2d0: unknown partition table
EXT4-fs: barriers enabled
kjournald2 starting: pid 15279, dev cciss!c2d0p1:8, commit interval 5 seconds
EXT4 FS on cciss!c2d0p1, internal journal on cciss!c2d0p1:8
EXT4-fs: delayed allocation enabled
EXT4-fs: file extents enabled
EXT4-fs: mballoc enabled
EXT4-fs: mounted filesystem cciss!c2d0p1 with ordered data mode

[root@testmonkey e2fsprogs-1.41.4]# mount
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/cciss/c2d0p1 on /test type ext4 (rw)

I used e2fsprogs-1.41.4. Comments?
I was writing to the ext4 filesystem using this:

time dd if=/dev/sda of=/test/sda.file

And when I hit the end of it I got:

EXT4-fs: mounted filesystem cciss!c2d0p1 with ordered data mode
end_request: I/O error, dev cciss/c2d0, sector 105119855
JBD: barrier-based sync failed on cciss!c2d0p1:8 - disabling barriers
dd used greatest stack depth: 1108 bytes left

The write just sat there until I killed it.
Still there in 2.6.29-rc3
JFYI, this seems to be the same problem as is handled in 12497.
*** This bug has been marked as a duplicate of bug 12497 ***