Bug 12551 - end_request: I/O error, dev cciss/c0d0, sector 87435720
Summary: end_request: I/O error, dev cciss/c0d0, sector 87435720
Status: CLOSED DUPLICATE of bug 12497
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: All Linux
Importance: P1 normal
Assignee: Mike Miller
URL:
Keywords:
Depends on:
Blocks: 12398
Reported: 2009-01-27 06:51 UTC by Ralf Hildebrandt
Modified: 2009-03-03 11:43 UTC

See Also:
Kernel Version: 2.6.29-rc2-git3
Subsystem:
Regression: Yes
Bisected commit-id:


Description Ralf Hildebrandt 2009-01-27 06:51:56 UTC
Latest working kernel version: 2.6.28.1
Earliest failing kernel version: 2.6.29-rc2-git3
Distribution: Debian/testing
Hardware Environment:
Software Environment:
Problem Description:

Steps to reproduce:

With 2.6.29-rc2-git3, I suddenly get:

[   73.306279] ip_tables: (C) 2000-2006 Netfilter Core Team
[   80.047614] end_request: I/O error, dev cciss/c0d0, sector 52257960
[   80.047689] JBD: barrier-based sync failed on cciss!c0d0p8:8 - disabling barriers
[   80.089865] end_request: I/O error, dev cciss/c0d0, sector 87435720
[   80.089931] JBD: barrier-based sync failed on cciss!c0d0p9:8 - disabling barriers
[   80.141581] end_request: I/O error, dev cciss/c0d0, sector 105586480
[   80.141643] JBD: barrier-based sync failed on cciss!c0d0p10:8 - disabling barriers

With 2.6.28.1 I didn't get those.
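For anyone triaging similar logs: each I/O error above is immediately followed by a JBD barrier failure on a partition of the same device. The following is a minimal sketch, not part of the original report, of how that pairing could be extracted from a saved dmesg excerpt. The function name `pair_barrier_failures` and the regular expressions are illustrative assumptions based only on the message formats quoted here.

```python
import re

# Patterns for the two message types quoted in this report (assumed formats).
IO_ERROR = re.compile(
    r"end_request: I/O error, dev (?P<dev>[^,\s]+), sector (?P<sector>\d+)"
)
BARRIER_FAIL = re.compile(
    r"JBD: barrier-based sync failed on (?P<part>\S+) - disabling barriers"
)


def pair_barrier_failures(lines):
    """Pair each I/O error with the barrier failure that follows it."""
    pairs = []
    pending = None  # most recent I/O error not yet matched
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            pending = (match.group("dev"), int(match.group("sector")))
            continue
        match = BARRIER_FAIL.search(line)
        if match and pending is not None:
            # The log writes '!' where the device path has '/'.
            part = match.group("part").replace("!", "/")
            pairs.append((pending[0], pending[1], part))
            pending = None
    return pairs


SAMPLE = """\
[   73.306279] ip_tables: (C) 2000-2006 Netfilter Core Team
[   80.047614] end_request: I/O error, dev cciss/c0d0, sector 52257960
[   80.047689] JBD: barrier-based sync failed on cciss!c0d0p8:8 - disabling barriers
[   80.089865] end_request: I/O error, dev cciss/c0d0, sector 87435720
[   80.089931] JBD: barrier-based sync failed on cciss!c0d0p9:8 - disabling barriers
[   80.141581] end_request: I/O error, dev cciss/c0d0, sector 105586480
[   80.141643] JBD: barrier-based sync failed on cciss!c0d0p10:8 - disabling barriers
""".splitlines()

for dev, sector, part in pair_barrier_failures(SAMPLE):
    print(f"{dev} sector {sector} -> {part}")
```

If every I/O error in a log pairs up this way, the errors are more likely rejected barrier requests than media failures, which is the reading suggested in the comments that follow.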
Comment 1 Ralf Hildebrandt 2009-01-27 06:52:34 UTC
[    5.928545] EXT4-fs: barriers enabled
[    5.937105] kjournald2 starting: pid 703, dev cciss!c0d0p8:8, commit interval 5 seconds
[    5.946272] EXT4 FS on cciss!c0d0p8, internal journal on cciss!c0d0p8:8
[    5.946331] EXT4-fs: delayed allocation enabled
[    5.946392] EXT4-fs: file extents enabled
[    5.946592] EXT4-fs: mballoc enabled
[    5.946646] EXT4-fs: mounted filesystem cciss!c0d0p8 with ordered data mode
[    5.976729] EXT4-fs: barriers enabled
[    5.985294] kjournald2 starting: pid 704, dev cciss!c0d0p9:8, commit interval 5 seconds
[    5.989773] EXT4 FS on cciss!c0d0p9, internal journal on cciss!c0d0p9:8
[    5.989833] EXT4-fs: delayed allocation enabled
[    5.989889] EXT4-fs: file extents enabled
[    5.990093] EXT4-fs: mballoc enabled
[    5.990148] EXT4-fs: mounted filesystem cciss!c0d0p9 with ordered data mode
[    6.011721] EXT4-fs: barriers enabled
[    6.012150] kjournald2 starting: pid 705, dev cciss!c0d0p10:8, commit interval 5 seconds
[    6.020273] EXT4 FS on cciss!c0d0p10, internal journal on cciss!c0d0p10:8
[    6.020332] EXT4-fs: delayed allocation enabled
[    6.020387] EXT4-fs: file extents enabled
[    6.020604] EXT4-fs: mballoc enabled
[    6.020659] EXT4-fs: mounted filesystem cciss!c0d0p10 with ordered data mode
Comment 2 Theodore Tso 2009-01-27 07:42:47 UTC
Looks like the cciss driver is returning an error when we try to do a write with barriers enabled.   If we get a failure return from the device driver, we fall back to writing the commit block w/o barriers, and that is apparently succeeding.   So this looks like a cciss issue.

I've added Mike Miller, the maintainer of the cciss driver, to the cc list.  Mike, does this ring any bells; have there been any changes between 2.6.28.1 and 2.6.29-rc2-git3 that might account for this?
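The fallback described in this comment (try the commit block with a barrier, and on a driver error disable barriers and rewrite it without one) can be modelled roughly as below. This is a toy simulation, not the JBD code: the classes `NoBarrierDevice` and `Journal` are hypothetical, and the use of EOPNOTSUPP as the "not supported" error is an assumption made for the sketch.

```python
import errno


class NoBarrierDevice:
    """Hypothetical stand-in for a driver that rejects barrier writes."""

    def __init__(self):
        self.written = []

    def write(self, block, barrier=False):
        if barrier:
            # Assumption: the driver reports "operation not supported".
            raise OSError(errno.EOPNOTSUPP, "barrier writes not supported")
        self.written.append(block)


class Journal:
    """Toy journal that falls back to non-barrier commit writes."""

    def __init__(self, device, barriers=True):
        self.device = device
        self.barriers = barriers

    def write_commit_block(self, block):
        if self.barriers:
            try:
                self.device.write(block, barrier=True)
                return
            except OSError as exc:
                if exc.errno != errno.EOPNOTSUPP:
                    raise  # a real I/O failure, not a missing feature
                print("barrier-based sync failed - disabling barriers")
                self.barriers = False
        # Fallback path: rewrite the commit block without a barrier.
        self.device.write(block, barrier=False)


device = NoBarrierDevice()
journal = Journal(device)
journal.write_commit_block("commit-1")  # fails once, then falls back
journal.write_commit_block("commit-2")  # barriers already disabled
```

In this model the failure is reported only once per journal, which is consistent with the log above showing a single error per partition.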
Comment 3 Mike Miller 2009-01-27 08:51:36 UTC
cciss does not support write barriers at this time. Seems that someone in the community requested updated specs so they might implement the support, but I can't find that particular mail.
I'll have to look at how other drivers implement write barriers.
Comment 4 Theodore Tso 2009-01-27 09:05:27 UTC
Mike, thanks.   So was cciss silently ignoring the barrier request in 2.6.28?
Comment 5 Mike Miller 2009-01-27 09:09:27 UTC
Yes, that's correct. We've always ignored the request but you had to be looking for the failure in the kernel logs. I don't know of any problems related to the lack of support.
Comment 6 Mike Miller 2009-01-27 09:16:15 UTC
Also, I just noticed (DUH!) that this is EXT4. In the past I've seen the failure on reiserfs. I don't recall offhand the I/O error, however.
Comment 7 Mike Miller 2009-01-27 13:21:20 UTC
I have a couple of questions about write barriers. It seems that write barriers are used to ensure the proper ordering of data being written to disk from within the drive's write cache, not the controller cache. Is this accurate?
If this is correct then there may be no need for the support on Smart Array controllers. All SCSI and SAS disks shipped by HP have the drive write cache disabled and we do not provide a mechanism to enable that cache. NOTE: Some SATA configurations do allow the drive write cache to be enabled.
When using the Battery Backed Write Cache (BBWC) on the controller there is no way to flush the data for a particular logical volume. It's all or nothing. If the user does not have the BBWC then all data is written directly to each disk in the logical volume.
Given this information does it make sense to implement write barriers for cciss?
Comment 8 Mike Miller 2009-01-27 13:24:21 UTC
Something I should have mentioned: we do plan to investigate the I/O error and correct the problem. But since we've never supported write barriers, is it possible that something in EXT4 changed and is now producing the error? I've been negligent and haven't looked at EXT4. :(
Comment 9 Eric Sandeen 2009-01-30 08:24:03 UTC
ext4 is now defaulting to barriers on if the storage supports it; ext3 can be mounted with barriers, but it's not the default, so the error would be pretty uncommon with ext3.  That may be the difference.  FWIW xfs has had barriers on by default for a long time, so you likely would have seen a similar message if xfs were used.
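As a rough illustration of the defaults described in this comment, the sketch below decides whether a filesystem would request barriers given its type and mount options. The defaults table simply restates the comment (ext4 and xfs on, ext3 off). The function name `barriers_requested` is hypothetical, and treating `barrier`, `barrier=1`, `nobarrier` and `barrier=0` uniformly across all three filesystems is a simplifying assumption.

```python
# Defaults as described in the comment above (assumed for kernels of that era).
DEFAULT_BARRIERS = {"ext4": True, "xfs": True, "ext3": False}


def barriers_requested(fstype, options=""):
    """Return True if the filesystem would ask the device for barriers."""
    enabled = DEFAULT_BARRIERS[fstype]
    for opt in filter(None, options.split(",")):
        if opt in ("barrier", "barrier=1"):
            enabled = True
        elif opt in ("nobarrier", "barrier=0"):
            enabled = False
    return enabled


for fstype, opts in [("ext4", "rw"), ("ext3", "rw"), ("ext3", "rw,barrier=1")]:
    print(fstype, opts, barriers_requested(fstype, opts))
```

Under these assumptions an ext3 mount only reaches the failing barrier path when barriers are requested explicitly, which would explain why the message is rarely seen there.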
Comment 10 Ralf Hildebrandt 2009-01-31 06:30:11 UTC
It was ext4 before, it's ext4 now, so that's not it :)
Comment 11 Mike Miller 2009-02-02 09:08:48 UTC
Here are the results in my lab using 2.6.29-rc2:

cciss2: <0x3230> at PCI 0000:02:00.0 IRQ 80 using DAC
      blocks= 213196320 block_size= 512
      heads=255, sectors=32, cylinders=26127

      blocks= 213196320 block_size= 512
      heads=255, sectors=32, cylinders=26127

 cciss/c2d0: unknown partition table
EXT4-fs: barriers enabled
kjournald2 starting: pid 15279, dev cciss!c2d0p1:8, commit interval 5 seconds
EXT4 FS on cciss!c2d0p1, internal journal on cciss!c2d0p1:8
EXT4-fs: delayed allocation enabled
EXT4-fs: file extents enabled
EXT4-fs: mballoc enabled
EXT4-fs: mounted filesystem cciss!c2d0p1 with ordered data mode

[root@testmonkey e2fsprogs-1.41.4]# mount
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/cciss/c2d0p1 on /test type ext4 (rw)

I used e2fsprogs-1.41.4. Comments?
Comment 12 Mike Miller 2009-02-02 10:54:44 UTC
I was writing to the ext4 filesystem using this:

time dd if=/dev/sda of=/test/sda.file

And when I hit the end of it I got:

EXT4-fs: mounted filesystem cciss!c2d0p1 with ordered data mode
end_request: I/O error, dev cciss/c2d0, sector 105119855
JBD: barrier-based sync failed on cciss!c2d0p1:8 - disabling barriers
dd used greatest stack depth: 1108 bytes left

The write just sat there until I killed it. 
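As a sanity check on the numbers in the two lab logs above, the arithmetic below confirms that the reported geometry matches the block count and that the failing sector lies inside the logical volume. This is only a sketch: it assumes the sector number in the end_request message counts 512-byte units from the start of cciss/c2d0.

```python
# Values copied from the lab logs in Comments 11 and 12.
BLOCKS = 213196320
BLOCK_SIZE = 512
HEADS, SECTORS, CYLINDERS = 255, 32, 26127
FAILING_SECTOR = 105119855  # assumed to be in 512-byte units

capacity_bytes = BLOCKS * BLOCK_SIZE
chs_blocks = HEADS * SECTORS * CYLINDERS
offset_bytes = FAILING_SECTOR * BLOCK_SIZE

print(f"capacity: {capacity_bytes} bytes ({capacity_bytes / 2**30:.2f} GiB)")
print(f"geometry matches block count: {chs_blocks == BLOCKS}")
print(f"failing sector offset: {offset_bytes} bytes "
      f"({FAILING_SECTOR / BLOCKS:.1%} into the volume)")
```

The failing sector is well within the volume, so the error is not an out-of-range access.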
Comment 13 Ralf Hildebrandt 2009-02-04 05:00:21 UTC
Still there in 2.6.29-rc3
Comment 14 Jan Kara 2009-03-03 11:33:20 UTC
JFYI, this seems to be the same problem as is handled in 12497.
Comment 15 Rafael J. Wysocki 2009-03-03 11:43:25 UTC

*** This bug has been marked as a duplicate of bug 12497 ***
