Bug 9554
| Summary: | write barriers over device mapper are not supported | | |
|---|---|---|---|
| Product: | IO/Storage | Reporter: | Martin Steigerwald (Martin) |
| Component: | LVM2/DM | Assignee: | Alasdair G Kergon (agk) |
| Status: | CLOSED CODE_FIX | | |
| Severity: | normal | CC: | agk, axboe, bernie, diegocg, elliot.li.tech, knikanth, marcus, mishu, pedrib |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.27 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Description
Martin Steigerwald
2007-12-12 13:59:20 UTC
Any update on that?

Well, I didn't use LVM for valuable data on the new hard disk for my IBM ThinkPad T42, so that I can safely take advantage of the hard disk's write cache, but I still think write barriers over device mapper should be supported. For a customer we disabled the write caches of the drives on every server without an NVRAM-backed hard disk controller. It doesn't *seem* to harm performance that much with ext3 and XFS, but it's still not optimal in my eyes.

It's a general Linux problem. Barriers will also fail on md devices (Linux software RAID). One also can't use barriers with encrypted devices (LUKS / cryptsetup - it uses dm). Too bad md / device mapper are not integrated with filesystems any better. So this bug is relevant to more kernel subsystems.

Yes, Thomasz, I know. Basically, barriers do not work in many interesting places in the kernel. AFAIK md is also a dm application.

I think we may need to add a comment to the document, encouraging people to turn off the write cache, at least on their laptops. People often encrypt their laptops' HD. And laptops are most likely to hit power loss or an unclean shutdown, and thus are most likely to suffer from this.

> encouraging people to turn off write-cache,

If the disk has no NCQ support, this results in a 50% performance degradation.

> at least on their laptops.

The disk write cache may be dropped only after a hard power failure (cutting the power cable from your box), or by an explicit reset command sent to the device. Pushing the "Reset" button, a kernel panic or a BUG_ON does not affect the disk write cache, and it will be flushed later without any pain. Accidental power failure is almost impossible on a laptop because it has a battery. So this issue affects only big boxes without a UPS.

I beg to differ in two ways:

1) When I disabled the write cache back when I had the XFS corruption due to the changes to the block layer and write barriers being disabled by default in 2.6.16, I felt no performance degradation at all.
So I am not that sure about a 50% performance loss due to switching off the hard drive write cache. However, AFAIR I did not measure it with benchmarks; this has just been my subjective impression - which counts BTW because I am the one working with my laptop ;-). The page cache was still active nonetheless, and it might be filesystem dependent. With XFS I hardly felt a difference.

2) I had my older laptop - a ThinkPad T23 - *without* a battery in it, because it could not be programmed to only charge the battery when it's 90% or 70% full, as the new one - a T42 - can. As I read that each charge cycle counts as such even when it is just charging 1% of the capacity, I thought removing the battery would be better. But also apart from that, back then I had hard crashes where the only thing I knew I could still do was to forcefully power off the machine. So maybe there will not be a disk cache drop on a hard crash, but somehow I had to get back to a working system again - and I didn't feel like using some kernel debugger and serial console or sysrq magic back then.

I am pretty sure I remember that the XFS partition with the root filesystem was damaged more than once[1]. And that the issue was worked around for me by *disabling* the write cache back then. And that it didn't happen anymore with 2.6.17 even when the write cache was on *while* barriers were enabled. Granted, I did not try without write barriers once again... but I also won't ever do it again as long as I do not have strong evidence that I am not putting my data at risk.

So at least for me that issue affected a laptop. Three times. On one of them only a complete restore of the partition helped, since the damage was too great.

So as long as I cannot be convinced that I have seen something completely different from what I thought I saw, I consider the lack of write barrier support in a kernel subsystem, without a BIG FAT WARNING in its documentation or, even better, on mounting the filesystem, a bug.
Maybe it would also do if the filesystem's warning were more visible; not everyone looks at dmesg after successfully mounting a filesystem.

And it also affects servers. We disabled the write cache on all the servers of a big web server cluster that did not have controller-side battery-backed write caches. Including an external RAID array. Up to now we luckily have had no performance issues (see point 1). And luckily not a single filesystem crash since it went into production use, even though power was once lost in both data centers the cluster is redundantly housed in.

Well, in the end I would really love to read anything about the issue I brought up in this bug report from a device mapper / block layer kernel developer. Are you one? So far it appears to me that this bug has been completely ignored by kernel developers, except for Andrew Morton, who added a CC. Well, I can live with that, as long as I am happy to use my laptop without LVM and as long as the servers we run are fast enough without write caches or have battery-backed RAM.

[1] http://bugzilla.kernel.org/show_bug.cgi?id=6380 - the links to the mails on the linux-xfs mailing list are broken, however, and I don't want to invest the time to find out what they were changed to.

Well, a reset button on my ThinkPad could also solve the issue for me, as I understand it? I have not found one yet. AFAIK I can only power it off in case of a hard crash / kernel BUG.

And big boxes with commodity hardware (without battery-backed RAM) are still affected as well. And Linux has always been good at running on those.

Revisiting this bug... (lack of barrier support in the device mapper)

Ext4, which got released a few days ago, enables barriers by default, so this message and problem will be seen by many people as soon as distros like Fedora start shipping Ext4 as the default FS.

For DM devices within a single device, barriers are apparently supported as of commit ab4c1424882be9cd70b89abf2b484add355712fa.
(In reply to comment #9)
> For DM devices within a single device, barriers are apparently supported as of
> commit ab4c1424882be9cd70b89abf2b484add355712fa.

Does this include VGs created on top of an md raid5 volume?

The problem still occurs with Debian squeeze/testing and vanilla kernel 2.6.29.4.

I have a LUKS partition which displays the following warning on boot:

[ 23.490430] Filesystem "dm-0": Disabling barriers, trial barrier write failed
[ 23.520985] XFS mounting filesystem dm-0
[ 23.905776] Ending clean XFS mount for filesystem: dm-0

I'm not sure what you mean by "DM devices within a single device", but this is a logical partition on a single laptop hard disk.

As of 2.6.31-rc1, write barriers are supported by most device-mapper targets. (Just dm-raid1 and dm-mpath still need finishing.)

Barriers should be fully supported by dm from 2.6.33-rc1 onwards.

Thanks a lot to all involved developers and testers! Fine to see that this finally got resolved.
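
Since the thread notes that not everyone reads dmesg after a successful mount, here is a minimal sketch of how the XFS warning quoted in this report could be picked out of saved kernel log text. It is an illustration only: the function name `devices_without_barriers`, the constant names and the regular expression are assumptions made for this example, and the pattern matches only the exact XFS wording quoted in this report. Other filesystems and kernel versions phrase the message differently.

```python
import re

# Assumption: this pattern mirrors the XFS message quoted in this bug report.
# Other filesystems or kernel versions may word the warning differently.
BARRIER_WARNING = re.compile(
    r'Filesystem "(?P<device>[^"]+)": Disabling barriers, '
    r"trial barrier write failed"
)


def devices_without_barriers(log_text):
    """Return device names for which the log reports disabled barriers.

    Each device is listed once, in order of first appearance.
    """
    devices = []
    for line in log_text.splitlines():
        match = BARRIER_WARNING.search(line)
        if match and match.group("device") not in devices:
            devices.append(match.group("device"))
    return devices


# Sample input copied from the boot log quoted in this report.
SAMPLE_LOG = """\
[ 23.490430] Filesystem "dm-0": Disabling barriers, trial barrier write failed
[ 23.520985] XFS mounting filesystem dm-0
[ 23.905776] Ending clean XFS mount for filesystem: dm-0
"""

if __name__ == "__main__":
    for device in devices_without_barriers(SAMPLE_LOG):
        print(f"barriers disabled on {device}")
```

To use it on a real system, the text of the kernel log (for example, saved dmesg output) would be passed in place of `SAMPLE_LOG`. The sketch only reports what the log says; it does not check whether the drive's write cache is actually enabled.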