Bug 9554

Summary: write barriers over device mapper are not supported
Product: IO/Storage
Component: LVM2/DM
Assignee: Alasdair G Kergon (agk)
Reporter: Martin Steigerwald (Martin)
Status: CLOSED CODE_FIX
Severity: normal
Priority: P1
CC: agk, axboe, bernie, diegocg, elliot.li.tech, knikanth, marcus, mishu, pedrib
Hardware: All
OS: Linux
Kernel Version: 2.6.27
Regression: No

Description Martin Steigerwald 2007-12-12 13:59:20 UTC
Most recent kernel where this bug did not occur: -
Distribution: Debian Etch/Lenny/Sid/Experimental
Hardware Environment: IBM ThinkPad T42 with internal hard disk

Software Environment: 
martin@shambala> apt-show-versions | egrep "(xfsprogs|dmsetup|lvm2)"
lvm2/sid uptodate 2.02.26-1+b1
xfsprogs/sid uptodate 2.9.4-2
dmsetup/sid uptodate 2:1.02.20-2

The internal hard disk is accessed via libata (PATA).

Problem Description:
Write barriers over device mapper are not supported, which means that write barriers over LVM2 are also not supported.

Looking at dm_request(), as Eric Sandeen did in a post on the XFS mailing list, suggests that there are technical reasons for the lack of write barrier support [1]:

martin@shambala> grep -A 9 -m1 "dm_request" linux-2.6.23/drivers/md/dm.c
static int dm_request(struct request_queue *q, struct bio *bio)
{
        int r;
        int rw = bio_data_dir(bio);
        struct mapped_device *md = q->queuedata;

        /*
         * There is no use in forwarding any barrier request since we can't
         * guarantee it is (or can be) handled by the targets correctly.
         */
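        /*
         * (Continuation beyond what "grep -A 9" shows above, reconstructed
         * from memory of the 2.6.23 source, so the exact bio_endio()
         * arguments may differ: dm completes any barrier bio immediately
         * with -EOPNOTSUPP, which is exactly the error that makes the
         * filesystems in the reproduction steps below print their
         * "disabling barriers" messages.)
         */
        if (unlikely(bio_barrier(bio))) {
                bio_endio(bio, bio->bi_size, -EOPNOTSUPP);
                return 0;
        }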

From a user's point of view I would like write barrier support in device mapper, and thus in LVM2, regardless of any technical difficulties. I fail to see the point of write barriers when half the kernel doesn't support them.

Using write barriers helped a lot with the stability of XFS across unclean filesystem shutdowns. And even with ext3, which seems less vulnerable to out-of-order IO requests - indeed with any journalling filesystem - write barriers should help stability on commodity hardware.

Is write barrier support for device mapper, at least with LVM2, planned? IMHO there should be a way for dm targets to report whether they can handle barrier requests.

[1] http://oss.sgi.com/archives/xfs/2007-12/msg00080.html


Steps to reproduce:
Mounting an ext3 filesystem from a logical volume (LVM2) with -o barrier=1 and touching a file gives:
JBD: barrier-based sync failed on dm-0 - disabling barriers

Mounting an XFS filesystem from a logical volume gives:
Filesystem "dm-0": Disabling barriers, not supported by the underlying device

Mounting a reiserfs filesystem from a logical volume gives:
reiserfs: using flush barriers
[...]
reiserfs: disabling flush barriers on dm-2
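
For the ext3 case, the commands amount to something like this (volume group, logical volume and mount point names are only examples):

martin@shambala> sudo mount -o barrier=1 /dev/mapper/vg0-test /mnt/test
martin@shambala> sudo touch /mnt/test/file && sync
martin@shambala> dmesg | tail -n 1
JBD: barrier-based sync failed on dm-0 - disabling barriers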
Comment 1 Martin Steigerwald 2008-01-07 13:51:01 UTC
Any update on that?

Well, I didn't use LVM for valuable data on the new hard disk in my IBM ThinkPad T42, so that I could safely take advantage of the disk's write cache, but I still think write barriers over device mapper should be supported.

For a customer we disabled the drives' write caches on every server without an NVRAM-backed disk controller. It doesn't *seem* to hurt performance much with ext3 and XFS, but it's still not optimal in my eyes.
Comment 2 Tomasz Chmielewski 2008-01-18 06:52:37 UTC
It's a general Linux problem. Barriers will also fail on md devices (Linux software RAID).

One also can't use barriers with encrypted devices (LUKS/cryptsetup, which uses dm).

Too bad md and device mapper are not better integrated with filesystems.

So this bug is relevant to more kernel subsystems.
Comment 3 Martin Steigerwald 2008-02-27 14:03:50 UTC
Yes, Tomasz, I know. Basically, barriers do not work in many interesting places in the kernel; AFAIK md is affected in the same way as dm.
Comment 4 Yan Li 2008-05-31 00:10:59 UTC
I think we may need to add a note to the documentation encouraging people to turn off the write cache, at least on their laptops. People often encrypt their laptops' hard disks, and laptops are the most likely machines to hit power loss or an unclean shutdown, and thus the most likely to suffer from this.
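
For an (S)ATA disk that would be something along these lines - the device name is only an example, and on many drives the setting does not survive a power cycle, so it would have to go into a boot script:

# hdparm -W 0 /dev/sda    (switch the drive's write cache off)
# hdparm -W /dev/sda      (query the current write-caching setting)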
Comment 5 Dmitry Monakhov 2008-06-01 07:48:18 UTC
> encouraging people to turn off the write cache,
If the disk has no NCQ support, this results in about a 50% performance degradation.
> at least on their laptops.
A drop of the disk's write cache can happen only after a hard power failure (cutting the power cable to the box) or through an explicit reset command sent to the device. Pushing the reset button, a kernel panic, or a BUG_ON does not affect the disk's write cache; it will be flushed later without any pain.
Accidental power failure is almost impossible on a laptop because it has a battery. So this issue affects only big boxes without a UPS.
Comment 6 Martin Steigerwald 2008-06-01 12:34:48 UTC
I beg to differ in two ways:

1) When I disabled the write cache back then, after the XFS corruption caused by the block layer changes and write barriers being disabled by default in 2.6.16, I felt no performance degradation at all. So I am not so sure about a 50% performance loss from switching off the hard drive's write cache. AFAIR I did not measure it with benchmarks; this was just my subjective impression - which counts, BTW, since I am the one working with my laptop ;-). The page cache was still active nonetheless, and it might be filesystem dependent. With XFS I hardly felt a difference.

2) I ran my older laptop - a ThinkPad T23 - *without* its battery, because it could not be programmed to charge only once the battery has dropped to 90% or 70%, as the newer T42 can. Since I read that each charge cycle counts as such even when it merely tops up 1% of the capacity, I thought removing the battery would be better. But apart from that, back then I had hard crashes where the only thing I knew I could still do was to forcefully power off the machine. So maybe there was no disk cache drop on the hard crash itself, but somehow I had to get back to a working system - and I didn't feel like using a kernel debugger with a serial console or sysrq magic back then. I am pretty sure I remember that the XFS partition with the root filesystem was damaged more than once [1]. And that the issue was worked around for me by *disabling* the write cache back then. And that it didn't happen anymore with 2.6.17, even with the write cache on *while* barriers were enabled. Granted, I did not try without write barriers again... but I also won't, ever, as long as I do not have strong evidence that I am not putting my data at risk.

So at least for me this issue affected a laptop. Three times. In one case only a complete restore of the partition helped, since the damage was too extensive. As long as I cannot be convinced that I saw something completely different from what I thought I saw, I consider the lack of write barrier support in a kernel subsystem - without a BIG FAT WARNING in its documentation, or better yet on mounting the filesystem - a bug. It might also suffice if the filesystem's warning were more visible; not everyone looks at dmesg after successfully mounting a filesystem.

And it also affects servers. We disabled the write cache on all the servers of a big web server cluster that did not sport controller-side battery-backed write caches, including an external RAID array. Up to now we have luckily had no performance issues (see point 1), and luckily not a single filesystem crash since the cluster went into production, even though power was once lost in both data centers it redundantly resides in.

In the end I would really love to read anything from a device mapper / block layer kernel developer about the issue I brought up in this bug report. Are you one? So far it appears to me that this bug has been completely ignored by kernel developers, except for Andrew Morton, who added a CC. Well, I can live with that, as long as I am happy to use my laptop without LVM and as long as the servers we run are fast enough without write caches or have battery-backed RAM.

[1] http://bugzilla.kernel.org/show_bug.cgi?id=6380 - however, the links to the mails on the linux-xfs mailing list there are broken, and I don't want to invest the time to search for where they moved.
Comment 7 Martin Steigerwald 2008-06-01 12:38:22 UTC
As I understand it, a reset button on my ThinkPad could also solve the issue for me? I have not found one yet. AFAIK I can only power it off in case of a hard crash / kernel BUG. And big boxes with commodity hardware (without battery-backed RAM) are still affected as well - and Linux has always run well on those.
Comment 8 Diego Calleja 2008-12-26 15:05:02 UTC
Revisiting this bug... (lack of barrier support in the device mapper)

Ext4, which was released a few days ago, enables barriers by default, so this message and problem will be seen by many people as soon as distros like Fedora start shipping Ext4 as the default FS.
Comment 9 Diego Calleja 2009-01-06 08:44:27 UTC
For DM devices within a single device, barriers are apparently supported as of commit ab4c1424882be9cd70b89abf2b484add355712fa.
Comment 10 Bernie Innocenti 2009-02-28 12:54:30 UTC
(In reply to comment #9)
> For DM devices within a single device, barriers are apparently supported as of
> commit ab4c1424882be9cd70b89abf2b484add355712fa.

Does this include VGs created on top of an md raid5 volume?
Comment 11 Pedro Ribeiro 2009-06-16 22:55:51 UTC
Problem still occurs with Debian squeeze/testing and vanilla kernel 2.6.29.4

I have a LUKS partition which displays the following warning on boot:
[   23.490430] Filesystem "dm-0": Disabling barriers, trial barrier write failed
[   23.520985] XFS mounting filesystem dm-0
[   23.905776] Ending clean XFS mount for filesystem: dm-0

I'm not sure what you mean by "DM devices within a single device", but this is a logical partition in a single laptop hard disk.
Comment 12 Alasdair G Kergon 2009-07-01 10:47:44 UTC
As of 2.6.31-rc1, write barriers are supported by most device-mapper targets. 

(Just dm-raid1 and dm-mpath still need finishing.)
Comment 13 Alasdair G Kergon 2010-01-13 19:04:40 UTC
Barriers should be fully supported by dm from 2.6.33-rc1 onwards.
Comment 14 Martin Steigerwald 2010-01-14 10:00:55 UTC
Thanks a lot to all involved developers and testers! Good to see that this finally got resolved.