Subject : WARNING: at fs/fs-writeback.c:1127 __mark_inode_dirty Submitter : Zdenek Kabelac <zdenek.kabelac@gmail.com> Date : 2010-06-28 9:40 Message-ID : AANLkTin24fr5O4_q5Xbo9Y_NKkEmtcp6Hgmr9_4qXaFz@mail.gmail.com References : http://marc.info/?l=linux-kernel&m=127771804806465&w=2 This entry is being used for tracking a regression from 2.6.34. Please don't close it until the problem is fixed in the mainline.
This is a duplicate of Bug #16122.
*** Bug 16122 has been marked as a duplicate of this bug. ***
I was looking into this. A trivial reproducer is: mknod devzero c 1 5; touch devzero Essentially, the problem is that any device inode can get dirty because of time stamp update or similar reason but some BDIs (in this case the one backing /dev/zero) do not expect dirty inodes. Hmm, I now have to think how to fix this... It would be more natural if device inodes were queued for metadata updates in the queues of "parent" filesystem. But this brings problem that device inode would need to be in two queues - one for metadata update to the parent filesystem and one for data writeout to the device (this is not exactly the case of /dev/zero but if you have a device node for /dev/sdb on /dev/sda, then this is exactly what we would need).
On Friday, July 23, 2010, Larry Finger wrote: > On 07/23/2010 06:42 AM, Rafael J. Wysocki wrote: > > This message contains a list of some regressions from 2.6.34, > > for which there are no fixes in the mainline known to the tracking team. > > If any of them have been fixed already, please let us know. > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16312 > > Subject : WARNING: at fs/fs-writeback.c:1127 __mark_inode_dirty > > Submitter : Zdenek Kabelac<zdenek.kabelac@gmail.com> > > Date : 2010-06-28 9:40 (26 days old) > > I still have this in 2.6.35-rc5.
Created attachment 27255 [details] First patch to fix the issue
Created attachment 27256 [details] Second patch to fix the issue The above two patches fix the issue for me.
Guys, it would be nice if you could test 2.6.35-rc5 (or basically 2.6.35-whatever) with the above to patches and check that they fix the issue for you. Thanks.
Hmm, looking at the original report referenced, my patches need not fix this particular warning. This particular warning rather looks like a race when udisks-part-id manages to read the device (and touch it's atime) before sd_probe_async runs and registers the bdi (via add_disk()). That seems to be possible as device_add() is run before we even schedule to run sd_async_probe() in a separate thread. Jens, any idea how this should be fixed? I believe we should avoid this race because we should better have the device's queue fully set up before allowing IO to it... But I'm far from fully understanding this area...
These patches break my system (the one reported in Bug #16122). It obviously is different. After installing the patches. I get a NULL pointer dereference for address 0x30 as __mark_inode_dirty + 0xf0/0x1a0 (x86_64 system). The code offset 0xf30. I have a 2.6.35-rc6 kernel. The assembler code around that address is /home/finger/linux-temp/fs/fs-writeback.c:960 f28: 4d 8b ac 24 b0 01 00 mov 0x1b0(%r12),%r13 f2f: 00 /home/finger/linux-temp/fs/fs-writeback.c:962 f30: 41 f6 45 30 02 testb $0x2,0x30(%r13) f35: 75 31 jne f68 <__mark_inode_dirty+0x128> f37: 41 f6 45 28 10 testb $0x10,0x28(%r13) f3c: 75 2a jne f68 <__mark_inode_dirty+0x128> The code at line 960 is: struct backing_dev_info *bdi = wb->bdi; if (bdi_cap_writeback_dirty(bdi) && !test_bit(BDI_registered, &bdi->state)) { WARN_ON(1); printk(KERN_ERR "bdi-%s not registered\n", bdi->name); } and wb->bdi is NULL.
The rest of the traceback is: touch_atime+0x12d vfs_readdir+0xbe sys_detdents64+0x80 system_call_fastpath+0x16
Doh, thanks for testing. It looks as if we somehow get to bdi which isn't properly initialized (wb->bdi should always point back to bdi containing the wb structure). Can you try once more with the debugging patch below applied on top of my two patches and report what it writes to the log? Thanks.
Created attachment 27267 [details] Debugging patch to print more BDI information
The log will have to be hand written. The system would not boot, neither a netconsole or serial console is possible, and the pictures I took of the screen are really blurry. The beginning of the dump is: Inode devices at sysfs has uninit bdi Mapping bdi sysfs, caps 131 inode_to_bdi returns bdi noop, caps 0 bdi=noop not registered BUG: unable to handle NULL pointer at offset 0x8 __mark_inode_dirty+0x159/0x240 The above NULL pointer is in the expansion of list_move(&inode->i_list, &wb->b_dirty); at the statement next->prev = new Beyond this point, it generates lots of other errors and eventually hangs. Good luck.
Thanks for the prompt test. Indeed noop_backing_dev_info isn't properly initialized. Attached patch should fix this so this patch plus patches from comments 5 and 6 should at least boot (at least they do for me ;).
Created attachment 27272 [details] Initialize noop bdi properly
With the patches from comments 5, 6 and 15, my system boots cleanly. Good job. As there probably will not be another rc for 2.6.35, this fix is just in time. Thanks.
I forgot one part. If you want, add a "Tested-by: Larry Finger <Larry.Finger@lwfinger.net> to the patches.
Handled-By : Jan Kara <jack@suse.cz> Patch : https://bugzilla.kernel.org/attachment.cgi?id=27272
This one needs to be reopened. The WARNING that I get does not happen every reboot, but now I'm getting 2 when it happens. This kernel includes the 3 patches posted above. [ 12.183797] ------------[ cut here ]------------ [ 12.183808] WARNING: at fs/fs-writeback.c:964 __mark_inode_dirty+0x10f/0x1a0() [ 12.183811] Hardware name: HP Pavilion dv2700 Notebook PC [ 12.183814] Modules linked in: loop(+) dm_mod ide_cd_mod cdrom ide_pci_generic arc4 ecb b43 rng_core mac80211 snd_hda_codec_conexant cfg80211 rfkill led_class amd74xx ssb mmc_core ide_core snd_hda_intel snd_hda_codec snd_pcm snd_timer k8temp pcmcia pcmcia_core forcedeth snd i2c_nforce2 joydev hwmon sg snd_page_alloc serio_raw i2c_core ac battery button ext4 mbcache jbd2 crc16 ohci_hcd ehci_hcd sd_mod usbcore fan processor ahci libahci libata scsi_mod thermal [ 12.183872] Pid: 2444, comm: udisks-part-id Not tainted 2.6.35-rc6-realtek+ #11 [ 12.183875] Call Trace: [ 12.183882] [<ffffffff8104608a>] warn_slowpath_common+0x7a/0xb0 [ 12.183887] [<ffffffff810460d5>] warn_slowpath_null+0x15/0x20 [ 12.183891] [<ffffffff81129d5f>] __mark_inode_dirty+0x10f/0x1a0 [ 12.183895] [<ffffffff8111e07d>] touch_atime+0x12d/0x170 [ 12.183901] [<ffffffff810cab91>] generic_file_aio_read+0x5c1/0x720 [ 12.183906] [<ffffffff81107ca2>] do_sync_read+0xd2/0x110 [ 12.183912] [<ffffffff81077e7d>] ? trace_hardirqs_on+0xd/0x10 [ 12.183916] [<ffffffff811083c3>] vfs_read+0xb3/0x170 [ 12.183921] [<ffffffff81002d1c>] ? sysret_check+0x27/0x62 [ 12.183925] [<ffffffff811084cc>] sys_read+0x4c/0x80 [ 12.183929] [<ffffffff81002ceb>] system_call_fastpath+0x16/0x1b [ 12.183932] ---[ end trace fdce3ec00c26fa3d ]--- [ 12.183935] bdi-block not registered [ 12.205056] ------------[ cut here ]------------ [ 12.205067] WARNING: at fs/fs-writeback.c:964 __mark_inode_dirty+0x10f/0x1a0() [ 12.205070] Hardware name: HP Pavilion dv2700 Notebook PC [ 12.205072] Modules linked in: loop(+) dm_mod ide_cd_mod cdrom ide_pci_generic arc4 ecb b43 rng_core mac80211 snd_hda_codec_conexant cfg80211 rfkill led_class amd74xx ssb mmc_core ide_core snd_hda_intel snd_hda_codec snd_pcm snd_timer k8temp pcmcia pcmcia_core forcedeth snd i2c_nforce2 joydev hwmon sg snd_page_alloc serio_raw i2c_core ac battery button ext4 mbcache jbd2 crc16 ohci_hcd ehci_hcd sd_mod usbcore fan processor ahci libahci libata scsi_mod thermal [ 12.205131] Pid: 2456, comm: udisks-part-id Tainted: G W 2.6.35-rc6-realtek+ #11 [ 12.205134] Call Trace: [ 12.205141] [<ffffffff8104608a>] warn_slowpath_common+0x7a/0xb0 [ 12.205146] [<ffffffff810460d5>] warn_slowpath_null+0x15/0x20 [ 12.205150] [<ffffffff81129d5f>] __mark_inode_dirty+0x10f/0x1a0 [ 12.205155] [<ffffffff8111e07d>] touch_atime+0x12d/0x170 [ 12.205160] [<ffffffff810cab91>] generic_file_aio_read+0x5c1/0x720 [ 12.205165] [<ffffffff81107ca2>] do_sync_read+0xd2/0x110 [ 12.205171] [<ffffffff81077e7d>] ? trace_hardirqs_on+0xd/0x10 [ 12.205175] [<ffffffff811083c3>] vfs_read+0xb3/0x170 [ 12.205180] [<ffffffff81002d1c>] ? sysret_check+0x27/0x62 [ 12.205183] [<ffffffff811084cc>] sys_read+0x4c/0x80 [ 12.205188] [<ffffffff81002ceb>] system_call_fastpath+0x16/0x1b [ 12.205191] ---[ end trace fdce3ec00c26fa3e ]--- [ 12.205193] bdi-block not registered I will put the diagnostic patch from comment 12 to get more info.
The diagnostics patch from #12 did not change anything. I am beginning to think that Bug 16312 is not the same as Bug 16122. Even with the patches from 16312, I still get warnings as below: [ 11.728776] ------------[ cut here ]------------ [ 11.728787] WARNING: at fs/fs-writeback.c:964 __mark_inode_dirty+0x10f/0x1a0() [ 11.728790] Hardware name: HP Pavilion dv2700 Notebook PC [ 11.728792] Modules linked in: loop(+) dm_mod ide_cd_mod cdrom snd_hda_codec_conexant ide_pci_generic arc4 ecb b43 rng_core mac80211 snd_hda_intel r8712u(C) cfg80211 snd_hda_codec amd74xx snd_pcm sg ide_core rfkill led_class snd_timer ssb mmc_core pcmcia snd joydev k8temp hwmon i2c_nforce2 pcmcia_core forcedeth serio_raw snd_page_alloc i2c_core battery ac button ext4 mbcache jbd2 crc16 ohci_hcd sd_mod ehci_hcd usbcore fan processor ahci libahci libata scsi_mod thermal [ 11.728854] Pid: 2449, comm: udisks-part-id Tainted: G C 2.6.35-rc6-realtek+ #15 [ 11.728857] Call Trace: [ 11.728865] [<ffffffff8104608a>] warn_slowpath_common+0x7a/0xb0 [ 11.728869] [<ffffffff810460d5>] warn_slowpath_null+0x15/0x20 [ 11.728874] [<ffffffff81129d5f>] __mark_inode_dirty+0x10f/0x1a0 [ 11.728879] [<ffffffff8111e07d>] touch_atime+0x12d/0x170 [ 11.728885] [<ffffffff810cab91>] generic_file_aio_read+0x5c1/0x720 [ 11.728890] [<ffffffff81107ca2>] do_sync_read+0xd2/0x110 [ 11.728896] [<ffffffff81077e7d>] ? trace_hardirqs_on+0xd/0x10 [ 11.728900] [<ffffffff811083c3>] vfs_read+0xb3/0x170 [ 11.728906] [<ffffffff81002d1c>] ? sysret_check+0x27/0x62 [ 11.728909] [<ffffffff811084cc>] sys_read+0x4c/0x80 [ 11.728914] [<ffffffff81002ceb>] system_call_fastpath+0x16/0x1b [ 11.728917] ---[ end trace 32e16cacad33229f ]--- [ 11.728919] bdi-block not registered The warnings do not occur with every boot and appear to be some kind of race condition.
Yes, I think these are two different bugs just with a similar manifestation. See comment 8 in this bug. I'm just not sure how to fix the second bug in sd.c. I think I'll send a direct email to a few people to catch some attention.
Created attachment 27356 [details] sd-delay-uevent.patch This would make sd announce the sd device after block device is set up, which probably is the right thing to do but it still is wrong to trigger WARN_ON() when a user program accesses a device node which isn't ready yet. Can't that warning be removed by using a default bdi w/ BDI_CAP_NO_ACCT_AND_WRITEBACK set for block device nodes? Thanks.
Thanks for the patch Tejun! Larry, could you please try it? Well, my naive opinion rather is that userspace shouldn't be able to access a device node which isn't fully set up. That could have unexpected pitfalls for the process and the kernel as well, couldn't it? Technically, it's also hard to switch BDI under a living device inode - the BDI pointer could be cached in various places...
I will as soon as I can. The hard drive in the box with that particular problem failed and it is in the shop being repaired. That will take longer than it should as it is in warranty and the work has to be done at a central repair facility. The 20 minute job of replacing the hard drive in that notebook will take 3-4 weeks.
(In reply to comment #23) > Well, my naive opinion rather is that userspace shouldn't be able to access a > device node which isn't fully set up. That could have unexpected pitfalls for > the process and the kernel as well, couldn't it? But device nodes can be created for any device number whether the device exists or not. The patch makes sd announce it later so at least udev doesn't try to access it earlier but unless I'm misunderstanding something userspace still can trigger these warnings easily. Thansk.
I finally got my computer back. Tejun's patch seems to fix the problem. The warning has been intermittent and I will report back if the warning returns later.
The patches here seem not to be mainline yet. I can see some discussion about the patch set of Jan Kara on lkml[1]. But I can't see Tejun's patch either in linus tree nor on lkml. Are these still necessary? [1] References: http://lkml.indiana.edu/hypermail/linux/kernel/1007.3/00884.html
Ah, okay, now I remember this one. Jan, what do you think about comment#25? The patch would prevent udev from triggering the problem automatically but it's still broken. Thanks.
Ignore-Patch : https://bugzilla.kernel.org/attachment.cgi?id=27272
OK, Tejun, I finally read the code. So you are right that the patch doesn't solve the race. The race seems to be in block/genhd.c:add_disk(). Once we call register_disk(), device can be successfully opened (get_gendisk() from blkdev_open() will succeed as soon as disk's kobject exists). But until we call bdi_register_dev(), backing dev isn't properly initialized and thus can see the warning. This also explains why Larry is able to see the warning even with your patch.
Created attachment 30282 [details] Fix race in add_disk() causing warning in __mark_inode_dirty().
This patch should fix the problem. The kernel boots with it so it shouldn't be really wrong. I'll send it to Jens for review and inclusion...
Handled-By : Jan Kara <jack@suse.cz> Patch : https://bugzilla.kernel.org/attachment.cgi?id=30282
I will test that patch when I get back to my machine, but that will be several weeks. Larry
*** Bug 17461 has been marked as a duplicate of this bug. ***
What is the current status of this issue, and which of the attached patches fixes this issue ? This still occurs with 2.6.35.7 when loading the kernel module ib_srp: ------------[ cut here ]------------ WARNING: at fs/fs-writeback.c:951 __mark_inode_dirty+0x104/0x180() Hardware name: P5Q DELUXE Modules linked in: ib_srp scsi_transport_srp scsi_tgt netconsole configfs snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device af_packet rdma_ucm rdma_cm iw_cm ib_addr ib_ipoib ib_cm ib_sa ipv6 ib_uverbs ib_umad cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf mlx4_ib ib_mad ib_core fuse loop dm_mod snd_hda_codec_atihdmi snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd mlx4_core sr_mod button i2c_i801 intel_agp soundcore snd_page_alloc serio_raw pcspkr i2c_core cdrom sg usbhid hid uhci_hcd rtc_cmos rtc_core ehci_hcd rtc_lib sd_mod crc_t10dif usbcore edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic ata_piix pata_marvell ahci libahci libata scsi_mod thermal processor thermal_sys hwmon [last unloaded: ib_srp] Pid: 3900, comm: blkid Not tainted 2.6.35.7 #1 Call Trace: [<ffffffff8104a81f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8104a87a>] warn_slowpath_null+0x1a/0x20 [<ffffffff8112f384>] __mark_inode_dirty+0x104/0x180 [<ffffffff811232f5>] touch_atime+0x135/0x180 [<ffffffff810c7f07>] generic_file_aio_read+0x2c7/0x6c0 [<ffffffff8110ccfa>] do_sync_read+0xda/0x120 [<ffffffff8103049a>] ? do_page_fault+0x18a/0x3d0 [<ffffffff8111cb6d>] ? vfs_ioctl+0x3d/0xd0 [<ffffffff8111cd28>] ? do_vfs_ioctl+0x88/0x540 [<ffffffff8110d925>] vfs_read+0xb5/0x180 [<ffffffff8110dae1>] sys_read+0x51/0x80 [<ffffffff81002fdb>] system_call_fastpath+0x16/0x1b ---[ end trace d756b562ec21031e ]--- bdi-block not registered
It should be the one mentioned in comment #33 . But we have no confirmation about it yet. Can you test it?
The patch works (I can reproduce the problem without it), but it's not going to be included in 2.6.36 AFAICT. It's going to be merged into 2.6.37 and than put into -stable.
(In reply to comment #37) > It should be the one mentioned in comment #33 . > > But we have no confirmation about it yet. Can you test it? That patch works great on my setup.
fixed in .37-rc1 by: commit 01ea50638bc04ca5259f5711fcdedefcdde1cf43 Author: Signed-off-by: Jan Kara <jack@suse.cz> Date: Thu Sep 16 20:36:36 2010 +0200 block: Fix race during disk initialization