Bug 15074
| Summary: | Problem with quota on ext4 | | |
|---|---|---|---|
| Product: | File System | Reporter: | Fabio Fantoni (fantonifabio) |
| Component: | ext4 | Assignee: | Jan Kara (jack) |
| Status: | RESOLVED CODE_FIX | | |
| Severity: | normal | CC: | dmonakhov, gt6, jack |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.33-rc4 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Attachments:
Fix warning at dquot_claim_space()
add warning message on quotaon in case of reserved space
print warning about inconsistent quota usage
Patch to avoid false warnings when filesystem was written before quotaon
Description
Fabio Fantoni
2010-01-16 21:49:42 UTC
I can confirm this on 2.6.32.6 (built Jan 25). It's the official Archlinux kernel26. Using these fstab options:

/dev/sda1 / ext4 defaults,noatime,usrjquota=aquota.user,jqfmt=vfsv0 0 1
/dev/sda2 /home ext4 defaults,noatime,usrjquota=aquota.user,jqfmt=vfsv0 0 1

After a couple of minutes, and lots and lots of these backtraces, the machine remounts / readonly and spits a couple of quota and IO errors to TTY1. Here's the backtrace:

Jan 29 01:07:52 jackfruit kernel: ------------[ cut here ]------------
Jan 29 01:07:52 jackfruit kernel: WARNING: at fs/quota/dquot.c:964 dquot_claim_space+0x14c/0x170()
Jan 29 01:07:52 jackfruit kernel: Hardware name: System Product Name
Jan 29 01:07:52 jackfruit kernel: Modules linked in: quota_v2 quota_tree snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_via82xx_modem snd_via82xx snd_pcm_oss snd_ac97_codec snd_mixer_oss snd_mpu401 ac97_bus snd_mpu401_uart battery snd_pcm snd_rawmidi fan ppdev snd_timer analog snd_seq_device ac uhci_hcd i2c_viapro lp psmouse snd ns558 parport_pc edac_core ehci_hcd asus_atk0110 serio_raw pcspkr thermal shpchp edac_mce_amd gameport parport button k8temp processor snd_page_alloc soundcore i2c_core sg evdev usbcore sky2 pci_hotplug rtc_cmos rtc_core rtc_lib ext4 mbcache jbd2 crc16 sr_mod cdrom sd_mod pata_via sata_via pata_acpi ata_generic 3w_9xxx libata scsi_mod
Jan 29 01:07:52 jackfruit kernel: Pid: 1324, comm: flush-8:0 Tainted: G W 2.6.32-ARCH #1
Jan 29 01:07:52 jackfruit kernel: Call Trace:
Jan 29 01:07:52 jackfruit kernel: [<ffffffff810579d3>] ? warn_slowpath_common+0x73/0xb0
Jan 29 01:07:52 jackfruit kernel: [<ffffffff81164acc>] ? dquot_claim_space+0x14c/0x170
Jan 29 01:07:52 jackfruit kernel: [<ffffffffa010b65b>] ? ext4_mb_mark_diskspace_used+0x3eb/0x4c0 [ext4]
Jan 29 01:07:52 jackfruit kernel: [<ffffffffa010fbdb>] ? ext4_mb_new_blocks+0x29b/0x5a0 [ext4]
Jan 29 01:07:52 jackfruit kernel: [<ffffffffa010473f>] ? ext4_ext_find_extent+0x2af/0x2f0 [ext4]
Jan 29 01:07:52 jackfruit kernel: [<ffffffffa01068d0>] ? ext4_ext_get_blocks+0xe40/0x1610 [ext4]
Jan 29 01:07:52 jackfruit kernel: [<ffffffff810d5fe9>] ? mempool_alloc+0x59/0x140
Jan 29 01:07:52 jackfruit kernel: [<ffffffffa00e44cb>] ? ext4_get_blocks+0x1fb/0x370 [ext4]
Jan 29 01:07:52 jackfruit kernel: [<ffffffffa00e47cb>] ? mpage_da_map_blocks+0xbb/0x450 [ext4]
Jan 29 01:07:52 jackfruit kernel: [<ffffffffa00e4e9a>] ? ext4_da_writepages+0x33a/0x700 [ext4]
Jan 29 01:07:52 jackfruit kernel: [<ffffffff810db030>] ? __writepage+0x0/0x30
Jan 29 01:07:52 jackfruit kernel: [<ffffffff81075e04>] ? bit_waitqueue+0x14/0xc0
Jan 29 01:07:52 jackfruit kernel: [<ffffffff81135b03>] ? writeback_single_inode+0xf3/0x3c0
Jan 29 01:07:52 jackfruit kernel: [<ffffffff81136b63>] ? writeback_inodes_wb+0x403/0x5d0
Jan 29 01:07:52 jackfruit kernel: [<ffffffff81136e3d>] ? wb_writeback+0x10d/0x1e0
Jan 29 01:07:52 jackfruit kernel: [<ffffffff811371c9>] ? wb_do_writeback+0x1d9/0x1f0
Jan 29 01:07:52 jackfruit kernel: [<ffffffff81067e20>] ? process_timeout+0x0/0x10
Jan 29 01:07:52 jackfruit kernel: [<ffffffff81137223>] ? bdi_writeback_task+0x43/0xd0
Jan 29 01:07:52 jackfruit kernel: [<ffffffff810ecf80>] ? bdi_start_fn+0x0/0xf0
Jan 29 01:07:52 jackfruit kernel: [<ffffffff810ecffe>] ? bdi_start_fn+0x7e/0xf0
Jan 29 01:07:52 jackfruit kernel: [<ffffffff810ecf80>] ? bdi_start_fn+0x0/0xf0
Jan 29 01:07:52 jackfruit kernel: [<ffffffff81075b5e>] ? kthread+0x8e/0xa0
Jan 29 01:07:52 jackfruit kernel: [<ffffffff8101311a>] ? child_rip+0xa/0x20
Jan 29 01:07:52 jackfruit kernel: [<ffffffff81075ad0>] ? kthread+0x0/0xa0
Jan 29 01:07:52 jackfruit kernel: [<ffffffff81013110>] ? child_rip+0x0/0x20
Jan 29 01:07:52 jackfruit kernel: ---[ end trace 366c68d94c2ad887 ]---

It also happens with rc6. If it helps: when I activate quota, I get a warning that quota journaling support on ext4 is not active. Here is the updated call trace:

Feb 2 18:36:26 ns3093701 kernel: [ 155.133830] ------------[ cut here ]------------
Feb 2 18:36:26 ns3093701 kernel: [ 155.133832] WARNING: at fs/quota/dquot.c:984 dquot_claim_space+0x98/0x109()
Feb 2 18:36:26 ns3093701 kernel: [ 155.133833] Hardware name: X8SIL
Feb 2 18:36:26 ns3093701 kernel: [ 155.133834] Modules linked in: xt_multiport iptable_filter ip_tables x_tables quota_v1 snd_pcm snd_timer tpm_tis tpm tpm_bios i2c_i801 snd soundcore snd_page_alloc psmouse serio_raw pcspkr i2c_core evdev processor ext4 mbcache jbd2 crc16 sd_mod crc_t10dif ata_generic ata_piix libata scsi_mod ide_pci_generic ehci_hcd ide_core e1000e button thermal fan thermal_sys [last unloaded: scsi_wait_scan]
Feb 2 18:36:26 ns3093701 kernel: [ 155.133850] Pid: 1763, comm: flush-8:0 Tainted: G W 2.6.33-rc6 #1
Feb 2 18:36:26 ns3093701 kernel: [ 155.133852] Call Trace:
Feb 2 18:36:26 ns3093701 kernel: [ 155.133854] [<ffffffff8111d3b5>] ? dquot_claim_space+0x98/0x109
Feb 2 18:36:26 ns3093701 kernel: [ 155.133856] [<ffffffff8111d3b5>] ? dquot_claim_space+0x98/0x109
Feb 2 18:36:26 ns3093701 kernel: [ 155.133858] [<ffffffff810457ad>] ? warn_slowpath_common+0x77/0xa3
Feb 2 18:36:26 ns3093701 kernel: [ 155.133860] [<ffffffff8111d3b5>] ? dquot_claim_space+0x98/0x109
Feb 2 18:36:26 ns3093701 kernel: [ 155.133866] [<ffffffffa00fbdb0>] ? vfs_dq_claim_block+0x32/0x5c [ext4]
Feb 2 18:36:26 ns3093701 kernel: [ 155.133871] [<ffffffffa00fbef9>] ? ext4_da_update_reserve_space+0x11f/0x185 [ext4]
Feb 2 18:36:26 ns3093701 kernel: [ 155.133877] [<ffffffffa0117f3c>] ? ext4_ext_get_blocks+0x15a5/0x1735 [ext4]
Feb 2 18:36:26 ns3093701 kernel: [ 155.133880] [<ffffffff81161a01>] ? elv_rb_add+0x65/0x6c
Feb 2 18:36:26 ns3093701 kernel: [ 155.133882] [<ffffffff8116aac5>] ? blk_recount_segments+0x17/0x27
Feb 2 18:36:26 ns3093701 kernel: [ 155.133884] [<ffffffff81160784>] ? elv_merged_request+0x30/0x39
Feb 2 18:36:26 ns3093701 kernel: [ 155.133887] [<ffffffff81167e57>] ? __make_request+0x2d0/0x428
Feb 2 18:36:26 ns3093701 kernel: [ 155.133889] [<ffffffff811666a7>] ? generic_make_request+0x285/0x2d0
Feb 2 18:36:26 ns3093701 kernel: [ 155.133891] [<ffffffff812f1f8e>] ? __down_write_nested+0x15/0xd1
Feb 2 18:36:26 ns3093701 kernel: [ 155.133897] [<ffffffffa00fc0b9>] ? ext4_get_blocks+0x15a/0x23a [ext4]
Feb 2 18:36:26 ns3093701 kernel: [ 155.133902] [<ffffffffa00fc5d2>] ? mpage_da_map_blocks+0xa4/0x579 [ext4]
Feb 2 18:36:26 ns3093701 kernel: [ 155.133905] [<ffffffff810b3fe2>] ? pagevec_lookup_tag+0x1a/0x21
Feb 2 18:36:26 ns3093701 kernel: [ 155.133907] [<ffffffff810b2be1>] ? write_cache_pages+0x118/0x29f
Feb 2 18:36:26 ns3093701 kernel: [ 155.133912] [<ffffffffa00fd210>] ? __mpage_da_writepage+0x0/0x146 [ext4]
Feb 2 18:36:26 ns3093701 kernel: [ 155.133918] [<ffffffffa00fcf83>] ? ext4_da_writepages+0x4dc/0x6c8 [ext4]
Feb 2 18:36:26 ns3093701 kernel: [ 155.133920] [<ffffffff810ac27e>] ? find_get_pages_tag+0x46/0xeb
Feb 2 18:36:26 ns3093701 kernel: [ 155.133922] [<ffffffff810ff97b>] ? end_buffer_async_write+0x0/0x11d
Feb 2 18:36:26 ns3093701 kernel: [ 155.133926] [<ffffffff810f96e6>] ? writeback_single_inode+0xe3/0x2d6
Feb 2 18:36:26 ns3093701 kernel: [ 155.133928] [<ffffffff810fa32a>] ? writeback_inodes_wb+0x362/0x439
Feb 2 18:36:26 ns3093701 kernel: [ 155.133930] [<ffffffff810fa534>] ? wb_writeback+0x133/0x1b2
Feb 2 18:36:26 ns3093701 kernel: [ 155.133933] [<ffffffff810fa7a0>] ? wb_do_writeback+0x145/0x15b
Feb 2 18:36:26 ns3093701 kernel: [ 155.133935] [<ffffffff810fa7e7>] ? bdi_writeback_task+0x31/0x9d
Feb 2 18:36:26 ns3093701 kernel: [ 155.133937] [<ffffffff810bfeb6>] ? bdi_start_fn+0x0/0xd2
Feb 2 18:36:26 ns3093701 kernel: [ 155.133939] [<ffffffff810bff26>] ? bdi_start_fn+0x70/0xd2
Feb 2 18:36:26 ns3093701 kernel: [ 155.133941] [<ffffffff810bfeb6>] ? bdi_start_fn+0x0/0xd2
Feb 2 18:36:26 ns3093701 kernel: [ 155.133943] [<ffffffff8105d4e9>] ? kthread+0x79/0x81
Feb 2 18:36:26 ns3093701 kernel: [ 155.133946] [<ffffffff81009824>] ? kernel_thread_helper+0x4/0x10
Feb 2 18:36:26 ns3093701 kernel: [ 155.133948] [<ffffffff8105d470>] ? kthread+0x0/0x81
Feb 2 18:36:26 ns3093701 kernel: [ 155.133950] [<ffffffff81009820>] ? kernel_thread_helper+0x0/0x10
Feb 2 18:36:26 ns3093701 kernel: [ 155.133951] ---[ end trace 4eaa2a86a8e2da2e ]---

Fabio, thanks for the report. By -rc6 you mean 2.6.33-rc6? Dmitry, are you aware of any other races in the code?

No, I don't know of any real use-case bugs. I do know about the rw=>ro bug, but it does not happen in normal (usual) situations. http://lkml.org/lkml/2010/1/24/25

Yes, 2.6.33-rc6. Does the warning on quota activation (journaling support on ext4 not active) not help? If necessary, I will post the detailed message on the next install.

Wait a minute, it seems that you wrote to the file system before quotaon. Is that true? If so, then you are doing the wrong thing, because your quota will become out of sync in this case. I have a patch which fixes this issue (http://marc.info/?l=linux-ext4&m=126106648829617&w=2). It was accepted by Jan but seems to have been dropped somewhere. I also completely forgot about it. Can you try it? But in fact the patch just makes the kernel more tolerant of user misbehaviour.

Argh, sorry for that. I've folded that change into your patch implementing reservation handling, but then replaced the reservation handling patch with a new version from you and forgot about this update. I've re-added the patch to my tree now.

Created attachment 24972 [details]
Fix warning at dquot_claim_space()
Dmitry, your patch didn't fix the whole issue for me. I also need this patch to avoid the warnings...
The bad thing about the patch is that it removes the possibility to quickly see cases where we reserved less than we've later claimed, but OTOH giving false warnings is nasty as well...

Created attachment 24985 [details]
add warning message on quotaon in case of reserved space
We can detect that space was reserved before quotaon during the inode's quota initialisation. Let's print a warning message about this.
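To make the mismatch concrete, here is a minimal userspace sketch of the accounting problem described in this thread. It is not kernel code: `model_dquot`, `model_inode` and the three `model_*` functions are hypothetical stand-ins invented for illustration. The assumption is simply that a delayed-allocation reservation made while quota is off is known to the filesystem but not to the quota counters, so it can be detected when quota is initialised for the inode, and a later claim exceeds what the quota side believes is reserved.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins; these are NOT the kernel structures. */
struct model_dquot {
    uint64_t cur_space; /* bytes charged to the user */
    uint64_t rsv_space; /* bytes the quota code knows are reserved */
};

struct model_inode {
    uint64_t rsv_bytes;        /* reservation tracked by the filesystem */
    struct model_dquot *dquot; /* NULL while quota is off for this inode */
};

/* Delayed-allocation write: reserve space now, allocate at writeback. */
void model_reserve(struct model_inode *inode, uint64_t bytes)
{
    assert(inode != NULL);
    inode->rsv_bytes += bytes;
    if (inode->dquot != NULL)
        inode->dquot->rsv_space += bytes; /* only recorded when quota is on */
}

/*
 * Quota initialisation for the inode (e.g. at quotaon). Returns true and
 * prints a message if the inode already holds a reservation that the
 * dquot has never been told about.
 */
bool model_attach_dquot(struct model_inode *inode, struct model_dquot *dquot)
{
    assert(inode != NULL && dquot != NULL);
    inode->dquot = dquot;
    if (inode->rsv_bytes > 0) {
        fprintf(stderr, "quota: %" PRIu64 " bytes reserved before quotaon\n",
                inode->rsv_bytes);
        return true;
    }
    return false;
}

/*
 * Writeback: convert reserved bytes into real usage. Returns false in the
 * situation where a sanity check in the claim path would fire.
 */
bool model_claim(struct model_inode *inode, uint64_t bytes)
{
    bool consistent = true;

    assert(inode != NULL);
    if (inode->dquot != NULL) {
        struct model_dquot *dq = inode->dquot;

        if (dq->rsv_space < bytes) {
            consistent = false; /* claiming more than was ever reserved */
            dq->rsv_space = 0;
        } else {
            dq->rsv_space -= bytes;
        }
        dq->cur_space += bytes;
    }
    inode->rsv_bytes -= (bytes < inode->rsv_bytes) ? bytes : inode->rsv_bytes;
    return consistent;
}
```

In this model, reserving 4096 bytes, then attaching the dquot, then claiming 4096 bytes makes `model_attach_dquot()` report the early reservation and `model_claim()` return false, which mirrors the order of events (write, quotaon, writeback) discussed here.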
(In reply to comment #8)
> Created an attachment (id=24972) [details]
> Fix warning at dquot_claim_space()
>
> Dmitry, your patch didn't fix the whole issue for me. I also need this patch
> to avoid the warnings...

This is a dangerous change because it may remove an important sanity check. The warning is false (though the check is not invalid) for a few seconds after quotaon, but later it can help us catch a lot of bugs. I found the chown bug only because of this check. I've attached a patch which adds a warning message, but after some thought I came to the idea of force-charging the reserved space in the ->initialize() method. This allows us to preserve the warning in claim_space and helps us handle user misbehaviour. Will prepare the patch in a minute.

Created attachment 24986 [details]
print warning about inconsistent quota usage

Jan, this patch does as much as we can in case of unexpected quota inconsistency. It should be applied on top of http://marc.info/?l=linux-ext4&m=126106356824375&w=2
Should I combine them into one patch?

1) It charges reserved space during quotaon.
2) It fixes up quota_reservation on claim and release.

In both cases an error message will be printed (in the first case it will be printed twice).

Test:

dd if=/dev/zero of=/mnt bs=1k &
quotaon /mnt
killall dd
sync

I've tested it and got nice warning messages without triggering WARN(true) in claim_rsv and free_rsv.

Created attachment 24989 [details]
Patch to avoid false warnings when filesystem was written before quotaon
Dmitry, I think your patch was a bit of an overengineering. But I like the idea of keeping the WARN_ON so here's a simpler patch.
OK, agreed. There is only a semantic difference between the two patches, but yours is definitely clearer. BTW: I've tried to count the total reservation in add_dquot_ref() in order to give the user a picture of the amount of leakage he has, because a leakage of 500Mb looks really scary :)

The fix is queued in my for_next branch, I'm closing the bug.
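For readers following the discussion, the idea of charging already-reserved space when quota is initialised for an inode, while keeping the sanity check in the claim path, can be sketched in userspace as follows. This is only an illustration under stated assumptions, not the patch that was queued: `sketch_dquot`, `sketch_inode`, `sketch_attach_dquot()` and `sketch_claim()` are hypothetical names, and the real kernel code may differ in every detail.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins; these are NOT the kernel structures. */
struct sketch_dquot {
    uint64_t cur_space; /* bytes charged to the user */
    uint64_t rsv_space; /* bytes the quota code knows are reserved */
};

struct sketch_inode {
    uint64_t rsv_bytes;         /* reservation tracked by the filesystem */
    struct sketch_dquot *dquot; /* NULL while quota is off for this inode */
};

/*
 * Quota initialisation: besides attaching the dquot, charge whatever the
 * inode reserved while quota was off, so both counters agree from now on.
 */
void sketch_attach_dquot(struct sketch_inode *inode, struct sketch_dquot *dquot)
{
    assert(inode != NULL && dquot != NULL);
    inode->dquot = dquot;
    dquot->rsv_space += inode->rsv_bytes;
}

/*
 * Claim path with the sanity check kept. Returns 1 if the check fired
 * (a real inconsistency in this model), 0 otherwise. On a mismatch the
 * reservation is clamped to zero so that it cannot underflow.
 */
int sketch_claim(struct sketch_inode *inode, uint64_t bytes)
{
    int warned = 0;

    assert(inode != NULL);
    if (inode->dquot != NULL) {
        struct sketch_dquot *dq = inode->dquot;

        if (dq->rsv_space < bytes) {
            fprintf(stderr, "quota: claiming %" PRIu64
                    " bytes but only %" PRIu64 " reserved\n",
                    bytes, dq->rsv_space);
            warned = 1;
            dq->rsv_space = 0;
        } else {
            dq->rsv_space -= bytes;
        }
        dq->cur_space += bytes;
    }
    inode->rsv_bytes -= (bytes < inode->rsv_bytes) ? bytes : inode->rsv_bytes;
    return warned;
}
```

With this arrangement, an inode that reserved space before quotaon no longer trips the check when that space is claimed, while a claim that genuinely exceeds the recorded reservation still does, which is the property both sides of the discussion wanted to keep.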