Bug 14697 (Roland) - quota problem
Summary: quota problem
Status: RESOLVED CODE_FIX
Alias: Roland
Product: File System
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: Jan Kara
URL: none
Keywords:
Depends on:
Blocks:
 
Reported: 2009-11-26 17:03 UTC by Paul Roland
Modified: 2010-02-25 01:59 UTC

See Also:
Kernel Version: 2.6.31.6
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Patch with various ext4 quota fixes (16.91 KB, patch)
2009-12-16 13:46 UTC, Jan Kara
Patches fixing WARN_ON when filesystem is written before quota is turned on (2.83 KB, application/x-gzip)
2010-02-24 11:21 UTC, Jan Kara

Description Paul Roland 2009-11-26 17:03:32 UTC
------------[ cut here ]------------
WARNING: at fs/quota/dquot.c:964 ()
Hardware name:
Pid: 251, comm: pdflush Not tainted 2.6.31-xpr-cp #1
Call Trace:
 [<ffffffff8135c302>] ?
 [<ffffffff810e81fa>] ?
 [<ffffffff81032d30>] ?
 [<ffffffff810e81fa>] ?
 [<ffffffff8113fb8f>] ?
 [<ffffffff8113b612>] ?
 [<ffffffff8113fde0>] ?
 [<ffffffff811377b4>] ?
 [<ffffffff8118aa61>] ?
 [<ffffffff8112224e>] ?
 [<ffffffff811225b7>] ?
 [<ffffffff8106d54a>] ?
 [<ffffffff8106b414>] ?
 [<ffffffff811246f0>] ?
 [<ffffffff81145fbe>] ?
 [<ffffffff8106d54a>] ?
 [<ffffffff8112572c>] ?
 [<ffffffff8106b710>] ?
 [<ffffffff810b860c>] ?
 [<ffffffff810b8cc8>] ?
 [<ffffffff810b8fff>] ?
 [<ffffffff8106c4cc>] ?
 [<ffffffff8106c9c0>] ?
 [<ffffffff8106cad1>] ?
 [<ffffffff8106c410>] ?
 [<ffffffff81049e16>] ?
 [<ffffffff81516ba0>] ?
 [<ffffffff8100354a>] ?
 [<ffffffff81516ba0>] ?
 [<ffffffff81049d70>] ?
 [<ffffffff81003540>] ?
---[ end trace 0c8fe3df085412ef ]---
Comment 1 Jan Kara 2009-11-27 13:21:42 UTC
I guess it's just another instance of
http://kerneloops.org/searchweek.php?search=dquot_claim_space

There are instances of this bug in the Red Hat, Novell, Ubuntu and, I suppose, other bugzillas. Mingming, who wrote the code, doesn't seem to have time to fix it,
so I guess I'll have to dive into the deep dark corners of the ext4 mballoc allocator. I've already tried but always gave up after half an hour in disgust ;)

OK, ok, I'm finishing with ranting and going to do some work...
Comment 2 Paul Roland 2009-12-02 00:53:56 UTC
This bug also exists in the mainline kernel. I suppose it is not necessary to file another bug report.
Comment 3 Jan Kara 2009-12-02 14:11:36 UTC
No. BTW: do you have any recipe for triggering the warning?
Comment 4 Paul Roland 2009-12-03 03:36:41 UTC
Nope, it appears suddenly.
The message repeats quite often, but quota works as far as I can see...
Comment 5 Paul Roland 2009-12-03 03:37:28 UTC
Please tell me if there's anything I can help with. I can provide shell access to this machine; it is not a production environment yet.
Comment 6 Jan Kara 2009-12-16 13:42:26 UTC
Dmitry Monakhov has just fixed several bugs in ext4 quota reservation which should hopefully also fix your problem. I'll attach here a combo patch with all the fixes.
Comment 7 Jan Kara 2009-12-16 13:46:51 UTC
Created attachment 24208
Patch with various ext4 quota fixes

This patch is on top of Linus's current tree. You can, for example, take 2.6.32-git12 (http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.32-git12.bz2) as a base for my patch.
Comment 8 Jan Kara 2009-12-16 13:47:42 UTC
Could you please try running with the above fix and see whether your issue gets fixed? Thanks.
Comment 9 Paul Roland 2010-01-05 10:19:32 UTC
OK, I just upgraded to 2.6.32.2. I will post here if anything else occurs.
Comment 10 Paul Roland 2010-01-05 10:36:36 UTC
------------[ cut here ]------------
WARNING: at fs/quota/dquot.c:964 ()
Hardware name:
Pid: 1755, comm: flush-254:0 Not tainted 2.6.32-xpr-cp #1
Call Trace:
 [<ffffffff81368b82>] ?
 [<ffffffff810ed0ba>] ?
 [<ffffffff81033630>] ?
 [<ffffffff810ed0ba>] ?
 [<ffffffff811473dd>] ?
 [<ffffffff81147650>] ?
 [<ffffffff8113e0a1>] ?
 [<ffffffff8104aa40>] ?
 [<ffffffff81192b72>] ?
 [<ffffffff81190411>] ?
 [<ffffffff8119313d>] ?
 [<ffffffff81128e4e>] ?
 [<ffffffff811291c7>] ?
 [<ffffffff8106f03a>] ?
 [<ffffffff8106d3a4>] ?
 [<ffffffff8112b580>] ?
 [<ffffffff8112c588>] ?
 [<ffffffff810c4e60>] ?
 [<ffffffff810c5f40>] ?
 [<ffffffff811aba53>] ?
 [<ffffffff810bcfd2>] ?
 [<ffffffff810bd85e>] ?
 [<ffffffff810bdac8>] ?
 [<ffffffff810bddae>] ?
 [<ffffffff810bde28>] ?
 [<ffffffff81079440>] ?
 [<ffffffff810794a3>] ?
 [<ffffffff8104aa06>] ?
 [<ffffffff810034ca>] ?
 [<ffffffff8104a970>] ?
 [<ffffffff810034c0>] ?
---[ end trace a92b953f5f42206a ]---


That would be 2.6.32.2; it seems the problem persists.
Comment 11 Paul Roland 2010-01-05 10:43:06 UTC
I also need to mention that all this is inside a KVM virtual machine, with a virtio block device.
Comment 12 Jan Kara 2010-01-05 10:59:34 UTC
2.6.32.2 does not have the fixes - Greg included them in his stable tree queue just yesterday. So either you have to wait for the next -stable release or try the patch above...
Comment 13 Paul Roland 2010-01-11 03:42:16 UTC
The patch seems to work. When will it be included in stable?
Comment 14 Jan Kara 2010-01-11 12:56:41 UTC
Thanks for testing the patch! Some problems appeared with the fix, so it was not included in the latest -stable release. Hopefully I can get it into the next one.
Comment 15 Paul Roland 2010-02-22 12:47:17 UTC
OK, I am back.
Now the warnings appear much more often, and this is 2.6.32.8.

------------[ cut here ]------------
WARNING: at fs/quota/dquot.c:964 ()
Hardware name: VMware Virtual Platform
Modules linked in:
Pid: 979, comm: syslogd Tainted: G        W  2.6.32.8 #1
Call Trace:
 [<ffffffff810e8a4a>] ?
 [<ffffffff810e8a4a>] ?
 [<ffffffff81030635>] ?
 [<ffffffff810e8a4a>] ?
 [<ffffffff8113a4ed>] ?
 [<ffffffff81135453>] ?
 [<ffffffff8113a760>] ?
 [<ffffffff811311a0>] ?
 [<ffffffff810bedbd>] ?
 [<ffffffff8139a940>] ?
 [<ffffffff81132c63>] ?
 [<ffffffff81119d35>] ?
 [<ffffffff81146ff6>] ?
 [<ffffffff8111bf5e>] ?
 [<ffffffff811ade23>] ?
 [<ffffffff8111c2d7>] ?
 [<ffffffff810706ca>] ?
 [<ffffffff8106eb24>] ?
 [<ffffffff8111e630>] ?
 [<ffffffff8106945d>] ?
 [<ffffffff8111f632>] ?
 [<ffffffff81097a3b>] ?
 [<ffffffff810482d0>] ?
 [<ffffffff81068d3b>] ?
 [<ffffffff81068db1>] ?
 [<ffffffff8139a660>] ?
 [<ffffffff810bcf6a>] ?
 [<ffffffff810bd056>] ?
 [<ffffffff810bd0ab>] ?
 [<ffffffff810023eb>] ?
---[ end trace bb0af93142efddf7 ]---
Comment 16 Paul Roland 2010-02-23 11:12:12 UTC
This error now comes almost every minute...
Comment 17 Jan Kara 2010-02-23 11:22:02 UTC
That's nasty. We didn't get all the fixes to stable because they were just too big :(. Could you try running with the 2.6.33-rc8 kernel? That should solve your problems...
Comment 18 Paul Roland 2010-02-23 22:28:33 UTC
the 2.6.33-rc8 version acts like this:

------------[ cut here ]------------
WARNING: at fs/quota/dquot.c:984 dquot_claim_space+0x12a/0x130()
Hardware name: VMware Virtual Platform
Modules linked in:
Pid: 791, comm: flush-8:0 Not tainted 2.6.32-xpr #1
Call Trace:
 [<ffffffff810e719a>] ? dquot_claim_space+0x12a/0x130
 [<ffffffff810e719a>] ? dquot_claim_space+0x12a/0x130
 [<ffffffff81033ae5>] ? warn_slowpath_common+0x85/0xb0
 [<ffffffff810e719a>] ? dquot_claim_space+0x12a/0x130
 [<ffffffff8111a943>] ? ext4_da_update_reserve_space+0x233/0x360
 [<ffffffff81130951>] ? ext4_ext_get_blocks+0x1561/0x16a0
 [<ffffffff81049000>] ? bit_waitqueue+0x10/0xb0
 [<ffffffff8118d622>] ? get_request+0x2f2/0x380
 [<ffffffff8118aee1>] ? elv_rb_add+0x61/0x70
 [<ffffffff81342415>] ? __down_write_nested+0x15/0xd0
 [<ffffffff8111ab72>] ? ext4_get_blocks+0x102/0x230
 [<ffffffff8111adb0>] ? mpage_da_map_blocks+0xb0/0x6a0
 [<ffffffff8107716a>] ? pagevec_lookup_tag+0x1a/0x30
 [<ffffffff810755c5>] ? write_cache_pages+0xe5/0x330
 [<ffffffff8111d040>] ? __mpage_da_writepage+0x0/0x170
 [<ffffffff8111e039>] ? ext4_da_writepages+0x329/0x6d0
 [<ffffffff81198b03>] ? cpumask_next_and+0x23/0x40
 [<ffffffff810bc2f2>] ? writeback_single_inode+0x92/0x280
 [<ffffffff810bcbb0>] ? writeback_inodes_wb+0x270/0x390
 [<ffffffff810bce06>] ? wb_writeback+0x136/0x1f0
 [<ffffffff810bd104>] ? wb_do_writeback+0x174/0x190
 [<ffffffff810bd15b>] ? bdi_writeback_task+0x3b/0xc0
 [<ffffffff81080b90>] ? bdi_start_fn+0x0/0xd0
 [<ffffffff81080bf3>] ? bdi_start_fn+0x63/0xd0
 [<ffffffff81048d86>] ? kthread+0x96/0xa0
 [<ffffffff81002f54>] ? kernel_thread_helper+0x4/0x10
 [<ffffffff81048cf0>] ? kthread+0x0/0xa0
 [<ffffffff81002f50>] ? kernel_thread_helper+0x0/0x10
---[ end trace 9af19d2cc5731f69 ]---
Comment 19 Paul Roland 2010-02-23 22:30:47 UTC
At least it seems to be more specific :)
Comment 20 Jan Kara 2010-02-24 11:20:18 UTC
Hmm, the version 2.6.32-xpr looks strange. Are you sure it's 2.6.33-rc8? And by any chance, do you write to the filesystem before turning quotas on? If so, you might want to try 2.6.33-rc8 with the two patches I'll attach here applied.
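
One way to double-check which kernel produced a trace is to read the release string from the "Pid:" header line of the warning. Below is a minimal Python sketch, assuming the header format seen in the traces in this report; the function name kernel_release is made up for illustration and is not part of any kernel tooling.

```python
import re

# Header line printed ahead of "Call Trace:", for example:
#   Pid: 791, comm: flush-8:0 Not tainted 2.6.32-xpr #1
# The layout is assumed from the traces quoted in this report.
PID_LINE = re.compile(
    r"^Pid: \d+, comm: .+? (?:Not tainted|Tainted: .*?) +(\S+) #\d+$"
)


def kernel_release(trace_text):
    """Return the kernel release named in a warning trace, or None."""
    for line in trace_text.splitlines():
        match = PID_LINE.match(line.strip())
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    header = "Pid: 791, comm: flush-8:0 Not tainted 2.6.32-xpr #1"
    release = kernel_release(header)
    # A 2.6.33-rc8 build would be expected to report a 2.6.33 release.
    print(release, release.startswith("2.6.33"))
```

Run against the trace in comment 18, this would report 2.6.32-xpr, which is why the kernel version is in question.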
Comment 21 Jan Kara 2010-02-24 11:21:54 UTC
Created attachment 25192
Patches fixing WARN_ON when filesystem is written before quota is turned on
Comment 22 Paul Roland 2010-02-24 19:07:41 UTC
The problem is that I am kind of stuck with the 2.6.32 branch because of the grsecurity patch, which is crucial for TPE and some other security features implemented in it. It's a shell access machine and it has already been targeted twice.

I am recompiling the kernel with those patches now. Thank you.
Comment 23 Paul Roland 2010-02-24 20:15:55 UTC
OK, with these patches and the correct kernel version I have enabled journaled quota.
The error disappeared, but the grsec features are missing now. Is there any plan to include this in the main kernel? In 2.6.32? Thank you.
Comment 24 Jan Kara 2010-02-24 21:20:39 UTC
Ah, finally. Thanks for testing. Since Linus has just released 2.6.33 they won't make it there but I'll push these two patches to 2.6.33-stable. For 2.6.32 - quite a bit had to be done in 2.6.33 to fix all the breakage so pushing all the stuff to 2.6.32-stable isn't likely...

BTW: If these patches fix the problem for you, it means that you write to the filesystem before running quotaon, and thus your quota accounting is going to be imprecise. It's good to turn on quotas from init scripts so that such things cannot happen...
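
To see which ext4 filesystems an init script would need to cover, one could list the mounts that carry quota mount options. Below is a rough Python sketch, assuming /proc/mounts-style input and that the options show up as quota, usrquota, grpquota, usrjquota= or grpjquota=. The helper names are hypothetical, and this only inspects mount options; it does not tell whether quotaon has actually been run.

```python
# Mount options that request quota accounting (assumed spellings).
PLAIN_QUOTA_OPTS = ("quota", "usrquota", "grpquota")
JOURNALED_QUOTA_PREFIXES = ("usrjquota=", "grpjquota=")


def has_quota_option(options):
    """True if any mount option in the list asks for quota accounting."""
    for opt in options:
        if opt in PLAIN_QUOTA_OPTS or opt.startswith(JOURNALED_QUOTA_PREFIXES):
            return True
    return False


def ext4_quota_mounts(mounts_text):
    """Return mount points of ext4 filesystems mounted with quota options.

    mounts_text is expected in /proc/mounts layout:
    device, mount point, fs type, options, dump, pass.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        if has_quota_option(fields[3].split(",")):
            result.append(fields[1])
    return result


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for mount_point in ext4_quota_mounts(f.read()):
            print("quotaon should run early for", mount_point)
```

Each mount point listed is a filesystem where quotaon would have to run before anything writes to it.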
Comment 25 Paul Roland 2010-02-25 00:09:32 UTC
Yes, I'm already receiving errors regarding that. I will check out the init scripts.
Also, I'm using RHEL 5, and RHEL's quota does not support ext4, so it's quite a difference.

So when 2.6.33 is out, I suppose these patches will be included?
Comment 26 Jan Kara 2010-02-25 00:27:33 UTC
The problem is 2.6.33 *is* already out :) So the above two patches will be only in the first stable release...
Comment 27 Paul Roland 2010-02-25 01:59:32 UTC
Then I guess 2.6.33.1 is what I'm waiting for.
