Bug 6002

Summary: quota + 2.6.15.1
Product: File System
Reporter: Nerijus Kislauskas (nerijus.kislauskas)
Component: Other
Assignee: Jan Kara (jack)
Status: CLOSED CODE_FIX
Severity: high
CC: akpm
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: Linux version 2.6.15.1
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: Patch fixing BUG in invalidate_dquots()

Description Nerijus Kislauskas 2006-02-03 03:27:25 UTC
Problem Description:
We were testing our group quotas on an ext3 file system. The kernel crashed after
the "quotaoff -ap" command (Segmentation fault) and the whole system froze, with no
response from the host. We had to reboot it.

Distribution:
Debian stable

Hardware Environment:
IBM eServer xSeries 346 + DS400 total storage

Kernel configuration:
CONFIG_XFS_QUOTA=y
CONFIG_QUOTA=y
CONFIG_QUOTACTL=y
CONFIG_QFMT_V1=m
CONFIG_QFMT_V2=m
CONFIG_EXT3_FS=y

Software Environment: 
quota, quotatools

Here is a log from syslog:
Feb  3 11:53:39 bedugnis kernel: ------------[ cut here ]------------
Feb  3 11:53:39 bedugnis kernel: kernel BUG at fs/dquot.c:423!
Feb  3 11:53:39 bedugnis kernel: invalid operand: 0000 [#1]
Feb  3 11:53:39 bedugnis kernel: SMP 
Feb  3 11:53:39 bedugnis kernel: Modules linked in: quota_v1 aic7xxx scsi_transport_spi qla2300 qla2xxx firmware_class scsi_transport_fc
Feb  3 11:53:39 bedugnis kernel: CPU:    0
Feb  3 11:53:39 bedugnis kernel: EIP:    0060:[<c016f167>]    Not tainted VLI
Feb  3 11:53:39 bedugnis kernel: EFLAGS: 00010202   (2.6.15.1) 
Feb  3 11:53:39 bedugnis kernel: EIP is at invalidate_dquots+0x47/0xde
Feb  3 11:53:39 bedugnis kernel: eax: 00000001   ebx: f18ff580   ecx: f18ff580   edx: ffff0001
Feb  3 11:53:39 bedugnis kernel: esi: d4aba100   edi: 00000001   ebp: f7474800   esp: c07d1eb8
Feb  3 11:53:39 bedugnis kernel: ds: 007b   es: 007b   ss: 0068
Feb  3 11:53:39 bedugnis kernel: Process quotaoff (pid: 3275, threadinfo=c07d0000 task=f771ea30)
Feb  3 11:53:39 bedugnis kernel: Stack: f74748d0 00000001 00000004 f74748ac c0170750 f7474800 00000001 f7474800
Feb  3 11:53:39 bedugnis kernel: 00000001 00000004 00000000 00000000 f7474800 00800003 00000001 00000000
Feb  3 11:53:39 bedugnis kernel: c017192b f7474800 00000001 c0151eab ca5df000 c0243fdb f73d17a8 fffffff4
Feb  3 11:53:39 bedugnis kernel: Call Trace:
Feb  3 11:53:39 bedugnis kernel: [<c0170750>] vfs_quota_off+0x8b/0x1f2
Feb  3 11:53:39 bedugnis kernel: [<c017192b>] do_quotactl+0x101/0x30b
Feb  3 11:53:39 bedugnis kernel: [<c0151eab>] getname+0x5b/0x92
Feb  3 11:53:39 bedugnis kernel: [<c0243fdb>] _atomic_dec_and_lock+0x27/0x44
Feb  3 11:53:39 bedugnis kernel: [<c015e4bf>] mntput_no_expire+0x14/0x5f
Feb  3 11:53:39 bedugnis kernel: [<c014df2f>] lookup_bdev+0x6d/0x7c
Feb  3 11:53:39 bedugnis kernel: [<c0171631>] check_quotactl_valid+0x34/0x3a
Feb  3 11:53:39 bedugnis kernel: [<c0171bd2>] sys_quotactl+0x9d/0xb5
Feb  3 11:53:39 bedugnis kernel: [<c0102789>] syscall_call+0x7/0xb
Feb  3 11:53:39 bedugnis kernel: Code: 58 f8 8b 73 08 83 ee 08 3d 8c e3 3d c0 0f 84 a3 00 00 00 39 6b 44 0f 85 84 00 00 00 0f bf 43 58 39 f8 75 7c 8b 43 34 85 c0 74 08 <0f> 0b a7 01 3e ee 37 c0 8b 53 04 85 d2 74 18 8b 03 85 c0 89 02
Feb  3 12:17:30 bedugnis syslog-ng[2668]: syslog-ng version 1.6.5 starting
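
For reading the trace above: each frame has the form symbol+offset/size, so
the EIP line "invalidate_dquots+0x47/0xde" points 0x47 bytes into a function
that is 0xde bytes long. A minimal sketch of splitting such a frame in C
follows. The names struct frame and parse_frame are hypothetical and are not
part of any kernel tool.

```c
#include <stdio.h>

/* Hypothetical helper type: one "symbol+0xOFF/0xSIZE" frame from an oops. */
struct frame {
    char symbol[64];
    unsigned int offset; /* bytes from the start of the function */
    unsigned int size;   /* total size of the function in bytes */
};

/* Returns 0 on success, -1 if the text is not in the expected form. */
static int parse_frame(const char *text, struct frame *out)
{
    /* %x accepts an optional "0x" prefix, so the prefix needs no special case. */
    if (sscanf(text, "%63[^+]+%x/%x",
               out->symbol, &out->offset, &out->size) != 3)
        return -1;
    return 0;
}
```

Called on "invalidate_dquots+0x47/0xde", parse_frame() fills in the symbol
name, offset 0x47 (71) and size 0xde (222).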

cat /proc/modules:
aic7xxx 139956 0 - Live 0xf8ac0000
scsi_transport_spi 23424 1 aic7xxx, Live 0xf8a37000
quota_v1 7424 10 - Live 0xf8831000
qla2300 128896 0 - Live 0xf8a72000
qla2xxx 121564 8 qla2300, Live 0xf8a53000
firmware_class 14080 1 qla2xxx, Live 0xf8837000
scsi_transport_fc 31360 1 qla2xxx, Live 0xf8816000
Comment 1 Jan Kara 2006-02-06 04:06:18 UTC
Hmm, that's strange. It looks as if someone still uses quota while
invalidate_dquots() is running. What were you running to test the quota? Was
something running while you did quotaoff?
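
The situation described here can be pictured with a small userspace model.
This is only a sketch under assumptions, not the code in fs/dquot.c: struct
quota_entry, entry_get(), entry_put() and try_invalidate() are hypothetical
names, and the model returns an error where the kernel hits BUG().

```c
/* Hypothetical stand-in for a reference-counted quota structure. */
struct quota_entry {
    int users;  /* number of holders still using the entry */
    int valid;  /* cleared once the entry has been invalidated */
};

static void entry_get(struct quota_entry *e) { e->users++; }
static void entry_put(struct quota_entry *e) { e->users--; }

/*
 * Invalidate the entry only if nobody uses it any more.
 * Returns 0 on success and -1 if a user still holds a reference;
 * the latter is the state this model treats as the bug.
 */
static int try_invalidate(struct quota_entry *e)
{
    if (e->users != 0)
        return -1;
    e->valid = 0;
    return 0;
}
```

With one outstanding entry_get(), try_invalidate() fails; after the matching
entry_put() it succeeds.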
Comment 2 Nerijus Kislauskas 2006-02-06 06:59:33 UTC
It seemed strange to us that when quotacheck finished its job, it could not
rename quota.user.new to quota.user. The output was something like this:

bedugnis:~# quotacheck -a
quotacheck: Cannot rename new quotafile /disks/disk1/quota.user.new to name /disks/disk1/quota.user: Operation not permitted

The same happens with quota.group. If quotas are turned off, everything is
fine and quotacheck can rename the needed files (both user and group). The
segmentation fault came right after the "quotaoff -g /disks/disk1" command. We
share this partition over NFS (nfsd) with 2 other systems, so disk activity is
high. I've tested this situation on my desktop (Debian unstable and its
precompiled 2.6.15.1 kernel). The tests passed (quotacheck renamed quota.user
and quota.group easily, but there was only 1 user). Our system has
approximately 25000 - 30000 mail users, and we had to go back to the 2.4.31
kernel. Could these numbers be a reason for the file access and renaming
problems? (A lot of mail is delivered to this machine.) We also need to check
quotas very often.
Comment 3 Jan Kara 2006-02-17 02:48:35 UTC
OK, I think I've found the problem (at least after applying the patch I was not
able to reproduce the problem any longer). I'll post here a patch in a while -
could you please test it also in your environment?
Comment 4 Jan Kara 2006-02-17 02:49:36 UTC
Created attachment 7380
Patch fixing BUG in invalidate_dquots()
Comment 5 Nerijus Kislauskas 2006-02-20 02:14:58 UTC
I'm sorry, but I'm unable to test your patch. This server is now a
production server, as are the others. I will stay on a 2.4 kernel (unless it
becomes necessary to move to 2.6), so there is no environment to test it in
now. Anyway, thanks for the patch. I'm glad to help.

Comment 6 Jan Kara 2006-02-20 02:49:03 UTC
OK, I've submitted the patch for inclusion in the kernel.