Problem Description: We were testing our group quotas on an ext3 file system. The kernel crashed after the "quotaoff -ap" command (Segmentation fault) and the whole system froze; the host stopped responding and we had to reboot it.

Distribution: Debian stable
Hardware Environment: IBM eServer xSeries 346 + DS400 total storage
Kernel configuration:
CONFIG_XFS_QUOTA=y
CONFIG_QUOTA=y
CONFIG_QUOTACTL=y
CONFIG_QFMT_V1=m
CONFIG_QFMT_V2=m
CONFIG_EXT3_FS=y
Software Environment: quota, quotatools

Here is a log from syslog:

Feb 3 11:53:39 bedugnis kernel: ------------[ cut here ]------------
Feb 3 11:53:39 bedugnis kernel: kernel BUG at fs/dquot.c:423!
Feb 3 11:53:39 bedugnis kernel: invalid operand: 0000 [#1]
Feb 3 11:53:39 bedugnis kernel: SMP
Feb 3 11:53:39 bedugnis kernel: Modules linked in: quota_v1 aic7xxx scsi_transport_spi qla2300 qla2xxx firmware_class scsi_transport_fc
Feb 3 11:53:39 bedugnis kernel: CPU: 0
Feb 3 11:53:39 bedugnis kernel: EIP: 0060:[<c016f167>] Not tainted VLI
Feb 3 11:53:39 bedugnis kernel: EFLAGS: 00010202 (2.6.15.1)
Feb 3 11:53:39 bedugnis kernel: EIP is at invalidate_dquots+0x47/0xde
Feb 3 11:53:39 bedugnis kernel: eax: 00000001 ebx: f18ff580 ecx: f18ff580 edx: ffff0001
Feb 3 11:53:39 bedugnis kernel: esi: d4aba100 edi: 00000001 ebp: f7474800 esp: c07d1eb8
Feb 3 11:53:39 bedugnis kernel: ds: 007b es: 007b ss: 0068
Feb 3 11:53:39 bedugnis kernel: Process quotaoff (pid: 3275, threadinfo=c07d0000 task=f771ea30)
Feb 3 11:53:39 bedugnis kernel: Stack: f74748d0 00000001 00000004 f74748ac c0170750 f7474800 00000001 f7474800
Feb 3 11:53:39 bedugnis kernel: 00000001 00000004 00000000 00000000 f7474800 00800003 00000001 00000000
Feb 3 11:53:39 bedugnis kernel: c017192b f7474800 00000001 c0151eab ca5df000 c0243fdb f73d17a8 fffffff4
Feb 3 11:53:39 bedugnis kernel: Call Trace:
Feb 3 11:53:39 bedugnis kernel: [<c0170750>] vfs_quota_off+0x8b/0x1f2
Feb 3 11:53:39 bedugnis kernel: [<c017192b>] do_quotactl+0x101/0x30b
Feb 3 11:53:39 bedugnis kernel: [<c0151eab>] getname+0x5b/0x92
Feb 3 11:53:39 bedugnis kernel: [<c0243fdb>] _atomic_dec_and_lock+0x27/0x44
Feb 3 11:53:39 bedugnis kernel: [<c015e4bf>] mntput_no_expire+0x14/0x5f
Feb 3 11:53:39 bedugnis kernel: [<c014df2f>] lookup_bdev+0x6d/0x7c
Feb 3 11:53:39 bedugnis kernel: [<c0171631>] check_quotactl_valid+0x34/0x3a
Feb 3 11:53:39 bedugnis kernel: [<c0171bd2>] sys_quotactl+0x9d/0xb5
Feb 3 11:53:39 bedugnis kernel: [<c0102789>] syscall_call+0x7/0xb
Feb 3 11:53:39 bedugnis kernel: Code: 58 f8 8b 73 08 83 ee 08 3d 8c e3 3d c0 0f 84 a3 00 00 00 39 6b 44 0f 85 84 00 00 00 0f bf 43 58 39 f8 75 7c 8b 43 34 85 c0 74 08 <0f> 0b a7 01 3e ee 37 c0 8b 53 04 85 d2 74 18 8b 03 85 c0 89 02
Feb 3 12:17:30 bedugnis syslog-ng[2668]: syslog-ng version 1.6.5 starting

cat /proc/modules:

aic7xxx 139956 0 - Live 0xf8ac0000
scsi_transport_spi 23424 1 aic7xxx, Live 0xf8a37000
quota_v1 7424 10 - Live 0xf8831000
qla2300 128896 0 - Live 0xf8a72000
qla2xxx 121564 8 qla2300, Live 0xf8a53000
firmware_class 14080 1 qla2xxx, Live 0xf8837000
scsi_transport_fc 31360 1 qla2xxx, Live 0xf8816000
Hmm, that's strange. It looks as if someone still uses quota while invalidate_dquots() is running. What were you running to test the quota? Was something running while you did quotaoff?
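To make the suspicion above concrete, here is a small user-space sketch of the invariant that seems to be at stake: when cached quota entries for one filesystem and quota type are being discarded, none of them should still be referenced. This is only a toy model under that assumption, not the code in fs/dquot.c; the names toy_dquot and toy_invalidate and all of the fields are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a cached quota entry; NOT the kernel's struct dquot. */
struct toy_dquot {
    int fs_id;               /* which filesystem the entry belongs to */
    int type;                /* 0 = user quota, 1 = group quota */
    int use_count;           /* active references held by callers */
    int valid;               /* 1 while the entry is still cached */
    struct toy_dquot *next;  /* singly linked in-use list */
};

/*
 * Discard every cached entry for (fs_id, type).
 * Returns the number of matching entries that were still referenced.
 * A non-zero result models "someone still uses quota" during
 * invalidation; this sketch counts such entries and leaves them in
 * place instead of crashing.
 */
int toy_invalidate(struct toy_dquot *head, int fs_id, int type)
{
    int busy = 0;

    for (struct toy_dquot *d = head; d != NULL; d = d->next) {
        if (d->fs_id != fs_id || d->type != type)
            continue;        /* other filesystem or other quota type */
        assert(d->use_count >= 0);
        if (d->use_count > 0) {
            busy++;          /* entry is still in use, do not discard */
            continue;
        }
        d->valid = 0;        /* nobody holds a reference, safe to drop */
    }
    return busy;
}
```

In this model, a caller turning quota off would expect toy_invalidate() to return 0. Any other value means a concurrent user (for example, a busy NFS export) still holds a reference, which is the kind of race the question about running workloads is trying to pin down.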
We found it strange that when quotacheck finished its job, it could not rename quota.user.new to quota.user. The output was something like this:

bedugnis:~# quotacheck -a
quotacheck: Cannot rename new quotafile /disks/disk1/quota.user.new to name /disks/disk1/quota.user: Operation not permitted

The same happens with quota.group. If quotas are turned off, everything is fine and quotacheck can rename the needed files without trouble (both user and group). The segmentation fault comes right after the "quotaoff -g /disks/disk1" command. We share this partition over NFS (nfsd) with 2 other systems, so disk activity is high. I tested this situation on my desktop (Debian unstable with its precompiled 2.6.15.1 kernel). The tests passed (quotacheck renamed quota.user and quota.group easily, but there was only 1 user). Our system has approximately 25000 - 30000 mail users, and we had to go back to the 2.4.31 kernel. Could these numbers be a reason for the file access and rename problems? (A lot of mail is delivered to this machine.) We also need to check quotas very often.
OK, I think I've found the problem (at least, after applying the patch I was no longer able to reproduce it). I'll post a patch here in a while. Could you please test it in your environment as well?
Created attachment 7380: Patch fixing BUG in invalidate_dquots()
I'm sorry, but I'm unable to test your patch. This server is now a production server, and so are the others. I will stay on a 2.4 kernel (unless it becomes necessary to move to 2.6), so there is no environment to test it in right now. Anyway, thanks for the patch. I'm glad to help you.
OK, I've submitted the patch for inclusion in the kernel.