Bug 11688
Summary: | Oops while writing on a newly ext4 formated 1TB partition on a external USB drive | ||
---|---|---|---|
Product: | File System | Reporter: | Vedran Furač (vedranf) |
Component: | ext3 | Assignee: | Jan Kara (jack) |
Status: | RESOLVED CODE_FIX | ||
Severity: | high | CC: | vedranf |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.27-rc7 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: | Fix for the bug in ext4_write_begin and ext4_da_write_begin |
Description
Vedran Furač
2008-10-02 18:38:53 UTC
More readable version: [ 5015.394427] ------------[ cut here ]------------ [ 5015.394434] kernel BUG at fs/jbd/transaction.c:275! [ 5015.394436] invalid opcode: 0000 [#1] SMP [ 5015.394439] Modules linked in: isofs zlib_inflate udf crc_itu_t ext4dev jbd2 crc16 nls_utf8 ntfs nls_base radeon drm tun autofs4 ipv6 video output ac battery nfsd auth_rpcgss exportfs nfs lockd nfs_acl sunrpc quota_v2 fuse it87 hwmon_vid eeprom tuner_simple tuner_types tuner snd_cs4232 snd_opl3_lib snd_hwdep snd_cs4231_lib snd_mpu401 snd_mpu401_uart ns558 snd_seq_midi gameport snd_rawmidi bttv snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss rtc_cmos snd_mixer_oss videodev v4l1_compat ir_common rtc_core compat_ioctl32 i2c_algo_bit rtc_lib v4l2_common videobuf_dma_sg videobuf_core snd_bt87x btcx_risc tveeprom snd_pcm snd_seq_dummy serio_raw pcspkr psmouse shpchp snd_seq_oss snd_seq_midi_event snd_seq snd_timer snd_seq_device pci_hotplug snd soundcore snd_page_alloc parport_pc parport button i2c_nforce2 nvidia_agp i2c_core agpgart evdev joydev sd_mod ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod ide_cd_mod cdrom ide_disk ata_generic libata dock usbhid hid ff_memless 8139cp amd74xx usb_storage scsi_mod floppy ide_pci_generic 8139too mii ide_core ohci_hcd ehci_hcd usbcore thermal processor fan thermal_sys [ 5015.394510] [ 5015.394514] Pid: 6978, comm: kio_file Tainted: G W (2.6.27-rc7-686 #1) [ 5015.394517] EIP: 0060:[<f90b5b01>] EFLAGS: 00210287 CPU: 0 [ 5015.394530] EIP is at journal_start+0x2f/0xa9 [jbd] [ 5015.394533] EAX: f4591160 EBX: f5d413ec ECX: f70a0800 EDX: 00000012 [ 5015.394536] ESI: f714d000 EDI: 0000004a EBP: 00000012 ESP: f71d5c20 [ 5015.394539] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 [ 5015.394542] Process kio_file (pid: 6978, ti=f71d4000 task=f34fcd00 task.ti=f71d4000) [ 5015.394544] Stack: f5d413ec f77a0c9c f77a0c9c 0000004a f71d5c6c f911d9aa f77a0c9c f77a0c9c [ 5015.394551] 0000004a c01a143e c01884d5 f77a0ca4 c0188a22 f5c824ec 00000000 00000080 [ 5015.394557] 00000080 c0188c3d 00000080 f7688e50 f0aac76c 00000c1c c035d07c 00000080 [ 5015.394563] Call Trace: [ 5015.394566] [<f911d9aa>] ext3_dquot_drop+0x29/0x5d [ext3] [ 5015.394585] [<c01a143e>] vfs_dq_drop+0x35/0x36 [ 5015.394594] [<c01884d5>] clear_inode+0x53/0xb3 [ 5015.394598] [<c0188a22>] dispose_list+0x46/0xca [ 5015.394602] [<c0188c3d>] shrink_icache_memory+0x197/0x1bf [ 5015.394605] [<c0161ba9>] shrink_slab+0xd3/0x132 [ 5015.394612] [<c0161e45>] try_to_free_pages+0x23d/0x347 [ 5015.394616] [<c0160c4f>] isolate_pages_global+0x0/0x42 [ 5015.394620] [<c015dc5b>] __alloc_pages_internal+0x213/0x34e [ 5015.394624] [<c0159472>] __grab_cache_page+0x2d/0x60 [ 5015.394627] [<f958a65a>] ext4_da_write_begin+0x58/0xe9 [ext4dev] [ 5015.394649] [<c0159bc8>] generic_file_buffered_write+0xef/0x556 [ 5015.394657] [<c01be51e>] cap_inode_need_killpriv+0x25/0x35 [ 5015.394663] [<c01bf8a9>] security_inode_need_killpriv+0xc/0xd [ 5015.394668] [<c0158d60>] file_remove_suid+0x18/0x47 [ 5015.394674] [<c015a69b>] __generic_file_aio_write_nolock+0x465/0x4c8 [ 5015.394678] [<c0118a91>] kunmap_atomic+0x67/0x8a [ 5015.394684] [<c02c19d3>] _spin_lock+0x5/0x7 [ 5015.394691] [<c0188f44>] touch_atime+0xe/0xce [ 5015.394694] [<c015a750>] generic_file_aio_write+0x52/0xa9 [ 5015.394698] [<f9585055>] ext4_file_write+0xa7/0x11d [ext4dev] [ 5015.394711] [<c017817c>] do_sync_write+0xbf/0x100 [ 5015.394717] [<c01331fc>] autoremove_wake_function+0x0/0x2d [ 5015.394724] [<c01bf8d0>] security_file_permission+0xc/0xd [ 5015.394728] [<c01780bd>] do_sync_write+0x0/0x100 [ 5015.394732] [<c01788e6>] vfs_write+0x83/0x120 [ 5015.394735] [<c0178d99>] sys_write+0x3c/0x63 [ 5015.394739] [<c01037f3>] sysenter_do_call+0x12/0x2f [ 5015.394744] ======================= [ 5015.394746] Code: 56 89 c6 53 bb e2 ff ff ff 83 ec 04 64 a1 00 00 3c c0 8b 80 60 03 00 00 85 f6 89 04 24 74 7e 85 c0 74 11 89 c3 8b 00 39 30 74 04 <0f> 0b eb fe ff 43 08 eb 69 a1 68 ee 0b f9 ba 50 00 00 00 bb f4 [ 5015.394776] EIP: [<f90b5b01>] journal_start+0x2f/0xa9 [jbd] SS:ESP 0068:f71d5c20 [ 5015.394789] ---[ end trace c6daa70af0b08ca0 ]--- Can you replicate this on a newer kernel? This looks like a failure in ext3/jbd code, actually, triggered when the kernel was trying to free pages to get back some memory. Hi, any more information about this bug? How much memory do you have? What ext3 filesystems do you have mounted? The oops is in ext3 code: handle_t *journal_start(journal_t *journal, int nblocks) { handle_t *handle = journal_current_handle(); int err; if (!journal) return ERR_PTR(-EROFS); if (handle) { J_ASSERT(handle->h_transaction->t_journal == journal); <---- handle->h_ref++; return handle; } It seems to be related ext3's quota code -- and again it seems to be related to an attempt to free memory when the system is under memory pressure. This looks very much like a quota and/or ext3 bug. Sure, I'm waiting for 2.6.28 to enter debian so I can test. I have 1.7GB of memory, of that more than 1GB is free (excluding cache). All other filesystems are ext3 including the source (ext3->ext4). Quota is enabled but not set for any user. I disabled it (init.d/quota stop) and tried writing (ext3->ext4) again and oops didn't happen. Will test more, but this could be the issue. What I don't understand is why it doesn't happen when I'm writing to ext3. Ext4's delayed allocation will cause more memory pressure, because blocks aren't as aggressively pushed out to disk. How much memory do you have in your system, how many ext3/ext4 filesystems do you have mounted, how many of them are active (and how active) versus totally quiescent, and how big is the journal for those filesystems? total used free shared buffers cached Mem: 1773 1730 43 0 23 1185 -/+ buffers/cache: 521 1252 Swap: 0 0 0 /dev/mapper/vg-root on / type ext3 (rw,errors=remount-ro) /dev/mapper/vg-home on /home type ext3 (rw,usrquota) /dev/mapper/vg-var on /var type ext3 (rw) /dev/sda3 on /media/disk type ext4dev (rw,nosuid,nodev,uhelper=hal) /dev/sda2 on /media/disk-1 type ext3 (rw,nosuid,nodev,uhelper=hal,data=ordered) Copying from /home to /media/disk so only these are active if I understand you correctly. # debugfs -R "stat <8>" /dev/mapper/vg-home debugfs 1.41.3 (12-Oct-2008) Inode: 8 Type: regular Mode: 0600 Flags: 0x0 Generation: 0 Version: 0x00000000 User: 0 Group: 0 Size: 134217728 File ACL: 0 Directory ACL: 0 Links: 1 Blockcount: 262416 Fragment: Address: 0 Number: 0 Size: 0 ctime: 0x487a96ce -- Mon Jul 14 01:59:10 2008 atime: 0x00000000 -- Thu Jan 1 01:00:00 1970 mtime: 0x487a96ce -- Mon Jul 14 01:59:10 2008 Size of extra inode fields: 0 # debugfs -R "stat <8>" /dev/sda3 debugfs 1.41.3 (12-Oct-2008) Inode: 8 Type: regular Mode: 0600 Flags: 0x80000 Generation: 0 Version: 0x00000000 User: 0 Group: 0 Size: 134217728 File ACL: 0 Directory ACL: 0 Links: 1 Blockcount: 262144 Fragment: Address: 0 Number: 0 Size: 0 ctime: 0x48ee6d44 -- Thu Oct 9 22:44:52 2008 atime: 0x00000000 -- Thu Jan 1 01:00:00 1970 mtime: 0x48ee6d44 -- Thu Oct 9 22:44:52 2008 Size of extra inode fields: 0 So you only have 2 megabytes of memory.... Yeah, this sounds like a low-memory pressure situation. Ext4's delayed allocation will definitely use more memory, and it looks like you are tickling an ext3 memory pressure bug. Is there a reason why you aren't mounting your filesystems with noatime? That's undoubtedly responsible for the ext3 journal activity, and I suspect mounting /home with noatime will make the problem go away. It might also be interesting disabling quotas on /home, to see if the problem goes away or moves to something else. What is happening is that when ext4 starts writing, it needs memory for the page cache. To get that space, the VM system is trying to get space, so it calls among other things, shrink_icache_memory, to shrink the inode cache, which then calls clear_inode, which ended up calling ext3_dquot_drop, through the quota code. Ext3_dquot_drop() calls journal_start(), which then trips over a sanity check. In any case, given what you've told me, I'm even more convinced this is an ext3 and/or a quota system bug, which is getting triggered to severe memory pressure caused by ext4 and the fact that you have a very tiny amount of memory. Errr, that was output of $ free -m, so it's ~1.7GB of memory. Regarding noatime... I didn't change debian defaults. Disabling quotas (/etc/init.d/quota stop) seems to fix the problem. I copied some 70GB and oops didn't happen. I need to test it more, but I'm 90% sure it's quota related bug. You can assign this bug to someone who is in charge of ext3 and quota. Ted, I've looked into this and it is a bug in ext4. It should call grab_cache_page_write_begin() with AOP_FLAG_NOFS since it cannot handle recursion into the filesystem for freeing memory. The problem is that at the time grab_cache_page_write_begin() is called, a transaction is already started - this is what quota code complains about (because it also needs to start a transaction but current->journal_info is already occupied by ext4). If we wanted to avoid AOP_FLAG_NOFS we could get the page, start a transaction, lock and check the page but that's going to be a bit nasty. Created attachment 20124 [details]
Fix for the bug in ext4_write_begin and ext4_da_write_begin
This patch should fix the problem. Testing welcome ;)
OK, the patch was merged into ext4-tree. Closing the bug. |