Bug 16393

Summary: kernel BUG at fs/block_dev.c:765!
Product: File System Reporter: Maciej Rutecki (maciej.rutecki)
Component: OtherAssignee: Tejun Heo (tj)
Status: CLOSED CODE_FIX    
Severity: normal CC: cebbert, florian, Larry.Finger, maciej.rutecki, markus, rjw, tj
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.35-rc5 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 16055    
Attachments: dmesg
dmesg full
fdisk -l
parted -l
Full dmesg showing warning in mark_inode_dirty for bdi-block
fdisk -l output
fix-bd_prepare_to_claim

Description Maciej Rutecki 2010-07-14 19:50:49 UTC
Subject    : kernel BUG at fs/block_dev.c:765!
Submitter  : Markus Trippelsdorf <markus@trippelsdorf.de>
Date       : 2010-07-14 13:52
Message-ID : 20100714135217.GA1797@arch.tripp.de
References : http://marc.info/?l=linux-kernel&m=127911564213748&w=2

This entry is being used for tracking a regression from 2.6.34.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Markus Trippelsdorf 2010-07-23 12:38:41 UTC
It happened again today. See attachment.
Comment 2 Markus Trippelsdorf 2010-07-23 12:40:36 UTC
Created attachment 27217 [details]
dmesg
Comment 3 Rafael J. Wysocki 2010-07-23 20:09:52 UTC
On Friday, July 23, 2010, Markus Trippelsdorf wrote:
> On Fri, Jul 23, 2010 at 01:47:36PM +0200, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a summary report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.34.  Please verify if it still should be listed and let the
> tracking team
> > know (either way).
> 
> It happend again today:
> 
> ...
> Jul 23 09:22:00 arch kernel: r8169 0000:02:05.0: eth0: link up
> Jul 23 09:22:00 arch kernel: r8169: WARNING! Changing of MTU on this NIC may
> lead to frame reception errors!
> Jul 23 13:41:43 arch kernel: ------------[ cut here ]------------
> Jul 23 13:41:43 arch kernel: kernel BUG at fs/block_dev.c:765!
> Jul 23 13:41:43 arch kernel: invalid opcode: 0000 [#1] SMP
> Jul 23 13:41:43 arch kernel: last sysfs file:
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
> Jul 23 13:41:43 arch kernel: CPU 1
> Jul 23 13:41:43 arch kernel: Pid: 2616, comm: mount Not tainted
> 2.6.35-rc6-00012-g7e5a57e #32 M4A78T-E/System Product Name
> Jul 23 13:41:43 arch kernel: RIP: 0010:[<ffffffff810f66c7>] 
> [<ffffffff810f66c7>] __bd_abort_claiming+0x17/0x40
> Jul 23 13:41:43 arch kernel: RSP: 0018:ffff880035d07d18  EFLAGS: 00010207
> Jul 23 13:41:43 arch kernel: RAX: 0000000000009292 RBX: ffff88011f8249c0 RCX:
> 0000000000000002
> Jul 23 13:41:43 arch kernel: RDX: 0000000000000003 RSI: ffffffff8182f0a0 RDI:
> ffff88011f8249c0
> Jul 23 13:41:43 arch kernel: RBP: ffff880035d07d28 R08: ffff880035d07b68 R09:
> ffffffff81877ea8
> Jul 23 13:41:43 arch kernel: R10: ffff880035d07cb8 R11: 0000000000000010 R12:
> ffff88011f8249c0
> Jul 23 13:41:43 arch kernel: R13: 0000000000000001 R14: ffffffff8182f0a0 R15:
> 00000000ffffff85
> Jul 23 13:41:43 arch kernel: FS:  00007f727e6a8740(0000)
> GS:ffff880001a80000(0000) knlGS:0000000000000000
> Jul 23 13:41:43 arch kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
> 000000008005003b
> Jul 23 13:41:43 arch kernel: CR2: 00007f99480f2000 CR3: 000000003c34d000 CR4:
> 00000000000006e0
> Jul 23 13:41:43 arch kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> Jul 23 13:41:43 arch kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> Jul 23 13:41:43 arch kernel: Process mount (pid: 2616, threadinfo
> ffff880035d06000, task ffff88011afe0000)
> Jul 23 13:41:43 arch kernel: Stack:
> Jul 23 13:41:43 arch kernel: ffff880035d07d28 ffffffffffffff85
> ffff880035d07d68 ffffffff810f7b6c
> Jul 23 13:41:43 arch kernel: <0> ffff880035d07d98 0000000000000001
> ffffffff8182f0a0 0000000000000001
> Jul 23 13:41:43 arch kernel: <0> ffff88003c3c5000 ffffffff8182f0a0
> ffff880035d07de8 ffffffff810d3daa
> Jul 23 13:41:43 arch kernel: Call Trace:
> Jul 23 13:41:43 arch kernel: [<ffffffff810f7b6c>]
> open_bdev_exclusive+0xb3/0xc3
> Jul 23 13:41:43 arch kernel: [<ffffffff810d3daa>] get_sb_bdev+0x47/0x178
> Jul 23 13:41:43 arch kernel: [<ffffffff8113121c>] ?
> ext4_fill_super+0x0/0x208d
> Jul 23 13:41:43 arch kernel: [<ffffffff8112e6ba>] ext4_get_sb+0x18/0x1a
> Jul 23 13:41:43 arch kernel: [<ffffffff810d40af>] vfs_kern_mount+0x62/0xff
> Jul 23 13:41:43 arch kernel: [<ffffffff810d431b>] do_kern_mount+0x4d/0xef
> Jul 23 13:41:43 arch kernel: [<ffffffff810e86ad>] do_mount+0x6cb/0x744
> Jul 23 13:41:43 arch kernel: [<ffffffff810e87fb>] sys_mount+0x8c/0xcd
> Jul 23 13:41:43 arch kernel: [<ffffffff8102b93b>]
> system_call_fastpath+0x16/0x1b
> Jul 23 13:41:43 arch kernel: Code: c7 c3 f4 ff ff ff 5Jul 23 13:42:38 arch
> kernel: Linux version 2.6.35-rc6-00012-g7e5a57e (markus@arch.tripp.de) (gcc
> version 4.5.0 20100610 (prerelease) (GCC) ) #32 SMP Fri Jul 23 09:21:31 CEST
> 2010
> Jul 23 13:42:38 arch kernel: Command line:
> BOOT_IMAGE=/usr/src/linux/arch/x86/boot/bzImage root=/dev/sdb2 fbcon=rotate:3
> drm.edid_force_checksum=1 quiet
> Jul 23 13:42:38 arch kernel: BIOS-provided physical RAM map:
Comment 4 Tejun Heo 2010-08-02 16:33:17 UTC
Can you please attach boot log of a successful boot and the output of "fdisk -l"?

Thanks.
Comment 5 Markus Trippelsdorf 2010-08-02 16:42:15 UTC
Sure. See new attachments.
Thanks.
Comment 6 Markus Trippelsdorf 2010-08-02 16:43:03 UTC
Created attachment 27320 [details]
dmesg full
Comment 7 Markus Trippelsdorf 2010-08-02 16:44:22 UTC
Created attachment 27321 [details]
fdisk -l
Comment 8 Markus Trippelsdorf 2010-08-02 16:44:49 UTC
Created attachment 27322 [details]
parted -l
Comment 9 Larry Finger 2010-08-02 17:46:07 UTC
Created attachment 27325 [details]
Full dmesg showing warning in mark_inode_dirty for bdi-block

This dmesg is for my machine, which shows the warning at fs/fs-writeback.c:964 - a different location than for the OP.
Comment 10 Larry Finger 2010-08-02 17:48:48 UTC
Created attachment 27326 [details]
fdisk -l output

On my system, / is mounted on /dev/sda1 and /home is on /dev/sda3. Partition /dev/sda5 is not mounted.
Comment 11 Tejun Heo 2010-08-04 15:42:18 UTC
Patch posted upstream.

  http://thread.gmane.org/gmane.linux.acpi.devel/46852/focus=60778

Thanks.
Comment 12 Tejun Heo 2010-08-04 15:44:43 UTC
Created attachment 27349 [details]
fix-bd_prepare_to_claim

Patch attached.  Can you please verify the fix?

Thanks.
Comment 13 Markus Trippelsdorf 2010-08-04 15:54:37 UTC
Thanks Tejun. The bug happens very rarely (once every 3-4 weeks on this machine)
so it will be hard to verify. But the fix looks obviously right.
(BTW the problem was reported by me, Maciej just opened this bug report)
Comment 14 Tejun Heo 2010-08-04 15:59:31 UTC
Oops, sorry about that.  Updated.
Comment 15 Larry Finger 2010-08-04 16:27:48 UTC
Unfortunately, my box that shows the problem had a disk drive failure, and will be going into the shop today for warranty repair. Nothing lost and it will not take long to do the test when I get it back as I was already running with a replacement drive, but Best Buy owes me a 250 GB 2.5 inch disk.
Comment 16 Chuck Ebbert 2010-08-19 07:19:32 UTC
Should be fixed by Tejun's patch, now upstream:

commit e75aa85892b2ee78c79edac720868cbef16e62eb
"block_dev: always serialize exclusive open attempts"

This commit also went into 2.6.35.1
Comment 17 Florian Mickler 2010-08-30 09:14:58 UTC
Also reported to be fixed by Markus:
On Mon, 30 Aug 2010 02:56:04 +0200
Markus Trippelsdorf <markus@trippelsdorf.de> wrote:

> On Mon, Aug 30, 2010 at 01:13:29AM +0200, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.34 and 2.6.35.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.34 and 2.6.35.  Please verify if it still should
> > be listed and let the tracking team know (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=16393
> > Subject             : kernel BUG at fs/block_dev.c:765!
> > Submitter   : Markus Trippelsdorf <markus@trippelsdorf.de>
> > Date                : 2010-07-14 13:52 (47 days old)
> > Message-ID  : <20100714135217.GA1797@arch.tripp.de>
> > References  : http://marc.info/?l=linux-kernel&m=127911564213748&w=2
> > 
> 
> This bug is already fixed by:
> e75aa85892b2ee78c79edac720868cbef16e62eb
> 
> Please close this entry.
> Thanks.
>
Comment 18 Larry Finger 2010-08-30 14:05:32 UTC
Commit e75aa85892b2ee78c79edac720868cbef16e62eb does not fix the problem on my system. Running 2.6.36-rc3, I get the following traceback:

[   12.192109] ------------[ cut here ]------------
[   12.192118] WARNING: at fs/fs-writeback.c:972 __mark_inode_dirty+0x1d4/0x1e0()
[   12.192122] Hardware name: HP Pavilion dv2700 Notebook PC
[   12.192124] bdi-block not registered
[   12.192126] Modules linked in: loop(+) dm_mod ide_cd_mod cdrom snd_hda_codec_conexant ide_pci_generic snd_hda_intel snd_hda_codec r8712u(C) amd74xx snd_pcm ide_core snd_timer snd k8temp sg joydev snd_page_alloc forcedeth hwmon i2c_nforce2 serio_raw i2c_core battery ac button ext3 jbd mbcache ohci_hcd sd_mod ehci_hcd usbcore fan processor ahci libahci libata scsi_mod thermal
[   12.192164] Pid: 2338, comm: udisks-part-id Tainted: G         C  2.6.36-rc3-realtek+ #7
[   12.192167] Call Trace:
[   12.192174]  [<ffffffff8104614a>] warn_slowpath_common+0x7a/0xb0
[   12.192178]  [<ffffffff81046221>] warn_slowpath_fmt+0x41/0x50
[   12.192183]  [<ffffffff81325584>] ? _raw_spin_lock+0x34/0x40
[   12.192186]  [<ffffffff8112ded5>] ? __mark_inode_dirty+0x65/0x1e0
[   12.192190]  [<ffffffff8112e044>] __mark_inode_dirty+0x1d4/0x1e0
[   12.192195]  [<ffffffff811210bd>] touch_atime+0x12d/0x170
[   12.192201]  [<ffffffff810cbb61>] generic_file_aio_read+0x5c1/0x720
[   12.192205]  [<ffffffff810e7fbd>] ? handle_mm_fault+0x75d/0x950
[   12.192211]  [<ffffffff8110a862>] do_sync_read+0xd2/0x110
[   12.192214]  [<ffffffff8107a4a0>] ? mark_held_locks+0x70/0xa0
[   12.192218]  [<ffffffff8132436f>] ? __mutex_unlock_slowpath+0xbf/0x150
[   12.192222]  [<ffffffff8107a6cd>] ? trace_hardirqs_on+0xd/0x10
[   12.192226]  [<ffffffff8110af93>] vfs_read+0xc3/0x170
[   12.192229]  [<ffffffff8110b08c>] sys_read+0x4c/0x80
[   12.192234]  [<ffffffff81002d2b>] system_call_fastpath+0x16/0x1b
[   12.192237] ---[ end trace 9b3df7d4a57af8dc ]---
[   12.198091] loop: module loaded

Should this one be reopened, or should I post a new bug? Although mine has the same symptom as others have posted, it is likely to arise from a different cause.
Comment 19 Florian Mickler 2010-08-30 14:28:57 UTC
This is probably up to Tejun. 

I've reopened it in the meantime, so we don't forget about it.
Comment 20 Tejun Heo 2010-08-30 14:32:40 UTC
Larry, you're hitting a different WARN_ON().  Your symptom is not the same at all. :-) Please file a separate bug report.  Thanks.
Comment 21 Florian Mickler 2010-08-30 14:42:48 UTC
Closed.
Comment 22 Larry Finger 2010-08-30 14:59:53 UTC
The new bug report is #17461.