Bug 4438

Summary: Badness in blk_remove_plug at drivers/block/ll_rw_blk.c:1238
Product: Networking Reporter: tobias (bugs)
Component: IPV4Assignee: Paul Mackerras (paulus)
Status: REJECTED INSUFFICIENT_DATA    
Severity: blocking CC: akpm, axboe, bunk
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.11.6 /2.6.11.3/ 2.6.12 / 2.6.13rc3,4 Subsystem:
Regression: --- Bisected commit-id:
Attachments: /proc/config.gz in 2.6.13rc3
bootlog
bootlog
config in 2.6.13-rc4
fixed config

Description tobias 2005-04-03 05:01:06 UTC
Distribution:linux fromscratch 5 and linuxfromscratch 6 current cvs
Hardware Environment:pentium3 500Mhz 384 MB ram Via mainboard SAMSUNG SC-140B
Harddisc
Software Environment: glibc 2.3.3. glibc 2.3.4 /
Problem Description: in the log apears following:

Apr  2 21:39:22 fujitsu Badness in blk_remove_plug at drivers/block/ll_rw_blk.c:1238
Apr  2 22:10:17 fujitsu [<c023ed09>] blk_remove_plug+0x69/0x70
Apr  2 22:10:17 fujitsu [<c023ed2d>] __generic_unplug_device+0x1d/0x40
Apr  2 22:10:17 fujitsu [<c0240689>] __make_request+0x509/0x530
Apr  2 22:10:17 fujitsu [<c012d0f0>] autoremove_wake_function+0x0/0x60
Apr  2 22:10:17 fujitsu [<c0240af4>] generic_make_request+0x184/0x250
Apr  2 22:10:17 fujitsu [<c0240af4>] generic_make_request+0x184/0x250
Apr  2 22:10:17 fujitsu [<c012d0f0>] autoremove_wake_function+0x0/0x60
Apr  2 22:10:17 fujitsu [<c012d0f0>] autoremove_wake_function+0x0/0x60
Apr  2 22:10:17 fujitsu [<c012d0f0>] autoremove_wake_function+0x0/0x60
Apr  2 22:10:17 fujitsu [<c013933f>] mempool_alloc+0x7f/0x1a0
Apr  2 22:10:17 fujitsu [<c012d0f0>] autoremove_wake_function+0x0/0x60
Apr  2 22:10:17 fujitsu [<c0240c16>] submit_bio+0x56/0xf0
Apr  2 22:10:17 fujitsu [<d91d143a>] xfs_convert_page+0x16a/0x2e0 [xfs]
Apr  2 22:10:17 fujitsu [<c015c89b>] bio_alloc+0x17b/0x1f0
Apr  2 22:10:17 fujitsu [<d9060aca>] lbmStartIO+0xca/0x130 [jfs]
Apr  2 22:10:17 fujitsu [<d90608f9>] lbmWrite+0x179/0x200 [jfs]
Apr  2 22:10:17 fujitsu [<d91d161f>] xfs_cluster_write+0x6f/0x90 [xfs]
Apr  2 22:10:17 fujitsu [<d905eb2c>] lmGCwrite+0x9c/0xf0 [jfs]
Apr  2 22:10:17 fujitsu [<d905e80d>] lmNextPage+0x14d/0x160 [jfs]
Apr  2 22:10:17 fujitsu [<d90624ea>] txLock+0x1ea/0x620 [jfs]
Apr  2 22:10:17 fujitsu [<d905e556>] lmWriteRecord+0x376/0x4e0 [jfs]
Apr  2 22:10:17 fujitsu [<c0113ae3>] __wake_up+0x53/0x80
Apr  2 22:10:17 fujitsu [<d905e037>] lmLog+0x67/0x210 [jfs]
Apr  2 22:10:17 fujitsu [<d9063560>] diLog+0x120/0x130 [jfs]
Apr  2 22:10:17 fujitsu [<d90633cd>] txLog+0xed/0x160 [jfs]
Apr  2 22:10:17 fujitsu [<d906316d>] txCommit+0x28d/0x400 [jfs]
Apr  2 22:10:17 fujitsu [<d9061e6a>] txBegin+0x40a/0x490 [jfs]
Apr  2 22:10:17 fujitsu [<d91d3595>] pagebuf_rele+0x25/0x130 [xfs]
Apr  2 22:10:17 fujitsu [<d90410af>] jfs_commit_inode+0x14f/0x170 [jfs]
Apr  2 22:10:17 fujitsu [<d9041123>] jfs_write_inode+0x53/0x80 [jfs]
Apr  2 22:10:17 fujitsu [<c017d432>] __sync_single_inode+0x132/0x270
Apr  2 22:10:17 fujitsu [<c017d5af>] __writeback_single_inode+0x3f/0x180
Apr  2 22:10:17 fujitsu [<d91da3f8>] xfs_bdstrat_cb+0x38/0x50 [xfs]
Apr  2 22:10:17 fujitsu [<d91c36d3>] xfs_trans_first_ail+0x13/0x20 [xfs]
Apr  2 22:10:17 fujitsu [<d91b32b0>] xfs_log_need_covered+0x80/0xe0 [xfs]
Apr  2 22:10:17 fujitsu [<d91c7498>] xfs_syncsub+0x198/0x380 [xfs]
Apr  2 22:10:17 fujitsu [<c017d895>] sync_sb_inodes+0x1a5/0x300
Apr  2 22:10:17 fujitsu [<c017dadd>] writeback_inodes+0xed/0x130
Apr  2 22:10:17 fujitsu [<c013ce00>] pdflush+0x0/0x30
Apr  2 22:10:17 fujitsu [<c013c106>] wb_kupdate+0x96/0x110
Apr  2 22:10:17 fujitsu [<c013ccd8>] __pdflush+0xd8/0x200
Apr  2 22:10:17 fujitsu [<c013ce26>] pdflush+0x26/0x30
Apr  2 22:10:17 fujitsu [<c013c070>] wb_kupdate+0x0/0x110
Apr  2 22:10:17 fujitsu [<c013ce00>] pdflush+0x0/0x30
Apr  2 22:10:17 fujitsu [<c012cb59>] kthread+0xa9/0xf0
Apr  2 22:10:17 fujitsu [<c012cab0>] kthread+0x0/0xf0
Apr  2 22:10:17 fujitsu [<c01012fd>] kernel_thread_helper+0x5/0x18


Badness in blk_remove_plug at drivers/block/ll_rw_blk.c:1238
 [<c01fb85f>] blk_remove_plug+0x6f/0x80
 [<c01fb898>] __generic_unplug_device+0x28/0x40
 [<c01fccd7>] __make_request+0x297/0x4a0
 [<c0125010>] autoremove_wake_function+0x0/0x60
 [<c01fd297>] generic_make_request+0x157/0x1f0
 [<c0125010>] autoremove_wake_function+0x0/0x60
 [<d9114abe>] xtLookup+0xde/0x1e0 [jfs]
 [<c0125010>] autoremove_wake_function+0x0/0x60
 [<c012fe03>] mempool_alloc+0x73/0x130
 [<c0125010>] autoremove_wake_function+0x0/0x60
 [<c01fd392>] submit_bio+0x62/0x100
 [<c014e4b7>] bio_alloc+0xe7/0x1e0
 [<d912d955>] lbmStartIO+0xa5/0xd0 [jfs]
 [<d912b154>] __get_metapage+0x54/0x360 [jfs]
 [<d912d80e>] lbmWrite+0xbe/0x110 [jfs]
 [<d912c180>] lmGCwrite+0xa0/0xf0 [jfs]
 [<d912bf4a>] lmNextPage+0xfa/0x100 [jfs]
 [<d912be0a>] lmWriteRecord+0x32a/0x370 [jfs]
 [<d912ba4c>] lmLog+0xbc/0x150 [jfs]
 [<d912f905>] dataLog+0xb5/0xd0 [jfs]
 [<d912f6ec>] txLog+0xcc/0x130 [jfs]
 [<d912f4f0>] txCommit+0x1c0/0x2f0 [jfs]
 [<d91249ed>] dtInsert+0x15d/0x220 [jfs]
 [<d91131da>] jfs_rename+0x48a/0x8b0 [jfs]
 [<c011fef2>] in_group_p+0x42/0x80
 [<c0156162>] generic_permission+0x72/0x130
 [<c011fef2>] in_group_p+0x42/0x80
 [<c01594b4>] vfs_rename_other+0xa4/0xf0
 [<c015972e>] vfs_rename+0x22e/0x450
 [<c0159b10>] sys_rename+0x1c0/0x1f0
 [<c0152d4b>] sys_lstat64+0x1b/0x40
 [<c010243d>] sysenter_past_esp+0x52/0x75

Steps to reproduce:
start and wait . This happends every few Minutes
Comment 1 Andrew Morton 2005-05-25 21:45:18 UTC
This is weird and shouldn't happen.

Can you please test 2.6.12-rc5 and update this report?

Thanks.
Comment 2 Adrian Bunk 2005-07-03 07:07:32 UTC
No answer for Andrew's question whether it's reproducible with a more recent kernel.

Please reopen this bug if you can still reproduce it with kernel 2.6.12.2.
Comment 3 tobias 2005-07-31 07:06:17 UTC
I is still in 2.6.13rc3

log from 2.6.13rc3

Badness in blk_remove_plug at drivers/block/ll_rw_blk.c:1424
 [<c0249559>] blk_remove_plug+0x69/0x70
 [<c024957a>] __generic_unplug_device+0x1a/0x30
 [<c024ac88>] __make_request+0x248/0x5a0
 [<c0140583>] mempool_alloc+0x33/0x110
 [<c0131240>] autoremove_wake_function+0x0/0x60
 [<c024b3ed>] generic_make_request+0x11d/0x210
 [<c0131240>] autoremove_wake_function+0x0/0x60
 [<c013cfaf>] find_lock_page+0xaf/0xe0
 [<c0131240>] autoremove_wake_function+0x0/0x60
 [<c0140583>] mempool_alloc+0x33/0x110
 [<d8f315b2>] _pagebuf_lookup_pages+0x1a2/0x2f0 [xfs]
 [<c024b536>] submit_bio+0x56/0xf0
 [<c0163ca0>] bio_alloc_bioset+0x130/0x1f0
 [<c0164244>] bio_add_page+0x34/0x40
 [<d8f325ef>] _pagebuf_ioapply+0x19f/0x2d0 [xfs]
 [<c0116cc0>] default_wake_function+0x0/0x20
 [<c0116cc0>] default_wake_function+0x0/0x20
 [<d8f32768>] pagebuf_iorequest+0x48/0x1b0 [xfs]
 [<c0116cc0>] default_wake_function+0x0/0x20
 [<c0116cc0>] default_wake_function+0x0/0x20
 [<d8f39268>] xfs_bdstrat_cb+0x38/0x50 [xfs]
 [<d8f322b6>] pagebuf_iostart+0x46/0xa0 [xfs]
 [<d8f39230>] xfs_bdstrat_cb+0x0/0x50 [xfs]
 [<d8f0bbe8>] xfs_iflush+0x2b8/0x4e0 [xfs]
 [<d8f2c867>] xfs_inode_flush+0x157/0x220 [xfs]
 [<d8f399f0>] linvfs_write_inode+0x40/0x80 [xfs]
 [<c0185fa2>] __sync_single_inode+0x132/0x270
 [<c018611f>] __writeback_single_inode+0x3f/0x180
 [<c0108b78>] enable_8259A_irq+0x48/0x90
 [<c013b591>] __do_IRQ+0x111/0x160
 [<c0186405>] sync_sb_inodes+0x1a5/0x300
 [<c018664d>] writeback_inodes+0xed/0x130
 [<c0143806>] wb_kupdate+0xb6/0x140
 [<c0144400>] pdflush+0x0/0x30
 [<c01442d8>] __pdflush+0xd8/0x200
 [<c0144426>] pdflush+0x26/0x30
 [<c0143750>] wb_kupdate+0x0/0x140
 [<c0144400>] pdflush+0x0/0x30
 [<c0130ca9>] kthread+0xa9/0xf0
 [<c0130c00>] kthread+0x0/0xf0
 [<c0101395>] kernel_thread_helper+0x5/0x10
Comment 4 Adrian Bunk 2005-07-31 07:22:51 UTC
I'm not full understanding what is going wrong here.

Jens, can you look into this?

Tobias, please attach 
- your .config and
- the output of scripts/ver_linux
to this bug.
Comment 5 tobias 2005-07-31 08:16:38 UTC
Created attachment 5433 [details]
/proc/config.gz in 2.6.13rc3

my config used in 2.6.13rc3
Comment 6 Jens Axboe 2005-08-01 00:03:52 UTC
Tobias, please also post a dmesg from a booted system. Any way I can attempt to
reproduce this?
Comment 7 tobias 2005-08-01 03:37:14 UTC
Created attachment 5446 [details]
bootlog 

I updated to 2.6.13rc4 , it still happens, here the boot log
Comment 8 tobias 2005-08-01 03:38:30 UTC
Created attachment 5447 [details]
bootlog 

I updated to 2.6.13rc4 , it still happens, here the boot log
Comment 9 tobias 2005-08-01 03:48:09 UTC
my output of scripts ver_linux
./scripts/ver_linux
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux fujitsu.ti 2.6.13-rc4 #1 Mon Aug 1 11:19:36 CEST 2005 i686 pentium3 i386
GNU/Linux

Gnu C                  3.4.3
Gnu make               3.80
binutils               2.15.94.0.2.2
util-linux             2.12q
mount                  2.12q
module-init-tools      3.1
e2fsprogs              1.37
jfsutils               1.1.7
reiserfsprogs          3.6.19
reiser4progs           line
xfsprogs               2.6.25
quota-tools            3.12.
PPP                    2.4.3
nfs-utils              1.0.6
Linux C Library        2.3.4
Dynamic linker (ldd)   2.3.4
Linux C++ Library      6.0.3
Procps                 3.2.5
Net-tools              1.60
Kbd                    1.12
Sh-utils               5.2.1
udev                   058
Modules Loaded         snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
snd_pcm_oss snd_mixer_oss nfsd lockd sunrpc ip6t_limit ip6table_filter
ip6_tables ipt_MASQUERADE ipt_TCPMSS ipt_state ipt_REJECT ipt_LOG ipt_limit
iptable_mangle iptable_nat iptable_filter ip_conntrack_ftp ip_conntrack_irc r128
drm ip_conntrack ip_tables af_packet ppp_synctty ppp_async crc_ccitt ppp_generic
slhc cls_route cls_u32 cls_fw sch_sfq sch_htb ohci_hcd ehci_hcd tsdev psmouse
8250_pnp 8250 serial_core floppy pcspkr rtc aty128fb snd_es1938 gameport snd_pcm
snd_page_alloc snd_opl3_lib snd_timer snd_hwdep snd_mpu401_uart snd_rawmidi
snd_seq_device snd i2c_viapro i2c_core uhci_hcd usbcore pci_hotplug via_agp
agpgart evdev autofs4 squashfs zlib_inflate nls_iso8859_1 nls_cp437 vfat fat xfs
exportfs jfs loop ipv6 binfmt_misc parport_pc lp parport 8139too mii ide_cd
cdrom capability commoncap via82cxxx video thermal processor fan button battery
ac unix
Comment 10 tobias 2005-08-01 03:52:34 UTC
Created attachment 5448 [details]
config in 2.6.13-rc4

here my config in 2.6.13-rc4
Comment 11 Jens Axboe 2005-08-01 05:03:58 UTC
My only guess so far is that it is a bug somewhere else, that triggers the
irq_disabled() check in ll_rw_blk.c. Can you perhaps try without ppp and
netfilter? I can see it triggers pretty quickly, any idea if something starts
running after boot and triggers it?
Comment 12 tobias 2005-08-01 14:22:54 UTC
i did run the systen half an hour without ppp . there was no badness in the logs
in the time.

after after starting the internet without the firewall the messages about the
badness  were after a few minutes  in the log.

so the bug is not in the iptables code.

it might  be in the pppd driver .

I use pppoe (ADSL) using the userspace pppoe driver.

On my old Installation I used the kernel space driver , were the badness also
also was.

my ethernet driver is 8139too
Comment 13 Andrew Morton 2005-08-01 14:52:51 UTC
Well this is a good one.  How can ppp cause local IRQs to get anabled
again?  I don't see any notifier chain callouts or accidental softirq
runnings in that codepath.

Comment 14 Jens Axboe 2005-08-02 00:13:27 UTC
Yeah it's pretty funky... Tobias, can you try and disable preempt and see if
that changes anything?
Comment 15 Jens Axboe 2005-08-02 00:19:07 UTC
Reassigning
Comment 16 tobias 2005-08-02 07:32:36 UTC
Hi
Disabling preempt did not fix it
Comment 17 Paul Mackerras 2005-08-02 15:24:06 UTC
PPP pleads not guilty, your Honour.  Nowhere in ppp_generic.c does it disable or
enable interrupts (it all runs at softirq and process level).  There are only
two places in ppp_async.c where it disables/enables interrupts and in both
places it's quite clear that the disable and enable are balanced.  It sounds
like he is running ppp over a pty, but the pty code doesn't disable or enable
interrupts either.  The code in drivers/char/tty_io.c does, but the disables and
enables are all nicely balanced.

Maybe it's the netfilter code's fault?
Comment 18 Andrew Morton 2005-08-02 17:26:20 UTC
I don't see how it can be *anybody's* fault.   We disable local interrupts,
diddle around a bit and then detect that local interrupts somehow
got turned on again.   I just don't see a window in there in which
any other piece of code can get control.

Weird.

Comment 19 tobias 2005-08-08 11:37:34 UTC
Created attachment 5550 [details]
fixed config 

Hi I changed my kernel config , after removing most parts of my config, which
are not really needed on my system the the badness does not apear anymore.
Comment 20 Adrian Bunk 2006-11-28 21:52:17 UTC
Tobias, is this issue still present with any .config in recent kernels?
Comment 21 Adrian Bunk 2007-02-13 00:25:40 UTC
Please reopen this bug if it's still present with kernel 2.6.20.