Bug 7962
Summary: | oops in port_carrier_check | |
---|---|---|---
Product: | Networking | Reporter: | Pascal Terjan (pterjan)
Component: | Other | Assignee: | Stephen Hemminger (stephen)
Status: | CLOSED CODE_FIX | |
Severity: | normal | CC: | akpm, protasnb, stephen
Priority: | P2 | |
Hardware: | i386 | |
OS: | Linux | |
Kernel Version: | 2.6.20-rc7 | Subsystem: |
Regression: | --- | Bisected commit-id: |
Description
Pascal Terjan
2007-02-07 12:31:42 UTC
The part in warnings :

Feb 7 21:20:18 plop kernel: f0eed6f1
Feb 7 21:20:18 plop kernel: Modules linked in: bridge tun button kqemu snd_rtctimer ppp_async crc_ccitt ppp_generic slhc vfat fat vboxdrv sg sd_mod usb_storage scsi_mod cpufreq_userspace isofs capability commoncap radeon drm sit ipv6 snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss eepro100 mii snd_intel8x0 snd_ac97_codec af_packet ac97_bus snd_pcm snd_timer snd_page_alloc snd soundcore video thermal sbs i2c_ec i2c_core fan dock container battery ac ide_cd cdrom binfmt_misc loop dm_mod pcmcia firmware_class yenta_socket rsrc_nonstatic pcmcia_core cpufreq_ondemand cpufreq_conservative cpufreq_powersave speedstep_centrino freq_table processor intel_agp agpgart ibm_acpi backlight nvram pcspkr usbmouse usbhid hid ff_memless ehci_hcd uhci_hcd usbcore joydev tsdev evdev ext3 jbd ide_generic

Reply-To: akpm@linux-foundation.org

Begin forwarded message:

Date: Wed, 7 Feb 2007 12:41:07 -0800
From: bugme-daemon@bugzilla.kernel.org
To: bugme-new@lists.osdl.org
Subject: [Bugme-new] [Bug 7962] New: oops in port_carrier_check

http://bugzilla.kernel.org/show_bug.cgi?id=7962

Summary: oops in port_carrier_check
Kernel Version: 2.6.20-rc7
Status: NEW
Severity: normal
Owner: acme@conectiva.com.br
Submitter: pterjan@gmail.com

While playing with qemu, I got an oops in bridge (and lost keyboard):

Feb 7 21:20:18 plop kernel: BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
Feb 7 21:20:18 plop kernel: printing eip:
Feb 7 21:20:18 plop kernel: *pde = 00000000
Feb 7 21:20:18 plop kernel: Oops: 0000 [#1]
Feb 7 21:20:18 plop kernel: CPU: 0
Feb 7 21:20:19 plop kernel: EIP: 0060:[pg0+814360305/1067136000] Not tainted VLI
Feb 7 21:20:19 plop kernel: EIP: 0060:[<f0eed6f1>] Not tainted VLI
Feb 7 21:20:19 plop kernel: EFLAGS: 00010202 (2.6.20.0.rc7-1mdv #1)
Feb 7 21:20:19 plop kernel: EIP is at port_carrier_check+0x22/0x75 [bridge]
Feb 7 21:20:19 plop kernel: eax: 6b6b6b6b ebx: 6b6b6b6b ecx: 00000000 edx: 00000001
Feb 7 21:20:19 plop kernel: esi: eb99b120 edi: 00000296 ebp: eff0bf58 esp: eff0bf4c
Feb 7 21:20:19 plop kernel: ds: 007b es: 007b ss: 0068
Feb 7 21:20:19 plop kernel: Process events/0 (pid: 4, ti=eff0a000 task=eff09530 task.ti=eff0a000)
Feb 7 21:20:19 plop kernel: Stack: cd566744 eff4e86c 00000296 eff0bf84 c012534a eff0bf70 00000296 eff0bfa0
Feb 7 21:20:19 plop kernel: eff0bfac cd566740 f0eed6cf eff4e86c eff03ec8 eff0bfb4 eff0bfc4 c012590d
Feb 7 21:20:19 plop kernel: 00000001 00000000 00000001 00010000 00000000 00000000 eff09530 c0114770
Feb 7 21:20:19 plop kernel: Call Trace:
Feb 7 21:20:19 plop kernel: [show_trace_log_lvl+26/47] show_trace_log_lvl+0x1a/0x2f
Feb 7 21:20:19 plop kernel: [<c010422c>] show_trace_log_lvl+0x1a/0x2f
Feb 7 21:20:19 plop kernel: [show_stack_log_lvl+155/163] show_stack_log_lvl+0x9b/0xa3
Feb 7 21:20:19 plop kernel: [<c01042dc>] show_stack_log_lvl+0x9b/0xa3
Feb 7 21:20:19 plop kernel: [show_registers+402/616] show_registers+0x192/0x268
Feb 7 21:20:19 plop kernel: [<c0104476>] show_registers+0x192/0x268
Feb 7 21:20:19 plop kernel: [die+234/511] die+0xea/0x1ff
Feb 7 21:20:19 plop kernel: [<c0104636>] die+0xea/0x1ff
Feb 7 21:20:19 plop kernel: [do_page_fault+1111/1334] do_page_fault+0x457/0x536
Feb 7 21:20:19 plop kernel: [<c02c0c73>] do_page_fault+0x457/0x536
Feb 7 21:20:19 plop kernel: [error_code+116/128] error_code+0x74/0x80
Feb 7 21:20:19 plop kernel: [<c02bf624>] error_code+0x74/0x80
Feb 7 21:20:19 plop kernel: [run_workqueue+142/333] run_workqueue+0x8e/0x14d
Feb 7 21:20:19 plop kernel: [<c012534a>] run_workqueue+0x8e/0x14d
Feb 7 21:20:19 plop kernel: [worker_thread+260/302] worker_thread+0x104/0x12e
Feb 7 21:20:19 plop kernel: [<c012590d>] worker_thread+0x104/0x12e
Feb 7 21:20:19 plop kernel: [kthread+163/206] kthread+0xa3/0xce
Feb 7 21:20:19 plop kernel: [<c0127e55>] kthread+0xa3/0xce
Feb 7 21:20:19 plop kernel: [kernel_thread_helper+7/16] kernel_thread_helper+0x7/0x10
Feb 7 21:20:19 plop kernel: [<c0103ed7>] kernel_thread_helper+0x7/0x10
Feb 7 21:20:19 plop kernel: =======================
Feb 7 21:20:19 plop kernel: Code: 38 cf 89 d8 5b 5e 5f 5d c3 55 89 e5 57 56 53 8b b0 24 ff ff ff 0f ba 30 00 e8 d3 20 38 cf 8b 9e 40 02 00 00 85 db 74 4c 8b 46 2c <8b> 3b a8 10 75 0a 89 f0 e8 e2 f9 ff ff 89 43 2c 8b 47 30 f6 40
Feb 7 21:20:19 plop kernel: EIP: [pg0+814360305/1067136000] port_carrier_check+0x22/0x75 [bridge] SS:ESP 0068:eff0bf4c
Feb 7 21:20:19 plop kernel: EIP: [<f0eed6f1>] port_carrier_check+0x22/0x75 [bridge] SS:ESP 0068:eff0bf4c

Reply-To: shemminger@linux-foundation.org

On Wed, 7 Feb 2007 12:52:16 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> Begin forwarded message:
>
> Date: Wed, 7 Feb 2007 12:41:07 -0800
> From: bugme-daemon@bugzilla.kernel.org
> To: bugme-new@lists.osdl.org
> Subject: [Bugme-new] [Bug 7962] New: oops in port_carrier_check
>
> http://bugzilla.kernel.org/show_bug.cgi?id=7962
>
> While playing with qemu, I got an oops in bridge (and lost keyboard):
> ...

I wonder if this is work_queue API change fallout.

On 07-02-2007 23:09, Stephen Hemminger wrote:
> On Wed, 7 Feb 2007 12:52:16 -0800
> Andrew Morton <akpm@linux-foundation.org> wrote:
...
>> Feb 7 21:20:18 plop kernel: BUG: unable to handle kernel paging request at
>> virtual address 6b6b6b6b
>> Feb 7 21:20:18 plop kernel: printing eip:
>> Feb 7 21:20:18 plop kernel: *pde = 00000000
>> Feb 7 21:20:18 plop kernel: Oops: 0000 [#1]
>> Feb 7 21:20:18 plop kernel: CPU: 0
>> Feb 7 21:20:19 plop kernel: EIP: 0060:[pg0+814360305/1067136000] Not
>> tainted VLI
>> Feb 7 21:20:19 plop kernel: EIP: 0060:[<f0eed6f1>] Not tainted VLI
>> Feb 7 21:20:19 plop kernel: EFLAGS: 00010202 (2.6.20.0.rc7-1mdv #1)
>> Feb 7 21:20:19 plop kernel: EIP is at port_carrier_check+0x22/0x75 [bridge]
>> Feb 7 21:20:19 plop kernel: eax: 6b6b6b6b ebx: 6b6b6b6b ecx: 00000000

I think it's caused by a pending delayed workqueue item
trying to use dev after kfree (POISON_FREE in eax, ebx).
> static void port_carrier_check(struct work_struct *work)
> {
>     struct net_bridge_port *p;
>     struct net_device *dev;
>     struct net_bridge *br;
>
>     dev = container_of(work, struct net_bridge_port,
>                        carrier_check.work)->dev;
>     work_release(work);
>
>     rtnl_lock();
>     p = dev->br_port;
>     if (!p)
>         goto done;
>     br = p->br;
>
>     if (netif_carrier_ok(dev))
>         p->path_cost = port_cost(dev);
>
>     if (br->dev->flags & IFF_UP) {

My investigation seems to point at this line (p == ebx
but not NULL because of mem debugging on, probably).

Regards,
Jarek P.

Reply-To: shemminger@linux-foundation.org

On Fri, 9 Feb 2007 08:42:11 +0100
Jarek Poplawski <jarkao2@o2.pl> wrote:

> I think it's caused by pending delayed workqueue
> trying to use dev after kfree (POISON_FREE in eax, ebx).
> ...
> My investigation seems to point at this line (p == ebx
> but not NULL because of mem debugging on, probably).

The carrier_check is canceled by removal of port from bridge.
Perhaps there is something broken in rcu assumptions under Qemu

2007/2/9, Stephen Hemminger <shemminger@linux-foundation.org>:
> The carrier_check is canceled by removal of port from bridge.
> Perhaps there is something broken in rcu assumptions under Qemu

If that can help: I started/stopped qemu several times, as I was testing new PXE support in qemu with different virtual NICs. Each time, a tun device was created by qemu at startup and added to the bridge, and destroyed on exit.

On Fri, Feb 09, 2007 at 09:52:04AM -0800, Stephen Hemminger wrote:
> On Fri, 9 Feb 2007 08:42:11 +0100
> Jarek Poplawski <jarkao2@o2.pl> wrote:
...
> > >     if (br->dev->flags & IFF_UP) {
> >
> > My investigation seems to point at this line (p == ebx
> > but not NULL because of mem debugging on, probably).

Sorry, I overpasted. This is the line:

--> br = p->br;

> The carrier_check is canceled by removal of port from bridge.
> Perhaps there is something broken in rcu assumptions under Qemu

If you mean this:

> static void del_nbp(struct net_bridge_port *p)
> {
> ...
>     cancel_delayed_work(&p->carrier_check);

it's not sufficient. According to workqueue.h:

> /*
>  * Kill off a pending schedule_delayed_work(). Note that the work callback
>  * function may still be running on return from cancel_delayed_work(). Run
>  * flush_scheduled_work() to wait on it.
>  */
> static inline int cancel_delayed_work(struct delayed_work *work)

I can't see how rcu could help here with this pointer
to dev passed on to delayed_work (out of any rcu block).
IMHO dev_hold/dev_put (or something alike) is needed here.

Regards,
Jarek P.

Pascal,
Any updates, have you tested with later kernels/qemu since?
Thanks.

I suspect we fixed this - it sounds familiar. Steve, can you take a look please?

The carrier/delayed work was reworked some time ago to fix problems like this. So yes, it can be closed.

See also.

Author: Stephen Hemminger <shemminger@linux-foundation.org> 2007-02-22 09:10:18
Committer: David S. Miller <davem@sunset.davemloft.net> 2007-02-26 19:42:59
Parent: a10d567c89dfba90dde2e0515e25760fd74cde06 ([BRIDGE] br_if: Fix oops in port_carrier_check)
Child: de79059ecd7cd650f3788ece978a64586921d1f1 ([BRIDGE]: adding new device to bridge should enable if up)
Branches: origin, master
Follows: v2.6.21-rc1
Precedes: v2.6.21-rc2

[BRIDGE]: eliminate workqueue for carrier check

Having a work queue for checking carrier leads to lots of race issues.
Simpler to just get the cost when data structure is created and
update on change.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>