Bug 10500

Summary: nozomi: kernel BUG()s when card is removed
Product: Drivers
Reporter: Evgeni Golov (sargentd)
Component: PCMCIA
Assignee: Frank Seidel (fseidel)
Status: REJECTED UNREPRODUCIBLE
Severity: normal
CC: fseidel, oleg
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.25
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: test patch: add flush_workqueue() to tty_exit()
test patch: try to identify the bad work_struct
patch to nozomi to better handle still pending work
config for kernel .2.6.25

Description Evgeni Golov 2008-04-21 00:56:09 UTC
Latest working kernel version: none found
Earliest failing kernel version: 2.6.25
Distribution: Debian Sid
Hardware Environment: ThinkPad X31, T-Mobile web'n'walk compact card (Option GTmax EMEA)
Problem Description:
When the card is in use (e.g. by a monitor applet showing the current signal status, or just by pppd) and you eject it from the PCMCIA slot, the kernel BUG()s and my keyboard stops responding (everything else keeps working).

Steps to reproduce:
1. Insert the card
2. Either start a ppp session or open /dev/noz{0,2} with screen or minicom
3. Eject the card

Result:
pccard: card ejected from slot 0
BUG: unable to handle kernel NULL pointer dereference at 00000000
IP: [<00000000>]
*pde = 00000000 
Oops: 0000 [#1] 
Modules linked in: ppp_deflate zlib_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc nozomi pcmcia yenta_socket rsrc_nonstatic pcmcia_core sha256_generic nls_iso8859_1 nls_cp437 vfat fat rfcomm l2cap bluetooth ipv6 cpufreq_stats sd_mod fuse cpufreq_conservative usb_storage scsi_mod battery ac video output snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss ehci_hcd thinkpad_acpi snd_mixer_oss uhci_hcd iTCO_wdt led_class psmouse dock i2c_i801 usbcore snd_pcm snd_timer snd_page_alloc evdev [last unloaded: nozomi]

Pid: 5, comm: events/0 Not tainted (2.6.25-x31-1 #1)
EIP: 0060:[<00000000>] EFLAGS: 00010247 CPU: 0
EIP is at 0x0
EAX: dc6f9d38 EBX: df80a9c0 ECX: dc6f9d3c EDX: df80a9c0
ESI: dc6f9d38 EDI: 00000000 EBP: df867fa8 ESP: df867f98
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process events/0 (pid: 5, ti=df866000 task=df846120 task.ti=df866000)
Stack: c0123bb6 df80a9c0 df867fb0 df80a9c8 df867fd0 c0124209 00000000 df846120 
       c0126322 df867fbc df867fbc df80a9c0 c012415b 00000000 df867fe0 c01261aa 
       c0126171 00000000 00000000 c010494b df83ff14 00000000 00000000 00000000 
Call Trace:
 [<c0123bb6>] ? run_workqueue+0x66/0xd1
 [<c0124209>] ? worker_thread+0xae/0xba
 [<c0126322>] ? autoremove_wake_function+0x0/0x30
 [<c012415b>] ? worker_thread+0x0/0xba
 [<c01261aa>] ? kthread+0x39/0x5f
 [<c0126171>] ? kthread+0x0/0x5f
 [<c010494b>] ? kernel_thread_helper+0x7/0x10
 =======================
Code:  Bad EIP value.
EIP: [<00000000>] 0x0 SS:ESP 0068:df867f98
---[ end trace 8abfb25e219403a8 ]---
ACPI: PCI interrupt for device 0000:03:00.0 disabled
Comment 1 Oleg Nesterov 2008-04-23 10:00:30 UTC
Created attachment 15864
test patch: add flush_workqueue() to tty_exit()

Well, I don't understand this code at all, but since nobody answers...

Evgeni, could you try this patch to see if it makes any difference?

In any case, nozomi.c:tty_exit() does something strange with workqueues:

        tty_exit:

                flush_scheduled_work();

Why?

                if (dc->port[i].tty &&
                    list_empty(&dc->port[i].tty->hangup_work.entry))
                        tty_hangup(dc->port[i].tty);

I guess list_empty() means "this work is not queued". We have
work_pending(work) for that. But why do we need this check at all?
tty_hangup()->schedule_work() will fail if work_struct is queued.

Oleg.
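
The point about schedule_work() above can be sketched in userspace C. This is only a toy model, not kernel code: every `model_*` name is hypothetical, and the single `pending` flag stands in for the kernel's pending bit on a work_struct. It assumes, as the comment above describes, that queueing an already-pending item is refused, so an extra "is it queued?" check in front of the call adds nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; all model_* names are hypothetical, not kernel API. */
struct model_work {
    bool pending;                        /* stands in for the pending bit */
    void (*func)(struct model_work *);   /* callback run by the worker */
    struct model_work *next;
};

struct model_queue {
    struct model_work *head, *tail;
};

static int model_calls;                  /* how many callbacks have run */

static void model_count(struct model_work *w)
{
    (void)w;
    model_calls++;
}

/* Analogue of work_pending(): true while the item waits in the queue. */
static bool model_work_pending(const struct model_work *w)
{
    return w->pending;
}

/* Analogue of schedule_work(): refuses an item that is already pending. */
static bool model_schedule_work(struct model_queue *q, struct model_work *w)
{
    if (w->pending)
        return false;                    /* already queued, nothing to do */
    w->pending = true;
    w->next = NULL;
    if (q->tail)
        q->tail->next = w;
    else
        q->head = w;
    q->tail = w;
    return true;
}

/* Analogue of the worker loop: dequeue, clear pending, call the callback. */
static int model_run_queue(struct model_queue *q)
{
    int ran = 0;

    while (q->head) {
        struct model_work *w = q->head;

        q->head = w->next;
        if (!q->head)
            q->tail = NULL;
        w->pending = false;              /* item may be queued again now */
        assert(w->func != NULL);         /* a NULL callback would jump to 0 */
        w->func(w);
        ran++;
    }
    return ran;
}
```

In this model, calling model_schedule_work() twice on the same item before the queue runs returns true the first time and false the second, and model_run_queue() then invokes the callback exactly once.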
Comment 2 Frank Seidel 2008-04-23 10:42:09 UTC
Sorry, I am still very ill and mostly away from mail at the moment. I can probably take a look at it next week. Sorry for the slow response.
Comment 3 Evgeni Golov 2008-04-23 11:07:08 UTC
@Oleg: Can I buy you a beer or something? No more errors, and my keyboard still works after the eject. The patch works (however, I don't know if this is the right way; Frank has to comment on that).

@Frank: health comes first, so "Gute Besserung" (get well soon). We have all the time in the world to fix this bug.

Regards
Evgeni
Comment 4 Oleg Nesterov 2008-04-24 03:57:35 UTC
On 04/23, bugme-daemon@bugzilla.kernel.org wrote:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=10500
> 
> ------- Comment #3 from sargentd@die-welt.net  2008-04-23 11:07 -------
> @Oleg: Can I buy you a beer or something?

vodka please ;)

> No more errors and my keyboard
> still works after the eject. The patch works

Great, thanks!

> (however I don't know if this is
> the right way - Frank has to comment on that).

Yes sure. Get better Frank!

Oleg.
Comment 5 Evgeni Golov 2008-04-24 05:27:22 UTC
Sadly, today I get the BUG()s again. I don't know why I could not trigger them yesterday.
Comment 6 Oleg Nesterov 2008-04-24 07:32:36 UTC
Created attachment 15890
test patch: try to identify the bad work_struct

> sadly today I get the BUG()s

the same trace?

Please drop the previous patch. Could you try this one?

Hopefully it can report exactly which work_struct was corrupted.

Oleg.
Comment 7 Frank Seidel 2008-05-15 07:19:17 UTC
Created attachment 16156
patch to nozomi to better handle still pending work

First, thanks a lot for your kind wishes :-)
I'm slowly getting better and have already spent some time trying to reproduce this bug, but I wasn't able to trigger the problem.
Evgeni, could you please send me your kernel config? Perhaps that
would make it easier for me to run into it as well.

Besides that, could you give the attached patch a try?
Thanks again for your patience :-)
Comment 8 Evgeni Golov 2008-05-22 15:12:08 UTC
Created attachment 16249
config for kernel .2.6.25

Here is my kernel config. The patch did not help :(
Comment 9 Frank Seidel 2008-05-28 05:12:37 UTC
Thanks again for your help and patience. Yes, the patch
was utterly wrong. Sorry for that, but rest assured you
have not been forgotten!
Comment 10 Frank Seidel 2009-02-11 06:01:44 UTC
Does this bug still happen with 2.6.29-rc4?
Comment 11 Evgeni Golov 2009-02-12 23:46:35 UTC
Unfortunately, I don't have the hardware anymore, so I can't test it any longer.
Sorry.
Comment 12 Frank Seidel 2009-02-13 02:39:01 UTC
Hm, since I cannot reproduce the problem with my nozomi card, I am closing this bug for now. Feel free to reopen it if you experience this again.