Bug 8424

Summary: AMD64 specific: PCMCIA CompactFlash reader fails to start, kernel oops
Product: Platform Specific/Hardware Reporter: Andrey Zaitsev (a.zaitsev)
Component: x86-64 Assignee: Andi Kleen (andi-bz)
Status: CLOSED CODE_FIX    
Severity: normal CC: akpm, andi-bz, gavrie, htejun, protasnb, vbraun
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.20 Subsystem:
Regression: --- Bisected commit-id:
Attachments: HW info of my laptop
dmesg of i386 system
dmesg of i386 system
working patch

Description Andrey Zaitsev 2007-05-03 06:15:23 UTC
Most recent kernel where this bug did *NOT* occur: No info
Distribution: Ubuntu 7.04 amd64
Hardware Environment: Acer Aspire 5101 AWLMi 
Software Environment: Ubuntu Linux 7.04 AMD64
Problem Description:

This bug is probably a duplicate of #7711, but the dmesg output differs
somewhat. When I insert a PCMCIA CF reader with a CF card in it, I get the
following dmesg output:

[ 267.670692] pccard: PCMCIA card inserted into slot 0
[ 267.670698] cs: memory probe 0x0c0000-0x0fffff: excluding 0xc0000-0xcffff 0xe0000-0xfffff
[ 267.675119] cs: memory probe 0x30000000-0x33ffffff: excluding 0x30000000-0x33ffffff
[ 267.675137] cs: memory probe 0x60000000-0x60ffffff: clean.
[ 267.683202] cs: memory probe 0xa0000000-0xa0ffffff: clean.
[ 267.691279] cs: memory probe 0xd0200000-0xd02fffff: excluding 0xd0200000-0xd021ffff
[ 267.698541] pcmcia: registering new device pcmcia0.0
[ 267.913779] Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
[ 267.913784] [<ffffffff80222ef2>] dma_alloc_coherent+0x52/0x240
[ 267.913792] PGD 18c12067 PUD 18c16067 PMD 0
[ 267.913795] Oops: 0000 [1] SMP
[ 267.913798] CPU 0
[ 267.913799] Modules linked in: pata_pcmcia wlan_tkip ipv6 binfmt_misc rfcomm l2cap bluetooth fglrx(P) ppdev powernow_k8 cpufreq_conservative cpufreq_stats cpufreq_ondemand freq_table cpufreq_userspace cpufreq_powersave dev_acpi tc1100_wmi sony_acpi pcc_acpi sbs i2c_ec dock container asus_acpi button video battery ac backlight parport_pc lp parport fuse joydev pcmcia snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq yenta_socket rsrc_nonstatic pcmcia_core snd_timer snd_seq_device wlan_scan_sta sdhci ath_rate_sample pcspkr psmouse k8temp mmc_core serio_raw i2c_piix4 ath_pci wlan snd soundcore i2c_core shpchp pci_hotplug snd_page_alloc ath_hal(P) af_packet tsdev evdev ext3 jbd mbcache 8139cp ide_cd cdrom ide_disk atiixp generic 8139too mii ehci_hcd ata_generic libata scsi_mod ohci_hcd usbcore thermal processor fan fbcon tileblit font bitblit softcursor vesafb cfbcopyarea cfbimgblt cfbfillrect capability commoncap
[ 267.913850] Pid: 5809, comm: modprobe Tainted: P 2.6.20-15-generic #2
[ 267.913853] RIP: 0010:[<ffffffff80222ef2>] [<ffffffff80222ef2>] dma_alloc_coherent+0x52/0x240
[ 267.913860] RSP: 0000:ffff810001869748 EFLAGS: 00010206
[ 267.913862] RAX: 0000000000000000 RBX: 00000000000010d0 RCX: 00000000000010d4
[ 267.913865] RDX: 00000000ffffffff RSI: 0000000000000800 RDI: ffff810005c9a0c0
[ 267.913868] RBP: 00000000000000d0 R08: 0000000000000000 R09: ffff810005f9bc80
[ 267.913871] R10: 00000000fffffffc R11: 0000000000000001 R12: 00000000000007ff
[ 267.913874] R13: 00000000ffffffff R14: ffff810005c9a0c0 R15: 0000000000000800
[ 267.913878] FS: 00002b2ea423a6f0(0000) GS:ffffffff8054e000(0000) knlGS:0000000000000000
[ 267.913881] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 267.913883] CR2: 0000000000000000 CR3: 0000000018d2a000 CR4: 00000000000006e0
[ 267.913887] Process modprobe (pid: 5809, threadinfo ffff810001868000, task ffff81001bc93040)
[ 267.913889] Stack: ffff810018ce0558 ffff810005f9bc98 00000000000000d0 0000000000000000
[ 267.913895] ffff810018ce0558 0000000000000800 ffff810005c9a0c0 ffffffff803aa35b
[ 267.913899] 0000000000000002 ffff810018ce0520 ffff810005c9a0c0 000000000001a10e
[ 267.913903] Call Trace:
[ 267.913915] [<ffffffff803aa35b>] dmam_alloc_coherent+0x5b/0xb0
[ 267.913940] [<ffffffff8809ab46>] :libata:ata_port_start+0x26/0x70
[ 267.913957] [<ffffffff8809f70b>] :libata:ata_device_add+0x23b/0x540
[ 267.913974] [<ffffffff884be4dc>] :pata_pcmcia:pcmcia_init_one+0x47c/0x4f0
[ 267.914093] [<ffffffff80309ff9>] sysfs_make_dirent+0x29/0xb0
[ 267.914106] [<ffffffff88314936>] :pcmcia:pcmcia_device_probe+0xd6/0x150
[ 267.914118] [<ffffffff803a6ad5>] really_probe+0xe5/0x190
[ 267.914126] [<ffffffff803a6dac>] __driver_attach+0x7c/0xd0
[ 267.914132] [<ffffffff803a6d30>] __driver_attach+0x0/0xd0
[ 267.914136] [<ffffffff803a5d29>] bus_for_each_dev+0x49/0x80
[ 267.914147] [<ffffffff803a612e>] bus_add_driver+0x7e/0x1e0
[ 267.914156] [<ffffffff802ac395>] sys_init_module+0x1905/0x1ab0
[ 267.914176] [<ffffffff802315a1>] __up_write+0x21/0x130
[ 267.914193] [<ffffffff8026111e>] system_call+0x7e/0x83
[ 267.914206]
[ 267.914207]
[ 267.914208] Code: 4c 23 28 49 39 d5 0f 46 d9 49 c1 ec 0b 90 4c 89 e0 ba ff ff
[ 267.914216] RIP [<ffffffff80222ef2>] dma_alloc_coherent+0x52/0x240
[ 267.914221] RSP <ffff810001869748>
[ 267.914223] CR2: 0000000000000000
[ 267.914225]

Please fix it.


Steps to reproduce: plug a pcmcia compact flash reader in a pcmcia slot, and 
inspect dmesg.
Comment 1 Andrey Zaitsev 2007-05-03 06:19:12 UTC
Created attachment 11384 [details]
HW info of my laptop
Comment 2 Andrew Morton 2007-05-03 10:38:30 UTC
Could you please verify that the same thing happens if the fglrx
driver has never been loaded (it surely will, but stranger things
have happened)

Thanks.
Comment 3 Andrey Zaitsev 2007-05-03 12:07:40 UTC
Created attachment 11386 [details]
dmesg of i386 system

Yes, the bug is there if no fglrx is loaded. I've booted from Ubuntu's livecd
(there is no fglrx by default) and could not mount the CF card either. Dmesg is
the same. I can post it if needed.

Nevertheless I've just tried i386 version of Ubuntu 7.04 and CF mounts
perfectly in it. I've attached the corresponding part of dmesg - it seems that
everything is fine in 32bit kernel.
Comment 4 Andrey Zaitsev 2007-05-03 12:08:54 UTC
Created attachment 11387 [details]
dmesg of i386 system

Comment 5 Tejun Heo 2007-05-03 12:43:30 UTC
Hmm... It seems to be a bug in x86_64 dma_alloc_coherent() not in libata. 
Cc'ing Andi.  Andi, what do you think?
Comment 6 Alan 2007-06-05 08:05:30 UTC
x86_64 seems to object to the DMA APIs being used with a platform device. I
don't see why on reading the code, so maybe it's misleading.

Can you remove the line which says

            .port_start = ata_port_start

recompile and see what happens ?
Comment 7 Andrey Zaitsev 2007-06-05 10:59:39 UTC
Alan, where should I change this line (which file)?

Thank you.
Comment 8 Alan 2007-06-05 15:25:55 UTC
drivers/ata/pata_pcmcia.c
Comment 9 Natalie Protasevich 2007-07-22 02:39:04 UTC
Any updates on this problem? Andrey, did you get a chance to test according to #6?
Thanks.
Comment 10 Volker Braun 2007-07-22 10:20:28 UTC
I'm suffering from the same bug on my Lenovo ThinkPad T61 running Fedora 7 x86_64, with both the current Fedora kernel and a vanilla 2.6.21.6 kernel. I tried the suggestion from #6 but still get an Oops, although in a slightly different place:

Jul 22 11:06:15 thinkpad kernel: pccard: PCMCIA card inserted into slot 0
Jul 22 11:06:15 thinkpad kernel: cs: memory probe 0x0c0000-0x0fffff: excluding 0xc0000-0xd3fff 0xe0000-0xfffff
Jul 22 11:06:15 thinkpad kernel: cs: memory probe 0x60000000-0x60ffffff: excluding 0x60000000-0x60ffffff
Jul 22 11:06:15 thinkpad kernel: cs: memory probe 0xa0000000-0xa0ffffff: excluding 0xa0000000-0xa0ffffff
Jul 22 11:06:15 thinkpad kernel: cs: memory probe 0xf4000000-0xf7ffffff: excluding 0xf4000000-0xf7ffffff
Jul 22 11:06:15 thinkpad kernel: cs: memory probe 0xf8300000-0xfbffffff: excluding 0xf8300000-0xf86cffff 0xf8e70000-0xf923ffff 0xf9db0000-0xfa17ffff 0xfacf0000-0xfb0bffff
Jul 22 11:06:15 thinkpad kernel: pcmcia: registering new device pcmcia0.0
Jul 22 11:06:15 thinkpad kernel: Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: 
Jul 22 11:06:15 thinkpad kernel:  [<0000000000000000>]
Jul 22 11:06:15 thinkpad kernel: PGD 1086b4067 PUD 1086b8067 PMD 0 
Jul 22 11:06:15 thinkpad kernel: Oops: 0010 [1] SMP 
Jul 22 11:06:15 thinkpad kernel: CPU 1 
Jul 22 11:06:15 thinkpad kernel: Modules linked in: pata_pcmcia i915 drm autofs4 hidp rfcomm l2cap nf_conntrack_netbios_ns nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink ipt_REJECT iptable_filter ip_tables xt_tcpudp ip6t_REJECT ip6table_filter ip6_tables x_tables cpufreq_ondemand acpi_cpufreq dm_mirror dm_multipath dm_mod video sbs ibm_acpi i2c_ec button bay dock battery ac ipv6 parport_pc lp parport loop uinput arc4 ecb blkcipher snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event sdhci snd_seq snd_seq_device hci_usb snd_pcm_oss snd_mixer_oss serio_raw snd_pcm iwl4965 mmc_core snd_timer mac80211 cfg80211 bluetooth snd soundcore pcspkr i2c_i801 snd_page_alloc i2c_core e1000 shpchp sg usb_storage ata_piix ata_generic libata sd_mod scsi_mod ehci_hcd ohci_hcd uhci_hcd
Jul 22 11:06:15 thinkpad kernel: Pid: 3010, comm: modprobe Not tainted 2.6.21.6 #5
Jul 22 11:06:15 thinkpad kernel: RIP: 0010:[<0000000000000000>]  [<0000000000000000>]
Jul 22 11:06:15 thinkpad kernel: RSP: 0000:ffff8101086b1800  EFLAGS: 00010246
Jul 22 11:06:15 thinkpad kernel: RAX: ffffffff8839f920 RBX: ffff810112e4c000 RCX: 0000000000000000
Jul 22 11:06:15 thinkpad kernel: RDX: 0000000000000000 RSI: ffff8101086b1928 RDI: ffff810112e4c538
Jul 22 11:06:15 thinkpad kernel: RBP: ffff810112e4c538 R08: 0000000000000000 R09: ffff810112e4cbc8
Jul 22 11:06:15 thinkpad kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Jul 22 11:06:15 thinkpad kernel: R13: ffff810108732000 R14: ffff8101086b1888 R15: ffff81012b9731e8
Jul 22 11:06:15 thinkpad kernel: FS:  00002b1f9e059250(0000) GS:ffff81010388d740(0000) knlGS:0000000000000000
Jul 22 11:06:15 thinkpad kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 22 11:06:15 thinkpad kernel: CR2: 0000000000000000 CR3: 0000000108736000 CR4: 00000000000006e0
Jul 22 11:06:15 thinkpad kernel: Process modprobe (pid: 3010, threadinfo ffff8101086b0000, task ffff8101086b37c0)
Jul 22 11:06:15 thinkpad kernel: Stack:  ffffffff8805523c 000000000001710e ffffffff803af9d8 ffff81012c67b0c0
Jul 22 11:06:15 thinkpad kernel:  00000010394009a8 0000000000000000 000000000000710e 0000000000000282
Jul 22 11:06:15 thinkpad kernel:  00000000ffffffed ffff81012c67b000 000000000001710e ffff810108732000
Jul 22 11:06:15 thinkpad kernel: Call Trace:
Jul 22 11:06:15 thinkpad kernel:  [<ffffffff8805523c>] :libata:ata_device_add+0x20b/0x4bb
Jul 22 11:06:15 thinkpad kernel:  [<ffffffff803af9d8>] devres_add+0x32/0x45
Jul 22 11:06:15 thinkpad kernel:  [<ffffffff8839d4c6>] :pata_pcmcia:pcmcia_init_one+0x47a/0x4e4
Jul 22 11:06:15 thinkpad kernel:  [<ffffffff803b701e>] pcmcia_device_probe+0xc8/0x127
Jul 22 11:06:15 thinkpad kernel:  [<ffffffff803ac5df>] really_probe+0xc5/0x14a
Jul 22 11:06:15 thinkpad kernel:  [<ffffffff803ac847>] __driver_attach+0x90/0xcd
Jul 22 11:06:15 thinkpad kernel:  [<ffffffff803ac7b7>] __driver_attach+0x0/0xcd
Jul 22 11:06:15 thinkpad kernel:  [<ffffffff803ac7b7>] __driver_attach+0x0/0xcd
Jul 22 11:06:15 thinkpad kernel:  [<ffffffff803aba7f>] bus_for_each_dev+0x43/0x6e
Jul 22 11:06:15 thinkpad kernel:  [<ffffffff803abdc1>] bus_add_driver+0x6b/0x18d
Jul 22 11:06:15 thinkpad kernel:  [<ffffffff8029b8ac>] sys_init_module+0x164a/0x17ac
Jul 22 11:06:15 thinkpad kernel:  [<ffffffff802ab5ac>] audit_syscall_entry+0x141/0x174
Jul 22 11:06:15 thinkpad kernel:  [<ffffffff8025729c>] tracesys+0xdc/0xe1
Jul 22 11:06:15 thinkpad kernel: 
Jul 22 11:06:15 thinkpad kernel: 
Jul 22 11:06:15 thinkpad kernel: Code:  Bad RIP value.
Jul 22 11:06:15 thinkpad kernel: RIP  [<0000000000000000>]
Jul 22 11:06:15 thinkpad kernel:  RSP <ffff8101086b1800>
Jul 22 11:06:15 thinkpad kernel: CR2: 0000000000000000
Jul 22 11:06:19 thinkpad kernel: pccard: card ejected from slot 0
Comment 11 Volker Braun 2007-07-22 11:50:24 UTC
Created attachment 12094 [details]
working patch

I've taken a look at the sources, and not setting port_start clearly yields a NULL dereference since ata_device_add calls ap->ops->port_start(ap);. However, setting port_start/_stop to dummy functions does work, see the accompanying patch. 

With my pata_pcmcia.patch, plugging in my pcmcia compact flash reader works. I successfully read multiple GB of data from 2 different CF cards. 

Is this patch correct for other pcmcia pata devices? It always disables DMA. Now my pcmcia card reader never used DMA, so that's fine, but what about other devices?
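
The failure mode and the workaround described in this comment can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: `struct port`, `struct port_ops`, `device_add` and the two dummy hooks are hypothetical stand-ins, not the libata definitions and not the contents of the attached patch. The NULL check in `device_add` exists only so that the model reports the problem instead of crashing.

```c
#include <stddef.h>

struct port;

/* Hypothetical stand-in for a driver's operations table. */
struct port_ops {
    int  (*port_start)(struct port *ap);
    void (*port_stop)(struct port *ap);
};

/* Hypothetical stand-in for a port that points at its operations table. */
struct port {
    const struct port_ops *ops;
};

/* Dummy hooks: report success without allocating any DMA memory. */
int dummy_port_start(struct port *ap) { (void)ap; return 0; }
void dummy_port_stop(struct port *ap) { (void)ap; }

/*
 * Models a caller that invokes port_start unconditionally.
 * An unset hook is reported as -1 here; code that calls through
 * the NULL pointer without this check would jump to address 0.
 */
int device_add(struct port *ap)
{
    if (ap->ops->port_start == NULL)
        return -1;
    return ap->ops->port_start(ap);
}
```

In this model, a port whose table leaves `port_start` unset makes `device_add` return -1, while a table filled with the dummy hooks makes it return 0. That matches the observation that removing the hook moves the oops rather than curing it.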
Comment 12 Andi Kleen 2007-08-09 09:36:53 UTC
From the oops it looks like dev->dma_mask is NULL

u64             *dma_mask;      /* dma mask (if dma'able device) */

Clearly, passing a non-DMA-able device to dma_alloc_coherent doesn't make sense.

Anyway, I can add a check for that and fall back to ISA DMA
in this case, but it might be better to fix
the callers to pass a real device with a dma mask. If you want to do ISA DMA
and don't know your device, please pass NULL.
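
The check proposed here might look roughly like the following user-space sketch. It is an illustration under stated assumptions, not the x86_64 implementation: `struct device_model`, `ISA_DMA_MASK` and `effective_dma_mask` are hypothetical names, and only the `dma_mask` field quoted above is modelled. The 24-bit value reflects the usual 16 MiB ISA DMA addressing limit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct device; only dma_mask is modelled. */
struct device_model {
    uint64_t *dma_mask;   /* NULL when the device is not DMA-capable */
};

/* ISA DMA can address only the low 16 MiB, i.e. 24 bits. */
#define ISA_DMA_MASK 0x00ffffffULL

/*
 * Combine the caller's addressing limit with the device's mask.
 * If there is no device, or the device has no mask, fall back to
 * the ISA limit instead of dereferencing a NULL pointer.
 */
uint64_t effective_dma_mask(const struct device_model *dev,
                            uint64_t coherent_limit)
{
    if (dev == NULL || dev->dma_mask == NULL)
        return ISA_DMA_MASK;
    return coherent_limit & *dev->dma_mask;
}
```

Without a guard of this kind, reading `*dev->dma_mask` for a device whose mask pointer is NULL is a read from address 0, which is consistent with CR2 being 0000000000000000 in the first oops.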
Comment 13 Alan 2007-09-10 09:38:24 UTC
pcmcia no longer tries to do DMA allocs in this case. For the more general case, it seems to be an issue for the pcmcia driver layer to fix, not libata.
Comment 14 Andrey Zaitsev 2007-09-10 22:44:24 UTC
>>>Any updates on this problem? Andrey, did you get chance to test according to
#6?
>>>Thanks.

Sorry for not answering for a long time. Yes, I've tried the suggestion from post #6 and nothing changed. For some reason I had to switch to 32-bit, but now I'm back on 64-bit, so I'm at your service again.
Comment 15 Andrey Zaitsev 2007-10-19 11:30:07 UTC
Closing this bug - fixed in 2.6.22