Bug 9905 - sdhci module hangs Everex StepNote 2053T
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: MMC/SD
Hardware: All Linux
Importance: P1 high
Assignee: Alan
Reported: 2008-02-06 14:27 UTC by Leann Ogasawara
Modified: 2012-06-18 21:11 UTC
Kernel Version: 2.6.24
Regression: No

Attachments
Version of sdhci.c used in "Run 1" in Comment #9 (41.99 KB, text/plain)
2008-02-16 06:15 UTC, Jennifer Hodgdon
Version of sdhci.c used in Run #2 in Comment #9 (42.02 KB, text/plain)
2008-02-16 06:20 UTC, Jennifer Hodgdon
Kernel log of modprobe run 1 in Comment #9 of Bug 9905 (1.90 KB, text/plain)
2008-02-16 06:22 UTC, Jennifer Hodgdon
Screen shot (photo) of run 1 in Comment #9 of Bug 9905 (111.75 KB, image/jpeg)
2008-02-16 06:23 UTC, Jennifer Hodgdon
Screen shot (photo) of run 2 in Comment #9 of Bug 9905 (77.34 KB, image/jpeg)
2008-02-16 06:24 UTC, Jennifer Hodgdon
Output of lspci -vv (9.75 KB, text/plain)
2008-05-27 07:13 UTC, Jennifer Hodgdon
Boot log (27.53 KB, text/plain)
2008-05-27 07:15 UTC, Jennifer Hodgdon
disassembled DSDT (18.59 KB, application/x-gzip)
2008-09-10 14:16 UTC, pablomme
reassign PCI BARs that conflict with PNP resources (4.81 KB, patch)
2008-09-15 17:05 UTC, Jesse Barnes
reconstructed dmesg (12.39 KB, application/x-gzip)
2008-09-16 10:33 UTC, pablomme
oops (4.82 KB, patch)
2008-09-16 17:14 UTC, Jesse Barnes
dmesg (11.65 KB, application/x-gzip)
2008-09-16 18:56 UTC, pablomme
add debug output and catch more cases (1.35 KB, patch)
2008-09-18 15:40 UTC, Jesse Barnes
dmesg after patch #3 (11.93 KB, application/x-gzip)
2008-09-18 17:17 UTC, pablomme
dmesg of 2.6.35-rc5 with kernel booted with pci=usecrs (33.85 KB, text/plain)
2010-07-20 21:53 UTC, tomas m
report using the everest software under win_PE (194.41 KB, text/plain)
2010-07-21 14:14 UTC, tomas m
BIOS vs Linux vs WinPE resource assignments (2.88 KB, text/plain)
2010-07-21 19:01 UTC, Bjorn Helgaas
dmesg with acpi debug information (31.83 KB, text/plain)
2010-07-21 22:03 UTC, tomas m
dmesg with acpi debug information and kernel built with ACPI_DEBUG (66.20 KB, text/plain)
2010-07-22 00:01 UTC, tomas m
dmesg with pci=use_crs and reserve=ffb0.... (31.25 KB, application/octet-stream)
2010-07-22 18:40 UTC, Arne Fitzenreiter
add PNP resource debug output (2.60 KB, patch)
2010-07-22 22:36 UTC, Bjorn Helgaas
dmesg with patches and pci=use_crs (32.65 KB, application/octet-stream)
2010-07-24 15:54 UTC, Arne Fitzenreiter
Now with pci=bar... (32.20 KB, application/octet-stream)
2010-07-24 15:57 UTC, Arne Fitzenreiter

Description Leann Ogasawara 2008-02-06 14:27:09 UTC
Latest working kernel version:
Earliest failing kernel version: 2.6.20
Distribution: Ubuntu Hardy Alpha
Hardware Environment:
Software Environment:
Problem Description:

Forwarding bug from Ubuntu user who mentioned they also are witnessing this bug in mainline 2.6.24 kernel.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/187671

"On an Everex StepNote 2053T laptop, loading the sdhci module causes the machine to hang completely. SysRq keys don't work, and the only way to reboot is to hold the power switch down.

This is a major problem, because the install disk and/or live CD tries to load the module, so you can't install off a standard Ubuntu disk. The only way to boot up is to blacklist this module, or build a kernel without the sdhci module.

This affects kernels 2.6.20 through 2.6.24; I haven't tested earlier ones. I've found this problem on Ubuntu kernels from Feisty, Gutsy, and Hardy Alpha 3 distributions, as well as straight kernel.org kernels that I built from source (2.6.20, 2.6.22, 2.6.23, and 2.6.24 releases; didn't technically test 2.6.21 but I imagine it's the same). There are no obvious compile options for this module either, and no bootup options.

The card reader on this laptop is apparently an O2 Micro MMC, SD, Memory Stick. Here's the lspci output that seems relevant:

    03:06.2 Generic system peripheral [0805]: O2 Micro, Inc. Integrated MMC/SD Controller [1217:7120] (rev 01)

    03:06.3 Mass storage controller [0180]: O2 Micro, Inc. Integrated MS/xD Controller [1217:7130] (rev 01)

If you need more info, or want someone to test solutions to this bug, be sure to let me know. I'm comfortable applying patches, building kernel.org source, running commands, etc.

The laptop testing page for this laptop (with more info about it) is here: https://wiki.ubuntu.com/LaptopTestingTeam/EverexStepNoteSA2053T "
Comment 1 Jennifer Hodgdon 2008-02-06 14:58:22 UTC
I'm the original Ubuntu reporter. We sent this bug to the kernel mailing list several months back but didn't get any response. Let me know if I can help test any proposed fixes - I have git installed, etc.
Comment 2 Pierre Ossman 2008-02-07 23:08:16 UTC
This is new. Have you tested running without X and checking if there's an oops when the system locks up?

The next step would be to enable MMC_DEBUG and see what the final output lines are. Include "debug" as a parameter to the kernel and it should dump everything to the console.

Also make sure you inform both your laptop vendor and O2 Micro that you are seeing problems because they do not bother to cooperate with the Linux community.
Comment 3 Jennifer Hodgdon 2008-02-08 07:38:52 UTC
When I escape out of X, and run "sudo modprobe sdhci", all I get is a single line saying:

sdhci:slo0: Unknown controller version (16). You may experience problems.

There's no oops. And then the system requires a hard reboot, SysRq doesn't work.

I think I tried MMC_DEBUG on one of the earlier kernels, and didn't get any output, but I'll try again and report back one way or another in a little while. 
Comment 4 Jennifer Hodgdon 2008-02-08 07:51:00 UTC
It turns out I still had CONFIG_MMC_DEBUG in my .config for that build. 

I added debug to the end of the kernel line in grub and booted up again, escaped out of X to a console, and did modprobe sdhci as root, but the output is identical (different time stamp at the beginning of the line is the only difference).

If you have any other ideas for debugging, I'd be glad to try them. I did at some point try adding debug prints into the SDHCI and MMC modules to try and do some debugging myself, but I didn't get anywhere and probably didn't really understand all that was going on (I'm not a device driver programmer)... 

Sorry, I omitted a character in the output line (had to type it in from looking at the other computer screen):

[    90.0464545] sdhci:slot0: Unknown controller version (16). You may experience problems.
Comment 5 Pierre Ossman 2008-02-09 09:58:15 UTC
Annoying. Just to make sure we actually got all the data, you could make one last go after doing "echo 9 > /proc/sysrq-trigger".

The warning about unknown controller version is harmless right now.

I don't have any really good debugging ideas right now. Adding printks is probably the only next step. We need to figure out where it hangs. Since you don't get any more output, it should hang somewhere in the sdhci_probe() function. So start by littering that with printks.
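
A minimal sketch of the kind of marker being suggested here, written as a userspace stand-in so it can be compiled outside the kernel. The names TRACE_HERE, trace_last_line and probe_sketch are made up for illustration and do not exist in sdhci.c; in the driver itself the fprintf() call would be a printk().

```c
#include <stdio.h>

/* Hypothetical helper state: source line of the last marker reached. */
static int trace_last_line;

/*
 * Userspace stand-in for a printk() marker. It records and prints the
 * function name and line number of the place where it is used.
 */
#define TRACE_HERE()                                                  \
	do {                                                          \
		trace_last_line = __LINE__;                           \
		fprintf(stderr, "sdhci-trace: %s:%d\n", __func__,     \
			__LINE__);                                    \
	} while (0)

/* Hypothetical probe skeleton with a marker between each step. */
static int probe_sketch(void)
{
	TRACE_HERE();	/* entered the probe routine */
	/* ... map the registers ... */
	TRACE_HERE();	/* registers mapped */
	/* ... reset the controller, register the host ... */
	TRACE_HERE();	/* about to return */
	return 0;
}
```

The last marker that reaches the screen before the lockup brackets the offending step, provided the console actually shows messages at the chosen log level.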
Comment 6 Jennifer Hodgdon 2008-02-10 10:03:52 UTC
Actually, I've done that. When I put all the debugging print statements in before, as I recall (it was 6 months or so ago), I was able to see all of them through the last line of the sdhci_probe function. They were just simple print statements, like "got to this line", but they all printed out up until that function was returning. 

That was when I stopped debugging, because I had very little idea which functions might be the issue then.

Anyway, if you want to send me a C file with tons of debugging statements, I'd be glad to run it and send a photo of the screen when it locks up. Since the computer ends up totally locked up, it's hard to capture the text...

Alternatively, I can try putting in some debug prints in the sdhci_probe function again and see if I can be more specific about "The print statement here did print out before it locked up" (line numbers, etc.). Just let me know -- I'm happy to spend some time on this, especially if it means the problem might get solved and I could use the card reader, boot live CDs, etc. on this laptop. :) 
Comment 7 Pierre Ossman 2008-02-13 09:59:41 UTC
I'm always short on time, so the more self-sufficient you can be, the faster things will get resolved. :)

It should not be possible for sdhci_probe() to finish without some debug messages from the MMC core. Are you sure you had MMC_DEBUG enabled and debug messages were sent to the console when you tested this?
Comment 8 Jennifer Hodgdon 2008-02-13 13:53:42 UTC
I am certain that the MMC_DEBUG kernel option was turned on in my .config when I compiled the kernel, since it is there now, and I haven't changed that option through several kernel versions at this point. I compiled 2.6.24 on Jan 26. 

I am also certain that I booted up with "debug" (no quotes) added to the end of the kernel line in my grub record when I did the most recent test.

I am also certain that when I did "modprobe sdhci" at the console (after booting up with debug and escaping out of X with ctrl-alt-f2 and logging in to a TTY), I got only the line saying "Unknown controller version". When I do "modprobe -n -v sdhci", it tells me it will load both mmc_core and sdhci, by the way (I've blacklisted "sdhci" for normal use so that I can boot up).

Any ideas? I'm not a kernel expert, so I probably did something wrong there... is there any other kernel option I need to turn on to get output in general, or bootup option I need, or option to modprobe? Hopefully I'm not just being dense.

Anyway, although I am not a kernel or device driver expert, I am a C programmer, so I can certainly add the print statements in again (haven't tried this since the 2.6.20 kernel I think) and see what happens. I will probably have time to do that later this week (have to do some work for my pesky clients first).
Comment 9 Jennifer Hodgdon 2008-02-16 06:11:16 UTC
OK, I had some time, so I did at least some rudimentary debugging -- I put a bunch of print statements in the sdhci.c file -- several in the sdhci_probe and sdhci_probe_slot functions, and then one at the beginning and end of each function in the file (I'll attach the file I used shortly). I also changed a bunch of the INFO and WARN level prints in the file to ERR (to make sure I would see them), and in the kernel.h file, changed the debug print macro to use ERR level too (I'm still not sure why I was unable to see WARN level printing in my previous testing, but this took care of that issue!) Then I compiled, and put the new mmc_core.ko and sdhci.ko modules into my kernel location (those appeared to be the only two modules loaded when I did modprobe -v -n sdhci). 

Here's what happened:

Run 1: With all the debug statements enabled, I did "modprobe -v sdhci" at the console and captured the beginning of the output in kernel.log before the computer locked up, and took a photo of the screen after it locked up, but there was a gap in the middle and I couldn't tell whether or not it had definitely finished sdhci_probe before locking. I'll attach the lines from the log and the screen shot (literally a photo of the screen). (I realize it may be possible to do something with a serial connection and get the full text output, but this laptop doesn't have a serial port, so I'm not sure, and anyway I didn't do that.) The last things it did were MMC saying "starting CMD0 arg 00000000 flags 000000c0", which set off a bunch of stuff in the sdhci module, finishing with a call to sdhci_tasklet_finish and then sdhci_set_ios, which both completed, and then I got a linux prompt back and it locked up.

Run 2: I commented out some of the debug statements that were clogging the screen at the end of run 1 and recompiled, this time just copying over the sdhci.ko module (since that's all I had changed). Again, I did "modprobe -v sdhci" at the console. This time there wasn't much in the kernel log, but I did verify in the screen shot (which I'll attach) that the sdhci_probe function finished completely before it locked up: the print statement at the end of sdhci_probe was the last output before the Linux prompt came back. After that there was some more console output that looks like 3 calls to set_ios associated with some MMC "clock" output, then the above-mentioned MMC "starting CMD0 arg 00000000 flags 000000c0", with the same result (locked up after getting a Linux prompt back).

I'll leave the interpretation to you, since I don't really know anything about device drivers in general or MMC/SDHCI in particular... the module definitely is getting through modprobe and then starting on its regular business before it locks up, though. 

If you can suggest any additional debugging (e.g. attach a new sdhci.c file to try with more print statements, or some suggestions of where I might put some more that would give you more information), please let me know... I think I've done about all I can. I'll attach the files now so you can see what I saw.
Comment 10 Jennifer Hodgdon 2008-02-16 06:15:34 UTC
Created attachment 14866 [details]
Version of sdhci.c used in "Run 1" in Comment #9

This is sdhci.c with a lot of print statements and some WARN/INFO statements changed to ERR, to get a lot of debugging output. See Comment #9 of Bug 9905 for details.
Comment 11 Jennifer Hodgdon 2008-02-16 06:20:00 UTC
Created attachment 14867 [details]
Version of sdhci.c used in Run #2 in Comment #9

This is sdhci.c with a few fewer print statements and some WARN/INFO statements changed to ERR, to get a little bit less debugging output. See Comment #9 of Bug 9905 for details.
Comment 12 Jennifer Hodgdon 2008-02-16 06:22:09 UTC
Created attachment 14868 [details]
Kernel log of modprobe run 1 in Comment #9 of Bug 9905

These are the lines created in the kernel log before the computer locked up, in run 1 described in Comment #9 of Bug 9905 -- see also the screen shot I'm about to attach.
Comment 13 Jennifer Hodgdon 2008-02-16 06:23:39 UTC
Created attachment 14869 [details]
Screen shot (photo) of run 1 in Comment #9 of Bug 9905
Comment 14 Jennifer Hodgdon 2008-02-16 06:24:06 UTC
Created attachment 14870 [details]
Screen shot (photo) of run 2 in Comment #9 of Bug 9905
Comment 15 Pierre Ossman 2008-02-25 03:55:18 UTC
In the first photo, it has indeed finished probing. So the fact that it locks up there is extremely odd as the driver will no longer be poking the hardware.

The second photo seems to be incorrect. In the text you say that you get the prompt back, but there is no prompt in the photo. So it's difficult to make any qualified guesses from that.

Since there is some delay to the hang, we need to start taking pieces out and see what it is that provokes the hang. As you do not have a card in the slot, the driver will not send any actual command requests to the controller. But there is a lot of other fiddling going on. Try disabling each of the following and see which one makes the problem go away:

1. LED control. Modify sdhci_activate_led() and sdhci_deactivate_led() to just return early.

2. Reset on each finished request. Comment out the big if-clause in sdhci_tasklet_finish().

3. Assorted hardware fiddling in sdhci_set_ios(). Disable each of the different sections individually by commenting them out. (It's sufficient to remove the write calls; just don't forget sdhci_set_clock().)

There is also another way you might pinpoint the problem. You can configure the kernel to add a delay after each printk(). You can find that setting under the debugging options in the kernel config.
Comment 16 Jennifer Hodgdon 2008-02-25 07:55:40 UTC
In the second photo, the prompt is up in the middle of the screen; you are right there is not one at the end. The photo is not "incorrect" -- it's a photo of my actual screen when it locked up on the second run.

Anyway, I will try your debugging suggestions and report back, in the next few days. Thanks for being interested in solving this!  

By the way, I do not think there is an LED associated with this drive on my laptop. At least, there is none in evidence near the drive.
Comment 17 Jennifer Hodgdon 2008-02-26 07:51:45 UTC
I have some more information for you.

I tried your suggestions from Comment #15, and nothing worked -- the laptop still locked up and I didn't get any more information about where.

So I went even more drastic: in *every* function inside sdhci.c (except the sdhci_probe and sdhci_probe_slot functions), I put a return right at the top. And it still locked up. So I am pretty sure the locking up is happening in the mmc_core module, rather than actually happening inside an sdhci.c function. 

Just loading mmc_core with modprobe doesn't cause the lockup, though. You have to load mmc_core and run the modprobe sdhci functionality (which is all I currently have enabled in sdhci) for it to lock up.

So, do you have any debugging hints for mmc_core? I'll try blindly commenting out stuff, adding debug statements, etc. in the meantime, and I'll let you know if I find anything.
Comment 18 Pierre Ossman 2008-03-01 05:47:24 UTC
(In reply to comment #16)
> By the way, I do not think there is an LED associated with this drive on my
> laptop. At least, there is none in evidence near the drive.

That's quite common, but the driver cannot determine this. That's why I wanted you to test removing that code in case they've done some stupid wiring that causes a hang when the LED pin is used.

(In reply to comment #17)
> 
> So I went even more drastic: in *every* function inside sdhci.c (except the
> sdhci_probe and sdhci_probe_slot functions), I put a return right at the top.
> And it still locked up. So I am pretty sure the locking up is happening in
> the mmc_core module, rather than actually happening inside an sdhci.c function.
> 

That's extremely odd. It's just sdhci that pokes hardware, so the mmc core shouldn't be able to lock up your system (unless it corrupts memory).

Since you've still got the probing in there my first guess is that the init sequence is enough to kill the machine.

Try the following:

1. Avoid registering the device with the mmc layer. Comment out mmc_add_host() and mmc_remove_host() in sdhci.c.

2. If the above still hangs, try also modifying sdhci_probe() to believe that sdhci_probe_slot() failed. The effect should be that it powers up the chip, then directly powers it down again.
Comment 19 Jennifer Hodgdon 2008-03-04 11:39:29 UTC
Well, those two suggestions didn't quite work, but I was able to make the problem go away by putting 

return -ENODEV;

right at the top of sdhci_probe_slot. I will try a "binary search" method (move it down to the middle of that function, then the middle of the non-functional section, etc.) to see if I can narrow down the exact offending line. But I guess it's somewhere in there: if sdhci probes the slot fully, even ignoring the successful return value, the computer hangs afterwards. Some progress... maybe we'll track this down after all?
Comment 20 Pierre Ossman 2008-03-04 21:55:01 UTC
So if it still locks up even using method 2, it means that just activating the hardware for a brief period kills the machine.

Pinpointing the problem in the probe routine sounds good, yes. Be aware that a simple return will leak memory and all kinds of resources. Try using gotos to the cleanup portion at the end of sdhci_probe_slot().
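
As a rough illustration of the goto-based bail-out described here, the sketch below is a userspace stand-in, not the real sdhci_probe_slot(). Every name in it (probe_slot_sketch, bail_at, tracked_alloc, tracked_free, live_allocations) is hypothetical, and malloc() merely stands in for whatever the driver actually acquires.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical counter that makes a leaked resource visible. */
static int live_allocations;

static void *tracked_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocations++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		live_allocations--;
		free(p);
	}
}

/*
 * bail_at selects the step after which the routine gives up
 * (0 = run to the end), mimicking a debugging early exit that is
 * moved up and down during a binary search.
 */
static int probe_slot_sketch(int bail_at)
{
	void *host = NULL, *regs = NULL;
	int ret;

	host = tracked_alloc(64);	/* stands in for the host structure */
	if (!host)
		return -ENOMEM;
	if (bail_at == 1) {
		ret = -ENODEV;
		goto free_host;		/* undo only what was set up so far */
	}

	regs = tracked_alloc(256);	/* stands in for the register mapping */
	if (!regs) {
		ret = -ENOMEM;
		goto free_host;
	}
	if (bail_at == 2) {
		ret = -ENODEV;
		goto unmap;
	}

	/* For this sketch everything is released again even on success. */
	ret = 0;
unmap:
	tracked_free(regs);
free_host:
	tracked_free(host);
	return ret;
}
```

Moving the bail-out point only means changing which label is jumped to, so each step of the search leaves nothing allocated behind.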
Comment 21 Jennifer Hodgdon 2008-03-05 07:17:26 UTC
Thanks, I figured that out. :) 

I actually have it narrowed down now to the debug routine sdhci_dumpregs that prints all the registers, which is called at the end of sdhci_probe_slot if you have the MMC_DEBUG config option turned on (which I do). With my current (almost completely commented out) version of sdhci.c, which is only basically doing sdhci_probe_slot and then returning, if I remove that call to sdhci_dumpregs, the computer no longer hangs after modprobe. It was an exciting moment, not to have to do a hard reboot! 

So I'm now trying to pinpoint which exact memory read(s) is/are causing the trouble. The host->ioaddr address doesn't look suspicious, and leaving in the earlier read in sdhci_probe_slot that read the version information does not cause the computer to hang.

Presumably, in the non-MMC-debug version of the kernel, similar reads/writes in the non-debug parts of the sdhci routines are causing the problems... ?? anyway, I'll see if I can narrow it down to a particular address or addresses and then give you a full report.
Comment 22 Jennifer Hodgdon 2008-03-05 07:21:52 UTC
By the way, before hanging, the printout is giving me 0x00000000 for a lot of the results in that register dump. Is that normal? Could be partly a result of all the routines I have commented out (if the sdhci module normally would be setting those registers), but it seemed a bit suspicious to me. 
Comment 23 Jennifer Hodgdon 2008-03-05 15:30:55 UTC
OK. I haven't tested all the calls in sdhci_dumpregs, but I've tested a bunch of them. Here are some results:

These reads are OK -- leave these un-commented and the computer doesn't hang after my current attenuated modprobe:

readw(host->ioaddr + SDHCI_HOST_VERSION)
readw(host->ioaddr + SDHCI_BLOCK_SIZE)
readl(host->ioaddr + SDHCI_ARGUMENT)
readl(host->ioaddr + SDHCI_CAPABILITIES)

These reads are not OK -- uncommenting any of these reads will cause the computer to hang after modprobe:

readl(host->ioaddr + SDHCI_DMA_ADDRESS)
readw(host->ioaddr + SDHCI_BLOCK_COUNT)
readw(host->ioaddr + SDHCI_TRANSFER_MODE)

Any thoughts now? This doesn't look all that good...
Comment 24 Pierre Ossman 2008-03-09 08:01:18 UTC
(In reply to comment #21)
> 
> Presumably, in the non-MMC-debug version of the kernel, similar reads/writes
> in the non-debug parts of the sdhci routines are causing the problems... ??

Yes. All of those registers are heavily used during normal operation.

(In reply to comment #22)
> By the way, before hanging, the printout is giving me 0x00000000 for a lot of
> the results in that register dump. Is that normal? Could be partly a result
> of all the routines I have commented out (if the sdhci module normally would be
> setting those registers), but it seemed a bit suspicious to me. 
> 

It's quite normal for those registers to have value 0. Most things are designed to have 0 be the default value.

(In reply to comment #23)
> 
> These reads are not OK -- uncommenting any of these reads will cause the
> computer to hang after modprobe:
> 
> readl(host->ioaddr + SDHCI_DMA_ADDRESS)
> readw(host->ioaddr + SDHCI_BLOCK_COUNT)
> readw(host->ioaddr + SDHCI_TRANSFER_MODE)
> 
> Any thoughts now? This doesn't look all that good...
> 

This is of course extremely odd. But all of those can be avoided without affecting functionality. So the next step would be to remove the reads of those registers and see if you can get a working controller.

There are two reads for SDHCI_DMA_ADDRESS, two for SDHCI_BLOCK_COUNT and one for SDHCI_TRANSFER_MODE. You can just comment them out in all but one of the cases. For SDHCI_BLOCK_COUNT, you need to compute bytes_xfered as data->blksz * data->blocks for now.
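
The temporary bytes_xfered computation mentioned above could look roughly like this. The struct is a trimmed-down, hypothetical stand-in that keeps only the blksz and blocks fields named in the comment, and the function name is made up. The assumption, labeled as such, is that every requested block was transferred, which would overstate the count if a transfer stopped early.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the request's data descriptor. */
struct data_sketch {
	unsigned int blksz;	/* bytes per block */
	unsigned int blocks;	/* number of blocks requested */
};

/*
 * Assume the whole request completed: byte count is block size times
 * block count, so SDHCI_BLOCK_COUNT never has to be read.
 */
static size_t bytes_xfered_assumed(const struct data_sketch *data)
{
	assert(data != NULL);
	return (size_t)data->blksz * data->blocks;
}
```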
Comment 25 Jennifer Hodgdon 2008-03-13 08:57:45 UTC
In my last report, I had only tested some of the reads in that sdhci_dumpregs function. Now I have tested all of them, and I found some more reads that cause the machine to hang (actually, about half of them do). 

So, here's a list of all the reads in the sdhci_dumpregs function, along with their hex offsets; whether they are doing a readl (L), readw (W), or readb (B); and whether doing that particular read causes the machine to hang or not:

SDHCI_DMA_ADDRESS = 0x00 L (hang)
SDHCI_BLOCK_SIZE = 0x04 W (ok)
SDHCI_BLOCK_COUNT = 0x06 W (hang)
SDHCI_ARGUMENT = 0x08 L (ok)
SDHCI_TRANSFER_MODE = 0x0C W (hang)
SDHCI_PRESENT_STATE = 0x24 L (ok)
SDHCI_HOST_CONTROL = 0x28 B (hang)
SDHCI_POWER_CONTROL = 0x29 B (ok)
SDHCI_BLOCK_GAP_CONTROL = 0x2A B (hang)
SDHCI_WAKE_UP_CONTROL = 0x2B B (ok)
SDHCI_CLOCK_CONTROL = 0x2C W (hang)
SDHCI_TIMEOUT_CONTROL = 0x2E B (ok)
SDHCI_INT_STATUS = 0x30 L (hang)
SDHCI_INT_ENABLE = 0x34 L (ok)
SDHCI_SIGNAL_ENABLE = 0x38 L (hang)
SDHCI_ACMD12_ERR = 0x3C W (ok)
SDHCI_CAPABILITIES = 0x40 L (ok)
SDHCI_MAX_CURRENT = 0x48 L (hang)
SDHCI_SLOT_INT_STATUS = 0xFC W (hang)
SDHCI_HOST_VERSION = 0xFE W (ok)

I am not seeing a pattern there...

Just for completeness, here is the kernel log output for the sdhci_probe_slot function -- I added a line that prints out the value of host->ioaddr, and replaced all the items marked "hang" above with 0 in the register dump function:

sdhci [sdhci_probe_slot()]: slot 0 at 0xffbfe800, irq 19
sdhci [sdhci_probe_slot()]: IOADDR is 0xf8a7e800
sdhci:slot0: Unknown controller version (16). You may experience problems.
sdhci [sdhci_probe_slot()]: Controller doesn't have DMA capability
sdhci: ============== REGISTER DUMP ==============
sdhci: Sys addr: 0x00000000 | Version:  0x00001010
sdhci: Blk size: 0x00000000 | Blk cnt:  0x00000000
sdhci: Argument: 0x00000000 | Trn mode: 0x00000000
sdhci: Present:  0x01fa0000 | Host ctl: 0x00000000
sdhci: Power:    0x00000000 | Blk gap:  0x00000000
sdhci: Wake-up:  0x00000000 | Clock:    0x00000000
sdhci: Timeout:  0x00000000 | Int stat: 0x00000000
sdhci: Int enab: 0x00000000 | Sig enab: 0x00000000
sdhci: AC12 err: 0x00000000 | Slot int: 0x00000000
sdhci: Caps:     0x038021a1 | Max curr: 0x00000000
sdhci: =============================

Let me know if you still think it is worthwhile to comment out all the lines concerning all of these registers. I should have some time next week to try that, if so.
Comment 26 Jennifer Hodgdon 2008-03-13 09:04:00 UTC
Just one observation/question: host->addr is coming out at 0xffbfe800, and host->ioaddr is coming out at 0xf8a7e800 -- is it normal for host->ioaddr to be *below* host->addr? I don't know much of anything about the SDHCI module... just thought I would point it out in the above output.
Comment 27 Pierre Ossman 2008-03-15 08:37:22 UTC
(In reply to comment #25)
> 
> I am not seeing a pattern there...
> 

I do. It is very close to every other read being a hang. How did you do this testing? Have you tried leaving a single read at a time?

> 
> Let me know if you still think it is worthwhile to comment out all the lines
> concerning all of these registers. I should have some time next week to try
> that, if so.
> 

It would seem that would be entirely insufficient. With so many registers causing a hang, I suspect there is a more general problem.

(In reply to comment #26)
> Just one observation/question: host->addr is coming out at 0xffbfe800, and
> host->ioaddr is coming out at 0xf8a7e800 -- is it normal for host->ioaddr to
> be *below* host->addr? I don't know much of anything about the SDHCI module...
> just thought I would point it out in the above output.
> 

I'm not familiar with the mapper algorithms, but this result is not very surprising since the physical address is very close to the 4 GB limit.
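
The "every other read" observation can be checked mechanically against the list in comment #25. The sketch below simply encodes that list in the order it was posted (by register offset) and counts how often neighbouring entries differ. The struct and function names are made up for illustration, and the posted order is not necessarily the order in which sdhci_dumpregs issues its reads.

```c
#include <stddef.h>

/* One row of the list in comment #25: register offset and outcome. */
struct reg_result {
	unsigned int offset;
	int hangs;	/* 1 = "hang", 0 = "ok" */
};

static const struct reg_result results[] = {
	{ 0x00, 1 }, { 0x04, 0 }, { 0x06, 1 }, { 0x08, 0 }, { 0x0C, 1 },
	{ 0x24, 0 }, { 0x28, 1 }, { 0x29, 0 }, { 0x2A, 1 }, { 0x2B, 0 },
	{ 0x2C, 1 }, { 0x2E, 0 }, { 0x30, 1 }, { 0x34, 0 }, { 0x38, 1 },
	{ 0x3C, 0 }, { 0x40, 0 }, { 0x48, 1 }, { 0xFC, 1 }, { 0xFE, 0 },
};

#define NUM_RESULTS (sizeof(results) / sizeof(results[0]))

/* Count neighbouring entries whose outcome differs (ok next to hang). */
static size_t count_alternations(const struct reg_result *r, size_t n)
{
	size_t i, count = 0;

	for (i = 1; i < n; i++)
		if (r[i].hangs != r[i - 1].hangs)
			count++;
	return count;
}

/* Count the entries marked "hang". */
static size_t count_hangs(const struct reg_result *r, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (r[i].hangs)
			count++;
	return count;
}
```

On the posted list, 17 of the 19 neighbouring pairs alternate and 10 of the 20 reads hang, which is consistent with "very close to every other read" without being an exact alternation.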
Comment 28 Jennifer Hodgdon 2008-03-17 08:16:58 UTC
(In reply to comment #27)
> (In reply to comment #25)
> > 
> > I am not seeing a pattern there...
> > 
> 
> I do. It is very close to every other read being a hang. How did you do this
> testing? Have you tried leaving a single read at a time?

Sorry, I should have explained. I started at the top of the sdhci_dumpregs function, and commented out all of the readl, readw, and readb lines. Then I uncommented one. If the computer didn't hang, I would leave that one uncommented, and then uncomment another one. If it made the computer hang, I would comment it out and uncomment the next one.

So by the end, I had all the ones marked "OK" in the above list uncommented, and all the ones marked "not OK" commented out. So I don't think it's just an "every other read" problem, because I have about half of them running with no problems.
Comment 29 Pierre Ossman 2008-03-24 06:06:05 UTC
(In reply to comment #28)
> 
> So by the end, I had all the ones marked "OK" in the above list uncommented,
> and all the ones marked "not OK" commented out. So I don't think it's just an
> "every other read" problem, because I have about half of them running with no
> problems.
> 

Still, there is a very uncanny pattern there. Could you do some other variants so that we are 100% sure that it is the actual registers, and not just the access pattern that is the problem?
Comment 30 Jennifer Hodgdon 2008-03-25 14:12:17 UTC
(In reply to comment #29)
> Still, there is a very uncanny pattern there. Could you do some other
> variants so that we are 100% sure that it is the actual registers, and not just the
> access pattern that is the problem?

Good thought.

So, I took my working list of 10 reads (which doesn't hang the machine), and substituted SDHCI_DMA_ADDRESS (which I had marked as not working) in the place of SDHCI_ARGUMENT (which happens somewhere in the middle of the function, and both are readl calls). 

The machine hung when I did modprobe sdhci.

Just to get one more data point, I then put SDHCI_ARGUMENT back in, and instead put SDHCI_DMA_ADDRESS in the place of SDHCI_INT_ENABLE. 

This time it didn't hang. 

So I guess you are right, that it's something else, not those particular addresses. Still, it's not as simple as "every other read", because I have a list of 10 reads enabled, and substituting SDHCI_DMA_ADDRESS in at one position in the list is OK, and in another one it isn't; with the original list it's OK too. 

I'm more perplexed than before....
Comment 31 Pierre Ossman 2008-03-30 11:38:47 UTC
I share that sentiment. It might be something on the PCI level that is misbehaving. Unfortunately, that's a bit outside of my expertise.

Could you try some different combinations and see if you can figure out exactly what makes things hang? I'll see if I can get someone else to also have a look at this.
Comment 32 Jennifer Hodgdon 2008-04-04 07:41:02 UTC
I don't really know what to try next... it just seems random to me, and each test of some configuration takes quite a while to perform (change code, re-compile module, try modprobe, reboot -- even if the machine doesn't hang, I have to reboot to retry, because rmmod/modprobe after a "successful" modprobe doesn't follow the same code path). 

I don't really understand why *reading* from a particular address could cause the whole machine to hang in the first place. Do you? 
Comment 33 Pierre Ossman 2008-04-05 06:48:02 UTC
(In reply to comment #32)
> I don't really know what to try next... it just seems random to me, and each
> test of some configuration takes quite a while to perform (change code,
> re-compile module, try modprobe, reboot -- even if the machine doesn't hang,
> I have to reboot to retry, because rmmod/modprobe after a "successful" modprobe
> doesn't follow the same code path). 

Sorry about that, but the problem is so completely weird that debugging is more or less wild guessing and then testing those guesses. We can hold off a bit until some PCI expert can have a look though.

> 
> I don't really understand why *reading* from a particular address could cause
> the whole machine to hang in the first place. Do you? 
> 

Not the slightest.
Comment 34 Pierre Ossman 2008-04-05 06:49:26 UTC
Greg, could you have a look at this? Looks to me more like some low level PCI problem, than a driver bug.
Comment 35 Pierre Ossman 2008-05-26 03:50:47 UTC
Jesse, could you please have a look at this bug now that you're the new PCI maintainer?
Comment 36 Jesse Barnes 2008-05-26 10:14:08 UTC
This looks like an ugly one...  So we have some sequence of MMIO reads (readl etc.) that can cause the machine to hang?  Normally reads that don't complete should result in a PCI master abort, which should give you all 1s in the read result (0xffffffff or whatever your register size was); the fact that it hangs in this case is strange.

Can you attach the output of 'lspci -vv' from your machine, along with the boot log?  It may be that you have a hardware problem, or that one of the PCI bridges on the way out to this device is misconfigured somehow.  Or there could be an overlap between what the sdhci device decodes and some other device...
Comment 37 Arne Fitzenreiter 2008-05-26 14:51:32 UTC
My Averatec has the same problem. I have seen that Windows reconfigures the memory areas of the sdhci and the 8139c, so I also think there is a configuration problem.

http://bugzilla.kernel.org/show_bug.cgi?id=10231
Comment 38 Jennifer Hodgdon 2008-05-27 07:13:25 UTC
Created attachment 16294 [details]
Output of lspci -vv
Comment 39 Jennifer Hodgdon 2008-05-27 07:15:41 UTC
Created attachment 16295 [details]
Boot log

Hopefully this is what you mean by the "boot log"? I am attaching the output of dmesg after rebooting. I'm currently running the Ubuntu kernel that came with 8.04, which is a 2.6.24 derivative. The sdhci module is, of course, blacklisted (since loading it hangs the system).
Comment 40 Jennifer Hodgdon 2008-05-27 07:17:05 UTC
If there's anything else I can do to help debug this, or any other output you need, let me know... As you can probably tell from above, I am not afraid to try things and make the machine hang... :) 
Comment 41 Pierre Ossman 2008-07-23 07:48:40 UTC
Reassigning to Jesse so he doesn't forget about this bug. ;)
Comment 42 Jesse Barnes 2008-07-23 09:36:39 UTC
Ah, and I *had* forgotten about it; ignorance is bliss. :)

I didn't see Arne's update in #37 and Jennifer's subsequent update of #10231.  It sounds like the 8139 probably decodes a range that overlaps with the SD controller.  If they both respond to MMIO reads the bus could hang, freezing the machine.

I wonder if we could add a quirk to increase the MMIO resource size for 8139?  Question is, how big should it be?

Hm, let's see if the realtek windows driver release notes have anything useful...  Nope.

But working on the assumption that the size is the problem (but assuming 8139 isn't *totally* broken and doesn't decode everything on the bus) we can try increasing the size.  Something like this might work (you may have to correct the device or vendor ID if I got it wrong).

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 338a3f9..3377907 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1381,6 +1381,21 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_TOSHIBA_2,
                         PCI_DEVICE_ID_TOSHIBA_TC86C001_IDE,
                         quirk_tc86c001_ide);

+/*
+ * Apparently 8139 chips decode more than their advertised MMIO range.
+ * Increase the size to avoid conflicts.
+ */
+static void __init quirk_8139_mmio_size(struct pci_dev *dev)
+{
+       struct resource *r = &dev->resource[1];
+
+       r->start = 0;
+       r->end = 0xfffff;
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_REALTEK,
+                        PCI_DEVICE_ID_REALTEK_8139,
+                        quirk_8139_mmio_size);
+
 static void __devinit quirk_netmos(struct pci_dev *dev)
 {
        unsigned int num_parallel = (dev->subsystem_device & 0xf0) >> 4;
Comment 43 Jennifer Hodgdon 2008-07-24 07:47:49 UTC
Thanks for giving this some attention again!

Those vendor/device IDs look correct to me -- at least, they are used elsewhere in 8139-related driver files. And it sounds like an interesting hypothesis. Since both 8139too and sdhci had issues on this laptop, it would make some sense that one could be causing the other's problem. 

So I'll give this patch a shot in the next day or two. First I have to pull/checkout/build 2.6.26 and make sure that is working on this machine... I haven't built the kernel in a while (Ubuntu flipped the problematic MMIO/PIO 8139too config setting with the release of 8.04, so I can use their kernels' version of 8139too now). (I'm sure you kernel folks would be horrified to use a kernel two versions back and with Ubuntu's mods, but I am just happy to have something that works. :) ) (Of course, it would be even better if the card reader worked too.)
Comment 44 Jennifer Hodgdon 2008-07-26 07:12:52 UTC
Well, I got the 2.6.26 kernel built, booted up to make sure it ran on my laptop, and verified that doing modprobe sdhci still hung the system (no surprise there). 

Then I put in the patch from Comment #42 above, did a complete kernel rebuild from clean (just to make sure), installed that kernel, rebooted, and... unfortunately doing sudo modprobe sdhci still hangs the system.

So... too bad!

Any other ideas?
Comment 45 Jesse Barnes 2008-07-27 17:46:43 UTC
Hm... can you verify that the quirk is being executed and affecting the resource size used by the driver?  Assuming that part is working, you could try making the resource reservation even larger in the quirk...
Comment 46 Jennifer Hodgdon 2008-07-28 07:12:23 UTC
I can try putting in some kind of a kernel log statement to verify it's being executed. 

As far as making it even bigger, can you give me a bit of guidance -- how big can it be?
Comment 47 Jesse Barnes 2008-07-28 09:46:37 UTC
On Monday, July 28, 2008 7:12 am bugme-daemon@bugzilla.kernel.org wrote:
> ------- Comment #46 from yahgrp@poplarware.com  2008-07-28 07:12 -------
> I can try putting in some kind of a kernel log statement to verify it's
> being executed.
>
> As far as making it even bigger, can you give me a bit of guidance -- how
> big can it be?

Oh you can probably make it pretty big (~256M) before you start having 
trouble, depending on how much RAM and how many PCI devices you have and how much 
address space they want.  Hopefully the device isn't so broken that you'll 
need a full 256M window reserved for it though.

Jesse
Comment 48 Jennifer Hodgdon 2008-07-29 11:51:22 UTC
OK, I put in a printk statement and verified the quirk function was being run.

Then I tried some different values for r->end in the patch from Comment #42 above:
  - 0xfffff  -- still hangs when I modprobe sdhci
  - 0xfffffff -- still hangs
  - 0xffffffffffff -- compile warning "integer constant is too large for 'long' type" (I'm running/compiling a 32-bit kernel for obscure reasons, and apparently that struct element is an unsigned long)
  - 0xffffffff (max allowed in struct) -- still hangs

So, I guess there is either something wrong with the codes, or the idea doesn't work. I'll reopen the bug... 
Comment 49 Jennifer Hodgdon 2008-07-29 12:00:04 UTC
By the way, it also looks like with the quirk resource -> end set to 0xffffffff, the 8139too module is not working. At least, I don't seem to have any network interface coming up. I haven't investigated further... 
Comment 50 Jesse Barnes 2008-07-29 12:45:35 UTC
Yeah, setting it to take that much space will just fail (or eat *all* of your address space :)...
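
To put numbers on the values tried in comment #48: start and end in a resource are both inclusive, so the window size is end - start + 1. The sketch below is plain user-space C; window_size is a hypothetical helper name used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Size in bytes of an address window whose start and end are both
 * inclusive, as in the quirk from comment #42 (start = 0).
 */
static uint64_t window_size(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return end - start + 1;
}
```

With start = 0, an end of 0xfffff gives 1 MiB, 0xfffffff gives 256 MiB, and 0xffffffff gives 4 GiB, which is the entire 32-bit address space.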

Hm, too bad that idea didn't work.  I guess we need another type of quirk that forces 8139 into PIO mode.  Looks like the driver already has one check, maybe we can just add another like so?

diff --git a/drivers/net/8139too.c b/drivers/net/8139too.c
index 8a5b0d2..2dfccd0 100644
--- a/drivers/net/8139too.c
+++ b/drivers/net/8139too.c
@@ -953,8 +953,10 @@ static int __devinit rtl8139_init_one (struct pci_dev *pdev,

        if (pdev->vendor == PCI_VENDOR_ID_REALTEK &&
            pdev->device == PCI_DEVICE_ID_REALTEK_8139 &&
-           pdev->subsystem_vendor == PCI_VENDOR_ID_ATHEROS &&
-           pdev->subsystem_device == PCI_DEVICE_ID_REALTEK_8139) {
+           ((pdev->subsystem_vendor == PCI_VENDOR_ID_ATHEROS &&
+             pdev->subsystem_device == PCI_DEVICE_ID_REALTEK_8139) ||
+            (pdev->subsystem_vendor == 0x14ff &&
+             pdev->subsystem_device == 0xa003))) {
                printk(KERN_INFO "8139too: OQO Model 2 detected. Forcing PIO\n");
                use_io = 1;
        }
Comment 51 Jennifer Hodgdon 2008-07-29 15:06:59 UTC
Are you aware that on this particular laptop, compiling the 8139too module without
   CONFIG_8139TOO_PIO=y
in the .config makes it so that loading the 8139too module causes the laptop to hang? So I am already compiling with that flag set. It seems as though that should be forcing it into PIO mode? Not sure, I really don't know much about device drivers (actually, I have no idea what PIO mode means). 

Let's see. Looking at the 8139too.c file, this config option sets a flag
    #define USE_IO_OPS 1
up towards the top of the file, which causes several things to happen farther down.... Is that the same thing as what your patch would do? The patch you suggested doesn't seem to apply to the 8139too.c file I have anyway. Which version are you patching against? I am building 2.6.26 currently. The context of the patched lines isn't right... those lines don't exist in my file, and the use_io variable doesn't exist.
Comment 52 Jennifer Hodgdon 2008-07-29 15:28:52 UTC
Someone just emailed me privately about this bug and explained what PIO and MMIO mean... thanks!

Also he had the suggestion that maybe both the 8139too in MMIO mode and the sdhci module are interfering with some 3rd device, rather than with each other. Is that possible? 
Comment 53 Jesse Barnes 2008-07-29 18:10:55 UTC
Oh, 10231 made me think that it was 8139's MMIO usage that was causing trouble, I missed that Ubuntu already disabled 8139 MMIO.

So yeah it's probably a more general MMIO bug on this platform.

Pierre, is there a way of running SDHCI in PIO only mode?  I wonder what Windows does on this platform to avoid this problem, maybe there's some special bridge programming needed.
Comment 54 Jennifer Hodgdon 2008-07-30 08:14:04 UTC
Yes, the Ubuntu team disabled 8139 MMIO because of this laptop. Someone had filed a bug, and either I or someone else (I don't recall) pointed out what the solution was, and they flipped the config bit, which enabled all of us poor souls who own one of these laptops to thereby update the kernel using the standard end-user upgrade path. As far as I know, Ubuntu is currently the only distro that can be installed from one of its install CDs on this laptop (with some shenanigans to blacklist SDHCI on initial bootup). (Maybe Gentoo would be an exception, as I think you configure as you go? I haven't tried that route.) All other install CDs we tried last year when I first got this laptop (and we tried a LOT of distros of many flavors) hang during initial bootup. It was only because some guy in Germany who owned a clone of this laptop had created an alternate Ubuntu install CD (without the SDHCI module and with 8139too set to PIO) that I was even able to get linux running on it, and that's why I am using Ubuntu.

Anyway, all that aside, I like the idea of making the SDHCI use PIO, if it's possible, since that was the solution to the 8139 problem that got this laptop running in the first place.

By the way, I have read somewhere that the 2.6.27 kernel has some MMIO debugging built into it. (I am still compiling/running 2.6.26.) Do you think it would be helpful if I pulled 2.6.27 RC1 (or whatever the latest RC is) and tried it out? Would it be likely to give us some useful information on the SDHCI problems? If so, can either Jesse or Pierre give me any suggestions on config options I would need to set in order to get this useful information?
Comment 55 tomas m 2008-08-27 10:59:38 UTC
I'm adding myself since I'm interested in the outcome of this discussion...
Comment 56 Jesse Barnes 2008-08-27 11:57:29 UTC
I'm not sure about the MMIO debugging stuff; it might help but our best bet would be to get some info about the Twinhead multifunction device from the vendor.  Sounds like there are some workarounds we're missing.  I've pinged the folks at O2 Micro, we'll see if they get back to us...
Comment 57 Arne Fitzenreiter 2008-09-08 13:17:11 UTC
I think there is another device in the laptop that also uses the MMIO address area
ffbfe800-ffbfecff, or maybe up to ffbfefff.

I have added a hack that moves the NIC and card reader out of this area, and both are working:
http://bugzilla.kernel.org/show_bug.cgi?id=10231

Can you make a proper patch to blacklist this area on the H12Y? Changing the MMIO size of the chips to force Linux to move them is not a good way to do it.
Comment 58 tomas m 2008-09-08 18:48:12 UTC
Arne, thanks for this. I've tried to apply the patch, but patch -p1 < quirks-h12y.patch fails because it cannot find the file to patch.
I'm no guru on the matter, but I had to add the lines to /drivers/pci/quirks.c manually.
Am I doing something wrong?
Comment 59 pablomme 2008-09-10 12:06:38 UTC
Arne's patch (which I applied on Ubuntu's kernel 2.6.24-21.42) seems to work on my laptop. Tomas, you need to chdir into drivers/pci and apply a 'patch -p0 < quirks-h12y.patch' there.
Comment 60 Jesse Barnes 2008-09-10 13:25:05 UTC
Yeah, Arne's patch addresses the root problem much better than the hack I posted.  I wonder if there's an ACPI table on this machine that would tell us about these hidden resources?  Either way we can push Arne's quirk upstream if he spins a new one with some comments and sends it to jbarnes@virtuousgeek.org.

thanks,
Jesse
Comment 61 pablomme 2008-09-10 14:16:27 UTC
Created attachment 17716 [details]
disassembled DSDT

I have no idea about this, but I have attached what I think is the ACPI table for my machine (/proc/acpi/dsdt, disassembled with iasl and gzipped), which somebody else may be able to interpret.
Comment 62 Jesse Barnes 2008-09-12 16:46:38 UTC
Hm, looks like those devices overlap with one of the PNP resources:
                Device (RMSC)                                                   
                {                                                               
                    Name (_HID, EisaId ("PNP0C02"))                             
                    Name (_UID, 0x10)                                           
                    Name (CRS, ResourceTemplate ()                              
                    {                                                           
                        IO (Decode16,                                           
                            0x0010,             // Range Minimum                
                            0x0010,             // Range Maximum                
                            0x00,               // Alignment                    
                            0x10,               // Length                       
                            )                                                   
                        IO (Decode16,                                           
                            0x0022,             // Range Minimum                
                            0x0022,             // Range Maximum                
                            0x00,               // Alignment                    
                            0x1E,               // Length                       
                            )                                                   
                        IO (Decode16,                                           
                            0x0044,             // Range Minimum                
                            0x0044,             // Range Maximum                
                            0x00,               // Alignment                    
                            0x1C,               // Length                       
                            )                                                   
                        IO (Decode16,                                           
                            0x0063,             // Range Minimum                
                            0x0063,             // Range Maximum                
                            0x00,               // Alignment                    
                            0x01,               // Length                       
                            )                                                   
                        IO (Decode16,                                           
                            0x0065,             // Range Minimum                
                            0x0065,             // Range Maximum                
                            0x00,               // Alignment                    
                            0x01,               // Length                       
                            )                                                   
                        IO (Decode16,                                           
                            0x0067,             // Range Minimum                
                            0x0067,             // Range Maximum                
                            0x00,               // Alignment                    
                            0x09,               // Length                       
                            )                                                   
                        IO (Decode16,                                           
                            0x0072,             // Range Minimum                
                            0x0072,             // Range Maximum                
                            0x00,               // Alignment                    
                            0x0E,               // Length                       
                            )                                                   
                        IO (Decode16,                                           
                            0x0080,             // Range Minimum                
                            0x0080,             // Range Maximum                
                            0x00,               // Alignment                    
                            0x01,               // Length                       
                            )                                                   
                        IO (Decode16,                                           
                            0x0084,             // Range Minimum                
                            0x0084,             // Range Maximum                
                            0x00,               // Alignment                    
                            0x03,               // Length                       
                            )                                                   
                        IO (Decode16,                                           
                            0x0088,             // Range Minimum                
                            0x0088,             // Range Maximum                
                            0x00,               // Alignment                    
                            0x01,               // Length                       
                            )                                                   
                        IO (Decode16,                                           
                            0x008C,             // Range Minimum                
                            0x008C,             // Range Maximum                
                            0x00,               // Alignment                    
                            0x03,               // Length                       
                            )                                                   
                        IO (Decode16,                                           
                            0x0090,             // Range Minimum                
                            0x0090,             // Range Maximum                
                            0x00,               // Alignment                    
                            0x10,               // Length                       
                            )                                                   
                        IO (Decode16,                                           
                            0x00A2,             // Range Minimum                
                            0x00A2,             // Range Maximum                
                            0x00,               // Alignment                    
                            0x1E,               // Length                       
                            )                                                   
                        IO (Decode16,                                           
                            0x00E0,             // Range Minimum                
                            0x00E0,             // Range Maximum                
                            0x00,               // Alignment                    
                            0x10,               // Length                       
                            )                                                   
                        IO (Decode16,                                           
                            0x04D0,             // Range Minimum                
                            0x04D0,             // Range Maximum                
                            0x00,               // Alignment                    
                            0x02,               // Length                       
                            )                                                   
                        IO (Decode16,                                           
                            0x0000,             // Range Minimum                
                            0x0000,             // Range Maximum                
                            0x00,               // Alignment                    
                            0x00,               // Length                       
                            _Y0C)                                               
                        IO (Decode16,                                           
                            0x0000,             // Range Minimum                
                            0x0000,             // Range Maximum                
                            0x00,               // Alignment                    
                            0x00,               // Length                       
                            _Y0D)                                               
                        IO (Decode16,                                           
                            0x0000,             // Range Minimum                
                            0x0000,             // Range Maximum                
                            0x00,               // Alignment                    
                            0x00,               // Length                       
                            _Y0E)                                               
                        Memory32Fixed (ReadWrite,                               
                            0xFED1C000,         // Address Base                 
                            0x00004000,         // Address Length               
                            )                                                   
                        Memory32Fixed (ReadWrite,                               
                            0xFED20000,         // Address Base                 
                            0x00070000,         // Address Length               
                            )                                                   
                        Memory32Fixed (ReadWrite,                               
                            0xFFB00000,         // Address Base                 
                            0x00100000,         // Address Length               
                            _Y0A)                                               
                        Memory32Fixed (ReadWrite,                               
                            0xFFF00000,         // Address Base                 
                            0x00100000,         // Address Length               
                            _Y0B)                                               
                    })                                                          
                    Method (_CRS, 0, NotSerialized)                             
                    {                                                           
                        CreateDWordField (CRS, \_SB.PCI0.SBRG.RMSC._Y0A._LEN, SML1)                                                                             
                        CreateDWordField (CRS, \_SB.PCI0.SBRG.RMSC._Y0A._BAS, SMB1)                                                                             
                        CreateDWordField (CRS, \_SB.PCI0.SBRG.RMSC._Y0B._LEN, HCTL)                                                                             
                        CreateDWordField (CRS, \_SB.PCI0.SBRG.RMSC._Y0B._BAS, HCTB)                                                                             
                        Store (0xFFB00000, SMB1)                                
                        Store (0x00100000, SML1)                                
                        Store (0xFFF00000, HCTB)                                
                        Store (0x00100000, HCTL)                                
                        CreateWordField (CRS, \_SB.PCI0.SBRG.RMSC._Y0C._MIN, GP00)                                                                              
                        CreateWordField (CRS, \_SB.PCI0.SBRG.RMSC._Y0C._MAX, GP01)                                                                              
                        CreateByteField (CRS, \_SB.PCI0.SBRG.RMSC._Y0C._LEN, GP0L)                                                                              
                        Store (PMBS, GP00)                                      
                        Store (PMBS, GP01)                                      
                        Store (PMLN, GP0L)                                      
                        If (SMBS)                                               
                        {                                                       
                            CreateWordField (CRS, \_SB.PCI0.SBRG.RMSC._Y0D._MIN, GP10)                                                                          
                            CreateWordField (CRS, \_SB.PCI0.SBRG.RMSC._Y0D._MAX, GP11)                                                                          
                            CreateByteField (CRS, \_SB.PCI0.SBRG.RMSC._Y0D._LEN, GP1L)
                            Store (SMBS, GP10)
                            Store (SMBS, GP11)
                            Store (SMBL, GP1L)
                        }

                        If (GPBS)
                        {
                            CreateWordField (CRS, \_SB.PCI0.SBRG.RMSC._Y0E._MIN, GP20)
                            CreateWordField (CRS, \_SB.PCI0.SBRG.RMSC._Y0E._MAX, GP21)
                            CreateByteField (CRS, \_SB.PCI0.SBRG.RMSC._Y0E._LEN, GP2L)
                            Store (GPBS, GP20)
                            Store (GPBS, GP21)
                            Store (GPLN, GP2L)
                        }

                        Return (CRS)
                    }
                }
            }

If this really is a device with registers, we shouldn't be assigning PCI devices to that region; maybe this is really an ACPI problem?
Comment 63 pablomme 2008-09-13 05:18:53 UTC
If I understand correctly, you're saying that the 0xFFB00000-0xFFBFFFFF range reserved in the third Memory32Fixed clashes with the MMIO range used for the card reader and nic. Is this it?
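
The arithmetic behind that reading can be sketched in a few lines of user-space C. The Memory32Fixed entry gives a base and a length, so the inclusive end is base + length - 1. The device range used in the note below is the area Arne reported in comment #57, not a value read from the lspci attachment, and the names range_from_base_len and ranges_overlap are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Address range with inclusive start and end. */
struct range {
    uint32_t start;
    uint32_t end;
};

/* Convert an ACPI-style (base, length) pair into an inclusive range. */
static struct range range_from_base_len(uint32_t base, uint32_t len)
{
    assert(len > 0);
    struct range r = { base, base + (len - 1) };
    assert(r.end >= r.start); /* must not wrap past 4 GiB */
    return r;
}

/* Two inclusive ranges overlap if each one starts before the other ends. */
static bool ranges_overlap(struct range a, struct range b)
{
    return a.start <= b.end && b.start <= a.end;
}
```

For base 0xFFB00000 and length 0x00100000 this yields 0xFFB00000-0xFFBFFFFF, which overlaps ffbfe800-ffbfecff; the 0xFED1C000 entry (length 0x4000) does not.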

Is this setup (both the PNP and the MMIO reservation) suggested by the hardware, or is the kernel involved in any way? I assume it's the former, which is an interesting design flaw in this chipset..

Would the kernel conceivably be able to spot and solve clashes like this one automatically by looking at the ACPI table? This would fix any similar problems that may occur in the future. (This is a naive suggestion - it sounds feasible but it may be too complicated and/or of too little relevance to actually implement such a thing.)

Anyway, in the light of this, is Arne's patch the optimal solution?
Comment 64 Arne Fitzenreiter 2008-09-13 05:32:07 UTC
Yes this is the area. 
I think it is first of all a BIOS bug, because the BIOS also assigns these addresses to the devices. I have checked this with DOS and some tools that read the PCI registers.

In Windows this area is reserved as "Motherboard resources", and Windows moves all devices out of it.

I think my patch works, but it is not the optimal solution, because it is possible that the next device you put into the ExpressCard slot gets this address again.
Comment 65 pablomme 2008-09-13 06:38:46 UTC
> I think at first it is a bios bug because the bios also assign this addresses
> to the devices.

Perhaps there's a BIOS update that fixes this? My laptop comes with some R1.00 version. For the Twinhead H12Y model there's R1.04 and R1.08 at http://www.twinhead.com.tw/download.aspx . None of the other manufacturers have BIOS updates that I can find. I'm not trying the Twinhead updates in case I end up without a working BIOS, and anyway I don't have Windows anymore - but has anyone tried flashing the BIOS?

Back to the main point - can that address area be reserved by the Linux kernel, as a quirk? That would solve the issue more robustly, wouldn't it?
Comment 66 Jesse Barnes 2008-09-13 10:40:32 UTC
Yeah, I think the best solution would be for Linux to detect these PNP resources and reserve space for them.  They're fixed, so Linux should move any overlapping PCI devices to another region.  I haven't dug into the pnp driver enough to see if there's an easy way to do this; it might be that we should add some code to ACPI to reserve these regions early on...
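
As a rough illustration of the idea (treat the PNP ranges as fixed and move any overlapping PCI BAR elsewhere), here is a toy first-fit model in user-space C. It is not the kernel's resource allocator and not any of the patches attached to this bug; place_bar, the window and the 0x100 BAR size in the note below are assumptions made for the example, and addresses are assumed to fit in 32 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Address range with inclusive start and end. */
struct range {
    uint64_t start;
    uint64_t end;
};

static bool overlaps(struct range a, struct range b)
{
    return a.start <= b.end && b.start <= a.end;
}

/*
 * Hypothetical first-fit placement: find the lowest naturally aligned
 * block of `size` bytes inside `window` that avoids every fixed range.
 * Returns false if nothing fits.
 */
static bool place_bar(struct range window, const struct range *fixed,
                      size_t nfixed, uint64_t size, struct range *out)
{
    /* PCI memory BARs are power-of-two sized and naturally aligned. */
    assert(size != 0 && (size & (size - 1)) == 0);
    assert(window.start <= window.end);

    /* Round the window start up to the BAR's natural alignment. */
    uint64_t addr = (window.start + size - 1) & ~(size - 1);

    for (; addr >= window.start && addr <= window.end &&
           window.end - addr >= size - 1;
         addr += size) {
        struct range cand = { addr, addr + size - 1 };
        bool clash = false;

        /* Reject the candidate if it touches any fixed (PNP) range. */
        for (size_t i = 0; i < nfixed; i++) {
            if (overlaps(cand, fixed[i])) {
                clash = true;
                break;
            }
        }
        if (!clash) {
            *out = cand;
            return true;
        }
    }
    return false;
}
```

With the 0xFFB00000-0xFFBFFFFF PNP range marked fixed, a 0x100-byte BAR searched for in a window starting at 0xFFBFE800 lands at 0xFFC00000, the first aligned address past the reserved region. A real implementation also has to respect bridge windows and I/O port resources, which this model ignores.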
Comment 67 pablomme 2008-09-13 19:43:38 UTC
So it seems there are three possible solutions, ordered by increasing correctness/difficulty:

1) Arne's current patch -- this solves the original issue but still leaves the possibility of something going wrong if an additional PCI[e] device is added.

2) Reserving the memory regions indicated in the DSDT above for this particular hardware (a quirk). This would solve all possible PCI issues on this machine, but not others.

3) Teaching the kernel to handle contradictory BIOS information by reserving memory for fixed resources so that relocatable ones don't overlap with them.

Now, option 1 we already have. Option 2 doesn't sound much more difficult than option 1. Option 3 is neat, but it probably takes quite a bit of work and expertise to implement, and anyway I don't know how often BIOSes suggest silly setups capable of hanging the computer.. I guess that it's either option 2 or 3 that would be the 'final' fix for this.

It may not be wise to include a provisional fix (be it 1 or 2) in the mainline kernel. However, especially if the solution is going to take a while to develop, I would be quite happy to see a provisional fix (be it 1 or 2) downstream in the Ubuntu kernel where this bug originated so that it ships with 8.10 (due 30-Oct if I remember correctly) and makes a few people happy.

Any thoughts on this?

BTW, I would help with the coding stuff, but I don't even code (in C, that is). Sorry about that.
Comment 68 Jesse Barnes 2008-09-15 17:05:49 UTC
Created attachment 17796 [details]
reassign PCI BARs that conflict with PNP resources

This patch attempts to detect PNP resource conflicts, forcing PCI to reallocate things.  I'm not sure if the quirks will run in the right order though, anyone care to test?
Comment 69 pablomme 2008-09-16 10:33:20 UTC
Created attachment 17810 [details]
reconstructed dmesg

The patch tries to do something, but it ends up disabling most of the hardware, so I end up on a busybox terminal without access to the hard drive or network.

To reconstruct dmesg I compiled the 2.6.27-3 Ubuntu kernel, booted into it, stored the dmesg, compiled the patched kernel, booted into it, ran 'dmesg | more' in busybox, compared visually with the previous log on my desktop machine and wrote the differences by hand. I've used '>>>>' and '<<<<' to flag additions and removals, emulating some kind of 'diff orig.log patched.log'. I hope the notation is understandable.

The point at which the patched kernel log ends is flagged with a line. I may have made typos or left something out, but I hope it's nothing critical.
Comment 70 Jesse Barnes 2008-09-16 13:34:12 UTC
Yeah that helps.  Looks like my patch is a bit too naive; I think I need to check what kind of PNP resource we're comparing against and only worry about non-aperture ones (or something, I'll talk with Bjorn).
Comment 71 Jesse Barnes 2008-09-16 17:14:13 UTC
Created attachment 17819 [details]
oops

Last one was extra bad.  This one might work a little better, but I still think we have to limit it to just pnp0c02 possibly.
Comment 72 pablomme 2008-09-16 18:56:43 UTC
Created attachment 17826 [details]
dmesg

Indeed it works better. This time it boots and the card reader is working. However the patch kills all USB ports and the CD drive, plus something called 'ata_piix' and I don't know if anything else. I've attached the dmesg, where I've put an asterisk at those lines which (I think) flag the new issues.

Minor thing: in the patch there are modifications to drivers/gpu/drm/i915/i915_irq.c which seem to be unrelated to this bug. Did you include these intentionally, or can I leave them out?
Comment 73 Jesse Barnes 2008-09-17 10:27:52 UTC
Oh no the i915 changes were a mistake, sorry about that.  I'll take a look at the dmesg and see if I can figure out what's going on (we're likely forcing a BAR reallocation that shouldn't really happen).
Comment 74 Jesse Barnes 2008-09-18 15:40:53 UTC
Created attachment 17870 [details]
add debug output and catch more cases

I'm not sure why the I/O port stuff fails yet but this patch should at least give us some more debug output.  I think it's also more correct, since before we wouldn't catch overlapping regions and reassign the BARs appropriately.
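
To illustrate what "catching overlapping regions" means in general, here is a minimal Python sketch of an inclusive-range overlap test. It is not the code from the attached patch; the helper name `ranges_overlap` is hypothetical, and the sample addresses are the sdhci BAR and the PNP0C02 region quoted elsewhere in this report.

```python
def ranges_overlap(a_start, a_end, b_start, b_end):
    """Return True if the inclusive ranges [a_start, a_end] and
    [b_start, b_end] share at least one address.

    This covers partial overlap as well as one range lying
    entirely inside the other.
    """
    return a_start <= b_end and b_start <= a_end


# Sample values taken from this report (end addresses are inclusive).
SDHCI_BAR = (0xFFBFE800, 0xFFBFE8FF)
PNP_REGION = (0xFFB00000, 0xFFBFFFFF)

if __name__ == "__main__":
    # The BAR lies completely inside the PNP region, so this reports a conflict.
    print(ranges_overlap(*SDHCI_BAR, *PNP_REGION))
```

A check that only compared start addresses, or only looked for identical ranges, would miss a small BAR sitting in the middle of a larger reserved region.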
Comment 75 pablomme 2008-09-18 17:17:49 UTC
Created attachment 17871 [details]
dmesg after patch #3

Attached is the dmesg I get with this patch. Nothing seems to have changed in terms of what works and what doesn't.
Comment 76 Jennifer Hodgdon 2008-10-02 10:57:38 UTC
Wow, I have been out on vacation for a number of weeks -- glad Pablomme was around to test things! It looks like there isn't anything I should do right now to test... but I'm back in action and ready to help if there is anything I can do.
    --Jennifer
Comment 77 pablomme 2008-10-03 02:30:27 UTC
Jesse, any progress on this? If you deem it appropriate, I could suggest Arne's patch to be added to Ubuntu's kernel for the 8.10 release while you figure out a better solution (although we've just missed the beta, don't know if it'll be accepted).
Comment 78 Jesse Barnes 2008-10-03 08:50:06 UTC
Including Arne's patch in Ubuntu is a good idea; hopefully we can bang the PNP code into shape to avoid problems like this more generally, but in the meantime Arne's patch is a good workaround.  I'll ping Bjorn again about how we might fix this properly.
Comment 79 pablomme 2008-10-03 11:25:51 UTC
Arne, do you want to attach your latest patch to the launchpad bug ( https://bugs.launchpad.net/ubuntu/+source/linux/+bug/187671 ) and comment a bit on the situation upstream?

If you do, please do mention that your patch makes the one forcing the 8139too module into PIO mode (associated with the bug at
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/90271 ) obsolete.

Or let us know if you want somebody else to submit it.

Thanks!
Comment 80 pablomme 2008-11-21 09:59:45 UTC
Hi Jesse - have you made any progress on this one?

We didn't manage to convince the Ubuntu kernel maintainers to include Arne's patch in 8.10, which is a pity. Anyway, if you come up with new trial patches let me know, I'm still up for testing.
Comment 81 pablomme 2009-02-10 09:10:33 UTC
I was thinking, would any machine other than these laptops benefit from the general solution? If not, the solution is maybe not worth the effort. Arne's patch could be included in the kernel directly and the bug could be closed.
Comment 82 Arne Fitzenreiter 2009-02-10 11:25:46 UTC
Sorry, it should not be added. My patch triggers another problem: if you use the patched kernel with a PCMCIA/yenta socket and insert a Realtek 8139C PCMCIA card, you get a kernel panic in the quirk. This happens even when the quirk does nothing because the vendor ID does not match.

We are working on the problem, but I don't have much time at the moment.
Comment 83 Pierre Ossman 2009-02-26 09:34:20 UTC
(In reply to comment #53)
> 
> Pierre, is there a way of running SDHCI in PIO only mode?
>

This is probably no longer relevant, but no you cannot operate SDHCI with just PIO. There is no ioport space for the device. :)

(In reply to comment #56)
> I've pinged the folks at O2 Micro, we'll see if the get back to us...
> 

If you have some contacts there, some errata for their controllers would be nice. :)
Comment 84 Andrew Morton 2009-06-25 03:28:44 UTC
kernel-resourcec-fix-sign-extension-in-reserve_setup.patch should address this.
See http://bugzilla.kernel.org/show_bug.cgi?id=13253
Comment 85 tomas m 2009-06-25 13:23:00 UTC
i proposed arne's patch to the archlinux developers, and it was approved. is the reserve option the preferred solution? im asking because ive been following the issue closely and havent read any complaints about this patch from the arch community.

should i suggest they remove the patch and start using the reserve method? what are the pros/cons?
Comment 86 Arne Fitzenreiter 2009-06-25 13:42:56 UTC
As I have already written in Comment #82, my patch crashes the kernel when a very common Realtek 8139 CardBus LAN card is inserted.

You should prefer the reserve option. But at the moment this option doesn't work on 64-bit kernels without the patch from Comment #84.


The patch from Comment #84 doesn't address the main problem. It only fixes the misinterpretation of the "reserve" parameter (the workaround) on 64-bit kernels.
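
For readers wondering why a sign extension would break "reserve=" only on 64-bit kernels, here is a small Python sketch of the general effect. It assumes, going by the patch title, that the parsed address passed through a signed 32-bit integer before being stored in a 64-bit field; the helper names are hypothetical and this is not the kernel code.

```python
import struct


def to_signed32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


def widen_to_u64(signed_value):
    """Model storing a signed integer into an unsigned 64-bit field.

    A negative input keeps its sign bits, so the upper 32 bits become ones.
    """
    return signed_value & 0xFFFFFFFFFFFFFFFF


# Assumption: the start address used with reserve= in this report.
start = 0xFFB00000

as_int = to_signed32(start)           # negative, because bit 31 is set
print(hex(widen_to_u64(as_int)))      # 0xffffffffffb00000
print(hex(start))                     # 0xffb00000, the intended address
```

Any address with bit 31 set would be affected in this model, which includes the whole 0xFFB00000 range discussed here. With a 32-bit destination field the upper bits are simply not there, so the value comes out as intended.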

Can anyone explain how I could add such a "reserve" automatically when a Twinhead system is detected?
Comment 87 tomas m 2009-07-11 00:15:04 UTC
when using the reserve method, 8139too and sdhci_pci fail to reserve the memory region and dont load any device. is this the expected behaviour?

using kernel 2.6.31-rc2-git5 vanilla.
Comment 88 Arne Fitzenreiter 2009-07-11 23:10:45 UTC
No. Reserving the memory should force the kernel to reconfigure the devices to use another memory area.

On my system the modules load and work (sdhci_pci and also 8139too).
But I have not tested this with kernel versions higher than 2.6.29.
Comment 89 tomas m 2009-07-12 17:49:05 UTC
hmmm, is there any config setting in the kernel that would trigger this behaviour?

----
sdhci: Secure Digital Host Controller Interface driver
sdhci: Copyright(c) Pierre Ossman
sdhci-pci 0000:03:06.2: SDHCI controller found [1217:7120] (rev 1)
sdhci-pci 0000:03:06.2: PCI INT A -> GSI 19 (level, low) -> IRQ 19
sdhci-pci 0000:03:06.2: BAR 0: can't reserve mem region [0xffbfe800-0xffbfe8ff]
sdhci-pci 0000:03:06.2: cannot request region
sdhci-pci 0000:03:06.2: PCI INT A disabled
sdhci-pci: probe of 0000:03:06.2 failed with error -16
----
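
The "error -16" in the last log line is a negated errno value. A quick way to look up the symbolic name, sketched in Python on the assumption that it runs on a Linux host where the numbering matches the kernel's:

```python
import errno
import os

# Taken from "probe of 0000:03:06.2 failed with error -16";
# kernel functions return errors as negative errno values.
code = 16

# errno.errorcode maps the number to its symbolic name,
# os.strerror gives the C library's description.
print(errno.errorcode[code], "-", os.strerror(code))
```

On Linux this resolves to EBUSY, which fits the preceding "can't reserve mem region" message: the region was already claimed, here by the reserve= parameter.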
Comment 90 Bjorn Helgaas 2010-07-20 18:38:34 UTC
I'd like to look at this problem again.  If anybody can collect this information, it would be helpful:

  1) A complete dmesg log from a current upstream kernel, e.g., 2.6.35-rc5
     or newer, with the "pci=use_crs" kernel argument.

  2) A Windows system information report, such as the hardware-related pages
     from Everest (http://www.lavalys.com/products/everest-pc-diagnostics).
     The trial version of Everest is free, and I think it produces enough
     information for what we need.  It might be possible to run this using
     Windows PE (http://en.wikipedia.org/wiki/Windows_Preinstallation_Environment),
     which is apparently available free of charge and should be bootable
     from a DVD or a USB flash drive.
Comment 91 tomas m 2010-07-20 18:47:14 UTC
i will try to prepare this information tomorrow. checking the windows requirements right now since i dont have it installed..

mind you the kernel requires to be booted with the reserve= boot parameter.
Comment 92 tomas m 2010-07-20 21:53:43 UTC
Created attachment 27174 [details]
dmesg of 2.6.35-rc5 with kernel booted with pci=usecrs
Comment 93 tomas m 2010-07-21 14:14:09 UTC
Created attachment 27184 [details]
report using the everest software under win_PE
Comment 94 Bjorn Helgaas 2010-07-21 19:01:14 UTC
Created attachment 27189 [details]
BIOS vs Linux vs WinPE resource assignments

Thank you Tomas!  This attachment shows the resource assignments made by
BIOS and the changes Linux and WinPE made.

With "pci=use_crs", Linux moved all the devices into host bridge apertures.
I think this kernel will probably boot without the "reserve=" parameter,
as long as you do use "pci=use_crs".

In some cases, WinPE also moved devices into apertures.  I'm puzzled by
the cases where it did not, e.g., 03:06.2 and .3.  Those BARs are right
next to the 03:06.0 BARs, which WinPE *did* move, and they appear to conflict
with some ACPI device resources.
Comment 95 tomas m 2010-07-21 19:07:53 UTC
(In reply to comment #94)
> Created an attachment (id=27189) [details]
> BIOS vs Linux vs WinPE resource assignments
> 
> Thank you Tomas!  This attachment shows the resource assignments made by
> BIOS and the changes Linux and WinPE made.
> 
no, thank you for taking the time ;)

> With "pci=use_crs", Linux moved all the devices into host bridge apertures.
> I think this kernel will probably boot without the "reserve=" parameter,
> as long as you do use "pci=use_crs".
> 

i will test this as soon as i get home.


> In some cases, WinPE also moved devices into apertures.  I'm puzzled by
> the cases where it did not, e.g., 03:06.2 and .3.  Those BARs are right
> next to the 03:06.0 BARs, which WinPE *did* move, and they appear to conflict
> with some ACPI device resources.


other than stepping on someone else's resources i dont know what this all means. is this common in the BIOS world? how is it handled under linux? and what should i do next now that we know whats happening?

thanks again for the effort ;)
Comment 96 tomas m 2010-07-21 19:48:51 UTC
> In some cases, WinPE also moved devices into apertures.  I'm puzzled by
> the cases where it did not, e.g., 03:06.2 and .3.  Those BARs are right
> next to the 03:06.0 BARs, which WinPE *did* move, and they appear to conflict
> with some ACPI device resources.


03:06.2 is the SD Host controller. which didnt work with win_PE (no driver installed?). i dont know who is in charge of moving things around in windows, but it might be the driver.

03:06.3 is the ms/xD/SM controller, which i guess, wasnt working either (same slot in the notebook as 03:06.2). this was not tested, i did test with a SD card though. wasnt recognized.
Comment 97 Bjorn Helgaas 2010-07-21 21:31:35 UTC
I'm curious about the devices Windows did not move because I wonder if
there's something we should learn from that.  There's enough room in the
host bridge apertures for all devices, and Linux put them all inside the
apertures, which should work.

There are four devices Linux moved but Windows did not:

  00:02.0: reg 10: [mem 0xffe80000-0xffefffff]
  00:02.0: reg 1c: [mem 0xffe40000-0xffe7ffff]
  00:02.1: reg 10: [mem 0xffd80000-0xffdfffff]
  03:06.2: reg 10: [mem 0xffbfe800-0xffbfe8ff]
  03:06.3: reg 10: [mem 0xffbff000-0xffbfffff]

The 00:02 devices are VGA-related.  I can imagine an exception along the
lines of "we *know* VGA works because the BIOS used it, so don't touch it."

Without considering any device hierarchy, the 03:06 devices appear to
conflict with these ACPI devices:

  PNP0C01: [mem 0xfec00000-0xffffffff]
  PNP0C02: [mem 0xffb00000-0xffbfffff]
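
The conflict can be checked mechanically. The following Python sketch uses only the address ranges listed in this comment; the names `overlaps` and `find_conflicts` are hypothetical, and, as stated above, the device hierarchy is ignored.

```python
# ACPI (PNP) device regions from this comment, inclusive end addresses.
PNP_REGIONS = {
    "PNP0C01": (0xFEC00000, 0xFFFFFFFF),
    "PNP0C02": (0xFFB00000, 0xFFBFFFFF),
}

# The two 03:06 BARs that Windows left in place.
BARS = {
    "03:06.2 reg 10": (0xFFBFE800, 0xFFBFE8FF),
    "03:06.3 reg 10": (0xFFBFF000, 0xFFBFFFFF),
}


def overlaps(a, b):
    """True if the inclusive (start, end) ranges a and b share any address."""
    return a[0] <= b[1] and b[0] <= a[1]


def find_conflicts(bars, regions):
    """Map each BAR name to the list of region names it overlaps."""
    return {
        bar_name: [name for name, rng in regions.items() if overlaps(bar, rng)]
        for bar_name, bar in bars.items()
    }


if __name__ == "__main__":
    for bar_name, hits in find_conflicts(BARS, PNP_REGIONS).items():
        print(bar_name, "overlaps", ", ".join(hits) or "nothing")
```

Both BARs fall inside both regions, since PNP0C02 is itself a sub-range of PNP0C01.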

Tomas, could you turn on CONFIG_ACPI_DEBUG, boot with
"acpi.debug_layer=0x00010000 acpi.debug_level=0x00000004", and attach another
log?  Linux PNP currently throws away the hierarchy information, but maybe we
should be paying attention to it.

If anybody has a Windows installation where the SD host controller and/or the
MS/xD controller are working, I'd like to know what resources they are using
(e.g., from Everest or the Device Manager), and how those compare to Linux
(e.g., the output of "dmesg | grep 03:06").  Linux does device resource
reassignment eagerly, as soon as we discover the device, but it's conceivable
that Windows does it lazily, only when a driver claims the device.
Comment 98 tomas m 2010-07-21 21:56:27 UTC
booting with pci=use_crs (with no reserve= / acpi_sleep=nonvs ) works.

but suspending doesnt (as reported in bug 16396)
Comment 99 tomas m 2010-07-21 22:03:34 UTC
Created attachment 27192 [details]
dmesg with acpi debug information
Comment 100 tomas m 2010-07-22 00:01:41 UTC
Created attachment 27193 [details]
dmesg with acpi debug information and kernel built with ACPI_DEBUG

sorry, i misunderstood your post. heres a new dmesg. hope its got what you need
Comment 101 Bjorn Helgaas 2010-07-22 04:04:24 UTC
I think the best fix for this would be to turn on "pci=use_crs", either
just on this machine with a DMI quirk, or across the board.  There are
still some known issues that make me hesitant to do it across the board
yet.  In the meantime, we can specify "pci=use_crs" manually as a workaround.

On a bit of a tangent, the suspend/resume bug 16396 affects this same machine.
There's a patch for that bug, but it still requires a kernel boot option, which
is never ideal.  If anybody has Windows on this machine and can determine whether
suspend/resume works properly with Windows, we might be able to find a clue
that will help us fix Linux.

The Linux problem is apparently related to the 03:04.0 (8139too) and 03:06.2 
(sdhci) devices, so knowing how Windows configures those devices and the PCI
bridge leading to bus 03 would be useful.
Comment 102 tomas m 2010-07-22 10:52:04 UTC
since it seems im the only one left interested in this bug report, i guess i will have to install the preloaded software that came with the notebook.

i will try to get a hold of an extra sata drive and borg this thing a bit (im fond of my linux partitions ;) )

please in the meantime, could you propose test cases for the windows install? what information is needed (other than the everest pages).

i havent tested pci=use_crs + acpi_sleep=nonvs yet..but i suppose it will work. will report back if it does not.

one last question: whats the difference between the reserve= and pci=use_crs ?
which one should be the sane default?
Comment 103 pablomme 2010-07-22 11:07:41 UTC
> since it seems im the only interested left in this bug report

Unfortunately I don't have this laptop at hand, I gave it to my sister when I bought my current netbook. But, although I can't help, I'm interested in the outcome.
Comment 104 Bjorn Helgaas 2010-07-22 15:28:01 UTC
For this problem (machine hangs when loading sdhci), I think "pci=use_crs"
is better than "reserve=".  With "reserve=" you have to figure out which
addresses to reserve, which depends on the machine.  "pci=use_crs" is
generic and in theory, it should work anywhere.  We currently turn it on
automatically for all machines with BIOS dates of 2008 or later.

I think we could fix some old issues, like this one, if we made "pci=use_crs"
the default everywhere, but as I mentioned, we still have a couple known
issues that make that risky.  For example, bug 16228 is a system that only
works if we turn "pci=use_crs" OFF.  I think that issue is caused by the
fact that Linux will reassign a PCI device to an address marked "reserved"
in the E820 memory map (see bug 16228 comment 8).  In my opinion, this is
a serious defect in the way Linux handles "reserved" areas, but it's not
trivial to fix.

If anybody collects information about suspend/resume under Windows, please
attach it to bug 16396, not here, so we can keep these issues separate.
I only posted the request here because this bug has more people on the CC:
list who might be able to help investigate it.  If you're interested in the
suspend/resume problem, please subscribe to bug 16396.

As far as Windows test cases, I'm interested in:
  - Does suspend/resume work?
  - Does hibernate/resume work?
  - What resources are assigned to the devices on bus 03?
  - Do the devices on bus 03 work?  Do they require updated
    drivers from the OEM to make them work?
  - Do the bus 03 resources differ depending on whether the drivers
    are loaded?  I think you might be able to use Windows "safe" mode
    to prevent automatic driver loading.
Comment 105 Arne Fitzenreiter 2010-07-22 18:40:53 UTC
Created attachment 27209 [details]
dmesg with pci=use_crs and reserve=ffb0....

Hi Björn,

Nice to hear from you again. I cannot confirm that pci=use_crs works on my identical Averatec 2400. The boot still crashes at 8139too in MMIO mode.
Comment 106 Arne Fitzenreiter 2010-07-22 19:00:01 UTC
In Windows XP all devices work with the normal drivers. But if the drivers are not loaded, Windows shows a resource conflict. (I don't use standby or hibernation.)

E.g. for the 8139: memory range FFBFEC00-FFBFECFF is used by motherboard resources. (I hope I have translated this correctly, because I use a German Windows: "Speicherbereich FFBFEC00 - FFBFECFF wird verwendet von: Hauptplatinenressourcen".)
On a normal boot Windows reconfigures this memory range to FFE3E600 - FFE3E6FF
and shows no conflicts.
Comment 107 tomas m 2010-07-22 19:05:10 UTC
(In reply to comment #106)
> In Windows XP all devices are working with the normal drivers. But if the
> drives are not loaded Windows show a resourceconflict. (I not use standby or
> hibernation)
> 
> eg. For the 8139 Memoryarea: FFBFEC00-FFBFECFF is used by Mainboardresources.
> (I hope i have correct translated this because i use a German Windows.
> (Speicherbereich FFBFEC00 - FFBFECFF wird verwendet von:
> Hauptplatinenressourcen)
> At normal boot Windows reconfigure this memory area to: FFE3E600 - FFE3E6FF
> And show no conflicts.

you saved me a lot of work ;) i was going to mirror my hdd somewhere else and install vista 

could you please test bug https://bugzilla.kernel.org/show_bug.cgi?id=16396 
and report suspend / hibernate from windows results there ? they were wondering if the notebook could suspend under windows.

thanks
Comment 108 Arne Fitzenreiter 2010-07-22 19:31:09 UTC
Suspend to disk and suspend to RAM both work with Windows XP without problems.

With Ubuntu 10.04, suspend to RAM works but suspend to disk fails.
Comment 109 Bjorn Helgaas 2010-07-22 22:36:00 UTC
Created attachment 27214 [details]
add PNP resource debug output

Hi Arne, it's quite embarrassing that after two years, two bug reports,
and 150 comments, we still don't have a good resolution for this issue.

Let me see if I understand your comment 105.  The log (attachment 27209 [details])
shows a boot with "pci=use_crs reserve=0xffb00000,0x100000", and that works.
But if you boot with only "pci=use_crs", it fails.  Right?
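
To make explicit which addresses that "reserve=" argument covers, here is a short Python sketch. It assumes the argument is a list of base,size pairs, as the value used in this thread suggests; `parse_reserve` is a hypothetical helper, not kernel code.

```python
def parse_reserve(arg):
    """Parse 'base,size[,base,size...]' into inclusive (start, end) ranges."""
    values = [int(token, 0) for token in arg.split(",")]
    if len(values) % 2:
        raise ValueError("expected base,size pairs")
    return [
        (base, base + size - 1)
        for base, size in zip(values[0::2], values[1::2])
    ]


# The value from the boot line quoted above.
ranges = parse_reserve("0xffb00000,0x100000")

# The 8139too MMIO BAR reported in this bug (inclusive end address).
RTL8139_BAR = (0xFFBFEC00, 0xFFBFECFF)

for start, end in ranges:
    inside = start <= RTL8139_BAR[0] and RTL8139_BAR[1] <= end
    print(f"reserved {start:#x}-{end:#x}, covers 8139 BAR: {inside}")
```

The reserved range works out to 0xffb00000-0xffbfffff, the same 1 MB region that the ACPI motherboard resource claims, so the BIOS-assigned 8139 BAR lies inside it.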

Let's experiment with this.  Please apply the attached patch and also the
one from attachment 26819 [details] (bug 16228 comment 4).

Now, prevent 8139too from loading automatically (rename the module or
something).  If you boot with only "pci=use_crs", without using "reserve=",
you should be able to collect the dmesg log after Linux reassigns device
resources but before the 8139too driver loads and hangs the system.  (BTW,
if you don't mind, mark your attachments "text/plain" so they're easy to
open in a browser.)  If you then manually load 8139too, I guess you should
see the hang.

Now try a boot with "pci=bar=0000:03:04.0[14]=0x80000000".  This will move
the 8139 MMIO BAR elsewhere, on the theory that there's still another
device conflicting with it.  You can experiment with different addresses,
including the place where Windows puts it, and see whether any avoid the
hang.
Comment 110 Bjorn Helgaas 2010-07-22 23:15:27 UTC
Re: comment 105, I think I see the problem.  "pci=use_crs" doesn't help
in this case because the host bridge window:

  pci_root PNP0A08:00: host bridge window [mem 0x7f800000-0xffffffff]

does include all these problematic PCI BARs:

  pci 0000:03:04.0: reg 14: [mem 0xffbfec00-0xffbfecff]
  pci 0000:03:06.0: reg 14: [mem 0xffbfe000-0xffbfe7ff]
  pci 0000:03:06.2: reg 10: [mem 0xffbfe800-0xffbfe8ff]
  pci 0000:03:06.3: reg 10: [mem 0xffbff000-0xffbfffff]

They do conflict with this ACPI device:

  system 00:09: [mem 0xffb00000-0xffbfffff]

but we currently don't look at ACPI resources until later.  And even
then, all the PCI BARs are contained *within* the ACPI region, so I'm
not sure we'd find the conflict.
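
The situation can be summarised with a containment check. This Python sketch uses the ranges listed above; `contains` and `classify` are hypothetical names, and the idea that a BAR already inside the host bridge window gives "pci=use_crs" no reason to move it is a simplified reading of the explanation in this comment, not a description of the actual kernel logic.

```python
HOST_WINDOW = (0x7F800000, 0xFFFFFFFF)   # pci_root PNP0A08:00 window
ACPI_REGION = (0xFFB00000, 0xFFBFFFFF)   # system 00:09

BARS = {
    "03:04.0 reg 14": (0xFFBFEC00, 0xFFBFECFF),
    "03:06.0 reg 14": (0xFFBFE000, 0xFFBFE7FF),
    "03:06.2 reg 10": (0xFFBFE800, 0xFFBFE8FF),
    "03:06.3 reg 10": (0xFFBFF000, 0xFFBFFFFF),
}


def contains(outer, inner):
    """True if the inclusive range inner lies entirely within outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def classify(bars):
    """Return {name: (inside_host_window, inside_acpi_region)} per BAR."""
    return {
        name: (contains(HOST_WINDOW, bar), contains(ACPI_REGION, bar))
        for name, bar in bars.items()
    }


if __name__ == "__main__":
    for name, (in_window, in_acpi) in classify(BARS).items():
        print(f"{name}: in host window={in_window}, in ACPI region={in_acpi}")
```

Every BAR is inside the host bridge window and also inside the ACPI region, which matches the observation that the window alone gives no hint that anything is wrong.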

Don't waste your time experimenting with this.  I think it's clear that
we need some significant rework in the way Linux handles these ACPI
resources.

From comment 106, I think we can learn two very important things:

  1) Windows moves PCI devices to avoid conflicts with ACPI devices.
     Linux tends to trust PCI resources more than ACPI, but this may
     be a mistake.

  2) Windows doesn't move a PCI device until loading a driver for it.
     Linux moves PCI devices immediately, before loading any drivers,
     which may be more aggressive than necessary.
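
As a rough sketch of the first point, the following Python function picks a new BAR address inside a window while steering clear of a list of busy ranges. It is only an illustration of the idea, not how Linux or Windows actually allocates resources: `find_free_address` is a hypothetical helper, the window and ACPI region are the values quoted in this comment, and real allocators must also account for RAM, other BARs and bridge windows.

```python
def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment."""
    return (addr + alignment - 1) // alignment * alignment


def find_free_address(window, size, busy):
    """Return the lowest size-aligned address in window such that
    [addr, addr + size - 1] avoids every busy range, or None.

    window and busy entries are inclusive (start, end) tuples.
    BARs are naturally aligned, so the alignment equals the size.
    """
    win_start, win_end = window
    addr = align_up(win_start, size)
    for busy_start, busy_end in sorted(busy):
        if addr + size - 1 < busy_start:
            break                      # candidate ends before this range
        if addr <= busy_end:
            addr = align_up(busy_end + 1, size)  # skip past the conflict
    return addr if addr + size - 1 <= win_end else None


HOST_WINDOW = (0x7F800000, 0xFFFFFFFF)
ACPI_BUSY = [(0xFFB00000, 0xFFBFFFFF)]   # system 00:09

if __name__ == "__main__":
    # A 256-byte MMIO BAR, like the one on the 8139.
    new_addr = find_free_address(HOST_WINDOW, 0x100, ACPI_BUSY)
    print(hex(new_addr) if new_addr is not None else "no space")
```

Treating the ACPI region as busy is the key difference from simply trusting the BIOS-assigned BAR; whether the move happens at discovery time or only when a driver binds is the separate question raised in the second point.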
Comment 111 Arne Fitzenreiter 2010-07-24 15:54:48 UTC
Created attachment 27240 [details]
dmesg with patches and pci=use_crs

Loading 8139too still crashes (MMIO is still at 0xFFBFEC00).
Comment 112 Arne Fitzenreiter 2010-07-24 15:57:36 UTC
Created attachment 27241 [details]
Now with pci=bar...


8139too is usable (Linux has chosen another address)

03:04.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
	Subsystem: TWINHEAD INTERNATIONAL Corp Unknown device a003
	Flags: bus master, medium devsel, latency 64, IRQ 10
	I/O ports at d800 [size=256]
	Memory at ffbfec00 (32-bit, non-prefetchable) [size=256]
	Capabilities: [50] Power Management version 2
Comment 113 Arne Fitzenreiter 2010-07-24 16:09:10 UTC
Oops, sorry: the lspci output was the wrong one, and I had left a zero out of the bar parameter.
But the new address that Linux chose also worked.

When 0x80000000 is given, Linux uses it.
Comment 114 Arne Fitzenreiter 2010-07-28 06:40:27 UTC
Is it possible to add a quirk that reserves the memory when an H12Y is detected by PCI subvendor/subdevice ID, until a better generic solution is found?
Comment 115 tomas m 2011-05-05 10:57:31 UTC
i've been using pci=use_crs for a while, but now with 2.6.39-rc series, this breaks intel's i915 modesettings. so i went back to the reserve=0xFFB00000,0x100000 method.
Comment 116 Alan 2012-05-17 15:35:56 UTC
Done... just getting it upstream
