Bug 111151

Summary: [pata_amd][pata_acpi] Internal base address register error prevents initialization of PATA drive
Product: IO/Storage Reporter: Andreas E (andi3)
Component: IDE Assignee: io_ide (io_ide)
Status: RESOLVED INVALID    
Severity: high CC: alan
Priority: P1    
Hardware: i386   
OS: Linux   
See Also: https://launchpad.net/bugs/1536397
Kernel Version: 4.2.0-10-generic Subsystem:
Regression: No Bisected commit-id:

Description Andreas E 2016-01-21 19:55:31 UTC
I guess this problem must have started when I replaced the mainboard of this old machine, an ECS Elitegroup one, with a DFI NFII Ultra-AL.

As there are so many different older DFIs, it's so easy to get confused by them.

chipset: nVIDIA nForce2 Ultra 400 and nForce2 MCP
sata|ide interface: based on Marvell 88i8030 controller chip

First of all, you may NOT connect a SATA drive to the SATA port AND a PATA drive to Primary _Master_. (PATA to Primary _Slave_ is OK.)
This is a crude peculiarity of this board, or rather of the Marvell chip used: the SATA drive will appear as PATA IDE channel 0!

(NB: This can be very useful, as in my case, because Victoria 3.3.3 (free) for DOS will _always_ hang when accessing my add-in cards, no matter whether SiI, VIA or Promise---tried them all. So this functionality is worth its weight in gold, because otherwise I couldn't operate on a SATA drive with Victoria that way.)

So why not get rid of my second add-on card and boot my Ubuntu off my onboard SATA? 

EPIC FAIL...

Couldn't find root device.
Disconnected ALL drives BUT the SATA one with Ubuntu installed on it, plugged it into the onboard port - root device not found.
I got the following, very cryptic, error:

[1.916367] pata_amd 0000:00:09.0: version 0.4.1
[1.916383] pata_amd 0000:00:09.0: can't enable device: BAR 0 [io  size 0x0008] not assigned
[1.916453] pata_amd: probe of 0000:00:09.0 failed with error -22
[2.509228] pata_acpi 0000:00:09.0: can't enable device: BAR 0 [io  size 0x0008] not assigned
[2.509299] pata_acpi: probe of 0000:00:09.0 failed with error -22


Took me quite a while of googling until I figured out that BAR stands for Base Address Register.
OK, so it tries to enable the PATA drive via pata_amd, then as this fails, it tries pata_acpi.
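
The probe sequence in the log above can be pulled out mechanically. The following is only a minimal sketch, assuming the log format shown in this report; the helper name failed_probes is made up for illustration. It also translates the numeric error: on Linux, errno 22 is EINVAL ("Invalid argument"), which is consistent with the driver refusing to enable a device whose BAR has no address assigned.

```python
import errno
import re

# Matches lines such as:
#   [1.916453] pata_amd: probe of 0000:00:09.0 failed with error -22
PROBE_FAIL = re.compile(
    r"^\[\s*(?P<time>[\d.]+)\]\s+(?P<driver>\w+): "
    r"probe of (?P<dev>\S+) failed with error -(?P<err>\d+)"
)


def failed_probes(log_text):
    """Return (driver, device, errno name) for each failed probe, in log order."""
    results = []
    for line in log_text.splitlines():
        match = PROBE_FAIL.match(line.strip())
        if match:
            code = int(match.group("err"))
            # errno.errorcode maps the number to its symbolic name.
            name = errno.errorcode.get(code, "unknown")
            results.append((match.group("driver"), match.group("dev"), name))
    return results


# The five log lines quoted in this report.
SAMPLE_LOG = """\
[1.916367] pata_amd 0000:00:09.0: version 0.4.1
[1.916383] pata_amd 0000:00:09.0: can't enable device: BAR 0 [io  size 0x0008] not assigned
[1.916453] pata_amd: probe of 0000:00:09.0 failed with error -22
[2.509228] pata_acpi 0000:00:09.0: can't enable device: BAR 0 [io  size 0x0008] not assigned
[2.509299] pata_acpi: probe of 0000:00:09.0 failed with error -22
"""

if __name__ == "__main__":
    for driver, dev, err_name in failed_probes(SAMPLE_LOG):
        print(f"{driver} gave up on {dev}: {err_name}")
```

Both drivers fail on the same device with the same error, which matches the reading that pata_acpi is simply the next driver tried after pata_amd.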

** Biggest problem: **
pata_amd has "absolutist power" in Linux 4.
There is no fallback if pata_amd fails.
So blacklisting it (modprobe.blacklist=...) gets rid of the error messages, but then the system behaves as if the drive were not there.

** Also interesting: **
As mentioned, it does work in Victoria (DOS). Can boot off Hiren's Boot CD, launch Victoria, initialize the drive, read its SMART data, and everything else.

So the SATA/IDE controller combo _works_ on that board and it's clearly a Linux kernel shortcoming.
Possibly this weird SATA-showing-up-as-IDE0 thing simply hasn't been thought about yet.

As I assume that this problem might be a VERY tough task to debug, I could even agree to send my physical mainboard via snail-mail to a kernel developer for debugging on bit level.
Did I just make you laugh?
Well, it's no laughing matter. First of all, I doubt that without such a board physically at hand (beware! they've become rare), there'll be anything to do about that.
Second, the pata_amd / pata_acpi module messages are WAY too sparse and non-descriptive.


But we'll see...
Comment 1 Alan 2016-01-22 20:57:09 UTC
You say "sata|ide interface: based on Marvell 88i8030 controller chip"

The 88i8030 is a PATA/SATA bridge, it's not a controller chip. It's an ancient lump of glue for nailing a SATA drive to a PATA port.


[1.916367] pata_amd 0000:00:09.0: version 0.4.1
[1.916383] pata_amd 0000:00:09.0: can't enable device: BAR 0 [io  size 0x0008] not assigned
[1.916453] pata_amd: probe of 0000:00:09.0 failed with error -22
[2.509228] pata_acpi 0000:00:09.0: can't enable device: BAR 0 [io  size 0x0008] not assigned
[2.509299] pata_acpi: probe of 0000:00:09.0 failed with error -22

So the BIOS hasn't configured the controller by the look of it. That or it is disabled in the NVRAM settings.

It tried pata_amd, which failed; then it tried pata_acpi, which failed for the same reason.

The Ultra-AL is known to work so that suggests to me there's perhaps some kind of setting error on your board.

Might be useful if you could do an lspci -vvxxx with the box booting something like a live USB installation, and then attach it. That would at least allow folks to see how the firmware has left the setup.
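
For anyone reading such a dump: the hex rows printed by lspci -xxx are the raw PCI configuration space, and BAR 0 sits at offset 0x10 as a little-endian 32-bit value. Below is a rough sketch of decoding it, assuming the usual "offset: byte byte ..." row layout. The sample dump is invented for illustration and is NOT taken from the reporter's board; the function names are made up as well.

```python
import re

# One hex row of an lspci -x style dump, e.g. "10: 01 00 00 00 ..."
HEX_ROW = re.compile(r"^([0-9a-fA-F]{2,3}):((?: [0-9a-fA-F]{2})+)\s*$")


def parse_config_space(dump):
    """Collect config-space bytes from the hex rows, keyed by offset."""
    data = {}
    for line in dump.splitlines():
        match = HEX_ROW.match(line)
        if not match:
            continue  # device header lines and blank lines are skipped
        offset = int(match.group(1), 16)
        for i, byte in enumerate(match.group(2).split()):
            data[offset + i] = int(byte, 16)
    return data


def read_bar(cfg, index):
    """Raw 32-bit value of BAR <index>; BAR 0 starts at offset 0x10."""
    base = 0x10 + 4 * index
    return sum(cfg[base + i] << (8 * i) for i in range(4))


def describe_bar(raw):
    """Split a raw BAR into (space type, base address)."""
    if raw & 0x1:
        # Bit 0 set: I/O space; the low two bits are flag bits.
        return "io", raw & ~0x3
    # Bit 0 clear: memory space; the low four bits are flag bits.
    return "mem", raw & ~0xF


# Invented sample: BAR 0 is an I/O BAR with no address, BAR 4 is at 0xf000.
SAMPLE_DUMP = """\
00:09.0 IDE interface: (made-up sample device)
00: de 10 65 00 05 00 b0 00 a2 8a 01 01 00 00 00 00
10: 01 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00
20: 01 f0 00 00 00 00 00 00 00 00 00 00 de 10 0c 0c
"""

if __name__ == "__main__":
    cfg = parse_config_space(SAMPLE_DUMP)
    for index in (0, 4):
        kind, addr = describe_bar(read_bar(cfg, index))
        state = "unassigned" if addr == 0 else hex(addr)
        print(f"BAR {index}: {kind} {state}")
```

A base address of zero in BAR 0 would be in line with the "not assigned" message, but only the real dump from the affected machine can show how the firmware actually left the registers.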
Comment 2 Andreas E 2016-01-23 01:17:12 UTC
Alan, thanks a lot for your input.

First of all...this stuff DOES work in Kernel 2.6.xx, go figure.

I can prove that because I made a mistake yesterday by cleaning up my root partition a little bit too thoroughly. Well, both the empty (!) /run and /usr dirs must exist on the partition to hook in /usr at bootup. I thought either upstart or systemd would be intelligent enough to create either of them when not there. Bleh. (PEBKAC: find . -maxdepth 1 -type d ! -xtype l -empty -delete) ;-))

To cut it short, I had to fix this by booting off an old CD-based 2.6.xx kernel "mini linux", so I could watch the startup process AND ... !

amd74xx, the ancient IDE controller kernel module compiled in there, actually *DID* initialize the drive properly. (Well, to make it tougher for testing, I intentionally booted off an _IDE_ CD-ROM drive! If amd74xx had shown the same behavior as pata_amd, it would've choked _while_ booting up the old linux!)
Comment 3 Andreas E 2016-01-23 01:38:32 UTC
P.S. 
>So the BIOS hasn't configured the controller by the look of it. 

Hang on, there's another specialty to mention. The SATA controller must be enabled in the "Genie BIOS" (Serial ATA control -- Enabled). That's a sort of appendix to the AWARD Bios 6.00 PG known especially from those old DFI boards.
But I'm sure it's enabled, because instead of the IDE Primary Master normally there, the SATA drive will show up.
Comment 4 Alan 2016-01-23 11:43:55 UTC
Helpful to know - if you can attach an lspci -vvxxx from both the failing and working cases then we have a good chance of pinning down what is going on.
Comment 5 Andreas E 2016-01-23 12:15:27 UTC
OK, I will.
Let me think...when booting FAILED, it stopped at the initramfs prompt.

I'm not sure if initramfs supports lspci, but I can try.
Also thought of forcing the system into a certain runlevel in both cases.
The Ubuntus have a very crazy "trick" to stop booting in mid-air: have an additional drive defined in /etc/fstab which you physically disconnect.

Ubuntu will then issue the well-known "A starting job has been running for...1m 30s" and eventually give up, sardonically "welcoming" you to emergency mode.
Comment 6 Alan 2016-01-23 17:09:36 UTC
Just boot a live USB distro with a modern kernel - that will let you do the lspci -vvxxx running from USB even with no working hard disc.
Comment 7 Andreas E 2016-01-23 18:58:19 UTC
Why, would have to create one first...I prefer CD live systems. There'd be a way to boot off my SATA add-in card (got both a PATA and SATA CD-ROM drive).
However, I think I know what you're on about. That method might be cluttering up lspci output too much in the end (too many components)
Comment 8 Andreas E 2016-01-23 21:13:14 UTC
Added Launchpad bug link above.
Shouldn't have missed that bug before.
"Sergey" claims it to be fixed in 4.4, and I'm on 4.2.0.
Must check!

*HOWEVER* Supposing it does work in 4.4, the fix must be backported, because corporate servers can't take the risk of changing to a whole new kernel branch; only a minor version bump (the y in x.y) will be acceptable.
Comment 9 Andreas E 2016-01-23 23:09:59 UTC
SUCCESS !!!

Linux my-lubuntubox 4.4.0-040400-generic #201601101930 SMP Mon Jan 11 00:49:33 UTC 2016 i686 athlon i686 GNU/Linux

That was it. Kernel 4.4.0 ONLY.

----------------------------
All 4.2.0 to 4.3.4 = FAILED.
----------------------------
4.4.0 = WORKS.

(there was nothing between 4.3.5 and 4.3.9 in http://kernel.ubuntu.com/~kernel-ppa/mainline/ AFAICS)

This is the first time I can write a post after having successfully booted off my _onboard_ SATA port.
So it does seem to be a kernel/module fault, and - for once - no user's hardware fault.

Remember, ancient Linux versions worked as well as non-Linux OSes (DOS).

Case closed (for me).
If you want to know anything else from me about this issue, feel free to ask. :-)

Thanks for your time, Alan.
Comment 10 Andreas E 2016-02-04 22:43:55 UTC
Closing this!

With lots of help from Oleg B. ( @ Launchpad), we figured out that on old and very old boards it is mandatory to pass pci=nocrs as a kernel boot parameter ("cheat code"). Otherwise none of the kernels < 4.4.0 will boot on such machines, or at least the onboard IDE will not be usable, as the BAR is always (deemed) 0.
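
To confirm that the parameter actually reached the running kernel, one can inspect /proc/cmdline. A small sketch follows; the helper name pci_options and the sample command line are made up for illustration, and it assumes that pci= accepts a comma-separated list of options.

```python
def pci_options(cmdline):
    """Return every option passed via pci=... on a kernel command line."""
    options = []
    for token in cmdline.split():
        if token.startswith("pci="):
            # pci= takes a comma-separated list, e.g. pci=nocrs,noacpi
            options.extend(opt for opt in token[len("pci="):].split(",") if opt)
    return options


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as handle:
            running = handle.read()
    except OSError:
        # Not on Linux or /proc unavailable: fall back to a hypothetical example.
        running = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro pci=nocrs quiet"
    if "nocrs" in pci_options(running):
        print("pci=nocrs is active")
    else:
        print("pci=nocrs is NOT set")
```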
Comment 11 Andreas E 2016-02-04 22:44:56 UTC
--->> Continued on bug 111901