Most recent kernel where this bug did not occur: 2.6.20
Distribution: Ubuntu
Hardware Environment: ICH7 + WDC WD2500JD-22HBC0
Problem Description:
This is forwarded from the Ubuntu bug: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/112132
------------------------------------------------------------------
I installed the 2.6.24-1 386 kernel, and when it tries to boot it generates the following messages:

ata3.00 qc timeout (cmd 0x27)
ata3.00 failed to read native max address (err_mask 0x4)

These repeated a couple of times. Then I got

ata3.00 revalidation failed (errno=-5)

and the whole sequence repeated, if I remember correctly. I tried adding irqpoll to the boot options and it didn't help.
Created attachment 13967 [details] dmesg 2.6.20
I'm attaching a dmesg dump from booting Gutsy with the 2.6.20-16 kernel, which is the latest working kernel.
Created attachment 13968 [details] lspci
Ubuntu's 2.6.24-1 386 kernel = vanilla 2.6.24-rc3 kernel.
Does irqpoll help?
No.
2.6.15, ata_piix version 1.05: worked well.
2.6.20, ata_piix version 2.10ac1: gets a "failed to set xfer mode" error, but worked well with irqpull.
2.6.22 and 2.6.24-rc3, ata_piix version 2.12: do not work with or without irqpull.
It's irqpoll not irqpull. Can you please post failing log with irqpoll specified?
I think the irqpull is a typo. Speaking as the person who originally reported the issue: 2.6.20 works with irqpoll, 2.6.24 does not.
Hmmm... how about "nohz=off irqpoll"?
Hmmm... interesting. I have similar hardware: at least an ICH7, but attached to a SATA ST380817AS, which is working just fine.
nohz=off had no effect.
Can you please post the failing boot log? If setting up netconsole or a serial console is difficult, taking photos with a digital camera and posting them should do.
I've tried to set up a netconsole with no luck. It just dumps it to the main console.

kernel /boot/vmlinuz-2.6.24-1-386 root=UUID=fb444af4-46f0-480a-8f41-b5657fb15ac4 ro irqpoll netconsole=6665@192.168.1.110/eth0,6666@192.168.1.104/00:1B:63:A0:86:07

Neither syslog nor netcat -u -l -p 6666 received any input during boots with this parameter. I can't capture it with a digital camera because all the messages fly by too fast. I could shoot the final 24 lines, but I can also write them down and type them back in if that's helpful.

The following kernel boots:

kernel /boot/vmlinuz-2.6.20-16-generic root=UUID=fb444af4-46f0-480a-8f41-b5657fb15ac4 ro irqpoll

However:

kernel /boot/vmlinuz-2.6.22-14-386 root=UUID=fb444af4-46f0-480a-8f41-b5657fb15ac4 ro irqpoll

does not boot; it gives the same errors that .24-1 gives. I'll try to write down the messages that I get on the console, though they are basically what's posted at the top of the bug.
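For anyone double-checking a netconsole= string like the one above, here is a rough sketch that splits it into its fields. It assumes the documented layout [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]; the names NetconsoleTarget and parse_netconsole are made up for illustration and are not part of any kernel tool.

```python
from typing import NamedTuple, Optional


class NetconsoleTarget(NamedTuple):
    """Fields of a netconsole= boot parameter (hypothetical helper type)."""
    src_port: Optional[int]
    src_ip: Optional[str]
    dev: Optional[str]
    tgt_port: Optional[int]
    tgt_ip: str
    tgt_mac: Optional[str]


def _split_endpoint(text):
    """Split 'port@ip/extra' into (port, ip, extra); empty fields become None."""
    port, sep, rest = text.partition("@")
    if not sep:
        raise ValueError(f"missing '@' in {text!r}")
    ip, _, extra = rest.partition("/")
    return (int(port) if port else None, ip or None, extra or None)


def parse_netconsole(value):
    """Parse the value part of netconsole=<local>,<remote>."""
    local, sep, remote = value.partition(",")
    if not sep:
        raise ValueError("expected local and remote halves separated by a comma")
    src_port, src_ip, dev = _split_endpoint(local)
    tgt_port, tgt_ip, tgt_mac = _split_endpoint(remote)
    if tgt_ip is None:
        raise ValueError("the target IP is required")
    return NetconsoleTarget(src_port, src_ip, dev, tgt_port, tgt_ip, tgt_mac)


# The value used in the boot line from this comment.
target = parse_netconsole(
    "6665@192.168.1.110/eth0,6666@192.168.1.104/00:1B:63:A0:86:07"
)
print(target)
```

This only checks the shape of the string. It does not verify that the MAC address really belongs to the target, or that eth0 is up early enough in boot for netconsole to use it.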
Here are some ATA messages about the detected hardware that I was able to catch:

ata3: SATA max UDMA/133 cmd 0xfa00 ctl 0xf900 bmdma 0xf600 irq 18
ata4: SATA max UDMA/133 cmd ???? ctl 0xf700 bmdma 0xf608 irq 18

(The 2.6.20-16-generic kernel reports these two lines in dmesg as:

[ 10.877189] ata3: SATA max UDMA/133 cmd 0x0001fa00 ctl 0x0001f902 bmdma 0x0001f600 irq 18
[ 10.877261] ata4: SATA max UDMA/133 cmd 0x0001f800 ctl 0x0001f702 bmdma 0x0001f608 irq 18
)

The messages on the screen related to the ata3 controller just before it drops to a command prompt are:

ata3.00: qc timeout (cmd 0x27)
ata3.00: failed to read native max address (err_mask=0x4)
ata3.00: failed to recover some devices
ata3.00: qc timeout (cmd 0x27)
ata3.00: failed to read native max address (err_mask=0x4)
ata3.00: failed to recover some devices
ata3.00: revalidation failed errno=-5
ata3.00: qc timeout (cmd 0x27)
ata3.00: failed to read native max address (err_mask=0x4)
ata3.00: failed to recover some devices
ata3.00: revalidation failed errno=-5
ata3.00: soft reset ???? 4
ata3.00: EH complete
ALERT! /dev/disk/by-uuid/fb444af4-46f0-480a-8f41-b5657fb15ac4 does not exist.

and then it drops to the boot prompt. If there are any specific messages you need, I can try to catch them.
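For readers trying to interpret the numbers in these messages, a small decoding sketch follows. The bit assignments are my recollection of the AC_ERR_* enum in include/linux/libata.h and should be checked against the kernel tree you are actually running. The helper name decode_err_mask is hypothetical.

```python
import errno

# Assumption: bit positions as recalled from the AC_ERR_* enum in
# include/linux/libata.h; verify against your own kernel source.
AC_ERR_BITS = {
    0: "AC_ERR_DEV",
    1: "AC_ERR_HSM",
    2: "AC_ERR_TIMEOUT",
    3: "AC_ERR_MEDIA",
    4: "AC_ERR_ATA_BUS",
    5: "AC_ERR_HOST_BUS",
    6: "AC_ERR_SYSTEM",
    7: "AC_ERR_INVALID",
    8: "AC_ERR_OTHER",
}

# ATA command opcode seen in the log (name per the ATA command set).
ATA_COMMANDS = {0x27: "READ NATIVE MAX ADDRESS EXT"}


def decode_err_mask(mask):
    """Return the names of all bits set in a libata err_mask value."""
    names = [name for bit, name in AC_ERR_BITS.items() if mask & (1 << bit)]
    unknown = mask & ~sum(1 << bit for bit in AC_ERR_BITS)
    if unknown:
        names.append(f"unknown(0x{unknown:x})")
    return names


print(ATA_COMMANDS[0x27])            # command behind "cmd 0x27"
print(decode_err_mask(0x4))          # bits behind "err_mask=0x4"
print(errno.errorcode[errno.EIO])    # symbolic name for the I/O error errno
```

Read this way, the log says the READ NATIVE MAX ADDRESS EXT command timed out, and revalidation then gave up with an I/O error (errno -5 is EIO on Linux).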
tcpdump or wireshark come in handy when debugging netconsole. Setting up netconsole won't suppress regular output. It will just send UDP packets in addition. You're running with irqpoll set, right? It's weird that the device times out on READ_NATIVE_MAX_EXT. Hmmm... Does "irqpoll libata.ignore_hpa=0" work?
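Since netconsole simply sends each console line as a UDP datagram, the receiving side can be as small as the sketch below, which does roughly what netcat -u -l -p 6666 does. The function names open_listener and collect are made up for this example, and port 6666 is only the value used earlier in this report.

```python
import socket


def open_listener(port=6666, host="0.0.0.0"):
    """Bind a UDP socket on the port the netconsole target is configured for."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    return sock


def collect(sock, max_packets, timeout=5.0):
    """Read up to max_packets datagrams; stop early if none arrive in time."""
    sock.settimeout(timeout)
    lines = []
    try:
        while len(lines) < max_packets:
            data, _addr = sock.recvfrom(4096)
            lines.append(data.decode("utf-8", errors="replace").rstrip("\n"))
    except socket.timeout:
        pass  # no more packets within the timeout
    return lines
```

On the receiving machine, something like collect(open_listener(6666), max_packets=500, timeout=60.0) run just before rebooting the failing box would gather whatever arrives. If it returns an empty list while tcpdump also shows nothing, the packets are most likely not leaving the sender at all.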
I'm experiencing a very similar bug, and the irqpoll option fixed the issue for me. I experienced this while booting 2.6.24-16-generic on Ubuntu Hardy Heron. This is a new machine, so I don't know when this bug was introduced. I see the same messages:

>ata3.00 qc timeout (cmd 0x27)
>ata3.00 failed to read native max address (err_mask 0x4)
>
>that repeated a couple of times. Then I got
>
>ata3.00 revalidation failed (errno=-5)

What exactly does irqpoll do? Do you want me to attach dmesg?
It works around IRQ delivery problems by polling IRQ handlers. You can encounter messages like the above for a number of reasons. Can you please post the result of "lspci -nn" and kernel boot log?
Here are the files you asked for. Please let me know if you need anything else.
Created attachment 15939 [details] kernel boot log
Not sure if this is the right file.
Created attachment 15940 [details] lspci -nn
Thanks. Hmm... There's no known ATA-related problem on ICH7. It could be that some other device is causing an IRQ storm and taking down the ATA controllers with it. Any chance you can capture the failing boot log? You can do it with netconsole or a serial console.
I tried to set up netconsole but could not get the output. I followed http://www.coraid.com/support/cln/CLN-HOWTO/ar01s09.html and don't know what could be wrong. I ran tcpdump on the remote machine, but it looks like the messages are not being sent, as I can't see them even in tcpdump. The messages I see without irqpoll are very similar to the ones Rob sent. I'll try to take pictures of the failing boot.
Created attachment 22158 [details] bug logs, started with dvd and without ... logs from dmesg, lspci -nn and dmidecode...
I've got the same problem, so I hope the logs will be useful for fixing this bug.
Lars, yours is different and looks more like a firmware problem. I'll prep a patch.
Actually, what you're seeing is a known problem. Yep, Pioneer DVDRTD08. The drive firmware seems buggy and times out on SETXFERMODE if no media is present. I posted a patch to work around the problem, but there were some disagreements. I will prep an updated version. Please stand by a bit.
My problem has been fixed since I installed Ubuntu 9.10. Now I can boot without a CD/DVD in the drive and without any failure messages. My drive works fine and is detected correctly, I think. Thanks anyway for your help and support.