My system has KDE and pm-utils-1.4.1 installed. When I plug in the AC adapter, every I/O operation hangs. I cannot write dmesg to a file because of the I/O hang, so I photographed the screen: https://picasaweb.google.com/chaos.proton/Bugs When this bug happens, the disk LED is on for a while, then off for a while, then on for another while, back and forth... It seems there are fatal ATA errors. After I uninstalled pm-utils, things went OK. But the installed pm-utils is clean and, AFAICS, the only thing it does is set link_power_management_policy to max_performance. If you can't see the screenshots clearly, I can attach the original ones.
Forgot to mention, I think you can safely ignore the usb stuff inserted in dmesg. It should have nothing to do with this bug.
Can you please post the full boot kernel log? Also, is this a regression? Thanks.
Ooh, one more question. Does the machine come back from the hang? Or is the machine completely dead after that?
Yes, the hang is a regression. I've tried 2.6.38; it only gives:
[ 112.076118] ata1: hard resetting link
[ 112.380680] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 112.382488] ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
[ 112.385131] ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
[ 112.385495] ata1.00: configured for UDMA/133
[ 112.387769] ata1: EH complete
[ 112.454063] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 112.567175] EXT4-fs (sda5): re-mounted. Opts: commit=0
[ 112.727557] EXT4-fs (sda6): re-mounted. Opts: commit=0
when I plug in the AC again, but there is no hang. In 2.6.39-rc6+, the machine never comes back from the hang. Since no disk operation ever returns, the system becomes dead soon (nearly every program uses the disk, right? ;) I will attach the full dmesg and some other info in the following posts.
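For anyone reading the 2.6.38 log, the register values in it can be unpacked roughly as below. This is only a sketch: the helper names are made up for illustration, the bit tables are deliberately partial, and it assumes the SStatus value is printed in hexadecimal.

```python
# Hypothetical helpers for reading the values in the log; not kernel code.
# Partial tables: only the commonly seen ATA status/error bits are listed.
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # seek complete
    (0x08, "DRQ"),   # data request
    (0x01, "ERR"),   # error, see the error register
]
ATA_ERROR_BITS = [
    (0x80, "ICRC"),  # interface CRC error
    (0x40, "UNC"),   # uncorrectable media error
    (0x10, "IDNF"),  # ID not found
    (0x04, "ABRT"),  # command aborted
]


def decode_sstatus(value):
    """Split a SATA SStatus value into its DET, SPD and IPM fields."""
    return {
        "det": value & 0xF,         # 3 = device present, PHY communication up
        "spd": (value >> 4) & 0xF,  # 2 = second-generation speed (3.0 Gbps)
        "ipm": (value >> 8) & 0xF,  # 1 = interface in active state
    }


def decode_bits(value, table):
    """Return the names of the bits from `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]
```

With these, decode_sstatus(0x123) gives DET=3, SPD=2, IPM=1, and Stat=0x51 Err=0x04 reads as DRDY, DSC, ERR with ABRT, i.e. the drive aborted the NOP command but the link itself is up.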
Created attachment 57002: dmesg of 2.6.39-rc6+. AC unplugged at about 154.372674.
Created attachment 57012: dmesg of 2.6.38. AC unplugged at about 91.747641 and plugged back in at about 112.076118.
Created attachment 57022: smartctl --all /dev/sda
Weird, 2.6.38 is okay but 2.6.39-rc6+ isn't. The thing is that libata had a major link power saving reimplementation during the 2.6.38 devel cycle, but there hasn't been any significant change in the area during the 39 cycle. I've looked through all the libata changes but nothing rings a bell. Is the problem readily reproducible? Would you be interested in doing a bisection? Thank you.
Yes, it is solidly reproducible, I mean, every time. Hmm, I know how to bisect, but I may only have enough time on the weekend to do the build/reboot/test thing... ;(
Ok, I think I found the bad commit:

commit 270dac35c26433d06a89150c51e75ca0181ca7e4
Author: Jian Peng <jipeng2005@gmail.com>
Date: Fri Apr 22 23:58:10 2011 -0700

    libata: ahci_start_engine compliant to AHCI spec

    At the end of section 10.1 of AHCI spec (rev 1.3), it states

    Software shall not set PxCMD.ST to 1 until it is determined that a functoinal device is present on the port as determined by PxTFD.STS.BSY=0, PxTFD.STS.DRQ=0 and PxSSTS.DET=3h

    Even though most AHCI host controller works without this check, specific controller will fail under this condition.

    Signed-off-by: Jian Peng <jipeng2005@gmail.com>
    Signed-off-by: Jeff Garzik <jgarzik@pobox.com>

The problem is gone after I revert it.
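To make the condition quoted in that commit message concrete, here is a minimal sketch of the check on raw register values. It is not the kernel's ahci_start_engine code; the function name and constants are hypothetical, and it only assumes that the ATA status byte sits in the low 8 bits of PxTFD and that DET is the low 4 bits of PxSSTS.

```python
# Illustrative only: the spec condition from the commit message, expressed on
# raw register values. Names are hypothetical, not taken from the kernel.
STS_BSY = 0x80        # PxTFD.STS.BSY
STS_DRQ = 0x08        # PxTFD.STS.DRQ
DET_DEVICE_UP = 0x3   # PxSSTS.DET = 3h: device present, PHY communication up


def may_set_start_bit(px_tfd, px_ssts):
    """True if the quoted AHCI condition for setting PxCMD.ST holds."""
    status = px_tfd & 0xFF   # status byte in the low bits of PxTFD
    det = px_ssts & 0xF      # DET field in the low bits of PxSSTS
    return (
        not (status & STS_BSY)
        and not (status & STS_DRQ)
        and det == DET_DEVICE_UP
    )
```

For example, may_set_start_bit(0x50, 0x123) is true, while a port whose status still shows BSY (0x80) or whose DET is not 3 fails the check.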
First-Bad-Commit : 270dac35c26433d06a89150c51e75ca0181ca7e4
Gu Rui, thank you very much for bisecting. In hindsight, yeap, that one makes sense. It was also reported in the following thread: http://thread.gmane.org/gmane.linux.kernel/1138771 Revert patch posted: http://article.gmane.org/gmane.linux.ide/49533
Fixed by http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=22fe9446e82f1fe4b59900db4599061384efb0ad .
Created attachment 58232: sata_alpm in pm-utils-1.4.1. As Jian Peng asked, I am posting the pm-utils script that causes the problem. I'm not familiar with pm-utils either, but I think the function of this script is to call set_sata_alpm min_power when AC is unplugged and set_sata_alpm max_performance when AC is plugged in.
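For those without pm-utils at hand, the behaviour described above can be approximated with the sketch below. It is not the attached script: the function names mirror the description only loosely, and it assumes the policy is exposed as /sys/class/scsi_host/host*/link_power_management_policy and that the caller has permission to write there.

```python
import glob
import os


def policy_for(on_ac):
    """Pick the link power policy the script is described as using."""
    return "max_performance" if on_ac else "min_power"


def set_sata_alpm(policy, sysfs_root="/sys/class/scsi_host"):
    """Write `policy` to every host that exposes the policy attribute.

    Returns the list of files written. Hosts without the attribute
    (for example non-AHCI hosts) are skipped by the glob pattern.
    """
    pattern = os.path.join(sysfs_root, "host*", "link_power_management_policy")
    written = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as attr:
            attr.write(policy)
        written.append(path)
    return written
```

Calling set_sata_alpm(policy_for(True)) as root should mimic the AC plug-in event that triggers the hang on the affected kernel, which may help reproduce it without replugging the adapter.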
Created attachment 58242: dmesg of the kernel with bad commit 270dac35c264 and ATA_VERBOSE_DEBUG in libata.h