Bug 212443
Summary: | Random hang at shutdown/poweroff - Sun Microsystems Ultra 24 Workstation (Ursa) | ||
---|---|---|---|
Product: | ACPI | Reporter: | Julius Henry Marx (sawbona) |
Component: | BIOS | Assignee: | Zhang Rui (rui.zhang) |
Status: | CLOSED INSUFFICIENT_DATA | ||
Severity: | high | CC: | lenb, rui.zhang |
Priority: | P1 | ||
Hardware: | Intel | ||
OS: | Linux | ||
Kernel Version: | Linux devuan 4.19.0-14-amd64 #1 SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | screenshot, ultra24_dmesg, ultra24_acpidump, ultra24_syslog.txt | ||
Description
Julius Henry Marx
2021-03-25 17:21:12 UTC
Created attachment 296063 [details]
screenshot
Created attachment 296065 [details]
ultra24_dmesg
Created attachment 296067 [details]
ultra24_acpidump
Created attachment 296069 [details]
ultra24_syslog.txt
This bug was originally reported on 2019-03-14 14:45:06 UTC here: https://bugzilla.kernel.org/show_bug.cgi?id=201965

Julius,

Please try this experiment: offline all cpus except CPU0 before shutdown. The theory is that the SMM code responsible for the actual poweroff may have a bug if it is invoked from a CPU other than the BSP.

Hello Len:

Thanks for taking an interest in this, but I'm afraid you give me too much credit.

> ... offline all cpus except CPU0 before shutdown.

I don't have a clue as to how to do that.

Since I last posted, I did away with the modified DSDT file, as it was too much of a hassle to set it up every time the kernel is updated and it did not solve the problem.

I have also tried shutting down the box with a modified script launched from an icon on the desktop panel:

[code]
~$ cat /usr/bin/shutdown.sh
#!/bin/sh
# added to shutdown directly - no shutdown helper
# options added to troubleshoot nic related bad shutdown

PATH=/sbin:/bin:/usr/sbin:/usr/bin:

# 1
# shutdown system directly
# sudo shutdown -h now

# 2
# sync
# disable onboard eth wol
# shutdown system directly
# sync && sudo ethtool -s eth0 wol d && sudo shutdown -h now

# 3
# sync
# remove e1000e module
# shutdown system directly
# sync && sudo rmmod -s -v e1000e && sudo shutdown -h now

# 4
# sync
# disable onboard eth wol
# remove e1000e module
# shutdown system directly
sync && sudo ethtool -s eth0 wol d && sudo rmmod -s -v e1000e && sudo shutdown -h now
~$
[/code]

I have tried all the options you see there and settled on #4. As you can see, this script syncs, disables WoL and removes the e1000e driver before shutting down. To no avail.

'Every so often' I get what I have come to call a bad shutdown. In the years I have had this going on, the only thing I can link to this 'every so often' is changes in room temperature, e.g. end of summer to beginning of autumn.

One thing I neglected to add is this: in this BIOS version (1.56, the last available) it is *impossible* for me to:

1. disable the on-board Intel GbE controller in the BIOS: it is greyed out.
2. disable the ME "Firmware Power Control"; the only change I can make is "Host Sleep States" ON to S0 or S3 (only two options).

Attempting to disable ME "Firmware Power Control" renders the box unusable, i.e. totally unresponsive to any keyboard input, with both case and CPU fans blowing at 100%, as if the sensors (wherever they are) see a high temperature inside the box or at the CPU.

The *only* way out of that nightmare is a hard shutdown, and the *only* way to get it to boot properly again involves clearing the CMOS and reflashing the ME BIOS.

I have the feeling that this is a severe hardware issue which pops up in boxes running non-MS OSs.

Since my last post I found this web page from 2007 (when the U24 was released), which I could only access as cached content (i.e. it is not available online):

[url]https://webcache.googleusercontent.com/search?q=cache:1n3s2V4blzYJ:https://community.oracle.com/tech/apps-infra/discussion/1907211/ultra-24-and-intel-boot-agent+&cd=3&hl=en&ct=clnk&gl=us[/url]

From the text I gather that these issues were present from the start.

In any case, if you let me know what I can do to run the test you suggest, I'll be more than willing to do it and report back.

Thanks in advance,

JHM

(In reply to Julius Henry Marx from comment #7)
> Hello Len:
>
> Thanks for taking an interest in this but I'm afraid you give me too much
> credit.
>
> > ... offline all cpus except CPU0 before shutdown.
> I don't have a clue as to how to do that.
>

You can do it this way: for every cpuX under /sys/devices/system/cpu except cpu0, run echo 0 > cpuX/online. That takes all the CPUs except CPU0 offline. Then you can check whether the problem still exists.

Bug closed as there is no response. Please feel free to reopen it if you can provide the info required.
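For anyone wanting to repeat the experiment requested above, the sysfs procedure from the reply (write 0 to cpuX/online for every CPU except cpu0) could be scripted roughly as follows. This is only a sketch: the function name offline_secondary_cpus and the CPU_DIR override are made up for illustration and are not part of any tool, and writing to the real /sys files needs root.

```shell
#!/bin/sh
# Sketch: take every CPU except cpu0 offline through sysfs.
# CPU_DIR is a hypothetical override so the loop can be dry-run against a
# scratch directory; by default it points at the real sysfs path.
CPU_DIR="${CPU_DIR:-/sys/devices/system/cpu}"

offline_secondary_cpus() {
    for cpu in "$CPU_DIR"/cpu[0-9]*; do
        # cpu0 is the boot processor (BSP); leave it online.
        [ "${cpu##*/}" = "cpu0" ] && continue
        # Skip entries without a writable 'online' control file
        # (not running as root, or the CPU is not hot-pluggable).
        [ -w "$cpu/online" ] || continue
        echo 0 > "$cpu/online"
    done
}

# Example, as root, in place of the last line of shutdown.sh:
#   offline_secondary_cpus && shutdown -h now
```

Before shutting down, cat /sys/devices/system/cpu/online should print just 0 if all secondary CPUs really went offline. Writing 1 to the same files brings them back.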