Here's a small sample of the errors: http://pastebin.com/b2Jv8z37 While these errors definitely have a purpose, these error messages are created far too quickly, and this causes severe problems on my system at least, the logs become bloated with these messages, completely filling my root partition, preventing me from using my system. I believe that anyone else who is able to trigger these messages at such a high rate will have the same problem. Based on the amount of messages and the time given for each, it appears that this error message appears approximately 1000 times per second (on my system), creating 3000 lines per second or ~400000 bytes per second (on my system). This fills up about 1.4 GB per hour (on my system) which fills up my root partition in less than a day. Once my root partition is filled, my system becomes unusable. My preferred solution to this issue is to limit the rate at which these messages appear. A limit of one error message per second seems like a good idea.
I also reported the bug on LaunchPad but I'm not entirely sure if it's the correct place. Anyway, here is the link to that: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1643719
(In reply to Aaron Franke from comment #0) > Here's a small sample of the errors: http://pastebin.com/b2Jv8z37 > can you please attach the errors in bugzilla? I can not access pastebin.
Created attachment 245501 [details] A small sample of the errors. Here you go Zhang Rui, you should be able to access this attachment.
We need acpidump to investigate. So please upload acpidump output here.
Not exactly sure what acpidump is but here's the output of acpi -V No support for device type: power_supply No support for device type: power_supply Cooling 0: x86_pkg_temp no state information available Cooling 1: intel_powerclamp no state information available Cooling 2: Processor 0 of 3 Cooling 3: Processor 0 of 3 Cooling 4: Processor 0 of 3 Cooling 5: Processor 0 of 3 Note, also, that I'm not really asking for the fix to be preventing this error, because I'm sure that there is in fact some problem with my system causing such an error, but I'm instead asking for a "speed limit" for these error messages, to prevent systems from becoming unusable due to gigantic log files, which would be very bad should anyone else have this same issue.
To obtain acpidump output, you need acpidump utility which can dump all ACPI tables from the system memory: # sudo acpidump > acpidump.txt You can find acpidump utility in the Linux kernel source tree: "tools/power/acpi/tools/acpidump" In order to compile latest acpidump: # cd tools # make acpi Or you can find it here: "https://acpica.org/downloads" And the build instructions are also on the web page.
> Note, also, that I'm not really asking for the fix to be preventing this > error, because I'm sure that there is in fact some problem with my system > causing such an error, but I'm instead asking for a "speed limit" for these > error messages, to prevent systems from becoming unusable due to gigantic log > files, which would be very bad should anyone else have this same issue. OK, however I'm not sure if this is possible in ACPICA. I'll give it a try.
I think the ratelimit must be related to specific error (namespace nodes, error codes, sometimes, parser stacks), which requires to maintain a very big error maps inside of kernel. I don't think it is possible. Or a simpler mechanism could just stop an error per-file-per-line. That surely will make many other useful errors muted. Which may not be expected by the developers. So this can be an optional feature controlled by the user during runtime. Thanks Lv
Created attachment 245901 [details] sudo acpidump > acpidump.txt
The entire contents of my /var/log folder after the error occured again (~350 MB extracted): http://www.mediafire.com/file/6744gbnkf3f4amb/log.tar.gz (too large to post as an attachment) Thanks for replying Lv Zheng. I completely trust your decisions about the issue, I'm not a kernel developer by any means so my educated guess as to what would work as a solution is likely to be wrong. Anyway, hopefully these files will help you find the best solution to this problem. I have attached an acpidump.txt as you reqested, and I've also attached the entirety of my /var/log so that you can see the error. The size of /var/log is limited to 350 MB because I recently changed my system to mount /var/log on a separate 360 MB partition due to this bug causing havoc on my root partition.
To Aaron Could you paste several lines of the log here? So that I can see which line I need to change for ACPICA. The website doesn't allow accesses from China. Thanks Lv
Hello Lv Zheng, I'm not exactly sure which one you mean because one is an attachment which should be accessible and one is a folder full of log files. The folder's kern.log is the same as the first attachment I made, "A small sample of the errors". It looks like this: Nov 24 21:03:42 aaron-xub16desk kernel: [ 93.264523] ACPI Exception: AE_NOT_FOUND, while evaluating GPE method [_L6F] (20150930/evgpe-592) Nov 24 21:03:42 aaron-xub16desk kernel: [ 93.265462] ACPI Error: [PGRT] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359) Nov 24 21:03:42 aaron-xub16desk kernel: [ 93.265463] ACPI Error: Method parse/execution failed [\_GPE._L6F] (Node ffff8804558d8000), AE_NOT_FOUND (20150930/psparse-542) Nov 24 21:03:42 aaron-xub16desk kernel: [ 93.265465] ACPI Exception: AE_NOT_FOUND, while evaluating GPE method [_L6F] (20150930/evgpe-592) Nov 24 21:03:42 aaron-xub16desk kernel: [ 93.266422] ACPI Error: [PGRT] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359) Nov 24 21:03:42 aaron-xub16desk kernel: [ 93.266424] ACPI Error: Method parse/execution failed [\_GPE._L6F] (Node ffff8804558d8000), AE_NOT_FOUND (20150930/psparse-542) These errors are spawned very rapidly, as stated before, and as you can see by the timestamps (those 6 errors are within 0.002 seconds of each other).
Also, please let me know if you have a specific preferred host in mind that you can access from China. I don't know where else to upload a file that you could access it from.
That's why I think this is useless: Nov 24 21:03:42 aaron-xub16desk kernel: [ 93.264523] ACPI Exception: AE_NOT_FOUND, while evaluating GPE method [_L6F] (20150930/evgpe-592) If this becomes a ONCE message, how do we know any further runtime AML failures? We can only see error once then. So we can know that \_L6F has failed, but we'll never know \_L6E has also failed. Thus I'll only provide a compile-time option for those who wants this feature. Thanks Lv
(In reply to Aaron Franke from comment #13) > Also, please let me know if you have a specific preferred host in mind that > you can access from China. I don't know where else to upload a file that you > could access it from. The following ones are commonly used in China: pan.baidu.com (you must have a baidu account) tecent QQ file sharing (you must have a Tencent QQ account) A Chinese internet explorer is always a BAT (BAIDU, Alibaba, Tencent) user. :D Thanks and best regards Lv
Created attachment 247281 [details] [PATCH] ACPICA: Utilities: Add ACPI_DO_ONCE macros Here you are. Please apply the fix, configure CONFIG_ACPI_ERROR_ONCE to y via "make menuconfig" and rebuild your kernel. Thanks
Ping... Is this patch working? Thanks Lv
I can see it's working on my machine. All table header printouts are shrinked, leaving only RSDP (special case), XSDT (1st message) and FACS (special case). So I'm closing this bug.
Unfortunately I was unable to figure it out earlier and was putting off trying again and now I'm unable to test it because I'm away from home for the holidays. If you could post more info about how to apply this fix then I'd gladly try it when I'm next available to do so - in 1-2 weeks probably. But, if you've tested it and it works for you, then hopefully everything is good and eventually this fix will make its way downstream to me and others. Thank you for everything by the way, Lv Zheng!
Created attachment 247991 [details] Dmesg on my test machine This is the dmesg output on my test machine. We can see: [ 0.000000] ACPI: Early table checksum verification disabled [ 0.000000] ACPI: RSDP 0x00000000AA3F8000 000024 (v02 OEMC ) This is printed from tbutils.c acpi_tb_parse_root_table(). [ 0.000000] ACPI: XSDT 0x00000000AA3F80A0 0000BC (v01 OEMC O E M C 00000300 AMI 00010013) This is printed from tbutils.c acpi_tb_parse_root_table(). In acpi_tb_print_table_header(), it is the 2nd ACPI_INFO, so it is a special case. [ 0.000000] ACPI: Table Upgrade: override [DSDT- OEMC-O E M C ] [ 0.000000] ACPI: DSDT 0x00000000AA3F81E8 Physical table override, new table: 0x00000000AA064000 This is printed from tbinstal.c acpi_tb_install_table_with_override(). In acpi_tb_print_table_header(), it is the 3rd ACPI_INFO. [ 0.000000] ACPI: FACS 0x00000000ABB4E080 000040 This is printed from tbutils.c This is printed from tbinstal.c acpi_tb_install_table_with_override(). In acpi_tb_print_table_header(), it is the 1st ACPI_INFO, so it is a special case. For 3rd ACPI_INFO, I can see only 1 entry while in a normal boot, I can see many table loading printouts. So the ACPI_DO_ONCE macros should be working. For ACPI_ERROR/ACPI_EXCEPTION/..., the result should be same. This is a trivial feature, let me close it. I'll discuss it with ACPICA upstream.
Sorry, the investigation has some slight issues: This is the dmesg output on my test machine. We can see: [ 0.000000] ACPI: RSDP 0x00000000AA3F8000 000024 (v02 OEMC ) This is printed from tbutils.c acpi_tb_parse_root_table(). In acpi_tb_print_table_header(), it is the 2nd ACPI_INFO, so it is a special case. [ 0.000000] ACPI: XSDT 0x00000000AA3F80A0 0000BC (v01 OEMC O E M C 00000300 AMI 00010013) This is printed from tbutils.c acpi_tb_parse_root_table(). In acpi_tb_print_table_header(), it is the 3rd ACPI_INFO. [ 0.000000] ACPI: Table Upgrade: override [DSDT- OEMC-O E M C ] [ 0.000000] ACPI: DSDT 0x00000000AA3F81E8 Physical table override, new table: 0x00000000AA064000 This is printed from tbinstal.c acpi_tb_install_table_with_override(). It is not related. [ 0.000000] ACPI: FACS 0x00000000ABB4E080 000040 This is printed from tbinstal.c acpi_tb_install_table_with_override(). In acpi_tb_print_table_header(), it is the 1st ACPI_INFO, so it is a special case. For 3rd ACPI_INFO, I can see only 1 entry while in a normal boot, I can see many table loading printouts. So the ACPI_DO_ONCE macros should be working.
If you want to try, then the steps are: 1. Download latest kernel source $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 2. In kernel source tree, apply this patch (attachment 247281 [details]) to your kernel source $ patch -p 1 < <patch file> 3. In kernel source tree, configure the build $ make menuconfig Power management and ACPI options ---> ACPI (Advanced Configuration and Power Interface) Support ---> Print ACPICA error messages only once Select the "Print ACPICA error messages only once" 4. Build and boot the kernel 5. Upload the dmesg output here. Or you can simply wait until this feature is released from the ACPICA upstream. Thanks Lv
I still get spammed log entries with the "Linux 4.15.0-34-generic #37-Ubuntu" kernel on Ubuntu 18.04 LTS 64-bit. I can reproduce the problem, it happens whenever I plug in my eSATA external hard drive. I have /var/log mounted on a separate 400 MB partition to prevent it from killing my root partition's space. I overwrote "kern.log" and "syslog" with emptiness via "echo test > kern.log" as root when I started writing this message, and already my /var/log partition is about 70% full. ) Sep 28 04:46:06 aaron-xub18desk kernel: [ 8702.514328] ACPI Error: [PGRT] Namespace lookup failure, AE_NOT_FOUND (20170831/psargs-364) Sep 28 04:46:06 aaron-xub18desk kernel: [ 8702.514332] No Local Variables are initialized for Method [_L6F] Sep 28 04:46:06 aaron-xub18desk kernel: [ 8702.514333] No Arguments are initialized for method [_L6F] Sep 28 04:46:06 aaron-xub18desk kernel: [ 8702.514334] ACPI Error: Method parse/execution failed \_GPE._L6F, AE_NOT_FOUND (20170831/psparse-550) Sep 28 04:46:06 aaron-xub18desk kernel: [ 8702.514338] ACPI Exception: AE_NOT_FOUND, while evaluating GPE method [_L6F] (20170831/evgpe-646) Sep 28 04:46:06 aaron-xub18desk kernel: [ 8702.515863] ACPI Error: [PGRT] Namespace lookup failure, AE_NOT_FOUND (20170831/psargs-364) Sep 28 04:46:06 aaron-xub18desk kernel: [ 8702.515867] No Local Variables are initialized for Method [_L6F] Sep 28 04:46:06 aaron-xub18desk kernel: [ 8702.515868] No Arguments are initialized for method [_L6F] Sep 28 04:46:06 aaron-xub18desk kernel: [ 8702.515869] ACPI Error: Method parse/execution failed \_GPE._L6F, AE_NOT_FOUND (20170831/psparse-550) Sep 28 04:46:06 aaron-xub18desk kernel: [ 8702.515873] ACPI Exception: AE_NOT_FOUND, while evaluating GPE method [_L6F] (20170831/evgpe-646) The same message is repeated over and over. From the timestamps you can see that this happens about every 0.5 milliseconds (or 0.0005 seconds). The main problem here is the overlogging of errors. If there is an error, it is a good thing to log it. But it is not good if you log the same error 2000 times per second. There needs to be some kind of rate limit on error logging. I assume that whatever patch you wrote would've made it in to the kernel by now, so I'm re-opening this issue since the problem is still present.
Well, as the error is triggered in the GPE handler, I think the real problem is that we're getting an GPE 0x6F interrupt storm here. Even if we set the error limit, you will still see 2000 interrupts filed per second, which makes your machine hard to use or at least consumes much more power than it should. please confirm this by attaching the output of "grep . /sys/firmware/acpi/interrupts/gpe6F" please also attach the output of lspci -vx.
(In reply to Aaron Franke from comment #23) > I still get spammed log entries with the "Linux 4.15.0-34-generic > #37-Ubuntu" kernel on Ubuntu 18.04 LTS 64-bit. I can reproduce the problem, > it happens whenever I plug in my eSATA external hard drive. > > I have /var/log mounted on a separate 400 MB partition to prevent it from > killing my root partition's space. I overwrote "kern.log" and "syslog" with > emptiness via "echo test > kern.log" as root when I started writing this > message, and already my /var/log partition is about 70% full. > > ) > Sep 28 04:46:06 aaron-xub18desk kernel: [ 8702.514328] ACPI Error: [PGRT] > Namespace lookup failure, AE_NOT_FOUND (20170831/psargs-364) > Sep 28 04:46:06 aaron-xub18desk kernel: [ 8702.514332] No Local Variables > are initialized for Method [_L6F] > Sep 28 04:46:06 aaron-xub18desk kernel: [ 8702.514333] No Arguments are > initialized for method [_L6F] > Sep 28 04:46:06 aaron-xub18desk kernel: [ 8702.514334] ACPI Error: Method > parse/execution failed \_GPE._L6F, AE_NOT_FOUND (20170831/psparse-550) > Sep 28 04:46:06 aaron-xub18desk kernel: [ 8702.514338] ACPI Exception: > AE_NOT_FOUND, while evaluating GPE method [_L6F] (20170831/evgpe-646) > Sep 28 04:46:06 aaron-xub18desk kernel: [ 8702.515863] ACPI Error: [PGRT] > Namespace lookup failure, AE_NOT_FOUND (20170831/psargs-364) > Sep 28 04:46:06 aaron-xub18desk kernel: [ 8702.515867] No Local Variables > are initialized for Method [_L6F] > Sep 28 04:46:06 aaron-xub18desk kernel: [ 8702.515868] No Arguments are > initialized for method [_L6F] > Sep 28 04:46:06 aaron-xub18desk kernel: [ 8702.515869] ACPI Error: Method > parse/execution failed \_GPE._L6F, AE_NOT_FOUND (20170831/psparse-550) > Sep 28 04:46:06 aaron-xub18desk kernel: [ 8702.515873] ACPI Exception: > AE_NOT_FOUND, while evaluating GPE method [_L6F] (20170831/evgpe-646) > > The same message is repeated over and over. From the timestamps you can see > that this happens about every 0.5 milliseconds (or 0.0005 seconds). > > The main problem here is the overlogging of errors. If there is an error, it > is a good thing to log it. But it is not good if you log the same error 2000 > times per second. There needs to be some kind of rate limit on error logging. > > I assume that whatever patch you wrote would've made it in to the kernel by > now, so I'm re-opening this issue since the problem is still present. There are some fixes that may in theory help here and are not present in 4.15 (or anything based on it). Can you try 4.18.y or 4.19-rc5, please?
Created attachment 278851 [details] Kernel 4.15 with Nvidia drivers lspci -vx and acpi/interrupts/gpe6F
Created attachment 278853 [details] Kernel 4.15 with Nouveau driver lspci -vx and acpi/interrupts/gpe6F
Created attachment 278855 [details] Kernel 4.19 rc5 lspci -vx and acpi/interrupts/gpe6F Alright, I've tested 4.19 rc5 and posted the output of the requested commands in various text files. * I installed 4.19 rc5 from the Ubuntu packages here http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.19-rc5/ (generic amd64) * I had to disable my Nvidia driver to use 4.19, and the problem does still occur, but I included multiple reports (4.15 Nvidia, 4.15 Nouveau, and 4.19 Nouveau) just in case it helps. The report with the Nvidia drivers installed was taken after several hours of uptime, while the other two were taken about a minute after booting.
Thanks. Would it be possible to attach the log from the 4.19-rc5?
Created attachment 278893 [details] ZIP archive of entire /var/log with Kernel 4.19-rc5 Here's all of my system logs from Kernel 4.19-rc5 (the entire contents of my /var/log folder) inside of a ZIP file. This way you will have any log you need. By the way, here's my full information according to uname -a: $ uname -a Linux aaron-xub18desk 4.19.0-041900rc5-generic #201809231830 SMP Sun Sep 23 18:32:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
I think we have some code to detect GPE storm and disable the GPE, no? @erik
As a data point, here is the ASL problem with the _L6F GPE method: Method (_L6F, 0, NotSerialized) // _Lxx: Level-Triggered GPE { \_SB.UGPS () If ((RTD3 == One)) { If (CondRefOf (\_GPE.AL6F)) { AL6F () } } If ((PGRT == One)) { PGRT simply does not exist anywhere (in the DSDT and all SSDTs), resulting in the execution error. Since the reference to PGRT is always referenced (no code path can avoid it), it is clear that this method was never tested. - ev _GPE._L6F Evaluating \_GPE._L6F Firmware Error (ACPI): Could not resolve [\_GPE._L6F.PGRT], AE_NOT_FOUND (20181003/psargs-503) Failed at AML Offset 00029, Opcode 002D: No Local Variables are initialized for Method [_L6F] No Arguments are initialized for method [_L6F] ACPI Error: Method parse/execution failed \_GPE._L6F, AE_NOT_FOUND (20181003/psparse-677) ACPI Error: AE_NOT_FOUND, while executing \_GPE._L6F from debugger (20181003/dbexec-334) 0x2 Outstanding allocations after evaluation of \_GPE._L6F Evaluation of \_GPE._L6F failed with status AE_NOT_FOUND Thus the issue appears to be: why is this GPE ever enabled?
@Aaron please try the code change below and let us know if you see any changes or not.. --- a/include/acpi/acpixf.h +++ b/include/acpi/acpixf.h @@ -161,7 +161,7 @@ * NOTE, this is essentially obsolete and will be removed soon * (01/2018). */ -ACPI_INIT_GLOBAL(u8, acpi_gbl_group_module_level_code, FALSE); +ACPI_INIT_GLOBAL(u8, acpi_gbl_group_module_level_code, TRUE); /* * Optionally support module level code by parsing an entire table as @@ -169,7 +169,7 @@ * NOTE, this is essentially obsolete and will be removed soon * (01/2018). */ -ACPI_INIT_GLOBAL(u8, acpi_gbl_execute_tables_as_methods, TRUE); +ACPI_INIT_GLOBAL(u8, acpi_gbl_execute_tables_as_methods, FALSE); /* * Optionally use 32-bit FADT addresses if and when there is a conflict
Bug closed as there is no response from the bug reporter. Please feel free to reopen it if you can reproduce the problem with latest upstream kernel and the code change above.