Bug 78781 - Kernel does not react due to error message flood in kernel 3.15
Summary: Kernel does not react due to error message flood in kernel 3.15
Status: RESOLVED WILL_NOT_FIX
Alias: None
Product: Other
Classification: Unclassified
Component: Other
Hardware: i386 Linux
Importance: P1 high
Assignee: other_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-06-23 15:06 UTC by caesar
Modified: 2014-07-17 13:19 UTC

See Also:
Kernel Version: 3.15.4
Subsystem:
Regression: Yes
Bisected commit-id:



Description caesar 2014-06-23 15:06:46 UTC
Up to and including kernel 3.14.6 I got these error messages every 5 minutes:

Jun 21 15:31:23 localhost kernel: [Hardware Error]: MC2 Error: : EV error during data copyback.
Jun 21 15:31:23 localhost kernel: [Hardware Error]: Error Status: Corrected error, no action required.
Jun 21 15:31:23 localhost kernel: [Hardware Error]: CPU:0 (10:4:2) MC2_STATUS[Over|CE|-|-|AddrV]: 0xd40000000000017a
Jun 21 15:31:23 localhost kernel: [Hardware Error]: MC2_ADDR: 0x00000000011c2d80
Jun 21 15:31:23 localhost kernel: [Hardware Error]: cache level: L2, tx: GEN, mem-tx: EV
Jun 21 15:36:23 localhost kernel: [Hardware Error]: MC2 Error: : GEN parity/ECC error during data access from L2.
Jun 21 15:36:23 localhost kernel: [Hardware Error]: Error Status: Corrected error, no action required.
Jun 21 15:36:23 localhost kernel: [Hardware Error]: CPU:0 (10:4:2) MC2_STATUS[Over|CE|-|-|AddrV|CECC]: 0xd40041000000010a
Jun 21 15:36:23 localhost kernel: [Hardware Error]: MC2_ADDR: 0x00000002fe700a40
Jun 21 15:36:23 localhost kernel: [Hardware Error]: cache level: L2, tx: GEN, mem-tx: GEN
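
For readers who want to check the decoded fields in the log above against the raw MC2_STATUS values, here is a minimal sketch. The bit positions and the name tables are assumptions based on the architectural machine-check status layout and the strings the kernel prints; `decode_status` is a hypothetical helper, not a kernel or library API.

```python
# Assumed bit positions of the status flags (not taken from the report itself).
FLAG_BITS = {
    "Val": 63, "Over": 62, "UC": 61, "En": 60,
    "MiscV": 59, "AddrV": 58, "PCC": 57, "CECC": 46, "UECC": 45,
}
# Assumed name tables for the low 16-bit error code of a memory error,
# laid out as 0000 0001 RRRR TTLL.
CACHE_LEVEL = ["RESV", "L1", "L2", "L3/GEN"]
TX_TYPE = ["INSN", "DATA", "GEN", "RESV"]
MEM_TX = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]


def decode_status(status):
    """Split a 64-bit MCi_STATUS value into flags and error-code fields."""
    flags = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    code = status & 0xFFFF
    result = {"flags": flags, "error_code": code}
    if (code & 0xFF00) == 0x0100:  # memory error pattern
        rrrr = (code >> 4) & 0xF
        result["mem_tx"] = MEM_TX[rrrr] if rrrr < len(MEM_TX) else "RESV"
        result["tx"] = TX_TYPE[(code >> 2) & 0x3]
        result["cache_level"] = CACHE_LEVEL[code & 0x3]
    return result


if __name__ == "__main__":
    # The two status values quoted in the log above.
    for value in (0xD40000000000017A, 0xD40041000000010A):
        decoded = decode_status(value)
        set_flags = [name for name, on in decoded["flags"].items() if on]
        print(hex(value), set_flags, decoded.get("cache_level"),
              decoded.get("tx"), decoded.get("mem_tx"))
```

Under these assumptions, both values decode to a corrected (UC clear) L2 error with a valid address, which agrees with the "Corrected error" and "cache level: L2" lines in the log.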

I have an AMD Phenom(tm) II X4 20, which is basically an AMD Phenom II X2 550 with two unlocked cores, making it an AMD Phenom II X4 955. There might be some hardware error, but it has not affected anything for over a year.

When I installed kernel 3.15, those messages began to appear several times a second. The PC did not crash, but su and sudo stopped working and shutdown did not work. I guess the error flood blocks messages to the kernel or something. I got over 80000 lines of errors in 5 minutes.
I guess some error reporting time delay got changed, which caused the problem.
Comment 1 caesar 2014-06-28 14:04:06 UTC
The bug is still present in 3.15.2.
The log shows over 60 errors per second (!), with some lost due to "/dev/kmsg buffer overrun, some messages lost."
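
A rate like this can be measured from a saved syslog file with a short script. The following is only a sketch: it assumes the traditional syslog timestamp format seen in the description ("Jun 21 15:31:23"), counts one event per "MCn Error" header line, and takes the year as a parameter because syslog timestamps omit it. `errors_per_second` is a hypothetical helper name.

```python
import re
from collections import Counter
from datetime import datetime

# One machine-check event starts with a line such as
# "... kernel: [Hardware Error]: MC2 Error: ..."
EVENT_HEADER = re.compile(r"\[Hardware Error\]: MC\d+ Error")


def errors_per_second(lines, year=2014):
    """Return a Counter mapping each one-second timestamp to its event count."""
    counts = Counter()
    for line in lines:
        if not EVENT_HEADER.search(line):
            continue
        stamp = line[:15]  # e.g. "Jun 21 15:31:23"
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        counts[when] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (path is an example): python rate.py /var/log/messages
    with open(sys.argv[1], errors="replace") as handle:
        per_second = errors_per_second(handle)
    if per_second:
        print("events:", sum(per_second.values()))
        print("peak events in one second:", max(per_second.values()))
```

Note that messages dropped by the kmsg buffer overrun never reach the log, so the real rate may be higher than what this counts.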
Comment 2 caesar 2014-07-08 23:39:21 UTC
Bug still present in 3.15.4 :(
Comment 3 caesar 2014-07-14 18:04:53 UTC
I tried 3.16-rc5 and it is still there... Can someone please look into it? :/
Comment 4 Andrey Utkin 2014-07-14 19:26:51 UTC
You can just stop the syslog daemon so that it does not flood your HDD with logs. But I suppose that is not the reason su and sudo are not working.
Comment 5 Alan 2014-07-16 16:32:35 UTC
As it says, "Hardware Error": your machine logged a corrected L2 cache error. Talk to your hardware vendor.
Comment 6 caesar 2014-07-17 13:19:49 UTC
The hardware is not the problem. I have been getting these error messages for a long time, and someone else filed a bug report about them here: https://bugzilla.kernel.org/show_bug.cgi?id=43205
The problem is that the error message frequency changed from once every 300 s (which is fine) to several messages a second. This change occurred between 3.14 and 3.15.
