Bug 199763
Summary: | System is unresponsive, or completely frozen on high memory usage | | |
---|---|---|---|
Product: | Memory Management | Reporter: | SlayerProof32 (kortrax11) |
Component: | Other | Assignee: | Andrew Morton (akpm) |
Status: | REOPENED | | |
Severity: | high | CC: | cfeck, dion, dushistov, iam, matthew, ultra10e |
Priority: | P2 | | |
Hardware: | x86-64 | | |
OS: | Linux | | |
Kernel Version: | 4.12.8, 4.13.16, 4.14, 4.16.8, 4.17-rc8 (just the ones I've tested) | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | Dmesg after boot | | |
Description
SlayerProof32
2018-05-19 01:40:32 UTC
Relates to https://bugzilla.kernel.org/show_bug.cgi?id=196729

Please only mark as duplicate if you are a Linux kernel developer and are working on a fix.

I've tested using kernel 4.17-rc8, 4.16.8, 4.12.8, and 4.14 across Manjaro Linux, Ubuntu Linux, openSUSE Leap 15 and Fedora 28.

Steps to trigger:
- Open Firefox with many tabs, or any other high memory usage program
- Wait a second
- System freezes. Sometimes the only fix is a hard reboot

Other findings:
- I notice really high CPU load averages if the system unfreezes
- If the system is not frozen, it is highly unresponsive on high memory usage when swapping, in my experience
- Hard drive indicator light stays solidly on when the system is frozen (excessive hard disk use)
- The reason the system freezes is high memory usage

Tested on:
- Intel i5-520M with 4 GB RAM / 4 GB swap (Lenovo T410)
- Intel E6400 with 3 GB RAM / 3 GB swap

This bug is really hard to deal with because it usually requires a hard restart. Please fix ASAP if possible.

Update: Another user reported excess flash drive usage on high memory usage when booted as a live user. This means that swap is not the issue. The issue is completely with the system RAM management.

Is this a 32-bit kernel or 64? You selected a hardware type of "IA-64"!

As far as I know, this bug affects all 64-bit Linux kernels released since 2007. Is there a more appropriate hardware classification? I only have Intel 64-bit hardware to test with. The ones listed are the kernels (as of today) that I've noticed the issue with.

The computer also seems to freeze when the disk is being used a lot.

Issue still occurs in the new kernels (4.16.11).

Also occurs in 4.16.12. High disk usage or high memory usage seems to cause this issue. Swap puts both of these issues together.

Still occurs in 4.16.13. Please fix. This is a critical issue, and the Linux desktop can never be stable on older hardware with a big issue like this.
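One way to put numbers on the "hard drive light stays on while swapping" observation is to compare the swap counters in /proc/vmstat over a short interval. The following is only a sketch: the function names are made up for illustration, and it assumes the kernel exposes the `pswpin` and `pswpout` counters (pages swapped in and out since boot) in /proc/vmstat.

```python
import time


def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before, after, seconds):
    """Return pages swapped in and out per second between two snapshots."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return {
        name: (after.get(name, 0) - before.get(name, 0)) / seconds
        for name in ("pswpin", "pswpout")
    }


def sample_swap_rates(interval=1.0, path="/proc/vmstat"):
    """Take two snapshots of the counters `interval` seconds apart (Linux only)."""
    with open(path, encoding="utf-8") as handle:
        before = parse_vmstat(handle.read())
    time.sleep(interval)
    with open(path, encoding="utf-8") as handle:
        after = parse_vmstat(handle.read())
    return swap_rates(before, after, interval)
```

Calling `sample_swap_rates()` in a loop while opening tabs would show whether both rates climb together before the freeze, which is what heavy swap thrashing would look like. It does not by itself say why the system fails to recover.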
Recently, my Linux system crashed with 4 tabs of Google Docs open in Firefox, plus VLC. The same issue does not occur in Windows (ugh, Microsoft), but when this issue occurs and the system freezes completely, I am forced to go back to Windows. Please fix this critical bug, and make me not have to touch Windows ever again. I would greatly appreciate it.

Please provide more detail about your system configuration: lspci -v, cat /proc/scsi/scsi, etc. (https://www.kernel.org/doc/html/latest/admin-guide/reporting-bugs.html#gather-information)

I suspect your I/O subsystem simply can't cope with the load being thrown at it. It's *probably* seeks, but I don't know whether you're using an SSD or rotating storage.

Sorry for not including that information. My hard disk is a Seagate ST9200420ASG running in a Lenovo T410.
https://www.cnet.com/products/seagate-momentus-laptop-st9200420asg-hard-drive-200-gb-sata-3gb-s/specs/

--------------------------------Detailed info----------------------------------
lspci -v

https://pastebin.com/mnj4Bamu
------------------------------------------------------------------------------
cat /proc/scsi/scsi

Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: ST9200420ASG Rev: D
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: MATSHITA Model: DVD-RAM UJ892 Rev: SB01
Type: CD-ROM ANSI SCSI revision: 05
--------------------------------------------------------------------------------
parted -l

Disk /dev/sda: 200GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: pmbr_boot

Number Start End Size File system Name Flags
1 17.4kB 21.0GB 21.0GB btrfs
2 21.0GB 42.4GB 21.5GB ext4
5 42.4GB 164GB 122GB ext4
6 164GB 189GB 24.7GB btrfs
3 189GB 193GB 4295MB linux-swap(v1) swap
4 193GB 193GB 8389kB bios_grub

I am booted off sda6 (btrfs) and using sda5 as a home partition. Swap is on sda3.
----------------------------------------------------------------------------
cat /proc/version

Linux version 4.16.13-300.fc28.x86_64 (mockbuild@bkernel02.phx2.fedoraproject.org) (gcc version 8.1.1 20180502 (Red Hat 8.1.1-1) (GCC)) #1 SMP Wed May 30 14:31:00 UTC 2018
----------------------------------------------------------------------------
cat /proc/ioports

0000-0cf7 : PCI Bus 0000:00
0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-0060 : keyboard
0061-0061 : PNP0800:00
0062-0062 : PNP0C09:00
0062-0062 : EC data
0064-0064 : keyboard
0066-0066 : PNP0C09:00
0066-0066 : EC cmd
0070-0071 : rtc0
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
00f0-00f0 : PNP0C04:00
03c0-03df : vga+
0800-080f : pnp 00:01
0cf8-0cff : PCI conf1
0d00-ffff : PCI Bus 0000:00
1000-107f : pnp 00:01
1000-1003 : ACPI PM1a_EVT_BLK
1004-1005 : ACPI PM1a_CNT_BLK
1008-100b : ACPI PM_TMR
1020-102f : ACPI GPE0_BLK
1030-1033 : iTCO_wdt.0.auto
1030-1033 : iTCO_wdt
1050-1050 : ACPI PM2_CNT_BLK
1060-107f : iTCO_wdt.0.auto
1060-107f : iTCO_wdt
1180-11ff : pnp 00:01
15e0-15ef : pnp 00:01
1600-1641 : pnp 00:01
164e-164f : pnp 00:01
1800-1807 : 0000:00:02.0
1808-180f : 0000:00:16.3
1808-180f : serial
1810-1813 : 0000:00:1f.2
1810-1813 : ahci
1814-1817 : 0000:00:1f.2
1814-1817 : ahci
1818-181f : 0000:00:1f.2
1818-181f : ahci
1820-183f : 0000:00:19.0
1840-185f : 0000:00:1f.2
1840-185f : ahci
1860-1867 : 0000:00:1f.2
1860-1867 : ahci
1880-189f : 0000:00:1f.3
1880-189f : i801_smbus
2000-2fff : PCI Bus 0000:05
-----------------------------------------------------------------------------
cat /proc/iomem

00000000-00000fff : Reserved
00001000-0009e7ff : System RAM
0009e800-0009ffff : Reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000c7fff : Video ROM
000c8000-000cbfff : pnp 00:00
000cc000-000cffff : pnp 00:00
000d0000-000d0fff : Adapter ROM
000d1000-000d1fff : Adapter ROM
000d2000-000d3fff : Reserved
000d4000-000d7fff : PCI Bus 0000:00
000d8000-000dbfff : PCI Bus 0000:00
000dc000-000fffff : Reserved
000e0000-000effff : Extension ROM
000f0000-000fffff : System ROM
00100000-bb27bfff : System RAM
03000000-03c031d0 : Kernel code
03c031d1-04387f7f : Kernel data
04931000-04a86fff : Kernel bss
bb27c000-bb281fff : Reserved
bb282000-bb35dfff : System RAM
bb35e000-bb370fff : Reserved
bb371000-bb3f1fff : ACPI Non-volatile Storage
bb3f2000-bb40efff : Reserved
bb40f000-bb46efff : System RAM
bb46f000-bb667fff : Reserved
bb668000-bb6e7fff : ACPI Non-volatile Storage
bb6e8000-bb70efff : Reserved
bb70f000-bb716fff : System RAM
bb717000-bb71efff : Reserved
bb71f000-bb76afff : System RAM
bb76b000-bb776fff : ACPI Non-volatile Storage
bb777000-bb779fff : ACPI Tables
bb77a000-bb780fff : ACPI Non-volatile Storage
bb781000-bb781fff : ACPI Tables
bb782000-bb78afff : ACPI Non-volatile Storage
bb78b000-bb78bfff : ACPI Tables
bb78c000-bb79efff : ACPI Non-volatile Storage
bb79f000-bb7fefff : ACPI Tables
bb7ff000-bb7fffff : System RAM
bb800000-bfffffff : Reserved
be000000-bfffffff : Graphics Stolen Memory
c0000000-febfffff : PCI Bus 0000:00
c0000000-c0000fff : Intel Flush Page
d0000000-dfffffff : 0000:00:02.0
e0000000-efffffff : PCI MMCONFIG 0000 [bus 00-ff]
e0000000-efffffff : Reserved
e0000000-efffffff : pnp 00:01
f0000000-f1ffffff : PCI Bus 0000:05
f2000000-f23fffff : 0000:00:02.0
f2400000-f24fffff : PCI Bus 0000:03
f2400000-f2401fff : 0000:03:00.0
f2400000-f2401fff : iwlwifi
f2500000-f25fffff : PCI Bus 0000:0d
f2500000-f25000ff : 0000:0d:00.0
f2500000-f25000ff : mmc0
f2500400-f25004ff : 0000:0d:00.1
f2500800-f2500fff : 0000:0d:00.3
f2500800-f2500fff : firewire_ohci
f2600000-f261ffff : 0000:00:19.0
f2600000-f261ffff : e1000e
f2620000-f2623fff : 0000:00:1b.0
f2620000-f2623fff : ICH HD audio
f2624000-f2624fff : 0000:00:16.3
f2625000-f2625fff : 0000:00:19.0
f2625000-f2625fff : e1000e
f2626000-f2626fff : 0000:00:1f.6
f2626000-f2626fff : 0000:00:1f.6
f2827000-f28277ff : 0000:00:1f.2
f2827000-f28277ff : ahci
f2827800-f282780f : 0000:00:16.0
f2827800-f282780f : mei_me
f2828000-f28283ff : 0000:00:1a.0
f2828000-f28283ff : ehci_hcd
f2828400-f28287ff : 0000:00:1d.0
f2828400-f28287ff : ehci_hcd
f2828800-f28288ff : 0000:00:1f.3
f2900000-f29fffff : PCI Bus 0000:05
feaff000-feafffff : Reserved
feaff000-feafffff : pnp 00:01
fec00000-fec0ffff : Reserved
fec00000-fec003ff : IOAPIC 0
fed00000-fed003ff : HPET 0
fed00000-fed003ff : Reserved
fed00000-fed003ff : PNP0103:00
fed10000-fed13fff : pnp 00:01
fed18000-fed18fff : pnp 00:01
fed19000-fed19fff : pnp 00:01
fed1c000-fed8ffff : Reserved
fed1c000-fed1ffff : pnp 00:01
fed1f410-fed1f414 : iTCO_wdt.0.auto
fed1f410-fed1f414 : iTCO_wdt.0.auto
fed40000-fed44fff : TPM
fed45000-fed4bfff : pnp 00:01
fee00000-fee00fff : Local APIC
fee00000-fee00fff : Reserved
ff000000-ffffffff : Reserved
100000000-137ffffff : System RAM
-------------------------------------------------------------------------

Here is all the diagnostic info I could find that I believe to be relevant. Since I'm not a kernel expert, if there are other places for me to look for logs or diagnostics, please tell me, and I will happily fetch them.

Created attachment 276357
Dmesg after boot
https://www.cnet.com/products/lenovo-thinkpad-t410-2522/review/ (with integrated graphics)
https://support.lenovo.com/sg/en/solutions/pd006109 (this one is better)
Just for you to get an idea of the hardware I am using. It is the i5-520M version.

This still occurs on kernel 4.17.2. If the issue is an I/O subsystem overload, how can we fix this? The computer I'm using is relatively new, and it shouldn't require bleeding-edge hardware to run Linux properly without crashes when I have more than 6 Firefox tabs open. I'm willing to run any test you want.

Update: I'd commented in detail about this bug in the other thread (https://bugzilla.kernel.org/show_bug.cgi?id=196729). I run the live versions of Linux on a 4 GB Core i5 laptop (and on another 4 GB Pentium laptop).

Just wanted to add: I've added 4 GB of RAM to the Core i5 laptop for 8 GB total. With Fedora 28, the system will still seize up with maybe 2 dozen Firefox tabs opened and active, or fewer depending on what's happening (video, etc.).

I came back here to note that I'm currently using a live Debian Stretch (9.5). There are obviously significant differences in the way these variants of Linux manage memory. Why? Because under the same system conditions (Gnome, same software installed and/or running), I can open WAY more tabs in Firefox on Debian, and open more simultaneous programs, without fear of a sudden system heart attack. In fact, it is much harder for me to cause the system freeze in Debian, even when approaching 50 tabs opened in Firefox Developer 63.

I understand there are underlying Fedora vs Debian system differences like systemd vs init, Wayland vs Xorg, Gnome versions (3.28.1 vs 3.22.5) and kernel revisions (4.16.3-301.fc28.x86_64 vs 4.9.110-1 (2018-07-05)), but in all, I find Debian WAY more forgiving and more manageable, ESPECIALLY in light of this FATAL flaw AND the known Gnome memory leak bug, which can easily be remedied in Debian by restarting Gnome (via Alt-F2, r) to free that memory back up.
(The only way to accomplish this in Fedora is to actually log out of your session, because of Wayland limitations.) Anyway, I just thought it's another data point to add to the mystery. I still have to keep the resource monitor open even in Stretch, just in case, but I only crashed Stretch once over the past 3 months or so, when I was in the 80s (memory % used) and let a video play for 2 hours without checking up. Normally, that percentage doesn't rise above the 70s in my typical "working" environment.

Finally, I'd like to mention to those asking for logs, etc., for this issue: realize that WHEN this issue occurs, it *is* essentially a heart attack for the system. There is no recourse, and no way to gather logs. EVERYTHING seizes up, usually never to come back. A hard power cycle is the only recourse, and NO logs which would shed light on the issue are written. EVERYTHING stops, including log writing. This is the reality. I *do* have a few logs from an old (non-live) Jessie 8.7.1 install, from the few times when the system did revive after hours, and there's nothing in there that would shed light on the issue. The few entries in the log that I've researched pointed to no other instances or causes of this same issue.

It would be nice, after 11 or 12 years of this issue, if someone higher up and more knowledgeable in the development "food chain" would simply replicate the issue. It's not really that hard to do so at all. It honestly is a show-stopper. Ciao.

Now on Manjaro Linux 18.0.1 with the same laptop and 8 GiB of RAM.
1. Memory gets near full
2. Swap partition starts to fill
3. Swap gets 80% full
4. System freezes.

Shouldn't the OOM killer be killing processes? Why isn't it? My laptop was frozen for 2 hours with constant hard drive writes. This behavior is not seen in Windows on the same machine. I eventually had to hard restart. @Iou is right. It is a show-stopper, and should have been fixed by now after being a bug for 13 or so years.
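The sequence reported above (memory nearly full, swap filling to about 80%, then a freeze) can be tracked from /proc/meminfo. This is a minimal sketch under a few assumptions: the helper names are hypothetical, the `MemAvailable` field is present (kernels 3.14 and later), and the 90% and 80% thresholds are only illustrative values taken from the percentages mentioned in the comments, not limits defined by the kernel.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style lines ('MemTotal:  4000000 kB') into kB ints."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def usage_percent(fields):
    """Return (memory used %, swap used %) from parsed meminfo fields."""
    mem_total = fields["MemTotal"]
    mem_available = fields["MemAvailable"]
    swap_total = fields.get("SwapTotal", 0)
    swap_free = fields.get("SwapFree", 0)
    mem_used = 100.0 * (mem_total - mem_available) / mem_total
    # A system without swap reports SwapTotal as 0; avoid dividing by zero.
    swap_used = 100.0 * (swap_total - swap_free) / swap_total if swap_total else 0.0
    return mem_used, swap_used


def near_reported_freeze(fields, mem_limit=90.0, swap_limit=80.0):
    """True when both usage figures reach the (assumed) warning thresholds."""
    mem_used, swap_used = usage_percent(fields)
    return mem_used >= mem_limit and swap_used >= swap_limit
```

On a live system the input would come from reading /proc/meminfo, for example `parse_meminfo(open("/proc/meminfo").read())`. A check like this only warns that the machine is approaching the reported state; it says nothing about why the OOM killer does not step in.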
When it happens, it is impossible to collect logs, even if I do something like top -b > top.log

How to test: boot up your favorite distro, then:
1. Do something memory intensive, like compiling, something that will use all your RAM.
2. Open some Firefox tabs.
3. Watch disk thrashing occur with no way to get logs. Hold down the power button when you are ready to try again.

If you like, I can open a new report with all the sys info in one place.
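Output redirected to a file generally sits in the page cache until the kernel writes it back, so a hard power-off can lose the most recent lines. A possible workaround, sketched below, is to append one short sample at a time and call `os.fsync` after each write. The function names and the choice of watched fields are assumptions made for illustration, and on a machine that is already thrashing the `fsync` itself may stall, so this is not guaranteed to capture the final moments before a freeze.

```python
import os
import time

# Assumption: these two /proc/meminfo fields are the ones worth tracking.
WATCHED = ("MemAvailable", "SwapFree")


def summarize(meminfo_text, watched=WATCHED):
    """Condense /proc/meminfo text into one 'name=value' line."""
    values = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in watched:
            values[name] = rest.strip()
    return " ".join(f"{name}={values.get(name, '?')}" for name in watched)


def append_durably(path, line):
    """Append one line and ask the kernel to flush it to the disk right away."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line.rstrip("\n") + "\n")
        handle.flush()             # push Python's buffer to the kernel
        os.fsync(handle.fileno())  # ask the kernel to write it to storage


def watch(log_path, interval=1.0, source="/proc/meminfo"):
    """Log a timestamped sample every `interval` seconds until interrupted."""
    while True:
        with open(source, encoding="utf-8") as handle:
            text = handle.read()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        append_durably(log_path, f"{stamp} {summarize(text)}")
        time.sleep(interval)
```

Starting `watch("mem.log")` before running the test steps would leave a record of the last samples that reached the disk. Writing the log to a different device than the one holding swap might improve the odds, though that is a guess rather than something verified in this thread.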