Bug 199763 - System is unresponsive, or completely frozen on high memory usage
Status: REOPENED
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Other
Hardware: x86-64 Linux
Importance: P2 high
Assignee: Andrew Morton
Reported: 2018-05-19 01:40 UTC by SlayerProof32
Modified: 2018-12-25 10:01 UTC
Kernel Version: 4.12.8, 4.13.16, 4.14, 4.16.8, 4.17 rc8 (just the ones I've tested)
Tree: Mainline
Regression: Yes


Attachments
Dmesg after boot (75.85 KB, text/plain)
2018-06-06 22:04 UTC, SlayerProof32

Description SlayerProof32 2018-05-19 01:40:32 UTC

    
Comment 1 SlayerProof32 2018-05-19 01:42:17 UTC
Relates to https://bugzilla.kernel.org/show_bug.cgi?id=196729

Please only mark this as a duplicate if you are a Linux kernel developer and are working on a fix.
Comment 2 SlayerProof32 2018-05-19 01:45:46 UTC
I've tested kernel versions 4.17 rc8, 4.16.8, 4.12.8, and 4.14 across Manjaro Linux, Ubuntu Linux, openSUSE Leap 15 and Fedora 28.

Steps to trigger:
-Open Firefox with many tabs, or any other program with high memory usage
-Wait a second
-System freezes. Sometimes the only fix is a hard reboot

Other findings:
-I notice really high CPU load averages if the system unfreezes
-If the system is not frozen, it is highly unresponsive under high memory usage while swapping, in my experience
-Hard drive indicator light stays solidly on when the system is frozen (excessive hard disk use)
-The system freezes because of high memory usage

Tested on:
Intel i5-520M with 4 GB RAM / 4 GB swap (Lenovo T410)
Intel E6400 with 3 GB RAM / 3 GB swap

This bug is really hard to deal with because it usually requires a hard restart. Please fix ASAP if possible.
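To put a number on the "unresponsive when swapping" finding, here is a minimal sketch that estimates swap traffic from two samples of /proc/vmstat. It assumes the pswpin and pswpout counters (pages swapped in and out since boot) are present; the function names are hypothetical and chosen only for illustration.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rate(before, after, seconds):
    """Return (pages swapped in per second, pages swapped out per second)."""
    rate_in = (after["pswpin"] - before["pswpin"]) / seconds
    rate_out = (after["pswpout"] - before["pswpout"]) / seconds
    return rate_in, rate_out


def read_vmstat(path="/proc/vmstat"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_vmstat(f.read())
```

To use it, call read_vmstat() twice a few seconds apart and pass both results to swap_rate() along with the interval. Sustained high rates in both directions would be consistent with thrashing, but the sketch does not prove the cause of a freeze.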
Comment 3 SlayerProof32 2018-05-19 21:44:32 UTC
Update: Another user reported excessive flash drive activity under high memory usage when running a live session. This means that swap is not the issue. The issue lies entirely with system RAM management.
Comment 4 Andrew Morton 2018-05-21 23:17:32 UTC
Is this a 32-bit kernel or 64?

You selected a hardware type of "IA-64"!
Comment 5 SlayerProof32 2018-05-21 23:27:08 UTC
As far as I know, this bug affects all 64-bit Linux kernels released since 2007
Comment 6 SlayerProof32 2018-05-21 23:28:13 UTC
Is there a more appropriate hardware classification? I only have Intel 64-bit hardware to test with.
Comment 7 SlayerProof32 2018-05-21 23:30:03 UTC
The ones listed are the kernels (as of today) that I've noticed the issue with
Comment 8 SlayerProof32 2018-05-23 00:17:10 UTC
The computer also seems to freeze when the disk is being used a lot.
Comment 9 SlayerProof32 2018-05-30 00:49:16 UTC
Issue still occurs in the newer kernels (4.16.11)
Comment 10 SlayerProof32 2018-05-30 00:50:04 UTC
Also occurs in 4.16.12
Comment 11 SlayerProof32 2018-05-30 00:54:05 UTC
High disk usage or high memory usage seems to cause this issue. Swapping combines both.
Comment 12 SlayerProof32 2018-06-03 22:08:00 UTC
Still occurs in 4.16.13. Please fix.
Comment 13 SlayerProof32 2018-06-05 01:47:17 UTC
This is a critical issue, and the Linux desktop can never be stable on older hardware with an issue this big. Recently, my Linux system crashed with 4 tabs of Google Docs open in Firefox, plus VLC. The same issue does not occur in Windows (ugh, Microsoft), but when this issue occurs and the system freezes completely, I am forced to go back to Windows. Please fix this critical bug so that I never have to touch Windows again. I would greatly appreciate it.
Comment 14 Matthew Wilcox 2018-06-06 18:07:18 UTC
Please provide more detail about your system configuration.  lspci -v, cat /proc/scsi/scsi, etc (https://www.kernel.org/doc/html/latest/admin-guide/reporting-bugs.html#gather-information)

I suspect your I/O subsystem simply can't cope with the load being thrown at it.  It's *probably* seeks, but I don't know whether you're using an SSD or rotating storage.
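Whether the disk is rotating storage can be read from sysfs. A minimal sketch, assuming the usual /sys/block/<dev>/queue/rotational file (which contains 1 for rotating media and 0 otherwise); the device name "sda" and the function names are illustrative assumptions.

```python
from pathlib import Path


def is_rotational(flag_text):
    """Interpret the contents of /sys/block/<dev>/queue/rotational."""
    return flag_text.strip() == "1"


def describe_disk(device="sda", sysfs_root="/sys/block"):
    """Return a short description of the device type (Linux only)."""
    flag = Path(sysfs_root, device, "queue", "rotational").read_text()
    return "rotating" if is_rotational(flag) else "non-rotating (e.g. SSD)"
```

Calling describe_disk("sda") on the affected machine would be one quick way to answer the SSD-versus-rotating question without looking up the drive model.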
Comment 15 SlayerProof32 2018-06-06 21:28:32 UTC
Sorry for not including that information. My hard disk is a Seagate ST9200420ASG running in a Lenovo T410.

https://www.cnet.com/products/seagate-momentus-laptop-st9200420asg-hard-drive-200-gb-sata-3gb-s/specs/
--------------------------------Detailed info----------------------------------
-lspci -v
https://pastebin.com/mnj4Bamu
------------------------------------------------------------------------------
-cat /proc/scsi/scsi

Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: ST9200420ASG     Rev: D   
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: MATSHITA Model: DVD-RAM UJ892    Rev: SB01
  Type:   CD-ROM                           ANSI  SCSI revision: 05

--------------------------------------------------------------------------------
-parted -l

Disk /dev/sda: 200GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: pmbr_boot

Number  Start   End     Size    File system     Name  Flags
 1      17.4kB  21.0GB  21.0GB  btrfs
 2      21.0GB  42.4GB  21.5GB  ext4
 5      42.4GB  164GB   122GB   ext4
 6      164GB   189GB   24.7GB  btrfs
 3      189GB   193GB   4295MB  linux-swap(v1)        swap
 4      193GB   193GB   8389kB                        bios_grub

I am booted off sda6 (btrfs) and using sda5 as a home partition. Swap is on sda3.
----------------------------------------------------------------------------
cat /proc/version

Linux version 4.16.13-300.fc28.x86_64 (mockbuild@bkernel02.phx2.fedoraproject.org) (gcc version 8.1.1 20180502 (Red Hat 8.1.1-1) (GCC)) #1 SMP Wed May 30 14:31:00 UTC 2018
----------------------------------------------------------------------------
cat /proc/ioports
0000-0cf7 : PCI Bus 0000:00
  0000-001f : dma1
  0020-0021 : pic1
  0040-0043 : timer0
  0050-0053 : timer1
  0060-0060 : keyboard
  0061-0061 : PNP0800:00
  0062-0062 : PNP0C09:00
    0062-0062 : EC data
  0064-0064 : keyboard
  0066-0066 : PNP0C09:00
    0066-0066 : EC cmd
  0070-0071 : rtc0
  0080-008f : dma page reg
  00a0-00a1 : pic2
  00c0-00df : dma2
  00f0-00ff : fpu
    00f0-00f0 : PNP0C04:00
  03c0-03df : vga+
  0800-080f : pnp 00:01
0cf8-0cff : PCI conf1
0d00-ffff : PCI Bus 0000:00
  1000-107f : pnp 00:01
    1000-1003 : ACPI PM1a_EVT_BLK
    1004-1005 : ACPI PM1a_CNT_BLK
    1008-100b : ACPI PM_TMR
    1020-102f : ACPI GPE0_BLK
    1030-1033 : iTCO_wdt.0.auto
      1030-1033 : iTCO_wdt
    1050-1050 : ACPI PM2_CNT_BLK
    1060-107f : iTCO_wdt.0.auto
      1060-107f : iTCO_wdt
  1180-11ff : pnp 00:01
  15e0-15ef : pnp 00:01
  1600-1641 : pnp 00:01
  164e-164f : pnp 00:01
  1800-1807 : 0000:00:02.0
  1808-180f : 0000:00:16.3
    1808-180f : serial
  1810-1813 : 0000:00:1f.2
    1810-1813 : ahci
  1814-1817 : 0000:00:1f.2
    1814-1817 : ahci
  1818-181f : 0000:00:1f.2
    1818-181f : ahci
  1820-183f : 0000:00:19.0
  1840-185f : 0000:00:1f.2
    1840-185f : ahci
  1860-1867 : 0000:00:1f.2
    1860-1867 : ahci
  1880-189f : 0000:00:1f.3
    1880-189f : i801_smbus
  2000-2fff : PCI Bus 0000:05
-----------------------------------------------------------------------------
cat /proc/iomem
00000000-00000fff : Reserved
00001000-0009e7ff : System RAM
0009e800-0009ffff : Reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000c7fff : Video ROM
000c8000-000cbfff : pnp 00:00
000cc000-000cffff : pnp 00:00
000d0000-000d0fff : Adapter ROM
000d1000-000d1fff : Adapter ROM
000d2000-000d3fff : Reserved
000d4000-000d7fff : PCI Bus 0000:00
000d8000-000dbfff : PCI Bus 0000:00
000dc000-000fffff : Reserved
  000e0000-000effff : Extension ROM
  000f0000-000fffff : System ROM
00100000-bb27bfff : System RAM
  03000000-03c031d0 : Kernel code
  03c031d1-04387f7f : Kernel data
  04931000-04a86fff : Kernel bss
bb27c000-bb281fff : Reserved
bb282000-bb35dfff : System RAM
bb35e000-bb370fff : Reserved
bb371000-bb3f1fff : ACPI Non-volatile Storage
bb3f2000-bb40efff : Reserved
bb40f000-bb46efff : System RAM
bb46f000-bb667fff : Reserved
bb668000-bb6e7fff : ACPI Non-volatile Storage
bb6e8000-bb70efff : Reserved
bb70f000-bb716fff : System RAM
bb717000-bb71efff : Reserved
bb71f000-bb76afff : System RAM
bb76b000-bb776fff : ACPI Non-volatile Storage
bb777000-bb779fff : ACPI Tables
bb77a000-bb780fff : ACPI Non-volatile Storage
bb781000-bb781fff : ACPI Tables
bb782000-bb78afff : ACPI Non-volatile Storage
bb78b000-bb78bfff : ACPI Tables
bb78c000-bb79efff : ACPI Non-volatile Storage
bb79f000-bb7fefff : ACPI Tables
bb7ff000-bb7fffff : System RAM
bb800000-bfffffff : Reserved
  be000000-bfffffff : Graphics Stolen Memory
c0000000-febfffff : PCI Bus 0000:00
  c0000000-c0000fff : Intel Flush Page
  d0000000-dfffffff : 0000:00:02.0
  e0000000-efffffff : PCI MMCONFIG 0000 [bus 00-ff]
    e0000000-efffffff : Reserved
      e0000000-efffffff : pnp 00:01
  f0000000-f1ffffff : PCI Bus 0000:05
  f2000000-f23fffff : 0000:00:02.0
  f2400000-f24fffff : PCI Bus 0000:03
    f2400000-f2401fff : 0000:03:00.0
      f2400000-f2401fff : iwlwifi
  f2500000-f25fffff : PCI Bus 0000:0d
    f2500000-f25000ff : 0000:0d:00.0
      f2500000-f25000ff : mmc0
    f2500400-f25004ff : 0000:0d:00.1
    f2500800-f2500fff : 0000:0d:00.3
      f2500800-f2500fff : firewire_ohci
  f2600000-f261ffff : 0000:00:19.0
    f2600000-f261ffff : e1000e
  f2620000-f2623fff : 0000:00:1b.0
    f2620000-f2623fff : ICH HD audio
  f2624000-f2624fff : 0000:00:16.3
  f2625000-f2625fff : 0000:00:19.0
    f2625000-f2625fff : e1000e
  f2626000-f2626fff : 0000:00:1f.6
    f2626000-f2626fff : 0000:00:1f.6
  f2827000-f28277ff : 0000:00:1f.2
    f2827000-f28277ff : ahci
  f2827800-f282780f : 0000:00:16.0
    f2827800-f282780f : mei_me
  f2828000-f28283ff : 0000:00:1a.0
    f2828000-f28283ff : ehci_hcd
  f2828400-f28287ff : 0000:00:1d.0
    f2828400-f28287ff : ehci_hcd
  f2828800-f28288ff : 0000:00:1f.3
  f2900000-f29fffff : PCI Bus 0000:05
  feaff000-feafffff : Reserved
    feaff000-feafffff : pnp 00:01
fec00000-fec0ffff : Reserved
  fec00000-fec003ff : IOAPIC 0
fed00000-fed003ff : HPET 0
  fed00000-fed003ff : Reserved
    fed00000-fed003ff : PNP0103:00
fed10000-fed13fff : pnp 00:01
fed18000-fed18fff : pnp 00:01
fed19000-fed19fff : pnp 00:01
fed1c000-fed8ffff : Reserved
  fed1c000-fed1ffff : pnp 00:01
    fed1f410-fed1f414 : iTCO_wdt.0.auto
      fed1f410-fed1f414 : iTCO_wdt.0.auto
  fed40000-fed44fff : TPM
  fed45000-fed4bfff : pnp 00:01
fee00000-fee00fff : Local APIC
  fee00000-fee00fff : Reserved
ff000000-ffffffff : Reserved
100000000-137ffffff : System RAM
-------------------------------------------------------------------------
Here is all the diagnostic info I could find that I believe to be relevant. Since I'm not a kernel expert, if there are other places for me to look for logs/diagnostics, please tell me and I will happily fetch them.
Comment 16 SlayerProof32 2018-06-06 22:04:45 UTC
Created attachment 276357 [details]
Dmesg after boot
Comment 17 SlayerProof32 2018-06-08 00:40:31 UTC
https://www.cnet.com/products/lenovo-thinkpad-t410-2522/review/
(with integrated graphics)
Comment 18 SlayerProof32 2018-06-08 00:43:51 UTC
https://support.lenovo.com/sg/en/solutions/pd006109 This one is better. Just for you to get an idea of the hardware I am using.
Comment 19 SlayerProof32 2018-06-08 00:45:11 UTC
It is the i5-520M version
Comment 20 SlayerProof32 2018-06-16 16:09:38 UTC
This still occurs on kernel 4.17.2. If the issue is an I/O subsystem overload, how can we fix this? The computer I'm using is relatively new, and it shouldn't require bleeding-edge hardware to run Linux properly without crashes when I have more than 6 Firefox tabs open.

I'm willing to run any test you want.
Comment 21 lou 2018-09-23 08:16:17 UTC
Update:

I'd commented in detail about this bug in the other thread (https://bugzilla.kernel.org/show_bug.cgi?id=196729). I run the live versions of Linux on a 4 GB Core i5 laptop (and also on another 4 GB Pentium laptop).

Just wanted to add:

I've added 4 GB of RAM to the Core i5 laptop for 8 GB total.

With Fedora 28, the system will still seize up with maybe two dozen (or fewer, depending on what's happening, e.g. video) FF tabs opened/active.

I came back here to note that, I'm currently using a Live Debian Stretch (9.5).

There are obviously significant differences in the way these variants of Linux manage memory.

Why?

Because under the same system conditions (Gnome, same software installed and/or running), I can open WAY more tabs in FF on Debian, and open more simultaneous programs, without fear of a sudden system heart attack.

In fact, it is much harder for me to cause the system freeze in Debian, even with approaching 50 tabs opened in FF developer 63...

I understand there are underlying Fedora vs Debian system differences like systemd vs init, Wayland vs Xorg, Gnome versions (3.28.1 vs 3.22.5) and kernel revisions (4.16.3-301.fc28.x86_64 vs 4.9.110-1 (2018-07-05)), but all in all I find Debian WAYYYY more forgiving and more manageable, ESPECIALLY in light of this FATAL flaw AND the known Gnome memory leak bug, which can easily be worked around in Debian by restarting Gnome (via Alt-F2, r) to free that memory back up. (The only way to accomplish this in Fedora is to actually log out of your session because of Wayland limitations.)

Anyway, I just thought it's another data point to add to the mystery.

I still have to keep resource monitor opened even in Stretch, just in case, but I only crashed Stretch once over the past 3 months or so when I was in the 80's (mem % used) and let a video play for 2 hrs without checking up.

Normally anyway, that percentage isn't rising above the 70's in my typical "working" environment.

Finally, I'd like to mention to those asking for logs, etc., for this issue: realize that WHEN this issue occurs, it *is* essentially a heart attack for the system. There is no recourse, and no way to gather logs. EVERYTHING seizes up, usually never to come back. A hard power-cycle is the only recourse, and NO logs which would shed light on the issue are written. EVERYTHING stops, including log writing.

This is the reality.

I *do* have a few logs from an old (non-live) Jessie 8.7.1 install, from a few times when the system did revive after hours, and there's nothing in there that would shed light on the issue. The few entries in the log that I've researched pointed to no other instances/causes of this same issue.

It would be nice, after 11 or 12 years of this issue, if someone higher up and more knowledgeable in the development "food chain" would simply replicate the issue. It's not really that hard to do at all.

It honestly is a show-stopper.

Ciao.
Comment 22 SlayerProof32 2018-12-21 04:44:13 UTC
Now on Manjaro Linux 18.0.1 with the same laptop and 8 GiB of RAM.
1. Memory gets near full
2. Swap partition starts to fill
3. Swap gets 80% full
4. System freezes.

Shouldn’t the OOM killer be killing processes? Why isn’t it? My laptop was frozen for 2 hours with constant hard drive writes. This behavior is not seen in Windows on the same machine. I eventually had to hard restart.
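For anyone who wants to watch the "swap gets 80% full" point approach, the percentage can be computed from /proc/meminfo. This is a minimal sketch under the assumption that the SwapTotal and SwapFree fields are reported in kB, as they normally are; the function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style lines ('Name:   value kB') into a dict of kB."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def swap_used_percent(info):
    """Percentage of swap in use; 0.0 if no swap is configured."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return 100.0 * (total - info.get("SwapFree", 0)) / total
```

Feeding it the contents of /proc/meminfo every few seconds gives a number to compare against the 80% mark observed above. That threshold is only what was seen on this one laptop, not a known kernel limit.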

@lou is right. It is a showstopper and should have been fixed by now, after being a bug for 13 or so years. When it happens, it is impossible to collect logs, even if I do something like top -b > top.log
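One possible reason a plain redirect like the top example leaves nothing behind is that buffered output is lost on a hard power-off. A minimal sketch of a logger that flushes and fsyncs every sample, so that whatever was recorded before the freeze has a better chance of surviving; the file path and function names are assumptions, and under heavy thrashing the fsync call itself may stall.

```python
import os
import time


def append_durably(path, line):
    """Append one line and ask the kernel to write it to the storage device."""
    with open(path, "a") as f:
        f.write(line.rstrip("\n") + "\n")
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to write the data to the device


def snapshot_line(meminfo_path="/proc/meminfo"):
    """Build a timestamped one-line summary of MemAvailable and SwapFree."""
    wanted = ("MemAvailable", "SwapFree")
    values = {}
    with open(meminfo_path) as f:
        for raw in f:
            name, _, rest = raw.partition(":")
            if name in wanted:
                values[name] = rest.strip()
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return stamp + " " + " ".join(f"{k}={values.get(k, '?')}" for k in wanted)
```

Calling append_durably("mem.log", snapshot_line()) in a loop with a short sleep would at least leave a trail of memory readings leading up to the freeze, even if nothing can be written once the system has locked up.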

How to test: Boot up your favorite distro
1. Do something memory-intensive, like compiling, something that will use all your RAM.
2. Open some Firefox tabs.
3. Watch disk thrashing occur with no way to get logs. Hold down the power button when you are ready to try again.
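Step 1 can be made more repeatable with a small memory hog in place of a compile job. This is a minimal sketch, not a calibrated reproducer: the function name, chunk size and the 4096-byte page size are assumptions, and running it with a total near or above installed RAM may well freeze the machine as described above, so save your work first.

```python
PAGE_SIZE = 4096  # bytes; assumed typical page size on x86-64


def allocate_and_touch(total_mb, chunk_mb=64):
    """Allocate about total_mb MiB in chunks, writing to every page.

    Returns the list of buffers so the memory stays allocated for as long
    as the caller holds a reference to the result.
    """
    chunks = []
    remaining = total_mb
    while remaining > 0:
        size_mb = min(chunk_mb, remaining)
        buf = bytearray(size_mb * 1024 * 1024)
        for offset in range(0, len(buf), PAGE_SIZE):
            buf[offset] = 1  # touch the page so it is really backed by memory
        chunks.append(buf)
        remaining -= size_mb
    return chunks
```

Starting with a value well below installed RAM and raising it between runs makes it easier to report how much memory pressure was needed before the system became unresponsive.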
Comment 23 SlayerProof32 2018-12-21 04:45:14 UTC
If you like, I can open a new report with all the sys info in one place.
