Bug 40262

Summary: PROBLEM: I/O storm from hell on kernel 3.0.0 when touch swap (swapfile or partition)
Product: Memory Management Reporter: g0re (g0re)
Component: Other Assignee: Andrew Morton (akpm)
Status: RESOLVED CODE_FIX    
Severity: normal CC: florian, maciej.rutecki, rjw, StMichalke, stuffcorpse
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.0.0 Subsystem:
Regression: No Bisected commit-id:
Bug Depends on:    
Bug Blocks: 36912    
Attachments: kernel config based on ARCH linux distro

Description g0re 2011-07-28 12:40:54 UTC
Created attachment 66982
kernel config based on ARCH linux distro

issue occurs in new kernel 3.0.
does not occur in 2.6.39.3/2.6.38.8

copy a file bigger than RAM size (tar/cp/scp/dd/smb, local and remote)
load some .torrent with rtorrent (debian dvd iso files)

observed in 3.0: the cache does not shrink when another app requests RAM, and swapping occurs (mouse lag + keyboard lag + window redraw lag + ssh lag...)
observed in 2.6.3x.y: the cache shrinks when another app requests RAM (opera/thunderbird/seamonkey/rdesktop), and swapping occurs only when the cache is very low (and smoothly)

tested I/O schedulers: noop deadline cfq (no difference)
tested FS: XFS, JFS, EXT4, reiser3 (no difference)
tested HDDs: WDC WD400BB-60JKA0 (40GB - PATA) (to test pata_via)
             SAMSUNG SP0802N (80GB - PATA) (to test pata_via)
             SAMSUNG HD103SI (1TB - SATA) (to test sata_via)
             SAMSUNG HM100UX (1TB - SATA) (to test sata_via)

obs:
the nice -n 20 ionice -c3 trick does not work
tuning vfs_cache_pressure/swappiness/dirty_ratio/dirty_background_ratio does not help either

some info from when the storm begins
/proc/meminfo
MemTotal:         446532 kB
MemFree:            5392 kB
Buffers:            3664 kB
Cached:           368872 kB
SwapCached:        20412 kB
Active:            89000 kB
Inactive:         331392 kB
Active(anon):      23048 kB
Inactive(anon):    24840 kB
Active(file):      65952 kB
Inactive(file):   306552 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:             0 kB
HighFree:              0 kB
LowTotal:         446532 kB
LowFree:            5392 kB
SwapTotal:        681980 kB
SwapFree:         600028 kB
Dirty:                20 kB
Writeback:             0 kB
AnonPages:         37972 kB
Mapped:            31592 kB
Shmem:                16 kB
Slab:              12304 kB
SReclaimable:       7596 kB
SUnreclaim:         4708 kB
KernelStack:         696 kB
PageTables:          960 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      905244 kB
Committed_AS:     187440 kB
VmallocTotal:     573496 kB
VmallocUsed:        5032 kB
VmallocChunk:     565776 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       4096 kB
DirectMap4k:       24512 kB
DirectMap4M:      434176 kB
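
One way to read the dump above is to reduce it to two numbers: the share of RAM held by the page cache and the amount of swap already in use. The following is a minimal sketch, not part of the original report; `meminfo_summary` is a hypothetical helper name, and it assumes the usual /proc/meminfo layout ("Name: value kB").

```shell
#!/bin/sh
# meminfo_summary [FILE]: hypothetical helper, reads a /proc/meminfo-style
# file (default: /proc/meminfo) and prints the page-cache share of RAM
# and the swap currently in use.
meminfo_summary() {
    awk '
        $1 == "MemTotal:"  { total = $2 }
        $1 == "Cached:"    { cached = $2 }      # exact match, so SwapCached is skipped
        $1 == "SwapTotal:" { swap_total = $2 }
        $1 == "SwapFree:"  { swap_free = $2 }
        END {
            if (total > 0)
                printf "cache_pct=%d swap_used_kb=%d\n",
                       cached * 100 / total, swap_total - swap_free
        }
    ' "${1:-/proc/meminfo}"
}
```

Called without arguments it reads the live /proc/meminfo; with a file name it reads a saved dump. For the values posted above it reports about 82% of RAM in page cache and 81952 kB of swap in use, which matches the description: the cache stays large while anonymous memory is pushed to swap.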

extra: /proc/cmdline == printk.time=1 noisapnp libata.force=noncq logo.nologo maxcpus=4 nohz=off pci=nomsi pcie_pme=nomsi vga=normal video=vesafb:ywrap elevator=cfq libata.force=1.5Gbps loglevel=7

ver_linux 
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.
 
Linux g0re 3.0.0-0x29A-20110722 #666 SMP PREEMPT Mon Jul 25 14:02:55 BRT 2011 i686 Intel(R) Pentium(R) 4 CPU 2.80GHz GenuineIntel GNU/Linux
 
Gnu C                  4.5.3
Gnu make               3.82
binutils               2.21.51.0.6.20110118
util-linux             2.18
mount                  support
module-init-tools      3.12
e2fsprogs              1.41.14
jfsutils               1.1.15
reiserfsprogs          3.6.21
xfsprogs               3.1.4
PPP                    2.4.5
Linux C Library        2.13
Dynamic linker (ldd)   2.13
Linux C++ Library      6.0.14
Procps                 3.2.8
Net-tools              1.60
Kbd                    1.15.2
Sh-utils               8.12
wireless-tools         29
Modules Loaded         fuse tun xt_length sch_htb xt_DSCP ipt_REDIRECT xt_tcpudp ipt_MASQUERADE iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables snd_pcm_oss snd_mixer_oss jfs lp parport_pc parport snd_via82xx gameport snd_ac97_codec ac97_bus snd_pcm snd_timer snd_page_alloc shpchp snd_mpu401_uart snd_rawmidi snd_seq_device snd pci_hotplug via_rhine i2c_viapro psmouse button via_agp agpgart sg evdev i2c_core soundcore mii serio_raw ext4 mbcache jbd2 crc16 sd_mod ata_generic usb_storage pata_via uhci_hcd sata_via pata_acpi libata ehci_hcd scsi_mod usbcore freq_table processor mperf


/proc/version
Linux version 3.0.0-0x29A-20110722 (root@g0re) (gcc version 4.5.3 (GCC) ) #666 SMP PREEMPT Mon Jul 25 14:02:55 BRT 2011


/proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 15
model		: 3
model name	: Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping	: 3
cpu MHz		: 2800.037
cache size	: 1024 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc pebs bts pni dtes64 monitor ds_cpl cid
bogomips	: 5602.41
clflush size	: 64
cache_alignment	: 128
address sizes	: 36 bits physical, 32 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 15
model		: 3
model name	: Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping	: 3
cpu MHz		: 2800.037
cache size	: 1024 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
apicid		: 1
initial apicid	: 1
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc pebs bts pni dtes64 monitor ds_cpl cid
bogomips	: 5602.19
clflush size	: 64
cache_alignment	: 128
address sizes	: 36 bits physical, 32 bits virtual
power management:

/proc/modules
tun 12822 2 - Live 0xdc819000
xt_length 864 2 - Live 0xdcd29000
sch_htb 12514 1 - Live 0xdcd3e000
xt_DSCP 1607 8 - Live 0xdccf9000
ipt_REDIRECT 907 1 - Live 0xdcc99000
xt_tcpudp 1875 9 - Live 0xdcc93000
ipt_MASQUERADE 1294 2 - Live 0xdcc2d000
iptable_mangle 1220 1 - Live 0xdcc07000
iptable_nat 3420 1 - Live 0xdcbe8000
nf_nat 12433 3 ipt_REDIRECT,ipt_MASQUERADE,iptable_nat, Live 0xdcbd0000
nf_conntrack_ipv4 9757 3 iptable_nat,nf_nat, Live 0xdcbcc000
nf_conntrack 50214 4 ipt_MASQUERADE,iptable_nat,nf_nat,nf_conntrack_ipv4, Live 0xdcbd7000
nf_defrag_ipv4 1015 1 nf_conntrack_ipv4, Live 0xdcb85000
iptable_filter 1092 0 - Live 0xdcb76000
ip_tables 9107 3 iptable_mangle,iptable_nat,iptable_filter, Live 0xdcb44000
x_tables 11698 9 xt_length,xt_DSCP,ipt_REDIRECT,xt_tcpudp,ipt_MASQUERADE,iptable_mangle,iptable_nat,iptable_filter,ip_tables, Live 0xdca90000
snd_pcm_oss 33598 0 - Live 0xdcb2e000
snd_mixer_oss 12909 1 snd_pcm_oss, Live 0xdca7a000
jfs 162565 1 - Live 0xdcada000
lp 6652 0 - Live 0xdca69000
parport_pc 27896 0 - Live 0xdca58000
parport 24947 2 lp,parport_pc, Live 0xdca3f000
snd_via82xx 17602 0 - Live 0xdc9b1000
gameport 6648 1 snd_via82xx, Live 0xdc9a2000
snd_ac97_codec 90538 1 snd_via82xx, Live 0xdc97e000
ac97_bus 810 1 snd_ac97_codec, Live 0xdc958000
snd_pcm 59717 3 snd_pcm_oss,snd_via82xx,snd_ac97_codec, Live 0xdc940000
snd_timer 15279 1 snd_pcm, Live 0xdc921000
snd_page_alloc 5773 2 snd_via82xx,snd_pcm, Live 0xdc8f6000
shpchp 22541 0 - Live 0xdc8e8000
snd_mpu401_uart 5007 1 snd_via82xx, Live 0xdc8db000
snd_rawmidi 15095 1 snd_mpu401_uart, Live 0xdc8cf000
snd_seq_device 4321 1 snd_rawmidi, Live 0xdc8c4000
snd 43259 9 snd_pcm_oss,snd_mixer_oss,snd_via82xx,snd_ac97_codec,snd_pcm,snd_timer,snd_mpu401_uart,snd_rawmidi,snd_seq_device, Live 0xdc8ae000
pci_hotplug 22115 1 shpchp, Live 0xdc892000
via_rhine 19419 0 - Live 0xdc87e000
i2c_viapro 4819 0 - Live 0xdc84d000
psmouse 56709 0 - Live 0xdc833000
button 3607 0 - Live 0xdc805000
via_agp 4975 1 - Live 0xdcdbf000
agpgart 21810 1 via_agp, Live 0xdcd83000
sg 20628 0 - Live 0xdcd5b000
evdev 7244 4 - Live 0xdcd3b000
i2c_core 16622 1 i2c_viapro, Live 0xdcd2c000
soundcore 4993 1 snd, Live 0xdcd1e000
mii 3343 1 via_rhine, Live 0xdcd16000
serio_raw 3422 0 - Live 0xdcd0f000
ext4 339308 1 - Live 0xdcca5000
mbcache 4250 1 ext4, Live 0xdcc2f000
jbd2 59479 1 ext4, Live 0xdcc14000
crc16 1069 1 ext4, Live 0xdcbf8000
sd_mod 26702 3 - Live 0xdcbeb000
ata_generic 2519 0 - Live 0xdcbd5000
usb_storage 35068 0 - Live 0xdcbc2000
pata_via 6739 2 - Live 0xdcba8000
uhci_hcd 19736 0 - Live 0xdcb9a000
sata_via 6168 0 - Live 0xdcb8c000
pata_acpi 2420 0 - Live 0xdcb83000
libata 154596 4 ata_generic,pata_via,sata_via,pata_acpi, Live 0xdcb4c000
ehci_hcd 34227 0 - Live 0xdca96000
scsi_mod 112053 4 sg,sd_mod,usb_storage,libata, Live 0xdc904000
usbcore 119283 4 usb_storage,uhci_hcd,ehci_hcd, Live 0xdc855000
freq_table 2047 0 - Live 0xdc817000
processor 21901 0 - Live 0xdc809000
mperf 963 0 - Live 0xdc7fa000

/proc/ioports 
0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-0060 : keyboard
0064-0064 : keyboard
0070-0073 : rtc0
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : 0000:00:0f.1
  0170-0177 : pata_via
01f0-01f7 : 0000:00:0f.1
  01f0-01f7 : pata_via
0290-029f : pnp 00:03
02f8-02ff : serial
0376-0376 : 0000:00:0f.1
  0376-0376 : pata_via
03c0-03df : vga+
03f6-03f6 : 0000:00:0f.1
  03f6-03f6 : pata_via
03f8-03ff : serial
04d0-04d1 : pnp 00:03
0500-050f : pnp 00:02
  0500-0507 : vt596_smbus
0800-0805 : pnp 00:03
0cf8-0cff : PCI conf1
4000-407f : pnp 00:02
  4000-4003 : ACPI PM1a_EVT_BLK
  4004-4005 : ACPI PM1a_CNT_BLK
  4008-400b : ACPI PM_TMR
  4020-4023 : ACPI GPE0_BLK
9000-9007 : 0000:00:0f.0
  9000-9007 : sata_via
9400-9403 : 0000:00:0f.0
  9400-9403 : sata_via
9800-9807 : 0000:00:0f.0
  9800-9807 : sata_via
9c00-9c03 : 0000:00:0f.0
  9c00-9c03 : sata_via
a000-a00f : 0000:00:0f.0
  a000-a00f : sata_via
a400-a4ff : 0000:00:0f.0
  a400-a4ff : sata_via
a800-a80f : 0000:00:0f.1
  a800-a80f : pata_via
ac00-ac1f : 0000:00:10.0
  ac00-ac1f : uhci_hcd
b000-b01f : 0000:00:10.1
  b000-b01f : uhci_hcd
b400-b41f : 0000:00:10.2
  b400-b41f : uhci_hcd
b800-b81f : 0000:00:10.3
  b800-b81f : uhci_hcd
bc00-bcff : 0000:00:11.5
  bc00-bcff : VIA8237
c000-c0ff : 0000:00:12.0
  c000-c0ff : via-rhine

/proc/iomem
00000000-0000ffff : reserved
00010000-0009f7ff : System RAM
0009f800-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c83ff : Video ROM
000f0000-000fffff : reserved
  000f0000-000fffff : System ROM
00100000-1bfeffff : System RAM
  01000000-0135a44d : Kernel code
  0135a44e-014b9a3f : Kernel data
  0153e000-01622fff : Kernel bss
1bff0000-1bff2fff : ACPI Non-volatile Storage
1bff3000-1bffffff : ACPI Tables
d0000000-d7ffffff : 0000:00:00.0
d8000000-dbffffff : PCI Bus 0000:01
  d8000000-dbffffff : 0000:01:00.0
dc000000-ddffffff : PCI Bus 0000:01
  dc000000-dcffffff : 0000:01:00.0
  dd000000-dd00ffff : 0000:01:00.0
de000000-de0000ff : 0000:00:10.4
  de000000-de0000ff : ehci_hcd
de001000-de0010ff : 0000:00:12.0
  de001000-de0010ff : via-rhine
fec00000-ffffffff : reserved
  fec00000-fec003ff : IOAPIC 0
  fee00000-fee00fff : Local APIC
    fee00000-fee00fff : pnp 00:00
  fff80000-fffeffff : pnp 00:00
  ffff0000-ffffffff : pnp 00:00

lspci -vvv
00:00.0 Host bridge: VIA Technologies, Inc. CN700/VN800/P4M800CE/Pro Host Bridge
	Subsystem: Giga-byte Technology Device 5000
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
	Latency: 8
	Region 0: Memory at d0000000 (32-bit, prefetchable) [size=128M]
	Capabilities: [80] AGP version 3.5
		Status: RQ=8 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3+ Rate=x4,x8
		Command: RQ=1 ArqSz=0 Cal=0 SBA- AGP- GART64- 64bit- FW- Rate=<none>
	Capabilities: [50] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: agpgart-via
	Kernel modules: via-agp

00:00.1 Host bridge: VIA Technologies, Inc. CN700/VN800/P4M800CE/Pro Host Bridge
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0

00:00.2 Host bridge: VIA Technologies, Inc. CN700/VN800/P4M800CE/Pro Host Bridge
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0

00:00.3 Host bridge: VIA Technologies, Inc. PT890 Host Bridge
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Kernel modules: via-agp

00:00.4 Host bridge: VIA Technologies, Inc. CN700/VN800/P4M800CE/Pro Host Bridge
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0

00:00.7 Host bridge: VIA Technologies, Inc. CN700/VN800/P4M800CE/Pro Host Bridge
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0

00:01.0 PCI bridge: VIA Technologies, Inc. VT8237/VX700 PCI Bridge (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 0000f000-00000fff
	Memory behind bridge: dc000000-ddffffff
	Prefetchable memory behind bridge: d8000000-dbffffff
	Secondary status: 66MHz+ FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR+ <PERR+
	BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [70] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Kernel modules: shpchp

00:0f.0 IDE interface: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80) (prog-if 8f [Master SecP SecO PriP PriO])
	Subsystem: Giga-byte Technology GA-7VM400AM(F) Motherboard
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 32
	Interrupt: pin B routed to IRQ 20
	Region 0: I/O ports at 9000 [size=8]
	Region 1: I/O ports at 9400 [size=4]
	Region 2: I/O ports at 9800 [size=8]
	Region 3: I/O ports at 9c00 [size=4]
	Region 4: I/O ports at a000 [size=16]
	Region 5: I/O ports at a400 [size=256]
	Capabilities: [c0] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: sata_via
	Kernel modules: sata_via

00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP])
	Subsystem: Giga-byte Technology GA-7VAX Mainboard
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 32
	Interrupt: pin A routed to IRQ 20
	Region 0: [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8]
	Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable) [size=1]
	Region 2: [virtual] Memory at 00000170 (32-bit, non-prefetchable) [size=8]
	Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable) [size=1]
	Region 4: I/O ports at a800 [size=16]
	Capabilities: [c0] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: pata_via
	Kernel modules: pata_via, via82cxxx

00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81) (prog-if 00 [UHCI])
	Subsystem: Giga-byte Technology GA-7VAX Mainboard
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 32, Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 21
	Region 4: I/O ports at ac00 [size=32]
	Capabilities: [80] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: uhci_hcd
	Kernel modules: uhci-hcd

00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81) (prog-if 00 [UHCI])
	Subsystem: Giga-byte Technology GA-7VAX Mainboard
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 32, Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 21
	Region 4: I/O ports at b000 [size=32]
	Capabilities: [80] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: uhci_hcd
	Kernel modules: uhci-hcd

00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81) (prog-if 00 [UHCI])
	Subsystem: Giga-byte Technology GA-7VAX Mainboard
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 32, Cache Line Size: 32 bytes
	Interrupt: pin B routed to IRQ 21
	Region 4: I/O ports at b400 [size=32]
	Capabilities: [80] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: uhci_hcd
	Kernel modules: uhci-hcd

00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81) (prog-if 00 [UHCI])
	Subsystem: Giga-byte Technology GA-7VAX Mainboard
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 32, Cache Line Size: 32 bytes
	Interrupt: pin B routed to IRQ 21
	Region 4: I/O ports at b800 [size=32]
	Capabilities: [80] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: uhci_hcd
	Kernel modules: uhci-hcd

00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86) (prog-if 20 [EHCI])
	Subsystem: Giga-byte Technology GA-7VAX Mainboard
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 32, Cache Line Size: 32 bytes
	Interrupt: pin C routed to IRQ 21
	Region 0: Memory at de000000 (32-bit, non-prefetchable) [size=256]
	Capabilities: [80] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: ehci_hcd
	Kernel modules: ehci-hcd

00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
	Subsystem: Giga-byte Technology GA-7VT600 Motherboard
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Capabilities: [c0] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Kernel modules: i2c-viapro

00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
	Subsystem: Giga-byte Technology GA-7VAX Onboard Audio (Realtek ALC650)
	Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin C routed to IRQ 22
	Region 0: I/O ports at bc00 [size=256]
	Capabilities: [c0] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: VIA 82xx Audio
	Kernel modules: snd-via82xx

00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 78)
	Subsystem: Giga-byte Technology Device e000
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64 (750ns min, 2000ns max), Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 23
	Region 0: I/O ports at c000 [size=256]
	Region 1: Memory at de001000 (32-bit, non-prefetchable) [size=256]
	Capabilities: [40] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: via-rhine
	Kernel modules: via-rhine

01:00.0 VGA compatible controller: VIA Technologies, Inc. CN700/P4M800 Pro/P4M800 CE/VN800 [S3 UniChrome Pro] (rev 01) (prog-if 00 [VGA controller])
	Subsystem: Giga-byte Technology Device d000
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 32 (500ns min)
	Interrupt: pin A routed to IRQ 10
	Region 0: Memory at d8000000 (32-bit, prefetchable) [size=64M]
	Region 1: Memory at dc000000 (32-bit, non-prefetchable) [size=16M]
	[virtual] Expansion ROM at dd000000 [disabled] [size=64K]
	Capabilities: [60] Power Management version 2
		Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [70] AGP version 3.0
		Status: RQ=256 Iso- ArqSz=0 Cal=7 SBA+ ITACoh- GART64- HTrans- 64bit- FW- AGP3+ Rate=x4,x8
		Command: RQ=1 ArqSz=0 Cal=0 SBA+ AGP- GART64- 64bit- FW- Rate=<none>
	Kernel modules: viafb
Comment 1 Steffen Michalke 2011-08-02 13:37:42 UTC
I am experiencing the same here. The problem occurs while reading (mapping) a file, e.g. md5sum -b largefile. It looks like caching has absolute priority now ;-)

Since 3.0 the limit for the page cache seems to be the physical RAM, so the system aggressively swaps out running processes while reading large files. That "freezes" the whole system within a few seconds, until the file operation ends and the running processes have regained their RAM from swap.

A dirty workaround is to run a loop like
while sleep <smallint>; do sync; sysctl -w vm.drop_caches=1; done
during the file operation in order to drop the page cache right before the kernel has to swap active processes in favour of caching.
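
The loop above can also be tied to the lifetime of the file operation, so that it stops by itself. A minimal sketch, assuming a POSIX shell and root privileges for the real run; `run_with_cache_drops`, `drop_once` and the `DRY_RUN` switch are hypothetical names introduced here, not something provided by the kernel or procps:

```shell
#!/bin/sh
# drop_once: flush dirty pages, then ask the kernel to drop the clean page cache.
# With DRY_RUN=1 it only prints what it would do (no root needed).
drop_once() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "sync; sysctl -w vm.drop_caches=1"
    else
        sync
        sysctl -w vm.drop_caches=1 >/dev/null
    fi
}

# run_with_cache_drops INTERVAL CMD [ARGS...]
# Runs CMD while a background loop calls drop_once every INTERVAL seconds,
# then stops the loop and returns CMD's exit status.
run_with_cache_drops() {
    interval=$1
    shift
    ( while sleep "$interval"; do drop_once; done ) &
    loop_pid=$!
    "$@"
    rc=$?
    kill "$loop_pid" 2>/dev/null
    wait "$loop_pid" 2>/dev/null
    return "$rc"
}
```

Usage would be something like `run_with_cache_drops 2 md5sum -b largefile`. This is still the same dirty workaround, only wrapped so the loop does not keep running after the file operation has finished.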
Comment 2 g0re 2011-08-24 17:25:52 UTC
A workaround for me:

1- Disable swapfile/partition and use zram with ~75% of total ram
modprobe zram && echo $(($(grep  MemTotal /proc/meminfo | awk '{print $2}')*768)) > /sys/block/zram0/disksize && mkswap /dev/zram0 && swapon -p 100 /dev/zram0

(one line)

2- Tune the vm/sched
kernel.sched_autogroup_enabled = 1
kernel.sched_child_runs_first = 0
kernel.sched_latency_ns = 4000000
kernel.sched_migration_cost = 500000
kernel.sched_min_granularity_ns = 4000000
kernel.sched_nr_migrate = 32
kernel.sched_rt_period_us = 1000000
kernel.sched_rt_runtime_us = 950000
kernel.sched_shares_window = 10000000
kernel.sched_time_avg = 1000
kernel.sched_tunable_scaling = 1
kernel.sched_wakeup_granularity_ns = 2000000
vm.block_dump = 0
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 20
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 60
vm.dirty_writeback_centisecs = 500
vm.extfrag_threshold = 500
vm.highmem_is_dirtyable = 0
vm.hugepages_treat_as_movable = 0
vm.hugetlb_shm_group = 0
vm.laptop_mode = 0
vm.legacy_va_layout = 0
vm.lowmem_reserve_ratio = 256	32	32
vm.max_map_count = 65530
vm.min_free_kbytes = 4096
vm.mmap_min_addr = 4096
vm.nr_hugepages = 0
vm.nr_overcommit_hugepages = 0
vm.oom_dump_tasks = 1
vm.oom_kill_allocating_task = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
vm.page-cluster = 0
vm.panic_on_oom = 0
vm.scan_unevictable_pages = 0
vm.stat_interval = 1
vm.swappiness = 20
vm.vdso_enabled = 1
vm.vfs_cache_pressure = 1000

3- Lower nr_requests to 4 and raise read_ahead_kb to 512

obs: nr_requests = 4 on newer hardware kills throughput a lot
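
Steps 1 and 3 above can be sketched as two small shell helpers. This is only an illustration under stated assumptions: `zram_disksize_bytes`, `tune_queue` and the `SYSFS_ROOT` override are hypothetical names introduced here, the block device name (sda in the usage note) is an assumption, and the values 768, 4 and 512 are simply the ones from this comment. The factor 768 follows from MemTotal being reported in kB: kB * 1024 * 0.75 = kB * 768 bytes.

```shell
#!/bin/sh
# zram_disksize_bytes [FILE]: print 75% of MemTotal in bytes.
# MemTotal is in kB, so kB * 1024 * 0.75 = kB * 768.
zram_disksize_bytes() {
    awk '$1 == "MemTotal:" { printf "%.0f\n", $2 * 768 }' "${1:-/proc/meminfo}"
}

# tune_queue DEV: shrink the request queue and enlarge read-ahead for DEV.
# SYSFS_ROOT exists only so the function can be tried against a scratch directory.
tune_queue() {
    queue="${SYSFS_ROOT:-/sys}/block/$1/queue"
    echo 4   > "$queue/nr_requests"
    echo 512 > "$queue/read_ahead_kb"
}
```

As root, something like `zram_disksize_bytes > /sys/block/zram0/disksize` followed by `tune_queue sda` would apply the same settings; as noted above, nr_requests = 4 costs a lot of throughput on faster disks.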

The old behavior is back in the 3.1-rc* series

Zram has become the "holy grail" for me
Comment 3 Andrew Morton 2011-08-26 23:32:58 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 28 Jul 2011 12:41:03 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=40262

Two people are reporting this - there are some additional details in
bugzilla.

We seem to be going around in circles here.

I'll ask Rafael and Maciej to track this as a regression :(

>            Summary: PROBLEM: I/O storm from hell on kernel 3.0.0 when
>                     touch swap (swapfile or partition)
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 3.0.0
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: akpm@linux-foundation.org
>         ReportedBy: g0re@null.net
>         Regression: No
> 
> 
> Created an attachment (id=66982)
>  --> (https://bugzilla.kernel.org/attachment.cgi?id=66982)
> kernel config based on ARCH linux distro
> 
> issue occurs in new kernel 3.0. 
> does not occurs in 2.6.39.3/2.6.38.8
> 
> copy a file bigger than ram size (tar/cp/scp/dd/smb , local and remote)
> load some .torrent with rtorrent (debian dvd isofiles)
> 
> observed in 3.0, the cache is not shrinking when another app request ram and
> swap occurs (mouse lag+keyboard lag+window redraw lag+ssh lag...)
> observed in 2.6.3x.y, the cache is shrinking when another app request ram
> (opera/thunderbird/seamonkey/rdesktop) and swap occurs only when the cache is
> very low (and smoothly)
> 
> tested I/O schedulers: noop deadline cfq (no difference)
> tested FS: XFS, JFS, EXT4, reiser3 (no difference)
> tested HDDs: WDC WD400BB-60JKA0 (40GB - PATA) (to test pata_via)
>              SAMSUNG SP0802N (80GB - PATA) (to test pata_via)
>          SAMSUNG HD103SI (1TB - SATA) (to test sata_via)
>              SAMSUNG HM100UX (1TB - SATA) (to test sata_via)
> 
> obs:
> nice -n 20 ionice -c3 trick does not work
> tune vfs_cache_pressure/swappiness/dirty_ratio/dirty_background_ratio does
> not
> help too
> 
> some nfo when storm begins
> /proc/meminfo
> MemTotal:         446532 kB
> MemFree:            5392 kB
> Buffers:            3664 kB
> Cached:           368872 kB
> SwapCached:        20412 kB
> Active:            89000 kB
> Inactive:         331392 kB
> Active(anon):      23048 kB
> Inactive(anon):    24840 kB
> Active(file):      65952 kB
> Inactive(file):   306552 kB
> Unevictable:           0 kB
> Mlocked:               0 kB
> HighTotal:             0 kB
> HighFree:              0 kB
> LowTotal:         446532 kB
> LowFree:            5392 kB
> SwapTotal:        681980 kB
> SwapFree:         600028 kB
> Dirty:                20 kB
> Writeback:             0 kB
> AnonPages:         37972 kB
> Mapped:            31592 kB
> Shmem:                16 kB
> Slab:              12304 kB
> SReclaimable:       7596 kB
> SUnreclaim:         4708 kB
> KernelStack:         696 kB
> PageTables:          960 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:      905244 kB
> Committed_AS:     187440 kB
> VmallocTotal:     573496 kB
> VmallocUsed:        5032 kB
> VmallocChunk:     565776 kB
> HardwareCorrupted:     0 kB
> AnonHugePages:         0 kB
> HugePages_Total:       0
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       4096 kB
> DirectMap4k:       24512 kB
> DirectMap4M:      434176 kB
> 
> extra: /proc/cmdline == printk.time=1 noisapnp libata.force=noncq logo.nologo
> maxcpus=4 nohz=off pci=nomsi pcie_pme=nomsi vga=normal video=vesafb:ywrap
> elevator=cfq libata.force=1.5Gbps loglevel=7
> 
> ...
>
Comment 4 Anonymous Emailer 2011-08-28 11:14:18 UTC
Reply-To: khlebnikov@openvz.org

Andrew Morton wrote:
 >
 > (switched to email.  Please respond via emailed reply-to-all, not via the
 > bugzilla web interface).
 >
 > On Thu, 28 Jul 2011 12:41:03 GMT
 > bugzilla-daemon@bugzilla.kernel.org wrote:
 >
 >> https://bugzilla.kernel.org/show_bug.cgi?id=40262
 >
 > Two people are reporting this - there are some additional details in
 > bugzilla.
 >
 > We seem to be going around in circles here.
 >
 > I'll ask Rafael and Maciej to track this as a regression :(
 >

>>
>> issue occurs in new kernel 3.0.
>> does not occurs in 2.6.39.3/2.6.38.8
>>

I guess this could be caused by commit v2.6.39-6846-g246e87a "memcg: fix get_scan_count() for small targets"
(it also tweaked kswapd, not just the memcg reclaimer).
It was fixed in v3.0-5361-g4508378 "memcg: fix vmscan count in small memcgs":

commit 4508378b9523e22a2a0175d8bf64d932fb10a67d
Author: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Date:   Tue Jul 26 16:08:24 2011 -0700

     memcg: fix vmscan count in small memcgs

     Commit 246e87a93934 ("memcg: fix get_scan_count() for small targets")
     fixes the memcg/kswapd behavior against small targets and prevent vmscan
     priority too high.

     But the implementation is too naive and adds another problem to small
     memcg.  It always force scan to 32 pages of file/anon and doesn't handle
     swappiness and other rotate_info.  It makes vmscan to scan anon LRU
     regardless of swappiness and make reclaim bad.  This patch fixes it by
     adjusting scanning count with regard to swappiness at el.

     At a test "cat 1G file under 300M limit." (swappiness=20)
      before patch
             scanned_pages_by_limit 360919
             scanned_anon_pages_by_limit 180469
             scanned_file_pages_by_limit 180450
             rotated_pages_by_limit 31
             rotated_anon_pages_by_limit 25
             rotated_file_pages_by_limit 6
             freed_pages_by_limit 180458
             freed_anon_pages_by_limit 19
             freed_file_pages_by_limit 180439
             elapsed_ns_by_limit 429758872
      after patch
             scanned_pages_by_limit 180674
             scanned_anon_pages_by_limit 24
             scanned_file_pages_by_limit 180650
             rotated_pages_by_limit 35
             rotated_anon_pages_by_limit 24
             rotated_file_pages_by_limit 11
             freed_pages_by_limit 180634
             freed_anon_pages_by_limit 0
             freed_file_pages_by_limit 180634
             elapsed_ns_by_limit 367119089
             scanned_pages_by_system 0

     The number of scanned anon pages decreases (as expected), and the
     elapsed time is reduced. With this patch, small memcgs will work better.
     (*) Because the amount of file cache is much bigger than anon,
         reclaim_stat's rotate-scan counter makes vmscan scan files more.
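
To put the before/after counters in proportion, here is a small sketch that only does arithmetic on the figures quoted in the commit message. The dictionary keys and helper names are made up for this example and are not kernel identifiers.

```python
# Counters copied from the "cat 1G file under 300M limit" test in the
# commit message. Key names are shortened for this example only.
before = {
    "scanned": 360919,
    "scanned_anon": 180469,
    "scanned_file": 180450,
    "freed_anon": 19,
    "elapsed_ns": 429758872,
}
after = {
    "scanned": 180674,
    "scanned_anon": 24,
    "scanned_file": 180650,
    "freed_anon": 0,
    "elapsed_ns": 367119089,
}


def anon_share(stats):
    """Fraction of all scanned pages that came from the anon LRU."""
    return stats["scanned_anon"] / stats["scanned"]


def reduction(old, new, key):
    """Relative decrease of one counter between two runs (0.25 means 25% less)."""
    return 1 - new[key] / old[key]


if __name__ == "__main__":
    print(f"anon share before: {anon_share(before):.1%}")
    print(f"anon share after:  {anon_share(after):.3%}")
    print(f"fewer pages scanned: {reduction(before, after, 'scanned'):.1%}")
    print(f"less elapsed time:   {reduction(before, after, 'elapsed_ns'):.1%}")
```

Before the patch about half of the scanned pages were anon pages even though swappiness was 20; after it the anon share is close to zero, total scanning is roughly halved, and elapsed time drops by roughly 15%.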


KAMEZAWA Hiroyuki added to CC

>> ...
>>
Comment 5 KAMEZAWA Hiroyuki 2011-08-29 01:12:34 UTC
On Sun, 28 Aug 2011 15:13:35 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> I guess this can be caused by commit v2.6.39-6846-g246e87a "memcg: fix
> get_scan_count() for small targets"
> (it also tweaked kswapd besides the memcg reclaimer).
> It was fixed in v3.0-5361-g4508378 "memcg: fix vmscan count in small
> memcgs"

Ah, yes. This patch may be able to fix the problem... could you try it?

Thanks,
-Kame
Comment 6 Steffen Michalke 2011-08-31 01:03:25 UTC
On Monday, 29.08.2011, at 09:01 +0900, KAMEZAWA Hiroyuki wrote:
> Ah, yes. This patch may be able to fix the problem... could you try it?
> 
> Thanks,
> -Kame

I have applied your memcg-fix-vmscan-count-in-small-memcgs patch to the
new kernel v3.0.4. It works wonderfully, thank you a lot! I have tested
reading and copying large files and found that these operations do not
strain the memory of other applications anymore.

Thank you,
Steffen
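
For anyone who wants to repeat this kind of check, a minimal sketch of how the page cache and swap usage could be read out of /proc/meminfo while a large copy is running. The function names are invented for this example, and the sample values are the ones from the /proc/meminfo dump in the original report.

```python
def parse_meminfo(text):
    """Turn /proc/meminfo-style text into a dict of field name -> integer value."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def summarize(fields):
    """Pick out the numbers that matter for the cache-versus-swap question."""
    return {
        "free_kb": fields["MemFree"],
        "cached_kb": fields["Cached"],
        "swap_used_kb": fields["SwapTotal"] - fields["SwapFree"],
    }


# A few lines from the report, taken when the I/O storm began.
SAMPLE = """\
MemTotal:         446532 kB
MemFree:            5392 kB
Cached:           368872 kB
SwapTotal:        681980 kB
SwapFree:         600028 kB
"""

if __name__ == "__main__":
    print(summarize(parse_meminfo(SAMPLE)))
```

On a live system the same functions can be fed the contents of /proc/meminfo every few seconds. In the broken case described in the report, swap usage keeps growing while the cached figure stays large; with the fix, the cache should shrink first.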
Comment 7 Florian Mickler 2012-01-24 23:10:51 UTC
If I understood the bug correctly, this is fixed.

Fixed in v3.1-rc1 by 

commit 4508378b9523e22a2a0175d8bf64d932fb10a67d
Author: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Date:   Tue Jul 26 16:08:24 2011 -0700

     memcg: fix vmscan count in small memcgs
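
As a rough illustration of what the commit message describes, here is a toy model, not kernel code. It contrasts a forced scan that always takes 32 pages from each LRU with one that splits the forced scan according to swappiness. The weighting used (anon weight = swappiness, file weight = 200 - swappiness) is an assumption made for this sketch. The real get_scan_count() also takes the recent scanned/rotated statistics into account, which are left out here.

```python
SWAP_CLUSTER_MAX = 32  # the "32 pages" mentioned in the commit message


def naive_forced_scan():
    """Behaviour criticised in the commit message: both LRU lists get the
    same forced scan and swappiness plays no role."""
    return {"anon": SWAP_CLUSTER_MAX, "file": SWAP_CLUSTER_MAX}


def weighted_forced_scan(swappiness):
    """Toy version of the fix: scale the forced scan by a swappiness weight.

    Assumed weighting for illustration only: anon = swappiness,
    file = 200 - swappiness, with swappiness in the range 0..100.
    """
    if not 0 <= swappiness <= 100:
        raise ValueError("swappiness must be between 0 and 100")
    anon_weight = swappiness
    file_weight = 200 - swappiness
    total = anon_weight + file_weight  # always 200 in this model
    return {
        "anon": SWAP_CLUSTER_MAX * anon_weight // total,
        "file": SWAP_CLUSTER_MAX * file_weight // total,
    }


if __name__ == "__main__":
    print("naive:   ", naive_forced_scan())
    print("weighted:", weighted_forced_scan(20))
```

With swappiness=20, as in the test from the commit message, this model scans far fewer anon pages than file pages, which matches the direction of the measured change even though the exact kernel arithmetic differs.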