Bug 11720

Summary: when running an SMB service on an LV with 3 snapshots, the system runs out of memory and the client transfer stops
Product: IO/Storage
Reporter: zhanghj (zhanghj_2000)
Component: LVM2/DM
Assignee: Alasdair G Kergon (agk)
Status: CLOSED PATCH_ALREADY_AVAILABLE
Severity: high
CC: agk, wen_zl, zhanghj_2000, zhlqcn
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.24.7
Subsystem:
Regression: No
Bisected commit-id:

Description zhanghj 2008-10-08 06:58:22 UTC
Latest working kernel version:
2.6.24.7
Earliest failing kernel version:
Distribution:

Problem Description:

When I run a Samba service on a logical volume with 3 snapshots created, and then copy files to the SMB share directory from Windows Server 2003, the transfer breaks after about 60 GB of data has been transferred, and the system prints an "out of memory" message.

The message looks like this:
irqbalance invoked oom-killer: gfp_mask=0x40d0, order=0, oomkilladj=0
Pid: 3405, comm: irqbalance Not tainted 2.6.24.7 #1
 [<c014cbd5>] oom_kill_process+0x105/0x110
 [<c014cf53>] out_of_memory+0x183/0x1c0
 [<c014f701>] __alloc_pages+0x281/0x380
 [<c0178b93>] do_path_lookup+0x73/0x1b0
 [<c014f8e3>] __get_free_pages+0x53/0x60
 [<c01acd31>] stat_open+0x51/0xb0
 [<c01a6ffe>] proc_reg_open+0x3e/0x80
 [<c01a6fc0>] proc_reg_open+0x0/0x80
 [<c016e8d7>] __dentry_open+0xb7/0x1d0
 [<c016eaa5>] nameidata_to_filp+0x35/0x40
 [<c016eafb>] do_filp_open+0x4b/0x60
 [<c016e432>] get_unused_fd_flags+0x52/0xd0
 [<c016eb5c>] do_sys_open+0x4c/0xe0
 [<c016ec2c>] sys_open+0x1c/0x20
 [<c0102b3e>] sysenter_past_esp+0x5f/0x85
 =======================
Mem-info:
DMA per-cpu:
CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    1: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    2: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    3: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    4: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    5: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    6: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    7: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd: 174   Cold: hi:   62, btch:  15 usd:  47
CPU    1: Hot: hi:  186, btch:  31 usd: 117   Cold: hi:   62, btch:  15 usd:   9
CPU    2: Hot: hi:  186, btch:  31 usd: 185   Cold: hi:   62, btch:  15 usd:  54
CPU    3: Hot: hi:  186, btch:  31 usd:  99   Cold: hi:   62, btch:  15 usd:   6
CPU    4: Hot: hi:  186, btch:  31 usd: 138   Cold: hi:   62, btch:  15 usd:  46
CPU    5: Hot: hi:  186, btch:  31 usd: 109   Cold: hi:   62, btch:  15 usd:  17
CPU    6: Hot: hi:  186, btch:  31 usd: 147   Cold: hi:   62, btch:  15 usd:  48
CPU    7: Hot: hi:  186, btch:  31 usd: 166   Cold: hi:   62, btch:  15 usd:  54
HighMem per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd: 129   Cold: hi:   62, btch:  15 usd:   5
CPU    1: Hot: hi:  186, btch:  31 usd: 152   Cold: hi:   62, btch:  15 usd:   2
CPU    2: Hot: hi:  186, btch:  31 usd:  64   Cold: hi:   62, btch:  15 usd:   9
CPU    3: Hot: hi:  186, btch:  31 usd: 174   Cold: hi:   62, btch:  15 usd:   4
CPU    4: Hot: hi:  186, btch:  31 usd:  37   Cold: hi:   62, btch:  15 usd:   2
CPU    5: Hot: hi:  186, btch:  31 usd:  24   Cold: hi:   62, btch:  15 usd:   1
CPU    6: Hot: hi:  186, btch:  31 usd:  14   Cold: hi:   62, btch:  15 usd:   2
CPU    7: Hot: hi:  186, btch:  31 usd:  47   Cold: hi:   62, btch:  15 usd:  13
Active:12045 inactive:85452 dirty:3 writeback:1998 unstable:0
 free:741906 slab:177482 mapped:3202 pagetables:221 bounce:0
DMA free:4656kB min:1168kB low:1460kB high:1752kB active:32kB inactive:0kB present:16256kB pages_scanned:176 all_unreclaimable? yes
lowmem_reserve[]: 0 873 5572 5572
Normal free:63480kB min:64364kB low:80452kB high:96544kB active:14864kB inactive:14672kB present:894080kB pages_scanned:44802 all_unreclaimable? yes
lowmem_reserve[]: 0 0 37592 37592
HighMem free:2899488kB min:512kB low:87112kB high:173712kB active:33284kB inactive:327136kB present:4811776kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 0*4kB 0*8kB 3*16kB 4*32kB 4*64kB 5*128kB 4*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 4656kB
Normal: 141*4kB 37*8kB 12*16kB 3*32kB 0*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 15*4096kB = 63356kB
HighMem: 26*4kB 37*8kB 17*16kB 2*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 707*4096kB = 2899488kB
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap  = 0kB
Total swap = 0kB
Free swap:            0kB
1441792 pages of RAM
1212416 pages of HIGHMEM
408096 reserved pages
83655 pages shared
0 pages swap cached
3 pages dirty
1998 pages writeback
3202 pages mapped
177482 pages slab
221 pages pagetables
Out of memory: kill process 14478 (smbd) score 4672 or a child
Killed process 14478 (smbd)

I then rebooted the system and noticed that more than 800 MB of memory was still in use.
The meminfo was:

cat /proc/meminfo 
MemTotal:      4134784 kB
MemFree:       3244336 kB
Buffers:         52928 kB
Cached:          90512 kB
SwapCached:          0 kB
Active:          48148 kB
Inactive:       110140 kB
HighTotal:     3270980 kB
HighFree:      3154348 kB
LowTotal:       863804 kB
LowFree:         89988 kB
SwapTotal:           0 kB
SwapFree:            0 kB
Dirty:               8 kB
Writeback:           0 kB
AnonPages:       14868 kB
Mapped:          11052 kB
Slab:           669020 kB
SReclaimable:     5916 kB
SUnreclaim:     663104 kB
PageTables:        744 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   2067392 kB
Committed_AS:    94684 kB
VmallocTotal:   118776 kB
VmallocUsed:      7920 kB
VmallocChunk:   110724 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
HugePages_Surp:      0
Hugepagesize:     2048 kB

# slabtop
 Active / Total Objects (% used)    : 27891183 / 27893316 (100.0%)
 Active / Total Slabs (% used)      : 166671 / 166671 (100.0%)
 Active / Total Caches (% used)     : 52 / 70 (74.3%)
 Active / Total Size (% used)       : 665653.52K / 665885.20K (100.0%)
 Minimum / Average / Maximum Object : 0.01K / 0.02K / 16.12K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
27785990 27785990 100%    0.02K 163447      170    653788K Acpi-Namespace
 28543  27824  97%    0.05K    391       73      1564K buffer_head
 13005  13004  99%    0.05K    153       85       612K sysfs_dir_cache
  9360   9331  99%    0.13K    312       30      1248K dentry
  7680   7679  99%    0.01K     15      512        60K kmalloc-8
  7616   7553  99%    0.06K    119       64       476K kmalloc-64
  6912   6459  93%    0.02K     27      256       108K kmalloc-16
  5120   5105  99%    0.03K     40      128       160K kmalloc-32
  2880   2878  99%    0.33K    240       12       960K inode_cache
  2142   2140  99%    0.04K     21      102        84K pid_namespace
  2090   2078  99%    0.34K    190       11       760K proc_inode_cache
  2048   2048 100%    0.02K      8      256        32K anon_vma
  2048   2048 100%    0.02K      8      256        32K revoke_record
  1984   1691  85%    0.12K     62       32       248K kmalloc-128
  1904   1902  99%    0.48K    238        8       952K ext2_inode_cache
  1833   1833 100%    0.29K    141       13       564K radix_tree_node
  1748   1392  79%    0.09K     38       46       152K vm_area_struct
  1360   1360 100%    0.02K      8      170        32K journal_handle
  1071   1013  94%    0.19K     51       21       204K kmalloc-192
  1032   1027  99%    0.50K    129        8       516K kmalloc-512
   927    925  99%    0.43K    103        9       412K shmem_inode_cache
   608    597  98%    2.00K    152        4      1216K kmalloc-2048
   525    525 100%    0.26K     35       15       140K xfs_efd_item
   488    488 100%    0.48K     61        8       244K ext3_inode_cache
   462    462 100%    0.09K     11       42        44K kmalloc-96
   350    315  90%    0.38K     35       10       140K signal_cache
   312    312 100%    0.10K      8       39        32K file_lock_cache
   308    307  99%    0.14K     11       28        44K idr_layer_cache
   306    278  90%    0.44K     34        9       136K mm_struct
   282    271  96%    1.31K     47        6       376K task_struct
   275    266  96%    0.75K     55        5       220K biovec-64
   264    262  99%    1.31K     44        6       352K sighand_cache

I noticed that the slab allocator had assigned a lot of memory to Acpi-Namespace, so I disabled ACPI in GRUB with the "acpi=off" parameter and rebooted the system.

After the reboot, the slab info looked like this:
 Active / Total Objects (% used)    : 27890721 / 27892288 (100.0%)
 Active / Total Slabs (% used)      : 166681 / 166681 (100.0%)
 Active / Total Caches (% used)     : 52 / 70 (74.3%)
 Active / Total Size (% used)       : 665715.05K / 665934.09K (100.0%)
 Minimum / Average / Maximum Object : 0.01K / 0.02K / 16.12K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
27785310 27785265  99%    0.02K 163443      170    653772K ip_fib_hash
 28105  27999  99%    0.05K    385       73      1540K buffer_head
 13770  13769  99%    0.05K    162       85       648K sysfs_dir_cache
 10440  10404  99%    0.13K    348       30      1392K dentry
  7680   7490  97%    0.01K     15      512        60K kmalloc-8
  7488   7449  99%    0.06K    117       64       468K kmalloc-64
  6912   6758  97%    0.02K     27      256       108K kmalloc-16
  4736   4734  99%    0.03K     37      128       148K kmalloc-32
  2712   2710  99%    0.33K    226       12       904K inode_cache
  2442   2412  98%    0.34K    222       11       888K proc_inode_cache
  2304   2163  93%    0.02K      9      256        36K anon_vma
  2048   2048 100%    0.02K      8      256        32K revoke_record
  1984   1552  78%    0.12K     62       32       248K kmalloc-128
  1920   1918  99%    0.48K    240        8       960K ext2_inode_cache
  1859   1859 100%    0.29K    143       13       572K radix_tree_node
  1518   1311  86%    0.09K     33       46       132K vm_area_struct
  1360   1360 100%    0.02K      8      170        32K journal_handle
  1029   1004  97%    0.19K     49       21       196K kmalloc-192
   992    986  99%    0.50K    124        8       496K kmalloc-512
   873    873 100%    0.43K     97        9       388K shmem_inode_cache
   816    816 100%    0.04K      8      102        32K pid_namespace
   580    575  99%    2.00K    145        4      1160K kmalloc-2048
   540    540 100%    0.26K     36       15       144K xfs_efd_item
   464    460  99%    0.48K     58        8       232K ext3_inode_cache
   462    459  99%    0.09K     11       42        44K kmalloc-96
   360    319  88%    0.38K     36       10       144K signal_cache
   336    328  97%    0.14K     12       28        48K idr_layer_cache
   312    312 100%    0.10K      8       39        32K file_lock_cache
   300    273  91%    1.31K     50        6       400K task_struct
   297    275  92%    0.44K     33        9       132K mm_struct
   270    258  95%    1.31K     45        6       360K sighand_cache
   255    255 100%    0.75K     51        5       204K biovec-64
   245    245 100%    1.50K     49        5       392K biovec-128
   230    230 100%    3.00K    115        2       920K biovec-256

Now it was ip_fib_hash that had a lot of memory allocated to it. What causes this?
I then started copying data from Win2003 to the SMB share directory again, and the
transfer broke again.

The shared LV info:
  LV       VG   Attr   LSize   Origin   Snap%  Move Log Copy% 
  lv-ext3  vga  owi-a- 600.03G                                
  ss0      vga  swi-a- 200.03G lv-ext3  35.39                
  ss1      vga  swi-a- 200.03G lv-ext3  35.39                
  ss2      vga  swi-a- 200.03G lv-ext3  35.39  
The LVM version:
  LVM version:     2.02.16 (2006-12-01)
  Library version: 1.02.13 (2006-11-28)
  Driver version:  4.12.0
Comment 1 zhanghj 2008-10-08 07:07:50 UTC
When I ran the command "lvchange -a n vga", I found that the memory allocated to
Acpi-Namespace / ip_fib_hash was freed, so each snapshot had probably been allocated about 200 MB of memory.
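The ~200 MB-per-snapshot estimate can be checked against the slabtop and `lvs` numbers above. A back-of-the-envelope sketch, assuming an 8 KiB snapshot chunk size (the chunk size is not stated anywhere in this report; 8 KiB is an assumption, chosen because it makes the object counts line up):

```python
# Rough check of the "~200 MB per snapshot" estimate, using the figures
# from the slabtop and `lvs` output in this report.
# ASSUMPTION: an 8 KiB snapshot chunk size (not stated in the report).

GiB = 1024 ** 3
KiB = 1024

snapshot_size_gib = 200.03   # size of each snapshot LV, from `lvs`
snap_pct_used = 35.39        # Snap% column, from `lvs`
chunk_size = 8 * KiB         # assumed chunk size
num_snapshots = 3

# One exception-table entry is needed per chunk copied to the COW device.
copied_bytes = snapshot_size_gib * GiB * snap_pct_used / 100
exceptions_per_snapshot = copied_bytes / chunk_size
total_exceptions = exceptions_per_snapshot * num_snapshots

# slabtop reported 27,785,990 objects (653,788 KiB) in the runaway cache,
# i.e. roughly 24 bytes per entry.
slab_objects = 27_785_990
slab_kib = 653_788

print(f"predicted exception entries: {total_exceptions:,.0f}")
print(f"observed slab objects:       {slab_objects:,}")
print(f"memory per snapshot:         {slab_kib / num_snapshots / 1024:.0f} MiB")
```

Under that assumed chunk size, the predicted entry count comes out within about 0.2% of the observed slab object count, and the cache works out to roughly 213 MiB per snapshot, consistent with the ~200 MB observation above.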
Comment 2 skysky 2008-10-16 04:54:59 UTC
The exception table maps the location of copied data on the 'origin-real' device to the block location on the 'snap-cow' device. This mapping lets read/write I/O requests arriving for the snapshot volume be routed
to the right place. But once a snapshot has been created and a large amount of data is being copied, the exception table grows entry by entry and consumes a lot of memory. Moreover, the exception table lives in low memory, which is only 896 MB in size, so an out-of-memory condition soon occurs. The exception table is rebuilt at boot.
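The mechanism described above can be illustrated with a simplified model: a per-snapshot map from origin chunk number to the COW chunk holding the preserved old data. This is a sketch of the copy-on-write idea only, not the kernel's actual dm-snapshot data structures:

```python
# Simplified model of a dm-snapshot exception table.  Each snapshot keeps
# a map: origin chunk number -> chunk in the COW device that stores the
# old data.  Illustration only, not the kernel's real implementation.

class Snapshot:
    def __init__(self):
        self.exceptions = {}      # origin chunk -> COW chunk
        self.next_cow_chunk = 0

    def write_to_origin(self, chunk):
        # First write to a chunk after the snapshot was taken: the old
        # data is copied out and an exception entry is recorded.
        if chunk not in self.exceptions:
            self.exceptions[chunk] = self.next_cow_chunk
            self.next_cow_chunk += 1

    def read_snapshot(self, chunk):
        # Snapshot reads go to the COW copy if the chunk has been
        # overwritten on the origin, otherwise to the origin itself.
        if chunk in self.exceptions:
            return ("cow", self.exceptions[chunk])
        return ("origin", chunk)

snap = Snapshot()
for c in range(5):
    snap.write_to_origin(c)       # each newly written chunk adds an entry

print(snap.read_snapshot(3))      # -> ('cow', 3)
print(snap.read_snapshot(99))     # -> ('origin', 99)
print(len(snap.exceptions))       # grows with data written, never shrinks
```

Every chunk written to the origin adds one entry per active snapshot, and the entries are not freed until the snapshot is deactivated, so memory use grows linearly with the amount of data copied; this matches the "lvchange -a n" observation in Comment 1.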

How and when will this be resolved?

Thanks in advance.
Comment 3 Roel Broersma 2008-12-11 09:28:44 UTC
Has anyone already solved this in a kernel version... ?
(I'm also having trouble with this bug.)
Comment 4 Roel Broersma 2008-12-14 07:25:09 UTC
We have applied the fix in the (OpenFiler) kernel 2.6.26.8-1.0.7 AND IT WORKS!
I've load-tested it for several days and it still works! (Disabling snapshots also seems to 'fix' the issue.)

(for more info, see ticket: http://bugzilla.kernel.org/show_bug.cgi?id=11636 )
(also see that ticket for the kernel fix)
The creator of the fix (Mikulas Patocka) says a lot (of this stuff) is also fixed in the 2.6.28-rc8 kernel. Thanks!
Comment 5 Alasdair G Kergon 2009-10-18 18:38:21 UTC
With the snapshot changes made since this was reported which reduce memory usage, can we mark this one 'resolved' now?
Comment 6 Alasdair G Kergon 2012-05-19 01:09:16 UTC
Closing old bug.