Latest working kernel version: 2.6.24.7
Earliest failing kernel version:
Distribution:

Problem Description:
When I run a Samba service on a logical volume that has 3 snapshots, and copy files to the SMB share directory from a Windows Server 2003 machine, the transfer breaks after about 60GB of data has been copied and the system prints "out of memory" messages. The messages look like this:

irqbalance invoked oom-killer: gfp_mask=0x40d0, order=0, oomkilladj=0
Pid: 3405, comm: irqbalance Not tainted 2.6.24.7 #1
 [<c014cbd5>] oom_kill_process+0x105/0x110
 [<c014cf53>] out_of_memory+0x183/0x1c0
 [<c014f701>] __alloc_pages+0x281/0x380
 [<c0178b93>] do_path_lookup+0x73/0x1b0
 [<c014f8e3>] __get_free_pages+0x53/0x60
 [<c01acd31>] stat_open+0x51/0xb0
 [<c01a6ffe>] proc_reg_open+0x3e/0x80
 [<c01a6fc0>] proc_reg_open+0x0/0x80
 [<c016e8d7>] __dentry_open+0xb7/0x1d0
 [<c016eaa5>] nameidata_to_filp+0x35/0x40
 [<c016eafb>] do_filp_open+0x4b/0x60
 [<c016e432>] get_unused_fd_flags+0x52/0xd0
 [<c016eb5c>] do_sys_open+0x4c/0xe0
 [<c016ec2c>] sys_open+0x1c/0x20
 [<c0102b3e>] sysenter_past_esp+0x5f/0x85
 =======================
Mem-info:
DMA per-cpu:
CPU 0: Hot: hi: 0, btch: 1 usd: 0   Cold: hi: 0, btch: 1 usd: 0
CPU 1: Hot: hi: 0, btch: 1 usd: 0   Cold: hi: 0, btch: 1 usd: 0
CPU 2: Hot: hi: 0, btch: 1 usd: 0   Cold: hi: 0, btch: 1 usd: 0
CPU 3: Hot: hi: 0, btch: 1 usd: 0   Cold: hi: 0, btch: 1 usd: 0
CPU 4: Hot: hi: 0, btch: 1 usd: 0   Cold: hi: 0, btch: 1 usd: 0
CPU 5: Hot: hi: 0, btch: 1 usd: 0   Cold: hi: 0, btch: 1 usd: 0
CPU 6: Hot: hi: 0, btch: 1 usd: 0   Cold: hi: 0, btch: 1 usd: 0
CPU 7: Hot: hi: 0, btch: 1 usd: 0   Cold: hi: 0, btch: 1 usd: 0
Normal per-cpu:
CPU 0: Hot: hi: 186, btch: 31 usd: 174   Cold: hi: 62, btch: 15 usd: 47
CPU 1: Hot: hi: 186, btch: 31 usd: 117   Cold: hi: 62, btch: 15 usd: 9
CPU 2: Hot: hi: 186, btch: 31 usd: 185   Cold: hi: 62, btch: 15 usd: 54
CPU 3: Hot: hi: 186, btch: 31 usd: 99    Cold: hi: 62, btch: 15 usd: 6
CPU 4: Hot: hi: 186, btch: 31 usd: 138   Cold: hi: 62, btch: 15 usd: 46
CPU 5: Hot: hi: 186, btch: 31 usd: 109   Cold: hi: 62, btch: 15 usd: 17
CPU 6: Hot: hi: 186, btch: 31 usd: 147   Cold: hi: 62, btch: 15 usd: 48
CPU 7: Hot: hi: 186, btch: 31 usd: 166   Cold: hi: 62, btch: 15 usd: 54
HighMem per-cpu:
CPU 0: Hot: hi: 186, btch: 31 usd: 129   Cold: hi: 62, btch: 15 usd: 5
CPU 1: Hot: hi: 186, btch: 31 usd: 152   Cold: hi: 62, btch: 15 usd: 2
CPU 2: Hot: hi: 186, btch: 31 usd: 64    Cold: hi: 62, btch: 15 usd: 9
CPU 3: Hot: hi: 186, btch: 31 usd: 174   Cold: hi: 62, btch: 15 usd: 4
CPU 4: Hot: hi: 186, btch: 31 usd: 37    Cold: hi: 62, btch: 15 usd: 2
CPU 5: Hot: hi: 186, btch: 31 usd: 24    Cold: hi: 62, btch: 15 usd: 1
CPU 6: Hot: hi: 186, btch: 31 usd: 14    Cold: hi: 62, btch: 15 usd: 2
CPU 7: Hot: hi: 186, btch: 31 usd: 47    Cold: hi: 62, btch: 15 usd: 13
Active:12045 inactive:85452 dirty:3 writeback:1998 unstable:0 free:741906 slab:177482 mapped:3202 pagetables:221 bounce:0
DMA free:4656kB min:1168kB low:1460kB high:1752kB active:32kB inactive:0kB present:16256kB pages_scanned:176 all_unreclaimable? yes
lowmem_reserve[]: 0 873 5572 5572
Normal free:63480kB min:64364kB low:80452kB high:96544kB active:14864kB inactive:14672kB present:894080kB pages_scanned:44802 all_unreclaimable? yes
lowmem_reserve[]: 0 0 37592 37592
HighMem free:2899488kB min:512kB low:87112kB high:173712kB active:33284kB inactive:327136kB present:4811776kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 0*4kB 0*8kB 3*16kB 4*32kB 4*64kB 5*128kB 4*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 4656kB
Normal: 141*4kB 37*8kB 12*16kB 3*32kB 0*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 15*4096kB = 63356kB
HighMem: 26*4kB 37*8kB 17*16kB 2*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 707*4096kB = 2899488kB
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap  = 0kB
Total swap = 0kB
Free swap:            0kB
1441792 pages of RAM
1212416 pages of HIGHMEM
408096 reserved pages
83655 pages shared
0 pages swap cached
3 pages dirty
1998 pages writeback
3202 pages mapped
177482 pages slab
221 pages pagetables
Out of memory: kill process 14478 (smbd) score 4672 or a child
Killed process 14478 (smbd)

Then I rebooted the system and noticed that more than 800MB of memory was still in use. /proc/meminfo looked like this:

# cat /proc/meminfo
MemTotal:      4134784 kB
MemFree:       3244336 kB
Buffers:         52928 kB
Cached:          90512 kB
SwapCached:          0 kB
Active:          48148 kB
Inactive:       110140 kB
HighTotal:     3270980 kB
HighFree:      3154348 kB
LowTotal:       863804 kB
LowFree:         89988 kB
SwapTotal:           0 kB
SwapFree:            0 kB
Dirty:               8 kB
Writeback:           0 kB
AnonPages:       14868 kB
Mapped:          11052 kB
Slab:           669020 kB
SReclaimable:     5916 kB
SUnreclaim:     663104 kB
PageTables:        744 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   2067392 kB
Committed_AS:    94684 kB
VmallocTotal:   118776 kB
VmallocUsed:      7920 kB
VmallocChunk:   110724 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
HugePages_Surp:      0
Hugepagesize:     2048 kB

# slabtop
 Active / Total Objects (% used)    : 27891183 / 27893316 (100.0%)
 Active / Total Slabs (% used)      : 166671 / 166671 (100.0%)
 Active / Total Caches (% used)     : 52 / 70 (74.3%)
 Active / Total Size (% used)       : 665653.52K / 665885.20K (100.0%)
 Minimum / Average / Maximum Object : 0.01K / 0.02K / 16.12K

  OBJS   ACTIVE   USE  OBJ SIZE  SLABS  OBJ/SLAB  CACHE SIZE  NAME
27785990 27785990 100%    0.02K 163447       170     653788K  Acpi-Namespace
   28543    27824  97%    0.05K    391        73       1564K  buffer_head
   13005    13004  99%    0.05K    153        85        612K  sysfs_dir_cache
    9360     9331  99%    0.13K    312        30       1248K  dentry
    7680     7679  99%    0.01K     15       512         60K  kmalloc-8
    7616     7553  99%    0.06K    119        64        476K  kmalloc-64
    6912     6459  93%    0.02K     27       256        108K  kmalloc-16
    5120     5105  99%    0.03K     40       128        160K  kmalloc-32
    2880     2878  99%    0.33K    240        12        960K  inode_cache
    2142     2140  99%    0.04K     21       102         84K  pid_namespace
    2090     2078  99%    0.34K    190        11        760K  proc_inode_cache
    2048     2048 100%    0.02K      8       256         32K  anon_vma
    2048     2048 100%    0.02K      8       256         32K  revoke_record
    1984     1691  85%    0.12K     62        32        248K  kmalloc-128
    1904     1902  99%    0.48K    238         8        952K  ext2_inode_cache
    1833     1833 100%    0.29K    141        13        564K  radix_tree_node
    1748     1392  79%    0.09K     38        46        152K  vm_area_struct
    1360     1360 100%    0.02K      8       170         32K  journal_handle
    1071     1013  94%    0.19K     51        21        204K  kmalloc-192
    1032     1027  99%    0.50K    129         8        516K  kmalloc-512
     927      925  99%    0.43K    103         9        412K  shmem_inode_cache
     608      597  98%    2.00K    152         4       1216K  kmalloc-2048
     525      525 100%    0.26K     35        15        140K  xfs_efd_item
     488      488 100%    0.48K     61         8        244K  ext3_inode_cache
     462      462 100%    0.09K     11        42         44K  kmalloc-96
     350      315  90%    0.38K     35        10        140K  signal_cache
     312      312 100%    0.10K      8        39         32K  file_lock_cache
     308      307  99%    0.14K     11        28         44K  idr_layer_cache
     306      278  90%    0.44K     34         9        136K  mm_struct
     282      271  96%    1.31K     47         6        376K  task_struct
     275      266  96%    0.75K     55         5        220K  biovec-64
     264      262  99%    1.31K     44         6        352K  sighand_cache

I noticed that the slab allocator had assigned a lot of memory to the Acpi-Namespace cache, so I disabled ACPI by adding the "acpi=off" kernel parameter in GRUB and rebooted the system.
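For reference, the /proc/meminfo output above already shows where the memory went: LowTotal is only about 864MB on this 32-bit machine and SUnreclaim is about 663MB, so unreclaimable slab alone fills most of the Normal zone even though roughly 3GB of HighMem is free. A minimal sketch of that check, using only fields that appear in the output above:

  # Low-memory pressure check from /proc/meminfo (values are in kB).
  awk '/^LowTotal:/   { low  = $2 }
       /^SUnreclaim:/ { slab = $2 }
       END { printf "unreclaimable slab uses %.0f%% of low memory (%d of %d kB)\n",
                    100 * slab / low, slab, low }' /proc/meminfo

With the values above this reports roughly 77%, which is why the allocator OOMs in the Normal zone while HighMem is almost untouched.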
After the reboot, slabtop showed this:

 Active / Total Objects (% used)    : 27890721 / 27892288 (100.0%)
 Active / Total Slabs (% used)      : 166681 / 166681 (100.0%)
 Active / Total Caches (% used)     : 52 / 70 (74.3%)
 Active / Total Size (% used)       : 665715.05K / 665934.09K (100.0%)
 Minimum / Average / Maximum Object : 0.01K / 0.02K / 16.12K

  OBJS   ACTIVE   USE  OBJ SIZE  SLABS  OBJ/SLAB  CACHE SIZE  NAME
27785310 27785265  99%    0.02K 163443       170     653772K  ip_fib_hash
   28105    27999  99%    0.05K    385        73       1540K  buffer_head
   13770    13769  99%    0.05K    162        85        648K  sysfs_dir_cache
   10440    10404  99%    0.13K    348        30       1392K  dentry
    7680     7490  97%    0.01K     15       512         60K  kmalloc-8
    7488     7449  99%    0.06K    117        64        468K  kmalloc-64
    6912     6758  97%    0.02K     27       256        108K  kmalloc-16
    4736     4734  99%    0.03K     37       128        148K  kmalloc-32
    2712     2710  99%    0.33K    226        12        904K  inode_cache
    2442     2412  98%    0.34K    222        11        888K  proc_inode_cache
    2304     2163  93%    0.02K      9       256         36K  anon_vma
    2048     2048 100%    0.02K      8       256         32K  revoke_record
    1984     1552  78%    0.12K     62        32        248K  kmalloc-128
    1920     1918  99%    0.48K    240         8        960K  ext2_inode_cache
    1859     1859 100%    0.29K    143        13        572K  radix_tree_node
    1518     1311  86%    0.09K     33        46        132K  vm_area_struct
    1360     1360 100%    0.02K      8       170         32K  journal_handle
    1029     1004  97%    0.19K     49        21        196K  kmalloc-192
     992      986  99%    0.50K    124         8        496K  kmalloc-512
     873      873 100%    0.43K     97         9        388K  shmem_inode_cache
     816      816 100%    0.04K      8       102         32K  pid_namespace
     580      575  99%    2.00K    145         4       1160K  kmalloc-2048
     540      540 100%    0.26K     36        15        144K  xfs_efd_item
     464      460  99%    0.48K     58         8        232K  ext3_inode_cache
     462      459  99%    0.09K     11        42         44K  kmalloc-96
     360      319  88%    0.38K     36        10        144K  signal_cache
     336      328  97%    0.14K     12        28         48K  idr_layer_cache
     312      312 100%    0.10K      8        39         32K  file_lock_cache
     300      273  91%    1.31K     50         6        400K  task_struct
     297      275  92%    0.44K     33         9        132K  mm_struct
     270      258  95%    1.31K     45         6        360K  sighand_cache
     255      255 100%    0.75K     51         5        204K  biovec-64
     245      245 100%    1.50K     49         5        392K  biovec-128
     230      230 100%    3.00K    115         2        920K  biovec-256

Now it is the ip_fib_hash cache that has been allocated a lot of memory. What is causing this?

I then started copying data from the Windows 2003 machine to the SMB share directory again, and the transfer broke again.

The shared LV info:

  LV      VG   Attr   LSize   Origin  Snap%  Move Log Copy%
  lv-ext3 vga  owi-a- 600.03G
  ss0     vga  swi-a- 200.03G lv-ext3 35.39
  ss1     vga  swi-a- 200.03G lv-ext3 35.39
  ss2     vga  swi-a- 200.03G lv-ext3 35.39

The LVM version:

  LVM version:     2.02.16 (2006-12-01)
  Library version: 1.02.13 (2006-11-28)
  Driver version:  4.12.0
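A possible explanation for the changing cache name (an observation on my side, not something confirmed in this report): this kernel is using SLUB, which merges slab caches that have the same object size, and slabtop then reports the whole merged group under the name of one of its members. A tiny cache such as the dm-snapshot exception store can therefore show up as Acpi-Namespace on one boot and ip_fib_hash on the next. If SLUB sysfs support is enabled, the merged names can be listed:

  # Merged caches appear in sysfs as symlinks to one shared cache directory,
  # so several unrelated names can refer to the same underlying allocations.
  ls -l /sys/kernel/slab/ | grep -- '->'

  # Booting with the slub_nomerge parameter keeps the caches separate, so the
  # real owner of the memory shows up under its own name in slabtop.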
When I ran the command "lvchange -a n vga", the memory allocated to Acpi-Namespace / ip_fib_hash was freed, so it looks like each snapshot had allocated about 200MB of memory.
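For anyone who wants to reproduce that observation, a minimal sketch of the check (the cache name to look for will vary between boots, as noted above):

  # Slab usage while the snapshots are active
  slabtop -o | head -n 15

  # Deactivate all LVs in the volume group; this tears down the in-kernel
  # snapshot exception tables
  lvchange -a n vga
  slabtop -o | head -n 15     # the huge cache should now be (almost) gone

  # Reactivate the volume group afterwards
  lvchange -a y vga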
The exception table maps the location of copied data on the 'origin-real' device to the corresponding block location on the 'snap-cow' device; this mapping is what routes read/write I/O requests for the snapshot volume to the right place. But once snapshots exist and a large amount of data is being copied, the exception table gains an entry for every chunk that is copied out and ends up consuming a great deal of memory. Moreover, the exception table lives in low memory, which is only 896MB on i386, so the system soon runs out of memory. The exception table is rebuilt at boot. How and when will this be resolved? Thanks in advance.
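The numbers in this report fit that explanation quite well. A back-of-the-envelope estimate (my own arithmetic; the 8KiB snapshot chunk size is an assumption, and ~24 bytes per in-memory exception entry is inferred from the slabtop columns above, i.e. 170 objects per 4KiB slab):

  # COW space in use per snapshot, from the lvs output above: 35.39% of 200.03GiB
  cow_kb=$(awk 'BEGIN { printf "%d", 0.3539 * 200.03 * 1024 * 1024 }')
  chunk_kb=8        # assumed snapshot chunk size
  snapshots=3
  entry_bytes=24    # ~4096/170, from the OBJ/SLAB column above

  entries=$(( cow_kb / chunk_kb * snapshots ))
  echo "expected exception entries: $entries"                        # ~27.8 million
  echo "expected slab usage: $(( entries * entry_bytes / 1024 )) kB" # ~650MB

That lands almost exactly on the ~27.8 million objects and ~654MB cache size reported by slabtop, and since all of it has to sit in the ~896MB Normal zone, the OOM killer fires even though several gigabytes of HighMem are free.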
Has anyone already solved this in some kernel version...? (I'm also having trouble with this bug.)
We have applied the fix in the (Openfiler) kernel 2.6.26.8-1.0.7 AND IT WORKS! I've load-tested it for several days and it still works. (Disabling snapshots also seems to 'fix' the issue.) For more info, see ticket http://bugzilla.kernel.org/show_bug.cgi?id=11636; that ticket also contains the kernel fix. The author of the fix (Mikulas Patocka) says a lot of this is also fixed in the 2.6.28-rc8 kernel. Thanks!
With the snapshot changes made since this was reported which reduce memory usage, can we mark this one 'resolved' now?
Closing old bug.