Latest working kernel version: (not known)
Earliest failing kernel version: fails at least with 2.6.24.3-ubuntu and 2.6.25.6
Distribution: Ubuntu Linux 8.04 server
Hardware Environment: x86_64
Software Environment:

Problem Description:
The system starts behaving strangely after about an hour of heavy iSCSI traffic: the console locks up (keys typed in are not echoed) and error messages are logged about memory allocation failures.

Steps to reproduce:
This is what my setup looks like:
- One server is running OpenSolaris build 90, with 64 iSCSI targets configured on this system, each target 10 GB in size.
- On the second server I have installed Ubuntu 8.04 server 64-bit and replaced the Ubuntu kernel with the 2.6.25.6 kernel.
- Both systems are connected via a 4x SDR InfiniBand network (10 Gbit/s).
- open-iscsi was configured on the Linux server such that it connects to all 64 iSCSI targets and such that I/O to all 64 targets is active simultaneously.

I used the following command (see below for the scripts):

./iscsi-logout-and-login && ./iscsi-stress-test &

# cat iscsi-logout
#!/bin/bash
target_ip="${1:-192.168.67.10}"
iscsiadm -m session \
| while read proto num ip iqn
  do
    iscsiadm -m node -p ${ip} -T $iqn --logout
  done

# cat iscsi-logout-and-login
#!/bin/bash
target_ip=192.168.67.10
./iscsi-logout "${target_ip}"
i=0
iscsiadm -m discovery -t st -p ${target_ip} \
| grep '\.67\.' \
| cut -f2 -d' ' \
| sort -k 1.60 \
| while read iqn
  do
    if [ "$1" != "" -a "$i" -ge "${1:-0}" ]; then
      break
    fi
    iscsiadm -m node -p ${target_ip} -T ${iqn} --login
    iscsiadm --mode node -p ${target_ip} -T ${iqn} \
      --op update -n "node.session.timeo.replacement_timeout" -v 10
    iscsiadm --mode node -p ${target_ip} -T ${iqn} \
      --op update -n "node.conn[0].timeo.noop_out_interval" -v 0
    iscsiadm --mode node -p ${target_ip} -T ${iqn} \
      --op update -n "node.conn[0].timeo.noop_out_timeout" -v 0
    i=$((i+1))
  done

# cat iscsi-stress-test
#!/bin/bash
while true
do
  date
  for d in /dev/disk/by-path/ip-*:3260-iscsi-iqn.1986-03.com.sun*
  do
    dd if=/dev/zero of=$d bs=100K &
  done
  wait
done
Created attachment 16445 [details] Kernel config (.config)
Created attachment 16446 [details] /var/log/kern.log
The same test runs fine over Ethernet, so this is probably an issue in either the ib_ipoib or the ib_mthca kernel module.
Can you also specify which HCA hardware you are using?
Another question: is this IPoIB datagram mode or connected mode? There is one GFP_ATOMIC allocation in the connected-mode RX refill path; other than that I don't see anything on the IB side that is likely to hit.
Does the system actually recover? You say "the console locks up". Does it stay locked up, or is this a transient thing?
(In reply to comment #4)
> Can you specify also which HCA HW you are using?

This is what lspci told me about the HCA hardware:

$ lspci|grep Inf
06:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev a0)
(In reply to comment #5)
> Another question: is this IPoIB datagram mode or connected mode?

iSCSI was set up such that it uses TCP/IPv4 -- does this answer your question?
(In reply to comment #6)
> Does the system actually recover? You say "the console locks up". Does it
> stay locked up, or is this a transient thing?

What I see is the following:
- Any keys typed in at the console are no longer echoed.
- If the console screensaver is active, pressing a key at the console no longer unblanks the console.
- Anything I type in an open ssh session is no longer echoed.
- The system no longer responds to ICMP requests sent to the InfiniBand interface.
- The system still responds to most but not all ICMP requests sent to the Ethernet interface:

# ping 192.168.64.8 -c 10
PING 192.168.64.8 (192.168.64.8) 56(84) bytes of data.
64 bytes from 192.168.64.8: icmp_seq=1 ttl=64 time=0.137 ms
64 bytes from 192.168.64.8: icmp_seq=2 ttl=64 time=0.094 ms
64 bytes from 192.168.64.8: icmp_seq=5 ttl=64 time=0.089 ms
64 bytes from 192.168.64.8: icmp_seq=6 ttl=64 time=0.150 ms
64 bytes from 192.168.64.8: icmp_seq=8 ttl=64 time=0.139 ms
64 bytes from 192.168.64.8: icmp_seq=9 ttl=64 time=0.068 ms
64 bytes from 192.168.64.8: icmp_seq=10 ttl=64 time=0.118 ms

--- 192.168.64.8 ping statistics ---
10 packets transmitted, 7 received, 30% packet loss, time 8999ms
rtt min/avg/max/mdev = 0.068/0.113/0.150/0.030 ms
re comment #8: no, the question is what /sys/class/net/ib0/mode shows.
(In reply to comment #10)
> re comment #8:
>
> no, the question is what /sys/class/net/ib0/mode shows.

$ cat /sys/class/net/ib0/mode
datagram
Has anyone done anything further about this? One thing to do would be to take snapshots of /proc/slabinfo and /proc/meminfo while the test is running to try and see where memory is going. And if I recall correctly, there is something that can be done with slub_debug to track alloc callers to see who did allocations, in case one slab is tying up lots of memory.
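For reference, the slub_debug approach mentioned above works like this: boot the kernel with slub_debug=U so SLUB records the call site of each allocation, then read /sys/kernel/slab/<cache>/alloc_calls to see which callers hold the most outstanding objects. The sketch below demonstrates sorting such output by allocation count; the alloc_calls sample is inline and purely illustrative (made-up call sites), since the sysfs files only exist on a kernel booted with slub_debug:

```shell
#!/bin/bash
# Boot with "slub_debug=U" (user/caller tracking); SLUB then exposes one
# "count call_site age=... pid=..." entry per allocation site in
# /sys/kernel/slab/<cache>/alloc_calls.
#
# Print the call sites with the most outstanding allocations, highest first.
# Demonstrated on an inline, illustrative sample; on a real system feed it
# e.g. /sys/kernel/slab/kmalloc-512/alloc_calls instead of the heredoc.
sort -rn <<'EOF' | head -n 2 | awk '{print $1, $2}'
120 __alloc_skb+0x31/0x150 age=10/50/90 pid=2058
3 scsi_pool_alloc_command+0x2a/0x70 age=5/5/5 pid=100
4500 iscsi_pool_init+0x50/0x100 age=1/2/3 pid=42
EOF
```

On a live system the same pipeline would be run against a real cache, e.g. `sort -rn /sys/kernel/slab/kmalloc-512/alloc_calls | head`.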
Created attachment 16546 [details]
Contents of /proc/meminfo, sampled once every 10 seconds.

Appended the output of the following script, running on a vanilla 2.6.25.7 kernel:

#!/bin/bash

# Header
echo -n "Time "
cat /proc/meminfo | \
while read tag value unit
do
  echo -n "${tag%:} "
done
echo ""

# Actual log
while true
do
  echo -n "$(date +%s) "
  cat /proc/meminfo | \
  while read tag value unit
  do
    echo -n "${value} "
  done
  echo ""
  sleep 10
done
(In reply to comment #12)
> One thing to do would be to take snapshots of /proc/slabinfo and
> /proc/meminfo while the test is running to try and see where memory is
> going.

If I interpret the contents of /proc/meminfo correctly, there is no SLAB/SLUB leak: the value of SUnreclaim remains constant during the test run.
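To make that check concrete, here is a minimal sketch of comparing the SUnreclaim value between two /proc/meminfo snapshots. The two sample lines are hard-coded and the values are illustrative; on the live system they would come from `grep SUnreclaim /proc/meminfo` captured before and after the test run:

```shell
#!/bin/bash
# Compare the SUnreclaim value (second field, in kB) of two
# /proc/meminfo-style snapshots. The sample values below are illustrative;
# capture real ones with: grep SUnreclaim /proc/meminfo
sample_before="SUnreclaim:       13152 kB"
sample_after="SUnreclaim:       13152 kB"
before=$(echo "$sample_before" | awk '{print $2}')
after=$(echo "$sample_after" | awk '{print $2}')
if [ "$before" -eq "$after" ]; then
  echo "SUnreclaim stable at ${before} kB -- no obvious SLAB/SLUB leak"
else
  echo "SUnreclaim went from ${before} kB to ${after} kB"
fi
```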
FWIW, I _was_ seeing this too, but on plain GbE using two iSCSI devices in an md RAID-1 on a plain x86_64 PC initiator. The system did not lock up, however. It appears the problem was my tweaked TCP window/buffer size settings in /etc/sysctl.conf. After removing my tweaks (leaving the defaults), I no longer get this problem.
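For context, the kind of /etc/sysctl.conf tuning being described typically raises the TCP window/buffer limits. The key names below are the standard Linux ones, but the values are purely illustrative, not recommendations; removing such lines (and rebooting or running `sysctl -p`) restores the kernel defaults:

```
# /etc/sysctl.conf -- example TCP window/buffer tweaks (illustrative values)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
```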
Hi, I'm experiencing memory problems as well.

Situation:
- I've set up a machine as a FreeNAS 8.0.3 server with iSCSI.
- I've booted an iMac with CloneZilla 1.2.10 (Linux kernel 3.1.0).
- iSCSI is set up according to:
  http://ninjix.blogspot.com/2010/05/use-clonezilla-for-physical-disk-to.html
- I'm copying a partition using:
  partclone.hfsp -o /mnt/sdb/shalala.sda2.partclone -c -s /dev/sda2

The throughput is fantastic: around 40-60 MB/s on a direct computer-to-computer 1 Gb Ethernet link, which has a maximum of about 100 MB/s (6 GB/min).

I'm getting kernel messages like the following on the console. There are no obvious effects on the system, but I don't know whether the data will be damaged.

[ 2688.898855] kworker/u:2: page allocation failure: order:1, mode:0x20
[ 2688.900009] Pid: 2058, comm: kworker/u:2 Tainted: G C 3.1.0-1-486 #1
[ 2688.900009] Call Trace:
[ 2688.900009] [<c1077923>] ? warn_alloc_failed+0xbb/0xc9
[ 2688.900009] [<c10796a4>] ? __alloc_pages_nodemask+0x573/0x5da
[ 2688.900009] [<c1099543>] ? ____cache_alloc+0x20b/0x39d
[ 2688.900009] [<c10997df>] ? __kmalloc+0x63/0xab
[ 2688.900009] [<c11d80a6>] ? pskb_expand_head+0xbc/0x1fa
[ 2688.900009] [<c11d80a6>] ? pskb_expand_head+0xbc/0x1fa
[ 2688.900009] [<c11d88b5>] ? __pskb_pull_tail+0x3d/0x1fe
[ 2688.900009] [<c11dfb19>] ? dev_hard_start_xmit+0x312/0x46e
[ 2688.900009] [<c11f0353>] ? sch_direct_xmit+0x32/0xc6
[ 2688.900009] [<c11dfe07>] ? dev_queue_xmit+0x192/0x299
[ 2688.900009] [<c11fef3e>] ? dst_output+0x9/0xa
[ 2688.900009] [<c120f988>] ? tcp_transmit_skb+0x628/0x650
[ 2688.900009] [<c1210350>] ? tcp_write_xmit+0x628/0x712
[ 2688.900009] [<c1210475>] ? __tcp_push_pending_frames+0x15/0x42
[ 2688.900009] [<c1206eee>] ? tcp_sendpage+0x352/0x37d
[ 2688.900009] [<c121de10>] ? inet_sendpage+0x56/0x7d
[ 2688.900009] [<c1206b9c>] ? tcp_sendmsg+0x62a/0x62a
[ 2688.900009] [<f8bd33be>] ? iscsi_sw_tcp_pdu_xmit+0x8d/0x1f2 [iscsi_tcp]
[ 2688.900009] [<f8cb0226>] ? iscsi_tcp_task_xmit+0x1b/0x1a2 [libiscsi_tcp]
[ 2688.900009] [<f8bbf980>] ? iscsi_xmit_task+0x2a/0x51 [libiscsi]
[ 2688.900009] [<f8bc11d1>] ? iscsi_xmitworker+0x192/0x1e7 [libiscsi]
[ 2688.900009] [<c12752f0>] ? common_interrupt+0x30/0x40
[ 2688.900009] [<c1032e97>] ? process_one_work+0xfb/0x1dc
[ 2688.900009] [<f8bc103f>] ? iscsi_prep_scsi_cmd_pdu+0x4dd/0x4dd [libiscsi]
[ 2688.900009] [<c10337a1>] ? worker_thread+0xa3/0x11a
[ 2688.900009] [<c10336fe>] ? manage_workers.isra.33+0x18a/0x18a
[ 2688.900009] [<c1035a4a>] ? kthread+0x62/0x67
[ 2688.900009] [<c10359e8>] ? flush_kthread_worker+0x73/0x73
[ 2688.900009] [<c1275306>] ? kernel_thread_helper+0x6/0xd
[ 2688.900009] Mem-Info:
[ 2688.900009] DMA per-cpu:
[ 2688.900009] CPU 0: hi: 0, btch: 1 usd: 0
[ 2688.900009] Normal per-cpu:
[ 2688.900009] CPU 0: hi: 186, btch: 31 usd: 58
[ 2688.900009] HighMem per-cpu:
[ 2688.900009] CPU 0: hi: 186, btch: 31 usd: 29
[ 2688.900009] active_anon:4209 inactive_anon:21976 isolated_anon:0
[ 2688.900009] active_file:37896 inactive_file:429908 isolated_file:0
[ 2688.900009] unevictable:809 dirty:2745 writeback:16769 unstable:0
[ 2688.900009] free:3416 slab_reclaimable:9354 slab_unreclaimable:3288
[ 2688.900009] mapped:1132 shmem:22758 pagetables:146 bounce:0
[ 2688.900009] DMA free:3492kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:20kB inactive_file:10780kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:1184kB slab_unreclaimable:420kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 2688.900009] lowmem_reserve[]: 0 865 2000 2000
[ 2688.900009] Normal free:4660kB min:3728kB low:4660kB high:5592kB active_anon:0kB inactive_anon:0kB active_file:28408kB inactive_file:785980kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:885944kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:36232kB slab_unreclaimable:12732kB kernel_stack:688kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 2688.900009] lowmem_reserve[]: 0 0 9086 9086
[ 2688.900009] HighMem free:5512kB min:512kB low:1732kB high:2956kB active_anon:16836kB inactive_anon:87904kB active_file:123156kB inactive_file:922872kB unevictable:3236kB isolated(anon):0kB isolated(file):0kB present:1163044kB mlocked:3236kB dirty:10980kB writeback:67076kB mapped:4524kB shmem:91032kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:584kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 2688.900009] lowmem_reserve[]: 0 0 0 0
[ 2688.900009] DMA: 13*4kB 4*8kB 5*16kB 8*32kB 8*64kB 4*128kB 4*256kB 2*512kB 0*1024kB 0*2048kB 0*4096kB = 3492kB
[ 2688.900009] Normal: 1115*4kB 1*8kB 0*16kB 0*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4660kB
[ 2688.900009] HighMem: 2*4kB 2*8kB 3*16kB 2*32kB 2*64kB 3*128kB 1*256kB 1*512kB 2*1024kB 1*2048kB 0*4096kB = 5512kB
[ 2688.900009] 491060 total pagecache pages
[ 2688.900009] 0 pages in swap cache
[ 2688.900009] Swap cache stats: add 0, delete 0, find 0/0
[ 2688.900009] Free swap = 0kB
[ 2688.900009] Total swap = 0kB
[ 2688.900009] 520361 pages RAM
[ 2688.900009] 293051 pages HighMem
[ 2688.900009] 5480 pages reserved
[ 2688.900009] 463661 pages shared
[ 2688.900009] 53096 pages non-shared
This bug relates to a very old kernel. Closing as obsolete. If you see it with a recent kernel, please open a new bug.