Bug 9934

Summary: InfiniBand: ib_write_bw triggers kernel bug in ib_mthca
Product: Drivers
Component: Infiniband/RDMA
Status: RESOLVED CODE_FIX
Severity: high
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.24
Subsystem:
Regression: ---
Bisected commit-id:
Reporter: Bart Van Assche (bvanassche)
Assignee: Roland Dreier (roland)
CC: protasnb, roland
Attachments: Kernel config (.config); add missing sg_init_table() call

Description Bart Van Assche 2008-02-11 04:54:35 UTC
Latest working kernel version: (not known)
Earliest failing kernel version: 2.6.24
Distribution: Ubuntu 7.10 server
Hardware Environment: Intel S5000PAL motherboard + Mellanox MT25204 InfiniBand HCA
Software Environment: Ubuntu 7.10 server + OFED 1.2.5.5 user space components.
Problem Description: ib_write_bw triggers kernel bug in ib_mthca

Steps to reproduce:

1) Compile the 2.6.24.2 kernel with kernel debugging enabled.
2) Download, compile and install the OFED 1.2.5.5 userspace components.
3) Run the following commands:
modprobe ib_uverbs
modprobe rdma_ucm
/bin/mkdir -p /dev/infiniband
if [ ! -e /dev/infiniband/uverbs0 ]; then
  /bin/mknod /dev/infiniband/uverbs0 c $(cat /sys/class/infiniband_verbs/uverbs0/dev | sed 's/:/ /g')
fi
if [ ! -e /dev/infiniband/rdma_cm ]; then
  /bin/mknod /dev/infiniband/rdma_cm c $(cat /sys/class/misc/rdma_cm/dev | sed 's/:/ /g')
fi
ib_write_bw

Result:

------------[ cut here ]------------
kernel BUG at include/linux/scatterlist.h:59!
invalid opcode: 0000 [1] SMP
CPU 1
Modules linked in: ib_iser iscsi_tcp libiscsi scsi_transport_iscsi rdma_ucm rdma_cm iw_cm ib_addr ib_uverbs ib_ipoib ib_cm ib_sa ipv6 parport_pc lp parport loop af_packet ib_mthca ib_mad ib_core pcspkr psmouse serio_raw iTCO_wdt iTCO_vendor_support shpchp pci_hotplug evdev ext3 jbd mbcache sg sd_mod sr_mod cdrom ata_piix ata_generic libata scsi_mod ehci_hcd uhci_hcd usbcore e1000 fuse
Pid: 4343, comm: ib_write_bw Not tainted 2.6.24 #3
RIP: 0010:[<ffffffff881a0191>]  [<ffffffff881a0191>] :ib_mthca:mthca_map_user_db+0x2c1/0x2f0
RSP: 0018:ffff81007b76dd58  EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff81007b7e9e58 RCX: ffff81007b7e9e48
RDX: ffff810003de3100 RSI: 0000000087654321 RDI: ffff810003e5d270
RBP: ffff81007c8c0000 R08: 800000007a5d8067 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
R13: 000000000060c000 R14: ffff81007b7e9000 R15: ffff81007dcc3980
FS:  00002b21dcc8bc60(0000) GS:ffff81007f40d580(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000060e000 CR3: 000000007b751000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ib_write_bw (pid: 4343, threadinfo ffff81007b76c000, task ffff81007b7f0000)
Stack:  ffff81007b76dd98 0000000000000000 ffffffff80332e4d 0000000000000002
 ffff81007b7e9e48 ffff81007b7e9e78 ffff81007b7e9dc8 0000003f00000046
 ffff810003de3100 0000000000000046 ffff81007d3ec750 000000000000007f
Call Trace:
 [<ffffffff80332e4d>] __down_write_trylock+0x1d/0x60
 [<ffffffff8819e2b0>] :ib_mthca:mthca_create_cq+0xb0/0x200
 [<ffffffff8825a334>] :ib_uverbs:ib_uverbs_create_cq+0x1b4/0x2d0
 [<ffffffff8825639e>] :ib_uverbs:ib_uverbs_write+0xbe/0xe0
 [<ffffffff8029d27d>] vfs_write+0xdd/0x190
 [<ffffffff8029d983>] sys_write+0x53/0x90
 [<ffffffff8020c2ee>] system_call+0x7e/0x83


Code: 0f 0b eb fe 0f 0b eb fe 48 8b 74 24 30 4c 89 ae 88 00 00 00
RIP  [<ffffffff881a0191>] :ib_mthca:mthca_map_user_db+0x2c1/0x2f0
 RSP <ffff81007b76dd58>
---[ end trace 28761e105a0be450 ]---
Comment 1 Bart Van Assche 2008-02-11 04:55:58 UTC
Created attachment 14778
Kernel config (.config)
Comment 2 Roland Dreier 2008-02-11 16:01:39 UTC
Looks like the various sg-list conversions left out some needed calls to sg_init_table().  I'll fix this up.
Comment 3 Roland Dreier 2008-02-11 16:09:52 UTC
Created attachment 14782
add missing sg_init_table() call

Please try this patch and let me know if this fixes things.
Comment 4 Bart Van Assche 2008-02-11 23:39:22 UTC
The attached patch solves this issue. Thanks for the quick fix!
Comment 5 Natalie Protasevich 2008-05-20 09:35:01 UTC
Roland, I can't find this commit in either Linus's tree or the IB tree. Is it planned to be submitted?
Comment 6 Roland Dreier 2008-05-20 13:22:40 UTC
Sorry, the upstream fix is in a slightly different spot.  It is fe174357 ("IB/mthca: Add missing sg_init_table() in mthca_map_user_db()"), which went into Linus's tree just after 2.6.25-rc1.
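
For readers unfamiliar with the scatterlist debug check, here is a minimal userspace sketch of why a missing sg_init_table() call can trip a BUG_ON(). It assumes the kernel was built with CONFIG_DEBUG_SG (the report only says "kernel debugging enabled"). The names sg_model, sg_model_init_table and sg_model_set_page are hypothetical stand-ins, not kernel APIs. Only the magic value 0x87654321 mirrors the kernel's SG_MAGIC, which also appears in RSI in the oops above. The real sg_init_table() additionally marks the last entry as the end of the list, which this model omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Same value as the kernel's SG_MAGIC (seen in RSI in the oops). */
#define SG_MAGIC_MODEL 0x87654321UL

/* Hypothetical, simplified stand-in for struct scatterlist. */
struct sg_model {
    unsigned long magic;      /* only present in debug builds in the kernel */
    unsigned long page_link;
    unsigned int offset;
    unsigned int length;
};

/* Models sg_init_table(): zero every entry, then stamp the magic. */
static void sg_model_init_table(struct sg_model *sgl, unsigned int nents)
{
    memset(sgl, 0, sizeof(*sgl) * nents);
    for (unsigned int i = 0; i < nents; i++)
        sgl[i].magic = SG_MAGIC_MODEL;
}

/*
 * Models sg_set_page(): returns false where the kernel's debug
 * check would hit BUG_ON(), true once the entry has been filled in.
 */
static bool sg_model_set_page(struct sg_model *sg, unsigned long page,
                              unsigned int len, unsigned int offset)
{
    if (sg->magic != SG_MAGIC_MODEL)
        return false;
    sg->page_link = page;
    sg->length = len;
    sg->offset = offset;
    return true;
}
```

Calling sg_model_set_page() on an entry that still holds stale memory returns false, which corresponds to the BUG in the trace. After sg_model_init_table() the same call succeeds. The actual fix, per comment 6, adds the real sg_init_table() call in mthca_map_user_db().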