Bug 83391 - Oops on sd_mod
Summary: Oops on sd_mod
Status: RESOLVED INVALID
Alias: None
Product: SCSI Drivers
Classification: Unclassified
Component: Other
Hardware: i386 Linux
Importance: P1 normal
Assignee: scsi_drivers-other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-08-28 02:32 UTC by tomsun
Modified: 2014-08-29 01:19 UTC

See Also:
Kernel Version: 2.6.32
Subsystem:
Regression: No
Bisected commit-id:


Attachments
the function disassemble info (136.26 KB, text/plain)
2014-08-28 03:19 UTC, tomsun

Description tomsun 2014-08-28 02:32:01 UTC
I have hit this oops in sd_mod several times, but I do not know what conditions trigger it. The details are below.
 
BUG: unable to handle kernel paging request at ffff88001488c004
IP: [<ffffffffa019d01c>] sd_revalidate_disk+0x107c/0x1900 [sd_mod]
PGD 1a05067 PUD 1a09067 PMD 176067 PTE 0
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/scsi_host/host0/scan

Pid: 31, comm: ata_aux Tainted: G         C ----------------   2.6.32-220.el6.x86_64 #1 LENOVO QiTianM8250/LENOVO
RIP: e030:[<ffffffffa019d01c>]  [<ffffffffa019d01c>] sd_revalidate_disk+0x107c/0x1900 [sd_mod]
RSP: e02b:ffff88003eacbc60  EFLAGS: 00010246
RAX: ffff880039948000 RBX: ffff880039971000 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003819b420
RBP: ffff88003eacbd40 R08: 0000000000000018 R09: 00000000ffffffff
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88001488bfe0
R13: 0000000000000200 R14: ffff88003fc00040 R15: 0000000000000200
FS:  00007f92c694c700(0000) GS:ffff880004ff7000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff88001488c004 CR3: 00000000379b9000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ata_aux (pid: 31, threadinfo ffff88003eaca000, task ffff88003eac6b00)
Stack:
 ffff880000007530 ffffffff00000005 0000000000000000 ffffffff8139c890
<0> ffff88003eac71a0 0000000000000000 ffff88003eacbcf2 0000000000000000
<0> ffff88003a21aa00 000000002542eab0 ffff880039957800 ffff880039948000
Call Trace:
 [<ffffffff8139c890>] ? ata_scsi_dev_rescan+0x0/0x110
 [<ffffffff8126ffda>] ? kobject_get+0x1a/0x30
 [<ffffffff81260000>] ? blkiocg_file_read_map+0xe0/0x100
 [<ffffffff811b1428>] revalidate_disk+0x38/0x90
 [<ffffffffa01994b7>] sd_rescan+0x27/0x40 [sd_mod]
 [<ffffffff8138b07d>] scsi_rescan_device+0x8d/0xe0
 [<ffffffff813697c9>] ? get_device+0x19/0x20
 [<ffffffff8139c94a>] ata_scsi_dev_rescan+0xba/0x110
 [<ffffffff8139c890>] ? ata_scsi_dev_rescan+0x0/0x110
 [<ffffffff8108c630>] worker_thread+0x170/0x2a0
 [<ffffffff81091fc0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff814daf2c>] ? _spin_unlock_irqrestore+0x1c/0x20
 [<ffffffff8108c4c0>] ? worker_thread+0x0/0x2a0
 [<ffffffff81091c56>] kthread+0x96/0xa0
 [<ffffffff8100d14a>] child_rip+0xa/0x20
 [<ffffffff8100c314>] ? int_ret_from_sys_call+0x7/0x1b
 [<ffffffff8100ca9d>] ? retint_restore_args+0x5/0x6
 [<ffffffff8100d140>] ? child_rip+0x0/0x20

crash> l* sd_revalidate_disk+0x107c
0xffffffffa019d01c is in sd_revalidate_disk (/usr/src/debug/kernel-2.6.32/linux-2.6.32/arch/x86/include/asm/swab.h:53).
48                  : "=r" (v.s.a), "=r" (v.s.b)
49                  : "0" (v.s.a), "1" (v.s.b));
50      # endif
51              return v.u;
52      #else /* __i386__ */
53              asm("bswapq %0"
54                  : "=r" (val)
55                  : "0" (val));
56              return val;
57      #endif
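The register dump above allows a quick address check. The following is a minimal sketch, not part of the original report: the helper names are hypothetical, and the idea that R12 holds the base of the buffer being read is an assumption that the dump alone does not confirm. It only shows that the faulting address (CR2) lies 0x24 (36) bytes above R12, that R12 sits exactly 32 bytes before the end of its page, and that CR2 therefore falls 4 bytes into the next page, whose PTE is 0 in the oops.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL	/* x86_64 base page size */

/* Byte distance from a base pointer to the faulting address. */
uint64_t fault_offset(uint64_t fault_addr, uint64_t base)
{
	return fault_addr - base;
}

/* Bytes left between addr and the end of the page that contains it. */
uint64_t bytes_to_page_end(uint64_t addr)
{
	return PAGE_SIZE - (addr % PAGE_SIZE);
}

/* Values copied from the register dump in the oops above. */
void check_oops_values(void)
{
	const uint64_t cr2 = 0xffff88001488c004ULL;	/* faulting address */
	const uint64_t r12 = 0xffff88001488bfe0ULL;	/* assumed buffer base */

	assert(fault_offset(cr2, r12) == 0x24);	/* 36 bytes past R12 */
	assert(bytes_to_page_end(r12) == 32);	/* R12 is 32 bytes before page end */
	assert(cr2 % PAGE_SIZE == 4);		/* fault lands 4 bytes into next page */
}
```

If the assumption about R12 holds, this pattern is what a read at offset 36 from a 32-byte object placed at the very end of a page would look like. The bswapq at swab.h:53 is the 64-bit byte swap, so the faulting load would be an 8-byte big-endian read.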

Could you give me some suggestions about this oops? Thank you very much!
Comment 1 tomsun 2014-08-28 03:19:20 UTC
Created attachment 148621
the function disassemble info
Comment 2 tomsun 2014-08-28 07:55:20 UTC
static void sd_read_block_limits(struct scsi_disk *sdkp)
{
	unsigned int sector_sz = sdkp->device->sector_size;
	const int vpd_len = 32;
	unsigned char *buffer = kmalloc(vpd_len, GFP_KERNEL);

	if (!buffer ||
	    /* Block Limits VPD */
	    scsi_get_vpd_page(sdkp->device, 0xb0, buffer, vpd_len))
		goto out;

	blk_queue_io_min(sdkp->disk->queue,
			 get_unaligned_be16(&buffer[6]) * sector_sz);
	blk_queue_io_opt(sdkp->disk->queue,
			 get_unaligned_be32(&buffer[12]) * sector_sz);

	if (buffer[3] == 0x3c) {
		unsigned int lba_count, desc_count;

		sdkp->max_ws_blocks =
			(u32) min_not_zero(get_unaligned_be64(&buffer[36]),
					   (u64)0xffffffff);

		if (!sdkp->lbpme)
			goto out;

		lba_count = get_unaligned_be32(&buffer[20]);
		desc_count = get_unaligned_be32(&buffer[24]);

		if (lba_count && desc_count)
			sdkp->max_unmap_blocks = lba_count;

		sdkp->unmap_granularity = get_unaligned_be32(&buffer[28]);

		if (buffer[32] & 0x80)
			sdkp->unmap_alignment =
				get_unaligned_be32(&buffer[32]) & ~(1 << 31);

		if (!sdkp->lbpvpd) { /* LBP VPD page not provided */

			if (sdkp->max_unmap_blocks)
				sd_config_discard(sdkp, SD_LBP_UNMAP);
			else
				sd_config_discard(sdkp, SD_LBP_WS16);

		} else {	/* LBP VPD page tells us what to use */

			if (sdkp->lbpu && sdkp->max_unmap_blocks)
				sd_config_discard(sdkp, SD_LBP_UNMAP);
			else if (sdkp->lbpws)
				sd_config_discard(sdkp, SD_LBP_WS16);
			else if (sdkp->lbpws10)
				sd_config_discard(sdkp, SD_LBP_WS10);
			else
				sd_config_discard(sdkp, SD_LBP_DISABLE);
		}
	}

 out:
	kfree(buffer);
}

First, buffer is allocated with only 32 bytes (vpd_len), but it is then accessed as if it were 64 bytes long, for example:
	sdkp->max_ws_blocks =
			(u32) min_not_zero(get_unaligned_be64(&buffer[36]),
					   (u64)0xffffffff);
I don't know why it is written this way. Is this the bug that causes the oops?
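The size mismatch described above can be checked mechanically. The sketch below is only an illustration under stated assumptions: the table of offsets and widths is transcribed by hand from the sd_read_block_limits() listing in this comment, and the struct and function names (vpd_read, required_len, count_out_of_bounds) are hypothetical, not kernel APIs. It computes how many bytes the function needs and how many of its reads fall outside a buffer of a given length.

```c
#include <assert.h>
#include <stddef.h>

/* One read of the VPD buffer: starting offset and width in bytes. */
struct vpd_read {
	size_t offset;
	size_t width;
};

/* Reads performed by sd_read_block_limits() as quoted above. */
static const struct vpd_read reads[] = {
	{  3, 1 },	/* buffer[3] == 0x3c            */
	{  6, 2 },	/* get_unaligned_be16(&buffer[6])  */
	{ 12, 4 },	/* get_unaligned_be32(&buffer[12]) */
	{ 20, 4 },	/* get_unaligned_be32(&buffer[20]) */
	{ 24, 4 },	/* get_unaligned_be32(&buffer[24]) */
	{ 28, 4 },	/* get_unaligned_be32(&buffer[28]) */
	{ 32, 1 },	/* buffer[32] & 0x80             */
	{ 32, 4 },	/* get_unaligned_be32(&buffer[32]) */
	{ 36, 8 },	/* get_unaligned_be64(&buffer[36]) */
};

#define N_READS (sizeof(reads) / sizeof(reads[0]))

/* Smallest buffer length that covers every read in the table. */
size_t required_len(void)
{
	size_t max_end = 0;

	for (size_t i = 0; i < N_READS; i++) {
		size_t end = reads[i].offset + reads[i].width;

		if (end > max_end)
			max_end = end;
	}
	return max_end;
}

/* Number of reads that extend past a buffer of buf_len bytes. */
size_t count_out_of_bounds(size_t buf_len)
{
	size_t n = 0;

	for (size_t i = 0; i < N_READS; i++)
		if (reads[i].offset + reads[i].width > buf_len)
			n++;
	return n;
}

/* Sanity check for the quoted code with vpd_len = 32. */
void self_check(void)
{
	assert(required_len() == 44);
	assert(count_out_of_bounds(32) == 3);
}
```

With vpd_len = 32, the three reads at offsets 32 and 36 fall outside the allocation, and a 64-byte buffer would cover all of them. The check buffer[3] == 0x3c tests for a page length of 60 bytes, which together with the 4-byte header makes a 64-byte page, so the code inside that branch appears to expect 64 bytes of data.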

Thank you very much.
Comment 3 Jeff Moyer 2014-08-28 13:52:29 UTC
This is a vendor kernel.  You should file a bug report with Red Hat here:
  https://bugzilla.redhat.com/enter_bug.cgi?product=Red%20Hat%20Enterprise%20Linux%206

First, though, you should update to a newer kernel... that one is several years old.
Comment 4 tomsun 2014-08-29 01:18:47 UTC
Sorry, and thank you very much!


(In reply to Jeff Moyer from comment #3)
> This is a vendor kernel.  You should file a bug report with Red Hat here:
>   https://bugzilla.redhat.com/enter_bug.cgi?product=Red%20Hat%20Enterprise%20Linux%206
>
> First, though, you should update to a newer kernel... that one is several years old.
