Bug 48241

Summary: oops when setting up LVM
Product: IO/Storage
Component: SCSI
Status: RESOLVED CODE_FIX
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.6.0-next-20121003
Subsystem:
Regression: No
Bisected commit-id:
Reporter: Daniel Santos (daniel.santos)
Assignee: linux-scsi (linux-scsi)
CC: alan
Attachments:
  image of oops
  output of lshw
  .config
  oops of next-20121003 compiled with gcc 4.6.3
  Working 3.6.0-vanilla .config
  config-3.6.0-next-20121003 (second)

Description Daniel Santos 2012-10-03 15:04:47 UTC
Created attachment 81921
image of oops

See attached image.
Comment 1 Daniel Santos 2012-10-03 15:05:37 UTC
Created attachment 81931
output of lshw
Comment 2 Daniel Santos 2012-10-03 15:06:49 UTC
Created attachment 81941
.config
Comment 3 Daniel Santos 2012-10-03 15:13:29 UTC
Oh, I forgot my compiler:

$ gcc -v
Using built-in specs.
COLLECT_GCC=/usr/x86_64-pc-linux-gnu/gcc-bin/4.7.1/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /tmp/portage/sys-devel/gcc-4.7.1/work/gcc-4.7.1/configure --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.7.1 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.1/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.7.1 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.7.1/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.7.1/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.1/include/g++-v4 --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-altivec --disable-fixed-point --with-ppl --with-cloog --disable-ppl-version-check --with-cloog-include=/usr/include/cloog-ppl --enable-lto --enable-nls --without-included-gettext --with-system-zlib --enable-obsolete --disable-werror --enable-secureplt --enable-multilib --with-multilib-list=m32,m64 --enable-libmudflap --disable-libssp --enable-libgomp --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/4.7.1/python --enable-checking=release --enable-java-awt=gtk --enable-objc-gc --enable-languages=c,c++,java,objc,obj-c++,fortran --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-targets=all --with-bugurl=http://bugs.gentoo.org/ --with-pkgversion='Gentoo 4.7.1'
Thread model: posix
gcc version 4.7.1 (Gentoo 4.7.1)
Comment 4 Anonymous Emailer 2012-10-03 15:23:10 UTC
Reply-To: James.Bottomley@HansenPartnership.com

On Wed, 2012-10-03 at 15:04 +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=48241
> 
>            Summary: oops when setting up LVM
>            Product: IO/Storage
>            Version: 2.5
>     Kernel Version: 3.6.0-next-20121003
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: SCSI
>         AssignedTo: linux-scsi@vger.kernel.org
>         ReportedBy: daniel.santos@pobox.com
>         Regression: No
> 
> 
> Created an attachment (id=81921)
>  --> (https://bugzilla.kernel.org/attachment.cgi?id=81921)
> image of oops

The image says the RIP is at kthread_data + 0xb

That implies something went wrong within the workqueue or kthread
systems, I've cc'd linux-kernel, but it's a bit of a vague thing to go
on and could conceivably be a hardware issue (or some weird thread
interaction in linux-next).

The first question would be "does it happen in vanilla 3.6"?

James
Comment 5 Daniel Santos 2012-10-03 15:36:53 UTC
> The image says the RIP is at kthread_data + 0xb
> 
> That implies something went wrong within the workqueue or kthread
> systems, I've cc'd linux-kernel, but it's a bit of a vague thing to go
> on and could conceivably be a hardware issue (or some weird thread
> interaction in linux-next).

OK, thanks. I was kinda guessing it was from scsi_setup_fs_cmnd when I chose a component; I don't really know that part of the kernel.

> The first question would be "does it happen in vanilla 3.6"?

I haven't tried 3.6 yet, but I did just rebuild using gcc 4.6.3 to make sure it wasn't compiler-related. After that, I will try 3.6 built with gcc 4.7.1. Also, I hit this same oops in -mmotm before trying linux-next.
Comment 6 Daniel Santos 2012-10-03 16:16:48 UTC
Created attachment 81951
oops of next-20121003 compiled with gcc 4.6.3

So yeah, a very similar oops when built with gcc 4.6.3 (it looks like all that changed were addresses and such). I'm running 3.6.0 right now.

Ugh, but now that I think about it, I didn't use the same config that I was running earlier, since oldconfig was asking me tons of questions that were already answered in a previous config (options removed or moved, possibly?). To make this complete, I should take my current 3.6.0 config, rebase to next-20121003, and build that.
Comment 7 Daniel Santos 2012-10-03 16:21:22 UTC
Created attachment 81961
Working 3.6.0-vanilla .config

This is the .config from 3.6.0-vanilla, which is working.
Comment 8 Daniel Santos 2012-10-03 16:45:43 UTC
Created attachment 81971
config-3.6.0-next-20121003 (second)

OK, so I took my .config from 3.6.0-vanilla, ran oldconfig under 3.6.0-next-20121003 (accepting the default answer to every prompt), and I get the same oops.
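
For anyone trying to reproduce this, the config carry-over described above can be sketched roughly as below. This is only an illustration under assumptions, not the exact commands that were run: the function name and both paths are hypothetical, and piping `yes ""` into `make oldconfig` is just one common way to accept every default.

```shell
# Hypothetical helper: copy a known-good .config into another kernel tree
# and run "make oldconfig", accepting the default for every new option.
carry_config_forward() {
    src_config=$1   # e.g. the working 3.6.0-vanilla .config (placeholder path)
    tree=$2         # e.g. a checkout of next-20121003 (placeholder path)

    cp "$src_config" "$tree/.config" || return 1

    # An empty line answers each oldconfig prompt with its default value.
    ( cd "$tree" && yes "" | make oldconfig )
}
```

It would be called as something like `carry_config_forward ~/config-3.6.0-vanilla ~/src/linux-next` (both paths are placeholders), followed by the usual kernel build.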
Comment 9 Anonymous Emailer 2012-10-04 07:34:55 UTC
Reply-To: danielfsantos@att.net

On 10/03/2012 10:23 AM, James Bottomley wrote:
> On Wed, 2012-10-03 at 15:04 +0000, bugzilla-daemon@bugzilla.kernel.org
> wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=48241
>>
>>            Summary: oops when setting up LVM
>>            Product: IO/Storage
>>            Version: 2.5
>>     Kernel Version: 3.6.0-next-20121003
>>           Platform: All
>>         OS/Version: Linux
>>               Tree: Mainline
>>             Status: NEW
>>           Severity: normal
>>           Priority: P1
>>          Component: SCSI
>>         AssignedTo: linux-scsi@vger.kernel.org
>>         ReportedBy: daniel.santos@pobox.com
>>         Regression: No
>>
>>
>> Created an attachment (id=81921)
>>  --> (https://bugzilla.kernel.org/attachment.cgi?id=81921)
>> image of oops
>
> The image says the RIP is at kthread_data + 0xb
>
> That implies something went wrong within the workqueue or kthread
> systems, I've cc'd linux-kernel, but it's a bit of a vague thing to go
> on and could conceivably be a hardware issue (or some weird thread
> interaction in linux-next).
>
> The first question would be "does it happen in vanilla 3.6"?
>
> James

So, just to CC LKML: it works in vanilla 3.6.0 and happens in both -next and -mm.
I tried compiling with both gcc 4.6.3 and 4.7.1.

Daniel
Comment 10 Daniel Santos 2012-10-05 18:38:08 UTC
Update: the bug is not present in Linus's tree as of today (commit ecefbd94b834fa32559d854646d777c56749ef1c).