Bug 196343 - CDROM-related bug introduced in 4.11 merge window makes kernel freeze instead of doing read retries.
Summary: CDROM-related bug introduced in 4.11 merge window makes kernel freeze instead...
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: SCSI
Hardware: All Linux
Importance: P1 high
Assignee: io_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-07-12 15:28 UTC by Valentin QUEQUET
Modified: 2018-09-04 12:42 UTC

See Also:
Kernel Version: 4.11-rc1
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Keep SENSE buffer in bsg_command (1.67 KB, patch)
2017-07-19 15:19 UTC, Vitaly Mayatskikh

Description Valentin QUEQUET 2017-07-12 15:28:27 UTC
Hi,

I run many pristine and non-pristine (Ubuntu Zesty and PPA/mainline, -generic and -lowlatency) 64-bit kernels on x86_64 platforms, both on bare metal and under VirtualBox (Both on Linux and on Windows).

I just found a CDROM-related bug introduced in the 4.11 merge window which makes the kernel freeze (only the guest OS kernel when running under a VM) instead of doing read retries (my CD is a specially crafted AudioCD-like cleaning CD which does play music). That is, the bug does not show with Linux 4.10 and previous versions, but shows with Linux 4.11-rc1 and subsequent versions (still present in 4.12).

Please, tell me which mailing list on VGER is more appropriate to publish my yet-to-do bisect, thanks.

Here follows the trace :

[  222.725026] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff89e8675a
[  222.725026] 
[  222.759159] CPU: 1 PID: 2605 Comm: vlc Not tainted 4.11.0-041100rc1-generic #201703051731
[  222.841402] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  222.895875] Call Trace:
[  222.897597]  dump_stack+0x63/0x81
[  222.905671]  panic+0xe4/0x22d
[  222.907701]  ? mmc_ioctl_cdrom_read_data+0x2aa/0x2b0
[  222.912909]  ? mmc_ioctl_cdrom_read_data+0x1ac/0x2b0
[  223.014433]  __stack_chk_fail+0x19/0x20
[  223.025257]  mmc_ioctl_cdrom_read_data+0x2aa/0x2b0
[  223.079413]  cdrom_ioctl+0xc72/0xff0
[  223.141559]  ? futex_wake_op+0x465/0x620
[  223.245436]  sr_block_ioctl+0x7c/0xc0
[  223.290226]  blkdev_ioctl+0x8ce/0x970
[  223.300801]  ? do_futex+0x1ff/0x530
[  223.380541]  block_ioctl+0x3d/0x50
[  223.464710]  do_vfs_ioctl+0xa3/0x600
[  223.467146]  ? SyS_futex+0x85/0x180
[  223.475714]  SyS_ioctl+0x79/0x90
[  223.479852]  entry_SYSCALL_64_fastpath+0x1e/0xad
[  223.486224] RIP: 0033:0x7fea5f4d2987
[  223.488751] RSP: 002b:00007fea10a5b858 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[  223.502623] RAX: ffffffffffffffda RBX: 00007fea00170fd0 RCX: 00007fea5f4d2987
[  223.518425] RDX: 00007fea0000a5c0 RSI: 0000000000005314 RDI: 0000000000000024
[  223.529491] RBP: 000000000000b7c0 R08: 0000000000028a64 R09: 000000001b4e81b5
[  223.548943] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000014
[  223.561016] R13: 00007fea00006eb8 R14: 000000000000b7c0 R15: 00007fea0400bb58
[  223.681643] Kernel Offset: 0x8800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  223.843408] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff89e8675a
[  223.843408]
Comment 1 Valentin QUEQUET 2017-07-15 06:10:13 UTC
Hi,

I happened to spot the offending commit :

82ed4db499b8598f16f8871261bff088d6b0597f is the first bad commit
commit 82ed4db499b8598f16f8871261bff088d6b0597f
Author: Christoph Hellwig <hch@lst.de>
Date:   Fri Jan 27 09:46:29 2017 +0100

    block: split scsi_request out of struct request
    
    And require all drivers that want to support BLOCK_PC to allocate it
    as the first thing of their private data.  To support this the legacy
    IDE and BSG code is switched to set cmd_size on their queues to let
    the block layer allocate the additional space.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@fb.com>

:040000 040000 38c071b04784bea133302e5d6c622d5199c15b4c 52d4a7b56b333d7494ac86ba5b66bcb80bc77502 M      block
:040000 040000 57983d79520cf3f81b3849ae02ab7d8f8abe5e35 1fe1125254e7ec2acb778280228e42a8df6554f6 M      drivers
:040000 040000 5f0b15d9d0a0c44af6bb622df7523b9a8971611b 32e627e635598801530640f7206aa06739fb9c84 M      fs
:040000 040000 93cbcc6cf5551ee8c23ae412301dd589fc88f3a7 3c478246251cc8d05a2d754fd5d745335de7cd97 M      include

Of which the error output follows :

[ 2878.623974] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffb9a6716a
[ 2878.623974] 
[ 2878.652298] CPU: 1 PID: 2953 Comm: vlc Not tainted 4.10.0-rc5-vq-cdrom-11+ #1
[ 2878.653477] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 2878.654871] Call Trace:
[ 2878.655305]  dump_stack+0x63/0x83
[ 2878.655882]  panic+0xe4/0x22d
[ 2878.656396]  ? mmc_ioctl_cdrom_read_data+0x2aa/0x2b0
[ 2878.661440]  ? mmc_ioctl_cdrom_read_data+0x1ac/0x2b0
[ 2878.662269]  __stack_chk_fail+0x19/0x20
[ 2878.662906]  mmc_ioctl_cdrom_read_data+0x2aa/0x2b0
[ 2878.665865]  cdrom_ioctl+0xc72/0xff0
[ 2878.666494]  sr_block_ioctl+0x7c/0xc0
[ 2878.701274]  blkdev_ioctl+0x8d3/0x980
[ 2878.861491]  block_ioctl+0x3d/0x50
[ 2878.862072]  do_vfs_ioctl+0xa3/0x5f0
[ 2878.862668]  ? __fget+0x77/0xb0
[ 2878.881462]  SyS_ioctl+0x79/0x90
[ 2878.882065]  entry_SYSCALL_64_fastpath+0x1e/0xad
[ 2878.882852] RIP: 0033:0x7fad0a483987
[ 2878.883450] RSP: 002b:00007facb2919ac8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 2878.884733] RAX: ffffffffffffffda RBX: 00007fac9c002830 RCX: 00007fad0a483987
[ 2878.921267] RDX: 00007fac9c185790 RSI: 0000000000005314 RDI: 0000000000000023
[ 2878.952306] RBP: 00000000000b92f8 R08: 0000000000000000 R09: 0000000000000dc5
[ 2879.001269] R10: 00007fac9c185790 R11: 0000000000000202 R12: 00000000aaab9745
[ 2879.042370] R13: 00007fac9c011570 R14: 00007fac9c001820 R15: 00007facac0070d0
[ 2879.051866] Kernel Offset: 0x38400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 2879.072859] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffb9a6716a
[ 2879.072859] 

In hope this helps,

Sincerely,
Valentin
Comment 2 Vitaly Mayatskikh 2017-07-17 20:50:41 UTC
I see a similar problem with mpt3sas and sg. Perhaps every driver working with SCSI SENSE got broken.

   0xffffffffc02102bf:  mov    0x170(%rbx),%rdi
   0xffffffffc02102c6:  mov    -0x60(%rbp),%rsi
   0xffffffffc02102ca:  mov    $0x7,%ecx
   0xffffffffc02102cf:  rep movsl %ds:(%rsi),%es:(%rdi) <= RIP

    RIP: ffffffffc02102cf  RSP: ffffa43dc4c97b20  RFLAGS: 00010282
    RAX: 0000000000000038  RBX: ffff96ffd6c03600  RCX: 0000000000000007
    RDX: 0000000000000000  RSI: ffff96f9109b5c00  RDI: 0009000100d00000
    RBP: ffffa43dc4c97bc0   R8: 0000000000000038   R9: 0000000000000001
    R10: ffffa43dc4c97a10  R11: 0000000000000000  R12: ffff96ffd6c01400
    R13: ffff96f90f695298  R14: ffff96f90f694000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018


crash> p *(struct scsi_request *)0xffff96ffd6c03748
$2 = {
  __cmd = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000", 
  cmd = 0xffff96ffd6c03748 "", 
  cmd_len = 16, 
  sense_len = 0, 
  resid_len = 16, 
  sense = 0x9000100d00000
}

scsi_request’s sense pointer is not a valid kernel address. In fact, it is never initialized, so the junk could be a valid address - just not yours.
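
The failure mode described above can be modelled in userspace. The following is only a minimal sketch, not kernel code: `struct demo_request`, `demo_request_init` and `demo_copy_sense` are hypothetical names invented for illustration, and the struct is a simplified stand-in for `struct scsi_request`. Note that in the dump above RDI (the destination of `rep movsl`) holds the same value as the `sense` field, which suggests the copy was writing through that junk pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same size as SCSI_SENSE_BUFFERSIZE in the kernel headers. */
#define SENSE_BUFFERSIZE 96

/* Hypothetical, simplified stand-in for the kernel's struct scsi_request. */
struct demo_request {
    unsigned int sense_len;
    unsigned char *sense;   /* must point at storage owned by the submitter */
};

/* Zero the request and attach a sense buffer (buf may be NULL for "none"). */
static void demo_request_init(struct demo_request *rq, unsigned char *buf)
{
    assert(rq != NULL);
    memset(rq, 0, sizeof(*rq));
    rq->sense = buf;
}

/*
 * Copy sense data into the request's buffer.
 * Returns the number of bytes copied, or -1 if no buffer is attached.
 * A NULL check only helps when the request was zeroed first; a pointer
 * holding leftover junk (as in the dump above) cannot be detected here,
 * which is why the initialisation step matters.
 */
static int demo_copy_sense(struct demo_request *rq,
                           const unsigned char *src, size_t len)
{
    if (rq->sense == NULL)
        return -1;
    if (len > SENSE_BUFFERSIZE)
        len = SENSE_BUFFERSIZE;
    memcpy(rq->sense, src, len);
    rq->sense_len = (unsigned int)len;
    return (int)len;
}
```

A caller would run `demo_request_init()` with a buffer of at least `SENSE_BUFFERSIZE` bytes before any completion path calls `demo_copy_sense()`. Skipping that step is the userspace analogue of the uninitialized `sense` field shown in the crash output.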
Comment 3 Valentin QUEQUET 2017-07-18 05:34:10 UTC
Hello Vitaly,

Would you mind bisecting, Sir ?

Kind regards,
Valentin
Comment 4 Valentin QUEQUET 2017-07-18 05:35:12 UTC
(In reply to Vitaly Mayatskikh from comment #2)

Hello Vitaly,

Would you mind bisecting, Sir ?

Kind regards,
Valentin
Comment 5 Vitaly Mayatskikh 2017-07-18 11:42:09 UTC
Same commit 82ed4db499b8598f16f8871261bff088d6b0597f. It removed SENSE buffer allocation in block/bsg and assignment in various places.
Comment 6 Vitaly Mayatskikh 2017-07-19 15:19:16 UTC
Created attachment 257611 [details]
Keep SENSE buffer in bsg_command

This seems to help.
Comment 7 Valentin QUEQUET 2017-07-20 09:15:04 UTC
(In reply to Vitaly Mayatskikh from comment #6)
> Created attachment 257611 [details]
> Keep SENSE buffer in bsg_command
> 
> This seems to help.

Unfortunately Sir, here your patch didn't help :

I just applied your patch onto offending commit 82ed4db but got exactly the same trace !

In hope this helps.

Sincerely,
Valentin
Comment 8 Vitaly Mayatskikh 2017-07-20 13:47:52 UTC
Yes, it fixes bsg case only.
Comment 9 Matt Wang 2018-02-22 07:33:34 UTC
Hello Valentin,

From the stack we can see that an ioctl to the CD device leads to the kernel crash. A coredump file would be very helpful.

Regards,
Matt
Comment 10 Piotr Kosinski 2018-03-27 19:26:32 UTC
This bug always happens when a CD is ejected while CDROMREADRAW ioctl is called (e.g. VLC Media Player can trigger it when an audio CD is ejected in the middle of playback).

On kernel 4.4 the CDROMREADRAW ioctl fails in that case with the error ENOMEDIUM.
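
For anyone who wants to exercise this path without VLC, here is a minimal userspace sketch that issues a single CDROMREADRAW ioctl (request number 0x5314, which matches the RSI value in the traces above). The helper names `lba_to_msf` and `try_read_raw` are hypothetical and chosen for illustration; the constants and `struct cdrom_msf` come from `<linux/cdrom.h>`. On an affected kernel this may panic the machine, so it is best tried in a VM.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/cdrom.h>

/* Convert a logical block address to the start minute/second/frame fields. */
static void lba_to_msf(int lba, struct cdrom_msf *msf)
{
    int frames = lba + CD_MSF_OFFSET;   /* MSF addressing starts at 00:02:00 */

    assert(frames >= 0);
    msf->cdmsf_min0 = (unsigned char)(frames / (CD_SECS * CD_FRAMES));
    msf->cdmsf_sec0 = (unsigned char)((frames / CD_FRAMES) % CD_SECS);
    msf->cdmsf_frame0 = (unsigned char)(frames % CD_FRAMES);
}

/*
 * Read one raw frame from the given device.
 * Returns 0 on success or a negative errno value on failure.
 */
static int try_read_raw(const char *dev, int lba)
{
    /* The ioctl reads the MSF address from the buffer and then
     * overwrites the same buffer with the raw frame data. */
    union {
        struct cdrom_msf msf;
        unsigned char raw[CD_FRAMESIZE_RAW];
    } arg;
    int fd, ret = 0;

    fd = open(dev, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -errno;

    memset(&arg, 0, sizeof(arg));
    lba_to_msf(lba, &arg.msf);
    if (ioctl(fd, CDROMREADRAW, &arg) < 0)
        ret = -errno;

    close(fd);
    return ret;
}
```

Calling `try_read_raw("/dev/sr0", 0)` in a loop while ejecting the disc should, going by the 4.4 behaviour described above, return `-ENOMEDIUM` on an unaffected kernel. The device path is an assumption and may differ on your system.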
Comment 11 Matt Wang 2018-09-04 12:42:48 UTC
I tried playing a CD with VLC on my desktop (Linux 4.18.5-1.el7.elrepo.x86_64). When I ejected the CD, the system did not panic and VLC complained that the file was missing. So I guess mainline has already fixed this issue.
