Hi,

I run many pristine and non-pristine (Ubuntu Zesty and PPA/mainline, -generic and -lowlatency) 64-bit kernels on x86_64 platforms, both on bare metal and under VirtualBox (on both Linux and Windows hosts).

I just found a CDROM-related bug, introduced in the 4.11 merge window, which makes the kernel freeze (only the guest OS kernel when running under a VM) instead of retrying the read. My CD is a specially crafted, AudioCD-like cleaning CD which does play music.

That is, the bug does not show with Linux 4.10 and earlier versions, but shows with Linux 4.11-rc1 and later versions (it is still present in 4.12).

Please tell me which mailing list on VGER is most appropriate for publishing my yet-to-be-done bisect. Thanks.

The trace follows:

[ 222.725026] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff89e8675a
[ 222.725026]
[ 222.759159] CPU: 1 PID: 2605 Comm: vlc Not tainted 4.11.0-041100rc1-generic #201703051731
[ 222.841402] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 222.895875] Call Trace:
[ 222.897597] dump_stack+0x63/0x81
[ 222.905671] panic+0xe4/0x22d
[ 222.907701] ? mmc_ioctl_cdrom_read_data+0x2aa/0x2b0
[ 222.912909] ? mmc_ioctl_cdrom_read_data+0x1ac/0x2b0
[ 223.014433] __stack_chk_fail+0x19/0x20
[ 223.025257] mmc_ioctl_cdrom_read_data+0x2aa/0x2b0
[ 223.079413] cdrom_ioctl+0xc72/0xff0
[ 223.141559] ? futex_wake_op+0x465/0x620
[ 223.245436] sr_block_ioctl+0x7c/0xc0
[ 223.290226] blkdev_ioctl+0x8ce/0x970
[ 223.300801] ? do_futex+0x1ff/0x530
[ 223.380541] block_ioctl+0x3d/0x50
[ 223.464710] do_vfs_ioctl+0xa3/0x600
[ 223.467146] ? SyS_futex+0x85/0x180
[ 223.475714] SyS_ioctl+0x79/0x90
[ 223.479852] entry_SYSCALL_64_fastpath+0x1e/0xad
[ 223.486224] RIP: 0033:0x7fea5f4d2987
[ 223.488751] RSP: 002b:00007fea10a5b858 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[ 223.502623] RAX: ffffffffffffffda RBX: 00007fea00170fd0 RCX: 00007fea5f4d2987
[ 223.518425] RDX: 00007fea0000a5c0 RSI: 0000000000005314 RDI: 0000000000000024
[ 223.529491] RBP: 000000000000b7c0 R08: 0000000000028a64 R09: 000000001b4e81b5
[ 223.548943] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000014
[ 223.561016] R13: 00007fea00006eb8 R14: 000000000000b7c0 R15: 00007fea0400bb58
[ 223.681643] Kernel Offset: 0x8800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 223.843408] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff89e8675a
[ 223.843408]
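A note for anyone matching the panic address against System.map or vmlinux: the address on the "corrupted in:" line is a runtime address, and the "Kernel Offset" line gives the relocation that was applied at boot. Below is a minimal sketch of the subtraction, assuming the offset is a plain additive shift of the kernel text. The function name unrelocated_address is made up for illustration and is not a kernel API.

```c
#include <stdint.h>

/*
 * Hypothetical helper: map a runtime kernel text address back to the
 * link-time address by removing the offset printed on the
 * "Kernel Offset:" line of the panic output.
 */
uint64_t unrelocated_address(uint64_t runtime_addr, uint64_t kernel_offset)
{
    return runtime_addr - kernel_offset;
}
```

With the values from the trace above, unrelocated_address(0xffffffff89e8675a, 0x8800000) gives 0xffffffff8168675a, which is the address to look up in the symbol table of the matching kernel build.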
Hi,

I tracked down the offending commit:

82ed4db499b8598f16f8871261bff088d6b0597f is the first bad commit
commit 82ed4db499b8598f16f8871261bff088d6b0597f
Author: Christoph Hellwig <hch@lst.de>
Date: Fri Jan 27 09:46:29 2017 +0100

    block: split scsi_request out of struct request

    And require all drivers that want to support BLOCK_PC to allocate it
    as the first thing of their private data. To support this the legacy
    IDE and BSG code is switched to set cmd_size on their queues to let
    the block layer allocate the additional space.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@fb.com>

:040000 040000 38c071b04784bea133302e5d6c622d5199c15b4c 52d4a7b56b333d7494ac86ba5b66bcb80bc77502 M block
:040000 040000 57983d79520cf3f81b3849ae02ab7d8f8abe5e35 1fe1125254e7ec2acb778280228e42a8df6554f6 M drivers
:040000 040000 5f0b15d9d0a0c44af6bb622df7523b9a8971611b 32e627e635598801530640f7206aa06739fb9c84 M fs
:040000 040000 93cbcc6cf5551ee8c23ae412301dd589fc88f3a7 3c478246251cc8d05a2d754fd5d745335de7cd97 M include

The error output from a kernel built at that commit follows:

[ 2878.623974] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffb9a6716a
[ 2878.623974]
[ 2878.652298] CPU: 1 PID: 2953 Comm: vlc Not tainted 4.10.0-rc5-vq-cdrom-11+ #1
[ 2878.653477] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 2878.654871] Call Trace:
[ 2878.655305] dump_stack+0x63/0x83
[ 2878.655882] panic+0xe4/0x22d
[ 2878.656396] ? mmc_ioctl_cdrom_read_data+0x2aa/0x2b0
[ 2878.661440] ? mmc_ioctl_cdrom_read_data+0x1ac/0x2b0
[ 2878.662269] __stack_chk_fail+0x19/0x20
[ 2878.662906] mmc_ioctl_cdrom_read_data+0x2aa/0x2b0
[ 2878.665865] cdrom_ioctl+0xc72/0xff0
[ 2878.666494] sr_block_ioctl+0x7c/0xc0
[ 2878.701274] blkdev_ioctl+0x8d3/0x980
[ 2878.861491] block_ioctl+0x3d/0x50
[ 2878.862072] do_vfs_ioctl+0xa3/0x5f0
[ 2878.862668] ? __fget+0x77/0xb0
[ 2878.881462] SyS_ioctl+0x79/0x90
[ 2878.882065] entry_SYSCALL_64_fastpath+0x1e/0xad
[ 2878.882852] RIP: 0033:0x7fad0a483987
[ 2878.883450] RSP: 002b:00007facb2919ac8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 2878.884733] RAX: ffffffffffffffda RBX: 00007fac9c002830 RCX: 00007fad0a483987
[ 2878.921267] RDX: 00007fac9c185790 RSI: 0000000000005314 RDI: 0000000000000023
[ 2878.952306] RBP: 00000000000b92f8 R08: 0000000000000000 R09: 0000000000000dc5
[ 2879.001269] R10: 00007fac9c185790 R11: 0000000000000202 R12: 00000000aaab9745
[ 2879.042370] R13: 00007fac9c011570 R14: 00007fac9c001820 R15: 00007facac0070d0
[ 2879.051866] Kernel Offset: 0x38400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 2879.072859] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffb9a6716a
[ 2879.072859]

Hope this helps.

Sincerely,
Valentin
I see a similar problem with mpt3sas and sg. Perhaps every driver working with SCSI SENSE got broken.

0xffffffffc02102bf: mov 0x170(%rbx),%rdi
0xffffffffc02102c6: mov -0x60(%rbp),%rsi
0xffffffffc02102ca: mov $0x7,%ecx
0xffffffffc02102cf: rep movsl %ds:(%rsi),%es:(%rdi) <= RIP

RIP: ffffffffc02102cf RSP: ffffa43dc4c97b20 RFLAGS: 00010282
RAX: 0000000000000038 RBX: ffff96ffd6c03600 RCX: 0000000000000007
RDX: 0000000000000000 RSI: ffff96f9109b5c00 RDI: 0009000100d00000
RBP: ffffa43dc4c97bc0 R8: 0000000000000038 R9: 0000000000000001
R10: ffffa43dc4c97a10 R11: 0000000000000000 R12: ffff96ffd6c01400
R13: ffff96f90f695298 R14: ffff96f90f694000 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018

crash> p *(struct scsi_request *)0xffff96ffd6c03748
$2 = {
  __cmd = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
  cmd = 0xffff96ffd6c03748 "",
  cmd_len = 16,
  sense_len = 0,
  resid_len = 16,
  sense = 0x9000100d00000
}

scsi_request’s sense pointer is not a valid kernel address. In fact, it is never initialized, so the junk could be a valid address - just not yours.
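To make the "not a valid kernel address" point concrete: on x86_64 with 48-bit virtual addresses, bits 63..47 of a usable pointer must be all zeros (user half) or all ones (kernel half). The sketch below checks that rule. It assumes 4-level paging, and the function names are made up for illustration; they are not kernel helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: 4-level paging, i.e. 48-bit virtual addresses.
 * An address is canonical when bits 63..47 are all equal.
 */
bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;  /* the 17 bits 63..47 */
    return top == 0 || top == 0x1ffff;
}

/* Kernel-half addresses have bits 63..47 all set. */
bool is_kernel_half_48(uint64_t addr)
{
    return (addr >> 47) == 0x1ffff;
}
```

Under that assumption, the sense value 0x9000100d00000 (also the RDI of the faulting rep movsl) is not canonical at all, while the scsi_request address 0xffff96ffd6c03748 is an ordinary kernel-half pointer.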
Hello Vitaly,

Would you mind bisecting, Sir?

Kind regards,
Valentin
(In reply to Vitaly Mayatskikh from comment #2)
> I see similar problem with mpt3sas and sg. Perhaps, every driver working
> with SCSI SENSE got broken.
>
> 0xffffffffc02102bf: mov 0x170(%rbx),%rdi
> 0xffffffffc02102c6: mov -0x60(%rbp),%rsi
> 0xffffffffc02102ca: mov $0x7,%ecx
> 0xffffffffc02102cf: rep movsl %ds:(%rsi),%es:(%rdi) <= RIP
>
> RIP: ffffffffc02102cf RSP: ffffa43dc4c97b20 RFLAGS: 00010282
> RAX: 0000000000000038 RBX: ffff96ffd6c03600 RCX: 0000000000000007
> RDX: 0000000000000000 RSI: ffff96f9109b5c00 RDI: 0009000100d00000
> RBP: ffffa43dc4c97bc0 R8: 0000000000000038 R9: 0000000000000001
> R10: ffffa43dc4c97a10 R11: 0000000000000000 R12: ffff96ffd6c01400
> R13: ffff96f90f695298 R14: ffff96f90f694000 R15: 0000000000000000
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
>
>
> crash> p *(struct scsi_request *)0xffff96ffd6c03748
> $2 = {
> __cmd = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
> cmd = 0xffff96ffd6c03748 "",
> cmd_len = 16,
> sense_len = 0,
> resid_len = 16,
> sense = 0x9000100d00000
> }
>
> scsi_request’s sense pointer is not a valid kernel address. In fact, it is
> never initialized, so the junk could be a valid address - just not yours.

Hello Vitaly,

Would you mind bisecting, Sir?

Kind regards,
Valentin
It is the same commit, 82ed4db499b8598f16f8871261bff088d6b0597f. It removed the SENSE buffer allocation in block/bsg and the assignment of the sense pointer in various places.
Created attachment 257611
Keep SENSE buffer in bsg_command

This seems to help.
(In reply to Vitaly Mayatskikh from comment #6)
> Created attachment 257611 [details]
> Keep SENSE buffer in bsg_command
>
> This seems to help.

Unfortunately, Sir, your patch did not help here: I applied it on top of the offending commit 82ed4db and got exactly the same trace!

Hope this helps.

Sincerely,
Valentin
Yes, it fixes the bsg case only.
Hello Valentin,

From the stack trace we can see that an ioctl to the CD device led to the kernel crash. A coredump file would be very helpful.

Regards,
Matt
This bug always happens when a CD is ejected while the CDROMREADRAW ioctl is being called (e.g. VLC media player can trigger it when an audio CD is ejected in the middle of playback). On kernel 4.4, the CDROMREADRAW ioctl fails in that case with the error ENOMEDIUM.
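For anyone who wants to reproduce this without VLC, here is a minimal user-space sketch of a single CDROMREADRAW call (request code 0x5314, which matches RSI in the traces above). The helper names lba_to_msf and read_raw_frame are made up for illustration. The sketch assumes the usual calling convention for this ioctl: the argument buffer holds a struct cdrom_msf start address on entry and is overwritten with one raw frame (CD_FRAMESIZE_RAW bytes) on success.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/cdrom.h>

/* Convert a logical block address to minute/second/frame form. */
void lba_to_msf(int lba, struct cdrom_msf *msf)
{
    int n = lba + CD_MSF_OFFSET;  /* MSF addressing starts 150 frames early */

    memset(msf, 0, sizeof(*msf));
    msf->cdmsf_min0 = n / (CD_SECS * CD_FRAMES);
    msf->cdmsf_sec0 = (n / CD_FRAMES) % CD_SECS;
    msf->cdmsf_frame0 = n % CD_FRAMES;
}

/*
 * Hypothetical helper: read one raw frame at the given LBA into buf.
 * Returns 0 on success or a negative errno value on failure.
 */
int read_raw_frame(int fd, int lba, unsigned char buf[CD_FRAMESIZE_RAW])
{
    /* The same memory carries the MSF request in and the frame data out. */
    union {
        struct cdrom_msf msf;
        unsigned char raw[CD_FRAMESIZE_RAW];
    } arg;

    memset(&arg, 0, sizeof(arg));
    lba_to_msf(lba, &arg.msf);
    if (ioctl(fd, CDROMREADRAW, &arg) < 0)
        return -errno;
    memcpy(buf, arg.raw, CD_FRAMESIZE_RAW);
    return 0;
}
```

A test program would open the drive (for example /dev/sr0, opened with O_RDONLY | O_NONBLOCK), call read_raw_frame in a loop over increasing LBAs, and eject the disc while the loop runs. Going by the kernel 4.4 behaviour described above, a healthy kernel should make the call return -ENOMEDIUM rather than panic.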
I tried playing a CD with VLC on my desktop (Linux 4.18.5-1.el7.elrepo.x86_64). When I ejected the CD, the system did not panic and VLC complained about a missing file. So I guess mainline has already fixed this issue.