Created attachment 303537 [details] picture of kernel panic After updating from 6.0.15 to 6.0.16, I have a reproducable setup to cause a kernel panic. I only have a photo of the panic (see attachment), but here are some of the lines __stack_chk_fail+0x10/0x10 cifs_get_inode_info+0xbcc/0xbe0 [cifs] ? memcg_slab_post_alloc_hook+0x185/0x2c0 cifs_lookup+0x15e/0x7f0 [cifs] __lookup_slow+0x73 Here are some info: Fedora 37 with 6.0.16-300.fc37.x86_64 (also reproducible with 6.0.17-300.fc37.x86_64)
There are not a lot of commits between 6.0.15 and 6.0.16 in fs/cifs. The most suspicious one might be 2d046892a493d https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.0.y&id=2d046892a493d9760c35fdaefc3017f27f91b621
I moved into a VM to copy the stack trace from the serial console and are now able to reproduce the problem by just calling cat on a file: /etc/fstab: //diskstation/home /mnt cifs noauto,user,vers=3,username=georg 0 0 # mount.cifs /mnt <enter password> # cd /mnt # echo foo > foo && rm -f foo It happens not all the times, but most of the times. Repeat the echo && rm command to trigger it. # grep cifs /proc/mounts //diskstation/home /mnt cifs rw,nosuid,nodev,noexec,relatime,vers=3.1.1,cache=strict,username=georg,uid=0,noforceuid,gid=0,noforcegid,addr=10.x.x.x,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,closetimeo=5 0 0 [ 52.175685] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: cifs_get_inode_info+0xbcc/0xbe0 [cifs] [ 52.177641] CPU: 0 PID: 836 Comm: rm Not tainted 6.0.16-300.fc37.x86_64 #1 [ 52.178708] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014 [ 52.180151] Call Trace: [ 52.180625] <TASK> [ 52.181062] dump_stack_lvl+0x44/0x5c [ 52.181724] panic+0x10a/0x2c0 [ 52.182302] ? cifs_get_inode_info+0xbcc/0xbe0 [cifs] [ 52.183544] __stack_chk_fail+0x10/0x10 [ 52.184343] cifs_get_inode_info+0xbcc/0xbe0 [cifs] [ 52.185554] cifs_revalidate_dentry_attr+0x158/0x380 [cifs] [ 52.187494] cifs_revalidate_dentry+0xf/0x30 [cifs] [ 52.188624] cifs_d_revalidate+0x59/0x150 [cifs] [ 52.189333] ? try_to_unlazy_next+0x9d/0x100 [ 52.189859] lookup_fast+0x71/0xe0 [ 52.190272] walk_component+0x1f/0x150 [ 52.190732] path_lookupat+0x67/0x190 [ 52.191140] ? get_page_from_freelist+0x81f/0x16c0 [ 52.191597] filename_lookup+0xd3/0x1c0 [ 52.191963] ? __check_object_size+0x1f4/0x250 [ 52.192391] vfs_statx+0x82/0x120 [ 52.192711] vfs_fstatat+0x51/0x70 [ 52.193039] __do_sys_newfstatat+0x2e/0x50 [ 52.193510] do_syscall_64+0x58/0x80 [ 52.193857] ? do_user_addr_fault+0x1ef/0x690 [ 52.194280] ? kvm_read_and_reset_apf_flags+0x3f/0x60 [ 52.194909] ? exc_page_fault+0x70/0x170 [ 52.195432] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 52.195835] RIP: 0033:0x7f106129b5be [ 52.196045] Code: 48 89 f2 b9 00 01 00 00 48 89 fe bf 9c ff ff ff e9 07 00 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 41 89 ca b8 06 01 00 00 0f 05 <3d> 00 f0 ff ff 77 0b 31 c0 c3 0f 1f 84 00 00 00 00 00 48 8b 15 39 [ 52.196966] RSP: 002b:00007ffe14d2ceb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000106 [ 52.197298] RAX: ffffffffffffffda RBX: 00005621a05a1810 RCX: 00007f106129b5be [ 52.197613] RDX: 00005621a05a1880 RSI: 00005621a05a1910 RDI: 00000000ffffff9c [ 52.197925] RBP: 00005621a05a0560 R08: 0000000000000000 R09: 0000000000000000 [ 52.198236] R10: 0000000000000100 R11: 0000000000000246 R12: 00005621a05a1880 [ 52.198553] R13: 0000000000000000 R14: 00005621a05a1700 R15: 00007ffe14d2e616 [ 52.198866] </TASK> [ 52.199031] Kernel Offset: 0x33000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 52.199504] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: cifs_get_inode_info+0xbcc/0xbe0 [cifs] ]---
Hi Georg, Thanks for the report. Given the above backtrace, looks like we're missing this commit in v6.0.16 9ee2afe5207b ("cifs: prevent copying past input buffer boundaries") Can you try again with it? Let us know if it works so we can ask GregKH to backport that commit for stable.
Hi Paulo, thank you for the quick response. Applying the patch fixes the issue.
Thanks for quickly testing it! I saw that you've already asked Greg to backport that commit, so could you please also update the bug accordingly once we get such commit in stable?
Alright! When it is in a stable release, I will update this bug.
Hello, I can confirm problems with CIFS on upgrading from 6.0.15 to 6.0.16. Just after upgrade, I wasn't able to log in after boot up. After authentication, system "crashed" (maybe a kernel-panic) but I am not able to see any messages, neither was able to access TTY-instances. Booting back on 6.0.15 everything is fine. Once I commented CIFS-mount at fstab I was able to log in on 6.0.16. So, I must agree with Georg Müller on severe issues in regard to CIFS (samba) mounts. If I can further help with more info, please tell me how (step by step).
Edit: I confirm, that mounting CIFS (samba) by "sudo mount -a" crashes my system either. Regards
I see that GregKH released 6.0.18[1] with such fix backported. So, please update to 6.0.18 and see if it still breaks. [1] https://lore.kernel.org/all/1673088307390@kroah.com/
Yeah, I just wanted to write that. The fix was also backported for the 5.4, 5.10 and 5.15 stable branches. I don't know if they were affected, since they don't have the symlink change (2d046892a493d). I am still trying to figure out which change exactly (read: which line change) triggered the bug.
(In reply to Georg Müller from comment #10) > Yeah, I just wanted to write that. > > The fix was also backported for the 5.4, 5.10 and 5.15 stable branches. I > don't know if they were affected, since they don't have the symlink change > (2d046892a493d). Yes, the commit would only help with 9ee2afe5207b ("cifs: prevent copying past input buffer boundaries") Perhaps we can ask GregKH to drop this commit in those 5.{4,10,15} stable branches. > I am still trying to figure out which change exactly (read: which line > change) triggered the bug. We used to allocate large chunks of memory (size of file_info + (PATH_MAX * 2)) just to hold the file metadata returned by ->query_path_info() in cifs_get_inode_info(). But since we only needed all the fields expect the filename, the symlink change used a stack-allocated variable (@tmp_data) instead to handle that information. Without 9ee2afe5207b, we would end up copying more bytes than needed in @tmp_data when calling ->query_path_info() in cifs_get_inode_info() thus causing a stack corruption.
(In reply to Paulo Alcantara from comment #11) > > I am still trying to figure out which change exactly (read: which line > > change) triggered the bug. > > We used to allocate large chunks of memory (size of file_info + (PATH_MAX * > 2)) just to hold the file metadata returned by ->query_path_info() in > cifs_get_inode_info(). But since we only needed all the fields expect the > filename, the symlink change used a stack-allocated variable (@tmp_data) > instead to handle that information. > > Without 9ee2afe5207b, we would end up copying more bytes than needed in > @tmp_data when calling ->query_path_info() in cifs_get_inode_info() thus > causing a stack corruption. I was looking at the wrong part of the patch. The upper part (buffer_length -> minbufsize) is what fixes it - I was looking at the dlen/min_len stuff, but this is not used in the code path which leads to the crash. But looking at the code, 9ee2afe5207b seems to be make the code safer even without the symlink patch which triggered the panic. Steven French asked to remove the patch from the 5.15/5.10/5.4 queues. Maybe you could reply to this request.
(In reply to Georg Müller from comment #12) > Steven French asked to remove the patch from the 5.15/5.10/5.4 queues. Maybe > you could reply to this request. Sorry for the noise, I have seen your response too late ;)
(In reply to Georg Müller from comment #12) > (In reply to Paulo Alcantara from comment #11) > > > I am still trying to figure out which change exactly (read: which line > > > change) triggered the bug. > > > > We used to allocate large chunks of memory (size of file_info + (PATH_MAX * > > 2)) just to hold the file metadata returned by ->query_path_info() in > > cifs_get_inode_info(). But since we only needed all the fields expect the > > filename, the symlink change used a stack-allocated variable (@tmp_data) > > instead to handle that information. > > > > Without 9ee2afe5207b, we would end up copying more bytes than needed in > > @tmp_data when calling ->query_path_info() in cifs_get_inode_info() thus > > causing a stack corruption. > > I was looking at the wrong part of the patch. The upper part (buffer_length > -> minbufsize) is what fixes it - I was looking at the dlen/min_len stuff, > but this is not used in the code path which leads to the crash. > > But looking at the code, 9ee2afe5207b seems to be make the code safer even > without the symlink patch which triggered the panic. > > Steven French asked to remove the patch from the 5.15/5.10/5.4 queues. Maybe > you could reply to this request. Yes, good point. The problem is that we haven't tested or looked at it enough in order to guarantee that it won't break any other existing code. And I agree with you that would probably make it safer. Thanks for looking into that!
Closing this as it is fixed with 6.0.18 and no other kernel has 2d046892a493d