Bug 216895 - Kernel Panic in cifs_get_inode_info after updating to 6.0.16
Summary: Kernel Panic in cifs_get_inode_info after updating to 6.0.16
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: CIFS (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: fs_cifs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-01-06 11:37 UTC by Georg Müller
Modified: 2023-01-13 08:48 UTC (History)
4 users (show)

See Also:
Kernel Version: 6.0.16
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
picture of kernel panic (1.96 MB, image/jpeg)
2023-01-06 11:37 UTC, Georg Müller
Details

Description Georg Müller 2023-01-06 11:37:28 UTC
Created attachment 303537 [details]
picture of kernel panic

After updating from 6.0.15 to 6.0.16, I have a reproducable setup to cause a kernel panic.

I only have a photo of the panic (see attachment), but here are some of the lines

__stack_chk_fail+0x10/0x10
cifs_get_inode_info+0xbcc/0xbe0 [cifs]
? memcg_slab_post_alloc_hook+0x185/0x2c0
cifs_lookup+0x15e/0x7f0 [cifs]
__lookup_slow+0x73

Here are some info:
Fedora 37 with 6.0.16-300.fc37.x86_64 (also reproducible with 6.0.17-300.fc37.x86_64)
Comment 1 Georg Müller 2023-01-06 11:42:43 UTC
There are not a lot of commits between 6.0.15 and 6.0.16 in fs/cifs.

The most suspicious one might be 2d046892a493d

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.0.y&id=2d046892a493d9760c35fdaefc3017f27f91b621
Comment 2 Georg Müller 2023-01-06 12:34:55 UTC
I moved into a VM to copy the stack trace from the serial console and are now able to reproduce the problem by just calling cat on a file:

/etc/fstab:
//diskstation/home    /mnt             cifs    noauto,user,vers=3,username=georg 0 0

# mount.cifs /mnt
 <enter password>
# cd /mnt
# echo foo > foo && rm -f foo

It happens not all the times, but most of the times. Repeat the echo && rm command to trigger it.

# grep cifs /proc/mounts
//diskstation/home /mnt cifs rw,nosuid,nodev,noexec,relatime,vers=3.1.1,cache=strict,username=georg,uid=0,noforceuid,gid=0,noforcegid,addr=10.x.x.x,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,closetimeo=5 0 0


[   52.175685] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: cifs_get_inode_info+0xbcc/0xbe0 [cifs]
[   52.177641] CPU: 0 PID: 836 Comm: rm Not tainted 6.0.16-300.fc37.x86_64 #1
[   52.178708] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014
[   52.180151] Call Trace:
[   52.180625]  <TASK>
[   52.181062]  dump_stack_lvl+0x44/0x5c
[   52.181724]  panic+0x10a/0x2c0
[   52.182302]  ? cifs_get_inode_info+0xbcc/0xbe0 [cifs]
[   52.183544]  __stack_chk_fail+0x10/0x10
[   52.184343]  cifs_get_inode_info+0xbcc/0xbe0 [cifs]
[   52.185554]  cifs_revalidate_dentry_attr+0x158/0x380 [cifs]
[   52.187494]  cifs_revalidate_dentry+0xf/0x30 [cifs]
[   52.188624]  cifs_d_revalidate+0x59/0x150 [cifs]
[   52.189333]  ? try_to_unlazy_next+0x9d/0x100
[   52.189859]  lookup_fast+0x71/0xe0
[   52.190272]  walk_component+0x1f/0x150
[   52.190732]  path_lookupat+0x67/0x190
[   52.191140]  ? get_page_from_freelist+0x81f/0x16c0
[   52.191597]  filename_lookup+0xd3/0x1c0
[   52.191963]  ? __check_object_size+0x1f4/0x250
[   52.192391]  vfs_statx+0x82/0x120
[   52.192711]  vfs_fstatat+0x51/0x70
[   52.193039]  __do_sys_newfstatat+0x2e/0x50
[   52.193510]  do_syscall_64+0x58/0x80
[   52.193857]  ? do_user_addr_fault+0x1ef/0x690
[   52.194280]  ? kvm_read_and_reset_apf_flags+0x3f/0x60
[   52.194909]  ? exc_page_fault+0x70/0x170
[   52.195432]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[   52.195835] RIP: 0033:0x7f106129b5be
[   52.196045] Code: 48 89 f2 b9 00 01 00 00 48 89 fe bf 9c ff ff ff e9 07 00 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 41 89 ca b8 06 01 00 00 0f 05 <3d> 00 f0 ff ff 77 0b 31 c0 c3 0f 1f 84 00 00 00 00 00 48 8b 15 39
[   52.196966] RSP: 002b:00007ffe14d2ceb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000106
[   52.197298] RAX: ffffffffffffffda RBX: 00005621a05a1810 RCX: 00007f106129b5be
[   52.197613] RDX: 00005621a05a1880 RSI: 00005621a05a1910 RDI: 00000000ffffff9c
[   52.197925] RBP: 00005621a05a0560 R08: 0000000000000000 R09: 0000000000000000
[   52.198236] R10: 0000000000000100 R11: 0000000000000246 R12: 00005621a05a1880
[   52.198553] R13: 0000000000000000 R14: 00005621a05a1700 R15: 00007ffe14d2e616
[   52.198866]  </TASK>
[   52.199031] Kernel Offset: 0x33000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   52.199504] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: cifs_get_inode_info+0xbcc/0xbe0 [cifs] ]---
Comment 3 Paulo Alcantara 2023-01-06 13:30:36 UTC
Hi Georg,

Thanks for the report.

Given the above backtrace, looks like we're missing this commit in
v6.0.16

	9ee2afe5207b ("cifs: prevent copying past input buffer boundaries")

Can you try again with it?  Let us know if it works so we can ask
GregKH to backport that commit for stable.
Comment 4 Georg Müller 2023-01-06 13:57:59 UTC
Hi Paulo,

thank you for the quick response.

Applying the patch fixes the issue.
Comment 5 Paulo Alcantara 2023-01-06 18:23:38 UTC
Thanks for quickly testing it!

I saw that you've already asked Greg to backport that commit, so could you please also update the bug accordingly once we get such commit in stable?
Comment 6 Georg Müller 2023-01-06 20:43:14 UTC
Alright!

When it is in a stable release, I will update this bug.
Comment 7 Careca70 2023-01-07 08:09:49 UTC
Hello,

I can confirm problems with CIFS on upgrading from 6.0.15 to 6.0.16.

Just after upgrade, I wasn't able to log in after boot up. After authentication, system "crashed" (maybe a kernel-panic) but I am not able to see any messages, neither was able to access TTY-instances.

Booting back on 6.0.15 everything is fine.

Once I commented CIFS-mount at fstab I was able to log in on 6.0.16.

So, I must agree with Georg Müller on severe issues in regard to CIFS (samba) mounts.


If I can further help with more info, please tell me how (step by step).
Comment 8 Careca70 2023-01-07 13:47:35 UTC
Edit:

I confirm, that mounting CIFS (samba) by "sudo mount -a" crashes my system either.

Regards
Comment 9 Paulo Alcantara 2023-01-07 16:16:11 UTC
I see that GregKH released 6.0.18[1] with such fix backported.

So, please update to 6.0.18 and see if it still breaks.

[1] https://lore.kernel.org/all/1673088307390@kroah.com/
Comment 10 Georg Müller 2023-01-07 16:18:52 UTC
Yeah, I just wanted to write that.

The fix was also backported for the 5.4, 5.10 and 5.15 stable branches. I don't know if they were affected, since they don't have the symlink change (2d046892a493d).

I am still trying to figure out which change exactly (read: which line change) triggered the bug.
Comment 11 Paulo Alcantara 2023-01-07 16:35:53 UTC
(In reply to Georg Müller from comment #10)
> Yeah, I just wanted to write that.
> 
> The fix was also backported for the 5.4, 5.10 and 5.15 stable branches. I
> don't know if they were affected, since they don't have the symlink change
> (2d046892a493d).

Yes, the commit would only help with

  9ee2afe5207b ("cifs: prevent copying past input buffer boundaries")

Perhaps we can ask GregKH to drop this commit in those 5.{4,10,15} stable branches.

> I am still trying to figure out which change exactly (read: which line
> change) triggered the bug.

We used to allocate large chunks of memory (size of file_info + (PATH_MAX * 2)) just to hold the file metadata returned by ->query_path_info() in cifs_get_inode_info().  But since we only needed all the fields expect the filename, the symlink change used a stack-allocated variable (@tmp_data) instead to handle that information.

Without 9ee2afe5207b, we would end up copying more bytes than needed in @tmp_data when calling ->query_path_info() in cifs_get_inode_info() thus causing a stack corruption.
Comment 12 Georg Müller 2023-01-07 18:07:22 UTC
(In reply to Paulo Alcantara from comment #11)
> > I am still trying to figure out which change exactly (read: which line
> > change) triggered the bug.
> 
> We used to allocate large chunks of memory (size of file_info + (PATH_MAX *
> 2)) just to hold the file metadata returned by ->query_path_info() in
> cifs_get_inode_info().  But since we only needed all the fields expect the
> filename, the symlink change used a stack-allocated variable (@tmp_data)
> instead to handle that information.
> 
> Without 9ee2afe5207b, we would end up copying more bytes than needed in
> @tmp_data when calling ->query_path_info() in cifs_get_inode_info() thus
> causing a stack corruption.

I was looking at the wrong part of the patch. The upper part (buffer_length -> minbufsize) is what fixes it - I was looking at the dlen/min_len stuff, but this is not used in the code path which leads to the crash.

But looking at the code, 9ee2afe5207b seems to be make the code safer even without the symlink patch which triggered the panic.

Steven French asked to remove the patch from the 5.15/5.10/5.4 queues. Maybe you could reply to this request.
Comment 13 Georg Müller 2023-01-07 18:11:31 UTC
(In reply to Georg Müller from comment #12)
> Steven French asked to remove the patch from the 5.15/5.10/5.4 queues. Maybe
> you could reply to this request.

Sorry for the noise, I have seen your response too late ;)
Comment 14 Paulo Alcantara 2023-01-07 18:12:31 UTC
(In reply to Georg Müller from comment #12)
> (In reply to Paulo Alcantara from comment #11)
> > > I am still trying to figure out which change exactly (read: which line
> > > change) triggered the bug.
> > 
> > We used to allocate large chunks of memory (size of file_info + (PATH_MAX *
> > 2)) just to hold the file metadata returned by ->query_path_info() in
> > cifs_get_inode_info().  But since we only needed all the fields expect the
> > filename, the symlink change used a stack-allocated variable (@tmp_data)
> > instead to handle that information.
> > 
> > Without 9ee2afe5207b, we would end up copying more bytes than needed in
> > @tmp_data when calling ->query_path_info() in cifs_get_inode_info() thus
> > causing a stack corruption.
> 
> I was looking at the wrong part of the patch. The upper part (buffer_length
> -> minbufsize) is what fixes it - I was looking at the dlen/min_len stuff,
> but this is not used in the code path which leads to the crash.
> 
> But looking at the code, 9ee2afe5207b seems to be make the code safer even
> without the symlink patch which triggered the panic.
> 
> Steven French asked to remove the patch from the 5.15/5.10/5.4 queues. Maybe
> you could reply to this request.

Yes, good point.  The problem is that we haven't tested or looked at it enough in order to guarantee that it won't break any other existing code.  And I agree with you that would probably make it safer.

Thanks for looking into that!
Comment 15 Georg Müller 2023-01-13 08:48:32 UTC
Closing this as it is fixed with 6.0.18 and no other kernel has 2d046892a493d

Note You need to log in before you can comment on or make changes to this bug.