Bug 200951 - kernel NULL pointer dereference in update_sit_entry
Summary: kernel NULL pointer dereference in update_sit_entry
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: f2fs (show other bugs)
Hardware: ARM Linux
: P1 high
Assignee: Default virtual assignee for f2fs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-08-27 13:44 UTC by vicencb
Modified: 2018-09-11 03:15 UTC (History)
1 user (show)

See Also:
Kernel Version: v4.19-rc1, v4.18.2
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg (2.99 KB, text/plain)
2018-08-27 13:44 UTC, vicencb
Details
sg_readcap (425 bytes, text/plain)
2018-08-27 13:45 UTC, vicencb
Details
sg_vpd (1.69 KB, text/plain)
2018-08-27 13:46 UTC, vicencb
Details
fsck.f2fs (1.60 KB, text/plain)
2018-09-01 12:54 UTC, vicencb
Details
add some log (1.47 KB, text/plain)
2018-09-02 06:52 UTC, Chao Yu
Details
log results (35.10 KB, text/plain)
2018-09-02 16:47 UTC, vicencb
Details
add some log (bis) (2.50 KB, patch)
2018-09-02 16:49 UTC, vicencb
Details | Diff
log results (bis) (35.21 KB, text/plain)
2018-09-02 16:50 UTC, vicencb
Details

Description vicencb 2018-08-27 13:44:43 UTC
Created attachment 278127 [details]
dmesg

Hi,
in the setup there is a SATA SSD connected to a SATA-to-USB bridge.

The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
There are four partitions:
 sda1: FAT  /boot
 sda2: F2FS /
 sda3: F2FS /home
 sda4: F2FS

The bridge is ASMT1153e which uses the "uas" driver.
There is no TRIM pass-through, so, when mounting it reports:
 mounting with "discard" option, but the device does not support discard

The USB host is USB3.0 and UASP capable. It is the one on RK3399.

Given this everything works fine, except there is no TRIM support.

In order to enable TRIM a new UDEV rule is added [1]:
 /etc/udev/rules.d/10-sata-bridge-trim.rules:
 ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
After reboot any F2FS write hangs forever and dmesg reports:
 Unable to handle kernel NULL pointer dereference

Also tested on a x86_64 system: works fine even with TRIM enabled.
 same disc
 same bridge
 different usb host controller
 different cpu architecture
 not root filesystem

Regards,
  Vicenç.

[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280
Comment 1 vicencb 2018-08-27 13:45:59 UTC
Created attachment 278129 [details]
sg_readcap
Comment 2 vicencb 2018-08-27 13:46:27 UTC
Created attachment 278131 [details]
sg_vpd
Comment 3 Chao Yu 2018-08-31 08:17:40 UTC
Hi,

Is this bug reproducible? Can you run fsck on device to check filesystem consistence.
Comment 4 vicencb 2018-09-01 12:53:38 UTC
Hi! Yes, it is reproducible.
Each time TRIM is enabled via the UDEV rule and the FS mounted with
the discard option, the issue is triggered at the very first write attempt.
Before mounting fsck.f2fs was successfully run on the FS.

Regards,
  Vicenç.
Comment 5 vicencb 2018-09-01 12:54:28 UTC
Created attachment 278227 [details]
fsck.f2fs
Comment 6 vicencb 2018-09-01 13:18:20 UTC
Hi, just found that the issue is also reproducible when mounted
with the "nodiscard" option, enabling TRIM via UDEV is enouth.
Comment 7 Chao Yu 2018-09-02 06:52:18 UTC
Created attachment 278233 [details]
add some log

Can you apply this patch attached and reproduce this bug again?
Comment 8 vicencb 2018-09-02 16:47:16 UTC
Created attachment 278241 [details]
log results

Hi, here are the results asked for.
Comment 9 vicencb 2018-09-02 16:49:16 UTC
Created attachment 278243 [details]
add some log (bis)

Hi, added a few more printk for extra verbosity.
Comment 10 vicencb 2018-09-02 16:50:15 UTC
Created attachment 278245 [details]
log results (bis)

The results of the previous patch.
Comment 11 Chao Yu 2018-09-03 01:27:12 UTC
Actually, the problem here is during mount(),  both blk_queue_discard(q) and f2fs_sb_has_blkzoned(sbi) return false, so f2fs will skip allocating memory for se->discard_map.

static inline bool f2fs_discard_en(struct f2fs_sb_info *sbi)
{
	struct request_queue *q = bdev_get_queue(sbi->sb->s_bdev);

	return blk_queue_discard(q) || f2fs_sb_has_blkzoned(sbi);
}

[    4.372580] f2fs_discard_en(1):0
[    4.498957] f2fs_discard_en(2):0

But later, during update_sit_entiry(), f2fs_discard_en() return true, then it will cause f2fs to update se->discard_map bitmap, result in panic.

[   56.939547] f2fs_discard_en(3):1 new_blkaddr:0xFFFF00000AB0383C
[   56.945027] f2fs_discard_en(5):1 se:0xFFFF8000ED285ED8


Is there any interface we can turn on discard of device in real time?
Comment 12 vicencb 2018-09-03 09:35:24 UTC
As stated in the bug description, there is an udev rule to set
the provisioning_mode attribute to unmap. When searching for
this attribute name, i found this:
drivers/scsi/sd.c:749
  blk_queue_max_discard_sectors(q, ...);
  blk_queue_flag_set(QUEUE_FLAG_DISCARD, q);

So, it looks like the answer is yes, there is an interface to
turn on discard of device in real time.

Regards,
  Vicenç.
Comment 13 Chao Yu 2018-09-03 09:41:44 UTC
(In reply to vicencb from comment #12)
> As stated in the bug description, there is an udev rule to set

Sorry, I missed this one... :(

> the provisioning_mode attribute to unmap. When searching for
> this attribute name, i found this:
> drivers/scsi/sd.c:749
>   blk_queue_max_discard_sectors(q, ...);
>   blk_queue_flag_set(QUEUE_FLAG_DISCARD, q);
> 
> So, it looks like the answer is yes, there is an interface to
> turn on discard of device in real time.

Alright, let me send one patch to fix this issue.

> 
> Regards,
>   Vicenç.
Comment 14 vicencb 2018-09-03 09:54:29 UTC
This could explain why it fails on aarch64 but works on x86_64.
When this is tested on aarch64, the f2fs is the root FS, and the udev
rule is applied after mount because it is stored in the filesystemm itself.
Whereas in the x86_64 case, the udev rule is applied when the disk is
plugged in, before mounting.
All in all, it is a general issue, not an architecture-specific one.
Comment 15 Chao Yu 2018-09-04 09:43:31 UTC
I've sent one patch for this issue, just cc your email, also you can find the patch in below link[1], can you please try this patch to check whether it can fix this issue?

[1] https://sourceforge.net/p/linux-f2fs/mailman/message/36407378/
Comment 16 vicencb 2018-09-04 12:51:57 UTC
Hi,
thank you for being that fast!

I have tried it against v4.19-rc2
  https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=57361846b52bc686112da6ca5368d11210796804
but it does not apply
  checking file fs/f2fs/debug.c
  Hunk #1 succeeded at 190 (offset -3 lines).
  checking file fs/f2fs/f2fs.h
  Hunk #1 FAILED at 3436.
  1 out of 1 hunk FAILED
  checking file fs/f2fs/file.c
  checking file fs/f2fs/segment.c
  Hunk #1 succeeded at 1725 (offset -43 lines).
  ...
  Hunk #8 succeeded at 3959 (offset -45 lines).
  checking file fs/f2fs/super.c
  Hunk #1 succeeded at 360 (offset -3 lines).
  Hunk #2 FAILED at 417.
  Hunk #3 succeeded at 1031 (offset -29 lines).
  Hunk #4 FAILED at 1440.
  2 out of 4 hunks FAILED

Which kernel tree should i be using?

Regards,
  Vicenç.
Comment 17 Chao Yu 2018-09-04 14:13:33 UTC
Updated in below git link, :)

https://git.kernel.org/pub/scm/linux/kernel/git/chao/linux.git/log/?h=f2fs-dev
Comment 18 Chao Yu 2018-09-11 03:15:03 UTC
This issue has been fixed with patch listed in below link, so I tag this issue as resolved one.

https://sourceforge.net/p/linux-f2fs/mailman/message/36407519/

Note You need to log in before you can comment on or make changes to this bug.