Bug 47611

Summary: NULL pointer dereference in ext4_ext_remove_space on 3.5.1
Product: File System
Reporter: Dan Carpenter (error27)
Component: ext4
Assignee: fs_ext4 (fs_ext4)
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Severity: normal
CC: dmonakhov, mehmet, tytso
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.5.1
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 47401

Description Dan Carpenter 2012-09-17 11:32:34 UTC
https://lkml.org/lkml/2012/8/15/372

Date: Wed, 15 Aug 2012 21:33:29 +0300
From: Marti Raudsepp <marti@juffo.org>
To: Kernel hackers <linux-kernel@vger.kernel.org>, ext4 hackers <linux-ext4@vger.kernel.org>
Subject: NULL pointer dereference in ext4_ext_remove_space on 3.5.1
Message-ID: <CABRT9RAOhaxcYdCxMn5neJ9WT85r=h=7WgZ2dmLaOs-MMqDW9A@mail.gmail.com>

Hi list,

I was moving and deleting some files between two of my ext4 partitions
when it suddenly crashed and dropped me into a kernel oops screen
(below). I'm using ext4 on kernel 3.5.1 (Arch Linux). Both likely
suspect file systems are stored on LVM2, mounted with
data=writeback,errors=remount-ro,noatime

EXT4-fs (sda3): mounted filesystem with writeback data mode. Opts: (null)
EXT4-fs (dm-1): mounted filesystem with writeback data mode. Opts:
errors=remount-ro
EXT4-fs (dm-0): mounted filesystem with writeback data mode. Opts:
errors=remount-ro

I can't be bothered to transcribe all those 64-bit pointer values, but
here's the human-readable parts (screen shot links below):

BUG: unable to handle kernel NULL pointer dereference at 000...00028
IP: [...] ext4_ext_remove_space+0xaa4/0xef0 [ext4]
PGD ... PUD ... PMD 0
Oops: 0000 [#1] PREEMPT SMP
CPU 2
Modules linked in: [... bazillion modules ...]

Pid: 3687, comm: rm Tainted: G    C    3.5.1-1-ARCH #1 System
manufacturer System Product Name/M3A76-CM
RIP: 0010:[...] [...] ext4_ext_remove_space
[... CPU register dump ...]
Process rm (pid: 3687, threadinfo ..., task ...)
Call Trace:
 [...] ext4_ext_truncate+0x183/0x1c0 [ext4]
 [...] ? ext4_mark_inode_dirty+0x7e/0x230 [ext4]
 [...] ext4_truncate+0x135/0x140 [ext4]
 [...] ext4_evict_inode+0x40f/0x4e0 [ext4]
 [...] evict+0xb8/0x1b0
 [...] iput+0x105/0x210
 [...] do_unlinkat+0x1b/0x50
 [...] ? filp_close+0x66/0xa0
 [...] sys_unlinkat+0x1b/0x50
 [...] system_call_fastpath+0x16/0x1b
Code: [...]
RIP [...] ext4_ext_remove_space+0xaa4/0xef0 [ext4]
CR2: 000...00028
Taint flag C is caused by the staging r8712u driver (Realtek USB Wi-Fi
adapter). But I wasn't even using it today, so I doubt that had
anything to do with it.

Some "screen shots" taken with a webcam (warning: CSI "zoom & enhance"
technology required)
http://ompldr.org/vZjQ3dw
http://ompldr.org/vZjQ3eg
http://ompldr.org/vZjQ4MA
http://ompldr.org/vZjQ3eQ
http://ompldr.org/vZjQ3eA

Regards,
Marti
Comment 1 Dmitry Monakhov 2012-09-17 12:00:39 UTC
This patch should help:
http://patchwork.ozlabs.org/patch/183649/
Comment 2 Theodore Tso 2012-09-17 12:07:17 UTC
Fixed by commit: 89a4e48f8479f: "ext4: fix kernel BUG on large-scale rm -rf commands", which appeared in Linus's tree as of 3.6-rc3, and which was backported to v3.5.3.

(Dmitry, looking at your patch, I'm guessing you haven't tried rebasing your patch series to 3.6-rc3 or later?   If you haven't, that would be much appreciated.)
Comment 3 Dmitry Monakhov 2012-09-17 12:28:07 UTC
(In reply to comment #2)
> Fixed by commit: 89a4e48f8479f: "ext4: fix kernel BUG on large-scale rm -rf
> commands", which appeared in Linus's tree as of 3.6-rc3, and which was
> backported to v3.5.3.
> 
> (Dmitry, looking at your patch, I'm guessing you haven't tried rebasing your
> patch series to 3.6-rc3 or later?   If you haven't, that would be much
> appreciated.)
I've prepared patches against your git tree 
http://git.kernel.org/?p=linux/kernel/git/tytso/ext4.git
And I can't find the 3.6-rc3 tag there. Which git tree should I use?
Comment 4 Theodore Tso 2012-09-17 14:08:30 UTC
Oh, my bad.  I hadn't pushed the master branch forward even though the patch had been sent to Linus and merged for 3.6-rc3.

The ext4 git tree has three branches of interest.   Internally, I work off of an ext4 patch queue which can be found here: git://repo.or.cz/ext4-patch-queue.git
or here: https://github.com/tytso/ext4-patch-queue.git

The base of the patch series is the "origin" branch on the ext4 git tree, and that is always a commit which is in Linus's mainline.  The last "stable patch" (before the "stable-boundary" no-op patch in the ext4 patch queue) is what is generally on the "dev" branch, and that is what is synched-up with linux-next.

The "master" branch is supposed to live on a commit somewhere between "origin" and "dev", and represents a commitment that everything at or before the "master" branch pointer is a stable commit that I will not rewind or rebase.   Commits between "master" and "dev" are stable, and will probably not be rebased, but the commit description might change, or if a critical bug is found, a commit might require revision, or in rare cases, might get dropped.

So if you are doing development using git, you're better off using the "master" branch, since that is a non-rewinding branch.  If you are just using patch series or some kind of patch queue (i.e., stgit, guilt, quilt, etc.) then it should be fine to use the "dev" branch.

For example, at the moment the 64-bit resize patches are still not yet in master, although at this point they are pretty stable and will probably not change.   Everything before the "master" branch, however, is guaranteed to be stable.
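
(Editorial sketch, not part of the original comment.) The branch policy described above can be modelled as positions in a linear history, oldest commit first. The function name, the commit labels and the three return strings below are hypothetical, and the "queue-only" case is only one reading of the "stable-boundary" remark; treat this as an illustration of the policy, not as tooling that exists in the ext4 tree.

```python
def classify(history, master, dev, commit):
    """Classify a commit by where it sits relative to the branch pointers.

    history: commit labels, oldest first, starting at the "origin" base
    master, dev: labels the two branch pointers currently sit on
    (all labels here are made up for illustration)
    """
    pos = history.index(commit)
    if pos <= history.index(master):
        # At or before "master": promised never to be rewound or rebased.
        return "stable"
    if pos <= history.index(dev):
        # Between "master" and "dev": probably not rebased, but the
        # description may change and a commit may be revised or dropped.
        return "probably-stable"
    # Past "dev": assumed to exist only in the patch queue.
    return "queue-only"


history = ["origin", "a", "b", "c", "d"]
print(classify(history, master="a", dev="c", commit="b"))
```

Under these assumptions a git-based workflow would build on anything classified "stable", while a quilt-style patch series could also sit on top of "probably-stable" commits.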
Comment 5 costinel 2012-09-19 18:02:40 UTC
Does this bug affect 3.2.0?
Comment 6 Theodore Tso 2012-09-19 18:06:37 UTC
No, it doesn't affect 3.2.0; this was a regression which was introduced in 3.5.0, and fixed by 3.5.3 (and in upstream by the time 3.6-rc3 was released).
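
(Editorial sketch, not part of the original comment.) The affected range stated above, introduced in 3.5.0 and fixed in 3.5.3, can be turned into a small check for released kernels. The function names are hypothetical. Release candidates are rejected rather than guessed at, since the thread only says the fix was upstream by 3.6-rc3.

```python
import re

FIRST_AFFECTED = (3, 5, 0)  # regression introduced here, per comment 6
FIRST_FIXED = (3, 5, 3)     # stable release carrying the backported fix


def parse_release(version):
    """Parse '3.5.1' or '3.5.1-1-ARCH' into (3, 5, 1); reject -rc kernels."""
    if "-rc" in version:
        raise ValueError("release candidates are not handled: " + version)
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized kernel version: " + version)
    # A missing patch level (e.g. '3.5') is treated as 0.
    return tuple(int(g) if g is not None else 0 for g in match.groups())


def is_affected(version):
    """True if a released kernel falls in the range named in this bug."""
    return FIRST_AFFECTED <= parse_release(version) < FIRST_FIXED


for v in ("3.2.0", "3.5.1-1-ARCH", "3.5.3"):
    print(v, is_affected(v))
```

Distribution kernels may carry backported fixes of their own, so a version string alone is only a first approximation.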
Comment 7 Mehmet Giritli 2012-10-15 08:59:13 UTC
*** Bug 48711 has been marked as a duplicate of this bug. ***