Bug 9241

Summary: POSIX Access Control Lists cause bogus file system check errors
Product: File System
Reporter: Josh Rosen (bjrosen)
Component: ext3
Assignee: Andrew Morton (akpm)
Status: REJECTED UNREPRODUCIBLE
Severity: high
CC: agruen, zhseal0
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.23.1
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: Snapshot of the error screen
Working .config file
.config for unbootable kernel

Description Josh Rosen 2007-10-27 18:17:25 UTC
Most recent kernel where this bug did not occur: 2.6.23-6.fc8
Distribution: Fedora 8
Hardware Environment:
00:00.0 RAM memory: nVidia Corporation C51 Host Bridge (rev a2)
00:00.1 RAM memory: nVidia Corporation C51 Memory Controller 0 (rev a2)
00:00.2 RAM memory: nVidia Corporation C51 Memory Controller 1 (rev a2)
00:00.3 RAM memory: nVidia Corporation C51 Memory Controller 5 (rev a2)
00:00.4 RAM memory: nVidia Corporation C51 Memory Controller 4 (rev a2)
00:00.5 RAM memory: nVidia Corporation C51 Host Bridge (rev a2)
00:00.6 RAM memory: nVidia Corporation C51 Memory Controller 3 (rev a2)
00:00.7 RAM memory: nVidia Corporation C51 Memory Controller 2 (rev a2)
00:02.0 PCI bridge: nVidia Corporation C51 PCI Express Bridge (rev a1)
00:03.0 PCI bridge: nVidia Corporation C51 PCI Express Bridge (rev a1)
00:04.0 PCI bridge: nVidia Corporation C51 PCI Express Bridge (rev a1)
00:05.0 VGA compatible controller: nVidia Corporation C51 [Quadro NVS 210S/GeForce 6150LE] (rev a2)
00:09.0 RAM memory: nVidia Corporation MCP51 Host Bridge (rev a2)
00:0a.0 ISA bridge: nVidia Corporation MCP51 LPC Bridge (rev a2)
00:0a.1 SMBus: nVidia Corporation MCP51 SMBus (rev a2)
00:0a.2 RAM memory: nVidia Corporation MCP51 Memory Controller 0 (rev a2)
00:0b.0 USB Controller: nVidia Corporation MCP51 USB Controller (rev a2)
00:0b.1 USB Controller: nVidia Corporation MCP51 USB Controller (rev a2)
00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev a1)
00:10.0 PCI bridge: nVidia Corporation MCP51 PCI Bridge (rev a2)
00:10.1 Audio device: nVidia Corporation MCP51 High Definition Audio (rev a2)
00:14.0 Bridge: nVidia Corporation MCP51 Ethernet Controller (rev a1)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
Software Environment: Fedora 8 Test 3
Problem Description: My system fails to boot with 2.6.23. I get the error message

An Error Occurred During File System Check. Running fsck on all the partitions reveals no errors. I have a snapshot of the screen available; I'll try to upload it if I see a way to do that.

I've built a series of 2.6.23.1 kernels and narrowed the problem down to the POSIX ACLs. When I build a kernel with these switches it boots fine,

# File systems
#
CONFIG_EXT2_FS=m
# CONFIG_EXT2_FS_XATTR is not set
# CONFIG_EXT2_FS_XIP is not set
CONFIG_EXT3_FS=m
# CONFIG_EXT3_FS_XATTR is not set
# CONFIG_EXT4DEV_FS is not set

The kernel built with these switches won't boot,


#
# File systems
#
CONFIG_EXT2_FS=m
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
# CONFIG_EXT2_FS_SECURITY is not set
# CONFIG_EXT2_FS_XIP is not set
CONFIG_EXT3_FS=m
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
# CONFIG_EXT3_FS_SECURITY is not set
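
The two fragments above can also be compared mechanically. The following is a minimal sketch, not part of the original report: `config_opts` and `config_diff` are hypothetical helper names, and the sketch assumes `sed`, `sort`, `diff` and `mktemp` are available. It rewrites each "# CONFIG_FOO is not set" line as `CONFIG_FOO=n` so that unset options are compared as well.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two kernel .config files.

# config_opts FILE
# Print one "CONFIG_NAME=value" line per option, sorted.
# "# CONFIG_FOO is not set" is rewritten as "CONFIG_FOO=n".
config_opts() {
    sed -n \
        -e 's/^# \(CONFIG_[A-Za-z0-9_]*\) is not set$/\1=n/' \
        -e '/^CONFIG_[A-Za-z0-9_]*=/p' \
        "$1" | LC_ALL=C sort
}

# config_diff WORKING FAILING
# Print only the options that differ:
#   "<" lines come from WORKING, ">" lines from FAILING.
config_diff() {
    a=$(mktemp)
    b=$(mktemp)
    config_opts "$1" > "$a"
    config_opts "$2" > "$b"
    # diff exits non-zero when the files differ; that is expected here.
    diff "$a" "$b" | grep '^[<>]'
    rm -f "$a" "$b"
}
```

Assuming the two attached files were saved as `config.working` and `config.failing` (placeholder names), `config_diff config.working config.failing` would list every option that differs between them, including the XATTR and POSIX_ACL lines.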

Steps to reproduce:
Try and boot a 2.6.23.1 kernel with POSIX_ACL
Comment 1 Josh Rosen 2007-10-27 18:19:33 UTC
Created attachment 13292 [details]
Snapshot of the error screen
Comment 2 Josh Rosen 2007-10-27 18:21:18 UTC
Created attachment 13293 [details]
Working .config file

A kernel built from this .config file boots.
Comment 3 Josh Rosen 2007-10-27 18:22:25 UTC
Created attachment 13294 [details]
.config for unbootable kernel

A kernel built with this .config file won't boot.
Comment 4 Andrew Morton 2007-10-27 22:18:53 UTC
You're saying that this problem does not occur in 2.6.23-6.fc8
and that it does occur in 2.6.23.1?  That seems a bit odd.

Maybe RH changed something, and changed their userspace to suit,
but that doesn't sound likely.

Anyway, hopefully Andreas will know what's going on here.
Comment 5 Josh Rosen 2007-10-28 05:19:35 UTC
It's possible that 2.6.23-6.fc8 was based on a release candidate and not the final 2.6.23. The problem occurs in 2.6.23 as well as 2.6.23.1; the 2.6.22.x kernels work fine, so there must have been a change that was introduced in 2.6.23.
Comment 6 Andreas Gruenbacher 2007-10-28 06:13:00 UTC
The failure occurs while checking filesystems in the initrd, while the filesystem check in the running system shows no problems, right? Which messages does the failing e2fsck spit out? I assume that it's the same fsck version in both cases?
Comment 7 Josh Rosen 2007-10-28 06:49:49 UTC
Yes, the failure occurs in the initrd; there are no failures when I run e2fsck. The only error messages that I have from the initrd are what you see on the snapshot that I attached.
Comment 8 Andreas Gruenbacher 2007-10-28 07:29:30 UTC
The fsck error message(s) must be on the part of the log that scrolled out. Can you attach a serial console to preserve that output, or add a sleep after the fsck so that you'll see those messages?
Comment 9 Josh Rosen 2007-10-28 07:57:17 UTC
Please explain the procedure for setting up a serial console. Give me step-by-step instructions, including the name of a terminal emulator (it's been years since I've used one). I assume I need to connect a serial cable to another box, set up a terminal emulator of some sort, and give some switch to the boot loader. Is there an easier way (over ethernet, perhaps)?
Comment 10 Josh Rosen 2007-10-28 09:17:30 UTC
It's a reporting problem: the error message fails to give any information about which partition has the problem or what the problem is. E2fsck reported that all of the partitions were clean when I ran it without switches; however, when I ran it with the -f switch it found problems with some of the attribute counts. After I fixed the problems I was able to boot.

So the main problem isn't with 2.6.23, which found the file system errors; it's with 2.6.22.x and earlier, which didn't find a problem. The problem with 2.6.23 is that it needs better error reporting. At the very least it should specify which partition has the file system problem; it would be better if it also specified what the problems are.
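
For reference, without -f e2fsck skips a filesystem whose superblock is marked clean, which is why the plain runs reported nothing. The following is a minimal sketch of a per-partition forced check that names the failing device. It is not taken from this report: `check_all` is a hypothetical helper name and the `FSCK` override exists only so the command can be substituted. `-f` forces the check and `-n` opens the filesystem read-only and answers "no" to every question, so nothing is modified.

```shell
#!/bin/sh
# Hypothetical helper: forced, read-only check of several partitions,
# reporting the result per device.

# The checker binary can be overridden through FSCK; defaults to e2fsck.
FSCK=${FSCK:-e2fsck}

# check_all DEVICE...
# Returns 0 if every device checked clean, 1 otherwise.
check_all() {
    status=0
    for dev in "$@"; do
        # -f: check even if the superblock says the filesystem is clean
        # -n: read-only, answer "no" to all repair prompts
        if "$FSCK" -f -n "$dev"; then
            echo "$dev: clean"
        else
            rc=$?
            echo "$dev: e2fsck exit status $rc"
            status=1
        fi
    done
    return "$status"
}
```

It would be run as root against unmounted partitions, for example `check_all /dev/sdXN /dev/sdYN` (placeholder device names). A non-zero e2fsck exit status for a device means errors were found or the check could not be completed; the e2fsck output printed just above that line says which.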
Comment 11 Andreas Gruenbacher 2007-11-14 23:56:57 UTC
I failed to reproduce, sorry. I'm not running Fedora and I'm not using SELinux, which exercises extended attributes quite a bit, so it may just be that the error doesn't trigger here. (That's still quite unlikely, since so far there is only this one bug report about it.)

Some testing advice: first, before running the suspect kernel version, make sure that the filesystem is actually consistent (e2fsck -f). Second, the initrd should be doing nothing more than a filesystem check. If you don't know how to debug the problem directly in the initrd (serial console or simply by modifying the linuxrc script in the initrd, for example), you may as well boot into a rescue system and run e2fsck on the filesystem(s) in question from there. Can the bug still be reproduced with 2.6.23.x?

An interesting data point in addition to the e2fsck error messages would be the inode size used (i.e., the output of ``/sbin/tune2fs -l /dev/sda4 | grep "Inode size"'').
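
As a small sketch of collecting that data point in script form: `inode_size` below is a hypothetical helper name, and it assumes the usual `tune2fs -l` layout in which the value follows an "Inode size:" label. It reads the `tune2fs -l` output on standard input and prints only the number.

```shell
#!/bin/sh
# Hypothetical helper: extract the inode size from "tune2fs -l" output
# supplied on standard input.
inode_size() {
    # Split on ":" and strip the padding around the value.
    awk -F: '/^Inode size:/ { gsub(/[ \t]+/, "", $2); print $2 }'
}
```

Usage would be `/sbin/tune2fs -l /dev/sda4 | inode_size`. The value matters because ext3 can only store extended attributes inside the inode itself when the inode is larger than 128 bytes; with 128-byte inodes they go into a separate block.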

Finally, in case we don't manage to clearly track this issue down here, a more appropriate place to report the problem and get more debugging help would be the Fedora community, who know about the specifics of the Fedora initrd, for example.