Bug 14979

Summary: "BUG: Bad page state in process mount" after poweroff, fsck, and remount (JFS)
Product: File System Reporter: Ben Blum (bblum)
Component: JFSAssignee: Dave Kleikamp (shaggy)
Status: RESOLVED OBSOLETE    
Severity: low CC: alan, bugzilla.kernel
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.31-r4 Subsystem:
Regression: No Bisected commit-id:

Description Ben Blum 2010-01-02 23:05:10 UTC
Machine had its plug pulled in the middle of some disk activity, upon rebooting, I tried fscking and remounting the filesystem rw. The following surfaced in dmesg (many times), and shortly afterward the kernel remounted as ro.

BUG: Bad page state in process mount  pfn:7be91
page:ffffea0001b1afb8 flags:400000000000080c count:0 mapcount:0 mapping:(null) index:3589ad
Pid: 19270, comm: mount Tainted: G    B      2.6.31-gentoo-r4 #1
Call Trace:
 [<ffffffff810a7a11>] ? bad_page+0xd1/0x140
 [<ffffffff810a8ba8>] ? __pagevec_free+0x38/0x50
 [<ffffffff810ac305>] ? release_pages+0x145/0x210
 [<ffffffff810aca19>] ? __pagevec_release+0x19/0x30
 [<ffffffff810ad505>] ? truncate_inode_pages_range+0x2b5/0x3b0
 [<ffffffff811f15e4>] ? jfs_remount+0x184/0x1d0
 [<ffffffff810d8637>] ? do_remount_sb+0x77/0x100
 [<ffffffff810f114c>] ? do_mount+0x46c/0x8e0
 [<ffffffff810f1688>] ? sys_mount+0xc8/0x100
 [<ffffffff8100b3ab>] ? system_call_fastpath+0x16/0x1b

Trying to fsck again emitted an A-OK message, and let me remount rw again, but still gave various filesystem error messages in dmesg, and the partition got remounted ro shortly afterward again. It took a reboot plus one more fsck to get everything consistent again.
Comment 1 Charles 2011-04-26 08:53:10 UTC
Similar error message after a resize ...

Environment: Debian 6.0 Squeeze 64-bit with an un-customised 2.6.32 kernel, LVM 2.02.66 and jfsutils 1.1.1

Summary: /var's JFS file system was full. lvmextended its volume. Ran mount -o noatime,remount,resize /var. /var became read-only. Ran shutdown -r as a "quick fix" but the server did not reboot until power cycled.

Details ...

root@LS1:~# df -hT /var
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/mapper/VG1-var
               jfs     28G   28G     0 100% /var
root@LS1:~# lvextend --size 100G /dev/VG1/var
  Extending logical volume var to 100.00 GiB
  Logical volume var successfully resized
root@LS1:~# grep var /etc/fstab
/dev/mapper/VG1-var /var            jfs     noatime         0       2
root@LS1:~# mount -o noatime,remount,resize /var
root@LS1:~# df -hT /var
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/mapper/VG1-var
               jfs    100G   28G   73G  28% /varmount -o noatime,remount,resize /var

The above resulted in /var being read-only (error from an mkdir command: chown: changing ownership of `/var/chroot/bind9/var/cache/bind': Read-only file system).

Tried to re-mount read-write:
Code:

root@LS1:~# mount -o noatime,remount,rw /var
mount: /var not mounted already, or bad option
root@LS1:~# mount -o remount /var
mount: /var not mounted already, or bad option
root@LS1:~# shutdown -r now

Broadcast message from root@LS1 (pts/0) (Thu Apr 21 15:44:07 2011):

The system is going down for reboot NOW!

The system did not reboot.  After power-cycling, checked the /var/log files which had continued to be written after /var went read-only (because syslogd was holding them open?) ...

From /var/log/syslog first sign of trouble: Apr 21 15:39:28 LS1 ntpd[1228]: frequency file /var/lib/ntp/ntp.drift.TEMP: Read-only file system after which many more caused by same interspersed with 23 sets of messages similar to
Code:

Apr 21 15:41:36 LS1 kernel: [280952.602400] BUG: Bad page state in process mount  pfn:133191
Apr 21 15:41:36 LS1 kernel: [280952.602468] page:ffffea000432d7b8 flags:020000000000080c count:0 mapcount:0 mapping:(null) index:b0007
Apr 21 15:41:36 LS1 kernel: [280952.602576] Pid: 9009, comm: mount Not tainted 2.6.32-5-amd64 #1
Apr 21 15:41:36 LS1 kernel: [280952.602580] Call Trace:
Apr 21 15:41:36 LS1 kernel: [280952.602593]  [<ffffffff810b77c1>] ? bad_page+0x116/0x129
Apr 21 15:41:36 LS1 kernel: [280952.602601]  [<ffffffff810b7caa>] ? free_pages_check+0x38/0x57
Apr 21 15:41:36 LS1 kernel: [280952.602608]  [<ffffffff810b900c>] ? free_hot_cold_page+0x46/0x191
Apr 21 15:41:36 LS1 kernel: [280952.602615]  [<ffffffff810b91c0>] ? __pagevec_free+0x69/0x80
Apr 21 15:41:36 LS1 kernel: [280952.602622]  [<ffffffff810bc073>] ? release_pages+0x137/0x18d
Apr 21 15:41:36 LS1 kernel: [280952.602630]  [<ffffffff810bc62b>] ? __pagevec_release+0x19/0x22
Apr 21 15:41:36 LS1 kernel: [280952.602637]  [<ffffffff810bcdac>] ? truncate_inode_pages_range+0xdb/0x2b0
Apr 21 15:41:36 LS1 kernel: [280952.602658]  [<ffffffffa0092a39>] ? jfs_remount+0xb8/0x15d [jfs]
Apr 21 15:41:36 LS1 kernel: [280952.602666]  [<ffffffff810eff0a>] ? do_remount_sb+0xe0/0x122
Apr 21 15:41:36 LS1 kernel: [280952.602674]  [<ffffffff81104753>] ? do_mount+0x27a/0x792
Apr 21 15:41:36 LS1 kernel: [280952.602681]  [<ffffffff81104ceb>] ? sys_mount+0x80/0xbd
Apr 21 15:41:36 LS1 kernel: [280952.602688]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
Apr 21 15:41:36 LS1 kernel: [280952.602693] Disabling lock debugging due to kernel taint

/var/log/syslog showed shutdown messages. The last (after normal named shutdown) was
Code:

Apr 21 15:44:29 LS1 ntpd[1228]: Deleting interface #6 tun0, 10.8.0.1#123, interface stats: received=0, sent=0, dropped=0, active_time=281100 secs

/var/log/messages had nothing extra.

/var/log/kern.log started with the last boot so not helpful.

/var/log/boot included:
Fri Apr 22 09:50:58 2011: fsck.jfs version 1.1.12, 24-Aug-2007
Fri Apr 22 09:50:58 2011: processing started: 4/22/2011 9.50.43
Fri Apr 22 09:50:58 2011: The current device is:  /dev/mapper/VG1-var
Fri Apr 22 09:50:58 2011: Block size in bytes:  4096
Fri Apr 22 09:50:58 2011: Filesystem size in blocks:  26214400
Fri Apr 22 09:50:58 2011: **Phase 0 - Replay Journal Log
Fri Apr 22 09:50:58 2011: **Phase 1 - Check Blocks, Files/Directories, and  Directory Entries
Fri Apr 22 09:50:58 2011: **Phase 2 - Count links
Fri Apr 22 09:50:58 2011: **Phase 3 - Duplicate Block Rescan and Directory Connectedness
Fri Apr 22 09:50:58 2011: **Phase 4 - Report Problems
Fri Apr 22 09:50:58 2011: **Phase 5 - Check Connectivity
Fri Apr 22 09:50:58 2011: **Phase 6 - Perform Approved Corrections
Fri Apr 22 09:50:58 2011: **Phase 7 - Rebuild File/Directory Allocation Maps
Fri Apr 22 09:50:58 2011: **Phase 8 - Rebuild Disk Allocation Maps
Fri Apr 22 09:50:58 2011: 104857600 kilobytes total disk space.
Fri Apr 22 09:50:58 2011:      5054 kilobytes in 1276 directories.
Fri Apr 22 09:50:58 2011:  29248733 kilobytes in 18843 user files.
Fri Apr 22 09:50:58 2011:       912 kilobytes in extended attributes
Fri Apr 22 09:50:58 2011:     62853 kilobytes reserved for system use.
Fri Apr 22 09:50:58 2011:  75550156 kilobytes are available for use.
Fri Apr 22 09:50:58 2011: Filesystem is clean.

Checking /var/log/syslog after a normal shutdown -r now, the last messages were:
Code:

Apr 22 16:52:21 LS1 ntpd[1256]: ntpd exiting on signal 15
Apr 22 16:52:21 LS1 acpid: exiting
Apr 22 16:52:21 LS1 rpc.statd[976]: Caught signal 15, un-registering and exiting
Apr 22 16:52:22 LS1 kernel: Kernel logging (proc) stopped.
Apr 22 16:52:22 LS1 rsyslogd: [origin software="rsyslogd" swVersion="4.6.4" x-pid="1116" x-info="http://www.rsyslog.com"] exiting on signal 15.

There have been no further signs of trouble with this file system after the automatic fsck during the boot initiated by power cycling.
Comment 2 Charles 2012-03-11 03:54:14 UTC
Similar error messages after resizing the same file system, this time without using noatime ...

Environment: Debian 6.0 Squeeze 64-bit with standard distro 2.6.32 kernel, lvm2
2.02.66-5 and jfsutils 1.1.12-2.1

Summary: /var's JFS file system was 85% used.  Ran lvextend --size 150G /dev/VG1/var.  Ran mount -o remount,resize /var.  When tested by touch /var/trash/ got "touch: cannot touch `/var/trash': Read-only file system".  shutdown -r hung.  Forced reboot with echo 1 > /proc/sys/kernel/sysrq and echo b > /proc/sysrq-trigger

Details ...

root@LS1:~# df -hT /var
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/mapper/VG1-var
               jfs    100G   85G   16G  85% /var
root@LS1:~# lvextend --size 150G /dev/VG1/var
  Extending logical volume var to 150.00 GiB
  Logical volume var successfully resized
root@LS1:~# mount -o remount,resize /var
root@LS1:~# df -hT /var
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/mapper/VG1-var
               jfs    150G   85G   66G  57% /var
root@LS1:~# touch /var/trash
touch: cannot touch `/var/trash': Read-only file system
root@LS1:~# mount -o remount,rw /var
mount: /var not mounted already, or bad option
root@LS1:~# shutdown -r now

Broadcast message from root@LS1 (pts/0) (Sun Mar 11 08:41:23 2012):

The system is going down for reboot NOW!

Closed the ssh session.  Waited a minute or so.  Started new ssh session.  /var still read only. Forced a reboot:
root@LS1:~# echo 1 > /proc/sys/kernel/sysrq
root@LS1:~# echo b > /proc/sysrq-trigger

From /var/log/syslog first sign of trouble:
Mar 11 08:38:56 LS1 kernel: [51235.389183] ERROR: (device dm-2): diFree: numfree > numinos
Mar 11 08:38:57 LS1 nmbd[18562]: [2012/03/11 08:38:57.487548,  0] nmbd/nmbd_winsserver.c:2380(wins_write_database)
Mar 11 08:38:57 LS1 nmbd[18562]:   wins_write_database: Can't open /var/lib/samba/wins.dat.18562. Error was Read-only file system
...
More messages cause by read-only /var
...
Mar 11 08:40:46 LS1 kernel: [51346.048759] BUG: Bad page state in process mount  pfn:13352e
Mar 11 08:40:46 LS1 kernel: [51346.048827] page:ffffea000433a210 flags:020000000000080c count:0 mapcount:0 mapping:(null) index:b0007
Mar 11 08:40:46 LS1 kernel: [51346.048933] Pid: 18615, comm: mount Not tainted 2.6.32-5-amd64 #1
Mar 11 08:40:46 LS1 kernel: [51346.048937] Call Trace:
Mar 11 08:40:46 LS1 kernel: [51346.048949]  [<ffffffff810b7d41>] ? bad_page+0x116/0x129
Mar 11 08:40:46 LS1 kernel: [51346.048957]  [<ffffffff810b822a>] ? free_pages_check+0x38/0x57
Mar 11 08:40:46 LS1 kernel: [51346.048964]  [<ffffffff810b958c>] ? free_hot_cold_page+0x46/0x191
Mar 11 08:40:46 LS1 kernel: [51346.048970]  [<ffffffff810b9740>] ? __pagevec_free+0x69/0x80
Mar 11 08:40:46 LS1 kernel: [51346.048977]  [<ffffffff810bc5fb>] ? release_pages+0x137/0x18d
Mar 11 08:40:46 LS1 kernel: [51346.048985]  [<ffffffff810bcbb3>] ? __pagevec_release+0x19/0x22
Mar 11 08:40:46 LS1 kernel: [51346.048992]  [<ffffffff810bd334>] ? truncate_inode_pages_range+0xdb/0x2b0
Mar 11 08:40:46 LS1 kernel: [51346.049014]  [<ffffffffa0095a39>] ? jfs_remount+0xb8/0x15d [jfs]
Mar 11 08:40:46 LS1 kernel: [51346.049022]  [<ffffffff810f056a>] ? do_remount_sb+0xe0/0x122
Mar 11 08:40:46 LS1 kernel: [51346.049030]  [<ffffffff81104dcf>] ? do_mount+0x27a/0x792
Mar 11 08:40:46 LS1 kernel: [51346.049040]  [<ffffffff81105367>] ? sys_mount+0x80/0xbd
Mar 11 08:40:46 LS1 kernel: [51346.049049]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
Mar 11 08:40:46 LS1 kernel: [51346.049055] Disabling lock debugging due to kernel taint
Mar 11 08:40:46 LS1 kernel: [51346.050735] BUG: Bad page state in process mount  pfn:13c366
Mar 11 08:40:46 LS1 kernel: [51346.050808] page:ffffea000452be50 flags:020000000000080c count:0 mapcount:0 mapping:(null) index:3c0005
Mar 11 08:40:46 LS1 kernel: [51346.050921] Pid: 18615, comm: mount Tainted: G    B      2.6.32-5-amd64 #1
Mar 11 08:40:46 LS1 kernel: [51346.050926] Call Trace:
Mar 11 08:40:46 LS1 kernel: [51346.050938]  [<ffffffff810b7d41>] ? bad_page+0x116/0x129
Mar 11 08:40:46 LS1 kernel: [51346.050946]  [<ffffffff810b822a>] ? free_pages_check+0x38/0x57
Mar 11 08:40:46 LS1 kernel: [51346.050952]  [<ffffffff810b958c>] ? free_hot_cold_page+0x46/0x191
Mar 11 08:40:46 LS1 kernel: [51346.050959]  [<ffffffff810b9740>] ? __pagevec_free+0x69/0x80
Mar 11 08:40:46 LS1 kernel: [51346.050966]  [<ffffffff810bc5fb>] ? release_pages+0x137/0x18d
Mar 11 08:40:46 LS1 kernel: [51346.050974]  [<ffffffff810bcbb3>] ? __pagevec_release+0x19/0x22
Mar 11 08:40:46 LS1 kernel: [51346.050981]  [<ffffffff810bd334>] ? truncate_inode_pages_range+0xdb/0x2b0
Mar 11 08:40:46 LS1 kernel: [51346.051002]  [<ffffffffa0095a39>] ? jfs_remount+0xb8/0x15d [jfs]
Mar 11 08:40:46 LS1 kernel: [51346.051010]  [<ffffffff810f056a>] ? do_remount_sb+0xe0/0x122
Mar 11 08:40:46 LS1 kernel: [51346.051017]  [<ffffffff81104dcf>] ? do_mount+0x27a/0x792
Mar 11 08:40:46 LS1 kernel: [51346.051024]  [<ffffffff81105367>] ? sys_mount+0x80/0xbd
Mar 11 08:40:46 LS1 kernel: [51346.051032]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
Mar 11 08:40:46 LS1 kernel: [51346.052427] BUG: Bad page state in process mount  pfn:13acd5
...
Similar repeated several times.

After reboot, /var/log/boot included:
Sun Mar 11 08:51:41 2012: fsck.jfs version 1.1.12, 24-Aug-2007
Sun Mar 11 08:51:41 2012: processing started: 3/11/2012 8.51.11
Sun Mar 11 08:51:41 2012: The current device is:  /dev/mapper/VG1-var
Sun Mar 11 08:51:41 2012: Block size in bytes:  4096
Sun Mar 11 08:51:41 2012: Filesystem size in blocks:  39321600
Sun Mar 11 08:51:41 2012: **Phase 0 - Replay Journal Log
Sun Mar 11 08:51:41 2012: **Phase 1 - Check Blocks, Files/Directories, and  Directory Entries
Sun Mar 11 08:51:41 2012: **Phase 2 - Count links
Sun Mar 11 08:51:41 2012: **Phase 3 - Duplicate Block Rescan and Directory Connectedness
Sun Mar 11 08:51:41 2012: **Phase 4 - Report Problems
Sun Mar 11 08:51:41 2012: **Phase 5 - Check Connectivity
Sun Mar 11 08:51:41 2012: **Phase 6 - Perform Approved Corrections
Sun Mar 11 08:51:41 2012: **Phase 7 - Rebuild File/Directory Allocation Maps
Sun Mar 11 08:51:41 2012: **Phase 8 - Rebuild Disk Allocation Maps
Sun Mar 11 08:51:41 2012: 157286400 kilobytes total disk space.
Sun Mar 11 08:51:41 2012:     25092 kilobytes in 2073 directories.
Sun Mar 11 08:51:41 2012:  88298381 kilobytes in 31403 user files.
Sun Mar 11 08:51:41 2012:       916 kilobytes in extended attributes
Sun Mar 11 08:51:41 2012:    116559 kilobytes reserved for system use.
Sun Mar 11 08:51:41 2012:  68895636 kilobytes are available for use.
Sun Mar 11 08:51:41 2012: Filesystem is clean.