Bug 14979
Summary: | "BUG: Bad page state in process mount" after poweroff, fsck, and remount (JFS) | ||
---|---|---|---|
Product: | File System | Reporter: | Ben Blum (bblum) |
Component: | JFS | Assignee: | Dave Kleikamp (shaggy) |
Status: | RESOLVED OBSOLETE | ||
Severity: | low | CC: | alan, bugzilla.kernel |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.31-r4 | Subsystem: | |
Regression: | No | Bisected commit-id: |
Description
Ben Blum
2010-01-02 23:05:10 UTC
Similar error message after a resize ... Environment: Debian 6.0 Squeeze 64-bit with an un-customised 2.6.32 kernel, LVM 2.02.66 and jfsutils 1.1.1 Summary: /var's JFS file system was full. lvmextended its volume. Ran mount -o noatime,remount,resize /var. /var became read-only. Ran shutdown -r as a "quick fix" but the server did not reboot until power cycled. Details ... root@LS1:~# df -hT /var Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/VG1-var jfs 28G 28G 0 100% /var root@LS1:~# lvextend --size 100G /dev/VG1/var Extending logical volume var to 100.00 GiB Logical volume var successfully resized root@LS1:~# grep var /etc/fstab /dev/mapper/VG1-var /var jfs noatime 0 2 root@LS1:~# mount -o noatime,remount,resize /var root@LS1:~# df -hT /var Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/VG1-var jfs 100G 28G 73G 28% /varmount -o noatime,remount,resize /var The above resulted in /var being read-only (error from an mkdir command: chown: changing ownership of `/var/chroot/bind9/var/cache/bind': Read-only file system). Tried to re-mount read-write: Code: root@LS1:~# mount -o noatime,remount,rw /var mount: /var not mounted already, or bad option root@LS1:~# mount -o remount /var mount: /var not mounted already, or bad option root@LS1:~# shutdown -r now Broadcast message from root@LS1 (pts/0) (Thu Apr 21 15:44:07 2011): The system is going down for reboot NOW! The system did not reboot. After power-cycling, checked the /var/log files which had continued to be written after /var went read-only (because syslogd was holding them open?) ... From /var/log/syslog first sign of trouble: Apr 21 15:39:28 LS1 ntpd[1228]: frequency file /var/lib/ntp/ntp.drift.TEMP: Read-only file system after which many more caused by same interspersed with 23 sets of messages similar to Code: Apr 21 15:41:36 LS1 kernel: [280952.602400] BUG: Bad page state in process mount pfn:133191 Apr 21 15:41:36 LS1 kernel: [280952.602468] page:ffffea000432d7b8 flags:020000000000080c count:0 mapcount:0 mapping:(null) index:b0007 Apr 21 15:41:36 LS1 kernel: [280952.602576] Pid: 9009, comm: mount Not tainted 2.6.32-5-amd64 #1 Apr 21 15:41:36 LS1 kernel: [280952.602580] Call Trace: Apr 21 15:41:36 LS1 kernel: [280952.602593] [<ffffffff810b77c1>] ? bad_page+0x116/0x129 Apr 21 15:41:36 LS1 kernel: [280952.602601] [<ffffffff810b7caa>] ? free_pages_check+0x38/0x57 Apr 21 15:41:36 LS1 kernel: [280952.602608] [<ffffffff810b900c>] ? free_hot_cold_page+0x46/0x191 Apr 21 15:41:36 LS1 kernel: [280952.602615] [<ffffffff810b91c0>] ? __pagevec_free+0x69/0x80 Apr 21 15:41:36 LS1 kernel: [280952.602622] [<ffffffff810bc073>] ? release_pages+0x137/0x18d Apr 21 15:41:36 LS1 kernel: [280952.602630] [<ffffffff810bc62b>] ? __pagevec_release+0x19/0x22 Apr 21 15:41:36 LS1 kernel: [280952.602637] [<ffffffff810bcdac>] ? truncate_inode_pages_range+0xdb/0x2b0 Apr 21 15:41:36 LS1 kernel: [280952.602658] [<ffffffffa0092a39>] ? jfs_remount+0xb8/0x15d [jfs] Apr 21 15:41:36 LS1 kernel: [280952.602666] [<ffffffff810eff0a>] ? do_remount_sb+0xe0/0x122 Apr 21 15:41:36 LS1 kernel: [280952.602674] [<ffffffff81104753>] ? do_mount+0x27a/0x792 Apr 21 15:41:36 LS1 kernel: [280952.602681] [<ffffffff81104ceb>] ? sys_mount+0x80/0xbd Apr 21 15:41:36 LS1 kernel: [280952.602688] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b Apr 21 15:41:36 LS1 kernel: [280952.602693] Disabling lock debugging due to kernel taint /var/log/syslog showed shutdown messages. The last (after normal named shutdown) was Code: Apr 21 15:44:29 LS1 ntpd[1228]: Deleting interface #6 tun0, 10.8.0.1#123, interface stats: received=0, sent=0, dropped=0, active_time=281100 secs /var/log/messages had nothing extra. /var/log/kern.log started with the last boot so not helpful. /var/log/boot included: Fri Apr 22 09:50:58 2011: fsck.jfs version 1.1.12, 24-Aug-2007 Fri Apr 22 09:50:58 2011: processing started: 4/22/2011 9.50.43 Fri Apr 22 09:50:58 2011: The current device is: /dev/mapper/VG1-var Fri Apr 22 09:50:58 2011: Block size in bytes: 4096 Fri Apr 22 09:50:58 2011: Filesystem size in blocks: 26214400 Fri Apr 22 09:50:58 2011: **Phase 0 - Replay Journal Log Fri Apr 22 09:50:58 2011: **Phase 1 - Check Blocks, Files/Directories, and Directory Entries Fri Apr 22 09:50:58 2011: **Phase 2 - Count links Fri Apr 22 09:50:58 2011: **Phase 3 - Duplicate Block Rescan and Directory Connectedness Fri Apr 22 09:50:58 2011: **Phase 4 - Report Problems Fri Apr 22 09:50:58 2011: **Phase 5 - Check Connectivity Fri Apr 22 09:50:58 2011: **Phase 6 - Perform Approved Corrections Fri Apr 22 09:50:58 2011: **Phase 7 - Rebuild File/Directory Allocation Maps Fri Apr 22 09:50:58 2011: **Phase 8 - Rebuild Disk Allocation Maps Fri Apr 22 09:50:58 2011: 104857600 kilobytes total disk space. Fri Apr 22 09:50:58 2011: 5054 kilobytes in 1276 directories. Fri Apr 22 09:50:58 2011: 29248733 kilobytes in 18843 user files. Fri Apr 22 09:50:58 2011: 912 kilobytes in extended attributes Fri Apr 22 09:50:58 2011: 62853 kilobytes reserved for system use. Fri Apr 22 09:50:58 2011: 75550156 kilobytes are available for use. Fri Apr 22 09:50:58 2011: Filesystem is clean. Checking /var/log/syslog after a normal shutdown -r now, the last messages were: Code: Apr 22 16:52:21 LS1 ntpd[1256]: ntpd exiting on signal 15 Apr 22 16:52:21 LS1 acpid: exiting Apr 22 16:52:21 LS1 rpc.statd[976]: Caught signal 15, un-registering and exiting Apr 22 16:52:22 LS1 kernel: Kernel logging (proc) stopped. Apr 22 16:52:22 LS1 rsyslogd: [origin software="rsyslogd" swVersion="4.6.4" x-pid="1116" x-info="http://www.rsyslog.com"] exiting on signal 15. There have been no further signs of trouble with this file system after the automatic fsck during the boot initiated by power cycling. Similar error messages after resizing the same file system, this time without using noatime ... Environment: Debian 6.0 Squeeze 64-bit with standard distro 2.6.32 kernel, lvm2 2.02.66-5 and jfsutils 1.1.12-2.1 Summary: /var's JFS file system was 85% used. Ran lvextend --size 150G /dev/VG1/var. Ran mount -o remount,resize /var. When tested by touch /var/trash/ got "touch: cannot touch `/var/trash': Read-only file system". shutdown -r hung. Forced reboot with echo 1 > /proc/sys/kernel/sysrq and echo b > /proc/sysrq-trigger Details ... root@LS1:~# df -hT /var Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/VG1-var jfs 100G 85G 16G 85% /var root@LS1:~# lvextend --size 150G /dev/VG1/var Extending logical volume var to 150.00 GiB Logical volume var successfully resized root@LS1:~# mount -o remount,resize /var root@LS1:~# df -hT /var Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/VG1-var jfs 150G 85G 66G 57% /var root@LS1:~# touch /var/trash touch: cannot touch `/var/trash': Read-only file system root@LS1:~# mount -o remount,rw /var mount: /var not mounted already, or bad option root@LS1:~# shutdown -r now Broadcast message from root@LS1 (pts/0) (Sun Mar 11 08:41:23 2012): The system is going down for reboot NOW! Closed the ssh session. Waited a minute or so. Started new ssh session. /var still read only. Forced a reboot: root@LS1:~# echo 1 > /proc/sys/kernel/sysrq root@LS1:~# echo b > /proc/sysrq-trigger From /var/log/syslog first sign of trouble: Mar 11 08:38:56 LS1 kernel: [51235.389183] ERROR: (device dm-2): diFree: numfree > numinos Mar 11 08:38:57 LS1 nmbd[18562]: [2012/03/11 08:38:57.487548, 0] nmbd/nmbd_winsserver.c:2380(wins_write_database) Mar 11 08:38:57 LS1 nmbd[18562]: wins_write_database: Can't open /var/lib/samba/wins.dat.18562. Error was Read-only file system ... More messages cause by read-only /var ... Mar 11 08:40:46 LS1 kernel: [51346.048759] BUG: Bad page state in process mount pfn:13352e Mar 11 08:40:46 LS1 kernel: [51346.048827] page:ffffea000433a210 flags:020000000000080c count:0 mapcount:0 mapping:(null) index:b0007 Mar 11 08:40:46 LS1 kernel: [51346.048933] Pid: 18615, comm: mount Not tainted 2.6.32-5-amd64 #1 Mar 11 08:40:46 LS1 kernel: [51346.048937] Call Trace: Mar 11 08:40:46 LS1 kernel: [51346.048949] [<ffffffff810b7d41>] ? bad_page+0x116/0x129 Mar 11 08:40:46 LS1 kernel: [51346.048957] [<ffffffff810b822a>] ? free_pages_check+0x38/0x57 Mar 11 08:40:46 LS1 kernel: [51346.048964] [<ffffffff810b958c>] ? free_hot_cold_page+0x46/0x191 Mar 11 08:40:46 LS1 kernel: [51346.048970] [<ffffffff810b9740>] ? __pagevec_free+0x69/0x80 Mar 11 08:40:46 LS1 kernel: [51346.048977] [<ffffffff810bc5fb>] ? release_pages+0x137/0x18d Mar 11 08:40:46 LS1 kernel: [51346.048985] [<ffffffff810bcbb3>] ? __pagevec_release+0x19/0x22 Mar 11 08:40:46 LS1 kernel: [51346.048992] [<ffffffff810bd334>] ? truncate_inode_pages_range+0xdb/0x2b0 Mar 11 08:40:46 LS1 kernel: [51346.049014] [<ffffffffa0095a39>] ? jfs_remount+0xb8/0x15d [jfs] Mar 11 08:40:46 LS1 kernel: [51346.049022] [<ffffffff810f056a>] ? do_remount_sb+0xe0/0x122 Mar 11 08:40:46 LS1 kernel: [51346.049030] [<ffffffff81104dcf>] ? do_mount+0x27a/0x792 Mar 11 08:40:46 LS1 kernel: [51346.049040] [<ffffffff81105367>] ? sys_mount+0x80/0xbd Mar 11 08:40:46 LS1 kernel: [51346.049049] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b Mar 11 08:40:46 LS1 kernel: [51346.049055] Disabling lock debugging due to kernel taint Mar 11 08:40:46 LS1 kernel: [51346.050735] BUG: Bad page state in process mount pfn:13c366 Mar 11 08:40:46 LS1 kernel: [51346.050808] page:ffffea000452be50 flags:020000000000080c count:0 mapcount:0 mapping:(null) index:3c0005 Mar 11 08:40:46 LS1 kernel: [51346.050921] Pid: 18615, comm: mount Tainted: G B 2.6.32-5-amd64 #1 Mar 11 08:40:46 LS1 kernel: [51346.050926] Call Trace: Mar 11 08:40:46 LS1 kernel: [51346.050938] [<ffffffff810b7d41>] ? bad_page+0x116/0x129 Mar 11 08:40:46 LS1 kernel: [51346.050946] [<ffffffff810b822a>] ? free_pages_check+0x38/0x57 Mar 11 08:40:46 LS1 kernel: [51346.050952] [<ffffffff810b958c>] ? free_hot_cold_page+0x46/0x191 Mar 11 08:40:46 LS1 kernel: [51346.050959] [<ffffffff810b9740>] ? __pagevec_free+0x69/0x80 Mar 11 08:40:46 LS1 kernel: [51346.050966] [<ffffffff810bc5fb>] ? release_pages+0x137/0x18d Mar 11 08:40:46 LS1 kernel: [51346.050974] [<ffffffff810bcbb3>] ? __pagevec_release+0x19/0x22 Mar 11 08:40:46 LS1 kernel: [51346.050981] [<ffffffff810bd334>] ? truncate_inode_pages_range+0xdb/0x2b0 Mar 11 08:40:46 LS1 kernel: [51346.051002] [<ffffffffa0095a39>] ? jfs_remount+0xb8/0x15d [jfs] Mar 11 08:40:46 LS1 kernel: [51346.051010] [<ffffffff810f056a>] ? do_remount_sb+0xe0/0x122 Mar 11 08:40:46 LS1 kernel: [51346.051017] [<ffffffff81104dcf>] ? do_mount+0x27a/0x792 Mar 11 08:40:46 LS1 kernel: [51346.051024] [<ffffffff81105367>] ? sys_mount+0x80/0xbd Mar 11 08:40:46 LS1 kernel: [51346.051032] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b Mar 11 08:40:46 LS1 kernel: [51346.052427] BUG: Bad page state in process mount pfn:13acd5 ... Similar repeated several times. After reboot, /var/log/boot included: Sun Mar 11 08:51:41 2012: fsck.jfs version 1.1.12, 24-Aug-2007 Sun Mar 11 08:51:41 2012: processing started: 3/11/2012 8.51.11 Sun Mar 11 08:51:41 2012: The current device is: /dev/mapper/VG1-var Sun Mar 11 08:51:41 2012: Block size in bytes: 4096 Sun Mar 11 08:51:41 2012: Filesystem size in blocks: 39321600 Sun Mar 11 08:51:41 2012: **Phase 0 - Replay Journal Log Sun Mar 11 08:51:41 2012: **Phase 1 - Check Blocks, Files/Directories, and Directory Entries Sun Mar 11 08:51:41 2012: **Phase 2 - Count links Sun Mar 11 08:51:41 2012: **Phase 3 - Duplicate Block Rescan and Directory Connectedness Sun Mar 11 08:51:41 2012: **Phase 4 - Report Problems Sun Mar 11 08:51:41 2012: **Phase 5 - Check Connectivity Sun Mar 11 08:51:41 2012: **Phase 6 - Perform Approved Corrections Sun Mar 11 08:51:41 2012: **Phase 7 - Rebuild File/Directory Allocation Maps Sun Mar 11 08:51:41 2012: **Phase 8 - Rebuild Disk Allocation Maps Sun Mar 11 08:51:41 2012: 157286400 kilobytes total disk space. Sun Mar 11 08:51:41 2012: 25092 kilobytes in 2073 directories. Sun Mar 11 08:51:41 2012: 88298381 kilobytes in 31403 user files. Sun Mar 11 08:51:41 2012: 916 kilobytes in extended attributes Sun Mar 11 08:51:41 2012: 116559 kilobytes reserved for system use. Sun Mar 11 08:51:41 2012: 68895636 kilobytes are available for use. Sun Mar 11 08:51:41 2012: Filesystem is clean. |