Bug 27492 - BUG: unable to handle kernel NULL pointer dereference, on high filesystem io
Summary: BUG: unable to handle kernel NULL pointer dereference, on high filesystem io
Status: RESOLVED OBSOLETE
Alias: None
Product: File System
Classification: Unclassified
Component: XFS (show other bugs)
Hardware: All Linux
: P1 blocking
Assignee: XFS Guru
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-01-24 15:02 UTC by Jochen Buchholz
Modified: 2013-12-11 12:03 UTC (History)
6 users (show)

See Also:
Kernel Version: 2.6.37
Subsystem:
Regression: No
Bisected commit-id:


Attachments
2.6.37 kernel config (95.59 KB, application/octet-stream)
2011-01-24 15:02 UTC, Jochen Buchholz
Details
vmap alias flush on failure (1.35 KB, patch)
2011-02-02 00:47 UTC, Dave Chinner
Details | Diff
Fix oops on directory read error (1.15 KB, patch)
2011-03-11 22:35 UTC, Dave Chinner
Details | Diff

Description Jochen Buchholz 2011-01-24 15:02:56 UTC
Created attachment 45012 [details]
2.6.37 kernel config

This bug happens on >= kernels since 2.6.34 on Gentoo 32 bit sytems: 
http://bugs.gentoo.org/show_bug.cgi?id=351938
I had many XFS chrashes on my Lenovo T61 Notebook with lvm under xfs (32 bit system (Intel(R) Core(TM)2 Duo CPU T8300  @ 2.40GHz, sata) and also under virtual kvm guests describet in the above gentoo bug. The systems chrashed also with vmalloc=256M very often. Kernel's <=2.6.32 works fine. 

Now I fetch the actual version 2.6.37 from kernel.org compile and boot my Gentoo kvm guest. I start an high load compilation with emerge and make -j 10 and distcc to a 64bit crosscompiler system. after compiling 15 packages I got the following:

xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
BUG: unable to handle kernel NULL pointer dereference at 00000008
IP: [<f9187b72>] xfs_da_do_buf+0x4c8/0x61f [xfs]
*pde = 00000000 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/virtio-pci/virtio1/net/eth0/broadcast
Modules linked in: ipv6 nfsd rtc_cmos tpm_tis rtc_core tpm tpm_bios rtc_lib i2c_piix4 virtio_net i6300esb i2c_core virtio_balloon processor pcspkr floppy thermal_sys button tg3 libphy e1000 fuse xfs exportfs nfs auth_rpcgss nfs_acl lockd sunrpc jfs raid10 dm_snapshot dm_crypt dm_mirror dm_region_hash dm_log dm_mod scsi_wait_scan sl811_hcd usbhid ohci_hcd ssb uhci_hcd usb_storage ehci_hcd usbcore aic94xx libsas lpfc qla2xxx megaraid_sas megaraid_mbox megaraid_mm megaraid aacraid sx8 DAC960 cciss 3w_9xxx 3w_xxxx mptsas scsi_transport_sas mptfc scsi_transport_fc scsi_tgt mptspi mptscsih mptbase atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx qlogicfas408 gdth advansys initio BusLogic arcmsr aic7xxx aic79xx scsi_transport_spi sg pdc_adma sata_inic162x sata_mv ata_piix ahci libahci sata_qstor sata_vsc sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise pata_pcmcia pcmcia pcmcia_core

Pid: 425, comm: i686-pc-linux-g Not tainted 2.6.37 #1 /Bochs
EIP: 0060:[<f9187b72>] EFLAGS: 00210246 CPU: 1
EIP is at xfs_da_do_buf+0x4c8/0x61f [xfs]
EAX: 00000001 EBX: f5947800 ECX: 00000008 EDX: c5631d64
ESI: 00000000 EDI: 00000000 EBP: c5631d74 ESP: c5631d0c
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process i686-pc-linux-g (pid: 425, ti=c5630000 task=c5248960 task.ti=c5630000)
Stack:
 007adb5f 00000000 c5631dc0 c55870dc f4c26220 00000000 00000001 00000000
 00000000 ffffffff c55870c0 00000000 00000000 c5631d48 c011d903 c5631d5c
 c0414e4e c5631d5c 00200286 eb74ebe0 c5631d6c 00000001 00000000 c5631d9c
Call Trace:
 [<c011d903>] ? default_spin_lock_flags+0x8/0xd
 [<c0414e4e>] ? _raw_spin_lock_irqsave+0x20/0x28
 [<f9187d25>] ? xfs_da_read_buf+0x18/0x1d [xfs]
 [<f918af47>] ? xfs_dir2_block_lookup_int+0x34/0x178 [xfs]
 [<f918af47>] ? xfs_dir2_block_lookup_int+0x34/0x178 [xfs]
 [<f919fe7c>] ? xfs_log_release_iclog+0x11/0x3a [xfs]
 [<f918b3a0>] ? xfs_dir2_block_removename+0x3d/0x18a [xfs]
 [<f9179022>] ? xfs_bmap_last_offset+0xe4/0xfa [xfs]
 [<f9189c6b>] ? xfs_dir2_isblock+0x18/0x62 [xfs]
 [<f918a213>] ? xfs_dir_removename+0xb4/0xe6 [xfs]
 [<f91ad30a>] ? xfs_remove+0x1d9/0x306 [xfs]
 [<f91b6d1a>] ? xfs_vn_unlink+0x30/0x62 [xfs]
 [<c01bfdd2>] ? vfs_unlink+0x5f/0xbc
 [<c01c1c3b>] ? do_unlinkat+0x94/0xfe
 [<c01a8e06>] ? do_munmap+0x217/0x231
 [<c01c1cb5>] ? sys_unlink+0x10/0x12
 [<c0102858>] ? sysenter_do_call+0x12/0x28
Code: 55 b8 8b 45 b4 eb 13 31 ff 83 7d f0 00 74 12 8b 4d 18 8d 55 f0 b8 01 00 00 00 e8 4c fa ff ff 89 c7 83 7d 14 01 0f 85 c7 00 00 00 <8b> 57 08 89 55 b4 89 d1 8b 52 08 8b 01 0f c8 86 f2 0f b7 d2 81 
EIP: [<f9187b72>] xfs_da_do_buf+0x4c8/0x61f [xfs] SS:ESP 0068:c5631d0c
CR2: 0000000000000008
---[ end trace 4aa7015e260c3faa ]---
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages

 cat /proc/meminfo | fgrep -i vmalloc
VmallocTotal:     122880 kB
VmallocUsed:       14088 kB
VmallocChunk:      81420 kB
Comment 1 Jochen Buchholz 2011-01-28 17:54:01 UTC
The next chrash with kernel 2.6.37 and the inkonsistent xfs filesystem under  param vmalloc=256M bootparam has an out of memeory message:
./
rsync: readlink_stat("/usr/portage/dev-perl/Algorithm-Annotate") failed: Cannot allocate memory (12)
rsync: readlink_stat("/usr/portage/dev-perl/CGI-Session") failed: Cannot allocate memory (12)
rsync: readlink_stat("/usr/portage/dev-perl/Class-XSAccessor") failed: Cannot allocate memory (12)
rsync: readlink_stat("/usr/portage/dev-perl/Math-Random") failed: Cannot allocate memory (12)
rsync: readlink_stat("/usr/portage/dev-perl/PostScript-Simple") failed: Cannot allocate memory (12)
IO error encountered -- skipping file deletion
rsync: recv_generator: failed to stat "/usr/portage/dev-php/metadata.xml": Cannot allocate memory (12)
rsync: recv_generator: failed to stat "/usr/portage/kde-base/metadata.xml": Cannot allocate memory (12)
metadata/
metadata/timestamp.chk
rsync: recv_generator: failed to stat "/usr/portage/metadata/cache/dev-embedded/arduino-0015-r1": Cannot allocate memory (12)
rsync: recv_generator: failed to stat "/usr/portage/metadata/cache/dev-perl/XML-NodeFilter-0.01": Cannot allocate memory (12)
rsync: recv_generator: failed to stat "/usr/portage/metadata/cache/dev-texlive/texlive-basic-2008-r1": Cannot allocate memory (12)
rsync: recv_generator: failed to stat "/usr/portage/metadata/cache/gnome-extra/assogiate-0.2.1": Cannot allocate memory (12)
rsync: recv_generator: failed to stat "/usr/portage/metadata/cache/kde-misc/adjustableclock-2.6.1": Cannot allocate memory (12)
rsync: recv_generator: failed to stat "/usr/portage/metadata/cache/media-plugins/gst-plugins-ladspa-0.10.18": Cannot allocate memory (12)
rsync: recv_generator: failed to stat "/usr/portage/metadata/cache/media-plugins/gst-plugins-vorbis-0.10.28": Cannot allocate memory (12)
rsync: recv_generator: failed to stat "/usr/portage/metadata/cache/media-plugins/mythmusic-0.22_p22763": Cannot allocate memory (12)
rsync: recv_generator: failed to stat "/usr/portage/metadata/cache/media-sound/gpodder-2.9": Cannot allocate memory (12)
rsync: recv_generator: failed to stat "/usr/portage/metadata/cache/media-sound/justeport-0.2": Cannot allocate memory (12)
rsync: recv_generator: failed to stat "/usr/portage/metadata/cache/media-sound/mt-daapd-0.2.4.2": Cannot allocate memory (12)
rsync: recv_generator: failed to stat "/usr/portage/metadata/cache/media-sound/phonon-4.3.50_pre20090520": Cannot allocate memory (12)
rsync: recv_generator: failed to stat "/usr/portage/metadata/cache/media-sound/qjackctl-0.3.2": Cannot allocate memory (12)
rsync: recv_generator: failed to stat "/usr/portage/metadata/cache/media-sound/sexypsf-0.4.7": Cannot allocate memory (12)
rsync: recv_generator: failed to stat "/usr/portage/metadata/cache/net-misc/modemmanager-0.4": Cannot allocate memory (12)
.
.
.


dmesg:
 vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
.
.
.

# cat /proc/meminfo 
MemTotal:         775732 kB
MemFree:           84396 kB
Buffers:              20 kB
Cached:           302024 kB
SwapCached:         4564 kB
Active:           211520 kB
Inactive:         123696 kB
Active(anon):      15708 kB
Inactive(anon):    17932 kB
Active(file):     195812 kB
Inactive(file):   105764 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:        278524 kB
HighFree:            572 kB
LowTotal:         497208 kB
LowFree:           83824 kB
SwapTotal:       1025020 kB
SwapFree:         956068 kB
Dirty:                24 kB
Writeback:             0 kB
AnonPages:         29108 kB
Mapped:             6152 kB
Shmem:               468 kB
Slab:             314676 kB
SReclaimable:     301996 kB
SUnreclaim:        12680 kB
KernelStack:        1760 kB
PageTables:         1108 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1412884 kB
Committed_AS:     761328 kB
VmallocTotal:     262144 kB
VmallocUsed:       14088 kB
VmallocChunk:     215384 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       4096 kB
DirectMap4k:        8184 kB
DirectMap4M:      761856 kB
The XFS filesystem is unusable under 2.6.37 and an x86 32 bit system.
Comment 2 Jochen Buchholz 2011-01-31 23:14:56 UTC
Next crash in the middle of recompiling openoffice:

map allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
BUG: unable to handle kernel NULL pointer dereference at 00000008
IP: [<f0987b72>] xfs_da_do_buf+0x4c8/0x61f [xfs]
*pde = 00000000 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/virtio-pci/virtio1/net/eth0/broadcast
Modules linked in: ipv6 nfsd tpm_tis tpm rtc_cmos tpm_bios rtc_core i2c_piix4 i6300esb rtc_lib i2c_core floppy pcspkr virtio_net virtio_balloon processor thermal_sys button tg3 libphy e1000 fuse xfs exportfs nfs auth_rpcgss nfs_acl lockd sunrpc jfs raid10 dm_snapshot dm_crypt dm_mirror dm_region_hash dm_log dm_mod scsi_wait_scan sl811_hcd usbhid ohci_hcd ssb uhci_hcd usb_storage ehci_hcd usbcore aic94xx libsas lpfc qla2xxx megaraid_sas megaraid_mbox megaraid_mm megaraid aacraid sx8 DAC960 cciss 3w_9xxx 3w_xxxx mptsas scsi_transport_sas mptfc scsi_transport_fc scsi_tgt mptspi mptscsih mptbase atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx qlogicfas408 gdth advansys initio BusLogic arcmsr aic7xxx aic79xx scsi_transport_spi sg pdc_adma sata_inic162x sata_mv ata_piix ahci libahci sata_qstor sata_vsc sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise pata_pcmcia pcmcia pcmcia_core

Pid: 14479, comm: patch Not tainted 2.6.37 #1 /Bochs
EIP: 0060:[<f0987b72>] EFLAGS: 00210246 CPU: 1
EIP is at xfs_da_do_buf+0x4c8/0x61f [xfs]
EAX: 00000001 EBX: ebd84c00 ECX: 00000008 EDX: c07f3cb0
ESI: 00000000 EDI: 00000000 EBP: c07f3cc0 ESP: c07f3c58
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process patch (pid: 14479, ti=c07f2000 task=e6024100 task.ti=c07f2000)
Stack:
 00613a81 00000000 ffffffff e1e4629c ec917380 00000000 00000001 00000000
 00000000 ffffffff e1e46280 00000000 00000001 c05a03e4 00000001 c05a03c0
 00000250 00000000 00000001 c07f3d20 c01949dc 00000001 00000000 c07f3ce8
Call Trace:
 [<c01949dc>] ? __alloc_pages_nodemask+0xf8/0x4d5
 [<f0987d25>] ? xfs_da_read_buf+0x18/0x1d [xfs]
 [<f098af47>] ? xfs_dir2_block_lookup_int+0x34/0x178 [xfs]
 [<f098af47>] ? xfs_dir2_block_lookup_int+0x34/0x178 [xfs]
 [<f0979022>] ? xfs_bmap_last_offset+0xe4/0xfa [xfs]
 [<f098b0c5>] ? xfs_dir2_block_replace+0x3a/0xc0 [xfs]
 [<f098a57c>] ? xfs_dir_replace+0xbf/0xf2 [xfs]
 [<f09a80b5>] ? xfs_rename+0x2f7/0x4b6 [xfs]
 [<f09b6a38>] ? xfs_vn_rename+0x58/0x62 [xfs]
 [<c01c00b7>] ? vfs_rename+0x217/0x35e
 [<c01c1b2e>] ? sys_renameat+0x17a/0x1df
 [<c01abfa8>] ? unlink_anon_vmas+0x8e/0xa3
 [<c01a8053>] ? remove_vma+0x41/0x47
 [<c01a8e06>] ? do_munmap+0x217/0x231
 [<c01c1ba5>] ? sys_rename+0x12/0x14
 [<c0102858>] ? sysenter_do_call+0x12/0x28
Code: 55 b8 8b 45 b4 eb 13 31 ff 83 7d f0 00 74 12 8b 4d 18 8d 55 f0 b8 01 00 00 00 e8 4c fa ff ff 89 c7 83 7d 14 01 0f 85 c7 00 00 00 <8b> 57 08 89 55 b4 89 d1 8b 52 08 8b 01 0f c8 86 f2 0f b7 d2 81 
EIP: [<f0987b72>] xfs_da_do_buf+0x4c8/0x61f [xfs] SS:ESP 0068:c07f3c58
CR2: 0000000000000008
---[ end trace d97f7d5f095dea70 ]---

root@ittest01:~ (master $ u=) # cat /proc/meminfo 
MemTotal:         775732 kB
MemFree:           46264 kB
Buffers:               0 kB
Cached:           579788 kB
SwapCached:        17116 kB
Active:           292832 kB
Inactive:         349308 kB
Active(anon):       4480 kB
Inactive(anon):    59836 kB
Active(file):     288352 kB
Inactive(file):   289472 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:        278524 kB
HighFree:            576 kB
LowTotal:         497208 kB
LowFree:           45688 kB
SwapTotal:       1025020 kB
SwapFree:         950652 kB
Dirty:                48 kB
Writeback:             0 kB
AnonPages:         47348 kB
Mapped:             8712 kB
Shmem:              1964 kB
Slab:              45768 kB
SReclaimable:      37020 kB
SUnreclaim:         8748 kB
KernelStack:        1816 kB
PageTables:         1200 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1412884 kB
Committed_AS:     802204 kB
VmallocTotal:     262144 kB
VmallocUsed:       14088 kB
VmallocChunk:     220660 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       4096 kB
DirectMap4k:        8184 kB
DirectMap4M:      761856 kB
Comment 3 Dave Chinner 2011-02-01 00:58:15 UTC
(In reply to comment #0)
> Created an attachment (id=45012) [details]
> 2.6.37 kernel config
> 
> This bug happens on >= kernels since 2.6.34 on Gentoo 32 bit sytems: 
> http://bugs.gentoo.org/show_bug.cgi?id=351938
> I had many XFS chrashes on my Lenovo T61 Notebook with lvm under xfs (32 bit
> system (Intel(R) Core(TM)2 Duo CPU T8300  @ 2.40GHz, sata) and also under
> virtual kvm guests describet in the above gentoo bug. The systems chrashed
> also
> with vmalloc=256M very often. Kernel's <=2.6.32 works fine. 
> 
> Now I fetch the actual version 2.6.37 from kernel.org compile and boot my
> Gentoo kvm guest. I start an high load compilation with emerge and make -j 10
> and distcc to a 64bit crosscompiler system. after compiling 15 packages I got
> the following:
> 
> xfs_buf_get: failed to map pages
> vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
..... 
>  cat /proc/meminfo | fgrep -i vmalloc 
> VmallocTotal:     122880 kB
> VmallocUsed:       14088 kB
> VmallocChunk:      81420 kB

This is probably caused by buildup of unflushed vmap ranges that have been released but not yet freed. That's more of a VM problem, not an XFS problem,
though I think I can work around it in XFS by forcing a flush of all the ranges not yet freed when an vm_map_pages() call fails. Need to read a little more code first, though.

> xfs_buf_get: failed to map pages
> vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
> xfs_buf_get: failed to map pages
> vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
> xfs_buf_get: failed to map pages
> BUG: unable to handle kernel NULL pointer dereference at 00000008
> IP: [<f9187b72>] xfs_da_do_buf+0x4c8/0x61f [xfs]
> *pde = 00000000 
> Oops: 0000 [#1] SMP 
> last sysfs file: /sys/devices/virtio-pci/virtio1/net/eth0/broadcast
> Modules linked in: ipv6 nfsd rtc_cmos tpm_tis rtc_core tpm tpm_bios rtc_lib
> i2c_piix4 virtio_net i6300esb i2c_core virtio_balloon processor pcspkr floppy
> thermal_sys button tg3 libphy e1000 fuse xfs exportfs nfs auth_rpcgss nfs_acl
> lockd sunrpc jfs raid10 dm_snapshot dm_crypt dm_mirror dm_region_hash dm_log
> dm_mod scsi_wait_scan sl811_hcd usbhid ohci_hcd ssb uhci_hcd usb_storage
> ehci_hcd usbcore aic94xx libsas lpfc qla2xxx megaraid_sas megaraid_mbox
> megaraid_mm megaraid aacraid sx8 DAC960 cciss 3w_9xxx 3w_xxxx mptsas
> scsi_transport_sas mptfc scsi_transport_fc scsi_tgt mptspi mptscsih mptbase
> atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx qlogicfas408 gdth
> advansys initio BusLogic arcmsr aic7xxx aic79xx scsi_transport_spi sg
> pdc_adma
> sata_inic162x sata_mv ata_piix ahci libahci sata_qstor sata_vsc sata_uli
> sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise
> pata_pcmcia pcmcia pcmcia_core
> 
> Pid: 425, comm: i686-pc-linux-g Not tainted 2.6.37 #1 /Bochs
> EIP: 0060:[<f9187b72>] EFLAGS: 00210246 CPU: 1
> EIP is at xfs_da_do_buf+0x4c8/0x61f [xfs]
> EAX: 00000001 EBX: f5947800 ECX: 00000008 EDX: c5631d64
> ESI: 00000000 EDI: 00000000 EBP: c5631d74 ESP: c5631d0c
>  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process i686-pc-linux-g (pid: 425, ti=c5630000 task=c5248960
> task.ti=c5630000)
> Stack:
>  007adb5f 00000000 c5631dc0 c55870dc f4c26220 00000000 00000001 00000000
>  00000000 ffffffff c55870c0 00000000 00000000 c5631d48 c011d903 c5631d5c
>  c0414e4e c5631d5c 00200286 eb74ebe0 c5631d6c 00000001 00000000 c5631d9c
> Call Trace:
>  [<c011d903>] ? default_spin_lock_flags+0x8/0xd
>  [<c0414e4e>] ? _raw_spin_lock_irqsave+0x20/0x28
>  [<f9187d25>] ? xfs_da_read_buf+0x18/0x1d [xfs]
>  [<f918af47>] ? xfs_dir2_block_lookup_int+0x34/0x178 [xfs]
>  [<f918af47>] ? xfs_dir2_block_lookup_int+0x34/0x178 [xfs]
>  [<f919fe7c>] ? xfs_log_release_iclog+0x11/0x3a [xfs]
>  [<f918b3a0>] ? xfs_dir2_block_removename+0x3d/0x18a [xfs]
>  [<f9179022>] ? xfs_bmap_last_offset+0xe4/0xfa [xfs]
>  [<f9189c6b>] ? xfs_dir2_isblock+0x18/0x62 [xfs]
>  [<f918a213>] ? xfs_dir_removename+0xb4/0xe6 [xfs]
>  [<f91ad30a>] ? xfs_remove+0x1d9/0x306 [xfs]
>  [<f91b6d1a>] ? xfs_vn_unlink+0x30/0x62 [xfs]
>  [<c01bfdd2>] ? vfs_unlink+0x5f/0xbc
>  [<c01c1c3b>] ? do_unlinkat+0x94/0xfe
>  [<c01a8e06>] ? do_munmap+0x217/0x231
>  [<c01c1cb5>] ? sys_unlink+0x10/0x12
>  [<c0102858>] ? sysenter_do_call+0x12/0x28

That shouldn't happen, though.

Can you post the output of 'xfs_info <mntpt>', as that might tell us why you're putting so much load on the vmap subsystem.

Also, can you post the output of:

$ echo 'l *(xfs_da_do_buf+0x4c8)' > gdb /path/to/vmlinux

On the kernel that produced this oops? I'd like to know where in xfs_da_do_buf() it actually crashed. If you've got a different kernel and crash, then modify "xfs_da_do_buf+0x4c8" to match the address reported for the oops.

Cheers,

Dave.
Comment 4 Dave Chinner 2011-02-02 00:47:35 UTC
Created attachment 45942 [details]
vmap alias flush on failure

Can you try this patch? It should flush the outstanding vmap aliases on the first failure, which should allow the retry to succeed.
Comment 5 Jochen Buchholz 2011-02-02 23:50:24 UTC
I patched xfs_buf.c and start rebuilding the kernel. The next chrash occurs very fast in the mproper phase. I will reboot the system and try at first your patch.  Here is the output, perhaps another important detail.  

EXT2-fs (vda1): warning: mounting unchecked fs, running e2fsck is recommended
hrtimer: interrupt took 59997566 ns
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
alloc_vmap_area: 22 callbacks suppressed
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
BUG: unable to handle kernel NULL pointer dereference at 00000008
IP: [<f0987b72>] xfs_da_do_buf+0x4c8/0x61f [xfs]
*pde = 00000000 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:04.0/net/eth0/broadcast
Modules linked in: ipv6 nfsd tpm_tis pcspkr tpm virtio_balloon i2c_piix4 tpm_bios rtc_cmos i2c_core i6300esb rtc_core rtc_lib floppy processor thermal_sys button tg3 libphy e1000 fuse xfs exportfs nfs auth_rpcgss nfs_acl lockd sunrpc jfs raid10 dm_snapshot dm_crypt dm_mirror dm_region_hash dm_log dm_mod scsi_wait_scan sl811_hcd usbhid ohci_hcd ssb uhci_hcd usb_storage ehci_hcd usbcore aic94xx libsas lpfc qla2xxx megaraid_sas megaraid_mbox megaraid_mm megaraid aacraid sx8 DAC960 cciss 3w_9xxx 3w_xxxx mptsas scsi_transport_sas mptfc scsi_transport_fc scsi_tgt mptspi mptscsih mptbase atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx qlogicfas408 gdth advansys initio BusLogic arcmsr aic7xxx aic79xx scsi_transport_spi sg pdc_adma sata_inic162x sata_mv ata_piix ahci libahci sata_qstor sata_vsc sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise pata_pcmcia pcmcia pcmcia_core

Pid: 12059, comm: rm Not tainted 2.6.37 #1 /Bochs
EIP: 0060:[<f0987b72>] EFLAGS: 00010246 CPU: 1
EIP is at xfs_da_do_buf+0x4c8/0x61f [xfs]
EAX: 00000001 EBX: ed25d800 ECX: 00000008 EDX: c30a5d54
ESI: 00000000 EDI: 00000000 EBP: c30a5d64 ESP: c30a5cfc
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process rm (pid: 12059, ti=c30a4000 task=df6c8ef0 task.ti=c30a4000)
Stack:
 0050605b 00000000 d00d7080 dee702dc c43bc1c0 f0979577 00000001 00000000
 00000000 ffffffff dee702c0 c30a5e0c d00d7080 c30a5e70 f09800e4 00000000
 c30a5e58 c30a5e54 c30a5e28 c30a5e0c 00000001 00000001 00000000 c30a5d8c
Call Trace:
 [<f0979577>] ? xfs_bmap_search_extents+0x3c/0xab [xfs]
 [<f09800e4>] ? xfs_bmapi+0x2eb/0x1042 [xfs]
 [<f0987d25>] ? xfs_da_read_buf+0x18/0x1d [xfs]
 [<f098c4da>] ? xfs_dir2_leaf_lookup_int+0x41/0x218 [xfs]
 [<f098c4da>] ? xfs_dir2_leaf_lookup_int+0x41/0x218 [xfs]
 [<f09ab049>] ? xfs_trans_read_buf+0x46/0x34b [xfs]
 [<f098cc18>] ? xfs_dir2_leaf_removename+0x41/0x2cc [xfs]
 [<f0979022>] ? xfs_bmap_last_offset+0xe4/0xfa [xfs]
 [<f0989c25>] ? xfs_dir2_isleaf+0x16/0x44 [xfs]
 [<f098a233>] ? xfs_dir_removename+0xd4/0xe6 [xfs]
 [<f09ad30a>] ? xfs_remove+0x1d9/0x306 [xfs]
 [<f09b6d1a>] ? xfs_vn_unlink+0x30/0x62 [xfs]
 [<c01bfdd2>] ? vfs_unlink+0x5f/0xbc
 [<c01c1c3b>] ? do_unlinkat+0x94/0xfe
 [<c01c1d9f>] ? sys_unlinkat+0x2b/0x32
 [<c0102858>] ? sysenter_do_call+0x12/0x28
Code: 55 b8 8b 45 b4 eb 13 31 ff 83 7d f0 00 74 12 8b 4d 18 8d 55 f0 b8 01 00 00 00 e8 4c fa ff ff 89 c7 83 7d 14 01 0f 85 c7 00 00 00 <8b> 57 08 89 55 b4 89 d1 8b 52 08 8b 01 0f c8 86 f2 0f b7 d2 81 
EIP: [<f0987b72>] xfs_da_do_buf+0x4c8/0x61f [xfs] SS:ESP 0068:c30a5cfc
CR2: 0000000000000008
---[ end trace c5ed2eecef7f9824 ]---
alloc_vmap_area: 19 callbacks suppressed
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
BUG: unable to handle kernel NULL pointer dereference at 00000008
IP: [<f0987b72>] xfs_da_do_buf+0x4c8/0x61f [xfs]
*pde = 00000000 
Oops: 0000 [#2] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:04.0/net/eth0/broadcast
Modules linked in: ipv6 nfsd tpm_tis pcspkr tpm virtio_balloon i2c_piix4 tpm_bios rtc_cmos i2c_core i6300esb rtc_core rtc_lib floppy processor thermal_sys button tg3 libphy e1000 fuse xfs exportfs nfs auth_rpcgss nfs_acl lockd sunrpc jfs raid10 dm_snapshot dm_crypt dm_mirror dm_region_hash dm_log dm_mod scsi_wait_scan sl811_hcd usbhid ohci_hcd ssb uhci_hcd usb_storage ehci_hcd usbcore aic94xx libsas lpfc qla2xxx megaraid_sas megaraid_mbox megaraid_mm megaraid aacraid sx8 DAC960 cciss 3w_9xxx 3w_xxxx mptsas scsi_transport_sas mptfc scsi_transport_fc scsi_tgt mptspi mptscsih mptbase atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx qlogicfas408 gdth advansys initio BusLogic arcmsr aic7xxx aic79xx scsi_transport_spi sg pdc_adma sata_inic162x sata_mv ata_piix ahci libahci sata_qstor sata_vsc sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise pata_pcmcia pcmcia pcmcia_core

Pid: 12063, comm: rm Tainted: G      D     2.6.37 #1 /Bochs
EIP: 0060:[<f0987b72>] EFLAGS: 00010246 CPU: 0
EIP is at xfs_da_do_buf+0x4c8/0x61f [xfs]
EAX: 00000001 EBX: ed25d800 ECX: 00000008 EDX: d0579d58
ESI: 00000000 EDI: 00000000 EBP: d0579d68 ESP: d0579d00
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process rm (pid: 12063, ti=d0578000 task=ea34f8e0 task.ti=d0578000)
Stack:
 004fb5aa 00000000 00000000 dee705dc ce074760 d0579d28 00000001 00000000
 00000000 ffffffff dee705c0 c0149f0b ecb8f980 ddd9cba8 d0579d4c f0986d61
 e3e4ba8c ddd9cba8 ecb8f980 d0579d68 d0579d58 00000001 00000000 d0579d90
Call Trace:
 [<c0149f0b>] ? up+0x2b/0x2f
 [<f0986d61>] ? xfs_buf_item_unlock+0xb6/0xbd [xfs]
 [<f0987d25>] ? xfs_da_read_buf+0x18/0x1d [xfs]
 [<f098c4da>] ? xfs_dir2_leaf_lookup_int+0x41/0x218 [xfs]
 [<f098c4da>] ? xfs_dir2_leaf_lookup_int+0x41/0x218 [xfs]
 [<f098cc18>] ? xfs_dir2_leaf_removename+0x41/0x2cc [xfs]
 [<f0979022>] ? xfs_bmap_last_offset+0xe4/0xfa [xfs]
 [<f0989c25>] ? xfs_dir2_isleaf+0x16/0x44 [xfs]
 [<f098a233>] ? xfs_dir_removename+0xd4/0xe6 [xfs]
 [<f09ad30a>] ? xfs_remove+0x1d9/0x306 [xfs]
 [<c01c6008>] ? shrink_dcache_parent+0x10f/0x122
 [<f09b6d1a>] ? xfs_vn_unlink+0x30/0x62 [xfs]
 [<c01c0263>] ? vfs_rmdir+0x65/0xb2
 [<c01c1d3c>] ? do_rmdir+0x85/0xbd
 [<c01c5e13>] ? dput+0xbc/0xfc
 [<c01b9387>] ? fput+0x19f/0x1a7
 [<c01b6a90>] ? filp_close+0x51/0x5b
 [<c01c1d98>] ? sys_unlinkat+0x24/0x32
 [<c0102858>] ? sysenter_do_call+0x12/0x28
Code: 55 b8 8b 45 b4 eb 13 31 ff 83 7d f0 00 74 12 8b 4d 18 8d 55 f0 b8 01 00 00 00 e8 4c fa ff ff 89 c7 83 7d 14 01 0f 85 c7 00 00 00 <8b> 57 08 89 55 b4 89 d1 8b 52 08 8b 01 0f c8 86 f2 0f b7 d2 81 
EIP: [<f0987b72>] xfs_da_do_buf+0x4c8/0x61f [xfs] SS:ESP 0068:d0579d00
CR2: 0000000000000008
---[ end trace c5ed2eecef7f9825 ]---
Comment 6 Jochen Buchholz 2011-02-03 12:00:00 UTC
I tried an emerge -e system  on the patched kernel and the system chrashed in the ncurses install procedure:
installing etip.h in /var/tmp/portage/sys-libs/ncurses-5.7-r3/image//usr/include/ncursesw
make[1]: Leaving directory `/var/tmp/portage/sys-libs/ncurses-5.7-r3/work/widec/c++'
 * Installing basic terminfo files in /etc...
>>> Completed installing ncurses-5.7-r3 into
>>> /var/tmp/portage/sys-libs/ncurses-5.7-r3/image/

strip: i686-pc-linux-gnu-strip --strip-unneeded -R .comment
   usr/bin/tic
   usr/bin/toe
   usr/bin/infocmp
   usr/bin/clear
   usr/bin/tput
   usr/bin/tset
   usr/lib/libform.so.5.7
   usr/lib/libmenu.so.5.7
   usr/lib/libpanel.so.5.7
   usr/lib/libformw.so.5.7
   usr/lib/libmenuw.so.5.7
   usr/lib/libpanelw.so.5.7
   lib/libncurses.so.5.7
   lib/libncursesw.so.5.7
   usr/lib/libform.a
   usr/lib/libformw.a
   usr/lib/libmenu.a
   usr/lib/libmenuw.a
   usr/lib/libncurses++.a
   usr/lib/libncurses++w.a
   usr/lib/libncurses.a
   usr/lib/libncursesw.a
   usr/lib/libpanel.a
   usr/lib/libpanelw.a
ecompressdir: bzip2 -9 /usr/share/man
ln: failed to create symbolic link `./man3/border_set.3x.bz2': Input/output error
ln: failed to create symbolic link `./man3/border_set.3x': Input/output error
rm: cannot remove `./man3/box_set.3x': Input/output error
ln: accessing `./man3/box_set.3x.bz2': Input/output error
ln: accessing `./man3/box_set.3x'
rm: cannot remove `./man3/hline_set.3x': Input/output error
ln: accessing `./man3/hline_set.3x.bz2': Input/output error
ln: accessing `./man3/hline_set.3x'
rm: cannot remove `./man3/mvhline_set.3x': Input/output error
ln: accessing `./man3/mvhline_set.3x.bz2': Input/output error
ln: accessing `./man3/mvhline_set.3x'
rm: cannot remove `./man3/mvvline_set.3x': Input/output error
ln: accessing `./man3/mvvline_set.3x.bz2': Input/output error

dmesg Output:
EXT2-fs (vda1): warning: mounting unchecked fs, running e2fsck is recommended
hrtimer: interrupt took 43926792 ns
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
Filesystem "dm-7": XFS internal error xfs_trans_cancel at line 1815 of file fs/xfs/xfs_trans.c.  Caller 0xf09acd92

Pid: 747, comm: ln Not tainted 2.6.37 #1
Call Trace:
 [<f0991bba>] xfs_error_report+0x2c/0x2e [xfs]
 [<f09a9d07>] xfs_trans_cancel+0x4b/0xc9 [xfs]
 [<f09acd92>] ? xfs_symlink+0x6d5/0x700 [xfs]
 [<f09acd92>] xfs_symlink+0x6d5/0x700 [xfs]
 [<c01ba1ff>] ? sget+0xc6/0x358
 [<c01bf791>] ? do_lookup+0x64/0xf4
 [<c02697da>] ? security_inode_permission+0x1a/0x20
 [<c01bf428>] ? exec_permission+0x5f/0x67
 [<c01c0f61>] ? link_path_walk+0x5d/0x84e
 [<c01c7297>] ? __d_lookup+0xac/0xdd
 [<f09b6cbc>] xfs_vn_symlink+0x60/0xa6 [xfs]
 [<c01bf43f>] ? generic_permission+0xf/0x85
 [<c01bfd2a>] vfs_symlink+0x6b/0xb4
 [<c01c2078>] sys_symlinkat+0x6c/0xac
 [<c01a826a>] ? unmap_region+0xf7/0x106
 [<c01a8053>] ? remove_vma+0x41/0x47
 [<c01a8e06>] ? do_munmap+0x217/0x231
 [<c01c20c8>] sys_symlink+0x10/0x12
 [<c0102858>] sysenter_do_call+0x12/0x28
xfs_force_shutdown(dm-7,0x8) called from line 1816 of file fs/xfs/xfs_trans.c.  Return address = 0xf09a9d1d
Filesystem "dm-7": Corruption of in-memory data detected.  Shutting down filesystem: dm-7
Please umount the filesystem, and rectify the problem(s)
Filesystem "dm-7": xfs_log_force: error 5 returned.

I forgot to install gdb before it, so I can't did your check - sorry. But it will my first step to install gdb  after rebooting the test host :-)

A short summary of the xfs problems again, perhaps it's helping you:
The xfs problems apeared at first on my 32bit Lenovo T61 Laptop after switching to Gentoo kernel 2.6.34-gentoo-r6 32-bit (include gentoo patches). 2.6.34 had also lvm issues. Here is a link to the discussion:
http://forums.gentoo.org/viewtopic-t-845611-highlight-.html

After switching to Gentoo kernel 2.6.35-r12 the vmap errors appears sometimes. The same with gentoo kernels 2.6.36. After an update to 2.6.35 on my 32bit kvm gentoo guest test system, the xfs file system crash also on it. So I did all test on this system to prevent the loose of data. I install the original 2.6.37 kernel by kernel.org, it chrashed also and I opened this Bug. 

The virtio disk of the test guest is a lvm logical volume on an 64bit Quad Core Gentoo system. I hope this helps.
Comment 7 Jochen Buchholz 2011-02-03 15:51:58 UTC
The next chrash has occured, but the gdb comand would not work:
root@ittest01:~ (master $ u=) # echo 'l *(xfs_da_do_buf+0x4c8)' | gdb /usr/src/linux/vmlinux           
GNU gdb (Gentoo 7.2 p1) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.gentoo.org/>...
Reading symbols from /usr/src/linux-2.6.37/vmlinux...(no debugging symbols found)...done.
(gdb) Hangup detected on fd 0
error detected on stdin

Ok - here the next dump:
eth0: no IPv6 routers present
hrtimer: interrupt took 29976382 ns
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
BUG: unable to handle kernel NULL pointer dereference at 00000008
IP: [<f0987b72>] xfs_da_do_buf+0x4c8/0x61f [xfs]
*pde = 00000000 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:04.0/net/eth0/broadcast
Modules linked in: ipv6 nfsd tpm_tis rtc_cmos rtc_core tpm rtc_lib tpm_bios floppy processor i6300esb button thermal_sys i2c_piix4 pcspkr i2c_core virtio_balloon tg3 libphy e1000 fuse xfs exportfs nfs auth_rpcgss nfs_acl lockd sunrpc jfs raid10 dm_snapshot dm_crypt dm_mirror dm_region_hash dm_log dm_mod scsi_wait_scan sl811_hcd usbhid ohci_hcd ssb uhci_hcd usb_storage ehci_hcd usbcore aic94xx libsas lpfc qla2xxx megaraid_sas megaraid_mbox megaraid_mm megaraid aacraid sx8 DAC960 cciss 3w_9xxx 3w_xxxx mptsas scsi_transport_sas mptfc scsi_transport_fc scsi_tgt mptspi mptscsih mptbase atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx qlogicfas408 gdth advansys initio BusLogic arcmsr aic7xxx aic79xx scsi_transport_spi sg pdc_adma sata_inic162x sata_mv ata_piix ahci libahci sata_qstor sata_vsc sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise pata_pcmcia pcmcia pcmcia_core

Pid: 12336, comm: tar Not tainted 2.6.37 #1 /Bochs
EIP: 0060:[<f0987b72>] EFLAGS: 00210246 CPU: 0
EIP is at xfs_da_do_buf+0x4c8/0x61f [xfs]
EAX: 00000001 EBX: ed795c00 ECX: 00000008 EDX: d6345c64
ESI: 00000000 EDI: 00000000 EBP: d6345c74 ESP: d6345c0c
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process tar (pid: 12336, ti=d6344000 task=e965a200 task.ti=d6344000)
Stack:
 00203c27 00000000 00000000 ed5b58dc cdc8ca40 f09ab2c2 00000001 00000000
 00000000 ffffffff ed5b58c0 d6345c44 00200292 e0f96920 ecb0e1f8 e0f96900
 ecb0e180 d6345c68 f09b27ea f09aafc0 c50dc27c 00000001 00000000 d6345c9c
Call Trace:
 [<f09ab2c2>] ? xfs_trans_read_buf+0x2bf/0x34b [xfs]
 [<f09b27ea>] ? xfs_buf_rele+0x57/0x9a [xfs]
 [<f09aafc0>] ? xfs_trans_brelse+0xd5/0xdb [xfs]
 [<f0987d25>] ? xfs_da_read_buf+0x18/0x1d [xfs]
 [<f0988df4>] ? xfs_da_node_lookup_int+0x4d/0x202 [xfs]
 [<f0988df4>] ? xfs_da_node_lookup_int+0x4d/0x202 [xfs]
 [<f098eb86>] ? xfs_dir2_node_addname+0x63/0x79e [xfs]
 [<f09af6a6>] ? kmem_zone_zalloc+0x1c/0x29 [xfs]
 [<f09a9817>] ? xfs_trans_add_item+0x18/0x48 [xfs]
 [<f0979022>] ? xfs_bmap_last_offset+0xe4/0xfa [xfs]
 [<f0989c25>] ? xfs_dir2_isleaf+0x16/0x44 [xfs]
 [<f098a6b6>] ? xfs_dir_createname+0x107/0x10f [xfs]
 [<f09ad8a0>] ? xfs_create+0x32b/0x508 [xfs]
 [<f09b6bb6>] ? xfs_vn_mknod+0xc8/0x153 [xfs]
 [<f09b6c5a>] ? xfs_vn_create+0xa/0xc [xfs]
 [<c01c05dc>] ? vfs_create+0x85/0xcf
 [<c01c0c83>] ? do_last+0x28a/0x50b
 [<c01c2471>] ? do_filp_open+0x1a7/0x47b
 [<c01aee96>] ? free_pages_and_swap_cache+0x6a/0x7d
 [<c01b6b76>] ? do_sys_open+0x44/0xbf
 [<c01b6c33>] ? sys_open+0x1e/0x26
 [<c0102858>] ? sysenter_do_call+0x12/0x28
Code: 55 b8 8b 45 b4 eb 13 31 ff 83 7d f0 00 74 12 8b 4d 18 8d 55 f0 b8 01 00 00 00 e8 4c fa ff ff 89 c7 83 7d 14 01 0f 85 c7 00 00 00 <8b> 57 08 89 55 b4 89 d1 8b 52 08 8b 01 0f c8 86 f2 0f b7 d2 81 
EIP: [<f0987b72>] xfs_da_do_buf+0x4c8/0x61f [xfs] SS:ESP 0068:d6345c0c
CR2: 0000000000000008
---[ end trace 6c4b38f1d30d44f1 ]---
EXT2-fs (vda1): warning: mounting unchecked fs, running e2fsck is recommended


The gentoo kernel 2.6.32 work since 4 weeks without any problems on my T61 Notebook (32 bit). LVM, xfs same configuration like the test host. I use XFS since Redhat 7.3 - many years with no issue. I think the bug depend on changes after 2.6.32. Have you an KVM environment? I can prepare a minimal gentoo test system as an image file?
Comment 8 Dave Chinner 2011-02-03 22:44:49 UTC
The trace in comment #6 is _not a crash_. It's a filesystem shutdown because a transaction failed. That's what is supposed to happen, and it tells you this:

Filesystem "dm-7": Corruption of in-memory data detected.  Shutting down
filesystem: dm-7
Please umount the filesystem, and rectify the problem(s)

As to gdb failing to work, try just running "gdb vmlinux" and running
'l *(xfs_da_do_buf+0x4c8)' at the command prompt once it has started.....

Cheers,

Dave.
Comment 9 Jochen Buchholz 2011-02-03 23:26:24 UTC
(In reply to comment #8)
> The trace in comment #6 is _not a crash_. It's a filesystem shutdown because
> a
OK - I would not use this word anymore ;-)
> transaction failed. That's what is supposed to happen, and it tells you this:
> 
> Filesystem "dm-7": Corruption of in-memory data detected.  Shutting down
> filesystem: dm-7
> Please umount the filesystem, and rectify the problem(s)
> 
> As to gdb failing to work, try just running "gdb vmlinux" and running
> 'l *(xfs_da_do_buf+0x4c8)' at the command prompt once it has started.....
> 

root@ittest01:~ (master $ u=) # gdb /usr/src/linux/vmlinux
GNU gdb (Gentoo 7.2 p1) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.gentoo.org/>...
Reading symbols from /usr/src/linux-2.6.37/vmlinux...(no debugging symbols found)...done.
(gdb) l *(xfs_da_do_buf+0x4c8)
No symbol table is loaded.  Use the "file" command.
Comment 10 Dave Chinner 2011-02-04 00:13:10 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > The trace in comment #6 is _not a crash_. It's a filesystem shutdown
> because a
> OK - I would not use this word anymore ;-)
> > transaction failed. That's what is supposed to happen, and it tells you
> this:
> > 
> > Filesystem "dm-7": Corruption of in-memory data detected.  Shutting down
> > filesystem: dm-7
> > Please umount the filesystem, and rectify the problem(s)
> > 
> > As to gdb failing to work, try just running "gdb vmlinux" and running
> > 'l *(xfs_da_do_buf+0x4c8)' at the command prompt once it has started.....
> > 
> 
> root@ittest01:~ (master $ u=) # gdb /usr/src/linux/vmlinux
> GNU gdb (Gentoo 7.2 p1) 7.2
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "i686-pc-linux-gnu".
> For bug reporting instructions, please see:
> <http://bugs.gentoo.org/>...
> Reading symbols from /usr/src/linux-2.6.37/vmlinux...(no debugging symbols
> found)...done.
> (gdb) l *(xfs_da_do_buf+0x4c8)
> No symbol table is loaded.  Use the "file" command.

You need to build your kernel with debugging symbols:

CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_EXTRA_PASS=y

Just rebuilding it with these additional options set should not change the code at all, but should allow gdb to work.
Comment 11 Jochen Buchholz 2011-02-06 19:33:26 UTC
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
The next fail:
xfs_buf_get: failed to map pages
Filesystem "dm-7": XFS internal error xfs_trans_cancel at line 1815 of file fs/xfs/xfs_trans.c.  Caller 0xf09e4a01

Pid: 21091, comm: tar Not tainted 2.6.37 #1
Call Trace:
 [<f09c8bba>] xfs_error_report+0x2c/0x2e [xfs]
 [<f09e0d07>] xfs_trans_cancel+0x4b/0xc9 [xfs]
 [<f09e4a01>] ? xfs_create+0x48c/0x508 [xfs]
 [<f09e4a01>] xfs_create+0x48c/0x508 [xfs]
 [<f09edbb6>] xfs_vn_mknod+0xc8/0x153 [xfs]
 [<f09edc5a>] xfs_vn_create+0xa/0xc [xfs]
 [<c01c05d4>] vfs_create+0x85/0xcf
 [<c01c0c7b>] do_last+0x28a/0x50b
 [<c01c2469>] do_filp_open+0x1a7/0x47b
 [<c01aee8e>] ? free_pages_and_swap_cache+0x6a/0x7d
 [<c01b6b6e>] do_sys_open+0x44/0xbf
 [<c01b6c2b>] sys_open+0x1e/0x26
 [<c0102858>] sysenter_do_call+0x12/0x28
xfs_force_shutdown(dm-7,0x8) called from line 1816 of file fs/xfs/xfs_trans.c.  Return address = 0xf09e0d1d
Filesystem "dm-7": Corruption of in-memory data detected.  Shutting down filesystem: dm-7
Please umount the filesystem, and rectify the problem(s)
Filesystem "dm-7": xfs_log_force: error 5 returned.

gdb does fails like above. I read something about DEBUG_INFO which is not set. I will recompile the kernel with that ops too.

root@ittest01:/usr/src/linux # zcat /proc/config.gz |grep KALLSYMS
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_EXTRA_PASS=y
Comment 12 Jochen Buchholz 2011-02-06 19:34:17 UTC
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
The next fail:
xfs_buf_get: failed to map pages
Filesystem "dm-7": XFS internal error xfs_trans_cancel at line 1815 of file fs/xfs/xfs_trans.c.  Caller 0xf09e4a01

Pid: 21091, comm: tar Not tainted 2.6.37 #1
Call Trace:
 [<f09c8bba>] xfs_error_report+0x2c/0x2e [xfs]
 [<f09e0d07>] xfs_trans_cancel+0x4b/0xc9 [xfs]
 [<f09e4a01>] ? xfs_create+0x48c/0x508 [xfs]
 [<f09e4a01>] xfs_create+0x48c/0x508 [xfs]
 [<f09edbb6>] xfs_vn_mknod+0xc8/0x153 [xfs]
 [<f09edc5a>] xfs_vn_create+0xa/0xc [xfs]
 [<c01c05d4>] vfs_create+0x85/0xcf
 [<c01c0c7b>] do_last+0x28a/0x50b
 [<c01c2469>] do_filp_open+0x1a7/0x47b
 [<c01aee8e>] ? free_pages_and_swap_cache+0x6a/0x7d
 [<c01b6b6e>] do_sys_open+0x44/0xbf
 [<c01b6c2b>] sys_open+0x1e/0x26
 [<c0102858>] sysenter_do_call+0x12/0x28
xfs_force_shutdown(dm-7,0x8) called from line 1816 of file fs/xfs/xfs_trans.c.  Return address = 0xf09e0d1d
Filesystem "dm-7": Corruption of in-memory data detected.  Shutting down filesystem: dm-7
Please umount the filesystem, and rectify the problem(s)
Filesystem "dm-7": xfs_log_force: error 5 returned.

gdb does fails like above. I read something about DEBUG_INFO which is not set. I will recompile the kernel with that option too.

root@ittest01:/usr/src/linux # zcat /proc/config.gz |grep KALLSYMS
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_EXTRA_PASS=y
Comment 13 Jochen Buchholz 2011-02-07 18:28:59 UTC
gdb seems working now
root@ittest01:/usr/src/linux # gdb vmlinux /proc/kcore 
GNU gdb (Gentoo 7.2 p1) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.gentoo.org/>...
Reading symbols from /usr/src/linux-2.6.37/vmlinux...done.
[New <main task>]
Core was generated by `root=/dev/ram0 real_root=/dev/vda3 doscsi elevator=deadline vmalloc=256M'.
#0  0x00000000 in ?? ()
(gdb) l *(xfs_da_do_buf+0x4c8)
No symbol "xfs_da_do_buf" in current context.
(gdb) 

I think this is ok, because it stopped at an other place:

xfs_buf_get: failed to map pages
xfs_buf_get: failed to map pages
alloc_vmap_area: 5378 callbacks suppressed
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
Filesystem "dm-8": XFS internal error xfs_trans_cancel at line 1815 of file fs/xfs/xfs_trans.c.  Caller 0xf4d09a01

Pid: 14904, comm: emerge Not tainted 2.6.37 #1
Call Trace:
 [<f4cedbba>] xfs_error_report+0x2c/0x2e [xfs]
 [<f4d05d07>] xfs_trans_cancel+0x4b/0xc9 [xfs]
 [<f4d09a01>] ? xfs_create+0x48c/0x508 [xfs]
 [<f4d09a01>] xfs_create+0x48c/0x508 [xfs]
 [<f4d12bb6>] xfs_vn_mknod+0xc8/0x153 [xfs]
 [<f4d12c5a>] xfs_vn_create+0xa/0xc [xfs]
 [<c01c05d4>] vfs_create+0x85/0xcf
 [<c01c0c7b>] do_last+0x28a/0x50b
 [<c01c2469>] do_filp_open+0x1a7/0x47b
 [<c0417914>] ? do_page_fault+0x0/0x3b2
 [<c01b6b6e>] do_sys_open+0x44/0xbf
 [<c01b6c2b>] sys_open+0x1e/0x26
 [<c0102858>] sysenter_do_call+0x12/0x28
xfs_force_shutdown(dm-8,0x8) called from line 1816 of file fs/xfs/xfs_trans.c.  Return address = 0xf4d05d1d
Filesystem "dm-8": Corruption of in-memory data detected.  Shutting down filesystem: dm-8
Please umount the filesystem, and rectify the problem(s)
Filesystem "dm-8": xfs_log_force: error 5 returned.
Filesystem "dm-8": xfs_log_force: error 5 returned.


Should I restart since a message like:
EIP is at xfs_da_do_buf+0x4c8/0x61f [xfs]
or send me an gdb command
Comment 14 Jochen Buchholz 2011-02-18 21:28:21 UTC
eth0: no IPv6 routers present
hrtimer: interrupt took 64821353 ns
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
xfs_buf_get: failed to map pages
Filesystem "dm-7": XFS internal error xfs_trans_cancel at line 1815 of file fs/xfs/xfs_trans.c.  Caller 0xf4d09a01

Pid: 14607, comm: idlc Not tainted 2.6.37 #1
Call Trace:
 [<f4cedbba>] xfs_error_report+0x2c/0x2e [xfs]
 [<f4d05d07>] xfs_trans_cancel+0x4b/0xc9 [xfs]
 [<f4d09a01>] ? xfs_create+0x48c/0x508 [xfs]
 [<f4d09a01>] xfs_create+0x48c/0x508 [xfs]
 [<f4d12bb6>] xfs_vn_mknod+0xc8/0x153 [xfs]
 [<f4d12c5a>] xfs_vn_create+0xa/0xc [xfs]
 [<c01c05d4>] vfs_create+0x85/0xcf
 [<c01c0c7b>] do_last+0x28a/0x50b
 [<c01c2469>] do_filp_open+0x1a7/0x47b
 [<c01c1bb6>] ? do_unlinkat+0x17/0xfe
 [<c01b6b6e>] do_sys_open+0x44/0xbf
 [<c01b6c2b>] sys_open+0x1e/0x26
 [<c0102858>] sysenter_do_call+0x12/0x28
xfs_force_shutdown(dm-7,0x8) called from line 1816 of file fs/xfs/xfs_trans.c.  Return address = 0xf4d05d1d
Filesystem "dm-7": Corruption of in-memory data detected.  Shutting down filesystem: dm-7
Please umount the filesystem, and rectify the problem(s)
Filesystem "dm-7": xfs_log_force: error 5 returned.
Filesystem "dm-7": xfs_log_force: error 5 returned.
Comment 15 Jochen Buchholz 2011-02-26 22:25:39 UTC
Buggy XFS continues with version 2.6.37.2 on 32-bit systems. Compilation of openoffice end with umounting.
Kernel 2.6.37.2
Feb 26 20:18:26 ittest01 kernel: vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
Feb 26 20:18:26 ittest01 kernel: xfs_buf_get: failed to map pages
Feb 26 20:18:26 ittest01 kernel: BUG: unable to handle kernel NULL pointer dereference at 00000008
Feb 26 20:18:26 ittest01 kernel: IP: [<f09c7b46>] xfs_da_do_buf+0x4c8/0x61f [xfs]
Feb 26 20:18:26 ittest01 kernel: *pde = 00000000
Feb 26 20:18:26 ittest01 kernel: Oops: 0000 [#1] SMP
Feb 26 20:18:26 ittest01 kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:04.0/virtio1/net/eth0/broadcast
Feb 26 20:18:26 ittest01 kernel: Modules linked in: ipv6 nfsd tpm_tis tpm i2c_piix4 tpm_bios rtc_cmos rtc_core rtc_lib floppy i2c_core pcspkr virtio_net processor virtio_balloon thermal_sys button tg3 libphy e1000 fuse xfs exportfs nfs auth_rpcgss nfs_acl lockd sunrpc jfs raid10 dm_snapshot dm_crypt dm_mirror dm_region_hash dm_log dm_mod scsi_wait_scan sl811_hcd usbhid ohci_hcd ssb uhci_hcd usb_storage ehci_hcd usbcore aic94xx libsas lpfc qla2xxx megaraid_sas megaraid_mbox megaraid_mm megaraid aacraid sx8 DAC960 cciss 3w_9xxx 3w_xxxx mptsas scsi_transport_sas mptfc scsi_transport_fc scsi_tgt mptspi mptscsih mptbase atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx qlogicfas408 gdth advansys initio BusLogic arcmsr aic7xxx aic79xx scsi_transport_spi sg pdc_adma sata_inic162x sata_mv ata_piix ahci libahci sata_qstor sata_vsc sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise pata_pcmcia pcmcia pcmcia_core
Feb 26 20:18:26 ittest01 kernel: 
Feb 26 20:18:26 ittest01 kernel: Pid: 21831, comm: python Not tainted 2.6.37.2 #1 Bochs Bochs
Feb 26 20:18:26 ittest01 kernel: EIP: 0060:[<f09c7b46>] EFLAGS: 00210246 CPU: 0
Feb 26 20:18:26 ittest01 kernel: EIP is at xfs_da_do_buf+0x4c8/0x61f [xfs]
Feb 26 20:18:26 ittest01 kernel: EAX: 00000001 EBX: ed6e0c00 ECX: 00000008 EDX: d1edfc64
Feb 26 20:18:26 ittest01 kernel: ESI: 00000000 EDI: 00000000 EBP: d1edfc74 ESP: d1edfc0c
Feb 26 20:18:26 ittest01 kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Feb 26 20:18:26 ittest01 kernel: Process python (pid: 21831, ti=d1ede000 task=d2675280 task.ti=d1ede000)
Feb 26 20:18:26 ittest01 kernel: Stack:
Feb 26 20:18:26 ittest01 kernel: 005028cf 00000000 e1531918 ed03bc9c e193ca40 00000000 00000001 00000000
Feb 26 20:18:26 ittest01 kernel: 00000000 ffffffff ed03bc80 d1edfc44 00200292 ec0db320 ec62d4b8 ec0db300
Feb 26 20:18:26 ittest01 kernel: ec62d440 d1edfc68 f09f27b9 f09eaf98 e5142684 00000001 00000000 d1edfc9c
Feb 26 20:18:26 ittest01 kernel: Call Trace:
Feb 26 20:18:26 ittest01 kernel: [<f09f27b9>] ? xfs_buf_rele+0x57/0x9a [xfs]
Feb 26 20:18:26 ittest01 kernel: [<f09eaf98>] ? xfs_trans_brelse+0xd5/0xdb [xfs]
Feb 26 20:18:26 ittest01 kernel: [<f09c7cf9>] ? xfs_da_read_buf+0x18/0x1d [xfs]
Feb 26 20:18:26 ittest01 kernel: [<f09c8dc8>] ? xfs_da_node_lookup_int+0x4d/0x202 [xfs]
Feb 26 20:18:26 ittest01 kernel: [<f09c8dc8>] ? xfs_da_node_lookup_int+0x4d/0x202 [xfs]
Feb 26 20:18:26 ittest01 kernel: [<f09ceb52>] ? xfs_dir2_node_addname+0x63/0x79e [xfs]
Feb 26 20:18:26 ittest01 kernel: [<f09ef67e>] ? kmem_zone_zalloc+0x1c/0x29 [xfs]
Feb 26 20:18:26 ittest01 kernel: [<f09e97ef>] ? xfs_trans_add_item+0x18/0x48 [xfs]
Feb 26 20:18:26 ittest01 kernel: [<f09b8ff6>] ? xfs_bmap_last_offset+0xe4/0xfa [xfs]
Feb 26 20:18:26 ittest01 kernel: [<f09c9bf9>] ? xfs_dir2_isleaf+0x16/0x44 [xfs]
Feb 26 20:18:26 ittest01 kernel: [<f09ca68a>] ? xfs_dir_createname+0x107/0x10f [xfs]
Feb 26 20:18:26 ittest01 kernel: [<f09ed878>] ? xfs_create+0x32b/0x508 [xfs]
Feb 26 20:18:26 ittest01 kernel: [<f09f6b92>] ? xfs_vn_mknod+0xc8/0x153 [xfs]
Feb 26 20:18:26 ittest01 kernel: [<f09f6c36>] ? xfs_vn_create+0xa/0xc [xfs]
Feb 26 20:18:26 ittest01 kernel: [<c01c05a0>] ? vfs_create+0x85/0xcf
Feb 26 20:18:26 ittest01 kernel: [<c01c0c47>] ? do_last+0x28a/0x50b
Feb 26 20:18:26 ittest01 kernel: [<c01c2435>] ? do_filp_open+0x1a7/0x47b
Feb 26 20:18:26 ittest01 kernel: [<c0415dae>] ? _raw_spin_lock_irqsave+0x20/0x28
Feb 26 20:18:26 ittest01 kernel: [<c01b47fe>] ? kmem_cache_alloc+0x88/0xdf
Feb 26 20:18:26 ittest01 kernel: [<c01b6b26>] ? do_sys_open+0x44/0xbf
Feb 26 20:18:26 ittest01 kernel: [<c01b6be3>] ? sys_open+0x1e/0x26
Feb 26 20:18:26 ittest01 kernel: [<c0102818>] ? sysenter_do_call+0x12/0x28
Feb 26 20:18:26 ittest01 kernel: Code: 55 b8 8b 45 b4 eb 13 31 ff 83 7d f0 00 74 12 8b 4d 18 8d 55 f0 b8 01 00 00 00 e8 4c fa ff ff 89 c7 83 7d 14 01 0f 85 c7 00 00 00 <8b> 57 08 89 55 b4 89 d1 8b 52 08 8b 01 0f c8 86 f2 0f b7 d2 81
Feb 26 20:18:26 ittest01 kernel: EIP: [<f09c7b46>] xfs_da_do_buf+0x4c8/0x61f [xfs] SS:ESP 0068:d1edfc0c
Feb 26 20:18:26 ittest01 kernel: CR2: 0000000000000008
Feb 26 20:18:26 ittest01 kernel: ---[ end trace c82e322e7bcca320 ]---
Feb 26 20:18:33 ittest01 kernel: vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
Feb 26 20:18:33 ittest01 kernel: xfs_buf_get: failed to map pages
Feb 26 20:18:33 ittest01 kernel: vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
Feb 26 20:18:33 ittest01 kernel: xfs_buf_get: failed to map pages
Comment 16 Katharine Manton 2011-03-11 04:06:02 UTC
I've been having this problem on several systems from 2.6.34 onwards.  Each has a 2GB partition, rsynced from a local server nightly (after the local server rsyncs to a server on the 'net.)  Filesystem contains many small files.

All affected systems are 32-bit.  One 32-bit system wasn't affected; it turns out I'd formatted the partition with the default 4k block size.  The affected systems are formatted with 1k block size. 'vmalloc=768M' merely reduced the frequency of the problem.

I now have a test system set up and don't mind spending some time on this (as I'm forced to stick with 2.6.32 on two systems for now and have re-formatted the partition with ext3 on another.)  System has a spare drive installed with 3 partitions, formatted with bsize=512, 1k and 4k.

The following occurred while deleting a large number of small files on the 512-byte block fs:

magnum ~ # xfs_info /mnt/512
meta-data=/dev/sdb1              isize=256    agcount=4, agsize=1048576 blks
         =                       sectsz=512   attr=2
data     =                       bsize=512    blocks=4194304, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=512    blocks=20480, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0

Mar 11 03:02:21 magnum kernel: vmap allocation for size 4194304 failed: use vmalloc=<size> to increase size.
Mar 11 03:02:21 magnum kernel: xfs_buf_get: failed to map pages
Mar 11 03:02:21 magnum kernel: BUG: unable to handle kernel NULL pointer dereference at 00000008
Mar 11 03:02:21 magnum kernel: IP: [<c10fb360>] xfs_da_do_buf+0x4da/0x5e1
Mar 11 03:02:21 magnum kernel: *pde = 00000000 
Mar 11 03:02:21 magnum kernel: Oops: 0000 [#1] PREEMPT SMP 
Mar 11 03:02:21 magnum kernel: last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
Mar 11 03:02:21 magnum kernel: Modules linked in: nfs lockd nfs_acl sunrpc w83627hf hwmon_vid lm90 hwmon autofs4 w83627hf_wdt raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 md_mod pata_hpt37x snd_ice1712 nvidia(P) snd_ice17xx_ak4xxx snd_ak4xxx_adda snd_cs8427 snd_i2c snd_mpu401_uart snd_intel8x0 snd_rawmidi snd_ac97_codec ac97_bus snd_pcm snd_seq_device snd_timer usbhid e100 agpgart snd analog hid mii soundcore ns558 sg pcspkr i2c_amd756 gameport i2c_core parport_pc parport snd_page_alloc thermal processor button
Mar 11 03:02:21 magnum kernel: 
Mar 11 03:02:21 magnum kernel: Pid: 2725, comm: rm Tainted: P            2.6.36-gentoo-r5 #10 7DPXDW-P/ 
Mar 11 03:02:21 magnum kernel: EIP: 0060:[<c10fb360>] EFLAGS: 00010246 CPU: 0
Mar 11 03:02:21 magnum kernel: EIP is at xfs_da_do_buf+0x4da/0x5e1
Mar 11 03:02:21 magnum kernel: EAX: 00000001 EBX: f60da400 ECX: fbd5a730 EDX: 00000000
Mar 11 03:02:21 magnum kernel: ESI: 00000000 EDI: 00000000 EBP: c70b5d78 ESP: c70b5d14
Mar 11 03:02:21 magnum kernel:  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Mar 11 03:02:21 magnum kernel: Process rm (pid: 2725, ti=c70b4000 task=f713c740 task.ti=c70b4000)
Mar 11 03:02:21 magnum kernel: Stack:
Mar 11 03:02:21 magnum kernel:  0012c096 00000000 f64cabc0 f64ca5dc d853fb40 00000001 00000000 00000000
Mar 11 03:02:21 magnum kernel: <0> ffffffff f64ca5c0 c70b5d48 c70b5d4c c70b5d50 c1020717 c70b5d58 c1020717
Mar 11 03:02:21 magnum kernel: <0> 00000001 c70b5d64 c102086e f65eeda0 00000001 00000000 c70b5da0 04000018
Mar 11 03:02:21 magnum kernel: Call Trace:
Mar 11 03:02:21 magnum kernel:  [<c1020717>] ? get_parent_ip+0xb/0x31
Mar 11 03:02:21 magnum kernel:  [<c1020717>] ? get_parent_ip+0xb/0x31
Mar 11 03:02:21 magnum kernel:  [<c102086e>] ? sub_preempt_count+0x7c/0x89
Mar 11 03:02:21 magnum kernel:  [<c10fb4c3>] ? xfs_da_read_buf+0x18/0x1d
Mar 11 03:02:21 magnum kernel:  [<c10fc610>] ? xfs_da_node_lookup_int+0x4d/0x202
Mar 11 03:02:21 magnum kernel:  [<c10fc610>] ? xfs_da_node_lookup_int+0x4d/0x202
Mar 11 03:02:21 magnum kernel:  [<c110194a>] ? xfs_dir2_node_removename+0x3f/0x3e5
Mar 11 03:02:21 magnum kernel:  [<c10fd482>] ? xfs_dir2_isleaf+0x16/0x44
Mar 11 03:02:21 magnum kernel:  [<c10fda50>] ? xfs_dir_removename+0xde/0xe6
Mar 11 03:02:21 magnum kernel:  [<c111f85e>] ? xfs_remove+0x1b3/0x2e0
Mar 11 03:02:21 magnum kernel:  [<c1020717>] ? get_parent_ip+0xb/0x31
Mar 11 03:02:21 magnum kernel:  [<c11286ff>] ? xfs_vn_unlink+0x30/0x62
Mar 11 03:02:21 magnum kernel:  [<c108738b>] ? vfs_rmdir+0x52/0x9e
Mar 11 03:02:21 magnum kernel:  [<c1088bca>] ? do_rmdir+0x7f/0xb7
Mar 11 03:02:21 magnum kernel:  [<c10807c8>] ? fput+0x165/0x16d
Mar 11 03:02:21 magnum kernel:  [<c107dfa3>] ? filp_close+0x51/0x5b
Mar 11 03:02:21 magnum kernel:  [<c1088c26>] ? sys_unlinkat+0x24/0x32
Mar 11 03:02:21 magnum kernel:  [<c10025cc>] ? sysenter_do_call+0x12/0x22
Mar 11 03:02:21 magnum kernel: Code: f0 00 c7 45 b8 00 00 00 00 74 13 8b 4d 18 8d 55 f0 b8 01 00 00 00 e8 06 fa ff ff 89 45 b8 83 7d 14 01 0f 85 82 00 00 00 8b 55 b8 <8b> 4a 08 8b 51 08 8b 01 0f c8 86 f2 0f b7 d2 81 fa ee fb 00 00 
Mar 11 03:02:21 magnum kernel: EIP: [<c10fb360>] xfs_da_do_buf+0x4da/0x5e1 SS:ESP 0068:c70b5d14
Mar 11 03:02:21 magnum kernel: CR2: 0000000000000008
Mar 11 03:02:21 magnum kernel: ---[ end trace 118398ff1b25f91d ]---

magnum ~ # gdb /usr/src/linux/vmlinux
GNU gdb (Gentoo 7.2 p1) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.gentoo.org/>...
Reading symbols from /usr/src/linux-2.6.36-gentoo-r5/vmlinux...done.
(gdb) l *(xfs_da_do_buf+0x4da)
0xc10fb360 is in xfs_da_do_buf (fs/xfs/xfs_da_btree.c:2088).
2083                    xfs_dir2_data_t         *data;
2084                    xfs_dir2_free_t         *free;
2085                    xfs_da_blkinfo_t        *info;
2086                    uint                    magic, magic1;
2087
2088                    info = rbp->data;
2089                    data = rbp->data;
2090                    free = rbp->data;
2091                    magic = be16_to_cpu(info->magic);
2092                    magic1 = be32_to_cpu(data->hdr.magic);
(gdb)

Next, I'll download and compile the latest vanilla kernel.  I was testing with 2.6.36-gentoo-r5 as I knew I could reliably trigger this bug with it.
Comment 17 Dave Chinner 2011-03-11 22:31:04 UTC
(In reply to comment #16)
> Reading symbols from /usr/src/linux-2.6.36-gentoo-r5/vmlinux...done.
> (gdb) l *(xfs_da_do_buf+0x4da)
> 0xc10fb360 is in xfs_da_do_buf (fs/xfs/xfs_da_btree.c:2088).
> 2083                    xfs_dir2_data_t         *data;
> 2084                    xfs_dir2_free_t         *free;
> 2085                    xfs_da_blkinfo_t        *info;
> 2086                    uint                    magic, magic1;
> 2087
> 2088                    info = rbp->data;
> 2089                    data = rbp->data;
> 2090                    free = rbp->data;
> 2091                    magic = be16_to_cpu(info->magic);
> 2092                    magic1 = be32_to_cpu(data->hdr.magic);
> (gdb)

Thanks, that tells me what the problem is. I'll attach a patch for you to try in a couple of minutes.
Comment 18 Dave Chinner 2011-03-11 22:35:54 UTC
Created attachment 50652 [details]
Fix oops on directory read error

This should fix the oops in xfs_do_da_buf() as a result of running out of vmalloc space. It won't fix the vmalloc problem, but should allow the system to continue to operate. Note that instead of a crash, it is likely that a controlled filesystem shutdown will now result when vmalloc space is exhausted in these situations....
Comment 19 Katharine Manton 2011-03-14 16:51:33 UTC
Patch applied to 2.6.36-gentoo-r5 and 'vanilla' 2.6.37.2

Both exhibit the same behaviour, so I'll stick with 2.6.37.2 now.

Running rsync, I now see a lot of:

Mar 14 15:50:00 magnum kernel: Filesystem "sdb2": page discard on page f777d9a0, inode 0x2000f1, offset 0.
Mar 14 15:50:10 magnum kernel: Filesystem "sdb2": page discard on page f77549e0, inode 0x600452, offset 0.
Mar 14 15:50:10 magnum kernel: Filesystem "sdb2": page discard on page f77549c0, inode 0x600453, offset 0.

...eventually and inevitably followed by:

Mar 14 15:50:20 magnum kernel: Filesystem "sdb2": XFS internal error xfs_trans_cancel at line 1815 of file fs/xfs/xfs_trans.c.  Caller 0xc1124a2e
Mar 14 15:50:20 magnum kernel: 
Mar 14 15:50:20 magnum kernel: Pid: 1835, comm: rsync Not tainted 2.6.37.2-p1 #2
Mar 14 15:50:20 magnum kernel: Call Trace:
Mar 14 15:50:20 magnum kernel:  [<c1109d0a>] xfs_error_report+0x2c/0x2e
Mar 14 15:50:20 magnum kernel:  [<c1121153>] xfs_trans_cancel+0x4b/0xc9
Mar 14 15:50:20 magnum kernel:  [<c1124a2e>] ? xfs_create+0x40f/0x460
Mar 14 15:50:20 magnum kernel:  [<c1124a2e>] xfs_create+0x40f/0x460
Mar 14 15:50:20 magnum kernel:  [<c112d2fe>] xfs_vn_mknod+0xc8/0x153
Mar 14 15:50:20 magnum kernel:  [<c112d3a2>] xfs_vn_create+0xa/0xc
Mar 14 15:50:20 magnum kernel:  [<c108a707>] vfs_create+0x60/0xaa
Mar 14 15:50:20 magnum kernel:  [<c112d398>] ? xfs_vn_create+0x0/0xc
Mar 14 15:50:20 magnum kernel:  [<c108b43c>] do_last+0x290/0x511
Mar 14 15:50:20 magnum kernel:  [<c108cc21>] do_filp_open+0x19a/0x47c
Mar 14 15:50:20 magnum kernel:  [<c1021bda>] ? get_parent_ip+0xb/0x31
Mar 14 15:50:20 magnum kernel:  [<c10223d4>] ? sub_preempt_count+0x7c/0x89
Mar 14 15:50:20 magnum kernel:  [<c1094700>] ? alloc_fd+0xbd/0xca
Mar 14 15:50:20 magnum kernel:  [<c1081891>] do_sys_open+0x44/0xc0
Mar 14 15:50:20 magnum kernel:  [<c108194f>] sys_open+0x1e/0x26
Mar 14 15:50:20 magnum kernel:  [<c100270c>] sysenter_do_call+0x12/0x22
Mar 14 15:50:20 magnum kernel: xfs_force_shutdown(sdb2,0x8) called from line 1816 of file fs/xfs/xfs_trans.c.  Return address = 0xc1121169
Mar 14 15:50:20 magnum kernel: Filesystem "sdb2": Corruption of in-memory data detected.  Shutting down filesystem: sdb2
Mar 14 15:50:20 magnum kernel: Please umount the filesystem, and rectify the problem(s)
Mar 14 15:50:35 magnum kernel: Filesystem "sdb2": xfs_log_force: error 5 returned.
Mar 14 15:51:35 magnum last message repeated 2 times

At this point, I can umount/mount/umount/xfs_check/mount and run rsync again.  The above will happen a few times; each time XFS recovery succeeds on mount and xfs_check doesn't report anything.

After repeating the above a few times, eventually this happens:

Mar 14 16:03:12 magnum kernel: vmap allocation for size 4194304 failed: use vmalloc=<size> to increase size.
Mar 14 16:03:12 magnum kernel: xfs_buf_get: failed to map pages

...repeated many times, then:

Mar 14 16:03:12 magnum kernel: Filesystem "sdb2": XFS internal error xfs_trans_cancel at line 1815 of file fs/xfs/xfs_trans.c.  Caller 0xc1124a2e
Mar 14 16:03:12 magnum kernel: 
Mar 14 16:03:12 magnum kernel: Pid: 2123, comm: rsync Not tainted 2.6.37.2-p1 #2
Mar 14 16:03:12 magnum kernel: Call Trace:
Mar 14 16:03:12 magnum kernel:  [<c1109d0a>] xfs_error_report+0x2c/0x2e
Mar 14 16:03:12 magnum kernel:  [<c1121153>] xfs_trans_cancel+0x4b/0xc9
Mar 14 16:03:12 magnum kernel:  [<c1124a2e>] ? xfs_create+0x40f/0x460
Mar 14 16:03:12 magnum kernel:  [<c1124a2e>] xfs_create+0x40f/0x460
Mar 14 16:03:12 magnum kernel:  [<c112d2fe>] xfs_vn_mknod+0xc8/0x153
Mar 14 16:03:12 magnum kernel:  [<c112d3a2>] xfs_vn_create+0xa/0xc
Mar 14 16:03:12 magnum kernel:  [<c108a707>] vfs_create+0x60/0xaa
Mar 14 16:03:12 magnum kernel:  [<c112d398>] ? xfs_vn_create+0x0/0xc
Mar 14 16:03:12 magnum kernel:  [<c108b43c>] do_last+0x290/0x511
Mar 14 16:03:12 magnum kernel:  [<c108cc21>] do_filp_open+0x19a/0x47c
Mar 14 16:03:12 magnum kernel:  [<c1021bda>] ? get_parent_ip+0xb/0x31
Mar 14 16:03:12 magnum kernel:  [<c10223d4>] ? sub_preempt_count+0x7c/0x89
Mar 14 16:03:12 magnum kernel:  [<c1094700>] ? alloc_fd+0xbd/0xca
Mar 14 16:03:12 magnum kernel:  [<c1081891>] do_sys_open+0x44/0xc0
Mar 14 16:03:12 magnum kernel:  [<c108194f>] sys_open+0x1e/0x26
Mar 14 16:03:12 magnum kernel:  [<c100270c>] sysenter_do_call+0x12/0x22
Mar 14 16:03:12 magnum kernel: xfs_force_shutdown(sdb2,0x8) called from line 1816 of file fs/xfs/xfs_trans.c.  Return address = 0xc1121169
Mar 14 16:03:12 magnum kernel: Filesystem "sdb2": Corruption of in-memory data detected.  Shutting down filesystem: sdb2
Mar 14 16:03:12 magnum kernel: Please umount the filesystem, and rectify the problem(s)
Mar 14 16:03:24 magnum kernel: Filesystem "sdb2": xfs_log_force: error 5 returned.
Mar 14 16:04:24 magnum last message repeated 2 times

After this has occurred:

magnum ~ # umount /mnt/1k
magnum ~ # mount /mnt/1k
mount: Cannot allocate memory

Mar 14 16:10:49 magnum kernel: XFS mounting filesystem sdb2
Mar 14 16:10:49 magnum kernel: alloc_vmap_area: 1561 callbacks suppressed
Mar 14 16:10:49 magnum kernel: vmap allocation for size 4194304 failed: use vmalloc=<size> to increase size.
Mar 14 16:10:49 magnum kernel: xfs_buf_get_uncached: failed to map pages
Mar 14 16:10:49 magnum kernel: XFS: log mount failed

At this point, the only recourse is to reboot.  Destroying the log doesn't help:

magnum ~ # xfs_repair -L /dev/sdb2
[...]
magnum ~ # mount /mnt/1k
mount: Cannot allocate memory

Mar 14 16:15:35 magnum kernel: XFS mounting filesystem sdb2
Mar 14 16:15:35 magnum kernel: vmap allocation for size 4194304 failed: use vmalloc=<size> to increase size.
Mar 14 16:15:35 magnum kernel: xfs_buf_get_uncached: failed to map pages
Mar 14 16:15:35 magnum kernel: XFS: log mount failed

After a reboot it's possible to mount the fs; if the log wasn't destroyed with 'xfs_repair -L' before rebooting, XFS recovery succeeds on mount.

At least the oops is fixed by your patch, as expected.
Comment 20 Dave Chinner 2011-03-14 21:45:02 UTC
On Mon, Mar 14, 2011 at 04:51:49PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=27492
> 
> 
> 
> 
> 
> --- Comment #19 from Katharine Manton <kat@technogoths.org.uk>  2011-03-14
> 16:51:33 ---
> Patch applied to 2.6.36-gentoo-r5 and 'vanilla' 2.6.37.2
> 
> Both exhibit the same behaviour, so I'll stick with 2.6.37.2 now.
> 
> Running rsync, I now see a lot of:
> 
> Mar 14 15:50:00 magnum kernel: Filesystem "sdb2": page discard on page
> f777d9a0, inode 0x2000f1, offset 0.
> Mar 14 15:50:10 magnum kernel: Filesystem "sdb2": page discard on page
> f77549e0, inode 0x600452, offset 0.
> Mar 14 15:50:10 magnum kernel: Filesystem "sdb2": page discard on page
> f77549c0, inode 0x600453, offset 0.

That implies you have run your filesystem out of space and exhausted
the reserve pool of blocks. Or perhaps you are getting IO errors
from your hardware.

> ...eventually and inevitably followed by:
> 
> Mar 14 15:50:20 magnum kernel: Filesystem "sdb2": XFS internal error
> xfs_trans_cancel at line 1815 of file fs/xfs/xfs_trans.c.  Caller 0xc1124a2e
> Mar 14 15:50:20 magnum kernel: 
> Mar 14 15:50:20 magnum kernel: Pid: 1835, comm: rsync Not tainted 2.6.37.2-p1
> #2
> Mar 14 15:50:20 magnum kernel: Call Trace:
> Mar 14 15:50:20 magnum kernel:  [<c1109d0a>] xfs_error_report+0x2c/0x2e
> Mar 14 15:50:20 magnum kernel:  [<c1121153>] xfs_trans_cancel+0x4b/0xc9
> Mar 14 15:50:20 magnum kernel:  [<c1124a2e>] ? xfs_create+0x40f/0x460
> Mar 14 15:50:20 magnum kernel:  [<c1124a2e>] xfs_create+0x40f/0x460
> Mar 14 15:50:20 magnum kernel:  [<c112d2fe>] xfs_vn_mknod+0xc8/0x153
> Mar 14 15:50:20 magnum kernel:  [<c112d3a2>] xfs_vn_create+0xa/0xc
> Mar 14 15:50:20 magnum kernel:  [<c108a707>] vfs_create+0x60/0xaa
> Mar 14 15:50:20 magnum kernel:  [<c112d398>] ? xfs_vn_create+0x0/0xc
> Mar 14 15:50:20 magnum kernel:  [<c108b43c>] do_last+0x290/0x511
> Mar 14 15:50:20 magnum kernel:  [<c108cc21>] do_filp_open+0x19a/0x47c
> Mar 14 15:50:20 magnum kernel:  [<c1021bda>] ? get_parent_ip+0xb/0x31
> Mar 14 15:50:20 magnum kernel:  [<c10223d4>] ? sub_preempt_count+0x7c/0x89
> Mar 14 15:50:20 magnum kernel:  [<c1094700>] ? alloc_fd+0xbd/0xca
> Mar 14 15:50:20 magnum kernel:  [<c1081891>] do_sys_open+0x44/0xc0
> Mar 14 15:50:20 magnum kernel:  [<c108194f>] sys_open+0x1e/0x26
> Mar 14 15:50:20 magnum kernel:  [<c100270c>] sysenter_do_call+0x12/0x22
> Mar 14 15:50:20 magnum kernel: xfs_force_shutdown(sdb2,0x8) called from line
> 1816 of file fs/xfs/xfs_trans.c.  Return address = 0xc1121169
> Mar 14 15:50:20 magnum kernel: Filesystem "sdb2": Corruption of in-memory
> data
> detected.  Shutting down filesystem: sdb2
> Mar 14 15:50:20 magnum kernel: Please umount the filesystem, and rectify the
> problem(s)
> Mar 14 15:50:35 magnum kernel: Filesystem "sdb2": xfs_log_force: error 5
> returned.
> Mar 14 15:51:35 magnum last message repeated 2 times

Which further implies that you are at ENOSPC, I think. However,
there should not be a shutdown here due to ENOSPC - all known
accounting bugs were fixed quite some time ago. If you can isolate
this problem, please raise a new bug for it.

> At this point, I can umount/mount/umount/xfs_check/mount and run
> rsync again.  The above will happen a few times; each time XFS
> recovery succeeds on mount and xfs_check doesn't report anything.

> After repeating the above a few times, eventually this happens:
> 
> Mar 14 16:03:12 magnum kernel: vmap allocation for size 4194304 failed: use
> vmalloc=<size> to increase size.
> Mar 14 16:03:12 magnum kernel: xfs_buf_get: failed to map pages

Which is back to the original problem. If increasing vmalloc space
doesn't fix your problem then you really, really need to get them VM
folk to triage and fix the problem (which appears to be vmap area
fragmentation). The only other thing you can do to avoid this
is move to x86_64...

> Mar 14 16:04:24 magnum last message repeated 2 times
> 
> After this has occurred:
> 
> magnum ~ # umount /mnt/1k
> magnum ~ # mount /mnt/1k
> mount: Cannot allocate memory
> 
> Mar 14 16:10:49 magnum kernel: XFS mounting filesystem sdb2
> Mar 14 16:10:49 magnum kernel: alloc_vmap_area: 1561 callbacks suppressed
> Mar 14 16:10:49 magnum kernel: vmap allocation for size 4194304 failed: use
> vmalloc=<size> to increase size.
> Mar 14 16:10:49 magnum kernel: xfs_buf_get_uncached: failed to map pages
> Mar 14 16:10:49 magnum kernel: XFS: log mount failed
> 
> At this point, the only recourse is to reboot.  Destroying the log doesn't
> help:
> 
> magnum ~ # xfs_repair -L /dev/sdb2
> [...]
> magnum ~ # mount /mnt/1k
>
> mount: Cannot allocate memory
> Mar 14 16:15:35 magnum kernel: XFS mounting filesystem sdb2
> Mar 14 16:15:35 magnum kernel: vmap allocation for size 4194304 failed: use
> vmalloc=<size> to increase size.
> Mar 14 16:15:35 magnum kernel: xfs_buf_get_uncached: failed to map pages
> Mar 14 16:15:35 magnum kernel: XFS: log mount failed

Of course not - mounting still needs to allocate the memory for the
log buffers. Once the VM is screwed, rebooting is your only option.

> At least the oops is fixed by your patch, as expected.

OK, I'll push it forwards.

Cheers,

Dave.
Comment 21 Katharine Manton 2011-03-16 14:56:44 UTC
(In reply to comment #20)
> That implies you have run your filesystem out of space and exhausted
> the reserve pool of blocks. Or perhaps you are getting IO errors
> from your hardware.

I've since added a SCSI controller and drive to the test system and see the same thing happening.

The source fs for rsync contains 584M of files.  The destination is a 2G partition, freshly formatted each time.

On a recent test (using the SCSI hardware), this:

Mar 16 12:42:44 magnum kernel: Filesystem "sdc2": page discard on page f7766980, inode 0x40a0fe, offset 0.
Mar 16 12:42:44 magnum kernel: Filesystem "sdc2": XFS internal error xfs_trans_cancel at line 1815 of file fs/xfs/xfs_trans.c.  Caller 0xc1124a2e

(etc.)

...occurred after rsync had transferred 2.8M of files:

After umount/mount:

# du -sh /mnt/c.1k/
2.8M    /mnt/c.1k/

# df -hT /mnt/c.1k/
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/sdc2      xfs    2.0G   14M  2.0G   1% /mnt/c.1k

The same behaviour with PATA controller/drive and SCSI controller/drive seems to rule out I/O and it doesn't seem to be ENOSPC...

I rebooted with 2.6.32 and knocked up a script to umount/mkfs.xfs/mount/rsync repeatedly.  67 iterations later with no kernel messages and all files transferred each time I stopped testing; all seems well with 2.6.32

> Which further implies that you are at ENOSPC, I think. However,
> there should not be a shutdown here due to ENOSPC - all known
> accounting bugs were fixed quite some time ago. If you can isolate
> this problem, please raise a new bug for it.

It's repeatable with SCSI and PATA hardware.  This behaviour seems to have replaced the oops; I need to boot the unpatched 2.6.37.2 and run the tests again.

> Which is back to the original problem. If increasing vmalloc space
> doesn't fix your problem then you really, really need to get them VM
> folk to triage and fix the problem (which appears to be vmap area
> fragmentation). The only other thing you can do to avoid this
> is move to x86_64...

How much to increase vmalloc space to, though?  Default on this system is 128M.  On one system I increased it incrementally; 'vmalloc=768M' seemed to help but didn't eliminate the problem entirely.  What are the consequences (if any) of increasing vmalloc space considerably beyond the default?

I'm not sure how to proceed.  Moving to x86_64 isn't an option.  XFS /was/ reliable; I've used it on 32-bit systems from 2.4.19 + XFS 1.3 patch (probably earlier) and haven't had any problems until >2.6.32.  Something's changed.

I'll do some more testing with ever-larger vmalloc space.  I don't want to report another bug unless there really is one!

Cheers, Kat
Comment 22 Stratos Psomadakis 2011-05-02 12:32:43 UTC
Bug 27002 seems somewhat related to this one. A possible fix is included in 2.6.39-rc [1].

[1] http://www.spinics.net/lists/xfs/msg03913.html
Comment 23 Jochen Buchholz 2011-06-05 18:52:09 UTC
(In reply to comment #22)
> Bug 27002 seems somewhat related to this one. A possible fix is included in
> 2.6.39-rc [1].
> 
> [1] http://www.spinics.net/lists/xfs/msg03913.html

This patch did not work on 32 bit systems: 
Jun  4 23:36:42 skywalker klogd: vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
Jun  4 23:36:42 skywalker last message repeated 2 times
Jun  4 23:36:42 skywalker klogd: XFS (dm-8): xfs_buf_get: failed to map pages
Jun  4 23:36:42 skywalker klogd: 
Jun  4 23:36:42 skywalker klogd: vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
Jun  4 23:36:42 skywalker last message repeated 2 times
Jun  4 23:36:42 skywalker klogd: XFS (dm-8): xfs_buf_get: failed to map pages
Jun  4 23:36:42 skywalker klogd: 
Jun  4 23:36:42 skywalker klogd: vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
Jun  4 23:36:42 skywalker last message repeated 2 times
Jun  4 23:36:42 skywalker klogd: XFS (dm-8): xfs_buf_get: failed to map pages
Jun  4 23:36:42 skywalker klogd: 
Jun  4 23:36:42 skywalker klogd: vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
Jun  4 23:36:42 skywalker klogd: XFS (dm-8): xfs_buf_get: failed to map pages
Jun  4 23:36:42 skywalker klogd: 
Jun  4 23:36:42 skywalker klogd: XFS (dm-8): xfs_buf_get: failed to map pages
Jun  4 23:36:42 skywalker klogd: 
Jun  4 23:36:42 skywalker klogd: XFS (dm-8): xfs_buf_get: failed to map pages
Jun  4 23:36:42 skywalker klogd: 
Jun  4 23:36:42 skywalker klogd: XFS (dm-8): xfs_buf_get: failed to map pages
Jun  4 23:36:42 skywalker klogd: 
Jun  4 23:36:42 skywalker klogd: XFS (dm-8): xfs_buf_get: failed to map pages
Jun  4 23:36:42 skywalker klogd: 
Jun  4 23:36:42 skywalker klogd: XFS (dm-8): xfs_buf_get: failed to map pages
Jun  4 23:36:42 skywalker klogd: 
Jun  4 23:36:42 skywalker klogd: XFS (dm-8): xfs_buf_get: failed to map pages
Jun  4 23:36:42 skywalker klogd: 
Jun  4 23:36:42 skywalker klogd: XFS (dm-8): xfs_buf_get: failed to map pages
Jun  4 23:36:42 skywalker klogd: 
Jun  4 23:36:42 skywalker klogd: XFS (dm-8): xfs_buf_get: failed to map pages
Jun  4 23:36:42 skywalker klogd: 
Jun  4 23:36:42 skywalker klogd: XFS (dm-8): xfs_buf_get: failed to map pages
Jun  4 23:36:42 skywalker klogd: 
Jun  4 23:36:42 skywalker klogd: XFS (dm-8): xfs_buf_get: failed to map pages
Jun  4 23:36:42 skywalker klogd: 
Jun  4 23:36:58 skywalker klogd: alloc_vmap_area: 32 callbacks suppressed
Jun  4 23:36:58 skywalker klogd: vmap allocation for size 1048576 failed: use vmalloc=<size> to increase size.
Jun  4 23:36:58 skywalker last message repeated 2 times
Jun  4 23:36:58 skywalker klogd: XFS (dm-7): xfs_buf_get: failed to map pages
Jun  4 23:36:58 skywalker klogd: 
Jun  4 23:36:58 skywalker klogd: XFS (dm-7): Internal error xfs_trans_cancel at line 1911 of file fs/xfs/xfs_trans.c.  Caller 0xf95d5c62
Jun  4 23:36:58 skywalker klogd: 
Jun  4 23:36:58 skywalker klogd: Pid: 2338, comm: mkdir Not tainted 2.6.39 #1
Jun  4 23:36:58 skywalker klogd: Call Trace:
Jun  4 23:36:58 skywalker klogd:  [<f95b9bcb>] xfs_error_report+0x27/0x2c [xfs]
Jun  4 23:36:58 skywalker klogd:  [<f95d5c62>] ? xfs_create+0x426/0x49e [xfs]
Jun  4 23:36:58 skywalker klogd:  [<f95d1f8c>] xfs_trans_cancel+0x4b/0xc9 [xfs]
Jun  4 23:36:58 skywalker klogd:  [<f95d5c62>] ? xfs_create+0x426/0x49e [xfs]
Jun  4 23:36:58 skywalker klogd:  [<f95d5c62>] xfs_create+0x426/0x49e [xfs]
Jun  4 23:36:58 skywalker klogd:  [<f95df1b7>] xfs_vn_mknod+0xc5/0x153 [xfs]
Jun  4 23:36:58 skywalker klogd:  [<f95df252>] xfs_vn_mkdir+0xd/0xf [xfs]
Jun  4 23:36:58 skywalker klogd:  [<c01c85a4>] vfs_mkdir+0x6b/0xba
Jun  4 23:36:58 skywalker klogd:  [<c01ca662>] sys_mkdirat+0x73/0xac
Jun  4 23:36:58 skywalker klogd:  [<c01a970a>] ? remove_vma+0x46/0x4c
Jun  4 23:36:58 skywalker klogd:  [<c01aa4cd>] ? do_munmap+0x217/0x231
Jun  4 23:36:58 skywalker klogd:  [<c01ca6ab>] sys_mkdir+0x10/0x12
Jun  4 23:36:58 skywalker klogd:  [<c0411bcc>] sysenter_do_call+0x12/0x22
Jun  4 23:36:58 skywalker klogd: XFS (dm-7): xfs_do_force_shutdown(0x8) called from line 1912 of file fs/xfs/xfs_trans.c.  Return address = 0xf95d1fa2
Jun  4 23:36:58 skywalker klogd: XFS (dm-7): Corruption of in-memory data detected.  Shutting down filesystem
Jun  4 23:36:58 skywalker klogd: XFS (dm-7): Please umount the filesystem and rectify the problem(s)
Comment 24 Alan 2013-12-11 12:03:52 UTC

Closing as obsolete, if this is still seen on modern kernels please update

Note You need to log in before you can comment on or make changes to this bug.