Bug 7707 - "Eeek! page_mapcount(page) went negative! (-1)"
Status: REJECTED INVALID
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Page Allocator
Hardware: i386 Linux
Importance: P2 high
Assignee: Nick Piggin
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-12-18 10:45 UTC by Chris Rankin
Modified: 2007-09-05 05:11 UTC
CC List: 7 users

See Also:
Kernel Version: 2.6.19.1
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
rmap debug for 2.6.20 (14.56 KB, patch)
2007-02-05 20:03 UTC, Nick Piggin

Description Chris Rankin 2006-12-18 10:45:31 UTC
Most recent kernel where this bug did *NOT* occur:
2.6.18.5 (to my knowledge)

Distribution:
Userspace is FC5

Hardware Environment:
Dual P4 2.66 GHz Xeon (Northwood), HT enabled, 2 GB RAM, compiled with gcc-4.1.1

Software Environment:

Problem Description:
Compiling xine-lib triggered internal BUG report:

Eeek! page_mapcount(page) went negative! (-1)
  page->flags = 14
  page->count = 0
  page->mapping = 00000000
------------[ cut here ]------------
kernel BUG at /home/chris/LINUX/linux-2.6.19/mm/rmap.c:578!
invalid opcode: 0000 [#1]
PREEMPT SMP
Modules linked in: radeon drm pwc eeprom cpufreq_ondemand p4_clockmod
speedstep_lib nfsd exportfs ipv6 autofs4 nfs lockd sunrpc af_packet
firmware_class binfmt_misc video thermal processor fan button ac lp parport_pc
parport nvram video1394 raw1394 eth1394 compat_ioctl32 videodev v4l1_compat
v4l2_common snd_usb_audio snd_usb_lib snd_intel8x0 snd_emu10k1_synth
snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_emu10k1 snd_rawmidi
snd_ac97_codec snd_ac97_bus snd_seq_dummy snd_seq_oss ohci1394
snd_seq_midi_event snd_seq ieee1394 snd_pcm_oss snd_mixer_oss snd_pcm ehci_hcd
e7xxx_edac serio_raw snd_seq_device uhci_hcd edac_mc e1000 psmouse snd_timer
snd_page_alloc snd_util_mem snd_hwdep ide_cd cdrom snd soundcore pcspkr
intel_agp i2c_i801 i2c_core agpgart usbcore ext3 jbd
CPU:    0
EIP:    0060:[<c0145fa0>]    Not tainted VLI
EFLAGS: 00010282   (2.6.19.1 #1)
EIP is at page_remove_rmap+0x70/0x8f
eax: 0000001e   ebx: c100f500   ecx: ebd0c000   edx: 00000002
esi: 00000020   edi: 08665000   ebp: eac11994   esp: ebd0cee0
ds: 007b   es: 007b   ss: 0068
Process cc1 (pid: 24220, ti=ebd0c000 task=ed5b3a90 task.ti=ebd0c000)
Stack: c0284bf0 00000000 c100f500 c0140c39 00000000 edb60518 ebd0cf54 00000000
       00000001 08696000 ec3f3084 f798f040 c200f0c0 fffffff9 ffffffff c155822c
       ec3f3084 08696000 00000000 00000000 ebd0cf54 ecfab5c0 f798f040 00000001
Call Trace:
 [<c0140c39>] unmap_vmas+0x24d/0x4df
 [<c01436ac>] exit_mmap+0x7e/0x10e
 [<c011717d>] mmput+0x1d/0x78
 [<c011bb94>] do_exit+0x1a9/0x77c
 [<c0111c2f>] do_page_fault+0x281/0x51a
 [<c0150689>] vfs_write+0xfc/0x13b
 [<c011c1dd>] sys_exit_group+0x0/0xd
 [<c0102b9d>] sysenter_past_esp+0x56/0x79
 =======================
Code: 74 03 8b 53 0c 8b 42 04 89 44 24 04 c7 04 24 d9 4b 28 c0 e8 52 3c fd ff 8b
43 10 89 44 24 04 c7 04 24 f0 4b 28 c0 e8 3f 3c fd ff <0f> 0b 42 02 66 4b 28 c0
8b 53 10 83 f2 01 83 e2 01 89 d8 5b 59
EIP: [<c0145fa0>] page_remove_rmap+0x70/0x8f SS:ESP 0068:ebd0cee0
 <1>Fixing recursive fault but reboot is needed!
BUG: scheduling while atomic: cc1/0x00000002/24220
 [<c026d6cf>] __sched_text_start+0x4f/0x900
 [<c019ed40>] cfq_free_io_context+0x57/0xbb
 [<c0140068>] sys_madvise+0x168/0x3a4
 [<c011bad9>] do_exit+0xee/0x77c
 [<c0140068>] sys_madvise+0x168/0x3a4
 [<c0140068>] sys_madvise+0x168/0x3a4
 [<c0103fc7>] die+0x2a5/0x2cc
 [<c010486c>] do_invalid_op+0x0/0xab
 [<c010490e>] do_invalid_op+0xa2/0xab
 [<c0145fa0>] page_remove_rmap+0x70/0x8f
 [<c0119b85>] vprintk+0x2b9/0x313
 [<c0119b8f>] vprintk+0x2c3/0x313
 [<c013a59c>] __pagevec_free+0x18/0x22
 [<c026ff79>] error_code+0x39/0x40
 [<c0145fa0>] page_remove_rmap+0x70/0x8f
 [<c0140c39>] unmap_vmas+0x24d/0x4df
 [<c01436ac>] exit_mmap+0x7e/0x10e
 [<c011717d>] mmput+0x1d/0x78
 [<c011bb94>] do_exit+0x1a9/0x77c
 [<c0111c2f>] do_page_fault+0x281/0x51a
 [<c0150689>] vfs_write+0xfc/0x13b
 [<c011c1dd>] sys_exit_group+0x0/0xd
 [<c0102b9d>] sysenter_past_esp+0x56/0x79
 =======================

Steps to reproduce:
Have not reproduced it yet.
Comment 1 Vincent Kessler 2007-01-16 12:43:15 UTC
I think I hit the same bug here:

Most recent kernel where this bug did *NOT* occur:
unknown

Distribution:
Custom Gentoo based

Hardware Environment:
Via EPIA based
(passed 12 hours of memtest86)

Software Environment:
kernel 2.6.19


Eeek! page_mapcount(page) went negative! (-1)
  page->flags = 80000004
  page->count = 0
  page->mapping = 00000000
------------[ cut here ]------------
kernel BUG at mm/rmap.c:578!
invalid opcode: 0000 [#1]
PREEMPT
Modules linked in: fcpci(P)
CPU:    0
EIP:    0060:[<c0143bd5>]    Tainted: P      VLI
EFLAGS: 00010286   (2.6.19 #2)
EIP is at page_remove_rmap+0x75/0xa6
eax: 0000001e   ebx: c122b140   ecx: ffffffff   edx: 00002a1d
esi: b7f5c000   edi: de060d70   ebp: 00000020   esp: de0f5dd0
ds: 007b   es: 007b   ss: 0068
Process telnet (pid: 948, ti=de0f4000 task=dee7b030 task.ti=de0f4000)
Stack: c030f271 00000000 c122b140 c013e434 c122b140 b7f5c000 1158a025 00006000
       00000000 00000001 b7f86000 de0edb7c de03f220 c03d248c 00000000 fffffffe
       de0edb7c b7f86000 00000000 de0f5e4c de04dcf8 de03f220 00000002 c0140f76
Call Trace:
 [<c013e434>] unmap_vmas+0x267/0x470
 [<c0140f76>] exit_mmap+0x75/0x101
 [<c011453f>] mmput+0x25/0x85
 [<c01178bf>] exit_mm+0xcd/0xd3
 [<c0118d2d>] do_exit+0x19c/0x7a5
 [<c01193ba>] sys_exit_group+0x0/0x11
 [<c0121416>] get_signal_to_deliver+0x391/0x3b7
 [<c0102529>] do_notify_resume+0x84/0x609
 [<c015d7a3>] d_rehash+0x17/0x44
 [<c027eade>] sock_attach_fd+0x7d/0xd9
 [<c027e693>] sockfd_lookup_light+0x24/0x3e
 [<c027e811>] sys_setsockopt+0x7f/0x9e
 [<c027ff15>] sys_socketcall+0x9a/0x24d
 [<c0102ede>] work_notifysig+0x13/0x25
 =======================
Code: 74 03 8b 53 0c 8b 42 04 c7 04 24 5a f2 30 c0 89 44 24 04 e8 e6 31 fd ff 8b
43 10 c7 04 24 71 f2 30 c0 89 44 24 04 e8 d3 31 fd ff <0f> 0b 42 02 06 f2 30 c0
8b 03 8b 53 10 c1 e8 1f 83 f2 01 83 e2
EIP: [<c0143bd5>] page_remove_rmap+0x75/0xa6 SS:ESP 0068:de0f5dd0
 <1>Fixing recursive fault but reboot is needed!
BUG: scheduling while atomic: telnet/0x00000002/948
 [<c02f588f>] __sched_text_start+0x4f/0x54f
 [<c011aaad>] irq_exit+0x25/0x30
 [<c0105a23>] do_IRQ+0x70/0x85
 [<c0118c7a>] do_exit+0xe9/0x7a5
 [<c01047a1>] die+0x286/0x28e
 [<c01050a8>] do_invalid_op+0x0/0xb4
 [<c0105153>] do_invalid_op+0xab/0xb4
 [<c0143bd5>] page_remove_rmap+0x75/0xa6
 [<c0116957>] release_console_sem+0x1a1/0x1e0
 [<c0116d8b>] vprintk+0x29c/0x2b9
 [<c02b5f37>] tcp_v4_send_check+0x8a/0xcf
 [<c02f7639>] error_code+0x39/0x40
 [<c0143bd5>] page_remove_rmap+0x75/0xa6
 [<c013e434>] unmap_vmas+0x267/0x470
 [<c0140f76>] exit_mmap+0x75/0x101
 [<c011453f>] mmput+0x25/0x85
 [<c01178bf>] exit_mm+0xcd/0xd3
 [<c0118d2d>] do_exit+0x19c/0x7a5
 [<c01193ba>] sys_exit_group+0x0/0x11
 [<c0121416>] get_signal_to_deliver+0x391/0x3b7
 [<c0102529>] do_notify_resume+0x84/0x609
 [<c015d7a3>] d_rehash+0x17/0x44
 [<c027eade>] sock_attach_fd+0x7d/0xd9
 [<c027e693>] sockfd_lookup_light+0x24/0x3e
 [<c027e811>] sys_setsockopt+0x7f/0x9e
 [<c027ff15>] sys_socketcall+0x9a/0x24d
 [<c0102ede>] work_notifysig+0x13/0x25
 =======================
Comment 2 Nick Piggin 2007-02-01 01:22:50 UTC
Are these repeatable at all? If so, I have a patch that could help track it
down, if you are interested in trying it.

Thanks
Comment 3 Chris Rankin 2007-02-03 04:16:51 UTC
Nick,

I haven't been able to reproduce this yet, no. (Which is annoying, because it
happened very quickly the first time.) But could you post the patch anyway, please?

Thanks,
Chris
Comment 4 Nick Piggin 2007-02-05 20:03:36 UTC
Created attachment 10309
rmap debug for 2.6.20

This should keep track of exactly who incremented the mapcount for a given
page, and will print traces if somebody decrements the mapcount when they are
not supposed to.
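
As an illustration only, the bookkeeping described above can be modelled in a few lines of userspace Python. This is not the attached patch: the class `TrackedPage`, its methods, and the string keys used to identify mappings are all hypothetical names chosen for the sketch. It records a stack trace for every increment and reports, with a trace, any decrement that has no matching increment.

```python
import traceback


class TrackedPage:
    """Toy stand-in for a page whose mappings are tracked per owner.

    Hypothetical model for illustration; the real debug patch works on
    struct page inside the kernel.
    """

    def __init__(self):
        self.mapcount = 0   # number of live mappings
        self.owners = {}    # mapping key -> stack recorded when it was added

    def add_rmap(self, owner):
        """Record a new mapping and remember who created it."""
        self.owners[owner] = traceback.format_stack()
        self.mapcount += 1

    def remove_rmap(self, owner):
        """Drop a mapping; return a report if this owner never mapped the page."""
        if owner not in self.owners:
            # Unmatched decrement: report it instead of letting the count go negative.
            return "unexpected unmap by %r\n%s" % (
                owner, "".join(traceback.format_stack()))
        del self.owners[owner]
        self.mapcount -= 1
        return None
```

A second `remove_rmap()` with the same key returns a report and leaves the count at zero, which is the situation the "went negative" message points to.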
Comment 5 Natalie Protasevich 2007-05-22 14:42:34 UTC
Chris, Vincent,
Were you able to incorporate the patch and run with it? Has the problem been
reproduced with the patch in?
Thanks,
--Natalie
Comment 6 Susanna Kaukinen 2007-07-12 23:50:56 UTC
Looks like what I have, too:

Distribution:
Linux version 2.6.21 (2.6.21-5) (root@skeptic) (gcc version 4.1.3 20070601 (prerelease) (Debian 4.1.2-12)) #1 SMP Wed Jul 11 16:37:10 EEST 2007

However, I think I had the same issue while I was still running Debian 4.0.

Hardware Environment: Acer 7003WSMi

Software Environment:

Problem Description:
The bug hit while I was executing `sudo vim /etc/fstab' and attempting to connect my laptop to my Nokia 7373 phone via a Bluetooth USB device, an `A-Link Bluetooth USB 2.0 Adapter A2 - US'. I was attempting to use `fuse', or so.

Dumps to terminals:

 Message from syslogd@localhost at Thu Jul 12 22:11:53 2007 ...
 localhost kernel: Eeek! page_mapcount(page) went negative! (-1)

 Message from syslogd@localhost at Thu Jul 12 22:11:53 2007 ...
 localhost kernel:   page pfn = 0

 Message from syslogd@localhost at Thu Jul 12 22:11:53 2007 ...
 localhost kernel:   page->flags = 400

 Message from syslogd@localhost at Thu Jul 12 22:11:53 2007 ...
 localhost kernel:   page->count = 1

 Message from syslogd@localhost at Thu Jul 12 22:11:53 2007 ...
 localhost kernel:   page->mapping = 0000000000000000

 Message from syslogd@localhost at Thu Jul 12 22:11:53 2007 ...
 localhost kernel:   vma->vm_ops = 0xffffffff804b31e0

 Message from syslogd@localhost at Thu Jul 12 22:11:53 2007 ...
 localhost kernel:   vma->vm_ops->nopage = filemap_nopage+0x0/0x2fe

 Message from syslogd@localhost at Thu Jul 12 22:11:53 2007 ...
 localhost kernel:   vma->vm_file->f_op->mmap = generic_file_mmap+0x0/0x3f

 Message from syslogd@localhost at Thu Jul 12 22:11:53 2007 ...
 localhost kernel: ------------[ cut here ]------------

 Message from syslogd@localhost at Thu Jul 12 22:11:53 2007 ...
 localhost kernel: invalid opcode: 0000 [1] SMP

dmesg:

Eeek! page_mapcount(page) went negative! (-1)
  page pfn = 0
  page->flags = 400
  page->count = 1
  page->mapping = 0000000000000000
  vma->vm_ops = 0xffffffff804b31e0
  vma->vm_ops->nopage = filemap_nopage+0x0/0x2fe
  vma->vm_file->f_op->mmap = generic_file_mmap+0x0/0x3f
------------[ cut here ]------------
kernel BUG at mm/rmap.c:596!
invalid opcode: 0000 [1] SMP
CPU 0
Modules linked in: fuse hci_usb rfcomm l2cap bluetooth ppdev parport_pc
lp parport button ac battery cpufreq_powersave cpufreq_stats
cpufreq_userspace cpufreq_ondemand cpufreq_conservative ipv6 arc4 ecb
ieee80211_crypt_wep powernow_k8 freq_table loop pcmcia snd_hda_intel
snd_hda_codec bcm43xx snd_pcm_oss firmware_class snd_mixer_oss
ieee80211softmac snd_pcm snd_timer snd soundcore yenta_socket
rsrc_nonstatic tifm_7xx1 ieee80211 ieee80211_crypt psmouse serio_raw
k8temp pcspkr snd_page_alloc i2c_nforce2 pcmcia_core i2c_core evdev ext3
jbd mbcache sha256 aes cbc blkcipher dm_crypt ide_generic dm_mirror
dm_snapshot dm_mod ide_cd cdrom ide_disk generic amd74xx ide_core sata_nv
forcedeth ata_generic libata scsi_mod ehci_hcd ohci_hcd thermal processor
fan
Pid: 7253, comm: sh Not tainted 2.6.21 #1
RIP: 0010:[<ffffffff8020ab42>]  [<ffffffff8020ab42>] page_remove_rmap+0xe4/0x100
RSP: 0000:ffff81001bc83cd8  EFLAGS: 00010296
RAX: 000000000000003b RBX: ffff81003b1f3000 RCX: 0000000000005c7f
RDX: 00000000ffffffff RSI: ffff81001db1c268 RDI: ffffffff804aed3c
RBP: ffff810026749768 R08: ffffffff80595b40 R09: 00039c0000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 000000000804d000
R13: ffff81001db1c268 R14: ffff81000100b800 R15: 00000000080e9000
FS:  00000000f7ef8000(0000) GS:ffffffff804e4000(0000) knlGS:00000000f7e1b6c0
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000008049724 CR3: 000000001cc9d000 CR4: 00000000000006e0
Process sh (pid: 7253, threadinfo ffff81001bc82000, task ffff81001d9b6180)
Stack:  0000000036c92020 ffff81003b1f3000 0000000000000000 ffffffff80207a0a
 0000000000000000 ffff81001bc83dc8 ffffffffffffffff 0000000000000000
 ffff810026749768 ffff81001bc83dd0 00000000003fafff 0000000000000000
Call Trace:
 [<ffffffff80207a0a>] unmap_vmas+0x3dd/0x6fe
 [<ffffffff80235c85>] exit_mmap+0x76/0xeb
 [<ffffffff80237bb1>] mmput+0x28/0x98
 [<ffffffff8021370e>] do_exit+0x212/0x7ef
 [<ffffffff8020a9de>] do_page_fault+0x6f0/0x770
 [<ffffffff8021c7b4>] __dentry_open+0x101/0x1aa
 [<ffffffff802253a4>] do_filp_open+0x2a/0x38
 [<ffffffff8022cfcd>] sys_rt_sigprocmask+0x50/0xce
 [<ffffffff8025b93d>] error_exit+0x0/0x84


Code: 0f 0b eb fe 8b 77 18 41 58 5b 5d 83 e6 01 f7 de 83 c6 04 e9
RIP  [<ffffffff8020ab42>] page_remove_rmap+0xe4/0x100
 RSP <ffff81001bc83cd8>
Fixing recursive fault but reboot is needed!

P.S.
The last one I just got was like this:

Message from syslogd@localhost at Fri Jul 13 07:35:48 2007 ...
localhost kernel: Oops: 0000 [1] SMP 

Message from syslogd@localhost at Fri Jul 13 07:35:48 2007 ...
localhost kernel: CR2: 0000000000000040

Message from syslogd@localhost at Fri Jul 13 07:35:55 2007 ...
localhost kernel: stack segment: 0000 [2] SMP 

I'll skip the dmesg output so as not to clutter this too much.
Comment 7 Susanna Kaukinen 2007-07-13 00:17:07 UTC
(In reply to comment #5)
> Chris, Vincent,
> Were you able to incorporate the patch and run with it? Have the problem been
> reproduced with the patch in?
> Thanks,
> --Natalie

I'm compiling the kernel (2.6.21) w/the patch. Will report back when I know more.

Susanna.
Comment 8 Edward Hughes IV 2007-07-13 06:22:08 UTC
Subject: Re:  "Eeek! page_mapcount(page) went negative! (-1)"

> Eeek! page_mapcount(page) went negative! (-1)
>   page pfn = 0
>   page->flags = 400

By all means try that patch as Natalie suggests, but I doubt it'll
tell you anything interesting: your page table contains an entry
marked present for page 0 (itself correctly flagged as Reserved).

That means your page table has been corrupted: I don't think the
stack display goes far enough to show the corrupt page table entry
(we can make a patch for that later if this remains a mystery).
Let's suppose that it might be 00000001, a single bit error.

Please run memtest86+ overnight to check your RAM.
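
For readers following the reasoning above, here is a small sketch of how the reported values decode. It rests on assumptions not stated in the report: the flag bit positions are taken from 2.6.21-era `include/linux/page-flags.h` (with `PG_reserved` at bit 10), and the page table entry is treated as a plain x86 entry with the present bit at bit 0 and 4 KB pages. The value `0x00000001` is only the supposition made above, not something read from the oops. All function names are hypothetical.

```python
# Assumed bit positions, from 2.6.21-era include/linux/page-flags.h.
PAGE_FLAG_NAMES = {
    0: "locked", 1: "error", 2: "referenced", 3: "uptodate",
    4: "dirty", 5: "lru", 6: "active", 7: "slab", 10: "reserved",
}

PAGE_SHIFT = 12      # assumed 4 KB pages
PTE_PRESENT = 0x1    # x86 present bit


def decode_page_flags(flags):
    """Return the names of the bits set in a page->flags word, lowest bit first."""
    names = []
    for bit in range(flags.bit_length()):
        if flags & (1 << bit):
            names.append(PAGE_FLAG_NAMES.get(bit, "bit%d" % bit))
    return names


def decode_pte(pte):
    """Split a page table entry into its present bit and page frame number."""
    return {"present": bool(pte & PTE_PRESENT), "pfn": pte >> PAGE_SHIFT}


def bits_flipped(a, b):
    """Count how many bit positions differ between two words."""
    return bin(a ^ b).count("1")
```

Under these assumptions, `decode_page_flags(0x400)` yields only the reserved flag, `decode_pte(0x00000001)` is a present entry pointing at pfn 0, and `bits_flipped(0, 1)` is 1, so a cleared entry with one flipped bit is enough to produce the reported symptoms.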
Comment 9 Edward Hughes IV 2007-07-13 06:38:47 UTC
Subject: Re:  "Eeek! page_mapcount(page) went negative! (-1)"

On Fri, 13 Jul 2007, bugme-daemon@bugzilla.kernel.org wrote:
> 
> ------- Comment #8 from echughesiv@gmail.com  2007-07-13 06:22 -------
> Subject: Re:  "Eeek! page_mapcount(page) went negative! (-1)"
>... 
> Please run memtest86+ overnight to check your RAM.

I would just like to add that I believe I am
Hugh Dickins <hugh@veritas.com>
despite bugzilla having decided that I am
Edward Hughes IV <echughesiv@gmail.com>

Help!!
Comment 10 Susanna Kaukinen 2007-07-13 08:56:48 UTC
I applied the patch, and after quite a few tries I finally got the kernel compiled. The problem had nothing to do with the source code. It seems that there's something wrong with my system, as it starts behaving oddly [1] when I compile without nice +20.

Anyway, I haven't seen another Eeek! yet, although I did suffer a total freeze just a few minutes ago. I suspect that those are caused by a) kcryptd/LUKS, b) powernowd, c) bad hardware, or d) an unrelated kernel bug.

[1] Ordinary commands start failing (they work again after a reboot), and a freeze sometimes follows soon after, although more often I reboot the system myself because it has become so unusable (tar and gcc crashing, etc.).
Comment 11 Susanna Kaukinen 2007-07-13 10:42:58 UTC
And the answer is: c) bad hardware.

It seems that the extra 512 MB of memory didn't last for more than a few months.

memtest: (a gazillion errors)
http://aycu13.webshots.com/image/21292/2002493603451648766_rs.jpg

Thank you all. =)

P.S. No new Eeek!s so far.
Comment 12 Natalie Protasevich 2007-07-13 11:37:44 UTC
Thanks, the bug can be closed now.
Comment 13 Hugh Dickins 2007-07-14 07:47:27 UTC
Subject: Re:  "Eeek! page_mapcount(page) went negative! (-1)"

Sorry, just testing whether I'm Hugh Dickins <hugh@veritas.com> again!
Comment 14 Srdjan Todorovic 2007-09-05 02:35:10 UTC
I have been getting this bug for a while now, and had the bug
yesterday on 2.6.20-16 (kubuntu 7.04). I ran memtest86 overnight
for 15 hours, with no errors.

Other people are reporting this on bugs.launchpad.net

I have noticed that some people (including myself) who reported
this on bugs.launchpad.net load nvidia's driver and taint the kernel.

I will try Nick's patch at some point, and if someone can let me
know how to reproduce this, I can rmmod nvidia and try to
trigger it.
Comment 15 Nick Piggin 2007-09-05 05:11:12 UTC
We have no idea how it can be reproduced. If you can reproduce it without
nvidia loaded and with my patch installed, it may help.
