Bug 26582
Summary: | NULL pointer dereference on pipe creation | ||
---|---|---|---|
Product: | File System | Reporter: | Ferenc Wágner (wferi) |
Component: | Other | Assignee: | fs_other |
Status: | CLOSED UNREPRODUCIBLE | ||
Severity: | normal | CC: | akpm, cebbert, florian, maciej.rutecki, rjw |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.37 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 21782 | ||
Attachments: | screenshot of the BUG; screenshot of another BUG; lspci -v output; kernel config |
Description
Ferenc Wágner
2011-01-12 13:30:31 UTC
Created attachment 43322 [details]
screenshot of the BUG
The bug reproduced itself (at least I guess it's the same thing), this time the computer crashed while I was simply typing. However, updatedb was running in the background, causing some IO load. The text is rather different:

BUG: unable to handle kernel paging request at ffffffd0
IP: [<c10d8c9a>] dcache_readdir+0x11a/0x1cb
[...]
Pid: 8196, comm: updatedb.mlocat Not tainted 2.6.37+ #5 1834S5G/1834S5G
[...]
Call Trace:
[<c10ce628>] ? filldir64+0x0/0xcc
[<c10ce8d6>] ? vfs_readdir+0x65/0x8f
[<c10ce628>] ? filldir64+0x0/0xcc
[<c10ce966>] ? sys_getdents64+0x66/0xa5
[<c100309f>] ? sysenter_do_call+0x12/0x28

I'm mostly using XFS, in case that matters.

Created attachment 43512 [details]
screenshot of another BUG
I found out that my home filesystem is rather corrupt, even xfs_repair hangs on it. Can that possibly cause these BUGs?

hmm.. data corruption... whom to cc? you might want to consider posting to lkml with a subject mentioning the data corruption... what commits are you running?

I'm running three commits over 3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5, that is, 2.6.37. Two commits are corrections to textual typos, the third decreases the suspend and hibernation test delays to 1 second. So basically I'm running vanilla 2.6.37. Do you mean in-memory data corruption? Do you think bug 26042 may be something similar? I can think of two possible causes:
1. the infamous cache-consistency problems with the i915 driver on 855GM chipsets (this is what I'm running, without fully understanding the issue: https://bugs.freedesktop.org//show_bug.cgi?id=26345)
2. Something related to suspend/hibernate, with which I'm also experiencing problems recently, cf. bug 27062 (console corruption) or bug 27372 (instant resume).
Also, I've been avoiding suspend/hibernate for a couple of days now (shutting down my laptop instead), and I experienced no crashes since then. It's a short period only yet, but maybe a hint... And Rafael is on Cc already. :)
Thanks,
Feri.

Since I gave up on suspending/hibernating (10 days ago), the issue hasn't come back. This is suggestive, although I'm not too keen on reproducing it (that would again result in unclean shutdowns and possibly further data loss). I'll try to reproduce it with read-only filesystems, though, in conjunction with hibernating.

We're seeing many reports of memory corruption caused by hibernation lately.

Which most likely is a red herring, because the hibernate core code hasn't changed recently _at_ _all_. Ferenc, what do you do to hibernate the system? Do you use s2disk?

I used (hand compiled, modern) s2disk under 2.6.36 for quite some time, but that got deactivated by the dist-upgrade to Debian squeeze. The upgrade didn't change the running kernel, but brought very unstable hibernate functionality (like frequent reboot instead of resume). So I went back to s2disk, but that didn't change anything, the problems remained. Then I dropped s2disk again, and upgraded to 2.6.37, which introduced these memory corruption problems. Since then I've been running without s2disk, i.e. pm-utils on top of the in-kernel hibernation method.

Now, since the release of 2.6.38-rc6 I've been running that, went back to my normal suspend/hibernation usage, and things seem to run much better, maybe even perfect (there might have been a single freeze during pre-allocating memory for hibernation, but I didn't have time to really assess the situation as I had to switch off the machine promptly to safely get off the train...) So there's a good chance I'll declare this issue fixed in 2.6.38-rc6, if I experience no crash in the following weeks. However, my current minimal 2.6.38-rc6 config is radically different to the previous 2.6.37 distro config, which may also hide things. Oh well.

I can't tell whether this is the result of a memory corruption or some other bug: it was triggered when I tried to list the root directory of my pendrive right after resuming from a suspend to RAM. "Of course" the pendrive was already gone by that time, because this minimal kernel does not support USB persistence across suspends, so I/O errors would have been expected (and they duly appeared when I tried to reproduce the BUG), but a BUG and a task blocked in D state looks like a, well, bug... Which I again failed to reproduce.

Mar 3 03:15:45 szonett kernel: [60321.624061] PM: Finishing wakeup.
Mar 3 03:15:45 szonett kernel: [60321.624063] Restarting tasks ...
Mar 3 03:15:45 szonett kernel: [60321.624252] hub 1-0:1.0: over-current change on port 3
Mar 3 03:15:45 szonett kernel: [60321.632440] done.
Mar 3 03:15:45 szonett kernel: [60321.632476] video LNXVIDEO:00: Restoring backlight state
Mar 3 03:15:45 szonett kernel: [60321.728093] usb 1-4: USB disconnect, address 14
Mar 3 03:15:45 szonett kernel: [60321.728103] PM: Removing info for No Bus:ep_81
Mar 3 03:15:45 szonett kernel: [60321.728149] PM: Removing info for No Bus:ep_02
Mar 3 03:15:45 szonett kernel: [60321.728186] PM: Removing info for usb:1-4:1.0
Mar 3 03:15:45 szonett kernel: [60321.728229] PM: Removing info for No Bus:6:0:0:0
Mar 3 03:15:45 szonett kernel: [60321.728331] PM: Removing info for scsi:6:0:0:0
Mar 3 03:15:45 szonett NetworkManager[1131]: <info> (wlan): supplicant interface state: starting -> ready
Mar 3 03:15:45 szonett NetworkManager[1131]: <info> (wlan): device state change: 2 -> 3 (reason 42)
Mar 3 03:15:45 szonett acpid: 2 total rules matched
Mar 3 03:15:45 szonett acpid: completed input layer event "button/sleep SBTN 00000080 00000000"
Mar 3 03:15:45 szonett acpid: client 1172[0:0] has disconnected
Mar 3 03:15:45 szonett acpid: received netlink event "battery PNP0C0A:00 00000081 00000001"
Mar 3 03:15:45 szonett acpid: rule from /etc/acpi/events/battery matched
Mar 3 03:15:45 szonett acpid: executing action "/etc/acpi/power.sh"
Mar 3 03:15:45 szonett kernel: [60321.728375] PM: Removing info for No Bus:6:0:0:0
Mar 3 03:15:45 szonett kernel: [60321.729659] PM: Removing info for No Bus:sdb1
Mar 3 03:15:45 szonett kernel: [60321.730088] PM: Removing info for No Bus:8:16
Mar 3 03:15:45 szonett kernel: [60321.730245] PM: Removing info for No Bus:sdb
Mar 3 03:15:45 szonett kernel: [60321.730439] PM: Removing info for No Bus:host6
Mar 3 03:15:45 szonett kernel: [60321.730548] PM: Removing info for scsi:host6
Mar 3 03:15:45 szonett kernel: [60321.731651] PM: Removing info for No Bus:ep_00
Mar 3 03:15:45 szonett kernel: [60321.731697] PM: Removing info for usb:1-4
Mar 3 03:15:45 szonett kernel: [60321.743957] ADDRCONF(NETDEV_UP): wlan: link is not ready
Mar 3 03:15:45 szonett kernel: [60321.968108] usb 1-4: new high speed USB device using ehci_hcd and address 15
Mar 3 03:15:45 szonett kernel: [60322.045780] ADDRCONF(NETDEV_UP): utp: link is not ready
Mar 3 03:15:46 szonett kernel: [60322.107332] PM: Adding info for usb:1-4
Mar 3 03:15:46 szonett kernel: [60322.108363] PM: Adding info for usb:1-4:1.0
Mar 3 03:15:46 szonett kernel: [60322.110628] scsi7 : usb-storage 1-4:1.0
Mar 3 03:15:46 szonett kernel: [60322.110678] PM: Adding info for scsi:host7
Mar 3 03:15:46 szonett kernel: [60322.110816] PM: Adding info for No Bus:host7
Mar 3 03:15:46 szonett kernel: [60322.111933] PM: Adding info for No Bus:ep_81
Mar 3 03:15:46 szonett kernel: [60322.111969] PM: Adding info for No Bus:ep_02
Mar 3 03:15:46 szonett kernel: [60322.112029] PM: Adding info for No Bus:ep_00
Mar 3 03:15:46 szonett kernel: [60322.112054] hub 3-0:1.0: over-current change on port 1
Mar 3 03:15:46 szonett kernel: [60322.216032] hub 3-0:1.0: over-current change on port 2
Mar 3 03:15:46 szonett kernel: [60323.113119] scsi 7:0:0:0: Direct-Access Generic USB Flash Disk 1.00 PQ: 0 ANSI: 2
Mar 3 03:15:46 szonett kernel: [60323.113182] PM: Adding info for scsi:target7:0:0
Mar 3 03:15:46 szonett kernel: [60323.113291] PM: Adding info for scsi:7:0:0:0
Mar 3 03:15:46 szonett kernel: [60323.113405] PM: Adding info for No Bus:7:0:0:0
Mar 3 03:15:46 szonett kernel: [60323.113484] PM: Adding info for No Bus:7:0:0:0
Mar 3 03:15:46 szonett kernel: [60323.117792] sd 7:0:0:0: [sdc] 3994624 512-byte logical blocks: (2.04 GB/1.90 GiB)
Mar 3 03:15:46 szonett kernel: [60323.118649] sd 7:0:0:0: [sdc] Write Protect is off
Mar 3 03:15:46 szonett kernel: [60323.118655] sd 7:0:0:0: [sdc] Mode Sense: 03 00 00 00
Mar 3 03:15:46 szonett kernel: [60323.119449] sd 7:0:0:0: [sdc] No Caching mode page present
Mar 3 03:15:46 szonett kernel: [60323.119453] sd 7:0:0:0: [sdc] Assuming drive cache: write through
Mar 3 03:15:46 szonett kernel: [60323.119510] PM: Adding info for No Bus:8:32
Mar 3 03:15:46 szonett kernel: [60323.119638] PM: Adding info for No Bus:sdc
Mar 3 03:15:46 szonett kernel: [60323.125965] sd 7:0:0:0: [sdc] No Caching mode page present
Mar 3 03:15:46 szonett kernel: [60323.125974] sd 7:0:0:0: [sdc] Assuming drive cache: write through
Mar 3 03:15:46 szonett kernel: [60323.126574] sdc: sdc1
Mar 3 03:15:46 szonett kernel: [60323.126671] PM: Adding info for No Bus:sdc1
Mar 3 03:15:46 szonett kernel: [60323.132243] sd 7:0:0:0: [sdc] No Caching mode page present
Mar 3 03:15:46 szonett kernel: [60323.132250] sd 7:0:0:0: [sdc] Assuming drive cache: write through
Mar 3 03:15:46 szonett kernel: [60323.132256] sd 7:0:0:0: [sdc] Attached SCSI removable disk
Mar 3 03:15:46 szonett acpid: action exited with status 0
Mar 3 03:15:46 szonett acpid: 1 total rule matched
Mar 3 03:15:46 szonett acpid: completed netlink event "battery PNP0C0A:00 00000081 00000001"
Mar 3 03:15:46 szonett acpid: client connected from 1172[0:0]
Mar 3 03:15:46 szonett acpid: 1 client rule loaded
Mar 3 03:15:55 szonett kernel: [60332.217444] BUG: unable to handle kernel NULL pointer dereference at 00000010
Mar 3 03:15:55 szonett kernel: [60332.217542] IP: [<c10ae967>] __mark_inode_dirty+0xa7/0x1b0
Mar 3 03:15:55 szonett kernel: [60332.217606] *pde = 00000000
Mar 3 03:15:55 szonett kernel: [60332.217639] Oops: 0000 [#1]
Mar 3 03:15:55 szonett kernel: [60332.217671] last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:02:02.0/rf_kill
Mar 3 03:15:55 szonett kernel: [60332.217745]
Mar 3 03:15:55 szonett kernel: [60332.217764] Pid: 30735, comm: ls Not tainted 2.6.38-rc6+ #38 IBM 1834S5G/1834S5G
Mar 3 03:15:55 szonett kernel: [60332.217852] EIP: 0060:[<c10ae967>] EFLAGS: 00210202 CPU: 0
Mar 3 03:15:55 szonett kernel: [60332.217905] EIP is at __mark_inode_dirty+0xa7/0x1b0
Mar 3 03:15:55 szonett kernel: [60332.217950] EAX: 00000000 EBX: d0201594 ECX: 00004000 EDX: c14afd2c
Mar 3 03:15:55 szonett kernel: [60332.218006] ESI: d47b9180 EDI: 00000000 EBP: 00000000 ESP: c0bf3f38
Mar 3 03:15:55 szonett kernel: [60332.218063] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Mar 3 03:15:55 szonett kernel: [60332.218113] Process ls (pid: 30735, ti=c0bf2000 task=c05a6e60 task.ti=c0bf2000)
Mar 3 03:15:55 szonett kernel: [60332.218177] Stack:
Mar 3 03:15:55 szonett kernel: [60332.218197] 00000000 c0bf3f90 c109f390 d0201594 de94c960 d47b9180 d0201594 dc78b180
Mar 3 03:15:55 szonett kernel: [60332.218297] 4d6ef9db c10a4842 00000000 c5c13e40 c0bf3f90 d0201594 c109f6c7 c109f390
Mar 3 03:15:55 szonett kernel: [60332.218396] d02015ac c5c13e40 00000000 00008000 00000000 c109f73e 080baf88 080baf60
Mar 3 03:15:55 szonett kernel: [60332.218497] Call Trace:
Mar 3 03:15:55 szonett kernel: [60332.218525] [<c109f390>] ? filldir64+0x0/0xf0
Mar 3 03:15:55 szonett kernel: [60332.218568] [<c10a4842>] ? touch_atime+0x102/0x150
Mar 3 03:15:55 szonett kernel: [60332.218614] [<c109f6c7>] ? vfs_readdir+0x97/0xa0
Mar 3 03:15:55 szonett kernel: [60332.218658] [<c109f390>] ? filldir64+0x0/0xf0
Mar 3 03:15:55 szonett kernel: [60332.218700] [<c109f73e>] ? sys_getdents64+0x6e/0xd0
Mar 3 03:15:55 szonett kernel: [60332.218748] [<c1002a30>] ? sysenter_do_call+0x12/0x26
Mar 3 03:15:55 szonett kernel: [60332.218794] Code: 83 c4 24 c3 a8 07 75 ec 8b 73 10 ba 2c fd 4a c1 8b 46 1c 8b 00 e8 ba 6f 0c 00 85 c0 0f 85 82 00 00 00 8b 83 ac 00 00 00 8b 40 38 <f6> 40 10 02 8d b0 a0 00 00 00 75 1a 8b 50 0c f6 c2 10 0f 84 b9
Mar 3 03:15:55 szonett kernel: [60332.219223] EIP: [<c10ae967>] __mark_inode_dirty+0xa7/0x1b0 SS:ESP 0068:c0bf3f38
Mar 3 03:15:55 szonett kernel: [60332.219300] CR2: 0000000000000010
Mar 3 03:15:55 szonett kernel: [60332.233190] ---[ end trace d3e609750be29482 ]---
Mar 3 03:17:01 szonett /USR/SBIN/CRON[30751]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Mar 3 03:17:09 szonett ntpd[1292]: Deleting interface #18 wlan, fe80::213:ceff:fe70:6e59#123, interface stats: received=0, sent=0, dropped=0, active_time=600 secs
Mar 3 03:17:19 szonett kernel: [60416.216959] SysRq : Emergency Sync
Mar 3 03:17:19 szonett kernel: [60416.217529] Emergency Sync complete
Mar 3 03:17:27 szonett acpid: client 1172[0:0] has disconnected
Mar 3 03:17:39 szonett acpid: client connected from 1172[0:0]
Mar 3 03:17:39 szonett acpid: 1 client rule loaded
Mar 3 03:18:31 szonett acpid: client 1172[0:0] has disconnected
Mar 3 03:19:07 szonett kernel: [60523.737188] SysRq : Show Blocked State
Mar 3 03:19:07 szonett kernel: [60523.738817] task PC stack pid father
Mar 3 03:19:07 szonett kernel: [60523.740470] bash D 1760c067 0 30789 30788 0x00000000
Mar 3 03:19:07 szonett kernel: [60523.741176] c66c3f4c 00200086 00000000 1760c067 cf2c34c8 0956b000 c107c385 00000000
Mar 3 03:19:07 szonett kernel: [60523.741176] 00000000 c66c3efc 00000000 c66c3f4c 00000004 c05a6120 c05a6120 d366f094
Mar 3 03:19:07 szonett kernel: [60523.741176] d366f094 0956b000 cf621a00 d366f094 cf2c34c8 c107d184 c36395ac d366f094
Mar 3 03:19:07 szonett kernel: [60523.741176] Call Trace:
Mar 3 03:19:07 szonett kernel: [60523.741176] [<c107c385>] ? handle_pte_fault+0x495/0x560
Mar 3 03:19:07 szonett kernel: [60523.741176] [<c107d184>] ? handle_mm_fault+0x84/0xb0
Mar 3 03:19:07 szonett kernel: [60523.741176] [<c101cf10>] ? do_page_fault+0x0/0x430
Mar 3 03:19:07 szonett kernel: [60523.741176] [<c101d08d>] ? do_page_fault+0x17d/0x430
Mar 3 03:19:07 szonett kernel: [60523.741176] [<c13de1d8>] ? __mutex_lock_killable_slowpath+0x58/0xb0
Mar 3 03:19:07 szonett kernel: [60523.741176] [<c109f686>] ? vfs_readdir+0x56/0xa0
Mar 3 03:19:07 szonett kernel: [60523.741176] [<c109f390>] ? filldir64+0x0/0xf0
Mar 3 03:19:07 szonett kernel: [60523.741176] [<c109f73e>] ? sys_getdents64+0x6e/0xd0
Mar 3 03:19:07 szonett kernel: [60523.741176] [<c1002a30>] ? sysenter_do_call+0x12/0x26
Mar 3 03:19:07 szonett kernel: [60523.741176] Sched Debug Version: v0.10, 2.6.38-rc6+ #38
Mar 3 03:19:07 szonett kernel: [60523.741176] ktime : 60523763.309469
Mar 3 03:19:07 szonett kernel: [60523.741176] sched_clk : 202463.085204
Mar 3 03:19:07 szonett kernel: [60523.741176] cpu_clk : 60523741.176770
Mar 3 03:19:07 szonett kernel: [60523.741176] jiffies : 15055934
Mar 3 03:19:07 szonett kernel: [60523.741176] sched_clock_stable : 0
Mar 3 03:19:07 szonett kernel: [60523.741176]
Mar 3 03:19:07 szonett kernel: [60523.741176] sysctl_sched
Mar 3 03:19:07 szonett kernel: [60523.741176] .sysctl_sched_latency : 6.000000
Mar 3 03:19:07 szonett kernel: [60523.741176] .sysctl_sched_min_granularity : 0.750000
Mar 3 03:19:07 szonett kernel: [60523.741176] .sysctl_sched_wakeup_granularity : 1.000000
Mar 3 03:19:07 szonett kernel: [60523.741176] .sysctl_sched_child_runs_first : 0
Mar 3 03:19:07 szonett kernel: [60523.741176] .sysctl_sched_features : 7279
Mar 3 03:19:07 szonett kernel: [60523.741176] .sysctl_sched_tunable_scaling : 1 (logaritmic)
Mar 3 03:19:07 szonett kernel: [60523.741176]
Mar 3 03:19:07 szonett kernel: [60523.741176] cpu#0, 1398.763 MHz
Mar 3 03:19:07 szonett kernel: [60523.741176] .nr_running : 0
Mar 3 03:19:07 szonett kernel: [60523.741176] .load : 0
Mar 3 03:19:07 szonett kernel: [60523.741176] .nr_switches : 8755477
Mar 3 03:19:07 szonett kernel: [60523.741176] .nr_load_updates : 1392836
Mar 3 03:19:07 szonett kernel: [60523.741176] .nr_uninterruptible : 1
Mar 3 03:19:07 szonett kernel: [60523.741176] .next_balance : 0.000000
Mar 3 03:19:07 szonett kernel: [60523.741176] .curr->pid : 0
Mar 3 03:19:07 szonett kernel: [60523.741176] .clock : 60523720.507334
Mar 3 03:19:07 szonett kernel: [60523.741176] .cpu_load[0] : 0
Mar 3 03:19:07 szonett kernel: [60523.741176] .cpu_load[1] : 0
Mar 3 03:19:07 szonett kernel: [60523.741176] .cpu_load[2] : 0
Mar 3 03:19:07 szonett kernel: [60523.741176] .cpu_load[3] : 5
Mar 3 03:19:07 szonett kernel: [60523.741176] .cpu_load[4] : 13
Mar 3 03:19:07 szonett kernel: [60523.741176]
Mar 3 03:19:07 szonett kernel: [60523.741176] cfs_rq[0]:
Mar 3 03:19:07 szonett kernel: [60523.741176] .exec_clock : 0.000000
Mar 3 03:19:07 szonett kernel: [60523.741176] .MIN_vruntime : 0.000001
Mar 3 03:19:07 szonett kernel: [60523.741176] .min_vruntime : 3658294.338690
Mar 3 03:19:07 szonett kernel: [60523.741176] .max_vruntime : 0.000001
Mar 3 03:19:07 szonett kernel: [60523.741176] .spread : 0.000000
Mar 3 03:19:07 szonett kernel: [60523.741176] .spread0 : 0.000000
Mar 3 03:19:07 szonett kernel: [60523.741176] .nr_spread_over : 0
Mar 3 03:19:07 szonett kernel: [60523.741176] .nr_running : 0
Mar 3 03:19:07 szonett kernel: [60523.741176] .load : 0
Mar 3 03:19:07 szonett kernel: [60523.741176]
Mar 3 03:19:07 szonett kernel: [60523.741176] rt_rq[0]:
Mar 3 03:19:07 szonett kernel: [60523.741176] .rt_nr_running : 0
Mar 3 03:19:07 szonett kernel: [60523.741176] .rt_throttled : 0
Mar 3 03:19:07 szonett kernel: [60523.741176] .rt_time : 0.011384
Mar 3 03:19:07 szonett kernel: [60523.741176] .rt_runtime : 950.000000
Mar 3 03:19:07 szonett kernel: [60523.741176]
Mar 3 03:19:07 szonett kernel: [60523.741176] runnable tasks:
Mar 3 03:19:07 szonett kernel: [60523.741176] task PID tree-key switches prio exec-runtime sum-exec sum-sleep
Mar 3 03:19:07 szonett kernel: [60523.741176] -----------------------------------------------------------------------------------------------------

Do you have a monitor attached to vga (I'm thinking about https://patchwork.kernel.org/patch/601271/, but it says 965gm being affected.. but still, going crazy on that vga cable might be a welcome change in your testing routine? :) ) Can you post lspci -v and attach your .config?

Created attachment 50112 [details]
lspci -v output
Created attachment 50122 [details]
kernel config
When the BUG fired, there was no monitor connected to the VGA output of this laptop. I seldom use an external display, so probably that kernel didn't see one in its whole life. The kernel config in attachment 50122 [details] differs slightly from the one exhibiting the bug (USB_SUSPEND was not enabled at that time), but that's probably all there is to it.
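For readers following the 2.6.38-rc6 oops quoted earlier in this thread, the reported fault address can be cross-checked against the register dump and the Code: line. The sketch below is an illustration only and was not posted by any participant: the helper names (to_signed32, faulting_bytes, decode_disp8_operand) are hypothetical, and the decoder handles just the [reg + disp8] operand form that occurs here. The marked bytes f6 40 10 02 encode a byte test on 0x10(%eax), and with EAX at 00000000 that gives the reported CR2 of 0x10. The same arithmetic shows that ffffffd0 from the first BUG is -0x30 as a signed 32-bit value, which would be consistent with a small negative offset from a NULL pointer, although nothing in the thread confirms that.

```python
# Illustrative sketch only: helper names are hypothetical and the decoder
# covers just the [reg + disp8] operand form seen in this oops.

REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Tail of the "Code:" line from the 2.6.38-rc6 oops; <f6> marks the
# first byte of the faulting instruction.
CODE_TAIL = "8b 83 ac 00 00 00 8b 40 38 <f6> 40 10 02 8d b0 a0 00 00 00"

# Register values and CR2 copied from the same oops.
REGISTERS = {"eax": 0x00000000, "ebx": 0xD0201594}
CR2 = 0x00000010


def to_signed32(value):
    """Interpret a 32-bit address as a signed offset."""
    return value - (1 << 32) if value & 0x80000000 else value


def faulting_bytes(code_line):
    """Return the opcode bytes starting at the <xx> marker."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:]]


def decode_disp8_operand(insn):
    """Return (base register, displacement) for a ModRM byte with mod == 01."""
    modrm = insn[1]
    mod, rm = modrm >> 6, modrm & 0x7
    if mod != 1 or rm == 4:  # rm == 4 would mean a SIB byte follows
        raise ValueError("only the [reg + disp8] form is handled here")
    disp = insn[2] - 0x100 if insn[2] & 0x80 else insn[2]
    return REGS32[rm], disp


insn = faulting_bytes(CODE_TAIL)
base, disp = decode_disp8_operand(insn)
fault_address = (REGISTERS[base] + disp) & 0xFFFFFFFF

print(f"operand: 0x{disp:x}(%{base}), computed address: 0x{fault_address:08x}")
print(f"matches CR2: {fault_address == CR2}")
print(f"ffffffd0 as signed 32-bit: {to_signed32(0xFFFFFFD0):#x}")
```

The two instructions before the marker (8b 83 ac 00 00 00 and 8b 40 38) are loads into EAX through EBX, so the NULL value was read from memory just before the test. Which structure field that corresponds to cannot be determined from the log alone.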
(In reply to comment #9)
> Which most likely is a red herring, because the hibernate core code hasn't
> changed recently _at_ _all_.

Still, I see many reports of hibernation causing memory (and filesystem) corruption, starting around 2.6.35 or so, maybe even earlier...
https://bugzilla.redhat.com/show_bug.cgi?id=658391
https://bugzilla.redhat.com/show_bug.cgi?id=603897
https://bugzilla.redhat.com/show_bug.cgi?id=669223
https://bugzilla.redhat.com/show_bug.cgi?id=678486

Ferenc, are you still seeing this in 2.6.38.y?

Not with my current minimal monolithic 2.6.38.3. I'm switching back to the Debian sid kernel (2.6.38-3) now. The 2.6.38-2 version (also based on 2.6.38.2) failed to resume for me in short order during previous testing, but that may have been a different issue. I couldn't pursue the problem at that time, but will take a close look now.
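Since the thread contains two crashes from different kernels, it may help to compare their call traces mechanically. The following is a minimal sketch, assuming the trace entries keep the "[<address>] ? symbol+0xoffset/0xsize" layout shown in the reports. The names FRAME_RE, parse_frames and shared_symbols are hypothetical helpers, not part of any kernel tool. A "?" marks an address found on the stack that the unwinder could not verify.

```python
import re

# Matches entries such as "[<c109f6c7>] ? vfs_readdir+0x97/0xa0".
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

# Call trace from the 2.6.37 report (updatedb crash).
TRACE_2_6_37 = """
[<c10ce628>] ? filldir64+0x0/0xcc
[<c10ce8d6>] ? vfs_readdir+0x65/0x8f
[<c10ce628>] ? filldir64+0x0/0xcc
[<c10ce966>] ? sys_getdents64+0x66/0xa5
[<c100309f>] ? sysenter_do_call+0x12/0x28
"""

# Call trace from the 2.6.38-rc6 report (ls on the removed pendrive).
TRACE_2_6_38_RC6 = """
[<c109f390>] ? filldir64+0x0/0xf0
[<c10a4842>] ? touch_atime+0x102/0x150
[<c109f6c7>] ? vfs_readdir+0x97/0xa0
[<c109f390>] ? filldir64+0x0/0xf0
[<c109f73e>] ? sys_getdents64+0x6e/0xd0
[<c1002a30>] ? sysenter_do_call+0x12/0x26
"""


def parse_frames(text):
    """Extract every trace entry found in text as a dict."""
    frames = []
    for match in FRAME_RE.finditer(text):
        frames.append({
            "addr": int(match.group("addr"), 16),
            "symbol": match.group("symbol"),
            "offset": int(match.group("offset"), 16),
            "size": int(match.group("size"), 16),
            "unreliable": match.group("guess") is not None,
        })
    return frames


def shared_symbols(first, second):
    """Symbols present in both traces, in the order they appear in second."""
    in_first = {frame["symbol"] for frame in first}
    shared = []
    for frame in second:
        name = frame["symbol"]
        if name in in_first and name not in shared:
            shared.append(name)
    return shared


old_frames = parse_frames(TRACE_2_6_37)
new_frames = parse_frames(TRACE_2_6_38_RC6)
print(shared_symbols(old_frames, new_frames))
```

Both crashes pass through the directory-reading path (sys_getdents64, vfs_readdir), but they fault in different functions, so a shared path on its own does not show that they have the same cause.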