Bug 7358 - Core dumps truncated in 2.6.19-rc1.
Summary: Core dumps truncated in 2.6.19-rc1.
Status: RESOLVED CODE_FIX
Alias: None
Product: Process Management
Classification: Unclassified
Component: Other (show other bugs)
Hardware: i386 Linux
: P2 normal
Assignee: process_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-10-12 15:03 UTC by Daniel Jacobowitz
Modified: 2006-10-12 20:54 UTC (History)
0 users

See Also:
Kernel Version: 2.6.19-rc1
Subsystem:
Regression: ---
Bisected commit-id:


Attachments

Description Daniel Jacobowitz 2006-10-12 15:03:47 UTC
Most recent kernel where this bug did not occur: 2.6.18-rc4
Distribution: Debian
Hardware Environment: x86_64 SMP (Athlon64 X2)
Problem Description:

Ever since I booted this new kernel, my core dumps have been useless.  They come
out some random size usually between 7K and 20K; I think that all the PT_NOTE
segments are getting written, but absolutely none of the PT_LOAD segments.

(I could reproduce this in qemu too, but I couldn't get qemu to boot a
debuggable kernel, so I haven't had much luck tracking down what's changed.)

The kernel was built from a git tree current as of October 7th.  I don't have
the exact revision any more.
Comment 1 Andrew Morton 2006-10-12 15:35:18 UTC
On Thu, 12 Oct 2006 15:15:53 -0700
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=7358
> 
>            Summary: Core dumps truncated in 2.6.19-rc1.
>     Kernel Version: 2.6.19-rc1
>             Status: NEW
>           Severity: normal
>              Owner: process_other@kernel-bugs.osdl.org
>          Submitter: drow@false.org
> 
> 
> Most recent kernel where this bug did not occur: 2.6.18-rc4
> Distribution: Debian
> Hardware Environment: x86_64 SMP (Athlon64 X2)
> Problem Description:
> 
> Ever since I booted this new kernel, my core dumps have been useless.  They come
> out some random size usually between 7K and 20K; I think that all the PT_NOTE
> segments are getting written, but absolutely none of the PT_LOAD segments.
> 
> (I could reproduce this in qemu too, but I couldn't get qemu to boot a
> debuggable kernel, so I haven't had much luck tracking down what's changed.)
> 
> The kernel was built from a git tree current as of October 7th.  I don't have
> the exact revision any more.
> 

hrm, I'd be suspecting Andi's games with DUMP_SEEK/dump_seek have
gone awry.

Does reverting the below fix it?



Begin forwarded message:

Date: Sun, 1 Oct 2006 08:02:34 GMT
From: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
To: git-commits-head@vger.kernel.org
Subject: [PATCH] Support piping into commands in /proc/sys/kernel/core_pattern


commit d025c9db7f31fc0554ce7fb2dfc78d35a77f3487
tree 5da0a10cbc4b1a5cd5f04d7af2df334352df3728
parent e239ca540594cff00adcce163dc332b27015d8e5
author Andi Kleen <ak@suse.de> 1159684168 -0700
committer Linus Torvalds <torvalds@g5.osdl.org> 1159688373 -0700

[PATCH] Support piping into commands in /proc/sys/kernel/core_pattern

Using the infrastructure created in previous patches implement support to
pipe core dumps into programs.

This is done by overloading the existing core_pattern sysctl
with a new syntax:

|program

When the first character of the pattern is a '|' the kernel will instead
threat the rest of the pattern as a command to run.  The core dump will be
written to the standard input of that program instead of to a file.

This is useful for having automatic core dump analysis without filling up
disks.  The program can do some simple analysis and save only a summary of
the core dump.

The core dump proces will run with the privileges and in the name space of
the process that caused the core dump.

I also increased the core pattern size to 128 bytes so that longer command
lines fit.

Most of the changes comes from allowing core dumps without seeks.  They are
fairly straight forward though.

One small incompatibility is that if someone had a core pattern previously
that started with '|' they will get suddenly new behaviour.  I think that's
unlikely to be a real problem though.

Additional background:

> Very nice, do you happen to have a program that can accept this kind of
> input for crash dumps?  I'm guessing that the embedded people will
> really want this functionality.

I had a cheesy demo/prototype.  Basically it wrote the dump to a file again,
ran gdb on it to get a backtrace and wrote the summary to a shared directory.
Then there was a simple CGI script to generate a "top 10" crashes HTML
listing.

Unfortunately this still had the disadvantage to needing full disk space for a
dump except for deleting it afterwards (in fact it was worse because over the
pipe holes didn't work so if you have a holey address map it would require
more space).

Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as
cores (at least it worked with zsh's =(cat core) syntax), so it would be
likely possible to do it without temporary space with a simple wrapper that
calls it in the right way.  I ran out of time before doing that though.

The demo prototype scripts weren't very good.  If there is really interest I
can dig them out (they are currently on a laptop disk on the desk with the
laptop itself being in service), but I would recommend to rewrite them for any
serious application of this and fix the disk space problem.

Also to be really useful it should probably find a way to automatically fetch
the debuginfos (I cheated and just installed them in advance).  If nobody else
does it I can probably do the rewrite myself again at some point.

My hope at some point was that desktops would support it in their builtin
crash reporters, but at least the KDE people I talked too seemed to be happy
with their user space only solution.

Alan sayeth:

  I don't believe that piping as such as neccessarily the right model, but
  the ability to intercept and processes core dumps from user space is asked
  for by many enterprise users as well.  They want to know about, capture,
  analyse and process core dumps, often centrally and in automated form.

[akpm@osdl.org: loff_t != unsigned long]
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

 fs/binfmt_elf.c |   77 ++++++++++++++++++++++++++++++++------------------------
 fs/exec.c       |   23 +++++++++++++---
 kernel/kmod.c   |    4 ++
 kernel/sysctl.c |    2 -
 4 files changed, 68 insertions(+), 38 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index bad5243..06435f3 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1151,11 +1151,23 @@ static int dump_write(struct file *file,
 
 static int dump_seek(struct file *file, loff_t off)
 {
-	if (file->f_op->llseek) {
-		if (file->f_op->llseek(file, off, 0) != off)
+	if (file->f_op->llseek && file->f_op->llseek != no_llseek) {
+		if (file->f_op->llseek(file, off, 1) != off)
 			return 0;
-	} else
-		file->f_pos = off;
+	} else {
+		char *buf = (char *)get_zeroed_page(GFP_KERNEL);
+		if (!buf)
+			return 0;
+		while (off > 0) {
+			unsigned long n = off;
+			if (n > PAGE_SIZE)
+				n = PAGE_SIZE;
+			if (!dump_write(file, buf, n))
+				return 0;
+			off -= n;
+		}
+		free_page((unsigned long)buf);
+	}
 	return 1;
 }
 
@@ -1203,30 +1215,35 @@ static int notesize(struct memelfnote *e
 	return sz;
 }
 
-#define DUMP_WRITE(addr, nr)	\
-	do { if (!dump_write(file, (addr), (nr))) return 0; } while(0)
-#define DUMP_SEEK(off)	\
-	do { if (!dump_seek(file, (off))) return 0; } while(0)
+#define DUMP_WRITE(addr, nr, foffset)	\
+	do { if (!dump_write(file, (addr), (nr))) return 0; *foffset += (nr); } while(0)
 
-static int writenote(struct memelfnote *men, struct file *file)
+static int alignfile(struct file *file, loff_t *foffset)
 {
-	struct elf_note en;
+	char buf[4] = { 0, };
+	DUMP_WRITE(buf, roundup(*foffset, 4) - *foffset, foffset);
+	return 1;
+}
 
+static int writenote(struct memelfnote *men, struct file *file,
+			loff_t *foffset)
+{
+	struct elf_note en;
 	en.n_namesz = strlen(men->name) + 1;
 	en.n_descsz = men->datasz;
 	en.n_type = men->type;
 
-	DUMP_WRITE(&en, sizeof(en));
-	DUMP_WRITE(men->name, en.n_namesz);
-	/* XXX - cast from long long to long to avoid need for libgcc.a */
-	DUMP_SEEK(roundup((unsigned long)file->f_pos, 4));	/* XXX */
-	DUMP_WRITE(men->data, men->datasz);
-	DUMP_SEEK(roundup((unsigned long)file->f_pos, 4));	/* XXX */
+	DUMP_WRITE(&en, sizeof(en), foffset);
+	DUMP_WRITE(men->name, en.n_namesz, foffset);
+	if (!alignfile(file, foffset))
+		return 0;
+	DUMP_WRITE(men->data, men->datasz, foffset);
+	if (!alignfile(file, foffset))
+		return 0;
 
 	return 1;
 }
 #undef DUMP_WRITE
-#undef DUMP_SEEK
 
 #define DUMP_WRITE(addr, nr)	\
 	if ((size += (nr)) > limit || !dump_write(file, (addr), (nr))) \
@@ -1426,7 +1443,7 @@ #define	NUM_NOTES	6
 	int i;
 	struct vm_area_struct *vma;
 	struct elfhdr *elf = NULL;
-	loff_t offset = 0, dataoff;
+	loff_t offset = 0, dataoff, foffset;
 	unsigned long limit = current->signal->rlim[RLIMIT_CORE].rlim_cur;
 	int numnote;
 	struct memelfnote *notes = NULL;
@@ -1569,7 +1586,8 @@ #endif	
 		DUMP_WRITE(&phdr, sizeof(phdr));
 	}
 
-	/* Page-align dumped data */
+	foffset = offset;
+
 	dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);
 
 	/* Write program headers for segments dump */
@@ -1594,6 +1612,7 @@ #endif	
 		phdr.p_align = ELF_EXEC_PAGESIZE;
 
 		DUMP_WRITE(&phdr, sizeof(phdr));
+		foffset += sizeof(phdr);
 	}
 
 #ifdef ELF_CORE_WRITE_EXTRA_PHDRS
@@ -1602,7 +1621,7 @@ #endif
 
  	/* write out the notes section */
 	for (i = 0; i < numnote; i++)
-		if (!writenote(notes + i, file))
+		if (!writenote(notes + i, file, &foffset))
 			goto end_coredump;
 
 	/* write out the thread status notes section */
@@ -1611,11 +1630,12 @@ #endif
 				list_entry(t, struct elf_thread_status, list);
 
 		for (i = 0; i < tmp->num_notes; i++)
-			if (!writenote(&tmp->notes[i], file))
+			if (!writenote(&tmp->notes[i], file, &foffset))
 				goto end_coredump;
 	}
- 
-	DUMP_SEEK(dataoff);
+
+	/* Align to page */
+	DUMP_SEEK(dataoff - foffset);
 
 	for (vma = current->mm->mmap; vma != NULL; vma = vma->vm_next) {
 		unsigned long addr;
@@ -1631,10 +1651,10 @@ #endif
 
 			if (get_user_pages(current, current->mm, addr, 1, 0, 1,
 						&page, &vma) <= 0) {
-				DUMP_SEEK(file->f_pos + PAGE_SIZE);
+				DUMP_SEEK(PAGE_SIZE);
 			} else {
 				if (page == ZERO_PAGE(addr)) {
-					DUMP_SEEK(file->f_pos + PAGE_SIZE);
+					DUMP_SEEK(PAGE_SIZE);
 				} else {
 					void *kaddr;
 					flush_cache_page(vma, addr,
@@ -1658,13 +1678,6 @@ #ifdef ELF_CORE_WRITE_EXTRA_DATA
 	ELF_CORE_WRITE_EXTRA_DATA;
 #endif
 
-	if (file->f_pos != offset) {
-		/* Sanity check */
-		printk(KERN_WARNING
-		       "elf_core_dump: file->f_pos (%Ld) != offset (%Ld)\n",
-		       file->f_pos, offset);
-	}
-
 end_coredump:
 	set_fs(fs);
 
diff --git a/fs/exec.c b/fs/exec.c
index 0db3fc3..6270f8f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -58,7 +58,7 @@ #include <linux/kmod.h>
 #endif
 
 int core_uses_pid;
-char core_pattern[65] = "core";
+char core_pattern[128] = "core";
 int suid_dumpable = 0;
 
 EXPORT_SYMBOL(suid_dumpable);
@@ -1463,6 +1463,7 @@ int do_coredump(long signr, int exit_cod
 	int retval = 0;
 	int fsuid = current->fsuid;
 	int flag = 0;
+	int ispipe = 0;
 
 	binfmt = current->binfmt;
 	if (!binfmt || !binfmt->core_dump)
@@ -1504,22 +1505,34 @@ int do_coredump(long signr, int exit_cod
  	lock_kernel();
 	format_corename(corename, core_pattern, signr);
 	unlock_kernel();
-	file = filp_open(corename, O_CREAT | 2 | O_NOFOLLOW | O_LARGEFILE | flag, 0600);
+ 	if (corename[0] == '|') {
+		/* SIGPIPE can happen, but it's just never processed */
+ 		if(call_usermodehelper_pipe(corename+1, NULL, NULL, &file)) {
+ 			printk(KERN_INFO "Core dump to %s pipe failed\n",
+			       corename);
+ 			goto fail_unlock;
+ 		}
+		ispipe = 1;
+ 	} else
+ 		file = filp_open(corename,
+				 O_CREAT | 2 | O_NOFOLLOW | O_LARGEFILE, 0600);
 	if (IS_ERR(file))
 		goto fail_unlock;
 	inode = file->f_dentry->d_inode;
 	if (inode->i_nlink > 1)
 		goto close_fail;	/* multiple links - don't dump */
-	if (d_unhashed(file->f_dentry))
+	if (!ispipe && d_unhashed(file->f_dentry))
 		goto close_fail;
 
-	if (!S_ISREG(inode->i_mode))
+	/* AK: actually i see no reason to not allow this for named pipes etc.,
+	   but keep the previous behaviour for now. */
+	if (!ispipe && !S_ISREG(inode->i_mode))
 		goto close_fail;
 	if (!file->f_op)
 		goto close_fail;
 	if (!file->f_op->write)
 		goto close_fail;
-	if (do_truncate(file->f_dentry, 0, 0, file) != 0)
+	if (!ispipe && do_truncate(file->f_dentry, 0, 0, file) != 0)
 		goto close_fail;
 
 	retval = binfmt->core_dump(signr, regs, file);
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 5c63c53..f8121b9 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -35,6 +35,7 @@ #include <linux/security.h>
 #include <linux/mount.h>
 #include <linux/kernel.h>
 #include <linux/init.h>
+#include <linux/resource.h>
 #include <asm/uaccess.h>
 
 extern int max_threads;
@@ -158,6 +159,9 @@ static int ____call_usermodehelper(void 
 		FD_SET(0, fdt->open_fds);
 		FD_CLR(0, fdt->close_on_exec);
 		spin_unlock(&f->file_lock);
+
+		/* and disallow core files too */
+		current->signal->rlim[RLIMIT_CORE] = (struct rlimit){0, 0};
 	}
 
 	/* We can run anywhere, unlike our parent keventd(). */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c57c453..ba42694 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -294,7 +294,7 @@ static ctl_table kern_table[] = {
 		.ctl_name	= KERN_CORE_PATTERN,
 		.procname	= "core_pattern",
 		.data		= core_pattern,
-		.maxlen		= 64,
+		.maxlen		= 128,
 		.mode		= 0644,
 		.proc_handler	= &proc_dostring,
 		.strategy	= &sysctl_string,
-
To unsubscribe from this list: send the line "unsubscribe git-commits-head" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comment 2 Daniel Jacobowitz 2006-10-12 17:30:04 UTC
On Thu, Oct 12, 2006 at 03:46:56PM -0700, Andrew Morton wrote:
> hrm, I'd be suspecting Andi's games with DUMP_SEEK/dump_seek have
> gone awry.
> 
> Does reverting the below fix it?

Well, I can only test in qemu right now, but an ash core dump goes from
2K to 1.7MB when I reverse this patch.  I'd say it's the problem.

> Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as
> cores (at least it worked with zsh's =(cat core) syntax), so it would be
> likely possible to do it without temporary space with a simple wrapper that
> calls it in the right way.  I ran out of time before doing that though.

I admit I'm very surprised; GDB will definitely seek in the core file.

I'm just guessing, but:

> @@ -1569,7 +1586,8 @@ #endif	
>  		DUMP_WRITE(&phdr, sizeof(phdr));
>  	}
>  
> -	/* Page-align dumped data */
> +	foffset = offset;
> +
>  	dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);
>  
>  	/* Write program headers for segments dump */

At this point foffset is wrong.  offset already includes space for all
the program headers, but only one has been written.

> @@ -1594,6 +1612,7 @@ #endif	
>  		phdr.p_align = ELF_EXEC_PAGESIZE;
>  
>  		DUMP_WRITE(&phdr, sizeof(phdr));
> +		foffset += sizeof(phdr);
>  	}
>  
>  #ifdef ELF_CORE_WRITE_EXTRA_PHDRS

So this bit counts every phdr twice.

This seems terribly fragile.  Can't we use f_pos instead?

Comment 3 Andrew Morton 2006-10-12 17:43:23 UTC
On Thu, 12 Oct 2006 20:41:43 -0400
Daniel Jacobowitz <drow@false.org> wrote:

> On Thu, Oct 12, 2006 at 03:46:56PM -0700, Andrew Morton wrote:
> > hrm, I'd be suspecting Andi's games with DUMP_SEEK/dump_seek have
> > gone awry.
> > 
> > Does reverting the below fix it?
> 
> Well, I can only test in qemu right now, but an ash core dump goes from
> 2K to 1.7MB when I reverse this patch.  I'd say it's the problem.
> 
> > Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as
> > cores (at least it worked with zsh's =(cat core) syntax), so it would be
> > likely possible to do it without temporary space with a simple wrapper that
> > calls it in the right way.  I ran out of time before doing that though.
> 
> I admit I'm very surprised; GDB will definitely seek in the core file.
> 
> I'm just guessing, but:
> 
> > @@ -1569,7 +1586,8 @@ #endif	
> >  		DUMP_WRITE(&phdr, sizeof(phdr));
> >  	}
> >  
> > -	/* Page-align dumped data */
> > +	foffset = offset;
> > +
> >  	dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);
> >  
> >  	/* Write program headers for segments dump */
> 
> At this point foffset is wrong.  offset already includes space for all
> the program headers, but only one has been written.
> 
> > @@ -1594,6 +1612,7 @@ #endif	
> >  		phdr.p_align = ELF_EXEC_PAGESIZE;
> >  
> >  		DUMP_WRITE(&phdr, sizeof(phdr));
> > +		foffset += sizeof(phdr);
> >  	}
> >  
> >  #ifdef ELF_CORE_WRITE_EXTRA_PHDRS
> 
> So this bit counts every phdr twice.
> 
> This seems terribly fragile.  Can't we use f_pos instead?

You'd think that somebody would have noticed that coredumps are busted by
now..

Andi seems to be offline for a bit and I'm not sure that we want to tinker
with this too much.  I'll queue up a revert patch, will send it in a week
or so if nothing magical happens.


Comment 4 Andrew Morton 2006-10-12 20:54:21 UTC
Petr sent a fix.

Note You need to log in before you can comment on or make changes to this bug.