Bug 17281

Summary: [lxc] gdb cannot debug bash properly in an LXC container
Product: Process Management
Component: Other
Status: CLOSED CODE_FIX
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.35.4
Subsystem:
Regression: Yes
Bisected commit-id:
Reporter: Robin Green (greenrd)
Assignee: process_other
CC: akpm, florian, fweisbec, lizf, menage

Description Robin Green 2010-08-29 09:35:47 UTC
(I couldn't find a kernel bugzilla component for LXC bugs, so I picked something that sounded vaguely related.)

gdb is unable to debug bash once the command "ls" is typed in, but only inside an LXC container (outside the container, it works). The error message from gdb is:

Couldn't write debug register: No such process

Steps to Reproduce:
1. Set up an LXC chroot containing gdb and bash (on Fedora Linux, this can be done fairly easily by installing and configuring mach and then using "mach yum install gdb bash". mach also supports apt-get.) I also installed the separate -debuginfo packages inside the container, but I don't think this matters.
2. I use the libvirt command-line tools to actually start the LXC container, because I couldn't manage to get the LXC userspace tools to create a container. Define the LXC chroot in virsh, e.g. virsh define foo.xml for some libvirt configuration file foo.xml.
3. virsh --connect lxc:/// start foo && virsh --connect lxc:/// console foo
4. export PATH=/usr/bin:$PATH
5. gdb /bin/bash
6. Type 'r' to run bash
7. Inside that bash process, type 'ls' and hit the enter key

At this point gdb says:
Couldn't write debug register: No such process

FYI, I get a slightly different error message if I use the Fedora kernel instead of the mainline kernel:
Waiting for child: no such process

Regarding the regression flag: I am *guessing* this is a regression, because the very similar bug that I just mentioned in the Fedora kernel tree is definitely a regression. I plan to test older and newer mainline kernel versions at a later date.
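
Step 2 above does not show the contents of foo.xml. As a rough illustration only, a minimal libvirt LXC domain definition might look like the sketch below. The container name "foo" comes from the report; the chroot path, memory size and /bin/sh init are placeholders and not the reporter's actual configuration, and a real setup may need additional elements.

```shell
#!/bin/sh
# Sketch: write a minimal libvirt LXC domain definition for step 2.
# ROOTFS is a hypothetical placeholder for the chroot that mach created.
ROOTFS=${ROOTFS:-/path/to/chroot}
XML=${XML:-foo.xml}

# Unquoted heredoc delimiter so that $ROOTFS is expanded into the file.
cat > "$XML" <<EOF
<domain type='lxc'>
  <name>foo</name>
  <memory>500000</memory>
  <os>
    <type>exe</type>
    <init>/bin/sh</init>
  </os>
  <devices>
    <filesystem type='mount'>
      <source dir='$ROOTFS'/>
      <target dir='/'/>
    </filesystem>
    <console type='pty'/>
  </devices>
</domain>
EOF

# Then, on a host with libvirt's LXC driver (assuming the same connection
# URI as in step 3):
#   virsh --connect lxc:/// define "$XML"
```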
Comment 1 Robin Green 2010-08-30 08:44:01 UTC
Confirmed as regression - bug not present in 2.6.31.12.

Bug still present in 2.6.35.4.
Comment 2 Robin Green 2010-08-30 08:57:29 UTC
OK, I will now use git bisect to try and find the patch responsible.
Comment 3 Robin Green 2010-09-01 09:31:32 UTC
According to git bisect, this bug was introduced by commit:

24f1e32c60c45c89a997c73395b69c8af6f0a84e

see http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=24f1e32c60c45c89a997c73395b69c8af6f0a84e
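
For anyone repeating the bisection from comments 1 to 3, the session could be sketched roughly as follows. The tags are assumptions: the report tested 2.6.31.12 (good) and 2.6.35.4 (bad), which are stable releases, so the sketch uses the mainline releases they are based on and assumes the bug behaves the same there. The function name bisect_start is made up for illustration.

```shell
#!/bin/sh
# Sketch of a bisection between a known-good and a known-bad kernel.
# GOOD and BAD are assumptions standing in for 2.6.31.12 and 2.6.35.4.
GOOD=${GOOD:-v2.6.31}
BAD=${BAD:-v2.6.35}

# Hypothetical helper; run it from the top of a Linux kernel git checkout.
bisect_start() {
    git bisect start
    git bisect bad "$BAD"
    git bisect good "$GOOD"
}

# After each build-and-boot cycle, repeat the gdb steps inside the
# container and record the outcome:
#   git bisect good   # 'ls' ran normally under gdb
#   git bisect bad    # "Couldn't write debug register: No such process"
# When git reports the first bad commit, clean up with:
#   git bisect reset
```

The helper is only defined, not called, so sourcing the script outside a kernel tree has no side effects.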
Comment 4 Frederic Weisbecker 2010-09-08 21:02:44 UTC
Thanks for your report.

This might be caused by the breakpoint code handling pid namespaces
badly. I'll try to get that fixed soon, once I manage to get
that lxc environment running...
Comment 5 Andrew Morton 2010-09-23 22:39:41 UTC
ping?
Comment 6 Frederic Weisbecker 2010-09-23 22:47:55 UTC
(In reply to comment #5)
> ping?

The fix is now upstream:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=068e35eee9ef98eb4cab55181977e24995d273be

It is tagged for backport as well. I guess the ticket can be closed now.