Bug 10041

Summary: 2.6.25-rc1/2 regression: first-time login into gnome fails
Product: Other Reporter: Rafael J. Wysocki (rjw)
Component: Other Assignee: other_other
Status: CLOSED WILL_NOT_FIX    
Severity: normal CC: romano.giannetti
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.25-rc1 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 9832    
Attachments: starce of gnome-setting-manager
strace of gnome-settings-manager for 2.6.24.2
strace of gnome-settings-manager for a successfull login
strace of gnome-settings-manager hand-started after a "failed" login

Description Rafael J. Wysocki 2008-02-18 14:34:05 UTC
Subject         : 2.6.25-rc1/2 regression: first-time login into gnome fails
Submitter       : Romano Giannetti <romanol@upcomillas.es>
Date            : 2008-02-18 11:56
References      : http://lkml.org/lkml/2008/2/18/145
Handled-By      : "Ray Lee" <ray-lk@madrabbit.org>

This entry is being used for tracking a regression from 2.6.24.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Romano Giannetti 2008-02-19 00:53:41 UTC
Hi,

        I have a very strange, but fully reproducible, regression with
2.6.25-rc1 and -rc2. I am running a fully updated Ubuntu 7.10.

        The first time after boot, when I login to gnome (through gdm) 
the login half-fails with a Setting Daemon error: failed to connect to
socket /tmp/dbus-<some random stuff>: connection refused. Nothing in the
logs, and there is no such socket in /tmp. 

If I log out and then log in again, all works ok. 

With 2.6.24.2 there is no such problem.

.config, dmesg (2M buffer size) and syslog here: 

http://www.dea.icai.upcomillas.es/romano/linux/info/


        Romano 
Comment 2 Romano Giannetti 2008-02-19 01:08:31 UTC
As suggested by Ray, I waited 5 minutes at the login screen before logging in, and the problem doesn't happen.
I will try again; I am pulling now and recompiling...
Comment 3 Romano Giannetti 2008-02-19 06:33:08 UTC
Hmm. I give up. It seems it happens only sometimes, and is not clearly related to how long I wait before logging in.
Comment 4 Romano Giannetti 2008-02-27 00:44:01 UTC
Confirmed with 2.6.25-rc3. I think it is gconfd or something similar that fails on a socket, probably a Unix one. I ran git log v2.6.24.. net/unix, but I cannot tell whether any change there could be the culprit.

To give more details: my Ubuntu install normally has a long delay when logging in to GNOME. On kernel 2.6.24.2 the login eventually succeeded, after a 20 to 30 second delay. Now gconfd fails.

Logging out and in again eventually gives me a working GNOME session (sometimes I have to repeat it).

If anyone can suggest how to debug it, please tell me.

Romano 

 
Comment 5 Romano Giannetti 2008-03-05 02:08:30 UTC
Tested with -rc4. Same problem. But now I have some more info. Immediately after the failure, I ran "strace gnome-setting-manager" and the first error was the same one shown in the dialog telling me that gnome-setting-manager failed (grep for "failed" in the attached file).

Anyone, any hints? Maybe I should raise this on lkml? This is very reproducible, although a bit random.
Comment 6 Romano Giannetti 2008-03-05 02:10:56 UTC
Created attachment 15149 [details]
starce of gnome-setting-manager

grep for "failed" in the file; the error that causes the failing login was identical to the first one:

write(2, "\n** (gnome-settings-daemon:7750)"..., 142
** (gnome-settings-daemon:7750): WARNING **: Unable to connect to dbus: Failed to connect to socket /tmp/dbus-Pyo1eSHdSA: Connection refused
) = 142
Comment 7 Romano Giannetti 2008-03-06 01:51:20 UTC
The failing sequences are: 

socket(PF_FILE, SOCK_STREAM, 0)         = 13
connect(13, {sa_family=AF_FILE, path=@/tmp/dbus-Pyo1eSHdSA}, 23) = -1 ECONNREFUSED (Connection refused)

socket(PF_FILE, SOCK_STREAM, 0)         = 15
connect(15, {sa_family=AF_FILE, path=@/tmp/dbus-Pyo1eSHdSA}, 23) = -1 ECONNREFUSED (Connection refused)

Did something change in the AF_FILE socket family?
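A note for readers decoding the strace lines above: the `@` that strace prints in front of the path marks an address in the Linux abstract socket namespace, which would explain why no socket file shows up in /tmp. The following is a minimal sketch, assuming Linux and Python 3, of how connecting to an abstract address with no listener yields ECONNREFUSED. The socket name and the helper functions are made up for illustration and are not part of dbus or GNOME.

```python
import errno
import os
import socket


def abstract_addr(name):
    # A leading NUL byte selects the Linux abstract namespace (no file on disk).
    return b"\0" + name.encode()


def sockaddr_len(name):
    # 2 bytes of sa_family, then the NUL marker, then the name itself.
    return 2 + len(abstract_addr(name))


def try_connect(name):
    """Return 0 on success, otherwise the errno raised by connect()."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        try:
            sock.connect(abstract_addr(name))
        except OSError as exc:
            return exc.errno
    return 0


# Hypothetical name, made unique per process so that nothing listens on it.
demo_name = "/tmp/dbus-demo-%d" % os.getpid()
result = try_connect(demo_name)
print(errno.errorcode.get(result, "connected"))

# Address length for the name seen in the strace output above.
print(sockaddr_len("/tmp/dbus-Pyo1eSHdSA"))  # 23, matching the connect() call
```

The computed length of 23 agrees with the third argument of the connect() calls in the trace, which is consistent with the address being abstract rather than a filesystem path.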
Comment 8 Romano Giannetti 2008-03-11 02:06:02 UTC
Created attachment 15210 [details]
strace of gnome-settings-manager for 2.6.24.2
Comment 9 Romano Giannetti 2008-03-11 02:07:31 UTC
Created attachment 15211 [details]
strace of gnome-settings-manager for a successfull login
Comment 10 Romano Giannetti 2008-03-11 02:08:22 UTC
Created attachment 15212 [details]
strace of gnome-settings-manager hand-started after a "failed" login
Comment 11 Romano Giannetti 2008-03-11 02:52:00 UTC
I now think that the problem is not gnome-settings-daemon failing. I have replaced gnome-settings-daemon with a script that does a

exec strace -ff -tt -o /tmp/g-s-d-`uname -r` 

and I discovered that when the login fails(*) the script isn't even started. I had to start it by hand after getting the prompt. Nevertheless, I uploaded all the logs of a 2.6.24.2 session, a successful -rc4 session, and a failed one.
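For anyone wanting to repeat this instrumentation, here is a rough sketch of such a wrapper in Python. It only assumes what the comment states (strace with -ff -tt and an output prefix built from the kernel release); the path of the renamed real binary and the function names are hypothetical and must be adapted.

```python
import os
import platform

# Hypothetical location of the renamed real binary; not taken from the report.
REAL_DAEMON = "/usr/lib/gnome-settings-daemon.real"


def strace_argv(real_daemon, extra_args):
    # platform.release() returns the same string as `uname -r`.
    output_prefix = "/tmp/g-s-d-" + platform.release()
    return ["strace", "-ff", "-tt", "-o", output_prefix, real_daemon] + list(extra_args)


def run_wrapped(extra_args):
    # Replaces the current process, like `exec` in a shell script.
    argv = strace_argv(REAL_DAEMON, extra_args)
    os.execvp(argv[0], argv)


# Dry run: show the command line without executing it.
print(strace_argv(REAL_DAEMON, []))
```

Installed in place of the daemon, the wrapper would call run_wrapped() with the arguments it received; with -ff, strace writes one output file per traced process, each named with the prefix plus the process ID.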

So, the failing bit is *before* gnome-settings-daemon is even started. Is there an expert on GNOME startup who can advise me on what to instrument next?

Thanks


(*) What happens exactly is that a few seconds after I enter the password, a dialog appears with the "Unable to connect to dbus: Failed to connect to socket /tmp/dbus-wDIKYwkD9J: Connection refused" message. Then login continues, and I enter GNOME, but without any personalized settings (obviously, given that gnome-settings-daemon isn't running). That is what I call a half-failed login.
Note that with 2.6.24.2 this never happens.
Comment 12 Romano Giannetti 2008-03-17 01:38:25 UTC
Happened again with rc6, after several successful logins.
One note: when I get the error and gnome-settings-daemon fails to run, the logout/exit GNOME panel shows neither the suspend nor the hibernate icon.
Could it be ACPI-related?
Comment 13 Rafael J. Wysocki 2008-03-17 12:38:55 UTC
Might be, but it looks very transient right now, like a race somewhere that's hard to trigger.
Comment 14 Romano Giannetti 2008-03-19 01:39:49 UTC
Christoph Hellwig posted a "me too" on the Linux kernel mailing list:

http://lkml.org/lkml/2008/3/17/118

but now it seems that lkml is down (no messages in a loooong while). 
Comment 15 Romano Giannetti 2008-03-23 13:26:48 UTC
It seems that

apt-get install dbus-x11 

solves (hides?) the problem.
Comment 16 Rafael J. Wysocki 2008-03-26 15:09:06 UTC
Ray Lee said:

This appears to be a race in user space that people have been hitting
for some time, but has gotten more likely with the latest kernel. It
matches the behavior of a gnome dbus bug [
http://bugzilla.gnome.org/show_bug.cgi?id=395488 ]. The way to avoid
that bug is to install the dbus-x11 package which forces dbus to start
up earlier, avoiding the race.

Though the original reporter hasn't replied, Christoph Hellwig hit the
same issue and was able to confirm that installing the dbus-x11
package avoids the issue, papering over the bug, wherever it may lie.
Regardless, there are reports of this issue that go back before
2.6.25, so I don't think this is a regression, just a timing issue
that's a lot easier to hit with the latest kernel.

This is one of those things that'd be nice to have in a Known Issues
document in the kernel release. "Debian and derivative distributions
may need to install the dbus-x11 package to avoid a known userspace issue
in the dbus package. [gnome bug 395488]"
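
The kind of ordering race described above can be illustrated with a small sketch, assuming Linux and Python 3. It is not the actual dbus or GNOME startup code: the socket name is made up, and the "daemon" is just a listening socket in the same process. A client that connects before the listener exists is refused; the same call succeeds once the listener is up, which is the effect attributed to starting dbus earlier.

```python
import os
import socket

# Hypothetical abstract-namespace address, unique per process.
ADDR = b"\0" + ("/tmp/dbus-race-demo-%d" % os.getpid()).encode()


def attempt(addr):
    """Try one connect() and report whether it was refused."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(addr)
        return "connected"
    except ConnectionRefusedError:
        return "refused"
    finally:
        client.close()


# Client runs first: nothing is listening yet.
early = attempt(ADDR)

# The "daemon" comes up: bind and listen on the same address.
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(ADDR)
server.listen(1)

# Same client code, now after the listener exists.
late = attempt(ADDR)
server.close()

print(early, late)
```

Whether the client wins or loses depends only on timing, so a kernel change that shifts process scheduling or startup speed can make an old userspace race much easier to hit without being the root cause.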
Comment 17 Rafael J. Wysocki 2008-03-26 15:10:01 UTC
References : http://www.ussg.iu.edu/hypermail/linux/kernel/0803.3/1130.html