Subject : 2.6.25-rc1/2 regression: first-time login into gnome fails
Submitter : Romano Giannetti <romanol@upcomillas.es>
Date : 2008-02-18 11:56
References : http://lkml.org/lkml/2008/2/18/145
Handled-By : "Ray Lee" <ray-lk@madrabbit.org>

This entry is being used for tracking a regression from 2.6.24. Please don't close it until the problem is fixed in the mainline.
Hi, I have a very strange, but fully reproducible, regression with 2.6.25-rc1 and -rc2. I am running a fully updated Ubuntu 7.10. The first time after boot, when I log in to gnome (through gdm), the login half-fails with a Setting Daemon error: failed to connect to socket /tmp/dbus-<some random stuff>: connection refused. There is nothing in the logs, and there is no such socket in /tmp. If I log out and then log in again, everything works. With 2.6.24.2 there is no such problem.

.config, dmesg (2M buffer size) and syslog are here: http://www.dea.icai.upcomillas.es/romano/linux/info/

Romano
As suggested by Ray, I waited 5 minutes at the login screen before logging in, and the problem did not happen. I will try again; I am pulling now and recompiling...
Hmm. Hands up. It seems it happens only sometimes, and is not clearly related to how long I wait before logging in.
Confirmed with 2.6.25-rc3. I think it's gconfd or something like that that fails on a socket, probably a Unix one. I tried a "git log v2.6.24.. net/unix", but I cannot tell whether any change there could be the culprit. To give more details: my Ubuntu install normally has a long delay when logging in to gnome. On kernel 2.6.24.2 the login eventually succeeded, after a 20 to 30 second delay. Now gconfd fails. Logging out and in again eventually gives me a working gnome (sometimes I have to repeat it). If anyone can suggest how to debug this, please tell me. Romano
Tested with -rc4. Same problem. But now I have some more info. Immediately after the failure, I ran strace on gnome-settings-daemon, and the first error was the same one shown in the dialog telling me that the settings daemon had failed (grep for "failed" in the attached file). Anyone, any hints? Maybe I should raise this on lkml? It is very reproducible, although a bit random.
Created attachment 15149 [details] strace of gnome-setting-manager

grep for "failed" in the file; the error that creates the failing login was identical to the first one:

    write(2, "\n** (gnome-settings-daemon:7750)"..., 142
    ** (gnome-settings-daemon:7750): WARNING **: Unable to connect to dbus: Failed to connect to socket /tmp/dbus-Pyo1eSHdSA: Connection refused
    ) = 142
The failing sequences are:

    socket(PF_FILE, SOCK_STREAM, 0) = 13
    connect(13, {sa_family=AF_FILE, path=@/tmp/dbus-Pyo1eSHdSA}, 23) = -1 ECONNREFUSED (Connection refused)
    socket(PF_FILE, SOCK_STREAM, 0) = 15
    connect(15, {sa_family=AF_FILE, path=@/tmp/dbus-Pyo1eSHdSA}, 23) = -1 ECONNREFUSED (Connection refused)

Did something change in the AF_FILE socket family?
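A note on reading the trace above: strace prints a leading "@" for an address in the Linux abstract socket namespace, where the name starts with a NUL byte and has no file on disk. That would be consistent with nothing showing up in /tmp, and with the address length of 23 (2 bytes of sa_family, 1 NUL byte, 20 bytes of name). The following is a minimal Python sketch, assuming Linux, of what a connect to such an address returns when nothing is listening on it. The socket name used in the demo is made up and is not a real dbus address.

```python
import errno
import os
import socket


def try_connect_abstract(name: str) -> int:
    """Connect to a Linux abstract-namespace Unix socket; return 0 or the errno."""
    # A leading NUL byte selects the abstract namespace (strace shows it as '@').
    address = b"\0" + name.encode()
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        try:
            sock.connect(address)
        except OSError as exc:
            return exc.errno
    return 0


def sockaddr_length(name: str) -> int:
    """Length passed to connect(): 2-byte sa_family + NUL marker + name bytes."""
    return 2 + 1 + len(name.encode())


if __name__ == "__main__":
    # Hypothetical name chosen for this demo; nobody is expected to listen on it.
    demo_name = f"/tmp/dbus-example-{os.getpid()}"
    result = try_connect_abstract(demo_name)
    print(errno.errorcode.get(result, "connected"))
    print(sockaddr_length("/tmp/dbus-Pyo1eSHdSA"))
```

With no listener bound to the name, the connect is refused, which matches the ECONNREFUSED lines in the trace; sockaddr_length applied to the name from the trace gives the same 23 that strace reports.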
Created attachment 15210 [details] strace of gnome-settings-manager for 2.6.24.2
Created attachment 15211 [details] strace of gnome-settings-manager for a successful login
Created attachment 15212 [details] strace of gnome-settings-manager hand-started after a "failed" login
I now think that the problem is not gnome-settings-daemon failing. I replaced gnome-settings-daemon with a script that does an exec strace -ff -tt -o /tmp/g-s-d-`uname -r`, and I discovered that when the login fails(*) the script isn't even started. I had to start it by hand after getting the prompt. Nevertheless, I uploaded all the logs of a 2.6.24.2 session, a successful -rc4 session, and a failed one. So the failing bit is *before* gnome-settings-daemon is even started. Is there someone expert on gnome startup who can advise me on what to instrument now? Thanks

(*) What happens exactly is that a few seconds after I enter the password, a dialog appears with the "Unable to connect to dbus: Failed to connect to socket /tmp/dbus-wDIKYwkD9J: Connection refused" message. Then the login continues, and I enter gnome, but without any personalized settings (obviously, given that gnome-settings-daemon isn't running). That is what I call a half-failed login. Notice that in 2.6.24.2 this never happens.
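For anyone wanting to repeat the instrumentation described in this comment, here is a rough Python sketch of such a wrapper. It only builds and prints the strace command line by default. The location of the renamed real binary (REAL_BINARY) and the helper names are assumptions made for illustration; the strace options and output path are the ones quoted above.

```python
import os
import platform
import shlex
import sys

# Assumption: the real daemon was renamed out of the way; this path is hypothetical.
REAL_BINARY = "/usr/lib/gnome-settings-daemon/gnome-settings-daemon.real"


def build_strace_argv(real_binary, release, extra_args=()):
    """Build the strace command line for tracing the real daemon."""
    return [
        "strace",
        "-ff",  # follow forks; with -o, each process gets its own output file
        "-tt",  # prefix every line with a timestamp including microseconds
        "-o", f"/tmp/g-s-d-{release}",  # release plays the role of `uname -r`
        real_binary,
        *extra_args,
    ]


def exec_under_strace(argv):
    """Replace the current process with strace, like `exec` in a shell script."""
    os.execvp(argv[0], argv)


if __name__ == "__main__":
    argv = build_strace_argv(REAL_BINARY, platform.release(), sys.argv[1:])
    # Dry run: show the command. Call exec_under_strace(argv) here instead
    # to turn this into the actual wrapper.
    print(shlex.join(argv))
```

If the output files never appear after a half-failed login, the wrapper was not started at all, which is the observation reported above.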
Happened again with rc6, after several successful logins. One note: when I get the error and gnome-settings-daemon fails to run, there are no suspend or hibernate icons in the logout/exit gnome panel. Could it be ACPI-related?
Might be, but it looks very transient right now, like a race somewhere that's hard to trigger.
Christoph Hellwig reported a "me too" on the linux kernel mailing list: http://lkml.org/lkml/2008/3/17/118 but now it seems that lkml is down (no messages in a loooong while).
It seems that apt-get install dbus-x11 solves (hides?) the problem.
Ray Lee said:

This appears to be a race in user space that people have been hitting for some time, but has gotten more likely with the latest kernel. It matches the behavior of a gnome dbus bug [ http://bugzilla.gnome.org/show_bug.cgi?id=395488 ]. The way to avoid that bug is to install the dbus-x11 package, which forces dbus to start up earlier, avoiding the race.

Though the original reporter hasn't replied, Christoph Hellwig hit the same issue and was able to confirm that installing the dbus-x11 package avoids the issue, papering over the bug, wherever it may lie.

Regardless, there are reports of this issue that go back before 2.6.25, so I don't think this is a regression, just a timing issue that's a lot easier to hit with the latest kernel.

This is one of those things that'd be nice to have in a Known Issues document in the kernel release. "Debian and derivative distributions may need to install dbus-x11 package to avoid a known userspace issue in the dbus package. [gnome bug 395488]"
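To make the timing window concrete, here is a small Python sketch, assuming Linux, of the kind of race described above: a client that tries to connect before its server is listening gets "Connection refused", and succeeds once the listener is up. This is only an illustration; it is not how dbus, gdm or gnome-session are implemented, and all names and delays in it are invented.

```python
import os
import socket
import threading
import time


def late_listener(address, delay):
    """Stand-in for a daemon that starts listening only after `delay` seconds."""
    time.sleep(delay)
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(address)
        server.listen(1)
        conn, _ = server.accept()
        conn.close()


def connect_with_retry(address, attempts=200, interval=0.05):
    """Return how many attempts were refused before a connect succeeded."""
    for refused in range(attempts):
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            try:
                client.connect(address)
                return refused
            except ConnectionRefusedError:
                # Listener not up yet: this is the window the race falls into.
                time.sleep(interval)
    raise TimeoutError("listener never appeared")


def run_race(delay=0.5):
    """Start the late listener and count refused connects on the client side."""
    # Abstract-namespace address (leading NUL), unique per run; name is made up.
    name = f"race-demo-{os.getpid()}-{time.monotonic_ns()}"
    address = b"\0" + name.encode()
    thread = threading.Thread(target=late_listener, args=(address, delay))
    thread.start()
    refused = connect_with_retry(address)
    thread.join()
    return refused


if __name__ == "__main__":
    print("refused attempts before success:", run_race())
```

A client that tries only once and gives up would fail whenever it wins the race, which is consistent with a failure that comes and goes with timing; starting the listener earlier shrinks the window without removing the underlying ordering problem.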
References : http://www.ussg.iu.edu/hypermail/linux/kernel/0803.3/1130.html