Bug 13178

Summary: Booting very slow
Product: Other Reporter: Rafael J. Wysocki (rjw)
Component: Other Assignee: other_other
Status: CLOSED UNREPRODUCIBLE    
Severity: normal CC: alan, spamtrap
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.29 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 12398    

Description Rafael J. Wysocki 2009-04-25 19:50:15 UTC
Subject    : Analyzed/Solved: Booting 2.6.30-rc2-git7 very slow
Submitter  : Martin Knoblauch <spamtrap@knobisoft.de>
Date       : 2009-04-24 12:45
References : http://marc.info/?l=linux-kernel&m=124057716231773&w=4

This entry is being used for tracking a regression from 2.6.28.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Martin Knoblauch 2009-04-27 09:05:40 UTC
The reason for the problem is that "/proc/mounts" contains two entries for "sysfs":

[root@lpsdm52 hotplug]# uname -a
Linux lpsdm52 2.6.30-rc3-git2-nfs_ra #3 SMP Mon Apr 27 10:21:31 CEST 2009 x86_64 x86_64 x86_64 GNU/Linux
[root@lpsdm52 hotplug]# grep sysfs /proc/mounts
none /sys sysfs rw,relatime 0 0
/sys /sys sysfs rw,relatime 0 0


 This breaks the script "/etc/hotplug/firmware.agent" provided by RHEL-4.3: the mount path is now determined to be "/sys\n/sys". In turn, every driver using the firmware loader now fails and times out after the value in "/sys/class/firmware/timeout".

 There is a simple fix to the firmware agent, but this behaviour is still a regression.

Cheers
Martin
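
For illustration, the failure mode described in Comment 1 can be reproduced outside the boot scripts. The sketch below is hypothetical: it is not the actual RHEL-4.3 firmware.agent code, and the function names sysfs_mountpoints_naive and sysfs_mountpoint are invented for this example. It assumes the agent takes every sysfs line in /proc/mounts as the mount point, and contrasts that with a lookup that stops at the first match and falls back to /sys.

```shell
#!/bin/sh
# Hypothetical illustration; NOT the actual firmware.agent code.

# Naive lookup: prints the mount point of EVERY sysfs entry, one per line.
# With two sysfs entries the result is a two-line string, not a path.
sysfs_mountpoints_naive() {
    awk '$3 == "sysfs" { print $2 }' "${1:-/proc/mounts}"
}

# Defensive lookup: stops at the first sysfs entry and falls back to /sys
# when no entry is found (the fallback is an assumption of this sketch).
sysfs_mountpoint() {
    mp=$(awk '$3 == "sysfs" { print $2; exit }' "${1:-/proc/mounts}")
    echo "${mp:-/sys}"
}

# Recreate the double entry from this report in a temporary file.
mounts_sample=$(mktemp)
printf '%s\n' \
    'none /sys sysfs rw,relatime 0 0' \
    '/sys /sys sysfs rw,relatime 0 0' > "$mounts_sample"

naive=$(sysfs_mountpoints_naive "$mounts_sample")
robust=$(sysfs_mountpoint "$mounts_sample")
rm -f "$mounts_sample"

# The naive value contains an embedded newline, so it is not a directory.
[ -d "$naive" ] || echo "naive result is not a usable directory"
echo "robust result: $robust"
```

Called without an argument, both functions read /proc/mounts, so the same check can be run on an affected machine.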
Comment 2 Rafael J. Wysocki 2009-04-28 22:02:57 UTC
On Monday 27 April 2009, Martin Knoblauch wrote:
> 
> ----- Original Message ----
> 
> > From: Martin Knoblauch <spamtrap@knobisoft.de>
> > To: Rafael J. Wysocki <rjw@sisk.pl>; Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
> > Cc: Kernel Testers List <kernel-testers@vger.kernel.org>
> > Sent: Monday, April 27, 2009 9:18:53 AM
> > Subject: Re: [Bug #13178] Booting very slow
> > 
> > ----- Original Message ----
> > 
> > > From: Rafael J. Wysocki 
> > > To: Linux Kernel Mailing List 
> > > Cc: Kernel Testers List ; Martin Knoblauch 
> > 
> > > Sent: Sunday, April 26, 2009 11:46:31 AM
> > > Subject: [Bug #13178] Booting very slow
> > > 
> > > This message has been generated automatically as a part of a report
> > > of regressions introduced between 2.6.28 and 2.6.29.
> > > 
> > > The following bug entry is on the current list of known regressions
> > > introduced between 2.6.28 and 2.6.29.  Please verify if it still should
> > > be listed and let me know (either way).
> > > 
> > > 
> > > Bug-Entry    : http://bugzilla.kernel.org/show_bug.cgi?id=13178
> > > Subject        : Booting very slow
> > > Submitter    : Martin Knoblauch 
> > > Date        : 2009-04-24 12:45 (3 days old)
> > > References    : http://marc.info/?l=linux-kernel&m=124057716231773&w=4
> > 
> > Not really sure whether this is a real regression. Between 2.6.28 and
> > 2.6.29 the content of /proc/mounts for sysfs changed from
> > 
> > /sys /sys sysfs rw 0 0
> > 
> > to
> > 
> > none /sys sysfs rw 0 0
> > 
> > 
> > This breaks RHEL-4.3 userland which parses /proc/mounts in the firmware
> > hotplug agent to find the mount-point for sysfs. As a result firmware
> > loading started to fail in 2.6.29. There is a simple fix in the
> > /etc/hotplug/firmware.agent script (just assume /sys as it is done
> > elsewhere).
> > 
> > Your call.
> > 
> > Cheers
> > Martin
> 
>  Actually I have to correct myself. The reason for the failure to parse
>  /proc/mounts for "sysfs" is that there are two lines:
> 
> [hotplug]# uname -a
> Linux lpsdm52 2.6.30-rc3-git2-nfs_ra #3 SMP Mon Apr 27 10:21:31 CEST 2009 x86_64 x86_64 x86_64 GNU/Linux
> [hotplug]# grep sysfs /proc/mounts
> none /sys sysfs rw,relatime 0 0
> /sys /sys sysfs rw,relatime 0 0
> 
>  This breaks the "firmware.agent" /sys-parsing code. There still exists the
>  simple fix to userspace, but I now think that this is a real regression that
>  should be fixed.
Comment 3 Rafael J. Wysocki 2009-05-18 17:12:08 UTC
On Monday 18 May 2009, Martin Knoblauch wrote:
> 
> ----- Original Message ----
> 
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
> > Cc: Kernel Testers List <kernel-testers@vger.kernel.org>; Martin Knoblauch <spamtrap@knobisoft.de>
> > Sent: Saturday, May 16, 2009 10:06:02 PM
> > Subject: [Bug #13178] Booting very slow
> > 
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.28 and 2.6.29.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.28 and 2.6.29.  Please verify if it still should
> > be listed and let me know (either way).
> > 
> > 
> > Bug-Entry    : http://bugzilla.kernel.org/show_bug.cgi?id=13178
> > Subject        : Booting very slow
> > Submitter    : Martin Knoblauch 
> > Date        : 2009-04-24 12:45 (23 days old)
> > References    : http://marc.info/?l=linux-kernel&m=124057716231773&w=4
> 
>  The issue is still open. It turns out that starting with 2.6.29-rc1
>  /proc/mounts already has a "sysfs" line when entering the startup scripts
>  from initrd. This breaks the RHEL4 firmware hotplug script.
> 
> Simple fix to user space is available. I do not know how important this issue
> is.
Comment 4 Martin Knoblauch 2009-05-25 08:33:56 UTC
The problem has been bisected down to commit:

|commit 1120f8b8169fb2cb51219d326892d963e762edb6
|Author: Stephen Hemminger <shemminger@vyatta.com>
|Date:  Thu Dec 18 09:17:16 2008 -0800
|
|    PCI: handle long delays in VPD access
|
|    Accessing the VPD area can take a long time.  The existing
|    VPD access code fails consistently on my hardware. There are comments
|
|    Change the access routines to:
|      * use a mutex rather than spinning with IRQ's disabled and lock held
|      * have a much longer timeout
|      * call cond_resched while spinning
|
|    Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
|    Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
|    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

 The issue seems to be timing dependent. I can only reproduce it on one particular hardware configuration (HP ProLiant DL380 G4). The symptom does not show up on an IBM x3650 with the same RHEL4.3 userspace. It also does not show up on my notebook with CentOS-5.3 userspace.

 No idea what to do about it. No complaints from my side if this gets closed for fuzziness :-)

Martin
Comment 5 Rafael J. Wysocki 2009-05-27 19:46:03 UTC
On Wednesday 27 May 2009, Andrew Morton wrote:
> On Tue, 26 May 2009 01:04:04 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > On Monday 25 May 2009, Martin Knoblauch wrote:
> > > 
> > > ----- Original Message ----
> > > 
> > > > From: Rafael J. Wysocki <rjw@sisk.pl>
> > > > To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
> > > > Cc: Kernel Testers List <kernel-testers@vger.kernel.org>; Martin Knoblauch <spamtrap@knobisoft.de>
> > > > Sent: Sunday, May 24, 2009 9:31:18 PM
> > > > Subject: [Bug #13178] Booting very slow
> > > > 
> > > > This message has been generated automatically as a part of a report
> > > > of regressions introduced between 2.6.28 and 2.6.29.
> > > > 
> > > > The following bug entry is on the current list of known regressions
> > > > introduced between 2.6.28 and 2.6.29.  Please verify if it still should
> > > > be listed and let me know (either way).
> > > > 
> > > > 
> > > > Bug-Entry    : http://bugzilla.kernel.org/show_bug.cgi?id=13178
> > > > Subject        : Booting very slow
> > > > Submitter    : Martin Knoblauch 
> > > > Date        : 2009-04-24 12:45 (31 days old)
> > > > References    : http://marc.info/?l=linux-kernel&m=124057716231773&w=4
> > > 
> > >  Still happens with 2.6.30-rc7. But see my comment on bz. I would be
> > >  willing to leave this as a "fuzzy timing related problem".
> > 
> > OK
> > 
> > I've closed it as "unreproducible".
> > 
> 
> afaict this should remain open.  It's a reproducible regression on one
> of Martin's machines and it has been bisected down to a particular
> commit which quite clearly has the potential to increase device
> initialisation times by a lot.  Especially if that commit was buggy.
Comment 6 Rafael J. Wysocki 2009-06-01 20:14:10 UTC
On Monday 01 June 2009, Martin Knoblauch wrote:
> 
> ----- Original Message ----
> 
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.28 and 2.6.29.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.28 and 2.6.29.  Please verify if it still should
> > be listed and let me know (either way).
> > 
> > 
> > Bug-Entry    : http://bugzilla.kernel.org/show_bug.cgi?id=13178
> > Subject        : Booting very slow
> > Submitter    : Martin Knoblauch 
> > Date        : 2009-04-24 12:45 (37 days old)
> > References    : http://marc.info/?l=linux-kernel&m=124057716231773&w=4
> 
>  We (HP and myself) are trying to track it down.
Comment 7 Rafael J. Wysocki 2009-06-08 11:11:42 UTC
On Monday 08 June 2009, Martin Knoblauch wrote:
> 
> ----- Original Message ----
> 
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
> > Cc: Kernel Testers List <kernel-testers@vger.kernel.org>; Jesse Barnes <jbarnes@virtuousgeek.org>; Martin Knoblauch <spamtrap@knobisoft.de>; Stephen Hemminger <shemminger@vyatta.com>
> > Sent: Sunday, June 7, 2009 12:06:22 PM
> > Subject: [Bug #13178] Booting very slow
> > 
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.28 and 2.6.29.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.28 and 2.6.29.  Please verify if it still should
> > be listed and let me know (either way).
> > 
> > 
> > Bug-Entry    : http://bugzilla.kernel.org/show_bug.cgi?id=13178
> > Subject        : Booting very slow
> > Submitter    : Martin Knoblauch 
> > Date        : 2009-04-24 12:45 (45 days old)
> > References    : http://marc.info/?l=linux-kernel&m=124057716231773&w=4
> 
>  No change since last ping. We ruled out a non-HP NIC in the DL380. HP will
>  try to reproduce in-house.
Comment 8 Martin Knoblauch 2010-01-20 08:54:49 UTC
 Almost forgot about this one, as the hardware in question has been retired by the customer.

 Investigation by HP back in July/August 2009 showed that the problem was caused by a VPD read problem on the platform. That in turn prevented the umount of "/sys" from the initrd image, which resulted in the double entry, which broke the hotplug script, which ....

 I will try to find out whether HP ever found a solution.
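
One way to check whether VPD reads stall on a given machine is to read each PCI device's "vpd" attribute in sysfs under a time limit. This is a rough sketch, not the procedure HP used: the function names probe_read and probe_all_vpd and the 5-second default are assumptions made for this example, it relies on the GNU coreutils "timeout" command, and reading the attribute normally requires root.

```shell
#!/bin/sh
# Hypothetical diagnostic sketch; the names and the 5-second limit are assumptions.

# Read one file under a time limit and report how the read ended.
# Usage: probe_read FILE [SECONDS]; prints "ok", "timeout" or "error".
probe_read() {
    timeout "${2:-5}" cat "$1" > /dev/null 2>&1
    case $? in
        0)   echo ok ;;
        124) echo timeout ;;  # GNU timeout exits with 124 when the limit is hit
        *)   echo error ;;    # unreadable file, missing command, and so on
    esac
}

# Probe the vpd attribute of every PCI device that exposes one.
probe_all_vpd() {
    for vpd in /sys/bus/pci/devices/*/vpd; do
        [ -e "$vpd" ] || continue   # glob did not match: no device exposes VPD
        printf '%s: %s\n' "$vpd" "$(probe_read "$vpd" "${1:-5}")"
    done
}
```

Running "probe_all_vpd" as root prints one line per device. A device reported as "timeout" would be a candidate for the kind of slow VPD access discussed in this report.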