Bug 13783 - udev causes high cpu usage
Summary: udev causes high cpu usage
Status: RESOLVED OBSOLETE
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: SCSI (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: linux-scsi@vger.kernel.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-07-16 07:36 UTC by Andreas Fackler
Modified: 2013-12-10 16:41 UTC (History)
11 users (show)

See Also:
Kernel Version: 2.6.31-rc3
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Output of "udevadm monitor --environment" with udev-141-1.2 and kernel 2.6.30 (34.50 KB, text/plain)
2009-07-16 07:36 UTC, Andreas Fackler
Details
Output of "udevadm monitor --environment" with udev-141-1 and kernel 2.6.30 (5.07 KB, text/plain)
2009-07-16 07:36 UTC, Andreas Fackler
Details
Output of top with new udev (18.81 KB, text/plain)
2009-07-16 07:37 UTC, Andreas Fackler
Details
Output of top with old udev (16.73 KB, text/plain)
2009-07-16 07:38 UTC, Andreas Fackler
Details
Output of lspci (7.78 KB, text/plain)
2009-07-16 07:38 UTC, Andreas Fackler
Details
strace -f hald (999.91 KB, text/plain)
2009-07-31 15:33 UTC, Andreas Fackler
Details
strace -p 10678 (1.73 KB, text/plain)
2009-07-31 15:44 UTC, Andreas Fackler
Details

Description Andreas Fackler 2009-07-16 07:36:02 UTC
Created attachment 22372 [details]
Output of "udevadm monitor --environment" with udev-141-1.2 and kernel 2.6.30

Before I upgraded the udev version of Ubuntu 9.04 from 141-1 to 141-1.1 (and 141-1.2), top reported about 2% total cpu usage when idle. After the update, it rarely drops below 35%. Though top lists udevd among the more cpu hungry processes, it is only listed with about 3%, while system (in line 3) almost never drops below 20%.

"udevadm monitor --environment" after the upgrade reports many additional events with "SUBSYSTEM=scsi" and "ACTION=change".

I am using the amd64 version of Xubuntu 9.04 on a Acer Extensa 5230 laptop with a 2 GHz Intel Celeron processor. The problem occurs with all the kernel versions I tried: The Ubuntu kernel 2.6.28 and the mainline kernels 2.6.30 and 2.6.31-rc3.

See also the Ubuntu bug report:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/379780
Comment 1 Andreas Fackler 2009-07-16 07:36:56 UTC
Created attachment 22373 [details]
Output of "udevadm monitor --environment" with udev-141-1 and kernel 2.6.30
Comment 2 Andreas Fackler 2009-07-16 07:37:42 UTC
Created attachment 22374 [details]
Output of top with new udev
Comment 3 Andreas Fackler 2009-07-16 07:38:01 UTC
Created attachment 22375 [details]
Output of top with old udev
Comment 4 Andreas Fackler 2009-07-16 07:38:22 UTC
Created attachment 22376 [details]
Output of lspci
Comment 5 Kay Sievers 2009-07-16 07:56:47 UTC
Something keeps generating media change events in a loop for the optical drive.

If you keep "udevadm monitor" running, but kill udevd, do all events stop?

The event for "sr" open()s the device. I guess, this causes a new change event for the SCSI LUN (which it shouldn't), and that causes a loop.

Is there anything like HAL or DeviceKit-disks running which polls the optical drive?
Comment 6 Andreas Fackler 2009-07-16 17:26:37 UTC
Yes, if I stop either udevd or hald-addon-storage, no more events are generated and cpu usage goes back to normal.
Comment 7 Kay Sievers 2009-07-17 11:43:24 UTC
On Thu, Jul 16, 2009 at 19:26, <bugzilla-daemon@bugzilla.kernel.org> wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=13783

> --- Comment #6 from Andreas Fackler <AndreasFackler@web.de>  2009-07-16
> 17:26:37 ---
> Yes, if I stop either udevd or hald-addon-storage, no more events are
> generated
> and cpu usage goes back to normal.

Just to double check:

Killing the hal addon alone makes the event loop stop?

There is a binary: "hal-disable-polling", which can be used to
permanently disable the polling of the device. If you use that and
reboot the box, the issue goes away?

Thanks!
Comment 8 Andreas Fackler 2009-07-17 15:36:25 UTC
Thank you for your suggestion! It worked, more or less:

After I entered "hal-disable-polling --device /dev/scd0", the events immediately stopped (or maybe I didn't wait long enough), but they reappeared after a reboot, although the file "/etc/hal/fdi/information/media-check-disable-storage_model_DVD_RW_AD_7560S.fdi" had been created. So I entered the exact same command "hal-disable-polling --device /dev/scd0" again, which created the additional file "/etc/hal/fdi/information/media-check-disable-storage_model_Optiarc_DVD_RW_AD_7560S.fdi" and in fact ended the events permanently, even after a reboot.

Except if I open the DVD drive: Then events are fired wildly (not just one or two) and the cpu usage rises to about 30% for a few seconds, until it stops again. The same thing happens when I close the drive. I don't know whether this is normal?
Comment 9 Kay Sievers 2009-07-17 15:58:14 UTC
On Fri, Jul 17, 2009 at 17:36, <bugzilla-daemon@bugzilla.kernel.org> wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=13783
>
> So I entered the exact same command "hal-disable-polling
> --device /dev/scd0" again, which created the additional file
>
> "/etc/hal/fdi/information/media-check-disable-storage_model_Optiarc_DVD_RW_AD_7560S.fdi"
> and in fact ended the events permanently, even after a reboot.

Ah good.

> Except if I open the DVD drive: Then events are fired wildly (not just one or
> two) and the cpu usage rises to about 30% for a few seconds, until it stops
> again. The same thing happens when I close the drive. I don't know whether
> this
> is normal?

I guess something goes crazy with the media-changed detection.
Sometimes not-tested-in-such-setup/broken firmware for optical drives
does not like us.

We have seen similar behavior on some drives with HAL checking for the
"eject button pressed" events.

In some rare cases there are updates for the drive firmware available.
Comment 10 Andrew Morton 2009-07-23 00:22:52 UTC
(In reply to comment #9)
> I guess something goes crazy with the media-changed detection.
> Sometimes not-tested-in-such-setup/broken firmware for optical drives
> does not like us.

Seems a reasonable guess.

Do we know which device driver is being used for that drive?

In the ubuntu report at least three people are seeing the same problem.  Is there a common point where we can address this?  Maybe a blacklist?  Or are we dealing with three separate low-level drivers?

Thanks.
Comment 11 Andreas Fackler 2009-07-23 07:58:16 UTC
lspci -v and lshw say the SATA controller uses the ahci driver. They don't list any drivers or modules belonging to the optical drive, and it seems to be the same in the files Christian Mallwitz posted.
My blacklist files are just the Ubuntu defaults.
Here is what dmesg says:

[    1.500033] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    1.513385] ata2.00: ATAPI: Optiarc DVD RW AD-7560S, SX07, max UDMA/100, ATAPI AN
[    1.527733] ata2.00: configured for UDMA/100
[    1.545118] scsi 1:0:0:0: CD-ROM            Optiarc  DVD RW AD-7560S  SX07 PQ: 0 ANSI: 5
[    1.550572] sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
[    1.550575] Uniform CD-ROM driver Revision: 3.20
[    1.550642] sr 1:0:0:0: Attached scsi CD-ROM sr0
[    1.550691] sr 1:0:0:0: Attached scsi generic sg1 type 5
Comment 12 Tejun Heo 2009-07-25 03:33:49 UTC
Hmmm... interesting.  Can someone please strace the polling process and see what's going on?  Attaching strace to the hal-addon-storage should do the trick, Kay, right?  It could be that TUR result is alternating.  The problem probably is that windows doesn't use TUR while polling for media presence while we do.  There are drives which completely check out after being polled for a while and this bug report looks like another strain of the same problem.  I'll try to implement media presence polling inside kernel and use only GET_EVENT_STATUS_NOTIFICATION as windows does.  Please cc anyone with reproducible problem cases here.

Thanks.
Comment 13 Andreas Fackler 2009-07-31 15:33:19 UTC
Created attachment 22553 [details]
strace -f hald

I dont't know how to strace only hald-addon-storage, because it is a child process of hald and not an executable, so I did "strace -f hald". Within a few seconds, this generated more than 5 MB of output. I attached the first megabyte of it.
Comment 14 Kay Sievers 2009-07-31 15:37:43 UTC
On Fri, Jul 31, 2009 at 11:33, <bugzilla-daemon@bugzilla.kernel.org> wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=13783

> I don't know how to strace only hald-addon-storage, because it is a child
> process of hald and not an executable,

You should be able to just attach strace to the running process with "-p <pid>".
Comment 15 Andreas Fackler 2009-07-31 15:44:46 UTC
Created attachment 22554 [details]
strace -p 10678

Ah, right, I'm sorry! Here's a few seconds of "strace -p 10678", where 10678 was the pid of hald-addon-storage.
Comment 16 Wendall Cada 2009-10-17 04:01:15 UTC
I have the same issue. I have the exact results as detailed in Comment #8
(Here is the data on my hardware. (Please let me know if any additional debugging information is required.)

Fedora Core 11

Linux version 2.6.30.8-64.fc11.i686.PAE (mockbuild@x86-6.fedora.phx.redhat.com) (gcc version 4.4.1 20090725 (Red Hat 4.4.1-2) (GCC) ) #1 SMP

Output from hal-disable-polling --device /dev/sr0
Polling for drive /dev/sr0 have been disabled. The fdi file written was
  /etc/hal/fdi/information/media-check-disable-storage_model_DVD_RW_AD_7560S.fdi

Partial output from devkit -m
action=change @ 19:46:30.318587
device:           /sys/devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0
  subsystem:      block
  device file:    /dev/sr0
    symlink:      /dev/block/11:0
    symlink:      /dev/scd0
    symlink:      /dev/disk/by-path/pci-0000:00:1f.2-scsi-1:0:0:0
    symlink:      /dev/cdrom
    symlink:      /dev/cdrw
    symlink:      /dev/dvd
    symlink:      /dev/dvdrw
  properties:
    ID_VENDOR_ENC=Optiarc\x20
    UDEV_LOG=3
    ID_CDROM_MRW=1
    ACL_CDROM=1
    ID_MODEL=DVD_RW_AD-7560S
    ID_TYPE=cd
    ID_CDROM=1
    ID_CDROM_DVD_RAM=1
    ID_PATH=pci-0000:00:1f.2-scsi-1:0:0:0
    DEVTYPE=disk
    ID_VENDOR=Optiarc
    SEQNUM=89215
    DEVNAME=/dev/sr0
    ACTION=change
    ID_CDROM_DVD=1
    DEVPATH=/devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0
    ID_CDROM_CD_R=1
    ID_BUS=scsi
    ID_CDROM_DVD_R=1
    MAJOR=11
    SUBSYSTEM=block
    ID_MODEL_ENC=DVD\x20RW\x20AD-7560S\x20
    ID_CDROM_MRW_W=1
    ID_CDROM_CD_RW=1
    GENERATED=1
    DEVLINKS=/dev/block/11:0 /dev/scd0 /dev/disk/by-path/pci-0000:00:1f.2-scsi-1:0:0:0 /dev/cdrom /dev/cdrw /dev/dvd /dev/dvdrw
    MINOR=0
    ID_REVISION=S805
Comment 17 Wendall Cada 2009-10-17 04:06:01 UTC
I apologise. I forgot to add this bug was found in udev-141-7.fc11 i586
Comment 18 Tejun Heo 2009-10-18 03:06:34 UTC
The strace output doesn't indicate anything suspicious.  It's just polling for event w/ 2 sec timeout.  I don't see how that could cause high cpu usage.  Wendall, can you please strace the process which is consuming cpu?  Also, is the wasted cpu time system or user?
Comment 19 Wendall Cada 2009-10-19 03:46:23 UTC
What's interesting is that the cpu usage is spread across several processes:
devkit-disks-daemon 3%-8% (user)
gvfs-gdu-volume-monitor 3%-8% (system)
dbus-daemon 3%-8% (system)
gdu-notification-daemon 3%-8% (user)
udevd 1%-2% (system)

The cpu spikes to about 35%-40% usage for 30 seconds, then drops down to about 15%-20% for around a second and repeats the cycle.

I'm not sure which process I should provide an strace for. Or even if this is a udev bug at this point. Suggestions?
Comment 20 jt512 2009-10-22 07:00:28 UTC
Has this problem been fixed in version 147?  I had this problem in versions 145 and 146 in Arch64, but I just installed udev-147~-6, which someone has ported from ubuntu to Arch, and the problem is gone.
Comment 21 Tejun Heo 2009-10-26 15:08:24 UTC
Wendall, can you also please test udev version 147?  Thanks.
Comment 22 Petr Stehlik 2009-11-20 08:15:09 UTC
With up-to-date Ubuntu Karmic Koala and its udev-147~-6.1 the problem is still there (Acer Extensa 5630EZ with Optiarc DVD RW AD-7560S)
Comment 23 Petr Stehlik 2009-11-27 09:51:53 UTC
I have tried some PPA build of udev-147~-6.2 but it didn't help either. What can I do to get this fixed? 55% CPU load is not fun. Thanks.
Comment 24 jt512 2010-01-03 01:42:32 UTC
Is anybody working on this bug?  It is being reported by numerous people in the Archlinux forums.
Comment 25 Bruno Yporti 2010-01-17 21:15:26 UTC
I have the same problem here in my Lenovo 3000 G530 with Optiarc_DVD_RW_AD_7580S.
Comment 26 jt512 2010-02-01 23:35:39 UTC
This but persists in version 151.

I also have the Optiarc_DVD_RW_AD_7580S.

It's been nearly six months since this bug was reported.  What's up?
Comment 27 Chase Douglas 2010-02-08 19:45:43 UTC
Every report so far has been of the Optiarc_DVD_RW_AD_7580S drive. Is there anyone who has seen this issue on any other drive?
Comment 28 Tejun Heo 2010-02-09 03:37:00 UTC
Sorry about lack or response.  If the problem is limited to a single drive, is anyone interested in sending me the device?  I'll pay for the replacement unit and shipping via paypal.

Thanks.
Comment 29 Martin Pitt 2010-02-10 10:26:40 UTC
> Every report so far has been of the Optiarc_DVD_RW_AD_7580S drive. Is there
> anyone who has seen this issue on any other drive?

The downstream bug report also reported this to happen on a slightly different model:

UDEV  [1243584928.060421] change   /devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0 (block)
[...]
ID_VENDOR=Optiarc
ID_VENDOR_ENC=Optiarc\x20
ID_MODEL=DVD_RW_AD-7640S
ID_MODEL_ENC=DVD\x20RW\x20AD-7640S\x20
ID_REVISION=HJ01
ID_TYPE=cd
Comment 30 Ian Goodacre 2010-02-15 16:42:10 UTC
(In reply to comment #8)
> After I entered "hal-disable-polling --device /dev/scd0"...

I had to run "hal-disable-polling --device /dev/sr0" three times before the stream of events stopped. I now have three files in /etc/hal/fdi/information:

media-check-disable-storage_model_DVD_RW_AD_7580S.fdi
media-check-disable-storage_model_Optiarc_DVD_RW_AD_7580S_0.fdi
media-check-disable-storage_model_Optiarc_DVD_RW_AD_7580S.fdi

They were produced in the order listed.

Note You need to log in before you can comment on or make changes to this bug.