Created attachment 22372 [details]
Output of "udevadm monitor --environment" with udev-141-1.2 and kernel 2.6.30
Before I upgraded the udev version of Ubuntu 9.04 from 141-1 to 141-1.1 (and 141-1.2), top reported about 2% total cpu usage when idle. After the update, it rarely drops below 35%. Though top lists udevd among the more cpu hungry processes, it is only listed with about 3%, while system (in line 3) almost never drops below 20%.
"udevadm monitor --environment" after the upgrade reports many additional events with "SUBSYSTEM=scsi" and "ACTION=change".
I am using the amd64 version of Xubuntu 9.04 on a Acer Extensa 5230 laptop with a 2 GHz Intel Celeron processor. The problem occurs with all the kernel versions I tried: The Ubuntu kernel 2.6.28 and the mainline kernels 2.6.30 and 2.6.31-rc3.
See also the Ubuntu bug report:
Created attachment 22373 [details]
Output of "udevadm monitor --environment" with udev-141-1 and kernel 2.6.30
Created attachment 22374 [details]
Output of top with new udev
Created attachment 22375 [details]
Output of top with old udev
Created attachment 22376 [details]
Output of lspci
Something keeps generating media change events in a loop for the optical drive.
If you keep "udevadm monitor" running, but kill udevd, do all events stop?
The event for "sr" open()s the device. I guess, this causes a new change event for the SCSI LUN (which it shouldn't), and that causes a loop.
Is there anything like HAL or DeviceKit-disks running which polls the optical drive?
Yes, if I stop either udevd or hald-addon-storage, no more events are generated and cpu usage goes back to normal.
On Thu, Jul 16, 2009 at 19:26, <firstname.lastname@example.org> wrote:
> --- Comment #6 from Andreas Fackler <AndreasFackler@web.de> 2009-07-16
> 17:26:37 ---
> Yes, if I stop either udevd or hald-addon-storage, no more events are
> and cpu usage goes back to normal.
Just to double check:
Killing the hal addon alone makes the event loop stop?
There is a binary: "hal-disable-polling", which can be used to
permanently disable the polling of the device. If you use that and
reboot the box, the issue goes away?
Thank you for your suggestion! It worked, more or less:
After I entered "hal-disable-polling --device /dev/scd0", the events immediately stopped (or maybe I didn't wait long enough), but they reappeared after a reboot, although the file "/etc/hal/fdi/information/media-check-disable-storage_model_DVD_RW_AD_7560S.fdi" had been created. So I entered the exact same command "hal-disable-polling --device /dev/scd0" again, which created the additional file "/etc/hal/fdi/information/media-check-disable-storage_model_Optiarc_DVD_RW_AD_7560S.fdi" and in fact ended the events permanently, even after a reboot.
Except if I open the DVD drive: Then events are fired wildly (not just one or two) and the cpu usage rises to about 30% for a few seconds, until it stops again. The same thing happens when I close the drive. I don't know whether this is normal?
On Fri, Jul 17, 2009 at 17:36, <email@example.com> wrote:
> So I entered the exact same command "hal-disable-polling
> --device /dev/scd0" again, which created the additional file
> and in fact ended the events permanently, even after a reboot.
> Except if I open the DVD drive: Then events are fired wildly (not just one or
> two) and the cpu usage rises to about 30% for a few seconds, until it stops
> again. The same thing happens when I close the drive. I don't know whether
> is normal?
I guess something goes crazy with the media-changed detection.
Sometimes not-tested-in-such-setup/broken firmware for optical drives
does not like us.
We have seen similar behavior on some drives with HAL checking for the
"eject button pressed" events.
In some rare cases there are updates for the drive firmware available.
(In reply to comment #9)
> I guess something goes crazy with the media-changed detection.
> Sometimes not-tested-in-such-setup/broken firmware for optical drives
> does not like us.
Seems a reasonable guess.
Do we know which device driver is being used for that drive?
In the ubuntu report at least three people are seeing the same problem. Is there a common point where we can address this? Maybe a blacklist? Or are we dealing with three separate low-level drivers?
lspci -v and lshw say the SATA controller uses the ahci driver. They don't list any drivers or modules belonging to the optical drive, and it seems to be the same in the files Christian Mallwitz posted.
My blacklist files are just the Ubuntu defaults.
Here is what dmesg says:
[ 1.500033] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1.513385] ata2.00: ATAPI: Optiarc DVD RW AD-7560S, SX07, max UDMA/100, ATAPI AN
[ 1.527733] ata2.00: configured for UDMA/100
[ 1.545118] scsi 1:0:0:0: CD-ROM Optiarc DVD RW AD-7560S SX07 PQ: 0 ANSI: 5
[ 1.550572] sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
[ 1.550575] Uniform CD-ROM driver Revision: 3.20
[ 1.550642] sr 1:0:0:0: Attached scsi CD-ROM sr0
[ 1.550691] sr 1:0:0:0: Attached scsi generic sg1 type 5
Hmmm... interesting. Can someone please strace the polling process and see what's going on? Attaching strace to the hal-addon-storage should do the trick, Kay, right? It could be that TUR result is alternating. The problem probably is that windows doesn't use TUR while polling for media presence while we do. There are drives which completely check out after being polled for a while and this bug report looks like another strain of the same problem. I'll try to implement media presence polling inside kernel and use only GET_EVENT_STATUS_NOTIFICATION as windows does. Please cc anyone with reproducible problem cases here.
Created attachment 22553 [details]
strace -f hald
I dont't know how to strace only hald-addon-storage, because it is a child process of hald and not an executable, so I did "strace -f hald". Within a few seconds, this generated more than 5 MB of output. I attached the first megabyte of it.
On Fri, Jul 31, 2009 at 11:33, <firstname.lastname@example.org> wrote:
> I don't know how to strace only hald-addon-storage, because it is a child
> process of hald and not an executable,
You should be able to just attach strace to the running process with "-p <pid>".
Created attachment 22554 [details]
strace -p 10678
Ah, right, I'm sorry! Here's a few seconds of "strace -p 10678", where 10678 was the pid of hald-addon-storage.
I have the same issue. I have the exact results as detailed in Comment #8
(Here is the data on my hardware. (Please let me know if any additional debugging information is required.)
Fedora Core 11
Linux version 18.104.22.168-64.fc11.i686.PAE (email@example.com) (gcc version 4.4.1 20090725 (Red Hat 4.4.1-2) (GCC) ) #1 SMP
Output from hal-disable-polling --device /dev/sr0
Polling for drive /dev/sr0 have been disabled. The fdi file written was
Partial output from devkit -m
action=change @ 19:46:30.318587
device file: /dev/sr0
DEVLINKS=/dev/block/11:0 /dev/scd0 /dev/disk/by-path/pci-0000:00:1f.2-scsi-1:0:0:0 /dev/cdrom /dev/cdrw /dev/dvd /dev/dvdrw
I apologise. I forgot to add this bug was found in udev-141-7.fc11 i586
The strace output doesn't indicate anything suspicious. It's just polling for event w/ 2 sec timeout. I don't see how that could cause high cpu usage. Wendall, can you please strace the process which is consuming cpu? Also, is the wasted cpu time system or user?
What's interesting is that the cpu usage is spread across several processes:
devkit-disks-daemon 3%-8% (user)
gvfs-gdu-volume-monitor 3%-8% (system)
dbus-daemon 3%-8% (system)
gdu-notification-daemon 3%-8% (user)
udevd 1%-2% (system)
The cpu spikes to about 35%-40% usage for 30 seconds, then drops down to about 15%-20% for around a second and repeats the cycle.
I'm not sure which process I should provide an strace for. Or even if this is a udev bug at this point. Suggestions?
Has this problem been fixed in version 147? I had this problem in versions 145 and 146 in Arch64, but I just installed udev-147~-6, which someone has ported from ubuntu to Arch, and the problem is gone.
Wendall, can you also please test udev version 147? Thanks.
With up-to-date Ubuntu Karmic Koala and its udev-147~-6.1 the problem is still there (Acer Extensa 5630EZ with Optiarc DVD RW AD-7560S)
I have tried some PPA build of udev-147~-6.2 but it didn't help either. What can I do to get this fixed? 55% CPU load is not fun. Thanks.
Is anybody working on this bug? It is being reported by numerous people in the Archlinux forums.
I have the same problem here in my Lenovo 3000 G530 with Optiarc_DVD_RW_AD_7580S.
This but persists in version 151.
I also have the Optiarc_DVD_RW_AD_7580S.
It's been nearly six months since this bug was reported. What's up?
Every report so far has been of the Optiarc_DVD_RW_AD_7580S drive. Is there anyone who has seen this issue on any other drive?
Sorry about lack or response. If the problem is limited to a single drive, is anyone interested in sending me the device? I'll pay for the replacement unit and shipping via paypal.
> Every report so far has been of the Optiarc_DVD_RW_AD_7580S drive. Is there
> anyone who has seen this issue on any other drive?
The downstream bug report also reported this to happen on a slightly different model:
UDEV [1243584928.060421] change /devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0 (block)
(In reply to comment #8)
> After I entered "hal-disable-polling --device /dev/scd0"...
I had to run "hal-disable-polling --device /dev/sr0" three times before the stream of events stopped. I now have three files in /etc/hal/fdi/information:
They were produced in the order listed.