Hi, I have a Belkin card reader [1] where media revalidation appears to be broken. The kernel will happily return ENOMEDIUM when opening the /dev/sdb with O_RDONLY but only when running 'touch /dev/sdb' the partition goes away. I would think that if the kernel knows enough to return ENOMEDIUM for /dev/sdb it should be able to also infer that the partitions from /dev/sdb1 are no more. It is also very weird because I only see this behavior on the Belkin device, not on other USB card readers. This is on 2.6.29.1-46.fc11.x86_64 from Fedora Rawhide. Here is the detailed log. First we kill all polling [root@x61 1-3]# killall hald-addon-storage Then we listen on uevents [root@x61 1-3]# udevadm monitor monitor will print the received events for: UDEV the event which udev sends out after rule processing UEVENT the kernel uevent # insert CF card and run 'touch /dev/sdb' in another shell UEVENT[1239057973.774859] change /devices/pci0000:00/0000:00:1a.7/usb1/1-3/1-3:1.0/host19/target19:0:0/19:0:0:0/block/sdb (block) UEVENT[1239057973.774932] add /devices/pci0000:00/0000:00:1a.7/usb1/1-3/1-3:1.0/host19/target19:0:0/19:0:0:0/block/sdb/sdb1 (block) UEVENT[1239057973.778183] change /devices/pci0000:00/0000:00:1a.7/usb1/1-3/1-3:1.0/host19/target19:0:0/19:0:0:0/block/sdb (block) UEVENT[1239057973.814998] change /devices/pci0000:00/0000:00:1a.7/usb1/1-3/1-3:1.0/host19/target19:0:0/19:0:0:0 (scsi) UDEV [1239057973.962835] change /devices/pci0000:00/0000:00:1a.7/usb1/1-3/1-3:1.0/host19/target19:0:0/19:0:0:0/block/sdb (block) UDEV [1239057974.094378] add /devices/pci0000:00/0000:00:1a.7/usb1/1-3/1-3:1.0/host19/target19:0:0/19:0:0:0/block/sdb/sdb1 (block) UDEV [1239057974.265656] change /devices/pci0000:00/0000:00:1a.7/usb1/1-3/1-3:1.0/host19/target19:0:0/19:0:0:0/block/sdb (block) UEVENT[1239057974.269085] change /devices/pci0000:00/0000:00:1a.7/usb1/1-3/1-3:1.0/host19/target19:0:0/19:0:0:0/block/sdb (block) UDEV [1239057974.269574] change /devices/pci0000:00/0000:00:1a.7/usb1/1-3/1-3:1.0/host19/target19:0:0/19:0:0:0 (scsi) UDEV [1239057974.437891] change /devices/pci0000:00/0000:00:1a.7/usb1/1-3/1-3:1.0/host19/target19:0:0/19:0:0:0/block/sdb (block) # Now we remove CF card - note that since user space don't poll for media changes nothing happens # [...] # Now we run 'blockdev --rereadpt /dev/sdb'. It fails with # # /dev/sdb: No medium found # # The only interaction with the device that happens is that it # does open("/dev/sdb", O_RDONLY) and which fails with ENOMEDIUM # # However the following uevents occur UEVENT[1239058028.675551] change /devices/pci0000:00/0000:00:1a.7/usb1/1-3/1-3:1.0/host19/target19:0:0/19:0:0:0 (scsi) UEVENT[1239058028.686676] change /devices/pci0000:00/0000:00:1a.7/usb1/1-3/1-3:1.0/host19/target19:0:0/19:0:0:0/block/sdb (block) UDEV [1239058028.686744] change /devices/pci0000:00/0000:00:1a.7/usb1/1-3/1-3:1.0/host19/target19:0:0/19:0:0:0 (scsi) UDEV [1239058028.765815] change /devices/pci0000:00/0000:00:1a.7/usb1/1-3/1-3:1.0/host19/target19:0:0/19:0:0:0/block/sdb (block) # Note that we didn't get a remove event for /dev/sdb1. It is still around # both in sysfs and /dev # # run 'touch /dev/sdb' to get the remove event. # UEVENT[1239058055.909464] remove /devices/pci0000:00/0000:00:1a.7/usb1/1-3/1-3:1.0/host19/target19:0:0/19:0:0:0/block/sdb/sdb1 (block) UDEV [1239058055.921046] remove /devices/pci0000:00/0000:00:1a.7/usb1/1-3/1-3:1.0/host19/target19:0:0/19:0:0:0/block/sdb/sdb1 (block) UEVENT[1239058055.947384] change /devices/pci0000:00/0000:00:1a.7/usb1/1-3/1-3:1.0/host19/target19:0:0/19:0:0:0/block/sdb (block) UDEV [1239058056.029241] change /devices/pci0000:00/0000:00:1a.7/usb1/1-3/1-3:1.0/host19/target19:0:0/19:0:0:0/block/sdb (block) # Essentially touch just does # # open("/dev/sdb", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3 # dup2(3, 0) = 0 # close(3) = 0 # utimensat(0, NULL, NULL, 0) = 0 # close(0) = 0 [1] : # pwd /sys/devices/pci0000:00/0000:00:1a.7/usb1/1-3 # cat idProduct idVendor manufacturer product 0248 050d BELKIN USB 2 Flash Media Device
Touch intentionally avoids blocking or testing things like media state (note its use of O_NONBLOCK)
Hi Alan, (In reply to comment #1) > Touch intentionally avoids blocking or testing things like media state (note > its use of O_NONBLOCK) I'm not sure what you are saying here as comment 0 shows that it is the touch call (with O_NONBLOCK) that actually makes the kernel remove sdb1 while earlier calls open() calls (without O_NONBLOCK) does not trigger this (but makes the kernel return ENOMEDIUM). What baffles me is that in order for the kernel to return ENOMEDIUM it must know there is no medium. Yet it refuses to remove the partitions on the medium until O_NONBLOCK is used. How do you explain that not being buggy behavior? Of course we can easily work around this in user space by just essentially running 'touch /dev/sdb' when we get ENOMEDIUM but at the end of the day I still think that is a work around.
ah Ok I misunderstood the original report With O_NDELAY we go direct to the device level status Without it we go via open_for_data() which checks the drive_status and does other stuff as well so its definitely not sane unless something else is holding the partition open and this is a timing funny. Can you rebuild drivers/cdrom/cdrom.c with the ERRLOGMASK define near the top set to log everything as a first step, then see what that shows up.
(In reply to comment #3) > Can you rebuild drivers/cdrom/cdrom.c with the ERRLOGMASK define near the top > set to log everything as a first step, then see what that shows up. Sure, but the bug only happens with a USB 4in1 card reader driven by usb-storage so maybe I need to tweak another define than in cdrom.c?
The sd case is a bit trickier but if you stick a few printk() calls into sd.c : sd_open/sd_close and sd_revalidate_disk so you can see the events occuring maybe that will produce a clue.
I just stumbled over this as well. We currently try to remove the polling in udisks to make use of the new in-kernel media polling facility (https://lkml.org/lkml/2010/12/8/299). My SD card reader (which is actually a Sony PRS e-book reader) behaves exactly like David's. When I eject the card from the reader, no uevents happen until I poke the drive with an open(): $ sudo head -c 1 /dev/sdd head: cannot open `/dev/sdd' for reading: No medium found KERNEL[1302016466.795990] change /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.5/1-1.5.3/1-1.5.3.3/1-1.5.3.3:1.0/host19/target19:0:0/19:0:0:2/block/sdd (block) UDEV_LOG=3 ACTION=change DISK_MEDIA_CHANGE=1 Note there is no removal event for /dev/sdd1, and the device is still around: $ ls /dev/sdd* /dev/sdd /dev/sdd1 * The current udisks poller works around this by doing the poke again with O_NONBLOCK; when doing this, sdd1 actually goes away: $ sudo python -c 'import os; os.open("/dev/sdd", os.O_RDONLY|os.O_NONBLOCK)' KERNEL[1302017728.645882] remove /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.5/1-1.5.3/1-1.5.3.3/1-1.5.3.3:1.0/host21/target21:0:0/21:0:0:2/block/sdd/sdd1 (block) ACTION=remove DEVTYPE=partition This behaves exactly the same when enabling the in-kernel polling in /sys/block/sdd/events_poll_msecs; when doing that, the "head" command (which is by and large doing open() without O_NONBLOCK here) does not need to be run, but the last python O_NONBLOCK poke is still required. Could the in-kernel poller do the equivalent of this when a DISK_MEDIA_CHANGE uevent is generated, i. e. take into account that partitions might have been removed/added? I. e. the equivalent of what opening the block device with O_NONBLOCK does? If this could be fixed in the in-kernel poller, we could remove the userspace workaround to open(O_NONBLOCK) the device on a DISK_MEDIA_CHANGE event.
Tejun, the issue with the in-kernel polling and media re-validation seems like an old bug -- would be still nice to fix that, so userspace does not need to work around that weird behavior. Any ideas?
Created attachment 53632 [details] cdrom_open-always-check-events.patch Can you please verify the attached patch fixes the problem? Thanks.
Created attachment 53652 [details] blkdev-get-rescan.patch For some reason I thought this report was about cdrom. We actually have two layers of problems here. For cdrom, the event check itself doesn't happen if medimum is not there. For other devices, that part happens but partition rescan doesn't. This patch should fix the latter. The patch is only compile tested. Please test whether it fixes the problem. Thanks.
On Wed, Apr 6, 2011 at 13:48, <bugzilla-daemon@bugzilla.kernel.org> wrote: > --- Comment #9 from Tejun Heo <tj@kernel.org> 2011-04-06 11:48:53 --- > Created an attachment (id=53652) > --> (https://bugzilla.kernel.org/attachment.cgi?id=53652) > blkdev-get-rescan.patch > > For some reason I thought this report was about cdrom. We actually have two > layers of problems here. For cdrom, the event check itself doesn't happen if > medimum is not there. For other devices, that part happens but partition > rescan doesn't. This patch should fix the latter. The patch is only compile > tested. Please test whether it fixes the problem. That fixes the issue for me, and works without weird added additional open(O_NONBLOCK) workarounds. Thanks!
On Wed, Apr 6, 2011 at 14:10, <bugzilla-daemon@bugzilla.kernel.org> wrote: > That fixes the issue for me, and works without weird added additional > open(O_NONBLOCK) workarounds. Feel free to add: Tested-By: Kay Sievers <kay.sievers@vrfy.org> to all 3 patches. I'm running them all here with the in-development version of udisks, that switched completely from userspace to kernel polling.
A patch referencing this bug report has been merged in v2.6.39: commit 02e352287a40bd456eb78df705bf888bc3161d3f Author: Tejun Heo <tj@kernel.org> Date: Fri Apr 29 10:15:20 2011 +0200 block: rescan partitions on invalidated devices on -ENOMEDIA too
So I close this bug as fixed assuming it fixes this also for David, the original reporter.
A patch referencing this bug report has been merged in v3.0-rc1: commit 1196f8b814f32cd04df334abf47648c2a9fd8324 Author: Tejun Heo <tj@kernel.org> Date: Thu Apr 21 20:54:45 2011 +0200 block: rescan partitions on invalidated devices on -ENOMEDIA too