I'm seeing strange behaviour with /proc/kmsg starting with commit
b56a39ac263e5b8cafedd551a49c2105e68b98c2 printk: return -EINVAL if the message len is bigger than the buf size
This has since been reverted by 6fda135c908d0f38a0167adcbd71094572e3059b and fixed differently with 116e90b23f74d303e8d607c7a7d54f60f14ab9f2 but the problem persists with current HEAD 9d4056aa9efa07015aa6da11ae2e72f22dc55c97
This is on a debian unstable userspace. Kernel logging is handled in two parts:
/bin/dd bs=1 if=/proc/kmsg of=/var/run/klogd/kmsg
which shovels /proc/kmsg into a fifo, that's read by klogd proper for producing the log files themselves:
/sbin/klogd -P /var/run/klogd/kmsg
The problem here is the dd part.
Given bs=1, it tries to do byte-size reads from /proc/kmsg. Starting at the commit referenced above, read(2) apparently immediately returns 0 instead of blocking. The dd process interprets this as EOF and exits. The klogd process on the reading end of the /var/run/klogd/kmsg fifo gets upset because the writer goes away and starts eating 100% CPU instead of, well, logging things.
If I drain /proc/kmsg into the fifo by hand (using cat(1)) and start klogd (with the dd and all) again, it strangely works.
There's an argument to be made that this is debian userspace being broken, but technically it's a kernel regression since it used to work. So I'm reporting it.
Apologies if I get the bug category wrong, I couldn't find anything appopriate for printk/logging related woes. But /proc is where I'm seeing this problem and it's a file system, sort of :)
I hacked together a simple kernel module that just printk():s a message periodically and took a look at /proc/kmsg.
<6>[23715.200075] Hello2 (ffff88004a3b6488)
in /proc/kmsg, I can /bin/dd if=/proc/kmsg bs=44 just fine, 44 being the length of the message. That will block nicely and run seemingly forever. Or until someone logs something that's not 44 bytes.
Anything below bs=44 results in 0 bytes read, ie. EOF.
If I try with "Hello" instead of "Hello2", bs=43 works and so on.
Probably best to also email this to the mailing list (and Jan Beulich/GregKH) initially.
With the new one-message-at-a-time approach, I don't think reading single bytes works this way anymore: The entire message gets consumed, and returning just a single byte of it to user space would make it impossible to obtain the rest of the message. Please also Cc firstname.lastname@example.org (whom I can't Cc here as he doesn't appear to be known to bugzilla, yet whom I would like to check whether this my understanding is correct, as he did the underlying large re-write).
This will still not make dd bs=1 work reliably. It never was. The /proc/kmsg interface might always return 0 for read, not indicating EOF, but telling try-again. This was always the case, and dd was never safe to use here. You find years old bugs for this on the net.
Before filing this I saw some reports that had issues with multiple readers, and as mentioned on lkml that would be tricky to get right on the kernel side. Or even figure out what could be correct and defined behaviour for multiple readers. So I'd imagine it's safe to ignore those.
But this is a single blocking reader. So unless the old code had cases where it would return 0 to a single blocking short read, the dd bs=1 _did_ work reliably :)
Not that I'm defending it. But multiple readers is a different case altogether.
Anyway, thanks for working on this :)
On Sun, Jul 8, 2012 at 9:42 PM, <email@example.com> wrote:
> --- Comment #5 from Jukka Ollila <firstname.lastname@example.org> 2012-07-08 19:42:44
> Before filing this I saw some reports that had issues with multiple readers,
Yss, it was always the case.
> and as mentioned on lkml that would be tricky to get right on the kernel
It's rather trivial, just quite some code to change.
> Or even figure out what could be correct and defined behaviour for multiple
> readers. So I'd imagine it's safe to ignore those.
It can't be fixed properly for multiple readers, it has kernel
internal global state, that is shared.
The new /dev/kmsg structured data/record interface supports as many
readers as wanted, and it also never keeps an kernel global state, and
never flushes the buffer. It can be read as many times as wanted.
> But this is a single blocking reader. So unless the old code had cases where
> would return 0 to a single blocking short read, the dd bs=1 _did_ work
It should be safe. But building system tools on the assumption that
nobody ever does "cat /proc/kmsg" is just crazy.
> Not that I'm defending it. But multiple readers is a different case
Yes, and no. But building such fragile with assumptions like "don't
touch it or it breaks" things is just wrong.
> Anyway, thanks for working on this :)
Sure, hope that will all work now.
Marking this as resolved.
Kay's fix (eb02dac93708f581c99858a19162af8ca2b6bfcb) is in 3.5-rc7. The dd log shovel works again.