Bug 19922

Summary: AoE error messages not rate-limited
Product: IO/Storage
Reporter: Roman Mamedov (rm+bko)
Component: Block Layer
Assignee: Jens Axboe (axboe)
Status: CLOSED CODE_FIX
Severity: normal
CC: akpm, florian
Priority: P1
Hardware: All
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:

Description Roman Mamedov 2010-10-09 06:35:23 UTC
Hello.

I had an AoE device go down overnight. While a server kept trying to write to it, the server also kept writing this message to its logs:

                printk(KERN_INFO "aoe: device %ld.%d is not up\n",
                        d->aoemajor, d->aoeminor);

(See http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=drivers/block/aoe/aoeblk.c;hb=HEAD#l208 )

The message appeared many times per second, and over several hours produced about 7.5 gigabytes of log files, filling up all free space on the root filesystem.

I think the error messages from the AoE module should be rate-limited, or identical repeated messages should be folded into one with a repeat count, as is already done for some kernel messages from other modules.
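
To illustrate the idea, here is a minimal userspace sketch of a rate limiter that also reports how many messages it dropped. It is not the kernel's implementation and not the patch for this bug; the kernel has its own helpers for this, such as printk_ratelimit(). The names struct ratelimit, ratelimit_allow() and report_device_down() are hypothetical, and time is passed in as a plain tick counter so that the behaviour is easy to follow.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical limiter: at most `burst` messages per `interval` ticks. */
struct ratelimit {
	long interval;     /* window length, in caller-defined ticks */
	int burst;         /* messages allowed per window */
	long window_start; /* tick at which the current window began */
	int printed;       /* messages emitted in the current window */
	int suppressed;    /* messages dropped in the current window */
};

/* Returns true if the caller may log a message at time `now`. */
bool ratelimit_allow(struct ratelimit *rl, long now)
{
	if (now - rl->window_start >= rl->interval) {
		/* Window expired: fold the dropped messages into one line. */
		if (rl->suppressed > 0)
			fprintf(stderr, "%d messages suppressed\n",
				rl->suppressed);
		rl->window_start = now;
		rl->printed = 0;
		rl->suppressed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	rl->suppressed++;
	return false;
}

/* Stand-in for the "device is not up" message from this report. */
void report_device_down(struct ratelimit *rl, long now, long major, int minor)
{
	if (ratelimit_allow(rl, now))
		fprintf(stderr, "aoe: device %ld.%d is not up\n", major, minor);
}
```

With interval = 5 and burst = 2, calling report_device_down() on every tick prints the message at most twice per five ticks, plus one "messages suppressed" line when a new window opens, instead of one line per failed write.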
Comment 1 Andrew Morton 2010-10-11 21:37:22 UTC
Thanks, I prepared a patch.
Comment 2 Florian Mickler 2010-10-31 18:03:59 UTC
This is 027b180d7405f2b2df25e2a8b1b796b00f3773cf in mainline.

Is this patch worth applying to 2.6.27.y / 2.6.32.y or the current stable tree?

AFAICS the message has been there since Linus' initial git import...