Bug 9017
Summary: | Strange file locking problems | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Peter (tuharsky) |
Component: | SCSI | Assignee: | io_scsi |
Status: | REJECTED INSUFFICIENT_DATA | ||
Severity: | normal | CC: | akpm, protasnb, yyyeer.bo |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.22.6 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
Kern.log
syslog (gzipped) |
Description
Peter
2007-09-14 01:02:09 UTC
The filesystem is ext3, of course. We have about 200 Windows XP clients on the LAN; most of them open documents from the fileserver. There is also an app that some 100 clients open from the fileserver too.
Can we see the kernel logs please?
Created attachment 12825 [details]
Kern.log
The Sep 7 entries are from the last reboot before the failure. I don't have any logs after that; the next available records are from Sep 10. The problems started on Sep 10 around noon (when the backup runs), but nothing is visible here. There are no more entries before the next restart, which restored normal function.
Created attachment 12826 [details]
syslog (gzipped)
Syslog. As You can see, at 11:30 the backup job (and soon the problems) started. At 13:15, I rebooted the machine since other attempts failed.
I filtered out CUPS and sensord messages, since I don't think they are relevant.
Peter, is the problem still there? Have you tested with newer kernels since then? Thanks.
Well, since this is a production system, we had to work around it. You know, this problem is not easily reproducible, but it is clearly based on file locking. Since Samba passes all locking to the kernel, and we haven't touched Samba but have changed the kernel a few times, the kernel is suspect #1. After the problems, I downgraded the kernel to version 2.6.21.7, and the problems mostly disappeared. However, we have met them again, although we feel it has been less frequent than with 2.6.22.6. So I downgraded again to 2.6.19.2, and it seems that the problem is completely gone. So you can assume that the bug appeared AFTER 2.6.19.2.
I can send you the kernel configurations; they are slightly different between the kernel versions (when I need a new function, I compile it with the newer kernel version). Unfortunately, I'm afraid that applying the config from 2.6.19.2 to the new version would not be completely equivalent, since many parameters have been added to the kernel in newer versions.
I cannot promise how soon I could test the new kernel again. I'll try. But will it make a difference? Does anybody actually work on the problem? You know, I'm already tired of those responses "test with newer kernel" years after a report; then I test again, then nothing for half a year, and then again "still a problem with newer kernel?". In production, it's quite a luxury to do such tests with no real chance of progress.
So it appears that this was a regression from 2.6.19 and worsened from then on. The best approach would be a git bisect search, but that is quite difficult for you in a production environment.
The reason for testing higher releases is to test patches once they have been developed and, for unsolved bugs, to verify whether the bug is still triggered with all the surrounding changes, or whether it has mutated and creates different problems. That is understandably difficult to do when you are dealing with a production system. Sometimes it is possible to set up a parallel debug system running your application and model the problem there, but this is definitely a luxury too; it depends on whether you have the resources for that.
Well, at the end of December I booted the 2.6.23.11 kernel, and since then the application has frozen at least 5 times due to exclusively locked files. I booted back into the old kernel on 16.1.2008 and we'll see what happens. However, it seems that 2.6.23.11 really IS making the problem worse.
Peter, since you are having freezes repeatedly, you can collect some information on what processes are wedging your system without risk to your production environment. If you run a script that periodically collects output from vmstat, /proc/meminfo, and "echo t > /proc/sysrq-trigger", then you'll hopefully have a set of data from close to or at the time of the freeze.
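For what it's worth, here is a rough sketch of the kind of collection script suggested above. It is not anything that was actually run in this report. The function names, the snapshot file layout, the 60-second interval and the output directory `/var/tmp/freeze-snapshots` are all made up for illustration. It assumes `vmstat` is installed and that the script runs as root, since the task dump is requested by writing `t` to the SysRq trigger file (`/proc/sysrq-trigger`) and the dump itself ends up in the kernel log, not in the snapshot file.

```python
import subprocess
import time
from datetime import datetime, timezone
from pathlib import Path


def read_text(path):
    """Return the contents of a file, or a short note if it cannot be read."""
    try:
        return Path(path).read_text()
    except OSError as exc:
        return f"(could not read {path}: {exc})\n"


def run_vmstat():
    """Return vmstat output (two samples, 1 s apart), or a note on failure."""
    try:
        result = subprocess.run(
            ["vmstat", "1", "2"], capture_output=True, text=True, timeout=30
        )
        return result.stdout
    except (OSError, subprocess.SubprocessError) as exc:
        return f"(vmstat failed: {exc})\n"


def trigger_task_dump(sysrq_path):
    """Write 't' to the SysRq trigger; the task list goes to the kernel log."""
    try:
        Path(sysrq_path).write_text("t")
        return "task dump requested; see the kernel log\n"
    except OSError as exc:
        return f"(could not write {sysrq_path}: {exc})\n"


def collect_snapshot(out_dir, meminfo_path="/proc/meminfo",
                     sysrq_path="/proc/sysrq-trigger"):
    """Write one timestamped snapshot file and return its path.

    Pass sysrq_path=None to skip the task dump (e.g. when not root).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S.%fZ")
    sections = [
        ("vmstat", run_vmstat()),
        ("meminfo", read_text(meminfo_path)),
    ]
    if sysrq_path is not None:
        sections.append(("sysrq-t", trigger_task_dump(sysrq_path)))
    out_file = out_dir / f"snapshot-{stamp}.txt"
    out_file.write_text(
        "".join(f"== {name} ==\n{body}\n" for name, body in sections)
    )
    return out_file


def run_forever(interval_seconds=60, out_dir="/var/tmp/freeze-snapshots"):
    """Collect a snapshot every interval_seconds until interrupted."""
    while True:
        collect_snapshot(out_dir)
        time.sleep(interval_seconds)
```

To use it, call `run_forever()` from a root shell or an init script and leave it running. After the next freeze, the last few snapshot files plus the matching kernel log entries should show the state shortly before things wedged. A full task dump every minute is quite noisy in the kernel log, so a longer interval may be more practical on a busy server.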