Most recent kernel where this bug did not occur:
Distribution: Debian Etch
Hardware Environment: PC i486, Dual Opteron
Software Environment: Samba
Problem Description:
We have a Dual Opteron server with an Adaptec 2130S (aacraid) controller, on which our 4-disk hardware RAID-5 array resides. We use only the open kernel drivers from the standard tree; we custom-compiled kernel 2.6.21.5 on top of a standard Debian Etch i486 install. The server is used as a Samba file server.
After two years of flawless operation, the controller started to beep because one of the disks began to have uncorrectable errors. The RAID-5 array fell into the "Degraded" state, although it continued working. However, strange file locking errors soon appeared.
We use a Visual FoxPro application that some 200 Windows machines run from that Samba share. The app uses dozens of form files. In some situations, the app reports the error "access to the library file denied". The file in question is usually one of the form files on the Samba share.
Moreover, I am unable even to copy or tar the application share from the server console. The process repeatedly stalls on some form file, and after several minutes (probably when someone on the network stops using the file) it "unlocks" and continues. Because of that, I assume the RAID status has some (bad) impact on file locking. I know it shouldn't, but it does. When I temporarily fixed the RAID so that it returned to "Optimal" status, the locking problems stopped. Once the RAID fell back to "Degraded" status, the problems came back.
Since this is a production server, I have just resolved the RAID problems. However, I can offer you whatever help I can to solve this odd kernel bug.
Steps to reproduce:
Is there nothing of interest in the logs?
Well, the RAID problems are back, so debugging is possible. I'll try the latest kernel. The buggy one is 2.6.21.7.
The kernel doesn't log anything to kern.log when the controller starts beeping. The startup log (dmesg) is in attachment 12690.
Created attachment 12691: syslog
Well, these Samba oplock breaks are suspicious.
However, I cannot guarantee that these oplock errors are the cause. I looked at old logs, and some oplock problems were there before, though they looked a bit different. That was Debian Sarge with an older Samba release, so the error codes and message syntax may have changed.
I don't understand what I'm seeing in your logs. How come there's a pile of ATA errors coming out when you say the problem is with the aacraid controller?
These are the subject of a separate bug, 8979, which was resolved as a problem with an old smartd version.
The kernel doesn't log anything interesting when the RAID enters the "Degraded" state.
Peter, any updates? Have you tried other kernel versions, either newer ones or the one that used to work for you? I wouldn't be surprised if the controller itself were going bad.
Well, as long as bug 9017 persists, it's practically impossible to debug this problem, because the symptoms are pretty much the same (file locking problems). Once bug 9017 is resolved, I could try removing a hard drive from the RAID and see what happens with a recent kernel, but it doesn't make sense to do so before then.