Bug 11636

Summary: dm-snapshot or kcopyd Oops
Product: IO/Storage Reporter: skysky (wen_zl)
Component: LVM2/DMAssignee: Mikulas Patocka (mpatocka)
Status: CLOSED CODE_FIX    
Severity: high CC: agk, gmazyland, huangzuben, mpatocka, roel, wen_zl, zhlqcn
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.26 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: putty.log.20080920
putty.log.20080925

Description skysky 2008-09-24 02:41:06 UTC
Latest working kernel version:
   2.6.26
Earliest failing kernel version:
   2.6.19
Distribution:
  Redhat enterprise version 5

Software Environment:
  kernel version:  2.6.26
  LVM2 version:     2.02.16-RHEL5
  device-mapper version: 1.02.13

Problem Description:

  I'm now running Redhat 5, with a few disks under md (RAID5), LVM on top of that, formatted ext3,and shared by Samba for a Windows XP client . When snapshots are in use, large file transfers (transfers of large files, or transfers of many smaller ones) from WinXP client to Samba, kernel panic. Oops is as next.Disabling snapshots eliminates the behavior.

Oops:
BUG: unable to handle kernel paging request at 00200200
IP: [<c04e6468>] list_del+0x4/0x4d
*pdpt = 0000000022173001 *pde = 0000000000000000 
Oops: 0000 [#1] SMP 
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 sbs sbshc battery ac parport_pc lp parport sg e1000 ide_cd_mod cdrom button serio_raw rtc_cmos rtc_core i2c_i801 rtc_lib i2c_core pcspkr dm_snapshot dm_zero dm_mirror dm_log dm_mod megaraid_sas sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd [last unloaded: microcode]

Pid: 2366, comm: kcopyd Not tainted (2.6.26 #1)
EIP: 0060:[<c04e6468>] EFLAGS: 00010246 CPU: 7
EIP is at list_del+0x4/0x4d
EAX: e117d4c8 EBX: e106c5d8 ECX: 00200200 EDX: e106c668
ESI: e117d4c8 EDI: f3558ac0 EBP: f300eb00 ESP: efc35f0c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process kcopyd (pid: 2366, ti=efc35000 task=f34d24c0 task.ti=efc35000)
Stack: e106c5d8 f88c7b3e 00000000 00000000 00000000 0000006e 00000370 f300eb00 
       f88c882e 02f2823a 00000000 00c0ac2d 00000000 ec03ad08 f3558ac0 ec03ad08 
       f38c28c0 f88c7c0e ec03ad08 f3589058 00000000 f88d2e16 00000000 f88c7be6 
Call Trace:
 [<f88c7b3e>] pending_complete+0x80/0x128 [dm_snapshot]
 [<f88c882e>] persistent_commit+0xdc/0xf0 [dm_snapshot]
 [<f88c7c0e>] copy_callback+0x28/0x2c [dm_snapshot]
 [<f88d2e16>] run_complete_job+0x46/0x6f [dm_mod]
 [<f88c7be6>] copy_callback+0x0/0x2c [dm_snapshot]
 [<f88d2c55>] process_jobs+0x1f/0x99 [dm_mod]
 [<f88d2dd0>] run_complete_job+0x0/0x6f [dm_mod]
 [<f88d2ccf>] do_work+0x0/0x38 [dm_mod]
 [<f88d2ce5>] do_work+0x16/0x38 [dm_mod]
 [<c042fc05>] run_workqueue+0x6b/0xe0
 [<c0430282>] worker_thread+0x0/0xbd
 [<c0430335>] worker_thread+0xb3/0xbd
 [<c043247c>] autoremove_wake_function+0x0/0x2d
 [<c04323ba>] kthread+0x38/0x5e
 [<c0432382>] kthread+0x0/0x5e
 [<c0404463>] kernel_thread_helper+0x7/0x10
 =======================
Code: e5 8d 4b 04 8b 51 04 8d 46 04 e8 65 00 00 00 8d 4b 0c 8b 51 04 8d 46 0c 5b 5e 5f e9 54 00 00 00 89 c3 eb eb 90 90 90 53 8b 48 04 <8b> 11 39 c2 74 13 52 50 68 1d 3e 69 c0 e8 43 db f3 ff 0f 0b 83 
EIP: [<c04e6468>] list_del+0x4/0x4d SS:ESP 0068:efc35f0c
---[ end trace 3e64353a09c5d9e6 ]---

Maybe bug is in dm-snapshot or kcopyd thread code segment when exception is deleted. is it right? how to fix?

thanks in advance.
Comment 1 Andrew Morton 2008-09-24 02:54:18 UTC
Reassiged as io/storage, MD.  Assigned to Alasdair, marked as a regression.
Comment 2 Mikulas Patocka 2008-09-24 12:35:42 UTC
Hi

I suspect that you have general corruption of kernel memory and the bug is not caused by snapshots --- that snapshots are simply victim of the bug (because they allocate kernel memory heavily) and not cause of it.

I'd suggest these steps to track down where the corruption happens:

1) record your system configuration (drivers and processes that you are running) and find some reliable way how to reproduce the bug --- i.e. what to run and how long does it take on average until the bug appears. So that later, if the bug stops to appear, you can always go back to a configuration where it appears.

2) compile the same kernel with absolute minimum of things --- no drivers, no ACPI, nothing except what is needed for the reproduction of the bug (only disk driver, network card driver, md, dm). Don't run any other processes except samba (no Xwindow, etc). Try to reproduce the bug. If the bug doesn't happen, use bisecting to find out which driver or process caused it.

3) If the bug still happens, underclock your CPU and RAM timing and try to reproduce it again. Run memtest86. If you have spare RAM, try replacing it.

4) If the bug still happens, try some older kernel (2.6.25 and less) and try to find a kernel where the bug doesn't happen. If you find two adjacent kernels one with the bug and one without, use bisecting of patches to find out which one exactly caused the data corruption (there are automated tools on internet that can bisect patches between two kernel versions).
Comment 3 zhanglq 2008-09-25 02:31:34 UTC
Steps to Reproduce this error:
1.Create a lvm2 volume,and format this volume to ext3
2.Create one or more snapshot to this volume
3.Use samba share one directory to client. Here Samba version is 2.2.12 or 3.0.28.
4.Use Windows XP or 2K3 Client copy large amount files(maybe 5GB or more) to
this samba share.
5.After about 10 minutes transfer speed gose to zero
6.Wait serveral mintues more,then windows Client said:"Cannot copy <filename>:The specified network name is no longer available"
7.At the same time,samba server give difference error output:
  Kernel 2.6.19: see attachment  putty.log.20080920
  Kernel 2.6.26: see the begin of this bug(Description From skysky 2008-09-24 02:41  )
  Kernel 2.6.26.5: see attachment putty.log.20080925
Comment 4 zhanglq 2008-09-25 02:32:00 UTC
Created attachment 18028 [details]
putty.log.20080920
Comment 5 zhanglq 2008-09-25 02:32:48 UTC
Created attachment 18029 [details]
putty.log.20080925
Comment 6 zhanglq 2008-09-25 03:09:55 UTC
(In reply to comment #2)
> Hi
> I suspect that you have general corruption of kernel memory and the bug is
> not
> caused by snapshots --- that snapshots are simply victim of the bug (because
> they allocate kernel memory heavily) and not cause of it.
> I'd suggest these steps to track down where the corruption happens:
> 1) record your system configuration (drivers and processes that you are
> running) and find some reliable way how to reproduce the bug --- i.e. what to
> run and how long does it take on average until the bug appears. So that
> later,
> if the bug stops to appear, you can always go back to a configuration where
> it
> appears.
If we do as described in Comment #3 From zhanglq 2008-09-25 02:31:34,each time we can get that error.
> 2) compile the same kernel with absolute minimum of things --- no drivers, no
> ACPI, nothing except what is needed for the reproduction of the bug (only
> disk
> driver, network card driver, md, dm). Don't run any other processes except
> samba (no Xwindow, etc). Try to reproduce the bug. If the bug doesn't happen,
> use bisecting to find out which driver or process caused it.
> 3) If the bug still happens, underclock your CPU and RAM timing and try to
> reproduce it again. Run memtest86. If you have spare RAM, try replacing it.
> 4) If the bug still happens, try some older kernel (2.6.25 and less) and try
> to > find a kernel where the bug doesn't happen. If you find two adjacent
> kernels
> one with the bug and one without, use bisecting of patches to find out which
> one exactly caused the data corruption (there are automated tools on internet
> that can bisect patches between two kernel versions).
we will do as  you recommended,and until now the error still exist.
Comment 7 Mikulas Patocka 2008-09-29 08:14:32 UTC
Your results from 2.6.19 kernel suggests that you run out of memory. Snapshots consume a lot of memory (they need in-memory unswappable record for all reallocated chunks).

How much memory does your computer have? How much is free? Run "free" and "slabtop" commands and see the content of /proc/meminfo some time into the test and before the failure to see how much memory do snapshots consume.

But despite the memory issue, the kernel shouldn't crash. So there is some memory corruption in 2.6.26 (that wasn't present in 2.6.19) that causes crash when the kernel goes out of memory.

Please make sure that your kernels 2.6.19 and 2.6.26 use the same config options (or almost same --- except options that are different between these two kernels). Then, bisect between 2.6.19 and 2.6.26. Treat 2.6.19 behavior (OOM killer activation) as "correct" and 2.6.26 behavior (BUG or OOPS) as "bad". Try to find out kernel version and -rc version when the BUG was introduced.
Comment 8 zhanglq 2008-10-06 02:51:48 UTC
(In reply to comment #7)
> Your results from 2.6.19 kernel suggests that you run out of memory.
> Snapshots
> consume a lot of memory (they need in-memory unswappable record for all
> reallocated chunks).
> How much memory does your computer have? How much is free? Run "free" and
> "slabtop" commands and see the content of /proc/meminfo some time into the
> test
> and before the failure to see how much memory do snapshots consume.
> But despite the memory issue, the kernel shouldn't crash. So there is some
> memory corruption in 2.6.26 (that wasn't present in 2.6.19) that causes crash
> when the kernel goes out of memory.
> Please make sure that your kernels 2.6.19 and 2.6.26 use the same config
> options (or almost same --- except options that are different between these
> two
> kernels). Then, bisect between 2.6.19 and 2.6.26. Treat 2.6.19 behavior (OOM
> killer activation) as "correct" and 2.6.26 behavior (BUG or OOPS) as "bad".
> Try
> to find out kernel version and -rc version when the BUG was introduced.


Now, we change kernel to 2.6.24.7 and the bug seems to be fixed in this version,and we will use 2.6.24.7 as the final releaseof our software.

Summary:
In kernel 2.6.24 and 2.6.24.7,no such bug found.
In kernel 2.6.29、2.6.22、2.6.26.5、2.6.27,all have found this bug.

As we test kernel 2.6.24.7 for several days,we can believe in  this  version.
But we still do not know why.
Will you study it in depth?
thanks!
Comment 9 Mikulas Patocka 2008-10-08 03:56:30 UTC
It looks weird that out-of-memory condition is present in 2.6.19 and not present in 2.6.24. Snapshots remained mainly the same between these kernel versions, so it is improbable that one would consume significantly more memory than the other.

It may be possible that your test conditions were not exactly the same. To make sure, rerun the test with 2.6.19 and 2.6.24 few times and alternate between these two kernel versions, so that long-lasting changes (such as allocation of space on the disk) will affect both kernels the same way. Also, make sure that CONFIG_LBD is set to same value for both kernels, because it affects memory consumption.

It would be good if you could bisect between 2.6.24 and 2.6.26, to find out exact *-rc kernel, where the crash was introduced. This would help us to fix the crash.
Comment 10 Mikulas Patocka 2008-10-08 15:07:52 UTC
I have a question? Did you write simultaneously to the snapshot and to the origin when that "BUG: unable to handle kernel paging request at 00200200" happened? During source code review I found something that I suspect could cause this bug, but it happens only when someone simultaneously writes to both snapshot and origin.

I will make a patch soon.
Comment 11 zhanglq 2008-10-09 05:41:25 UTC
We do as  you supposed,rerun 2.6.19 and 2.6.24.7 kernel serveal times,and rerun RHEL 5.2(kernel 2.6.18) also.
Each time,We  reproduced this bug,so i think all the kernel tested since now have this bug.
So sorry for this result!

The latested description was posted by another member of us,please see the post:
http://bugzilla.kernel.org/show_bug.cgi?id=11720
In that post,we only test samba application.But today we get the same result with iSCSI Enterprise Target application,so we can be sure that this bug have no relation with any specified application.
The most suspected process was lvm2,maybe snapshot mechanism,or maybe device-mapper,should it one of these  use out system memory and cause this bug.

But we can not  see anything abnormal  from the meminfo or slabtop or the oom output.
Comment 12 Mikulas Patocka 2008-10-10 10:19:33 UTC
It looks like there are at least two bugs and we must fix them one by one.

So now, for the oopses:

Get kernel 2.6.26, where you saw the oopses. Try your testcase again to make sure that you have a workload that reproduces the oops (watch only for oopses, not memory overflow, oom-killer rage or other things). Now apply the patch at http://people.redhat.com/mpatocka/patches/kernel/dm-snapshot-fix-primary-pe-race.patch and try exactly the same workload again few times. Say, if you still see an oops.

Also, please answer the question if you were simultaneously writing to both the origin device and snapshot device (if you had both them mounted read-write), when you saw the oopses. Say: "Yes", "No" or "I don't remember".
Comment 13 zhanglq 2008-10-12 04:34:21 UTC
In  eache testcase,we only write to origin volume.
Comment 14 zhanglq 2008-10-13 02:03:23 UTC
we will test this patch,but recently our hardware platform is in use for other application. 
This will let you wait for several days.
Comment 15 skysky 2008-10-16 17:39:20 UTC
The exception-table map the copied data location on the ‘origin-real’
device to the block location on the ‘snap-cow’ device. The
‘exceptiontable’mapping helps to route read/write I/O requests coming for
the snapshot volume to the right place. But when snapshot created and copying masses amount data,The exception-table entry is leaped and consume  lots of memory. moreover,the exception-table is in low memory which size is 896M. So, soon Out of memory is happened. The exception-table will be rebuilt at boot. 

is it right? How and when to resolve it?

   thanks in advance.
Comment 16 Mikulas Patocka 2008-10-16 19:59:21 UTC
skysky: you are right. Kernel 2.6.25 introduced exception table compression in memory, basically it can compress consecutive runs of relocations. The compression is enabled only if you enable Large Block Device (CONFIG_LBD) --- so you can try this. Note that it doesn't guarrantee low memory usage, if your filesystem becomes too fragmented and writes will will be non-consecutive, the compression will compress fewer chunks and memory usage will increase.

Another (guaranteed) method how to decrease memory usage is increasing the chunk size. You can do it with a parameter to lvcreate command.

But as I said, we must first solve the oops problem, it happened to you at least twice, so you may be able to reproduce it, please try it and then try my patch. When the oops is solved, we can look at memory consumption.
Comment 17 Benn 2008-10-30 03:02:51 UTC
(In reply to comment #3)
> Steps to Reproduce this error:
> 1.Create a lvm2 volume,and format this volume to ext3
> 2.Create one or more snapshot to this volume
> 3.Use samba share one directory to client. Here Samba version is 2.2.12 or
> 3.0.28.
> 4.Use Windows XP or 2K3 Client copy large amount files(maybe 5GB or more) to
> this samba share.
> 5.After about 10 minutes transfer speed gose to zero
> 6.Wait serveral mintues more,then windows Client said:"Cannot copy
> <filename>:The specified network name is no longer available"
> 7.At the same time,samba server give difference error output:
>   Kernel 2.6.19: see attachment  putty.log.20080920
>   Kernel 2.6.26: see the begin of this bug(Description From skysky 2008-09-24
> 02:41  )
>   Kernel 2.6.26.5: see attachment putty.log.20080925
> 

  The following line from the attachment suggests that your linux kernel has no 
enough memory. This will happen even if your linux box has plenty of physical
memory on x86_32 bit with 1G/3G spilt. e.g. having a 24+0 md raid5 with stripe_cache_size
8092, will comsumes about $((8192*4096*24))=805MB, and this will leave no more than
100MB address space for kernel. When kernel become lack of address space, it will kill
random processes.

Normal: 1*4kB 1*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 11*4096kB = 48156kB


Above is my assumption, hope will help :-)
Comment 18 Roel Broersma 2008-12-09 08:44:54 UTC
Does anyone have an update about this bug ?
(i'm using OpenFiler 2.3 and use a 2.6.26 kernel. I'm having exactly the same problems. First we thought about the Intel drivers (with a lot of load) but it finally seems to be writing to a volume (which is snapshotted) in combination with Samba or iscsi_trgt. I'm didn't got the errors with a 2.6.24 kernel.  Got exactly the same traces and Oops.. )

Our whole community is waiting for a update/fix ;)
(https://project.openfiler.com/tracker/ticket/840, check the attachments/testcases also).

I have a dedicated test system and can/will help!
Comment 19 Roel Broersma 2008-12-14 07:25:39 UTC
We have appliad the fix in (OpenFiler) Kernel: 2.6.26.8-1.0.7  AND IT WORKS !
I've load tested for several days and it still works!  (disabling snapshots also seem to 'fix' the issue)

(for more info, see ticket: http://bugzilla.kernel.org/show_bug.cgi?id=11720 )
The creator of the fix (Mikulas Patocka) says there is also a lot (of this stuff) solved in the 2.6.28RC8 kernel.  Thanks!
Comment 20 Alasdair G Kergon 2009-10-18 18:36:51 UTC
No activity since December - can we mark this 'resolved'?
Comment 21 Roel Broersma 2009-10-19 13:59:37 UTC
(In reply to comment #20)
> No activity since December - can we mark this 'resolved'?

Ask 'Mikulas Patocka',  he made some patches, i;m not sure if they are incorporated in the latest kernel versions or not.. ?
Comment 22 Mikulas Patocka 2009-10-26 11:39:49 UTC
There is some unfinished report of corruption on iScsi (whent he user went quiet), other that that there are no known problems.