Bug 114301

Summary: jbd2_journal_write_revoke_records
Product: File System
Reporter: xierui1010
Component: ext4
Assignee: fs_ext4 (fs_ext4)
Status: CLOSED UNREPRODUCIBLE
Severity: normal
CC: hejiash, kernel, tytso
Priority: P1
Hardware: PPC-64
OS: Linux
Kernel Version: 2.6.32-573.7.1.el6.ppc64
Subsystem:
Regression: No
Bisected commit-id:
Attachments: backtrace.txt

Description xierui1010 2016-03-10 13:44:39 UTC
Created attachment 208631: backtrace.txt

KERNEL: /usr/lib/debug/lib/modules/2.6.32-573.7.1.el6.ppc64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 32
        DATE: Wed Mar  9 22:17:09 2016
      UPTIME: 61 days, 16:55:25
LOAD AVERAGE: 1.28, 1.27, 1.34
       TASKS: 2741
    NODENAME: pow-bss-jf23
     RELEASE: 2.6.32-573.7.1.el6.ppc64
     VERSION: #1 SMP Thu Sep 10 13:44:06 EDT 2015
     MACHINE: ppc64  (4228 Mhz)
      MEMORY: 251.8 GB
       PANIC: "Oops: Kernel access of bad area, sig: 11 [#1]" (check log for details)
         PID: 1840
     COMMAND: "jbd2/dm-13-8"
        TASK: c000003e8a130f60  [THREAD_INFO: c000003e845d4000]
         CPU: 4
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 1840   TASK: c000003e8a130f60  CPU: 4   COMMAND: "jbd2/dm-13-8"
 #0 [c000003e845d7500] .crash_kexec at c0000000000ec0c4
 #1 [c000003e845d7700] .die at c000000000031638
 #2 [c000003e845d77b0] .bad_page_fault at c000000000044bd8
 #3 [c000003e845d7830] handle_page_fault at c000000000005228
 Data Access error  [300] exception frame:
 R0:  0000000000000001    R1:  c000003e845d7b20    R2:  d0000000198de328   
 R3:  c0000019e95753d0    R4:  c000003e853bf180    R5:  0000000000000005   
 R6:  c0000019e9575440    R7:  c0000000016958c8    R8:  c000003eefec1f00   
 R9:  0000000000000010    R10: 0000000000000000    R11: c000001c751f4be0   
 R12: d0000000198cf750    R13: c000000001072f00    R14: 0000000000000000   
 R15: 0000000000000000    R16: ffffffffc03b3998    R17: 0000000000000005   
 R18: c000003e853bf180    R19: 0000000000000441    R20: c000003e966c3388   
 R21: 0000000000020000    R22: 0000000000000002    R23: 0000000000000001   
 R24: c0000019e95753d0    R25: c0000019e95753d0    R26: d0000000198d6360   
 R27: 0000000000000010    R28: c000003e88907590    R29: c000003e8a8b1e00   
 R30: d0000000198dd0d8    R31: 1fc67f012a980061   
 NIP: d0000000198c77c4    MSR: 8000000000009032    OR3: d0000000198d6370
 CTR: d0000000198c0880    LR:  d0000000198c791c    XER: 0000000000000000
 CCR: 0000000024008044    MQ:  0000000000000001    DAR: 1fc67f012a980079
 DSISR: 0000000040000000     Syscall Result: 0000000000000000

 #4 [c000003e845d7b20] .jbd2_journal_write_revoke_records at d0000000198c77c4 [jbd2]
 [Link Register ]  [c000003e845d7b20] .jbd2_journal_write_revoke_records at d0000000198c791c  (unreliable)
 #5 [c000003e845d7c10] .jbd2_journal_commit_transaction at d0000000198c39ec [jbd2]
 #6 [c000003e845d7de0] .kjournald2 at d0000000198cc51c [jbd2]
 #7 [c000003e845d7ed0] .kthread at c0000000000c17ec
 #8 [c000003e845d7f90] .kernel_thread at c000000000033c34
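
One way to read the exception frame above, offered as a rough sketch and not as a diagnosis: the faulting address (DAR) sits a small fixed offset above R31, and R31 does not look like a kernel pointer. The register values below are copied from the frame. The region check is an assumption: it supposes that kernel addresses on this ppc64 kernel start with 0xc (linear mapping) or 0xd (vmalloc and modules), as every other pointer in the trace does.

```python
# Register values copied verbatim from the exception frame.
R31 = 0x1FC67F012A980061
DAR = 0x1FC67F012A980079  # faulting data address

# Assumption: kernel pointers on this ppc64 kernel have a top nibble
# of 0xc (linear map) or 0xd (vmalloc/modules), like the other
# registers and stack addresses in the dump.
KERNEL_TOP_NIBBLES = (0xC, 0xD)


def top_nibble(addr: int) -> int:
    """Return the highest 4 bits of a 64-bit address."""
    return (addr >> 60) & 0xF


offset = DAR - R31
looks_like_kernel_pointer = top_nibble(R31) in KERNEL_TOP_NIBBLES

print(f"DAR - R31 = {offset:#x}")
print(f"R31 top nibble = {top_nibble(R31):#x}")
print(f"R31 plausible kernel pointer: {looks_like_kernel_pointer}")
```

If that assumption holds, the fault is consistent with a load through a garbage pointer held in R31 at offset 0x18. That alone does not say which structure was corrupted or by what.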
Comment 1 Christian Kujau 2016-03-13 19:35:29 UTC
Setting this to "high" won't help much :-) Was this error reported to the ext4 mailing list? What happened before this error? Was the system under load? Can you reproduce this crash? With a current kernel?
Comment 2 Theodore Tso 2016-03-13 23:52:21 UTC
This is a Red Hat Enterprise Linux kernel.   As such, bugs should be reported to Red Hat and Red Hat has the responsibility for fixing them.

If you have a reliable repro on a modern kernel (as opposed to a distribution kernel customized to within an inch of its life and released six years ago), please feel free to open a bug with the information on how to reproduce this on a modern kernel.
Comment 3 xierui1010 2016-03-14 03:30:22 UTC
Hello, the system was under load. I can't reproduce this problem.
Thank you for your help!
Comment 4 jia he 2016-03-30 09:19:26 UTC
(In reply to xierui1010 from comment #3)
> Hello, the system was under load. I can't reproduce this problem.
> Thank you for your help!

Hi xierui,
Are you using the Emulex be2net network driver?
I got a call trace similar to yours. We need to collect some common symptoms of the memory corruption.