Bug 36072
Summary: | celestia causes kernel oops when allocating a lot of memory (for textures) | ||
---|---|---|---|
Product: | Drivers | Reporter: | aceman (acelists) |
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
Status: | RESOLVED OBSOLETE | ||
Severity: | high | CC: | alan, alexdeucher, thellstrom |
Priority: | P1 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 3.5.3 | Subsystem: | |
Regression: | Yes | Bisected commit-id: |
Description
aceman
2011-05-28 10:52:23 UTC
I am also using the transparent hugepages feature, which is set to 'always'. This is getting worse. With kernel 3.0.3 and Mesa 7.11 I get the crash several seconds after starting celestia, every time.

Aug 30 00:01:03 coolbox kernel: [drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
Aug 30 00:01:03 coolbox kernel: vmap allocation for size 178982912 failed: use vmalloc=<size> to increase size.
Aug 30 00:01:03 coolbox kernel: BUG: unable to handle kernel paging request at 9adc8491
Aug 30 00:01:03 coolbox kernel: IP: [<f95daad0>] ttm_bo_move_ttm+0xa0/0xa0 [ttm]
Aug 30 00:01:03 coolbox kernel: *pde = 00000000
Aug 30 00:01:03 coolbox kernel: Oops: 0000 [#1] PREEMPT SMP
Aug 30 00:01:03 coolbox kernel: Modules linked in: fbcon font bitblit softcursor radeon ttm drm_kms_helper drm autofs4 agpgart fb fbdev cfbcopyarea cfbimgblt cfbfillrect nf_conntrack_ftp xt_tcpudp xt_owner xt_multiport nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT ipt_LOG iptable_filter ip_tables x_tables asus_atk0110 snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss cpufreq_conservative cpufreq_ondemand psmouse pcspkr cx88_blackbird firmware_class cx2341x cx8802 tuner_simple tuner_types tda9887 tda8290 tea5767 tuner cx8800 cx88xx rc_core i2c_algo_bit tveeprom v4l2_common videodev btcx_risc videobuf_dma_sg videobuf_core forcedeth snd_hda_codec_via snd_hda_intel snd_hda_codec snd_pcm snd_timer snd soundcore snd_page_alloc i2c_nforce2 i2c_core usbhid powernow_k8 processor mperf ohci_hcd ehci_hcd usbcore fuse
Aug 30 00:01:03 coolbox kernel:
Aug 30 00:01:03 coolbox kernel: Pid: 5484, comm: celestia Not tainted 2.6.40.3 #2 System manufacturer System Product Name/M2N68
Aug 30 00:01:03 coolbox kernel: EIP: 0060:[<f95daad0>] EFLAGS: 00010246 CPU: 1
Aug 30 00:01:03 coolbox kernel: EIP is at ttm_mem_io_lock+0x0/0x20 [ttm]
Aug 30 00:01:03 coolbox kernel: EAX: 9adc8454 EBX: f565dc1c ECX: 00000000 EDX: 00000000
Aug 30 00:01:03 coolbox kernel: ESI: e04f8440 EDI: 9adc8454 EBP: f565dd3c ESP: f565dbe4
Aug 30 00:01:03 coolbox kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Aug 30 00:01:03 coolbox kernel: Process celestia (pid: 5484, ti=f565c000 task=e06347b0 task.ti=f565c000)
Aug 30 00:01:03 coolbox kernel: Stack:
Aug 30 00:01:03 coolbox kernel: f95dafa5 f565dc1c f565dd74 f565dc8c f95db162 dfdd402c fffffff4 00040002
Aug 30 00:01:03 coolbox kernel: e04f8440 f95d9c96 dfdd4064 dfec1640 00040001 00000002 00000002 f565dcc4
Aug 30 00:01:03 coolbox kernel: e04f8490 e04f8440 00000000 4bee9000 f5e32860 c0107e69 e04f8440 f565dc8c
Aug 30 00:01:03 coolbox kernel: Call Trace:
Aug 30 00:01:03 coolbox kernel: [<f95dafa5>] ? ttm_mem_reg_iounmap+0x35/0x70 [ttm]
Aug 30 00:01:03 coolbox kernel: [<f95db162>] ? ttm_bo_move_memcpy+0x182/0x310 [ttm]
Aug 30 00:01:03 coolbox kernel: [<f95d9c96>] ? ttm_bo_mem_space+0x306/0x3a0 [ttm]
Aug 30 00:01:03 coolbox kernel: [<c0107e69>] ? nommu_map_page+0x39/0x70
Aug 30 00:01:03 coolbox kernel: [<f973b030>] ? radeon_bo_move+0xe0/0x330 [radeon]
Aug 30 00:01:03 coolbox kernel: [<f95d8555>] ? ttm_bo_reserve_locked+0xa5/0x120 [ttm]
Aug 30 00:01:03 coolbox kernel: [<f95d8bff>] ? ttm_bo_unreserve+0x1f/0x30 [ttm]
Aug 30 00:01:03 coolbox kernel: [<f973af50>] ? radeon_move_blit.clone.2+0x1f0/0x1f0 [radeon]
Aug 30 00:01:03 coolbox kernel: [<f95d8e95>] ? ttm_bo_handle_move_mem+0x135/0x340 [ttm]
Aug 30 00:01:03 coolbox kernel: [<f95d9e5c>] ? ttm_bo_move_buffer+0x12c/0x140 [ttm]
Aug 30 00:01:03 coolbox kernel: [<f95d9f06>] ? ttm_bo_validate+0x96/0x120 [ttm]
Aug 30 00:01:03 coolbox kernel: [<f973bfd1>] ? radeon_bo_list_validate+0x71/0xc0 [radeon]
Aug 30 00:01:03 coolbox kernel: [<f97545e2>] ? radeon_cs_ioctl+0x82/0x1a0 [radeon]
Aug 30 00:01:03 coolbox kernel: [<c0206466>] ? print_block+0x376/0x510
Aug 30 00:01:03 coolbox kernel: [<f95efbaf>] ? drm_ioctl+0x18f/0x390 [drm]
Aug 30 00:01:03 coolbox kernel: [<c0206466>] ? print_block+0x376/0x510
Aug 30 00:01:03 coolbox kernel: [<f9754560>] ? radeon_cs_finish_pages+0xa0/0xa0 [radeon]
Aug 30 00:01:03 coolbox kernel: [<c014e373>] ? sched_clock_local+0xc3/0x1b0
Aug 30 00:01:03 coolbox kernel: [<c010fd5a>] ? x86_pmu_enable+0x1da/0x250
Aug 30 00:01:03 coolbox kernel: [<c0173f2d>] ? perf_event_task_tick+0xbd/0x220
Aug 30 00:01:03 coolbox kernel: [<f95efa20>] ? drm_version+0x90/0x90 [drm]
Aug 30 00:01:03 coolbox kernel: [<c01b7228>] ? do_vfs_ioctl+0x88/0x5e0
Aug 30 00:01:03 coolbox kernel: [<c03a6b69>] ? schedule+0x1d9/0x680
Aug 30 00:01:03 coolbox kernel: [<c014c86d>] ? hrtimer_interrupt+0x15d/0x270
Aug 30 00:01:03 coolbox kernel: [<c0150050>] ? getnstimeofday+0x40/0xe0
Aug 30 00:01:03 coolbox kernel: [<c01b77bd>] ? sys_ioctl+0x3d/0x70
Aug 30 00:01:03 coolbox kernel: [<c0206466>] ? print_block+0x376/0x510
Aug 30 00:01:03 coolbox kernel: [<c03a8e21>] ? syscall_call+0x7/0xb
Aug 30 00:01:03 coolbox kernel: [<c0206466>] ? print_block+0x376/0x510
Aug 30 00:01:03 coolbox kernel: [<c0206466>] ? print_block+0x376/0x510
Aug 30 00:01:03 coolbox kernel: Code: 00 00 00 66 31 c0 83 c8 01 89 47 50 eb 9f 90 8d 74 26 00 89 da 89 f0 e8 cf ca ff ff 85 c0 74 a4 89 c5 eb b2 8d b4 26 00 00 00 00
Aug 30 00:01:03 coolbox kernel: EIP: [<f95daad0>] ttm_mem_io_lock+0x0/0x20 [ttm] SS:ESP 0068:f565dbe4
Aug 30 00:01:03 coolbox kernel: CR2: 000000009adc8491
Aug 30 00:01:03 coolbox kernel: ---[ end trace fed47f0f5bccf5c3 ]---

Please attach the full dmesg.

AFAICT ttm_bo_move_memcpy() uses old_copy uninitialized in some error paths. Thomas?

I think this is the full relevant part of dmesg. Actually, it is the content of /var/log/syslog after the machine was rebooted (with Alt-SysRq-R). Should I capture something else?

(In comment 2 the kernel version is displayed as 2.6.40.3, but that is just a renamed, custom-compiled 3.0.3.)

(In reply to comment #4)
> Should I capture something else?

Yes, the agp/drm/radeon initialization messages.
OK, this is from /var/log/messages:

Aug 29 19:04:49 coolbox kernel: Linux agpgart interface v0.103
Aug 29 19:04:49 coolbox kernel: [drm] Initialized drm 1.1.0 20060810
Aug 29 19:04:49 coolbox kernel: [drm] radeon kernel modesetting enabled.
Aug 29 19:04:49 coolbox kernel: radeon 0000:02:00.0: PCI INT A -> Link[LNEB] -> GSI 16 (level, low) -> IRQ 16
Aug 29 19:04:49 coolbox kernel: [drm] initializing kernel modesetting (RV710 0x1002:0x954F).
Aug 29 19:04:49 coolbox kernel: [drm] register mmio base: 0xDFFF0000
Aug 29 19:04:49 coolbox kernel: [drm] register mmio size: 65536
Aug 29 19:04:49 coolbox kernel: ATOM BIOS: 954F.11.12.0.2.AS01
Aug 29 19:04:49 coolbox kernel: radeon 0000:02:00.0: VRAM: 512M 0x0000000000000000 - 0x000000001FFFFFFF (512M used)
Aug 29 19:04:49 coolbox kernel: radeon 0000:02:00.0: GTT: 512M 0x0000000020000000 - 0x000000003FFFFFFF
Aug 29 19:04:49 coolbox kernel: [drm] Detected VRAM RAM=512M, BAR=256M
Aug 29 19:04:49 coolbox kernel: [drm] RAM width 64bits DDR
Aug 29 19:04:49 coolbox kernel: [TTM] Zone kernel: Available graphics memory: 443484 kiB.
Aug 29 19:04:49 coolbox kernel: [TTM] Zone highmem: Available graphics memory: 1037248 kiB.
Aug 29 19:04:49 coolbox kernel: [TTM] Initializing pool allocator.
Aug 29 19:04:49 coolbox kernel: [drm] radeon: 512M of VRAM memory ready
Aug 29 19:04:49 coolbox kernel: [drm] radeon: 512M of GTT memory ready.
Aug 29 19:04:49 coolbox kernel: [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
Aug 29 19:04:49 coolbox kernel: [drm] Driver supports precise vblank timestamp query.
Aug 29 19:04:49 coolbox kernel: radeon 0000:02:00.0: radeon: using MSI.
Aug 29 19:04:49 coolbox kernel: [drm] radeon: irq initialized.
Aug 29 19:04:50 coolbox kernel: [drm] GART: num cpu pages 131072, num gpu pages 131072
Aug 29 19:04:50 coolbox kernel: [drm] Loading RV710 Microcode
Aug 29 19:04:50 coolbox kernel: radeon 0000:02:00.0: WB enabled
Aug 29 19:04:50 coolbox kernel: [drm] ring test succeeded in 1 usecs
Aug 29 19:04:50 coolbox kernel: [drm] radeon: ib pool ready.
Aug 29 19:04:50 coolbox kernel: [drm] ib test succeeded in 0 usecs
Aug 29 19:04:50 coolbox kernel: [drm] Radeon Display Connectors
Aug 29 19:04:50 coolbox kernel: [drm] Connector 0:
Aug 29 19:04:50 coolbox kernel: [drm] HDMI-A
Aug 29 19:04:50 coolbox kernel: [drm] HPD1
Aug 29 19:04:50 coolbox kernel: [drm] DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 0x7e58 0x7e5c 0x7e5c
Aug 29 19:04:50 coolbox kernel: [drm] Encoders:
Aug 29 19:04:50 coolbox kernel: [drm] DFP1: INTERNAL_UNIPHY
Aug 29 19:04:50 coolbox kernel: [drm] Connector 1:
Aug 29 19:04:50 coolbox kernel: [drm] VGA
Aug 29 19:04:50 coolbox kernel: [drm] DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 0x7e4c
Aug 29 19:04:50 coolbox kernel: [drm] Encoders:
Aug 29 19:04:50 coolbox kernel: [drm] CRT2: INTERNAL_KLDSCP_DAC2
Aug 29 19:04:50 coolbox kernel: [drm] Connector 2:
Aug 29 19:04:50 coolbox kernel: [drm] DVI-I
Aug 29 19:04:50 coolbox kernel: [drm] HPD4
Aug 29 19:04:50 coolbox kernel: [drm] DDC: 0x7f10 0x7f10 0x7f14 0x7f14 0x7f18 0x7f18 0x7f1c 0x7f1c
Aug 29 19:04:50 coolbox kernel: [drm] Encoders:
Aug 29 19:04:50 coolbox kernel: [drm] CRT1: INTERNAL_KLDSCP_DAC1
Aug 29 19:04:50 coolbox kernel: [drm] DFP2: INTERNAL_UNIPHY2
Aug 29 19:04:50 coolbox kernel: [drm] Internal thermal controller without fan control
Aug 29 19:04:50 coolbox kernel: [drm] radeon: power management initialized
Aug 29 19:04:50 coolbox kernel: [drm] fb mappable at 0xC0142000
Aug 29 19:04:50 coolbox kernel: [drm] vram apper at 0xC0000000
Aug 29 19:04:50 coolbox kernel: [drm] size 9216000
Aug 29 19:04:50 coolbox kernel: [drm] fb depth is 24
Aug 29 19:04:50 coolbox kernel: [drm] pitch is 7680
Aug 29 19:04:50 coolbox kernel: fb0: radeondrmfb frame buffer device
Aug 29 19:04:50 coolbox kernel: drm: registered panic notifier
Aug 29 19:04:50 coolbox kernel: [drm] Initialized radeon 2.9.0 20080528 for 0000:02:00.0 on minor 0

If this is still seen in modern (3.2+) kernels, please re-open. Thanks.

Yes, it still happens randomly, on kernel 3.5.3, X.org 1.12, ati driver 6.99.99, Mesa git (9.0):

Aug 29 23:46:50 coolbox kernel: [drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
Aug 29 23:47:16 coolbox last message repeated 646 times
Aug 29 23:48:07 coolbox last message repeated 554 times
Aug 29 23:50:15 coolbox kernel: [drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
Aug 29 23:50:41 coolbox last message repeated 6 times
Aug 29 23:50:41 coolbox kernel: vmap allocation for size 178966528 failed: use vmalloc=<size> to increase size.
Aug 29 23:50:41 coolbox kernel: [drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
Aug 29 23:50:49 coolbox last message repeated 131 times
Aug 29 23:50:58 coolbox kernel: ------------[ cut here ]------------
Aug 29 23:50:58 coolbox kernel: kernel BUG at drivers/gpu/drm/ttm/ttm_bo.c:1167!
Aug 29 23:50:58 coolbox kernel: invalid opcode: 0000 [#1] SMP
Aug 29 23:50:58 coolbox kernel: Modules linked in: usb_storage autofs4 nf_conntrack_ftp xt_tcpudp xt_owner xt_multiport nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_LOG iptable_filter ip_tables x_tables asus_atk0110 snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss cx88_blackbird cx2341x cx8802 tuner_simple tuner_types tda9887 tda8290 tea5767 tuner cx8800 cx88xx rc_core tveeprom v4l2_common videodev videobuf_dma_sg videobuf_core btcx_risc k10temp forcedeth snd_hda_codec_via snd_hda_intel snd_hda_codec snd_pcm snd_timer snd soundcore snd_page_alloc i2c_nforce2 ext4 mbcache jbd2 crc16 usbhid powernow_k8 mperf ohci_hcd ehci_hcd usbcore usb_common fuse [last unloaded: microcode]
Aug 29 23:50:58 coolbox kernel:
Aug 29 23:50:58 coolbox kernel: Pid: 19931, comm: celestia Not tainted 2.6.45.3 #93 System manufacturer System Product Name/M2N68
Aug 29 23:50:58 coolbox kernel: EIP: 0060:[<c030a069>] EFLAGS: 00010206 CPU: 3
Aug 29 23:50:58 coolbox kernel: EIP is at ttm_bo_check_placement+0x19/0x20
Aug 29 23:50:58 coolbox kernel: EAX: e1f12c2c EBX: 00002aac ECX: 00000000 EDX: 00000100
Aug 29 23:50:58 coolbox kernel: ESI: ec13c468 EDI: ebbf71c0 EBP: 00021240 ESP: eaa33d70
Aug 29 23:50:58 coolbox kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Aug 29 23:50:58 coolbox kernel: CR0: 8005003b CR2: aa634000 CR3: 2afab000 CR4: 000007f0
Aug 29 23:50:58 coolbox kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Aug 29 23:50:58 coolbox kernel: DR6: ffff0ff0 DR7: 00000400
Aug 29 23:50:58 coolbox kernel: Process celestia (pid: 19931, ti=eaa32000 task=ec27a030 task.ti=eaa32000)
Aug 29 23:50:58 coolbox kernel: Stack:
Aug 29 23:50:58 coolbox kernel: c030b4c9 00000000 00000000 01aac000 e1f12c2c e1f12c00 fffffff4 00000001
Aug 29 23:50:58 coolbox kernel: 02aac000 c033bb07 00000000 e1f12c14 00000001 00000000 00000001 00000000
Aug 29 23:50:58 coolbox kernel: 00021240 00000000 c033b850 00000001 00000001 00000000 ec13c468 ec13cd38
Aug 29 23:50:58 coolbox kernel: Call Trace:
Aug 29 23:50:58 coolbox kernel: [<c030b4c9>] ? ttm_bo_init+0x179/0x370
Aug 29 23:50:58 coolbox kernel: [<c033bb07>] ? radeon_bo_create+0x197/0x290
Aug 29 23:50:58 coolbox kernel: [<c033b850>] ? radeon_bo_clear_va+0x80/0x80
Aug 29 23:50:58 coolbox kernel: [<c034c2bc>] ? radeon_gem_object_create+0x5c/0xf0
Aug 29 23:50:58 coolbox kernel: [<c034c696>] ? radeon_gem_create_ioctl+0x66/0xf0
Aug 29 23:50:58 coolbox kernel: [<c0275cb3>] ? _copy_from_user+0x33/0x70
Aug 29 23:50:58 coolbox kernel: [<c034c630>] ? radeon_gem_pwrite_ioctl+0x30/0x30
Aug 29 23:50:58 coolbox kernel: [<c02f349c>] ? drm_ioctl+0x36c/0x3d0
Aug 29 23:50:58 coolbox kernel: [<c01c645d>] ? sys_umount+0x37d/0x380
Aug 29 23:50:58 coolbox kernel: [<c01c645d>] ? sys_umount+0x37d/0x380
Aug 29 23:50:58 coolbox kernel: [<c034c630>] ? radeon_gem_pwrite_ioctl+0x30/0x30
Aug 29 23:50:58 coolbox kernel: [<c019420a>] ? free_pgtables+0x8a/0xb0
Aug 29 23:50:58 coolbox kernel: [<c0193d49>] ? tlb_finish_mmu+0x9/0x30
Aug 29 23:50:58 coolbox kernel: [<c02f3130>] ? drm_copy_field+0x70/0x70
Aug 29 23:50:58 coolbox kernel: [<c01bc24a>] ? do_vfs_ioctl+0x7a/0x580
Aug 29 23:50:58 coolbox kernel: [<c019ab25>] ? do_munmap+0x245/0x300
Aug 29 23:50:58 coolbox kernel: [<c01c645d>] ? sys_umount+0x37d/0x380
Aug 29 23:50:58 coolbox kernel: [<c01bc77e>] ? sys_ioctl+0x2e/0x50
Aug 29 23:50:58 coolbox kernel: [<c0486cc5>] ? syscall_call+0x7/0xb
Aug 29 23:50:58 coolbox kernel: [<c01c645d>] ? sys_umount+0x37d/0x380
Aug 29 23:50:58 coolbox kernel: [<c01c645d>] ? sys_umount+0x37d/0x380
Aug 29 23:50:58 coolbox kernel: Code: 8b 6c 24 14 83 c4 18 c3 8d 76 00 8d bc 27 00 00 00 00 8b 0a 85 c9 75 09 83 7a 04 00 75 03 31 c0 c3 8b 52 04 29 ca 39 50 44 76 f3 <0f> 0b 90 8d 74 26 00 8b 4a 14 53 6b d9 50 8b 44 18 1c a8 01 75
Aug 29 23:50:58 coolbox kernel: EIP: [<c030a069>] ttm_bo_check_placement+0x19/0x20 SS:ESP 0068:eaa33d70
Aug 29 23:50:58 coolbox kernel: ---[ end trace c7e3e649e5a39cf1 ]---

This is an out-of-memory error. Is it still an issue on a newer kernel and gfx stack?

I do not have a working celestia at this time, so I can't test it in the near future.

I no longer have that particular GPU, but I also haven't seen the problem in ages. I added the vmalloc argument to the kernel cmdline; that may also have helped. With the recent amdgpu kernel driver I am trying to run without this argument and haven't seen any problems with Celestia yet. I also have 4GB of VRAM on a Polaris11 GPU now.