Latest working kernel version: unknown Earliest failing kernel version: unknown Distribution: Gentoo Hardware Environment: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz Software Environment: GCC 4.3.1 GLIBC 2.8 Problem Description: ksoftirqd loads one cpu at 8% even when the machine is idle only when dynticks is enabled. PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 10 root 15 -5 0 0 0 S 8 0.0 1:44.89 ksoftirqd/3 14:25:53 up 22 min, 2 users, load average: 0.04, 0.06, 0.01 Although this is a grsecurity patched kernel, I have verified this behaviour with a vanilla kernel. Witout dynticks: quad zakalwe # opreport -l /usr/src/linux-2.6.26.3-grsec/vmlinux CPU: Core 2, speed 1603 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 1201500 samples % symbol name 162 20.3774 mwait_idle 76 9.5597 apic_timer_interrupt 74 9.3082 native_read_tsc 34 4.2767 slab_pad_check 28 3.5220 getnstimeofday 21 2.6415 check_bytes_and_report 17 2.1384 native_sched_clock 16 2.0126 __update_sched_clock 16 2.0126 run_timer_softirq 14 1.7610 __do_softirq 14 1.7610 tick_sched_timer 11 1.3836 find_busiest_group 10 1.2579 scheduler_tick 9 1.1321 hrtimer_interrupt 9 1.1321 ktime_get_ts 9 1.1321 page_fault 9 1.1321 smp_apic_timer_interrupt 8 1.0063 sched_clock_tick 7 0.8805 _local_bh_enable 7 0.8805 do_softirq 7 0.8805 raise_softirq 7 0.8805 read_tsc 6 0.7547 hrtimer_run_pending 6 0.7547 sched_clock_cpu 6 0.7547 update_wall_time 5 0.6289 hrtimer_run_queues 5 0.6289 init_object 5 0.6289 notifier_call_chain 5 0.6289 on_freelist 5 0.6289 profile_tick 4 0.5031 __run_hrtimer 4 0.5031 _spin_lock 4 0.5031 clear_page_c 4 0.5031 find_lock_page 4 0.5031 hrtimer_forward 4 0.5031 in_lock_functions 4 0.5031 ktime_get 4 0.5031 lapic_next_event 4 0.5031 list_del 4 0.5031 restore_args 4 0.5031 update_process_times 3 0.3774 __round_jiffies 3 0.3774 account_system_time_scaled 3 0.3774 check_slab 3 0.3774 clockevents_program_event 3 0.3774 irq_enter 3 0.3774 irq_exit 3 0.3774 mce_idle_callback 3 0.3774 rb_next 3 0.3774 rcu_pending 3 0.3774 run_rebalance_domains 3 0.3774 schedule 3 0.3774 unmap_vmas 3 0.3774 update_vsyscall 2 0.2516 __first_cpu 2 0.2516 __link_path_walk 2 0.2516 _atomic_dec_and_lock 2 0.2516 copy_page_range 2 0.2516 copy_user_generic_string 2 0.2516 cpu_idle 2 0.2516 do_page_fault 2 0.2516 enter_idle 2 0.2516 filemap_fault 2 0.2516 find_first_bit 2 0.2516 find_next_bit 2 0.2516 get_page_from_freelist 2 0.2516 mutex_lock 2 0.2516 page_remove_rmap 2 0.2516 rb_insert_color 2 0.2516 thread_return 1 0.1258 __chk_obj_label 1 0.1258 __dec_zone_page_state 1 0.1258 __delay 1 0.1258 __dequeue_entity 1 0.1258 __dequeue_signal 1 0.1258 __find_get_block 1 0.1258 __list_add 1 0.1258 __pagevec_free 1 0.1258 __remove_hrtimer 1 0.1258 __slab_free 1 0.1258 __switch_to 1 0.1258 __up_read 1 0.1258 __wake_up_bit 1 0.1258 account_process_tick 1 0.1258 account_system_time 1 0.1258 add_event_entry 1 0.1258 any_ports_active 1 0.1258 check_object 1 0.1258 con_chars_in_buffer 1 0.1258 copy_page_c 1 0.1258 cp_new_stat 1 0.1258 deactivate_task 1 0.1258 dequeue_entity 1 0.1258 dequeue_task_fair 1 0.1258 do_dbs_timer 1 0.1258 do_mmap_pgoff 1 0.1258 do_select 1 0.1258 down_write 1 0.1258 elv_rb_find 1 0.1258 enqueue_hrtimer 1 0.1258 exit_idle 1 0.1258 ext3_get_group_desc 1 0.1258 find_vma 1 0.1258 find_vma_prev 1 0.1258 finish_task_switch 1 0.1258 flush_tlb_page 1 0.1258 fput 1 0.1258 free_hot_cold_page 1 0.1258 free_pages_and_swap_cache 1 0.1258 generic_file_aio_read 1 0.1258 handle_mm_fault 1 0.1258 ioread8 1 0.1258 kfree 1 0.1258 locks_remove_posix 1 0.1258 lookup_mnt 1 0.1258 memory_open 1 0.1258 mousedev_event 1 0.1258 mousedev_notify_readers 1 0.1258 next_zones_zonelist 1 0.1258 page_add_file_rmap 1 0.1258 page_waitqueue 1 0.1258 prepend 1 0.1258 radix_tree_lookup 1 0.1258 rb_erase 1 0.1258 read_inode_bitmap 1 0.1258 release_pages 1 0.1258 ret_from_intr 1 0.1258 run_posix_cpu_timers 1 0.1258 run_workqueue 1 0.1258 set_normalized_timespec 1 0.1258 skb_release_data 1 0.1258 sock_alloc_send_skb 1 0.1258 task_rq_lock 1 0.1258 task_tick_idle 1 0.1258 timer_stats_update_stats 1 0.1258 uhci_check_ports 1 0.1258 uhci_hub_status_data 1 0.1258 uhci_scan_schedule 1 0.1258 update_curr 1 0.1258 usecs_to_jiffies 1 0.1258 vfs_getattr 1 0.1258 vma_adjust With Dynticks quad zakalwe # opreport -l /usr/src/linux-2.6.26.3-grsec/vmlinux CPU: Core 2, speed 1603 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 1201500 samples % symbol name 3668 16.9017 __switch_to 2028 9.3448 schedule 1996 9.1973 native_read_tsc 1563 7.2021 find_busiest_group 767 3.5342 finish_task_switch 723 3.3315 native_sched_clock 709 3.2670 thread_return 662 3.0504 sched_clock_cpu 472 2.1749 __update_sched_clock 447 2.0597 tick_nohz_stop_sched_tick 409 1.8846 hrtick_set 401 1.8478 hrtick_start_fair 386 1.7786 read_tsc 332 1.5298 ksoftirqd 318 1.4653 __do_softirq 295 1.3593 task_rq_lock 265 1.2211 __resched_task 253 1.1658 enqueue_entity 250 1.1520 getnstimeofday 238 1.0967 do_softirq 231 1.0644 find_first_bit 222 1.0229 pick_next_task_fair 220 1.0137 find_next_bit 220 1.0137 get_next_timer_interrupt 217 0.9999 tick_nohz_restart_sched_tick 212 0.9769 update_curr 211 0.9723 select_task_rq_fair 210 0.9677 set_next_entity 178 0.8202 __next_cpu 169 0.7787 try_to_wake_up 166 0.7649 __first_cpu 161 0.7419 rb_insert_color 144 0.6635 dequeue_entity 144 0.6635 rb_erase 136 0.6267 __enqueue_entity 130 0.5990 put_prev_task_fair 115 0.5299 hrtimer_run_pending 110 0.5069 cpu_idle 99 0.4562 enqueue_task_fair 99 0.4562 run_timer_softirq 97 0.4470 deactivate_task 96 0.4424 ktime_get 96 0.4424 rb_next 92 0.4239 _spin_unlock_irqrestore 90 0.4147 __dequeue_entity 87 0.4009 tick_nohz_stop_idle 86 0.3963 call_softirq 84 0.3871 _local_bh_enable 83 0.3825 __list_add 82 0.3778 kthread_should_stop 80 0.3686 place_entity 77 0.3548 activate_task 74 0.3410 enqueue_task 68 0.3133 dequeue_task_fair 68 0.3133 rcu_pending 67 0.3087 ktime_get_ts 65 0.2995 set_normalized_timespec 55 0.2534 msecs_to_jiffies 51 0.2350 _cond_resched 49 0.2258 pick_next_task_rt 46 0.2120 raise_softirq_irqoff 44 0.2027 rcu_needs_cpu 40 0.1843 slab_pad_check 39 0.1797 check_preempt_curr_idle 37 0.1705 wake_up_process 34 0.1567 pick_next_task_idle 33 0.1521 put_prev_task_idle 32 0.1475 list_add 22 0.1014 check_bytes_and_report 22 0.1014 mwait_idle 18 0.0829 apic_timer_interrupt 9 0.0415 copy_user_generic_string 8 0.0369 init_object 7 0.0323 on_freelist 6 0.0276 page_fault 5 0.0230 __slab_alloc 5 0.0230 add_event_entry 5 0.0230 smp_apic_timer_interrupt 5 0.0230 sync_buffer 5 0.0230 update_wall_time 4 0.0184 __slab_free 4 0.0184 kmem_cache_free 4 0.0184 sched_clock_tick 4 0.0184 scheduler_tick 3 0.0138 __run_hrtimer 3 0.0138 account_system_time 3 0.0138 clear_page_c 3 0.0138 flush_tlb_page 3 0.0138 irq_enter 3 0.0138 raise_softirq 3 0.0138 select_nohz_load_balancer 3 0.0138 unix_poll 2 0.0092 __d_lookup 2 0.0092 __find_get_block 2 0.0092 __journal_temp_unlink_buffer 2 0.0092 __wake_up 2 0.0092 _spin_lock 2 0.0092 br_hello_timer_expired 2 0.0092 check_object 2 0.0092 check_slab 2 0.0092 clocksource_get_next 2 0.0092 copy_page_c 2 0.0092 do_page_fault 2 0.0092 find_lock_page 2 0.0092 idle_cpu 2 0.0092 memcmp 2 0.0092 mmap_region 2 0.0092 path_put 2 0.0092 release_pages 2 0.0092 run_rebalance_domains 2 0.0092 system_call_after_swapgs 2 0.0092 tick_nohz_update_jiffies 2 0.0092 tick_sched_timer 2 0.0092 update_process_times 2 0.0092 update_vsyscall 2 0.0092 vsnprintf 1 0.0046 __alloc_pages_internal 1 0.0046 __blk_put_request 1 0.0046 __do_fault 1 0.0046 __down_write_nested 1 0.0046 __find_get_block_slow 1 0.0046 __fput 1 0.0046 __inet_lookup_established 1 0.0046 __mod_timer 1 0.0046 __pte_alloc 1 0.0046 __round_jiffies 1 0.0046 __timer_stats_hrtimer_set_start_info 1 0.0046 __up_write 1 0.0046 __wake_up_sync 1 0.0046 add_dirent_to_buf 1 0.0046 any_ports_active 1 0.0046 br_config_bpdu_generation 1 0.0046 br_transmit_config 1 0.0046 copy_page_range 1 0.0046 do_dbs_timer 1 0.0046 do_get_write_access 1 0.0046 do_notify_resume 1 0.0046 do_wait 1 0.0046 do_wp_page 1 0.0046 dput 1 0.0046 enqueue_hrtimer 1 0.0046 exit_itimers 1 0.0046 ext3_get_blocks_handle 1 0.0046 ext3_journal_start_sb 1 0.0046 ext3_mark_iloc_dirty 1 0.0046 filemap_fault 1 0.0046 find_vma 1 0.0046 free_pgd_range 1 0.0046 get_pid_task 1 0.0046 get_slab 1 0.0046 get_user_pages 1 0.0046 handle_mm_fault 1 0.0046 hrtick_resched 1 0.0046 hrtimer_get_next_event 1 0.0046 init_timer 1 0.0046 inotify_inode_queue_event 1 0.0046 ioread8 1 0.0046 iowrite8 1 0.0046 journal_put_journal_head 1 0.0046 journal_start 1 0.0046 lock_timer_base 1 0.0046 mutex_lock 1 0.0046 notifier_call_chain 1 0.0046 page_waitqueue 1 0.0046 path_get 1 0.0046 proc_sys_lookup_table_one 1 0.0046 radix_tree_insert 1 0.0046 rebalance_domains 1 0.0046 remove_vma 1 0.0046 ret_from_intr 1 0.0046 run_posix_cpu_timers 1 0.0046 sched_balance_self 1 0.0046 scsi_sg_free 1 0.0046 strncpy_from_user 1 0.0046 task_tick_idle 1 0.0046 tick_do_update_jiffies64 1 0.0046 tick_program_event 1 0.0046 timer_stats_update_stats 1 0.0046 uhci_check_ports 1 0.0046 uhci_scan_schedule 1 0.0046 unfreeze_slab 1 0.0046 unlock_page 1 0.0046 unmap_vmas 1 0.0046 vma_prio_tree_add 1 0.0046 zone_watermark_ok I searched the lkml and it would seem that this problem is the same as: http://marc.info/?l=linux-kernel&m=119630440310744&w=2 reported late last year. I have no hardware in common with that case, but have a similar setup of bridged ethernet/tap devices. Powertop shows: Top causes for wakeups: 64.9% (251.0) ip : br_stp_enable_bridge (br_hello_timer_expired) 12.6% ( 48.8) USB device 6-2.3 : USB-PS/2 Optical Mouse (Logitech) 10.3% ( 39.7) <interrupt> : ehci_hcd:usb2, uhci_hcd:usb6 6.4% ( 24.6) aterm : schedule_timeout (process_timeout) 2.8% ( 10.9) X : do_setitimer (it_real_fn) 1.0% ( 4.0) <kernel core> : usb_hcd_poll_rh_status (rh_timer_func) 0.5% ( 2.0) <kernel core> : clocksource_register (clocksource_watchdog) 0.3% ( 1.2) fluxbox : schedule_timeout (process_timeout) 0.3% ( 1.0) ip : br_transmit_config (br_hold_timer_expired) 0.3% ( 1.0) ip : br_stp_enable_bridge (br_fdb_cleanup) 0.2% ( 0.8) <interrupt> : ata_piix, ata_piix, uhci_hcd:usb7 0.2% ( 0.6) firefox : futex_wait (hrtimer_wakeup) 0.1% ( 0.5) ip : __netdev_watchdog_up (dev_watchdog) 0.1% ( 0.2) init : schedule_timeout (process_timeout) 0.1% ( 0.2) <kernel core> : page_writeback_init (wb_timer_fn) 0.0% ( 0.1) <kernel core> : neigh_table_init_no_netlink (neigh_periodic_timer) 0.0% ( 0.1) <kernel core> : enqueue_task_rt (sched_rt_period_timer
Created attachment 17480 [details] opreport -cg from a vanilla 2.6.26.3 kernel exhibiting the problem I stopped oprofiled after a minute. I'm not that familiar with it, but I hope this is useful.
can you please provide the output of # cat /proc/timer_list; sleep 60; cat /proc/timer_list
Created attachment 17484 [details] Output from cat /proc/timer_list; sleep 60; cat /proc/timer_list
The idle sleep times of the CPUs: s2 97950027112 82652932873 98057803389 96210226007 s1 37983392768 33623740775 37827261075 36227296513 delta 59966634344 49029192098 60230542314 59982929494 % idle 100 82 100 100 So on CPU 1 we had ~20% activity during that 60 seconds. powertop tells us that the main wakeup source is br_stp_enable_bridge. Can you please check the following: # find /sys -type f -name hello_time | xargs cat
find /sys -type f -name hello_time | xargs cat 0
> ------- Comment #5 from sc.contact@gmail.com 2008-08-27 12:25 ------- > find /sys -type f -name hello_time | xargs cat > 0 Ooops. Can you please do: find /sys -type f -name hello_time and tell us the exact file name and verify that the output of "cat thatveryfilename" is 0 ? What's the CONFIG_HZ setting of your kernel ? Thanks, tglx
CONFIG_HZ=1000 cat /sys/class/net/br0/bridge/hello_time 0 I had copy and pasted the bridge setup from the gentoo qemu wiki which is brctl_br0=( "setfd 0" "sethello 0" "stp off" ) Changing sethello to 1 has now changed the value shown in sysfs to 99, and the load is normal.
> ------- Comment #7 from sc.contact@gmail.com 2008-08-27 14:47 ------- > CONFIG_HZ=1000 > > cat /sys/class/net/br0/bridge/hello_time > 0 > > I had copy and pasted the bridge setup from the gentoo qemu wiki which is > brctl_br0=( "setfd 0" "sethello 0" "stp off" ) > > Changing sethello to 1 has now changed the value shown in sysfs to 99, and > the > load is normal. Well, that's not a surprise looking at the code. hello_time is in (kernel internal ticks aka jiffies) which is 1ms in your setup (CONFIG_HZ=1000). 0 ticks means that the timer expires at within 1 ms, which is propably not what you want. Please tell the gentoo qemu wiki folks, that their config is not really a good idea. Aside of that auditing the brctl code would be not a bad idea. Can we close that bug ? Thanks, tglx
(In reply to comment #8) > Can we close that bug ? > Sure.