The bug described in http://lkml.iu.edu/hypermail/linux/kernel/0703.1/2582.html, remain in. [ 112.893809] BUG: unable to handle kernel paging request at ffffffffff7695f4 [ 112.893811] IP: [<ffffffff81372301>] strlcpy+0x1/0x50 [ 112.893816] PGD 1a2d067 PUD 1a2f067 PMD 117112067 PTE 0 [ 112.893819] Oops: 0000 [#1] SMP [ 112.893822] Modules linked in: bonding(E) [last unloaded: bonding] [ 112.893826] CPU: 1 PID: 16619 Comm: cat Tainted: G E 4.5.0-rc1+ #2 [ 112.893828] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 112.893829] task: ffff8800d5e5e680 ti: ffff8800da004000 task.ti: ffff8800da004000 [ 112.893830] RIP: 0010:[<ffffffff81372301>] [<ffffffff81372301>] strlcpy+0x1/0x50 [ 112.893833] RSP: 0018:ffff8800da007d50 EFLAGS: 00010282 [ 112.893834] RAX: ffffffffa003f000 RBX: ffffffffa003e3c0 RCX: ffffffff81a71e98 [ 112.893836] RDX: 0000000000000080 RSI: ffffffffff7695f4 RDI: ffff880111030615 [ 112.893837] RBP: ffff8800da007d90 R08: 00000000fffffffe R09: 0000000000000000 [ 112.893838] R10: 0000000000000005 R11: 000000000016ab0e R12: ffff880111030615 [ 112.893839] R13: ffff880111030608 R14: 0000000000004ae8 R15: ffff880111a96b00 [ 112.893841] FS: 00007f1298d80700(0000) GS:ffff880116f00000(0000) knlGS:0000000000000000 [ 112.893842] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 112.893843] CR2: ffffffffff7695f4 CR3: 00000000daf79000 CR4: 00000000000006e0 [ 112.893847] Stack: [ 112.893848] ffff8800da007d90 ffffffff810fa401 ffff8801110306d0 ffff880111030695 [ 112.893850] ffff880111030600 ffff880111030e00 ffff880111030600 ffff8800da007f20 [ 112.893852] ffff8800da007db0 ffffffff810fac5a ffff880111030600 ffff880111030e00 [ 112.893854] Call Trace: [ 112.893859] [<ffffffff810fa401>] ? module_get_kallsym+0xd1/0x170 [ 112.893861] [<ffffffff810fac5a>] update_iter+0xfa/0x110 [ 112.893862] [<ffffffff810fac90>] s_next+0x20/0x30 [ 112.893865] [<ffffffff8121239f>] seq_read+0x1ef/0x350 [ 112.893867] [<ffffffff8125766d>] proc_reg_read+0x3d/0x70 [ 112.893869] [<ffffffff811ef258>] __vfs_read+0x28/0xd0 [ 112.893871] [<ffffffff812f38a3>] ? security_file_permission+0xa3/0xc0 [ 112.893873] [<ffffffff811ef782>] ? rw_verify_area+0x52/0xd0 [ 112.893875] [<ffffffff811ef87f>] vfs_read+0x7f/0x130 [ 112.893877] [<ffffffff811f0896>] SyS_read+0x46/0xa0 [ 112.893879] [<ffffffff8116d87d>] ? context_tracking_enter+0x1d/0x30 [ 112.893881] [<ffffffff816c55ee>] entry_SYSCALL_64_fastpath+0x12/0x71 [ 112.893882] Code: f8 48 89 e5 84 c9 75 0d eb 1d 48 83 c0 01 0f b6 08 84 c9 74 12 40 38 f1 75 f0 88 10 48 83 c0 01 0f b6 08 84 c9 75 ee 5d c3 90 55 <80> 3e 00 48 89 f9 48 89 e5 41 54 53 74 3a 48 89 f0 48 83 c0 01 [ 112.893902] RIP [<ffffffff81372301>] strlcpy+0x1/0x50 [ 112.893904] RSP <ffff8800da007d50> The test script is. Notic you should have at least 2 processors: #!/bin/bash while true; do rmmod bonding.ko insmod bonding.ko done & while true; do cat /proc/kallsyms > /dev/null done & while true; do cat /proc/kallsyms > /dev/null done & while true; do cat /proc/kallsyms > /dev/null done & while true; do cat /proc/kallsyms > /dev/null done & while true; do cat /proc/kallsyms > /dev/null done & while true; do cat /proc/kallsyms > /dev/null done & while true; do cat /proc/kallsyms > /dev/null done & while true; do cat /proc/kallsyms > /dev/null done &
This patch seems to solve the problem: diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c index 5c5987f..fff4bec 100644 --- a/kernel/kallsyms.c +++ b/kernel/kallsyms.c @@ -516,6 +516,7 @@ static void *s_next(struct seq_file *m, void *p, loff_t *pos) static void *s_start(struct seq_file *m, loff_t *pos) { + mutex_lock(&module_mutex); if (!update_iter(m->private, *pos)) return NULL; return m->private; @@ -523,6 +524,7 @@ static void *s_start(struct seq_file *m, loff_t *pos) static void s_stop(struct seq_file *m, void *p) { + mutex_unlock(&module_mutex); } static int s_show(struct seq_file *m, void *p)
bugzilla-daemon@bugzilla.kernel.org writes: > https://bugzilla.kernel.org/show_bug.cgi?id=111541 > > --- Comment #1 from Weilong Chen <chenweilong@huawei.com> --- > This patch seems to solve the problem: > diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c > index 5c5987f..fff4bec 100644 > --- a/kernel/kallsyms.c > +++ b/kernel/kallsyms.c > @@ -516,6 +516,7 @@ static void *s_next(struct seq_file *m, void *p, loff_t > *pos) > > static void *s_start(struct seq_file *m, loff_t *pos) > { > + mutex_lock(&module_mutex); > if (!update_iter(m->private, *pos)) > return NULL; > return m->private; > @@ -523,6 +524,7 @@ static void *s_start(struct seq_file *m, loff_t *pos) > > static void s_stop(struct seq_file *m, void *p) > { > + mutex_unlock(&module_mutex); > } > > static int s_show(struct seq_file *m, void *p) Unfortunately, that would also create a DoS where anyone can stop module loading. I'm testing here to see if I can reproduce, but I suspect this will help. I wonder if Masami's removal of stop_machine from the unload path uncovered this race? Thanks! Rusty. diff --git a/kernel/module.c b/kernel/module.c index 8358f4697c0c..9f2de09b6d8c 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -1992,6 +1992,9 @@ static void free_module(struct module *mod) mod->state = MODULE_STATE_UNFORMED; mutex_unlock(&module_mutex); + /* kallsyms walks list without mutex; make sure they see UNFORMED */ + synchronize_sched(); + /* Remove dynamic debug info */ ddebug_remove_module(mod->name);
Thanks for replay. I(In reply to Rusty Russell from comment #2) > bugzilla-daemon@bugzilla.kernel.org writes: > > https://bugzilla.kernel.org/show_bug.cgi?id=111541 > > > > --- Comment #1 from Weilong Chen <chenweilong@huawei.com> --- > > This patch seems to solve the problem: > > diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c > > index 5c5987f..fff4bec 100644 > > --- a/kernel/kallsyms.c > > +++ b/kernel/kallsyms.c > > @@ -516,6 +516,7 @@ static void *s_next(struct seq_file *m, void *p, loff_t > > *pos) > > > > static void *s_start(struct seq_file *m, loff_t *pos) > > { > > + mutex_lock(&module_mutex); > > if (!update_iter(m->private, *pos)) > > return NULL; > > return m->private; > > @@ -523,6 +524,7 @@ static void *s_start(struct seq_file *m, loff_t *pos) > > > > static void s_stop(struct seq_file *m, void *p) > > { > > + mutex_unlock(&module_mutex); > > } > > > > static int s_show(struct seq_file *m, void *p) > > Unfortunately, that would also create a DoS where anyone can stop module > loading. > > I'm testing here to see if I can reproduce, but I suspect this will > help. I wonder if Masami's removal of stop_machine from the unload path > uncovered this race? > > Thanks! > Rusty. > Thanks for your reply, I test this patch, It doesn't work, still has the problem. You can add a 'printk("for test %p \n",mod->strtab + mod->symtab[symnum].st_name);' just before the first strlcpy... in module_get_kallsym, It can help reproduce the problem easily. > diff --git a/kernel/module.c b/kernel/module.c > index 8358f4697c0c..9f2de09b6d8c 100644 > --- a/kernel/module.c > +++ b/kernel/module.c > @@ -1992,6 +1992,9 @@ static void free_module(struct module *mod) > mod->state = MODULE_STATE_UNFORMED; > mutex_unlock(&module_mutex); > > + /* kallsyms walks list without mutex; make sure they see UNFORMED */ > + synchronize_sched(); > + > /* Remove dynamic debug info */ > ddebug_remove_module(mod->name);
(In reply to Weilong Chen from comment #3) > > Unfortunately, that would also create a DoS where anyone can stop module > > loading. > > > > I'm testing here to see if I can reproduce, but I suspect this will > > help. I wonder if Masami's removal of stop_machine from the unload path > > uncovered this race? Hmm, it seems that the stop_machine removal is not related, because try_stop_module() is called far before free_module(). And also, with stop_machine, repeating rmmod/insmod itself might be a DoS :) > Thanks for your reply, I test this patch, It doesn't work, still has the > problem. > You can add a 'printk("for test %p \n",mod->strtab + > mod->symtab[symnum].st_name);' > just before the first strlcpy... in module_get_kallsym, It can help > reproduce the problem easily. I think we should have SMP memory barriers for mod->state changes and references. Without them, the "if (mod->state == MODULE_STATE_UNFORMED)" in module_get_kallsyms() can not be hit even if it is turely modified. Could you also try below? diff --git a/kernel/module.c b/kernel/module.c index 8358f46..b84aa15 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -1989,9 +1989,12 @@ static void free_module(struct module *mod) /* We leave it in list to prevent duplicate loads, but make sure * that noone uses it while it's being deconstructed. */ mutex_lock(&module_mutex); - mod->state = MODULE_STATE_UNFORMED; + WRITE_ONCE(mod->state, MODULE_STATE_UNFORMED); + smp_wmb(); mutex_unlock(&module_mutex); + synchronize_sched(); + /* Remove dynamic debug info */ ddebug_remove_module(mod->name); @@ -3755,10 +3758,13 @@ int module_get_kallsym(unsigned int symnum, unsigned lon char *name, char *module_name, int *exported) { struct module *mod; + enum module_state state; preempt_disable(); list_for_each_entry_rcu(mod, &modules, list) { - if (mod->state == MODULE_STATE_UNFORMED) + smp_rmb(); + state = READ_ONCE(mod->state); + if (state == MODULE_STATE_UNFORMED) continue; if (symnum < mod->num_symtab) { *value = mod->symtab[symnum].st_value;
(In reply to Masami Hiramatsu from comment #4) > (In reply to Weilong Chen from comment #3) > > > Unfortunately, that would also create a DoS where anyone can stop module > > > loading. > > > > > > I'm testing here to see if I can reproduce, but I suspect this will > > > help. I wonder if Masami's removal of stop_machine from the unload path > > > uncovered this race? > > Hmm, it seems that the stop_machine removal is not related, because > try_stop_module() is called far before free_module(). And also, with > stop_machine, repeating rmmod/insmod itself might be a DoS :) > > > Thanks for your reply, I test this patch, It doesn't work, still has the > > problem. > > You can add a 'printk("for test %p \n",mod->strtab + > > mod->symtab[symnum].st_name);' > > just before the first strlcpy... in module_get_kallsym, It can help > > reproduce the problem easily. > > I think we should have SMP memory barriers for mod->state changes and > references. Without them, the "if (mod->state == MODULE_STATE_UNFORMED)" > in module_get_kallsyms() can not be hit even if it is turely modified. > > Could you also try below? > > diff --git a/kernel/module.c b/kernel/module.c > index 8358f46..b84aa15 100644 > --- a/kernel/module.c > +++ b/kernel/module.c > @@ -1989,9 +1989,12 @@ static void free_module(struct module *mod) > /* We leave it in list to prevent duplicate loads, but make sure > * that noone uses it while it's being deconstructed. */ > mutex_lock(&module_mutex); > - mod->state = MODULE_STATE_UNFORMED; > + WRITE_ONCE(mod->state, MODULE_STATE_UNFORMED); > + smp_wmb(); > mutex_unlock(&module_mutex); > > + synchronize_sched(); > + > /* Remove dynamic debug info */ > ddebug_remove_module(mod->name); > > @@ -3755,10 +3758,13 @@ int module_get_kallsym(unsigned int symnum, unsigned > lon > char *name, char *module_name, int *exported) > { > struct module *mod; > + enum module_state state; > > preempt_disable(); > list_for_each_entry_rcu(mod, &modules, list) { > - if (mod->state == MODULE_STATE_UNFORMED) > + smp_rmb(); > + state = READ_ONCE(mod->state); > + if (state == MODULE_STATE_UNFORMED) > continue; > if (symnum < mod->num_symtab) { > *value = mod->symtab[symnum].st_value; Hi, this not work again. The oops almost the same as before: v01:~ # [ 165.286475] BUG: unable to handle kernel paging request at ffffffffff6c88e9 [ 165.287305] IP: [<ffffffff81372301>] strlcpy+0x1/0x50 [ 165.287870] PGD 1a2d067 PUD 1a2f067 PMD 117112067 PTE 0 [ 165.288487] Oops: 0000 [#1] SMP [ 165.288865] Modules linked in: bonding(E) [last unloaded: bonding] [ 165.289575] CPU: 1 PID: 24436 Comm: cat Tainted: G E 4.5.0-rc1+ #11 [ 165.290379] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 165.291006] task: ffff8800dac35200 ti: ffff8800d6240000 task.ti: ffff8800d6240000 [ 165.291889] RIP: 0010:[<ffffffff81372301>] [<ffffffff81372301>] strlcpy+0x1/0x50 [ 165.292752] RSP: 0018:ffff8800d6243d50 EFLAGS: 00010282 [ 165.293361] RAX: ffffffffa003f000 RBX: ffffffffa003e3c0 RCX: ffffffff81a71e98 [ 165.294192] RDX: 0000000000000080 RSI: ffffffffff6c88e9 RDI: ffff8800d99e0115 [ 165.295017] RBP: ffff8800d6243d90 R08: 00000000fffffffe R09: 0000000000000000 [ 165.295859] R10: 0000000000000005 R11: 0000000000216055 R12: ffff8800d99e0115 [ 165.296709] R13: ffff8800d99e0108 R14: 0000000000003228 R15: ffff880111b6d300 [ 165.297558] FS: 00007ff9e0798700(0000) GS:ffff880116f00000(0000) knlGS:0000000000000000 [ 165.298512] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 165.299211] CR2: ffffffffff6c88e9 CR3: 00000000dac6f000 CR4: 00000000000006e0 [ 165.300027] Stack: [ 165.300268] ffff8800d6243d90 ffffffff810fa403 ffff8800d99e01d0 ffff8800d99e0195 [ 165.301153] ffff8800d99e0100 ffff8800d99e0e00 ffff8800d99e0100 ffff8800d6243f20 [ 165.302042] ffff8800d6243db0 ffffffff810fac5a ffff8800d99e0100 ffff8800d99e0e00 [ 165.302981] Call Trace: [ 165.303272] [<ffffffff810fa403>] ? module_get_kallsym+0xd3/0x170 [ 165.303973] [<ffffffff810fac5a>] update_iter+0xfa/0x110 [ 165.304625] [<ffffffff810fac90>] s_next+0x20/0x30 [ 165.305179] [<ffffffff8121239f>] seq_read+0x1ef/0x350 [ 165.305771] [<ffffffff8125766d>] proc_reg_read+0x3d/0x70 [ 165.306439] [<ffffffff811ef258>] __vfs_read+0x28/0xd0 [ 165.307032] [<ffffffff812f38a3>] ? security_file_permission+0xa3/0xc0 [ 165.307779] [<ffffffff811ef782>] ? rw_verify_area+0x52/0xd0 [ 165.308454] [<ffffffff811ef87f>] vfs_read+0x7f/0x130 [ 165.309036] [<ffffffff811f0896>] SyS_read+0x46/0xa0 [ 165.309607] [<ffffffff8116d87d>] ? context_tracking_enter+0x1d/0x30 [ 165.310355] [<ffffffff816c55ee>] entry_SYSCALL_64_fastpath+0x12/0x71 Here's my test patch, one 'printk' line more than yours, is it OK? -bash-4.2# git diff diff --git a/kernel/module.c b/kernel/module.c index 8358f46..3b633fb 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -1989,9 +1989,12 @@ static void free_module(struct module *mod) /* We leave it in list to prevent duplicate loads, but make sure * that noone uses it while it's being deconstructed. */ mutex_lock(&module_mutex); - mod->state = MODULE_STATE_UNFORMED; + WRITE_ONCE(mod->state, MODULE_STATE_UNFORMED); + smp_wmb(); mutex_unlock(&module_mutex); + synchronize_sched(); + /* Remove dynamic debug info */ ddebug_remove_module(mod->name); @@ -3755,14 +3758,18 @@ int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type, char *name, char *module_name, int *exported) { struct module *mod; + enum module_state state; preempt_disable(); list_for_each_entry_rcu(mod, &modules, list) { - if (mod->state == MODULE_STATE_UNFORMED) + smp_rmb(); + state = READ_ONCE(mod->state); + if (state == MODULE_STATE_UNFORMED) continue; if (symnum < mod->num_symtab) { *value = mod->symtab[symnum].st_value; *type = mod->symtab[symnum].st_info; + printk("lg test %p \n",mod->strtab + mod->symtab[symnum].st_name); strlcpy(name, mod->strtab + mod->symtab[symnum].st_name, KSYM_NAME_LEN); strlcpy(module_name, mod->name, MODULE_NAME_LEN);
Peter Zijlstra <peterz@infradead.org> writes: > Adding lkml to Cc so that there is an actual email record of this. > > (I could for example not reply to Masami's later entries). > > On Mon, Feb 01, 2016 at 12:02:01PM +1030, Rusty Russell wrote: >> > https://bugzilla.kernel.org/show_bug.cgi?id=111541 > >> Unfortunately, that would also create a DoS where anyone can stop module >> loading. > >> diff --git a/kernel/module.c b/kernel/module.c >> index 8358f4697c0c..9f2de09b6d8c 100644 >> --- a/kernel/module.c >> +++ b/kernel/module.c >> @@ -1992,6 +1992,9 @@ static void free_module(struct module *mod) >> mod->state = MODULE_STATE_UNFORMED; >> mutex_unlock(&module_mutex); >> >> + /* kallsyms walks list without mutex; make sure they see UNFORMED */ >> + synchronize_sched(); >> + >> /* Remove dynamic debug info */ >> ddebug_remove_module(mod->name); > > So it all depends on when this mod->symtab stuff gets freed, and I could > not actually find this. You're right. The symtab is part of the module core, so it's freed after the synchronize_sched() below anyway: mutex_lock(&module_mutex); /* Unlink carefully: kallsyms could be walking list. */ list_del_rcu(&mod->list); mod_tree_remove(mod); /* Remove this module from bug list, this uses list_del_rcu */ module_bug_cleanup(mod); /* Wait for RCU-sched synchronizing before releasing mod->list and buglist. */ synchronize_sched(); mutex_unlock(&module_mutex); /* This may be empty, but that's OK */ disable_ro_nx(&mod->init_layout); module_arch_freeing_init(mod); module_memfree(mod->init_layout.base); kfree(mod->args); percpu_modfree(mod); /* Free lock-classes; relies on the preceding sync_rcu(). */ lockdep_free_key_range(mod->core_layout.base, mod->core_layout.size); /* Finally, free the core (containing the module structure) */ disable_ro_nx(&mod->core_layout); ==> module_memfree(mod->core_layout.base); I finally reproduced it. For kallsyms, there are two copies of the symtab: a complete copy tacked onto the end of the init part, and a filtered core-symbols-only copy in the core part. We reassign the symtab pointer and size halfway through do_init_module(): #ifdef CONFIG_KALLSYMS mod->num_symtab = mod->core_num_syms; mod->symtab = mod->core_symtab; mod->strtab = mod->core_strtab; #endif Our kallsyms code uses the old (larger!) num_symtab value and boom: list_for_each_entry_rcu(mod, &modules, list) { if (mod->state == MODULE_STATE_UNFORMED) continue; if (symnum < mod->num_symtab) { ... strlcpy(name, mod->strtab + mod->symtab[symnum].st_name, KSYM_NAME_LEN); And there are other places with the same issue. This is a more complex, but I think worth it (actually two patches, rolled into one for testing): diff --git a/include/linux/module.h b/include/linux/module.h index 4560d8f1545d..2bb0c3085706 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -324,6 +324,12 @@ struct module_layout { #define __module_layout_align #endif +struct mod_kallsyms { + Elf_Sym *symtab; + unsigned int num_symtab; + char *strtab; +}; + struct module { enum module_state state; @@ -405,15 +411,10 @@ struct module { #endif #ifdef CONFIG_KALLSYMS - /* - * We keep the symbol and string tables for kallsyms. - * The core_* fields below are temporary, loader-only (they - * could really be discarded after module init). - */ - Elf_Sym *symtab, *core_symtab; - unsigned int num_symtab, core_num_syms; - char *strtab, *core_strtab; - + /* Protected by RCU and/or module_mutex: use rcu_dereference() */ + struct mod_kallsyms *kallsyms; + struct mod_kallsyms core_kallsyms; + /* Section attributes */ struct module_sect_attrs *sect_attrs; diff --git a/kernel/module.c b/kernel/module.c index 2149f7003e49..ecc920df45e6 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -303,6 +303,9 @@ struct load_info { struct _ddebug *debug; unsigned int num_debug; bool sig_ok; +#ifdef CONFIG_KALLSYMS + unsigned long mod_kallsyms_init_off; +#endif struct { unsigned int sym, str, mod, vers, info, pcpu; } index; @@ -2482,8 +2485,19 @@ static void layout_symtab(struct module *mod, struct load_info *info) info->index.str) | INIT_OFFSET_MASK; mod->init_layout.size = debug_align(mod->init_layout.size); pr_debug("\t%s\n", info->secstrings + strsect->sh_name); + + /* We'll tack temporary mod_kallsyms on the end. */ + mod->init_layout.size = ALIGN(mod->init_layout.size, + __alignof__(struct mod_kallsyms)); + info->mod_kallsyms_init_off = mod->init_layout.size; + mod->init_layout.size += sizeof(struct mod_kallsyms); } +/* + * We use the full symtab and strtab which layout_symtab arranged to + * be appended to the init section Later we switch to the cut-down + * core-only one. + */ static void add_kallsyms(struct module *mod, const struct load_info *info) { unsigned int i, ndst; @@ -2492,29 +2506,34 @@ static void add_kallsyms(struct module *mod, const struct load_info *info) char *s; Elf_Shdr *symsec = &info->sechdrs[info->index.sym]; - mod->symtab = (void *)symsec->sh_addr; - mod->num_symtab = symsec->sh_size / sizeof(Elf_Sym); + /* Set up to point into init section. */ + mod->kallsyms = mod->init_layout.base + info->mod_kallsyms_init_off; + + mod->kallsyms->symtab = (void *)symsec->sh_addr; + mod->kallsyms->num_symtab = symsec->sh_size / sizeof(Elf_Sym); /* Make sure we get permanent strtab: don't use info->strtab. */ - mod->strtab = (void *)info->sechdrs[info->index.str].sh_addr; + mod->kallsyms->strtab = (void *)info->sechdrs[info->index.str].sh_addr; /* Set types up while we still have access to sections. */ - for (i = 0; i < mod->num_symtab; i++) - mod->symtab[i].st_info = elf_type(&mod->symtab[i], info); - - mod->core_symtab = dst = mod->core_layout.base + info->symoffs; - mod->core_strtab = s = mod->core_layout.base + info->stroffs; - src = mod->symtab; - for (ndst = i = 0; i < mod->num_symtab; i++) { + for (i = 0; i < mod->kallsyms->num_symtab; i++) + mod->kallsyms->symtab[i].st_info + = elf_type(&mod->kallsyms->symtab[i], info); + + /* Now populate the cut down core kallsyms for after init. */ + mod->core_kallsyms.symtab = dst = mod->core_layout.base + info->symoffs; + mod->core_kallsyms.strtab = s = mod->core_layout.base + info->stroffs; + src = mod->kallsyms->symtab; + for (ndst = i = 0; i < mod->kallsyms->num_symtab; i++) { if (i == 0 || is_core_symbol(src+i, info->sechdrs, info->hdr->e_shnum, info->index.pcpu)) { dst[ndst] = src[i]; - dst[ndst++].st_name = s - mod->core_strtab; - s += strlcpy(s, &mod->strtab[src[i].st_name], + dst[ndst++].st_name = s - mod->core_kallsyms.strtab; + s += strlcpy(s, &mod->kallsyms->strtab[src[i].st_name], KSYM_NAME_LEN) + 1; } } - mod->core_num_syms = ndst; + mod->core_kallsyms.num_symtab = ndst; } #else static inline void layout_symtab(struct module *mod, struct load_info *info) @@ -3263,9 +3282,8 @@ static noinline int do_init_module(struct module *mod) module_put(mod); trim_init_extable(mod); #ifdef CONFIG_KALLSYMS - mod->num_symtab = mod->core_num_syms; - mod->symtab = mod->core_symtab; - mod->strtab = mod->core_strtab; + /* Switch to core kallsyms now init is done. */ + rcu_assign_pointer(mod->kallsyms, &mod->core_kallsyms); #endif mod_tree_remove_init(mod); disable_ro_nx(&mod->init_layout); @@ -3627,6 +3645,11 @@ static inline int is_arm_mapping_symbol(const char *str) && (str[2] == '\0' || str[2] == '.'); } +static const char *symname(struct mod_kallsyms *kallsyms, unsigned int symnum) +{ + return kallsyms->strtab + kallsyms->symtab[symnum].st_name; +} + static const char *get_ksymbol(struct module *mod, unsigned long addr, unsigned long *size, @@ -3634,6 +3657,7 @@ static const char *get_ksymbol(struct module *mod, { unsigned int i, best = 0; unsigned long nextval; + struct mod_kallsyms *kallsyms = rcu_dereference(mod->kallsyms); /* At worse, next value is at end of module */ if (within_module_init(addr, mod)) @@ -3643,32 +3667,32 @@ static const char *get_ksymbol(struct module *mod, /* Scan for closest preceding symbol, and next symbol. (ELF starts real symbols at 1). */ - for (i = 1; i < mod->num_symtab; i++) { - if (mod->symtab[i].st_shndx == SHN_UNDEF) + for (i = 1; i < kallsyms->num_symtab; i++) { + if (kallsyms->symtab[i].st_shndx == SHN_UNDEF) continue; /* We ignore unnamed symbols: they're uninformative * and inserted at a whim. */ - if (mod->symtab[i].st_value <= addr - && mod->symtab[i].st_value > mod->symtab[best].st_value - && *(mod->strtab + mod->symtab[i].st_name) != '\0' - && !is_arm_mapping_symbol(mod->strtab + mod->symtab[i].st_name)) + if (*symname(kallsyms, i) == '\0' + || is_arm_mapping_symbol(symname(kallsyms, i))) + continue; + + if (kallsyms->symtab[i].st_value <= addr + && kallsyms->symtab[i].st_value > kallsyms->symtab[best].st_value) best = i; - if (mod->symtab[i].st_value > addr - && mod->symtab[i].st_value < nextval - && *(mod->strtab + mod->symtab[i].st_name) != '\0' - && !is_arm_mapping_symbol(mod->strtab + mod->symtab[i].st_name)) - nextval = mod->symtab[i].st_value; + if (kallsyms->symtab[i].st_value > addr + && kallsyms->symtab[i].st_value < nextval) + nextval = kallsyms->symtab[i].st_value; } if (!best) return NULL; if (size) - *size = nextval - mod->symtab[best].st_value; + *size = nextval - kallsyms->symtab[best].st_value; if (offset) - *offset = addr - mod->symtab[best].st_value; - return mod->strtab + mod->symtab[best].st_name; + *offset = addr - kallsyms->symtab[best].st_value; + return symname(kallsyms, best); } /* For kallsyms to ask for address resolution. NULL means not found. Careful @@ -3758,19 +3782,20 @@ int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type, preempt_disable(); list_for_each_entry_rcu(mod, &modules, list) { + struct mod_kallsyms *kallsyms = rcu_dereference(mod->kallsyms); + if (mod->state == MODULE_STATE_UNFORMED) continue; - if (symnum < mod->num_symtab) { - *value = mod->symtab[symnum].st_value; - *type = mod->symtab[symnum].st_info; - strlcpy(name, mod->strtab + mod->symtab[symnum].st_name, - KSYM_NAME_LEN); + if (symnum < kallsyms->num_symtab) { + *value = kallsyms->symtab[symnum].st_value; + *type = kallsyms->symtab[symnum].st_info; + strlcpy(name, symname(kallsyms, symnum), KSYM_NAME_LEN); strlcpy(module_name, mod->name, MODULE_NAME_LEN); *exported = is_exported(name, *value, mod); preempt_enable(); return 0; } - symnum -= mod->num_symtab; + symnum -= kallsyms->num_symtab; } preempt_enable(); return -ERANGE; @@ -3779,11 +3804,12 @@ int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type, static unsigned long mod_find_symname(struct module *mod, const char *name) { unsigned int i; + struct mod_kallsyms *kallsyms = rcu_dereference(mod->kallsyms); - for (i = 0; i < mod->num_symtab; i++) - if (strcmp(name, mod->strtab+mod->symtab[i].st_name) == 0 && - mod->symtab[i].st_info != 'U') - return mod->symtab[i].st_value; + for (i = 0; i < kallsyms->num_symtab; i++) + if (strcmp(name, symname(kallsyms, i)) == 0 && + kallsyms->symtab[i].st_info != 'U') + return kallsyms->symtab[i].st_value; return 0; } @@ -3822,11 +3848,14 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, module_assert_mutex(); list_for_each_entry(mod, &modules, list) { + /* We hold module_mutex: no need for rcu_derefence */ + struct mod_kallsyms *kallsyms = mod->kallsyms; + if (mod->state == MODULE_STATE_UNFORMED) continue; - for (i = 0; i < mod->num_symtab; i++) { - ret = fn(data, mod->strtab + mod->symtab[i].st_name, - mod, mod->symtab[i].st_value); + for (i = 0; i < kallsyms->num_symtab; i++) { + ret = fn(data, symname(kallsyms, i), + mod, kallsyms->symtab[i].st_value); if (ret != 0) return ret; }
Rusty Russell <rusty@rustcorp.com.au> writes: > And there are other places with the same issue. This is a more > complex, but I think worth it (actually two patches, rolled into one > for testing): And this one actually works... diff --git a/include/linux/module.h b/include/linux/module.h index 4560d8f1545d..2bb0c3085706 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -324,6 +324,12 @@ struct module_layout { #define __module_layout_align #endif +struct mod_kallsyms { + Elf_Sym *symtab; + unsigned int num_symtab; + char *strtab; +}; + struct module { enum module_state state; @@ -405,15 +411,10 @@ struct module { #endif #ifdef CONFIG_KALLSYMS - /* - * We keep the symbol and string tables for kallsyms. - * The core_* fields below are temporary, loader-only (they - * could really be discarded after module init). - */ - Elf_Sym *symtab, *core_symtab; - unsigned int num_symtab, core_num_syms; - char *strtab, *core_strtab; - + /* Protected by RCU and/or module_mutex: use rcu_dereference() */ + struct mod_kallsyms *kallsyms; + struct mod_kallsyms core_kallsyms; + /* Section attributes */ struct module_sect_attrs *sect_attrs; diff --git a/kernel/module.c b/kernel/module.c index 2149f7003e49..4e4d567139f4 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -303,6 +303,9 @@ struct load_info { struct _ddebug *debug; unsigned int num_debug; bool sig_ok; +#ifdef CONFIG_KALLSYMS + unsigned long mod_kallsyms_init_off; +#endif struct { unsigned int sym, str, mod, vers, info, pcpu; } index; @@ -2480,10 +2483,21 @@ static void layout_symtab(struct module *mod, struct load_info *info) strsect->sh_flags |= SHF_ALLOC; strsect->sh_entsize = get_offset(mod, &mod->init_layout.size, strsect, info->index.str) | INIT_OFFSET_MASK; - mod->init_layout.size = debug_align(mod->init_layout.size); pr_debug("\t%s\n", info->secstrings + strsect->sh_name); + + /* We'll tack temporary mod_kallsyms on the end. */ + mod->init_layout.size = ALIGN(mod->init_layout.size, + __alignof__(struct mod_kallsyms)); + info->mod_kallsyms_init_off = mod->init_layout.size; + mod->init_layout.size += sizeof(struct mod_kallsyms); + mod->init_layout.size = debug_align(mod->init_layout.size); } +/* + * We use the full symtab and strtab which layout_symtab arranged to + * be appended to the init section Later we switch to the cut-down + * core-only one. + */ static void add_kallsyms(struct module *mod, const struct load_info *info) { unsigned int i, ndst; @@ -2492,29 +2506,34 @@ static void add_kallsyms(struct module *mod, const struct load_info *info) char *s; Elf_Shdr *symsec = &info->sechdrs[info->index.sym]; - mod->symtab = (void *)symsec->sh_addr; - mod->num_symtab = symsec->sh_size / sizeof(Elf_Sym); + /* Set up to point into init section. */ + mod->kallsyms = mod->init_layout.base + info->mod_kallsyms_init_off; + + mod->kallsyms->symtab = (void *)symsec->sh_addr; + mod->kallsyms->num_symtab = symsec->sh_size / sizeof(Elf_Sym); /* Make sure we get permanent strtab: don't use info->strtab. */ - mod->strtab = (void *)info->sechdrs[info->index.str].sh_addr; + mod->kallsyms->strtab = (void *)info->sechdrs[info->index.str].sh_addr; /* Set types up while we still have access to sections. */ - for (i = 0; i < mod->num_symtab; i++) - mod->symtab[i].st_info = elf_type(&mod->symtab[i], info); - - mod->core_symtab = dst = mod->core_layout.base + info->symoffs; - mod->core_strtab = s = mod->core_layout.base + info->stroffs; - src = mod->symtab; - for (ndst = i = 0; i < mod->num_symtab; i++) { + for (i = 0; i < mod->kallsyms->num_symtab; i++) + mod->kallsyms->symtab[i].st_info + = elf_type(&mod->kallsyms->symtab[i], info); + + /* Now populate the cut down core kallsyms for after init. */ + mod->core_kallsyms.symtab = dst = mod->core_layout.base + info->symoffs; + mod->core_kallsyms.strtab = s = mod->core_layout.base + info->stroffs; + src = mod->kallsyms->symtab; + for (ndst = i = 0; i < mod->kallsyms->num_symtab; i++) { if (i == 0 || is_core_symbol(src+i, info->sechdrs, info->hdr->e_shnum, info->index.pcpu)) { dst[ndst] = src[i]; - dst[ndst++].st_name = s - mod->core_strtab; - s += strlcpy(s, &mod->strtab[src[i].st_name], + dst[ndst++].st_name = s - mod->core_kallsyms.strtab; + s += strlcpy(s, &mod->kallsyms->strtab[src[i].st_name], KSYM_NAME_LEN) + 1; } } - mod->core_num_syms = ndst; + mod->core_kallsyms.num_symtab = ndst; } #else static inline void layout_symtab(struct module *mod, struct load_info *info) @@ -3263,9 +3282,8 @@ static noinline int do_init_module(struct module *mod) module_put(mod); trim_init_extable(mod); #ifdef CONFIG_KALLSYMS - mod->num_symtab = mod->core_num_syms; - mod->symtab = mod->core_symtab; - mod->strtab = mod->core_strtab; + /* Switch to core kallsyms now init is done. */ + rcu_assign_pointer(mod->kallsyms, &mod->core_kallsyms); #endif mod_tree_remove_init(mod); disable_ro_nx(&mod->init_layout); @@ -3627,6 +3645,11 @@ static inline int is_arm_mapping_symbol(const char *str) && (str[2] == '\0' || str[2] == '.'); } +static const char *symname(struct mod_kallsyms *kallsyms, unsigned int symnum) +{ + return kallsyms->strtab + kallsyms->symtab[symnum].st_name; +} + static const char *get_ksymbol(struct module *mod, unsigned long addr, unsigned long *size, @@ -3634,6 +3657,7 @@ static const char *get_ksymbol(struct module *mod, { unsigned int i, best = 0; unsigned long nextval; + struct mod_kallsyms *kallsyms = rcu_dereference_sched(mod->kallsyms); /* At worse, next value is at end of module */ if (within_module_init(addr, mod)) @@ -3643,32 +3667,32 @@ static const char *get_ksymbol(struct module *mod, /* Scan for closest preceding symbol, and next symbol. (ELF starts real symbols at 1). */ - for (i = 1; i < mod->num_symtab; i++) { - if (mod->symtab[i].st_shndx == SHN_UNDEF) + for (i = 1; i < kallsyms->num_symtab; i++) { + if (kallsyms->symtab[i].st_shndx == SHN_UNDEF) continue; /* We ignore unnamed symbols: they're uninformative * and inserted at a whim. */ - if (mod->symtab[i].st_value <= addr - && mod->symtab[i].st_value > mod->symtab[best].st_value - && *(mod->strtab + mod->symtab[i].st_name) != '\0' - && !is_arm_mapping_symbol(mod->strtab + mod->symtab[i].st_name)) + if (*symname(kallsyms, i) == '\0' + || is_arm_mapping_symbol(symname(kallsyms, i))) + continue; + + if (kallsyms->symtab[i].st_value <= addr + && kallsyms->symtab[i].st_value > kallsyms->symtab[best].st_value) best = i; - if (mod->symtab[i].st_value > addr - && mod->symtab[i].st_value < nextval - && *(mod->strtab + mod->symtab[i].st_name) != '\0' - && !is_arm_mapping_symbol(mod->strtab + mod->symtab[i].st_name)) - nextval = mod->symtab[i].st_value; + if (kallsyms->symtab[i].st_value > addr + && kallsyms->symtab[i].st_value < nextval) + nextval = kallsyms->symtab[i].st_value; } if (!best) return NULL; if (size) - *size = nextval - mod->symtab[best].st_value; + *size = nextval - kallsyms->symtab[best].st_value; if (offset) - *offset = addr - mod->symtab[best].st_value; - return mod->strtab + mod->symtab[best].st_name; + *offset = addr - kallsyms->symtab[best].st_value; + return symname(kallsyms, best); } /* For kallsyms to ask for address resolution. NULL means not found. Careful @@ -3758,19 +3782,21 @@ int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type, preempt_disable(); list_for_each_entry_rcu(mod, &modules, list) { + struct mod_kallsyms *kallsyms; + if (mod->state == MODULE_STATE_UNFORMED) continue; - if (symnum < mod->num_symtab) { - *value = mod->symtab[symnum].st_value; - *type = mod->symtab[symnum].st_info; - strlcpy(name, mod->strtab + mod->symtab[symnum].st_name, - KSYM_NAME_LEN); + kallsyms = rcu_dereference_sched(mod->kallsyms); + if (symnum < kallsyms->num_symtab) { + *value = kallsyms->symtab[symnum].st_value; + *type = kallsyms->symtab[symnum].st_info; + strlcpy(name, symname(kallsyms, symnum), KSYM_NAME_LEN); strlcpy(module_name, mod->name, MODULE_NAME_LEN); *exported = is_exported(name, *value, mod); preempt_enable(); return 0; } - symnum -= mod->num_symtab; + symnum -= kallsyms->num_symtab; } preempt_enable(); return -ERANGE; @@ -3779,11 +3805,12 @@ int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type, static unsigned long mod_find_symname(struct module *mod, const char *name) { unsigned int i; + struct mod_kallsyms *kallsyms = rcu_dereference_sched(mod->kallsyms); - for (i = 0; i < mod->num_symtab; i++) - if (strcmp(name, mod->strtab+mod->symtab[i].st_name) == 0 && - mod->symtab[i].st_info != 'U') - return mod->symtab[i].st_value; + for (i = 0; i < kallsyms->num_symtab; i++) + if (strcmp(name, symname(kallsyms, i)) == 0 && + kallsyms->symtab[i].st_info != 'U') + return kallsyms->symtab[i].st_value; return 0; } @@ -3822,11 +3849,14 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, module_assert_mutex(); list_for_each_entry(mod, &modules, list) { + /* We hold module_mutex: no need for rcu_dereference_sched */ + struct mod_kallsyms *kallsyms = mod->kallsyms; + if (mod->state == MODULE_STATE_UNFORMED) continue; - for (i = 0; i < mod->num_symtab; i++) { - ret = fn(data, mod->strtab + mod->symtab[i].st_name, - mod, mod->symtab[i].st_value); + for (i = 0; i < kallsyms->num_symtab; i++) { + ret = fn(data, symname(kallsyms, i), + mod, kallsyms->symtab[i].st_value); if (ret != 0) return ret; }
(In reply to Rusty Russell from comment #7) > Rusty Russell <rusty@rustcorp.com.au> writes: > > And there are other places with the same issue. This is a more > > complex, but I think worth it (actually two patches, rolled into one > > for testing): > > And this one actually works... > > diff --git a/include/linux/module.h b/include/linux/module.h > index 4560d8f1545d..2bb0c3085706 100644 > --- a/include/linux/module.h > +++ b/include/linux/module.h > @@ -324,6 +324,12 @@ struct module_layout { > #define __module_layout_align > #endif > > +struct mod_kallsyms { > + Elf_Sym *symtab; > + unsigned int num_symtab; > + char *strtab; > +}; > + > struct module { > enum module_state state; > > @@ -405,15 +411,10 @@ struct module { > #endif > > #ifdef CONFIG_KALLSYMS > - /* > - * We keep the symbol and string tables for kallsyms. > - * The core_* fields below are temporary, loader-only (they > - * could really be discarded after module init). > - */ > - Elf_Sym *symtab, *core_symtab; > - unsigned int num_symtab, core_num_syms; > - char *strtab, *core_strtab; > - > + /* Protected by RCU and/or module_mutex: use rcu_dereference() */ > + struct mod_kallsyms *kallsyms; > + struct mod_kallsyms core_kallsyms; > + > /* Section attributes */ > struct module_sect_attrs *sect_attrs; > > diff --git a/kernel/module.c b/kernel/module.c > index 2149f7003e49..4e4d567139f4 100644 > --- a/kernel/module.c > +++ b/kernel/module.c > @@ -303,6 +303,9 @@ struct load_info { > struct _ddebug *debug; > unsigned int num_debug; > bool sig_ok; > +#ifdef CONFIG_KALLSYMS > + unsigned long mod_kallsyms_init_off; > +#endif > struct { > unsigned int sym, str, mod, vers, info, pcpu; > } index; > @@ -2480,10 +2483,21 @@ static void layout_symtab(struct module *mod, struct > load_info *info) > strsect->sh_flags |= SHF_ALLOC; > strsect->sh_entsize = get_offset(mod, &mod->init_layout.size, strsect, > info->index.str) | INIT_OFFSET_MASK; > - mod->init_layout.size = debug_align(mod->init_layout.size); > pr_debug("\t%s\n", info->secstrings + strsect->sh_name); > + > + /* We'll tack temporary mod_kallsyms on the end. */ > + mod->init_layout.size = ALIGN(mod->init_layout.size, > + __alignof__(struct mod_kallsyms)); > + info->mod_kallsyms_init_off = mod->init_layout.size; > + mod->init_layout.size += sizeof(struct mod_kallsyms); > + mod->init_layout.size = debug_align(mod->init_layout.size); > } > > +/* > + * We use the full symtab and strtab which layout_symtab arranged to > + * be appended to the init section Later we switch to the cut-down > + * core-only one. > + */ > static void add_kallsyms(struct module *mod, const struct load_info *info) > { > unsigned int i, ndst; > @@ -2492,29 +2506,34 @@ static void add_kallsyms(struct module *mod, const > struct load_info *info) > char *s; > Elf_Shdr *symsec = &info->sechdrs[info->index.sym]; > > - mod->symtab = (void *)symsec->sh_addr; > - mod->num_symtab = symsec->sh_size / sizeof(Elf_Sym); > + /* Set up to point into init section. */ > + mod->kallsyms = mod->init_layout.base + info->mod_kallsyms_init_off; > + > + mod->kallsyms->symtab = (void *)symsec->sh_addr; > + mod->kallsyms->num_symtab = symsec->sh_size / sizeof(Elf_Sym); > /* Make sure we get permanent strtab: don't use info->strtab. */ > - mod->strtab = (void *)info->sechdrs[info->index.str].sh_addr; > + mod->kallsyms->strtab = (void *)info->sechdrs[info->index.str].sh_addr; > > /* Set types up while we still have access to sections. */ > - for (i = 0; i < mod->num_symtab; i++) > - mod->symtab[i].st_info = elf_type(&mod->symtab[i], info); > - > - mod->core_symtab = dst = mod->core_layout.base + info->symoffs; > - mod->core_strtab = s = mod->core_layout.base + info->stroffs; > - src = mod->symtab; > - for (ndst = i = 0; i < mod->num_symtab; i++) { > + for (i = 0; i < mod->kallsyms->num_symtab; i++) > + mod->kallsyms->symtab[i].st_info > + = elf_type(&mod->kallsyms->symtab[i], info); > + > + /* Now populate the cut down core kallsyms for after init. */ > + mod->core_kallsyms.symtab = dst = mod->core_layout.base + > info->symoffs; > + mod->core_kallsyms.strtab = s = mod->core_layout.base + info->stroffs; > + src = mod->kallsyms->symtab; > + for (ndst = i = 0; i < mod->kallsyms->num_symtab; i++) { > if (i == 0 || > is_core_symbol(src+i, info->sechdrs, info->hdr->e_shnum, > info->index.pcpu)) { > dst[ndst] = src[i]; > - dst[ndst++].st_name = s - mod->core_strtab; > - s += strlcpy(s, &mod->strtab[src[i].st_name], > + dst[ndst++].st_name = s - mod->core_kallsyms.strtab; > + s += strlcpy(s, &mod->kallsyms->strtab[src[i].st_name], > KSYM_NAME_LEN) + 1; > } > } > - mod->core_num_syms = ndst; > + mod->core_kallsyms.num_symtab = ndst; > } > #else > static inline void layout_symtab(struct module *mod, struct load_info *info) > @@ -3263,9 +3282,8 @@ static noinline int do_init_module(struct module *mod) > module_put(mod); > trim_init_extable(mod); > #ifdef CONFIG_KALLSYMS > - mod->num_symtab = mod->core_num_syms; > - mod->symtab = mod->core_symtab; > - mod->strtab = mod->core_strtab; > + /* Switch to core kallsyms now init is done. */ > + rcu_assign_pointer(mod->kallsyms, &mod->core_kallsyms); > #endif > mod_tree_remove_init(mod); > disable_ro_nx(&mod->init_layout); > @@ -3627,6 +3645,11 @@ static inline int is_arm_mapping_symbol(const char > *str) > && (str[2] == '\0' || str[2] == '.'); > } > > +static const char *symname(struct mod_kallsyms *kallsyms, unsigned int > symnum) > +{ > + return kallsyms->strtab + kallsyms->symtab[symnum].st_name; > +} > + > static const char *get_ksymbol(struct module *mod, > unsigned long addr, > unsigned long *size, > @@ -3634,6 +3657,7 @@ static const char *get_ksymbol(struct module *mod, > { > unsigned int i, best = 0; > unsigned long nextval; > + struct mod_kallsyms *kallsyms = rcu_dereference_sched(mod->kallsyms); > > /* At worse, next value is at end of module */ > if (within_module_init(addr, mod)) > @@ -3643,32 +3667,32 @@ static const char *get_ksymbol(struct module *mod, > > /* Scan for closest preceding symbol, and next symbol. (ELF > starts real symbols at 1). */ > - for (i = 1; i < mod->num_symtab; i++) { > - if (mod->symtab[i].st_shndx == SHN_UNDEF) > + for (i = 1; i < kallsyms->num_symtab; i++) { > + if (kallsyms->symtab[i].st_shndx == SHN_UNDEF) > continue; > > /* We ignore unnamed symbols: they're uninformative > * and inserted at a whim. */ > - if (mod->symtab[i].st_value <= addr > - && mod->symtab[i].st_value > mod->symtab[best].st_value > - && *(mod->strtab + mod->symtab[i].st_name) != '\0' > - && !is_arm_mapping_symbol(mod->strtab + > mod->symtab[i].st_name)) > + if (*symname(kallsyms, i) == '\0' > + || is_arm_mapping_symbol(symname(kallsyms, i))) > + continue; > + > + if (kallsyms->symtab[i].st_value <= addr > + && kallsyms->symtab[i].st_value > > kallsyms->symtab[best].st_value) > best = i; > - if (mod->symtab[i].st_value > addr > - && mod->symtab[i].st_value < nextval > - && *(mod->strtab + mod->symtab[i].st_name) != '\0' > - && !is_arm_mapping_symbol(mod->strtab + > mod->symtab[i].st_name)) > - nextval = mod->symtab[i].st_value; > + if (kallsyms->symtab[i].st_value > addr > + && kallsyms->symtab[i].st_value < nextval) > + nextval = kallsyms->symtab[i].st_value; > } > > if (!best) > return NULL; > > if (size) > - *size = nextval - mod->symtab[best].st_value; > + *size = nextval - kallsyms->symtab[best].st_value; > if (offset) > - *offset = addr - mod->symtab[best].st_value; > - return mod->strtab + mod->symtab[best].st_name; > + *offset = addr - kallsyms->symtab[best].st_value; > + return symname(kallsyms, best); > } > > /* For kallsyms to ask for address resolution. NULL means not found. > Careful > @@ -3758,19 +3782,21 @@ int module_get_kallsym(unsigned int symnum, unsigned > long *value, char *type, > > preempt_disable(); > list_for_each_entry_rcu(mod, &modules, list) { > + struct mod_kallsyms *kallsyms; > + > if (mod->state == MODULE_STATE_UNFORMED) > continue; > - if (symnum < mod->num_symtab) { > - *value = mod->symtab[symnum].st_value; > - *type = mod->symtab[symnum].st_info; > - strlcpy(name, mod->strtab + > mod->symtab[symnum].st_name, > - KSYM_NAME_LEN); > + kallsyms = rcu_dereference_sched(mod->kallsyms); > + if (symnum < kallsyms->num_symtab) { > + *value = kallsyms->symtab[symnum].st_value; > + *type = kallsyms->symtab[symnum].st_info; > + strlcpy(name, symname(kallsyms, symnum), > KSYM_NAME_LEN); > strlcpy(module_name, mod->name, MODULE_NAME_LEN); > *exported = is_exported(name, *value, mod); > preempt_enable(); > return 0; > } > - symnum -= mod->num_symtab; > + symnum -= kallsyms->num_symtab; > } > preempt_enable(); > return -ERANGE; > @@ -3779,11 +3805,12 @@ int module_get_kallsym(unsigned int symnum, unsigned > long *value, char *type, > static unsigned long mod_find_symname(struct module *mod, const char *name) > { > unsigned int i; > + struct mod_kallsyms *kallsyms = rcu_dereference_sched(mod->kallsyms); > > - for (i = 0; i < mod->num_symtab; i++) > - if (strcmp(name, mod->strtab+mod->symtab[i].st_name) == 0 && > - mod->symtab[i].st_info != 'U') > - return mod->symtab[i].st_value; > + for (i = 0; i < kallsyms->num_symtab; i++) > + if (strcmp(name, symname(kallsyms, i)) == 0 && > + kallsyms->symtab[i].st_info != 'U') > + return kallsyms->symtab[i].st_value; > return 0; > } > > @@ -3822,11 +3849,14 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, > const char *, > module_assert_mutex(); > > list_for_each_entry(mod, &modules, list) { > + /* We hold module_mutex: no need for rcu_dereference_sched */ > + struct mod_kallsyms *kallsyms = mod->kallsyms; > + > if (mod->state == MODULE_STATE_UNFORMED) > continue; > - for (i = 0; i < mod->num_symtab; i++) { > - ret = fn(data, mod->strtab + mod->symtab[i].st_name, > - mod, mod->symtab[i].st_value); > + for (i = 0; i < kallsyms->num_symtab; i++) { > + ret = fn(data, symname(kallsyms, i), > + mod, kallsyms->symtab[i].st_value); > if (ret != 0) > return ret; > } Test ok. But its really a big patch. The lts kernels such as 3.10 also have the problem, this patch seems not work for them, as some code are different.
bugzilla-daemon@bugzilla.kernel.org writes: > https://bugzilla.kernel.org/show_bug.cgi?id=111541 > > --- Comment #8 from Weilong Chen <chenweilong@huawei.com> --- > (In reply to Rusty Russell from comment #7) >> Rusty Russell <rusty@rustcorp.com.au> writes: >> > And there are other places with the same issue. This is a more >> > complex, but I think worth it (actually two patches, rolled into one >> > for testing): >> >> And this one actually works... >> >> diff --git a/include/linux/module.h b/include/linux/module.h >> index 4560d8f1545d..2bb0c3085706 100644 >> --- a/include/linux/module.h >> +++ b/include/linux/module.h >> @@ -324,6 +324,12 @@ struct module_layout { >> #define __module_layout_align >> #endif >> >> +struct mod_kallsyms { >> + Elf_Sym *symtab; >> + unsigned int num_symtab; >> + char *strtab; >> +}; >> + >> struct module { >> enum module_state state; >> >> @@ -405,15 +411,10 @@ struct module { >> #endif >> >> #ifdef CONFIG_KALLSYMS >> - /* >> - * We keep the symbol and string tables for kallsyms. >> - * The core_* fields below are temporary, loader-only (they >> - * could really be discarded after module init). >> - */ >> - Elf_Sym *symtab, *core_symtab; >> - unsigned int num_symtab, core_num_syms; >> - char *strtab, *core_strtab; >> - >> + /* Protected by RCU and/or module_mutex: use rcu_dereference() */ >> + struct mod_kallsyms *kallsyms; >> + struct mod_kallsyms core_kallsyms; >> + >> /* Section attributes */ >> struct module_sect_attrs *sect_attrs; >> >> diff --git a/kernel/module.c b/kernel/module.c >> index 2149f7003e49..4e4d567139f4 100644 >> --- a/kernel/module.c >> +++ b/kernel/module.c >> @@ -303,6 +303,9 @@ struct load_info { >> struct _ddebug *debug; >> unsigned int num_debug; >> bool sig_ok; >> +#ifdef CONFIG_KALLSYMS >> + unsigned long mod_kallsyms_init_off; >> +#endif >> struct { >> unsigned int sym, str, mod, vers, info, pcpu; >> } index; >> @@ -2480,10 +2483,21 @@ static void layout_symtab(struct module *mod, struct >> load_info *info) >> strsect->sh_flags |= SHF_ALLOC; >> strsect->sh_entsize = get_offset(mod, &mod->init_layout.size, strsect, >> info->index.str) | INIT_OFFSET_MASK; >> - mod->init_layout.size = debug_align(mod->init_layout.size); >> pr_debug("\t%s\n", info->secstrings + strsect->sh_name); >> + >> + /* We'll tack temporary mod_kallsyms on the end. */ >> + mod->init_layout.size = ALIGN(mod->init_layout.size, >> + __alignof__(struct mod_kallsyms)); >> + info->mod_kallsyms_init_off = mod->init_layout.size; >> + mod->init_layout.size += sizeof(struct mod_kallsyms); >> + mod->init_layout.size = debug_align(mod->init_layout.size); >> } >> >> +/* >> + * We use the full symtab and strtab which layout_symtab arranged to >> + * be appended to the init section Later we switch to the cut-down >> + * core-only one. >> + */ >> static void add_kallsyms(struct module *mod, const struct load_info *info) >> { >> unsigned int i, ndst; >> @@ -2492,29 +2506,34 @@ static void add_kallsyms(struct module *mod, const >> struct load_info *info) >> char *s; >> Elf_Shdr *symsec = &info->sechdrs[info->index.sym]; >> >> - mod->symtab = (void *)symsec->sh_addr; >> - mod->num_symtab = symsec->sh_size / sizeof(Elf_Sym); >> + /* Set up to point into init section. */ >> + mod->kallsyms = mod->init_layout.base + info->mod_kallsyms_init_off; >> + >> + mod->kallsyms->symtab = (void *)symsec->sh_addr; >> + mod->kallsyms->num_symtab = symsec->sh_size / sizeof(Elf_Sym); >> /* Make sure we get permanent strtab: don't use info->strtab. */ >> - mod->strtab = (void *)info->sechdrs[info->index.str].sh_addr; >> + mod->kallsyms->strtab = (void *)info->sechdrs[info->index.str].sh_addr; >> >> /* Set types up while we still have access to sections. */ >> - for (i = 0; i < mod->num_symtab; i++) >> - mod->symtab[i].st_info = elf_type(&mod->symtab[i], info); >> - >> - mod->core_symtab = dst = mod->core_layout.base + info->symoffs; >> - mod->core_strtab = s = mod->core_layout.base + info->stroffs; >> - src = mod->symtab; >> - for (ndst = i = 0; i < mod->num_symtab; i++) { >> + for (i = 0; i < mod->kallsyms->num_symtab; i++) >> + mod->kallsyms->symtab[i].st_info >> + = elf_type(&mod->kallsyms->symtab[i], info); >> + >> + /* Now populate the cut down core kallsyms for after init. */ >> + mod->core_kallsyms.symtab = dst = mod->core_layout.base + >> info->symoffs; >> + mod->core_kallsyms.strtab = s = mod->core_layout.base + info->stroffs; >> + src = mod->kallsyms->symtab; >> + for (ndst = i = 0; i < mod->kallsyms->num_symtab; i++) { >> if (i == 0 || >> is_core_symbol(src+i, info->sechdrs, info->hdr->e_shnum, >> info->index.pcpu)) { >> dst[ndst] = src[i]; >> - dst[ndst++].st_name = s - mod->core_strtab; >> - s += strlcpy(s, &mod->strtab[src[i].st_name], >> + dst[ndst++].st_name = s - mod->core_kallsyms.strtab; >> + s += strlcpy(s, &mod->kallsyms->strtab[src[i].st_name], >> KSYM_NAME_LEN) + 1; >> } >> } >> - mod->core_num_syms = ndst; >> + mod->core_kallsyms.num_symtab = ndst; >> } >> #else >> static inline void layout_symtab(struct module *mod, struct load_info >> *info) >> @@ -3263,9 +3282,8 @@ static noinline int do_init_module(struct module *mod) >> module_put(mod); >> trim_init_extable(mod); >> #ifdef CONFIG_KALLSYMS >> - mod->num_symtab = mod->core_num_syms; >> - mod->symtab = mod->core_symtab; >> - mod->strtab = mod->core_strtab; >> + /* Switch to core kallsyms now init is done. */ >> + rcu_assign_pointer(mod->kallsyms, &mod->core_kallsyms); >> #endif >> mod_tree_remove_init(mod); >> disable_ro_nx(&mod->init_layout); >> @@ -3627,6 +3645,11 @@ static inline int is_arm_mapping_symbol(const char >> *str) >> && (str[2] == '\0' || str[2] == '.'); >> } >> >> +static const char *symname(struct mod_kallsyms *kallsyms, unsigned int >> symnum) >> +{ >> + return kallsyms->strtab + kallsyms->symtab[symnum].st_name; >> +} >> + >> static const char *get_ksymbol(struct module *mod, >> unsigned long addr, >> unsigned long *size, >> @@ -3634,6 +3657,7 @@ static const char *get_ksymbol(struct module *mod, >> { >> unsigned int i, best = 0; >> unsigned long nextval; >> + struct mod_kallsyms *kallsyms = rcu_dereference_sched(mod->kallsyms); >> >> /* At worse, next value is at end of module */ >> if (within_module_init(addr, mod)) >> @@ -3643,32 +3667,32 @@ static const char *get_ksymbol(struct module *mod, >> >> /* Scan for closest preceding symbol, and next symbol. (ELF >> starts real symbols at 1). */ >> - for (i = 1; i < mod->num_symtab; i++) { >> - if (mod->symtab[i].st_shndx == SHN_UNDEF) >> + for (i = 1; i < kallsyms->num_symtab; i++) { >> + if (kallsyms->symtab[i].st_shndx == SHN_UNDEF) >> continue; >> >> /* We ignore unnamed symbols: they're uninformative >> * and inserted at a whim. */ >> - if (mod->symtab[i].st_value <= addr >> - && mod->symtab[i].st_value > mod->symtab[best].st_value >> - && *(mod->strtab + mod->symtab[i].st_name) != '\0' >> - && !is_arm_mapping_symbol(mod->strtab + >> mod->symtab[i].st_name)) >> + if (*symname(kallsyms, i) == '\0' >> + || is_arm_mapping_symbol(symname(kallsyms, i))) >> + continue; >> + >> + if (kallsyms->symtab[i].st_value <= addr >> + && kallsyms->symtab[i].st_value > >> kallsyms->symtab[best].st_value) >> best = i; >> - if (mod->symtab[i].st_value > addr >> - && mod->symtab[i].st_value < nextval >> - && *(mod->strtab + mod->symtab[i].st_name) != '\0' >> - && !is_arm_mapping_symbol(mod->strtab + >> mod->symtab[i].st_name)) >> - nextval = mod->symtab[i].st_value; >> + if (kallsyms->symtab[i].st_value > addr >> + && kallsyms->symtab[i].st_value < nextval) >> + nextval = kallsyms->symtab[i].st_value; >> } >> >> if (!best) >> return NULL; >> >> if (size) >> - *size = nextval - mod->symtab[best].st_value; >> + *size = nextval - kallsyms->symtab[best].st_value; >> if (offset) >> - *offset = addr - mod->symtab[best].st_value; >> - return mod->strtab + mod->symtab[best].st_name; >> + *offset = addr - kallsyms->symtab[best].st_value; >> + return symname(kallsyms, best); >> } >> >> /* For kallsyms to ask for address resolution. NULL means not found. >> Careful >> @@ -3758,19 +3782,21 @@ int module_get_kallsym(unsigned int symnum, unsigned >> long *value, char *type, >> >> preempt_disable(); >> list_for_each_entry_rcu(mod, &modules, list) { >> + struct mod_kallsyms *kallsyms; >> + >> if (mod->state == MODULE_STATE_UNFORMED) >> continue; >> - if (symnum < mod->num_symtab) { >> - *value = mod->symtab[symnum].st_value; >> - *type = mod->symtab[symnum].st_info; >> - strlcpy(name, mod->strtab + >> mod->symtab[symnum].st_name, >> - KSYM_NAME_LEN); >> + kallsyms = rcu_dereference_sched(mod->kallsyms); >> + if (symnum < kallsyms->num_symtab) { >> + *value = kallsyms->symtab[symnum].st_value; >> + *type = kallsyms->symtab[symnum].st_info; >> + strlcpy(name, symname(kallsyms, symnum), >> KSYM_NAME_LEN); >> strlcpy(module_name, mod->name, MODULE_NAME_LEN); >> *exported = is_exported(name, *value, mod); >> preempt_enable(); >> return 0; >> } >> - symnum -= mod->num_symtab; >> + symnum -= kallsyms->num_symtab; >> } >> preempt_enable(); >> return -ERANGE; >> @@ -3779,11 +3805,12 @@ int module_get_kallsym(unsigned int symnum, unsigned >> long *value, char *type, >> static unsigned long mod_find_symname(struct module *mod, const char *name) >> { >> unsigned int i; >> + struct mod_kallsyms *kallsyms = rcu_dereference_sched(mod->kallsyms); >> >> - for (i = 0; i < mod->num_symtab; i++) >> - if (strcmp(name, mod->strtab+mod->symtab[i].st_name) == 0 && >> - mod->symtab[i].st_info != 'U') >> - return mod->symtab[i].st_value; >> + for (i = 0; i < kallsyms->num_symtab; i++) >> + if (strcmp(name, symname(kallsyms, i)) == 0 && >> + kallsyms->symtab[i].st_info != 'U') >> + return kallsyms->symtab[i].st_value; >> return 0; >> } >> >> @@ -3822,11 +3849,14 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, >> const char *, >> module_assert_mutex(); >> >> list_for_each_entry(mod, &modules, list) { >> + /* We hold module_mutex: no need for rcu_dereference_sched */ >> + struct mod_kallsyms *kallsyms = mod->kallsyms; >> + >> if (mod->state == MODULE_STATE_UNFORMED) >> continue; >> - for (i = 0; i < mod->num_symtab; i++) { >> - ret = fn(data, mod->strtab + mod->symtab[i].st_name, >> - mod, mod->symtab[i].st_value); >> + for (i = 0; i < kallsyms->num_symtab; i++) { >> + ret = fn(data, symname(kallsyms, i), >> + mod, kallsyms->symtab[i].st_value); >> if (ret != 0) >> return ret; >> } > > Test ok. > But its really a big patch. > The lts kernels such as 3.10 also have the problem, this patch seems not work > for them, as some code are different. Yes :( Thankyou for testing! I will push this with CC: stable, and do the backports myself as required. Thanks, Rusty.
See commit 8244062ef1e545 ("modules: fix longstanding /proc/kallsyms vs module insertion race").