Bug 11693
Summary: | ASUS P5QL Pro: Early kernel exception in dmi_table | ||
---|---|---|---|
Product: | Platform Specific/Hardware | Reporter: | Felix Geyer (debfx-kernel) |
Component: | i386 | Assignee: | platform_i386 |
Status: | CLOSED CODE_FIX | ||
Severity: | high | CC: | alan, ijc, mingo |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.25-rc1 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
Kernel 2.6.27 booted with early_ioremap_debug
Output of biosdecode Debug output of while (nrpages > 0) |
Description
Felix Geyer
2008-10-03 14:05:54 UTC
This implies dmi_table is touching a fixmap entry which is no longer valid. The bug patched above exposes the bug, since it would previously fail to reset the fixmap properly. Note: although this is probably a regression, it is not caused by the fingered commit. Rather, the fingered commit fixes another bug which was previously masking this bug. Created attachment 18177 [details]
Kernel 2.6.27 booted with early_ioremap_debug
Created attachment 18178 [details]
Output of biosdecode
I tracked down the crash to the specific line (in drivers/firmware/dmi_scan.c): while ((data - buf < len - 1) && (data[0] || data[1])) The kernel crashes when reading the last byte of the dmi table. To be more specific: it happens when reading data[1] while i=61 (num=62). Please let me know if you need any more debug information. Can you add a printk to the dmi_table loop and print out buf, data, dm->length, len just before the while loop and see what values you get on the crash case. I can several candidates for this bug (incorrect mapping, going one byte off the end, bogus ROM data - eg length going negative) and it would help to know which buf: 4293134096 data: 4293136383 dm->length: 4 len: 2289 Ok so the mapping ends at 385, we are at 383 Interestingly that means the map ends at 0xFFE41001, the last valid byte being the first byte of a page - that makes this look like an off by one error in the mapping code We know the ioremap size is 2289 because the two callers pass the same size value down, so it implies an off by one in early_ioremap. If you stick a +1 in the call to dmi_ioremap does the world miraculously improve ? (I collided mid-air with Alan posting this and it's basically the same but here it anyway) So in hex those are buf: 0xffe40710 data: 0xffe40fff dm->length: 4 len: 0x8f1 so the tables are 0xffe40710 - 0xffe41001 and the last data is 0xffe40fff - 0xffe41003 I'm not sure if 0xffe41001 is inclusive or not but either way the tables cover 0xffe41000 which means we should have mapped this page but seem to have only mapped up to 0xffe40fff. The crash address isn't included for the most recent examples but I guess the fault is at 0xffe41000 (the original screenshot shows 0xffb41000 but that was a different kernel). It would be useful to annotate the "while (nrpages > 0) {" loop in early_ioremap to confirm which mappings are being created. It also seems to be a bug that len doesn't quite cover the last entry but without the first bug the table's authors get away with it because the mapping would be up to 0xffe42000. early_ioremap has last_addr = phys_addr + size - 1; .... size = PAGE_ALIGN(last_addr) - phys_addr; so last_addr will be 0xffe41000 which get's page aligned to ... 0xffe41000, and bang. Right fix is to just drop the -1? Created attachment 18229 [details] Debug output of while (nrpages > 0) Both patches fix the crash bug: diff --git a/drivers/firmware/dmi_scan.c b/drivers/firmware/dmi_scan.c @@ -98,7 +98,7 @@ static int __init dmi_walk_early(void (*decode)(const struct dmi_header *)) { u8 *buf; - buf = dmi_ioremap(dmi_base, dmi_len); + buf = dmi_ioremap(dmi_base, dmi_len+1); if (buf == NULL) return -1; diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c @@ -132,7 +132,7 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr, void __iomem *ret_addr; /* Don't allow wraparound or zero size */ - last_addr = phys_addr + size - 1; + last_addr = phys_addr + size; if (!size || last_addr < phys_addr) return NULL; > The crash address isn't included for the most recent examples but I guess the > fault is at 0xffe41000 Yes, it's at 0xffe41000 I captured the attached screenshot using this patch and enabled debug_early_ioremap: diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c @@ -612,6 +612,7 @@ void __init *early_ioremap(unsigned long phys_addr, unsigned long size) idx0 = FIX_BTMAP_BEGIN - NR_FIX_BTMAPS*nesting; idx = idx0; while (nrpages > 0) { + printk(KERN_INFO "nrp: %08x idx: %08x fix(idx): %08lx phys_a: %08lx\n", nrpages, idx, fix_to_virt(idx), phys_addr); early_set_fixmap(idx, phys_addr); phys_addr += PAGE_SIZE; --idx; Right fix I think is a + 1 when computing the size. Patch posted Thanks for your work! Is the patch going to be in 2.6.28 and 2.6.27.1? Should be its in the post 2.6.27 tree now. Thanks for the report |