Bug 13012
| Summary: | 2.6.28.9 causes init to segfault on Debian etch; 2.6.28.8 OK | | |
|---|---|---|---|
| Product: | Other | Reporter: | Barry K. Nathan (barryn) |
| Component: | Other | Assignee: | other_other |
| Status: | CLOSED CODE_FIX | | |
| Severity: | high | CC: | alan, barryn, ole |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.28.9 | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
| Attachments: | .config from 2.6.28.9 (same as on 2.6.28.8); kernel config: 2.6.27.27 - hangs when compiled with gcc-4.2 | | |
Description
Barry K. Nathan
2009-04-05 13:04:01 UTC
Created attachment 20816 [details]
.config from 2.6.28.9 (same as on 2.6.28.8)
It looks like vanilla gcc 4.1.2 doesn't work either:

gcc (GCC) 4.1.2

So, on my Debian etch system anyway, the -fwrapv change means I can no longer compile 2.6.28.y with gcc 4.1.x, for *all* values of x, or it won't boot. Hmmm...

I tried disabling the following two options (which were enabled in the .config I attached) but it made no difference (with gcc 4.1.2), the problem still occurred:

CONFIG_CC_OPTIMIZE_FOR_SIZE
CONFIG_OPTIMIZE_INLINING

BTW, root filesystem is reiserfs (3.6) in case that is at all relevant.

(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

-fwrapv killed Barry's gcc-4.1.2-compiled kernel in 2.6.27.x, 2.6.28.x and presumably 2.6.29, 2.6.30.

On Sun, 5 Apr 2009 13:04:02 GMT bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=13012
>
> Summary: 2.6.28.9 causes init to segfault on Debian etch;
>          2.6.28.8 OK
> Product: Other
> Version: 2.5
> Kernel Version: 2.6.28.9
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: high
> Priority: P1
> Component: Other
> AssignedTo: other_other@kernel-bugs.osdl.org
> ReportedBy: barryn@pobox.com
> Regression: Yes
>
>
> (I haven't had a chance to collect all that data that I wanted to collect
> before reporting this bug, but this might be the best I can do for the next
> few days, so I think it's best for me to submit what I have so far instead
> of sitting on it.)
>
> The system in question is a Debian etch system which has a static /dev (no
> udev) and does not use an initrd or initramfs. With 2.6.28.8, it works fine.
> With 2.6.28.9, init segfaults during boot, so the system fails to finish
> booting. (Same problem happens on 2.6.27.21 and does not happen on
> 2.6.27.20, but I have not had time to investigate that version in depth.)
>
> git bisect says "92db6956ecd01ceb7934be0252b3b184a82ebb64 is first bad
> commit". That's "Add '-fwrapv' to gcc CFLAGS". If I take 2.6.28.9 and I
> revert "Move cc-option to below arch-specific setup" and "Add '-fwrapv' to
> gcc CFLAGS" then the resulting kernel works again.
>
> I also noticed that the problem only happens with some gcc's:
>
> Problem occurs:
> gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
> gcc-4.1 (GCC) 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)
>
> Problem does not occur (i.e. 2.6.28.9 works and I don't have to revert
> anything):
> gcc-4.2 (GCC) 4.2.4 (Debian 4.2.4-6)
> gcc (Debian 4.3.2-1.1) 4.3.2
>
> (BTW, on 2.6.29, I'm having a problem where if I'm running 2.6.29 then make
> instantly blows up in a strange way, and the same pattern of working/failing
> compilers is showing up there too.)
>
> I have not yet had a chance to try vanilla gcc 4.1.2. That's what I intend
> to try next.
>
> I'll attach my .config within the next 24 hours, and I'll try to post serial
> console captures by sometime Tuesday or so. If there's any other information
> which would help, feel free to ask...
>
> My gut feeling is that this is truly a gcc bug rather than a kernel bug, but
> it's still disconcerting to see this kind of breakage suddenly pop up in a
> -stable series, so IMO it may be worth considering reverting the -fwrapv
> changes for 2.6.2[78].y.

On Thu, 9 Apr 2009, Andrew Morton wrote:
>
> -fwrapv killed Barry's gcc-4.1.2-compiled kernel in 2.6.27.x, 2.6.28.x
> and presumably 2.6.29, 2.6.30.

Auughh. I hate compiler bugs. They're horrible to debug.

I _think_ 'fwrapv' only really matters with gcc-4.3, so maybe we could just
enable it for new versions.
HOWEVER, I also wonder if we could instead of "-fwrapv" use
"-fno-strict-overflow". They are apparently subtly different, and maybe the
bug literally only happens with -fwrapv.

Barry, can you see if that simple "replace -fwrapv with -fno-strict-overflow"
works for you?

Or just go with Barry's helpful debugging:

> > I also noticed that the problem only happens with some gcc's:
> >
> > Problem occurs:
> > gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
> > gcc-4.1 (GCC) 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)
> >
> > Problem does not occur (i.e. 2.6.28.9 works and I don't have to revert
> > anything):
> > gcc-4.2 (GCC) 4.2.4 (Debian 4.2.4-6)
> > gcc (Debian 4.3.2-1.1) 4.3.2

and consider 4.2 to be the point where it's ok.

Do we have some gcc developer who
(a) knows what the rules are
and
(b) might even help us figure out where the bug occurs?

Linus

On Thu, 9 Apr 2009, Linus Torvalds wrote:
> On Thu, 9 Apr 2009, Andrew Morton wrote:
> > -fwrapv killed Barry's gcc-4.1.2-compiled kernel in 2.6.27.x,
> > 2.6.28.x and presumably 2.6.29, 2.6.30.
>
> Auughh. I hate compiler bugs. They're horrible to debug.
>
> I _think_ 'fwrapv' only really matters with gcc-4.3, so maybe we could
> just enable it for new versions.
>
> HOWEVER, I also wonder if we could instead of "-fwrapv" use
> "-fno-strict-overflow". They are apparently subtly different, and maybe
> the bug literally only happens with -fwrapv.
>
> Barry, can you see if that simple "replace -fwrapv with
> -fno-strict-overflow" works for you?
>
> Or just go with Barry's helpful debugging:
> > > I also noticed that the problem only happens with some gcc's:
> > >
> > > Problem occurs:
> > > gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
> > > gcc-4.1 (GCC) 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)
> > >
> > > Problem does not occur (i.e. 2.6.28.9 works and I don't have to
> > > revert anything):
> > > gcc-4.2 (GCC) 4.2.4 (Debian 4.2.4-6)
> > > gcc (Debian 4.3.2-1.1) 4.3.2
>
> and consider 4.2 to be the point where it's ok.
>
> Do we have some gcc developer who
> (a) knows what the rules are
> and
> (b) might even help us figure out where the bug occurs?

The discussion on this issue looks to have died, but it has bitten Debian
stable ("Lenny") [1] as it causes init to die on s390 after a kernel update.

Here's a possible patch. The exact gcc version to check for is still a bit
open I guess. For the s390 issue I've confirmed that 4.2.4 is OK, but for
safety and because of Andrew's comment above I've set the test for 4.3 in
the patch.

Cheers,
FJP

[1] http://bugs.debian.org/536354

---
From: Frans Pop <elendil@planet.nl>
Subject: Only add '-fwrapv' to gcc CFLAGS for gcc 4.3 and later

This flag has been shown to cause init to segfault for kernels compiled with
gcc-4.1. gcc version 4.2.4 has been shown to be OK, but as there is some
uncertainty the flag is only added for 4.3 and later.

This fixes http://bugzilla.kernel.org/show_bug.cgi?id=13012.

Reported-by: Barry K. Nathan <barryn@pobox.com>
Signed-off-by: Frans Pop <elendil@planet.nl>

diff --git a/Makefile b/Makefile
index 0aeec59..2f8756e 100644
--- a/Makefile
+++ b/Makefile
@@ -565,7 +565,8 @@ KBUILD_CFLAGS += $(call cc-option,-Wdeclaration-after-statement,)
 KBUILD_CFLAGS += $(call cc-option,-Wno-pointer-sign,)
 
 # disable invalid "can't wrap" optimizations for signed / pointers
-KBUILD_CFLAGS += $(call cc-option,-fwrapv)
+KBUILD_CFLAGS += $(shell if [ $(call cc-version) -ge 0430 ]; then \
+	echo $(call cc-option,-fwrapv); fi ;)
 
 # revert to pre-gcc-4.4 behaviour of .eh_frame
 KBUILD_CFLAGS += $(call cc-option,-fno-dwarf2-cfi-asm)

On Friday 10 July 2009, Frans Pop wrote:
> On Thu, 9 Apr 2009, Linus Torvalds wrote:
> > On Thu, 9 Apr 2009, Andrew Morton wrote:
> > > -fwrapv killed Barry's gcc-4.1.2-compiled kernel in 2.6.27.x,
> > > 2.6.28.x and presumably 2.6.29, 2.6.30.
> >
> > Auughh. I hate compiler bugs. They're horrible to debug.
> >
> > I _think_ 'fwrapv' only really matters with gcc-4.3, so maybe we
> > could just enable it for new versions.
> >
> > HOWEVER, I also wonder if we could instead of "-fwrapv" use
> > "-fno-strict-overflow". They are apparently subtly different, and
> > maybe the bug literally only happens with -fwrapv.
> >
> > Barry, can you see if that simple "replace -fwrapv with
> > -fno-strict-overflow" works for you?
Prompted by the same suggestion from Ben Hutchings I checked this too,
but -fno-strict-overflow was only introduced in gcc 4.2.
So using it instead of -fwrapv *would* fix the problem for gcc 4.1, but
*only* because it would effectively do the same as the patch I proposed:
not add an option at all for gcc 4.1.
So that change seems illogical unless there are other reasons to
prefer -fno-strict-overflow over -fwrapv (well, it would avoid the
gcc version check).
It does however make it somewhat more logical to change the test in my
proposed patch to also allow -fwrapv for gcc 4.2.
Cheers,
FJP
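(Aside, not part of the original thread: the reason any of these flags matter is that signed integer overflow is undefined behaviour in C, so an optimizing gcc is allowed to assume it never happens and to delete wrap-around checks written with signed types. A minimal sketch of the effect follows; the function name and program are made up for illustration only.)

/* Sketch only.  For signed x, "x + 1 < x" can only be true if x + 1
 * overflows.  Because signed overflow is undefined, gcc at -O2 may fold
 * the comparison to 0 (this is the situation -Wstrict-overflow reports as
 * "assuming signed overflow does not occur").  -fwrapv defines the
 * overflow as two's-complement wrapping, and -fno-strict-overflow stops
 * gcc from assuming it cannot happen, so with either flag the check
 * survives. */
#include <limits.h>
#include <stdio.h>

static int will_wrap(int x)
{
	return x + 1 < x;
}

int main(void)
{
	/* Expected: 1 with -fwrapv or -fno-strict-overflow; an optimized
	 * build without either flag may print 0. */
	printf("%d\n", will_wrap(INT_MAX));
	return 0;
}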
On Friday 10 July 2009, Frans Pop wrote:
> The discussion on this issue looks to have died, but it has bitten Debian
> stable ("Lenny") [1] as it causes init to die on s390 after a kernel
> update.
>
> Here's a possible patch. The exact gcc version to check for is still a
> bit open I guess. For the s390 issue I've confirmed that 4.2.4 is OK,
> but for safety and because of Andrew's comment above I've set the test
> for 4.3 in the patch.

Here's an updated patch as I found the gcc version check was incorrect (0430
should have been 0403; sorry). I've now changed the check to allow -fwrapv
for gcc 4.2 as that has been shown to work and because of the consideration
mentioned in my previous mail.

---
From: Frans Pop <elendil@planet.nl>
Subject: Only add '-fwrapv' to gcc CFLAGS for gcc 4.2 and later

This flag has been shown to cause init to segfault for kernels compiled with
gcc-4.1. gcc version 4.2.4 has been shown to be OK.

This fixes http://bugzilla.kernel.org/show_bug.cgi?id=13012.

Reported-by: Barry K. Nathan <barryn@pobox.com>
Signed-off-by: Frans Pop <elendil@planet.nl>

diff --git a/Makefile b/Makefile
index 0aeec59..2519fde 100644
--- a/Makefile
+++ b/Makefile
@@ -565,7 +565,8 @@ KBUILD_CFLAGS += $(call cc-option,-Wdeclaration-after-statement,)
 KBUILD_CFLAGS += $(call cc-option,-Wno-pointer-sign,)
 
 # disable invalid "can't wrap" optimizations for signed / pointers
-KBUILD_CFLAGS += $(call cc-option,-fwrapv)
+KBUILD_CFLAGS += $(shell if [ $(call cc-version) -ge 0402 ]; then \
+	echo $(call cc-option,-fwrapv); fi ;)
 
 # revert to pre-gcc-4.4 behaviour of .eh_frame
 KBUILD_CFLAGS += $(call cc-option,-fno-dwarf2-cfi-asm)

On Fri, 10 Jul 2009, Frans Pop wrote:
>
> Prompted by the same suggestion from Ben Hutchings I checked this too,
> but -fno-strict-overflow was only introduced in gcc 4.2.
> So using it instead of -fwrapv *would* fix the problem for gcc 4.1, but
> *only* because it would effectively do the same as the patch I proposed:
> not add an option at all for gcc 4.1.
>
> So that change seems illogical unless there are other reasons to
> prefer -fno-strict-overflow over -fwrapv (well, it would avoid the
> gcc version check).
>
> It does however make it somewhat more logical to change the test in my
> proposed patch to also allow -fwrapv for gcc 4.2.

Hmm. It all really makes me suspect that we should really be using
-fno-strict-overflow instead.

That not only apparently avoids the unnecessary gcc version check (by virtue
of the option only existing in compilers that don't have the problem), but
quite frankly, one of the core people involved with the whole thing (Ian
Lance Taylor) seems to think it's the better option. See for example
http://www.airs.com/blog/archives/120 on how gcc actually generates better
code with -fno-strict-overflow.

I added Ian to the cc. Ian: we generally do try to be careful and use
explicit unsigned types for code that cares about overflow, but we use
-fwrapv because there have been some cases where we didn't (and used pointer
comparisons or signed integers).

The problem is that apparently gcc-4.1.x was literally generating buggy code
with -fwrapv.
So now the choice for us is between switching to an explicit version test:

-KBUILD_CFLAGS += $(call cc-option,-fwrapv)
+KBUILD_CFLAGS += $(shell if [ $(call cc-version) -ge 0402 ]; then \
+	echo $(call cc-option,-fwrapv); fi ;)

or just switching to -fno-strict-overflow instead:

-KBUILD_CFLAGS += $(call cc-option,-fwrapv)
+KBUILD_CFLAGS += $(call cc-option,-fno-strict-overflow)

which avoids the buggy gcc versions because it's simply not even supported by
gcc-4.1.x (and even if that wasn't the case, possibly because only 'wrapv' is
the problematic case - apparently the difference _does_ matter to gcc).

From everything I have been able to find, I really prefer the second version.
Not only is the patch cleaner, but it looks like code generation is better
too (for some inexplicable reason, but I suspect it's because
-fno-strict-overflow is just saner).

Linus

On Sun, 12 Jul 2009, Linus Torvalds wrote:
>
> From everything I have been able to find, I really prefer the second
> version. Not only is the patch cleaner, but it looks like code generation
> is better too (for some inexplicable reason, but I suspect it's because
> -fno-strict-overflow is just saner).
Hmm. I just checked. The file that caused us to do this thing in the first
place (fs/open.c, around line 415, which does:
/* Check for wrap through zero too */
if (((offset + len) > inode->i_sb->s_maxbytes) || ((offset + len) < 0))
goto out_fput;
to check that the resulting 'loff_t' type is all good) has interesting
behaviour with my version of gcc (gcc version 4.4.0 20090506 (Red Hat
4.4.0-4) (GCC)).
- Without any options:
leaq (%rcx,%rdx), %rdi #, tmp73
movq 256(%rbx), %rsi # <variable>.i_sb, <variable>.i_sb
movl $-27, %eax #, D.29131
cmpq 40(%rsi), %rdi # <variable>.s_maxbytes, tmp73
ja .L148 #,
- With -fno-strict-overflow:
leaq (%rcx,%rdx), %rax #, D.29157
movq 256(%rbx), %rsi # <variable>.i_sb, <variable>.i_sb
cmpq 40(%rsi), %rax # <variable>.s_maxbytes, D.29157
ja .L154 #,
testq %rax, %rax # D.29157
js .L154 #,
- With -fwrapv:
leaq (%rcx,%rdx), %rax #, D.29158
movq 256(%rbx), %rsi # <variable>.i_sb, <variable>.i_sb
cmpq 40(%rsi), %rax # <variable>.s_maxbytes, D.29158
ja .L154 #,
testq %rax, %rax # D.29158
js .L154 #,
and from this it would look like:
- gcc on its own is actually the best version (the first comparison is
unsigned because s_maxbytes is actually 'unsigned long long', so it
actually does the right thing!)
In other words, the whole '< 0' was unnecessary, but does make the
source code way more readable, and makes the source code _correct_
regardless of any type issues!
- From a cursory inspection, -fno-strict-overflow and -fwrapv are both
equivalent in this code, and both do the stupid thing (but for good
reason - gcc doesn't know that 's_maxbytes' might not be 'negative in a
loff_t', so technically speaking the extraneous 'js' is not
extraneous, because it can actually trigger some "more negative"
entries than s_maxbytes is.)
- HOWEVER:
[torvalds@nehalem ~]$ git diff --stat open.s open.s-fno-strict-overflow
open.s => open.s-fno-strict-overflow | 22 +++++++++++++---------
1 files changed, 13 insertions(+), 9 deletions(-)
[torvalds@nehalem ~]$ git diff --stat open.s open.s-fwrapv
open.s => open.s-fwrapv | 296 ++++++++++++++++++++++++-----------------------
1 files changed, 150 insertions(+), 146 deletions(-)
where the _only_ difference that '-fno-strict-overflow' introduces is
that one small area (it's saying 22 lines changed, but that's because
there's also the compiler option listing at the top of the file etc)
In contrast, -fwrapv has done a lot of other changes too. Now, in both
cases, it really only added the same four instructions (testq + js +
branchtarget + jumparound).
It looks like 'fwrapv' generates more temporaries (possibly for the code
that tries to enforce the exact two's-complement behavior) that then all
get optimized back out again. The differences seem to be in the temporary
variable numbers etc, not in the actual code.
So fwrapv really _is_ different from fno-strict-overflow, and disturbs the
code generation more.
IOW, I'm convinced we should never use fwrapv. It's clearly a buggy piece
of sh*t, as shown by our 4.1.x experiences. We should use
-fno-strict-overflow.
Will commit the following (which also fits naming-wise with our use of
'-fno-strict-aliasing').
Linus
---
Makefile | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/Makefile b/Makefile
index 0aeec59..bbe8453 100644
--- a/Makefile
+++ b/Makefile
@@ -565,7 +565,7 @@ KBUILD_CFLAGS += $(call cc-option,-Wdeclaration-after-statement,)
KBUILD_CFLAGS += $(call cc-option,-Wno-pointer-sign,)
# disable invalid "can't wrap" optimizations for signed / pointers
-KBUILD_CFLAGS += $(call cc-option,-fwrapv)
+KBUILD_CFLAGS += $(call cc-option,-fno-strict-overflow)
# revert to pre-gcc-4.4 behaviour of .eh_frame
KBUILD_CFLAGS += $(call cc-option,-fno-dwarf2-cfi-asm)
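(Aside, not from the thread: as Linus notes above, the flag-free gcc output for the fs/open.c check was already correct because s_maxbytes is unsigned, so the unsigned comparison catches a wrapped sum on its own. The same idea can be applied deliberately to make such range checks independent of any overflow flag. A hedged sketch with made-up names follows; it is not the kernel's actual code.)

/* Sketch only.  Negative inputs are rejected first; the sum of two
 * non-negative 64-bit values, each below 2^63, is at most 2^64 - 2, so
 * doing the addition in uint64_t involves no overflow at all and the
 * comparison against the unsigned limit is exact.  No -fwrapv or
 * -fno-strict-overflow is needed for this to be well defined. */
#include <stdint.h>

static int range_ok(int64_t offset, int64_t len, uint64_t maxbytes)
{
	if (offset < 0 || len < 0)
		return 0;
	return (uint64_t)offset + (uint64_t)len <= maxbytes;
}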
Reply-To: iant@google.com

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Ian: we generally do try to be careful and use explicit unsigned types for
> code that cares about overflow, but we use -fwrapv because there have been
> some cases where we didn't (and used pointer comparisons or signed
> integers).
>
> The problem is that apparently gcc-4.1.x was literally generating buggy
> code with -fwrapv. So now the choice for us is between switching to an
> explicit version test:
>
> -KBUILD_CFLAGS += $(call cc-option,-fwrapv)
> +KBUILD_CFLAGS += $(shell if [ $(call cc-version) -ge 0402 ]; then \
> +	echo $(call cc-option,-fwrapv); fi ;)
>
> or just switching to -fno-strict-overflow instead:
>
> -KBUILD_CFLAGS += $(call cc-option,-fwrapv)
> +KBUILD_CFLAGS += $(call cc-option,-fno-strict-overflow)
>
> which avoids the buggy gcc versions because it's simply not even supported
> by gcc-4.1.x (and even if that wasn't the case, possibly because only
> 'wrapv' is the problematic case - apparently the difference _does_
> matter to gcc).

My instinctive advice is that y'all should track down and fix the cases where
the program relies on signed overflow being defined. However, if that is
difficult--and it is--then I agree that -fno-strict-overflow is preferable
when using a compiler which supports it (gcc 4.2.0 and later).

(The gcc 4.2 and later option -Wstrict-overflow=N can help find the cases
where a program relies on defined signed overflow, but only if somebody is
patient enough to wade through all the false positives.)

Ian

OK, this patch is in 2.6.27.27 and now gcc-4.2.4 produces an unbootable
kernel on both i386 and x86-64 for my Dell PowerEdge 1950 servers. Reverting
this patch:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6-stable.git;a=commitdiff;h=7d317ec29c587f76bbfa375ef3b832204f5bffe5

solves my problem. The same problem exists if 2.6.28.10 is manually patched
with this "fix".

Bootable kernel (2.6.28.10 / 2.6.27.25 / 2.6.27.27 + 7d317ec29c587f76bbfa375ef3b832204f5bffe5 revert):

(...)
radeonfb 0000:0e:0d.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
radeonfb: Found Intel x86 BIOS ROM Image
radeonfb: Retrieved PLL infos from BIOS
radeonfb: Reference=27.00 MHz (RefDiv=12) Memory=200.00 Mhz, System=200.00 MHz
radeonfb: PLL min 20000 max 50000
Switched to high resolution mode on CPU 2
Switched to high resolution mode on CPU 3
Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 0
radeonfb: Monitor 1 type CRT found
radeonfb: EDID probed
radeonfb: Monitor 2 type no found
Console: switching to colour frame buffer device 128x48
radeonfb (0000:0e:0d.0): ATI Radeon 515e "Q^"
(...)

Unbootable kernel (2.6.28.10 + 7d317ec29c587f76bbfa375ef3b832204f5bffe5 / 2.6.27.27):

(...)
radeonfb 0000:0e:0d.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
radeonfb: Found Intel x86 BIOS ROM Image
radeonfb: Retrieved PLL infos from BIOS
radeonfb: Reference=27.00 MHz (RefDiv=12) Memory=200.00 Mhz, System=200.00 MHz
radeonfb: PLL min 20000 max 50000
radeonfb: Monitor 1 type CRT found
radeonfb: EDID probed
radeonfb: Monitor 2 type no found
(HANG)

So instead of gcc-4.1+wrapv hang we now have gcc-4.2+strict-overflow hang. :(

Created attachment 22412 [details]
kernel config: 2.6.27.27 - hangs when compiled with gcc-4.2
Fixed by http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3730793d457fed79a6d49bae72996d458c8e4f2d

gcc-4.2.4 BUG.

Reply-To: davej@redhat.com

On Sun, Jul 12, 2009 at 10:29:50PM -0700, Ian Lance Taylor wrote:
> (The gcc 4.2 and later option -Wstrict-overflow=N can help find the
> cases where a program relies on defined signed overflow, but only if
> somebody is patient enough to wade through all the false positives.)

I got curious and wondered just how many warnings this would trigger, so I
did a build with -Wstrict-overflow=5. I was pleasantly surprised to see that
it isn't that many from an allmodconfig build..

crypto/algboss.c:173: warning: assuming signed overflow does not occur when simplifying conditional to constant
net/ipv4/igmp.c:773: warning: assuming signed overflow does not occur when simplifying conditional to constant
drivers/ata/libata-eh.c:3076: warning: assuming signed overflow does not occur when simplifying conditional to constant
sound/core/seq/seq_queue.c:195: warning: assuming signed overflow does not occur when simplifying conditional to constant
drivers/block/ub.c:2293: warning: assuming signed overflow does not occur when simplifying conditional to constant
sound/pci/es1968.c:1545: warning: assuming signed overflow does not occur when simplifying conditional to constant
sound/pci/es1968.c:1582: warning: assuming signed overflow does not occur when simplifying conditional to constant
fs/ext3/inode.c:658: warning: assuming signed overflow does not occur when simplifying conditional to constant
fs/ext4/inode.c:797: warning: assuming signed overflow does not occur when simplifying conditional to constant
drivers/edac/amd64_edac.c:726: warning: assuming signed overflow does not occur when simplifying conditional to constant
sound/soc/codecs/wm8988.c:665: warning: assuming signed overflow does not occur when simplifying conditional to constant
net/bluetooth/cmtp/core.c:234: warning: assuming signed overflow does not occur when simplifying conditional to constant
net/mac80211/rc80211_minstrel.c:193: warning: assuming signed overflow does not occur when simplifying conditional to constant
drivers/isdn/hardware/eicon/debug.c:419: warning: assuming signed overflow does not occur when simplifying conditional to constant
drivers/uio/uio.c:707: warning: assuming signed overflow does not occur when simplifying conditional to constant
drivers/uio/uio.c:671: warning: assuming signed overflow does not occur when simplifying conditional to constant
drivers/uio/uio.c:642: warning: assuming signed overflow does not occur when simplifying conditional to constant
drivers/net/usb/hso.c:2308: warning: assuming signed overflow does not occur when simplifying conditional to constant
drivers/video/matrox/matroxfb_g450.c:214: warning: assuming signed overflow does not occur when simplifying conditional to constant
drivers/video/matrox/matroxfb_g450.c:160: warning: assuming signed overflow does not occur when simplifying conditional to constant
drivers/video/matrox/matroxfb_maven.c:1125: warning: assuming signed overflow does not occur when simplifying conditional to constant
drivers/video/matrox/matroxfb_maven.c:1044: warning: assuming signed overflow does not occur when simplifying conditional to constant
drivers/net/wireless/zd1211rw/zd_rf_uw2453.c:417: warning: assuming signed overflow does not occur when simplifying conditional to constant

Some of these appear to be obvious cases where we don't care about the sign
(loop counters for eg)

Is it worth fixing these up, with diffs like the below ?

	Dave

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 5f51fed..3bb95de 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -595,7 +595,7 @@ static int ext3_alloc_branch(handle_t *handle, struct inode *inode,
 			int *offsets, Indirect *branch)
 {
 	int blocksize = inode->i_sb->s_blocksize;
-	int i, n = 0;
+	unsigned int i, n = 0;
 	int err = 0;
 	struct buffer_head *bh;
 	int num;

On Fri, 24 Jul 2009, Dave Jones wrote:
>
> Is it worth fixing these up, with diffs like the below ?
Historically, when we fix up bugs like this, it causes more bugs than it
fixes.
It needs _really_ careful people, and people who really understand how C
type rules work. Stuff that looks obvious often is not, and basing a large
part of the patch on a compiler warning is a _very_ weak reason for
something that is more than just syntactic.
And the problem is, nobody can judge the patch from the diff. So it gets
absolutely zero review, until the day when somebody notices that the
unsigned version is a bug.
I'd suggest that the rule should be:
- if you can use -U30 and show all uses, so that people can actually look
at the patch and see what it _causes_ (ie not just the "we change 'i'
to 'unsigned'", but also the "this is where 'i' gets used, and
'unsigned' is right"), then we can apply it.
- none of the patches go through a 'trivial' tree, or come from newbies
that think this is a good way to get involved.
Is it worth it at that point? I dunno.
Linus
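(Aside, not part of the thread: a minimal made-up example of the kind of regression Linus is warning about, where a warning-driven signedness change silently breaks a correct loop.)

/* The original loop walks an array backwards; with a signed counter the
 * exit condition "i >= 0" is what terminates it. */
#include <stddef.h>

long sum_reversed(const long *buf, int n)
{
	long total = 0;
	int i;

	for (i = n - 1; i >= 0; i--)
		total += buf[i];
	return total;
}

/* If a cleanup blindly makes 'i' unsigned to silence a warning, "i >= 0"
 * is always true: when i reaches 0 the decrement wraps to UINT_MAX and the
 * loop runs off the end of the buffer. */
long sum_reversed_broken(const long *buf, int n)
{
	long total = 0;
	unsigned int i;

	for (i = n - 1; i >= 0; i--)
		total += buf[i];
	return total;
}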