Bug 13941 - x86 Geode issue
Summary: x86 Geode issue
Status: REOPENED
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: i386
Hardware: All Linux
Importance: P1 normal
Assignee: Jonathan Nieder
URL:
Keywords:
Depends on:
Blocks: 13615
Reported: 2009-08-09 21:40 UTC by Rafael J. Wysocki
Modified: 2014-09-09 12:45 UTC
CC List: 11 users

See Also:
Kernel Version: 3.5.1
Tree: Mainline
Regression: Yes


Attachments
Debug patch that shows the corruption (4.20 KB, patch)
2009-11-16 00:51 UTC, Stefan Bader
Debug v2 to v5 diff (1.59 KB, patch)
2009-11-16 00:58 UTC, Stefan Bader
kernel config (108.77 KB, application/octet-stream)
2009-12-27 20:21 UTC, Martin-Éric Racine
This is Ubuntu's 2.6.32 config. (112.79 KB, application/octet-stream)
2009-12-28 08:01 UTC, Martin-Éric Racine
dmesg -r (36.90 KB, text/plain)
2010-02-25 23:25 UTC, Martin-Éric Racine
dmesg -r on kernel 2.6.32 (29.09 KB, text/plain)
2010-03-02 00:27 UTC, Martin-Éric Racine
attachment-17854-0.html (1.71 KB, text/html)
2014-08-28 08:55 UTC, deloptes

Description Rafael J. Wysocki 2009-08-09 21:40:24 UTC
Subject    : x86 Geode issues in kernel >= 2.6.23 and >= 2.6.31-rc4
Submitter  : "Martin-Éric Racine" <q-funk@iki.fi>
Date       : 2009-08-03 12:58
References : http://marc.info/?l=linux-kernel&m=124930434732481&w=4

This entry is being used for tracking a regression from 2.6.29.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Martin-Éric Racine 2009-08-17 10:37:10 UTC
http://launchpadlibrarian.net/30267494/2.6.31-5.24.jpg shows a snapshot of the kernel panic in action.

FYI Ubuntu's 2.6.31-5.24 package is based on upstream 2.6.31-rc5.
Comment 2 Martin-Éric Racine 2009-08-17 10:57:50 UTC
http://launchpadlibrarian.net/30414918/ubuntu_kernel_2.6.31-6.25.jpg shows an updated snapshot of the kernel panic, this time against 2.6.31-rc6.

PS: https://bugs.launchpad.net/linux/+bug/396286 is where the thread about this issue started.
Comment 3 Martin-Éric Racine 2009-08-17 11:00:31 UTC
PS: contrary to what Rafael said above, this tracks a regression starting with 2.6.31-rcX.  Kernels up to and including 2.6.30 are not affected.
Comment 4 Rafael J. Wysocki 2009-08-17 11:42:44 UTC
Yes, that was a mistake, sorry.  The right meta-bug is linked, though.
Comment 5 Leann Ogasawara 2009-08-18 16:16:11 UTC
Working with Martin-Éric in the Launchpad bug, a bisect seemed to narrow it down to the following commit:

f19d4a8fa6f9b6ccf54df0971c97ffcaa390b7b0 is first bad commit
commit f19d4a8fa6f9b6ccf54df0971c97ffcaa390b7b0
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Mon Jun 8 19:50:45 2009 -0400

    add caching of ACLs in struct inode

    No helpers, no conversions yet.

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Comment 6 Rafael J. Wysocki 2009-08-20 15:01:49 UTC
On Thursday 20 August 2009, Martin-Éric Racine wrote:
> Yes, it's still valid.
> 
> Screenshots of the crash have been provided. Is anything else missing
> for the LKML to be able to debug and fix this?
> 
> On Wed, Aug 19, 2009 at 11:26 PM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.30.  Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13941
> > Subject         : x86 Geode issue
> > Submitter       : Martin-Éric Racine <q-funk@iki.fi>
> > Date            : 2009-08-03 12:58 (17 days old)
> > References      : http://marc.info/?l=linux-kernel&m=124930434732481&w=4
Comment 7 Rafael J. Wysocki 2009-08-26 20:58:28 UTC
On Wednesday 26 August 2009, Martin-Éric Racine wrote:
> Yes, it still is valid.
> 
> On Tue, Aug 25, 2009 at 11:34 PM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.30.  Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13941
> > Subject         : x86 Geode issue
> > Submitter       : Martin-Éric Racine <q-funk@iki.fi>
> > Date            : 2009-08-03 12:58 (23 days old)
> > References      : http://marc.info/?l=linux-kernel&m=124930434732481&w=4
Comment 8 Rafael J. Wysocki 2009-09-10 23:03:10 UTC
On Sunday 06 September 2009, Martin-Éric Racine wrote:
> Yes, it should still be listed, for as long as it hasn't been resolved.
> 
> On Sun, Sep 6, 2009 at 8:24 PM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.30.  Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13941
> > Subject         : x86 Geode issue
> > Submitter       : Martin-Éric Racine <q-funk@iki.fi>
> > Date            : 2009-08-03 12:58 (35 days old)
> > References      : http://marc.info/?l=linux-kernel&m=124930434732481&w=4
Comment 9 Martin-Éric Racine 2009-09-10 23:11:52 UTC
Erm... What is comment #8 supposed to be about?
Comment 10 Stefan Bader 2009-09-15 13:53:42 UTC
This is not complete yet, but here is the current status, in case someone has an idea about the origin. Both bisecting and looking at the panic show that the code added with the ACL caching framework is where the panic happens. The bug triggers when __destroy_inode tries to free the i_acl structure. From the disassembly and the panic it can be seen that i_acl contains 0xffffb4ff. This looks like the value for ACL_NOT_CACHED (0xffffffff) was partially overwritten by a single byte value.
Experimentally swapping around the order of the acl pointers behind the previously existing private pointer lets the system boot.
So this might either be an issue with Geode that has always been around but went unnoticed because the private pointer was not used, or something accesses the inode structure directly with an offset, hoping to find the private pointer at that place.
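To make the single-byte observation concrete, here is a minimal shell sketch that compares the two values from the panic byte by byte. The helper name diff_bytes is purely illustrative and is not part of any kernel tooling; byte 0 is taken to be the least significant byte.

```shell
#!/bin/sh
# Illustrative helper (not kernel code): list the bytes that differ
# between an expected and an observed 32-bit value.
# Byte 0 is the least significant byte.
diff_bytes() {
    expected=$(( $1 ))
    observed=$(( $2 ))
    i=0
    while [ "$i" -lt 4 ]; do
        e=$(( (expected >> (8 * i)) & 0xff ))
        o=$(( (observed >> (8 * i)) & 0xff ))
        if [ "$e" -ne "$o" ]; then
            printf 'byte %d: expected 0x%02x, observed 0x%02x\n' "$i" "$e" "$o"
        fi
        i=$(( i + 1 ))
    done
}

# Usage with the values quoted above:
#   diff_bytes 0xffffffff 0xffffb4ff
```

With the values above, only byte 1 is reported as different (0xff expected, 0xb4 observed), which is consistent with a single stray byte store rather than a whole-word overwrite.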
Comment 11 Rafael J. Wysocki 2009-10-26 22:12:39 UTC
On Monday 26 October 2009, Martin-Éric Racine wrote:
> I do not recall anyone on LKML actually ever doing any work towards
> fixing this issue so, yes, it is still open.
> 
> On Mon, Oct 26, 2009 at 9:31 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.30 and 2.6.31.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.30 and 2.6.31.  Please verify if it still should
> > be listed and let me know (either way).
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13941
> > Subject         : x86 Geode issue
> > Submitter       : Martin-Éric Racine <q-funk@iki.fi>
> > Date            : 2009-08-03 12:58 (85 days old)
> > References      : http://marc.info/?l=linux-kernel&m=124930434732481&w=4
Comment 12 Martin-Éric Racine 2009-11-08 16:01:49 UTC
It should be noted that Stefan Bader has done a significant amount of work to help me narrow down the cause of this bug. He has produced a number of patches that improve the amount and quality of debug info sent by this new ACL caching code and I have routinely submitted a number of dmesg outputs to match. 

At this point, the collaboration of the LKML to analyze the results and fix the regression introduced by the addition of ACL caching would be extremely desirable.
Comment 13 Stefan Bader 2009-11-16 00:51:09 UTC
Created attachment 23792 [details]
Debug patch that shows the corruption

With this patch added to the crashing kernel on the Geode, the invalid pointers get detected and reported. The debug logs show a vast majority (I believe I only saw one single different case) of detections on destroy.
Comment 14 Stefan Bader 2009-11-16 00:58:56 UTC
Created attachment 23793 [details]
Debug v2 to v5 diff

This is the diff between v2 and v5 of my debugging patches. The interesting part is that solely by immediately testing (reading back) the values of the acl pointers, the problem seems to go away. So really, this sounds more like some strange interaction of processor and caches than a race condition, especially as there have been runs with SMP set to off where corruption was still observed.
Comment 15 Otavio Salvador 2009-11-21 22:20:01 UTC
I can't reproduce it on the hardware I have here. I use 2.6.31.6 with the following patches applied on top of it:

 - BFS
 - AUFS2

and it works well for all LX and GX hardware I have with me.
Comment 16 Rafael J. Wysocki 2009-11-21 22:35:54 UTC
OK, closing as unreproducible.  If anyone can reproduce it with 2.6.31.6, please feel free to reopen.
Comment 17 Martin-Éric Racine 2009-11-21 22:58:47 UTC
I can still reproduce it with 2.6.31.6 and with 2.6.32, so, no, you cannot close it.
Comment 18 Martin-Éric Racine 2009-11-21 23:01:20 UTC
PS: and as it seems that I cannot re-open it myself, please do it for me.
Comment 19 Andres Salomon 2009-12-27 19:53:57 UTC
I'm unable to reproduce this bug w/ 2.6.33-rc2, built with gcc version 4.3.2 (Debian 4.3.2-1.1).  This really needs a .config supplied in order to reproduce it (I'm starting to suspect that this is Ubuntu-specific).

Martin or Stefan, can you please supply the .config that you're using?
Comment 20 Andres Salomon 2009-12-27 19:57:43 UTC
I should also note that I am using ext3 w/ POSIX_ACLs enabled, on an LX w/ the following root:

/dev/hda1 / ext3 rw,noatime,errors=continue,data=writeback 0 0
Comment 21 Martin-Éric Racine 2009-12-27 20:21:33 UTC
Created attachment 24319 [details]
kernel config

This is the config used on Ubuntu kernels.
Comment 22 Martin-Éric Racine 2009-12-27 20:23:56 UTC
Here, what I have is the following (replace /dev/hda1 with an UUID statement):

/dev/hda1 /               ext4    defaults,relatime,errors=remount-ro 0       1
Comment 23 Andres Salomon 2009-12-28 00:27:37 UTC
Can you please provide a newer .config that the problem occurs with?  oldconfig wants to make some large changes to the config when I attempt to use 2.6.32 (and 2.6.31-rc5 fails to build).
Comment 24 Martin-Éric Racine 2009-12-28 08:01:11 UTC
Created attachment 24324 [details]
This is Ubuntu's 2.6.32 config.
Comment 25 Andres Salomon 2009-12-28 21:58:57 UTC
With that .config and 2.6.32 on ext3, I'm afraid that I'm still unable to reproduce the problem.

Things to check:
 - Are you positive that you're using ext3 and not ext4?
 - Try building a vanilla 2.6.32 kernel; this may be broken due to a patch that Ubuntu has added.
 - Try building the Ubuntu kernel on an older compiler (I'm using 4.3.2 (Debian 4.3.2-1.1)).  It could be a compiler bug.
Comment 26 H. Peter Anvin 2009-12-28 22:05:47 UTC
Could this be a change in the host compiler to default to -march=i686?  This has been reported on Fedora 12, that building the decompression code uses the default arch setting rather than what was set in Kconfig.  See:

http://git.kernel.org/tip/17a2a9b57a9a7d2fd8f97df951b5e63e0bd56ef5
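One way to check this hypothesis on a build host is to ask the compiler which -march it uses by default. The sketch below assumes a gcc recent enough to support `-Q --help=target` and assumes the usual "-march=  VALUE" layout of that output; the function name default_march is made up for illustration.

```shell
#!/bin/sh
# Sketch: print the -march value found in `gcc -Q --help=target` output
# supplied on stdin. The "-march=  VALUE" layout is an assumption about
# the compiler's output format.
default_march() {
    awk '$1 == "-march=" { print $2; exit }'
}

# Usage on the build host:
#   gcc -Q --help=target | default_march
#   gcc -m32 -Q --help=target | default_march
```

If this prints i686 (or anything newer) on the machine that produces the failing kernels, the host-compiler theory deserves a closer look; if it prints i386 or i486, it is less likely to be the cause.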
Comment 27 Martin-Éric Racine 2009-12-29 00:34:55 UTC
I tried both ext3 and ext4.  Same crash regardless.

The Ubuntu guys maintain builds of stock kernels, i.e. upstream vanilla kernels with the default configuration, packaged as a .deb for convenience, as reference material to test for precisely this sort of regression. I got the same crash when testing with those.

Given Peter's assumption and Andres' third idea, I'd be tempted to blame it on recent host compiler defaults changes.

Could Leann or Stefan please point me to instructions on building their stock 2.6.32-9-generic kernel? It's gonna take forever to complete, but I could launch a build of that on the Geode host itself, install it and see if that one boots.

Peter: given how recent GCC versions produce broken i386 code anyhow, would -march=i486 or -march=i586 make more sense as a default base level for x86 kernels?
Comment 28 H. Peter Anvin 2009-12-29 00:38:36 UTC
Not sure how recent gcc produces broken i386 code, example please.

Either way, there really isn't any point in deviating from -march=i386; even with -march=i686 the differences are minimal (a handful of CMOV).
Comment 29 Martin-Éric Racine 2009-12-29 00:43:44 UTC
Trying "apt-get --compile source linux-image-2.6.32-9-generic" on the Geode now. I'll report ASAP on whether the resulting package boots any better or not.
Comment 30 Martin-Éric Racine 2010-01-04 11:41:22 UTC
Repeatedly trying to build 2.6.32 with Peter Anvin's patch following instructions at https://help.ubuntu.com/community/Kernel/Compile consistently fails:

  CC arch/x86/kernel/alternative.o
  CC arch/x86/kernel/i8253.o
  CC arch/x86/kernel/pci-nommu.o
  CC arch/x86/kernel/tsc.o
  CC arch/x86/kernel/io_delay.o
  CC arch/x86/kernel/rtc.o
  CC arch/x86/kernel/trampoline.o
  CC arch/x86/kernel/process.o
arch/x86/kernel/process.o: final close failed: File truncated
make[5]: *** [arch/x86/kernel/process.o] Error 1
make[4]: *** [arch/x86/kernel] Error 2
make[3]: *** [arch/x86] Error 2
make[2]: *** [sub-make] Error 2
make[1]: *** [/home/q-funk/Projektit/linux-2.6.32/debian/stamps/stamp-build-generic] Error 2
make: *** [binary-generic] Error 2

This is on a system with 1 GB of RAM and plenty of hard-disk space, so I'm really not sure how this "file truncated" keeps on showing up.
Comment 31 Martin-Éric Racine 2010-01-06 06:55:43 UTC
Stefan and I built new packages with Peter's patch, but this did not fix it either. Rather, the problem very much seems related to the new ACL caching code.
Comment 32 Stefan Bader 2010-01-06 08:30:33 UTC
I would not say it is so much related to that code; rather, on this specific hw the bug shows up with a high probability in this code. But as testing shows, the difference between a working kernel and one that shows the problem is not what the code does, but how long it takes. The only things added were printk's of the cached acl values while allocating the inode, and all of a sudden there is no corruption.

The same code works on other platforms. At least I don't know of any report that looks like this. Also, we cannot say it did not happen before. It is just that with the acl code there is code in place which visibly crashes at that point. If other structures or fields are affected, this might "just" lead to weird effects.
We had tests with SMP disabled and still got the same issue.
Up to now I cannot recall a completely positive reproduction by someone else with the exact same hw and kernel. So a very, very odd hw-specific issue cannot be ruled out 100%.
The only other option would be an architecture-specific problem: running at a speed that allows the allocation to be interrupted at an inconvenient time, leaving register values incorrect. That might be some SMI interrupt. But on the other hand I could not really explain why this changes with the additional printks, as the occurrence of an interrupt would be unlikely to be related to the speed of the running main thread.
Would it be possible that such an interrupt disrupts something like the transfer between cache and memory and that an added read for some reason also causes that value to be written out?
Comment 33 Martin-Éric Racine 2010-02-25 23:25:11 UTC
Created attachment 25228 [details]
dmesg -r

It seems that we have some progress.

In an attempt to debug this issue, I compared notes with someone on Fedora for whom the same hardware works. As a test, I used their kernel 2.6.31 config (with a couple of small modifications to build specific drivers as built-in) to build my own kernel. Much to my amazement, this kernel boots fine, as long as I specify root=/dev/sda1 on the GRUB cmdline instead of the usual root=UUID=unique-filesystem-hash.

However, for some reason, an initrd.img was not automatically created upon installing this custom kernel package. Yet, as soon as I created one using "sudo update-initramfs -k 2.6.31.12-geodelx -c" and rebooted, the kernel failed to boot as before.

Just to be safe, I deleted the initrd and rebooted again, letting "udev" perform its work after /sbin/init has been launched by the kernel. Lo and behold, it worked again!

As such, it seems that something that gets included in the initramfs image is what messes with the ACL and destroys some inodes and makes the kernel crash in a non-recoverable way.

Interestingly enough, we still get the previous error messages about destroyed inodes when booting with this barebone kernel, without an initrd.img, but the error is non-fatal. The output of dmesg -r is attached.
Comment 34 Martin-Éric Racine 2010-03-02 00:27:30 UTC
Created attachment 25303 [details]
dmesg -r on kernel 2.6.32

This is what shows on Ubuntu's stock kernel for their upcoming Lucid 10.04 release, if I purposely purge the initrd.img and let the kernel boot using whatever drivers it has built-in.

As you can see, we indeed succeed at booting, however vcons support doesn't work and, for obvious reasons, whatever goodies we expect to find in the initramfs image are also missing. However, the advantage is that the error I've mentioned appears as usual, but in a non-fatal way. 

Hopefully, the attached dmesg output can help the LKML developers determine what messes with sysfs on this host.
Comment 35 Martin-Éric Racine 2010-10-28 10:45:54 UTC
Kernel 2.6.36 magically fixed this. The fix would need to be backported to kernels newer than 2.6.30.
Comment 36 Florian Mickler 2010-12-17 22:06:11 UTC
Thanks for the heads-up. Regarding the backport... can you use git-bisect to find the fix? Else we have nothing to backport.  

I'm closing this as unreproducible in the meantime. If you happen to find the fix, please post that here!
Comment 37 Martin-Éric Racine 2011-02-04 14:22:34 UTC
The issue returned as of 2.6.38-rc1 on the same hardware.
Comment 38 Florian Mickler 2011-03-05 00:10:59 UTC
Is it still somehow related to the initrd? I.e. can you get 2.6.38-rc7 (or later) to boot without an initrd while with the initrd it does not?
Comment 39 Martin-Éric Racine 2011-03-19 09:15:07 UTC
2.6.38 final seems to work again as intended, with or without initrd.
Comment 40 Florian Mickler 2011-03-30 23:54:20 UTC
Nice. Thanks for the update. I'm closing this as unreproducible, since we don't know which commit fixed it.
Comment 41 Jonathan Nieder 2012-06-16 02:25:16 UTC
Reopening: http://bugs.debian.org/677655
Comment 42 Jonathan Nieder 2012-06-16 08:45:42 UTC
If I understand correctly, each kernel version reliably works or doesn't work, but this symptom comes and goes from one kernel version to the next.  What kernels have you tested, and what happened with each?

Do you think this is a timing-related or memory layout related problem? Any ideas for distinguishing between the two?
Comment 43 Jonathan Nieder 2012-07-14 16:45:19 UTC
Ping?  I really am curious about that list of kernels you have tested and results, if you happen to have that information available.
Comment 44 Martin-Éric Racine 2012-07-15 08:56:10 UTC
The issue comes and goes. As I recall, this happens whenever someone changes something in the inode ACL code. I don't have a clue about what causes it.

Kernel 3.2 is the last one that works for me.
Comment 45 Otavio Salvador 2012-08-13 13:21:58 UTC
I can reproduce this with 3.4.7, 3.5 and 3.5.1.
Comment 46 xerofoify 2014-06-25 02:03:27 UTC
This bug is old. Please test against a newer kernel to see if this is still a valid kernel bug.
Cheers Nick
Comment 47 deloptes 2014-08-15 19:19:15 UTC
Hello to all of you!

I can confirm that the bug is present in 3.14.0 and in kernels newer than 2.6.26.
I tried 2.6.30.10, 2.6.32.63, 2.6.36 and 2.6.38.8.
None of them works.

I recently upgraded one such Geode GX-MMX machine from lenny to wheezy and tried to compile a more recent kernel a few months ago. All of those attempts failed, and I finally came across this thread.

To verify the source of the issue I downloaded and rebuilt almost the same 2.6.26 that had been working since 2008. This works (despite some issues with gcc-4.7 and with update-initramfs).
All the rest hang with some mangled output on the console.

cat /proc/cpuinfo
processor       : 0
vendor_id       : Geode by NSC
cpu family      : 5
model           : 5
model name      : Geode(TM) Integrated Processor by National Semi

flags           : fpu de pse tsc msr cx8 pge cmov mmx mmxext 3dnowext 3dnow
bogomips        : 665.94
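
Since the earlier -march discussion in this thread hinges on CMOV, a quick check of whether a given CPU advertises a flag can be sketched as below. The function name has_cpu_flag is hypothetical; it only parses /proc/cpuinfo-style text from stdin.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if the named flag appears on a "flags" line
# of /proc/cpuinfo-style text read from stdin.
has_cpu_flag() {
    awk -v want="$1" -F: '
        $1 ~ /^flags[ \t]*$/ {
            n = split($2, f, " ")
            for (i = 1; i <= n; i++)
                if (f[i] == want) found = 1
        }
        END { exit !found }'
}

# Usage:
#   has_cpu_flag cmov < /proc/cpuinfo && echo "cmov is advertised"
```

For the flags line shown above, the check for cmov succeeds, so code containing CMOV instructions should at least not be rejected outright by this CPU.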

thanks in advance
Comment 48 Alan 2014-08-15 20:03:27 UTC
I don't actually have any Geode hardware (haven't for some years) but if you know which was the last release that worked, and which was the first that didn't, then it may still be possible to chase it down.
Comment 49 deloptes 2014-08-16 01:26:05 UTC
Can you give some short instructions on how to proceed?
I did not find any patches for 2.6.26.
So we know that 2.6.26 and 2.6.26.2 are working and 2.6.30 is not.
Actually I'm more interested in getting something above 2.6.32 working.
Is there a chance that this testing can be done on a newer version?
Comment 50 H. Peter Anvin 2014-08-16 02:12:48 UTC
The absolutely best would be if you could do a "git bisect" between 2.6.26 and 2.6.30.
Comment 51 deloptes 2014-08-16 11:55:04 UTC
I want to set the baseline for this testing:
- compiling is done on 64-bit Intel for Geode GX-MMX
- compiler is gcc-4.7 on Debian wheezy
- working config for 2.6.26 used as baseline: make ARCH=i386 oldconfig
- compiling with make ARCH=i386 -j4 all
- added some additional options to gcc
  Makefile
  HOSTCC       = gcc -m32 -m3dnow -maes -mmmx
  HOSTCXX      = g++ -m32 -m3dnow -maes -mmmx

Starting with kernel 2.6.27.1 ( and in 2.6.26 )

1. In arch/x86/vdso/Makefile I have to remove "-m elf_i386" (compiling in chrooted i386 on 64)

CPPFLAGS_vdso32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdso32.lds = -m elf_i386 -Wl,-soname=linux-gate.so.1
+VDSO_LDFLAGS_vdso32.lds = -Wl,-soname=linux-gate.so.1

This solves the following error:

  VDSO    arch/x86/vdso/vdso32-int80.so.dbg
gcc: error: unrecognized command line option ‘-m’
gcc: error: elf_i386: No such file or directory
make[3]: *** [arch/x86/vdso/vdso32-int80.so.dbg] Error 1

2. the mutex_lock/_unlock should be marked as __used as described here

http://www.brunni.de/linux_kernel_2.6.27.62_with_gcc_4.6.3.html

  LD      .tmp_vmlinux1
kernel/built-in.o: In function `mutex_lock':
(.sched.text+0x726): undefined reference to `__mutex_lock_slowpath'
kernel/built-in.o: In function `mutex_unlock':
(.sched.text+0x730): undefined reference to `__mutex_unlock_slowpath'
make[2]: *** [.tmp_vmlinux1] Error 1

Is there a more intelligent way to overcome the above issues?
Is this setup useful?
Do I have to add some more debugging in the kernel?

I'll compile some kernels for testing and send you feedback next week

thanks in advance

regards
Comment 52 deloptes 2014-08-17 09:43:42 UTC
Hi,
from the official download site https://www.kernel.org/pub/linux/kernel/v2.6/
all kernels up to 2.6.30.10 worked. 2.6.31 and 2.6.31.14 do not work.

I tested the following:

config-2.6.26
config-2.6.27.1
config-2.6.27.30
config-2.6.27.50
config-2.6.28.10
config-2.6.29.6
config-2.6.30.10
config-2.6.30.3
config-2.6.31
config-2.6.31.14

Does this help?
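
Given these results (2.6.30.x boots, 2.6.31 does not), a bisect over that range could be sketched roughly as follows. This is only an outline under assumptions: the repository path is hypothetical, the tags v2.6.30 and v2.6.31 are assumed to exist in the clone, and the function names start_bisect and mark_result are made up for illustration. Building each candidate and booting it on the Geode remains a manual step between calls.

```shell
#!/bin/sh
# Sketch only: start a bisect between a known-good and a known-bad revision.
# Usage (hypothetical path): start_bisect ~/src/linux v2.6.30 v2.6.31
start_bisect() {
    repo=$1
    good=$2
    bad=$3
    (
        cd "$repo" || exit 1
        git bisect start "$bad" "$good"
    )
}

# Record the verdict for the revision git has checked out, after building
# it and trying to boot it on the affected machine.
# Usage: mark_result ~/src/linux good   (or: bad, skip)
mark_result() {
    repo=$1
    verdict=$2
    case "$verdict" in
        good|bad|skip) ( cd "$repo" && git bisect "$verdict" ) ;;
        *) echo "verdict must be good, bad or skip" >&2; return 2 ;;
    esac
}

# When git reports the first bad commit, "git bisect log" shows the
# history of verdicts and "git bisect reset" restores the original checkout.
```

Each call to mark_result makes git check out the next candidate, so the loop is: build, boot, report, repeat until git names the first bad commit.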
Comment 53 deloptes 2014-08-28 08:55:46 UTC
Created attachment 148631 [details]
attachment-17854-0.html

How do I do "git bisect" between last working 2.6.30.10 and 2.6.31?

thanks
Comment 54 deloptes 2014-09-06 12:36:27 UTC
Hi,
thanks for answering.
I'm not too familiar with git etc., but I'll try my best. I found this document, which seems to be a good start:
https://www.kernel.org/pub/software/scm/git/docs/git-bisect-lk2009.html

Just for the record, I noticed that 2.6.30 has an issue with udev in Debian wheezy.
Specifically, it gets permission denied on locking /run/network/ifstate and thus cannot bring up the network. There were other issues with /run as well.
In wheezy udev does not start because a kernel higher than 2.6.32 is expected.
Comment 55 deloptes 2014-09-09 11:23:10 UTC
Hi again,
could you please provide instructions (or a link to some) that describe the bisecting process?
It seems I cannot do a git clone for some reason.
I tried this at home and in the office, but no luck.
All I get is

git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.git
Cloning into 'linux-2.6'...
fatal: unable to connect to git.kernel.org:
git.kernel.org[0: 199.204.44.194]: errno=Die Wartezeit für die Verbindung ist abgelaufen
git.kernel.org[1: 198.145.20.140]: errno=Die Wartezeit für die Verbindung ist abgelaufen
git.kernel.org[2: 149.20.4.72]: errno=Die Wartezeit für die Verbindung ist abgelaufen
git.kernel.org[3: 2001:4f8:1:10:0:1991:8:25]: errno=Das Netzwerk ist nicht erreichbar


git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Cloning into 'linux-stable'...
fatal: unable to connect to git.kernel.org:
git.kernel.org[0: 199.204.44.194]: errno=Die Wartezeit für die Verbindung ist abgelaufen
git.kernel.org[1: 198.145.20.140]: errno=Die Wartezeit für die Verbindung ist abgelaufen
git.kernel.org[2: 149.20.4.72]: errno=Die Wartezeit für die Verbindung ist abgelaufen
git.kernel.org[3: 2001:4f8:1:10:0:1991:8:25]: errno=Das Netzwerk ist nicht erreichbar

PS: the German messages say "Connection timed out" (for the IPv4 addresses) and "Network is unreachable" (for the IPv6 address).

traceroute git.kernel.org
traceroute to git.kernel.org (149.20.4.72), 30 hops max, 60 byte packets
 1  10.146.8.4 (10.146.8.4)  0.414 ms  0.437 ms  0.487 ms
 2  192.168.192.4 (192.168.192.4)  0.460 ms  0.599 ms  0.738 ms
 3  80.109.255.1 (80.109.255.1)  2.979 ms  2.949 ms  2.868 ms
 4  at-vie-sk11-pe04-vl-2035.upc.at (84.116.228.98)  104.393 ms  104.270 ms  104.380 ms
 5  at-vie-sk11-pe02-vl-2028.upc.at (84.116.228.69)  104.371 ms  104.496 ms  104.501 ms
 6  at-vie15a-rd1-vl-2043.aorta.net (84.116.228.129)  103.672 ms  102.080 ms  102.733 ms
 7  uk-lon01b-rd1-xe-1-0-1.aorta.net (84.116.132.37)  102.424 ms  103.347 ms  102.573 ms
 8  de-fra01a-ri2-xe-4-0-0.aorta.net (84.116.130.138)  103.141 ms 84-116-130-149.aorta.net (84.116.130.149)  102.154 ms  101.613 ms
 9  84.116.137.182 (84.116.137.182)  105.509 ms 84.116.137.190 (84.116.137.190)  157.893 ms 84.116.137.186 (84.116.137.186)  105.663 ms
10  nyiix.r1.lga1.isc.org (198.32.160.95)  105.398 ms  104.632 ms  104.637 ms
11  int-0-0-0-7.r1.pao1.isc.org (149.20.65.137)  175.873 ms  175.884 ms  172.800 ms
12  git2.kernel.org (149.20.4.72)  172.420 ms  172.626 ms  172.640 ms


thanks in advance

regards
Comment 56 deloptes 2014-09-09 12:45:14 UTC
I think there is some IPv6 issue here
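
Since the traceroute above does reach the host, another possibility is that only the git:// port (9418) is blocked along the way. A hedged workaround sketch is to clone over HTTPS instead, assuming the server offers the same repository path over HTTPS and that outbound HTTPS works from the networks in question. The helper name to_https_url is made up for illustration.

```shell
#!/bin/sh
# Sketch: rewrite a git:// URL to its https:// equivalent, leaving any
# other URL untouched. That the server serves the same path over HTTPS
# is an assumption.
to_https_url() {
    case "$1" in
        git://*) printf 'https://%s\n' "${1#git://}" ;;
        *)       printf '%s\n' "$1" ;;
    esac
}

# Usage:
#   git clone "$(to_https_url git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git)"
```

If the HTTPS clone also times out, the problem is more likely on the local network or with address-family selection than with the git port.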