Bug 16615 - [REGRESSION, bisected] 2.6.36-rc1 - suspend doesn't work and network issues afterwards
Summary: [REGRESSION, bisected] 2.6.36-rc1 - suspend doesn't work and network issues a...
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Other
Classification: Unclassified
Component: Other (show other bugs)
Hardware: x86-64 Linux
: P1 normal
Assignee: other_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-08-17 19:19 UTC by trapdoor6
Modified: 2010-08-18 21:18 UTC (History)
4 users (show)

See Also:
Kernel Version: 2.6.36-rc1
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments

Description trapdoor6 2010-08-17 19:19:49 UTC
Hello,
This is a bug I have already reported on LKML: http://lkml.org/lkml/2010/8/17/353
Please find details below
-----------------------

Working kernels: from 2.6.35 - to 2.6.35.2
Non working kernels: from (at least) 2.6.35-git2 - to 2.6.36-rc1
-----------------------

Description of the problem(s):

1. First of all I confirm that the problem is consistent: suspend always works well on any of the 'good' kernels and it newer works (with the same symptoms) on any of the 'bad' kernels. I've tested it several times on a couple of 'good' and 'bad' kernels to make sure it's not some random issue.

2. On a bad kernel, this is what happens when I try to switch to suspend mode:
- screen goes blank ['No signal detected !'] - that's OK :)
- either the CPU fan or PSU fan or both are still running and they won't stop (left it for about 30 min. once)
- the power light is on and still, where in suspend mode it should blink - that won't change either
- pressing the power button obviously doesn't wake the system up; need to hard reboot

3. Now, when booting again after an unsuccessful suspend and hard reboot, something happens to my network card (and it doesn't matter what is the next kernel I boot from - it can be any 'good' or 'bad' one):
- in Gnome, the gnome-panel network indicator shows my connection as disabled (wired eth0 - the only one I use); enabling it doesn't bring my network up

4. After second reboot from any of the 'good' or 'bad' kernels my network connection works OK.
-----------------------

Bisecting turned up the following commit. Reverting it in 2.6.36-rc1 results in a system that works.

commit bd25f4dd6972755579d0ea50d1a5ace2e9b00d1a
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Sun Jul 11 15:34:05 2010 +0200

    HID: hiddev: use usb_find_interface, get rid of BKL
   
    This removes the private hiddev_table in the usbhid
    driver and changes it to use usb_find_interface
    instead.
   
    The advantage is that we can avoid the race between
    usb_register_dev and usb_open and no longer need the
    big kernel lock.
   
    This doesn't introduce race condition -- the intf pointer could be
    invalidated only in hiddev_disconnect() through usb_deregister_dev(),
    but that will block on minor_rwsem and not actually remove the device
    until usb_open().
-----------------------

Despite the network issues (occurring every time after unsuccessful suspend > hard restart) the bisecting turned up a commit which isn't rather related to network. But I can confirm that reverting only this one commit in 2.6.36-rc1 [da5cabf80e2433131bf0ed8993abc0f7ea618c73] fixes the whole problem - suspend works OK and I have no issues with network after waking the system up, neither after re-booting from the same or any other kernel.

Please let me know what further details should I provide. Any logs, hardware specifications? I've also made a brief log recording the bisect if anyone would need it.

--
Regards
Tomasz B. aka trapDoor
Comment 1 Gabriel Craciunescu 2010-08-17 20:54:15 UTC
(In reply to comment #0)

Hello,

I have the same problem and was about to bisect the issue too.

I can also confirm that reverting bd25f4dd6972755579d0ea50d1a5ace2e9b00d1a makes suspend works again here.

Best Regards,

Gabriel C.
Comment 2 trapdoor6 2010-08-17 21:49:06 UTC
Thanks Gabriel,

I've got another thread for this on LKML. Would you like me to CC to my e-mails there?

Thanks,
Tomasz B.
Comment 3 Andrea Gelmini 2010-08-17 21:55:18 UTC
Hi all,
   same problem here.
   I didn't bisect, and I can't test the revert 'til next week.

Thanks a lot,
Andrea
Comment 4 Gabriel Craciunescu 2010-08-17 23:05:28 UTC
(In reply to comment #2)
> Thanks Gabriel,
> 
> I've got another thread for this on LKML. Would you like me to CC to my
> e-mails
> there?
> 

Sure just add me to CC there..
Comment 5 trapdoor6 2010-08-17 23:29:19 UTC
(In reply to comment #4)
> (In reply to comment #2)
> > Thanks Gabriel,
> > 
> > I've got another thread for this on LKML. Would you like me to CC to my
> e-mails
> > there?
> > 
> 
> Sure just add me to CC there..

OK. I will do next time.

Thanks.
Comment 6 trapdoor6 2010-08-18 09:18:15 UTC
Patch 9c9e54a8df0be48aa359744f412377cc55c3b7d2 fixes the problem on my end. Please refer to http://lkml.org/lkml/2010/8/18/92 for details.

Sent e-mail to Andrea and Gabriel asking for testing on their end.

Tomasz B.
Comment 7 trapdoor6 2010-08-18 21:14:35 UTC
Here is an e-mail from Jiri Kosing with a patch that fixes the problem. A copy was sent to LKML and the link in my previous comment refers to it but LKML seems to be down.

The patch can be found here:
http://git.kernel.org/?p=linux/kernel/git/jikos/hid.git;a=commitdiff;h=9c9e54a8df0be48aa359744f412377cc55c3b7d2

Andrea and Gabriel have confirmed that it fixed the problem on their end too.

---------- Forwarded message ----------

From: Jiri Kosina <jkosina@suse.cz>
Date: Wed, Aug 18, 2010 at 9:46 AM
Subject: Re: [REGRESSION, bisected] 2.6.36-rc1 - suspend issues
To: trapDoor <trapdoor6@gmail.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>, LKML <linux-kernel@vger.kernel.org>, arnd@arndb.de, osnix0@googlemail.com, andrea.gelmini@gelma.net, trapDoor <trapdoor6@googlemail.com>


On Wed, 18 Aug 2010, trapDoor wrote:

> > Could you please verify whether the patch below (which is currently queued
> > in my tree and I am planning to push it to Linus soon, as it fixes memory
> > corruption) fixes your issue? Thanks.
> >
> > commit 9c9e54a8df0be48aa359744f412377cc55c3b7d2
> > Author: Jiri Kosina <jkosina@suse.cz>
> > Date:   Fri Aug 13 12:19:45 2010 +0200
> >
> >    HID: hiddev: fix memory corruption due to invalid intfdata
> >
> >    Commit bd25f4dd6972755579d0 ("HID: hiddev: use usb_find_interface,
> >    get rid of BKL") introduced using of private intfdata in hiddev for
> >    purpose of storing hiddev pointer.
> >
> >    This is a problem, because intf pointer is already being set to struct
> >    hid_device pointer by HID core. This obviously lead to memory
> corruptions
> >    at device disconnect time, such as
> >
> >    WARNING: at lib/kobject.c:595 kobject_put+0x37/0x4b()
> >    kobject: '(null)' (ffff88011e9cd898): is not initialized, yet
> kobject_put() is being called.
> >
> >    Convert hiddev into accessing hiddev through struct hid_device which is
> >    in intfdata already.
> >
> >    Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
> >    Reported-and-tested-by: Heinz Diehl <htd@fritha.org>
> >    Reported-and-tested-by: Alan Ott <alan@signal11.us>
> >    Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> >
> > diff --git a/drivers/hid/usbhid/hiddev.c b/drivers/hid/usbhid/hiddev.c
> > index f285017..0a29c51 100644
> > --- a/drivers/hid/usbhid/hiddev.c
> > +++ b/drivers/hid/usbhid/hiddev.c
> > @@ -266,13 +266,15 @@ static int hiddev_open(struct inode *inode, struct
> file *file)
> >  {
> >        struct hiddev_list *list;
> >        struct usb_interface *intf;
> > +       struct hid_device *hid;
> >        struct hiddev *hiddev;
> >        int res;
> >
> >        intf = usb_find_interface(&hiddev_driver, iminor(inode));
> >        if (!intf)
> >                return -ENODEV;
> > -       hiddev = usb_get_intfdata(intf);
> > +       hid = usb_get_intfdata(intf);
> > +       hiddev = hid->hiddev;
> >
> >        if (!(list = kzalloc(sizeof(struct hiddev_list), GFP_KERNEL)))
> >                return -ENOMEM;
> > @@ -890,7 +892,6 @@ int hiddev_connect(struct hid_device *hid, unsigned int
> force)
> >        hid->hiddev = hiddev;
> >        hiddev->hid = hid;
> >        hiddev->exist = 1;
> > -       usb_set_intfdata(usbhid->intf, usbhid);
> >        retval = usb_register_dev(usbhid->intf, &hiddev_class);
> >        if (retval) {
> >                err_hid("Not able to get a minor for this device.");
> >
>
> Yes. That patch fixes my problem too. Tested on top of the current
> tree [2.6.36-rc1-00051-g3b89f56]. Thanks!

Thanks a lot for prompt testing.

> One little thing though. There is a message about 2 lines offset when
> patching. Succeeded anyway but I've never got such messages before so
> I thought it would be better to mention it:
>
> patch -p1 <../patches/hid.git-9c9e54a8df0be48aa359744f412377cc55c3b7d2.patch
> patching file drivers/hid/usbhid/hiddev.c
> Hunk #2 succeeded at 890 (offset -2 lines).
>
> I downloaded the patch from
>
> http://git.kernel.org/?p=linux/kernel/git/jikos/hid.git;a=commitdiff;h=9c9e54a8df0be48aa359744f412377cc55c3b7d2
> to make sure I had the original indentations etc. preserved (Gmail
> tends to mess up with lines).

Yeah, that's fine. It just means that the patch has been generated on a
slightly different version of the file, which makes the line numbers be
off a bit, but patch, git and friends can cope with this easily.

--
Jiri Kosina
SUSE Labs, Novell Inc.

---------- End of forwarded message ----------


Tomasz B.
Comment 8 trapdoor6 2010-08-18 21:18:12 UTC
And below is the final confirmation from Andrea and Gabriel. Therefore I'm closing this call.

---------- Forwarded message ----------
From: Gabriel Craciunescu <osnix0@googlemail.com>
Date: Wed, Aug 18, 2010 at 3:48 PM
Subject: Re: [REGRESSION, bisected] 2.6.36-rc1 - suspend issues
To: Andrea Gelmini <andrea.gelmini@gelma.net>
Cc: trapdoor6@gmail.com, Jiri Kosina <jkosina@suse.cz>, "Rafael J. Wysocki" <rjw@sisk.pl>, LKML <linux-kernel@vger.kernel.org>, arnd@arndb.de, trapDoor <trapdoor6@googlemail.com>


2010/8/18 Andrea Gelmini <andrea.gelmini@gelma.net>
>
> 2010/8/18 trapDoor <trapdoor6@gmail.com>:
> > Andrea, Gabriel
> > Would you please check if Jiri's patch solves the problem on your end
> > as well? Just to make 100% sure we can close related bugzilla call.
>
> Well, I made the test, with ~10 cycle hibernation/resume.
> Everything seems goes right.
>

Does work fine here also ..

Regards,

Gabriel

---------- End of forwarded message ----------

--
Tomasz B.

Note You need to log in before you can comment on or make changes to this bug.