Bug 11249 - TC HTB hanging problem
Summary: TC HTB hanging problem
Status: CLOSED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 high
Assignee: Arnaldo Carvalho de Melo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-08-04 08:01 UTC by Leandro Silva
Modified: 2008-09-26 05:13 UTC

See Also:
Kernel Version: 2.6.23 and 2.6.25
Subsystem:
Regression: ---
Bisected commit-id:


Description Leandro Silva 2008-08-04 08:01:04 UTC
Latest working kernel version:
Earliest failing kernel version: 2.6.23

Distribution: Mandriva 2007.1 and 2008.0

Hardware Environment:
It happens in many different servers

Software Environment:
Problem Description:
I have close to 200 servers, most with Mandriva 2006.0 using kernel 2.6.15, some with Mandriva 2008.0 using kernel 2.6.23 or 2.6.25. On some of them (kernels 2.6.23 and 2.6.25 confirmed) the server hangs at random (some servers hang more than once a day, some once a month).
The hardware differs from server to server, but I have about 5 servers with exactly the same configuration (processor, memory, Ethernet, and so on) and one hangs every day while the others run fine, all with the same rules for traffic shaping (tc using htb).
I think it is related to tc because last week I logged in to a server and when I typed tc del to remove the shaping it hung. My client restarted the server and about 10 minutes later I did it again with the same effect. No kernel panic, no oops, it just hangs.
I've read some posts and bug reports, but they point to a specific Ethernet driver (like sk98lin), whereas this is happening on several servers with different hardware.
I have some servers with kernel 2.6.15 and, as far as I know, it doesn't happen on them, but some of them use a different set of tc rules (a few fewer rules, actually) or none at all.
I don't use the kernel shipped with the Mandriva distro; I always get the kernel from kernel.org and compile it myself.

Steps to reproduce:
Hard to do on demand: since it is random, it takes minutes or weeks to happen, but always with some change in tc (start or stop).
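
A minimal sketch of a stress loop for the start/stop case described above. The actual rule set is not shown in this report, so the device name, the class IDs and the rate below are placeholder assumptions, and the HTB setup is deliberately the smallest one that exercises add and delete. It only prints the commands unless dry_run is turned off, because on an affected kernel the real run may hang the machine.

```python
import subprocess


def htb_cycle_commands(dev="eth0", rate="1mbit"):
    """Return the tc commands for one start/stop cycle of a minimal HTB setup.

    dev, rate and the 1:/1:10 handles are illustrative placeholders,
    not the reporter's real rules.
    """
    return [
        # start: attach an HTB root qdisc and a single class
        ["tc", "qdisc", "add", "dev", dev, "root", "handle", "1:", "htb", "default", "10"],
        ["tc", "class", "add", "dev", dev, "parent", "1:", "classid", "1:10", "htb", "rate", rate],
        # stop: remove the root qdisc, which removes its classes too
        ["tc", "qdisc", "del", "dev", dev, "root"],
    ]


def run_cycles(count, dev="eth0", rate="1mbit", dry_run=True):
    """Run (or, by default, only print) count start/stop cycles.

    Returns the number of commands issued or printed.
    """
    issued = 0
    for _ in range(count):
        for cmd in htb_cycle_commands(dev, rate):
            if dry_run:
                print(" ".join(cmd))
            else:
                # needs root; raises CalledProcessError if tc fails
                subprocess.run(cmd, check=True)
            issued += 1
    return issued
```

Calling run_cycles(3) prints three cycles for review; run_cycles(100, dry_run=False) would execute them as root on a test machine that is safe to hang.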
Comment 1 Anonymous Emailer 2008-08-04 10:55:32 UTC
Reply-To: akpm@linux-foundation.org


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Mon,  4 Aug 2008 08:01:05 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=11249
> 
>            Summary: TC HTB hanging problem
>            Product: Networking
>            Version: 2.5
>      KernelVersion: 2.6.23 and 2.6.25
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: Other
>         AssignedTo: acme@ghostprotocols.net
>         ReportedBy: lansoweb@hotmail.com
> 
> 
Comment 2 Leandro Silva 2008-08-04 11:44:34 UTC
Hello Andrew!

Just to add, 2 weeks ago one of my clients had to reboot the server 3 times
during the day, so I disabled the QoS and it worked fine for 3 days. After
those days I started the QoS again and 10 hours later the server hung
again, so it has been disabled since then, without any hangs.
Another data point: I had another server that was hanging randomly, so I moved
the users onto a router but kept the server on, connected to the internet and
with QoS running, and it hasn't hung in the last 10 days. So I guess it's not
the QoS alone, but something to do with QoS combined with user traffic. I have
another client with the same kernel and the same rules, running for more than
2 months with more than 4 times the internet usage of the others, and it has
never hung, so it's not only high usage.
I really don't know what is happening.

Thanks a lot for any advice,
Leandro

Comment 3 Jarek Poplawski 2008-08-04 14:59:53 UTC
Leandro Oliveira da Silva wrote, On 08/04/2008 08:44 PM:
...

> Just to add, 2 weeks ago one of my clients had to reboot the server 3 times
> during the day, so I disabled the QoS and it worked fine for 3 days. After
> those days I started the QoS again and 10 hours later the server hung
> again, so it has been disabled since then, without any hangs.

Hi,

A few bugs were found in these kernels, but alas not all stable
versions were fixed. The best thing would be to try e.g. 2.6.25.14 or 2.6.26.

Otherwise, check these two patches in particular:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.25.y.git;a=commit;h=066a3b5b2346febf9a655b444567b7138e3bb939
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.25.y.git;a=commit;h=734bf48fe5276f319464fd30dc4a046a29d2b94a

Alas, some problems in HTB (or around it) are still being diagnosed, so this
might not be enough.

Regards,
Jarek P.
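
For checking many servers against the versions suggested above, a small sketch of a version comparison. The function names and the threshold constant are illustrative assumptions; the check only encodes "2.6.25.14 or later", which also covers 2.6.26, and it says nothing about whether a given build really carries the two patches. Anything after the leading dotted number (for example a local suffix) is ignored, so release candidates are treated like final releases.

```python
import re

# Lowest version suggested in this thread; 2.6.26 compares as newer.
SUGGESTED_MINIMUM = (2, 6, 25, 14)


def parse_kernel_version(release):
    """Turn a string such as '2.6.25.14' or '2.6.26-custom' into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", release)
    if match is None:
        raise ValueError("not a kernel release string: %r" % release)
    return tuple(int(part) for part in match.group(0).split("."))


def at_least_suggested(release):
    """True if release is 2.6.25.14 or newer (tuple comparison, so 2.6.25 alone is older)."""
    return parse_kernel_version(release) >= SUGGESTED_MINIMUM
```

On a server itself, at_least_suggested(platform.release()) checks the running kernel, after importing the standard platform module.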
Comment 4 Leandro Silva 2008-08-05 05:13:49 UTC
Hello Jarek!

Many thanks for your response! Are these two patches already included in
version 2.6.26?

Thanks a lot again,
Leandro

Comment 5 Leandro Silva 2008-08-05 05:19:31 UTC
Hello!

I've just checked the 2.6.26 kernel and this patch is there. I'll give it a
try and put this kernel on some critical clients.

Thanks,
Leandro

Comment 6 Jarek Poplawski 2008-08-05 05:24:20 UTC
On Tue, Aug 05, 2008 at 09:13:15AM -0300, Leandro Oliveira da Silva wrote:
> Hello Jarek!
> 
> Many thanks for your response! Are these two patches already included in
> version 2.6.26?

Yes.

> Thanks a lot again,

Not at all... at least until we know if it works!

Jarek P.
Comment 7 Leandro Silva 2008-08-05 12:46:24 UTC
Hi Jarek!

Good news. I downloaded 2.6.26 to one of my clients' servers and compiled it,
and before rebooting to activate it I ran my script and the server hung.
When they restarted the server it booted kernel 2.6.26 and I ran the script
several times without a problem. I guess in my case it was one of the two
bugs, but I'm putting this kernel on 3 other servers and we'll see if they
work fine now. I'll send an email on Friday with the status.

Thanks a lot,
Leandro

Comment 8 Leandro Silva 2008-08-13 05:36:43 UTC
Hi Jarek!

Some of my clients have been running 2.6.26 without a problem for 1 week now, and some of them used to hang every day, so I guess it's solved. Thanks a lot for the advice!

Leandro

Comment 9 Jarek Poplawski 2008-08-13 05:48:24 UTC
On Wed, Aug 13, 2008 at 09:36:35AM -0300, Leandro Oliveira da Silva wrote:
> 
> Hi Jarek!
> 
> Some of my clients have been running 2.6.26 without a problem for 1 week
> now, and some of them used to hang every day, so I guess it's solved.
> Thanks a lot for the advice!
> 

Hi Leandro!

Very nice to "hear" this!

Thanks for testing,
Jarek P.

PS: When you're sure there is nothing more to this, you could
probably close this Bugzilla report.
