Bug 6613

Summary: iptables broken on 32-bit PReP (ARCH=ppc)
Product: Networking
Reporter: Meelis Roos (mroos)
Component: Netfilter/Iptables
Assignee: Harald Welte (laforge)
Status: RESOLVED CODE_FIX
Severity: normal
CC: protasnb, stelian
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.17-rc4
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: .config for 2.6.17-rc5-git
             Patch hopefully fixing the problem

Description Meelis Roos 2006-05-25 03:02:11 UTC
Most recent kernel where this bug did not occur: none known, this is a fresh 
install
Distribution: Debian unstable
Hardware Environment: 32-bit PowerPC 604 with PReP subarch (using old 
ARCH=ppc)
Software Environment: usual 32-bit ppc userspace, gcc 4.0.3
Problem Description: iptables operations usually just fail with "Invalid 
argument". modprobe iptable_filter and adding rules to the nat table have 
failed in testing, while iptable_nat can be modprobed and listed.

Steps to reproduce:
modprobe iptable_filter (errors out with Invalid Argument)
iptables -t nat -A POSTROUTING -s 10.0.0.0/8 -j SNAT --to 192.168.1.1 (usually 
errors out with Invalid Argument, sometimes succeeds; when it succeeds, the 
rule works fine)
Comment 1 Meelis Roos 2006-05-25 03:05:32 UTC
Additionally, when stracing the failed iptables -A ..., iptables is killed 
with SIGSEGV (not while in a syscall) and does not yield an Invalid Argument 
error. This might of course be another bug, in ppc ptrace, the iptables 
userspace program, or elsewhere.
Comment 2 Andrew Morton 2006-05-25 07:01:51 UTC
bugme-daemon@bugzilla.kernel.org wrote:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=6613
> 
>            Summary: iptables broken on 32-bit PReP (ARCH=ppc)
>     Kernel Version: 2.6.17-rc4
>             Status: NEW
>           Severity: normal
>              Owner: laforge@gnumonks.org
>          Submitter: mroos@linux.ee
> 
> 
> Most recent kernel where this bug did not occur: none known, this is a fresh 
> install
> Distribution: Debian unstable
> Hardware Environment: 32-bit PowerPC 604 with PReP subarch (using old 
> ARCH=ppc)
> Software Environment: usual 32-bit ppc userspace, gcc 4.0.3
> Problem Description: iptables operations usually just fail with "Invalid 
> argument". modprobe iptable_filter and adding rules to the nat table have 
> failed in testing, while iptable_nat can be modprobed and listed.
> 
> Steps to reproduce:
> modprobe iptable_filter (errors out with Invalid Argument)
> iptables -t nat -A POSTROUTING -s 10.0.0.0/8 -j SNAT --to 192.168.1.1 (usually 
> errors out with Invalid Argument, sometimes succeeds, when succeeds then the 
> rule works fine)
> 

Comment 3 Patrick McHardy 2006-05-25 11:50:46 UTC
Andrew Morton wrote:
> bugme-daemon@bugzilla.kernel.org wrote:
> 
>>http://bugzilla.kernel.org/show_bug.cgi?id=6613
>>
>>           Summary: iptables broken on 32-bit PReP (ARCH=ppc)
>>    Kernel Version: 2.6.17-rc4
>>            Status: NEW
>>          Severity: normal
>>             Owner: laforge@gnumonks.org
>>         Submitter: mroos@linux.ee
>>
>>
>>Most recent kernel where this bug did not occur: none known, this is a fresh 
>>install
>>Distribution: Debian unstable
>>Hardware Environment: 32-bit PowerPC 604 with PReP subarch (using old 
>>ARCH=ppc)
>>Software Environment: usual 32-bit ppc userspace, gcc 4.0.3
>>Problem Description: iptables operations usually just fail with "Invalid 
>>argument". modprobe iptable_filter and adding rules to the nat table have 
>>failed in testing, while iptable_nat can be modprobed and listed.
>>
>>Steps to reproduce:
>>modprobe iptable_filter (errors out with Invalid Argument)
>>iptables -t nat -A POSTROUTING -s 10.0.0.0/8 -j SNAT --to 192.168.1.1 (usually 
>>errors out with Invalid Argument, sometimes succeeds, when succeeds then the 
>>rule works fine)


Meelis, it would really help if you could try 2.6.16 and in case
that doesn't work 2.6.15 to give an idea about whether this is a
recent regression or an old problem. We had a number of changes
in this area in the last two kernel versions that could be related.

Comment 4 Meelis Roos 2006-05-25 12:10:22 UTC
> Meelis, it would really help if you could try 2.6.16 and in case
> that doesn't work 2.6.15 to give an idea about whether this is a
> recent regression or an old problem. We had a number of changes
> in this area in the last two kernel versions that could be related.

Yes, I'm still compiling 2.6.16, since just before sending the report. 
Will let you know ASAP.

Comment 5 Meelis Roos 2006-05-25 13:13:22 UTC
>>> http://bugzilla.kernel.org/show_bug.cgi?id=6613
>
> Meelis, it would really help if you could try 2.6.16 and in case
> that doesn't work 2.6.15 to give an idea about whether this is a
> recent regression or an old problem. We had a number of changes
> in this area in the last two kernel versions that could be related.

2.6.16 doesn't work either.

Tried 2.6.8-3 from sarge package, it is working.

Compiling 2.6.15 now...

Comment 6 Meelis Roos 2006-05-26 00:09:36 UTC
> Meelis, it would really help if you could try 2.6.16 and in case
> that doesn't work 2.6.15 to give an idea about whether this is a
> recent regression or an old problem. We had a number of changes
> in this area in the last two kernel versions that could be related.

Unfortunately, 2.6.15 does not boot on this machine so I'm locked out 
remotely at the moment. Will see if I can find the boot cure - there 
used to be a Motorola Powerstack-specific patch to make it boot that 
Debian 2.6.8 and IIRC 2.6.12 packages included and that was integrated 
somewhere later - maybe it's missing from 2.6.15.

Comment 7 Meelis Roos 2006-06-01 00:01:13 UTC
>>> modprobe iptable_filter (errors out with Invalid Argument)
>>> iptables -t nat -A POSTROUTING -s 10.0.0.0/8 -j SNAT --to 192.168.1.1 (usually
>>> errors out with Invalid Argument, sometimes succeeds, when succeeds then the
>>> rule works fine)
>
> Meelis, it would really help if you could try 2.6.16 and in case
> that doesn't work 2.6.15 to give an idea about whether this is a
> recent regression or an old problem. We had a number of changes
> in this area in the last two kernel versions that could be related.

Have not gotten 2.6.15 to work with one evening of tinkering - the irq 
patch was not sufficient, there is something more broken in booting that 
I didn't figure out yet. So no test results for 2.6.15 yet.

Comment 8 Patrick McHardy 2006-06-01 10:41:59 UTC
Meelis Roos wrote:
>> Meelis, it would really help if you could try 2.6.16 and in case
>> that doesn't work 2.6.15 to give an idea about whether this is a
>> recent regression or an old problem. We had a number of changes
>> in this area in the last two kernel versions that could be related.
> 
> 
> Have not gotten 2.6.15 to work with one evening of tinkering - the irq
> patch was not sufficient, there is something more broken in booting that
> I didn't figure out yet. So no test results for 2.6.15 yet.

Then let's try something different. Please enable the
DEBUG_IP_FIREWALL_USER define in net/ipv4/netfilter/ip_tables.c and
post the results, if any.
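
For readers unfamiliar with this kind of switch: a compile-time debug define usually gates a printf-style macro, so the messages disappear entirely when the define is off. The following is a minimal user-space sketch of that pattern in C, not the kernel source. The macro name duprintf follows ip_tables.c (where it wraps printk); log_append, debug_log and translate_table_demo are hypothetical names used only for illustration.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Comment this line out to compile the debug messages away. */
#define DEBUG_IP_FIREWALL_USER

/* Hypothetical capture buffer, standing in for the kernel log (dmesg). */
static char debug_log[256];

/* Hypothetical stand-in for printk(): appends the message to debug_log. */
static void log_append(const char *fmt, ...)
{
    size_t used = strlen(debug_log);
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(debug_log + used, sizeof(debug_log) - used, fmt, ap);
    va_end(ap);
}

#ifdef DEBUG_IP_FIREWALL_USER
#define duprintf(...) log_append(__VA_ARGS__)
#else
#define duprintf(...) ((void)0)
#endif

/* Example caller: reports a table size in the same form as the logs
 * posted in this report. */
static void translate_table_demo(unsigned int size)
{
    duprintf("translate_table: size %u\n", size);
}
```

With the define enabled, calling translate_table_demo(632) leaves "translate_table: size 632" in the buffer; with it commented out, the call produces nothing.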

Comment 9 Meelis Roos 2006-06-01 13:47:46 UTC
> Then let's try something different. Please enable the
> DEBUG_IP_FIREWALL_USER define in net/ipv4/netfilter/ip_tables.c and
> post the results, if any.

On bootup I get this in dmesg (one Bad offset has been added):

ip_tables: (C) 2000-2006 Netfilter Core Team
Netfilter messages via NETLINK v0.30.
ip_conntrack version 2.4 (1536 buckets, 12288 max) - 224 bytes per conntrack
translate_table: size 632
Bad offset cb437924
ip_nat_init: can't setup rules.

And on iptables -t nat -L

translate_table: size 632
Bad offset cb4368f4
ip_nat_init: can't setup rules.
translate_table: size 632
Bad offset cb4368f4
ip_nat_init: can't setup rules.

Seems iptable_nat does not load at all this time.

Modprobe iptable_filter still fails, dmesg contains
translate_table: size 632
Finished chain 1
Finished chain 2
Finished chain 3

Next modprobe iptable_nat gives

translate_table: size 632
Bad offset c8e01944
ip_nat_init: can't setup rules.

Comment 10 Patrick McHardy 2006-06-02 06:05:59 UTC
Meelis Roos wrote:
>> Then let's try something different. Please enable the
>> DEBUG_IP_FIREWALL_USER define in net/ipv4/netfilter/ip_tables.c and
>> post the results, if any.
> 
> 
> On bootup I get this in dmesg (one Bad offset has been added):
> 
> ip_tables: (C) 2000-2006 Netfilter Core Team
> Netfilter messages via NETLINK v0.30.
> ip_conntrack version 2.4 (1536 buckets, 12288 max) - 224 bytes per
> conntrack
> translate_table: size 632
> Bad offset cb437924
> ip_nat_init: can't setup rules.
> 
> And on iptables -t nat -L
> 
> translate_table: size 632
> Bad offset cb4368f4
> ip_nat_init: can't setup rules.
> translate_table: size 632
> Bad offset cb4368f4
> ip_nat_init: can't setup rules.
> 
> Seems iptable_nat does not load at all this time.
> 
> Modprobe iptable_filter still fails, dmesg contains
> translate_table: size 632
> Finished chain 1
> Finished chain 2
> Finished chain 3
> 
> Next modprobe iptable_nat gives
> 
> translate_table: size 632
> Bad offset c8e01944
> ip_nat_init: can't setup rules.


Very strange, this means that the initial table data must somehow
be wrong, but for some reason it still seems to get past the
size and offset checks for the filter table. I can't see how
loading the filter table could fail after the "Finished chain .."
messages without another message. Which kernel version did you
perform these tests on?

Comment 11 Meelis Roos 2006-06-02 06:15:21 UTC
> Very strange, this means that the initial table data must somehow
> be wrong, but for some reason it still seems to get past the
> size and offset checks for the filter table. I can't see how
> loading the filter table could fail after the "Finished chain .."
> messages without another message. Which kernel version did you
> perform these tests on?

Yesterday's 2.6.17-rc5+git.

Comment 12 Patrick McHardy 2006-06-02 06:53:22 UTC
Meelis Roos wrote:
>> Very strange, this means that the initial table data must somehow
>> be wrong, but for some reason it still seems to get past the
>> size and offset checks for the filter table. I can't see how
>> loading the filter table could fail after the "Finished chain .."
>> messages without another message. Which kernel version did you
>> perform these tests on?
> 
> 
> Yesterday's 2.6.17-rc5+git.

Please enable DEBUG_IP_FIREWALL_USER in net/netfilter/x_tables.c as well
and retry. Results of the raw or mangle table would also be interesting
because they contain a different number of built-in chains.

Comment 13 Meelis Roos 2006-06-04 05:36:40 UTC
> Please enable DEBUG_IP_FIREWALL_USER in net/netfilter/x_tables.c as well
> and retry. Results of the raw or mangle table would also be interesting
> because they contain a different number of built-in chains.

Sorry it took so long, I was away. Adding this define does not seem to 
do much (it only adds the table->private->number prints):

On boot (1 nat rule):
ip_tables: (C) 2000-2006 Netfilter Core Team
Netfilter messages via NETLINK v0.30.
ip_conntrack version 2.4 (1536 buckets, 12288 max) - 224 bytes per conntrack
translate_table: size 632
Finished chain 0
Finished chain 3
Finished chain 4
table->private->number = 4
t->private->number = 4
translate_table: size 800
Bad offset cba528d4

A manual modprobe iptable_nat succeeded.

modprobe iptable_filter:
translate_table: size 632
Bad offset cbbd910c

modprobe iptable_mangle:
translate_table: size 936
Bad offset cbbd80dc

modprobe iptable_raw:
translate_table: size 480
Bad offset cb8abd44

Retrying ifup and ifdown that tried to do iptables -D and iptables -I:
t->private->number = 4
t->private->number = 4
t->private->number = 4
translate_table: size 800
Bad offset cbbd80dc
t->private->number = 4

And retrying it more (succeeded this time):

t->private->number = 4
t->private->number = 4
translate_table: size 800
Finished chain 0
Finished chain 3
Finished chain 4
ip_tables: Translated table
do_replace: oldnum=4, initnum=4, newnum=5
t->private->number = 5

Comment 14 Stelian Pop 2006-06-05 07:10:31 UTC
Hmm, I think I'm bitten by this same bug, on an Apple Powerbook (ARCH=powerpc)
here. Running latest git as of now.

All the relevant netfilter options are set to 'y', however
/proc/net/ip_tables_names shows only the 'raw' table and all I'm able to find in
the kernel logs are those init messages:
  Netfilter messages via NETLINK v0.30.
  ip_conntrack version 2.4 (8192 buckets, 65536 max) - 204 bytes per conntrack
  ip_tables: (C) 2000-2006 Netfilter Core Team
  ip_nat_init: can't setup rules.
  ipt_recent v0.3.1: Stephen Frost <sfrost@snowman.net>.  http://snowman.net/projects/ipt_recent/
  arp_tables: (C) 2002 David S. Miller

I'm attaching the full .config.

Stelian.
Comment 15 Stelian Pop 2006-06-05 07:11:20 UTC
Created attachment 8255
.config for 2.6.17-rc5-git
Comment 16 Stelian Pop 2006-06-05 07:17:23 UTC
I've examined my logs, and I believe this problem was not in the original
2.6.17-rc4...
Comment 17 Stelian Pop 2006-06-05 08:44:12 UTC
After some more research, I found out that the problem exists in both 2.6.17-rc4
and -rc3, but it only happens when you compile with CONFIG_DEBUG_SLAB.
Comment 18 Stelian Pop 2006-06-05 08:59:19 UTC
In 2.6.16 (with DEBUG_SLAB), /proc/net/ip_tables_names shows 'raw' and 'filter',
but no 'nat' table. Same message (ip_nat_init: can't setup rules) in dmesg.
Comment 19 Stelian Pop 2006-06-05 09:25:13 UTC
2.6.15 is ok.
Comment 20 Stelian Pop 2006-06-05 15:32:47 UTC
Created attachment 8265
Patch hopefully fixing the problem

The problem is an alignment issue. On PowerPC, vmalloc() does not
always return an __alignof__(struct _xt_align) aligned address, which makes a
later alignment check fail.

The fix is crude; it is probably possible to do better than that, but right now
I need some sleep :)

Please make sure this patch (or a better version of it) hits Linus ASAP,
hopefully before final 2.6.17 is released.

Stelian.
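
As an illustration of the alignment requirement described above, here is a minimal user-space sketch in C. It is not the attached patch, whose contents are not shown in this report. struct xt_align_demo is modeled on the idea behind struct _xt_align (a struct whose alignment equals that of its most strictly aligned member); is_table_aligned and align_up are hypothetical helpers, and __alignof__ is the GCC/Clang extension already named in this comment.

```c
#include <stdint.h>

/* Illustrative struct: its alignment is that of its strictest member.
 * On ABIs where 64-bit integers need 8-byte alignment, this is 8. */
struct xt_align_demo {
    uint8_t  u8;
    uint16_t u16;
    uint32_t u32;
    uint64_t u64;
};

#define DEMO_ALIGN ((uintptr_t)__alignof__(struct xt_align_demo))

/* Nonzero if addr is suitably aligned for struct xt_align_demo. */
static int is_table_aligned(uintptr_t addr)
{
    return (addr % DEMO_ALIGN) == 0;
}

/* Hypothetical helper: round addr up to the next aligned address.
 * Relies on DEMO_ALIGN being a power of two, which alignments are. */
static uintptr_t align_up(uintptr_t addr)
{
    return (addr + DEMO_ALIGN - 1) & ~(DEMO_ALIGN - 1);
}
```

A check like is_table_aligned() rejects a buffer whose start address is not a multiple of the struct's alignment, no matter how valid the table contents are. One possible remedy, under these assumptions, is to over-allocate slightly and use align_up() on the returned address.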
Comment 21 Meelis Roos 2006-06-10 03:54:47 UTC
> The problem was caused by an alignment problem. On PowerPC, vmalloc() does
> not
> always return an __alignof__(struct _xt_align) aligned address, causing some
> test to fail later.

This patch works for me too.
Comment 22 Natalie Protasevich 2007-07-08 11:47:30 UTC
Stelian, what's the status on this patch, has it been submitted?
Thanks.
Comment 23 Stelian Pop 2007-07-09 02:00:51 UTC
(In reply to comment #22)
> Stelian, what's the status on this patch, has it been submitted?
> Thanks.

I must say I don't know, it's been a long time... As I haven't enabled CONFIG_DEBUG_SLAB since then, I haven't been affected by the bug, should it still be present.

This should be retested.
Comment 24 Natalie Protasevich 2007-07-20 21:47:53 UTC
Meelis,
I don't see Stelian's patch in the git tree, but some other changes could have fixed your problem. Can you please test the new kernel 2.6.22+ and confirm whether the problem is still there?
Comment 25 Meelis Roos 2007-07-26 10:20:02 UTC
> I don't see Stelian's patch in the git tree, but still some other changes
> could've fixed your problem. Can you please test the new kernel 2.6.22+ and
> confirm the problem is still there or otherwise.

Tried it with SLAB debug and everything worked fine (2.6.23-rc1+git). So 
it seems to have been fixed meanwhile by something else.
Comment 26 Andrew Morton 2007-08-02 14:05:49 UTC
OK, thanks, let's close the report.