Bug 12518

Summary: BUG: using smp_processor_id() in preemptible [00000000] code: dellWirelessCtl/...
Product: Other Reporter: Alex Riesen (raa.lkml)
Component: OtherAssignee: other_other
Status: CLOSED CODE_FIX    
Severity: normal CC: rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: v2.6.28-5716-gfe0bdec (also happens with v2.6.29-rc2) Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 12398    
Attachments: dmesg booting v2.6.29-rc3 + 651f8118cf0a5724

Description Alex Riesen 2009-01-21 14:21:08 UTC
Latest working kernel version: v2.6.28
Earliest failing kernel version: v2.6.28-5716-gfe0bdec (maybe earlier)
Distribution: Ubuntu
Hardware Environment: Dell XPS M1330
Software Environment: dcdbas?
Problem Description:

I see this often (all the time) with v2.6.28-5716-gfe0bdec. It is a
Dell XPS M1330 (iwl3945, the newer, "...-2" firmware).

Jan  5 19:49:36 blimp kernel: BUG: using smp_processor_id() in preemptible [00000000] code: dellWirelessCtl/14180
Jan  5 19:49:36 blimp kernel: caller is smi_request+0x89/0xc8
Jan  5 19:49:36 blimp kernel: Pid: 14180, comm: dellWirelessCtl Not tainted 2.6.28-t #146
Jan  5 19:49:36 blimp kernel: Call Trace:
Jan  5 19:49:36 blimp kernel:  [<ffffffff8033ee83>] debug_smp_processor_id+0xdb/0xf0
Jan  5 19:49:36 blimp kernel:  [<ffffffff8046a6d8>] smi_request+0x89/0xc8
Jan  5 19:49:36 blimp kernel:  [<ffffffff8046a78e>] smi_request_store+0x77/0xa5
Jan  5 19:49:36 blimp kernel:  [<ffffffff803c69cb>] dev_attr_store+0x1e/0x20
Jan  5 19:49:36 blimp kernel:  [<ffffffff802e66e8>] sysfs_write_file+0xe4/0x119
Jan  5 19:49:36 blimp kernel:  [<ffffffff802a0084>] vfs_write+0xae/0x124
Jan  5 19:49:36 blimp kernel:  [<ffffffff802a01be>] sys_write+0x47/0x70
Jan  5 19:49:36 blimp kernel:  [<ffffffff8020b69b>] system_call_fastpath+0x16/0x1b

The function smi_request:

/**
 * smi_request: generate SMI request
 *
 * Called with smi_data_lock.
 */
static int smi_request(struct smi_cmd *smi_cmd)
{
	cpumask_t old_mask;
	int ret = 0;

	if (smi_cmd->magic != SMI_CMD_MAGIC) {
		dev_info(&dcdbas_pdev->dev, "%s: invalid magic value\n",
			 __func__);
		return -EBADR;
	}

	/* SMI requires CPU 0 */
	old_mask = current->cpus_allowed;
	set_cpus_allowed_ptr(current, &cpumask_of_cpu(0));
	if (smp_processor_id() != 0) {			    <==== here?
		dev_dbg(&dcdbas_pdev->dev, "%s: failed to get CPU 0\n",
			__func__);
		ret = -EBUSY;
		goto out;
	}

	/* generate SMI */
	asm volatile (
		"outb %b0,%w1"
		: /* no output args */
		: "a" (smi_cmd->command_code),
		  "d" (smi_cmd->command_address),
		  "b" (smi_cmd->ebx),
		  "c" (smi_cmd->ecx)
		: "memory"
	);

out:
	set_cpus_allowed_ptr(current, &old_mask);
	return ret;
}

As the problem points at dcdbas driver, it was also reported
to Dell's Doug Warzecha <Douglas_Warzecha@dell.com>.
No answer yet
Comment 1 Anonymous Emailer 2009-01-27 15:52:11 UTC
Reply-To: akpm@linux-foundation.org


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Wed, 21 Jan 2009 14:21:09 -0800 (PST)
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=12518
> 
>            Summary: BUG: using smp_processor_id() in preemptible [00000000]
>                     code:  dellWirelessCtl/...
>            Product: Other
>            Version: 2.5
>      KernelVersion: v2.6.28-5716-gfe0bdec (also happens with v2.6.29-rc2)
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: other_other@kernel-bugs.osdl.org
>         ReportedBy: fork0@users.sf.net
> 
> 
> Latest working kernel version: v2.6.28
> Earliest failing kernel version: v2.6.28-5716-gfe0bdec (maybe earlier)
> Distribution: Ubuntu
> Hardware Environment: Dell XPS M1330
> Software Environment: dcdbas?
> Problem Description:
> 
> I see this often (all the time) with v2.6.28-5716-gfe0bdec. It is a
> Dell XPS M1330 (iwl3945, the newer, "...-2" firmware).
> 
> Jan  5 19:49:36 blimp kernel: BUG: using smp_processor_id() in preemptible
> [00000000] code: dellWirelessCtl/14180
> Jan  5 19:49:36 blimp kernel: caller is smi_request+0x89/0xc8
> Jan  5 19:49:36 blimp kernel: Pid: 14180, comm: dellWirelessCtl Not tainted
> 2.6.28-t #146
> Jan  5 19:49:36 blimp kernel: Call Trace:
> Jan  5 19:49:36 blimp kernel:  [<ffffffff8033ee83>]
> debug_smp_processor_id+0xdb/0xf0
> Jan  5 19:49:36 blimp kernel:  [<ffffffff8046a6d8>] smi_request+0x89/0xc8
> Jan  5 19:49:36 blimp kernel:  [<ffffffff8046a78e>]
> smi_request_store+0x77/0xa5
> Jan  5 19:49:36 blimp kernel:  [<ffffffff803c69cb>] dev_attr_store+0x1e/0x20
> Jan  5 19:49:36 blimp kernel:  [<ffffffff802e66e8>]
> sysfs_write_file+0xe4/0x119
> Jan  5 19:49:36 blimp kernel:  [<ffffffff802a0084>] vfs_write+0xae/0x124
> Jan  5 19:49:36 blimp kernel:  [<ffffffff802a01be>] sys_write+0x47/0x70
> Jan  5 19:49:36 blimp kernel:  [<ffffffff8020b69b>]
> system_call_fastpath+0x16/0x1b
> 
> The function smi_request:
> 
> /**
>  * smi_request: generate SMI request
>  *
>  * Called with smi_data_lock.
>  */
> static int smi_request(struct smi_cmd *smi_cmd)
> {
>         cpumask_t old_mask;
>         int ret = 0;
> 
>         if (smi_cmd->magic != SMI_CMD_MAGIC) {
>                 dev_info(&dcdbas_pdev->dev, "%s: invalid magic value\n",
>                          __func__);
>                 return -EBADR;
>         }
> 
>         /* SMI requires CPU 0 */
>         old_mask = current->cpus_allowed;
>         set_cpus_allowed_ptr(current, &cpumask_of_cpu(0));
>         if (smp_processor_id() != 0) {                      <==== here?
>                 dev_dbg(&dcdbas_pdev->dev, "%s: failed to get CPU 0\n",
>                         __func__);
>                 ret = -EBUSY;
>                 goto out;
>         }
> 
>         /* generate SMI */
>         asm volatile (
>                 "outb %b0,%w1"
>                 : /* no output args */
>                 : "a" (smi_cmd->command_code),
>                   "d" (smi_cmd->command_address),
>                   "b" (smi_cmd->ebx),
>                   "c" (smi_cmd->ecx)
>                 : "memory"
>         );
> 
> out:
>         set_cpus_allowed_ptr(current, &old_mask);
>         return ret;
> }
> 
> As the problem points at dcdbas driver, it was also reported
> to Dell's Doug Warzecha <Douglas_Warzecha@dell.com>.
> No answer yet

Well that's odd.  debug_smp_processor_id() has an explicit test to see
if this thread is pinned to a single CPU.

I'm assuming that the recent cpumack changes broke that somehow. 
Perhaps it is already fixed?
Comment 2 Anonymous Emailer 2009-01-28 14:42:11 UTC
Reply-To: fork0@users.sourceforge.net

Andrew Morton, Wed, Jan 28, 2009 00:51:50 +0100:
> On Wed, 21 Jan 2009 14:21:09 -0800 (PST)
> bugme-daemon@bugzilla.kernel.org wrote:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=12518
> > 
> >            Summary: BUG: using smp_processor_id() in preemptible [00000000]
> >                     code:  dellWirelessCtl/...
> > 
> > I see this often (all the time) with v2.6.28-5716-gfe0bdec. It is a
> > Dell XPS M1330 (iwl3945, the newer, "...-2" firmware).
> > 
> > ...
> > 
> > As the problem points at dcdbas driver, it was also reported
> > to Dell's Doug Warzecha <Douglas_Warzecha@dell.com>.
> > No answer yet
> 
> Well that's odd.  debug_smp_processor_id() has an explicit test to see
> if this thread is pinned to a single CPU.
> 
> I'm assuming that the recent cpumack changes broke that somehow. 
> Perhaps it is already fixed?
> 

No:

    BUG: using smp_processor_id() in preemptible [00000000] code: hald-addon-dell/4246
    caller is dcdbas_smi_request+0x8c/0x131
    Pid: 4246, comm: hald-addon-dell Not tainted 2.6.29-rc3-t #180
    Call Trace:
     [<ffffffff8037b477>] debug_smp_processor_id+0xdb/0xf0
     [<ffffffff804d7108>] dcdbas_smi_request+0x8c/0x131
     [<ffffffff804d7229>] smi_request_store+0x7c/0xab
     [<ffffffff804178e5>] dev_attr_store+0x23/0x25
     [<ffffffff802fbf70>] sysfs_write_file+0xe9/0x11e
     [<ffffffff802b2bd0>] vfs_write+0xb3/0x129
     [<ffffffff802b2d14>] sys_write+0x4c/0x75
     [<ffffffff8020b8cb>] system_call_fastpath+0x16/0x1b
Comment 3 Anonymous Emailer 2009-01-29 01:10:03 UTC
Reply-To: akpm@linux-foundation.org

On Wed, 28 Jan 2009 23:41:19 +0100 Alex Riesen <fork0@users.sourceforge.net> wrote:

> Andrew Morton, Wed, Jan 28, 2009 00:51:50 +0100:
> > On Wed, 21 Jan 2009 14:21:09 -0800 (PST)
> > bugme-daemon@bugzilla.kernel.org wrote:
> > 
> > > http://bugzilla.kernel.org/show_bug.cgi?id=12518
> > > 
> > >            Summary: BUG: using smp_processor_id() in preemptible
> [00000000]
> > >                     code:  dellWirelessCtl/...
> > > 
> > > I see this often (all the time) with v2.6.28-5716-gfe0bdec. It is a
> > > Dell XPS M1330 (iwl3945, the newer, "...-2" firmware).
> > > 
> > > ...
> > > 
> > > As the problem points at dcdbas driver, it was also reported
> > > to Dell's Doug Warzecha <Douglas_Warzecha@dell.com>.
> > > No answer yet
> > 
> > Well that's odd.  debug_smp_processor_id() has an explicit test to see
> > if this thread is pinned to a single CPU.
> > 
> > I'm assuming that the recent cpumack changes broke that somehow. 
> > Perhaps it is already fixed?
> > 
> 
> No:
> 
>     BUG: using smp_processor_id() in preemptible [00000000] code:
>     hald-addon-dell/4246
>     caller is dcdbas_smi_request+0x8c/0x131
>     Pid: 4246, comm: hald-addon-dell Not tainted 2.6.29-rc3-t #180
>     Call Trace:
>      [<ffffffff8037b477>] debug_smp_processor_id+0xdb/0xf0
>      [<ffffffff804d7108>] dcdbas_smi_request+0x8c/0x131
>      [<ffffffff804d7229>] smi_request_store+0x7c/0xab
>      [<ffffffff804178e5>] dev_attr_store+0x23/0x25
>      [<ffffffff802fbf70>] sysfs_write_file+0xe9/0x11e
>      [<ffffffff802b2bd0>] vfs_write+0xb3/0x129
>      [<ffffffff802b2d14>] sys_write+0x4c/0x75
>      [<ffffffff8020b8cb>] system_call_fastpath+0x16/0x1b

Rusty, Mike: could you please take a look at this one?

debug_smp_processor_id() shouldn't be warning if the task is pinned to
a single CPU, which dcdbas_smi_request() clearly just did.

(reads the new cpumask code for a while)

(falls over and dies)

(posthumously wonders why debug_smp_processor_id() doesn't just do
if (cpus_weight(&current->cpus_allowed) == 1) or something)
Comment 4 Rusty Russell 2009-01-29 03:00:21 UTC
On Thursday 29 January 2009 09:11:19 Alex Riesen wrote:
> Andrew Morton, Wed, Jan 28, 2009 00:51:50 +0100:
> > Well that's odd.  debug_smp_processor_id() has an explicit test to see
> > if this thread is pinned to a single CPU.
> > 
> > I'm assuming that the recent cpumack changes broke that somehow. 
> > Perhaps it is already fixed?
> > 
> 
> No:
> 
>     BUG: using smp_processor_id() in preemptible [00000000] code:
>     hald-addon-dell/4246

This *is* weird.  I have a fix for this driver to use work_on_cpu, which
I sent out twice, but hasn't been applied.  I will be pushing it for .30
regardless.

Try this which shouldn't do anything, but you never know:

cpumask: convert lib/smp_processor_id to new cpumask ops.

Impact: cleanup

Convert other misc kernel functions to use struct cpumask.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 lib/smp_processor_id.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.28.orig/lib/smp_processor_id.c
+++ linux-2.6.28/lib/smp_processor_id.c
@@ -22,7 +22,7 @@ notrace unsigned int debug_smp_processor
 	 * Kernel threads bound to a single CPU can safely use
 	 * smp_processor_id():
 	 */
-	if (cpus_equal(current->cpus_allowed, cpumask_of_cpu(this_cpu)))
+	if (cpumask_equal(&current->cpus_allowed, cpumask_of(this_cpu)))
 		goto out;
 
 	/*
Comment 5 Mike Travis 2009-01-29 09:32:56 UTC
Rusty Russell wrote:
> On Thursday 29 January 2009 09:11:19 Alex Riesen wrote:
>> Andrew Morton, Wed, Jan 28, 2009 00:51:50 +0100:
>>> Well that's odd.  debug_smp_processor_id() has an explicit test to see
>>> if this thread is pinned to a single CPU.
>>>
>>> I'm assuming that the recent cpumack changes broke that somehow. 
>>> Perhaps it is already fixed?
>>>
>> No:
>>
>>     BUG: using smp_processor_id() in preemptible [00000000] code:
>>     hald-addon-dell/4246
> 
> This *is* weird.  I have a fix for this driver to use work_on_cpu, which
> I sent out twice, but hasn't been applied.  I will be pushing it for .30
> regardless.

It's in tip/cpus4096:

commit 651f8118cf0a5724f23fe1de4a3d9d36b2e01c2e
Author: Rusty Russell <rusty@rustcorp.com.au>
Date:   Sat Jan 10 21:58:09 2009 -0800

    cpumask: convert other misc kernel functions

    Impact: use new cpumask API.

    Convert other misc kernel functions to use struct cpumask.

    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    Signed-off-by: Mike Travis <travis@sgi.com>

diff --git a/lib/smp_processor_id.c b/lib/smp_processor_id.c
index 0f8fc22..4689cb0 100644
--- a/lib/smp_processor_id.c
+++ b/lib/smp_processor_id.c
@@ -22,7 +22,7 @@ notrace unsigned int debug_smp_processor_id(void)
         * Kernel threads bound to a single CPU can safely use
         * smp_processor_id():
         */
-       if (cpus_equal(current->cpus_allowed, cpumask_of_cpu(this_cpu)))
+       if (cpumask_equal(&current->cpus_allowed, cpumask_of(this_cpu)))
                goto out;

        /*

> 
> Try this which shouldn't do anything, but you never know:
> 
> cpumask: convert lib/smp_processor_id to new cpumask ops.
> 
> Impact: cleanup
> 
> Convert other misc kernel functions to use struct cpumask.
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> ---
>  lib/smp_processor_id.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- linux-2.6.28.orig/lib/smp_processor_id.c
> +++ linux-2.6.28/lib/smp_processor_id.c
> @@ -22,7 +22,7 @@ notrace unsigned int debug_smp_processor
>        * Kernel threads bound to a single CPU can safely use
>        * smp_processor_id():
>        */
> -     if (cpus_equal(current->cpus_allowed, cpumask_of_cpu(this_cpu)))
> +     if (cpumask_equal(&current->cpus_allowed, cpumask_of(this_cpu)))
>               goto out;
>  
>       /*
Comment 6 Anonymous Emailer 2009-01-29 09:51:03 UTC
Reply-To: akpm@linux-foundation.org

On Thu, 29 Jan 2009 09:32:19 -0800 Mike Travis <travis@sgi.com> wrote:

> Rusty Russell wrote:
> > On Thursday 29 January 2009 09:11:19 Alex Riesen wrote:
> >> Andrew Morton, Wed, Jan 28, 2009 00:51:50 +0100:
> >>> Well that's odd.  debug_smp_processor_id() has an explicit test to see
> >>> if this thread is pinned to a single CPU.
> >>>
> >>> I'm assuming that the recent cpumack changes broke that somehow. 
> >>> Perhaps it is already fixed?
> >>>
> >> No:
> >>
> >>     BUG: using smp_processor_id() in preemptible [00000000] code:
> hald-addon-dell/4246
> > 
> > This *is* weird.  I have a fix for this driver to use work_on_cpu, which
> > I sent out twice, but hasn't been applied.  I will be pushing it for .30
> > regardless.
> 
> It's in tip/cpus4096:

Ingo, it's needed in mainline.

> commit 651f8118cf0a5724f23fe1de4a3d9d36b2e01c2e
> Author: Rusty Russell <rusty@rustcorp.com.au>
> Date:   Sat Jan 10 21:58:09 2009 -0800
> 
>     cpumask: convert other misc kernel functions
> 
>     Impact: use new cpumask API.
> 
>     Convert other misc kernel functions to use struct cpumask.
> 
>     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>     Signed-off-by: Mike Travis <travis@sgi.com>
>
> diff --git a/lib/smp_processor_id.c b/lib/smp_processor_id.c
> index 0f8fc22..4689cb0 100644
> --- a/lib/smp_processor_id.c
> +++ b/lib/smp_processor_id.c
> @@ -22,7 +22,7 @@ notrace unsigned int debug_smp_processor_id(void)
>          * Kernel threads bound to a single CPU can safely use
>          * smp_processor_id():
>          */
> -       if (cpus_equal(current->cpus_allowed, cpumask_of_cpu(this_cpu)))
> +       if (cpumask_equal(&current->cpus_allowed, cpumask_of(this_cpu)))
>                 goto out;
> 
>         /*
> 

This is superficialy srtange.  Why would

	cpus_equal(current->cpus_allowed, cpumask_of_cpu(this_cpu))

and

	cpumask_equal(&current->cpus_allowed, cpumask_of(this_cpu))

return differing values?
Comment 7 Mike Travis 2009-01-29 12:08:54 UTC
Andrew Morton wrote:
> On Thu, 29 Jan 2009 09:32:19 -0800 Mike Travis <travis@sgi.com> wrote:
> 
>> Rusty Russell wrote:
>>> On Thursday 29 January 2009 09:11:19 Alex Riesen wrote:
>>>> Andrew Morton, Wed, Jan 28, 2009 00:51:50 +0100:
>>>>> Well that's odd.  debug_smp_processor_id() has an explicit test to see
>>>>> if this thread is pinned to a single CPU.
>>>>>
>>>>> I'm assuming that the recent cpumack changes broke that somehow. 
>>>>> Perhaps it is already fixed?
>>>>>
>>>> No:
>>>>
>>>>     BUG: using smp_processor_id() in preemptible [00000000] code:
>>>>     hald-addon-dell/4246
>>> This *is* weird.  I have a fix for this driver to use work_on_cpu, which
>>> I sent out twice, but hasn't been applied.  I will be pushing it for .30
>>> regardless.
>> It's in tip/cpus4096:
> 
> Ingo, it's needed in mainline.
> 
>> commit 651f8118cf0a5724f23fe1de4a3d9d36b2e01c2e
>> Author: Rusty Russell <rusty@rustcorp.com.au>
>> Date:   Sat Jan 10 21:58:09 2009 -0800
>>
>>     cpumask: convert other misc kernel functions
>>
>>     Impact: use new cpumask API.
>>
>>     Convert other misc kernel functions to use struct cpumask.
>>
>>     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>>     Signed-off-by: Mike Travis <travis@sgi.com>
>>
>> diff --git a/lib/smp_processor_id.c b/lib/smp_processor_id.c
>> index 0f8fc22..4689cb0 100644
>> --- a/lib/smp_processor_id.c
>> +++ b/lib/smp_processor_id.c
>> @@ -22,7 +22,7 @@ notrace unsigned int debug_smp_processor_id(void)
>>          * Kernel threads bound to a single CPU can safely use
>>          * smp_processor_id():
>>          */
>> -       if (cpus_equal(current->cpus_allowed, cpumask_of_cpu(this_cpu)))
>> +       if (cpumask_equal(&current->cpus_allowed, cpumask_of(this_cpu)))
>>                 goto out;
>>
>>         /*
>>
> 
> This is superficialy srtange.  Why would
> 
>       cpus_equal(current->cpus_allowed, cpumask_of_cpu(this_cpu))
> 
> and
> 
>       cpumask_equal(&current->cpus_allowed, cpumask_of(this_cpu))
> 
> return differing values?

I wonder if some bits >= nr_cpu_ids are being set in current->cpus_allowed?
(Since the only difference between the above two is the first uses NR_CPUS
and the second uses nr_cpu_ids.)

-Mike
Comment 8 Anonymous Emailer 2009-01-29 13:12:23 UTC
Reply-To: fork0@users.sourceforge.net

Rusty Russell, Thu, Jan 29, 2009 11:59:41 +0100:
> On Thursday 29 January 2009 09:11:19 Alex Riesen wrote:
> > Andrew Morton, Wed, Jan 28, 2009 00:51:50 +0100:
> > > Well that's odd.  debug_smp_processor_id() has an explicit test to see
> > > if this thread is pinned to a single CPU.
> > > 
> > > I'm assuming that the recent cpumack changes broke that somehow. 
> > > Perhaps it is already fixed?
> > > 
> > 
> > No:
> > 
> >     BUG: using smp_processor_id() in preemptible [00000000] code:
> hald-addon-dell/4246
> 
> This *is* weird.  I have a fix for this driver to use work_on_cpu, which
> I sent out twice, but hasn't been applied.  I will be pushing it for .30
> regardless.
> 
> Try this which shouldn't do anything, but you never know:
> 
> cpumask: convert lib/smp_processor_id to new cpumask ops.
> 

It did something: it fixed the BUG.
Comment 9 Alex Riesen 2009-01-29 13:14:40 UTC
Created attachment 20043 [details]
dmesg booting v2.6.29-rc3 + 651f8118cf0a5724
Comment 10 Rusty Russell 2009-01-29 17:43:03 UTC
On Friday 30 January 2009 04:20:15 Andrew Morton wrote:
> This is superficialy srtange.  Why would
> 
>       cpus_equal(current->cpus_allowed, cpumask_of_cpu(this_cpu))
> 
> and
> 
>       cpumask_equal(&current->cpus_allowed, cpumask_of(this_cpu))
> 
> return differing values?

Ingo, another fix (Alex, I assume you have CONFIG_CPUMASK_OFFSTACK=y?).

Subject: cpumask: convert lib/smp_processor_id to new cpumask ops.

Impact: fix debug_smp_processor_id() for CONFIG_CPUMASK_OFFSTACK=y

The scheduler now uses the new cpumask API, which deals up to
nr_cpumask_bits, whereas the API used NR_CPUS bits.

If CONFIG_CPUMASK_OFFSTACK=y these two are not equal, so the top bits
are undefined.  Leading to bug 12518 "BUG: using smp_processor_id() in
preemptible [00000000] code: dellWirelessCtl/..."

The fix is simple: use the modern API in the check.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 lib/smp_processor_id.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.28.orig/lib/smp_processor_id.c
+++ linux-2.6.28/lib/smp_processor_id.c
@@ -22,7 +22,7 @@ notrace unsigned int debug_smp_processor
 	 * Kernel threads bound to a single CPU can safely use
 	 * smp_processor_id():
 	 */
-	if (cpus_equal(current->cpus_allowed, cpumask_of_cpu(this_cpu)))
+	if (cpumask_equal(&current->cpus_allowed, cpumask_of(this_cpu)))
 		goto out;
 
 	/*
Comment 11 Rusty Russell 2009-01-29 17:50:32 UTC
On Friday 30 January 2009 04:02:19 Mike Travis wrote:
> Rusty Russell wrote:
> > On Thursday 29 January 2009 09:11:19 Alex Riesen wrote:
> >> Andrew Morton, Wed, Jan 28, 2009 00:51:50 +0100:
> >>> Well that's odd.  debug_smp_processor_id() has an explicit test to see
> >>> if this thread is pinned to a single CPU.
> >>>
> >>> I'm assuming that the recent cpumack changes broke that somehow. 
> >>> Perhaps it is already fixed?
> >>>
> >> No:
> >>
> >>     BUG: using smp_processor_id() in preemptible [00000000] code:
> hald-addon-dell/4246
> > 
> > This *is* weird.  I have a fix for this driver to use work_on_cpu, which
> > I sent out twice, but hasn't been applied.  I will be pushing it for .30
> > regardless.
> 
> It's in tip/cpus4096:
>
> commit 651f8118cf0a5724f23fe1de4a3d9d36b2e01c2e
> Author: Rusty Russell <rusty@rustcorp.com.au>
> Date:   Sat Jan 10 21:58:09 2009 -0800
> 
>     cpumask: convert other misc kernel functions
> 
>     Impact: use new cpumask API.

That wasn't the patch I was referring to (the one which convers the dcbdas
driver to work_on_cpu is what I meant) but I just re-submitted this identical
one-liner with a much longer explanation of why it's now a bug fix, not a
cleanup.

Since they're textually identical, I don't care which one goes in, but it's now
urgent.

Thanks,
Rusty.
Comment 12 Anonymous Emailer 2009-01-29 23:30:49 UTC
Reply-To: fork0@users.sourceforge.net

Rusty Russell, Fri, Jan 30, 2009 02:42:52 +0100:
> On Friday 30 January 2009 04:20:15 Andrew Morton wrote:
> > This is superficialy srtange.  Why would
> > 
> >     cpus_equal(current->cpus_allowed, cpumask_of_cpu(this_cpu))
> > 
> > and
> > 
> >     cpumask_equal(&current->cpus_allowed, cpumask_of(this_cpu))
> > 
> > return differing values?
> 
> Ingo, another fix (Alex, I assume you have CONFIG_CPUMASK_OFFSTACK=y?).
> 

Yes, I do.
Comment 13 Ingo Molnar 2009-01-30 06:11:48 UTC
* Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Friday 30 January 2009 04:20:15 Andrew Morton wrote:
> > This is superficialy srtange.  Why would
> > 
> >     cpus_equal(current->cpus_allowed, cpumask_of_cpu(this_cpu))
> > 
> > and
> > 
> >     cpumask_equal(&current->cpus_allowed, cpumask_of(this_cpu))
> > 
> > return differing values?
> 
> Ingo, another fix (Alex, I assume you have CONFIG_CPUMASK_OFFSTACK=y?).
> 
> Subject: cpumask: convert lib/smp_processor_id to new cpumask ops.
> 
> Impact: fix debug_smp_processor_id() for CONFIG_CPUMASK_OFFSTACK=y
> 
> The scheduler now uses the new cpumask API, which deals up to
> nr_cpumask_bits, whereas the API used NR_CPUS bits.
> 
> If CONFIG_CPUMASK_OFFSTACK=y these two are not equal, so the top bits
> are undefined.  Leading to bug 12518 "BUG: using smp_processor_id() in
> preemptible [00000000] code: dellWirelessCtl/..."
> 
> The fix is simple: use the modern API in the check.
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> ---
>  lib/smp_processor_id.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

> -     if (cpus_equal(current->cpus_allowed, cpumask_of_cpu(this_cpu)))
> +     if (cpumask_equal(&current->cpus_allowed, cpumask_of(this_cpu)))
>               goto out;

this change was already added to tip/master two weeks ago, via:

    651f811: cpumask: convert other misc kernel functions

but (see the changelog below) it was not realized that it's a fix for 
upstream too.

So i've cherry-picked it over into tip/core/urgent, and i've added your 
more specific bug description above.

Thanks,

	Ingo

--------------->
From 651f8118cf0a5724f23fe1de4a3d9d36b2e01c2e Mon Sep 17 00:00:00 2001
From: Rusty Russell <rusty@rustcorp.com.au>
Date: Sat, 10 Jan 2009 21:58:09 -0800
Subject: [PATCH] cpumask: convert other misc kernel functions

Impact: use new cpumask API.

Convert other misc kernel functions to use struct cpumask.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
---
 lib/smp_processor_id.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/smp_processor_id.c b/lib/smp_processor_id.c
index 0f8fc22..4689cb0 100644
--- a/lib/smp_processor_id.c
+++ b/lib/smp_processor_id.c
@@ -22,7 +22,7 @@ notrace unsigned int debug_smp_processor_id(void)
 	 * Kernel threads bound to a single CPU can safely use
 	 * smp_processor_id():
 	 */
-	if (cpus_equal(current->cpus_allowed, cpumask_of_cpu(this_cpu)))
+	if (cpumask_equal(&current->cpus_allowed, cpumask_of(this_cpu)))
 		goto out;
 
 	/*
Comment 14 Rafael J. Wysocki 2009-02-04 17:12:21 UTC
Fixed by commit 4ab0a9409af5fad74ad1fc9e46d5a8b460f353e9 .