Bug 68221

Summary: snd-ice1712 does not work for kernel 3.12.x any longer
Product: Memory Management
Reporter: dl1ksv
Component: Page Allocator
Assignee: Andrew Morton (akpm)
Status: NEW
Severity: normal
CC: szg00000, tiwai
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.12.x
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  kernel config of my kernel 3.12.6
  Patch to select CONFIG_ZONE_DMA

Description dl1ksv 2014-01-06 16:48:49 UTC
Up to kernel 3.11.6, snd-ice1712 worked flawlessly with the Delta M44 card.

Setup:

dr-xr-xr-x 6 root root 0  6. Jan 17:38 card0
dr-xr-xr-x 4 root root 0  6. Jan 17:38 card1
dr-xr-xr-x 6 root root 0  6. Jan 17:38 card2
dr-xr-xr-x 3 root root 0  6. Jan 17:38 card3
dr-xr-xr-x 3 root root 0  6. Jan 17:38 card4
-r--r--r-- 1 root root 0  6. Jan 17:38 cards
-r--r--r-- 1 root root 0  6. Jan 17:38 devices
lrwxrwxrwx 1 root root 5  6. Jan 17:38 Generic -> card4
-r--r--r-- 1 root root 0  6. Jan 17:38 hwdep
lrwxrwxrwx 1 root root 5  6. Jan 17:38 Loopback -> card2
lrwxrwxrwx 1 root root 5  6. Jan 17:38 M44 -> card1
-r--r--r-- 1 root root 0  6. Jan 17:38 modules
-r--r--r-- 1 root root 0  6. Jan 17:38 pcm
lrwxrwxrwx 1 root root 5  6. Jan 17:38 SB -> card0
dr-xr-xr-x 2 root root 0  6. Jan 17:38 seq
-r--r--r-- 1 root root 0  6. Jan 17:38 timers
lrwxrwxrwx 1 root root 5  6. Jan 17:38 V20 -> card3
-r--r--r-- 1 root root 0  6. Jan 17:38 version

Kernel 3.11.6
aplay -Dplughw:1 testsounds2011.09.02.17.46.38.wav
Playing WAVE 'testsounds2011.09.02.17.46.38.wav' : Signed 16 bit Little Endian, Rate 11025 Hz, Mono

No problems

Kernel 3.12.x

aplay -Dplughw:1 testsounds2011.09.02.17.46.38.wav
Playing WAVE 'testsounds2011.09.02.17.46.38.wav' : Signed 16 bit Little Endian, Rate 11025 Hz, Mono
aplay: set_params:1297: Unable to install hw params:
ACCESS:  RW_INTERLEAVED
FORMAT:  S16_LE
SUBFORMAT:  STD
SAMPLE_BITS: 16
FRAME_BITS: 16
CHANNELS: 1
RATE: 11025
PERIOD_TIME: (124988 124989)
PERIOD_SIZE: 1378
PERIOD_BYTES: 2756
PERIODS: (4 5)
BUFFER_TIME: (500045 500046)
BUFFER_SIZE: 5513
BUFFER_BYTES: 11026
TICK_TIME: 0

dmesg shows no further messages.
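
As a sanity check on the parameter dump above, the byte and time figures follow directly from the frame counts, which suggests the request is ordinary and the failure lies in obtaining the buffer rather than in the parameters themselves. A minimal shell sketch, assuming only the values printed by aplay (the helper names are made up for illustration, and 64-bit shell arithmetic is assumed):

```shell
#!/bin/sh
# Values copied from the aplay parameter dump above.
rate=11025        # RATE (Hz)
frame_bits=16     # FRAME_BITS (1 channel x 16-bit samples)

# Bytes occupied by the given number of frames.
frames_to_bytes() {
    echo $(( $1 * frame_bits / 8 ))
}

# Duration of the given number of frames in microseconds (rounded down).
frames_to_usecs() {
    echo $(( $1 * 1000000 / rate ))
}

echo "period: $(frames_to_bytes 1378) bytes, $(frames_to_usecs 1378) us"
echo "buffer: $(frames_to_bytes 5513) bytes, $(frames_to_usecs 5513) us"
```

This gives 2756 bytes and 124988 us for the period, and 11026 bytes and 500045 us for the buffer, matching PERIOD_BYTES, BUFFER_BYTES and the lower bounds of PERIOD_TIME and BUFFER_TIME.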

If you look at the prealloc parameter under
/proc/asound/card1/pcm0p
you see that it is set to 0, while in kernel 3.11.6 it is set to 256.

Trying to set this parameter to 256 with
echo 256 > prealloc

leads to
ALSA sound/core/info.c:423 data write error to prealloc (-12)

in the dmesg log, and
echo 1 > prealloc

leads to

ALSA sound/core/info.c:423 data write error to prealloc (-22)
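
To put the prealloc value in terms of what the page allocator is asked for, the following sketch converts a size into an allocation order. It assumes that prealloc is expressed in KiB and that pages are 4 KiB (both are assumptions, not stated in this report); alloc_order is a hypothetical helper that mimics what the kernel's get_order() computes.

```shell
#!/bin/sh
# Smallest order such that (page_size << order) >= bytes.
# $1 = size in bytes, $2 = page size in bytes (assumed 4096 if omitted).
alloc_order() {
    order=0
    while [ $(( ${2:-4096} << order )) -lt "$1" ]; do
        order=$(( order + 1 ))
    done
    echo "$order"
}

# prealloc=256, assumed to mean 256 KiB
alloc_order $(( 256 * 1024 ))
```

Under those assumptions a 256 KiB buffer is an order-6 request, i.e. 64 physically contiguous pages.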
Comment 1 Takashi Iwai 2014-01-06 17:17:36 UTC
There is no code change at all in sound/pci/ice1712/* or sound/core/* affecting the snd-ice1712 driver between 3.11 and later kernels, so the driver itself cannot be the cause.

As you noticed, the problem is likely the failed buffer allocation.  The driver couldn't pre-allocate the buffer but didn't stop.  That is valid behavior, since it is not a fatal error per se.  (But we could at least show a warning...)

The error code -12 (ENOMEM) from echo 256 indicates that contiguous memory of the given size couldn't be allocated.  (The error code -22 (EINVAL) means an invalid value, which is true for the given value 1.)

That being said, the culprit is outside the sound driver.  Possibly some change in the memory allocator, or some other driver grabbing too many large pages before the sound driver.  In any case, please try to bisect.
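
For readers decoding the two dmesg lines: the number in parentheses is a negated Linux errno value. A small lookup sketch covering only a handful of the base errno values (the function name is hypothetical):

```shell
#!/bin/sh
# Decode a kernel return value as printed in dmesg (e.g. -12) into its
# errno name. Only a few base errno values are listed here.
decode_kernel_errno() {
    case "${1#-}" in            # strip the leading minus sign
        1)  echo "EPERM (Operation not permitted)" ;;
        2)  echo "ENOENT (No such file or directory)" ;;
        12) echo "ENOMEM (Out of memory)" ;;
        16) echo "EBUSY (Device or resource busy)" ;;
        22) echo "EINVAL (Invalid argument)" ;;
        *)  echo "unknown" ;;
    esac
}

decode_kernel_errno -12
decode_kernel_errno -22
```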
Comment 2 dl1ksv 2014-01-07 19:47:11 UTC
(In reply to Takashi Iwai from comment #1)
> There is no code change at all in sound/pci/ice1712/* and sound/core/*
> regarding snd-ice1712 driver between 3.11 and later kernels, so it cannot be
> the reason.
> 
> As you noticed, the problem is likely the failed buffer allocation.  The
> driver couldn't pre-allocate the buffer but didn't stop.  It's a valid
> behavior since it's no fatal error, per se.  (But we could show some
> warning, at least...)
> 
> The error code -12 by echo 256 indicates that you couldn't allocate the
> continuous memory of the given size.  (The error code -22 means an invalid
> value, which is true for the given value 1.)
> 
> That being said, the culprit is outside the sound driver.  Possibly some
> changes in memory allocator, or any other driver grabs too big pages before
> the sound driver.  In anyway, please try to bisect.

This is the result of my bisect:

# first bad commit: [81c0a2bb515fd4daae8cab64352877480792b515] mm: page_alloc: fair zone allocator policy

Some further information:
I use 8GB of memory
cat /proc/meminfo
MemTotal:        8155528 kB
MemFree:         7240716 kB
Buffers:            1888 kB
Cached:           422336 kB
SwapCached:            0 kB
Active:           417112 kB
Inactive:         384748 kB
Active(anon):     378084 kB
Inactive(anon):     5000 kB
Active(file):      39028 kB
Inactive(file):   379748 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       8388600 kB
SwapFree:        8388600 kB
Dirty:              1260 kB
Writeback:             0 kB
AnonPages:        377636 kB
Mapped:            87212 kB
Shmem:              5448 kB
Slab:              40680 kB
SReclaimable:      19404 kB
SUnreclaim:        21276 kB
KernelStack:        2720 kB
PageTables:        17432 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    12466364 kB
Committed_AS:    1375628 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      312308 kB
VmallocChunk:   34359406592 kB
HardwareCorrupted:     0 kB
DirectMap4k:       55464 kB
DirectMap2M:     3063808 kB
DirectMap1G:     5242880 kB
Comment 3 Takashi Iwai 2014-01-08 06:33:50 UTC
Thanks.  Could you confirm that the problem still exists on the latest 3.13-rc kernel, just to be sure?
Comment 4 dl1ksv 2014-01-08 07:07:57 UTC
Problem remains with kernel 3.13.0-rc7
Comment 5 Takashi Iwai 2014-01-08 13:01:38 UTC
I tested 3.12.x on a machine with an ice1712 board, and the buffer allocation works there.  I suspect it depends on the kernel config.

Could you attach your kernel config?
Comment 6 dl1ksv 2014-01-08 15:31:31 UTC
Created attachment 121271 [details]
kernel config of my kernel 3.12.6
Comment 7 Takashi Iwai 2014-01-09 14:06:27 UTC
Thanks.  It seems that the behavior depends on CONFIG_ZONE_DMA.
Could you set CONFIG_ZONE_DMA=y and see whether it works now?
(CONFIG_BOUNCE seems irrelevant.)
Comment 8 dl1ksv 2014-01-09 15:13:56 UTC
Yes, that option did it. But I wonder why, since in 3.11.x it works without this option set.
Nevertheless now it works.
Many thanks for your assistance.
Comment 9 Takashi Iwai 2014-01-09 15:26:15 UTC
Yes, this is a regression in the recent kernel, but people didn't notice it since usually CONFIG_ZONE_DMA is enabled.

I'll change the component to MM and reassign to someone else who has a better clue :)
Comment 10 Takashi Iwai 2014-01-09 15:38:04 UTC
In short, dma_alloc_coherent() with a high order fails since 3.12 when CONFIG_ZONE_DMA isn't set.  It could be worked around by disabling fair zone allocation, too, e.g.

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2391,10 +2391,12 @@ static void prepare_slowpath(gfp_t gfp_mask, unsigned int order,
 		 */
 		if (!zone_local(preferred_zone, zone))
 			continue;
+#if 0
 		mod_zone_page_state(zone, NR_ALLOC_BATCH,
 				    high_wmark_pages(zone) -
 				    low_wmark_pages(zone) -
 				    zone_page_state(zone, NR_ALLOC_BATCH));
+#endif
 	}
 }
Comment 11 Takashi Iwai 2014-01-09 15:53:37 UTC
Another note: I found that the problem is related to the DMA mask set by the driver.  This device needs a 28-bit DMA mask, so it calls pci_set_dma_mask() and pci_set_consistent_dma_mask() with DMA_BIT_MASK(28).  Without these calls, the page allocation succeeds.
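
To make the 28-bit limit concrete: DMA_BIT_MASK(n) is the value with the n low bits set, so a 28-bit mask restricts the device to the first 256 MiB of address space. A shell rendition of that arithmetic (the helper names are hypothetical, and the sketch only handles n below 63 because shell arithmetic is signed):

```shell
#!/bin/sh
# Value with the n low bits set, as DMA_BIT_MASK(n) yields in the kernel.
# Only valid for n < 63 in this sketch.
dma_bit_mask() {
    echo $(( (1 << $1) - 1 ))
}

# Size in MiB of the address range reachable with an n-bit mask (n >= 20).
dma_limit_mib() {
    echo $(( (1 << $1) >> 20 ))
}

for bits in 24 28 31 32; do
    printf '%2d-bit mask: 0x%08x (%s MiB)\n' \
        "$bits" "$(dma_bit_mask "$bits")" "$(dma_limit_mib "$bits")"
done
```

For comparison, a 24-bit mask covers 16 MiB and a 32-bit mask covers 4096 MiB, so a 28-bit device falls between the two.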
Comment 12 Andrew Morton 2014-01-09 21:26:50 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 09 Jan 2014 15:38:04 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=68221

Johannes, could you have a headscratch over this one please? 
fair-zone-allocator-policy is causing an audio driver high-order page
allocation failure, but only when CONFIG_ZONE_DMA=n and
pci_set_dma_mask(DMA_BIT_MASK(28)).


btw, this was harder to debug than it needed to be - we didn't get the
usual page-allocation-failure warning, presumably because
snd_malloc_dev_pages() uses __GFP_NOWARN.  Perhaps we should undo that,
or teach snd_malloc_dev_pages() to emit its own warning under suitable
circumstances?

> --- Comment #10 from Takashi Iwai <tiwai@suse.de> ---
> In short, dma_alloc_coherent() with a high order fails since 3.12 when
> CONFIG_DMA_ZONE isn't set.  It could be worked around by disabling fair zone
> allocation, too, e.g.
> 
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2391,10 +2391,12 @@ static void prepare_slowpath(gfp_t gfp_mask, unsigned int order,
>           */
>          if (!zone_local(preferred_zone, zone))
>              continue;
> +#if 0
>          mod_zone_page_state(zone, NR_ALLOC_BATCH,
>                      high_wmark_pages(zone) -
>                      low_wmark_pages(zone) -
>                      zone_page_state(zone, NR_ALLOC_BATCH));
> +#endif
>      }
>  }
Comment 13 Johannes Weiner 2014-01-09 23:20:41 UTC
On Thu, Jan 09, 2014 at 01:26:46PM -0800, Andrew Morton wrote:
> 
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Thu, 09 Jan 2014 15:38:04 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=68221
> 
> Johannes, could you have a headscratch over this one please? 
> fair-zone-allocator-policy is causing an audio driver high-order page
> allocation failure, but only when CONFIG_DMA_ZONE=n and
> pci_set_dma_mask(DMA_BIT_MASK(28)).

I have a hard time finding the exact path from the driver to the page
allocator, but I think we end up in x86's dma_generic_alloc_coherent.

Without a DMA zone, that function simply allocates DMA32 memory and
hopes that the result happens to be within the 28 bit address limit.
No reclaim, no compaction, just a one-off crapshoot.

As the DMA32 zone is filling up, the odds of that succeeding dwindle
drastically.  The fair zone allocator expedites this by filling all
zones evenly instead of the Normal zone first, but it does not make a
difference anymore after a bit of uptime.

The DMA zone is the only thing that provides a reasonable chance for
such allocations to succeed, so any driver that has addressing
restrictions below 32 bit should select CONFIG_ZONE_DMA.

> btw, this was harder to debug than it needed to be - we didn't get the
> usual page-allocation-failure warning, presumably because
> snd_malloc_dev_pages() uses __GFP_NOWARN.  Perhaps we should undo that,
> or teach snd_malloc_dev_pages() to emit its own warning under suitable
> circumstances?

If my analysis is correct, it's not the page allocator that fails but
the dma layer that filters inadequate allocations and returns NULL.
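
The allocate-then-check behaviour described above can be illustrated with a small model. This is not the kernel code, only a sketch of the address test under the assumption that a buffer is usable when its last byte lies at or below the mask; fits_dma_mask and the sample physical addresses are made up for illustration.

```shell
#!/bin/sh
# Succeed if the buffer [phys, phys + size) lies entirely within the range
# addressable with a mask of the given width.
# $1 = physical start address, $2 = size in bytes, $3 = mask width in bits
fits_dma_mask() {
    [ $(( $1 + $2 - 1 )) -le $(( (1 << $3) - 1 )) ]
}

buf_size=$(( 256 * 1024 ))   # hypothetical 256 KiB buffer

# Sample addresses an allocator might return from a zone spanning 4 GiB.
for phys in 0x00100000 0x0ffc0000 0x10000000 0xc0000000; do
    if fits_dma_mask "$phys" "$buf_size" 28; then
        echo "$phys: usable with a 28-bit mask"
    else
        echo "$phys: rejected"
    fi
done
```

Only the first two addresses pass. Anything handed back at or above 256 MiB is rejected, however much free memory the system has.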
Comment 14 Andrew Morton 2014-01-09 23:30:12 UTC
On Thu, 9 Jan 2014 18:20:33 -0500 Johannes Weiner <hannes@cmpxchg.org> wrote:

> On Thu, Jan 09, 2014 at 01:26:46PM -0800, Andrew Morton wrote:
> > 
> > (switched to email.  Please respond via emailed reply-to-all, not via the
> > bugzilla web interface).
> > 
> > On Thu, 09 Jan 2014 15:38:04 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> > 
> > > https://bugzilla.kernel.org/show_bug.cgi?id=68221
> > 
> > Johannes, could you have a headscratch over this one please? 
> > fair-zone-allocator-policy is causing an audio driver high-order page
> > allocation failure, but only when CONFIG_DMA_ZONE=n and
> > pci_set_dma_mask(DMA_BIT_MASK(28)).
> 
> I have a hard time finding the exact path from the driver to the page
> allocator, but I think we end up in x86's dma_generic_alloc_coherent.
> 
> Without a DMA zone, that function simply allocates DMA32 memory and
> hopes that the result happens to be within the 28 bit address limit.
> No reclaim, no compaction, just a one-off crapshoot.
> 
> As the DMA32 zone is filling up, the odds of that succeeding dwindle
> drastically.  The fair zone allocator expedites this by filling all
> zones evenly instead of the Normal zone first, but it does not make a
> difference anymore after a bit of uptime.
> 
> The DMA zone is the only thing that provides a reasonable chance for
> such allocations to succeed, so any driver that has addressing
> restrictions below 32 bit should select CONFIG_ZONE_DMA.

Ah, yes, right, what a disaster.  I dimly remember deciding a long time
ago that the sane way to honour pci_set_dma_mask() was for the page
allocator to implement separate zones for DMA_BIT_MASK(31),
DMA_BIT_MASK(30), ..., DMA_BIT_MASK(13).  That didn't happen ;)

So what to do?

a) I guess ZONE_DMA corresponds to DMA_BIT_MASK(24) and should be
   suitable for most pci_set_dma_mask() users.  So drivers which set
   such masks should depend on or select CONFIG_ZONE_DMA. Or, better,

b) teach pci-dma.c to reclaim/compact/migrate its way to success.
Comment 15 Takashi Iwai 2014-01-10 08:01:38 UTC
At Thu, 9 Jan 2014 13:26:46 -0800,
Andrew Morton wrote:
> 
> btw, this was harder to debug than it needed to be - we didn't get the
> usual page-allocation-failure warning, presumably because
> snd_malloc_dev_pages() uses __GFP_NOWARN.  Perhaps we should undo that,
> or teach snd_malloc_dev_pages() to emit its own warning under suitable
> circumstances?

A single allocation failure can be non-fatal, as we have a fallback to
reduce the size and retry.  But it'd be good to show the warning if 
the allocation fails completely, yes.
Comment 16 Takashi Iwai 2014-01-10 08:38:40 UTC
At Thu, 9 Jan 2014 18:20:33 -0500,
Johannes Weiner wrote:
> 
> On Thu, Jan 09, 2014 at 01:26:46PM -0800, Andrew Morton wrote:
> > 
> > (switched to email.  Please respond via emailed reply-to-all, not via the
> > bugzilla web interface).
> > 
> > On Thu, 09 Jan 2014 15:38:04 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> > 
> > > https://bugzilla.kernel.org/show_bug.cgi?id=68221
> > 
> > Johannes, could you have a headscratch over this one please? 
> > fair-zone-allocator-policy is causing an audio driver high-order page
> > allocation failure, but only when CONFIG_DMA_ZONE=n and
> > pci_set_dma_mask(DMA_BIT_MASK(28)).
> 
> I have a hard time finding the exact path from the driver to the page
> allocator, but I think we end up in x86's dma_generic_alloc_coherent.
> 
> Without a DMA zone, that function simply allocates DMA32 memory and
> hopes that the result happens to be within the 28 bit address limit.
> No reclaim, no compaction, just a one-off crapshoot.
> 
> As the DMA32 zone is filling up, the odds of that succeeding dwindle
> drastically.  The fair zone allocator expedites this by filling all
> zones evenly instead of the Normal zone first, but it does not make a
> difference anymore after a bit of uptime.
> 
> The DMA zone is the only thing that provides a reasonable chance for
> such allocations to succeed, so any driver that has addressing
> restrictions below 32 bit should select CONFIG_ZONE_DMA.

Yes, forcibly selecting CONFIG_ZONE_DMA would be one feasible option,
which can be put to stable kernels, too.

OTOH, I wonder why ZONE_DMA32 was filled up so quickly.  I tested a
different condition, 6GB machine with 31bit DMA mask, and it fails the
allocation, too, at boot time.  It shouldn't be too hungry at such an
early stage.

Actually, if we had the commit "mm: page_alloc: exclude unreclaimable
allocations from zone fairness policy", this problem could be worked
around, too.  Is fairness too strict?
Comment 17 Takashi Iwai 2014-01-10 15:20:53 UTC
At Thu, 9 Jan 2014 15:30:08 -0800,
Andrew Morton wrote:
> 
> On Thu, 9 Jan 2014 18:20:33 -0500 Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > On Thu, Jan 09, 2014 at 01:26:46PM -0800, Andrew Morton wrote:
> > > 
> > > (switched to email.  Please respond via emailed reply-to-all, not via the
> > > bugzilla web interface).
> > > 
> > > On Thu, 09 Jan 2014 15:38:04 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> > > 
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=68221
> > > 
> > > Johannes, could you have a headscratch over this one please? 
> > > fair-zone-allocator-policy is causing an audio driver high-order page
> > > allocation failure, but only when CONFIG_DMA_ZONE=n and
> > > pci_set_dma_mask(DMA_BIT_MASK(28)).
> > 
> > I have a hard time finding the exact path from the driver to the page
> > allocator, but I think we end up in x86's dma_generic_alloc_coherent.
> > 
> > Without a DMA zone, that function simply allocates DMA32 memory and
> > hopes that the result happens to be within the 28 bit address limit.
> > No reclaim, no compaction, just a one-off crapshoot.
> > 
> > As the DMA32 zone is filling up, the odds of that succeeding dwindle
> > drastically.  The fair zone allocator expedites this by filling all
> > zones evenly instead of the Normal zone first, but it does not make a
> > difference anymore after a bit of uptime.
> > 
> > The DMA zone is the only thing that provides a reasonable chance for
> > such allocations to succeed, so any driver that has addressing
> > restrictions below 32 bit should select CONFIG_ZONE_DMA.
> 
> Ah, yes, right, what a disaster.  I dimly remember deciding a long time
> ago that the sane way to honour pci_set_dma_mask() was for the page
> allocator to implement separate zones for DMA_BIT_MASK(31),
> DMA_BIT_MASK(30), ..., DMA_BIT_MASK(13).  That didn't happen ;)
> 
> So what to do?
> 
> a) I guess ZONE_DMA corresponds to DMA_BIT_MASK(24) and should be
>    suitable for most pci_set_dma_mask() users.  So drivers which set
>    such masks should depend on or select CONFIG_DMA_ZONE. Or, better,
> 
> b) teach pci-dma.c to reclaim/compact/migrate its way to success.

I guess (a) is the best option for 3.12.x and 3.13.x from maintenance
POV.  I'll attach a fix patch in the bugzilla.  (Embedding a patch in
the bugzilla mail isn't good, right?)
Comment 18 Takashi Iwai 2014-01-10 15:23:18 UTC
Created attachment 121531 [details]
Patch to select CONFIG_ZONE_DMA
Comment 19 Andrew Morton 2014-01-10 20:59:15 UTC
On Fri, 10 Jan 2014 16:20:48 +0100 Takashi Iwai <tiwai@suse.de> wrote:

> > So what to do?
> > 
> > a) I guess ZONE_DMA corresponds to DMA_BIT_MASK(24) and should be
> >    suitable for most pci_set_dma_mask() users.  So drivers which set
> >    such masks should depend on or select CONFIG_DMA_ZONE. Or, better,
> > 
> > b) teach pci-dma.c to reclaim/compact/migrate its way to success.
> 
> I guess (a) is the best option for 3.12.x and 3.13.x from maintenance
> POV.

OK.

But then we'll just forget about it :(

>  I'll attach a fix patch in the bugzilla.  (Embedding a patch in
> the bugzilla mail isn't good, right?)

Well, patches in bugzilla don't get applied.  I assume you'll also put
a copy into the sound tree..
Comment 20 Takashi Iwai 2014-01-11 09:25:07 UTC
At Fri, 10 Jan 2014 12:59:11 -0800,
Andrew Morton wrote:
> 
> On Fri, 10 Jan 2014 16:20:48 +0100 Takashi Iwai <tiwai@suse.de> wrote:
> 
> > > So what to do?
> > > 
> > > a) I guess ZONE_DMA corresponds to DMA_BIT_MASK(24) and should be
> > >    suitable for most pci_set_dma_mask() users.  So drivers which set
> > >    such masks should depend on or select CONFIG_DMA_ZONE. Or, better,
> > > 
> > > b) teach pci-dma.c to reclaim/compact/migrate its way to success.
> > 
> > I guess (a) is the best option for 3.12.x and 3.13.x from maintenance
> > POV.
> 
> OK.
> 
> But then we'll just forget about it :(

We can wait for Johannes coming up with a better patch, too :)

But a lower DMA bit mask without CONFIG_ZONE_DMA is bad, so the
current fix patch makes sense in any case.

> >  I'll attach a fix patch in the bugzilla.  (Embedding a patch in
> > the bugzilla mail isn't good, right?)
> 
> Well, patches in bugzilla don't get applied.  I assume you'll also put
> a copy into the sound tree..

I'm going to apply it in sound git tree for 3.14-rc1.
Comment 21 Szőgyényi Gábor 2016-11-09 19:43:19 UTC
Please retest this bug with the latest kernel image.
Comment 22 dl1ksv 2016-11-10 11:42:47 UTC
Works for me with kernel 4.9.0-rc4 .
Comment 23 dl1ksv 2016-11-13 15:57:08 UTC
Meanwhile, my system occasionally becomes completely unresponsive after using the sound card: no mouse, no keyboard. I can't get any error messages.
After resetting the system, it does not boot, as important parameters such as the boot order are overwritten!
Comment 24 Szőgyényi Gábor 2017-03-06 20:38:45 UTC
Anything new with this bug?