Bug 68221
Summary: snd-ice1712 does not work for kernel 3.12.x any longer
Product: Memory Management
Component: Page Allocator
Reporter: dl1ksv
Assignee: Andrew Morton (akpm)
Status: NEW
Severity: normal
Priority: P1
CC: szg00000, tiwai
Hardware: All
OS: Linux
Kernel Version: 3.12.x
Regression: No
Attachments:
kernel config of my kernel 3.12.6
Patch to select CONFIG_ZONE_DMA
Description
dl1ksv
2014-01-06 16:48:49 UTC
There is no code change at all in sound/pci/ice1712/* and sound/core/* regarding the snd-ice1712 driver between 3.11 and later kernels, so it cannot be the reason.

As you noticed, the problem is likely the failed buffer allocation. The driver couldn't pre-allocate the buffer but didn't stop. That is valid behavior, since it is not a fatal error per se. (But we could show some warning, at least...)

The error code -12 from echo 256 indicates that contiguous memory of the given size couldn't be allocated. (The error code -22 means an invalid value, which is true for the given value 1.)

That being said, the culprit is outside the sound driver. Possibly some change in the memory allocator, or some other driver grabs too large blocks of pages before the sound driver. In any case, please try to bisect.
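As a quick cross-check of the two error codes mentioned above, the following minimal sketch decodes them with Python's standard `errno` module. It assumes a Linux host, where 12 is ENOMEM and 22 is EINVAL; the helper name `describe` is made up for this illustration.

```python
import errno
import os

def describe(code: int) -> str:
    """Return 'NAME: message' for a negative kernel-style error code."""
    num = abs(code)
    # errno.errorcode maps the numeric value to its symbolic name.
    name = errno.errorcode.get(num, "UNKNOWN")
    return f"{name}: {os.strerror(num)}"

for code in (-12, -22):
    print(code, describe(code))
```

The symbolic names are what matter here: ENOMEM for the failed buffer allocation and EINVAL for the rejected value.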
This is the result of my bisect:

# first bad commit: [81c0a2bb515fd4daae8cab64352877480792b515] mm: page_alloc: fair zone allocator policy

Some further information: I use 8GB of memory.

cat /proc/meminfo
MemTotal: 8155528 kB
MemFree: 7240716 kB
Buffers: 1888 kB
Cached: 422336 kB
SwapCached: 0 kB
Active: 417112 kB
Inactive: 384748 kB
Active(anon): 378084 kB
Inactive(anon): 5000 kB
Active(file): 39028 kB
Inactive(file): 379748 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 8388600 kB
SwapFree: 8388600 kB
Dirty: 1260 kB
Writeback: 0 kB
AnonPages: 377636 kB
Mapped: 87212 kB
Shmem: 5448 kB
Slab: 40680 kB
SReclaimable: 19404 kB
SUnreclaim: 21276 kB
KernelStack: 2720 kB
PageTables: 17432 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 12466364 kB
Committed_AS: 1375628 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 312308 kB
VmallocChunk: 34359406592 kB
HardwareCorrupted: 0 kB
DirectMap4k: 55464 kB
DirectMap2M: 3063808 kB
DirectMap1G: 5242880 kB

Thanks. Could you confirm that the problem still exists on the latest 3.13-rc kernel, just to be sure?

Problem remains with kernel 3.13.0-rc7

I tested 3.12.x on a machine with an ice1712 board, and the buffer allocation works there. I suspect it depends on the kernel config. Could you attach your kernel config?

Created attachment 121271
kernel config of my kernel 3.12.6
Thanks. It seems that the behavior depends on CONFIG_ZONE_DMA. Could you set CONFIG_ZONE_DMA=y and see whether it works now? (CONFIG_BOUNCE seems irrelevant.)

Yes, that option did it. But I wonder, as in 3.11.x it works without this option set. Nevertheless, now it works. Many thanks for your assistance.

Yes, this is a regression in the recent kernel, but people didn't notice it since CONFIG_ZONE_DMA is usually enabled. I'll change the component to MM and reassign to someone else who has a better clue :)

In short, dma_alloc_coherent() with a high order fails since 3.12 when CONFIG_ZONE_DMA isn't set. It could be worked around by disabling fair zone allocation, too, e.g.

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2391,10 +2391,12 @@ static void prepare_slowpath(gfp_t gfp_mask, unsigned int order,
 		 */
 		if (!zone_local(preferred_zone, zone))
 			continue;
+#if 0
 		mod_zone_page_state(zone, NR_ALLOC_BATCH,
 				    high_wmark_pages(zone) -
 				    low_wmark_pages(zone) -
 				    zone_page_state(zone, NR_ALLOC_BATCH));
+#endif
 	}
 }

Another note: I found that the problem is related to the DMA mask set by the driver. This device needs a 28-bit DMA mask, so it calls pci_set_dma_mask() and pci_set_consistent_dma_mask() with DMA_BIT_MASK(28). Without these calls, the page allocation succeeded.

(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

On Thu, 09 Jan 2014 15:38:04 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=68221

Johannes, could you have a headscratch over this one please? fair-zone-allocator-policy is causing an audio driver high-order page allocation failure, but only when CONFIG_ZONE_DMA=n and pci_set_dma_mask(DMA_BIT_MASK(28)).

btw, this was harder to debug than it needed to be - we didn't get the usual page-allocation-failure warning, presumably because snd_malloc_dev_pages() uses __GFP_NOWARN.
Perhaps we should undo that, or teach snd_malloc_dev_pages() to emit its own warning under suitable circumstances?

On Thu, Jan 09, 2014 at 01:26:46PM -0800, Andrew Morton wrote:
> Johannes, could you have a headscratch over this one please?
> fair-zone-allocator-policy is causing an audio driver high-order page
> allocation failure, but only when CONFIG_ZONE_DMA=n and
> pci_set_dma_mask(DMA_BIT_MASK(28)).

I have a hard time finding the exact path from the driver to the page allocator, but I think we end up in x86's dma_generic_alloc_coherent.

Without a DMA zone, that function simply allocates DMA32 memory and hopes that the result happens to be within the 28 bit address limit. No reclaim, no compaction, just a one-off crapshoot.

As the DMA32 zone is filling up, the odds of that succeeding dwindle drastically. The fair zone allocator expedites this by filling all zones evenly instead of the Normal zone first, but it does not make a difference anymore after a bit of uptime.
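The difference between filling the Normal zone first and filling all zones evenly can be illustrated with a toy model. This is only a sketch under strong assumptions: two zones with made-up sizes, single-page requests, and a "fair" rule that simply keeps both zones at the same relative fill level. It does not model the kernel's per-zone NR_ALLOC_BATCH accounting, and the function name `dma32_pages_used` is hypothetical.

```python
def dma32_pages_used(total_requests: int, normal_free: int,
                     dma32_free: int, fair: bool) -> int:
    """Count pages taken from DMA32 after a run of single-page requests."""
    used_normal = used_dma32 = 0
    for _ in range(total_requests):
        normal_left = normal_free - used_normal
        dma32_left = dma32_free - used_dma32
        if normal_left == 0 and dma32_left == 0:
            break  # both zones are exhausted
        if fair:
            # Simplified fair policy: take from the zone whose remaining
            # fraction is larger, so both fill at the same relative rate.
            take_dma32 = dma32_left * normal_free > normal_left * dma32_free
        else:
            # Simplified old policy: Normal first, DMA32 only as fallback.
            take_dma32 = normal_left == 0
        if take_dma32:
            used_dma32 += 1
        else:
            used_normal += 1
    return used_dma32

# Made-up zone sizes in pages, roughly in a 5:3 ratio.
print(dma32_pages_used(800, 5000, 3000, fair=False))
print(dma32_pages_used(800, 5000, 3000, fair=True))
```

In this model the Normal-first policy leaves DMA32 untouched for a light load, while the even-fill policy starts consuming it immediately. Once the total load exceeds the Normal zone, both policies use DMA32, which matches the remark that the difference disappears after some uptime.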
The DMA zone is the only thing that provides a reasonable chance for such allocations to succeed, so any driver that has addressing restrictions below 32 bit should select CONFIG_ZONE_DMA.

> btw, this was harder to debug than it needed to be - we didn't get the
> usual page-allocation-failure warning, presumably because
> snd_malloc_dev_pages() uses __GFP_NOWARN. Perhaps we should undo that,
> or teach snd_malloc_dev_pages() to emit its own warning under suitable
> circumstances?

If my analysis is correct, it's not the page allocator that fails but the dma layer that filters inadequate allocations and returns NULL.

On Thu, 9 Jan 2014 18:20:33 -0500 Johannes Weiner <hannes@cmpxchg.org> wrote:

> The DMA zone is the only thing that provides a reasonable chance for
> such allocations to succeed, so any driver that has addressing
> restrictions below 32 bit should select CONFIG_ZONE_DMA.

Ah, yes, right, what a disaster. I dimly remember deciding a long time ago that the sane way to honour pci_set_dma_mask() was for the page allocator to implement separate zones for DMA_BIT_MASK(31), DMA_BIT_MASK(30), ..., DMA_BIT_MASK(13). That didn't happen ;)

So what to do?

a) I guess ZONE_DMA corresponds to DMA_BIT_MASK(24) and should be suitable for most pci_set_dma_mask() users. So drivers which set such masks should depend on or select CONFIG_ZONE_DMA. Or, better,

b) teach pci-dma.c to reclaim/compact/migrate its way to success.

At Thu, 9 Jan 2014 13:26:46 -0800,
Andrew Morton wrote:
>
> btw, this was harder to debug than it needed to be - we didn't get the
> usual page-allocation-failure warning, presumably because
> snd_malloc_dev_pages() uses __GFP_NOWARN. Perhaps we should undo that,
> or teach snd_malloc_dev_pages() to emit its own warning under suitable
> circumstances?
A single allocation failure can be non-fatal, as we have a fallback to
reduce the size and retry. But it'd be good to show the warning if
the allocation fails completely, yes.
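The fallback described above (reduce the size and retry) can be sketched as follows. This is an illustration in Python, not the ALSA implementation: `alloc_with_fallback`, the `try_alloc` callback, the `limited` test allocator and the 4 KiB page size are all assumptions made for the example. It also marks the point where a warning would be useful, namely once every attempt has failed.

```python
import logging
from typing import Callable, Optional, Tuple

PAGE_SIZE = 4096  # assumed page size for this illustration

def alloc_with_fallback(
    size: int,
    try_alloc: Callable[[int], Optional[bytearray]],
) -> Tuple[Optional[bytearray], int]:
    """Halve the request until try_alloc succeeds or a single page fails."""
    while True:
        buf = try_alloc(size)
        if buf is not None:
            return buf, size
        if size <= PAGE_SIZE:
            # Every size failed: warn only in this case, because a single
            # failed attempt is recoverable through the smaller retry.
            logging.warning("buffer allocation failed completely")
            return None, 0
        size //= 2

# Hypothetical allocator that can only satisfy requests up to 64 KiB.
def limited(size: int) -> Optional[bytearray]:
    return bytearray(size) if size <= 64 * 1024 else None

buf, got = alloc_with_fallback(256 * 1024, limited)
print(got)
```

Keeping the warning out of the retry loop avoids noise for the recoverable case while still reporting the total failure that made this bug hard to diagnose.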
At Thu, 9 Jan 2014 18:20:33 -0500,
Johannes Weiner wrote:
>
> I have a hard time finding the exact path from the driver to the page
> allocator, but I think we end up in x86's dma_generic_alloc_coherent.
>
> Without a DMA zone, that function simply allocates DMA32 memory and
> hopes that the result happens to be within the 28 bit address limit.
> No reclaim, no compaction, just a one-off crapshoot.
>
> As the DMA32 zone is filling up, the odds of that succeeding dwindle
> drastically. The fair zone allocator expedites this by filling all
> zones evenly instead of the Normal zone first, but it does not make a
> difference anymore after a bit of uptime.
>
> The DMA zone is the only thing that provides a reasonable chance for
> such allocations to succeed, so any driver that has addressing
> restrictions below 32 bit should select CONFIG_ZONE_DMA.
Yes, forcibly selecting CONFIG_ZONE_DMA would be one feasible option,
which can be put to stable kernels, too.
OTOH, I wonder why ZONE_DMA32 was filled up so quickly. I tested a
different condition, 6GB machine with 31bit DMA mask, and it fails the
allocation, too, at boot time. It shouldn't be too hungry at such an
early stage.
Actually, if we had the commit "mm: page_alloc: exclude unreclaimable
allocations from zone fairness policy", this problem could be worked
around, too. Is fairness too strict?
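To put numbers on the address limits discussed in this thread, here is a small sketch. `dma_bit_mask` mirrors the arithmetic of the kernel's `DMA_BIT_MASK(n)` macro for n below 64; the other function names, and the simplification that ZONE_DMA32 spans the whole first 4 GiB of physical memory, are assumptions for illustration only.

```python
def dma_bit_mask(bits: int) -> int:
    """Highest physical address reachable with an n-bit DMA mask."""
    return (1 << bits) - 1

def fits_mask(phys_addr: int, size: int, bits: int) -> bool:
    """True if the whole buffer [phys_addr, phys_addr + size) is reachable."""
    return phys_addr + size - 1 <= dma_bit_mask(bits)

def share_of_first_4gib(bits: int) -> float:
    """Fraction of the first 4 GiB that lies at or below the mask."""
    return min(1.0, (dma_bit_mask(bits) + 1) / (1 << 32))

for bits in (24, 28, 31, 32):
    limit_mib = (dma_bit_mask(bits) + 1) // (1 << 20)
    print(f"{bits}-bit mask: limit {limit_mib} MiB, "
          f"{share_of_first_4gib(bits):.4f} of the first 4 GiB")
```

Under these assumptions a 28-bit mask covers only 256 MiB, one sixteenth of the range a DMA32 allocation may come from, whereas the 24-bit range of ZONE_DMA lies entirely below every mask listed here.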
At Thu, 9 Jan 2014 15:30:08 -0800,
Andrew Morton wrote:
>
> Ah, yes, right, what a disaster. I dimly remember deciding a long time
> ago that the sane way to honour pci_set_dma_mask() was for the page
> allocator to implement separate zones for DMA_BIT_MASK(31),
> DMA_BIT_MASK(30), ..., DMA_BIT_MASK(13). That didn't happen ;)
>
> So what to do?
>
> a) I guess ZONE_DMA corresponds to DMA_BIT_MASK(24) and should be
> suitable for most pci_set_dma_mask() users. So drivers which set
> such masks should depend on or select CONFIG_ZONE_DMA. Or, better,
>
> b) teach pci-dma.c to reclaim/compact/migrate its way to success.
I guess (a) is the best option for 3.12.x and 3.13.x from maintenance
POV. I'll attach a fix patch in the bugzilla. (Embedding a patch in
the bugzilla mail isn't good, right?)
Created attachment 121531
Patch to select CONFIG_ZONE_DMA
On Fri, 10 Jan 2014 16:20:48 +0100 Takashi Iwai <tiwai@suse.de> wrote:

> I guess (a) is the best option for 3.12.x and 3.13.x from maintenance
> POV.

OK.

But then we'll just forget about it :(

> I'll attach a fix patch in the bugzilla. (Embedding a patch in
> the bugzilla mail isn't good, right?)

Well, patches in bugzilla don't get applied. I assume you'll also put a copy into the sound tree..

At Fri, 10 Jan 2014 12:59:11 -0800, Andrew Morton wrote:

> OK.
>
> But then we'll just forget about it :(

We can wait for Johannes coming up with a better patch, too :) But the lower DMA bit mask without CONFIG_ZONE_DMA is bad, so the current fix patch makes sense in any case.

> Well, patches in bugzilla don't get applied. I assume you'll also put
> a copy into the sound tree..

I'm going to apply it in the sound git tree for 3.14-rc1.

Please try this bug with latest kernel image.

Works for me with kernel 4.9.0-rc4. Meanwhile my system is occasionally dead after using the sound card: no mouse, no keyboard. I can't get any error messages. After resetting my system it does not boot, as important parameters like boot order etc. are overwritten!

Anything new with this bug?