Bug 215595 - First 64KiB of DRAM must be reserved for PCIe devices with Cyclone V, feature stopped working.
Status: NEW
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: ARM
Hardware: ARM Linux
Importance: P1 normal
Assignee: linux-arm-kernel@lists.arm.linux.org.uk
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-02-12 10:36 UTC by Brian T. McKee
Modified: 2022-02-15 00:30 UTC

See Also:
Kernel Version: 5.16.9
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Document that shows the first megabyte of SDRAM is reserved BOOT ROM (223.37 KB, image/png)
2022-02-12 18:41 UTC, Brian T. McKee
Proposed patch to enable / force fdt priority over kernel in boot sequence for ARM architecture (1.38 KB, patch)
2022-02-14 19:12 UTC, Brian T. McKee

Description Brian T. McKee 2022-02-12 10:36:28 UTC
Kernel 5.11.0 works. Kernel 5.16.9 does not.

This is SOCFPGA Cyclone V specific.

To get a PCIe NVMe device running reliably, I had to make the first 64KiB of main memory inaccessible to the system, because the second CPU's jump vectors, and probably other things, are located there.

I did this in the dts by creating an entry like this:

        reserved-memory {
                #address-cells = <1>;
                #size-cells = <1>;
                ranges;

                pcidma1@0 {
                        reg = <0x00000000 0x00010000>;
                        no-map;
                };
                lbsm_mem: lbsm_mem@3ffe0000 {
                        reg = <0x3fe00000 0x00200000>;
                        no-map;
                };
        };

Note: the end of my 1GiB memory is also reserved for other hardware functions.
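
As a cross-check of the two reg entries above, here is a minimal Python sketch that turns each <base size> pair into an inclusive address range. The helper name reg_to_range is hypothetical; the values are copied from the dts fragment.

```python
def reg_to_range(base, size):
    """Return (first, last) inclusive byte addresses for a <base size> reg pair."""
    return base, base + size - 1


# Values copied from the reserved-memory node in the dts fragment.
regions = {
    "pcidma1@0": (0x00000000, 0x00010000),
    "lbsm_mem": (0x3FE00000, 0x00200000),
}

for name, (base, size) in regions.items():
    first, last = reg_to_range(base, size)
    print(f"{name}: {first:#010x}-{last:#010x} ({size // 1024} KiB)")
```

The first entry covers 0x00000000-0x0000ffff (64 KiB) and the second covers 0x3fe00000-0x3fffffff (2 MiB), which matches the end of the 1GiB memory mentioned in the note.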

With Kernel 5.11 the output at boot (or in dmesg) looked like this:

[    0.000000] Memory policy: Data cache writealloc
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000010000-0x000000002fffffff]
[    0.000000]   HighMem  [mem 0x0000000030000000-0x000000003fdfffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000010000-0x00000000001fffff]
[    0.000000]   node   0: [mem 0x0000000000200000-0x000000003fdfffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000010000-0x000000003fdfffff]
[    0.000000] On node 0 totalpages: 261616
[    0.000000]   Normal zone: 1536 pages used for memmap
[    0.000000]   Normal zone: 0 pages reserved
[    0.000000]   Normal zone: 196592 pages, LIFO batch:63
[    0.000000]   HighMem zone: 65024 pages, LIFO batch:15
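
The page counts in the 5.11 log can be reproduced from the zone ranges, which is a quick way to confirm that the first 64KiB really were excluded. This is a sketch that assumes 4 KiB pages; the helper name pages is hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def pages(start, end_inclusive):
    """Number of whole pages in the inclusive range [start, end_inclusive]."""
    return (end_inclusive + 1 - start) // PAGE_SIZE


# Zone ranges copied from the 5.11 boot log.
normal = pages(0x00010000, 0x2FFFFFFF)
highmem = pages(0x30000000, 0x3FDFFFFF)

print(normal, highmem, normal + highmem)  # 196592 65024 261616
```

These match the "Normal zone", "HighMem zone" and "totalpages" figures printed by 5.11.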

However with 5.16.9 the output at boot (or in dmesg) looks like this:

[    0.000000] Memory policy: Data cache writealloc
[    0.000000] OF: fdt: Reserved memory: failed to reserve memory for node 'pcidma1@0': base 0x00000000, size 0 MiB
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000002fffffff]
[    0.000000]   HighMem  [mem 0x0000000030000000-0x000000003fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000003fdfffff]
[    0.000000]   node   0: [mem 0x000000003fe00000-0x000000003fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000003fffffff]
[    0.000000] percpu: Embedded 14 pages/cpu s28428 r8192 d20724 u57344
[    0.000000] pcpu-alloc: s28428 r8192 d20724 u57344 alloc=14*4096
[    0.000000] pcpu-alloc: [0] 0 [0] 1 
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 260608

I tried increasing the size of the reserved-memory section in the dts to 512KiB, but that did not help.
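
A note on the "size 0 MiB" in the error message: it does not necessarily mean the size was read as zero. The sketch below assumes the message reports the size in whole MiB using integer division, which the log text suggests but this report does not confirm. The helper name is hypothetical.

```python
SZ_1M = 1 << 20


def size_in_whole_mib(size_bytes):
    """Size rounded down to whole MiB (assumed to be how the log prints it)."""
    return size_bytes // SZ_1M


print(size_in_whole_mib(0x00010000))  # 64 KiB region from the dts
print(size_in_whole_mib(0x00080000))  # 512 KiB region tried afterwards
```

Under that assumption both the 64KiB and the 512KiB reservations would be printed as 0 MiB, so the message alone does not show that the reg property was parsed incorrectly.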

Can someone advise me on what I need to do to get this to work? The NVMe device will not be stable if I cannot reserve this memory.

Thank you very much.
Comment 1 Brian T. McKee 2022-02-12 18:41:40 UTC
Created attachment 300443
Document that shows the first megabyte of SDRAM is reserved BOOT ROM

However, I'm not certain why Linux works without the reservation, or why the problem only seems to affect the PCIe controller. My only thought is that the PCIe controller driver is attempting to use location 0x0 for DMA.
Comment 2 Brian T. McKee 2022-02-12 18:47:06 UTC
Here is the output of cat /proc/iomem, which shows that the first 2 MiB are excluded from System RAM in 5.11 but not in 5.16.9:

5.16.9:
root@cyclone5:~# cat /proc/iomem 
00000000-3fdfffff : System RAM
  00008000-00bfffff : Kernel code
  00d00000-00fdcbd7 : Kernel data
3fe00000-3fffffff : ff200000.wtec_se1 lbsm_mem@3ffe0000
c0000000-c00fffff : pcie@000100000
  c0000000-c00fffff : PCI Bus 0000:01
    c0000000-c0003fff : 0000:01:00.0
      c0000000-c0003fff : nvme
c0100000-c0103fff : c0000000.pcie Cra
c0200000-c020007f : c0200080.msi vector_slave
c0200080-c020008f : c0200080.msi csr
ff700000-ff701fff : ff700000.ethernet ethernet@ff700000
ff702000-ff703fff : ff702000.ethernet ethernet@ff702000
ff704000-ff704fff : ff704000.dwmmc0 dwmmc0@ff704000
ff705000-ff705fff : ff705000.spi spi@ff705000
ff706000-ff706fff : ff706000.fpgamgr fpgamgr@ff706000
ff709000-ff709fff : ff709000.gpio gpio@ff709000
ffb90000-ffb90003 : ff706000.fpgamgr fpgamgr@ff706000
ffc02000-ffc0201f : serial
ffc03000-ffc0301f : serial
ffc04000-ffc04fff : ffc04000.i2c i2c@ffc04000
ffc05000-ffc05fff : ffc05000.i2c i2c@ffc05000
ffd02000-ffd02fff : ffd02000.watchdog watchdog@ffd02000
ffd05000-ffd05fff : rstmgr
ffd08140-ffd08143 : ffd08140.l2-ecc
ffe01000-ffe01fff : pdma@ffe01000
ffff0000-ffffffff : ffff0000.sram sram@ffff0000


5.11.0:
root@cyclone5:~# cat /proc/iomem 
00200000-3fdfffff : System RAM
  00d00000-00ddf287 : Kernel data
3fe00000-3fffffff : ff200000.wtec_se1 lbsm_mem@3ffe0000
c0000000-c00fffff : pcie@000100000
  c0000000-c00fffff : PCI Bus 0000:01
    c0000000-c0003fff : 0000:01:00.0
      c0000000-c0003fff : nvme
c0100000-c0103fff : c0000000.pcie Cra
c0200000-c020007f : c0200080.msi vector_slave
c0200080-c020008f : c0200080.msi csr
ff700000-ff701fff : ff700000.ethernet ethernet@ff700000
ff702000-ff703fff : ff702000.ethernet ethernet@ff702000
ff704000-ff704fff : ff704000.dwmmc0 dwmmc0@ff704000
ff705000-ff705fff : ff705000.spi spi@ff705000
ff706000-ff706fff : ff706000.fpgamgr fpgamgr@ff706000
ff709000-ff709fff : ff709000.gpio gpio@ff709000
ffa00000-ffa00fff : ff705000.spi spi@ff705000
ffb90000-ffb90003 : ff706000.fpgamgr fpgamgr@ff706000
ffc02000-ffc0201f : serial
ffc03000-ffc0301f : serial
ffc04000-ffc04fff : ffc04000.i2c i2c@ffc04000
ffc05000-ffc05fff : ffc05000.i2c i2c@ffc05000
ffd02000-ffd02fff : ffd02000.watchdog watchdog@ffd02000
ffd05000-ffd05fff : rstmgr
ffd08140-ffd08143 : ffd08140.l2-ecc
ffe01000-ffe01fff : pdma@ffe01000
ffff0000-ffffffff : ffff0000.sram sram@ffff0000
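
To compare the two dumps without reading them by eye, a small Python sketch like the following can pull out the top-level System RAM ranges. The function name system_ram_ranges is hypothetical, and the sample strings are shortened copies of the output above.

```python
def system_ram_ranges(iomem_text):
    """Return (start, end) pairs for top-level 'System RAM' lines."""
    ranges = []
    for line in iomem_text.splitlines():
        if line.startswith(" "):  # indented lines are child resources
            continue
        span, sep, name = line.partition(" : ")
        if sep and name.strip() == "System RAM":
            start, end = (int(x, 16) for x in span.split("-"))
            ranges.append((start, end))
    return ranges


# Shortened samples taken from the two /proc/iomem dumps.
sample_5_16 = "00000000-3fdfffff : System RAM\n  00008000-00bfffff : Kernel code\n"
sample_5_11 = "00200000-3fdfffff : System RAM\n  00d00000-00ddf287 : Kernel data\n"

for label, text in (("5.16.9", sample_5_16), ("5.11.0", sample_5_11)):
    for start, end in system_ram_ranges(text):
        print(f"{label}: System RAM {start:#010x}-{end:#010x}")
```

On the board itself the text could be read from /proc/iomem instead of the sample strings; note that the addresses are only shown to root.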
Comment 3 Brian T. McKee 2022-02-12 23:47:48 UTC
I traced it with some pr_info statements, and I can see that the kernel is allocating memory in that space before it processes the reserved-memory nodes of the device tree (OF).

I have to figure out a way to make it process the device tree first, like it used to.
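
One thing the numbers in this report do show is that the requested reservation overlaps the kernel image as it appears in the 5.16.9 /proc/iomem. The sketch below only checks that overlap; whether the overlap is what makes the reservation fail is an assumption, not something confirmed here. The helper name overlaps is hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if inclusive ranges [a_start, a_end] and [b_start, b_end] intersect."""
    return a_start <= b_end and b_start <= a_end


reserved = (0x00000000, 0x0000FFFF)     # pcidma1@0 from the dts
kernel_code = (0x00008000, 0x00BFFFFF)  # "Kernel code" in the 5.16.9 /proc/iomem

print(overlaps(*reserved, *kernel_code))  # True
```

If the kernel image is already placed at 0x8000 by the time the reserved-memory nodes are scanned, a no-map request for 0x0-0xffff would collide with memory that is already in use.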
Comment 4 Brian T. McKee 2022-02-13 01:36:01 UTC
Is it unflattening the device tree to location 0x00000000?

That's what it looks like to me.
Comment 5 Brian T. McKee 2022-02-14 19:12:52 UTC
Created attachment 300453
Proposed patch to enable / force fdt priority over kernel in boot sequence for ARM architecture

This is the patch I developed to force my dts to override the kernel starting address to free up 2 MiB at the beginning of main memory.

I still do not understand why this memory needs to be reserved. I only know that the PCIe driver from Intel for the Cyclone V does something with this memory, for some reason. It might be a silicon or driver bug for all I know.

The only thing I am sure of is this: if I don't reserve this memory, the NVMe controller becomes flaky, errors are generated, and partitions are corrupted.

It makes sense to me that the fdt should have priority over everything since it knows the hardware best.

I would appreciate feedback on the validity of doing things this way.

Thanks for your time.
Comment 6 Brian T. McKee 2022-02-15 00:30:20 UTC
I did a little testing. I have the first 2 MiB reserved, so I used dd to copy /dev/zero to that address space. Results:

The jump vectors for the second CPU are there, and they extend up to 0x8000. If I overwrite them, the system crashes.

I can clear the memory from 0x8000 to 0x100000 (the first megabyte minus the first 32 KiB), no problem.

However, if I write a zero to 0x100000 I get a segmentation fault. The question is: who is using that memory? The kernel isn't...

It is the PCIe controller.

I got this on the console when I tried to write to 0x00100000:
[  91.061733] Unable to handle kernel paging request at virtual address c0100000 
[  91.068926] [c0100000] *pgd=0001141e(bad) 
[  91.072935] Internal error: Oops: 80d [#1] SMP ARM 
[  91.077713] Modules linked in: 
[  91.080761] CPU: 0 PID: 406 Comm: dd Not tainted 5.16.9-wtec #1 
[  91.086661] Hardware name: Altera SOCFPGA 
[  91.090656] PC is at arm_copy_from_user+0x70/0x38c 
[  91.095447] LR is at 0x0 
[  91.097972] pc : [<c05350a4>]   lr : [<00000000>]   psr: 20070013 
[  91.104214] sp : c7353e6c ip : 00000000 fp : c7353ecc 
[  91.109417] r10: 00000400 r9 : c686e6c0 r8 : 00000000 
[  91.114621] r7 : 00000000 r6 : 00000000 r5 : 00000000 r4 : 00000000 
[  91.121121] r3 : 00000000 r2 : 00000360 r1 : 004ee020 r0 : c0100000 
[  91.127622] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user 
[  91.134731] Control: 10c5387d Table: 06b8c04a DAC: 00000055

That leading C on the address is not part of the address I was trying to talk to. Somehow that DRAM space is being mapped to PCIe space, even though it is not supposed to be.

Not so coincidentally, PCIe Cra is mapped to that location:

c0100000-c0103fff : c0000000.pcie Cra

I don't understand why, but clearly the PCIe driver is using that memory and IMO it shouldn't be.
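
One alternative reading is worth ruling out before blaming the PCIe driver. The sketch below assumes the common 32-bit ARM 3G/1G split (PAGE_OFFSET of 0xC0000000) and RAM starting at physical address 0; neither value is confirmed in this report, and the function name is hypothetical.

```python
PAGE_OFFSET = 0xC0000000  # assumption: default 3G/1G user/kernel split
PHYS_OFFSET = 0x00000000  # assumption: RAM starts at physical address 0


def linear_map_virt(phys):
    """Kernel linear-map virtual address for a physical RAM address."""
    return phys - PHYS_OFFSET + PAGE_OFFSET


print(hex(linear_map_virt(0x00100000)))  # 0xc0100000
```

Under those assumptions the kernel's own virtual address for physical 0x00100000 is also 0xc0100000, so the faulting address in the oops may simply be the linear-map address of the DRAM location, and its match with the physical address of the PCIe Cra window could be a coincidence. A region marked no-map is not given a kernel mapping, which would be consistent with a paging fault on that address.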

So go figure. There should be a way to determine where the Cra is mapped in DRAM, but when I was trying to figure this out a year ago, I could not get mapping to work. I wonder if this is a bug in the hardware. It could be the FPGA is not configured correctly. Too bad there is little to no documentation on this.
