Distribution: Fermi [Redhat] Linux LTS Release 3.0.1 (Feynman) Hardware Environment: Hardware Environment:Dell Precision M60 laptop 1.8 Ghz Pentium M step 6, 1 GB RAM Software Environment: Simple C program measuring mem copy performance. ++ uname -r + cd /usr/src/linux-2.6.10-rc3-bk12 + sh scripts/ver_linux If some fields are empty or look unusual you may have an old version. Compare to the current minimal requirements in Documentation/Changes. Linux ron.lap 2.6.10-rc3-bk12 #2 Sat Dec 18 20:10:41 CST 2004 i686 i686 i386 GNU/Linux Gnu C 3.2.3 Gnu make 3.79.1 binutils 2.14.90.0.4 util-linux 2.11y mount 2.11y module-init-tools 3.1-pre5 e2fsprogs 1.32 jfsutils 1.1.2 reiserfsprogs line reiser4progs line pcmcia-cs 3.1.31 quota-tools 3.09. PPP 2.4.1 isdn4k-utils 3.1pre4 nfs-utils 1.0.5 Linux C Library 2.3.2 Dynamic linker (ldd) 2.3.2 Procps 2.0.13 Net-tools 1.60 Kbd 1.08 Sh-utils 4.5.3 Modules Loaded pcspkr psmouse nls_iso8859_1 nls_cp437 mousedev ehci_hcd uhci_hcd usbcore + cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 13 model name : Intel(R) Pentium(R) M processor 1.80GHz stepping : 6 cpu MHz : 1794.542 cache size : 2048 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 2 wp : yes flags : fpu vme de pse tsc msr mce cx8 sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss tm pbe est tm2 bogomips : 3555.32 + cat /proc/modules pcspkr 3336 0 - Live 0xf8871000 psmouse 22216 0 - Live 0xf8918000 nls_iso8859_1 3968 1 - Live 0xf8858000 nls_cp437 5632 1 - Live 0xf8855000 mousedev 11160 1 - Live 0xf886d000 ehci_hcd 30848 0 - Live 0xf8864000 uhci_hcd 33608 0 - Live 0xf885a000 usbcore 120008 3 ehci_hcd,uhci_hcd, Live 0xf8873000 + cat /proc/ioports 0000-001f : dma1 0020-0021 : pic1 0040-0043 : timer0 0050-0053 : timer1 0060-006f : keyboard 0070-0077 : rtc 0080-008f : dma page reg 00a0-00a1 : pic2 00c0-00df : dma2 00f0-00ff : fpu 0170-0177 : ide1 01f0-01f7 : ide0 0376-0376 : ide1 
03c0-03df : vga+ 03f6-03f6 : ide0 04d0-04d1 : pnp 00:01 0800-087f : 0000:00:1f.0 0800-0803 : PM1a_EVT_BLK 0804-0805 : PM1a_CNT_BLK 0806-0807 : pnp 00:02 0808-080b : PM_TMR 0820-0820 : PM2_CNT_BLK 0828-082f : GPE0_BLK 0860-087f : pnp 00:02 0880-08bf : 0000:00:1f.0 0880-08bf : pnp 00:02 08c0-08df : pnp 00:02 08e0-08e5 : ACPI CPU throttle 0900-097f : pnp 00:07 0cf8-0cff : PCI conf1 b800-b8ff : 0000:00:1f.5 bc40-bc7f : 0000:00:1f.5 bf20-bf3f : 0000:00:1d.2 bf20-bf3f : uhci_hcd bf40-bf5f : 0000:00:1d.1 bf40-bf5f : uhci_hcd bf80-bf9f : 0000:00:1d.0 bf80-bf9f : uhci_hcd bfa0-bfaf : 0000:00:1f.1 bfa0-bfa7 : ide0 bfa8-bfaf : ide1 c000-cfff : PCI Bus #01 ecf8-ecff : 0000:02:01.3 f400-f4fe : motherboard f400-f4fe : pnp 00:02 + cat /proc/iomem 00000000-0009efff : System RAM 0009f000-0009ffff : reserved 000a0000-000bffff : Video RAM area 000c0000-000cf7ff : Video ROM 000f0000-000fffff : System ROM 00100000-3ffadfff : System RAM 00100000-003e4bdd : Kernel code 003e4bde-0057733f : Kernel data 3ffae000-3fffffff : reserved 40000000-400003ff : 0000:00:1f.1 40001000-40001fff : 0000:02:01.0 40002000-40002fff : 0000:02:01.1 d0000000-dfffffff : PCI Bus #01 d0000000-dfffffff : 0000:01:00.0 e0000000-e7ffffff : 0000:00:00.0 f4fff400-f4fff4ff : 0000:00:1f.5 f4fff800-f4fff9ff : 0000:00:1f.5 f4fffc00-f4ffffff : 0000:00:1d.7 f4fffc00-f4ffffff : ehci_hcd fafe8000-fafebfff : 0000:02:01.2 fafee000-fafeefff : 0000:02:03.0 fafef800-fafeffff : 0000:02:01.2 fafef800-fafeffff : ohci1394 faff0000-faffffff : 0000:02:00.0 faff0000-faffffff : tg3 fc000000-fdffffff : PCI Bus #01 fc000000-fcffffff : 0000:01:00.0 feda0000-fedfffff : reserved ffb00000-ffffffff : reserved + lspci -vvv 00:00.0 Host bridge: Intel Corp. 
82855PM Processor to I/O Controller (rev 03) Subsystem: Dell: Unknown device 013f Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- Latency: 0 Region 0: Memory at e0000000 (32-bit, prefetchable) [size=128M] Capabilities: [e4] #09 [4104] Capabilities: [a0] AGP version 2.0 Status: RQ=31 SBA+ 64bit- FW+ Rate=x1,x2,x4 Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none> 00:01.0 PCI bridge: Intel Corp. 82855PM Processor to AGP Controller (rev 03) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32 Bus: primary=00, secondary=01, subordinate=01, sec-latency=32 I/O behind bridge: 0000c000-0000cfff Memory behind bridge: fc000000-fdffffff Prefetchable memory behind bridge: d0000000-dfffffff BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B- 00:1d.0 USB Controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 01) (prog-if 00 [UHCI]) Subsystem: Dell: Unknown device 013f Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Interrupt: pin A routed to IRQ 11 Region 4: I/O ports at bf80 [size=32] 00:1d.1 USB Controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 01) (prog-if 00 [UHCI]) Subsystem: Dell: Unknown device 013f Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Interrupt: pin B routed to IRQ 11 Region 4: I/O ports at bf40 [size=32] 00:1d.2 USB Controller: Intel Corp. 
82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 01) (prog-if 00 [UHCI]) Subsystem: Dell: Unknown device 013f Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Interrupt: pin C routed to IRQ 11 Region 4: I/O ports at bf20 [size=32] 00:1d.7 USB Controller: Intel Corp. 82801DB/DBM (ICH4/ICH4-M) USB 2.0 EHCI Controller (rev 01) (prog-if 20 [EHCI]) Subsystem: Dell: Unknown device 013f Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Interrupt: pin D routed to IRQ 11 Region 0: Memory at f4fffc00 (32-bit, non-prefetchable) [size=1K] Capabilities: [50] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [58] #0a [2080] 00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev 81) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR+ Latency: 0 Bus: primary=00, secondary=02, subordinate=02, sec-latency=32 I/O behind bridge: 0000d000-0000efff Memory behind bridge: f6000000-fbffffff Prefetchable memory behind bridge: fff00000-000fffff BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B- 00:1f.0 ISA bridge: Intel Corp. 82801DBM LPC Interface Controller (rev 01) Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 00:1f.1 IDE interface: Intel Corp. 
82801DBM (ICH4) Ultra ATA Storage Controller (rev 01) (prog-if 8a [Master SecP PriP]) Subsystem: Dell: Unknown device 013f Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Interrupt: pin A routed to IRQ 11 Region 0: I/O ports at <ignored> Region 1: I/O ports at <ignored> Region 2: I/O ports at <ignored> Region 3: I/O ports at <ignored> Region 4: I/O ports at bfa0 [size=16] Region 5: Memory at 40000000 (32-bit, non-prefetchable) [size=1K] 00:1f.5 Multimedia audio controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 01) Subsystem: Dell: Unknown device 013f Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Interrupt: pin B routed to IRQ 11 Region 0: I/O ports at b800 [size=256] Region 1: I/O ports at bc40 [size=64] Region 2: Memory at f4fff800 (32-bit, non-prefetchable) [size=512] Region 3: Memory at f4fff400 (32-bit, non-prefetchable) [size=256] Capabilities: [50] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 01:00.0 VGA compatible controller: nVidia Corporation NVIDIA Quadro FX 700 Go (rev a1) (prog-if 00 [VGA]) Subsystem: Dell: Unknown device 019b Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop+ ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32 (1250ns min, 250ns max) Interrupt: pin A routed to IRQ 11 Region 0: Memory at fc000000 (32-bit, non-prefetchable) [size=16M] Region 1: Memory at d0000000 (32-bit, prefetchable) [size=256M] Expansion ROM at 80000000 [disabled] [size=128K] Capabilities: [60] Power Management version 2 Flags: 
PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [44] AGP version 3.0 Status: RQ=31 SBA+ 64bit- FW+ Rate=x1,x2,x4 Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none> 02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5705M Gigabit Ethernet (rev 01) Subsystem: Dell: Unknown device 865d Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32 (16000ns min), cache line size 08 Interrupt: pin A routed to IRQ 11 Region 0: Memory at faff0000 (64-bit, non-prefetchable) [size=64K] Capabilities: [48] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=1 PME- Capabilities: [50] Vital Product Data Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable- Address: fffffbfdfbff4bb8 Data: 7bf7 02:01.0 CardBus bridge: Texas Instruments: Unknown device ac47 (rev 01) Subsystem: Dell: Unknown device 013f Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Interrupt: pin A routed to IRQ 255 Region 0: Memory at 40001000 (32-bit, non-prefetchable) [disabled] [size=4K] Bus: primary=02, secondary=03, subordinate=06, sec-latency=176 Memory window 0: 00000000-00000000 [disabled] (prefetchable) Memory window 1: 00000000-00000000 [disabled] (prefetchable) I/O window 0: 00000000-00000003 [disabled] I/O window 1: 00000000-00000003 [disabled] BridgeCtl: Parity- SERR- ISA- VGA- MAbort- >Reset+ 16bInt- PostWrite+ 16-bit legacy interface ports at 0001 02:01.1 CardBus bridge: Texas Instruments: Unknown device ac4a (rev 01) Subsystem: Dell: Unknown device 013f Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- 
SERR+ FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32, cache line size 08 Interrupt: pin A routed to IRQ 11 Region 0: Memory at 40002000 (32-bit, non-prefetchable) [size=4K] Bus: primary=02, secondary=07, subordinate=0a, sec-latency=176 Memory window 0: 00000000-00000000 (prefetchable) Memory window 1: 00000000-00000000 (prefetchable) I/O window 0: 00000000-00000003 I/O window 1: 00000000-00000003 BridgeCtl: Parity- SERR- ISA- VGA- MAbort- >Reset+ 16bInt- PostWrite- 16-bit legacy interface ports at 0001 02:01.2 FireWire (IEEE 1394): Texas Instruments: Unknown device 802b (prog-if 10 [OHCI]) Subsystem: Dell: Unknown device 013f Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32 (500ns min, 1000ns max), cache line size 08 Interrupt: pin A routed to IRQ 11 Region 0: Memory at fafef800 (32-bit, non-prefetchable) [size=2K] Region 1: Memory at fafe8000 (32-bit, non-prefetchable) [size=16K] Capabilities: [44] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME+ 02:01.3 System peripheral: Texas Instruments: Unknown device 8204 Subsystem: Dell: Unknown device 013f Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Region 0: I/O ports at ecf8 [size=8] Capabilities: [44] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 02:03.0 Network controller: Intel Corp. 
PRO/Wireless 2200BG (rev 05)
	Subsystem: Intel Corp.: Unknown device 2721
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (750ns min, 6000ns max), cache line size 08
	Interrupt: pin A routed to IRQ 11
	Region 0: Memory at fafee000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=1 PME-

+ cat /proc/scsi/scsi
Attached devices:

+ cat /proc/mtrr
reg00: base=0x00000000 (   0MB), size=1024MB: write-back, count=1
reg01: base=0xfeda0000 (4077MB), size= 128KB: write-through, count=1

Problem Description:
Memory bandwidth, as measured by a simple program that copies 32-bit words from one memory buffer to another, decreases after waking from the first suspend to RAM following a boot/reboot.

Performance before 1st suspend (i.e. measured right after reboot):
# loops    bytes  seconds     bytes/second
4194304      256 2.100492 511185866.789113
2097152      512 2.073641 517805054.977469
1048576     1024 2.060252 521170092.380577
 524288     2048 2.052698 523088030.088424
 262144     4096 2.039247 526538376.692490
 131072     8192 2.044709 525131799.203941
  65536    16384 2.057255 521929371.181502
  32768    32768 2.053823 522801539.867567
  16384    65536 2.053944 522770772.074838
   8192   131072 2.238565 479656315.227236
   4096   262144 2.054797 522553740.550410
   2048   524288 2.054639 522593942.682512
   1024  1048576 2.088225 514188787.926278
    512  2097152 2.257877 475553754.428407
    256  4194304 2.819464 380831901.411452
    128  8388608 3.121978 343929972.822267
     64 16777216 3.165082 339246133.754594
     32 33554432 3.157960 340011224.038280
     16 67108864 2.926294 366928913.861524

Performance after 1st suspend:
# loops    bytes  seconds     bytes/second
4194304      256 2.100117 511277152.787996
2097152      512 2.073161 517924926.797557
1048576     1024 2.060612 521079098.412580
 524288     2048 2.052631 523105103.105875
 262144     4096 2.039196 526551550.912906
 131072     8192 2.044657 525145209.305469
  65536    16384 2.057808 521789078.527142
  32768    32768 2.070937 518481123.648766
  16384    65536 2.053861 522791890.406682
   8192   131072 2.053784 522811493.143815
   4096   262144 2.053972 522763611.648508
   2048   524288 2.055298 522426383.691156
   1024  1048576 2.181485 492206847.482478
    512  2097152 2.566232 418411834.244989
    256  4194304 3.184070 337223046.854247
    128  8388608 3.103361 345993218.834314
     64 16777216 3.290089 326356478.275219
     32 33554432 3.364081 319178336.648871
     16 67108864 3.293490 326019464.409720

Steps to reproduce:
1. Boot or reboot.
2. Measure main memory bandwidth.
3. Suspend to RAM: echo mem >/sys/power/state
4. Wake (via the power button).
5. Remeasure main memory bandwidth and notice the decrease in performance.
The output for before and after the 1st sleep should have been:

Performance before 1st suspend (i.e. measured right after reboot):
# loops    bytes  seconds     bytes/second
4194304      256 2.128207 504528854.566207
2097152      512 2.064953 519983640.231064
1048576     1024 2.038665 526688688.129248
 524288     2048 2.030522 528800915.687797
 262144     4096 2.027024 529713416.193157
 131072     8192 2.027298 529641837.541972
  65536    16384 2.064885 520000751.362871
  32768    32768 2.050999 523521397.453337
  16384    65536 2.050308 523697819.887343
   8192   131072 2.050273 523706772.020141
   4096   262144 2.050425 523667920.764058
   2048   524288 2.051523 523387668.379623
   1024  1048576 2.088559 514106494.729768
    512  2097152 2.205559 486834320.262604
    256  4194304 2.440570 439955370.196205
    128  8388608 2.571250 417595232.034043
     64 16777216 2.581927 415868379.934201
     32 33554432 2.576516 416741778.530588
     16 67108864 2.592133 414230984.782878

Performance after 1st suspend:
# loops    bytes  seconds     bytes/second
4194304      256 2.100117 511277152.787996
2097152      512 2.073161 517924926.797557
1048576     1024 2.060612 521079098.412580
 524288     2048 2.052631 523105103.105875
 262144     4096 2.039196 526551550.912906
 131072     8192 2.044657 525145209.305469
  65536    16384 2.057808 521789078.527142
  32768    32768 2.070937 518481123.648766
  16384    65536 2.053861 522791890.406682
   8192   131072 2.053784 522811493.143815
   4096   262144 2.053972 522763611.648508
   2048   524288 2.055298 522426383.691156
   1024  1048576 2.181485 492206847.482478
    512  2097152 2.566232 418411834.244989
    256  4194304 3.184070 337223046.854247
    128  8388608 3.103361 345993218.834314
     64 16777216 3.290089 326356478.275219
     32 33554432 3.364081 319178336.648871
     16 67108864 3.293490 326019464.409720

A big "I'm sorry" for the silly mistake of entering the wrong output (which showed performance about the same before and after) :(

Here's the simple program used to produce the output:

#include <stdio.h>      /* stderr, fprintf, printf */
#include <stdint.h>     /* uint32_t */
#include <stdlib.h>     /* malloc */
#include <sys/time.h>   /* gettimeofday */
#include <errno.h>      /* errno */
#include <getopt.h>     /* getopt_long */

int g_bufsiz = 0x4000000;       /* bytes per buffer (64 MB) */
int g_loop_multiplier = 16;

int main(int argc, char *argv[])
{
    struct timeval t0_s, t1_s;
    uint32_t *buf1, *buf2;
    int siz, xx;
    int loop, loops = 1;
    double mark, curr, delta;

    buf1 = malloc(g_bufsiz);
    buf2 = malloc(g_bufsiz);
    if (buf1 == NULL || buf2 == NULL) {
        fprintf(stderr, "malloc failed\n");
        return (1);
    }

    /* fill the source buffer so its pages are really allocated */
    for (xx = 0; (xx << 2) < g_bufsiz; xx++)
        buf2[xx] = xx;

    /* pick the loop count so every row copies the same total number of bytes */
    for (siz = 0x100; siz < g_bufsiz; siz <<= 1)
        loops <<= 1;
    loops *= g_loop_multiplier;

    printf("# loops    bytes  seconds     bytes/second\n");
    for (siz = 0x100; siz <= g_bufsiz; ) {
        gettimeofday(&t0_s, NULL);
        for (loop = 0; loop < loops; loop++) {
            /* copy siz bytes, one 32-bit word at a time */
            for (xx = 0; (xx << 2) < siz; xx++) {
                buf1[xx] = buf2[xx];
            }
        }
        gettimeofday(&t1_s, NULL);
        curr = (double)t1_s.tv_usec / 1000000;
        mark = (double)t0_s.tv_usec / 1000000;
        curr += (double)t1_s.tv_sec;
        mark += (double)t0_s.tv_sec;
        delta = curr - mark;
        printf("%7d %8d %f %f\n",
               loops, siz, delta, ((double)loops * siz) / delta);
        siz <<= 1;
        loops >>= 1;
    }
    return (0);
} /* main */
This program accesses 128 MB of RAM. What does vmstat say about free memory while the program is running, before and after suspend? If you run the program a second time after resume, do you get the same result?
Yes, the program I used (as an example) is simplistic and one should be careful to have enough free memory. I reran the exact cases I ran before, and vmstat showed that both before and after the sleep I had a lot of free memory:

procs                      memory      swap          io     system         cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 2  0      0 979224   8312  23696    0    0    37    11 1011    40  7  0 90  2

Yes, when I run the program a 2nd time, I do get the same answer.

Additionally (or actually originally), I see the same decrease in main memory bandwidth using the STREAM benchmark (Ref. http://www.cs.virginia.edu/stream/):

before sleep:
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 5 microseconds.
Each test below will take on the order of 31033 microseconds.
   (= 6206 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          918.3507       0.0349       0.0348       0.0354
Scale:         899.5326       0.0356       0.0356       0.0357
Add:          1156.2094       0.0415       0.0415       0.0416
Triad:        1152.0744       0.0417       0.0417       0.0417

after resume:
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 5 microseconds.
Each test below will take on the order of 38527 microseconds.
   (= 7705 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          742.3733       0.0432       0.0431       0.0438
Scale:         727.7368       0.0440       0.0440       0.0440
Add:           934.2698       0.0514       0.0514       0.0514
Triad:         931.5138       0.0515       0.0515       0.0517

The STREAM benchmark (stream_d), I believe, locks pages. The results are also repeatable.
This is possibly caused by a decreased CPU frequency. Please try loading the cpufreq driver and scaling the CPU frequency to maximum, then retest.
I do not know exactly what you mean by "load the cpufreq driver and scale CPU frequency to maximum". Perhaps you mean:

  cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq \
      >/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed

? I doubt that it is CPU freq because the "cache" performance does not change; only the main memory ("bigger than cache") performance changes.

I do see that before and after the sleep both
  /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
and
  /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
have 1800000, the max frequency for my processor. I went ahead and did the "cat" mentioned above anyway and there was no change.

Again, I doubt it has anything to do with CPU freq (more like some memory controller setting??), but if you still want me to try the cpufreq driver thing (which I will be glad to do), then please tell me exactly what steps to take. Thanks.
You can do echo "performance" >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor to run at maximum frequency all the time.
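To make the before/after comparison of the cpufreq state easy to record, the sysfs files named in this thread can be printed in one go. This is only a sketch: the function name cpufreq_report and its optional directory argument are my own additions for illustration, and which files exist depends on the cpufreq driver in use.

```shell
# Print the cpufreq sysfs values discussed in this thread.
# $1 (optional, hypothetical parameter): directory to read from;
# defaults to the cpu0 path quoted above.
cpufreq_report() {
    dir=${1:-/sys/devices/system/cpu/cpu0/cpufreq}
    for f in scaling_governor scaling_cur_freq cpuinfo_cur_freq scaling_max_freq; do
        if [ -r "$dir/$f" ]; then
            printf '%s=%s\n' "$f" "$(cat "$dir/$f")"
        else
            # Some files are root-only or absent, depending on the driver.
            printf '%s=unreadable\n' "$f"
        fi
    done
}

cpufreq_report
```

Running it once right after boot and once after resume, and saving both outputs, gives a quick check that the frequency did not change across the suspend.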
Thanks for mentioning /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor, although I hope we are all in agreement that this has nothing to do with the problem. I did try it so that I could be 100% sure. Again, thanks for mentioning it, because it was something I missed/forgot and it seems it will be useful; I've done some preliminary tests and it seems I will be able to achieve better power savings. That is another topic, but I will investigate how thermal throttling affects power consumption.
Hi David, Len, and Venkatesh,

I think the problem might be a chipset bug. At least with the version I have:
  Intel Corp. 82855PM Processor to I/O Controller (rev 03)
there seems to be an issue with 333 MHz memory support (it just can't do it?). I'm hoping you guys from Intel can confirm all this. So, just based on that, I'm thinking about asking Dell if they can update my main board to one that has an 82855PM revision 21h (stepping B1).

The decreased performance after suspend also appears to happen under Windows; this is what leads me to believe it may just be a chipset bug. Maybe you guys can find a workaround (assuming I'm correct), but even so, I would like to get 333 MHz performance. Can you verify the existence of any systems that do maintain memory bandwidth performance after suspend?

The results I get under Windows are from SiSoftware Sandra Lite 2005.1.10.37; the numbers before and after suspend are as follows:

                                      before      after
RAM Bandwidth Int Buff'd iSSE2:     2280 MB/s   1895 MB/s
RAM Bandwidth Float Buff'd iSSE2:   2284 MB/s   1893 MB/s
The specification update for the 855PM agrees with you http://support.intel.com/design/chipsets/specupdt/25348802.pdf that PC2700 DDR 333 support was added at B1. But that doesn't explain why performance started fast, and then decreased after resume. Probably we need to dump out the configuration registers of the memory controller before and after the suspend/resume to find out exactly what is going on.
please attach (do not paste) the output of "lspci -xxx" from before and after the suspend.
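A minimal sketch of how the two dumps could be captured and compared. The helper names save_dump and compare_dumps and the file names are hypothetical; lspci -xxx generally needs root to show the full config space.

```shell
# Hypothetical helper: save the full config-space dump to the named file.
save_dump() {
    lspci -xxx > "$1"
}

# Show only the lines that differ between two saved dumps.
# diff exits non-zero when the files differ, so mask that status
# for callers that run under "set -e".
compare_dumps() {
    diff "$1" "$2" || true
}

# Intended use (not run here, since it needs the hardware and a suspend):
#   save_dump before.txt
#   echo mem >/sys/power/state      # suspend, then wake with the power button
#   save_dump after.txt
#   compare_dumps before.txt after.txt
```

Each differing line of the dump covers 16 bytes of config space, so the register offset is the row label plus the position of the changed byte within the row.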
Created attachment 4305 [details] lspci -xxx before suspend
Created attachment 4306 [details]
lspci -xxx after suspend

I've attached the output from "lspci -xxx" (before and after suspend).

Prior to my previous post (Additional Comment #8) I did an analysis of the 82855PM registers (which contributed to my "bug" theory):

register 7c  DRC  DRAM Controller Mode Register  (Ref. p.71)
    bit 28 changes from 1 to 0
        DRAM Power-down disabled???
    bits 6:4 change from 2 to 7
        before: All CPU cycles to DRAM result in an all banks precharge command on the DRAM interface.
        after:  Normal operation.
        Would running in "All Banks Pre-charge Enable" be better????

register b8  ATTBASE  Aperture Translation Table Base Register
    changes from 36460000 to 36ba0000

Is this analysis correct? And what about posting to some other list to verify 100% that other 82855PM (rev 03) chipsets can maintain performance after suspend????
Your analysis is quite right. I don't know the details of the config register bits, but your problem is very likely related to these changes. The OS generally doesn't touch the PCI host controller's registers; I think that is the BIOS's responsibility. As you said, the issue also exists under Windows, so I would think it's a BIOS bug. Testing laptops from other vendors (which should have different BIOSes) would be helpful.
To comment 12:

9c9
< 70: 03 03 00 00 00 00 00 00 00 00 02 2d 71 32 40 30
---
> 70: 03 03 00 00 00 00 00 00 00 00 02 2d 71 37 40 20

bit 0-7:   71 --> 71
bit 8-15:  32 --> 37 [bit 8-11: 2 --> 7 (010: Refresh interval 7.8 µs --> 111: Reserved)]
bit 24-31: 30 --> 20

Why do you say "bits 6:4 change from 2 to 7"? Am I wrong?
Hi Luming,

use:
  setpci -d 8086:3340 7c.l
to see the byte swap (i.e. after suspend):
  # setpci -d 8086:3340 7c.l
  20403771
But maybe there's not really supposed to be a byte swap? (I've done device work before, but this is my very first experience with these chipset registers.)

Hi David,

As an aside, I have tested all the BIOS revisions that exist for my Dell Precision M60. Yes, I would like to be able to test other 82855PM systems. I think the best way to do this would be via the internet community. But I am quite a bit weak at communicating with the internet community and was hoping you guys would know the most efficient way of doing this, i.e. which newsgroups to post to.
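The relation between the byte dump and the setpci value can be checked with a little shell arithmetic. This is a sketch only: le32 and drc_fields are hypothetical helper names, and the bit positions are simply the ones discussed in this thread, not verified against the datasheet. lspci -xxx lists bytes from the lowest address up, while setpci 7c.l prints them as one little-endian 32-bit value, so the four bytes at 7c..7f appear in reverse order.

```shell
# Assemble four config-space bytes (lowest address first, as lspci -xxx
# prints them) into the 32-bit hex value that "setpci ... 7c.l" shows.
le32() {
    printf '%s%s%s%s\n' "$4" "$3" "$2" "$1"
}

# Print the fields discussed in this thread for a hex value like 30403271.
drc_fields() {
    v=$((0x$1))
    printf 'bit28=%d bits8-11=%d bits4-6=%d\n' \
        $(( (v >> 28) & 1 )) $(( (v >> 8) & 0xf )) $(( (v >> 4) & 0x7 ))
}

drc_fields "$(le32 71 32 40 30)"   # bytes 7c..7f before suspend
drc_fields "$(le32 71 37 40 20)"   # bytes 7c..7f after resume
```

With the values from the attachments this prints bit28=1 bits8-11=2 bits4-6=7 for the boot state and bit28=0 bits8-11=7 bits4-6=7 after resume: bits 4-6 are 7 in both, and it is bits 8-11 that go from 2 to 7.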
I did a google search and came up with http://dev.gentoo.org/~brix/papers/X31/X31.html and emailed Henrik <brix@gentoo.org> and asked him to do a mem test and check out this page.
Here at Fermi, there are a lot of Dell laptops. I just had a colleague test, under Windows, his Dell Inspiron 600m, which has the 82855PM rev 03, and he also sees the same 17% performance decrease after resume from suspend (standby).
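For reference, the relative decrease can be computed from any before/after pair of bandwidth numbers quoted in this thread. A small sketch (the function name pct_drop is hypothetical; awk is used because plain shell arithmetic is integer-only):

```shell
# Percentage decrease from a "before" bandwidth to an "after" bandwidth.
pct_drop() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (b - a) / b * 100 }'
}

pct_drop 2280 1895            # Sandra "Int Buff'd iSSE2" numbers, MB/s
pct_drop 918.3507 742.3733    # STREAM Copy numbers, MB/s
```

This gives 16.9 for the Sandra figures, which is the roughly 17% quoted above, and 19.2 for STREAM Copy.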
Testing on Dell laptops can't tell us much; they possibly share the same BIOS. We need to test laptops from different vendors. I tested several laptops (including an HP nx5000 and a Toshiba M2, though none of them is 82855PM-based): the host bridge's config registers are not changed by suspend/resume (or only the 'status' register changes, which is normal). That's why I suspect it's a BIOS error. But anyway, let's see more test results. As for the mailing list, I think acpi-devel@lists.sourceforge.net is OK, and it would be better with a highlight title.
OK. Anyway, because the same thing happens under Windows, I don't think it's an ACPI4Linux problem, so I'm kind of between lists as far as begging for help. And I appreciate your responses in helping me collect the information I have.

David, are you saying that the HP nx5000 did have an 82855PM and just the Toshiba M2 did not? And are you saying that the HP nx5000 had the same main memory bandwidth measurement before the 1st suspend as it did after the suspend? If so, was it an 82855PM rev 03 or rev 21?

... acpi-devel@lists.sourceforge.net; what do you mean by "highlight title"? I just figured out how to view that list archive via the web. David, could you please post a "highlight title" message there and I will watch for it? You can mention the "steps to reproduce the (potential) problem":

Steps to reproduce:
1a. Verify that the chipset is 82855PM and note the rev (03 or 21).
1b. Boot or reboot.
2. Measure main memory bandwidth (prior to the 1st suspend after reboot).
3. Software suspend (or standby under Windows).
4. Wake (via the power button).
5. Remeasure main memory bandwidth and (potentially) notice a decrease in performance. Note that L1/L2 cache performance does not decrease.

You could also reference the (not) bug page:
http://bugzilla.kernel.org/show_bug.cgi?id=3918

Thanks.
No, I don't have a laptop with an 855PM. By "highlight" I just meant a title that gets people's attention :). I would like to add one item to your list: compare the 'lspci -xxx' output from before and after suspend/resume.
OK, since highlight is just getting people's attention, then I went ahead and attempted an email to acpi-devel@lists.sourceforge.net and included the request for lspci -xxx info.
Please use setpci to restore the original value after S3 resume, and re-test memory bandwidth.
>Please use setpci to restore the original value after S3 resume,

Changing (or even just reading some of) the host controller's config registers at runtime can have a severe impact. Please don't do it unless you really know what you are doing.
I just want to see the result of restoring the original refresh interval, which is {bit 8-15: 32 --> 37 [bit 8-11: 2 --> 7 (010: Refresh interval 7.8 µs --> 111: Reserved)]}
Putting register 7c back to the pre-suspend value solves the problem. I went ahead and wrote that whole register:

  xx=`setpci -d 8086:3340 7c.l`
  if [ "$xx" = 20403771 ]; then
      echo "20403771 is the value of register 7c that makes main mem slow"
      echo "main mem is faster with the reboot value of 30403271"
      #setpci -d 8086:3340 7c.l=20403771 # slow
      setpci -d 8086:3340 7c.l=30403271 # fast
  fi

(Confession: I was still thinking it was bits 4-6, then went over the bits again and finally see that it is bits 8-11, as it appears in "7c.l=30403271 # fast".)

Thanks for pushing me to try this. So this (non)bug is closed! Thanks to all.
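If one wanted to restore only the bits that were seen to change (bit 28 and bits 8-11) rather than overwrite the whole register, the new value could be computed as in the sketch below. The function name merge_drc is hypothetical, the mask is just my reading of this thread, and the earlier warning about touching host bridge registers at runtime still applies.

```shell
# Copy only the masked bits from a known-good value into the current value.
# $1 = current register value, $2 = known-good value (hex, no 0x prefix).
merge_drc() {
    cur=$((0x$1)); good=$((0x$2))
    mask=$(( (1 << 28) | (0xf << 8) ))    # bit 28 and bits 8-11
    printf '%08x\n' $(( (cur & ~mask) | (good & mask) ))
}

merge_drc 20403771 30403271
# On this machine the result could then be written back with something like:
#   setpci -d 8086:3340 7c.l=$(merge_drc "$(setpci -d 8086:3340 7c.l)" 30403271)
```

For the two values from this thread the merged result is 30403271, the same as the boot value, since those were the only bits that differed in register 7c.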
This is the expected result. What we wanted to know was whether it's a BIOS bug, and now we know it is. Yes, the OS can work around this issue, but I don't think we can make a generic solution: saving and restoring the host bridge's config space is dangerous. I would suggest you report this to Dell as a BIOS bug.
I will do so. Thanks
Ron, can you post your BIOS version number to this issue please? Thanks, Matt
Hi Matt, A07. I also reloaded/tried A03 through A06 and possibly A01 and A02 also. When I was trying the older revs, it was all the same day (several days ago) and then I went back to A07. I checked today and A07 is the latest. Thanks, Ron
Per the Dell BIOS team, this has already been fixed internally. New BIOSes that include the fix will be released on support.dell.com "soon" for each affected platform. The Inspiron 600m BIOS A15 will be the first release with this fixed in BIOS. Thanks, Matt