Hi, i use > uname -a Linux lade 3.9.0-rc4-iwlwifi #3 SMP Sun Mar 24 17:38:52 CET 2013 x86_64 GNU/Linux without any patches, and sometimes i get the error below. i have 2 access points, and i believe the problem is created while roaming between the access points. greetings [44524.547565] iwlwifi 0000:03:00.0: Current SW read_ptr 107 write_ptr 194 [44524.547601] iwl data: 00000000: f9 ff e1 ff ff ff ff ff 9c 03 20 e7 ff fb ff ff .......... ..... [44524.547621] iwlwifi 0000:03:00.0: FH TRBs(0) = 0x00000000 [44524.547641] iwlwifi 0000:03:00.0: FH TRBs(1) = 0x801020de [44524.547660] iwlwifi 0000:03:00.0: FH TRBs(2) = 0x00000000 [44524.547680] iwlwifi 0000:03:00.0: FH TRBs(3) = 0x803000d9 [44524.547700] iwlwifi 0000:03:00.0: FH TRBs(4) = 0x00000000 [44524.547719] iwlwifi 0000:03:00.0: FH TRBs(5) = 0x00000000 [44524.547739] iwlwifi 0000:03:00.0: FH TRBs(6) = 0x00000000 [44524.547759] iwlwifi 0000:03:00.0: FH TRBs(7) = 0x00709039 [44524.547830] iwlwifi 0000:03:00.0: Q 0 is active and mapped to fifo 3 ra_tid 0x0000 [218,218] [44524.547899] iwlwifi 0000:03:00.0: Q 1 is active and mapped to fifo 2 ra_tid 0x0000 [0,0] [44524.547968] iwlwifi 0000:03:00.0: Q 2 is active and mapped to fifo 1 ra_tid 0x0000 [223,223] [44524.548037] iwlwifi 0000:03:00.0: Q 3 is active and mapped to fifo 0 ra_tid 0x0000 [0,0] [44524.548106] iwlwifi 0000:03:00.0: Q 4 is active and mapped to fifo 0 ra_tid 0x0000 [0,0] [44524.548175] iwlwifi 0000:03:00.0: Q 5 is active and mapped to fifo 4 ra_tid 0x0000 [0,0] [44524.548243] iwlwifi 0000:03:00.0: Q 6 is active and mapped to fifo 2 ra_tid 0x0000 [0,0] [44524.548312] iwlwifi 0000:03:00.0: Q 7 is active and mapped to fifo 5 ra_tid 0x0000 [0,0] [44524.548381] iwlwifi 0000:03:00.0: Q 8 is active and mapped to fifo 4 ra_tid 0x0000 [0,0] [44524.548450] iwlwifi 0000:03:00.0: Q 9 is active and mapped to fifo 7 ra_tid 0x0000 [58,58] [44524.548550] iwlwifi 0000:03:00.0: Q 10 is active and mapped to fifo 5 ra_tid 0x0000 [0,0] [44524.548629] iwlwifi 0000:03:00.0: Q 11 is active and mapped to fifo 1 ra_tid 0x0000 [52,52] [44524.548698] iwlwifi 0000:03:00.0: Q 12 is active and mapped to fifo 1 ra_tid 0x0000 [107,194] [44524.548767] iwlwifi 0000:03:00.0: Q 13 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44524.548837] iwlwifi 0000:03:00.0: Q 14 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44524.548905] iwlwifi 0000:03:00.0: Q 15 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44524.548975] iwlwifi 0000:03:00.0: Q 16 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44524.549043] iwlwifi 0000:03:00.0: Q 17 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44524.549112] iwlwifi 0000:03:00.0: Q 18 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44524.549181] iwlwifi 0000:03:00.0: Q 19 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44526.548929] iwlwifi 0000:03:00.0: fail to flush all tx fifo queues Q 12 [44526.548940] iwlwifi 0000:03:00.0: Current SW read_ptr 107 write_ptr 194 [44526.548975] iwl data: 00000000: f9 ff e1 ff ff ff ff ff 9c 03 20 e7 ff fb ff ff .......... ..... [44526.548996] iwlwifi 0000:03:00.0: FH TRBs(0) = 0x00000000 [44526.549017] iwlwifi 0000:03:00.0: FH TRBs(1) = 0x801020de [44526.549036] iwlwifi 0000:03:00.0: FH TRBs(2) = 0x00000000 [44526.549056] iwlwifi 0000:03:00.0: FH TRBs(3) = 0x803000da [44526.549076] iwlwifi 0000:03:00.0: FH TRBs(4) = 0x00000000 [44526.549095] iwlwifi 0000:03:00.0: FH TRBs(5) = 0x00000000 [44526.549115] iwlwifi 0000:03:00.0: FH TRBs(6) = 0x00000000 [44526.549135] iwlwifi 0000:03:00.0: FH TRBs(7) = 0x00709041 [44526.549205] iwlwifi 0000:03:00.0: Q 0 is active and mapped to fifo 3 ra_tid 0x0000 [219,219] [44526.549274] iwlwifi 0000:03:00.0: Q 1 is active and mapped to fifo 2 ra_tid 0x0000 [0,0] [44526.549343] iwlwifi 0000:03:00.0: Q 2 is active and mapped to fifo 1 ra_tid 0x0000 [223,223] [44526.549412] iwlwifi 0000:03:00.0: Q 3 is active and mapped to fifo 0 ra_tid 0x0000 [0,0] [44526.549481] iwlwifi 0000:03:00.0: Q 4 is active and mapped to fifo 0 ra_tid 0x0000 [0,0] [44526.549550] iwlwifi 0000:03:00.0: Q 5 is active and mapped to fifo 4 ra_tid 0x0000 [0,0] [44526.549619] iwlwifi 0000:03:00.0: Q 6 is active and mapped to fifo 2 ra_tid 0x0000 [0,0] [44526.549688] iwlwifi 0000:03:00.0: Q 7 is active and mapped to fifo 5 ra_tid 0x0000 [0,0] [44526.549756] iwlwifi 0000:03:00.0: Q 8 is active and mapped to fifo 4 ra_tid 0x0000 [0,0] [44526.549826] iwlwifi 0000:03:00.0: Q 9 is active and mapped to fifo 7 ra_tid 0x0000 [66,66] [44526.549918] iwlwifi 0000:03:00.0: Q 10 is active and mapped to fifo 5 ra_tid 0x0000 [0,0] [44526.549992] iwlwifi 0000:03:00.0: Q 11 is active and mapped to fifo 1 ra_tid 0x0000 [52,52] [44526.550080] iwlwifi 0000:03:00.0: Q 12 is active and mapped to fifo 1 ra_tid 0x0000 [107,194] [44526.550149] iwlwifi 0000:03:00.0: Q 13 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44526.550220] iwlwifi 0000:03:00.0: Q 14 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44526.550293] iwlwifi 0000:03:00.0: Q 15 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44526.550362] iwlwifi 0000:03:00.0: Q 16 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44526.550431] iwlwifi 0000:03:00.0: Q 17 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44526.550500] iwlwifi 0000:03:00.0: Q 18 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44526.550569] iwlwifi 0000:03:00.0: Q 19 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44528.550438] iwlwifi 0000:03:00.0: fail to flush all tx fifo queues Q 12 [44528.550451] iwlwifi 0000:03:00.0: Current SW read_ptr 107 write_ptr 194 [44528.550487] iwl data: 00000000: f9 ff e1 ff ff ff ff ff 9c 03 20 e7 ff fb ff ff .......... ..... [44528.550509] iwlwifi 0000:03:00.0: FH TRBs(0) = 0x00000000 [44528.550529] iwlwifi 0000:03:00.0: FH TRBs(1) = 0x801020de [44528.550549] iwlwifi 0000:03:00.0: FH TRBs(2) = 0x00000000 [44528.550568] iwlwifi 0000:03:00.0: FH TRBs(3) = 0x803000db [44528.550588] iwlwifi 0000:03:00.0: FH TRBs(4) = 0x00000000 [44528.550608] iwlwifi 0000:03:00.0: FH TRBs(5) = 0x00000000 [44528.550628] iwlwifi 0000:03:00.0: FH TRBs(6) = 0x00000000 [44528.550648] iwlwifi 0000:03:00.0: FH TRBs(7) = 0x00709047 [44528.550718] iwlwifi 0000:03:00.0: Q 0 is active and mapped to fifo 3 ra_tid 0x0000 [220,220] [44528.550787] iwlwifi 0000:03:00.0: Q 1 is active and mapped to fifo 2 ra_tid 0x0000 [0,0] [44528.550856] iwlwifi 0000:03:00.0: Q 2 is active and mapped to fifo 1 ra_tid 0x0000 [223,223] [44528.550925] iwlwifi 0000:03:00.0: Q 3 is active and mapped to fifo 0 ra_tid 0x0000 [0,0] [44528.550994] iwlwifi 0000:03:00.0: Q 4 is active and mapped to fifo 0 ra_tid 0x0000 [0,0] [44528.551063] iwlwifi 0000:03:00.0: Q 5 is active and mapped to fifo 4 ra_tid 0x0000 [0,0] [44528.551132] iwlwifi 0000:03:00.0: Q 6 is active and mapped to fifo 2 ra_tid 0x0000 [0,0] [44528.551201] iwlwifi 0000:03:00.0: Q 7 is active and mapped to fifo 5 ra_tid 0x0000 [0,0] [44528.551269] iwlwifi 0000:03:00.0: Q 8 is active and mapped to fifo 4 ra_tid 0x0000 [0,0] [44528.551376] iwlwifi 0000:03:00.0: Q 9 is active and mapped to fifo 7 ra_tid 0x0000 [72,72] [44528.551459] iwlwifi 0000:03:00.0: Q 10 is active and mapped to fifo 5 ra_tid 0x0000 [0,0] [44528.551528] iwlwifi 0000:03:00.0: Q 11 is active and mapped to fifo 1 ra_tid 0x0000 [52,52] [44528.551597] iwlwifi 0000:03:00.0: Q 12 is active and mapped to fifo 1 ra_tid 0x0000 [107,194] [44528.551666] iwlwifi 0000:03:00.0: Q 13 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44528.551735] iwlwifi 0000:03:00.0: Q 14 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44528.551804] iwlwifi 0000:03:00.0: Q 15 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44528.551873] iwlwifi 0000:03:00.0: Q 16 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44528.551942] iwlwifi 0000:03:00.0: Q 17 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44528.552010] iwlwifi 0000:03:00.0: Q 18 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44528.552079] iwlwifi 0000:03:00.0: Q 19 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44530.551750] iwlwifi 0000:03:00.0: fail to flush all tx fifo queues Q 12 [44530.551761] iwlwifi 0000:03:00.0: Current SW read_ptr 107 write_ptr 194 [44530.551796] iwl data: 00000000: f9 ff e1 ff ff ff ff ff 9c 03 20 e7 ff fb ff ff .......... ..... [44530.551817] iwlwifi 0000:03:00.0: FH TRBs(0) = 0x00000000 [44530.551838] iwlwifi 0000:03:00.0: FH TRBs(1) = 0x801020de [44530.551857] iwlwifi 0000:03:00.0: FH TRBs(2) = 0x00000000 [44530.551878] iwlwifi 0000:03:00.0: FH TRBs(3) = 0x803000dc [44530.551897] iwlwifi 0000:03:00.0: FH TRBs(4) = 0x00000000 [44530.551917] iwlwifi 0000:03:00.0: FH TRBs(5) = 0x00000000 [44530.551937] iwlwifi 0000:03:00.0: FH TRBs(6) = 0x00000000 [44530.551956] iwlwifi 0000:03:00.0: FH TRBs(7) = 0x0070904f [44530.552027] iwlwifi 0000:03:00.0: Q 0 is active and mapped to fifo 3 ra_tid 0x0000 [221,221] [44530.552097] iwlwifi 0000:03:00.0: Q 1 is active and mapped to fifo 2 ra_tid 0x0000 [0,0] [44530.552166] iwlwifi 0000:03:00.0: Q 2 is active and mapped to fifo 1 ra_tid 0x0000 [223,223] [44530.552235] iwlwifi 0000:03:00.0: Q 3 is active and mapped to fifo 0 ra_tid 0x0000 [0,0] [44530.552303] iwlwifi 0000:03:00.0: Q 4 is active and mapped to fifo 0 ra_tid 0x0000 [0,0] [44530.552372] iwlwifi 0000:03:00.0: Q 5 is active and mapped to fifo 4 ra_tid 0x0000 [0,0] [44530.552441] iwlwifi 0000:03:00.0: Q 6 is active and mapped to fifo 2 ra_tid 0x0000 [0,0] [44530.552510] iwlwifi 0000:03:00.0: Q 7 is active and mapped to fifo 5 ra_tid 0x0000 [0,0] [44530.552579] iwlwifi 0000:03:00.0: Q 8 is active and mapped to fifo 4 ra_tid 0x0000 [0,0] [44530.552648] iwlwifi 0000:03:00.0: Q 9 is active and mapped to fifo 7 ra_tid 0x0000 [80,80] [44530.552733] iwlwifi 0000:03:00.0: Q 10 is active and mapped to fifo 5 ra_tid 0x0000 [0,0] [44530.552803] iwlwifi 0000:03:00.0: Q 11 is active and mapped to fifo 1 ra_tid 0x0000 [52,52] [44530.552882] iwlwifi 0000:03:00.0: Q 12 is active and mapped to fifo 1 ra_tid 0x0000 [107,194] [44530.552955] iwlwifi 0000:03:00.0: Q 13 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44530.553030] iwlwifi 0000:03:00.0: Q 14 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44530.553106] iwlwifi 0000:03:00.0: Q 15 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44530.553182] iwlwifi 0000:03:00.0: Q 16 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44530.553259] iwlwifi 0000:03:00.0: Q 17 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44530.553335] iwlwifi 0000:03:00.0: Q 18 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44530.553407] iwlwifi 0000:03:00.0: Q 19 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44532.553208] iwlwifi 0000:03:00.0: fail to flush all tx fifo queues Q 12 [44532.553219] iwlwifi 0000:03:00.0: Current SW read_ptr 107 write_ptr 194 [44532.553255] iwl data: 00000000: f9 ff e1 ff ff ff ff ff 9c 03 20 e7 ff fb ff ff .......... ..... [44532.553276] iwlwifi 0000:03:00.0: FH TRBs(0) = 0x00000000 [44532.553296] iwlwifi 0000:03:00.0: FH TRBs(1) = 0x801020de [44532.553316] iwlwifi 0000:03:00.0: FH TRBs(2) = 0x00000000 [44532.553336] iwlwifi 0000:03:00.0: FH TRBs(3) = 0x803000dd [44532.553356] iwlwifi 0000:03:00.0: FH TRBs(4) = 0x00000000 [44532.553376] iwlwifi 0000:03:00.0: FH TRBs(5) = 0x00000000 [44532.553395] iwlwifi 0000:03:00.0: FH TRBs(6) = 0x00000000 [44532.553415] iwlwifi 0000:03:00.0: FH TRBs(7) = 0x00709056 [44532.553485] iwlwifi 0000:03:00.0: Q 0 is active and mapped to fifo 3 ra_tid 0x0000 [222,222] [44532.553555] iwlwifi 0000:03:00.0: Q 1 is active and mapped to fifo 2 ra_tid 0x0000 [0,0] [44532.553626] iwlwifi 0000:03:00.0: Q 2 is active and mapped to fifo 1 ra_tid 0x0000 [223,223] [44532.553698] iwlwifi 0000:03:00.0: Q 3 is active and mapped to fifo 0 ra_tid 0x0000 [0,0] [44532.553771] iwlwifi 0000:03:00.0: Q 4 is active and mapped to fifo 0 ra_tid 0x0000 [0,0] [44532.553843] iwlwifi 0000:03:00.0: Q 5 is active and mapped to fifo 4 ra_tid 0x0000 [0,0] [44532.553914] iwlwifi 0000:03:00.0: Q 6 is active and mapped to fifo 2 ra_tid 0x0000 [0,0] [44532.553986] iwlwifi 0000:03:00.0: Q 7 is active and mapped to fifo 5 ra_tid 0x0000 [0,0] [44532.554059] iwlwifi 0000:03:00.0: Q 8 is active and mapped to fifo 4 ra_tid 0x0000 [0,0] [44532.554159] iwlwifi 0000:03:00.0: Q 9 is active and mapped to fifo 7 ra_tid 0x0000 [87,87] [44532.554249] iwlwifi 0000:03:00.0: Q 10 is active and mapped to fifo 5 ra_tid 0x0000 [0,0] [44532.554326] iwlwifi 0000:03:00.0: Q 11 is active and mapped to fifo 1 ra_tid 0x0000 [52,52] [44532.554397] iwlwifi 0000:03:00.0: Q 12 is active and mapped to fifo 1 ra_tid 0x0000 [107,194] [44532.554478] iwlwifi 0000:03:00.0: Q 13 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44532.554548] iwlwifi 0000:03:00.0: Q 14 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44532.554617] iwlwifi 0000:03:00.0: Q 15 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44532.554686] iwlwifi 0000:03:00.0: Q 16 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44532.554755] iwlwifi 0000:03:00.0: Q 17 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44532.554824] iwlwifi 0000:03:00.0: Q 18 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44532.554893] iwlwifi 0000:03:00.0: Q 19 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44538.659314] iwlwifi 0000:03:00.0: fail to flush all tx fifo queues Q 12 [44538.659326] iwlwifi 0000:03:00.0: Current SW read_ptr 107 write_ptr 194 [44538.659360] iwl data: 00000000: f9 ff e1 ff ff ff ff ff 9c 03 20 e7 ff fb ff ff .......... ..... [44538.659381] iwlwifi 0000:03:00.0: FH TRBs(0) = 0x00000000 [44538.659401] iwlwifi 0000:03:00.0: FH TRBs(1) = 0x801020de [44538.659421] iwlwifi 0000:03:00.0: FH TRBs(2) = 0x00000000 [44538.659441] iwlwifi 0000:03:00.0: FH TRBs(3) = 0x803000de [44538.659460] iwlwifi 0000:03:00.0: FH TRBs(4) = 0x00000000 [44538.659480] iwlwifi 0000:03:00.0: FH TRBs(5) = 0x00000000 [44538.659499] iwlwifi 0000:03:00.0: FH TRBs(6) = 0x00000000 [44538.659519] iwlwifi 0000:03:00.0: FH TRBs(7) = 0x00709060 [44538.659589] iwlwifi 0000:03:00.0: Q 0 is active and mapped to fifo 3 ra_tid 0x0000 [223,223] [44538.659659] iwlwifi 0000:03:00.0: Q 1 is active and mapped to fifo 2 ra_tid 0x0000 [0,0] [44538.659729] iwlwifi 0000:03:00.0: Q 2 is active and mapped to fifo 1 ra_tid 0x0000 [223,223] [44538.659798] iwlwifi 0000:03:00.0: Q 3 is active and mapped to fifo 0 ra_tid 0x0000 [0,0] [44538.659867] iwlwifi 0000:03:00.0: Q 4 is active and mapped to fifo 0 ra_tid 0x0000 [0,0] [44538.659936] iwlwifi 0000:03:00.0: Q 5 is active and mapped to fifo 4 ra_tid 0x0000 [0,0] [44538.660004] iwlwifi 0000:03:00.0: Q 6 is active and mapped to fifo 2 ra_tid 0x0000 [0,0] [44538.660095] iwlwifi 0000:03:00.0: Q 7 is active and mapped to fifo 5 ra_tid 0x0000 [0,0] [44538.660165] iwlwifi 0000:03:00.0: Q 8 is active and mapped to fifo 4 ra_tid 0x0000 [0,0] [44538.660264] iwlwifi 0000:03:00.0: Q 9 is active and mapped to fifo 7 ra_tid 0x0000 [97,97] [44538.660343] iwlwifi 0000:03:00.0: Q 10 is active and mapped to fifo 5 ra_tid 0x0000 [0,0] [44538.660418] iwlwifi 0000:03:00.0: Q 11 is active and mapped to fifo 1 ra_tid 0x0000 [52,52] [44538.660495] iwlwifi 0000:03:00.0: Q 12 is active and mapped to fifo 1 ra_tid 0x0000 [107,194] [44538.660571] iwlwifi 0000:03:00.0: Q 13 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44538.660643] iwlwifi 0000:03:00.0: Q 14 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44538.660712] iwlwifi 0000:03:00.0: Q 15 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44538.660781] iwlwifi 0000:03:00.0: Q 16 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44538.660849] iwlwifi 0000:03:00.0: Q 17 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44538.660918] iwlwifi 0000:03:00.0: Q 18 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [44538.660987] iwlwifi 0000:03:00.0: Q 19 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
*** Bug 60658 has been marked as a duplicate of this bug. ***
*** Bug 69051 has been marked as a duplicate of this bug. ***
*** Bug 43259 has been marked as a duplicate of this bug. ***
*** Bug 42686 has been marked as a duplicate of this bug. ***
*** Bug 43184 has been marked as a duplicate of this bug. ***
*** Bug 49631 has been marked as a duplicate of this bug. ***
*** Bug 43186 has been marked as a duplicate of this bug. ***
confirmed to be still present in 3.13.1. It seems to me, that the bug only occurs during high constant loads of the interface. If state is resumed from hibernate it occurs even without much bandwidth usage. Network controller: Intel Corporation Centrino Ultimate-N 6300 (rev 35) Subsystem: Intel Corporation Centrino Ultimate-N 6300 3x3 AG
TX AMPDU are disabled by default in 3.14. This workaround should help.
The consequences of disabling this feature leads to huge slow down issues (ping response times of 5-8s instead of 30ms). They occur periodically nearly at same time (guess), like in the fifo tx queue flushing bug. Testing on 3.14rc7
Sorry for the spam: the bug is still present in 3.14rc7. Didn't check dmesg output. So the proposed workaround does not work in my case. Is there a way to check, if TX AMPDU is really disabled?
can you share your dmesg?
Created attachment 130131 [details] dmesg
Hello, I confirm that the issue is reproduced on vanilla 3.14.0 kernel. Attaching dmesg, lspci and config.
Created attachment 131241 [details] Logs from vanilla 3.14.0 kernel
*** Bug 48921 has been marked as a duplicate of this bug. ***
Created attachment 131531 [details] get more logs this patch allows to get more logs. Please apply and send the logs. note that this patch adds tons of logs - and I will need the syslog to get all the history. Thanks.
Created attachment 131541 [details] dmesg with debug enhanced logs
please send the syslog too. (as written above). Thanks
Created attachment 131561 [details] dmesg buffer syslogged
Created attachment 131611 [details] add more slightly debug Please replace the previous patch by this one and send again your syslog. Unfortunely, from what I am seeing, I don't see any driver bugs, but the FW is really getting stuck for a long time. It does recover long later though.
Created attachment 132101 [details] Reproducement with new patch by other user. (In reply to Emmanuel Grumbach from comment #21) > Created attachment 131611 [details] > add more slightly debug > > Please replace the previous patch by this one and send again your syslog. > Unfortunely, from what I am seeing, I don't see any driver bugs, but the FW > is really getting stuck for a long time. It does recover long later though. Hello Can someone reopen this bug or let me know if I need to file a new one? Similar issues to the ones described in this bug I have been experiencing on my system; it is also concerning the "fail to flush all tx fifo queues Q 2" output, which you can be found in the middle of the attachment with 1000 lines of context. This attachment is with the patch of the previous comment included. This Gentoo runs a 3.14.0-rc5 Linux kernel and linux-firmware 20131230, the NIC: 03:00.0 Network controller [0280]: Intel Corporation WiFi Link 5100 [8086:4232] Around the moment that this happens, I experience huge latency and packet loss; here for example I have packet loss (10.0.0.1 is router, 8.8.8.8 is internet): PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data. 64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=1558 ms --- 10.0.0.1 ping statistics --- 4 packets transmitted, 1 received, 75% packet loss, time 3001ms rtt min/avg/max/mdev = 1558.746/1558.746/1558.746/0.000 ms, pipe 2 In another case I for example have a moment where it recovers: PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data. 64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=2979 ms 64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=2265 ms 64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=2005 ms 64 bytes from 10.0.0.1: icmp_seq=4 ttl=64 time=1543 ms --- 10.0.0.1 ping statistics --- 4 packets transmitted, 4 received, 0% packet loss, time 3002ms rtt min/avg/max/mdev = 1543.949/2198.716/2979.334/519.544 ms, pipe 3 PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data. 64 bytes from 8.8.8.8: icmp_seq=1 ttl=47 time=553 ms 64 bytes from 8.8.8.8: icmp_seq=2 ttl=47 time=683 ms 64 bytes from 8.8.8.8: icmp_seq=3 ttl=47 time=77.6 ms 64 bytes from 8.8.8.8: icmp_seq=4 ttl=47 time=78.8 ms --- 8.8.8.8 ping statistics --- 4 packets transmitted, 4 received, 0% packet loss, time 3008ms rtt min/avg/max/mdev = 77.680/348.382/683.868/274.029 ms But this recovered state won't last very long. After these symptoms; I need to play a bit with rmmod and modprobe, reconnecting or even rebooting as well as wait in a hope to get it to fully recover. If you need more information, let me know; if I discover more, I'll comment. TIA
(In reply to Tom Wijsman from comment #22) > Created attachment 132101 [details] > Reproducement with new patch by other user. > > (In reply to Emmanuel Grumbach from comment #21) > > Created attachment 131611 [details] > > add more slightly debug > > > > Please replace the previous patch by this one and send again your syslog. > > Unfortunely, from what I am seeing, I don't see any driver bugs, but the FW > > is really getting stuck for a long time. It does recover long later though. > > Hello > > Can someone reopen this bug or let me know if I need to file a new one? > You are suffering from the exact same problems as described in this bug. You don't need to re-open anything - this bug is open. From what I see from your logs - this is the exact same behavior as the other users. The firmware takes a very long time to transmit the packets. Can you please enable debug logs with the patch applied: modprobe iwlwifi debug=0xC080000 I wonder if we are stuck in high rates while the link is bad. This can increase significantly the latency.
(In reply to Emmanuel Grumbach from comment #23) > From what I see from your logs - this is the exact same behavior as the > other users. The firmware takes a very long time to transmit the packets. > Can you please enable debug logs with the patch applied: > > modprobe iwlwifi debug=0xC080000 Will do; I only experience this at certain hours of the day due to high rates happening then, but I can look into setting up a test case with Raspberry Pi. > I wonder if we are stuck in high rates while the link is bad. This can > increase significantly the latency. Yes, I discovered that too, it happens when there is high rate Ethernet traffic. Let's say I have two Ethernet connected computers and my laptop that is on WLAN; my laptop on WLAN gets affected when the two computers transfer, eg. a file of 1 GB, this problem then continues for a bit after the file is done transferring. My router is a Linksys WRT610N running DD-WRT v24-sp2 (04/07/12) std-usb-nas - build 18946M NEWD-2; if needed, I can try to get that back to original firmware. But before blaming the router, as it should work fine; I think other WLAN clients here are unaffected, I'll try to confirm that by having someone else test with a different NIC here on the network test it. An early idea I get is that iwlwifi doesn't deal well with the high rates compared to other cards; I'll try to get more debug info, as well as try the rest I described here to see if I can get further confirmations and details. This has worked fine in the past (let's say, a year ago), but somewhere in the last months this has regressed; I can't pinpoint when exactly and why, I'm also not sure if I can bisect it somehow unless I get to a clear isolated test case. Thank you for the debug option and patch, set it and patched again; reboot. :)
Created attachment 133971 [details] Reproducement with new patch and debug=0xC080000 by other user. (In reply to Emmanuel Grumbach from comment #23) > Can you please enable debug logs with the patch applied: > > modprobe iwlwifi debug=0xC080000 > > I wonder if we are stuck in high rates while the link is bad. This can > increase significantly the latency. With that debug option, I've managed to capture it happen again now; this log with 2000 lines context captured it twice as well as disconnect(s) that occurred. The "fail to flush all tx fifo queues Q 2" occurs at 00:44 and 00:49 in the log. An Ethernet connected laptop is streaming from a Raspberry Pi over Ethernet (= cause, unaffected), another WiFi connected laptop can game fine (= unaffected); I'm on my WiFi connected laptop experiencing this while I am browsing. Thank you very much in advance for looking into this.
ouch - I meant 0xc0800000 (I missed a trailing 0)... sorry...
Created attachment 134591 [details] Reproducement with new patch and debug=0xc0800000 by other user. This time with new debug parameter, it happened twice close to a disconnect; I sometimes get random disconnects, unsure if it is caused by the same cause. 2000 lines of context; due to disconnect, NetworkManager / wpa_supplicant log is present too; "fail to flush all tx fifo queues Q 2" happens at 00:41 and 00:46. The situation is the same as before. Thank you in advance.
Thank you Tom, this was really helpful. What I see here is that each packet is re-transmitted a lot times. This certainly increases the latency. This points to the rate scaling algorithm. I'll try to see what can be done. Thanks again.
Created attachment 134601 [details] expire packets after 50ms Please retry with the patch attached. This will throw away packets that are too old. I expect packet loss, but the firmware should be more responsive. Let me know what happen with this.
Doesn't compile, but I suppose that -tx_cmd->stop_time.life_time = iwl_read32(CSR_UCODE_DRV_GP2) + 50 * 1024; +tx_cmd->stop_time.life_time = iwl_read32(priv->trans, CSR_UCODE_DRV_GP2) + 50 * 1024; is what you intended; will try and see how it goes, let's wait it out... :)
(In reply to Tom Wijsman from comment #30) > Doesn't compile, but I suppose that > > -tx_cmd->stop_time.life_time = iwl_read32(CSR_UCODE_DRV_GP2) + 50 * 1024; > +tx_cmd->stop_time.life_time = iwl_read32(priv->trans, CSR_UCODE_DRV_GP2) + > 50 * 1024; > > is what you intended; will try and see how it goes, let's wait it out... :) Yep - sorry, I couldn't really compile when I wrote the change ;)
(In reply to Emmanuel Grumbach from comment #29) > Let me know what happen with this. Well, it appears I can't even connect with this; not with 50, not with 500 and not even with 5000. With the patch, I get: iwlwifi 0000:03:00.0: I iwlagn_rx_reply_tx TXQ 0 status LIFE_EXPIRE (0x00000087) Without the patch, I get: iwlwifi 0000:03:00.0: I iwlagn_rx_reply_tx TXQ 0 status SUCCESS (0x00000201)
Argh, just one more worked out; so, this one works: iwl_read32(priv->trans, CSR_UCODE_DRV_GP2) + 50000 * 1024; Are you perhaps off by another 1024? Do I need to try 50 * 1024 * 1024? I've also noticed that while I can now connect, I get disconnected later; I guess that once you give me a range, I need to try out and find a stable value?
Created attachment 134631 [details] dmesg output switch from repeating SUCCESS to repeating LIFE_EXPIRE with 50000 * 1024 Attached you can find dmesg output switching from a repeating SUCCESS to a repeating LIFE_EXPIRE with 50000 * 1024; at that point, pings to my router stop arriving after which after a timeout I get disconnected.
(In reply to Tom Wijsman from comment #33) > Do I need to try 50 * 1024 * 1024? 50 * 1024 * 1024: Disconnects (LIFE_EXPIRE) after a few pings. 100 * 1024 * 1024: Get to YT video, set it Full HD, disconnects (LIFE_EXPIRE). 200 * 1024 * 1024: Watch a YT video, wait, disconnect. From a first impression it controls how long it takes until it disconnects? :(
It control how much to wait until the packet is discarded. What I see here is that the rate scaling algorithm scaled down (intial rate = 0x8100) - you are using a very low rate because of the bad link, and even then, there are a lot of retransmissions. I guess this is a PHY / RF problem and these are tricky as you can guess... One last thing I noticed is that we don't seem to go down to CCK rates, are you in 2.4GHz or 5.2GHz?
Created attachment 134681 [details] DD-WRT Basic/Security/Advanced settings + iw link/station/scan and wavemon output (In reply to Emmanuel Grumbach from comment #36) > One last thing I noticed is that we don't seem to go down to CCK rates, are > you in 2.4GHz or 5.2GHz? 2.4 GHz Attached is are the DD-WRT router's wireless settings, as well as iw dump output; this is a lot of information, but it might contain something valuable. Given that Basic Rate says "The default setting will ensure maximum compatibility with all devices. You may also choose to enable all data rates by selecting ALL." I'm going to try to change that from ALL to Default to see if that gives an improvement; perhaps it is incompatible with that, I'll report back if this is the case. This information was captured with the infinite life_time put back and no traffic on Ethernet this morning; given that the given value doesn't work, and raising it a lot just seems to stall a disconnect I'll stay back on infinite life_time until we've come down to a better value or idea.
can you paste the output of: cat /sys/kernel/debug/ieee80211/phy0/netdev\:wlan0/stations/*/rate_scale_table you can then, echo 0x820a into that same file and see what happens. This will force to move to a fixed low rate. You seem to be stuck in OFDM rates while you should downgrading to CCK rates - PHY / RF issues. In any case, please remove the lifetime settings, but you already did that it seems.
# ls /sys/kernel/debug/ieee80211/phy0 fragmentation_threshold ht40allow_map long_retry_limit rts_threshold short_retry_limit Hmm, no netdev\:* entries in there; do I need to enable some CONFIG option?
CONFIG_MAC80211_DEBUGFS and CONFIG_IWLWIFI_DEBUGFS
With the Default setting (Comment #37), I still experience high pings; I'm not sure if I can reproduce the failure right now, we'll have to wait that out. Here is the output you've requested: sta_id 0 failed=121 success=121 rate=0FFF fixed rate 0x0 valid_tx_ant ANT_B, lq type HT SISO 20MHz SGI last tx rate=0xA103 general:flags=0x0 mimo-d=0 s-ant0x2 d-ant=0x3 agg:time_limit=4000 dist_start_th=3 frame_cnt_limit=63 Start idx [0]=0x0 [1]=0x0 [2]=0x0 [3]=0x0 rate[0] 0xA104 36mbps (16QAM 3/4) rate[1] 0xA104 36mbps (16QAM 3/4) rate[2] 0xA103 24mbps (16QAM 1/2) rate[3] 0xA103 24mbps (16QAM 1/2) rate[4] 0xA103 24mbps (16QAM 1/2) rate[5] 0x8007 18mbps (QPSK 3/4) rate[6] 0x8005 12mbps (QPSK 1/2) rate[7] 0x826E 11mbps (QPSK CCK) rate[8] 0x800F 9mbps (BPSK 1/2) rate[9] 0x800D 6mbps (BPSK 1/2) rate[10] 0x8237 5.5mbps (BPSK CCK) rate[11] 0x8214 2mbps (QPSK DSSS) rate[12] 0x820A 1mbps (BPSK DSSS) rate[13] 0x820A 1mbps (BPSK DSSS) rate[14] 0x820A 1mbps (BPSK DSSS) rate[15] 0x820A 1mbps (BPSK DSSS) After the echo, I see that all the rates repeat the same: sta_id 0 failed=0 success=0 rate=0FFF fixed rate 0x820A valid_tx_ant ANT_B, lq type HT SISO 20MHz SGI last tx rate=0xA100 general:flags=0x0 mimo-d=0 s-ant0x2 d-ant=0x3 agg:time_limit=4000 dist_start_th=3 frame_cnt_limit=63 Start idx [0]=0x0 [1]=0x0 [2]=0x0 [3]=0x0 rate[0] 0x820A 1mbps (BPSK DSSS) ... with the rest repeating the same up to rate[15] ... It doesn't help with the pings, they still jump into seconds like before: 64 bytes from 10.0.0.1: icmp_seq=176 ttl=64 time=2.81 ms 64 bytes from 10.0.0.1: icmp_seq=177 ttl=64 time=5.65 ms 64 bytes from 10.0.0.1: icmp_seq=178 ttl=64 time=4.64 ms 64 bytes from 10.0.0.1: icmp_seq=179 ttl=64 time=1060 ms 64 bytes from 10.0.0.1: icmp_seq=180 ttl=64 time=1157 ms 64 bytes from 10.0.0.1: icmp_seq=181 ttl=64 time=1139 ms 64 bytes from 10.0.0.1: icmp_seq=182 ttl=64 time=487 ms 64 bytes from 10.0.0.1: icmp_seq=183 ttl=64 time=888 ms 64 bytes from 10.0.0.1: icmp_seq=184 ttl=64 time=433 ms 64 bytes from 10.0.0.1: icmp_seq=185 ttl=64 time=9.54 ms 64 bytes from 10.0.0.1: icmp_seq=186 ttl=64 time=32.4 ms This is during the network situation; it just doesn't trigger a failure at the moment, it's not always reproducible. I'll wait to see if I'll get a reproducement with Default setting (test #1); when it does, it isn't that setting and I'll then try to see what the echo does while the situation is occurring (test #2). If it turns out the Default setting has fixed this up, I can later try again with the All setting; just to reproduce and provide more information as to why the All setting would break it in that case. Stay tuned... PS: Without the echo, it seems like it stays (when idle) in the higher rates, where it only very infrequently goes to the lower rates; I'm not sure how often it's supposed to go there, if needed I can try to capture a video in the future.
What I fear is that the rate control doesn't downgrade to low CCK rates, where these might help you when you have a bad link. If the rate control doesn't downgrade, you'll have tons of re-transmissions, and this can cause big delays. The echo command fixes the rate to a low rate. This is a debug step, just to see if it helps. I came to the conclusion that this might help since you seemed to have a lot of re-transmissions and you seemed stuck in OFDM rates (higher than CCK). To go back to a floating rate, you need to unload a reload the module.
When setting rates statically to constant 1mbps, I still experience high response times and failing tx queue flushes.
What I further think is that the disconnects while idle have to do with the rate being too low for the high-rate-stuck iwlwifi to handle; therefore, I think it would timeout due to this mismatch. Whereas making my internet traffic a little less idle it manages to stay connected; eg., opening a few internet related apps, playing a game, ... That observation put aside; I notice some interesting commits in iwlwifi-next: iwlwifi: mvm: rs: revert changes to search cycle rules iwlwifi: mvm: rs: overhaul rs_get_best_rate iwlwifi: mvm: rs: don't allow TPC when power save is disabled iwlwifi: mvm: rs: avoid TPC while in search cycle iwlwifi: mvm: rs: choose correct expected tpt table ^ These are directly related to selecting the rate. iwlwifi: mvm: implement mac80211's flush callback ^ "This allows mac80211 to flush the Tx queues before it sends critical management frames." says enough and seems quite relevant to this bug. Are there other commits there that are of interest testing? Should and can I just cherrypick these 6 on top of my 3.15-rc4? Which kernel release will they go in?
(In reply to Martin Scherer from comment #43) > When setting rates statically to constant 1mbps, I still experience high > response times and failing tx queue flushes. Can I see the output of debug=0xC0800000 I'd like the see the number of retries - thanks
(In reply to Tom Wijsman from comment #44) > > iwlwifi: mvm: rs: revert changes to search cycle rules > iwlwifi: mvm: rs: overhaul rs_get_best_rate > iwlwifi: mvm: rs: don't allow TPC when power save is disabled > iwlwifi: mvm: rs: avoid TPC while in search cycle > iwlwifi: mvm: rs: choose correct expected tpt table > > ^ These are directly related to selecting the rate. > > iwlwifi: mvm: implement mac80211's flush callback > > ^ "This allows mac80211 to flush the Tx queues before it sends critical > management frames." says enough and seems quite relevant to this bug. > > Are there other commits there that are of interest testing? Should and can I > just cherrypick these 6 on top of my 3.15-rc4? Which kernel release will > they go in? These aren't related to your device - they are related to newer devices.
Just want to make sure if this ever gets fixed, that the fix includes all affected cards. My lspci reports: 02:00.0 Network controller: Intel Corporation Centrino Wireless-N 1000 [Condor Peak] Subsystem: Intel Corporation Centrino Wireless-N 1000 BGN The info from other reports seems to match for me (my details below), although I have a very basic situation: one router / access point. The problem frequently occurs during bandwidth-intensive video streaming, despite a very high available bandwidth and a (mostly) strong ap signal. However, there are literally dozens of other ap signals available, but I've never connected with any of them. Not sure if they are "jamming" my frequency, but I have tried every available channel for my own connection (most are saturated). Here are my details: uname -a: Linux 3.14.3-200.fc20.x86_64 #1 SMP Tue May 6 19:00:18 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux dmesg snippet: [ 368.768448] iwlwifi 0000:02:00.0: fail to flush all tx fifo queues Q 2 [ 368.768463] iwlwifi 0000:02:00.0: Current SW read_ptr 72 write_ptr 223 [ 368.768518] iwl data: 00000000: 00 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 368.768561] iwlwifi 0000:02:00.0: FH TRBs(0) = 0x00000000 [ 368.768603] iwlwifi 0000:02:00.0: FH TRBs(1) = 0x80102057 [ 368.768643] iwlwifi 0000:02:00.0: FH TRBs(2) = 0x00000000 [ 368.768682] iwlwifi 0000:02:00.0: FH TRBs(3) = 0x80300012 [ 368.768726] iwlwifi 0000:02:00.0: FH TRBs(4) = 0x00000000 [ 368.768764] iwlwifi 0000:02:00.0: FH TRBs(5) = 0x00000000 [ 368.768803] iwlwifi 0000:02:00.0: FH TRBs(6) = 0x00000000 [ 368.768842] iwlwifi 0000:02:00.0: FH TRBs(7) = 0x00704061 [ 368.768930] iwlwifi 0000:02:00.0: Q 0 is active and mapped to fifo 3 ra_tid 0x0000 [19,19] [ 368.769016] iwlwifi 0000:02:00.0: Q 1 is active and mapped to fifo 2 ra_tid 0x0000 [0,0] [ 368.769103] iwlwifi 0000:02:00.0: Q 2 is active and mapped to fifo 1 ra_tid 0x0000 [72,223] [ 368.769189] iwlwifi 0000:02:00.0: Q 3 is active and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 368.769333] iwlwifi 0000:02:00.0: Q 4 is active and mapped to fifo 7 ra_tid 0x0000 [98,98] [ 368.769480] iwlwifi 0000:02:00.0: Q 5 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 368.769587] iwlwifi 0000:02:00.0: Q 6 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 368.769674] iwlwifi 0000:02:00.0: Q 7 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 368.769761] iwlwifi 0000:02:00.0: Q 8 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 368.769848] iwlwifi 0000:02:00.0: Q 9 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 368.769964] iwlwifi 0000:02:00.0: Q 10 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 368.770062] iwlwifi 0000:02:00.0: Q 11 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 368.770151] iwlwifi 0000:02:00.0: Q 12 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 368.770238] iwlwifi 0000:02:00.0: Q 13 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 368.770325] iwlwifi 0000:02:00.0: Q 14 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 368.770455] iwlwifi 0000:02:00.0: Q 15 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 368.770550] iwlwifi 0000:02:00.0: Q 16 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 368.770636] iwlwifi 0000:02:00.0: Q 17 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 368.770722] iwlwifi 0000:02:00.0: Q 18 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 368.770808] iwlwifi 0000:02:00.0: Q 19 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
(In reply to kernel from comment #47) > Very high available bandwidth and a (mostly) strong ap signal. Same here, I sit ~2 meters or so from the router while this happens. > However, there are literally dozens of other ap signals available, but I've > never connected with any of them. Not sure if they are "jamming" my > frequency, but I have tried every available channel for my own connection > (most are saturated). Same here, I count ~16 APs at the moment; divided over all 2.4GHz frequencies. Regarding strength, my AP and ~2 others are at 3 of the 4 bars. There are ~6 APs at 2 of the 4 bars, with another ~6 APs at 1 of the 4 bars.
FWIW I'm seeing the "Q xx is inactive" messages also when coming back from standby on my Dell XPS13 (lspci: 01:00.0 Network controller: Intel Corporation Centrino Advanced-N 6235 (rev 24)). And they also inhibit proper connecting after standby. I don't know if that helps, but if I can add anything to this thread I'd be happy to contribute. Kernel: 3.14.3, fw: 18.168.6.1.
Emmanuel. I am having the same issue, but as an extra comment I have noticed this issue on networks that has 802.11h enabled. For example Ruckus Wireless APs with a feature called "ChannelFly" uses 802.11h to change the channel.
Thanks for all the reports. Can someone test this: diff --git a/drivers/net/wireless/iwlwifi/dvm/power.c b/drivers/net/wireless/iwlwifi/dvm/power.c index f2c1439..e5a6b70 100644 --- a/drivers/net/wireless/iwlwifi/dvm/power.c +++ b/drivers/net/wireless/iwlwifi/dvm/power.c @@ -288,6 +288,9 @@ static void iwl_power_build_cmd(struct iwl_priv *priv, bool enabled = priv->hw->conf.flags & IEEE80211_CONF_PS; int dtimper; + iwl_power_sleep_cam_cmd(priv, cmd); + return; + dtimper = priv->hw->conf.ps_dtim_period ?: 1; if (priv->wowlan)
Created attachment 137661 [details] disable power save The same as above, but in a decent patch format
On 05/29/2014 07:08 AM, bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=56581 > > --- Comment #52 from Emmanuel Grumbach <emmanuel.grumbach@intel.com> --- > Created attachment 137661 [details] > --> https://bugzilla.kernel.org/attachment.cgi?id=137661&action=edit > disable power save > > The same as above, but in a decent patch format > I've noticed, that since I switched the AP network from mixed b/g/n to b/g the bug disappeared at all. The connection strengths is same like before, because the AP has not been moved and the laptop is also in same position. Maybe it disappeared, because high rates of N-mode are disabled, but this can be excluded since we have tested fixed rates of 1mbps. And power saving should not be related to disabling AP mixed n-mode neither, right? Hope this gains a few new hints. Best, Martin
(In reply to Emmanuel Grumbach from comment #51) > + iwl_power_sleep_cam_cmd(priv, cmd); > + return; Reproduced: Increased pings, slower network and disconnect happened again with similar traffic in the earlier described network environment; didn't get "fail to flush Tx queue", but that's just like before that I don't always get that. Extra debugging information: My router shows a high number of "Transmitted (TX) errors"; from the last time I have around 21000 such errors over a 2 hours span. Thought: An internet search about this reveals that this could have to do something with "Sensitivity Range (ACK Timing)" so I'll be increasing that value with each incident to see if this can help or not. This reminds me that I still need to set up some kind of test scenario; like, have a Raspberry Pi flood data packets to and/or from my router. (In reply to Martin Scherer from comment #53) > I've noticed, that since I switched the AP network from mixed b/g/n to > b/g the bug disappeared at all. Well, it could be that; but I'm not willing to downgrade the experience of the network or my client here, I do have seen tricks like the module parameter 11n_disable=1 (disabling N from the client instead of the router) reported to help with cases like these, so that is a way to look out for. When I experience trouble again, I might try this parameter and compare results. > And power saving should not be related to disabling AP mixed n-mode > neither, right? I perceive it to help a small bit, in terms of disconnects I sometimes receive; but it doesn't seem like a resolution to "fail to flush Tx queue" or slowdowns.
@Martin: can you please run with debug=0x00001000 ? I'd like to see the working scenario (without 11N rates) and the failing scenario (with 11N rates). Thanks
On 05/31/2014 09:46 PM, bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=56581 > > --- Comment #55 from Emmanuel Grumbach <emmanuel.grumbach@intel.com> --- > @Martin: can you please run with debug=0x00001000 ? > I'd like to see the working scenario (without 11N rates) and the failing > scenario (with 11N rates). > > Thanks > Unfortunately there are still queue drops with disabled 11n, but it doesn't lead to the same unreliable behaviour like with 11n. I will provide some logs nevertheless.
The 11n comments reminded me of a bug that's been plaguing Linux systems for years. Both of these reports (linked below) are made via Ubuntu but they both seem to be traced back to the kernel. The specific logging messages have changed over the years, but the bug seems to be very much the same! (I'm sure more googling would turn up even more...) https://bugs.launchpad.net/ubuntu/+source/linux/+bug/836250 https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1201470
Can someone provide the info I requested in comment 55? I think that the problem might come from the protection modes. Or, can someone try this? diff --git a/drivers/net/wireless/iwlwifi/dvm/rxon.c b/drivers/net/wireless/iwlwifi/dvm/rxon.c index 15d6881..8be72ac 100644 --- a/drivers/net/wireless/iwlwifi/dvm/rxon.c +++ b/drivers/net/wireless/iwlwifi/dvm/rxon.c @@ -1072,8 +1072,8 @@ int iwlagn_commit_rxon(struct iwl_priv *priv, struct iwl_rxon_context *ctx) * force CTS-to-self frames protection if RTS-CTS is not preferred * one aggregation protection method */ - if (!priv->hw_params.use_rts_for_aggregation) - ctx->staging.flags |= RXON_FLG_SELF_CTS_EN; +// if (!priv->hw_params.use_rts_for_aggregation) +// ctx->staging.flags |= RXON_FLG_SELF_CTS_EN; if ((ctx->vif && ctx->vif->bss_conf.use_short_slot) || !(ctx->staging.flags & RXON_FLG_BAND_24G_MSK)) @@ -1480,9 +1480,9 @@ void iwlagn_bss_info_changed(struct ieee80211_hw *hw, else ctx->staging.flags &= ~RXON_FLG_TGG_PROTECT_MSK; - if (bss_conf->use_cts_prot) - ctx->staging.flags |= RXON_FLG_SELF_CTS_EN; - else +// if (bss_conf->use_cts_prot) +// ctx->staging.flags |= RXON_FLG_SELF_CTS_EN; +// else ctx->staging.flags &= ~RXON_FLG_SELF_CTS_EN; memcpy(ctx->staging.bssid_addr, bss_conf->bssid, ETH_ALEN);
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1328345
Can someone test the patch in comment 58?
(In reply to Emmanuel Grumbach from comment #60) > Can someone test the patch in comment 58? I tried the patch in a 3.15.1 kernel and I've been using my machine for around 10-11 hours (including suspends). There have been no iwlwifi errors in my dmesg and my connection has been working perfectly fine since then (I did have those errors and connection drops in a 3.15.0 without the patch)
Good - other users?
I've suddenly gotten: [23441.760658] iwlwifi 0000:03:00.0: fail to flush all tx fifo queues Q 2 So apparently it's a no-go. Weird thing is I am not doing any network intensive operations right now, which I have done throughout the day.
As far as disconnects go, I still have them; however, if I set up an infinite normal ping to my router it keeps the connection alive. As far as the TX fifo flushes fails go, I don't think I've since reproduced them; but given it was infrequent for me I'm not willing to label this as fixed at all. It makes me under the impression that increasing "Sensitivity Range (ACK Timing)" has helped. But again, I might just as well experience it later. So, I'm going to give this more time; I still don't have a test, I'll try to set one up in a week. (In reply to Rafael Medina from comment #63) > So apparently it's a no-go. Weird thing is I am not doing any network > intensive operations right now, which I have done throughout the day. When I experienced this I noticed that sometimes it comes after the intensive operations; afaik, if the intensive computer shuts down, the issue can continue. It acts somewhat like a lawine...
ok - thanks all for the testing. I will apply the patch anyway. CTS-to-self is a small optimisation, and is hard to get it right in the firmware. I will disable it since it seems to help even if there are still other issues.
*** Bug 79711 has been marked as a duplicate of this bug. ***
Patch that disables CTS to Self landed in Linus's tree. https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=43d826ca5979927131685cc2092c7ce862cb91cd
Hi. I'm getting lots of these: [ 45.964346] iwlwifi 0000:03:00.0: fail to flush all tx fifo queues Q 2 [ 45.964355] iwlwifi 0000:03:00.0: Current SW read_ptr 161 write_ptr 201 [ 45.964406] iwl data: 00000000: 00 00 00 00 00 00 00 00 fe ff 01 00 00 00 00 00 ................ [ 45.964443] iwlwifi 0000:03:00.0: FH TRBs(0) = 0x00000000 [ 45.964479] iwlwifi 0000:03:00.0: FH TRBs(1) = 0x801020b0 [ 45.964500] iwlwifi 0000:03:00.0: FH TRBs(2) = 0x00000000 [ 45.964520] iwlwifi 0000:03:00.0: FH TRBs(3) = 0x80300008 [ 45.964541] iwlwifi 0000:03:00.0: FH TRBs(4) = 0x00000000 [ 45.964562] iwlwifi 0000:03:00.0: FH TRBs(5) = 0x00000000 [ 45.964582] iwlwifi 0000:03:00.0: FH TRBs(6) = 0x00000000 [ 45.964605] iwlwifi 0000:03:00.0: FH TRBs(7) = 0x00704068 [ 45.964687] iwlwifi 0000:03:00.0: Q 0 is active and mapped to fifo 3 ra_tid 0x0000 [9,9] [ 45.964778] iwlwifi 0000:03:00.0: Q 1 is active and mapped to fifo 2 ra_tid 0x0000 [0,0] [ 45.964859] iwlwifi 0000:03:00.0: Q 2 is active and mapped to fifo 1 ra_tid 0x0000 [161,201] [ 45.964936] iwlwifi 0000:03:00.0: Q 3 is active and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 45.965019] iwlwifi 0000:03:00.0: Q 4 is active and mapped to fifo 7 ra_tid 0x0000 [105,105] [ 45.965097] iwlwifi 0000:03:00.0: Q 5 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 45.965182] iwlwifi 0000:03:00.0: Q 6 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 45.965268] iwlwifi 0000:03:00.0: Q 7 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 45.965354] iwlwifi 0000:03:00.0: Q 8 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 45.965443] iwlwifi 0000:03:00.0: Q 9 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 45.965529] iwlwifi 0000:03:00.0: Q 10 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 45.965609] iwlwifi 0000:03:00.0: Q 11 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 45.965691] iwlwifi 0000:03:00.0: Q 12 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 45.965778] iwlwifi 0000:03:00.0: Q 13 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 45.965863] iwlwifi 0000:03:00.0: Q 14 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 45.965943] iwlwifi 0000:03:00.0: Q 15 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 45.966030] iwlwifi 0000:03:00.0: Q 16 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 45.966116] iwlwifi 0000:03:00.0: Q 17 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 45.966198] iwlwifi 0000:03:00.0: Q 18 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 45.966288] iwlwifi 0000:03:00.0: Q 19 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0] [ 85.490930] wlan0: deauthenticating from 88:dc:96:05:73:fa by local choice (reason=3) It looks like the problem happens when signal is (very) weak. Currently I'm at a country side, so there are not many devices - my card can 'see' only 2 APs including one I'm connected to. The AP I'm connecting to doesn't have encryption and it looks like it's a TP-Link router. Unfortunately that is about all I know about hardware on that side. I have 'Intel Corporation Centrino Ultimate-N 6300 (rev 3e)'. If I set '11n_disable=1' problem doesn't seem to appear (which is expected, I presume) and I get reasonably stable connection with reasonably good speeds. Without '11n_disable=1' I get lots of those messages in logs and almost unusable connection. I'm using Ubuntu Trusty (Mint). Patch from https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=43d826ca5979927131685cc2092c7ce862cb91cd doesn't seem to make any difference. Is this the same problem? Is there anything I can try to help diagnose this issue? Thanks!
@Nikolay: can you please try this: https://bugzilla.kernel.org/attachment.cgi?id=143861&action=diff thanks
Created attachment 144831 [details] Logs after applying suggested patches Hi. I've tried a suggested patch and it looks like it made matters even worse. I'm attaching dmesg output with errors that happened just after boot. There are lots of errors and software restarts :). After those errors have happened reloading wifi related kernel modules produces one more SW restart and after that I get connection that seem to be limited to 1Mbit/s in both directions, without any aggregation, as far as I can see. And even-though it is relatively stable it is still worse than results I get with '11n_disable=1'. And I do not think this is intended result of this patch. Just for clarification: I'm using 'stock' Ubuntu kernel designated as 3.13.0-32-generic with two patches Emmanuel Grumbach have suggested. Please let me know if there's something I can test or if there's any other information/logs I can provide. Thanks.
Yes - obviously this was not what I wanted to achieve... I'll try to poke at it tomorrow - but I am not very optimistic. Disabling protection can be ... a bad work around - but it is certainly not a viable solution that will work for everybody. I just want to know if it will work for you, and then I can make it a module parameter.
I looked a little bit into how it behaves when problem occurs. Indeed rate_scale_table shows that card tried to transmit at a very high bit rate (60Mbits) which probably doesn't reach destination because link quality is not good enough. And it doesn't looks like card/driver tries to scale rate down. I looked a little bit into the code. To my very NOOBy view it seems that hardware/firmware part is not very intelligent: it is given a list of bitrates and it tries to transmit data using those bitrates one by one. Then it reports results of those attempts back to kernel. It looks like hardware/firmware doesn't chose those bitrates - kernel does. So, assuming that original problem is indeed caused by card/driver sending data on too high bit rate - this seems like a driver bug, not firmware bug. Is it worth poking in driver's internals to figure out what may be causing this problem? Thanks!
The problem does not occur with kernel version 3.16.0-031600. The logs under /var/log are clear and no wireless malfunction has happend since I updated my kernel from version 3.13.9 two days ago.
I'm still getting this on 3.16.1 (Arch Linux). It happens less often though.
Created attachment 147591 [details] bits of syslog with stack trace Maybe this log will be useful. Kernel 3.16.1 on Ubuntu x86_64, Intel Corporation Centrino Wireless-N 2230 (rev c4) in AP mode.
(In reply to Szilveszter Nemeth from comment #73) kernel version 3.16 solved the issue also for me. kernel-3.16.1-300.fc21.x86_64
Ok thanks. Let's close the bug then.
Emmanuel, kernel 3.16.1 is still affected by the issue. See my message #75 and message #74.
(In reply to Mike from comment #78) > Emmanuel, kernel 3.16.1 is still affected by the issue. > See my message #75 and message #74. Could you please post logs about the issue?
Szilveszter, see https://bugzilla.kernel.org/attachment.cgi?id=147591
Created attachment 147841 [details] iwlwifi log I still see these errors in my dmesg log. I don't know if it's related, but the logs share the "Q ... is active and mapped to fifo ..." lines. Mostly the problems occur after coming out of standby.
linux 3.16.1 problem: https://bugzilla.kernel.org/show_bug.cgi?id=83071
*** Bug 83071 has been marked as a duplicate of this bug. ***
(In reply to Mike from comment #80) > Szilveszter, see https://bugzilla.kernel.org/attachment.cgi?id=147591 This is a different issue which won't be fixed
(In reply to Martin from comment #81) > Created attachment 147841 [details] > iwlwifi log > > I still see these errors in my dmesg log. I don't know if it's related, but > the logs share the "Q ... is active and mapped to fifo ..." lines. Mostly > the problems occur after coming out of standby. This is another issue - you have an NMI_WDG_INTERRUPT which is clearly a firmware issue and won't be fixed.
(In reply to Emmanuel Grumbach from comment #84) > (In reply to Mike from comment #80) > > Szilveszter, see https://bugzilla.kernel.org/attachment.cgi?id=147591 > > This is a different issue which won't be fixed Any link to that issue and why it wontfix?
it won't be fixed because it is a firmware issue and unfortunately, the firmware team doesn't fix bugs for these old devices anymore.
this issue won't be fixed or another issue that Mike replied at comment #80?
comment #80's.
I still experience this bug sporadically while roaming between access points. I use kernel 3.0.101 with wireless modules from linux-backports 3.16.1 on arm platform. I discovered that I can perfectly reproduce the failure when i plug off the access point during data transfer. In this scenario, there is no sporadic behaviour any more, it happens all the time. Any ideas how to further address this issue? It's a big problem for us since we need reliable roaming... Below you can find the dmesg output, but it's very similar to the previous logs.
Created attachment 149251 [details] dmesg output 3.16-1
I changed the IWL_FLUSH_WAIT_MS from 2000 to 200, just for testing. The problem obviously still exits but the driver doesn't hang for such a long time. Of course this is no reliable solution for our roaming problem... Any further ideas?
I am afraid this is a FW bug, which makes it impossible to fix. I guess I can take your suggestion to reduce IWL_FLUSH_WAIT_MS.
But if it's a firmware issue, why does the Windows driver not have any problems at all and roams perfectly? Is the firmware different?
(In reply to Bill Bluescreen from comment #94) > But if it's a firmware issue, why does the Windows driver not have any > problems at all and roams perfectly? Is the firmware different? yes - and we have no way to unify them unfortunately. This has changed starting from 7260.
(In reply to Bill Bluescreen from comment #94) > But if it's a firmware issue, why does the Windows driver not have any > problems at all and roams perfectly? Uhm, no, that is an assumption; I do experience high pings trouble on Windows at a similar moment as described above, not sure how to start debugging this...
> I am afraid this is a FW bug, which makes it impossible to fix. It's not impossible. Someone needs to lobby whoever is responsible at Intel to do it. This bug affects many. I don't have much else to add. On 3.16.2 and still occurs sporadically.
The 6300 cards were sold in Lenovo laptops less then a year ago. Problem was already known back then. Problem still exists. Problem renders 802.11n unusable for many users. And yet Intel decides not to fix firmware for it? Doesn't sound too good for a brand.
I am having 02:00.0 Network controller: Intel Corporation Centrino Wireless-N 2200 (rev c4) This problem didn't appear until the last month with the workaround of disabling 802.11N mode. But it doesn't appear to help anymore.
I can confirm this. I'm experiensing these errors (and 3-5s ping times) on Intel Corporation Centrino Ultimate-N 6300 (rev 3e) connected to Hotel's AP mac of which suggests that it is made by HP. Setting 11n_disable=1 doesn't seem to stop the problem from happening.
> The 6300 cards were sold in Lenovo laptops less then a year ago. The 6235 is still sold today.
the 6300 as well...
The "it's a firmware bug, nothing can be done" it's a pretty annoying answer. I've had this bug for *years*. First on a T410 (6200), and now on T430 (6205). I know this is not a good place to complain, but maybe someone can point us to the right channel to ask for this to be fixed.
An idea came through my mind. What we do in the flush flow is that we wait until the driver finishes to transmit all the frames. If we are far away from the AP (and waiting to roam) this can be useless. It might be better to drop the frames. I guess that this can be something worth trying: diff --git a/drivers/net/wireless/iwlwifi/dvm/mac80211.c b/drivers/net/wireless/iwlwifi/dvm/mac80211.c index d8ebc7e..fe1c1af 100644 --- a/drivers/net/wireless/iwlwifi/dvm/mac80211.c +++ b/drivers/net/wireless/iwlwifi/dvm/mac80211.c @@ -1108,6 +1108,8 @@ static void iwlagn_mac_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif, goto done; } + drop = true; + /* * mac80211 will not push any more frames for transmit * until the flush is completed --- This will drop frames - but maybe the responsiveness of the firmware will be better. I am pretty sure that it won't solve all the problems though...
I'm testing this right now. So far, so good.
It seems to be working much better, knock on wood :)
(In reply to Emmanuel Grumbach from comment #104) I'm testing Emmanuel's oneliner as well, in a scenario where the bug is consistently reproduceable: *with unpatched driver*, received signal strength between -75dbm and -80dbm, it sometimes works, but below -80dbm (approx) it consistently hangs, spitting out on syslog each 2 seconds the "fail to flush" blurb, and on 'iw event' says "connection quality monitor event: RSSI went below threshold" ping6 to router's fe80:: also starts to spike, giving from 2000ms to even 40000ms, and everything wifi-related becomes very unresponsive (even a "modprobe -r iwldvm" or any "iw" command hangs for several seconds) *with patched driver*, same exact scenario (between -75 and -80dbm): not even one "fail to flush" so far, iw event is silent, and ping6 is (at last!) stable, negligible packet loss, only a few periods of latency, max 5 seconds, in the beginning of the test (30 minutes long ping6) ping6 fe80::ea94:f6ff:fe47:c059%wlan0 -L -i0.2 [...] 9062/8985 packets, 0% loss, min/avg/ewma/max = 0.973/75.654/7.810/5451.022 ms thanks a lot Emmanuel for making my laptop's wifi a lot less miserable :) will report tomorrow how it behaved in a longer term.
Same here, still working steady after more than 48 hours. This is using my 5Ghz which was just impossible before. I'm very happy :) Thanks Emmanuel.
thanks for your testing. I am uploading now a final version of the fix. In this version, I don't flush the VO frames, which is critical to ensure proper roaming. With this version I'd expect roaming to work much better.
Created attachment 152431 [details] drop frames as a work around. This is potential fix for this long living bug. Please (re-)test this version. This one should help for roaming issues.
Created attachment 152441 [details] updated version of the fix small change in the fix
@Oscar @Gui can you please test the final version of the patch (Comment 111)? Thanks
Sure, I'm testing it right now. It looks good.
Created attachment 152451 [details] fix I got it wrong again - here is the updated patch - minor changes.
I will now test the latest version of the patch, especially in our roaming environment and report the results later
Created attachment 152561 [details] syslog for me it became much worse! now i got the attached error, even without any roaming and low network traffic. Signal conditions are very good, access point is 4m away from the device. Intel Wireless 6300, linux-backports 3.16.2-1 with the latest patch
@Bill Bluescreen, this is really weird - I tested it with the same NIC as yours and with the same FW version. Did the backport work smoothly? can you capture tracing?
Also can you please try with debug=0xf? this will add data to the log.
(In reply to Emmanuel Grumbach from comment #118) > Also can you please try with debug=0xf? > actually I need 0x8000000f
ok i will try it now. the backport compiled without any problems
I found out something: The error doesn't appear if I disable background scanning in my wpa_supplicant.conf it was set to 10s, with the line commented out the problem doesn't exist anymore. But of course I can't do any roaming.
works for me: $ cat wpa_supplicant.conf ctrl_interface=/var/run/wpa_supplicant bgscan="simple:10:-45:10" network={ ssid="Guest" key_mgmt=NONE } [ 454.756007] iwlwifi 0000:02:00.0: U iwlagn_mac_flush Flushing SCD queues: 0xffdee [ 454.757427] iwlwifi 0000:02:00.0: U iwlagn_mac_flush wait transmit/flush all frames [ 454.757438] iwlwifi 0000:02:00.0: U iwl_trans_pcie_wait_txq_empty Emptying queue 0... [ 454.764316] iwlwifi 0000:02:00.0: U iwl_trans_pcie_wait_txq_empty Queue 0 is now empty. [ 454.764326] iwlwifi 0000:02:00.0: U iwl_trans_pcie_wait_txq_empty Emptying queue 1... [ 454.764331] iwlwifi 0000:02:00.0: U iwl_trans_pcie_wait_txq_empty Queue 1 is now empty. [ 454.764336] iwlwifi 0000:02:00.0: U iwl_trans_pcie_wait_txq_empty Emptying queue 2... [ 454.764341] iwlwifi 0000:02:00.0: U iwl_trans_pcie_wait_txq_empty Queue 2 is now empty. [ 454.764346] iwlwifi 0000:02:00.0: U iwl_trans_pcie_wait_txq_empty Emptying queue 3... [ 454.764351] iwlwifi 0000:02:00.0: U iwl_trans_pcie_wait_txq_empty Queue 3 is now empty. [ 454.764356] iwlwifi 0000:02:00.0: U iwl_trans_pcie_wait_txq_empty Emptying queue 11... [ 454.764361] iwlwifi 0000:02:00.0: U iwl_trans_pcie_wait_txq_empty Queue 11 is now empty. This to work fine... I have: sudo ping X.X.X.X -s 5000 -f running in background
also: $ uname -r 3.16.1+ $ git log --oneline -2 2f32ded iwlwifi: dvm: drop non VO frames when flushing 9a35988 Linux 3.16.1
Created attachment 152581 [details] dmesg output this is my dmesg output with background scan interval 10s This row seems interessting: [ 213.162517] iwlwifi 0000:01:00.0: Error sending REPLY_SCAN_CMD: time out after 2000ms.
this is clearly not related to my patch - nor to the bug discussed here. didn't you have that before my patch?
and no further scan requests from wpa_supplicant are possible: 1412109239.265251: wlan0: Starting radio work 'scan'@0xb2238 after 0.000071 second wait 1412109239.265457: wlan0: nl80211: scan request 1412109241.630229: nl80211: Scan trigger failed: ret=-110 (Connection timed out) 1412109241.631613: wlan0: Radio work 'scan'@0xb2238 done in 2.366345 seconds
no, I didn't have this before but I had to upgrade to 3.16 to apply your patch. Before I was using 3.15.1-1 but your patch did not apply since the signature of the method iwlagn_mac_flush changed
(In reply to Bill Bluescreen from comment #127) > no, I didn't have this before but I had to upgrade to 3.16 to apply your > patch. Before I was using 3.15.1-1 but your patch did not apply since the > signature of the method iwlagn_mac_flush changed ok - go back to 3.15.1-1 and add 77be2c54c5bd26279abc13807398771d80cda37a before my patch.
3.15.1-1: static void iwlagn_mac_flush(struct ieee80211_hw *hw, u32 queues, bool drop) 3.16/patch static void iwlagn_mac_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif, u32 queues, bool drop) ok, I will try this, thanks in advance!
(In reply to Bill Bluescreen from comment #129) > 3.15.1-1: > static void iwlagn_mac_flush(struct ieee80211_hw *hw, u32 queues, bool drop) > > 3.16/patch > static void iwlagn_mac_flush(struct ieee80211_hw *hw, struct ieee80211_vif > *vif, > u32 queues, bool drop) > > > > ok, I will try this, thanks in advance! Note that I have the exact same NIC (6300), FW and driver as you (I am currently running 3.16.1 + patch) and it works perfectly. But the timeout issue you saw may be related to power state of the link (PCI) which may vary from platform to platform.
btw: you are right, the problem exists also without your patch on 3.16.2... will try 3.15.1 again now with the patches...
Created attachment 152591 [details] dmesg output with 3.15.1-1 I got the attached error, more or less once a minute and without any roaming
Created attachment 152601 [details] dmesg output this is the dmesg output with debug=0x800000f 3.15.1-1, Intel 6300. No errors if background scanning is disabled. Even without your patch I have the error. No problems however if I do not apply the 77be2c54c5bd26279abc13807398771d80cda37a from kernel git!
I made it work by commenting out + //if (sdata) + //check_sdata_in_driver(sdata); in the 77be2c54c5bd26279abc13807398771d80cda37a patch. Now the patched driver works with enabled background-scanning and in a few simple and quick roaming tests I was not able to reproduce the problem anymore. I will do further testing during the next days but it does not look too bad at the moment! ;)
(In reply to Bill Bluescreen from comment #134) > I made it work by commenting out > > + //if (sdata) > + //check_sdata_in_driver(sdata); > > in the 77be2c54c5bd26279abc13807398771d80cda37a patch. Now the patched > driver works with enabled background-scanning and in a few simple and quick > roaming tests I was not able to reproduce the problem anymore. I will do > further testing during the next days but it does not look too bad at the > moment! ;) This makes absolutely no sense to me. Clearly these 2 lines don't affect the flow towards the NIC. you'll agree with me that if sdata is NULL, these 2 lines are no-op. If sdata is not NULL, these 2 lines just warn in a certain case... so ?
@Bill, please open a new bug - the discussion we are having isn't really related to this specific bug. It'll be easier to handle it as a separate bug.
@Oscar and @Gui please share your NIC model. I'd like to see how many NICs were covered. I tested 6200 and 6300 which are very similar. Other potential testers? I added in the URL section a link to Ubuntu's database in which you can find a link to a build that includes the fix. In case you have Ubuntu and don't feel like compiling the kernel.
@Emmanuel I'm testing kernel in my Ubuntu 14.10.
@Cristian what NIC do you have?
@Emmanuel 01:00.0 Network controller [0280]: Intel Corporation Centrino Advanced-N 6235 [8086:088e] (rev 24) Subsystem: Intel Corporation Centrino Advanced-N 6235 AGN [8086:4060] Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 29 Region 0: Memory at f0500000 (64-bit, non-prefetchable) [size=8K] Capabilities: <access denied> Kernel driver in use: iwlwifi
Created attachment 152681 [details] dmesg out - version a Now only hope.
how fast did you get the "fail to flush" messages usually?
@Emmanuel I lie down to sleep and when I wake up, the message is :). About 8 hours. Is there any way to speed up the test?
Don't think there is way to speed up things. I guess I just need to be more patient :)
> Is there any way to speed up the test? > I experienced this error most of the time, if there is a lot of traffic.
I have no clue, but could this be in any way related to channels 12 + 13 which are not used in the US, unlike in many other countries, particularly in Europe. PS. I still think it's poor company policy from Intel to not update their firmware for "old" (still sold) devices. Rather than making you dance around the issue, trying to fix what your colleagues don't, it could possibly be fixed much easier directly in the firmware. It shows that an open source driver is only worth so much when the firmware is closed.
@Emanuel tomorrow I'll further investigate this issue and maybe open a new bug. It seems strange to me as well, tomorrow I will try to reproduce it again. Anyhow, now it seems to work for me. I didn't see the error any more and during this week I will bring the device to our production environment were we discovered the roaming problems and do more testing there. In lab until now everything seems to be fine. @pdknsk I absolutely share your point of view. A firmware fix would be very desirable for a device that is still sold by a company big and famous as Intel... However, at the moment a workaround that will do the job is better than nothing. I am curious to see how it performs in our production environment...
Created attachment 152711 [details] dmesg output - version b [ 9322.009112] iwlwifi 0000:01:00.0: Microcode SW error detected. Restarting 0x2000000. [ 9322.009125] iwlwifi 0000:01:00.0: CSR values: [ 9322.009130] iwlwifi 0000:01:00.0: (2nd byte of CSR_INT_COALESCING is CSR_INT_PERIODIC_REG) [ 9322.009139] iwlwifi 0000:01:00.0: CSR_HW_IF_CONFIG_REG: 0X00484b00 [ 9322.009147] iwlwifi 0000:01:00.0: CSR_INT_COALESCING: 0X0000ff40 [ 9322.009155] iwlwifi 0000:01:00.0: CSR_INT: 0X00000000 [ 9322.009163] iwlwifi 0000:01:00.0: CSR_INT_MASK: 0X00000000 [ 9322.009170] iwlwifi 0000:01:00.0: CSR_FH_INT_STATUS: 0X00000000 [ 9322.009178] iwlwifi 0000:01:00.0: CSR_GPIO_IN: 0X00000038 [ 9322.009186] iwlwifi 0000:01:00.0: CSR_RESET: 0X00000000 [ 9322.009194] iwlwifi 0000:01:00.0: CSR_GP_CNTRL: 0X080403c5 [ 9322.009202] iwlwifi 0000:01:00.0: CSR_HW_REV: 0X000000b0 [ 9322.009210] iwlwifi 0000:01:00.0: CSR_EEPROM_REG: 0Xc4540ffd [ 9322.009217] iwlwifi 0000:01:00.0: CSR_EEPROM_GP: 0X90000801 [ 9322.009225] iwlwifi 0000:01:00.0: CSR_OTP_GP_REG: 0X00030001 [ 9322.009233] iwlwifi 0000:01:00.0: CSR_GIO_REG: 0X00080044 [ 9322.009241] iwlwifi 0000:01:00.0: CSR_GP_UCODE_REG: 0X0000a963 [ 9322.009249] iwlwifi 0000:01:00.0: CSR_GP_DRIVER_REG: 0X00000000 [ 9322.009256] iwlwifi 0000:01:00.0: CSR_UCODE_DRV_GP1: 0X00000000 [ 9322.009273] iwlwifi 0000:01:00.0: CSR_UCODE_DRV_GP2: 0X00000000 [ 9322.009279] iwlwifi 0000:01:00.0: CSR_LED_REG: 0X00000040 [ 9322.009285] iwlwifi 0000:01:00.0: CSR_DRAM_INT_TBL_REG: 0X881189d2 [ 9322.009291] iwlwifi 0000:01:00.0: CSR_GIO_CHICKEN_BITS: 0X27800200 [ 9322.009298] iwlwifi 0000:01:00.0: CSR_ANA_PLL_CFG: 0X00000000 [ 9322.009303] iwlwifi 0000:01:00.0: CSR_MONITOR_STATUS_REG: 0X43f7f7d7 [ 9322.009310] iwlwifi 0000:01:00.0: CSR_HW_REV_WA_REG: 0X0001001a [ 9322.009316] iwlwifi 0000:01:00.0: CSR_DBG_HPET_MEM_REG: 0Xffff0010 [ 9322.009320] iwlwifi 0000:01:00.0: FH register values: [ 9322.009360] iwlwifi 0000:01:00.0: FH_RSCSR_CHNL0_STTS_WPTR_REG: 0X0cbb2400 [ 9322.009377] iwlwifi 0000:01:00.0: FH_RSCSR_CHNL0_RBDCB_BASE_REG: 0X00cccf60 [ 9322.009394] iwlwifi 0000:01:00.0: FH_RSCSR_CHNL0_WPTR: 0X00000008 [ 9322.009412] iwlwifi 0000:01:00.0: FH_MEM_RCSR_CHNL0_CONFIG_REG: 0X80801114 [ 9322.009429] iwlwifi 0000:01:00.0: FH_MEM_RSSR_SHARED_CTRL_REG: 0X000000fc [ 9322.009446] iwlwifi 0000:01:00.0: FH_MEM_RSSR_RX_STATUS_REG: 0X07030000 [ 9322.009463] iwlwifi 0000:01:00.0: FH_MEM_RSSR_RX_ENABLE_ERR_IRQ2DRV: 0X00000000 [ 9322.009479] iwlwifi 0000:01:00.0: FH_TSSR_TX_STATUS_REG: 0X07ff0001 [ 9322.009497] iwlwifi 0000:01:00.0: FH_TSSR_TX_ERROR_REG: 0X00000000 [ 9322.009503] iwlwifi 0000:01:00.0: Loaded firmware version: 18.168.6.1 [ 9322.009628] iwlwifi 0000:01:00.0: Start IWL Error Log Dump: [ 9322.009633] iwlwifi 0000:01:00.0: Status: 0x000002CC, count: 6 [ 9322.009639] iwlwifi 0000:01:00.0: 0x0000198A | ADVANCED_SYSASSERT [ 9322.009644] iwlwifi 0000:01:00.0: 0x00015920 | uPc [ 9322.009648] iwlwifi 0000:01:00.0: 0x00015910 | branchlink1 [ 9322.009653] iwlwifi 0000:01:00.0: 0x00015910 | branchlink2 [ 9322.009657] iwlwifi 0000:01:00.0: 0x0000DBEA | interruptlink1 [ 9322.009661] iwlwifi 0000:01:00.0: 0x00000000 | interruptlink2 [ 9322.009666] iwlwifi 0000:01:00.0: 0x000014ED | data1 [ 9322.009670] iwlwifi 0000:01:00.0: 0x00000023 | data2 [ 9322.009674] iwlwifi 0000:01:00.0: 0x000001DC | line [ 9322.009679] iwlwifi 0000:01:00.0: 0xD1402849 | beacon time [ 9322.009683] iwlwifi 0000:01:00.0: 0x111267B7 | tsf low [ 9322.009687] iwlwifi 0000:01:00.0: 0x00000060 | tsf hi [ 9322.009692] iwlwifi 0000:01:00.0: 0x00000000 | time gp1 [ 9322.009696] iwlwifi 0000:01:00.0: 0x2B4EC6B7 | time gp2 [ 9322.009700] iwlwifi 0000:01:00.0: 0x00000000 | time gp3 [ 9322.009705] iwlwifi 0000:01:00.0: 0x754312A8 | uCode version [ 9322.009709] iwlwifi 0000:01:00.0: 0x000000B0 | hw version [ 9322.009714] iwlwifi 0000:01:00.0: 0x00484B00 | board version [ 9322.009718] iwlwifi 0000:01:00.0: 0x0000001C | hcmd [ 9322.009722] iwlwifi 0000:01:00.0: 0xAF863040 | isr0 [ 9322.009727] iwlwifi 0000:01:00.0: 0x1189F800 | isr1 [ 9322.009731] iwlwifi 0000:01:00.0: 0x00000E1A | isr2 [ 9322.009736] iwlwifi 0000:01:00.0: 0x0547FCC3 | isr3 [ 9322.009740] iwlwifi 0000:01:00.0: 0x00000000 | isr4 [ 9322.009744] iwlwifi 0000:01:00.0: 0x00800110 | isr_pref [ 9322.009758] iwlwifi 0000:01:00.0: 0x000260A4 | wait_event [ 9322.009762] iwlwifi 0000:01:00.0: 0x000000D4 | l2p_control [ 9322.009765] iwlwifi 0000:01:00.0: 0x00000000 | l2p_duration [ 9322.009768] iwlwifi 0000:01:00.0: 0x00000007 | l2p_mhvalid [ 9322.009771] iwlwifi 0000:01:00.0: 0x00101042 | l2p_addr_match [ 9322.009774] iwlwifi 0000:01:00.0: 0x00000005 | lmpm_pmg_sel [ 9322.009777] iwlwifi 0000:01:00.0: 0x13011136 | timestamp [ 9322.009780] iwlwifi 0000:01:00.0: 0x00000818 | flow_handler [ 9322.009837] iwlwifi 0000:01:00.0: Start IWL Event Log Dump: nothing in log [ 9322.010165] iwlwifi 0000:01:00.0: Fw not loaded - dropping CMD: 1e [ 9322.010172] iwlwifi 0000:01:00.0: flush request fail [ 9322.010178] ieee80211 phy0: Hardware restart was requested [ 9322.010321] iwlwifi 0000:01:00.0: L1 Disabled; Enabling L0S [ 9322.017008] iwlwifi 0000:01:00.0: Radio type=0x2-0x1-0x0
@Cristian - I think this firmware error is not related to this specific bug. I have seen it before. Did the recovery succeed?
Hi there, here's my NIC: 03:00.0 Network controller: Intel Corporation PRO/Wireless 5100 AGN [Shiloh] Network Connection Subsystem: Intel Corporation WiFi Link 5100 AGN Physical Slot: 1 Flags: bus master, fast devsel, latency 0, IRQ 48 Memory at f2500000 (64-bit, non-prefetchable) [size=8K] Capabilities: [c8] Power Management version 3 Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [e0] Express Endpoint, MSI 00 Capabilities: [100] Advanced Error Reporting Capabilities: [140] Device Serial Number 00-1e-65-ff-ff-e6-0f-4c Kernel driver in use: iwlwifi Kernel modules: iwlwifi In my environment, I can't really tell the difference between your first patch and the last. But they all make a huge difference compared to the previous behavior.
@Emmanuel I do not understand your question. You mean if I could surf the internet? If if I was able to navigate.
Hi, I'm testing the ubuntu build with this NIC: 03:00.0 Network controller: Intel Corporation Centrino Advanced-N 6205 [Taylor Peak] (rev 34) Subsystem: Intel Corporation Centrino Advanced-N 6205 AGN Flags: bus master, fast devsel, latency 0, IRQ 45 Memory at f1c00000 (64-bit, non-prefetchable) [size=8K] Capabilities: [c8] Power Management version 3 Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [e0] Express Endpoint, MSI 00 Capabilities: [100] Advanced Error Reporting Capabilities: [140] Device Serial Number a4-4e-31-ff-ff-41-4a-b8 Kernel driver in use: iwlwifi So far so good, but tomorrow will be able to test on a roaming environment. PD: thanks @emmanuel for your work
(In reply to Cristian Aravena Romero from comment #151) > @Emmanuel > > I do not understand your question. You mean if I could surf the internet? If > if I was able to navigate. Yes. I want to know if you had any issues besides this fat error in the log. @marianitn thanks for your testing.
I pushed the patch to my tree: https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/iwlwifi-fixes.git/commit/?id=ff36e3756cc8abb42608ceed4d173d128ac4c9ad I'll leave this bug open for a few more days to be able to get more reports and then I'll close it.
@Emmanuel: Does this patch address those timeouts that were happening in non-roaming environments woth weak signal in any way? I'm currently testing your patch but my location but unfortunately signal here is too strong to trigger it. I'll try to do more testing in couple of days but all discussion above was concerning roaming scenario and I'm a little bit concerned about this bug being closed since originally it doesn't seem it was about roaming at all. Thanks. Nikolay.
@Nikolay: This patch is good for your scenario too. I don't think that Oscar was testing roaming since my very first version (the one-liner) wasn't good for roaming and he said it improved the behavior on his setup.
That's right, I'm using a single AP here and it works a lot better with Emmanuel's patch.
My wakeup from standby problems seem to have vanished (or at least become less frequent) as well. Thx for the fix!
This is the device: 01:00.0 Network controller: Intel Corporation Centrino Advanced-N 6235 (rev 24) Subsystem: Intel Corporation Centrino Advanced-N 6235 AGN Flags: bus master, fast devsel, latency 0, IRQ 28 Memory at d0400000 (64-bit, non-prefetchable) [size=8K] Capabilities: [c8] Power Management version 3 Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [e0] Express Endpoint, MSI 00 Capabilities: [100] Advanced Error Reporting Capabilities: [140] Device Serial Number ************ Kernel driver in use: iwlwifi
I've given it a try on intel 6300. I didn't get any 'long pings' but after several hours of reasonably good work it just stopped receiving or sending anything. Like it went completelly deaf and mute, suddenly, no errors in logs as far as I can see. Other clients of same AP do not exibit same behaviour. I'm running 3.17 + latest patch. I'm not sure if this is problem itself is in any way related to this bug. But I guess my question at this point is: what should I do to gather reasonable debugging information to open a new bug, considering that logs are silent? Thanks. Nikolay.
@Nikolay, This is obviously a different issue. To see what happens, I'd need you to run with debug=0xC0800000 given as parameter to iwlwifi. It'll be very chatty, but it will let us know what is happening.
roaming seems to work much better and faster now. However, now we have a high rate of lost packages. When testing with iperf (UDP at 2Mbit/s) we have 1~2% packet loss, even at best signal conditions. Without the last patch, the same device has <0,02% lost packages. Does anyone else have similar effects?
I can try to make the patch slightly less brutal - I'll come up with an updated version in the next 2 hours.
I'm experiencing exactly the same behavior as comment 160, although I can't blame the patch because the network is very bad here, and I recall having this behaviour before. But other devices work. When I try to enable debug I get: [ 3.882957] iwlwifi: unknown parameter 'debug' ignored [ 3.883229] iwlwifi 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control [ 3.883360] iwlwifi 0000:03:00.0: irq 45 for MSI/MSI-X [ 3.889803] iwlwifi 0000:03:00.0: loaded firmware version 18.168.6.1 op_mode iwldvm [ 3.916377] iwlwifi 0000:03:00.0: CONFIG_IWLWIFI_DEBUG disabled [ 3.916381] iwlwifi 0000:03:00.0: CONFIG_IWLWIFI_DEBUGFS enabled [ 3.916383] iwlwifi 0000:03:00.0: CONFIG_IWLWIFI_DEVICE_TRACING enabled [ 3.916386] iwlwifi 0000:03:00.0: Detected Intel(R) Centrino(R) Advanced-N 6205 AGN, REV=0xB0 [ 3.916547] iwlwifi 0000:03:00.0: L1 Enabled; Disabling L0S [ 3.957977] ieee80211 phy0: Selected rate control algorithm 'iwl-agn-rs' [ 4.589261] iwlwifi 0000:03:00.0: L1 Enabled; Disabling L0S [ 4.595840] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 [ 4.863533] iwlwifi 0000:03:00.0: L1 Enabled; Disabling L0S [ 4.870060] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 I think the ubuntu build I'm running (3.16.0-20.27+lp1328345v201410061136) doesn't have debug enabled. Is there other build I can try?
(In reply to marianitn from comment #164) > > I think the ubuntu build I'm running (3.16.0-20.27+lp1328345v201410061136) > doesn't have debug enabled. Is there other build I can try? exactly - you need CONFIG_IWLWIFI_DEBUG Note that this is not the same bug.
Created attachment 153041 [details] less aggressive drop This should be applied on top of the previous patch. It will be less aggressive and will hopefully reduce the packet loss. Note that you will probably see "fail to flush all tx fifo queues Q 0" ignore those and let me know how it behaves If this is still not good enough - I will make mac80211 give a hint to the driver if we are disconnecting or not.
@Emanuel thank you for the quick response. Wouldn't this lead to the same behaviour as before? In case of any errors (as before while roaming in our case) this will lead to the same situation: the device will be locked and the driver will wait for 2000ms at iwl_trans_wait_tx_queue_empty and only after this 2 seconds the queues will be flushed. Maybe this could work with shorter time-out values in iwl_trans_wait_tx_queue_empty? Or do I miss something? In any case, I will do some testing tomorrow and report as soon as possibile!
ok, I missed the change in the second file, in fact the time out is shorter now ;)
I reduced the time out to 100ms. I first try to give a chance to the packets to be sent. I that doesn't work - I flush. Please also note that 1~2% packet loss is not nice I agree, but in TCP you shouldn't really feel it I think. Unless you need super high throughput.
For me at home a loss of 2% doesn't matter, I do completely agree. Our customer however has higher expectations and requirements and won't accept values higher than 1%. (Which was no problem at all before applying the patch). Tomorrow I will try the second version. PS: I just tried to reproduce the error at home with my laptop (Intel Wireless 6235) and I didn't loose any packets with yesterday's version of the patch. But with the 6235 card I never had any trouble, no "fail to flush all tx fifo queue" messages. Just to give feedback for another card...
Created attachment 153091 [details] dmesg output This is the output with the latest patch on intel 6300 NIC :-(
It feels like that the latest patches aren't perform as well as the early ones when in low-signal (no roaming). There's a room here where I've always lost the connection, but then it started working perfectly with this patch (https://bugzilla.kernel.org/attachment.cgi?id=152451&action=diff) and now it fails again with the new one. I'll try to do a more scientific test and post the results. Which logs would be useful to help debug this?
Yes - this is what I was afraid of - this kind of heuristics is ... problematic. I'll try to think it over during the week end.
Chatting with a colleague gave us an idea: can you please load mac80211 with beacon_loss_count=1000 as a module parameter? And of course - drop my patch from comment 166. This should reduce the packet loss and will probably not affect roaming. thanks
Comment #171 might be a problem with our backport, with kernel 3.16.2 it seems ok, no more package loss with the second patch but I didn't test roaming yet.
I think that my suggestion from Comment 174 is better than the second patch.
*** Bug 86101 has been marked as a duplicate of this bug. ***
Created attachment 153421 [details] dmesg output I removed the last patch and loaded mac80211 with the proposed option but now we have the well-known "fail to flush all tx fifi queues..."
(In reply to Bill Bluescreen from comment #178) > I removed the last patch and loaded mac80211 with the proposed option but > now we have the well-known "fail to flush all tx fifi queues..." Did you have any patch applied? That was intended to go with the patch from comment #154
(In reply to Johannes Berg from comment #179) > (In reply to Bill Bluescreen from comment #178) > > > I removed the last patch and loaded mac80211 with the proposed option but > > now we have the well-known "fail to flush all tx fifi queues..." > > Did you have any patch applied? That was intended to go with the patch from > comment #154 yes, we applied the patch from comment #154 and removed the one from #166
Please reproduce with debug=0x8000000f. This is very strange. Are sure you didn't have issues with the patches orbackport?
Created attachment 153431 [details] syslog Hello, we tested again with debug messages enabled. The txflush error did not appear any more but we still have packet loss. In the debug output "iwlagn_mac_flush wait transmit/flush all frames" appears several times. Shouldn't these lines be vanished after setting missed_beacon_count=1000?
Also here in our lab we still have "iwlagn_mac_flush wait transmit/flush all frames" messages in our system log.
Please add dump_stack() in iwlagn_mac_flush() so that we can see what triggers the flush. Thanks
@Bill - we are really discussing a different issue. If you want, you can open a new bug and CC ilw@linux.intel.com. I don't want to continue spamming all the (many) people following this bug.
I am closing this bug. New / related issues should be discussed in a new bug that can point to this one.