original_kernel

Commit Graph

Author	SHA1	Message	Date
John Fastabend	d85f017758	net: tls, fix sk_write_space NULL write when tx disabled The ctx->sk_write_space pointer is only set when TLS tx mode is enabled. When running without TX mode its a null pointer but we still set the sk sk_write_space pointer on close(). Fix the close path to only overwrite sk->sk_write_space when the current pointer is to the tls_write_space function indicating the tls module should clean it up properly as well. Reported-by: Hillf Danton <hdanton@sina.com> Cc: Ying Xue <ying.xue@windriver.com> Cc: Andrey Konovalov <andreyknvl@google.com> Fixes: `57c722e932` ("net/tls: swap sk_write_space on close") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-15 12:40:15 -07:00
Wenwen Wang	6f967f8b1b	liquidio: add cleanup in octeon_setup_iq() If oct->fn_list.enable_io_queues() fails, no cleanup is executed, leading to memory/resource leaks. To fix this issue, invoke octeon_delete_instr_queue() before returning from the function. Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-15 12:37:37 -07:00
David S. Miller	0b24a44176	mlx5-fixes-2019-08-15 -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl1Vp+0ACgkQSD+KveBX +j76RwgAxYKiyXsvOvXTpRC1VrmOcrPw4Nr+UjL7CCKbY6f66rdFHdK5SuVphpEV zwsTyD5mMvYCD5+6oAAA2p9x65d7n6mrDvkGjBZ8RzSmAeavchOz0iHwBrc98IDF iPOeoCLcaLRTf1O+J4NHJwx1DgnLkImyNqIfe5Ce0oGoYoA7UeDXvzLQfXBAV8+b MPZcKv3U+++zg2QIX2d8L0h+5k1yO0PhqBX1uzHVmEMN3B7ZB7bm4oXryfXQIJIF +oylg9WjEc81CCEVFUAMrVZikfLWO4Qx/aHuBeYcn06n5Jdf2aUkh3SgW9YCk4KT DEExmv9RdYTkEymnltjEfklrMbUZWA== =bFyv -----END PGP SIGNATURE----- Merge tag 'mlx5-fixes-2019-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2019-08-15 This series introduces two fixes to mlx5 driver. 1) Eran fixes a compatibility issue with ethtool flash. 2) Maxim fixes a race in XSK wakeup flow. Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-15 12:09:39 -07:00
Eran Ben Elisha	f43d48d10a	net/mlx5e: Fix compatibility issue with ethtool flash device Cited patch deleted ethtool flash device support, as ethtool core can fallback into devlink flash callback. However, this is supported only if there is a devlink port registered over the corresponding netdevice. As mlx5e do not have devlink port support over native netdevice, it broke the ability to flash device via ethtool. This patch re-add the ethtool callback to avoid user functionality breakage when trying to flash device via ethtool. Fixes: `9c8bca2637` ("mlx5: Move firmware flash implementation to devlink") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-08-15 11:43:57 -07:00
Maxim Mikityanskiy	e0d57d9c7e	net/mlx5e: Fix a race with XSKICOSQ in XSK wakeup flow Add a missing spinlock around XSKICOSQ usage at the activation stage, because there is a race between a configuration change and the application calling sendto(). Fixes: `db05815b36` ("net/mlx5e: Add XSK zero-copy support") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-08-15 11:43:56 -07:00
Anders Roxell	2aafdf5a57	selftests: net: tcp_fastopen_backup_key.sh: fix shellcheck issue When running tcp_fastopen_backup_key.sh the following issue was seen in a busybox environment. ./tcp_fastopen_backup_key.sh: line 33: [: -ne: unary operator expected Shellcheck showed the following issue. $ shellcheck tools/testing/selftests/net/tcp_fastopen_backup_key.sh In tools/testing/selftests/net/tcp_fastopen_backup_key.sh line 33: if [ $val -ne 0 ]; then ^-- SC2086: Double quote to prevent globbing and word splitting. Rework to do a string comparison instead. Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-15 11:34:32 -07:00
Wenwen Wang	c554336efa	cxgb4: fix a memory leak bug In blocked_fl_write(), 't' is not deallocated if bitmap_parse_user() fails, leading to a memory leak bug. To fix this issue, free t before returning the error. Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-14 20:03:42 -07:00
zhengbin	6d5afe2039	sctp: fix memleak in sctp_send_reset_streams If the stream outq is not empty, need to kfree nstr_list. Fixes: `d570a59c5b` ("sctp: only allow the out stream reset when the stream outq is empty") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 20:45:01 -07:00
David Ahern	d00ee64e1d	netlink: Fix nlmsg_parse as a wrapper for strict message parsing Eric reported a syzbot warning: BUG: KMSAN: uninit-value in nh_valid_get_del_req+0x6f1/0x8c0 net/ipv4/nexthop.c:1510 CPU: 0 PID: 11812 Comm: syz-executor444 Not tainted 5.3.0-rc3+ #17 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x191/0x1f0 lib/dump_stack.c:113 kmsan_report+0x162/0x2d0 mm/kmsan/kmsan_report.c:109 __msan_warning+0x75/0xe0 mm/kmsan/kmsan_instr.c:294 nh_valid_get_del_req+0x6f1/0x8c0 net/ipv4/nexthop.c:1510 rtm_del_nexthop+0x1b1/0x610 net/ipv4/nexthop.c:1543 rtnetlink_rcv_msg+0x115a/0x1580 net/core/rtnetlink.c:5223 netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5241 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0xf6c/0x1050 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x110f/0x1330 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg net/socket.c:657 [inline] ___sys_sendmsg+0x14ff/0x1590 net/socket.c:2311 __sys_sendmmsg+0x53a/0xae0 net/socket.c:2413 __do_sys_sendmmsg net/socket.c:2442 [inline] __se_sys_sendmmsg+0xbd/0xe0 net/socket.c:2439 __x64_sys_sendmmsg+0x56/0x70 net/socket.c:2439 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:297 entry_SYSCALL_64_after_hwframe+0x63/0xe7 The root cause is nlmsg_parse calling __nla_parse which means the header struct size is not checked. nlmsg_parse should be a wrapper around __nlmsg_parse with NL_VALIDATE_STRICT for the validate argument very much like nlmsg_parse_deprecated is for NL_VALIDATE_LIBERAL. Fixes: `3de6440354` ("netlink: re-add parse/validate functions in strict mode") Reported-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 20:37:16 -07:00
Heiner Kallweit	c36757eb9d	net: phy: consider AN_RESTART status when reading link status After configuring and restarting aneg we immediately try to read the link status. On some systems the PHY may not yet have cleared the "aneg complete" and "link up" bits, resulting in a false link-up signal. See [0] for a report. Clause 22 and 45 both require the PHY to keep the AN_RESTART bit set until the PHY actually starts auto-negotiation. Let's consider this in the generic functions for reading link status. The commit marked as fixed is the first one where the patch applies cleanly. [0] https://marc.info/?t=156518400300003&r=1&w=2 Fixes: `c1164bb1a6` ("net: phy: check PMAPMD link status only in genphy_c45_read_link") Tested-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 19:49:01 -07:00
Wenwen Wang	48ec7014c5	net/mlx4_en: fix a memory leak bug In mlx4_en_config_rss_steer(), 'rss_map->indir_qp' is allocated through kzalloc(). After that, mlx4_qp_alloc() is invoked to configure RSS indirection. However, if mlx4_qp_alloc() fails, the allocated 'rss_map->indir_qp' is not deallocated, leading to a memory leak bug. To fix the above issue, add the 'qp_alloc_err' label to free 'rss_map->indir_qp'. Fixes: `4931c6ef04` ("net/mlx4_en: Optimized single ring steering") Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 19:42:43 -07:00
Thomas Falcon	66cf4710b2	ibmveth: Convert multicast list size for little-endian system The ibm,mac-address-filters property defines the maximum number of addresses the hypervisor's multicast filter list can support. It is encoded as a big-endian integer in the OF device tree, but the virtual ethernet driver does not convert it for use by little-endian systems. As a result, the driver is not behaving as it should on affected systems when a large number of multicast addresses are assigned to the device. Reported-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 19:38:31 -07:00
Julian Wiedmann	072f794000	s390/qeth: serialize cmd reply with concurrent timeout Callbacks for a cmd reply run outside the protection of card->lock, to allow for additional cmds to be issued & enqueued in parallel. When qeth_send_control_data() bails out for a cmd without having received a reply (eg. due to timeout), its callback may concurrently be processing a reply that just arrived. In this case, the callback potentially accesses a stale reply->reply_param area that eg. was on-stack and has already been released. To avoid this race, add some locking so that qeth_send_control_data() can (1) wait for a concurrently running callback, and (2) zap any pending callback that still wants to run. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 19:26:47 -07:00
Xin Long	a1794de8b9	sctp: fix the transport error_count check As the annotation says in sctp_do_8_2_transport_strike(): "If the transport error count is greater than the pf_retrans threshold, and less than pathmaxrtx ..." It should be transport->error_count checked with pathmaxrxt, instead of asoc->pf_retrans. Fixes: `5aa93bcf66` ("sctp: Implement quick failover draft from tsvwg") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 19:08:20 -07:00
André Draszik	bb0ce4c151	net: phy: at803x: stop switching phy delay config needlessly This driver does a funny dance disabling and re-enabling RX and/or TX delays. In any of the RGMII-ID modes, it first disables the delays, just to re-enable them again right away. This looks like a needless exercise. Just enable the respective delays when in any of the relevant 'id' modes, and disable them otherwise. Also, remove comments which don't add anything that can't be seen by looking at the code. Signed-off-by: André Draszik <git@andred.net> CC: Andrew Lunn <andrew@lunn.ch> CC: Florian Fainelli <f.fainelli@gmail.com> CC: Heiner Kallweit <hkallweit1@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-12 14:02:29 -07:00
Nathan Chancellor	125b7e0949	net: tc35815: Explicitly check NET_IP_ALIGN is not zero in tc35815_rx clang warns: drivers/net/ethernet/toshiba/tc35815.c:1507:30: warning: use of logical '&&' with constant operand [-Wconstant-logical-operand] if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN) ^ ~~~~~~~~~~~~ drivers/net/ethernet/toshiba/tc35815.c:1507:30: note: use '&' for a bitwise operation if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN) ^~ & drivers/net/ethernet/toshiba/tc35815.c:1507:30: note: remove constant to silence this warning if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN) ~^~~~~~~~~~~~~~~ 1 warning generated. Explicitly check that NET_IP_ALIGN is not zero, which matches how this is checked in other parts of the tree. Because NET_IP_ALIGN is a build time constant, this check will be constant folded away during optimization. Fixes: `82a9928db5` ("tc35815: Enable StripCRC feature") Link: https://github.com/ClangBuiltLinux/linux/issues/608 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:41:48 -07:00
Chris Packham	8874ecae29	tipc: initialise addr_trail_end when setting node addresses We set the field 'addr_trial_end' to 'jiffies', instead of the current value 0, at the moment the node address is initialized. This guarantees we don't inadvertently enter an address trial period when the node address is explicitly set by the user. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:40:04 -07:00
Chen-Yu Tsai	58799865be	net: dsa: Check existence of .port_mdb_add callback before calling it The dsa framework has optional .port_mdb_{prepare,add,del} callback fields for drivers to handle multicast database entries. When adding an entry, the framework goes through a prepare phase, then a commit phase. Drivers not providing these callbacks should be detected in the prepare phase. DSA core may still bypass the bridge layer and call the dsa_port_mdb_add function directly with no prepare phase or no switchdev trans object, and the framework ends up calling an undefined .port_mdb_add callback. This results in a NULL pointer dereference, as shown in the log below. The other functions seem to be properly guarded. Do the same for .port_mdb_add in dsa_switch_mdb_add_bitmap() as well. 8<--- cut here --- Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = (ptrval) [00000000] *pgd=00000000 Internal error: Oops: 80000005 [#1] SMP ARM Modules linked in: rtl8xxxu rtl8192cu rtl_usb rtl8192c_common rtlwifi mac80211 cfg80211 CPU: 1 PID: 134 Comm: kworker/1:2 Not tainted 5.3.0-rc1-00247-gd3519030752a #1 Hardware name: Allwinner sun7i (A20) Family Workqueue: events switchdev_deferred_process_work PC is at 0x0 LR is at dsa_switch_event+0x570/0x620 pc : [<00000000>] lr : [<c08533ec>] psr: 80070013 sp : ee871db8 ip : 00000000 fp : ee98d0a4 r10: 0000000c r9 : 00000008 r8 : ee89f710 r7 : ee98d040 r6 : ee98d088 r5 : c0f04c48 r4 : ee98d04c r3 : 00000000 r2 : ee89f710 r1 : 00000008 r0 : ee98d040 Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: 6deb406a DAC: 00000051 Process kworker/1:2 (pid: 134, stack limit = 0x(ptrval)) Stack: (0xee871db8 to 0xee872000) 1da0: ee871e14 103ace2d 1dc0: 00000000 ffffffff 00000000 ee871e14 00000005 00000000 c08524a0 00000000 1de0: ffffe000 c014bdfc c0f04c48 ee871e98 c0f04c48 ee9e5000 c0851120 c014bef0 1e00: 00000000 b643aea2 ee9b4068 c08509a8 ee2bf940 ee89f710 ee871ecb 00000000 1e20: 00000008 103ace2d 00000000 c087e248 ee29c868 103ace2d 00000001 ffffffff 1e40: 00000000 ee871e98 00000006 00000000 c0fb2a50 c087e2d0 ffffffff c08523c4 1e60: ffffffff c014bdfc 00000006 c0fad2d0 ee871e98 ee89f710 00000000 c014c500 1e80: 00000000 ee89f3c0 c0f04c48 00000000 ee9e5000 c087dfb4 ee9e5000 00000000 1ea0: ee89f710 ee871ecb 00000001 103ace2d 00000000 c0f04c48 00000000 c087e0a8 1ec0: 00000000 efd9a3e0 0089f3c0 103ace2d ee89f700 ee89f710 ee9e5000 00000122 1ee0: 00000100 c087e130 ee89f700 c0fad2c8 c1003ef0 c087de4c 2e928000 c0fad2ec 1f00: c0fad2ec ee839580 ef7a62c0 ef7a9400 00000000 c087def8 c0fad2ec c01447dc 1f20: ef315640 ef7a62c0 00000008 ee839580 ee839594 ef7a62c0 00000008 c0f03d00 1f40: ef7a62d8 ef7a62c0 ffffe000 c0145b84 ffffe000 c0fb2420 c0bfaa8c 00000000 1f60: ffffe000 ee84b600 ee84b5c0 00000000 ee870000 ee839580 c0145b40 ef0e5ea4 1f80: ee84b61c c014a6f8 00000001 ee84b5c0 c014a5b0 00000000 00000000 00000000 1fa0: 00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000 [<c08533ec>] (dsa_switch_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84) [<c014bdfc>] (notifier_call_chain) from [<c014bef0>] (raw_notifier_call_chain+0x18/0x20) [<c014bef0>] (raw_notifier_call_chain) from [<c08509a8>] (dsa_port_mdb_add+0x48/0x74) [<c08509a8>] (dsa_port_mdb_add) from [<c087e248>] (__switchdev_handle_port_obj_add+0x54/0xd4) [<c087e248>] (__switchdev_handle_port_obj_add) from [<c087e2d0>] (switchdev_handle_port_obj_add+0x8/0x14) [<c087e2d0>] (switchdev_handle_port_obj_add) from [<c08523c4>] (dsa_slave_switchdev_blocking_event+0x94/0xa4) [<c08523c4>] (dsa_slave_switchdev_blocking_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84) [<c014bdfc>] (notifier_call_chain) from [<c014c500>] (blocking_notifier_call_chain+0x50/0x68) [<c014c500>] (blocking_notifier_call_chain) from [<c087dfb4>] (switchdev_port_obj_notify+0x44/0xa8) [<c087dfb4>] (switchdev_port_obj_notify) from [<c087e0a8>] (switchdev_port_obj_add_now+0x90/0x104) [<c087e0a8>] (switchdev_port_obj_add_now) from [<c087e130>] (switchdev_port_obj_add_deferred+0x14/0x5c) [<c087e130>] (switchdev_port_obj_add_deferred) from [<c087de4c>] (switchdev_deferred_process+0x64/0x104) [<c087de4c>] (switchdev_deferred_process) from [<c087def8>] (switchdev_deferred_process_work+0xc/0x14) [<c087def8>] (switchdev_deferred_process_work) from [<c01447dc>] (process_one_work+0x218/0x50c) [<c01447dc>] (process_one_work) from [<c0145b84>] (worker_thread+0x44/0x5bc) [<c0145b84>] (worker_thread) from [<c014a6f8>] (kthread+0x148/0x150) [<c014a6f8>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c) Exception stack(0xee871fb0 to 0xee871ff8) 1fa0: 00000000 00000000 00000000 00000000 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 Code: bad PC value ---[ end trace 1292c61abd17b130 ]--- [<c08533ec>] (dsa_switch_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84) corresponds to $ arm-linux-gnueabihf-addr2line -C -i -e vmlinux c08533ec linux/net/dsa/switch.c:156 linux/net/dsa/switch.c:178 linux/net/dsa/switch.c:328 Fixes: `e6db98db8a` ("net: dsa: add switch mdb bitmap functions") Signed-off-by: Chen-Yu Tsai <wens@csie.org> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:36:51 -07:00
Petr Machata	8028ccda39	mlxsw: spectrum_ptp: Keep unmatched entries in a linked list To identify timestamps for matching with their packets, Spectrum-1 uses a five-tuple of (port, direction, domain number, message type, sequence ID). If there are several clients from the same domain behind a single port sending Delay_Req's, the only thing differentiating these packets, as far as Spectrum-1 is concerned, is the sequence ID. Should sequence IDs between individual clients be similar, conflicts may arise. That is not a problem to hardware, which will simply deliver timestamps on a first comes, first served basis. However the driver uses a simple hash table to store the unmatched pieces. When a new conflicting piece arrives, it pushes out the previously stored one, which if it is a packet, is delivered without timestamp. Later on as the corresponding timestamps arrive, the first one is mismatched to the second packet, and the second one is never matched and eventually is GCd. To correct this issue, instead of using a simple rhashtable, use rhltable to keep the unmatched entries. Previously, a found unmatched entry would always be removed from the hash table. That is not the case anymore--an incompatible entry is left in the hash table. Therefore removal from the hash table cannot be used to confirm the validity of the looked-up pointer, instead the lookup would simply need to be redone. Therefore move it inside the critical section. This simplifies a lot of the code. Fixes: `8748642751` ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls") Reported-by: Alex Veber <alexve@mellanox.com> Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:35:39 -07:00
Jonathan Neuschäfer	d81f41411c	net: nps_enet: Fix function names in doc comments Adjust the function names in two doc comments to match the corresponding functions. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:32:33 -07:00
David Howells	68553f1a6f	rxrpc: Fix local refcounting Fix rxrpc_unuse_local() to handle a NULL local pointer as it can be called on an unbound socket on which rx->local is not yet set. The following reproduced (includes omitted): int main(void) { socket(AF_RXRPC, SOCK_DGRAM, AF_INET); return 0; } causes the following oops to occur: BUG: kernel NULL pointer dereference, address: 0000000000000010 ... RIP: 0010:rxrpc_unuse_local+0x8/0x1b ... Call Trace: rxrpc_release+0x2b5/0x338 __sock_release+0x37/0xa1 sock_close+0x14/0x17 __fput+0x115/0x1e9 task_work_run+0x72/0x98 do_exit+0x51b/0xa7a ? __context_tracking_exit+0x4e/0x10e do_group_exit+0xab/0xab __x64_sys_exit_group+0x14/0x17 do_syscall_64+0x89/0x1d4 entry_SYSCALL_64_after_hwframe+0x49/0xbe Reported-by: syzbot+20dee719a2e090427b5f@syzkaller.appspotmail.com Fixes: `730c5fd42c` ("rxrpc: Fix local endpoint refcounting") Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:28:29 -07:00
David Ahern	59c84b9fcf	netdevsim: Restore per-network namespace accounting for fib entries Prior to the commit in the fixes tag, the resource controller in netdevsim tracked fib entries and rules per network namespace. Restore that behavior. Fixes: `5fc494225c` ("netdevsim: create devlink instance per netdevsim instance") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 20:59:19 -07:00
David S. Miller	9481382b36	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2019-08-11 The following pull-request contains BPF updates for your net tree. The main changes are: 1) x64 JIT code generation fix for backward-jumps to 1st insn, from Alexei. 2) Fix buggy multi-closing of BTF file descriptor in libbpf, from Andrii. 3) Fix libbpf_num_possible_cpus() to make it thread safe, from Takshak. 4) Fix bpftool to dump an error if pinning fails, from Jakub. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 14:49:34 -07:00
Jakub Kicinski	57c722e932	net/tls: swap sk_write_space on close Now that we swap the original proto and clear the ULP pointer on close we have to make sure no callback will try to access the freed state. sk_write_space is not part of sk_prot, remember to swap it. Reported-by: syzbot+dcdc9deefaec44785f32@syzkaller.appspotmail.com Fixes: `95fa145479` ("bpf: sockmap/tls, close can race with map free") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-09 19:55:22 -07:00
Dexuan Cui	6d0d779dca	hv_netvsc: Fix a warning of suspicious RCU usage This fixes a warning of "suspicious rcu_dereference_check() usage" when nload runs. Fixes: `776e726bfb` ("netvsc: fix RCU warning in get_stats") Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-09 13:42:19 -07:00
David S. Miller	9566e650bf	mlx5-fixes-2019-08-08 -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl1Mf5AACgkQSD+KveBX +j7ztQgAiaCmQxJ6wV82KFVbmGA04oC3S0vz2lafTB1L3i2TqX6ZiRW07qGIPfml HtylFu1Qta6sXDlJC5iQK48TnQaOr4HJqTxigofrWA3Pt7sYI1FNVG73BR1p8M5D 11ZZM6zCGkCd8cD5hTMzhvEr2d5NSMIy5wzC8DiiEph9JwAEh3ojeteP9KuwYdzX /8W6swmwgMg/VKGlOwPhQcxSot0PcBm/a2sNjfGJ/NqyAb1yMQ5Bttt30A8346YN kIhUBWIvDkbWDxxokRFGbpx5VswqJpMDKUEACWoJT44JyfFBMpMXPNe6wLvd1IIf noxmAqe+CSTbr5UqJZdjqSrId5GhBA== =X/dE -----END PGP SIGNATURE----- Merge tag 'mlx5-fixes-2019-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2019-08-08 This series introduces some fixes to mlx5 driver. Highlights: 1) From Tariq, Critical mlx5 kTLS fixes to better align with hw specs. 2) From Aya, Fixes to mlx5 tx devlink health reporter. 3) From Maxim, aRFs parsing to use flow dissector to avoid relying on invalid skb fields. Please pull and let me know if there is any problem. For -stable v4.3 ('net/mlx5e: Only support tx/rx pause setting for port owner') For -stable v4.9 ('net/mlx5e: Use flow keys dissector to parse packets for ARFS') For -stable v5.1 ('net/mlx5e: Fix false negative indication on tx reporter CQE recovery') ('net/mlx5e: Remove redundant check in CQE recovery flow of tx reporter') ('net/mlx5e: ethtool, Avoid setting speed to 56GBASE when autoneg off') Note: when merged with net-next this minor conflict will pop up: ++<<<<<<< (net-next) + if (is_eswitch_flow) { + flow->esw_attr->match_level = match_level; + flow->esw_attr->tunnel_match_level = tunnel_match_level; ++======= + if (flow->flags & MLX5E_TC_FLOW_ESWITCH) { + flow->esw_attr->inner_match_level = inner_match_level; + flow->esw_attr->outer_match_level = outer_match_level; ++>>>>>>> (net) To resolve, use hunks from net (2nd) and replace: if (flow->flags & MLX5E_TC_FLOW_ESWITCH) with if (is_eswitch_flow) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-09 13:23:33 -07:00
Taehee Yoo	8b6381600d	ixgbe: fix possible deadlock in ixgbe_service_task() ixgbe_service_task() calls unregister_netdev() under rtnl_lock(). But unregister_netdev() internally calls rtnl_lock(). So deadlock would occur. Fixes: `59dd45d550` ("ixgbe: firmware recovery mode") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-09 13:17:00 -07:00
David S. Miller	703acf6259	Merge branch 'Fix-collisions-in-socket-cookie-generation' Daniel Borkmann says: ==================== Fix collisions in socket cookie generation This change makes the socket cookie generator as a global counter instead of per netns in order to fix cookie collisions for BPF use cases we ran into. See main patch #1 for more details. Given the change is small/trivial and fixes an issue we're seeing my preference would be net tree (though it cleanly applies to net-next as well). Went for net tree instead of bpf tree here given the main change is in net/core/sock_diag.c, but either way would be fine with me. v1 -> v2: - Fix up commit description in patch #1, thanks Eric! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-09 13:14:46 -07:00
Daniel Borkmann	609a2ca57a	bpf: sync bpf.h to tools infrastructure Pull in updates in BPF helper function description. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-09 13:14:46 -07:00
Daniel Borkmann	cd48bdda4f	sock: make cookie generation global instead of per netns Generating and retrieving socket cookies are a useful feature that is exposed to BPF for various program types through bpf_get_socket_cookie() helper. The fact that the cookie counter is per netns is quite a limitation for BPF in practice in particular for programs in host namespace that use socket cookies as part of a map lookup key since they will be causing socket cookie collisions e.g. when attached to BPF cgroup hooks or cls_bpf on tc egress in host namespace handling container traffic from veth or ipvlan devices with peer in different netns. Change the counter to be global instead. Socket cookie consumers must assume the value as opqaue in any case. Not every socket must have a cookie generated and knowledge of the counter value itself does not provide much value either way hence conversion to global is fine. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Cc: Martynas Pumputis <m@lambda.lt> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-09 13:14:46 -07:00
David S. Miller	7bac762d8d	RxRPC fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl1NmIAACgkQ+7dXa6fL C2vroRAAioYEcgShBOH6uqjuGiWZjRsGX6Ppw8QKg7LTvvAes5oT74t3jL0a8as9 Zn3Y5bj7jueUH1vakFGFdycWNO8wgxD6+63Ki0M070Yy15lV9uH1VyAI8BJhMxHe M3v6Yjy5OzILNqqs5+9i3om8ocBLkwfiGlocHXxQv15MA6lMgDFl23d1J3L4KK0o sItorzvg90wIWNWR2jctbjqyAdS17LeXdUL41Oc3SXJi9wLHony3LAH3al4Rb9k7 FgWHyD9colDEezl4cWAehjZvm060EfWmuQsDT7+iDhNftokZTeDlRdnizDzAISYQ i3wleKXegxFpXJ6FfaJeL8BDMUjTVG/MWUWSCb4gKYOr1XC6ioVhvo2DH7mgcb+2 E4dP76u0S7hoYI0Vn3X8091Vc4rXv6tCgw71HYA5wWKnZIfxFo+conOFoNujRh3V 3DZXj/hr/3UzIA8tsqWadfGfE9O08bObnhqfqFCzSTy3UhlWGkKCKEIyJljhh2wb ztvDpASl2L62NPtd0irHG2yHxbVFH0lWcF1buL/wbndLo3HaQ7bTxQ/s8RDOpKTH 40Rv9Leih4QuT8WLrEMhRkgxE1heP9dGC34ZD2r1eZX6Lb9A5w21GjYd3/hl0dyG llb75sqyBfAbfYveMij58hLuk4GW7lcaoQ0rnDwvS6yJ1LTHxLI= =u6EZ -----END PGP SIGNATURE----- Merge tag 'rxrpc-fixes-20190809' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== Here's a couple of fixes for rxrpc: (1) Fix refcounting of the local endpoint. (2) Don't calculate or report packet skew information. This has been obsolete since AFS 3.1 and so is a waste of resources. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-09 11:27:17 -07:00
Daniel Borkmann	4f7aafd78a	Merge branch 'bpf-bpftool-pinning-error-msg' Jakub Kicinski says: ==================== First make sure we don't use "prog" in error messages because the pinning operation could be performed on a map. Second add back missing error message if pin syscall failed. ==================== Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-09 17:38:53 +02:00
Jakub Kicinski	3c7be384fe	tools: bpftool: add error message on pin failure No error message is currently printed if the pin syscall itself fails. It got lost in the loadall refactoring. Fixes: `77380998d9` ("bpftool: add loadall command") Reported-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-09 17:38:53 +02:00
Jakub Kicinski	b3e78adcbf	tools: bpftool: fix error message (prog -> object) Change an error message to work for any object being pinned not just programs. Fixes: `71bb428fe2` ("tools: bpf: add bpftool") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-09 17:38:53 +02:00
David Howells	e8c3af6bb3	rxrpc: Don't bother generating maxSkew in the ACK packet Don't bother generating maxSkew in the ACK packet as it has been obsolete since AFS 3.1. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com>	2019-08-09 15:24:00 +01:00
David Howells	730c5fd42c	rxrpc: Fix local endpoint refcounting The object lifetime management on the rxrpc_local struct is broken in that the rxrpc_local_processor() function is expected to clean up and remove an object - but it may get requeued by packets coming in on the backing UDP socket once it starts running. This may result in the assertion in rxrpc_local_rcu() firing because the memory has been scheduled for RCU destruction whilst still queued: rxrpc: Assertion failed ------------[ cut here ]------------ kernel BUG at net/rxrpc/local_object.c:468! Note that if the processor comes around before the RCU free function, it will just do nothing because ->dead is true. Fix this by adding a separate refcount to count active users of the endpoint that causes the endpoint to be destroyed when it reaches 0. The original refcount can then be used to refcount objects through the work processor and cause the memory to be rcu freed when that reaches 0. Fixes: `4f95dd78a7` ("rxrpc: Rework local endpoint management") Reported-by: syzbot+1e0edc4b8b7494c28450@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com>	2019-08-09 15:21:19 +01:00
Fuqian Huang	8c25d0887a	net: tundra: tsi108: use spin_lock_irqsave instead of spin_lock_irq in IRQ context As spin_unlock_irq will enable interrupts. Function tsi108_stat_carry is called from interrupt handler tsi108_irq. Interrupts are enabled in interrupt handler. Use spin_lock_irqsave/spin_unlock_irqrestore instead of spin_(un)lock_irq in IRQ context to avoid this. Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:43:34 -07:00
YueHaibing	227f2f030e	team: Add vlan tx offload to hw_enc_features We should also enable team's vlan tx offload in hw_enc_features, pass the vlan packets to the slave devices with vlan tci, let the slave handle vlan tunneling offload implementation. Fixes: `3268e5cb49` ("team: Advertise tunneling offload features") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:41:41 -07:00
Jakub Kicinski	414776621d	net/tls: prevent skb_orphan() from leaking TLS plain text with offload sk_validate_xmit_skb() and drivers depend on the sk member of struct sk_buff to identify segments requiring encryption. Any operation which removes or does not preserve the original TLS socket such as skb_orphan() or skb_clone() will cause clear text leaks. Make the TCP socket underlying an offloaded TLS connection mark all skbs as decrypted, if TLS TX is in offload mode. Then in sk_validate_xmit_skb() catch skbs which have no socket (or a socket with no validation) and decrypted flag set. Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and sk->sk_validate_xmit_skb are slightly interchangeable right now, they all imply TLS offload. The new checks are guarded by CONFIG_TLS_DEVICE because that's the option guarding the sk_buff->decrypted member. Second, smaller issue with orphaning is that it breaks the guarantee that packets will be delivered to device queues in-order. All TLS offload drivers depend on that scheduling property. This means skb_orphan_partial()'s trick of preserving partial socket references will cause issues in the drivers. We need a full orphan, and as a result netem delay/throttling will cause all TLS offload skbs to be dropped. Reusing the sk_buff->decrypted flag also protects from leaking clear text when incoming, decrypted skb is redirected (e.g. by TC). See commit `0608c69c9a` ("bpf: sk_msg, sock{map\|hash} redirect through ULP") for justification why the internal flag is safe. The only location which could leak the flag in is tcp_bpf_sendmsg(), which is taken care of by clearing the previously unused bit. v2: - remove superfluous decrypted mark copy (Willem); - remove the stale doc entry (Boris); - rely entirely on EOR marking to prevent coalescing (Boris); - use an internal sendpages flag instead of marking the socket (Boris). v3 (Willem): - reorganize the can_skb_orphan_partial() condition; - fix the flag leak-in through tcp_bpf_sendmsg. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:39:35 -07:00
David S. Miller	0de94de180	Merge branch 'skbedit-batch-fixes' Roman Mashak says: ==================== Fix batched event generation for skbedit action When adding or deleting a batch of entries, the kernel sends up to TCA_ACT_MAX_PRIO (defined to 32 in kernel) entries in an event to user space. However it does not consider that the action sizes may vary and require different skb sizes. For example, consider the following script adding 32 entries with all supported skbedit parameters and cookie (in order to maximize netlink messages size): % cat tc-batch.sh TC="sudo /mnt/iproute2.git/tc/tc" $TC actions flush action skbedit for i in `seq 1 $1`; do cmd="action skbedit queue_mapping 2 priority 10 mark 7/0xaabbccdd \ ptype host inheritdsfield \ index $i cookie aabbccddeeff112233445566778800a1 " args=$args$cmd done $TC actions add $args % % ./tc-batch.sh 32 Error: Failed to fill netlink attributes while adding TC action. We have an error talking to the kernel % patch 1 adds callback in tc_action_ops of skbedit action, which calculates the action size, and passes size to tcf_add_notify()/tcf_del_notify(). patch 2 updates the TDC test suite with relevant skbedit test cases. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:37:06 -07:00
Roman Mashak	7bc161846d	tc-testing: updated skbedit action tests with batch create/delete Update TDC tests with cases varifying ability of TC to install or delete batches of skbedit actions. Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:37:06 -07:00
Roman Mashak	e1fea322fc	net sched: update skbedit action for batched events operations Add get_fill_size() routine used to calculate the action size when building a batch of events. Fixes: `ca9b0e27e` ("pkt_action: add new action skbedit") Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:37:06 -07:00
YueHaibing	e3e3af9aa2	net: dsa: sja1105: remove set but not used variables 'tx_vid' and 'rx_vid' Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/dsa/sja1105/sja1105_main.c: In function sja1105_fdb_dump: drivers/net/dsa/sja1105/sja1105_main.c:1226:14: warning: variable tx_vid set but not used [-Wunused-but-set-variable] drivers/net/dsa/sja1105/sja1105_main.c:1226:6: warning: variable rx_vid set but not used [-Wunused-but-set-variable] They are not used since commit `6d7c7d948a` ("net: dsa: sja1105: Fix broken learning with vlan_filtering disabled") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:29:02 -07:00
YueHaibing	d595b03de2	bonding: Add vlan tx offload to hw_enc_features As commit `30d8177e8a` ("bonding: Always enable vlan tx offload") said, we should always enable bonding's vlan tx offload, pass the vlan packets to the slave devices with vlan tci, let them to handle vlan implementation. Now if encapsulation protocols like VXLAN is used, skb->encapsulation may be set, then the packet is passed to vlan device which based on bonding device. However in netif_skb_features(), the check of hw_enc_features: if (skb->encapsulation) features &= dev->hw_enc_features; clears NETIF_F_HW_VLAN_CTAG_TX/NETIF_F_HW_VLAN_STAG_TX. This results in same issue in commit `30d8177e8a` like this: vlan_dev_hard_start_xmit -->dev_queue_xmit -->validate_xmit_skb -->netif_skb_features //NETIF_F_HW_VLAN_CTAG_TX is cleared -->validate_xmit_vlan -->__vlan_hwaccel_push_inside //skb->tci is cleared ... --> bond_start_xmit --> bond_xmit_hash //BOND_XMIT_POLICY_ENCAP34 --> __skb_flow_dissect // nhoff point to IP header --> case htons(ETH_P_8021Q) // skb_vlan_tag_present is false, so vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan), //vlan point to ip header wrongly Fixes: `b2a103e6d0` ("bonding: convert to ndo_fix_features") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:01:44 -07:00
Ivan Khoronzhuk	51650d33b2	net: sched: sch_taprio: fix memleak in error path for sched list parse In error case, all entries should be freed from the sched list before deleting it. For simplicity use rcu way. Fixes: `5a781ccbd1` ("tc: Add support for configuring the taprio scheduler") Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 22:00:24 -07:00
Stephen Hemminger	fe90689fed	net: docs: replace IPX in tuntap documentation IPX is no longer supported, but the example in the documentation might useful. Replace it with IPv6. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 18:06:53 -07:00
Stephen Hemminger	7e7c076e12	docs: admin-guide: remove references to IPX and token-ring Both IPX and TR have not been supported for a while now. Remove them from the /proc/sys/net documentation. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 18:06:53 -07:00
Ross Lagerwall	3a0233ddec	xen/netback: Reset nr_frags before freeing skb At this point nr_frags has been incremented but the frag does not yet have a page assigned so freeing the skb results in a crash. Reset nr_frags before freeing the skb to prevent this. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 18:01:24 -07:00
Guillaume Nault	891584f48a	inet: frags: re-introduce skb coalescing for local delivery Before commit `d4289fcc9b` ("net: IP6 defrag: use rbtrees for IPv6 defrag"), a netperf UDP_STREAM test[0] using big IPv6 datagrams (thus generating many fragments) and running over an IPsec tunnel, reported more than 6Gbps throughput. After that patch, the same test gets only 9Mbps when receiving on a be2net nic (driver can make a big difference here, for example, ixgbe doesn't seem to be affected). By reusing the IPv4 defragmentation code, IPv6 lost fragment coalescing (IPv4 fragment coalescing was dropped by commit `14fe22e334` ("Revert "ipv4: use skb coalescing in defragmentation"")). Without fragment coalescing, be2net runs out of Rx ring entries and starts to drop frames (ethtool reports rx_drops_no_frags errors). Since the netperf traffic is only composed of UDP fragments, any lost packet prevents reassembly of the full datagram. Therefore, fragments which have no possibility to ever get reassembled pile up in the reassembly queue, until the memory accounting exeeds the threshold. At that point no fragment is accepted anymore, which effectively discards all netperf traffic. When reassembly timeout expires, some stale fragments are removed from the reassembly queue, so a few packets can be received, reassembled and delivered to the netperf receiver. But the nic still drops frames and soon the reassembly queue gets filled again with stale fragments. These long time frames where no datagram can be received explain why the performance drop is so significant. Re-introducing fragment coalescing is enough to get the initial performances again (6.6Gbps with be2net): driver doesn't drop frames anymore (no more rx_drops_no_frags errors) and the reassembly engine works at full speed. This patch is quite conservative and only coalesces skbs for local IPv4 and IPv6 delivery (in order to avoid changing skb geometry when forwarding). Coalescing could be extended in the future if need be, as more scenarios would probably benefit from it. [0]: Test configuration Sender: ip xfrm policy flush ip xfrm state flush ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1 ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir in tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1 ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir out tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow netserver -D -L fc00:2::1 Receiver: ip xfrm policy flush ip xfrm state flush ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1 ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir in tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1 ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir out tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow netperf -H fc00:2::1 -f k -P 0 -L fc00:1::1 -l 60 -t UDP_STREAM -I 99,5 -i 5,5 -T5,5 -6 Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-08 15:55:10 -07:00
Aya Levin	a4e508cab6	net/mlx5e: Remove redundant check in CQE recovery flow of tx reporter Remove check of recovery bit, in the beginning of the CQE recovery function. This test is already performed right before the reporter is invoked, when CQE error is detected. Fixes: `de8650a820` ("net/mlx5e: Add tx reporter support") Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-08-08 13:01:20 -07:00

1 2 3 4 5 ...

856366 Commits All Branches Search

856366 Commits

All Branches