original_kernel

Commit Graph

Author	SHA1	Message	Date
Florian Westphal	06cad3acf0	netfilter: core: export raw versions of add/delete hook functions This will allow the nat core to reuse the nf_hook infrastructure to maintain nat lookup functions. The raw versions don't assume a particular hook location, the functions get added/deleted from the hook blob that is passed to the functions. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2018-05-23 09:14:05 +02:00
Florian Westphal	4e25ceb80b	netfilter: nf_tables: allow chain type to override hook register Will be used in followup patch when nat types no longer use nf_register_net_hook() but will instead register with the nat core. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2018-05-23 09:14:05 +02:00
Florian Westphal	ba7d284a98	netfilter: xtables: allow table definitions not backed by hook_ops The ip(6)tables nat table is currently receiving skbs from the netfilter core, after a followup patch skbs will be coming from the netfilter nat core instead, so the table is no longer backed by normal hook_ops. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2018-05-23 09:14:05 +02:00
Florian Westphal	1f55236bd8	netfilter: nf_nat: move common nat code to nat core Copy-pasted, both l3 helpers almost use same code here. Split out the common part into an 'inet' helper. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2018-05-23 09:14:05 +02:00
Brian Norris	11799564fc	mfd: cros_ec: Retry commands when EC is known to be busy Commit `001dde9400` ("mfd: cros ec: spi: Fix "in progress" error signaling") pointed out some bad code, but its analysis and conclusion was not 100% correct. It is correct that we should not propagate result==EC_RES_IN_PROGRESS for transport errors, because this has a special meaning -- that we should follow up with EC_CMD_GET_COMMS_STATUS until the EC is no longer busy. This is definitely the wrong thing for many commands, because among other problems, EC_CMD_GET_COMMS_STATUS doesn't actually retrieve any RX data from the EC, so commands that expected some data back will instead start processing junk. For such commands, the right answer is to either propagate the error (and return that error to the caller) or resend the original command (not EC_CMD_GET_COMMS_STATUS). Unfortunately, commit `001dde9400` forgets a crucial point: that for some long-running operations, the EC physically cannot respond to commands any more. For example, with EC_CMD_FLASH_ERASE, the EC may be re-flashing its own code regions, so it can't respond to SPI interrupts. Instead, the EC prepares us ahead of time for being busy for a "long" time, and fills its hardware buffer with EC_SPI_PAST_END. Thus, we expect to see several "transport" errors (or, messages filled with EC_SPI_PAST_END). So we should really translate that to a retryable error (-EAGAIN) and continue sending EC_CMD_GET_COMMS_STATUS until we get a ready status. IOW, it is actually important to treat some of these "junk" values as retryable errors. Together with commit `001dde9400`, this resolves bugs like the following: 1. EC_CMD_FLASH_ERASE now works again (with commit `001dde9400`, we would abort the first time we saw EC_SPI_PAST_END) 2. Before commit `001dde9400`, transport errors (e.g., EC_SPI_RX_BAD_DATA) seen in other commands (e.g., EC_CMD_RTC_GET_VALUE) used to yield junk data in the RX buffer; they will now yield -EAGAIN return values, and tools like 'hwclock' will simply fail instead of retrieving and re-programming undefined time values Fixes: `001dde9400` ("mfd: cros ec: spi: Fix "in progress" error signaling") Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>	2018-05-23 06:59:00 +01:00
David S. Miller	1fe8c06c4a	Merge branch 'qed-firmware-TLV' Sudarsana Reddy Kalluru says: ==================== qed*: Add support for management firmware TLV request. Management firmware (MFW) requires config and state information from the driver. It queries this via TLV (type-length-value) request wherein mfw specificies the list of required TLVs. Driver fills the TLV data and responds back to MFW. This patch series adds qed/qede/qedf/qedi driver implementation for supporting the TLV queries from MFW. Changes from previous versions: ------------------------------- v2: Split patch (2) into multiple simpler patches. v2: Update qed_tlv_parsed_buf->p_val datatype to void pointer to avoid bunch of unnecessary typecasts. Please consider applying this series to "net-next". ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 23:29:55 -04:00
Manish Rangankar	269afb3603	qedi: Add get_generic_tlv_data handler. Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 23:29:54 -04:00
Manish Rangankar	534bbdf883	qedi: Add support for populating ethernet TLVs. This patch adds callbacks for providing the ethernet protocol driver TLVs. Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 23:29:54 -04:00
Chad Dupuis	8673daf4f5	qedf: Add get_generic_tlv_data handler. Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 23:29:54 -04:00
Chad Dupuis	642a0b37e6	qedf: Add support for populating ethernet TLVs. This patch adds callbacks for providing the ethernet protocol driver TLVs. Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 23:29:54 -04:00
Sudarsana Reddy Kalluru	d25b859ccd	qede: Add support for populating ethernet TLVs. This patch adds callbacks for providing the ethernet protocol driver TLVs. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 23:29:54 -04:00
Sudarsana Reddy Kalluru	59ccf86fe6	qed: Add driver infrastucture for handling mfw requests. MFW requests the TLVs in interrupt context. Extracting of the required data from upper layers and populating of the TLVs require process context. The patch adds work-queues for processing the tlv requests. It also adds the implementation for requesting the tlv values from appropriate protocol driver. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 23:29:54 -04:00
Sudarsana Reddy Kalluru	77a509e4f6	qed: Add support for processing iscsi tlv request. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 23:29:54 -04:00
Sudarsana Reddy Kalluru	f240b68822	qed: Add support for processing fcoe tlv request. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 23:29:53 -04:00
Sudarsana Reddy Kalluru	2528c38993	qed: Add support for tlv request processing. The patch adds driver support for processing TLV requests/repsonses from the mfw and upper driver layers respectively. The implementation reads the requested TLVs from the shared memory, requests the values from upper layer drivers, populates this info (TLVs) shared memory and notifies MFW about the TLV values. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 23:29:53 -04:00
Sudarsana Reddy Kalluru	dd006921d6	qed: Add MFW interfaces for TLV request support. The patch adds required management firmware (MFW) interfaces such as mailbox commands, TLV types etc. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 23:29:53 -04:00
Sinan Kaya	92d7223a74	alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2 memory-barriers.txt has been updated with the following requirement. "When using writel(), a prior wmb() is not needed to guarantee that the cache coherent memory writes have completed before writing to the MMIO region." Current writeX() and iowriteX() implementations on alpha are not satisfying this requirement as the barrier is after the register write. Move mb() in writeX() and iowriteX() functions to guarantee that HW observes memory changes before performing register operations. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Matt Turner <mattst88@gmail.com>	2018-05-22 18:10:36 -07:00
Christoph Hellwig	f5e82fa260	alpha: simplify get_arch_dma_ops Remove the dma_ops indirection. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Matt Turner <mattst88@gmail.com>	2018-05-22 18:10:36 -07:00
Christoph Hellwig	6db615431a	alpha: use dma_direct_ops for jensen The generic dma_direct implementation does the same thing as the alpha pci-noop implementation, just with more bells and whistles. And unlike the current code it at least has a theoretical chance to actually compile. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Matt Turner <mattst88@gmail.com>	2018-05-22 18:10:36 -07:00
David S. Miller	9c803cfd5f	Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2018-05-22 This series contains updates to i40e only. Jake provides all the changes in this series starting with making it consistent in how we approach the bit lock. Fixed the reporting of the VEB statistics and the queue statistics to always return every queue even if it is not currently in use. Use WARN_ONCE() so that the first time we end up with an incorrect size we will dump a stack trace and a message to help highlight the issue early in testing. Folded the fixed string prefix into the stat string definition. Instead of using a separate char *p pointer when copying strings, use the data pointer directly. Added code comments for several of the statistic functions to better explain the number and ordering of statistics. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 15:45:06 -04:00
David S. Miller	119768c903	Merge branch 'tcp-ECN-quickack' Eric Dumazet says: ==================== tcp: reduce quickack pressure for ECN Small patch series changing TCP behavior vs quickack and ECN First patch is a refactoring, adding parameter to tcp_incr_quickack() and tcp_enter_quickack_mode() helpers. Second patch implements the change, lowering number of ACK packets sent after an ECN event. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 15:43:16 -04:00
Eric Dumazet	522040ea5f	tcp: do not aggressively quick ack after ECN events ECN signals currently forces TCP to enter quickack mode for up to 16 (TCP_MAX_QUICKACKS) following incoming packets. We believe this is not needed, and only sending one immediate ack for the current packet should be enough. This should reduce the extra load noticed in DCTCP environments, after congestion events. This is part 2 of our effort to reduce pure ACK packets. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 15:43:15 -04:00
Eric Dumazet	9a9c9b51e5	tcp: add max_quickacks param to tcp_incr_quickack and tcp_enter_quickack_mode We want to add finer control of the number of ACK packets sent after ECN events. This patch is not changing current behavior, it only enables following change. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 15:43:15 -04:00
Bo Chen	d7db318651	pcnet32: add an error handling path in pcnet32_probe_pci() Make sure to invoke pci_disable_device() when errors occur in pcnet32_probe_pci(). Signed-off-by: Bo Chen <chenbo@pdx.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 15:40:15 -04:00
Vlad Buslov	290aa0ad74	net: sched: don't disable bh when accessing action idr Initial net_device implementation used ingress_lock spinlock to synchronize ingress path of device. This lock was used in both process and bh context. In some code paths action map lock was obtained while holding ingress_lock. Commit `e1e992e52f` ("[NET_SCHED] protect action config/dump from irqs") modified actions to always disable bh, while using action map lock, in order to prevent deadlock on ingress_lock in softirq. This lock was removed from net_device, so disabling bh, while accessing action map, is no longer necessary. Replace all action idr spinlock usage with regular calls that do not disable bh. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 15:34:34 -04:00
Shahed Shaikh	fdd13dd350	qed: Fix mask for physical address in ILT entry ILT entry requires 12 bit right shifted physical address. Existing mask for ILT entry of physical address i.e. ILT_ENTRY_PHY_ADDR_MASK is not sufficient to handle 64bit address because upper 8 bits of 64 bit address were getting masked which resulted in completer abort error on PCIe bus due to invalid address. Fix that mask to handle 64bit physical address. Fixes: `fe56b9e6a8` ("qed: Add module with basic common support") Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 15:32:55 -04:00
Eric Dumazet	66fb33254f	ipmr: properly check rhltable_init() return value commit `8fb472c09b` ("ipmr: improve hash scalability") added a call to rhltable_init() without checking its return value. This problem was then later copied to IPv6 and factorized in commit `0bbbf0e7d0` ("ipmr, ip6mr: Unite creation of new mr_table") kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 31552 Comm: syz-executor7 Not tainted 4.17.0-rc5+ #60 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:rht_key_hashfn include/linux/rhashtable.h:277 [inline] RIP: 0010:__rhashtable_lookup include/linux/rhashtable.h:630 [inline] RIP: 0010:rhltable_lookup include/linux/rhashtable.h:716 [inline] RIP: 0010:mr_mfc_find_parent+0x2ad/0xbb0 net/ipv4/ipmr_base.c:63 RSP: 0018:ffff8801826aef70 EFLAGS: 00010203 RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffc90001ea0000 RDX: 0000000000000079 RSI: ffffffff8661e859 RDI: 000000000000000c RBP: ffff8801826af1c0 R08: ffff8801b2212000 R09: ffffed003b5e46c2 R10: ffffed003b5e46c2 R11: ffff8801daf23613 R12: dffffc0000000000 R13: ffff8801826af198 R14: ffff8801cf8225c0 R15: ffff8801826af658 FS: 00007ff7fa732700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000003ffffff9c CR3: 00000001b0210000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ip6mr_cache_find_parent net/ipv6/ip6mr.c:981 [inline] ip6mr_mfc_delete+0x1fe/0x6b0 net/ipv6/ip6mr.c:1221 ip6_mroute_setsockopt+0x15c6/0x1d70 net/ipv6/ip6mr.c:1698 do_ipv6_setsockopt.isra.9+0x422/0x4660 net/ipv6/ipv6_sockglue.c:163 ipv6_setsockopt+0xbd/0x170 net/ipv6/ipv6_sockglue.c:922 rawv6_setsockopt+0x59/0x140 net/ipv6/raw.c:1060 sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3039 __sys_setsockopt+0x1bd/0x390 net/socket.c:1903 __do_sys_setsockopt net/socket.c:1914 [inline] __se_sys_setsockopt net/socket.c:1911 [inline] __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: `8fb472c09b` ("ipmr: improve hash scalability") Fixes: `0bbbf0e7d0` ("ipmr, ip6mr: Unite creation of new mr_table") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Cc: Yuval Mintz <yuvalm@mellanox.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 15:31:15 -04:00
David S. Miller	73bf1fc58d	Merge branch 'net-ipv6-Fix-route-append-and-replace-use-cases' David Ahern says: ==================== net/ipv6: Fix route append and replace use cases This patch set fixes a few append and replace uses cases for IPv6 and adds test cases that codifies the expectations of how append and replace are expected to work. In paricular it allows a multipath route to have a dev-only nexthop, something Thomas tried to accomplish with commit `edd7ceb782` ("ipv6: Allow non-gateway ECMP for IPv6") which had to be reverted because of breakage, and to replace an existing FIB entry with a reject route. There are a number of inconsistent and surprising aspects to the Linux API for adding, deleting, replacing and changing FIB entries. For example, with IPv4 NLM_F_APPEND means insert the route after any existing entries with the same key (prefix + priority + TOS for IPv4) and NLM_F_CREATE without the append flag inserts the new route before any existing entries. IPv6 on the other hand attempts to guess whether a new route should be appended to an existing one, possibly creating a multipath route, or to add a new entry after any existing ones. This applies to both the 'append' (NLM_F_CREATE + NLM_F_APPEND) and 'prepend' (NLM_F_CREATE only) cases meaning for IPv6 the NLM_F_APPEND is basically ignored. This guessing whether the route should be added to a multipath route (gateway routes) or inserted after existing entries (non-gateway based routes) means a multipath route can not have a dev only nexthop (potentially required in some cases - tunnels or VRF route leaking for example) and route 'replace' is a bit adhoc treating gateway based routes and dev-only / reject routes differently. This has led to frustration with developers working on routing suites such as FRR where workarounds such as delete and add are used instead of replace. After this patch set there are 2 differences between IPv4 and IPv6: 1. 'ip ro prepend' = NLM_F_CREATE only IPv4 adds the new route before any existing ones IPv6 adds new route after any existing ones 2. 'ip ro append' = NLM_F_CREATE\|NLM_F_APPEND IPv4 adds the new route after any existing ones IPv6 adds the nexthop to existing routes converting to multipath For the former, there are cases where we want same prefix routes added after existing ones (e.g., multicast, prefix routes for macvlan when used for virtual router redundancy). Requiring the APPEND flag to add a new route to an existing one helps here but is a slight change in behavior since prepend with gateway routes now create a separate entry. For the latter IPv6 behavior is preferred - appending a route for the same prefix and metric to make a multipath route, so really IPv4 not allowing an existing route to be updated is the limiter. This will be fixed when nexthops become separate objects - a future patch set. Thank you to Thomas and Ido for testing earlier versions of this set, and to Ido for providing an update to the mlxsw driver. Changes since RFC - cleanup wording in test script; add comments about expected failures and why ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 14:44:20 -04:00
David Ahern	abb1860aac	selftests: fib_tests: Add ipv4 route add append replace tests Add IPv4 route tests covering add, append and replace permutations. Assumes the ability to add a basic single path route works; this is required for example when adding an address to an interface. $ fib_tests.sh -t ipv4_rt IPv4 route add / append tests TEST: Attempt to add duplicate route - gw [ OK ] TEST: Attempt to add duplicate route - dev only [ OK ] TEST: Attempt to add duplicate route - reject route [ OK ] TEST: Add new nexthop for existing prefix [ OK ] TEST: Append nexthop to existing route - gw [ OK ] TEST: Append nexthop to existing route - dev only [ OK ] TEST: Append nexthop to existing route - reject route [ OK ] TEST: Append nexthop to existing reject route - gw [ OK ] TEST: Append nexthop to existing reject route - dev only [ OK ] TEST: add multipath route [ OK ] TEST: Attempt to add duplicate multipath route [ OK ] TEST: Route add with different metrics [ OK ] TEST: Route delete with metric [ OK ] IPv4 route replace tests TEST: Single path with single path [ OK ] TEST: Single path with multipath [ OK ] TEST: Single path with reject route [ OK ] TEST: Single path with single path via multipath attribute [ OK ] TEST: Invalid nexthop [ OK ] TEST: Single path - replace of non-existent route [ OK ] TEST: Multipath with multipath [ OK ] TEST: Multipath with single path [ OK ] TEST: Multipath with single path via multipath attribute [ OK ] TEST: Multipath with reject route [ OK ] TEST: Multipath - invalid first nexthop [ OK ] TEST: Multipath - invalid second nexthop [ OK ] TEST: Multipath - replace of non-existent route [ OK ] Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 14:44:19 -04:00
David Ahern	f9a5a9d89f	selftests: fib_tests: Add ipv6 route add append replace tests Add IPv6 route tests covering add, append and replace permutations. Assumes the ability to add a basic single path route works; this is required for example when adding an address to an interface. $ fib_tests.sh -t ipv6_rt IPv6 route add / append tests TEST: Attempt to add duplicate route - gw [ OK ] TEST: Attempt to add duplicate route - dev only [ OK ] TEST: Attempt to add duplicate route - reject route [ OK ] TEST: Add new route for existing prefix (w/o NLM_F_EXCL) [ OK ] TEST: Append nexthop to existing route - gw [ OK ] TEST: Append nexthop to existing route - dev only [ OK ] TEST: Append nexthop to existing route - reject route [ OK ] TEST: Append nexthop to existing reject route - gw [ OK ] TEST: Append nexthop to existing reject route - dev only [ OK ] TEST: Add multipath route [ OK ] TEST: Attempt to add duplicate multipath route [ OK ] TEST: Route add with different metrics [ OK ] TEST: Route delete with metric [ OK ] IPv6 route replace tests TEST: Single path with single path [ OK ] TEST: Single path with multipath [ OK ] TEST: Single path with reject route [ OK ] TEST: Single path with single path via multipath attribute [ OK ] TEST: Invalid nexthop [ OK ] TEST: Single path - replace of non-existent route [ OK ] TEST: Multipath with multipath [ OK ] TEST: Multipath with single path [ OK ] TEST: Multipath with single path via multipath attribute [ OK ] TEST: Multipath with reject route [ OK ] TEST: Multipath - invalid first nexthop [ OK ] TEST: Multipath - invalid second nexthop [ OK ] TEST: Multipath - replace of non-existent route [ OK ] Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 14:44:19 -04:00
David Ahern	7df15e6c3e	selftests: fib_tests: Add option to pause after each test Add option to pause after each test before cleanup is done. Allows user to do manual inspection or more ad-hoc testing after each test with the setup in tact. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 14:44:19 -04:00
David Ahern	1c7447b4e8	selftests: fib_tests: Add command line options Add command line options for controlling pause on fail, controlling specific tests to run and verbose mode rather than relying on environment variables. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 14:44:19 -04:00
David Ahern	37ce42c14e	selftests: fib_tests: Add success-fail counts As more tests are added, it is convenient to have a tally at the end. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 14:44:18 -04:00
David Ahern	f34436a430	net/ipv6: Simplify route replace and appending into multipath route Bring consistency to ipv6 route replace and append semantics. Remove rt6_qualify_for_ecmp which is just guess work. It fails in 2 cases: 1. can not replace a route with a reject route. Existing code appends a new route instead of replacing the existing one. 2. can not have a multipath route where a leg uses a dev only nexthop Existing use cases affected by this change: 1. adding a route with existing prefix and metric using NLM_F_CREATE without NLM_F_APPEND or NLM_F_EXCL (ie., what iproute2 calls 'prepend'). Existing code auto-determines that the new nexthop can be appended to an existing route to create a multipath route. This change breaks that by requiring the APPEND flag for the new route to be added to an existing one. Instead the prepend just adds another route entry. 2. route replace. Existing code replaces first matching multipath route if new route is multipath capable and fallback to first matching non-ECMP route (reject or dev only route) in case one isn't available. New behavior replaces first matching route. (Thanks to Ido for spotting this one) Note: Newer iproute2 is needed to display multipath routes with a dev-only nexthop. This is due to a bug in iproute2 and parsing nexthops. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 14:44:18 -04:00
David Ahern	5a15a1b07c	mlxsw: spectrum_router: Add support for route append Handle append for gateway based routes. Dev-only multipath routes will be handled by a follow on patch. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 14:44:18 -04:00
Alexey Kodanev	2677d20677	dccp: don't free ccid2_hc_tx_sock struct in dccp_disconnect() Syzbot reported the use-after-free in timer_is_static_object() [1]. This can happen because the structure for the rto timer (ccid2_hc_tx_sock) is removed in dccp_disconnect(), and ccid2_hc_tx_rto_expire() can be called after that. The report [1] is similar to the one in commit `120e9dabaf` ("dccp: defer ccid_hc_tx_delete() at dismantle time"). And the fix is the same, delay freeing ccid2_hc_tx_sock structure, so that it is freed in dccp_sk_destruct(). [1] ================================================================== BUG: KASAN: use-after-free in timer_is_static_object+0x80/0x90 kernel/time/timer.c:607 Read of size 8 at addr ffff8801bebb5118 by task syz-executor2/25299 CPU: 1 PID: 25299 Comm: syz-executor2 Not tainted 4.17.0-rc5+ #54 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 timer_is_static_object+0x80/0x90 kernel/time/timer.c:607 debug_object_activate+0x2d9/0x670 lib/debugobjects.c:508 debug_timer_activate kernel/time/timer.c:709 [inline] debug_activate kernel/time/timer.c:764 [inline] __mod_timer kernel/time/timer.c:1041 [inline] mod_timer+0x4d3/0x13b0 kernel/time/timer.c:1102 sk_reset_timer+0x22/0x60 net/core/sock.c:2742 ccid2_hc_tx_rto_expire+0x587/0x680 net/dccp/ccids/ccid2.c:147 call_timer_fn+0x230/0x940 kernel/time/timer.c:1326 expire_timers kernel/time/timer.c:1363 [inline] __run_timers+0x79e/0xc50 kernel/time/timer.c:1666 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692 __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285 invoke_softirq kernel/softirq.c:365 [inline] irq_exit+0x1d1/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:525 [inline] smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863 </IRQ> ... Allocated by task 25374: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554 ccid_new+0x25b/0x3e0 net/dccp/ccid.c:151 dccp_hdlr_ccid+0x27/0x150 net/dccp/feat.c:44 __dccp_feat_activate+0x184/0x270 net/dccp/feat.c:344 dccp_feat_activate_values+0x3a7/0x819 net/dccp/feat.c:1538 dccp_create_openreq_child+0x472/0x610 net/dccp/minisocks.c:128 dccp_v4_request_recv_sock+0x12c/0xca0 net/dccp/ipv4.c:408 dccp_v6_request_recv_sock+0x125d/0x1f10 net/dccp/ipv6.c:415 dccp_check_req+0x455/0x6a0 net/dccp/minisocks.c:197 dccp_v4_rcv+0x7b8/0x1f3f net/dccp/ipv4.c:841 ip_local_deliver_finish+0x2e3/0xd80 net/ipv4/ip_input.c:215 NF_HOOK include/linux/netfilter.h:288 [inline] ip_local_deliver+0x1e1/0x720 net/ipv4/ip_input.c:256 dst_input include/net/dst.h:450 [inline] ip_rcv_finish+0x81b/0x2200 net/ipv4/ip_input.c:396 NF_HOOK include/linux/netfilter.h:288 [inline] ip_rcv+0xb70/0x143d net/ipv4/ip_input.c:492 __netif_receive_skb_core+0x26f5/0x3630 net/core/dev.c:4592 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4657 process_backlog+0x219/0x760 net/core/dev.c:5337 napi_poll net/core/dev.c:5735 [inline] net_rx_action+0x7b7/0x1930 net/core/dev.c:5801 __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285 Freed by task 25374: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kmem_cache_free+0x86/0x2d0 mm/slab.c:3756 ccid_hc_tx_delete+0xc3/0x100 net/dccp/ccid.c:190 dccp_disconnect+0x130/0xc66 net/dccp/proto.c:286 dccp_close+0x3bc/0xe60 net/dccp/proto.c:1045 inet_release+0x104/0x1f0 net/ipv4/af_inet.c:427 inet6_release+0x50/0x70 net/ipv6/af_inet6.c:460 sock_release+0x96/0x1b0 net/socket.c:594 sock_close+0x16/0x20 net/socket.c:1149 __fput+0x34d/0x890 fs/file_table.c:209 ____fput+0x15/0x20 fs/file_table.c:243 task_work_run+0x1e4/0x290 kernel/task_work.c:113 tracehook_notify_resume include/linux/tracehook.h:191 [inline] exit_to_usermode_loop+0x2bd/0x310 arch/x86/entry/common.c:166 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline] syscall_return_slowpath arch/x86/entry/common.c:265 [inline] do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8801bebb4cc0 which belongs to the cache ccid2_hc_tx_sock of size 1240 The buggy address is located 1112 bytes inside of 1240-byte region [ffff8801bebb4cc0, ffff8801bebb5198) The buggy address belongs to the page: page:ffffea0006faed00 count:1 mapcount:0 mapping:ffff8801bebb41c0 index:0xffff8801bebb5240 compound_mapcount: 0 flags: 0x2fffc0000008100(slab\|head) raw: 02fffc0000008100 ffff8801bebb41c0 ffff8801bebb5240 0000000100000003 raw: ffff8801cdba3138 ffffea0007634120 ffff8801cdbaab40 0000000000000000 page dumped because: kasan: bad access detected ... ================================================================== Reported-by: syzbot+5d47e9ec91a6f15dbd6f@syzkaller.appspotmail.com Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 13:55:20 -04:00
Wenwen Wang	6009d1fe6b	isdn: eicon: fix a missing-check bug In divasmain.c, the function divas_write() firstly invokes the function diva_xdi_open_adapter() to open the adapter that matches with the adapter number provided by the user, and then invokes the function diva_xdi_write() to perform the write operation using the matched adapter. The two functions diva_xdi_open_adapter() and diva_xdi_write() are located in diva.c. In diva_xdi_open_adapter(), the user command is copied to the object 'msg' from the userspace pointer 'src' through the function pointer 'cp_fn', which eventually calls copy_from_user() to do the copy. Then, the adapter number 'msg.adapter' is used to find out a matched adapter from the 'adapter_queue'. A matched adapter will be returned if it is found. Otherwise, NULL is returned to indicate the failure of the verification on the adapter number. As mentioned above, if a matched adapter is returned, the function diva_xdi_write() is invoked to perform the write operation. In this function, the user command is copied once again from the userspace pointer 'src', which is the same as the 'src' pointer in diva_xdi_open_adapter() as both of them are from the 'buf' pointer in divas_write(). Similarly, the copy is achieved through the function pointer 'cp_fn', which finally calls copy_from_user(). After the successful copy, the corresponding command processing handler of the matched adapter is invoked to perform the write operation. It is obvious that there are two copies here from userspace, one is in diva_xdi_open_adapter(), and one is in diva_xdi_write(). Plus, both of these two copies share the same source userspace pointer, i.e., the 'buf' pointer in divas_write(). Given that a malicious userspace process can race to change the content pointed by the 'buf' pointer, this can pose potential security issues. For example, in the first copy, the user provides a valid adapter number to pass the verification process and a valid adapter can be found. Then the user can modify the adapter number to an invalid number. This way, the user can bypass the verification process of the adapter number and inject inconsistent data. This patch reuses the data copied in diva_xdi_open_adapter() and passes it to diva_xdi_write(). This way, the above issues can be avoided. Signed-off-by: Wenwen Wang <wang6495@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 13:48:34 -04:00
Fabio Estevam	1f508124e9	net: fec: Add a SPDX identifier Currently there is no license information in the header of this file. The MODULE_LICENSE field contains ("GPL"), which means GNU Public License v2 or later, so add a corresponding SPDX license identifier. Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Acked-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 13:42:05 -04:00
Fabio Estevam	9fcca5effc	net: fec: ptp: Switch to SPDX identifier Adopt the SPDX license identifier headers to ease license compliance management. Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Acked-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 13:42:04 -04:00
Xin Long	644fbdeacf	sctp: fix the issue that flags are ignored when using kernel_connect Now sctp uses inet_dgram_connect as its proto_ops .connect, and the flags param can't be passed into its proto .connect where this flags is really needed. sctp works around it by getting flags from socket file in __sctp_connect. It works for connecting from userspace, as inherently the user sock has socket file and it passes f_flags as the flags param into the proto_ops .connect. However, the sock created by sock_create_kern doesn't have a socket file, and it passes the flags (like O_NONBLOCK) by using the flags param in kernel_connect, which calls proto_ops .connect later. So to fix it, this patch defines a new proto_ops .connect for sctp, sctp_inet_connect, which calls __sctp_connect() directly with this flags param. After this, the sctp's proto .connect can be removed. Note that sctp_inet_connect doesn't need to do some checks that are not needed for sctp, which makes thing better than with inet_dgram_connect. Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-22 13:37:26 -04:00
Peter Maydell	cc19846079	arm64: fault: Don't leak data in ESR context for user fault on kernel VA If userspace faults on a kernel address, handing them the raw ESR value on the sigframe as part of the delivered signal can leak data useful to attackers who are using information about the underlying hardware fault type (e.g. translation vs permission) as a mechanism to defeat KASLR. However there are also legitimate uses for the information provided in the ESR -- notably the GCC and LLVM sanitizers use this to report whether wild pointer accesses by the application are reads or writes (since a wild write is a more serious bug than a wild read), so we don't want to drop the ESR information entirely. For faulting addresses in the kernel, sanitize the ESR. We choose to present userspace with the illusion that there is nothing mapped in the kernel's part of the address space at all, by reporting all faults as level 0 translation faults taken to EL1. These fields are safe to pass through to userspace as they depend only on the instruction that userspace used to provoke the fault: EC IL (always) ISV CM WNR (for all data aborts) All the other fields in ESR except DFSC are architecturally RES0 for an L0 translation fault taken to EL1, so can be zeroed out without confusing userspace. The illusion is not entirely perfect, as there is a tiny wrinkle where we will report an alignment fault that was not due to the memory type (for instance a LDREX to an unaligned address) as a translation fault, whereas if you do this on real unmapped memory the alignment fault takes precedence. This is not likely to trip anybody up in practice, as the only users we know of for the ESR information who care about the behaviour for kernel addresses only really want to know about the WnR bit. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-05-22 17:14:20 +01:00
Jacob Keller	3f76bdb437	i40e: use the more traditional 'i' loop variable Since we no longer use i as an array index for the data variable, replace the use of 'j' with 'i' so that we match the general loop variable name. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-05-22 08:37:06 -07:00
Jacob Keller	ec29bbf8ab	i40e: add function doc headers for ethtool stats functions Add documentation for the i40e_get_stats_count, i40e_get_stat_strings and i40e_get_ethtool_stats explaining that the number and ordering of statistics must remain constant for a given netdevice. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-05-22 08:37:06 -07:00
Jacob Keller	e08696dcd9	i40e: update data pointer directly when copying to the buffer A future patch is going to add a helper function i40e_add_ethtool_stats that will help lower the amount of boiler plate code in the i40e_get_ethtool_stats function. This conversion will take place over many patches, and the helper function will work by directly updating a reference to the data pointer. Since this would not work combined with the current method of accessing data like an array, update all the code that copies stats into the data buffer to use direct updates to the pointer instead of array accesses. This will prevent incorrect stat updates for patches in between the conversion. Similarly, when copying strings, we used a separate char p pointer. Instead, use the data pointer directly as it's already a (u8 ) type which is the same size. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-05-22 08:37:06 -07:00
Jacob Keller	bf1c39e640	i40e: fold prefix strings directly into stat names We always prefix these stats with a fixed string, so just fold this prefix into the stat string definition. This preparatory work will make it easier to implement a helper function to copy stats and strings into the supplied buffers in a future patch. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-05-22 08:37:06 -07:00
Jacob Keller	9b10df596b	i40e: use WARN_ONCE to replace the commented BUG_ON size check We don't really want to use BUG_ON here since that would completely crash the kernel, thus the reason we commented it out. We can't use BUILD_BUG_ON because at least now (a) the sizes aren't constant (we are fixing this) and (b) not all compilers are smart enough to understand that "p - data" is a constant. Instead, just use a WARN_ONCE so that the first time we end up with an incorrect size we will dump a stack trace and a message, hopefully highlighting the issues early in testing. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-05-22 08:37:06 -07:00
Jacob Keller	019b9cd44d	i40e: split i40e_get_strings() into smaller functions Split the statistic strings and private flags strings into their own separate functions to aid code readability. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-05-22 08:37:06 -07:00
Jacob Keller	b831236509	i40e: always return all queue stat strings The ethtool API for obtaining device statistics is not intended to allow runtime changes in the number of statistics reported. It may appear this way, as there is an ability to request the number of stats using ethtool_get_set_count(). However, it is expected that this must always return the same value for invocations of the same device. If we don't satisfy this contract, and allow the number of stats to change during run time, we could cause invalid memory accesses or report the stat strings incorrectly. This is because the API for obtaining stats is to (1) get the size, (2) get the strings and finally (3) get the stats. Since these are each separate ethtool op commands, it is not possible to maintain consistency by holding the RTNL lock over the whole operation. This results in the potential for a race condition to occur where the size changed between any of the 3 calls. Avoid this issue by requiring that we always return the same value for a given device. We can check any values which remain constant for the life of the device, but must not report different sizes depending on runtime attributes. This patch specifically fixes the queue statistics to always return every queue even if it's not currently in use. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-05-22 08:37:06 -07:00
Jacob Keller	9955d4940e	i40e: always return VEB stat strings The ethtool API for obtaining device statistics is not intended to allow runtime changes in the number of statistics reported. It may appear this way, as there is an ability to request the number of stats using ethtool_get_set_count(). However, it is expected that this must always return the same value for invocations of the same device. If we don't satisfy this contract, and allow the number of stats to change during run time, we could cause invalid memory accesses or report the stat strings incorrectly. This is because the API for obtaining stats is to (1) get the size, (2) get the strings and finally (3) get the stats. Since these are each separate ethtool op commands, it is not possible to maintain consistency by holding the RTNL lock over the whole operation. This results in the potential for a race condition to occur where the size changed between any of the 3 calls. Avoid this issue by requiring that we always return the same value for a given device. We can check any values which remain constant for the life of the device, but must not report different sizes depending on runtime attributes. This patch specifically fixes the VEB statistics strings to always be reported. Other issues will be fixed in future patches. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-05-22 08:37:06 -07:00
Jacob Keller	bdf2752305	i40e: free skb after clearing lock in ptp_stop Use the same logic to free the skb after clearing the Tx timestamp bit lock in i40e_ptp_stop as we use in the other locations. It is not as important here since we are not racing against a future Tx timestamp request (as we are disabling PTP at this point). However it is good to be consistent in how we approach the bit lock so that future callers don't copy the old anti-pattern. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-05-22 08:37:06 -07:00

... 6 7 8 9 10 ...

755637 Commits All Branches Search

755637 Commits

All Branches