Commit Graph

336580 Commits

Author SHA1 Message Date
Paul Gortmaker 94fc9c4719 tipc: delete TIPC_ADVANCED Kconfig variable
There used to be a time when TIPC had lots of Kconfig knobs the
end user could alter, but they have all been made automatic or
obsolete, with the exception of CONFIG_TIPC_PORTS.  This
previously existing set of options was all hidden under the
TIPC_ADVANCED setting, which does not exist in any code, but
only in Kconfig scope.

Having this now, just to hide the one remaining "advanced"
option no longer makes sense.  Remove it.  Also get rid of the
ifdeffery in the TIPC code that allowed for TIPC_PORTS to be
possibly undefined.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-22 14:33:29 -05:00
Ying Xue 4cb7d55ab4 tipc: eliminate an unnecessary cast of node variable
As the variable:node is currently defined to u32 type, it is
unnecessary to cast its type to u32 again when using it.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-22 14:33:28 -05:00
Jon Maloy c64f7a6a1f tipc: introduce message to synchronize broadcast link
Upon establishing a first link between two nodes, there is
currently a risk that the two endpoints will disagree on exactly
which sequence number reception and acknowleding of broadcast
packets should start.

The following scenarios may happen:

1: Node A sends an ACTIVATE message to B, telling it to start acking
   packets from sequence number N.
2: Node A sends out broadcast N, but does not expect an acknowledge
   from B, since B is not yet in its broadcast receiver's list.
3: Node A receives ACK for N from all nodes except B, and releases
   packet N.
4: Node B receives the ACTIVATE, activates its link endpoint, and
   stores the value N as sequence number of first expected packet.
5: Node B sends a NAME_DISTR message to A.
6: Node A receives the NAME_DISTR message, and activates its endpoint.
   At this moment B is added to A's broadcast receiver's set.
   Node A also sets sequence number 0 as the first broadcast packet
   to be received from B.
7: Node A sends broadcast N+1.
8: B receives N+1, determines there is a gap in the sequence, since
   it is expecting N, and sends a NACK for N back to A.
9: Node A has already released N, so no retransmission is possible.
   The broadcast link in direction A->B is stale.

In addition to, or instead of, 7-9 above, the following may happen:

10: Node B sends broadcast M > 0 to A.
11: Node A receives M, falsely decides there must be a gap, since
    it is expecting packet 0, and asks for retransmission of packets
    [0,M-1].
12: Node B has already released these packets, so the broadcast
    link is stale in direction B->A.

We solve this problem by introducing a new unicast message type,
BCAST_PROTOCOL/STATE, to convey the sequence number of the next
sent broadcast packet to the other endpoint, at exactly the moment
that endpoint is added to the own node's broadcast receivers list,
and before any other unicast messages are permitted to be sent.

Furthermore, we don't allow any node to start receiving and
processing broadcast packets until this new synchronization
message has been received.

To maintain backwards compatibility, we still open up for
broadcast reception if we receive a NAME_DISTR message without
any preceding broadcast sync message. In this case, we must
assume that the other end has an older code version, and will
never send out the new synchronization message. Hence, for mixed
old and new nodes, the issue arising in 7-12 of the above may
happen with the same probability as before.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-22 14:33:21 -05:00
Ying Xue 389dd9bcf6 tipc: rename supported flag to recv_permitted
Rename the "supported" flag in bclink structure to "recv_permitted"
to better reflect what it is used for. When this flag is set for a
given node, we are permitted to receive and acknowledge broadcast
messages from that node.  Convert it to a bool at the same time,
since it is not used to store any numerical values.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-22 07:50:51 -05:00
Ying Xue 818f4da526 tipc: remove supportable flag from bclink structure
The "supportable" flag in bclink structure is a compatibility flag
indicating whether a peer node is capable of receiving TIPC broadcast
messages. However, all TIPC versions since tipc-1.5, and after the
inclusion in the upstream Linux kernel in 2006, support this capability.
It is highly unlikely that anybody is still using such an old
version of TIPC, let alone that they want to mix it with TIPC-2.0
nodes. Therefore, we now remove the "supportable" flag.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-22 07:50:50 -05:00
Ying Xue 3c294cb374 tipc: remove the bearer congestion mechanism
Currently at the TIPC bearer layer there is the following congestion
mechanism:

Once sending packets has failed via that bearer, the bearer will be
flagged as being in congested state at once. During bearer congestion,
all packets arriving at link will be queued on the link's outgoing
buffer.  When we detect that the state of bearer congestion has
relaxed (e.g. some packets are received from the bearer) we will try
our best to push all packets in the link's outgoing buffer until the
buffer is empty, or until the bearer is congested again.

However, in fact the TIPC bearer never receives any feedback from the
device layer whether a send was successful or not, so it must always
assume it was successful. Therefore, the bearer congestion mechanism
as it exists currently is of no value.

But the bearer blocking state is still useful for us. For example,
when the physical media goes down/up, we need to change the state of
the links bound to the bearer.  So the code maintaing the state
information is not removed.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-21 20:07:25 -05:00
Ying Xue 7503115107 tipc: wake up all waiting threads at socket shutdown
When a socket is shut down, we should wake up all thread sleeping on
it, instead of just one of them. Otherwise, when several threads are
polling the same socket, and one of them does shutdown(), the
remaining threads may end up sleeping forever.

Also, to align socket usage with common practice in other stacks, we
use one of the common socket callback handlers, sk_state_change(),
to wake up pending users. This is similar to the usage in e.g.
inet_shutdown(). [net/ipv4/af_inet.c].

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-21 20:06:29 -05:00
Erik Hugne c4fc298ab4 tipc: return POLLOUT for sockets in an unconnected state
If an implied connect is attempted on a nonblocking STREAM/SEQPACKET
socket during link congestion, the connect message will be discarded
and sendmsg will return EAGAIN. This is normal behavior, and the
application is expected to poll the socket until POLLOUT is set,
after which the connection attempt can be retried.
However, the POLLOUT flag is never set for unconnected sockets and
poll() always returns a zero mask. The application is then left without
a trigger for when it can make another attempt at sending the message.

The solution is to check if we're polling on an unconnected socket
and set the POLLOUT flag if the TIPC port owned by this socket
is not congested. The TIPC ports waiting on a specific link will be
marked as 'not congested' when the link congestion have abated.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-21 14:54:32 -05:00
Ying Xue f288bef464 tipc: fix race/inefficiencies in poll/wait behaviour
When an application blocks at poll/select on a TIPC socket
while requesting a specific event mask, both the filter_rcv() and
wakeupdispatch() case will wake it up unconditionally whenever
the state changes (i.e an incoming message arrives, or congestion
has subsided).  No mask is used.

To avoid this, we populate sk->sk_data_ready and sk->sk_write_space
with tipc_data_ready and tipc_write_space respectively, which makes
tipc more in alignment with the rest of the networking code.  These
pass the exact set of possible events to the waker in fs/select.c
hence avoiding waking up blocked processes unnecessarily.

In doing so, we uncover another issue -- that there needs to be a
memory barrier in these poll/receive callbacks, otherwise we are
subject to the the same race as documented above wq_has_sleeper()
[in commit a57de0b4 "net: adding memory barrier to the poll and
receive callbacks"].  So we need to replace poll_wait() with
sock_poll_wait() and use rcu protection for the sk->sk_wq pointer
in these two new functions.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-21 14:54:31 -05:00
Neil Horman de4594a51c sctp: send abort chunk when max_retrans exceeded
In the event that an association exceeds its max_retrans attempts, we should
send an ABORT chunk indicating that we are closing the assocation as a result.
Because of the nature of the error, its unlikely to be received, but its a nice
clean way to close the association if it does make it through, and it will give
anyone watching via tcpdump a clue as to what happened.

Change notes:
v2)
	* Removed erroneous changes from sctp_make_violation_parmlen

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-20 15:50:37 -05:00
Sachin Kamat 388dfc2d2d net: Remove redundant null check before kfree in dev.c
kfree on a null pointer is a no-op.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-20 13:48:09 -05:00
Sachin Kamat 973b1b9a45 caif: Remove redundant null check before kfree in cfctrl.c
kfree on a null pointer is a no-op.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-20 13:48:09 -05:00
Sachin Kamat 4babfb9b27 bnx2x: Remove duplicate inclusion of bnx2x_hsi.h
bnx2x_hsi.h was included twice.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-20 13:46:34 -05:00
Nicolas Dichtel e2f1f072db sit: allow to configure 6rd tunnels via netlink
This patch add the support of 6RD tunnels management via netlink.
Note that netdev_state_change() is now called when 6RD parameters are updated.

6RD parameters are updated only if there is at least one 6RD attribute.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-20 13:43:28 -05:00
David Stevens e4f67addf1 add DOVE extensions for VXLAN
This patch provides extensions to VXLAN for supporting Distributed
Overlay Virtual Ethernet (DOVE) networks. The patch includes:

	+ a dove flag per VXLAN device to enable DOVE extensions
	+ ARP reduction, whereby a bridge-connected VXLAN tunnel endpoint
		answers ARP requests from the local bridge on behalf of
		remote DOVE clients
	+ route short-circuiting (aka L3 switching). Known destination IP
		addresses use the corresponding destination MAC address for
		switching rather than going to a (possibly remote) router first.
	+ netlink notification messages for forwarding table and L3 switching
		misses

Changes since v2
	- combined bools into "u32 flags"
	- replaced loop with !is_zero_ether_addr()

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-20 13:41:28 -05:00
Ben Hutchings ff33c0e188 net: Remove bogus dependencies on INET
Various drivers depend on INET because they used to select INET_LRO,
but they have all been converted to use GRO which has no such
dependency.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19 19:13:59 -05:00
Ben Hutchings 3865fe169a ehea: Remove remnants of LRO support
Commit 2cb1deb56f ('ehea: Remove LRO
support') left behind the Kconfig depends/select and feature flag.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19 19:13:59 -05:00
Ben Hutchings f1d29a3fa6 mlx4_en: Remove remnants of LRO support
Commit fa37a9586f ('mlx4_en: Moving to
work with GRO') left behind the Kconfig depends/select, some dead
code and comments referring to LRO.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19 19:13:59 -05:00
Johannes Berg 01f1c6b994 net: remove unnecessary wireless includes
The wireless and wext includes in net-sysfs.c aren't
needed, so remove them.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19 19:10:36 -05:00
Joachim Eastwood c867b55eb4 net/ethernet: remove useless is_valid_ether_addr from drivers ndo_open
If ndo_validate_addr is set to the generic eth_validate_addr
function there is no point in calling is_valid_ether_addr
from driver ndo_open if ndo_open is not used elsewhere in
the driver.

With this change is_valid_ether_addr will be called from the
generic eth_validate_addr function. So there should be no change
in the actual behavior.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19 19:01:18 -05:00
Shan Wei ae4b46e9d7 net: rds: use this_cpu_* per-cpu helper
Signed-off-by: Shan Wei <davidshan@tencent.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19 18:59:44 -05:00
Shan Wei 1f743b0765 net: core: use this_cpu_ptr per-cpu helper
flush_tasklet is a struct, not a pointer in percpu var.
so use this_cpu_ptr to get the member pointer.

Signed-off-by: Shan Wei <davidshan@tencent.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19 18:59:44 -05:00
Sachin Kamat 91aa76518d vhost: Remove duplicate inclusion of linux/vhost.h
linux/vhost.h was included twice.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19 14:51:40 -05:00
Nicolas Ferre 909a85834d net/macb: move to circ_buf macros and fix initial condition
Move to circular buffers management macro and correct an error
with circular buffer initial condition.

Without this patch, the macb_tx_ring_avail() function was
not reporting the proper ring availability at startup:
macb macb: eth0: BUG! Tx Ring full when queue awake!
macb macb: eth0: tx_head = 0, tx_tail = 0
And hanginig forever...

I remove the macb_tx_ring_avail() function and use the
proven macros from circ_buf.h. CIRC_CNT() is used in the
"consumer" part of the driver: macb_tx_interrupt() to match
advice from Documentation/circular-buffers.txt.

Reported-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Tested-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19 14:21:25 -05:00
Eric W. Biederman e5ef39eda6 netfilter: Remove the spurious \ in __ip_vs_lblc_init
In (464dc801c7 net: Don't export sysctls to unprivileged users)
I typoed and introduced a spurious backslash.  Delete it.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19 14:20:42 -05:00
Stefan Raspl 18af5c1797 qeth: Remove BUG_ONs
Remove BUG_ONs or convert to WARN_ON_ONCE/WARN_ONs since a failure within a
networking device driver is no reason to shut down the entire machine.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Reviewed-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19 14:19:00 -05:00
Stefan Raspl 395672e098 qeth: Consolidate tracing of card features
Trace all supported and enabled card features to s390dbf.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Reviewed-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19 14:19:00 -05:00
Stefan Raspl 7096b18739 qeth: Clarify card type naming for virtual NICs
So far, virtual NICs whether attached to a VSWITCH or a guest LAN were always
displayed as guest LANs in the device driver attributes and messages, while
in fact it is a virtual NIC.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Reviewed-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19 14:19:00 -05:00
Ursula Braun 569743e493 claw: remove BUG_ONs
Remove BUG_ON's in claw driver, since the checked error conditions
are null pointer accesses.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19 14:19:00 -05:00
Ursula Braun bfd2eb3b4a ctcm: remove BUG_ONs
Remove BUG_ON's in ctcm driver, since the checked error conditions
are null pointer accesses.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19 14:19:00 -05:00
Stefan Raspl 7bf9bcff45 qeth: Remove unused variable
Eliminate a variable that is never modified.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Reviewed-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19 14:19:00 -05:00
Eric W. Biederman c260b7722f net: Allow userns root to control tun and tap devices
Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) calls to
ns_capable(net->user_ns,CAP_NET_ADMIN) calls.

Allow setting of the tun iff flags.
Allow creating of tun devices.
Allow adding a new queue to a tun device.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19 14:15:54 -05:00
Eric W. Biederman 3594698a1f net: Make CAP_NET_BIND_SERVICE per user namespace
Allow privileged users in any user namespace to bind to
privileged sockets in network namespaces they control.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:33:37 -05:00
Eric W. Biederman b51642f6d7 net: Enable a userns root rtnl calls that are safe for unprivilged users
- Only allow moving network devices to network namespaces you have
  CAP_NET_ADMIN privileges over.

- Enable creating/deleting/modifying interfaces
- Enable adding/deleting addresses
- Enable adding/setting/deleting neighbour entries
- Enable adding/removing routes
- Enable adding/removing fib rules
- Enable setting the forwarding state
- Enable adding/removing ipv6 address labels
- Enable setting bridge parameter

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:33:36 -05:00
Eric W. Biederman c027aab4a6 net: Enable some sysctls that are safe for the userns root
- Enable the per device ipv4 sysctls:
   net/ipv4/conf/<if>/forwarding
   net/ipv4/conf/<if>/mc_forwarding
   net/ipv4/conf/<if>/accept_redirects
   net/ipv4/conf/<if>/secure_redirects
   net/ipv4/conf/<if>/shared_media
   net/ipv4/conf/<if>/rp_filter
   net/ipv4/conf/<if>/send_redirects
   net/ipv4/conf/<if>/accept_source_route
   net/ipv4/conf/<if>/accept_local
   net/ipv4/conf/<if>/src_valid_mark
   net/ipv4/conf/<if>/proxy_arp
   net/ipv4/conf/<if>/medium_id
   net/ipv4/conf/<if>/bootp_relay
   net/ipv4/conf/<if>/log_martians
   net/ipv4/conf/<if>/tag
   net/ipv4/conf/<if>/arp_filter
   net/ipv4/conf/<if>/arp_announce
   net/ipv4/conf/<if>/arp_ignore
   net/ipv4/conf/<if>/arp_accept
   net/ipv4/conf/<if>/arp_notify
   net/ipv4/conf/<if>/proxy_arp_pvlan
   net/ipv4/conf/<if>/disable_xfrm
   net/ipv4/conf/<if>/disable_policy
   net/ipv4/conf/<if>/force_igmp_version
   net/ipv4/conf/<if>/promote_secondaries
   net/ipv4/conf/<if>/route_localnet

- Enable the global ipv4 sysctl:
   net/ipv4/ip_forward

- Enable the per device ipv6 sysctls:
   net/ipv6/conf/<if>/forwarding
   net/ipv6/conf/<if>/hop_limit
   net/ipv6/conf/<if>/mtu
   net/ipv6/conf/<if>/accept_ra
   net/ipv6/conf/<if>/accept_redirects
   net/ipv6/conf/<if>/autoconf
   net/ipv6/conf/<if>/dad_transmits
   net/ipv6/conf/<if>/router_solicitations
   net/ipv6/conf/<if>/router_solicitation_interval
   net/ipv6/conf/<if>/router_solicitation_delay
   net/ipv6/conf/<if>/force_mld_version
   net/ipv6/conf/<if>/use_tempaddr
   net/ipv6/conf/<if>/temp_valid_lft
   net/ipv6/conf/<if>/temp_prefered_lft
   net/ipv6/conf/<if>/regen_max_retry
   net/ipv6/conf/<if>/max_desync_factor
   net/ipv6/conf/<if>/max_addresses
   net/ipv6/conf/<if>/accept_ra_defrtr
   net/ipv6/conf/<if>/accept_ra_pinfo
   net/ipv6/conf/<if>/accept_ra_rtr_pref
   net/ipv6/conf/<if>/router_probe_interval
   net/ipv6/conf/<if>/accept_ra_rt_info_max_plen
   net/ipv6/conf/<if>/proxy_ndp
   net/ipv6/conf/<if>/accept_source_route
   net/ipv6/conf/<if>/optimistic_dad
   net/ipv6/conf/<if>/mc_forwarding
   net/ipv6/conf/<if>/disable_ipv6
   net/ipv6/conf/<if>/accept_dad
   net/ipv6/conf/<if>/force_tllao

- Enable the global ipv6 sysctls:
   net/ipv6/bindv6only
   net/ipv6/icmp/ratelimit

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:33:00 -05:00
Eric W. Biederman 276996fda0 net: Allow the userns root to control vlans.
Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.

Allow the vlan ioctls:
SET_VLAN_INGRESS_PRIORITY_CMD
SET_VLAN_EGRESS_PRIORITY_CMD
SET_VLAN_FLAG_CMD
SET_VLAN_NAME_TYPE_CMD
ADD_VLAN_CMD
DEL_VLAN_CMD

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:33:00 -05:00
Eric W. Biederman cb99050305 net: Allow userns root to control the network bridge code.
Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.

Allow setting bridge paramters via sysfs.

Allow all of the bridge ioctls:
BRCTL_ADD_IF
BRCTL_DEL_IF
BRCTL_SET_BRDIGE_FORWARD_DELAY
BRCTL_SET_BRIDGE_HELLO_TIME
BRCTL_SET_BRIDGE_MAX_AGE
BRCTL_SET_BRIDGE_AGING_TIME
BRCTL_SET_BRIDGE_STP_STATE
BRCTL_SET_BRIDGE_PRIORITY
BRCTL_SET_PORT_PRIORITY
BRCTL_SET_PATH_COST
BRCTL_ADD_BRIDGE
BRCTL_DEL_BRDIGE

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:33:00 -05:00
Eric W. Biederman df008c91f8 net: Allow userns root to control llc, netfilter, netlink, packet, and xfrm
Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.

Allow creation of af_key sockets.
Allow creation of llc sockets.
Allow creation of af_packet sockets.

Allow sending xfrm netlink control messages.

Allow binding to netlink multicast groups.
Allow sending to netlink multicast groups.
Allow adding and dropping netlink multicast groups.
Allow sending to all netlink multicast groups and port ids.

Allow reading the netfilter SO_IP_SET socket option.
Allow sending netfilter netlink messages.
Allow setting and getting ip_vs netfilter socket options.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:32:45 -05:00
Eric W. Biederman af31f412c7 net: Allow userns root to control ipv6
Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.

Settings that merely control a single network device are allowed.
Either the network device is a logical network device where
restrictions make no difference or the network device is hardware NIC
that has been explicity moved from the initial network namespace.

In general policy and network stack state changes are allowed while
resource control is left unchanged.

Allow the SIOCSIFADDR ioctl to add ipv6 addresses.
Allow the SIOCDIFADDR ioctl to delete ipv6 addresses.
Allow the SIOCADDRT ioctl to add ipv6 routes.
Allow the SIOCDELRT ioctl to delete ipv6 routes.

Allow creation of ipv6 raw sockets.

Allow setting the IPV6_JOIN_ANYCAST socket option.
Allow setting the IPV6_FL_A_RENEW parameter of the IPV6_FLOWLABEL_MGR
socket option.

Allow setting the IPV6_TRANSPARENT socket option.
Allow setting the IPV6_HOPOPTS socket option.
Allow setting the IPV6_RTHDRDSTOPTS socket option.
Allow setting the IPV6_DSTOPTS socket option.
Allow setting the IPV6_IPSEC_POLICY socket option.
Allow setting the IPV6_XFRM_POLICY socket option.

Allow sending packets with the IPV6_2292HOPOPTS control message.
Allow sending packets with the IPV6_2292DSTOPTS control message.
Allow sending packets with the IPV6_RTHDRDSTOPTS control message.

Allow setting the multicast routing socket options on non multicast
routing sockets.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, and SIOCDELTUNNEL ioctls for
setting up, changing and deleting tunnels over ipv6.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, SIOCDELTUNNEL ioctls for
setting up, changing and deleting ipv6 over ipv4 tunnels.

Allow the SIOCADDPRL, SIOCDELPRL, SIOCCHGPRL ioctls for adding,
deleting, and changing the potential router list for ISATAP tunnels.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:32:45 -05:00
Eric W. Biederman 52e804c6df net: Allow userns root to control ipv4
Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.

Settings that merely control a single network device are allowed.
Either the network device is a logical network device where
restrictions make no difference or the network device is hardware NIC
that has been explicity moved from the initial network namespace.

In general policy and network stack state changes are allowed
while resource control is left unchanged.

Allow creating raw sockets.
Allow the SIOCSARP ioctl to control the arp cache.
Allow the SIOCSIFFLAG ioctl to allow setting network device flags.
Allow the SIOCSIFADDR ioctl to allow setting a netdevice ipv4 address.
Allow the SIOCSIFBRDADDR ioctl to allow setting a netdevice ipv4 broadcast address.
Allow the SIOCSIFDSTADDR ioctl to allow setting a netdevice ipv4 destination address.
Allow the SIOCSIFNETMASK ioctl to allow setting a netdevice ipv4 netmask.
Allow the SIOCADDRT and SIOCDELRT ioctls to allow adding and deleting ipv4 routes.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting gre tunnels.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting ipip tunnels.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting ipsec virtual tunnel interfaces.

Allow setting the MRT_INIT, MRT_DONE, MRT_ADD_VIF, MRT_DEL_VIF, MRT_ADD_MFC,
MRT_DEL_MFC, MRT_ASSERT, MRT_PIM, MRT_TABLE socket options on multicast routing
sockets.

Allow setting and receiving IPOPT_CIPSO, IP_OPT_SEC, IP_OPT_SID and
arbitrary ip options.

Allow setting IP_SEC_POLICY/IP_XFRM_POLICY ipv4 socket option.
Allow setting the IP_TRANSPARENT ipv4 socket option.
Allow setting the TCP_REPAIR socket option.
Allow setting the TCP_CONGESTION socket option.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:32:45 -05:00
Eric W. Biederman 5e1fccc0bf net: Allow userns root control of the core of the network stack.
Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.

Settings that merely control a single network device are allowed.
Either the network device is a logical network device where
restrictions make no difference or the network device is hardware NIC
that has been explicity moved from the initial network namespace.

In general policy and network stack state changes are allowed
while resource control is left unchanged.

Allow ethtool ioctls.

Allow binding to network devices.
Allow setting the socket mark.
Allow setting the socket priority.

Allow setting the network device alias via sysfs.
Allow setting the mtu via sysfs.
Allow changing the network device flags via sysfs.
Allow setting the network device group via sysfs.

Allow the following network device ioctls.
SIOCGMIIPHY
SIOCGMIIREG
SIOCSIFNAME
SIOCSIFFLAGS
SIOCSIFMETRIC
SIOCSIFMTU
SIOCSIFHWADDR
SIOCSIFSLAVE
SIOCADDMULTI
SIOCDELMULTI
SIOCSIFHWBROADCAST
SIOCSMIIREG
SIOCBONDENSLAVE
SIOCBONDRELEASE
SIOCBONDSETHWADDR
SIOCBONDCHANGEACTIVE
SIOCBRADDIF
SIOCBRDELIF
SIOCSHWTSTAMP

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:32:45 -05:00
Eric W. Biederman 00f70de09c net: Allow userns root to force the scm creds
If the user calling sendmsg has the appropriate privieleges
in their user namespace allow them to set the uid, gid, and
pid in the SCM_CREDENTIALS control message to any valid value.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:32:45 -05:00
Zhao Hongjiang 86937c05cb user_ns: get rid of duplicate code in net_ctl_permissions
Get rid of duplicate code in net_ctl_permissions and fix the comment.

Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:32:45 -05:00
Eric W. Biederman cff109768b net: Update the per network namespace sysctls to be available to the network namespace owner
- Allow anyone with CAP_NET_ADMIN rights in the user namespace of the
  the netowrk namespace to change sysctls.
- Allow anyone the uid of the user namespace root the same
  permissions over the network namespace sysctls as the global root.
- Allow anyone with gid of the user namespace root group the same
  permissions over the network namespace sysctl as the global root group.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:32:45 -05:00
Eric W. Biederman dfc47ef863 net: Push capable(CAP_NET_ADMIN) into the rtnl methods
- In rtnetlink_rcv_msg convert the capable(CAP_NET_ADMIN) check
  to ns_capable(net->user-ns, CAP_NET_ADMIN).  Allowing unprivileged
  users to make netlink calls to modify their local network
  namespace.

- In the rtnetlink doit methods add capable(CAP_NET_ADMIN) so
  that calls that are not safe for unprivileged users are still
  protected.

Later patches will remove the extra capable calls from methods
that are safe for unprivilged users.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:32:44 -05:00
Eric W. Biederman 464dc801c7 net: Don't export sysctls to unprivileged users
In preparation for supporting the creation of network namespaces
by unprivileged users, modify all of the per net sysctl exports
and refuse to allow them to unprivileged users.

This makes it safe for unprivileged users in general to access
per net sysctls, and allows sysctls to be exported to unprivileged
users on an individual basis as they are deemed safe.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:30:55 -05:00
Eric W. Biederman 73f7ef4359 sysctl: Pass useful parameters to sysctl permissions
- Current is implicitly avaiable so passing current->nsproxy isn't useful.
- The ctl_table_header is needed to find how the sysctl table is connected
  to the rest of sysctl.
- ctl_table_root is avaiable in the ctl_table_header so no need to it.

With these changes it becomes possible to write a version of
net_sysctl_permission that takes into account the network namespace of
the sysctl table, an important feature in extending the user namespace.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:30:55 -05:00
Eric W. Biederman d328b83682 userns: make each net (net_ns) belong to a user_ns
The user namespace which creates a new network namespace owns that
namespace and all resources created in it.  This way we can target
capability checks for privileged operations against network resources to
the user_ns which created the network namespace in which the resource
lives.  Privilege to the user namespace which owns the network
namespace, or any parent user namespace thereof, provides the same
privilege to the network resource.

This patch is reworked from a version originally by
Serge E. Hallyn <serge.hallyn@canonical.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:30:55 -05:00
Eric W. Biederman 2407dc25f7 netns: Deduplicate and fix copy_net_ns when !CONFIG_NET_NS
The copy of copy_net_ns used when the network stack is not
built is broken as it does not return -EINVAL when attempting
to create a new network namespace.  We don't even have
a previous network namespace.

Since we need a copy of copy_net_ns in net/net_namespace.h that is
available when the networking stack is not built at all move the
correct version of copy_net_ns from net_namespace.c into net_namespace.h
Leaving us with just 2 versions of copy_net_ns.  One version for when
we compile in network namespace suport and another stub for all other
occasions.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:30:55 -05:00
Vlad Yasevich 75fe83c322 ipv6: Preserve ipv6 functionality needed by NET
Some pieces of network use core pieces of IPv6 stack.  Keep
them available while letting new GSO offload pieces depend
on CONFIG_INET.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 02:34:00 -05:00