In preempt case current arch_local_irq_restore() from
preempt_schedule_irq() may enable hard interrupt but we really
should disable interrupts when we return from the interrupt,
and so that we don't get interrupted after loading SRR0/1.
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The calculation for the left shift of the mask OPROFILE_PM_PMCSEL_MSK has an
error. The calculation is should be to shift left by (max_cntrs - cntr) times
the width of the pmsel field width. However, the #define OPROFILE_MAX_PMC_NUM
was used instead of OPROFILE_PMSEL_FIELD_WIDTH. This patch fixes the
calculation.
Signed-off-by: Carl Love <cel@us.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
commit f96972f2dc "kernel/sys.c: call disable_nonboot_cpus() in
kernel_restart()"
added a call to disable_nonboot_cpus() on kernel_restart(), which tries
to shutdown all the CPUs except the first one. The issue with the PA
Semi, is that it does not support CPU hotplug.
When the call is made to __cpu_down(), it calls the notifiers
CPU_DOWN_PREPARE, and then tries to take the CPU down.
One of the notifiers to the CPU hotplug code, is the cpufreq. The
DOWN_PREPARE will call __cpufreq_remove_dev() which calls
cpufreq_driver->exit. The PA Semi exit handler unmaps regions of I/O
that is used by an interrupt that goes off constantly
(system_reset_common, but it goes off during normal system operations
too). I'm not sure exactly what this interrupt does.
Running a simple function trace, you can see it goes off quite a bit:
# tracer: function
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
<idle>-0 [001] 1558.859363: .pasemi_system_reset_exception <-.system_reset_exception
<idle>-0 [000] 1558.860112: .pasemi_system_reset_exception <-.system_reset_exception
<idle>-0 [000] 1558.861109: .pasemi_system_reset_exception <-.system_reset_exception
<idle>-0 [001] 1558.861361: .pasemi_system_reset_exception <-.system_reset_exception
<idle>-0 [000] 1558.861437: .pasemi_system_reset_exception <-.system_reset_exception
When the region is unmapped, the system crashes with:
Disabling non-boot CPUs ...
Error taking CPU1 down: -38
Unable to handle kernel paging request for data at address 0xd0000800903a0100
Faulting instruction address: 0xc000000000055fcc
Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT SMP NR_CPUS=64 NUMA PA Semi PWRficient
Modules linked in: shpchp
NIP: c000000000055fcc LR: c000000000055fb4 CTR: c0000000000df1fc
REGS: c0000000012175d0 TRAP: 0300 Not tainted (3.8.0-rc4-test-dirty)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 24000088 XER: 00000000
SOFTE: 0
DAR: d0000800903a0100, DSISR: 42000000
TASK = c0000000010e9008[0] 'swapper/0' THREAD: c000000001214000 CPU: 0
GPR00: d0000800903a0000 c000000001217850 c0000000012167e0 0000000000000000
GPR04: 0000000000000000 0000000000000724 0000000000000724 0000000000000000
GPR08: 0000000000000000 0000000000000000 0000000000000001 0000000000a70000
GPR12: 0000000024000080 c00000000fff0000 ffffffffffffffff 000000003ffffae0
GPR16: ffffffffffffffff 0000000000a21198 0000000000000060 0000000000000000
GPR20: 00000000008fdd35 0000000000a21258 000000003ffffaf0 0000000000000417
GPR24: 0000000000a226d0 c000000000000000 0000000000000000 0000000000000000
GPR28: c00000000138b358 0000000000000000 c000000001144818 d0000800903a0100
NIP [c000000000055fcc] .set_astate+0x5c/0xa4
LR [c000000000055fb4] .set_astate+0x44/0xa4
Call Trace:
[c000000001217850] [c000000000055fb4] .set_astate+0x44/0xa4 (unreliable)
[c0000000012178f0] [c00000000005647c] .restore_astate+0x2c/0x34
[c000000001217980] [c000000000054668] .pasemi_system_reset_exception+0x6c/0x88
[c000000001217a00] [c000000000019ef0] .system_reset_exception+0x48/0x84
[c000000001217a80] [c000000000001e40] system_reset_common+0x140/0x180
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
dmraid assess redundancy and replacements slightly inaccurately which
could lead to some degraded arrays failing to assemble.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIVAwUAUQb0OTnsnt1WYoG5AQJ4qg/8DIs0eWaJWF5lqF8qJBeMQrUDkqe7glqe
ezwVlO48uxVzJUxCMd1aZdxjTI/AIvqZ1U5DBC62SlB/RqWlrwIQIIok5odWfnG4
eAII5hUnktWqL4Ksqz4mgdI+WWwSc3JR7verqS4wOvcsN4qU5t8vL6dj2b0cNYCQ
I7B+fkMZE1KE4Y2DOf8dEO9gbNIO1ZllZapLRolrsGOr8Ggo1prEoMxBYF5HZNoE
J1As2N6NA7/kadtsfkCSs+f//5t1uMZluMjEUe4lDmgeqzqz/93kFmJ2OfSdqO4J
wuuTHbCL+NSjjAZuByluSO98O0h87xXGVMv/c7gadVQtOn6I1DA2i4wTaiHOzr4K
cdALvbteVCAPYLMA+s8ee6YYbB5pnlblT8FShG+3O6ae1KmbqKex1LlZLpwEoS8y
VxI1WCSQbBr/ejAnhxLFQPo5OAcoeHomlZHKPtCBSbwQ0f0pOHHPYlZyX2PtX6hF
U9bmtMq0XZulDORdLmIsEEpwzRKQ+b89+RrYXM7AhkJTxRP59RVwqFHy9SybcFBS
S5XFKqpCE+ioBvLp9HK189xMe0Nel2g7KWd34v5LcvQ21rzATezAh5TsWIzN3oV8
9/phd6nZa0hhcELykOTmK5b6+ks2tBfEN2FuyKfSq4Z2nz46rfD4wYVTY2+Qjh+D
hmUDBgguejo=
=bJyL
-----END PGP SIGNATURE-----
Merge tag 'md-3.8-fixes' of git://neil.brown.name/md
Pull dmraid fix from NeilBrown:
"Just one fix for md in 3.8
dmraid assess redundancy and replacements slightly inaccurately which
could lead to some degraded arrays failing to assemble."
* tag 'md-3.8-fixes' of git://neil.brown.name/md:
DM-RAID: Fix RAID10's check for sufficient redundancy
This patch fixes MAX_STACK_TRACE_ENTRIES too low warning for ppc32,
which is similar to commit 12660b17.
Reported-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit fb59581404 removed
xfs_flushinval_pages() and changed its callers to use
filemap_write_and_wait() and truncate_pagecache_range() directly.
But in xfs_swap_extents() this change accidental switched the argument
for 'tip' to 'ip'. This patch switches it back to 'tip'
Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
One fix for the AMD IOMMU driver to work around broken BIOSes found in
the field. Some BIOSes forget to enable a workaround for a hardware
problem which might cause the IOMMU to stop working under high load
conditions. The fix makes sure this workaround is enabled.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRBpTXAAoJECvwRC2XARrjGDUQAKsNxpN2iD0BEvVUqCzTR7ha
BTwxwKnMwxBr0vZZZzCCT9JnNXcPKKfJYLEWqW5QE7m/qlvYiBxS8Cg8uAfGVw0n
y/y/SQPww7jeImyOCvAN9Axl+SZ8sHmKJTmS4343+CqpQ1e6PilC4WV5ogmOz/Gy
nc9bj9rJGIMEP76bCYY7rMz7xVOaHmIOE+XcEA8TTj37AOk8t9PTUqLno+APTqWd
X3jhgjRTQuisCiy+sTiGllXoa+CdH7+gmDOvd4S8CRzrhIznPDNI+x7UNfq8n5A0
KBqwUEzeQ5fyqqopJQaSaK8+6eTZ8dUxcfpqjyD/sxe7dLY0V+1KBNcNZrOolz/w
juLbV+dTfSJcaJHjjvh1NEqvN4ky/6zuNF50KexaL0DSqpUkPf62heXd+P60l5DE
Tj+h3d8xX/mI1Ap2q14/4Bggvpdz3I+GPWnmyISOI7ZklxB0DlYeQiY+ZYDdO5Bl
4aNvCRRRPEG6TsZzkJR60+iSjUnGEN7PSdrDkFymvmG0U0hH73xcy5Xc4Z3mRffx
HNyK4uAnUNIgPzdZA2K9uctGLOj14Z1n/iREc2FhrGhPeyoaMhXMyWPbTVCQ0Fdx
7cV6sBzuh/RzFD/S8r+VHP4umRg8uf2+22FaAVOaOD1wtO5ug9WAZQ+nqwcOTHNc
YVO8wlC8XyybzM2+Xb7E
=X83i
-----END PGP SIGNATURE-----
Merge tag 'iommu-fixes-v3.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fix from Joerg Roedel:
"One fix for the AMD IOMMU driver to work around broken BIOSes found in
the field. Some BIOSes forget to enable a workaround for a hardware
problem which might cause the IOMMU to stop working under high load
conditions. The fix makes sure this workaround is enabled."
* tag 'iommu-fixes-v3.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
IOMMU, AMD Family15h Model10-1Fh erratum 746 Workaround
We have some build failure fixes (twl4030, vexpress, abx500 and tps65910),
some actual runtime oops and lockup fixes (rtsx, da9052), and some more
hypothetical NULL pointers dereferences fixes for pcf50633 and max776xx.
Then we also have additional rtsx fixes for a correct switch output voltage
and clock divider correctness for rtl8411 (rtsx driver), and irqdomain fix for
db8550-prcmu, and some more cosmetic fixes for arizona and wm5102.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJRBIBuAAoJEIqAPN1PVmxKCZYP/3EP7I6nTnHfmMJHr6KhL9F8
h/PFzSJiYC5DHoYpvcD6ESkDtqgZTOgt/R8VzbzcfCoSAlARCyo3WenCjUhREspW
2vCb+rVqBXc3+pn/Hed5WlTx3a231iSYiQd4OMbDkG22TuTKdf4GOWcl4KnAVMjp
NMqD3wCkDeMkutxRO7eWc+B/eXmYDp38abiYU+xJCMfmpvRwiPp7/RQTw/9kHgF/
VHGqzH91YPJmcF9OcDDzsvJ2zGwPsXPhtsOnwxL7KkjI4WM4EZv8Nr0NwTsuIgNJ
liqs4QO1XpTF+bAPKW/aT4VVLxYmLrzVao+bg6A9Vn5Q6Wt+N4McectvN7yndfOQ
GuSPI+LqcZvDEHaKGybRFdsbN+sh95f7Qz6dbFedJ3nWBhlFd7YiXgkQF3Yg38sX
rbK66F0PuH7F010a3cbhZ4jsHUb1MxzU6YSCLwUvukF1ijitPP89md0K9YaN9cbT
YbBdZpphaiFePz9CjRyyYJvo4DC9i9BTgC8Ac3qiG1TELhb/Dl064d4o0oDDEfzH
qVo21yUWeJ9jsHMnFvJuaDe9IbfxyDWJSLXFPlwaW/1qdbDPKzCr1Sro4v+lmOh5
1RIiHfu52RSPDewo0ACZPPOd8h8/Jfra37CDiGPGnjbEkUJTxC7XfHie6M9034ov
m/ORqHJOi6Wh9Iy7YHM3
=rxug
-----END PGP SIGNATURE-----
Merge tag 'mfd-for-linus-3.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6
Pull MFD fixes from Samuel Ortiz:
"This is the first pull request for MFD fixes for 3.8
We have some build failure fixes (twl4030, vexpress, abx500 and
tps65910), some actual runtime oops and lockup fixes (rtsx, da9052),
and some more hypothetical NULL pointers dereferences fixes for
pcf50633 and max776xx.
Then we also have additional rtsx fixes for a correct switch output
voltage and clock divider correctness for rtl8411 (rtsx driver), and
irqdomain fix for db8550-prcmu, and some more cosmetic fixes for
arizona and wm5102."
* tag 'mfd-for-linus-3.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
mfd: rtsx: Fix oops when rtsx_pci_sdmmc is not probed
mfd: wm5102: Fix definition of WM5102_MAX_REGISTER
mfd: twl4030: Don't warn about uninitialized return code
mfd: da9052/53 lockup fix
mfd: rtsx: Add clock divider hook
mmc: rtsx: Call MFD hook to switch output voltage
mfd: rtsx: Add output voltage switch hook
mfd: Fix compile errors and warnings when !CONFIG_AB8500_BM
mfd: vexpress: Export global functions to fix build error
mfd: arizona: Check errors from regcache_sync()
mfd: tc3589x: Use simple irqdomain
mfd: pcf50633: Init pcf->dev before using it
mfd: max77693: Init max77693->dev before using it
mfd: max77686: Init max77686->dev before using it
mfd: db8500-prcmu: Fix irqdomain usage
mfd: tps65910: Select REGMAP_IRQ in Kconfig to fix build error
mfd: arizona: Disable control interface reporting for WM5102 and WM5110
Pull networking updates from David Miller:
"Much more accumulated than I would have liked due to an unexpected
bout with a nasty flu:
1) AH and ESP input don't set ECN field correctly because the
transport head of the SKB isn't set correctly, fix from Li
RongQing.
2) If netfilter conntrack zones are disabled, we can return an
uninitialized variable instead of the proper error code. Fix from
Borislav Petkov.
3) Fix double SKB free in ath9k driver beacon handling, from Felix
Feitkau.
4) Remove bogus assumption about netns cleanup ordering in
nf_conntrack, from Pablo Neira Ayuso.
5) Remove a bogus BUG_ON in the new TCP fastopen code, from Eric
Dumazet. It uses spin_is_locked() in it's test and is therefore
unsuitable for UP.
6) Fix SELINUX labelling regressions added by the tuntap multiqueue
changes, from Paul Moore.
7) Fix CRC errors with jumbo frame receive in tg3 driver, from Nithin
Nayak Sujir.
8) CXGB4 driver sets interrupt coalescing parameters only on first
queue, rather than all of them. Fix from Thadeu Lima de Souza
Cascardo.
9) Fix regression in the dispatch of read/write registers in dm9601
driver, from Tushar Behera.
10) ipv6_append_data miscalculates header length, from Romain KUNTZ.
11) Fix PMTU handling regressions on ipv4 routes, from Steffen
Klassert, Timo Teräs, and Julian Anastasov.
12) In 3c574_cs driver, add necessary parenthesis to "x << y & z"
expression. From Nickolai Zeldovich.
13) macvlan_get_size() causes underallocation netlink message space,
fix from Eric Dumazet.
14) Avoid division by zero in xfrm_replay_advance_bmp(), from Nickolai
Zeldovich. Amusingly the zero check was already there, we were
just performing it after the modulus :-)
15) Some more splice bug fixes from Eric Dumazet, which fix things
mostly eminating from how we now more aggressively use high-order
pages in SKBs.
16) Fix size calculation bug when freeing hash tables in the IPSEC
xfrm code, from Michal Kubecek.
17) Fix PMTU event propagation into socket cached routes, from Steffen
Klassert.
18) Fix off by one in TX buffer release in netxen driver, from Eric
Dumazet.
19) Fix rediculous memory allocation requirements introduced by the
tuntap multiqueue changes, from Jason Wang.
20) Remove bogus AMD platform workaround in r8169 driver that causes
major problems in normal operation, from Timo Teräs.
21) virtio-net set affinity and select queue don't handle
discontiguous cpu numbers properly, fix from Wanlong Gao.
22) Fix a route refcounting issue in loopback driver, from Eric
Dumazet. There's a similar fix coming that we might add to the
macvlan driver as well.
23) Fix SKB leaks in batman-adv's distributed arp table code, from
Matthias Schiffer.
24) r8169 driver gives descriptor ownership back the hardware before
we're done reading the VLAN tag out of it, fix from Francois
Romieu.
25) Checksums not calculated properly in GRE tunnel driver fix from
Pravin B Shelar.
26) Fix SCTP memory leak on namespace exit."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (101 commits)
dm9601: support dm9620 variant
SCTP: Free the per-net sysctl table on net exit. v2
net: phy: icplus: fix broken INTR pin settings
net: phy: icplus: Use the RGMII interface mode to configure clock delays
IP_GRE: Fix kernel panic in IP_GRE with GRE csum.
sctp: set association state to established in dupcook_a handler
ip6mr: limit IPv6 MRT_TABLE identifiers
r8169: fix vlan tag read ordering.
net: cdc_ncm: use IAD provided by the USB core
batman-adv: filter ARP packets with invalid MAC addresses in DAT
batman-adv: check for more types of invalid IP addresses in DAT
batman-adv: fix skb leak in batadv_dat_snoop_incoming_arp_reply()
net: loopback: fix a dst refcounting issue
virtio-net: reset virtqueue affinity when doing cpu hotplug
virtio-net: split out clean affinity function
virtio-net: fix the set affinity bug when CPU IDs are not consecutive
can: pch_can: fix invalid error codes
can: ti_hecc: fix invalid error codes
can: c_can: fix invalid error codes
r8169: remove the obsolete and incorrect AMD workaround
...
Running AIO is pinning inode in memory using file reference. Once AIO
is completed using aio_complete(), file reference is put and inode can
be freed from memory. So we have to be sure that calling aio_complete()
is the last thing we do with the inode.
CC: xfs@oss.sgi.com
CC: Ben Myers <bpm@sgi.com>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
When the new inode verify in xfs_iread() fails, the create
transaction is aborted and a shutdown occurs. The subsequent unmount
then hangs in xfs_wait_buftarg() on a buffer that has an elevated
hold count. Debug showed that it was an AGI buffer getting stuck:
[ 22.576147] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
[ 22.976213] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
[ 23.376206] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
[ 23.776325] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
The trace of this buffer leading up to the shutdown (trimmed for
brevity) looks like:
xfs_buf_init: bno 0x2 nblks 0x1 hold 1 caller xfs_buf_get_map
xfs_buf_get: bno 0x2 len 0x200 hold 1 caller xfs_buf_read_map
xfs_buf_read: bno 0x2 len 0x200 hold 1 caller xfs_trans_read_buf_map
xfs_buf_iorequest: bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
xfs_buf_hold: bno 0x2 nblks 0x1 hold 1 caller xfs_buf_iorequest
xfs_buf_rele: bno 0x2 nblks 0x1 hold 2 caller xfs_buf_iorequest
xfs_buf_iowait: bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
xfs_buf_ioerror: bno 0x2 len 0x200 hold 1 caller xfs_buf_bio_end_io
xfs_buf_iodone: bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_ioend
xfs_buf_iowait_done: bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
xfs_buf_hold: bno 0x2 nblks 0x1 hold 1 caller xfs_buf_item_init
xfs_trans_read_buf: bno 0x2 len 0x200 hold 2 recur 0 refcount 1
xfs_trans_brelse: bno 0x2 len 0x200 hold 2 recur 0 refcount 1
xfs_buf_item_relse: bno 0x2 nblks 0x1 hold 2 caller xfs_trans_brelse
xfs_buf_rele: bno 0x2 nblks 0x1 hold 2 caller xfs_buf_item_relse
xfs_buf_unlock: bno 0x2 nblks 0x1 hold 1 caller xfs_trans_brelse
xfs_buf_rele: bno 0x2 nblks 0x1 hold 1 caller xfs_trans_brelse
xfs_buf_trylock: bno 0x2 nblks 0x1 hold 2 caller _xfs_buf_find
xfs_buf_find: bno 0x2 len 0x200 hold 2 caller xfs_buf_get_map
xfs_buf_get: bno 0x2 len 0x200 hold 2 caller xfs_buf_read_map
xfs_buf_read: bno 0x2 len 0x200 hold 2 caller xfs_trans_read_buf_map
xfs_buf_hold: bno 0x2 nblks 0x1 hold 2 caller xfs_buf_item_init
xfs_trans_read_buf: bno 0x2 len 0x200 hold 3 recur 0 refcount 1
xfs_trans_log_buf: bno 0x2 len 0x200 hold 3 recur 0 refcount 1
xfs_buf_item_unlock: bno 0x2 len 0x200 hold 3 flags DIRTY liflags ABORTED
xfs_buf_unlock: bno 0x2 nblks 0x1 hold 3 caller xfs_buf_item_unlock
xfs_buf_rele: bno 0x2 nblks 0x1 hold 3 caller xfs_buf_item_unlock
And that is the AGI buffer from cold cache read into memory to
transaction abort. You can see at transaction abort the bli is dirty
and only has a single reference. The item is not pinned, and it's
not in the AIL. Hence the only reference to it is this transaction.
The problem is that the xfs_buf_item_unlock() call is dropping the
last reference to the xfs_buf_log_item attached to the buffer (which
holds a reference to the buffer), but it is not freeing the
xfs_buf_log_item. Hence nothing will ever release the buffer, and
the unmount hangs waiting for this reference to go away.
The fix is simple - xfs_buf_item_unlock needs to detect the last
reference going away in this case and free the xfs_buf_log_item to
release the reference it holds on the buffer.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
There is a window on small filesytsems where specualtive
preallocation can be larger than that ENOSPC throttling thresholds,
resulting in specualtive preallocation trying to reserve more space
than there is space available. This causes immediate ENOSPC to be
triggered, prealloc to be turned off and flushing to occur. One the
next write (i.e. next 4k page), we do exactly the same thing, and so
effective drive into synchronous 4k writes by triggering ENOSPC
flushing on every page while in the window between the prealloc size
and the ENOSPC prealloc throttle threshold.
Fix this by checking to see if the prealloc size would consume all
free space, and throttle it appropriately to avoid premature
ENOSPC...
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
When _xfs_buf_find is passed an out of range address, it will fail
to find a relevant struct xfs_perag and oops with a null
dereference. This can happen when trying to walk a filesystem with a
metadata inode that has a partially corrupted extent map (i.e. the
block number returned is corrupt, but is otherwise intact) and we
try to read from the corrupted block address.
In this case, just fail the lookup. If it is readahead being issued,
it will simply not be done, but if it is real read that fails we
will get an error being reported. Ideally this case should result
in an EFSCORRUPTED error being reported, but we cannot return an
error through xfs_buf_read() or xfs_buf_get() so this lookup failure
may result in ENOMEM or EIO errors being reported instead.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
The stack_switch check currently occurs in __xfs_bmapi_allocate,
which means the stack switch only occurs when xfs_bmapi_allocate()
is called in a loop. Pull the check up before the loop in
xfs_bmapi_write() such that the first iteration of the loop has
consistent behavior.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
9802182 changed the return value from EWRONGFS (aka EINVAL)
to EFSCORRUPTED which doesn't seem to be handled properly by
the root filesystem probe.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Sergei Trofimovich <slyfox@gentoo.org>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
The IOMMU may stop processing page translations due to a perceived lack
of credits for writing upstream peripheral page service request (PPR)
or event logs. If the L2B miscellaneous clock gating feature is enabled
the IOMMU does not properly register credits after the log request has
completed, leading to a potential system hang.
BIOSes are supposed to disable L2B micellaneous clock gating by setting
L2_L2B_CK_GATE_CONTROL[CKGateL2BMiscDisable](D0F2xF4_x90[2]) = 1b. This
patch corrects that for those which do not enable this workaround.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <joro@8bytes.org>
I get the following warning every day with v3.7, once or
twice a day:
[ 2235.186027] WARNING: at /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb8()
As explained by Linus as well:
|
| Once we've done the "list_add_rcu()" to add it to the
| queue, we can have (another) IPI to the target CPU that can
| now see it and clear the mask.
|
| So by the time we get to actually send the IPI, the mask might
| have been cleared by another IPI.
|
This patch also fixes a system hang problem, if the data->cpumask
gets cleared after passing this point:
if (WARN_ONCE(!mask, "empty IPI mask"))
return;
then the problem in commit 83d349f35e ("x86: don't send an IPI to
the empty set of CPU's") will happen again.
Signed-off-by: Wang YanQing <udknight@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: peterz@infradead.org
Cc: mina86@mina86.org
Cc: srivatsa.bhat@linux.vnet.ibm.com
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/20130126075357.GA3205@udknight
[ Tidied up the changelog and the comment in the code. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The recent commit fb6791d100
included the wrong logic. The lvbptr check was incorrectly
added after the patch was tested.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
dm9620 is a newer variant of dm9601 with more features (usb 2.0, checksum
offload, ..), but it can also be put in a dm9601 compatible mode, allowing
us to reuse the existing driver.
This does mean that the extended features like checksum offload cannot be
used, but that's hardly critical on a 100mbps interface.
Thanks to Sławek Wernikowski <slawek@wernikowski.net> for providing me
with a dm9620 based device to test.
Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the setting of the INTR pin that is
valid for IP101 A/G device and not for the IP1001.
Reported-by: Anunay Saxena <anunay.saxena@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like several other PHY devices which support RGMII, the IC+1001 allows
additional delays to by added to the RX_CLK and TX_CLK signals to
compensate for skew between the clock and data signals. Previously this
was always enabled, but this change makes use of the different RGMII
interface modes to allow the user to specify whether this should be
enabled.
Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to IP_GRE GSO support, GRE can recieve non linear skb which
results in panic in case of GRE_CSUM. Following patch fixes it by
using correct csum API.
Bug introduced in commit 6b78f16e4b (gre: add GSO support)
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have historically hard-coded entry points in head.S just so it's easy
to build the executable/bzImage headers with references to them.
Unfortunately, this leads to boot loaders abusing these "known" addresses
even when they are *explicitly* told that they "should look at the ELF
header to find this address, as it may change in the future". And even
when the address in question *has* actually been changed in the past,
without fanfare or thought to compatibility.
Thus we have bootloaders doing stunningly broken things like jumping
to offset 0x200 in the kernel startup code in 64-bit mode, *hoping*
that startup_64 is still there (it has moved at least once
before). And hoping that it's actually a 64-bit kernel despite the
fact that we don't give them any indication of that fact.
This patch should hopefully remove the temptation to abuse internal
addresses in future, where sternly worded comments have not sufficed.
Instead of having hard-coded addresses and saying "please don't abuse
these", we actually pull the addresses out of the ELF payload into
zoffset.h, and make build.c shove them back into the right places in
the bzImage header.
Rather than including zoffset.h into build.c and thus having to rebuild
the tool for every kernel build, we parse it instead. The parsing code
is small and simple.
This patch doesn't actually move any of the interesting entry points, so
any offending bootloader will still continue to "work" after this patch
is applied. For some version of "work" which includes jumping into the
compressed payload and crashing, if the bzImage it's given is a 32-bit
kernel. No change there then.
[ hpa: some of the issues in the description are addressed or
retconned by the 2.12 boot protocol. This patch has been edited to
only remove fixed addresses that were *not* thus retconned. ]
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Link: http://lkml.kernel.org/r/1358513837.2397.247.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Fleming <matt.fleming@intel.com>
The 'Attributes' argument to pci->Attributes() function is 64-bit. So
when invoking in 32-bit mode it takes two registers, not just one.
This fixes memory corruption when booting via the 32-bit EFI boot stub.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1358513837.2397.247.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Fleming <matt.fleming@intel.com>
If the bootloader calls the EFI handover entry point as a standard function
call, then it'll have a return address on the stack. We need to pop that
before calling efi_main(), or the arguments will all be out of position on
the stack.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1358513837.2397.247.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Fleming <matt.fleming@intel.com>
When booting under OVMF we have precisely one GOP device, and it
implements the ConOut protocol.
We break out of the loop when we look at it... and then promptly abort
because 'first_gop' never gets set. We should set first_gop *before*
breaking out of the loop. Yes, it doesn't really mean "first" any more,
but that doesn't matter. It's only a flag to indicate that a suitable
GOP was found.
In fact, we'd do just as well to initialise 'width' to zero in this
function, then just check *that* instead of first_gop. But I'll do the
minimal fix for now (and for stable@).
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1358513837.2397.247.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Fleming <matt.fleming@intel.com>
While sctp handling a duplicate COOKIE-ECHO and the action is
'Association restart', sctp_sf_do_dupcook_a() will processing
the unexpected COOKIE-ECHO for peer restart, but it does not set
the association state to SCTP_STATE_ESTABLISHED, so the association
could stuck in SCTP_STATE_SHUTDOWN_PENDING state forever.
This violates the sctp specification:
RFC 4960 5.2.4. Handle a COOKIE ECHO when a TCB Exists
Action
A) In this case, the peer may have restarted. .....
After this, the endpoint shall enter the ESTABLISHED state.
To resolve this problem, adding a SCTP_CMD_NEW_STATE cmd to the
command list before SCTP_CMD_REPLY cmd, this will set the restart
association to SCTP_STATE_ESTABLISHED state properly and also avoid
I-bit being set in the DATA chunk header when COOKIE_ACK is bundled
with DATA chunks.
Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We did this for IPv4 in b49d3c1e1c "net: ipmr: limit MRT_TABLE
identifiers" but we need to do it for IPv6 as well. On IPv6 the name
is "pim6reg" instead of "pimreg" so there is one less digit allowed.
The strcpy() is in ip6mr_reg_vif().
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Control of receive descriptor must not be returned to ethernet chipset
before vlan tag processing is done.
VLAN tag receive word is now reset both in normal and error path.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Spotted-by: Timo Teras <timo.teras@iki.fi>
Cc: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 9992c2e (net: cdc_ncm: workaround for missing CDC Union)
added code to lookup an IAD for the interface we are probing.
This is redundant. The USB core has already done the lookup
and saved the result in the USB interface struct. Use that
instead.
Cc: Greg Suarez <gsuarez@smithmicro.com>
Cc: Alexey Orishko <alexey.orishko@stericsson.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
- fix an skb memleak in DAT
- fix the ARP filtering routine in DAT by preventing bogus entries to overwrite
already existing ones in the local cache.
- fix the ARP filtering routine in DAT by preventing it to parse and add to the
cache bogus entries
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJRBV1fAAoJEADl0hg6qKeOJnkQALIJCps/Lx5XE+JKTXYlHM/Z
XJs6y5fOZQJiBaRDkp84+802ozcCIUBbZvvzA3GWFIVB63shD+kHrfIQoen8706p
XqlV+Z8ut77RKfAK2+gjyGiYhGuWkuN+KRs3ezcgwQnuBqrPSO42aeK4Tnc7/btN
kMptwcHH2I81WTKXAbMO1pw6O+0NYOFsI9HV7XAi2eLMNl18LK9D+Oj32MCjI/ZP
8ukKdSGkxF4LLi6giSPAHJyShTHAwgdC/MMPjHtemuyPKxINee8I8zF2rndJ1e5i
xl9zaLrdhXFleCmXcrOv7lqVds5kyyZBcQG2ZyIdJl/M5NWkt6BLiILZM8Z7/VX7
tzkJntTHfIftD+Q9MmQsxV79THR7EGUqaxCgRAO3tQwkFFfQoUL8JhnvTS7aeQwq
a8lWZH0q/n1aEC2El22D8H17KS+0Ai0osRITWwAAeoL8dx4h8Yc0FKJSVGOH4qrc
mSAKxacE7JPOz/j/QkL8yBKj5xx7FBWZOcIs7gVgPuIIMTiQHpHZmztupaYiVZLM
ZlavzcnO5NiEidsKzFxrRIj6PaQqq+wACNzELggWdF+ksr7lO3I1pqQTe17VDdeR
+RYUI4+ij+94bkvp6uzoQFJH/opeCvgdG7pl+6DkY97pWg52H8eb4Ercaswao2ix
q7mBawgcIJAtmzEAXsZp
=DvXf
-----END PGP SIGNATURE-----
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Included changes ares:
- fix an skb memleak in DAT
- fix the ARP filtering routine in DAT by preventing bogus entries to overwrite
already existing ones in the local cache.
- fix the ARP filtering routine in DAT by preventing it to parse and add to the
cache bogus entries
Signed-off-by: David S. Miller <davem@davemloft.net>
Define the 2.12 bzImage boot protocol: add xloadflags and additional
fields to allow the command line, initramfs and struct boot_params to
live above the 4 GiB mark.
The xloadflags now communicates if this is a 64-bit kernel with the
legacy 64-bit entry point and which of the EFI handover entry points
are supported.
Avoid adding new read flags to loadflags because of claimed
bootloaders testing the whole byte for == 1 to determine bzImageness
at least until the issue can be researched further.
This is based on patches by Yinghai Lu and David Woodhouse.
Originally-by: Yinghai Lu <yinghai@kernel.org>
Originally-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1359058816-7615-26-git-send-email-yinghai@kernel.org
Cc: Rob Landley <rob@landley.net>
Cc: Gokul Caushik <caushik1@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joe Millenbach <jmillenbach@gmail.com>
We do need to start the lease recovery thread prior to waiting for the
client initialisation to complete in NFSv4.1.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Ben Greear <greearb@candelatech.com>
Cc: stable@vger.kernel.org [>=3.7]
If walking the list in nfs4[01]_walk_client_list fails, then the most
likely explanation is that the server dropped the clientid before we
actually managed to confirm it. As long as our nfs_client is the very
last one in the list to be tested, the caller can be assured that this
is the case when the final return value is NFS4ERR_STALE_CLIENTID.
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org [>=3.7]
Tested-by: Ben Greear <greearb@candelatech.com>
The reference counting in nfs4_init_client assumes wongly that it
is safe for nfs4_discover_server_trunking() to return a pointer to a
nfs_client prior to bumping the reference count.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Ben Greear <greearb@candelatech.com>
Cc: stable@vger.kernel.org [>=3.7]
Currently, nfs_xdev_mount converts all errors from clone_server() to
ENOMEM, which can then leak to userspace (for instance to 'mount'). Fix that.
Also ensure that if nfs_fs_mount_common() returns an error, we
don't dprintk(0)...
The regression originated in commit 3d176e3fe4
(NFS: Use nfs_fs_mount_common() for xdev mounts)
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.5]
We never want multicast MAC addresses in the Distributed ARP Table, so it's
best to completely ignore ARP packets containing them where we expect unicast
addresses.
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
There are more types of IP addresses that may appear in ARP packets that we
don't want to process. While some of these should never appear in sane ARP
packets, a 0.0.0.0 source is used for duplicate address detection and thus seen
quite often.
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
The callers of batadv_dat_snoop_incoming_arp_reply() assume the skb has been
freed when it returns true; fix this by calling kfree_skb before returning as
it is done in batadv_dat_snoop_incoming_arp_request().
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Commit 23caaf19b1 (ALSA: usb-mixer: Add support for Audio Class v2.0)
forgot to adjust the length check for UAC 2.0 feature unit descriptors.
This would make the code abort on encountering a feature unit without
per-channel controls, and thus prevented the driver to work with any
device having such a unit, such as the RME Babyface or Fireface UCX.
Reported-by: Florian Hanisch <fhanisch@uni-potsdam.de>
Tested-by: Matthew Robbetts <wingfeathera@gmail.com>
Tested-by: Michael Beer <beerml@sigma6audio.de>
Cc: Daniel Mack <daniel@caiaq.de>
Cc: 2.6.35+ <stable@vger.kernel.org>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The usual set of driver updates, nothing too thrilling in here - one
core change for the regulator bypass mode which was just not doing the
right thing at all and a bunch of driver specifics.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQ/k4iAAoJELSic+t+oim9vGoP+wVfKgLrhZFl8N/MN4oGtqIH
jsirqWaGHURajrQP09JOI0dpQVzHIOIZZrGQxROErCdpD+bykEVFyR1PHejmr1m/
3Et7HQ+zXHoNycvFj1Bmd800veZC/GOCGXK28fm1VNzVJtYSn4TEebeEsceSL2Hy
DCBMMhhgnmVuhD9UxwjzJWE36c20jeaTuaBKygnByf4J1j6Y29zR6hNbGLhKD2WC
fKLy2aC7DtZQjrJF43hK6RIvKvZUeKtNC01vGRYkABnydAI+gPmT+oYO1OgtsUb6
2Pqb/w35HWYzE1yZErbvhXZLHHfhYDfNOojOpr2AUida7QI42yRVtUpXqQwER2WU
dpFt9XEjXX0fWsOV6ETKYPpJ6iWuJj9ZVqxjjOMI4Hqarb/PAPpsocjcKIOc8yOm
PQwajZDQ9O9muMvUs42YKBpVtiFUE3uAx4dv30SvHl3lYGj7r/xLsvpiNo0aJzw2
GQekVOkGxmHd8Myc7V3yuxibANWg3hRONJpBDpXplpP/1qxWsd/G8ZLm3kUgxBqi
Y76tGEWUBOAY98/oudciskwEXKV0PQHWv17O75jYvIGlj11UzapahJmSVXZ4Dn8/
i5cG7tvStJ6H1CzKFPoMD1rMwDpcjq0jOwACoHapGF1mil8ArsWoCtjdpLQxFkGN
cRs+jfUAZzL3qlIzJoRK
=wG3s
-----END PGP SIGNATURE-----
Merge tag 'asoc-3.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Updates for v3.8-rc4
The usual set of driver updates, nothing too thrilling in here - one
core change for the regulator bypass mode which was just not doing the
right thing at all and a bunch of driver specifics.
John W. Linville says:
====================
This is a batch of fixes intende for the 3.8 stream.
Regarding the iwlwifi bits, Johannes says this:
"Please pull to get a single fix from Emmanuel for a bug I introduced due
to misunderstanding the code."
Regarding the mac80211 bits, Johannes says this:
"I have a few small fixes for you:
* some mesh frames would cause encryption warnings -- fixes from Bob
* scanning would pretty much break an association if we transmitted
anything to the AP while scanning -- fix from Stanislaw
* mode injection was broken by channel contexts -- fix from Felix
* FT roaming was broken: hardware crypto would get disabled by it"
Along with that, a handful of other fixes confined to specific drivers.
Avinash Patil fixes a typo in a NULL check in mwifiex.
Larry Finger fixes a build warning in rtlwifi. Seems safe...
Stanislaw Gruszka fixes iwlegacy to prevent microcode errors when
switching from IBSS mode to STA mode.
Felix Fietkau provides a trio of ath9k fixes related to proper tuning.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Greear reported crashes in ip_rcv_finish() on a stress
test involving many macvlans.
We tracked the bug to a dst use after free. ip_rcv_finish()
was calling dst->input() and got garbage for dst->input value.
It appears the bug is in loopback driver, lacking
a skb_dst_force() before calling netif_rx().
As a result, a non refcounted dst, normally protected by a
RCU read_lock section, was escaping this section and could
be freed before the packet being processed.
[<ffffffff813a3c4d>] loopback_xmit+0x64/0x83
[<ffffffff81477364>] dev_hard_start_xmit+0x26c/0x35e
[<ffffffff8147771a>] dev_queue_xmit+0x2c4/0x37c
[<ffffffff81477456>] ? dev_hard_start_xmit+0x35e/0x35e
[<ffffffff8148cfa6>] ? eth_header+0x28/0xb6
[<ffffffff81480f09>] neigh_resolve_output+0x176/0x1a7
[<ffffffff814ad835>] ip_finish_output2+0x297/0x30d
[<ffffffff814ad6d5>] ? ip_finish_output2+0x137/0x30d
[<ffffffff814ad90e>] ip_finish_output+0x63/0x68
[<ffffffff814ae412>] ip_output+0x61/0x67
[<ffffffff814ab904>] dst_output+0x17/0x1b
[<ffffffff814adb6d>] ip_local_out+0x1e/0x23
[<ffffffff814ae1c4>] ip_queue_xmit+0x315/0x353
[<ffffffff814adeaf>] ? ip_send_unicast_reply+0x2cc/0x2cc
[<ffffffff814c018f>] tcp_transmit_skb+0x7ca/0x80b
[<ffffffff814c3571>] tcp_connect+0x53c/0x587
[<ffffffff810c2f0c>] ? getnstimeofday+0x44/0x7d
[<ffffffff810c2f56>] ? ktime_get_real+0x11/0x3e
[<ffffffff814c6f9b>] tcp_v4_connect+0x3c2/0x431
[<ffffffff814d6913>] __inet_stream_connect+0x84/0x287
[<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49
[<ffffffff8108d695>] ? _local_bh_enable_ip+0x84/0x9f
[<ffffffff8108d6c8>] ? local_bh_enable+0xd/0x11
[<ffffffff8146763c>] ? lock_sock_nested+0x6e/0x79
[<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49
[<ffffffff814d6b49>] inet_stream_connect+0x33/0x49
[<ffffffff814632c6>] sys_connect+0x75/0x98
This bug was introduced in linux-2.6.35, in commit
7fee226ad2 (net: add a noref bit on skb dst)
skb_dst_force() is enforced in dev_queue_xmit() for devices having a
qdisc.
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a cpu notifier to virtio-net, so that we can reset the
virtqueue affinity if the cpu hotplug happens. It improve
the performance through enabling or disabling the virtqueue
affinity after doing cpu hotplug.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Eric Dumazet <erdnetdev@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: virtualization@lists.linux-foundation.org
Cc: netdev@vger.kernel.org
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split out the clean affinity function to virtnet_clean_affinity().
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Eric Dumazet <erdnetdev@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: virtualization@lists.linux-foundation.org
Cc: netdev@vger.kernel.org
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As Michael mentioned, set affinity and select queue will not work very
well when CPU IDs are not consecutive, this can happen with hot unplug.
Fix this bug by traversal the online CPUs, and create a per cpu variable
to find the mapping from CPU to the preferable virtual-queue.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Eric Dumazet <erdnetdev@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: virtualization@lists.linux-foundation.org
Cc: netdev@vger.kernel.org
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>