linux-stable-rt

Commit Graph

Author	SHA1	Message	Date
David Mosberger-Tang	e7e965fa19	[IA64] use srlz.d instead of srlz.i in ia64_leave_kernel() This patch switches the srlz.i in ia64_leave_kernel() to srlz.d. As per architecture manual, the former is needed only to ensure that the clearing of PSR.IC is seen by the VHPT for subsequent instruction fetches. However, since the remainder of the code (up to and including the RFI instruction) is mapped by a pinned TLB entry, there is no chance of an iTLB miss and we don't care whether or not the VHPT sees PSR.IC cleared. Since srlz.d is substantially cheaper than srlz.i, this should shave off a few cycles off the interrupt path (unverified though; I'm not setup to measure this at the moment). Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-27 21:22:08 -07:00
David Mosberger-Tang	fbf7192ba0	[IA64] Annotate fsys_bubble_down() with McKinley dispatch info. This patch changes comments & formatting only. There is no code change. Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-27 21:21:26 -07:00
David Mosberger-Tang	1ba7be7d69	[IA64] Reschedule fsys_bubble_down(). Improvements come from eliminating srlz.i, not scheduling AR/CR-reads too early (while there are others still pending), scheduling the backing-store switch as well as possible, splitting the BBB bundle into a MIB/MBB pair. Why is it safe to eliminate the srlz.i? Observe that we used to clear bits ~PSR_PRESERVED_BITS in PSR.L. Since PSR_PRESERVED_BITS==PSR.{UP,MFL,MFH,PK,DT,PP,SP,RT,IC}, we ended up clearing PSR.{BE,AC,I,DFL,DFH,DI,DB,SI,TB}. However, PSR.BE : already is turned off in __kernel_syscall_via_epc() PSR.AC : don't care (kernel normally turns PSR.AC on) PSR.I : already turned off by the time fsys_bubble_down gets invoked PSR.DFL: always 0 (kernel never turns it on) PSR.DFH: don't care --- kernel never touches f32-f127 on its own initiative PSR.DI : always 0 (kernel never turns it on) PSR.SI : always 0 (kernel never turns it on) PSR.DB : don't care --- kernel never enables kernel-level breakpoints PSR.TB : must be 0 already; if it wasn't zero on entry to __kernel_syscall_via_epc, the branch to fsys_bubble_down will trigger a taken branch; the taken-trap-handler then converts the syscall into a break-based system-call. In other words: all the bits we're clearying are either 0 already or are don't cares! Thus, we don't have to write PSR.L at all and we don't have to do a srlz.i either. Good for another ~20 cycle improvement for EPC-based heavy-weight syscalls. Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-27 21:20:51 -07:00
David Mosberger-Tang	21bc4f9b34	[IA64] Annotate __kernel_syscall_via_epc() with McKinley dispatch info. Two other very minor changes: use "mov.i" instead of "mov" for reading ar.pfs (for clarity; doesn't affect the code at all). Also, predicate the load of r14 for consistency. Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-27 21:20:11 -07:00
David Mosberger-Tang	70929a57cf	[IA64] Reschedule __kernel_syscall_via_epc(). Avoid some stalls, which is good for about 2 cycles when invoking a light-weight handler. When invoking a heavy-weight handler, this helps by about 7 cycles, with most of the improvement coming from the improved branch-prediction achieved by splitting the BBB bundle into two MIB bundles. Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-27 21:19:37 -07:00
David Mosberger-Tang	f8fa5448fc	[IA64] Reschedule break_fault() for better performance. This patch reorganizes break_fault() to optimistically assume that a system-call is being performed from user-space (which is almost always the case). If it turns out that (a) we're not being called due to a system call or (b) we're being called from within the kernel, we fixup the no-longer-valid assumptions in non_syscall() and .break_fixup(), respectively. With this approach, there are 3 major phases: - Phase 1: Read various control & application registers, in particular the current task pointer from AR.K6. - Phase 2: Do all memory loads (load system-call entry, load current_thread_info()->flags, prefetch kernel register-backing store) and switch to kernel register-stack. - Phase 3: Call ia64_syscall_setup() and invoke syscall-handler. Good for 26-30 cycles of improvement on break-based syscall-path. Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-27 21:19:04 -07:00
David Mosberger-Tang	c03f058fbf	[IA64] In ia64_leave_syscall(), fix comments and whitespace only. Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-27 21:18:22 -07:00
David Mosberger-Tang	87e522a0f7	[IA64] Schedule ia64_leave_syscall() to read ar.bsp earlier Reschedule code to read ar.bsp as early as possible. To enable this, don't bother clearing some of the registers when we're returning to kernel stacks. Also, instead of trying to support the pNonSys case (which makes no sense), do a bugcheck instead (with break 0). Finally, remove a clear of r14 which is a left-over from the previous patch. Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-27 21:17:44 -07:00
David Mosberger-Tang	060561ff79	[IA64] In syscall-entry, use st8 instead of stf8 to clear pt_regs.r8 Using stf8 seemed like a clever idea at the time, but stf8 forces the cache-line to be invalidated in the L1D (if it happens to be there already). This patch eliminates a guaranteed L1D cache-miss and, by itself, is good for a 1-2 cycle improvement for heavy-weight syscalls. Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-27 21:17:03 -07:00
David Mosberger-Tang	96e017495e	[IA64] On return from syscall, hint b7 with __kernel_syscall_via_epc(). Why is this a good idea? Clearing b7 to 0 is guaranteed to do us no good and writing it with __kernel_syscall_via_epc() yields a 6 cycle improvement _if_ the application performs another EPC-based system- call without overwriting b7, which is not all that uncommon. Well worth the minimal cost of 1 bundle of code. Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-27 21:16:07 -07:00
David Mosberger-Tang	3c79c8b1d9	[IA64] Schedule fp-clearing insns at least 6 cycles after reading ar.bsp. Decreases syscall overhead by approximately 6 cycles. Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-27 21:15:13 -07:00
David Mosberger-Tang	9ec1a7ad43	[IA64] Use dynamic prediction for RSE-clearing branches. This by itself is good for a 1-2 cycle speed up. Effect is bigger when combined with the later patches. Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-27 21:13:33 -07:00
David Mosberger-Tang	06ef660816	[IA64] __ia64_syscall() is no longer used anywhere in the kernel. Remove it. Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-27 21:10:45 -07:00
Kenji Kaneshige	b9e41d7fb6	[IA64] iosapic.c: typo ... s/spin_unlock_irq/spin_unlock/ vector sharing patch had a typo ... mismatched spin_lock() with a spin_unlock_irq(). Fix from Kenji Kaneshige. Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-25 13:27:48 -07:00
Tony Luck	e1ed81ab7a	[IA64] print "siblings" before {physical,core,thread} id Rohit and Suresh changed their mind about the order to print things in /proc/cpuinfo, but didn't include the change in the version of the patch they sent to me. Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-25 13:27:12 -07:00
Kenji Kaneshige	24eeb568ae	[IA64] vector sharing (Large I/O system support) Current ia64 linux cannot handle greater than 184 interrupt sources because of the lack of vectors. The following patch enables ia64 linux to handle greater than 184 interrupt sources by allowing the same vector number to be shared by multiple IOSAPIC's RTEs. The design of this patch is besed on "Intel(R) Itanium(R) Processor Family Interrupt Architecture Guide". Even if you don't have a large I/O system, you can see the behavior of vector sharing by changing IOSAPIC_LAST_DEVICE_VECTOR to fewer value. Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-25 13:26:23 -07:00
Suresh Siddha	e927ecb05e	[IA64] multi-core/multi-thread identification Version 3 - rediffed to apply on top of Ashok's hotplug cpu patch. /proc/cpuinfo output in step with x86. This is an updated MC/MT identification patch based on the previous discussions on list. Add the Multi-core and Multi-threading detection for IPF. - Add new core and threading related fields in /proc/cpuinfo. Physical id Core id Thread id Siblings - setup the cpu_core_map and cpu_sibling_map appropriately - Handles Hot plug CPU Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Gordon Jin <gordon.jin@intel.com> Signed-off-by: Rohit Seth <rohit.seth@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-25 13:25:06 -07:00
David Mosberger-Tang	a37d98f6a9	[IA64] fix syscall-optimization goof Sadly, I goofed in this syscall-tuning patch: ChangeSet 1.1966.1.40 2005/01/22 13:31:05 davidm@hpl.hp.com [IA64] Improve ia64_leave_syscall() for McKinley-type cores. Optimize ia64_leave_syscall() a bit better for McKinley-type cores. The patch looks big, but that's mostly due to renaming r16/r17 to r2/r3. Good for a 13 cycle improvement. The problem is that the size of the physical stacked registers was loaded into the wrong register (r3 instead of r17). Since r17 by coincidence always had the value 1, this had the effect of turning rse_clear_invalid into a no-op. That poses the risk of leaking kernel state back to user-land and is hence not acceptable. The fix below is simple, but unfortunately it costs us about 28 cycles in syscall overhead. ;-( Unfortunately, there isn't much we can do about that since those registers have to be cleared one way or another. --david Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-25 13:20:38 -07:00
Stephane Eranian	4944930ab7	[IA64] perfmon: make pfm_sysctl a global, and other cleanup - make pfm_sysctl a global such that it is possible to enable/disable debug printk in sampling formats using PFM_DEBUG. - remove unused pfm_debug_var variable - fix a bug in pfm_handle_work where an BUG_ON() could be triggered. There is a path where pfm_handle_work() can be called with interrupts enabled, i.e., when TIF_NEED_RESCHED is set. The fix correct the masking and unmasking of interrupts in pfm_handle_work() such that we restore the interrupt mask as it was upon entry. signed-off-by: stephane eranian <eranian@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-25 13:08:30 -07:00
David Mosberger-Tang	30325d1771	[IA64] speed up syscall path a bit more Recently I noticed that clearing ar.ssd/ar.csd right before srlz.d is causing significant stalling in the syscall path. The patch below fixes that by moving the register-writes after srlz.d. On a Madison, this drops break-based getpid() from 241 to 226 cycles (-15 cycles). Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-25 13:03:16 -07:00
Keith Owens	e8d1cb2f28	[IA64] Tighten up unw_unwind_to_user check Detect user space by the unwind frame with predicate PRED_USER_STACK set, instead of a user space IP. Tighten up the last ditch check for running off the top of the kernel stack. Based on a suggestion by David Mosberger, reworked to fit the current tree. This survives my stress test which used to break 2.6.9 kernels. Unlike 2.6.11, the stress test now unwinds to the correct point, so gdb can get the user space registers. Signed-off-by: Keith Owens <kaos@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-25 11:45:26 -07:00
David Mosberger-Tang	8297511530	[IA64] add missing cpu_relax() in ITC syncing code Call cpu_relax() in busy-waiting loops of the ITC-syncing code. Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-25 11:44:02 -07:00
Ashok Raj	df6c6804ce	[IA64] Fix build errors for !HOTPLUG case. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-22 14:46:24 -07:00
Ashok Raj	b8d8b883e6	[IA64] cpu hotplug: return offlined cpus to SAL This patch is required to support cpu removal for IPF systems. Existing code just fakes the real offline by keeping it run the idle thread, and polling for the bit to re-appear in the cpu_state to get out of the idle loop. For the cpu-offline to work correctly, we need to pass control of this CPU back to SAL so it can continue in the boot-rendez mode. This gives the SAL control to not pick this cpu as the monarch processor for global MCA events, and addition does not wait for this cpu to checkin with SAL for global MCA events as well. The handoff is implemented as documented in SAL specification section 3.2.5.1 "OS_BOOT_RENDEZ to SAL return State" Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-22 14:44:40 -07:00
Linus Torvalds	1da177e4c3	Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!	2005-04-16 15:20:36 -07:00

1 2 3

125 Commits