Various scsi drivers use scsi_cmnd.buffer and scsi_cmnd.bufflen in their
queuecommand functions. Those fields are internal storage for the
midlayer only and are used to restore the original payload after
request_buffer and request_bufflen have been overwritten for EH. Using
the buffer and bufflen fields means they do very broken things in error
handling.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch( originally submitted by Christoph Hellwig) removes
instance_lock and changes fw_outstanding variable data type to
atomic_t.
Signed-off-by: Sumant Patro <Sumant.Patro@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We can race and misset the suspend bit if iscsi_write_space is
called then iscsi_send returns with a failure indicating
there is no space.
To handle this this patch returns a error upwards allowing xmitworker
to decide if we need to try and transmit again. For the no
write space case xmitworker will not retry, and instead
let iscsi_write_space queue it back up if needed (this relies
on the work queue code to properly requeue us if needed).
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
from davidw@netapp.com:
remove task type should return a task on success.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
from davidw@netapp.com:
We must grab the session lock when modifying the running lists.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
If recovery failed or we are in recovery only overwrite the state
if we are going to terminate the session or if we logged back in.
STOP_CONN_SUSPEND and conn_cnt are not used. We only support
a single connection session ATM, so cleanup that code while
we are working around it.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
MPT fusion driver initialization fails while second kernel is booting,
after a system crash (if kdump kernel is configured). Oops message is
pasted below.
*****************************************************************************
Fusion MPT base driver 3.03.08
Copyright (c) 1999-2005 LSI Logic Corporation
Fusion MPT SAS Host driver 3.03.08 ACPI: PCI Interrupt 0000:01:00.0[A] -> Link [LNKA] -> GSI 5 (level, low) -> IRQ 5
mptbase: Initiating ioc0 bringup
BUG: unable to handle kernel paging request at virtual address 00002608
printing eip:
c11782fd
*pde = 00000000
Oops: 0000 [#1]
Modules linked in:
CPU: 0
EIP: 0060:[<c11782fd>] Not tainted VLI
EFLAGS: 00010046 (2.6.17-rc1-16M #2)
EIP is at mptscsih_io_done+0x27/0x3a3
eax: c4fed000 ebx: c4fed000 ecx: 00002600 edx: 00000298
esi: c11782d6 edi: 00002600 ebp: 00000000 esp: c1332f74
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c1332000 task=c128f9c0) Stack: <0>0000006c 00000020 00000298 00002600 c4fed000 c4fed000 c11782d6 0000260 0
00000000 c1172c49 c4fed000 c1305b40 00000005 00000000 c1172d75 c48877e0
c1029687 00000000 c1307fb8 00000000 c1305a00 00000001 00000000 c1307fb8
Call Trace:
<c11782d6> mptscsih_io_done+0x0/0x3a3 <c1172c49> mpt_turbo_reply+0xbb/0xd3
<c1172d75> mpt_interrupt+0x22/0x2b <c1029687> misrouted_irq+0x63/0xcb
<c10297b3> note_interrupt+0x43/0x98 <c10292f9> __do_IRQ+0x68/0x8f
<c1003fac> do_IRQ+0x36/0x4e
=======================
<c1002aa6> common_interrupt+0x1a/0x20 <c1001150> mwait_idle+0x1a/0x2a
<c10010bf> cpu_idle+0x40/0x5c <c1308610> start_kernel+0x17a/0x17c Code: 5e 5f 5d c3 55 89 cd 57 56 53 83 ec 14 89 54 24 0c 89 44 24 10 8b 90 cc 00 00 00 8b 4c 24 0c 81 c2 98 02 00 00 85 ed 89 54 24 08 <0f> b7 79 08 89 fe 74 04 0f b7 75 08 66 39 f7 75 0d 8b 44 24 0c
*******************************************************************************
o Kdump capture kernel boot fails during initialization of MPT fusion driver.
(LSI Logic / Symbios Logic SAS1064E PCI-Express Fusion-MPT SAS (rev 01))
o Problem is easily reproducible, if system crashed while some disk activity
like cp operation was going on.
o After a system crash, devices are not shutdown and capture kernel starts
booting while skipping BIOS. Hence underlying device is left in operational
state. In this case scsi contoller was left with interrupt line asserted
reply FIFO was not empty. When driver starts initializing in the second
kernel, it receives the interrupt the moment request_irq() is called.
Interrupt handler, reads the message from reply FIFO and tries to access
the associated message frame and panics, as in the new kernel's context
that message frame is not valid at all.
o In this scenario, probably we should delay the request_irq() call. First
bring up the IOC, reset it if needed and then should register for irq.
o I have tested the patch with SAS1064E and 53c1030 controllers.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Acked-by: "Moore, Eric Dean" <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
max_id now means the maximum number of ids on the bus, which means it
is one greater than the largest possible id number.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The scsi_scan_host_selected() should return -EINVAL when the id is equal
to the max_id. Currently it uses ">" when comparing with max_id, and
hence leaves the border case when "id==max_id".
The channel and lun have values valid from 0 up to,
and including, max_channel or max_lun. But, the valid values for id
range from 0 to max_id-1. This patch fixes the problem.
Signed-off-by: Amit Arora <aarora@in.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Bump up version number, skip "4.6.0" because this might
clash with zfcp version in certain distros.
Signed-off-by: Andreas Herrmann <aherrman@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
If zfcp's port erp fails we now call fc_remote_port_delete. This helps
to avoid offlined scsi devices if scsi commands time out due to path
failures. When an adapter erp fails we call fc_remote_port_delete for
all ports on that adapter.
Signed-off-by: Andreas Herrmann <aherrman@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Ralph Wuerthner <rwuerthn@de.ibm.com>
Signed-off-by: Andreas Herrmann <aherrman@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Replace hex dump of bit error threshold data by log message showing
bit error threshold data human readable.
Signed-off-by: Ralph Wuerthner <rwuerthn@de.ibm.com>
Signed-off-by: Andreas Herrmann <aherrman@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Removed some macros, struct members and typedefs which were
unused or not necessary.
Signed-off-by: Andreas Herrmann <aherrman@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Replace kmalloc/memset by kzalloc or kcalloc.
Signed-off-by: Andreas Herrmann <aherrman@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Relax the lowmem bounce buffer requirement for imm so that any
low memory page will do -- they don't need to be below the
ISA 16 MB limit, just need to be mapped in low memory.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Semantic changes in ISP24xx firmware behaviour inadvertently
caused the driver to believe an F-port topology was present in an
N_port-to-N_port configuration.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Simplify and centralise buffer allocation/deallocation, as
there's no point in having two memory request methods.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Expandind on the previous commit:
commit 79f89a4296
Author: andrew.vasquez@qlogic.com <andrew.vasquez@qlogic.com>
Date: Fri Jan 13 17:05:58 2006 -0800
[SCSI] qla2xxx: Disable port-type RSCN handling via driver state-machine.
and given:
- the process-context requirements of the FC transport
rport-APIs.
- lack of port-type RSCN processing logic for ISP24xx and newer
chips.
it's time now to remove the state-machine logic from mainline.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
In qla2x00_reset_chip the driver first takes the hardware lock,
and then later on takes the mbx lock.
In the mailbox_command code.. it goes the other way around.
Discovered with the lock validator.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
If firmware image is unavailable via request_firwmare(), then
attempt to load the image (likely out-of-date) stored in flash
memory.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
With ISP24XX and ISP54XX parts.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Do not flush queues then block session. This will cause commands
to needlessly swing around on us and remove goofy
recovery_failed field and replace with state value.
And do not start recovery from within the host reset function.
This causeis too many problems becuase open-iscsi was desinged to
call out to userspace then have userpscae decide if we should
go into recovery or kill the session.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Patch from david.somayajulu@qlogic.com and cleaned up by Tomo.
qla4xxx is going to have a different daemon so this patch
just routes the events to the right daemon.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Discovered by steven@hayter.me.uk and patch by michaelc@cs.wisc.edu
The dtask mempool is reserving 261120 items per session! Since we are now
sending headers with sendmsg there is no reason for the mempool and that
was causing us to us carzy amounts of mem. We can preallicate a header in
the r2t and task struct and reuse them
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We only use the mtask data buffer for login tasks so we do not
need to preallocate a buffer for every mtask. This saves
8 * 31 KB.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From Zhen and ported by Mike:
Don't use sendpage for the headers. sendpage for the pdu headers
does not seem to have a performance impact, makes life harder
for mutiple data pdus to be in flight and still trips up some
network cards when it is from slab mem.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The Coverity checker found a memory leak (bug nr. 1245) in
drivers/block/DAC960.c::DAC960_V2_ProcessCompletedCommand()
The leak is pretty unlikely since it requires that the first of two
successive kmalloc() calls fail while the second one succeeds. But it can
still happen even if it's unlikely.
If the first call that allocates 'PhysicalDeviceInfo' fails but the one
that allocates 'InquiryUnitSerialNumber' succeeds, then we will leak the
memory allocated to 'InquiryUnitSerialNumber' when the variable goes out
of scope.
A simple fix for this is to change the existing code that frees
'PhysicalDeviceInfo' if that one was allocated but
'InquiryUnitSerialNumber' was not, into a check for either pointer
being NULL and if so just free both. This is safe since kfree() can
deal with being passed a NULL pointer and it avoids the leak.
While I was there I also removed the casts of the kmalloc() return
value since it's pointless.
I also updated the driver version since this patch changes the workings of
the code (however slightly).
This issue could probably be fixed a lot more elegantly, but the code
is a big mess IMHO and I just took the least intrusive route to a fix
that I could find instead of starting on a cleanup as well (that can
come later).
Please consider for inclusion.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received From Mark Salyzyn
The queue tracking is just not being used, not even for debugging. Information
about outstanding commands can be acquired from the scsi structures.
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received From Mark Salyzyn
A race condition existed that could result in a lost completion of a
command to the ppc based cards.
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received From Mark Salyzyn
Add the ability to adjust for unusual corner case failures. Both of
these additional module parameters deal with embedded, non-intel or
complicated system scenarios.
Aif_timeout can be increased past the default 2 minute timeout to drop
application registrations when a system has an unusually high event load
resulting from continuing management requests, or simultaneous builds,
or sluggish user space as a result of system load.
Startup_timeout can be increased past the default 3 minute timeout to
drop an adapter initialization for systems that have a very large number
of targets, or slow to spin-up targets, or a complicated set of array
configurations that extend the time for the firmware to declare that it
is operational. This timeout would only have an affect on non-intel
based systems, as the (more patient) BIOS would generally be where the
startup delay would be dealt with.
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received From Mark Salyzyn
Slight space and speed efficiency improvement.
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received From Mark Salyzyn
Since new commands to the card are quiesced, respect the changes in
the SCSI error path which dropped locking around the hba reset handler
and similarly drop the lock requirement in the driver's path.
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Problem spotted by: Suzuki K P <suzuki@in.ibm.com>
A zero return on success isn't correct for filesystem write functions.
They should either return negative error or the length of bytes
consumed. Add code to convert our zero on success error return to
return the length of bytes passed in.
This fixes the following:
$ echo "scsi remove-single-device 0 0 3 0" > /proc/scsi/scsi
bash: echo: write error: No such device or address"
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
debugged by wrwhitehead@novell.com
patch and analysis by fujita.tomonori@lab.ntt.co.jp
Only tcp_read_sock and recv_actor (iscsi_tcp_data_recv for us) see
desc.count. It is is used just for permitting tcp_read_sock to read
the portion of data in the socket.
When iscsi_tcp_data_recv sees a partial header, it sets
desc.count. However, it is possible that the next skb (containing the
rest of the header) still does not come. So I'm not sure that this
scheme is completely correct.
Ideally, we should use the exact length of the data in the socket for
desc.count. However, it is not so simple (see SIOCINQ in
tcp_ioctl). So I think that iscsi_tcp_data_recv can just stop playing
with desc.count and tell tcp_read_sock to read the all skbs. As
proposed already, if iscsi_tcp_data_ready sets desc.count to
non-zero, tcp_read_sock does that.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
debugged by Ming and Rohan:
The problem Ming and Rohan debugged was that during a normal session
login, open-iscsi is not incrementing the exp_statsn counter. It was
stuck at zero. From the RFC, it looks like if the login response PDU has
a successful status then we should be incrementing that value. Also from
the RFC, it looks like if when we drop a connection then reconnect, we
should be using the exp_statsn from the old connection in the next
relogin attempt.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
align printk output
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>