original_kernel/block
Tejun Heo 17497acbdc blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode
blk-mq uses percpu_ref for its usage counter which tracks the number
of in-flight commands and used to synchronously drain the queue on
freeze.  percpu_ref shutdown takes measureable wallclock time as it
involves a sched RCU grace period.  This means that draining a blk-mq
takes measureable wallclock time.  One would think that this shouldn't
matter as queue shutdown should be a rare event which takes place
asynchronously w.r.t. userland.

Unfortunately, SCSI probing involves synchronously setting up and then
tearing down a lot of request_queues back-to-back for non-existent
LUNs.  This means that SCSI probing may take above ten seconds when
scsi-mq is used.

  [    0.949892] scsi host0: Virtio SCSI HBA
  [    1.007864] scsi 0:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    1.1. PQ: 0 ANSI: 5
  [    1.021299] scsi 0:0:1:0: Direct-Access     QEMU     QEMU HARDDISK    1.1. PQ: 0 ANSI: 5
  [    1.520356] tsc: Refined TSC clocksource calibration: 2491.910 MHz

  <stall>

  [   16.186549] sd 0:0:0:0: Attached scsi generic sg0 type 0
  [   16.190478] sd 0:0:1:0: Attached scsi generic sg1 type 0
  [   16.194099] osd: LOADED open-osd 0.2.1
  [   16.203202] sd 0:0:0:0: [sda] 31457280 512-byte logical blocks: (16.1 GB/15.0 GiB)
  [   16.208478] sd 0:0:0:0: [sda] Write Protect is off
  [   16.211439] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
  [   16.218771] sd 0:0:1:0: [sdb] 31457280 512-byte logical blocks: (16.1 GB/15.0 GiB)
  [   16.223264] sd 0:0:1:0: [sdb] Write Protect is off
  [   16.225682] sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

This is also the reason why request_queues start in bypass mode which
is ended on blk_register_queue() as shutting down a fully functional
queue also involves a RCU grace period and the queues for non-existent
SCSI devices never reach registration.

blk-mq basically needs to do the same thing - start the mq in a
degraded mode which is faster to shut down and then make it fully
functional only after the queue reaches registration.  percpu_ref
recently grew facilities to force atomic operation until explicitly
switched to percpu mode, which can be used for this purpose.  This
patch makes blk-mq initialize q->mq_usage_counter in atomic mode and
switch it to percpu mode only once blk_register_queue() is reached.

Note that this issue was previously worked around by 0a30288da1
("blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during
probe") for v3.17.  The temp fix was reverted in preparation of adding
persistent atomic mode to percpu_ref by 9eca80461a ("Revert "blk-mq,
percpu_ref: implement a kludge for SCSI blk-mq stall during probe"").
This patch and the prerequisite percpu_ref changes will be merged
during v3.18 devel cycle.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Christoph Hellwig <hch@infradead.org>
Link: http://lkml.kernel.org/g/20140919113815.GA10791@lst.de
Fixes: add703fda9 ("blk-mq: use percpu_ref for mq usage count")
Reviewed-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
2014-09-24 13:37:21 -04:00
..
partitions partitions: aix.c: off by one bug 2014-08-05 13:13:24 -06:00
Kconfig
Kconfig.iosched
Makefile block: move mm/bounce.c to block/ 2014-05-19 20:01:52 -06:00
bio-integrity.c block: Fix BUG_ON when pi errors occur 2014-08-21 20:37:47 -05:00
bio.c block: use kmalloc alignment for bio slab 2014-08-01 12:30:34 -04:00
blk-cgroup.c Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2014-08-04 10:11:28 -07:00
blk-cgroup.h Revert "block: add __init to blkcg_policy_register" 2014-06-22 16:34:11 -06:00
blk-core.c scsi-mq: fix requests that use a separate CDB buffer 2014-08-22 15:04:31 -05:00
blk-exec.c blk-mq: avoid infinite recursion with the FUA flag 2014-09-22 11:55:19 -06:00
blk-flush.c block: remove elv_abort_queue and blk_abort_flushes 2014-06-11 15:31:21 -06:00
blk-integrity.c
blk-ioc.c
blk-iopoll.c Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next 2014-06-03 12:57:53 -07:00
blk-lib.c block/blk-lib.c: make __blkdev_issue_zeroout static 2014-05-26 17:39:09 -06:00
blk-map.c
blk-merge.c blk-merge: fix blk_recount_segments 2014-09-02 10:25:12 -06:00
blk-mq-cpu.c blk-mq: add file comments and update copyright notices 2014-05-28 10:15:41 -06:00
blk-mq-cpumap.c blk-mq: add file comments and update copyright notices 2014-05-28 10:15:41 -06:00
blk-mq-sysfs.c blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode 2014-09-24 13:37:21 -04:00
blk-mq-tag.c blk-mq: bitmap tag: fix races in bt_get() function 2014-06-17 22:13:08 -07:00
blk-mq-tag.h blk-mq: bitmap tag: fix races on shared ::wake_index fields 2014-06-17 22:12:35 -07:00
blk-mq.c blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode 2014-09-24 13:37:21 -04:00
blk-mq.h blk-mq: decouble blk-mq freezing from generic bypassing 2014-07-01 10:31:13 -06:00
blk-settings.c block: ensure that bio_add_page() always accepts a page for an empty bio 2014-06-10 12:53:56 -06:00
blk-softirq.c
blk-sysfs.c blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode 2014-09-24 13:37:21 -04:00
blk-tag.c block: don't assume last put of shared tags is for the host 2014-07-08 12:25:28 +02:00
blk-throttle.c cgroup: remove sane_behavior support on non-default hierarchies 2014-07-09 10:08:08 -04:00
blk-timeout.c block: ensure that the timer is always added 2014-05-30 15:41:39 -06:00
blk.h block: remove elv_abort_queue and blk_abort_flushes 2014-06-11 15:31:21 -06:00
bounce.c mm: convert some level-less printks to pr_* 2014-06-06 16:08:18 -07:00
bsg-lib.c
bsg.c block: add blk_rq_set_block_pc() 2014-06-06 07:57:37 -06:00
cfq-iosched.c cfq-iosched: Add comments on update timing of weight 2014-08-28 08:16:29 -06:00
cmdline-parser.c
compat_ioctl.c Merge branch 'for-3.17/core' of git://git.kernel.dk/linux-block 2014-08-14 09:07:02 -06:00
deadline-iosched.c
elevator.c Revert "block: add __init to elv_register" 2014-06-22 16:34:11 -06:00
genhd.c genhd: fix leftover might_sleep() in blk_free_devt() 2014-09-22 14:45:45 -06:00
ioctl.c block: fix BLKSECTGET ioctl when max_sectors is greater than USHRT_MAX 2014-07-01 10:43:07 -06:00
ioprio.c block: move ioprio.c from fs/ to block/ 2014-05-19 11:02:18 -06:00
noop-iosched.c
partition-generic.c block: Fix dev_t minor allocation lifetime 2014-09-03 15:01:02 -06:00
scsi_ioctl.c block: fix error handling in sg_io 2014-08-26 08:20:01 -06:00