linux-stable-rt


K.Tanaka a07e6ab41b md: the md RAID10 resync thread could cause a md RAID10 array deadlock

This message describes another md RAID10 issue, found by testing the 2.6.24 md RAID10 code with the new SCSI fault injection framework.

Abstract: When a SCSI error disables a disk during RAID10 recovery, the md RAID10 resync threads can stall. In this case the RAID array is already broken, so it may not matter much, but a stall is still undesirable: if it occurs, even shutdown or reboot fails because the resource is busy.

The deadlock mechanism: The r10bio_s structure has a "remaining" member that tracks the BIOs still to be handled during recovery. The "remaining" counter is incremented when a BIO is built in sync_request() and decremented when a BIO finishes in end_sync_write(). If building a BIO fails for some reason in sync_request(), "remaining" must be decremented if it has already been incremented. I found a case where this decrement is forgotten. The result is an md_do_sync() deadlock: md_do_sync() waits for md_done_sync(), which is called by end_sync_write(), but because of the "remaining" counter mismatch end_sync_write() never calls md_done_sync().

For example, the problem can be reproduced in the following case:

    Personalities : [raid10]
    md0 : active raid10 sdf1[4] sde1[5](F) sdd1[2] sdc1[1] sdb1[6](F)
          3919616 blocks 64K chunks 2 near-copies [4/2] [_UU_]
          [>....................]  recovery =  2.2% (45376/1959808) finish=0.7min speed=45376K/sec

Here sdf1 is recovering, and sdb1 and sde1 are disabled. An additional error from detaching sdd then causes a deadlock:

    md0 : active raid10 sdf1[4] sde1[5](F) sdd1[6](F) sdc1[1] sdb1[7](F)
          3919616 blocks 64K chunks 2 near-copies [4/1] [_U__]
          [=>...................]  recovery =  5.0% (99520/1959808) finish=5.9min speed=5237K/sec

     2739 ?        S<     0:17 [md0_raid10]
    28608 ?        D<     0:00 [md0_resync]
    28629 pts/1    Ss     0:00 bash
    28830 pts/1    R+     0:00 ps ax
    31819 ?        D<     0:00 [kjournald]

The resync thread appears to keep working, but it is actually deadlocked.

Patch: With this patch, the "remaining" counter is decremented when needed.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>		2008-03-04 16:35:18 -08:00
raid6test	md: raid6: clean up the style of raid6test/test.c	2008-02-06 10:41:18 -08:00
.gitignore	…
Kconfig	dm: targets no longer experimental	2008-02-08 02:10:32 +00:00
Makefile	…
bitmap.c	md: reduce CPU wastage on idle md array with a write-intent bitmap	2008-03-04 16:35:17 -08:00
dm-bio-list.h	…
dm-bio-record.h	…
dm-crypt.c	dm crypt: use async crypto	2008-02-08 02:11:14 +00:00
dm-delay.c	…
dm-emc.c	…
dm-exception-store.c	dm snapshot: use uninitialized_var	2008-02-08 02:10:11 +00:00
dm-hw-handler.c	…
dm-hw-handler.h	…
dm-io.c	…
dm-io.h	…
dm-ioctl.c	dm ioctl: use uninitialized_var	2008-02-08 02:10:16 +00:00
dm-linear.c	…
dm-log.c	dm log: auto load modules	2008-02-08 02:11:19 +00:00
dm-log.h	…
dm-mpath-hp-sw.c	…
dm-mpath-rdac.c	…
dm-mpath.c	dm mpath: add missing static	2008-02-08 02:10:35 +00:00
dm-mpath.h	…
dm-path-selector.c	…
dm-path-selector.h	…
dm-raid1.c	dm-raid1.c: fix NULL dereferences	2008-02-19 15:52:27 -08:00
dm-round-robin.c	…
dm-snap.c	dm snapshot: combine consecutive exceptions in memory	2008-02-08 02:11:27 +00:00
dm-snap.h	dm snapshot: combine consecutive exceptions in memory	2008-02-08 02:11:27 +00:00
dm-stripe.c	dm: stripe enhanced status return	2008-02-08 02:11:24 +00:00
dm-table.c	Introduce path_put()	2008-02-14 21:13:33 -08:00
dm-target.c	…
dm-uevent.c	…
dm-uevent.h	…
dm-zero.c	…
dm.c	dm: move deferred bio flushing to workqueue	2008-02-08 02:11:17 +00:00
dm.h	dm: trigger change uevent on rename	2007-12-20 17:32:11 +00:00
faulty.c	md: change ITERATE_RDEV to rdev_for_each	2008-02-06 10:41:19 -08:00
kcopyd.c	…
kcopyd.h	…
linear.c	md: change ITERATE_RDEV to rdev_for_each	2008-02-06 10:41:19 -08:00
md.c	md: lock access to rdev attributes properly	2008-03-04 16:35:18 -08:00
mktables.c	md: raid6: Fix mktable.c	2008-02-06 10:41:18 -08:00
multipath.c	md: change ITERATE_RDEV to rdev_for_each	2008-02-06 10:41:19 -08:00
raid0.c	md: change ITERATE_RDEV to rdev_for_each	2008-02-06 10:41:19 -08:00
raid1.c	md: fix possible raid1/raid10 deadlock on read error during resync	2008-03-04 16:35:18 -08:00
raid5.c	md: fix an occasional deadlock in raid5	2008-02-06 10:41:19 -08:00
raid6.h	…
raid6algos.c	…
raid6altivec.uc	…
raid6int.uc	…
raid6mmx.c	…
raid6recov.c	…
raid6sse1.c	…
raid6sse2.c	…
raid6x86.h	…
raid10.c	md: the md RAID10 resync thread could cause a md RAID10 array deadlock	2008-03-04 16:35:18 -08:00
unroll.pl	…