linux-stable-rt

Commit Graph

Author	SHA1	Message	Date
Al Viro	7df336ec12	Fix btrfs when ACLs are configured out ... otherwise generic_permission() will allow anything for all files you don't own and that have some group permissions. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-06-10 11:36:43 -04:00
Hisashi Hifumi	524724ed1f	Btrfs: fdatasync should skip metadata writeout In btrfs, fdatasync and fsync are identical, but fdatasync should skip committing transaction when inode->i_state is set just I_DIRTY_SYNC and this indicates only atime or/and mtime updates. Following patch improves fdatasync throughput. --file-block-size=4K --file-total-size=16G --file-test-mode=rndwr --file-fsync-mode=fdatasync run Results: -2.6.30-rc8 Test execution summary: total time: 1980.6540s total number of events: 10001 total time taken by event execution: 1192.9804 per-request statistics: min: 0.0000s avg: 0.1193s max: 15.3720s approx. 95 percentile: 0.7257s Threads fairness: events (avg/stddev): 625.0625/151.32 execution time (avg/stddev): 74.5613/9.46 -2.6.30-rc8-patched Test execution summary: total time: 1695.9118s total number of events: 10000 total time taken by event execution: 871.3214 per-request statistics: min: 0.0000s avg: 0.0871s max: 10.4644s approx. 95 percentile: 0.4787s Threads fairness: events (avg/stddev): 625.0000/131.86 execution time (avg/stddev): 54.4576/8.98 Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-06-10 11:29:53 -04:00
David Woodhouse	163e783e6a	Btrfs: remove crc32c.h and use libcrc32c directly. There's no need to preserve this abstraction; it used to let us use hardware crc32c support directly, but libcrc32c is already doing that for us through the crypto API -- so we're already using the Intel crc32c acceleration where appropriate. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-06-10 11:29:53 -04:00
Christoph Hellwig	6cbff00f46	Btrfs: implement FS_IOC_GETFLAGS/SETFLAGS/GETVERSION Add support for the standard attributes set via chattr and read via lsattr. Currently we store the attributes in the flags value in the btrfs inode, but I wonder whether we should split it into two so that we don't have to keep converting between the two formats. Remove the btrfs_clear_flag/btrfs_set_flag/btrfs_test_flag macros as they were confusing the existing code and got in the way of the new additions. Also add the FS_IOC_GETVERSION ioctl for getting i_generation as it's trivial. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-06-10 11:29:52 -04:00
Chris Mason	c289811cc0	Btrfs: autodetect SSD devices During mount, btrfs will check the queue nonrot flag for all the devices found in the FS. If they are all non-rotating, SSD mode is enabled by default. If the FS was mounted with -o nossd, the non-rotating flag is ignored. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-06-10 11:29:52 -04:00
Chris Mason	451d7585a8	Btrfs: add mount -o ssd_spread to spread allocations out Some SSDs perform best when reusing block numbers often, while others perform much better when clustering strictly allocates big chunks of unused space. The default mount -o ssd will find rough groupings of blocks where there are a bunch of free blocks that might have some allocated blocks mixed in. mount -o ssd_spread will make sure there are no allocated blocks mixed in. It should perform better on lower end SSDs. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-06-10 11:29:52 -04:00
Chris Mason	c604480171	Btrfs: avoid allocation clusters that are too spread out In SSD mode for data, and all the time for metadata the allocator will try to find a cluster of nearby blocks for allocations. This commit adds extra checks to make sure that each free block in the cluster is close to the last one. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-06-10 11:29:51 -04:00
Chris Mason	3b30c22f64	Btrfs: Add mount -o nossd This allows you to turn off the ssd mode via remount. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-06-10 11:29:50 -04:00
Chris Mason	d644d8a1e3	Btrfs: avoid IO stalls behind congested devices in a multi-device FS The btrfs IO submission threads try to service a bunch of devices with a small number of threads. They do a congestion check to try and avoid waiting on requests for a busy device. The checks make sure we've sent a few requests down to a given device just so that we aren't bouncing between busy devices without actually sending down any IO. The counter used to decide if we can switch to the next device is somewhat overloaded. It is also being used to decide if we've done a good batch of requests between the WRITE_SYNC or regular priority lists. It may get reset to zero often, leaving us hammering on a busy device instead of moving on to another disk. This commit adds a new counter for the number of bios sent while servicing a device. It doesn't get reset or fiddled with. On multi-device filesystems, this fixes IO stalls in streaming write workloads. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-06-10 11:29:49 -04:00
Chris Mason	d84275c938	Btrfs: don't allow WRITE_SYNC bios to starve out regular writes Btrfs uses dedicated threads to submit bios when checksumming is on, which allows us to make sure the threads dedicated to checksumming don't get stuck waiting for requests. For each btrfs device, there are two lists of bios. One list is for WRITE_SYNC bios and the other is for regular priority bios. The IO submission threads used to process all of the WRITE_SYNC bios first and then switch to the regular bios. This commit makes sure we don't completely starve the regular bios by rotating between the two lists. WRITE_SYNC bios are still favored 2:1 over the regular bios, and this tries to run in batches to avoid seeking. Benchmarking shows this eliminates stalls during streaming buffered writes on both multi-device and single device filesystems. If the regular bios starve, the system can end up with a large amount of ram pinned down in writeback pages. If we are a little more fair between the two classes, we're able to keep throughput up and make progress on the bulk of our dirty ram. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-06-10 11:29:49 -04:00
Chris Mason	585ad2c379	Btrfs: fix metadata dirty throttling limits Once a metadata block has been written, it must be recowed, so the btrfs dirty balancing call has a check to make sure a fair amount of metadata was actually dirty before it started writing it back to disk. A previous commit had changed the dirty tracking for metadata without updating the btrfs dirty balancing checks. This commit switches it to use the correct counter. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-06-10 11:29:48 -04:00
Chris Mason	2c943de6ad	Btrfs: reduce mount -o ssd CPU usage The block allocator in SSD mode will try to find groups of free blocks that are close together. This commit makes it loop less on a given group size before bumping it. The end result is that we are less likely to fill small holes in the available free space, but we don't waste as much CPU building the large cluster used by ssd mode. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-06-10 11:29:48 -04:00
Chris Mason	cfbb930846	Btrfs: balance btree more often With the new back reference code, the cost of a balance has gone down in terms of the number of back reference updates done. This commit makes us more aggressively balance leaves and nodes as they become less full. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-06-10 11:29:47 -04:00
Chris Mason	b361242102	Btrfs: stop avoiding balancing at the end of the transaction. When the delayed reference code was added, some checks were added to avoid extra balancing while the delayed references were being flushed. This made for less efficient btrees, but it reduced the chances of loops where no forward progress was made because the balances made more delayed ref updates. With the new dead root removal code and the mixed back references, the extent allocation tree is no longer using precise back refs, and the delayed reference updates don't carry the risk of looping forever anymore. So, the balance avoidance is no longer required. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-06-10 11:29:47 -04:00
Yan Zheng	5d4f98a28c	Btrfs: Mixed back reference (FORWARD ROLLING FORMAT CHANGE) This commit introduces a new kind of back reference for btrfs metadata. Once a filesystem has been mounted with this commit, IT WILL NO LONGER BE MOUNTABLE BY OLDER KERNELS. When a tree block in subvolume tree is cow'd, the reference counts of all extents it points to are increased by one. At transaction commit time, the old root of the subvolume is recorded in a "dead root" data structure, and the btree it points to is later walked, dropping reference counts and freeing any blocks where the reference count goes to 0. The increments done during cow and decrements done after commit cancel out, and the walk is a very expensive way to go about freeing the blocks that are no longer referenced by the new btree root. This commit reduces the transaction overhead by avoiding the need for dead root records. When a non-shared tree block is cow'd, we free the old block at once, and the new block inherits old block's references. When a tree block with reference count > 1 is cow'd, we increase the reference counts of all extents the new block points to by one, and decrease the old block's reference count by one. This dead tree avoidance code removes the need to modify the reference counts of lower level extents when a non-shared tree block is cow'd. But we still need to update back ref for all pointers in the block. This is because the location of the block is recorded in the back ref item. We can solve this by introducing a new type of back ref. The new back ref provides information about pointer's key, level and in which tree the pointer lives. This information allow us to find the pointer by searching the tree. The shortcoming of the new back ref is that it only works for pointers in tree blocks referenced by their owner trees. This is mostly a problem for snapshots, where resolving one of these fuzzy back references would be O(number_of_snapshots) and quite slow. The solution used here is to use the fuzzy back references in the common case where a given tree block is only referenced by one root, and use the full back references when multiple roots have a reference on a given block. This commit adds per subvolume red-black tree to keep trace of cached inodes. The red-black tree helps the balancing code to find cached inodes whose inode numbers within a given range. This commit improves the balancing code by introducing several data structures to keep the state of balancing. The most important one is the back ref cache. It caches how the upper level tree blocks are referenced. This greatly reduce the overhead of checking back ref. The improved balancing code scales significantly better with a large number of snapshots. This is a very large commit and was written in a number of pieces. But, they depend heavily on the disk format change and were squashed together to make sure git bisect didn't end up in a bad state wrt space balancing or the format change. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-06-10 11:29:46 -04:00
Yan Zheng	5c939df56c	btrfs: Fix set/clear_extent_bit for 'end == (u64)-1' There are some 'start = state->end + 1;' like code in set_extent_bit and clear_extent_bit. They overflow when end == (u64)-1. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-06-10 11:29:46 -04:00
Linus Torvalds	064e38aade	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: Fix oops and use after free during space balancing Btrfs: set device->total_disk_bytes when adding new device	2009-06-05 11:54:28 -07:00
Chris Mason	44fb551163	Btrfs: Fix oops and use after free during space balancing The btrfs allocator uses list_for_each to walk the available block groups when searching for free blocks. It starts off with a hint to help find the best block group for a given allocation. The hint is resolved into a block group, but we don't properly check to make sure the block group we find isn't in the middle of being freed due to filesystem shrinking or balancing. If it is being freed, the list pointers in it are bogus and can't be trusted. But, the code happily goes along and uses them in the list_for_each loop, leading to all kinds of fun. The fix used here is to check to make sure the block group we find really is on the list before we use it. list_del_init is used when removing it from the list, so we can do a proper check. The allocation clustering code has a similar bug where it will trust the block group in the current free space cluster. If our allocation flags have changed (going from single spindle dup to raid1 for example) because the drives in the FS have changed, we're not allowed to use the old block group any more. The fix used here is to check the current cluster against the current allocation flags. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-06-04 15:41:27 -04:00
Yan Zheng	2cc3c559fb	Btrfs: set device->total_disk_bytes when adding new device It was not being properly initialized, and so the size saved to disk was not correct. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-06-04 09:23:57 -04:00
Linus Torvalds	5732c46849	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: Spelling fix in btrfs_lookup_first_block_group comments Btrfs: make show_options result match actual option names Btrfs: remove outdated comment in btrfs_ioctl_resize() Btrfs: remove some WARN_ONs in the IO failure path Btrfs: Don't loop forever on metadata IO failures Btrfs: init inode ordered_data_close flag properly	2009-05-14 19:18:44 -07:00
Sankar P	9f55684c2d	Btrfs: Spelling fix in btrfs_lookup_first_block_group comments Signed-off-by: Sankar P <sankar.curiosity@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-05-14 14:00:34 -04:00
Sage Weil	6b65c5c61b	Btrfs: make show_options result match actual option names The notreelog and flushoncommit mount options were being printed slightly differently. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-05-14 14:00:34 -04:00
Li Hong	5d847a8ed9	Btrfs: remove outdated comment in btrfs_ioctl_resize() In Li Zefan's commit `dae7b665cf`, a combination call of kmalloc() and copy_from_user() is replaced by memdup_user(). So btrfs_ioctl_resize() doesn't use GFP_NOFS any more. Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-05-14 14:00:33 -04:00
Chris Mason	cc7b0c9b70	Btrfs: remove some WARN_ONs in the IO failure path These debugging WARN_ONs make too much console noise during regular IO failures. An IO failure will still generate a number of messages as we verify checksums etc, but these two are not needed. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-05-14 14:00:33 -04:00
Chris Mason	76a05b35a3	Btrfs: Don't loop forever on metadata IO failures When a btrfs metadata read fails, the first thing we try to do is find a good copy on another mirror of the block. If this fails, read_tree_block() ends up returning a buffer that isn't up to date. The btrfs btree reading code was reworked to drop locks and repeat the search when IO was done, but the changes didn't add a check for failed reads. The end result was looping forever on buffers that were never going to become up to date. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-05-14 14:00:32 -04:00
Chris Mason	2757495c90	Btrfs: init inode ordered_data_close flag properly This flag is used to decide when we need to send a given file through the ordered code to make sure it is fully written before a transaction commits. It was not being properly set to zero when the inode was being setup. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-05-14 14:00:31 -04:00
Al Viro	6f5bbff9a1	Convert obvious places to deactivate_locked_super() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-05-09 10:49:40 -04:00
Linus Torvalds	4ebf662337	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: look for acls during btrfs_read_locked_inode Btrfs: fix acl caching Btrfs: Fix a bunch of printk() warnings. Btrfs: Fix a trivial warning using max() of u64 vs ULL. Btrfs: remove unused btrfs_bit_radix slab Btrfs: ratelimit IO error printks Btrfs: remove #if 0 code Btrfs: When shrinking, only update disk size on success Btrfs: fix deadlocks and stalls on dead root removal Btrfs: fix fallocate deadlock on inode extent lock Btrfs: kill btrfs_cache_create Btrfs: don't export symbols Btrfs: simplify makefile Btrfs: try to keep a healthy ratio of metadata vs data block groups	2009-04-27 11:16:33 -07:00
Chris Mason	46a53cca82	Btrfs: look for acls during btrfs_read_locked_inode This changes btrfs_read_locked_inode() to peek ahead in the btree for acl items. If it is certain a given inode has no acls, it will set the in memory acl fields to null to avoid acl lookups completely. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-27 13:18:35 -04:00
Chris Mason	7b1a14bbb0	Btrfs: fix acl caching Linus noticed the btrfs code to cache acls wasn't properly caching a NULL acl when the inode didn't have any acls. This meant the common case of no acls resulted in expensive btree searches every time the kernel checked permissions (which is quite often). This is a modified version of Linus' original patch: Properly set initial acl fields to BTRFS_ACL_NOT_CACHED in the inode. This forces an acl lookup when permission checks are done. Fix btrfs_get_acl to avoid lookups and locking when the inode acls fields are set to null. Fix btrfs_get_acl to use the right return value from __btrfs_getxattr when deciding to cache a NULL acl. It was storing a NULL acl when __btrfs_getxattr return -ENOENT, but __btrfs_getxattr was actually returning -ENODATA for this case. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-27 13:18:26 -04:00
Joel Becker	21380931eb	Btrfs: Fix a bunch of printk() warnings. Just happened to notice a bunch of %llu vs u64 warnings. Here's a patch to cast them all. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-27 08:37:49 -04:00
Joel Becker	e63b6a6c0f	Btrfs: Fix a trivial warning using max() of u64 vs ULL. A small warning popped up on ia64 because inode-map.c was comparing a u64 object id with the ULL FIRST_FREE_OBJECTID. My first thought was that all the OBJECTID constants should contain the u64 cast because btrfs code deals entirely in u64s. But then I saw how large that was, and figured I'd just fix the max() call. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-27 08:37:49 -04:00
Chris Mason	45c06543af	Btrfs: remove unused btrfs_bit_radix slab Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-27 08:37:48 -04:00
Chris Mason	193f284d49	Btrfs: ratelimit IO error printks Btrfs has printks for various IO errors, including bad checksums and mismatches between what we expect the block headers to contain and what we actually find on the disk. Longer term we need a real reporting mechanism for this, but for now printk is going to have to do. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-27 07:41:47 -04:00
Chris Mason	b7967db75a	Btrfs: remove #if 0 code Btrfs had some old code sitting around under #if 0, this drops it. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-27 07:40:52 -04:00
Chris Ball	d6397baee4	Btrfs: When shrinking, only update disk size on success Previously, we updated a device's size prior to attempting a shrink operation. This patch moves the device resizing logic to only happen if the shrink completes successfully. In the process, it introduces a new field to btrfs_device -- disk_total_bytes -- to track the on-disk size. Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-27 07:40:51 -04:00
Chris Mason	59bc5c758e	Btrfs: fix deadlocks and stalls on dead root removal After a transaction commit, the old root of the subvol btrees are sent through snapshot removal. This is what actually frees up any blocks replaced by COW, and anything the old blocks pointed to. Snapshot deletion will pause when a transaction commit has started, which helps to avoid a huge amount of delayed reference count updates piling up as the transaction is trying to close. But, this pause happens after the snapshot deletion process has asked other procs on the system to throttle back a bit so that it can make progress. We don't want to throttle everyone while we're waiting for the transaction commit, it leads to deadlocks in the user transaction ioctls used by Ceph and makes things slower in general. This patch changes things to avoid the throttling while we sleep. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-24 15:46:05 -04:00
Chris Mason	e980b50cda	Btrfs: fix fallocate deadlock on inode extent lock The btrfs fallocate call takes an extent lock on the entire range being fallocated, and then runs through insert_reserved_extent on each extent as they are allocated. The problem with this is that btrfs_drop_extents may decide to try and take the same extent lock fallocate was already holding. The solution used here is to push down knowledge of the range that is already locked going into btrfs_drop_extents. It turns out that at least one other caller had the same bug. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-24 15:46:05 -04:00
Christoph Hellwig	9601e3f633	Btrfs: kill btrfs_cache_create Just use kmem_cache_create directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-24 15:46:04 -04:00
Christoph Hellwig	0d4bf11e53	Btrfs: don't export symbols Currently the extent_map code is only for btrfs so don't export it's symbols. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-24 15:46:04 -04:00
Christoph Hellwig	2ea2544ef5	Btrfs: simplify makefile Get rid of the hacks for building out of tree, and always use += for assigning to the object lists. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-24 15:46:03 -04:00
Josef Bacik	97e728d435	Btrfs: try to keep a healthy ratio of metadata vs data block groups This patch makes the chunk allocator keep a good ratio of metadata vs data block groups. By default for every 8 data block groups, we'll allocate 1 metadata chunk, or about 12% of the disk will be allocated for metadata. This can be changed by specifying the metadata_ratio mount option. This is simply the number of data block groups that have to be allocated to force a metadata chunk allocation. By making sure we allocate metadata chunks more often, we are less likely to get into situations where the whole disk has been allocated as data block groups. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-24 15:46:02 -04:00
Linus Torvalds	ccc5ff94c6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix btrfs fallocate oops and deadlock Btrfs: use the right node in reada_for_balance Btrfs: fix oops on page->mapping->host during writepage Btrfs: add a priority queue to the async thread helpers Btrfs: use WRITE_SYNC for synchronous writes	2009-04-21 14:12:58 -07:00
Chris Mason	546888da82	Btrfs: fix btrfs fallocate oops and deadlock Btrfs fallocate was incorrectly starting a transaction with a lock held on the extent_io tree for the file, which could deadlock. Strictly speaking it was using join_transaction which would be safe, but it is better to move the transaction outside of the lock. When preallocated extents are overwritten, btrfs_mark_buffer_dirty was being called on an unlocked buffer. This was triggering an assertion and oops because the lock is supposed to be held. The bug was calling btrfs_mark_buffer_dirty on a leaf after btrfs_del_item had been run. btrfs_del_item takes care of dirtying things, so the solution is a to skip the btrfs_mark_buffer_dirty call in this case. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-21 12:45:12 -04:00
Li Zefan	dae7b665cf	btrfs: use memdup_user() Remove open-coded memdup_user(). Note this changes some GFP_NOFS to GFP_KERNEL, since copy_from_user() may cause pagefault, it's pointless to pass GFP_NOFS to kmalloc(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-04-20 23:02:50 -04:00
Chris Mason	8c594ea81d	Btrfs: use the right node in reada_for_balance reada_for_balance was using the wrong index into the path node array, so it wasn't reading the right blocks. We never directly used the results of the read done by this function because the btree search is started over at the end. This fixes reada_for_balance to reada in the correct node and to avoid searching past the last slot in the node. It also makes sure to hold the parent lock while we are finding the nodes to read. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-20 15:53:09 -04:00
Chris Mason	11c8349b4e	Btrfs: fix oops on page->mapping->host during writepage The extent_io writepage call updates the writepage index in the inode as it makes progress. But, it was doing the update after unlocking the page, which isn't legal because page->mapping can't be trusted once the page is unlocked. This lead to an oops, especially common with compression turned on. The fix here is to update the writeback index before unlocking the page. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-20 15:53:09 -04:00
Chris Mason	d313d7a31a	Btrfs: add a priority queue to the async thread helpers Btrfs is using WRITE_SYNC_PLUG to send down synchronous IOs with a higher priority. But, the checksumming helper threads prevent it from being fully effective. There are two problems. First, a big queue of pending checksumming will delay the synchronous IO behind other lower priority writes. Second, the checksumming uses an ordered async work queue. The ordering makes sure that IOs are sent to the block layer in the same order they are sent to the checksumming threads. Usually this gives us less seeky IO. But, when we start mixing IO priorities, the lower priority IO can delay the higher priority IO. This patch solves both problems by adding a high priority list to the async helper threads, and a new btrfs_set_work_high_prio(), which is used to make put a new async work item onto the higher priority list. The ordering is still done on high priority IO, but all of the high priority bios are ordered separately from the low priority bios. This ordering is purely an IO optimization, it is not involved in data or metadata integrity. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-20 15:53:08 -04:00
Chris Mason	ffbd517d5a	Btrfs: use WRITE_SYNC for synchronous writes Part of reducing fsync/O_SYNC/O_DIRECT latencies is using WRITE_SYNC for writes we plan on waiting on in the near future. This patch mirrors recent changes in other filesystems and the generic code to use WRITE_SYNC when WB_SYNC_ALL is passed and to use WRITE_SYNC for other latency critical writes. Btrfs uses async worker threads for checksumming before the write is done, and then again to actually submit the bios. The bio submission code just runs a per-device list of bios that need to be sent down the pipe. This list is split into low priority and high priority lists so the WRITE_SYNC IO happens first. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-20 15:53:08 -04:00
Linus Torvalds	b983471794	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: BUG to BUG_ON changes Btrfs: remove dead code Btrfs: remove dead code Btrfs: fix typos in comments Btrfs: remove unused ftrace include Btrfs: fix __ucmpdi2 compile bug on 32 bit builds Btrfs: free inode struct when btrfs_new_inode fails Btrfs: fix race in worker_loop Btrfs: add flushoncommit mount option Btrfs: notreelog mount option Btrfs: introduce btrfs_show_options Btrfs: rework allocation clustering Btrfs: Optimize locking in btrfs_next_leaf() Btrfs: break up btrfs_search_slot into smaller pieces Btrfs: kill the pinned_mutex Btrfs: kill the block group alloc mutex Btrfs: clean up find_free_extent Btrfs: free space cache cleanups Btrfs: unplug in the async bio submission threads Btrfs: keep processing bios for a given bdev if our proc is batching	2009-04-03 15:14:44 -07:00

1 2 3 4 5 ...

1009 Commits