linux-stable-rt

Commit Graph

Author	SHA1	Message	Date
Chris Mason	3394e1607e	Btrfs: Give each subvol and snapshot their own anonymous devid Each subvolume has its own private inode number space, and so we need to fill in different device numbers for each subvolume to avoid confusing applications. This commit puts a struct super_block into struct btrfs_root so it can call set_anon_super() and get a different device number generated for each root. btrfs_rename is changed to prevent renames across subvols. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-11-17 20:42:26 -05:00
Chris Mason	3de4586c52	Btrfs: Allow subvolumes and snapshots anywhere in the directory tree Before, all snapshots and subvolumes lived in a single flat directory. This was awkward and confusing because the single flat directory was only writable with the ioctls. This commit changes the ioctls to create subvols and snapshots at any point in the directory tree. This requires making separate ioctls for snapshot and subvol creation instead of combining them into one. The subvol ioctl does: btrfsctl -S subvol_name parent_dir After the ioctl is done subvol_name lives inside parent_dir. The snapshot ioctl does: btrfsctl -s path_for_snapshot root_to_snapshot path_for_snapshot can be an absolute or relative path. btrfsctl breaks it up into directory and basename components. root_to_snapshot can be any file or directory in the FS. The snapshot is taken of the entire root where that file lives. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-11-17 21:02:50 -05:00
Chris Mason	5f2cc086cc	Btrfs: Avoid unplug storms during commit While doing a commit, btrfs makes sure all the metadata blocks were properly written to disk, calling wait_on_page_writeback for each page. This writeback happens after allowing another transaction to start, so it competes for the disk with other processes in the FS. If the page writeback bit is still set, each wait_on_page_writeback might trigger an unplug, even though the page might be waiting for checksumming to finish or might be waiting for the async work queue to submit the bio. This trades wait_on_page_writeback for waiting on the extent writeback bits. It won't trigger any unplugs and substantially improves performance in a number of workloads. This also changes the async bio submission to avoid requeueing if there is only one device. The requeue just wastes CPU time because there are no other devices to service. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-11-07 18:22:45 -05:00
Yan Zheng	80ff385665	Btrfs: update nodatacow code v2 This patch simplifies the nodatacow checker. If all references were created after the latest snapshot, then we can avoid COW safely. This patch also updates run_delalloc_nocow to do more fine-grained checking. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>	2008-10-30 14:20:02 -04:00
Chris Mason	87ef2bb46b	Btrfs: prevent looping forever in finish_current_insert and del_pending_extents finish_current_insert and del_pending_extents process extent tree modifications that build up while we are changing the extent tree. It is a confusing bit of code that prevents recursion. Both functions run through a list of pending operations and both funcs add to the list of pending operations. If you have two procs in either one of them, they can end up looping forever making more work for each other. This patch makes them walk forward through the list of pending changes instead of always trying to process the entire list. At transaction commit time, we catch any changes that were left over. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-10-30 11:23:27 -04:00
Yan Zheng	84234f3a1f	Btrfs: Add root tree pointer transaction ids This patch adds transaction IDs to root tree pointers. Transaction IDs in tree pointers are compared with the generation numbers in block headers when reading root blocks of trees. This can detect some types of IO errors. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>	2008-10-29 14:49:05 -04:00
Josef Bacik	2517920135	Btrfs: nuke fs wide allocation mutex V2 This patch removes the giant fs_info->alloc_mutex and replaces it with a bunch of little locks. There is now a pinned_mutex, which is used when messing with the pinned_extents extent io tree, and the extent_ins_mutex which is used with the pending_del and extent_ins extent io trees. The locking for the extent tree stuff was inspired by a patch that Yan Zheng wrote to fix a race condition. I cleaned it up some and changed the locking around a little bit, but the idea remains the same. Basically instead of holding the extent_ins_mutex throughout the processing of an extent on the extent_ins or pending_del trees, we just hold it while we're searching and when we clear the bits on those trees, and lock the extent for the duration of the operations on the extent. Also to keep from getting hung up waiting to lock an extent, I've added a try_lock_extent so if we cannot lock the extent, we move on to the next one in the tree and come back to that one later. I have tested this heavily and it does not appear to break anything. This has to be applied on top of my find_free_extent redo patch. I tested this patch on top of Yan's space rebalancing code and it worked fine. The only thing that has changed since the last version is I pulled out all my debugging stuff; apparently I forgot to run guilt refresh before I sent the last patch out. Thank you, Signed-off-by: Josef Bacik <jbacik@redhat.com>	2008-10-29 14:49:05 -04:00
Yan Zheng	f82d02d9d8	Btrfs: Improve space balancing code This patch improves the space balancing code to keep more sharing of tree blocks. The only case that breaks sharing of tree blocks is data extents get fragmented during balancing. The main changes in this patch are: Add a 'drop sub-tree' function. This solves the problem in old code that BTRFS_HEADER_FLAG_WRITTEN check breaks sharing of tree block. Remove relocation mapping tree. Relocation mappings are stored in struct btrfs_ref_path and updated dynamically during walking up/down the reference path. This reduces CPU usage and simplifies code. This patch also fixes a bug. Root items for reloc trees should be updated in btrfs_free_reloc_root. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>	2008-10-29 14:49:05 -04:00
Chris Mason	30c43e2444	Btrfs: remove last_log_alloc allocator optimization The tree logging code was trying to separate tree log allocations from normal metadata allocations to improve writeback patterns during an fsync. But, the code was not effective and ended up just mixing tree log blocks with regular metadata. That seems to be working fairly well, so the last_log_alloc code can be removed. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-10-03 12:24:01 -04:00
Chris Mason	d352ac6814	Btrfs: add and improve comments This improves the comments at the top of many functions. It didn't dive into the guts of functions because I was trying to avoid merging problems with the new allocator and back reference work. extent-tree.c and volumes.c were both skipped, and there is definitely more work to do in cleaning and commenting the code. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-29 15:18:18 -04:00
Zheng Yan	1a40e23b95	Btrfs: update space balancing code This patch updates the space balancing code to utilize the new backref format. Before, btrfs-vol -b would break any COW links on data blocks or metadata. This was slow and caused the amount of space used to explode if a large number of snapshots were present. The new code can keep the sharing of all data extents and most of the tree blocks. To maintain the sharing of data extents, the space balance code uses a separate inode to hold data extent pointers, then updates the references to point to the new location. To maintain the sharing of tree blocks, the space balance code uses reloc trees to relocate tree blocks in reference counted roots. There is one reloc tree for each subvol, and all reloc trees share the same root key objectid. Reloc trees are snapshots of the latest committed roots of subvols (root->commit_root). To relocate a tree block referenced by a subvol, there are two steps: COW the block through the subvol's reloc tree, then update the block pointer in the subvol to point to the new block. Since all reloc trees share the same root key objectid, doing special handling for tree blocks owned by them is easy. Once a tree block has been COWed in one reloc tree, we can use the resulting new block directly when the same block is required to COW again through other reloc trees. In this way, relocated tree blocks are shared between reloc trees, so they are also shared between subvols. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-26 10:09:34 -04:00
Zheng Yan	5b21f2ed3f	Btrfs: extent_map and data=ordered fixes for space balancing * Add an EXTENT_BOUNDARY state bit to keep the writepage code from merging data extents that are in the process of being relocated. This allows us to do accounting for them properly. * The balancing code relocates data extents independent of the underlying inode. The extent_map code was modified to properly account for things moving around (invalidating extent_map caches in the inode). * Don't take the drop_mutex in the create_subvol ioctl. It isn't required. * Fix walking of the ordered extent list to avoid races with sys_unlink * Change the lock ordering rules. Transaction start goes outside the drop_mutex. This allows btrfs_commit_transaction to directly drop the relocation trees. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-26 10:05:38 -04:00
Zheng Yan	e465768938	Btrfs: Add shared reference cache Btrfs has a cache of reference counts in leaves, allowing it to avoid reading tree leaves while deleting snapshots. To reduce contention with multiple subvolumes, this cache is private to each subvolume. This patch adds shared reference cache support. The new space balancing code plays with multiple subvols at the same time, So the old per-subvol reference cache is not well suited. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-26 10:04:53 -04:00
Chris Mason	d0c803c404	Btrfs: Record dirty pages tree-log pages in an extent_io tree This is the same way the transaction code makes sure that all the other tree blocks are safely on disk. There's an extent_io tree for each root, and any blocks allocated to the tree logs are recorded in that tree. At tree-log sync, the extent_io tree is walked to flush down the dirty pages and wait for them. The main benefit is less time spent walking the tree log and skipping clean pages, and getting sequential IO down to the drive. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:07 -04:00
Chris Mason	4bef084857	Btrfs: Tree logging fixes * Pin down data blocks to prevent them from being reallocated like so: trans 1: allocate file extent trans 2: free file extent trans 3: free file extent during old snapshot deletion trans 3: allocate file extent to new file trans 3: fsync new file Before the tree logging code, this was legal because the fsync would commit the transaction that did the final data extent free and the transaction that allocated the extent to the new file at the same time. With the tree logging code, the tree log subtransaction can commit before the transaction that freed the extent. If we crash, we're left with two different files using the extent. * Don't wait in start_transaction if log replay is going on. This avoids deadlocks from iput while we're cleaning up link counts in the replay code. * Don't deadlock in replay_one_name by trying to read an inode off the disk while holding paths for the directory * Hold the buffer lock while we mark a buffer as written. This closes a race where someone is changing a buffer while we write it. They are supposed to mark it dirty again after they change it, but this violates the cow rules. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:07 -04:00
Chris Mason	e02119d5a7	Btrfs: Add a write ahead tree log to optimize synchronous operations File syncs and directory syncs are optimized by copying their items into a special (copy-on-write) log tree. There is one log tree per subvolume and the btrfs super block points to a tree of log tree roots. After a crash, items are copied out of the log tree and back into the subvolume. See tree-log.c for all the details. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:07 -04:00
Chris Mason	b64a2851ba	Btrfs: Wait for async bio submissions to make some progress at queue time Before, the btrfs bdi congestion function was used to test for too many async bios. This keeps that check to throttle pdflush, but also adds a check while queuing bios. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
Chris Mason	777e6bd706	Btrfs: Transaction commit: don't use filemap_fdatawait After writing out all the remaining btree blocks in the transaction, the commit code would use filemap_fdatawait to make sure it was all on disk. This means it would wait for blocks written by other procs as well. The new code walks the list of blocks for this transaction again and waits only for those required by this transaction. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
Yan Zheng	7ea394f119	Btrfs: Fix nodatacow for the new data=ordered mode Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
Yan Zheng	b48652c101	Btrfs: Various small fixes. This trivial patch contains two locking fixes and an off-by-one fix. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
Sage Weil	9ca9ee09c1	Btrfs: fix ioctl-initiated transactions vs wait_current_trans() Commit 597:466b27332893 (btrfs_start_transaction: wait for commits in progress) breaks the transaction start/stop ioctls by making btrfs_start_transaction conditionally wait for the next transaction to start. If an application is artificially holding a transaction open, things deadlock. This workaround maintains a count of open ioctl-initiated transactions in fs_info, and avoids wait_current_trans() if any are currently open (in start_transaction() and btrfs_throttle()). The start transaction ioctl uses a new btrfs_start_ioctl_transaction() that _does_ call wait_current_trans(), effectively pushing the join/wait decision to the outer ioctl-initiated transaction. This more or less neuters btrfs_throttle() when ioctl-initiated transactions are in use, but that seems like a pretty fundamental consequence of wrapping lots of write()'s in a transaction. Btrfs has no way to tell if the application considers a given operation as part of its transaction. Obviously, if the transaction start/stop ioctls aren't being used, there is no effect on current behavior. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
Chris Mason	2dd3e67b1e	Btrfs: More throttle tuning * Make walk_down_tree wake up throttled tasks more often * Make walk_down_tree call cond_resched during long loops * As the size of the ref cache grows, wait longer in throttle * Get rid of the reada code in walk_down_tree, the leaves don't get read anymore, thanks to the ref cache. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
Chris Mason	65b51a009e	btrfs_search_slot: reduce lock contention by cowing in two stages A btree block cow has two parts, the first is to allocate a destination block and the second is to copy the old block over. The first part needs locks in the extent allocation tree, and may need to do IO. This changeset splits that into a separate function that can be called without any tree locks held. btrfs_search_slot is changed to drop its path and start over if it has to COW a contended block. This often means that many writers will pre-alloc a new destination for the same contended block, but they cache their prealloc for later use on lower levels in the tree. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
Chris Mason	18e35e0ab3	Btrfs: Throttle less often waiting for snapshots to delete Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:06 -04:00
Chris Mason	37d1aeee39	Btrfs: Throttle tuning This avoids waiting for transactions with pages locked by breaking out the code to wait for the current transaction to close into a function called by btrfs_throttle. It also lowers the limits for where we start throttling. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Yan	bcc63abbf3	Btrfs: implement memory reclaim for leaf reference cache The memory reclaiming issue happens when a snapshot exists. In that case, some cache entries may not be used during old snapshot dropping, so they will remain in the cache until umount. The patch adds a field to struct btrfs_leaf_ref to record create time. Besides, the patch makes all dead roots of a given snapshot linked together in order of create time. After an old snapshot is completely dropped, we check the dead root list and remove all cache entries created before the oldest dead root in the list. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Yan Zheng	f321e49103	Btrfs: Update and fix mount -o nodatacow To check whether a given file extent is referenced by multiple snapshots, the checker walks down the fs tree through the dead root and checks all tree blocks in the path. We can easily detect whether a given tree block is directly referenced by another snapshot. We can also detect any indirect reference from another snapshot by checking the reference's generation. The checker can always detect multiple references, but can't reliably detect cases of a single reference. So btrfs may do file data cow even if there is only one reference. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Chris Mason	ab78c84de1	Btrfs: Throttle operations if the reference cache gets too large A large reference cache is directly related to a lot of work pending for the cleaner thread. This throttles back new operations based on the size of the reference cache so the cleaner thread will be able to keep up. Overall, this actually makes the FS faster because the cleaner thread will be more likely to find things in cache. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Chris Mason	017e5369eb	Btrfs: Leaf reference cache update This changes the reference cache to make a single cache per root instead of one cache per transaction, and to key by the byte number of the disk block instead of the keys inside. This makes it much less likely to have cache misses if a snapshot or something has an extra reference on a higher node or a leaf while the first transaction that added the leaf into the cache is dropping. Some throttling is added to functions that free blocks heavily so they wait for old transactions to drop. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Yan Zheng	31153d8128	Btrfs: Add a leaf reference cache Much of the IO done while dropping snapshots is done looking up leaves in the filesystem trees to see if they point to any extents and to drop the references on any extents found. This creates a cache so that IO isn't required. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Josef Bacik	aec7477b3b	Btrfs: Implement new dir index format Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Chris Mason	ed98b56a63	Btrfs: Take the csum mutex while reading checksums Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Chris Mason	f421950f86	Btrfs: Fix some data=ordered related data corruptions Stress testing was showing data checksum errors, most of which were caused by a lookup bug in the extent_map tree. The tree was caching the last pointer returned, and searches would check the last pointer first. But, search callers also expect the search to return the very first matching extent in the range, which wasn't always true with the last pointer usage. For now, the code to cache the last return value is just removed. It is easy to fix, but I think lookups are rare enough that it isn't required anymore. This commit also replaces do_sync_mapping_range with a local copy of the related functions. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:05 -04:00
Chris Mason	f929574938	btrfs_start_transaction: wait for commits in progress to finish btrfs_commit_transaction has to loop waiting for any writers in the transaction to finish before it can proceed. btrfs_start_transaction should be polite and not join a transaction that is in the process of being finished off. There are a few places that can't wait, basically the ones doing IO that might be needed to finish the transaction. For them, btrfs_join_transaction is added. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:04 -04:00
Chris Mason	e6dcd2dc9c	Btrfs: New data=ordered implementation The old data=ordered code would force commit to wait until all the data extents from the transaction were fully on disk. This introduced large latencies into the commit and stalled new writers in the transaction for a long time. The new code changes the way data allocations and extents work: * When delayed allocation is filled, data extents are reserved, and the extent bit EXTENT_ORDERED is set on the entire range of the extent. A struct btrfs_ordered_extent is allocated and inserted into a per-inode rbtree to track the pending extents. * As each page is written EXTENT_ORDERED is cleared on the bytes corresponding to that page. * When all of the bytes corresponding to a single struct btrfs_ordered_extent are written, the previously reserved extent is inserted into the FS btree and into the extent allocation trees. The checksums for the file data are also updated. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:04 -04:00
Chris Mason	77a41afb7d	Btrfs: Drop some verbose printks Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:04 -04:00
Chris Mason	3f157a2fd2	Btrfs: Online btree defragmentation fixes The btree defragger wasn't making forward progress because the new key wasn't being saved by the btrfs_search_forward function. This also disables the automatic btree defrag, it wasn't scaling well to huge filesystems. The auto-defrag needs to be done differently. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:04 -04:00
Chris Mason	1b1e2135dc	Btrfs: Add a per-inode csum mutex to avoid races creating csum items Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:04 -04:00
Chris Mason	a74a4b97b6	Btrfs: Replace the transaction work queue with kthreads This creates one kthread for commits and one kthread for deleting old snapshots. All the work queues are removed. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Chris Mason	89ce8a63d0	Add btrfs_end_transaction_throttle to force writers to wait for pending commits The existing throttle mechanism was often not sufficient to prevent new writers from coming in and making a given transaction run forever. This adds an explicit wait at the end of most operations so they will allow the current transaction to close. There is no wait inside file_write, inode updates, or cow filling, all which have different deadlock possibilities. This is a temporary measure until better asynchronous commit support is added. This code leads to stalls as it waits for data=ordered writeback, and it really needs to be fixed. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Chris Mason	a213501153	Btrfs: Replace the big fs_mutex with a collection of other locks Extent allocations are still protected by a large alloc_mutex. Objectid allocations are covered by an objectid mutex. Other btree operations are protected by a lock on individual btree nodes. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Chris Mason	925baeddc5	Btrfs: Start btree concurrency work. The allocation trees and the chunk trees are serialized via their own dedicated mutexes. This means allocation location is still not very fine grained. The main FS btree is protected by locks on each block in the btree. Locks are taken top / down, and as processing finishes on a given level of the tree, the lock is released after locking the lower level. The end result of a search is now a path where only the lowest level is locked. Releasing or freeing the path drops any locks held. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Sven Wegener	3b96362cc8	Btrfs: Invalidate dcache entry after creating snapshot and We need to invalidate an existing dcache entry after creating a new snapshot or subvolume, because a negative dcache entry will stop us from accessing the new snapshot or subvolume. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Chris Mason	48ec2cf873	Btrfs: Fix race in running_transaction checks When a new transaction was started, the code would incorrectly set the pointer in fs_info before all the data structures were set up. fsync heavy workloads hit races on the setup of the ordered inode spinlock. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Chris Mason	a061fc8da7	Btrfs: Add support for online device removal This required a few structural changes to the code that manages bdev pointers: The VFS super block now gets an anon-bdev instead of a pointer to the lowest bdev. This allows us to avoid swapping the super block bdev pointer around at run time. The code to read in the super block no longer goes through the extent buffer interface. Things got ugly keeping the mapping constant. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	d6bfde8765	Btrfs: Fixes for 2.6.18 enterprise kernels 2.6.18 seems to get caught in an infinite loop when cancel_rearming_delayed_workqueue is called more than once, so this switches to cancel_delayed_work, which is arguably more correct. Also, balance_dirty_pages can run into problems with 2.6.18 based kernels because it doesn't have the per-bdi dirty limits. This avoids calling balance_dirty_pages on the btree inode unless there is actually something to balance, which is a good optimization in general. Finally there's a compile fix for ordered-data.h Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	81d7ed29ff	Btrfs: Throttle file_write when data=ordered is flushing the inode Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	ce9adaa5a7	Btrfs: Do metadata checksums for reads via a workqueue Before, metadata checksumming was done by the callers of read_tree_block, which would set EXTENT_CSUM bits in the extent tree to show that a given range of pages was already checksummed and didn't need to be verified again. But, those bits could go away via try_to_releasepage, and the end result was bogus checksum failures on pages that never left the cache. The new code validates checksums when the page is read. It is a little tricky because metadata blocks can span pages and a single read may end up going via multiple bios. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	0b86a832a1	Btrfs: Add support for multiple devices per filesystem Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:00 -04:00
Chris Mason	80b6794d11	Btrfs: Lower stack usage in transaction.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:00 -04:00
Chris Mason	4529ba495c	Btrfs: Add data block hints to SSD mode too Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:00 -04:00
Chris Mason	d1310b2e0c	Btrfs: Split the extent_map code into two parts There is now extent_map for mapping offsets in the file to disk and extent_io for state tracking, IO submission and extent_buffers. The new extent_map code shifts from [start,end] pairs to [start,len], and pushes the locking out into the caller. This allows a few performance optimizations and is easier to use. A number of extent_map usage bugs were fixed, mostly with failing to remove extent_map entries when changing the file. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	e18e4809b1	Btrfs: Add mount -o ssd, which includes optimizations for seek free storage Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	4d5e74bc0a	Btrfs: Fix data=ordered vs wait_on_inode deadlock on older kernels Using ilookup5 during data=ordered writeback could deadlock on I_LOCK. This saves a pointer to the inode instead. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	2da98f003f	Btrfs: Run igrab on data=ordered inodes to prevent deadlocks during writeout Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	cee36a03e8	Rework btrfs_drop_inode to avoid scheduling Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	e2008b6140	Btrfs: Add some simple throttling to wait for data=ordered and snapshot deletion Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	3063d29f2a	Btrfs: Move snapshot creation to commit time It is very difficult to create a consistent snapshot of the btree when other writers may update the btree before the commit is done. This changes the snapshot creation to happen during the commit, while no other updates are possible. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	dc17ff8f11	Btrfs: Add data=ordered support This forces file data extents down the disk along with the metadata that references them. The current implementation is fairly simple, and just writes out all of the dirty pages in an inode before the commit. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:59 -04:00
Chris Mason	4313b3994d	Btrfs: Reduce stack usage in the resizer, fix 32 bit compiles Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	6da6abae02	Btrfs: Back port to 2.6.18-el kernels Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Christian Hesse	17636e03f4	Btrfs: section mismatch warnings Hello everybody, compiling btrfs into the kernel results in section mismatch warnings. __exit functions are called where they are not allowed to. The attached patch fixes this for me. Not sure if it is correct though. Signed-off-by: Christian Hesse <mail@earthworm.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:58 -04:00
Chris Mason	35ebb934bd	Btrfs: Fix PAGE_CACHE_SHIFT shifts on 32 bit machines Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:57 -04:00
Chris Mason	a6b6e75e09	Btrfs: Defrag only leaves, and only when the parent node has a single objectid This allows us to defrag huge directories, but skip the expensive defrag case in more common usage, where it does not help as much. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:57 -04:00
Chris Mason	4dc119046d	Btrfs: Add an extent buffer LRU to reduce radix tree hits Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:56 -04:00
Chris Mason	6b80053d02	Btrfs: Add back the online defragging code Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:56 -04:00
Chris Mason	db94535db7	Btrfs: Allow tree blocks larger than the page size Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:56 -04:00
Chris Mason	1a5bc167f6	Btrfs: Change the remaining radix trees used by extent-tree.c to extent_map trees Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:56 -04:00
Chris Mason	f510cfecfc	Btrfs: Fix extent_buffer and extent_state leaks Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:56 -04:00
Chris Mason	5f39d397df	Btrfs: Create extent_buffer interface for large blocksizes Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:03:56 -04:00
Chris Mason	d3c2fdcf7b	Btrfs: Use balance_dirty_pages_nr on btree blocks btrfs_btree_balance_dirty is changed to pass the number of pages dirtied for more accurate dirty throttling. This lets the VM make better decisions about when to force some writeback. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:00:48 -04:00
Chris Mason	5ce14bbcdd	Btrfs: Find and remove dead roots the first time a root is loaded. Dead roots are trees left over after a crash, and they were either in the process of being removed or were waiting to be removed when the box crashed. Before, a search of the entire tree of root pointers was done on mount looking for dead roots. Now, the search is done the first time we load a root. This makes mount faster when there are a large number of snapshots, and it enables the block accounting code to properly update the block counts on the latest root as old versions of the root are reaped after a crash. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-09-11 11:15:39 -04:00
Josef Bacik	58176a9604	Btrfs: Add per-root block accounting and sysfs entries Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-08-29 15:47:34 -04:00
Josef Bacik	15ee9bc7ed	Btrfs: delay commits during fsync to allow more writers Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-08-10 16:22:09 -04:00
Chris Mason	e9d0b13b5b	Btrfs: Btree defrag on the extent-mapping tree as well Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-08-10 14:06:19 -04:00
Chris Mason	409eb95d7f	Btrfs: Further reduce the concurrency penalty of defrag and drop_snapshot Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-08-08 20:17:12 -04:00
Chris Mason	26b8003f10	Btrfs: Replace extent tree preallocation code with some bit radix magic. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-08-08 20:17:12 -04:00
Chris Mason	f4468e94c8	Btrfs: Let some locks go during defrag and snapshot dropping Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-08-08 10:08:58 -04:00
Chris Mason	6702ed490c	Btrfs: Add run time btree defrag, and an ioctl to force btree defrag This adds two types of btree defrag, a run time form that tries to defrag recently allocated blocks in the btree when they are still in ram, and an ioctl that forces defrag of all btree blocks. File data blocks are not defragged yet, but this can make a huge difference in sequential btree reads. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-08-07 16:15:09 -04:00
Chris Mason	9f3a742736	Btrfs: Do snapshot deletion in smaller chunks. Before, snapshot deletion was a single atomic unit. This caused considerable lock contention and required an unbounded amount of space. Now, the drop_progress field in the root item is used to indicate how far along snapshot deletion is, and to resume where it left off. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-08-07 15:52:19 -04:00
Zach Brown	ec6b910fb3	Btrfs: trivial include fixups Almost none of the files including module.h need to do so, remove them. Include sched.h in extent-tree.c to silence a warning about cond_resched() being undeclared. Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-07-11 10:00:37 -04:00
Chris Mason	ccd467d60e	Btrfs: crash recovery fixes Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-28 15:57:36 -04:00
Chris Mason	4b52dff6d3	Btrfs: Fix super block updates during transaction commit The super block written during commit was not consistent with the state of the trees. This change adds an in-memory copy of the super so that we can make sure to write out consistent data during a commit. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-26 10:06:50 -04:00
Chris Mason	22bb92f376	Btrfs: Documentation update Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-22 14:49:31 -04:00
Chris Mason	5eda7b5e9b	Btrfs: Add the ability to find and remove dead roots after a crash. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-22 14:16:25 -04:00
Chris Mason	54aa1f4dfd	Btrfs: Audit callers and return codes to make sure -ENOSPC gets up the stack Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-22 14:16:25 -04:00
Chris Mason	8c2383c3dd	Subject: Rework btrfs_file_write to only allocate while page locks are held Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-18 09:57:58 -04:00
Chris Mason	340887809d	Btrfs: i386 fixes from axboe Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-12 11:36:58 -04:00
Chris Mason	6cbd557078	Btrfs: add GPLv2 Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-12 09:07:21 -04:00
Chris Mason	0cf6c62017	Btrfs: remove device tree Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-09 09:22:25 -04:00
Chris Mason	ad693af684	Btrfs: reap dead roots right after commit Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-09 08:19:57 -04:00
Chris Mason	facda1e787	Btrfs: get forced transaction commits via workqueue Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-08 18:11:48 -04:00
Chris Mason	08607c1b18	Btrfs: add compat ioctl Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-06-08 15:33:54 -04:00
Chris Mason	e37c9e6921	Btrfs: many allocator fixes, pretty solid Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-05-09 20:13:14 -04:00
Chris Mason	35b7e47610	Btrfs: fix page cache memory leak Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-05-02 15:53:43 -04:00
Chris Mason	31f3c99b73	Btrfs: allocator improvements, inode block groups Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-04-30 15:25:45 -04:00
Chris Mason	7c4452b9a6	Btrfs: smarter transaction writeback Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-04-28 09:29:35 -04:00
Chris Mason	9078a3e1e4	Btrfs: start of block group code Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-04-26 16:46:15 -04:00
Chris Mason	8fd17795b2	Btrfs: early fsync support Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-04-19 21:01:03 -04:00
Chris Mason	8352d8a473	Btrfs: add disk ioctl, mostly working Signed-off-by: Chris Mason <chris.mason@oracle.com>	2007-04-12 10:43:05 -04:00

164 Commits