If (start + size) is not cacheline aligned and (start & mask) > (end &
mask), the last but one cacheline won't be invalidated as it should.
Fix this by rounding `end' down to the nearest cacheline boundary if
it gets adjusted due to misalignment.
Also flush the write buffer unconditionally -- if the dcache wrote
back a line just before we invalidated it, the dirty data may be
sitting in the write buffer waiting to corrupt our buffer later.
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>