movs instruction will combine data to accelerate moving data,
however we need to concern two cases about it.
1. movs instruction need long lantency to startup,
so here we use general mov instruction to copy data.
2. movs instruction is not good for unaligned case,
even if src offset is 0x10, dest offset is 0x0,
we avoid and handle the case by general mov instruction.
Signed-off-by: Ma Ling <ling.ma@intel.com>
LKML-Reference: <1284664360-6138-1-git-send-email-ling.ma@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>