On 11/26/2012 04:33:08 AM, Phil Sutter wrote:
The basic idea is taken from the linux-kernel, but further optimized.

First align the buffer to 8 bytes, then use ldrd/strd to read and store
in 8 byte quantities, then do the final bytes.

Tested using: 'date ; nand read.raw 0xE00000 0x0 0x10000 ; date'.
Without this patch, NAND read of 132MB took 49s (~2.69MB/s). With this
patch in place, reading the same amount of data was done in 27s
(~4.89MB/s). So read performance is increased by ~80%!

Signed-off-by: Nico Erfurth <n...@erfurth.eu>
Tested-by: Phil Sutter <phil.sut...@viprinet.com>
Cc: Prafulla Wadaskar <prafu...@marvell.com>
---
 drivers/mtd/nand/kirkwood_nand.c |   29 +++++++++++++++++++++++++++++
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/drivers/mtd/nand/kirkwood_nand.c b/drivers/mtd/nand/kirkwood_nand.c
index bdab5aa..e04a59f 100644
--- a/drivers/mtd/nand/kirkwood_nand.c
+++ b/drivers/mtd/nand/kirkwood_nand.c
@@ -38,6 +38,34 @@ struct kwnandf_registers {
 static struct kwnandf_registers *nf_reg =
        (struct kwnandf_registers *)KW_NANDF_BASE;

+
+/* The basic idea is stolen from the linux kernel, but the inner loop is optimized a bit more */
+static void kw_nand_read_buf(struct mtd_info *mtd, uint8_t *buf, int len)
+{
+       struct nand_chip *chip = mtd->priv;
+
+       while (len && (unsigned long)buf & 7)
+       {

Brace goes on the previous line.

+               *buf++ = readb(chip->IO_ADDR_R);
+               len--;
+       };
+
+       asm volatile (
+               ".LFlashLoop:\n"
+               "  subs\t%0, #8\n"
+               "  ldrpld\tr2, [%2]\n" // Read 2 words
+               "  strpld\tr2, [%1], #8\n" // Read 2 words
+               "  bpl\t.LFlashLoop\n" // This results in one additional loop if len%8 <> 0
+               "  addne\t%0, #8\n"
+               : "+&r" (len), "+&r" (buf)
+               : "r" (chip->IO_ADDR_R)
+               : "r2", "r3", "memory", "cc"
+       );

Use a real tab (or a space) rather than \t. The escape only helps readability in the generated asm output, not in the C source that people actually look at.

Should probably use a numeric label to avoid any possibility of conflict.
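
A minimal sketch of the numeric-label form, assuming GNU as local-label syntax. The named label .LFlashLoop would be emitted twice if the compiler ever inlined or cloned the function, and the assembler would reject the duplicate definition. A numeric label such as 1: may be defined any number of times, and 1b refers to the nearest one looking backward. The helper count_iterations() is hypothetical and only illustrates the label and tab style; it is not the NAND loop from the patch. The ARM path is guarded so the sketch also builds on other hosts.

```c
#include <assert.h>

/* Hypothetical helper for illustration only; not part of the patch. */
static unsigned int count_iterations(unsigned int n)
{
	unsigned int iters = 0;

	assert(n > 0);	/* the loop body runs once before the first test */

#if defined(__arm__) && !defined(__thumb__)
	__asm__ volatile (
		"1:\n"				/* numeric local label */
		"	add	%1, %1, #1\n"	/* real tabs in the source */
		"	subs	%0, %0, #1\n"	/* sets Z when n reaches 0 */
		"	bne	1b\n"		/* 1b: nearest 1: looking backward */
		: "+r" (n), "+r" (iters)
		:
		: "cc"
	);
#else
	/* Portable fallback so the sketch builds on non-ARM hosts. */
	do {
		iters++;
	} while (--n);
#endif
	return iters;
}
```

Calling count_iterations(8) runs the loop body eight times and returns 8.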

Would this make more sense as a more generic optimized memcpy_fromio() or similar?
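
A rough sketch of what such a generic helper might look like in plain C, following the same three steps as the patch: byte reads until the destination is 8-byte aligned, then 8-byte chunks, then the remaining bytes. The name fifo_read_buf() and its signature are assumptions for illustration, not an existing U-Boot function. It reads repeatedly from one fixed address, as the NAND data register requires, so it is closer to a FIFO read than to a memcpy_fromio() that advances the source. Whether the 64-bit read becomes a single ldrd depends on the compiler, and it assumes the device tolerates wide accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper; name and signature are assumptions. */
static void fifo_read_buf(const volatile void *io, uint8_t *buf, size_t len)
{
	const volatile uint8_t *io8 = io;
	const volatile uint64_t *io64 = io;

	/* The wide read below needs an 8-byte aligned register address. */
	assert(((uintptr_t)io & 7) == 0);

	/* Head: single bytes until buf is 8-byte aligned. */
	while (len && ((uintptr_t)buf & 7)) {
		*buf++ = *io8;
		len--;
	}

	/* Bulk: one 64-bit read from the register per 8 output bytes. */
	while (len >= 8) {
		uint64_t v = *io64;

		/* memcpy keeps the store well-defined C. */
		memcpy(buf, &v, sizeof(v));
		buf += 8;
		len -= 8;
	}

	/* Tail: the remaining 0..7 bytes. */
	while (len--)
		*buf++ = *io8;
}
```

Testing len >= 8 before each wide read means the bulk loop never runs past the end of the buffer, so no fix-up of len is needed afterwards.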

-Scott