I work with Greenplum, which is essentially a number of PostgreSQL database instances clustered together. Being Postgres, the data is held in many individual files, each of which can be fairly large (hundreds of MB or several GB) or quite small (50MB or less). We've noticed a performance difference when our database files are many and small versus few and large.

To test this outside the database, we built a zpool using RAID-10 (the same behavior occurs with RAID-Z) and filled it with 800 files of 5MB each. Then we used 4 concurrent dd processes, each reading 1/4 of the files. This required 123 seconds.
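
For reference, the test looked roughly like the sketch below (ksh/bash; the pool layout, disk names, and file paths are made-up examples, not our exact setup):

    # create a mirrored (RAID-10-style) pool -- disk names are placeholders
    zpool create testpool mirror c0t1d0 c0t2d0 mirror c0t3d0 c0t4d0

    # fill it with 800 files of 5MB each
    i=1
    while [ $i -le 800 ]; do
        mkfile 5m /testpool/file$i
        i=$((i + 1))
    done

    # 4 concurrent dd readers, each reading its own quarter of the files
    g=0
    while [ $g -lt 4 ]; do
        (
            i=$((g * 200 + 1))
            last=$(((g + 1) * 200))
            while [ $i -le $last ]; do
                dd if=/testpool/file$i of=/dev/null bs=128k
                i=$((i + 1))
            done
        ) &
        g=$((g + 1))
    done
    wait    # wall-clock time for this pass was ~123 seconds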

Then we destroyed the pool, recreated it, and filled it with 20 files of 200MB each plus 780 files of 0 bytes (the same number of files and the same total space consumed). The same dd reads took 15 seconds.
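
(The second layout was created the same way; again just a sketch with assumed paths.)

    # 20 files of 200MB each plus 780 empty files: same file count, same total size
    i=1
    while [ $i -le 20 ]; do
        mkfile 200m /testpool/file$i
        i=$((i + 1))
    done
    while [ $i -le 800 ]; do
        touch /testpool/file$i    # zero-length placeholder
        i=$((i + 1))
    done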

Any idea why this is? Various configurations of our product can divide data in the databases into an enormous number of small files. Varying the ARC cache size limit did not have any effect. Are there other tunables available on Solaris 10 U7 (not OpenSolaris) that might affect this behavior?
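
(For reference, the ARC limit was varied by setting zfs_arc_max in /etc/system and rebooting, roughly as below; the value shown is only an example.)

    # cap the ARC at 4GB (example value only); takes effect after a reboot
    echo 'set zfs:zfs_arc_max = 0x100000000' >> /etc/system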

Thanks!
                                       -dt
