In fact, http://dl.thehellings.com/count.py churns through kjv.xml in
11 seconds on my machine and gives the desired output of character
counts. It can be invoked with the name of a file
(python count.py kjv.xml), as part of a pipe (cat kjv.xml | ./count.py),
or with a whole list of files
(./count.py kjv.xml kjvfull.xml kjvlite.xml).

--Greg

On Sun, Jul 3, 2011 at 12:30 PM, Greg Hellings <greg.helli...@gmail.com> wrote:
> A few simple pipes in Unix can do the same thing with relative ease.
>
> cat kjv.xml | sed -e 's/./&\n/g' | sort | uniq -c | sort -nr
> 1669596
> 1661832 "
> 1330866 o
> 1307266 r
> 1172801 s
> 1156121 e
> 1092384 n
> 1029125 m
>  901465 t
>  864037 >
>  864037 <
>  830916 =
>  776214 a
>  772641 w
>  625029 h
>  609087 :
>  560652 g
>  497519 l
>  469056 /
>  406801 i
>  393184 0
>  370919 p
>  350731 1
>  312386 H
>  290358 2
>  283469 8
>  263960 3
>  257239 d
>  220707 .
>  209066 5
>  204056 b
>  197713 4
>  197400 c
>  193701 7
>  183464 6
>  175932 G
>  172006 9
>  152074 -
>  133127 I
>  126782 M
>  121721 D
>  115182 N
>  114636 v
>  113384 T
>  111775 u
>  109108 y
>  107290 P
>   94242 A
>   85226 S
>   84923 f
>   74768 ,
>   73229 C
>   39531 J
>   36203 V
>   35707 k
>   34899
>   25991 E
>   24737 R
>   23948 F
>   20676 O
>   18179 x
>   16367 L
>   10159 ;
>    6930 z
>    5389 K
>    5047 B
>    4036 …
>    3421 ?
>    3283 X
>    2970 ¶
>    2596 j
>    2489 W
>    2334 q
>    2040 '
>    1776 Z
>     797 U
>     551 Y
>     313 !
>     240 )
>     240 (
>     199 Q
>      93 æ
>       5 }
>       5 {
>       3 Æ
>       1 ת
>       1 ש
>       1 ר
>       1 ק
>       1 צ
>       1 פ
>       1 ע
>       1 ס
>       1 נ
>       1 מ
>       1 ל
>       1 כ
>       1 י
>       1 ט
>       1 ח
>       1 ז
>       1 ו
>       1 ה
>       1 ד
>       1 ג
>       1 ב
>       1 א
>
> The format looks a bit nicer on the terminal. Takes about 75 seconds
> to run on the file. A few simple lines in Python or the like only
> takes about 10s and is equally simple to whip up.
>
> --Greg
>
> On Sun, Jul 3, 2011 at 11:53 AM, David Haslam <dfh...@googlemail.com> wrote:
>> A useful tool for analysing or editing source text files is BabelPad, the
>> Unicode Text Editor (for Windows).
>> http://www.babelstone.co.uk/Software/BabelPad.html
>>
>> One of the Menu Tool Options is Character Frequency.
>>
>> This can be very helpful to detect unexpected code points, such as when the
>> translators were inconsistent when they were editing.
>>
>> David
>>
>> --
>> View this message in context:
>> http://sword-dev.350566.n4.nabble.com/Character-Frequency-tp3642222p3642222.html
>> Sent from the SWORD Dev mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> sword-devel mailing list: sword-devel@crosswire.org
>> http://www.crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
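
For anyone who would rather not fetch the script, a character counter
along the lines described at the top of this message can be sketched in
a few lines of Python. To be clear, this is not the count.py linked
there, just a rough illustration of the approach. It assumes Python 3
and UTF-8 input, and the names count_chars, format_counts and main are
made up for this sketch.

```python
#!/usr/bin/env python3
"""Count how often each character occurs in some files or on stdin.

Illustrative sketch only: assumes Python 3 and UTF-8 encoded input.
"""
import collections
import io
import sys


def count_chars(stream, counts=None):
    """Add every character read from a text stream to a Counter."""
    if counts is None:
        counts = collections.Counter()
    for line in stream:
        # Counter.update() on a string counts its individual characters,
        # including the trailing newline of each line.
        counts.update(line)
    return counts


def format_counts(counts):
    """Return report lines, most frequent character first.

    repr() is used so that spaces, tabs and newlines stay visible.
    """
    return [f"{n:7d} {char!r}" for char, n in counts.most_common()]


def main(paths):
    """Count characters in the named files, or in stdin if none are given."""
    counts = collections.Counter()
    if paths:
        for path in paths:
            with open(path, encoding="utf-8") as handle:
                count_chars(handle, counts)
    else:
        # No file names: assume the data arrives through a pipe.
        stdin = io.TextIOWrapper(sys.stdin.buffer, encoding="utf-8")
        count_chars(stdin, counts)
    for line in format_counts(counts):
        print(line)
```

To use it as a script, save it under some name (say charfreq.py, also
made up) and add a final line calling main(sys.argv[1:]). It can then
be run with one or more file names, or at the end of a pipe with no
arguments. Because the characters are printed with repr(), whitespace
shows up as ' ' or '\n' instead of as an empty-looking entry.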