Attached are two dumps from our Neo4j databases on a demo system.
-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Tobias Ivarsson
Sent: Saturday, February 05, 2011 5:06 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] Help us make Neo4j better at handling YOUR data

Damn. That one place assumes that you don't have any empty strings. I've uploaded a patched version. Same location:
https://github.com/downloads/thobe/neo4j-admin-store/stringstat.jar

-tobias

On Fri, Feb 4, 2011 at 6:52 PM, Rick Bullotta <rick.bullo...@burningskysoftware.com> wrote:

> Same here.
>
> -----Original Message-----
> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Axel Morgner
> Sent: Friday, February 04, 2011 12:29 PM
> To: user@lists.neo4j.org
> Subject: Re: [Neo4j] Help us make Neo4j better at handling YOUR data
>
> Hi Tobias,
>
> just ran the utility, but got an exception:
>
> Computing character frequencies for 205895 string records
> ............ 30%
> ................... 40%
> ................... 50%
> ................... 60%
> ................... 70%
> ................... 80%
> ................... 90%
> ...................100%
> Matching potential encodings for 205895 string records
> ...Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0
>     at java.lang.String.charAt(String.java:694)
>     at org.neo4j.admin.tool.stringstat.Numerical.matches(Numerical.java:30)
>     at org.neo4j.admin.tool.stringstat.TryAssumptions.process(TryAssumptions.java:46)
>     at org.neo4j.admin.tool.stringstat.Main.main(Main.java:55)
>
> Greetings
>
> Axel
>
> > I have written a small utility that analyzes the string properties stored by
> > Neo4j and computes some statistics about them.
> > If I could get as many of you to run this tool on your stores and send those
> > statistics to me as possible, that would be great.
> >
> > This tool is available for download here:
> > https://github.com/downloads/thobe/neo4j-admin-store/stringstat.jar
> >
> > To run it, all you need to do is:
> > java -jar stringstat.jar /path/to/your/neo4j/store/dir
>
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user

--
Tobias Ivarsson <tobias.ivars...@neotechnology.com>
Hacker, Neo Technology
www.neotechnology.com
Cellphone: +46 706 534857
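For readers following the stack trace in the quoted thread: the exception comes from String.charAt being called on an empty string, which is what "assumes that you don't have any empty strings" refers to. The sketch below only illustrates that failure mode and one possible guard. The class and method names are hypothetical, and this is not the actual source of Numerical.matches or of the patch, neither of which is shown in this thread.

```java
// Hypothetical illustration only; NOT the stringstat source code.
class EmptyStringGuard {

    // Checks that every character is a digit, but reads index 0 unconditionally.
    // On "" the call s.charAt(0) throws StringIndexOutOfBoundsException.
    public static boolean unguarded(String s) {
        if (!Character.isDigit(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isDigit(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Same check with an explicit guard. ASSUMPTION: an empty string is
    // treated as "does not match"; the real tool may decide differently.
    public static boolean guarded(String s) {
        if (s.isEmpty()) {
            return false;
        }
        return unguarded(s);
    }

    public static void main(String[] args) {
        System.out.println("guarded(\"\")    = " + guarded(""));
        System.out.println("guarded(\"123\") = " + guarded("123"));
        try {
            unguarded("");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unguarded(\"\") threw " + e.getClass().getName());
        }
    }
}
```

Whether an empty string should count as a match for a given encoding is a design choice; the guard above simply picks the conservative answer.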
Computing character frequencies for 356934 string records
................... 10%
................... 20%
................... 30%
................... 40%
................... 50%
................... 60%
................... 70%
................... 80%
................... 90%
...................100%
Matching potential encodings for 356934 string records
................... 10%
................... 20%
................... 30%
................... 40%
................... 50%
................... 60%
................... 70%
................... 80%
................... 90%
...................100%
<STRING STORE STATISTICS>
= 4 bit frequencies =
  }: 5116 0: 4361 6: 3878 3: 3414 e: 2304 t: 2112 :: 1820 S: 1732
  O: 1720 I: 1642 s: 1477 2: 1474 D: 1402 E: 1367 T: 1341 R: 1286
  M: 1273 m: 1203 a: 1138 F: 1014 N: 1004 y: 950 Y: 895 1: 885
  7: 859 4: 857 8: 835 A: 810 i: 787 G: 764 n: 748 o: 549
= 5 bit frequencies =
  t: 66827 r: 66714 e: 33691 a: 33534 n: 33498 m: 33407 S: 33315 y: 33283
  E: 33268 i: 418 s: 257 o: 256 c: 184 g: 151 u: 135 l: 133
  p: 115 d: 109 : 99 k: 89 T: 75 P: 74 h: 60 B: 52
  C: 45 I: 44 A: 42 R: 42 W: 38 b: 37 D: 35 N: 31
  L: 28 _: 28 f: 26 w: 24 M: 21 ": 20 v: 20 }: 20
  G: 19 O: 18 H: 18 V: 17 0: 16 :: 15 F: 15 U: 15
  2: 10 x: 8 .: 7 3: 7 1: 7 K: 5 ]: 5 ,: 4
  [: 3 j: 3 ): 2 /: 2 4: 2 \: 2 z: 2 (: 1
= 6 bit frequencies =
  i: 3277 e: 1831 n: 1825 t: 1822 o: 1380 r: 1322 d: 980 a: 924
  s: 866 l: 832 m: 699 D: 612 f: 554 F: 518 A: 500 u: 373
  c: 369 b: 311 : 282 S: 252 y: 234 p: 178 T: 169 g: 165
  R: 150 _: 136 P: 123 I: 99 E: 97 W: 91 V: 91 G: 72
  h: 72 .: 66 }: 61 C: 57 k: 56 0: 55 ": 51 N: 46
  O: 44 :: 43 L: 43 B: 42 M: 36 v: 33 w: 23 1: 17
  H: 17 *: 15 U: 14 K: 9 ]: 8 /: 6 \: 6 q: 6
  2: 5 ?: 5 x: 5 (: 3 ): 3 3: 3 5: 3 Q: 3
  7: 2 Y: 2 X: 2 : 1 ': 1 -: 1 6: 1 ;: 1
  >: 1 =: 1 <: 1 J: 1 z: 1
54872 strings with category bitmask 0b0
1764 strings with category bitmask 0b1
24 strings with category bitmask 0b10000
937 strings with category bitmask 0b10001
11 strings with category bitmask 0b111111
33469 strings with category bitmask 0b10000000
6 strings with category bitmask 0b100000000
130 strings with category bitmask 0b100000001
190 strings with category bitmask 0b100010000
1741 strings with category bitmask 0b100010001
4 strings with category bitmask 0b100011001
3260 strings with category bitmask 0b101000001
4 strings with category bitmask 0b101010001
200 strings with category bitmask 0b110010000
2142 strings with category bitmask 0b110010001
7 strings with category bitmask 0b111010001
398 strings with category bitmask 0b111011111
= Category index =
0. NineSevenBitAscii
1. LowerCaseHexadecimal
2. UpperCaseHexadecimal
3. PunctuatedNumerical
4. AlphaNumericalName
5. Numerical
6. FrequencyBased:4bit
7. FrequencyBased:5bit
8. FrequencyBased:6bit
</STRING STORE STATISTICS>
Computing character frequencies for 20695 string records
................... 10%
................... 20%
................... 30%
................... 40%
................... 50%
................... 60%
................... 70%
................... 80%
................... 90%
...................100%
Matching potential encodings for 20695 string records
................... 10%
................... 20%
................... 30%
................... 40%
................... 50%
................... 60%
................... 70%
................... 80%
................... 90%
...................100%
<STRING STORE STATISTICS>
= 4 bit frequencies =
  n: 6624 t: 6220 o: 4007 r: 3991 g: 3964 L: 3963 y: 3961 E: 3950
  i: 2643 a: 2636 m: 2614 -: 2266 8: 1243 0: 1161 h: 1143 p: 1124
  9: 312 : 261 ]: 198 4: 139 2: 138 6: 133 3: 118 =: 117
  1: 112 7: 86 5: 61 e: 60 d: 53 u: 53 l: 48 s: 32
= 5 bit frequencies =
  : 379 i: 221 =: 158 0: 134 n: 128 o: 111 1: 107 l: 104
  e: 94 s: 92 r: 92 u: 85 d: 67 b: 67 v: 67 2: 27
  a: 25 ]: 24 t: 21 :: 18 R: 6 ,: 6 E: 2 O: 2
  .: 2 p: 2 A: 1 c: 1
= 6 bit frequencies =
  i: 3750 t: 3241 r: 3201 o: 2929 n: 2585 a: 2055 d: 1344 s: 1193
  m: 1192 A: 1153 f: 712 u: 712 g: 698 C: 695 -: 477 e: 245
  l: 193 1: 185 p: 174 h: 159 : 124 3: 33 R: 26 .: 25
  ]: 24 2: 23 4: 20 w: 18 v: 18 ": 17 8: 17 5: 15
  7: 13 E: 10 O: 8 6: 7 ,: 6 =: 6 9: 5 B: 2
  M: 2 N: 2 U: 2 T: 1 [: 1 b: 1 c: 1
9926 strings with category bitmask 0b0
7 strings with category bitmask 0b1
43 strings with category bitmask 0b10000
988 strings with category bitmask 0b10001
4064 strings with category bitmask 0b1010001
179 strings with category bitmask 0b10000000
26 strings with category bitmask 0b10000001
1 strings with category bitmask 0b11111111
11 strings with category bitmask 0b100000001
1 strings with category bitmask 0b100010000
19 strings with category bitmask 0b100010001
10 strings with category bitmask 0b100111111
2604 strings with category bitmask 0b101010001
6 strings with category bitmask 0b110000000
175 strings with category bitmask 0b110000001
6 strings with category bitmask 0b110010001
8 strings with category bitmask 0b110111111
= Category index =
0. NineSevenBitAscii
1. LowerCaseHexadecimal
2. UpperCaseHexadecimal
3. PunctuatedNumerical
4. AlphaNumericalName
5. Numerical
6. FrequencyBased:4bit
7. FrequencyBased:5bit
8. FrequencyBased:6bit
</STRING STORE STATISTICS>
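A note on reading the "category bitmask" lines in the dumps: the sketch below decodes a bitmask into the names listed under "Category index". It rests on an assumption the tool output does not state explicitly, namely that bit i (counting from the least significant bit) corresponds to category i. The class name CategoryBitmask and the method decode are made up for this illustration and are not part of stringstat.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for reading the dumps; not part of the stringstat tool.
class CategoryBitmask {

    // Names copied from the "= Category index =" section of the output.
    static final String[] CATEGORIES = {
        "NineSevenBitAscii",
        "LowerCaseHexadecimal",
        "UpperCaseHexadecimal",
        "PunctuatedNumerical",
        "AlphaNumericalName",
        "Numerical",
        "FrequencyBased:4bit",
        "FrequencyBased:5bit",
        "FrequencyBased:6bit"
    };

    // ASSUMPTION: bit i of the mask (least significant bit = 0) means
    // "the string matches category i". An empty list means no category matched.
    public static List<String> decode(int mask) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < CATEGORIES.length; i++) {
            if ((mask & (1 << i)) != 0) {
                names.add(CATEGORIES[i]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // A few bitmasks taken verbatim from the dumps.
        String[] masks = {"0b0", "0b10001", "0b10000000", "0b101010001"};
        for (String text : masks) {
            int mask = Integer.parseInt(text.substring(2), 2); // drop the "0b" prefix
            System.out.println(text + " -> " + decode(mask));
        }
    }
}
```

Under that assumption, 0b10001 would read as NineSevenBitAscii plus AlphaNumericalName, and 0b0 as strings that fit none of the listed encodings.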