Attached are two dumps from our Neo4j databases on a demo system.

-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Tobias Ivarsson
Sent: Saturday, February 05, 2011 5:06 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] Help us make Neo4j better at handling YOUR data

Damn. That one place assumes that you don't have any empty strings.

I've uploaded a patched version. Same location:
https://github.com/downloads/thobe/neo4j-admin-store/stringstat.jar
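The stack trace quoted below shows the failure coming from String.charAt inside Numerical.matches, with index 0 out of range, which is what happens when the string is empty. As a rough illustration only (this is not the actual patch; the class name NumericalCheck and the method isNumerical are made up for this sketch), a digits-only check that treats empty strings as non-matching could look like this:

```java
public final class NumericalCheck {
    private NumericalCheck() {}

    /**
     * Hypothetical stand-in for a "numerical" matcher: returns true only
     * for non-empty strings that consist entirely of ASCII digits.
     */
    public static boolean isNumerical(String value) {
        // Guard first: calling charAt(0) on "" throws
        // StringIndexOutOfBoundsException, as in the reported trace.
        if (value == null || value.isEmpty()) {
            return false;
        }
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isNumerical(""));      // prints false
        System.out.println(isNumerical("12345")); // prints true
        System.out.println(isNumerical("12a45")); // prints false
    }
}
```

Whether an empty string should count as numerical is a design choice; this sketch assumes it should not.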

-tobias

On Fri, Feb 4, 2011 at 6:52 PM, Rick Bullotta <
rick.bullo...@burningskysoftware.com> wrote:

> Same here.
>
> -----Original Message-----
> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Axel Morgner
> Sent: Friday, February 04, 2011 12:29 PM
> To: user@lists.neo4j.org
> Subject: Re: [Neo4j] Help us make Neo4j better at handling YOUR data
>
> Hi Tobias,
>
> just ran the utility, but got an exception:
>
> Computing character frequencies for 205895 string records
> ............ 30%
> ................... 40%
> ................... 50%
> ................... 60%
> ................... 70%
> ................... 80%
> ................... 90%
> ...................100%
> Matching potential encodings for 205895 string records
> ...Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0
>     at java.lang.String.charAt(String.java:694)
>     at org.neo4j.admin.tool.stringstat.Numerical.matches(Numerical.java:30)
>     at org.neo4j.admin.tool.stringstat.TryAssumptions.process(TryAssumptions.java:46)
>     at org.neo4j.admin.tool.stringstat.Main.main(Main.java:55)
>
> Greetings
>
> Axel
>
> > I have written a small utility that analyzes the string properties stored by
> > Neo4j and computes some statistics about them.
> > If I could get as many of you to run this tool on your stores and send those
> > statistics to me as possible, that would be great.
> >
> > This tool is available for download here:
> > https://github.com/downloads/thobe/neo4j-admin-store/stringstat.jar
> >
> > To run it, all you need to do is:
> > java -jar stringstat.jar /path/to/your/neo4j/store/dir
>
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>



-- 
Tobias Ivarsson <tobias.ivars...@neotechnology.com>
Hacker, Neo Technology
www.neotechnology.com
Cellphone: +46 706 534857
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Computing character frequencies for 356934 string records
................... 10%
................... 20%
................... 30%
................... 40%
................... 50%
................... 60%
................... 70%
................... 80%
................... 90%
...................100%
Matching potential encodings for 356934 string records
................... 10%
................... 20%
................... 30%
................... 40%
................... 50%
................... 60%
................... 70%
................... 80%
................... 90%
...................100%
<STRING STORE STATISTICS>
= 4 bit frequencies =
  }: 5116  0: 4361  6: 3878  3: 3414
  e: 2304  t: 2112  :: 1820  S: 1732
  O: 1720  I: 1642  s: 1477  2: 1474
  D: 1402  E: 1367  T: 1341  R: 1286
  M: 1273  m: 1203  a: 1138  F: 1014
  N: 1004  y:  950  Y:  895  1:  885
  7:  859  4:  857  8:  835  A:  810
  i:  787  G:  764  n:  748  o:  549
= 5 bit frequencies =
  t: 66827  r: 66714  e: 33691  a: 33534
  n: 33498  m: 33407  S: 33315  y: 33283
  E: 33268  i:   418  s:   257  o:   256
  c:   184  g:   151  u:   135  l:   133
  p:   115  d:   109   :    99  k:    89
  T:    75  P:    74  h:    60  B:    52
  C:    45  I:    44  A:    42  R:    42
  W:    38  b:    37  D:    35  N:    31
  L:    28  _:    28  f:    26  w:    24
  M:    21  ":    20  v:    20  }:    20
  G:    19  O:    18  H:    18  V:    17
  0:    16  ::    15  F:    15  U:    15
  2:    10  x:     8  .:     7  3:     7
  1:     7  K:     5  ]:     5  ,:     4
  [:     3  j:     3  ):     2  /:     2
  4:     2  \:     2  z:     2  (:     1
= 6 bit frequencies =
  i: 3277  e: 1831  n: 1825  t: 1822
  o: 1380  r: 1322  d:  980  a:  924
  s:  866  l:  832  m:  699  D:  612
  f:  554  F:  518  A:  500  u:  373
  c:  369  b:  311   :  282  S:  252
  y:  234  p:  178  T:  169  g:  165
  R:  150  _:  136  P:  123  I:   99
  E:   97  W:   91  V:   91  G:   72
  h:   72  .:   66  }:   61  C:   57
  k:   56  0:   55  ":   51  N:   46
  O:   44  ::   43  L:   43  B:   42
  M:   36  v:   33  w:   23  1:   17
  H:   17  *:   15  U:   14  K:    9
  ]:    8  /:    6  \:    6  q:    6
  2:    5  ?:    5  x:    5  (:    3
  ):    3  3:    3  5:    3  Q:    3
  7:    2  Y:    2  X:    2  \n:    1
  ':    1  -:    1  6:    1  ;:    1
  >:    1  =:    1  <:    1  J:    1
  z:    1
     54872 strings with category bitmask              0b0
      1764 strings with category bitmask              0b1
        24 strings with category bitmask          0b10000
       937 strings with category bitmask          0b10001
        11 strings with category bitmask         0b111111
     33469 strings with category bitmask       0b10000000
         6 strings with category bitmask      0b100000000
       130 strings with category bitmask      0b100000001
       190 strings with category bitmask      0b100010000
      1741 strings with category bitmask      0b100010001
         4 strings with category bitmask      0b100011001
      3260 strings with category bitmask      0b101000001
         4 strings with category bitmask      0b101010001
       200 strings with category bitmask      0b110010000
      2142 strings with category bitmask      0b110010001
         7 strings with category bitmask      0b111010001
       398 strings with category bitmask      0b111011111
= Category index =
 0. NineSevenBitAscii
 1. LowerCaseHexadecimal
 2. UpperCaseHexadecimal
 3. PunctuatedNumerical
 4. AlphaNumericalName
 5. Numerical
 6. FrequencyBased:4bit
 7. FrequencyBased:5bit
 8. FrequencyBased:6bit
</STRING STORE STATISTICS>
Computing character frequencies for 20695 string records
................... 10%
................... 20%
................... 30%
................... 40%
................... 50%
................... 60%
................... 70%
................... 80%
................... 90%
...................100%
Matching potential encodings for 20695 string records
................... 10%
................... 20%
................... 30%
................... 40%
................... 50%
................... 60%
................... 70%
................... 80%
................... 90%
...................100%
<STRING STORE STATISTICS>
= 4 bit frequencies =
  n: 6624  t: 6220  o: 4007  r: 3991
  g: 3964  L: 3963  y: 3961  E: 3950
  i: 2643  a: 2636  m: 2614  -: 2266
  8: 1243  0: 1161  h: 1143  p: 1124
  9:  312   :  261  ]:  198  4:  139
  2:  138  6:  133  3:  118  =:  117
  1:  112  7:   86  5:   61  e:   60
  d:   53  u:   53  l:   48  s:   32
= 5 bit frequencies =
   : 379  i: 221  =: 158  0: 134
  n: 128  o: 111  1: 107  l: 104
  e:  94  s:  92  r:  92  u:  85
  d:  67  b:  67  v:  67  2:  27
  a:  25  ]:  24  t:  21  ::  18
  R:   6  ,:   6  E:   2  O:   2
  .:   2  p:   2  A:   1  c:   1
= 6 bit frequencies =
  i: 3750  t: 3241  r: 3201  o: 2929
  n: 2585  a: 2055  d: 1344  s: 1193
  m: 1192  A: 1153  f:  712  u:  712
  g:  698  C:  695  -:  477  e:  245
  l:  193  1:  185  p:  174  h:  159
   :  124  3:   33  R:   26  .:   25
  ]:   24  2:   23  4:   20  w:   18
  v:   18  ":   17  8:   17  5:   15
  7:   13  E:   10  O:    8  6:    7
  ,:    6  =:    6  9:    5  B:    2
  M:    2  N:    2  U:    2  T:    1
  [:    1  b:    1  c:    1
      9926 strings with category bitmask              0b0
         7 strings with category bitmask              0b1
        43 strings with category bitmask          0b10000
       988 strings with category bitmask          0b10001
      4064 strings with category bitmask        0b1010001
       179 strings with category bitmask       0b10000000
        26 strings with category bitmask       0b10000001
         1 strings with category bitmask       0b11111111
        11 strings with category bitmask      0b100000001
         1 strings with category bitmask      0b100010000
        19 strings with category bitmask      0b100010001
        10 strings with category bitmask      0b100111111
      2604 strings with category bitmask      0b101010001
         6 strings with category bitmask      0b110000000
       175 strings with category bitmask      0b110000001
         6 strings with category bitmask      0b110010001
         8 strings with category bitmask      0b110111111
= Category index =
 0. NineSevenBitAscii
 1. LowerCaseHexadecimal
 2. UpperCaseHexadecimal
 3. PunctuatedNumerical
 4. AlphaNumericalName
 5. Numerical
 6. FrequencyBased:4bit
 7. FrequencyBased:5bit
 8. FrequencyBased:6bit
</STRING STORE STATISTICS>
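
The "category bitmask" lines can be read against the "Category index" list. The sketch below assumes that bit i of the mask (counting from the least significant bit) corresponds to entry i of the index; the output itself does not state this, so treat it as a guess about the tool's convention. The class name CategoryBitmask and the method decode are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

public final class CategoryBitmask {
    private CategoryBitmask() {}

    // Names copied from the "Category index" section of the dumps.
    private static final String[] CATEGORIES = {
        "NineSevenBitAscii", "LowerCaseHexadecimal", "UpperCaseHexadecimal",
        "PunctuatedNumerical", "AlphaNumericalName", "Numerical",
        "FrequencyBased:4bit", "FrequencyBased:5bit", "FrequencyBased:6bit"
    };

    /**
     * Decodes a mask such as "0b10001" into category names.
     * ASSUMPTION: bit i (least significant first) maps to index entry i.
     */
    public static List<String> decode(String bitmask) {
        String digits = bitmask.startsWith("0b") ? bitmask.substring(2) : bitmask;
        int mask = Integer.parseInt(digits, 2);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < CATEGORIES.length; i++) {
            if ((mask & (1 << i)) != 0) {
                names.add(CATEGORIES[i]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode("0b10001"));    // [NineSevenBitAscii, AlphaNumericalName]
        System.out.println(decode("0b10000000")); // [FrequencyBased:5bit]
        System.out.println(decode("0b0"));        // []
    }
}
```

Under that assumption, the 33469 strings with mask 0b10000000 in the first dump would match only the 5-bit frequency-based category, and the strings with mask 0b0 would match none of the listed categories.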