After adding a list of stop words (which removed the top 100 words), the
full-text index works perfectly.
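
For anyone hitting the same error, here is a minimal sketch of how a stop word
list like this could be produced. It assumes that "top 100" means the 100 most
frequent tokens and that the input has one book per line. The file names
books.txt and stopwords.txt are hypothetical, and this is only an
illustration, not necessarily the exact procedure used.

```python
from collections import Counter
import re


def top_tokens(lines, n=100):
    """Return the n most frequent lowercased word tokens in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # \w+ is Unicode-aware in Python 3, so Norwegian letters (æ, ø, å) stay inside tokens.
        counts.update(re.findall(r"\w+", line.lower()))
    return [token for token, _ in counts.most_common(n)]


def write_stopwords(tokens, path):
    """Write the stop words to a file, one word per line."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("\n".join(tokens) + "\n")


if __name__ == "__main__":
    # Hypothetical file names: one book per line in, stop word list out.
    with open("books.txt", encoding="utf-8") as books:
        write_stopwords(top_tokens(books, 100), "stopwords.txt")
```

The resulting file can then be selected as the stop word list when the
full-text index is created. Removing the most frequent words shrinks the
longest id lists, which is presumably why the index build no longer fails.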

Thanks for the suggestions!
Lars

2015-06-28 0:03 GMT+02:00 Christian Grün <christian.gr...@gmail.com>:

> Hi Lars,
>
> It looks as if the input data is indeed too large to be indexed (the
> internal id lists seem to exceed the maximum array size in main
> memory). The usual alternative to make it work is to distribute your
> document(s) into multiple databases.
>
> If you want, you can also provide us with the input data, but I assume
> it will take up quite a lot of space?
>
> Best,
> Christian
>
>
> On Sat, Jun 27, 2015 at 12:50 PM, Lars Johnsen <yoon...@gmail.com> wrote:
> > When trying to create a full-text index on a collection of texts, the
> > process runs for a couple of hours and then exits with the message below.
> > I think it is nearly complete by then: in the GUI, I have at least seen
> > the progress bar get to around 80 %, so I think it is safe to assume that
> > the error is connected to the final stages.
> >
> > The texts are unstructured and represented as one line per book. Here is
> > the result from the index process. Parameters set in the GUI are:
> > Norwegian Snowball, lemmatization, diacritics. 30 GB is set aside for the
> > GUI.
> >
> > Path summary:
> > doc(): 317259x, strings
> >   text: 317259x, leaf
> >     text(): 317259x, strings, leaf
> >
> > Here is the error message:
> >
> > Improper use? Potential bug? Your feedback is welcome:
> > Contact: basex-talk@mailman.uni-konstanz.de
> > Version: BaseX 8.2 beta 7d38949
> > Java: Oracle Corporation, 1.7.0_79
> > OS: Linux, amd64
> > Stack Trace:
> > java.lang.NegativeArraySizeException
> > at java.util.Arrays.copyOf(Arrays.java:2271)
> > at org.basex.util.TokenBuilder.add(TokenBuilder.java:303)
> > at org.basex.util.TokenBuilder.add(TokenBuilder.java:290)
> > at org.basex.index.ft.FTBuilder.merge(FTBuilder.java:248)
> > at org.basex.index.ft.FTBuilder.write(FTBuilder.java:155)
> > at org.basex.index.ft.FTBuilder.index(FTBuilder.java:94)
> > at org.basex.index.ft.FTBuilder.build(FTBuilder.java:102)
> > at org.basex.index.ft.FTBuilder.build(FTBuilder.java:1)
> > at org.basex.data.DiskData.createIndex(DiskData.java:195)
> > at org.basex.core.cmd.ACreate.create(ACreate.java:117)
> > at org.basex.core.cmd.CreateIndex.run(CreateIndex.java:62)
> > at org.basex.core.Command.run(Command.java:398)
> > at org.basex.core.Command.execute(Command.java:100)
> > at org.basex.core.Command.execute(Command.java:123)
> > at org.basex.gui.dialog.DialogProgress$1.run(DialogProgress.java:178)
> >
> > Regards
> > Lars G Johnsen
> > National Library of Norway
>
