Re: [basex-talk] multi-language full-text indexing

2015-04-22 Thread Christian Grün
Chris, Thanks for your feedback. Yes, I see that there is a lot of demand for a more customizable full-text index. Did you already try to build some additional index databases, based on the rules you were listing here? It's not as comfortable as a tightly coupled full-text index, but the more use

Re: [basex-talk] multi-language full-text indexing

2015-04-22 Thread Chris Yocum
Hi, I just want to say that for the dictionary that I used BaseX for, having a multi-lingual full-text index would have been very nice. Barring that, a partial index based on certain rules the user supplies would also have been nice. For instance, being able

Re: [basex-talk] Creation of Full-Text-Index failed

2015-04-22 Thread Goetz Heller
Thank you. I tried it out, and now everything works fine. Kind regards, Goetz -Original Message- From: Christian Grün [mailto:christian.gr...@gmail.com] Sent: Wednesday, 22 April 2015 13:22 To: Goetz Heller Cc: BaseX Subject: Re: [basex-talk] Creation of Full-Text-Index failed

Re: [basex-talk] Creation of Full-Text-Index failed

2015-04-22 Thread Christian Grün
...enjoy the fixed version [1]. Christian [1] http://files.basex.org/releases/latest On Tue, Apr 21, 2015 at 8:56 PM, Goetz Heller wrote: > For the task at hand I need to create a database on a daily basis from file > packages I received. The language taken here is German, however the files > co

Re: [basex-talk] Distributing queries to several on several processors

2015-04-22 Thread Andy Bunce
Hi Erol, I am not volunteering :-) but if somebody wants to take this route this code might give some pointers [1]. It uses Apache Spark to run Saxon-HE, an XQuery example [2], and more info [3]. /Andy [1] https://github.com/elsevierlabs/spark-xml-utils [2] https://github.com/elsevierlabs/spark

Re: [basex-talk] multi-language full-text indexing

2015-04-22 Thread Goetz Heller
Thanks, Fabrice! I’ll work it out. Kind regards, Goetz From: Fabrice Etanchaud [mailto:fetanch...@questel.com] Sent: Wednesday, 22 April 2015 11:32 To: Goetz Heller; basex-talk@mailman.uni-konstanz.de Subject: RE: [basex-talk] multi-language full-text indexing Great, Goetz! A last

Re: [basex-talk] Distributing queries to several on several processors

2015-04-22 Thread Goetz Heller
OK. Let me do my stuff first. Then I will see if I'm able to dive deep enough into the BaseX code to come up with some meaningful contribution! Kind regards, Goetz -Original Message- From: Christian Grün [mailto:christian.gr...@gmail.com] Sent: Wednesday, 22 April 2015 11:15

Re: [basex-talk] multi-language full-text indexing

2015-04-22 Thread Christian Grün
Reminds me of an old GitHub issue. I have added a link to your request: https://github.com/BaseXdb/basex/issues/59. On Wed, Apr 22, 2015 at 11:35 AM, Goetz Heller wrote: > Here's another addendum: Even if multi-language full-text indexing is not > going to be implemented in the near future, it

[basex-talk] multi-language full-text indexing

2015-04-22 Thread Goetz Heller
Here's another addendum: Even if multi-language full-text indexing is not going to be implemented in the near future, it still would be a useful feature to be able to restrict full-text indexing to parts of a document, e.g. CREATE FULL-TEXT INDEX ON DATABASE XY STARTING WITH ( (path_a)/

Re: [basex-talk] multi-language full-text indexing

2015-04-22 Thread Fabrice Etanchaud
Great, Goetz! One last thing: if you need to rebuild the original document from parts, be sure to have a way to retrieve them all (by document path, attribute index, or separate index collection with node-id/pre values). If disk space is not an issue, you could store the original document as it

Re: [basex-talk] multi-language full-text indexing

2015-04-22 Thread Goetz Heller
Fabrice, For the time being, this sounds quite nice. I’d split up the files into a common part and a set of “satellites”, one satellite for each language present in the document. Thanks! Kind regards, Goetz From: Fabrice Etanchaud [mailto:fetanch...@questel.com] Sent: Wednesday, 22.

Re: [basex-talk] multi-language full-text indexing

2015-04-22 Thread Christian Grün
In a nutshell: It would take some more time to explain all the implications.. We know that there are various non-trivial issues to be solved, as we already thought about adding such an index some years ago. Cheers, Christian On Wed, Apr 22, 2015 at 11:15 AM, Goetz Heller wrote: > The case you d

[basex-talk] multi-language full-text indexing

2015-04-22 Thread Goetz Heller
The case you described should be made a non-issue: If a multi-language full-text index was created then it was surely intended to execute searches within the confines of a specific language. Hence, if none was specified in the query, a runtime error should be thrown in such cases. Kind regards,

Re: [basex-talk] Distributing queries to several on several processors

2015-04-22 Thread Christian Grün
Hi Götz (cc @ basex-talk), > OK, I think I understand. However, I think there should be some possibilities to allow the user to give hints. In my opinion, FOR-loops would be first-class candidates to use parallel streams, in particular in the use case I described in my previous posting: > > FOR $

Re: [basex-talk] Distributing queries to several on several processors

2015-04-22 Thread Christian Grün
Any volunteers out there? ;) On Wed, Apr 22, 2015 at 11:05 AM, Erol Akarsu wrote: > Christian, > > I think we should be able to attach BaseX to Apache Spark. But integration > code needs to be written. > Everybody is able to read from Hadoop, Solr, ElasticSearch etc. to Spark and > process there.

Re: [basex-talk] Distributing queries to several on several processors

2015-04-22 Thread Erol Akarsu
Christian, I think we should be able to attach BaseX to Apache Spark. But integration code needs to be written. Everybody is able to read from Hadoop, Solr, ElasticSearch etc. to Spark and process there. Why not for BaseX? Erol Akarsu On Wed, Apr 22, 2015 at 4:28 AM, Christian Grün wrote: > Hi G

Re: [basex-talk] multi-language full-text indexing

2015-04-22 Thread Fabrice Etanchaud
Dear Goetz, I have the same requirement (patent documents containing text in different languages). I ended up splitting/filtering each original document into localized parts inserted in different collections (each collection having its own full-text index configuration). BaseX is as flexible as o
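
The splitting step described in this message can be sketched roughly as follows. This is a minimal illustration and not Fabrice's actual code: it assumes that language is marked with an `xml:lang` attribute on the top-level children of each document (the thread only mentions "a different language attribute"), and the function name `split_by_language`, the "common" bucket, and the sample document are all hypothetical. Loading each part into its own collection with a language-specific full-text index is not shown.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Qualified name that ElementTree uses for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def split_by_language(xml_text):
    """Group the top-level children of a document by their xml:lang value.

    Children without xml:lang go into a "common" part (an assumption of
    this sketch). Returns a dict that maps a language code, or "common",
    to a serialized XML part. Each part repeats the root element and its
    attributes so it can be traced back to the original document.
    """
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for child in root:
        groups[child.get(XML_LANG, "common")].append(child)

    parts = {}
    for lang, children in groups.items():
        part = ET.Element(root.tag, root.attrib)  # copy root tag and attributes
        part.extend(children)
        parts[lang] = ET.tostring(part, encoding="unicode")
    return parts


# Hypothetical sample document, for illustration only.
sample = """<doc id="42">
  <meta>EP1234567</meta>
  <title xml:lang="de">Verfahren</title>
  <title xml:lang="en">Method</title>
  <abstract xml:lang="en">A method.</abstract>
</doc>"""

parts = split_by_language(sample)
for lang in sorted(parts):
    print(lang, parts[lang])
```

Keeping the root attributes (here the hypothetical `id`) in every part is one simple way to find all parts of a document again later; other keys, such as the document path, would work as well.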

Re: [basex-talk] multi-language full-text indexing

2015-04-22 Thread Christian Grün
> It is desirable to have > documents indexed by locale-specific parts, e.g. I can see that this would absolutely make sense, but it would be quite some effort to realize it. There are also various conceptual issues related to XQuery Full Text: If you don't specify the language in the query, we'd n

Re: [basex-talk] IllegalMonitorStateException at org.basex.core.locks.DBLocking

2015-04-22 Thread Simon Chatelain
Hello, Excellent. Glad to be of use. I'll try the new snapshot right away. Cheers Simon On Wed, Apr 22, 2015 at 10:06 AM, Christian Grün wrote: > Hi Simon, > > I finally had time to look at your examples, and... > > > One more detail: [...] > > ...seemed to fix it! The original version of th

[basex-talk] multi-language full-text indexing

2015-04-22 Thread Goetz Heller
I'm working with documents destined to be consumed anywhere in the European Community. Many of them have the same tags multiple times but with a different language attribute. It therefore does not make sense to create a single full-text index for the whole of these documents. It is desirable to have docum

Re: [basex-talk] Distributing queries to several on several processors

2015-04-22 Thread Christian Grün
Hi Götz, > it would > make perfect sense to parallelize the query. Is there a way to achieve this > using XQuery? Our initial attempts to integrate low-level support for parallelization in XQuery turned out not to be as successful as we hoped they would be. One reason for that is that you can bas

Re: [basex-talk] IllegalMonitorStateException at org.basex.core.locks.DBLocking

2015-04-22 Thread Christian Grün
Hi Simon, I finally had time to look at your examples, and... > One more detail: [...] ...seemed to fix it! The original version of this class was written by Jens (in the cc), but I also believe that the basic problem was that the locks instance was not synchronized. In my fix, I used a Concurre

Re: [basex-talk] RESTXQ accept/produces issue

2015-04-22 Thread Christian Grün
Hi Marc, > "If the %rest:produces annotation is specified, a function will > only be invoked if the HTTP Accept header of the request matches one > of the given types, or if it does not specify any HTTP Accept header at all." I asked Adam a while ago to get the online version of the spec upda

Re: [basex-talk] RESTXQ accept/produces issue

2015-04-22 Thread Marc van Grootel
Hi Christian, You are right, foolish of me not to verify on latest or even on 8.1, where this was fixed already. I was hitting an API that was part of our software which used an 8.0 version still. Just verified it on 8.1 and latest snapshot and there it's fine. One nitpick for the RESTXQ spec thou