Lucene Spatial Question: Is a tree structure explicitly created in the QuadPrefixTree implementation?

2014-10-01 Thread parth_n
Hi everyone, I have a question regarding the quadtree implementation of the spatial module of Lucene. Does the quadtree implementation (QuadPrefixTree) explicitly build a tree structure and store this information? I have gone over the QuadPrefixTree class, but from what I understand it mainly

Re: Does StandardTokenizer remove punctuation (in Lucene 4.1)

2014-10-01 Thread Dawid Weiss
Hi Steve, I have to admit I also find it frequently useful to include punctuation as tokens (even if it's filtered out by subsequent token filters for indexing, it's a useful to-have for other NLP tasks). Do you think it'd be possible (read: relatively easy) to create an analyzer (or a

Re: Does StandardTokenizer remove punctuation (in Lucene 4.1)

2014-10-01 Thread Paul Taylor
On 01/10/2014 08:08, Dawid Weiss wrote: Hi Steve, I have to admit I also find it frequently useful to include punctuation as tokens (even if it's filtered out by subsequent token filters for indexing, it's a useful to-have for other NLP tasks). Do you think it'd be possible (read: relatively

Re: Does StandardTokenizer remove punctuation (in Lucene 4.1)

2014-10-01 Thread Michael McCandless
I played with this possibility on the extremely experimental https://issues.apache.org/jira/browse/LUCENE-5012 which I haven't gotten back to for a long time... The changes on that branch add the idea of a deleted token, by just setting a new DeletedAttribute marking whether the token is deleted

Re: Lucene Spatial Question: Is a tree structure explicitly created in the QuadPrefixTree implementation?

2014-10-01 Thread david.w.smi...@gmail.com
Hi Parth, Lucene’s “terms dictionary” (an inverted index) is the physical instantiation of the actual PrefixTree/Trie for numeric and spatial data. It doesn’t know it is — it’s just a sorted list of keys pointing to matching documents — it just so happens that the keys aren’t textual words in
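
To illustrate the point in the reply above, here is a minimal sketch of how a plain sorted list of keys can behave like a quadtree without any explicit tree object being built. It is not Lucene code. The function names, the A/B/C/D quadrant letters, and the unit-square bounds are assumptions chosen for illustration, and only the deepest cell of each point is indexed to keep the example short.

```python
from bisect import bisect_left


def leaf_token(x, y, max_level, bounds=(0.0, 0.0, 1.0, 1.0)):
    """Return the token of the deepest quad cell containing (x, y).

    Each level appends one letter (hypothetical encoding):
    A = top-left, B = top-right, C = bottom-left, D = bottom-right.
    """
    min_x, min_y, max_x, max_y = bounds
    token = ""
    for _ in range(max_level):
        mid_x = (min_x + max_x) / 2
        mid_y = (min_y + max_y) / 2
        top = y >= mid_y
        left = x < mid_x
        if top:
            token += "A" if left else "B"
            min_y = mid_y
        else:
            token += "C" if left else "D"
            max_y = mid_y
        # Shrink the horizontal extent to the chosen quadrant.
        if left:
            max_x = mid_x
        else:
            min_x = mid_x
    return token


def build_terms(points, max_level):
    """Build a toy 'terms dictionary': sorted tokens plus postings lists."""
    postings = {}
    for doc_id, (x, y) in enumerate(points):
        postings.setdefault(leaf_token(x, y, max_level), []).append(doc_id)
    return sorted(postings), postings


def docs_in_cell(terms, postings, prefix):
    """Find documents inside a cell by scanning the sorted terms by prefix.

    The tree is implicit: every term sharing `prefix` lies in that cell.
    """
    docs = set()
    for term in terms[bisect_left(terms, prefix):]:
        if not term.startswith(prefix):
            break  # sorted order means no later term can match
        docs.update(postings[term])
    return sorted(docs)


points = [(0.1, 0.9), (0.2, 0.8), (0.9, 0.1), (0.6, 0.6)]
terms, postings = build_terms(points, max_level=2)
```

With these four sample points, `docs_in_cell(terms, postings, "A")` walks only the terms that start with `A`, which is the same set of documents a traversal of the top-left subtree would reach in an explicit quadtree.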

Re: Does StandardTokenizer remove punctuation (in Lucene 4.1)

2014-10-01 Thread Steve Rowe
Paul, Boilerplate upgrade recommendation: consider using the most recent Lucene release (4.10.1) - it’s the most stable, performant, and featureful release available, and many bugs have been fixed since the 4.1 release. FYI, StandardTokenizer doesn’t find word boundaries for Chinese, Japanese,

Re: Solr Replication during Tomcat shutdown causes shutdown to hang/fail

2014-10-01 Thread Phil Black-Knight
I was helping to look into this with Nick, and I think we may have figured out the core of the problem... The problem is easily reproducible by starting replication on the slave and then sending a shutdown command to tomcat (e.g. catalina.sh stop). With a debugger attached, it looks like the

Re: Does StandardTokenizer remove punctuation (in Lucene 4.1)

2014-10-01 Thread Paul Taylor
On 01/10/2014 18:42, Steve Rowe wrote: Paul, Boilerplate upgrade recommendation: consider using the most recent Lucene release (4.10.1) - it’s the most stable, performant, and featureful release available, and many bugs have been fixed since the 4.1 release. Yeah sure, I did try this and hit