[jira] [Created] (SOLR-11306) Solr example schemas inaccurate comments on docValues and StrField

2017-08-31 Thread Tom Burton-West (JIRA)
Tom Burton-West created SOLR-11306: -- Summary: Solr example schemas inaccurate comments on docValues and StrField Key: SOLR-11306 URL: https://issues.apache.org/jira/browse/SOLR-11306 Project: Solr

Error in Solr 6.6 Example schemas re: docValues for StrField type must be single-valued?

2017-08-30 Thread Tom Burton-West
/DocValuesType.html Is the comment in the example schema file completely wrong, or is there some issue with using docValues with a multivalued StrField? Tom Burton-West https://www.hathitrust.org/blogs/large-scale-search

[jira] [Commented] (SOLR-8841) edismax: minimum match and compound words

2016-03-14 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-8841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194276#comment-15194276 ] Tom Burton-West commented on SOLR-8841: --- This looks very similar to the bug that was fixed in Solr

[jira] [Commented] (LUCENE-6828) Speed up requests for many rows

2015-10-07 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947487#comment-14947487 ] Tom Burton-West commented on LUCENE-6828: - Thanks Erick, I plan to add a docValues id field

[jira] [Commented] (LUCENE-6828) Speed up requests for many rows

2015-10-07 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947328#comment-14947328 ] Tom Burton-West commented on LUCENE-6828: - We have a use case where some of our users want set-based

Re: Where Search Meets Machine Learning

2015-05-04 Thread Tom Burton-West
Hi Doug and Joaquin, This is a really interesting discussion. Joaquin, I'm looking forward to taking your code for a test drive. Thank you for making it publicly available. Doug, I'm interested in your pyramid observation. I work with academic search which has some of the problems unique

Solr and non-default minBlockSize/maxBlockSize for PostingsFormat

2015-03-18 Thread Tom Burton-West
Hello, Using Solr 10.10.2 I created a wrapper class plugin that instantiates the Lucene41PostingsFormat with non-default parameters for the minBlockSize and maxBlockSize. I have created a read-only index (i.e. there will never be any updates to this index). I have two questions. 1) I need to

Re: Solr and non-default minBlockSize/maxBlockSize for PostingsFormat

2015-03-18 Thread Tom Burton-West
Sorry, I know Solr 10 won't be released for quite some time, since 5 is the current release... I meant Solr 4.10.2 On Wed, Mar 18, 2015 at 4:11 PM, Tom Burton-West tburt...@umich.edu wrote: Hello, Using Solr 10.10.2 I created a wrapper class plugin that instantiates

Custom PostingsFormat SPILoader issues

2015-03-13 Thread Tom Burton-West
Hello, I'm trying to configure Solr to use a custom Postings Format using the SPILoader. I specified my custom postings format in the schema.xml file: <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" postingsFormat="HTPostingsFormatWrapper"> Then I created a custom

Re: Custom PostingsFormat SPILoader issues

2015-03-13 Thread Tom Burton-West
Thanks Uwe, I'm pretty much going from what Hoss told me in the thread here: http://lucene.472066.n3.nabble.com/How-to-configure-Solr-PostingsFormat-block-size-tt4179029.html All I am really trying to do is instantiate the regular Lucene41PostingsFormat with non-default minTermBlockSize and

Re: Custom PostingsFormat SPILoader issues

2015-03-13 Thread Tom Burton-West
in the jar. I'll try putting an entry in META-INF/services/org.apache.lucene.codecs.PostingsFormat in the jar. Sorry for not reading your message carefully enough before sending a response. Tom On Fri, Mar 13, 2015 at 12:13 PM, Tom Burton-West tburt...@umich.edu wrote: Thanks Uwe, I'm pretty

Re: Custom PostingsFormat SPILoader issues

2015-03-13 Thread Tom Burton-West
Hi Hoss, Thanks for the detailed explanation. This all makes sense now, including the specific error message and my multiple errors. I put the correct org.apache.lucene.codecs.PostingsFormat in the jar, indexed and searched some documents and everything is working fine. I'll push the

[jira] [Commented] (SOLR-7175) <optimize maxSegments="2"/> results in more than 2 segments after optimize finishes

2015-03-06 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350575#comment-14350575 ] Tom Burton-West commented on SOLR-7175: --- Hi Mike, Our code is supposed to completely

[jira] [Comment Edited] (SOLR-7175) <optimize maxSegments="2"/> results in more than 2 segments after optimize finishes

2015-03-06 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350575#comment-14350575 ] Tom Burton-West edited comment on SOLR-7175 at 3/6/15 4:56 PM

[jira] [Commented] (SOLR-7175) <optimize maxSegments="2"/> results in more than 2 segments after optimize finishes

2015-03-06 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350835#comment-14350835 ] Tom Burton-West commented on SOLR-7175: --- Hi Mike, Thanks for taking a look. We found

[jira] [Closed] (SOLR-7175) <optimize maxSegments="2"/> results in more than 2 segments after optimize finishes

2015-03-06 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Burton-West closed SOLR-7175. - Resolution: Not a Problem Problem was in our client code erroneously sending items to Solr

Re: Optimize maxSegments=2 not working right with Solr 4.10.2

2015-03-05 Thread Tom Burton-West
Hello all, We are continuing to see inconsistent behavior with <optimize maxSegments="2"/>. Out of 12 shards, one or two end up with more than 2 segments at the finish of the optimize command (and we see no errors in the logs). So far we have found no consistent pattern in which of the shards

[jira] [Updated] (SOLR-7175) <optimize maxSegments="2"/> results in more than 2 segments after optimize finishes

2015-02-27 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Burton-West updated SOLR-7175: -- Attachment: solr4.shotz solrconfig.xml file <optimize maxSegments="2"/> results in more than 2

[jira] [Updated] (SOLR-7175) <optimize maxSegments="2"/> results in more than 2 segments after optimize finishes

2015-02-27 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Burton-West updated SOLR-7175: -- Attachment: build-1.indexwriterlog.2015-02-23.gz Attached is an indexwriter log where after

[jira] [Created] (SOLR-7175) <optimize maxSegments="2"/> results in more than 2 segments after optimize finishes

2015-02-27 Thread Tom Burton-West (JIRA)
Tom Burton-West created SOLR-7175: - Summary: <optimize maxSegments="2"/> results in more than 2 segments after optimize finishes Key: SOLR-7175 URL: https://issues.apache.org/jira/browse/SOLR-7175

[jira] [Updated] (SOLR-7175) <optimize maxSegments="2"/> results in more than 2 segments after optimize finishes

2015-02-27 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Burton-West updated SOLR-7175: -- Attachment: build-4.iw.2015-02-25.txt.gz Previous file did not have an explicit commit

Fwd: Optimize maxSegments=2 not working right with Solr 4.10.2

2015-02-25 Thread Tom Burton-West
seems the same in that after almost all of the segments are merged, one or two new segments are created when a startFullFlush happens after the big merge. Any suggestions on how to troubleshoot this would be appreciated. Tom -- Forwarded message -- From: Tom Burton-West tburt

Re: Optimize maxSegments=2 not working right with Solr 4.10.2

2015-02-25 Thread Tom Burton-West
2 or 4 (or 10) segments. - Toke Eskildsen From: Tom Burton-West [tburt...@umich.edu] Sent: 25 February 2015 18:11 To: dev@lucene.apache.org Subject: Fwd: Optimize maxSegments=2 not working right with Solr 4.10.2 No replies on the Solr users list, so

[jira] [Commented] (LUCENE-6192) Long overflow in LuceneXXSkipWriter can corrupt skip data

2015-01-26 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292536#comment-14292536 ] Tom Burton-West commented on LUCENE-6192: - Patch works! Thanks Mike! Deployed

OOM errors and indexwriter log

2015-01-08 Thread Tom Burton-West
Hi all, I'm experimenting with memory use in Solr 4.10.2. Our index is currently about 250GB and we have allocated 4GB to Solr. I'm getting OOM errors: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray. (More details

Security hole in Solr 4.10.2 example:Solrconfig turns on enableRemoteStreaming

2014-12-11 Thread Tom Burton-West
Hello, In the released version as well as previous revisions starting at revision 74163 of the example solrconfig.xml file for Solr 4.10.2 enableRemoteStreaming is set to true. Released version (See line 748)

Re: Security hole in Solr 4.10.2 example:Solrconfig turns on enableRemoteStreaming

2014-12-11 Thread Tom Burton-West
Thanks Hoss, Ah, I didn't look at the timestamps on those revisions! Personally, I'd prefer having the default set to false rather than true because people don't always read the entire config file, but if there has been discussion for several years, and it's been decided to leave it enabled in

Performance hit of Solr checkIntegrityAtMerge

2014-12-10 Thread Tom Burton-West
corruption from older segments into new ones, at the expense of slower merging. --> <checkIntegrityAtMerge>false</checkIntegrityAtMerge> Tom Burton-West http://www.hathitrust.org/blogs/Large-scale-Search

Re: Performance hit of Solr checkIntegrityAtMerge

2014-12-10 Thread Tom Burton-West
being merged. so it could be a perf hit for an extremely large merge. In 5.0 the option is removed: we reworked this computation in merging to always have locality and so on, the checking always happens. On Wed, Dec 10, 2014 at 2:51 PM, Tom Burton-West tburt...@umich.edu wrote: Hello all

Re: Performance hit of Solr checkIntegrityAtMerge

2014-12-10 Thread Tom Burton-West
Thanks Robert! Tom Start at SegmentMerger in both places. In 4.10.x you can see how it just validates every part of every reader in a naive loop: https://github.com/apache/lucene-solr/blob/lucene_solr_4_10/lucene/core/src/java/org/apache/lucene/index/SegmentMerger.java#L58 in 5.x it is

Re: queryResultMaxDocsCached vs. queryResultWindowSize

2014-09-29 Thread Tom Burton-West
at 4:38 PM, Tom Burton-West tburt...@umich.edu wrote: Hi Yonik, I'm still confused. I suspect I don't understand how paging and caching interact. I probably need to walk through the code. Is there a unit test that exercises SolrIndexSearcher.getDocListC or a good unit test to use

Re: queryResultMaxDocsCached vs. queryResultWindowSize

2014-09-26 Thread Tom Burton-West
Hi Yonik, I'm still confused. I suspect I don't understand how paging and caching interact. I probably need to walk through the code. Is there a unit test that exercises SolrIndexSearcher.getDocListC or a good unit test to use as a base to write one? Part of what confuses me is whether what

[jira] [Created] (SOLR-6560) Solr example file has outdated termIndexInterval entry

2014-09-24 Thread Tom Burton-West (JIRA)
Tom Burton-West created SOLR-6560: - Summary: Solr example file has outdated termIndexInterval entry Key: SOLR-6560 URL: https://issues.apache.org/jira/browse/SOLR-6560 Project: Solr Issue

[jira] [Updated] (SOLR-6560) Solr example file has outdated termIndexInterval entry

2014-09-24 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Burton-West updated SOLR-6560: -- Attachment: SOLR-6560.patch Patch removes offending lines in example solrconfig.xml Solr

queryResultMaxDocsCached vs. queryResultWindowSize

2014-09-24 Thread Tom Burton-West
Hello, No response on the Solr user list so I thought I would try the dev list. queryResultWindowSize sets the number of documents to cache for each query in the queryResult cache. So if you normally output 10 results per page, and users don't go beyond page 3 of results, you could set

Re:

2014-05-17 Thread Tom Burton-West
at SearchHub. For this kind of problem, it's usually possible to create some sort of specialized custom collector that does something particular. Have a good day! On Sat, May 17, 2014 at 3:01 AM, Tom Burton-West tburt...@umich.edu wrote: Hello all, I'm trying to get relevance scoring information

Solr 4.7 example solrconfig.xml has confusing comments about a security vulnerability

2014-04-09 Thread Tom Burton-West
In /SOLR-5522 the handler configuration code for the admin/fileedit request handler, which would allow modification of Solr config files, was removed from the example solrconfig.xml, but the comments were left in the example file.

Re: Solr 4.7 example solrconfig.xml has confusing comments about a security vulnerability

2014-04-09 Thread Tom Burton-West
Hi Shawn, OK, will do. (but later today, since I have to eat lunch and then go to a meeting). Tom On Wed, Apr 9, 2014 at 1:19 PM, Shawn Heisey s...@elyograg.org wrote: On 4/9/2014 11:13 AM, Tom Burton-West wrote: In /SOLR-5522 the handler configuration code for the admin/fileedit

[jira] [Created] (SOLR-5978) Warning for SOLR-5522 (file/edit) should be removed from example solrconfig.xml

2014-04-09 Thread Tom Burton-West (JIRA)
Tom Burton-West created SOLR-5978: - Summary: Warning for SOLR-5522 (file/edit) should be removed from example solrconfig.xml Key: SOLR-5978 URL: https://issues.apache.org/jira/browse/SOLR-5978

[jira] [Updated] (SOLR-5978) Warning for SOLR-5522 (file/edit) should be removed from example solrconfig.xml

2014-04-09 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Burton-West updated SOLR-5978: -- Attachment: SOLR-5522.patch Patch to example solrconfig.xml removes confusing comment Warning

Solr Block-Join requires uniqueKey field to be int?

2014-03-04 Thread Tom Burton-West
Hello all, We have been using strings for our uniqueKey field and discovered that Solr Block-Join requires the uniqueKey field to be an int. This is because the magic field _root_ is required to be an int, and for children it gets populated from the uniqueKey field of the parent record. Would

Re: Solr Block-Join requires uniqueKey field to be int?

2014-03-04 Thread Tom Burton-West
Thanks Yonik, It works fine with a String. How embarrassing. Somehow I managed to accidentally set _root_ to an int in my schema. Don't know how I did it. Tom On Tue, Mar 4, 2014 at 11:56 AM, Yonik Seeley yo...@heliosearch.com wrote: On Tue, Mar 4, 2014 at 11:51 AM, Tom Burton-West tburt

Re: Trade-offs in choosing DocValuesFormat

2014-02-01 Thread Tom Burton-West
Thanks Shawn, Joel, and Robert, Shawn, thanks for mentioning the caveat of having to re-index when upgrading Solr. We almost always re-index when we upgrade Solr. There is a ton of misinformation in this thread. I think this might be because the DocValues implementation is a moving target, and

Trade-offs in choosing DocValuesFormat

2014-01-31 Thread Tom Burton-West
are appended below. My apologies if this question should go to Lucene user instead of dev. If it should, please let me know and also let me know how I can tell which list to ask. Tom Burton-West -- The documentation on the Solr wiki seems

Re: Estimating peak memory use for UnInvertedField faceting

2013-11-11 Thread Tom Burton-West
to monitor your Solr cluster, at least while you are tuning it. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Fri, Nov 8, 2013 at 2:41 PM, Tom Burton-West tburt...@umich.edu wrote: Hi Yonik, I don't know enough about

Estimating peak memory use for UnInvertedField faceting

2013-11-08 Thread Tom Burton-West
We are considering indexing our 11 million books at a page level, which comes to about 3 billion Solr documents. Our subject field by necessity is multi-valued so the UnInvertedField is used for faceting. When testing an index of about 200 million documents, when we do a first faceting on one

Re: Estimating peak memory use for UnInvertedField faceting

2013-11-08 Thread Tom Burton-West
, Yonik Seeley yo...@heliosearch.com wrote: On Fri, Nov 8, 2013 at 1:56 PM, Tom Burton-West tburt...@umich.edu wrote: When testing an index of about 200 million documents, when we do a first faceting on one field (query appended below), the memory use rises from about 2.5 GB to 13GB. If I run

Lucene40TermVectorsReader TVTermsEnum totalTermFreq() is not a total

2013-10-25 Thread Tom Burton-West
Hi all, I was reading some code that calls Lucene40TermVectorsReader TVTermsEnum. The method totalTermFreq() actually returns freq and the method docFreq() returns 1. Once you think about the context this sort of makes sense, but I found this confusing. I'm guessing there is a good reason for the

Is it possible to correct the Changes list re: Block-Join available in 4.4 vs 4.5

2013-10-07 Thread Tom Burton-West
Hello, The JIRA issue SOLR-3076 includes a note stating that Solr Block-Join capability was mistakenly listed in the 4.4 section of CHANGES.txt, but is first available in Solr 4.5. Is it possible to revise CHANGES.txt so that people looking at the changes for 4.5 will realize that Block-Join

Luceneutil high variability between runs

2013-08-16 Thread Tom Burton-West
Hello, I'm trying to benchmark a change to BM25Similarity (LUCENE-5175) using luceneutil. I'm running this on a lightly loaded machine with a load average (top) of about 0.01 when the benchmark is not running. I made the following changes: 1) localrun.py changed Competition(debug=True) to

[jira] [Commented] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)

2013-08-15 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741582#comment-13741582 ] Tom Burton-West commented on LUCENE-5175: - Hi Robert, I tried running

[jira] [Commented] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)

2013-08-14 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740186#comment-13740186 ] Tom Burton-West commented on LUCENE-5175: - I downloaded the luceneutils benchmark

[jira] [Comment Edited] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)

2013-08-14 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740186#comment-13740186 ] Tom Burton-West edited comment on LUCENE-5175 at 8/14/13 9:09 PM

[jira] [Issue Comment Deleted] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)

2013-08-14 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Burton-West updated LUCENE-5175: Comment: was deleted (was: I downloaded the luceneutils benchmark suite and the enwiki

[jira] [Created] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)

2013-08-13 Thread Tom Burton-West (JIRA)
Tom Burton-West created LUCENE-5175: --- Summary: Add parameter to lower-bound TF normalization for BM25 (for long documents) Key: LUCENE-5175 URL: https://issues.apache.org/jira/browse/LUCENE-5175

[jira] [Updated] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)

2013-08-13 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Burton-West updated LUCENE-5175: Attachment: LUCENE-5175.patch Patch adds optional parameter delta to lower-bound tf

[jira] [Commented] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)

2013-08-13 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738627#comment-13738627 ] Tom Burton-West commented on LUCENE-5175: - Thanks Robert, In the article

[jira] [Commented] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)

2013-08-13 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738818#comment-13738818 ] Tom Burton-West commented on LUCENE-5175: - I wondered about that crazy cache

[jira] [Commented] (SOLR-4763) Performance issue when using group.facet=true

2013-07-26 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720994#comment-13720994 ] Tom Burton-West commented on SOLR-4763: --- I have similar problems with performance

[jira] [Created] (LUCENE-5065) Refactor TestGrouping.java to break TestRandom into separate tests

2013-06-18 Thread Tom Burton-West (JIRA)
Tom Burton-West created LUCENE-5065: --- Summary: Refactor TestGrouping.java to break TestRandom into separate tests Key: LUCENE-5065 URL: https://issues.apache.org/jira/browse/LUCENE-5065 Project

[jira] [Created] (SOLR-4938) Solr should be able to use Lucene's BlockGroupingCollector for field-collapsing

2013-06-18 Thread Tom Burton-West (JIRA)
Tom Burton-West created SOLR-4938: - Summary: Solr should be able to use Lucene's BlockGroupingCollector for field-collapsing Key: SOLR-4938 URL: https://issues.apache.org/jira/browse/SOLR-4938

Solr field-collapsing should be able to use Lucene's BlockGroupingCollector

2013-06-11 Thread Tom Burton-West
In Lucene it is possible to use the BlockGroupingCollector for grouping in order to take advantage of indexing document blocks (IndexWriter.addDocuments(): http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/index/IndexWriter.html?is-external=true#addDocuments%28java.lang.Iterable%29).

[jira] [Updated] (SOLR-3076) Solr should support block joins

2013-06-11 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Burton-West updated SOLR-3076: -- Attachment: SOLR-3076.patch Patch against trunk (SVN style, patch -p0) that adds testXML

Re: Documentation for Solr/Lucene 4.x, termIndexInterval and limitations of Lucene File format

2013-06-05 Thread Tom Burton-West
, Tom Burton-West tburt...@umich.edu wrote: Thanks Mike. I'm running CheckIndex on the 2TB index right now. Hopefully it will finish running by tomorrow. I'll send you a copy of the output. Tom On Mon, Jun 3, 2013 at 9:04 PM, Michael McCandless luc...@mikemccandless.com wrote: Hi

Re: Documentation for Solr/Lucene 4.x, termIndexInterval and limitations of Lucene File format

2013-06-04 Thread Tom Burton-West
Thanks Mike. I'm running CheckIndex on the 2TB index right now. Hopefully it will finish running by tomorrow. I'll send you a copy of the output. Tom On Mon, Jun 3, 2013 at 9:04 PM, Michael McCandless luc...@mikemccandless.com wrote: Hi Tom, On Mon, Jun 3, 2013 at 12:11 PM, Tom Burton

Documentation for Solr/Lucene 4.x, termIndexInterval and limitations of Lucene File format

2013-06-03 Thread Tom Burton-West
Hello, The current documentation for Lucene 4.3 file formats says: When referring to term numbers, Lucene's current implementation uses a Java int to hold the term index, which means the maximum number of unique terms in any single index segment is ~2.1 billion times the term index interval
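
The limit quoted from the file format documentation is simple arithmetic, sketched here in Python as a rough check. The function name is made up for illustration, and the interval of 128 is an assumption (the default termIndexInterval of older Lucene releases), not something stated in the message above.

```python
# Hypothetical helper: upper bound on unique terms per segment when the
# term index is addressed with a signed 32-bit Java int.
JAVA_INT_MAX = 2**31 - 1  # 2,147,483,647, the "~2.1 billion" in the docs


def max_unique_terms(term_index_interval):
    """Each term index entry covers `term_index_interval` terms."""
    if term_index_interval < 1:
        raise ValueError("term index interval must be positive")
    return JAVA_INT_MAX * term_index_interval


if __name__ == "__main__":
    # 128 is assumed as the interval; substitute the configured value.
    print(max_unique_terms(128))
```

Under that assumed interval the bound works out to roughly 275 billion unique terms per segment; a smaller interval lowers the ceiling proportionally.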

SOLR-3076 and IndexWriter.addDocuments()

2013-05-20 Thread Tom Burton-West
My understanding of Lucene Block-Join indexing is that at some point IndexWriter.addDocuments() or IndexWriter.updateDocuments() need to be called to actually write a block of documents to disk. I'm trying to understand how SOLR-3076 (Solr should support block joins), works and haven't been

Re: SOLR-3076 and IndexWriter.addDocuments()

2013-05-20 Thread Tom Burton-West
Found it. In AddBlockUpdateTest.testSmallBlockDirect assertEquals(2, h.getCore().getUpdateHandler().addBlock(cmd)); and in the patched code DirectUpdateHandler2.addBlock() Tom On Mon, May 20, 2013 at 5:49 PM, Tom Burton-West tburt...@umich.edu wrote: My understanding of Lucene Block-Join

[jira] [Updated] (SOLR-3076) Solr should support block joins

2013-05-17 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Burton-West updated SOLR-3076: -- Attachment: SOLR-7036-childDocs-solr-fork-trunk-patched Thanks Vadim, I haven't used SolrJ, so

Re: Help working with patch for SOLR-3076 (Block Joins)

2013-05-16 Thread Tom Burton-West
Thanks Shawn and Vadim, I'll try the July patch against r1351040 of 4_x for now. Vadim, I'm in no hurry, but I'll watch 3076 for your patch and work with that when you post it. Tom On Thu, May 16, 2013 at 2:14 AM, Vadim Kirilchuk vkirilc...@griddynamics.com wrote: Hi, As far as I know,

[jira] [Commented] (SOLR-3076) Solr should support block joins

2013-05-16 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660124#comment-13660124 ] Tom Burton-West commented on SOLR-3076: --- I'd like to test this out with some real

Help working with patch for SOLR-3076 (Block Joins)

2013-05-15 Thread Tom Burton-West
Hello, I would like to build Solr with the July 12th SOLR-3076 patch. How do I determine what version/revision of Solr I need to check out to build this patch against? I tried using the latest branch_4x and got a bunch of errors. I suspect I need an earlier revision or maybe trunk. This

Re: Ability to specify 2 different query analyzers for same indexed field in Solr

2013-03-07 Thread Tom Burton-West
to configure different dictionaries for each field, but that could be added. Also see SOLR-4381 for eventual inclusion in Solr. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 5 March 2013 at 17:26, Tom Burton-West tburt

Re: Ability to specify 2 different query analyzers for same indexed field in Solr

2013-03-05 Thread Tom Burton-West
value for the field and then have one or more copyfields that index the same source data differently, but don’t re-store the copied source data. -- Jack Krupansky *From:* Tom Burton-West tburt...@umich.edu *Sent:* Monday, March 04, 2013 3:57 PM *To:* dev@lucene.apache.org *Subject:* Ability

Ability to specify 2 different query analyzers for same indexed field in Solr

2013-03-04 Thread Tom Burton-West
towards the part of the code I might need to modify? Tom Tom Burton-West Information Retrieval Programmer Digital Library Production Service University of Michigan Library http://www.hathitrust.org/blogs/large-scale-search

default updateLog setting in Solr 4 example solrconfig.xml needs warning documentation for possible very large logs

2013-01-15 Thread Tom Burton-West
Hello all, We have been using Solr 4.0 for a while and suddenly we couldn't get Solr to come up. As Solr was starting up it hung after opening a Searcher. There wasn't anything else obvious in the logs. Eventually we realized that the problem was that the updatelog was being read and that the

[jira] [Commented] (LUCENE-2187) improve lucene's similarity algorithm defaults

2013-01-04 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543985#comment-13543985 ] Tom Burton-West commented on LUCENE-2187: - Hi Robert, Is this implementation

Re: VOTE: release 3.6.2

2012-12-19 Thread Tom Burton-West
Hi Robert, Would it be possible to also fold in LUCENE-4286? I don't see the 3.6 backport listed in the JIRA issue, but it would be nice to have that flag available for people still on the 3.6.x branch. Tom On Wed, Dec 19, 2012 at 3:46 PM, Michael McCandless luc...@mikemccandless.com wrote:

Re: VOTE: release 3.6.2

2012-12-19 Thread Tom Burton-West
Thanks Robert, Ok, I can see that logic. People who want the new feature can just apply the patch. Tom On Wed, Dec 19, 2012 at 5:53 PM, Robert Muir rcm...@gmail.com wrote: On Wed, Dec 19, 2012 at 5:50 PM, Tom Burton-West tburt...@umich.edu wrote: Hi Robert, Would it be possible

[jira] [Updated] (LUCENE-4286) Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams

2012-11-29 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Burton-West updated LUCENE-4286: Attachment: LUCENE-4286.patch_3.x We are still using Solr 3.6 in production so I

[jira] [Commented] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token

2012-11-07 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492487#comment-13492487 ] Tom Burton-West commented on SOLR-3589: --- Forgot to work from your latest patch

[jira] [Updated] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token

2012-11-07 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Burton-West updated SOLR-3589: -- Attachment: SOLR-3589-3.6.PATCH Backport to 3.6 r1406713. Includes synonyms test. Will test

[jira] [Commented] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token

2012-11-07 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492591#comment-13492591 ] Tom Burton-West commented on SOLR-3589: --- Hi Robert, I just put the backport to 3.6

[jira] [Updated] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token

2012-11-06 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Burton-West updated SOLR-3589: -- Attachment: SOLR-3589.patch Back-port to 3.6 branch Edismax parser does

[jira] [Commented] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token

2012-11-06 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491922#comment-13491922 ] Tom Burton-West commented on SOLR-3589: --- I back-ported to 3.6 branch. Forgot

[jira] [Created] (SOLR-4023) Solr query parser does not correctly handle Boolean precedence

2012-10-31 Thread Tom Burton-West (JIRA)
Tom Burton-West created SOLR-4023: - Summary: Solr query parser does not correctly handle Boolean precedence Key: SOLR-4023 URL: https://issues.apache.org/jira/browse/SOLR-4023 Project: Solr

Solr 4.0 Beta Documentation issues: Is it mandatory with 4.0 to run at least one core? /example/solr/README.txt needs updating

2012-08-23 Thread Tom Burton-West
Hello, The CoreAdmin wiki page (http://wiki.apache.org/solr/CoreAdmin) implies that setting up at least one core is not mandatory and neither is solr.xml. However when trying to migrate from 3.6 to 4.0 beta, I got a message in the admin console: There are no SolrCores running — for the current

[jira] [Created] (SOLR-3753) Core admin and solr.xml documentation for 4.0 needs to be updated for 4.0 changes

2012-08-23 Thread Tom Burton-West (JIRA)
Tom Burton-West created SOLR-3753: - Summary: Core admin and solr.xml documentation for 4.0 needs to be updated for 4.0 changes Key: SOLR-3753 URL: https://issues.apache.org/jira/browse/SOLR-3753

[jira] [Updated] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token

2012-08-23 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Burton-West updated SOLR-3589: -- Affects Version/s: 4.0-BETA Edismax parser does not honor mm parameter if analyzer splits

[jira] [Commented] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token

2012-08-23 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440583#comment-13440583 ] Tom Burton-West commented on SOLR-3589: --- Just repeated tests in Solr 4.0Beta

[jira] [Updated] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token

2012-08-23 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Burton-West updated SOLR-3589: -- Attachment: testSolr3589.xml.gz File is gzipped. Unix line endings. Put document in solr

[jira] [Commented] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token

2012-08-23 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440669#comment-13440669 ] Tom Burton-West commented on SOLR-3589: --- I'm not at the point where I understand

[jira] [Updated] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token

2012-08-23 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Burton-West updated SOLR-3589: -- Attachment: testSolr3589.xml.gz See above note Edismax parser does not honor

Re: [jira] [Commented] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token

2012-08-17 Thread Tom Burton-West
I just wanted to mention that there is not only a problem with mm=100% but also with other values of mm where the number of tokens resulting from splitting CJK or otherwise is within an mm limit. For example with mm=2 the query [fire fly] and query [fire-fly] which with WDF gets split into two

Re: [jira] [Commented] (SOLR-3723) Improve OOTB behavior: English word-splitting should default to autoGeneratePhraseQueries=true

2012-08-09 Thread Tom Burton-West
to this but gave up too soon. See: http://lucene.472066.n3.nabble.com/autoGeneratePhraseQueries-sort-of-silently-set-to-false-tc3770638.html Tom Burton-West On Thu, Aug 9, 2012 at 1:25 PM, Yonik Seeley (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/SOLR-3723?page

[jira] [Commented] (LUCENE-4286) Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams

2012-08-08 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431297#comment-13431297 ] Tom Burton-West commented on LUCENE-4286: - Thanks Robert for all your work on non

[jira] [Commented] (LUCENE-4286) Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams

2012-08-06 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429217#comment-13429217 ] Tom Burton-West commented on LUCENE-4286: - We haven't had a request

Add flag to CJKBigramFilter to also output unigrams (Single character Han queries)

2012-08-03 Thread Tom Burton-West
that gets output) would take care of queries containing a single Han unigram. This is somewhat analogous to the flags in LUCENE-1370 for the ShingleFilter. If this makes sense I'll open a JIRA issue. Tom Burton-West

[jira] [Updated] (LUCENE-4286) Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams

2012-08-03 Thread Tom Burton-West (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Burton-West updated LUCENE-4286: Summary: Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams

[jira] [Created] (LUCENE-4286) Add flag to CJKBigramFilter to allow indexing unigrams as well is bigrams

2012-08-03 Thread Tom Burton-West (JIRA)
Tom Burton-West created LUCENE-4286: --- Summary: Add flag to CJKBigramFilter to allow indexing unigrams as well is bigrams Key: LUCENE-4286 URL: https://issues.apache.org/jira/browse/LUCENE-4286
