ant clean test failing

2015-10-24 Thread William Bell
It is getting stuck on resolve.
ant clean test
SOLR 5.3.1
[ivy:retrieve] retrieve done (5ms)
Overriding previous definition of property "ivy.version"
[ivy:retrieve] no resolved descriptor found: launching default resolve
Overriding previous definition of property "ivy.version"

Re: Does docValues impact termfreq ?

2015-10-24 Thread Aki Balogh
Gotcha - that's disheartening. One idea: when I run termfreq, I get all of the termfreqs for each document one-by-one. Is there a way to have solr sum it up before creating the request, so I only receive one number in the response?
On Sat, Oct 24, 2015 at 11:05 AM, Upayavira

Re: Does docValues impact termfreq ?

2015-10-24 Thread Aki Balogh
Hi Jack, I'm just using solr to get word count across a large number of documents. It's somewhat non-standard, because we're ignoring relevance, but it seems to work well for this use case otherwise. My understanding then is: 1) since termfreq is pre-processed and fetched, there's no good way

Re: Does docValues impact termfreq ?

2015-10-24 Thread Upayavira
If you just want word length, then do work during indexing - index a field for the word length. Then, I believe you can do faceting - e.g. with the json faceting API I believe you can do a sum() calculation on a field rather than the more traditional count. Thinking aloud, there might be an
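Upayavira's faceting idea can be sketched as query parameters for an ordinary /select request. This is a minimal sketch, assuming the JSON Facet API (introduced in the Solr 5.x line) and a hypothetical integer field `word_count` filled at index time. The optional `termfreq` variant additionally assumes that `sum()` accepts a function query as its argument; the field name `text` and the term are placeholders. `rows=0` keeps per-document results out of the response, so only the aggregate comes back.

```python
import json
from urllib.parse import urlencode


def build_sum_request(count_field, term_field=None, term=None):
    """Build /select parameters that ask Solr for one corpus-wide sum.

    count_field: hypothetical numeric field populated at index time.
    term_field/term: optional; also sums termfreq(term_field, term), assuming
    the JSON Facet API accepts a function query inside sum().
    """
    facets = {"total_words": f"sum({count_field})"}
    if term_field and term:
        facets["total_term"] = f"sum(termfreq({term_field},'{term}'))"
    return {
        "q": "*:*",                     # match every document in the corpus
        "rows": 0,                      # return no documents, only the facet block
        "json.facet": json.dumps(facets),
    }


# Print the encoded query string for inspection.
print(urlencode(build_sum_request("word_count", "text", "solr")))
```

The response would then carry a single number per aggregation under its `facets` section instead of one value per document.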

Re: Does docValues impact termfreq ?

2015-10-24 Thread Aki Balogh
Thanks, let me think about that. We're using termfreq to get the TF score, but we don't know which term we'll need the TF for. So we'd have to do a corpus-wide summing of termfreq for each potential term across all documents in the corpus. It seems like it'd require some development work to

Re: Does docValues impact termfreq ?

2015-10-24 Thread Upayavira
If you mean using the term frequency function query, then I'm not sure there's a huge amount you can do to improve performance. The term frequency is a number that is used often, so it is stored in the index pre-calculated. Perhaps, if your data is not changing, optimising your index would reduce
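If the data is static, an optimize (force-merge) can be triggered through the update handler's `optimize` and `maxSegments` request parameters. A minimal sketch that only builds the URL; the base URL and the core name `corpus` are hypothetical placeholders, and whether fewer segments measurably speeds up termfreq depends on the index.

```python
from urllib.parse import urlencode


def optimize_url(base_url, core, max_segments=1):
    """Return an update URL asking Solr to merge a core down to max_segments."""
    params = {"optimize": "true", "maxSegments": max_segments}
    return f"{base_url.rstrip('/')}/{core}/update?{urlencode(params)}"


print(optimize_url("http://localhost:8983/solr", "corpus"))
# http://localhost:8983/solr/corpus/update?optimize=true&maxSegments=1
```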

Re: Order of actions in Update request

2015-10-24 Thread Joseph Hammerman
Hi Jamie!
On Sat, Oct 24, 2015 at 7:21 AM, Jamie Johnson wrote:
> Looking at the code and jira I see that ordering actions in solrj update
> request is currently not supported but I'd like to know if there is any
> other way to get this capability. I took a quick look at the

Re: Does docValues impact termfreq ?

2015-10-24 Thread Aki Balogh
Thanks, Jack. I did some more research and found similar results. In our application, we are making multiple (think: 50) concurrent requests to calculate term frequency on a set of documents in "real-time". The faster that results return, the better. Most of these requests are unique, so cache

Re: Does docValues impact termfreq ?

2015-10-24 Thread Jack Krupansky
That's what a normal query does - Lucene takes all the terms used in the query and sums them up for each document in the response, producing a single number, the score, for each document. That's the way Solr is designed to be used. You still haven't elaborated why you are trying to use Solr in a

Re: Does docValues impact termfreq ?

2015-10-24 Thread Aki Balogh
Certainly, yes. I'm just doing a word count, i.e., how often does a specific term come up in the corpus?
On Oct 24, 2015 4:20 PM, "Upayavira" wrote:
> yes, but what do you want to do with the TF? What problem are you
> solving with it? If you are able to share that...
>
> On Sat,

Re: Order of actions in Update request

2015-10-24 Thread Shawn Heisey
On 10/24/2015 5:21 AM, Jamie Johnson wrote:
> Looking at the code and jira I see that ordering actions in solrj update
> request is currently not supported but I'd like to know if there is any
> other way to get this capability. I took a quick look at the XML loader
> and it appears to process

Re: Does docValues impact termfreq ?

2015-10-24 Thread Aki Balogh
Yes, sorry, I am not being clear. We are not even doing scoring, just getting the raw TF values. We're doing this in solr because it can scale well. But with large corpora, retrieving the word counts takes some time, in part because solr is splitting up word count by document and generating a

Re: ant clean test failing

2015-10-24 Thread Erick Erickson
I've been seeing this happen a lot lately, it seems like a series of lock files are left around under some conditions. I've also incorporated some of Mark Miller's suggestions, but perhaps one of my upgrades undid that work. I've found it much less painful to remove all the *.lck files, I don't
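Erick's workaround of removing the leftover `*.lck` files can be scripted. A minimal sketch, assuming the stale locks live somewhere under the Ivy cache (for example `~/.ivy2`, the directory William later removed wholesale); the function name is hypothetical, and it deletes only files ending in `.lck`, leaving the rest of the cache intact.

```python
from pathlib import Path


def remove_lock_files(root):
    """Delete every *.lck file below root and return the paths removed."""
    # Materialize the list first so the directory walk is not affected by deletions.
    candidates = list(Path(root).expanduser().rglob("*.lck"))
    removed = []
    for lock in candidates:
        if lock.is_file():
            lock.unlink()
            removed.append(lock)
    return removed
```

Calling `remove_lock_files("~/.ivy2")` before rerunning `ant clean test` is less drastic than deleting the whole cache, since the downloaded jars do not have to be fetched again.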

Re: any clean test failing

2015-10-24 Thread William Bell
OK I deleted /home/solr/.ivy2 and it started working.
On Sat, Oct 24, 2015 at 11:57 AM, William Bell wrote:
> It is getting stuck on resolve.
>
> ant clean test
>
> SOLR 5.3.1
>
> [ivy:retrieve] retrieve done (5ms)
>
> Overriding previous definition of property

Re: Does docValues impact termfreq ?

2015-10-24 Thread Upayavira
Can you explain more what you are using TF for? Because it sounds rather like scoring. You could disable field norms and IDF and scoring would be mostly TF, no?
Upayavira
On Sat, Oct 24, 2015, at 07:28 PM, Aki Balogh wrote:
> Thanks, let me think about that.
>
> We're using termfreq to get the

Re: Does docValues impact termfreq ?

2015-10-24 Thread Upayavira
yes, but what do you want to do with the TF? What problem are you solving with it? If you are able to share that...
On Sat, Oct 24, 2015, at 09:05 PM, Aki Balogh wrote:
> Yes, sorry, I am not being clear.
>
> We are not even doing scoring, just getting the raw TF values. We're
> doing this in

Using the ExtractRequestHandler

2015-10-24 Thread Salonee Rege
Hi,
We are using Solr and need help with the ExtractRequestHandler: we cannot decide which input parameters we need to specify. Kindly help.
*Salonee Rege*
USC Viterbi School of Engineering
University of Southern California
Master of Computer Science - Student
Computer Science - B.E
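A typical ExtractRequestHandler (Solr Cell) call combines a few documented request parameters: `literal.<field>` sets a field value directly, `fmap.<source>` renames an extracted field, `uprefix` prefixes extracted fields the schema does not know, and `commit` commits after indexing. A minimal sketch that builds such a URL; the core name, the target field `text`, and the `attr_` prefix (which needs a matching dynamic field in the schema) are assumptions to adapt.

```python
from urllib.parse import urlencode


def extract_url(base_url, core, doc_id, commit=True):
    """Build a /update/extract URL for indexing one rich document."""
    params = {
        "literal.id": doc_id,          # value for the unique key field
        "fmap.content": "text",        # map extracted body text to a field named 'text' (assumed)
        "uprefix": "attr_",            # prefix for unknown metadata fields (assumes an attr_* dynamic field)
        "commit": "true" if commit else "false",
    }
    return f"{base_url.rstrip('/')}/{core}/update/extract?{urlencode(params)}"
```

The file itself travels as the request body (for example a multipart POST), so only the parameters above need choosing.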

Re: EdgeNGramFilterFactory for Chinese characters

2015-10-24 Thread Tomoko Uchida
> I have rich-text documents that are in both English and Chinese, and
> currently I have EdgeNGramFilterFactory enabled during indexing, as I need
> it for partial matching for English words. But this means it will also
> break up each of the Chinese characters into different tokens.
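To see why the filter also affects Chinese text, it helps to model what an edge n-gram filter emits for a single token: every prefix from `minGram` to `maxGram` characters. A minimal pure-Python model, not Lucene's implementation; `min_gram=1` and `max_gram=3` are assumed values, and the filter's other options are not modeled.

```python
def edge_ngrams(token, min_gram=1, max_gram=3):
    """Return the front-anchored n-grams an edge n-gram filter emits for one token."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


print(edge_ngrams("search"))    # ['s', 'se', 'sea']
print(edge_ngrams("搜索引擎"))   # ['搜', '搜索', '搜索引']
```

The filter has no notion of language, so a Chinese token receives the same prefix treatment as an English word; keeping the two languages in separately analyzed fields is one way to limit n-gramming to English.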

RE: DIH Caching with Delta Import

2015-10-24 Thread Todd Long
Dyer, James-2 wrote
> The DIH Cache feature does not work with delta import. Actually, much of
> DIH does not work with delta import. The workaround you describe is
> similar to the approach described here:
> https://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport ,
> which in my
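The delta-via-full-import workaround runs `command=full-import` with `clean=false`, so the existing index is not wiped, and lets the entity query itself restrict rows (for example by comparing a modification column to `${dataimporter.last_index_time}` when clean is false). A minimal sketch of the triggering request only; the base URL, the core name `items`, and the `/dataimport` handler path are assumptions, and the query logic stays in data-config.xml.

```python
from urllib.parse import urlencode


def delta_via_full_import_url(base_url, core):
    """Build a DIH request that runs full-import without deleting existing documents."""
    params = {
        "command": "full-import",   # full-import path, so caching features still apply
        "clean": "false",           # keep documents already in the index
        "commit": "true",
    }
    return f"{base_url.rstrip('/')}/{core}/dataimport?{urlencode(params)}"
```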

Re: Lucene/Solr Git Mirrors 5 day lag behind SVN?

2015-10-24 Thread Michael McCandless
I added a comment on the INFRA issue. I don't understand why it periodically "gets stuck".
Mike McCandless
http://blog.mikemccandless.com
On Fri, Oct 23, 2015 at 11:27 AM, Kevin Risden wrote:
> It looks like both Apache Git mirror

Order of actions in Update request

2015-10-24 Thread Jamie Johnson
Looking at the code and jira I see that ordering actions in solrj update request is currently not supported but I'd like to know if there is any other way to get this capability. I took a quick look at the XML loader and it appears to process actions as it sees them so if the order was changed to
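Following Jamie's observation that the XML loader processes actions as it encounters them, one option is to send a single XML message whose commands are already in the desired order. A minimal sketch, assuming the loader accepts several commands wrapped in an `<update>` root element and executes them in document order (worth verifying against your Solr version); the function name and the action-tuple format are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_ordered_update(actions):
    """Build one XML update message whose commands appear in the given order.

    actions: list of ("add", {field: value}) or ("delete", doc_id) tuples.
    """
    root = ET.Element("update")
    for kind, payload in actions:
        if kind == "add":
            doc = ET.SubElement(ET.SubElement(root, "add"), "doc")
            for name, value in payload.items():
                ET.SubElement(doc, "field", name=name).text = str(value)
        elif kind == "delete":
            ET.SubElement(ET.SubElement(root, "delete"), "id").text = str(payload)
        else:
            raise ValueError(f"unsupported action: {kind}")
    return ET.tostring(root, encoding="unicode")
```

For example, `build_ordered_update([("delete", "1"), ("add", {"id": "1", "title": "new"})])` places the delete before the add in the serialized message.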