Re: Can termfreq count stemmed forms of terms?

2016-03-08 Thread Aki Balogh
"end-us" and"experience" -> "experi") before being passed, i.e. termfreq(body, "end-usexperi").* Akos (Aki) Balogh Co-Founder / Chief Product Officer https://www.MarketMuse.com On Tue, Mar 8, 2016 at 1:14 PM, Aki Balogh wrote: > Hi All, > > We&#x

Can termfreq count stemmed forms of terms?

2016-03-08 Thread Aki Balogh
Hi All, We're using solr termfreq to count raw term frequencies (i.e. the tf in tf-idf). This works fine on a regular text field. However, we have a field where we've added snowball stemmer. Should termfreq also work on a stemmed field? Right now, we're only getting data back on terms where th

Re: How to assign a hash range via zookeeper?

2016-02-06 Thread Aki Balogh
Shawn - this worked great! I reloaded the collection, the nodes recovered and it seems to work great now. Thank you, Aki On Thu, Feb 4, 2016 at 7:10 PM, Shawn Heisey wrote: > On 2/4/2016 2:12 PM, Aki Balogh wrote: > > I found the state.json file and it indeed shows that the range f

Re: How to assign a hash range via zookeeper?

2016-02-05 Thread Aki Balogh
Thank you Shawn. I will try this and will get back to you if I run into any issues. Aki On Thu, Feb 4, 2016 at 7:10 PM, Shawn Heisey wrote: > On 2/4/2016 2:12 PM, Aki Balogh wrote: > > I found the state.json file and it indeed shows that the range for shard1 > > is null. >

Re: How to assign a hash range via zookeeper?

2016-02-05 Thread Aki Balogh
on with the implicit > router. What was the command you used to create the collection? > > Best, > Erick > > On Thu, Feb 4, 2016 at 1:21 PM, Aki Balogh wrote: > > I'm not sure how these hash ranges were determined, so I'm not sure if I > > should be manually set

Re: How to assign a hash range via zookeeper?

2016-02-04 Thread Aki Balogh
I'm not sure how these hash ranges were determined, so I'm not sure if I should be manually setting them or somehow allowing solr to pick them for this shard. Thanks, Aki On Thu, Feb 4, 2016 at 4:12 PM, Aki Balogh wrote: > Shawn, > > Thanks - this is very helpful. > >

Re: How to assign a hash range via zookeeper?

2016-02-04 Thread Aki Balogh
, Shawn Heisey wrote: > On 2/4/2016 1:37 PM, Aki Balogh wrote: > > Specifically, they suggest getting clusterstate.json. But I've tried > that > > and when I get that file, I only get an empty file {} > > > > Is there another way to ask Zookeeper to cover the missi

Re: How to assign a hash range via zookeeper?

2016-02-04 Thread Aki Balogh
PS - confirmed: in the GUI, I go to Admin->Cloud->Tree, click on clusterstate.json and it's empty {} On Thu, Feb 4, 2016 at 3:37 PM, Aki Balogh wrote: > One of our shards went down. We brought it back up but it doesn't have a > hash r

How to assign a hash range via zookeeper?

2016-02-04 Thread Aki Balogh
One of our shards went down. We brought it back up but it doesn't have a hash range: active marketmuse_shard1_replica1 http://172.30.0.254:8080/solr 172.30.0.254:8080_solr active active marketmuse_shard1_replica2 172.30.0.89:8080_solr http://172.30.0.89:8080/solr true This results in

Re: Does docValues impact termfreq ?

2015-10-26 Thread Aki Balogh
s and sum up total number of > occurrences of specific term in index? Is this only way you use index or > this is side functionality? > > Thanks, > Emir > > > On 24.10.2015 22:28, Aki Balogh wrote: > >> Certainly, yes. I'm just doing a word count, ie how often do

Re: Does docValues impact termfreq ?

2015-10-24 Thread Aki Balogh
> On Sat, Oct 24, 2015, at 09:05 PM, Aki Balogh wrote: > > Yes, sorry, I am not being clear. > > > > We are not even doing scoring, just getting the raw TF values. We're > > doing > > this in solr because it can scale well. > > > > But with large corpora

Re: Does docValues impact termfreq ?

2015-10-24 Thread Aki Balogh
ms and IDF and scoring would be > mostly TF, no? > > Upayavira > > On Sat, Oct 24, 2015, at 07:28 PM, Aki Balogh wrote: > > Thanks, let me think about that. > > > > We're using termfreq to get the TF score, but we don't know which term > > we'll

Re: Does docValues impact termfreq ?

2015-10-24 Thread Aki Balogh
at is the > same for all documents, and facet on it. Instead of counting the number > of documents, calculate the sum() of your word count field. > > I *think* that should work. > > Upayavira > > On Sat, Oct 24, 2015, at 04:24 PM, Aki Balogh wrote: > > Hi Jack, >

Re: Does docValues impact termfreq ?

2015-10-24 Thread Aki Balogh
elaborated why you are trying to use > Solr in a way other than it was intended. > > -- Jack Krupansky > > On Sat, Oct 24, 2015 at 11:13 AM, Aki Balogh wrote: > > > Gotcha - that's disheartening. > > > > One idea: when I run termfreq, I get all of the termf

Re: Does docValues impact termfreq ?

2015-10-24 Thread Aki Balogh
ptimising your index would reduce it to one segment, and thus might > ever so slightly speed the aggregation of term frequencies, but I doubt > it'd make enough difference to make it worth doing. > > Upayavira > > On Sat, Oct 24, 2015, at 03:37 PM, Aki Balogh wrote: >

Re: Does docValues impact termfreq ?

2015-10-24 Thread Aki Balogh
s. More importantly, what is so "heavy" about your usage? > Generally, moderate use of a feature is much more advisable to heavy usage, > unless you don't care about performance. > > -- Jack Krupansky > > On Fri, Oct 23, 2015 at 8:19 AM, Aki Balogh wrote: > >

Does docValues impact termfreq ?

2015-10-23 Thread Aki Balogh
Hello, In our solr application, we use a Function Query (termfreq) very heavily. Index time and disk space are not important, but we're looking to improve performance on termfreq at query time. I've been reading up on docValues. Would this be a way to improve performance? I had read that Lucene

Re: term frequency with stemming

2015-07-27 Thread Aki Balogh
> But it is a good way to solve some search requirements ( always keep in > mind that stemming degrade the precision of your system in favour to your > recall) > > Cheers > > > 2015-07-25 20:21 GMT+01:00 Aki Balogh : > > > I believe I found a solution: use a third-

Re: term frequency with stemming

2015-07-25 Thread Aki Balogh
"experience" -> "experi") before being passed, i.e. termfreq(body, "end-us experi"). From what I can tell, FunctionQuery / termfreq doesn't have a way to apply stemming. Akos (Aki) Balogh Co-Founder, MarketMuse https://www.MarketMuse.com <https://www.mark
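
A minimal sketch of the workaround described in this message: stem the term on the client, then pass the stemmed form to termfreq. The stem table is a hypothetical stand-in for a real Snowball/Porter stemmer (only the "experience" -> "experi" mapping comes from the thread), and the `id` field and the `tf` alias are assumptions for illustration.

```python
from urllib.parse import urlencode

# Hypothetical stem table standing in for a real Snowball/Porter stemmer.
# Only the "experience" -> "experi" mapping is taken from the thread.
TOY_STEMS = {"experience": "experi"}


def stem(token: str) -> str:
    """Return the stemmed form of a token, or the lowercased token if unknown."""
    lowered = token.lower()
    return TOY_STEMS.get(lowered, lowered)


def termfreq_params(field: str, term: str) -> dict:
    """Build Solr request parameters that ask for termfreq on the stemmed term.

    The "id" field and the "tf" alias are assumptions, not part of the thread.
    """
    return {"q": "*:*", "fl": f"id,tf:termfreq({field},'{stem(term)}')"}


params = termfreq_params("body", "experience")
print(params["fl"])       # the function query with the stemmed term
print(urlencode(params))  # URL-encoded form, ready to append to a select URL
```

The client-side stemmer has to produce the same stems as the analyzer configured on the field, otherwise termfreq looks up a term that was never indexed.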

term frequency with stemming

2015-07-24 Thread Aki Balogh
Hi All, I'm using TermVectorComponent and stemming (Porter) in order to get term frequencies with fuzzy matching. I'm stemming at index and query time. Is there a way to get term frequency from the index? * termfreq doesn't support stemming or wildcards * terms component doesn't allow additional

Re: AND for multiple faceted queries

2015-07-03 Thread Aki Balogh
rms "crib" and "bedding" perhaps. > > BTW, returning 25,000 rows is something of an anti-pattern > in Solr, usually you do that with the export handler. > > If that's not what's happening, let's see: > 1> results of adding &debug=all to the query

AND for multiple faceted queries

2015-07-02 Thread Aki Balogh
I'm trying to specify multiple fq and get the intersection: (lines separated for readability) query? q=webCrawlId:36& fq=(body:"crib bedding" OR title:"crib bedding")& fq={!frange l=0 u=0}termfreq(body,"crib bedding")& fq={!frange l=0 u=0}termfreq(title,"crib bedding")& rows=25000& tv=false& start
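
A small sketch of how a request like the one above can be assembled safely. Solr applies each fq parameter as a separate filter and intersects the results, so repeating fq is the usual way to AND filters together. The parameter values are copied from the message; the truncated remainder of the query (from `start` onward) is left out rather than guessed.

```python
from urllib.parse import urlencode, parse_qs

# A list of pairs (not a dict) so that the fq key can repeat.
params = [
    ("q", "webCrawlId:36"),
    ("fq", '(body:"crib bedding" OR title:"crib bedding")'),
    ("fq", '{!frange l=0 u=0}termfreq(body,"crib bedding")'),
    ("fq", '{!frange l=0 u=0}termfreq(title,"crib bedding")'),
    ("rows", "25000"),
    ("tv", "false"),
]

# urlencode escapes the braces, quotes and spaces that would otherwise
# break the URL when the local-params syntax ({!frange ...}) is used.
query_string = urlencode(params)
print(query_string)

# Round-trip check: all three filters survive encoding as separate values.
print(parse_qs(query_string)["fq"])
```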

Specify HTTP instead of AJP on tomcat

2015-05-07 Thread Aki Balogh
Hello, I'm seeing the following error while indexing: May 06, 2015 10:52:32 PM org.apache.jk.common.MsgAjp processHeader SEVERE: BAD packet signature 18245 May 06, 2015 10:52:32 PM org.apache.jk.common.ChannelSocket processConnection SEVERE: Error, processing connection java.lang.IndexOutOfBounds

Re: Can solr TermVectorComponent return term frequency for the term in my query?

2015-02-04 Thread Aki Balogh
your help! Akos (Aki) Balogh M: 617-682-0066 Co-Founder, MarketMuse https://www.MarketMuse.com On Wed, Feb 4, 2015 at 5:34 PM, Aki Balogh wrote: > PS - I found that termfreq() actually returns the raw tf, i.e. an integer > for each document. However, I have to get the request and
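
Since termfreq() returns one integer per document, a corpus-wide count can be obtained by summing those values on the client. The sketch below assumes the standard Solr JSON response layout (`response.docs`) and a hypothetical `tf` alias for the termfreq pseudo-field; the sample response is made-up data for illustration only.

```python
def total_term_frequency(solr_response: dict, alias: str = "tf") -> int:
    """Sum the per-document termfreq values found under `alias`.

    Documents that lack the alias are counted as zero.
    """
    docs = solr_response.get("response", {}).get("docs", [])
    return sum(int(doc.get(alias, 0)) for doc in docs)


# Made-up response, shaped like Solr's JSON output, for illustration only.
sample = {
    "response": {
        "numFound": 3,
        "docs": [
            {"id": "a", "tf": 2},
            {"id": "b", "tf": 0},
            {"id": "c", "tf": 5},
        ],
    }
}

print(total_term_frequency(sample))  # 7
```

This only covers the documents returned in one page of results, so rows must be large enough (or the results paged) for the sum to cover the whole corpus.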

Re: Can solr TermVectorComponent return term frequency for the term in my query?

2015-02-04 Thread Aki Balogh
(Aki) Balogh M: 617-682-0066 Co-Founder, MarketMuse https://www.MarketMuse.com On Wed, Feb 4, 2015 at 4:58 PM, Aki Balogh wrote: > Is there a way to set solr to only return raw tf (i.e. by maybe turning > off the DefaultSimilarity), so I could use ttf() to get the sum of raw tf > values? >

Re: Can solr TermVectorComponent return term frequency for the term in my query?

2015-02-04 Thread Aki Balogh
wrote: > Hi, > > So you want raw tf. tf method implemented as square root of raw tf. So you > can re-obtain it by reverse operation. > 1.424 * 1.424 = 2.02 = int = 2 > > Ahmet > > > > > On Wednesday, February 4, 2015 11:31 PM, Aki Balogh > wrote: > Hi Ahmet,
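
The reverse operation Ahmet describes can be written out as a short worked example: if the reported tf is the square root of the raw count, squaring it and rounding to the nearest integer recovers the count. The function name is hypothetical; the value 1.424 is the one quoted in the message.

```python
def raw_tf_from_score(tf_score: float) -> int:
    """Recover the raw term count from a tf value computed as sqrt(raw count)."""
    # Rounding absorbs the precision lost when the score was printed.
    return round(tf_score * tf_score)


print(raw_tf_from_score(1.424))  # 2
```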

Re: Can solr TermVectorComponent return term frequency for the term in my query?

2015-02-04 Thread Aki Balogh
count)? Thanks, Aki On Wed, Feb 4, 2015 at 3:41 PM, Ahmet Arslan wrote: > Hi Aki, > > How about tf function query? > https://cwiki.apache.org/confluence/display/solr/Function+Queries > > Ahmet > > > > On Wednesday, February 4, 2015 7:59 PM, Aki Balogh > wrote:

Can solr TermVectorComponent return term frequency for the term in my query?

2015-02-04 Thread Aki Balogh
I'm using solr TermVectorComponent to get term frequencies for specific terms in a corpus. I.e. I query for "q=dog" and want to get back term frequencies for "dog" in the corpus. However, when I request term frequencies, I get back ALL term frequencies for ALL matching documents, which is generati