"end-us" and "experience" -> "experi") before being passed, i.e.
termfreq(body, "end-us experi").
Akos (Aki) Balogh
Co-Founder / Chief Product Officer
https://www.MarketMuse.com
On Tue, Mar 8, 2016 at 1:14 PM, Aki Balogh wrote:
> Hi All,
>
> We
Hi All,
We're using solr termfreq to count raw term frequencies (i.e. the tf in
tf-idf).
This works fine on a regular text field.
However, we have a field where we've added a Snowball stemmer.
Should termfreq also work on a stemmed field?
Right now, we're only getting data back on terms where th
Shawn - this worked great!
I reloaded the collection, the nodes recovered and it seems to work great
now.
Thank you,
Aki
On Thu, Feb 4, 2016 at 7:10 PM, Shawn Heisey wrote:
> On 2/4/2016 2:12 PM, Aki Balogh wrote:
> > I found the state.json file and it indeed shows that the range f
Thank you Shawn. I will try this and will get back to you if I run into
any issues.
Aki
On Thu, Feb 4, 2016 at 7:10 PM, Shawn Heisey wrote:
> On 2/4/2016 2:12 PM, Aki Balogh wrote:
> > I found the state.json file and it indeed shows that the range for shard1
> > is null.
>
on with the implicit
> router. What was the command you used to create the collection?
>
> Best,
> Erick
>
> On Thu, Feb 4, 2016 at 1:21 PM, Aki Balogh wrote:
> > I'm not sure how these hash ranges were determined, so I'm not sure if I
> > should be manually set
I'm not sure how these hash ranges were determined, so I'm not sure if I
should be manually setting them or somehow allowing solr to pick them for
this shard.
Thanks,
Aki
On Thu, Feb 4, 2016 at 4:12 PM, Aki Balogh wrote:
> Shawn,
>
> Thanks - this is very helpful.
>
>
, Shawn Heisey wrote:
> On 2/4/2016 1:37 PM, Aki Balogh wrote:
> > Specifically, they suggest getting clusterstate.json. But I've tried
> that
> > and when I get that file, I only get an empty file {}
> >
> > Is there another way to ask Zookeeper to cover the missi
PS - confirmed: in the GUI, I go to Admin->Cloud->Tree, click on
clusterstate.json and it's empty {}
On Thu, Feb 4, 2016 at 3:37 PM, Aki Balogh wrote:
> One of our shards went down. We brought it back up but it doesn't have a
> hash r
One of our shards went down. We brought it back up but it doesn't have a
hash range:
active
marketmuse_shard1_replica1
http://172.30.0.254:8080/solr
172.30.0.254:8080_solr
active
active
marketmuse_shard1_replica2
172.30.0.89:8080_solr
http://172.30.0.89:8080/solr
true
This results in
s and sum up total number of
> occurrences of specific term in index? Is this only way you use index or
> this is side functionality?
>
> Thanks,
> Emir
>
>
> On 24.10.2015 22:28, Aki Balogh wrote:
>
>> Certainly, yes. I'm just doing a word count, ie how often do
> On Sat, Oct 24, 2015, at 09:05 PM, Aki Balogh wrote:
> > Yes, sorry, I am not being clear.
> >
> > We are not even doing scoring, just getting the raw TF values. We're
> > doing
> > this in solr because it can scale well.
> >
> > But with large corpora
ms and IDF and scoring would be
> mostly TF, no?
>
> Upayavira
>
> On Sat, Oct 24, 2015, at 07:28 PM, Aki Balogh wrote:
> > Thanks, let me think about that.
> >
> > We're using termfreq to get the TF score, but we don't know which term
> > we'll
at is the
> same for all documents, and facet on it. Instead of counting the number
> of documents, calculate the sum() of your word count field.
>
> I *think* that should work.
>
> Upayavira
>
> On Sat, Oct 24, 2015, at 04:24 PM, Aki Balogh wrote:
> > Hi Jack,
>
elaborated why you are trying to use
> Solr in a way other than it was intended.
>
> -- Jack Krupansky
>
> On Sat, Oct 24, 2015 at 11:13 AM, Aki Balogh wrote:
>
> > Gotcha - that's disheartening.
> >
> > One idea: when I run termfreq, I get all of the termf
ptimising your index would reduce it to one segment, and thus might
> ever so slightly speed the aggregation of term frequencies, but I doubt
> it'd make enough difference to make it worth doing.
>
> Upayavira
>
> On Sat, Oct 24, 2015, at 03:37 PM, Aki Balogh wrote:
>
s. More importantly, what is so "heavy" about your usage?
> Generally, moderate use of a feature is much more advisable than heavy usage,
> unless you don't care about performance.
>
> -- Jack Krupansky
>
> On Fri, Oct 23, 2015 at 8:19 AM, Aki Balogh wrote:
>
>
Hello,
In our solr application, we use a Function Query (termfreq) very heavily.
Index time and disk space are not important, but we're looking to improve
performance on termfreq at query time.
I've been reading up on docValues. Would this be a way to improve
performance?
I had read that Lucene
> But it is a good way to solve some search requirements (always keep in
> mind that stemming degrades the precision of your system in favour of
> recall)
>
> Cheers
>
>
> 2015-07-25 20:21 GMT+01:00 Aki Balogh :
>
> > I believe I found a solution: use a third-
"experience" -> "experi") before being passed, i.e. termfreq(body, "end-us
experi").
From what I can tell, FunctionQuery / termfreq doesn't have a way to apply
stemming.
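
The workaround described above (stem the term on the client, then pass the stemmed form to termfreq) could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the poster's code: the helper name `termfreq_params`, the `tf` alias, and the `id` field are hypothetical, and the term is assumed to have been stemmed already by whatever stemmer matches the field's analyzer.

```python
from urllib.parse import urlencode


def termfreq_params(field, stemmed_term, query="*:*", alias="tf"):
    """Build Solr parameters that return termfreq() as a pseudo-field."""
    # Assumption: the term is already stemmed and contains no double quotes,
    # so it can be embedded directly as a string literal.
    if '"' in stemmed_term:
        raise ValueError("term must not contain double quotes")
    # "id" is an assumed unique-key field name; adjust to the real schema.
    return {"q": query, "fl": f'id,{alias}:termfreq({field},"{stemmed_term}")'}


# "experience" -> "experi" is the stemmed form quoted in the message above.
params = termfreq_params("body", "experi")
print(urlencode(params))
```

The encoded string can be appended to the collection's select URL; each returned document would then carry the per-document count under the alias.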
Akos (Aki) Balogh
Co-Founder, MarketMuse
https://www.MarketMuse.com
Hi All,
I'm using TermVectorComponent and stemming (Porter) in order to get term
frequencies with fuzzy matching. I'm stemming at index and query time.
Is there a way to get term frequency from the index?
* termfreq doesn't support stemming or wildcards
* terms component doesn't allow additional
rms "crib" and "bedding" perhaps.
>
> BTW, returning 25,000 rows is something of an anti-pattern
> in Solr, usually you do that with the export handler.
>
> If that's not what's happening, let's see:
> 1> results of adding &debug=all to the query
I'm trying to specify multiple fq and get the intersection: (lines
separated for readability)
query?
q=webCrawlId:36&
fq=(body:"crib bedding" OR title:"crib bedding")&
fq={!frange l=0 u=0}termfreq(body,"crib bedding")&
fq={!frange l=0 u=0}termfreq(title,"crib bedding")&
rows=25000&
tv=false&
start
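
Repeating fq once per filter, as in the request above, is how Solr intersects filters. A minimal Python sketch of assembling that query string follows; it includes only the parameters shown in full above (the truncated start parameter is left out), and the function name `build_query` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def build_query(phrase, crawl_id, rows=25000):
    """Return a URL-encoded query string with one fq entry per filter."""
    params = [
        ("q", f"webCrawlId:{crawl_id}"),
        ("fq", f'(body:"{phrase}" OR title:"{phrase}")'),
        # frange keeps documents whose function value lies in [l, u].
        ("fq", f'{{!frange l=0 u=0}}termfreq(body,"{phrase}")'),
        ("fq", f'{{!frange l=0 u=0}}termfreq(title,"{phrase}")'),
        ("rows", str(rows)),
        ("tv", "false"),
    ]
    # A list of pairs (not a dict) lets the same key appear several times.
    return urlencode(params)


query_string = build_query("crib bedding", 36)
# Round-trip to confirm all three filters survive encoding.
filters = parse_qs(query_string)["fq"]
print(len(filters))
```

Passing a list of pairs rather than a dict matters here, because a dict would keep only one fq value.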
Hello,
I'm seeing the following error while indexing:
May 06, 2015 10:52:32 PM org.apache.jk.common.MsgAjp processHeader
SEVERE: BAD packet signature 18245
May 06, 2015 10:52:32 PM org.apache.jk.common.ChannelSocket
processConnection
SEVERE: Error, processing connection
java.lang.IndexOutOfBounds
your help!
Akos (Aki) Balogh
M: 617-682-0066
Co-Founder, MarketMuse
https://www.MarketMuse.com
On Wed, Feb 4, 2015 at 5:34 PM, Aki Balogh wrote:
> PS - I found that termfreq() actually returns the raw tf, i.e. an integer
> for each document. However, I have to get the request and
(Aki) Balogh
M: 617-682-0066
Co-Founder, MarketMuse
https://www.MarketMuse.com
On Wed, Feb 4, 2015 at 4:58 PM, Aki Balogh wrote:
> Is there a way to set solr to only return raw tf (i.e. by maybe turning
> off the DefaultSimilarity), so I could use ttf() to get the sum of raw tf
> values?
>
wrote:
> Hi,
>
> So you want the raw tf. The tf method is implemented as the square root of
> the raw tf, so you can recover it by the reverse operation (squaring).
> 1.424 * 1.424 = 2.03, which as an integer is 2
>
> Ahmet
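
The arithmetic in Ahmet's reply can be checked with a short Python sketch. It assumes, as the reply states, that the similarity in use computes tf as the square root of the raw count; the function name `raw_tf` is hypothetical.

```python
def raw_tf(tf_score):
    """Recover the integer raw term frequency from a sqrt-based tf score."""
    # Squaring undoes the square root; rounding absorbs the precision lost
    # when the score was printed with only a few decimals.
    return round(tf_score * tf_score)


print(raw_tf(1.424))
```

If the collection uses a different similarity, the inverse would differ, so this only applies to the square-root case described in the thread.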
>
>
>
>
> On Wednesday, February 4, 2015 11:31 PM, Aki Balogh
> wrote:
> Hi Ahmet,
count)?
Thanks,
Aki
On Wed, Feb 4, 2015 at 3:41 PM, Ahmet Arslan
wrote:
> Hi Aki,
>
> How about tf function query?
> https://cwiki.apache.org/confluence/display/solr/Function+Queries
>
> Ahmet
>
>
>
> On Wednesday, February 4, 2015 7:59 PM, Aki Balogh
> wrote:
I'm using solr TermVectorComponent to get term frequencies for specific
terms in a corpus. I.e. I query for "q=dog" and want to get back term
frequencies for "dog" in the corpus.
However, when I request term frequencies, I get back ALL term frequencies
for ALL matching documents, which is generati