from:"Herb Roitblat"

Re: substring query

2015-03-04 Thread Herb Roitblat

Do you want to search for shingles?

On 3/4/2015 9:16 PM, Stephen Rudd wrote:
> I have created a slightly hairy document collection that contains 10s of millions of DNA sequence words that I wish to process to find rarer and unique words. Each of the words is between 100 characters (nucleotides)

Re: is there a historical reason why default conjunction operator is "OR"?

2014-04-16 Thread Herb Roitblat

Actually, Google uses OR. The scoring algorithm favors documents that match on more of the ORed terms.

On 4/16/2014 8:17 AM, Min-Uk Kim wrote:
> Hello everyone, I recently wondered why Lucene's default conjunction operator is "OR". Is there a historical reason for that? By the way, Google an
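The point about OR queries favoring documents that match more terms can be illustrated with a toy ranking. This is only a sketch in plain Python: the function names `or_score` and `rank` and the sample documents are hypothetical, and the score is a simple count of matched terms, not the scoring formula of Lucene or of any search engine.

```python
def or_score(query_terms, doc_terms):
    """Count how many distinct query terms occur in the document."""
    return len(set(query_terms) & set(doc_terms))


def rank(query_terms, docs):
    """Return ids of documents matching at least one term, best match first."""
    scored = [(or_score(query_terms, terms), doc_id)
              for doc_id, terms in docs.items()]
    # Sort by descending score, then by id so ties are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored if score > 0]


# Hypothetical sample collection.
docs = {
    "d1": ["lucene", "default", "operator"],
    "d2": ["lucene"],
    "d3": ["unrelated"],
}
print(rank(["lucene", "default", "operator"], docs))
```

With OR semantics, `d2` is still returned even though it matches a single term, but it ranks below `d1`, which matches all three.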

Re: Confuse with Kuromoji

2014-04-06 Thread Herb Roitblat

Thanks. These are familiar. Any other approaches that people use? I guess I'm hoping ...

On 4/6/2014 7:37 AM, Benson Margulies wrote:
> On Sun, Apr 6, 2014 at 10:30 AM, Herb Roitblat wrote:
>> Just curious, what are some of the things that people do to properly tokenize the queries with

Re: Confuse with Kuromoji

2014-04-06 Thread Herb Roitblat

Just curious, what are some of the things that people do to properly tokenize the queries with mixed language collections? What do you do with mixed language queries?

On 4/6/2014 4:51 AM, Benson Margulies wrote:
> You must know what language each text is in, and use an appropriate analyzer. Som

Re: QueryParser

2014-03-24 Thread Herb Roitblat

The default query parser for CJK languages breaks text into bigrams. A word consisting of characters ABCDE is broken into tokens AB, BC, CD, DE, or "轻歌曼舞庆元旦" into data:轻歌 data:歌曼 data:曼舞 data:舞庆 data:庆元 data:元旦 Each pair may or may not be a word, but if you use the same parser (i.e. analyz
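The bigram splitting described above can be sketched in a few lines. This is plain Python written for illustration, not Lucene's analyzer code; the function name `cjk_bigrams` is hypothetical, and the sketch assumes the input is a single unbroken run of CJK characters.

```python
def cjk_bigrams(text):
    """Split a run of characters into overlapping two-character tokens."""
    if len(text) < 2:
        # A lone character (or empty string) cannot form a pair.
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


print(cjk_bigrams("ABCDE"))
print(cjk_bigrams("轻歌曼舞庆元旦"))
```

Because the same splitting is applied to both the indexed text and the query, a query matches wherever its bigrams occur, whether or not each pair is a real word.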

Re: Dimension mismatch exception

2014-03-21 Thread Herb Roitblat

Computing the cosine between two documents requires that the vectors for each document be the same length (same number of elements, same dimensionality, not the norm). The length of the vector is the length of the vocabulary for the whole set. The two sets will inevitably have different nu
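One way to picture the equal-length requirement: build both vectors over the combined vocabulary and fill in zero for any term a document lacks. The sketch below is plain Python, not Lucene or a linear algebra library; the function name `cosine` and the representation of a document as a term-to-weight dictionary are assumptions made for illustration.

```python
import math


def cosine(doc_a, doc_b):
    """Cosine similarity of two term->weight dicts over their joint vocabulary."""
    # The shared vocabulary fixes the dimensionality for both vectors.
    vocab = sorted(set(doc_a) | set(doc_b))
    vec_a = [doc_a.get(term, 0.0) for term in vocab]  # missing terms become 0
    vec_b = [doc_b.get(term, 0.0) for term in vocab]
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document has no direction to compare
    return dot / (norm_a * norm_b)


print(cosine({"gene": 2.0, "cell": 1.0}, {"gene": 1.0, "protein": 3.0}))
```

Both vectors here end up with three elements (`cell`, `gene`, `protein`), so the dot product is defined even though each document uses only two of the terms.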

Re: Dimension mismatch exception

2014-03-20 Thread Herb Roitblat

If you want to compute the cosines between pairs of documents (each a compared with each b), then the dimension is 100, the size of each document. If you want to compare the whole index then you will need to make them the same length (number of elements) by padding the shorter with zeroes. There

Re: debug filters

2012-01-02 Thread Herb Roitblat

I got that one figured out. Thanks.

On 12/31/2011 5:51 PM, Herb Roitblat wrote:
> Can someone point me to information on how to debug a filter? How do I access the bit-string? Our problem seems to be that when we set a filter, not all of the appropriate bits are set and when we use the filter

debug filters

2011-12-31 Thread Herb Roitblat

Can someone point me to information on how to debug a filter? How do I access the bit-string? Our problem seems to be that when we set a filter, not all of the appropriate bits are set and when we use the filter to retrieve the documents, not all of the documents that we intended to set are r

Re: Picking single results out of a list of results

2011-10-19 Thread Herb Roitblat

hem. -- Ian.

On Sat, Oct 15, 2011 at 7:47 PM, Herb Roitblat wrote:
> I have an application where I would like to pick one document from somewhere in the list of search results. For example, I would like to retrieve one of the results at rank 57, another at rank 1223, etc. I'm not real clear on how

Picking single results out of a list of results

2011-10-15 Thread Herb Roitblat

I have an application where I would like to pick one document from somewhere in the list of search results. For example, I would like to retrieve one of the results at rank 57, another at rank 1223, etc. I'm not real clear on how to do it. I have seen some things on simulating pagination wi

11 matches