Re: Search for misspelled words in corpus

2013-06-09 Thread Otis Gospodnetic
Interesting problem. The first thing that comes to mind is to do word expansion during indexing. Kind of like synonym expansion, but maybe a bit more dynamic. If you can have a dictionary of correctly spelled words, then for each token emitted by the tokenizer you could look up the dictionary

Re: Search for misspelled words in corpus

2013-06-09 Thread Shashi Kant
n-grams might help, followed by an edit distance metric such as Jaro-Winkler or Smith-Waterman-Gotoh to filter the results further. On Sun, Jun 9, 2013 at 1:59 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Interesting problem. The first thing that comes to mind is to do word expansion

Re: Note on The Book

2013-06-09 Thread Otis Gospodnetic
It's 2013 and people suffer from ADD. Break it up into a la carte chapter books. Otis -- Solr ElasticSearch Support http://sematext.com/ On Wed, May 29, 2013 at 6:23 PM, Jack Krupansky j...@basetechnology.com wrote: Markus, Okay, more pages it is! -- Jack Krupansky -Original

Re: HyperLogLog for Solr

2013-06-09 Thread Otis Gospodnetic
I have not heard of anyone using HLL in Solr, but: https://docs.google.com/presentation/d/1ESNiqd7HuIfuwXSSK81PAAu6AmEPEE0u_vyk4FU5x9o/present#slide=id.p https://github.com/ptdavteam/elasticsearch-approx-plugin Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, May 28, 2013 at

Re: load balancing internal Solr on Azure

2013-06-09 Thread Otis Gospodnetic
Hi Kevin, Would http://search-lucene.com/?q=LBHttpSolrServer work for you? Otis -- Solr ElasticSearch Support http://sematext.com/ On Fri, May 24, 2013 at 3:12 PM, Kevin Osborn kevin.osb...@cbsi.com wrote: We are looking to install SolrCloud on Azure. We want it to be an internal service.

Re: Search for misspelled words in corpus

2013-06-09 Thread Jagdish Nomula
Another theoretical answer to this question is the n-gram approach. You can index the word and its trigrams. Query the index by the string as well as its trigrams, with a % match search. You then pass the exhaustive result set through a more expensive scoring step such as Smith-Waterman. Thanks, Jagdish
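The trigram stage described above can be sketched in plain Java, without Solr, to show what a "% match" filter does before the expensive scoring step. This is only an illustration under stated assumptions: the class name `TrigramCandidates`, the method names, and the 0.3 threshold are hypothetical and do not come from the thread, and the second-stage scorer (Smith-Waterman in the message) is left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Solr or Lucene.
public class TrigramCandidates {

    /** Returns the set of character trigrams of a word, e.g. "fight" gives fig, igh, ght. */
    public static Set<String> trigrams(String word) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 3 <= word.length(); i++) {
            grams.add(word.substring(i, i + 3));
        }
        return grams;
    }

    /** Fraction of the query's trigrams that also occur in the candidate (0.0 to 1.0). */
    public static double matchRatio(String query, String candidate) {
        Set<String> queryGrams = trigrams(query);
        if (queryGrams.isEmpty()) {
            return 0.0; // words shorter than three letters have no trigrams
        }
        Set<String> shared = new HashSet<>(queryGrams);
        shared.retainAll(trigrams(candidate));
        return (double) shared.size() / queryGrams.size();
    }

    /** Keeps the corpus words whose trigram match ratio reaches minRatio, in corpus order. */
    public static List<String> candidates(String query, List<String> corpus, double minRatio) {
        List<String> result = new ArrayList<>();
        for (String word : corpus) {
            if (matchRatio(query, word) >= minRatio) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> corpus = List.of("fight", "figh", "figth", "sight", "solr");
        // The threshold 0.3 is an arbitrary value chosen for this example.
        System.out.println(candidates("fight", corpus, 0.3));
    }
}
```

With this threshold, "sight" survives the trigram filter alongside "figh" and "figth" because it shares two of the three trigrams of "fight". That is the reason a second, more expensive scoring pass is still needed after the cheap filter.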

Velocity / Solritas not works in solr 4.3 and Tomcat 6

2013-06-09 Thread andy tang
Could anyone help me see why the Solritas page fails? I can go to http://localhost:8080/solr without a problem, but fail to go to http://localhost:8080/solr/browse. Below is the status report. Any help is appreciated. Thanks! Andy type Status report

Re: Search for misspelled words in corpus

2013-06-09 Thread Otis Gospodnetic
Hm, I was purposely avoiding mentioning ngrams because just ngramming all indexed tokens would balloon the index. My assumption was that only *some* words are misspelled, in which case it may be better not to ngram all tokens. Otis -- Solr ElasticSearch Support http://sematext.com/ On

Re: does solr support query time only stopwords?

2013-06-09 Thread jchen2000
Nope. I only searched with individual stop words. Very strange to me. Otis Gospodnetic-5 wrote: Maybe returned hits match other query terms. Otis Solr ElasticSearch Support http://sematext.com/ On Jun 8, 2013 6:34 PM, jchen2000 <jchen200@> wrote: I wanted to analyze high

Re: does solr support query time only stopwords?

2013-06-09 Thread Upayavira
Can you give examples? Show your field type config and the search terms you used. Also, did you reindex after changing your field type? The index is written using the analyser that was active at the time of indexing, so maybe your index still contains stop words. Upayavira On Sun, Jun 9,

Re: [blogpost] Memory is overrated, use SSDs

2013-06-09 Thread Sourajit Basak
@Erick, Your revelation on SSDs is very valuable. Do you have any idea about the following? Do more processors with fewer cores or fewer processors with more cores give the best cost per query, i.e. which of 4P2C or 2P4C? ~ Sourajit On Fri, Jun 7, 2013 at 4:45 PM, Erick Erickson

Boosting based on value of field

2013-06-09 Thread Spadez
Hi, By the looks of it I have a few options with regard to boosting. I was wondering, from a performance point of view, whether it is better to set the boost of certain results on import via the DIH or to set the boost when doing queries, by adding it to the default queries. I have a

Re: Help required with fq syntax

2013-06-09 Thread Kamal Palei
Hi Otis, Your suggestion worked fine. Thanks, kamal On Sun, Jun 9, 2013 at 7:58 AM, Kamal Palei palei.ka...@gmail.com wrote: Though the syntax looks fine, I get all the records. As per the example given above I get all the documents, meaning filtering did not work. I am curious to know if my

LIMIT on number of OR in fq

2013-06-09 Thread Kamal Palei
Dear All, I am using the syntax below to check a particular field. fq=locations:(5000 OR 1 OR 15000 OR 2 OR 75100) With this I get the expected result. In some situations the number of ORs is larger (around 280), something like the below. fq=pref_work_locations:(5000 OR

Re: Note on The Book

2013-06-09 Thread Jack Krupansky
Point taken. Although initially the focus is on one big e-book - to make searching easier, with zero chance of printing that as one paper book, the intent is to go multi-volume for the print edition down the road a little bit. -- Jack Krupansky -Original Message- From: Otis

Re: LIMIT on number of OR in fq

2013-06-09 Thread Aloke Ghoshal
Hi Kamal, You might have to increase the value of maxBooleanClauses in solrconfig.xml (http://wiki.apache.org/solr/SolrConfigXml). The default value 1024 should have been fine for 280 search terms. Though not relevant to your query (OR query) take a look at for an explanation:

Nutch installation

2013-06-09 Thread Andrea Lanzoni
Hi everyone, I am a newcomer to Nutch and Solr and, after studying the literature available on the web, I tried to install them. I have not been able to follow the few instructions on the Apache wiki site. I then turned to YouTube and found a video on how to install Nutch and Solr on *Windows7*. I

Re: Search for misspelled words in corpus

2013-06-09 Thread Jagdish Nomula
ngrams will definitely increase the index size. But the increase might not be very large, as the total number of possible trigrams is 26^3 and we are just storing a docs list with each ngram. Another variation of the above ideas would be to add a pre-processing step, wherein you analyze the

Re: Search for misspelled words in corpus

2013-06-09 Thread Upayavira
You haven't stated why figh is correct and sight isn't. Is it because the first letter is different? Upayavira On Wed, Jun 5, 2013, at 02:10 PM, కామేశ్వర రావు భైరవభట్ల wrote: Hi, I have a problem where our text corpus on which we need to do search contains many misspelled words. Same word

Re: LIMIT on number of OR in fq

2013-06-09 Thread Jack Krupansky
Maybe it is hitting some kind of container limit on URL length, like more than 2048? Add debugQuery=true to your query and see what query is both received and parsed and generated. Also, if the default query operator is set to or, fq={! q.op=OR}..., then you can drop the OR operators for
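To make the two suggestions above concrete, here is a small Java sketch that builds the filter query both ways and measures how long the URL-encoded parameter becomes. It is an illustration only: `FilterQueryBuilder` and its methods are hypothetical names, the field name is taken from the question, the local-param form is written as `{!q.op=OR}`, and the 2048 figure is the guess from the message rather than a documented limit of any particular container. It uses only the JDK (Java 10 or later for `URLEncoder.encode(String, Charset)`).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for inspecting fq length; not part of SolrJ.
public class FilterQueryBuilder {

    /** Builds field:(v1 OR v2 OR ...) as in the original question. */
    public static String explicitOr(String field, List<String> values) {
        return field + ":(" + String.join(" OR ", values) + ")";
    }

    /** Builds {!q.op=OR}field:(v1 v2 ...), so the OR keywords can be dropped. */
    public static String withLocalParam(String field, List<String> values) {
        return "{!q.op=OR}" + field + ":(" + String.join(" ", values) + ")";
    }

    /** Length of "fq=..." once URL-encoded, i.e. what it adds to a GET request. */
    public static int encodedLength(String fq) {
        return ("fq=" + URLEncoder.encode(fq, StandardCharsets.UTF_8)).length();
    }

    public static void main(String[] args) {
        // 280 placeholder ids, matching the rough count mentioned in the question.
        List<String> ids = Collections.nCopies(280, "75100");
        System.out.println("explicit OR: " + encodedLength(explicitOr("pref_work_locations", ids)));
        System.out.println("q.op=OR:     " + encodedLength(withLocalParam("pref_work_locations", ids)));
    }
}
```

If the encoded length turns out to be the issue, sending the request as a POST is a common way around URL limits; whether that applies here depends on the container, which the thread does not identify.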

Re: LotsOfCores feature

2013-06-09 Thread Aleksey
Thanks Paul. Just a little clarification: You mention that you migrate data using built-in replication, but if you map and route users yourself, doesn't that mean that you also need to manage replication yourself? Your routing logic needs to be aware of how to map both replicas for each user, and

Re: LotsOfCores feature

2013-06-09 Thread Upayavira
On Fri, Jun 7, 2013, at 02:59 PM, Jack Krupansky wrote: AFAICT, SolrCloud addresses the use case of distributed update for a relatively smaller number of collections (dozens?) that have a relatively larger number of rows - billions over a modest to moderate number of nodes (a handful to

Why clusterstate.json says active for a killed Solr Node?

2013-06-09 Thread Furkan KAMACI
I want to get cluster state of my SolrCloud and this is my method:

    private final CloudSolrServer solrServer;

    public SolrCloudServerFactory(String zkHost) throws MalformedURLException {
        this.solrServer = new CloudSolrServer(zkHost);
        solrServer.connect();
    }

and I get what I want from

RE: [blogpost] Memory is overrated, use SSDs

2013-06-09 Thread Toke Eskildsen
Sourajit Basak [sourajit.ba...@gmail.com]: Do more processors with fewer cores or fewer processors with more cores give the best cost per query, i.e. which of 4P2C or 2P4C? I have not tested that, so everything I say is (somewhat qualified) guesswork. Assuming a NUMA architecture, my guess is that

Re: LotsOfCores feature

2013-06-09 Thread Jack Krupansky
You're right - ZK is simply managing the shared config information for the cluster and has no part in query or transactions between the actual nodes, except as it depends on shared config information (e.g., what the shards are and where the nodes are.) Somewhere in there I was simply making

Re: Why clusterstate.json says active for a killed Solr Node?

2013-06-09 Thread Mark Miller
The true current state is the live nodes info combined with the clusterstate.json. If a node is not live, whatever is in clusterstate.json is simply its last state, not the current one. - Mark On Sun, Jun 9, 2013 at 4:40 PM, Furkan KAMACI furkankam...@gmail.com wrote: I want to get cluster

Re: Why clusterstate.json says active for a killed Solr Node?

2013-06-09 Thread Furkan KAMACI
Is it enough to look only at the live nodes (if not, could you tell me whether there is any example code in the Solr source)? By the way, what does active mean in clusterstate.json? 2013/6/10 Mark Miller markrmil...@gmail.com The true current state is the live nodes info combined with the

Re: Why clusterstate.json says active for a killed Solr Node?

2013-06-09 Thread Mark Miller
You currently kind of have to look at both if you want to know the true state. An active state means that shard is up to date and online serving - as long as its live node is also up. - Mark On Jun 9, 2013, at 6:18 PM, Furkan KAMACI furkankam...@gmail.com wrote: Is it enough just look at

Re: Why clusterstate.json says active for a killed Solr Node?

2013-06-09 Thread Furkan KAMACI
Here is my code to check state of node: !liveNodes.contains(replica.getNodeName()) ? ZkStateReader.DOWN : replica.get(ZkStateReader.STATE_PROP).toString() 2013/6/10 Mark Miller markrmil...@gmail.com You currently kind of have to look at both if you want to know the true state. An active
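The rule in this exchange (a replica's recorded state only counts if its node is also live) can be shown without a running cluster. The sketch below is a self-contained stand-in, not SolrJ code: `EffectiveReplicaState` is a hypothetical class, the live-node set and recorded state are passed in as plain values instead of being read from ZooKeeper, and the `DOWN` string is a local placeholder for `ZkStateReader.DOWN`, whose actual value is assumed here.

```java
import java.util.Set;

// Hypothetical stand-in that mirrors the ternary in the message above.
public class EffectiveReplicaState {

    // Placeholder for ZkStateReader.DOWN; the real constant's value is an assumption.
    public static final String DOWN = "down";

    /**
     * Returns the state recorded in clusterstate.json only when the node is live.
     * For a node that is not live, the recorded value is just its last known state,
     * so the replica is reported as down instead.
     */
    public static String effectiveState(Set<String> liveNodes, String nodeName, String recordedState) {
        return liveNodes.contains(nodeName) ? recordedState : DOWN;
    }

    public static void main(String[] args) {
        // Node names are made up for the example.
        Set<String> liveNodes = Set.of("host1:8983_solr");
        System.out.println(effectiveState(liveNodes, "host1:8983_solr", "active"));
        System.out.println(effectiveState(liveNodes, "host2:8983_solr", "active"));
    }
}
```

The second call corresponds to the killed node in the question: clusterstate.json still says active, but because the node is missing from the live nodes the combined answer is down.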

RE: OPENNLP problems

2013-06-09 Thread Patrick Mi
Hi Lance, I updated the src from 4.x and applied the latest patch LUCENE-2899-x.patch uploaded on 6th June but still had the same problem. Regards, Patrick -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Thursday, 6 June 2013 5:16 p.m. To:

Re: Boosting based on value of field

2013-06-09 Thread Otis Gospodnetic
Index time boosting should be a bit faster, but not as flexible. Probably better to go for query time boosting first. Otis Solr ElasticSearch Support http://sematext.com/ On Jun 9, 2013 5:46 AM, Spadez james_will...@hotmail.com wrote: Hi, By the looks of it I have a few options with regards

Get Statistics With CloudSolrServer?

2013-06-09 Thread Furkan KAMACI
There is a statistics section on the admin page that gives information such as Last Modified, Num Docs, Max Doc, etc. How can I get this kind of information using CloudSolrServer with Solrj?

Re: Get Statistics With CloudSolrServer?

2013-06-09 Thread Mark Miller
On Jun 9, 2013, at 7:52 PM, Furkan KAMACI furkankam...@gmail.com wrote: There is a statistics section on the admin page that gives information such as Last Modified, Num Docs, Max Doc, etc. How can I get this kind of information using CloudSolrServer with Solrj? There is an admin request

Re: OPENNLP problems

2013-06-09 Thread Lance Norskog
text_opennlp has the right behavior. text_opennlp_pos does what you describe. I'll look some more. On 06/09/2013 04:38 PM, Patrick Mi wrote: Hi Lance, I updated the src from 4.x and applied the latest patch LUCENE-2899-x.patch uploaded on 6th June but still had the same problem. Regards,

Re: OPENNLP problems

2013-06-09 Thread Lance Norskog
Found the problem. Please see: https://issues.apache.org/jira/browse/LUCENE-2899?focusedCommentId=13679293&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13679293 On 06/09/2013 04:38 PM, Patrick Mi wrote: Hi Lance, I updated the src from 4.x and applied the latest

Solr 4.3 - Schema Parsing Failed: Invalid field property: compressed

2013-06-09 Thread Uomesh
Hi, I am getting the error below after upgrading to Solr 4.3. Is the compressed attribute no longer supported in Solr 4.3, or is it a bug in 4.3? org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Schema Parsing Failed: Invalid field property: compressed Thanks, Umesh -- View this

Re: Search for misspelled words in corpus

2013-06-09 Thread కామేశ్వర రావు భైరవభట్ల
Thanks everyone for the replies. I too had the same idea of a pre-processing step. So, I first analyzed the corpus using a dictionary and got all the misspelled words and created a separate index with those words in Solr. Now, when I search for a given query word, first I search for the exact

Re: Search for misspelled words in corpus

2013-06-09 Thread కామేశ్వర రావు భైరవభట్ల
Hi Upayavira, The word I am searching for is fight. Terms like figth, figh are spelling mistakes of fight. So I would like to find them. sight is obviously not a spelling mistake of fight. Even if it was a typo, I don't really want to match sight with fight. regards, Kamesh On Sun, Jun 9, 2013
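The examples in this message show why edit distance alone does not separate the cases: "figth", "figh" and "sight" are each one edit away from "fight". A sketch of one possible rule follows, assuming (as Upayavira's question suggests, though Kamesh does not confirm it) that a misspelling keeps the first letter of the intended word. `MisspellingMatcher` is a hypothetical class, and the distance used is the optimal string alignment variant of Damerau-Levenshtein, chosen here so that a swap of adjacent letters counts as one edit; it is not the metric named elsewhere in the thread.

```java
// Hypothetical matcher; the first-letter rule is an assumption, not the poster's stated method.
public class MisspellingMatcher {

    /** Optimal string alignment distance: insert, delete, substitute, or swap adjacent letters. */
    public static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete all of a's first i letters
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert all of b's first j letters
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "ht" versus "th".
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    /** True when candidate differs from word, shares its first letter, and is within maxEdits. */
    public static boolean isLikelyMisspelling(String word, String candidate, int maxEdits) {
        if (word.isEmpty() || candidate.isEmpty() || word.equals(candidate)) {
            return false;
        }
        return word.charAt(0) == candidate.charAt(0) && distance(word, candidate) <= maxEdits;
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"figth", "figh", "sight"}) {
            System.out.println(candidate + ": " + isLikelyMisspelling("fight", candidate, 1));
        }
    }
}
```

Under this rule "figth" and "figh" are accepted and "sight" is rejected, which matches the behaviour asked for in the message. A fixed first-letter requirement would miss typos in the first character, so it is a trade-off, not a general answer.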

Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

2013-06-09 Thread Prathik Puthran
Hi, @Walter I'm trying to implement the feature below for the user. The user types in any substring of the strings in the dictionary (i.e. the indexed strings). The Solr Suggester should return all the strings in the dictionary that contain the input string as a substring. Thanks, Prathik On Fri, Jun 7,