Interesting problem. The first thing that comes to mind is to do
word expansion during indexing. Kind of like synonym expansion, but
maybe a bit more dynamic. If you can have a dictionary of correctly
spelled words, then for each token emitted by the tokenizer you could
look up the dictionary
n-grams might help, followed by an edit distance metric such as Jaro-Winkler
or Smith-Waterman-Gotoh to filter the candidates further.
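As a rough illustration of that filtering step, here is a minimal, self-contained Jaro-Winkler implementation (a sketch only, not tuned for production; the class and method names are mine, not from any Solr API):

```java
// Minimal Jaro-Winkler similarity, usable as the cheap
// post-filter step after n-gram candidate generation.
public class JaroWinkler {

    public static double similarity(String s1, String s2) {
        double jaro = jaro(s1, s2);
        // Winkler boost: reward a common prefix of up to 4 chars.
        int prefix = 0;
        int limit = Math.min(4, Math.min(s1.length(), s2.length()));
        for (int i = 0; i < limit; i++) {
            if (s1.charAt(i) == s2.charAt(i)) prefix++; else break;
        }
        return jaro + prefix * 0.1 * (1 - jaro);
    }

    private static double jaro(String s1, String s2) {
        if (s1.isEmpty() && s2.isEmpty()) return 1.0;
        // Characters match if equal and within this sliding window.
        int window = Math.max(0, Math.max(s1.length(), s2.length()) / 2 - 1);
        boolean[] m1 = new boolean[s1.length()];
        boolean[] m2 = new boolean[s2.length()];
        int matches = 0;
        for (int i = 0; i < s1.length(); i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(s2.length() - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!m2[j] && s1.charAt(i) == s2.charAt(j)) {
                    m1[i] = true; m2[j] = true; matches++; break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Count transpositions among the matched characters.
        int t = 0, k = 0;
        for (int i = 0; i < s1.length(); i++) {
            if (!m1[i]) continue;
            while (!m2[k]) k++;
            if (s1.charAt(i) != s2.charAt(k)) t++;
            k++;
        }
        double m = matches;
        return (m / s1.length() + m / s2.length() + (m - t / 2.0) / m) / 3.0;
    }
}
```

With this, a transposition typo like "figth" scores well above an unrelated word like "sight" against "fight", which is the behavior you'd want from the filter.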
On Sun, Jun 9, 2013 at 1:59 AM, Otis Gospodnetic otis.gospodne...@gmail.com
wrote:
Interesting problem. The first thing that comes to mind is to do
word expansion
It's 2013 and people suffer from ADD. Break it up into a la carte
chapter books.
Otis
--
Solr ElasticSearch Support
http://sematext.com/
On Wed, May 29, 2013 at 6:23 PM, Jack Krupansky j...@basetechnology.com wrote:
Markus,
Okay, more pages it is!
-- Jack Krupansky
-Original
I have not heard of anyone using HLL in Solr, but:
https://docs.google.com/presentation/d/1ESNiqd7HuIfuwXSSK81PAAu6AmEPEE0u_vyk4FU5x9o/present#slide=id.p
https://github.com/ptdavteam/elasticsearch-approx-plugin
Otis
--
Solr ElasticSearch Support
http://sematext.com/
On Tue, May 28, 2013 at
Hi Kevin,
Would http://search-lucene.com/?q=LBHttpSolrServer work for you?
Otis
--
Solr ElasticSearch Support
http://sematext.com/
On Fri, May 24, 2013 at 3:12 PM, Kevin Osborn kevin.osb...@cbsi.com wrote:
We are looking to install SolrCloud on Azure. We want it to be an internal
service.
Another theoretical answer to this question is the ngrams approach. You can
index the word and its trigrams. Query the index by the string as well as
its trigrams, with a %-match search. You then pass the exhaustive result set
through a more expensive scoring such as Smith-Waterman.
Thanks,
Jagdish
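The trigram %-match step above can be sketched as follows (a minimal illustration, assuming unpadded character trigrams and Jaccard overlap as the "% match"; names are mine):

```java
import java.util.HashSet;
import java.util.Set;

// Character trigrams plus Jaccard overlap: a cheap %-match
// candidate filter before expensive Smith-Waterman rescoring.
public class TrigramFilter {

    // All consecutive 3-character substrings of the word.
    public static Set<String> trigrams(String word) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 3 <= word.length(); i++) {
            grams.add(word.substring(i, i + 3));
        }
        return grams;
    }

    // Fraction of shared trigrams (Jaccard: |intersection| / |union|).
    public static double overlap(String a, String b) {
        Set<String> ga = trigrams(a);
        Set<String> gb = trigrams(b);
        Set<String> union = new HashSet<>(ga);
        union.addAll(gb);
        if (union.isEmpty()) return 0.0;
        Set<String> inter = new HashSet<>(ga);
        inter.retainAll(gb);
        return (double) inter.size() / union.size();
    }
}
```

Any word whose overlap with the query clears a threshold becomes a candidate for the expensive rescoring; padding the word with boundary markers before trigramming would raise the overlap for short words.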
Could anyone help me to see what is the reason the Solritas page failed?
I can go to http://localhost:8080/solr without problem, but fail to go to
http://localhost:8080/solr/browse
As below is the status report! Any help is appreciated.
Thanks!
Andy

type: Status report
Hm, I was purposely avoiding mentioning ngrams because just ngramming
all indexed tokens would balloon the index. My assumption was that
only *some* words are misspelled, in which case it may be better not
to ngram all tokens.
Otis
--
Solr ElasticSearch Support
http://sematext.com/
On
Nope. I only searched with individual stop words. Very strange to me.
Otis Gospodnetic-5 wrote
Maybe returned hits match other query terms.
Otis
Solr ElasticSearch Support
http://sematext.com/
On Jun 8, 2013 6:34 PM, jchen2000 <jchen200@> wrote:
I wanted to analyze high
Can you give examples? Show your field type config, the search terms you
used.
Also, did you reindex after changing your field type? The index will have
been written using the analyser that was active at the time of indexing,
so maybe your index still contains stop words.
Upayavira
On Sun, Jun 9,
@Erick,
Your revelation on SSDs is very valuable.
Do you have any idea on the following ?
Do more processors with fewer cores, or fewer processors with more cores,
i.e. which of 4P2C or 2P4C, give the best cost per query?
~ Sourajit
On Fri, Jun 7, 2013 at 4:45 PM, Erick Erickson
Hi,
By the looks of it I have a few options with regard to boosting. I was
wondering, from a performance point of view, am I better off setting the
boost of certain results on import via the DIH, or is it better to set the
boost when querying, by adding it to the default queries?
I have a
Hi Otis
Your suggestion worked fine.
Thanks
kamal
On Sun, Jun 9, 2013 at 7:58 AM, Kamal Palei palei.ka...@gmail.com wrote:
Though the syntax looks fine, I get all the records. As per the example
given above I get all the documents, meaning the filtering did not work. I am
curious to know if my
Dear All
I am using below syntax to check for a particular field.
fq=locations:(5000 OR 1 OR 15000 OR 2 OR 75100)
With this I get the expected result properly.
In particular situations the number of ORs is larger (around 280),
something as below.
fq=pref_work_locations:(5000 OR
Point taken. Although initially the focus is on one big e-book - to make
searching easier, with zero chance of printing that as one paper book, the
intent is to go multi-volume for the print edition down the road a little
bit.
-- Jack Krupansky
-Original Message-
From: Otis
Hi Kamal,
You might have to increase the value of maxBooleanClauses in solrconfig.xml
(http://wiki.apache.org/solr/SolrConfigXml), though the default value of 1024
should have been fine for 280 search terms.
Though not relevant to your query (an OR query), take a look at this for an
explanation:
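For reference, the setting lives inside the <query> section of solrconfig.xml; a minimal sketch (the value 2048 is just an illustrative bump over the default, not a recommendation):

```xml
<query>
  <!-- Raise the cap on boolean clauses per query if fq's with many ORs fail. -->
  <maxBooleanClauses>2048</maxBooleanClauses>
</query>
```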
Hi everyone, I am a newcomer to Nutch and Solr and, after studying the
literature available on the web, I tried to install them.
I have not been able to follow the few instructions on the Apache wiki site.
I then turned to YouTube and found a video on how to install Nutch and
Solr on *Windows7*.
I
ngrams will definitely increase the index. But the increase in size might
not be super high, as the total possible dictionary size is 26^3 and
we are just storing a docs list with each ngram.
Another variation of the above ideas would be to add a pre-processing step,
where-in you analyze the
You haven't stated why figh is correct and sight isn't. Is it because
the first letter is different?
Upayavira
On Wed, Jun 5, 2013, at 02:10 PM, కామేశ్వర రావు భైరవభట్ల wrote:
Hi,
I have a problem where our text corpus on which we need to do search
contains many misspelled words. Same word
Maybe it is hitting some kind of container limit on URL length, like more
than 2048?
Add debugQuery=true to your query and see what query is both received and
parsed and generated.
Also, if the default query operator is set to OR, fq={! q.op=OR}..., then
you can drop the OR operators for
Thanks Paul. Just a little clarification:
You mention that you migrate data using built-in replication, but if
you map and route users yourself, doesn't that mean that you also need
to manage replication yourself? Your routing logic needs to be aware
of how to map both replicas for each user, and
On Fri, Jun 7, 2013, at 02:59 PM, Jack Krupansky wrote:
AFAICT, SolrCloud addresses the use case of distributed update for a
relatively smaller number of collections (dozens?) that have a relatively
larger number of rows - billions over a modest to moderate number of
nodes
(a handful to
I want to get the cluster state of my SolrCloud and this is my method:

private final CloudSolrServer solrServer;

public SolrCloudServerFactory(String zkHost) throws MalformedURLException {
    this.solrServer = new CloudSolrServer(zkHost);
    solrServer.connect();
}

and I get what I want from
Sourajit Basak [sourajit.ba...@gmail.com]:
Do more processors with fewer cores, or fewer processors with more cores,
i.e. which of 4P2C or 2P4C, give the best cost per query?
I have not tested that, so everything I say is (somewhat qualified) guesswork.
Assuming a NUMA architecture, my guess is that
You're right - ZK is simply managing the shared config information for the
cluster and has no part in query or transactions between the actual nodes,
except as it depends on shared config information (e.g., what the shards are
and where the nodes are.)
Somewhere in there I was simply making
The true current state is the live nodes info combined with the
clusterstate.json. If a node is not live, whatever is in clusterstate.json
is simply its last state, not the current one.
- Mark
On Sun, Jun 9, 2013 at 4:40 PM, Furkan KAMACI furkankam...@gmail.comwrote:
I want to get cluster
Is it enough to just look at the live nodes (if not, could you point me to
an example in the Solr source code)? By the way, what does active
mean in clusterstate.json?
2013/6/10 Mark Miller markrmil...@gmail.com
The true current state is the live nodes info combined with the
You currently kind of have to look at both if you want to know the true state.
An active state means that shard is up to date and online serving, as long as
its live node is also up.
- Mark
On Jun 9, 2013, at 6:18 PM, Furkan KAMACI furkankam...@gmail.com wrote:
Is it enough to just look at
Here is my code to check state of node:
!liveNodes.contains(replica.getNodeName()) ? ZkStateReader.DOWN :
replica.get(ZkStateReader.STATE_PROP).toString()
2013/6/10 Mark Miller markrmil...@gmail.com
You currently kind of have to look at both if you want to know the true
state.
An active
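The combination Mark describes, and that the ternary in the code snippet above implements, can be sketched independently of SolrJ (plain collections stand in for the live_nodes list and the recorded state; the class and method names are mine):

```java
import java.util.Set;

// A replica's recorded state in clusterstate.json only counts
// if its node is also present in the live_nodes list; otherwise
// the recorded state is stale and the replica is effectively down.
public class ReplicaState {
    public static String effectiveState(Set<String> liveNodes,
                                        String nodeName,
                                        String recordedState) {
        return liveNodes.contains(nodeName) ? recordedState : "down";
    }
}
```

So a replica recorded as "active" only reports "active" while its node stays in live_nodes.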
Hi Lance,
I updated the src from 4.x and applied the latest patch LUCENE-2899-x.patch
uploaded on 6th June but still had the same problem.
Regards,
Patrick
-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Thursday, 6 June 2013 5:16 p.m.
To:
Index-time boosting should be a bit faster, but it is not as flexible.
Probably better to go for query-time boosting first.
Otis
Solr ElasticSearch Support
http://sematext.com/
On Jun 9, 2013 5:46 AM, Spadez james_will...@hotmail.com wrote:
Hi,
By the looks of it I have a few options with regards
There is a stats section on the admin page that gives information like
Last Modified, Num Docs, Max Doc, etc. How can I get that kind of
information using CloudSolrServer with SolrJ?
On Jun 9, 2013, at 7:52 PM, Furkan KAMACI furkankam...@gmail.com wrote:
There is a stats section on the admin page that gives information like
Last Modified, Num Docs, Max Doc, etc. How can I get that kind of
information using CloudSolrServer with SolrJ?
There is an admin request
text_opennlp has the right behavior.
text_opennlp_pos does what you describe.
I'll look some more.
On 06/09/2013 04:38 PM, Patrick Mi wrote:
Hi Lance,
I updated the src from 4.x and applied the latest patch LUCENE-2899-x.patch
uploaded on 6th June but still had the same problem.
Regards,
Found the problem. Please see:
https://issues.apache.org/jira/browse/LUCENE-2899?focusedCommentId=13679293&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13679293
On 06/09/2013 04:38 PM, Patrick Mi wrote:
Hi Lance,
I updated the src from 4.x and applied the latest
Hi,
I am getting the error below after upgrading to Solr 4.3. Is the compressed
attribute no longer supported in Solr 4.3, or is it a bug in 4.3?
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Schema Parsing Failed: Invalid field property: compressed
Thanks,
Umesh
--
View this
Thanks everyone for the replies. I too had the same idea of a
pre-processing step. So, I first analyzed the corpus using a dictionary and
got all the misspelled words and created a separate index with those words
in Solr. Now, when I search for a given query word, first I search for the
exact
Hi Upayavira,
The word I am searching for is fight. Terms like figth and figh are
spelling mistakes of fight, so I would like to find them. sight is
obviously not a spelling mistake of fight; even if it were a typo, I don't
really want to match sight with fight.
regards,
Kamesh
On Sun, Jun 9, 2013
Hi,
@Walter
I'm trying to implement the below feature for the user.
User types in any substring of the strings in the dictionary (i.e. the
indexed strings).
The Solr Suggester should return all the strings in the dictionary which
have the input string as a substring.
Thanks,
Prathik
On Fri, Jun 7,