Hi Francisco,
>> I have many drug product leaflets, each corresponding to one product. On
the other hand, we have a medical dictionary with about 10^5 terms.
I want to detect all the occurrences of those terms in any leaflet
document.
Take a look at SolrTextTagger for this use case.
Hi Naresh,
Couldn't you just model this as an OR query, since your requirement is
at least one match (but can be more than one), i.e.:
tags:T1 tags:T2 tags:T3
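To make the suggestion concrete, a minimal sketch of assembling such an OR query as a plain query string (the class and method names here are made up for illustration; this is not the Lucene BooleanQuery API):

```java
import java.util.List;
import java.util.stream.Collectors;

public class OrQueryBuilder {
    // Each tag becomes a "field:term" clause; with default OR semantics a
    // document matching any one clause is returned, and documents matching
    // more clauses score higher.
    static String orQuery(String field, List<String> tags) {
        return tags.stream()
                   .map(t -> field + ":" + t)
                   .collect(Collectors.joining(" "));
    }
}
```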
-sujit
On Mon, May 11, 2015 at 4:14 AM, Naresh Yadav nyadav@gmail.com wrote:
Hi all,
Also asked this here :
Hi Vijay,
I haven't tried this myself, but perhaps you could build the two phrases as
PhraseQueries and connect them up with a SpanQuery? Something like this
(using your original example).
PhraseQuery p1 = new PhraseQuery();
for (String word : "this is phrase 1".split(" ")) {
p1.add(new
about adding another Facet Component that will be executed after
the standard FacetComponent. Let me know if you think we should consider
other options.
Thanks,
-Ha
-Original Message-
From: sujitatgt...@gmail.com [mailto:sujitatgt...@gmail.com] On Behalf Of
Sujit Pal
Sent
Hi Ha,
I am the author of the blog post you mention. To your question, I don't
know if the code will work without change (since the Lucene/Solr API has
evolved so much over the last few years), but a preferred approach using
Function Queries may be found in the slides for Timothy Potter's talk
Hi Ludovic,
A bit late to the party, sorry, but here is a bit of a riff off Eric's
idea. Why not store the previous terms in a Bloom filter, and once you get
the terms from this week, check to see whether they are in the set. Once you
find the new terms, add them to the Bloom filter. Bloom filters are
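A toy sketch of the idea (class name made up; a production system would more likely use something like Guava's BloomFilter). The key property is that a Bloom filter never reports a false negative, so a term it says is absent is definitely new:

```java
import java.util.BitSet;

// Toy Bloom filter: k probe positions per term, derived from two base
// hashes (the Kirsch-Mitzenmacher construction). False positives are
// possible, false negatives are not -- so "definitely new" terms are
// detected reliably.
public class TermBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int k;

    TermBloomFilter(int size, int k) {
        this.bits = new BitSet(size);
        this.size = size;
        this.k = k;
    }

    private int probe(String term, int i) {
        int h1 = term.hashCode();
        int h2 = (h1 >>> 16) | 1;        // force an odd second hash
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String term) {
        for (int i = 0; i < k; i++) bits.set(probe(term, i));
    }

    boolean mightContain(String term) {
        for (int i = 0; i < k; i++)
            if (!bits.get(probe(term, i))) return false;
        return true;
    }
}
```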
Hi Trey,
In an application I built a few years ago, I had a component that rewrote the
input query into a Lucene BooleanQuery and we would set the
minimumNumberShouldMatch value for the query. Worked well, but lately we
are trying to move away from writing our own custom components since
Hi Smitha,
Have you looked at facet queries? They allow you to attach Solr queries to
facets. The problem with this is that you will need to know all possible
combinations of language and binding (or make an initial query to find this
information).
Hi Eugene,
In a system we built a couple of years ago, we had a corpus of English and
French mixed (and Spanish on the way but that was implemented by client
after we handed off). We had different fields for each language. So (title,
body) for English docs was (title_en, body_en), for French
Have you looked at IndexSchema? That would offer you methods to query index
metadata using SolrJ.
http://lucene.apache.org/solr/4_7_2/solr-core/org/apache/solr/schema/IndexSchema.html
-sujit
On Tue, May 27, 2014 at 1:56 PM, T. Kuro Kurosaka k...@healthline.com wrote:
I'd like to write Solr
about, seems difficult and time-consuming for students like me, as I will
have to submit this in the next 15 days.
Please suggest something.
On Tue, Mar 11, 2014 at 5:12 AM, Sujit Pal sujit@comcast.net wrote:
Hi Sohan,
You would be the best person to answer your question of how
Sujit and all for your views about semantic search in Solr.
But how do I proceed, I mean how do I start things off to get on
track?
On Sat, Mar 8, 2014 at 10:50 PM, Sujit Pal sujit@comcast.net wrote:
Thanks for sharing this link Sohan, its an interesting approach. Since you
have effectively defined what you mean by Semantic Search, there are a couple of
other approaches I know of to do something like this:
1) preprocess your documents looking for terms that co-occur in the same
document. The more
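A sketch of the co-occurrence counting step described in 1) (class name made up; real systems would also normalize counts, e.g. with PMI, before treating pairs as related):

```java
import java.util.*;

// Count how often pairs of terms co-occur within the same document;
// frequently co-occurring pairs can seed a "related terms" map for
// semantic-style query expansion.
public class CooccurrenceCounter {
    static Map<String, Integer> countPairs(List<Set<String>> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> doc : docs) {
            List<String> terms = new ArrayList<>(doc);
            Collections.sort(terms);   // canonical pair order: a|b, not b|a
            for (int i = 0; i < terms.size(); i++)
                for (int j = i + 1; j < terms.size(); j++)
                    counts.merge(terms.get(i) + "|" + terms.get(j), 1, Integer::sum);
        }
        return counts;
    }
}
```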
Hi Furkan,
In the stock definition of the payload field:
http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/collection1/conf/schema.xml?view=markup
the analyzer for payloads field type is a WhitespaceTokenizerFactory
followed by a DelimitedPayloadTokenFilterFactory. So if you send
In our case, it is because all our other applications are deployed on
Tomcat and ops is familiar with the deployment process. We also had
customizations that needed to go in, so we inserted our custom JAR into the
solr.war's WEB-INF/lib directory, so to ops the process of deploying Solr
was
Hi Lisheng,
We did something similar in Solr using a custom handler (but I think you could
just build a custom QueryParser to do this), but you could do this in your
application as well, ie, get the language and then rewrite your query to use
the language specific fields. Come to think of it,
Hi ballusethuraman,
I am sure you have done this already, but just to be sure, did you reindex your
existing kilometer data after you changed the data type from string to long? If
not, then you should.
-sujit
On Mar 23, 2013, at 11:21 PM, ballusethuraman wrote:
Hi, I am having a
You could also do this outside Solr, in your client. If your query is
surrounded by quotes, then strip away the quotes and make
q=text_exact_field:your_unquoted_query. It is probably better to do this outside
Solr in general, keeping the upgrade path in mind.
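A sketch of that client-side rewrite (class name is made up; the field name text_exact_field comes from the suggestion above):

```java
// If the user query is fully quoted, strip the quotes and target the
// exact-match field instead; otherwise pass the query through unchanged.
public class ExactQueryRewriter {
    static String rewrite(String q) {
        if (q.length() >= 2 && q.startsWith("\"") && q.endsWith("\"")) {
            return "text_exact_field:" + q.substring(1, q.length() - 1);
        }
        return q;
    }
}
```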
-sujit
On Feb 21, 2013, at 12:20 PM, Van
/uima path.
Is it needed to deploy the new jar (RoomAnnotator.jar)? If yes, which branch
can I checkout? This is the Stable release I am running:
Solr 4.1.0 1434440 - sarowe - 2013-01-16 17:21:36
Regards, Bart
On 8 Feb 2013, at 22:11, SUJIT PAL wrote:
Hi Bart,
I did some work
it works perfectly.
Best regards, Bart
On 11 Feb 2013, at 20:13, SUJIT PAL wrote:
Hi Bart,
Like I said, I didn't actually hook my UIMA stuff into Solr, content and
queries are annotated before they reach Solr. What you describe sounds like
a classpath problem (but of course you already
Hi Siva,
You will probably get a better reply if you head over to the nutch mailing list
[http://nutch.apache.org/mailing_lists.html] and ask there.
Nutch 2.1 may be what you are looking for (stores pages in NoSQL database).
Regards,
Sujit
On Feb 10, 2013, at 9:16 PM, SivaKarthik wrote:
Hi Bart,
I did some work with UIMA but this was to annotate the data before it goes to
Lucene/Solr, i.e. not built as an UpdateRequestProcessor. I just looked through
the SolrUima wiki page [http://wiki.apache.org/solr/SolrUIMA] and I believe you
will have to set up your own aggregate analysis
Hi Christian,
Since customization is not a problem in your case, how about writing out the
userId and excluded document ids to the database when it is excluded, and then
for each query from the user (possibly identified by a userid parameter),
lookup the database by userid, construct a NOT
Hi,
We are using google translate to do something like what you (onlinespending)
want to do, so maybe it will help.
During indexing, we store the searchable fields from documents into fields
suffixed _en, _fr, _es, etc. So assuming we capture title and body from each
document, the fields are
Hi Srilatha,
One way to do this would be by making two calls, one to your sponsored list
where you pick two at random and a solr call where you pick all the search
results and then stick them together in your client.
Sujit
On Oct 4, 2012, at 12:39 AM, srilatha wrote:
For an E-commerce
Hi Alex,
I implemented something similar using the rules described in this page:
http://en.wikipedia.org/wiki/American_and_British_English_spelling_differences
The idea is to normalize the British spelling form to the American form during
indexing and query using a tokenizer that takes in a
Hi Samarendra,
This does look like a candidate for a custom query component if you want to do
this inside Solr. You can of course continue to do this at the client.
-sujit
On May 15, 2012, at 12:26 PM, Samarendra Pratap wrote:
Hi,
I need a suggestion for improving relevance of search
Hi Ian,
I believe you may be able to use a bunch of facet.query parameters, something
like this:
facet.query=yourfield:[NOW-1DAY TO NOW]
facet.query=yourfield:[NOW-2DAY TO NOW-1DAY]
...
and so on.
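Generating those parameters programmatically could look like this (class and method names made up; "yourfield" is the placeholder from the example above, and the ranges use Solr date math):

```java
import java.util.ArrayList;
import java.util.List;

// Build one facet.query parameter per trailing day, each covering a
// one-day window expressed in Solr date math.
public class DateFacetQueries {
    static List<String> dailyFacetQueries(String field, int days) {
        List<String> params = new ArrayList<>();
        for (int d = 1; d <= days; d++) {
            String from = "NOW-" + d + "DAY";
            String to = (d == 1) ? "NOW" : "NOW-" + (d - 1) + "DAY";
            params.add("facet.query=" + field + ":[" + from + " TO " + to + "]");
        }
        return params;
    }
}
```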
-sujit
On May 3, 2012, at 10:41 PM, Ian Holsman wrote:
Hi.
I would like to be able to do a
Hi Hoss,
Thanks for the pointers, and sorry, it was a bug in my code (was some dead code
which was alphabetizing the facet link text and also the parameters themselves
indirectly by reference).
I actually ended up building a servlet and a component to print out the
multi-valued parameters
ThreadLocal variable, thereby making it available
to your Solr component. It's kind of a hack but would work.
Sent from my phone
On Mar 17, 2012, at 6:53 PM, SUJIT PAL sujit@comcast.net wrote:
Thanks Pravesh,
Yes, converting the myparam to a single (comma-separated) field is probably the
best approach, but as I mentioned, this is probably a bit too late for this to
be practical in my case...
The myparam parameters are facet filter queries, and so far order did not
matter, since
Hello,
I have a custom component which depends on the ordering of a multi-valued
parameter. Unfortunately it looks like the values do not come back in the same
order as they were put in the URL. Here is some code to explain the behavior:
URL:
Hi Thomas,
With Java (from within a custom handler in Solr) you can get a handle to the
IndexSchema from the request, like so:
IndexSchema schema = req.getSchema();
SchemaField sf = schema.getField("fieldname");
boolean isMultiValued = sf.multiValued();
From within SolrJ code, you can use
Hi Tejinder,
I had this problem yesterday (believe it or not :-)), and the fix for us was to
make Tomcat UTF-8 compliant. In server.xml, there is a Connector tag; we
added the attribute URIEncoding="UTF-8" and restarted Tomcat. Not sure what
container you are using, if its Tomcat this will
.
But your problem space may differ.
Best
Erick
On Wed, Feb 1, 2012 at 6:55 PM, SUJIT PAL sujit@comcast.net wrote:
Hi Devon,
Have you considered using a permuterm index? It's workable, but depending
on your requirements (size of fields that you want to create the index
on), it may bloat your index. I've written about it here:
http://sujitpal.blogspot.com/2011/10/lucene-wildcard-query-and-permuterm.html
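The core of permuterm indexing, sketched (class name made up): append a terminator and index every rotation of the term, so a leading-wildcard query like *bar can be answered as the prefix query bar$*. This is also why the index bloats: one term becomes length+1 terms.

```java
import java.util.ArrayList;
import java.util.List;

// Generate all rotations of term + "$"; each rotation is indexed so
// wildcard queries can be rewritten as prefix queries.
public class Permuterm {
    static List<String> rotations(String term) {
        String t = term + "$";
        List<String> out = new ArrayList<>();
        for (int i = 0; i < t.length(); i++)
            out.add(t.substring(i) + t.substring(0, i));
        return out;
    }
}
```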
Hi Eugene,
I proposed a solution for something similar, maybe it will help you.
http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html
-sujit
On Sat, 2011-11-05 at 16:43 -0400, Eugene Strokin wrote:
Hello,
I have a task which seems trivial, but I couldn't find any
Hi Alireza,
Would this work? Sort the results by age desc, then loop through the
results as long as age == age[0].
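That loop, sketched over a list of ages already sorted descending (class name made up, plain Java in place of the Solr result iteration):

```java
import java.util.ArrayList;
import java.util.List;

// Results arrive sorted by age descending; keep rows while age equals
// the first row's age, i.e. all documents sharing the maximum age.
public class TopAgeFilter {
    static List<Integer> topAges(List<Integer> agesDesc) {
        List<Integer> out = new ArrayList<>();
        for (int age : agesDesc) {
            if (age != agesDesc.get(0)) break;
            out.add(age);
        }
        return out;
    }
}
```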
-sujit
On Tue, 2011-10-18 at 15:23 -0700, Otis Gospodnetic wrote:
Hi,
Are you just looking for:
age:target age
This will return all documents/records where age field is
If you use the CommonsHttpSolrServer from your client (not sure about
the other types, this is the one I use), you can pass the method as an
argument to its query() method, something like this:
QueryResponse rsp = server.query(params, METHOD.POST);
HTH
Sujit
On Fri, 2011-10-14 at 13:29 +,
be cached (see HTTP spec).
POST requests do not include the arguments in the log, which makes your HTTP
logs nearly useless for diagnosing problems.
wunder
Walter Underwood
On Oct 14, 2011, at 9:20 AM, Sujit Pal wrote:
Hi Mouli,
I was looking at the code here, not sure why you even need to do the
sort...
After you get the DocList, couldn't you do something like this?
List<Integer> topofferDocIds = new ArrayList<Integer>();
for (DocIterator it = ergebnis.iterator(); it.hasNext();) {
That would then return only results with top offer : true and then use
whatever shuffling / randomising you like in your application.
Alternately you could even add sorting on relevance to show the top 5
closest matches to the query: rows=5&sort=score desc
On 21/09/2011 21:26, Sujit Pal
I have a few blog posts on this...
http://sujitpal.blogspot.com/2011/04/custom-solr-search-components-2-dev.html
http://sujitpal.blogspot.com/2011/04/more-fun-with-solr-component.html
http://sujitpal.blogspot.com/2011/02/solr-custom-search-requesthandler.html
but it's quite simple, just look at
Sorry hit send too soon. Personally, given the use case, I think I would
still prefer the two query approach. It seems way too much work to do a
handler (unless you want to learn how to do it) to support this.
On Thu, 2011-09-22 at 12:31 -0700, Sujit Pal wrote:
I have a few blog posts
Hi MOuli,
AFAIK (and I don't know that much about Solr), this feature does not
exist out of the box in Solr. One way to achieve this could be to
construct a DocSet with topoffer:true and intersect it with your result
DocSet, then select the first 5 off the intersection, randomly shuffle
them,
Would it make sense to have a "Did you mean?" type of functionality, for
which you use the EdgeNGram and Metaphone filters /if/ you don't get
appropriate results for the user query?
So when the user types "cannon" and the application notices that there are
no cannons for sale in the index (0 results with
Hi Ron,
There was a discussion about this some time back, which I implemented
(with great success btw) in my own code...basically you store both the
analyzed and non-analyzed versions (use string type) in the index, then
send in a query like this:
+name:clarke name_s:clarke^100
The name field
FWIW, we have some custom classes on top of solr as well. The way we do
it is using the following ant target:
<target name="war" depends="jar" description="Rebuild Solr WAR with custom code">
<mkdir dir="${maven.webapps.output}"/>
<!-- we unwar a copy of the 3.2.0 war file in source repo -->
I have done this using a custom tokenfilter that (among other things)
detects hyphenated words and converts it to the 3 variations, using a
regex match on the incoming token:
(\w+)-(\w+)
that runs the following regex transform:
s/(\w+)-(\w+)/$1$2__$1 $2/
and then splits by __ and passes the
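The transform described above, sketched end to end (class name made up): apply the regex substitution, then split on __ to recover the joined form and the two-word form; together with the original token that gives the three variations.

```java
import java.util.Arrays;
import java.util.List;

// Apply s/(\w+)-(\w+)/$1$2__$1 $2/ and split on "__": a hyphenated
// token yields [joined form, two-word form]; anything else passes
// through as a single-element list.
public class HyphenVariants {
    static List<String> variants(String token) {
        String expanded = token.replaceAll("(\\w+)-(\\w+)", "$1$2__$1 $2");
        return Arrays.asList(expanded.split("__"));
    }
}
```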
Hi Sowmya,
I basically wrote an annotator and built a buffering tokenizer around it
so I could include it in a Lucene analyzer pipeline. I've blogged about
it, not sure if its good form to include links to blog posts in public
forums, but here they are, apologies in advance if this is wrong (let
This may or may not help you, we solved something similar based on
hyphenated words - essentially when we encountered a hyphenated word
(say word1-word2) we send in an OR query with the word (word1-word2)
itself, a phrase "word1 word2"~3 and the word formed by removing the
hyphen (word1word2).
But
Hi,
Sorry for the possible double post, I wrote this up but had the
incorrect sender address, so I am guessing that my previous one is going
to be rejected by the list moderation daemon.
I am trying to figure out options for the following problem. I am on
Solr 1.4.1 (Lucene 2.9.1).
I have
/solr-external-scoring/
On Thu, 2011-05-05 at 13:12 -0700, Ahmet Arslan wrote:
--- On Thu, 5/5/11, Sujit Pal sujit@comcast.net wrote:
From: Sujit Pal sujit@comcast.net
Subject: Custom sorting based on external (database) data
To: solr-user solr-user@lucene.apache.org
Date
Hi,
I am developing a SearchComponent that needs to build some initial
DocSets and then intersect with the result DocSet during each query (in
process()).
When the searcher is reopened, I need to regenerate the initial DocSets.
I am on Solr 1.4.1.
My question is, which method in
.
Would still appreciate knowing if there is a simpler way, or if I am
wildly off the mark.
Thanks
Sujit
On Thu, 2011-04-07 at 16:39 -0700, Sujit Pal wrote:
at 20:58 -0400, Erick Erickson wrote:
I haven't built one myself, but have you considered the Solr
UserCache?
See: http://wiki.apache.org/solr/SolrCaching#User.2BAC8-Generic_Caches
It even receives warmup signals I believe...
Best
Erick
On Thu, Apr 7, 2011 at 7:39 PM, Sujit Pal
this is not enough.
Another requirement is, when the access permission is changed, we need to
update
the field - my understanding is that we cannot, unless we re-index the whole document
again. Am I correct?
thanks,
canal
From: Sujit Pal sujit@comcast.net
Hello,
I am denormalizing a map of (string, float) into a single Lucene document
by storing it as "key1|score1 key2|score2 ...". In Solr, I pull this in
using the following analyzer definition.
<fieldtype name="payloads" stored="false" indexed="true" class="solr.TextField">
<analyzer>
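To make the denormalized format concrete, a sketch of parsing it back into a map on the application side (class name made up; at index time the DelimitedPayloadTokenFilter performs the same '|' split to attach scores as payloads):

```java
import java.util.HashMap;
import java.util.Map;

// Parse a "key1|score1 key2|score2" payload string back into a map:
// split on whitespace for entries, then on '|' for key vs. score.
public class PayloadParser {
    static Map<String, Float> parse(String payloads) {
        Map<String, Float> map = new HashMap<>();
        for (String pair : payloads.split("\\s+")) {
            String[] kv = pair.split("\\|");
            map.put(kv[0], Float.parseFloat(kv[1]));
        }
        return map;
    }
}
```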
How about assigning content types to documents in the index, and map
users to a set of content types they are allowed to access? That way you
will pass in fewer parameters in the fq.
-sujit
On Fri, 2011-03-11 at 11:53 +1100, Liam O'Boyle wrote:
Morning,
We use solr to index a range of
This could probably be done using a custom QParser plugin?
Define the pattern like this:
String queryTemplate = "title:%Q%^2.0 body:%Q%";
then replace the %Q% with the value of the Q param, send it through
QueryParser.parse() and return the query.
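The substitution step itself is trivial, sketched here (class name made up); inside the QParser plugin the expanded string would then go through QueryParser.parse():

```java
// Expand every %Q% placeholder in the template with the user query
// before handing the result to the query parser.
public class TemplateQuery {
    static String expand(String template, String q) {
        return template.replace("%Q%", q);
    }
}
```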
-sujit
On Wed, 2011-03-02 at 11:28 -0800, mrw
Yes, check out the field type payloads in the schema.xml file. If you
set up one or more of your fields as type payloads (you would use the
DelimitedPayloadTokenFilterFactory during indexing in your analyzer
chain), you can then use the PayloadTermQuery to query it with, scoring
can be done with a
Hi Derek,
The XML files you post to Solr needs to be in the correct Solr specific
XML format.
One way to preserve the original structure would be to flatten the
document into field names indicating the position of the text, for
example:
book_titleabbrev: Advancing Return on Investment Analysis
If the dictionary is a Lucene index, wouldn't it be as simple as a delete
using a term query? Something like this:
IndexReader sdreader = IndexReader.open(directory, false);
sdreader.deleteDocuments(new Term("word", "sherri"));
...
sdreader.close();
I am guessing your dictionary is built dynamically using
We are currently a Lucene shop, the way we do it (currently) is to have
these results come from a database table (where it is available in rank
order). We want to move to Solr, so what I plan on doing to replicate
this functionality is to write a custom request handler that will do the
database
Another option (assuming the case where a user can be granted access to
a certain class of documents, and more than one user would be able to
access certain documents) would be to store the access filter (as an OR
query of content types) in an external cache (perhaps a database or an
eternal cache