Re: Detect term occurrences

2015-09-11 Thread Sujit Pal
Hi Francisco, >> I have many drug product leaflets, each corresponding to one product. On the other hand, we have a medical dictionary with about 10^5 terms. I want to detect all the occurrences of those terms in any leaflet document. Take a look at SolrTextTagger for this use case.

Re: Solr query which return only those docs whose all tokens are from given list

2015-05-11 Thread Sujit Pal
Hi Naresh, Couldn't you just model this as an OR query, since your requirement is at least one (but can be more than one), i.e.: tags:T1 tags:T2 tags:T3 -sujit On Mon, May 11, 2015 at 4:14 AM, Naresh Yadav nyadav@gmail.com wrote: Hi all, Also asked this here :

Re: Proximity Search

2015-04-30 Thread Sujit Pal
Hi Vijay, I haven't tried this myself, but perhaps you could build the two phrases as PhraseQueries and connect them up with a SpanQuery? Something like this (using your original example): PhraseQuery p1 = new PhraseQuery(); for (String word : "this is phrase 1".split(" ")) { p1.add(new
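
A minimal sketch of the span-based idea, assuming pre-5.3 Lucene APIs; since SpanNearQuery clauses must themselves be SpanQuerys, each phrase is modeled here as an in-order, zero-slop SpanNearQuery of SpanTermQuerys rather than a PhraseQuery, and the field name "text" and slop of 10 are illustrative:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;

    public class TwoPhraseProximity {
      // build one phrase as an exact-order, zero-slop span query
      static SpanQuery phrase(String field, String phrase) {
        String[] words = phrase.split(" ");
        SpanQuery[] clauses = new SpanQuery[words.length];
        for (int i = 0; i < words.length; i++) {
          clauses[i] = new SpanTermQuery(new Term(field, words[i]));
        }
        return new SpanNearQuery(clauses, 0, true);
      }

      public static void main(String[] args) {
        SpanQuery p1 = phrase("text", "this is phrase 1");
        SpanQuery p2 = phrase("text", "this is phrase 2");
        // the two phrases must appear within 10 positions of each other, in any order
        SpanQuery combined = new SpanNearQuery(new SpanQuery[] {p1, p2}, 10, false);
        System.out.println(combined);
      }
    }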

Re: Enrich search results with external data

2015-04-17 Thread Sujit Pal
about adding another Facet Component that will be executed after the standard FacetComponent. Let me know if you think we should consider other options. Thanks, -Ha -Original Message- From: sujitatgt...@gmail.com [mailto:sujitatgt...@gmail.com] On Behalf Of Sujit Pal Sent

Re: Enrich search results with external data

2015-04-11 Thread Sujit Pal
Hi Ha, I am the author of the blog post you mention. To your question, I don't know if the code will work without change (since the Lucene/Solr API has evolved so much over the last few years), but a preferred way using Function Queries may be found in the slides for Timothy Potter's talk

Re: Get the new terms of fields since last update

2014-12-05 Thread Sujit Pal
Hi Ludovic, A bit late to the party, sorry, but here is a bit of a riff off Eric's idea. Why not store the previous terms in a Bloom filter, and once you get the terms from this week, check whether they are already in the filter; the ones that are not are your new terms. Once you find them, add them to the Bloom filter. Bloom filters are
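
A minimal sketch of the Bloom filter check, using Guava's BloomFilter (the library choice and sizing parameters are assumptions, not from the thread); note that a Bloom filter can report false positives, so a genuinely new term may occasionally be missed:

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;
    import java.util.List;
    import com.google.common.hash.BloomFilter;
    import com.google.common.hash.Funnels;

    public class NewTermDetector {
      public static void main(String[] args) {
        // expected number of terms and false-positive rate are illustrative
        BloomFilter<String> seenTerms = BloomFilter.create(
            Funnels.stringFunnel(StandardCharsets.UTF_8), 1000000, 0.001);
        seenTerms.put("solr");  // terms already seen in previous weeks
        List<String> thisWeek = Arrays.asList("solr", "lucene");
        for (String term : thisWeek) {
          if (!seenTerms.mightContain(term)) {  // definitely not seen before
            System.out.println("new term: " + term);
            seenTerms.put(term);
          }
        }
      }
    }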

Re: What's the most efficient way to sort by number of terms matched?

2014-11-06 Thread Sujit Pal
Hi Trey, In an application I built a few years ago, I had a component that rewrote the input query into a Lucene BooleanQuery, and we would set the minimumNumberShouldMatch value for the query. It worked well, but lately we are trying to move away from writing our own custom components since
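
A minimal sketch of the rewrite-to-BooleanQuery idea, assuming the pre-5.x Lucene BooleanQuery API; the field name and terms are made up:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.TermQuery;

    public class MinShouldMatchExample {
      public static void main(String[] args) {
        BooleanQuery bq = new BooleanQuery();
        for (String term : new String[] {"red", "shiny", "bicycle"}) {
          bq.add(new TermQuery(new Term("body", term)), Occur.SHOULD);
        }
        bq.setMinimumNumberShouldMatch(2);  // at least 2 of the 3 terms must match
        System.out.println(bq);
      }
    }

Without custom code, the (e)dismax mm parameter offers comparable minimum-should-match control on the Solr side.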

Re: Query on Facet

2014-07-30 Thread Sujit Pal
Hi Smitha, Have you looked at facet queries? They allow you to attach Solr queries to facets. The problem with this is that you will need to know all possible combinations of language and binding (or make an initial query to find this information).
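
A rough SolrJ illustration of attaching facet queries for language/binding combinations (the specific values are invented for the example):

    import org.apache.solr.client.solrj.SolrQuery;

    public class FacetQueryCombos {
      public static void main(String[] args) {
        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        // one facet.query per language/binding combination you care about
        q.addFacetQuery("language:English AND binding:Hardcover");
        q.addFacetQuery("language:English AND binding:Paperback");
        q.addFacetQuery("language:French AND binding:Hardcover");
        System.out.println(q);
      }
    }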

Re: Implementing custom analyzer for multi-language stemming

2014-07-30 Thread Sujit Pal
Hi Eugene, In a system we built a couple of years ago, we had a corpus of mixed English and French (with Spanish on the way, but that was implemented by the client after we handed off). We had different fields for each language. So (title, body) for English docs was (title_en, body_en), for French

Re: Any Solrj API to obtain field list?

2014-05-27 Thread Sujit Pal
Have you looked at IndexSchema? That would offer you methods to query index metadata using SolrJ. http://lucene.apache.org/solr/4_7_2/solr-core/org/apache/solr/schema/IndexSchema.html -sujit On Tue, May 27, 2014 at 1:56 PM, T. Kuro Kurosaka k...@healthline.com wrote: I'd like to write Solr

Re: How to apply Semantic Search in Solr

2014-03-11 Thread Sujit Pal
about, seems difficult and time-consuming for students like me, as I will have to submit this in the next 15 days. Please suggest me something. On Tue, Mar 11, 2014 at 5:12 AM, Sujit Pal sujit@comcast.net wrote: Hi Sohan, You would be the best person to answer your question of how

Re: How to apply Semantic Search in Solr

2014-03-10 Thread Sujit Pal
Sujit and all for your views about semantic search in Solr. But how do I proceed, i.e., how do I start things off to get on track? On Sat, Mar 8, 2014 at 10:50 PM, Sujit Pal sujit@comcast.net wrote: Thanks for sharing this link Sohan, it's an interesting approach. Since you

Re: How to apply Semantic Search in Solr

2014-03-08 Thread Sujit Pal
Thanks for sharing this link Sohan, it's an interesting approach. Since you have effectively defined what you mean by Semantic Search, there are a couple of other approaches I know of to do something like this: 1) preprocess your documents looking for terms that co-occur in the same document. The more

Re: Multivalued true Error?

2013-11-26 Thread Sujit Pal
Hi Furkan, In the stock definition of the payload field: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/collection1/conf/schema.xml?view=markup the analyzer for the payloads field type is a WhitespaceTokenizerFactory followed by a DelimitedPayloadTokenFilterFactory. So if you send

Re: Why do people want to deploy to Tomcat?

2013-11-12 Thread Sujit Pal
In our case, it is because all our other applications are deployed on Tomcat and ops is familiar with the deployment process. We also had customizations that needed to go in, so we inserted our custom JAR into the solr.war's WEB-INF/lib directory, so to ops the process of deploying Solr was

Re: Solr language-dependent sort

2013-04-08 Thread SUJIT PAL
Hi Lisheng, We did something similar in Solr using a custom handler (but I think you could just build a custom QueryParser to do this), but you could do this in your application as well, i.e., get the language and then rewrite your query to use the language-specific fields. Come to think of it,

Re: Solr Sorting is not working properly on long Fields

2013-03-24 Thread SUJIT PAL
Hi ballusethuraman, I am sure you have done this already, but just to be sure, did you reindex your existing kilometer data after you changed the data type from string to long? If not, then you should. -sujit On Mar 23, 2013, at 11:21 PM, ballusethuraman wrote: Hi, I am having a

Re: Matching an exact word

2013-02-21 Thread SUJIT PAL
You could also do this outside Solr, in your client. If your query is surrounded by quotes, then strip away the quotes and make q=text_exact_field:your_unquoted_query. Probably better to do this outside Solr in general, keeping the upgrade path in mind. -sujit On Feb 21, 2013, at 12:20 PM, Van

Re: Can Solr analyze content and find dates and places

2013-02-11 Thread SUJIT PAL
/uima path. Do I need to deploy the new jar (RoomAnnotator.jar)? If yes, which branch can I check out? This is the stable release I am running: Solr 4.1.0 1434440 - sarowe - 2013-01-16 17:21:36 Regards, Bart On 8 Feb 2013, at 22:11, SUJIT PAL wrote: Hi Bart, I did some work

Re: Can Solr analyze content and find dates and places

2013-02-11 Thread SUJIT PAL
it works perfectly. Best regards, Bart On 11 Feb 2013, at 20:13, SUJIT PAL wrote: Hi Bart, Like I said, I didn't actually hook my UIMA stuff into Solr; content and queries are annotated before they reach Solr. What you describe sounds like a classpath problem (but of course you already

Re: Crawl Anywhere -

2013-02-10 Thread SUJIT PAL
Hi Siva, You will probably get a better reply if you head over to the nutch mailing list [http://nutch.apache.org/mailing_lists.html] and ask there. Nutch 2.1 may be what you are looking for (stores pages in NoSQL database). Regards, Sujit On Feb 10, 2013, at 9:16 PM, SivaKarthik wrote:

Re: Can Solr analyze content and find dates and places

2013-02-08 Thread SUJIT PAL
Hi Bart, I did some work with UIMA, but this was to annotate the data before it goes to Lucene/Solr, i.e., not built as an UpdateRequestProcessor. I just looked through the SolrUIMA wiki page [http://wiki.apache.org/solr/SolrUIMA] and I believe you will have to set up your own aggregate analysis

Re: Per user document exclusions

2012-11-19 Thread SUJIT PAL
Hi Christian, Since customization is not a problem in your case, how about writing out the userId and excluded document ids to the database when it is excluded, and then for each query from the user (possibly identified by a userid parameter), lookup the database by userid, construct a NOT
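
A rough SolrJ sketch of the lookup-then-exclude idea; fetchExcludedDocIds is a hypothetical stand-in for the database lookup, and "id" is assumed to be the unique key field:

    import java.util.Arrays;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrQuery;

    public class PerUserExclusions {
      static SolrQuery buildQuery(String userQuery, String userId) {
        SolrQuery q = new SolrQuery(userQuery);
        List<String> excluded = fetchExcludedDocIds(userId);  // database lookup by userId
        if (!excluded.isEmpty()) {
          // negative filter query drops the excluded ids from this user's results
          q.addFilterQuery("-id:(" + String.join(" OR ", excluded) + ")");
        }
        return q;
      }

      // hypothetical stand-in for the real database call
      static List<String> fetchExcludedDocIds(String userId) {
        return Arrays.asList("doc1", "doc42");
      }
    }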

Re: Query foreign language synonyms / words of equivalent meaning?

2012-10-10 Thread SUJIT PAL
Hi, We are using Google Translate to do something like what you (onlinespending) want to do, so maybe it will help. During indexing, we store the searchable fields from documents into fields suffixed with _en, _fr, _es, etc. So assuming we capture title and body from each document, the fields are

Re: How to make SOLR manipulate the results?

2012-10-04 Thread SUJIT PAL
Hi Srilatha, One way to do this would be to make two calls: one to your sponsored list, where you pick two at random, and a Solr call for the search results, then stick them together in your client. Sujit On Oct 4, 2012, at 12:39 AM, srilatha wrote: For an E-commerce

Re: Synonym file for American-British words

2012-08-07 Thread SUJIT PAL
Hi Alex, I implemented something similar using the rules described in this page: http://en.wikipedia.org/wiki/American_and_British_English_spelling_differences The idea is to normalize the British spelling form to the American form during indexing and query using a tokenizer that takes in a
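
A minimal sketch of such a normalizing token filter, assuming Lucene 3.x/4.x token stream APIs; the two rules shown are toy examples, and the real mapping derived from the Wikipedia page is much larger:

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public final class BritishToAmericanFilter extends TokenFilter {
      private final CharTermAttribute termAttr = addAttribute(CharTermAttribute.class);

      public BritishToAmericanFilter(TokenStream input) {
        super(input);
      }

      @Override
      public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
          return false;
        }
        String term = termAttr.toString();
        // toy rules only: colour -> color, normalise -> normalize
        String normalized = term.replaceAll("our$", "or").replaceAll("ise$", "ize");
        termAttr.setEmpty().append(normalized);
        return true;
      }
    }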

Re: First query to find meta data, second to search. How to group into one?

2012-05-15 Thread SUJIT PAL
Hi Samarendra, This does look like a candidate for a custom query component if you want to do this inside Solr. You can of course continue to do this at the client. -sujit On May 15, 2012, at 12:26 PM, Samarendra Pratap wrote: Hi, I need a suggestion for improving relevance of search

Re: Faceting on a date field multiple times

2012-05-04 Thread SUJIT PAL
Hi Ian, I believe you may be able to use a bunch of facet.query parameters, something like this: facet.query=yourfield:[NOW-1DAY TO NOW] facet.query=yourfield:[NOW-2DAY TO NOW-1DAY] ... and so on. -sujit On May 3, 2012, at 10:41 PM, Ian Holsman wrote: Hi. I would like to be able to do a
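
The same idea in SolrJ, for anyone building the request programmatically ("yourfield" is the placeholder from the reply; the bucket boundaries are illustrative):

    import org.apache.solr.client.solrj.SolrQuery;

    public class DateBucketFacets {
      public static void main(String[] args) {
        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        q.addFacetQuery("yourfield:[NOW-1DAY TO NOW]");
        q.addFacetQuery("yourfield:[NOW-2DAY TO NOW-1DAY]");
        q.addFacetQuery("yourfield:[NOW-7DAY TO NOW-2DAY]");
        System.out.println(q);
      }
    }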

Re: Any way to get reference to original request object from within Solr component?

2012-03-20 Thread SUJIT PAL
Hi Hoss, Thanks for the pointers, and sorry, it was a bug in my code (there was some dead code that was alphabetizing the facet link text, and also the parameters themselves indirectly by reference). I actually ended up building a servlet and a component to print out the multi-valued parameters

Re: Any way to get reference to original request object from within Solr component?

2012-03-18 Thread SUJIT PAL
ThreadLocal variable, thereby making it available to your Solr component. It's kind of a hack but would work. Sent from my phone On Mar 17, 2012, at 6:53 PM, SUJIT PAL sujit@comcast.net wrote: Thanks Pravesh, Yes, converting the myparam to a single (comma-separated) field is probably

Re: Any way to get reference to original request object from within Solr component?

2012-03-17 Thread SUJIT PAL
Thanks Pravesh, Yes, converting the myparam to a single (comma-separated) field is probably the best approach, but as I mentioned, this is probably too late to be practical in my case... The myparam parameters are facet filter queries, and so far order did not matter, since

Any way to get reference to original request object from within Solr component?

2012-03-16 Thread SUJIT PAL
Hello, I have a custom component which depends on the ordering of a multi-valued parameter. Unfortunately it looks like the values do not come back in the same order as they were put in the URL. Here is some code to explain the behavior: URL:

Re: How to check if a field is a multivalue field with java

2012-02-22 Thread SUJIT PAL
Hi Thomas, With Java (from within a custom handler in Solr) you can get a handle to the IndexSchema from the request, like so: IndexSchema schema = req.getSchema(); SchemaField sf = schema.getField("fieldname"); boolean isMultiValued = sf.multiValued(); From within SolrJ code, you can use

Re: How to make search with special characters in keywords

2012-02-01 Thread SUJIT PAL
Hi Tejinder, I had this problem yesterday (believe it or not :-)), and the fix for us was to make Tomcat UTF-8 compliant. In server.xml there is a Connector tag; we added the attribute URIEncoding="UTF-8" and restarted Tomcat. Not sure what container you are using, but if it's Tomcat this will

Re: How to make search with special characters in keywords

2012-02-01 Thread SUJIT PAL
. But your problem space may differ. Best Erick On Wed, Feb 1, 2012 at 6:55 PM, SUJIT PAL sujit@comcast.net wrote: Hi Tejinder, I had this problem yesterday (believe it or not :-)), and the fix for us was to make Tomcat UTF-8 compliant. In server.xml, there is a Connector tag

Re: Solr, SQL Server's LIKE

2011-12-29 Thread Sujit Pal
Hi Devon, Have you considered using a permuterm index? It's workable, but depending on your requirements (the size of the fields that you want to create the index on), it may bloat your index. I've written about it here: http://sujitpal.blogspot.com/2011/10/lucene-wildcard-query-and-permuterm.html

Re: Dynamic rating based on Like feature

2011-11-05 Thread Sujit Pal
Hi Eugene, I proposed a solution for something similar, maybe it will help you. http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html -sujit On Sat, 2011-11-05 at 16:43 -0400, Eugene Strokin wrote: Hello, I have a task which seems trivial, but I couldn't find any

Re: Find Documents with field = maxValue

2011-10-18 Thread Sujit Pal
Hi Alireza, Would this work? Sort the results by age desc, then loop through the results as long as age == age[0]. -sujit On Tue, 2011-10-18 at 15:23 -0700, Otis Gospodnetic wrote: Hi, Are you just looking for: age:target age This will return all documents/records where age field is

Re: SolrJ + Post

2011-10-14 Thread Sujit Pal
If you use the CommonsHttpSolrServer from your client (not sure about the other types, this is the one I use), you can pass the method as an argument to its query() method, something like this: QueryResponse rsp = server.query(params, METHOD.POST); HTH Sujit On Fri, 2011-10-14 at 13:29 +,
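
A slightly fuller sketch of the same call, assuming SolrJ 3.x-era classes and a hypothetical local Solr URL:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrRequest.METHOD;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class PostQueryExample {
      public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery params = new SolrQuery("title:solr");
        QueryResponse rsp = server.query(params, METHOD.POST);  // sent as POST, not GET
        System.out.println(rsp.getResults().getNumFound());
      }
    }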

Re: SolrJ + Post

2011-10-14 Thread Sujit Pal
be cached (see HTTP spec). POST requests do not include the arguments in the log, which makes your HTTP logs nearly useless for diagnosing problems. wunder Walter Underwood On Oct 14, 2011, at 9:20 AM, Sujit Pal wrote: If you use the CommonsHttpSolrServer from your client (not sure about

Re: Sort five random Top Offers to the top

2011-10-03 Thread Sujit Pal
Hi Mouli, I was looking at the code here, not sure why you even need to do the sort... After you get the DocList, couldn't you do something like this? List<Integer> topofferDocIds = new ArrayList<Integer>(); for (DocIterator it = ergebnis.iterator(); it.hasNext();) {

Re: Sort five random Top Offers to the top

2011-09-22 Thread Sujit Pal
That would then return only results with topoffer:true and then use whatever shuffling / randomising you like in your application. Alternately you could even add sorting on relevance to show the top 5 closest matches to the query rows=5&sort=score desc On 21/09/2011 21:26, Sujit Pal

Re: Sort five random Top Offers to the top

2011-09-22 Thread Sujit Pal
I have a few blog posts on this... http://sujitpal.blogspot.com/2011/04/custom-solr-search-components-2-dev.html http://sujitpal.blogspot.com/2011/04/more-fun-with-solr-component.html http://sujitpal.blogspot.com/2011/02/solr-custom-search-requesthandler.html but it's quite simple, just look at

Re: Sort five random Top Offers to the top

2011-09-22 Thread Sujit Pal
Sorry hit send too soon. Personally, given the use case, I think I would still prefer the two query approach. It seems way too much work to do a handler (unless you want to learn how to do it) to support this. On Thu, 2011-09-22 at 12:31 -0700, Sujit Pal wrote: I have a few blog posts

Re: Sort five random Top Offers to the top

2011-09-21 Thread Sujit Pal
Hi MOuli, AFAIK (and I don't know that much about Solr), this feature does not exist out of the box in Solr. One way to achieve this could be to construct a DocSet with topoffer:true and intersect it with your result DocSet, then select the first 5 off the intersection, randomly shuffle them,
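
A rough sketch of that intersection inside a custom SearchComponent, assuming Solr 1.4/3.x-era DocSet APIs; apart from the topoffer field from this thread, the names are illustrative:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.TermQuery;
    import org.apache.solr.search.DocIterator;
    import org.apache.solr.search.DocSet;
    import org.apache.solr.search.SolrIndexSearcher;

    public class TopOfferPicker {
      // pick up to 5 random doc ids that are both in the results and flagged topoffer:true
      static List<Integer> pickRandomTopOffers(SolrIndexSearcher searcher, DocSet results)
          throws IOException {
        DocSet topOffers = searcher.getDocSet(new TermQuery(new Term("topoffer", "true")));
        DocSet intersection = topOffers.intersection(results);
        List<Integer> ids = new ArrayList<Integer>();
        for (DocIterator it = intersection.iterator(); it.hasNext();) {
          ids.add(it.nextDoc());
        }
        Collections.shuffle(ids);
        return ids.subList(0, Math.min(5, ids.size()));
      }
    }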

Re: Too many results in dismax queries with one word

2011-08-21 Thread Sujit Pal
Would it make sense to have a Did you mean? type of functionality, for which you use the EdgeNGram and Metaphone filters /if/ you don't get appropriate results for the user query? So when the user types cannon and the application notices that there are no cannons for sale in the index (0 results with

Re: Exact matching on names?

2011-08-16 Thread Sujit Pal
Hi Ron, There was a discussion about this some time back, which I implemented (with great success btw) in my own code...basically you store both the analyzed and non-analyzed versions (use string type) in the index, then send in a query like this: +name:clarke name_s:clarke^100 The name field

Re: Problems generating war distribution using ant

2011-08-16 Thread Sujit Pal
FWIW, we have some custom classes on top of Solr as well. The way we do it is using the following ant target: <target name="war" depends="jar" description="Rebuild Solr WAR with custom code"> <mkdir dir="${maven.webapps.output}"/> <!-- we unwar a copy of the 3.2.0 war file in source repo -->

Re: Strip special chars like -

2011-08-09 Thread Sujit Pal
I have done this using a custom token filter that (among other things) detects hyphenated words and converts them to the 3 variations, using a regex match on the incoming token: (\w+)-(\w+), which runs the following regex transform: s/(\w+)-(\w+)/$1$2__$1 $2/ and then splits by __ and passes the
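
A minimal illustration of just the regex transform described above (the surrounding token filter machinery is omitted):

    public class HyphenVariants {
      public static void main(String[] args) {
        String token = "word1-word2";
        String transformed = token.replaceAll("(\\w+)-(\\w+)", "$1$2__$1 $2");
        String[] variants = transformed.split("__");
        for (String v : variants) {
          System.out.println(v);  // prints "word1word2" and then "word1 word2"
        }
      }
    }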

Re: (Solr-UIMA) Doubt regarding integrating UIMA in to solr - Configuration.

2011-07-08 Thread Sujit Pal
Hi Sowmya, I basically wrote an annotator and built a buffering tokenizer around it so I could include it in a Lucene analyzer pipeline. I've blogged about it; not sure if it's good form to include links to blog posts in public forums, but here they are, apologies in advance if this is wrong (let

Re: Results with and without whitspace(soccer club and soccerclub)

2011-05-20 Thread Sujit Pal
This may or may not help you; we solved something similar for hyphenated words - essentially when we encountered a hyphenated word (say word1-word2) we sent in an OR query with the word (word1-word2) itself, a phrase "word1 word2"~3, and the word formed by removing the hyphen (word1word2). But

Custom sorting based on external (database) data

2011-05-05 Thread Sujit Pal
Hi, Sorry for the possible double post, I wrote this up but had the incorrect sender address, so I am guessing that my previous one is going to be rejected by the list moderation daemon. I am trying to figure out options for the following problem. I am on Solr 1.4.1 (Lucene 2.9.1). I have

Re: Custom sorting based on external (database) data

2011-05-05 Thread Sujit Pal
/solr-external-scoring/ On Thu, 2011-05-05 at 13:12 -0700, Ahmet Arslan wrote: --- On Thu, 5/5/11, Sujit Pal sujit@comcast.net wrote: From: Sujit Pal sujit@comcast.net Subject: Custom sorting based on external (database) data To: solr-user solr-user@lucene.apache.org Date

Hook to do stuff when searcher is reopened?

2011-04-07 Thread Sujit Pal
Hi, I am developing a SearchComponent that needs to build some initial DocSets and then intersect with the result DocSet during each query (in process()). When the searcher is reopened, I need to regenerate the initial DocSets. I am on Solr 1.4.1. My question is, which method in
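
One possible approach, sketched under the assumption of Solr 1.4-era APIs and not necessarily what the thread settled on: skip an explicit reopen hook and instead rebuild the cached DocSet lazily whenever process() sees a different SolrIndexSearcher instance (obtained via rb.req.getSearcher()); the field and value here are placeholders:

    import java.io.IOException;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.TermQuery;
    import org.apache.solr.search.DocSet;
    import org.apache.solr.search.SolrIndexSearcher;

    public class CachedDocSetHolder {
      private SolrIndexSearcher lastSearcher;  // searcher the cached DocSet was built against
      private DocSet cached;

      // call from your component's process(rb), passing rb.req.getSearcher()
      public synchronized DocSet get(SolrIndexSearcher searcher) throws IOException {
        if (searcher != lastSearcher) {
          // searcher was reopened since the last call, so rebuild the DocSet
          cached = searcher.getDocSet(new TermQuery(new Term("somefield", "somevalue")));
          lastSearcher = searcher;
        }
        return cached;
      }
    }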

Re: Hook to do stuff when searcher is reopened?

2011-04-07 Thread Sujit Pal
. Would still appreciate knowing if there is a simpler way, or if I am wildly off the mark. Thanks Sujit On Thu, 2011-04-07 at 16:39 -0700, Sujit Pal wrote: Hi, I am developing a SearchComponent that needs to build some initial DocSets and then intersect with the result DocSet during each query

Re: Hook to do stuff when searcher is reopened?

2011-04-07 Thread Sujit Pal
at 20:58 -0400, Erick Erickson wrote: I haven't built one myself, but have you considered the Solr UserCache? See: http://wiki.apache.org/solr/SolrCaching#User.2BAC8-Generic_Caches It even receives warmup signals I believe... Best Erick On Thu, Apr 7, 2011 at 7:39 PM, Sujit Pal

Re: Solr and Permissions

2011-03-11 Thread Sujit Pal
this is not enough. Another requirement is, when the access permission is changed, we need to update the field - my understanding is we cannot, unless we re-index the whole document again. Am I correct? thanks, canal From: Sujit Pal sujit@comcast.net

Any way to do payload queries in Luke?

2011-03-11 Thread Sujit Pal
Hello, I am denormalizing a map of String to Float into a single Lucene document by storing it as key1|score1 key2|score2 In Solr, I pull this in using the following analyzer definition: <fieldtype name="payloads" stored="false" indexed="true" class="solr.TextField"> <analyzer

Re: Solr and Permissions

2011-03-10 Thread Sujit Pal
How about assigning content types to documents in the index, and mapping users to a set of content types they are allowed to access? That way you will pass in fewer parameters in the fq. -sujit On Fri, 2011-03-11 at 11:53 +1100, Liam O'Boyle wrote: Morning, We use solr to index a range of
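
A small SolrJ sketch of that idea; the content_type field name and the source of the allowed types are assumptions:

    import org.apache.solr.client.solrj.SolrQuery;

    public class ContentTypeFilter {
      static SolrQuery buildQuery(String userQuery, String[] allowedContentTypes) {
        SolrQuery q = new SolrQuery(userQuery);
        StringBuilder fq = new StringBuilder("content_type:(");
        for (int i = 0; i < allowedContentTypes.length; i++) {
          if (i > 0) {
            fq.append(" OR ");
          }
          fq.append(allowedContentTypes[i]);
        }
        fq.append(")");
        q.addFilterQuery(fq.toString());  // single fq listing the types this user may see
        return q;
      }
    }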

Re: Understanding multi-field queries with q and fq

2011-03-02 Thread Sujit Pal
This could probably be done using a custom QParser plugin? Define the pattern like this: String queryTemplate = "title:%Q%^2.0 body:%Q%"; then replace the %Q% with the value of the Q param, send it through QueryParser.parse() and return the query. -sujit On Wed, 2011-03-02 at 11:28 -0800, mrw
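
A rough sketch of such a QParserPlugin, assuming Solr 3.x-era APIs (package names and the ParseException signature changed in later versions); the template string is the one from the post, everything else is illustrative:

    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.util.Version;
    import org.apache.solr.common.params.SolrParams;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.search.QParser;
    import org.apache.solr.search.QParserPlugin;

    public class TemplateQParserPlugin extends QParserPlugin {
      @Override
      public void init(NamedList args) {
      }

      @Override
      public QParser createParser(String qstr, SolrParams localParams, SolrParams params,
          SolrQueryRequest req) {
        return new QParser(qstr, localParams, params, req) {
          @Override
          public Query parse() throws ParseException {
            String template = "title:%Q%^2.0 body:%Q%";
            String expanded = template.replace("%Q%", getString());
            // parse the expanded string with the schema's query analyzer
            QueryParser qp = new QueryParser(Version.LUCENE_CURRENT, "body",
                req.getSchema().getQueryAnalyzer());
            return qp.parse(expanded);
          }
        };
      }
    }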

Re: Solr Payloads retrieval

2011-02-28 Thread Sujit Pal
Yes, check out the field type payloads in the schema.xml file. If you set up one or more of your fields as type payloads (you would use the DelimitedPayloadTokenFilterFactory during indexing in your analyzer chain), you can then use the PayloadTermQuery to query it with; scoring can be done with a
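
A minimal sketch of querying such a field, assuming Lucene 2.9/3.x-era payload classes; a custom Similarity overriding scorePayload() (for example, decoding the float with PayloadHelper.decodeFloat()) would normally accompany this:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.payloads.AveragePayloadFunction;
    import org.apache.lucene.search.payloads.PayloadTermQuery;

    public class PayloadQueryExample {
      public static void main(String[] args) {
        // third argument keeps the normal span score in addition to the payload score
        Query q = new PayloadTermQuery(new Term("payloads", "key1"),
            new AveragePayloadFunction(), true);
        System.out.println(q);
      }
    }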

Re: loading XML docbook files into solr

2011-02-26 Thread Sujit Pal
Hi Derek, The XML files you post to Solr need to be in the correct Solr-specific XML format. One way to preserve the original structure would be to flatten the document into field names indicating the position of the text, for example: book_titleabbrev: Advancing Return on Investment Analysis

Re: manually editing spellcheck dictionary

2011-02-25 Thread Sujit Pal
If the dictionary is a Lucene index, wouldn't it be as simple as deleting by a term query? Something like this: IndexReader sdreader = IndexReader.open(dir, false); sdreader.deleteDocuments(new Term("word", "sherri")); ... sdreader.close(); I am guessing your dictionary is built dynamically using

Re: boosting results by a query?

2011-02-11 Thread Sujit Pal
We are currently a Lucene shop; the way we do it is to have these results come from a database table (where they are available in rank order). We want to move to Solr, so what I plan on doing to replicate this functionality is to write a custom request handler that will do the database

Re: Architecture decisions with Solr

2011-02-09 Thread Sujit Pal
Another option (assuming the case where a user can be granted access to a certain class of documents, and more than one user would be able to access certain documents) would be to store the access filter (as an OR query of content types) in an external cache (perhaps a database or an external cache