Re: less search results in prod

2011-12-03 Thread Jayendra Patil
Enable debugQuery and compare the queries evaluated in the development and production environments. Regards, Jayendra On Sun, Dec 4, 2011 at 5:18 AM, alx...@aim.com wrote: Hello, I have built solr-3.4.0 data folder in dev server and copied it to prod server. Made a search for a keyword,

Re: How to change the port of post.jar

2011-11-08 Thread Jayendra Patil
You can pass the full URL to post.jar as an argument. Example - java -Durl=http://localhost:8080/solr/update -jar post.jar Regards, Jayendra On Wed, Nov 9, 2011 at 2:37 AM, 刘浪 liu.l...@eisoo.com wrote: Hi, I want to use post.jar to delete index. But my port is 8080. It is 8983 by default.

Re: question about Field Collapsing/ grouping

2011-09-14 Thread Jayendra Patil
. Regards Ahsan - Original Message - From: Jayendra Patil jayendra.patil@gmail.com To: solr-user@lucene.apache.org; Ahson Iqbal mianah...@yahoo.com Cc: Sent: Tuesday, September 13, 2011 10:55 AM Subject: Re: question about Field Collapsing/ grouping The time we implemented

Re: question about Field Collapsing/ grouping

2011-09-13 Thread Jayendra Patil
yup .. seems the group count feature is included now, as mentioned by Klein. Regards, Jayendra On Tue, Sep 13, 2011 at 8:27 AM, O. Klein kl...@octoweb.nl wrote: Isn't that what the parameter group.ngroups=true is for? -- View this message in context:

Re: question about Field Collapsing/ grouping

2011-09-12 Thread Jayendra Patil
At the time we implemented the feature, there was no straightforward solution. What we did was facet on the group-by field and count the facets. This gives you the distinct count of the groups. You may also want to check the patch @ https://issues.apache.org/jira/browse/SOLR-2242,
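The facet-counting workaround described above could be sketched roughly as follows. This is only an illustration in Python, not the code from the thread: the field name `category`, the base URL and the sample response are made up, while `facet`, `facet.field`, `facet.limit`, `facet.mincount` and `rows` are standard Solr request parameters. It assumes the default JSON layout, where facet counts come back as a flat list alternating value and count.

```python
from urllib.parse import urlencode


def distinct_group_count_url(base_url, query, group_field):
    """Build a select URL that facets on the group-by field and returns no documents."""
    params = {
        "q": query,
        "rows": 0,             # only the facet counts are needed
        "facet": "true",
        "facet.field": group_field,
        "facet.limit": -1,     # return every facet value, not just the top ones
        "facet.mincount": 1,   # skip values that match no documents
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


def count_groups(response, group_field):
    """Count the distinct facet values in a parsed Solr JSON response."""
    flat = response["facet_counts"]["facet_fields"][group_field]
    # flat looks like [value1, count1, value2, count2, ...]
    return len(flat) // 2


# Hypothetical response, for illustration only.
sample = {"facet_counts": {"facet_fields": {"category": ["books", 12, "music", 7, "film", 3]}}}
url = distinct_group_count_url("http://localhost:8983/solr/select", "ipod", "category")
groups = count_groups(sample, "category")
```

On versions that support it, group.ngroups=true (mentioned elsewhere in this thread) gives the same number without the extra facet pass.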

Re: Accessing a doc field while working at entity level

2011-09-06 Thread Jayendra Patil
You should be able to do it using ${feed-source.last-update}. You can find examples and an explanation @ http://wiki.apache.org/solr/DataImportHandler Regards, Jayendra On Mon, Sep 5, 2011 at 8:02 AM, penela pen...@gmail.com wrote: Hi! This might probably be a stupid question, but I can't find

Re: Search the contents of given URL in Solr.

2011-08-30 Thread Jayendra Patil
For indexing the webpages, you can use Nutch with Solr, which would do the scraping and indexing of the page. For finding similar documents/pages you can use http://wiki.apache.org/solr/MoreLikeThis, by querying the above document (by id or search terms) and it would return similar documents from

Re: How to get all the terms in a document as Luke does?

2011-08-30 Thread Jayendra Patil
you might want to check - http://wiki.apache.org/solr/TermVectorComponent Should provide you with the term vectors with a lot of additional info. Regards, Jayendra On Tue, Aug 30, 2011 at 3:34 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote: Hello, This time I'm trying to duplicate Luke's

Re: Upload doc and pdf in Solr 3.3.0

2011-08-25 Thread Jayendra Patil
http://wiki.apache.org/solr/ExtractingRequestHandler may help. Regards, Jayendra On Thu, Aug 25, 2011 at 3:24 AM, Moinsn felix.wieg...@googlemail.com wrote: Good Morning, I have to set up a Solr System to seek in documents like pdf and doc. My Solr System is running in the meantime, but i

Re: Issue in indexing Zip file content with apache-solr-3.3.0

2011-08-23 Thread Jayendra Patil
Solr doesn't index the content of the files, but just the file names. You can apply the patches - https://issues.apache.org/jira/browse/SOLR-2416 https://issues.apache.org/jira/browse/SOLR-2332 Regards, Jayendra On Tue, Aug 23, 2011 at 2:26 AM, Jagdish Kumar jagdish.thapar...@hotmail.com wrote: Hi

Re: How to start troubleshooting a content extraction issue

2011-08-11 Thread Jayendra Patil
You can test the standalone content extraction with the tika-app.jar - Command to output in text format - java -jar tika-app-0.8.jar --text file_path For more options java -jar tika-app-0.8.jar --help Use the correct tika-app version jar matching the Solr build. Regards, Jayendra On Wed, Aug

Re: Possible bug in FastVectorHighlighter

2011-08-09 Thread Jayendra Patil
Try using - <str name="hl.tag.pre"><![CDATA[<b>]]></str> <str name="hl.tag.post"><![CDATA[</b>]]></str> Regards, Jayendra On Tue, Aug 9, 2011 at 4:46 AM, Massimo Schiavon mschia...@volunia.com wrote: In my Solr (3.3) configuration I specified these two params: <str name="hl.simple.pre"><![CDATA[<b>]]></str>

Re: Is there anyway to sort differently for facet values?

2011-08-05 Thread Jayendra Patil
You can give facet.sort a try. We had such a requirement for sorting facets in an order determined by another field and had to resort to a very crude way to get through it. We prepended the facet values with the order in which they had to be displayed ... and used facet.sort to sort
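The prefixing trick above might look something like this. It is a rough Python sketch under assumptions: the labels, the order numbers and the `_` separator are invented for illustration, and it assumes the values are sorted lexicographically, as with facet.sort=index. Zero-padding matters, otherwise "10_..." would sort before "2_...".

```python
def encode_facet_value(order, label, width=3):
    """Prefix a label with a zero-padded display order, e.g. 2 -> '002_Medium'."""
    return f"{order:0{width}d}_{label}"


def decode_facet_value(value):
    """Strip the order prefix before showing the label to the user."""
    return value.split("_", 1)[1]


# Hypothetical display order that differs from alphabetical order.
display_order = {"Small": 1, "Medium": 2, "Large": 3, "Extra Large": 10}
indexed = [encode_facet_value(order, label) for label, order in display_order.items()]

# Plain lexicographic sort, standing in for what an index-order facet sort does.
labels = [decode_facet_value(value) for value in sorted(indexed)]
```

The client strips the prefix again before display, so users only ever see the plain labels.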

Re: ' invisible ' words

2011-07-14 Thread Jayendra Patil
Strange .. the only other difference that I see is the different configurations for the word delimiter filter, with the catenateWords and catenateNumbers @ index and query, but it should not impact normal word searches. As others suggested, you may just want to use the same chain for both Index

Re: ' invisible ' words

2011-07-13 Thread Jayendra Patil
Hi Denis, The order of the filters at index time and query time is different, e.g. the synonyms filter. Do you have a custom synonyms text file which may be causing the issues? It usually works fine if you have the same filter order at index and query time. You can try it out. Regards,

Re: Master Slave help

2011-06-06 Thread Jayendra Patil
Do you mean the replication happens every time you restart the server? If so, you would need to modify the events on which you want the replication to happen. Check for the replicateAfter tag and remove the startup option, if you don't need it. <requestHandler name="/replication"

Re: Hitting the URI limit, how to get around this?

2011-06-02 Thread Jayendra Patil
Just a suggestion ... If the shards are known, you can add them as default params in the request handler so they are always added, and the URL would just need the qt parameter. The limit on URI length is browser dependent. How are you querying Solr .. any client API? Through a browser? is

Re: Extracting contents of zipped files with Tika and Solr 1.4.1 (now Solr 3.1)

2011-05-20 Thread Jayendra Patil
on this thread - if you manage to test the patches before me, let me know how you get on. Thanks and kind regards, Gary. On 11/04/2011 05:02, Jayendra Patil wrote: The migration of Tika to the latest 0.8 version seems to have reintroduced the issue. I was able to get this working again

Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-04-10 Thread Jayendra Patil
, Gary. On 25/01/2011 16:48, Jayendra Patil wrote: Hi Gary, The latest Solr Trunk was able to extract and index the contents of the zip file using the ExtractingRequestHandler. The snapshot of Trunk we worked upon had the Tika 0.8 snapshot jars and worked pretty well. Tested again

Re: Solrcore.properties

2011-03-28 Thread Jayendra Patil
Can you please attach the other files? It doesn't seem to find the enable.master property, so you may want to check that the properties file exists on the box having issues. We have the following configuration in the core :- Core - - solrconfig.xml - Master Slave

Re: Solr - multivalue fields - please help

2011-03-23 Thread Jayendra Patil
Just a suggestion .. You can try using dynamic fields by appending the company name (or ID) as prefix ... e.g. For data - Employee ID Employer FromDate ToDate 21345 IBM 01/01/04 01/01/06 MS 01/01/07 01/01/08 BT 01/01/09 Present Index data as :- Employee ID - 21345 Employer Name - IBM MS BT

Re: Solr coding

2011-03-23 Thread Jayendra Patil
Why not just add an extra field to the document in the index for the user, so you can easily filter the results on the user field and show only the documents submitted by the user? Regards, Jayendra On Wed, Mar 23, 2011 at 9:20 AM, satya swaroop satya.yada...@gmail.com wrote: Hi All,

Re: Solr coding

2011-03-23 Thread Jayendra Patil
In that case, you may want to store the groups that have access to the document as a multivalued field. A filter query on the user group should then filter the results as you expect. You may also check Apache ManifoldCF as suggested by Szott. Regards, Jayendra On Wed, Mar 23, 2011 at 9:46

Re: Logic operator with dismax

2011-03-21 Thread Jayendra Patil
Dismax does not support boolean queries; you may try using Extended Dismax for the boolean support. https://issues.apache.org/jira/browse/SOLR-1553 Regards, Jayendra On Mon, Mar 21, 2011 at 8:24 AM, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: Hello, The Dismax search

Re: SOLR DIH importing MySQL text column as a BLOB

2011-03-16 Thread Jayendra Patil
Hi Kaushik, If the field is being treated as a blob, you can try using the FieldStreamDataSource mapping. It handles blob objects and extracts their contents. This feature is available only after Solr 3.1, I suppose.

Re: docBoost

2011-03-09 Thread Jayendra Patil
You can use the ScriptTransformer to perform the boost calculation and addition. http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer <dataConfig> <script><![CDATA[ function f1(row) { // Add boost row.put('$docBoost',1.5);

Re: Same index is ranking differently on 2 machines

2011-03-09 Thread Jayendra Patil
queryNorm is just a normalizing factor and is the same value across all the results for a query; it is there just to make the scores comparable. So even if it varies between environments, you should not be worried about it.
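A small worked example may make the point clearer. In Lucene's classic default similarity, queryNorm is computed as 1/sqrt(sumOfSquaredWeights), one value per query. The Python sketch below uses made-up raw scores and made-up weight sums; it only shows that multiplying every score by the same factor cannot change the order of the results.

```python
import math


def query_norm(sum_of_squared_weights):
    """queryNorm as in Lucene's classic default similarity: 1 / sqrt(sumOfSquaredWeights)."""
    return 1.0 / math.sqrt(sum_of_squared_weights)


def ranked(raw_scores, norm):
    """Scale every raw score by the same norm and return doc ids, best first."""
    normalized = {doc: score * norm for doc, score in raw_scores.items()}
    return sorted(normalized, key=normalized.get, reverse=True)


# Hypothetical raw (un-normalized) scores for one query.
raw = {"doc1": 4.0, "doc2": 2.5, "doc3": 1.0}

dev_order = ranked(raw, query_norm(16.0))   # norm = 0.25
prod_order = ranked(raw, query_norm(25.0))  # norm = 0.2
```

If the ranking itself differs between two machines, the cause is more likely in the other parts of the debugQuery explain output (tf, idf, fieldNorm) than in queryNorm.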

Re: Same index is ranking differently on 2 machines

2011-03-09 Thread Jayendra Patil
versus 7 is dramatic for my client). This must be down to the scoring debug differences - it's the only difference I can find :( On Mar 9, 2011, at 4:34 PM, Jayendra Patil wrote: queryNorm is just a normalizing factor and is the same value across all the results for a query, to just make

Solr Cell DataImport Tika handler broken - fails to index Zip file contents

2011-03-07 Thread Jayendra Patil
Working with the latest Solr trunk code, it seems the Tika handlers for Solr Cell (ExtractingDocumentLoader.java) and Data Import handler (TikaEntityProcessor.java) fail to index the zip file contents again. They just index the file names again. This issue was addressed some time back, late last

Re: logical relation among filter queries

2011-03-07 Thread Jayendra Patil
you can use the boolean operators in the filter query. e.g. fq=rating:(PG-13 OR R) Regards, Jayendra On Mon, Mar 7, 2011 at 9:25 PM, cyang2010 ysxsu...@hotmail.com wrote: I wonder what is the logical relation among filter queries.  I can't find much documentation on filter query. for
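As a rough illustration of the fq syntax above, the Python sketch below builds an OR-ed filter on one field and URL-encodes it. The helper name and the q value are made up; fq itself is the standard Solr parameter. Note that separate fq parameters on one request are intersected (each one narrows the result further), so an OR has to live inside a single fq.

```python
from urllib.parse import urlencode, parse_qs


def filter_any_of(field, values):
    """Build one filter query matching any of the values, e.g. rating:(PG-13 OR R)."""
    # Quote values containing whitespace so they stay a single term.
    terms = [f'"{value}"' if " " in value else value for value in values]
    return f"{field}:({' OR '.join(terms)})"


fq = filter_any_of("rating", ["PG-13", "R"])
# A list of pairs keeps parameter order and allows repeating fq if needed.
query_string = urlencode([("q", "*:*"), ("fq", fq)])
```

Encoding the value matters in practice because the spaces, colon and parentheses are not safe to send raw in a URL.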

Re: adding a document using curl

2011-03-03 Thread Jayendra Patil
If you are using the ExtractingRequestHandler, you can also try using the stream.file or stream.url. e.g. curl "http://localhost:8080/solr/core0/update/extract?stream.file=C:/777045.zip&literal.id=777045&literal.title=Test&commit=true" More detailed explanation @
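For anyone assembling such a URL in code rather than by hand, here is a small Python sketch. The helper name is hypothetical; the parameters (stream.file, literal.*, commit) and the sample values are the ones from the curl example. Each parameter is joined with `&`, which is easy to lose when the URL is pasted unquoted into a shell.

```python
from urllib.parse import urlencode


def extract_url(base_url, file_path, literals, commit=True):
    """Build an /update/extract URL that points Solr at a file it can read locally."""
    params = [("stream.file", file_path)]
    # literal.<field> sets a field value on the resulting document.
    params += [(f"literal.{name}", value) for name, value in literals.items()]
    if commit:
        params.append(("commit", "true"))
    # Keep ':' and '/' readable in the file path; everything else is escaped.
    return base_url + "?" + urlencode(params, safe="/:")


url = extract_url(
    "http://localhost:8080/solr/core0/update/extract",
    "C:/777045.zip",
    {"id": "777045", "title": "Test"},
)
```

When calling curl from a shell, wrap the whole URL in quotes so the `&` characters are not treated as shell operators.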

Re: solr different sizes on master and slave

2011-03-02 Thread Jayendra Patil
Hi Mike, There was an issue with the Snappuller wherein it fails to clean up the old index directories on the slave side. https://issues.apache.org/jira/browse/SOLR-2156 The patch can be applied to fix the issue. You can also delete the old index directories, except for the current one which is

Re: Groupped results

2011-03-02 Thread Jayendra Patil
Hi Rok, If I understood the use case correctly, grouping of the results is possible in Solr: http://wiki.apache.org/solr/FieldCollapsing Probably, you can create new fields with the combination for the groups and use the field collapsing feature to group the results. Id Type1Type2Title

Re: solr score issue

2011-02-25 Thread Jayendra Patil
Check the "Need help in understanding output of searcher.explain() function" thread. http://mail-archives.apache.org/mod_mbox/lucene-java-user/201008.mbox/%3CAANLkTi=m9a1guhrahpeyqaxhu9gta9fjbnr7-8-zi...@mail.gmail.com%3E Regards, Jayendra On Fri, Feb 25, 2011 at 6:57 AM, Bagesh Sharma

Re: query slop issue

2011-02-24 Thread Jayendra Patil
qs is only the amount of slop on phrase queries explicitly specified in the q for qf fields. So qs would come into the picture only if the search q is the quoted phrase "water treatment plant". Slop is the maximum allowable positional distance between terms for them to be considered a match, and distance is

Re: Problem in full query searching

2011-02-24 Thread Jayendra Patil
With the dismax or extended dismax parser you should be able to achieve this. Dismax :- qf, qs, pf, ps should help you to have exact control over the fields and boosts. Extended Dismax :- In addition to qf, qs, pf, ps, you have pf2 and pf3 for the two- and three-word shingles. As Grijesh mentioned,

Re: Index MS office

2011-02-02 Thread Jayendra Patil
http://wiki.apache.org/solr/ExtractingRequestHandler Regards, Jayendra On Wed, Feb 2, 2011 at 10:49 AM, Thumuluri, Sai sai.thumul...@verizonwireless.com wrote: Good Morning,  I am planning to get started on indexing MS office using ApacheSolr - can someone please direct me where I should

Re: configure httpclient to access solr with user credential on third party host

2011-01-27 Thread Jayendra Patil
This should help HttpClient client = new HttpClient(); client.getParams().setAuthenticationPreemptive(true); AuthScope scope = new AuthScope(AuthScope.ANY_HOST,AuthScope.ANY_PORT); client.getState().setCredentials(scope, new UsernamePasswordCredentials(user, password)); Regards, Jayendra
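For readers not on Java, the same idea (sending the credentials with the very first request instead of waiting for a 401 challenge) can be sketched with the Python standard library. This is only an analogue of the HttpClient snippet above, not a translation of it; the URL, user name and password are placeholders.

```python
import base64
import urllib.request


def preemptive_basic_auth_request(url, user, password):
    """Create a request that already carries the HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token)
    return request


# Placeholder endpoint and credentials; nothing is sent until urlopen() is called.
req = preemptive_basic_auth_request("http://localhost:8080/solr/select?q=*:*", "user", "secret")
auth_header = req.get_header("Authorization")
```

Basic authentication only base64-encodes the credentials, so it is normally paired with HTTPS.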

Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-25 Thread Jayendra Patil
Hi Gary, The latest Solr Trunk was able to extract and index the contents of the zip file using the ExtractingRequestHandler. The snapshot of Trunk we worked upon had the Tika 0.8 snapshot jars and worked pretty well. Tested again with sample url and works fine - curl

Re: StopFilterFactory and qf containing some fields that use it and some that do not

2011-01-12 Thread Jayendra Patil
Have used edismax and stopword filters as well. But usually use the fq parameter e.g. fq=title:the life and never had any issues. Can you turn on debugQuery and check what query is formed for all the combinations you mentioned? Regards, Jayendra On Wed, Jan 12, 2011 at 5:19 PM, Dyer,

Re: solr wildcard queries and analyzers

2011-01-12 Thread Jayendra Patil
Had the same issues with international characters and wildcard searches. One workaround we implemented was to index the field with and without the ASCIIFoldingFilterFactory. You would have an original field and one with the English equivalent to be used during searching. Wildcard searches with
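To illustrate the two-field idea, here is a small Python sketch. It is an approximation only: the folding below uses Unicode decomposition and drops combining marks, which covers common accented Latin letters but is not the same mapping table as ASCIIFoldingFilterFactory, and the field names `title` and `title_folded` are invented. In a real setup the folded copy would normally come from a copyField into a field type that includes the folding filter.

```python
import unicodedata


def fold_to_ascii(text):
    """Approximate ASCII folding: decompose characters, then drop the accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_document(doc_id, title):
    """Keep the original value for display and a folded copy for searching."""
    return {"id": doc_id, "title": title, "title_folded": fold_to_ascii(title)}


doc = build_document("1", "Café Zürich")
# The user's wildcard term is folded the same way before querying the folded field.
wildcard_term = fold_to_ascii("zür") + "*"
```

Folding the query term on the client side is the important half, since wildcard terms are typically not run through the field's analysis chain.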

Re: Can't find source or jar for Solr class JaspellTernarySearchTrie

2011-01-12 Thread Jayendra Patil
Checkout and build the code from - https://svn.apache.org/repos/asf/lucene/dev/trunk/ Class - https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/jaspell/JaspellTernarySearchTrie.java

Re: Failover setup (is this a bad idea)

2010-11-30 Thread Jayendra Patil
Rather have a master and multiple slaves, with the master being used only for writes and the slaves for reads. Master to slave replication is easily configurable. Two Solr instances sharing the same index, with both writing to it, is not at all a good idea. Regards, Jayendra On

Re: Extracting and indexing content from multiple binary files into a single Solr document

2010-11-17 Thread Jayendra Patil
The way we implemented the same scenario is zipping all the attachments into a single zip file which can be passed to the ExtractingRequestHandler for indexing and included as a part of a single Solr document. Regards, Jayendra On Wed, Nov 17, 2010 at 6:27 AM, Gary Taylor g...@inovem.com wrote:
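The zipping step described above could look roughly like this in Python. The file names and contents are placeholders, and the sketch stops at producing the zip bytes; posting them to /update/extract is left out, and it only works on builds where zip contents are actually extracted (see the zip-related threads in this list).

```python
import io
import zipfile


def zip_attachments(attachments):
    """Bundle several attachments (name -> bytes) into one in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in attachments.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# Placeholder attachments belonging to one logical document.
payload = zip_attachments({
    "report.txt": b"quarterly report",
    "notes.txt": b"meeting notes",
})
names = zipfile.ZipFile(io.BytesIO(payload)).namelist()
```

The resulting bytes can then be sent as a single content stream, so all attachments end up in one Solr document.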

basic authentication for schema.url

2010-11-16 Thread Jayendra Patil
We intend to use schema.url for indexing documents. However, the remote URLs are secured and would need basic authentication to be able to access the document. The implementation with stream.file would mean downloading the files and would cause duplication, whereas stream.body would have indexing

Re: basic authentication for schema.url

2010-11-16 Thread Jayendra Patil
I meant stream.url Regards, Jayendra On Tue, Nov 16, 2010 at 5:37 PM, Jayendra Patil jayendra.patil@gmail.com wrote: We intend to use schema.url for indexing documents. However, the remote urls are secured and would need basic authentication to be able access the document

Re: Multiple Word Facets

2010-10-27 Thread Jayendra Patil
The ShingleFilter breaks the words in a sentence into combinations of 2 or 3 words. For a faceting field you should use :- <field name="facet_field" type="string" indexed="true" stored="true" multiValued="true"/> The type of the field should be string so that it is not tokenized at all. On Wed, Oct 27,
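To see why a shingled field produces unexpected facet values, here is a small Python sketch of what shingling does to a sentence. It is a simplification: the real ShingleFilter works on the analyzed token stream and by default also emits the single words, which this sketch leaves out. The sample sentence is arbitrary.

```python
def shingles(text, min_size=2, max_size=3):
    """Return all runs of min_size..max_size consecutive words, in token order."""
    words = text.split()
    result = []
    for start in range(len(words)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(words):
                result.append(" ".join(words[start:start + size]))
    return result


tokens = shingles("please divide this sentence")
```

Each of these shingles would show up as its own facet value, whereas a string field keeps the whole value as a single facet entry.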

Re: after the slave node pull index from master, when will solr del the tmp index dir

2010-10-27 Thread Jayendra Patil
We faced the same issue. If you are executing a complete clean build, the slave copies the complete index and just switches the pointer in index.properties to point to the new index directory, leaving behind the old copies. And it does not clean them up. Had logged a JIRA and patch to

Re: Solr ExtractingRequestHandler with Compressed files

2010-10-25 Thread Jayendra Patil
There was this issue with the previous version of Solr, wherein only the file names from the zip used to get indexed. We had faced the same issue and ended up using the Solr trunk which has the Tika version upgraded and works fine. The Solr version 1.4.1 should also have the fix included. Try

Re: Solr sorting problem

2010-10-21 Thread Jayendra Patil
Need additional information. Sorting is easy in Solr, just by passing the sort parameter. However, when it comes to text sorting, it depends on how you analyse and tokenize your fields. Sorting does not work on fields with multiple tokens.

Re: /update/extract

2010-08-21 Thread Jayendra Patil
The Extract Request Handler invokes the classes from the extraction package. https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/extraction/src/main/java/org/apache/solr/handler/extraction/ExtractingRequestHandler.java This is packaged into the apache-solr-cell jar. Regards, Jayendra

Re: How to compile nightly build?

2010-08-13 Thread Jayendra Patil
yup, The Nightly build you pointed out has pre-built code and does the include the lucene and module dependencies needed for compilation. In case you want to compile from the source You can check the code from the location @ https://svn.apache.org/repos/asf/lucene/dev/trunk/solr There are

Re: diacritics on query string

2010-08-13 Thread Jayendra Patil
ASCIIFoldingFilter is probably the filter known to replace accented chars with their plain equivalents. However, I don't see that in your config. You can easily debug the issue through the Solr analysis tool. Regards, Jayendra On Fri, Aug 13, 2010 at 3:20 AM, Andrea Gazzarini

Re: Hierarchical faceting

2010-08-12 Thread Jayendra Patil
We were able to get hierarchical faceting working with a workaround. e.g. if you have Europe//Norway//Oslo as an entry 1. Create a new multivalued field with string type <field name="country_facet" type="string" indexed="true" stored="true" multiValued="true"/> 2. Index the field for

Re: edismax pf2 and ps

2010-08-12 Thread Jayendra Patil
We pretty much had the same issue, ended up customizing the ExtendedDismax code. In your case it's just a change of a single line addShingledPhraseQueries(query, normalClauses, phraseFields2, 2, tiebreaker, pslop); to addShingledPhraseQueries(query, normalClauses, phraseFields2,

Re: PDF file

2010-08-10 Thread Jayendra Patil
Try ... curl "http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?stream.file=Full_Path_of_File/pub2009001.pdf&literal.id=777045&commit=true" stream.file - specify full path literal.extra params - specify any extra params if needed Regards, Jayendra On Tue, Aug 10, 2010 at 4:49 PM, Ma,

Re: Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread jayendra patil
Have got Solr working in Eclipse and deployed on Tomcat through the Eclipse plugin. The crude approach was to 1. Import the Solr war into Eclipse, which will be imported as a web project and can be deployed on Tomcat. 2. Add multiple source folders to the Project, linked to the checked

Re: Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread jayendra patil
The solr home is configured in the web.xml of the application, which points to the folder having the conf files and the data directory <env-entry> <env-entry-name>solr/home</env-entry-name> <env-entry-value>D:/multicore</env-entry-value> <env-entry-type>java.lang.String</env-entry-type>

Re: Solrj ContentStreamUpdateRequest Slow

2010-08-04 Thread jayendra patil
ContentStreamUpdateRequest seems to read the file contents and transfer it over http, which slows down the indexing. Try Using StreamingUpdateSolrServer with stream.file param @ http://wiki.apache.org/solr/SolrPerformanceFactors#Embedded_vs_HTTP_Post e.g. SolrServer server = new

Re: query about qf defaults

2010-08-03 Thread jayendra patil
You can use appends for any additional fq parameters, which would be appended to the ones passed @ query time. Check out the sample solrconfig.xml shipped with Solr. <!-- In addition to defaults, appends params can be specified to identify values which should be appended to the list of

QueryUtils API Change - Custom ExtendedDismaxQParserPlugin accessing QueryUtils.makeQueryable throws java.lang.IllegalAccessError

2010-08-02 Thread jayendra patil
We have a custom implementation of ExtendedDismaxQParserPlugin, which we bundle into a jar and have it exposed in the multicore shared lib. The custom ExtendedDismaxQParserPlugin implementation still uses QueryUtils makeQueryable method, same as the ExtendedDismaxQParserPlugin implementation.

Document Boost with Solr Extraction - SolrContentHandler

2010-07-30 Thread jayendra patil
We are using the Solr Extract Handler for indexing document metadata with attachments (/update/extract). However, the SolrContentHandler doesn't seem to support the index-time document boost attribute. Probably document.setDocumentBoost(Float.parseFloat(boost)) is missing. Regards, Jayendra