Enable debugQuery and compare the queries evaluated in the development
and production environments.
Regards,
Jayendra
On Sun, Dec 4, 2011 at 5:18 AM, alx...@aim.com wrote:
Hello,
I have built the solr-3.4.0 data folder on the dev server and copied it to the prod
server. Made a search for a keyword,
You can pass the full url to post.jar as an argument.
example -
java -Durl=http://localhost:8080/solr/update -jar post.jar
Regards,
Jayendra
On Wed, Nov 9, 2011 at 2:37 AM, 刘浪 liu.l...@eisoo.com wrote:
Hi,
I want to use post.jar to delete the index. But my port is 8080; the default
is 8983.
Regards
Ahsan
- Original Message -
From: Jayendra Patil jayendra.patil@gmail.com
To: solr-user@lucene.apache.org; Ahson Iqbal mianah...@yahoo.com
Cc:
Sent: Tuesday, September 13, 2011 10:55 AM
Subject: Re: question about Field Collapsing/ grouping
yup .. seems the group count feature is included now, as mentioned by Klein.
Regards,
Jayendra
On Tue, Sep 13, 2011 at 8:27 AM, O. Klein kl...@octoweb.nl wrote:
Isn't that what the parameter group.ngroups=true is for?
The time we implemented the feature, there was no straightforward solution.
What we did was facet on the grouped-by field and count the facets.
This would give you the distinct count for the groups.
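For reference, the two approaches translate into the following request parameters (the parameter names are standard Solr; the field name `category` is made up for illustration). A small Python sketch building both query strings:

```python
from urllib.parse import urlencode

# Built-in approach (newer Solr): ask grouping for the distinct group count.
grouped = urlencode({
    "q": "*:*",
    "group": "true",
    "group.field": "category",   # hypothetical grouping field
    "group.ngroups": "true",     # returns the number of distinct groups
})

# Older workaround: facet on the same field with rows=0; the number of
# facet buckets returned is the distinct group count.
faceted = urlencode({
    "q": "*:*",
    "rows": "0",
    "facet": "true",
    "facet.field": "category",
    "facet.limit": "-1",         # don't truncate the bucket list
})

print(grouped)
print(faceted)
```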
You may also want to check the Patch @
https://issues.apache.org/jira/browse/SOLR-2242,
you should be able to do it using ${feed-source.last-update}
You can find examples and an explanation @
http://wiki.apache.org/solr/DataImportHandler
Regards,
Jayendra
On Mon, Sep 5, 2011 at 8:02 AM, penela pen...@gmail.com wrote:
Hi!
This is probably a stupid question, but I can't find
For indexing the webpages, you can use Nutch with Solr, which would do
the scraping and indexing of the pages.
For finding similar documents/pages you can use
http://wiki.apache.org/solr/MoreLikeThis, by querying the above
document (by id or search terms) and it would return similar documents
from
you might want to check - http://wiki.apache.org/solr/TermVectorComponent
Should provide you with the term vectors with a lot of additional info.
Regards,
Jayendra
On Tue, Aug 30, 2011 at 3:34 AM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:
Hello,
This time I'm trying to duplicate Luke's
http://wiki.apache.org/solr/ExtractingRequestHandler may help.
Regards,
Jayendra
On Thu, Aug 25, 2011 at 3:24 AM, Moinsn felix.wieg...@googlemail.com wrote:
Good Morning,
I have to set up a Solr System to seek in documents like pdf and doc. My
Solr System is running in the meantime, but i
Solr doesn't index the content of the files, but just the file names.
You can apply the patches -
https://issues.apache.org/jira/browse/SOLR-2416
https://issues.apache.org/jira/browse/SOLR-2332
Regards,
Jayendra
On Tue, Aug 23, 2011 at 2:26 AM, Jagdish Kumar
jagdish.thapar...@hotmail.com wrote:
Hi
You can test the standalone content extraction with the tika-app.jar -
Command to output in text format -
java -jar tika-app-0.8.jar --text file_path
For more options: java -jar tika-app-0.8.jar --help
Use the correct tika-app version jar matching the Solr build.
Regards,
Jayendra
On Wed, Aug
Try using -
<str name="hl.tag.pre"><![CDATA[<b>]]></str>
<str name="hl.tag.post"><![CDATA[</b>]]></str>
Regards,
Jayendra
On Tue, Aug 9, 2011 at 4:46 AM, Massimo Schiavon mschia...@volunia.com wrote:
In my Solr (3.3) configuration I specified these two params:
<str name="hl.simple.pre"><![CDATA[<b>]]></str>
You can give it a try with facet.sort.
We had such a requirement for sorting facets in an order determined by
another field and had to resort to a very crude way to get through it.
We pre-pended the facet values with the order in which they had to be
displayed ... and used facet.sort to sort
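A minimal Python sketch of that prefix workaround, assuming a zero-padded numeric prefix and a "|" separator (both conventions made up for illustration):

```python
# Facet values are indexed with a zero-padded display-order prefix so that
# Solr's lexical facet.sort (index order) yields the desired order;
# the prefix is stripped again before display.
def to_indexed(value, order):
    return f"{order:02d}|{value}"

def to_display(indexed_value):
    return indexed_value.split("|", 1)[1]

facets = [to_indexed("Clearance", 1), to_indexed("New Arrivals", 0), to_indexed("Sale", 2)]
ordered = sorted(facets)                  # what the lexical facet sort does
display = [to_display(v) for v in ordered]
print(display)                            # ['New Arrivals', 'Clearance', 'Sale']
```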
Strange .. the only other difference that I see is the different
configurations for the word delimiter filter, with catenateWords
and catenateNumbers @ index and query, but it should not impact normal
word searches.
As others suggested, you may just want to use the same chain for both
Index
Hi Denis,
The order of the filters during index time and query time is different,
e.g. the synonyms filter.
Do you have a custom synonyms text file which may be causing the issues?
It usually works fine if you have the same filter order during Index
and Query time. You can try it out.
Regards,
Do you mean the replication happens every time you restart the server ?
If so, you would need to modify the events on which you want the replication to happen.
Check for the replicateAfter tag and remove the startup option, if you
don't need it.
<requestHandler name="/replication"
just a suggestion ...
If the shards are known, you can add them as default params in the
requestHandler so they are always added, and the URL would just have
the qt parameter, as the limit for the URI is browser dependent.
How are you querying Solr .. any client API ?? through a browser ??
on this thread - if you manage to test the patches before me, let me know
how you get on.
Thanks and kind regards,
Gary.
On 11/04/2011 05:02, Jayendra Patil wrote:
The migration of Tika to the latest 0.8 version seems to have
reintroduced the issue.
I was able to get this working again
Gary.
On 25/01/2011 16:48, Jayendra Patil wrote:
Hi Gary,
The latest Solr Trunk was able to extract and index the contents of the
zip
file using the ExtractingRequestHandler.
The snapshot of Trunk we worked upon had the Tika 0.8 snapshot jars and
worked pretty well.
Tested again
Can you please attach the other files?
It doesn't seem to find the enable.master property, so you may want to
check that the properties file exists on the box having the issues.
We have the following configuration in the core :-
Core -
- solrconfig.xml - Master Slave
Just a suggestion ..
You can try using dynamic fields by appending the company name (or ID)
as prefix ... e.g.
For data -
Employee ID   Employer   FromDate   ToDate
21345         IBM        01/01/04   01/01/06
              MS         01/01/07   01/01/08
              BT         01/01/09   Present
Index data as :-
Employee ID - 21345
Employer Name - IBM MS BT
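A rough Python sketch of building such a document; the `from_*`/`to_*` dynamic field naming is hypothetical and would need matching dynamicField declarations in schema.xml:

```python
# One Solr document per employee; per-employer dates go into dynamic
# fields keyed by the employer name (the prefix pattern is made up).
employments = [
    ("IBM", "2004-01-01", "2006-01-01"),
    ("MS",  "2007-01-01", "2008-01-01"),
    ("BT",  "2009-01-01", None),          # "Present": no end date
]

doc = {"employee_id": "21345",
       "employer": [name for name, _, _ in employments]}
for name, start, end in employments:
    doc[f"from_{name}"] = start
    if end is not None:
        doc[f"to_{name}"] = end

print(doc)
```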
Why not just add an extra field to the document in the Index for the
user, so you can easily filter out the results on the user field and
show only the documents submitted by the User.
Regards,
Jayendra
On Wed, Mar 23, 2011 at 9:20 AM, satya swaroop satya.yada...@gmail.com wrote:
Hi All,
In that case, you may want to store the groups that would have access
to the document as a multivalued field.
A filter query on the user group should then have the results filtered as
you expect.
You may also check Apache ManifoldCF, as suggested by Szott.
Regards,
Jayendra
On Wed, Mar 23, 2011 at 9:46
Dismax does not support boolean queries; you may try using Extended
Dismax for the boolean support.
https://issues.apache.org/jira/browse/SOLR-1553
Regards,
Jayendra
On Mon, Mar 21, 2011 at 8:24 AM, Savvas-Andreas Moysidis
savvas.andreas.moysi...@googlemail.com wrote:
Hello,
The Dismax search
Hi Kaushik,
If the field is being treated as a blob, you can try using the
FieldStreamDataSource mapping.
This handles the blob objects to extract contents from them.
This feature is available only from Solr 3.1 onwards, I suppose.
You can use the ScriptTransformer to perform the boost calculation and addition.
http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer
<dataConfig>
  <script><![CDATA[
    function f1(row) {
      // Add boost
      row.put('$docBoost', 1.5);
queryNorm is just a normalizing factor and is the same value across
all the results for a query, to just make the scores comparable.
So even if it varies in different environments, you should not worry about it.
versus 7 is dramatic for my client). This must be down to the
scoring debug differences - it's the only difference I can find :(
On Mar 9, 2011, at 4:34 PM, Jayendra Patil wrote:
queryNorm is just a normalizing factor and is the same value across
all the results for a query, to just make
Working with the latest Solr Trunk code, it seems the Tika handlers
for Solr Cell (ExtractingDocumentLoader.java) and the Data Import Handler
(TikaEntityProcessor.java) fail to index the zip file contents again.
They just index the file names.
This issue was addressed some time back, late last
you can use the boolean operators in the filter query.
e.g. fq=rating:(PG-13 OR R)
Regards,
Jayendra
On Mon, Mar 7, 2011 at 9:25 PM, cyang2010 ysxsu...@hotmail.com wrote:
I wonder what is the logical relation among filter queries. I can't find
much documentation on filter query.
for
If you are using the ExtractingRequestHandler, you can also try using
the stream.file or stream.url.
e.g. curl
"http://localhost:8080/solr/core0/update/extract?stream.file=C:/777045.zip&literal.id=777045&literal.title=Test&commit=true"
More detailed explanation @
Hi Mike,
There was an issue with the Snappuller wherein it fails to clean up
the old index directories on the slave side.
https://issues.apache.org/jira/browse/SOLR-2156
The patch can be applied to fix the issue.
You can also delete the old index directories, except for the current
one which is
Hi Rok,
If I understood the use case rightly, Grouping of the results are
possible in Solr http://wiki.apache.org/solr/FieldCollapsing
Probably, you can create new fields with the combination for the
groups and use the field collapsing feature to group the results.
Id Type1Type2Title
Check the Need help in understanding output of searcher.explain()
function thread.
http://mail-archives.apache.org/mod_mbox/lucene-java-user/201008.mbox/%3CAANLkTi=m9a1guhrahpeyqaxhu9gta9fjbnr7-8-zi...@mail.gmail.com%3E
Regards,
Jayendra
On Fri, Feb 25, 2011 at 6:57 AM, Bagesh Sharma
qs is only the amount of slop on phrase queries explicitly specified
in q for the qf fields.
So only if the search q is the phrase "water treatment plant" would qs come
into the picture.
Slop is the maximum allowable positional distance between terms to be
considered a match,
and distance is
With the dismax or extended dismax parser you should be able to achieve this.
Dismax :- qf, qs, pf, ps should help you to have exact control on the
fields and boosts.
Extended Dismax :- In addition to qf, qs, pf, ps, you have pf2 and
pf3 for the two- and three-word shingles.
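To make the pf2/pf3 behaviour concrete, here is a small Python sketch of the two- and three-word shingles those parameters build their phrase queries from:

```python
# pf2/pf3 boost documents where adjacent pairs/triples of the query words
# appear as phrases; these are the shingles they are built from.
def shingles(query, n):
    words = query.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

q = "water treatment plant"
print(shingles(q, 2))   # ['water treatment', 'treatment plant']
print(shingles(q, 3))   # ['water treatment plant']
```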
As Grijesh mentioned,
http://wiki.apache.org/solr/ExtractingRequestHandler
Regards,
Jayendra
On Wed, Feb 2, 2011 at 10:49 AM, Thumuluri, Sai
sai.thumul...@verizonwireless.com wrote:
Good Morning,
I am planning to get started on indexing MS office using ApacheSolr -
can someone please direct me where I should
This should help (Commons HttpClient 3.x) -
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.UsernamePasswordCredentials;
import org.apache.commons.httpclient.auth.AuthScope;

HttpClient client = new HttpClient();
client.getParams().setAuthenticationPreemptive(true);
AuthScope scope = new AuthScope(AuthScope.ANY_HOST, AuthScope.ANY_PORT);
client.getState().setCredentials(scope,
    new UsernamePasswordCredentials(user, password));
Regards,
Jayendra
Hi Gary,
The latest Solr Trunk was able to extract and index the contents of the zip
file using the ExtractingRequestHandler.
The snapshot of Trunk we worked upon had the Tika 0.8 snapshot jars and
worked pretty well.
Tested again with sample url and works fine -
curl
Have used edismax and Stopword filters as well. But usually use the fq
parameter e.g. fq=title:the life and never had any issues.
Can you turn on debugQuery and check what query is formed for all the
combinations you mentioned.
Regards,
Jayendra
On Wed, Jan 12, 2011 at 5:19 PM, Dyer,
Had the same issues with international characters and wildcard searches.
One workaround we implemented was to index the field both with and without the
ASCIIFoldingFilterFactory.
You would have the original field and one with the English equivalent to be used
during searching.
Wildcard searches with
Checkout and build the code from -
https://svn.apache.org/repos/asf/lucene/dev/trunk/
Class -
https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/jaspell/JaspellTernarySearchTrie.java
Rather, have a Master and multiple Slaves combination, with the master only
being used for writes and the slaves used for reads.
Master to Slave replication is easily configurable.
Two Solr instances sharing the same index is not a good idea at all, with both
writing to the same index.
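For reference, a minimal master/slave replication setup in solrconfig.xml looks roughly like the following (the master hostname, port, and confFiles list are placeholders):

```xml
<!-- master solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- slave solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```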
Regards,
Jayendra
On
The way we implemented the same scenario is zipping all the attachments into
a single zip file which can be passed to the ExtractingRequestHandler for
indexing and included as a part of single Solr document.
Regards,
Jayendra
On Wed, Nov 17, 2010 at 6:27 AM, Gary Taylor g...@inovem.com wrote:
We intend to use schema.url for indexing documents. However, the remote urls
are secured and would need basic authentication to be able access the
document.
The implementation with stream.file would mean downloading the files and
would cause duplicity, whereas stream.body would have indexing
I meant stream.url
Regards,
Jayendra
On Tue, Nov 16, 2010 at 5:37 PM, Jayendra Patil
jayendra.patil@gmail.com wrote:
We intend to use schema.url for indexing documents. However, the remote
urls are secured and would need basic authentication to be able access the
document
The Shingle Filter breaks the words in a sentence into combinations of 2/3
words.
For the faceting field you should use :-
<field name="facet_field" type="string" indexed="true" stored="true"
multiValued="true"/>
The type of the field should be string so that it is not tokenised at all.
On Wed, Oct 27,
We faced the same issue.
If you are executing a complete clean build, the Slave copies the complete
index and just switches the pointer in index.properties to point to the
new index directory, leaving behind the old copies, and it does not
clean them up.
Had logged a JIRA and patch to
There was this issue with the previous version of Solr, wherein only the
file names from the zip used to get indexed.
We had faced the same issue and ended up using the Solr trunk which has the
Tika version upgraded and works fine.
The Solr version 1.4.1 should also have the fix included. Try
Need additional information.
Sorting is easy in Solr, just by passing the sort parameter.
However, when it comes to text sorting, it depends on how you analyse
and tokenize your fields.
Sorting does not work on fields with multiple tokens.
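A common fix is to copyField the tokenized text into a single-token companion field and sort on that instead; the sketch below is based on the alphaOnlySort example type shipped in the stock Solr schema.xml (the field names are illustrative):

```xml
<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer keeps the whole value as one token, so sorting works -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title" type="text" indexed="true" stored="true"/>
<field name="title_sort" type="alphaOnlySort" indexed="true" stored="false"/>
<copyField source="title" dest="title_sort"/>
```

Queries would then sort on the companion field, e.g. sort=title_sort asc.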
The Extract Request Handler invokes the classes from the extraction package.
https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/extraction/src/main/java/org/apache/solr/handler/extraction/ExtractingRequestHandler.java
This is packaged into the apache-solr-cell jar.
Regards,
Jayendra
yup, the Nightly build you pointed out has pre-built code and does
include the Lucene and module dependencies needed for compilation.
In case you want to compile from the source
You can check the code from the location @
https://svn.apache.org/repos/asf/lucene/dev/trunk/solr
There are
ASCIIFoldingFilter is probably the filter known to replace the accented
chars with normal ones. However, I don't see that in your config.
You can easily debug the issue through the Solr analysis tool.
Regards,
Jayendra
On Fri, Aug 13, 2010 at 3:20 AM, Andrea Gazzarini
We were able to get the hierarchy faceting working with a workaround
approach.
e.g. if you have Europe//Norway//Oslo as an entry
1. Create a new multivalued field with string type
<field name="country_facet" type="string" indexed="true" stored="true"
multiValued="true"/>
2. Index the field for
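The indexing step can be sketched like this; the "N/" depth-prefix convention is one common choice, not the only one:

```python
# For each path level, emit one facet token prefixed with its depth, so a
# facet.prefix like "1/Europe/" can list the children of Europe.
def hierarchy_tokens(path, sep="//"):
    parts = path.split(sep)
    return ["%d/%s" % (depth, "/".join(parts[:depth + 1]))
            for depth in range(len(parts))]

print(hierarchy_tokens("Europe//Norway//Oslo"))
# ['0/Europe', '1/Europe/Norway', '2/Europe/Norway/Oslo']
```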
We pretty much had the same issue, ended up customizing the ExtendedDismax
code.
In your case it's just a change of a single line
addShingledPhraseQueries(query, normalClauses, phraseFields2, 2,
tiebreaker, pslop);
to
addShingledPhraseQueries(query, normalClauses, phraseFields2,
Try ...
curl
"http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?stream.file=Full_Path_of_File/pub2009001.pdf&literal.id=777045&commit=true"
stream.file - specify full path
literal.extra params - specify any extra params if needed
Regards,
Jayendra
On Tue, Aug 10, 2010 at 4:49 PM, Ma,
Have got Solr working in Eclipse and deployed on Tomcat through the Eclipse
plugin.
The crude approach was to
1. Import the Solr war into Eclipse, which will be imported as a web
project and can be deployed on Tomcat.
2. Add multiple source folders to the project, linked to the checked
The Solr home is configured in the web.xml of the application, which points
to the folder having the conf files and the data directory:
<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>D:/multicore</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>
ContentStreamUpdateRequest seems to read the file contents and transfer them
over HTTP, which slows down the indexing.
Try Using StreamingUpdateSolrServer with stream.file param @
http://wiki.apache.org/solr/SolrPerformanceFactors#Embedded_vs_HTTP_Post
e.g.
SolrServer server = new
You can use appends for any additional fq parameters, which would be appended
to the ones passed @ query time.
Check out the sample solrconfig.xml shipped with Solr:
<!-- In addition to defaults, "appends" params can be specified
to identify values which should be appended to the list of
We have a custom implementation of ExtendedDismaxQParserPlugin, which we
bundle into a jar and have it exposed in the multicore shared lib.
The custom ExtendedDismaxQParserPlugin implementation still uses the
QueryUtils.makeQueryable method, same as the stock implementation.
We are using Solr Extract Handler for indexing document metadata with
attachments. (/update/extract)
However, the SolrContentHandler doesn't seem to support index time document
boost attribute.
Probably, document.setDocumentBoost(Float.parseFloat(boost)) is missing.
Regards,
Jayendra