Re: filtering footer information

2012-05-23 Thread Otis Gospodnetic
I wonder if Boilerpipe could be helpful here?  Boilerpipe is now integrated in Tika. Otis  Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm  > > From: "Mark , N" >To: solr-user@lucene.apache.org >Sent: Thursday, May 24,
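If the Boilerpipe route is taken, a minimal client-side sketch of stripping repeated footer/boilerplate text before indexing might look like the following; the class name and the idea of calling it on raw HTML ahead of a Solr add are illustrative assumptions, only ArticleExtractor itself comes from the Boilerpipe library.

    import de.l3s.boilerpipe.BoilerpipeProcessingException;
    import de.l3s.boilerpipe.extractors.ArticleExtractor;

    public class FooterStripper {
        // Strips boilerplate (navigation, repeated footers, etc.) from raw HTML,
        // keeping only the main text, which can then be indexed into Solr.
        public static String mainText(String rawHtml) throws BoilerpipeProcessingException {
            return ArticleExtractor.INSTANCE.getText(rawHtml);
        }
    }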

Re: Throws Null Pointer Exception Even Query is Correct in solr

2012-05-23 Thread in.abdul
Hi Dmitry, There is no out of memory exception in Solr. Thanks and Regards, S SYED ABDUL KATHER On Thu, May 24, 2012 at 1:14 AM, Dmitry Kan [via Lucene] < ml-node+s472066n3985762...@n3.nabble.com> wrote: > do you also see out of memory exception in your tomcat logs? If s

filtering footer information

2012-05-23 Thread Mark , N
Is it possible to filter certain repeated footer information from text documents while indexing to solr ? Are there any built-in filters similar to stop word filters ? -- Thanks, *Nipen Mark *

Re: Big Data Analysis and Management - 2 day Workshop

2012-05-23 Thread Dikchant Sahi
Hi Manish, The attachment seems to be missing. Would you mind sharing it? I am a Search Engineer based in Bangalore and would be interested in attending the workshop. Best Regards, Dikchant Sahi On Thu, May 24, 2012 at 10:22 AM, Manish Bafna wrote: > Dear Friend, > We are organizing a worksho

Fwd: Big Data Analysis and Management - 2 day Workshop

2012-05-23 Thread Manish Bafna
Dear Friend, We are organizing a workshop on Big Data. Here are details regarding the same. Please forward it to your company HR and also your friends and let me know if anyone is interested. We have early bird offer if registration is done before 31st May 2012. Big Data is one space that is buzz

Re: System requirements in my case?

2012-05-23 Thread Jan Høydahl
Well, 12000 is probably too little to do a representative sizing, but you can try an optimize() and then calculate what the size will be for 80mill docs. You'll definitely not be able to cache the whole index in memory on one server, but if you can live with that kind of performance then it's ok

Re: Dismax boost + payload boost

2012-05-23 Thread matteosilv
I finally got it working... I was compiling my class against a different Solr version (3.6), while my running Solr version was 3.5. -- View this message in context: http://lucene.472066.n3.nabble.com/Dismax-boost-payload-boost-tp3432650p3985797.html Sent from the Solr - User mailing list archiv

Re: SolrCloud: how to index documents into a specific core and how to search against that core?

2012-05-23 Thread Yandong Yao
Hi Mark, Darren, Thanks very much for your help. Will try a collection for each customer then. Regards, Yandong 2012/5/22 Mark Miller > I think the key is this: you want to think of a SolrCore on a single node > Solr installation as a collection on a multi node SolrCloud installation. > > So if

Re: Many Cores with Solr

2012-05-23 Thread Mike Douglass
My interest in this is the desire to create one index per user of a system - the issue here is privacy - data indexed for one user should not be visible to other users. For this purpose Solr will be hidden behind a proxy which steers authenticated sessions to the appropriate core. Does this seem

field compression in solr 3.6

2012-05-23 Thread pramila_tha...@ontla.ola.org
Hi Everyone, Solr 3.6 does not seem to be honoring the compressed field option. While merging the indexes, the size of the index is very big. Is there any other way to keep the compression functionality? thanks, --Pramila -- View this message in context: http://lucene.472066.n3.nabble.com/fie

Re: Throws Null Pointer Exception Even Query is Correct in solr

2012-05-23 Thread Dmitry Kan
do you also see out of memory exception in your tomcat logs? If so, try setting the JVM's -Xmx to something reasonable. -- Dmitry On Wed, May 23, 2012 at 10:09 PM, in.abdul wrote: > Sorry i missed the point i am already using Method.Post Only .. Still i > could not able to execute >

Re: shard distribution of multiple collections in SolrCloud

2012-05-23 Thread Mark Miller
Yeah, currently you have to create the core on each node...we are working on a 'collections' api that will make this a simple one call operation. We should have this soon. - Mark On May 23, 2012, at 2:36 PM, Daniel Brügge wrote: > Hi, > > i am creating several cores using the following script
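Until that collections API exists, a rough SolrJ equivalent of the per-node loop (host list, core name and instanceDir are placeholders; the extra collection/shard parameters from the curl script in the original post would still have to be passed through the raw CoreAdmin URL) could look like this:

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;

    public class CreateCoresOnEachNode {
        public static void main(String[] args) throws Exception {
            // One CoreAdmin endpoint per Solr node (placeholder hosts).
            String[] nodes = { "http://solrinstance1:8983/solr", "http://solrinstance2:8983/solr" };
            for (String node : nodes) {
                HttpSolrServer admin = new HttpSolrServer(node);
                // Creates a core named "collection2" with instanceDir "collection2" on this node.
                CoreAdminRequest.createCore("collection2", "collection2", admin);
            }
        }
    }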

Re: always getting distinct count of -1 in luke response (solr4 snapshot)

2012-05-23 Thread Mike Hugo
Explicitly running an optimize on the index via the admin screens solved this problem - the correct counts are now being returned. On Tue, May 22, 2012 at 4:33 PM, Mike Hugo wrote: > We're testing a snapshot of Solr4 and I'm looking at some of the responses > from the Luke request handler. Ever

Re: Throws Null Pointer Exception Even Query is Correct in solr

2012-05-23 Thread in.abdul
Sorry, I missed the point; I am already using Method.Post only, but I am still not able to execute the query. Thanks and Regards, S SYED ABDUL KATHER On Thu, May 24, 2012 at 12:19 AM, iorixxx [via Lucene] < ml-node+s472066n3985746...@n3.nabble.com> wrote: > > I have criteria where

Re: Multicore solr

2012-05-23 Thread Amit Jha
Can anyone please help me on this? Rgds AJ On 23-May-2012, at 14:37, Jens Grivolla wrote: > So are you even doing text search in Solr at all, or just using it as a > key-value store? > > If the latter, do you have your schema configured so > that only the search_id field is indexed (with a ke

Re: how to reduce the result size to 2-3 lines and expand based on user interest

2012-05-23 Thread srini
hi iorixxx, Thank you for your reply. Appreciate it. There are a few areas where I need a little clarity. I am not using any queries. Everything has been implemented as part of the config files (schema.xml, data-config.xml, solr-config.xml). Could you give some more hints based on the below config file specificatio

Re: Throws Null Pointer Exception Even Query is Correct in solr

2012-05-23 Thread Ahmet Arslan
> I have criteria where i am passing more than 10 ids in a Query like > q=(ROWINDEX:(1 2 3 4 )) using solrJ. I had increased the Max Boolean > clause = 10500 and I had increased the Max Header Size in tomcat > by a sufficient amount. But still it is throwing Null > Poin

shard distribution of multiple collections in SolrCloud

2012-05-23 Thread Daniel Brügge
Hi, I am creating several cores using the following script. I use this for testing SolrCloud and to learn about the distribution of multiple collections.

  max=500
  for ((i=2; i<=$max; ++i )) ;
  do
  curl "http://solrinstance1:8983/solr/admin/cores?action=CREATE&name=collection$i&collectio

Throws Null Pointer Exception Even Query is Correct in solr

2012-05-23 Thread syed kather
Team, I have criteria where I am passing more than 10 ids in a Query like q=(ROWINDEX:(1 2 3 4 )) using solrJ. I had increased the Max Boolean clause to 10500 and I had increased the Max Header Size in tomcat by a sufficient amount. But still it is throwing a Null Pointer Excep
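As the replies above suggest, a query with this many ids is normally sent as an HTTP POST from SolrJ so it is not limited by the container's URL and header sizes (in addition to raising maxBooleanClauses in solrconfig.xml). A minimal sketch, with the Solr URL and the generated id list purely illustrative:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class BigIdQuery {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            // Build "ROWINDEX:(1 2 3 ... n)" with thousands of ids.
            StringBuilder ids = new StringBuilder();
            for (int i = 1; i <= 10500; i++) {
                ids.append(i).append(' ');
            }
            SolrQuery q = new SolrQuery("ROWINDEX:(" + ids.toString().trim() + ")");
            // POST keeps the huge query string out of the request URL and headers.
            QueryResponse rsp = server.query(q, SolrRequest.METHOD.POST);
            System.out.println("hits: " + rsp.getResults().getNumFound());
        }
    }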

Re: configuring solr3.6 for a large intensive index only run

2012-05-23 Thread Lance Norskog
If you want to suppress merging, set the 'mergeFactor' very high. Perhaps 100. Note that Lucene opens many files (50? 100? 200?) for each segment. You would have to set the 'ulimit' for file descriptors to 'unlimited' or 'millions'. Later, you can call optimize with a 'maxSegments' value. Optimize
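The maxSegments form of optimize that Lance mentions can also be issued from SolrJ; a small sketch, with the URL and the target segment count chosen arbitrarily:

    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class PartialOptimize {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            // waitFlush=true, waitSearcher=true, merge down to at most 16 segments
            // instead of forcing a full single-segment optimize.
            server.optimize(true, true, 16);
        }
    }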

configuring solr3.6 for a large intensive index only run

2012-05-23 Thread Scott Preddy
I am trying to do a very large insertion (about 68 million documents) into a Solr instance. Our schema is pretty simple. About 40 fields using these types: We are runnin

Tips on creating a custom QueryCache?

2012-05-23 Thread Aaron Daubman
Greetings, I'm looking for pointers on where to start when creating a custom QueryCache. Our usage patterns are possibly a bit unique, so let me explain the desired use case: Our Solr index is read-only except for dedicated periods where it is updated and re-optimized. On startup, I would like t
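This is not the custom QueryCache itself, but one simple client-side way to get part of the effect is to replay a fixed set of known-hot queries right after startup so the standard caches are populated before real traffic arrives; the URL and query list below are invented for illustration:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class StartupCacheWarmer {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            String[] hotQueries = { "genre:jazz", "artist:\"miles davis\"", "*:*" };
            for (String q : hotQueries) {
                // Each query exercises the query result and filter caches on the server.
                server.query(new SolrQuery(q));
            }
        }
    }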

Re: how to reduce the result size to 2-3 lines and expand based on user interest

2012-05-23 Thread Ahmet Arslan
> I am using DIH to import data from > Oracle and every thing is working fine. > the description field usually contains  more lines > (from 10-300 lines). When > I present the results through Solr/Browse It displays the > results. However I > have a requirement to show only 2-3 lines as description

Re: getTransformer error

2012-05-23 Thread watson
Has anyone found a solution to the getTransformer error? I am getting the same error. Here is my output: Problem accessing /solr/JOBS/select/. Reason: getTransformer fails in getContentType java.lang.RuntimeException: getTransformer fails in getContentType at org.apache.solr.response.X

solr error when querying.

2012-05-23 Thread watson
Here is my query: http://127.0.0.1:/solr/JOBS/select/??q=Apache&wt=xslt&tr=example.xslt The response I get is the following. I have example.xslt in the /conf/xslt path. What is wrong here? Thanks! HTTP ERROR 500 Problem accessing /solr/JOBS/select/. Reason: getTransformer fails in

TermComponent and Optimize

2012-05-23 Thread Dario Rigolin
We have an issue with TermComponent on Solr 3.6 (and 3.5): using the term list on field id (the unique id of documents), we receive as reply that we have multiple documents with the same id! Doing a search, only one doc is returned as expected. After deeper investigation this issue is "fixed" by doing an

how to reduce the result size to 2-3 lines and expand based on user interest

2012-05-23 Thread srini
I am using DIH to import data from Oracle and everything is working fine. The description field usually contains many lines (from 10-300 lines). When I present the results through Solr/Browse it displays the results. However, I have a requirement to show only 2-3 lines as the description and provide s
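One common way to produce such a short teaser is highlighting with a small fragment size, letting the UI expand to the full stored description on user interest. A minimal SolrJ sketch; the field name description comes from the post above, while the URL, query and sizes are illustrative:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class TeaserSearch {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            SolrQuery q = new SolrQuery("oracle");
            q.setHighlight(true);
            q.addHighlightField("description");
            q.setHighlightSnippets(1);    // one snippet per document
            q.setHighlightFragsize(200);  // roughly 2-3 lines of text
            QueryResponse rsp = server.query(q);
            // Maps doc id -> field -> snippet list; show the snippet, link to the full field.
            System.out.println(rsp.getHighlighting());
        }
    }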

Re: Faceted on Similarity ?

2012-05-23 Thread Robby
Hi Lee Carrol, Will take a look on the pointers. Really appreciate your feedback, thank you. Regards, Robby On Wed, May 23, 2012 at 5:04 AM, Lee Carroll wrote: > Take a look at the clustering component > > http://wiki.apache.org/solr/ClusteringComponent > > Consider clustering off line and ind

RE: Wildcard-Search Solr 3.5.0

2012-05-23 Thread spring
> I'd guess that this is because SnowballPorterFilterFactory > does not implement MultiTermAwareComponent. Not sure, though. Yes, I think this hinders the automagic multiterm awareness from doing its job. Could a custom analyzer chain help? Like described (very, very briefly, too briefly...) here:

Re: Indexing files using multi-cores - could not fix after many retries

2012-05-23 Thread sudarshan
Thanks Gora, it worked. -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-files-using-multi-cores-could-not-fix-after-many-retries-tp3985253p3985672.html Sent from the Solr - User mailing list archive at Nabble.com.

RE: Wildcard-Search Solr 3.5.0

2012-05-23 Thread Michael Ryan
I'd guess that this is because SnowballPorterFilterFactory does not implement MultiTermAwareComponent. Not sure, though. -Michael

RE: Wildcard-Search Solr 3.5.0

2012-05-23 Thread spring
> Maybe a filter like ISOLatin1AccentFilter that doesn't get > applied when > using wildcards? How do the terms actually appear in the index? Bär gets indexed as bar. I do not use ISOLatin1AccentFilter. My field def is this:

Re: Wildcard-Search Solr 3.5.0

2012-05-23 Thread Jens Grivolla
Maybe a filter like ISOLatin1AccentFilter that doesn't get applied when using wildcards? How do the terms actually appear in the index? Jens On 05/23/2012 01:19 PM, spr...@gmx.eu wrote: No one an idea? Thx. The text may contain "FooBar". When I do a wildcard search like this: "Foo*" -

Planning of future Solr setup

2012-05-23 Thread Christian von Wendt-Jensen
Hi, I'm in the middle of planning a new Solr setup. The situation is this: - We currently have one document type with around 20 fields, indexed, not stored, except for a few date fields - We currently have indexed 400M documents across 20+ shards. - The number of documents to be indexed is around

RE: Wildcard-Search Solr 3.5.0

2012-05-23 Thread spring
> -Original Message- > From: Dmitry Kan [mailto:dmitry@gmail.com] > Sent: Wednesday, 23 May 2012 14:02 > To: solr-user@lucene.apache.org > Subject: Re: Wildcard-Search Solr 3.5.0 > > do umlauts arrive properly on the server side, no encoding > issues? Yes, works fine. It must, s

Re: Wildcard-Search Solr 3.5.0

2012-05-23 Thread Dmitry Kan
do umlauts arrive properly on the server side, no encoding issues? Check the query params of the response xml/json/.. set debugQuery to true as well to see if it produces any useful diagnostic info. On Wed, May 23, 2012 at 2:58 PM, wrote: > No. No hits for bä*. > It's something with the umlauts

RE: Wildcard-Search Solr 3.5.0

2012-05-23 Thread spring
No. No hits for bä*. It's something with the umlauts but I have no idea what... > -Original Message- > From: Dmitry Kan [mailto:dmitry@gmail.com] > Sent: Wednesday, 23 May 2012 13:36 > To: solr-user@lucene.apache.org > Subject: Re: Wildcard-Search Solr 3.5.0 > > what about bä*->hits?

Re: Wildcard-Search Solr 3.5.0

2012-05-23 Thread Dmitry Kan
what about bä*->hits? -- Dmitry On Wed, May 23, 2012 at 2:19 PM, wrote: > No one an idea? > > Thx. > > > > > The text may contain "FooBar". > > > > > > When I do a wildcard search like this: "Foo*" - no hits. > > > When I do a wildcard search like this: "foo*" - doc is > > > found. > > > >

RE: Wildcard-Search Solr 3.5.0

2012-05-23 Thread spring
No one an idea? Thx. > > The text may contain "FooBar". > > > > When I do a wildcard search like this: "Foo*" - no hits. > > When I do a wildcard search like this: "foo*" - doc is > > found. > > Please see http://wiki.apache.org/solr/MultitermQueryAnalysis Well, it works in 3.6. With one

Re: Dismax boost + payload boost

2012-05-23 Thread matteosilv
yes, as in the linked post:

  public class PayloadSimilarity extends DefaultSimilarity {
    @Override
    public float scorePayload(int docId, String fieldName, int start, int end,
                              byte[] payload, int offset, int length) {
      if (length > 0) {
        return PayloadHelper.decodeFloat(payload, offset);
      }
      return 1.0f;
    }
  }

Re: Strategy for maintaining De-normalized indexes

2012-05-23 Thread Aditya
Hi Sohail, In my previous mail, I mentioned storing categories as separate records. You should store and index Category name, MainProduct name as a separate record. Index ChildProduct name, MainProduct as a separate record. When you want the count: 1. Retrieve the main product name matching the
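A hedged sketch of that record-per-pair indexing with SolrJ follows; the field names, ids and URL are invented since the actual schema is not shown, the point is only that each (category, main product) and (child product, main product) pair becomes its own Solr document:

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class DenormalizedIndexer {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

            // One document per (category, main product) pair.
            SolrInputDocument cat = new SolrInputDocument();
            cat.addField("id", "cat-electronics-phonex");
            cat.addField("type", "category");
            cat.addField("category_name", "Electronics");
            cat.addField("main_product", "PhoneX");
            server.add(cat);

            // One document per (child product, main product) pair.
            SolrInputDocument child = new SolrInputDocument();
            child.addField("id", "child-phonex-charger");
            child.addField("type", "child_product");
            child.addField("child_product", "PhoneX Charger");
            child.addField("main_product", "PhoneX");
            server.add(child);

            server.commit();
        }
    }

With records shaped like this, a count is just a query on main_product restricted by type (for example q=main_product:PhoneX&fq=type:child_product) and reading numFound.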

Re: Multicore solr

2012-05-23 Thread Shanu Jha
Jens, Yes, we are doing text search. My question to all is: is the approach of creating a core for each user a good idea? AJ On Wed, May 23, 2012 at 2:37 PM, Jens Grivolla wrote: > So are you even doing text search in Solr at all, or just using it as a > key-value store? > > If the latter, do y

Re: Multicore solr

2012-05-23 Thread Jens Grivolla
So are you even doing text search in Solr at all, or just using it as a key-value store? If the latter, do you have your schema configured so that only the search_id field is indexed (with a keyword tokenizer) and everything else only stored? Also, are you sure that Solr is the best option as a

Re: clickable links as results?

2012-05-23 Thread Dmitry Kan
Hello, Set up a schema with at least 3 fields: id (integer or string, unique), doc_contents (of type text_en for example), link (string). Index each document into the doc_contents field and use highlights when searching (hl=true&hl.fl=doc_contents). For each hit, count how many highlights you have go
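A rough SolrJ rendering of that idea; the fields id, doc_contents and link come from the schema described above, while the URL, query and HTML output are illustrative:

    import java.util.List;
    import java.util.Map;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class LinkedResults {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            SolrQuery q = new SolrQuery("solr");
            q.setHighlight(true);
            q.addHighlightField("doc_contents");
            QueryResponse rsp = server.query(q);
            Map<String, Map<String, List<String>>> hl = rsp.getHighlighting();
            for (SolrDocument doc : rsp.getResults()) {
                String id = String.valueOf(doc.getFieldValue("id"));
                String link = String.valueOf(doc.getFieldValue("link"));
                Map<String, List<String>> perDoc = hl.get(id);
                List<String> snippets = perDoc == null ? null : perDoc.get("doc_contents");
                int count = snippets == null ? 0 : snippets.size();
                // Render one clickable anchor per hit with its highlight count.
                System.out.println("<a href=\"" + link + "\">" + count + " matching snippet(s)</a>");
            }
        }
    }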

Re: Dismax boost + payload boost

2012-05-23 Thread Ahmet Arslan
> In the query debug i can see that i'm using the modified > query parser > however there aren't debug information about the payload > boosts. > I've not implemented a request handler, but i'm specifiying > all the > parameters (e.g. defType=payload plf=entityIdList...) in the > request. > What am

Re: Dismax boost + payload boost

2012-05-23 Thread matteosilv
I'm trying to get the digitalpebble payload query parser working... *I have a multivalued field with payloads:* *with type:* In the query debug I can see that I'm using the modified query parser, however there

Re: Multicore solr

2012-05-23 Thread Shanu Jha
Awaiting suggestions. On Wed, May 23, 2012 at 8:04 AM, Amit Jha wrote: > Hi, > > Thanks for your advice. > It is basically a meta search application. Users can perform a search on N > number of data sources at a time. We broadcast a parallel search to each > selected data source and write da