Re: DIH template multivalued fields

2010-03-12 Thread blargy
I was actually able to accomplish (although not pretty) what I wanted using a regex transformer. blargy wrote: > > How can I manually specify a static multiple value field in the > DataImportHandler? > > I finally figured out the answer of how to statically define a value from > t

Re: DIH template multivalued fields

2010-03-12 Thread Shalin Shekhar Mangar
On Sat, Mar 13, 2010 at 3:57 AM, blargy wrote: > > How can I manually specify a static multiple value field in the > DataImportHandler? > > I finally figured out the answer of how to statically define a value from > this FAQ: http://wiki.apache.org/solr/DataImportHandlerFaq which basically > stat

Re: Spellcheck vs SpellShingle is there a conflict?

2010-03-12 Thread Shalin Shekhar Mangar
On Sat, Mar 13, 2010 at 3:03 AM, Barnett, Jeffrey wrote: > Can I use both for a single index? > > Here is my Solrconfig.xml: > class="org.apache.solr.handler.component.SpellCheckComponent"> > > default > spellingShingle > 0.75 > ./spellShingle > textSpellShingle >

Re: AutoSuggest

2010-03-12 Thread Shalin Shekhar Mangar
On Sat, Mar 13, 2010 at 9:30 AM, Suram wrote: > > Erick Erickson wrote: > > > > Did you commit your changes? > > > > Erick > > > > On Fri, Mar 12, 2010 at 7:38 AM, Suram wrote: > > > >> > >> Can set my index fields for auto Suggestion, sometime the newly index > >> field > >> not found for auto

Re: How to edit / compile the SOLR source code

2010-03-12 Thread Trey
Or, (as Joe Calderon said in the apparent sibling thread) you can just type "ant clean dist" if you want to verifiably blow away the old jars and replace them with the new jars/war to deploy. -Trey On Sat, Mar 13, 2010 at 12:03 AM, Trey wrote: > Hmm... sorry for the bad link... looks like someh

Re: How to edit / compile the SOLR source code

2010-03-12 Thread Trey
Hmm... sorry for the bad link... looks like somehow I inserted a bogus letter: "artiicles" instead of "articles" in the url. http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-eclipse At any rate, are you using Ant to compile? Literally all you have to do is run the com

Re: Need help in deploying the modified SOLR source code

2010-03-12 Thread Joe Calderon
do `ant clean dist` within the solr source and use the resulting war file, though in the future you might think about extending the built in parser and creating a parser plugin rather than modifying the actual sources see http://wiki.apache.org/solr/SolrPlugins#QParserPlugin for more info --jo

Re: AutoSuggest

2010-03-12 Thread Suram
Erick Erickson wrote: > > Did you commit your changes? > > Erick > > On Fri, Mar 12, 2010 at 7:38 AM, Suram wrote: > >> >> Can set my index fields for auto Suggestion, sometime the newly index >> field >> not found for auto suggestion and index search >> -- >> View this message in context:

Need help in deploying the modified SOLR source code

2010-03-12 Thread JavaGuy84
Hi, I made some changes to solrqueryparser.java using Eclipse and I am able to do a leading wildcard search using the Jetty plugin (I downloaded this plugin for Eclipse). Now I am not sure how I can package this code and redeploy it. Can someone help me out please? Thanks, B

Hardware Recommendations

2010-03-12 Thread blargy
I'll have about 5m documents indexed (ranging in size), with between 750k and 1m searches expected per day. I'll be using a master/slave setup with an unknown number of slaves. What hardware would you recommend? Thoughts?

GC performance: 1.3 vs 1.4

2010-03-12 Thread Trey Hyde
I messed with a perfectly good 1.3 setup and installed Solr 1.4 in the hope of seeing better faceting performance. Nothing in the release notes or migrating section scared me off. I stuck with it a little too long and now due to the costs of rebuilding the time delta on the index I am commit

Re: How to get Term Positions?

2010-03-12 Thread MitchK
Thank you both for your responses. However, I am not familiar enough with Solr, or even with Lucene. So, at the moment, I have no real idea of what payloads are (I can't even translate this word...). The manual says something about "metadata" - but there is nothing said about what metadata t

Re: Trouble Implementing Extracting Request Handler

2010-03-12 Thread Steve Reichgut
Hi Grant, Thanks for the feedback. In reading the Wiki, it recommended that you copy everything from example/solr/libs directory into a /libs directory in your instance. I went into my example/solr directory and only see two directories - "bin" and "conf". There is no "libs" directory. Where el

DIH template multivalued fields

2010-03-12 Thread blargy
How can I manually specify a static multiple value field in the DataImportHandler? I finally figured out the answer of how to statically define a value from this FAQ: http://wiki.apache.org/solr/DataImportHandlerFaq which basically states to use the TemplateTransformer. My question is what do I

Re: How to get Term Positions?

2010-03-12 Thread Tommy Chheng
I contributed a little reward to whoever can complete this task too http://nextsprocket.com/tasks/solr-1337-spans-and-payloads-query-support-asf-jira Feel free to contribute to the reward if you need this done too! Tommy Chheng Programmer and UC Irvine Graduate Student Twitter @tommychheng http

Re: Trouble Implementing Extracting Request Handler

2010-03-12 Thread Grant Ingersoll
On Mar 12, 2010, at 2:20 PM, Steve Reichgut wrote: > Now that I have configured my Solr instance for standard indexing, I wanted > to start indexing PDF's, MS Doc's, etc. When I tried to test it with a simple > PDF file, I got the following error: > > org.apache.solr.common.SolrException: la

Re: How to get Term Positions?

2010-03-12 Thread Grant Ingersoll
OK, you need https://issues.apache.org/jira/browse/SOLR-1337 and its related item: https://issues.apache.org/jira/browse/SOLR-1485 Unfortunately, not implemented yet. On Mar 12, 2010, at 1:36 PM, MitchK wrote: > > Thanks for your response, Grant! > > Imagine you are searching for "foo". > "f

DIH: using variables in nested entities

2010-03-12 Thread Tricia Williams
Hi All, The DataImportHandler is the most fantastic thing that has recently come to Solr. Thank you. I'm noticing that when I use variables in nested entities that square brackets are wrapped around the variable value when they are used. For example ${x.url} used in the "tika" entity

Spellcheck vs SpellShingle is there a conflict?

2010-03-12 Thread Barnett, Jeffrey
Can I use both for a single index? Here is my Solrconfig.xml: default spellingShingle 0.75 ./spellShingle textSpellShingle true basicSpell spelling 0.75 ./spellchecker textSpell true What happens is th

Trouble Implementing Extracting Request Handler

2010-03-12 Thread Steve Reichgut
Now that I have configured my Solr instance for standard indexing, I wanted to start indexing PDF's, MS Doc's, etc. When I tried to test it with a simple PDF file, I got the following error: org.apache.solr.common.SolrException: lazy loading error Caused by: org.apache.solr.common.SolrExc

Re: java.lang.OutOfMemoryError, VM may need to be forcibly terminated

2010-03-12 Thread Tom Hill
Hi - The best way is probably to add more ram. :-) That error apparently results from running out of perm gen space, and with 512m, you may not have much perm gen space. Options for increasing this can be found http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp But, if you don't have

Re: How to get Term Positions?

2010-03-12 Thread MitchK
Thanks for your response, Grant! Imagine you are searching for "foo". "foo" occurs in doc1 three times. It is the 5th, the 20th, and the 50th term in the document. I want to get these positions. Of course, if I am searching for "foo bar" and "bar" occurs at the 4th and the 21st position, I also
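To make the request above concrete, here is a minimal sketch in Python of what "term positions per document" means. It is not Solr or Lucene code: the function name `term_positions` is hypothetical, tokenization is a plain whitespace split, and positions are counted from 1 only to match the wording of the message. Real positions depend on the analyzer in use.

```python
from collections import defaultdict


def term_positions(text: str) -> dict[str, list[int]]:
    """Map each lowercased term to the 1-based positions where it occurs.

    Illustration only: tokens come from a whitespace split, not from a
    Solr/Lucene analyzer.
    """
    positions: dict[str, list[int]] = defaultdict(list)
    for index, token in enumerate(text.lower().split(), start=1):
        positions[token].append(index)
    return dict(positions)


doc1 = "a foo b bar foo"
print(term_positions(doc1)["foo"])
print(term_positions(doc1)["bar"])
```

For a query such as "foo bar", the positions for both terms would be looked up in the same map.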

Re: DIH field options

2010-03-12 Thread blargy
Im still having a problem with this... for example I would assume this would index the value Item into the field called type However I receive this error when starting up Solr: Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Field must have a co

Re: How to get Term Positions?

2010-03-12 Thread Grant Ingersoll
What TermPositions do you want? On a per doc basis or just in general for the index? I think the TermsComponent could add the latter. The former is only possible via TermVectors. -Grant On Mar 12, 2010, at 12:46 PM, MitchK wrote: > > Hello community, > > is it possible to get TermPosition

java.lang.OutOfMemoryError, VM may need to be forcibly terminated

2010-03-12 Thread Oleg Burlaca
Hello, I've searched the list for this kind of error but never found one that is similar to my case: Java HotSpot(TM) Client VM warning: Exception java.lang.OutOfMemoryError occurred dispatching signal SIGTERM to handler- the VM may need to be forcibly terminated I use the latest st

Re: Solr Performance Issues

2010-03-12 Thread Erick Erickson
Sounds like you're pretty well on your way then. This is pretty typical of multi-threaded situations... Threads 1-n wait around on I/O and increasing the number of threads increases throughput without changing (much) the individual response time. Threads n+1 - p don't change throughput much, but i

Getting Term Positions

2010-03-12 Thread MitchK
Hello community, is it possible to get TermPositions without a TermVector? If yes, how can I do so? If such a feature is not yet implemented in Solr, it would be interesting how to do so with Lucene. I don't want to use a TermVector, because I have read somewhere that Lucene stores the TermPosit

Re: DIH field options

2010-03-12 Thread Ahmet Arslan
> I feel like the default option is a little hacky plus I'll > probably be > sharing my schema.xml for multiple cores using dynamic > field types. > > I can't believe there isnt an easy way to specify this. So > my only options > is something like this? Also you can generate this static value fro

Re: local solr geo_distance

2010-03-12 Thread Grant Ingersoll
On Mar 12, 2010, at 1:23 AM, wicketnewuser wrote: > > Hi I'm getting geo_distance as str eventhough I'm define the field as > tdouble. my search looks like > /solr/select?&qt=geo&lat=xx.xx&long=yy.yy&q=*&radius=10 > Is there anyway i can get is as > double instead of str I'm not sure about this

Re: Solr Performance Issues

2010-03-12 Thread Siddhant Goel
Hi, Thanks for your responses. It actually feels good to be able to locate where the bottlenecks are. I've created two sets of data - in the first one I'm measuring the time taken purely on Solr's end, and in the other one I'm including network latency (just for reference). The data that I'm posti

Re: local solr geo_distance

2010-03-12 Thread Grant Ingersoll
I'm sorry. You sent this to the correct one. I made a mistake in my mail client. On Mar 12, 2010, at 10:49 AM, Grant Ingersoll wrote: > Hi, > > From http://people.apache.org/~hossman/#solr-dev: > > Your question is better suited for the solr-u...@lucene mailing list ... > not the solr-...@lu

Re: DIH field options

2010-03-12 Thread blargy
I feel like the default option is a little hacky plus I'll probably be sharing my schema.xml for multiple cores using dynamic field types. I can't believe there isn't an easy way to specify this. So my only option is something like this? What if I don't need a templa

Re: KeywordTokenizer for faceting gives too many results

2010-03-12 Thread Michael Kuhlmann
On 03/12/10 17:51, Ahmet Arslan wrote: > >> >> try using Parenthesis with queries that contain more than >> one term. &fq=label:(Aces+of+London) >> Otherwise >> jumps >> in. > > defaultSearchField stuff is correct but I just realized that you need to use > quotes in your case. Because query p

Re: KeywordTokenizer for faceting gives too many results

2010-03-12 Thread Ahmet Arslan
> > try using Parenthesis with queries that contain more than > one term. &fq=label:(Aces+of+London) > Otherwise  > jumps > in. defaultSearchField stuff is correct but I just realized that you need to use quotes in your case. Because query parser splits on white-spaces. &fq=label:"Aces of Lo
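The quoted form suggested above can be sketched as follows. This is a minimal illustration in Python, assuming a hypothetical helper `phrase_filter` and the field name `label` from the thread; it only builds and URL-encodes the `fq` parameter and does not contact Solr.

```python
from urllib.parse import urlencode


def phrase_filter(field: str, value: str) -> str:
    """Build a filter query that passes `value` as one quoted phrase.

    Hypothetical helper for illustration. Backslashes and double quotes
    inside the value are escaped so the phrase stays intact.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


fq = phrase_filter("label", "Aces of London")
query_string = urlencode({"q": "*:*", "fq": fq})
print(fq)
print(query_string)
```

Because the whole value sits inside the quotes, the query parser no longer splits it on whitespace, which is the point made in the message above.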

How does ReversedWildcardFilterFactory work?

2010-03-12 Thread JavaGuy84
Hi, I am just curious to know how this class works and how this can be implemented in right way. I got the details from JIRA as below This patch is an implementation of the "reversed tokens" strategy for efficient leading wildcards queries. ReversedWildcardsTokenFilter reverses tokens and retu

Re: DIH field options

2010-03-12 Thread Tommy Chheng
Haven't tried this myself but try adding a default value and don't specify it during the import. http://wiki.apache.org/solr/SchemaXml On 3/12/10 7:56 AM, blargy wrote: Forgive me but I'm slightly retarded... I grew up underneath some power lines ;) I've read through that wiki but I still c

Re: KeywordTokenizer for faceting gives too many results

2010-03-12 Thread Ahmet Arslan
> I have some fields that are only used for faceting, so > they're only > queried by facet results. No modification is needed, no > lowercase, > nothing. So the KeywordTokenizerFactory seems to be perfect > for them. You can use plain string type definition (comes in schema.xml) for that purpose.

KeywordTokenizer for faceting gives too many results

2010-03-12 Thread Michael Kuhlmann
Hi, I have some fields that are only used for faceting, so they're only queried by facet results. No modification is needed, no lowercase, nothing. So the KeywordTokenizerFactory seems to be perfect for them. Alas, when the value contains spaces, I'm still getting too many results. I have a field

Re: KeywordTokenizer for faceting; was: Re: Analyzer for indexing only, not for queries

2010-03-12 Thread Michael Kuhlmann
Hi Erick, On 03/12/10 17:09, Erick Erickson wrote: > << of my fields does not have any analyzers defined at all, and it's > working fine without problems.>>> > > Field or fieldType? ...one of my fields with a fieldtype that does not have any analyzer defined at all, ... ;-) > > << So, it must

Re: DIH field options

2010-03-12 Thread Ahmet Arslan
> Forgive me but I'm slightly retarded... I grew up > underneath some power lines > ;) > > I've read through that wiki but I still can't find what I'm > looking for. I > just want to give one of the DIH entities/fields a static > value (ie it > doesnt come from a database column). How can I config

Re: KeywordTokenizer for faceting; was: Re: Analyzer for indexing only, not for queries

2010-03-12 Thread Erick Erickson
<<>> Field or fieldType? << So, it must be possible to define field type without specifying any analyzers. >> Truth to tell, I don't know off the top of my head what happens if you define no analyzer for a fieldType. I think it would be bad practice anyway, *I* want to *know* what indexing and a

Re: DIH field options

2010-03-12 Thread blargy
Forgive me but I'm slightly retarded... I grew up underneath some power lines ;) I've read through that wiki but I still can't find what I'm looking for. I just want to give one of the DIH entities/fields a static value (ie it doesnt come from a database column). How can I configure this? FYI th

Re: local solr geo_distance

2010-03-12 Thread Grant Ingersoll
Hi, From http://people.apache.org/~hossman/#solr-dev: Your question is better suited for the solr-u...@lucene mailing list ... not the solr-...@lucene list. solr-dev is for discussing development of the internals of the Solr application ... it is *not* the appropriate place to ask questions abou

Re: local solr geo_distance

2010-03-12 Thread wicketnewuser
Anyone with localsolr experience? wicketnewuser wrote: > > Hi I'm getting geo_distance as str eventhough I'm define the field as > tdouble. my search looks like > /solr/select?&qt=geo&lat=xx.xx&long=yy.yy&q=*&radius=10 > Is there anyway i can get is as > double instead of str > -- View this m

KeywordTokenizer for faceting; was: Re: Analyzer for indexing only, not for queries

2010-03-12 Thread Michael Kuhlmann
Hi Erick, thank you very much for your help. What's confusing me is that another of my fields does not have any analyzers defined at all, and it's working fine without problems. So, it must be possible to define field type without specifying any analyzers. I don't understand why it shouldn't be po

A bunch of questions

2010-03-12 Thread Shawn Heisey
Does SolrCloud's notion of a "collection", which appears to use cores, override normal multi-core usage for building an offline index and quickly swapping it into production? Some of the features in SolrCloud look useful, if it's still possible to exert manual control over cores and shards. A

Where are my boosts ?

2010-03-12 Thread pcmanprogrammeur
Hey! I use boosts on my documents, and Solr takes my boosts into consideration. But now I would like to get the "boost" of each document, and the boosts are not present in my XML result. Why? What I have in my XML result: 1 What I want in my XML result: 1 Thanks for your help!

FW: highlighting snippet length

2010-03-12 Thread Sean Bronée
Hi everybody, I would need a little help understanding what seems to be a rather erratic behavior in Solr Highlighting. In my query I want the field "text" to be summarized, and want a maximum of 300 characters and 3 snippets. I have therefore set fragsize=100 and snippets=3. But there seems to b
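The settings described above can be written out as request parameters. This is a rough sketch in Python that only assembles the query string; the query `text:solr` is a made-up placeholder, and whether Solr treats `hl.fragsize` as a strict limit is not something this sketch settles.

```python
from urllib.parse import urlencode

# Highlighting settings from the message: field "text", fragments of about
# 100 characters, at most 3 snippets. The query itself is a placeholder.
params = {
    "q": "text:solr",
    "hl": "true",
    "hl.fl": "text",
    "hl.fragsize": 100,
    "hl.snippets": 3,
}

query_string = urlencode(params)
# Intended upper bound on highlighted text per document.
max_highlight_chars = params["hl.fragsize"] * params["hl.snippets"]

print(query_string)
print(max_highlight_chars)
```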

Re: AutoSuggest

2010-03-12 Thread Erick Erickson
Did you commit your changes? Erick On Fri, Mar 12, 2010 at 7:38 AM, Suram wrote: > > Can set my index fields for auto Suggestion, sometime the newly index field > not found for auto suggestion and index search > -- > View this message in context: > http://old.nabble.com/AutoSuggest-tp27874542p2

Re: Solr Performance Issues

2010-03-12 Thread Erick Erickson
You've probably already looked at this, but here goes anyway. The first question probably should have been "what are you measuring"? I've been fooled before by looking at, say, average response time and extrapolating. You're getting 20 qps if your response time is 1 second, but you have 20 threads
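The relationship between thread count, response time, and queries per second mentioned above can be checked with a few lines of arithmetic. This is a minimal sketch in Python assuming every client thread always has exactly one request in flight; the function name `throughput_qps` is hypothetical.

```python
def throughput_qps(concurrent_requests: int, avg_response_seconds: float) -> float:
    """Estimate queries per second from concurrency and average response time.

    Assumes each client thread issues its next request as soon as the
    previous one returns, so throughput = concurrency / response time.
    """
    if avg_response_seconds <= 0:
        raise ValueError("avg_response_seconds must be positive")
    return concurrent_requests / avg_response_seconds


# 20 threads, each waiting 1 second per request.
print(throughput_qps(20, 1.0))
# A single thread at the same response time.
print(throughput_qps(1, 1.0))
```

Under this assumption, the same 20 qps can come from one thread with 50 ms responses or from 20 threads with 1 s responses, which is why the question of what is being measured matters.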

Re: Highlighting Results

2010-03-12 Thread Ahmet Arslan
> Hi All > > Im not sure where i'm going wrong but highlighting does not > seem to work for me. > > I have indexed around 5000 PDF documents which went well. > > Running normal queries against the attr_content works > well. > > When adding any hl code it does not seem to make a bit of > differe

RE: Highlighting Results

2010-03-12 Thread Dave Searle
Hi Lee, What issues are you having mate? The highlighted fragments appear in a different section to the response, toward the bottom (can't remember the element names without referring to the docs). The stored fields returned are not replaced with the highlighted fragments. Cheers Dave -O

Re: Analyzer for indexing only, not for queries

2010-03-12 Thread Erick Erickson
Well, what would you have SOLR do that makes sense if you don't define a query analyzer? Very very strange things happen if you use different analyzers for indexing and querying. At least defaulting that way has a *chance* of giving expected results... Why not use, say, KeywordTokenizerFactory if

Re: Best Practices for Runtime Index Updates

2010-03-12 Thread Erick Erickson
You'll find many discussions of this topic if you search the mail archive for "near real time". Erick 2010/3/12 Kranti™ K K Parisa > Hi, > > What are the Best Practices for Runtime Index Updates? Means we have index > and user may add some data like tags, notes..etc to each solr document. >

Re: Architectural help

2010-03-12 Thread Erick Erickson
Data Import Handler, see http://wiki.apache.org/solr/DataImportHandler Erick On Fri, Mar 12, 2010 at 12:08 AM, Dennis Gearon wrote: > What is DIH? I feel like I'm saying, "Duh . . .", sorry. > > > Dennis Gearon > > Signature Warning > > EARTH has a Right To Life, > otherwise we

AutoSuggest

2010-03-12 Thread Suram
Can I set my index fields for auto-suggestion? Sometimes a newly indexed field is not found for auto-suggestion and index search.

dataImport handler: how to figure out errors

2010-03-12 Thread Julian Davchev
Hello folks, I have two questions regarding dataimport handler. Using mysql datasource with mysql java connector. 1. Full import works fine http://example.com:8983/solr/dataimport?command=full-import delta-import as well Question is how can I figure out if there are errors when running delta

Best way to get usable terms out of TermComponent

2010-03-12 Thread Mark Roberts
Hi, I want to implement a suggestion/autocomplete dropdown on my searchbox - can anyone help with: 1) Is the TermComponent the advised way for autocomplete? 2) How can I ensure that the returned terms are complete, valid and English words? Any help much appreciated.

Re: issue with delete index

2010-03-12 Thread muneeb
Thanks for replying Yonik. I did actually restart Jetty after changing the schema and before re-indexing. But I realized that it was to do with the 'omitNorms' field attribute: I had titles with length 3 and 4 with similar length normalization score (0.5), which I was, wrongly, expecting to be diffe

Re: field length normalization

2010-03-12 Thread muneeb
Ah I see. Thanks very much Jay for your explanation, it really helped a lot. I guess I have to deal with this in some other way, since I am working with short titles and I really want short titles to appear at top. Can you suggest anything to bring titles with length 3 to appear before titles wi

Re: distinct on my result

2010-03-12 Thread stocki
okay now its better ;9 thx =) gwk-4 wrote: > > Hi, > > Try replacing KeywordTokenizerFactory with a WhitespaceTokenizerFactory > so it'll create separate terms per word. After a reindex it should work. > > Regards, > > gwk > > On 3/11/2010 4:33 PM, stocki wrote: >> hey, >> >> okay i show

Fwd: Highlighting Results

2010-03-12 Thread Lee Smith
Can anyone help ?? Begin forwarded message: > From: Lee Smith > Date: 11 March 2010 17:25:59 GMT > To: solr-user@lucene.apache.org > Subject: Highlighting Results > Reply-To: solr-user@lucene.apache.org > > Hi All > > Im not sure where i'm going wrong but highlighting does not seem to work for

Re: Solr Performance Issues

2010-03-12 Thread Siddhant Goel
I've allocated 4GB to Solr, so the rest of the 4GB is free for the OS disk caching. I think that at any point of time, there can be a maximum of concurrent requests, which happens to make sense btw (does it?). As I increase the number of threads, the load average shown by top goes up to as high

Analyzer for indexing only, not for queries

2010-03-12 Thread Michael Kuhlmann
Hi all, I have a field with some kind of category tree as a string. The format is like this: "prefix>first>second#prefix>otherfirst>othersecond" So, the document is categorized in two categories, separated by '#', and all categories start with the same prefix which I don't want to use. F
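The category format described above can be split apart as follows. This is a minimal sketch in Python, assuming '#' separates categories, '>' separates levels, and the first level is always the unwanted prefix; the function name `parse_categories` is hypothetical, and what the original poster does with the result is cut off in the preview.

```python
def parse_categories(raw: str, category_sep: str = "#", level_sep: str = ">") -> list[list[str]]:
    """Split a category string into paths and drop the leading prefix level.

    Illustration only: assumes every category starts with the same prefix
    as its first level.
    """
    paths = []
    for category in raw.split(category_sep):
        levels = category.split(level_sep)
        paths.append(levels[1:])  # skip the shared prefix
    return paths


raw_value = "prefix>first>second#prefix>otherfirst>othersecond"
print(parse_categories(raw_value))
```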