Re: [ANN] Zoie Solr Plugin - Zoie Solr Plugin enables real-time update functionality for Apache Solr 1.4+

2010-03-09 Thread Don Werve
2010/3/9 Shalin Shekhar Mangar > I think Don is talking about Zoie - it requires a long uniqueKey. > Yep; we're using UUIDs.

Re: More contextual information in analyzers

2010-03-09 Thread dbejean
So, the way I made my analyzer is the good one. Thank you. hossman wrote: > > > : If I write a custom analyser that accept a specific attribut in the > : constructor > : > : public MyCustomAnalyzer(String myAttribute); > : > : Is there a way to dynamically send a value for this attribute fro

Re: Is "UniqueKey" in schema and "pk" attribute for DataimportHandler entities still optional in solr 1.4?

2010-03-09 Thread Chris Hostetter
: I allways build solr index from scratch, so I don't have neither "pk" : attribute in "entity" tag (dataconfig.xml file) nor "UniqueKey" in index : schema. When I updated solr from 1.3 to 1.4 I got the following exception : during solr initialization: This is in fact a bug in Solr 1.4... https:/

Re: More contextual information in analyzers

2010-03-09 Thread Chris Hostetter
: If I write a custom analyser that accept a specific attribut in the : constructor : : public MyCustomAnalyzer(String myAttribute); : : Is there a way to dynamically send a value for this attribute from Solr at : index time in the XML Message ? : : : : . fundamentally there are tw

Solr ad on stackoverflow.com

2010-03-09 Thread Mauricio Scheffer
Stackoverflow.com is serving ads for open source projects: http://meta.stackoverflow.com/questions/31913/open-source-advertising-sidebar-1h-2010 I think it would be good publicity for Solr to have a banner there... anyone up for designing one? (if it's ok with the Solr dev team, of course) Cheers

Re: digest

2010-03-09 Thread Chris Hostetter
: Mailing-List: contact solr-user-h...@lucene.apache.org; run by ezmlm : Precedence: bulk : List-Help: ...if you send mail to that address it should have info about subscribing in digest mode. And PS... : Subject: digest : In-Reply-To: <8f0ad1f3100309

Scaling indexes with high document count

2010-03-09 Thread Peter Sturge
Hello, I wonder if anyone might have some insight/advice on index scaling for high document count vs size deployments... The nature of the incoming data is a steady stream of, on average, 4GB per day. Importantly, the number of documents inserted during this time is ~7million (i.e. lots of small

Re: Using SOLR

2010-03-09 Thread Erick Erickson
Well, the LukeRequestHandler lets you peek at the index, see: http://wiki.apache.org/solr/LukeRequestHandler warning: it'll take a bit for this to make lots of sense. You can get a copy of Luke (google Lucene Luke) for what the above is based on, point it at your index and have at it. One bit of

Architectural help

2010-03-09 Thread blargy
I was wondering if someone could be so kind to give me some architectural guidance. A little about our setup. We are RoR shop that is currently using Ferret (no laughs please) as our search technology. Our indexing process at the moment is quite poor as well as our search results. After some deli

Re: digest

2010-03-09 Thread Erick Erickson
Not that I know of, but you can certainly search it at: http://old.nabble.com/Solr-f14479.html or http://www.lucidimagination.com/search/ and there's the Wiki at: http://wiki.apache.org/solr/FrontPage Erick On Tue, Mar 9, 2010 at 7:12 PM, Dennis Gearon wrote: > Is there a digest mode to this l

Using SOLR

2010-03-09 Thread CP Hennessy
Hi, I'm trying to figure out if SOLR is the component I need and if so that I'm asking the right questions :) I need to index a large set of multilingual documents against a project specific taxonomy. From what I've read SOLR should be perfect for this. However I'm not sure that my approac

Re: Index an entire Phrase and not it's constituent parts?

2010-03-09 Thread Erick Erickson
P.S. although phrase queries with fields that do NOT have stopwords removed feels kinda like what you're hinting at. Erick On Tue, Mar 9, 2010 at 6:49 PM, Erick Erickson wrote: > I think you need to back up and tell us what you're > trying to accomplish from a higher level. > See Hossman's apach

Re: Extracting content from mailman managed mail list archive

2010-03-09 Thread Chris Hostetter
: I just checked popular search services and it seems that neither : lucidimagination search nor search-lucene support this: it really depends on what you want to do ... most people i know who index email want to include quoted portions in the message because it's part of the context of the me

Re: Documents disappearing

2010-03-09 Thread Chris Hostetter
: A quick check did show me a couple of duplicates, but if I understand : correctly, even if two different process send the same document, the last : one should update the previous. If I send the same documents 10 times, in : the end, it should only be in my index once, no? it should yes ... i di

digest

2010-03-09 Thread Dennis Gearon
Is there a digest mode to this list? It's very active and helpful. I'm just not fully 'dove in' to using it yet. Just need to look in the digests for answers to my questions. Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat,

Re: Warning : no lockType configured for...

2010-03-09 Thread Chris Hostetter
: Ok I think I know where the problem is ... : It's the constructor used by SolrCore in r772051 Ughhh... so to be clear: you haven't been using Solr 1.4 at any point in this thread? that explains why no one else could recreate the problem you were describing. For future reference: if

Re: Index an entire Phrase and not it's constituent parts?

2010-03-09 Thread Erick Erickson
I think you need to back up and tell us what you're trying to accomplish from a higher level. See Hossman's apache page: Your question appears to be an "XY Problem" ... that is: you are dealing with "X", you are assuming "Y" will help you, and you are asking about "Y" without giving more details a

Re: Weird issue with solr and jconsole/jmx

2010-03-09 Thread Chris Hostetter
: I connected to one of my solr instances with Jconsole today and : noticed that most of the mbeans under the solr hierarchy are missing. : The only thing there was a Searcher, which I had no trouble seeing : attributes for, but the rest of the statistics beans were missing. : They all show up jus

Re: CoreAdminHandler question

2010-03-09 Thread Chris Hostetter
I *think* that you can use the same instanceDir for multiple cores, the key issue being that you need to make sure they each have distinct dataDirs (which as i recall can be done using property replacement with the core name) : The action CREATE creates a new core based on preexisting : instan

Re: search and count ocurrences

2010-03-09 Thread Chris Hostetter
: I need to implement a search where i should count the number of times : the string appears on the search field, : : ie: only return articles that mention the word 'HP' at least 2x. ... : Is there a way that SOLR does this type of operation for me? you'd have to implement it in a custo
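A rough client-side alternative, not the in-Solr customisation the reply is pointing at: fetch the candidate articles with an ordinary query, then keep only those whose text mentions the term often enough. The function names, the `body` field and the sample articles are all made up for illustration.

```python
import re

def count_term(text, term):
    """Count whole-word, case-sensitive occurrences of term in text."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text))

def filter_by_min_count(docs, field, term, min_count=2):
    """Keep only docs whose field mentions term at least min_count times."""
    return [d for d in docs if count_term(d.get(field, ""), term) >= min_count]

# Hypothetical search results already returned to the client.
articles = [
    {"id": 1, "body": "HP announced a new laptop. HP shares rose."},
    {"id": 2, "body": "The HP printer jammed."},
]
matches = filter_by_min_count(articles, "body", "HP")
print([d["id"] for d in matches])
```

This only sees the documents the client actually fetched, so paging and hit counts would have to be handled separately.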

Re: Removing duplicate values from multivalued fields

2010-03-09 Thread Chris Hostetter
: Is there a way to remove duplicate values from the multivalued fields? I am : using Solrj client with solr 1.4 version. not trivially, but you could write an UpdateProcessor to do this fairly trivially, or implement it in the client. -Hoss
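For the "implement it in the client" route, a minimal sketch of the idea: strip repeated values from the multivalued field before the document is sent, keeping the first occurrence of each. It is plain Python rather than SolrJ, and the `tags` field and document are invented for the example.

```python
def dedupe_values(values):
    """Return values with duplicates removed, preserving first-seen order."""
    seen = set()
    unique = []
    for v in values:
        if v not in seen:
            seen.add(v)
            unique.append(v)
    return unique

# Hypothetical document with a multivalued "tags" field.
doc = {"id": "doc-1", "tags": ["solr", "lucene", "solr", "search", "lucene"]}
doc["tags"] = dedupe_values(doc["tags"])
print(doc["tags"])
```

The same loop would translate directly to a Java client, for example by collecting values into a `LinkedHashSet` before adding them to the document.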

Re: Search on dynamic fields which contains spaces /special characters

2010-03-09 Thread Chris Hostetter
: I do not believe the SOLR or LUCENE syntax allows this At the lowest level, Solr and Lucene-Java both support any arbitrary character you want in the field name -- it's just that several features use syntax that doesn't play nicely with characters like whitespace in field names. when using t

RE: Index an entire Phrase and not it's constituent parts?

2010-03-09 Thread Christopher Ball
Unfortunately, I don't see how the KeywordTokenizerFactory could work given the field in question is delimited text (paragraphs) and the KeywordTokenizerFactory essentially does nothing to the inbound content. Feel like I must be missing something . . . but can't figure out what. Do I reall

Re: Highlighting

2010-03-09 Thread Ahmet Arslan
> Yes it shows when I run the debug > > - name="org.apache.solr.handler.component.HighlightComponent"> > 0.0 > > > Any other ideas ? Is the field attr_content stored? Are you querying this field? What happens when you append &hl.maxAnalyzedChars=-1 to your search URL?
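To try that suggestion without hand-editing the URL, something like the sketch below builds the request string. The `hl.*` parameters are the ones quoted in this thread; the base URL and the query value are assumptions and need to be replaced with your own.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Assumed location of the select handler; adjust host, port and core path.
BASE_URL = "http://localhost:8983/solr/select"

def build_highlight_url(query, base_url=BASE_URL):
    """Build a search URL with the highlighting parameters from the thread."""
    params = [
        ("q", query),
        ("hl", "true"),
        ("hl.fl", "attr_content"),
        ("hl.snippets", "2"),
        ("hl.fragsize", "100"),
        ("hl.maxAnalyzedChars", "-1"),  # the extra parameter being suggested
    ]
    return base_url + "?" + urlencode(params)

url = build_highlight_url("attr_content:solr")  # hypothetical query
print(url)
print(parse_qs(urlsplit(url).query)["hl.maxAnalyzedChars"])
```

`urlencode` takes care of escaping the colon in the query, which is easy to get wrong when pasting parameters onto a URL by hand.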

Re: SolrJ commit options

2010-03-09 Thread Chris Hostetter
: One technique to control commit times is to do automatic commits: you : can configure a core to commit every N seconds (really milliseconds, : but less than 5 minutes becomes difficult) and/or every N documents. : This promotes a more fixed amount of work per commit. ...but increasing commit f

Re: SolrConfig - constructing the object

2010-03-09 Thread Chris Hostetter
: Now a few days later I am thinking, I need access to the SolrConfig object : in multiple classes. Maybe I should not be reloading it over and over? I : see that there is a getSolrConfig() method in the SolrCore class that will : return the SolrConfig object. : Should I maybe just take all m

Re: Cleaning up dirty OCR

2010-03-09 Thread simon
On Tue, Mar 9, 2010 at 2:35 PM, Robert Muir wrote: > > Can anyone suggest any practical solutions to removing some fraction of > the tokens containing OCR errors from our input stream? > > one approach would be to try > http://issues.apache.org/jira/browse/LUCENE-1812 > > and filter terms that on

Re: Highlighting

2010-03-09 Thread Lee Smith
Yes it shows when I run the debug - 0.0 Any other ideas ? On 9 Mar 2010, at 21:06, Joe Calderon wrote: > did u enable the highlighting component in solrconfig.xml? try setting > debugQuery=true to see if the highlighting component is even being > called... > > On Tue, Mar 9, 2010 at 12:

Distributed search fault tolerance

2010-03-09 Thread Shawn Heisey
I attended the Webinar on March 4th. Many thanks to Yonik for putting that on. That has led to some questions about the best way to bring fault tolerance to our distributed search. High level question: Should I go with SolrCloud, or stick with 1.4 and use load balancing? I hope the rest of

Re: Highlighting

2010-03-09 Thread Joe Calderon
did u enable the highlighting component in solrconfig.xml? try setting debugQuery=true to see if the highlighting component is even being called... On Tue, Mar 9, 2010 at 12:23 PM, Lee Smith wrote: > Hey All > > I have indexed a whole bunch of documents and now I want to search against > them. >

Highlighting

2010-03-09 Thread Lee Smith
Hey All I have indexed a whole bunch of documents and now I want to search against them. My search is going great all but highlighting. I have these items set hl=true hl.snippets=2 hl.fl = attr_content hl.fragsize=100 Everything works apart from the highlighted text found not being surrounded

Re: master/slave

2010-03-09 Thread Peter Sturge
Hi Dino, I suppose you could write your own ReplicationHandler to do the replication yourself, but I should think the effort involved would be better spent deploying the existing Solr http replication or using a Hadoop-based solution, or UNIX scripting. By far, the easiest path to replication is

Re: Cleaning up dirty OCR

2010-03-09 Thread Robert Muir
> Can anyone suggest any practical solutions to removing some fraction of the > tokens containing OCR errors from our input stream? one approach would be to try http://issues.apache.org/jira/browse/LUCENE-1812 and filter terms that only appear once in the document. -- Robert Muir rcm...@gmail
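As a toy illustration of the "terms that only appear once in the document" idea, and not of what the LUCENE-1812 patch itself does: count each token within a single document and drop the ones seen only once. Reading "filter" as "remove" is my assumption, and the sample page is invented.

```python
from collections import Counter

def drop_singleton_terms(tokens):
    """Drop tokens that occur exactly once in this document's token list."""
    counts = Counter(tokens)
    return [t for t in tokens if counts[t] > 1]

# Hypothetical OCR'd page; "qnick" stands in for an OCR error.
page = "the qnick brown fox and the brown dog".split()
print(drop_singleton_terms(page))
```

On real text this also throws away legitimate rare words, so it trades some recall for a smaller term dictionary.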

Cleaning up dirty OCR

2010-03-09 Thread Burton-West, Tom
Hello all, We have been indexing a large collection of OCR'd text. About 5 million books in over 200 languages. With 1.5 billion OCR'd pages, even a small OCR error rate creates a relatively large number of meaningless unique terms. (See http://www.hathitrust.org/blogs/large-scale-search/too

Is "UniqueKey" in schema and "pk" attribute for DataimportHandler entities still optional in solr 1.4?

2010-03-09 Thread Alexandr Savochkin
I always build the Solr index from scratch, so I have neither a "pk" attribute in the "entity" tag (dataconfig.xml file) nor a "UniqueKey" in the index schema. When I updated Solr from 1.3 to 1.4, I got the following exception during Solr initialization: --

Re: Store input text after analyzers and token filters

2010-03-09 Thread JCodina
Otis, I've been thinking on it, and trying to figure out the different solutions - Try to solve it doing a bridge between solr and clustering. - Try to solve it before/during indexing The second option, of course is better for performance, but how to do it?? I think a good option may be to crea

Re: SolrConfig - constructing the object

2010-03-09 Thread Mark Miller
Yes - I think you should if you can. If you can make them SolrAware that is - only certain plugin classes have the ability to do so (due to a runtime check against a list of approved classes) - Mark On 03/09/2010 01:28 PM, Kimberly Kantola wrote: Thank you Mark for your help. Now a few days l

Re: SolrConfig - constructing the object

2010-03-09 Thread Kimberly Kantola
Thank you Mark for your help. Now a few days later I am thinking, I need access to the SolrConfig object in multiple classes. Maybe I should not be reloading it over and over? I see that there is a getSolrConfig() method in the SolrCore class that will return the SolrConfig object. Should I m

Re: Filter to cut out all zeros?

2010-03-09 Thread Ahmet Arslan
> I'm trying to figure out the best way to cut out all zeros > of an input string like "01.10." or "022.300"... > Is there such a filter in Solr or anything similar that I > can adapt to do the task? With solr.MappingCharFilterFactory[1] you can replace all zeros with "" before tokenizer. Sol
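To preview what mapping every "0" to the empty string would do to the sample inputs before touching the schema, a quick sketch in plain Python (this mimics the effect only; it is not Solr configuration, and the helper name is made up):

```python
def strip_zeros(value):
    """Remove every '0' character, as a "0" => "" character mapping would."""
    return value.replace("0", "")

for raw in ["01.10.", "022.300"]:
    print(raw, "->", strip_zeros(raw))
```

Note that this removes all zeros, including ones inside a number, so "100" and "1" become the same token.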

Re: master/slave

2010-03-09 Thread Dino Di Cola
Ok Peter for script-based replication; I forgot to mention I already verified that mechanism. When I configure the slave as follows http://localhost:8983/solr/admin/replication 00:00:20 ... SOLR uses the org.apache.solr.handler.ReplicationHandler to access ev

Re: Solr Startup CPU Spike

2010-03-09 Thread John Williams
Yonik, We are on Solr 1.3. The total number of documents is 54173459. Let me know if need any additional info. Thanks, John -- John Williams System Administrator 37signals On Mar 9, 2010, at 11:39 AM, Yonik Seeley wrote: > Ahhh, FieldCache loading... what version of Solr are you usin

Re: Solr Startup CPU Spike

2010-03-09 Thread John Williams
Mark, I am trying to load that url but its taking quite a while. I will let you know if/when it loads. -John -- John Williams System Administrator 37signals On Mar 9, 2010, at 11:38 AM, Mark Miller wrote: > Ah - loading the fieldcache - do you have a *lot* of unique terms in the > fi

Re: master/slave

2010-03-09 Thread Peter Sturge
The EmbeddedSolrServer doesn't have any HTTP layer, so you can't use the HTTP replication. You can use the script-based replication if you're on Linux/UNIX. See: http://wiki.apache.org/solr/CollectionDistribution It would be worth looking at using Solr in a Jetty container and using the http replicati

Re: Solr Startup CPU Spike

2010-03-09 Thread Yonik Seeley
Ahhh, FieldCache loading... what version of Solr are you using? It's interesting it would take that long to load too (and maxing out one CPU - doesn't look particularly IO bound). How many documents are in this index? -Yonik On Tue, Mar 9, 2010 at 12:33 PM, John Williams wrote: > Yonik, > > I

Re: Solr Startup CPU Spike

2010-03-09 Thread Mark Miller
Ah - loading the fieldcache - do you have a *lot* of unique terms in the fields you are sorting/faceting on? localhost:8983/solr/admin/luke is helpful for checking this. -- - Mark http://www.lucidimagination.com On 03/09/2010 12:33 PM, John Williams wrote: Yonik, I have provided an image

Re: Solr Startup CPU Spike

2010-03-09 Thread John Williams
Yonik, I have provided an image below that gives details on what is causing the blocked HTTP thread. Is there any way to resolve this issue? Thanks, John -- John Williams System Administrator 37signals On Mar 9, 2010, at 10:41 AM, John Williams wrote: > Yonik, > > I got yourkit setup to profil

master/slave

2010-03-09 Thread Dino Di Cola
Dear all, I am trying to setup a master/slave index replication with two slaves embedded in a tomcat cluster and a master kept in a separate machine. I would like to know if is it possible to configure slaves with a ReplicationHandler able to access master by starting an embedded server instead of

Re: QueryElevationComponent blues

2010-03-09 Thread Ryan Grange
I'd read that too, but in the debug data queryBoosting is showing matches on our int typed identifiers (though it does show it as 123456). Is the problem that it can match against an integer, but it can't reorder them in the results? This seems unlikely as using a standard query and elevation

Filter to cut out all zeros?

2010-03-09 Thread Sebastian F
Hey there, I'm trying to figure out the best way to cut out all zeros of an input string like "01.10." or "022.300"... Is there such a filter in Solr or anything similar that I can adapt to do the task? Thanks for any help

tmp

2010-03-09 Thread Dino Di Cola
tmp

Re: Solr Startup CPU Spike

2010-03-09 Thread John Williams
Yonik, I got yourkit setup to profile the Tomcat instance and as you will see in the graph below all of the http threads are blocked (red) until around 4:40. This is the point where the instance becomes responsive and CPU usage drops. I have also ruled out GC being the issue by using the GC m

Embedded solr - SLF4J exception

2010-03-09 Thread John Ament
While attempting to work around my other issue, I'm trying to use an embedded solr server to try to programmatically load data into solr. It seems though that I can't deploy my app, as a result of this exception: : java.lang.IllegalAccessError: tried to access field org.slf4j.impl.StaticLoggerBind

Re: Can't delete from curl

2010-03-09 Thread Paul Tomblin
On Mon, Mar 8, 2010 at 9:39 PM, Lance Norskog wrote: > ... curl http://xen1.xcski.com:8080/solrChunk/nutch/select > > that should be /update, not /select Ah, that seems to have fixed it. Thanks. -- http://www.linkedin.com/in/paultomblin http://careers.stackoverflow.com/ptomblin

Re: PDF extraction leads to reversed words

2010-03-09 Thread Robert Muir
On Tue, Mar 9, 2010 at 9:44 AM, Abdelhamid ABID wrote: > I put ICU4J 4.2 in the lib of Solr, nothing changed, I'm trying now with > ICU4J 3.8 > Hello, what version of Solr are you using? I think you will need to use the trunk version. I created a patch for this issue that you can apply to trunk

Re: PDF extraction leads to reversed words

2010-03-09 Thread Abdelhamid ABID
I tried a couple of times to get this patch, but the download fails; a file-size mismatch or some similar error popped up. Is there another link? On 3/9/10, Dominique Bejean wrote: > > Hi, > > The problem comes from PDFBox ( > http://brutus.apache.org/jira/browse/PDFBOX-377) and is fixed now. However > Tik

Re: PDF extraction leads to reversed words

2010-03-09 Thread Abdelhamid ABID
I put ICU4J 4.2 in the lib of Solr, nothing changed, I'm trying now with ICU4J 3.8 On 3/9/10, Robert Muir wrote: > > I think the problem is that Solr does not include the ICU4J jar, so it > won't work with Arabic PDF files. > > Try putting ICU4J 3.8 (http://site.icu-project.org/download) in your

Re: Confused by Solr Ranking

2010-03-09 Thread Erick Erickson
Well, that's a matter of opinion, isn't it? If *your* application requires this, you could always copy the field to a non-stemmed field and apply boosts... Erick On Tue, Mar 9, 2010 at 9:21 AM, abhishes wrote: > > I kind of suspected stemming to be the reason behind this. But I consider > stemm

Dummy boost question

2010-03-09 Thread Mark Roberts
Hi, I have indexed some documents that have title, content and keyword (multi-value). I want to *search* on title and content, and then, within these results *boost* by keyword. I have set up my qf as such: content^0.5 title^1.0 And my bq as such: keyword:(*.*

Re: Confused by Solr Ranking

2010-03-09 Thread Michael Lackhoff
On 09.03.2010 16:01 Ahmet Arslan wrote: > >> I kind of suspected stemming to be the reason behind this. >> But I consider stemming to be a good feature. > > This is the side effect of stemming. Stemming increases recall while harming > precision. But most people want the best possible combinat

Re: Warning : no lockType configured for...

2010-03-09 Thread Mani EZZAT
Ok I think I know where the problem is @Deprecated public SolrIndexWriter(String name, String path, DirectoryFactory dirFactory, boolean create, IndexSchema schema, SolrIndexConfig config) throws IOException { super(getDirectory(path, dirFactory, null), config.luceneAutoCom

Re: PDF extraction leads to reversed words

2010-03-09 Thread Robert Muir
On Tue, Mar 9, 2010 at 10:10 AM, Abdelhamid ABID wrote: > nor 3.8 version does change anythings ! > the patch (https://issues.apache.org/jira/browse/SOLR-1813) can only work on Solr trunk. It will not work with Solr 1.4. Solr 1.4 uses pdfbox-0.7.3.jar, which does not support Arabic. Solr trunk

Re: PDF extraction leads to reversed words

2010-03-09 Thread Abdelhamid ABID
nor 3.8 version does change anythings ! On 3/9/10, Robert Muir wrote: > > I think the problem is that Solr does not include the ICU4J jar, so it > won't work with Arabic PDF files. > > Try putting ICU4J 3.8 (http://site.icu-project.org/download) in your > classpath. > > > On Mon, Mar 8, 2010 at 6

Re: Tomcat save my Index temp ...

2010-03-09 Thread stocki
Okay, I got it; I was being stupid. I set my dataDir to /var/data/solr/... and gave it the correct permissions, and now it runs. Jens Kapitza-2 wrote: > > On 08.03.2010 15:08, stocki wrote: >> Hello. >> >> I use 2 cores for Solr. >> >> When I restart my Tomcat on Debian, Tomcat deletes my index. >> > y

Re: Confused by Solr Ranking

2010-03-09 Thread abhishes
I kind of suspected stemming to be the reason behind this. But I consider stemming to be a good feature. The point is that if an exact match exists, then solr should report that first and then stemmed results should be reported. disabling stemming altogether would be a step in the wrong dire

Re: Confused by Solr Ranking

2010-03-09 Thread Ahmet Arslan
> I kind of suspected stemming to be the reason behind this. > But I consider stemming to be a good feature. This is the side effect of stemming. Stemming increases recall while harming precision.

Re: Search on dynamic fields which contains spaces /special characters

2010-03-09 Thread Erick Erickson
Please repost as a separate thread.. From: http://people.apache.org/~hossman/#threadhijack When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track

Re: Child entities in document not loading

2010-03-09 Thread John Ament
So right now I'm thinking that solr just doesn't like me. I just noticed that the following document config doesn't work for me

Re: Confused by Solr Ranking

2010-03-09 Thread Shalin Shekhar Mangar
On Tue, Mar 9, 2010 at 4:38 PM, abhishes wrote: > > I am indexing a column in a database. I have chosen field type of text for > this column (this type was defined in the sample schema file which comes in > the Solr Example). > > When I search for the word "impress" and top 3 results. I get these

RE: HTML encode extracted docs - Problems with solr.HTMLStripCharFilter

2010-03-09 Thread Mark Roberts
Sounds like "solr.HTMLStripCharFilter" may work... except, I'm getting a couple of problems: 1) HTML still seems to be getting into my content field All I did was add to the index analyzer for the my "text" fieldType. 2) Some it seems to have broken my highlighting, I get this error: 'org.a

Re: [ANN] Zoie Solr Plugin - Zoie Solr Plugin enables real-time update functionality for Apache Solr 1.4+

2010-03-09 Thread Shalin Shekhar Mangar
I think Don is talking about Zoie - it requires a long uniqueKey. On Tue, Mar 9, 2010 at 10:18 AM, Lance Norskog wrote: > Solr unique ids can be any type. The QueryElevateComponent complains > if the unique id is not a string, but you can comment out the QEC. I > have one benchmark test with 2

Re: PDF extraction leads to reversed words

2010-03-09 Thread Abdelhamid ABID
I don't know about pdftotext. Is it pluggable with Solr, or do we need to hard-code the extraction step before Solr's turn? On 3/9/10, Dominique Bejean wrote: > > Hi, > > The problem comes from PDFBox ( > http://brutus.apache.org/jira/browse/PDFBOX-377) and is fixed now. However > Tika doesn't yet

Re: PDF extraction leads to reversed words

2010-03-09 Thread Robert Muir
this depends on what version of solr you are using, the trunk version has a version of tika that supports this. See SOLR-1813 On Tue, Mar 9, 2010 at 3:59 AM, Dominique Bejean wrote: > Hi, > > The problem comes form PDFBox > (http://brutus.apache.org/jira/browse/PDFBOX-377) and is fixed now. Howev

Re: Confused by Solr Ranking

2010-03-09 Thread Avi Rosenschein
> > > > I kind of suspected stemming to be the reason behind this. > > But I consider stemming to be a good feature. > > This is the side effect of stemming. Stemming increases recall while > harming precision. > This is a side effect of stemming, the way it is currently implemented in Lucene. Ste

Re: Wildcard question -- case issue

2010-03-09 Thread cjkadakia
Understood. My solution was to convert any search terms with an asterisk to lowercase prior to submitting to solr and it seems to be working correctly now. Thanks for your help. -- View this message in context: http://old.nabble.com/Wildcard-questioncase-issue-tp27823332p27836740.html Sent f
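A minimal sketch of that workaround, assuming simple whitespace-separated queries: any term containing an asterisk is lowercased before the query goes to Solr, while a `field:` prefix, if present, is left alone because field names are case-sensitive. The function name and the sample queries are invented.

```python
def lowercase_wildcard_terms(query):
    """Lowercase the value part of every whitespace-separated term with '*'."""
    out = []
    for term in query.split():
        if "*" in term:
            # Keep any "field:" prefix untouched; lowercase only the value.
            field, sep, value = term.rpartition(":")
            term = field + sep + value.lower()
        out.append(term)
    return " ".join(out)

print(lowercase_wildcard_terms("Appl* AND Banana"))
print(lowercase_wildcard_terms("Title:Appl* OR Pear"))
```

Quoted phrases and escaped characters are not handled here; a real client would need a proper query parser for those.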

Re: PDF extraction leads to reversed words

2010-03-09 Thread Abdelhamid ABID
I'm using 1.4 version of Solr On 3/9/10, Robert Muir wrote: > > On Tue, Mar 9, 2010 at 9:44 AM, Abdelhamid ABID > wrote: > > I put ICU4J 4.2 in the lib of Solr, nothing changed, I'm trying now with > > ICU4J 3.8 > > > > > Hello, what version of Solr are you using? I think you will need to > use

Confused by Solr Ranking

2010-03-09 Thread abhishes
I am indexing a column in a database. I have chosen field type of text for this column (this type was defined in the sample schema file which comes in the Solr Example). When I search for the word "impress" and top 3 results. I get these 3 documents bare desire pronounce villainy draught beasts

indexing key/value field type

2010-03-09 Thread muneeb
Hi, I have built an index of several million documents with all primitive type fields, either String, text or int. I have another multivalued field to index now for each document which is a list of tags as a hashmap, so: tags , where key is String and value is Int. key is a given tag and value i

Re: PDF extraction leads to reversed words

2010-03-09 Thread Robert Muir
sorry for the link to the wrong JIRA issue, was looking at another issue. its here: https://issues.apache.org/jira/browse/SOLR-1813 again you will need to apply it to trunk I think, as thats the only place I have tested it. -- Robert Muir rcm...@gmail.com

Re: Tomcat save my Index temp ...

2010-03-09 Thread stocki
okay i install my solr so like how the wiki said. and a new try. here one of my two XML-files: /var/lib/conf/Catalina/localhost/suggest.xml should i set name="solr/home" to --> name="$SOLR_HOME" ??? id did not find the reason. Solr Home is set by : export JAVA_OPTS="$JAVA_OPTS -Dsolr.

Re: PDF extraction leads to reversed words

2010-03-09 Thread Dominique Bejean
Hi, The problem comes from PDFBox (http://brutus.apache.org/jira/browse/PDFBOX-377) and is fixed now. However Tika doesn't yet use this version of PDFBox. So for PDF text extraction, I don't use Tika but pdftotext. Dominique On 09/03/10 06:00, Robert Muir wrote: it is an optional depe