Re: Indexing HTML document

2010-03-03 Thread György Frivolt
Thank you! That's even more than I wanted to know. ;) Georg On Tue, Mar 2, 2010 at 10:05 PM, Walter Underwood wun...@wunderwood.org wrote: You are in luck, because Avi Rappoport has just written a tutorial about how to do this. It is available from Lucid Imagination:

Clustering from analyzed text instead of raw input

2010-03-03 Thread JCodina
I'm trying to use carrot2 (now I started with the workbench) and I can cluster any field, but, the text used for clustering is the original raw text, the one that was indexed, without any of the processing performed by the tokenizer or filters. So I get stop words. I also did shingles (after

Re: 2 Cores, 1 Table, 2 DataImporter -- Import at the same time ?

2010-03-03 Thread stocki
pleeease help me somebody =( :P stocki wrote: Hello again ;) i installed tomcat5.5 on my debian server ... i use 2 cores and two different DIHs with separate indexes, one for the normal search feature and the other core for the suggest feature. but i cannot start both DIHs with an

error in sum function

2010-03-03 Thread JCodina
Either the sum function or the map function is not parsed correctly. This sort works like a charm: sort=score+desc,sum(Num,map(Num,0,2000,42000))+asc but sort=score+desc,sum(map(Num,0,2000,42000),Num)+asc gives the following exception: SEVERE: org.apache.solr.common.SolrException: Must declare sort
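For readers unfamiliar with the two function queries in this sort: in Solr, map(x,min,max,target) replaces values of x that fall between min and max (inclusive) with target, and sum adds its arguments, so both argument orders should produce the same sort value. Below is a minimal Python sketch of that arithmetic only. The helper names (solr_map, solr_sum, num_first, map_first) are illustrative assumptions, not Solr APIs, and the sketch says nothing about the parser bug itself.

```python
def solr_map(x, lo, hi, target):
    """Mimic Solr's map(x,min,max,target): values in [lo, hi] become target."""
    return target if lo <= x <= hi else x


def solr_sum(*args):
    """Mimic Solr's sum(a,b,...): plain addition of all arguments."""
    return sum(args)


def num_first(num):
    # Corresponds to sum(Num,map(Num,0,2000,42000)), the order that parsed.
    return solr_sum(num, solr_map(num, 0, 2000, 42000))


def map_first(num):
    # Corresponds to sum(map(Num,0,2000,42000),Num), the order that failed.
    return solr_sum(solr_map(num, 0, 2000, 42000), num)
```

Since addition is commutative, num_first and map_first agree for every input, which is why the exception points at parsing rather than at the functions themselves.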

Re: Implementing hierarchical facet

2010-03-03 Thread Geert-Jan Brits
you could always define 1 dynamicField and encode the hierarchy level in the field name: <dynamicField name="_loc_hier_*" type="string" stored="false" indexed="true" omitNorms="true"/> using: facet=on&facet.field={!key=Location}_loc_hier_city&fq=_loc_hier_country:somecountryid ... adding cityarea later for
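A minimal sketch of how a client might assemble the request parameters for this scheme, written in Python. The _loc_hier_ prefix, the Location key and somecountryid come from the message above; the function name and the q=*:* match-all query are assumptions added for illustration.

```python
from urllib.parse import urlencode


def hierarchy_facet_query(facet_level, parent_level, parent_id, label="Location"):
    """Build a query string that facets on one hierarchy level,
    filtered by a value of its parent level."""
    params = [
        ("q", "*:*"),  # assumption: match everything, let fq do the narrowing
        ("facet", "on"),
        ("facet.field", "{!key=%s}_loc_hier_%s" % (label, facet_level)),
        ("fq", "_loc_hier_%s:%s" % (parent_level, parent_id)),
    ]
    # urlencode percent-escapes the braces, '!' and '=' in the local params.
    return urlencode(params)


query_string = hierarchy_facet_query("city", "country", "somecountryid")
```

Because the level is part of the field name, adding a deeper level later only means passing a new facet_level value; the schema's single dynamicField definition stays unchanged.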

Re: error in sum function

2010-03-03 Thread Koji Sekiguchi
Can you try the latest trunk? I fixed it a couple of days ago. Koji Sekiguchi from mobile On 2010/03/03, at 18:18, JCodina joan.cod...@barcelonamedia.org wrote: the sum function or the map one are not parsed correctly, doing this sort, works as a charm...

Solr with Tika - Text ordering garbled.

2010-03-03 Thread Wick2804
We are loading PDF documents with an OCR content layer into Solr through Tika. The load process appears to work fine and all of the words from the OCR layer are stored as Text in Solr, and therefore searchable. Our problem is that in the results returned from a search the words in the 'Text' field

Error on startup

2010-03-03 Thread Lee Smith
Hi All. I have shut down Solr and removed the index so I can start over, then relaunched. I am getting this error: SEVERE: REFCOUNT ERROR: unreferenced org.apache.solr.solrc...@14db38a4 (core1) has a reference count of 1 Any idea what this is a result of? Hope you can advise. Lee

Problems with variable geo_distance

2010-03-03 Thread Emad Mushtaq
Hi, I am having a very strange problem, related to local solr. In my documents there is a record for location called Gujranwala which is a city in Pakistan. I try to get search results with respect to coordinates of Lahore (another city of Pakistan). When I do a search within 100 miles, there are

[ANN] Carrot2 3.2.0 released

2010-03-03 Thread Stanislaw Osinski
Dear All, I'm happy to announce three releases from the Carrot Search team: Carrot2 v3.2.0, Lingo3G v1.3.1 and Carrot Search Labs. Carrot2 is an open source search results clustering engine. Version v3.2.0 introduces: * experimental support for clustering Korean and Arabic content, * a

Re-index after Solr config file changed without restarting services

2010-03-03 Thread Marc Wilson
Hi, I am attempting to achieve what I believe many others have attempted in the past: allow an end user to modify a Solr config file through a custom UI and then roll out any changes made without restarting any services. Specifically, I want to be able to let the user edit the synonyms.txt

Re: Logging in Embedded SolrServer - What a nightmare.

2010-03-03 Thread Lucas F. A. Teixeira
Hello Kevin, No, it hasn't worked. I tried a lot of combinations of the log4j, slf4j and log4j-slf4j jars and had no success. As I said, for the solr.war, what you said seems to work, the same way I got it working by configuring jre/lib/logging.properties, but not with the embedded server...

Re: Clustering from analyzed text instead of raw input

2010-03-03 Thread Stanislaw Osinski
Hi Joan, I'm trying to use carrot2 (now I started with the workbench) and I can cluster any field, but, the text used for clustering is the original raw text, the one that was indexed, without any of the processing performed by the tokenizer or filters. So I get stop words. The easiest way

need help with Solr Cores

2010-03-03 Thread muneeb
Hi Everyone, I am new to Solr, and still trying to get my hands on it. I have indexed over 6 million documents and currently have a single large index. I update my index using SolrJ client due to the format I store my documents (i.e. JSON blobs) in database. I need to find a way to have

Best performance for facet dates in trunk using solr.TrieDateField

2010-03-03 Thread Marc Sturlese
Hey there, I am testing date facets in trunk with a huge index. Apparently, as the default solrconfig.xml shows, the fastest way to run date facet queries is to index the field with this data type: <!-- A Trie based date field for faster date range queries and date faceting. --> <fieldType

Re: 2 Cores, 1 Table, 2 DataImporter -- Import at the same time ?

2010-03-03 Thread Erik Hatcher
what's the error you're getting? is DIH keeping some static that prevents it from running across two cores separately? if so, that'd be a bug. Erik On Mar 3, 2010, at 4:12 AM, stocki wrote: pleeease help me somebody =( :P stocki wrote: Hello again ;) i install tomcat5.5

Re: error in sum function

2010-03-03 Thread JCodina
Ok, solved!!! Joan Koji Sekiguchi-2 wrote: Can you try it latest trunk? I have just fixed it in a couple of days Koji Sekiguchi from mobile On 2010/03/03, at 18:18, JCodina joan.cod...@barcelonamedia.org wrote: the sum function or the map one are not parsed correctly, doing

Re: Issue on stopword list

2010-03-03 Thread Suram
Joe Calderon-2 wrote: or you can try the commongrams filter that combines tokens next to a stopword On Tue, Mar 2, 2010 at 6:56 AM, Walter Underwood wun...@wunderwood.org wrote: Don't remove stopwords if you want to search on them. --wunder On Mar 2, 2010, at 5:43 AM, Erick Erickson

Re: 2 Cores, 1 Table, 2 DataImporter -- Import at the same time ?

2010-03-03 Thread stocki
okay, i changed the lockType to single but with no good effect. so i think now that my two DIHs are using the same data folder. why is that so? i thought that each DIH uses its own index ... ?! i think it is not possible to import from one table in parallel with more than one DIH ?! my exception:

Re: Clustering from analyzed text instead of raw input

2010-03-03 Thread JCodina
Thanks Staszek, I'll give the stopwords treatment a try, but the problem is that we perform POS tagging and then use payloads to keep only Nouns and Adjectives, and we thought that it could be interesting to perform clustering only with these elements, to avoid senseless words. Of course is a

Can I use .XML files instead of .OSM files

2010-03-03 Thread mamathahl
I'm very new to Solr. I downloaded apache-solr-1.5-dev and was trying out the example in order to first figure out how Solr is working. I found out that the data directory consisted of .OSM files. But I have an XML file consisting of latitude, longitude and relevant news for that location.

Re: need help with Solr Cores

2010-03-03 Thread muneeb
Figured it out !! I actually created two folders in solr.home/data folder, each holding the index for a given core. So for core0 and core1 i had indexes as: solr.home/data/core0/index solr.home/data/core1/index Feeling a little stupid now, having figured out a simple issue :s muneeb wrote:

How to see the query generated by MoreLikeThisHandler?

2010-03-03 Thread Christopher Bottaro
Hello, Is there a way to see exactly what query is generated by the MoreLikeThisHandler? If I send debugQuery=true then I see in the response a key called parsedquery but it doesn't seem quite right. What I mean by that is when I make the MoreLikeThis query, I set mlt.fl to title,content but

DisMaxRequestHandler questions about bf and bq

2010-03-03 Thread Christopher Bottaro
Hello, I have a couple of questions regarding the bf and bq params to the DisMaxRequestHandler. 1) Can I specify them more than once? Ex: bf=log(popularity)&bf=log(comment_count) 2) When using bq, how can I specify what score to use for documents not returned by the query? In other words,

Formatting Results

2010-03-03 Thread Lee Smith
Hey All, I am indexing around 10,000 documents with Solr Cell, which has gone superbly. I can of course search the content like the example given: http://localhost:8983/solr/select?q=attr_content:tutorial But what I would like is for Solr to return the document with x many words and the matched

RE: DIH onError question

2010-03-03 Thread Shah, Nirmal
Thanks for your prompt reply. I resolved the ERROR, and used continue to bypass any EXCEPTIONS. Nirmal Shah Remedy Consultant|Column Technologies|Cell: (630) 244-1648 -Original Message- From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:noble.p...@gmail.com] Sent: Tuesday, March 02, 2010 11:13

Re: Formatting Results

2010-03-03 Thread Marc Sturlese
I'll give you an example of how to configure your default SearchHandler to do highlighting, but I strongly recommend you check the wiki properly. Everything is really well explained in there: http://wiki.apache.org/solr/HighlightingParameters <str name="hl">true</str> <str

SOLR Index or database

2010-03-03 Thread caman
Hello All, Just struggling with the question of whether SOLR or a database would be a good option for me. Here are my requirements. We index about 600+ news/blogs into our system. The only information we store locally is the title, link and article snippet. We are able to index all these sources into SOLR index

Re: Error on startup

2010-03-03 Thread Marc Sturlese
If you shut down the server properly it's weird that you get an error when starting up again. How did you delete the index? I was experiencing something similar a long time ago because I was removing the content from the index folder but not the folder itself. The correct way to do it was to

Re: Formatting Results

2010-03-03 Thread Lee Smith
Thanks Mark Ill have a good look at that part now. And I managed to get it started again :-). Thank you again Lee On 3 Mar 2010, at 18:52, Marc Sturlese wrote: I'll give you an example about how to configure your default SearchHandler to do highlighting but I strongly recomend you to

Re: SOLR Index or database

2010-03-03 Thread Walter Underwood
You need two, maybe three things that Solr doesn't do (or doesn't do well): * field updating * storing content * real time search and/or simple transactions I would seriously look at Mark Logic for that. It does all of those, plus full-text search, gracefully, plus it scales. There is also a

Re: Can I use .XML files instead of .OSM files

2010-03-03 Thread Marc Sturlese
Are you sure you don't have a folder called exampledocs with xml files inside? These are the files to index as a first example: apache-solr-1.5-dev/example/exampledocs Check the /home/marc/Desktop/data/apache-solr-1.5-dev/example/solr/conf/schema.xml and solrconfig.xml and you will see how to

Re: Need suggestion regarding custom transformer

2010-03-03 Thread Marc Sturlese
I think you can handle that writing a custom transformer. There's a good explanation in the wiki: http://wiki.apache.org/solr/DIHCustomTransformer KshamaPai wrote: Hi, Am new to solr. I am trying location aware search with spatial lucene in solr1.5 nightly build. My table in mysql has

Randomize MoreLikeThis

2010-03-03 Thread André Maldonado
Hello. I'm implementing More Like This functionality in my search request. Everything works fine, but I need to randomize the return of this more like this query. Something like this: *First request:* Query - docId:528369 Results - fields ... More like This result name=528369 numFound=57162

Re: DisMaxRequestHandler questions about bf and bq

2010-03-03 Thread Erik Hatcher
On Mar 3, 2010, at 12:26 PM, Christopher Bottaro wrote: I have a couple of questions regarding the bf and bq params to the DisMaxRequestHandler. 1) Can I specify them more than once? Ex: bf=log(popularity)&bf=log(comment_count) Yes, you can use multiple bf parameters, each adding an
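A minimal sketch of sending several bf parameters from a client, in Python. The two boost functions are the ones from the question; the function name, the sample query text and the use of defType=dismax to select the parser are assumptions for illustration, and a given setup may select dismax differently.

```python
from urllib.parse import urlencode


def dismax_params(user_query, boost_functions):
    """Build a query string in which every boost function gets its own bf
    parameter, so the key 'bf' appears once per function."""
    params = [("defType", "dismax"), ("q", user_query)]
    # A list of pairs (rather than a dict) keeps the repeated 'bf' keys.
    params.extend(("bf", fn) for fn in boost_functions)
    return urlencode(params)


query_string = dismax_params("ipod", ["log(popularity)", "log(comment_count)"])
```

Using a list of pairs matters here: a plain dict would keep only one bf entry and silently drop the other.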

Weird issue with solr and jconsole/jmx

2010-03-03 Thread Andrew Greenburg
Hi, I connected to one of my solr instances with Jconsole today and noticed that most of the mbeans under the solr hierarchy are missing. The only thing there was a Searcher, which I had no trouble seeing attributes for, but the rest of the statistics beans were missing. They all show up just

Escaping options for tika/solr cell extract-only output

2010-03-03 Thread Dan Hertz (Insight 49, LLC)
Looking at http://wiki.apache.org/solr/ExtractingRequestHandler: Extract Only the output includes XML generated by Tika (and is hence further escaped by Solr's XML) ...is there an option to NOT have the resulting TIKA output escaped? so &lt;head&gt; would come back as <head/> If no, what would

Lucene: Finite-State Queries, Flexible Indexing, Scoring, and more

2010-03-03 Thread Otis Gospodnetic
Hello folks, Those of you in or near New York and using Lucene or Solr should come to Lucene: Finite-State Queries, Flexible Indexing, Scoring, and more on March 24th: http://www.meetup.com/NYC-Search-and-Discovery/calendar/12720960/ The presenter will be the hyper active Lucene committer

Re: Randomize MoreLikeThis

2010-03-03 Thread Otis Gospodnetic
The first thing that came to mind is to index a random number with each doc and sort by that. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message From: André Maldonado andre.maldon...@gmail.com
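A minimal sketch of the idea above in Python: attach a random number to each document before it is sent for indexing, then order by that value. The field name random_sort and both function names are hypothetical, and sorting is simulated locally here rather than done by Solr.

```python
import random


def add_random_sort_value(docs, rng=None, field="random_sort"):
    """Return copies of the documents with a random float in [0, 1) added.
    The input documents are left untouched."""
    rng = rng or random.Random()
    return [dict(doc, **{field: rng.random()}) for doc in docs]


def ordered_by_field(docs, field="random_sort"):
    """Local stand-in for sorting by the stored random field, ascending."""
    return sorted(docs, key=lambda doc: doc[field])


docs = [{"docId": 1}, {"docId": 2}, {"docId": 3}]
randomized = add_random_sort_value(docs, rng=random.Random(42))
shuffled = ordered_by_field(randomized)
```

One limitation worth noting: a value stored at index time yields the same order on every request until the documents are reindexed, so this gives an arbitrary order rather than a fresh shuffle per query.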

Re: Re-index after Solr config file changed without restarting services

2010-03-03 Thread Otis Gospodnetic
Marc, At least for the force Solr to reindex part, I think you'll need to index yourself. That is, you need to run whatever app you run when you (re)index the data normally. Solr won't automagically reindex the data. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop

Multi core Search is not working when used with SHARDS

2010-03-03 Thread JavaGuy84
Hi all, I am trying to search on multiple cores (distributed search) but not able to succeed using Shards. I am able to get the results when I am hitting each core separately, http://localhost:8981/solr/core1/select/?q=test http://localhost:8981/solr/core0/select/?q=test but when I try to use
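The message is cut off before the failing request, so the exact URL used is unknown. As a general illustration only, here is a minimal Python sketch of how a distributed request across the two cores above is usually put together: the shards parameter takes a comma-separated list of host:port/path entries without the http:// scheme. The function name is an assumption for illustration.

```python
from urllib.parse import urlencode


def sharded_select_url(base_url, shard_urls, query):
    """Build a select URL against one core that fans out to all listed shards."""
    # Drop the scheme: shards entries look like localhost:8981/solr/core0.
    shards = ",".join(url.split("://", 1)[-1] for url in shard_urls)
    params = urlencode({"q": query, "shards": shards}, safe=":/,")
    return base_url + "/select?" + params


cores = [
    "http://localhost:8981/solr/core0",
    "http://localhost:8981/solr/core1",
]
url = sharded_select_url(cores[0], cores, "test")
```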

Solr query parsing

2010-03-03 Thread Jason Rutherglen
Why would fq=sdate:+20100110 parse via a Solr server but not via QueryParsing.parseQuery? It's choking on the + symbol in the sdate value. I'd use QParserPlugin however it requires passing a SolrQueryRequest, which is not kosher for testing, perhaps I'll need to bite the bullet and reproduce

Re: Multi core Search is not working when used with SHARDS

2010-03-03 Thread Yonik Seeley
Hmmm, do you have a uniqueKey defined in your schemas? -Yonik http://www.lucidimagination.com On Wed, Mar 3, 2010 at 4:23 PM, JavaGuy84 bbar...@gmail.com wrote: Hi all, I am trying to search on multiple cores (distributed search) but not able to succeed using Shards. I am able to get

Can Solr Create New Indexes?

2010-03-03 Thread Thomas Nguyen
Is there a setting in the config I can set to have Solr create a new Lucene index if the dataDir is empty on startup? I'd like to open our Solr system to allow other developers here to add new cores without having to use the Lucene API directly to create the indexes.

Re: Can Solr Create New Indexes?

2010-03-03 Thread Mark Miller
On 03/03/2010 07:56 PM, Thomas Nguyen wrote: Is there a setting in the config I can set to have Solr create a new Lucene index if the dataDir is empty on startup? I'd like to open our Solr system to allow other developers here to add new cores without having to use the Lucene API directly to

weighted search and index

2010-03-03 Thread Jianbin Dai
Hi, I am trying to use solr for a content match application. A content is described by a set of keywords with weights associated, eg., C1: fruit 0.8, apple 0.4, banana 0.2 C2: music 0.9, pop song 0.6, Britney Spears 0.4 Those contents would be indexed in solr. In the search, I also have a set

RE: Can Solr Create New Indexes?

2010-03-03 Thread Thomas Nguyen
Hmm I've tried starting Solr with no Lucene index in the dataDir. Here's the Exception I receive when starting Solr and when attempting to add a document to the core: 2010-03-03 16:44:06,479 [main] ERROR org.apache.solr.core.CoreContainer -

Re: Can Solr Create New Indexes?

2010-03-03 Thread Mark Miller
I'm guessing the index folder itself already exists? The data dir can be there, but the index dir itself must not be - that's how it knows to create a new one. Otherwise it thinks the empty dir is the index and can't find the files it expects. On 03/03/2010 08:15 PM, Thomas Nguyen wrote: Hmm
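A minimal Python sketch of the layout described above: keep the data directory but remove the index directory itself, not just its contents, so that a fresh index can be created on the next startup. The function name is hypothetical, and the sketch assumes Solr is stopped before it runs.

```python
import shutil
from pathlib import Path


def reset_index(data_dir):
    """Remove <data_dir>/index entirely while keeping data_dir in place.

    Assumption: the server is not running, so no index files are open.
    """
    data_dir = Path(data_dir)
    index_dir = data_dir / "index"
    if index_dir.is_dir():
        # Delete the folder itself; an empty index folder is what
        # triggers the startup error described in this thread.
        shutil.rmtree(index_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    return index_dir
```

Calling reset_index on a data directory that has no index folder is harmless, so the same helper also works when provisioning a brand-new core directory.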

RE: Can Solr Create New Indexes?

2010-03-03 Thread Thomas Nguyen
Ah that's the problem. Not sure why it didn't come to mind to follow the call stack. Thanks for your help! -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Wednesday, March 03, 2010 5:20 PM To: solr-user@lucene.apache.org Subject: Re: Can Solr Create New

Re: weighted search and index

2010-03-03 Thread Erick Erickson
You have to provide some more details to get meaningful help. You say "I was trying to use boosting". How? At index time? Search time? Both? Can you provide some code snippets? What does your schema look like for the relevant field(s)? You say "but seems not working right". What does that mean? No

RE: weighted search and index

2010-03-03 Thread Jianbin Dai
Thank you very much Erick! 1. I used boost in search, but I don't know exactly what's the best way to boost, for such as Sports 0.8, golf 0.5 in my example, would it be sports^0.8 AND golf^0.5 ? 2. I cannot use boost in indexing. Because the weight of the value changes, not the field, look at
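A minimal Python sketch of turning a keyword-to-weight mapping into a query string using the term^boost syntax asked about in point 1. The function name is hypothetical, and whether the clauses should be joined with AND or OR is left as a parameter because the thread does not settle it.

```python
def boosted_query(weights, operator="OR"):
    """Render {"sports": 0.8, "golf": 0.5} as 'sports^0.8 OR golf^0.5'.

    Multi-word keywords are wrapped in double quotes so the boost
    applies to the whole phrase.
    """
    clauses = []
    for term, weight in weights.items():
        text = '"%s"' % term if " " in term else term
        clauses.append("%s^%s" % (text, weight))
    return (" %s " % operator).join(clauses)
```

For example, boosted_query({"sports": 0.8, "golf": 0.5}, operator="AND") gives the exact form proposed in the message, sports^0.8 AND golf^0.5.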

Re: weighted search and index

2010-03-03 Thread Erick Erickson
Then I'm totally lost as to what you're trying to accomplish. Perhaps a higher-level statement of the problem would help. Because no matter how often I look at your point 2, I don't see what relevance the numbers have if you're not using them to boost at index time. Why are they even there?

RE: weighted search and index

2010-03-03 Thread Jianbin Dai
Hi Erick, Each doc contains some keywords that are indexed. However each keyword is associated with a weight to represent its importance. In my example, D1: fruit 0.8, apple 0.4, banana 0.2 The keyword fruit is the most important keyword, which means I really really want it to be matched in a

Confused with Shards multicore search results

2010-03-03 Thread JavaGuy84
Hi, I finally got shards to work with multicore but now I am facing a different issue. I have 2 separate schema / data config files for each core. I also have different unique id for each schema.xml file. I indexed both the cores and I was able to successfully search independently on each core

Re: Implementing hierarchical facet

2010-03-03 Thread Andy
This dynamicfield feature is great. Didn't know about it. Thanks! --- On Wed, 3/3/10, Geert-Jan Brits gbr...@gmail.com wrote: From: Geert-Jan Brits gbr...@gmail.com Subject: Re: Implementing hierarchical facet To: solr-user@lucene.apache.org Date: Wednesday, March 3, 2010, 5:04 AM you could

Re: 2 Cores, 1 Table, 2 DataImporter -- Import at the same time ?

2010-03-03 Thread Lance Norskog
No, a core is a lucene index. Two DataImportHandler sessions to the same core will run on the same index. You should use lockType of simple or native. 'single' should only be used on a read-only index. From the stack trace it looks like you're only using one index in solr/core. You have to

facet performance when number of values is large

2010-03-03 Thread Andy
I have a facet field whose values are created by users. So potentially there could be a very large number of values. is that going to be a problem performance-wise? A few more questions to help me understand how facet works: - after the filter cache warmed up, will the (if any) performance

Re: Escaping options for tika/solr cell extract-only output

2010-03-03 Thread Lance Norskog
You can return it with any of the other writers, like JSON or PHP. The alternative design decision for the XML output writer would be to emit using CDATA instead of escaping. On Wed, Mar 3, 2010 at 12:54 PM, Dan Hertz (Insight 49, LLC) insigh...@gmail.com wrote: Looking at

Re: weighted search and index

2010-03-03 Thread Lance Norskog
Boosting by convention is flat at 1.0. Usually people boost with numbers like 3 or 5 or 20. On Wed, Mar 3, 2010 at 6:34 PM, Jianbin Dai j...@huawei.com wrote: Hi Erick, Each doc contains some keywords that are indexed. However each keyword is associated with a weight to represent its

Re: Confused with Shards multicore search results

2010-03-03 Thread Lance Norskog
different unique id for each schema.xml file. All cores should have the same schema file with the same unique id field and type. Did you mean that the documents in both cores have a different value for the unique id field? On Wed, Mar 3, 2010 at 6:45 PM, JavaGuy84 bbar...@gmail.com wrote: Hi,

Re: Confused with Shards multicore search results

2010-03-03 Thread JavaGuy84
Thanks a lot for your reply, I will surely try this.. I have a requirement to index 2 diff schema's but need to do a search on both using a single url. Is there a way I can have 2 diff schema's / data config file and do a search on both the indexes using a single URL (like using Shards?)

Re: Confused with Shards multicore search results

2010-03-03 Thread Otis Gospodnetic
Hi, I think this will work as long as the fields involved in the search are identical. That's probably not the case with your shards, though. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message

Update Index : Updating Specific Fields

2010-03-03 Thread Kranti™ K K Parisa
Hi, Is there any way to update the index for only the specific fields? Eg: Index has ONE document consists of 4 fields, F1, F2, F3, F4 Now I want to update the value of field F2, so if I send the update xml to SOLR, can it keep the old field values for F1,F3,F4 and update the new value

Re: Update Index : Updating Specific Fields

2010-03-03 Thread Walter Underwood
No. --wunder On Mar 3, 2010, at 10:40 PM, Kranti™ K K Parisa wrote: Hi, Is there any way to update the index for only the specific fields? Eg: Index has ONE document consists of 4 fields, F1, F2, F3, F4 Now I want to update the value of field F2, so if I send the update xml to SOLR,

Too many .cfs files

2010-03-03 Thread mklprasad
Hi All, I set my mergeFactor to 10. I have loaded 1 million docs into Solr; after that I am able to see 14 .cfs files in my data/index folder. Won't mergeFactor merge after the 11th record comes? Please clarify. Thanks, Prasad