Re: Question on Access or viewing TermFrequency Vector via SOLR.

2009-09-28 Thread Grant Ingersoll
http://wiki.apache.org/solr/TermVectorComponent. You may want to hack in your own capabilities to implement your own TermVectorMapper for efficiency reasons. On Sep 28, 2009, at 5:05 PM, Thung, Peter C CIV SPAWARSYSCEN-PACIFIC, 56340 wrote: Mark, Thanks. I think this may be partially
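For the term-frequency question in this thread, here is a minimal sketch of a request URL for the TermVectorComponent. It makes several assumptions. The host/port is the stock example's localhost:8983. The handler is the example solrconfig's "tvrh" handler with the component attached. The field "text" is hypothetical and must be declared with termVectors="true" in schema.xml so that term vectors exist. The response lists every term of each returned document together with its tf, so the client would pick out the entry for "apple".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class TermVectorRequest {
    // Hypothetical base URL: the stock Solr example listens on localhost:8983.
    static final String BASE = "http://localhost:8983/solr/select";

    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Builds a request asking the TermVectorComponent for per-document term frequencies. */
    static String buildUrl(String query) {
        return BASE
            + "?q=" + enc(query)
            + "&qt=tvrh"      // assumes a handler named "tvrh" with the TermVectorComponent attached
            + "&tv=true"      // turn the component on for this request
            + "&tv.tf=true";  // ask for each term's frequency in each returned document
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("text:apple"));
    }
}
```

A custom TermVectorMapper, as Grant suggests, would sit on the server side and avoid shipping every term. This URL only uses the stock component.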

Re: Highlighting in stemmed or n-grammed fields possible?

2009-09-28 Thread aodhol
But it would seem that Lucene has always supported highlighting on NGram fields, as shown by the example here: https://issues.apache.org/jira/browse/LUCENE-1489. When I try to use highlighting with NGramming, none of the text is highlighted, and instead I get a long string in the highlighting field

Re: Highlighting in stemmed or n-grammed fields possible?

2009-09-28 Thread Koji Sekiguchi
I think that needs further explanation. Lucene's FastVectorHighlighter, which is referenced in SOLR-1268, is a highlighter that supports n-gram fields. Please see the description for the features etc.: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/contrib-fast-vector-highligh

Re: Solr and Garbage Collection

2009-09-28 Thread Bill Au
One way to track expensive queries is to look at the query time, QTime, in the Solr log. There are a couple of tools for analyzing GC logs: http://www.tagtraum.com/gcviewer.html https://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=HPJMETER They will give you frequency and duratio
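Bill's first suggestion is to scan the Solr log for the QTime value. Below is a small sketch that flags slow requests. It assumes each request log line carries a `QTime=<milliseconds>` token. The threshold and the class name are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SlowQueryFinder {
    // Assumption: request log lines contain a token such as "QTime=950".
    private static final Pattern QTIME = Pattern.compile("QTime=(\\d+)");

    /** Returns the log lines whose QTime is at least thresholdMs; lines without QTime are skipped. */
    static List<String> slowQueries(List<String> logLines, long thresholdMs) {
        List<String> slow = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = QTIME.matcher(line);
            if (m.find() && Long.parseLong(m.group(1)) >= thresholdMs) {
                slow.add(line);
            }
        }
        return slow;
    }
}
```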

Re: Highlighting in stemmed or n-grammed fields possible?

2009-09-28 Thread aodhol
Hi Koji et al., You say https://issues.apache.org/jira/browse/SOLR-1268 is an open issue for the ngram highlighting problem, but it seems to refer to something unrelated. Can you (or anyone) confirm that it is not possible to use highlighting with an ngram tokenizer/filter? Thanks, Aodh.

Re: Seattle / PNW Hadoop/Lucene/HBase Meetup, Wed Sep 30th

2009-09-28 Thread Bradford Stephens
Hello everyone! Don't forget that the Meetup is THIS Wednesday! I'm looking forward to hearing about Hive from the Facebook team ... and there might be a few other interesting talks as well. Here are the details on the wiki: http://wiki.apache.org/hadoop/PNW_Hadoop_%2B_Apache_Cloud_Stack_User_Group

Re: FileNotFoundException in Java replication handler backups

2009-09-28 Thread Mark Miller
Mark Miller wrote: > Looks like a bug to me. I don't see the commit point being reserved in > the backup code - which means it's likely being removed before it's done > being copied. Got to reserve it using the delete policy to keep it around > for the full backup duration. I'd file a JIRA issue. > > > Y

Re: FileNotFoundException in Java replication handler backups

2009-09-28 Thread Mark Miller
Looks like a bug to me. I don't see the commit point being reserved in the backup code - which means it's likely being removed before it's done being copied. Got to reserve it using the delete policy to keep it around for the full backup duration. I'd file a JIRA issue. -- - Mark http://www.lucidimagina

FileNotFoundException in Java replication handler backups

2009-09-28 Thread Chris Harris
Thanks to Noble Paul, I think I now understand the Java replication handler's backup feature. It seems to work as expected on a toy index. When trying it out on a copy of my production index (300GB-ish), though, I'm getting FileNotFoundExceptions. These cancel the backup, and delete the snapshot.yy

Re: Solr and Garbage Collection

2009-09-28 Thread Mark Miller
Another good option. Here is a comparison of the commands I replied with and this one: http://docs.hp.com/en/5992-5899/ch06s02.html Very similar. Otis Gospodnetic wrote: > Jonathan, > > Here is the JVM argument for logging GC activity: > > -Xloggc:<file>: log GC status to a file with time stamp

Re: Solr and Garbage Collection

2009-09-28 Thread Mark Miller
-verbose:gc produces lines like [GC 325407K->83000K(776768K), 0.2300771 secs] [GC 325816K->83372K(776768K), 0.2454258 secs] [Full GC 267628K->83769K(776768K), 1.8479984 secs] Additional details with -XX:+PrintGCDetails: [GC [DefNew: 64575K->959K(64576K), 0.0457646 secs] 196016K->133633K(261184K), 0.045906
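To separate minor from full collections in output like the -verbose:gc lines above, a small regex-based tally is enough. This is a sketch. It handles only the compact -verbose:gc format shown here. The -XX:+PrintGCDetails and CMS-specific lines look different and are not matched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcLogSummary {
    // Matches compact -verbose:gc entries such as
    // "[Full GC 267628K->83769K(776768K), 1.8479984 secs]"
    private static final Pattern EVENT = Pattern.compile(
        "\\[(Full GC|GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\]");

    int minorCount, fullCount;
    double minorSecs, fullSecs;

    /** Adds every GC entry found in the given text to the running totals. */
    void add(String logText) {
        Matcher m = EVENT.matcher(logText);
        while (m.find()) {
            double secs = Double.parseDouble(m.group(5));
            if (m.group(1).equals("Full GC")) {
                fullCount++;
                fullSecs += secs;
            } else {
                minorCount++;
                minorSecs += secs;
            }
        }
    }
}
```

With the low-pause collector, a growing fullCount is the signal worth watching. Per Mark's explanation in this thread, a standard major collection only happens when the concurrent collector fails to finish before the tenured space fills.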

Re: Solr and Garbage Collection

2009-09-28 Thread Otis Gospodnetic
Jonathan, Here is the JVM argument for logging GC activity: -Xloggc:<file>: log GC status to a file with time stamps Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: Jonathan

Re: Problem changing the default MergePolicy/Scheduler

2009-09-28 Thread Jibo John
On Sep 27, 2009, at 9:42 PM, Shalin Shekhar Mangar wrote: On Mon, Sep 28, 2009 at 2:59 AM, Jibo John wrote: Additionally, I get the same exception even if I declare the in the . class="org.apache.lucene.index.LogByteSizeMergePolicy"> true That should be instead of Ye

RE: Question on Access or viewing TermFrequency Vector via SOLR.

2009-09-28 Thread Thung, Peter C CIV SPAWARSYSCEN-PACIFIC, 56340
Mark, Thanks. I think this may be partially what I need. Basically, what I'm trying to figure out is the following: if someone enters a keyword, say Apple, I would like to find all the documents that have the word apple in them, and then for each document, the number of times it showed up in each

Re: Writing optimized index to different storage?

2009-09-28 Thread Phillip Farber
Thanks to all for thinking about this question. Otis: could you say a bit more about per-segment readers? This is new to me. I gather that there is a way to specify that the number of readers should correspond (or automatically correspond) to the number of segments? I suppose this gives eac

Re: Question on Access or viewing TermFrequency Vector via SOLR.

2009-09-28 Thread Mark Miller
Thung, Peter C CIV SPAWARSYSCEN-PACIFIC, 56340 wrote: > is there a SOLR query that can access or view the TermFrequencies for > the various documents > discovered, or is the only way to programmatically access this > information? > If so, could someone share an example and maybe a link for informatio

Re: Solr and Garbage Collection

2009-09-28 Thread Jonathan Ariel
How do you track major collections? Even better, how do you log your GC behavior with details? Right now I just log total time spent on collections, but I don't really know on which collections. Regarding application performance with the ConcMarkSweepGC, I don't think I've seen any impact so far.
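Another way to see which collector is spending the time, besides log files, is to read the standard `java.lang.management` GC beans. Below is a minimal sketch. It reports only on the JVM it runs in. Reading Solr's JVM from outside would need a remote JMX connection, which is not shown. Under CMS, the two beans are typically named "ParNew" and "ConcurrentMarkSweep".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

class GcStats {
    /** Collector name -> {collection count, accumulated collection time in ms}; -1 means unavailable. */
    static Map<String, long[]> snapshot() {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            stats.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return stats;
    }

    public static void main(String[] args) {
        snapshot().forEach((name, s) ->
            System.out.println(name + ": " + s[0] + " collections, " + s[1] + " ms"));
    }
}
```

Sampling this periodically and diffing the counts shows whether the time goes to the young-generation collector or the old-generation one.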

Question on Access or viewing TermFrequency Vector via SOLR.

2009-09-28 Thread Thung, Peter C CIV SPAWARSYSCEN-PACIFIC, 56340
Is there a SOLR query that can access or view the TermFrequencies for the various documents discovered, or is the only way to programmatically access this information? If so, could someone share an example and maybe a link for information on how to do this? Some sample queries? Thank you in advance

Re: Use cases for ReplicationHandler's backup facility?

2009-09-28 Thread Chris Harris
2009/9/24 Noble Paul നോബിള്‍ नोब्ळ् : > Yes, the only reason to take a backup should be for restoration/archival > They should contain all the files required for the latest commit point. Ok, I think I get it now. I assumed "all the files required for the latest commit point" meant that the backup

Re: alphanumeric queries using LuceneQParser

2009-09-28 Thread Yonik Seeley
On Mon, Sep 28, 2009 at 3:54 PM, Tarun Jain wrote: > Hi, > I have created an index where the fields have been indexed with > omitNorms="true" omitTermFreqAndPositions="true" > to improve indexing performance. One of the side effects of this is that some > of the searches with alphanumeric words a

Re: Regular expression not working

2009-09-28 Thread Lance Norskog
You would have to index Gilmore and gilmore. You could make a separate field type which does not do upper->lower case transformation. On Mon, Sep 28, 2009 at 11:49 AM, Siddhartha Pahade wrote: > Thanks for the reply > > I want to make gilmore* work... somebody told me you can make attributes case > i

alphanumeric queries using LuceneQParser

2009-09-28 Thread Tarun Jain
Hi, I have created an index where the fields have been indexed with omitNorms="true" omitTermFreqAndPositions="true" to improve indexing performance. One of the side effects of this is that some of the searches with alphanumeric words are not working correctly. Example: below is the debugQuery
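Yonik's reply is truncated here. One common mechanism behind this symptom is the following. Suppose the field's analyzer splits a word like "abc123" at letter/digit boundaries, as a WordDelimiterFilter-style setup does; that analyzer setup is an assumption, since the schema isn't shown. The query parser then turns that single query word into a phrase query. Phrase queries need positions, and omitTermFreqAndPositions="true" discards them. The sketch below only imitates the split to show which words would be affected. It is not Solr's filter.

```java
import java.util.ArrayList;
import java.util.List;

class AlphaNumSplit {
    /** Splits a token at non-alphanumeric delimiters and wherever it switches between letters and digits. */
    static List<String> split(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (!Character.isLetterOrDigit(c)) { // a delimiter ends the current part
                if (cur.length() > 0) { parts.add(cur.toString()); cur.setLength(0); }
                continue;
            }
            if (cur.length() > 0
                    && Character.isDigit(c) != Character.isDigit(cur.charAt(cur.length() - 1))) {
                parts.add(cur.toString()); // letter/digit transition starts a new part
                cur.setLength(0);
            }
            cur.append(c);
        }
        if (cur.length() > 0) parts.add(cur.toString());
        return parts;
    }

    /** More than one part means the parser would build a phrase query, which needs positions. */
    static boolean needsPositions(String token) {
        return split(token).size() > 1;
    }
}
```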

Re: Solr and Garbage Collection

2009-09-28 Thread Mark Miller
Do you have your GC logs? Are you still seeing major collections? Where is the time spent? Hard to say without some of that info. The goal of the low pause collector is to finish collecting before the tenured space is filled - if it doesn't, a standard major collection occurs. The collector wil

Re: Solr and Garbage Collection

2009-09-28 Thread Jonathan Ariel
Ok... good news! Upgrading to the newest version of JVM 6 (update 6) seems to solve this ugly bug. With the upgraded JVM I could run the solr servers for more than 12 hours on the production environment with the GC mentioned in the previous e-mails. The results are really amazing. The time spent on

Re: Writing optimized index to different storage?

2009-09-28 Thread Otis Gospodnetic
That's right. mergeFactor=1 is an even more extreme case. However, with the new per-segment readers, having an optimized index is no longer the best index state to go for in some cases. Otis

Re: Regular expression not working

2009-09-28 Thread Siddhartha Pahade
Thanks for the reply. I want to make gilmore* work... somebody told me you can make attributes case-insensitive while building an index. I am trying to research it. Do you have any pointers? Thanks... On Mon, Sep 28, 2009 at 2:29 PM, Lance Norskog wrote: > Wildcards don't really get proces

Re: Limit number of docs that can be indexed (security)

2009-09-28 Thread Valdir Salgueiro
Israel, thanks for your comments. The problem with that alternative is that it works only if the search application is in our server (and in that case, of course, the user doesn't have access to any config file). But more often than not the application is installed on the customer's network, thus h

Re: Writing optimized index to different storage?

2009-09-28 Thread Lance Norskog
The optimize operation happens in place. I've been told that if you set "mergeFactor=2" when indexing, it will be slower but you will always have a "mostly optimized" index. On Mon, Sep 28, 2009 at 10:22 AM, Jason Rutherglen wrote: > Hmm... Interesting question, not that I know of. The only way
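To see why a low mergeFactor keeps the index "mostly optimized", here is a deliberately simplified model of log-structured merging. It is an illustration under stated assumptions and not Lucene's actual LogByteSizeMergePolicy, which works on byte sizes. In the model, every flush writes one small segment, and whenever mergeFactor segments accumulate on one level they merge into a single segment on the next level.

```java
import java.util.ArrayList;
import java.util.List;

class MergeFactorModel {
    /**
     * Simplified model: each flush adds one level-0 segment; mergeFactor segments on a
     * level merge into one segment on the next level. Returns the segment count afterwards.
     */
    static int segmentCount(int mergeFactor, int flushes) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("this model needs mergeFactor >= 2");
        }
        List<Integer> perLevel = new ArrayList<>();
        for (int f = 0; f < flushes; f++) {
            int level = 0;
            while (true) {
                if (perLevel.size() == level) perLevel.add(0);
                perLevel.set(level, perLevel.get(level) + 1);
                if (perLevel.get(level) < mergeFactor) break;
                perLevel.set(level, 0); // merge: this level empties...
                level++;                // ...and one larger segment lands one level up
            }
        }
        int total = 0;
        for (int n : perLevel) total += n;
        return total;
    }
}
```

In this model, 1000 flushes leave 6 segments with mergeFactor=2. With mergeFactor=10, 999 flushes leave 27 segments. The cost of the low setting is that documents get rewritten by merges far more often, which matches Lance's "slower" caveat.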

Re: Regular expression not working

2009-09-28 Thread Lance Norskog
Wildcards don't really get processed like other queries - Gilmore* will work. On Mon, Sep 28, 2009 at 8:30 AM, Avlesh Singh wrote: > Such questions are better answered on the user mailing list. You don't need > to post them on the dev list. > What matches an incoming query is largely a function o
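Lance's point is that wildcard terms are not run through the field's analyzer the way ordinary query terms are. The sketch below imitates that with a plain prefix match against the indexed terms. It shows two cases. If the index keeps case, only "Gilmore*" matches. If the field lowercases at index time (the case-insensitive option Siddhartha asks about), the client has to lowercase the wildcard term itself. The term sets are hypothetical.

```java
import java.util.Locale;
import java.util.Set;

class WildcardCase {
    /** Mimics a trailing-* query whose term is NOT analyzed: a raw prefix match on indexed terms. */
    static boolean prefixMatches(Set<String> indexedTerms, String wildcardQuery) {
        if (!wildcardQuery.endsWith("*")) {
            throw new IllegalArgumentException("expected a trailing '*': " + wildcardQuery);
        }
        String prefix = wildcardQuery.substring(0, wildcardQuery.length() - 1);
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) return true;
        }
        return false;
    }

    /** Client-side workaround when the field is lowercased at index time. */
    static String normalize(String wildcardQuery) {
        return wildcardQuery.toLowerCase(Locale.ROOT);
    }
}
```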

Re: Question on trying to Index and XML document...

2009-09-28 Thread Lance Norskog
Another way to index XML data is to use the normal Solr XML updater and wrap your XML documents inside CDATA blocks. On Mon, Sep 28, 2009 at 2:12 AM, Thung, Peter C CIV SPAWARSYSCEN-PACIFIC, 56340 wrote: > With a basically default install of the trunk version of solr 1.4 > when trying to index an
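Lance's CDATA suggestion can be sketched as building the <add> message by hand. The field names "id" and "content" are hypothetical and must match your schema. Any "]]>" inside the payload is split across two CDATA sections so that it doesn't close the section early. Once the message is parsed, the tags reach the field's analyzer as ordinary text. Whether tag names become searchable tokens therefore still depends on that analyzer.

```java
class CdataAddBuilder {
    /** Wraps raw XML in CDATA; an embedded "]]>" is split across two CDATA sections. */
    static String cdata(String raw) {
        return "<![CDATA[" + raw.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    /** Escapes text for use as ordinary XML element content. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Builds a Solr XML update message; "id" and "content" are hypothetical field names. */
    static String addMessage(String id, String rawXml) {
        return "<add><doc>"
            + "<field name=\"id\">" + escapeXml(id) + "</field>"
            + "<field name=\"content\">" + cdata(rawXml) + "</field>"
            + "</doc></add>";
    }
}
```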

Re: Parallel requests to Tomcat

2009-09-28 Thread Michael
Great news for Solr: a third-party library that I'm calling is serialized. Silly me, I made a mistake when ruling out that library as the culprit earlier. Solr itself scales just great as I add threads. JProfiler helped me find the problem. Sorry for the false alarm, and thanks for the suggestio

Re: download pre-release nightly solr 1.4

2009-09-28 Thread michael8
markrmiller wrote: > > michael8 wrote: >> >> markrmiller wrote: >> >>> michael8 wrote: >>> Hi, I know Solr 1.4 is going to be released any day now pending Lucene 2.9 release. Is there anywhere where one can download a pre-release nightly build of Solr 1.4

Re: Writing optimized index to different storage?

2009-09-28 Thread Jason Rutherglen
Hmm... Interesting question, not that I know of. The only way one could do this would be to intercept the newly optimized files via a FileSwitchDirectory like implementation that knows which new files are optimized and should "underneath" go to a different physical path. On Mon, Sep 28, 2009 at 7:

Re: Showcase: Facetted Search for Wine using Solr

2009-09-28 Thread Marian Steinbach
On Mon, Sep 28, 2009 at 4:46 PM, Olivier Dobberkau wrote: > > hi marian. > our extension will be able to do see also once we have set up the indexing > queue for the typo3 backend. > we have a concept called typo3 extensions connectors so that you will be > able to add index documents to your inde

Re: Regular expression not working

2009-09-28 Thread Avlesh Singh
Such questions are better answered on the user mailing list. You don't need to post them on the dev list. What matches an incoming query is largely a function of your field type definition and the way you analyze your field data at query time and index time. Copy-paste your field and its type definit

Regular expression not working

2009-09-28 Thread Siddhartha Pahade
Hi guys, My search result is Gilmore Girls. If I search on Gilmore, it gives me the result Gilmore Girls in the output as desired. However, if I search on the string gilmore* or gilm, it does not work, whereas we want it to work. Any help highly appreciated. Thanks!

RE: Mixed field types and boolean searching

2009-09-28 Thread Ensdorf Ken
> The DisMax parser essentially creates a set of queries against > different fields. These queries are analyzed as per each field. > > I think this is what you are talking about- "The" in a movie title is > different from "the" in the movie description. Would you expect "The > Sound Of Music" to fet

Re: "Only one usage of each socket address" error

2009-09-28 Thread Steinar Asbjørnsen
I'm using the add(MyObject) command (SolrNet) in a foreach loop to add my objects to the index. In the catalina log I cannot see anything that helps me out. It stops at: 28.sep.2009 08:58:40 org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {add=[12345]} 0 187 28.sep.2009

Writing optimized index to different storage?

2009-09-28 Thread Phillip Farber
Is it possible to tell Solr or Lucene, when optimizing, to write the files that constitute the optimized index to somewhere other than SOLR_HOME/data/index or is there something about the optimize that requires the final segment to be created in SOLR_HOME/data/index? Thanks, Phil

Re: q.alt matching no documents

2009-09-28 Thread Erik Hatcher
Note that whatever query you use will be cached in the query cache. - *:* is likely the best choice. Another alternative if you've got dynamic fields wired in, is something like _nonexistent_field_s:dummy_value Erik On Sep 28, 2009, at 5:17 AM, Øystein F. Steimler wrote: Hi, l
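Erik's no-match query can also be passed as a request parameter instead of a handler default. Below is a sketch of client-side parameter building, assuming a "*_s" dynamic field exists so the field name is legal. When the user supplies no q, only q.alt is sent, and it should match nothing. As Erik notes, whatever query is used here ends up in the query cache.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class DismaxParams {
    // Erik's suggestion; assumes a "*_s" dynamic field so this field name is accepted.
    static final String NO_MATCH = "_nonexistent_field_s:dummy_value";

    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Builds dismax parameters; a missing q falls back to a q.alt that should match nothing. */
    static String params(String userQuery) {
        StringBuilder sb = new StringBuilder("defType=dismax");
        if (userQuery == null || userQuery.trim().isEmpty()) {
            sb.append("&q.alt=").append(enc(NO_MATCH));
        } else {
            sb.append("&q=").append(enc(userQuery));
        }
        return sb.toString();
    }
}
```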

Re: q.alt matching no documents

2009-09-28 Thread John Wang
patch created for lucene: https://issues.apache.org/jira/browse/LUCENE-1931 I am not sure what the right thing to do here is to hook it into QueryParser.java. Maybe the Solr people can comment on how to hook it into Solr. -John On Mon, Sep 28, 2009 at 6:31 AM, John Wang wrote: > You can actu

Re: Showcase: Facetted Search for Wine using Solr

2009-09-28 Thread Olivier Dobberkau
Marian Steinbach wrote: On Sat, Sep 26, 2009 at 3:22 AM, Lance Norskog wrote: Have you seen this? It is another Solr/Typo3 integration project. http://forge.typo3.org/projects/show/extension-solr Would you consider open-sourcing your Solr/Typo3 integration? Hi Lance! I wasn't a

Re: Thread Blocking Radomly

2009-09-28 Thread Jeff Newburn
Further interestingness with replication on the thread blocking issue. 1 core seems to take a VERY long time to replicate. This duration is close to 5 minutes when cores 2x its size take like 100 seconds to pull down. The searcher is also taking about 4-5 minutes to warm when an almost identical

Re: q.alt matching no documents

2009-09-28 Thread John Wang
You can actually write a NoHitsQuery implementation; it is rather simple. If you like, I can create an issue and attach a patch. -John On Mon, Sep 28, 2009 at 5:17 AM, Øystein F. Steimler wrote: > Hi, list! > > I want to add a q.alt matching no documents in my dismax handler to serve a > consiste

Re: Measuring timing with debugQuery=true

2009-09-28 Thread Yonik Seeley
On Mon, Sep 28, 2009 at 7:51 AM, Rahul R wrote: > Yonik, > I understand that the network can be a bottleneck but I am pretty sure that > it is not. I am operating on a 100 Mbps intranet... How do I ensure that > stored fields are cached by the OS? Only the Solr caches within the JVM are > un

Re: "Only one usage of each socket address" error

2009-09-28 Thread Erik Hatcher
There's nothing in that output that indicates something we can help with over in solr-user land. What is the call you're making to Solr? Did Solr log anything anomalous? Erik On Sep 28, 2009, at 4:41 AM, Steinar Asbjørnsen wrote: I just posted to the SolrNet-group since i have th

q.alt matching no documents

2009-09-28 Thread Øystein F. Steimler
Hi, list! I want to add a q.alt matching no documents in my dismax handler to serve a consistent reply to a client application. Without a q.alt, a missing q from the client will cause a "missing query string" error. With a q.alt matching no documents I will be able to respond with an empty res

Re: Measuring timing with debugQuery=true

2009-09-28 Thread Rahul R
Yonik, I understand that the network can be a bottleneck but I am pretty sure that it is not. I am operating on a 100 Mbps intranet... How do I ensure that stored fields are cached by the OS? Only the Solr caches within the JVM are under my control. The result set has around 10K document

Re: "Only one usage of each socket address" error

2009-09-28 Thread Steinar Asbjørnsen
I just posted to the SolrNet group since I have the exact same(?) problem. Hope I'm not being rude posting here as well (since the SolrNet group doesn't seem as active as this mailing list). The problem occurs when I'm running an incremental feed (self-made) of an index. My post: [snip] Wha

Question on trying to Index and XML document...

2009-09-28 Thread Thung, Peter C CIV SPAWARSYSCEN-PACIFIC, 56340
With a basically default install of the trunk version of Solr 1.4, when trying to index an XML file, it appears that the XML tags get stripped when indexed. If the tag names and their frequencies are important to me for search purposes, could someone tell me what my options are to not hav

Re: Showcase: Facetted Search for Wine using Solr

2009-09-28 Thread Marian Steinbach
On Sat, Sep 26, 2009 at 3:22 AM, Lance Norskog wrote: > Have you seen this? It is another Solr/Typo3 integration project. > > http://forge.typo3.org/projects/show/extension-solr > > Would you consider open-sourcing your Solr/Typo3 integration? > Hi Lance! I wasn't aware of that extension. Havin