http://wiki.apache.org/solr/TermVectorComponent. You may want to hack
in your own capabilities to implement your own TermVectorMapper for
efficiency reasons.
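
For the per-document term-frequency question, a request to the TermVectorComponent might be built along these lines. This is a minimal sketch, not a confirmed setup: the handler name `tvrh`, the host, and the field name `text` are assumptions, and the field would need `termVectors="true"` in the schema for the component to return anything.

```python
from urllib.parse import urlencode


def term_vector_url(base_url, query, field):
    """Build a Solr URL that asks for term frequencies of matching documents."""
    params = {
        "q": f"{field}:{query}",
        "qt": "tvrh",     # assumed name of a handler with the TermVectorComponent
        "tv": "true",     # enable the component for this request
        "tv.tf": "true",  # include the term frequency of each term per document
        "fl": "id",       # keep the normal result list small
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and field name, for illustration only.
url = term_vector_url("http://localhost:8983/solr", "apple", "text")
print(url)
```

The response would then carry a term-vector section alongside the usual result list, keyed by document.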
On Sep 28, 2009, at 5:05 PM, Thung, Peter C CIV SPAWARSYSCEN-PACIFIC,
56340 wrote:
Mark,
Thanks. I think this may be partially
But it would seem that Lucene has always supported highlighting on
NGram fields, as shown by the example here:
https://issues.apache.org/jira/browse/LUCENE-1489
When I try to use highlighting with NGramming, none of the text is
highlighted, and instead I get a long string in the highlighting
field
I think I need a further explanation for that.
Lucene's FastVectorHighlighter, which is referenced in SOLR-1268, is
a highlighter that supports n-gram fields. Please see the description
for the features etc:
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/contrib-fast-vector-highligh
One way to track expensive queries is to look at the query time, QTime, in the Solr
log.
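
QTime values can be pulled out of the log with a few lines of Python. The sketch below assumes request log lines of the form `path=... params={...} ... QTime=N`; the sample lines and the 1000 ms threshold are made up for illustration, so adjust the pattern to whatever your log actually contains.

```python
import re

# Assumed shape of a Solr request log line; adapt to the real log format.
QTIME_RE = re.compile(
    r"path=(?P<path>\S+) params=\{(?P<params>.*)\}.*?QTime=(?P<qtime>\d+)"
)


def slow_queries(lines, threshold_ms=1000):
    """Return (qtime_ms, path, params) for requests at or above the threshold."""
    slow = []
    for line in lines:
        match = QTIME_RE.search(line)
        if match and int(match.group("qtime")) >= threshold_ms:
            slow.append(
                (int(match.group("qtime")), match.group("path"), match.group("params"))
            )
    # Slowest requests first.
    return sorted(slow, reverse=True)


# Made-up sample lines, not taken from a real log.
sample = [
    "INFO: [] webapp=/solr path=/select params={q=apple} hits=12 status=0 QTime=3",
    "INFO: [] webapp=/solr path=/select params={q=title:gil*&rows=500} hits=9000 status=0 QTime=2140",
]

for qtime, path, params in slow_queries(sample):
    print(qtime, path, params)
```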
There are a couple of tools for analyzing gc logs:
http://www.tagtraum.com/gcviewer.html
https://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=HPJMETER
They will give you frequency and duratio
Hi Koji et al.,
You say https://issues.apache.org/jira/browse/SOLR-1268 is an open
issue for the ngram highlighting problem, but it seems to refer to
something unrelated.
Can you/anyone confirm that it is not possible to use highlighting
with an ngram tokenizer/filter?
Thanks,
Aodh.
Hello everyone!
Don't forget that the Meetup is THIS Wednesday! I'm looking forward to
hearing about Hive from the Facebook team ... and there might be a few other
interesting talks as well. Here are the details in the wiki:
http://wiki.apache.org/hadoop/PNW_Hadoop_%2B_Apache_Cloud_Stack_User_Group
Mark Miller wrote:
> Looks like a bug to me. I don't see the commit point being reserved in
> the backup code - which means it's likely to be removed before it's done
> being copied. Got to reserve it using the delete policy to keep it around
> for the full backup duration. I'd file a JIRA issue.
>
>
>
Y
Looks like a bug to me. I don't see the commit point being reserved in
the backup code - which means it's likely to be removed before it's done
being copied. Got to reserve it using the delete policy to keep it around
for the full backup duration. I'd file a JIRA issue.
--
- Mark
http://www.lucidimagina
Thanks to Noble Paul, I think I now understand the Java replication
handler's backup feature. It seems to work as expected on a toy index.
When trying it out on a copy of my production index (300GB-ish),
though, I'm getting FileNotFoundExceptions. These cancel the backup,
and delete the snapshot.yy
Another good option.
Here is a comparison of the commands I replied with and this one:
http://docs.hp.com/en/5992-5899/ch06s02.html
Very similar.
Otis Gospodnetic wrote:
> Jonathan,
>
> Here is the JVM argument for logging GC activity:
>
> -Xloggc:<file> log GC status to a file with time stamp
-verbose:gc

[GC 325407K->83000K(776768K), 0.2300771 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
[Full GC 267628K->83769K(776768K), 1.8479984 secs]
Additional details with: -XX:+PrintGCDetails
[GC [DefNew: 64575K->959K(64576K), 0.0457646 secs] 196016K->133633K(261184K),
0.045906
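
Lines in the simple `-verbose:gc` format quoted above can be summarized with a short script. This is only a sketch: it handles the one-line `[GC ...]` and `[Full GC ...]` entries and deliberately ignores the more detailed `-XX:+PrintGCDetails` output, whose layout differs.

```python
import re

# Matches entries such as "[GC 325407K->83000K(776768K), 0.2300771 secs]".
GC_RE = re.compile(r"\[(Full GC|GC) (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]")


def summarize_gc(log_text):
    """Return {kind: (count, total_pause_seconds)} for minor and full collections."""
    stats = {"GC": [0, 0.0], "Full GC": [0, 0.0]}
    for kind, _before, _after, _heap, secs in GC_RE.findall(log_text):
        stats[kind][0] += 1
        stats[kind][1] += float(secs)
    return {kind: (count, total) for kind, (count, total) in stats.items()}


# The three sample entries from the message above.
sample_log = """\
[GC 325407K->83000K(776768K), 0.2300771 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
[Full GC 267628K->83769K(776768K), 1.8479984 secs]
"""

for kind, (count, total) in summarize_gc(sample_log).items():
    print(f"{kind}: {count} collections, {total:.4f} s paused")
```

Counting full collections separately is the point here, since those are the pauses the low-pause collector is meant to avoid.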
Jonathan,
Here is the JVM argument for logging GC activity:
-Xloggc:<file> log GC status to a file with time stamps
Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
- Original Message
> From: Jonathan
On Sep 27, 2009, at 9:42 PM, Shalin Shekhar Mangar wrote:
On Mon, Sep 28, 2009 at 2:59 AM, Jibo John wrote:
Additionally, I get the same exception even if I declare the
in the .
class="org.apache.lucene.index.LogByteSizeMergePolicy">
true
That should be instead of
Ye
Mark,
Thanks. I think this may be partially what I need.
Basically, what I'm trying to figure out is the following:
If someone enters a keyword, say
Apple,
I would like to find all the documents that have the word apple
in them, and then for each document, the number of times it showed up in
each
Thanks to all for thinking about this question. Otis: could you say a
bit more about per-segment readers? This is new to me.
I gather that there is a way to specify that the number of readers
should correspond (or automatically correspond) to the number of segments?
I suppose this gives eac
Thung, Peter C CIV SPAWARSYSCEN-PACIFIC, 56340 wrote:
> Is there a SOLR query that can access or view the TermFrequencies for
> the various documents
> discovered, or is the only way to access this information
> programmatically?
> If so could someone share an example and maybe a link for informatio
How do you track major collections? Even better, how do you log your GC
behavior with details? Right now I just log total time spent on collections,
but I don't really know on which collections. Regarding application performance
with the ConcMarkSweepGC, I haven't experienced any impact so far.
Is there a SOLR query that can access or view the TermFrequencies for
the various documents
discovered, or is the only way to access this information
programmatically?
If so could someone share an example and maybe a link for information on
how to do this?
Some sample queries?
Thank you in advance
2009/9/24 Noble Paul നോബിള് नोब्ळ् :
> Yes, the only reason to take a backup should be for restoration/archival
> They should contain all the files required for the latest commit point.
Ok, I think I get it now. I assumed "all the files required for the
latest commit point" meant that the backup
On Mon, Sep 28, 2009 at 3:54 PM, Tarun Jain wrote:
> Hi,
> I have created an index where the fields have been indexed with
> omitNorms="true" omitTermFreqAndPositions="true"
> to improve indexing performance. One of the side effects of this is that some
> of the searches with alphanumeric words a
You would have to index Gilmore and gilmore. You could make a separate
field type which does not do upper->lower case transformation.
On Mon, Sep 28, 2009 at 11:49 AM, Siddhartha Pahade
wrote:
> Thanks for the reply
>
> I want to make gilmore* work...somebody told me you can make attributes case
> i
Hi,
I have created an index where the fields have been indexed with
omitNorms="true" omitTermFreqAndPositions="true"
to improve indexing performance. One of the side effects of this is that some
of the searches with alphanumeric words are not working correctly.
Example..
Below is the debugQuery
Do you have your GC logs? Are you still seeing major collections?
Where is the time spent?
Hard to say without some of that info.
The goal of the low pause collector is to finish collecting before the
tenured space is filled - if it doesn't, a standard major collection occurs.
The collector wil
Ok... good news! Upgrading to the newest version of JVM 6 (update 6) seems
to solve this ugly bug. With the upgraded JVM I could run the solr servers
for more than 12 hours on the production environment with the GC mentioned
in the previous e-mails. The results are really amazing. The time spent on
That's right. mergeFactor=1 is an even more extreme case. However, with the
new per-segment readers, having an optimized index is no longer the best index
state to go for in some cases.
Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, H
Thanks for the reply
I want to make gilmore* work...somebody told me you can make attributes case
insensitive while building an index...
I am trying to research it...
Do you have any pointers?
Thanks...
On Mon, Sep 28, 2009 at 2:29 PM, Lance Norskog wrote:
> Wildcards don't really get proces
Israel, thanks for your comments. The problem with that alternative is that
it works only if the search application is in our server (and in that case,
of course, the user doesn't have access to any config file). But more often
than not the application is installed on the customer's network, thus h
The optimize operation happens in place.
I've been told that if you set "mergeFactor=2" when indexing, it will
be slower but you will always have a "mostly optimized" index.
On Mon, Sep 28, 2009 at 10:22 AM, Jason Rutherglen
wrote:
> Hmm... Interesting question, not that I know of. The only way
Wildcards don't really get processed like other queries - Gilmore* will work.
On Mon, Sep 28, 2009 at 8:30 AM, Avlesh Singh wrote:
> Such questions are better answered on the user mailing list. You don't need
> to post them on the dev list.
> What matches an incoming query is largely a function o
Another way to index XML data is to use the normal Solr XML updater
and wrap your XML documents inside CDATA blocks.
On Mon, Sep 28, 2009 at 2:12 AM, Thung, Peter C CIV
SPAWARSYSCEN-PACIFIC, 56340 wrote:
> With a basically default install of the trunk version of solr 1.4
> when trying to index an
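
The CDATA suggestion in the reply above could look roughly like this when building the update message by hand. It is a sketch under assumptions: the field names `id` and `content` are hypothetical, and whether the tags survive analysis still depends on the field type configured in the schema.

```python
from xml.sax.saxutils import escape, quoteattr


def cdata(text):
    """Wrap text in CDATA, splitting any embedded ']]>' across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def solr_add_doc(doc_id, xml_payload, field="content"):
    """Build a Solr XML <add> message whose field value is raw XML text."""
    return (
        "<add><doc>"
        f'<field name="id">{escape(doc_id)}</field>'
        f"<field name={quoteattr(field)}>{cdata(xml_payload)}</field>"
        "</doc></add>"
    )


# Hypothetical document; the payload keeps its tags because it is CDATA.
message = solr_add_doc("doc-1", "<book><title>Gilmore Girls</title></book>")
print(message)
```

The message would then be posted to the XML update handler as usual; the CDATA wrapper only stops the XML parser from treating the payload's tags as part of the update message.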
Great news for Solr -- a third party library that I'm calling is serialized.
Silly me, I made a mistake when ruling out that library as the culprit
earlier. Solr itself scales just great as I add threads. JProfiler helped me
find the problem.
Sorry for the false alarm, and thanks for the suggestio
markrmiller wrote:
>
> michael8 wrote:
>>
>> markrmiller wrote:
>>
>>> michael8 wrote:
>>>
Hi,
I know Solr 1.4 is going to be released any day now pending Lucene 2.9
release. Is there anywhere one can download a pre-release
nightly build of Solr 1.4
Hmm... Interesting question, not that I know of. The only way
one could do this would be to intercept the newly optimized
files via a FileSwitchDirectory like implementation that knows
which new files are optimized and should "underneath" go to a
different physical path.
On Mon, Sep 28, 2009 at 7:
On Mon, Sep 28, 2009 at 4:46 PM, Olivier Dobberkau
wrote:
>
> hi marian.
> our extension will be able to do see also once we have set up the indexing
> queue for the typo3 backend.
> we have a concept called typo3 extensions connectors so that you will be
> able to add index documents to your inde
Such questions are better answered on the user mailing list. You don't need
to post them on the dev list.
What matches an incoming query is largely a function of your field type
definition and the way you analyze your field data at query time and index
time.
Copy-paste your field and its type definit
Hi guys,
My search result is Gilmore Girls
If I search on Gilmore, it gives me result Gilmore Girls in the output as
desired.
However, if I search on the string gilmore* or gilm, it does not work, whereas
we want it to work.
Any help highly appreciated.
Thanks!
> The DisMax parser essentially creates a set of queries against
> different fields. These queries are analyzed as per each field.
>
> I think this what you are talking about- "The" in a movie title is
> different from "the" in the movie description. Would you expect "The
> Sound Of Music" to fet
I'm using the add(MyObject) command form (SolrNet) in a foreach loop
to add my objects to the index.
In the catalina-log I cannot see anything that helps me out.
It stops at:
28.sep.2009 08:58:40
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[12345]} 0 187
28.sep.2009
Is it possible to tell Solr or Lucene, when optimizing, to write the
files that constitute the optimized index to somewhere other than
SOLR_HOME/data/index or is there something about the optimize that
requires the final segment to be created in SOLR_HOME/data/index?
Thanks,
Phil
Note that whatever query you use will be cached in the query cache.
-*:* is likely the best choice. Another alternative if you've got
dynamic fields wired in, is something like
_nonexistent_field_s:dummy_value
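
A sketch of how a client could send such a fallback follows. It assumes the dismax parser is selected with `defType=dismax` and that `q.alt` is passed per request rather than set as a default in solrconfig.xml; the host is hypothetical.

```python
from urllib.parse import urlencode


def dismax_url(base_url, user_query=None, q_alt="-*:*"):
    """Build a dismax request that falls back to a match-nothing q.alt."""
    params = {"defType": "dismax", "q.alt": q_alt}
    if user_query:
        # Only send q when the client actually supplied one.
        params["q"] = user_query
    return f"{base_url}/select?{urlencode(params)}"


base = "http://localhost:8983/solr"  # hypothetical host
print(dismax_url(base))                     # no q: q.alt decides the result
print(dismax_url(base, "gilmore girls"))    # q present: q.alt is ignored
print(dismax_url(base, q_alt="_nonexistent_field_s:dummy_value"))
```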
Erik
On Sep 28, 2009, at 5:17 AM, Øystein F. Steimler wrote:
Hi, l
patch created for lucene:
https://issues.apache.org/jira/browse/LUCENE-1931
I am not sure what the right way is to hook it into
QueryParser.java.
Maybe the Solr people can comment on how to hook it into Solr.
-John
On Mon, Sep 28, 2009 at 6:31 AM, John Wang wrote:
> You can actu
Marian Steinbach wrote:
On Sat, Sep 26, 2009 at 3:22 AM, Lance Norskog wrote:
Have you seen this? It is another Solr/Typo3 integration project.
http://forge.typo3.org/projects/show/extension-solr
Would you consider open-sourcing your Solr/Typo3 integration?
Hi Lance!
I wasn't a
Further interestingness with replication on the thread blocking issue. 1
core seems to take a VERY long time to replicate. This duration is close to
5 minutes when cores 2x its size take like 100 seconds to pull down. The
searcher is also taking about 4-5 minutes to warm when an almost identical
You can actually write a NoHitsQuery implementation; it is rather simple. If
you like, I can create an issue and attach a patch.
-John
On Mon, Sep 28, 2009 at 5:17 AM, Øystein F. Steimler wrote:
> Hi, list!
>
> I want to add a q.alt matching no documents in my dismax handler to serve a
> consiste
On Mon, Sep 28, 2009 at 7:51 AM, Rahul R wrote:
> Yonik,
> I understand that the network can be a bottle-neck but I am pretty sure that
> it is not. I am operating on a 100 MBPS intranet... How do I ensure that
> stored fields are cached by the OS ? Only the Solr caches within the JVM are
> un
There's nothing in that output that indicates something we can help
with over in solr-user land. What is the call you're making to Solr?
Did Solr log anything anomalous?
Erik
On Sep 28, 2009, at 4:41 AM, Steinar Asbjørnsen wrote:
I just posted to the SolrNet-group since i have th
Hi, list!
I want to add a q.alt matching no documents in my dismax handler to serve a
consistent reply to a client application.
Without a q.alt, a missing q from the client will cause a "missing query
string" error. With a q.alt matching no document I will be able to respond
with an empty res
Yonik,
I understand that the network can be a bottle-neck but I am pretty sure that
it is not. I am operating on a 100 MBPS intranet... How do I ensure that
stored fields are cached by the OS ? Only the Solr caches within the JVM are
under my control.. The result set has around 10K document
I just posted to the SolrNet-group since I have the exact same(?)
problem.
Hope I'm not being rude posting here as well (since the SolrNet-group
doesn't seem as active as this mailing list).
The problem occurs when I'm running an incremental feed (self-made) of
an index.
My post:
[snip]
Wha
With a basically default install of the trunk version of solr 1.4
when trying to index an xml file, it appears that the xml tags
seem to get stripped when indexed.
If the tag names and their frequencies are important to me for search
purposes could someone tell me what
my options are to not hav
On Sat, Sep 26, 2009 at 3:22 AM, Lance Norskog wrote:
> Have you seen this? It is another Solr/Typo3 integration project.
>
> http://forge.typo3.org/projects/show/extension-solr
>
> Would you consider open-sourcing your Solr/Typo3 integration?
>
Hi Lance!
I wasn't aware of that extension. Havin