http://wiki.apache.org/nutch/WritingPluginExample-0.9
After adding this
plugin, I was able to index the files by skipping this index page...hope
this helps...
On Wed, Apr 28, 2010 at 1:54 PM, BK wrote:
> Hello all,
>
> I have indexed
Hello all,
I have indexed a few directories which contain html files, and the *index to
each directory* is showing up as one of the search results. Is there any way
to skip these directory listings in the search results? e.g. *Index of
C:\temp\html* and *Index of C:\temp\html\dir2* are showing up in the results.
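A minimal sketch of the kind of indexing-filter plugin the reply at the top points at, written against the Nutch 0.9 IndexingFilter interface from the WritingPluginExample-0.9 page (Nutch 1.0 passes a NutchDocument instead of a Lucene Document); the class name and the "Index of" title check are made up for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.lucene.document.Document;
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.crawl.Inlinks;
import org.apache.nutch.indexer.IndexingException;
import org.apache.nutch.indexer.IndexingFilter;
import org.apache.nutch.parse.Parse;

// Hypothetical filter that keeps auto-generated directory listings out of the index.
public class SkipDirectoryListingFilter implements IndexingFilter {
  private Configuration conf;

  public Document filter(Document doc, Parse parse, Text url,
                         CrawlDatum datum, Inlinks inlinks) throws IndexingException {
    String title = parse.getData().getTitle();
    // Apache-style listings are titled "Index of /some/path"; returning null
    // tells the indexer to drop the document instead of adding it.
    if (title != null && title.startsWith("Index of ")) {
      return null;
    }
    return doc;
  }

  public void setConf(Configuration conf) { this.conf = conf; }
  public Configuration getConf() { return conf; }
}

The plugin still needs the usual plugin.xml descriptor and an entry in plugin.includes, as described on the wiki page.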
I have a requirement where I want to index and search file system contents
(my local server contents), and at the same time crawl a select set of
websites on the same search query.
I have search over my local file system implemented through Lucene. I would
like to have Nutch just crawl the web
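If both end up as plain Lucene indexes on disk (the local one already built and the index Nutch writes under crawl/index), one way to run the same query against both is Lucene's MultiSearcher. A rough sketch against the Lucene 2.x API bundled with Nutch of that era; the paths are made up, and the "content"/"url" field names match Nutch's index but are only assumptions for the local one:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.TopDocs;

public class CombinedSearch {
  public static void main(String[] args) throws Exception {
    // One searcher over the local file-system index, one over the Nutch web index.
    IndexSearcher local = new IndexSearcher("/data/local-index");   // made-up path
    IndexSearcher web   = new IndexSearcher("/data/crawl/index");   // made-up path
    MultiSearcher both  = new MultiSearcher(new Searchable[] { local, web });

    Query q = new QueryParser("content", new StandardAnalyzer()).parse("nutch");
    TopDocs hits = both.search(q, null, 10);
    for (ScoreDoc sd : hits.scoreDocs) {
      System.out.println(both.doc(sd.doc).get("url"));
    }
    both.close();
  }
}

Scores from the two indexes are only loosely comparable unless both were built with similar analysis, so the merged ranking should be treated with some care.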
YES - I forgot to include that... robots.txt is fine. it is wide open:
###
#
# sample robots.txt file for this website
#
# addresses all robots by using wild card *
User-agent: *
#
# list folders robots are not allowed to index
#Disallow: /tutorials/404redirect
esoftware.com]
>>> Sent: Wednesday, 21 April 2010 9:44 AM
>>> To: nutch-user@lucene.apache.org
>>> Subject: nutch says No URLs to fetch - check your seed list and URL
>>> filters when trying to index fmforums.com
>>>
>>> nutch says No URLs to fetch - chec
.com]
Sent: Wednesday, 21 April 2010 9:44 AM
To: nutch-user@lucene.apache.org
Subject: nutch says No URLs to fetch - check your seed list and URL
filters when trying to index fmforums.com
nutch says No URLs to fetch - check your seed list and URL filters when
trying to index fmforums.com.
I am using th
ers when trying to index fmforums.com
>
> nutch says No URLs to fetch - check your seed list and URL filters when
> trying to index fmforums.com.
>
> I am using this command:
>
> bin/nutch crawl urls -dir crawl -depth 3 -topN 50
>
> - urls directory contains urls.txt which
nutch says No URLs to fetch - check your seed list and URL filters when
trying to index fmforums.com.
I am using this command:
bin/nutch crawl urls -dir crawl -depth 3 -topN 50
- urls directory contains urls.txt which contains http://www.fmforums.com/
- crawl-urlfilter.txt contains +^http
trying to index fmforums.com???
also fmforums.com/robots.txt looks ok:
###
#
# sample robots.txt file for this website
#
# addresses all robots by using wild card *
User-agent: *
#
# list folders robots are not allowed to index
#Disallow: /tutorials/404redirect/
Disallow
ilter, seeds, paths,
> nutch-site.xml and so on by overwriting the solr- and nutch-configurations).
> the solr-server shall index the configured paths of the intranet (file,
> smb-shares, svn, ...) and hold the index, and nutch shall crawl the configured
> websites (html, pdf, doc, ...) and i
Hi,
I found the problem, I could open the index on my server (Linux) but not on
my desktop (Windows), so something must be messed up in transferring the files
(FTP); the same thing used to work just fine with nutch-0.9. I tried to zip it
on the server and then unzip it on Windows, and then I can
On 2010-04-01 21:09, Magnús Skúlason wrote:
> Hi,
>
> I am getting the following exception when I try to open a nutch 1.0 (I am
> using the official release) index with Luke (0.9.9.1)
>
> java.io.IOException: read past EOF
> at
> org.apache.lucene.store.B
Hi,
I am getting the following exception when I try to open a nutch 1.0 (I am
using the official release) index with Luke (0.9.9.1)
java.io.IOException: read past EOF
at
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:151)
at
the index and nutch shall
crawl the configured websites (html, pdf, doc, ...) and index these into the
solr-server. currently i use the whole-web-crawl script shown below. the
indexing of the plain/text websites into solr is not a problem, but when i
would like to crawl a website (including pdf
I was trying a crawl with 200 seeds. In previous cases it used to create the
index without any problem, but now when I started the crawl it shows the
following exception at depth 2
attempt_201003301923_0007_m_00_0: Aborting with 100 hung threads.
Task attempt_201003301923_0007_m_04_0
Hello Arnaud;
why not start by making sure you are parsing correctly, by debugging your code
and checking if author
is added to the metadata ..
if it is parsed and added to the metadata then move to the index part.
in the indexer do something like this, and make sure it is added to the doc:
String[] tags
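Since the snippet above is cut off, here is a hedged sketch of that indexer-side step against the Nutch 0.9 IndexingFilter interface (Nutch 1.0 uses a NutchDocument instead of a Lucene Document, as noted later in this thread); the "author" metadata key is whatever your parse plugin stored:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.crawl.Inlinks;
import org.apache.nutch.indexer.IndexingException;
import org.apache.nutch.indexer.IndexingFilter;
import org.apache.nutch.parse.Parse;

public class AuthorIndexingFilter implements IndexingFilter {
  private Configuration conf;

  public Document filter(Document doc, Parse parse, Text url,
                         CrawlDatum datum, Inlinks inlinks) throws IndexingException {
    // Pick up the value the parse plugin stored in the parse metadata.
    String author = parse.getData().getMeta("author");
    if (author != null) {
      doc.add(new Field("author", author, Field.Store.YES, Field.Index.TOKENIZED));
    }
    return doc;
  }

  public void setConf(Configuration conf) { this.conf = conf; }
  public Configuration getConf() { return conf; }
}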
Hello Ahmad and all nutch-users
I would like to thank you for your response.
Exactly as you say, there isn't the same interface in version 1.0 of
NUTCH.
So actually I don't have an error building the plugin, but the problem is
that no new field appears in the index
when I display the i
__
From: Arnaud Garcia
To: nutch-user@lucene.apache.org
Sent: Wed, March 17, 2010 8:25:26 AM
Subject: Re: Plugin installed , deployed and works correctly but no new field
in the index
2010/3/17 Arnaud Garcia
>
>
> 2010/3/17 Arnaud Garcia
>
> Hello everybody
>&g
hi there,
I need to open a solr 1.4 index from the file system using
SolrIndexReader.open();
this seems to do the job ok and i can read my documents. the problem
arises when i try to get a date field which was indexed as text; here is
what toString gives me;
stored/uncompressed,binary
y
>> in the file /nutch/src/plugin/ and
>>
>> the "author" directory was been created on the directory /nutch/build/ .
>>
>>
>> THE PROBLEM IS :
>>
>> No new field named "author" exists in the index .
>>
>> I m using Luke
, all things are built successfully , (plugin (separately) + Nutch ) ,
> the name of the plugin ("author") was added in the nutch-site.xml file ,
>
> and the tag was added correctly in
> the file /nutch/src/plugin/ and
>
> the "author" directory has been created in the di
uthor") was added in the nutch-site.xml file ,
and the tag was added correctly in
the file /nutch/src/plugin/ and
the "author" directory has been created in the directory /nutch/build/ .
THE PROBLEM IS :
No new field named "author" exists in the index .
I'm using Luke to read
Is it possible to do this with Nutch?
Hi, all
I get an OutOfMemory error when indexing using bin/nutch index crawl/indexes
crawl/crawldb crawl/linkdb crawl/segments/*
I have configured HADOOP_HEAPSIZE in hadoop-env.sh and
mapred.child.java.opts in mapred-site.xml to the hardware limit.
mapred.child.java.opts
-Xmx2600m
There's no good way to do this.
I'm waiting for HBase integration with Nutch, which will make this
operation much easier. The data store structure Nutch is using now is
not suitable for adding a single URL to the index, as far as I know.
Thanks!
Xiao
On Tue, Feb 16, 2010 at 7:47 PM, Ahmad Al-A
or you can add a privilege field in the lucene index, and add a clause on
this field for each search query.
On Mon, Feb 22, 2010 at 10:48 AM, QueroVc wrote:
>
> Looking for a solution to the following subject:
>
> My search will be available to both an internal and an external audience;
> how do I control which of them can access the results?
For now, thank you.
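A rough illustration of that privilege-field idea with plain Lucene: tag each document with a privilege field at index time, then wrap every user query so only documents the current audience may see can match. The field name and values below are made up:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class PrivilegeFilter {
  // Wraps the user's query so only documents tagged for this audience can match.
  public static Query restrict(Query userQuery, String audience) {
    BooleanQuery q = new BooleanQuery();
    q.add(userQuery, BooleanClause.Occur.MUST);
    q.add(new TermQuery(new Term("privilege", audience)), BooleanClause.Occur.MUST);
    return q;
  }
}

The internal webapp would pass one audience value and the public webapp another, so the same index serves both without exposing restricted documents externally.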
and inject them into the crawldb... after that, I need
to index it only; I guess I can just use the current generator and other
stuff with depth equal to one to do it.
what am I supposed to use for doing this, and is there any other missing information I
should know ?!!
and is building a plug-in more suitable
rts.
So I had to write a restarting procedure myself. Nothing difficult. Just close
the current index and reopen it again.
And then I just pull the corresponding URL to re-initialize the index after
re-crawling.
private void reInitNutchBean() throws IOException {
    bean.close();
    bean.reinitilize();
}
p
Hello all -
I need to update the live search index - most preferably without restarting
the application. I'm using nutch 0.9 in WebSphere. By doing a few
searches, it seems that this is a large issue with a lot of history. Where
does it stand today? Is there a .jsp I can crea
Great, thx. Being able to open it will help,
but I don't get the summary page to be populated; is this normal ???
2009/12/11 Andrzej Bialecki
> On 2009-12-11 22:21, MilleBii wrote:
>
>> Guys is there a way you can get Luke to read the index from hdfs:// ???
>> Or you have to c
On 2009-12-11 22:21, MilleBii wrote:
Guys is there a way you can get Luke to read the index from hdfs:// ???
Or you have to copy it out to the local filesystem?
Luke 0.9.9 can open indexes directly from HDFS hosted on Hadoop 0.19.x.
Luke 0.9.9.1 can do the same, but uses Hadoop 0.20.1.
Start
Guys is there a way you can get Luke to read the index from hdfs:// ???
Or you have to copy it out to the local filesystem?
--
-MilleBii-
Is there any ready-made plug-in for Office 2007 documents available, or do I have to
write it on my own?
-Original Message-
From: yangfeng [mailto:yea...@gmail.com]
Sent: Monday, December 07, 2009 4:35 PM
To: nutch-user@lucene.apache.org
Subject: Re: How to successfully crawl and index
docx should be parseable; a plugin can be used to parse docx files. You can get some
help info from the parse-html plugin and so on.
2009/12/4 Rupesh Mankar
> Hi,
>
> I am new to Nutch. I want to crawl and search office 2007 documents (.docx,
> .pptx etc) from Nutch. But when I try to crawl, crawler throws
Hi,
I am new to Nutch. I want to crawl and search office 2007 documents (.docx,
.pptx etc) from Nutch. But when I try to crawl, crawler throws following error:
fetching http://10.88.45.140:8081/tutorial/Office-2007-document.docx
Error parsing: http://10.88.45.140:8081/tutorial/Office-2007-docume
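For the write-it-yourself route, the text-extraction core is small if Apache POI's OOXML support (poi-ooxml and its dependencies) is on the classpath. This is only a hedged standalone sketch of the extraction step a parse plugin would wrap, not a ready Nutch plugin:

import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class DocxTextDump {
  public static void main(String[] args) throws Exception {
    // File name taken from the error above; any .docx path works.
    InputStream in = new FileInputStream("Office-2007-document.docx");
    XWPFWordExtractor extractor = new XWPFWordExtractor(new XWPFDocument(in));
    System.out.println(extractor.getText()); // the plain text a parser plugin would hand back to Nutch
    in.close();
  }
}

If the "parser not found" error reports an odd contentType for .docx, the mime-type mapping also needs a docx entry, since Nutch picks the parser by content type before any plugin is consulted.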
Jesse Hires wrote:
Does "bin/nutch merge" only create a whole new index out of several smaller
indexes, or can it be used to incrementally update a single large index with
newly fetched and indexed smaller segments?
It can do either - the tool merges indexes as-is without de-duplica
Does "bin/nutch merge" only create a whole new index out of several smaller
indexes, or can it be used to incrementally update a single large index with
newly fetched and indexed smaller segments?
Jesse
int GetRandomNumber()
{
return 4; // Chosen by fair ro
status 4 (db_redir_temp): 2548
status 5 (db_redir_perm): 3252
status 6 (db_notmodified): 1499
CrawlDb statistics: done
Once I do the generate/fetch/updatedb/mergesegs,
I do a mergesegs -slice (1/2 * total_urls) This is the step that is taking
too long.
I then index each of the two new segments indivi
hi,
not sure if this will work in your case;
but in a nutshell -
first create a nutch index by crawling some urls.
open both indexes ie
IndexReader r = IndexReader.open(nutch_index)
IndexReader r2 = IndexReader.open(your_index)
then write a new index;
IndexWriter writer = new IndexWriter
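A hedged sketch of how that could continue with the Lucene 2.x API bundled with Nutch at the time (paths are made up); IndexWriter.addIndexes copies both existing indexes into the new one:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;

public class MergeTwoIndexes {
  public static void main(String[] args) throws Exception {
    IndexReader r  = IndexReader.open("crawl/index");      // the Nutch index (path assumed)
    IndexReader r2 = IndexReader.open("/data/your-index"); // your custom Lucene index (path assumed)

    // true = create a brand new index at the target location
    IndexWriter writer = new IndexWriter("/data/merged-index", new StandardAnalyzer(), true);
    writer.addIndexes(new IndexReader[] { r, r2 });
    writer.optimize();
    writer.close();
    r.close();
    r2.close();
  }
}

The merge itself is mechanical; getting sensible combined search results also requires that the two indexes use compatible field names and analyzers, which is the harder part.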
I have built a custom index using Lucene, as the data source is not web pages
but some custom text files. I stored several indexed fields in my index.
Now my colleague requires me to make sure this index works with Nutch. So I
set up Nutch on the server, followed the steps, crawled some pages
that seems to work. thanks for that. it was a bit more fiddly than I
expected but I got the index sorted.
found an issue with sorting, as most fields cannot be sorted on, and it
throws a
java.lang.RuntimeException: Unknown sort value type!
at
fa...@butterflycluster.net wrote:
hi all,
i have an existing index - we have a custom field that needs to be added
or changed in every currently indexed document ;
what's the best way to go about this without recreating the index again?
There are ways to do it directly on the index, but this
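One of those ways, sketched as a plain Lucene 2.x loop with a made-up field name and value: read each document back and re-add it with the extra field via updateDocument, keyed on the url field that Nutch stores. This only works if every field you need is stored, because unstored fields cannot be read back and would be lost:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class AddFieldToAllDocs {
  public static void main(String[] args) throws Exception {
    String path = "crawl/index";                          // path assumed
    IndexReader reader = IndexReader.open(path);          // point-in-time view of the index
    IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), false); // false = append

    for (int i = 0; i < reader.maxDoc(); i++) {
      if (reader.isDeleted(i)) continue;
      Document doc = reader.document(i);                  // only stored fields come back
      if (doc.get("myfield") == null) {                   // only the documents that lack the field
        doc.add(new Field("myfield", "default", Field.Store.YES, Field.Index.UN_TOKENIZED));
        // delete-and-re-add, keyed on the stored url field
        writer.updateDocument(new Term("url", doc.get("url")), doc);
      }
    }
    writer.optimize();
    writer.close();
    reader.close();
  }
}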
?
db.fetch.schedule.class = AdaptiveFetchSchedule
db.update.additions.allowed = true
db.ignore.internal.links = false
db.ignore.external.links = true (because we are intranet only)
>
>
> Currently we crawl every two days and create a new index and then merge
> with
> earlier index. For one it
hi all,
i have an existing index - we have a custom field that needs to be added
or changed in every currently indexed document ;
what's the best way to go about this without recreating the index again?
currently some documents have the field, some don't;
the ones that have it need to be updated
Currently we crawl every two days and create a new index and then merge it with
the earlier index. For one, it takes too long, as mergesegs seems to take time
proportional to the size of both indexes combined. An equally problematic
issue is that mergesegs fails a significant portion of the time. Probability
Hello,
I am looking for a way to search multiple indexes from one webapp
and found some code. I can always make one webapp = one website but
what if it grows?
Is it possible to make this code work:
in search.jsp
/*
Comment this original line of code and use code below.
Configuratio
ed
>> So didn't get outlinks to kml files from html.
>> So I can't parse and index kml files.
>> I might not be right, but I have a feeling that it's not possible
>> without modifying source code.
>
> It's possible to do this with a custom indexing
Dmitriy Fundak wrote:
If I disable html-parser(remove "parse-(html" from plugin.includes
property) html filed didn't get parsed
So didn't get outlinks to kml files from html.
So I can't parse and index kml files.
I might not be right, but I have a feeling that it's
If I disable the html-parser (remove "parse-(html" from the plugin.includes
property) html files don't get parsed,
so I don't get outlinks to kml files from html,
so I can't parse and index kml files.
I might not be right, but I have a feeling that it's not possible
without mod
disable the html-parser from the nutch-site and keep only your parser.
you can also add this in your filter file: -(htm|html)$
thx
> Date: Mon, 26 Oct 2009 17:53:11 +0300
> Subject: How to index files only with specific type
> From: dfun...@gmail.com
> To: nutch-user@lucen
Hi, I've created a parser and indexer for a specific file type (geo xml meta
file - kml).
I am trying to crawl a couple of sites, and index only files of this type.
I don't want to index html or anything else.
How can I achieve this?
Thanks.
paul tomblin sent a patch on 14.10.2009.
filtering out not-modified pages makes sense for me if the index is
built incrementally and
if these pages are already in the index which is updated; then
lucene offers the option to update an index,
but in my case i always build a new one.
you may
Some groups of urls are all from the same site. After I
> am done with all groups, I copy all the segments together, do a crawldb
> update, which will create a new crawldb, and then index.
>
> This scheme worked well with nutch 0.9. But when I switch to nutch 1.0,
> search results
,
and schedules. Some groups of urls are all from the same site. After I
am done with all groups, I copy all the segments together, do a crawldb
update, which will create a new crawldb, and then index.
This scheme worked well with nutch 0.9. But when I switch to nutch 1.0,
search results will miss urls
Andrzej Bialecki wrote:
>
> JusteAvantToi wrote:
>> Hi all,
>>
>> I am new on using Nutch and I found that Nutch is really good. I have a
>> problem and hope somebody can shed a light.
>>
>> I have built an index and a web application that makes u
JusteAvantToi wrote:
Hi all,
I am new to using Nutch and I found that Nutch is really good. I have a
problem and hope somebody can shed some light.
I have built an index and a web application that makes use of that index. I
plan to have two web application servers running the application. Since
Hi all,
I am new to using Nutch and I found that Nutch is really good. I have a
problem and hope somebody can shed some light.
I have built an index and a web application that makes use of that index. I
plan to have two web application servers running the application. Since I do
not want to
esting it again.
>>
>> Needless to say it would seem more straightforward to tackle this in some
>> kind of parser plugin that could break the original page into pieces that
>> are treated as standalone pages for indexing purposes.
>>
>> Last but not least conce
ward to tackle this in some
kind of parser plugin that could break the original page into pieces that
are treated as standalone pages for indexing purposes.
Last but not least conceptually a plugin for the indexer might be able to
take a set of custom meta data for a replies "collection" an
me
kind of parser plugin that could break the original page into pieces that
are treated as standalone pages for indexing purposes.
Last but not least conceptually a plugin for the indexer might be able to
take a set of custom meta data for a replies "collection" and index it as
separate
cation in nutch mean and how does
it work??
Is it possible to configure nutch to remove duplicate contents like
navigation bar during its de-duplication process??
Regards,
Winz
structure. the nutchgui-searcher uses a slightly different
folder structure than the original nutch.
+ you need a property 'nutch.instance.folder' which defines the folder
where your crawl folders exist. for example nutch.instance.folder:
'tmp/nutch/crawls'
/tmp/nutch/crawl
Hi,
I'm sorry I'm sending the same question again.
Each time I create a new index for my web app, NutchBean throws an
exception and the search page crashes. I know that this is an old
problem for nutch (it needs a context restart after index updating) but
I was hoping that it would
Hi,
Each time I create a new index for my web app, NutchBean throws an
exception and the search page crashes. I know that this is an old
problem for nutch (it needs a context restart after index updating) but
I was hoping that it would be solved with newer versions. I'm using
Nutch 1.
Perhaps I have my terminology wrong, so I am looking at this the wrong way.
If I want to distribute my search across multiple nodes, having only a
portion of the data on each node, is this just a matter of using mergesegs
to get the number and size of segments I want, then rebuild the index (house
> Ok, I will paraphrase the question.
>
> Consider I want to use distributed search using 3 servers: one primary and
> two secondary nodes.
>
> I create single BIG index using distributed crawler using other computers.
> Now I want to split this single BIG index on two parts t
Ok, I will paraphrase the question.
Consider I want to use distributed search using 3 servers: one primary and
two secondary nodes.
I create a single BIG index using the distributed crawler on other computers.
Now I want to split this single BIG index into two parts to put on the search
nodes.
How
Hi Jesse,
I'm not sure what you're trying to achieve. Do you want to use the distributed
search or do you want to split an existing index? Neither task is a
prerequisite for the other.
If you want to split an index, there are several ways to do this. Which way to
choose depe
My apologies in advance.
I've been digging through the mail archives searching for information on
splitting the index after crawling, but I am getting even more confused or
the information is too incomplete for a newbie like myself.
I see reference to using mergesegs, but not enough to ma
You should give a bit more details because it depends how specific your
Lucene indexer is.
The way to extend indexing in Nutch is via the indexing plug-in mechanism
(look in the wiki for plugin addition).
A good starting point is the index-more plug-in.
As in Lucene you want to have a query extension too
Hi,
I'm exploring nutch and this forum for help on integrating a Lucene indexer
into the Nutch Crawl (version 0.9) process in Java.
Can anyone suggest how to do so or recommend an example/similar thread?
Thanks!
using
> bin/nutch commnad.
> When nutch 0.9 index file made by main method of
> org.apache.nutch.crawl.Crawl class can be read from program.
> But when nutch 1.0 index file made by main method of
> org.apache.nutch.crawl.Crawl class can not be read from program.
>
>
> Also re
Hi,
I am new to nutch.
Now I am trying to do crawling from a Java servlet program without using
the bin/nutch command.
A nutch 0.9 index file made by the main method of the
org.apache.nutch.crawl.Crawl class can be read from my program.
But a nutch 1.0 index file made by the main method of
Hello,
I have a crawl folder with 2GB of data and its index is 160MB. Then, nutch indexed
another set of domains and its crawl folder is about 1MB. I wondered if there
is an effective way of making the indexes from both folders available for search
without using the merge script, since merging large
I like using Nutch for the crawlDB, scalability, threading, document
parsing, ... but crawling is not important to me as I index targeted
data sources.
Obviously, I'm using it with Solr for indexing and searching documents.
Fabrice
Alexander Aristov wrote:
Nutch primarily is a crawl
On Tue, Aug 11, 2009 at 2:10 PM, Paul Tomblin wrote:
> I want to iterate through all the documents that are in the crawl,
> programattically. The only code I can find does searches. I don't
> want to search for a term, I want everything. Is there a way to do
> this?
To answer my own question, w
Nutch primarily is a crawler. I would suggest you take a look at Solr,
which is just an indexer and searcher. You may use its API as well as its open
interfaces
Best Regards
Alexander Aristov
2009/8/12 Fabrice Estiévenart
> Hello,
>
> How can I use Nutch Java objects to index
Try looking at how the indexers work. They *do* iterate through all
the documents in the crawl (or rather one segment at a time). However
they do it in a Hadoop way...
2009/8/11 Paul Tomblin :
> I want to iterate through all the documents that are in the crawl,
> programattically. The only code
Hello,
How can I use Nutch Java objects to index one (or a very limited set of)
web page(s) without crawling them ?
Do I need to use the crawling tools (such as Injector, Generator, ...)
or can I do it by the means of lower-level objects (Content,
ParseResult, ...) ?
Thanks for your help
I want to iterate through all the documents that are in the crawl,
programmatically. The only code I can find does searches. I don't
want to search for a term, I want everything. Is there a way to do
this?
--
http://www.linkedin.com/in/paultomblin
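If the stored fields in the Lucene index that the crawl produced are enough, iterating them directly is straightforward; a hedged sketch against the Lucene 2.x API, with the index path assumed to be crawl/index:

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;

public class DumpAllDocs {
  public static void main(String[] args) throws Exception {
    IndexReader reader = IndexReader.open("crawl/index"); // path assumed
    for (int i = 0; i < reader.maxDoc(); i++) {
      if (reader.isDeleted(i)) continue;                  // skip deleted slots
      Document doc = reader.document(i);                  // stored fields only
      System.out.println(doc.get("url") + "\t" + doc.get("title"));
    }
    reader.close();
  }
}

For the full parsed content you would instead read the segments (parse_text) the Hadoop way, as the indexers themselves do.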
On Mon, Jul 27, 2009 at 09:34, Saurabh Suman wrote:
>
> I am using solr for searching.I used the class SolrIndexer.But i can search
> on content only?I want to search on author also?How to index on author?
You need to write your own query plugin. Take a look at query-basic
plugin under s
Wouldn't that be using facets, as per
http://wiki.apache.org/solr/SimpleFacetParameters
On Mon, Jul 27, 2009 at 2:34 AM, Saurabh Suman
wrote:
>
> I am using solr for searching.I used the class SolrIndexer.But i can search
> on content only?I want to search on author also?How to i
I am using solr for searching. I used the class SolrIndexer. But I can search
on content only; I want to search on author also. How do I index on author?
of failure,
but of succeeding at something that doesn't really matter.
-- ANONYMOUS
On Tue, Jul 14, 2009 at 8:32 AM, Beats wrote:
>
> hi,
>
> actually what i want is to crawl a web page say 'page A' and all its
> outlinks.
> i want to index all the content
hi,
actually what I want is to crawl a web page, say 'page A', and all its
outlinks.
I want to index all the content gathered by crawling the outlinks, but not
'page A' itself.
is there any way to do it in a single run.
with Regards
Beats
be...@yahoo.com
SunGod wrote:
>
&
h first
>
> 2009/7/13 Beats
>
>
>> can anyone help me on this..
>>
>> i m using solr to index the nutch doc.
>> So i think prune tool will not work.
>>
>> i do not want to index the document taken from a particular set of sites
>>
>> wi
test/segments/20090628160619
loop steps 3 - 5; writing a bash script to run them is best!
next time please use google search first
2009/7/13 Beats
>
> can anyone help me on this..
>
> i m using solr to index the nutch doc.
> So i think prune tool will not work.
>
> i do not want
can anyone help me on this..
I'm using solr to index the nutch docs.
So I think the prune tool will not work.
I do not want to index the documents taken from a particular set of sites
with regards Beats
hi all
I want to crawl a page and then crawl all its outlinks and index the content
of those crawled outlinks..
the problem is I don't want to index the page from which I get these
outlinks..
thanx in advance
hi,
I'm getting the same problem here..
are there some changes that need to be done before using the "feed" plugin??
I'm getting a parsing error
Felix Zimmermann-2 wrote:
>
> Hi,
>
>
>
> Is there an easy way to parse and index the content field of feeds wi
yes that is correct, in order to do that you could modify the parser to
store the content of special tags into another field that you would give a
higher boost.
best regards,
Magnus
On Thu, Jul 9, 2009 at 3:30 PM, Joel Halbert wrote:
> Hi, Would I be correct in thinking that Nutch, when indexin
Hi, Would I be correct in thinking that Nutch, when indexing an html
document, does not weight the different text nodes (h1, h2, anchor etc)
differently - instead it just lumps together all text as one? (this is
the impression I get from looking at
org.apache.nutch.parse.html.HtmlParser)
Rgs,
Joe
Example1:
Error parsing:
http://localhost/mydocs/Programacion/Web/Ajax/Ajax.Hacks.Tips.and.Tools.for.Creating.Responsive.Web.Sites.Mar.2006.chm:
org.apache.nutch.parse.ParseException: parser not found for
contentType=chemical/x-chemdraw
url=http://localhost/mydocs/Programacion/Web/Ajax/Ajax.H
Example1:
Error parsing:
http://localhost/mydocs/Programacion/Web/Ajax/Ajax.Hacks.Tips.and.Tools.for.Creating.Responsive.Web.Sites.Mar.2006.chm:
org.apache.nutch.parse.ParseException: parser not found for
contentType=chemical/x-chemdraw
url=http://localhost/mydocs/Programacion/Web/Ajax/Ajax.Ha
how can I list documents stored in the nutch index. I have indexed some
file.zip to get information about the data within the .txt file (using
parse-text)
below ~/nutch-1.0/index/ are some binary files and I want to know if
that .txt information went into the nutch index.
please advise
Short answer: no way to do it currently. Now for the long answer.
You can handle searching in two ways:
1) Have a single massive index and segments, merge everything including
segments and indexes. Then split the indexes and segments (don't forget
having to split the segments othe
I agree with you that we should split up the index at the stage of
indexing. We are thinking on the same page. Maybe we can read the index file
directory and segments directory in the nutch api, and split the segments file
dir by documents, and build an index on each segments file?
nutch claim that it a
Off the top of my head, I am afraid there is no direct solution. The index can be
split at the stage of indexing by setting the parameter for the number of URLs in an
index, but you also have segments, which store page content, and you cannot
alter them after you have done indexing.
Probably the solution might
I am considering this problem now, can anyone help?
charlie w wrote:
>
> With regard to distributed search I see lots of discussion about splitting
> the index, but no actual discussion about specifically how that's done.
>
> I have a small, but growing, index. Is it