Re: cannot deal with too many files under one folder

2008-09-03 Thread
Take a look at that file under the conf folder. Setting db.max.outlinks.per.page to -1 may solve your problem, but also take a look at the other variables; those may also cause problems in the future, like http.content.limit... Hope this helps. Regards, onur
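The advice above maps to two properties that can be overridden in conf/nutch-site.xml. A minimal sketch follows; the property names are the stock Nutch ones named in the reply, but the exact defaults and semantics should be verified against the nutch-default.xml shipped with your release:

```xml
<!-- conf/nutch-site.xml (sketch): overrides conf/nutch-default.xml.
     Based on the advice in the thread above; check the property
     descriptions in your release's nutch-default.xml. -->
<configuration>
  <property>
    <name>db.max.outlinks.per.page</name>
    <value>-1</value>
    <description>-1 removes the cap on outlinks recorded per page,
    so a folder listing with many files is followed in full.</description>
  </property>
  <property>
    <name>http.content.limit</name>
    <value>-1</value>
    <description>-1 removes the download size limit, so a large
    directory listing is not truncated before parsing.</description>
  </property>
</configuration>
```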

cannot deal with too many files under one folder

2008-09-01 Thread
Hi all, I have posted this problem before, but it is not solved. I use nutch to crawl an intranet to index some documents. For some URLs there are many documents underneath. I find that after crawling, if there are more than 32 files under one folder, I can only search 32 documents; the other documents after that…

cannot deal with more than 32 documents under one folder?

2008-08-11 Thread
Hi all, I met a problem when using nutch. I use it to crawl an intranet to index some documents. For some URLs there are many documents underneath. I find that after crawling I can only search 32 documents; the documents after that cannot be searched. I checked it in Luke, and it shows the same situation. It m…

Re: nutch fetched but no indexed

2008-07-29 Thread
Regards, Gong Zhao. 2008/7/28 wuqi <[EMAIL PROTECTED]>: Try to set the log level for the Dedup program to "DEBUG" in your log4j.properties file and you may find the cause. ----- Original Message ----- From: 宫照 <[EMAIL PROTECTED]> To: nutch-user@lucen…
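A minimal sketch of the log4j change suggested above, in conf/log4j.properties. The logger name org.apache.nutch.indexer.DeleteDuplicates is an assumption based on the dedup class in Nutch 0.9; verify it against your source tree before relying on it:

```properties
# conf/log4j.properties (fragment) -- raise the dedup job's log level.
# Logger name assumed for Nutch 0.9; check your release's class names.
log4j.logger.org.apache.nutch.indexer.DeleteDuplicates=DEBUG
```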

Re: nutch fetched but no indexed

2008-07-27 Thread
…the segment file... ----- Original Message ----- From: "宫照" <[EMAIL PROTECTED]> To: ; <[EMAIL PROTECTED]> Sent: Friday, July 25, 2008 9:53 AM Subject: Re: nutch fetched but no indexed ... Hi Patrick, …

Re: nutch fetched but no indexed

2008-07-24 Thread
)|... That's the only thing I can think of at first glance. Patrick. -----Original Message----- From: 宫照 [mailto:[EMAIL PROTECTED]] Sent: Wednesday, July 23, 2008 11:27 PM To: nutch-user@lucene.apache.org Subject: nutch fetched but n…

nutch fetched but no indexed

2008-07-23 Thread
Hi everybody, I face a problem when using nutch. I use nutch to crawl an intranet. It worked well before, but recently I added some URLs to crawl. These URLs are different from the normal ones. The new URLs look like this: http://compass.mydomain.com/go/247460034 and there are many folders or documents under this URL…

Re: CRAWLING USING LATEST NUTCH AND HADOOP

2008-07-16 Thread
Hi, I have the same problems. Because there are some bugs in hadoop-0.12.2, I want to change to hadoop-0.17.0, but the API changed and we can't use it directly. If you find a way to solve this problem, let me know. Regards, Gong Zhao. 2008/7/15 kranthi reddy <[EMAIL PROTECTED]>: Hi, I am…

Re: how to search pdf and word

2008-07-07 Thread
…for example:

plugin.includes = protocol-(httpclient|file)|urlfilter-(regex)|parse-(text|html|js|pdf|msword)|index-(basic)|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)

On Tue, …
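Restored as a complete property element, the example quoted in the reply would sit in conf/nutch-site.xml roughly like this (a sketch; the plugin list is the one quoted above, not an exhaustive or authoritative set):

```xml
<!-- conf/nutch-site.xml (fragment): enable the PDF and MS Word
     parse plugins alongside the usual text/html ones. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-(httpclient|file)|urlfilter-(regex)|parse-(text|html|js|pdf|msword)|index-(basic)|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description>Regexp of plugin directory names to include;
  parse-pdf and parse-msword make PDF and Word files searchable.</description>
</property>
```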

Re: Indexing static html files

2008-07-07 Thread

how to search pdf and word

2008-07-07 Thread
Hi everybody, I set up nutch-0.9 and I can search txt and html files on the local system. Now I want to search pdf and msword files; can you tell me how to do it? BR, mingkong