Total hits: 0, search results are zero

2009-09-23 Thread sanjeev rathore
I am trying to test the integrity of the crawl before using Tomcat. I am using Nutch 1.0 on Ubuntu 9.04. Could someone tell me why I am getting Total hits: 0? Regards, Sanjeev. Here are the config files that I have configured. conf/crawl-urlfilter.txt

Re: HTML parsing and charset for Polish

2009-09-23 Thread Dawid Weiss
Can you provide the HTTP headers and HEAD of the HTML of a Web page for which Nutch fails? Perhaps there is an inconsistency between HTTP and META headers or a misspelled codepage? Just a wild guess, but believe me -- Java does convert fine between Cp1250, Iso8859-2 and internal UTF-16 so there mus

Re: Event search engine

2009-09-23 Thread Brian Ulicny
Here's a recently announced event search engine: http://searchengineland.com/what-where-when-travel-local-search-combine-goby-com-26395 Just heard of it today. Brian Ulicny On Wed, 23 Sep 2009 09:27 +0200, "Michael Wechner" wrote: > Mitia Notaras wrote: > > Hi there, > > > > The two event se

RE: AW: DC metadata

2009-09-23 Thread BELLINI ADAM
Hi again Martina :) Now I don't have the error when crawling since I used addIndexBackendOptions(Configuration conf) to add my fields, thanks for that. But I still don't see an empty field (dc.subject) in my index when listing it with Luke... I guess I have to write the parser to extract it and ad

Re: AW: Null Indexing

2009-09-23 Thread Cisek
I had the same little big problem - everything seemed OK: - bin/nutch org.apache.nutch.searcher.NutchBean ... [in my case search query = "apache"] in cygwin returns 62 Total hits on crawled "+^http://([a-z0-9]*\.)*apache.org/" - Nutch in Tomcat webapp after deploy seemed fine (no errors) - I ha
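For readers checking a filter line like the one quoted above, here is a minimal sketch of what that pattern accepts. It uses plain java.util.regex rather than Nutch itself, and it assumes that the leading '+' only marks an include rule and that the filter uses find-style matching; the class name, method name and sample URLs are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: tests URLs against the quoted pattern.
final class UrlFilterPatternCheck {
    // Pattern body copied from the message, without the leading '+'
    // (assumed here to be the include marker, not part of the regex).
    static final Pattern APACHE =
        Pattern.compile("^http://([a-z0-9]*\\.)*apache.org/");

    // Assumption: the filter accepts a URL when the pattern is found in it.
    static boolean accepts(String url) {
        return APACHE.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://lucene.apache.org/nutch/",
            "http://apache.org/",
            "http://www.example.com/",
            "https://lucene.apache.org/"
        };
        for (String url : samples) {
            System.out.println(accepts(url) + "  " + url);
        }
    }
}
```

Note that the dot in "apache.org" is not escaped in the quoted pattern, so it matches any character there; that is harmless for a test crawl, but writing it as "apache\.org" would be stricter.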

RE: AW: DC metadata

2009-09-23 Thread BELLINI ADAM
Yes, I saw the differences and I wrote my index-cutom like the index-more plugin (nutch-1.0). But I guess you're right!! I didn't use the addFieldOptions method to add my custom fields information ... so if I will add them in this method.. so for the parser I have to see first how is made the htmlparser

AW: DC metadata

2009-09-23 Thread Koch Martina
Hi, the howtos you're referring to are for Nutch 0.9. In Nutch 1.0 the indexing system changed a little bit. If you look at the index-basic or index-more plugin you see that the doc.add method changed. It's no longer doc.add(new Field("category", "puppies", false, true, false)) -> here you cr

RE: AW: DC metadata

2009-09-23 Thread BELLINI ADAM
Hi, thank you for your answer... I was talking about this howto: CreateNewFilter Howto add a category metadata to your index and be able to search for it. For this, you need to write an indexing filter and a query filter. Indexing your custom metadata For the indexing filter, copy the index-mo

Specify at least one source--a file or resource collection error

2009-09-23 Thread Jaime Martín
Hi: I'm following the steps to run the Nutch 1.0 release with Eclipse and Windows described in this link http://wiki.apache.org/nutch/RunNutchInEclipse1.0 I'm trying to build it, but when I launch the war target I have this error C:\ECLIPSE321\workspace\nutch-1.0\build.xml:62: Specify at least one so

Re: HTML parsing and charset for Polish

2009-09-23 Thread MilleBii
At last someone answers. Correct, CP1250. My pages look fine in the browsers of course, but it does not mean Nutch handles them properly. What I'm wondering is if the Nutch HTML parser reads them properly, because when I do a search on such characters it fails on pages iso8859-2 or cp1250, but

Re: splitting an index (yes, again)

2009-09-23 Thread Jesse Hires
Exactly! sorry for being so confusing in my original question. Jesse int GetRandomNumber() { return 4; // Chosen by fair roll of dice // Guaranteed to be random } // xkcd.com On Wed, Sep 23, 2009 at 4:45 AM, Alexander Aristov < alexander.aris...@gmail.com> wrote: > Ok, I

Re: HTML parsing and charset for Polish

2009-09-23 Thread Dawid Weiss
Polish Web sites use Cp1250 (windows-1250) or iso8859-2 (or UTF-8 of course). Check if diacritics like these: ęółąśćżń look all right in the above encodings and use appropriately. Dawid On Wed, Sep 16, 2009 at 4:47 PM, MilleBii wrote: > same thing when there is > charset=ISO-8859-2 > > 2009/9/
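The check suggested above can be sketched in plain Java, outside Nutch. This is only an illustration: the class and method names are hypothetical, and it assumes the running JVM ships the windows-1250 and ISO-8859-2 charsets, as standard JDK builds do. It encodes the diacritics from the message in each encoding, decodes them again, and also shows what happens when Cp1250 bytes are read as ISO-8859-2.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of Nutch: round-trips Polish diacritics.
final class PolishCharsetCheck {
    // The diacritics from the message, written as Unicode escapes so the
    // result does not depend on the encoding of this source file.
    static final String DIACRITICS =
        "\u0119\u00F3\u0142\u0105\u015B\u0107\u017C\u0144";

    // Encodes text with one charset and decodes the bytes with another.
    static String transcode(String text, String encodeAs, String decodeAs) {
        byte[] bytes = text.getBytes(Charset.forName(encodeAs));
        return new String(bytes, Charset.forName(decodeAs));
    }

    // True when the text survives encoding and decoding in the same charset.
    static boolean roundTrips(String text, String charsetName) {
        return text.equals(transcode(text, charsetName, charsetName));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"windows-1250", "ISO-8859-2", "UTF-8"}) {
            System.out.println(name + " round trip ok: " + roundTrips(DIACRITICS, name));
        }
        // A page sent as Cp1250 but declared as ISO-8859-2 is decoded wrongly.
        String wrong = transcode(DIACRITICS, "windows-1250", "ISO-8859-2");
        System.out.println("Cp1250 read as ISO-8859-2 intact: " + wrong.equals(DIACRITICS));
    }
}
```

If all three round trips succeed but searches still fail, the mismatch case is the one to look at: the two encodings place some Polish letters at different byte values, so a wrong charset declaration in the HTTP or META headers would corrupt those letters even though Java's conversion itself is correct.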

Re: splitting an index (yes, again)

2009-09-23 Thread Alexander Aristov
Ok, I will paraphrase the question. Suppose I want to use distributed search using 3 servers: one primary and two secondary nodes. I create a single BIG index using a distributed crawler on other computers. Now I want to split this single BIG index into two parts to put on the search nodes. How ca

Re: Event search engine

2009-09-23 Thread Michael Wechner
Mitia Notaras wrote: Hi there, The two event search engines I found are down : betherebesquare.com and BusyTonight.com I would like your advice : Is it difficult to build one? I guess it depends on the details of the requirements. Do you have a requirements sheet? I have knowledge of web