RE: [ANNOUNCE] Web Crawler

2013-07-15 Thread Ramakrishna
3.nabble.com/ANNOUNCE-Web-Crawler-tp2607833p4078229.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: [ANNOUNCE] Web Crawler

2013-07-15 Thread Ramakrishna
3.nabble.com/ANNOUNCE-Web-Crawler-tp2607833p4078228.html

RE: [ANNOUNCE] Web Crawler

2013-07-15 Thread karl.wright
Usually, if a webmaster finds that your crawler has ignored their robots.txt, they will block your machine, or maybe even your entire IP block, from accessing their site. Karl -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Monday, July 15, 2013 9:30 AM
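
For readers who want to see what honoring robots.txt looks like in practice, here is a minimal sketch in Python using the standard library's `urllib.robotparser`. It is not taken from any crawler discussed in this thread. The robots.txt content, the `example.com` URLs, the `mybot` user agent, and the helper names `build_parser` and `is_allowed` are all hypothetical, chosen only for illustration. The rules are parsed from a string so that the example runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration);
# a real crawler would fetch this from the site's /robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without any network I/O."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Check each URL before requesting it; skip anything that is disallowed.
public_ok = is_allowed(parser, "mybot", "http://example.com/public/index.html")
private_ok = is_allowed(parser, "mybot", "http://example.com/private/page.html")

# Seconds to wait between requests, if the site specifies a delay.
delay = parser.crawl_delay("mybot")
```

A polite crawler would run a check like `is_allowed` before every request and sleep for `delay` seconds between requests to the same host.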

Re: [ANNOUNCE] Web Crawler

2013-07-15 Thread Jack Krupansky
anybody on this mailing list would engage in such an unethical or unprofessional activity. -- Jack Krupansky -Original Message- From: Ramakrishna Sent: Monday, July 15, 2013 9:13 AM To: java-user@lucene.apache.org Subject: Re: [ANNOUNCE] Web Crawler Hi.. I'm trying nutch to

Re: [ANNOUNCE] Web Crawler

2013-07-15 Thread Ramakrishna
else please suggest which crawlers to use to crawl websites without bothering about the robots.txt of that particular site. It's urgent, please reply as soon as possible. Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-Crawler-tp2607833p4078039

Lucene Desktop Search Engine with JavaFX/Tika/Filesystem Crawler/HTML5

2013-04-29 Thread Mirko Sertic
Hi all, Lucene rocks, and based on some JavaFX/HTML5 hybrids I built a small Java search engine for your desktop! The prototype and the result can be seen here: http://www.mirkosertic.de/doku.php/javastuff/fxdesktopsearch I am using a multithreaded pipes and filters architecture with Tika as

Re: [ANNOUNCE] Web Crawler

2011-05-27 Thread Dominique Bejean
Hi, Sorry for the delay, but I haven't been checking the mailing list for a long time. Crawl-anywhere includes three pieces of software: a crawler, a pipeline and a Solr indexer. There is a default Solr schema used by Crawl-anywhere, tested with Solr 1.4.1 and Solr 3.1.0. But, yo

Re: [ANNOUNCE] Web Crawler

2011-05-16 Thread abhayd
in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-Crawler-tp2607833p2947623.html

Re: [ANNOUNCE] Web Crawler

2011-05-16 Thread Julien Nioche
> I don't see any activity on the Nutch wiki, so I'm wondering if it's not being developed anymore. But most forums say Nutch is standard for Solr. Looking at the mail archives is a good clue of whether a project is still alive or not. In the case of Nutch, the project is active as you can see on the l

RE: [ANNOUNCE] Web Crawler

2011-05-15 Thread karl.wright
You might want to look at ManifoldCF also. Karl -Original Message- From: ext abhayd [mailto:ajdabhol...@hotmail.com] Sent: Saturday, May 14, 2011 9:29 AM To: java-user@lucene.apache.org Subject: Re: [ANNOUNCE] Web Crawler hi Dominique, I am looking for a crawler to feed solr index

Re: [ANNOUNCE] Web Crawler

2011-05-15 Thread abhayd
Hi Dominique, I am looking for a crawler to feed a Solr index. After looking at various posts I have settled on two: Nutch and Crawl Anywhere. I don't see any activity on the Nutch wiki, so I'm wondering if it's not being developed anymore. But most forums say Nutch is standard for Solr. Crawl

[ANNOUNCE] Web Crawler

2011-03-01 Thread Dominique Bejean
Hi, I would like to announce Crawl Anywhere. Crawl-Anywhere is a Java Web Crawler. It includes: a crawler, a document processing pipeline, and a Solr indexer. The crawler has a web administration in order to manage web sites to be crawled. Each web site crawl is configured with a

Re: Apache Lucene Crawler search

2009-05-27 Thread Mark Miller
Lucene is more like a search utility library than a full-blown search engine like FAST. The Lucene subproject, Solr, is more comparable to FAST, but Solr does not have a built-in crawler available either (though it's easy enough to do basic crawls). There are many open source crawlers you

Re: Apache Lucene Crawler search

2009-05-27 Thread Michael McCandless
Have a look at Apache Droids? http://incubator.apache.org/droids/ Mike On Wed, May 27, 2009 at 5:37 AM, gnixinfosoft wrote: > How to implement crawler search in Apache Lucene, > I am currently using FAST search engine in my project, which uses crawler

Apache Lucene Crawler search

2009-05-27 Thread gnixinfosoft
How can I implement crawler search in Apache Lucene? I am currently using the FAST search engine in my project, which uses a crawler facility. How can I implement this using Apache Lucene? I read somewhere that there is no direct functionality for this in Apache Lucene, bu

Re: crawler questions..

2009-03-05 Thread adasal
That's interesting. I've been working in Python recently, not crawling though. But, as ever, the more you get into it the more curious you get. Did you come up with a solution to a node error? Are you really talking about a broken link, or are you just saying the bottom of the tree has been reached

Re: crawler questions..

2009-03-04 Thread Tim Williams
On Wed, Mar 4, 2009 at 4:41 PM, Grant Ingersoll wrote: > You might have a look at Droids (http://incubator.apache.org/droids/) or Nutch (http://lucene.apache.org/nutch) and their communities. They are much more focused on crawling (not to say there aren't people here who crawl, just saying

Re: crawler questions..

2009-03-04 Thread Grant Ingersoll
You might have a look at Droids (http://incubator.apache.org/droids/) or Nutch (http://lucene.apache.org/nutch) and their communities. They are much more focused on crawling (not to say there aren't people here who crawl, just saying those projects are (mostly) about crawling) On Mar 4, 2

crawler questions..

2009-03-04 Thread bruce
Hi... Sorry that this is a bit off track. Ok, maybe way off track! But I don't have anyone to bounce this off of.. I'm working on a crawling project, crawling a college website, to extract course/class information. I've built a quick test app in Python to crawl the site. I crawl at the top level
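
The poster's test app is not shown in the archive. As a rough illustration of the first step of such a crawl, the sketch below collects the links found on one page and keeps only those on the same host, using Python's standard `html.parser` and `urllib.parse`. The class `LinkCollector`, the function `same_site_links`, the sample HTML, and the `college.example.edu` URL are all hypothetical, and this is not a reconstruction of the poster's code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def same_site_links(base_url: str, html: str) -> list:
    """Return de-duplicated links on the same host as base_url, in page order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    seen = set()
    result = []
    for link in collector.links:
        if urlparse(link).netloc == host and link not in seen:
            seen.add(link)
            result.append(link)
    return result


# Hypothetical page content standing in for a fetched top-level page.
SAMPLE_HTML = """
<html><body>
<a href="/courses/cs101">CS 101</a>
<a href="courses/math200">Math 200</a>
<a href="http://other.example.org/page">External</a>
<a href="/courses/cs101">CS 101 again</a>
</body></html>
"""

links = same_site_links("http://college.example.edu/catalog/", SAMPLE_HTML)
```

A full crawler would push each returned link onto a queue, fetch it, and repeat, keeping a set of already visited URLs so that a page reached by several paths is only processed once.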

Re: Crawler

2009-01-30 Thread Michael Wechner
Jay Malaluan wrote: Hi, You can check out Nutch at http://lucene.apache.org/nutch/. Also see http://incubator.apache.org/projects/droids.html Cheers Michael Regards, Jay Joel Malaluan Haroldo Nascimento-2 wrote: Hi, Is there any crawler that integrates with a Lucene index

Re: Crawler

2009-01-29 Thread Jay Malaluan
Hi, You can check out Nutch at http://lucene.apache.org/nutch/. Regards, Jay Joel Malaluan Haroldo Nascimento-2 wrote: > Hi, > Is there any crawler that integrates with a Lucene index? > T

Crawler

2009-01-29 Thread Haroldo Nascimento
Hi, Is there any crawler that integrates with a Lucene index? Thanks Haroldo

Looking for crawler recommendations.

2007-02-01 Thread spamsucks
Has anyone integrated a crawler with Lucene that they had success with? I cannot use Nutch, since 60% of our searchable content is contained in a database. I need to do a hybrid between database indexing and website crawling. I would be just crawling one domain with a given set of

Write my own crawler VS use nutch?

2007-01-26 Thread spamsucks
pplication to Nutch b) Write a web crawler to crawl our site and inject the crawl results into our Lucene index. I am leaning towards option B (write our own crawler), since I think it would only take me a couple of days to write a simple crawler and I wouldn't have to change much else. C