Re: [ANNOUNCE] Web Crawler

2013-05-23 Thread Dominique Bejean
Hi, Release 3.0.3 was tested with: * Oracle Java 6 (it should work fine with version 7) * Tomcat 5.5, 6 and 7 * PHP 5.2.x and 5.3.x * Apache 2.2.x * MongoDB 64-bit 2.2 (known issue with 2.4) The new release 4.0.0-alpha-2 is available on GitHub - https://github.com/bejean/crawl-anywhere

Re: [ANNOUNCE] Web Crawler

2013-05-22 Thread Rajesh Nikam
Hi, Crawl-Anywhere seems to be using old versions of Java, Tomcat, etc. http://www.crawl-anywhere.com/installation-v300/ Will it work with newer versions of the required software? Is there an updated installation guide available? Thanks, Rajesh On Wed, May 22, 2013 at 6:48 PM, Dominique Bejean

Re: [ANNOUNCE] Web Crawler

2013-05-22 Thread Dominique Bejean
Hi, I did see this message (again). Please use the new dedicated Crawl-Anywhere forum for your next questions. https://groups.google.com/forum/#!forum/crawl-anywhere Did you solve your problem? Thank you, Dominique On 29/01/13 09:28, SivaKarthik wrote: Hi, i resolved the issue "Acc

Re: [ANNOUNCE] Web Crawler

2013-05-22 Thread Dominique Bejean
Hi, Crawl-Anywhere is now open source - https://github.com/bejean/crawl-anywhere Best regards. On 02/03/11 10:02, findbestopensource wrote: Hello Dominique Bejean, Good job. We identified almost 8 open source web crawlers http://www.findbestopensource.com/tagged/webcrawler I don't kno

Re: [ANNOUNCE] Web Crawler

2013-01-29 Thread SivaKarthik
Hi, I resolved the issue "Access denied for user 'crawler'@'localhost' (using password: YES)". The MySQL user crawler/crawler was created and its privileges were added as described in the tutorial. Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-Crawler-t

Re: [ANNOUNCE] Web Crawler

2013-01-29 Thread SivaKarthik
Klein, Thank you for your reply. I hosted the application on an Apache2 server and am able to access http://localhost/search/ but accessing http://localhost/crawler/login.php shows the error message "Access denied for user 'crawler'@'localhost' (using passwor

Re: [ANNOUNCE] Web Crawler

2013-01-27 Thread O. Klein
This is actually showing that it works. crawlerws is used by the Crawl Anywhere UI, which will pass it the correct arguments when needed. SivaKarthik wrote > Hii, > I'm trying to configure crawl-anywhere 3.0.3 version in my local system.. > i'm following the steps from the page > http://www.crawl-anywher

Re: [ANNOUNCE] Web Crawler

2013-01-27 Thread SivaKarthik
Hi, I'm trying to configure Crawl-Anywhere version 3.0.3 on my local system. I'm following the steps from the page http://www.crawl-anywhere.com/installation-v300/ but crawlerws is failing and showing the error message below in the browser: http://localhost:8080/crawlerws/ 1

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Nestor Oviedo
ginal Message- >> From: Dominique Bejean [mailto:dominique.bej...@eolya.fr] >> Sent: Wednesday, March 02, 2011 6:22 AM >> To: solr-user@lucene.apache.org >> Subject: Re: [ANNOUNCE] Web Crawler >> >> Aditya, >> >> The crawler is not open source an

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
NTLM2 and that is posing challenges with Nutch? -Original Message- From: Dominique Bejean [mailto:dominique.bej...@eolya.fr] Sent: Wednesday, March 02, 2011 6:22 AM To: solr-user@lucene.apache.org Subject: Re: [ANNOUNCE] Web Crawler Aditya, The crawler is not open source and won't

RE: [ANNOUNCE] Web Crawler

2011-03-02 Thread Thumuluri, Sai
@lucene.apache.org Subject: Re: [ANNOUNCE] Web Crawler Aditya, The crawler is not open source and won't be in the near future. Anyway, I have to change the license because it can be used for any personal or commercial projects. Sincerely, Dominique Le 02/03/11 10:02, findbestopensource a

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Paul Libbrecht
Viewing the indexing result, which is part of what you are describing I think, is a nice job for such an indexing framework. Do you guys know whether such a feature is already out there? paul On 2 March 2011 at 12:20, Geert-Jan Brits wrote: > Hi Dominique, > > This looks nice. > In the past

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
Hi, The crawler comes with an extensible document processing pipeline. If you know Java libraries or web services for 'wrapper induction' processing, it is possible to implement a dedicated stage in the pipeline. Dominique On 02/03/11 12:20, Geert-Jan Brits wrote: Hi Dominique, This look
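
To illustrate the idea of a pipeline with pluggable stages, here is a minimal sketch in Python. It is not Crawl-Anywhere's actual API (the crawler is written in Java and its stage interface is not shown in this thread); `Document`, `run_pipeline`, `normalize_whitespace` and `add_length_metadata` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical container passed from stage to stage."""
    url: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


# A stage is any callable that takes a document and returns a document.
Stage = Callable[[Document], Document]


def normalize_whitespace(doc: Document) -> Document:
    """Example stage: collapse runs of whitespace into single spaces."""
    doc.text = " ".join(doc.text.split())
    return doc


def add_length_metadata(doc: Document) -> Document:
    """Example stage: record the text length as a metadata field."""
    doc.metadata["length"] = str(len(doc.text))
    return doc


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Apply each stage in order; a new stage is added by extending the list."""
    for stage in stages:
        doc = stage(doc)
    return doc
```

Under this assumed design, a dedicated 'wrapper induction' stage would simply be one more callable appended to the list passed to `run_pipeline`.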

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
Aditya, The crawler is not open source and won't be in the near future. Anyway, I have to change the license because it can be used for any personal or commercial projects. Sincerely, Dominique On 02/03/11 10:02, findbestopensource wrote: Hello Dominique Bejean, Good job. We identified

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
Lukas, I am thinking about it but no decision yet. Anyway, in the next release, I will provide the source code of pipeline stages and connectors as samples. Dominique On 02/03/11 10:01, Lukáš Vlček wrote: Hi, is there any plan to open source it? Regards, Lukas [OT] I tried HuriSearch, input "

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Geert-Jan Brits
Hi Dominique, This looks nice. In the past, I've been interested in (semi-)automatically inducing a scheme/wrapper from a set of example webpages (often called 'wrapper induction' in the scientific field). This would allow for fast scheme creation, which could be used as a basis for extraction. L

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
Rosa, In the pipeline, there is a stage that extracts the text from the original document (PDF, HTML, ...). It is possible to plug in scripts (Java 6 compliant) in order to keep only the relevant parts of the document. See http://www.wiizio.com/confluence/display/CRAWLUSERS/DocTextExtractor+stage Do

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
David, The UI was not the only reason that made me choose to write a totally new crawler. After eliminating candidate crawlers for various reasons (inactive project, ...), Nutch and Heritrix were the 2 crawlers in my short list of possible candidates to be used. In my mind, the crawler and

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread findbestopensource
Hello Dominique Bejean, Good job. We identified almost 8 open source web crawlers http://www.findbestopensource.com/tagged/webcrawler I don't know how far yours would be different from the rest. Your license states that it is not open source but it is free for personal use. Regards Aditya ww

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Lukáš Vlček
Hi, is there any plan to open source it? Regards, Lukas [OT] I tried HuriSearch, input "Java" into the search field, and it returned a lot of references to ColdFusion error pages. Maybe a recrawl would help? On Wed, Mar 2, 2011 at 1:25 AM, Dominique Bejean wrote: > Hi, > > I would like to announce Cr

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Rosa (Anuncios)
Nice job! It would be good to be able to extract specific data from a given page via XPath though. Regards, On 02/03/2011 01:25, Dominique Bejean wrote: Hi, I would like to announce Crawl Anywhere. Crawl-Anywhere is a Java Web Crawler. It includes : * a crawler * a document pro
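
As an illustration of the kind of XPath-based extraction requested here, the sketch below uses Python's standard `xml.etree.ElementTree`. It is not a Crawl-Anywhere feature or API. The sample page, the `extract_text` helper and the class names are assumptions made for the example. `ElementTree` supports only a subset of XPath and requires well-formed markup, so real-world HTML would need a more lenient parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (real HTML is often not well-formed).
PAGE = """<html><body>
<div class="product"><h1>Widget</h1><span class="price">9.99</span></div>
<div class="ad">Buy now</div>
</body></html>"""


def extract_text(page: str, xpath: str) -> list:
    """Return the stripped text of every element matching the XPath expression."""
    root = ET.fromstring(page)
    return [(el.text or "").strip() for el in root.findall(xpath)]


# Keep only the parts of the page that matter, ignoring the advertisement block.
titles = extract_text(PAGE, ".//div[@class='product']/h1")
prices = extract_text(PAGE, ".//div[@class='product']/span[@class='price']")
```

With this sample page, `titles` holds the product heading and `prices` holds the price string, while the `div` with class `ad` is never matched.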

Re: [ANNOUNCE] Web Crawler

2011-03-01 Thread David Smiley (@MITRE.org)
Dominique, The obvious number one question is of course why you re-invented this wheel when there are several existing crawlers to choose from. Your website says the reason is that the UIs on existing crawlers (e.g. Nutch, Heritrix, ...) weren't sufficiently user-friendly or had the site-specific