from:"karl.wright"

RE: [ANNOUNCE] Web Crawler

2013-07-15 Thread karl.wright

Usually, if a webmaster finds that your crawler has ignored their robots.txt, they will block your machine, or maybe even your entire IP block, from accessing their site. Karl -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Monday, July 15, 2013 9:30 AM To

Jar packaging issue

2013-02-04 Thread karl.wright

Hello anyone, We recently ran into something people might not be fully aware of. Specifically, because codec jars require META-INF/services files in order to be discovered, and each codec has the same files, it's not a straightforward operation to glom all the Lucene jars of interest into one

RE: is it possible to index wiki markup files?

2012-01-11 Thread karl.wright

You might be interested in looking at ManifoldCF for getting your documents into Solr. See http://incubator.apache.org/connectors for more details. Karl -Original Message- From: ext Reyna Melara [mailto:reynamel...@gmail.com] Sent: Wednesday, January 11, 2012 2:13 PM To: java-user@luc

RE: what's the status of droids project(http://incubator.apache.org/droids/)?

2011-08-23 Thread karl.wright

It's also worth looking at ManifoldCF. Karl -Original Message- From: ext Markus Jelsma Sent: 23/08/2011, 6:24 AM To: solr-u...@lucene.apache.org Cc: java-user@lucene.apache.org Subject: Re: what's the status of droids project(http://incubator.apache.org/droids/)? You should ask on the

RE: [Help Wanted] Graphics and other help for new Lucene/Solr website

2011-08-10 Thread karl.wright

The site looks great. And thank you for including the ManifoldCF link. ;-) Karl -Original Message- From: ext Grant Ingersoll [mailto:gsing...@apache.org] Sent: Wednesday, August 10, 2011 10:09 AM To: solr-u...@lucene.apache.org; java-user@lucene.apache.org Subject: [Help Wanted] Graphic

RE: need help

2011-06-21 Thread karl.wright

You might want to look at ManifoldCF too. http://incubator.apache.org/connectors/ Karl -Original Message- From: ext Marlen [mailto:zmach...@facinf.uho.edu.cu] Sent: Tuesday, June 21, 2011 9:49 AM To: java-user@lucene.apache.org Subject: need help I need to create a search engine that s

RE: [ANNOUNCE] Web Crawler

2011-05-15 Thread karl.wright

You might want to look at ManifoldCF also. Karl -Original Message- From: ext abhayd [mailto:ajdabhol...@hotmail.com] Sent: Saturday, May 14, 2011 9:29 AM To: java-user@lucene.apache.org Subject: Re: [ANNOUNCE] Web Crawler hi Dominique, I am looking for a crawler to feed solr index. Aft

RE: how to get all documents in the results ?

2011-03-22 Thread karl.wright

Not sure what your use case actually is, but it sounds like you may be unclear how Lucene works. Each query clause you have will produce an iterator that walks over the documents that match that clause. All the documents from the entire, root query get scored. The scoring evaluation per docum

RE: ManifoldCF in Action

2011-03-10 Thread karl.wright

Ah, I was not thinking of a Solr addon! I thought you were referring to some other crawler that I'd never heard of. So the answer to your question is that ManifoldCF differs from DIH in at least the following ways: - ManifoldCF can handle a wide range of repositories, not just database tables

Re: ManifoldCF in Action

2011-03-10 Thread karl.wright

>> Karl, can you give, in one paragraph, the difference between ManifoldCF and DIH? thanks in advance paul << I am unfamiliar with DIH as an acronym in either the content management or crawling infrastructure space. Can you clarify what you mean? Karl

ManifoldCF in Action

2011-03-01 Thread karl.wright

Dear Lucene/Solr user, It is possible you may not know of an Apache project called ManifoldCF, whose purpose is to provide content to Solr for indexing. If you have interest in this project, this is to inform you that the ManifoldCF book from Manning Publications, titled ManifoldCF in Action, is no

11 matches
