Usually, if a webmaster finds that your crawler has ignored their robots.txt,
they will block your machine, or maybe even your entire IP block, from accessing
their site.
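For anyone writing their own crawler, here is a minimal sketch of checking a URL against robots.txt rules before fetching it. It uses Python's standard urllib.robotparser; the robots.txt content, the "example-crawler" agent name, and the example.com URLs are made up for illustration, and a real crawler would download the file from the target site first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would fetch this
# from the root of the site it is about to crawl.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(EXAMPLE_ROBOTS)

# "example-crawler" is an assumed user-agent name.
allowed = parser.can_fetch("example-crawler", "http://example.com/index.html")
blocked = parser.can_fetch("example-crawler", "http://example.com/private/data.html")
print(allowed, blocked)
```

The crawler should skip any URL for which can_fetch returns False.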
Karl
-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Monday, July 15, 2013 9:30 AM
To
Hello anyone,
We recently ran into something people might not be fully aware of.
Specifically, because codec jars require META-INF/services files in order to be
discovered, and each codec has the same files, it's not a straightforward
operation to glom all the Lucene jars of interest into one
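
To illustrate the collision: service files under META-INF/services list one provider class per line, so when several jars carry a file of the same name, a combined jar needs the union of those lines rather than whichever copy happens to be written last. Below is a rough sketch of that merge using Python's zipfile module. The service file name "example.Codec" and the provider class names are hypothetical, not the real Lucene ones, and the jars are built in memory just for the demonstration.

```python
import io
import zipfile

SERVICES_PREFIX = "META-INF/services/"


def merge_service_entries(jars):
    """Union the provider lines of same-named service files across jars."""
    merged = {}
    for jar in jars:
        with zipfile.ZipFile(jar) as zf:
            for name in zf.namelist():
                # Only look at files directly describing a service.
                if not name.startswith(SERVICES_PREFIX) or name.endswith("/"):
                    continue
                providers = merged.setdefault(name, [])
                for line in zf.read(name).decode("utf-8").splitlines():
                    # Drop comments and blank lines, keep first-seen order.
                    line = line.split("#", 1)[0].strip()
                    if line and line not in providers:
                        providers.append(line)
    return merged


def make_jar(service_name, providers):
    """Build a tiny in-memory jar holding one service file (demo only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(SERVICES_PREFIX + service_name, "\n".join(providers) + "\n")
    buf.seek(0)
    return buf


# Two hypothetical jars that both ship a file named example.Codec.
core_jar = make_jar("example.Codec", ["example.core.DefaultCodec"])
extra_jar = make_jar("example.Codec", ["# extra codecs", "example.extra.ExtraCodec"])

merged = merge_service_entries([core_jar, extra_jar])
for name, providers in merged.items():
    print(name, providers)
```

Writing each merged list back out as a single file in the combined jar keeps every provider discoverable.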
You might be interested in looking at ManifoldCF for getting your documents
into Solr. See http://incubator.apache.org/connectors for more details.
Karl
-Original Message-
From: ext Reyna Melara [mailto:reynamel...@gmail.com]
Sent: Wednesday, January 11, 2012 2:13 PM
To: java-user@luc
It's also worth looking at ManifoldCF.
Karl
-Original Message-
From: ext Markus Jelsma
Sent: 23/08/2011, 6:24 AM
To: solr-u...@lucene.apache.org
Cc: java-user@lucene.apache.org
Subject: Re: what's the status of droids
project(http://incubator.apache.org/droids/)?
You should ask on the
The site looks great. And thank you for including the ManifoldCF link. ;-)
Karl
-Original Message-
From: ext Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Wednesday, August 10, 2011 10:09 AM
To: solr-u...@lucene.apache.org; java-user@lucene.apache.org
Subject: [Help Wanted] Graphic
You might want to look at ManifoldCF too.
http://incubator.apache.org/connectors/
Karl
-Original Message-
From: ext Marlen [mailto:zmach...@facinf.uho.edu.cu]
Sent: Tuesday, June 21, 2011 9:49 AM
To: java-user@lucene.apache.org
Subject: need help
I need to create a search engine that s
You might want to look at ManifoldCF also.
Karl
-Original Message-
From: ext abhayd [mailto:ajdabhol...@hotmail.com]
Sent: Saturday, May 14, 2011 9:29 AM
To: java-user@lucene.apache.org
Subject: Re: [ANNOUNCE] Web Crawler
hi Dominique,
I am looking for a crawler to feed solr index. Aft
Not sure what your use case actually is, but it sounds like you may be unclear
on how Lucene works.
Each query clause you have will produce an iterator that walks over the
documents that match that clause. All the documents from the entire root
query get scored. The scoring evaluation per docum
Ah, I was not thinking of a Solr addon! I thought you were referring to some
other crawler that I'd never heard of.
So the answer to your question is that ManifoldCF differs from DIH in at least
the following ways:
- ManifoldCF can handle a wide range of repositories, not just database tables
>>
Karl,
can you give, in one paragraph, the difference between ManifoldCF and DIH?
thanks in advance
paul
<<
I am unfamiliar with DIH as an acronym in either the content management or
crawling infrastructure space. Can you clarify what you mean?
Karl
Dear Lucene/Solr user,
It is possible you may not know of an Apache project called ManifoldCF, whose
purpose is to provide content to Solr for indexing. If you are interested in this
project, this is to inform you that the ManifoldCF book from Manning
Publishing, titled ManifoldCF in Action, is no