Hi everyone.
I use Tomcat 7 and MCF 2.2.
I noticed that when Tomcat goes down, the MCF agent goes down with it.
I did not see the agent go down with Tomcat in older versions of MCF.
Has this behavior changed?
Hi Karl
I have a basic question about how the web crawler crawls content after
the first crawl.
Does it crawl and index all pages from the root every time, or does it
crawl only pages that have been modified?
If it crawls only modified pages, how does it figure out which pages have
changed?
By checking
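For context, incremental crawlers typically answer this by storing a per-document version string from the previous run (a Last-Modified header, an ETag, or a content checksum) and re-indexing only documents whose version differs. A minimal sketch of the checksum variant, with hypothetical names, and not MCF's actual implementation:

```python
import hashlib

# Version strings from the previous crawl, keyed by URL
# (hypothetical in-memory store; a real crawler persists this).
previous_versions = {}

def version_of(content: bytes) -> str:
    # Use a content checksum as the document's "version string".
    return hashlib.sha256(content).hexdigest()

def needs_reindex(url: str, content: bytes) -> bool:
    """True if the page is new or changed since the last crawl."""
    current = version_of(content)
    if previous_versions.get(url) == current:
        return False  # unchanged: skip re-indexing this page
    previous_versions[url] = current  # record the new version
    return True
```

With this scheme, an unchanged page is fetched but skipped at the indexing step on every crawl after the first.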
Hello guys.
I would like to clarify how the "minimal start" of job execution works in
MCF, using web content as a repository connection and Solr as an output
connection.
I thought it was supposed to skip deletions and crawl only changed or new
documents. However, a colleague on my team tested
Hi Karl,
We checked out the latest trunk and ran it.
Even so, MCF would not crawl Google native documents such as Google
Spreadsheets.
The simple history still shows the result code "No Length".
2014-08-15 20:27 GMT+09:00 Karl Wright daddy...@gmail.com:
Hi Shigeki,
We've decided to fix this
Hi Karl,
It seems that documents shared by other users also do not get crawled.
You can see that Google does not report sizes for those documents either.
I suspect the document-size check is interfering with Google Docs
crawling, so a lot of documents never make it into MCF.
Hi Karl
As a result of running the latest trunk, the following error occurred
while crawling Google Drive:
WARN 2014-08-13 19:12:08,227 (Worker thread '2') - Attempt to set file lock
Hello guys
I am having trouble setting up web crawling jobs.
I want to index only PHP sites, so I set the "Include in index" option to
.*\.php.*
I set that option because PHP URLs can take parameters like .php?a=b,
but MCF indexes only URLs that end with .php.
I need to index URLs with
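As a sanity check, the pattern itself does match parameterized URLs when evaluated as an ordinary regex. The quick standalone test below (plain Python, not MCF's matching code) shows `.*\.php.*` accepting both forms, which suggests looking at how the job applies its include and exclude rules rather than at the pattern:

```python
import re

# The "Include in index" pattern from the job configuration.
pattern = re.compile(r".*\.php.*")

urls = [
    "http://example.com/page.php",      # plain .php URL
    "http://example.com/page.php?a=b",  # .php URL with parameters
    "http://example.com/page.html",     # non-PHP URL
]

# fullmatch requires the entire URL to satisfy the pattern,
# similar to an anchored include rule.
results = [bool(pattern.fullmatch(u)) for u in urls]
```

The trailing `.*` is what lets the `?a=b` query string match.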
Hi guys.
I use MCF 1.1 running on MySQL 5.5, indexing files on a Windows server
into Solr 4.3.1.
After running MCF for a while, a job got stuck with one active process
remaining. The job never finished.
I checked Document Status to see which documents were currently in
progress, and there was one
Hello, guys.
I have a question about web crawling with hop count mode.
In hop count mode, you can choose "Keep unreachable documents forever".
With that setting, the first crawl was fine, but when the web service
being crawled is down, the second crawl deletes all
log?
Karl
On Fri, Jan 18, 2013 at 5:27 AM, Shigeki Kobayashi
shigeki.kobayas...@g.softbank.co.jp wrote:
Hello
I would like some advice on improving the crawl time of new/updated files
using a Windows share connection.
I crawl files on a Windows server and index them into Solr
Hi.
In my use of MCF so far, I have hit timeout errors many times while
crawling and indexing files into Solr.
I would like to propose making the following timeout values configurable
in properties.xml.
Timeout errors often occur depending on the files and environments
(machines), so it would be nice
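For illustration, such a proposal might look like the fragment below in properties.xml; the property names are hypothetical proposed names, not existing MCF parameters:

```xml
<!-- Hypothetical, proposed properties; names are illustrative only. -->
<property name="org.apache.manifoldcf.connectiontimeout" value="60000"/>
<property name="org.apache.manifoldcf.sockettimeout" value="300000"/>
```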
Hi
I run the MCF 1.1-dev trunk downloaded on Dec. 22nd and crawl files using
a Windows share connection
under MySQL 5.5.28 for Linux (x86_64).
The following error occurred and then the job exited:
---
2012/12/21 10:09:37 ERROR (Worker thread '78') - Exception tossed:
AM, Shigeki Kobayashi
shigeki.kobayas...@g.softbank.co.jp wrote:
Hi Karl,
Thanks for the reply.
I ran EXPLAIN as follows:

mysql> EXPLAIN SELECT
    ->   t0.id,
    ->   t0.jobid, t0.dochash, t0.docid, t0.status, t0.failtime,
    ->   t0.failcount, t0.priorityset
    -> FROM jobqueue t0 WHERE t0
Sorry, my bad.
jcifs.jar was missing; that was probably the cause.
Sorry
Shigeki
2012/12/11 Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp
Hi Karl.
I could build the source OK, but the following entry is missing
from connectors.xml. Does this mean I built it incorrectly?
rows in set (0.00 sec)
A full table scan was still happening.
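A full scan on a query like that usually means no composite index covers the WHERE clause. One possible remedy, sketched with a hypothetical index (the right column set and order depend on the full query MCF issues):

```sql
-- Hypothetical index; the correct column order depends on the
-- full WHERE/ORDER BY clause of the slow query.
CREATE INDEX jobqueue_status_idx ON jobqueue (jobid, status, dochash);
-- Re-run EXPLAIN afterwards: "type" should change from ALL to ref/range.
```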
Regards,
Shigeki
2012/12/11 Karl Wright daddy...@gmail.com
You just need to run ant make-deps too before building.
Karl
Sent from my Windows Phone
--
From: Shigeki Kobayashi
Sent: 12/11/2012 3
Hi
Sometimes errors occur that stop jobs crawling files over a Windows share
connection.
In that case, when the stopped job is started again by clicking 'Start',
I suppose MCF crawls from the beginning again.
If that's right, is there any way to have MCF crawl from
for this
CONNECTORS-584.
Karl
On Mon, Dec 10, 2012 at 2:13 AM, Shigeki Kobayashi
shigeki.kobayas...@g.softbank.co.jp wrote:
Hi.
I downloaded MCF 1.1-dev on Nov. 29th and ran it using MySQL.
I tried to crawl 10 million files using a Windows share connection and
index them into Solr.
As MCF
Hi.
I downloaded MCF 1.1-dev on Nov. 29th and ran it using MySQL.
I tried to crawl 10 million files using a Windows share connection and
index them into Solr.
Once MCF passed 1 million files, the crawling speed started slowing down.
So I checked for slow queries and found that too many slow
to MCF 1.01 and see if this still happens for you?
Karl
On Wed, Dec 5, 2012 at 9:46 PM, Shigeki Kobayashi
shigeki.kobayas...@g.softbank.co.jp wrote:
Hello Karl.
MySQL: 5.5.24
Tomcat: 6.0.35
CentOS: 6.3
Regards,
Shigeki
2012/12/5 Karl Wright daddy...@gmail.com
Yes, I
Hi.
I ran MCF 0.6 under MySQL 5.5. While crawling the web, the following
error occurred and MCF stopped the job:
2012/12/04 18:50:07 ERROR (Worker thread '0') - Exception tossed:
Unexpected jobqueue status - record id 1354608871138, expecting active
status, saw 3
a MySQL ticket.
Karl
On Wed, Dec 5, 2012 at 6:57 AM, Shigeki Kobayashi
shigeki.kobayas...@g.softbank.co.jp wrote:
Hi.
I ran MCF 0.6 under MySQL 5.5. While crawling the web, the following
error occurred and MCF stopped the job:
2012/12/04 18:50:07 ERROR (Worker
the properties.xml file is located is with a -D
switch, I don't think you can run multiple instances properly in one
JVM.
If this is important to you, please let us know, and also please
describe what you are trying to do this for.
Thanks,
Karl
On Thu, Nov 29, 2012 at 8:05 PM, Shigeki Kobayashi
Hi everyone,
Just wondering whether anyone has tried running multiple MCF instances on
one Tomcat (not multiple jobs in one MCF).
If that is possible, I would like to test crawling performance using
multiple MCF instances.
Regards,
Shigeki
/issues in this area.
Thanks,
Karl
On Sun, Nov 18, 2012 at 10:55 PM, Shigeki Kobayashi
shigeki.kobayas...@g.softbank.co.jp wrote:
Hi.
I have a question about process behavior when executing multiple jobs.
I run MCF 1.0 on Tomcat, crawl files on Windows file servers, and index
them
Hi.
I have a question about process behavior when executing multiple jobs.
I run MCF 1.0 on Tomcat, crawl files on Windows file servers, and index
them into Solr 3.6.
When I set up multiple jobs and run them at the same time, I notice the
number of documents processed by each job seems to be
it gets aborted as a
result of another transfer being aborted. The CIFS protocol is
vulnerable to this. Solution: reduce the Max Connections parameter in
ManifoldCF for that connection to something between 2 and 5.
Karl
On Tue, Nov 13, 2012 at 3:51 AM, Shigeki Kobayashi
shigeki.kobayas
Hi
I am having trouble crawling the web using MCF 1.0.
I run MCF with MySQL 5.5 and Tomcat 6.0.
It should keep crawling content, but MCF prints the following database
exception log and then hangs.
After the DB exception, a socket timeout exception occurs.
Has anyone faced this problem?
--Database
...@yahoo.com wrote:
Hi Shigeki
Can you try entering *text.txt in the text box?
Ahmet
--- On Tue, 9/11/12, Shigeki Kobayashi
shigeki.kobayas...@g.softbank.co.jp wrote:
From: Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp
Subject: Rules of excluding specific files in Windows