MCF agent goes down with Tomcat

2015-11-10 Thread Shigeki Kobayashi
Hi everyone. I use Tomcat 7 and MCF 2.2. I have noticed that when Tomcat goes down, the MCF agent also goes down. I did not see the agent go down with Tomcat in older versions of MCF. Has this behavior changed?

How the web crawler crawls content after the first crawl

2015-02-05 Thread Shigeki Kobayashi
Hi Karl, I have a basic question about how the web crawler crawls content after the first crawl. Does it crawl and index all pages from the root every time, or does it crawl only pages that have been modified? If it crawls only modified pages, how does it figure out which pages are modified? By checking
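The archive cuts the question off here. As a general illustration only (not a statement of how the MCF web connector is actually implemented), crawlers commonly detect changed pages with HTTP conditional requests, resending the validators obtained on the previous fetch; host, path, and header values below are placeholders:

    GET /page.php HTTP/1.1
    Host: example.com
    If-Modified-Since: Thu, 05 Feb 2015 00:00:00 GMT
    If-None-Match: "abc123"

A 304 Not Modified response means the page is unchanged and need not be re-indexed; a 200 response carries the new content. Last-modified dates and checksums of the fetched content are other common change signals.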

Start minimal option even deletes content whose links are deleted

2014-12-23 Thread Shigeki Kobayashi
Hello guys. I would like to clarify how the “Start minimal” job execution option works in MCF, using web content as a repository connection and Solr as an output connection. I thought it was supposed to skip deletion and crawl only changed or new documents. However, someone on my team tested

Re: Google native documents are not crawled

2014-08-20 Thread Shigeki Kobayashi
Hi Karl, We checked out the latest trunk and ran it. Yet, MCF would not crawl Google native documents such as Google Spreadsheet. The simple history still shows the result code No Length. 2014-08-15 20:27 GMT+09:00 Karl Wright daddy...@gmail.com: Hi Shigeki, We've decided to fix this

Re: Google native documents are not crawled

2014-08-20 Thread Shigeki Kobayashi
Hi Karl, It seems that documents shared by other users also do not get crawled. You can see that Google does not report sizes for those documents either. I suspect the document-size check is interfering with Google Docs crawling, so a lot of documents are never actually picked up by MCF.

Re: Google native documents are not crawled

2014-08-13 Thread Shigeki Kobayashi
Hi Karl, Running the latest trunk, the following error occurred while crawling Google Drive: WARN 2014-08-13 19:12:08,227 (Worker thread '2') - Attempt to set file lock

Basic question about the web crawler's Include in index setting

2014-06-27 Thread Shigeki Kobayashi
Hello guys, I am having trouble setting up web crawling jobs. I want to index only PHP sites, so I set the Include in index option to .*\.php.* I chose that pattern because PHP site URLs can take parameters like .php?a=b, but MCF indexes only URLs that end with .php. I need to index URLs with
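Just as a sketch (how MCF anchors Include in index expressions and whether URL arguments are still present at that point should be confirmed against the documentation), a pattern that explicitly allows an optional query string would be:

    .*\.php(\?.*)?$

This matches both http://host/page.php and http://host/page.php?a=b; the host and path here are placeholders.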

[Windows Shares Connector] A job hangs while processing a directory

2013-07-18 Thread Shigeki Kobayashi
Hi guys. I use MCF 1.1 running on MySQL 5.5, indexing files on a Windows server into Solr 4.3.1. After running MCF for a while, a job got stuck with 1 active process remaining. The job never finished. I checked Document Status to see which documents were currently in progress, and there was one

“Keep unreachable documents, forever” deletes index

2013-05-21 Thread Shigeki Kobayashi
Hello, guys. I have a question about web crawling with the hop count mode setting. In hop count mode, you can choose “Keep unreachable documents, forever”. With that setting, the first crawl was fine. But when the web service being crawled is down, the second crawl deletes all

Re: Crawling new/updated files using Windows share connection takes too long

2013-01-20 Thread Shigeki Kobayashi
log? Karl On Fri, Jan 18, 2013 at 5:27 AM, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: Hello, I would like some advice on improving the crawling time of new/updated files using a Windows share connection. I crawl files on a Windows server and index them into Solr

Timeout values to be configurable

2012-12-25 Thread Shigeki Kobayashi
Hi. In using MCF so far, I have faced timeout errors many times while crawling and indexing files to Solr. I would like to propose making the following timeout values configurable in properties.xml. Timeout errors often occur depending on the files and environment (machines), so it would be nice
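As a sketch of what the proposal could look like in properties.xml (the <property> element syntax is MCF's, but these timeout property names are hypothetical, describing the request rather than parameters that exist today; values are in milliseconds):

    <!-- hypothetical property names, illustration only -->
    <property name="org.apache.manifoldcf.crawler.connectiontimeout" value="60000"/>
    <property name="org.apache.manifoldcf.crawler.sockettimeout" value="300000"/>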

File crawl using Windows share connection exited with an unexpected jobqueue status error under MySQL

2012-12-20 Thread Shigeki Kobayashi
Hi, I run the MCF 1.1dev trunk downloaded on Dec. 22nd and crawl files using a Windows share connection under MySQL 5.5.28 for Linux (x86_64). The following error occurred and then the job exited: --- 2012/12/21 10:09:37 ERROR (Worker thread '78') - Exception tossed:

Re: Too many slow queries caused by MCF running MySQL 5.5

2012-12-11 Thread Shigeki Kobayashi
AM, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: Hi Karl, Thanks for the reply. I ran EXPLAIN as follows: mysql> EXPLAIN SELECT -> t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset -> FROM jobqueue t0 WHERE t0

Re: Too many slow queries caused by MCF running MySQL 5.5

2012-12-11 Thread Shigeki Kobayashi
Sorry, my bad. jcifs.jar was missing. That is probably the cause. Sorry, Shigeki. 2012/12/11 Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp Hi Karl. I could build the source OK, but the following code is missing from connectors.xml. Does this mean I built it incorrectly

Re: Too many slow queries caused by MCF running MySQL 5.5

2012-12-11 Thread Shigeki Kobayashi
rows in set (0.00 sec) Full table scanning was still happening. Regards, Shigeki 2012/12/11 Karl Wright daddy...@gmail.com You just need to run ant make-deps too before building. Karl Sent from my Windows Phone -- From: Shigeki Kobayashi Sent: 12/11/2012 3
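For readers following the thread: in MySQL's EXPLAIN output, a full table scan shows up as type = ALL with key = NULL on the table in question. The layout below is only an illustration (the row count is made up), not the actual output from this thread:

    +----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
    | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
    +----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
    |  1 | SIMPLE      | t0    | ALL  | NULL          | NULL | NULL    | NULL | 1000000 | Using where |
    +----+-------------+-------+------+---------------+------+---------+------+---------+-------------+

When an index covering the WHERE columns is used, type changes to ref or range and key names the chosen index.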

How to crawl from the point where the job is stopped by errors

2012-12-11 Thread Shigeki Kobayashi
Hi. Errors sometimes occur that stop jobs crawling files over a Windows share connection. In this case, when I start the stopped job again by clicking 'Start', I assume that MCF crawls from the beginning again. If that's right, is there any way to have MCF crawl from

Re: Too many slow queries caused by MCF running MySQL 5.5

2012-12-10 Thread Shigeki Kobayashi
for this CONNECTORS-584. Karl On Mon, Dec 10, 2012 at 2:13 AM, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: Hi. I downloaded MCF 1.1dev on Nov. 29th and ran it using MySQL. I tried to crawl 10 million files using a Windows share connection and index them into Solr. As MCF

Too many slow queries caused by MCF running MySQL 5.5

2012-12-09 Thread Shigeki Kobayashi
Hi. I downloaded MCF 1.1dev on Nov. 29th and ran it using MySQL. I tried to crawl 10 million files using a Windows share connection and index them into Solr. As MCF passed 1 million files, crawling speed started to slow down. So I checked slow queries and found that too many slow
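For anyone reproducing this kind of diagnosis, MySQL's slow query log is the usual way to capture the offending statements; the file path and threshold below are just example values for my.cnf:

    [mysqld]
    slow_query_log      = 1
    slow_query_log_file = /var/log/mysql/mysql-slow.log
    long_query_time     = 1    # seconds; log statements slower than this

After restarting mysqld (or setting the corresponding global variables at runtime), the log records each slow statement together with its execution time and rows examined.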

Re: Web crawl exited with an unexpected jobqueue status error under MySQL

2012-12-06 Thread Shigeki Kobayashi
to MCF 1.01 and see if this still happens for you? Karl On Wed, Dec 5, 2012 at 9:46 PM, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: Hello Karl. MySQL: 5.5.24 Tomcat: 6.0.35 CentOS: 6.3 Regards, Shigeki 2012/12/5 Karl Wright daddy...@gmail.com Yes, I

Web crawl exited with an unexpected jobqueue status error under MySQL

2012-12-05 Thread Shigeki Kobayashi
Hi. I ran MCF 0.6 under MySQL 5.5. I crawled the web and the following error occurred, then MCF stopped the job: 2012/12/04 18:50:07 ERROR (Worker thread '0') - Exception tossed: Unexpected jobqueue status - record id 1354608871138, expecting active status, saw 3

Re: Web crawl exited with an unexpected jobqueue status error under MySQL

2012-12-05 Thread Shigeki Kobayashi
a MySQL ticket. Karl On Wed, Dec 5, 2012 at 6:57 AM, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: Hi. I ran MCF 0.6 under MySQL 5.5. I crawled the web and the following error occurred, then MCF stopped the job: 2012/12/04 18:50:07 ERROR (Worker

Re: Running multiple MCFs on one Tomcat

2012-12-04 Thread Shigeki Kobayashi
the properties.xml file is located is with a -D switch, I don't think you can run multiple instances properly in one JVM. If this is important to you, please let us know, and also please describe what you are trying to do this for. Thanks, Karl On Thu, Nov 29, 2012 at 8:05 PM, Shigeki Kobayashi
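The -D switch Karl mentions is the JVM system property that tells ManifoldCF where its configuration file is; as far as I know the property is org.apache.manifoldcf.configfile, and under Tomcat it is usually passed through CATALINA_OPTS or JAVA_OPTS, for example (the path is a placeholder):

    CATALINA_OPTS="$CATALINA_OPTS -Dorg.apache.manifoldcf.configfile=/path/to/properties.xml"

Because it is a single JVM-wide property, two MCF webapps deployed in the same Tomcat JVM would end up reading the same properties.xml, which is essentially the limitation Karl describes.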

Running multiple MCFs on one Tomcat

2012-11-29 Thread Shigeki Kobayashi
Hi everyone, just wondering if anyone has tried running multiple MCFs on one Tomcat (not multiple jobs in one MCF). If that's possible, I would like to test crawling performance using multiple MCFs. Regards, Shigeki

Re: Process behavior of executing multiple jobs

2012-11-19 Thread Shigeki Kobayashi
/issues in this area. Thanks, Karl On Sun, Nov 18, 2012 at 10:55 PM, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: Hi. I have a question of process behavior of executing multiple jobs. I run MCF1.0 on Tomcat, crawl files on Windows file servers, and index them

Process behavior of executing multiple jobs

2012-11-18 Thread Shigeki Kobayashi
Hi. I have a question about the process behavior when executing multiple jobs. I run MCF 1.0 on Tomcat, crawl files on Windows file servers, and index them into Solr 3.6. When I set up multiple jobs and execute them at the same time, I notice the number of documents processed by each job seems to be

Re: Changing logging level affects crawling results

2012-11-13 Thread Shigeki Kobayashi
it gets aborted as a result of another transfer being aborted. The CIFS protocol is vulnerable to this. Solution: reduce the Max Connections parameter in ManifoldCF for that connection to something between 2 and 5. Karl On Tue, Nov 13, 2012 at 3:51 AM, Shigeki Kobayashi shigeki.kobayas

Web crawling causes Socket Timeout after Database Exception

2012-10-10 Thread Shigeki Kobayashi
Hi, I am having trouble crawling the web using MCF 1.0. I run MCF with MySQL 5.5 and Tomcat 6.0. It should keep crawling content, but MCF prints the following database exception log, then hangs. After the DB exception, a socket timeout exception occurs. Has anyone faced this problem? --Database

Re: Rules of excluding specific files in Windows file server are not recognized

2012-09-11 Thread Shigeki Kobayashi
...@yahoo.com wrote: Hi Shigeki Can you try entering *text.txt in the text box? Ahmet --- On Tue, 9/11/12, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: From: Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp Subject: Rules of excluding specific files in Windows