Re: Exporting crawler configuration easier?

2012-06-27 Thread Karl Wright
The fact that the export is a zip is not supposed to be used to actually edit the stored information. It sounds like the reason that you want to edit it is to remove the passwords from the file. Perhaps we should look at it from that point of view and allow an export option that does not include

Re: Crawling behind an ISA proxy (iis 7.5)

2012-06-28 Thread Karl Wright
I was wondering if you'd picked up and tried the patch for CONNECTORS-483. This patch adds official proxy support for the Web Connector. Alternatively, you could try to build and run with trunk code. Karl On Wed, May 16, 2012 at 12:12 PM, Karl Wright daddy...@gmail.com wrote: Hi Rene

RE: How to increase cache settings for ManifoldCF Authority Service

2012-07-04 Thread Karl Wright
It would be great if you could open a ticket to request that this cache value be configurable like it is in the active directory authority. Karl Sent from my Windows Phone -- From: Anupam Bhattacharya Sent: 7/3/2012 10:13 AM To: user@manifoldcf.apache.org Subject: Re:

[ANNOUNCE] ManifoldCF 0.6 is released!

2012-07-16 Thread Karl Wright
I'd like to announce the release of ManifoldCF 0.6. The list of changes can be found at https://svn.apache.org/repos/asf/manifoldcf/branches/release-0.6-branch/CHANGES.txt. Congratulations to all involved! Karl

Re: How to import data from Oracle to Solr

2012-07-17 Thread Karl Wright
Hi Wolfgang, ManifoldCF is meant to handle a binary document and its metadata. You must provide the document. Metadata is optional. The JDBC connector does not currently support metadata. In order to index this, therefore, you will need to decide what should go into your binary document from

Re: How to import data from Oracle to Solr

2012-07-18 Thread Karl Wright
-Original Message- From: Karl Wright [mailto:daddy...@gmail.com] Sent: Tue 17.07.2012 15:13 To: user@manifoldcf.apache.org Subject: Re: How to import data from Oracle to Solr So if I understand correctly ... 1) ... all mappings added to the Solr Field Mapping tab are ignored

Re: Repeated service interruptions

2012-07-19 Thread Karl Wright
to reduce the maximum number of connections from 10 to 5, but that didn't avoid the busy error. I'll try reducing it further. Thank you. Shinichiro Abe On 2012/07/19, at 15:55, Karl Wright wrote: Hi Abe-san, The "all pipe instances are busy" error is coming from the Windows server you are trying to crawl. I don't

RE: How to import data from Oracle to Solr

2012-07-21 Thread Karl Wright
Hi Wolfgang, Looking at the code, it turns out I was wrong about metadata support being there in the connector. Sorry for the confusion. The way it works is that, in the data query, any non-required return column is considered to be metadata with a field name corresponding to the return column

Re: Repeated service interruptions

2012-07-24 Thread Karl Wright
Hi Abe-san, Did you figure out what the problem was? Karl On Thu, Jul 19, 2012 at 5:52 AM, Karl Wright daddy...@gmail.com wrote: Hi Abe-san, Sometimes what looks like a server error can actually be due to the domain controller. I wonder if the domain controller needs to be rebooted

Re: crawled counts on WEB crawling differ between MCF0.4 and MCF0.5

2012-07-29 Thread Karl Wright
is a result of first crawling after deleting indexing history from DB. It seems that changing DB affects crawling and indexing. Regards, Shigeki 2012/7/27 Karl Wright daddy...@gmail.com There was a bug fixed in the way hopcount was being computed. See CONNECTORS-464. This means

Re: Repeated service interruptions

2012-08-01 Thread Karl Wright
, worked well. Thank you, Shinichiro Abe On 2012/07/24, at 22:13, Karl Wright wrote: Hi Abe-san, Did you figure out what the problem was? Karl On Thu, Jul 19, 2012 at 5:52 AM, Karl Wright daddy...@gmail.com wrote: Hi Abe-san, Sometimes what looks like a server error can actually

Re: SharePoint Library consist of folders

2012-08-02 Thread Karl Wright
In that case, you will need to wait until CONNECTORS-492 is resolved. Because of SharePoint's lack of support for accessing large libraries via the Lists service, we're having to write our own. But this is not yet ready, although we are getting closer to trying it out soon. Karl On Thu, Aug 2,

Re: SharePoint Library consist of folders

2012-08-03 Thread Karl Wright
I checked this change into trunk, and added also corresponding code in the place where fields and metadata are fetched. This may work for you in the interim while we're finishing up CONNECTORS-492. Karl On Fri, Aug 3, 2012 at 8:11 AM, Ahmet Arslan iori...@yahoo.com wrote: Hello, I found that

Re: Document Security Modification Requirement during Indexing

2012-08-13 Thread Karl Wright
Well, you can either modify the document's acls in the Tika pipeline (which I think would be easiest), or you can hack up the Apache ManifoldCF Solr Plugin. Those seem like your only real choices to me. I would choose the former since Tika is meant to be configured in this way. Karl On Tue,

Re: Crawling MySQL with latest MySQL connector fails

2012-08-20 Thread Karl Wright
the column name. See CONNECTORS-509. Karl On Mon, Aug 20, 2012 at 8:00 AM, Karl Wright daddy...@gmail.com wrote: Here's some additional info. The JDBC class ResultSetMetaData has two methods: getColumnName(), and getColumnLabel(). For all supported databases, getColumnName() returns the right
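The distinction between the two ResultSetMetaData methods mentioned above can be sketched in Java. Per the JDBC documentation, getColumnLabel() returns the suggested title (usually the SQL AS alias) and falls back to the column name when there is no alias, while getColumnName() returns the underlying column name; drivers have differed on this in practice. The helper below is only an illustration and is not the change made under CONNECTORS-509. The class ColumnKeys, the method keyFor, and the proxy-based test double fake are hypothetical names introduced here.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

final class ColumnKeys {
    /** Prefer the label (the AS alias); fall back to the underlying column name. */
    static String keyFor(ResultSetMetaData md, int column) throws SQLException {
        String label = md.getColumnLabel(column);
        if (label != null && !label.isEmpty()) {
            return label;
        }
        return md.getColumnName(column);
    }

    /** Hypothetical test double standing in for a driver's metadata object. */
    static ResultSetMetaData fake(String name, String label) {
        return (ResultSetMetaData) Proxy.newProxyInstance(
            ColumnKeys.class.getClassLoader(),
            new Class<?>[] { ResultSetMetaData.class },
            (proxy, method, args) -> {
                switch (method.getName()) {
                    case "getColumnLabel": return label;
                    case "getColumnName": return name;
                    default: throw new UnsupportedOperationException(method.getName());
                }
            });
    }

    public static void main(String[] args) throws SQLException {
        // Simulates "SELECT id AS IDFIELD ...": the alias is what the query author asked for.
        System.out.println(keyFor(fake("id", "IDFIELD"), 1));
        // Simulates a driver that reports an empty label: fall back to the name.
        System.out.println(keyFor(fake("id", ""), 1));
    }
}
```

With a real connection, the metadata object would come from ResultSet.getMetaData() instead of the test double.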

Re: Job crawling SharePoint repository does not end

2012-09-04 Thread Karl Wright
Karl, Yes, this is SharePoint 2010 OK, then I'll try switching to trunk and start working with it. Thanks for the information, Karl. Thanks and Regards, Swapna. On Tue, Sep 4, 2012 at 3:44 PM, Karl Wright daddy...@gmail.com wrote: Hi - What version of SharePoint are you trying to crawl

Re: Job crawling SharePoint repository does not end

2012-09-04 Thread Karl Wright
Also, please be certain to look at CONNECTORS-492, which applies to SharePoint 2010. It may not affect you, but if it does, bear in mind we have not completed development on it yet. Karl On Tue, Sep 4, 2012 at 6:48 AM, Karl Wright daddy...@gmail.com wrote: You will need the SharePoint-2010

Re: Job crawling SharePoint repository does not end

2012-09-06 Thread Karl Wright
. Thanks and Regards, Swapna. On Wed, Sep 5, 2012 at 11:01 PM, Karl Wright daddy...@gmail.com wrote: FWIW, CONNECTORS-492 was just completed, and merged into trunk. You will need a new build of the SharePoint-2010 plugin to use it. Thanks, Karl On Tue, Sep 4, 2012 at 7:34 AM, Swapna

RE: Job crawling SharePoint repository does not end

2012-09-10 Thread Karl Wright
is mandatory, just for being able to crawl them and index into Solr ? Why is it different from the connector I was using in ManifoldCF 0.6 ? Thanks and Regards, Swapna. On Thu, Sep 6, 2012 at 7:17 PM, Karl Wright daddy...@gmail.com wrote: There is a SharePoint-2010 plugin 0.1 release candidate

Re: Does anyone use MOSS?

2012-10-10 Thread Karl Wright
I don't know of any difference from a SharePoint standpoint between MOSS and WSS, except for additional Office-related plugins on MOSS. Connection working means you could get to SharePoint at least. Can you look in the log and find the exception associated with the Cannot open the

Re: Web crawling causes Socket Timeout after Database Exception

2012-10-10 Thread Karl Wright
Hi Shigeki, The socket timeout exception is only a warning. It means that some site you are crawling did not accept a socket connection within the allowed time (5 minutes I think). The Web Connector will retry the connection a few times, and if it is still rejected, it will eventually give up

Re: Strange behaviour on internet free server

2012-10-15 Thread Karl Wright
I take it by "internet free" you mean a local network that is not connected to the internet? There should be no reason why ManifoldCF would not operate in such an environment. Can you describe the strange behavior you have been seeing? Karl On Mon, Oct 15, 2012 at 12:28 PM, Johan Persson

Re: Strange behaviour on internet free server

2012-10-16 Thread Karl Wright
-version correctly. Thanks for pointing that out. BTW, is there by any chance a paid support number to call? / Johan 2012/10/16 Karl Wright daddy...@gmail.com: If this is SharePoint 2010, you need to select SharePoint 4.0 (2010) in the pulldown. It looks like you have not done

Re: Web crawling causes Socket Timeout after Database Exception

2012-10-18 Thread Karl Wright
So, what was the resolution of this problem? Any news? Karl On Thu, Oct 11, 2012 at 2:28 AM, Karl Wright daddy...@gmail.com wrote: The only change is that the MySQL driver now performs ANALYZE operations on the fly in order to keep the database operating at high efficiency

Re: Web crawling causes Socket Timeout after Database Exception

2012-10-19 Thread Karl Wright
exception . 2012/10/18 Karl Wright daddy...@gmail.com So, what was the resolution of this problem? Any news? Karl On Thu, Oct 11, 2012 at 2:28 AM, Karl Wright daddy...@gmail.com wrote: The only change is that the MySQL driver now performs ANALYZE operations on the fly in order to keep

Re: Problem with reading files from Sharepoint 2010 to manifldcf 1.0.1

2012-10-30 Thread Karl Wright
I finally was able to look at the logs. The exception that stops the job is in fact coming from the GetListItems call: at org.apache.axis.client.Call.invoke(Call.java:1812) at com.microsoft.sharepoint.webpartpages.PermissionsSoapStub.getListItems(PermissionsSoapStub.java:234) at

Re: Problem with reading files from Sharepoint 2010 to manifldcf

2012-10-30 Thread Karl Wright
operation supported in the browser. What can be the reason for this? Can there be a mismatch between the sharepoint driver on MCF and the sharepoint server? How do you suggest I continue to investigate? Thanks Oren. -Original Message- From: Karl Wright [mailto:daddy...@gmail.com

Re: Problem with reading files from Sharepoint 2010 to manifldcf

2012-10-31 Thread Karl Wright
a redirection taking place to reach your _vti_bin directory, try using the final target of the redirection instead of the initial URL, and see if that helps... Karl On Tue, Oct 30, 2012 at 11:44 AM, Karl Wright daddy...@gmail.com wrote: Seeing the existence of the service in the browser does

Re: Problem with reading files from Sharepoint 2010 to manifldcf

2012-10-31 Thread Karl Wright
idea/help it would be great Thanks Oren. -Original Message- From: Karl Wright [mailto:daddy...@gmail.com] Sent: Wednesday, October 31, 2012 09:39 To: Fridler, Oren; user@manifoldcf.apache.org Subject: Re: Problem with reading files from Sharepoint 2010 to manifldcf Hi Oren, I've been

Re: Problem with reading files from Sharepoint 2010 to manifldcf

2012-10-31 Thread Karl Wright
wrote: Sorry, my bad, I attached the wrong file. Attached is the manifoldcf log when 127.0.0.1 is used for the sharepoint server Oren -Original Message- From: Karl Wright [mailto:daddy...@gmail.com] Sent: Wednesday, October 31, 2012 15:25 To: Fridler, Oren Cc: user@manifoldcf.apache.org Subject

Re: Problem with manifold

2012-11-02 Thread Karl Wright
the (full) query and the results returned - that may also be useful. Thanks, Karl On Fri, Nov 2, 2012 at 6:25 AM, Karl Wright daddy...@gmail.com wrote: Hi Pablo, The first thing that I notice is that, as you have this configured, you need four fields declared in your schema as indexable fields

Re: ManifoldCF 1.0.1 MySQL setup : Error getting connection: Access denied for user

2012-11-02 Thread Karl Wright
Hi Nigel, I'm not a MySQL expert, but I seem to recall there was something interesting about the way MySQL authenticated remote connections. There are two properties that the MySQL driver looks at: /** MySQL server property */ public static final String mysqlServerProperty =

Re: Problem with manifold

2012-11-05 Thread Karl Wright
/requestHandler /config On Mon, Nov 5, 2012 at 5:42 AM, Karl Wright daddy...@gmail.com wrote: No - I mean modifying ManifoldCFSearchComponent itself, and rebuilding the component yourself. You can download the sources that correspond to the release from the ManifoldCF download page, http

RE: ManifoldCF 1.0.1 MySQL setup : Error getting connection:

2012-11-05 Thread Karl Wright
postgres instance with full privileges for the moment and may revisit the code later. Thanks for the help. Nigel Thomas On 2 November 2012 15:35, Karl Wright daddy...@gmail.com wrote: Hi Nigel, I'm not a MySQL expert, but I seem to recall there was something interesting about the way MySQL

Re: ManifoldCF 1.0.1 MySQL setup : Error getting connection: Access denied for user

2012-11-05 Thread Karl Wright
and users already exist. I have reverted to using postgres instance with full privileges for the moment and may revisit the code later. Thanks for the help. Nigel Thomas On 2 November 2012 15:35, Karl Wright daddy...@gmail.com wrote: Hi Nigel, I'm not a MySQL expert, but I seem to recall

Re: Cannot connect to SharePoint 2010 instance

2012-11-06 Thread Karl Wright
/_vti_bin works with any valid site in sitepath including the previously mentioned _admin site. That said do you have any thoughts on why I would be getting the 404 error? Thanks Bob -Original Message- From: Karl Wright [mailto:daddy...@gmail.com] Sent: Monday, November 05, 2012 2:45

Re: The Schedulars are not starting automatically

2012-11-06 Thread Karl Wright
Bhattacharya anupam...@gmail.com wrote: Thanks. There is an option to set the Start Method in the Connection tab of the Job settings. I changed it to Start when the Schedule window starts, and the problem was resolved. Regards Anupam On Thu, Aug 2, 2012 at 10:59 PM, Karl Wright daddy...@gmail.com

Re: Cannot connect to SharePoint 2010 instance

2012-11-06 Thread Karl Wright
not authenticate properly, or has insufficient permissions to access http://.xxx.xxx: (401)Unauthorized I can log into the SharePoint site from the browser using the same credentials. Any Thoughts? Thanks Bob -Original Message- From: Karl Wright [mailto:daddy...@gmail.com] Sent

Re: Cannot connect to SharePoint 2010 instance

2012-11-06 Thread Karl Wright
Bob -Original Message- From: Karl Wright [mailto:daddy...@gmail.com] Sent: Tuesday, November 06, 2012 2:50 PM To: user@manifoldcf.apache.org Subject: Re: Cannot connect to SharePoint 2010 instance Yes, this can be somewhat tricky. There are a lot of potential configurations

Re: Cannot connect to SharePoint 2010 instance

2012-11-06 Thread Karl Wright
: Karl, If this is not possible can you recommend any other products to crawl SharePoint content and index it in Solr? Thanks Bob -Original Message- From: Karl Wright [mailto:daddy...@gmail.com] Sent: Tuesday, November 06, 2012 3:10 PM To: user@manifoldcf.apache.org Subject: Re

Re: Cannot connect to SharePoint 2010 instance

2012-11-06 Thread Karl Wright
...@novartis.com wrote: Karl, On another topic is there a roadmap for supporting SharePoint 2013 ? We are in the process of migrating and were wondering when your ManifoldCF product would be available to support it. Thanks Bob -Original Message- From: Karl Wright [mailto:daddy

Re: Cannot connect to SharePoint 2010 instance

2012-11-06 Thread Karl Wright
been released so we can cross #1 off the list :) Thanks Bob -Original Message- From: Karl Wright [mailto:daddy...@gmail.com] Sent: Tuesday, November 06, 2012 3:47 PM To: user@manifoldcf.apache.org Subject: Re: Cannot connect to SharePoint 2010 instance Hi Bob, That depends very

Re: Problem with manifold

2012-11-07 Thread Karl Wright
have any idea please let me know. I will anyway tell you whether it worked or not. Thanks, Pablo -Original Message- From: Karl Wright [mailto:daddy...@gmail.com] Sent: Monday, November 05, 2012 11:57 To: user@manifoldcf.apache.org Subject: Re: Problem with manifold Just

RE: Problem with manifold

2012-11-07 Thread Karl Wright
the share tokens, or perhaps we should do something at the machine that contains the documents that we are indexing, to configure share-level security as we did at the document level. -Original Message- From: Karl Wright [mailto:daddy...@gmail.com] Sent: Wednesday, November 07, 2012 11:42

Re: value of DATACOLUMN

2012-11-12 Thread Karl Wright
and cannot be indexed on Solr 4.0. On Solr 3.6 the content-type was text/plain; on Solr 4.0 the content-type was application/octet-stream. Is this Solr's issue, not the database's encoding? On 2012/11/12, at 20:36, Karl Wright wrote: It looks like the Postgresql JDBC driver sets the encoding itself, from what I

Re: Process behavior of executing multiple jobs

2012-11-19 Thread Karl Wright
Hi Shigeki, This is a complex question, which is actually at the center of what ManifoldCF does. There are two different kinds of scheduling that MCF does. The first is scheduling documents within a single connection. The second is scheduling documents across connections. Let's start with the

Re: Cannot connect to SharePoint 2010 instance

2012-11-26 Thread Karl Wright
client. In order for that to happen, connectors that support Kerberos would need to be able to kerberos authenticate. But, for right now, this may work for people needing Kerberos. Karl On Sun, Nov 11, 2012 at 8:42 AM, Karl Wright daddy...@gmail.com wrote: The port of the SharePoint connector

Re: SharePoint 2007 Connector - (401)HTTP/1.1 401 Unauthorized

2012-11-27 Thread Karl Wright
org.apache.manifoldcf.crawler.connectors.sharepoint.common, locale it 2012/11/27 Karl Wright daddy...@gmail.com Hi Luigi, The warning is coming from the part of commons-httpclient that is trying to set up communication with your SharePoint instance. It thinks it needs to use SPNEGO to figure out

Re: SharePoint 2007 Connector - (401)HTTP/1.1 401 Unauthorized

2012-11-27 Thread Karl Wright
-kirey.lan password= Connection status:Crawl user did not authenticate properly, or has insufficient permissions to access http://vm-shpt2k7.services-kirey.lan/KireyRep: *(401)HTTP/1.1 401 Unauthorized* on manifoldcf.log *no error trace !* 2012/11/27 Karl Wright daddy

Re: SharePoint 2007 Connector - (401)HTTP/1.1 401 Unauthorized

2012-11-27 Thread Karl Wright
+3.0.4506.2152;+.NET+CLR+3.5.30729) *200* 0 0 It's quite a conundrum ... 2012/11/27 Karl Wright daddy...@gmail.com Ok, can you try a fully-qualified domain name, rather than the abbreviated one you have given, for the credentials? Also, you might want to look at the server-side event logs

Re: Cannot connect to SharePoint 2010 instance

2012-11-27 Thread Karl Wright
and I get this error Crawl user did not authenticate properly, or has insufficient permissions to access http://...: (401)HTTP/1.1 401 Unauthorized Thanks Bob -Original Message- From: Karl Wright [mailto:daddy...@gmail.com] Sent: Monday, November 26, 2012 6:38 PM

Re: SharePoint 2007 Connector - (401)HTTP/1.1 401 Unauthorized

2012-11-27 Thread Karl Wright
: */* HTTP/1.1 *411 Length Required* Content-Type: text/html Date: Tue, 27 Nov 2012 16:32:06 GMT Connection: close Content-Length: 24 <h1>Length Required</h1> * Closing connection #0 2012/11/27 Karl Wright daddy...@gmail.com Just on a whim, can you try POST with curl also? It is possible

Re: SharePoint 2007 Connector - (401)HTTP/1.1 401 Unauthorized

2012-11-27 Thread Karl Wright
2012/11/27 Karl Wright daddy...@gmail.com You need to use the --data option, not -X. Karl On Tue, Nov 27, 2012 at 11:37 AM, Luigi D'Addario luigi.dadda...@googlemail.com wrote: Karl, via curl with POST I get an HTTP/1.1 *411 Length Required*. Does it mean that POST is blocked? curl

Re: Cannot connect to SharePoint 2010 instance

2012-11-27 Thread Karl Wright
. Karl On Tue, Nov 27, 2012 at 11:36 AM, Karl Wright daddy...@gmail.com wrote: Hi Bob, This is really beginning to sound like there is a header problem of some kind. This is what I'd like to try. (1) Turn on wire debugging for SharePoint, as described here: https://cwiki.apache.org

Re: Cannot connect to SharePoint 2010 instance

2012-11-27 Thread Karl Wright
Bob -Original Message- From: Karl Wright [mailto:daddy...@gmail.com] Sent: Tuesday, November 27, 2012 12:52 PM To: user@manifoldcf.apache.org Subject: Re: Cannot connect to SharePoint 2010 instance Hi Bob, If the headers all check out, then maybe this is the cause: http

Re: Cannot connect to SharePoint 2010 instance

2012-11-27 Thread Karl Wright
was unable to process request. ---> Data at the root level is invalid. Line 1, position 1.</soap:Text></soap:Reason><soap:Detail /></soap:Fault></soap:Body></soap:Envelope> [iannero1@ip-10-145-32-121 logs]$ -Original Message- From: Karl Wright [mailto:daddy...@gmail.com] Sent: Tuesday

Re: Cannot connect to SharePoint 2010 instance

2012-11-27 Thread Karl Wright
Here we go: Header logging: org.apache.http.headers=DEBUG Wire logging (which we probably don't need): org.apache.http.wire=DEBUG Karl On Tue, Nov 27, 2012 at 2:04 PM, Karl Wright daddy...@gmail.com wrote: The wire debugging setup you are using will only work with commons-httpclient

Re: Cannot connect to SharePoint 2010 instance

2012-11-27 Thread Karl Wright
Yes. Karl On Tue, Nov 27, 2012 at 2:14 PM, Iannetti, Robert robert.ianne...@novartis.com wrote: So would the org.apache.http.headers=DEBUG replace the log4j.logger.httpclient.wire=DEBUG entry in the logging.ini file? -Original Message- From: Karl Wright [mailto:daddy...@gmail.com

Re: Web crawling causes Socket Timeout after Database Exception

2012-11-28 Thread Karl Wright
Ok, fix has been checked in. Karl On Wed, Nov 28, 2012 at 3:19 AM, Karl Wright daddy...@gmail.com wrote: The ticket is CONNECTORS-571. Karl On Wed, Nov 28, 2012 at 3:12 AM, Karl Wright daddy...@gmail.com wrote: Hi Shigeki, This confirms my theory that our MySQL driver is not detecting all

Re: Cannot connect to SharePoint 2010 instance

2012-11-29 Thread Karl Wright
-Original Message- From: Karl Wright [mailto:daddy...@gmail.com] Sent: Tuesday, November 27, 2012 5:25 PM To: user@manifoldcf.apache.org Subject: Re: Cannot connect to SharePoint 2010 instance no, you need: log4j.logger.logger_name=DEBUG in this case: log4j.logger.org.apache.http.headers

Re: Web crawling causes Socket Timeout after Database Exception

2012-11-30 Thread Karl Wright
of the problem is to reduce deadlocks in MySQL. I am not sure if this could be solved by MCF but this is a task that people using MySQL need to know. Regards, Shigeki 2012/11/28 Karl Wright daddy...@gmail.com Yes, the SQL code will be output to the manifoldcf.log as part of the exception

Re: Running multiple MCFs on one Tomcat

2012-11-30 Thread Karl Wright
Hi Shigeki, Each MCF instance should have its own properties.xml file. Since the way you tell MCF where the properties.xml file is located is with a -D switch, I don't think you can run multiple instances properly in one JVM. If this is important to you, please let us know, and also please
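The reason a -D switch limits a JVM to one configuration can be shown with a few lines of Java: system properties are global to the JVM, so a second value simply replaces the first. This is a minimal sketch; the property name org.apache.manifoldcf.configfile and the file paths are assumptions made for illustration, and the class ConfigFileProperty is hypothetical.

```java
final class ConfigFileProperty {
    // Assumed name of the property set via -D on the command line.
    static final String CONFIG_PROPERTY = "org.apache.manifoldcf.configfile";

    /** Returns the single, JVM-wide value of the property (or null if unset). */
    static String configuredPath() {
        return System.getProperty(CONFIG_PROPERTY);
    }

    public static void main(String[] args) {
        // Two web applications in one JVM cannot each hold their own value:
        System.setProperty(CONFIG_PROPERTY, "/opt/mcf-a/properties.xml");
        System.setProperty(CONFIG_PROPERTY, "/opt/mcf-b/properties.xml");
        // Only the last assignment survives, for every class loaded in this JVM.
        System.out.println(configuredPath());
    }
}
```

Running each instance in its own JVM (for example, one servlet container process per instance) gives each one its own property value.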

Re: Running multiple MCFs on one Tomcat

2012-12-04 Thread Karl Wright
...@g.softbank.co.jp wrote: Hi Karl, I noticed MCF does not use much CPU. I was wondering if running multiple MCFs could increase the CPU usages. Regards, Shigeki 2012/11/30 Karl Wright daddy...@gmail.com Hi Shigeki, Each MCF instance should have its own properties.xml file. Since the way

Re: Cannot connect to SharePoint 2010 instance

2012-12-05 Thread Karl Wright
- From: Karl Wright [mailto:daddy...@gmail.com] Sent: Thursday, November 29, 2012 3:28 AM To: user@manifoldcf.apache.org Subject: Re: Cannot connect to SharePoint 2010 instance Hi Robert, Luigi and I think we've discovered the issue, which we're going to see if we can confirm today

Re: Cannot connect to SharePoint 2010 instance

2012-12-05 Thread Karl Wright
I actually did decide to modify the build to pull the changed jars down automatically. So you can just download the artifacts under http://people.apache.org/~kwright/apache-manifoldcf-1.1-dev and you should get updated binaries. Karl On Wed, Dec 5, 2012 at 6:03 PM, Karl Wright daddy

Re: Web crawl exited with an unexpected jobqueue status error under MySQL

2012-12-05 Thread Karl Wright
...@g.softbank.co.jp wrote: Hello Karl. MySQL: 5.5.24 Tomcat: 6.0.35 CentOS: 6.3 Regards, Shigeki 2012/12/5 Karl Wright daddy...@gmail.com Yes, I believe it is related, in the sense that the fix for CONNECTORS-246 was a fix to the HSQLDB database. This error makes it clear that MySQL has

Re: SharePoint 2007 Connector - (401)HTTP/1.1 401 Unauthorized

2012-12-06 Thread Karl Wright
the requested SharePoint site. I am sending you manifoldcf.log. Thanks. Luigi 2012/12/5 Luigi D'Addario luigi.dadda...@googlemail.com ...and tomorrow I will finally try to put my SharePoint documents into Solr! 2012/12/5 Karl Wright daddy...@gmail.com I'll have to figure out how to get

Re: SharePoint 2007 Connector - (401)HTTP/1.1 401 Unauthorized

2012-12-06 Thread Karl Wright
with curl until you get it to not fail. Thanks, Karl On Thu, Dec 6, 2012 at 6:29 AM, Karl Wright daddy...@gmail.com wrote: Hi Luigi, Others have also run into this exception, from one or more SharePoint web services. It is a server side catch-all exception which tells us very little. You may

Re: Too many slow queries caused by MCF running MySQL 5.5

2012-12-09 Thread Karl Wright
Hi Shigeki, The rules for when a database will use an index for an ORDER BY clause differ significantly from database to database. The current logic seems to satisfy PostgreSQL, HSQLDB, and Derby, but clearly not MySQL. I will see if I can find a solution. The ticket for this is CONNECTORS-584.

Re: Too many slow queries caused by MCF running MySQL 5.5

2012-12-10 Thread Karl Wright
, 2012 at 2:49 AM, Karl Wright daddy...@gmail.com wrote: Hi Shigeki, The rules for when a database will use an index for an ORDER BY clause differ significantly from database to database. The current logic seems to satisfy PostgreSQL, HSQLDB, and Derby, but clearly not MySQL. I will see if I can

Re: latest trunk BUILD FAILED

2012-12-10 Thread Karl Wright
Ok, I fixed this. Karl On Sun, Dec 9, 2012 at 8:57 PM, Shinichiro Abe shinichiro.ab...@gmail.com wrote: Hi, I couldn't build the latest trunk. It seemed that MeridioConnector could not be compiled. compile-connector: [javac] /Users/abe/mcf/trunk/connectors/connector-build.xml:420:

Re: Too many slow queries caused by MCF running MySQL 5.5

2012-12-10 Thread Karl Wright
you recommend for the case of crawling a humongous number of files for now? PostgreSQL? Regards, Shigeki 2012/12/10 Karl Wright daddy...@gmail.com Since you have a large table, can you try an EXPLAIN for the following query, which should match the explanation given here: http

Re: Too many slow queries caused by MCF running MySQL 5.5

2012-12-10 Thread Karl Wright
Sorry, the FORCE INDEX hint requires the name of the index. Since ManifoldCF does not assign index names to fixed values, you will need to find the right one, by using the SHOW INDEX command first to get the right index's name. Apologies, Karl On Mon, Dec 10, 2012 at 6:41 AM, Karl Wright daddy
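The two-step procedure described above (look up the index name, then use it in the hint) can be sketched as a pair of SQL strings built in Java. This is an illustration only, not the query ManifoldCF issues: the table name jobqueue is taken from the thread subjects in this archive, while the index name example_index_name, the ordering column docpriority, and the class ForceIndexSketch are placeholders assumed for the example.

```java
final class ForceIndexSketch {
    /**
     * MySQL statement listing a table's indexes. In its result set,
     * the Key_name column holds the name to use in the hint.
     */
    static String showIndexSql(String table) {
        return "SHOW INDEX FROM " + table;
    }

    /**
     * Builds a SELECT that tells MySQL to use one specific index.
     * Identifiers are concatenated as-is, so they must come from trusted input.
     */
    static String selectWithForceIndex(String table, String indexName,
                                       String orderByColumn, int limit) {
        return "SELECT * FROM " + table
            + " FORCE INDEX (" + indexName + ")"
            + " ORDER BY " + orderByColumn
            + " LIMIT " + limit;
    }

    public static void main(String[] args) {
        System.out.println(showIndexSql("jobqueue"));
        System.out.println(
            selectWithForceIndex("jobqueue", "example_index_name", "docpriority", 100));
    }
}
```

Prefixing the second statement with EXPLAIN in a MySQL client is one way to check whether the hinted index is actually chosen.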

Re: Too many slow queries caused by MCF running MySQL 5.5

2012-12-10 Thread Karl Wright
Experiments here indicate that FORCE INDEX seems to do what we need. I'm going to think about it a bit and then come up with a fix that should use FORCE INDEX in this situation. Then we can see if it actually helps for you. Karl On Mon, Dec 10, 2012 at 8:01 AM, Karl Wright daddy...@gmail.com

Re: Too many slow queries caused by MCF running MySQL 5.5

2012-12-10 Thread Karl Wright
should be to use the existing database instance. Thanks, Karl On Mon, Dec 10, 2012 at 5:05 PM, Karl Wright daddy...@gmail.com wrote: Experiments here indicate that FORCE INDEX seems to do what we need. I'm going to think about it a bit and then come up with a fix that should use FORCE INDEX

RE: Too many slow queries caused by MCF running MySQL 5.5

2012-12-11 Thread Karl Wright
=org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector/ Regards, Shigeki 2012/12/11 Karl Wright daddy...@gmail.com Hi Shigeki, I'm uploading a new version of ManifoldCF 1.1-dev, which you can pick up at http://people.apache.org/~kwright/apache-manifoldcf-1.1-dev . This has a good chance of fixing the query performance

Re: How to crawl from the point where the job is stopped by errors

2012-12-12 Thread Karl Wright
ManifoldCF is incremental and will do as little work as possible when a job is restarted. The details of what that means depend on the actual connector involved. For Windows Share connections, the document's modify date is checked again, but the document does not need to be indexed if that has
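The general idea of skipping unchanged documents by comparing modify dates can be sketched as follows. This is a simplified illustration of the technique, assuming an in-memory map of previously seen timestamps; it is not ManifoldCF's implementation, and the class IncrementalCheck and its method are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class IncrementalCheck {
    // Last modify date (epoch millis) recorded for each document path.
    private final Map<String, Long> lastIndexed = new HashMap<>();

    /**
     * Returns true if the document is new or its modify date changed,
     * and records the new date; returns false if nothing changed.
     */
    boolean needsIndexing(String path, long modifiedMillis) {
        Long previous = lastIndexed.get(path);
        if (previous != null && previous == modifiedMillis) {
            return false; // unchanged since the last crawl: skip re-indexing
        }
        lastIndexed.put(path, modifiedMillis);
        return true;
    }

    public static void main(String[] args) {
        IncrementalCheck check = new IncrementalCheck();
        System.out.println(check.needsIndexing("//server/share/a.doc", 1000L)); // first sighting
        System.out.println(check.needsIndexing("//server/share/a.doc", 1000L)); // unchanged
        System.out.println(check.needsIndexing("//server/share/a.doc", 2000L)); // modified
    }
}
```

Note that the date still has to be fetched from the repository on every run; only the indexing step is avoided.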

Re: Many sleep process in MySQL while crawling files using Window share connection

2012-12-12 Thread Karl Wright
The MySQL threads correspond to handles in the ManifoldCF handle pool. Since a worker thread can use only one handle at a time, one expects that at best the number of MySQL processes that are active during a crawl are about equal to the number of ManifoldCF worker threads. If this is not true it

Re: Build failure on Java7

2012-12-12 Thread Karl Wright
I created a ticket, CONNECTORS-586, to track this problem. Karl On Wed, Dec 12, 2012 at 4:52 PM, Karl Wright daddy...@gmail.com wrote: Native2Ascii is a maven plugin, but it may well not be compatible with Java 7, or you might be using a non-Oracle jdk. Generally we recommend openjdk

Re: File crawl using exited with an unexpected jobqueue status error under MySQL

2012-12-20 Thread Karl Wright
Yes, it is the same cause - a transactional integrity bug in the database, MySQL in this case. I can open a ManifoldCF ticket, but the real fix has to come from the MySQL team. Karl On Thu, Dec 20, 2012 at 8:59 PM, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: Hi I run

Re: Timeout values to be configurable

2013-01-03 Thread Karl Wright
FWIW, the newest version of the Solr connector now has configurable timeout values. But my original comment still stands; you really should not find yourself in a position to need this. Karl On Wed, Dec 26, 2012 at 6:19 AM, Karl Wright daddy...@gmail.com wrote: Hi Shigeki, While timeout

Re: Http status code 302

2013-01-09 Thread Karl Wright
, Karl Wright daddy...@gmail.com wrote: It sounds like the httpclient upgrade definitely broke something. We should open a ticket. But first, can you confirm what connector this is? Is it the web connector? If so, I am puzzled because the web connector has always logged any 302 return

Re: Http status code 302

2013-01-09 Thread Karl Wright
On 2013/01/09, at 17:49, Karl Wright wrote: When I try the URL you gave using curl and no special arguments, I get this: C:\Users\Karl>curl -vvv "http://lucene.jugem.jp/?eid=39" * About to connect() to lucene.jugem.jp port 80 (#0) * Trying 210.172.160.170... connected * Connected

Re: Http status code 302

2013-01-09 Thread Karl Wright
Is it enough to diagnose? Thank you very much, Shinichiro On 2013/01/09, at 23:12, Karl Wright wrote: Wire debugging with MCF 1.0.1 requires different logging.ini parameters, because it uses commons-httpclient instead. That's described here: http://hc.apache.org/httpclient-3.x

Re: Http status code 302

2013-01-09 Thread Karl Wright
I created CONNECTORS-604 to track this problem. Karl On Wed, Jan 9, 2013 at 10:02 AM, Karl Wright daddy...@gmail.com wrote: There seem to be only two differences. The Host header value is different, and there is an Accept header in the one that works. (Accept: */*) I will experiment

Re: Monitoring Manifold CF

2013-01-16 Thread Karl Wright
Hi, The REST API can give you the job status. Karl On Wed, Jan 16, 2013 at 6:12 AM, Christian Hepworth christian.hepwo...@york.ac.uk wrote: Hello We are using Manifold CF to index Solr, via an Oracle connection. Our job is currently scheduled to run every evening, but we have had a few
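A monitoring script could poll the REST API along the following lines. This is a sketch under stated assumptions: the base URL http://localhost:8345/mcf-api-service, the json/jobstatuses/<job_id> resource path, and the job identifier are assumed here and should be checked against the API documentation for the installed version. The class JobStatusCheck is hypothetical, and InputStream.readAllBytes() requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class JobStatusCheck {
    /** Builds the (assumed) status URL for one job. */
    static String jobStatusUrl(String apiBase, String jobId) {
        String base = apiBase.endsWith("/") ? apiBase : apiBase + "/";
        return base + "json/jobstatuses/" + jobId;
    }

    /** Performs a plain GET and returns the response body as text. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JobStatusCheck <apiBase> <jobId>
        if (args.length < 2) {
            System.out.println(jobStatusUrl("http://localhost:8345/mcf-api-service", "JOB_ID"));
            return;
        }
        System.out.println(fetch(jobStatusUrl(args[0], args[1])));
    }
}
```

A nightly check could run this after the scheduled window and raise an alert if the returned JSON does not show the expected state.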

Re: Crawling new/updated files using Windows share connection takes too long

2013-01-18 Thread Karl Wright
Hi Shigeki, What database is ManifoldCF configured to use in this case? Do you see any indication of slow queries in the ManifoldCF log? Karl On Fri, Jan 18, 2013 at 5:27 AM, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: Hello I would like some advice to improve crawling

Re: XML parsing error quits file crawling using Windows share connection

2013-01-21 Thread Karl Wright
This means that the Solr you are talking to has returned an unintelligible (non-XML) response. When this happens I believe the actual return text is included in the Simple History, so I'd look there first to see what the problem might be. You may also eventually want to update to the current

Re: Job hanging on Starting up with never ending external query.

2013-01-21 Thread Karl Wright
Hi Anthony, What happens between the framework recognizing that the job should be started (which it does fine in both cases), and actually achieving a correct job start, is the seeding phase, which is going to try to execute the seeding query against your Oracle database. If something happens at

Re: Crawling new/updated files using Windows share connection

2013-01-21 Thread Karl Wright
CONNECTORS-618 Karl On Mon, Jan 21, 2013 at 9:08 AM, Karl Wright daddy...@gmail.com wrote: Bad news, I am afraid. MySQL seems to always put null values at the front of the index, and that cannot be changed through any means I can find. This is different from all other databases I know

Re: Job hanging on Starting up with never ending external query.

2013-01-21 Thread Karl Wright
http://twitter.com/apbleonard Times Higher Education University of the Year 2010 On Mon, Jan 21, 2013 at 1:25 PM, Karl Wright daddy...@gmail.com wrote: Hi Anthony, What happens between the framework recognizing that the job should be started (which it does fine in both cases), and actually

Re: Crawling new/updated files using Windows share connection

2013-01-21 Thread Karl Wright
I checked a fix for this into trunk. Please sync up with trunk and see if this fixes your problem. If it does, I will gladly include the fix in MCF 1.1. Karl On Mon, Jan 21, 2013 at 9:14 AM, Karl Wright daddy...@gmail.com wrote: CONNECTORS-618 Karl On Mon, Jan 21, 2013 at 9:08 AM, Karl

Re: Job hanging on Starting up with never ending external query.

2013-01-22 Thread Karl Wright
)1904 434350 http://twitter.com/apbleonard Times Higher Education University of the Year 2010 On Mon, Jan 21, 2013 at 2:15 PM, Karl Wright daddy...@gmail.com wrote: kill -QUIT should not abort the agents process, just cause a thread dump. kill -9 is a different story. You can also do

Re: Job hanging on Starting up with never ending external query.

2013-01-22 Thread Karl Wright
not terribly likely to cause exceptions. I've opened a ticket - CONNECTORS-620. Karl On Tue, Jan 22, 2013 at 12:53 PM, Karl Wright daddy...@gmail.com wrote: Hmm. The following threads are of interest here: Thread 29975: (state = BLOCKED) - java.lang.Object.wait(long) @bci=0 (Compiled frame

Re: max_pred_locks_per_transaction

2013-01-25 Thread Karl Wright
Hi Erlend, Leaving logging at the default values would have shown the ERROR message you have below. So the cause for the pause must have been something else. When ManifoldCF seems to make no progress, the first thing to do is look at the simple history and see if it is retrying on something for

Re: Job hanging on Starting up with never ending external query.

2013-01-25 Thread Karl Wright
of York, Heslington, York, UK, YO10 5DD Tel: +44 (0)1904 434350 http://twitter.com/apbleonard Times Higher Education University of the Year 2010 On Tue, Jan 22, 2013 at 6:51 PM, Karl Wright daddy...@gmail.com wrote: I've checked in code in both trunk and the release branch for this issue

Re: Diagnosing REJECTED documents in job history

2013-01-30 Thread Karl Wright
for better error reporting in this connector - it was a contribution and AFAIK the error handling is not very robust at this point, but I can fix that quickly with your help. ;-) Karl On Wed, Jan 30, 2013 at 8:55 AM, Andrew Clegg andrew.cl...@gmail.com wrote: On 30 January 2013 13:33, Karl

Re: Diagnosing REJECTED documents in job history

2013-01-30 Thread Karl Wright
... I'll try running wireshark to see if I can follow the TCP stream. On 30 January 2013 14:16, Karl Wright daddy...@gmail.com wrote: Ok, ElasticSearch is not happy about something when the document is being posted. The connector is seeing a non-200 HTTP response, and throwing an exception

Re: Diagnosing REJECTED documents in job history

2013-01-30 Thread Karl Wright
I just checked in a refactoring to trunk that should improve Elastic Search error reporting significantly. Karl On Wed, Jan 30, 2013 at 9:39 AM, Karl Wright daddy...@gmail.com wrote: I agree that the Elastic Search connector needs far better logging and error handling. CONNECTORS-629. Karl

Re: Diagnosing REJECTED documents in job history

2013-02-01 Thread Karl Wright
-- although I did add pdf as an allowed mime type in the ElasticSearch page of the job config, just to see if it would parse this ok. Do you know if there's any way to map from a source's content type to a destination's content type? On 31 January 2013 23:09, Karl Wright daddy...@gmail.com
