Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-16 Thread Mingchun Zhao
Hi Karl, Got it. I'll spin up the RC1 just now. Regards, Mingchun 2014-08-15 19:26 GMT+09:00 Karl Wright daddy...@gmail.com: The ticket is CONNECTORS-1010, which I have fixed and pulled up a fix for 1.7 for. Mingchun, can you close this vote, and spin up an RC1 that we can vote on?

[CANCEL][VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-16 Thread Karl Wright
Release candidate is withdrawn. Karl On Sat, Aug 16, 2014 at 6:55 AM, Mingchun Zhao mingchun.zha...@gmail.com wrote: Hi Karl, Got it. I'll spin up the RC1 just now. Regards, Mingchun 2014-08-15 19:26 GMT+09:00 Karl Wright daddy...@gmail.com: The ticket is CONNECTORS-1010, which I

Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-15 Thread Erlend Garåsen
On 12.08.14 05:13, Mingchun Zhao wrote: Hi all, Please vote on whether to release the ManifoldCF, version 1.7, RC0. You can find the artifact at: http://people.apache.org/~mingchun/apache-manifoldcf-1.7-RC0 There is also a tag at:

Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-15 Thread Erlend Garåsen
-1 All my first tests pass, but I think I found a blocker when I ran the last one. By running MCF using FileLockManager, I'm getting the following error and MCF just tries to run this task over and over again. My synch folder now contains a lot of files and it still grows. I think MCF

Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-15 Thread Erlend Garåsen
Another thing. It's not possible to abort the job due to this problem. LockManager still tries to set locks over and over again. It's not just the previous URL/filename I entered, but several others: WARN 2014-08-15 10:07:46,178 (Worker thread '31') - Attempt to set file lock

Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-15 Thread Karl Wright
Hi Erlend, This is actually the result of a bug fix that was made in the 1.7 time frame. The problem actually arose in the first place in 1.5, when a lock needed to go from being a critical section to a cross-process lock. But, due to an oversight, this was only fixed now. We can lock based on
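For readers unfamiliar with the distinction drawn above: a critical section only excludes other threads inside one process, while a cross-process lock has to be visible to every process sharing the synch folder. The following is a minimal sketch of that difference, not ManifoldCF's FileLockManager code; `LockFile`, `try_acquire`, `in_process_only` and the `resource.lock` file name are hypothetical names chosen for illustration.

```python
import os
import tempfile
import threading

# In-process only: this guards threads inside a single process and is
# invisible to any other process.
critical_section = threading.Lock()


def in_process_only(work):
    """Run work() while holding the in-process critical section."""
    with critical_section:
        return work()


class LockFile:
    """Hypothetical cross-process lock backed by an exclusively created file."""

    def __init__(self, path):
        self.path = path
        self._held = False

    def try_acquire(self):
        try:
            # O_CREAT | O_EXCL fails if the file already exists,
            # so only one caller can create it and win the lock.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        os.close(fd)
        self._held = True
        return True

    def release(self):
        # Only the holder removes the lock file.
        if self._held:
            os.remove(self.path)
            self._held = False


def demo():
    lock_path = os.path.join(tempfile.mkdtemp(), "resource.lock")
    first, second = LockFile(lock_path), LockFile(lock_path)
    got_first = first.try_acquire()
    got_second = second.try_acquire()  # stands in for a second process
    first.release()
    got_after_release = second.try_acquire()
    second.release()
    return got_first, got_second, got_after_release
```

In this sketch the second `LockFile` object plays the role of another process: it cannot acquire the lock until the first holder has released it.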

Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-15 Thread Karl Wright
The ticket is CONNECTORS-1010, which I have fixed; I have also pulled the fix up for 1.7. Mingchun, can you close this vote, and spin up an RC1 that we can vote on? Thanks! Karl On Fri, Aug 15, 2014 at 5:50 AM, Karl Wright daddy...@gmail.com wrote: Hi Erlend, This is actually the result of a bug

Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-12 Thread Karl Wright
I request that the vote be left open at least until 8/21/2014, since 1.7 is a major release and we want as many people to try it out as possible before declaring it complete. Thanks! Karl On Tue, Aug 12, 2014 at 12:44 AM, Shinichiro Abe shinichiro.ab...@gmail.com wrote: Hi, +1 from me.

Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-12 Thread Karl Wright
Hi Abe-san, I wonder if you would be willing to try indexing Koji's site into Solr using the Tika transformation connection and the Solr standard update handler? It would be good to know if this works just like the Solr extracting update handler. If not we should fix it. Thanks! Karl On

Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-12 Thread Karl Wright
Hi Abe-san, I think TikaExtractor should fully support sjis input at this point. It should convert extracted sjis content to utf-8. I know SOLR-6199 is an issue but I want to be sure we have otherwise completed the Tika connector properly. So that is why I asked you to try it, but not in
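The conversion described above (Shift_JIS input, UTF-8 output) can be illustrated with a short sketch. This is not TikaExtractor's implementation; the helper names `sjis_to_utf8` and `is_valid_utf8` and the sample string are assumptions made for the example.

```python
def sjis_to_utf8(raw: bytes) -> bytes:
    """Decode Shift_JIS bytes to text, then re-encode the text as UTF-8."""
    text = raw.decode("shift_jis")
    return text.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample content standing in for extracted document text.
sample = "日本語のテキスト"
sjis_bytes = sample.encode("shift_jis")
utf8_bytes = sjis_to_utf8(sjis_bytes)
```

A check like `is_valid_utf8` is one way to tell whether content arriving downstream is really UTF-8 or is still in its original encoding.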

Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-12 Thread Karl Wright
I ran ant rat-sources, and inspected the packages. All looks good. The only comment is that the connector-lib area has grown by about 18MB this cycle, and of course all the images for the Chinese documentation add another 5MB, so our binary packages are now just about 200MB. I don't think this

Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-12 Thread Shinichiro Abe
Hi Karl, The content field was garbled when posting via /update with the Tika connector. Sample Docs: http://www.rondhuit.com/download.html#whitepaper My MCF job crawled Japanese PDF and XLS files from the file system into Solr. I was surprised that Solr threw an exception when the en_US end-user-documentation.pdf was posted via tika

Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-12 Thread Karl Wright
Hi Abe-san, It looks to me like SolrJ when it uses SolrInputDocument cannot correctly post some kinds of characters. The exception is coming from inside Solr itself -- not SolrJ. So I think a Solr ticket would be the right thing to do here. Can you try leaving your pipeline to include Tika,

Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-12 Thread Karl Wright
It looks like the Tika content extraction is not actually producing valid utf-8. I'm not sure what it is producing, but that is the underlying problem. I'll create a ticket and look into it. Karl On Tue, Aug 12, 2014 at 9:52 AM, Karl Wright daddy...@gmail.com wrote: Hi Abe-san, It looks

Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-12 Thread Shinichiro Abe
Thanks Karl, When posting MCF's end-user-documentation.pdf (English) via the standard update handler, Solr throws an exception. This is a problem, and I'm not sure why. It works by leaving my pipeline to include Tika and using the extracting update handler. Solr's Tika version matches MCF's (1.5).

Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-12 Thread Karl Wright
So there are two problems. One problem is that the Tika Extractor is not doing the right thing (I think). The second problem is that valid characters are not being sent to Solr when SolrInputDocument is used. Karl On Tue, Aug 12, 2014 at 10:15 AM, Shinichiro Abe shinichiro.ab...@gmail.com

Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-12 Thread Karl Wright
Ok, I've done some more experimentation, and confirmed that there is really only ONE problem: in SolrJ or Solr. ManifoldCF is working perfectly. The ticket I created, CONNECTORS-1008, will therefore be postponed to MCF 2.0. The workaround is to use the extracting update handler even when the

Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-12 Thread Shinichiro Abe
I apologize for the mistake: I forgot to configure the Tika connector in the job. I configured documentFilter and Metadata adjuster only. After adding the Tika connector it works, and there is no problem. English pdf, Japanese pdf/xls are not garbled! I'm sorry! So we don't have to fix CONNECTORS-1008.

Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-12 Thread Karl Wright
Ok, I closed the ticket. So thanks, I think I'm now ready to vote +1. Karl On Tue, Aug 12, 2014 at 11:38 AM, Shinichiro Abe shinichiro.ab...@gmail.com wrote: I apologize for the mistake, I forgot to configure tika connector in the job. I configured documentFilter and Metadata adjuster

Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-12 Thread Shinichiro Abe
Hi Karl, I also tested with the SJIS file attached to CONNECTORS-613: the file was not garbled, and the Tika connector extracted content and metadata properly. Therefore we currently don't need to respin the RC. I have a question. What is this? - hard-coded mymetype checkings,

Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-12 Thread Shinichiro Abe
Ok, I understand we specify 'text/plain;charset=utf-8' string temporarily so that we accept all kinds of mime types. Thanks, Shinichiro Abe 2014-08-13 1:25 GMT+09:00 Karl Wright daddy...@gmail.com: bq. I have a question. What is this? - hard-coded mymetype checkings,

Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-12 Thread Karl Wright
Hi Abe-san, Actually, for the Tika transformation connector, there are TWO different mime types. One mime type represents what the connector generates. The other represents what the connector can accept. This is true of all transformation connectors. Hope that helps. Karl On Tue, Aug 12,
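The two-mime-type idea can be sketched as a small data structure: one field for what a stage accepts and one for what it produces. This is an illustration only, not ManifoldCF's connector API; `TransformationStage`, `can_chain` and both example stages are hypothetical, and the assumption that a Tika-like stage accepts every input type while producing `text/plain;charset=utf-8` is inferred from the discussion rather than confirmed by it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TransformationStage:
    """Hypothetical pipeline stage with separate accept and produce types."""
    name: str
    accepts: Callable[[str], bool]  # which incoming mime types it can take
    produces: str                   # the mime type it emits


def can_chain(upstream_mime: str, stage: TransformationStage) -> bool:
    """Check whether a stage can consume content of the given mime type."""
    return stage.accepts(upstream_mime)


# Assumed behaviour: accepts anything, emits UTF-8 plain text.
tika_like = TransformationStage(
    name="tika-like extractor",
    accepts=lambda mime: True,
    produces="text/plain;charset=utf-8",
)

# A second hypothetical stage that only takes plain text.
text_only = TransformationStage(
    name="text-only stage",
    accepts=lambda mime: mime.split(";")[0].strip() == "text/plain",
    produces="text/plain;charset=utf-8",
)
```

Keeping the two types separate is what lets a pipeline check, stage by stage, that each output can be fed to the next input.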

[VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-11 Thread Mingchun Zhao
Hi all, Please vote on whether to release the ManifoldCF, version 1.7, RC0. You can find the artifact at: http://people.apache.org/~mingchun/apache-manifoldcf-1.7-RC0 There is also a tag at: https://svn.apache.org/repos/asf/manifoldcf/tags/release-1.7-RC0 Vote will remain open at least 72

Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-11 Thread Mingchun Zhao
+1 from me. Inspected unpacking (tar.gz, zip) and crawling (file system only). Mingchun 2014-08-12 12:13 GMT+09:00 Mingchun Zhao mingchun.zha...@gmail.com: Hi all, Please vote on whether to release the ManifoldCF, version 1.7, RC0. You can find the artifact at:

Re: [VOTE] Release Apache ManifoldCF 1.7 RC0

2014-08-11 Thread Shinichiro Abe
Hi, +1 from me. - Checked signatures and checksums by running check_signatures.sh. - Checked that Mingchun's code signing key is available online. Shinichiro Abe On 2014/08/12, at 12:13, Mingchun Zhao mingchun.zha...@gmail.com wrote: Hi all, Please vote on whether to release the ManifoldCF,
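The checksum half of such a release check can be sketched as follows. This does not reproduce check_signatures.sh, whose contents are not shown in the thread, and it does not cover signature (key) verification. The function names and the default `sha512` algorithm are assumptions; the digest type actually published with the release candidate may differ.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha512", chunk_size=65536):
    """Hash a file in chunks so large artifacts need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_matches(path, expected_hex, algorithm="sha512"):
    """Compare a file's digest with the published hex digest."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


def demo():
    # A temporary file stands in for a downloaded release artifact.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"release artifact bytes")
    expected = hashlib.sha512(b"release artifact bytes").hexdigest()
    ok = checksum_matches(path, expected)
    bad = checksum_matches(path, "0" * 128)
    os.remove(path)
    return ok, bad
```

In practice the expected digest would be read from the checksum file published next to the artifact, and the algorithm argument set to match it.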