[jira] [Created] (CONNECTORS-968) All output connectors should be updated so they can coexist with each other in the UI

2014-06-18 Thread Karl Wright (JIRA)
Karl Wright created CONNECTORS-968. Summary: All output connectors should be updated so they can coexist with each other in the UI. Key: CONNECTORS-968. URL: https://issues.apache.org/jira/browse/CONNECTORS-968

[jira] [Commented] (CONNECTORS-967) add links to Java7MCF framework Javadoc

2014-06-18 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035514#comment-14035514 ] Karl Wright commented on CONNECTORS-967: Looks good -- please go ahead and

[jira] [Resolved] (CONNECTORS-967) add links to Java7MCF framework Javadoc

2014-06-18 Thread Shinichiro Abe (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shinichiro Abe resolved CONNECTORS-967. Resolution: Fixed. Fix Version/s: ManifoldCF 1.7. Assignee:

[jira] [Created] (CONNECTORS-969) Output connectors that use a JSON description string likely to fail

2014-06-18 Thread Karl Wright (JIRA)
Karl Wright created CONNECTORS-969. Summary: Output connectors that use a JSON description string likely to fail. Key: CONNECTORS-969. URL: https://issues.apache.org/jira/browse/CONNECTORS-969

Re: Solr Extracting request handler

2014-06-18 Thread Alessandro Benedetti
But guys, why not simply switch to classic SolrJ SolrDocument creation and ingestion into the Solr server? Easy and straightforward! In the end, at that point the RepositoryDocument will be only a Map of metadata and values. Content will be part of that, so I guess the conversion to a SolrDocument

Re: Solr Extracting request handler

2014-06-18 Thread Alessandro Benedetti
Hello Karl, what I was thinking is: assuming we have the Tika connector, the responsibility for extracting content will pass from Solr to the Tika processor. So we can change the part of the Solr connector that builds the request sent to the Extract update handler. Particularly

[jira] [Created] (CONNECTORS-970) Hadoop error and silent failure

2014-06-18 Thread Karl Wright (JIRA)
Karl Wright created CONNECTORS-970. Summary: Hadoop error and silent failure. Key: CONNECTORS-970. URL: https://issues.apache.org/jira/browse/CONNECTORS-970. Project: ManifoldCF. Issue Type:

Re: Solr Extracting request handler

2014-06-18 Thread Matteo Grolla
Hi Alessandro, ideally I think that text extraction from rich documents should be Manifold's responsibility, not Solr's. So the ideal place to implement it would be in the new document processing pipeline (using Tika). -- Matteo Grolla, Sourcesense - making sense of Open Source

Re: Solr Extracting request handler

2014-06-18 Thread Karl Wright
Hi Alessandro, The reason for backwards compatibility is obvious: people upgrade ManifoldCF all the time, and when they do it should not stop working for them. Putting Tika all the time in the pipeline is also not appropriate for other output connections. Even if you did it just for Solr, you'd

[jira] [Resolved] (CONNECTORS-968) All output connectors should be updated so they can coexist with each other in the UI

2014-06-18 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-968. Resolution: Fixed (various individual commits). All output connectors should be updated

Re: Solr Extracting request handler

2014-06-18 Thread Alessandro Benedetti
2014-06-18 16:10 GMT+01:00 Karl Wright daddy...@gmail.com: Hi Alessandro, The reason for backwards compatibility is obvious: people upgrade ManifoldCF all the time, and when they do it should not stop working for them. OK, I agree! Putting Tika all the time in the pipeline is also not

ManifoldCF 2.0 plans

2014-06-18 Thread Karl Wright
Hi all, By now it is becoming clear that ManifoldCF has accumulated a lot of backwards-compatibility dead weight we have to carry around from release to release. However, ManifoldCF 2.0 will present an opportunity to break backwards compatibility with the 1.x releases. Originally, I was

Re: ManifoldCF 2.0 plans

2014-06-18 Thread Ahmet Arslan
Hi Karl, Big +1 to making 2.0 our next release. My suggestions: * It looks like discussion is ongoing, but let's assume 2.0 will be the next release and consider switching to a Tika transformer, as in: http://searchhub.org/2012/02/14/indexing-with-solrj/ * Let's make SharePoint 2010 the default value in

Re: ManifoldCF 2.0 plans

2014-06-18 Thread Karl Wright
Good suggestions! Would you be willing to create Jira tickets for these, and make the Fix In Version field be 2.0? Thanks in advance! Karl On Wed, Jun 18, 2014 at 1:10 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi Karl, Big +1 to making 2.0 our next release. My suggestions : *

Re: Solr Extracting request handler

2014-06-18 Thread Karl Wright
Since a Tika transformer is critical to this plan, I'm going to code one up now. Stay tuned! Karl On Wed, Jun 18, 2014 at 11:59 AM, Karl Wright daddy...@gmail.com wrote: bq. I don't agree on this. Why is it not appropriate for all the connectors? Some output connectors want the document in

Re: ManifoldCF 2.0 plans

2014-06-18 Thread Piergiorgio Lucidi
+1 from me for breaking backwards compatibility and focusing on a non-SQL data store. Piergiorgio 2014-06-18 18:19 GMT+02:00 Karl Wright daddy...@gmail.com: Hi all, By now it is becoming clear that ManifoldCF has accumulated a lot of backwards-compatibility dead weight we have to carry

Re: ManifoldCF 2.0 plans

2014-06-18 Thread Karl Wright
Hi Piergiorgio, Just to clarify: I don't have a workable plan yet for a non-SQL data store, so maybe that waits until 3.0. Karl On Wed, Jun 18, 2014 at 3:13 PM, Piergiorgio Lucidi piergior...@apache.org wrote: +1 from me for breaking backwards compatibility and focusing on a non-SQL data

Re: ManifoldCF 2.0 plans

2014-06-18 Thread Ahmet Arslan
Hi, What is a non-SQL data store? Do you mean removing MCF's dependency on PostgreSQL, MySQL, Derby, etc.? By the way, the Solr guys are looking for a Data Import Handler (DIH) replacement. See the thread: http://search-lucene.com/m/WwzTb2z1w7F DIH is mostly used to sync an RDBMS to Solr. What do

Re: ManifoldCF 2.0 plans

2014-06-18 Thread Karl Wright
bq. What is a non-SQL data store? Do you mean removing MCF's dependency on PostgreSQL, MySQL, Derby, etc.? See CONNECTORS-286. bq. What do you think about this? Can MCF be a DIH replacement? How does our DB crawler compare to DIH? In theory it could. I'd hesitate before claiming feature-to-feature

Re: ManifoldCF 2.0 plans

2014-06-18 Thread Karl Wright
Hi Muhammed, Can you go into more depth about these: 1) Sharding support 2) Selectable seeding model. Thanks, Karl On Wed, Jun 18, 2014 at 5:38 PM, Karl Wright daddy...@gmail.com wrote: bq. What is a non-SQL data store? Do you mean removing MCF's dependency on PostgreSQL, MySQL, Derby, etc.?

[jira] [Commented] (CONNECTORS-954) Amazon Cloud Search connector's use of Tika should be revisited after pipelines are added

2014-06-18 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036471#comment-14036471 ] Karl Wright commented on CONNECTORS-954: Committed the basic connector:

[jira] [Commented] (CONNECTORS-954) Amazon Cloud Search connector's use of Tika should be revisited after pipelines are added

2014-06-18 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036739#comment-14036739 ] Karl Wright commented on CONNECTORS-954: Added the field mapping tab: r1603687