You’re welcome, Paul. Just in case, could you check the Alfresco logs to see if there is anything informative there?
Cheers,
Rafa

On Wed, Oct 28, 2015 at 11:47 AM, Paul Farrell <[email protected]> wrote:

> I see. That makes sense.
> No problem. Thanks for the feedback, Rafa. Much appreciated.
>
> Paul Farrell
> Senior Search Consultant
>
> 109-123 Clifton Street, London EC2A 4LD
> T +44 (0) 207 183 6865 | funnelback.com <http://www.funnelback.com/>
> UNITED KINGDOM | AUSTRALIA | NEW ZEALAND | POLAND | UNITED STATES
> Connect with us: LinkedIn <http://www.linkedin.com/company/funnelback> -
> Twitter <https://twitter.com/funnelback>
> Funnelback UK Ltd is a limited liability company registered in England &
> Wales. Registered address: Zetland House 109-123, Clifton Street, London.
> EC2A 4LD. Company registration number: 07004264.
>
>> On 28 Oct 2015, at 10:45, Rafa Haro <[email protected]> wrote:
>>
>> Hi Paul,
>>
>> Before contributing the Alfresco connector, we performed several tests
>> similar to yours using an Alfresco 4.x version. Therefore, initially, my
>> guess is the Webscript is not behaving correctly for Alfresco 5 instances.
>> I’m including Maurizio Pillitu (Alfresco Indexer main developer) in the
>> email thread. He may be able to provide some feedback about this or just
>> confirm my suspicions.
>>
>> Cheers,
>> Rafa
>>
>> On Wed, Oct 28, 2015 at 11:33 AM, Paul Farrell <[email protected]> wrote:
>>
>> Hi all,
>>
>> In follow up to my recent email (below) I thought I would share my findings
>> with the ‘Alfresco Indexer’ connector
>> (https://github.com/maoo/alfresco-indexer) in case someone may be able to
>> advise on its usage.
>>
>> The reason I went to this is due to the lack of change control detection
>> with either of the packaged Manifold Alfresco connectors (AtomPub or
>> WebService). I needed a method whereby the crawl runs each night and picks
>> up any and all changes to the documents from the previous 24 hours. A common
>> scenario.
>>
>> Unfortunately, I have yet to achieve this.
>>
>> Having built and installed both the AMP and JAR files needed for the new
>> connector, changes are still not coming through. In fact, I have two
>> observations so far:
>>
>> 1. Changes to document content or properties do not cause the same
>> document to be picked up by the Alfresco connector on the next run
>> 2. Adding ‘Filter Configuration’ seems to do very little to change what is
>> picked up
>>
>> IN DETAIL
>> 1. Failing to pick up modified content
>>
>> Looking at the log files (which are set to debug) I can see that, upon the
>> first crawl of Alfresco, Manifold sends the following requests:
>>
>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - Executing request GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> "GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - Executing request GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> "GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1[\r][\n]"
>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1[\r][\n]"
>>
>> This picks up all of the content, e.g. documents.
>>
>> Running a second crawl, without any other actions being done, results in the
>> following requests:
>>
>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - Executing request GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> "GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1[\r][\n]"
>>
>> So I can see that, in the first instance, we are targeting content directly
>> while, in the second, we are asking for changes. The problem is that no
>> changes are returned from the second set of requests.
>> The response from these calls is:
>>
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "totalNodes" : "0", [\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "elapsedTime" : "8",[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "docs" : [[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " ],[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "last_txn_id" : "352",[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "last_acl_changeset_id" : "13",[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "store_id" : "SpacesStore",[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "store_protocol" : "workspace"[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "}"
>>
>> Regardless of what changes I make to a document that I have been using for
>> testing, the document is not updated. The response from the calls for
>> changes (totalNodes) is always ‘0’.
>>
>> 2. Adding ‘Filter Configuration’ seems to do very little to change what is
>> picked up
>>
>> Within my test Alfresco environment I have one site set up (Finance). Within
>> the Finance doc library I have three test docs. No other changes have been
>> made to the Alfresco instance.
>> Running a crawl with no filter configurations set returns 81 items. This is
>> via the URL in a browser.
>> If I then set the Site Filter configuration to ‘Finance’ and apply, I still
>> get 81 items when I re-run the crawl.
>> I can see that the term ‘Finance’ is being added to the URL but this does
>> not seem to change the behaviour.
>>
>> I am happy to spend time diagnosing this if there is anyone available to
>> assist.
>>
>> Thanks
>>
>> Paul
>>
>>> On 27 Oct 2015, at 18:14, [email protected] wrote:
>>>
>>> Hi all,
>>>
>>> This is a question regarding the relatively new Alfresco Webscript
>>> connector.
>>>
>>> SETUP
>>> I have a vanilla Alfresco Community 5.0 installation
>>> One site has been created called 'Finance'
>>> A handful of documents have been created in 'Finance' Doc Library.
>>> I have cloned and packaged up the 'alfresco-indexer'
>>> (https://github.com/maoo/alfresco-indexer) and have applied the AMP and
>>> CLIENT packages to their respective environments.
>>>
>>> ISSUE
>>> The issue is that the default API call used by Manifold is returning
>>> nothing. The full API call used by Manifold, based on my config, is:
>>>
>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>>>
>>> TESTS
>>> I have identified two streamlined URLs. The first one returns the
>>> documents that exist in the doc library of the 'Finance' site. This URL is:
>>>
>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%7D
>>>
>>> The second URL simply adds the site restriction. This URL returns nothing:
>>>
>>> http://52.23.225.233:8080/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%7D
>>>
>>> Can anyone explain why the documents do not return when only the containing
>>> site is named in the API URL?
>>>
>>> Cheers
>>>
>>> Paul
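
For anyone reproducing the tests quoted in this thread, here is a minimal Python sketch that builds a changes URL of the same shape as the logged requests and decodes the percent-encoded indexingFilters value back into readable JSON. The path and the parameter names (lastTxnId, lastAclChangesetId, indexingFilters) are copied from the requests above; the helper names and the localhost base URL are hypothetical, and nothing here is taken from the connector's source code.

```python
import json
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Path copied from the requests quoted in this thread.
CHANGES_PATH = "/alfresco/service/node/changes/workspace/SpacesStore"


def build_changes_url(base, last_txn_id=0, last_acl_changeset_id=0, filters=None):
    """Build a changes URL shaped like the logged ones (hypothetical helper)."""
    # Compact JSON, so the encoded value matches the form seen in the logs.
    compact = json.dumps(filters or {}, separators=(",", ":"))
    query = urlencode(
        {
            "lastTxnId": last_txn_id,
            "lastAclChangesetId": last_acl_changeset_id,
            "indexingFilters": compact,
        },
        quote_via=quote,  # percent-encode everything, as in the logged URLs
    )
    return f"{base}{CHANGES_PATH}?{query}"


def decode_indexing_filters(url):
    """Return the indexingFilters query parameter of a changes URL as a dict."""
    query = parse_qs(urlsplit(url).query)
    return json.loads(query["indexingFilters"][0])


if __name__ == "__main__":
    # "http://localhost:8080" is a placeholder base URL, not a real server.
    url = build_changes_url(
        "http://localhost:8080", filters={"siteFilters": ["Finance"]}
    )
    print(url)
    print(decode_indexing_filters(url))
```

Decoding the filter value this way makes it easier to compare the two streamlined test URLs side by side: one carries an empty filter object and the other carries only a siteFilters entry for Finance.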
