Hi Geert/Soumadri,

Thanks for the information.

I created a custom collector and can now split my file into smaller files based on my conditions and insert them into the target database using Information Studio.

Thanks and Regards,
Gnanaprakash Bodireddy
Sr Associate - Projects | IME
Cognizant Technology Solutions

-----Original Message-----
From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of general-requ...@developer.marklogic.com
Sent: Thursday, May 31, 2012 10:15 PM
To: general@developer.marklogic.com
Subject: General Digest, Vol 95, Issue 54

Send General mailing list submissions to
    general@developer.marklogic.com

To subscribe or unsubscribe via the World Wide Web, visit
    http://community.marklogic.com/mailman/listinfo/general
or, via email, send a message with subject or body 'help' to
    general-requ...@developer.marklogic.com

You can reach the person managing the list at
    general-ow...@developer.marklogic.com

When replying, please edit your Subject line so it is more specific than "Re: Contents of General digest..."

Today's Topics:

   1. Re: General Digest, Vol 95, Issue 53 (Chowdhury, Soumadri)
   2. Collection operations performance (Ollier, John)
   3. Re: Collection operations performance (Geert Josten)
   4. Re: Collection operations performance (Danny Sokolsky)
   5. Re: Collection operations performance (Michael Blakeley)

----------------------------------------------------------------------

Message: 1
Date: Wed, 30 May 2012 20:09:30 +0000
From: "Chowdhury, Soumadri" <srroychowdh...@innodata.com>
Subject: Re: [MarkLogic Dev General] General Digest, Vol 95, Issue 53
To: "general@developer.marklogic.com" <general@developer.marklogic.com>
Message-ID: <9a63c2aad44d8a4db7d444877a5802a81a8db...@svrndaexc01.noida.innodata.net>
Content-Type: text/plain; charset="iso-8859-1"

Hi Gnana,

Transformation suits best where the whole input document is transformed and stored in the DB.
I faced a similar issue: I was inserting a new document with xdmp:document-insert (after extracting some part of the input XML) in the transformation phase, and those documents were not getting inserted into the database. In my case, I dumped all the extracted files to disk using xdmp:save() (which is not a very elegant solution) and the application picked them up from there. As Geert mentioned, it is a good idea to create a custom one-shot collector.

Hope this helps.

________________________________________
From: general-boun...@developer.marklogic.com [general-boun...@developer.marklogic.com] on behalf of general-requ...@developer.marklogic.com [general-requ...@developer.marklogic.com]
Sent: Thursday, May 31, 2012 12:30 AM
To: general@developer.marklogic.com
Subject: General Digest, Vol 95, Issue 53

Today's Topics:

   1. Splitting documents using Information Studio (Gnanaprakash Bodireddy)
   2. Re: Splitting documents using Information Studio (Geert Josten)

----------------------------------------------------------------------

Message: 1
Date: Wed, 30 May 2012 19:09:31 +0530
From: Gnanaprakash Bodireddy <gnanaprakash.bodire...@gmail.com>
Subject: [MarkLogic Dev General] Splitting documents using Information Studio
To: general@developer.marklogic.com
Message-ID: <CACppFhRHpE94KdRiB326Mb-HCHZZa+yOtdz7uyM=3unjzrh...@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

I am trying to split a simple XML document into small documents using an XQuery transformation and load them into my database using Information Studio. However, the documents are chunked and kept in the *Fab* database and are not getting moved to my *target database*.

The Information Studio documentation says that documents are retained in Fab only if there is an error while loading them; otherwise they are moved from Fab into the target database after processing.

Can you please let me know why the documents are not getting loaded into the target database? I am not getting any errors while loading content.

--
Thanks and Regards,
Gnanaprakash Bodireddy

------------------------------

Message: 2
Date: Wed, 30 May 2012 15:58:13 +0200
From: Geert Josten <geert.jos...@dayon.nl>
Subject: Re: [MarkLogic Dev General] Splitting documents using Information Studio
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Message-ID: <9d722646336739a7560931ad87893...@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi Gnanaprakash,

Let me guess, you are splitting the documents in the transformation phase? With the current design, it works best if you customize one of the collectors to do the splitting there. There are ways around doing it in the Transformation phase, but they are less elegant.

Kind regards,
Geert

------------------------------

_______________________________________________
General mailing list
General@developer.marklogic.com
http://community.marklogic.com/mailman/listinfo/general

End of General Digest, Vol 95, Issue 53
***************************************

"This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.
Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this e-mail or any action taken in reliance on this e-mail is strictly prohibited and may be unlawful."

------------------------------

Message: 2
Date: Thu, 31 May 2012 15:14:23 +0100
From: "Ollier, John" <j.oll...@nature.com>
Subject: [MarkLogic Dev General] Collection operations performance
To: <general@developer.marklogic.com>
Message-ID: <cbed3b4f.e9da%j.oll...@nature.com>
Content-Type: text/plain; charset="us-ascii"

I have been trying to create/add to a collection of documents with a query like this:

    define function addTitlesToCollection($collectionUri, $isbnList) {
      let $isbn-query-string :=
        fn:concat('cts:search(fn:collection("my-main-collection"),cts:or-query((',
                  getValueQueryStringForISBN($isbnList), ')),"unfiltered")')
      let $search-documents := xdmp:value($isbn-query-string)
      return <result>{
        for $book in $search-documents
        let $book-uri := fn:document-uri($book)
        let $result := xdmp:document-add-collections($book-uri,$collectionUri)
        return if (fn:empty($result))
          then <success>Successfully adding {$book-uri} to collection {$collectionUri}</success>
          else <error>Error adding {$book-uri} to collection {$collectionUri}</error>
      }
      </result>
    }

It's extremely slow - adding 100 documents to the collection takes 6 minutes and I'd like to be able to add 1000.

I appreciate that the MarkLogic documentation warns you that creating a collection is resource intensive. But is there any way of making it faster? I'm looking at handling the creation process asynchronously, but I'm concerned that the fact that it's so slow means the performance of MarkLogic will be affected for other applications that use it. What is the best way to check this?

Thanks in advance.

John

********************************************************************************
DISCLAIMER: This e-mail is confidential and should not be used by anyone who is not the original intended recipient. If you have received this e-mail in error please inform the sender and delete it from your mailbox or any other storage mechanism. Neither Macmillan Publishers Limited nor any of its agents accept liability for any statements made which are clearly the sender's own and not expressly made on behalf of Macmillan Publishers Limited or one of its agents. Please note that neither Macmillan Publishers Limited nor any of its agents accept any responsibility for viruses that may be contained in this e-mail or its attachments and it is your responsibility to scan the e-mail and attachments (if any). No contracts may be concluded on behalf of Macmillan Publishers Limited or its agents by means of e-mail communication. Macmillan Publishers Limited Registered in England and Wales with registered number 785998 Registered Office Brunel Road, Houndmills, Basingstoke RG21 6XS
********************************************************************************

------------------------------

Message: 3
Date: Thu, 31 May 2012 16:28:04 +0200
From: Geert Josten <geert.jos...@dayon.nl>
Subject: Re: [MarkLogic Dev General] Collection operations performance
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Message-ID: <5995a5f395c84881d6c1b00bfaf02...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi John,

Not sure where you read that creating a collection would be resource 'intensive'. Working with collections should actually be pretty fast. I think there is something else in your code that is slowing things down. I suspect the xdmp:value could be the cause of this. Can you elaborate on what getValueQueryStringForISBN is doing? I don't think you would need the xdmp:value.
Kind regards,
Geert

------------------------------

Message: 4
Date: Thu, 31 May 2012 08:43:22 -0700
From: Danny Sokolsky <danny.sokol...@marklogic.com>
Subject: Re: [MarkLogic Dev General] Collection operations performance
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Message-ID: <c9924d15b04672479b089f7d55ffc13222639e5...@exchg-be.marklogic.com>
Content-Type: text/plain; charset="us-ascii"

John,

Adding a collection to a document should cost the price of updating a document. You might try using the URI lexicon to get the list of URIs you want to update, and then iterate through that list to update the documents in batches (of, say, 100 to 1000, depending) by adding the collections. My guess is that your function is locking many more documents than needed.
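The batching idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested solution: it assumes the URI lexicon is enabled on the database, the collection name "my-new-collection" and the batch size are made up, and $QUERY is a placeholder for whatever cts:query selects the books. The outer statement only reads URIs from the lexicon; each batch is then updated through xdmp:eval with different-transaction isolation, so the update for each batch runs and commits separately.

```xquery
xquery version "1.0-ml";

(: Assumption: tune the batch size; the suggestion above is roughly 100 to 1000. :)
declare variable $BATCH-SIZE as xs:integer := 100;

(: Hypothetical target collection, for illustration only. :)
declare variable $COLLECTION-URI as xs:string := "my-new-collection";

(: Placeholder: substitute the cts:query that selects the documents to update. :)
declare variable $QUERY as cts:query :=
  cts:collection-query("my-main-collection");

(: Read-only outer statement: resolve the URIs from the URI lexicon. :)
let $uris := cts:uris((), (), $QUERY)
let $batch-count := xs:integer(fn:ceiling(fn:count($uris) div $BATCH-SIZE))
for $i in 1 to $batch-count
let $batch := fn:subsequence($uris, ($i - 1) * $BATCH-SIZE + 1, $BATCH-SIZE)
return
  (: Each batch is evaluated as its own update transaction. :)
  xdmp:eval('
    xquery version "1.0-ml";
    declare variable $batch as element(uris) external;
    declare variable $collection-uri as xs:string external;
    for $uri in $batch/uri
    return xdmp:document-add-collections(fn:string($uri), $collection-uri)
    ',
    (xs:QName("batch"),
     <uris>{ for $u in $batch return <uri>{$u}</uri> }</uris>,
     xs:QName("collection-uri"), $COLLECTION-URI),
    <options xmlns="xdmp:eval">
      <isolation>different-transaction</isolation>
    </options>)
```

The URIs are passed to the evaluated code as an external variable (wrapped in a small element) rather than concatenated into the query string, so the evaluated query text stays constant from batch to batch.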
Also, try running it through the profiler to see where it is spending time.

-Danny
------------------------------

Message: 5
Date: Thu, 31 May 2012 09:44:45 -0700
From: Michael Blakeley <m...@blakeley.com>
Subject: Re: [MarkLogic Dev General] Collection operations performance
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Message-ID: <9e578b02-e012-4bae-b6d0-b0a296192...@blakeley.com>
Content-Type: text/plain; charset=us-ascii

I think the first step is to rewrite that isbn query without using xdmp:value. From what I can see, there is no reason to use xdmp:value there. You should be able to compose the whole cts:or-query using cts:query constructors.

Once you have done that, the performance problem may become clearer. As Danny mentioned, the update transaction is probably locking a large number of documents. Your next step will probably be to switch from cts:search to cts:uris.

Also, the error-checking code at the end is a bit misguided. The xdmp:document-add-collections function will never return a result: that's what empty-sequence() means in its function signature. If anything goes wrong with xdmp:document-add-collections, it will throw an error and your query will terminate. If you want to prevent that, use try-catch.
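A minimal sketch of the three changes described above, written in 1.0-ml syntax. It rests on assumptions that the thread does not confirm: that each book stores its ISBN in an element named "isbn" (what getValueQueryStringForISBN actually builds was never shown), and that the URI lexicon is enabled so cts:uris can be used. The function name is hypothetical.

```xquery
xquery version "1.0-ml";

(: Hypothetical rewrite: the element name "isbn" is an assumption. :)
declare function local:add-titles-to-collection(
  $collection-uri as xs:string,
  $isbn-list as xs:string*
) as element(result)
{
  (: Built with cts:query constructors instead of a string passed to xdmp:value.
     A value query given several strings matches if any one of them matches,
     so a separate cts:or-query is not needed in this assumed case. :)
  let $query :=
    cts:and-query((
      cts:collection-query("my-main-collection"),
      cts:element-value-query(xs:QName("isbn"), $isbn-list)
    ))
  return
    <result>{
      (: cts:uris reads URIs from the URI lexicon without fetching documents. :)
      for $uri in cts:uris((), (), $query)
      return
        try {
          (: Returns the empty sequence on success, so there is nothing to test. :)
          xdmp:document-add-collections($uri, $collection-uri),
          <success>Added {$uri} to collection {$collection-uri}</success>
        } catch ($e) {
          <error>Error adding {$uri} to collection {$collection-uri}: {fn:string($e/*:message)}</error>
        }
    }</result>
};

(: Example call with made-up values. :)
local:add-titles-to-collection("my-new-collection", ("9780000000001", "9780000000002"))
```

With this shape, success is reported simply because no error was raised, and the catch block reports errors raised while the call is evaluated.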
-- Mike

------------------------------

End of General Digest, Vol 95, Issue 54
***************************************

This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. Any unauthorised review, use, disclosure, dissemination, forwarding, printing or copying of this email or any action taken in reliance on this e-mail is strictly prohibited and may be unlawful.