Hi Geert/Soumadri,

Thanks for the information.

I created a custom collector, and I can now chunk my file into smaller files 
based on my conditions and insert them into the target database using 
Information Studio.

Thanks and Regards,

Gnanaprakash Bodireddy
Sr Associate - Projects | IME
Cognizant Technology Solutions
VNET: 682831
(O): +91 (40) 44514444 extn: 682831
(M): +91-8897575644


-----Original Message-----
From: general-boun...@developer.marklogic.com 
[mailto:general-boun...@developer.marklogic.com] On Behalf Of 
general-requ...@developer.marklogic.com
Sent: Thursday, May 31, 2012 10:15 PM
To: general@developer.marklogic.com
Subject: General Digest, Vol 95, Issue 54

Send General mailing list submissions to
        general@developer.marklogic.com

To subscribe or unsubscribe via the World Wide Web, visit
        http://community.marklogic.com/mailman/listinfo/general
or, via email, send a message with subject or body 'help' to
        general-requ...@developer.marklogic.com

You can reach the person managing the list at
        general-ow...@developer.marklogic.com

When replying, please edit your Subject line so it is more specific than "Re: 
Contents of General digest..."


Today's Topics:

   1. Re: General Digest, Vol 95, Issue 53 (Chowdhury, Soumadri)
   2. Collection operations performance (Ollier, John)
   3. Re: Collection operations performance (Geert Josten)
   4. Re: Collection operations performance (Danny Sokolsky)
   5. Re: Collection operations performance (Michael Blakeley)


----------------------------------------------------------------------

Message: 1
Date: Wed, 30 May 2012 20:09:30 +0000
From: "Chowdhury, Soumadri" <srroychowdh...@innodata.com>
Subject: Re: [MarkLogic Dev General] General Digest, Vol 95, Issue 53
To: "general@developer.marklogic.com"
        <general@developer.marklogic.com>
Message-ID:
        
<9a63c2aad44d8a4db7d444877a5802a81a8db...@svrndaexc01.noida.innodata.net>
        
Content-Type: text/plain; charset="iso-8859-1"

Hi Gnana,

Transformation works best when the whole input document is transformed and 
stored in the DB. I faced a similar issue: I was inserting a new document with 
xdmp:document-insert (after extracting part of the input XML) in the 
transformation phase, and those documents were not getting inserted into the 
database.

In my case, I dumped all the extracted files to disk using xdmp:save() (which 
is not a very elegant solution) and the application picked it up from there.

As Geert mentioned, creating a custom one-shot collector is the better approach.

Hope this helps.
________________________________________
From: general-boun...@developer.marklogic.com 
[general-boun...@developer.marklogic.com] on behalf of 
general-requ...@developer.marklogic.com 
[general-requ...@developer.marklogic.com]
Sent: Thursday, May 31, 2012 12:30 AM
To: general@developer.marklogic.com
Subject: General Digest, Vol 95, Issue 53

Today's Topics:

   1. Splitting documents using Information Studio
      (Gnanaprakash Bodireddy)
   2. Re: Splitting documents using Information Studio (Geert Josten)


----------------------------------------------------------------------

Message: 1
Date: Wed, 30 May 2012 19:09:31 +0530
From: Gnanaprakash Bodireddy <gnanaprakash.bodire...@gmail.com>
Subject: [MarkLogic Dev General] Splitting documents using Information
        Studio
To: general@developer.marklogic.com
Message-ID:
        <CACppFhRHpE94KdRiB326Mb-HCHZZa+yOtdz7uyM=3unjzrh...@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

I am trying to split a simple XML document into small documents using an XQuery 
transformation and load them into my database using Information Studio.

However, the documents are chunked and kept in the Fab database and are not 
getting moved to my target database.

The Information Studio documentation says that documents are retained in Fab 
only if an error occurs while loading; otherwise they are moved from Fab into 
the target database after processing.

Can you please let me know why the documents are not getting loaded into the 
target database? I am not getting any errors while loading content.

--
Thanks and Regards,
Gnanaprakash Bodireddy
Phone | Mobile: +918897575644 | gnanaprakash.bodire...@gmail.com

------------------------------

Message: 2
Date: Wed, 30 May 2012 15:58:13 +0200
From: Geert Josten <geert.jos...@dayon.nl>
Subject: Re: [MarkLogic Dev General] Splitting documents using
        Information     Studio
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Message-ID: <9d722646336739a7560931ad87893...@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi Gnanaprakash,



Let me guess, you are splitting the documents in the transformation phase?
With the current design, it works best if you customize one of the collectors 
to do the splitting there. There are workarounds for doing it in the 
Transformation phase, but they are less elegant.



Kind regards,

Geert




------------------------------

_______________________________________________
General mailing list
General@developer.marklogic.com
http://community.marklogic.com/mailman/listinfo/general


End of General Digest, Vol 95, Issue 53
***************************************
"This e-mail and any files transmitted with it are for the sole use of the 
intended recipient(s) and may contain confidential and privileged information. 
If you are not the intended recipient, please contact the sender by reply 
e-mail and destroy all copies of the original message. Any unauthorized review, 
use, disclosure, dissemination, forwarding, printing or copying of this e-mail 
or any action taken in reliance on this e-mail is strictly prohibited and may 
be unlawful."


------------------------------

Message: 2
Date: Thu, 31 May 2012 15:14:23 +0100
From: "Ollier, John" <j.oll...@nature.com>
Subject: [MarkLogic Dev General] Collection operations performance
To: <general@developer.marklogic.com>
Message-ID: <cbed3b4f.e9da%j.oll...@nature.com>
Content-Type: text/plain; charset="us-ascii"

I have been trying to create/add to a collection of documents with a query like 
this:

define function addTitlesToCollection($collectionUri, $isbnList) {
    let $isbn-query-string :=
        fn:concat('cts:search(fn:collection("my-main-collection"),cts:or-query((',
            getValueQueryStringForISBN($isbnList), ')),"unfiltered")')
    let $search-documents := xdmp:value($isbn-query-string)
    return <result>{
        for $book in $search-documents
            let $book-uri := fn:document-uri($book)
            let $result := xdmp:document-add-collections($book-uri,$collectionUri)
            return if (fn:empty($result))
                then <success>Successfully adding {$book-uri}  to collection {$collectionUri}</success>
                else <error>Error adding {$book-uri} to collection {$collectionUri}</error>
    }
    </result>
}


It's extremely slow: adding 100 documents to the collection takes 6 minutes, 
and I'd like to be able to add 1000.

I appreciate that the MarkLogic documentation warns you that creating a 
collection is resource intensive. But is there any way of making it faster?
I'm looking at handling the creation process asynchronously, but I'm concerned 
that its slowness means the performance of MarkLogic will be affected for other 
applications that use it. What is the best way to check this?

Thanks in advance.

John




********************************************************************************
   
DISCLAIMER: This e-mail is confidential and should not be used by anyone who is 
not the original intended recipient. If you have received this e-mail in error 
please inform the sender and delete it from your mailbox or any other storage 
mechanism. Neither Macmillan Publishers Limited nor any of its agents accept 
liability for any statements made which are clearly the sender's own and not 
expressly made on behalf of Macmillan Publishers Limited or one of its agents.
Please note that neither Macmillan Publishers Limited nor any of its agents 
accept any responsibility for viruses that may be contained in this e-mail or 
its attachments and it is your responsibility to scan the e-mail and 
attachments (if any). No contracts may be concluded on behalf of Macmillan 
Publishers Limited or its agents by means of e-mail communication. Macmillan 
Publishers Limited Registered in England and Wales with registered number 
785998 
Registered Office Brunel Road, Houndmills, Basingstoke RG21 6XS   
********************************************************************************



------------------------------

Message: 3
Date: Thu, 31 May 2012 16:28:04 +0200
From: Geert Josten <geert.jos...@dayon.nl>
Subject: Re: [MarkLogic Dev General] Collection operations performance
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Message-ID: <5995a5f395c84881d6c1b00bfaf02...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi John,

Not sure where you read that creating a collection would be resource 
'intensive'. Working with collections should actually be pretty fast. I think 
something else in your code is slowing things down.
I suspect the xdmp:value call could be the cause. Can you elaborate on what 
getValueQueryStringForISBN is doing? I don't think you need xdmp:value.

Kind regards,
Geert



------------------------------

Message: 4
Date: Thu, 31 May 2012 08:43:22 -0700
From: Danny Sokolsky <danny.sokol...@marklogic.com>
Subject: Re: [MarkLogic Dev General] Collection operations performance
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Message-ID:
        <c9924d15b04672479b089f7d55ffc13222639e5...@exchg-be.marklogic.com>
Content-Type: text/plain; charset="us-ascii"

John,

Adding a collection to a document should cost the price of updating a document. 
You might try using the URI lexicon to get the list of URIs you want to update, 
and then iterate through that list to update the documents in batches (of, say, 
100 to 1000, depending) by adding the collections. My guess is that your 
function is locking many more documents than needed. Also, try running it 
through the profiler to see where it is spending time.
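
[Editor's sketch] The batching idea above might look roughly like the following. 
This is only an illustration under stated assumptions: it assumes the URI 
lexicon is enabled, the target collection name "my-new-collection" is 
hypothetical, and running each batch through xdmp:eval in a different 
transaction is one possible way to commit batches separately, not necessarily 
what Danny had in mind.

```xquery
xquery version "1.0-ml";

(: Batch size is a tuning knob; the thread suggests 100 to 1000. :)
declare variable $batch-size := 100;

(: Read the URIs from the URI lexicon instead of fetching documents. :)
let $uris := cts:uris((), (), cts:collection-query("my-main-collection"))
let $batch-count := xs:integer(fn:ceiling(fn:count($uris) div $batch-size))
for $i in 1 to $batch-count
let $batch := fn:subsequence($uris, ($i - 1) * $batch-size + 1, $batch-size)
return
  (: Each batch is updated and committed in its own transaction.
     The URIs are passed as one newline-separated string. :)
  xdmp:eval('
    xquery version "1.0-ml";
    declare variable $uri-list as xs:string external;
    for $uri in fn:tokenize($uri-list, "\n")
    return xdmp:document-add-collections($uri, "my-new-collection")
  ',
  (xs:QName("uri-list"), fn:string-join($batch, "&#10;")),
  <options xmlns="xdmp:eval">
    <isolation>different-transaction</isolation>
  </options>)
```

Because the outer statement makes no update calls itself, it should run as a 
read-only query, so only the documents in the current batch are locked while 
that batch commits.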

-Danny



------------------------------

Message: 5
Date: Thu, 31 May 2012 09:44:45 -0700
From: Michael Blakeley <m...@blakeley.com>
Subject: Re: [MarkLogic Dev General] Collection operations performance
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Message-ID: <9e578b02-e012-4bae-b6d0-b0a296192...@blakeley.com>
Content-Type: text/plain; charset=us-ascii

I think the first step is to rewrite that isbn query without using xdmp:value. 
From what I can see, there is no reason to use xdmp:value there. You should be 
able to compose the whole cts:or-query using cts:query constructors.
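
[Editor's sketch] A minimal example of composing the query with cts:query 
constructors instead of building a string for xdmp:value. The "isbn" element 
name and the sample ISBN values are assumptions for illustration only; the 
thread does not show what getValueQueryStringForISBN generates.

```xquery
xquery version "1.0-ml";

(: Hypothetical: assumes each book stores its ISBN in an <isbn> element. :)
declare function local:isbn-query($isbn-list as xs:string*) as cts:query
{
  cts:or-query(
    for $isbn in $isbn-list
    return cts:element-value-query(xs:QName("isbn"), $isbn)
  )
};

(: Same search as in the original post, without xdmp:value.
   The ISBN values below are placeholders. :)
cts:search(
  fn:collection("my-main-collection"),
  local:isbn-query(("9780000000001", "9780000000002")),
  "unfiltered"
)
```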

Once you have done that, the performance problem may become clearer. As Danny 
mentioned, the update transaction is probably locking a large number of 
documents. Your next step will probably be to switch from cts:search to 
cts:uris.
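
[Editor's sketch] Switching from cts:search to cts:uris might look like this. 
It assumes the URI lexicon is enabled on the database; the "isbn" element name, 
the ISBN values, and the collection name "my-new-collection" are hypothetical.

```xquery
xquery version "1.0-ml";

(: Restrict to the main collection and match on the (assumed) isbn element. :)
let $query := cts:and-query((
  cts:collection-query("my-main-collection"),
  cts:element-value-query(xs:QName("isbn"),
    ("9780000000001", "9780000000002"))
))
(: cts:uris returns matching URIs from the lexicon,
   so no documents are fetched just to learn their URIs. :)
for $uri in cts:uris((), (), $query)
return xdmp:document-add-collections($uri, "my-new-collection")
```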

Also, the error-checking code at the end is a bit misguided. The 
xdmp:document-add-collections function will never return a result: that's what 
empty-sequence() means in its function signature. If anything goes wrong with 
xdmp:document-add-collections, it will throw an error and your query will 
terminate. If you want to prevent that, use try-catch.
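
[Editor's sketch] One way the try-catch suggestion could be applied. The 
function name, the document URI, and the collection name are hypothetical. Note 
that this only catches errors raised when the call is evaluated.

```xquery
xquery version "1.0-ml";

declare function local:add-to-collection(
  $uri as xs:string,
  $collection as xs:string
) as element()
{
  try {
    (
      (: Returns the empty sequence on success, so report success explicitly. :)
      xdmp:document-add-collections($uri, $collection),
      <success>Added {$uri} to collection {$collection}</success>
    )
  } catch ($e) {
    (: $e is an error:error element; error:code holds the error code. :)
    <error>Error adding {$uri} to collection {$collection}: {
      $e/error:code/fn:string()
    }</error>
  }
};

local:add-to-collection("/books/example.xml", "my-new-collection")
```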

-- Mike




------------------------------



End of General Digest, Vol 95, Issue 54
***************************************

This e-mail and any files transmitted with it are for the sole use of the 
intended recipient(s) and may contain confidential and privileged information.
If you are not the intended recipient, please contact the sender by reply 
e-mail and destroy all copies of the original message. 
Any unauthorised review, use, disclosure, dissemination, forwarding, printing 
or copying of this email or any action taken in reliance on this e-mail is 
strictly prohibited and may be unlawful.