Hello!

Quick heads-up: the new streaming feature completely solved my issue!
Because the dataset is big, requesting a large part of it at once took a lot
of RAM. I could have worked around that with LIMIT and OFFSET, but the query
is slow, and running it multiple times was really slow.
So using streaming with 3.4 saved the day. Nice job, ArangoDB & the team!
This software is really awesome and every new feature is well made!
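
For anyone finding this thread later, here is a rough sketch of what
consuming a streaming cursor looks like, as opposed to re-running the query
with LIMIT/OFFSET. It is only an illustration and not necessarily the code
used here. It assumes the HTTP cursor API (POST /_api/cursor with
options.stream set to true, then PUT /_api/cursor/<id> for the next batch).
The names build_cursor_request, iter_documents and the send callable are
hypothetical; send stands in for whatever HTTP client you use.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport: (method, path, json_body) -> decoded JSON reply.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def build_cursor_request(query: str, batch_size: int = 1000) -> Dict[str, Any]:
    """Body for POST /_api/cursor asking for a streaming cursor."""
    return {
        "query": query,
        "batchSize": batch_size,
        # With stream=true the server hands out results as they are computed
        # instead of building the whole result set first.
        "options": {"stream": True},
    }


def iter_documents(send: Send, query: str, batch_size: int = 1000) -> Iterator[Any]:
    """Yield documents one by one, holding one batch in memory at a time."""
    reply = send("POST", "/_api/cursor", build_cursor_request(query, batch_size))
    while True:
        yield from reply["result"]
        if not reply.get("hasMore"):
            break
        # Fetch the next batch of the same cursor; the query is not re-run.
        reply = send("PUT", "/_api/cursor/" + str(reply["id"]), None)
```

The point of the design is that the query runs once and the client only ever
holds batch_size documents, whereas LIMIT/OFFSET paging pays for the slow
query again on every page.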

On Fri, Nov 16, 2018 at 11:02, Killian Janod <killian.ja...@ismart.fr>
wrote:

> Hi Simran, thanks a lot for answering my questions.
> I'm sorry I have been slow to reply.
>
> There was a copy/paste mistake in the Gist. That's why there was an
> undefined gfi_s. The Gist has been updated to better reflect the query.
> https://gist.github.com/Waateur/6c8c0b2c40d0dfa08cecfe780275cd9f
>
> The COLLECT below usually returns fewer than 5 results:
>   COLLECT code=b.code, category_id=b.category_id INTO code_group
>
> You are right about this filter: FILTER LENGTH(a) == 1 is used to remove
> b documents that have no match in a.
>
> About the data distribution, this is what I have in mind:
> A contains 31,025,557 docs and B 20,785,230 docs.
> 90% of A is linked to one or more docs in B.
> 60% of A is linked to more than one B (mostly 3 and almost never over 5).
> 20% of B is linked to more than one A (usually 2 or 3).
> These duplicates are differentiated by the delivery string, which
> holds "date of upload" information.
>
> As for example documents, they have been added to the Gist.
> I don't know whether any other information would be useful.
>
>
> Best,
> Killian
>
> On Mon, Nov 12, 2018 at 15:03, Simran Brucherseifer <sim...@arangodb.com>
> wrote:
>
>> Hi Killian,
>>
>> it would be helpful to see your exact index definitions, some example
>> documents and to know a bit about the data distribution.
>>
>> How many documents end up in one group here on average?
>>
>>  COLLECT code=b.code, category_id=b.category_id INTO code_group
>>
>>
>> What do you actually want to return here? "gfi_s" is not defined in the
>> Gist:
>>
>> FOR a_tmp IN A
>> ...
>> RETURN gfi_s
>>
>>
>> What is this for? The sub-query is limited to one result anyway. Is this
>> for the case of no match or a missing attribute (gfi_s, see above)?
>>
>> FILTER LENGTH(a) == 1
>>
>> Best, Simran
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "ArangoDB" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to arangodb+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
> --
>
> Killian Janod
> Datascientist @  iSmart / Kware
> killian.ja...@kware.fr
> killian.ja...@ismart.fr <killian.ja...@kware.fr>
> +33 (0) 6 61 33 34 76
>

