Hi Eugene,

Answering your questions:
1) Grouping is performed on both mapper and reducer (coordinator). If you
group be affinity key, you may try setting "SqlFieldsQuery.colocated=true"
to bypass grouping on reducer
2) For this specific query H2 will store (customer_id, count(*),
sum(views)) for every customer_id. It is hard to guess how much space it
would take in heap, but I think it would be ~50-100 bytes per customer_id.
So if you have N customers, it would be (100 * N) bytes
3) Please see
https://apacheignite-sql.readme.io/docs/performance-and-debugging

Vladimir.

On Thu, Aug 30, 2018 at 5:57 PM eugene miretsky <eugene.miret...@gmail.com>
wrote:

> Thanks against for the detailed response!
>
> Our main use case is preforming large SQL queries over tables with 200M+
> rows  - wanted to give you a bit more details and context you can pass along
>
> A simple example would be:
>
>    - Table: customer_id, date, category, views, clicks ( pkey =
>    "customer_id, date", affinity key = date )
>    - Query: SELECT count(*) where date < X AND categroy in (C1, C2, C3)
>    GROUP BY customer_id HAVING SUM(views) > 20
>
> My main concernse are
> 1) How is the group by performed. You mentioend that it is performend on
> the coordinator, I was coping that singe we are grouping using an colomn
> that is an affintiy key, each node will be able to do it's own group by
> 2) How much heap should I allocate for the group by stage
> 3) General performance tips
>
> Cheers,
> Eugene
>
>
> On Thu, Aug 30, 2018 at 1:32 AM Denis Magda <dma...@apache.org> wrote:
>
>> Eugene,
>>
>> Just want to be sure you know about the existence of the following pages
>> which elaborate on Ignite memory architecture in details:
>>
>>    -
>>    
>> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Durable+Memory+-+under+the+hood#IgniteDurableMemory-underthehood-Entriesandpagesindurablememory
>>    -
>>    
>> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood
>>
>>
>>
>>> 1) Are indexs loaded into heap (when used)?
>>>
>>
>> Something might be copied to disk but in most of the cases we perform
>> comparisons and other operations directly off-heap.
>> See org.apache.ignite.internal.processors.query.h2.database.InlineIndexHelper
>> and related classes.
>>
>> 2) Are full pages loaded into heap, or only the matching records?
>>>
>>
>> Matching records (result set) are presently loaded. The pages are not.
>>
>>
>>> 3) When the query needs more processing than the exisiting index
>>> (non-indexed columns, groupBy, aggreag) where/how does it happen?
>>>
>>
>> We will be doing a full scan. Grouping and aggregations are finalized on
>> the query coordinator which needs to get a full result set.
>>
>> 4) How is the query coordinator chosen? Is it the client node? How about
>>> when using the web console?
>>>
>>
>> That's your application. Web Console uses Ignite SQL APIs as well.
>>
>>
>>> 5) What paralalism settings would your recomend, we were thinking to set
>>> parallelJobsNumber  to 1  and task parallelism to number of cores * 2 -
>>> this way we can make sure that each job gets al the heap memory instead of
>>> all jobs fighting each other. Not sure if it makes sense, and it will also
>>> prevent us from making real time transactional transactional queries.(we
>>> are hoping to use ignite for both olap and simple real time queries)
>>
>>
>> I would start a separate discussion for this bringing this question to
>> the attention of our SQL experts. I'm not the one of them.
>>
>> --
>> Denis
>>
>> On Mon, Aug 27, 2018 at 8:54 PM eugene miretsky <
>> eugene.miret...@gmail.com> wrote:
>>
>>> Denis, thanks for the detailed response.
>>>
>>> A few more follow up questions
>>> 1) Are indexs loaded into heap (when used)?
>>> 2) Are full pages loaded into heap, or only the matching records?
>>> 3) When the query needs more processing than the exisiting index
>>> (non-indexed columns, groupBy, aggreag) where/how does it happen?
>>> 4) How is the query coordinator chosen? Is it the client node? How about
>>> when using the web console?
>>> 5) What paralalism settings would your recomend, we were thinking to set
>>> parallelJobsNumber  to 1  and task parallelism to number of cores * 2 -
>>> this way we can make sure that each job gets al the heap memory instead of
>>> all jobs fighting each other. Not sure if it makes sense, and it will also
>>> prevent us from making real time transactional transactional queries.(we
>>> are hoping to use ignite for both olap and simple real time queries)
>>>
>>> Cheers,
>>> Eugene
>>>
>>>
>>> On Sat, Aug 25, 2018 at 3:25 AM Denis Magda <dma...@apache.org> wrote:
>>>
>>>> Hello Eugene,
>>>>
>>>> 1) In what format is data stored off heap?
>>>>
>>>>
>>>> Data is always stored in the binary format let it be on-heap, off-heap
>>>> or Ignite persistence.
>>>> https://apacheignite.readme.io/docs/binary-marshaller
>>>>
>>>> 2) What happens when a SQL query is executed, in particular
>>>>
>>>>>
>>>>>    - How is H2 used? How is data loaded in H2? What if some of the
>>>>>    data is on disk?
>>>>>
>>>>> H2 is used to build execution plans for SELECTs. H2 calls Ignite's
>>>> B+Tree based indexing implementation to see which indexes are set. All the
>>>> data and indexes are always stored in Ignite (off-heap + disk).
>>>>
>>>>>
>>>>>    - When is data loaded into heap, and how much? Is only the output
>>>>>    of H2 loaded, or everything?
>>>>>
>>>>> Queries results are stored in Java heap temporarily. Once the result
>>>> set is read by your application, it will be garbage collected.
>>>>
>>>>>
>>>>>    - How is the reduce stage performed? Is it performed only on one
>>>>>    node (hence that node needs to load all the data into memory)
>>>>>
>>>>> Correct, the final result set is reduced on a query coordinator - your
>>>> application that executed a SELECT.
>>>>
>>>> 3) What happens when Ingite runs out of memory during execution? Is
>>>>> data evictied to disk (if persistence is enabled)?
>>>>
>>>>
>>>> I guess you mean what happens if a result set doesn't fit in RAM during
>>>> the execution, right? If so, then OOM will occur. We're working on an
>>>> improvement that will offload the result set to disk to avoid OOM for all
>>>> the scenarious:
>>>> https://issues.apache.org/jira/browse/IGNITE-7526
>>>>
>>>>
>>>>
>>>>> 4) Based on the code, it looks like I need to set my data region size
>>>>> to at most 50% of available memory (to avoid the warning), this seems a 
>>>>> bit
>>>>> wastefull.
>>>>
>>>>
>>>> There is no such a requirement. I know many deployments use cases when
>>>> one data region is given 20% of RAM, the other is given 40% and everything
>>>> else is persisted to disk.
>>>>
>>>> 5) Do you have any general advice on benchmarking the memory
>>>>> requirpement? So far I have not been able to find a way to check how much
>>>>> memory each table takes on and off heap, and how much memory each query
>>>>> takes.
>>>>
>>>>
>>>> We use Yardstick for performance benchmarking:
>>>> https://apacheignite.readme.io/docs/perfomance-benchmarking
>>>>
>>>> --
>>>> Denis
>>>>
>>>> On Fri, Aug 24, 2018 at 7:06 AM eugene miretsky <
>>>> eugene.miret...@gmail.com> wrote:
>>>>
>>>>> Thanks!
>>>>>
>>>>> I am trying to understand when and how data is moved from off-heap to
>>>>> on heap, particularly when using SQL.  I took a look at the wiki
>>>>> <https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Durable+Memory+-+under+the+hood>
>>>>>  but
>>>>> still have a few questions
>>>>>
>>>>> My understanding is that data is always store off-heap
>>>>>
>>>>> 1) In what format is data stored off heap?
>>>>> 2) What happens when a SQL query is executed, in particular
>>>>>
>>>>>    - How is H2 used? How is data loaded in H2? What if some of the
>>>>>    data is on disk?
>>>>>    - When is data loaded into heap, and how much? Is only the output
>>>>>    of H2 loaded, or everything?
>>>>>    - How is the reduce stage performed? Is it performed only on one
>>>>>    node (hence that node needs to load all the data into memory)
>>>>>
>>>>> 3) What happens when Ingite runs out of memory during execution? Is
>>>>> data evictied to disk (if persistence is enabled)?
>>>>> 4) Based on the code, it looks like I need to set my data region size
>>>>> to at most 50% of available memory (to avoid the warning), this seems a 
>>>>> bit
>>>>> wastefull.
>>>>> 5) Do you have any general advice on benchmarking the memory
>>>>> requirpement? So far I have not been able to find a way to check how much
>>>>> memory each table takes on and off heap, and how much memory each query
>>>>> takes.
>>>>>
>>>>> Cheers,
>>>>> Eugene
>>>>>
>>>>> On Fri, Aug 24, 2018 at 8:06 AM, NSAmelchev <nsamelc...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Eugene,
>>>>>>
>>>>>> Yes, it's a misprint as Dmitry wrote.
>>>>>>
>>>>>> Ignite print this warning if nodes on local machine require more than
>>>>>> 80% of
>>>>>> physical RAM.
>>>>>>
>>>>>> From code, you can see that total heap/offheap memory summing
>>>>>> from nodes having the same mac address. This way calculates total
>>>>>> memory
>>>>>> used
>>>>>> by the local machine.
>>>>>>
>>>>>> --
>>>>>> Best wishes,
>>>>>> Amelchev Nikita
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>>>>>
>>>>>
>>>>>

Reply via email to