I agree with Bhupesh: a DB does not guarantee that your data will be
retrieved in a specific or sorted order if an 'order by' clause is not
given in the query.
IMO, in the case of our poll operator, we will have to sort the records for
non-poller partitions to ensure that all records are emitted and that no
record is emitted by two different partitions.
I think we can get away without sorting for the poller partition by using
the idea that Thomas has suggested.
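
To make the two cases concrete, here is a rough sketch (Python with sqlite3,
not the operator's actual code). The table `events`, the integer key `id`, and
the helpers `static_slice` and `poll_once` are all made up for illustration,
and it assumes keys are unique and only ever increase:

```python
import sqlite3


def static_slice(conn, offset, limit):
    """Non-poller partition: a fixed slice, made deterministic by ORDER BY."""
    cur = conn.execute(
        "SELECT id, payload FROM events ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    )
    return cur.fetchall()


def poll_once(conn, last_seen):
    """Poller partition: key-range filter only, no ORDER BY, OFFSET or LIMIT."""
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ?", (last_seen,)
    ).fetchall()
    # Rows may come back in any order, so track the max key seen
    # instead of trusting the last row returned.
    new_last = max((r[0] for r in rows), default=last_seen)
    return rows, new_last


# Hypothetical in-memory table standing in for the source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (id, payload) VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 11)],
)

# Two non-poller partitions read disjoint, adjacent slices of the sorted rows.
part_a = static_slice(conn, 0, 4)
part_b = static_slice(conn, 4, 4)

# The poller picks up everything after the last key covered by the slices.
rows, last_seen = poll_once(conn, 8)

# A row that arrives later is returned by the next poll, and only that row.
conn.execute("INSERT INTO events (id, payload) VALUES (11, 'row-11')")
later_rows, last_seen = poll_once(conn, last_seen)
```

The slices rely on ORDER BY so that OFFSET and LIMIT mean the same thing to
every partition. The poller only needs the last key it has seen, so it can
skip the sort and the count selects.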

--Hitesh
On Tue, Jun 27, 2017 at 10:48 AM, Bhupesh Chawda <bhup...@datatorrent.com>
wrote:

> IMO we would need to sort since, even though the keys are monotonically
> increasing, the DB may not return the rows in key order. That depends on
> the implementation and file format of the given DB.
>
> ~ Bhupesh
>
> On Tue, Jun 27, 2017 at 9:16 AM, Thomas Weise <t...@apache.org> wrote:
>
>> Hi,
>>
>> It seems the poll operator performs unnecessary operations in the case
>> where the "key" column values in the source table are monotonically increasing.
>> There should be no need to sort or do count selects. Instead it should be
>> sufficient to just filter with the key range.
>>
>> Let's say the key column is a timestamp that is set by a trigger, one
>> could use:
>>
>> SELECT ... WHERE UPDATE_DATE > '<LAST_SEEN_DATE>'
>>
>> Instead of operating with ORDER BY, OFFSET and LIMIT.
>>
>> Thanks