Thanks Todd.

I will have a look.

Regards

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 17 September 2016 at 20:45, Todd Nist <tsind...@gmail.com> wrote:

> Hi Mich,
>
> Have you looked at Apache Ignite?  https://apacheignite-fs.readme.io/docs.
>
>
> This looks like something that may be what you're looking for:
>
> http://apacheignite.gridgain.org/docs/data-analysis-with-apache-zeppelin
>
> HTH.
>
> -Todd
>
>
> On Sat, Sep 17, 2016 at 12:53 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Hi,
>>
>> I saw similar issues when I was working on Oracle with Tableau as the
>> dashboard.
>>
>> Currently I have a batch layer that gets streaming data from
>>
>> source -> Kafka -> Flume -> HDFS
>>
>> It is stored on HDFS as text files, and a cron process syncs a Hive table
>> via an external table built on the directory. I tried both ORC and Parquet,
>> but I don't think the query itself is the issue.
>>
>> In other words, no matter how clever the execution engine is, you still
>> have to do a considerable amount of Physical IO (PIO), as opposed to
>> Logical IO (LIO), to get the data to Zeppelin, and that is on the critical
>> path.
>>
>> One option is to limit the amount of data in Zeppelin to a certain number
>> of rows or something similar. However, you cannot tell users that they
>> cannot see the full data.
>>
>> We resolved this with Oracle by using the Oracle TimesTen
>> <http://www.oracle.com/technetwork/database/database-technologies/timesten/overview/index.html>
>> IMDB to cache certain tables in memory and have them refreshed (depending
>> on the refresh frequency) from the underlying tables in Oracle when the
>> data is updated. That is done through cache fusion.
>>
>> I was looking around and came across Alluxio <http://www.alluxio.org/>.
>> Ideally I would like to use a concept like TimesTen: can one cache Hive
>> table data (or any table data) distributed across the nodes? In that case
>> we would be doing Logical IO, which is about 20 times or more lightweight
>> compared to Physical IO.
>>
>> Anyway, that is the concept.
>>
>> Thanks
>>
>> Dr Mich Talebzadeh
>>
>>
>
>
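The LIO-versus-PIO argument above can be put into rough numbers. A minimal back-of-envelope sketch in Python, where the scan size (50 GB), per-node disk throughput (200 MB/s) and node count (5) are purely illustrative assumptions; only the roughly 20x LIO speedup comes from the email:

```python
# Back-of-envelope comparison: serving a Zeppelin query via Physical IO
# (reading from HDFS disks) versus Logical IO (reading from an in-memory
# cache such as TimesTen or Alluxio). All sizing figures below are assumed
# for illustration; the 20x factor is the one quoted in the email.

scan_gb = 50            # assumed size of the data the query must touch
disk_mb_per_s = 200     # assumed sequential read throughput per node
nodes = 5               # assumed cluster size
lio_speedup = 20        # "about 20 times or more lightweight" (from the email)

pio_seconds = scan_gb * 1024 / (disk_mb_per_s * nodes)
lio_seconds = pio_seconds / lio_speedup

print(f"Physical IO scan: {pio_seconds:.0f} s")
print(f"Logical IO scan:  {lio_seconds:.1f} s")
```

Under these assumed figures the disk scan takes about 51 seconds versus under 3 seconds from memory, which is why the cache sits on (or off) the critical path to the dashboard.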
