Thanks everyone for the ideas.

Sounds like Ignite is what GridGain became, so it is similar to
Hazelcast: open source by name only. Either way, an in-memory Java cache
may or may not help here.

The other options, like faster databases, are on the table depending on
who wants what (those are normally decisions that include more than
technical criteria). For example, if the customer already has Tableau,
persuading them to go for QlikView instead may not work.

So my view is to build the batch layer foundation and leave these finer
choices to the customer. We will offer Zeppelin on Parquet and ORC tables
refreshed at a set interval and let the customer decide. I stand
corrected otherwise.
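
As an illustration, a minimal sketch of what such a periodic refresh
could look like from Spark (table names prices_staging, prices_orc and
prices_parquet are hypothetical; sqlContext is assumed to be
Hive-enabled, as in Zeppelin):

// Refresh job, scheduled externally (e.g. cron) every 15 min.
// Rewrite the ORC copy from the latest staged data.
sqlContext.sql(
  """INSERT OVERWRITE TABLE prices_orc
    |SELECT * FROM prices_staging""".stripMargin)

// Rewrite the Parquet copy the same way.
sqlContext.sql(
  """INSERT OVERWRITE TABLE prices_parquet
    |SELECT * FROM prices_staging""".stripMargin)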

BTW I did these simple tests using Zeppelin (running on Spark in
standalone mode):

1) Read data using Spark SQL from Flume text files on HDFS (real time)
2) Read data using Spark SQL from an ORC table in Hive (lagging by 15 min)
3) Read data using Spark SQL from a Parquet table in Hive (lagging by 15 min)

Timings:

1)  2 min, 16 sec
2)  1 min, 1 sec
3)  1 min, 6 sec

So unless one splits the atom, ORC and Parquet on Hive show similar
performance.
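
For reference, the tests were along these lines (a minimal sketch; sc
and sqlContext are the ones Zeppelin injects, and the path and table
names are hypothetical):

// Crude timing helper.
def timed[A](label: String)(f: => A): Unit = {
  val t0 = System.nanoTime
  f
  println(s"$label: ${(System.nanoTime - t0) / 1e9} sec")
}

// 1) Raw Flume text files on HDFS (near real time).
timed("flume text") { sc.textFile("/flume/prices/*").count() }

// 2) ORC table in Hive (refreshed every 15 min).
timed("hive orc") {
  sqlContext.sql("SELECT COUNT(*) FROM prices_orc").collect()
}

// 3) Parquet table in Hive (refreshed every 15 min).
timed("hive parquet") {
  sqlContext.sql("SELECT COUNT(*) FROM prices_parquet").collect()
}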

In all probability the customer has a data warehouse that uses Tableau,
QlikView or similar, and their BAs will carry on using these tools. If
they have data scientists, those will either use R, which has a built-in
UI, or Spark SQL with Zeppelin. One can also fire up Zeppelin on each
Spark node, or even run several instances on the same node on different
ports. Then of course one has to think about adequate response times in
a concurrent environment.
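
For example (a sketch, assuming the stock Zeppelin setup), a second
instance on the same node just needs its own port set in its
conf/zeppelin-env.sh before it is started:

# Second Zeppelin instance; the default port is 8080, so pick a free one
export ZEPPELIN_PORT=8081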

Cheers




Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 18 September 2016 at 08:52, Sean Owen <so...@cloudera.com> wrote:

> Alluxio isn't a database though; it's storage. I may be still harping
> on the wrong solution for you, but as we discussed offline, that's
> also what Impala, Drill et al are for.
>
> Sorry if this was mentioned before but Ignite is what GridGain became,
> if that helps.
>
> On Sat, Sep 17, 2016 at 11:00 PM, Mich Talebzadeh
> <mich.talebza...@gmail.com> wrote:
> > Thanks Todd
> >
> > As I thought Apache Ignite is a data fabric much like Oracle Coherence
> cache
> > or HazelCast.
> >
> > The use case is different between an in-memory-database (IMDB) and Data
> > Fabric. The build that I am dealing with has a 'database centric' view of
> > its data (i.e. it accesses its data using Spark sql and JDBC) so an
> > in-memory database will be a better fit. On the other hand If the
> > application deals solely with Java objects and does not have any notion
> of a
> > 'database', does not need SQL style queries and really just wants a
> > distributed, high performance object storage grid, then I think Ignite
> would
> > likely be the preferred choice.
> >
> > So will likely go if needed for an in-memory database like Alluxio. I
> have
> > seen a rather debatable comparison between Spark and Ignite that looks
> to be
> > like a one sided rant.
> >
> > HTH
> >
> >
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn
> > https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > Disclaimer: Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> The
> > author will in no case be liable for any monetary damages arising from
> such
> > loss, damage or destruction.
> >
> >
> >
> >
>
