Re: An Architecture question on the use of virtualised clusters

Mich Talebzadeh Mon, 05 Jun 2017 14:17:03 -0700

My main concern is that the choice of Isilon is not for one use case. It
will be a strategic decision for the client, and if we decide to go that way
we are effectively moving away from HDFS principles (3x replication, etc.) as
well.


Granted, one can argue this may be OK, but of course we have to look at our
future needs. From my experience of these tools, you cannot simply roll
back without incurring considerable work and cost.

And after all, will the cost justify this whole setup? What about
performance and other bottlenecks?
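The 3x replication point above is partly a capacity-and-cost question. As a rough back-of-the-envelope sketch, assuming the external array protects data with a parity scheme rather than full copies, raw disk scales with (data + protection) / data. The 8+2 layout below is hypothetical and chosen only for illustration; the real overhead of Isilon or any other array depends on its configured protection level, and the function names are illustrative.

```python
def raw_capacity_tb(usable_tb: float, data_units: int, protection_units: int) -> float:
    """Raw disk needed to hold `usable_tb` of data when every `data_units`
    units of data are stored alongside `protection_units` extra units
    (full copies or parity)."""
    if data_units <= 0 or protection_units < 0:
        raise ValueError("need data_units > 0 and protection_units >= 0")
    return usable_tb * (data_units + protection_units) / data_units


def replicated_raw_tb(usable_tb: float, replicas: int = 3) -> float:
    """n-way replication is 1 data unit plus (n - 1) full copies."""
    return raw_capacity_tb(usable_tb, 1, replicas - 1)


if __name__ == "__main__":
    usable = 100.0  # roughly the size of the POC mentioned later in the thread
    print(f"HDFS 3x replication: {replicated_raw_tb(usable):.0f} TB raw")
    # Hypothetical parity layout, for illustration only.
    print(f"8+2 parity layout: {raw_capacity_tb(usable, 8, 2):.0f} TB raw")
```

On these assumptions, 100 TB of usable data needs 300 TB raw under 3x replication versus 125 TB raw under an 8+2 layout. This covers disk only; it says nothing about network, licensing, or the performance bottlenecks raised in the thread.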

Thanks



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 5 June 2017 at 15:46, John Leach <jle...@splicemachine.com> wrote:

> Mich,
>
> Yes, Isilon is in production...
>
> Isilon is a serious product and has been around for quite a while. For
> on-premises external storage, we see it quite a bit. Separating compute
> from storage actually helps. It is also a nice transition path to the cloud
> providers.
>
> Have you looked at MapR? Usually, when the systems people are bought into
> Isilon, the features they are after are snapshots, volumes, and POSIX compliance.
>
> Good luck Mich.
>
> Regards,
> John Leach
>
>
>
>
> On Jun 5, 2017, at 9:27 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> Hi John,
>
> Thanks. Did you end up in production? In other words, beyond the PoC, did
> you use it in anger?
>
> The intention is to build Isilon on top of the whole HDFS cluster! If we
> go that way, we will also need to adopt it for DR.
>
> Cheers
>
>
>
> Dr Mich Talebzadeh
>
>
>
> On 5 June 2017 at 15:19, John Leach <jle...@splicemachine.com> wrote:
>
>> Mich,
>>
>> We used Isilon for a POC of Splice Machine (Spark for analytics, HBase
>> for real-time). We were concerned initially, and the initial setup took a
>> bit longer than expected, but it performed well on both low-latency and
>> high-throughput use cases at scale (our POC was ~100 TB).
>>
>> Just a data point.
>>
>> Regards,
>> John Leach
>>
>> On Jun 5, 2017, at 9:11 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>> I am concerned about using tools like Isilon or Panasas to create a
>> layer on top of HDFS, essentially an HCFS on top of HDFS with the usual
>> 3x replication handled by the tool itself.
>>
>> There is interest in pushing Isilon forward as the solution, but my
>> caution is about the scalability and future-proofing of such tools. So I
>> was wondering if anyone else has tried such a solution.
>>
>> Thanks
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> On 2 June 2017 at 19:09, Gene Pang <gene.p...@gmail.com> wrote:
>>
>>> As Vincent mentioned earlier, I think Alluxio can work for this. You can
>>> mount your (potentially remote) storage systems into Alluxio
>>> <http://www.alluxio.org/docs/master/en/Unified-and-Transparent-Namespace.html>,
>>> and deploy Alluxio co-located with the compute cluster. The computation
>>> framework will still achieve data locality since Alluxio workers are
>>> co-located, even though the existing storage systems may be remote. You can
>>> also use tiered storage
>>> <http://www.alluxio.org/docs/master/en/Tiered-Storage-on-Alluxio.html>
>>> to deploy using only memory and/or other physical media.
>>>
>>> Here are some blogs (Alluxio with Minio
>>> <https://www.alluxio.com/blog/scalable-genomics-data-processing-pipeline-with-alluxio-mesos-and-minio>,
>>> Alluxio with HDFS
>>> <https://www.alluxio.com/blog/qunar-performs-real-time-data-analytics-up-to-300x-faster-with-alluxio>,
>>> Alluxio with S3
>>> <https://www.alluxio.com/blog/accelerating-on-demand-data-analytics-with-alluxio>)
>>> which use similar architecture.
>>>
>>> Hope that helps,
>>> Gene
>>>
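Gene's point that co-located workers preserve data locality even when the underlying storage is remote can be illustrated with a toy model. In this sketch each compute host runs a cache worker that keeps recently read blocks (LRU eviction), and the scheduler sends a task to a host that already caches the block, falling back to a remote read otherwise. `CacheWorker`, `pick_worker`, and the dict standing in for the remote store are hypothetical illustrations of the idea, not Alluxio's actual API or scheduling logic.

```python
from collections import OrderedDict


class CacheWorker:
    """Toy cache process co-located with one compute host (hypothetical)."""

    def __init__(self, host: str, capacity_blocks: int) -> None:
        self.host = host
        self.capacity_blocks = capacity_blocks
        self.cache: "OrderedDict[str, bytes]" = OrderedDict()
        self.remote_reads = 0  # reads that had to cross the shared network

    def read(self, block_id: str, under_store: dict) -> bytes:
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # mark as most recently used
            return self.cache[block_id]
        # Cache miss: fetch once from the (possibly remote) shared storage.
        self.remote_reads += 1
        data = under_store[block_id]
        self.cache[block_id] = data
        if len(self.cache) > self.capacity_blocks:
            self.cache.popitem(last=False)  # evict the least recently used block
        return data


def pick_worker(block_id: str, workers: list) -> CacheWorker:
    """Locality-aware placement: run the task where the block is already cached."""
    for worker in workers:
        if block_id in worker.cache:
            return worker
    # No cached copy anywhere: fall back to the worker caching the fewest blocks.
    return min(workers, key=lambda w: len(w.cache))
```

Only the first read of a block crosses the shared network; later tasks scheduled onto the same host are served locally. The flip side, in line with Vincent's concern about "bad neighbours", is that every cache miss still competes for that shared network.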
>>> On Thu, Jun 1, 2017 at 1:45 AM, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> As a matter of interest, what is the best way of creating virtualised
>>>> clusters all pointing to the same physical data?
>>>>
>>>> thanks
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> On 1 June 2017 at 09:27, vincent gromakowski <
>>>> vincent.gromakow...@gmail.com> wrote:
>>>>
>>>>> If it is mandatory, you can use a local cache like Alluxio.
>>>>>
>>>>> On 1 June 2017 10:23 AM, "Mich Talebzadeh" <mich.talebza...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks Vincent. I assume that by losing physical data locality you mean
>>>>>> going through Isilon and HCFS rather than directly through HDFS.
>>>>>>
>>>>>> Also, I agree with you that the shared network could be an issue as well.
>>>>>> However, it allows you to reduce data redundancy (you no longer need R3,
>>>>>> i.e. 3x replication, in HDFS), and you can also build virtual clusters on
>>>>>> the same data: one cluster for reads/writes and another for reads. That is
>>>>>> what has been suggested!
>>>>>>
>>>>>> regards
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 1 June 2017 at 08:55, vincent gromakowski <
>>>>>> vincent.gromakow...@gmail.com> wrote:
>>>>>>
>>>>>>> I don't recommend this kind of design because you lose physical
>>>>>>> data locality and you will be affected by "bad neighbours" that are also
>>>>>>> using the network storage. We have one similar design, but it is
>>>>>>> restricted to small clusters (more for experiments than production).
>>>>>>>
>>>>>>> 2017-06-01 9:47 GMT+02:00 Mich Talebzadeh <mich.talebza...@gmail.com>:
>>>>>>>
>>>>>>>> Thanks Jörn,
>>>>>>>>
>>>>>>>> This was a proposal made by someone, as the firm is already using
>>>>>>>> this tool on other SAN-based storage and wants to extend it to Big Data.
>>>>>>>>
>>>>>>>> On paper it seems like a good idea; in practice it may be a
>>>>>>>> WANdisco scenario again. Of course, as ever, one needs to ask EMC for
>>>>>>>> reference calls and find out whether anyone is using this product in anger.
>>>>>>>>
>>>>>>>>
>>>>>>>> At the end of the day it's not HDFS. It is OneFS with an HCFS API.
>>>>>>>> However, that may suit our needs, but we would need to PoC it and test
>>>>>>>> it thoroughly!
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>>
>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 1 June 2017 at 08:21, Jörn Franke <jornfra...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have done this (not Isilon, but another storage system). It can
>>>>>>>>> be efficient for small clusters, depending on how you design the
>>>>>>>>> network.
>>>>>>>>>
>>>>>>>>> What I have also seen is the microservice approach with object
>>>>>>>>> stores (e.g. S3 in the cloud, Swift on premises), which is also
>>>>>>>>> somewhat similar.
>>>>>>>>>
>>>>>>>>> If you want additional performance, you could fetch the data from
>>>>>>>>> the object stores and store it temporarily in a local HDFS. I am not
>>>>>>>>> sure to what extent this affects regulatory requirements, though.
>>>>>>>>>
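Jörn's suggestion of pulling data from the object store into local storage only temporarily can be sketched as a stage-then-delete pattern. This is a minimal local-filesystem sketch under stated assumptions: a plain directory stands in for both the object store and the local HDFS, and the helper name `staged_copy` is hypothetical. Removing the copy on exit keeps the number of persistent copies at one, which is the regulatory concern raised in the thread; whether even a transient copy is acceptable is a compliance question the code cannot answer.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def staged_copy(source: Path):
    """Copy `source` into a private temporary directory for the duration of
    the `with` block, then delete the copy so it does not outlive the job."""
    staging_dir = Path(tempfile.mkdtemp(prefix="staging-"))
    try:
        local_copy = staging_dir / source.name
        shutil.copy2(source, local_copy)  # stand-in for the fetch from the object store
        yield local_copy
    finally:
        # Runs even if processing fails, so no stray copy is left behind.
        shutil.rmtree(staging_dir, ignore_errors=True)
```

A job would wrap its processing in `with staged_copy(path) as local: ...`; cleanup in `finally` is the design choice that matters, since a crashed job is exactly when stray copies tend to accumulate.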
>>>>>>>>> Best regards
>>>>>>>>>
>>>>>>>>> On 31. May 2017, at 18:07, Mich Talebzadeh <
>>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I realize this may not be directly relevant to Spark, but has
>>>>>>>>> anyone tried to create virtualized HDFS clusters using tools like
>>>>>>>>> Isilon or similar?
>>>>>>>>>
>>>>>>>>> The prime motive behind this approach is to minimize the
>>>>>>>>> propagation or copying of data, which has regulatory implications. In
>>>>>>>>> short, you want your data to be in one place regardless of the
>>>>>>>>> artefacts, such as Spark, used against it.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>
