Sorry, cannot help you there - I do not know the cost for Isilon. I also cannot predict what the majority will do ...
> On 18. Jun 2017, at 21:49, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> thanks Jörn.
>
> I have been told that Hadoop 3 (alpha testing now) will support Docker and
> virtualised Hadoop clusters.
>
> Also, if we decided to use something like Isilon and BlueData to create
> zoning (meaning two different Hadoop clusters migrated to Isilon storage,
> each residing in its own zone/compartment) and virtualised clusters, we
> would have to migrate two separate physical Hadoop clusters to Isilon and
> then create the structure.
>
> My point is, if we went that way we would have to weigh up the cost and
> effort of migrating two Hadoop clusters to Isilon, versus adding one Hadoop
> cluster to the other to make one cluster out of two, still with the
> underlying HDFS file system. And then of course: how many companies are
> going this way, and what is the overriding reason to use such an approach?
> What will happen if we have performance issues - where do we pinpoint the
> bottleneck, Isilon or the third-party Hadoop vendor? There is really no
> community to rely on either.
>
> Your thoughts?
>
> Thanks
>
> Dr Mich Talebzadeh
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> http://talebzadehmich.wordpress.com
>
>> On 15 June 2017 at 21:27, Jörn Franke <jornfra...@gmail.com> wrote:
>> On HDFS you have storage policies where you can define SSD etc.:
>> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
>>
>> Not sure if this is a similar offering to what you refer to.
>>
>> OpenStack Swift is similar to S3 but for your own data center:
>> https://docs.openstack.org/developer/swift/associated_projects.html
>>
>>> On 15. Jun 2017, at 21:55, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>> In Isilon etc. you have an SSD tier, a middle tier and an archive tier
>>> where data is moved. Can that be implemented in HDFS itself, Jörn? What
>>> is Swift? Is that a low-level archive disk?
>>>
>>> thanks
>>>
>>> Dr Mich Talebzadeh
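
For concreteness, the HDFS storage policies referenced above are set per
path with the standard CLI, assuming the DataNode data directories have been
tagged with storage types ([SSD], [DISK], [ARCHIVE]) in hdfs-site.xml. A
minimal sketch - the paths and policy choices are illustrative only:

    # List the policies the cluster supports (HOT, WARM, COLD, ONE_SSD, ALL_SSD, ...)
    hdfs storagepolicies -listPolicies

    # Pin a hot dataset to SSD-backed volumes (illustrative path)
    hdfs storagepolicies -setStoragePolicy -path /data/hot -policy ALL_SSD

    # Age a cold dataset onto archive volumes
    hdfs storagepolicies -setStoragePolicy -path /data/cold -policy COLD

    # A policy only applies to newly written blocks; run the mover to
    # migrate existing blocks onto the right storage type
    hdfs mover -p /data/cold

That gives roughly the SSD / middle / archive tiering described for Isilon,
though movement between tiers is explicit (via the mover) rather than
automatic.
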
>>>> On 15 June 2017 at 20:42, Jörn Franke <jornfra...@gmail.com> wrote:
>>>> Well, this happens also if you use Amazon EMR - most data will be stored
>>>> on S3, and there you also have no data locality. You can move it
>>>> temporarily to HDFS or in-memory (Ignite), and you can use sampling etc.
>>>> to avoid the need to process all the data. In fact, that is done in
>>>> Spark machine learning algorithms (stochastic gradient descent etc.).
>>>> This avoids having to move all the data through the network, and you
>>>> lose only a little precision (and you can statistically reason about
>>>> that).
>>>>
>>>> For a lot of data I also see the trend that companies move it to cheap
>>>> object storage (Swift etc.) anyway to reduce cost - particularly because
>>>> it is not used often.
>>>>
>>>>> On 15. Jun 2017, at 21:34, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>
>>>>> thanks Jörn.
>>>>>
>>>>> If the idea is to separate compute from data using Isilon etc., then
>>>>> one is going to lose the locality of data.
>>>>>
>>>>> Also, the argument is that we would like to run queries/reports against
>>>>> two independent clusters simultaneously, so:
>>>>>
>>>>> 1. Use Isilon OneFS for Big Data to migrate two independent Hadoop
>>>>>    clusters into Isilon OneFS.
>>>>> 2. Locate data from each cluster in its own zone in Isilon.
>>>>> 3. Run queries to combine data from each zone.
>>>>> 4. Use BlueData to create virtual Hadoop clusters on top of Isilon, so
>>>>>    one isolates the performance impact of analytics/Data Science versus
>>>>>    other users.
>>>>>
>>>>> Now that is easier said than done, as usual. First you have to migrate
>>>>> the two existing clusters' data into zones in Isilon. Then you are
>>>>> effectively separating compute from data, so data locality is lost.
>>>>> This is no different from your Spark cluster accessing data from each
>>>>> cluster. There are a lot of tangential arguments here - for example,
>>>>> Isilon uses RAID, so you don't need to replicate your data at R3. Even
>>>>> including the Isilon licensing cost, the total cost goes down!
>>>>>
>>>>> The side effect is the network, now that you have lost data locality:
>>>>> how fast is your network going to be to handle the throughput? Networks
>>>>> are shared across, say, a bank, unless you spend $$$ creating private
>>>>> InfiniBand networks. A standard 10 Gbit/s network is not going to be
>>>>> good enough.
>>>>>
>>>>> Also, in reality BlueData does not need Isilon; it runs on HP and other
>>>>> hardware as well. In Apache Hadoop 3.0 the Docker engine on YARN is
>>>>> available - alpha currently, to be released at the end of this year. As
>>>>> we have not started on Isilon, it may be worth looking at this too?
>>>>>
>>>>> Cheers
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>>> On 15 June 2017 at 17:05, Jörn Franke <jornfra...@gmail.com> wrote:
>>>>>> It does not matter to Spark - you just put the HDFS URL of the
>>>>>> namenode there. Of course the issue is that you lose data locality,
>>>>>> but that would also be the case for Oracle.
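
A minimal sketch of that cross-cluster join, assuming both namenodes are
reachable from the Spark executors (hosts, ports, paths and the join column
are illustrative):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("cross-cluster-join").getOrCreate()

    // A fully qualified HDFS URL selects the cluster; nothing else is
    // needed as long as both namenodes are reachable (illustrative hosts/paths).
    val dfA = spark.read.parquet("hdfs://namenode-a:8020/data/trades")
    val dfB = spark.read.parquet("hdfs://namenode-b:8020/data/positions")

    // The join runs in Spark as usual; at least one side is read remotely,
    // so there is no data locality and the network carries the data.
    val joined = dfA.join(dfB, Seq("instrument_id"))
    joined.show()
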
>>>>>>> On 15. Jun 2017, at 18:03, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> With Spark, how easy is it to fetch data from two different clusters
>>>>>>> and do a join in Spark?
>>>>>>>
>>>>>>> I can use two JDBC connections to join two tables from two different
>>>>>>> Oracle instances in Spark, by creating two DataFrames and joining
>>>>>>> them together.
>>>>>>>
>>>>>>> Would that be possible for data residing on two different HDFS
>>>>>>> clusters?
>>>>>>>
>>>>>>> thanks
>>>>>>>
>>>>>>> Dr Mich Talebzadeh
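
For reference, the JDBC variant described above looks roughly like this
(connection URLs, credentials and table names are illustrative, and the
Oracle JDBC driver must be on the classpath):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("two-oracle-join").getOrCreate()

    // One DataFrame per Oracle instance (all connection details illustrative)
    val df1 = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@host1:1521:DB1")
      .option("dbtable", "SCOTT.SALES")
      .option("user", "scott")
      .option("password", "...")
      .load()

    val df2 = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@host2:1521:DB2")
      .option("dbtable", "SCOTT.CHANNELS")
      .option("user", "scott")
      .option("password", "...")
      .load()

    // The join itself is executed by Spark, not by either database
    val joined = df1.join(df2, Seq("CHANNEL_ID"))
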