You should see an exception, and by default the job fails after (I think) 4 attempts. If you see an exception you may want to clean the staging table used for loading and reload again.
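As a rough sketch of that staging-table pattern (the staging table name, JDBC URL and credentials below are placeholders, not from this thread), the write into Oracle could look something like this in Scala:

    import java.util.Properties
    import org.apache.spark.sql.{DataFrame, SaveMode}

    def loadViaStaging(df: DataFrame, url: String, props: Properties): Unit = {
      // Overwrite clears the staging table first, so a failed, partially
      // loaded attempt can be cleaned up simply by re-running the job.
      df.write
        .mode(SaveMode.Overwrite)
        .jdbc(url, "STG_RESULTS", props)   // hypothetical staging table
      // A MERGE/INSERT from STG_RESULTS into the final table would then run
      // inside Oracle (e.g. via a plain JDBC statement), so consistency and
      // rollback are handled on the Oracle side.
    }

The final MERGE step executing inside Oracle is where the transactional guarantees Mich asks about below would come from; Spark itself only sees success or failure of the staging load.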
> On 4 Feb 2017, at 09:06, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Ingesting from Hive tables back into Oracle. What mechanisms are in place to
> ensure that data ends up consistently in the Oracle table, and that Spark is
> notified when Oracle has issues with the ingested data (say a rollback)?
>
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss,
> damage or destruction of data or any other property which may arise from
> relying on this email's technical content is explicitly disclaimed. The
> author will in no case be liable for any monetary damages arising from such
> loss, damage or destruction.
>
>> On 29 January 2017 at 22:22, Jörn Franke <jornfra...@gmail.com> wrote:
>> You can use HDFS, S3, Azure, glusterfs, Ceph, Ignite (in-memory), etc. A
>> Spark cluster itself does not store anything; it just processes.
>>
>>> On 29 Jan 2017, at 15:37, Alex <siri8...@gmail.com> wrote:
>>>
>>> But for persistence after intermediate processing, can I use the Spark
>>> cluster itself, or do I have to use a Hadoop cluster?!
>>>
>>> On Jan 29, 2017 7:36 PM, "Deepak Sharma" <deepakmc...@gmail.com> wrote:
>>> The better way is to read the data directly into Spark using Spark SQL's
>>> JDBC read, apply the UDFs locally, then save the DataFrame back to Oracle
>>> using the DataFrame's JDBC write.
>>>
>>> Thanks
>>> Deepak
>>>
>>>> On Jan 29, 2017 7:15 PM, "Jörn Franke" <jornfra...@gmail.com> wrote:
>>>> One alternative could be the Oracle Hadoop loader and other Oracle
>>>> products, but you have to invest some money and probably buy their Hadoop
>>>> appliance, which you have to evaluate whether it makes sense (it can get
>>>> expensive with large clusters etc.).
>>>>
>>>> Another alternative would be to get rid of Oracle altogether and use
>>>> other databases.
>>>>
>>>> However, can you elaborate a little bit on your use case and the business
>>>> logic, as well as the SLA requirements? Otherwise all recommendations are
>>>> right, because the requirements you presented are very generic.
>>>>
>>>> About getting rid of Hadoop - this depends! You will need some resource
>>>> manager (YARN, Mesos, Kubernetes etc.) and most likely also a distributed
>>>> file system. Spark supports, through the Hadoop APIs, a wide range of
>>>> file systems, but it does not need HDFS for persistence. You can have a
>>>> local filesystem (i.e. any file system mounted to a node, so also
>>>> distributed ones, such as ZFS) or cloud file systems (S3, Azure Blob etc.).
>>>>
>>>>> On 29 Jan 2017, at 11:18, Alex <siri8...@gmail.com> wrote:
>>>>>
>>>>> Hi All,
>>>>>
>>>>> Thanks for your response. Please find the flow diagram below.
>>>>>
>>>>> Please help me simplify this architecture using Spark.
>>>>>
>>>>> 1) Can I skip steps 1 to 4 and directly store the data in Spark? If I am
>>>>> storing it in Spark, where does it actually get stored? Do I need to
>>>>> retain Hadoop to store data, or can I store it directly in Spark and
>>>>> remove Hadoop as well?
>>>>>
>>>>> I want to remove Informatica for preprocessing and directly load the
>>>>> file data coming from the server into Hadoop/Spark.
>>>>>
>>>>> So my question is: can I directly load file data into Spark? Then where
>>>>> exactly will the data get stored? Do I need to have Spark installed on
>>>>> top of HDFS?
>>>>>
>>>>> 2) If I retain the architecture below, can I store the output from
>>>>> Spark directly back to Oracle (step 5 to step 7)? And will Spark's way
>>>>> of storing it back to Oracle be better than using Sqoop,
>>>>> performance-wise?
>>>>>
>>>>> 3) Can I use Spark Scala UDFs to process data from Hive and retain the
>>>>> entire architecture?
>>>>>
>>>>> Which among the above would be optimal?
>>>>>
>>>>>> On Sat, Jan 28, 2017 at 10:38 PM, Sachin Naik <sachin.u.n...@gmail.com>
>>>>>> wrote:
>>>>>> I strongly agree with Jorn and Russell. There are different solutions
>>>>>> for data movement depending upon your needs: frequency, bi-directional
>>>>>> drivers, workflow, handling duplicate records. This space is known as
>>>>>> "Change Data Capture" - CDC for short. If you need more information, I
>>>>>> would be happy to chat with you. I built some products in this space
>>>>>> that extensively used connection pooling over ODBC/JDBC.
>>>>>>
>>>>>> Happy to chat if you need more information.
>>>>>>
>>>>>> -Sachin Naik
>>>>>>
>>>>>> >> Hard to tell. Can you give more insights on what you try to achieve
>>>>>> >> and what the data is about?
>>>>>> >> For example, depending on your use case Sqoop can make sense or not.
>>>>>> Sent from my iPhone
>>>>>>
>>>>>>> On Jan 27, 2017, at 11:22 PM, Russell Spitzer
>>>>>>> <russell.spit...@gmail.com> wrote:
>>>>>>>
>>>>>>> You can treat Oracle as a JDBC source
>>>>>>> (http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases)
>>>>>>> and skip Sqoop and Hive tables and go straight to queries. Then you
>>>>>>> can skip Hive on the way back out (see the same link) and write
>>>>>>> directly to Oracle. I'll leave the performance questions for someone
>>>>>>> else.
>>>>>>>
>>>>>>>> On Fri, Jan 27, 2017 at 11:06 PM Sirisha Cheruvu <siri8...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On Sat, Jan 28, 2017 at 6:44 AM, Sirisha Cheruvu <siri8...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> Hi Team,
>>>>>>>>
>>>>>>>> Right now our existing flow is:
>>>>>>>>
>>>>>>>> Oracle --> Sqoop --> Hive --> Hive queries on Spark SQL (HiveContext)
>>>>>>>> --> destination Hive table --> Sqoop export to Oracle
>>>>>>>>
>>>>>>>> Half of the Hive UDFs required are developed as Java UDFs.
>>>>>>>>
>>>>>>>> So now I want to know: if I run native Scala UDFs rather than Hive
>>>>>>>> Java UDFs in Spark SQL, will there be any performance difference?
>>>>>>>>
>>>>>>>> Can we skip the Sqoop import and export part and instead directly
>>>>>>>> load data from Oracle into Spark, code Scala UDFs for the
>>>>>>>> transformations, and export the output data back to Oracle?
>>>>>>>>
>>>>>>>> Right now the architecture we are using is:
>>>>>>>>
>>>>>>>> Oracle --> Sqoop (import) --> Hive tables --> Hive queries -->
>>>>>>>> Spark SQL --> Hive --> Oracle
>>>>>>>>
>>>>>>>> What would be the optimal architecture to process data from Oracle
>>>>>>>> using Spark? Can I improve this process in any way?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Sirisha
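For reference, a minimal sketch of the Sqoop-free round trip Deepak and Russell describe above, assuming Spark 2.x with the Oracle JDBC driver on the classpath; the URL, credentials, table names and the UDF are placeholders, not from this thread:

    import java.util.Properties
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.udf

    object OracleRoundTrip {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("oracle-roundtrip").getOrCreate()

        val url = "jdbc:oracle:thin:@//dbhost:1521/ORCL"   // placeholder URL
        val props = new Properties()
        props.setProperty("user", "app_user")              // placeholder credentials
        props.setProperty("password", "app_password")
        props.setProperty("driver", "oracle.jdbc.OracleDriver")

        // 1) Read straight from Oracle as a DataFrame -- no Sqoop import, no Hive hop.
        val src = spark.read.jdbc(url, "SOURCE_TABLE", props)

        // 2) Apply the transformation as a native Scala UDF instead of a Hive Java UDF.
        val cleanName = udf((s: String) => if (s == null) null else s.trim.toUpperCase)
        val out = src.withColumn("NAME", cleanName(src("NAME")))

        // 3) Write the result directly back to Oracle.
        out.write.mode("append").jdbc(url, "TARGET_TABLE", props)

        spark.stop()
      }
    }

Whether a native Scala UDF beats a registered Hive Java UDF will depend on the workload; both execute in the executor JVM, so any difference is likely smaller than the cost of the extra Sqoop and Hive hops being removed here.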