I think the same, and I don't think reducing the batch size improves speed, but it will avoid losing all of the data on a rollback.

Thanks for the help.
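As a rough illustration of where that batch size sits in the standard Spark JDBC writer (a sketch only; the URL, credentials, table name and DataFrame below are placeholders): the "batchsize" option controls how many rows go into each JDBC batch, while the writer still commits once per partition, so partition size is what bounds how much is kept or lost if the job dies.

import java.util.Properties
import org.apache.spark.sql.{SaveMode, SparkSession}

// Sketch only: append a DataFrame to SQL Server over JDBC with an explicit batch size.
// "batchsize" is the standard Spark JDBC writer option; everything else here
// (URL, credentials, table, data) is a made-up placeholder.
object BatchSizeWrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("mssql-batch-write").getOrCreate()
    val df = spark.range(0, 1000000).toDF("col1")   // stand-in for the real data

    val props = new Properties()
    props.put("user", "sa")                         // placeholder credentials
    props.put("password", "secret")
    props.put("batchsize", "50000")                 // rows per JDBC executeBatch call

    df.repartition(20)                              // smaller partitions => smaller transactions
      .write
      .mode(SaveMode.Append)
      .jdbc("jdbc:sqlserver://host:1433;databaseName=tempdb", "dbo.testme", props)

    spark.stop()
  }
}

Newer Spark releases also expose an "isolationLevel" write option; setting it to NONE makes the writer skip the explicit transaction around each partition, which is closer to the "no transaction" behaviour asked about further down the thread.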
On Wed, Apr 20, 2016 at 4:03 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> yep. I think it is not possible to make SQL Server do a non-logged transaction. Another alternative is doing inserts in small batches if possible. Or write to a CSV-type file and use bulk copy to load the file into MSSQL with frequent commits, like every 50K rows?
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 20 April 2016 at 19:42, Andrés Ivaldi <iaiva...@gmail.com> wrote:
>
>> Yes, I know that behavior, but there is no explicit Begin Transaction in my code, so maybe Spark or the driver itself is adding the begin transaction, or an implicit transaction is configured. If Spark isn't adding a Begin Transaction on each insertion, then it is probably database or driver configuration...
>>
>> On Wed, Apr 20, 2016 at 3:33 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> You will see what is happening in SQL Server. First create a test table called testme:
>>>
>>> 1> use tempdb
>>> 2> go
>>> 1> create table testme(col1 int)
>>> 2> go
>>> -- Now explicitly begin a transaction, insert 1 row and select from the table
>>> 1> begin tran
>>> 2> insert into testme values(1)
>>> 3> select * from testme
>>> 4> go
>>> (1 row affected)
>>>  col1
>>>  -----------
>>>            1
>>> -- That value col1=1 is there
>>> (1 row affected)
>>> -- Now roll back that transaction, meaning in your case killing your Spark process!
>>> 1> rollback tran
>>> 2> select * from testme
>>> 3> go
>>>  col1
>>>  -----------
>>> (0 rows affected)
>>>
>>> -- You can see that the record has gone, as it was rolled back!
>>>
>>> Dr Mich Talebzadeh
>>>
>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>> On 20 April 2016 at 18:42, Andrés Ivaldi <iaiva...@gmail.com> wrote:
>>>
>>>> Sorry I couldn't answer earlier. I want to know if Spark is responsible for adding the Begin Tran. The point is to speed up insertion even at the cost of losing data; disabling transactions will speed up the insertion and we don't care about consistency... I'll disable the implicit_transactions setting and see what happens.
>>>>
>>>> thanks
>>>>
>>>> On Wed, Apr 20, 2016 at 12:09 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Assuming that you are using JDBC to put data into any ACID-compliant database (MSSQL, Sybase, Oracle etc), you are implicitly or explicitly adding a BEGIN TRAN to the INSERT statement in a distributed transaction. MSSQL does not know or care where the data is coming from. If your connection completes OK, a COMMIT TRAN will be sent and that will tell MSSQL to commit the transaction. If you kill the Spark job before MSSQL receives the COMMIT TRAN, the transaction will be rolled back.
>>>>>
>>>>> The only option, if you don't care about all of the data getting to MSSQL, is to break your insert into chunks at source and send the data to MSSQL in small batches. That way you will not lose all of the data in MSSQL because of a rollback.
>>>>>
>>>>> HTH
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>
>>>>> http://talebzadehmich.wordpress.com
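A minimal sketch of the "chunks at source with frequent commits" idea described in the message above, done with plain JDBC from Scala; the connection string, table and data are placeholders, and the SQL Server JDBC driver is assumed to be on the classpath.

import java.sql.DriverManager

// Sketch only: insert rows in batches and commit every commitEvery rows, so a
// killed process loses at most the current uncommitted chunk.
object ChunkedInsert {
  def main(args: Array[String]): Unit = {
    val url  = "jdbc:sqlserver://host:1433;databaseName=tempdb;user=sa;password=secret"
    val conn = DriverManager.getConnection(url)
    conn.setAutoCommit(false)                       // we issue the commits ourselves
    val ps = conn.prepareStatement("INSERT INTO testme(col1) VALUES (?)")
    val commitEvery = 50000                         // commit every 50K rows, as suggested above

    try {
      var sinceCommit = 0
      for (value <- 1 to 1000000) {                 // stand-in for the real rows
        ps.setInt(1, value)
        ps.addBatch()
        sinceCommit += 1
        if (sinceCommit == commitEvery) {
          ps.executeBatch()
          conn.commit()                             // rows committed so far survive a crash
          sinceCommit = 0
        }
      }
      ps.executeBatch()                             // flush and commit the final partial chunk
      conn.commit()
    } finally {
      ps.close()
      conn.close()
    }
  }
}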
>>>>> On 20 April 2016 at 07:33, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> Are you using JDBC to push data to MSSQL?
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>> On 19 April 2016 at 23:41, Andrés Ivaldi <iaiva...@gmail.com> wrote:
>>>>>>
>>>>>>> I mean a local transaction. We ran a job that writes into SQL Server, then we killed the Spark JVM just for testing purposes, and we realized that SQL Server did a rollback.
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> On Tue, Apr 19, 2016 at 5:27 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> What do you mean by *without transaction*? Do you mean forcing SQL Server to accept a non-logged operation?
>>>>>>>>
>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>
>>>>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>
>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>
>>>>>>>> On 19 April 2016 at 21:18, Andrés Ivaldi <iaiva...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hello, is it possible to execute a SQL write without a transaction? We don't need transactions to save our data, and they add overhead to SQL Server.
>>>>>>>>>
>>>>>>>>> Regards.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Ing. Ivaldi Andres
>>>>>>>
>>>>>>> --
>>>>>>> Ing. Ivaldi Andres
>>>>
>>>> --
>>>> Ing. Ivaldi Andres
>>
>> --
>> Ing. Ivaldi Andres

--
Ing. Ivaldi Andres
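For completeness, a rough sketch of the CSV plus bulk copy route suggested earlier in the thread: dump the DataFrame to CSV with Spark, then load it with the bcp utility, whose -b flag commits every N rows so a failure only rolls back the current batch. Paths, table and server names below are placeholders, and the exact part-file name will differ.

import org.apache.spark.sql.SparkSession

// Sketch only: stage the data as CSV so it can be bulk-loaded outside Spark.
object CsvForBulkCopy {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("csv-export").getOrCreate()
    val df = spark.read.parquet("/data/source")     // placeholder source

    df.coalesce(1)                                  // one file keeps the bcp step simple
      .write
      .option("header", "false")                    // bcp expects plain data rows
      .csv("/staging/testme_csv")

    spark.stop()

    // Then, outside Spark, something along these lines
    // (-c = character mode, -t, = comma delimiter, -b = rows per committed batch):
    //   bcp tempdb.dbo.testme in /staging/testme_csv/part-00000.csv \
    //       -S host -U sa -P secret -c -t, -b 50000
  }
}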