Thanks Ayan!

*Daniel Lopes*
Chief Data and Analytics Officer | OneMatch
c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes

www.onematch.com.br

On Thu, Sep 8, 2016 at 7:54 PM, ayan guha <guha.a...@gmail.com> wrote:

> Another way of debugging would be to write another UDF that returns a
> string. Also, in that function, put something useful in the catch block, so
> you can filter those records from the df.
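A minimal sketch of that idea (hedged: `parse_date_debug` and the `ERROR:` prefix are illustrative names I've chosen, and the format string is the one Daniel uses further down the thread):

```python
from datetime import datetime

def parse_date_debug(argument, format_date='%b %d %Y %H:%M'):
    """Parse a raw date string; on failure return a tagged error string.

    Returning a string instead of a timestamp lets bad rows survive the
    UDF, so they can be filtered out of the DataFrame and inspected.
    """
    try:
        return datetime.strptime(argument, format_date).isoformat()
    except ValueError as e:
        # Keep both the offending value and the reason it failed.
        return 'ERROR: %r (%s)' % (argument, e)
```

Registered with `funcspk.udf(parse_date_debug, StringType())`, the offending records can then be pulled out with something like `df.filter(df.parsed.startswith('ERROR'))` (column name hypothetical).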
> On 9 Sep 2016 03:41, "Daniel Lopes" <dan...@onematch.com.br> wrote:
>
>> Thanks Mike,
>>
>> A good way to debug! That was exactly it!
>>
>> Best,
>>
>>
>> On Thu, Sep 8, 2016 at 2:26 PM, Mike Metzger <m...@flexiblecreations.com>
>> wrote:
>>
>>> My guess is there's some row that does not match up with the expected
>>> data.  While slower, I've found RDDs to be easier to troubleshoot this kind
>>> of thing until you sort out exactly what's happening.
>>>
>>> Something like:
>>>
>>> raw_data = sc.textFile("<path to text file(s)>")
>>> rowcounts = raw_data.map(lambda x: (len(x.split(",")), 1)) \
>>>                     .reduceByKey(lambda x, y: x + y)
>>> rowcounts.take(5)
>>>
>>> badrows = raw_data.filter(lambda x: len(x.split(",")) != <expected number of columns>)
>>> if badrows.count() > 0:
>>>     badrows.saveAsTextFile("<path to malformed.csv>")
>>>
>>>
>>> You should be able to tell if there are any rows with column counts that
>>> don't match up (the thing that usually bites me with CSV conversions).
>>> Assuming these all match to what you want, I'd try mapping the unparsed
>>> date column out to separate fields and try to see if a year field isn't
>>> matching the expected values.
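That mapping step could be sketched as follows (a sketch only: the column index 2 for the raw date and the comma delimiter are assumptions based on the sample posted later in the thread):

```python
def year_token(line, date_col=2):
    """Pull the year field out of an unparsed date like 'Jul 20 2015 12:00'.

    date_col is the (assumed) index of the raw date column in the CSV line.
    """
    fields = line.split(",")
    parts = fields[date_col].split()  # e.g. ['Jul', '20', '2015', '12:00']
    return parts[2] if len(parts) >= 3 else '<malformed>'
```

Applied with `raw_data.map(lambda x: (year_token(x), 1)).reduceByKey(lambda x, y: x + y)`, any year token outside the expected range shows up directly in the counts.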
>>>
>>> Thanks
>>>
>>> Mike
>>>
>>>
>>> On Thu, Sep 8, 2016 at 8:15 AM, Daniel Lopes <dan...@onematch.com.br>
>>> wrote:
>>>
>>>> Thanks,
>>>>
>>>> I *tested* the function offline and it works.
>>>> I also tested with a select * after converting the data, and the new
>>>> data looks good,
>>>> *but* if I *register it as a temp table* to *join another table* it
>>>> still shows *the same error*:
>>>>
>>>> ValueError: year out of range
>>>>
>>>> Best,
>>>>
>>>>
>>>> On Thu, Sep 8, 2016 at 9:43 AM, Marco Mistroni <mmistr...@gmail.com>
>>>> wrote:
>>>>
>>>>> Daniel,
>>>>> Test the parse date offline to make sure it returns what you expect.
>>>>> If it does, then in the spark shell create a df with 1 row only and
>>>>> run your UDF. You should be able to see the issue.
>>>>> If not, send me a reduced CSV file at my email and I'll give it a try
>>>>> this evening... hopefully someone else will be able to assist in the
>>>>> meantime.
>>>>> You don't need to run a full Spark app to debug the issue.
>>>>> Your problem is either in the parse date or in what gets passed to
>>>>> the UDF.
>>>>> Hth
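The offline check could look like this (a sketch; the format string is the one Daniel posts further down the thread, run here under the default locale):

```python
from datetime import datetime

# Feed the UDF's parsing logic one raw value by hand, outside Spark,
# to confirm the format string actually matches the data.
sample = 'Jul 20 2015 12:00'
parsed = datetime.strptime(sample, '%b %d %Y %H:%M')
print(parsed)  # 2015-07-20 12:00:00
```

For the one-row DataFrame test, something like `sqlContext.createDataFrame([('Jul 20 2015 12:00',)], ['tr_Vencimento'])` followed by `.select(convert_date('tr_Vencimento')).show()` exercises the UDF path in isolation.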
>>>>>
>>>>> On 8 Sep 2016 1:31 pm, "Daniel Lopes" <dan...@onematch.com.br> wrote:
>>>>>
>>>>>> Thanks Marco for your response.
>>>>>>
>>>>>> The field comes encoded by SQL Server in the pt_BR locale.
>>>>>>
>>>>>> The code that I am formatting with is:
>>>>>>
>>>>>> --------------------------
>>>>>> import locale
>>>>>> from datetime import datetime
>>>>>>
>>>>>> from pyspark.sql import functions as funcspk
>>>>>> from pyspark.sql.types import TimestampType
>>>>>>
>>>>>> def parse_date(argument, format_date='%Y-%m-%d %H:%M:%S'):
>>>>>>     try:
>>>>>>         locale.setlocale(locale.LC_TIME, 'pt_BR.utf8')
>>>>>>         return datetime.strptime(argument, format_date)
>>>>>>     except:
>>>>>>         return None
>>>>>>
>>>>>> convert_date = funcspk.udf(lambda x: parse_date(x, '%b %d %Y %H:%M'),
>>>>>>                            TimestampType())
>>>>>>
>>>>>> transacoes = transacoes.withColumn('tr_Vencimento',
>>>>>>                                    convert_date(transacoes.tr_Vencimento))
>>>>>>
>>>>>> --------------------------
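One way to exercise this function offline against the month abbreviations in the sample below ('Jul', 'Abr', 'Set', ...). This is a sketch, not the original code: the locale call is guarded separately here, because if 'pt_BR.utf8' is not installed on a worker, the original bare `except` silently turns every row into None (whether that is the case in this cluster is an assumption, not something the thread confirms):

```python
import locale
from datetime import datetime

def parse_date_offline(argument, format_date='%b %d %Y %H:%M'):
    # Try to switch month-name parsing to Brazilian Portuguese; fall back
    # to the current locale if pt_BR.utf8 is not installed on this machine.
    try:
        locale.setlocale(locale.LC_TIME, 'pt_BR.utf8')
    except locale.Error:
        pass
    try:
        return datetime.strptime(argument, format_date)
    except ValueError:
        return None

# 'Jul' parses under both the C and pt_BR locales (strptime matches month
# names case-insensitively); 'Abr' parses only under pt_BR.
for value in ['Jul 20 2015 12:00', 'Abr 20 2015 12:00']:
    print(value, '->', parse_date_offline(value))
```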
>>>>>>
>>>>>> the sample is
>>>>>>
>>>>>> -------------------------
>>>>>> +-----------------+-----------------+--------+--------------------+
>>>>>> |tr_NumeroContrato|    tr_Vencimento|tr_Valor|  tr_DataNotificacao|
>>>>>> +-----------------+-----------------+--------+--------------------+
>>>>>> | 0000992600153001|Jul 20 2015 12:00|  254.35|2015-07-20 12:00:...|
>>>>>> | 0000992600153001|Abr 20 2015 12:00|  254.35|                null|
>>>>>> | 0000992600153001|Nov 20 2015 12:00|  254.35|2015-11-20 12:00:...|
>>>>>> | 0000992600153001|Dez 20 2015 12:00|  254.35|                null|
>>>>>> | 0000992600153001|Fev 20 2016 12:00|  254.35|                null|
>>>>>> | 0000992600153001|Fev 20 2015 12:00|  254.35|                null|
>>>>>> | 0000992600153001|Jun 20 2015 12:00|  254.35|2015-06-20 12:00:...|
>>>>>> | 0000992600153001|Ago 20 2015 12:00|  254.35|                null|
>>>>>> | 0000992600153001|Jan 20 2016 12:00|  254.35|2016-01-20 12:00:...|
>>>>>> | 0000992600153001|Jan 20 2015 12:00|  254.35|2015-01-20 12:00:...|
>>>>>> | 0000992600153001|Set 20 2015 12:00|  254.35|                null|
>>>>>> | 0000992600153001|Mai 20 2015 12:00|  254.35|                null|
>>>>>> | 0000992600153001|Out 20 2015 12:00|  254.35|                null|
>>>>>> | 0000992600153001|Mar 20 2015 12:00|  254.35|2015-03-20 12:00:...|
>>>>>> +-----------------+-----------------+--------+--------------------+
>>>>>>
>>>>>> (columns omitted for readability: tr_TipoDocumento, tr_DataRecebimento,
>>>>>> tr_TaxaMora, tr_DescontoMaximo, tr_DescontoMaximoCorr,
>>>>>> tr_ValorAtualizado, tr_ComGarantia, tr_ValorDesconto, tr_ValorJuros,
>>>>>> tr_ValorMulta, tr_DataDevolucaoCheque, tr_ValorCorrigidoContratante,
>>>>>> tr_Banco, tr_Praca, tr_DescricaoAlinea, tr_Enquadramento, tr_Linha,
>>>>>> tr_Arquivo, tr_DataImportacao, tr_Agencia; all null or constant in
>>>>>> this sample)
>>>>>>
>>>>>> -------------------------
>>>>>>
>>>>>>
>>>>>> On Thu, Sep 8, 2016 at 5:33 AM, Marco Mistroni <mmistr...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Please paste your code and a sample CSV.
>>>>>>> I'm guessing it has to do with time formatting?
>>>>>>> Kr
>>>>>>>
>>>>>>> On 8 Sep 2016 12:38 am, "Daniel Lopes" <dan...@onematch.com.br>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm *importing a few CSVs* with the spark-csv package.
>>>>>>>> Whenever I run a select on each one, it looks OK,
>>>>>>>> but when I join them with sqlContext.sql it gives me this error.
>>>>>>>>
>>>>>>>> All tables have timestamp fields.
>>>>>>>>
>>>>>>>> The joins are not on these date columns.
>>>>>>>>
>>>>>>>> *Py4JJavaError: An error occurred while calling o643.showString.*
>>>>>>>> : org.apache.spark.SparkException: Job aborted due to stage failure:
>>>>>>>> Task 54 in stage 92.0 failed 10 times, most recent failure: Lost task
>>>>>>>> 54.9 in stage 92.0 (TID 6356, yp-spark-dal09-env5-0036):
>>>>>>>> org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>>>>>>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
>>>>>>>>     process()
>>>>>>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
>>>>>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>>>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
>>>>>>>>     vs = list(itertools.islice(iterator, batch))
>>>>>>>>   File "/usr/local/src/spark160master/spark/python/pyspark/sql/functions.py", line 1563, in <lambda>
>>>>>>>>     func = lambda _, it: map(lambda x: returnType.toInternal(f(*x)), it)
>>>>>>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/pyspark.zip/pyspark/sql/types.py", line 191, in toInternal
>>>>>>>>     else time.mktime(dt.timetuple()))
>>>>>>>> *ValueError: year out of range*
>>>>>>>>
>>>>>>>> Does anyone know this problem?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
