Kryo not registered class

2017-11-19 Thread Angel Francisco Orta
Hello, I'm using Spark 2.1.0 with Scala and I'm registering all classes with Kryo. I have a problem registering this class: org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex$SerializableFileStatus$SerializableBlockLocation[]. I can't register it with classOf[Array[Class.forNam
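A commonly used workaround (a sketch, not taken from the thread) is to look the class up reflectively, since classOf[...] cannot name a private nested class directly; the class name below is copied from the error above, and java.lang.reflect.Array is used to obtain the array class:

    import org.apache.spark.SparkConf

    // Sketch: register a non-public nested class and its array form with Kryo
    // by resolving both reflectively at runtime.
    val inner = Class.forName(
      "org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex" +
      "$SerializableFileStatus$SerializableBlockLocation")
    val arrayOfInner = java.lang.reflect.Array.newInstance(inner, 0).getClass

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrationRequired", "true")
      .registerKryoClasses(Array[Class[_]](inner, arrayOfInner))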

[BlockMatrix] multiply is an action or a transformation ?

2017-08-13 Thread Jose Francisco Saray Villamizar
do I have to do something like m1.multiply(m2).count()? Thanks. -- José Francisco Saray Villamizar, Lyon, France
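For reference, BlockMatrix.multiply returns another BlockMatrix backed by lazy RDD transformations, so nothing runs until an action is called on it. A minimal sketch (matrix contents and sizes are illustrative, spark-shell's sc assumed):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}

    // Sketch: build two small BlockMatrices and force the multiply to execute.
    val rows = sc.parallelize((0L until 4L).map(i => IndexedRow(i, Vectors.dense(Array.fill(4)(1.0)))))
    val m1 = new IndexedRowMatrix(rows).toBlockMatrix()
    val m2 = m1.transpose

    val product = m1.multiply(m2)   // a transformation: returns a lazy BlockMatrix
    product.blocks.count()          // an action: this actually triggers the multiplication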

Spark SVD benchmark for dense matrices

2017-08-09 Thread Jose Francisco Saray Villamizar
matrix. Is this time normal? Thank you. -- José Francisco Saray Villamizar, Lyon, France
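For context, a dense SVD in MLlib is usually run through RowMatrix.computeSVD; a minimal timing sketch (matrix sizes, rank, and partition count are illustrative assumptions, not the benchmark from the thread):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    // Sketch: time computeSVD on a random dense matrix.
    val rows = sc.parallelize(Seq.fill(2000)(Vectors.dense(Array.fill(500)(math.random))), 8)
    val mat = new RowMatrix(rows)

    val start = System.nanoTime()
    val svd = mat.computeSVD(50, computeU = true)
    println(s"Top singular values: ${svd.s.toArray.take(5).mkString(", ")}")
    println(s"Elapsed: ${(System.nanoTime() - start) / 1e9} s")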

Spark querying parquet data partitioned in S3

2017-06-30 Thread Francisco Blaya
We have data stored in S3, partitioned by several columns, say following this hierarchy: s3://bucket/data/column1=X/column2=Y/parquet-files. We run a Spark job on an EMR cluster (1 master, 3 slaves) and realised the following: A) When we declare the initial dataframe to be the whole datas
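A sketch of the two usual ways to read such a layout (assuming Spark 2.x and the directory structure above; the session name spark and the explain output are assumptions, not from the thread):

    import org.apache.spark.sql.functions.col

    // (a) Read the whole dataset and filter on partition columns: the filters
    //     become PartitionFilters, so only matching directories are scanned.
    val all = spark.read.parquet("s3://bucket/data")
    val pruned = all.filter(col("column1") === "X" && col("column2") === "Y")
    pruned.explain()   // look for PartitionFilters in the physical plan

    // (b) Point directly at one branch, but keep column1/column2 as columns
    //     by telling Spark where partition discovery should start.
    val branch = spark.read
      .option("basePath", "s3://bucket/data")
      .parquet("s3://bucket/data/column1=X")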

Re: Parquet file generated by Spark, but not compatible read by Hive

2017-06-12 Thread Angel Francisco Orta
Hello, do you use df.write, or do you do it with hivecontext.sql("insert into ...")? Angel. On 12 Jun 2017 at 11:07 p.m., "Yong Zhang" wrote: > We are using Spark *1.6.2* as ETL to generate the parquet file for one > dataset, partitioned by "brand" (which is a string to represent brand > in this
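The two approaches being asked about look roughly like this (a sketch only, Spark 1.6-era API; table and path names are hypothetical):

    // 1) Plain DataFrame write: Parquet files laid out by Spark itself.
    df.write.partitionBy("brand").parquet("/path/to/dataset")

    // 2) Writing through HiveContext and the metastore instead.
    hiveContext.setConf("hive.exec.dynamic.partition", "true")
    hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
    df.registerTempTable("staging")
    // Note: for a dynamic partition insert the partition column must come last in the select.
    hiveContext.sql("INSERT OVERWRITE TABLE dataset PARTITION (brand) SELECT * FROM staging")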

Re: Joins in Spark

2017-05-02 Thread Angel Francisco Orta
ns Thanks, Asmath. On Tue, May 2, 2017 at 1:38 PM, Angel Francisco Orta <angel.francisco.o...@gmail.com> wrote: > Have you tried to partition by the join field and run it in segments, > filtering both tables to the same segments of data? > > Example: > > Val ta
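The segmented-join idea suggested above could look roughly like this (a sketch with a Spark 2.x DataFrame API; tableA, tableB, the join key, and the segment bounds are all hypothetical): filter both sides to the same key range, join each segment, and union the results.

    import org.apache.spark.sql.functions.col

    // Sketch: join two large DataFrames segment by segment on a shared key range.
    val segments = Seq((0, 1000000), (1000000, 2000000), (2000000, 3000000))

    val joinedParts = segments.map { case (lo, hi) =>
      val a = tableA.filter(col("joinKey") >= lo && col("joinKey") < hi)
      val b = tableB.filter(col("joinKey") >= lo && col("joinKey") < hi)
      a.join(b, "joinKey")
    }

    val joined = joinedParts.reduce(_ union _)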

Re: Joins in Spark

2017-05-02 Thread Angel Francisco Orta
join on these tables now. > On Tue, May 2, 2017 at 1:27 PM, Angel Francisco Orta <angel.francisco.o...@gmail.com> wrote: >> Hello, >> Are the tables partitioned? >> If yes, what is the partition field? >> Thanks

Re: Joins in Spark

2017-05-02 Thread Angel Francisco Orta
Hello, are the tables partitioned? If yes, what is the partition field? Thanks. On 2 May 2017 at 8:22 p.m., "KhajaAsmath Mohammed" wrote: Hi, I am trying to join two big tables in Spark and the job is running for quite a long time without any results. Table 1: 192 GB. Table 2: 92 GB. Does any

Re: Spark SQL query key/value in Map

2015-04-16 Thread JC Francisco
Ah yeah, I didn't notice that difference. Thanks! It worked. On Fri, Apr 17, 2015 at 4:27 AM, Yin Huai wrote: > For a Map type column, fields['driver'] is the syntax to retrieve the map > value (in the schema you can see "fields: map"). The fields.driver syntax > is used for struct types. > > On
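In short (a small sketch with a hypothetical table and schema): bracket syntax for map keys, dot syntax for struct fields.

    // Sketch: map access vs struct access in Spark SQL (Spark 1.x entry point).
    // fields: map<string,string>    -> fields['driver']
    // info:   struct<driver:string> -> info.driver
    sqlContext.sql("SELECT fields['driver'] FROM events")   // map lookup by key
    sqlContext.sql("SELECT info.driver FROM events")        // struct field access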

RE: Spark SQL odbc on Windows

2015-02-23 Thread Francisco Orchard
r in the first place is because its ROLAP mode (direct query) is still too limited. And thanks for writing the Klout paper!! We were already using it as a guideline for our tests. Best regards, Francisco -----Original Message----- From: "Denny Lee" Sent: 22/02/2015 17:56 To:

Spark SQL odbc on Windows

2015-02-22 Thread Francisco Orchard
command available for Windows. Does somebody know if there is a way to make this work? Thanks in advance!! Francisco
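One commonly suggested workaround (an assumption here, not confirmed in this thread) is to launch the Thrift server class through spark-submit directly, which is essentially what the start-thriftserver.sh script does on Linux:

    bin\spark-submit.cmd --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --master local[*] spark-internal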

Re: Size exceeds Integer.MAX_VALUE in BlockFetcherIterator

2014-09-17 Thread francisco
Looks like this is a known issue: https://issues.apache.org/jira/browse/SPARK-1353
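The "Size exceeds Integer.MAX_VALUE" error comes from a single block growing past the roughly 2 GB limit of a Java byte buffer, so the usual mitigation is to use more, smaller partitions. A sketch (RDD, key function, and partition count are hypothetical, not from the thread):

    // Sketch: keep each partition / shuffle block well under 2 GB by raising parallelism.
    val pairs = rawRdd.map(r => (keyOf(r), 1L))   // rawRdd and keyOf are placeholders
    val counts = pairs.reduceByKey(_ + _, 2000)   // 2000 output partitions instead of the default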

Size exceeds Integer.MAX_VALUE in BlockFetcherIterator

2014-09-17 Thread francisco
Hi, we are running an aggregation on a huge data set (a few billion rows). While running the task we got the following error (see below). Any ideas? Running Spark 1.1.0 on the CDH distribution. ... 14/09/17 13:33:30 INFO Executor: Finished task 0.0 in stage 1.0 (TID 0). 2083 bytes result sent to driver 14/09

Re: Memory under-utilization

2014-09-16 Thread francisco
Thanks for the tip. http://localhost:4040/executors/ is showing Executors (1), Memory: 0.0 B used (294.9 MB total), Disk: 0.0 B used. However, running as a standalone cluster does resolve the problem; I can see a worker process running with the allocated memory. My conclusion (I may be wrong) is that for 'l
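That truncated conclusion matches how local mode behaves: the executor runs inside the driver JVM, so memory for a local[*] job has to be granted via --driver-memory rather than the executor memory setting. A sketch of the adjusted command (the class and jar names are hypothetical placeholders, not the ones from the original command below):

    ./bin/spark-submit --class com.example.MyApp --master "local[48]" --driver-memory 64g myapp.jar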

Re: Memory under-utilization

2014-09-16 Thread francisco
Thanks for the reply. I doubt that's the case, though ... the executor kept having to spill to disk because memory was full. ... 14/09/16 15:00:18 WARN ExternalAppendOnlyMap: Spilling in-memory map of 67 MB to disk (668 times so far) 14/09/16 15:00:21 WARN ExternalAppendOnlyMap: Spilling in-memor

Memory under-utilization

2014-09-16 Thread francisco
Hi, I'm a Spark newbie. We installed spark-1.0.2-bin-cdh4 on a 'super machine' with 256 GB of memory and 48 cores. We tried to allocate 64 GB of memory to a task, but for whatever reason Spark is only using around 9 GB max. Submitted the Spark job with the following command: "/bin/spark-submit --class Sim