Re: Update MySQL table via Spark/SparkR?

2017-08-22 Thread Jake Russ
Hi Jake, This is an issue across all RDBMSs, including Oracle. When you are updating you have to commit or roll back in RDB
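Spark's JDBC writer only exposes insert-style save modes, so the workaround usually suggested is to issue the UPDATE statements manually from each partition. A hedged sketch (not code from this thread; the URL, table, and column names are placeholders, and resultDF stands for the computed DataFrame):

    import java.sql.DriverManager

    // Open one JDBC connection per partition and update rows by key.
    resultDF.foreachPartition { rows =>
      val conn = DriverManager.getConnection(
        "jdbc:mysql://host:3306/db", "user", "pass")
      val stmt = conn.prepareStatement(
        "UPDATE results SET score = ? WHERE id = ?")
      try {
        rows.foreach { row =>
          stmt.setDouble(1, row.getDouble(row.fieldIndex("score")))
          stmt.setLong(2, row.getLong(row.fieldIndex("id")))
          stmt.executeUpdate()
        }
      } finally {
        stmt.close()
        conn.close()
      }
    }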

Update MySQL table via Spark/SparkR?

2017-08-21 Thread Jake Russ
Hi everyone, I’m currently using SparkR to read data from a MySQL database, perform some calculations, and then write the results back to MySQL. Is it still true that Spark does not support UPDATE queries via JDBC? I’ve seen many posts on the internet saying that Spark’s DataFrameWriter does not
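For reference, the pattern in question, shown as a sketch in the Scala DataFrame API (the SparkR read.jdbc/write.jdbc calls are analogous; spark is a SparkSession, and the URL, credentials, and table names are placeholders):

    import java.util.Properties

    val props = new Properties()
    props.setProperty("user", "user")
    props.setProperty("password", "pass")
    val url = "jdbc:mysql://host:3306/db"

    val df = spark.read.jdbc(url, "input_table", props)
    val result = df.groupBy("key").count()  // stand-in for the real calculations

    // DataFrameWriter supports insert-style modes (append/overwrite) over JDBC,
    // not row-level UPDATEs.
    result.write.mode("overwrite").jdbc(url, "output_table", props)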

Apparent bug in KryoSerializer

2015-12-31 Thread Russ
The ScalaTest code that is enclosed at the end of this email message demonstrates what appears to be a bug in the KryoSerializer. This code was executed from IntelliJ IDEA (community edition) under Mac OS X 10.11.2. The KryoSerializer is enabled by updating the original SparkContext (that is
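For context, enabling Kryo typically looks like the following (a minimal sketch, not the thread's enclosed test code; app name and master are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("kryo-demo")
      .setMaster("local[*]")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Optional: fail fast when a class has not been registered with Kryo.
      .set("spark.kryo.registrationRequired", "true")
    val sc = new SparkContext(conf)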

How to register a Tuple3 with KryoSerializer?

2015-12-30 Thread Russ
I need to register with KryoSerializer a Tuple3 that is generated by a call to the sortBy() method that eventually calls collect() from Partitioner.RangePartitioner.sketch(). The IntelliJ IDEA debugger indicates that the types for the Tuple3 are java.lang.Integer, java.lang.Integer and long[]. So,
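Since Kryo registers erased classes, a single Tuple3 registration covers every element type; a minimal sketch (assuming the registerKryoClasses helper available since Spark 1.2):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array[Class[_]](
        classOf[Tuple3[_, _, _]],  // erased, so this covers all Tuple3s
        classOf[Array[Long]]))     // the long[] component reported above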

building a distributed k-d tree with spark

2015-12-22 Thread Russ
the associated source code, so if anyone has suggestions for improvement, please feel free to communicate them to me. Thanks, Russ Brown

Re: Indexing Support

2015-10-18 Thread Russ Weeks
Distributed R-Trees are not very common. Most "big data" spatial solutions collapse multi-dimensional data into a distributed one-dimensional index using a space-filling curve. Many implementations exist outside of Spark, e.g. for HBase or Accumulo. It's simple enough to write a map function that
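A minimal sketch of the space-filling-curve idea (assuming non-negative integer coordinates): interleave the bits of the two dimensions into a Z-order (Morton) key, which a one-dimensional sorted store such as HBase or Accumulo can index directly:

    // Interleave the bits of x and y; nearby points tend to get nearby keys,
    // so range scans over the key stay spatially local.
    def zOrder(x: Int, y: Int): Long = {
      var z = 0L
      for (i <- 0 until 32) {
        z |= (((x >> i) & 1L) << (2 * i)) | (((y >> i) & 1L) << (2 * i + 1))
      }
      z
    }

    // e.g. points.map { case (x, y) => (zOrder(x, y), (x, y)) }.sortByKey()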

Re: iPython Notebook + Spark + Accumulo -- best practice?

2015-03-26 Thread Russ Weeks
representation of an entire logical row; it's a useful convenience if you can be sure that your rows always fit in memory. I haven't tested it since Spark 1.0.1 but I doubt anything important has changed. Regards, -Russ On Thu, Mar 26, 2015 at 11:41 AM, David Holiday dav...@annaisystems.com wrote
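The whole-row convenience described here is presumably Accumulo's WholeRowIterator; a hedged sketch of attaching it to the input format (Accumulo 1.6-era API; the iterator priority and name are arbitrary):

    import org.apache.accumulo.core.client.IteratorSetting
    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat
    import org.apache.accumulo.core.iterators.user.WholeRowIterator
    import org.apache.hadoop.mapreduce.Job

    val job = Job.getInstance()
    AccumuloInputFormat.addIterator(job,
      new IteratorSetting(50, "wholeRow", classOf[WholeRowIterator]))
    // Each Key/Value pair now encodes one logical row; unpack it with
    // WholeRowIterator.decodeRow(key, value). The row is materialized in one
    // piece, hence the rows-must-fit-in-memory caveat above.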

Re: Reading from HBase is too slow

2014-09-29 Thread Russ Weeks
will be better for your cluster. -Russ On Mon, Sep 29, 2014 at 7:43 PM, Nan Zhu zhunanmcg...@gmail.com wrote: can you look at your HBase UI to check whether your job is just reading from a single region server? Best, -- Nan Zhu On Monday, September 29, 2014 at 10:21 PM, Tao Xiao wrote: I
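For reference, the standard scan path (a sketch, not code from this thread; sc is the SparkContext and the table name is a placeholder) creates one Spark partition per HBase region, so a table whose data sits in one region reads serially:

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat

    val hconf = HBaseConfiguration.create()
    hconf.set(TableInputFormat.INPUT_TABLE, "my_table")
    val rdd = sc.newAPIHadoopRDD(hconf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])
    println("partitions (one per region): " + rdd.partitions.size)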

Re: Does anyone have experience with using Hadoop InputFormats?

2014-09-24 Thread Russ Weeks
I use newAPIHadoopRDD with AccumuloInputFormat. It produces a PairRDD using Accumulo's Key and Value classes, both of which extend Writable. Works like a charm. I use the same InputFormat for all my MR jobs. -Russ On Wed, Sep 24, 2014 at 9:33 AM, Steve Lewis lordjoe2...@gmail.com wrote: I
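A minimal sketch of that call (assuming Hadoop 2; the AccumuloInputFormat job configuration — connector info, table name, ZooKeeper instance — is set up separately, see the configuration sketch further down):

    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat
    import org.apache.accumulo.core.data.{Key, Value}
    import org.apache.hadoop.mapreduce.Job

    val job = Job.getInstance()  // AccumuloInputFormat settings assumed applied
    val pairs = sc.newAPIHadoopRDD(job.getConfiguration,
      classOf[AccumuloInputFormat], classOf[Key], classOf[Value])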

Re: Does anyone have experience with using Hadoop InputFormats?

2014-09-24 Thread Russ Weeks
No, they do not implement Serializable. There are a couple of places where I've had to do a Text-to-String conversion but generally it hasn't been a problem. -Russ On Wed, Sep 24, 2014 at 10:27 AM, Steve Lewis lordjoe2...@gmail.com wrote: Do your custom Writable classes implement Serializable - I
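A sketch of that conversion, continuing from the pairs RDD in the previous sketch: map the non-Serializable Writables to plain types right away, before anything is shuffled or cached:

    // Key.getRow returns a Text and Value.get a byte[]; convert both up front.
    val safe = pairs.map { case (key, value) =>
      (key.getRow.toString, new String(value.get(), "UTF-8"))
    }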

Re: Accumulo and Spark

2014-09-10 Thread Russ Weeks
(hadoopJob.getConfiguration(), AccumuloInputFormat.class, Key.class, Value.class); } There's tons of docs around how to operate on a JavaPairRDD. But you're right, there's hardly anything at all regarding how to plug Accumulo into Spark. -Russ On Wed, Sep 10, 2014 at 1:17 PM, Megavolt jbru...@42six.com wrote: I've
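For completeness, a hedged sketch of the setup that snippet's newAPIHadoopRDD call relies on (Accumulo 1.6-era static configurators; instance name, ZooKeeper hosts, credentials, and table name are placeholders):

    import org.apache.accumulo.core.client.ClientConfiguration
    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat
    import org.apache.accumulo.core.client.security.tokens.PasswordToken
    import org.apache.hadoop.mapreduce.Job

    val hadoopJob = Job.getInstance()
    AccumuloInputFormat.setConnectorInfo(hadoopJob, "user", new PasswordToken("pass"))
    AccumuloInputFormat.setZooKeeperInstance(hadoopJob,
      ClientConfiguration.loadDefault()
        .withInstance("instance").withZkHosts("zk1:2181"))
    AccumuloInputFormat.setInputTableName(hadoopJob, "my_table")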

Re: Spark + AccumuloInputFormat

2014-09-10 Thread Russ Weeks
down to 30s from 18 minutes and I'm seeing much better utilization of my Accumulo tablet servers. -Russ On Tue, Sep 9, 2014 at 5:13 PM, Russ Weeks rwe...@newbrightidea.com wrote: Hi, I'm trying to execute Spark SQL queries on top of the AccumuloInputFormat. Not sure if I should be asking

Spark + AccumuloInputFormat

2014-09-09 Thread Russ Weeks
tablet servers with active scans. Since the data is spread across all the tablet servers, I hoped to see 8! I realize there are a lot of moving parts here but I'd appreciate any advice about where to start looking. Using Spark 1.0.1 with Accumulo 1.6. Thanks! -Russ
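A first diagnostic (a hedged sketch; pairs is an RDD built as in the Accumulo sketches above): each input split should correspond to one tablet, so a partition count lower than expected usually means the scan cannot fan out across all eight tablet servers:

    println("input partitions: " + pairs.partitions.size)
    // If this is low, adding table splits or setting explicit ranges with
    // AccumuloInputFormat.setRanges(job, ranges) is a common first step.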