Shortest path performance in Graphx with Spark

2017-01-10 Thread Gerard Casey
Hello everyone, I am creating a graph from `gz`-compressed `json` files of `edge` and `vertices` type. I have put the files in a Dropbox folder [here][1]. I load and map these `json` records to create the `vertices` and `edge` types required by `graphx` like this: val vertices_raw = sqlCo
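For context on the performance question: GraphX's bundled `ShortestPaths` (in `org.apache.spark.graphx.lib`) counts hops to a set of landmark vertices, so weighted shortest paths are usually written with the `Pregel` operator, as in the GraphX programming guide's single-source shortest path example. The sketch below is plain Scala with no Spark dependency. It only mimics the superstep structure such a Pregel program has (`sendMsg`, `mergeMsg`, `vprog`), and it assumes non-negative edge weights and that every edge endpoint appears in `vertices`. `PregelStyleSssp` and its `Edge` class are hypothetical names, not GraphX types.

```scala
// Plain-Scala mimic of a Pregel-style weighted single-source shortest path.
// Hypothetical names; vertex ids are Longs, as GraphX's VertexId is.
object PregelStyleSssp {
  final case class Edge(src: Long, dst: Long, weight: Double)

  def shortestPaths(vertices: Seq[Long], edges: Seq[Edge], source: Long): Map[Long, Double] = {
    // Initial vertex state: 0.0 at the source, +infinity everywhere else.
    var dist: Map[Long, Double] =
      vertices.map(v => v -> (if (v == source) 0.0 else Double.PositiveInfinity)).toMap
    var changed = true
    // Each loop iteration plays the role of one Pregel superstep.
    while (changed) {
      // sendMsg: an edge proposes dist(src) + weight only if it improves dist(dst).
      val messages: Seq[(Long, Double)] = edges.collect {
        case Edge(s, d, w) if dist(s) + w < dist(d) => d -> (dist(s) + w)
      }
      // mergeMsg: keep the smallest proposal per destination vertex.
      val merged: Map[Long, Double] =
        messages.groupBy(_._1).map { case (v, ms) =>
          v -> ms.map(_._2).foldLeft(Double.PositiveInfinity)(math.min)
        }
      // vprog: adopt the merged value (it is already an improvement).
      dist = dist ++ merged
      // Halt when no vertex received a message, like Pregel's termination rule.
      changed = merged.nonEmpty
    }
    dist
  }
}
```

Every superstep scans all edges, so the number of supersteps (roughly the hop count of the longest shortest path) multiplies the per-iteration cost. On large, sparse road-like graphs, that iteration count can dominate runtime.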

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-08 Thread Gerard Casey
Sure - I wanted to check with admin before sharing. I’ve attached it now, does this help? Many thanks again, G Container: container_e34_1479877553404_0174_01_03 on hdp-node12.xcat.cluster_45454_1481228528201 ==

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-08 Thread Gerard Casey
Right. I’m confident that is set up correctly. I can run the SparkPi test script. The main difference between it and my application is that it doesn’t access HDFS. > On 8 Dec 2016, at 18:43, Marcelo Vanzin wrote: > > On Wed, Dec 7, 2016 at 11:54 PM, Gerard Casey > wrote: >

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
park will handle it then. I may be wrong > though. > > I guess it gets even more complicated if you need to access other secured > services from Spark like HBase or Phoenix, but I guess this is for another > discussion. > > Regards, > Marcin > > > On Thu,

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
in the startup of your code but any HDFS access will require a TOKEN or KERBEROS ticket. Cheers, Wilfred > On 8 Dec 2016, at 08:35, Gerard Casey wrote: > > Thanks Marcelo. > > I’ve completely removed it. Ok - even if I read/write from HDFS? > > Trying to the SparkPi exampl

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
pal / keytab configs. > > Literally all you have to do is login with kinit then run spark-submit. > > Try with the SparkPi example for instance, instead of your own code. > If that doesn't work, you have a configuration issue somewhere. > > On Wed, Dec 7, 2016 at 1:09

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
Thanks. I’ve checked the TGT, principal and keytab. Where to next?! > On 7 Dec 2016, at 22:03, Marcelo Vanzin wrote: > > On Wed, Dec 7, 2016 at 12:15 PM, Gerard Casey > wrote: >> Can anyone point me to a tutorial or a run through of how to use Spark with >> Kerbero

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-07 Thread Gerard Casey
Marcelo Vanzin wrote: > > That's not the error, that's just telling you the application failed. > You have to look at the YARN logs for application_1479877553404_0041 > to see why it failed. > > On Mon, Dec 5, 2016 at 10:44 AM, Gerard Casey > wrote: >>

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Gerard Casey
12/05 18:24:18 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2e566133-d50a-4904-920e-ab5cec07c644 On Mon, Dec 5, 2016 at 10:30 AM, Gerard Casey wrote: > >> On 5 Dec 2016, at 19:26, Marcelo Vanzin wrote: >> >> There's generally an exception in these cases,

Re: Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Gerard Casey
cutor-memory 13G --total-executor-cores 32 target/scala-2.10/graphx_sp_2.10-1.0.jar However, the error persists. Any ideas? Thanks, Geroid > On 5 Dec 2016, at 13:35, Gerard Casey wrote: > > Hello all, > > I am using Spark with Kerberos authentication. > > I can run my co

Kerberos and YARN - functions in spark-shell and spark submit local but not cluster mode

2016-12-05 Thread Gerard Casey
Hello all, I am using Spark with Kerberos authentication. I can run my code using `spark-shell` fine and I can also use `spark-submit` in local mode (e.g. `--master local[16]`). Both function as expected. Local mode: spark-submit --class "graphx_sp" --master local[16] --driver-memory 20G

RDD to HDFS - Kerberos - authentication error - RetryInvocationHandler

2016-11-11 Thread Gerard Casey
Hi all, I have an RDD that I wish to write to HDFS. data.saveAsTextFile("hdfs://path/vertices") This returns: WARN RetryInvocationHandler: Exception while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over null. Not retrying because try once and fail. org.apache.hadoop.ipc.RemoteExce

GraphX and Public Transport Shortest Paths

2016-11-08 Thread Gerard Casey
Hi all, I’m doing a quick lit review. Consider a graph whose link weights depend on time, i.e. a bus on this road gives a journey time (link weight) of x at time y. This is a classic public transport shortest path problem. This is a weighted directed graph that is time dependent
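The time-dependent problem described above is commonly handled with a label-setting (Dijkstra-style) earliest-arrival search. That search is correct when every link is FIFO, meaning that departing later never lets you arrive earlier. The sketch below assumes that property, integer minutes for time, and a hypothetical `Link` type whose travel time is a function of the departure minute. Waiting for the next bus can be folded into that function. It is plain Scala, not GraphX code.

```scala
import scala.collection.mutable

object TimeDependentDijkstra {
  // Hypothetical link model: travel time in minutes depends on the departure minute.
  final case class Link(to: Int, travelTime: Int => Int)

  /** Earliest arrival time at every reachable node when leaving `source` at `departAt`.
    * Assumes every link is FIFO (leaving later never arrives earlier). */
  def earliestArrival(adj: Map[Int, Seq[Link]], source: Int, departAt: Int): Map[Int, Int] = {
    val best = mutable.Map(source -> departAt)
    val settled = mutable.Set.empty[Int]
    // PriorityQueue is a max-heap, so reverse the ordering to pop the earliest time first.
    val queue =
      mutable.PriorityQueue((departAt, source))(Ordering.by[(Int, Int), Int](_._1).reverse)
    while (queue.nonEmpty) {
      val (t, u) = queue.dequeue()
      if (!settled(u)) { // skip stale entries for nodes already settled
        settled += u
        for (link <- adj.getOrElse(u, Seq.empty)) {
          // Leave u at time t; the link's function gives the time-dependent weight.
          val arrive = t + link.travelTime(t)
          if (arrive < best.getOrElse(link.to, Int.MaxValue)) {
            best(link.to) = arrive
            queue.enqueue((arrive, link.to))
          }
        }
      }
    }
    best.toMap
  }
}
```

For example, a bus that leaves every 15 minutes for a 2-minute ride could be modelled as `t => (15 - t % 15) % 15 + 2`. That function is FIFO because a later departure never produces an earlier arrival. If links are not FIFO (for example, a slow link that can be beaten by waiting for a faster one), this simple label-setting search can return non-optimal arrival times.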

GraphX VerticesRDD issue - java.lang.ArrayStoreException: java.lang.Long

2016-08-18 Thread Gerard Casey
Dear all, I am building a graph from two JSON files. Spark version: 1.6.1. Creating Edge and Vertex RDDs from JSON files. The vertex JSON files look like this: {"toid": "osgb400031043205", "index": 1, "point": [508180.748, 195333.973]} {"toid": "osgb400031043206", "inde
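GraphX identifies vertices by `VertexId`, a type alias for `Long`, so string keys such as `"osgb400031043205"` cannot serve as vertex ids directly. The sketch below keys each vertex by its numeric `index` field and translates string-keyed edge endpoints into those ids. It does not claim to diagnose the `ArrayStoreException`. It assumes that `index` is unique per vertex and that the records are already parsed, for example by Spark's JSON reader. The edge JSON is not shown, so the hypothetical `EdgeRec` (endpoints given as `toid` strings plus a `length` attribute) is an assumption. The code is plain Scala; in Spark the `Seq`s would be RDDs, and the lookup map would become a join or a broadcast variable before calling `Graph(vertexRDD, edgeRDD)`.

```scala
object VertexIds {
  // Mirrors one vertex JSON record: {"toid": ..., "index": ..., "point": [x, y]}
  final case class VertexRec(toid: String, index: Long, point: (Double, Double))
  // Hypothetical edge record that refers to its endpoints by toid.
  final case class EdgeRec(from: String, to: String, length: Double)

  /** Key vertices by their Long `index` and rewrite toid-based edges as
    * (srcId, dstId, attr) triples. Edges with an unknown endpoint are dropped. */
  def toGraphTuples(
      vs: Seq[VertexRec],
      es: Seq[EdgeRec]
  ): (Seq[(Long, VertexRec)], Seq[(Long, Long, Double)]) = {
    // Lookup from the string key to the numeric id (assumes unique indices).
    val idOf: Map[String, Long] = vs.map(v => v.toid -> v.index).toMap
    val vertexPairs = vs.map(v => v.index -> v)
    val edgeTriples = es.flatMap { e =>
      for (s <- idOf.get(e.from); d <- idOf.get(e.to)) yield (s, d, e.length)
    }
    (vertexPairs, edgeTriples)
  }
}
```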