Thanks for the update. We are using version 2.0,
so we are planning to write our own custom logic to remove the null values.
Thanks,
selvam R
On Fri, Aug 26, 2016 at 9:08 PM, Russell Spitzer
wrote:
> Cassandra does not differentiate between null and empty, so when reading
> from C* all empty values are r
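For the "custom logic to remove the null values" mentioned above, a minimal sketch of one possible approach with the connector's DataFrame source, assuming the intent is to treat empty strings coming back from C* the same as nulls and drop those rows; the keyspace, table, and column names are hypothetical:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("drop-empty-values").getOrCreate()

// Read the table through the Spark Cassandra Connector data source.
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "my_ks", "table" -> "my_table"))   // hypothetical names
  .load()

// Normalise empty strings to null, then drop rows that are null in that column.
val cleaned = df
  .withColumn("some_text_col",
    when(trim(col("some_text_col")) === "", lit(null)).otherwise(col("some_text_col")))
  .na.drop(Seq("some_text_col"))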
There is a connection leak problem with the Hortonworks HBase connector if you
use HBase 1.2.0.
I tried to use Hortonworks' connector and ran into the same problem.
Have a look at the HBase issue HBASE-16017 [0]. The fix for this was
backported to 1.3.0, 1.4.0 and 2.0.0.
I have raised a ticket on their
Hi Sachin,
Have a look at the Spark Job Server project. It allows you to share RDDs and
DataFrames between Spark jobs running in the same context; the catch is that you
have to implement your Spark job as a Spark Job Server job.
https://github.com/spark-jobserver/spark-jobserver/blob/master/README
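A rough sketch of what such a job might look like, based on spark-jobserver's SparkJob / NamedRddSupport API as I understand it (the object name, RDD name, and input path are made up; check the README above for the exact trait signatures):

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{NamedRddSupport, SparkJob, SparkJobValid, SparkJobValidation}

object SharedWordCountJob extends SparkJob with NamedRddSupport {

  override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid

  override def runJob(sc: SparkContext, config: Config): Any = {
    // Reuse the RDD if a previous job in this context already published it,
    // otherwise build it and register it under a well-known name.
    val counts = namedRdds.get[(String, Int)]("shared-word-counts").getOrElse {
      val fresh = sc.textFile("hdfs:///data/input.txt")   // hypothetical path
        .flatMap(_.split("\\s+"))
        .map(word => (word, 1))
        .reduceByKey(_ + _)
      namedRdds.update("shared-word-counts", fresh)
      fresh
    }
    counts.count()
  }
}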
Hi,
I need some thoughts, inputs, or any starting point to achieve the
following scenario.
I submit a job using spark-submit with a certain set of parameters.
It reads data from a source, does some processing on RDDs, generates
some output, and completes.
Then I submit the same job again with ne
Best to create an alias and place it in your bashrc.
On 29 Aug 2016 08:30, "Russell Jurney" wrote:
> In order to use PySpark with MongoDB and ElasticSearch, I currently run
> the rather long commands of:
>
> 1) pyspark --executor-memory 10g --jars ../lib/mongo-hadoop-spark-2.0.
> 0-rc0.jar,../lib/mongo-
I'm trying to use the streamBulkGet method in my Spark application. I want to
save the result of this streamBulkGet to a JavaDStream object, but I'm
unable to get the code to compile. I get the error:
Required: org.apacheJavaDStream
Found: void
Here is the code:
JavaDStream x =
this.javaHBaseContext.
> Does a parquet file have a limit in size (1 TB)?
I didn't see any problem, but 1 TB is too big to operate on; you need to divide it
into smaller pieces.
> Should we use SaveMode.APPEND for a long-running streaming app?
Yes, but you need to partition it by time so it is easy to maintain, e.g. to update or
delete a spec
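A minimal sketch of the "partition it by time" idea above; `df` stands for whatever DataFrame the current run or micro-batch produced, and the year/month/day columns and HDFS path are assumptions:

import org.apache.spark.sql.SaveMode

df.write
  .mode(SaveMode.Append)
  .partitionBy("year", "month", "day")   // one directory per day, easy to rewrite or drop later
  .parquet("hdfs:///warehouse/events")   // hypothetical base path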
Hi Mich,
My stack is as follows:
Data sources:
* IBM MQ
* Oracle database
Kafka stores all messages from the data sources.
Spark Streaming fetches messages from Kafka, does a bit of transformation, and
writes parquet files to HDFS.
Hive / SparkSQL / Impala will query the parquet files.
Do you have any re
(Sorry, typo -- I was using spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2, not 'hadooop', of course.)
On Sun, Aug 28, 2016 at 12:51 PM, Everett Anderson wrote:
> Hi,
>
> I'm having some trouble figuring out a failure when using S3A when writing
> a DataFrame as Parquet on EMR 4.7
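For reference, a sketch of one way to set the committer property mentioned above from the Spark side (Spark 1.6 style; passing the same key with --conf to spark-submit should be equivalent). The app name is hypothetical:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("parquet-to-s3a")   // hypothetical app name
  // spark.hadoop.* settings are copied into the Hadoop Configuration,
  // so this reaches the file output committer.
  .set("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")

val sc = new SparkContext(conf)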
In order to use PySpark with MongoDB and ElasticSearch, I currently run the
rather long commands of:
1) pyspark --executor-memory 10g --jars
../lib/mongo-hadoop-spark-2.0.0-rc0.jar,../lib/mongo-java-driver-3.2.2.jar,../lib/mongo-hadoop-2.0.0-rc0.jar
--driver-class-path
../lib/mongo-hadoop-spark-2.
Hi,
I'm having some trouble figuring out a failure when using S3A to write
a DataFrame as Parquet on EMR 4.7.2 (which is Hadoop 2.7.2 and Spark
1.6.2). It works when using EMRFS (s3://), though.
I'm using these extra conf params, though I've also tried without
everything but the encryption on
Hi
I'm struggling with the following issue.
I need to build a cube with 6 dimensions for app usage, for example:
-----+-----+----+----+----+----
user | app | d3 | d4 | d5 | d6
-----+-----+----+----+----+----
 u1  | a1  | x  | y  | z  | 5
-----+-----+----+----+----+----
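A sketch of how this might be expressed with the DataFrame cube() operator, using toy data shaped like the example above; treating d6 as the measure to aggregate (rather than a sixth dimension) is my assumption:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("usage-cube").getOrCreate()
import spark.implicits._

// Toy data shaped like the example above.
val df = Seq(("u1", "a1", "x", "y", "z", 5))
  .toDF("user", "app", "d3", "d4", "d5", "d6")

// All grouping-set combinations of the five dimension columns, summing d6.
val usageCube = df
  .cube("user", "app", "d3", "d4", "d5")
  .agg(sum("d6").as("total"))

usageCube.show(false)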
Spark fits best for processing. But depending on the use case, you could expand
the scope of Spark to moving data, using the native connectors. The one thing
Spark is not is storage. Connectors are available for most storage options,
though.
Regards,
Sivakumaran S
> On 28-Aug-2016, at 6:04 P
Hi,
Can you explain your particular stack?
For example, what is the source of the streaming data and what role does Spark play?
Are you dealing with real time and batch, and why Parquet and not something
like HBase to ingest data in real time?
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linked
Hi,
There are design patterns that use Spark extensively. I am new to this area, so
I would appreciate it if someone could explain where Spark fits in, especially within
fast or streaming use cases.
What are the best practices involving Spark? Is it always best to deploy it as the
processing engine,
For ex
No, it is just being truncated for display as the ... implies. Pass
truncate=false to the show command.
On Sun, Aug 28, 2016, 15:24 Kevin Tran wrote:
> Hi,
> I wrote to parquet file as following:
>
> ++
> |word|
> ++
> |THIS IS MY CHARACTER
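In code, the suggestion above is just the following; `df` stands in for the DataFrame read back from the parquet file:

// Print full cell contents instead of cutting them off at 20 characters.
df.show(20, false)   // numRows = 20, truncate = false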
Hi,
Does anyone know the best practices for storing data in parquet files?
Does a parquet file have a limit in size (1 TB)?
Should we use SaveMode.APPEND for a long-running streaming app?
How should we store the files in HDFS (directory structure, ...)?
Thanks,
Kevin.
Hi,
I wrote to a parquet file as follows:
+--------------------------+
|                      word|
+--------------------------+
| THIS IS MY CHARACTERS ...|
|// ANOTHER LINE OF CHAC...|
+--------------------------+
These lines are not the full text; they are being trimmed down.
Does anyone know how many characters StringType
I am trying to do high-performance calculations which require custom
functions.
As a first stage I am trying to profile the effect of using UDFs, and I am
getting weird results.
I created a simple test (in
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f
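Not the poster's notebook, but a small self-contained sketch of the kind of comparison being described: the same trivial computation run once as a native Column expression and once as a Scala UDF, with a crude wall-clock timer. The column name, the +1 function, and the row count are all made up:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("udf-profile").getOrCreate()

val df = spark.range(0L, 10000000L).toDF("value").cache()
df.count()   // materialise the cache before timing

def timed[T](label: String)(body: => T): T = {
  val start = System.nanoTime()
  val result = body
  println(s"$label: ${(System.nanoTime() - start) / 1e6} ms")
  result
}

val plusOneUdf = udf((x: Long) => x + 1)

// Aggregate over the expression so it is actually computed, not pruned away.
timed("native expression") { df.agg(sum(col("value") + 1)).collect() }
timed("scala UDF")         { df.agg(sum(plusOneUdf(col("value")))).collect() }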
Hi,
We use this kind of logic for training our model:
val model = new LogisticRegressionWithLBFGS()
.setNumClasses(3)
.run(train)
Next, during Spark Streaming, we load the model and apply the incoming data to it
to get a specific class, for example:
model.predict(Vectors.dense(1
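A sketch of how that streaming side might fit together, assuming the model was saved with model.save after training; the model path, batch interval, socket source, and comma-separated feature parsing are all assumptions for illustration:

import org.apache.spark.SparkConf
import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sparkConf = new SparkConf().setAppName("streaming-predict")   // hypothetical app name
val ssc = new StreamingContext(sparkConf, Seconds(10))

// Load the model that the batch job trained and saved earlier.
val model = LogisticRegressionModel.load(ssc.sparkContext, "hdfs:///models/lr-lbfgs")   // hypothetical path

// Hypothetical source: lines of comma-separated feature values.
val lines = ssc.socketTextStream("localhost", 9999)

// Score each incoming vector with the pre-trained model.
val predictedClasses = lines.map { line =>
  val features = Vectors.dense(line.split(",").map(_.toDouble))
  model.predict(features)
}
predictedClasses.print()

ssc.start()
ssc.awaitTermination()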
Yes, I realised that. Actually I thought it was the s, not the $; the $ has been around
in shell for years, say for actual values --> ${LOG_FILE}, for position 's/'
etc.
cat ${LOG_FILE} | egrep -v 'rows affected|return status|&&&' | sed -e
's/^[]*//g' -e 's/^//g' -e '/^$/d' > temp.out
Dr Mi
Hi Mich,
This is Scala's string interpolation, which allows for replacing $-prefixed
expressions with their values.
It's what the cool kids use in Scala to do templating and concatenation 😁
Jacek
On 23 Aug 2016 9:21 a.m., "Mich Talebzadeh"
wrote:
> What is --> s below before the text of sql?
>
>
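To make the mechanism concrete, a tiny example of the s interpolator; the table name and query text are invented for illustration:

val tableName = "sales"   // hypothetical
val daysBack  = 7

// $tableName and ${daysBack} are evaluated and spliced into the string at runtime.
val sqlText = s"SELECT COUNT(*) FROM $tableName WHERE created > date_sub(current_date(), ${daysBack})"
// => "SELECT COUNT(*) FROM sales WHERE created > date_sub(current_date(), 7)"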