Thanks Krishna for your response.
The features in the training set have more categories than those in the test set, so when VectorAssembler is used these vector sizes usually differ; I believe that is expected, right?
The belief is that a test dataset will usually not have as many categories in its features as the training set does.
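A hedged sketch of the usual fit-once pattern that keeps train and test vectors the same size; "color", trainDf and testDf are placeholder names, not from the original thread:
```
import org.apache.spark.ml.feature.StringIndexer

// Fit the indexer on the training data only, then reuse the same fitted
// model on the test data so both share a single category mapping.
val indexerModel = new StringIndexer()
  .setInputCol("color")
  .setOutputCol("color_idx")
  .setHandleInvalid("skip")   // drop rows with categories unseen during fit
  .fit(trainDf)

val trainIndexed = indexerModel.transform(trainDf)
val testIndexed  = indexerModel.transform(testDf)
```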
I can't say this is the best way to do it, but my instant thought is below:
Create two DataFrames.
import com.databricks.spark.xml.XmlInputFormat
import org.apache.hadoop.io.{LongWritable, Text}

// "<book>"/"</book>" and the path are stand-ins; the originals were lost in the archive.
sc.hadoopConfiguration.set(XmlInputFormat.START_TAG_KEY, "<book>")
sc.hadoopConfiguration.set(XmlInputFormat.END_TAG_KEY, "</book>")
sc.hadoopConfiguration.set(XmlInputFormat.ENCODING_KEY, "UTF-8")
val strXmlDf = sc.newAPIHadoopFile("/path/to/input.xml",
  classOf[XmlInputFormat], classOf[LongWritable], classOf[Text]).map(_._2.toString)
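To sketch where that two-DataFrame idea could go, under assumptions of mine (the row tag, the `<id>` element, and the join key are illustrative, not the poster's code): parse the fields into one DataFrame with spark-xml, keep the raw XML strings in a second one, and join them.
```
import com.databricks.spark.xml.XmlReader
import org.apache.spark.sql.functions.{col, regexp_extract}
import sqlContext.implicits._

// DataFrame 1: the parsed fields, via spark-xml's schema inference.
val parsedDf = new XmlReader().withRowTag("book").xmlRdd(sqlContext, strXmlDf)

// DataFrame 2: the raw XML string, with the id pulled back out to join on.
val rawDf = strXmlDf.toDF("raw_xml")
  .withColumn("id", regexp_extract(col("raw_xml"), "<id>(.*?)</id>", 1))

// Every inferred column plus the full original XML in one row.
// (Cast parsedDf("id") to string first if inference typed it numerically.)
val combined = parsedDf.join(rawDf, "id")
```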
I found there are several .conf files in the conf directory; which one is used as the default when I click the "new" button on the notebook homepage? I want to edit the default profile configuration so that all my notebooks are created with custom settings.
--
Thanks,
David S.
Hi,
Just after I sent the mail, I realized that the error might be with the training dataset, not the test dataset.
1. It might be that you are feeding the full Y vector for training.
2. That could mean you are using a ~50-50 training/test split.
3. Take a good look at the code that does the split.
Hi,
Looks like the test dataset has different sizes for X and Y. Possible steps:
1. What is the test-data size?
   - If it is 15,909, check the prediction vector: it is now 29,471 but should be 15,909.
   - If you expect it to be 29,471, then the X matrix is not right.
Hi,
I have built a logistic regression model using the training dataset.
When I am predicting on a test dataset, it throws the below size-mismatch error.
Steps done:
1. String indexers on categorical features.
2. One-hot encoding on these indexed features.
Any help in resolving this is appreciated.
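For reference, a minimal sketch of those steps as a single ML Pipeline; trainDf/testDf and the column names are placeholders. The key point is that the whole pipeline is fit on the training set only, and the same fitted model then transforms the test set, so the assembled vectors line up:
```
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}

val catCols = Array("cat1", "cat2")   // placeholder categorical column names

val indexers = catCols.map(c =>
  new StringIndexer().setInputCol(c).setOutputCol(s"${c}_idx"))
val encoders = catCols.map(c =>
  new OneHotEncoder().setInputCol(s"${c}_idx").setOutputCol(s"${c}_vec"))
val assembler = new VectorAssembler()
  .setInputCols(catCols.map(c => s"${c}_vec"))
  .setOutputCol("features")
val lr = new LogisticRegression()     // defaults to "features"/"label" columns

val stages: Array[PipelineStage] =
  (indexers ++ encoders :+ assembler :+ lr).toArray

// Fit every stage on the training set only; transforming the test set with
// the same fitted PipelineModel keeps the assembled vector sizes identical.
val model = new Pipeline().setStages(stages).fit(trainDf)
val predictions = model.transform(testDf)
```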
Any update on this?
On Tuesday, 16 August 2016, Aneela Saleem wrote:
> Thanks Steve,
>
> I went through this but still not able to fix the issue
>
> On Mon, Aug 15, 2016 at 2:01 AM, Steve Loughran wrote:
>
>> Hi,
>>
>> Just came across this while going through all emails I'd left unread over
Any update on this?
On Tuesday, 16 August 2016, Aneela Saleem wrote:
> Thanks Steve,
>
> I have gone through its documentation, but I did not get any idea how to
> install it. Can you help me?
>
> On Mon, Aug 15, 2016 at 4:23 PM, Steve Loughran wrote:
>
>>
>> On 15 Aug 2016, at 08:29, Aneela Saleem wrote:
Hi Hyukjin,
I have created the below issue.
https://github.com/databricks/spark-xml/issues/155
Sent from Mail for Windows 10
From: Hyukjin Kwon
Hello Experts,
I’m using the spark-xml package, which automatically infers my schema and
creates a DataFrame.
I’m extracting a few fields like id and name (which are unique) from the below XML, but
my requirement is to store the entire XML in one of the columns as well. I’m writing
this data to an Avro-backed Hive table.
Hi Kamesh,
The message you are getting after 7 days:
PriviledgedActionException as:sys_bio_replicator (auth:KERBEROS)
cause: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Token has expired
Sounds like an IPC issue with Kerberos authentication.
Hi,
That's my understanding -- you need to fit another model given the
training data.
Pozdrawiam,
Jacek Laskowski
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
On Fri, Aug 19, 2016 at 10:2
On Sun, Aug 21, 2016 at 3:08 AM, Bedrytski Aliaksandr wrote:
> Hi,
>
> we share the same spark/hive context between tests (executed in
> parallel), so the main problem is that the temporary tables are
> overwritten each time they are created, this may create race conditions
> as these tempTables may be seen as global mutable shared state.
Hi,
See
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala#L65
to learn how Spark SQL parses SQL texts. It could give you a way out.
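As an illustrative sketch (assuming the CatalystSqlParser object that lives alongside ParseDriver in that package), you can hand SQL text to Catalyst's parser directly and inspect the resulting plan:
```
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// Parse SQL text into an unresolved LogicalPlan and inspect its tree,
// e.g. to see which relations a query touches before executing it.
val plan = CatalystSqlParser.parsePlan(
  "SELECT t1.id FROM t1 JOIN t2 ON t1.id = t2.id")
println(plan.treeString)
```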
Pozdrawiam,
Jacek Laskowski
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Hi sudhanshu,
Try emailing user-unsubscribe@spark.apache.org
- Rahul P
Sent from my iPhone
> On Aug 21, 2016, at 9:19 AM, Sudhanshu Janghel wrote:
>
> Hello,
>
> I wish to unsubscribe from the channel.
>
> KIND REGARDS,
> SUDHANSHU
Hello,
I wish to unsubscribe from the channel.
KIND REGARDS,
SUDHANSHU
Hi Kamesh,
I believe your only option is to restart your application every 7
days (perhaps you need to enable checkpointing). See
https://github.com/apache/spark/commit/ab648c0004cfb20d53554ab333dd2d198cb94ffa
for a change with automatic security token renewal.
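For long-running apps on a Kerberized YARN cluster, one hedged sketch is the keytab route, which lets Spark re-login and obtain fresh tokens itself; the principal, paths, and class are placeholders:
```
spark-submit \
  --master yarn \
  --principal myuser@EXAMPLE.COM \
  --keytab /path/to/myuser.keytab \
  --class com.example.MyApp myapp.jar
```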
Pozdrawiam,
Jacek Laskowski
Another option would be to look at spark-xml-utils. We use this extensively in
the manipulation of our XML content.
https://github.com/elsevierlabs-os/spark-xml-utils
There are quite a few examples. Depending on your preference (and what you
want to do), you could use xpath, xquery, or xslt
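For example, a rough XPath sketch; the getInstance/evaluateString names follow my reading of the project's README, so verify them against the version you pull in:
```
import com.elsevier.spark_xml_utils.xpath.XPathProcessor

// Compile an XPath once, then evaluate it against a raw XML string.
val proc = XPathProcessor.getInstance("/book/title/text()")
val title = proc.evaluateString("<book><title>Spark</title></book>")
```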
You are attempting to read a tar file. That won't work. A compressed JSON
file would.
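In other words, something like this sketch (paths are placeholders, spark is a Spark 2 session) works, while pointing the same reader at a tar archive does not:
```
// Works: Spark decompresses .gz transparently, one JSON record per line.
val ok = spark.read.json("/path/to/a.json.gz")

// Yields corrupt records: a .tar(.gz) archive, because the tar wrapper
// is not a compression codec Spark recognizes.
val bad = spark.read.json("/path/to/ab.tar.gz")
```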
On Sun, Aug 21, 2016, 12:52 Chua Jie Sheng wrote:
> Hi Spark user list!
>
> I have been encountering corrupted records when reading Gzipped files that
> contain more than one file.
>
> Example:
> I have two .json files, [a.json, b.json]
Hi Spark user list!
I have been encountering corrupted records when reading Gzipped files that
contain more than one file.
Example:
I have two .json files, [a.json, b.json].
Each has multiple records (one line, one record).
I tarred both of them together on Mac OS X 10.11.6 (bsdtar 2.8.3 - libarchive).
Hi Diwakar,
Spark XML library can take RDD as source.
```
val df = new XmlReader()
  .withRowTag("book")
  .xmlRdd(sqlContext, rdd)
```
If performance is critical, I would also recommend taking care with the creation and destruction of the parser.
If the parser is not serializable, then you can do this once per partition, as in the sketch below.
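A minimal sketch of that per-partition pattern, with a hypothetical non-serializable parser class and an RDD[String] called xmlStrings standing in:
```
// Create the non-serializable parser once per partition, on the executor,
// rather than per record or on the driver (so it never needs serializing).
val parsed = xmlStrings.mapPartitions { iter =>
  val parser = new SomeXmlParser()   // hypothetical non-serializable parser
  iter.map(xml => parser.parse(xml))
}
```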
Hi,
We share the same Spark/Hive context between tests (executed in parallel), so the main problem is that the temporary tables are overwritten each time they are created. This may create race conditions, as these temp tables may be seen as global mutable shared state. So each time we create a temp table under a fixed name, parallel tests can clobber each other's data.
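One hedged workaround sketch (df and spark are placeholders, and this is not necessarily what was settled on): give every test its own table name so parallel tests never collide.
```
import java.util.UUID

// Register each test's DataFrame under a unique, throwaway view name.
val tableName = s"t_${UUID.randomUUID.toString.replace("-", "")}"
df.createOrReplaceTempView(tableName)
val result = spark.sql(s"SELECT count(*) FROM $tableName")
```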
I'm having trouble loading data from an S3 repo.
Currently DCOS is running Spark 2, so I'm not sure whether the code needs modification with the upgrade.
My code at the moment looks like this:
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xxx")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "xxx")