Thanks guys.
This seemed to be working after declaring all columns as Strings to start
and using the filters below to avoid rogue characters. The second filter
ensures that there were trade volumes on that date.

val rs = df2.filter($"Open" !== "-").filter($"Volume".cast("Integer") > 0).filter(change
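A fuller sketch of that approach (an assumption-laden sketch, not the poster's exact code: it assumes Spark 2.x, where !== is spelled =!=, and a string-typed DataFrame df2):

```scala
import org.apache.spark.sql.functions.col

// Sketch: all columns were read as String, so filter out rogue rows
// before casting the numeric columns.
val rs = df2
  .filter(col("Open") =!= "-")             // drop days with no prices
  .filter(col("Volume").cast("int") > 0)   // keep only days with trades
  .withColumn("Open", col("Open").cast("float"))
  .withColumn("Volume", col("Volume").cast("int"))
```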
Hi Mich,
if I understood you well, you may cast the value to float, it will yield
null if the value is not a correct float:
val df = Seq(("-", 5), ("1", 6), (",", 7), ("8.6", 7)).toDF("value", "id")
df.createOrReplaceTempView("lines")
spark.sql("SELECT cast(value as FLOAT) from lines").show()
+---
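The same null-on-bad-input behaviour holds without SQL; a sketch using the DataFrame API (assumes an active spark session and spark.implicits._ in scope):

```scala
import org.apache.spark.sql.functions.col
import spark.implicits._

val lines = Seq(("-", 5), ("1", 6), (",", 7), ("8.6", 7)).toDF("value", "id")
// cast returns null for any value that is not a valid float
lines.select(col("value").cast("float").as("value_f"), col("id")).show()
```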
Thanks all.
This is the csv schema, all columns mapped to String:
scala> df2.printSchema
root
|-- Stock: string (nullable = true)
|-- Ticker: string (nullable = true)
|-- TradeDate: string (nullable = true)
|-- Open: string (nullable = true)
|-- High: string (nullable = true)
|-- Low: string
Hi Mich -
Can you run a filter command on df1 prior to your map for any rows where
p(3).toString != "-", then run your map command?
Thanks
Mike
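Mike's suggestion might be sketched like this (hypothetical: it assumes df1 is an RDD of split csv lines, with the Open price at index 3, and that columns is the case class defined elsewhere in the thread):

```scala
// Hypothetical sketch: skip rogue rows before mapping to the case class.
val good = df1.filter(p => p(3).toString != "-")
val rs = good.map(p => columns(p(0).toString, p(1).toString, p(2).toString,
  p(3).toString.toFloat, p(4).toString.toFloat, p(5).toString.toFloat,
  p(6).toString.toFloat, p(7).toString.toInt))
```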
On Tue, Sep 27, 2016 at 5:06 PM, Mich Talebzadeh
wrote:
> Thanks guys
>
> Actually these are the 7 rogue rows. The column 0 is the Volume column
>
Hi Mich,
I guess you could use the nullValue option by setting it to null.
If you are reading them into strings at first, then you would hit
https://github.com/apache/spark/pull/14118 first, which is resolved
from 2.0.1.
Unfortunately, this bug also exists in the external csv library for stri
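Setting nullValue at read time might look like this (a sketch; the file path is an assumption, and "-" is the placeholder seen in the rogue rows):

```scala
// Sketch: the built-in csv source (Spark 2.x) turns "-" cells into null.
val df = spark.read
  .option("header", "true")
  .option("nullValue", "-")   // treat "-" as null rather than a string
  .csv("stock.csv")           // hypothetical path
```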
Thanks guys
Actually these are the 7 rogue rows. Column 0 is the Volume column,
which means there were no trades on those days.

cat stock.csv | grep ",0"
SAP SE,SAP, 23-Dec-11,-,-,-,40.56,0
SAP SE,SAP, 21-Apr-11,-,-,-,45.85,0
SAP SE,SAP, 30-Dec-10,-,-,-,38.10,0
SAP SE,SAP, 23-Dec-10,-,-,-,38.36,
You can read as strings, write a map to fix the rows, and then convert
back to your desired DataFrame.
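That read-as-string-then-fix approach might be sketched as follows (column names and the "-" placeholder are assumptions taken from elsewhere in the thread):

```scala
import org.apache.spark.sql.functions._

// Sketch: read everything as String, null out the "-" placeholders,
// then cast the repaired columns to their real types.
val raw = spark.read.option("header", "true").csv("stock.csv")
val fixed = raw
  .withColumn("Open",
    when(col("Open") === "-", lit(null)).otherwise(col("Open")).cast("float"))
  .withColumn("Volume", col("Volume").cast("int"))
```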
On 28 Sep 2016 06:49, "Mich Talebzadeh" wrote:
>
> I have historical prices for various stocks.
>
> Each csv file has 10 years of trades, one row per day.
>
> These are the columns defined in the clas
We use spark-csv (a successor of which is built in to Spark 2.0) for
this. It doesn't cause crashes; failed parsing is logged. We run on
Mesos, so I have to pull back all the logs from all the executors and
search for failed lines (so that we can ensure that the failure rate
isn't too hig
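The parse-mode behaviour described above can be configured explicitly; a sketch with the built-in Spark 2.0 csv source (the schema and path are assumptions based on this thread):

```scala
import org.apache.spark.sql.types._

val stockSchema = StructType(Seq(
  StructField("Stock", StringType), StructField("Ticker", StringType),
  StructField("TradeDate", StringType), StructField("Open", FloatType),
  StructField("High", FloatType), StructField("Low", FloatType),
  StructField("Close", FloatType), StructField("Volume", IntegerType)))

val df = spark.read
  .option("header", "true")
  .option("mode", "DROPMALFORMED")  // drop lines that do not fit the schema
  .schema(stockSchema)
  .csv("stock.csv")
```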
I have historical prices for various stocks.
Each csv file has 10 years of trades, one row per day.
These are the columns defined in the class
case class columns(Stock: String, Ticker: String, TradeDate: String, Open:
Float, High: Float, Low: Float, Close: Float, Volume: Integer)
The issue is w
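Parsing one raw csv line into that case class might look like this (a hypothetical helper, not the poster's code; it shows exactly where a rogue "-" field would throw a NumberFormatException):

```scala
case class columns(Stock: String, Ticker: String, TradeDate: String,
  Open: Float, High: Float, Low: Float, Close: Float, Volume: Integer)

// Hypothetical helper: "-" in a numeric field makes toFloat throw,
// which is the failure discussed in this thread.
def parseLine(line: String): columns = {
  val p = line.split(",").map(_.trim)
  columns(p(0), p(1), p(2), p(3).toFloat, p(4).toFloat,
          p(5).toFloat, p(6).toFloat, p(7).toInt)
}
```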