Never mind. I found the solution:

val newDataFrame = hc.createDataFrame(hiveLoadedDataFrame.rdd, hiveLoadedDataFrame.schema)

which converts the data frame to an RDD and back again to a data frame. Not
the prettiest solution, but at least it solves my problem.
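For anyone hitting the same issue, here is a minimal sketch of the workaround in context. It assumes an existing HiveContext `hc`; the table name `my_table` and the follow-up queries are illustrative, not from my actual job:

```scala
// Assumes a running HiveContext `hc`; `my_table` is an illustrative table name.
val hiveLoadedDataFrame = hc.sql("SELECT * FROM my_table")

// Round-trip through an RDD to rebuild the data frame from its rows and
// schema, which gives the new data frame fresh attribute references.
val newDataFrame = hc.createDataFrame(hiveLoadedDataFrame.rdd,
  hiveLoadedDataFrame.schema)

// Queries that previously failed with "resolved attributes ... missing"
// resolve fine against the rebuilt data frame:
val positives = newDataFrame.where(newDataFrame("label") === 1)
val negatives = newDataFrame.where(newDataFrame("label") === 0)
val combined  = positives.unionAll(negatives)
```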


Thanks,
Cesar Flores



On Thu, Apr 16, 2015 at 11:17 AM, Cesar Flores <ces...@gmail.com> wrote:

>
> I have a data frame in which I load data from a hive table. My issue is
> that the data frame appears to be missing the columns that I need to query.
>
> For example:
>
> val newdataset = dataset.where(dataset("label") === 1)
>
> gives me an error like the following:
>
> ERROR yarn.ApplicationMaster: User class threw exception: resolved
> attributes label missing from label, user_id, ... (the rest of the fields
> of my table)
> org.apache.spark.sql.AnalysisException: resolved attributes label missing
> from label, user_id, ... (the rest of the fields of my table)
>
> where we can see that the label field actually exists. I managed to solve
> this issue by updating my syntax to:
>
> val newdataset = dataset.where($"label" === 1)
>
> which works. However, I cannot use this trick in all my queries. For
> example, when I try to do a unionAll of two subsets of the same data
> frame, the error I get is that all my fields are missing.
>
> Can someone tell me if I need to do some post-processing after loading
> from hive in order to avoid this kind of error?
>
>
> Thanks
> --
> Cesar Flores
>



-- 
Cesar Flores
