Hi Evgenii, thanks for answering.

I'm fairly new to Ignite, but from what I can see in the docs, the Ignite
RDD and the Ignite DataFrame only serve data that is already in Ignite in
the form of a table or cache. Unfortunately, I currently have a couple
hundred tables in HDFS referenced through the Hive metastore, which
conveniently lets me query those tables in Spark with minimal configuration
on the user side (spark.sql can SELECT FROM any table in the Hive
metastore).
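To illustrate, this is all a user needs on our side (hypothetical db/table
names; it assumes hive-site.xml is on the Spark classpath):

    import org.apache.spark.sql.SparkSession;

    public class HiveMetastoreExample {
        public static void main(String[] args) {
            // Hypothetical names; any table registered in the Hive
            // metastore can be queried the same way.
            SparkSession spark = SparkSession.builder()
                .appName("hive-metastore-example")
                .enableHiveSupport() // wires Spark to the Hive metastore
                .getOrCreate();

            spark.sql("SELECT * FROM some_db.some_table").show();
        }
    }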

What I was hoping to accomplish was to use Ignite (IGFS) as an HDFS cache,
with HDFS configured as its secondary file system, and keep referencing
those tables through the Hive metastore for use in Spark. I haven't tested
the performance yet, but I suspect it will almost certainly be slower than
using native Ignite tables.
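Roughly the node setup I have in mind (a sketch using the programmatic
configuration of the Ignite 2.6 API; the namenode URI is made up):

    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.FileSystemConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.hadoop.fs.CachingHadoopFileSystemFactory;
    import org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem;
    import org.apache.ignite.igfs.IgfsMode;

    public class IgfsCacheNode {
        public static void main(String[] args) {
            // HDFS sits behind IGFS as the secondary file system.
            CachingHadoopFileSystemFactory hdfs = new CachingHadoopFileSystemFactory();
            hdfs.setUri("hdfs://namenode:9000"); // made-up namenode address

            IgniteHadoopIgfsSecondaryFileSystem secondary =
                new IgniteHadoopIgfsSecondaryFileSystem();
            secondary.setFileSystemFactory(hdfs);

            FileSystemConfiguration fsCfg = new FileSystemConfiguration();
            fsCfg.setName("igfs");
            fsCfg.setSecondaryFileSystem(secondary);
            fsCfg.setDefaultMode(IgfsMode.DUAL_ASYNC); // propagate writes to HDFS asynchronously

            IgniteConfiguration cfg = new IgniteConfiguration();
            cfg.setFileSystemConfiguration(fsCfg);

            Ignition.start(cfg);
        }
    }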

I finally compiled a modified version of the ignite-hadoop module,
replacing the LOG variable that causes the problem with a logger of the
v1.IgniteHadoopFileSystem class. The LOG passed through the
HadoopIgfsWrapper (and beyond) is now a logger of v1.IgniteHadoopFileSystem
instead of a logger of the hadoop.FileSystem class, which is what caused
the LinkageError. This seems to have solved the issue so far, and it is the
same logging approach already present in v2.IgniteHadoopFileSystem.
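In essence, the change is the following (a sketch of the idea under a
hypothetical class name, not the literal patch):

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    // Sketch: v1.IgniteHadoopFileSystem declares its own logger (as v2
    // already does) instead of referencing the static LOG field inherited
    // from org.apache.hadoop.fs.FileSystem.
    class LoggingFixSketch {
        // Resolved through this class's own class loader, so Spark's
        // isolated Hive client loader and the application class loader
        // never disagree about the org.apache.commons.logging.Log type.
        private static final Log LOG =
            LogFactory.getLog(LoggingFixSketch.class);

        void initialize() {
            // initialize() hands this LOG to HadoopIgfsWrapper (and
            // beyond) instead of FileSystem.LOG.
            LOG.debug("using the class-local logger");
        }
    }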

My questions would be:

Do you think this fix is worth a small patch, or is this kind of usage
discouraged entirely?
Is the philosophy behind using Ignite through Spark that one should always
create native Ignite tables?
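For context, my understanding of the DataFrame integration is roughly the
following (the data source name and option keys are taken from the
ignite-spark docs; the config path and table name are made up):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class IgniteDataFrameExample {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                .appName("ignite-dataframe-example")
                .getOrCreate();

            // Reads a native Ignite SQL table as a Spark DataFrame.
            Dataset<Row> person = spark.read()
                .format("ignite")                               // Ignite data source
                .option("config", "/path/to/ignite-config.xml") // client node config
                .option("table", "PERSON")                      // native Ignite table
                .load();

            person.show();
        }
    }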

Thanks in advance.

On Tue, Sep 11, 2018 at 10:25, Evgenii Zhuravlev (<
e.zhuravlev...@gmail.com>) wrote:

> Hi,
>
> Do you really need to use Hive here? You can just use the Spark
> integration with Ignite, which lets you run SQL: DataFrame(
> https://apacheignite-fs.readme.io/docs/ignite-data-frame) or RDD(
> https://apacheignite-fs.readme.io/docs/ignitecontext-igniterdd). This
> solution will surely work much faster.
>
> Evgenii
>
> On Mon, Sep 10, 2018 at 23:08, Maximiliano Patricio Méndez <
> mmen...@despegar.com> wrote:
>
>> Hi,
>>
>> I'm getting a LinkageError in Spark when trying to read a Hive table
>> whose external location is in IGFS:
>> java.lang.LinkageError: loader constraint violation: when resolving field
>> "LOG" the class loader (instance of
>> org/apache/spark/sql/hive/client/IsolatedClientLoader$$anon$1) of the
>> referring class, org/apache/hadoop/fs/FileSystem, and the class loader
>> (instance of sun/misc/Launcher$AppClassLoader) for the field's resolved
>> type, org/apache/commons/logging/Log, have different Class objects for that
>> type
>>   at
>> org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem.initialize(IgniteHadoopFileSystem.java:255)
>>
>> From what I can see, the exception occurs when Spark reads a table from
>> Hive through IGFS, passing the "LOG" variable of the FileSystem around
>> to the HadoopIgfsWrapper (and beyond...).
>>
>> The steps I followed to reach this error were:
>>
>>    - Create a file /tmp/test.parquet in HDFS
>>    - Create an external table test.test in Hive with location =
>>      igfs://igfs@<host>/tmp/test.parquet
>>    - Start spark-shell with the command:
>>      ./bin/spark-shell --jars
>>      $IGNITE_HOME/ignite-core-2.6.0.jar,$IGNITE_HOME/ignite-hadoop/ignite-hadoop-2.6.0.jar,$IGNITE_HOME/ignite-shmem-1.0.0.jar,$IGNITE_HOME/ignite-spark-2.6.0.jar
>>    - Read the table through spark.sql:
>>      spark.sql("SELECT * FROM test.test")
>>
>> Is there a way to avoid this issue? Has anyone used Ignite through Hive
>> as an HDFS cache in a similar way?
>>
