Hi,

For the sake of getting a clear understanding of this myself, I'm trying to use DataFileAvroStore with the gora-tutorial LogManager scenario... with little luck. Config as follows:
gora.properties
---------------------
gora.datastore.default=org.apache.gora.avro.store.DataFileAvroStore
gora.avrostore.output.path=file:///home/lewis/ASF/gora_trunk/gora.output

gora-datafileavrostore-mapping.xml
---------------------------------------------------
non-existent... yet

I'm running Hadoop 1.0.1 (for compatibility with Gora trunk) in pseudo-distributed mode with the following settings:

core-site.xml
------------------
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>URI of NameNode.</description>
  </property>
</configuration>

hdfs-site.xml
------------------
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description></description>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/home/lewis/ASF/hadoop_output/dfs/name/</value>
  <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/lewis/ASF/hadoop_output/dfs/data/</value>
  <description>Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>

mapred-site.xml
------------------------
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
  <description>URI of job tracker.</description>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>/home/lewis/ASF/hadoop_output/mapred/system_files</value>
  <description>Path on HDFS where the MapReduce framework stores system files, e.g. /hadoop/mapred/system/.</description>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/home/lewis/ASF/hadoop_output/mapred/</value>
  <description>Comma-separated list of paths on the local filesystem where temporary MapReduce data is written.</description>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
  <description>Memory allocated to the MapReduce child JVMs.</description>
</property>

I've been running this setup with both Nutch 2.x (head) and Cassandra 1.1.1, as well as the goraci module, so I know my current Hadoop setup is 'OK'. Parsing the web server logs within the tutorial module works fine; however, when I attempt to query an individual record I get:

lewis@lewis-desktop:~/ASF/gora_trunk$ ./bin/gora logmanager -query 10
Exception in thread "main" java.lang.IllegalArgumentException: Can not create a Path from a null string
        at org.apache.hadoop.fs.Path.checkPathArg(Path.java:78)
        at org.apache.hadoop.fs.Path.<init>(Path.java:90)
        at org.apache.gora.avro.store.DataFileAvroStore.createFsInput(DataFileAvroStore.java:85)
        at org.apache.gora.avro.store.DataFileAvroStore.executeQuery(DataFileAvroStore.java:67)
        at org.apache.gora.store.impl.FileBackedDataStoreBase.execute(FileBackedDataStoreBase.java:163)
        at org.apache.gora.query.impl.QueryBase.execute(QueryBase.java:71)
        at org.apache.gora.tutorial.log.LogManager.query(LogManager.java:156)
        at org.apache.gora.tutorial.log.LogManager.main(LogManager.java:246)

Before heading over to the Hadoop forums I thought it best to raise this one here, since it primarily concerns Gora config and fitting it around Hadoop. Any thoughts would be excellent here...

Thanks
Lewis

On Wed, Oct 10, 2012 at 12:57 AM, Enis Söztutar <e...@apache.org> wrote:
> Sorry, it's been some time since I last looked into these. AvroStore uses
> files and writes data with DatumWriter directly, whereas DataFileAvroStore
> uses the data file, which is an Avro file format. This format supports
> blocks, so they can be split for mapreduce tasks.
>
> Yes, all FileBasedDataStores work on top of files stored on a Hadoop file
> system. Even the local file system should work.
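For anyone finding this thread later, the distinction Enis describes can be sketched with the plain Avro API. This is only an illustration of the two on-disk formats, not Gora internals; the Pageview schema and file name here are made up for the example:

```java
import java.io.ByteArrayOutputStream;
import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class AvroFormatSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical one-field record schema for the example.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Pageview\",\"fields\":"
      + "[{\"name\":\"url\",\"type\":\"string\"}]}");
    GenericRecord rec = new GenericData.Record(schema);
    rec.put("url", "/index.html");

    // AvroStore-style output: datums serialized directly with a DatumWriter.
    // No header, no embedded schema, no sync markers -- a reader must already
    // know the schema, and MapReduce cannot split the file.
    ByteArrayOutputStream raw = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(raw, null);
    new GenericDatumWriter<GenericRecord>(schema).write(rec, encoder);
    encoder.flush();

    // DataFileAvroStore-style output: the Avro object container file format.
    // The schema travels in the file header, and records are grouped into
    // blocks separated by sync markers, so the file is splittable.
    File container = File.createTempFile("pageviews", ".avro");
    try (DataFileWriter<GenericRecord> writer =
             new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
      writer.create(schema, container);
      writer.append(rec);
    }

    System.out.println("raw bytes: " + raw.size()
        + ", container bytes: " + container.length());
  }
}
```

The sync markers are what make the container format MapReduce-friendly: an input split can seek to the next marker and start decoding from there, which matches Enis's point about the format supporting blocks.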