Hi,

For the sake of getting a clear understanding of this myself, I'm trying to use DataFileAvroStore with the gora-tutorial LogManager scenario... with little luck. Config as follows:
gora.properties
---------------------
gora.datastore.default=org.apache.gora.avro.store.DataFileAvroStore
gora.avrostore.output.path=file:///home/lewis/ASF/gora_trunk/gora.output

gora-datafileavrostore-mapping.xml
---------------------------------------------------
non-existent... yet

I'm running Hadoop 1.0.1 (for compatibility with Gora trunk) in pseudo-distributed mode with the following settings:

core-site.xml
------------------
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>URI of NameNode.</description>
  </property>
</configuration>

hdfs-site.xml
------------------
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description></description>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/home/lewis/ASF/hadoop_output/dfs/name/</value>
  <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/lewis/ASF/hadoop_output/dfs/data/</value>
  <description>Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>

mapred-site.xml
------------------------
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
  <description>URI of job tracker.</description>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>/home/lewis/ASF/hadoop_output/mapred/system_files</value>
  <description>Path on HDFS where the MapReduce framework stores system files, e.g. /hadoop/mapred/system/.</description>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/home/lewis/ASF/hadoop_output/mapred/</value>
  <description>Comma-separated list of paths on the local filesystem where temporary MapReduce data is written.</description>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
  <description>Memory allocated to the MapReduce child JVMs.</description>
</property>

I've been running this setup with both Nutch 2.x (head) and Cassandra 1.1.1, as well as the goraci module, so I know my current Hadoop setup is 'OK'. Parsing the web server logs within the tutorial module works fine; however, when I attempt to query an individual record I get:

lewis@lewis-desktop:~/ASF/gora_trunk$ ./bin/gora logmanager -query 10
Exception in thread "main" java.lang.IllegalArgumentException: Can not create a Path from a null string
        at org.apache.hadoop.fs.Path.checkPathArg(Path.java:78)
        at org.apache.hadoop.fs.Path.<init>(Path.java:90)
        at org.apache.gora.avro.store.DataFileAvroStore.createFsInput(DataFileAvroStore.java:85)
        at org.apache.gora.avro.store.DataFileAvroStore.executeQuery(DataFileAvroStore.java:67)
        at org.apache.gora.store.impl.FileBackedDataStoreBase.execute(FileBackedDataStoreBase.java:163)
        at org.apache.gora.query.impl.QueryBase.execute(QueryBase.java:71)
        at org.apache.gora.tutorial.log.LogManager.query(LogManager.java:156)
        at org.apache.gora.tutorial.log.LogManager.main(LogManager.java:246)

Before heading over to the Hadoop forums I thought it best to raise this one here, since it primarily concerns Gora config and fitting it around Hadoop. Any thoughts would be excellent here...

Thanks
Lewis

On Wed, Oct 10, 2012 at 12:57 AM, Enis Söztutar <e...@apache.org> wrote:
> Sorry, it's been some time since I last looked into these. AvroStore uses
> files and writes data with DatumWriter directly, whereas DataFileAvroStore
> uses the data file, which is an Avro file format. This format supports
> blocks, so they can be split for mapreduce tasks.
>
> Yes, all FileBasedDataStores work on top of files stored on a Hadoop file
> system. Even the local file system should work.
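For anyone finding this thread later, the distinction Enis describes can be sketched with the plain Avro API. This is only an illustration of the two on-disk formats, not Gora internals; the Pageview schema and file name here are made up for the example:

```java
import java.io.ByteArrayOutputStream;
import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class AvroFormatSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical one-field record schema for the example.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Pageview\",\"fields\":"
      + "[{\"name\":\"url\",\"type\":\"string\"}]}");
    GenericRecord rec = new GenericData.Record(schema);
    rec.put("url", "/index.html");

    // AvroStore-style output: datums serialized directly with a DatumWriter.
    // No header, no embedded schema, no sync markers -- a reader must already
    // know the schema, and MapReduce cannot split the file.
    ByteArrayOutputStream raw = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(raw, null);
    new GenericDatumWriter<GenericRecord>(schema).write(rec, encoder);
    encoder.flush();

    // DataFileAvroStore-style output: the Avro object container file format.
    // The schema travels in the file header, and records are grouped into
    // blocks separated by sync markers, so the file is splittable.
    File container = File.createTempFile("pageviews", ".avro");
    try (DataFileWriter<GenericRecord> writer =
             new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
      writer.create(schema, container);
      writer.append(rec);
    }

    System.out.println("raw bytes: " + raw.size()
        + ", container bytes: " + container.length());
  }
}
```

The sync markers are what make the container format MapReduce-friendly: an input split can seek to the next marker and start decoding from there, which matches Enis's point about the format supporting blocks.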