Re: Help for the problem of running lucene on Hadoop

2010-12-31 Thread Jander g
Thanks for all the replies above. My idea now is: run word segmentation on Hadoop and create the inverted index in MySQL. As we know, Hadoop MR supports writing to and reading from MySQL. Are there any problems with this approach? On Sat, Jan 1, 2011 at 7:49 AM, James Seigel wrote: > Check out katta for an ex
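
A minimal sketch of that path, assuming the old-API DBOutputFormat that ships with Hadoop 0.20 (org.apache.hadoop.mapred.lib.db); the table name "inverted_index", its columns, and the JDBC URL are placeholders, not anything from this thread:

    // Sketch: one MySQL row per reduce output key, via DBOutputFormat.
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.db.DBConfiguration;
    import org.apache.hadoop.mapred.lib.db.DBOutputFormat;
    import org.apache.hadoop.mapred.lib.db.DBWritable;

    public class IndexRecord implements DBWritable {
      private String term;
      private String docId;

      public IndexRecord() {}
      public IndexRecord(String term, String docId) {
        this.term = term;
        this.docId = docId;
      }

      // Each reduce output key of this type becomes one row in MySQL.
      public void write(PreparedStatement st) throws SQLException {
        st.setString(1, term);
        st.setString(2, docId);
      }

      public void readFields(ResultSet rs) throws SQLException {
        term = rs.getString(1);
        docId = rs.getString(2);
      }

      // Wire the job up to MySQL; DBOutputFormat writes the reducer's
      // output keys (the value is ignored) into the named table.
      public static void configure(JobConf job) {
        job.setOutputFormat(DBOutputFormat.class);
        DBConfiguration.configureDB(job, "com.mysql.jdbc.Driver",
            "jdbc:mysql://dbhost/search", "user", "password");
        DBOutputFormat.setOutput(job, "inverted_index", "term", "doc_id");
      }
    }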

Re: Is hadoop-streaming.jar part of the Apache distribution?

2010-12-31 Thread Harsh J
Hello, On Sat, Jan 1, 2011 at 5:32 AM, W.P. McNeill wrote: > However, the 0.20.2 documentation has the same error: > http://hadoop.apache.org/common/docs/r0.20.2/streaming.html#Hadoop+Streaming > . Looks like the current release (0.21.0) and trunk also have the same error. > Is there a place I

Re: HDFS FS Commands Hanging System

2010-12-31 Thread li ping
I suggest you look through the logs to see if there are any errors. The second point I need to make is: on which node are you running the command "hadoop fs -ls"? If you run the command on Node A, the configuration item "fs.default.name" should point to the HDFS. On Sat, Jan 1, 2011 at 3:2

Re: Is hadoop-streaming.jar part of the Apache distribution?

2010-12-31 Thread W.P. McNeill
I went to the top Google hit for "Hadoop streaming" and didn't notice that this was the 0.15.2 documentation instead of the one that matches my version. However, the 0.20.2 documentation has the same error: http://hadoop.apache.org/common/docs/r0.20.2/streaming.html#Hadoop+Streaming . I verified

Re: Help for the problem of running lucene on Hadoop

2010-12-31 Thread James Seigel
Check out katta for an example :) Sent from my mobile. Please excuse the typos. On 2010-12-31, at 4:47 PM, Lance Norskog wrote: > This will not work for indexing. Lucene requires random read/write to > a file and HDFS does not support this. HDFS only allows sequential > writes: you start at the

Re: Help for the problem of running lucene on Hadoop

2010-12-31 Thread Lance Norskog
This will not work for indexing. Lucene requires random read/write to a file, and HDFS does not support this. HDFS only allows sequential writes: you start at the beginning and copy the file into block 0, block 1, ... block N. For querying, if your HDFS implementation makes a local cache that appea

Re: Is hadoop-streaming.jar part of the Apache distribution?

2010-12-31 Thread Zhenhua Guo
The doc you mentioned is for Hadoop 0.15.2, but you seem to be using 0.20.2. You should probably read the Hadoop docs for your installed version. Gerald On Fri, Dec 31, 2010 at 2:02 PM, W.P. McNeill wrote: > Found it under /opt/hadoop/contrib/streaming. I am now able to run Hadoop > streaming jobs on my

Re: HDFS FS Commands Hanging System

2010-12-31 Thread Todd Lipcon
Hi Jon, Try: HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls / -Todd On Fri, Dec 31, 2010 at 11:20 AM, Jon Lederman wrote: > Hi Michael, > > Thanks for your response. It doesn't seem to be an issue with safemode. > > Even when I try the command dfsadmin -safemode get, the system hangs. I am >

Re: HDFS FS Commands Hanging System

2010-12-31 Thread Jon Lederman
Hi Michael, Thanks for your response. It doesn't seem to be an issue with safemode. Even when I try the command dfsadmin -safemode get, the system hangs. I am unable to execute any FS shell commands other than hadoop fs -help. I am wondering whether this is an issue with communication between th

Re: Is hadoop-streaming.jar part of the Apache distribution?

2010-12-31 Thread W.P. McNeill
Found it under /opt/hadoop/contrib/streaming. I am now able to run Hadoop streaming jobs on my laptop. By the way, here is the documentation I found confusing: http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Hadoop+Streaming This seems to apply to my install, but says that the strea

Re: Is hadoop-streaming.jar part of the Apache distribution?

2010-12-31 Thread Ken Goodhope
It is one of the contrib modules. If you look in the src dir you will see a contrib dir containing all the contrib modules. On Dec 31, 2010 10:38 AM, "W.P. McNeill" wrote: > I installed the Apache distribution of Hadoop on > my laptop and set it up to run in local mode.

Is hadoop-streaming.jar part of the Apache distribution?

2010-12-31 Thread W.P. McNeill
I installed the Apache distribution of Hadoop on my laptop and set it up to run in local mode. It's working for me, but I can't find the hadoop-streaming.jar file. It is nowhere under the Hadoop home directory. The root of the Hadoop home directory contains the follow

RE: HDFS FS Commands Hanging System

2010-12-31 Thread Black, Michael (IS)
Try checking your dfs status: hadoop dfsadmin -safemode get. It probably says "ON". Then run: hadoop dfsadmin -safemode leave. Somebody else can probably say how to make this happen every reboot. Michael D. Black Senior Scientist Advanced Analytics Directorate Northrop Grumman Information Systems

HDFS FS Commands Hanging System

2010-12-31 Thread Jon Lederman
Hi All, I have been working on running Hadoop on a new microprocessor architecture in pseudo-distributed mode. I have been successful in getting SSH configured. I am also able to start a namenode, secondary namenode, tasktracker, jobtracker and datanode as evidenced by the response I get from

Re: Multiple Input Data Processing using MapReduce

2010-12-31 Thread Harsh J
It is map.input.file [.start and .length also relate to the InputSplit for the mapper]. For more: http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Task+JVM+Reuse With a custom RR, you can put in this value yourself (FileSplit.getPath()) before control heads to the Mapper/MapRunner
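
A minimal sketch of reading that property, using the old 0.20 API; the "dataset1" path test is a placeholder for whatever distinguishes your inputs:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class MultiSourceMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      private boolean firstSource;

      @Override
      public void configure(JobConf job) {
        // map.input.file holds the path of the file this task's split came from.
        String path = job.get("map.input.file");
        firstSource = path != null && path.contains("dataset1");
      }

      public void map(LongWritable key, Text value,
          OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
        if (firstSource) {
          // ... parse a record from the first input source ...
        } else {
          // ... parse a record from the second input source ...
        }
      }
    }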

Re: Multiple Input Data Processing using MapReduce

2010-12-31 Thread Zhou, Yunqing
You can use the "map.input.split" param (something like that, I can't remember..) in the Configuration. This param contains the input file path; you can use it to branch your logic. This param can be found in TextInputFormat.java. On Thu, Oct 14, 2010 at 10:03 PM, Matthew John wrote: > Hi all , > > I have

Re: Help for the problem of running lucene on Hadoop

2010-12-31 Thread Zhou, Yunqing
You should implement the Directory class yourself. Nutch provides one, named HDFSDirectory. You can use it to build the index, but searching on HDFS is relatively slow, especially for phrase queries. I recommend downloading the index to local disk before performing a search. On Fri, Dec 31,
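
A minimal sketch of that download-then-search step, against the Lucene 2.x API used elsewhere in this thread; the HDFS and local paths are placeholders:

    import java.io.File;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.FSDirectory;

    public class LocalSearch {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Pull the whole index directory out of HDFS onto local disk.
        fs.copyToLocalFile(new Path("/indexes/myindex"), new Path("/tmp/myindex"));
        // Query it through a plain local FSDirectory; HDFS is out of the loop.
        IndexSearcher searcher =
            new IndexSearcher(FSDirectory.open(new File("/tmp/myindex")), true);
        // ... run queries with searcher.search(...) ...
        searcher.close();
      }
    }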

Re: Help for the problem of running lucene on Hadoop

2010-12-31 Thread Eason.Lee
You'd better build the index in a local file and copy the final index into HDFS. It is not recommended to use HDFS as the FileSystem for Lucene (though it can be used for search). 2010/12/31 Jander g > Hi, all > > I want to run lucene on Hadoop, The problem as follows: > > IndexWriter writer
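
A minimal sketch of that build-locally-then-copy flow, reusing the IndexWriter call from the original post; the paths are placeholders:

    import java.io.File;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;

    public class BuildThenUpload {
      public static void main(String[] args) throws Exception {
        File local = new File("/tmp/index");
        // Build the index on the local filesystem, exactly as in the question.
        IndexWriter writer = new IndexWriter(FSDirectory.open(local),
            new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
        // ... writer.addDocument(...) for each input document ...
        writer.close();
        // Copy the finished index up into HDFS in one shot.
        FileSystem fs = FileSystem.get(new Configuration());
        fs.copyFromLocalFile(new Path("/tmp/index"), new Path("/indexes/myindex"));
      }
    }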

Re: ClassNotFoundException

2010-12-31 Thread Harsh J
The answer is in your log output: 10/12/31 10:26:54 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String). Alternatively, use Job.setJarByClass(Class class); On Fri, Dec 31, 2010 at 3:02 PM, Cavus,M.,Fa. Post Direkt wrote: > I look in my
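
A minimal driver sketch showing where that call goes (new 0.20 mapreduce API; paths come from the command line, and the mapper/reducer setup is elided):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountDriver {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "wordcount");
        // Ships the jar containing this class to the task nodes; without it
        // (or an explicit setJar) tasks fail with ClassNotFoundException.
        job.setJarByClass(WordCountDriver.class);
        // job.setMapperClass(...); job.setReducerClass(...); etc. as usual.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }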

ClassNotFoundException

2010-12-31 Thread Cavus,M.,Fa. Post Direkt
I looked in my jar file, but I get a ClassNotFoundException. Why?: $ jar -xvf hd.jar inflated: META-INF/MANIFEST.MF inflated: org/postdirekt/hadoop/Map.class inflated: org/postdirekt/hadoop/Map.java inflated: org/postdirekt/hadoop/WordCount.class inflated: org/postdirekt/hadoo

RE: Retrying connect to server

2010-12-31 Thread Cavus,M.,Fa. Post Direkt
Hi, I'd forgotten to run start-mapred.sh. Thanks, all. -----Original Message----- From: Cavus,M.,Fa. Post Direkt [mailto:m.ca...@postdirekt.de] Sent: Friday, December 31, 2010 10:20 AM To: common-user@hadoop.apache.org Subject: RE: Retrying connect to server Hi, I do get this: $ jps 6017 DataNo

RE: Retrying connect to server

2010-12-31 Thread Cavus,M.,Fa. Post Direkt
Hi, I do get this: $ jps 6017 DataNode 5805 NameNode 6234 SecondaryNameNode 6354 Jps What can I do to start the JobTracker? Here are my config files: $ cat mapred-site.xml <property> <name>mapred.job.tracker</name> <value>localhost:9001</value> <description>The host and port that the MapReduce job tracker runs at.</description> </property> cat hdfs-site.xml dfs.

Help for the problem of running lucene on Hadoop

2010-12-31 Thread Jander g
Hi, all I want to run Lucene on Hadoop. The problem is as follows: IndexWriter writer = new IndexWriter(FSDirectory.open(new File("index")), new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED); When using Hadoop, must the first param be a dir on HDFS? And how do I use it? Thanks