any suggestions on IIS log storage and analysis?

2013-12-30 Thread Fengyun RAO
Hi, HDFS splits files into blocks, and MapReduce runs a map task for each block. However, the fields can change within IIS log files, which means fields in one block may depend on another block, making the files unsuitable for a MapReduce job as-is. It seems there should be some preprocessing before storing and

Re: any suggestions on IIS log storage and analysis?

2013-12-30 Thread Azuryy Yu
You can run a MapReduce job first to join these data sets into one data set, then analyze the joined data set. On Mon, Dec 30, 2013 at 3:58 PM, Fengyun RAO raofeng...@gmail.com wrote: Hi, HDFS splits files into blocks, and mapreduce runs a map task for each block. However, Fields could be

Re: any suggestions on IIS log storage and analysis?

2013-12-30 Thread Fengyun RAO
What do you mean by joining the data sets? A fake sample log file: #Software: Microsoft Internet Information Services 7.5 #Version: 1.0 #Date: 2013-07-04 20:00:00 #Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status
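The dependence in the sample above is that each data row is only meaningful relative to the most recent #Fields: directive, which can change mid-file. A minimal stand-alone sketch of a parser that carries that state while scanning (class and method names here are illustrative, not from the thread):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of a W3C/IIS log parser that tracks the latest
// "#Fields:" directive, so each data row is mapped using the field
// names currently in effect.
public class IisLogParser {
    private List<String> fields = new ArrayList<>();

    // Returns null for directive/comment lines, else a field->value map.
    public Map<String, String> parseLine(String line) {
        if (line.startsWith("#")) {
            if (line.startsWith("#Fields:")) {
                fields = Arrays.asList(
                    line.substring("#Fields:".length()).trim().split("\\s+"));
            }
            return null; // other directives (#Date, #Version, ...) carry no row data
        }
        String[] values = line.trim().split("\\s+");
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < fields.size() && i < values.length; i++) {
            row.put(fields.get(i), values[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        IisLogParser p = new IisLogParser();
        p.parseLine("#Fields: date time c-ip sc-status");
        Map<String, String> row =
            p.parseLine("2013-07-04 20:00:01 192.168.0.1 200");
        System.out.println(row.get("sc-status")); // prints 200
    }
}
```

This stateful scan is exactly what breaks if HDFS block splitting hands a block to a mapper without the #Fields line that governs it, which is the problem the thread is circling.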

Write an object into hadoop hdfs issue

2013-12-30 Thread unmesha sreeveni
I am trying to write an object into HDFS: public static Split currentsplit = new Split(); Split currentsplit = new Split(); Path p = new Path("C45/mysavedobject"); ObjectOutputStream oos = new ObjectOutputStream(fs.create(p)); oos.writeObject(currentsplit); oos.close(); But I am not

Job fails while re attempting the task in multiple outputs case

2013-12-30 Thread AnilKumar B
Hi, I am using multiple outputs in our job, so whenever any reduce task fails, all of its subsequent task attempts fail with a file-exists exception. The output file name should also have the task attempt appended, right? But it only appends the task id. Is this a bug, or is something wrong on my

Error: Java Heap space

2013-12-30 Thread Ranjini Rathinam
Hi, While executing the word count MapReduce program with a 95.2 MB input file, the error occurs like this: Error: Java Heap space. I have added -D mapred.child.java.opts=Xmx4096M at runtime as well, but the error is not solved. In code I have also written conf.mapred.map.java.task=Xmx512M for

Re: Error: Java Heap space

2013-12-30 Thread Dieter De Witte
Not sure, but I think you need to write =-Xmx; you forgot the dash. 2013/12/30 Ranjini Rathinam ranjinibe...@gmail.com Hi, While excuting the word count mapreduce program , input file is 95.2 MB. the error occures like this Error: Java Heap space I have added -D
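With the dash missing (Xmx4096M instead of -Xmx4096M), the child JVM never receives a heap setting at all. One way to confirm what the task JVM actually got is to log its heap ceiling, e.g. from a mapper's setup method; a minimal sketch:

```java
// Sanity check: report the heap ceiling this JVM actually received.
// If the option was passed correctly (e.g. -Dmapred.child.java.opts=-Xmx512m),
// maxMemory() will reflect roughly that value; with the dash missing the
// option is silently ignored and you see the JVM default instead.
public class HeapCheck {
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("max heap MB: " + maxHeapMb());
    }
}
```

Printed from inside a task, this makes it obvious whether the heap flag reached the child JVM or was dropped on the way.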

Re:conf.set() and conf.get()

2013-12-30 Thread 小网客
Convert the JSON string to a Java object, and the Java object back to a JSON string, then conf.set(yourkey, jsonStr); -- Best Wishes 小网客 Blog: http://snv.iteye.com/ Email: 1134687...@qq.com -- Original -- From: unmesha
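The pattern suggested here works because Configuration only stores strings: flatten the object to a string on the driver side, stash it under a key, and rebuild it on the task side. A stand-alone sketch, with a plain Map standing in for Hadoop's Configuration and a toy "x,y" encoding standing in for a real JSON library such as Jackson or Gson:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the conf.set()/conf.get() round-trip pattern. A plain Map
// stands in for Hadoop's Configuration so the example runs anywhere;
// a real job would use a JSON library instead of this toy encoding.
public class ConfRoundTrip {
    static class Point {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        String encode() { return x + "," + y; }
        static Point decode(String s) {
            String[] parts = s.split(",");
            return new Point(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();     // stand-in for Configuration
        conf.put("my.point", new Point(3, 7).encode()); // driver side: conf.set(...)
        Point p = Point.decode(conf.get("my.point"));   // task side:   conf.get(...)
        System.out.println(p.x + " " + p.y);            // prints 3 7
    }
}
```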

Re: Write an object into hadoop hdfs issue

2013-12-30 Thread Chris Mawata
Not unique to HDFS. The same thing would happen on your local file system, or anywhere and any way you store the state of the object outside of the JVM. That is why singletons should not be serializable. Chris On Dec 30, 2013 5:46 AM, unmesha sreeveni unmeshab...@gmail.com wrote: I am trying to
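The point can be shown without HDFS at all: writeObject captures a snapshot of the object's state, and reading it back constructs a brand-new instance, so a static "singleton" field is not re-linked to the deserialized copy. A self-contained sketch using an in-memory buffer (the same ObjectOutputStream call works on the stream from fs.create(p) when writing to HDFS):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Serialize an object and read it back: the state survives, but the
// result is a distinct instance, which is why serializable singletons
// are problematic. "Split" here is a toy stand-in for the class in
// the original question.
public class SerializeDemo {
    static class Split implements Serializable {
        int index = 42;
    }

    public static Split roundTrip(Split s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
                oos.writeObject(s); // same call works on fs.create(p) for HDFS
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(buf.toByteArray()))) {
                return (Split) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Split original = new Split();
        Split copy = roundTrip(original);
        System.out.println(copy.index);       // state survives: 42
        System.out.println(copy == original); // but it is a new object: false
    }
}
```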

Re: Job fails while re attempting the task in multiple outputs case

2013-12-30 Thread Harsh J
Are you using the MultipleOutputs class shipped with Apache Hadoop or one of your own? If its the latter, please take a look at gotchas to take care of described at http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2Fwrite-to_hdfs_files_directly_from_map.2Freduce_tasks.3F On Mon, Dec 30,

Unable to access the link

2013-12-30 Thread navaz
Hi, I am using the instruction set below to set up a Hadoop cluster: https://www.dropbox.com/s/05aurcp42asuktp/Chiu%20Hadoop%20Pig%20Install%20Instructions.docx and to download Hadoop: https://www.dropbox.com/s/znonl6ia1259by3/hadoop-1.1.2.tar.gz

RE: any suggestions on IIS log storage and analysis?

2013-12-30 Thread java8964
I don't know any example of IIS log files, but from what you described, it looks like analyzing one line of log data depends on some previous lines' data. You should be clearer about what this dependence is and what you are trying to do. Just based on your questions, you still have different

RE: Unable to access the link

2013-12-30 Thread java8964
What's wrong with downloading it from the official Apache website? http://archive.apache.org/dist/hadoop/core/hadoop-1.1.2/ Yong Date: Mon, 30 Dec 2013 11:42:25 -0500 Subject: Unable to access the link From: navaz@gmail.com To: user@hadoop.apache.org Hi I am using below instruction set to set up

Re: Unable to access the link

2013-12-30 Thread navaz
Thanks. But I am following the steps mentioned in the above file. Also, I am interested in using the same wordcount program and gettysburg.txt that are used in the instruction set. On Mon, Dec 30, 2013 at 11:49 AM, java8964 java8...@hotmail.com wrote: What's wrong to download it from the Apache

Re: Job fails while re attempting the task in multiple outputs case

2013-12-30 Thread AnilKumar B
Thanks Harsh. @Are you using the MultipleOutputs class shipped with Apache Hadoop or one of your own? I am using Apache Hadoop's MultipleOutputs. But as you can see in the stack trace, it's not appending the attempt id to the file name; the name only consists of the task id. Thanks Regards, B Anil Kumar. On

Re: Unable to access the link

2013-12-30 Thread Devin Suiter RDX
The (509) error is telling you what the problem is: This account's public links are generating too much traffic and have been temporarily disabled! Which seems to mean, since it is Dropbox, that there has been too much traffic directed towards the file or to other public links owned by the hoster's

MapReduce MIME Input type?

2013-12-30 Thread Devin Suiter RDX
Hi, I am trying to puzzle this out and am hoping for some insight. I have an IMAP inbox dump that I am analyzing. I need to track how many times a given item is referred to in the inbox, i.e. how many emails came in about that thing and over what time. I can load it into MapReduce as

subscribe

2013-12-30 Thread sunqp

Re: subscribe

2013-12-30 Thread Ted Yu
You can find subscribe mail Ids on this page: http://hadoop.apache.org/mailing_lists.html On Mon, Dec 30, 2013 at 12:10 AM, sunqp qipeng@gmail.com wrote:

Re: Job fails while re attempting the task in multiple outputs case

2013-12-30 Thread Jiayu Ji
I think if the task fails, the output related to that task will be cleaned up before the second attempt. I am guessing you got this exception because two reducers tried to write to the same file. One thing you need to be aware of is that all data that is supposed to be in the same file

Re: Unable to access the link

2013-12-30 Thread navaz
Thank you. Now the link is up. I have saved the file which I required. On Mon, Dec 30, 2013 at 12:43 PM, Devin Suiter RDX dsui...@rdx.com wrote: The (509) error is telling you what the problem is: This account's public links are generating too much traffic and have been temporarily

Re: any suggestions on IIS log storage and analysis?

2013-12-30 Thread Fengyun RAO
Thanks, Yong! The dependence never crosses files, but since HDFS splits files into blocks, it may cross blocks, which makes it difficult to write an MR job. I don't quite understand what you mean by WholeFileInputFormat. Actually, I have no idea how to deal with dependence across blocks. 2013/12/31

RE: any suggestions on IIS log storage and analysis?

2013-12-30 Thread java8964
Google "Hadoop WholeFileInputFormat" or search for it in the book Hadoop: The Definitive Guide. Yong Date: Tue, 31 Dec 2013 09:39:58 +0800 Subject: Re: any suggestions on IIS log storage and analysis? From: raofeng...@gmail.com To: user@hadoop.apache.org Thanks, Yong! The dependence never cross files,
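The essence of the WholeFileInputFormat approach referenced here is twofold: override isSplitable() to return false so a file is never cut across map tasks, and have the record reader hand the entire file to a single map() call, so every data row arrives together with the #Fields: directives that govern it. Hadoop's FileSplit/RecordReader plumbing is omitted below; this stand-alone sketch shows only the whole-file read at its core:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// The heart of a whole-file record reader: slurp the entire file into
// one value so a single map() call sees all of it. In a real Hadoop
// InputFormat you would also override isSplitable() to return false.
public class WholeFileRead {
    public static byte[] readWhole(Path file) {
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Self-contained demo: write a tiny two-line log, read it back whole.
    public static int demo() {
        try {
            Path tmp = Files.createTempFile("iislog", ".log");
            Files.write(tmp, "#Fields: date time\n2013-07-04 20:00:00\n".getBytes());
            int len = readWhole(tmp).length;
            Files.delete(tmp);
            return len;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The trade-off is that one file becomes one map task, so parallelism comes from having many files rather than many blocks.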

get job data in command line in MRv2

2013-12-30 Thread xeon
Hi, I would like to know if MRv2 provides the following through the bash command line: - get the number of jobs running? - get the percentage of completion of running jobs? - get the number of jobs that are waiting to be submitted? -- Thanks,

Re:get job data in command line in MRv2

2013-12-30 Thread 小网客
Use the web UI, or a hadoop job command like: hadoop job -list -- Best Wishes 小网客 Blog:http://snv.iteye.com/ Email:1134687...@qq.com -- Original -- From: xeon;psdc1...@gmail.com; Date: Tue, Dec 31, 2013

Re: get job data in command line in MRv2

2013-12-30 Thread Azuryy Yu
Generally, MRv2 means YARN. You can try: yarn application, which prints the full help lists. On Tue, Dec 31, 2013 at 12:32 PM, 小网客 smallnetvisi...@foxmail.com wrote: ui or hadoop job command like:hadoop job -list -- -

Re: Error: Java Heap space

2013-12-30 Thread Ranjini Rathinam
I have used the dash (-) but the error still occurs; I am not able to fix it. Please help me fix it. Regards, Ranjini On Mon, Dec 30, 2013 at 5:24 PM, Dieter De Witte drdwi...@gmail.com wrote: not sure but i think you need to write =-Xmx, you forgot the dash.. 2013/12/30 Ranjini Rathinam

LookUp in mapreduce

2013-12-30 Thread Ranjini Rathinam
Hi, I want to compare a value from one HBase table to a value in another HBase table, and I need to add one column as a valid indicator: if the value matches, mark the field as 0; if it does not match, mark it as 1. I have used the Filter command in my mapreduce code, but the column is not printed in the HBase table.

Re: MapReduce MIME Input type?

2013-12-30 Thread Harsh J
Hey Devin, Are you perhaps looking for http://james.apache.org/mime4j/? You may have to adapt it for MR but I don't imagine that would be too difficult to do. On Mon, Dec 30, 2013 at 11:59 PM, Devin Suiter RDX dsui...@rdx.com wrote: Hi, I am trying to puzzle this out, and am hoping for some
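Before wiring mime4j into an MR job, it helps to see that counting references in an inbox dump is mostly header work (Subject, Message-ID, References). A toy stand-alone sketch of the per-record extraction a mapper would do; mime4j handles header folding, MIME parts, and encodings properly, which this deliberately does not:

```java
import java.util.HashMap;
import java.util.Map;

// Toy extraction of RFC 822-style headers from one raw message: split
// simple "Name: value" lines up to the blank line that ends the header
// block. A real mapper would replace this with mime4j parsing.
public class HeaderExtract {
    public static Map<String, String> headers(String rawMessage) {
        Map<String, String> h = new HashMap<>();
        for (String line : rawMessage.split("\r?\n")) {
            if (line.isEmpty()) break; // blank line: body starts
            int colon = line.indexOf(':');
            if (colon > 0) {
                h.put(line.substring(0, colon).trim(),
                      line.substring(colon + 1).trim());
            }
        }
        return h;
    }

    public static void main(String[] args) {
        String msg = "Subject: Re: widget order\n"
                   + "Message-ID: <123@example.com>\n"
                   + "\n"
                   + "body text";
        System.out.println(headers(msg).get("Subject")); // prints Re: widget order
    }
}
```

Emitting (item, 1) pairs keyed on a normalized Subject or on References would then give the per-item counts over time that the question asks about.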

Re: LookUp in mapreduce

2013-12-30 Thread Harsh J
-user@hadoop (bcc) Please ask HBase questions on its own lists (u...@hbase.apache.org, you may have to subscribe) You're constructing a Put object. Do you then call table.put(obj) to actually send it to the table? On Tue, Dec 31, 2013 at 11:31 AM, Ranjini Rathinam ranjinibe...@gmail.com wrote: