Re: Allow setting of end-of-record delimiter for TextInputFormat

2012-06-18 Thread Sonal Goyal
Hi, The record delimiter is not to be specified while copying the file, but when you run the map reduce job. Just copy the file and specify the delimiter at the time of the job run. Best Regards, Sonal Crux: Reporting for HBase Nube Technologies

Re: Hbase + mapreduce -- operational design question

2011-09-10 Thread Sonal Goyal
Chinmay, how are you configuring your job? Have you checked using setScan and selecting the keys you care to run MR over? See http://ofps.oreilly.com/titles/9781449396107/mapreduce.html As a shameless plug - For your reports, see if you want to leverage Crux: https://github.com/sonalgoyal/crux B

Re: Too many maps?

2011-09-06 Thread Sonal Goyal
Mark, Having a large number of emitted key values from the mapper should not be a problem. Just make sure that you have enough reducers to handle the data so that the reduce stage does not become a bottleneck. Best Regards, Sonal Crux: Reporting for HBase Nube

Re: Hadoop in process?

2011-08-26 Thread Sonal Goyal
Hi Frank, You can use the ClusterMapReduceTestCase class from org.apache.hadoop.mapred. Here is an example of adapting it to JUnit 4 and running a test DFS and cluster. https://github.com/sonalgoyal/hiho/blob/master/test/co/nubetech/hiho/common/HihoTestCase.java And here is a blog post that discusses

Re: Configuration settings

2011-06-21 Thread Sonal Goyal
Hi Mark, You can take a look at http://allthingshadoop.com/2010/04/28/map-reduce-tips-tricks-your-first-real-cluster/ and http://www.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/ to configure your cluster. Along with the tasks, you can change the child JVM heap size,

Re: Estimating Time required to compute M/Rjob

2011-04-16 Thread Sonal Goyal
What is your MR job doing? How much data is it processing? What kind of cluster do you have? Would you be able to share some details about what you are trying to do? If you are looking for metrics, you could look at the TeraSort run. Thanks and Regards, Sonal

Re: Question on hadoop installation and setup - Pseudo-distributed mode

2011-04-16 Thread Sonal Goyal
I see a space in fs. default.name after fs and hdfs: //, is that intentional or a typo? Thanks and Regards, Sonal Hadoop ETL and Data Integration Nube Technologies

Re: "Retrying connect" error while configuring hadoop

2011-04-12 Thread Sonal Goyal
Are your datanode and namenode machines able to see each other (ping etc.)? Is /etc/hosts configured correctly? Is the namenode process (seen through jps on the master) up? Thanks and Regards, Sonal Hadoop ETL and Data Integration

Re: Hadoop EC2 setup

2011-03-13 Thread Sonal Goyal
Please make sure that the AWS EC2 command line tools are installed and the environment variables EC2_HOME, EC2_CERT, EC2_PRIVATE_KEY and PATH are set. Thanks and Regards, Sonal Hadoop ETL and Data Integration Nube Technologies

Re: Setting java.library.path for map-reduce job

2011-02-28 Thread Sonal Goyal
tech.co> <http://in.linkedin.com/in/sonalgoyal> On Tue, Mar 1, 2011 at 9:34 AM, Adarsh Sharma wrote: > Sonal Goyal wrote: > >> Adarsh, >> >> Are you trying to distribute both the native library and the jcuda.jar? >> Could you please explain your job's d

Re: Setting java.library.path for map-reduce job

2011-02-28 Thread Sonal Goyal
echnologies <http://www.nubetech.co> <http://in.linkedin.com/in/sonalgoyal> On Mon, Feb 28, 2011 at 6:54 PM, Adarsh Sharma wrote: > Sonal Goyal wrote: > >> Hi Adarsh, >> >> I think your mapred.cache.files property has an extra space at the end. >> Try >>

Re: Setting java.library.path for map-reduce job

2011-02-28 Thread Sonal Goyal
Hi Adarsh, I think your mapred.cache.files property has an extra space at the end. Try removing that and let us know how it goes. Thanks and Regards, Sonal Hadoop ETL and Data Integration Nube Technologies

Re: Hadoop XML Error

2011-02-07 Thread Sonal Goyal
Mike, This error is not caused by malformed XML in the files you are trying to copy; it occurs because, for some reason, the source or destination listing cannot be retrieved or parsed. Are you trying to copy between different versions of clusters? As far as I know, your destination should be writable, distcp sho

Re: Import data from mysql

2011-01-08 Thread Sonal Goyal
Hi Brian, You can check HIHO at https://github.com/sonalgoyal/hiho which can help you load data from any JDBC database to the Hadoop file system. If your table has a date or id field, or any indicator for modified/newly added rows, you can import only the altered rows every day. Please let me know

Re: Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected

2011-01-07 Thread Sonal Goyal
Which Hadoop versions are you testing and compiling against? Thanks and Regards, Sonal Connect Hadoop with databases, Salesforce, FTP servers and others Nube Technologies

Re: How to manage large record in MapReduce

2011-01-07 Thread Sonal Goyal
Jerome, You can take a look at FileStreamInputFormat at https://github.com/sonalgoyal/hiho/tree/hihoApache0.20/src/co/nubetech/hiho/mapreduce/lib/input This provides an input stream per file. In our case, we are using the input stream to load data into the database directly. Maybe you can use thi

Re: Dumping Cassandra into Hadoop

2010-10-19 Thread Sonal Goyal
Have you checked https://issues.apache.org/jira/browse/CASSANDRA-913 ? Thanks and Regards, Sonal Sonal Goyal | Founder and CEO | Nube Technologies LLP http://www.nubetech.co | http://in.linkedin.com/in/sonalgoyal On Tue, Oct 19, 2010 at 8:31 PM, Mark wrote: > As the subject implies I

Re: Help for Sqlserver querying with hadoop

2010-09-25 Thread Sonal Goyal
Biju, Have you tried using DataDrivenDBInputFormat? Thanks and Regards, Sonal Sonal Goyal | Founder and CEO | Nube Technologies LLP Ph: +91-8800541717 | so...@nubetech.co | Skype: sonal.goyal http://www.nubetech.co | http://in.linkedin.com/in/sonalgoyal On Fri, Sep 24, 2010 at 2:06 PM

Re: Hadoop 0.21.0 release Maven repo

2010-09-12 Thread Sonal Goyal
e HDFS-1292 and MAPREDUCE-1929. > > Cheers, > Tom > > On Fri, Sep 10, 2010 at 1:33 PM, Sonal Goyal > wrote: > > Hi, > > > > Can someone please point me to the Maven repo for 0.21 release? Thanks. > > > > Thanks and Regards, > > Sonal > > www.meghsoft.com > > http://in.linkedin.com/in/sonalgoyal > > >

Hadoop 0.21.0 release Maven repo

2010-09-10 Thread Sonal Goyal
Hi, Can someone please point me to the Maven repo for 0.21 release? Thanks. Thanks and Regards, Sonal www.meghsoft.com http://in.linkedin.com/in/sonalgoyal

Re: Hive JDBC Connection Timeout

2010-06-17 Thread Sonal Goyal
See if this works: DriverManager.setLoginTimeout(...); Thanks and Regards, Sonal www.meghsoft.com http://in.linkedin.com/in/sonalgoyal On Thu, Jun 17, 2010 at 10:20 PM, T2thenike wrote: > > I am working with complex Hive queries and moderate amounts of data.  I am > running into a problem whe
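The suggestion above can be sketched with the JDK alone. This is a minimal illustration, not the poster's code: the class name, the helper method, and the 30-second value are assumptions made for the example. `DriverManager.setLoginTimeout` takes a value in seconds and applies to connections opened afterwards; whether a particular driver (Hive's included) honors it is driver-specific and should be checked.

```java
import java.sql.DriverManager;

public class LoginTimeoutExample {

    // Sets the global JDBC login timeout (in seconds) and returns the value
    // that DriverManager reports back, so the caller can confirm it was applied.
    public static int applyLoginTimeout(int seconds) {
        DriverManager.setLoginTimeout(seconds);
        return DriverManager.getLoginTimeout();
    }

    public static void main(String[] args) {
        // 30 seconds is an arbitrary illustrative value, not a recommendation.
        int effective = applyLoginTimeout(30);
        System.out.println("JDBC login timeout is now " + effective + " seconds");
        // A real client would call DriverManager.getConnection(...) after this
        // point; the connection URL is omitted because it depends on the setup.
    }
}
```

The timeout must be set before `DriverManager.getConnection` is called, since it only affects connection attempts made afterwards.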

Re: How to add external jar file while running a hadoop program

2010-05-07 Thread Sonal Goyal
Akhil, For the rejar to work, the jar to be included has to be in the lib folder of the main jar. Thanks and Regards, Sonal www.meghsoft.com On Fri, May 7, 2010 at 3:31 PM, akhil1988 wrote: > > You need to jar the stanford-parser with your ep.jar > For this you can unjar the stanford-parser.ja

Re: having a directory as input split

2010-05-04 Thread Sonal Goyal
One way to do this will be: Create a DirectoryInputFormat which accepts the list of directories as inputs and emits each directory path in one split. Your custom RecordReader can then read this split and generate appropriate input for your mapper. Thanks and Regards, Sonal www.meghsoft.com On F

Re: Hbase & Hive

2010-04-30 Thread Sonal Goyal
If you are looking for an ORM layer for HBase, there is one at http://github.com/enis/gora Thanks and Regards, Sonal www.meghsoft.com On Sat, May 1, 2010 at 4:13 AM, Nick Dimiduk wrote: > If by "efficiently", you mean "low latency" then no, you will not get > ms-response time for your hive qu

Re: import multiple jar

2010-04-20 Thread Sonal Goyal
Hi, You can add your dependencies in the lib folder of your main jar. Hadoop will automatically distribute them to the cluster. You can also explore using DistributedCache or -libjars options. Thanks and Regards, Sonal www.meghsoft.com On Mon, Apr 19, 2010 at 7:54 PM, Gang Luo wrote: > Hi all

Re: Does Hadoop compress files?

2010-04-03 Thread Sonal Goyal
Hi, Please check http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Data+Compression Thanks and Regards, Sonal www.meghsoft.com On Sat, Apr 3, 2010 at 11:15 PM, u235sentinel wrote: > I'm starting to evaluate Hadoop. We are currently running Sensage and > store a lot of log file

Re: Manually splitting files in blocks

2010-03-24 Thread Sonal Goyal
Hi Yuri, You can also check the source code of FileInputFormat and create your own RecordReader implementation. Thanks and Regards, Sonal www.meghsoft.com On Wed, Mar 24, 2010 at 9:08 PM, Patrick Angeles wrote: > Yuri, > > Probably the easiest thing is to actually create distinct files and > co

Re: Sqoop Installation on Apache Hadop 0.20.2

2010-03-19 Thread Sonal Goyal
at hiho is a single map/reduce job handling the MySQL > hadoop Integration. Is it also possible to use it with other JDBC > connectors > too? > > Best Regards, > Utku > > On Fri, Mar 19, 2010 at 5:07 AM, Sonal Goyal > wrote: > > > Hi Utku, > > >

Re: Sqoop Installation on Apache Hadop 0.20.2

2010-03-18 Thread Sonal Goyal
Hi Utku, If MySQL is your target database, you may check Meghsoft's hiho: http://code.google.com/p/hiho/ The current release supports transferring data from Hadoop to the MySQL database. We will be releasing the functionality of transfer from MySQL to Hadoop soon, sometime next week. Thanks and

Re: WritableName can't load class in hive

2010-03-16 Thread Sonal Goyal
For some custom functions, I put the jar on the local path accessible to the CLI. Have you tried that? Thanks and Regards, Sonal On Tue, Mar 16, 2010 at 3:49 PM, Oded Rotem wrote: > We have a bunch of sequence files containing keys & values of custom > Writable classes that we wrote, in a HDFS

Re: Cloudera AMIs

2010-03-15 Thread Sonal Goyal
t; > Cheers, > Tom > > P.S. For Cloudera-specific questions please consider using the > Cloudera forum at http://getsatisfaction.com/cloudera > > On Sun, Mar 14, 2010 at 7:03 AM, Sonal Goyal > wrote: > > Hi, > > > > I want to know which Cloudera AMI su

Re: I want to group "similar" keys in the reducer.

2010-03-15 Thread Sonal Goyal
Hi Raymond, A custom partitioner is probably what you need. An alternate approach is to emit keys based on your pattern. Say you are currently emitting , , , You can instead emit > > > > Thanks and Regards, Sonal 2010/3/16 Jim Twensky > Hi Raymond, > > Take a look at > http://hadoop.apach
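The idea of routing "similar" keys to the same reducer can be sketched without any Hadoop dependency. This is only an illustration under stated assumptions: the key layout `<group>:<detail>`, the class name, and both method names are hypothetical, since the key examples in the message did not survive. The partition formula mirrors the hash-and-modulo scheme used by Hadoop's default `HashPartitioner`, applied to the derived group instead of the full key.

```java
public class SimilarKeyGrouping {

    // Hypothetical similarity rule: everything before the first ':' is the
    // group. Keys without a ':' form their own group.
    public static String groupKey(String rawKey) {
        int idx = rawKey.indexOf(':');
        return idx < 0 ? rawKey : rawKey.substring(0, idx);
    }

    // Hash-and-modulo partitioning applied to the group rather than the raw
    // key, so all keys in one group map to the same partition number.
    public static int partitionFor(String rawKey, int numPartitions) {
        return (groupKey(rawKey).hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int a = partitionFor("user42:click", 4);
        int b = partitionFor("user42:view", 4);
        System.out.println(a == b); // prints "true": both keys share the group "user42"
    }
}
```

In a real job, the same logic would live in a custom partitioner's partition method. The alternative mentioned in the message, emitting the derived group as the map output key, needs no custom partitioner at all.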

Cloudera AMIs

2010-03-14 Thread Sonal Goyal
Hi, I want to know which Cloudera AMI supports which Hadoop version. For example, ami-2932d440:cloudera-ec2-hadoop-images/cloudera-hadoop-ubuntu-20090602-i386.manifest.xml ami-ed59bf84: cloudera-ec2-hadoop-images/cloudera-hadoop-ubuntu-20090623-i386.manifest.xml What's the difference between th

Re: where does jobtracker get the IP and port of namenode?

2010-03-09 Thread Sonal Goyal
and Regards, Sonal On Tue, Mar 9, 2010 at 3:53 PM, jiang licht wrote: > Thanks Sonal. How to set that debug mode? Actually I set > "dfs.namenode.logging.level" to "all". Please see my first and previous > posts for error messages. > > Thanks, > > Michael &

Re: where does jobtracker get the IP and port of namenode?

2010-03-09 Thread Sonal Goyal
Can you turn logging level to debug to see what the logs say? Thanks and Regards, Sonal On Tue, Mar 9, 2010 at 1:08 PM, jiang licht wrote: > I guess my confusion is this: > > I point "fs.default.name" to hdfs:A:50001 in core-site.xml (A is IP > address). I assume when tasktracker starts, it sh

Re: Ubuntu Single Node Tutorial failure. No live or dead nodes.

2010-02-13 Thread Sonal Goyal
0.20.2, > or (c) Cloudera's 0.20.1 based build at > http://archive.cloudera.com/cdh/2/hadoop-0.20.1+169.56.tar.gz which is > 0.20.1 plus 225 extra patches (incl most of what's in 0.20.2). > > -Todd > > On Sat, Feb 13, 2010 at 8:35 AM, Sonal Goyal > wrote: >

Re: Ubuntu Single Node Tutorial failure. No live or dead nodes.

2010-02-13 Thread Sonal Goyal
ck until HDFS is ready for user commands in read/write > mode. > - Aaron > > > On Fri, Feb 12, 2010 at 8:44 AM, Sonal Goyal > wrote: > > > Hi > > > > I had faced a similar issue on Ubuntu and Hadoop 0.20 and modified the > > start-all script to intr

Re: Ubuntu Single Node Tutorial failure. No live or dead nodes.

2010-02-12 Thread Sonal Goyal
Hi, I had faced a similar issue on Ubuntu and Hadoop 0.20 and modified the start-all script to introduce a sleep time: bin=`dirname "$0"` bin=`cd "$bin"; pwd` . "$bin"/hadoop-config.sh # start dfs daemons "$bin"/start-dfs.sh --config $HADOOP_CONF_DIR *echo 'sleeping' sleep 60 echo 'awake'* # st

Re: DBOutputFormat Speed Issues

2010-02-01 Thread Sonal Goyal
Hi Nick, If you don't mind, can you please share your performance benchmarks of using DataDrivenDBInputFormat/DBInputFormat and MySQL? Thanks and Regards, Sonal On Mon, Feb 1, 2010 at 3:33 AM, Aaron Kimball wrote: > Nick, > > I'm afraid that right now the only available OutputFormat for JDBC is

DefaultStringifier throws NullPointer

2009-12-09 Thread Sonal Goyal
Hi, I need to store an object in the configuration. I am trying to use DefaultStringifier's load and store methods, but I get the following exception while storing: java.lang.NullPointerException at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.jav

Re: return in map

2009-12-06 Thread Sonal Goyal
Hi, Maybe you could post your code/logic for doing this. One way would be to set a flag once your criteria are met and emit keys based on the flag. Thanks and Regards, Sonal 2009/12/5 Gang Luo > Hi all, > I got a tricky problem. I input a small file manually to do some filtering > work on each
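The flag approach can be sketched in plain Java. This is a stand-in for a mapper, not Hadoop code, and it rests on assumptions because the original question is truncated: the class name, the "STOP" sentinel used as the criterion, and the choice to stop emitting once the flag is set are all hypothetical. The flag is an instance field, so it persists across successive calls to `map` on the same object.

```java
import java.util.ArrayList;
import java.util.List;

public class FlaggedEmitter {

    // Set once the (hypothetical) criterion has been seen; checked on every call.
    private boolean criteriaMet = false;

    // Stands in for the mapper's output collector.
    private final List<String> emitted = new ArrayList<>();

    // Called once per input record, like a mapper's map method.
    public void map(String record) {
        if (criteriaMet) {
            return; // flag already set: skip the record instead of emitting it
        }
        if ("STOP".equals(record)) { // hypothetical criterion
            criteriaMet = true;
            return;
        }
        emitted.add(record);
    }

    public List<String> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        FlaggedEmitter mapper = new FlaggedEmitter();
        for (String record : new String[] {"a", "b", "STOP", "c"}) {
            mapper.map(record);
        }
        System.out.println(mapper.emitted()); // prints "[a, b]"
    }
}
```

The remaining records are still read and passed to `map`; the flag only controls what is emitted.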