How about a simple Pig script with a load and a store statement? Set the max
number of reducers to, say, 20 or 30; that way you will only have 20-30 files
as output. Then put these files in the Hive directory. Make sure to match the
delimiters in Hive & Pig.
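A minimal sketch of such a script, with made-up paths and a tab delimiter; note
that a bare load/store runs map-only, so an ORDER is added here to force a
reduce phase, letting PARALLEL cap the reducer (and output file) count:

```pig
-- Load, force a reduce phase, and cap reducers at 20 so at most 20
-- part files land in the Hive table directory (paths are illustrative).
raw = LOAD '/data/input' USING PigStorage('\t');
ordered = ORDER raw BY $0 PARALLEL 20;
STORE ordered INTO '/user/hive/warehouse/mytable' USING PigStorage('\t');
```

The PigStorage delimiter must match the one declared on the Hive table.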
-Ayon
Hey, if all the files have the same columns then you can easily merge them with
a shell script:
table=yourtable
for file in *.csv
do
  cat "$file" >> new_file.csv
done
hive -e "load data local inpath 'new_file.csv' into table $table"
It will merge all the files into a single file, then you can upload it in
Pig has a Log loader in Piggybank. You can use that to generate the columns
of that table and make the table point to it.
Take a look--
https://github.com/apache/pig/tree/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/apachelog
Thanks,
Aniket
On Tue, Dec 6, 2011 at 1
Hi,
I am trying to understand the output of the Hive EXPLAIN command. I found the
documentation (
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain )
to be of little help. Is there any other place where I can find
detailed documentation on this?
Hiroyuki, were you ab
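For reference, the command itself is just EXPLAIN prefixed to a query; a sketch
with a hypothetical table:

```sql
-- Prefix any query with EXPLAIN to print its plan instead of running it
-- (table and column names here are made up).
EXPLAIN
SELECT dept, count(*) FROM employees GROUP BY dept;
-- The output shows the abstract syntax tree, the stage dependency
-- graph, and a description of each stage's operators.
```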
Hi Sangeetha,
One easier option is to use Flume decorators to put a delimiter in
your stream of data and then load the data into the table.
For example:
The data below can be converted to, say, pipe-delimited data (you can code for
any delimiter) by using Flume decorators.
[2011-10-17 16:30:57,281]
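As an illustration of the target format (this is a plain sed sketch, not the
Flume decorator mechanism itself; the sample line is abbreviated):

```shell
# Turn '] [' separators into pipes and strip the outer brackets;
# the sample log line is shortened for illustration.
line='[2011-10-17 16:30:57,281] [ INFO] [Organization: Travelocity] [Client: AA]'
echo "$line" | sed 's/\] \[/|/g; s/^\[//; s/\]$//'
# 2011-10-17 16:30:57,281| INFO|Organization: Travelocity|Client: AA
```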
Can you try "from B join A"?
One simple rule of join in Hive is "Largest table last". The smaller tables
can then be buffered into distributed cache for fast retrieval and
comparison.
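Concretely, with hypothetical table names where s is small and l is large:

```sql
-- "Largest table last": the last table in the join is streamed while
-- the earlier (smaller) ones are buffered in memory.
SELECT l.key, s.value
FROM s JOIN l ON s.key = l.key;

-- Optionally, force a map-side join of the small table with a hint:
SELECT /*+ MAPJOIN(s) */ l.key, s.value
FROM s JOIN l ON s.key = l.key;
```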
Thanks
Aaron
On Tue, Dec 6, 2011 at 4:01 AM, john smith wrote:
> Hi Mark,
>
> Thanks for your response. I trie
Hi Sangeetha,
Sorry, I was on the road, so the answer took a while.
As Mark wrote, SerDe will be a good start. If it's useful for you, take a
look at http://code.google.com/p/hive-json-serde/wiki/GettingStarted.
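A rough sketch of wiring that SerDe up (the jar path, SerDe class name, and
columns are assumptions; check the GettingStarted page above for the exact
class name and jar):

```sql
-- Register the SerDe jar, then point a table at JSON data
-- (all names here are illustrative).
ADD JAR /path/to/hive-json-serde.jar;
CREATE TABLE json_events (ts STRING, msg STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde';
```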
- alex
On Tue, Dec 6, 2011 at 10:26 AM, sangeetha k wrote:
> Hi,
>
> Thanks for the resp
Hi Sangeetha,
Hive uses SerDe (Serializer/Deserializer) for reading data from and writing to
HDFS. You have many options for choosing the SerDe for your table.
For example, if your file contains tab delimited fields, you could use the
default SerDe (by not specifying any SerDe) and specify the de
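For instance, a delimited-text table can be declared without naming any SerDe
at all (table and column names here are hypothetical):

```sql
-- The default SerDe handles delimited text; just declare the delimiter.
CREATE TABLE logs (ts STRING, level STRING, msg STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
```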
I get this error message in the console..
11/12/06 08:14:50 INFO DataNucleus.MetaData: Registering listener for metadata
initialisation
11/12/06 08:14:50 INFO metastore.ObjectStore: Initialized ObjectStore
11/12/06 08:14:50 WARN DataNucleus.MetaData: MetaData Parser encountered an
error in file
Hi,
I opened the web console for Hive using http://localhost:/hwi
In the Browse Schema option, I could see only the default Hive table list name
and description.
I am not able to view the tables. What could be the issue?
I have created 2 tables under the default schema, but am not able to see th
Hi Paul,
I am having the same problem. Do you know any efficient way of merging the
files?
-Mohit
On Tue, Dec 6, 2011 at 8:14 PM, Paul Mackles wrote:
> How much time is it spending in the map/reduce phases, respectively? The
> large number of files could be creating a lot of mappers which creat
How much time is it spending in the map/reduce phases, respectively? The large
number of files could be creating a lot of mappers which create a lot of
overhead. What happens if you merge the 2624 files into a smaller number like
24 or 48? That should speed up the mapper phase significantly.
Fr
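As a local-filesystem sketch of that kind of consolidation (file names and
counts are made up; on HDFS, hadoop fs -getmerge or an extra MR pass would play
the same role):

```shell
# Round-robin many small part files into a few larger ones before loading.
set -e
dir=$(mktemp -d)
for n in $(seq 0 99); do echo "row $n" > "$dir/part-$n"; done  # 100 small files
buckets=4
mkdir "$dir/merged"
i=0
for f in "$dir"/part-*; do
  cat "$f" >> "$dir/merged/merged-$((i % buckets)).csv"
  i=$((i + 1))
done
ls "$dir/merged" | wc -l   # prints the bucket count (4)
```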
Hi,
In your case the total file size isn't the main factor that reduces
performance; the number of files is.
To test this, try merging those 2000-plus files into one (or a few) big ones,
then upload to HDFS and test Hive performance (it should be
definitely higher). If this works you should think about mergin
Hi Mark,
Thanks for your response. I tried skew optimization and I also saw the
video by Lin and Namit. From what I understand about skew join, instead of
a single pass, they divide it into 2 stages.
Stage 1
Join the non-skew pairs, and write the skew pairs into temporary files on HDFS.
Stage 2
Do a M
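The optimization discussed above is driven by configuration; a sketch of the
usual knobs (the threshold value shown is just the common default):

```sql
-- Enable runtime skew-join handling: keys exceeding the threshold are
-- written to temporary HDFS files and joined in a follow-up map join.
SET hive.optimize.skewjoin=true;
SET hive.skewjoin.key=100000;   -- rows per key before it counts as skewed
```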
Hi All,
My setup is
hadoop-0.20.203.0
hive-0.7.1
I have a 5-node cluster in total: 4 data nodes and 1 namenode (which is
also acting as the secondary namenode). On the namenode I have set up Hive
with HiveDerbyServerMode to support multiple Hive server connections.
I have inserted plain text C
Hi,
Thanks for the response.
Yes, you got my question.
An example of my log message line will be as below:
[2011-10-17 16:30:57,281] [ INFO] [33157362@qtp-28456974-0]
[net.hp.tr.webservice.referenceimplcustomer.resource.CustomersResource]
[Organization: Travelocity] [Client: AA] [Location o
Hi,
I hope I understood your question correctly - did you describe your table?
Like
"create TABLE YOURTABLE (row1 STRING, row2 STRING, row3 STRING) ROW FORMAT
DELIMITED FIELDS TERMINATED BY 'YOUR TERMINATOR' STORED AS TEXTFILE;"
row* = names of your choice; for the datatypes, look at the documentation.
After
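Presumably the next step is loading the file into that table; a sketch with a
made-up local path:

```sql
-- Load a local delimited file into the table declared above.
LOAD DATA LOCAL INPATH '/tmp/yourdata.txt' INTO TABLE YOURTABLE;
```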