How about a simple Pig script with a load and a store statement? Set the max
number of reducers to, say, 20 or 30; that way you will only have 20-30 files
as output. Then put these files in the Hive directory. Make sure to match the
delimiters in Hive & Pig.
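A minimal sketch of such a script, with made-up paths and a tab delimiter; note
that a bare load/store runs map-only, so an ORDER is added here to force a
reduce phase, letting PARALLEL cap the reducer (and output file) count:

```pig
-- Load, force a reduce phase, and cap reducers at 20 so at most 20
-- part files land in the Hive table directory (paths are illustrative).
raw = LOAD '/data/input' USING PigStorage('\t');
ordered = ORDER raw BY $0 PARALLEL 20;
STORE ordered INTO '/user/hive/warehouse/mytable' USING PigStorage('\t');
```

The PigStorage delimiter must match the one declared on the Hive table.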
-Ayon
Hey, if all the files have the same columns then you can easily merge them with
a shell script:
table=yourtable
for file in *.csv
do
  cat "$file" >> new_file.csv
done
hive -e "load data local inpath 'new_file.csv' into table $table"
It will merge all the files into a single file, then you can upload it in
Pig has a Log loader in Piggybank. You can use that to generate the columns
of that table and make the table point to it.
Take a look--
https://github.com/apache/pig/tree/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/apachelog
Thanks,
Aniket
On Tue, Dec 6, 2011 at 1
Hi,
I am trying to understand the output of the Hive EXPLAIN command. I found the
documentation (
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain )
to be of little help. Is there any other place where I can find
detailed documentation on this?
Hiroyuki, were you ab
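For reference, the command itself is just EXPLAIN prefixed to a query; a sketch
with a hypothetical table:

```sql
-- Prefix any query with EXPLAIN to print its plan instead of running it
-- (table and column names here are made up).
EXPLAIN
SELECT dept, count(*) FROM employees GROUP BY dept;
-- The output shows the abstract syntax tree, the stage dependency
-- graph, and a description of each stage's operators.
```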
Hi Sangeetha,
One easier option is to use Flume decorators to put a delimiter in
your stream of data and then load the data into the table.
For example:
The data below can be converted to, say, pipe-delimited data (you can code for
any delimiter) by using Flume decorators.
[2011-10-17 16:30:57,281]
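As an illustration of the target format (this is a plain sed sketch, not the
Flume decorator mechanism itself; the sample line is abbreviated):

```shell
# Turn '] [' separators into pipes and strip the outer brackets;
# the sample log line is shortened for illustration.
line='[2011-10-17 16:30:57,281] [ INFO] [Organization: Travelocity] [Client: AA]'
echo "$line" | sed 's/\] \[/|/g; s/^\[//; s/\]$//'
# 2011-10-17 16:30:57,281| INFO|Organization: Travelocity|Client: AA
```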
Can you try "from B join A"?
One simple rule of join in Hive is "Largest table last". The smaller tables
can then be buffered into distributed cache for fast retrieval and
comparison.
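Concretely, with hypothetical table names where s is small and l is large:

```sql
-- "Largest table last": the last table in the join is streamed while
-- the earlier (smaller) ones are buffered in memory.
SELECT l.key, s.value
FROM s JOIN l ON s.key = l.key;

-- Optionally, force a map-side join of the small table with a hint:
SELECT /*+ MAPJOIN(s) */ l.key, s.value
FROM s JOIN l ON s.key = l.key;
```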
Thanks
Aaron
On Tue, Dec 6, 2011 at 4:01 AM, john smith wrote:
> Hi Mark,
>
> Thanks for your response. I trie
Hi Sangeetha,
Sorry, I was on the road, so the answer took a while.
As Mark wrote, SerDe will be a good start. If it's useful for you, take a
look at http://code.google.com/p/hive-json-serde/wiki/GettingStarted.
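A rough sketch of wiring that SerDe up (the jar path, SerDe class name, and
columns are assumptions; check the GettingStarted page above for the exact
class name and jar):

```sql
-- Register the SerDe jar, then point a table at JSON data
-- (all names here are illustrative).
ADD JAR /path/to/hive-json-serde.jar;
CREATE TABLE json_events (ts STRING, msg STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde';
```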
- alex
On Tue, Dec 6, 2011 at 10:26 AM, sangeetha k wrote:
> Hi,
>
> Thanks for the resp
Hi Sangeetha,
Hive uses SerDe (Serializer/Deserializer) for reading data from and writing to
HDFS. You have many options for choosing the SerDe for your table.
For example, if your file contains tab delimited fields, you could use the
default SerDe (by not specifying any SerDe) and specify the de
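For instance, a delimited-text table can be declared without naming any SerDe
at all (table and column names here are hypothetical):

```sql
-- The default SerDe handles delimited text; just declare the delimiter.
CREATE TABLE logs (ts STRING, level STRING, msg STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
```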
I get this error message in the console..
11/12/06 08:14:50 INFO DataNucleus.MetaData: Registering listener for metadata
initialisation
11/12/06 08:14:50 INFO metastore.ObjectStore: Initialized ObjectStore
11/12/06 08:14:50 WARN DataNucleus.MetaData: MetaData Parser encountered an
error in file
Hi,
I opened the web console for Hive using http://localhost:/hwi
In the Browse Schema option, I could see only the default Hive table list name
and description.
I am not able to view the tables. What could be the issue?
I have created 2 tables under the default schema, but am not able to see th
Hi Paul,
I am having the same problem. Do you know any efficient way of merging the
files?
-Mohit
On Tue, Dec 6, 2011 at 8:14 PM, Paul Mackles wrote:
> How much time is it spending in the map/reduce phases, respectively? The
> large number of files could be creating a lot of mappers which creat
How much time is it spending in the map/reduce phases, respectively? The large
number of files could be creating a lot of mappers which create a lot of
overhead. What happens if you merge the 2624 files into a smaller number like
24 or 48? That should speed up the mapper phase significantly.
Fr
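As a local-filesystem sketch of that kind of consolidation (file names and
counts are made up; on HDFS, hadoop fs -getmerge or an extra MR pass would play
the same role):

```shell
# Round-robin many small part files into a few larger ones before loading.
set -e
dir=$(mktemp -d)
for n in $(seq 0 99); do echo "row $n" > "$dir/part-$n"; done  # 100 small files
buckets=4
mkdir "$dir/merged"
i=0
for f in "$dir"/part-*; do
  cat "$f" >> "$dir/merged/merged-$((i % buckets)).csv"
  i=$((i + 1))
done
ls "$dir/merged" | wc -l   # prints the bucket count (4)
```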
Hi,
In your case the total file size isn't the main factor that reduces
performance; the number of files is.
To test this, try merging those 2000-plus files into one (or a few) big ones,
then upload to HDFS and test Hive performance (it should be
definitely higher). If this works you should think about mergin
Hi Mark,
Thanks for your response. I tried skew optimization and I also saw the
video by Lin and Namit. From what I understand about skew join, instead of
a single pass, they divide it into 2 stages.
Stage 1
Join the non-skew pairs, and write the skew pairs into temporary files on HDFS.
Stage 2
Do a M
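The optimization discussed above is driven by configuration; a sketch of the
usual knobs (the threshold value shown is just the common default):

```sql
-- Enable runtime skew-join handling: keys exceeding the threshold are
-- written to temporary HDFS files and joined in a follow-up map join.
SET hive.optimize.skewjoin=true;
SET hive.skewjoin.key=100000;   -- rows per key before it counts as skewed
```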
Hi All,
My setup is
hadoop-0.20.203.0
hive-0.7.1
I have a 5-node cluster in total: 4 data nodes and 1 namenode (which is
also acting as the secondary namenode). On the namenode I have set up Hive
with HiveDerbyServerMode to support multiple Hive server connections.
I have inserted plain text C
Hi,
Thanks for the response.
Yes, you got my question.
An example of my log message line will be as below:
[2011-10-17 16:30:57,281] [ INFO] [33157362@qtp-28456974-0]
[net.hp.tr.webservice.referenceimplcustomer.resource.CustomersResource]
[Organization: Travelocity] [Client: AA] [Location o
Hi,
I hope I understood your question correctly - did you describe your table?
Like
"create TABLE YOURTABLE (row1 STRING, row2 STRING, row3 STRING) ROW FORMAT
DELIMITED FIELDS TERMINATED BY 'YOUR TERMINATOR' STORED AS TEXTFILE;"
row* = names of your choice; for the datatypes, look at the documentation.
After
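Presumably the next step is loading the file into that table; a sketch with a
made-up local path:

```sql
-- Load a local delimited file into the table declared above.
LOAD DATA LOCAL INPATH '/tmp/yourdata.txt' INTO TABLE YOURTABLE;
```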