/
@marcosluis2186 http://twitter.com/marcosluis2186
http://www.uci.cu/
--
Jay Vyas
MMSB/UCHC
Hmmm... I always make this mistake on my hadoop VM -- trying to set
parameters which require XML settings via the conf.setInt(...) API at
runtime, which sometimes has no effect.
How can we know (without having to individually troubleshoot each parameter)
which parameters CAN versus CANNOT be set
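For what it's worth, the two cases I usually check (property names below are only illustrative): anything marked <final>true</final> in the cluster's *-site.xml files cannot be overridden from the client-side Configuration, and daemon-side settings are only read when the daemon starts, so setting them in the job conf at runtime does nothing either way. A minimal hedged sketch:

    import org.apache.hadoop.conf.Configuration;

    public class ConfOverrideSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Job-scoped property: the framework reads this from the job's conf,
            // so the override takes effect (unless the cluster marked it final).
            conf.setInt("mapreduce.task.io.sort.mb", 200);
            // Daemon-scoped property: only read by the datanode at startup,
            // so setting it here at runtime does nothing.
            conf.setInt("dfs.datanode.handler.count", 20);
            System.out.println(conf.get("mapreduce.task.io.sort.mb"));
        }
    }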
VM.
Replacing the
1.4 jar with the 1.7 does seem to fix the problem but this doesn't
seem too
sane. Hopefully there is a better alternative.
Thanks!
--
Harsh J
--
Jay Vyas
MMSB/UCHC
Presumably, if you have a reasonable number of cores - speeding the cores
up will be better than forking a task into smaller and smaller chunks -
because at some point the overhead of multiple processes would be a
bottleneck - maybe due to streaming reads and writes? I'm sure each and
every
analysis rather than any processing on the files
themselves.
In other words, what I really want is a distributed, resilient, scalable
filesystem.
Is Hadoop suitable if we just use this facility, or would I be misusing
it
and inviting grief?
M
--
Harsh J
--
Jay Vyas
MMSB/UCHC
and Reducers in a job?
- What are the performance characteristics and best practices of using Hadoop
counters? I am not sure whether using Hadoop counters too heavily will cause a
performance downgrade for the whole job?
regards,
Lin
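As far as I recall, counters are just long values aggregated through task status updates, so a small, fixed set of them is cheap; it's large numbers of distinct counter names that hurt (there is a framework limit on how many a job may define). A minimal hedged sketch, with a made-up group and counter name:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CountingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        // Group and counter names are illustrative; incrementing per record is
        // cheap for a small, fixed set of counters.
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            if (value.getLength() == 0) {
                context.getCounter("MyApp", "EMPTY_LINES").increment(1);
            }
        }
    }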
--
Bertrand Dechoux
--
Jay Vyas
http://jayunit100
as MapReduce coding.
I would like to subscribe to the mailing list.
Is it possible that I get mails only when a response is provided to the
queries I post?
Thanks and regards !
--
Jay Vyas
http://jayunit100.blogspot.com
Hive: Know SQL internals - how joins work, data structures and disk
algorithms, etc., and how those would be implemented in MapReduce. Know
what a projection, aggregation, etc. is.
Hadoop: Know how terasort works, know how word count works, and know
why Java serialization is non-ideal.
Amazon has a really cheap, large-scale backup solution called Glacier, which
is good if you're just backing up for the sake of archival in emergencies.
If you need the archival to be performant, then you might want to just
consider a higher replication rate.
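If the higher-replication route is interesting, a minimal hedged sketch of bumping the factor on an existing archival file (path and factor are made up; replication is a per-file attribute, so a directory needs a recursive walk or hadoop fs -setrep -R):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RaiseReplication {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Path and factor are illustrative; this changes the target replication
            // of one existing file, and the namenode re-replicates in the background.
            fs.setReplication(new Path("/archive/backups/part-00000"), (short) 5);
        }
    }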
--
Jay Vyas
http://jayunit100.blogspot.com
What do you mean by immutable? Do you mean non-modifiable, maybe? Immutable
implies that they can't be deleted.
Jay Vyas
MMSB
UCHC
On Nov 8, 2012, at 5:28 PM, Mohammad Tariq donta...@gmail.com wrote:
Files are immutable once written into HDFS. And touchz creates a file of
0 length.
Wow, that's an awesome trick! Okay, thanks.
Jay Vyas
MMSB
UCHC
On Nov 13, 2012, at 3:56 AM, Bertrand Dechoux decho...@gmail.com wrote:
You should look at the job conf file.
You will see that indeed the classes for the mapper and reducer are explicitly
written.
So if you generate the class
Hmmm. What do you mean by the wrong configuration file? How could that ever happen?
Jay Vyas
On Nov 13, 2012, at 10:25 AM, Mark Kerzner mark.kerz...@shmsoft.com wrote:
Exactly! I found the right one, and it is 80.
Thank you,
Mark
On Tue, Nov 13, 2012 at 10:23 AM, Serge Blazhiyevskyy
cluster. Maybe, for example, data nodes could log the amount of time spent
on I/O for certain files as a way of reporting whether or not
defragmentation needed to be run on a particular node in a cluster.
--
Jay Vyas
http://jayunit100.blogspot.com
This question is fundamentally flawed: it assumes that a mapper will ask for
anything.
The mapper class's run method reads from a record reader. The question you
really should ask is:
How does a RecordReader read records across block boundaries?
Jay Vyas
http://jayunit100.blogspot.com
Hmm... so when a record reader calls fs.open(...), I guess I'm looking for
an example of how the input stream is created... ?
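Since nobody sketched it in the thread, here is a rough, hedged approximation of what a text-style RecordReader's initialize() does (simplified, not the actual LineRecordReader source): open the whole file, seek to the split's start offset, and, unless the split starts at offset 0, throw away the first partial line, because the previous split's reader reads past its own end to finish that line.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;
    import org.apache.hadoop.util.LineReader;

    // Simplified sketch only -- not the real LineRecordReader implementation.
    public class SketchLineReaderInit {
        private FSDataInputStream in;
        private LineReader reader;
        private long start, pos, end;

        public void initialize(InputSplit genericSplit, TaskAttemptContext context)
                throws IOException {
            FileSplit split = (FileSplit) genericSplit;
            Configuration conf = context.getConfiguration();
            Path file = split.getPath();
            start = split.getStart();
            end = start + split.getLength();

            // fs.open() gives a seekable stream over the *whole* file; block
            // boundaries are invisible here -- the DFS client fetches whichever
            // blocks back the byte range we actually read.
            FileSystem fs = file.getFileSystem(conf);
            in = fs.open(file);
            in.seek(start);
            reader = new LineReader(in, conf);

            // If we are not at the beginning of the file, discard the first
            // (probably partial) line; the previous split's reader reads past
            // its own end offset to pick that line up.
            pos = start;
            if (start != 0) {
                pos += reader.readLine(new Text());
            }
        }
    }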
...@gmail.com wrote:
Yes. another big data, data scientist, no ops, devops, cloud computing
specialist is born. Thank goodness we have multiple choice tests to
identify the best coders and administrators.
--
Jay Vyas
http://jayunit100.blogspot.com
of the FileSystem class simply seems to indicate
the working directory for a given filesystem as set by applications.
They don't seem very related per se, unless I am missing something?
Thanks
Hemanth
On Tue, Jan 15, 2013 at 2:54 AM, Jay Vyas jayunit...@gmail.com wrote:
Hi guys: What
Ah okay. So - in default hadoop dfs, the workingDir is (I believe)
/user/hadoop/, because as I recall, when putting a file into hdfs, that
seems to be where the files naturally end up if there is no path
specified.
Is there a way to have the Configuration report whether or not it was able
to find the default configuration resources? It looks at the moment
like it simply prints out all the resources it *wants* to find, but it
doesn't actually report the files which it *did* find on the classpath.
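Not exactly a report, but a hedged workaround sketch: Configuration.getResource(name) returns the classpath URL for a named resource (null if it wasn't found), and toString() on the Configuration lists the resources that were actually loaded. The resource names below are just the usual defaults:

    import java.net.URL;
    import org.apache.hadoop.conf.Configuration;

    public class ConfResourceCheck {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            for (String name : new String[] {"core-default.xml", "core-site.xml"}) {
                URL found = conf.getResource(name);   // null if not on the classpath
                System.out.println(name + " -> " + (found == null ? "NOT FOUND" : found));
            }
            System.out.println(conf);  // toString() lists the loaded resources
        }
    }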
()?
Thanks,
Tom
On Wed, Jan 16, 2013 at 4:33 PM, Jay Vyas jayunit...@gmail.com wrote:
Hi guys:
I've finally extracted my problem of loading a special filesystem
into a unit test.
Below, clearly, I'm creating a raw configuration and adding a single
resource
to it (core-site.xml
...@ezako.com wrote:
conf.addResource(file.getAbsoluteFile().toURI().toURL());
--
Jay Vyas
http://jayunit100.blogspot.com
).
Julien
2013/1/17 Jay Vyas jayunit...@gmail.com
Good catch with that string.length() - you're right, that was a silly
mistake --- sorry, I'm not sure what I was thinking; it was a late night
:)
In any case, the same code with file.exists() fails... I've validated
that path many ways
local location.. it is pointing to hdfs..
What am I doing wrong?
For reference, I am trying to run this code:
http://kickstarthadoop.blogspot.com/2011/09/joins-with-plain-map-reduce.html
Thanks
--
Jay Vyas
http://jayunit100.blogspot.com
I don't think you can do an embarrassingly parallel sort of a randomly
ordered file without merging results.
However, if you know that the file is pseudo-ordered:
1123
1232
1000
19991019
20200222
30111
3000
Then you can (maybe) sort the individual blocks in mappers using
well.. ok... I guess you could have a 1TB block, do an in-place sort on the
file, write it to a tmp directory, and then spill the records in order or
something. At that point you might as well not use hadoop.
michael_se...@hotmail.com wrote:
Why do you need a 1TB block?
On Feb 15, 2013, at 1:29 PM, Jay Vyas jayunit...@gmail.com wrote:
well.. ok... i guess you could have a 1TB block do an in place sort on the
file, write it to a tmp directory, and then spill the records in order or
something
Wow, that's very heavyweight and difficult to modify. Why not graphviz, or
generating the diagrams from some other text format?
On Feb 25, 2013, at 4:11 AM, David Parks davidpark...@yahoo.com wrote:
We’ve taken to documenting our Hadoop jobs in a simple visual manner using
PPT (attached
:1522)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3821)
--
Jay Vyas
http://jayunit100.blogspot.com
message and, also, maybe I shouldn't
be using /usr/lib/hadoop/mapred as an actively written directory.
On Wed, Mar 6, 2013 at 11:21 AM, Jay Vyas jayunit...@gmail.com wrote:
Hi guys: I'm getting an odd error involving a file called toBeDeleted.
I've never seen this - somehow it's blocking my task
Hi guys:
Is there a Maven repo for hadoop's 3.0.0 trunk build?
Clearly the hadoop pom.xml allows us to build hadoop from scratch and
installs it as 3.0.0-SNAPSHOT -- but it's not clear whether there is a
published version of this
snapshot jar somewhere.
--
Jay Vyas
http://jayunit100.blogspot.com
:
https://repository.apache.org/content/groups/snapshots
-Giri
On Sat, Apr 6, 2013 at 2:00 PM, Harsh J ha...@cloudera.com wrote:
I don't think we publish nightly or rolling jars anywhere on maven
central from trunk builds.
On Sun, Apr 7, 2013 at 2:17 AM, Jay Vyas jayunit
Hmmm.. maybe I'm missing something.. but (@bjorn) why would you use hdfs as a
replacement for the distributed cache?
After all - the distributed cache is just a file with replication over the
whole cluster, which isn't in hdfs. Can't you just make the cache size big and
store the file there?
into individual tasks
So, my (related) questions are:
Is there a way to start a job directly from a job.xml file?
What components depend on and read the job.xml file?
Where is the job.xml defined/documented (if anywhere)?
--
Jay Vyas
http://jayunit100.blogspot.com
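On the first question, one hedged sketch with the old mapred API: a JobConf can be constructed straight from a job.xml and then submitted like any other conf. This assumes the job.xml already carries the mapper/reducer, input and output settings; the path below is made up. IIRC the MR1 CLI also had a "hadoop job -submit <job-file>" form.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class SubmitFromJobXml {
        public static void main(String[] args) throws Exception {
            // Illustrative path: a job.xml saved from a previous submission,
            // assumed to already contain all mapper/reducer/input/output properties.
            JobConf conf = new JobConf(new Path("/tmp/saved-job.xml"));
            JobClient.runJob(conf);   // blocks until the job completes
        }
    }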
the wrong codes, or is there any other way to build
fuse-dfs?
Please guide me.
Thanks
regards
--
Jay Vyas
http://jayunit100.blogspot.com
/some_location/file /new_location/
Thanks, your responses are appreciated.
-- Kay
--
Jay Vyas
http://jayunit100.blogspot.com
How many intermediate keys? If small enough, you can keep them in memory. If
large, you can just wait for the job to finish and siphon them into your job as
input with the MultipleInputs API.
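To make the second option concrete, a hedged sketch of the follow-up job's driver (paths are placeholders, and the plain identity Mapper is used for both inputs just to keep the sketch compilable; you'd substitute a real mapper per input):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SecondJobDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "second-pass");
            job.setJarByClass(SecondJobDriver.class);
            // First input: the previous job's output (the intermediate keys);
            // second input: the original data. Paths and mappers are placeholders.
            MultipleInputs.addInputPath(job, new Path("/out/job1"),
                    TextInputFormat.class, Mapper.class);
            MultipleInputs.addInputPath(job, new Path("/data/original"),
                    TextInputFormat.class, Mapper.class);
            FileOutputFormat.setOutputPath(job, new Path("/out/job2"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }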
On Apr 20, 2013, at 10:43 AM, Vikas Jadhav vikascjadha...@gmail.com wrote:
Hello,
Can anyone help
OutputFormat is going to have to
find the corresponding file to append.
On Sun, Apr 21, 2013 at 10:54 PM, YouPeng Yang yypvsxf19870...@gmail.com wrote:
Hi All
Can I append an MR output file to an existing file on HDFS?
I'm using CDH4.1.2 vs MRv2
Regards
--
Jay Vyas
http
reduce
dependencies and I am not sure which to pick. Are there other dependencies
needed (such as JobConf)? What are the imports needed? During the
construction of the configuration, what heuristics are used to find the
configuration for the Hadoop cluster?
Thank you.
--
Jay Vyas
traffic... but maybe (1) or (2) will be a precise way to use
partitions as a poor man's block.
Just a thought - not sure if anyone has tried (1) or (2) before in order to
simulate blocks and increase locality by utilizing the partition API.
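To make (2) concrete, a hedged sketch of what such a partitioner might look like: a key prefix stands in for a "logical block", so related records land in the same reduce partition / output file. Whether this actually improves locality depends on where the scheduler runs the reducer, which nothing here controls; the prefix length is arbitrary.

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Sketch only: routes records whose keys share a prefix to the same reducer,
    // so a "logical block" of related keys ends up in one partition/output file.
    public class PrefixPartitioner extends Partitioner<Text, Text> {
        @Override
        public int getPartition(Text key, Text value, int numPartitions) {
            String k = key.toString();
            String prefix = k.length() >= 4 ? k.substring(0, 4) : k;
            return (prefix.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }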
--
Jay Vyas
http://jayunit100.blogspot.com
/
cloudfront.blogspot.com
On Wed, May 1, 2013 at 12:16 AM, Jay Vyas jayunit...@gmail.com wrote:
Hi guys:
I'm wondering - if I'm running mapreduce jobs on a cluster with large
block sizes - can I increase performance with either:
1) A custom FileInputFormat
2) A custom partitioner
3) -DnumReducers
Clearly
it be a bottleneck from a 'disk'
point of view? Are you not going away from the distributed paradigm?
Am I taking it in the correct way? Please correct me if I am getting it
wrong.
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Wed, May 1, 2013 at 12:34 AM, Jay Vyas jayunit
it considerably high will definitely give you some boost.
But it'll require a high level of tinkering.
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Wed, May 1, 2013 at 1:29 AM, Jay Vyas jayunit...@gmail.com wrote:
Yes it is a problem at the first stage. What I'm wondering
to implement on a single node:
cat ~/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
On Thu, May 16, 2013 at 11:31 AM, Jay Vyas jayunit...@gmail.com wrote:
Yes it is required -- in pseudo-distributed mode the jobtracker is not
necessarily aware that the task trackers / data nodes are on the same
Hi! a few weeks ago I had the same question... Tried a first iteration at
documenting this by going through the classes starting with key/value pairs in
the blog post below.
http://jayunit100.blogspot.com/2013/04/the-kv-pair-salmon-run-in-mapreduce-hdfs.html
Note it's not perfect yet but I
efficiently using MRv2 jobs.
thanks,
Rahul
--
Jay Vyas
http://jayunit100.blogspot.com
...@yahoo.in wrote:
Just wondering if anyone has any documentation or references to any
articles how to simulate a multi node cluster setup in 1 laptop with
hadoop
running on multiple ubuntu VMs. any help is appreciated.
Thanks
Sai
--
Jay Vyas
http://jayunit100.blogspot.com
the data is located.
What are these interfaces, and where are they in the source code? Is
there any manual for the interfaces?
Regards,
Mahmood
--
Jay Vyas
http://jayunit100.blogspot.com
is and if I need to set any
particular env variables when doing so.
--
Jay Vyas
http://jayunit100.blogspot.com
(DataNode.java:1575)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1598)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1751)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1772)
--
Jay Vyas
http
someone help me out? It would be much appreciated. J
Thanks in advance,
Andrew
--
Jay Vyas
http://jayunit100.blogspot.com
)
at
org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
--
Jay Vyas
http://jayunit100.blogspot.com
”
configuration in client with the HDFS.
Thanks
Devaraj k
From: Jay Vyas [mailto:jayunit...@gmail.com]
Sent: 12 July 2013 04:12
To: common-u...@hadoop.apache.org
Subject: Staging directory ENOTDIR error.
Hi , I'm getting an ungoogleable
True that it deserves some posting on a solr list, but I think it's still partially
relevant...
The SolrInputFormat and SolrOutputFormat handle this for you and will be used
in your map reduce jobs.
They will output one core per reducer, where each reducer corresponds to a
core.. This is
Then is this a bug? Synchronization in the absence of any race condition is
normally considered bad.
In any case I'd like to know why this writer is synchronized whereas the other
ones are not. That is, I think, the point at issue: either the other writers
should be synchronized or else this one
an e-Science application to run on Hadoop?
Thanks.
Felipe
--
*--
-- Felipe Oliveira Gutierrez
-- felipe.o.gutier...@gmail.com
-- https://sites.google.com/site/lipe82/Home/diaadia*
--
Jay Vyas
http://jayunit100.blogspot.com
between this and the
ambari REST services, but not sure where to start digging.
I want to run some REST calls at the end of some jobs to query how many
tasks failed, etc...
Hopefully, I could get this in JSON rather than scraping HTML.
Thanks!
--
Jay Vyas
http://jayunit100.blogspot.com
--
Jay Vyas
http://jayunit100.blogspot.com
IIRC sequence files can be concatenated as-is and read as one large file,
but maybe I'm forgetting something.
inaccessible for developers, or am I missing something?
regards
tmp
--
Jay Vyas
http://jayunit100.blogspot.com
The way we have gotten around this in the past is extending and then
copying the private code and creating a brand new implementation.
On Thu, Sep 26, 2013 at 10:50 AM, Jay Vyas jayunit...@gmail.com wrote:
This is actually somewhat common in some of the hadoop core classes:
Private
the input splits.
Any help please.
Thanks
Sai
--
Jay Vyas
http://jayunit100.blogspot.com
--
Jay Vyas
http
, but depending on the underlying
filesystem, the semantics of this last-modified time might vary.
Any thoughts on this?
--
Jay Vyas
http://jayunit100.blogspot.com
I believe there is a FUSE mount for hdfs which will allow you to open files
normally in your streaming app rather than requiring the Java API.
Also consider that for media and highly available binary data for a front end, I
would guess that hdfs might be overkill because of the
is packaged into
a jar anymore... but I fear it is not.
--
Jay Vyas
http://jayunit100.blogspot.com
version is really important here..
- If 1.x, then where? (NN, JT, TT?)
- If 2.x, then where? (AM, NM, ...?) -- probably less likely here, since
the resources are ephemeral.
I know that some older 1.x versions had an issue with the jobtracker having
an ever-expanding hashmap or something like
OS? So that if a user is authenticated by LDAP,
they will also be able to access the HDFS directory?
Regards
--
Jay Vyas
http://jayunit100.blogspot.com
are there any ways to plug in an alternate distributed cache implementation (i.e.
when nodes of a cluster already have an nfs mount or other local data
service...)?
--
Jay Vyas
http://jayunit100
it uploads
files. So I am only looking to trace fs commands through the DFS shell. I
believe this should require less work in debugging than actually going
to the mapred VMs!
--
Best Regards,
Karim Ahmed Awara
On Mon, Dec 16, 2013 at 5:57 PM, Jay Vyas jayunit...@gmail.com wrote:
Excellent
I recently found a pretty simple and easy way to set ldap up for my machines on
rhel and wrote it up using jumpbox and authconfig.
If you are in the cloud and only need a quick, easy ldap idh and nsswitch
setup, this is, I think, the easiest / cheapest way to do it.
I know rhel and fedora come
印 liyin.lian...@aliyun-inc.com wrote:
What is the difference between Hdfs.java and DistributedFileSystem.java
in Hadoop2?
Best Regards,
Liyin Liang
Tel: 78233
Email: liyin.lian...@alibaba-inc.com
--
Jay Vyas
http://jayunit100.blogspot.com
What is happening when you remove the shutdown hook? Is that supposed
to trigger an exception -
Hi folks:
At the **end** of a successful job, I'm getting some strange stack traces.
I see this when using pig; however, it doesn't seem to be pig-specific from
the stacktrace. Rather, it appears that the job client is attempting to do
something funny.
Anyone ever see this sort of exception in
issue with hadoop and
pig.
I'm using Java version - *1.6.0_31*
Please help me out.
--
Regards,
Viswa.J
--
Jay Vyas
http://jayunit100.blogspot.com
into this issue.
** Is hadoop fs -put inherently slower than a unix cp, regardless
of filesystem -- and if so, why? **
--
Jay Vyas
http://jayunit100.blogspot.com
No, I'm using a glob pattern; it's all done in one put statement
On Tue, Jan 28, 2014 at 9:22 PM, Harsh J ha...@cloudera.com wrote:
Are you calling one command per file? That's bound to be slow as it
invokes a new JVM each time.
On Jan 29, 2014 7:15 AM, Jay Vyas jayunit...@gmail.com wrote
)
Add a file to be localized,
and it works fine - the same way you were using the DC before.. Well, I am not
sure what would be the best answer, but if you are trying to use the DC, I was
able to do it with the Job class itself.
Regards
Prav
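For the archives, the Job-class route described above is roughly this hedged sketch (the HDFS URI and the symlink fragment are placeholders); it replaces the old DistributedCache.addCacheFile(...) style calls:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class CacheFileDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "dc-example");
            job.setJarByClass(CacheFileDriver.class);
            // Localizes the HDFS file to each task's working directory; the
            // fragment ("#lookup.dat") becomes the local symlink name.
            job.addCacheFile(new URI("hdfs:///data/lookup.dat#lookup.dat"));
            // ... set mapper/reducer, input and output paths, then submit ...
        }
    }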
On Wed, Jan 29, 2014 at 9:27 PM, Jay Vyas jayunit
this communication in error, please contact the sender immediately
and delete it from your system. Thank You.
--
Jay Vyas
http://jayunit100.blogspot.com
I'm noticing that resource localization is much more complex in YARN than in
MR1; in particular, the timestamps need to be identical, or else an
exception is thrown.
I never saw that in MR1.
How did MR1 JobTrackers handle resource localization differently than MR2
App Masters?
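For context, the timestamp check comes from how the client/AM describes each resource to the NodeManager: the size and modification time recorded in the LocalResource must match the file that is later downloaded. A hedged sketch of the usual pattern when doing this by hand (the class and method name here are made up; the YARN record calls are the standard ones):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.yarn.api.records.LocalResource;
    import org.apache.hadoop.yarn.api.records.LocalResourceType;
    import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
    import org.apache.hadoop.yarn.util.ConverterUtils;

    public class LocalResourceSketch {
        // The size and modification time recorded here must match the file the
        // NodeManager later downloads, otherwise localization throws an exception.
        public static LocalResource describe(Path hdfsPath, Configuration conf) throws Exception {
            FileSystem fs = hdfsPath.getFileSystem(conf);
            FileStatus status = fs.getFileStatus(hdfsPath);
            return LocalResource.newInstance(
                    ConverterUtils.getYarnUrlFromPath(hdfsPath),
                    LocalResourceType.FILE,
                    LocalResourceVisibility.APPLICATION,
                    status.getLen(),
                    status.getModificationTime());
        }
    }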
--
Jay Vyas
http
is the simplest way to do this on the cloud?
Is there any way to do it for free?
Thanks in advance
--
Jay Vyas
http://jayunit100.blogspot.com
)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Where can I find the root cause of the non-zero exit code?
--
Jay Vyas
http://jayunit100
that
you can check under the container's work directory after it fails?
On Fri, Feb 14, 2014 at 9:46 AM, Jay Vyas jayunit...@gmail.com wrote:
I have a linux container that dies. The nodemanager logs only say:
WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor:
Exception
Feb 4 17:27 ..
drwx--x--- 2 htf htf 4096 Feb 4 17:27 .
-rw-rw-r-- 1 htf htf 50471 Feb 4 17:31 syslog
Regards
./g
-Original Message-
From: Jay Vyas [mailto:jayunit...@gmail.com]
Sent: Friday, February 14, 2014 7:02 AM
To: user@hadoop.apache.org
Cc: user
--
Jay Vyas
http://jayunit100.blogspot.com
is a must to parse
each log line. It means the log file could NOT simply be split, otherwise the
second split would lose the file format information.
How could each mapper get the first few lines in the file?
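One hedged workaround, at the cost of a small extra read per task: in setup(), open the split's backing file directly and read the header lines yourself, regardless of where the split starts. A sketch, with the header line count and the pass-through map() purely illustrative:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class HeaderAwareMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final List<String> header = new ArrayList<String>();

        @Override
        protected void setup(Context context) throws IOException {
            // Open the file backing this split and read its first few lines,
            // even if this split starts somewhere in the middle of the file.
            FileSplit split = (FileSplit) context.getInputSplit();
            Path file = split.getPath();
            FileSystem fs = file.getFileSystem(context.getConfiguration());
            BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(file)));
            try {
                for (int i = 0; i < 3; i++) {        // 3 header lines is illustrative
                    String line = in.readLine();
                    if (line == null) break;
                    header.add(line);
                }
            } finally {
                in.close();
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Use "header" to interpret this record; pass-through shown here.
            context.write(new Text(header.isEmpty() ? "" : header.get(0)), value);
        }
    }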
--
Harsh J
--
Jay Vyas
http://jayunit100.blogspot.com
of a few
things, but as far as installation is concerned, it should be easily doable.
Regards
Prav
On Wed, Mar 19, 2014 at 3:41 PM, sri harsha rsharsh...@gmail.com wrote:
Hi all,
is it possible to install MongoDB on the same VM which hosts hadoop?
--
amiable harsha
--
Jay Vyas
http
.
--
Jay Vyas
http://jayunit100.blogspot.com
see a gain in using a more efficient data
serialisation format for data files.
On Sun, Mar 30, 2014 at 9:09 PM, Jay Vyas jayunit...@gmail.com wrote:
Those are all great questions, and mostly difficult to answer. I haven't
played with serialization APIs in some time, but let me try to give some
using Java, btw.
Thank you,
Natalia Connolly
--
Harsh J
--
Jay Vyas
http://jayunit100.blogspot.com
works fine if Kerberos
authentication is disabled. Any idea what the problem could be?
Thanks,
Terance.
--
Jay Vyas
http://jayunit100.blogspot.com
of? Perhaps some permissions issues?
Thank you,
Natalia
--
Jay Vyas
http://jayunit100.blogspot.com
attempt.RMAppAttemptImpl:
appattempt_1398370674313_0004_01 State change from SUBMITTED to
SCHEDULED
--
Jay Vyas
http://jayunit100.blogspot.com
appattempt_1398370674313_0004_01 to scheduler from user: yarn
14/04/24 16:20:33 INFO attempt.RMAppAttemptImpl:
appattempt_1398370674313_0004_01 State change from SUBMITTED to SCHEDULED
--
Jay Vyas
http://jayunit100.blogspot.com
Sounds odd. So (1) you got a FileNotFound exception and (2) you fixed it by
commenting out memory-specific config parameters?
Not sure how that would work... Any other details, or am I missing something
else?
On May 11, 2014, at 4:16 AM, Tao Xiao xiaotao.cs@gmail.com wrote:
I'm sure
You can either use a SAN to back your datanodes, or implement a custom FileSystem
over your SAN storage. Either would have different drawbacks depending on your
requirements.
helpful material is appreciated.
Manar,
--
--
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com
--
jay vyas
HBase is not hardcoded to hdfs: it works on any file system that implements the
file system interface; we've run it on glusterfs, for example. I assume some
have also run it on s3 and other alternative file systems.
** However **
For best performance, direct block IO hooks on hdfs can boost
options?
Also, I have searched Safari Books Online, including Rough Cuts, but am not
seeing books for the 2.4 release. If you know of a book for this release,
please share.
Thank you.
--
jay vyas