Re: CapacityScheduler questions - (AM) preemption

2019-10-11 Thread Lars Francke
ate RUNNING yarn.scheduler.capacity.root.foo.user-limit-factor 1 On Wed, Oct 9, 2019 at 10:56 PM Lars Francke wrote: > Sunil, > > thank you for the answer. > > This is HDP 3.1 based on Hadoop 3.1.1. > No preemption defaults were changed I believe. T
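The snippet above quotes the `yarn.scheduler.capacity.root.foo.user-limit-factor` property. For context, that setting lives in `capacity-scheduler.xml` and would look roughly like this (the queue name `root.foo` is taken from the snippet; the value 1 is the quoted value, which is also the default):

```xml
<!-- Sketch of the property referenced in the thread; a user-limit-factor
     of 1 caps any single user at the queue's configured capacity. -->
<property>
  <name>yarn.scheduler.capacity.root.foo.user-limit-factor</name>
  <value>1</value>
</property>
```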

Re: CapacityScheduler questions - (AM) preemption

2019-10-09 Thread Lars Francke
> which are added) > > - Sunil > > On Wed, Oct 9, 2019 at 6:23 PM Lars Francke > wrote: > >> Hi, >> >> I've got a question about behavior we're seeing. >> >> Two queues: Preemption enabled, CapacityScheduler (happy to provide more >> config if

CapacityScheduler questions - (AM) preemption

2019-10-09 Thread Lars Francke
Hi, I've got a question about behavior we're seeing. Two queues: Preemption enabled, CapacityScheduler (happy to provide more config if needed), 50% of resources to each Submit a job to queue 1 which uses 100% of the cluster. Submit a job to queue 2 which doesn't get allocated because there are
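The scenario described (two 50% queues, queue 1 using 100%, queue 2 starved) can be sketched with a toy model of what a preemption monitor such as the CapacityScheduler's ProportionalCapacityPreemptionPolicy computes. The function below is an illustrative simplification, not Hadoop's actual code; the `dead_zone` parameter stands in for the policy's over-capacity tolerance and its name and value are assumptions:

```python
# Simplified sketch (not Hadoop source) of how a capacity preemption
# policy sizes the resources to reclaim from an over-capacity queue.
def preemption_target(cluster, guaranteed_pct, used, demand, dead_zone=0.05):
    """Resources to preempt so a starved queue can reach its guarantee.

    cluster        - total cluster resources (e.g. containers)
    guaranteed_pct - the over-capacity queue's configured share
    used           - what that queue currently holds
    demand         - what the starved queue is asking for
    """
    guaranteed = cluster * guaranteed_pct
    # Only resources above guarantee (plus a small tolerance) are fair game.
    over = max(0, used - guaranteed * (1 + dead_zone))
    # Never preempt more than the starved queue actually requested.
    return min(over, demand)

# Two queues at 50% each; queue 1 holds all 100 containers,
# queue 2 demands 20 containers.
print(preemption_target(cluster=100, guaranteed_pct=0.5, used=100, demand=20))
```

In this model, 20 containers would be preempted for queue 2; if nothing is preempted in practice (as the thread reports), the preemption monitor itself, or a disable-preemption flag on the queue, is worth checking.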

Re: BlockPlacementPolicy question with hierarchical topology

2019-07-11 Thread Lars Francke
; > https://www.slideshare.net/Hadoop_Summit/disaster-recovery-experience-at-cacib-hardening-hadoop-for-critical-financial-applications > > Thanks, > - Takanobu > > > From: Takanobu Asanuma > Sent: Thursday, July 4, 2019 8:29:23 PM > To: Lar

Re: BlockPlacementPolicy question with hierarchical topology

2019-07-04 Thread Lars Francke
aultTolerant.java > > Thanks, > - Takanobu > ____ > From: Lars Francke > Sent: Thursday, July 4, 2019 18:15 > To: hdfs-user@hadoop.apache.org > Subject: BlockPlacementPolicy question with hierarchical topology > > Hi, > > I have a customer who wants

BlockPlacementPolicy question with hierarchical topology

2019-07-04 Thread Lars Francke
Hi, I have a customer who wants to make sure that copies of his data are distributed amongst datacenters. So they are using rack names like this /dc1/rack1, /dc1/rack2, /dc2/rack1 etc. Unfortunately, the BlockPlacementPolicyDefault seems to place all blocks on /dc1/* sometimes. Is there a way
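The behavior described follows from the fact that the default placement policy only reasons about racks, not about a datacenter level above them. A toy model of the rack choices (first replica on the writer's rack, the other two together on one other rack) shows why both remote replicas can end up in the same DC; this is a deliberately simplified sketch, not the HDFS implementation:

```python
import random

# Sketch of BlockPlacementPolicyDefault's rack choices (simplified):
# replica 1 on the writer's rack, replicas 2 and 3 together on some
# other rack. The policy treats "/dc1/rack2" and "/dc2/rack1" as
# equally good "other racks" - it has no notion of datacenters.
def default_placement(writer_rack, all_racks, rng):
    other = rng.choice([r for r in all_racks if r != writer_rack])
    return [writer_rack, other, other]

racks = ["/dc1/rack1", "/dc1/rack2", "/dc2/rack1"]
placement = default_placement("/dc1/rack1", racks, random.Random(42))
# Nothing stops both remote replicas landing in dc1.
print(placement)
```

A custom BlockPlacementPolicy (or the fault-tolerant variant mentioned later in the thread) is what actually enforces cross-DC spread.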

Re: Failover & Cold start time and block reports

2019-05-20 Thread Lars Francke
Just pinging to see if anyone has any insight here? On Mon, May 13, 2019 at 10:31 PM Lars Francke wrote: > Hi, > > I'm working with a few clusters of 100+ nodes and I've been wondering how > exactly the failover, as well as a cold start, works in respect to the > block reports.

Failover & Cold start time and block reports

2019-05-13 Thread Lars Francke
Hi, I'm working with a few clusters of 100+ nodes and I've been wondering how exactly the failover, as well as a cold start, works in respect to the block reports. I sometimes see failover times of 15-45 minutes waiting in the safe mode for all blocks to report in. Datanodes usually send a
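The long safe-mode waits described here come down to a threshold check: the NameNode stays in startup safe mode until a configured fraction of known blocks has at least one reported replica. A rough model of that check (function name assumed; the 0.999 default corresponds to `dfs.namenode.safemode.threshold-pct`):

```python
# Rough model (assumed names, not HDFS source) of the startup safe-mode
# exit condition: wait until enough block reports have arrived.
def can_leave_safemode(reported_blocks, total_blocks, threshold_pct=0.999):
    """Stay in safe mode until this fraction of known blocks is reported,
    mirroring dfs.namenode.safemode.threshold-pct."""
    if total_blocks == 0:
        return True
    return reported_blocks / total_blocks >= threshold_pct

# With 10M blocks, even 9,989,999 reported blocks keep the NN waiting,
# so a handful of slow DataNodes can dominate cold-start time.
print(can_leave_safemode(9_989_999, 10_000_000))
```

This is why a few straggling DataNodes (or a long block-report interval) can stretch cold starts into the 15-45 minute range the thread mentions.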

Re: Git tag policy

2018-02-26 Thread Lars Francke
<cdoug...@apache.org> wrote: > On Tue, Feb 20, 2018 at 3:09 AM, Lars Francke <lars.fran...@gmail.com> > wrote: > > Is this intentional or just oversight/inconsistencies? > > The release candidate (RC) tags are created during votes. They can > probably be cleaned up after th

Git tag policy

2018-02-20 Thread Lars Francke
Hi, can anyone tell me what the policies are around tags in git these days? I see from the HowToRelease[1] wiki that everything should be tagged in "rel/" but looking at the last releases I see: * release-3.0.1-RC0 * rel/release-3.0.0 ... * release-3.0.0-RC1 ... * rel/release-3.0.0-alpha4 etc.

CapacityScheduler vs. FairScheduler

2016-06-03 Thread Lars Francke
Hi, I've been using Hadoop for years and have always just taken for granted that FairScheduler = Cloudera and CapacityScheduler = Hortonworks/Yahoo. There are some comparisons but all of them are years old and somewhat (if not entirely) outdated. The documentation doesn't really help and neither

Re: Simulating an auto-incrementing column

2010-08-10 Thread Lars Francke
Hi Tim, I had a similar need and came across https://issues.apache.org/jira/browse/HIVE-1304 but haven't got round to trying it yet. well that looks exactly like what I'm looking for. The missing link for me was this line: set mapred.reduce.tasks=1; I've used it before but I don't know why
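The reason `set mapred.reduce.tasks=1;` matters for a row-sequence UDF can be sketched outside Hive: each reducer keeps its own counter, so with more than one reducer the "auto-increment" ids collide. This is an illustrative Python model, not Hive code; the partitioning by first character is an assumption standing in for the real hash partitioner:

```python
# Sketch (illustrative, not Hive code) of why a row-sequence UDF needs
# a single reducer: every reducer keeps an independent counter, so two
# reducers both emit ids 1, 2, 3, ...
def assign_ids(rows, num_reducers):
    counters = [0] * num_reducers
    out = []
    for row in rows:
        part = ord(row[0]) % num_reducers  # stand-in for the partitioner
        counters[part] += 1                # per-reducer counter
        out.append((counters[part], row))
    return out

# With two reducers the ids repeat; with one reducer they are 1..4.
print([i for i, _ in assign_ids(["a", "b", "c", "d"], num_reducers=2)])
print([i for i, _ in assign_ids(["a", "b", "c", "d"], num_reducers=1)])
```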

Simulating an auto-incrementing column

2010-08-09 Thread Lars Francke
Hi, I have a problem and I hope someone has an idea on how to solve it. My dataset consists of just very simple key-value pairs of strings coming from PostgreSQL using Sqoop. 1) I need to count how often a key occurs - Easy 2) I need to count how often a key-value pair occurs - Easy I need to

Re: worth choosing the shortest possible column names/keys?

2010-03-12 Thread Lars Francke
Will I save a lot of space (especially if I have many small columns)? I don't have any hard numbers for you but I tested it and I remember that on a dataset of about 10-20 GB I could save about 200-500 MB (this was with compression enabled) just by not using descriptive string qualifiers that

Re: Regular expression as column qualifier

2010-03-07 Thread Lars Francke
Hi, Is it possible to use a regex as a column qualifier in the get operation? No, that's not possible. For a get[1] operation you need to specify the exact row key you are looking for and there will be at most one result. You can use a scan[2] (which may return multiple results) in combination
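The get-versus-scan distinction discussed here can be shown with a toy in-memory table. This is not the HBase API, just a model: a get needs an exact coordinate, while a scan can filter qualifiers with a regex, in the spirit of HBase's QualifierFilter with a RegexStringComparator:

```python
import re

# Toy model (not the HBase client API) of get vs. scan-with-filter.
table = {
    "row1": {"cf:tag_a": "1", "cf:tag_b": "2", "cf:other": "3"},
}

def get(row, qualifier):
    """A get: exact row key and exact qualifier only."""
    return table[row].get(qualifier)

def scan_qualifiers(pattern):
    """A scan with a qualifier regex filter: may match many columns."""
    rx = re.compile(pattern)
    return {r: {q: v for q, v in cols.items() if rx.search(q)}
            for r, cols in table.items()}

print(get("row1", "cf:tag_a"))        # exact match
print(scan_qualifiers(r"cf:tag_.*"))  # only the tag_* columns survive
```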

Re: Regular expression as column qualifier

2010-03-07 Thread Lars Francke
Thank you for the quick reply, but I'm not talking about the row key, but about the column qualifier (column key). Oh! I'm sorry - must have misread that. The answer is unfortunately the same. It is not possible to do something like that in the get operation. With the addColumn function you have to add

Re: Why windows support is critical

2010-03-01 Thread Lars Francke
I ended up creating a pseudo-distributed installation on Ubuntu in a Virtual Box. It all works fine from localhost, and I can run the shell. But I don't see how that's useful to anyone who actually wants to build a real application. I'm struggling to figure out how to connect to it from a

Re: Thrift api and binary keys

2010-02-09 Thread Lars Francke
On Tue, Feb 9, 2010 at 17:27, Saptarshi Guha saptarshi.g...@gmail.com wrote: Thank you. The status of hbase thrift is undefined? Let me ask this as a new question. You might want to take a look at HBASE-1744[1]. Work is currently being done to bring the Thrift API up to date and to match the

Re: scan hbase using thrift api with timestamp range

2010-02-06 Thread Lars Francke
Is there any way to scan an hbase table using the thrift api with a timestamp range? AFAIK, currently the thrift api is the ONLY way to access hbase from non-java languages such as C++/Python/PHP/etc. However, the thrift api seems incomplete compared to the native Java hbase client api. You have the option

Re: starting thrift generates TTransportException

2009-12-25 Thread Lars Francke
Is there any property in hbase-site.xml which I can use to sets concrete address of thrift server? Unfortunately not yet. If you are comfortable compiling your own HBase I could provide a patch for the ThriftServer that does what you need (binding to a specific address) But I guess/hope there

question about compound keys with two/multiple strings

2009-11-24 Thread Lars Francke
I have another schema design question. I hope you don't mind. My data are key-value pairs (tags on domain elements) with key and value being strings. Now I've got a keys table with a column family values and the column qualifiers being all the values for this key. Everything is fine so far. But
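One common answer to this kind of compound-key question is to length-prefix the first component so the key sorts by it and can be split back unambiguously even when the parts contain a would-be separator. This is a generic sketch under that assumption, not the approach recommended in the thread's replies:

```python
# Sketch of a compound row key built from two strings: length-prefix
# the first part (4 decimal digits, so parts up to 9999 chars) so the
# key splits back unambiguously without relying on a separator byte.
def compound_key(key, value):
    return f"{len(key):04d}{key}{value}"

def split_key(row_key):
    n = int(row_key[:4])          # length of the first component
    return row_key[4:4 + n], row_key[4 + n:]

rk = compound_key("highway", "residential")
print(rk)             # '0007highwayresidential'
print(split_key(rk))  # ('highway', 'residential')
```

The trade-off is that keys sharing a first component cluster together for scans, which is usually exactly what a key/value tag lookup wants.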

Re: question about compound keys with two/multiple strings

2009-11-24 Thread Lars Francke
If you need to be able to scan/lookup based on two different key/values, then you will most likely need duplicate tables or duplicate rows. This is common when you need to support two different lookup/read patterns. Thanks for the answer. I had hoped there was some kind of (order insensitive)

Re: LogConfigurationException: no suitable log implementation

2009-11-09 Thread Lars Francke
How did you get into this state? Are you not using the hbase scripts and the pre-defined lib/* directory? It is quite easy to run into this problem as there is no documentation about the dependencies for an HBase client. The lib directory contains jetty and jruby among others and I knew I'd

Schema questions: Best practices, versions/timestamps

2009-11-09 Thread Lars Francke
I've read numerous threads on this mailing list and I've asked several times on IRC but the answers I get are rarely the same so I'd like to try once more. I have a data model that would be a perfect match for the versions/timestamps that are available in HBase. Some say that it is perfectly