Re: hive will die or not?

2016-08-07 Thread Marcin Tustin
I think that's right. My testing (not very scientific) puts it on par with Redshift for the datasets I use. On Sunday, August 7, 2016, Edward Capriolo wrote: > A few entities going to "kill/take out/better than hive" > I seem to remember HadoopDb, Impala, RedShift ,

Re: Crate Non-partitioned table from partitioned table using CREATE TABLE .. LIKE

2016-08-07 Thread Marcin Tustin
estruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > On 7 August 2016 at 13:17, Ma

Re: Crate Non-partitioned table from partitioned table using CREATE TABLE .. LIKE

2016-08-07 Thread Marcin Tustin
Will CREATE TABLE sales5 AS SELECT * FROM SALES; not work for you? On Thu, Aug 4, 2016 at 5:05 PM, Nagabhushanam Bheemisetty < nbheemise...@gmail.com> wrote: > Hi I've a scenario where I need to create a table from partitioned table > but my destination table should not be partitioned. I won't

Re: Create table from orc file

2016-08-03 Thread Marcin Tustin
d using the latest > orc-core lib (1.1.2). That seems not to be the same implementation for orc > files access as being used in hive. > > > Thanks for all hints! > > > > Am Mittwoch, 3. August 2016, 08:45:45 CEST schrieb Marcin Tustin: > > Yes. Cr

Re: Create table from orc file

2016-08-03 Thread Marcin Tustin
Yes. Create an external table whose location contains only the orc file(s) you want to include in the table. On Wed, Aug 3, 2016 at 7:53 AM, Johannes Stamminger < johannes.stammin...@airbus.com> wrote: > Hi, > > > is it possible to write data to an orc file(s) using the hive-orc api and > to >

Re: A dedicated Web UI interface for Hive

2016-07-15 Thread Marcin Tustin
operty which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > On 14 July 2016 at 23:29, Marcin Tustin <mtus...@handyboo

Re: A dedicated Web UI interface for Hive

2016-07-14 Thread Marcin Tustin
What do you want it to do? There are at least two web interfaces I can think of. On Thu, Jul 14, 2016 at 6:04 PM, Mich Talebzadeh wrote: > Hi Gopal, > > If I recall you were working on a UI support for Hive. Currently the one > available is the standard Hadoop one on

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Marcin Tustin

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Marcin Tustin
Quick note - my experience (no benchmarks) is that Tez without LLAP (we're still not on hive 2) is faster than MR by some way. I haven't dug into why that might be. On Tue, Jul 12, 2016 at 9:19 AM, Mich Talebzadeh wrote: > sorry I completely miss your points > > I was

Re: loading in ORC from big compressed file

2016-06-21 Thread Marcin Tustin
This is because a GZ file is not splittable at all. Basically, try creating this from an uncompressed file, or even better split up the file and put the files in a directory in hdfs/s3/whatever. On Tue, Jun 21, 2016 at 7:45 PM, @Sanjiv Singh wrote: > Hi , > > I have big
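The advice above (split the file up and put the pieces in a directory) could be sketched roughly as follows. This is only an illustration: the function name `split_gzip_by_lines`, the `part-NNNNN.gz` naming, and the line-based chunking are assumptions, not anything from the thread, and it assumes one record per line (no embedded newlines). Each output part is an independent gzip file, so each can be handed to its own task once the directory is uploaded to hdfs/s3/whatever.

```python
import gzip
from pathlib import Path


def split_gzip_by_lines(src: Path, out_dir: Path, lines_per_part: int) -> list[Path]:
    """Rewrite one large gzip file as several smaller, independent gzip files.

    Hypothetical helper: assumes UTF-8 text with one record per line.
    """
    if lines_per_part < 1:
        raise ValueError("lines_per_part must be at least 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: list[Path] = []
    out = None
    try:
        # newline="" keeps the original line endings untouched.
        with gzip.open(src, "rt", encoding="utf-8", newline="") as reader:
            for index, line in enumerate(reader):
                if index % lines_per_part == 0:
                    # Start a new part every `lines_per_part` lines.
                    if out is not None:
                        out.close()
                    part = out_dir / f"part-{len(parts):05d}.gz"
                    out = gzip.open(part, "wt", encoding="utf-8", newline="")
                    parts.append(part)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return parts
```

Choosing `lines_per_part` so that each part is roughly one HDFS block in size is a reasonable starting point, though the right value depends on the cluster.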

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-30 Thread Marcin Tustin
Mich - it sounds like maybe you should try these benchmarks with alluxio abstracting the storage layer, and see how much it makes a difference. Alluxio should (if I understand it right) provide a lot of the optimisation you're looking for with in memory work. I've never used it, but I would love

NullPointerException when dropping database backed by S3

2016-05-06 Thread Marcin Tustin
Hi All, I have a database backed by an s3 bucket. When I try to drop that database, I get a NullPointerException: hive> drop database services_csvs cascade; FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.NullPointerException)

Re: Making sqoop import use Spark engine as opposed to MapReduce for Hive

2016-04-30 Thread Marcin Tustin
> > > Dr Mich Talebzadeh > > > > LinkedIn * > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* > > > > http://talebzadehmich.wordpress

Re: Making sqoop import use Spark engine as opposed to MapReduce for Hive

2016-04-30 Thread Marcin Tustin
They're not simply interchangeable. sqoop is written to use mapreduce. I actually implemented my own replacement for sqoop-export in spark, which was extremely simple. It wasn't any faster, because the bottleneck was the receiving database. Is your motivation here speed? Or correctness? On Sat,

Re: Hive footprint

2016-04-20 Thread Marcin Tustin
w latency here? Are you referring to the >>> performance of SQL against HBase tables compared to Hive. As I understand >>> HBase is a columnar database. Would it be possible to use Hive against ORC >>> to achieve the same? >>> >>> Dr Mich Talebzadeh >>

Re: Hive footprint

2016-04-18 Thread Marcin Tustin
w > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* > > > > http://talebzadehmich.wordpress.com > > > > On 18 April 2016 at 23:43, Marcin Tustin <mtus...@handybook.com> wrote: > >> HBase has a different use case -

Re: Hive footprint

2016-04-18 Thread Marcin Tustin
HBase has a different use case - it's for low-latency querying of big tables. If you combined it with Hive, you might have something nice for certain queries, but I wouldn't think of them as direct competitors. On Mon, Apr 18, 2016 at 6:34 PM, Mich Talebzadeh wrote: >

Re: De-identification_in Hive

2016-03-19 Thread Marcin Tustin
This is a classic transform-load problem. You'll want to anonymise it once before making it available for analysis. On Thursday, March 17, 2016, Ajay Chander wrote: > Hi Everyone, > > I have a csv.file which has some sensitive data in a particular column > in it. Now I
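One way to do the "anonymise it once" step before loading might look like the sketch below. It swaps in keyed hashing (HMAC-SHA256) as the de-identification technique, which the thread does not specify; the function names, the column name passed in, and the secret key are all hypothetical. The same input value always maps to the same token, so joins and group-bys on the column still work after masking.

```python
import csv
import hashlib
import hmac
import io


def pseudonymise_column(rows, column, secret_key: bytes):
    """Yield copies of each row with `column` replaced by a keyed hash."""
    for row in rows:
        masked = dict(row)
        digest = hmac.new(secret_key, row[column].encode("utf-8"), hashlib.sha256)
        masked[column] = digest.hexdigest()
        yield masked


def pseudonymise_csv(src_text: str, column: str, secret_key: bytes) -> str:
    """Return CSV text with the sensitive column masked (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(src_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(pseudonymise_column(reader, column, secret_key))
    return out.getvalue()
```

The masked file, not the original, is what would then be loaded into the table that analysts can query; the key has to be kept away from those users for the masking to mean anything.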

Re: Hive alter table concatenate loses data - can parquet help?

2016-03-14 Thread Marcin Tustin
ill be great if you can attach a small enough repro for this issue. I > can verify it and provide a fix in case of bug. > > Thanks > Prasanth > > On Mar 8, 2016, at 5:52 AM, Marcin Tustin <mtus...@handybook.com > <javascript:_e(%7B%7D,'cvml','mtus...@handybook.com')

Re: How to rename a hive table without changing location?

2016-03-12 Thread Marcin Tustin
If you wish to keep it in its current location, consider creating an external table. On Saturday, March 12, 2016, Rex X wrote: > Hi Mich, > > I am doing this, because I need to update an existing big hive table, > which can be stored in any arbitrary customized location on

Re: Hive alter table concatenate loses data - can parquet help?

2016-03-08 Thread Marcin Tustin
CCdOABUrV8Pw>* > > > > http://talebzadehmich.wordpress.com > > > > On 7 March 2016 at 23:25, Marcin Tustin <mtus...@handybook.com> wrote: > >> Hi All, >> >> Following on from from our parquet vs orc discussion, today I observed >> hive's alter ta

Re: Hive 2 insert error

2016-03-07 Thread Marcin Tustin
I believe updates and deletes have always had this constraint. It's at least hinted at by: https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-ConfigurationValuestoSetforINSERT,UPDATE,DELETE On Mon, Mar 7, 2016 at 7:46 PM, Mich Talebzadeh

Hive alter table concatenate loses data - can parquet help?

2016-03-07 Thread Marcin Tustin
Hi All, Following on from our parquet vs orc discussion, today I observed hive's alter table ... concatenate command remove rows from an ORC formatted table. 1. Has anyone else observed this (fuller description below)? And 2. How do parquet users handle the file fragmentation issue?

Re: Updating column in table throws error

2016-03-06 Thread Marcin Tustin
Don't bucket on columns you expect to update. Potentially you could delete the whole row and reinsert it. On Sunday, March 6, 2016, Ashok Kumar wrote: > Hi gurus, > > I have an ORC table bucketed on invoicenumber with "transactional"="true" > > I am trying to update

Re: Parquet versus ORC

2016-03-06 Thread Marcin Tustin
If you google, you'll find benchmarks showing each to be faster than the other. In so far as there's any reality to which is faster in any given comparison, it seems to be a result of each incorporating ideas from the other, or at least going through development cycles to beat each other. ORC is

Data corruption/loss in hive

2016-01-22 Thread Marcin Tustin
Hi All, I'm seeing some data loss/corruption in hive. This isn't HDFS-level corruption - hdfs reports that the files and blocks are healthy. I'm using managed ORC tables. Normally we write once an hour to each table, with occasional concatenations through hive. We perform the writing using spark

Re: the `use database` command will change the scheme of target table?

2016-01-19 Thread Marcin Tustin
That is the expected behaviour. Managed tables are created within the directory of their host database. On Tuesday, 19 January 2016, 董亚军 wrote: > hi list, > > we use the HDFS and S3 as the Hive Filesystem at the same time. here has > an issue: > > > *scenario* 1: > >

Re: eiquivalent to identity column in Hive

2016-01-16 Thread Marcin Tustin
See this: http://stackoverflow.com/questions/23082763/need-to-add-auto-increment-column-in-a-table-using-hive On Sat, Jan 16, 2016 at 11:52 AM, Ashok Kumar wrote: > Hi, > > Is there an equivalent to Microsoft IDENTITY column in Hive please. > > Thanks and regards > --

Re: Loading data containing newlines

2016-01-15 Thread Marcin Tustin
I second this. I've generally found anything else to be disappointing when working with data which is at all funky. On Wed, Jan 13, 2016 at 8:13 PM, Alexander Pivovarov wrote: > Time to use Spark and Spark-Sql in addition to Hive? > It's probably going to happen sooner or

Re: Loading data containing newlines

2016-01-15 Thread Marcin Tustin

Re: foreign keys in Hive

2016-01-10 Thread Marcin Tustin
You can join on any equality criterion, just like in any other relational database. Foreign keys in "standard" relational databases are primarily an integrity constraint. Hive in general lacks integrity constraints. On Sun, Jan 10, 2016 at 9:45 AM, Ashok Kumar wrote: > hi,
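To illustrate the point that a join only needs an equality criterion and not a declared foreign key, here is a small sketch. It uses Python's built-in `sqlite3` purely as a stand-in engine, not Hive, and the `customers`/`orders` tables and their contents are made up for the example. No constraint is declared, so the row pointing at a non-existent customer is accepted on insert, and the inner join simply leaves it out.

```python
import sqlite3


def join_without_foreign_key():
    """Join two tables on an equality condition with no constraints declared."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER, name TEXT);
        CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
        INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
        -- Order 12 refers to customer 99, which does not exist; nothing stops it.
        INSERT INTO orders VALUES (10, 1), (11, 1), (12, 99);
        """
    )
    rows = conn.execute(
        """
        SELECT o.order_id, c.name
        FROM orders o
        JOIN customers c ON o.customer_id = c.customer_id
        ORDER BY o.order_id
        """
    ).fetchall()
    conn.close()
    return rows
```

Because nothing enforces integrity, any check that every order has a matching customer would have to be done by the loading process or by an explicit query.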

Re: Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database

2015-12-30 Thread Marcin Tustin
lume > one out shortly > > > > http://talebzadehmich.wordpress.com > > > > NOTE: The information in this email is proprietary and confidential. This > message is for the designated recipient only, if you are not the intended > recipient, you should destroy it immed

Re: Running the same query on 1 billion rows fact table in Hive on Spark compared to Sybase IQ columnar database

2015-12-30 Thread Marcin Tustin
Yes, that's why I haven't had to compile anything. On Wed, Dec 30, 2015 at 4:16 PM, Jörn Franke <jornfra...@gmail.com> wrote: > HDP should have Tez already on board by default. > > On 30 Dec 2015, at 21:42, Marcin Tustin <mtus...@handybook.com> wrote: > > I'm afraid

Importing into a hive database with minimal unavailability or renaming a database

2015-12-18 Thread Marcin Tustin
Hi All, We import our production database into hive on a schedule using sqoop. Unfortunately, sqoop won't update the table schema in hive when the table schema has changed in the source database. Accordingly, to get updates to the table schema we drop the hive table first. Unfortunately, this