Details:
https://github.com/kevinweil/elephant-bird/wiki/Elephant-Bird-Lucene
On Fri, Jan 4, 2013 at 7:55 AM, Bill Graham wrote:
> ElephantBird now has pig-lucene support:
>
>
> https://github.com/kevinweil/elephant-bird/blob/master/pig-lucene/src/main/java/com/twitter/elephantbird/pig/load/Luc
Try jstacking it a few times while it's running. Is it just sitting idly in
a sleep() ?
On Mon, Jan 7, 2013 at 11:56 AM, Cheolsoo Park wrote:
> Typo: it makes much sense to run them in cluster => it doesn't make much
> sense to run them in cluster.
>
> On Mon, Jan 7, 2013 at 11:55 AM, Cheolsoo P
Sorry, it looks like my suggestion won't help unless you are able to specify
the schema in the original load statement. If the number of fields is ONLY
available at runtime, but each row has the same number of fields and you know
the position of the join key, then I have an ugly approach. First, sample the
Hi Jinyuan,
Since I don't know how many columns I will have, I do something like this.
six_month_and_variable_month_sales_2 = FOREACH
six_month_and_variable_month_sales
GENERATE $0 AS ed_style_id,
$1 AS sale_start_month,
$2 AS sale_month_1,
$3 AS sale_month_2,
$4 AS sale_month_3
Hmm,
I was using pretty much the same setup and got errors complaining
about Counter being an interface when it expected a class.
I'll try again with the jars straight out of maven tomorrow. Thanks.
~T
On 7 January 2013 21:32, meghana narasimhan
wrote:
> Hi Tim,
>
> We are using elephant-bird 3.
This seems like a bug to me. It makes it risky to work with JSON data
generated by something other than Pig since the ordering might change.
What do you think?
I didn't see a bug for it in Jira, so would this (still open) one be
the place to mention it? Or should I make a new one?
https://issues.a
If you can load it but the join operation needs the complete schema, then you
can try a FOREACH ... GENERATE statement to project your original relation
into one for which you can define a schema for all fields.
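A minimal sketch of that projection, assuming tab-separated input and a join
key in the first column. The file names, relation names, and field names
below are hypothetical and do not come from the thread:

    -- Load without a schema; fields are then addressed by position ($0, $1, ...).
    raw = LOAD 'sales.tsv' USING PigStorage('\t');

    -- Project only the columns needed and give them names and types.
    -- Without a declared schema the fields are bytearrays, hence the casts.
    projected = FOREACH raw GENERATE (chararray)$0 AS style_id,
                                     (chararray)$1 AS sale_start_month;

    -- Hypothetical second data set with a fully declared schema.
    styles = LOAD 'styles.tsv' USING PigStorage('\t')
             AS (style_id:chararray, name:chararray);

    joined = JOIN projected BY style_id, styles BY style_id;

Only the projected columns carry a schema here; any columns beyond those
listed in the GENERATE are dropped from the joined result.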
On Mon, Jan 7, 2013 at 2:19 PM, Chan, Tim wrote:
> Is it possible to declare a schema when doing a
Is it possible to declare a schema when doing a LOAD for data in which you
do not know the total number of columns?
For instance, I know the data contains 6 or more columns. These columns are
of the same data type.
I basically want to join this data with another data set, but I was getting
the fo
Hi Tim,
We are using elephant-bird 3.0.2 with hadoop-2.0.0-mr1-cdh4.1.1
and pig-0.10.0-cdh4.1.1. We are using the jar available in the maven repo.
Didn't have to build it ourselves.
- Meg
On Mon, Jan 7, 2013 at 11:56 AM, Tim Sell wrote:
> When using JsonLoader with Pig 0.10.0
>
> if I have an input.
Currently the JsonLoader does assume ordering of the fields. It does not do
any name matching against the given schema to find the right field.
Alan.
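Given that positional behaviour, one possible workaround (a sketch, assuming
every record in input.json lists "date" before "id", as in the sample quoted
in this thread) is to declare the schema in the file's own field order and
reorder afterwards:

    -- Declare fields in the order they appear in each JSON record.
    a = LOAD 'input.json' USING JsonLoader('date:chararray,id:int');

    -- Reorder the columns if a different order is wanted downstream.
    b = FOREACH a GENERATE id, date;
    DUMP b;

This only helps when the field order is the same in every record; it does not
make the loader match fields by name.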
On Jan 7, 2013, at 11:56 AM, Tim Sell wrote:
> When using JsonLoader with Pig 0.10.0
>
> if I have an input.json file that looks like this:
>
When using JsonLoader with Pig 0.10.0
if I have an input.json file that looks like this:
{"date": "2007-08-25", "id": 16}
{"date": "2007-09-08", "id": 17}
{"date": "2007-09-15", "id": 18}
And I use
a = LOAD 'input.json' USING JsonLoader('id:int,date:chararray');
DUMP a;
I get errors when it tr
Typo: it makes much sense to run them in cluster => it doesn't make much
sense to run them in cluster.
On Mon, Jan 7, 2013 at 11:55 AM, Cheolsoo Park wrote:
> it makes much sense to run them in cluster.
Hi Malc,
>> When you say to use MR mode, do you mean install hadoop onto the node ?
I meant the cluster mode, but given the size of your input files, it makes
much sense to run them in cluster.
Instead, you might consider executing jobs in parallel in local mode if
it's possible to process inpu
Hi,
It's Pig 0.10.0. Here's some timings I took. I have more than 3
files to process, but I just started out with 3 files to get some numbers.
# Files Time(s)
1 28
2 48
3 73
Cheolsoo, the documentation does seem to indicate that you wi