Re: Hello, a question about Dashboard in Flink

2016-01-29 Thread Philip Lee
internals/monitoring_rest_api.html#details-of-a-running-or-completed-job > > However, not all of this data is going over the network because some tasks > can be locally connected. > > Best, Fabian > > 2016-01-29 8:50 GMT+01:00 Philip Lee : > >> Thanks, >>
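
For anyone wanting to script this instead of reading the dashboard, a minimal sketch of querying that endpoint from Scala is below. It assumes the JobManager web frontend is reachable at localhost:8081 and that the job ID is already known (both are placeholders); the response is simply printed as raw JSON rather than parsed.

import scala.io.Source

// Fetch the "details of a running or completed job" document for one job.
// Host, port, and job id are hypothetical; adjust them to your setup.
val jobManagerUrl = "http://localhost:8081"
val jobId = "7684be6004e4e955c2a558a9bc463f65"

val json = Source.fromURL(s"$jobManagerUrl/jobs/$jobId").mkString
println(json)

The job-overview endpoint mentioned further down in this thread can be fetched the same way to discover the job IDs first.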

Re: Hello, a question about Dashboard in Flink

2016-01-28 Thread Philip Lee
not possible to pass this data back into Flink's > dashboard, but you have to process and plot it yourself. > > Best, Fabian > > [1] > https://ci.apache.org/projects/flink/flink-docs-master/internals/monitoring_rest_api.html#overview-of-jobs > > > > 2016-01-25

Reading ORC format on Flink

2016-01-27 Thread Philip Lee
Hello, a question about reading the ORC format on Flink. I want to use the dataset after loading the CSV data into ORC format with Hive. Does Flink support reading the ORC format? If so, please let me know how to use the dataset in Flink. Best, Phil
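
Flink of this vintage has no dedicated ORC input format, so one possible route is to wrap Hive's mapred OrcInputFormat with Flink's Hadoop-compatibility wrapper. The sketch below is only an assumption along those lines, not a tested recipe: the HDFS path is made up, and it presumes hive-exec and Flink's Hadoop compatibility classes are on the classpath.

import org.apache.flink.api.scala._
import org.apache.flink.api.scala.hadoop.mapred.HadoopInputFormat
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hive.ql.io.orc.{OrcInputFormat, OrcStruct}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.{FileInputFormat, JobConf}

val env = ExecutionEnvironment.getExecutionEnvironment

// Point a Hadoop JobConf at the ORC files Hive wrote (hypothetical path).
val jobConf = new JobConf()
FileInputFormat.addInputPath(jobConf, new Path("hdfs:///user/hive/warehouse/web_clickstreams_orc"))

// Hive's mapred OrcInputFormat delivers (NullWritable, OrcStruct) pairs.
val orcInput = new HadoopInputFormat[NullWritable, OrcStruct](
  new OrcInputFormat, classOf[NullWritable], classOf[OrcStruct], jobConf)

// Read the rows and print a small sample; real code would map the OrcStruct
// fields into a case class instead of relying on toString.
val rows = env.createInput(orcInput)
rows.map(_._2.toString).first(10).print()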

Hello, a question about Dashboard in Flink

2016-01-25 Thread Philip Lee
Hello, According to http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Apache-Flink-Web-Dashboard-Completed-Job-history-td4067.html, I cannot retrieve the job history from the Dashboard after turning off the JM. But as Fabian mentioned here, "However, you can query all stats that are di

Re: Hive bug? about no such table

2015-12-18 Thread Philip Lee
Oops, sorry, I was supposed to email this one to the Hive mailing list. On Fri, Dec 18, 2015 at 2:19 AM, Philip Lee wrote: > I think it is from a Hive bug, something related to the metastore. > > Here is the thing. > > After I generated scale factor 300 named bigbench300 and bigbe

Hive bug? about no such table

2015-12-17 Thread Philip Lee
I think it is from a Hive bug, something related to the metastore. Here is the thing. After I generated scale factor 300, named bigbench300, alongside bigbench100, which already existed before, I ran "hive job with bigbench300". At first it was really fine. Then I ran the hive job with bigbench100 again. It w

Re: Hello a question about metrics

2015-12-11 Thread Philip Lee
Apache Ambari[2], > you can fetch metrics easily from the pre-installed Ganglia. > > [1]: http://ganglia.sourceforge.net > [2]: https://ambari.apache.org > > > On Dec 8, 2015, at 4:54 AM, Philip Lee wrote: > > > > Hello, a question about metrics. > > > > I wan

Hello a question about metrics

2015-12-07 Thread Philip Lee
Hello, a question about metrics. I want to evaluate some queries on Spark, Flink, and Hive for a comparison. I am using 'vmstat' to check metrics such as the amount of memory used, swap, io, and cpu. Is my way of evaluating right, given that they use the JVM's resources for memory and cpu? Is there any Linux app

Hello, the performance of apply function after join

2015-12-01 Thread Philip Lee
Hello, a question about the performance of the apply function after a join. Just for your information, I am running the Flink job on a cluster of 9 machines with 48 cores each. I am working on a benchmark comparing Flink, Spark-SQL, and Hive. I tried to optimize the *join function with Hint* for better p
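
On the join-hint point: the DataSet API accepts a JoinHint directly in join(). A minimal sketch, with made-up StoreReturn/WebSales case classes standing in for the benchmark tables:

import org.apache.flink.api.common.operators.base.JoinOperatorBase.JoinHint
import org.apache.flink.api.scala._

case class StoreReturn(_item_sk: Long, _customer_sk: Long)
case class WebSales(_item_sk: Long, _bill_customer_sk: Long)

val env = ExecutionEnvironment.getExecutionEnvironment
val storeReturn = env.fromElements(StoreReturn(1L, 10L), StoreReturn(2L, 20L))
val webSales = env.fromElements(WebSales(1L, 10L), WebSales(3L, 30L))

// Hint the optimizer to repartition both inputs and build the hash table
// on the first (here: assumed smaller) input.
val joined = storeReturn
  .join(webSales, JoinHint.REPARTITION_HASH_FIRST)
  .where(_._item_sk)
  .equalTo(_._item_sk)

joined.print()

When one side is known to be much smaller, joinWithTiny()/joinWithHuge() are shorthand alternatives to spelling out the hint.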

Hi, join with two columns of both tables

2015-11-08 Thread Philip Lee
I want to join two tables on two columns, like //AND sr_customer_sk = ws_bill_customer_sk //AND sr_item_sk = ws_item_sk val srJoinWs = storeReturn.join(webSales).where(_._item_sk).equalTo(_._item_sk){ (storeReturn: StoreReturn, webSales: WebSales, out: Collector[(Long,L
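
One way to express this two-column equi-join is with key-selector functions that return tuples. A sketch, with the case classes cut down to just the key fields (the real benchmark schemas have many more columns):

import org.apache.flink.api.scala._
import org.apache.flink.util.Collector

case class StoreReturn(_item_sk: Long, _customer_sk: Long)
case class WebSales(_item_sk: Long, _bill_customer_sk: Long)

val env = ExecutionEnvironment.getExecutionEnvironment
val storeReturn = env.fromElements(StoreReturn(1L, 10L), StoreReturn(2L, 20L))
val webSales = env.fromElements(WebSales(1L, 10L), WebSales(2L, 99L))

// sr_item_sk = ws_item_sk AND sr_customer_sk = ws_bill_customer_sk
val srJoinWs = storeReturn
  .join(webSales)
  .where(sr => (sr._item_sk, sr._customer_sk))
  .equalTo(ws => (ws._item_sk, ws._bill_customer_sk))
  .apply { (sr, ws, out: Collector[(Long, Long)]) =>
    out.collect((sr._item_sk, sr._customer_sk))
  }

srJoinWs.print()

Field-name lists work as well for case classes, e.g. .where("_item_sk", "_customer_sk").equalTo("_item_sk", "_bill_customer_sk").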

Re: Hi, question about orderBy with two or more columns

2015-11-02 Thread Philip Lee
Hi Philip, > > thanks for reporting the issue. I just verified the problem. > It is working correctly for the Java API, but is broken in Scala. > > I will work on a fix and include it in the next RC for 0.10.0. > > Thanks, Fabian > > 2015-11-02 12:58 GMT+01:00 Philip Le

Re: Hi, question about orderBy with two or more columns

2015-11-02 Thread Philip Lee
> the same as in SQL when you state "ORDER BY col1, col2". > > The SortPartitionOperator created with the first "sortPartition(col1)" > call appends further columns, rather than instantiating a new sort. > > Greetings, > Stephan > > > On Sun, Nov
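
To make Stephan's point concrete, a small sketch of a global two-column sort on made-up tuples; with parallelism 1 this behaves like ORDER BY _1, _2:

import org.apache.flink.api.common.operators.Order
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
val ds = env.fromElements((2, "b"), (1, "c"), (1, "a"))

// The second sortPartition() appends a secondary sort key instead of starting
// a new sort; with a single partition the result is totally ordered.
val sorted = ds
  .sortPartition(0, Order.ASCENDING)
  .sortPartition(1, Order.ASCENDING)
  .setParallelism(1)

sorted.print()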

Hi, question about orderBy with two or more columns

2015-11-01 Thread Philip Lee
Hi, I know that when applying order by on one col, it would be sortPartition(col).setParallelism(1). What about ordering by two or more columns? If the SQL states order by col_1, col_2, sortPartition().sortPartition() does not seem to solve this SQL, because ORDER BY in SQL sorts by the first column and then the second c

Re: reading csv file with null values

2015-10-26 Thread Philip Lee
can read the >>> CSV into a DataSet and treat the empty string as a null value. Not very >>> nice but a workaround. As of now, Flink deliberately doesn't support null >>> values. >>> >>> Regards, >>> Max >>> >>> >>

Re: reading csv file with null values

2015-10-24 Thread Philip Lee
Plus, following Shiti's suggestion for overcoming this null value issue, we could use the RowSerializer, right? I tried it in many ways, but it still did not work. Could you give an example of it, following up on the previous email? On Sat, Oct 24, 2015 at 11:19 PM, Philip Lee wrote: > Maximilian said if we handle null va

Re: reading csv file with null values

2015-10-24 Thread Philip Lee
doesn't support null >> values. >> >> Regards, >> Max >> >> >> On Thu, Oct 22, 2015 at 4:30 PM, Philip Lee wrote: >> >>> Hi, >>> >>> I am trying to load a dataset that partly contains null values by using >>> readCsvFile().

reading csv file with null values

2015-10-22 Thread Philip Lee
Hi, I am trying to load a dataset that partly contains null values by using readCsvFile(). // e.g. _date|_click|_sales|_item|_web_page|_user case class WebClick(_click_date: Long, _click_time: Long, _sales: Int, _item: Int, _page: Int, _user: Int) private def getWebClickDataSet(env: ExecutionEnviro
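
Following the workaround Max describes in the replies above (read the nullable columns as Strings and handle the empty string yourself), a sketch is below. The file name, the '|' delimiter, and the choice of which columns are nullable are assumptions:

import org.apache.flink.api.scala._

// Raw record: nullable numeric columns are read as String so empty fields parse.
case class RawWebClick(_click_date: Long, _click_time: String, _sales: String,
                       _item: Int, _page: Int, _user: String)

// Cleaned record: empty strings become None.
case class WebClick(_click_date: Long, _click_time: Option[Long], _sales: Option[Int],
                    _item: Int, _page: Int, _user: Option[Int])

def optLong(s: String): Option[Long] = if (s.isEmpty) None else Some(s.toLong)
def optInt(s: String): Option[Int]   = if (s.isEmpty) None else Some(s.toInt)

def getWebClickDataSet(env: ExecutionEnvironment): DataSet[WebClick] =
  env.readCsvFile[RawWebClick]("web_clickstreams.dat", fieldDelimiter = "|")
    .map(r => WebClick(r._click_date, optLong(r._click_time), optInt(r._sales),
                       r._item, r._page, optInt(r._user)))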

Re: Hi, Flink people, a question about translation from Hive queries to Flink functions by using the Table API

2015-10-20 Thread Philip Lee
group on a different field than the grouping field. > > It is not possible to call partitionByHash().sortGroup() because > sortGroup() requires groups, which are created by groupBy(). > > Best, Fabian > > 2015-10-19 14:31 GMT+02:00 Philip Lee : > >> Thanks, Fabian. >>
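
To illustrate Fabian's point, a minimal sketch on made-up (user, clickDate, sales) tuples: sortGroup() is only available on a grouped DataSet, and the grouping field and the sort field may differ.

import org.apache.flink.api.common.operators.Order
import org.apache.flink.api.scala._
import org.apache.flink.util.Collector

val env = ExecutionEnvironment.getExecutionEnvironment
val clicks = env.fromElements((1, 20151003L, 5), (1, 20151001L, 3), (2, 20151002L, 1))

// Group on the user field, sort each group on clickDate, then emit the
// earliest sale per user.
val firstSalePerUser = clicks
  .groupBy(0)
  .sortGroup(1, Order.ASCENDING)
  .reduceGroup { (in: Iterator[(Int, Long, Int)], out: Collector[(Int, Int)]) =>
    val first = in.next()
    out.collect((first._1, first._3))
  }

firstSalePerUser.print()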

Re: Hi, Flink people, a question about translation from Hive queries to Flink functions by using the Table API

2015-10-19 Thread Philip Lee
reduces other than in the value itself. >> The GroupReduce, on the other hand, may produce none, one, or multiple >> elements per grouping and keep state in between emitting values. Thus, >> GroupReduce is a more powerful operator and can be seen as a superset >> of the
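
A small contrast of the two operators on made-up (key, value) pairs: reduce() combines two records into one record of the same type, while reduceGroup() sees the whole group at once and may emit any number of records of any type.

import org.apache.flink.api.scala._
import org.apache.flink.util.Collector

val env = ExecutionEnvironment.getExecutionEnvironment
val pairs = env.fromElements((1, 4), (1, 6), (2, 3))

// Reduce: pairwise combination, output type equals input type.
val sums = pairs.groupBy(0).reduce((a, b) => (a._1, a._2 + b._2))

// GroupReduce: the whole group is at hand, so min, max and average can be
// computed together and emitted as a differently-typed record.
val stats = pairs.groupBy(0).reduceGroup {
  (in: Iterator[(Int, Int)], out: Collector[(Int, Int, Int, Double)]) =>
    val values = in.toList
    val nums = values.map(_._2)
    out.collect((values.head._1, nums.min, nums.max, nums.sum.toDouble / nums.size))
}

sums.print()
stats.print()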

Hi, Flink people, a question about translation from Hive queries to Flink functions by using the Table API

2015-10-18 Thread Philip Lee
Hi, Flink people, a question about translation from Hive queries to Flink functions by using the Table API. To sum up, I am working on a benchmark for Flink. I am Philip Lee, majoring in Computer Science in a Master's degree at TUB. I work on translating the Hive queries of the benchmark into Flink code. As