Yes, it seems related. I think the query string is not refreshed when Hive decides to run without a MapReduce job. The problem is that I read the query string to apply an early filter in the record reader. Is there any other known way to detect that no MapReduce job is spawned, so that I can work around this issue?
/Petter

On Tuesday, 3 December 2013, Adam Kawa wrote:

> Hmmm?
>
> Maybe it is related to the fact that a query:
>
> select * from mytable limit 100;
>
> does not start any MapReduce job. It starts a reading operation from
> HDFS (and a communication with MetaStore to know what the schema is and how
> to parse the data using InputFormat and SerDe).
>
> For example, if you run a query that has the same functionality (i.e. to
> show all content of the table by specifying all columns in SELECT)
>
> select column1, column2, ... columnN from mytable limit 100;
>
> then a map-only job will be started and maybe (?) hive.query.string will
> contain this query.
>
>
> 2013/12/3 Petter von Dolwitz (Hem) <petter.von.dolw...@gmail.com>
>
>> Hi,
>>
>> I use Hive 0.11 with a five-machine cluster. I am reading the property
>> hive.query.string from a custom RecordReader (used for reading external
>> tables).
>>
>> If I first invoke a query like
>>
>> select * from mytable where mycolumn='myvalue';
>>
>> I get the correct query string in this property.
>>
>> If I then invoke
>>
>> select * from mytable limit 100;
>>
>> the property hive.query.string still contains the first query. It seems
>> that Hive uses local mode for the second query. I don't know if it is related.
>>
>> Does anybody know why the query string is not updated in the second case?
>>
>> Thanks,
>> Petter
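
For readers following the thread, here is a minimal sketch of the kind of early filter described in the reply at the top, and of why a stale hive.query.string matters. It is not the poster's actual RecordReader: the class name EarlyFilterSketch, the method extractFilter, and the use of a plain Map as a stand-in for the job configuration are all assumptions made for illustration, and the regular expression only recognises a single `column = 'value'` predicate.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a plain Map stands in for the Hadoop job configuration.
class EarlyFilterSketch {
    // Property name taken from the thread.
    static final String QUERY_KEY = "hive.query.string";

    // Deliberately simplistic: matches only  where <column> = '<value>'
    private static final Pattern SIMPLE_EQ = Pattern.compile(
            "\\bwhere\\s+(\\w+)\\s*=\\s*'([^']*)'", Pattern.CASE_INSENSITIVE);

    // Returns {column, value} if the stored query has a simple equality filter.
    static Optional<String[]> extractFilter(Map<String, String> conf) {
        String query = conf.get(QUERY_KEY);
        if (query == null) {
            return Optional.empty();
        }
        Matcher m = SIMPLE_EQ.matcher(query);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { m.group(1), m.group(2) });
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // First query from the thread: the property holds the right string.
        conf.put(QUERY_KEY, "select * from mytable where mycolumn='myvalue'");
        System.out.println("first query has filter: "
                + extractFilter(conf).isPresent());

        // Second query, assuming the property had been refreshed.
        conf.put(QUERY_KEY, "select * from mytable limit 100");
        System.out.println("second query has filter: "
                + extractFilter(conf).isPresent());
    }
}
```

If the property is not refreshed for the second query, as reported above, extractFilter would still return the mycolumn/myvalue pair from the first query, and a reader that trusts it would silently drop rows that `select * from mytable limit 100` should return.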