Hi All,

After a conversation with Till, I have updated the English queries. The new queries give specific details about which station, dates, etc. are used for each query. The list has been reduced to only the queries that should be interesting to a user.
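As a rough illustration of the first filtering query below, here is a minimal Python sketch (not XQuery, and not part of the benchmark). It assumes the readings follow the sensor data scheme quoted further down. The function name readings_on_day, the sample values, and the GHCND: prefix on the Riverside station id are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the sensor data scheme quoted in this
# thread. All values are made up for illustration.
SAMPLE = """<dataCollection pageCount="1" totalCount="4">
  <data>
    <date>2012-12-25T00:00:00.000</date>
    <dataType>TMAX</dataType>
    <station>GHCND:ASN00008113</station>
    <value>211</value>
  </data>
  <data>
    <date>2012-12-24T00:00:00.000</date>
    <dataType>TMAX</dataType>
    <station>GHCND:ASN00008113</station>
    <value>198</value>
  </data>
  <data>
    <date>2012-12-25T00:00:00.000</date>
    <dataType>TMAX</dataType>
    <station>GHCND:USW00024233</station>
    <value>61</value>
  </data>
  <data>
    <date>1990-12-25T00:00:00.000</date>
    <dataType>TMAX</dataType>
    <station>GHCND:ASN00008113</station>
    <value>205</value>
  </data>
</dataCollection>"""


def readings_on_day(xml_text, station_id, month, day, first_year, last_year):
    """Return (date, dataType, value) for one station on a given month/day,
    restricted to the inclusive year range first_year..last_year."""
    root = ET.fromstring(xml_text)
    month_day = f"{month:02d}-{day:02d}"
    matches = []
    for data in root.iter("data"):
        date = data.findtext("date", default="")
        station = data.findtext("station", default="")
        # Dates look like 2013-09-22T00:00:00.000, so characters 0..3 hold
        # the year and characters 5..9 hold MM-DD.
        if station != station_id or date[5:10] != month_day:
            continue
        if not first_year <= int(date[:4]) <= last_year:
            continue
        matches.append(
            (date[:10], data.findtext("dataType"), int(data.findtext("value")))
        )
    return matches


# December 25 readings for the (assumed) Riverside station id, 2004 to 2013.
christmas = readings_on_day(SAMPLE, "GHCND:ASN00008113", 12, 25, 2004, 2013)
```

The filter is a single pass over the readings with no joins, which is the property the filtering queries are meant to exercise.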

Filtering Query

Query: See historical data for the Riverside, CA (ASN00008113) station by selecting the weather readings for December 25 over the last 10 years.
Query: Find all readings for a hurricane force wind warning or extreme wind warning. The warnings occur when the wind speed exceeds 110 mph.

Aggregation Query

Query: Find the annual precipitation for Seattle using the airport station (USW00024233) for 1999.
Query: Find the lowest/highest recorded temperature.

Join Query

Query: Find all the weather readings for Los Angeles county for a specific day, 1976/7/4.

Join and Aggregation Query

Query: Find the lowest/highest recorded temperature in the state of Oregon for 2001.


On Tue, Nov 19, 2013 at 8:18 PM, Eldon Carman wrote:

> Thanks Mike. I will update my test plans.
>
> Vinayak, I realized later that I did not include the sample XML files.
>
> Weather Data Overview
> The weather data has been downloaded from NOAA's HTTP-available dat
> file and set up to mimic the XML web service offered on their website.
> The data set for Global Historical Climatology Network (GHCN)-Daily
> includes summaries of climate recordings. The core data includes fields
> for high and low temperatures, snowfall, snow depth and rainfall. The
> full list of fields can be found on the NOAA site
> (http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt). The readings
> include the date, datatype, station id, value, and attributes about the
> reading. Details about the station can be downloaded in a separate web
> service query. The station has its name, latitude, longitude, date of
> first and last reading, and various names.
>
> Attached are two sample XML files: a single day's sensor readings and
> the station details.
>
> Sensor Data basic scheme
>
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <dataCollection pageCount="1" totalCount="11">
>     <data>
>         <date>2013-09-22T00:00:00.000</date>
>         <dataType>AWND</dataType>
>         <station>GHCND:USW00003822</station>
>         <value>17</value>
>         <attributes>
>             <attribute></attribute> <!-- measurement flag -->
>             <attribute></attribute> <!-- quality flag -->
>             <attribute>W</attribute> <!-- source flag -->
>             <attribute></attribute> <!-- time of reading -->
>         </attributes>
>     </data>
>     <!-- repeat data tag -->
> </dataCollection>
>
> Station Data basic scheme
>
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <stationCollection pageSize="100" pageCount="1" totalCount="1">
>     <station>
>         <id>GHCND:USW00003822</id>
>         <displayName>SAVANNAH INTERNATIONAL AIRPORT, GA US</displayName>
>         <minDate>1948-01-01</minDate>
>         <maxDate>2013-10-20</maxDate>
>         <latitude>32.13</latitude>
>         <longitude>-81.21</longitude>
>         <elevation>14</elevation>
>         <locationLabels>
>             <type>ZIP</type>
>             <id>ZIP:31408</id>
>             <displayName>Savannah, GA 31408</displayName>
>         </locationLabels>
>         <!-- repeat location labels -->
>         <coverage>1</coverage>
>     </station>
> </stationCollection>
>
>
> Filtering Test
> The filtering test only returns a subsection of the data.
>
> Query: Select a single weather station reading.
> Query: See historical data by selecting the weather readings for today
> last year.
> Query: Find all readings for severe wind.
>
>
> Aggregation Test
> The aggregation test compiles information into a single result. The
> aggregation queries focus on processing data to help with analysis.
>
> Query: Count the number of weather readings in the database.
> Query: Find the annual precipitation for a station.
> Query: Find the lowest/highest recorded temperature.
>
>
> Join Test
> The join test works when nested loops are present on different data sets.
> The weather data has both weather data and station details.
>
> Query: Find the station’s name and date of the first sensor reading.
> Query: Find regional weather readings for a specific day.
>
>
> Side Question: Should we have a test for creating modified data results?
>
>
> On Tue, Nov 19, 2013 at 12:11 AM, Vinayak Borkar wrote:
> >
> > Preston,
> >
> > With respect to the benchmark queries, let me suggest the following
> > approach. Send out an email with:
> > - GHCN information content (schema with some English description of
> >   each field).
> > - A list of questions in English that represent the interesting
> >   questions to ask of that data.
> >
> > This will give everyone the necessary background to possibly suggest
> > modifications and other interesting queries to ask against the data.
> >
> > Finally, once there is consensus regarding the queries, we can
> > translate the English version into XQuery.
> >
> > Thanks,
> > Vinayak
> >
> >
> > On 11/17/13, 12:33 PM, Eldon Carman wrote:
> >>
> >> The goal of the benchmark tests is to highlight the parallel aspects
> >> of VXQuery. The tests need to show how VXQuery scales. In addition,
> >> other queries may be added to highlight our specific speed
> >> improvements or where improvements can still be made. At first we
> >> want to show how the system works with parallel queries. We focus on
> >> three types of queries: filtering, aggregation and nested loops
> >> (join).
> >>
> >> For these three queries the following scaling tests will be completed:
> >> scale up and speed up.
> >> * Scale up keeps the number of nodes in the cluster constant and
> >>   increases the data set in each successive test.
> >> * Speed up keeps the data set size constant and increases the number
> >>   of nodes processing the data in each successive test.
> >>   (http://en.wikipedia.org/wiki/Speedup)
> >>
> >> Still working on the specific queries for our GHCN daily data, but
> >> you can see the draft version here:
> >> https://svn.apache.org/repos/asf/incubator/vxquery/trunk/vxquery/vxquery-benchmark/src/main/resources/noaa-ghcn-daily/queries/
> >>
