Thanks Mike. I will update my test plans.

Vinayak, I realized later that I did not include the sample XML files.

Weather Data Overview
The weather data was downloaded from NOAA's HTTP-available data file and
set up to mimic the XML web service offered on their website. The data set
for Global Historical Climatology Network (GHCN)-Daily includes summaries
of climate recordings. The core data includes fields for high and low
temperatures, snowfall, snow depth and rainfall. The full list of fields
can be found on the NOAA site (
http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt). Each reading
includes the date, data type, station id, value, and attributes about the
reading. Details about a station can be downloaded with a separate web
service query. Each station has its name, latitude, longitude, dates of
the first and last readings, and various location names.

Attached are two sample XML files: a single day's sensor readings and the
station details.

Sensor Data basic schema

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<dataCollection pageCount="1" totalCount="11">
  <data>
    <date>2013-09-22T00:00:00.000</date>
    <dataType>AWND</dataType>
    <station>GHCND:USW00003822</station>
    <value>17</value>
    <attributes>
      <attribute></attribute> <!--  measurement flag -->
      <attribute></attribute> <!-- quality flag -->
      <attribute>W</attribute> <!-- source flag -->
      <attribute></attribute> <!-- time of reading -->
    </attributes>
  </data>
  <!-- repeat data tag -->
</dataCollection>
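
To make the layout above concrete, here is a minimal sketch that reads the
sensor XML into plain records. It is in Python purely for illustration (the
benchmark queries themselves are meant to end up in XQuery). The names
parse_readings and FLAG_NAMES are made up for this example; the flag order
follows the comments in the schema above.

```python
import xml.etree.ElementTree as ET

# Two readings taken from the attached sample file.
SAMPLE = """<dataCollection pageCount="1" totalCount="2">
  <data>
    <date>2013-09-22T00:00:00.000</date>
    <dataType>AWND</dataType>
    <station>GHCND:USW00003822</station>
    <value>17</value>
    <attributes>
      <attribute></attribute>
      <attribute></attribute>
      <attribute>W</attribute>
      <attribute></attribute>
    </attributes>
  </data>
  <data>
    <date>2013-09-22T00:00:00.000</date>
    <dataType>PRCP</dataType>
    <station>GHCND:USW00003822</station>
    <value>36</value>
    <attributes>
      <attribute></attribute>
      <attribute></attribute>
      <attribute>W</attribute>
      <attribute>2400</attribute>
    </attributes>
  </data>
</dataCollection>"""

# Positional meaning of the four <attribute> elements (hypothetical key names).
FLAG_NAMES = ("measurement", "quality", "source", "time")


def parse_readings(xml_text):
    """Return one dict per <data> element."""
    root = ET.fromstring(xml_text)
    readings = []
    for data in root.findall("data"):
        # Empty <attribute></attribute> elements have text None; use "".
        flags = [(a.text or "") for a in data.findall("attributes/attribute")]
        readings.append({
            "date": data.findtext("date"),
            "dataType": data.findtext("dataType"),
            "station": data.findtext("station"),
            "value": int(data.findtext("value")),
            "flags": dict(zip(FLAG_NAMES, flags)),
        })
    return readings
```

Calling parse_readings(SAMPLE) gives two records; the second one carries
"2400" as its time-of-reading flag.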

Station Data basic schema

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<stationCollection pageSize="100" pageCount="1" totalCount="1">
  <station>
    <id>GHCND:USW00003822</id>
    <displayName>SAVANNAH INTERNATIONAL AIRPORT, GA US</displayName>
    <minDate>1948-01-01</minDate>
    <maxDate>2013-10-20</maxDate>
    <latitude>32.13</latitude>
    <longitude>-81.21</longitude>
    <elevation>14</elevation>
    <locationLabels>
      <type>ZIP</type>
      <id>ZIP:31408</id>
      <displayName>Savannah, GA 31408</displayName>
    </locationLabels>
    <!-- repeat location labels -->
    <coverage>1</coverage>
  </station>
</stationCollection>


Filtering Test
The filtering test returns only a subset of the data.

Query: Select the readings for a single weather station.
Query: See historical data by selecting the weather readings for today's
date last year.
Query: Find all severe wind readings.
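
A rough sketch of what the three filtering queries select, in Python for
illustration only (not the final XQuery). The function names, the set of
wind data types and the "severe" threshold are all assumptions made for this
example. The third sample row uses a made-up station id and date so the
filters have something to exclude.

```python
import xml.etree.ElementTree as ET
from datetime import date

SAMPLE = """<dataCollection pageCount="1" totalCount="3">
  <data><date>2013-09-22T00:00:00.000</date><dataType>AWND</dataType><station>GHCND:USW00003822</station><value>17</value></data>
  <data><date>2013-09-22T00:00:00.000</date><dataType>WSF5</dataType><station>GHCND:USW00003822</station><value>107</value></data>
  <data><date>2012-09-22T00:00:00.000</date><dataType>TMAX</dataType><station>GHCND:XX000000000</station><value>250</value></data>
</dataCollection>"""

# Assumption: treat these data types as wind speed readings.
WIND_SPEED_TYPES = {"AWND", "WSF2", "WSF5"}
# Assumption: an arbitrary cut-off for "severe", in the raw units of <value>.
SEVERE_WIND_THRESHOLD = 100


def readings_for_station(root, station_id):
    """Filter 1: all readings for a single station."""
    return [d for d in root.findall("data")
            if d.findtext("station") == station_id]


def same_day_last_year(today):
    """Same calendar day one year earlier (Feb 29 falls back to Feb 28)."""
    try:
        return today.replace(year=today.year - 1)
    except ValueError:
        return today.replace(year=today.year - 1, day=28)


def readings_on(root, day):
    """Filter 2: all readings whose date falls on the given day."""
    prefix = day.isoformat()
    return [d for d in root.findall("data")
            if d.findtext("date").startswith(prefix)]


def severe_wind_readings(root, threshold=SEVERE_WIND_THRESHOLD):
    """Filter 3: wind speed readings at or above the threshold."""
    return [d for d in root.findall("data")
            if d.findtext("dataType") in WIND_SPEED_TYPES
            and int(d.findtext("value")) >= threshold]
```

For example, readings_on(root, same_day_last_year(date(2013, 9, 22))) picks
out only the 2012-09-22 row of the sample.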


Aggregation Test
The aggregation test compiles information into a single result. The
aggregation queries focus on processing data to help with analysis.

Query: Count the number of weather readings in the database.
Query: Find the annual precipitation for a station.
Query: Find the lowest/highest recorded temperature.
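
The three aggregation queries can be sketched the same way, again in Python
only to pin down the intended results before anything is written in XQuery.
Function names are hypothetical. Values are left in the raw units stored in
<value>, and the second PRCP row is a made-up reading added so the sum has
more than one term.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<dataCollection pageCount="1" totalCount="4">
  <data><date>2013-09-22T00:00:00.000</date><dataType>PRCP</dataType><station>GHCND:USW00003822</station><value>36</value></data>
  <data><date>2013-09-23T00:00:00.000</date><dataType>PRCP</dataType><station>GHCND:USW00003822</station><value>14</value></data>
  <data><date>2013-09-22T00:00:00.000</date><dataType>TMAX</dataType><station>GHCND:USW00003822</station><value>250</value></data>
  <data><date>2013-09-22T00:00:00.000</date><dataType>TMIN</dataType><station>GHCND:USW00003822</station><value>206</value></data>
</dataCollection>"""


def count_readings(root):
    """Aggregation 1: number of <data> readings."""
    return len(root.findall("data"))


def annual_precipitation(root, station_id, year):
    """Aggregation 2: sum of PRCP values for one station and year."""
    return sum(
        int(d.findtext("value"))
        for d in root.findall("data")
        if d.findtext("dataType") == "PRCP"
        and d.findtext("station") == station_id
        and d.findtext("date").startswith(f"{year}-")
    )


def temperature_extremes(root):
    """Aggregation 3: (lowest TMIN, highest TMAX) over all readings."""
    lows = [int(d.findtext("value")) for d in root.findall("data")
            if d.findtext("dataType") == "TMIN"]
    highs = [int(d.findtext("value")) for d in root.findall("data")
             if d.findtext("dataType") == "TMAX"]
    return min(lows), max(highs)
```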


Join Test
The join test exercises nested loops over different data sets. The weather
data set has both sensor readings and station details.

Query: Find the station’s name and date of the first sensor reading.
Query: Find regional weather readings for a specific day.
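
As an illustration of the first join query, a minimal nested-loop sketch in
Python (not the final XQuery). It matches each reading's <station> value
against a station's <id> and returns the station's displayName and minDate,
assuming minDate is the date of the first reading. The function name is made
up for this example.

```python
import xml.etree.ElementTree as ET

SENSOR_SAMPLE = """<dataCollection pageCount="1" totalCount="1">
  <data><date>2013-09-22T00:00:00.000</date><dataType>AWND</dataType><station>GHCND:USW00003822</station><value>17</value></data>
</dataCollection>"""

STATION_SAMPLE = """<stationCollection pageSize="100" pageCount="1" totalCount="1">
  <station>
    <id>GHCND:USW00003822</id>
    <displayName>SAVANNAH INTERNATIONAL AIRPORT, GA US</displayName>
    <minDate>1948-01-01</minDate>
    <maxDate>2013-10-20</maxDate>
  </station>
</stationCollection>"""


def join_readings_with_stations(data_root, station_root):
    """Nested-loop join on data/station == station/id."""
    results = []
    for reading in data_root.findall("data"):          # outer loop: readings
        for station in station_root.findall("station"):  # inner loop: stations
            if reading.findtext("station") == station.findtext("id"):
                results.append((
                    reading.findtext("dataType"),
                    station.findtext("displayName"),
                    station.findtext("minDate"),
                ))
    return results
```

The regional query would follow the same pattern, with the inner loop
filtering stations by one of their locationLabels before matching ids.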


Side Question: Should we have a test for creating modified data results?


On Tue, Nov 19, 2013 at 12:11 AM, Vinayak Borkar <[email protected]> wrote:
>
> Preston,
>
>
> With respect to the benchmark queries, let me suggest the following
> approach. Send out an email with:
>   - GHCN information content (schema with some English description of
>     each field).
>   - A list of questions in English that represent the interesting
>     questions to ask of that data.
>
> This will give everyone the necessary background to possibly suggest
> modifications and other interesting queries to ask against the data.
>
> Finally, once there is consensus regarding the queries, we can translate
> the English version into XQuery.
>
> Thanks,
> Vinayak
>
>
>
> On 11/17/13, 12:33 PM, Eldon Carman wrote:
>>
>> The goal of the benchmark tests are to highlight the parallel aspects of
>> VXQuery. The tests need to show how VXQuery scales. In addition, other
>> queries may be added to highlight our specific speed improvements or
>> where improvements can still be made. At first we want to show how the
>> system works with parallel queries. We focus on three types of queries:
>> filtering, aggregation and nested loops (join).
>>
>> For these three queries the following scaling tests will be completed:
>> scale up and speed up.
>>   * Scale up keeps the number of nodes in the cluster constant and
>>     increases the data set in each successive test.
>>   * Speed up keeps the data set size constant and increases the number of
>>     nodes processing the data in each successive test. (
>>     http://en.wikipedia.org/wiki/Speedup)
>>
>> Still working on the specific queries for our GHCN daily data, but you
>> can see the draft version here:
>>
>> https://svn.apache.org/repos/asf/incubator/vxquery/trunk/vxquery/vxquery-benchmark/src/main/resources/noaa-ghcn-daily/queries/
>>
>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><dataCollection pageCount="1" totalCount="11">
<data><date>2013-09-22T00:00:00.000</date><dataType>AWND</dataType><station>GHCND:USW00003822</station><value>17</value><attributes><attribute></attribute><attribute></attribute><attribute>W</attribute><attribute></attribute></attributes></data>
<data><date>2013-09-22T00:00:00.000</date><dataType>PRCP</dataType><station>GHCND:USW00003822</station><value>36</value><attributes><attribute></attribute><attribute></attribute><attribute>W</attribute><attribute>2400</attribute></attributes></data>
<data><date>2013-09-22T00:00:00.000</date><dataType>SNOW</dataType><station>GHCND:USW00003822</station><value>0</value><attributes><attribute></attribute><attribute></attribute><attribute>W</attribute><attribute></attribute></attributes></data>
<data><date>2013-09-22T00:00:00.000</date><dataType>SNWD</dataType><station>GHCND:USW00003822</station><value>0</value><attributes><attribute></attribute><attribute></attribute><attribute>W</attribute><attribute></attribute></attributes></data>
<data><date>2013-09-22T00:00:00.000</date><dataType>TMAX</dataType><station>GHCND:USW00003822</station><value>250</value><attributes><attribute></attribute><attribute></attribute><attribute>W</attribute><attribute>2400</attribute></attributes></data>
<data><date>2013-09-22T00:00:00.000</date><dataType>WT01</dataType><station>GHCND:USW00003822</station><value>1</value><attributes><attribute></attribute><attribute></attribute><attribute>W</attribute><attribute></attribute></attributes></data>
<data><date>2013-09-22T00:00:00.000</date><dataType>WDF2</dataType><station>GHCND:USW00003822</station><value>350</value><attributes><attribute></attribute><attribute></attribute><attribute>W</attribute><attribute></attribute></attributes></data>
<data><date>2013-09-22T00:00:00.000</date><dataType>WDF5</dataType><station>GHCND:USW00003822</station><value>280</value><attributes><attribute></attribute><attribute></attribute><attribute>W</attribute><attribute></attribute></attributes></data>
<data><date>2013-09-22T00:00:00.000</date><dataType>WSF2</dataType><station>GHCND:USW00003822</station><value>54</value><attributes><attribute></attribute><attribute></attribute><attribute>W</attribute><attribute></attribute></attributes></data>
<data><date>2013-09-22T00:00:00.000</date><dataType>WSF5</dataType><station>GHCND:USW00003822</station><value>107</value><attributes><attribute></attribute><attribute></attribute><attribute>W</attribute><attribute></attribute></attributes></data>
<data><date>2013-09-22T00:00:00.000</date><dataType>TMIN</dataType><station>GHCND:USW00003822</station><value>206</value><attributes><attribute></attribute><attribute></attribute><attribute>W</attribute><attribute>2400</attribute></attributes></data>
</dataCollection>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><stationCollection pageSize="100" pageCount="1" totalCount="1"><station><id>GHCND:USW00003822</id><displayName>SAVANNAH INTERNATIONAL AIRPORT, GA US</displayName><minDate>1948-01-01</minDate><maxDate>2013-10-20</maxDate><latitude>32.13</latitude><longitude>-81.21</longitude><elevation>14</elevation><locationLabels><type>ZIP</type><id>ZIP:31408</id><displayName>Savannah, GA 31408</displayName></locationLabels><locationLabels><type>HYD_CAT</type><id>HUC:03060109</id><displayName>Lower Savannah Hydrologic Unit</displayName></locationLabels><locationLabels><type>HYD_ACC</type><id>HUC:030601</id><displayName>Savannah Hydrologic Unit</displayName></locationLabels><locationLabels><type>HYD_SUB</type><id>HUC:0306</id><displayName>Ogeechee-Savannah Hydrologic Unit</displayName></locationLabels><locationLabels><type>HYD_REG</type><id>HUC:03</id><displayName>South Atlantic-Gulf Hydrologic Unit</displayName></locationLabels><locationLabels><type>CITY</type><id>CITY:US130022</id><displayName>Savannah, GA US</displayName></locationLabels><locationLabels><type>CNTY</type><id>FIPS:13051</id><displayName>Chatham County, GA</displayName></locationLabels><locationLabels><type>ST</type><id>FIPS:13</id><displayName>Georgia</displayName></locationLabels><locationLabels><type>CLIM_DIV</type><id>CLIM:1309</id><displayName>Southeast Georgia Climate Division</displayName></locationLabels><locationLabels><type>CLIM_REG</type><id>CLIM:104</id><displayName>Southeast Climate Region</displayName></locationLabels><locationLabels><type>CNTRY</type><id>FIPS:US</id><displayName>United States</displayName></locationLabels><coverage>1</coverage></station></stationCollection>
