Richard,
I have had success querying the FileManager via XML-RPC for ProductTypes,
and then Product Metadata in python. Then I just collect the sub-set of
the metadata I want and re-formatted the content into GeoJSON and loaded
each one into a MongoDB. It isn't exactly a SQL database, but the core
principles should remain the same.
The use case was that we needed to drive some website widgets (time series
plots and drop points into Google Maps) and we didn't want to directly
connect and query FileManager.
Below is a snippet of code from my python program that does the initial
setup and Query: (hope this helps)
######## CODE BLOCK ################
'''This code will read a filemanager and query for all product types.
For each product type it will collect the OFSN, [Lat,Lon] (or geoPoint)
and the meantime of the product and create a new collection within mongoDB.
mongoDB json Doc format (GeoJSON):
Starred(*) Elements are going to be indexed for Query Support
which are coordinates, start_date, and stop_date
{
"type":"[MultiPoint OR Point]",
"*coordinates*":[
[lon,lat],[lon,lat],[lon,lat]
],
"properties": {
"OFSN":"http_url_to_OSFN",
"*start_date*": new Datetime,
"*stop_date*": new Datetime,
}
}
'''
# used to call the XML-RPC interface from Python
import xmlrpclib
# simple string to dates module
import datetime
# Used for Connecting to MongoDB
from pymongo import Connection, GEO2D, ASCENDING
#setup the connection to the filemanager
fm_url = "http://localhost:9000"
filemgr = xmlrpclib.ServerProxy(fm_url).filemgr
#connect to the mongoDB and use the daily_date_list collection
db = Connection('localhost', 27017)
ddl = db.product_met # final collection where inserts will be made
#turn on safe insert mode (Comment out for DANGER mode)
ddl.set_lasterror_options(safe=True)
#Global list of productTypes
p_type_list = filemgr.getProductTypes()
''' This simple one liner will return a Python Dictionary with all of the
ProductTypes in a format they you can iterate over and pass back into the
fileManager via XML-RPC (as opposed to trying to create the Python
equivalent
of a Java object)'''
# This is just used for clearning and building a MongoDB Collection
def build_collection(p_type, main_collection=ddl):
'''This will take in a product type and a main collection:
1. Create a connection to a sub-collection using the product type
name metadata value.
2. ALWAYS - Drop the Collection
3. Create the 3 Required Indexes'''
try:
# 1. collection is a sub-collection
collection = ddl[p_type['name']]
print str(collection.count()) + ' documents found, purging...'
# 2. Drop Collection
collection.drop()
# 3. Create Indexes
collection.create_index([("coordinates", GEO2D)])
print 'geo-index created on the coordinates element'
ddl.create_index("properties.start_date", ASCENDING)
ddl.create_index("properties.stop_date", ASCENDING)
print 'Collection has been built'
return collection
except:
print 'Cannot Build Collection for' + str(p_type['name'])
def build_geo_json(met):
'''take the met dictionary and return a properly formed
geo-json document. Below in code the places you see
met['KEYNAME'] is how you grab a metadata element by
KEYNAME. Your metadata keys will differ and so will your
mileage.'''
geo_json = {} # empty dict we will populate
raw_geo_list = [] # container to hold all [lon,lat] lists
try: # to get metadata elements
# grab OFSN
ofsn = met['OFSNFilePath'][0]
# get the metadata and trim off the microseconds
start = met['TimeBegin'][0].rsplit('.')[0]
stop = met['TimeEnd'][0].rsplit('.')[0]
# create geopoint list, and split
raw_geo_list = met['CAS.geopoint']
geo_list = []
# append all points within the raw_geo_list to the new geo_list
for point in raw_geo_list:
# split the values up into a single list with 2 values
string_list = point.split(' ')
# flip the GeoPoint [LAT,LON] into [LON,LAT]
string_list = [string_list[1], string_list[0]]
float_list = map(float, string_list)
geo_list.append(float_list)
except KeyError:
print met
except IndexError:
print "%s has a problem with it's geopoint metadata. CAS.geopoint =
%s, MaxLon = %s, MaxLat = %s" % (met['CAS.ProductName'][0], raw_geo_list,
met['MaxLon'][0], met['MaxLat'][0])
# convert to datetime objects
start = datetime.datetime.strptime(start, '%Y-%m-%dT%H:%M:%S')
stop = datetime.datetime.strptime(stop, '%Y-%m-%dT%H:%M:%S')
# if a single point is in the geo_list use type:point
if len(geo_list) == 1:
geo_json['type'] = "Point"
else:
geo_json['type'] = "MultiPoint"
geo_json['coordinates'] = geo_list
props = {"OFSN": ofsn,
"start_date": start,
"stop_date": stop}
geo_json['properties'] = props
return geo_json
# print geo_json
def build_geo_json_list(p_type):
# list to hold docs
geo_json_list = []
# get product list
p_list = filemgr.getProductsByProductType(p_type)
# for each product extract metadata
for p in p_list:
# grab the metadata and build geo_json_doc
met = filemgr.getMetadata(p) # this is a dictionary
if met == {}: # SKIP products that have NO METADATA
pass
else:
try:
geo_json_doc = build_geo_json(met)
geo_json_list.append(geo_json_doc)
except KeyError as err:
print err
return geo_json_list
for p_type in p_type_list:
'''In this loop we iterate over each product_type from the p_type_list
we got from the fileManager and we call getProductsByProductType to
loop over all of the products in the catalog'''
# grab a list of products from filemanager using ProductType
p_list = filemgr.getProductsByProductType(p_type)
if len(p_list) == 0: # If no products are found, then do nothing, else
get to work
print 'No products found for ' + str(p_type['name'])
pass
else:
# build the productType based collection
p_collection = build_collection(p_type)
# build the entire geo_json dict
geo_json_list = build_geo_json_list(p_type)
# do a single insert of all geo_json using a single BULK insert
p_collection.insert(geo_json_list)
print str(len(geo_json_list)) + ' is the length of the geo_json
list for %s' % p_type['name']
############ END CODE BLOCK ###############
On Wed, Feb 1, 2012 at 10:52 AM, Yu, Richard (GSFC-474.0)[NOAA-JPSS] <
[email protected]> wrote:
> ** **
>
> Hi,****
>
> ** **
>
> We are new to OODT and would like to know if there is any example of
> populating the metadata into a database after successful ingest. The
> database could be the following:****
>
> POSTGRESQL, or MYSQL. Your help in this area would be most appreciated!**
> **
>
> ** **
>
> Richard Yu****
>
> ** **
>