Hi,
OK, thanks for the prompt comments.
I did notice that the functionality is already separated in MIFFile's inner
classes:
private class Reader implements FeatureReader
private class Writer implements FeatureWriter
Mainly I was thinking that separate classes would be more 'readable' for
someone who is starting to review the functionality of the geotools MIF
plugin. Then again, it seems that the use of inner classes is common practice
for classes that represent the data model of some geographic data. For
example, some shp file reader / writer implementations are written this way,
despite the growth in lines of code and the cost to code readability.
I also agree that MapInfo has an irritating way of supporting mixed
geometries, text and style information inside mif (and tab). I think the
styles used in mif files should never be considered 'business critical';
instead, the handling of the core information (naturally, the geographical
data) is the main thing we should take care of. It's good to see that
you have been thinking about this issue and already have some workaround
ideas.
I remember reading a conversation saying that geotools should not provide
'official' support for interchange data formats like MIF, and should instead
concentrate on supporting 'real' data formats like shp. Of course it would
be better to have support for 'real' MapInfo tab, but how would we work out
the data format specification? I guess that tab support is out of the
question, so Sig's current contribution to the MIF plugin is a must
for geotools.
Yours:
Jukka
"Luca Sigfrido Percich" <[EMAIL PROTECTED]>
26/09/2005 13:40
To: "Jukka Sirviö" <[EMAIL PROTECTED]>,
[email protected]
cc:
Subject: Re: Geotools mif
Hi Jukka!
Please, can you FW my reply to the udig list, as I'm not a member of
it? Meanwhile I CC the reply to Geotools-devel as the discussion
might be of interest. Thanx!!!
For up to date code and documentation, please refer to:
http://docs.codehaus.org/display/GEOTOOLS/MIFDataStore
and
http://svn.geotools.org/geotools/trunk/gt/plugin/mif/
On 26 Sep 2005 at 11:28, Jukka Sirviö wrote:
> we could start from separating the reading and writing processes, this
> way we can share the testing and developing burden more easily? What's
> your opinion?
> I suggest that the MIFFile class is split into more
> convenient pieces like: 1) MIFFileReader 2) MIFFileWriter
>
> probably also
> 3) MIFFeatureWriter
> 4) MIFFeatureReader
Yes, there's already a certain degree of separation: take a look at
the MIFFile private inner classes Reader (implements FeatureReader) and
Writer (implements FeatureWriter)... I've left them inside MIFFile
'cause they share some pieces of code, and I think they don't need to
be derived or accessed from outside MIFFile. The drawback is that if
in the future a new type of geometry or new geometry attributes are
defined for the MIF format, the class has to be modified, as no "plugin"
or "extension" model has been provided.
MIFFile has been designed to be usable standalone as well (i.e.
outside a MIFDataStore).
> JUnit testcases could be 'split' also, for example when running the
> tests the writer and reader tests could be easily separated. I ran the
> current tests with mif files of normal size for our purposes (30 - 100
> MB). I faced performance issues, probably mainly because the writing and
> reading are done together during the run of the JUnit test class.
Ok.
> How about the Styling and text part of the Region element? Coordinates are
> straightforward to handle but how about the example below? Any ideas?
Well, we've already discussed this issue (but unfortunately nobody
seemed to be so interested in it), anyway the idea is that keeping
style along with geographic data is a bad practice. It's nearly the
same problem I faced when somebody asked me to support text
objects... text objects are no database concept at all, they are
merely a "map drawing" issue. Supporting style at the row level
for a geometry has the same value as supporting font style for a char
field at the row level... font style, like object style, must derive
from formalized object properties which are accessible from the
database where the object is stored (e.g. all the polygons of
buildings belonging to this class are to be filled in red).
Please refer to:
http://jira.codehaus.org/browse/GEOT-653
On one hand, MapInfo allows for "bad practices" (IMHO). (Another bad
practice is supporting mixed geometry types in the same table, again
IMHO). On the other, style and text and other MapInfo features don't
find a corresponding entity in the JTS/GeoTools world. In the
following example, what we could do is similar to the approach for
supporting TEXT objects, i.e. creating one or more MIF_PROPERTY
String fields in which to store the information we read, so that we can
write it back when needed. But what happens when a new feature is
created? And how do we interpret the MIF style definition for rendering
purposes?
> Region 1
> 19
> 3605482.86 6908238.08
> 3605461.4 6908257.01
> 3605451.29 6908270.91
> Pen (3,2,8388736)
> Brush (1,0,16777215)
> Center 3605495.49 6908255.75
My final proposal (I'm a DataBase/Developer guy, not a cartographer
;o) is to ignore all the style info from the MIF format, and concentrate
on a better suited approach which could automatically GENERATE an SLD
DESCRIPTOR from a MapInfo WORKSPACE FILE (.WOR).
The definition of styles for thematic maps in a workspace file uses
the same style objects used in mif files, so we would have to implement
classes which parse the MapInfo style descriptors (Symbol, Pen and
Brush) into GeoTools renderer styles.
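As a starting point, here is a minimal sketch of parsing the Pen and Brush clauses from the example above. It is not part of the MIF plugin: the class and method names are made up for illustration, and it assumes the usual MIF layout of Pen (width, pattern, color) and Brush (pattern, forecolor[, backcolor]), with colours stored as red*65536 + green*256 + blue.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of the GeoTools MIF plugin.
final class MifStyleSketch {
    // Matches e.g. "Pen (3,2,8388736)" or "Brush (1,0,16777215)";
    // the third argument is optional (Brush may omit the background colour).
    private static final Pattern CLAUSE = Pattern.compile(
        "^\\s*(Pen|Brush)\\s*\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*(?:,\\s*(\\d+)\\s*)?\\)\\s*$",
        Pattern.CASE_INSENSITIVE);

    /** Returns the integer arguments of a Pen or Brush clause, or null if the line is not one. */
    static int[] parseArgs(String line) {
        Matcher m = CLAUSE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        int n = (m.group(4) == null) ? 2 : 3;
        int[] args = new int[n];
        for (int i = 0; i < n; i++) {
            args[i] = Integer.parseInt(m.group(i + 2));
        }
        return args;
    }

    /** Converts a MIF colour integer (red*65536 + green*256 + blue) to "#RRGGBB". */
    static String toHexColor(int mifColor) {
        return String.format("#%06X", mifColor & 0xFFFFFF);
    }
}
```

With the sample above, the Pen colour 8388736 would come out as #800080 and the Brush background 16777215 as #FFFFFF, which is the colour notation an SLD stroke or fill expects. Mapping the pattern codes to SLD symbolizers is the harder part and is left out here.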
> Third, I did not find the support for multipolygons?
MultiPolygons are supported. MapInfo describes in the same way a
linear ring, a polygon with a hole and a multipolygon: they are
simply lists of linear rings. The MIFFile.Reader builds a Polygon or
MultiPolygon according to the preferences (for example it can cast
all the simple polygons to Multi) and to what it finds in MIF file.
> I suppose the current approach is sound when you examine its
> architecture. The Tokenizer classes are well separated from the parsing.
> The actual String to be handled is always requested through the
> MIFFileTokenizer.getToken() method with several types of 'tokenizer'
> (or delimiter) parameter passed in (empty or ',' or ' ' , '(', or
> ')'). That is of course a flexible way to do things, but does it affect
> performance when parsing large files?
As I told you, I didn't check for performance. :o(
Surely the best performance improvement could be achieved by modifying the
MIFFile.Reader.readMIFCoordinate() method, as the MIF format usually keeps
one coordinate pair on each row. The problem is when things aren't laid out
as expected in the MIF file, i.e.:
x1, y1, x2, y2
or
x1, y1,
x2, y2
are treated correctly by the current
MIFFile.Reader.readMIFCoordinate() method, but they probably wouldn't
work in the new version. Keep in mind that MIF files can also be
generated by third party software which may or may not write them
according to precise formatting rules.
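To make the trade-off concrete, here is a rough sketch of a coordinate reader that accepts both layouts above as well as the usual one-pair-per-row form. It is not the actual readMIFCoordinate() code; the class name, method name and signature are assumptions for illustration only.

```java
import java.io.BufferedReader;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical sketch, NOT the actual MIFFile.Reader.readMIFCoordinate() code.
final class CoordinateSketch {
    /**
     * Reads `count` x/y pairs, regardless of how the values are spread
     * over rows or whether they are separated by commas or whitespace.
     */
    static double[][] readPairs(BufferedReader in, int count) throws IOException {
        double[] flat = new double[2 * count];
        int filled = 0;
        String line;
        while (filled < flat.length && (line = in.readLine()) != null) {
            for (String token : line.trim().split("[,\\s]+")) {
                if (token.isEmpty()) {
                    continue; // blank row or leading separator
                }
                if (filled == flat.length) {
                    // Simplification: values left over on the last row are rejected.
                    throw new IOException("Unexpected extra value: " + token);
                }
                flat[filled++] = Double.parseDouble(token);
            }
        }
        if (filled < flat.length) {
            throw new EOFException("Expected " + count + " coordinate pairs");
        }
        double[][] pairs = new double[count][];
        for (int i = 0; i < count; i++) {
            pairs[i] = new double[] { flat[2 * i], flat[2 * i + 1] };
        }
        return pairs;
    }
}
```

A faster variant could try the one-pair-per-row case first and only fall back to this general loop when a row does not contain exactly two values.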
> of stuff we will get from mif -file? The basic delimiter is stored in
> mif file's header (as you have used), but I think we also know, or at
> least we can easily determine, the rest of the things in mif file
> without every time going through the variety of if -checkings
> (MIFStringTokenizer.getToken(char, boolean, boolean))?
While it's no use improving the parsing of the header (a few rows), it's
worth thinking about the MID files (the non-geometric Feature
attributes).
The problem of parsing the MID file lies in String fields which are
always delimited by double quotes, and where double quote chars are
escaped by doubling inside the string... like "A string containing a
""quoted string"""... substring would not work here, not in a
straightforward way I mean.
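To show why a plain substring is not enough, here is a minimal sketch of splitting one MID row into fields while honouring quoted strings and doubled quotes. It is independent of the MIFStringTokenizer code; the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, independent of the plugin's MIFStringTokenizer.
final class MidLineSketch {
    /** Splits one MID row on `delimiter`, unescaping "" inside quoted fields. */
    static List<String> splitFields(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    current.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote: one literal quote
                    i++;
                } else {
                    inQuotes = false;    // closing quote
                }
            } else if (c == '"') {
                inQuotes = true;         // opening quote
            } else if (c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

The delimiter is a parameter because it comes from the MIF header. A delimiter inside a quoted field is kept as part of the value, which is exactly the case a naive split() gets wrong.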
The different getToken methods allow you to skip parsing of double
quotes, and in fact use a plain substring() call if a non-quoted
string is expected.
Take a look at the MIFValueSetter objects, which are currently used
for mapping MIF values into Feature attributes... they could also be
used for getting data from the MIF stream... so we could think of a
MIFValueSetter.getToken() method... this could give better
performance.
> This kind of approach could be a straightforward way to handle the
> parsing:
> - loop through the whole file one line at a time
> - parse the line with StringTokenizer into smaller pieces
> - determine the type of the line from the first token and do the
>   necessary actions for the line based on that line type
This is what is currently done, except that you don't parse the whole
file but one feature at a time, according to the FeatureReader /
FeatureWriter model. The MIFStringTokenizer.putToken() method has
been added to allow a "look forward", useful if you want to read
all the style stuff (as I did for Text objects) instead of simply
skipping it when you read the next feature.
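The "look forward" idea can be sketched roughly as below. This is only an illustration of the push-back technique, not the MIFStringTokenizer API: the class, its methods and the list of clause keywords are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of a push-back reader, NOT the MIFStringTokenizer API.
final class LookAheadSketch {
    private final Iterator<String> source;
    private final Deque<String> pushedBack = new ArrayDeque<>();

    LookAheadSketch(Iterator<String> source) {
        this.source = source;
    }

    /** Returns the next row, preferring rows that were pushed back; null at end of input. */
    String next() {
        if (!pushedBack.isEmpty()) {
            return pushedBack.pop();
        }
        return source.hasNext() ? source.next() : null;
    }

    /** Makes `row` the next one returned by next(). */
    void putBack(String row) {
        pushedBack.push(row);
    }

    /** Collects the optional style rows after a geometry and stops before the next feature. */
    List<String> readStyleClauses() {
        List<String> clauses = new ArrayList<>();
        String row;
        while ((row = next()) != null) {
            String trimmed = row.trim();
            if (startsWith(trimmed, "pen") || startsWith(trimmed, "brush")
                    || startsWith(trimmed, "center")) {
                clauses.add(trimmed);
            } else {
                putBack(row); // first row of the next feature: leave it for the caller
                break;
            }
        }
        return clauses;
    }

    private static boolean startsWith(String s, String prefix) {
        return s.regionMatches(true, 0, prefix, 0, prefix.length());
    }
}
```

Because the row that ends the style block is pushed back rather than consumed, the feature-at-a-time loop stays unchanged whether the caller keeps the style clauses or throws them away.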
> Yours:
> Jukka
Cheers
Sig
--
Luca Sigfrido Percich ([EMAIL PROTECTED])
Agenzia Milanese Mobilità e Ambiente s.r.l. (http://www.ama-mi.it)
Direzione Sistemi Informativi e Modellistica
Via Beccaria, 19 - 20122 Milano - tel. +39 02 884.67.262