[udig-devel] Re: Geotools mif

Jukka Sirviö Mon, 26 Sep 2005 05:33:10 -0700

hi,
ok, thanks for prompt comments.

I did notice that the functionality is already separated into MIFFile's 
inner classes:
private class Reader implements FeatureReader
private class Writer implements FeatureWriter

Mainly I was thinking that separate classes would be more 'readable' for 
someone who is starting to review the functionality of the geotools MIF 
plugin. Still, it seems that the use of inner classes is common practice 
when dealing with classes that represent the data model of some geographic 
data. For example, some shp file reader / writer implementations are 
written this way, despite the growth in lines of code and the cost to 
code readability.

I agree also that MapInfo has an irritating way of supporting mixed 
geometries and text and style information inside mif (and tab). I think 
the styles used in mif files should never be considered 'business 
critical'; instead, the handling of the core information (naturally that's 
the geographical data) is what we should take care of. It's good to see 
that you have been thinking about this issue and already have some 
workaround ideas.

I remember reading some conversation saying that geotools should not 
provide 'official' support for interchange data formats like MIF. Instead 
geotools should concentrate on providing support for 'real' data formats 
like shp. Of course I think it would be better to have support for the 
'real' MapInfo tab format, but how would we hack the data format 
specifications? I guess that tab support is out of the question, so Sig's 
current contribution to the MIF plugin is a must for geotools.

Yours:
Jukka

"Luca Sigfrido Percich" <[EMAIL PROTECTED]>
26/09/2005 13:40

        To:     "Jukka Sirviö" <[EMAIL PROTECTED]>, 
[email protected]
        cc: 
        Subject:        Re: Geotools mif

Hi Jukka!

Please, can you FW my reply to the udig list, as I'm not a member of 
it? Meanwhile I CC the reply to Geotools-devel as the discussion 
might be of interest. Thanx!!!

For up to date code and documentation, please refer to:

http://docs.codehaus.org/display/GEOTOOLS/MIFDataStore

and

http://svn.geotools.org/geotools/trunk/gt/plugin/mif/

On 26 Sep 2005 at 11:28, Jukka Sirviö wrote:

> we could start from separating the reading and writing processes, this
> way we can share the testing and developing burden more easily? What's
> your opinion?
> I suggest that the MIFFile class is split into somewhat more
> convenient pieces like: 1) MIFFileReader 2) MIFFileWriter
> 
> probably also
> 3) MIFFeatureWriter
> 4) MIFFeatureReader

Yes, there's already a certain degree of separation, take a look at 
the MIFFile private inner classes Reader (implements FeatureReader) and 
Writer (implements FeatureWriter)... I've left them inside MIFFile 
'cause they share some pieces of code, and I think they don't need to 
be derived or accessed from outside MIFFile. The problem is that if 
in the future a new type of geometry or new geometry attributes are 
defined for the MIF format, the class has to be modified, so no "plugin" 
or "extension" model has been provided.

MIFFile has been designed so that it can also be used standalone (i.e. 
outside a MIFDataStore).

> JUnit testcases could be split as well; for example, when running the
> tests, the writer and reader tests could easily be separated. I ran the
> current tests with mif files typical for our purposes (30 - 100 Mb). I
> faced performance issues, probably mainly because the writing and
> reading are done simultaneously during the run of the JUnit test class.

Ok.

> How about the styling and text part of the Region element? Coordinates
> are straightforward to handle but how about the example below? Any ideas?

Well, we've already discussed this issue (but unfortunately nobody 
seemed to be very interested in it), anyway the idea is that keeping 
style along with geographic data is a bad practice. It's nearly the 
same problem I faced when somebody asked me to support text 
objects... text objects are not a database concept at all, they are 
merely a "map drawing" issue. Supporting the style at the row level 
for a geometry has the same value as supporting font style for a char 
field at the row level... font style, like object style, must derive 
from formalized object properties which are accessible from the 
database where the object is stored (i.e. all the polygons of 
buildings belonging to this class are to be filled in red). 

Please refer to:

http://jira.codehaus.org/browse/GEOT-653

On one hand, MapInfo allows for "bad practices" (IMHO). (Another bad 
practice is supporting mixed geometry types in the same table, again 
IMHO). On the other, style and text and other MapInfo features don't 
have a corresponding entity in the JTS/GeoTools world. In the 
following example, what we could do is similar to the approach for 
supporting TEXT objects, i.e. creating one or more MIF_PROPERTY 
String fields in which to store the information we read, so that we can 
write it back when needed. But what happens when a new feature is 
created? And how do we interpret the MIF style definition for rendering 
purposes?

> Region  1
>   19
> 3605482.86 6908238.08
> 3605461.4 6908257.01
> 3605451.29 6908270.91
>     Pen (3,2,8388736) 
>     Brush (1,0,16777215)
>     Center 3605495.49 6908255.75

My final proposal - I'm a DataBase/Developer guy, not a cartographer 
;o) is to ignore all the style info from the MIF format, and concentrate 
on a better suited approach which could automatically GENERATE an SLD 
DESCRIPTOR from a MapInfo WORKSPACE FILE (.WOR). 

The definition of styles for thematic maps in a workspace file uses 
the same style objects used in mif files, so we have to implement 
classes which parse the MapInfo style descriptors (Symbol, Pen and 
Brush) into GeoTools renderer styles.
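
As a rough illustration of that parsing step, here is a minimal sketch in 
plain Java. The class and method names are hypothetical and not part of 
the MIF plugin. It assumes (from my reading of the MIF format, not from 
this thread) that the clauses are Pen (width, pattern, color) and 
Brush (pattern, forecolor, backcolor), and that a colour is a 24-bit 
integer computed as red*65536 + green*256 + blue. Mapping the pattern 
codes to SLD symbolizers is left out entirely.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the GeoTools MIF plugin. */
final class MifStyleSketch {
    // Matches clauses such as "Pen (3,2,8388736)" or "Brush (1,0,16777215)".
    private static final Pattern CLAUSE = Pattern.compile(
            "^\\s*(Pen|Brush)\\s*\\(([^)]*)\\)\\s*$", Pattern.CASE_INSENSITIVE);

    /** Returns the integer arguments of a Pen or Brush clause, or null for any other line. */
    static int[] parseArgs(String line) {
        Matcher m = CLAUSE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] parts = m.group(2).split(",");
        int[] args = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            args[i] = Integer.parseInt(parts[i].trim());
        }
        return args;
    }

    /** Converts an assumed 24-bit MIF colour to the "#RRGGBB" form used in SLD. */
    static String toHexColor(int mifColor) {
        return String.format("#%06X", mifColor & 0xFFFFFF);
    }

    public static void main(String[] unused) {
        // Assumed argument order: width, pattern, colour.
        int[] pen = parseArgs("    Pen (3,2,8388736) ");
        System.out.println("stroke-width=" + pen[0] + " stroke=" + toHexColor(pen[2]));
    }
}
```

With the Pen clause from the example above, this would give a stroke 
colour of #800080, and the Brush colour 16777215 would become #FFFFFF.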

> Third, I did not find the support for multipolygons? 

MultiPolygons are supported. MapInfo describes in the same way a 
linear ring, a polygon with a hole and a multipolygon: they are 
simply lists of linear rings. The MIFFile.Reader builds a Polygon or 
MultiPolygon according to the preferences (for example it can cast 
all the simple polygons to Multi) and to what it finds in MIF file.
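
To make the "list of linear rings" idea concrete, here is a small sketch 
of one possible way to decide between Polygon and MultiPolygon. It is 
NOT the logic used by MIFFile.Reader, and all names are hypothetical. It 
assumes well-formed rings that do not touch or cross, treats a ring 
nested inside an odd number of other rings as a hole, and uses a 
forceMulti flag as a stand-in for the "cast everything to Multi" 
preference.

```java
import java.util.List;

/** Hypothetical sketch; not the algorithm used by MIFFile.Reader. */
final class RingGroupingSketch {
    /** Ray-casting test: is the point (x, y) inside the given ring? */
    static boolean contains(double[][] ring, double x, double y) {
        boolean inside = false;
        for (int i = 0, j = ring.length - 1; i < ring.length; j = i++) {
            double xi = ring[i][0], yi = ring[i][1];
            double xj = ring[j][0], yj = ring[j][1];
            // Toggle each time a horizontal ray from (x, y) crosses an edge.
            if ((yi > y) != (yj > y) && x < (xj - xi) * (y - yi) / (yj - yi) + xi) {
                inside = !inside;
            }
        }
        return inside;
    }

    /** Counts shells, i.e. rings nested inside an even number of other rings. */
    static int countShells(List<double[][]> rings) {
        int shells = 0;
        for (double[][] ring : rings) {
            int depth = 0;
            for (double[][] other : rings) {
                if (other != ring && contains(other, ring[0][0], ring[0][1])) {
                    depth++;
                }
            }
            if (depth % 2 == 0) {
                shells++;
            }
        }
        return shells;
    }

    /** One shell (with or without holes) is a Polygon; several shells are a MultiPolygon. */
    static String geometryType(List<double[][]> rings, boolean forceMulti) {
        return (forceMulti || countShells(rings) > 1) ? "MultiPolygon" : "Polygon";
    }
}
```

A square with a smaller square inside it would come out as a Polygon 
with a hole, while two disjoint squares would come out as a MultiPolygon.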

> I suppose that the current approach is good as far as its architecture
> goes. The Tokenizer classes are well separated from the parsing. The
> actual String to be handled is always requested through the
> MIFFileTokenizer.getToken() method with several types of 'tokenizer'
> (or delimiter) parameter passed in (empty or ',' or ' ', '(' or
> ')'). That is of course a flexible way to do things, but does it affect
> performance when parsing large files?

As I told you, I didn't check for performance. :o(

Surely the best performance improvement could be achieved by modifying the 
MIFFile.Reader.readMIFCoordinate() method, as the MIF format usually keeps 
one coordinate pair on each row. The problem is when things are not laid 
out as expected in the MIF file, i.e.:

x1, y1, x2, y2

or

x1, y1,
x2, y2

are treated correctly by the current 
MIFFile.Reader.readMIFCoordinate() method, but they probably wouldn't 
work in the new version. Keep in mind that MIF files can also be 
generated by third party software which may or may not write them 
according to precise formatting rules.
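
To illustrate the tolerance that has to survive any optimisation, here 
is a minimal sketch of a lenient pair reader. It is a hypothetical 
helper, not the actual readMIFCoordinate() code. It assumes that the 
number of pairs is known in advance and that values are separated by 
commas and/or whitespace.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch, not the actual MIFFile.Reader.readMIFCoordinate(). */
final class LenientCoordReader {
    /** Reads {@code count} x/y pairs, wherever the line breaks or commas fall. */
    static List<double[]> readPairs(Iterator<String> lines, int count) {
        List<Double> numbers = new ArrayList<>();
        while (numbers.size() < 2 * count) {
            if (!lines.hasNext()) {
                throw new IllegalStateException("coordinate list ended early");
            }
            // Accept "x y", "x, y", "x1, y1, x2, y2" and pairs split over rows.
            for (String token : lines.next().trim().split("[,\\s]+")) {
                if (!token.isEmpty()) {
                    numbers.add(Double.parseDouble(token));
                }
            }
        }
        List<double[]> pairs = new ArrayList<>();
        for (int i = 0; i < 2 * count; i += 2) {
            pairs.add(new double[] { numbers.get(i), numbers.get(i + 1) });
        }
        return pairs;
    }
}
```

A faster version could first try the common "one pair per row" layout 
and fall back to something like this only when that fails. Note that 
the sketch silently drops any extra numbers on the last row it reads.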

> of stuff we will get from mif -file? The basic delimiter is stored in
> mif file's header (as you have used), but I think we also know, or at
> least we can easily determine, the rest of the things in mif file
> without every time going through the variety of if -checkings
> (MIFStringTokenizer.getToken(char, boolean, boolean))? 

While it's no use improving the parsing of the header (a few rows), it's 
worth thinking about the MID files (Feature non-geometric 
attributes).
The problem of parsing the MID file lies in the String fields, which are 
always delimited by double quotes, and where double quote chars are 
escaped by doubling them inside the string... like "A string containing a 
""quoted string"""... substring would not work here, not in a 
straightforward way I mean.

The different getToken methods allow you to skip parsing of double 
quotes, and in fact use a plain substring() call if a non-quoted 
string is expected.
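
For reference, here is a minimal sketch of splitting one MID row while 
honouring the doubled-quote escaping described above. The class is 
hypothetical and is not the MIFStringTokenizer implementation. It 
assumes the delimiter is a single character taken from the MIF header.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of MID row splitting; not the MIFStringTokenizer code. */
final class MidLineSketch {
    /** Splits a row on the delimiter, unquoting strings and collapsing "" into ". */
    static List<String> split(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    field.append('"'); // doubled quote: one literal quote char
                    i++;               // skip the second quote of the pair
                } else if (c == '"') {
                    inQuotes = false;  // closing quote
                } else {
                    field.append(c);   // delimiters inside quotes are plain text
                }
            } else if (c == '"') {
                inQuotes = true;       // opening quote
            } else if (c == delimiter) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());  // last field has no trailing delimiter
        return fields;
    }
}
```

The character-by-character loop is the slow path. When a column is known 
to be numeric, a plain indexOf/substring on the delimiter is enough, 
which is the shortcut the different getToken methods provide.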

Take a look at the MIFValueSetter objects, which are currently used 
for mapping MIF values into Feature attributes... they could also be 
used for getting data from the MIF stream... so we could think of a 
MIFValueSetter.getToken() method... this could give better 
performance.

> This kind of approach could be a straightforward way to handle the
> parsing:
> - loop through the whole file one line at a time
> - parse the line with StringTokenizer into smaller pieces
> - determine the type of the line from the first token and do the
>   necessary actions for the line based on that line type

This is what is currently done, except that you don't parse the whole 
file but one feature at a time, according to the FeatureReader / 
FeatureWriter model. The MIFStringTokenizer.putToken() method has 
been added to allow a "look forward", useful if you want to read 
all the style stuff (as I did for Text objects) instead of simply 
skipping it when you read the next feature.
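
The "look forward" idea can be sketched as a line source with put-back. 
This is a hypothetical class, not the real MIFStringTokenizer, and the 
set of clause keywords it recognises (Pen, Brush, Center) is taken only 
from the Region example quoted earlier.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical line source with put-back; not the real MIFStringTokenizer. */
final class PushbackLines {
    private final Iterator<String> source;
    private final Deque<String> pushedBack = new ArrayDeque<>();

    PushbackLines(Iterator<String> source) {
        this.source = source;
    }

    /** Returns the next non-blank line, trimmed, or null at end of input. */
    String next() {
        if (!pushedBack.isEmpty()) {
            return pushedBack.pop();
        }
        while (source.hasNext()) {
            String line = source.next().trim();
            if (!line.isEmpty()) {
                return line;
            }
        }
        return null;
    }

    /** Puts a line back so that the following next() call returns it again. */
    void putBack(String line) {
        pushedBack.push(line);
    }

    /** Collects the clauses that follow a geometry, stopping before the next feature. */
    static List<String> readStyleClauses(PushbackLines in) {
        List<String> clauses = new ArrayList<>();
        String line;
        while ((line = in.next()) != null) {
            String keyword = line.split("[\\s(]+", 2)[0];
            if (keyword.equalsIgnoreCase("Pen") || keyword.equalsIgnoreCase("Brush")
                    || keyword.equalsIgnoreCase("Center")) {
                clauses.add(line);
            } else {
                in.putBack(line); // first line of the next feature: leave it for the reader
                break;
            }
        }
        return clauses;
    }
}
```

The feature reader can then call readStyleClauses() after the last 
coordinate of a geometry and still find the next feature's first line 
waiting when it continues.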

> Yours:
> Jukka

Cheers

Sig

-- 
Luca Sigfrido Percich    ([EMAIL PROTECTED])
Agenzia Milanese Mobilità e Ambiente s.r.l. (http://www.ama-mi.it)
Direzione Sistemi Informativi e Modellistica 
Via Beccaria, 19 - 20122 Milano - tel. +39 02 884.67.262

_______________________________________________
User-friendly Desktop Internet GIS (uDig)
http://udig.refractions.net
http://lists.refractions.net/mailman/listinfo/udig-devel
