Hi,

Stefan did not tell enough details for repeating the test exactly so my test set-up is probably a bit different.

Program: GDAL 2.1.0dev, released 2015/99/99 on Windows, compiled with sqlite-3.10.2 and libspatialite-4.3.0a
Test data: 1000000 points (2D) without attributes in shp format
Storage: USB 2.0 drive, same disk used for both input and output
Test script which is also testing a performance hint about turning synchronous write off:

@echo %time%
ogr2ogr -f gpkg plain_points.gpkg plain_points.shp
@echo %time%
ogr2ogr -f gpkg plain_points_nosync.gpkg plain_points.shp --config OGR_SQLITE_SYNCHRONOUS OFF
@echo %time%
ogr2ogr -f SQLite -dsco spatialite=yes plain_points.sqlite plain_points.shp
@echo %time%
ogr2ogr -f SQLite -dsco spatialite=yes plain_points_nosync.sqlite plain_points.shp --config OGR_SQLITE_SYNCHRONOUS OFF
@echo %time%
ogr2ogr -f SQLite -dsco spatialite=yes plain_points_nosync_gt.sqlite plain_points.shp --config OGR_SQLITE_SYNCHRONOUS OFF -gt 65536
@echo %time%

Timings from the 5 runs:

1: 38 seconds
2: 38 seconds
3: 41 seconds
4: 37 seconds
5: 39 seconds

Conclusions:

1) In my tests with my data, software, and hardware, there were no meaningful differences in the conversion speeds. Test data of one million points may be too small or then the conversion needs so little computing that I was actually measuring the I/O speed of the USB drive. 2) Stefan measured a creation time of 1 min 51 sec for the Spatialite. There must be something in that. Fresh GDAL uses a default value of 20000 for -gt (not corrected into all GDAL documents) and using -gt 65536 did not make any difference with GDAL 2.1-dev. However, with explicit -gt 200 the timing was 93 second so it may explain most part of the slow result. 200 is what old GDAL versions use as a default value for -gt which makes 5000 transactions for one million points. Stefan, which GDAL version did you use? Could you make a new test with bigger -gt value and if needed, re-consider your conclusion "Spatialite is several times slower for creation time". Could it rather be "Transactions are expensive with SQLite, try to avoid making too many"?

File sizes of my test data which had million points without attributes:
shp: 35 MB
gpkg: 92 GB
splite: 130 GB

I took also a sample of one million points from a real world dataset with attributes: three doubles, two dates, and two strings and that changed the numbers quite a lot:
shp: 170 MB
gpkg: 153 GB
splite: 185 GB

No huge differences is file sizes when data have attributes.

-Jukka Rahkonen-




Mike Thompson wrote 2016-03-23 03:26:
This from the ogr2ogr documentation[1] may be relevant:

"When writing into transactional DBMS (SQLite/PostgreSQL,MySQL,
etc...), it might be beneficial to increase the number of INSERT
statements executed between BEGIN TRANSACTION and COMMIT TRANSACTION
statements. This number is specified with the -gt option. For example,
for SQLite, explicitly defining -GT 65536 ensures optimal performance"

Mike
[1] http://www.gdal.org/ogr2ogr.html [3]

On Tue, Mar 22, 2016 at 5:01 PM, Stefan Keller <sfkel...@gmail.com>
wrote:

Hi Frederik and Jukka

Before I try give answers to performance let's be aware that we're
(at
least I am) speaking about a "desktop exchange format", not a
storage
fomat for GIS processing.

But Frederik's comment piqued my curiosity and I did some quick
comparison.
I generated 1 mio. records in PostsGIS with this table
CREATE TABLE benchmark (id serial primary key, txt varchar(32),
geom
geometry(point,4326) );

Then I used OGR2OGR to create the following three file formats:
GeoPackage (using 73.9 MB disk space), Shapefiles (dbf/shp/shx 117
MB)
and Spatialite (173 MB).

Creation time of GeoPackage was 18 sec., Shapefile 21 sec. and
Spatialite 1 min 51 sec.
So, GeoPackage is a bit faster than Shapefiles and significantly
(about 37%) smaller in size.
Spatialite in fact consumes much more disk space than Shapefile and
GeoPackage, and Spatialite is several times slower for creation
time.

This could explain the preformance issues of Spatialite Frederic
mentioned.

:Stefan

2016-03-22 13:56 GMT+01:00 Jukka Rahkonen
<jukka.rahko...@latuviitta.fi>:
Frederik Ramm <frederik <at> remote.org [1]> writes:


Hi,

On 03/20/2016 10:56 PM, Stefan Keller wrote:
> But Shapefile remains an oldtimer with more drawbacks than
limited
> field names; see [1].
> GeoJSON (ascii) and GeoPackages (binary) are formats which are
more
> suited for the job.
> I still have hope that JOSM will be able to read those vector
formats too.

Frankly, whenever I venture into the brave new world of
Spatialite, I
come back to good old shape files after a while for performance
reasons.
I'm not sure if Geopackage has significant performance
improvements over
simple Spatialite but if it hasn't then my recommendation for
simple GIS
processing is certainly to stick with shape files for the time
being -
despite all their shortcomings.


Hi Frederic,

I would like to receive some sample data, exact way to reproduce
some of
your ventures and cold numbers about the speed you have
experienced.
Spatialite does have it's limits but for plain selects with
spatial and
attribute filters it can well outperform both shapefiles and
PostGIS.

I keep most vector data for WMS services in Spatialite or
GeoPackage due to
the already mentioned and some other reasons:
- supports long attribute names
- supports strings longer than 255 characters
- supports SQL
- supports attribute indexes
- much less encoding problems due to UTF-8
- one single file vs. a bunch of files in shapefile, perhaps even
split to
separate bunches for points, lines and polygons.

For me SpatiaLite is a little bit slower than shapefiles if only
spatial
filter (BBOX) is used but usually faster if also attribute
filters are
involved, especially if more than one field is needed in filters
(Shapefiles
can be sorted by one attribute only). Of course spatialite must
have indexes
which suit the queries and when it comes to spatial index, the
client must
know how to utilize the table based R-Tree index. I also
recommend to VACUUM
once the database is ready to use.

Many spatial operations are relatively slow in Spatialite and I
don't
usually utilize them on-the-fly with WMS server. Instead, I run
the
algorithm once and store the result into a new table because a
few
mega/gigabytes of additional disk space is not crucial on the
server.
However, such operations tend to be slow also if shapefiles are
used as
source data.

Write performance especially with concurrent writes is another
story. I am
talking about read-only operations. I know that I am writing
empty words as
far as I do not include reproducible facts but I am willing to
join to a
controlled test if someone is organizing such.

-Jukka Rahkonen-




_______________________________________________
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk [2]

_______________________________________________
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk [2]



Links:
------
[1] http://remote.org
[2] https://lists.openstreetmap.org/listinfo/talk
[3] http://www.gdal.org/ogr2ogr.html

_______________________________________________
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk

Reply via email to