Hi again,

My apologies, but I have found the problem, and it is in the OSMImporter
itself, nothing to do with Lucene or Neo4j. Peter made a commit
<https://github.com/neo4j/neo4j-spatial/commit/b5e0f1d1a11ed9c8b2b8074f529362a1607a7643#src/main/java/org/neo4j/gis/spatial/osm/OSMImporter.java>
in May that at first glance appears to be a cleanup of my code (removal of
string literals), but it contained two meaningful changes that I only saw
on deeper inspection:

   - Addition of the map "type": "exact" to the index creation (when I
   removed this, node creation improved from 70 nodes/s to 140 nodes/s)
   - User control over the commit size (previously I had hard-coded this to
   5000 nodes per tx).

There was a small but significant bug in the commit size handling: the new
user parameter was not used to initialize anything, with the consequence
that every node was committed individually. Setting the block size back to
5000 increased the node creation rate to nearly 10000 nodes/s (over 100
times faster than the 70 nodes/s I was seeing). That is a serious
improvement.
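
To illustrate the kind of bug described above, here is a minimal sketch in
plain Java. It is not the actual OSMImporter code: the class, field, and
method names are hypothetical, and the "commit" is just a counter standing
in for a real transaction. It assumes the check has the form "commit once
the pending count reaches the interval", so an interval that is never
initialized (left at 0) commits after every single node.

```java
// Hypothetical sketch of batched commits; not the real OSMImporter code.
class BatchCommitSketch {

    /** Simulated importer: counts commits instead of using real transactions. */
    static final class Importer {
        private final int commitInterval; // nodes per transaction (0 = never initialized)
        private int pending = 0;          // nodes added since the last commit
        private int commits = 0;          // number of simulated commits so far

        Importer(int commitInterval) {
            this.commitInterval = commitInterval;
        }

        void addNode() {
            pending++;
            // With commitInterval == 0 this condition is always true,
            // so every node ends up in its own transaction.
            if (pending >= commitInterval) {
                commit();
            }
        }

        /** Commit whatever is left over at the end of the import. */
        void finish() {
            if (pending > 0) {
                commit();
            }
        }

        private void commit() {
            commits++;
            pending = 0;
        }

        int commits() {
            return commits;
        }
    }

    /** Imports the given number of nodes and returns how many commits happened. */
    static int run(int nodes, int commitInterval) {
        Importer importer = new Importer(commitInterval);
        for (int i = 0; i < nodes; i++) {
            importer.addNode();
        }
        importer.finish();
        return importer.commits();
    }

    public static void main(String[] args) {
        System.out.println("interval 0:    " + run(10000, 0) + " commits");
        System.out.println("interval 5000: " + run(10000, 5000) + " commits");
    }
}
```

In the sketch, the only difference between the slow and the fast case is
whether the interval carries a sensible value into the commit check, which
matches what I saw when I set the block size back to 5000.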

Sorry again for wasting space on the list. I'm glad this was a user error,
though, not a neo4j issue :-)

Regards, Craig

On Mon, Jun 27, 2011 at 12:54 AM, Craig Taverner <cr...@amanzi.com> wrote:

> Sorry for the lack of details. I wrote the email late at night, as I am
> again.
>
> Anyway, the relevant code on GitHub is OSMImporter.java
> <https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/osm/OSMImporter.java>.
> When adding nodes to the graph, it also adds the osm-id to a Lucene index.
> There is no Index#remove call, only multiple Index#add calls within the
> same transaction. In fact we call index.add and index.get for one index (osm
> changesets), while calling index.add on another (osm-nodes). The relevant
> lines of code are 812 for adding new OSM nodes to the graph, and 914 for
> finding changesets in a different index.
>
> I have not investigated for which version of neo4j the slowdown started, or
> if there is somehow some other cause. I will try to find time to do that later
> this week. But I thought I should ask on the list anyway in case anyone else
> has a similar problem, or if there are some obvious answers.
>
>
> On Sun, Jun 26, 2011 at 1:45 PM, Mattias Persson <
> matt...@neotechnology.com> wrote:
>
>> Please elaborate on how you are using your index. Are you using
>> Index#remove(entity,key) or Index#remove(entity) followed by get/query in
>> the same tx? There was a recent change in transactional state
>> implementation, where a full representation (in-memory lucene index) was
>> needed for it to be able to return accurate results in some corner cases.
>> That change could slow things down, but not that much though. I'll give
>> some different scenarios a go and see if I can find some culprit for this.
>>
>> But again, a little more information would be useful, as always.
>>
>> 2011/6/26 Craig Taverner <cr...@amanzi.com>
>>
>> > Hi,
>> >
>> > Has anyone noticed a slowdown of imports into neo4j with recent
>> > snapshots? Neo4j-spatial importing OSM data (which uses lucene to find
>> > matching nodes for ways) is suddenly running much slower than usual on
>> > non-batch imports. For most of my medium sized test cases, I normally
>> > have surprisingly similar import times for batch inserter and non-batch
>> > inserter (EmbeddedGraphDatabase) versions of the OSM import, but in
>> > recent runs the normal API is now more than 10 times slower. Down to 70
>> > nodes per second, which is insanely slow.
>> >
>> > Any idea if there is something in the recent snapshots for me to look
>> > into? Reproducing the problem requires simply running the TestOSMImport
>> > test cases in neo4j-spatial. I have only tried this on my laptop, so I
>> > have not ruled out that there is something local going on.
>> >
>> > Regards, Craig
>> > _______________________________________________
>> > Neo4j mailing list
>> > User@lists.neo4j.org
>> > https://lists.neo4j.org/mailman/listinfo/user
>> >
>>
>>
>>
>> --
>> Mattias Persson, [matt...@neotechnology.com]
>> Hacker, Neo Technology
>> www.neotechnology.com
>> _______________________________________________
>> Neo4j mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
>>
>
>
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
