Re: [OSM-ja] Fwd: [Tagging] Mechanical Edit: fix japanese train stations wikipedia/names fields

Satoshi IIDA Thu, 18 Oct 2012 06:51:35 -0700

This is Iida.

I have explained the situation to Fabien.


1. He has uploaded the source code.
   I have not read it yet myself, but I will go through it next.

■ The script itself
http://fabsk.eu/osm/osm_wk_name.py

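2. Pulling names from Wikipedia looks inadvisable for licensing reasons, so that part has been dropped.
The goal is therefore to add a wikipedia tag to station nodes.
He plans to pull out the line that gives the coordinates from the text shown on the Wikipedia edit page
and make the edits by manually comparing it against the station nodes that exist in OSM.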

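That said, the goal is not to fill in the wikipedia tag on every single station no matter what.
It is quite flexible, along the lines of: "I want to add wikipedia tags, so I would like to fill in
as many as I can with as little effort as possible, and give up in cases such as a missing station node."
# In the first place, I suspect quite a few railway stations disappeared in the license change, so covering all of them is not going to happen quickly anyway (^^;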

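In addition, cases such as no Wikipedia page, a page that cannot be confirmed as the right station's page, or coordinates that are clearly off
are reported by the script so that they can be checked manually.
He says he will create a wiki page explaining all this.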

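With OpenLinkMap making supplementary information easier to view, and feature extensions to the official Mapnik being proposed these days,
I think this is a very interesting challenge.
What does everyone think?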

■OpenLinkMap
http://www.openlinkmap.org/
http://lists.openstreetmap.org/pipermail/talk-ja/2012-September/006762.html

■ Original text
I know that the content of the Wikipedia pages is not very
structured, but it's possible to retrieve information in a safe way. I
will write a Wiki page describing my proposal in depth, but I can
already give you a few details.

Let's consider for example this train station: 堀内公園駅. If you go to its
Wikipedia edit page, you can see that the Wikipedia tagging language
is not too complicated to analyse. You will have a line like this:

|座標      = {{ウィキ座標2段度分秒|34|55|39.35|N|137|5|22.85|E|region:JP}}

So it is easy to find the line, extract the coordinates, compare them to
the ones in OpenStreetMap and be sure that we are talking about the
right train station.
If there are no coordinates in the Wikipedia page, we could either:
- give up for this station
- check if there are more than one station with this name (in the OSM
data). If there is only one station, we could assume that we are
pointing at the right Wikipedia page
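
The coordinate check described above can be sketched in Python roughly as
follows. This is only an illustration, not the code in osm_wk_name.py: the
function names, the regular expression and the 500 m threshold are all
assumptions, and only the single template shown in the example line is
handled.

```python
import math
import re

# Matches only the template from the example line; other coordinate
# templates used on Japanese Wikipedia are not handled (assumption).
COORD_RE = re.compile(
    r"\{\{ウィキ座標2段度分秒\|"
    r"(\d+)\|(\d+)\|([\d.]+)\|([NS])\|"
    r"(\d+)\|(\d+)\|([\d.]+)\|([EW])"
)


def parse_wiki_coords(wikitext):
    """Return (lat, lon) in decimal degrees, or None if no template is found."""
    m = COORD_RE.search(wikitext)
    if m is None:
        return None
    # Degrees, minutes, seconds -> decimal degrees.
    lat = int(m[1]) + int(m[2]) / 60 + float(m[3]) / 3600
    lon = int(m[5]) + int(m[6]) / 60 + float(m[7]) / 3600
    if m[4] == "S":
        lat = -lat
    if m[8] == "W":
        lon = -lon
    return lat, lon


def distance_m(a, b):
    """Approximate great-circle distance in metres (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))


def same_station(wiki_coords, osm_coords, max_m=500):
    """True if the two points are within max_m metres (hypothetical threshold)."""
    return distance_m(wiki_coords, osm_coords) <= max_m
```

Passing the example line to parse_wiki_coords gives a latitude near 34.9276
and a longitude near 137.0897, which can then be compared with the position
of the OSM station node.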

My goal was not to fill the field ≪wikipedia≫ of each of the train
stations, but to do it safely for a large number of them, rather than
doing it manually.
And for those we could not fill, the script can put the reason (≪no page≫,
≪disambiguation page≫, ≪non-matching coordinates≫) in a report, so
someone (for example: me) can take a look manually.
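
As a rough sketch of how the "tag it or report it" decision could look in
Python: everything here is hypothetical (the WikiPage class, the decide
function, the 0.005 degree tolerance, and the extra "no coordinates" reason
for the give-up case); it is not taken from the actual script. The tag value
uses the usual language-prefixed form of the OSM wikipedia key.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class WikiPage:
    """Hypothetical summary of a fetched Wikipedia page."""
    title: str
    is_disambiguation: bool
    coords: Optional[Tuple[float, float]]  # (lat, lon), or None if absent


def decide(osm_coords, page, same_name_count, tolerance_deg=0.005):
    """Return ("tag", value) when safe, otherwise ("report", reason)."""
    if page is None:
        return ("report", "no page")
    if page.is_disambiguation:
        return ("report", "disambiguation page")
    if page.coords is None:
        # Fallback from the text: accept the page only if the station
        # name is unique in the OSM data, otherwise give up.
        if same_name_count == 1:
            return ("tag", "ja:" + page.title)
        return ("report", "no coordinates")  # hypothetical extra reason
    close = (abs(osm_coords[0] - page.coords[0]) <= tolerance_deg
             and abs(osm_coords[1] - page.coords[1]) <= tolerance_deg)
    if close:
        return ("tag", "ja:" + page.title)
    return ("report", "non-matching coordinates")
```

Collecting the ("report", reason) results in a list would give the kind of
report that can be reviewed by hand afterwards.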






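On 15 October 2012 at 20:28, ribbon <o...@ns.ribbon.or.jp> wrote:
> On Mon, Oct 15, 2012 at 01:50:46PM +0900, Satoshi IIDA wrote:
>>
>> > The first thing that comes to mind is asking him to explain
>> > how he picks up data from the Wikipedia:ja pages.
>> Agreed.
>>   - upload the source to GitHub or similar
>>   - or explain it on a wiki page
>>
>> Shall we ask him for one of these two?
>> (His written explanation gives a rough picture,
>> but you mean the finer details, right?)
>
> I had a look at Wikipedia, and since it is not necessarily structured as XML
> item by item, I wonder whether the data can be picked up reliably.
> Also, the romanized parts do not appear to be properly normalized
> (mixed upper and lower case, or all upper case).
>
> Ideally the Wikipedia data would be properly maintained (structured)
> so that it could simply be pulled in as-is.
>
> So what I would like to see is how it works in practice.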
>
> oota
>
> _______________________________________________
> Talk-ja mailing list
> Talk-ja@openstreetmap.org
> http://lists.openstreetmap.org/listinfo/talk-ja



-- 
Satoshi IIDA
mail: nyamp...@gmail.com
twitter: @nyampire
