- http://www.mediawiki.org/wiki/API%3aMain_page
- http://jwbf.sourceforge.net/
I'd appreciate any suggestions.
Regards
Khalida Ben Sidi Ahmed
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hello,
I need an HTML dump of Wikipedia, but the link http://static.wikipedia.org/ does
not work.
I'd appreciate any explanation or suggestion.
Regards
Ben Sidi Ahmed
___
I need static HTML dumps. On the webpage you've mentioned, the static HTML
link is not accessible when I click it.
Truly yours
___
I need an HTML dump of Wikipedia because I have written Java code that
extracts text from HTML content, and I would like to apply it to this
dump. In fact, I need to extract the first sentence of a list of articles
(200) and I don't know how to do it with other dumps. If you have any idea of
other
Currently, I'm using the online version with the Java library Jsoup. It does
not work perfectly. After extracting fewer than 10 articles, my
program throws a series of exceptions.
Could you please give me the approximate number of articles I can get
with these tools?
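For a fixed list of article titles, the MediaWiki API's TextExtracts module (prop=extracts) can return just the plain-text intro, which avoids scraping full pages with Jsoup. A minimal sketch of building such a request; the class and method names are mine, not from this thread:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Builds a MediaWiki API request that returns only the plain-text intro
// of an article (TextExtracts extension, enabled on Wikipedia).
public class ExtractUrl {
    public static String buildIntroUrl(String title) {
        try {
            return "https://en.wikipedia.org/w/api.php"
                    + "?action=query&prop=extracts&exintro=1&explaintext=1"
                    + "&format=json&titles=" + URLEncoder.encode(title, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Fetching this URL (e.g. with java.net.HttpURLConnection) yields
        // JSON whose "extract" field holds the intro text.
        System.out.println(buildIntroUrl("Petroleum"));
    }
}
```

Looping over 200 titles this way is 200 small API calls rather than 200 full page loads.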
If you just need a few
I just wonder if the problem can be due to the speed of my connection. The
text of the exception is:
Grave: null
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at
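A read timeout after a handful of requests usually means the per-request timeout is too short or the requests come too fast; with Jsoup the timeout can be raised, e.g. Jsoup.connect(url).timeout(30000).get(). Independently, a small retry-with-backoff wrapper keeps one slow response from killing the whole run. A sketch with illustrative names (not a fix from this thread):

```java
import java.util.concurrent.Callable;

// Retries a flaky task a few times, backing off a little longer after
// each failure, and rethrows the last exception if all attempts fail.
public class Retry {
    public static <T> T withRetries(Callable<T> task, int attempts, long delayMs)
            throws Exception {
        Exception last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                Thread.sleep(delayMs * (i + 1)); // back off between attempts
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulates a fetch that succeeds on the third attempt.
        final int[] calls = {0};
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new java.net.SocketTimeoutException("Read timed out");
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Adding a short Thread.sleep between article fetches is also polite to the servers and often makes the timeouts disappear.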
Hi Kaminski
I appreciate your help, thank you very much indeed. I will try the options
that were given to me today. If my attempts fail, I will contact you for
help.
Many thanks to Hoehrmann: I'll immediately see whether I can succeed with curl
or wget.
Regards
Khalida Ben Sidi Ahmed
Another important question, to which I have been seeking an answer for days:
if I download WikiTaxi and have Wikipedia offline, can I query this offline
version using Java?
___
Hello!
I don't know if the subject of this question falls within the scope of this
group. Anyway, I will be pleased if I find an answer to my question.
I'm writing some Java code to perform NLP tasks on texts using
Wikipedia. What can I do to extract the first paragraph of a
I have already read the responses given in this post.
I want to extract the first paragraph (or the first sentence) for a
list of 100 articles.
I could not use JWPL because I don't have enough hard disk space to create
the DB. I am trying to use Jsoup, but I need examples.
I'm developing my project in Java. I'm not a good PHP developer.
JWPL first needs to create a database of about 158 GB. At least 2 GB of RAM
are also necessary. I have neither a big hard disk nor much RAM. In
addition, creating such a big database just to extract the
first
The list of articles I will need is not known from the beginning.
During my project, I will find a list of words (50). I try to find
definitions for them in Wikipedia. After that I will extract the hypernym of
each word. I will have a new list for which I then retrieve the respective
The words I use belong to a special domain (the oil and gas industry),
so WordNet and even Wiktionary are not useful enough (they are generalized
corpora).
Thank you very much indeed.
___
Thank you, Hoehrmann. I will try to apply the options you've mentioned.
However, if someone can help me with Jsoup, their ideas are welcome.
___
Hello!
In the HTML code of a Wikipedia article, how can I recognise the
*first* sentence of the article?
___
Thank you very much. That's exactly what I wanted to know.
2011/11/27 Bjoern Hoehrmann derhoe...@gmx.net
* Khalida BEN SIDI AHMED wrote:
In the HTML code of a Wikipedia article, how can I recognise the
*first* sentence of the article?
It's not marked up and probably differs among languages.
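Since nothing in the markup flags the first sentence, a common workaround is: take the first non-empty paragraph element, strip the remaining tags, and cut at the first sentence-ending period. A rough sketch of that heuristic (abbreviations such as "U.S." will break it, and a real parser like Jsoup is more robust than regexes):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Heuristic first-sentence extraction from article HTML:
// first non-empty <p>, tags stripped, cut at the first ". ".
public class FirstSentence {
    private static final Pattern FIRST_P =
            Pattern.compile("<p[^>]*>(.*?)</p>", Pattern.DOTALL);

    public static String firstSentence(String html) {
        Matcher m = FIRST_P.matcher(html);
        while (m.find()) {
            String text = m.group(1).replaceAll("<[^>]+>", "")  // strip tags
                                    .replaceAll("\\s+", " ").trim();
            if (text.isEmpty()) continue;       // skip empty lead paragraphs
            int end = text.indexOf(". ");
            return end >= 0 ? text.substring(0, end + 1) : text;
        }
        return "";
    }

    public static void main(String[] args) {
        String html = "<p><b>Petroleum</b> is a naturally occurring liquid. "
                    + "It is found beneath the surface.</p>";
        System.out.println(firstSentence(html));
        // Petroleum is a naturally occurring liquid.
    }
}
```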
Hello,
This is the answer that was given to my question:
http://stackoverflow.com/questions/8286786/wikipedia-first-paragraph
It works perfectly; the code may be useful to you.
Truly yours
Khalida Ben Sidi Ahmed
___
For my research I need to download 3 files:
- [LANGCODE]wiki-[DATE]-pages-articles.xml.bz2 *OR*
[LANGCODE]wiki-[DATE]-pages-meta-current.xml.bz2
- [LANGCODE]wiki-[DATE]-pagelinks.sql.gz
- [LANGCODE]wiki-[DATE]-categorylinks.sql.gz
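The pages-articles dump decompresses to a file far too large for DOM parsing, but a StAX stream reader handles it in constant memory. A sketch that collects page titles, run against the decompressed XML (the `title` element name matches the dump schema; bz2 decompression itself needs an external tool or a library such as Commons Compress):

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Streams a MediaWiki XML dump and collects the <title> of each page
// without ever holding the whole document in memory.
public class DumpTitles {
    public static List<String> titles(Reader xml) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        List<String> out = new ArrayList<>();
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && r.getLocalName().equals("title")) {
                out.add(r.getElementText());
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in for a real dump file.
        String sample = "<mediawiki><page><title>Oil</title></page>"
                      + "<page><title>Gas</title></page></mediawiki>";
        System.out.println(titles(new StringReader(sample)));
        // [Oil, Gas]
    }
}
```

For a real dump, pass a Reader over the decompressed file instead of the StringReader.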
I downloaded the first two. Now I cannot have an
http://dumps.wikimedia.org/enwiki/2015/
and they are accessible. Can you give a couple of specific links that
did not work?
Ariel
On Sat, 26-11-2011, at 20:54 +0100, Khalida BEN SIDI
AHMED wrote:
For my research I need to download 3 files:
- [LANGCODE]wiki-[DATE]-pages
Even http://dumps.wikimedia.org/enwiki/2015/ doesn't work.
The text the browser shows is always the same: 403 - Forbidden.
___
In fact, I'm now downloading enwiki-latest-pagelinks.sql.gz on my laptop
(wifi connection).
All the Wikipedia links are forbidden both on my laptop and on my PC
(wired internet connection).
I stopped the download and the links are accessible.
___
happening simultaneously.
Thank you very much indeed for your responses.
Truly yours
Khalida Ben Sidi Ahmed
___