Dear Community Experts!

To start text mining, we need a corpus.
Has any of you come across an open-source solution that can do the following?

1. A researcher enters a few keywords into the program, for example "iphone", "Apple products", "MacBook", and restricts the results to a time period of one week.
2. The program goes to Google and searches for these keywords.
3. It creates a list of the first 200 URLs returned for these queries.
4. It downloads the web pages behind these results as txt files, cleaning out clutter such as advertisements.

The researcher can then work with the results in openNLP or another text mining program.

Thank you for your advice, if you have a spare minute!

All the best in what you do,
Ivan
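P.S. To make the request more concrete, here is a rough Python sketch of the workflow in steps 1 to 4. It is only an illustration, not an existing tool. All names (`build_corpus`, `html_to_text`, `fetch`) are made up for this example. The `search` argument is a placeholder that the user would have to supply, for instance a wrapper around whatever search API they are allowed to use. The tag-based clean-up is a crude stand-in for real advertisement and boilerplate removal.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import Request, urlopen

# Assumption: text inside these tags is treated as clutter (scripts, menus, ad frames).
SKIP_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside", "iframe"}


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of skipped tags we are currently inside
        self.chunks = []  # pieces of text that were kept

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Step 4 (clean-up): reduce an HTML page to plain text, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def fetch(url, timeout=10):
    """Step 4 (download): return the page at `url` as a decoded string."""
    req = Request(url, headers={"User-Agent": "corpus-sketch/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def build_corpus(keywords, search, out_dir, limit=200, days=7, fetch_page=fetch):
    """Steps 1 to 4: keywords -> URL list -> one cleaned .txt file per page.

    `search(keyword, limit, days)` is a hypothetical callable supplied by the
    user; it should return up to `limit` URLs no older than `days` days.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    urls = []
    for keyword in keywords:
        for url in search(keyword, limit, days):
            if url not in urls:  # drop duplicates, keep the original order
                urls.append(url)

    saved = []
    for i, url in enumerate(urls):
        try:
            text = html_to_text(fetch_page(url))
        except Exception as exc:  # one bad page should not stop the whole run
            print(f"skipped {url}: {exc}")
            continue
        path = out / f"{i:04d}.txt"
        path.write_text(text, encoding="utf-8")
        saved.append(path)
    return saved
```

With a suitable `my_search` function, a call might look like `build_corpus(["iphone", "Apple products", "MacBook"], my_search, "corpus")`, after which the txt files in the `corpus` folder could be loaded into openNLP. A tool that already does this, with proper boilerplate removal, is what I am looking for.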
