Dear Community Experts!

To start text mining, we need a corpus.



Have any of you come across open source solutions that can do the
following tasks:



1. A researcher enters a few keywords into the program, for example
"iphone", "Apple products", "MacBook", and restricts the results to a
time period of one week.

2. The program goes to Google and searches for these keywords.

3. It creates a list of the first 200 URLs returned for these queries.

4. It downloads the web pages behind these results as txt files,
cleaning up the trash such as advertisements.

The researcher can then work with the results in OpenNLP or another text
mining program.
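
To make step 4 more concrete, here is a rough sketch of what I have in mind, using only the Python standard library. It assumes the list of URLs from steps 2 and 3 is already available from some search tool, which the sketch does not cover. The names html_to_text, save_pages and SKIP_TAGS are made up for illustration, and the set of tags treated as "trash" is only a guess; a real boilerplate remover would need to be smarter than this.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Tags whose content is treated as "trash" (an assumption, not a standard list).
SKIP_TAGS = {"script", "style", "noscript", "iframe", "nav", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0  # > 0 while we are inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def html_to_text(html):
    """Return the text fragments of an HTML string, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def save_pages(urls, prefix="page"):
    """Download each URL and write the cleaned text to prefix_NNN.txt."""
    for i, url in enumerate(urls, start=1):
        req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urlopen(req, timeout=30) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        with open(f"{prefix}_{i:03d}.txt", "w", encoding="utf-8") as out:
            out.write(html_to_text(html))
```

Calling save_pages with the collected URLs would then leave one txt file per page, ready to be loaded into a text mining program.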



Thank you for your advice in case of a spare minute!

All the best in what you do,

Ivan







