building corpus fast - searching for advice from experts

Ivan VAGANOV Mon, 28 Jul 2014 12:07:34 -0700

Dear Community Experts!

To start text mining, we need a corpus.




Has any of you come across an open source solution that can do the
following tasks:



1. A researcher enters a few keywords into the program, for example
"iphone", "Apple products", "MacBook", and restricts the results to a
time period of one week.

2. The program goes to Google and searches for these keywords.

3. It creates a list of the first 200 URLs returned for these queries.

4. It downloads the web pages behind these URLs as txt files,
cleaning out the trash such as advertisements.

The researcher can then work with the results in openNLP or another text
mining program.
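
To make steps 3 and 4 concrete, here is a rough sketch in Python of the
kind of thing I have in mind, using only the standard library. The names
html_to_text and save_pages are made up for illustration, the list of URLs
is assumed to come from a separate search step that is not shown, and
dropping script, style, nav, aside and footer elements is only a crude
stand-in for real advertisement removal.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import Request, urlopen


class VisibleText(HTMLParser):
    """Collect text that is not inside elements that rarely hold content."""

    # Assumption: these tags are treated as "trash"; real ad removal needs more.
    SKIP = {"script", "style", "noscript", "nav", "aside", "footer"}

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many skipped elements we are currently inside
        self.chunks = []    # text fragments kept so far

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth == 0 and text:
            self.chunks.append(text)


def html_to_text(html):
    """Return the kept text of an HTML page, one fragment per line."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def save_pages(urls, out_dir, limit=200):
    """Download up to `limit` URLs and store each one as a plain txt file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, url in enumerate(urls[:limit]):
        request = Request(url, headers={"User-Agent": "corpus-builder-sketch"})
        with urlopen(request, timeout=30) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            html = response.read().decode(charset, errors="replace")
        (out / f"page_{i:03d}.txt").write_text(html_to_text(html), encoding="utf-8")
```

Calling save_pages(list_of_urls, "corpus") would then leave one txt file
per page in the "corpus" directory, ready to be loaded into a text mining
program. Error handling for pages that fail to download is left out.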



Thank you for any advice, if you have a spare minute!

All the best in what you do,

Ivan







