Re: AW: Working with very large text documents

Thilo Goetz Fri, 18 Oct 2013 08:10:28 -0700

Don't you have a Hadoop cluster you can use? Hadoop would handle the file splitting for you, and if your UIMA analysis is well-behaved, you can deploy it as an M/R job, one record at a time.
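
To illustrate the "one record at a time" idea without a cluster, here is a minimal sketch in Python. It assumes each line of the log file is an independent record. The names `iter_records`, `process_log` and the `analyze` callback are hypothetical stand-ins for whatever per-record analysis is run; they are not part of UIMA or Hadoop.

```python
import io
from typing import Callable, Iterator, TextIO


def iter_records(stream: TextIO) -> Iterator[str]:
    """Yield one non-empty log line at a time without loading the whole file."""
    for line in stream:
        record = line.rstrip("\r\n")
        if record:
            yield record


def process_log(stream: TextIO, analyze: Callable[[str], None]) -> int:
    """Run `analyze` (a hypothetical per-record analysis) on every record.

    Returns the number of records processed. Memory use stays bounded by the
    longest single line, not by the size of the file.
    """
    count = 0
    for record in iter_records(stream):
        analyze(record)
        count += 1
    return count
```

With a real file this could be called as `process_log(open(path, encoding="utf-8"), analyze)`. It only works if the analysis does not need context from other records, which is the same "well-behaved" condition an M/R deployment would impose.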


--Thilo


On 10/18/2013 12:25 PM, armin.weg...@bka.bund.de wrote:

Hi Jens,

It's a log file.

Cheers,
Armin

-----Original Message-----
From: Jens Grivolla [mailto:j+...@grivolla.net]
Sent: Friday, 18 October 2013 11:05
To: user@uima.apache.org
Subject: Re: Working with very large text documents

On 10/18/2013 10:06 AM, Armin Wegner wrote:

What do you do with very large text documents in a UIMA pipeline, for
example 9 GB in size?


Just out of curiosity, how can you possibly have 9GB of text that represents one
document? From a quick look at Project Gutenberg it seems that a full book with
HTML markup is about 500kB to 1MB, so 9GB is roughly a complete public library
full of books.

Bye,
Jens
