Ciao Mark S.

I just looked at the Gutenberg Complete Shakespeare 
http://www.gutenberg.org/cache/epub/100/pg100.txt

I see your point. It has bizarre stuff in it. It's a mess. It is not well 
laid out either.

I think what is needed is a regex PRE-process that samples each text to 
check whether it rigorously follows a defined standard. That is one idea. 
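
A rough sketch of that sampling idea, in Python. It assumes the "defined 
standard" is a KJV-style id such as 1:1 at the start of each block, with 
blocks separated by blank lines. The pattern and the function names are 
illustrative assumptions, not something checked against the Gutenberg files.

```python
import random
import re

# ASSUMPTION: the "defined standard" is a chapter:verse id opening each block,
# e.g. "1:1 In the beginning ...". Swap in whatever pattern the source uses.
VERSE_ID = re.compile(r"^\d+:\d+\s+\S")


def split_blocks(text):
    """Split text into non-empty blocks separated by one or more blank lines."""
    return [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]


def conformance(text, pattern=VERSE_ID, sample_size=200, seed=0):
    """Return the fraction of randomly sampled blocks that match `pattern`."""
    blocks = split_blocks(text)
    if not blocks:
        return 0.0
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    sample = rng.sample(blocks, min(sample_size, len(blocks)))
    hits = sum(1 for block in sample if pattern.match(block))
    return hits / len(sample)
```

A text scoring close to 1.0 would be a candidate for automatic conversion. 
One that scores low could be set aside without anyone having to open it.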

I think the idea that texts could be converted to JSON format for TW import 
still holds, but clearly some kind of quality control is needed. BUT if it 
ends up having to be done document by document, it's pointless.
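
To make the JSON side concrete, here is a minimal Python sketch. It assumes 
KJV-style blocks that each start with a chapter:verse id, and it builds a 
list of tiddler objects with title, text and tags fields for TW import. The 
function names and the title scheme ("Genesis 1:1") are my own assumptions. 
Blocks that do not match are collected rather than dropped, which gives a 
crude quality-control count per document.

```python
import json
import re

# ASSUMPTION: each block looks like "1:1 verse text ..." (possibly wrapped
# over several lines). Anything else is treated as a reject.
VERSE = re.compile(r"^(\d+):(\d+)\s+(.*)", re.DOTALL)


def to_tiddlers(text, book):
    """Return (tiddlers, rejects) for blank-line-separated blocks of `text`."""
    tiddlers, rejects = [], []
    for block in re.split(r"\n\s*\n", text):
        block = block.strip()
        if not block:
            continue
        match = VERSE.match(block)
        if match:
            chapter, verse, body = match.groups()
            tiddlers.append({
                "title": f"{book} {chapter}:{verse}",
                "text": " ".join(body.split()),  # undo hard line wraps
                "tags": f"[[{book}]]",           # brackets keep multi-word tags intact
            })
        else:
            rejects.append(block)
    return tiddlers, rejects


def dump_tiddlers(tiddlers):
    """Serialise the tiddler list as a JSON array."""
    return json.dumps(tiddlers, indent=2, ensure_ascii=False)
```

If the rejects list is short, the output can be imported as is. If it is 
long, the text probably needs hand-fixing and is not worth batch converting.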

Josiah

On Monday, 6 June 2016 19:27:18 UTC+2, Mark S. wrote:
>
> The Gutenberg KJV text is almost ideally suited for conversion with regex 
> of some type -- every verse has its own id in exactly the same format.
>
> Looking at Shakespeare texts, on the other hand, at least in my quick 
> sampling, the formatting is very scatter-shot. It appears that the original 
> documents were scanned, but the output was never corrected. In some cases 
> entire conversations might be run together like a paragraph. So you not only 
> have to import the data, but you have to do original hand-formatting and 
> fixing. If you wanted it broken down line-by-line, you would have to apply 
> your own numbering system as well.
>
> Mark 
>
> On Monday, June 6, 2016 at 9:57:48 AM UTC-7, Josiah wrote:
>>
>> Marc & all
>>
>> I really like this thread.
>>
>> It's dealing with CONTENT.
>>
>> It seems to me that MASS conversion of Gutenberg texts into a reliable 
>> TiddlyWiki-importable JSON file (using regex, or better, a full-featured 
>> Grep engine) is not beyond reach. In fact, it is very close.
>>
>> It's interesting to think that through further, IMO.
>>
>> Josiah
>>
>>
>> On Friday, 26 February 2016 00:54:15 UTC+1, Marc wrote:
>>>
>>> I am trying to make a simple scripture using tiddly wiki 5 so people can 
>>> use it to add notes and reflections. 
>>>
>>> I would sure like to see some examples that might be out there to see 
>>> how others are doing similar projects. 
>>>
>>> There is such great knowledge among this group. 
>>>
>>> Thanks 
>>>
>>
>>
