Re: [tw5] Re: Tiddlywiki and regexp

TonyM Thu, 22 Aug 2019 07:58:19 -0700

Mark - Wow,

I will test it out tomorrow to see how far I can take it.


I hope it works for multi-line tags.

I would also be interested in the option to return either
<li>line 3</li>
<li>line 2</li>
<li>line 1</li>
or
line 3
line 2
line 1
because keeping the valid tags can be useful as well.
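
To make the two output forms concrete, here is a rough sketch of the idea in 
Python rather than TiddlyWiki wikitext. The helper name extract_items and the 
keep_tags flag are made up for illustration, and a regex like this only copes 
with simple, non-nested <li> elements.

```python
import re

# Hypothetical helper, not part of TiddlyWiki: pulls simple, non-nested
# <li>...</li> items out of a block of text. DOTALL lets an item span lines.
LI_PATTERN = re.compile(r"<li>(.*?)</li>", re.DOTALL)

def extract_items(text, keep_tags=False):
    """Return each list item, with or without its surrounding <li> tags."""
    if keep_tags:
        return [m.group(0) for m in LI_PATTERN.finditer(text)]
    return [m.group(1).strip() for m in LI_PATTERN.finditer(text)]

sample = """More text here
<li>line 3</li>
<li>line 2</li>
<li>line 1</li>
More text there"""

print(extract_items(sample))                  # bare item text
print(extract_items(sample, keep_tags=True))  # items with their tags kept
```

The non-greedy (.*?) is what stops each match at the nearest closing tag 
instead of swallowing everything up to the last </li>.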

I would also like to see how to handle a list tag that has a style, e.g. 
<li style="something">. It would be nice if we could return
<li style="something">line 1</li>
or
line 1
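
A possible way to allow for attributes, again sketched in Python and not in 
wikitext: let the pattern accept anything up to the closing ">" of the opening 
tag. This assumes attribute values never contain a literal ">" character, and 
the name extract_styled_items is hypothetical.

```python
import re

# Assumption: attribute values never contain ">". The \b keeps the pattern
# from matching other tags that merely start with "li", such as <link>.
LI_WITH_ATTRS = re.compile(r"<li\b([^>]*)>(.*?)</li>", re.DOTALL)

def extract_styled_items(text, keep_tags=False):
    """Return <li> items whether or not the opening tag carries attributes."""
    results = []
    for match in LI_WITH_ATTRS.finditer(text):
        # group(0) is the whole element, group(2) is just the inner text
        results.append(match.group(0) if keep_tags else match.group(2).strip())
    return results

styled = '<li style="something">line 1</li>'

print(extract_styled_items(styled))
print(extract_styled_items(styled, keep_tags=True))
```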

If so, a lot could be done to extract useful content from HTML, even if just 
to summarise some content.

Perhaps finer resolution would help, like <section 
name=extract>content</section>

Or extract list items.
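
The named-section idea could look something like this, sketched in Python 
with a hypothetical extract_section helper. It accepts the name with or 
without quotes and assumes sections are not nested.

```python
import re

def extract_section(text, name):
    """Return the contents of every <section name=...> with the given name."""
    # re.escape guards against regex metacharacters in the section name.
    pattern = re.compile(
        r"<section\s+name=[\"']?" + re.escape(name) + r"[\"']?\s*>(.*?)</section>",
        re.DOTALL,
    )
    return [m.group(1).strip() for m in pattern.finditer(text)]

doc = "Intro\n<section name=extract>content</section>\nOutro"

print(extract_section(doc, "extract"))
```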

Even when the source is not HTML, a tiddler's text field could use HTML block 
and inline elements (https://www.w3schools.com/html/html_blocks.asp) to 
structure the content. Such a regex macro could then extract parts of the 
tiddler text, such as a prepared extract, an excerpt, config settings, or 
more.

Regards
Tony


On Friday, August 23, 2019 at 12:22:47 AM UTC+10, Mark S. wrote:
>
>
> There's that saying, "When all you have is a hammer, everything starts to 
> look like a nail."
>
> All we have is regex. It would be great to have some other tool for 
> extracting actual DOM-like structures the way you
> could with TW classic. But we don't have it.
>
> Actually, the tool we have for regexp is also a bit lacking. There's no 
> tool for directly lifting desired target text. The new splitregexp only 
> splits, it doesn't 
> return the text we want to find. Here's my version that does most 
> literally what you ask for
>
> <$vars realchars="[^\s]+">
> <$list filter="[{test}splitregexp[\n]join[ ]splitregexp[<li>]butfirst[1]splitregexp[</li>]butlast[1]regexp<realchars>]">
>
> </$list>
> </$vars>
>
> Input:
>
> More text here
> <li>line 3</li>
> <li>line 2</li>
> <li>line 1</li>
> More text there
>
> Output
>
>
> line 3 <https://tiddlywiki.com/#line%203>
> line 2 <https://tiddlywiki.com/#line%202>
> line 1 <https://tiddlywiki.com/#line%201>
>
>
>
> Good luck!
>
> On Thursday, August 22, 2019 at 2:21:34 AM UTC-7, TonyM wrote:
>>
>> Jeremy,
>>
>> You are aware I do not want so much to parse it as locate the content 
>> between matching tags.
>>
>> Its intention is to access content delimited by html tags inside the text 
>> content.
>>
>> Perhaps we could use it to retrieve items between the section div tags or 
>> all instances of text between the li tags.
>>
>> Regards
>> Tony
>>
>>
