Re: common ways to run regex against either Hickory HTML or zippers?

2022-02-03 Thread Laws
Thank you, everyone. On Wednesday, February 2, 2022 at 3:22:53 PM UTC-5 lawrence...@gmail.com wrote: > Assume I've been cursed to scrape HTML. If I convert the pages to Hickory > I end up with a big mass of data which, sadly, lacks many "class" or "id"s > that would let me easily pick out the

Re: common ways to run regex against either Hickory HTML or zippers?

2022-02-02 Thread Harold
I would use enlive for this. - https://github.com/cgrand/enlive `re-pred` seems relevant: https://cljdoc.org/d/enlive/enlive/1.1.6/api/net.cgrand.enlive-html#re-pred Here's someone doing something similar a while ago: https://stackoverflow.com/questions/18604049/clojure-enlive-a-selector-tha

Re: common ways to run regex against either Hickory HTML or zippers?

2022-02-02 Thread Cora Sutton
If all you're looking for is the format CVE--N then by all means just use regex against the plain text of the page. If you need to do dom traversal then jsoup is a good choice. Otherwise, like Mark said, tree-seq is a great choice if you don't want to play with clojure.walk. On Wed, Feb 2,
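Here is a minimal Clojure sketch of the plain-text route Cora describes. It assumes the page text is already one string; how you get it (jsoup, Hickory, etc.) is left out. It also assumes IDs have the usual shape: `CVE-`, a 4-digit year, `-`, then 4 or more digits. `cve-re` and `find-cve-ids` are hypothetical names, and the sample string is made up.

```clojure
;; Assumed ID shape: "CVE-", a 4-digit year, "-", then 4 or more digits.
(def cve-re #"CVE-\d{4}-\d{4,}")

(defn find-cve-ids
  "Distinct CVE-looking IDs in `text`, in order of first appearance."
  [text]
  ;; re-seq returns every non-overlapping match (nil if none);
  ;; distinct drops repeats but keeps first-seen order.
  (distinct (re-seq cve-re text)))

;; Made-up sample text:
(find-cve-ids "Fixed: CVE-2022-0001, CVE-2022-0002. See also CVE-2022-0001.")
;; => ("CVE-2022-0001" "CVE-2022-0002")
```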

Re: common ways to run regex against either Hickory HTML or zippers?

2022-02-02 Thread Mark Nutter
I don't know how common it is, but have you looked at the `tree-seq` function in Clojure? This seems like a good use case for it. Mark On Wed, Feb 2, 2022 at 3:22 PM lawrence...@gmail.com < lawrence.krub...@gmail.com> wrote: > Assume I've been cursed to scrape HTML. If I convert the pages to Hic
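Here is a sketch of the tree-seq suggestion. It assumes the page has already been converted to Hickory's map format, where each node is a map with a `:type` (`:document`, `:element`, `:comment` and so on) and its children or text strings sit under `:content`. Only document and element nodes are treated as branches, so comment text is skipped. The sample tree, `cve-re`, and the function names are illustrative and not part of Hickory.

```clojure
;; Assumed ID shape: "CVE-", a 4-digit year, "-", then 4 or more digits.
(def cve-re #"CVE-\d{4}-\d{4,}")

(defn hickory-branch?
  "Document and element maps have children under :content; everything else
  (text strings, comments, doctypes) is treated as a leaf."
  [node]
  (and (map? node) (contains? #{:document :element} (:type node))))

(defn text-nodes
  "Text strings of a Hickory-format tree, in depth-first (document) order."
  [tree]
  (filter string? (tree-seq hickory-branch? :content tree)))

(defn find-cve-ids-in-tree
  "Distinct CVE-looking IDs found in the tree's text nodes."
  [tree]
  (->> (text-nodes tree)
       (mapcat #(re-seq cve-re %)) ; re-seq yields nil for non-matching strings
       distinct))

;; Made-up Hickory-format fragment.
(def sample
  {:type :document
   :content [{:type :element :tag :div :attrs nil
              :content ["Advisory "
                        {:type :element :tag :span :attrs nil
                         :content ["CVE-2022-0001"]}
                        " and CVE-2022-0002"]}]})

(find-cve-ids-in-tree sample)
;; => ("CVE-2022-0001" "CVE-2022-0002")
```

Each text leaf is searched separately, so an ID split across tags (for example `CVE-2022-` followed by `<b>0001</b>`) is missed, while a regex over the concatenated page text would catch it. IDs that appear only in attributes such as `href` are not searched either.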

common ways to run regex against either Hickory HTML or zippers?

2022-02-02 Thread lawrence...@gmail.com
Assume I've been cursed to scrape HTML. If I convert the pages to Hickory I end up with a big mass of data which, sadly, lacks many "class" or "id"s that would let me easily pick out the data I need. However, for the most part, the only thing I really need off this page is the CVEs, which look
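Because the question also asks about zippers, here is a sketch using `clojure.zip` over the same Hickory map format. Hickory ships its own constructor (`hickory.zip/hickory-zip`). This version builds a similar one with `zip/zipper` so the snippet has no dependencies, and it is an assumption, not Hickory's code. Compared with tree-seq, each match keeps its zipper location, so you can still reach the surrounding markup even without any class or id. Here it collects the text of the table cells to the right of the cell that holds the ID. The table layout, the function names, and the sample data are all hypothetical.

```clojure
(require '[clojure.zip :as zip])

;; Assumed ID shape: "CVE-", a 4-digit year, "-", then 4 or more digits.
(def cve-re #"CVE-\d{4}-\d{4,}")

(defn hickory-zipper
  "Zipper over Hickory-format maps. Document and element nodes are branches;
  strings (and comment/doctype maps) are leaves."
  [root]
  (zip/zipper (fn [n] (and (map? n) (contains? #{:document :element} (:type n))))
              (comp seq :content)
              (fn [node children] (assoc node :content (vec children)))
              root))

(defn cve-locs
  "Visits every node with zip/next and returns [id loc] pairs, one per
  CVE-looking match in a text node; loc is that text node's location."
  [root]
  (loop [loc (hickory-zipper root)
         acc []]
    (if (zip/end? loc)
      acc
      (let [node (zip/node loc)
            hits (when (string? node) (re-seq cve-re node))]
        (recur (zip/next loc)
               (into acc (map (fn [id] [id loc])) hits))))))

(defn cve-with-next-cells
  "For each match, moves up to the element holding the text (e.g. a <td>) and
  collects the text content of its right-hand sibling elements."
  [root]
  (for [[id loc] (cve-locs root)
        :let [cell (zip/up loc)]]
    [id (->> (zip/rights cell)
             (mapcat :content)   ; text siblings have no :content, so they drop out
             (filter string?))]))

;; Made-up page fragment with no class or id attributes.
(def sample-page
  {:type :document
   :content [{:type :element :tag :table :attrs nil
              :content [{:type :element :tag :tr :attrs nil
                         :content [{:type :element :tag :td :attrs nil
                                    :content ["CVE-2022-0001"]}
                                   {:type :element :tag :td :attrs nil
                                    :content ["Buffer overflow"]}]}]}]})

(cve-with-next-cells sample-page)
;; => (["CVE-2022-0001" ("Buffer overflow")])
```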