Thank you, everyone.
On Wednesday, February 2, 2022 at 3:22:53 PM UTC-5 lawrence...@gmail.com
wrote:
> Assume I've been cursed to scrape HTML. If I convert the pages to Hickory
> I end up with a big mass of data which, sadly, lacks many "class" or "id"s
> that would let me easily pick out the
I would use enlive for this.
- https://github.com/cgrand/enlive
`re-pred` seems relevant:
https://cljdoc.org/d/enlive/enlive/1.1.6/api/net.cgrand.enlive-html#re-pred
Here's someone doing something similar a while ago:
https://stackoverflow.com/questions/18604049/clojure-enlive-a-selector-tha
If all you're looking for is the format CVE--N, then by all means
just run a regex against the plain text of the page. If you need to do DOM
traversal, then jsoup is a good choice. Otherwise, like Mark said, tree-seq
is a great choice if you don't want to play with clojure.walk.
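
For the plain-text route, a minimal sketch might look like the following. It assumes the identifiers follow the usual CVE-YYYY-NNNN shape (a four-digit year, then four or more digits); the pattern, the `extract-cves` name, and the sample string are illustrative and not taken from the page in question.

```clojure
;; Assumption: CVE ids look like CVE-<4-digit year>-<4 or more digits>.
(def cve-pattern #"CVE-\d{4}-\d{4,}")

(defn extract-cves
  "Returns the distinct CVE ids found in `text`, in order of first appearance.
  `text` can be the raw HTML or the extracted page text."
  [text]
  (distinct (re-seq cve-pattern text)))

;; Hypothetical sample input, for illustration only.
(def sample-html
  "<td>CVE-2021-44228</td><p>See also CVE-2022-0001 and CVE-2021-44228.</p>")

(extract-cves sample-html)
```

Since the pattern has no capture groups, `re-seq` returns plain strings, so no DOM parsing is needed at all when the ids are all you want.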
On Wed, Feb 2,
I don't know how common it is, but have you looked at the `tree-seq`
function in Clojure? This seems like a good use case for it.
Mark
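
A rough sketch of the `tree-seq` idea, assuming the Hickory representation is nested maps whose children live under `:content`, with text nodes as plain strings. The `page` value, the helper names, and the CVE pattern are made up for illustration.

```clojure
;; Hypothetical Hickory-style tree; a real page would come from the parser.
(def page
  {:type :element :tag :div :attrs nil
   :content ["Advisory "
             {:type :element :tag :span :attrs nil
              :content ["CVE-2021-44228"]}
             {:type :element :tag :p :attrs nil
              :content ["no match here"]}]})

(defn text-nodes
  "Walks the tree depth-first and keeps only the string (text) nodes.
  Maps are treated as branches; their children are read from :content."
  [tree]
  (filter string? (tree-seq map? :content tree)))

(defn cves-in-tree
  "Collects every CVE-looking id from the text nodes of the tree.
  Assumption: ids have the shape CVE-<4-digit year>-<4 or more digits>."
  [tree]
  (mapcat #(re-seq #"CVE-\d{4}-\d{4,}" %) (text-nodes tree)))

(cves-in-tree page)
```

This flattens the whole document into one lazy sequence, so missing "class" or "id" attributes stop mattering; you filter on node contents instead.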
On Wed, Feb 2, 2022 at 3:22 PM lawrence...@gmail.com <
lawrence.krub...@gmail.com> wrote:
Assume I've been cursed to scrape HTML. If I convert the pages to Hickory I
end up with a big mass of data which, sadly, lacks many "class" or "id"s
that would let me easily pick out the data I need. However, for the most
part, the only thing I really need off this page is the CVEs, which look