I’m following this thread for my talk at COSCON, in which I plan to cover
the Common Crawl crawler.
Who is in charge of where the crawler is pointed and how would one ask for
additional URLs?
Regards,
Dave
> On Oct 8, 2018, at 10:31 PM, Dominik Stadler wrote:
>
> Hi Andi,
>
> I hav
Hi Andi,
I have now run the CommonCrawlDownload tool on crawl 2018-30. Only 144
files matched by extension; I have collected them at
https://www.dropbox.com/s/w3sxnb5l3er3kdq/downloadEMF.zip?dl=0. However, many
of them are actually HTML, mostly redirects.
5hwaterwiki2011.wikispaces.com_file_lin
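Since many of the files that matched by extension turned out to be HTML, one way to weed them out is to look at the file header instead of the name. The following is only a rough sketch, not part of the CommonCrawlDownload tool: the class name EmfSniffer and both method names are made up for illustration. It relies on the EMF header layout (first record type is EMR_HEADER = 1, and the signature " EMF", 0x464D4520 in little-endian, sits at byte offset 40). The HTML check is a crude heuristic and an assumption on my side.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of POI or the CommonCrawl download tool.
final class EmfSniffer {
    // An EMF file starts with an EMR_HEADER record (type 1) ...
    private static final int EMR_HEADER = 1;
    // ... and carries the signature " EMF" as a little-endian int at offset 40.
    private static final int EMF_SIGNATURE = 0x464D4520;
    private static final int SIGNATURE_OFFSET = 40;

    private EmfSniffer() {
    }

    // Returns true if the first bytes look like an EMF header.
    static boolean looksLikeEmf(byte[] data) {
        if (data == null || data.length < SIGNATURE_OFFSET + 4) {
            return false;
        }
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        return buf.getInt(0) == EMR_HEADER
                && buf.getInt(SIGNATURE_OFFSET) == EMF_SIGNATURE;
    }

    // Crude heuristic: look for an HTML marker in the first 512 bytes.
    static boolean looksLikeHtml(byte[] data) {
        if (data == null) {
            return false;
        }
        int len = Math.min(data.length, 512);
        String head = new String(data, 0, len, StandardCharsets.ISO_8859_1)
                .toLowerCase(Locale.ROOT);
        return head.contains("<!doctype html") || head.contains("<html");
    }
}
```

Reading the first 44 bytes of each downloaded file and passing them to looksLikeEmf should be enough to separate real EMFs from redirect pages; looksLikeHtml needs a somewhat longer prefix and is only useful for reporting what the rejected files were.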
https://bz.apache.org/bugzilla/show_bug.cgi?id=62807
PJ Fanning changed:
What|Removed |Added
OS||All
Resolution|---
At some point I extracted all EMFs from our corpus. I’ll see if that data
is still around and/or re-extract it. I’ll probably have time tomorrow or
Wednesday.
On Sun, Oct 7, 2018 at 5:01 PM Dominik Stadler
wrote:
> Hi Andi
>
> It is easy to change CommonCrawlDocumentDownload to fetch other mime-types,
> see
>
https://bz.apache.org/bugzilla/show_bug.cgi?id=62807
Bug ID: 62807
Summary: XSLFTableCell#removeBorder(BorderEdge.right) removes
the bottom edge not the right edge.
Product: POI
Version: unspecified
Hardware: PC