Hi Eric,

Unfortunately, on Windows you also need to download and install
winutils.exe and hadoop.dll, see

  https://github.com/cdarlint/winutils

and

  https://stackoverflow.com/questions/41851066/exception-in-thread-main-java-lang-unsatisfiedlinkerror-org-apache-hadoop-io
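
A rough sketch of that setup, assuming a Bash-style shell on Windows (Git Bash or Cygwin, as needed for bin/nutch anyway). The location $HOME/hadoop is only a placeholder, and the sketch assumes winutils.exe and hadoop.dll are downloaded by hand, in a build that matches the Hadoop jars in Nutch's lib directory. Hadoop looks for winutils.exe under HADOOP_HOME/bin, and the JVM can find hadoop.dll if that directory is on PATH. The script only prepares the directory and reports what is still missing:

```shell
#!/usr/bin/env bash
# Sketch only: the default HADOOP_HOME below is a hypothetical location, adjust as needed.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop}"
export HADOOP_HOME
mkdir -p "$HADOOP_HOME/bin"

# winutils.exe and hadoop.dll must be copied into $HADOOP_HOME/bin manually;
# this loop only checks whether they are already there.
missing=0
for f in winutils.exe hadoop.dll; do
  if [ -f "$HADOOP_HOME/bin/$f" ]; then
    echo "found:   $HADOOP_HOME/bin/$f"
  else
    echo "missing: $HADOOP_HOME/bin/$f"
    missing=$((missing + 1))
  fi
done

# Put the directory on PATH so the JVM can load hadoop.dll from it.
export PATH="$HADOOP_HOME/bin:$PATH"
echo "missing files: $missing"
```

Once both files are reported as found, rerun the bin/nutch command from the same shell so that it inherits HADOOP_HOME and PATH.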

Installing Hadoop separately is not mandatory - the Nutch binary package
already includes the Hadoop jar files.

Alternatively, you may prefer to run Nutch on Linux, where no additional
installations are required.

Best,
Sebastian

On 5/15/23 04:07, Eric Valencia wrote:
Hello everyone,

So, I set up Nutch 1.19, Solr 8.11.2, and Hadoop 3.3.5, to the best of my
knowledge.

Afterwards, I went into the Nutch directory and ran this command:
*bin/nutch generate crawl/crawldb crawl/segments*

Then, I got an error:
*Exception in thread "main" java.lang.UnsatisfiedLinkError: 'boolean org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(java.lang.String, int)'*

Does anyone know how to solve this problem?

Below is the full output:
$ bin/nutch generate crawl/crawldb crawl/segments
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/Users/User/Desktop/wiki/a/ApacheNutch/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/Users/User/Desktop/wiki/a/ApacheNutch/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2023-05-14 19:01:16,433 INFO o.a.n.p.PluginManifestParser [main] Plugins: looking in: C:\Users\User\Desktop\wiki\a\ApacheNutch\apache-nutch-1.19\plugins
2023-05-14 19:01:16,558 INFO o.a.n.p.PluginRepository [main] Plugin Auto-activation mode: [true]
2023-05-14 19:01:16,558 INFO o.a.n.p.PluginRepository [main] Registered Plugins:
2023-05-14 19:01:16,558 INFO o.a.n.p.PluginRepository [main]    Regex URL Filter (urlfilter-regex)
2023-05-14 19:01:16,558 INFO o.a.n.p.PluginRepository [main]    Html Parse Plug-in (parse-html)
2023-05-14 19:01:16,558 INFO o.a.n.p.PluginRepository [main]    HTTP Framework (lib-http)
2023-05-14 19:01:16,559 INFO o.a.n.p.PluginRepository [main]    the nutch core extension points (nutch-extensionpoints)
2023-05-14 19:01:16,559 INFO o.a.n.p.PluginRepository [main]    Basic Indexing Filter (index-basic)
2023-05-14 19:01:16,559 INFO o.a.n.p.PluginRepository [main]    Anchor Indexing Filter (index-anchor)
2023-05-14 19:01:16,559 INFO o.a.n.p.PluginRepository [main]    Tika Parser Plug-in (parse-tika)
2023-05-14 19:01:16,559 INFO o.a.n.p.PluginRepository [main]    Basic URL Normalizer (urlnormalizer-basic)
2023-05-14 19:01:16,559 INFO o.a.n.p.PluginRepository [main]    Regex URL Filter Framework (lib-regex-filter)
2023-05-14 19:01:16,559 INFO o.a.n.p.PluginRepository [main]    Regex URL Normalizer (urlnormalizer-regex)
2023-05-14 19:01:16,559 INFO o.a.n.p.PluginRepository [main]    URL Validator (urlfilter-validator)
2023-05-14 19:01:16,559 INFO o.a.n.p.PluginRepository [main]    CyberNeko HTML Parser (lib-nekohtml)
2023-05-14 19:01:16,559 INFO o.a.n.p.PluginRepository [main]    OPIC Scoring Plug-in (scoring-opic)
2023-05-14 19:01:16,559 INFO o.a.n.p.PluginRepository [main]    Pass-through URL Normalizer (urlnormalizer-pass)
2023-05-14 19:01:16,559 INFO o.a.n.p.PluginRepository [main]    Http Protocol Plug-in (protocol-http)
2023-05-14 19:01:16,559 INFO o.a.n.p.PluginRepository [main]    SolrIndexWriter (indexer-solr)
2023-05-14 19:01:16,559 INFO o.a.n.p.PluginRepository [main] Registered Extension-Points:
2023-05-14 19:01:16,559 INFO o.a.n.p.PluginRepository [main]     (Nutch Content Parser)
2023-05-14 19:01:16,560 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Filter)
2023-05-14 19:01:16,560 INFO o.a.n.p.PluginRepository [main]     (HTML Parse Filter)
2023-05-14 19:01:16,560 INFO o.a.n.p.PluginRepository [main]     (Nutch Scoring)
2023-05-14 19:01:16,560 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Normalizer)
2023-05-14 19:01:16,560 INFO o.a.n.p.PluginRepository [main]     (Nutch Publisher)
2023-05-14 19:01:16,560 INFO o.a.n.p.PluginRepository [main]     (Nutch Exchange)
2023-05-14 19:01:16,560 INFO o.a.n.p.PluginRepository [main]     (Nutch Protocol)
2023-05-14 19:01:16,560 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Ignore Exemption Filter)
2023-05-14 19:01:16,560 INFO o.a.n.p.PluginRepository [main]     (Nutch Index Writer)
2023-05-14 19:01:16,560 INFO o.a.n.p.PluginRepository [main]     (Nutch Segment Merge Filter)
2023-05-14 19:01:16,560 INFO o.a.n.p.PluginRepository [main]     (Nutch Indexing Filter)
2023-05-14 19:01:16,969 INFO o.a.n.c.Generator [main] Generator: starting at 2023-05-14 19:01:16
2023-05-14 19:01:16,969 INFO o.a.n.c.Generator [main] Generator: Selecting best-scoring urls due for fetch.
2023-05-14 19:01:16,969 INFO o.a.n.c.Generator [main] Generator: filtering: true
2023-05-14 19:01:16,969 INFO o.a.n.c.Generator [main] Generator: normalizing: true
2023-05-14 19:01:16,974 INFO o.a.n.c.Generator [main] Generator: running in local mode, generating exactly one partition.
Exception in thread "main" java.lang.UnsatisfiedLinkError: 'boolean org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(java.lang.String, int)'
         at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
         at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:793)
         at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:1249)
         at org.apache.hadoop.fs.FileUtil.list(FileUtil.java:1454)
         at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:601)
         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1972)
         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:2014)
         at org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:2180)
         at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2179)
         at org.apache.hadoop.fs.ChecksumFileSystem.listLocatedStatus(ChecksumFileSystem.java:783)
         at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:320)
         at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:279)
         at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
         at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:404)
         at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:310)
         at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:327)
         at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
         at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1571)
         at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1568)
         at java.base/java.security.AccessController.doPrivileged(AccessController.java:714)
         at java.base/javax.security.auth.Subject.doAs(Subject.java:525)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
         at org.apache.hadoop.mapreduce.Job.submit(Job.java:1568)
         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1589)
         at org.apache.nutch.crawl.Generator.generate(Generator.java:895)
         at org.apache.nutch.crawl.Generator.run(Generator.java:1119)
         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:81)
         at org.apache.nutch.crawl.Generator.main(Generator.java:1066)
