Thanks, Sebastian. I will try it, and if it works, I will merge the fixes you guys put out.
On Tue, Jan 17, 2023 at 4:02 AM Sebastian Nagel <wastl.na...@googlemail.com> wrote:
> Hi Kamil,
>
> after some trials I came up with a different solution for the issue with
> the "unparseable date", see
>
> https://github.com/apache/nutch/pull/752
>
> The workaround of providing a pattern fails reproducibly in certain
> locales, see the comments in
>
> https://issues.apache.org/jira/browse/NUTCH-2974
>
> Just in case you want to try it.
>
> ~Sebastian
>
> On 11/21/22 10:36, Sebastian Nagel wrote:
> > Hi Kamil,
> >
> > thanks for trying and finding a solution! I've opened a JIRA issue to
> > track the problem: https://issues.apache.org/jira/browse/NUTCH-2974
> >
> > Thanks!
> >
> > Sebastian
> >
> > On 11/19/22 18:37, Kamil Mroczek wrote:
> >> I've been able to work around this issue by adding a "pattern" attribute
> >> to the touch tag on line 101 of build.xml, like so:
> >>
> >> <touch datetime="01/25/1971 2:00 pm" pattern="MM/dd/YYYY hh:mm a">
> >>
> >> On Fri, Nov 18, 2022 at 12:32 PM Kamil Mroczek <kamil@elio.earth> wrote:
> >>
> >>> Hello,
> >>>
> >>> When I run the "ant runtime" command, I get:
> >>>
> >>> /home/hadoop/apache-nutch/build.xml:101: Unparseable date: "01/25/1971 2:00 pm"
> >>>
> >>> I've tried different date formats to no avail. There was a similar issue
> >>> that was fixed in version 1.19, NUTCH-2512
> >>> <https://issues.apache.org/jira/browse/NUTCH-2512>. I am using Nutch
> >>> 1.19 with Java 11. This is running on the AWS EMR master node using a
> >>> vanilla Amazon Linux 2 AMI (2.0.20221004.0). Some more debugging info
> >>> below.
> >>>
> >>> Kamil
> >>> =============
> >>> [hadoop@ip-172-31-25-62 apache-nutch]$ java -version
> >>> openjdk version "11.0.16.1" 2022-08-12 LTS
> >>> OpenJDK Runtime Environment Corretto-11.0.16.9.1 (build 11.0.16.1+9-LTS)
> >>> OpenJDK 64-Bit Server VM Corretto-11.0.16.9.1 (build 11.0.16.1+9-LTS, mixed mode)
> >>>
> >>> [hadoop@ip-172-31-25-62 apache-nutch]$ locale
> >>> LANG=en_US.UTF-8
> >>> LC_CTYPE="en_US.UTF-8"
> >>> LC_NUMERIC="en_US.UTF-8"
> >>> LC_TIME="en_US.UTF-8"
> >>> LC_COLLATE="en_US.UTF-8"
> >>> LC_MONETARY="en_US.UTF-8"
> >>> LC_MESSAGES="en_US.UTF-8"
> >>> LC_PAPER="en_US.UTF-8"
> >>> LC_NAME="en_US.UTF-8"
> >>> LC_ADDRESS="en_US.UTF-8"
> >>> LC_TELEPHONE="en_US.UTF-8"
> >>> LC_MEASUREMENT="en_US.UTF-8"
> >>> LC_IDENTIFICATION="en_US.UTF-8"
> >>> LC_ALL=
> >>>
> >>> ------- Ant diagnostics report -------
> >>> Apache Ant(TM) version 1.9.2 compiled on November 13 2017
> >>>
> >>> -------------------------------------------
> >>> Implementation Version
> >>> -------------------------------------------
> >>> core tasks : 1.9.2 in file:/usr/share/java/ant/ant.jar
> >>>
> >>> -------------------------------------------
> >>> ANT PROPERTIES
> >>> -------------------------------------------
> >>> ant.version: Apache Ant(TM) version 1.9.2 compiled on November 13 2017
> >>> ant.java.version: 1.8
> >>> Is this the Apache Harmony VM? no
> >>> Is this the Kaffe VM? no
> >>> Is this gij/gcj? no
> >>> ant.core.lib: /usr/share/java/ant/ant.jar
> >>> ant.home: /usr/share/ant
> >>>
> >>> -------------------------------------------
> >>> ANT_HOME/lib jar listing
> >>> -------------------------------------------
> >>> ant.home: /usr/share/ant
> >>> ant-bootstrap.jar (20919 bytes)
> >>> ant-launcher.jar (19038 bytes)
> >>> ant.jar (1998416 bytes)
> >>>
> >>> -------------------------------------------
> >>> USER_HOME/.ant/lib jar listing
> >>> -------------------------------------------
> >>> user.home: /home/hadoop
> >>> No such directory.
> >>>
> >>> -------------------------------------------
> >>> Tasks availability
> >>> -------------------------------------------
> >>> junitreport : Not Available (the implementation class is not present)
> >>> sshsession : Not Available (the implementation class is not present)
> >>> sshexec : Not Available (the implementation class is not present)
> >>> telnet : Not Available (the implementation class is not present)
> >>> scp : Not Available (the implementation class is not present)
> >>> antlr : Not Available (the implementation class is not present)
> >>> netrexxc : Not Available (the implementation class is not present)
> >>> ftp : Not Available (the implementation class is not present)
> >>> rexec : Not Available (the implementation class is not present)
> >>> sound : Not Available (the implementation class is not present)
> >>> image : Not Available (the implementation class is not present)
> >>> junit : Not Available (the implementation class is not present)
> >>> jdepend : Not Available (the implementation class is not present)
> >>> splash : Not Available (the implementation class is not present)
> >>> A task being missing/unavailable should only matter if you are trying to use it
> >>>
> >>> -------------------------------------------
> >>> org.apache.env.Which diagnostics
> >>> -------------------------------------------
> >>> Not available.
> >>> Download it at http://xml.apache.org/commons/
> >>>
> >>> -------------------------------------------
> >>> XML Parser information
> >>> -------------------------------------------
> >>> XML Parser : org.apache.xerces.jaxp.SAXParserImpl
> >>> XML Parser Location: file:/usr/share/java/xerces-j2.jar
> >>> Namespace-aware parser : org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser
> >>> Namespace-aware parser Location: file:/usr/share/java/xerces-j2.jar
> >>>
> >>> -------------------------------------------
> >>> XSLT Processor information
> >>> -------------------------------------------
> >>> XSLT Processor : com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl
> >>> XSLT Processor Location: unknown
> >>>
> >>> -------------------------------------------
> >>> System properties
> >>> -------------------------------------------
> >>> java.runtime.name : OpenJDK Runtime Environment
> >>> java.vm.version : 11.0.16.1+9-LTS
> >>> sun.boot.library.path : /usr/lib/jvm/java-11-amazon-corretto.x86_64/lib
> >>> ant.library.dir : /usr/share/ant/lib
> >>> java.vm.vendor : Amazon.com Inc.
> >>> java.vendor.url : https://aws.amazon.com/corretto/
> >>> path.separator : :
> >>> java.vm.name : OpenJDK 64-Bit Server VM
> >>> sun.os.patch.level : unknown
> >>> user.country : US
> >>> sun.java.launcher : SUN_STANDARD
> >>> java.vm.specification.name : Java Virtual Machine Specification
> >>> user.dir : /home/hadoop/apache-nutch
> >>> java.vm.compressedOopsMode : Zero based
> >>> java.runtime.version : 11.0.16.1+9-LTS
> >>> java.awt.graphicsenv : sun.awt.X11GraphicsEnvironment
> >>> os.arch : amd64
> >>> java.io.tmpdir : /tmp
> >>> line.separator :
> >>>
> >>> java.vm.specification.vendor : Oracle Corporation
> >>> os.name : Linux
> >>> ant.home : /usr/share/ant
> >>> sun.jnu.encoding : UTF-8
> >>> java.library.path : /usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
> >>> jdk.debug : release
> >>> java.class.version : 55.0
> >>> java.specification.name : Java Platform API Specification
> >>> sun.management.compiler : HotSpot 64-Bit Tiered Compilers
> >>> os.version : 4.14.294-220.533.amzn2.x86_64
> >>> user.home : /home/hadoop
> >>> user.timezone :
> >>> java.awt.printerjob : sun.print.PSPrinterJob
> >>> file.encoding : UTF-8
> >>> java.specification.version : 11
> >>> user.name : hadoop
> >>> java.class.path : /usr/share/java/ant.jar:/usr/share/java/ant-launcher.jar:/usr/share/java/jaxp_parser_impl.jar:/usr/share/java/xml-commons-apis.jar:/usr/share/ant/lib/ant-bootstrap.jar:/usr/share/ant/lib/ant-launcher.jar:/usr/share/ant/lib/ant.jar
> >>> java.vm.specification.version : 11
> >>> sun.arch.data.model : 64
> >>> sun.java.command : org.apache.tools.ant.launch.Launcher -cp -diagnostics
> >>> java.home : /usr/lib/jvm/java-11-amazon-corretto.x86_64
> >>> user.language : en
> >>> java.specification.vendor : Oracle Corporation
> >>> awt.toolkit : sun.awt.X11.XToolkit
> >>> java.vm.info : mixed mode
> >>> java.version : 11.0.16.1
> >>> java.vendor : Amazon.com Inc.
> >>> file.separator : /
> >>> java.version.date : 2022-08-12
> >>> java.vendor.url.bug : https://github.com/corretto/corretto-11/issues/
> >>> sun.io.unicode.encoding : UnicodeLittle
> >>> sun.cpu.endian : little
> >>> java.vendor.version : Corretto-11.0.16.9.1
> >>> sun.cpu.isalist :
> >>>
> >>> -------------------------------------------
> >>> Temp dir
> >>> -------------------------------------------
> >>> Temp dir is /tmp
> >>> Temp dir is writeable
> >>> Temp dir alignment with system clock is 11 ms
> >>>
> >>> -------------------------------------------
> >>> Locale information
> >>> -------------------------------------------
> >>> Timezone Coordinated Universal Time offset=0
> >>>
> >>> -------------------------------------------
> >>> Proxy information
> >>> -------------------------------------------
> >>> Java1.5+ proxy settings:
> >>> Direct connection
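The `pattern` workaround in this thread relies on `java.text.SimpleDateFormat` rules, which Ant's `<touch>` documentation refers to for that attribute. Two details of those rules may matter here: `YYYY` is the week year, not the calendar year (`yyyy`), and `a` only matches the AM/PM markers of the formatter's locale. The Java sketch below is a standalone check, not code from Ant or Nutch; the class `TouchDateCheck` and its `tryParse` helper are hypothetical names. It assumes that a `pattern` given to `<touch>` ends up in a `SimpleDateFormat` built with the JVM default locale, which the thread does not confirm; if that holds, it would be one plausible reason why the pattern-based workaround fails in certain locales.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical standalone check (not part of Ant or Nutch): parses the
// datetime value used by the <touch> task in Nutch's build.xml.
public class TouchDateCheck {

    static final String DATETIME = "01/25/1971 2:00 pm";

    /** Parses DATETIME with the given pattern and locale; returns null on failure. */
    static Date tryParse(String pattern, Locale locale) {
        SimpleDateFormat df = new SimpleDateFormat(pattern, locale);
        try {
            return df.parse(DATETIME);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // "yyyy" is the calendar year; in Locale.US, "a" matches "pm"
        // (AM/PM matching in SimpleDateFormat ignores case).
        Date calendarYear = tryParse("MM/dd/yyyy hh:mm a", Locale.US);

        // "YYYY" is the week year; without a week-of-year field,
        // SimpleDateFormat resolves the date from the week year instead of
        // the parsed MM/dd, so the result is not January 25, 1971.
        Date weekYear = tryParse("MM/dd/YYYY hh:mm a", Locale.US);

        // Locale.CHINA uses different AM/PM markers, so "pm" does not match "a".
        Date otherLocale = tryParse("MM/dd/yyyy hh:mm a", Locale.CHINA);

        System.out.println("yyyy, en_US: " + calendarYear);
        System.out.println("YYYY, en_US: " + weekYear);
        System.out.println("yyyy, zh_CN: " + otherLocale);
    }
}
```

On Java 11 or later it can be run as a single source file with `java TouchDateCheck.java`. Pinning both `yyyy` and a locale whose markers are "AM"/"PM" is what the sketch suggests for a locale-independent parse; whether and how that can be expressed in build.xml is a separate question.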