Hi Shi Wei,

(looping back to user@nutch - sorry, should have replied to the list)

First, the masking of sensitive strings is tracked in
  https://issues.apache.org/jira/browse/NUTCH-2905

Second, to disable the logging: The logging class is IndexerOutputFormat,
so you need to add

  log4j.logger.org.apache.nutch.indexer.IndexerOutputFormat=WARN

or for Nutch 1.19 and the current master edit the file conf/log4j2.xml
and add to the list of <Loggers>:

  <Logger name="org.apache.nutch.indexer.IndexerOutputFormat" level="WARN" additivity="false">
    <Appender-ref ref="RollingFile" level="WARN" />
  </Logger>

Best,
Sebastian

On 11/17/21 14:07, sw.l...@quandatics.com wrote:
> Hi Sebastian,
>
> Thanks for your reply.
>
> According to the statement, "You could set the log level for the class
> logging the password from INFO to WARN.", may we know which
> class/parameter that we should set to only restrict the Elasticsearch
> indexer logs to WARN level? This is because we have tried to set the
> following in the log4j.properties but it doesn't help.
>
> log4j.logger.org.apache.nutch.indexwriter.elastic.ElasticIndexWriter=WARN,cmdstdout
> log4j.logger.org.apache.nutch.indexwriter.elastic.ElasticUtils=WARN,cmdstdout
>
>
> Best Regards,
> Shi Wei
>
> On 2021-11-15 21:26, Sebastian Nagel wrote:
>> Hi Shi Wei,
>>
>>> hide the password value in hadoop.log file table ?
>>
>> You could set the log level for the class logging the password from INFO
>> to WARN. Then the index writer configuration isn't logged anymore.
>> As said, this is a work-around not a final solution which should,
>> of course, mask passwords when logging.
>>
>>> We also ran into an issue where an https connection could not be
>>> established with elasticsearch
>>
>> If the problem persists could you start a separate thread?
>>
>> Thanks,
>> Sebastian
>>
>> On 11/12/21 10:57, sw.l...@quandatics.com wrote:
>>> Hi, Sebastian
>>>
>>> Thanks for your suggestion, may I know if there is a way to hide the
>>> password value in hadoop.log file table ?
>>> We also ran into an issue where an https connection could not be
>>> established with elasticsearch. Do you have any suggestions to solve
>>> this problem?
>>> Thanks
>>>
>>> Best Regards,
>>> Shi Wei
>>>
>>> -----Original Message-----
>>> From: Sebastian Nagel <wastl.na...@googlemail.com.INVALID>
>>> Sent: Friday, 12 November, 2021 1:20 AM
>>> To: user@nutch.apache.org
>>> Subject: Re: encrypt password of the index-writer.xml
>>>
>>> Hi Shi Wei,
>>>
>>> there is a way, although definitely not the recommended one.
>>> Sorry, and it took me a little bit to prove it.
>>>
>>> Do you know about external XML entities or XXE attacks?
>>>
>>> 1. On top of the index-writers.xml you add an entity declaration:
>>>
>>> <?xml version="1.0" encoding="UTF-8" ?>
>>> <!DOCTYPE urlset [
>>> <!ENTITY CREDENTIALS SYSTEM "file:///path/to/credentials.txt">
>>> ]>
>>>
>>>
>>> 2. it's used later in the index writer spec:
>>>
>>> <writer id="indexer_solr_1"
>>>         class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
>>>   <parameters>
>>>     ...
>>>     &CREDENTIALS;
>>>   </parameters>
>>>
>>> 3. you add your credentials snippet to the file /path/to/credentials.txt
>>>
>>> <param name="username" value="username"/>
>>> <param name="password" value="SECRET"/>
>>>
>>> 4. and voila:
>>>
>>> $> bin/nutch index crawldb segment
>>> ...
>>> ├────────────┼─────────────────────────────┼─────────┤
>>> │username    │The username of Solr server. │username │
>>> ├────────────┼─────────────────────────────┼─────────┤
>>> │password    │The password of Solr server. │SECRET   │
>>> └────────────┴─────────────────────────────┴─────────┘
>>>
>>>
>>> Note: this is a dirty hack but not a security issue: with access to
>>> the index-writers.xml you can write anything into it. But there is
>>> no guarantee that this hack will continue to work in the future.
>>>
>>> Would you please be so kind to open a Jira issue to add real support
>>> for passwords in the index-writers.xml
>>>
>>> Best,
>>> Sebastian
>>>
>>>
>>> On 11/10/21 11:16, sw.l...@quandatics.com wrote:
>>>> Hi ,
>>>>
>>>> We have tried the variable expansion method on the index-writers.xml,
>>>> it doesn't work. Could you advise if there are any alternative ways to
>>>> encrypt the password in the index-writers.xml file?
>>>>
>>>> Best Regards,
>>>>
>>>> Shi Wei