Hello Michael,

That is impossible to say. Maybe the original data had no value for the title field that is copied to raw, or maybe the raw field in CloudSearch is not configured to be stored, but only indexed instead.
What you can do is use the Nutch indexchecker <URL> tool; it prints the exact fields that Nutch would index to CloudSearch.

Markus

On Wed, 4 Sep 2024 at 17:54, Fritsch, Michael <[email protected]> wrote:

> Hello,
>
> I use Nutch 1.19 to crawl my website and to index the data into AWS
> CloudSearch.
> For this, I use the CloudSearch index writer.
> Everything works fine.
> Now I want to copy the content of the "content" field into a different
> field in CloudSearch.
> I've created this field in CloudSearch with the name "raw" and the same
> settings (except for the analysis scheme) as the "content" field.
> In the index-writers.xml configuration file, I used the following
> configuration in order to copy the content:
>
> <writer id="indexer_cloud_search_1"
>         class="org.apache.nutch.indexwriter.cloudsearch.CloudSearchIndexWriter">
>   <parameters>
>     <param name="endpoint" value="MyEndpointAddress"/>
>     <param name="region" value="eu-west-1"/>
>     <param name="batch.dump" value="false"/>
>     <param name="batch.maxSize" value="-1"/>
>   </parameters>
>   <mapping>
>     <copy source="title" target="raw"/>
>     <rename />
>     <remove />
>   </mapping>
> </writer>
>
> Everything works without errors; that is, the standard content is
> indexed into CloudSearch, but I do not see any content in the "raw" field.
> Does anyone have an idea why this happens?
>
> Best regards,
> Michael
>
> Dr. Michael Fritsch
> Technical Editor
> CoreMedia GmbH
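
For reference, here is a minimal sketch of how the copy mapping from the quoted message might be written. It assumes the mapping layout of the index-writers.xml template shipped with Nutch 1.15 and later, where copy, rename and remove are containers for field elements and the target is named in a dest attribute. It also assumes "content" as the source, because that is the stated goal, whereas the quoted configuration copies "title". Neither assumption has been checked against this particular setup.

```xml
<mapping>
  <!-- Assumption: copy is a container; each field names its source and dest. -->
  <!-- Assumption: "content" is the intended source (the quoted config uses "title"). -->
  <copy>
    <field source="content" dest="raw"/>
  </copy>
  <rename/>
  <remove/>
</mapping>
```

Running indexchecker on a sample page, as suggested above, is still the way to confirm that the source field actually carries a value.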

