Yes, I have already gone through the reference guide. It is thanks to the guide
and the documentation that I have reached this stage.
I am indexing rich document formats such as .docx, .pptx and .pdf.
The metadata I am talking about is this: currently Solr puts all the metadata,
such as author, editor and content type, into the _text_ field along with the
textual content, and I want to keep them separate.
I also tried using the ExtractingRequestHandler and understood the fmap.content
parameter for Tika, but I still cannot get the desired output.
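
A minimal sketch of one way to keep metadata out of the searched text. It assumes a core named mycore and a new field named content (both hypothetical), that the schema currently copies every field into _text_ through a catch-all copyField from "*", and that the schema defines an ignored_* dynamic field whose type is neither indexed nor stored. The Python below only builds the requests and does not send them. It relies on the Schema API commands delete-copy-field and add-field, the ExtractingRequestHandler parameters literal.id, fmap.content, uprefix and lowernames, and the df query parameter.

```python
import json
from urllib.parse import urlencode

# Hypothetical core URL; "mycore" is an assumption, not from the thread.
SOLR_CORE = "http://localhost:8983/solr/mycore"
# Hypothetical dedicated field for the document body text.
CONTENT_FIELD = "content"


def schema_update_body():
    """JSON body to POST to <core>/schema (Schema API).

    Assumes a catch-all copyField from "*" to _text_ exists; if it does
    not, the delete-copy-field command is likely to fail, so drop it.
    """
    commands = {
        # Stop copying every field (author, creator, ...) into _text_.
        "delete-copy-field": {"source": "*", "dest": "_text_"},
        # Dedicated, searchable field for the extracted body text.
        "add-field": {"name": CONTENT_FIELD, "type": "text_general",
                      "indexed": True, "stored": True},
    }
    return json.dumps(commands)


def extract_url(doc_id):
    """URL for POSTing one rich document to the ExtractingRequestHandler."""
    params = {
        "literal.id": doc_id,           # unique key supplied by the client
        "fmap.content": CONTENT_FIELD,  # Tika's body text -> content field
        "uprefix": "ignored_",          # fields unknown to the schema get this prefix
        "lowernames": "true",           # lowercase the extracted field names
        "commit": "true",
    }
    return f"{SOLR_CORE}/update/extract?{urlencode(params)}"


def content_only_query_url(term):
    """Select URL whose default search field is the content field only."""
    params = {"q": term, "df": CONTENT_FIELD}
    return f"{SOLR_CORE}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(schema_update_body())
    print(extract_url("report-1.docx"))
    print(content_only_query_url("Kushal"))
```

With this setup, a query for Kushal built by content_only_query_url searches only the body text, so a document whose author is Kushal but whose text never mentions the name would not match. Documents indexed before the schema change would need to be re-indexed.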

-----Original Message-----
From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: 28 August 2019 12:55
To: solr-user@lucene.apache.org
Subject: Re: Require searching only for file content and not metadata

You need to provide a few more details. What is your schema? How is the
document structured? Where do you get the metadata from?

Have you read the Solr reference guide? Have you read a book about Solr?

> On 28.08.2019 at 08:10, Khare, Kushal (MIND)
> <kushal.kh...@mind-infotech.com> wrote:
>
> Could anyone please help me with how to use this approach? I humbly request
> all users to please help me get through this.
> Thanks!
>
> -----Original Message-----
> From: Yogendra Kumar Soni [mailto:yogendra.ku...@dolcera.com]
> Sent: 28 August 2019 04:08
> To: solr-user@lucene.apache.org
> Subject: Re: Require searching only for file content and not metadata
>
> It will be easier to parse the documents and create the content, metadata and
> other required fields yourself, instead of using the default post tool. You
> will have better control over what goes into which field.
>
>
>> On Tue 27 Aug, 2019, 6:48 PM Khare, Kushal (MIND), < 
>> kushal.kh...@mind-infotech.com> wrote:
>>
>> Basically, the problem I am facing is that I am getting the textual content
>> plus other metadata in my _text_ field, but I want only the textual content
>> written inside the document.
>> I tried various configurations of the /update/extract request handler, but
>> none of them worked for me.
>> Please help me resolve this, as I am badly stuck.
>>
>> -----Original Message-----
>> From: Khare, Kushal (MIND) [mailto:kushal.kh...@mind-infotech.com]
>> Sent: 27 August 2019 12:59
>> To: solr-user@lucene.apache.org; ch...@christopherschultz.net
>> Subject: RE: Require searching only for file content and not metadata
>>
>> Chris,
>> What I have done is: I created a core, used the post tool to index the
>> documents from my file system, and then moved to the Solr Admin UI for querying.
>> By 'Metadata' vs 'Content', I mean that I want only the '_text_' field to be
>> searched, instead of all the fields that Solr creates by itself, such as
>> author name, last modified, creator, id, etc.
>> I simply want Solr to search only the content inside the document (the body
>> of the document) and not all the fields. For example, if I search for
>> 'Kushal', it should return the document only if that word appears in its
>> content, not because its author name or owner is Kushal.
>> I hope it is clearer now. Please help me with this!
>>
>> Thank you!
>> Kushal Khare
>>
>> -----Original Message-----
>> From: Christopher Schultz [mailto:ch...@christopherschultz.net]
>> Sent: 26 August 2019 18:47
>> To: solr-user@lucene.apache.org
>> Subject: Re: Require searching only for file content and not metadata
>>
>> Kushal,
>>
>>> On 8/26/19 07:52, Khare, Kushal (MIND) wrote:
>>> This is Kushal Khare, a new addition to the user list. I started
>>> working with Solr a few days ago to implement it in my project.
>>>
>>> Now I have the basics done and have reached the query stage.
>>>
>>> My problem is: I need to restrict Solr to search only the file
>>> content and not the metadata. I have gone through various
>>> articles on the internet, but could not find any help.
>>>
>>> Therefore, I hope I can get some solutions here.
>>
>> How are you querying Solr? Are you querying from a web application?
>> From a thick-client application? Directly from a web browser?
>>
>> What do you consider "metadata" versus "content"? To Solr, everything
>> is the same...
>>
>> -chris
>>
>> ________________________________
>>
>> The information contained in this electronic message and any
>> attachments to this message are intended for the exclusive use of the
>> addressee(s) and may contain proprietary, confidential or privileged
>> information. If you are not the intended recipient, you should not
>> disseminate, distribute or copy this e-mail. Please notify the sender
>> immediately and destroy all copies of this message and any
>> attachments. WARNING: Computer viruses can be transmitted via email.
>> The recipient should check this email and any attachments for the
>> presence of viruses. The company accepts no liability for any damage
>> caused by any virus/trojan/worms/malicious code transmitted by this
>> email. www.motherson.com
>>
