The PRONOM file format signature for image/x-portable-bitmap is "P1"
followed by a whitespace character (blank, TAB, CR, LF), which would at least
tighten it up a bit. Ditto for image/x-portable-graymap ("P2").

https://www.nationalarchives.gov.uk/PRONOM/Format/proFormatSearch.aspx?status=detailReport&id=236&strPageToDisplay=signatures
https://www.nationalarchives.gov.uk/PRONOM/Format/proFormatSearch.aspx?status=detailReport&id=1155&strPageToDisplay=signatures
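
For illustration, here is a minimal standalone Java sketch of the tightened
check. It accepts "P1" or "P2" only when the next byte is one of the four
whitespace characters listed in the PRONOM signature. This is not Tika's own
detector or magic definition. The class and method names (NetpbmMagicCheck,
detect) are hypothetical, and only the two plain (ASCII) variants discussed in
this thread are covered.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika: sketches the PRONOM-style rule
// "P1"/"P2" followed by blank, TAB, CR or LF.
final class NetpbmMagicCheck {

    // Whitespace bytes allowed right after the magic number (blank, TAB, CR, LF).
    private static boolean isSignatureWhitespace(byte b) {
        return b == ' ' || b == '\t' || b == '\r' || b == '\n';
    }

    // Returns the guessed MIME type, or null if the tightened signature does not match.
    static String detect(byte[] header) {
        if (header == null || header.length < 3 || header[0] != 'P') {
            return null;
        }
        if (!isSignatureWhitespace(header[2])) {
            return null; // e.g. "P2P ..." or "P18-8610 ..." no longer match
        }
        switch ((char) header[1]) {
            case '1': return "image/x-portable-bitmap";
            case '2': return "image/x-portable-graymap";
            default:  return null;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "P2P He has Asthma",
            "P18-8610 He has Asthma",
            "P1\n2 2\n0 1\n1 0\n",
            "P2 2 2 255\n0 128 255 64\n",
        };
        for (String s : samples) {
            byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
            // Print the detected type next to the (escaped) sample text.
            System.out.println(detect(bytes) + " <- " + s.replace("\n", "\\n"));
        }
    }
}
```

With this rule, "P2P He has Asthma" and "P18-8610 He has Asthma" from the
original report no longer match, while real headers such as "P1\n2 2\n..."
still do. Plain text that happens to start with "P1 " or "P2 " would still be
misdetected, so this narrows the problem rather than eliminating it.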

On Wed, Mar 20, 2024 at 2:14 PM Tim Allison <[email protected]> wrote:

> I'm wondering if we can tighten the detection to require a newline after
> the P2, etc. It looks like some of those file format variants require a
> newline. Let me do some research, unless anyone happens to know.
>
> On Mon, Mar 18, 2024 at 4:40 PM Kashif Khan <[email protected]>
> wrote:
>
>> Hi,
>> I tried adjusting the Tika configuration with a config file and loading
>> it in the program where I parse the text, but that didn't work; I am
>> still getting the same error/result.
>> Basically, I want my program (which uses Tika for parsing) to treat any
>> data it is given as plain text and nothing else.
>>
>> Could you please suggest a way forward?
>>
>> -Kashif
>>
>> On Sun, Mar 17, 2024 at 10:23 PM Tilman Hausherr <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> The best option, of course, would be not to make your text files look
>>> as if they are something else.
>>>
>>> The second best: fine-tune the Tika configuration:
>>> https://tika.apache.org/2.9.1/configuring.html
>>>
>>> Tilman
>>>
>>> On 17.03.2024 17:46, Kashif Khan wrote:
>>>
>>> Do you think this is an issue that should be fixed? Also, is there a
>>> workaround?
>>>
>>> On Sun, Mar 17, 2024, 5:03 PM Tilman Hausherr <[email protected]>
>>> wrote:
>>>
>>>> The first one is recognized as image/x-portable-graymap because "P2" is
>>>> a magic number for that type.
>>>>
>>>> "P1" is a magic number for image/x-portable-bitmap.
>>>>
>>>> Tilman
>>>>
>>>> On 16.03.2024 12:37, Kashif Khan wrote:
>>>>
>>>> Hello Tim/Forum,
>>>>
>>>> When I try to parse the content below, the result is null/empty:
>>>> *"P2P He has Asthma"*
>>>> OR
>>>> *"P18-8610 He has Asthma"*
>>>> OR
>>>> *"P2P Scheduled as He had breathing issues *for the last* 1 year."*
>>>>
>>>> Whereas the following are parsed without any issues:
>>>> *"He has Asthma"*
>>>> *"Appointment Scheduled as He had breathing issues for last 1 year."*
>>>>
>>>> Could you please help me understand the exact issue and how to
>>>> resolve it?
>>>>
>>>> -Kashif Khan
>>>> [email protected]
>>>>
>>>>
>>>>
>>>

-- 
Greg Lepore
Information Technology Specialist
National Archives at College Park
8601 Adelphi Road, Rm 4300
College Park, MD 20740
Cell 443-741-0970 (personal)
