A couple of random questions about ConvertExcelToCSVProcessor:

Why does this processor only handle the .xlsx Excel file format? From the
Description for ConvertExcelToCSVProcessor: "*This processor is currently
only capable of processing .xlsx (XSSF 2007 OOXML file format) Excel
documents and not older .xls (HSSF '97(-2007) file format) documents.*" I
ask because it seems unfortunate to have to develop a separate flow path to
handle the .xls files that this native processor cannot. Why was handling
of .xls files not built into ConvertExcelToCSVProcessor as well?
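
Part of the cost of a separate .xls path is deciding which path each file takes. The two formats are easy to tell apart by their leading bytes: .xlsx (OOXML) files are ZIP packages, while legacy .xls files are OLE2 compound documents. Below is a minimal Python sketch of that check, assuming the first few bytes of the content are available. `excel_flavor` and `excel_flavor_of_file` are hypothetical helpers, not part of NiFi.

```python
# Leading signature ("magic") bytes of the two container formats.
ZIP_MAGIC = b"PK\x03\x04"                        # .xlsx: OOXML is a ZIP package
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls: OLE2 compound file


def excel_flavor(head: bytes) -> str:
    """Classify content by its first bytes: 'xlsx', 'xls', or 'unknown'."""
    if head.startswith(ZIP_MAGIC):
        # Any ZIP archive matches; a stricter check would look inside the
        # package for xl/workbook.xml.
        return "xlsx"
    if head.startswith(OLE2_MAGIC):
        # OLE2 is also used by .doc and .ppt, so this means "probably .xls".
        return "xls"
    return "unknown"


def excel_flavor_of_file(path: str) -> str:
    """Read only the 8-byte signature from disk and classify it."""
    with open(path, "rb") as f:
        return excel_flavor(f.read(8))
```

Checking signatures rather than the filename extension avoids trusting a name that may not match the content.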

What is it about this processor that required including the word Processor
in its name? It seems redundant and inconsistent with the naming convention
used for the majority of the other processors. I figure there was an
interesting reason behind this, and so wanted to ask.

I am using a slightly older version of NiFi. Does this limitation go away
in later versions?

On Mon, Sep 25, 2023 at 3:23 AM Chris Sampson <chris.samp...@naimuri.com>
wrote:

> I completely missed the fact that this was an external Python conversion
> script run through ExecuteStreamCommand, but as Matt says, that will be
> catered for in the new NiFi versions.
>
> From a quick look, although I've not tested to confirm, it appears that
> both the existing ConvertExcelToCSVProcessor and CSVRecordSetWriter (which
> can now be paired with the relatively new ExcelReader, e.g. in a
> ConvertRecord processor) will set the resulting flowfile's mime.type
> attribute to text/csv, which would allow the expected downstream content
> viewer behaviour.
>
> On Mon, 25 Sept 2023, 06:54 Matt Burgess, <mattyb...@apache.org> wrote:
>
>> I added MIME Type properties to ExecuteProcess and ExecuteStreamCommand
>> so you can set the MIME type explicitly if you want [1]. They will be in
>> the 1.24.0 and 2.0 releases.
>>
>> Regards,
>> Matt
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-12011
>>
>>
>> On Mon, Sep 25, 2023 at 1:41 AM Joe Witt <joe.w...@gmail.com> wrote:
>>
>>> Chris
>>>
>>> Yep. Though in this case it was ExecuteStreamCommand, so following it with
>>> UpdateAttribute as you mention, or with IdentifyMimeType, would do the trick.
>>>
>>> Thanks
>>>
>>> On Sun, Sep 24, 2023 at 10:30 PM Chris Sampson <
>>> chris.samp...@naimuri.com> wrote:
>>>
>>>> An UpdateAttribute could also be used to update the mime.type, e.g. to
>>>> text/csv.
>>>>
>>>> I'd think the CSV record writer should probably do this automatically
>>>> though, so it may be worth a Jira to correct that (I'm reasonably sure the
>>>> existing JSON and Avro writers do that, for example).
>>>>
>>>> On Sun, 24 Sept 2023, 23:52 James McMahon, <jsmcmah...@gmail.com>
>>>> wrote:
>>>>
>>>>> That was it. I was missing the forest for the trees, yet again <lol>.
>>>>> I do all the hard work and then forget to IdentifyMimeType at the end.
>>>>> Thanks very much Joe.
>>>>> Jim
>>>>>
>>>>> On Sun, Sep 24, 2023 at 6:30 PM Joe Witt <joe.w...@gmail.com> wrote:
>>>>>
>>>>>> Jim,
>>>>>>
>>>>>> Before you try to view it, you can likely run it through
>>>>>> IdentifyMimeType. As you note, the conversion from XLS to CSV happens,
>>>>>> but we still see a mime type of
>>>>>> 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet', so
>>>>>> that is likely causing it to not even attempt to display. So after your
>>>>>> Python script execution, run the data through IdentifyMimeType and then
>>>>>> you can likely view it just fine.
>>>>>>
>>>>>> Thanks
>>>>>>
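
Joe's diagnosis is that the content is already CSV while mime.type still names the xlsx type, so the viewer does not try to render it. Below is a minimal Python sketch of that check outside NiFi, assuming the flowfile attributes are available as a plain dict. `fix_mime_type` is a hypothetical helper, not a NiFi API; inside the flow, the thread's own suggestions (IdentifyMimeType, or an UpdateAttribute that sets mime.type to text/csv) do the equivalent job.

```python
ZIP_MAGIC = b"PK\x03\x04"  # every .xlsx (OOXML) package starts with this
XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def fix_mime_type(attributes: dict[str, str], content: bytes) -> dict[str, str]:
    """Return a copy of the attributes with a stale xlsx mime.type replaced.

    Assumption: if mime.type still claims an xlsx package but the content no
    longer starts with the ZIP signature, the content is now CSV text.
    """
    updated = dict(attributes)  # leave the caller's dict untouched
    if updated.get("mime.type") == XLSX_MIME and not content.startswith(ZIP_MAGIC):
        updated["mime.type"] = "text/csv"
        updated["mime.extension"] = ".csv"  # the old ".xlsx" no longer applies
    return updated
```

Keying the fix on the content rather than on the filename matters here: in the attributes above, filename is already Alltables.csv while mime.type is still the xlsx type.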
>>>>>> On Sun, Sep 24, 2023 at 3:21 PM James McMahon <jsmcmah...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I sure can Joe. Here they are:
>>>>>>>
>>>>>>> RouteOnAttribute.Route: isExcel
>>>>>>> execution.command: /usr/bin/python3
>>>>>>> execution.command.args: /opt/nifi/config_resources/scripts/excelToCSV.py
>>>>>>> execution.error: Empty string set
>>>>>>> execution.status: 0
>>>>>>> filename: Alltables.csv
>>>>>>> hash.value.md5: b48840c161b645a0169e622dcb8f5083
>>>>>>> hash.value.sha256: 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>>> isChild: false
>>>>>>> mime.extension: .xlsx
>>>>>>> mime.type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
>>>>>>> parent.MD5: b48840c161b645a0169e622dcb8f5083
>>>>>>> parent.SHA256: 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>>> path: ./
>>>>>>> s3.bucket: rampart-raw-data
>>>>>>> s3.encryptionStrategy: SSE_S3
>>>>>>> s3.etag: b48840c161b645a0169e622dcb8f5083
>>>>>>> s3.isLatest: true
>>>>>>> s3.lastModified: 1672701227000
>>>>>>> s3.length: 830934
>>>>>>> s3.owner: b34a7aa80a4130503fee2e8d4c2b674e154af3c4db69db9a4e3bff8a47cc92d1
>>>>>>> s3.sseAlgorithm: AES256
>>>>>>> s3.storeClass: STANDARD
>>>>>>> s3.version: null
>>>>>>> sourcing.MD5: b48840c161b645a0169e622dcb8f5083
>>>>>>> sourcing.SHA256: 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>>> sourcing.sourceMD5: b48840c161b645a0169e622dcb8f5083
>>>>>>> sourcing.sourceSHA256: 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>>> triage.datatype: excel
>>>>>>> uuid: d72ec2e9-cfbd-435e-9954-4f7fae55c550
>>>>>>>
>>>>>>> Thanks for any help. Perhaps my data is there but I simply can't
>>>>>>> render it in the Viewer?
>>>>>>> Jim
>>>>>>>
>>>>>>> On Sun, Sep 24, 2023 at 6:08 PM Joe Witt <joe.w...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Jim,
>>>>>>>>
>>>>>>>> If a content type attribute exists and is not a type NiFi
>>>>>>>> understands, it will not be able to render it. Can you show what
>>>>>>>> flowfile attributes are present at the point you attempt to view it?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> On Sun, Sep 24, 2023 at 3:03 PM James McMahon <jsmcmah...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hello. I have converted incoming Excel files to CSV. I'd like to
>>>>>>>>> look at the result, but when I select my flowfiles from the output
>>>>>>>>> queue, the only option I can select is "View as hex", and I cannot
>>>>>>>>> get the display to show me the records in the form I expect.
>>>>>>>>> Viewing them in the hex display is not helpful.
>>>>>>>>>
>>>>>>>>> How can I fix this viewing issue?
>>>>>>>>>
>>>>>>>>> Here is an example of what I can see:
>>>>>>>>>
>>>>>>>>> 0x00000000 22 54 61 62 6C 65 20 31 2E 20 20 45 73 74 69 6D "Table 1.  Estim
>>>>>>>>> 0x00000010 61 74 65 64 20 4D 6F 6E 74 68 6C 79 20 53 61 6C ated Monthly Sal
>>>>>>>>> 0x00000020 65 73 20 61 6E 64 20 49 6E 76 65 6E 74 6F 72 69 es and Inventori
>>>>>>>>> 0x00000030 65 73 20 66 6F 72 20 4D 61 6E 75 66 61 63 74 75 es for Manufactu
>>>>>>>>> 0x00000040 72 65 72 73 2C 20 52 65 74 61 69 6C 65 72 73 2C rers, Retailers,
>>>>>>>>> 0x00000050 20 61 6E 64 20 4D 65 72 63 68 61 6E 74 20 57 68  and Merchant Wh
>>>>>>>>> 0x00000060 6F 6C 65 73 61 6C 65 72 73 22 2C 55 6E 6E 61 6D olesalers",Unnam
>>>>>>>>> 0x00000070 65 64 3A 20 31 2C 55 6E 6E 61 6D 65 64 3A 20 32 ed: 1,Unnamed: 2
>>>>>>>>> 0x00000080 2C 55 6E 6E 61 6D 65 64 3A 20 33 2C 55 6E 6E 61 ,Unnamed: 3,Unna
>>>>>>>>> 0x00000090 6D 65 64 3A 20 34 2C 55 6E 6E 61 6D 65 64 3A 20 med: 4,Unnamed: 
>>>>>>>>> 0x000000A0 35 2C 55 6E 6E 61 6D 65 64 3A 20 36 2C 55 6E 6E 5,Unnamed: 6,Unn
>>>>>>>>> 0x000000B0 61 6D 65 64 3A
>>>>>>>>>
>>>>>>>>
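
The hex view in the first message already shows ordinary CSV text, which fits Joe's diagnosis that only the mime.type attribute keeps the text viewer from rendering it. Below is a small Python sketch that rebuilds the bytes from a pasted dump in that layout (offset, up to 16 hex bytes, ASCII column) and parses the first CSV record. `decode_hex_dump` and `first_csv_row` are hypothetical helpers written for this illustration.

```python
import csv
import io
import re

HEX_BYTE = re.compile(r"[0-9A-Fa-f]{2}")


def decode_hex_dump(dump: str) -> bytes:
    """Rebuild raw bytes from hex-viewer rows like '0x00000010 61 74 ... ated Mo'.

    Each row is assumed to be an offset, then up to 16 two-digit hex bytes,
    then an optional ASCII column, which is ignored. On a short final row, an
    ASCII token that happens to look like a hex byte would be misread.
    """
    data = bytearray()
    for line in dump.splitlines():
        fields = line.split()
        if not fields or not fields[0].lower().startswith("0x"):
            continue  # skip blank lines and anything that is not a dump row
        for field in fields[1:17]:  # at most 16 byte columns per row
            if not HEX_BYTE.fullmatch(field):
                break  # reached the ASCII column of a short row
            data.append(int(field, 16))
    return bytes(data)


def first_csv_row(data: bytes, encoding: str = "utf-8") -> list[str]:
    """Decode the bytes and return the first CSV record as a list of fields."""
    reader = csv.reader(io.StringIO(data.decode(encoding)))
    return next(reader, [])
```

Given the dump from the first message, the first field comes back as the quoted table title, followed by "Unnamed: 1" through "Unnamed: 6"; the final field is cut short because the dump stops mid-row.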
