Re: Error in Jython environment

2018-04-13 Thread James McMahon
No sir. When I run from the command line - I encapsulate script-specific command sequences in code triggered by SCRIPT var value, and NiFi sequences encapsulated within NIFI var values - it runs in a Python environment. I attempted jython -V at the command line, and it was unable to find an instal

Re: NiFi cluster with DistributedMapCacheServer/Client

2018-04-13 Thread Andrew Grande
HBase-backed cache is a great choice here. Redis is nice and nimble, but when it comes to clustering and enterprise security, may not be the best fit. The original legacy cache server in NiFi is... well, it should be deprecated and removed, IMO :) Andrew On Fri, Apr 13, 2018, 3:45 PM James Sriniv

Re: Error in Jython environment

2018-04-13 Thread Matt Burgess
Jim, Just to confirm Joe's comments, we were using 2.7.0 but then bumped it up to 2.7.1 as there was at least one method os.getpid() that hadn't been implemented [1]. However, 2.7.1 apparently has multithreading issues [2] that are causing some folks headaches as well. Regards, Matt [1] https:/

Re: NiFi cluster with DistributedMapCacheServer/Client

2018-04-13 Thread James Srinivasan
Thanks, I might try moving to the HBase implementation anyway because: 1) It is already in NiFi 1.3 2) We already have HBase installed (but unused) on our cluster 3) There doesn't seem to be a limit to the number of cache entries. For our use case (avoiding downloading the same file multiple times

Re: NiFi cluster with DistributedMapCacheServer/Client

2018-04-13 Thread Joe Witt
James, You have it right about the proper solution path.. I think we have a Redis one in there now too that might be interesting (not in 1.3.0 perhaps but..). We offered a simple out of the box one early and to ensure the interfaces are right. Since then the community has popped up some real/st

NiFi cluster with DistributedMapCacheServer/Client

2018-04-13 Thread James Srinivasan
Hi all, Is there a recommended way to set up a DistributedMapCacheServer/Client on a cluster, ideally with some amount of HA (NiFi 1.3.0)? I'm using a shared persistence directory, and when adding and enabling the controller it seems to start on my primary node (but not the other two - status keep

Re: Error in Jython environment

2018-04-13 Thread Joe Witt
Independent of NiFi, have you successfully used any version of Jython to make that call? With the NiFi 1.x line that processor has been updated a little bit, and in particular I think we have Jython 2.7.1 (from memory, so this could be total nonsense), whereas before we had 2.7.0 or something older... T

Error in Jython environment

2018-04-13 Thread James McMahon
Good afternoon. I am running a Python script from an ExecuteScript processor in NiFi 0.7.1.c. I'm assuming this is actually running in a Jython environment under the hood of the processor. General question: How can I tell what version of Jython I am employing? Specific problem: I employ an os.sta
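One way to answer the general question is to ask the interpreter itself. The sketch below uses only the standard `platform` module, which is available in both CPython and Jython 2.7; the helper name `describe_interpreter` is made up for illustration, and where the printed output ends up when run inside ExecuteScript is not something this thread confirms.

```python
import platform


def describe_interpreter():
    """Return (implementation, version) for the interpreter running this code."""
    # Reports "Jython" under Jython and "CPython" under the standard interpreter.
    implementation = platform.python_implementation()
    # Version string of the Python language level, e.g. "2.7.1".
    version = platform.python_version()
    return implementation, version


implementation, version = describe_interpreter()
print("%s %s" % (implementation, version))
```

Running the same script from the command line and from the processor should make it obvious whether the two environments differ.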

Re: Clustering Questions

2018-04-13 Thread Joe Witt
Jon, Node Failure: Generally speaking, you have to care about two things. First is the flow execution and second is data in flight. For flow execution, NiFi clustering will take care of re-assigning the primary node and cluster coordinator as needed. For data, we do not at present offer distribute

Clustering Questions

2018-04-13 Thread Jon Logan
All, I had a few general questions regarding Clustering, and was looking for any sort of advice or best-practices information -- - documentation discusses failure handling primarily from a NiFi crash scenario, but I don't recall seeing any information on entire node-failure scenarios. Is there a w

Re: MergeRecord, queue & backpressure

2018-04-13 Thread Juan Sequeiros
Good afternoon, Another thing that may help you out: you can also tweak the nifi.properties setting: nifi.queue.swap.threshold=2 This setting controls the maximum flowfile count on a connection; if it is exceeded, those flowfiles are flushed to disk. I am not sure however there is

Re: MergeRecord, queue & backpressure

2018-04-13 Thread Mark Payne
Aurélien, In that case you're looking to merge about 500,000 FlowFiles into a single FlowFile, so you'll definitely want to use a cascading approach. I'd shoot for about 1 MB for the first MergeRecord and then merge 128 of those together for the second MergeRecord. The provenance backpressure i
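The sizing behind the cascading approach can be checked with quick arithmetic. This is a rough sketch only: it assumes 1 MB = 1024 * 1024 bytes and ignores record-format overhead and Parquet compression, so the real counts will differ somewhat. The constant names are illustrative, not NiFi property names.

```python
# Sizes taken from the thread: 300-octet flowfiles, 128 MB target file,
# about 1 MB per bin in the first MergeRecord, 128 bins in the second.
FLOWFILE_BYTES = 300
TARGET_BYTES = 128 * 1024 * 1024
FIRST_STAGE_BYTES = 1 * 1024 * 1024
SECOND_STAGE_BINS = 128

# Flowfiles needed to fill one 128 MB output (the "about 500,000" above).
total_flowfiles = TARGET_BYTES // FLOWFILE_BYTES

# Flowfiles that fit in one first-stage bin of about 1 MB.
flowfiles_per_first_bin = FIRST_STAGE_BYTES // FLOWFILE_BYTES

# Size reached by merging 128 first-stage outputs in the second stage.
second_stage_bytes = FIRST_STAGE_BYTES * SECOND_STAGE_BINS

print("flowfiles per 128 MB file: %d" % total_flowfiles)
print("flowfiles per first-stage bin: %d" % flowfiles_per_first_bin)
print("second-stage output bytes: %d" % second_stage_bytes)
```

With these assumptions, neither stage has to hold more than a few thousand inputs in a single bin, instead of several hundred thousand in one.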

MergeRecord, queue & backpressure

2018-04-13 Thread DEHAY Aurelien
Hello. It's me again regarding my MergeRecord question. I still can't get the result I want. I may have understood how bin-based processors work; this is for clarification, plus a question regarding performance. I want to merge a huge number of 300-octet flowfiles into a 128 MB parquet file. My

RE: MergeRecord

2018-04-13 Thread DEHAY Aurelien
Hello. We looked in first place in the InferSchema to see if there was an option for that. Anyway, thank you very much, it works fine with the update attribute. Aurélien DEHAY Big Data Architect +33 616 815 441 aurelien.de...@faurecia.com 2 rue Hennape - 92735 Nanterre Cedex – France ---

Re: MergeRecord

2018-04-13 Thread Koji Kawamura
Hi, Just FYI, when I replaced the schema doc comment using UpdateAttribute, I was able to merge records. ${inferred.avro.schema:replaceAll('"Type inferred from [^"]+"', '""')} I looked at InferAvroSchema and the underlying Kite source code, but there's no option to suppress the doc comment when inferring
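To see why the replacement helps, here is the same regular expression applied with Python's `re` module rather than NiFi Expression Language. The two schema strings are invented for illustration (the field name `id` and the sample values are assumptions, not taken from the thread); they differ only in the sample value embedded in the doc comment.

```python
import re

# Same pattern as in the replaceAll call above.
DOC_PATTERN = re.compile(r'"Type inferred from [^"]+"')


def strip_inferred_docs(schema_text):
    """Blank out per-field doc comments that embed a sample value."""
    return DOC_PATTERN.sub('""', schema_text)


# Hypothetical schemas inferred from two different flowfiles.
schema_a = ('{"type": "record", "name": "example", "fields": '
            '[{"name": "id", "type": "string", "doc": "Type inferred from \'abc\'"}]}')
schema_b = ('{"type": "record", "name": "example", "fields": '
            '[{"name": "id", "type": "string", "doc": "Type inferred from \'xyz\'"}]}')

print(schema_a == schema_b)
print(strip_inferred_docs(schema_a) == strip_inferred_docs(schema_b))
```

Before the substitution the two schema texts are different strings; afterwards they are identical, which is consistent with the records merging once the doc comment is blanked.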

Re: MergeRecord

2018-04-13 Thread Koji Kawamura
Hi, I've tested the InferAvroSchema and MergeRecord scenario. As you described, records are not merged as expected. The reason in my case is that InferAvroSchema generates schema text like this: inferred.avro.schema { "type" : "record", "name" : "example", "doc" : "Schema generated by Kite", "fields" : [