Thanks a lot for your patience and help. Yes, your summary is right. About the topology: do you mean I should try processing a smaller amount of data to see if it works, or am I misunderstanding?
On Saturday, July 29, 2017, Stig Rohde Døssing <[email protected]> wrote:
> I'm not terribly familiar with Trident, but it seems odd to me that your fields are static. If Function works at all like a regular bolt, there may be more than one instance of Processing in play in your topology, and you'd be running into interference issues if more than one Processing executor (thread) runs in a JVM.
>
> Just to try to summarize your issue: You have a topology which does some computation using Trident and then writes the result to a file. When the topology is submitted to a distributed Storm 0.10.2 cluster, it will execute for a bit, and then the workers will die with no obvious reason in the log. Please correct me if this summary is wrong.
>
> I still don't feel like I have much to go on here. I doubt you're getting an OOME; I just checked with a small memory-leaking topology on 0.10.2, and you should get an error log in the worker log if an OOME occurs, at least if it's caused by code in a bolt. Could you strip down your topology to a minimum working example that exhibits the problem (workers dying for no apparent reason), and post that example?
>
> 2017-07-28 15:47 GMT+02:00 sam mohel <[email protected]>:
>>
>> Is there any help, please?
>> My processing code uses HashSets and HashMaps that I declared in the class like this:
>> public class Processing implements Function
>> {
>>     public static HashMap<String, Integer> A;
>>     public static HashMap<String, Integer> B;
>>     public static HashMap<String, Double> C;
>>     public static HashSet<String> D;
>>     public static HashSet<String> E;
>>     public void prepare(Map conf, TridentOperationContext context) {
>>         A = new HashMap<String, Integer>();
>>         B = new HashMap<String, Integer>();
>>         C = new HashMap<String, Double>();
>>         D = new HashSet<String>();
>>         E = new HashSet<String>();
>>     }
>> When I initialized the HashMap and HashSet objects outside the prepare method, processing stopped and the result file was only 5 KB; when I initialized them inside prepare, processing stopped too, but the result file reached 50 KB. Is there anything I should do with them? Could the problem be in this class, even though it was working well before? Is there anything I should clear from memory? (A cleaned-up sketch of this class with instance fields is appended after this thread.)
>> On Fri, Jul 28, 2017 at 6:09 AM, sam mohel <[email protected]> wrote:
>>>
>>> I submitted the topology again and attached a screenshot ((error.png)) of what I got in Storm UI, but after a few seconds I got ((error1.png)) with zeros in all columns because the worker died. I'm still struggling to figure out where the problem is.
>>> On Thu, Jul 27, 2017 at 10:13 PM, sam mohel <[email protected]> wrote:
>>>>
>>>> I submitted the topology in distributed mode with localhost.
>>>> I didn't use anything to shut it down.
>>>> The strange thing is that I submitted this topology before without any problems, but now I get this issue. Is there anything I should check?
>>>> On Thu, Jul 27, 2017 at 9:59 PM, John, Dintu <[email protected]> wrote:
>>>>>
>>>>> Are you using LocalCluster.shutdown or killTopology in the main method once you submit the topology?
From the logs it looks like that… >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Thanks & Regards >>>>> >>>>> Dintu Alex John >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> From: sam mohel [mailto:[email protected]] >>>>> Sent: Thursday, July 27, 2017 2:54 PM >>>>> To: [email protected]; [email protected] >>>>> Subject: Re: Setting heap size parameters by workers.childopts and supervisor.childopts >>>>> >>>>> >>>>> >>>>> i forgot to mention that i tried to increase topology.message.timeout.secs to 180 but didn't work too >>>>> >>>>> >>>>> >>>>> On Thu, Jul 27, 2017 at 9:52 PM, sam mohel <[email protected]> wrote: >>>>> >>>>> i tried to use debug . got in the worker.log.err >>>>> >>>>> 2017-07-27 21:47:48,868 FATAL Unable to register shutdown hook because JVM is shutting down. >>>>> >>>>> >>>>> >>>>> and this lines from worker.log >>>>> >>>>> 2017-07-27 21:47:48.811 b.s.d.executor [INFO] Processing received message FOR 1 TUPLE: source: b-1:27, stream: __ack_ack, id: {}, [3247365064986003851 -431522470795602124] >>>>> >>>>> 2017-07-27 21:47:48.811 b.s.d.executor [INFO] BOLT ack TASK: 1 TIME: 0 TUPLE: source: b-1:27, stream: __ack_ack, id: {}, [3247365064986003851 -431522470795602124] >>>>> >>>>> 2017-07-27 21:47:48.811 b.s.d.executor [INFO] Execute done TUPLE source: b-1:27, stream: __ack_ack, id: {}, [3247365064986003851 -431522470795602124] TASK: 1 DELTA: 0 >>>>> >>>>> 2017-07-27 21:47:48.811 b.s.d.executor [INFO] Processing received message FOR 1 TUPLE: source: b-1:29, stream: __ack_ack, id: {}, [3247365064986003851 -6442207219333745818] >>>>> >>>>> 2017-07-27 21:47:48.811 b.s.d.executor [INFO] BOLT ack TASK: 1 TIME: 0 TUPLE: source: b-1:29, stream: __ack_ack, id: {}, [3247365064986003851 -6442207219333745818] >>>>> >>>>> 2017-07-27 21:47:48.811 b.s.d.executor [INFO] Execute done TUPLE source: b-1:29, stream: __ack_ack, id: {}, [3247365064986003851 -6442207219333745818] TASK: 1 DELTA: 0 >>>>> >>>>> 2017-07-27 21:47:48.811 b.s.d.executor [INFO] Processing received message FOR 1 TUPLE: source: b-3:33, stream: __ack_ack, id: {}, [3247365064986003851 5263752373603294688] >>>>> >>>>> 2017-07-27 21:47:48.811 b.s.d.executor [INFO] BOLT ack TASK: 1 TIME: 0 TUPLE: source: b-3:33, stream: __ack_ack, id: {}, [3247365064986003851 5263752373603294688] >>>>> >>>>> 2017-07-27 21:47:48.868 b.s.d.worker [INFO] Shutting down worker top-1-1501184820 9adf5f4c-dc5b-47b5-a458-40defe84fe9e 6703 >>>>> >>>>> 2017-07-27 21:47:48.868 b.s.d.worker [INFO] Shutting down receive thread >>>>> >>>>> 2017-07-27 21:47:48.869 b.s.d.executor [INFO] BOLT ack TASK: 1 TIME: 0 TUPLE: source: b-1:31, stream: __ack_ack, id: {}, [3247365064986003851 4288963968930353157] >>>>> >>>>> 2017-07-27 21:47:48.872 b.s.d.executor [INFO] Execute done TUPLE source: b-1:31, stream: __ack_ack, id: {}, [3247365064986003851 4288963968930353157] TASK: 1 DELTA: 60 >>>>> >>>>> 2017-07-27 21:47:48.872 b.s.d.executor [INFO] Processing received message FOR 1 TUPLE: source: b-3:33, stream: __ack_ack, id: {}, [3247365064986003851 5240959063117469257] >>>>> >>>>> 2017-07-27 21:47:48.872 b.s.d.executor [INFO] BOLT ack TASK: 1 TIME: 0 TUPLE: source: b-3:33, stream: __ack_ack, id: {}, [3247365064986003851 5240959063117469257] >>>>> >>>>> 2017-07-27 21:47:48.873 b.s.d.executor [INFO] Execute done TUPLE source: b-3:33, stream: __ack_ack, id: {}, [3247365064986003851 5240959063117469257] TASK: 1 DELTA: 1 >>>>> >>>>> 2017-07-27 21:47:48.873 b.s.d.executor [INFO] Processing received message FOR 1 TUPLE: source: b-3:33, stream: __ack_ack, id: {}, [3247365064986003851 
7583382518734849127] >>>>> >>>>> 2017-07-27 21:47:48.873 b.s.d.executor [INFO] BOLT ack TASK: 1 TIME: 0 TUPLE: source: b-3:33, stream: __ack_ack, id: {}, [3247365064986003851 7583382518734849127] >>>>> >>>>> 2017-07-27 21:47:48.873 b.s.d.executor [INFO] Execute done TUPLE source: b-3:33, stream: __ack_ack, id: {}, [3247365064986003851 7583382518734849127] TASK: 1 DELTA: 0 >>>>> >>>>> 2017-07-27 21:47:48.873 b.s.d.executor [INFO] Processing received message FOR 1 TUPLE: source: b-3:33, stream: __ack_ack, id: {}, [3247365064986003851 6840644970823833210] >>>>> >>>>> 2017-07-27 21:47:48.873 b.s.d.executor [INFO] BOLT ack TASK: 1 TIME: 0 TUPLE: source: b-3:33, stream: __ack_ack, id: {}, [3247365064986003851 6840644970823833210] >>>>> >>>>> 2017-07-27 21:47:48.873 b.s.d.executor [INFO] Execute done TUPLE source: b-3:33, stream: __ack_ack, id: {}, [3247365064986003851 6840644970823833210] TASK: 1 DELTA: 0 >>>>> >>>>> 2017-07-27 21:47:48.873 b.s.d.executor [INFO] Processing received message FOR 1 TUPLE: source: b-3:33, stream: __ack_ack, id: {}, [3247365064986003851 -6463368911496394080] >>>>> >>>>> 2017-07-27 21:47:48.873 b.s.d.executor [INFO] BOLT ack TASK: 1 TIME: 0 TUPLE: source: b-3:33, stream: __ack_ack, id: {}, [3247365064986003851 -6463368911496394080] >>>>> >>>>> 2017-07-27 21:47:48.874 b.s.d.executor [INFO] Execute done TUPLE source: b-3:33, stream: __ack_ack, id: {}, [3247365064986003851 -6463368911496394080] TASK: 1 DELTA: 1 >>>>> >>>>> 2017-07-27 21:47:48.874 b.s.d.executor [INFO] Processing received message FOR 1 TUPLE: source: b-3:33, stream: __ack_ack, id: {}, [3247365064986003851 764549587969230513] >>>>> >>>>> 2017-07-27 21:47:48.874 b.s.d.executor [INFO] BOLT ack TASK: 1 TIME: 0 TUPLE: source: b-3:33, stream: __ack_ack, id: {}, [3247365064986003851 764549587969230513] >>>>> >>>>> 2017-07-27 21:47:48.874 b.s.d.executor [INFO] Execute done TUPLE source: b-3:33, stream: __ack_ack, id: {}, [3247365064986003851 764549587969230513] TASK: 1 DELTA: 0 >>>>> >>>>> 2017-07-27 21:47:48.874 b.s.d.executor [INFO] Processing received message FOR 1 TUPLE: source: b-5:35, stream: __ack_ack, id: {}, [3247365064986003851 -4632707886455738545] >>>>> >>>>> 2017-07-27 21:47:48.874 b.s.d.executor [INFO] BOLT ack TASK: 1 TIME: 0 TUPLE: source: b-5:35, stream: __ack_ack, id: {}, [3247365064986003851 -4632707886455738545] >>>>> >>>>> 2017-07-27 21:47:48.874 b.s.d.executor [INFO] Execute done TUPLE source: b-5:35, stream: __ack_ack, id: {}, [3247365064986003851 -4632707886455738545] TASK: 1 DELTA: 0 >>>>> >>>>> 2017-07-27 21:47:48.874 b.s.d.executor [INFO] Processing received message FOR 1 TUPLE: source: b-5:35, stream: __ack_ack, id: {}, [3247365064986003851 2993206175355277727] >>>>> >>>>> 2017-07-27 21:47:48.874 b.s.d.executor [INFO] BOLT ack TASK: 1 TIME: 0 TUPLE: source: b-5:35, stream: __ack_ack, id: {}, [3247365064986003851 2993206175355277727] >>>>> >>>>> 2017-07-27 21:47:48.875 b.s.d.executor [INFO] Execute done TUPLE source: b-5:35, stream: __ack_ack, id: {}, [3247365064986003851 2993206175355277727] TASK: 1 DELTA: 1 >>>>> >>>>> 2017-07-27 21:47:48.898 b.s.m.n.Client [INFO] creating Netty Client, connecting to lenovo:6703, bufferSize: 5242880 >>>>> >>>>> 2017-07-27 21:47:48.902 b.s.m.loader [INFO] Shutting down receiving-thread: [top-1-1501184820, 6703] >>>>> >>>>> 2017-07-27 21:47:48.902 b.s.m.n.Client [INFO] closing Netty Client Netty-Client-lenovo/192.168.1.5:6703 >>>>> >>>>> 2017-07-27 21:47:48.902 b.s.m.n.Client [INFO] waiting up to 600000 ms to send 0 pending messages to 
Netty-Client-lenovo/192.168.1.5:6703 >>>>> >>>>> 2017-07-27 21:47:48.902 b.s.m.loader [INFO] Waiting for receiving-thread:[top-1-1501184820, 6703] to die >>>>> >>>>> 2017-07-27 21:47:48.903 b.s.m.loader [INFO] Shutdown receiving-thread: [top-1-1501184820, 6703] >>>>> >>>>> 2017-07-27 21:47:48.904 b.s.d.worker [INFO] Shut down receive thread >>>>> >>>>> 2017-07-27 21:47:48.904 b.s.d.worker [INFO] Terminating messaging context >>>>> >>>>> 2017-07-27 21:47:48.904 b.s.d.worker [INFO] Shutting down executors >>>>> >>>>> 2017-07-27 21:47:48.904 b.s.d.executor [INFO] Shutting down executor b-0:[8 8] >>>>> >>>>> 2017-07-27 21:47:48.905 b.s.util [INFO] Async loop interrupted! >>>>> >>>>> 2017-07-27 21:47:48.905 b.s.util [INFO] Async loop interrupted! >>>>> >>>>> 2017-07-27 21:47:48.906 b.s.d.executor [INFO] Shut down executor b-0:[8 8] >>>>> >>>>> 2017-07-27 21:47:48.906 b.s.d.executor [INFO] Shutting down executor b-8:[47 47] >>>>> >>>>> 2017-07-27 21:47:48.907 b.s.util [INFO] Async loop interrupted! >>>>> >>>>> 2017-07-27 21:47:48.907 b.s.util [INFO] Async loop interrupted! >>>>> >>>>> 2017-07-27 21:47:48.908 b.s.d.executor [INFO] Shut down executor b-8:[47 47] >>>>> >>>>> 2017-07-27 21:47:48.908 b.s.d.executor [INFO] Shutting down executor b-0:[12 12] >>>>> >>>>> 2017-07-27 21:47:48.908 b.s.util [INFO] Async loop interrupted! >>>>> >>>>> 2017-07-27 21:47:48.908 b.s.util [INFO] Async loop interrupted! >>>>> >>>>> 2017-07-27 21:47:48.908 b.s.d.executor [INFO] Shut down executor b-0:[12 12] >>>>> >>>>> 2017-07-27 21:47:48.908 b.s.d.executor [INFO] Shutting down executor b-8:[54 54] >>>>> >>>>> 2017-07-27 21:47:48.909 b.s.util [INFO] Async loop interrupted! >>>>> >>>>> 2017-07-27 21:47:48.909 b.s.util [INFO] Async loop interrupted! >>>>> >>>>> 2017-07-27 21:47:48.909 b.s.d.executor [INFO] Shut down executor b-8:[54 54] >>>>> >>>>> 2017-07-27 21:47:48.909 b.s.d.executor [INFO] Shutting down executor b-0:[2 2] >>>>> >>>>> 2017-07-27 21:47:48.909 b.s.util [INFO] Async loop interrupted! >>>>> >>>>> 2017-07-27 21:47:48.909 b.s.util [INFO] Async loop interrupted! >>>>> >>>>> 2017-07-27 21:47:48.909 b.s.d.executor [INFO] Shut down executor b-0:[2 2] >>>>> >>>>> 2017-07-27 21:47:48.909 b.s.d.executor [INFO] Shutting down executor b-2:[32 32] >>>>> >>>>> 2017-07-27 21:47:48.909 b.s.util [INFO] Async loop interrupted! >>>>> >>>>> 2017-07-27 21:47:48.910 b.s.util [INFO] Async loop interrupted! >>>>> >>>>> 2017-07-27 21:47:48.910 b.s.d.executor [INFO] Shut down executor b-2:[32 32] >>>>> >>>>> 2017-07-27 21:47:48.910 b.s.d.executor [INFO] Shutting down executor b-8:[41 41] >>>>> >>>>> 2017-07-27 21:47:48.910 b.s.util [INFO] Asy >>>>> >>>>> >>>>> >>>>> On Thu, Jul 27, 2017 at 3:11 PM, Stig Rohde Døssing <[email protected]> wrote: >>>>> >>>>> Yes, there is topology.message.timeout.secs for setting how long the topology has to process a message after it is emitted from the spout, and topology.enable.message.timeouts if you want to disable timeouts entirely. I'm assuming that's what you're asking? >>>>> >>>>> >>>>> >>>>> 2017-07-27 15:03 GMT+02:00 sam mohel <[email protected]>: >>>>> >>>>> Thanks for your patience and time. I will use debug now . But is there any settings or configurations about the time for spout? How can I increase it to try ? >>>>> >>>>> On Thursday, July 27, 2017, Stig Rohde Døssing <[email protected]> wrote: >>>>> > Last message accidentally went to you directly instead of the mailing list. >>>>> > >>>>> > Never mind what I wrote about worker slots. 
I think you should check that all tuples are being acked first. Then you might want to try enabling debug logging. You should also verify that your spout is emitting all the expected tuples. Since you're talking about a result file, I'm assuming your spout output is limited.
>>>>> >
>>>>> > 2017-07-27 10:36 GMT+02:00 Stig Rohde Døssing <[email protected]>:
>>>>> >>
>>>>> >> Okay. Unless you're seeing out of memory errors or know that your garbage collector is thrashing, I don't know why changing your xmx would help. Without knowing more about your topology it's hard to say what's going wrong. I think your best bet is to enable debug logging and try to figure out what happens when the topology stops writing to your result file. When you run your topology on a distributed cluster, you can use Storm UI to verify that all your tuples are being acked; maybe your tuple trees are not being acked correctly?
>>>>> >>
>>>>> >> Multiple topologies shouldn't be interfering with each other; the only thing I can think of is if you have too few worker slots and some of your topology's components are not being assigned to a worker. You can see this as well in Storm UI.
>>>>> >>
>>>>> >> 2017-07-27 8:11 GMT+02:00 sam mohel <[email protected]>:
>>>>> >>>
>>>>> >>> Yes, I tried 2048 and 4096 to give the worker more memory, but I have the same problem.
>>>>> >>>
>>>>> >>> I have a result file that should contain the output of my processing. Its size should be about 7 MB, but after submitting the topology I got only 50 KB.
>>>>> >>>
>>>>> >>> I submitted this topology before, about 4 months ago, but when I submit it now I get this problem.
>>>>> >>>
>>>>> >>> How could the topology work well before but not now?
>>>>> >>>
>>>>> >>> A silly question, and sorry for that: I have submitted three other topologies besides this one. Could that exhaust memory? Or should I clean something up after them?
>>>>> >>>
>>>>> >>> On Thursday, July 27, 2017, Stig Rohde Døssing <[email protected]> wrote:
>>>>> >>> > As far as I can tell the default xmx for workers in 0.10.2 is 768 megs (https://github.com/apache/storm/blob/v0.10.2/conf/defaults.yaml#L134), and your supervisor log shows the following:
>>>>> >>> > "Launching worker with command: <snip> -Xmx2048m". Is this the right configuration?
>>>>> >>> >
>>>>> >>> > Regarding the worker log, it looks like the components are initialized correctly; all the bolts report that they're done running prepare(). Could you explain what you expect the logs to look like and what you expect to happen when you run the topology?
>>>>> >>> >
>>>>> >>> > It's sometimes helpful to enable debug logging if your topology acts strange. Consider trying that by setting
>>>>> >>> > Config conf = new Config();
>>>>> >>> > conf.setDebug(true);
>>>>> >>> >
>>>>> >>> > 2017-07-27 1:43 GMT+02:00 sam mohel <[email protected]>:
>>>>> >>> >>
>>>>> >>> >> Same problem in distributed mode. I tried to submit the topology in distributed mode with localhost and attached the worker and supervisor log files.
>>>>> >>> >>
>>>>> >>> >> On Thursday, July 27, 2017, sam mohel <[email protected]> wrote:
>>>>> >>> >> > I submit my topology with these commands:
>>>>> >>> >> > mvn package
>>>>> >>> >> > mvn compile exec:java -Dexec.classpathScope=compile -Dexec.mainClass=trident.Topology
>>>>> >>> >> > and I copied these lines from what I saw in the terminal:
>>>>> >>> >> > 11915 [Thread-47-b-4] INFO b.s.d.executor - Prepared bolt b-4:(40)
>>>>> >>> >> > 11912 [Thread-111-b-2] INFO b.s.d.executor - Prepared bolt b-2:(14)
>>>>> >>> >> > 11934 [Thread-103-b-5] INFO b.s.d.executor - Prepared bolt b-5:(45)
>>>>> >>> >> > sam@lenovo:~/first-topology$
>>>>> >>> >> > I checked the size of the result file and found it is 50 KB each time I submit the topology.
>>>>> >>> >> > What should I check?
>>>>> >>> >> > On Wed, Jul 26, 2017 at 9:05 PM, Bobby Evans <[email protected]> wrote:
>>>>> >>> >> >>
>>>>> >>> >> >> Local mode is totally separate and there are no processes launched except the original one. Those values are ignored in local mode.
>>>>> >>> >> >>
>>>>> >>> >> >> - Bobby
>>>>> >>> >> >>
>>>>> >>> >> >> On Wednesday, July 26, 2017, 2:01:52 PM CDT, sam mohel <[email protected]> wrote:
>>>>> >>> >> >>
>>>>> >>> >> >> Thanks so much for replying. I tried to submit the topology in local mode and increased the worker size like this:
>>>>> >>> >> >> conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx4096m");
>>>>> >>> >> >>
>>>>> >>> >> >> but got this in the terminal:
>>>>> >>> >> >> 11920 [Thread-121-b-4] INFO b.s.d.executor - Preparing bolt b-4:(25)
>>>>> >>> >> >> 11935 [Thread-121-b-4] INFO b.s.d.executor - Prepared bolt b-4:(25)
>>>>> >>> >> >> 11920 [Thread-67-b-5] INFO b.s.d.executor - Preparing bolt b-5:(48)
>>>>> >>> >> >> 11936 [Thread-67-b-5] INFO b.s.d.executor - Prepared bolt b-5:(48)
>>>>> >>> >> >> 11919 [Thread-105-b-2] INFO b.s.d.executor - Prepared bolt b-2:(10)
>>>>> >>> >> >> 11915 [Thread-47-b-4] INFO b.s.d.executor - Prepared bolt b-4:(40)
>>>>> >>> >> >> 11912 [Thread-111-b-2] INFO b.s.d.executor - Prepared bolt b-2:(14)
>>>>> >>> >> >> 11934 [Thread-103-b-5] INFO b.s.d.executor - Prepared bolt b-5:(45)
>>>>> >>> >> >> sam@lenovo:~/first-topology$
>>>>> >>> >> >> and processing didn't complete; the result file is only 50 KB. This topology was working well without any problems, but when I try to submit it now I don't get the full result.
>>>>> >>> >> >>
>>>>> >>> >> >> On Wed, Jul 26, 2017 at 8:35 PM, Bobby Evans <[email protected]> wrote:
>>>>> >>> >> >>
>>>>> >>> >> >> worker.childopts is the default value that is set by the system administrator in storm.yaml on each of the supervisor nodes. topology.worker.childopts is what you set in your topology conf if you want to add something more to the command line.
>>>>> >>> >> >>
>>>>> >>> >> >> - Bobby
>>>>> >>> >> >>
>>>>> >>> >> >> On Tuesday, July 25, 2017, 11:50:04 PM CDT, sam mohel <[email protected]> wrote:
>>>>> >>> >> >>
>>>>> >>> >> >> I'm using version 0.10.2. I tried to write this in the code:
>>>>> >>> >> >> conf.put(Config.WORKER_CHILDOPTS, "-Xmx4g");
>>>>> >>> >> >> conf.put(Config.SUPERVISOR_CHILDOPTS, "-Xmx4g");
>>>>> >>> >> >>
>>>>> >>> >> >> but I didn't see any effect. Did I write the right configurations? Is this the largest value I can set?
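For reference, a minimal sketch of how the Processing function quoted above could keep its state in instance fields initialized in prepare(), following Stig's point that static fields can be shared by several executor threads in the same worker JVM. It assumes the Storm 0.10.x storm.trident API, extends BaseFunction for brevity instead of implementing Function directly, and uses a placeholder emit; the real per-tuple logic from the thread is not shown, so this is illustrative only, not the poster's code.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

import backtype.storm.tuple.Values;
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.operation.TridentOperationContext;
import storm.trident.tuple.TridentTuple;

public class Processing extends BaseFunction {
    // Instance fields (not static): each Processing instance gets its own copies,
    // so concurrent executors in one worker JVM cannot interfere with each other.
    private HashMap<String, Integer> A;
    private HashMap<String, Integer> B;
    private HashMap<String, Double> C;
    private HashSet<String> D;
    private HashSet<String> E;

    @Override
    public void prepare(Map conf, TridentOperationContext context) {
        A = new HashMap<String, Integer>();
        B = new HashMap<String, Integer>();
        C = new HashMap<String, Double>();
        D = new HashSet<String>();
        E = new HashSet<String>();
    }

    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
        // Placeholder: the thread's real per-tuple logic is not shown.
        // Emitting the length of the first field just to keep the sketch complete.
        collector.emit(new Values(tuple.getString(0).length()));
    }
}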
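Likewise, a hedged sketch of the topology-side settings discussed in the thread (debug logging, message timeout, and per-topology worker heap), assuming the Storm 0.10.x backtype.storm.Config API. As Bobby explains above, worker.childopts and supervisor.childopts are cluster-side defaults that belong in storm.yaml on the supervisor nodes, so only topology.* keys are set here; the class name is made up.

import backtype.storm.Config;

public class TopologyConfSketch {
    public static Config buildConf() {
        Config conf = new Config();
        conf.setDebug(true);                                      // debug logging, as suggested by Stig
        conf.put(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, 180);      // seconds before an un-acked tuple is failed
        conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx2048m");  // per-topology worker heap
        return conf;
    }
}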
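Finally, since Stig asked for a stripped-down minimum working example, here is a rough outline of what such a test topology could look like, reusing the hypothetical Processing sketch above and a small non-cycling spout so the output is bounded. The class name, stream name, field names, and input sentences are all invented for illustration; this is a sketch under those assumptions, not the actual topology from the thread.

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import storm.trident.TridentTopology;
import storm.trident.testing.FixedBatchSpout;

public class MinimalTopology {
    public static void main(String[] args) throws Exception {
        // A tiny, non-cycling spout so the topology processes a bounded amount of data.
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("sentence"), 3,
                new Values("the cow jumped over the moon"),
                new Values("an apple a day keeps the doctor away"));
        spout.setCycle(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("spout1", spout)
                .each(new Fields("sentence"), new Processing(), new Fields("result"));

        Config conf = new Config();
        conf.setDebug(true);

        // Local mode is enough to see whether the stripped-down version also misbehaves.
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("minimal-top", conf, topology.build());
        Thread.sleep(30000);
        cluster.killTopology("minimal-top");
        cluster.shutdown();
    }
}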
