Hello Simon,
If you mean this:
https://github.com/apache/metron/tree/master/metron-platform/metron-indexing#sensor-indexing-configuration
My settings are as follows (I’ve tried many changes here):
{
  "hdfs": {
    "batchSize": 10,
    "batchTimeout": 1,
    "enabled": true,
    "index": "ansi"
  },
  "elasticsearch": {
    "batchSize": 10,
    "batchTimeout": 1,
    "enabled": true,
    "index": "ansi"
  },
  "solr": {
    "batchSize": 10,
    "enabled": false,
    "index": "ansi"
  }
}
What is not clear to me is how these settings can influence the
random_indexing and batch_indexing topologies. Once again, the first delay I
see is between the source topic and the “enrichments” topic. Unless I’m wrong,
data is moved between these 2 topics by the parsing topology, without any
indexing step involved.
Stéphane
From: Simon Elliston Ball [mailto:[email protected]]
Sent: Thursday, May 16, 2019 11:40
To: [email protected]
Subject: Re: Very low throuput on topologies
Can you share your Metron batch size config and timeouts?
On Thu, 16 May 2019 at 11:31,
<[email protected]<mailto:[email protected]>> wrote:
Hello Simon,
Yes, that is what it looks like, but I’ve set topology.flush.tuple.freq.millis
to 10 and saw no change. Moreover, if I send, say, 20000 lines, they still take
a long time to be fully processed.
From: Simon Elliston Ball
[mailto:[email protected]<mailto:[email protected]>]
Sent: Thursday, May 16, 2019 11:20
To: [email protected]<mailto:[email protected]>
Subject: Re: Very low throuput on topologies
If you’re sending low volumes to test, you may be waiting on the batch timeout,
i.e. not flushing a batch by volume but waiting for the timeout to expire,
which may explain your latency.
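
To make the size-or-timeout behaviour concrete, here is a minimal toy model in Python. It is only a sketch of the general idea, not Metron's actual writer code: the class name `BatchBuffer`, its methods, and the use of seconds for the timeout are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BatchBuffer:
    """Toy size-or-timeout buffer (hypothetical, not Metron's implementation)."""
    batch_size: int
    batch_timeout_s: float
    pending: List[str] = field(default_factory=list)
    first_arrival: Optional[float] = None

    def add(self, message: str, now: float) -> Optional[List[str]]:
        """Queue a message; return the batch if the size limit is reached."""
        if not self.pending:
            self.first_arrival = now  # start the clock on the first message
        self.pending.append(message)
        if len(self.pending) >= self.batch_size:
            return self._flush()
        return None

    def tick(self, now: float) -> Optional[List[str]]:
        """Periodic check; flush if the oldest message has waited long enough."""
        if self.pending and now - self.first_arrival >= self.batch_timeout_s:
            return self._flush()
        return None

    def _flush(self) -> List[str]:
        batch, self.pending, self.first_arrival = self.pending, [], None
        return batch


# Two messages never fill a batch of 10, so they only leave on the timeout.
buf = BatchBuffer(batch_size=10, batch_timeout_s=1.0)
buf.add("line-1", now=0.0)
buf.add("line-2", now=0.1)
early = buf.tick(now=0.5)   # timeout not reached yet
late = buf.tick(now=1.0)    # timeout reached, batch is released
```

With a low test volume, every message behaves like `line-1` and `line-2` here: it sits in the buffer until the periodic check fires after the timeout.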
Simon
On Thu, 16 May 2019 at 10:04,
<[email protected]<mailto:[email protected]>> wrote:
Hello Michael,
So, using curl and the API, I’ve been able to collect some statistics.
Currently, it is a test platform with nearly no activity. I’ve set up a basic
parser, with the following topology:
- 6 ackers (I have 6 Kafka partitions per topic)
- Spout parallelism = 6
- Spout # of tasks = 6
- Parser parallelism = 24
- Parser # of tasks = 24
I inject one line of logs with NiFi into my sensor topic. As a reminder, it takes
roughly 10 s to become visible on the enrichments topic. Here are some statistics:
"spouts": [
  {
    "emitted": 1160,
    "spoutId": "kafkaSpout",
    "requestedMemOnHeap": 128,
    "errorTime": null,
    "tasks": 6,
    "errorHost": "",
    "failed": 0,
    "completeLatency": "3963.078",
    "executors": 6,
    "encodedSpoutId": "kafkaSpout",
    "transferred": 1160,
    "errorPort": "",
    "requestedMemOffHeap": 0,
    "errorLapsedSecs": null,
    "acked": 1020,
    "requestedCpu": 10,
    "lastError": "",
    "errorWorkerLogLink": ""
  }
This completeLatency looks very high, doesn’t it?
And for the bolt:
{
  "emitted": 0,
  "requestedMemOnHeap": 128,
  "errorTime": null,
  "tasks": 12,
  "errorHost": "",
  "failed": 0,
  "boltId": "parserBolt",
  "executors": 12,
  "processLatency": "832.962",
  "executeLatency": "1.391",
  "transferred": 0,
  "errorPort": "",
  "requestedMemOffHeap": 0,
  "errorLapsedSecs": null,
  "acked": 3680,
  "requestedCpu": 10,
  "encodedBoltId": "parserBolt",
  "lastError": "",
  "executed": 3680,
  "capacity": "0.003",
  "errorWorkerLogLink": ""
}
So, my understanding is that it takes a long time to ack tuples, but I don’t
know where to go from here. As mentioned below, I’ve tried the tweaks described here:
https://github.com/apache/storm/blob/master/docs/Performance.md but saw no change.
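
As a rough back-of-the-envelope check, the figures quoted above can be compared directly. The sketch below assumes that all three latencies are in milliseconds and that the gap between processLatency and executeLatency approximates the time a tuple waits inside the bolt before it is acked. That reading is an interpretation of the numbers, not something Storm reports as such.

```python
# Figures copied from the statistics quoted above (assumed to be milliseconds).
spout_complete_latency_ms = 3963.078
bolt_process_latency_ms = 832.962
bolt_execute_latency_ms = 1.391

# Approximate time a tuple spends in the bolt outside of execute() before the ack.
wait_before_ack_ms = bolt_process_latency_ms - bolt_execute_latency_ms
wait_share = wait_before_ack_ms / bolt_process_latency_ms

print(f"execute(): {bolt_execute_latency_ms:.3f} ms")
print(f"waiting before ack: {wait_before_ack_ms:.3f} ms "
      f"({wait_share:.1%} of process latency)")
```

Under these assumptions, almost all of the bolt's process latency is waiting rather than parsing work, which would be consistent with tuples being held until a batch is flushed.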
It looks as if we are filling a bucket and sending the data after a given
timeout if the bucket is not full. But I don’t see any timeout of around
10 or 20 seconds in the Storm configuration.
As a reminder, I’ve Kerberos enabled on my platform, but everything seems to
work fine except Metron ingestion.
Thanks for your help,
Stéphane
From: Michael Miklavcic
[mailto:[email protected]<mailto:[email protected]>]
Sent: Wednesday, May 15, 2019 16:03
To: [email protected]<mailto:[email protected]>
Subject: Re: Very low throuput on topologies
You could use curl from the cli. But if this is something you're testing out on
your local machine, I'd probably start without Kerberos enabled and work the
perf knobs there first. You should be able to see the "complete latency" from
the Storm UI on each running topology.
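
For reference, once a topology's statistics have been saved from the Storm UI REST API with curl (typically the `/api/v1/topology/<topology-id>` endpoint), the latency figures can be pulled out with a few lines of Python. This is only a sketch: the helper name `latency_summary` is hypothetical, and it assumes the document has top-level `spouts` and `bolts` arrays with latencies encoded as strings.

```python
import json


def latency_summary(topology_json: str) -> dict:
    """Map each spout/bolt id to its latency figures (hypothetical helper)."""
    stats = json.loads(topology_json)
    summary = {}
    for spout in stats.get("spouts", []):
        summary[spout["spoutId"]] = {
            "completeLatency": float(spout["completeLatency"]),
        }
    for bolt in stats.get("bolts", []):
        summary[bolt["boltId"]] = {
            "processLatency": float(bolt["processLatency"]),
            "executeLatency": float(bolt["executeLatency"]),
        }
    return summary


# Made-up sample document, only to show the expected shape of the input.
sample = json.dumps({
    "spouts": [{"spoutId": "exampleSpout", "completeLatency": "12.5"}],
    "bolts": [{"boltId": "exampleBolt",
               "processLatency": "3.0", "executeLatency": "0.5"}],
})
summary = latency_summary(sample)
```

On a Kerberized cluster the HTTP request itself needs SPNEGO authentication, so this sketch deliberately works on JSON that has already been fetched rather than making the call.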
On Wed, May 15, 2019 at 1:33 AM
<[email protected]<mailto:[email protected]>> wrote:
Hello Nick,
Thanks for your answer. However, the problem already occurs before
indexing, at the parser level. It takes a long time to go from the sensor topic
to the “enrichments” topic, and again many seconds to go from the “enrichments”
topic to the “indexing” topic.
I’ve tried the recommendations described here:
https://github.com/apache/storm/blob/master/docs/Performance.md but saw no change.
The problem with Kerberos is that the Storm UI can no longer be accessed
without some tweaks that are blocked by the administrator on my computer.
From: Nick Allen [mailto:[email protected]<mailto:[email protected]>]
Sent: Tuesday, May 14, 2019 23:39
To: [email protected]<mailto:[email protected]>
Subject: Re: Very low throuput on topologies
Have you increased the indexing "batch_size"? That is the first knob to start
tuning.
https://github.com/apache/metron/tree/master/metron-platform/metron-indexing#sensor-indexing-configuration
On Tue, May 14, 2019 at 10:26 AM
<[email protected]<mailto:[email protected]>> wrote:
Hello happy Metron users,
I have a Metron cluster based on Hortonworks CP, and I’ve set up Kerberos on
top of it, as you all probably have done since we deal with security ☺
Everything seems to be working fine (Kerberos, Ranger, ...), but I’m facing an
issue with the overall throughput.
I feed my cluster with NiFi. Here is what I do:
Test 1:
- Send 2 lines of logs to the Kafka sensor topic with NiFi
- Use the Kafka CLI consumer to read messages from the sensor topic: the
response is immediate
- Use the Kafka CLI consumer to read messages from the enrichment topic:
messages arrive after nearly 20 s
Test 2:
- Send 200 lines of logs to the Kafka sensor topic with NiFi
- Use the Kafka CLI consumer to read messages from the sensor topic: the
response is immediate
- Use the Kafka CLI consumer to read messages from the enrichment topic: some
messages arrive immediately, but they seem to come in groups of about 10, with
many seconds between each group
It’s probably related to the Storm configuration, but I don’t know where to go
from here. I’ve tried changing various parameters such as
topology.max.spout.pending (currently set to 500), but saw no improvement.
Thanks for your help
Stéphane
_________________________________________________________________________________________________________________________
Ce message et ses pieces jointes peuvent contenir des informations
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages
electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou
falsifie. Merci.
This message and its attachments may contain confidential or privileged
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete
this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been
modified, changed or falsified.
Thank you.
--
--
simon elliston ball
@sireb