TL;DR: How to map external request IDs to Spark job IDs for Spark instrumentation
<https://stackoverflow.com/questions/74794579/how-to-map-external-request-ids-to-spark-job-ids-for-spark-instrumentation>

Hi team,

We are the engineering team at Mindtickle Inc., and we have a use case where
we want to store a map of request IDs (unique API call IDs) to Spark job
IDs. Architecturally, we have built a system where our users work with
various analytics tools on the frontend, which in turn run Spark jobs
internally and serve the computed data back to them. We receive API calls
from upstream and serve them via Apache Spark computation on the backend.

However, as our customer base has grown, we now receive many parallel
requests, and we have observed that Spark jobs take varying amounts of time
for the same upstream API request. For Spark instrumentation purposes, we
therefore wish to maintain a map from the request ID generated at our end
to the job IDs that Spark internally generates for that request. This would
let us go back in time via the history server or custom SparkListeners to
debug and improve our system. Any leads in this direction would be greatly
appreciated, and I would be happy to explain our use case in more detail if
needed.
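As a rough sketch of what we have in mind (assuming Spark's SparkContext.setJobGroup / SparkListener APIs; names like RequestIdJobListener and serve are our own placeholders, not existing code): tag each job with the request ID via a job group before triggering the computation, then record the mapping in a listener by reading the "spark.jobGroup.id" property from SparkListenerJobStart:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}
import scala.collection.mutable

// Records which Spark job IDs were triggered under which request ID.
// The request ID is propagated via the job-group local property.
class RequestIdJobListener extends SparkListener {
  val requestToJobIds = mutable.Map.empty[String, List[Int]]

  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    // "spark.jobGroup.id" is set by SparkContext.setJobGroup on the
    // submitting thread and carried into the job's local properties.
    val requestId = Option(jobStart.properties)
      .flatMap(p => Option(p.getProperty("spark.jobGroup.id")))
    requestId.foreach { id =>
      requestToJobIds.synchronized {
        requestToJobIds(id) = jobStart.jobId :: requestToJobIds.getOrElse(id, Nil)
      }
    }
  }
}

// Per incoming API request (requestId generated by our service):
def serve(sc: SparkContext, requestId: String): Unit = {
  sc.setJobGroup(requestId, s"API request $requestId")
  try {
    // ... run the Spark computation for this request ...
  } finally {
    sc.clearJobGroup()
  }
}
```

The listener would be registered once with sc.addSparkListener(new RequestIdJobListener). A nice side effect is that the job-group ID also appears in the Spark UI and in the event logs, so jobs should be correlatable to requests in the history server after the fact as well. Does this sound like the right direction, or is there a more idiomatic mechanism for this?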

Thanks and Regards,
Dhruv Toshniwal
SDE-2
Mindtickle Inc.
