SparkContext has a setLocalProperty feature: set your request ID as a local
property on the thread that submits the work, and a SparkListener can then
read that ID together with the Spark job ID in the onJobStart event.
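
Here is a minimal sketch in Scala. The property key "request.id", the
listener name, and the println are illustrative placeholders, not a fixed
convention:

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

// Maps a request ID (set as a local property) to the Spark job IDs it triggers.
class RequestIdListener extends SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    // Local properties from the submitting thread arrive in jobStart.properties.
    Option(jobStart.properties.getProperty("request.id")).foreach { id =>
      println(s"requestId=$id -> jobId=${jobStart.jobId}")
    }
  }
}

// Register the listener once, then tag each request's thread before
// running any Spark actions for that request:
//   sc.addSparkListener(new RequestIdListener())
//   sc.setLocalProperty("request.id", requestId)

One caveat: if your request-handling threads come from a pool, clear the
property when you are done (sc.setLocalProperty("request.id", null)) so a
reused thread does not carry a stale request ID.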

Hope this helps.

On Tue, 27 Dec 2022, 13:04 Dhruv Toshniwal,
<dhruv.toshni...@mindtickle.com.invalid> wrote:

> TL;Dr -
> how-to-map-external-request-ids-to-spark-job-ids-for-spark-instrumentation
> <https://stackoverflow.com/questions/74794579/how-to-map-external-request-ids-to-spark-job-ids-for-spark-instrumentation>
>
> Hi team,
>
> We are the engineering team of Mindtickle Inc., and we have a use case
> where we want to store a map of request IDs (unique API call IDs) to
> Spark job IDs. Architecturally, we have built a system where our users
> use various analytics tools on the frontend, which in turn run Spark
> jobs internally and then serve computed data back to them. We receive
> various API calls from upstream and serve them via Apache Spark
> computation on the backend.
> However, as our customer base has grown, we now receive many parallel
> requests, and we have observed that Spark jobs take different amounts
> of time for the same API requests from upstream. Therefore, for Spark
> instrumentation purposes, we wish to maintain a map from the request
> IDs generated at our end to the job IDs that Spark internally generates
> for those requests. This will enable us to go back in time via the
> history server or custom SparkListeners to debug and improve our
> system. Any leads in this direction would be greatly appreciated. I
> would be happy to explain our use case in greater detail if required.
>
> Thanks and Regards,
> Dhruv Toshniwal
> SDE-2
> Mindtickle Inc.
>