All you need to do is implement a method readJson that reads a single
file given its path. Then you map the values of the file_path column to
the respective JSON content as a string. This can be done via a UDF or
simply Dataset.map:
case class RowWithJsonUri(entity_id: String, file_path: String,
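A minimal sketch of that approach. The original case class definition is truncated, so the fields here (entity_id, file_path) and the result class RowWithJson are assumptions for illustration; readJson runs on the executors, so file_path must be reachable from them (e.g. a shared or distributed file system):

```scala
import java.nio.file.{Files, Paths}
import org.apache.spark.sql.SparkSession

// Assumed schema: an id plus the path of the JSON file to read per row.
case class RowWithJsonUri(entity_id: String, file_path: String)
// Assumed result type: the same id plus the file content as a string.
case class RowWithJson(entity_id: String, json: String)

object ReadJsonPerRow {
  // Reads one file and returns its content as a UTF-8 string.
  def readJson(path: String): String =
    new String(Files.readAllBytes(Paths.get(path)), "UTF-8")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // local mode only for this sketch
      .appName("read-json-per-row")
      .getOrCreate()
    import spark.implicits._

    val rows = Seq(RowWithJsonUri("e1", "/tmp/e1.json")).toDS()

    // Map each row's file_path to the JSON content of that file.
    val withJson = rows.map(r => RowWithJson(r.entity_id, readJson(r.file_path)))
    withJson.show(truncate = false)

    spark.stop()
  }
}
```

The same mapping could be done with a UDF in DataFrame code, but Dataset.map keeps it typed and avoids registering anything.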
Hi,
please see Sean's answer, and please read about parallelism in Spark.
Regards,
Gourav Sengupta
On Mon, Jul 11, 2022 at 10:12 AM Tufan Rakshit wrote:
> So on average, for every 4 cores you get back 3.6 cores in YARN, but you
> can use only 3.
> In Kubernetes you get back 3.6 and can also use 3.6.
So on average, for every 4 cores you get back 3.6 cores in YARN, but you can
use only 3.
In Kubernetes you get back 3.6 and can also use 3.6.
Best
Tufan
On Mon, 11 Jul 2022 at 11:02, Yong Walt wrote:
> We were using YARN. Thanks.
>
> On Sun, Jul 10, 2022 at 9:02 PM Tufan Rakshit wrote:
>
>> Mainly depends what your cluster manager is: YARN or Kubernetes?
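The practical difference shows up in the executor settings: YARN vCores are whole numbers, while a Kubernetes pod CPU request may be fractional (millicores). A sketch of the relevant configuration, with the 3.6 figure taken from the message above:

```properties
# YARN: executor cores must be a whole number of vCores
spark.executor.cores=3

# Kubernetes: the pod CPU request can be fractional (3.6 = 3600m millicores)
spark.executor.cores=3
spark.kubernetes.executor.request.cores=3.6
```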
We were using YARN. Thanks.
On Sun, Jul 10, 2022 at 9:02 PM Tufan Rakshit wrote:
> Mainly depends what your cluster manager is: YARN or Kubernetes?
> Best
> Tufan
>
> On Sun, 10 Jul 2022 at 14:38, Sean Owen wrote:
>
>> Jobs consist of tasks, each of which consumes a core (can be set to >1
>> too,
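The "(can be set to >1 too" aside refers to spark.task.cpus. A minimal sketch of the settings involved, assuming 4 cores per executor:

```properties
# Each task consumes spark.task.cpus cores (default 1).
spark.task.cpus=1
# With 4 cores per executor, up to 4 such tasks run concurrently per executor.
spark.executor.cores=4
```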