Re: API Problem

2022-06-10 Thread Enrico Minack
Hi, This adds a column with value "1" (string) *in all rows*: df = df.withColumn("uniqueID", lit("1")) This counts the rows for all rows that have the same uniqueID, *which are all rows*. The window does not make much sense. And it orders all rows that have the same uniqueID by uniqu
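A small plain-Python stand-in (not Spark code) may make the point concrete: when every row carries the same key, a count partitioned by that key is just the total row count, repeated on every row. The sample rows and the helper name count_per_key are made up for illustration.

```python
from collections import Counter

def count_per_key(rows, key):
    """Mimic a count over a window partitioned by `key`:
    attach to each row the number of rows sharing its key value."""
    sizes = Counter(row[key] for row in rows)
    return [{**row, "count": sizes[row[key]]} for row in rows]

rows = [{"A": "x"}, {"A": "y"}, {"A": "z"}]  # hypothetical sample data
# Same effect as withColumn("uniqueID", lit("1")): one constant key for all rows.
rows = [{**row, "uniqueID": "1"} for row in rows]
counted = count_per_key(rows, "uniqueID")
print([r["count"] for r in counted])  # every row reports the total: [3, 3, 3]
```

Since the partition key never varies, the count carries no per-row information, which is why the window adds little here.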

Re: API Problem

2022-06-10 Thread Sid
Hi Enrico, Thanks for your time. Much appreciated. I am expecting the payload to be a JSON string representing one record, like the one below: {"A":"some_value","B":"some_value"} Where A and B are the columns in my dataset. On Fri, Jun 10, 2022 at 6:09 PM Enrico Minack wrote: > Sid, > > just recognized yo
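For reference, the expected shape of one payload can be sketched in plain Python. This is only an illustration of the target string, assuming a record is available as a dict; the helper name record_to_payload and the extra column C are hypothetical. In PySpark, to_json(struct(...)) produces this kind of string for each row.

```python
import json

def record_to_payload(record, columns):
    """Serialize the selected columns of one record as a compact JSON object string."""
    return json.dumps({name: record[name] for name in columns}, separators=(",", ":"))

# Hypothetical record; only columns A and B should reach the API.
record = {"A": "some_value", "B": "some_value", "C": "ignored"}
payload = record_to_payload(record, ["A", "B"])
print(payload)  # {"A":"some_value","B":"some_value"}
```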

Re: API Problem

2022-06-10 Thread Enrico Minack
Sid, I just realized you are using the Python API here. Then struct(*colsListToBePassed) should be correct, given it takes a list of strings. Your method call_to_cust_bulk_api takes argument payload, which is a Column. This is then used in custRequestBody. That is pretty strange
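To illustrate the distinction, here is a minimal sketch of a per-record version of the function that works on a plain JSON string rather than a Column. The real body of call_to_cust_bulk_api is not shown in the thread, so everything inside it is an assumption; send is a hypothetical callable injected so the sketch runs without a network, and the URL is a placeholder. In PySpark, one way to get this per-row behaviour is to wrap such a function with pyspark.sql.functions.udf, so that Spark calls it with each row's string value.

```python
import json

def call_to_cust_bulk_api(url, payload, send):
    """Post one record. `payload` is a plain JSON string here, not a Column.

    `send` is a hypothetical callable (url, body) -> HTTP status code.
    """
    body = json.loads(payload)  # needs a str; a Column object would raise here
    status = send(url, body)
    return "success" if status == 200 else "failed"

def fake_send(url, body):
    # Stand-in for a real HTTP POST: accepts any record that has key "A".
    return 200 if "A" in body else 400

policy_url = "https://example.invalid/policy"  # placeholder, not a real endpoint
ok = call_to_cust_bulk_api(policy_url, '{"A": "x", "B": "y"}', fake_send)
bad = call_to_cust_bulk_api(policy_url, '{"B": "y"}', fake_send)
print(ok, bad)  # success failed
```

Injecting the sender keeps the per-record logic testable on its own before it is wired into a DataFrame.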

Re:

2022-06-10 Thread Aironman DirtDiver
Generally it is never a good idea to run processes as root on any production machine. The main concern is vulnerabilities that have not yet been found or disclosed, so if someone malicious takes advantage of a vulnerability like the ones described below, they can first get in, and little by little escalate pri

Re: API Problem

2022-06-10 Thread Enrico Minack
Hi Sid, finalDF = finalDF.repartition(finalDF.rdd.getNumPartitions()).withColumn("status_for_batch", call_to_cust_bulk_api(policyUrl, to_json(struct(*colsListToBePassed)))) You are calling withColumn with the result of call_to_cust_bulk_api as the second argument. That result

Re: API Problem

2022-06-10 Thread Sid
Hi Stelios, Thank you so much for your help. If I use lit, it fails with a "Column is not iterable" error. Can you suggest a simple way of achieving my use case? I need to send the entire column record by record to the API in JSON format. TIA, Sid On Fri, Jun 10, 2022 at 2:51 PM Stelios Philippou w

[no subject]

2022-06-10 Thread Rodrigo
Hi Everyone, My Security team has raised concerns about the requirement for root group membership for Spark running on Kubernetes. Does anyone know the reasons for that requirement, how insecure it is, and any alternatives if at all? Thanks, Rodrigo

Re: API Problem

2022-06-10 Thread Stelios Philippou
Sid, then the issue is in the way you are creating the data for that specific column: call_to_cust_bulk_api(policyUrl,to_json(struct(*colsListToBePassed))) Perhaps wrap that in a lit(call_to_cust_bulk_api(policyUrl,to_json(struct(*colsListToBePassed)))), else you will need to start sendin

Re: API Problem

2022-06-10 Thread Sid
Still, it is giving the same error. On Fri, Jun 10, 2022 at 5:13 AM Sean Owen wrote: > That repartition seems to do nothing? But yes the key point is use col() > > On Thu, Jun 9, 2022, 9:41 PM Stelios Philippou wrote: > >> Perhaps >> >> >> finalDF.repartition(finalDF.rdd.getNumPartitions()).wi