Hi Anjana,
I used this code before to get some data from an API call and store it in
BigQuery using Apache Beam:
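import apache_beam as beam
import requests

def get_api_data(data):
    # One GET per input element; the element itself is not used.
    # expected: {u'USD': 210.76, u'BTC': 0.03273, u'EUR': 184.02}
    data_every_sec = requests.get(
        "https://min-api.cryptocompare.com/data/price?fsym=ETH&tsyms=BTC,USD,EUR").json()
    return [data_every_sec]

def parse_btc(btc_item):
    # Pull the three prices out of the response dictionary.
    usd, btc, eur = btc_item['USD'], btc_item['BTC'], btc_item['EUR']
    return [(btc, usd, eur)]

# Assumption: p is an existing PCollection with at least one element
# (each element triggers one API call).
dayData = (p
           | 'get data' >> beam.ParDo(get_api_data)
           | 'parse btc' >> beam.ParDo(parse_btc)
           | 'Write' >> beam.io.WriteToBigQuery(...)
           )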
Hope this helps.
On Tue, Jun 4, 2019 at 8:01 PM Anjana Pydi <[email protected]>
wrote:
> Hi Ankur,
>
> Thanks for the suggestion.
>
> Could you please point me to any examples you know of that are close to
> this use case?
>
> Regards,
> Anjana
> ------------------------------
> *From:* Ankur Goenka [[email protected]]
> *Sent:* Monday, June 03, 2019 4:27 PM
> *To:* [email protected]
> *Subject:* [Sender Auth Failure] Re: How to build a beam python pipeline
> which does GET/POST request to API's
>
> Looking at your use case, the whole processing logic seems to be very
> custom.
> I would recommend using ParDos to express it. If the processing for an
> individual dictionary is expensive, you can potentially use a reshuffle
> operation to distribute the dictionary updates over multiple workers.
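
A minimal sketch of where a reshuffle could sit, assuming the API response
has already been fetched. The helper names (split_response,
update_dictionary, build_pipeline) and the "processed" key are hypothetical
and only stand in for the real logic. Without the reshuffle, a runner may
fuse the split and update steps so that every dictionary from one response
is processed on the same worker.

```python
def split_response(response):
    """Fan one API response (a dict or a list of dicts) out into dicts."""
    if isinstance(response, dict):
        return [response]
    return list(response)


def update_dictionary(record):
    """Stand-in for the expensive per-dictionary processing (hypothetical)."""
    updated = dict(record)  # copy so the input element is not mutated
    updated["processed"] = True
    return updated


def build_pipeline(pipeline, responses):
    # Imported here so the helpers above stay usable without Beam installed.
    import apache_beam as beam

    return (pipeline
            | "Responses" >> beam.Create(responses)
            | "Split" >> beam.FlatMap(split_response)
            # Breaks fusion so the dictionaries can be spread across workers.
            | "Redistribute" >> beam.Reshuffle()
            | "Update" >> beam.Map(update_dictionary))
```

build_pipeline would be called inside a `with beam.Pipeline() as p:` block,
passing p and the list of responses.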
>
> Note: As you are going to make the write API calls yourself, your
> transform can be executed multiple times in case of worker failure.
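
One possible way to make repeated executions harmless, sketched under the
assumption that the receiving API can deduplicate on a client-supplied key
(the thread does not say whether it can). The function names and the
"idempotency_key" / "payload" field names are hypothetical. The key is
derived only from the record content, so a retried element produces the
same key as the first attempt.

```python
import hashlib
import json


def idempotency_key(record):
    """Deterministic key: identical content always yields the same key."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def with_idempotency_key(record):
    """Wrap the record with its key before it is posted (hypothetical shape)."""
    return {"idempotency_key": idempotency_key(record), "payload": record}
```

In a pipeline this would be applied with beam.Map(with_idempotency_key) just
before the step that posts to the API.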
>
> On Mon, Jun 3, 2019 at 11:41 AM Anjana Pydi <[email protected]>
> wrote:
>
>> Hi Ankur,
>>
>> Thanks for the reply. Please find my responses inline in the mail below.
>>
>> Thanks,
>> Anjana
>> ------------------------------
>> *From:* Ankur Goenka [[email protected]]
>> *Sent:* Monday, June 03, 2019 11:01 AM
>> *To:* [email protected]
>> *Subject:* Re: How to build a beam python pipeline which does GET/POST
>> request to API's
>>
>> Thanks for providing more information.
>>
>> Some follow up questions/comments
>> 1. Call an API which would provide a dictionary as response.
>> Question: Do you need to make multiple of these API calls? If yes, what
>> distinguishes call 1 from call 2? If it's the input to the API, can you
>> provide the inputs in a file, etc.? What I am trying to identify is an
>> input source for the pipeline so that Beam can distribute the work.
>> Answer : When an API call is made, it can provide a list of dictionaries
>> as its response. We have to go through every dictionary, apply the same
>> transformations to each, and send it.
>> 2. Transform dictionary to add / remove few keys.
>> 3. Send transformed dictionary as JSON to an API which prints this JSON
>> as output.
>> Question: Are these write operations idempotent? As you are doing your own
>> API calls, it's possible that after a failure the calls are made again for
>> the same input. If the write calls are not idempotent, there can be
>> duplicate data.
>> Answer : Suppose I receive a list of 1000 dictionaries as the response
>> when I call the API in point 1. I should then do exactly 1000 write
>> operations, one for each input. If there is a failure for any input, only
>> that input should not be posted; the remaining ones should be posted
>> successfully.
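
A rough sketch of isolating per-input failures so that one bad dictionary
does not stop the rest from being posted. try_post, route, build_pipeline
and the post_fn parameter are hypothetical names; post_fn stands for
whatever function performs the actual POST. Failed records are routed to a
separate tagged output instead of raising, which would otherwise cause the
runner to retry the whole bundle.

```python
def try_post(record, post_fn):
    """Call post_fn(record); return ('posted', record) or ('failed', record)."""
    try:
        post_fn(record)
    except Exception:  # deliberately broad: any error marks only this record
        return ("failed", record)
    return ("posted", record)


def build_pipeline(pipeline, records, post_fn):
    # Imported here so try_post stays usable without Beam installed.
    import apache_beam as beam

    def route(record):
        status, rec = try_post(record, post_fn)
        if status == "failed":
            # Side output for records that could not be posted.
            yield beam.pvalue.TaggedOutput("failed", rec)
        else:
            yield rec

    results = (pipeline
               | "Records" >> beam.Create(records)
               | "Post" >> beam.FlatMap(route).with_outputs(
                   "failed", main="posted"))
    return results.posted, results.failed
```

The failed output can then be logged or written somewhere for later
inspection. Note that this does not by itself prevent duplicate posts if a
worker fails after a successful call.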
>>
>> On Sat, Jun 1, 2019 at 8:13 PM Anjana Pydi <[email protected]>
>> wrote:
>>
>>> Hi Ankur,
>>>
>>> Thanks for the reply! Below are more details of the use case:
>>>
>>> 1. Call an API which would provide a dictionary as response.
>>> 2. Transform dictionary to add / remove few keys.
>>> 3. Send transformed dictionary as JSON to an API which prints this JSON
>>> as output.
>>>
>>> Please let me know in case of any clarifications.
>>>
>>> Thanks,
>>> Anjana
>>> ------------------------------
>>> *From:* Ankur Goenka [[email protected]]
>>> *Sent:* Saturday, June 01, 2019 6:47 PM
>>> *To:* [email protected]
>>> *Subject:* Re: How to build a beam python pipeline which does GET/POST
>>> request to API's
>>>
>>> Hi Anjana,
>>>
>>> You can write your API logic in a ParDo and subsequently pass the
>>> elements to other ParDos to transform them and eventually make an API
>>> call to another endpoint.
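
The layout described above can be sketched roughly as follows. This is only
an illustration: both URLs, the "internal_id" and "source" keys, and all
function names are hypothetical, and the standard library's urllib is used
here in place of any particular HTTP client.

```python
import json
import urllib.request

SOURCE_URL = "https://example.com/api/items"   # hypothetical endpoint
SINK_URL = "https://example.com/api/ingest"    # hypothetical endpoint


def fetch_records(url):
    """GET the URL and yield each dictionary found in the JSON response."""
    with urllib.request.urlopen(url) as resp:
        payload = json.load(resp)
    # Accept either a single dictionary or a list of dictionaries.
    records = payload if isinstance(payload, list) else [payload]
    for record in records:
        yield record


def transform_record(record):
    """Return a copy with one key removed and one key added (illustrative)."""
    out = {k: v for k, v in record.items() if k != "internal_id"}
    out["source"] = "beam-pipeline"
    return out


def post_record(record, url=SINK_URL):
    """POST the record as JSON and return the HTTP status code."""
    body = json.dumps(record).encode("utf-8")
    req = urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status


def build_pipeline(pipeline):
    # Imported here so the helpers above stay usable without Beam installed.
    import apache_beam as beam

    return (pipeline
            | "Seed" >> beam.Create([SOURCE_URL])   # one element per GET call
            | "Fetch" >> beam.FlatMap(fetch_records)
            | "Transform" >> beam.Map(transform_record)
            | "Post" >> beam.Map(post_record))
```

The seed step gives the pipeline an explicit input (here a single URL), so
adding more URLs to that list is one way to give Beam more work to
distribute.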
>>>
>>> However, this might not be a good fit for Beam, as the input is not well
>>> defined and hence scaling and "once processing" of elements will not be
>>> possible.
>>>
>>> It would help to elaborate a bit more on the use case for better
>>> suggestions.
>>>
>>> Thanks,
>>> Ankur
>>>
>>> On Sat, Jun 1, 2019 at 5:50 PM Anjana Pydi <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a requirement to create an Apache Beam Python pipeline to read
>>>> JSON from an API endpoint, transform it (add/remove a few fields), and
>>>> send the transformed JSON to another API endpoint.
>>>>
>>>> Can anyone please provide some suggestions on how to do it?
>>>>
>>>> Thanks,
>>>> Anjana
>>>> -----------------------------------------------------------------------------------------------------------------------
>>>> The information contained in this communication is intended solely for the
>>>> use of the individual or entity to whom it is addressed and others
>>>> authorized to receive it. It may contain confidential or legally privileged
>>>> information. If you are not the intended recipient you are hereby notified
>>>> that any disclosure, copying, distribution or taking any action in reliance
>>>> on the contents of this information is strictly prohibited and may be
>>>> unlawful. If you are not the intended recipient, please notify us
>>>> immediately by responding to this email and then delete it from your
>>>> system. Bahwan Cybertek is neither liable for the proper and complete
>>>> transmission of the information contained in this communication nor for any
>>>> delay in its receipt.
>>>>
--
Soliman ElSaber
Data Engineer
www.mindvalley.com