Thanks!!!

On Mon, Jun 18, 2018 at 4:41 PM, Chamikara Jayalath <chamik...@google.com> wrote:
> A ParDo should always return an iterable, not a string. So if you want to
> output a single string it should either be "return [str]" or "yield str".
>
> On Mon, Jun 18, 2018 at 1:39 PM OrielResearch Eila Arich-Landkof <e...@orielresearch.org> wrote:
>
>> Thanks for the response.
>> I tried this within the current ParDo, CreateColForSampleFn. Apache Beam
>> returns a warning recommending not to return a string.
>>
>> So, my questions are:
>> - Is it essential to separate this transformation into a different ParDo?
>> - Should I ignore that message? When is this message relevant?
>>
>> Many thanks,
>> Eila
>>
>> On Mon, Jun 18, 2018 at 2:52 PM Lukasz Cwik <lc...@google.com> wrote:
>>
>>> User is the correct mailing list.
>>>
>>> beam.io.WriteToText takes strings, which means that you have to format
>>> the whole line yourself. You'll want to apply another ParDo
>>> after CreateColForSampleFn which takes the 1x164 record and concatenates
>>> each value with ',' in between.
>>>
>>> On Mon, Jun 18, 2018 at 9:00 AM OrielResearch Eila Arich-Landkof <e...@orielresearch.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> Is anyone listening on the user@ mailing list? Or should I use a
>>>> different mailing list?
>>>>
>>>> I have made some progress:
>>>> - The ParDo returns a list now
>>>> - Added a header to WriteToText
>>>>
>>>> The pipeline looks like this:
>>>> ExploreData = (p | "Extract the rows from dataframe" >> beam.io.Read(
>>>>     beam.io.BigQuerySource('archs4.Debug_annotation'))
>>>>   | "create more columns" >> beam.ParDo(
>>>>     CreateColForSampleFn(colListSubset,outputPath)))
>>>>
>>>> (ExploreData | 'writing to CSV files' >> beam.io.WriteToText(
>>>>     'gs://dataExploration.txt',file_name_suffix='.csv',num_shards=1,
>>>>     append_trailing_newlines=True,header=colListStr))
>>>>
>>>> The remaining issue is that the output has a new line after each value:
>>>>
>>>> *None
>>>> None
>>>> None
>>>> None
>>>> None
>>>> 30
>>>> Primary Tissue
>>>> None
>>>> None
>>>> None*
>>>>
>>>> Please let me know how I can get rid of these new lines. I hope to be
>>>> able to open the output file with Google Sheets.
>>>>
>>>> Thanks,
>>>> Eila
>>>>
>>>> On Fri, Jun 15, 2018 at 2:45 PM, OrielResearch Eila Arich-Landkof <e...@orielresearch.org> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am running a pipeline where a table from BQ is processed line
>>>>> by line using a ParDo function.
>>>>> CreateColForSampleFn generates a data frame, with headers and values
>>>>> (shape: 1x164), that I want to pass to WriteToText.
>>>>> See the following:
>>>>>
>>>>> ExploreData = (p | "Extract the rows from dataframe" >> beam.io.Read(
>>>>>     beam.io.BigQuerySource('archs4.Debug_annotation'))
>>>>>   | "create more columns" >> beam.ParDo(
>>>>>     CreateColForSampleFn(colListSubset,outputPath)))
>>>>>
>>>>> (ExploreData | 'writing to CSV files' >> beam.io.WriteToText(
>>>>>     'gs://dataExploration.txt',num_shards=1))
>>>>>
>>>>> My questions are related to the returned DF and WriteToText:
>>>>> 1. When I pass the DF from CreateColForSampleFn to WriteToText, I
>>>>> get only the headers:
>>>>>
>>>>> Sample_contact_phone
>>>>> Sample_extract_protocol_ch1
>>>>> Sample_platform_id
>>>>> Sick
>>>>> Sample_title
>>>>> index
>>>>> Sample_last_update_date
>>>>> Sample_contact_country
>>>>> Sample_channel_count
>>>>> Sample_library_source
>>>>> Sample_taxid_ch1
>>>>>
>>>>> 2. When I return the df in a list [df], I get the following text for
>>>>> each row (including the dimensions):
>>>>>
>>>>> Sample_contact_phone Sample_extract_protocol_ch1
>>>>> Sample_platform_id Sick
>>>>>
>>>>> 0 Library construction protocol: Four µg of
>>>>> tota... GPL11154 None
>>>>>
>>>>> [1 rows x 168 columns]
>>>>>
>>>>> I want to generate a text file that includes:
>>>>> - One header (if needed, I will add it after the pipeline completes)
>>>>> - All the values from each row that was processed and generated a DF
>>>>> - Full cell values, without ... in the middle
>>>>>
>>>>> What am I missing? Any advice?
>>>>>
>>>>> Thanks,
>>>>> --
>>>>> Eila
>>>>> www.orielresearch.org
>>>>> https://www.meetup.com/Deep-Learning-In-Production/

--
Eila
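
For anyone finding this thread later: the two suggestions above (emit an iterable from the ParDo, and format each 1xN record into one comma-separated string before WriteToText) could look roughly like the sketch below. It is only an illustration under some assumptions. The names to_csv_line and FormatAsCsvLineFn are hypothetical and do not come from the thread, the element is assumed to be either a one-row pandas DataFrame or a plain sequence of values, and Python's csv module is swapped in for a plain ','.join so that values which themselves contain commas get quoted.

```python
import csv
import io

try:
    import apache_beam as beam
    _DoFnBase = beam.DoFn
except ImportError:
    # Fallback so the formatting logic can be tried without Beam installed.
    _DoFnBase = object


def to_csv_line(values):
    """Join one record's values into a single CSV line, no trailing newline.

    csv.writer writes None as an empty field and quotes values that
    contain commas or quotes.
    """
    buf = io.StringIO()
    csv.writer(buf, lineterminator="").writerow(values)
    return buf.getvalue()


class FormatAsCsvLineFn(_DoFnBase):
    """Hypothetical DoFn that turns one record into one output string."""

    def process(self, record):
        # Assumption: record is a one-row DataFrame or a sequence of values.
        if hasattr(record, "iloc"):
            values = record.iloc[0].tolist()
        else:
            values = list(record)
        # Yield the string so the ParDo output is an iterable holding one
        # element, instead of returning a bare string.
        yield to_csv_line(values)
```

In the pipeline from the thread, this would presumably sit between the two existing steps, for example ExploreData | beam.ParDo(FormatAsCsvLineFn()) followed by the WriteToText step, so that each record ends up on a single line of the output file.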