Re: How reading works?

2022-07-13 Thread Sid
files. >> The one that I know about is Parquet, as this link explains: Spark: Understand the Basic of Pushed Filter and Partition Filter Using Parquet File <https://medium.com/@songkunjump/spark-understand-the-basic-of-pushed-filter-and-partition-filter-using-parq

Re: How reading works?

2022-07-05 Thread Bjørn Jørgensen
-pushed-filter-and-partition-filter-using-parquet-file-3e5789e260bd> > > > > > > On Tue, 5 Jul 2022 at 21:21, Sid wrote: > >> Hi Team, >> >> I still need help understanding exactly how reading works. >> >> Thanks, >> Sid >> >>

Re: How reading works?

2022-07-05 Thread Bjørn Jørgensen
ile <https://medium.com/@songkunjump/spark-understand-the-basic-of-pushed-filter-and-partition-filter-using-parquet-file-3e5789e260bd> On Tue, 5 Jul 2022 at 21:21, Sid wrote: > Hi Team, > > I still need help understanding exactly how reading works. > > Thanks, > Si

Re: How reading works?

2022-07-05 Thread Sid
Hi Team, I still need help understanding exactly how reading works. Thanks, Sid On Mon, Jun 20, 2022 at 2:23 PM Sid wrote: > Hi Team, > > Can somebody help? > > Thanks, > Sid > > On Sun, Jun 19, 2022 at 3:51 PM Sid wrote: > >> Hi, >> >> I alrea

Re: How reading works?

2022-06-20 Thread Sid
Hi Team, Can somebody help? Thanks, Sid On Sun, Jun 19, 2022 at 3:51 PM Sid wrote: > Hi, > > I already have a partitioned JSON dataset in S3 like the one below: > > edl_timestamp=2022090800 > > Now, the problem is that in the first 10 days of data collection there was a > duplicate columns

How reading works?

2022-06-19 Thread Sid
Hi, I already have a partitioned JSON dataset in S3 like the one below: edl_timestamp=2022090800 Now, the problem is that in the first 10 days of data collection there was a duplicate-columns issue, because of which we couldn't read the data. The latest 10 days of data are fine. So, I am trying