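It's hard to evaluate without knowing what you're doing. Generally, using a built-in function will be fastest, because it runs inside the JVM and never has to ship data to a Python worker. pandas UDFs can be faster than regular Python UDFs if your logic can take advantage of processing many rows at once, as vectorized operations on a whole batch rather than one value at a time.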
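To illustrate the "many rows at once" point, here is a minimal pandas-only sketch. It is not Spark code, and it only simulates the difference in how often Python code gets invoked. A regular Python UDF body is called once per value, while a Series-to-Series pandas UDF body is called once per batch (Spark feeds it Arrow record batches, sized by `spark.sql.execution.arrow.maxRecordsPerBatch`, default 10,000). The markup transformation and the helper names (`markup_row`, `markup_batch`, `run_rowwise`, `run_batched`) are hypothetical.

```python
from typing import Tuple

import numpy as np
import pandas as pd


# Hypothetical transformation: apply a 10% markup to a price column.
def markup_row(price: float) -> float:
    """Row-at-a-time body, like a regular Python UDF: one call per value."""
    return price * 1.1


def markup_batch(prices: pd.Series) -> pd.Series:
    """Series-to-Series body, like a pandas UDF: one vectorized call per batch."""
    return prices * 1.1


def run_rowwise(col: pd.Series) -> Tuple[pd.Series, int]:
    # Simulates a regular UDF: the Python function is invoked for every row.
    calls = 0
    out = []
    for value in col:
        out.append(markup_row(value))
        calls += 1
    return pd.Series(out, index=col.index), calls


def run_batched(col: pd.Series, batch_size: int) -> Tuple[pd.Series, int]:
    # Simulates a pandas UDF: the column arrives in chunks of up to batch_size
    # rows, and the function is invoked once per chunk.
    calls = 0
    parts = []
    for start in range(0, len(col), batch_size):
        parts.append(markup_batch(col.iloc[start:start + batch_size]))
        calls += 1
    return pd.concat(parts), calls


prices = pd.Series(np.arange(25_000, dtype="float64"))
row_result, row_calls = run_rowwise(prices)
batch_result, batch_calls = run_batched(prices, batch_size=10_000)
print(row_calls, batch_calls)  # 25000 3
```

In PySpark the batch version would be decorated with `pyspark.sql.functions.pandas_udf("double")` and the row version with `pyspark.sql.functions.udf`. A built-in expression such as `F.col("price") * 1.1` would avoid the Python round trip entirely, which is why it is usually the fastest option when one exists.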
On Tue, Mar 7, 2023 at 6:47 AM neha garde <neha.r.ga...@gmail.com> wrote:
> Hello All,
>
> I need help deciding which is better, pandas UDFs or built-in functions.
> I have to perform a transformation where I managed to compare the two on
> a few thousand records, and pandas_udf in fact performed better.
> Given the complexity of the transformation, I also found pandas_udf makes
> it more readable.
> I also found a lot of comparisons made between normal UDFs and pandas_udfs.
>
> What I am looking for is whether pandas_udfs will behave like normal
> PySpark built-in functions.
> How do pandas_udfs work internally, and will they be equally performant on
> bigger sets of data?
> I did go through a few documents but wasn't able to get a clear idea.
> I am mainly looking at this from the performance perspective.
>
> Thanks in advance
>
>
> Regards,
> Neha R.Garde.
>