Thanks Sean.

This is the gist of the case:

<https://stackoverflow.com/posts/65570917/timeline>

I have data points on the x-axis for the years 2010 through 2020 and the
corresponding values on the y-axis. I am using PySpark, pandas and
matplotlib. The data is read into PySpark from the underlying database and a
pandas DataFrame is built from it. The data is aggregated over each year;
however, the underlying prices are provided on a monthly basis in a CSV file
which has been loaded into a Hive table.
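
For context, a hypothetical sketch of how that yearly aggregation could be
produced from the monthly prices (the monthly table and column names,
monthlyhouseprices, FlatPrice and so on, are my assumptions here, not the
actual schema):

```
# Hypothetical sketch: roll the monthly Hive prices up into the yearly
# summary table queried below. The monthly-side names are assumptions.
spark.sql(f"""
    CREATE TABLE IF NOT EXISTS {v.DSDB}.yearlyhouseprices AS
    SELECT Year,
           AVG(FlatPrice)         AS AVGFlatPricePerYear,
           AVG(TerracedPrice)     AS AVGTerracedPricePerYear,
           AVG(SemiDetachedPrice) AS AVGSemiDetachedPricePerYear,
           AVG(DetachedPrice)     AS AVGDetachedPricePerYear
    FROM {v.DSDB}.monthlyhouseprices
    GROUP BY Year
""")
```

The actual query and fit loop: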

```
from pyspark.sql.functions import col
import matplotlib.pyplot as plt
from lmfit.models import LorentzianModel  # fit report below shows Model(lorentzian)

# v, start_date, end_date, regionname and the SparkSession `spark` come from
# the surrounding application.
model = LorentzianModel()

summary_df = spark.sql(f"""SELECT cast(Year as int) as year,
    AVGFlatPricePerYear, AVGTerracedPricePerYear, AVGSemiDetachedPricePerYear,
    AVGDetachedPricePerYear FROM {v.DSDB}.yearlyhouseprices""")

df_10 = summary_df.filter(col("year").between(f'{start_date}', f'{end_date}'))

p_dfm = df_10.toPandas()  # converting Spark DF to pandas DF

n = len(p_dfm.columns)

for i in range(n):
    if p_dfm.columns[i] != 'year':   # year is the x-axis (integer)
        vcolumn = p_dfm.columns[i]
        print(vcolumn)
        # initial parameter guess, then least-squares fit of price vs year
        params = model.guess(p_dfm[vcolumn], x=p_dfm['year'])
        result = model.fit(p_dfm[vcolumn], params, x=p_dfm['year'])
        result.plot_fit()
        if vcolumn == "AVGFlatPricePerYear":
            plt.xlabel("Year", fontdict=v.font)
            plt.ylabel("Flat house prices in millions/GBP", fontdict=v.font)
            plt.title(f"Flat price fluctuations in {regionname} for the past 10 years",
                      fontdict=v.font)
            plt.text(0.35, 0.45,
                     "Best-fit based on Non-Linear Lorentzian Model",
                     transform=plt.gca().transAxes,
                     color="grey",
                     fontsize=10)
            print(result.fit_report())
            plt.xlim(left=2009)
            plt.xlim(right=2022)
            plt.show()
            plt.close()

```

So far so good. I get a best-fit plot as shown, using the Lorentzian model.

I also have the model fit report:

```
[[Model]]
    Model(lorentzian)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 25
    # data points      = 11
    # variables        = 3
    chi-square         = 8.4155e+09
    reduced chi-square = 1.0519e+09
    Akaike info crit   = 231.009958
    Bayesian info crit = 232.203644
[[Variables]]
    amplitude:  31107480.0 +/- 1471033.33 (4.73%) (init = 6106104)
    center:     2016.75722 +/- 0.18632315 (0.01%) (init = 2016.5)
    sigma:      8.37428353 +/- 0.45979189 (5.49%) (init = 3.5)
    fwhm:       16.7485671 +/- 0.91958379 (5.49%) == '2.0000000*sigma'
    height:     1182407.88 +/- 15681.8211 (1.33%) == '0.3183099*amplitude/max(2.220446049250313e-16, sigma)'
[[Correlations]] (unreported correlations are < 0.100)
    C(amplitude, sigma)  =  0.977
    C(amplitude, center) =  0.644
    C(center, sigma)     =  0.603
```
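
As a quick sanity check, the derived parameters in the report can be
re-computed from the fitted amplitude and sigma using the constraint
expressions printed above (fwhm = 2*sigma and height = 0.3183099*amplitude/sigma,
i.e. amplitude/(pi*sigma)):

```
# Re-derive the report's constrained parameters (values copied from above).
amplitude, sigma = 31107480.0, 8.37428353
print(2.0 * sigma)                    # fwhm   -> ~16.7485671
print(0.3183099 * amplitude / sigma)  # height -> ~1182407.88
```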


Now I need to predict the prices for the years 2021-2022 based on this fit.
Is there any way I can use some plt functions to obtain extrapolated values
for 2021 and beyond?
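
To make the question concrete, this is a minimal sketch of what I am after
(assuming `result` is the `ModelResult` returned by `model.fit` above;
lmfit's `ModelResult.eval` evaluates the fitted curve at new x values, so
matplotlib only has to draw them):

```
import numpy as np
import matplotlib.pyplot as plt

# Evaluate the best-fit Lorentzian beyond the fitted 2010-2020 range.
future_years = np.array([2021, 2022])
predicted = result.eval(x=future_years)   # uses the best-fit parameters
print(dict(zip(future_years, predicted)))

# Overlay the extrapolated points on the existing fit plot.
result.plot_fit()
plt.plot(future_years, predicted, 'rx', label='extrapolated 2021-2022')
plt.legend()
plt.show()
```

I appreciate that with only 11 data points a Lorentzian can swing widely
outside the fitted range, so any advice on how far such an extrapolation can
be trusted is also welcome.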


Thanks





On Tue, 5 Jan 2021 at 14:43, Sean Owen <sro...@gmail.com> wrote:

> If your data set is 11 points, surely this is not a distributed problem?
> or are you asking how to build tens of thousands of those projections in
> parallel?
>
> On Tue, Jan 5, 2021 at 6:04 AM Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am not sure Spark forum is the correct avenue for this question.
>>
>> I am using PySpark with matplotlib to get the best fit for data using
>> the Lorentzian Model. This curve uses 2010-2020 data points (11 on the
>> x-axis). I need to predict the prices for the years 2021-2025 based on
>> this fit. Not sure if someone can advise me? If OK, then I can post the
>> details.
>>
>> Thanks
>>
>>
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>>
>>
>> Disclaimer: Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>