Dear tutors, I am in a beginning-level computer science class in college and am running into problems with an assignment.
The assignment is as follows:

Statisticians are fond of drawing regression lines. In statistics and other fields where people analyze lots of data, one of the most commonly used regression lines is called the “least squares line.” This is the line that is supposed to best fit the set of data points, in the sense that it minimizes the squared vertical distances between the points and the line. Why this should be a good fit is beyond the scope of this assignment.

Presume that you have a collection of n two-dimensional data points. I’ll give it as a list of lists, where each of the lists inside represents one data point.

Data: [[x1, y1], [x2, y2], [x3, y3], …, [xn, yn]]

Compute the following
The regression line is then given by
where m and b may be obtained by
and

Your task is to compute the m and b (slope and intercept, respectively) for a set of data. You have to analyze the data as given, not count or add anything yourself. Your program should do everything, even figure out how many data points there are.

Here’s your data:

First set: [[3, 1], [4, 3], [6, 4], [7, 6], [8, 8], [9, 8]]

Second set: [[63, 11], [22, 7.5], [63, 11], [28, 10], [151, 12], [108, 10], [18, 8], [115, 10], [31, 7], [44, 9]]

Find m and b, then calculate an estimate for x = 5 using the first data set. That is, plug in 5 for x and see what y you get. For the second set, try x = 95.

Turn in: code, m, b, and the estimates for both data sets.

**********************************************************************

There’s an easy way to walk through the data and extract the values you need. Use a for loop. Try this:

    for item in data:
        [x, y] = item
        print(x)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For extra credit: draw a scatter plot of the data, and draw in the least squares line.
Scale the window to fit, making it a bit wider and higher than the data requires, so that some of the points are near but not on the edges of the window. Then sketch in the regression line. Note that you should calculate the window size based on the data; don’t set them yourself; find the max and min values for x and y. You can print the scatter plot, or point me toward your web page. In any case, show me the code.

So far, my program is as follows:

    Data = [[3,1],[4,3],[6, 4],[7, 6],[8, 8],[9, 8]]

    def X():
        accX = 0
        for item in Data:
            [x,y] = item
            accX = accX + x
        print (accX)

    def Y():
        accY = 0
        for item in Data:
            [x,y] = item
            accY = accY + y
        print (accY)

    def P():
        accXY = 0
        for item in Data:
            [x,y] = item
            accXY = accXY + (y*x)
        print (accXY)

    def Q():
        accX2 = 0
        for item in Data:
            [x,y] = item
            accX2 = accX2 + (x**2)
        print (accX2)

    X()
    Y()
    P()
    Q()

    def B():
        ((Y() * Q()) - (P() * X())) / ((6 * Q()) - (X()**2))

    def M():
        ((Y() * Q()) - (P() * X())) / (X() * Q())

    B()
    M()

Now, my functions for X, Y, P, and Q are correct, but I have a couple of problems when it comes to continuing.

First of all, despite what my teacher has told me, my method of multiplying the results of X, Y, P, and Q inside the functions for B and M is not working. I'm not sure if there is a way to make functions into variables or how to solve this problem.

Second, I am confused about what my teacher means when it comes to inputting different values of x:

    Find m and b, then calculate an estimate for x = 5 using the first data set. That is, plug in 5 for x and see what y you get. For the second set, try x = 95.

    Turn in: code, m, b, and the estimates for both data sets.

I mean, I know I need to calculate the line of best fit for the data sets using B and M, but what in the world is x supposed to do and where does it go? How do I program this? This is especially hard since I've never taken a proper stat class before.

Thank you all so much!

--
Colleen Glaeser
songbird42...@gmail.com
636.357.8519
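
One possible way to approach both questions, as a sketch under stated assumptions. The formulas from the assignment did not survive in the message above, so the code below assumes the standard least-squares formulas, m = (n*P - X*Y) / (n*Q - X**2) and b = (Y*Q - P*X) / (n*Q - X**2), where X, Y, P and Q are the sums of x, y, x*y and x**2 and n is the number of points. It also assumes Python 3, where / is true division. The main change from the program above is that each function returns its result rather than printing it: print() only displays a value, and a function with no return statement gives back None, which cannot be multiplied. The names sums, fit_line and estimate are hypothetical, chosen only for illustration. The x in the assignment is simply the input to the fitted line y = m*x + b.

```python
def sums(data):
    """Return (n, X, Y, P, Q): point count and the sums of x, y, x*y, x**2."""
    n = len(data)  # the program counts the points itself
    acc_x = acc_y = acc_xy = acc_x2 = 0
    for x, y in data:
        acc_x += x
        acc_y += y
        acc_xy += x * y
        acc_x2 += x ** 2
    return n, acc_x, acc_y, acc_xy, acc_x2


def fit_line(data):
    """Return (m, b) using the standard least-squares formulas (an assumption)."""
    n, X, Y, P, Q = sums(data)
    denom = n * Q - X ** 2  # shared denominator of both formulas
    m = (n * P - X * Y) / denom
    b = (Y * Q - P * X) / denom
    return m, b


def estimate(m, b, x):
    """Plug x into the fitted line y = m*x + b."""
    return m * x + b


first = [[3, 1], [4, 3], [6, 4], [7, 6], [8, 8], [9, 8]]
m, b = fit_line(first)
print("m =", m, "b =", b)
print("estimate at x = 5:", estimate(m, b, 5))
```

The same three functions can be called on the second data set with x = 95; because the data is passed in as an argument, nothing is tied to one particular list or to a hard-coded count of 6.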
Tutor maillist - Tutor@python.org