------------------------------
Message: 2
Date: Mon, 03 Apr 2017 11:36:10 +0200
From: Peter Otten <__pete...@web.de>
To: tutor@python.org
Subject: Re: [Tutor] Euclidean Distances between Atoms in a Molecule.
Message-ID: <obt525$ths$1...@blaine.gmane.org>
Content-Type: text/plain; charset="ISO-8859-1"

Stephen P. Molnar wrote:

> I am trying to port a program that I wrote in FORTRAN twenty years ago
> into Python 3 and am having a hard time trying to calculate the
> Euclidean distance between each atom in the molecule and every other
> atom in the molecule.
>
> Here is a typical table of coordinates:
>
>
>       MASS         X         Y         Z
>  0  12.011 -3.265636  0.198894  0.090858
>  1  12.011 -1.307161  1.522212  1.003463
>  2  12.011  1.213336  0.948208 -0.033373
>  3  14.007  3.238650  1.041523  1.301322
>  4  12.011 -5.954489  0.650878  0.803379
>  5  12.011  5.654476  0.480066  0.013757
>  6  12.011  6.372043  2.731713 -1.662411
>  7  12.011  7.655753  0.168393  2.096802
>  8  12.011  5.563051 -1.990203 -1.511875
>  9   1.008 -2.939469 -1.327967 -1.247635
> 10   1.008 -1.460475  2.993912  2.415410
> 11   1.008  1.218042  0.451815 -2.057439
> 12   1.008 -6.255901  2.575035  1.496984
> 13   1.008 -6.560562 -0.695722  2.248982
> 14   1.008 -7.152500  0.390758 -0.864115
> 15   1.008  4.959548  3.061356 -3.139100
> 16   1.008  8.197613  2.429073 -2.588339
> 17   1.008  6.503322  4.471092 -0.543939
> 18   1.008  7.845274  1.892126  3.227577
> 19   1.008  9.512371 -0.273198  1.291080
> 20   1.008  7.147039 -1.365346  3.393778
> 21   1.008  4.191488 -1.928466 -3.057804
> 22   1.008  5.061650 -3.595015 -0.302810
> 23   1.008  7.402586 -2.392148 -2.374554
>
> What I need for further calculation is a matrix of the Euclidean
> distances between the atoms.
>
> So far in searching the Python literature I have only managed to confuse
> myself and would greatly appreciate any pointers towards a solution.
>
> Thanks in advance.
>

Stitched together with heavy use of a search engine:

$ cat data.txt
      MASS         X         Y         Z
 0  12.011 -3.265636  0.198894  0.090858
 1  12.011 -1.307161  1.522212  1.003463
 2  12.011  1.213336  0.948208 -0.033373
 3  14.007  3.238650  1.041523  1.301322
 4  12.011 -5.954489  0.650878  0.803379
 5  12.011  5.654476  0.480066  0.013757
 6  12.011  6.372043  2.731713 -1.662411
 7  12.011  7.655753  0.168393  2.096802
 8  12.011  5.563051 -1.990203 -1.511875
 9   1.008 -2.939469 -1.327967 -1.247635
10   1.008 -1.460475  2.993912  2.415410
11   1.008  1.218042  0.451815 -2.057439
12   1.008 -6.255901  2.575035  1.496984
13   1.008 -6.560562 -0.695722  2.248982
14   1.008 -7.152500  0.390758 -0.864115
15   1.008  4.959548  3.061356 -3.139100
16   1.008  8.197613  2.429073 -2.588339
17   1.008  6.503322  4.471092 -0.543939
18   1.008  7.845274  1.892126  3.227577
19   1.008  9.512371 -0.273198  1.291080
20   1.008  7.147039 -1.365346  3.393778
21   1.008  4.191488 -1.928466 -3.057804
22   1.008  5.061650 -3.595015 -0.302810
23   1.008  7.402586 -2.392148 -2.374554
$ python3
Python 3.4.3 (default, Nov 17 2016, 01:08:31)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy, pandas, scipy.spatial.distance as dist
>>> df = pandas.read_table("data.txt", sep=" ", skipinitialspace=True)
>>> a = numpy.array(df[["X", "Y", "Z"]])
>>> dist.squareform(dist.pdist(a, "euclidean"))
<snip big matrix>

Here's an example with just the first 4 atoms:

>>> dist.squareform(dist.pdist(a[:4], "euclidean"))
array([[ 0.        ,  2.53370139,  4.54291701,  6.6694065 ],
       [ 2.53370139,  0.        ,  2.78521357,  4.58084922],
       [ 4.54291701,  2.78521357,  0.        ,  2.42734737],
       [ 6.6694065 ,  4.58084922,  2.42734737,  0.        ]])

See
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html

There may be a way to do this with pandas.pivot_table(), but I didn't manage
to find that.

As Alan says, this is not the appropriate forum for the topic you are
belabouring.
Work your way through a Python tutorial to pick up the basics (we can help
you with this), then go straight to where the (numpy/scipy) experts are.

------------------------------

An alternative, starting from the numpy array "a" from Peter's answer:

import numpy as np

# Number of rows (atoms) and columns (coordinates) of the array
anrows, ancols = np.shape(a)
# Insert a length-1 axis so that each atom can be broadcast against all atoms
a_new = a.reshape(anrows, 1, ancols)
# Coordinate differences between every pair of atoms (one vs. all)
diff = a_new - a
# Sum of the squared differences over the coordinate axis
D = (diff ** 2).sum(2)
# Square root gives the Euclidean distances
D = np.sqrt(D)
# Check that both answers agree ("dist" is scipy.spatial.distance,
# as imported in Peter's session); the result should be all zeros
# up to rounding
print(D - dist.squareform(dist.pdist(a, "euclidean")))

Salut,

Sergio

See the "Avoiding for loops" section of:
https://www.packtpub.com/big-data-and-business-intelligence/numerical-and-scientific-computing-scipy-video
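
The broadcasting step in the alternative above can be checked on a small case. What follows is only a minimal sketch and not part of either original message: it hard-codes the first four atoms from the coordinate table so that it needs nothing but numpy, and the function name pairwise_distances is made up for illustration.

```python
import numpy as np

# First four atoms (X, Y, Z), copied from the coordinate table in this thread.
coords = np.array([
    [-3.265636, 0.198894,  0.090858],
    [-1.307161, 1.522212,  1.003463],
    [ 1.213336, 0.948208, -0.033373],
    [ 3.238650, 1.041523,  1.301322],
])


def pairwise_distances(xyz):
    """Hypothetical helper: full n x n Euclidean distance matrix via broadcasting."""
    # xyz[:, None, :] has shape (n, 1, 3) and xyz[None, :, :] has shape (1, n, 3);
    # subtracting them broadcasts to (n, n, 3), one difference vector per atom pair.
    diff = xyz[:, None, :] - xyz[None, :, :]
    # Sum the squared components over the last axis, then take the square root.
    return np.sqrt((diff ** 2).sum(axis=-1))


D = pairwise_distances(coords)
print(D.shape)
print(np.round(D, 6))
```

The printed matrix should agree with the 4 x 4 result from dist.squareform(dist.pdist(a[:4], "euclidean")) shown earlier in the thread. One caveat on the design: the intermediate array has n * n * 3 entries, so for large molecules pdist is likely the more memory-friendly choice.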