Hi Periya,

Can you take a look at the patch in https://issues.apache.org/jira/browse/HIVE-3715 and see if you can apply a similar change to make sin/cos more accurate for your use case? Feel free to comment on the JIRA as well.

Thanks.
Johnny

On Sat, Dec 8, 2012 at 11:23 AM, Periya.Data <periya.d...@gmail.com> wrote:

> Hi Lauren and Zhang,
>
> The book "Programming Hive" suggests using double (instead of float)
> and also casting any literal value to double. I am already using double for
> all my computations (both in the Hive table schema and in my UDF).
> Furthermore, I am not comparing two floats/doubles. I am doing some
> computations involving doubles...and those minor differences are adding up.
>
> It looks like what Mark Grover was describing: the mapping between Java
> data types and Hive data types. I have yet to look at that portion of the
> source code.
>
> Thanks and will keep you posted,
> /PD
>
> On Fri, Dec 7, 2012 at 2:12 PM, Lauren Yang <lauren.y...@microsoft.com> wrote:
>
>> This sounds like https://issues.apache.org/jira/browse/HIVE-2586,
>> where comparing floats/doubles will not work because of the way floating
>> point numbers are represented.
>>
>> Perhaps there is a comparison between a float and a double type somewhere,
>> because of some internal representation in the Java library or in the UDF.
>>
>> Ed Capriolo’s book has a good section about workarounds and caveats for
>> working with floats/doubles in Hive.
>>
>> Thanks,
>> Lauren
>>
>> *From:* Periya.Data [mailto:periya.d...@gmail.com]
>> *Sent:* Friday, December 07, 2012 1:28 PM
>> *To:* user@hive.apache.org; cdh-u...@cloudera.org
>> *Subject:* Hive double-precision question
>>
>> Hi Hive Users,
>>
>> I recently noticed an interesting behavior with Hive and I am unable
>> to find the reason for it. Your insights into this are much appreciated.
>>
>> I am trying to compute the distance between two zip codes. I have the
>> distances computed on various platforms: SAS, R, Linux + Java, a Hive UDF,
>> and Hive's built-in functions.
>> The outputs from the Hive UDF and Hive's built-in functions show
>> discrepancies starting at the 3rd decimal place. Here is an example:
>>
>> zip1   zip2   Hadoop built-in function   SAS          R                  Linux + Java
>> 00501  11720  4.49493083698542000        4.49508858   4.49508858054005   4.49508857976933000
>>
>> The formula used to compute the distance is this (UDF):
>>
>>     double long1 = Math.atan(1)/45 * ux;
>>     double lat1 = Math.atan(1)/45 * uy;
>>     double long2 = Math.atan(1)/45 * mx;
>>     double lat2 = Math.atan(1)/45 * my;
>>
>>     double X1 = long1;
>>     double Y1 = lat1;
>>     double X2 = long2;
>>     double Y2 = lat2;
>>
>>     double distance = 3949.99 * Math.acos(Math.sin(Y1) * Math.sin(Y2)
>>             + Math.cos(Y1) * Math.cos(Y2) * Math.cos(X1 - X2));
>>
>> The version using built-in functions (same formula as above):
>>
>>     3949.99 * acos( sin(u_y_coord * (atan(1)/45)) * sin(m_y_coord * (atan(1)/45))
>>                   + cos(u_y_coord * (atan(1)/45)) * cos(m_y_coord * (atan(1)/45))
>>                   * cos(u_x_coord * (atan(1)/45) - m_x_coord * (atan(1)/45)) )
>>
>> - The Hive built-in functions used are acos, sin, cos and atan.
>> - In another attempt, I used a Hive UDF with Java's math library (Math.acos,
>>   Math.atan, etc.).
>> - All variables used are double.
>>
>> I expected the values from the Hadoop UDF (and built-in functions) to be
>> identical to those from plain Java code on Linux, but they are not. The
>> built-in functions (as well as the UDF) give 4.49493083698542000, whereas a
>> simple Java program running on Linux gives 4.49508857976933000. The Linux
>> machine is similar to the Hadoop cluster machines.
>>
>> Linux version - Red Hat 5.5
>> Java - latest
>> Hive - 0.7.1
>> Hadoop - 0.20.2
>>
>> This discrepancy is very consistent across thousands of zip-code distances;
>> it is not a one-off occurrence. In some cases, I see the difference from the
>> 4th decimal place.
>> Some more examples:
>>
>> zip1   zip2   Hadoop built-in function   SAS           R                   Linux + Java
>> 00602  00617  42.79095253903410000       42.79072812   42.79072812185650   42.79072812185640000
>> 00603  00617  40.24044016655180000       40.2402289    40.24022889740920   40.24022889740910000
>> 00605  00617  40.19191761288380000       40.19186416   40.19186415807060   40.19186415807060000
>>
>> I have not yet tested the individual sin, cos and atan return values; that
>> will be my next test. But, at the very least, why is there a difference
>> between the values from Hadoop's UDF/built-ins and those from Linux + Java?
>> I am assuming that Hive's built-in mathematical functions are nothing but
>> the underlying Java functions.
>>
>> Thanks,
>> PD.
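For reference, a minimal, self-contained Java sketch of the distance formula from the UDF quoted above (spherical law of cosines). The class and method names (`ZipDistance`, `distance`) are hypothetical, the 3949.99 radius constant is copied from the UDF, and the coordinates in `main` are made up, not the zip-code data from this thread. `Math.atan(1)/45` is simply pi/180, the degrees-to-radians factor. One deliberate change from the original: the argument to `Math.acos` is clamped to [-1, 1], because rounding can push it slightly above 1 for identical points, and `acos` then returns NaN.

```java
/**
 * Sketch of the UDF's distance formula (spherical law of cosines).
 * ZipDistance and distance() are hypothetical names; coordinates are in degrees.
 */
public final class ZipDistance {

    // atan(1) = pi/4, so atan(1)/45 = pi/180: the degrees-to-radians factor from the UDF.
    private static final double DEG_TO_RAD = Math.atan(1) / 45;

    // Radius constant copied from the UDF in the thread.
    private static final double RADIUS = 3949.99;

    private ZipDistance() {}

    /** (ux, uy) and (mx, my) are (longitude, latitude) pairs in degrees, as in the UDF. */
    public static double distance(double ux, double uy, double mx, double my) {
        double long1 = DEG_TO_RAD * ux;
        double lat1 = DEG_TO_RAD * uy;
        double long2 = DEG_TO_RAD * mx;
        double lat2 = DEG_TO_RAD * my;

        double cosAngle = Math.sin(lat1) * Math.sin(lat2)
                + Math.cos(lat1) * Math.cos(lat2) * Math.cos(long1 - long2);

        // Clamp: rounding can push cosAngle just above 1, where Math.acos returns NaN.
        cosAngle = Math.max(-1.0, Math.min(1.0, cosAngle));
        return RADIUS * Math.acos(cosAngle);
    }

    public static void main(String[] args) {
        // Made-up coordinates for illustration; not the zip-code data from the thread.
        System.out.println(distance(-73.0, 40.8, -73.1, 40.8));
    }
}
```

Feeding exactly the same coordinates to this method on the Linux box and inside the UDF would show whether the two environments disagree even for identical inputs.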
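On the float/double point raised in the thread (HIVE-2586, and the "Programming Hive" advice to use double and cast literals to double), here is a small Java sketch of the underlying pitfall. A value that has passed through a 32-bit float and is then widened to double is not equal to the same decimal literal parsed directly as a double, so `==` fails and a tolerance-based comparison is needed. The class name `FloatWidening` and the 1e-6 tolerance are illustrative assumptions. Since PD reports using double throughout, this only shows the general effect; it is not a diagnosis of the discrepancy above.

```java
/**
 * Sketch: a float widened to double keeps the float's rounding error.
 * FloatWidening and its methods are hypothetical names.
 */
public final class FloatWidening {

    private FloatWidening() {}

    /** Implicit float -> double widening: exact, but the float's original rounding error is carried over. */
    public static double widen(float f) {
        return f;
    }

    /** Compares two doubles with an absolute tolerance instead of ==. */
    public static boolean nearlyEqual(double a, double b, double tolerance) {
        return Math.abs(a - b) <= tolerance;
    }

    public static void main(String[] args) {
        double fromFloat = widen(0.1f);
        double fromDouble = 0.1;
        System.out.println(fromFloat);                                // 0.10000000149011612
        System.out.println(fromFloat == fromDouble);                  // false
        System.out.println(nearlyEqual(fromFloat, fromDouble, 1e-6)); // true
    }
}
```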
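For PD's planned next test of the individual sin, cos and atan results, one possible sketch compares `java.lang.Math` against `java.lang.StrictMath`. Per the Java API docs, `Math` implementations are not required to return bit-for-bit identical results across platforms (sin, cos, atan and acos must be within 1 ulp of the exact result), while `StrictMath` is defined in terms of the fdlibm algorithms and is reproducible. The class name and sample inputs are hypothetical, and whether Hive's built-ins call `Math` or `StrictMath` is an assumption to check in the Hive source.

```java
import java.util.function.DoubleUnaryOperator;

/**
 * Sketch: compare Math (may use platform-specific code) with StrictMath (fdlibm, reproducible).
 * MathVsStrictMath and the sample inputs are hypothetical.
 */
public final class MathVsStrictMath {

    private MathVsStrictMath() {}

    /** Difference between two results, in units in the last place (ulp) of the strict result. */
    public static double ulpDiff(double fast, double strict) {
        return Math.abs(fast - strict) / Math.ulp(strict);
    }

    private static void report(String name, DoubleUnaryOperator fast,
                               DoubleUnaryOperator strict, double x) {
        double a = fast.applyAsDouble(x);
        double b = strict.applyAsDouble(x);
        System.out.printf("%s(%s): Math=%s StrictMath=%s diff=%.1f ulp%n",
                name, x, a, b, ulpDiff(a, b));
    }

    public static void main(String[] args) {
        double[] inputs = {0.5, 0.7117, 1.0, 1.2345};
        for (double x : inputs) {
            report("sin", Math::sin, StrictMath::sin, x);
            report("cos", Math::cos, StrictMath::cos, x);
            report("atan", Math::atan, StrictMath::atan, x);
            // acos is only defined on [-1, 1], so fold larger inputs back into range.
            report("acos", Math::acos, StrictMath::acos, x <= 1 ? x : 1 / x);
        }
    }
}
```

As a rough check, a 1 ulp difference near these values is on the order of 1e-16, while the first example in the thread differs by about 1.6e-4 (4.49508857976933 vs 4.49493083698542). Even after the amplification that acos introduces for short distances, ulp-level differences alone look far too small to explain that gap, so if this test shows only 0 to 1 ulp differences, the cause is probably elsewhere.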