python - Vectorised Haversine formula with a pandas dataframe -


I know that to find the distance between two latitude/longitude points I need to use the haversine function:

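    from math import radians, sin, cos, asin, sqrt

    def haversine(lon1, lat1, lon2, lat2):
        # Convert decimal degrees to radians
        lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
        dlon = lon2 - lon1
        dlat = lat2 - lat1
        a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
        c = 2 * asin(sqrt(a))
        km = 6367 * c  # 6367 km is the Earth radius used here
        return km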

I have a dataframe with one column for latitude and one column for longitude. I want to find out how far these points are from a set point, -56.7213600, 37.2175900. How do I take the values from the dataframe and put them into the function?

Example dataframe:

         seaz     lat          lon
    1    296.40,  58.7312210,  28.3774110
    2    274.72,  56.8148320,  31.2923240
    3    192.25,  52.0649880,  35.8018640
    4     34.34,  68.8188750,  67.1933670
    5    271.05,  56.6699880,  31.6880620
    6    131.88,  48.5546220,  49.7827730
    7    350.71,  64.7742720,  31.3953780
    8    214.44,  53.5192920,  33.8458560
    9      1.46,  67.9433740,  38.4842520
    10   273.55,  53.3437310,   4.4716664

I can't confirm whether the calculations are correct, but the following worked:

    In [11]:

    # arcsin is NumPy's name for the inverse sine, so the math
    # functions are assumed here to come from NumPy
    from numpy import radians, sin, cos, arcsin, sqrt

    def haversine(row):
        # Fixed reference point (decimal degrees)
        lon1 = -56.7213600
        lat1 = 37.2175900
        lon2 = row['lon']
        lat2 = row['lat']
        lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
        dlon = lon2 - lon1
        dlat = lat2 - lat1
        a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
        c = 2 * arcsin(sqrt(a))
        km = 6367 * c
        return km

    df['distance'] = df.apply(lambda row: haversine(row), axis=1)
    df

    Out[11]:
             seaz        lat        lon     distance
    index
    1      296.40  58.731221  28.377411  6275.791920
    2      274.72  56.814832  31.292324  6509.727368
    3      192.25  52.064988  35.801864  6990.144378
    4       34.34  68.818875  67.193367  7357.221846
    5      271.05  56.669988  31.688062  6538.047542
    6      131.88  48.554622  49.782773  8036.968198
    7      350.71  64.774272  31.395378  6229.733699
    8      214.44  53.519292  33.845856  6801.670843
    9        1.46  67.943374  38.484252  6418.754323
    10     273.55  53.343731   4.471666  4935.394528
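One way to gain confidence in the calculation is to test the formula on points whose great-circle distance is known in closed form. The sketch below is not from the original answer: `haversine_km` and `EARTH_RADIUS_KM` are illustrative names, and it assumes the same 6367 km radius used above. Two points on the equator 90 degrees of longitude apart lie a quarter of a great circle apart, so the expected distance is R * pi / 2, and a point compared with itself should give exactly zero.

```python
from math import radians, sin, cos, asin, sqrt, pi, isclose

EARTH_RADIUS_KM = 6367  # assumption: same radius as in the post


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return EARTH_RADIUS_KM * 2 * asin(sqrt(a))


# A point compared with itself is zero kilometres away.
assert haversine_km(-56.72136, 37.21759, -56.72136, 37.21759) == 0.0

# Quarter of a great circle along the equator: R * pi / 2.
quarter = haversine_km(0.0, 0.0, 90.0, 0.0)
assert isclose(quarter, EARTH_RADIUS_KM * pi / 2, rel_tol=1e-9)

# Equator to the north pole is also a quarter of a great circle.
assert isclose(haversine_km(0.0, 0.0, 0.0, 90.0), quarter, rel_tol=1e-9)
```

Checks like these do not prove the dataframe results are right, but they would catch swapped arguments, a missing degrees-to-radians conversion, or a misplaced parenthesis.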

The following code is actually slower on such a small dataframe, but I applied it to a 100,000 row df:

    In [35]:

    %%timeit
    df['lat_rad'], df['lon_rad'] = np.radians(df['lat']), np.radians(df['lon'])
    df['dlon'] = df['lon_rad'] - math.radians(-56.7213600)
    df['dlat'] = df['lat_rad'] - math.radians(37.2175900)
    df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin(df['dlat']/2)**2 + math.cos(math.radians(37.2175900)) * np.cos(df['lat_rad']) * np.sin(df['dlon']/2)**2))

    1 loops, best of 3: 17.2 ms per loop

Compared to the apply function, which took 4.3 s, this is about 250 times quicker, something to note for the future.

If we compress all of the above into a one-liner:

    In [39]:

    %timeit df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin((np.radians(df['lat']) - math.radians(37.21759))/2)**2 + math.cos(math.radians(37.21759)) * np.cos(np.radians(df['lat'])) * np.sin((np.radians(df['lon']) - math.radians(-56.72136))/2)**2))

    100 loops, best of 3: 12.6 ms per loop

We observe a further speed-up: at 12.6 ms, this is roughly 341 times quicker than the 4.3 s apply version.
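The one-liner is fast but hard to read and easy to mis-parenthesise. As a sketch of an alternative (not part of the original answer), the same arithmetic can be wrapped in a small function built from NumPy ufuncs, so that it accepts scalars, arrays, or pandas Series alike. The name `haversine_np` and the `radius_km` parameter are assumptions made for illustration; the radius defaults to the 6367 km used above.

```python
import numpy as np
import pandas as pd


def haversine_np(lon1, lat1, lon2, lat2, radius_km=6367):
    """Vectorised haversine distance in km; inputs are decimal degrees.

    Each argument may be a scalar, a NumPy array, or a pandas Series.
    """
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return radius_km * 2 * np.arcsin(np.sqrt(a))


# First three rows of the example dataframe from the post.
df = pd.DataFrame(
    {
        "lat": [58.7312210, 56.8148320, 52.0649880],
        "lon": [28.3774110, 31.2923240, 35.8018640],
    },
    index=[1, 2, 3],
)

# The scalar reference point is broadcast against the whole columns,
# so no Python-level loop or apply() is needed.
df["distance"] = haversine_np(-56.7213600, 37.2175900, df["lon"], df["lat"])
print(df)
```

Because the work happens inside NumPy, this should perform in the same range as the column-wise version above, although exact timings depend on the machine and library versions.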

