Coding / Programming Videos

Post your favorite coding videos and share them with others!

Getting accurate geolocation using Python Web-scraping and Selenium

Source link

Introduction

In many of the software/hardware projects, it would require you to find the exact geolocation but unfortunately, most of the solutions you’d find on a Google search, shows how to find the location based on the IP address. They provide you with latitude and longitude of your internet provider server rather than that of your current location. This tutorial will help you to web-scrape your exact geolocation coordinates with the help of selenium and python.

Note: Skip the tutorial and directly find the files needed on https://github.com/joeljogy/Getting-GeoLocation

What is web-scraping?

As part of this tutorial, it is needed to know what is meant by web-scraping. Wikipedia defines web-scraping as a method of data scraping used for extracting data from websites. While web-scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or a web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.

Before any web-scraping experiment, it is important to know if you’re allowed to web scrape from that specific website or not. To know that, just add “/robots.txt” to the url of the website you hope to web-scrape and check for all the permissions granted to web-crawlers. 
Eg: https://www.linkedin.com/robots.txt

Generally, in Python you would use the packages, requests along with BeautifulSoup to do your web-scraping experiments but here, we will be using selenium package.

What is Selenium?

Selenium automates browsers. That’s it! What you do with that power is entirely up to you. Primarily, it is for automating web applications for testing purposes, but is certainly not limited to just that. Boring web-based administration tasks can (and should!) be automated as well.

Selenium has the support of some of the largest browser vendors who have taken (or are taking) steps to make Selenium a native part of their browser. It is also the core technology in countless other browser automation tools, APIs and frameworks.

We use selenium in our project for the sole purpose of automating the process of granting permission to access location in your browser. If you were to web-scrape without giving location permission, it would lead to just blank values against the coordinates.

Getting Started

Firstly, we’ll need to install the selenium package, duh! But that’s the only package we’ll need for this project. Use this link to know how to install it. Along with installing the package, you’ll need to additionally download the chromedriver.exe based on the chrome version you’re using. Find the links for all the versions here. You’re all set now! Let’s code.

Note: Make sure the chromedriver.exe file is downloaded to the same directory as your python file.

Capturing the geolocation from browser

Step 1: Import the required packages.

from selenium import webdriver 
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait

Step 2: Make the browser grant permission to access location by default.

def getLocation():    
chrome_options = Options()
chrome_options.add_argument("--use-fake-ui-for-media-stream")
driver = webdriver.Chrome(chrome_options=chrome_options)

If you try to access a website like “ https://mycurrentlocation.net” through chrome, it would ask you to allow location access. The command ” — use-fake-ui-for-media-stream” will grant all permission for location, microphone,etc. automatically.

Step 3: GET call the webpage and wait 20secs for the page to load.

    timeout = 20
driver.get("https://mycurrentlocation.net/")
wait = WebDriverWait(driver, timeout)

Step 4: Find the XPath of the latitude and longitude elements mentioned on webpage and get selenium to find the element.
To find the XPath of an element: Right-click on the element -> Inspect element –> Copy -> Copy XPath

Last part of the script is to add the XPath of the element you want to grab and returning the elements.

    longitude = driver.find_elements_by_xpath('//*[@id="longitude"]') #Replace with any XPath    
longitude = [x.text for x in longitude]
longitude = str(longitude[0])
latitude = driver.find_elements_by_xpath('//*[@id="latitude"]')
latitude = [x.text for x in latitude]
latitude = str(latitude[0])
driver.quit()
return (latitude,longitude)

“That’s all folks!”, you have got the exact location from where you’re accessing the net. Find the entire code for this project on my GitHub repo here.

Write to [email protected] for any clarifications or to share ideas for a better implementation.

Source link

Bookmark(0)
 

Leave a Reply

Please Login to comment
  Subscribe  
Notify of
Translate »