SCRAPER AND DATA ENRICHMENT

The project consisted of two parts: scraping and data enrichment. We were provided with a list of educational institutions in the USA, and the goal was to find the email addresses of people who work at these universities and colleges. We scraped Google search results to identify the website of each university and college. We then crawled those websites to find contact pages and collect the email addresses on them. The next stage was to determine the name and department of each email owner and link this data to the email address in the database.
The scraper was implemented in Python with MySQL for storage. The result was a database of more than 6 million records.
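The email-collection step can be illustrated with a minimal Python sketch. It is not the project's actual code: the function name extract_emails and the simplified address pattern are assumptions made for illustration, and a production scraper would also need page fetching, rate limiting and stricter validation.

```python
import re
from html import unescape

# Simplified address pattern (an assumption for this sketch);
# it does not cover every valid email address form.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html: str) -> list[str]:
    """Return unique, lowercased addresses in order of first appearance."""
    text = unescape(html)  # decode entities such as &#64; before matching
    seen = set()
    result = []
    for match in EMAIL_RE.finditer(text):
        email = match.group(0).lower()
        if email not in seen:
            seen.add(email)
            result.append(email)
    return result
```

Lowercasing and de-duplicating at extraction time keeps repeated addresses (for example, one in a mailto link and one in the visible text) from producing duplicate rows later.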

HIGHLIGHTS

PHP

Python

MySQL

Scraper