The requirement of this project is to build a website scraper that populates a data warehouse with information from a dating-type website. Text and images need to be stored on a per-member basis in the data warehouse.
The website does not require authentication. There are approximately 30,000 profiles and 100,000 photos.
The site will need to be scraped in different stages.
The first stage would be to download the 'profile list' pages, which show ten profiles at a time and include basic information such as user name, age, and height.
The second stage would be downloading the main profile photos from URLs found in the first stage.
The third stage scrapes the actual profile.
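As a rough illustration of how the first stage could feed the other two, here is a minimal Python sketch. It is not based on the real site: the HTML layout (a `div class="profile"` with `data-username` and `data-age` attributes, one `img` and one `a` per profile), the sample URLs, and every name in the code are assumptions made up for the example. The real parser would have to be written against the actual markup.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from math import ceil

PROFILES_PER_PAGE = 10  # from the brief: list pages show ten profiles at a time


@dataclass
class ProfileStub:
    """Basic data from a list page (hypothetical field names)."""
    username: str
    age: int
    photo_url: str    # input for stage 2
    profile_url: str  # input for stage 3


class ListPageParser(HTMLParser):
    """Collects one ProfileStub per profile block on a list page.

    ASSUMPTION: each profile is a <div class="profile" data-username=...
    data-age=...> holding one <img> and one <a>, with no nested <div>.
    """

    def __init__(self):
        super().__init__()
        self.stubs = []
        self._current = None  # fields of the profile block being read

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "div" and a.get("class") == "profile":
            self._current = {"username": a["data-username"],
                             "age": int(a["data-age"])}
        elif self._current is not None and tag == "img":
            self._current["photo_url"] = a["src"]
        elif self._current is not None and tag == "a":
            self._current["profile_url"] = a["href"]

    def handle_endtag(self, tag):
        # Closing </div> ends the profile block and stores the record.
        if tag == "div" and self._current is not None:
            self.stubs.append(ProfileStub(**self._current))
            self._current = None


def list_page_count(total_profiles):
    """Number of list pages needed to cover all profiles."""
    return ceil(total_profiles / PROFILES_PER_PAGE)


def parse_list_page(html):
    """Stage 1: turn one downloaded list page into ProfileStub records."""
    parser = ListPageParser()
    parser.feed(html)
    parser.close()
    return parser.stubs


# Made-up sample page standing in for a downloaded list page.
SAMPLE = """
<div class="profile" data-username="alice" data-age="31">
  <img src="/photos/alice_main.jpg">
  <a href="/profile/alice">View</a>
</div>
<div class="profile" data-username="bob" data-age="28">
  <img src="/photos/bob_main.jpg">
  <a href="/profile/bob">View</a>
</div>
"""

stubs = parse_list_page(SAMPLE)
photo_queue = [s.photo_url for s in stubs]      # work list for stage 2
profile_queue = [s.profile_url for s in stubs]  # work list for stage 3
```

With roughly 30,000 profiles at ten per page, `list_page_count(30000)` gives 3,000 list pages for the first stage. Keeping the photo and profile URLs as separate work lists lets the second and third stages run, or be retried, independently of the first.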