Motivation for Assignment

Although a lot of raw data is available in CSV, JSON, XML, and other “standard” formats, a great deal of published data resides within webpages, embedded in HTML markup (usually within tags, but not always). Developing strategies and techniques for extracting usable data from raw HTML is an important skill, and this assignment is designed to expose you to the techniques that you must master and apply when working with “web data”.


Details of Assignment

You are to study the following Wikipedia page: https://en.wikipedia.org/wiki/Taoiseach, paying particular attention to the two tables near the bottom of the page, called “President of the Executive Council” and “Taoiseach”. The data from both tables are to be scraped.
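
As a starting point, the following is a minimal sketch of one possible approach, not a prescribed solution. It uses only Python's standard-library html.parser module to collect every table in an HTML document as rows of cell text. The names TableExtractor, extract_tables and SAMPLE_HTML are hypothetical, and the sample markup is placeholder data rather than content taken from the Wikipedia page. The sketch assumes reasonably well-formed markup and ignores rowspan and colspan attributes, which a real solution for this page would probably need to handle.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect each top-level <table> as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []    # finished tables
        self._rows = None   # rows of the table currently being read
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read
        self._depth = 0     # <table> nesting depth

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
            if self._depth == 1:
                self._rows = []
        elif self._depth == 1:
            if tag == "tr":
                self._row = []
            elif tag in ("td", "th") and self._row is not None:
                self._cell = []

    def handle_endtag(self, tag):
        if tag == "table":
            if self._depth == 1:
                self.tables.append(self._rows)
                self._rows = None
            self._depth = max(0, self._depth - 1)
        elif self._depth == 1:
            if tag in ("td", "th") and self._cell is not None:
                # Join the fragments and collapse runs of whitespace.
                self._row.append(" ".join("".join(self._cell).split()))
                self._cell = None
            elif tag == "tr" and self._row is not None:
                if self._row:
                    self._rows.append(self._row)
                self._row = None
                self._cell = None

    def handle_data(self, data):
        # Text counts only while a cell is open; markup such as <a> is dropped.
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html):
    """Return all top-level tables found in the given HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    return parser.tables


# Placeholder markup (not real data) standing in for the downloaded page.
SAMPLE_HTML = """
<table>
<tr><th>No.</th><th>Name</th></tr>
<tr><td>1</td><td>Example <a href="#">Person A</a></td></tr>
</table>
<p>Text between the tables.</p>
<table>
<tr><th>No.</th><th>Name</th></tr>
<tr><td>1</td><td>Example Person B</td></tr>
</table>
"""

tables = extract_tables(SAMPLE_HTML)
```

To apply the sketch to the real page, the HTML could be downloaded first (for example with urllib.request.urlopen) and passed to extract_tables as a string. The two required tables would then have to be picked out from the full list, for instance by inspecting their header rows.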
