Python is known for web scraping (crawling) and data science. In this extra credit exercise, we take our first step into web scraping.


Description

Python is known for web scraping (crawling) and data science.  In this extra credit exercise, we take our first step into web scraping.  Note that web scraping takes a lot of trial and error before you find and extract what you are looking for.

1. Read the following 2 introductory tutorials and make sure you have a good grasp of their contents.  Say "Yes" if you did.

  • Survey of the libraries and tools for web scraping: https://www.scrapehero.com/python-web-scraping-frameworks/

  • Beginners guide to Web Scraping: https://www.scrapehero.com/a-beginners-guide-to-web-scraping-part-2-build-a-scraper-for-reddit/

2. Read or watch other sources on web scraping available on the web or YouTube, and list at least two of them (web sites or YouTube videos) with your own review of each.

I just need help with this one:

3. (10 points) Write your own Python program that does at least the following:

  • Access "http://www.ucdenver.edu/pages/ucdwelcomepage.aspx" page

  • Find a site table (???)  from 'docResponsive'

  • Go to the links from the site table

  • For each link, extract its title and URL and put the pair into a dictionary

  • Dump the dictionary to a JSON file

  • Then upload the JSON file.
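The steps above can be sketched roughly as follows, using only the standard library. This is one possible approach under several assumptions, not the instructor's solution: 'docResponsive' is assumed to be a class name or id on the element that wraps the site table, each link's title is assumed to be its anchor text (the sketch does not visit the linked pages), and the page markup is assumed to be reasonably well formed. The names `ContainerLinkParser`, `extract_links`, and `scrape_to_json` are made up for this illustration.

```python
import json
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

# Elements that never have a closing tag; they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ContainerLinkParser(HTMLParser):
    """Collect {title: url} for <a> tags inside one marked container element."""

    def __init__(self, marker, base_url):
        super().__init__()
        self.marker = marker      # assumed class name or id, e.g. "docResponsive"
        self.base_url = base_url  # used to turn relative hrefs into absolute URLs
        self.depth = 0            # greater than 0 while inside the container
        self.href = None          # href of the <a> currently being read
        self.text = []            # text pieces of the current <a>
        self.links = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth == 0:
            # Outside the container: only look for the element that opens it.
            classes = (attrs.get("class") or "").split()
            is_marker = self.marker in classes or attrs.get("id") == self.marker
            if is_marker and tag not in VOID_TAGS:
                self.depth = 1
            return
        if tag not in VOID_TAGS:
            self.depth += 1
        if tag == "a" and attrs.get("href"):
            self.href, self.text = attrs["href"], []

    def handle_endtag(self, tag):
        if self.depth == 0 or tag in VOID_TAGS:
            return
        if tag == "a" and self.href is not None:
            title = " ".join("".join(self.text).split())  # collapse whitespace
            if title:
                self.links[title] = urljoin(self.base_url, self.href)
            self.href = None
        self.depth -= 1

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)


def extract_links(html, marker="docResponsive", base_url=""):
    """Return a dictionary mapping link titles to URLs found in the container."""
    parser = ContainerLinkParser(marker, base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def scrape_to_json(url, out_path, marker="docResponsive"):
    """Fetch the page, extract the links, and dump the dictionary to JSON."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    links = extract_links(html, marker, base_url=url)
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(links, fh, indent=2, ensure_ascii=False)
    return links
```

Calling `scrape_to_json("http://www.ucdenver.edu/pages/ucdwelcomepage.aspx", "links.json")` would then write the file to upload. Because the dictionary is keyed by title, two links with the same title keep only the last URL; a third-party parser such as BeautifulSoup would also cope better with unclosed tags than this depth counter does.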

Below are the first 10 links that I got.  Please note that there are many different ways of scraping and mine is just one example.  I added sequence numbers for my own counting purposes, but you don't have to.

1.Degrees & Programs - https://www.ucdenver.edu/programs

2.Pre-college Programs - https://www.ucdenver.edu/programs/pre-college

3.K-12 Outreach - https://www.ucdenver.edu/programs/K12Outreach

4.Bachelor's Degrees - https://www.ucdenver.edu/programs/undergraduate

5.Master's Degrees - https://www.ucdenver.edu/programs/masters

6.Doctorate Degrees - https://www.ucdenver.edu/programs/phd

7.Online Programs - https://www1.ucdenver.edu/online

8.Certificate Programs - https://www.ucdenver.edu/programs/certificate

9.Health Programs - https://www.ucdenver.edu/programs/health

10.Continuing Education - https://www.ucdenver.edu/academics/continuing-education
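A quick way to check the dumped file against a listing like the one above is to load the JSON back and print the first few entries in the same "number.title - URL" form. This is only a sketch: the file name `links.json` and the helpers `load_links` and `format_links` are assumptions made for illustration, and the JSON file is assumed to hold a single object mapping titles to URLs.

```python
import json
from itertools import islice


def load_links(path):
    """Read the title -> URL dictionary back from the JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def format_links(links, limit=10):
    """Return up to `limit` lines in the form 'N.Title - URL'."""
    first = islice(links.items(), limit)
    return [f"{n}.{title} - {url}" for n, (title, url) in enumerate(first, start=1)]


def print_links(path="links.json", limit=10):
    """Print the first `limit` links stored in the JSON file."""
    for line in format_links(load_links(path), limit):
        print(line)
```

Python dictionaries and `json.load` both preserve insertion order, so the entries come back in the order they were scraped.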

