The Wonders of Beautiful Soup

Steven Diamond
2 min read · Apr 21, 2020

If you had asked me just last week what beautiful soup was, I might have asked you if you meant tomato-basil (one of my favorites). It turns out that Beautiful Soup is a Python library that lets data scientists pull data out of HTML files, a process called web scraping. You can navigate and search a page's HTML for exactly what you need for a study.

As we were being taught to use this library, I thought about all of the ways that this would have helped me during my years as a marketer:

  • Gathering packaging options at cable operators
  • Checking to make sure that VOD programming was available for viewing with the titles and
  • Easily accessing pricing information in real time to match up with sales data

Using Beautiful Soup is relatively easy. It takes very little code to import the library, request a webpage by its URL (Beautiful Soup only parses HTML, so the download itself is usually handled by a companion library such as requests), and then create an object to search that page. It can take a little trial and error to identify the exact path to your information, but once you find it, you can set up iterative processes to gather data, clean it, and build DataFrames/tables. Beautiful Soup even helps this process along by converting incoming documents into Unicode text.
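
To make that workflow concrete, here is a minimal sketch of what it might look like. The HTML is a made-up stand-in for a downloaded page, and the table id, class names, package names, and prices are all invented for illustration; a real site will have its own structure, which is where the trial and error comes in.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a downloaded page. In practice this text
# would come from an HTTP client, for example requests.get(url).text.
html = """
<html><body>
  <table id="packages">
    <tr><th>Package</th><th>Price</th></tr>
    <tr class="package"><td class="name">Basic</td><td class="price">$29.99</td></tr>
    <tr class="package"><td class="name">Premium</td><td class="price">$59.99</td></tr>
  </table>
</body></html>
"""

# Create the searchable object using Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Loop over every matching row, pull out the text, and clean it up.
rows = []
for tr in soup.find_all("tr", class_="package"):
    name = tr.find("td", class_="name").get_text(strip=True)
    price_text = tr.find("td", class_="price").get_text(strip=True)
    rows.append({"package": name, "price": float(price_text.lstrip("$"))})

print(rows)
```

A list of dictionaries like `rows` can be handed straight to `pandas.DataFrame(rows)` to get a table for further analysis.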

It’s such an exciting time for me as I learn about amazing tools like Beautiful Soup. Can’t wait to see what they teach us this week. Time to go have my lunch.

From TheCozyApron.com

Steven Diamond

After spending my career in Marketing and Business Development, I am taking the Data Science Immersive course at GA and looking forward to the next step.