How can I loop scraping data for multiple pages in a website using python and beautifulsoup4

The PGA website's search have multiple pages, the url follows the pattern:

http://www.pga.com/golf-courses/search?page=1 # Additional info after page parameter here

this means you can read the content of the page, then change the value of page by 1, and read the the next page.... and so on.

import csv
import requests 
from bs4 import BeautifulSoup
for i in range(907):      # Number of pages plus one 
    url = "http://www.pga.com/golf-courses/search?page={}&searchbox=Course+Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0".format(i)
    r = requests.get(url)
    soup = BeautifulSoup(r.content)

    # Your code for each individual page here

if you still read this post , you can try this code too....

from urllib.request import urlopen
from bs4 import BeautifulSoup

file = "Details.csv"
f = open(file, "w")
Headers = "Name,Address,City,Phone,Website\n"
f.write(Headers)
for page in range(1,5):
    url = "http://www.pga.com/golf-courses/search?page={}&searchbox=Course%20Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0".format(page)
    html = urlopen(url)
    soup = BeautifulSoup(html,"html.parser")
    Title = soup.find_all("div", {"class":"views-field-nothing"})
    for i in Title:
        try:
            name = i.find("div", {"class":"views-field-title"}).get_text()
            address = i.find("div", {"class":"views-field-address"}).get_text()
            city = i.find("div", {"class":"views-field-city-state-zip"}).get_text()
            phone = i.find("div", {"class":"views-field-work-phone"}).get_text()
            website = i.find("div", {"class":"views-field-website"}).get_text()
            print(name, address, city, phone, website)
            f.write("{}".format(name).replace(",","|")+ ",{}".format(address)+ ",{}".format(city).replace(",", " ")+ ",{}".format(phone) + ",{}".format(website) + "\n")
        except: AttributeError
f.close()

where it is written range(1,5) just change that with 0,to the last page , and you will get all details in CSV, i tried very hard to get your data in proper format but it's hard:).

You're putting a link to a single page, it's not going to iterate through each one on its own.

Page 1:

url = "http://www.pga.com/golf-courses/search?searchbox=Course+Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0"

Page 2:

http://www.pga.com/golf-courses/search?page=1&searchbox=Course%20Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0

Page 907: http://www.pga.com/golf-courses/search?page=906&searchbox=Course%20Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0

Since you're running for page 1 you'll only get 20. You'll need to create a loop that'll run through each page.

You can start off by creating a function that does one page then iterate that function.

Right after the search? in the url, starting at page 2, page=1 begins increasing until page 907 where it's page=906.

How can I loop scraping data for multiple pages in a website using python and beautifulsoup4

Tags:

Python

Csv

Loops

Web Scraping

Beautifulsoup

Related

Recent Posts