How to scrape all topics from twitter

To scrape all the main topics e.g. Arts & culture, Business & finance, etc using Selenium and python you have to induce WebDriverWait for visibility_of_all_elements_located() and you can use either of the following Locator Strategies:

  • Using XPATH and text attribute:

    driver.get("https://twitter.com/i/flow/topics_selector")
    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[contains(., 'see top Tweets about them in your timeline')]//following::div[@role='button']/div/span")))])
    
  • Using XPATH and get_attribute():

    driver.get("https://twitter.com/i/flow/topics_selector")
    print([my_elem.get_attribute("textContent") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[contains(., 'see top Tweets about them in your timeline')]//following::div[@role='button']/div/span")))])
    
  • Console Output:

    ['Arts & culture', 'Business & finance', 'Careers', 'Entertainment', 'Fashion & beauty', 'Food', 'Gaming', 'Lifestyle', 'Movies and TV', 'Music', 'News', 'Outdoors', 'Science', 'Sports', 'Technology', 'Travel']
    

To scrape all the main and sub topics using Selenium and WebDriver you can use the following Locator Strategy:

  • Using XPATH and get_attribute("textContent"):

    driver.get("https://twitter.com/i/flow/topics_selector")
    elements =  WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[contains(., 'see top Tweets about them in your timeline')]//following::div[@role='button']/div/span")))
    for element in elements:
        element.click()
    print([my_elem.get_attribute("textContent") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@role='button']/div/span[text()]")))])
    driver.quit()
    
  • Console Output:

    ['Arts & culture', 'Animation', 'Art', 'Books', 'Dance', 'Horoscope', 'Theater', 'Writing', 'Business & finance', 'Business personalities', 'Business professions', 'Cryptocurrencies', 'Careers', 'Education', 'Fields of study', 'Entertainment', 'Celebrities', 'Comedy', 'Digital creators', 'Entertainment brands', 'Podcasts', 'Popular franchises', 'Theater', 'Fashion & beauty', 'Beauty', 'Fashion', 'Food', 'Cooking', 'Cuisines', 'Gaming', 'Esports', 'Game development', 'Gaming hardware', 'Gaming personalities', 'Tabletop gaming', 'Video games', 'Lifestyle', 'Animals', 'At home', 'Collectibles', 'Family', 'Fitness', 'Unexplained phenomena', 'Movies and TV', 'Movies', 'Television', 'Music', 'Alternative', 'Bollywood music', 'C-pop', 'Classical music', 'Country music', 'Dance music', 'Electronic music', 'Hip-hop & rap', 'J-pop', 'K-hip hop', 'K-pop', 'Metal', 'Musical instruments', 'Pop', 'R&B and soul', 'Radio stations', 'Reggae', 'Reggaeton', 'Rock', 'World music', 'News', 'COVID-19', 'Local news', 'Social movements', 'Outdoors', 'Science', 'Biology', 'Sports', 'American football', 'Australian rules football', 'Auto racing', 'Baseball', 'Basketball', 'Combat Sports', 'Cricket', 'Extreme sports', 'Fantasy sports', 'Football', 'Golf', 'Gymnastics', 'Hockey', 'Lacrosse', 'Pub sports', 'Rugby', 'Sports icons', 'Sports journalists & coaches', 'Tennis', 'Track & field', 'Water sports', 'Winter sports', 'Technology', 'Computer programming', 'Cryptocurrencies', 'Data science', 'Information security', 'Operating system', 'Tech brands', 'Tech personalities', 'Travel', 'Adventure travel', 'Destinations', 'Transportation']
    
  • Note : You have to add the following imports :

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC