How to write a Twitter Data Scraper in Python
There are two popular Python modules that help collect tweets via APIs: tweepy and tweety.
In this tutorial, we will find out how to work with both modules.
tweepy module
To collect Twitter data using Tweepy in Python, we will need to first create a Twitter developer account and obtain our API keys and access tokens.
Now, let’s do the following steps:
- Install the Tweepy library by running pip install tweepy in your terminal or command prompt.
- Let’s have the basic setup:
import tweepy

consumer_key = "CONSUMER_KEY"
consumer_secret = "CONSUMER_SECRET"
access_token = "ACCESS_TOKEN"
access_token_secret = "ACCESS_TOKEN_SECRET"

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
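Hard-coding keys in a script makes them easy to leak. As a minimal sketch, the same setup could read them from environment variables instead. The variable names (TWITTER_CONSUMER_KEY and so on) and the helper names load_credentials and build_api are assumptions made for this illustration; they are not part of Tweepy.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=os.environ):
    """Return the four credentials as a dict, or raise if any is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def build_api(creds):
    """Create the tweepy.API object the same way as the basic setup."""
    import tweepy  # imported here so load_credentials works without Tweepy

    auth = tweepy.OAuthHandler(
        creds["TWITTER_CONSUMER_KEY"], creds["TWITTER_CONSUMER_SECRET"]
    )
    auth.set_access_token(
        creds["TWITTER_ACCESS_TOKEN"], creds["TWITTER_ACCESS_TOKEN_SECRET"]
    )
    return tweepy.API(auth)
```

With the variables exported in the shell, api = build_api(load_credentials()) gives the same api object as before.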
Collecting Tweets
Now, we can use Tweepy to search for tweets using various parameters, such as keywords, hashtags, location, and more. Here’s an example:
# Tweepy 4.0+ calls this search_tweets; older versions use api.search
tweets = api.search_tweets(q="Cyber Security", lang="en", tweet_mode="extended")
Here, the q parameter specifies the query string to search for, lang specifies the language of the tweets, and tweet_mode="extended" specifies that we want to retrieve the full text of the tweets rather than a truncated version.
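With tweet_mode="extended", each returned tweet exposes its text as full_text, which is the attribute the CSV code in the Storing Tweets section relies on. Here is a small sketch for previewing the results. The helper name preview and the 80-character cut-off are assumptions for illustration; the function only expects objects that have full_text and created_at attributes.

```python
def preview(tweets, width=80):
    """Return one short line per tweet: creation time plus truncated text."""
    lines = []
    for tweet in tweets:
        # Collapse newlines and repeated spaces so each tweet fits on one line.
        text = " ".join(tweet.full_text.split())
        if len(text) > width:
            text = text[: width - 3] + "..."
        lines.append(f"{tweet.created_at}  {text}")
    return lines
```

Calling print("\n".join(preview(tweets))) then gives a quick look at what the search returned before storing anything.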
Let’s look at some other examples:
tweets = api.user_timeline(screen_name="elonmusk", count=10, tweet_mode="extended")
Here, we are collecting the 10 most recent tweets from a specific user’s timeline.
And the following code retrieves up to 100 recent tweets that contain the word Spaceship and are located within a 50 km radius of San Francisco:
# search_tweets was named search before Tweepy 4.0
tweets = api.search_tweets(q="Spaceship", geocode="37.7749,-122.4194,50km", count=100, tweet_mode="extended")
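The geocode value is a single string in the form latitude,longitude,radius, where the radius carries a km or mi unit. As a sketch, a small helper can build and sanity-check that string. The name build_geocode is hypothetical and not part of Tweepy.

```python
def build_geocode(latitude, longitude, radius, unit="km"):
    """Build a 'lat,long,radius' string for the geocode search parameter."""
    if unit not in ("km", "mi"):
        raise ValueError("unit must be 'km' or 'mi'")
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180")
    if radius <= 0:
        raise ValueError("radius must be positive")
    return f"{latitude},{longitude},{radius}{unit}"


# Same area as the San Francisco example.
san_francisco = build_geocode(37.7749, -122.4194, 50)
```

The result can be passed straight to the search call as geocode=san_francisco.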
Storing Tweets
Once we are done, we can store the tweets in a CSV or JSON file. Here’s the code to store them in a CSV file:
import csv

with open("tweets.csv", "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["Text", "Created At"])
    for tweet in tweets:
        writer.writerow([tweet.full_text, tweet.created_at])
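For the JSON option, a comparable sketch is shown below. The function name save_tweets_json and the field names text and created_at are choices made for this example. The creation time is converted with str() because datetime objects are not directly JSON serializable.

```python
import json


def save_tweets_json(tweets, path="tweets.json"):
    """Write the text and creation time of each tweet to a JSON file."""
    records = [
        {"text": tweet.full_text, "created_at": str(tweet.created_at)}
        for tweet in tweets
    ]
    with open(path, "w", encoding="utf-8") as file:
        # ensure_ascii=False keeps emoji and non-Latin text readable in the file.
        json.dump(records, file, ensure_ascii=False, indent=2)
    return len(records)
```

Calling save_tweets_json(tweets) writes tweets.json to the current directory and returns the number of tweets saved.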
tweety module
There’s another module named tweety whose author claims that it has no limit on the number of tweets it can collect.
This one uses BeautifulSoup, requests, and openpyxl to collect tweets using the frontend API.
Based on the author’s GitHub repo, here’s an example:
#!pip install tweety-ns
from tweety.bot import Twitter
app = Twitter("elonmusk")
all_tweets = app.get_tweets()
for tweet in all_tweets:
    print(tweet)
That’s all for today! Cheers!!! 😎