【402】Twitter Data Collection



1. Collecting real-time data for a given region

Name: AUS.py

#Import the necessary methods from tweepy library (StreamListener exists in tweepy 3.x; tweepy 4.0 merged it into Stream)
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
#Variables that contain the user credentials to access Twitter API
access_token = "*****"
access_token_secret = "*****"
consumer_key = "*****"
consumer_secret = "*****"

#This is a basic listener that just prints received tweets to stdout.
class StdOutListener(StreamListener):

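    def on_data(self, data):
        #Print the raw JSON of each tweet so it can be redirected to a file
        print(data)
        return True

    def on_error(self, status):
        #Print the HTTP status code returned by the API
        print(status)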

if __name__ == '__main__':
    #This handles Twitter authentication and the connection to Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    #This line filters the Twitter stream by bounding box [west lon, south lat, east lon, north lat], here roughly Australia
    stream.filter(locations=[112, -44, 154, -9])
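
For reference, the locations value is a bounding box given as longitude/latitude pairs with the south-west corner first, so [112, -44, 154, -9] roughly covers Australia. The sketch below is not part of AUS.py; in_bounding_box is a hypothetical helper that only illustrates how such a box is read.

```python
# Hypothetical helper, not part of tweepy: checks whether a point lies in a
# bounding box written as [west_lon, south_lat, east_lon, north_lat].
def in_bounding_box(lon, lat, box):
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north

AUSTRALIA = [112, -44, 154, -9]

# Sydney is at about 151.2 E, 33.9 S; London is at about 0.1 W, 51.5 N.
print(in_bounding_box(151.2, -33.9, AUSTRALIA))  # True
print(in_bounding_box(-0.1, 51.5, AUSTRALIA))    # False
```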

Run the code from cmd with python AUS.py > 2019-06-07.txt to store the data as it arrives.

This redirects everything the program prints into the text file, so whatever print() outputs is saved directly.
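
A similar result can be had inside Python, without shell redirection, by passing an open file to print() through its file argument. This is only a minimal sketch: save_line and the example file name are assumptions, and it appends to the file, whereas > overwrites it.

```python
import os
import tempfile

# Hypothetical helper: append one line of text to a file, much as
# "python AUS.py > file.txt" captures each print() call.
def save_line(path, text):
    with open(path, "a", encoding="utf-8") as f:
        print(text, file=f)

# Example file in a fresh temporary directory (an assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "tweets_demo.txt")
save_line(path, '{"text": "first tweet"}')
save_line(path, '{"text": "second tweet"}')

with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(len(lines))  # 2
```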


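Because the program tends to crash once a certain amount of data has been stored, a Twilio SMS alert is added so that a text message is sent as soon as a crash occurs. The implementation is as follows: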

File name: AUS_SMS.py

#Import the necessary methods from tweepy library
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
from twilio.rest import Client 
import time
#Variables that contain the user credentials to access Twitter API
access_token = "*****"
access_token_secret = "*****"
consumer_key = "*****"
consumer_secret = "*****"

#This is a basic listener that just prints received tweets to stdout.
class StdOutListener(StreamListener):

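    def on_data(self, data):
        #Print the raw JSON of each tweet so it can be redirected to a file
        print(data)
        return True

    def on_error(self, status):
        #Print the HTTP status code returned by the API
        print(status)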

#Placeholders: the number that receives the alert and the Twilio number that sends it
myNumber = '*****'
twilioNumber = '*****'

def textMessage(message):
    #Twilio account SID and auth token
    account = '*****'
    token = '*****'
    client = Client(account, token)
    client.messages.create(to=myNumber, from_=twilioNumber, body=message)

if __name__ == '__main__':
    #This handles Twitter authentication and the connection to Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    try:
        #This line filters the Twitter stream by bounding box [west lon, south lat, east lon, north lat]
        stream.filter(locations=[112, -44, 154, -9])
    finally:
        #Runs whether filter() returns or raises, so the alert is sent on a crash too
        textMessage("n(*≧▽≦*)n [HELP] Program crashed!!!\nTime: " + time.asctime())
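
A crash usually surfaces as an exception, and code placed after a call that raises does not run. One way to make sure the alert still goes out is try/finally. The sketch below shows the pattern on its own, with no Twilio dependency; run_with_alert, failing_task and the message text are hypothetical names used only for illustration.

```python
# Hypothetical wrapper: run task() and call notify() when it stops,
# whether it returned normally or raised an exception.
def run_with_alert(task, notify):
    try:
        task()
    finally:
        notify("Program stopped")

sent = []  # stands in for the SMS gateway in this sketch

def failing_task():
    raise RuntimeError("connection lost")

try:
    run_with_alert(failing_task, sent.append)
except RuntimeError:
    pass  # the exception still propagates after the alert is sent

print(sent)  # ['Program stopped']
```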

3. Running indefinitely

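A Python file can be launched directly from another Python file, so wrapping that call in an infinite loop keeps the data collection running indefinitely.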


import os
import time

while True:    
    year = str(time.localtime().tm_year)
    mon = str(time.localtime().tm_mon)
    day = str(time.localtime().tm_mday)
    filename = year + '-' + mon.zfill(2) + '-' + day.zfill(2)
    i = 0
    while os.path.exists(os.path.join(os.getcwd(), filename + '.txt')):
        i += 1
        filename = year + '-' + mon.zfill(2) + '-' + day.zfill(2) + '-' + str(i)
    os.system("python AUS_SMS.py > " + filename + '.txt')

Files are named after the current date. If a file for the same day already exists, a suffix is appended: -1, then -2, and so on.
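
The naming rule can also be written as a small pure function, which makes it easy to check without touching the disk. This is a sketch under assumptions: next_filename is a hypothetical name, and existing is a set of file names that are already taken.

```python
# Hypothetical pure version of the naming rule: `existing` is a set of file
# names that are already taken.
def next_filename(date_str, existing):
    name = date_str
    i = 0
    while name + ".txt" in existing:
        i += 1
        name = date_str + "-" + str(i)
    return name + ".txt"

print(next_filename("2019-06-07", set()))
# 2019-06-07.txt
print(next_filename("2019-06-07", {"2019-06-07.txt", "2019-06-07-1.txt"}))
# 2019-06-07-2.txt
```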

The os.system() method has the same effect as running the Python file from cmd.
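
As an alternative to os.system(), the standard subprocess module can start the child process and send its output to a file without going through the shell. A minimal sketch, assuming a hypothetical helper run_to_file; a one-line child program stands in for AUS_SMS.py here.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical helper: run a command and write its stdout to out_path,
# similar to os.system("python script.py > out.txt").
def run_to_file(args, out_path):
    with open(out_path, "w", encoding="utf-8") as out:
        return subprocess.run(args, stdout=out).returncode

out_path = os.path.join(tempfile.mkdtemp(), "out.txt")
# sys.executable is the running interpreter; the -c program is a stand-in.
code = run_to_file([sys.executable, "-c", "print('hello')"], out_path)

with open(out_path, encoding="utf-8") as f:
    content = f.read()
```

The return code makes it possible to tell a clean exit from a crash before starting the next run.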
