tapela.blogg.se - Tools for data analysis for tweets

At Mango we use a variety of tools in-house to address our clients’ business needs, and when these fall within the data science arena, the main candidates we turn to are the R and Python programming languages. The question as to which is the “best” language for doing data science is a hotly debated topic, with both languages having their pros and cons. However, the capabilities of each are expanding all the time thanks to continuous open source development in both areas. With both languages becoming increasingly popular for data analysis, we thought it would be interesting to track current trends and see what people are saying about these and other tools for data science on Twitter.

This post is the first of three that will look into the results of our analysis, but first a bit of background. Today, many companies routinely draw on social media data sources such as Twitter and Facebook to enhance their business decision making in a number of ways. This type of analysis can be a component of market research, an avenue for collecting customer feedback, or a way to promote campaigns and conduct targeted advertising. To facilitate this type of analysis, Twitter offers a variety of Application Programming Interfaces, or APIs, that enable an application to interact programmatically with the services Twitter provides. These APIs currently come in three main flavours.

REST API – Allows automated access to searching, reading and writing tweets.

Streaming API – Allows tracking of multiple users and/or search terms in near real time, though results may only be a sample.

Twitter Firehose – Allows tracking of all tweets past and future, with no limits on the search results returned.

These different approaches have different trade-offs. The REST API can only search past tweets, and is limited in how far back you can search, as Twitter only keeps the last couple of weeks of data. The Streaming API tracks tweets as they happen, but Twitter only guarantees that a sample of all current tweets will be collected. This means that if your search term is very generic and matches a lot of tweets, not all of these tweets will be returned. The Twitter Firehose addresses the shortcomings of the previous two APIs, but at quite a substantial cost, whereas the other two are free to use. There are also a growing number of third-party intermediaries that have access to the Twitter Firehose and sell on the Twitter data they collect. We chose to use the Streaming API to collect tweets containing the hashtags “python” and/or “rstats” and/or “datascience” over a 10-day period.
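To make the “and/or” hashtag tracking described in this post more concrete, here is a minimal Python sketch of the matching logic applied locally to tweet text. It does not call Twitter at all: the names `TRACKED`, `matched_hashtags` and `filter_tweets` are hypothetical, the sample tweets are made up for illustration, and case-insensitive matching is an assumption rather than a statement about how the Streaming API behaves.

```python
import re

# Hashtags tracked in the post. Case-insensitive matching is an assumption.
TRACKED = {"python", "rstats", "datascience"}

# A hashtag is taken to be '#' followed by word characters (a simplification).
HASHTAG_RE = re.compile(r"#(\w+)")


def matched_hashtags(text, tracked=TRACKED):
    """Return the set of tracked hashtags that appear in a tweet's text."""
    found = {tag.lower() for tag in HASHTAG_RE.findall(text)}
    return found & set(tracked)


def filter_tweets(tweets, tracked=TRACKED):
    """Keep tweets mentioning at least one tracked hashtag ("and/or" semantics)."""
    return [t for t in tweets if matched_hashtags(t, tracked)]


# Made-up sample tweets, for illustration only.
sample = [
    "Loving #Python for data wrangling",
    "New #rstats package released, great for #DataScience",
    "Nothing relevant here",
]
kept = filter_tweets(sample)
```

A tweet is kept if it carries any one of the tracked hashtags, so a tweet tagged with both “rstats” and “datascience” is collected once but can be counted under both tools in later analysis.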