Social media analytics is one of the major uses of Big Data. This article demonstrates how to use Hive in Azure HDInsight to analyse Twitter feeds. Tweets about "Mauritius" will be extracted and analysed using Azure HDInsight, and the article covers the whole process, from gathering the tweets to viewing the results in Microsoft Excel.
2. Big Data Analytics using Microsoft Azure: Hive
The first step is to create an application on Twitter. This allows you to use the Twitter APIs to read data from Twitter and, if required, post tweets.
Below are the steps to create your first Twitter application:
1. Go to https://dev.twitter.com/apps
2. Click on Create New App
3. Fill in the required details and create the App
Once your app is created, navigate to Keys and Access Tokens in your app. This is where the keys needed to read data from Twitter can be obtained.
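The consumer key and secret and the access token and secret found there are used to sign every API request with OAuth 1.0a. In this article the tweets are fetched by a PowerShell script that is not reproduced here, so the following Python sketch only illustrates the HMAC-SHA1 signing steps defined by OAuth 1.0a (RFC 5849). The search endpoint URL and its `q` and `count` parameters are assumptions based on Twitter's v1.1 REST API, and the key values are placeholders.

```python
import base64
import hashlib
import hmac
import secrets
import time
from urllib.parse import quote


def percent_encode(value: str) -> str:
    # OAuth 1.0a uses RFC 3986 encoding: only A-Z, a-z, 0-9, '-', '.', '_' and '~' stay unencoded.
    return quote(value, safe="~")


def oauth_signature(method: str, url: str, params: dict, consumer_secret: str, token_secret: str) -> str:
    # 1. Encode every key and value, sort the pairs, and join them into the parameter string.
    pairs = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    # 2. Signature base string: METHOD & encoded URL & encoded parameter string.
    base_string = "&".join([method.upper(), percent_encode(url), percent_encode(param_string)])
    # 3. Signing key: encoded consumer secret & encoded access token secret.
    key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def authorization_header(method: str, url: str, query: dict, consumer_key: str, consumer_secret: str,
                         access_token: str, access_token_secret: str) -> str:
    oauth = {
        "oauth_consumer_key": consumer_key,
        "oauth_nonce": secrets.token_hex(16),
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": str(int(time.time())),
        "oauth_token": access_token,
        "oauth_version": "1.0",
    }
    # Query parameters and oauth_* parameters are signed together; only oauth_* go in the header.
    oauth["oauth_signature"] = oauth_signature(method, url, {**query, **oauth}, consumer_secret, access_token_secret)
    return "OAuth " + ", ".join(f'{percent_encode(k)}="{percent_encode(v)}"' for k, v in sorted(oauth.items()))


if __name__ == "__main__":
    search_url = "https://api.twitter.com/1.1/search/tweets.json"  # assumed v1.1 search endpoint
    query = {"q": "Mauritius", "count": "100"}
    # Placeholders: replace with the values from Keys and Access Tokens.
    header = authorization_header("GET", search_url, query, "CONSUMER_KEY", "CONSUMER_SECRET",
                                  "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
    request_url = search_url + "?" + "&".join(f"{percent_encode(k)}={percent_encode(v)}" for k, v in query.items())
    print(request_url)
    print(header)
```

The query string actually sent must be encoded exactly as it was when signed, which is why the sketch builds `request_url` with the same `percent_encode` function.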
In this article, the tweets are extracted using a PowerShell script and uploaded to Azure Storage before being processed by Azure HDInsight.
Below are the steps to extract the tweets and save them to Azure Blob Storage.
In this step, all the variables used in the PowerShell script are defined.
This will open an interface to capture your login credentials.
If the authentication is successful, your ID and subscriptions should be displayed in PowerShell.
Save the tweets to the blob storage
At this stage, the file has been uploaded to the default container of the blob storage at the path Tweets/MRUTweets.txt.
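Because Hive reads a text file line by line, each tweet should occupy exactly one line of MRUTweets.txt so that every row of RAW_TWEETS holds one complete JSON document. The PowerShell script that writes the file is not shown, so the Python sketch below only illustrates that format; it assumes the search response has the v1.1 shape `{"statuses": [...]}`.

```python
import json


def to_json_lines(search_response: dict) -> str:
    # json.dumps escapes newlines inside tweet text as "\n", so each tweet stays on one physical line.
    lines = [json.dumps(status, ensure_ascii=False) for status in search_response.get("statuses", [])]
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    sample = {"statuses": [
        {"id_str": "1", "text": "Sunset in\nMauritius", "user": {"location": "Port Louis"}},
        {"id_str": "2", "text": "Le Morne", "user": {"location": "Mauritius"}},
    ]}
    # Two lines of output, one per tweet, even though the first text contains a newline.
    print(to_json_lines(sample), end="")
```

With `ensure_ascii=False` the file keeps accented characters readable, so it should be written as UTF-8 before upload.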
The steps below describe how to read the tweets from Blob Storage and analyse them using Hive on HDInsight.
From your Hadoop Cluster, click on Query Console.
Fill in your credentials and go to the Hive Editor
Load all the data from the file into the table RAW_TWEETS.
The location above should point to the same path where the tweets were saved in Blob Storage.
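On HDInsight, Blob Storage paths are addressed with `wasb://` URIs of the form `wasb://<container>@<account>.blob.core.windows.net/<path>`. A Hive table LOCATION names a directory, and every file in it becomes part of the table, so for a file uploaded to Tweets/MRUTweets.txt the location would be the Tweets/ folder. The Python sketch below builds that URI and a matching table definition; the container and account names and the single column `json_response` are hypothetical, since the original CREATE TABLE statement is not shown.

```python
def wasb_uri(container: str, account: str, path: str = "", secure: bool = False) -> str:
    # wasbs:// is the TLS variant of the same scheme.
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def folder_of(blob_path: str) -> str:
    # A Hive table LOCATION is a directory, so keep everything up to the last '/'.
    return blob_path.rsplit("/", 1)[0] + "/" if "/" in blob_path else ""


def raw_tweets_ddl(location: str) -> str:
    # One STRING column holding each line of the file; the column name is hypothetical.
    return (
        "CREATE EXTERNAL TABLE IF NOT EXISTS RAW_TWEETS (json_response STRING)\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}';"
    )


if __name__ == "__main__":
    # Hypothetical container and storage account names.
    location = wasb_uri("mycontainer", "mystorageaccount", folder_of("Tweets/MRUTweets.txt"))
    print(raw_tweets_ddl(location))
```

For the cluster's default container, a shorter form such as `wasb:///Tweets/` is typically accepted as well.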
Below are the contents of the table RAW_TWEETS.
This is where the processed (parsed) JSON Twitter data will be stored.
Parse the JSON tweets from table RAW_TWEETS and store them into table TWEETS.
View the data in the TWEETS table.
The result can also be downloaded and viewed locally.
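One common way to parse JSON in HiveQL is the built-in `get_json_object` function, for example `get_json_object(json_response, '$.user.location')`. To make the extraction step concrete without a cluster, the Python sketch below approximates `get_json_object` for simple dotted paths (no arrays or wildcards) and maps each raw line to a row. The column names in `TWEET_COLUMNS` are hypothetical, as the article does not list the columns of TWEETS.

```python
import json
from typing import Any, Optional


def get_json_path(json_text: str, path: str) -> Optional[str]:
    # Approximates Hive's get_json_object for paths like '$.user.location'.
    try:
        node: Any = json.loads(json_text)
    except json.JSONDecodeError:
        return None  # invalid JSON yields NULL, mirrored here as None
    for key in path.lstrip("$").strip(".").split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    if node is None:
        return None
    # Strings are returned as-is; numbers, booleans and objects as JSON text.
    return node if isinstance(node, str) else json.dumps(node)


# Hypothetical TWEETS columns mapped to tweet JSON paths.
TWEET_COLUMNS = {
    "id": "$.id_str",
    "created_at": "$.created_at",
    "text": "$.text",
    "screen_name": "$.user.screen_name",
    "location": "$.user.location",
}


def parse_raw_tweets(raw_lines):
    # One RAW_TWEETS line in, one TWEETS row (dict) out; blank lines are skipped.
    for line in raw_lines:
        if line.strip():
            yield {col: get_json_path(line, path) for col, path in TWEET_COLUMNS.items()}
```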
View the data in the topregion table.
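The article does not show how the topregion table is built. Assuming it holds the number of tweets per user location, most frequent first, the computation amounts to a group-and-count. The Python sketch below performs it over rows represented as dicts with a `location` key (a hypothetical stand-in for the TWEETS table); the key name and the default limit of 10 are assumptions, and a rough HiveQL equivalent is given in the comments.

```python
from collections import Counter

# Roughly equivalent HiveQL (hypothetical table and column names):
#   SELECT trim(location) AS region, COUNT(*) AS tweet_count
#   FROM TWEETS
#   WHERE trim(location) <> ''
#   GROUP BY trim(location)
#   ORDER BY tweet_count DESC
#   LIMIT 10;


def top_regions(rows, limit=10):
    # Count tweets per non-empty, whitespace-trimmed location and keep the most frequent.
    counts = Counter(
        row["location"].strip()
        for row in rows
        if row.get("location") and row["location"].strip()
    )
    return counts.most_common(limit)
```

The user location on a tweet is free text typed by the user, so in practice some normalisation (case, spelling variants) would likely be needed before counting.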
The Microsoft Hive ODBC driver can be found at the following location:
Note: It was noticed that both the 32-bit and 64-bit versions of the driver had to be installed for it to work.
Open the ODBC Data Source Administrator by clicking the Start button, clicking Control Panel, clicking Additional Options, and then clicking Data Sources (ODBC). If you're prompted for an administrator password or confirmation, type the password or provide confirmation.
Under User DSN, click Add and select Microsoft Hive ODBC Driver.
Fill in the required information as shown below, including your password.
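The same DSN can also be used outside Excel. As a hedged illustration, the Python sketch below assembles an ODBC connection string that references the DSN through the standard `DSN`, `UID` and `PWD` keywords, wrapping values that contain `;` or braces in braces and doubling any `}`. The DSN name is hypothetical, and whether a particular driver honours every quoting rule should be checked against its documentation.

```python
def _quote(value: str) -> str:
    # Braces let a value contain ';'; a literal '}' inside braces is written as '}}'.
    if any(ch in value for ch in ";{}") or value != value.strip():
        return "{" + value.replace("}", "}}") + "}"
    return value


def odbc_connection_string(**attributes: str) -> str:
    # Keyword order is preserved, e.g. DSN=...;UID=...;PWD=...
    return ";".join(f"{key}={_quote(value)}" for key, value in attributes.items())


if __name__ == "__main__":
    # Hypothetical DSN name and credentials.
    print(odbc_connection_string(DSN="HDInsightHive", UID="admin", PWD="p;ssword"))
```

With the third-party pyodbc package installed, such a string could be passed to `pyodbc.connect` to query the Hive tables from a script.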
a. In Excel, go to the Data tab and select From Other Sources, then Microsoft Query
b. Select your data source
c. Add your required tables
iv. View your data in Excel
This article focused on demonstrating how Twitter feeds can be analysed using Hive and HDInsight. However, the analysis does not end here; it can be extended to discover a great deal of information about a company's customers.
Imagine a company extracting Twitter feeds about its products, loading the data into its data warehouse and linking it to its existing customer base. The scope for customer insight is very wide: from discovering where customers hold negative views of its products, and hence where to focus advertising, to understanding which customers need which products.
Social media analytics is now crucial to understanding customer behaviour.
Moreover, technologies like HDInsight, where the infrastructure is managed by Azure and users can focus on the business aspects, make social media and Big Data analytics much easier and more affordable.