Blog

Refine search

2020.9.07 - 

Using MAGELLAN BLOCKS and Tableau to understand trends around Covid-19

Hello everyone! I’m Rana, a consultant at Groovenauts.
Like most of my colleagues at Groovenauts, I have been working from home since the beginning of this year because of the Covid-19 pandemic. I love traveling but the situation has made it difficult for me to travel especially to go back home to Indonesia.

However, recently, the Japanese Government launched a “GoTo Campaign” that gives discounts to domestic travelers. I would love to utilize it! But, I was wondering how other people in Japan are feeling about travels in this Corona era. I don’t use Twitter much but I think people will talk about travels on Twitter. So I thought let’s analyze what people are tweeting recently regarding travel and get some idea about the travel situation in Japan.

Today, I would like to introduce a simple way to do twitter data analysis using MAGELLAN BLOCKS and Tableau! No advanced programming knowledge necessary!


  • What is Tableau?

    Tableau Software is one of the fastest-growing data visualization tools in today's BI industry. It is one of the best ways to transform your raw dataset into an easy-to-use format with zero technical skills and coding knowledge. With Tableau, you can easily visualize your data to get more insights and to do better analytics.

    Tableau introduction video

    Tableau by itself cannot go deeper into text data such as tweets. Here is where MAGELLAN BLOCKS comes in handy and can help us get better insights into our data!

  • Understanding the situation around Covid-19 era with Twitter data


    I want to show you how we can utilize data collected from Twitter to gain insights into the society's perceptions towards travel during Covid-19. People tweet about various topics all the time. Being able to understand what people are tweeting is a great way to understand what people are thinking about certain topics. Data collected from twitter is in text format. Since Tableau does not have the ability to break down texts into data that can be easily analyzed, we will use MAGELLAN BLOCKS machine learning and text analytics functions.

    By using MAGELLAN BLOCKS, there is no need for any programming skills to gather tweets from users and turn them into useful data to analyze.
    Obtaining data is only the first step. In order to be able to understand trends and gain insights, it is important to visualize the data! There are different tools available to achieve this, but Tableau will allow you to visualize your data beautifully with just a few simple drags and drops!


  • Steps
    1. Obtain a Twitter API

    2. Decide on keywords and build flow on Flow Designer

    3. Visualize using Tableau

Let's get started!


STEP 1 - Obtain Twitter API

Sign-in/sign-up to Twitter developer

If you do not have a twitter developer account, here is an article on how to sign up for an account: https://developer.twitter.com/en/apply-for-access
If you do have an account, follow these steps:

  • Sign in to your account
  • Go to apps > [your app details] > keys and token

    Take note and keep it aside for now. We’ll need it later!


STEP 2 - Decide on keywords and build Flow Designer to obtain data

This time, I will use these keywords related to travels during Covid-19 in Japan.

- Go Toキャンペーン
- GoToトラベル

Let’s build the flow!

  • Create a new Flow Designer

  • Let’s build our flow!
    Make sure you have Twitter function and Text Analysis function on your flow. For more details, feel free to contact us through our Inquiry Form.



    The two main blocks that we will be using are Twitter Search and Tokenize Document blocks.
    This time, let's prepare the following five blocks that we will be using (excluding the start and end of flow).



    Let’s dive into the details! You just need to drag the blocks from the left tab to the center view and set the blocks. No programming required!

    1. Start of Flow
      Notice that I change my BLOCK name to make sure that I can come back to this later and know what flow it is at a glance. You don’t need to change anything else here
    2. Search Tweets
      Remember the Twitter API that we put aside before? We will use the information on this block.


      Consumer Key = API key
      Consumer Secret = API secret key
      Access Token = Access token
      Access Token Secret = Access token secret

      In the GCS URL, insert your GCS location with the format: gs://filelocation 
      The flow I made this time has the following settings.
      gs://blocks-gn-sajarana-us-central1-data/twitter/gototravel.json

      Keyword is the twitter search keywords that we will be using. For details on Twitter search API, visit https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/api-reference/get-search-tweets

      This time, we will use the keywords related to GoToキャンペーン.

      "gotoトラベル" OR "GoToトラベル" OR "GoToキャンペーン" since:2020-08-16_00:00:00_JST until:2020-08-24_23:59:59_JST

      Note that Twitter standard search API only allows you to search tweets from approximately one week from the search’s date.

    3. Load to a single table from GCS
      This block is necessary to export the search results into BigQuery.
      Set the Source File GCS URL (same as the destination GCS URL on the Twitter search block), destination dataset, and destination table accordingly.


      Next, add a schema setting.

      Click edit as JSON and copy-paste the following:
      See JSON
       
      [
      {
      "name": "created_at","type": "STRING","mode": "NULLABLE"},{
      "name": "id","type": "INTEGER","mode": "NULLABLE"},{
      "name": "id_str","type": "STRING","mode": "NULLABLE"},{
      "name": "text","type": "STRING","mode": "NULLABLE"},{
      "name": "source","type": "STRING","mode": "NULLABLE"},{
      "name": "truncated","type": "BOOLEAN","mode": "NULLABLE"},{
      "name": "in_reply_to_status_id","type": "INTEGER","mode": "NULLABLE"},{
      "name": "in_reply_to_status_id_str","type": "STRING","mode": "NULLABLE"},{
      "name": "in_reply_to_user_id","type": "INTEGER","mode": "NULLABLE"},{
      "name": "in_reply_to_user_id_str","type": "STRING","mode": "NULLABLE"},{
      "name": "in_reply_to_screen_name","type": "STRING","mode": "NULLABLE"},{
      "name": "quoted_status_id","type": "INTEGER","mode": "NULLABLE"},{
      "name": "quoted_status_id_str","type": "STRING","mode": "NULLABLE"},{
      "name": "is_quote_status","type": "BOOLEAN","mode": "NULLABLE"},{
      "name": "retweet_count","type": "INTEGER","mode": "NULLABLE"},{
      "name": "favorite_count","type": "INTEGER","mode": "NULLABLE"},{
      "name": "favorited","type": "BOOLEAN","mode": "NULLABLE"},{
      "name": "retweeted","type": "BOOLEAN","mode": "NULLABLE"},{
      "name": "lang","type": "STRING","mode": "NULLABLE"},{
      "name": "coordinates","type": "RECORD","mode": "NULLABLE","fields": [
      {
      "name": "coordinates","type": "STRING","mode": "REPEATED"},{
      "name": "type","type": "STRING","mode": "NULLABLE"
      }
      ]},{
      "name": "user","type": "RECORD","mode": "NULLABLE","fields": [
      {
      "name": "id","type": "INTEGER","mode": "NULLABLE"},{
      "name": "id_str","type": "STRING","mode": "NULLABLE"},{
      "name": "name","type": "STRING","mode": "NULLABLE"},{
      "name": "screen_name","type": "STRING","mode": "NULLABLE"},{
      "name": "location","type": "STRING","mode": "NULLABLE"},{
      "name": "url","type": "STRING","mode": "NULLABLE"},{
      "name": "description","type": "STRING","mode": "NULLABLE"},{
      "name": "protected","type": "BOOLEAN","mode": "NULLABLE"},{
      "name": "verified","type": "BOOLEAN","mode": "NULLABLE"},{
      "name": "followers_count","type": "INTEGER","mode": "NULLABLE"},{
      "name": "friends_count","type": "INTEGER","mode": "NULLABLE"},{
      "name": "listed_count","type": "INTEGER","mode": "NULLABLE"},{
      "name": "favourites_count","type": "INTEGER","mode": "NULLABLE"},{
      "name": "statuses_count","type": "INTEGER","mode": "NULLABLE"},{
      "name": "lang","type": "STRING","mode": "NULLABLE"}
      ]},{"name": "place","type": "RECORD","mode": "NULLABLE","fields": [
      {"name": "id","type": "INTEGER","mode": "NULLABLE"},{
      "name": "url","type": "STRING","mode": "NULLABLE"},{
      "name": "place_type","type": "STRING","mode": "NULLABLE"},{
      "name": "name","type": "STRING","mode": "NULLABLE"},{
      "name": "full_name","type": "STRING","mode": "NULLABLE"},{
      "name": "country_code","type": "STRING","mode": "NULLABLE"},{
      "name": "country","type": "STRING","mode": "NULLABLE"},{
      "name": "bounding_box","type": "RECORD","mode": "NULLABLE","fields": [
      {"name": "coordinates","type": "STRING","mode": "REPEATED"},{
      "name": "type","type": "STRING","mode": "NULLABLE"}
      ]}]},{"name": "quoted_status","type": "RECORD","mode": "NULLABLE","fields": [
      {"name": "created_at","type": "STRING","mode": "NULLABLE"},{
      "name": "id","type": "INTEGER","mode": "NULLABLE"},{
      "name": "id_str","type": "STRING","mode": "NULLABLE"},{
      "name": "text","type": "STRING","mode": "NULLABLE"},{
      "name": "source","type": "STRING","mode": "NULLABLE"},{
      "name": "truncated","type": "BOOLEAN","mode": "NULLABLE"},{
      "name": "in_reply_to_status_id","type": "INTEGER","mode": "NULLABLE"},{
      "name": "in_reply_to_status_id_str","type": "STRING","mode": "NULLABLE"},{
      "name": "in_reply_to_user_id","type": "INTEGER","mode": "NULLABLE"},{
      "name": "in_reply_to_user_id_str","type": "STRING","mode": "NULLABLE"},{
      "name": "in_reply_to_screen_name","type": "STRING","mode": "NULLABLE"},{
      "name": "quoted_status_id","type": "INTEGER","mode": "NULLABLE"},{
      "name": "quoted_status_id_str","type": "STRING","mode": "NULLABLE"},{
      "name": "is_quote_status","type": "BOOLEAN","mode": "NULLABLE"},{
      "name": "retweet_count","type": "INTEGER","mode": "NULLABLE"},{
      "name": "favorite_count","type": "INTEGER","mode": "NULLABLE"},{
      "name": "favorited","type": "BOOLEAN","mode": "NULLABLE"},{
      "name": "retweeted","type": "BOOLEAN","mode": "NULLABLE"},{
      "name": "lang","type": "STRING","mode": "NULLABLE"},{
      "name": "user","type": "RECORD","mode": "NULLABLE","fields": [
      {"name": "id","type": "INTEGER","mode": "NULLABLE"},{
      "name": "id_str","type": "STRING","mode": "NULLABLE"},{
      "name": "name","type": "STRING","mode": "NULLABLE"},{
      "name": "screen_name","type": "STRING","mode": "NULLABLE"
      }]}]},{"name": "retweeted_status","type": "RECORD","mode": "NULLABLE","fields": [
      {"name": "created_at","type": "STRING","mode": "NULLABLE"},{
      "name": "id","type": "INTEGER","mode": "NULLABLE"},{
      "name": "id_str","type": "STRING","mode": "NULLABLE"},{
      "name": "text","type": "STRING","mode": "NULLABLE"},{
      "name": "source","type": "STRING","mode": "NULLABLE"},{
      "name": "truncated","type": "BOOLEAN","mode": "NULLABLE"},{
      "name": "in_reply_to_status_id","type": "INTEGER","mode": "NULLABLE"},{
      "name": "in_reply_to_status_id_str","type": "STRING","mode": "NULLABLE"},{
      "name": "in_reply_to_user_id","type": "INTEGER","mode": "NULLABLE"},{
      "name": "in_reply_to_user_id_str","type": "STRING","mode": "NULLABLE"},{
      "name": "in_reply_to_screen_name","type": "STRING","mode": "NULLABLE"},{
      "name": "quoted_status_id","type": "INTEGER","mode": "NULLABLE"},{
      "name": "quoted_status_id_str","type": "STRING","mode": "NULLABLE"},{
      "name": "is_quote_status","type": "BOOLEAN","mode": "NULLABLE"},{
      "name": "retweet_count","type": "INTEGER","mode": "NULLABLE"},{
      "name": "favorite_count","type": "INTEGER","mode": "NULLABLE"},{
      "name": "favorited","type": "BOOLEAN","mode": "NULLABLE"},{
      "name": "retweeted","type": "BOOLEAN","mode": "NULLABLE"},{
      "name": "lang","type": "STRING","mode": "NULLABLE"},{
      "name": "user","type": "RECORD","mode": "NULLABLE","fields": [
      {"name": "id","type": "INTEGER","mode": "NULLABLE"},{
      "name": "id_str","type": "STRING","mode": "NULLABLE"},{
      "name": "name","type": "STRING","mode": "NULLABLE"},{
      "name": "screen_name","type": "STRING","mode": "NULLABLE"}]}]}]

    4. Execute query
      The twitter data itself needs to be cleaned up to be usable in Tableau. This query will unnest the data and clean-up the date and time format.
      See query
       
      select
      cast(FORMAT_TIMESTAMP('%%Y-%%m-%%d', PARSE_TIMESTAMP('%%a %%b %%e %%T %%z %%Y',created_at,'UTC'), 'Asia/Tokyo') as date) as tweet_dt
      , cast(FORMAT_TIMESTAMP('%%Y-%%m-%%d %%H:%%M:%%S', PARSE_TIMESTAMP('%%a %%b %%e %%T %%z %%Y',created_at,'UTC'), 'Asia/Tokyo') as datetime) as tweet_dttm
      , retweeted_status.lang as retweeted_status_lang
      , retweeted_status.retweeted as retweeted_status_retweeted
      , retweeted_status.favorited as retweeted_status_favorited
      , retweeted_status.favorite_count as retweeted_status_favorite_count
      , retweeted_status.retweet_count as retweeted_status_retweet_count
      , cast(retweeted_status.quoted_status_id_str as string) as retweeted_status_quoted_status_id_str
      , retweeted_status.quoted_status_id as retweeted_status_quoted_status_id
      , retweeted_status.text as retweeted_status_text
      , retweeted_status.in_reply_to_screen_name as retweeted_status_in_reply_to_screen_name
      , cast(retweeted_status.in_reply_to_user_id_str as string) as retweeted_status_in_reply_to_user_id_str
      , retweeted_status.in_reply_to_user_id as retweeted_status_in_reply_to_user_id
      , retweeted_status.in_reply_to_status_id as retweeted_status_in_reply_to_status_id
      , retweeted_status.user.name as retweeted_status_user_name
      , retweeted_status.user.screen_name as retweeted_status_user_screen_name
      , cast(retweeted_status.user.id_str as string) as retweeted_status_user_id_str
      , retweeted_status.user.id as retweeted_status_user_id
      , retweeted_status.truncated as retweeted_status_truncated
      , cast(retweeted_status.id_str as string) as retweeted_status_id_str
      , retweeted_status.id as retweeted_status_id
      , retweeted_status.is_quote_status as retweeted_status_is_quote_status
      , retweeted_status.source as retweeted_status_source
      , cast(retweeted_status.in_reply_to_status_id_str as string) as retweeted_status_in_reply_to_status_id_str
      , retweeted_status.created_at as retweeted_status_created_at
      , in_reply_to_screen_name as in_reply_to_screen_name
      , favorite_count as favorite_count
      , quoted_status.lang as quoted_status_lang
      , quoted_status.retweeted as quoted_status_retweeted
      , quoted_status.favorited as quoted_status_favorited
      , quoted_status.favorite_count as quoted_status_favorite_count
      , quoted_status.retweet_count as quoted_status_retweet_count
      , cast( quoted_status.quoted_status_id_str as string) as quoted_status_quoted_status_id_str
      , quoted_status.quoted_status_id as quoted_status_quoted_status_id
      , quoted_status.text as quoted_status_text
      , quoted_status.in_reply_to_screen_name as quoted_status_in_reply_to_screen_name
      , cast(quoted_status.in_reply_to_user_id_str as string) as quoted_status_in_reply_to_user_id_str
      , quoted_status.in_reply_to_user_id as quoted_status_in_reply_to_user_id
      , quoted_status.in_reply_to_status_id as quoted_status_in_reply_to_status_id
      , quoted_status.user.name as quoted_status_user_name
      , quoted_status.user.screen_name as quoted_status_user_screen_name
      , cast(quoted_status.user.id_str as string) as quoted_status_user_id_str
      , quoted_status.user.id as quoted_status_user_id
      , quoted_status.truncated as quoted_status_truncated
      , cast(quoted_status.id_str as string) as quoted_status_id_str
      , quoted_status.id as quoted_status_id
      , quoted_status.is_quote_status as quoted_status_is_quote_status
      , quoted_status.source as quoted_status_source
      , cast(quoted_status.in_reply_to_status_id_str as string) as quoted_status_in_reply_to_status_id_str
      , quoted_status.created_at as quoted_status_created_at
      , place.bounding_box.type as place_bounding_box_type
      , place.country_code as place_country_code
      , place.url as place_url
      , place.name as place_name
      , place.place_type as place_place_type
      , place.country as place_country
      , place.full_name as place_full_name
      , place.id as place_id
      , lang as lang
      , retweeted as retweeted
      , user.lang as user_lang
      , user.statuses_count as user_statuses_count
      , user.favourites_count as user_favourites_count
      , user.listed_count as user_listed_count
      , user.screen_name as user_screen_name
      , user.verified as user_verified
      , user.followers_count as user_followers_count
      , user.protected as user_protected
      , user.description as user_description
      , user.url as user_url
      , user.name as user_name
      , user.location as user_location
      , user.friends_count as user_friends_count
      , cast(user.id_str as string) as user_id_str
      , user.id as user_id
      , favorited as favorited
      , retweet_count as retweet_count
      , coordinates.type as coordinates_type
      , coordinates.coordinates as coordinates_coordinates
      , text as text
      , in_reply_to_user_id_str as in_reply_to_user_id_str
      , in_reply_to_user_id as in_reply_to_user_id
      , in_reply_to_status_id as in_reply_to_status_id
      , truncated as truncated
      , id_str as id_str
      , id as id
      , is_quote_status as is_quote_status
      , source as source
      , in_reply_to_status_id_str as in_reply_to_status_id_str
      , created_at as created_at
      from twitter_search.gototravel
      where lang = 'ja' 



    5. Tokenize Document

      Now, we already got our tweets related to the keywords. However, I think it will be really difficult to go through thousands of tweets and try to understand a trend just from the raw text. So, let’s break the tweets down into words using the Tokenize Document block. This block parses the document stored in the BigQuery table and splits it into word strings according to the specified part-of-speech type. The result will be individual words that we can visualize in Tableau.



      You can set each field as shown above. In the twitter search result, each unique tweet has an “id” and the tweet itself is labeled “text”. Use id as the unique key field name and text as the text field name.

    6. Execute Query
      Since the result of the tokenized document is nested, we need to unnest it using a query. I will also exclude some of the common noises such as :// and one-byte characters that aren’t words that will not be useful for the analysis.

      SELECT
      key,language,word
      FROM twitter_search.gototravel_wordsplit
      CROSS JOIN UNNEST(words) as word
      where word not like "://" and byte_length(word)>1 

      Set the dataset and table accordingly.

      Don’t forget to click save and run your flow!




STEP 3 - Visualize data using Tableau

Now that we’ve extracted our data, let’s bring it to Tableau to visualize and analyze it!

  • Open a new Workbook


  • Connect to BigQuery. Give Tableau permission to access BigQuery.


  • Select your project and dataset. Drag the table into the view

 

  • Connect the tweet's search result table with the tokenized tweets table.
    We used Id Str (Tweet ID in string type) as key during the tokenization, so the common column would be Id Str = Key
  • With the tokenized data, I was able to create a word cloud on Tableau


    This visualization shows the words that are most commonly used along the words related to GoTo Campaign. You can see the keywords that show up the most appear bigger and in darker colors. I put a filter so that only words that appear more than 1000 times are included in the visualization.

Great! Now I get a gist of what people are talking about. But I want to see the actual tweets related to those words. Let’s put it into visualization!

  • Start with a simple visualization of the tweet and the count of the tweet (this includes the number of retweets)


    → Let’s put the two together on a dashboard!

    Simply drag the sheets into the dashboard and adjust it to your preference



    From here I can use the word cloud as a filter by clicking the filter icon and it will make the words as filters to see the actual tweets that include that word.



    I added some basic information such as tweets period, total corona cases (data from open data: https://www.mhlw.go.jp/stf/covid-19/open-data.html), and an overall number of tweets to make the dashboard more informative.





    Conclusion:

    MAGELLAN BLOCKS allows you to acquire and process Twitter data and analyze the text (Tweets) so that it can be visualized in Tableau.
    By using the combination of MAGELLAN BLOCKS and Tableau, we can use Twitter data to analyze the current trend and general feelings of what people are thinking about a specific topic.
    I was able to see that even with the campaign, people still seem reluctant to travel due to Coronavirus. I think I will wait until the pandemic subdued before traveling again!

    I hope you were able to follow along and use the topic of your interest and create your own visualization!