Data Bucket Guide: Message-Receiving Type

Before starting

One issue you’ll face when trying to use IoT for your business is figuring out how to save the data you collect from IoT devices into servers or the cloud. Usually this requires technical knowledge to set up, or time and cost to hire someone to do it for you.

The Data Bucket fixes these problems by making it simple for anyone to quickly set up their own system for collecting and storing data into the cloud.

For storage, the Data Bucket uses Google’s highly secure Google Cloud Platform, giving you access to fast Big Data processing.

What is a Data Bucket?

Data Buckets provide a system for collecting data from IoT devices and storing that data into the cloud.

Data Bucket outline

BLOCKS creates this system automatically within your Google Cloud Platform (GCP) project, so creating a Data Bucket does not require specialized knowledge of GCP.

Because the Data Bucket creates its system within your GCP project, GCP service charges will apply separately from BLOCKS fees.

There are two types of Data Bucket based on the type of data transferred.

Type Explanation
Message-receiving
  • Collects comparatively small data (messages), such as acceleration, temperature, or humidity, from devices then stores it into a BigQuery table.
Message-receiving Data Bucket outline
File-receiving
  • Collects files, such as images, recordings, etc., from devices and stores them into Google Cloud Storage.
  • Logs (file name, save date, etc.) can also be stored. (Optional)
  • Can call on a Flow Designer’s Flow to execute upon files being saved. (Optional)
File-receiving type outline

How to create a Data Bucket

This page explains how to create a message-receiving Data Bucket.

For information regarding file-receiving Data Buckets, refer to Data Bucket Guide: File-Receiving Type.

Note for Self-Service Plan (Free Trial) users:

  • The GCP service account used with a Data Bucket must be given owner privileges.

    If the GCP service account does not have owner privileges, access the IAM page in the GCP console and change the relevant GCP service account’s role to Owner (Steps 1 → 2 → 3 in the image below).

    GCP console IAM screen

General outline for creating Data Buckets

To create a new Data Bucket, either click Start on the “What is the Data Bucket?” screen, or click Add at the top of the Data Bucket list.

The general steps for creating a Data Bucket are as follows:

  1. Create a new Data Bucket
    Select the type of Data Bucket and give it a name.
  2. GCP service account settings (Self-Service Plan users only)
    Designate the GCP service account to be used with the Data Bucket.
  3. Entry point settings
    Configure the destination for data sent from devices to the Data Bucket.
  4. Processing settings
    Configure how the Data Bucket will process data it receives.
  5. Storage settings
    Configure how the Data Bucket will store the data it has received and processed.
  6. Confirmation
    Review your settings and create the Data Bucket.

Each of these steps will be explained in more detail below.

Create a new Data Bucket

If you don’t have any Data Buckets created, click Start on the screen that says, “What is the Data Bucket?”

What is the Data Bucket?

If you have at least one Data Bucket created, click Add at the top of the Data Bucket list.

The Data Bucket list

Select the type of Data Bucket to create and click Next.

Selecting the message-receiving type Data Bucket

Refer to the following for information about each type:

Type Explanation
Message-receiving type
  • Collects comparatively small data (messages) such as acceleration, temperature, or humidity, and stores it into a BigQuery table.
Message-receiving type outline
File-receiving type
  • Collects data as files, such as images, recordings, etc., from devices and stores them into Google Cloud Storage.
  • Logs (file name, save date, etc.) can also be stored. (Optional)
  • Can call on a Flow Designer’s Flow to execute upon files being saved. (Optional)
File-receiving type outline

Enter a name for the Data Bucket.

Data Bucket name setting

Click Next.

GCP service account settings

This step is for Self-Service Plan users only.

In this section, you select your GCP service account and enable APIs required for using GCP services.

GCP service account settings
Select a GCP service account

Choose which GCP service account you will use from the list.

Enable APIs

If there are APIs that do not have a checkmark, follow these steps to enable them:

  1. Click the API’s link.
  2. Click Enable at the top of the Google API Console screen.
  3. Once Enable changes to Disable, close the Google API Console and return to BLOCKS.

Once finished with the above, click Check and confirm that a checkmark appears for the API.

It may take some time for the checkmark to appear. If it does not appear, click Check again and wait.

If you see an error icon instead of a checkmark, the issue may be one of the following:

  • The relevant API is not enabled.
    Click the link next to the API’s name and check whether the API is enabled on the page that opens. If not, click Enable.
  • Your GCP service account role is not set to Owner.
    Open the menu from the upper-left of the GCP console and select IAM & Admin. From the IAM menu, confirm that your role is set to Owner. If not, change it to Owner.
  • Billing is not enabled on the relevant GCP project.
    Open the menu from the upper-left of the GCP console, select Billing, and enable billing for your project.

Click Next.

Entry point settings

In this section, you configure the entry point URL for your Data Bucket.

Entry point settings

The entry point URL is where data is sent on its way from devices to the Data Bucket. Use the HTTP POST method to send data from devices to this URL.

Entry point overview

Entry point URLs are formatted as shown below. Only one portion can be configured by the user.

https://magellan-iot-<*****>-dot-<project ID>.appspot.com

Item Explanation
<*****>

A string of characters used to specify the Data Bucket’s entry point. You can configure this portion. It’s set as a random 16-character string by default.

  • Only lower case letters (a-z), numbers (0-9), and hyphens (-) may be used.
  • The last character must be a letter or number.
  • The number of characters depends on the length of the project ID, and ranges from 15 to 27.

When creating several Data Buckets for one GCP project, be sure not to use the same string more than once.

<project ID> This is set automatically to your GCP project ID and cannot be changed.

Click Next.
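
The character rules for the configurable part of the entry point URL can be checked before you type the string into BLOCKS. The following is a minimal sketch, not part of BLOCKS itself: the function names are hypothetical, the example string "sensor-hub-01" is made up for illustration, and the length limit (which depends on the project ID) is deliberately not checked.

```python
import re

# Rules from the table above: lowercase letters, digits, and hyphens only,
# and the last character must be a letter or digit.
_ENTRY_POINT_RE = re.compile(r"[a-z0-9-]*[a-z0-9]")


def is_valid_entry_point_string(value: str) -> bool:
    """Check the character rules only (length limits are not checked here)."""
    return _ENTRY_POINT_RE.fullmatch(value) is not None


def entry_point_url(entry_point_string: str, project_id: str) -> str:
    """Assemble an entry point URL from its two variable parts."""
    if not is_valid_entry_point_string(entry_point_string):
        raise ValueError(f"invalid entry point string: {entry_point_string!r}")
    return f"https://magellan-iot-{entry_point_string}-dot-{project_id}.appspot.com"


# Hypothetical values, for illustration only.
print(entry_point_url("sensor-hub-01", "magellan-iot-sample"))
```

A check like this only catches formatting mistakes early. BLOCKS remains the authority on whether a string is accepted.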

Processing settings

In this section, you configure settings related to how the Data Bucket will process the data.

Processing settings

If desired, you can configure the following Google Container Engine (GKE) related settings:

Setting Explanation
Machine type

Designate the type of virtual machine to be used.

  • n1-standard-1 (1 virtual CPU, 3.75 GB memory)
  • g1-small (0.5 virtual CPU, 1.70 GB memory)

The default setting is n1-standard-1.

Charges based on Google Compute Engine (GCE) machine type pricing will apply separately from MAGELLAN BLOCKS fees.

VM nodes

Configure the number of virtual machine instances (nodes) to be used.

Using up to 5 VM nodes is free of charge. From 6 nodes and up, GKE charges will apply according to machine type (separate from MAGELLAN BLOCKS fees).

Containers per node

Designate the number of BLOCKS applications (containers) per VM node that will process the data received at the entry point.

You can designate between 1 and 10 containers per node.

Note: You can reconfigure the number of VM nodes and containers per node after creating the Data Bucket.

Click Next.

Storage settings

In this section, you configure settings for where the data will be stored.

Storage settings

These settings configure where and how the data will be stored into the cloud. It’s possible to store data separately according to data type.

Each data type is identified by a message type. A message type identifies both the kind of data being sent and the location where that data will be stored.

Data sent from devices to a Data Bucket must include a message type, since this is how the Data Bucket decides where to store the data.

Message types overview
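
As a conceptual model only, the relationship between message types and storage locations can be pictured as a lookup table. This sketch does not show how BLOCKS implements it. The mapping values are the example dataset and table names used later in this guide, and raising an error for an unknown message type is an assumption made for the sketch.

```python
# Hypothetical storage settings: message type -> (BigQuery dataset, table).
STORAGE_LOCATIONS = {
    "message": ("sample", "messages"),
    "temperature": ("sample", "temperatures"),
}


def destination_for(message_type: str) -> str:
    """Return the "dataset.table" destination configured for a message type."""
    if message_type not in STORAGE_LOCATIONS:
        # Assumption: this sketch treats an unconfigured message type as an error.
        raise KeyError(f"no storage location for message type {message_type!r}")
    dataset, table = STORAGE_LOCATIONS[message_type]
    return f"{dataset}.{table}"


print(destination_for("temperature"))
```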

In this section, you create profiles for the data's storage locations. You can add, edit, or delete profiles for storage locations by doing the following:

  • Click Add to configure a new storage location.
  • Click a storage location's pencil button to edit it.
  • Click a storage location's trash button to delete it.

The current message-receiving Data Bucket version only supports storing data to BigQuery. As such, storage location settings are all related to BigQuery.

Configure the following information for each BigQuery storage location:

Item Explanation
Message type

Designate the message type of the data to be saved.

Data identified by this message type will be saved to the BigQuery table configured in the settings that follow.

Dataset Designate the ID of the BigQuery dataset where this data will be saved.
Table Designate the name of the BigQuery table where this data will be saved.
Schema

Designate the table’s schema.

Table division

Designate whether the table will be divided by day, by month, or not at all.

If using table division, you must also configure the Times for table division (field) setting.

Regarding table division:

You can either set Times for table division (field) to Use time that data was sent to the Data Bucket, or select a field that contains TIMESTAMP type data. When selecting Use time that data was sent to the Data Bucket, the table will be divided according to the time the Data Bucket received the data.

If you choose to divide tables by day or month, suffixes will be added to the table’s name. These suffixes are formatted as follows:

  • Divide by day: _%Y%m%d
  • Divide by month: _%Y%m01

%Y, %m, and %d are replaced with the year (4 digits), month (2 digits), and day (2 digits) of the time taken from the Times for table division (field) setting. As an example, a table named sample set to Divide by day would be divided into tables named like sample_20160905 and sample_20160906.
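
To see how the suffixes work out for a given time, the naming rule can be reproduced in a few lines. This is a sketch under one assumption that this guide does not confirm: the time is interpreted in UTC. The function name is hypothetical.

```python
from datetime import datetime, timezone

# Suffix formats as listed above.
SUFFIX_FORMATS = {
    "day": "_%Y%m%d",
    "month": "_%Y%m01",
}


def divided_table_name(table: str, unix_time: float, division: str) -> str:
    """Return the table name with the day or month suffix appended."""
    # Assumption: the timestamp is interpreted in UTC.
    moment = datetime.fromtimestamp(unix_time, tz=timezone.utc)
    return table + moment.strftime(SUFFIX_FORMATS[division])


# 1473076800 is 2016-09-05 12:00:00 UTC.
print(divided_table_name("sample", 1473076800, "day"))
print(divided_table_name("sample", 1473076800, "month"))
```

Times close to midnight may land in a different table if the Data Bucket uses another time zone, so check the actual table names in BigQuery.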

Click Next.

Confirm settings

You can review and confirm your settings on this screen.

Confirm settings

If you find any mistakes, click Back to return to the relevant settings screen. Once you have fixed everything, return to the Confirm settings screen and click Finish.

It will take a bit of time for the Data Bucket to be created. You’ll be taken to the Data Bucket’s details screen once it finishes.

Sending data to the Data Bucket

This section explains the process of sending data from devices to a Data Bucket.

The following three pieces of information are required when sending data:

  • The entry point URL
  • An API token
  • A message type

You can find this information on the Data Bucket’s details screen. Open this screen by clicking a Data Bucket’s name in the Data Bucket list.

To send data from devices to the Data Bucket, use the HTTP POST method to access the entry point URL.

The data should be in JSON format and include the API token and message type information.

Overview of sending data

The following is an example of JSON format data that is ready to send to a Data Bucket:

{
  "api_token": "************************",
  "logs": [
    {
      "type": "message",
      "attributes": {
        "date": 1473642000,
        "name": "device_001",
        "message": "hello"
      }
    }
  ]
}

Each member (name/value pair separated by a colon) is explained below:

Name Explanation of value
"api_token" Designate the API token with a string.
"logs"

Designate the data as an array of objects.

  • Objects: Formatted as {...}. The braces contain at least one member (multiple members are separated by commas).
  • Array: Formatted as [{...},{...}]
"type" Designate the message type. The data's storage location (a BigQuery table) is determined by this message type.
"attributes"

Designate the data to be stored as objects.

For BigQuery storage, names refer to the field names of the table where data will be stored, and values refer to the actual data to be stored into those fields.

For fields dealing with date/time, data must be designated using Unix timestamp formatting. In this example, the name is “date” and the value is “1473642000”.
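
If your device code works with calendar dates and times, they need to be converted to Unix timestamps before being placed in the data. A minimal sketch follows. The function name is hypothetical, and the example time zone (UTC+9) is chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone


def to_unix_timestamp(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    if moment.tzinfo is None:
        # A naive datetime would be interpreted in the local time zone of the
        # machine running the code, which makes the result ambiguous.
        raise ValueError("use a timezone-aware datetime")
    return int(moment.timestamp())


# Illustration: 10:00 on 2016-09-12 at UTC+9 gives 1473642000.
UTC_PLUS_9 = timezone(timedelta(hours=9))
example = to_unix_timestamp(datetime(2016, 9, 12, 10, 0, tzinfo=UTC_PLUS_9))
print(example)
```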

The following example puts together two types of data into JSON format ready to be sent to a Data Bucket.

{
  "api_token": "************************",
  "logs": [
    {
      "type": "message",
      "attributes": {
        "date": 1473642000,
        "name": "device_001",
        "message": "hello"
      }
    },
    {
      "type": "temperature",
      "attributes": {
        "date": 1473642000,
        "name": "device_fukuoka_001",
        "temperature": 24.5
      }
    }
  ]
}
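
Data in this shape can also be assembled programmatically instead of being written by hand. The sketch below builds the same structure with Python's standard json module. The helper names are hypothetical, and "YOUR_API_TOKEN" is a placeholder for the token shown on the Data Bucket's details screen.

```python
import json


def make_log(message_type: str, **attributes) -> dict:
    """Build one entry of the "logs" array."""
    return {"type": message_type, "attributes": attributes}


def build_payload(api_token: str, logs: list) -> str:
    """Serialize the API token and log entries as a JSON string."""
    return json.dumps({"api_token": api_token, "logs": logs})


payload = build_payload(
    "YOUR_API_TOKEN",  # placeholder, not a real token
    [
        make_log("message", date=1473642000, name="device_001", message="hello"),
        make_log("temperature", date=1473642000, name="device_fukuoka_001", temperature=24.5),
    ],
)
print(payload)
```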

GCP service charges

Data Buckets create an environment within your GCP project that makes use of several GCP services.

As such, GCP service charges will apply separately from MAGELLAN BLOCKS fees. These charges vary depending on the service. For details, refer to the pricing page for each service used by Data Buckets.

How to use Data Buckets

In this section, we’ll demonstrate how to create a Data Bucket that collects two types of data from IoT devices.

Overview of example Data Bucket

Demonstration Data Bucket specifications

Our Data Bucket will collect the following two types of data:

Data type Details (content: format)
Message
  • Date/time the message was sent: UNIX timestamp
  • Name of the device sending the message: String
  • Message: String
Temperature
  • Date/time the temperature was sent: UNIX timestamp
  • Name of the device sending the temperature data: String
  • Temperature: Float

The storage settings for this data will be as follows:

Message data storage:
Item Value
Message type message
Dataset sample
Table messages
Schema
Field name Data format Mode
date TIMESTAMP REQUIRED
name STRING REQUIRED
message STRING NULLABLE
Table division Division by day, using the "date" field
Temperature data storage:
Item Value
Message type temperature
Dataset sample
Table temperatures
Schema
Field name Data format Mode
date TIMESTAMP REQUIRED
name STRING REQUIRED
temperature FLOAT NULLABLE
Table division Division by day, using the "date" field
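
Before sending data, it can help to check on the sending side that each record fits the schema it is meant for. The following is a hypothetical client-side check, not a BLOCKS feature. It covers only the three data formats used in this demonstration and treats TIMESTAMP values as Unix timestamps (numbers).

```python
# Schemas from the tables above: (field name, data format, mode).
MESSAGE_SCHEMA = [
    ("date", "TIMESTAMP", "REQUIRED"),
    ("name", "STRING", "REQUIRED"),
    ("message", "STRING", "NULLABLE"),
]
TEMPERATURE_SCHEMA = [
    ("date", "TIMESTAMP", "REQUIRED"),
    ("name", "STRING", "REQUIRED"),
    ("temperature", "FLOAT", "NULLABLE"),
]

# Python types accepted for each data format in this sketch.
PYTHON_TYPES = {
    "TIMESTAMP": (int, float),
    "STRING": (str,),
    "FLOAT": (int, float),
}


def schema_errors(attributes: dict, schema: list) -> list:
    """Return a list of problems found; an empty list means none were found."""
    errors = []
    for field, data_format, mode in schema:
        value = attributes.get(field)
        if value is None:
            if mode == "REQUIRED":
                errors.append(f"missing required field: {field}")
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, PYTHON_TYPES[data_format]):
            errors.append(f"{field} should be {data_format}")
    return errors


print(schema_errors({"date": 1473642000, "name": "device_001", "message": "hello"}, MESSAGE_SCHEMA))
```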

Making the Data Bucket

Using this information as a base, we will create a Data Bucket with the following settings:

Screen Item Contents
Create a new Data Bucket Type Select Message-receiving type.
Name Enter a name for the Data Bucket.
GCP service account settings Service account Select your GCP service account.
Entry point settings Optional settings Leave as default.
Processing settings Optional settings Leave as default.
Storage settings Storage location Add the information from Message data storage and Temperature data storage found in Demonstration Data Bucket specifications.

Testing it out

Now, we’ll try sending data to our Data Bucket following the information outlined in Sending data to the Data Bucket. However, instead of using devices, we’ll use a PC and send the data with curl commands.

This time, we’ll use the following Data Bucket Connection Information and data:

Data Bucket Connection Information:
Item Value
Entry point URL https://magellan-iot-*****-dot-magellan-iot-sample.appspot.com
API token *****
First data type:
Item Name Value
Message type "type" "message"
Date/time the message was sent "date" 1473642000
(2016/09/12 10:00 JST)
Name of the device sending the message "name" "device_001"
Message "message" "hello"
Second data type:
Item Name Value
Message type "type" "temperature"
Date/time the temperature data was sent "date" 1473642600
(2016/09/12 10:10 JST)
Name of the device sending the temperature data "name" "device_fukuoka_001"
Temperature "temperature" 24.5

The curl commands to send the data are as follows:

First data type:

curl --data '{"api_token":"*****","logs": [{"type": "message","attributes": {"date": 1473642000,"name": "device_001","message": "hello"}}]}' https://magellan-iot-*****-dot-magellan-iot-sample.appspot.com/

Second data type:

curl --data '{"api_token":"*****","logs": [{"type": "temperature","attributes": {"date": 1473642600,"name": "device_fukuoka_001","temperature": 24.5}}]}' https://magellan-iot-*****-dot-magellan-iot-sample.appspot.com/

Or, the two data types can be sent together in one curl command as shown below:

curl --data '{"api_token":"*****","logs": [{"type": "message","attributes": {"date": 1473642000,"name": "device_001","message": "hello"}},{"type": "temperature","attributes": {"date": 1473642600,"name": "device_fukuoka_001","temperature": 24.5}}]}' https://magellan-iot-*****-dot-magellan-iot-sample.appspot.com/
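
For devices or scripts that cannot run curl, the same POST can be made from a program. Below is a rough Python equivalent of the curl commands above, using only the standard library. The function names are hypothetical, and the masked URL and token are the same placeholders used in this guide. Like the curl commands, it sets no Content-Type header explicitly; this guide does not state whether the entry point requires a particular one.

```python
import json
import urllib.request


def build_request(entry_point_url: str, payload: dict) -> urllib.request.Request:
    """Prepare an HTTP POST request whose body is the JSON-encoded payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(entry_point_url, data=body, method="POST")


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholders: replace the masked parts with your own entry point URL and API token.
request = build_request(
    "https://magellan-iot-*****-dot-magellan-iot-sample.appspot.com/",
    {
        "api_token": "*****",
        "logs": [
            {
                "type": "message",
                "attributes": {"date": 1473642000, "name": "device_001", "message": "hello"},
            }
        ],
    },
)
# Call send(request) once the real URL and token are filled in.
```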

We’ll confirm the results in BigQuery.

Message data:

Message data stored in BigQuery

Temperature data:

Temperature data stored in BigQuery