Data Bucket Guide: File-Receiving Type

Data Bucket Guide: File-Receiving Type

Before starting

One issue you’ll face when trying to use IoT for your business is figuring out how to save the data you collect from IoT devices into servers or the cloud. Usually this requires technical knowledge to set up, or time and cost to hire someone to do it for you.

The Data Bucket fixes these problems by making it simple for anyone to quickly set up their own system for collecting and storing data into the cloud.

For storage, the Data Bucket uses Google’s highly-secure Google Cloud Platform , giving you access to the fastest Big Data processing available.

What is a Data Bucket?

Data Buckets provide a system for collecting data from IoT devices and storing that data into the cloud.

Data Bucket outline

BLOCKS creates this system automatically within your Google Cloud Platform (GCP) project, but creating a Data Bucket does not require specialized knowledge of GCP.

Because the Data Bucket creates its system within your GCP project, GCP service charges will apply separately from BLOCKS fees.

There are two types of Data Bucket based on the type of data transferred.

Type Explanation
Message-receiving
  • Collects comparatively small data (messages), such as acceleration, temperature, or humidity, from devices then stores it into a BigQuery table.
Message-receiving Data Bucket outline
File-receiving
  • Collects files, such as images, recordings, etc., from devices and stores them into Google Cloud Storage.
  • Logs (file name, save date, etc.) can also be stored. (Optional)
  • Can call on a Flow Designer’s Flow to execute upon files being saved. (Optional)
File-receiving type outline

How to create IoT Boards

This page explains how to create a file-receiving IoT Board.

For information regarding message-receiving IoT Boards, refer to Data Bucket Guide: Message-Receiving Type

Note for Self-Service Plan (Free Trial) users:

  • The GCP service account used with a Data Bucket must be given owner privileges.

    If the GCP service account does not have owner privileges, access the IAM page in the GCP console and change the relevant GCP service account’s role to Owner (Steps 1 → 2 → 3 in the image below).

    GCP console IAM screen

General outline for creating Data Buckets

To create a new Data Bucket, either click the Start on the “What is the Data Bucket” screen, or Add from the Data Bucket list.

The general steps for creating a Data Bucket are as follows:

  1. Create a new Data Bucket
    Select the type of Data Bucket and give it a name.
  2. GCP service account settings (Self-Service Plan users only)
    Designate the GCP service account to be used with the Data Bucket.
  3. Entry point settings
    Configure the destination for data sent from devices to the Data Bucket.
  4. Storage settings
    Configure how the Data Bucket will store the data it has received and processed.
  5. Flow Designer connection settings
    Designate a Flow from a Flow Designer that will execute when the Data Bucket stores the data.
  6. Confirm settings
    Review your settings and create the Data Bucket.

Each of these steps will be explained in more detail below.

Create a new Data Bucket

If you don’t have any Data Buckets created, click Start on the screen that says, “What is the Data Bucket?”

What is the Data Bucket?

If you have at least one Data Bucket created, click Add at the top of the Data Bucket list.

The Data Bucket list

info_outline A message will appear if your organization does not have sufficient licenses to create the Data Bucket. If you are an admin for your organization, you will be prompted with instructions on how to to purchase an additional license. If you are not an admin, you will be prompted to contact your admins.

Select the type of Data Bucket to create and click Next.

Select file-receiving type

Refer to the following for information about each type:

Type Explanation
Message-receiving type
  • Collects comparatively small data (messages) such as acceleration, temperature, or humidity, and stores it into a BigQuery table.
Message-receiving type outline
File-receiving type
  • Collects data as files, such as images, recordings, etc., from devices and stores them into Google Cloud Storage.
  • Logs (file name, save date, etc.) can also be stored. (Optional)
  • Can call on a Flow Designer’s Flow to execute upon files being saved. (Optional)
File-receiving type outline

Enter a name for the Data Bucket.

Board name setting

Click Next.

GCP service account settings

This step is for Self-Service Plan users only

In this section, you select your GCP service account and enable APIs required for using GCP services.

GCP service account setting
Select a GCP service account

Choose which GCP service account you will use from the list.

Enable APIs

If there are APIs that do not have a checkmark, follow these steps to enable them:

  1. Click the API’s link.
  2. Click Enable at the top of the Google API Console screen.
  3. Once Enable changes to Disable, close the Google API Console and return to BLOCKS.

Once finished with the above, click Check and confirm that a checkmark appears for the API.

You may need to wait a bit of time for the checkmark to appear. If it does not appear, try clicking Check and waiting again. Depending on the circumstances, this process can take a bit of time.

If you see a , the issue may be one of the following:

  • The relevant API is not enabled.
    Click the next to the API’s name and check if the API is enabled on the page that opens. If not, click the Enable.
  • Your GCP service account role is not set to Owner.
    Open the menu () from the upper-left of the GCP console and select IAM & Admin. From the IAM menu, confirm that your role is set to Owner. If not, change it to Owner.
  • Billing is not enabled on the relevant GCP project.
    Open the menu () from the upper-left of the GCP console and select Billing and enable billing for your project.

Click Next.

Entry point settings

In this section, you configure the entry point URL for your Data Bucket.

Entry point setting

The entry point URL is where data is sent on its way from devices to the Data Bucket. Use the HTTP POST method to send data from devices to this URL.

Entry point explanation

Entry point URLs are formatted as shown below. Only one portion can be configured by the user.

https://magellan-file-<*****>-dot-<Project  ID>.appspot.com

Item Explanation
<*****>

A string of characters used to specify the Data Bucket’s entry point. You can configure this portion. It’s set as a random 16-character string by default.

  • Only lower case letters (a-z), numbers (0-9), and hyphens (-) may be used.
  • The last character must be a letter or number.
  • The number of characters depends on the length of the Project ID, and ranges from 15 to 27.

When creating several Data Buckets for one GCP project, be sure not to use the same string more than once.

<Project ID> This is set automatically to your GCP Project ID and cannot be changed.

Click Next.

Storage settings

In this section, you configure settings for where the data will be stored. You can also configure saving file storage records (logs) into BigQuery.

Storage settings

Under Select a GCS (Google Cloud Storage) bucket, select which bucket your data will be stored into. You must create buckets beforehand.

There is also an optional setting to save logs into BigQuery. Click Save logs to BigQuery and configure the BigQuery dataset and table that will store the logs.

The Data Bucket will automatically create your specified dataset and table if the ones you enter don’t already exist. Tables used by Data Buckets always include the fields gcs_url and timestamp.

Field name Data type Explanation
gcs_url STRING The GCS URL of the saved file (gs://bucketname/filename)
timestamp TIMESTAMP The date/time the file was saved

You can add more schema if there is other information you want to record into the logs. Include this info with the data you send to the Data Bucket with the following format: "Field name added under Schema" = "Information to be recorded".

For example, suppose we want to add a field called "author" to record a log of the file’s owner. By including the parameter "author=MAGELLAN" with the data we send from the device, the Data Bucket will record the value "MAGELLAN" into the table’s "author" field.

Outline of sending data from devices to Data Buckets

If you enter an existing table's name into the Table setting, BLOCKS will find the fields you enter into Schema within that table. An error will occur when BLOCKS attempts to store logs if the fields "gcs_url" and "timestamp" do not exist.

Click Next.

Flow Designer connection settings

In this section, you can configure to have a Flow in a Flow Designer execute automatically when your data is stored.

Big Data connection settings

If you want to do this, click Create a Flow Designer connection and select the Flow you want to automatically execute.

The following parameters will be passed to the Flow.

Parameter name Explanation
gcs_url The GCS URL of the saved file (gs://bucketname/filename).
target_time The instant the file was saved in UTC (2016-10-13 12:39:10.234).
filename File name of the saved file.

Any other parameters included in the data will also be passed to the Flow.

Flow Designer connection parameters explanation

An API token with a description of API Token For Data Bucket Service will be added to the Flow Designer you link to your Data Bucket. Make sure not to delete this token or the connection between the Data Bucket and Flow Designer will not work. You can change the API token's description later, if desired.

Click Next.

Confirm settings

In this section, you can review your settings before creating the Data Bucket.

Confirmation screen

If you find any mistakes, click the Back to return to the relevant settings screen. Once you have fixed everything, return to the Confirm settings screen and click Finish.

It will take a bit of time for the Data Bucket to be created. You’ll be taken to the Data Bucket’s details screen once it finishes.

Sending data to Data Buckets

When sending data to the Data Bucket, use the HTTP POST method to access its Entry point URL. Encode the data in the application/x-www-form-urlencoded format.

Data sent to the Data Bucket must include the following parameters (formatted as parameter name=value):

Information to send Parameter name Value
API token key The "API Token" value from the Data Bucket details screen.
File name filename The file name to use when saving to GCS.
File data to send content Base64 encoded data.

You can also send other parameters with your data. If you use the Flow Designer connection feature, the Data Bucket will also pass these parameters to the Flow you designate.

If you have configured fields other than the defaults to be saved as logs, give your parameters the same names as those fields.

For example, imagine the following scenario:

  • We want to save a text file to GCS.
    • The file name is sample.txt.
    • The text file contains the string hello, world.
  • We added the field author to be saved into the logs.
  • We configured a connection with a Flow Designer's Flow.
    • The Flow references data from an external source with the variable foo.

In this case, the data we send to the Data Bucket would be as follows (the HTTP header is only an example):

POST /upload HTTP/1.1
Content-Type: application/x-www-form-urlencoded
    
key=d29b76ac16181816ac7bd678&filename=sample.txt&content=aGVsbG8sIHdvcmxkCg%3D%3D&author=MAGELLAN&foo=bar

Each parameter is explained below:

Parameter Explanation
key=d29b76ac16181816ac7bd678 The API token.
filename=sample.txt The file name to be used when the data is stored to GCS.
content=aGVsbG8sIHdvcmxkCg%3D%3D The data to be stored into GCS. After being encoded into Base64, the "hello, world" string is encoded once more into the application/x-www-form-urlencoded format.
author=MAGELLAN An optional parameter. This parameter is being used to give the value MAGELLAN to the author field that we configured for our logs. This parameter will also be passed along to the linked Flow, where the value MAGELLAN can be referenced by the variable author.
foo=bar An optional parameter. This parameter will be passed to the linked Flow where the value bar can be referenced by the variable foo.

The Data Bucket will also send the parameters gcs_url, target_time, and filename to the linked Flow. For details, refer to Flow Designer connection settings.

You can view your Entry point URL and API token on your Data Bucket's details screen. Click on your Data Bucket’s name in the Data Bucket list to access its details screen.

GCP service charges

Data Buckets create an environment within your GCP project that makes use of several GCP services.

As such, GCP service charges will apply separately from MAGELLAN BLOCKS fees. These charges vary depending on the service. For details, refer to the pricing page for each service used by Data Buckets.

How to use Data Buckets

In this section, we’ll demonstrate how to send an image file to a Data Bucket and automatically execute a Flow Designer's Flow to analyze the image.

Data Bucket and Flow Designer overview

The following image below shows an overview of how we will link our Data Bucket and Flow Designer.

Sample outline
  • First, a device will send the image file to the Data Bucket. The device will include a parameter about the file’s owner with the data (author=MAGELLAN).
  • The Data Bucket will save the image file into a GCS bucket. It will also save the owner information into a log in BigQuery.
    After the file has been saved to GCS, the linked Flow will execute in the Flow Designer.
  • Within the Flow, the Image analysis BLOCK will perform color analysis on the saved image, then output the results to the logs section of the Flow Designer with the Output to log BLOCK.

Preparing the Data Bucket and Flow Designer

First, we’ll create the following Flow in a Flow Designer. We’ve omitted the Start of Flow and End of Flow BLOCKS from the chart.

# BLOCK Settings
1 Image analysis
  • GCP service account: the relevant GCP service account
  • GCS image URL: ${gcs_url}
  • Information to detect: Color analysis only
2 Output to log
  • Variable to output: _
3 Output to log
  • Variable to output: author

Next, we’ll create a Data Bucket with the settings shown below. You can change specific values like bucket and table names if you are following along to test this out yourself.

Screen Item Contents
Entry point settings Entry point URL Leave as default.
Storage settings GCS bucket Set to magellan-iot-sample.
Logs

Check the box to save logs to BigQuery.

  • Data set: samples
  • Table: logs
  • Additional field: author (STRING type)
Flow Designer connection settings Connection Select the Flow described above that contains the Image analysis and Output to log BLOCKS.

Testing it out

We'll send the following image to the Data Bucket and have it analyzed for dominant colors.

Dominant color analysis sample image

We’ll use Unix-based "curl" and "base64" commands in the terminal to send the file.

First, we'll prepare the image by executing the following command:

curl https://storage.googleapis.com/blocks-docs/basic-guide/image_properties_sample.jpg | base64 -o sample.jpg

The "curl" command downloads the sample image, then the "base64" command encodes it into Base64. Finally, the file name is set to "sample.jpg".

Next, we'll send the image file to the Data Bucket with the following curl command. Since the API token ("key" parameter) is unique to each Data Bucket, be sure to use your Data Bucket’s API token if you test this out on your own.

curl --data key=d29b76ac16181816ac7bd678 --data filename=sample.jpg --data-urlencode content@sample.jpg --data author=MAGELLAN https://magellan-iot-2edblhnt-dot-magellan-iot-sample.appspot.com/upload

Running that command will send the image file to the Data Bucket, which will then save it into GCS. This will cause the Flow on the Flow Designer to execute, which will analyze the colors of the image and output a log of the results. The following are some example logs from running the command:

[2016-10-14 18:56:29.184]  info : Job-16 start
[2016-10-14 18:56:29.223]  info : VersatileVisionApi step start
[2016-10-14 18:56:29.234]  info : VersatileVisionApi: annotate for file gs://magellan-iot-sample/sample.jpg start
[2016-10-14 18:56:30.724]  info : VersatileVisionApi: annotate for file gs://magellan-iot-sample/sample.jpg finished
[2016-10-14 18:56:30.738]  info : VersatileVisionApi step finished
[2016-10-14 18:56:30.778]  info : Print step start
[2016-10-14 18:56:30.794]  info : Variable: '_'
{
  "imagePropertiesAnnotation": {
    "dominantColors": {
      "colors": [
        {
          "pixelFraction": 0.03975622,
          "color": {
            "green": 23,
            "blue": 24,
            "red": 19
          },
          "score": 0.15343559
        },
        {
          "pixelFraction": 0.25660831,
          "color": {
            "green": 213,
            "blue": 130,
            "red": 156
          },
          "score": 0.089890622
        },
        {
          "pixelFraction": 0.0071575367,
          "color": {
            "green": 243,
            "blue": 245,
            "red": 234
          },
          "score": 0.044020534
        },
        {
          "pixelFraction": 0.0072284034,
          "color": {
            "green": 165,
            "blue": 124,
            "red": 131
          },
          "score": 0.030882463
        },
        {
          "pixelFraction": 0.02898448,
          "color": {
            "green": 52,
            "blue": 54,
            "red": 46
          },
          "score": 0.10213716
        },
        {
          "pixelFraction": 0.027141945,
          "color": {
            "green": 85,
            "blue": 89,
            "red": 76
          },
          "score": 0.085689932
        },
        {
          "pixelFraction": 0.018425342,
          "color": {
            "green": 119,
            "blue": 122,
            "red": 109
          },
          "score": 0.053121205
        },
        {
          "pixelFraction": 0.17454468,
          "color": {
            "green": 209,
            "blue": 101,
            "red": 144
          },
          "score": 0.036085345
        },
        {
          "pixelFraction": 0.0090000713,
          "color": {
            "green": 116,
            "blue": 136,
            "red": 107
          },
          "score": 0.033624593
        },
        {
          "pixelFraction": 0.028913613,
          "color": {
            "green": 237,
            "blue": 153,
            "red": 184
          },
          "score": 0.030864337
        }
      ]
    }
  },
  "gcs_url": "gs://magellan-iot-sample/sample.jpg",
  "timestamp": 1476438988.0
}
[2016-10-14 18:56:30.809]  info : Print step finished
[2016-10-14 18:56:30.852]  info : Print step start
[2016-10-14 18:56:30.871]  info : Variable: 'author'
"MAGELLAN"
[2016-10-14 18:56:30.885]  info : Print step finished
[2016-10-14 18:56:30.926]  info : Job-16 finished