DataEditor Overview

Introduction

The DataEditor is tool for processing data with a visual interface. You can use it to prepare your data for Machine Learning trainings and predictions.

DataEditor summary tab

You can import data into the DataEditor from BigQuery, Google Sheets, or as CSV files.

The DataEditor stores and processes your data using BigQuery, but you don’t need to have specialized knowledge of BigQuery to use the DataEditor.

While you don’t need to have any specialized knowledge of BigQuery to use the DataEditor, you will need to specify the BigQuery dataset and table that will store your data. If you are familiar with spreadsheet software, a BigQuery dataset could be compared to a workbook, and a BigQuery table to a single sheet.

Opening the DataEditor

After signing in to BLOCKS, you can access the DataEditor by doing the following:

Opening the DataEditor
  1. Click the menu icon () in the global navigation bar.
  2. Click DataEditor.

Features

This section will briefly explain the following main features of the DataEditor:

Importing

You can import data into the DataEditor by doing the following:

Importing data into the DataEditor
  1. Click Import.
The import menu

Select where your data is currently stored and configure the settings for importing it into the DataEditor. You can import the following types of data:

  1. Google Cloud Storage: CSV files stored in Google Cloud Storage (GCS).
  2. Google BigQuery: BigQuery tables.
  3. Google Drive: CSV files or Google Sheets stored in Google Drive.
  4. Upload: CSV files stored in your PC.

Using the editor

You can edit data in the DataEditor by doing the following:

Home screen
  1. Click the Data tab.
  2. Click the name of the data you want to edit.

You can do the following within the editor:

View summaries

You can view various aggregates for your data by doing the following:

The summary tab
  1. Click Summary.

The summary tab contains the following information for each column in your data:

  • The top 10 most frequently occuring values with the number of instances shown in parentheses.
  • The number and percentage of missing values.
  • The minimum value.
  • The maximum value.
  • The median value.
  • The average value.
  • The number of unique values.

Click Aggregate All Columns to calculate these aggregates. They will be automatically aggregated when you import data into the DataEditor, but you will need to use this button if you update your data.

View graphs

You can view graphs representing the distribution of values in your data by doing the following:

Viewing graphs in the DataEditor
  1. Click Table.
  2. Click View graph for a column.

This will create a graph to visually represent the distribution of values in the selected column.

Missing values are shown in red.

View data

You can view your data as a table by doing the following:

Viewing data in the editor screen
  1. Click Table.
  2. Click View data.

This will create a table of your data as shown below:

Data shown in the editor
Edit columns

You can access the Edit column menu by doing the following:

Opening the edit column menu
  1. Click Table.
  2. Click Edit column.

Any edits that you make in this menu will not be reflected in the actual data until you either Save your edited data with a new name or overwrite your existing data. You can cancel any of edits you make up until you save.

You can do the following from the Edit column menu:

  • Rename column

    Rename the selected column.

    Renaming a column
    1. Enter a new name for the column.
    2. Click OK.
  • Change type

    Change the data type for the column.

    Changing a column type
    1. Select the new data type for the column.
    2. Click OK.
  • Delete column

    Delete the column.

  • Duplicate column

    Duplicate the column.

    Duplicating a column
    1. Enter a name for the new column.
    2. Select the position where the new column will be placed.
    3. Click OK.
  • Split column

    Split a STRING-type column into multiple columns.

    Splitting a column
    1. Select the delimiter character.
    2. Enter the number of columns to split the original column into.
    3. Enter a name for each new column.
    4. Select whether or not to delete the original column. The original column is deleted by default.
    5. Click OK.
  • Delete rows with missing values or replace missing values

    Before you can edit missing values in your data, you must do the following:

    Showing missing values in a all columns
    1. Click .
    2. Click Show missing values.

    You can then select how to edit a column that contains missing values by doing the following:

    Selecting how to deal with missing values
    1. Click Edit column.
    2. Click Missing values.
    3. Click either Delete rows with missing values, Replace missing values, or Replace missing values with data from another column.
  • Change values

    Change values in the column according to conditions you configure.

    Changing values
  • Change to day of week values (0–6)

    Change STRING, DATE, DATETIME, or TIMESTAMP data for dates into the following values signifying the day of the week:

    Value Day of the week
    0 Sunday
    1 Monday
    2 Tuesday
    3 Wednesday
    4 Thursday
    5 Friday
    6 Saturday

    STRING data formatted as YEAR-MONTH-DATE HOUR:MINUTES:SECONDS can be converted.

  • Filter rows

    Set conditions to refine which rows to display.

    Filtering rows
Add columns

You can add a new column to your data by doing the following:

Adding a column
  1. Click Table.
  2. Click Edit Table.
  3. Click Add column.
Configuring the column to add

You can add columns containing the following types of data:

  • Random values (FLOAT)
  • Sequential values (INTEGER)
Add sequence columns

You can add sequence data columns by doing the following:

Adding sequence columns
  1. Click Table.
  2. Click Edit Table.
  3. Click Add sequence columns.
Configuring the sequence columns

Configure settings for the sequence columns.

Concat columns

You can concat columns by doing the following:

Concating columns
  1. Click Table.
  2. Click Edit Table.
  3. Click Concat columns.
Configuring how to concat columns

Configure settings for the columns that you will combine into one column.

Templates

You can save a series of edits as a template. Then, you can perform the same series of edits for new data in one step by simply selecting that template.

You can save up to 20 templates per project.

To save a template, do one of the following:

Saving a template
  • A: Click Save Template.
  • B: Click Save As... to save your data with a new name and your edits will be automatically saved as a template.
Saving automatically saved templates

You must rename automatically saved templates in order to use them again. To do this, click Templates, then click Rename and save template.

Data analysis

The DataEditor includes various tools for aggregating and visualizing your data. With these tools, you can visually check the correlations between your data and better determine the value of each input feature. You can also use your aggregations in your training data.

The basic process for using the data analysis tools is as follows:

Selecting the data analysis tab
  1. Click Data Analysis.
Selecting the columns and rows to include
  1. Select the columns and rows to include in the analysis. Clicking the fields will bring up a selection menu.

When selecting columns, choices are displayed in groups as “dimensions” and “measures.” Dimensions are qualitative values, such as strings or days of the week, that could be used as classes or ways to partition the data. Measures are quantitative, numerical values that can be aggregated.

Showing the results of the analysis
  1. Click the button in the center of the page and wait a bit for the results to be displayed.
Selecting a bar graph
  1. Click Table to open a list of visualization options (table, bar graph, line graph, combination graph, area, pie, scatter, or confusion matrix). The following is an example of a bar graph.
An example bar graph
  1. Click Save to store your results into a BigQuery table.
Correlation coefficients

You can find correlation coefficients between selected columns with the DataEditor. With this, you can visually check the relationship between columns and better decide on the importance of each feature in your data.

Target columns for calculating correlation coefficients must be INTEGER, FLOAT, STRING, or BOOLEAN type. STRING-type columns will be handed as categorical (enum) data (for data showing numerical amounts, please convert the data into numerical data beforehand). BOOLEAN-type data is converted into numbers and finds the conversion factor.

Results for STRING-type columns are shown within the results with the format [column name]_[value in the enum list]. For example, if a column named a contained the values "foo", "bar", and "baz", the results would be shown as a_foo, a_bar, and a_baz. The value portion of the results will only contain alphanumeric characters, and any non-alphanumeric characters will be discarded. If there are no usable characters or if the results would be a duplicate, a serial number will be assigned instead.

To find correlation coefficients, do the following:

Selecting the correlation coefficient tab
  1. Click Correlation Coefficient.
Calculating correlation coefficients
  1. Click Select.
  2. Select the columns that you want to find correlation coefficients for.
  3. Click OK.
  4. Click Execute.
Viewing the results

The results are shown visually with colors according to the correlation coefficient values.

You can click Download to download a CSV file of the chart of results.

  • - values in the chart will be converted into 1 values in the CSV file.
  • The values shown on the screen are rounded to six decimal places, but downloaded data is not rounded.
Create models

You can create the following types of models from the DataEditor:

  • Regression
  • Classification
  • Clustering

Additionally, you can select whether to create the following two types of models by using a Model Generator or not:

  • Regression
  • Classification

You can create models from the Create Model tab.

Creating models in the DataEditor

For more information about creating models in the DataEditor, refer to Creating models and making predictions in the DataEditor.

Make predictions

You can use models created in the DataEditor or from a Model Generator (classification/regression) to make predictions.

You can use the Flow Designer to run batch predictions or to automate a schedule for predictions. For more details about using the Flow Designer for predictions, refer to Using Flow Templates or the classification and regression model tutorials (Classification prediction/Regression preditcion).

To make predictions from the DataEditor, open the editor page for prediction data and do the following:

Selecting the model
  1. Click Predict.
  2. Select the model to use for making predictions.
  3. Click Predict.
Viewing the prediction results

The results will be displayed after a bit of time. The formatting of the results depends on the type of model used. For more information, refer to Creating models and making predictions in the DataEditor.

Merging tables

You can merge (join) data from your tables with other tables you’ve stored in the DataEditor or weather data from Japanese Meteorological Agency (JMA) observatories.

For the weather data, you can use data for one observatory by day or by hour. To join weather data, the base data must have a column for the date (DATE, DATETIME, or TIMESTAMP). If using a DATE format only, you can only join daily weather data.

You must purchase a license to use the JMA weather data. This license is separate from licenses to use the DataEditor.

You can merge tables from the data list by doing the following:

Merging tables from the data list
  1. Click for an item in the data list.
  2. Click Merge table.

You can also merge tables from the editor screen by doing the following:

Merging tables from the editor
  1. Click Edit Table.
  2. Click Merge table.

To merge a table with other data stored in the DataEditor, do the following:

Merging with other data stored in the DataEditor
  1. Click Select join table.
Selecting the data to merge
  1. Click the data to join with the original table.
Configuring the merge settings
  1. Select the join type. The example above shows an Inner Join.
  2. Select which columns to include from the original table.
  3. Select which columns to include from the second table.
  4. Click Add.
  5. Select the shared column for table 1.
  6. Select the shared column for table 2.
  7. Click Join.
Configuring where to store the resulting table
  1. Enter a name for the resulting data.
  2. Click the dataset ID.
  3. Enter a table ID.
  4. Click OK.
Clicking OK on the confirmation dialog
  1. Click OK and the merged data will appear in the data list after a bit of time.

To merge with the JMA weather data, do the following:

Merging with JMA weather data
  1. Click Select weather data.
Configuring the weather data
  1. Select one weather observatory
  2. Select the column from the original data that contains date information.
  3. Select the time zone.
  4. Select either Daily or Hourly data.
  5. Click OK.
Configuring the merge settings
  1. Check the columns from the original table to merge.
  2. Check the columns from the weather data to merge.
  3. Click Join.
Configuring where to store the resulting table
  1. Enter a name for the resulting data.
  2. Click the dataset ID.
  3. Enter a table ID.
  4. Click OK.
Clicking OK on the confirmation dialog
  1. Click OK and the merged data will appear in the data list after a bit of time.

Splitting tables

You can split a table into two tables. You can either split the data randomly according to a set ratio, or you can configure custom conditions.

Splitting tables

You can access split tables from the data list by doing the following:

Splitting tables from the data list
  1. Click for an item in the data list.
  2. Click Split table.

You can also access split tables from the editor by doing the following:

Splitting data from the editor
  1. Click Edit Table.
  2. Click Split table.

Exporting

You can export the contents of your data to GCS as a CSV or JSON format text file, or as a BigQuery table.

Exporting data

When exporting to GCS, a download link for the file will appear as shown below:

Downloading an exported file
  1. Right-click the link and select Save link as... to download the file to your PC.

You can export data from the data list by doing the following:

Exporting data from the data list
  1. Click for an item in the data list.
  2. Click Export table.

You can export data from the editor by doing the following:

Exporting data from the editor
  1. Click Edit Table.
  2. Click Export table.

Copy schema info

You can copy schema info (JSON) for BigQuery to your clipboard. You can use this in the Schema settings property of BigQuery BLOCKS in the Flow Designer.

Delete data

You can delete your data from the DataEditor.

Settings

You can view information about your DataEditor licenses from the Settings tab. You can also request a quote for additional licenses or optional licenses.

The settings tab