Machine Learning
ML Board prediction (batch)
warning This BLOCK is currently in beta. The beta version will become unusable after the official version is released.
Some functions may not work properly during the beta. We appreciate your feedback regarding bugs or ways to improve MAGELLAN BLOCKS.
Overview
This BLOCK makes batch predictions using an ML Board’s training with input variable data. It is designed for making predictions with collections of large amounts of input data.
It supports the following four types of ML Board:
- Classification type
- Regression type
- Image classification type
- Image object detection type
info_outline This BLOCK takes a comparatively longer time for predictions than the ML Board prediction (online) BLOCK. However, it is more efficient at making predictions when using large amounts of data.
The BLOCK performs batch predictions for the regression and classification models by reading input variable data from a text file stored in Google Cloud Storage open_in_new (GCS). It outputs the results as text files to a GCS folder. As a general rule, the results are split into multiple files.

For the image classification model, the BLOCK reads from either image files stored in GCS or a JSON file that contains data for images. Results are output in the same manner as the regression and classification models.
info_outline You must apply a training on an ML Board beforehand to make predictions with this BLOCK.
Preparing the input data
Classification and regression types
Prepare input data as a JSON format text file as shown below:
{"key": "1", "sepal_length": 5.9, "sepal_width": 3.0, "petal_length": 4.2, "petal_width": 1.5} {"key": "2", "sepal_length": 6.9, "sepal_width": 3.1, "petal_length": 5.4, "petal_width": 2.1} {"key": "3", "sepal_length": 5.1, "sepal_width": 3.3, "petal_length": 1.7, "petal_width": 0.5}
- Make each line a JSON object (
{...}
). - Separate objects with line breaks.
- Gather input data for one instance into one JSON object.
- JSON objects consist of “name” and “value” paits.
- In each pair, the name and value are separated by a
:
. - The name is on the left of the
:
and the value is on the right (name: value
).
- In each pair, the name and value are separated by a
- Values can be set as the following three types:
- Numbers:
1
,23.45
, and the like (numerical value, month, and day data). - Strings:
"abc"
,"xyz"
, and the like. Strings are enclosed in"
(Strings (enumerated) data). - Arrays:
[1, 2, 3]
,[4, 5.6, 7.0]
, and the like. Arrays can contain multiple numerical values enclosed between[
and]
(Numerical value with multiple dimensions and sequence data).info_outline This section explains how to set sequence data.
Sequence data, like the image below for weather data for the past 7 days, has meaning in its ordering. This section explains how to set an array for this example sequence.
When using this sequence data as part of an ML Board’s training data, you would simple write it out with commas (
,
) separating each number, as shown below.In the same manner, you list sequence data as an array when using it for a prediction.
"weather": [21, 35, 0, 20, 34, 0, . . . , 18, 32, 20]
(This assumes the item name for the sequence data is
weather
)
- Numbers:
- Each JSON object (an instance of input data) should include a pair with the name
"key"
. The value should be a string to identify that instance of input data.
The results of the prediction are output as JSON format text files to a GCS folder. The files are named automatically as shown below (the XXXXX and YYYYY change based on the number of files).
prediction.results-XXXXX-of-YYYYY
- XXXXX: A number starting with 0 that represents the file’s index number (00000, 00001, etc.).
- YYYYY: The total number of files (00001, 00003, etc.).
The following example shows prediction results for the classification model:
{"label_index": 2, "score": [9.230815578575857e-08, 0.007054927293211222, 0.9929450154304504], "key": "2", "label": "Iris-virginica"}
- Each row contains one JSON object ({...}).
- Rows are separated by line breaks.
- Prediction results for each instance of input data are contained within individual JSON objects.
"label_index" Shows which element of the "score" array the value of "label" represents. The elements in the array are ordered starting from 0. "score" The level of certainty for predicting each class. In this example, the certainty for class 0 is 0.000009231%, class 1 is 0.705492729%, and class 2 is 99.294501543%. "key" The value for the "key" used in the prediction input data. "label" The predicted class.
The following example shows prediction results for the regression model:
{"output": 10304.1962890625, "key": "20170103"}
- Each row contains one JSON object ({...}).
- Rows are separated by line breaks.
- Prediction results for each instance of input data are contained within individual JSON objects.
"output" The predicted value. "key" The value for the "key" used in the prediction input data.
Image classification and image object detection types
There are two ways to make predictions with the image classification or image object detection models.
- Specify image files stored in GCS
- Specify a JSON file
error When making object detection predictions for multiple images at once, all images must be the same size.
Each method is explained below.
Making predictions with image files stored in GCS
This method is the simplest way to make predictions.
- Upload your collection of image files into GCS in a single folder.
- Designate this folder into the Input GCS URL property.
- Make sure to include a
/
at the end of the GCS URL. - Images for predictions must be JPEG format.
- The file extension for the images can be .jpg or .jpeg.
- The folder containing the uploaded images can also contain folders within it (see the following image).
Making predictions with a JSON file
This method makes predictions using a JSON file that contains Base64 open_in_new encoded image data.
- The JSON file must contain more than one JSON object with each object separated by line breaks. See the following example:
{"key": "samp01", "image": {"b64": "/9j/4....../2Q=="}} {"key": "samp02", "image": {"b64": "/9j/4....../2Q=="}}
- The JSON objects should be formatted as follows:
{"key": "key", "image": {"b64": "Base64 encoded image data"}}
*Set the red portions with information for your prediction images.Name Value "key"
Designate a string to act as an identifying key for the prediction image. "b64"
Designate the Base64 encoded data for the prediction image. - Images for predictions must be JPEG format.
- Upload the JSON file to GCS and enter its GCS URL into the Input GCS URL property.
Prediction results
The results of the prediction are output as JSON format text files to a GCS folder. The files are named automatically as shown below (the XXXXX and YYYYY change based on the number of files).
prediction.results-XXXXX-of-YYYYY
- XXXXX: A number starting with 0 that represents the file’s index number (00000, 00001, etc.).
- YYYYY: The total number of files (00001, 00003, etc.).
The following shows example prediction results for the image classification model:
{"labels": ["cat", "dog"], "score": [1.0, 2.0886015139609526e-10], "key": "gs://my-bucket/images/sample_01.jpg", "label": "cat"} {"labels": ["cat", "dog"], "score": [3.7939051367175125e-07, 0.9999996423721313], "key": "gs://my-bucket/images/sample_02.jpg", "label": "dog"}
- Each row contains one JSON object ({...}).
- Rows are separated by line breaks.
- Prediction results for one image is contained in one JSON object.
"labels" A list of the classes. In this case, there are two classes: "cat" and "dog".
The order of the "labels" list matches the order in the "score" list that follows.
"score" The certainty for predicting each class.
In this example, the first value in "score" refers to "cat" and the second value in "score" refers to "dog".
"key" - If you made the prediction using image files uploaded to GCS, this will be the GCS URL for the image file.
- If you made the prediction using a JSON file, this will be the value for "key" as designated within that file.
"label" The predicted class. This is the class that corresponds with the highest value within "score".
Properties
Property | Explanation |
---|---|
BLOCK name | Configure the name displayed on this BLOCK. |
GCP service account | Select the GCP service account to use with this BLOCK. |
ML Board | Select the ML Board to use for this prediction. |
Input GCS URL |
|
Output GCS URL |
Designate a GCS URL for the folder that will contain the results of the prediction. For example, for the results to be stored into a folder named info_outline The BLOCK will create a new folder automatically if the designated folder does not already exist. For existing folders, new files will overwrite older files if they have the same name. |
Batch size |
For batch predictions, the input feature data is saved into memory (buffered) before the prediction is performed. Designate the amount (number of records) of input feature data that will be buffered in this property. There is the possibility that increasing the batch size will improve the speed of the prediction. However, the amount of memory used also increases, so there is also the possibility that the prediction will fail due to a memory shortage. Because of this, you will need to set an appropriate batch size for your input feature data that will not cause a memory shortage. |