BLOCKS Reference

Output specifications

Image analysis

This page explains the types of results returned by the Image analysis BLOCK.

For a detailed example of using the Image analysis BLOCK, refer to Using the Google Cloud Vision API Machine Learning service.

The following image analysis results are based on the February 2019 version of the Google Cloud Vision API documentation on Google Cloud Platform (Method: images.annotate).

Image analysis results

By default, the Image analysis BLOCK exports its results as JSON-formatted data to a variable named _ (underscore). You can configure the name of this variable with the Results storage variable setting in the Advanced settings portion of the property menu.

{
  "faceAnnotations": [{...}],
  "landmarkAnnotations": [{...}],
  "logoAnnotations": [{...}],
  "labelAnnotations": [{...}],
  "textAnnotations": [{...}],
  "fullTextAnnotation": {...},
  "safeSearchAnnotation": {...},
  "imagePropertiesAnnotation": {...},
  "cropHintsAnnotation": {...},
  "gcs_url": string,
  "timestamp": number
}
Name Explanation
"faceAnnotations" The results of facial recognition. Contains various data from the analyzed image.
"landmarkAnnotations" The results of landmark (famous natural or man-made locations) recognition. Contains various data from the analyzed image.
"logoAnnotations" The results of logo (logos of famous products and companies) recognition. Contains various data from the analyzed image.
"labelAnnotations" The results of object recognition (tagging images based on the objects detected within, such as vehicles or animals). Contains various data from the analyzed image.
"textAnnotations" The results text recognition. Contains various data from the analyzed image.
"fullTextAnnotation" The results of the text recognition or OCR (document) detection.
"safeSearchAnnotation" The results of adult content recognition (sexual/violent content, etc.).
"imagePropertiesAnnotation" The results of color analysis.
"cropHintsAnnotation" The data for the points of the cropped region.
"gcs_url" String which displays the path for the analyzed image saved in GCS. Looks something like the following example: "gs://magellan-blocks-demo/kumamoto2.jpg"
"timestamp" Numerical value showing the date/time (UNIX format) when image analysis occured. For example, analysis performed at 10:20 PM and 37 seconds on Sept. 27, 2016 would display as 1474939237.7195294.

Other than gcs_url and timestamp, these results are only exported when their corresponding features are detected by the analysis. For example, faceAnnotations will only be returned when a face is detected in the image.
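
For illustration, here is a minimal Python sketch of how downstream code might consume these results once they have been read into a string. The sample values are hypothetical and trimmed to the two keys that are always present:

import json
from datetime import datetime, timezone

# Hypothetical, trimmed sample of the BLOCK's JSON results.
raw_results = '{"gcs_url": "gs://magellan-blocks-demo/kumamoto2.jpg", "timestamp": 1474939237.7195294}'
result = json.loads(raw_results)

# Every key except "gcs_url" and "timestamp" is optional, so probe with .get().
faces = result.get("faceAnnotations", [])
print(f"{len(faces)} face(s) detected in {result['gcs_url']}")

# "timestamp" is a UNIX epoch value in seconds (with a fractional part).
analyzed_at = datetime.fromtimestamp(result["timestamp"], tz=timezone.utc)
print("analyzed at", analyzed_at.isoformat())  # 2016-09-27T01:20:37.719529+00:00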

Facial recognition ("faceAnnotations")

Facial recognition results are returned as an array ([...] format). Each element of the array is an object ({...} format) containing the data for one detected face.

"faceAnnotations": [
  {...},
  {...}
]

The format of the objects showing facial detection results is explained below. For example output, refer to Examples of using the Image analysis BLOCK > Facial recognition example.

{
  "boundingPoly": {...},
  "fdBoundingPoly": {...},
  "landmarks": {...},
  "rollAngle": number,
  "panAngle": number,
  "tiltAngle": number,
  "detectionConfidence": number,
  "landmarkingConfidence": number,
  "joyLikelihood": string,
  "sorrowLikelihood": string,
  "angerLikelihood": string,
  "surpriseLikelihood": string,
  "underExposedLikelihood": string,
  "blurredLikelihood": string,
  "headwearLikelihood": string
}
Name Explanation
"boundingPoly" This object is a polygon that outlines the face and head detected in the image. It outlines the face and head in a broader rectangle than the object below, fdBoundingPoly, does.
"fdBoundingPoly" This object is a polygon that outlines only the face detected in the image (a rectangle containing the ears, brow, and just below the mouth). It outlines the face more tightly than boundingPoly does.
"landmarks" An array containing objects for each part of the face (left eye, right eye, etc.).
"rollAngle" A numerical value showing the clockwise/counter-clockwise angle (roll) of the detected face. Degree range is -180 to 180.
"panAngle" A numerical value showing the angle to which the detected face is pointing left/right (yaw). Degree range is -180 to 180.
"tiltAngle" A numerical value showing the up/down angle the detected face is pointing (pitch). Degree range is -180 to 180.
"detectionConfidence" A numerical value showing confidence level of the facial recognition (i.e. how accurately was the face detected). Range is 0 to 1.
"landmarkingConfidence" A numerical value showing confidence of facial feature recognition. Range is 0 to 1.
"joyLikelihood" A string showing the likelihood of a happy expression.
"sorrowLikelihood" A string showing the likelihood of a sad expression.
"angerLikelihood" A string showing the likelihood of an angry expression.
"surpriseLikelihood" A string showing the likelihood of a surprised expression.
"underExposedLikelihood" A string showing the likelihood that the image is underexposed.
"blurredLikelihood" A string showing the likelihood that the image is blurry.
"headwearLikelihood" A string showing the likelihood that the person detected is wearing a hat.
Landmark recognition ("landmarkAnnotations")

Landmark recognition results are returned as an array. Each element of the array is an object containing the data for one detected landmark.

"landmarkAnnotations": [
  {...},
  {...}
]

The format of the objects showing landmark detection results is explained below. For example output, refer to Examples of using the Image analysis BLOCK > Landmark recognition example.

{
  "mid": string,
  "description": string,
  "score": number,
  "boundingPoly": {...},
  "locations": [{...}]
}
Name Explanation
"mid" The Google Knowledge Graph entity ID of the detected landmark.
"description" The description for the above listed entity ID.
"score" A numerical value showing confidence level of the landmark recognition. Range is 0 to 1.
"boundingPoly" An object that expresses the polygon which outlines the detected landmark.
"locations" Objects that show geographical position information for the detected landmark. This information may exist for both where the landmark is, and where the picture was taken from.
Logo recognition ("logoAnnotations")

Logo recognition results are returned as an array. Each element of the array is an object containing the data for one detected logo.

"logoAnnotations": [
  {...},
  {...}
]

The format of the objects showing logo detection results is explained below. For example output, refer to Examples of using the Image analysis BLOCK > Logo recognition example.
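
{
  "description": string,
  "score": number,
  "boundingPoly": {...}
}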

Name Explanation
"description" A string which expresses the detected logo.
"score" A numerical value showing confidence level of the logo recognition. Range is 0 to 1.
"boundingPoly" An object that expresses the polygon which surrounds the detected logo.
Object recognition ("labelAnnotations")

Object recognition results are returned as an array. Each element of the array is an object containing the data for one detected object.

"labelAnnotations": [
  {...},
  {...}
]

The format of the objects showing object detection results is explained below. For example output, refer to Examples of using the Image analysis BLOCK > Object recognition example.

{
  "mid": string,
  "description": string,
  "score": number
}
Name Explanation
"mid" The Google Knowledge Graph entity ID of the detected object.
"description" The description of the above listed entity ID.
"score" A numerical value showing confidence level of the object recognition. Range is 0 to 1.
Text recognition ("textAnnotations")

Text recognition results are returned as an array. Each element of the array is an object containing detected text data.

"textAnnotations": [
  {...},
  {...}
]

The format for the results is explained below. For example output, refer to Examples of using the Image analysis BLOCK > Text recognition example.

{
  "locale": string,
  "description": string,
  "boundingPoly": {...}
}
Name Explanation
"locale" A string showing the language code of the detected text.
"description" A string expressing the detected text.
"boundingPoly" An object that expresses the polygon which surrounds the detected text.
OCR (document) detection ("fullTextAnnotation")

The format of OCR (document) detection results data is explained below.

"fullTextAnnotation": {
  "pages": [
    {...},
    {...}
  ],
  "text": string
}
Name Explanation
"pages" A list of the detected pages.
"text" The text detected on the page (UTF-8 encoding).
Adult content detection ("safeSearchAnnotation")

The format of the data showing adult content detection results is explained below. For example output, refer to Examples of using the Image analysis BLOCK > Adult content detection example.

{
  "adult": string,
  "spoof": string,
  "medical": string,
  "violence": string
}
Name Explanation
"adult" A string that shows the likelihood of the image containing sexual content.
"spoof" A string that shows the likelihood that the image is a spoof (an altered version meant to be funny or offensive) of another image.
"medical" A string that shows the likelihood that the image is medical in nature.
"violence" A string that shows the likelihood of the image containing violent content.
Color analysis ("imagePropertiesAnnotation")

The format of color analysis results data is as shown below.

"imagePropertiesAnnotation": {
  "dominantColors": {...}
}
Name Explanation
"dominantColors" An object that shows the dominant colors detected in the image. At the current time, this is the only information contained in imagePropertiesAnnotation.

The dominantColors object contains information about the colors in the image. Usually, it contains information for several colors.

"dominantColors": {
  "colors": [
    {...},
    {...}
  ]
}
Name Explanation
"colors" An array containing information about the dominant colors detected in the image.

The format of the color information is as follows. For example output, refer to Examples of using the Image analysis BLOCK > Color analysis example.

{
  "color": {...},
  "score": number,
  "pixelFraction": number
}
Name Explanation
"color" An object that shows the RGB values of the detected color.
"score" A numerical value that shows the level of confidence for color detection. Range is 0 to 1.
"pixelFraction" A numerical value that shows what portion of the image the detected color comprises. Range is 0 to 1.

The objects showing the RGB values of the color are formatted as follows:

{
  "red": number,
  "green": number,
  "blue": number
}
Name Explanation
"red" A numerical value that shows the red level of the detected color. Range is 0 to 255.
"green" A numerical value that shows the green level of the detected color. Range is 0 to 255.
"blue" A numerical value that shows the blue level of the detected color. Range is 0 to 255.
Crop hints annotation ("cropHintsAnnotation")

The format for crop hints annotation results is as follows. For example output, refer to Examples of using the Image analysis BLOCK > Crop hints annotation example.

{
  "cropHints": [
    {...},
    {...}
  ]
}
Name Explanation
"cropHints" A list of the crop hints.
Crop hint ("cropHint")

The format for crop hint results data is as follows:

{
  "boundingPoly": {...},
  "confidence": number,
  "importanceFraction": number
}
Name Explanation
"boundingPoly" An object showing the bounding polygon for the crop region.
"confidence" A number between 0–1 showing the confidence that the crop region is salient (Example: 0.99).
"importanceFraction" A number showing the importance of the crop region to the original image.
Bounding polygon objects ("boundingPoly"/"fdBoundingPoly")

The format for objects that express polygons is as shown below.

{
  "vertices": [{...}, ..., {...}]
}
Name Explanation
"vertices" An array comprised of objects showing each of the polygon's vertices.
Vertices object

The vertices object shows two-dimensional coordinates within the image. The data is formatted as follows:

{
  "x": number,
  "y": number
}
Name Explanation
"x" The numerical value of the vertex's x coordinate.
"y" The numerical value of the vertex's y coordinate.
Likelihood

Likelihood is expressed as one of the six strings shown below.

Likelihood Explanation
"UNKNOWN" The likelihood is unknown.
"VERY_UNLIKELY" The likelihood is very low.
"UNLIKELY" The likelihood is low.
"POSSIBLE" There is some likelihood.
"LIKELY" The likelihood is high.
"VERY_LIKELY" The likelihood is very high.
Facial features (landmarks) object

The facial landmarks object shows the type of facial feature (left eye, right eye, etc.) and the feature's position in the image. The data is formatted as shown below.

{
  "type": string,
  "position": {...}
}
Name Explanation
"type" A string that expresses the type of facial feature.
"position" An object showing the position of the facial feature.
Types of features

Facial features are expressed as strings. The various strings and their meanings are as shown below.

Facial feature strings Explanation
"UNKNOWN_LANDMARK" An unrecognizable facial feature.
"LEFT_EYE" The left eye.
"RIGHT_EYE" The right eye.
"LEFT_OF_LEFT_EYEBROW" The leftmost point of the left eyebrow.
"RIGHT_OF_LEFT_EYEBROW" The rightmost point of the left eyebrow.
"LEFT_OF_RIGHT_EYEBROW" The leftmost point of the right eyebrow.
"RIGHT_OF_RIGHT_EYEBROW" The rightmost point of the right eyebrow.
"MIDPOINT_BETWEEN_EYES" The point directly between the eyes.
"NOSE_TIP" The tip of the nose.
"UPPER_LIP" The upper lip.
"LOWER_LIP" The lower lip.
"MOUTH_LEFT" The leftmost point of the mouth.
"MOUTH_RIGHT" The rightmost point of the mouth.
"MOUTH_CENTER" The center point of the mouth.
"NOSE_BOTTOM_RIGHT" The bottom-right point of the nose.
"NOSE_BOTTOM_LEFT" The bottom-left point of the nose.
"NOSE_BOTTOM_CENTER" The bottom-center point of the nose.
"LEFT_EYE_TOP_BOUNDARY" The topmost point of the left eye.
"LEFT_EYE_RIGHT_CORNER" The rightmost point of the left eye.
"LEFT_EYE_BOTTOM_BOUNDARY" The bottommost point of the left eye.
"LEFT_EYE_LEFT_CORNER" The leftmost point of the left eye.
"RIGHT_EYE_TOP_BOUNDARY" The topmost point of the right eye.
"RIGHT_EYE_RIGHT_CORNER" The rightmost point of the right eye.
"RIGHT_EYE_BOTTOM_BOUNDARY" The bottommost point of the right eye.
"RIGHT_EYE_LEFT_CORNER" The leftmost point of the right eye.
"LEFT_EYEBROW_UPPER_MIDPOINT" The central uppermost point of the left eyebrow.
"RIGHT_EYEBROW_UPPER_MIDPOINT" The central uppermost point of the right eyebrow.
"LEFT_EAR_TRAGION" The notch above the tragus (the protrusion that partially covers the entrance to the inner ear) of the left ear.
"RIGHT_EAR_TRAGION" The notch above the tragus (the protrusion that partially covers the entrance to the inner ear) of the right ear.
"LEFT_EYE_PUPIL" The pupil of the left eye.
"RIGHT_EYE_PUPIL" The pupil of the right eye.
"FOREHEAD_GLABELLA" The point between the eyebrows.
"CHIN_GNATHION" The bottommost point of the chin.
"CHIN_LEFT_GONION" The leftmost point of the jaw.
"CHIN_RIGHT_GONION" The rightmost point of the jaw.
Facial feature position object

Three-dimensional coordinates showing the position of each facial feature. The data is formatted as follows:

{
  "x": number,
  "y": number,
  "z": number
}
Name Explanation
"x" Numerical value showing the x coordinate.
"y" Numerical value showing the y coordinate.
"z" Numerical value showing the z coordinate (depth).
Geographical position object ("locations")

An array showing geographical position information. Each element is an object containing a latitude/longitude pair.

"locations": [
  {...},
  {...}
]

The data for the latitude and longitude objects is expressed as shown below.

{
  "latLng": {
    "latitude": number,
    "longitude": number
  }
}
Name Explanation
"latitude" A numerical value showing the latitude.
"longitude" A numerical value showing the longitude.
Pages ("page")

The format for page data results for text recognition is as follows:

{
  "blocks": [
    {...},
    {...}
  ],
  "width": number,
  "height": number,
  "property": {...}
}
Name Explanation
"blocks" A list of the text or image blocks on the page.
"width" The width of the page.
"height" The height of the page.
"property" Additional properties of the page.
Additional properties ("property")

The format of the additional properties data is as follows:

{
  "detectedLanguages": [
    {...},
    {...}
  ],
  "detectedBreak": {...}
}
Name Explanation
"detectedLanguages" A list of the languages detected.
"detectedBreak" A list of the text breaks detected.
Detected language ("detectedLanguage")

The format of the detected language results is as follows:

{
  "languageCode": string,
  "confidence": number
}
Name Explanation
"languageCode" A BCP-47 code indicating the language detected (Example: "en-US" or "sr-Latn"). For details, refer to Unicode Locale Identifier.
"confidence" A number between 0–1 showing the confidence in detecting the language of the text (Example: 0.99).
Detected break ("detectedBreak")

The format of detected break results is as follows:

{
  "type": string,
  "isPrefix": boolean
}
Name Explanation
"type" Shows the type of break detected.
"isPrefix" Is "True" when the break leads the message.
Break type ("breakType")

The types of breaks are as follows:

Type Explanation
UNKNOWN The break type is unknown.
SPACE A regular space.
SURE_SPACE A sure space (wider than a regular space).
EOL_SURE_SPACE A line-wrapping break.
HYPHEN An end-line hyphen (cannot occur with SPACE or LINE_BREAK).
LINE_BREAK A paragraph-ending line break.
Block ("block")

A block is a logical element detected on the page. The format for block results data is as follows:

{
  "boundingBox": {...},
  "paragraphs": [
    {...},
    {...}
  ],
  "blockType": string,
  "confidence": number
}
Name Explanation
"boundingBox"

The bounding box for the block (the rectangle containing the text), with vertices ordered top-left, top-right, bottom-right, bottom-left. The top-left corner is defined by how the text is read “normally,” so the vertices will be as follows when rotation is detected:

Horizontal text:

0----1
|    |
3----2

Text rotated 180 degrees:

2----3
|    |
1----0

The order of the vertices will always be 0, 1, 2, 3.

"paragraphs" A list of paragraphs in the blocks. Only shown when the block type is text.
"blockType" The detected block type for the text or image block.
"confidence" Shows the confidence for the OCR results for the block as a number between 0–1 (Example: 0.99).
Paragraph ("paragraph")

The format of paragraph results is as follows:

{
  "boundingBox": {...},
  "words": [
    {...},
    {...}
  ],
  "confidence": number
}
Name Explanation
"boundingBox"

The bounding box of the paragraph (the rectangle that contains the text), with vertices ordered top-left, top-right, bottom-right, bottom-left. The top-left corner is defined by how the text is read “normally,” so the vertices will be as follows when rotation is detected:

Horizontal text:

0----1
|    |
3----2

Text rotated 180 degrees:

2----3
|    |
1----0

The order of the vertices will always be 0, 1, 2, 3.

"words" A list of the words detected in the paragraph.
"confidence" Shows the confidence for the text recognition of the paragraph as a number between 0–1 (Example: 0.99).
Word ("word")

The format of word data results is as follows:

{
  "property": {...},
  "boundingBox": {...},
  "symbols": [
    {...},
    {...}
  ],
  "confidence": number
}
Name Explanation
"property" Additional properties about the word.
"boundingBox"

The bounding box of the word (the rectangle that contains the word), with vertices ordered top-left, top-right, bottom-right, bottom-left. The top-left corner is defined by how the word is read “normally,” so the vertices will be as follows when rotation is detected:

Horizontal word:

0----1
|    |
3----2

Word rotated 180 degrees:

2----3
|    |
1----0

The order of the vertices will always be 0, 1, 2, 3.

"symbols" A list of the symbols detected in the word in order of how the word is read naturally.
"confidence" Shows the confidence for recognition of the word as a number between 0–1 (Example: 0.99).
Symbol ("symbol")

The format for symbol data results is as follows:

{
  "property": {...},
  "boundingBox": {...},
  "text": string,
  "confidence": number
}
Name Explanation
"property" Additional properties of the symbol.
"boundingBox"

The bounding box of the symbol (the rectangle that contains the symbol), with vertices ordered top-left, top-right, bottom-right, bottom-left. The top-left corner is defined by how the symbol is read “normally,” so the vertices will be as follows when rotation is detected:

Horizontal symbol:

0----1
|    |
3----2

Symbol rotated 180 degrees:

2----3
|    |
1----0

The order of the vertices will always be 0, 1, 2, 3.

"text" A UTF-8 symbol.
"confidence" Shows the confidence for recognition of the symbol as a number between 0–1 (Example: 0.99).
Block type ("blockType")

The type of block detected by the text recognition. The different possible types are as follows:

Value Explanation
UNKNOWN An unknown type of block.
TEXT A text block.
PICTURE An image block.
RULER A horizontal or vertical line block.
BARCODE A barcode.
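
Putting the whole OCR hierarchy together (pages, blocks, paragraphs, words, symbols), the following is a minimal Python sketch that reassembles the text of every TEXT block; the fullTextAnnotation sample is hypothetical and heavily trimmed:

# Hypothetical, heavily trimmed fullTextAnnotation.
full_text = {
    "pages": [{
        "blocks": [{
            "blockType": "TEXT",
            "paragraphs": [{
                "words": [
                    {"symbols": [{"text": "H"}, {"text": "i"}]},
                    {"symbols": [{"text": "!"}]},
                ],
            }],
        }],
    }],
    "text": "Hi!\n",
}

# Walk pages -> blocks -> paragraphs -> words -> symbols.
for page in full_text["pages"]:
    for block in page["blocks"]:
        if block.get("blockType") != "TEXT":
            continue  # Skip PICTURE, RULER, BARCODE, and UNKNOWN blocks.
        for paragraph in block["paragraphs"]:
            words = ["".join(s["text"] for s in w["symbols"])
                     for w in paragraph["words"]]
            # Joining with spaces is a simplification; a faithful
            # reconstruction would honor each symbol's detectedBreak.
            print(" ".join(words))  # -> "Hi !"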
