Text Search Engine Help

GCP service account settings

This step applies only to Self-Service Plan users. It is not shown to Full Service Plan users.

In this step, you will do the following:

  • Select your Google Cloud Platform (GCP) service account
  • Enable various Google APIs required to create the Text Search Engine
  • Confirm that your GCP service account is assigned the proper role

Select a GCP service account

BLOCKS will automatically construct and operate your Text Search Engine within your GCP project. In order to do this, you must have a GCP service account that has been given the Owner role.

In this step, select a GCP service account that has been given the Owner role.

For details on creating a GCP service account, refer to Creating a Google Cloud Platform service account key. The example on that page creates a service account with the “Editor” role, but any GCP service account used with the Text Search Engine must be given the Owner role instead.

Enable APIs

Do the following if any API does not have a checkmark next to its Check button:

  1. Click Enable All APIs.
  2. The GCP console will open.
  3. Click Continue in the GCP console.
  4. Once the message The APIs are enabled appears, close the GCP console and return to BLOCKS.

Click Check for any API that does not have a checkmark, and confirm that a checkmark appears for that API. Repeat this process for every API without a checkmark.

If the checkmark does not appear right away, wait a moment and click Check again. The change can take some time to be reflected. If the checkmark still does not appear, repeat the following:

  1. Wait a moment.
  2. Click Check.

If the checkmark still does not appear, the issue could be the following:

  • Billing is not enabled for your GCP project:

    Open the menu in the upper-left corner of the GCP console and click Billing. If you haven’t already done so, enable billing for your project.

Account permissions

Do the following if there is no checkmark next to the Check button:

  1. Click the icon next to Account creation permission.
  2. The GCP IAM page will open in a new tab.
  3. Click the icon next to the GCP service account for the Text Search Engine.
  4. Change the role to Owner.
  5. Click Save.
  6. Close the GCP IAM page and return to BLOCKS.

Once finished with the above, click Check and confirm that a checkmark appears.

If the checkmark does not appear right away, wait a moment and click Check again. The change can take some time to be reflected. If the checkmark still does not appear, repeat the following:

  1. Wait a moment.
  2. Click Check.

Optional settings

In this step, you can configure the specifications of the virtual machine that will be built in Google Cloud Platform (GCP) for the search engine, enable the Simple Search App, and adjust other settings.

Text Search Engine ID

Configure the identifier given to the various resources that BLOCKS will construct in GCP. In order to make management easier, we recommend choosing an identifier that will be easy to remember. This setting cannot be left blank.

The identifier must be formatted as follows:

  • Lowercase letters or numbers only.
  • Between 1 and 14 characters.
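
The two rules above can be checked before you type the identifier into BLOCKS. The following is a minimal sketch, not part of BLOCKS itself: the function name is hypothetical, and it assumes that "lowercase letters" means the ASCII letters a to z.

```python
import re

# Assumption: "lowercase letters or numbers" means ASCII a-z and 0-9.
# The length limit of 1 to 14 characters comes from the rules above.
ENGINE_ID_PATTERN = re.compile(r"[a-z0-9]{1,14}")


def is_valid_engine_id(engine_id: str) -> bool:
    """Return True if engine_id satisfies the format rules listed above."""
    return ENGINE_ID_PATTERN.fullmatch(engine_id) is not None


if __name__ == "__main__":
    # Hypothetical candidates, used only for illustration.
    for candidate in ["faqsearch01", "FaqSearch", "faq-search", ""]:
        print(repr(candidate), is_valid_engine_id(candidate))
```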

Machine settings

Refer to the explanations below for each setting:

Item Explanation
Nodes

Set the number of machines to an odd number of 1 or greater (1, 3, 5, 7, etc.).

One node may be sufficient for development and testing, but 3 or more nodes are recommended for a final application in order to improve service reliability.

Machine type

Select the machine type. Each type has various specification differences (memory size, CPU, etc.).

For details on available machine types, refer to the GCP Machine Types documentation page.

The f1-micro, g1-small, and n1-highcpu-2 machine types listed on the GCP Machine Types documentation page cannot be used and do not appear in the list of selectable machine types.

Disk size

Set the disk size to a value between 10 and 65536 GB.

Zone

Select the zone for the virtual machine that will run the search engine. For details about zones, refer to the GCP Regions and Zones documentation page.

Currently, applications access the search engine through the Internal Load Balancer (Internal LB). Any application you build for use with the search engine must therefore be in the same network, region, and zone as the search engine, so keep this in mind when selecting the zone. In the future, access will be possible through the HTTP(S) Load Balancer, which will remove the need to use the same zone.

Keep in mind the following if you plan to connect an app built in Google App Engine (GAE) with the search engine (including the Simple Search App):

  • Select a zone that can use GAE (shown with the GAE icon)
  • If you already have a GAE application within your GCP project, select a zone within the same region as that application.

Because the Simple Search App runs in GAE, please be aware of the above if you plan to use the Simple Search App.
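
The numeric limits given above for Nodes and Disk size can be checked with a few lines of code, for example when the values come from a planning spreadsheet or script. This is only a sketch: the function and constant names are hypothetical, and machine type and zone are not checked because the selectable values are only shown in the BLOCKS screen.

```python
from typing import List

# Limits taken from the Machine settings explanations above.
MIN_DISK_GB = 10
MAX_DISK_GB = 65536


def check_machine_settings(nodes: int, disk_size_gb: int) -> List[str]:
    """Return a list of problems; an empty list means both values are acceptable."""
    problems = []
    # Nodes must be an odd number of 1 or greater (1, 3, 5, 7, ...).
    if nodes < 1 or nodes % 2 == 0:
        problems.append("nodes must be an odd number of 1 or greater")
    if not MIN_DISK_GB <= disk_size_gb <= MAX_DISK_GB:
        problems.append(
            f"disk size must be between {MIN_DISK_GB} and {MAX_DISK_GB} GB"
        )
    return problems


if __name__ == "__main__":
    print(check_machine_settings(nodes=3, disk_size_gb=100))
    print(check_machine_settings(nodes=2, disk_size_gb=5))
```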

Simple Search App settings

You will configure settings for the Simple Search App in this section.

You can send queries to the search engine and see a display of the results with the Simple Search App. This allows you to quickly test the performance and accuracy of the search engine without needing to build your own app.

If you want to use the Simple Search App, click Enable GAE.

You can activate or deactivate the Simple Search App whenever you like. If you’d like to activate the Simple Search App immediately upon creating the Text Search Engine, click the Activate the Simple Search App checkbox.

After creating the Text Search Engine, you can activate or deactivate the Simple Search App from the Simple Search App section of the Text Search Engine details screen.

If you choose to activate the app, you will also need to configure a password. You will need this password later to use the Simple Search App.

The Simple Search App does not support the many-to-one and many-to-many association types. For more information about association types, refer to the Import data section.

Text Search Engine details

You can import/update data, view details, and delete your Text Search Engine from this screen.

Index list

A Text Search Engine can contain multiple indices, as shown in the image below.

Index overview

An index is composed of the target text for the search (question data, answer data) and various dictionaries to improve search accuracy (Synonyms, Stopwords).

You can create indices and perform the following from the index list:

  • View the status of each index.

    The meaning of each index status is explained below:

    Status Explanation
    Creating

    The index is being created. This status is displayed from the moment index creation starts until the index has finished being built.

    Created

    The index has been created. Once this status is shown, you will be able to import data and open the Simple Search App.

    Creation failed

    Creation of the index failed. Fix the cause of the failure, delete the index, and try recreating it.

    The cause of failure may be one of the following:

    • An invalid dictionary path
    • Invalid dictionary contents
    • A BLOCKS error

    If a BLOCKS error is the likely cause, try leaving the dictionaries as they are, deleting the index, and recreating it.

    Importing

    Data (question data, answer data) is being imported. This status is displayed from the moment the import starts until it finishes.

    Imported

    The data has finished being imported. You can make searches with your imported data once this status is shown.

    Import failed

    Import of the data failed. Fix the cause of the error and try to reimport the data.

    The cause of the failure may be one of the following:

    • An invalid data path
    • Invalid data contents
    • A BLOCKS error

    You can view logs for data imports from the update logs section of the index details menu. Please refer to these when looking for the cause of a failure.

    If a BLOCKS error is the likely cause, please try reimporting the data.

    Deleting

    The index is being deleted. This status will be shown once the index starts being deleted. It will be displayed until the deletion process completes.

    Deletion failed

    Deletion of the index failed. Wait a while, then click Delete again.

  • Open the Simple Search App.

    Click Open to open the Simple Search App in a new window.

  • View index details.

    Click View details to view the index details menu. See Index details for a detailed explanation.

  • Import your search data.

    Click Import data to import or update the text data for the search engine. See Import data for a detailed explanation.

  • Delete indices.

    Click Delete to delete an index. This action cannot be reversed.

Create an index

Click Create Index to create a new index. You can configure the index name and register dictionary files.

  • Index naming conventions:
    • Must contain only letters, numbers, or hyphens (-).
    • The first character can only be a letter or number.
  • Dictionaries:
    • The three types of dictionaries are the User dictionary, Synonyms, and Stopwords. Formatting for each type is explained later in this document.
    • Registering these dictionary files is optional.
    • The dictionary files must be uploaded in advance to a fixed location (bucket and folder) in Google Cloud Storage (GCS):
      Bucket

      The bucket shown in Text Search Engine details > Resources used by this Text Search Engine > Cloud Storage

      Folder

      analysis (fixed)
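
The index naming conventions above translate directly into a pattern check. The sketch below is illustrative only: the function name is hypothetical, and it assumes that "letters" and "numbers" mean ASCII letters and digits. No length limit is checked because none is stated above.

```python
import re

# First character: a letter or number. Remaining characters: letters,
# numbers, or hyphens. Assumption: ASCII only.
INDEX_NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*")


def is_valid_index_name(name: str) -> bool:
    """Return True if name follows the index naming conventions above."""
    return INDEX_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    # Hypothetical names, used only for illustration.
    for candidate in ["faq-2024", "-faq", "faq_2024"]:
        print(repr(candidate), is_valid_index_name(candidate))
```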

User dictionary

The user dictionary is designed for improving accuracy for Japanese language search engines. This step can be ignored if you are not searching Japanese text. For a detailed explanation on how to use this file, please refer to the Japanese version of this page.

Synonyms

You can define synonyms for terms with this dictionary.

For example, you might want the engine to match terms like “New York”, “NYC”, or “The Big Apple” as synonyms.

The synonyms dictionary should be formatted as follows:

  • The file name can contain only ASCII characters.
  • It must be a text file.
  • The file can contain only UTF-8 characters without BOM.
  • Newlines must be CR+LF or LF.
  • Define synonyms for one term on one line.
  • Each line should be formatted as follows:
    Synonym => Term
    
    • You can define multiple synonyms for a term. In this case, separate each synonym with a comma (,).
    • You can also define multiple terms. In this case, separate each term with a comma (,).
  • Example line:
    • To match “New York” with “NYC” and “The Big Apple”, you would enter the following:
      nyc,the big apple => new york
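
If you maintain synonyms in another tool, the line layout described above can be generated and read back programmatically. The following sketch uses hypothetical helper names and only implements the layout stated above (comma-separated synonyms, `=>`, comma-separated terms). It leaves letter case untouched, since this page does not state whether case matters.

```python
from typing import List, Tuple


def format_synonym_line(synonyms: List[str], terms: List[str]) -> str:
    """Build one dictionary line in the 'Synonym => Term' layout."""
    if not synonyms or not terms:
        raise ValueError("at least one synonym and one term are required")
    for word in synonyms + terms:
        # These characters would break the line layout.
        if any(token in word for token in (",", "=>", "\r", "\n")):
            raise ValueError(f"unsupported characters in {word!r}")
    return ",".join(synonyms) + " => " + ",".join(terms)


def parse_synonym_line(line: str) -> Tuple[List[str], List[str]]:
    """Split one dictionary line back into (synonyms, terms)."""
    left, separator, right = line.partition("=>")
    if not separator:
        raise ValueError("line does not contain '=>'")
    synonyms = [item.strip() for item in left.split(",") if item.strip()]
    terms = [item.strip() for item in right.split(",") if item.strip()]
    return synonyms, terms


if __name__ == "__main__":
    line = format_synonym_line(["nyc", "the big apple"], ["new york"])
    print(line)
    print(parse_synonym_line(line))
```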
      
Stopwords

You can define words that will be excluded from searches with this dictionary.

The stopwords dictionary should be formatted as follows:

  • The file name can contain only ASCII characters.
  • It must be a text file.
  • The file can contain only UTF-8 characters without BOM.
  • Newlines must be CR+LF or LF.
  • Define one stopword on one line.
  • Example stopwords:
    Tokyo
    Fukuoka
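
The file-level rules shared by the synonyms and stopwords dictionaries (ASCII file name, UTF-8 without BOM, CR+LF or LF newlines) can be checked locally before uploading. This is a minimal sketch with a hypothetical function name. It works on the raw bytes of the file and does not check the dictionary contents themselves.

```python
import codecs
from typing import List


def check_text_file(file_name: str, content: bytes) -> List[str]:
    """Return a list of rule violations; an empty list means no problem was found."""
    problems = []
    if not file_name.isascii():
        problems.append("file name contains non-ASCII characters")
    if content.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 BOM")
    try:
        content.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("content is not valid UTF-8")
    # After removing every CR+LF pair, any CR that remains is a bare CR,
    # which is neither of the two allowed newline styles.
    if b"\r" in content.replace(b"\r\n", b""):
        problems.append("file contains a bare CR newline")
    return problems


if __name__ == "__main__":
    print(check_text_file("stopwords.txt", b"Tokyo\nFukuoka\n"))
```

To check a file on disk, read it in binary mode, for example `check_text_file(path, open(path, "rb").read())`.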
    

Index details

Clicking View details from the index list brings up this screen.

You can view the following details about an index from this screen:

  • Settings
    • Simple Search App URL
      Shows the URL for the Simple Search App. Clicking this URL will open the app in a new tab.
    • Index name
      Shows the name of the index whose details you are viewing.
    • User dictionary
      Shows the GCS URL for a user dictionary file (if one was registered).
    • Synonyms
      Shows the GCS URL for a synonyms file (if one was registered).
    • Stopwords
      Shows the GCS URL for a stopwords file (if one was registered).
  • Update logs
    You can view timestamps and logs for data imports and updates in this section. Click Show logs to display log details.

Import data

Click Import data to register the text for your search engine.

You must upload this data to a fixed GCS bucket beforehand. This bucket can be found under Text Search Engine details > Resources used by this Text Search Engine > Cloud Storage

If desired, you can create folders within this bucket for uploading data into. In this case, the folder names can only contain ASCII characters.
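
In Cloud Storage, a folder is simply a prefix in the object name, with parts separated by a slash. The sketch below builds such an object name and applies the ASCII-only rule to the folder and file names. The function name and the example folder and file names are hypothetical. The bucket itself is not included because it must be taken from the Text Search Engine details screen.

```python
def data_object_name(file_name: str, folder: str = "") -> str:
    """Join an optional folder path and a file name into a GCS object name."""
    # Ignore empty parts so that "faq/" and "/faq" behave like "faq".
    parts = [part for part in folder.split("/") if part] + [file_name]
    for part in parts:
        if not part.isascii():
            raise ValueError(f"{part!r} contains non-ASCII characters")
    return "/".join(parts)


if __name__ == "__main__":
    print(data_object_name("questions.csv", folder="faq/2024"))
    print(data_object_name("answers.csv"))
```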

For the Text Search Engine, you divide the documents to be searched into question data and answer data and manage the associations between them. Alternatively, you can import all of the data as answer data without dividing it.

General use case
Undivided data use case

The different types of associations between question and answer data are shown below:

Association type Explanation
one-to-none

No association because there is only answer data.

one-to-one

One element of the answer data corresponds with one element in the question data and vice versa.

one-to-many

One element in the answer data can correspond with multiple elements in the question data, but one element in the question data only corresponds to one element in the answer data.

many-to-one

One element in the answer data only corresponds to one element in the question data, but one element in the question data can correspond to multiple elements in the answer data.

many-to-many

One element in the answer data can correspond to multiple elements in the question data and vice versa.
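
To decide which association type your data has, you can count how many answers each question points at and how many questions point at each answer. The following sketch applies the definitions in the table above. The function name and input shape (a mapping from question ID to a list of answer IDs) are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def association_type(question_targets: Dict[str, List[str]]) -> str:
    """Classify the data using the association types defined in the table above."""
    if not question_targets:
        # No question data at all: only answer data exists.
        return "one-to-none"
    question_has_many_answers = any(
        len(set(answer_ids)) > 1 for answer_ids in question_targets.values()
    )
    questions_per_answer = Counter(
        answer_id
        for answer_ids in question_targets.values()
        for answer_id in set(answer_ids)
    )
    answer_has_many_questions = any(
        count > 1 for count in questions_per_answer.values()
    )
    if question_has_many_answers and answer_has_many_questions:
        return "many-to-many"
    if question_has_many_answers:
        return "many-to-one"
    if answer_has_many_questions:
        return "one-to-many"
    return "one-to-one"


if __name__ == "__main__":
    print(association_type({"q1": ["a1"], "q2": ["a1"], "q3": ["a2"]}))
```

The result tells you which header line the question data needs and whether the Simple Search App can be used with the data.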

Question data

Question data should be formatted as follows:

  • The file name can contain only ASCII characters.
  • It must be a CSV format text file.
  • It can contain only UTF-8 characters without BOM.
  • Newlines can only be CR+LF or LF.
  • Each record (row) consists of 3 fields:
    Field Can omit? Explanation
    Question ID No

    An identifier for the question.

    The identifier can contain only letters, numbers, hyphens (-), and underscores (_).

    Text No

    The text of the question. It can contain blank spaces.

    Target Answer ID No

    Designate the answer ID of the answer data that you want to associate with this question.

    To create an association with multiple answer IDs, separate the IDs with colons (for example, a1:a3:a5).

  • The file must start with a header line.
    • If the association type is “many-to-one” or “many-to-many”:
      id,body,target_ids
      
    • If the association type is different from the above:
      id,body,target_id
      
  • Example CSV:
    • If the association type is “many-to-one” or “many-to-many”:
      id,body,target_ids
      q1,"First text to associate with answer ID a1",a1
      q2,"Second text to associate with answer ID a1",a1
      q3,"Text to associate with answer ID a2",a2
      q4,"Text to associate with answer IDs a3 a4 a5",a3:a4:a5
      
    • If the association type is different from the above:
      id,body,target_id
      q1,"First text to associate with answer ID a1",a1
      q2,"Second text to associate with answer ID a1",a1
      q3,"Text to associate with answer ID a2",a2
      q4,"Text to associate with answer ID a3",a3
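
Question data can also be generated from a script instead of being written by hand. The sketch below uses Python's standard csv module with hypothetical function and variable names. It quotes only the fields that need quoting, which assumes the importer reads standard CSV, and it assumes that "letters" and "numbers" in IDs mean ASCII characters.

```python
import csv
import io
import re
from typing import Iterable, List, Tuple

# Assumption: IDs are limited to ASCII letters, digits, hyphens, and underscores.
ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def build_question_csv(
    rows: Iterable[Tuple[str, str, List[str]]], multiple_targets: bool
) -> str:
    """Build question data from (question ID, text, answer IDs) tuples.

    Set multiple_targets to True for many-to-one or many-to-many data.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "body", "target_ids" if multiple_targets else "target_id"])
    for question_id, body, answer_ids in rows:
        if not answer_ids:
            raise ValueError(f"{question_id}: the target answer ID cannot be omitted")
        if len(answer_ids) > 1 and not multiple_targets:
            raise ValueError(f"{question_id}: only one answer ID is allowed")
        for identifier in [question_id] + list(answer_ids):
            if not ID_PATTERN.fullmatch(identifier):
                raise ValueError(f"invalid identifier: {identifier!r}")
        # Multiple answer IDs are joined with colons, as described above.
        writer.writerow([question_id, body, ":".join(answer_ids)])
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_question_csv([("q1", "Example question text", ["a1"])], False))
```

When saving the result, open the file with `encoding="utf-8"` and `newline=""` so that Python writes UTF-8 without a BOM and leaves the LF newlines unchanged.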
      
Answer data

Answer data should be formatted as follows:

  • The file name can contain only ASCII characters.
  • It must be a CSV format text file.
  • It can contain only UTF-8 characters without BOM.
  • Newlines can only be CR+LF or LF.
  • Each record (row) consists of 2 fields:
    Field Can omit? Explanation
    Answer ID No

    An identifier for the answer.

    The identifier can contain only letters, numbers, hyphens (-), and underscores (_).

    Text Yes

    The text of the answer.

    If omitted, this will be handled as a blank space.

  • The file must start with this header line:
    id,body
    
  • Example CSV:
    id,body
    a1,"First answer text"
    a2,"Second answer text"
    a3,"Third answer text"
    a4,"Fourth answer text"
    a5,"Fifth answer text"
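
Before importing, it can be useful to confirm that every answer ID referenced in the question data actually exists in the answer data. This page does not say how the importer treats a missing ID, so treat the following as an optional sanity check. The function name is hypothetical.

```python
import csv
import io
from typing import List


def missing_answer_ids(question_csv: str, answer_csv: str) -> List[str]:
    """Return answer IDs that the question data references but the answer data lacks."""
    known_ids = {row["id"] for row in csv.DictReader(io.StringIO(answer_csv))}
    missing = []
    for row in csv.DictReader(io.StringIO(question_csv)):
        # The third column is named target_ids or target_id, depending on
        # the association type.
        targets = row.get("target_ids") or row.get("target_id") or ""
        for answer_id in targets.split(":"):
            if answer_id and answer_id not in known_ids and answer_id not in missing:
                missing.append(answer_id)
    return missing


if __name__ == "__main__":
    questions = "id,body,target_id\nq1,Example question,a1\n"
    answers = 'id,body\na1,"First answer text"\n'
    print(missing_answer_ids(questions, answers))
```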
    

Connection info

This section shows the IP address required to access your search engine (the port number is fixed to 9200).

Clicking the icon next to the IP address copies the address to your clipboard.

This IP address belongs to the Internal Load Balancer (Internal LB). As such, the search engine can only be accessed from within the same network and region.
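
A simple way to confirm that an application host can reach the search engine is to open a TCP connection to the IP address on port 9200. The sketch below uses only Python's standard library. The function names and the example IP address are hypothetical, and the check only succeeds when run from a machine inside the same network and region.

```python
import ipaddress
import socket
from typing import Tuple

SEARCH_ENGINE_PORT = 9200  # Fixed port number stated above.


def search_engine_address(ip: str) -> Tuple[str, int]:
    """Validate the IP address and pair it with the fixed port."""
    ipaddress.ip_address(ip)  # Raises ValueError if the address is malformed.
    return (ip, SEARCH_ENGINE_PORT)


def can_connect(ip: str, timeout_seconds: float = 3.0) -> bool:
    """Return True if a TCP connection to the search engine can be opened."""
    try:
        with socket.create_connection(
            search_engine_address(ip), timeout=timeout_seconds
        ):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical address; copy the real one from the Connection info section.
    print(search_engine_address("10.128.0.5"))
```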

Simple Search App

You can view information for the Simple Search App.

You can also switch between enabling or disabling the app.

  • If you have the app enabled, clicking Disable simple app will disable the app.

  • If you have the app disabled, clicking Enable simple app will enable the app.

    You will be required to configure a password if you enable the Simple Search App. You will need to configure a password when re-enabling the app even if you have configured one in the past.

If you forget the password for the Simple Search App, you can reconfigure it by disabling the app and then re-enabling it.

You can view the status of the Simple Search App in the “Status” section. The possible statuses are explained in the following chart:

Status Explanation
Enabling

The Simple Search App is in the process of being activated. This status will display when you first click to enable the app.

Enabled

The Simple Search App can be used.

Failed to enable

The Simple Search App failed to activate properly. Follow the steps below to try re-enabling the app:

  1. Click Disable simple app
  2. Wait until the status becomes Disabled
  3. Click Enable simple app

Disabling

The Simple Search App is in the process of being disabled. This status will display when you first click to disable the app.

Disabled

The Simple Search App cannot be used.

Failed to disable

The Simple Search App failed to disable properly. Follow the steps below to try disabling the app again.

  1. Click Disable simple app
  2. Wait until the status becomes Disabled

Setting info

You can view settings information from when you created the Text Search Engine in this section.

You can rename a Text Search Engine by clicking Change service name.

Resources used by this Text Search Engine

You can view resources in GCP that are used by your Text Search Engine in this section.

These resources are created in your GCP project.

Do not delete these resources. The Text Search Engine will not function if they are deleted.

Delete this Text Search Engine

Click Delete to delete your Text Search Engine and all of its related resources. This action cannot be undone.

You cannot delete a Text Search Engine while its Simple Search App is in the process of being enabled or disabled. Deleting a Text Search Engine takes some time, and you cannot use BLOCKS until the deletion finishes.