
NAME


groonga - Groonga documentation

CHARACTERISTICS OF GROONGA


Groonga overview
Groonga is a fast and accurate full text search engine based on inverted index. One of the
characteristics of Groonga is that a newly registered document instantly appears in search
results. Also, Groonga allows updates without read locks. These characteristics result in
superior performance on real-time applications.

Groonga is also a column-oriented database management system (DBMS). Compared with
well-known row-oriented systems, such as MySQL and PostgreSQL, column-oriented systems are
better suited to aggregate queries. Due to this advantage, Groonga can cover the
weaknesses of row-oriented systems.

The basic functions of Groonga are provided in a C library. Also, libraries for using
Groonga in other languages, such as Ruby, are provided by related projects. In addition,
groonga-based storage engines are provided for MySQL and PostgreSQL. These libraries and
storage engines allow any application to use Groonga. See usage examples.

Full text search and Instant update
In widely used DBMSs, updates are immediately processed, for example, a newly registered
record appears in the result of the next query. In contrast, some full text search engines
do not support instant updates, because it is difficult to dynamically update inverted
indexes, the underlying data structure.

Groonga also uses inverted indexes but supports instant updates. In addition, Groonga
allows you to search documents even when updating the document collection. Due to these
superior characteristics, Groonga is very flexible as a full text search engine. Also,
Groonga always shows good performance because it divides a large task, inverted index
merging, into smaller tasks.
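The idea can be illustrated with a toy sketch in Python (an illustration only, not
Groonga's actual implementation): the posting lists are updated at registration time, so a
newly added document is searchable immediately.

```python
from collections import defaultdict

class ToyInvertedIndex:
    """Toy inverted index: term -> set of document IDs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Update the posting lists as soon as the document is registered,
        # so it instantly appears in search results.
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        # AND search: documents that contain every query term.
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

index = ToyInvertedIndex()
index.add(1, "Groonga is a fast full text search engine")
index.add(2, "Groonga supports instant updates")
print(sorted(index.search("groonga instant")))  # [2]
```

A real engine updates inverted indexes incrementally on disk; Groonga additionally divides
the expensive index-merging work into smaller tasks, which this sketch does not model.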

Column store and aggregate query
People can collect more than enough data in the Internet era. However, it is difficult to
extract informative knowledge from a large database, and such a task requires a many-sided
analysis through trial and error. For example, search refinement by date, time and
location may reveal hidden patterns. Aggregate queries are useful for performing this
kind of task.

An aggregate query groups search results by specified column values and then counts the
number of records in each group. For example, an aggregate query in which a location
column is specified counts the number of records per location. Making a graph from the
result of an aggregate query against a date column is an easy way to visualize changes
over time. Also, a combination of refinement by location and an aggregate query against a
date column allows visualization of changes over time in a specific location. Thus,
refinement and aggregation are important for data mining.
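In Groonga itself this kind of grouping is exposed through the select command's drilldown
feature; the idea can be sketched in plain Python with hypothetical data:

```python
from collections import Counter

# Hypothetical search result with location and date columns.
records = [
    {"location": "Tokyo", "date": "2016-03-01"},
    {"location": "Osaka", "date": "2016-03-01"},
    {"location": "Tokyo", "date": "2016-03-02"},
    {"location": "Tokyo", "date": "2016-03-02"},
]

# Aggregate query against the location column: records per location.
per_location = Counter(r["location"] for r in records)  # Tokyo: 3, Osaka: 1

# Refinement by location combined with aggregation against the date
# column: changes over time in a specific location.
tokyo_by_date = Counter(r["date"] for r in records if r["location"] == "Tokyo")
print(per_location, tokyo_by_date)
```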

A column-oriented architecture allows Groonga to efficiently process aggregate queries
because a column-oriented database, which stores records by column, allows an aggregate
query to access only a specified column. On the other hand, an aggregate query on a
row-oriented database, which stores records by row, has to access neighbor columns, even
though those columns are not required.
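The difference can be sketched with two toy layouts of the same table (illustrative
Python only, not how either kind of DBMS stores data on disk):

```python
from collections import Counter

# Row-oriented layout: an aggregate query on "location" must walk whole
# records, touching the large "body" column it does not need.
rows = [
    {"id": 1, "location": "Tokyo", "body": "a long document body"},
    {"id": 2, "location": "Osaka", "body": "another long document body"},
]
row_counts = Counter(record["location"] for record in rows)

# Column-oriented layout: each column is stored contiguously, so the same
# aggregate query reads only the "location" column.
columns = {
    "id": [1, 2],
    "location": ["Tokyo", "Osaka"],
    "body": ["a long document body", "another long document body"],
}
column_counts = Counter(columns["location"])

print(row_counts == column_counts)  # same answer, very different I/O cost
```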

Inverted index and tokenizer
An inverted index is a traditional data structure used for large-scale full text search. A
search engine based on inverted index extracts index terms from a document when it is
added. Then in retrieval, a query is divided into index terms to find documents containing
those index terms. In this way, index terms play an important role in full text search and
thus the way of extracting index terms is a key to a better search engine.

A tokenizer is a module to extract index terms. A Japanese full text search engine
commonly uses a word-based tokenizer (hereafter referred to as a word tokenizer) and/or a
character-based n-gram tokenizer (hereafter referred to as an n-gram tokenizer). A word
tokenizer-based search engine is superior in time, space and precision, which is the
fraction of relevant documents among the retrieved results. On the other hand, an n-gram
tokenizer-based search engine is superior in recall, which is the fraction of relevant
documents that are actually retrieved. In practice, the best choice depends on the
application.

Groonga supports both word and n-gram tokenizers. The simplest built-in tokenizer uses
spaces as word delimiters. Built-in n-gram tokenizers (n = 1, 2, 3) are also available by
default. In addition, yet another built-in word tokenizer is available if MeCab, a
part-of-speech and morphological analyzer, is embedded. Note that a tokenizer is pluggable
and you can develop your own tokenizer, such as a tokenizer based on another
part-of-speech tagger or a named-entity recognizer.
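For example, the two approaches can be sketched as follows (toy Python versions; Groonga's
built-in tokenizers are more elaborate):

```python
def word_tokenize(text):
    """Simplest word tokenizer: split on spaces (like the simplest built-in one)."""
    return text.split()

def ngram_tokenize(text, n=2):
    """Character n-gram tokenizer: overlapping n-character index terms."""
    text = text.replace(" ", "")  # toy handling; real tokenizers do better
    if not text:
        return []
    return [text[i:i + n] for i in range(max(len(text) - n + 1, 1))]

print(word_tokenize("full text search"))  # ['full', 'text', 'search']
print(ngram_tokenize("search"))           # ['se', 'ea', 'ar', 'rc', 'ch']
```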

Sharable storage and read lock-free
Multi-core processors are mainstream today and the number of cores per processor is
increasing. In order to exploit multiple cores, executing multiple queries in parallel or
dividing a query into sub-queries for parallel processing is becoming more important.

A database of Groonga can be shared with multiple threads/processes. Also, multiple
threads/processes can execute read queries in parallel even when another thread/process is
executing an update query because Groonga uses read lock-free data structures. This
feature is suited to a real-time application that needs to update a database while
executing read queries. In addition, Groonga allows you to build flexible systems. For
example, a database can receive read queries through the built-in HTTP server of Groonga
while accepting update queries through MySQL.
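One common way to achieve read lock-free behavior, sketched here as an illustration only
(not Groonga's actual data structures), is to let readers work on an immutable snapshot
that a writer replaces atomically:

```python
import threading

class CowTable:
    """Toy copy-on-write table: readers never take a lock."""

    def __init__(self):
        self._snapshot = ()                  # immutable tuple of records
        self._write_lock = threading.Lock()  # only writers serialize here

    def read_all(self):
        # Lock-free read: a single reference fetch yields a consistent view.
        return self._snapshot

    def append(self, record):
        with self._write_lock:
            # Build a new snapshot and publish it with one assignment.
            self._snapshot = self._snapshot + (record,)

table = CowTable()
table.append("doc1")
snapshot = table.read_all()  # a reader holding this snapshot...
table.append("doc2")         # ...is unaffected by the concurrent update
print(snapshot, table.read_all())
```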

Geo-location (latitude and longitude) search
Location services are getting more convenient because of mobile devices with GPS. For
example, if you are going to have lunch or dinner at a nearby restaurant, a local search
service for restaurants may be very useful, and for such services, fast geo-location
search is becoming more important.

Groonga provides inverted index-based fast geo-location search, which supports a query to
find points in a rectangle or circle. Groonga gives high priority to points near the
center of an area. Also, Groonga supports distance measurement and you can sort points by
distance from any point.
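For illustration, sorting points by distance from a center can be sketched with the
haversine formula (plain Python with made-up coordinates; not Groonga's API):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical restaurants; the center is near Tokyo Station.
center = (35.6812, 139.7671)
restaurants = {
    "A": (35.6895, 139.6917),  # several kilometers west
    "B": (35.6812, 139.7671),  # at the center
    "C": (35.6586, 139.7454),  # a few kilometers south
}
by_distance = sorted(restaurants,
                     key=lambda name: haversine_km(*center, *restaurants[name]))
print(by_distance)  # nearest first: ['B', 'C', 'A']
```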

Groonga library
The basic functions of Groonga are provided in a C library and any application can use
Groonga as a full text search engine or a column-oriented database. Also, libraries for
languages other than C/C++, such as Ruby, are provided in related projects. See related
projects for details.

Groonga server
Groonga provides a built-in server command which supports HTTP, the memcached binary
protocol and the Groonga Query Transfer Protocol (/spec/gqtp). Also, a Groonga server
supports query caching, which significantly reduces response time for repeated read
queries. Using this command, Groonga is available even on a server that does not allow you
to install new libraries.

Mroonga storage engine
Groonga works not only as an independent column-oriented DBMS but also as a storage engine
for well-known DBMSs. For example, Mroonga is a MySQL pluggable storage engine using
Groonga. By using Mroonga, you can use Groonga for column-oriented storage and full text
search. A combination of a built-in storage engine, MyISAM or InnoDB, and a Groonga-based
full text search engine is also available. All the combinations have good and bad points
and the best one depends on the application. See related projects for details.

INSTALL


This section describes how to install Groonga on each environment. There are packages for
major platforms. It's recommended that you use a package instead of building Groonga by
yourself. But don't worry: there is a document about building Groonga from source.

We distribute both 32-bit and 64-bit packages, but we strongly recommend the 64-bit
package for servers. Use a 32-bit package only for tests or development; you may encounter
out-of-memory errors with a 32-bit package even when processing medium-sized data.

Windows
This section describes how to install Groonga on Windows. You can install Groonga by
extracting a zip package or running an installer.

We distribute both 32-bit and 64-bit packages, but we strongly recommend the 64-bit
package for servers. Use a 32-bit package only for tests or development; you may encounter
out-of-memory errors with a 32-bit package even when processing medium-sized data.

Installer
For a 32-bit environment, download the x86 executable binary from packages.groonga.org:

· http://packages.groonga.org/windows/groonga/groonga-6.0.1-x86.exe

Then run it.

For a 64-bit environment, download the x64 executable binary from packages.groonga.org:

· http://packages.groonga.org/windows/groonga/groonga-6.0.1-x64.exe

Then run it.

Use the command prompt in the start menu to run /reference/executables/groonga.

zip
For a 32-bit environment, download the x86 zip archive from packages.groonga.org:

· http://packages.groonga.org/windows/groonga/groonga-6.0.1-x86.zip

Then extract it.

For a 64-bit environment, download the x64 zip archive from packages.groonga.org:

· http://packages.groonga.org/windows/groonga/groonga-6.0.1-x64.zip

Then extract it.

You can find /reference/executables/groonga in the bin folder.

Build from source
First, you need to install the tools required for building Groonga on Windows:

· Microsoft Visual Studio Express 2013 for Windows Desktop

· CMake

Download zipped source from packages.groonga.org:

· http://packages.groonga.org/source/groonga/groonga-6.0.1.zip

Then extract it.

Move to Groonga's source folder:

> cd c:\Users\%USERNAME%\Downloads\groonga-6.0.1

Configure with cmake. The following command line is for the 64-bit version. To build the
32-bit version, use the -G "Visual Studio 12 2013" parameter instead:

groonga-6.0.1> cmake . -G "Visual Studio 12 2013 Win64" -DCMAKE_INSTALL_PREFIX=C:\Groonga

Build:

groonga-6.0.1> cmake --build . --config Release

Install:

groonga-6.0.1> cmake --build . --config Release --target Install

After the above steps, /reference/executables/groonga is found at
c:\Groonga\bin\groonga.exe.

Mac OS X
This section describes how to install Groonga on Mac OS X. You can install Groonga with
MacPorts or Homebrew.

MacPorts
Install:

% sudo port install groonga

Homebrew
Install:

% brew install groonga

If you want to use MeCab as a tokenizer, specify --with-mecab option:

% brew install groonga --with-mecab

Then install and configure MeCab dictionary.

Install:

% brew install mecab-ipadic

Configure:

% sed -i '' -e 's,dicrc.*=.*,dicrc = /usr/local/lib/mecab/dic/ipadic,g' /usr/local/etc/mecabrc

Build from source
Install Xcode.

Download source:

% curl -O http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz
% tar xvzf groonga-6.0.1.tar.gz
% cd groonga-6.0.1

Configure (see source-configure about configure options):

% ./configure

Build:

% make -j$(/usr/sbin/sysctl -n hw.ncpu)

Install:

% sudo make install

Debian GNU/Linux
This section describes how to install Groonga related deb packages on Debian GNU/Linux.
You can install them by apt.

We distribute both 32-bit and 64-bit packages, but we strongly recommend the 64-bit
package for servers. Use a 32-bit package only for tests or development; you may encounter
out-of-memory errors with a 32-bit package even when processing medium-sized data.

wheezy
Add the Groonga apt repository.

/etc/apt/sources.list.d/groonga.list:

deb http://packages.groonga.org/debian/ wheezy main
deb-src http://packages.groonga.org/debian/ wheezy main

Install:

% sudo apt-get update
% sudo apt-get install -y --allow-unauthenticated groonga-keyring
% sudo apt-get update
% sudo apt-get install -y -V groonga

NOTE:
The groonga package provides the minimal full-text search engine. If you want to run
Groonga as a server, you can install additional preconfigured packages.

There are two packages for server use:

· groonga-httpd (nginx and HTTP protocol based server package)

· groonga-server-gqtp (GQTP protocol based server package)

See the /server section for details.

If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo apt-get install -y -V groonga-tokenizer-mecab

If you want to use TokenFilterStem as a token filter, install groonga-token-filter-stem
package.

Install groonga-token-filter-stem package:

% sudo apt-get install -y -V groonga-token-filter-stem

There is a package that provides Munin plugins. If you want to monitor Groonga status by
Munin, install groonga-munin-plugins package.

Install groonga-munin-plugins package:

% sudo apt-get install -y -V groonga-munin-plugins

There is a package that provides MySQL compatible normalizer as a Groonga plugin. If you
want to use that one, install groonga-normalizer-mysql package.

Install groonga-normalizer-mysql package:

% sudo apt-get install -y -V groonga-normalizer-mysql

jessie
New in version 5.0.3.

Add the Groonga apt repository.

/etc/apt/sources.list.d/groonga.list:

deb http://packages.groonga.org/debian/ jessie main
deb-src http://packages.groonga.org/debian/ jessie main

Install:

% sudo apt-get update
% sudo apt-get install -y --allow-unauthenticated groonga-keyring
% sudo apt-get update
% sudo apt-get install -y -V groonga

NOTE:
The groonga package provides the minimal full-text search engine. If you want to run
Groonga as a server, you can install additional preconfigured packages.

There are two packages for server use:

· groonga-httpd (nginx and HTTP protocol based server package)

· groonga-server-gqtp (GQTP protocol based server package)

See the /server section for details.

If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo apt-get install -y -V groonga-tokenizer-mecab

If you want to use TokenFilterStem as a token filter, install groonga-token-filter-stem
package.

Install groonga-token-filter-stem package:

% sudo apt-get install -y -V groonga-token-filter-stem

There is a package that provides Munin plugins. If you want to monitor Groonga status by
Munin, install groonga-munin-plugins package.

Install groonga-munin-plugins package:

% sudo apt-get install -y -V groonga-munin-plugins

There is a package that provides MySQL compatible normalizer as a Groonga plugin. If you
want to use that one, install groonga-normalizer-mysql package.

Install groonga-normalizer-mysql package:

% sudo apt-get install -y -V groonga-normalizer-mysql

Build from source
Install required packages to build Groonga:

% sudo apt-get install -y -V wget tar build-essential zlib1g-dev liblzo2-dev libmsgpack-dev libzmq-dev libevent-dev libmecab-dev

Download source:

% wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz
% tar xvzf groonga-6.0.1.tar.gz
% cd groonga-6.0.1

Configure (see source-configure about configure options):

% ./configure

Build:

% make -j$(grep '^processor' /proc/cpuinfo | wc -l)

Install:

% sudo make install

Ubuntu
This section describes how to install Groonga related deb packages on Ubuntu. You can
install them by apt.

We distribute both 32-bit and 64-bit packages, but we strongly recommend the 64-bit
package for servers. Use a 32-bit package only for tests or development; you may encounter
out-of-memory errors with a 32-bit package even when processing medium-sized data.

PPA (Personal Package Archive)
The Groonga APT repository for Ubuntu uses PPA (Personal Package Archive) on Launchpad.
You can install Groonga by APT from the PPA.

Here are supported Ubuntu versions:

· 12.04 LTS Precise Pangolin

· 14.04 LTS Trusty Tahr

· 15.04 Vivid Vervet

· 15.10 Wily Werewolf

Enable the universe repository to install Groonga:

% sudo apt-get -y install software-properties-common
% sudo add-apt-repository -y universe

Add the ppa:groonga/ppa PPA to your system:

% sudo add-apt-repository -y ppa:groonga/ppa
% sudo apt-get update

Install:

% sudo apt-get -y install groonga

NOTE:
The groonga package provides the minimal full-text search engine. If you want to run
Groonga as a server, you can install additional preconfigured packages.

There are two packages for server use:

· groonga-httpd (nginx and HTTP protocol based server package)

· groonga-server-gqtp (GQTP protocol based server package)

See the /server section for details.

If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo apt-get -y install groonga-tokenizer-mecab

If you want to use TokenFilterStem as a token filter, install groonga-token-filter-stem
package.

Install groonga-token-filter-stem package:

% sudo apt-get -y install groonga-token-filter-stem

There is a package that provides Munin plugins. If you want to monitor Groonga status by
Munin, install groonga-munin-plugins package.

Install groonga-munin-plugins package:

% sudo apt-get -y install groonga-munin-plugins

There is a package that provides MySQL compatible normalizer as a Groonga plugin. If you
want to use that one, install groonga-normalizer-mysql package.

Install groonga-normalizer-mysql package:

% sudo apt-get -y install groonga-normalizer-mysql

Build from source
Install required packages to build Groonga:

% sudo apt-get -V -y install wget tar build-essential zlib1g-dev liblzo2-dev libmsgpack-dev libzmq-dev libevent-dev libmecab-dev

Download source:

% wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz
% tar xvzf groonga-6.0.1.tar.gz
% cd groonga-6.0.1

Configure (see source-configure about configure options):

% ./configure

Build:

% make -j$(grep '^processor' /proc/cpuinfo | wc -l)

Install:

% sudo make install

CentOS
This section describes how to install Groonga related RPM packages on CentOS. You can
install them by yum.

We distribute both 32-bit and 64-bit packages, but we strongly recommend the 64-bit
package for servers. Use a 32-bit package only for tests or development; you may encounter
out-of-memory errors with a 32-bit package even when processing medium-sized data.

CentOS 5
Install:

% sudo rpm -ivh http://packages.groonga.org/centos/groonga-release-1.1.0-1.noarch.rpm
% sudo yum makecache
% sudo yum install -y groonga

NOTE:
The groonga package provides the minimal full-text search engine. If you want to run
Groonga as a server, you can install additional preconfigured packages.

There are two packages for server use:

· groonga-httpd (nginx and HTTP protocol based server package)

· groonga-server-gqtp (GQTP protocol based server package)

See the /server section for details.

If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo yum install -y groonga-tokenizer-mecab

There is a package that provides Munin plugins. If you want to monitor Groonga status by
Munin, install groonga-munin-plugins package.

NOTE:
The groonga-munin-plugins package requires the munin-node package, which isn't
included in the official CentOS repository. You need to enable the Repoforge
(RPMforge) repository or the EPEL repository to install it with yum.

Enable Repoforge (RPMforge) repository on i386 environment:

% wget http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.3-1.el5.rf.i386.rpm
% sudo rpm -ivh rpmforge-release-0.5.3-1.el5.rf.i386.rpm

Enable Repoforge (RPMforge) repository on x86_64 environment:

% wget http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.3-1.el5.rf.x86_64.rpm
% sudo rpm -ivh rpmforge-release-0.5.3-1.el5.rf.x86_64.rpm

Enable EPEL repository on any environment:

% wget http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
% sudo rpm -ivh epel-release-5-4.noarch.rpm

Install groonga-munin-plugins package:

% sudo yum install -y groonga-munin-plugins

There is a package that provides MySQL compatible normalizer as a Groonga plugin. If you
want to use that one, install groonga-normalizer-mysql package.

Install groonga-normalizer-mysql package:

% sudo yum install -y groonga-normalizer-mysql

CentOS 6
Install:

% sudo rpm -ivh http://packages.groonga.org/centos/groonga-release-1.1.0-1.noarch.rpm
% sudo yum makecache
% sudo yum install -y groonga

NOTE:
The groonga package provides the minimal full-text search engine. If you want to run
Groonga as a server, you can install additional preconfigured packages.

There are two packages for server use:

· groonga-httpd (nginx and HTTP protocol based server package)

· groonga-server-gqtp (GQTP protocol based server package)

See the /server section for details.

If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo yum install -y groonga-tokenizer-mecab

There is a package that provides Munin plugins. If you want to monitor Groonga status by
Munin, install groonga-munin-plugins package.

NOTE:
The groonga-munin-plugins package requires the munin-node package, which isn't
included in the official CentOS repository. You need to enable the EPEL repository to
install it with yum.

Enable EPEL repository on any environment:

% sudo rpm -ivh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm

Install groonga-munin-plugins package:

% sudo yum install -y groonga-munin-plugins

There is a package that provides MySQL compatible normalizer as a Groonga plugin. If you
want to use that one, install groonga-normalizer-mysql package.

Install groonga-normalizer-mysql package:

% sudo yum install -y groonga-normalizer-mysql

CentOS 7
Install:

% sudo yum install -y http://packages.groonga.org/centos/groonga-release-1.1.0-1.noarch.rpm
% sudo yum install -y groonga

NOTE:
The groonga package provides the minimal full-text search engine. If you want to run
Groonga as a server, you can install additional preconfigured packages.

There are two packages for server use:

· groonga-httpd (nginx and HTTP protocol based server package)

· groonga-server-gqtp (GQTP protocol based server package)

See the /server section for details.

If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo yum install -y groonga-tokenizer-mecab

There is a package that provides Munin plugins. If you want to monitor Groonga status by
Munin, install groonga-munin-plugins package.

NOTE:
The groonga-munin-plugins package requires the munin-node package, which isn't
included in the official CentOS repository. You need to enable the EPEL repository to
install it with yum.

Enable EPEL repository:

% sudo yum install -y epel-release

Install groonga-munin-plugins package:

% sudo yum install -y groonga-munin-plugins

There is a package that provides MySQL compatible normalizer as a Groonga plugin. If you
want to use that one, install groonga-normalizer-mysql package.

Install groonga-normalizer-mysql package:

% sudo yum install -y groonga-normalizer-mysql

Build from source
Install required packages to build Groonga:

% sudo yum install -y wget tar gcc-c++ make mecab-devel

Download source:

% wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz
% tar xvzf groonga-6.0.1.tar.gz
% cd groonga-6.0.1

Configure (see source-configure about configure options):

% ./configure

Build:

% make -j$(grep '^processor' /proc/cpuinfo | wc -l)

Install:

% sudo make install

Fedora
This section describes how to install Groonga related RPM packages on Fedora. You can
install them by yum.

NOTE:
Since the Groonga 3.0.2 release, Groonga related RPM packages are in the official
Fedora yum repository (Fedora 18 or later), so you can use them instead of the Groonga
yum repository. One exception is the MeCab dictionaries (mecab-ipadic and
mecab-jumandic), which are provided by the Groonga yum repository.

We distribute both 32-bit and 64-bit packages, but we strongly recommend the 64-bit
package for servers. Use a 32-bit package only for tests or development; you may encounter
out-of-memory errors with a 32-bit package even when processing medium-sized data.

Fedora 21
Install:

% sudo yum install -y groonga

Note that additional packages such as the mecab-ipadic and mecab-jumandic packages require
the groonga-release package, which provides the Groonga yum repository, to be installed
beforehand:

% sudo rpm -ivh http://packages.groonga.org/fedora/groonga-release-1.1.0-1.noarch.rpm
% sudo yum update

NOTE:
The groonga package provides the minimal full-text search engine. If you want to run
Groonga as a server, you can install additional preconfigured packages.

There are two packages for server use:

· groonga-httpd (nginx and HTTP protocol based server package)

· groonga-server-gqtp (GQTP protocol based server package)

See the /server section for details.

If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo yum install -y groonga-tokenizer-mecab

Then install a MeCab dictionary (mecab-ipadic or mecab-jumandic).

Install IPA dictionary:

% sudo yum install -y mecab-ipadic

Or install Juman dictionary:

% sudo yum install -y mecab-jumandic

There is a package that provides Munin plugins. If you want to monitor Groonga status by
Munin, install groonga-munin-plugins package.

Install groonga-munin-plugins package:

% sudo yum install -y groonga-munin-plugins

There is a package that provides MySQL compatible normalizer as a Groonga plugin. If you
want to use that one, install groonga-normalizer-mysql package.

Install groonga-normalizer-mysql package:

% sudo yum install -y groonga-normalizer-mysql

Build from source
Install required packages to build Groonga:

% sudo yum install -y wget tar gcc-c++ make mecab-devel libedit-devel

Download source:

% wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz
% tar xvzf groonga-6.0.1.tar.gz
% cd groonga-6.0.1

Configure (see source-configure about configure options):

% ./configure

Build:

% make -j$(grep '^processor' /proc/cpuinfo | wc -l)

Install:

% sudo make install

Oracle Solaris
This section describes how to install Groonga from source on Oracle Solaris.

Oracle Solaris 11
Install required packages to build Groonga:

% sudo pkg install gnu-tar gcc-45 system/header

Download source:

% wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz
% gtar xvzf groonga-6.0.1.tar.gz
% cd groonga-6.0.1

Configure with the CFLAGS="-m64" CXXFLAGS="-m64" variables. They are needed for building
the 64-bit version; to build a 32-bit version, just remove those variables (see
source-configure about configure options):

% ./configure CFLAGS="-m64" CXXFLAGS="-m64"

Build:

% make

Install:

% sudo make install

Others
This section describes how to install Groonga from source in a UNIX-like environment.

To get more detail about installing Groonga from source in a specific environment, find
the document for that environment in /install.

Dependencies
Groonga doesn't require any special libraries, but it requires some tools for building.

Tools
Here are required tools:

· wget, curl or Web browser for downloading source archive

· tar and gzip for extracting source archive

· shell (many shells such as dash, bash and zsh will work)

· C compiler and C++ compiler (gcc and g++ are supported but other compilers may work)

· make (GNU make is supported but other make implementations such as BSD make will work)

You must get them ready.

You can use CMake instead of shell, but this document doesn't describe building with
CMake.

Here are optional tools:

· pkg-config for detecting libraries

· sudo for installing built Groonga

You should get them ready if you want to use them.

Libraries
All libraries are optional. Here are optional libraries:

· MeCab for tokenizing full-text search target document by morphological analysis

· KyTea for tokenizing full-text search target document by morphological analysis

· ZeroMQ for /reference/suggest

· libevent for /reference/suggest

· MessagePack for supporting MessagePack output and /reference/suggest

· libedit for command line editing in /reference/executables/groonga

· zlib for compressing column value

· LZ4 for compressing column value

If you want to use all or some of those libraries, you need to install them before
installing Groonga.

Build from source
Groonga uses the GNU build system, so the following are the simplest build steps:

% wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz
% tar xvzf groonga-6.0.1.tar.gz
% cd groonga-6.0.1
% ./configure
% make
% sudo make install

After the above steps, /reference/executables/groonga is found at /usr/local/bin/groonga.

The default build will work well, but you can customize Groonga at the configure step.

The following describes details about each step.

configure
First, you need to run configure. Here are important configure options:

--prefix=PATH
Specifies the install base directory. Groonga related files are installed under ${PATH}/
directory.

The default is /usr/local. In this case, /reference/executables/groonga is installed into
/usr/local/bin/groonga.

Here is an example that installs Groonga into ~/local for single-user use instead of
system-wide use:

% ./configure --prefix=$HOME/local

--localstatedir=PATH
Specifies the base directory for modifiable files such as log files, PID files and
database files. For example, the log file is placed at ${PATH}/log/groonga.log.

The default is /usr/local/var.

Here is an example where the system-wide /var is used for modifiable files:

% ./configure --localstatedir=/var

--with-log-path=PATH
Specifies the default log file path. You can override the default log path with the
/reference/executables/groonga command's --log-path command line option, so this option is
not a critical build option; it's just for convenience.

The default is /usr/local/var/log/groonga.log. The /usr/local/var part is changed by
--localstatedir option.

Here is an example that log file is placed into shared NFS directory /nfs/log/groonga.log:

% ./configure --with-log-path=/nfs/log/groonga.log

--with-default-encoding=ENCODING
Specifies the default encoding. Available encodings are euc_jp, sjis, utf8, latin1, koi8r
and none.

The default is utf8.

Here is an example that Shift_JIS is used as the default encoding:

% ./configure --with-default-encoding=sjis

--with-match-escalation-threshold=NUMBER
Specifies the default match escalation threshold. See select-match-escalation-threshold
about the match escalation threshold. -1 means that the match operation never escalates.

The default is 0.

Here is an example where match escalation isn't used by default:

% ./configure --with-match-escalation-threshold=-1

--with-zlib
Enables column value compression by zlib.

The default is disabled.

Here is an example that enables column value compression by zlib:

% ./configure --with-zlib

--with-lz4
Enables column value compression by LZ4.

The default is disabled.

Here is an example that enables column value compression by LZ4:

% ./configure --with-lz4

--with-message-pack=MESSAGE_PACK_INSTALL_PREFIX
Specifies where MessagePack is installed. If MessagePack isn't installed with
--prefix=/usr, you need to specify this option with the path that you used for building
MessagePack.

If you installed MessagePack with --prefix=$HOME/local option, you should specify
--with-message-pack=$HOME/local to Groonga's configure.

The default is /usr.

Here is an example that uses MessagePack built with --prefix=$HOME/local option:

% ./configure --with-message-pack=$HOME/local

--with-munin-plugins
Installs Munin plugins for Groonga. They are installed into
${PREFIX}/share/groonga/munin/plugins/.

Those plugins are not installed by default.

Here is an example that installs Munin plugins for Groonga:

% ./configure --with-munin-plugins

--with-package-platform=PLATFORM
Installs platform specific system management files such as init script. Available
platforms are redhat and fedora. redhat is for Red Hat and Red Hat clone distributions
such as CentOS. fedora is for Fedora.

Those system management files are not installed by default.

Here is an example that installs CentOS specific system management files:

% ./configure --with-package-platform=redhat

--help
Shows all configure options.

make
After configure succeeds, you can build Groonga with make:

% make

If you have a multi-core CPU, you can build faster by using the -j option. For example, if
you have a 4-core CPU, the -j4 option is a good choice:

% make -j4

If you get any errors from make, please report them to us: /contribution/report

make install
Now, you can install the built Groonga:

% sudo make install

If you have write permission for ${PREFIX} (e.g. when you used --prefix=$HOME/local), you
don't need sudo. In this case, just use make install:

% make install

COMMUNITY


There are some places for sharing Groonga information. We welcome you to join our
community.

Mailing List
There are mailing lists for discussion about Groonga.

For English speakers
[email protected]

For Japanese speakers
[email protected]

Chat room
There are chat rooms for discussion about Groonga.

For English speakers
groonga/en chat room on Gitter

For Japanese speakers
groonga/ja chat room on Gitter

Twitter
@groonga tweets Groonga related information.

Please follow the account to get the latest Groonga related information!

Facebook
Groonga page on Facebook shares Groonga related information.

Please like the page to get the latest Groonga related information!

TUTORIAL


Basic operations
A Groonga package provides a C library (libgroonga) and a command line tool (groonga).
This tutorial explains how to use the command line tool, with which you can create/operate
databases, start a server, establish a connection with a server, etc.

Create a database
The first step to using Groonga is to create a new database. The following shows how to do
it.

Form:

groonga -n DB_PATH

The -n option specifies to create a new database and DB_PATH specifies the path of the new
database. Actually, a database consists of a series of files and DB_PATH specifies the
file which will be the entrance to the new database. DB_PATH also specifies the path
prefix for other files. Note that database creation fails if DB_PATH points to an existing
file (for example: db open failed (DB_PATH): syscall error 'DB_PATH' (File exists)). How
to operate an existing database is described in the next section.

This command creates a new database and then enters into interactive mode in which Groonga
prompts you to enter commands for operating that database. You can terminate this mode
with Ctrl-d.

Execution example:

% groonga -n /tmp/groonga-databases/introduction.db

After this database creation, you can find a series of files in /tmp/groonga-databases.

Operate a database
The following shows how to operate an existing database.

Form:

groonga DB_PATH [COMMAND]

DB_PATH specifies the path of a target database. This command fails if the specified
database does not exist.

If COMMAND is specified, Groonga executes COMMAND and returns the result. Otherwise,
Groonga starts in interactive mode that reads commands from the standard input and
executes them one by one. This tutorial focuses on the interactive mode.

Let's see the status of a Groonga process by using a /reference/commands/status command.

Execution example:

% groonga /tmp/groonga-databases/introduction.db
status
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "uptime": 0,
# "max_command_version": 2,
# "n_queries": 0,
# "cache_hit_rate": 0.0,
# "version": "5.0.6-128-g8029ddb",
# "alloc_count": 206,
# "command_version": 1,
# "starttime": 1439995916,
# "default_command_version": 1
# }
# ]

As shown in the above example, a command returns a JSON array. The first element contains
an error code, execution time, etc. The second element is the result of an operation.

NOTE:
You can format a JSON using additional tools. For example, grnwrap, Grnline, jq and so
on.
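As the note says, the raw response is plain JSON, so any JSON tool or library can reformat
it. Here is a minimal Python sketch that parses a response and pretty-prints the body (the
response string is a shortened, hypothetical sample):

```python
import json

# A shortened Groonga response: [[return_code, start_time, elapsed], body]
raw = '[[0,1337566253.89858,0.000355720520019531],{"uptime":0,"n_queries":0}]'

header, body = json.loads(raw)
return_code, start_time, elapsed = header
assert return_code == 0  # 0 means success

# Pretty-print the body, similar to what jq does
print(json.dumps(body, indent=2, sort_keys=True))
```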

Command format
Commands for operating a database accept arguments as follows:

Form_1: COMMAND VALUE_1 VALUE_2 ..

Form_2: COMMAND --NAME_1 VALUE_1 --NAME_2 VALUE_2 ..

In the first form, arguments must be passed in order. These arguments are called
positional arguments because the position of each argument determines its meaning.

In the second form, you can specify a parameter name with each value, so the order of
arguments is not significant. These arguments are known as named parameters or keyword
arguments.

If you want to specify a value which contains white-spaces or special characters, such as
quotes and parentheses, please enclose the value with single-quotes or double-quotes.

For details, see also the paragraph of "command" in /reference/executables/groonga.

Basic commands
/reference/commands/status
shows status of a Groonga process.

/reference/commands/table_list
shows a list of tables in a database.

/reference/commands/column_list
shows a list of columns in a table.

/reference/commands/table_create
adds a table to a database.

/reference/commands/column_create
adds a column to a table.

/reference/commands/select
searches records from a table and shows the result.

/reference/commands/load
inserts records to a table.

Create a table
A /reference/commands/table_create command creates a new table.

In most cases, a table has a primary key which must be specified with its data type and
index type.

There are various data types such as integers, strings, etc. See also /reference/types for
more details. The index type determines the search performance and the availability of
prefix searches. The details will be described later.

Let's create a table. The following example creates a table with a primary key. The name
parameter specifies the name of the table. The flags parameter specifies the index type
for the primary key. The key_type parameter specifies the data type of the primary key.

Execution example:

table_create --name Site --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

The second element of the result indicates that the operation succeeded.

View a table
A /reference/commands/select command can enumerate records in a table.

Execution example:

select --table Site
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 0
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ]
# ]
# ]
# ]

When only a table name is specified with a table parameter, a /reference/commands/select
command returns the first (at most) 10 records in the table. [0] in the result shows the
number of records in the table. The next array is a list of columns. ["_id","UInt32"] is a
column of UInt32, named _id. ["_key","ShortText"] is a column of ShortText, named _key.

The above two columns, _id and _key, are mandatory columns. The _id column stores IDs
that are automatically allocated by Groonga. The _key column is associated with the
primary key. You are not allowed to rename these columns.

Create a column
A /reference/commands/column_create command creates a new column.

Let's add a column. The following example adds a column to the Site table. The table
parameter specifies the target table. The name parameter specifies the name of the column.
The type parameter specifies the data type of the column.

Execution example:

column_create --table Site --name title --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
select --table Site
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 0
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ]
# ]
# ]

Load records
A /reference/commands/load command loads JSON-formatted records into a table.

The following example loads nine records into the Site table.

Execution example:

load --table Site
[
{"_key":"http://example.org/","title":"This is test record 1!"},
{"_key":"http://example.net/","title":"test record 2."},
{"_key":"http://example.com/","title":"test test record three."},
{"_key":"http://example.net/afr","title":"test record four."},
{"_key":"http://example.org/aba","title":"test test test record five."},
{"_key":"http://example.com/rab","title":"test test test test record six."},
{"_key":"http://example.net/atv","title":"test test test record seven."},
{"_key":"http://example.org/gat","title":"test test record eight."},
{"_key":"http://example.com/vdw","title":"test test record nine."},
]
# [[0, 1337566253.89858, 0.000355720520019531], 9]

The second element of the result indicates how many records were successfully loaded. In
this case, all the records are successfully loaded.
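If an application generates the records, it is convenient to build the list
programmatically and serialize it with a JSON library rather than writing JSON by hand. A
sketch (the records mirror two of those above; strict JSON output has no trailing comma,
unlike the hand-written example):

```python
import json

records = [
    {"_key": "http://example.org/", "title": "This is test record 1!"},
    {"_key": "http://example.net/", "title": "test record 2."},
]

# The command line followed by the serialized array is what groonga reads
command = "load --table Site\n" + json.dumps(records)
print(command)
```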

Let's make sure that these records are correctly stored.

Execution example:

select --table Site
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/",
# "This is test record 1!"
# ],
# [
# 2,
# "http://example.net/",
# "test record 2."
# ],
# [
# 3,
# "http://example.com/",
# "test test record three."
# ],
# [
# 4,
# "http://example.net/afr",
# "test record four."
# ],
# [
# 5,
# "http://example.org/aba",
# "test test test record five."
# ],
# [
# 6,
# "http://example.com/rab",
# "test test test test record six."
# ],
# [
# 7,
# "http://example.net/atv",
# "test test test record seven."
# ],
# [
# 8,
# "http://example.org/gat",
# "test test record eight."
# ],
# [
# 9,
# "http://example.com/vdw",
# "test test record nine."
# ]
# ]
# ]
# ]

Get a record
A /reference/commands/select command can search records in a table.

If a search condition is specified with a query parameter, a /reference/commands/select
command searches records matching the search condition and returns the matched records.

Let's get a record having a specified record ID. The following example gets the first
record in the Site table. More precisely, the query parameter specifies a record whose _id
column stores 1.

Execution example:

select --table Site --query _id:1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/",
# "This is test record 1!"
# ]
# ]
# ]
# ]

Next, let's get a record having a specified key. The following example gets the record
whose primary key is "http://example.org/". More precisely, the query parameter specifies
a record whose _key column stores "http://example.org/".

Execution example:

select --table Site --query '_key:"http://example.org/"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/",
# "This is test record 1!"
# ]
# ]
# ]
# ]

Create a lexicon table for full text search
Let's move on to full text search.

Groonga uses an inverted index to provide fast full text search. So, the first step is to
create a lexicon table which stores an inverted index, also known as postings lists. The
primary key of this table is associated with a vocabulary made up of index terms and each
record stores postings lists for one index term.
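To make the role of the lexicon table concrete, here is a toy Python sketch of a
bigram-based inverted index with term positions. This is only an illustration of the data
structure, not Groonga's actual implementation:

```python
from collections import defaultdict

def bigrams(text):
    """Split text into overlapping 2-character tokens (a naive TokenBigram)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]

# term -> {record_id: [positions]}: each term maps to its postings lists
index = defaultdict(lambda: defaultdict(list))

documents = {1: "test record", 2: "test test"}
for record_id, text in documents.items():
    for position, term in enumerate(bigrams(text.lower())):
        index[term][record_id].append(position)

# Every record containing the bigram "te"
print(sorted(index["te"].keys()))  # [1, 2]
```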

The following shows a command which creates a lexicon table named Terms. The data type of
its primary key is ShortText.

Execution example:

table_create --name Terms --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]

The /reference/commands/table_create command takes many parameters but you don't need to
understand all of them. Please skip the next paragraph if you are not interested in how it
works.

The TABLE_PAT_KEY flag specifies to store index terms in a patricia trie. The
default_tokenizer parameter specifies the method for tokenizing text. This example uses
TokenBigram, which is a kind of N-gram tokenizer.

The normalizer parameter specifies to normalize index terms.

Create an index column for full text search
The second step is to create an index column, which allows you to search records from its
associated column. That is to say, this step specifies which column needs an index.

Let's create an index column. The following example creates an index column for a column
in the Site table.

Execution example:

column_create --table Terms --name blog_title --flags COLUMN_INDEX|WITH_POSITION --type Site --source title
# [[0, 1337566253.89858, 0.000355720520019531], true]

The table parameter specifies the index table and the name parameter specifies the index
column. The type parameter specifies the target table and the source parameter specifies
the target column. The COLUMN_INDEX flag specifies to create an index column and the
WITH_POSITION flag specifies to create a full inverted index, which contains the positions
of each index term. This combination, COLUMN_INDEX|WITH_POSITION, is recommended for
general purposes.

NOTE:
You can create a lexicon table and index columns before/during/after loading records.
If a target column already has records, Groonga creates an inverted index in a static
manner. In contrast, if you load records into an already indexed column, Groonga
updates the inverted index in a dynamic manner.

Full text search
Now it's time for full text search, which you can perform with a /reference/commands/select command.

A query for full text search is specified with a query parameter. The following example
searches for records whose "title" column contains "this". The '@' operator specifies full
text search. Note that a lower case query matches upper case and capitalized terms in a
record if NormalizerAuto was specified when creating the lexicon table.

Execution example:

select --table Site --query title:@this
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/",
# "This is test record 1!"
# ]
# ]
# ]
# ]

In this example, the first record matches the query because its title contains "This",
which is the capitalized form of the query.

A /reference/commands/select command accepts an optional parameter, named match_columns,
that specifies the default target columns. This parameter is used if target columns are
not specified in a query. [1]

The combination of "--match_columns title" and "--query this" brings you the same result
that "--query title:@this" does.

Execution example:

select --table Site --match_columns title --query this
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/",
# "This is test record 1!"
# ]
# ]
# ]
# ]

Specify output columns
An output_columns parameter of a /reference/commands/select command specifies columns to
appear in the search result. If you want to specify more than one column, please separate
the column names with commas (',').

Execution example:

select --table Site --output_columns _key,title,_score --query title:@test
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://example.org/",
# "This is test record 1!",
# 1
# ],
# [
# "http://example.net/",
# "test record 2.",
# 1
# ],
# [
# "http://example.com/",
# "test test record three.",
# 2
# ],
# [
# "http://example.net/afr",
# "test record four.",
# 1
# ],
# [
# "http://example.org/aba",
# "test test test record five.",
# 3
# ],
# [
# "http://example.com/rab",
# "test test test test record six.",
# 4
# ],
# [
# "http://example.net/atv",
# "test test test record seven.",
# 3
# ],
# [
# "http://example.org/gat",
# "test test record eight.",
# 2
# ],
# [
# "http://example.com/vdw",
# "test test record nine.",
# 2
# ]
# ]
# ]
# ]

This example specifies three output columns including the _score column, which stores the
relevance score of each record.

Specify output ranges
A /reference/commands/select command returns a part of its search result if offset and/or
limit parameters are specified. These parameters are useful for paginating a search
result, i.e. the widely used interface that shows a search result page by page.

An offset parameter specifies the starting point and a limit parameter specifies the
maximum number of records to be returned. If you need the first record in a search result,
the offset parameter must be 0 or omitted.
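In a typical pagination scheme, the offset is derived from the page number and the page
size. A small helper, assuming 1-based page numbers:

```python
def page_window(page, per_page):
    """Return (offset, limit) for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * per_page, per_page

# Page 1 starts at offset 0; page 3 with 3 records per page starts at offset 6
print(page_window(1, 3))  # (0, 3)
print(page_window(3, 3))  # (6, 3)
```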

Execution example:

select --table Site --offset 0 --limit 3
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/",
# "This is test record 1!"
# ],
# [
# 2,
# "http://example.net/",
# "test record 2."
# ],
# [
# 3,
# "http://example.com/",
# "test test record three."
# ]
# ]
# ]
# ]
select --table Site --offset 3 --limit 3
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 4,
# "http://example.net/afr",
# "test record four."
# ],
# [
# 5,
# "http://example.org/aba",
# "test test test record five."
# ],
# [
# 6,
# "http://example.com/rab",
# "test test test test record six."
# ]
# ]
# ]
# ]
select --table Site --offset 7 --limit 3
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 8,
# "http://example.org/gat",
# "test test record eight."
# ],
# [
# 9,
# "http://example.com/vdw",
# "test test record nine."
# ]
# ]
# ]
# ]

Sort a search result
A /reference/commands/select command sorts its result when used with a sortby parameter.

A sortby parameter specifies a column as the sorting criterion. A search result is arranged
in ascending order of the column values. If you want to sort a search result in reverse
order, please add a leading hyphen ('-') to the column name in a parameter.

The following example shows records in the Site table in reverse order.

Execution example:

select --table Site --sortby -_id
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 9,
# "http://example.com/vdw",
# "test test record nine."
# ],
# [
# 8,
# "http://example.org/gat",
# "test test record eight."
# ],
# [
# 7,
# "http://example.net/atv",
# "test test test record seven."
# ],
# [
# 6,
# "http://example.com/rab",
# "test test test test record six."
# ],
# [
# 5,
# "http://example.org/aba",
# "test test test record five."
# ],
# [
# 4,
# "http://example.net/afr",
# "test record four."
# ],
# [
# 3,
# "http://example.com/",
# "test test record three."
# ],
# [
# 2,
# "http://example.net/",
# "test record 2."
# ],
# [
# 1,
# "http://example.org/",
# "This is test record 1!"
# ]
# ]
# ]
# ]

The next example uses the _score column as the sorting criteria for ranking the search
result. The result is sorted in relevance order.

Execution example:

select --table Site --query title:@test --output_columns _id,_score,title --sortby -_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 6,
# 4,
# "test test test test record six."
# ],
# [
# 5,
# 3,
# "test test test record five."
# ],
# [
# 7,
# 3,
# "test test test record seven."
# ],
# [
# 8,
# 2,
# "test test record eight."
# ],
# [
# 3,
# 2,
# "test test record three."
# ],
# [
# 9,
# 2,
# "test test record nine."
# ],
# [
# 1,
# 1,
# "This is test record 1!"
# ],
# [
# 4,
# 1,
# "test record four."
# ],
# [
# 2,
# 1,
# "test record 2."
# ]
# ]
# ]
# ]

If you want to specify more than one column, please separate the column names with commas
(','). In such a case, a search result is sorted by the values in the first column, and
then records having the same value in the first column are sorted by the values in the
second column.
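The same two-level ordering as --sortby -_score,_id can be reproduced in Python with a
tuple sort key. The pairs below are (_id, _score) values taken from a few of the records
above:

```python
# (_id, _score) pairs from some of the records above
records = [(3, 2), (8, 2), (6, 4), (5, 3), (7, 3)]

# Sort by score descending first, then by id ascending to break ties
ordered = sorted(records, key=lambda r: (-r[1], r[0]))
print(ordered)  # [(6, 4), (5, 3), (7, 3), (3, 2), (8, 2)]
```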

Execution example:

select --table Site --query title:@test --output_columns _id,_score,title --sortby -_score,_id
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 6,
# 4,
# "test test test test record six."
# ],
# [
# 5,
# 3,
# "test test test record five."
# ],
# [
# 7,
# 3,
# "test test test record seven."
# ],
# [
# 3,
# 2,
# "test test record three."
# ],
# [
# 8,
# 2,
# "test test record eight."
# ],
# [
# 9,
# 2,
# "test test record nine."
# ],
# [
# 1,
# 1,
# "This is test record 1!"
# ],
# [
# 2,
# 1,
# "test record 2."
# ],
# [
# 4,
# 1,
# "test record four."
# ]
# ]
# ]
# ]
footnote

[1] Currently, a match_columns parameter is available only if there exists an inverted
index for full text search. A match_columns parameter for a regular column is not
supported.

Remote access
You can use Groonga as a server which allows remote access. Groonga supports the original
protocol (GQTP), the memcached binary protocol and HTTP.

Hypertext transfer protocol (HTTP)
How to run an HTTP server
Groonga supports the hypertext transfer protocol (HTTP). The following form shows how to
run Groonga as an HTTP server daemon.

Form:

groonga [-p PORT_NUMBER] -d --protocol http DB_PATH

The --protocol option and its argument specify the protocol of the server. "http"
specifies to use HTTP. If the -p option is not specified, Groonga uses the default port
number 10041.

The following command runs an HTTP server that listens on the port number 80.

Execution example:

% sudo groonga -p 80 -d --protocol http /tmp/groonga-databases/introduction.db
%

NOTE:
You must have root privileges to listen on port 80 (a well-known port). There is no
such limitation for port numbers 1024 and over.

How to send a command to an HTTP server
You can send a command to an HTTP server by sending a GET request to /d/COMMAND_NAME.
Command parameters can be passed as parameters of the GET request. The format is
"?NAME_1=VALUE_1&NAME_2=VALUE_2&...".

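A URL library can assemble this request format and also percent-encode special characters
such as ':' and '@' in the query. A Python sketch (the host and port below are
placeholders):

```python
from urllib.parse import urlencode

def command_url(host, port, command, **params):
    """Build a Groonga HTTP API URL: /d/COMMAND?NAME_1=VALUE_1&NAME_2=VALUE_2&..."""
    query = urlencode(params)
    return "http://%s:%d/d/%s?%s" % (host, port, command, query)

url = command_url("127.0.0.1", 10041, "select", table="Site", query="title:@this")
print(url)
```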
The following example shows how to send commands to an HTTP server.

Execution example:

http://HOST_NAME_OR_IP_ADDRESS[:PORT_NUMBER]/d/status
Executed command:
status
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "uptime": 0,
# "max_command_version": 2,
# "n_queries": 0,
# "cache_hit_rate": 0.0,
# "version": "5.0.6-128-g8029ddb",
# "alloc_count": 185,
# "command_version": 1,
# "starttime": 1439995935,
# "default_command_version": 1
# }
# ]
http://HOST_NAME_OR_IP_ADDRESS[:PORT_NUMBER]/d/select?table=Site&query=title:@this
Executed command:
select --table Site --query title:@this
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/",
# "japan",
# ".org",
# "http://example.net/",
# [
# "http://example.net/",
# "http://example.org/",
# "http://example.com/"
# ],
# "128452975x503157902",
# "This is test record 1!"
# ]
# ]
# ]
# ]

Administration tool (HTTP)
An HTTP server of Groonga provides a browser based administration tool that makes database
management easy. After starting an HTTP server, you can use the administration tool by
accessing http://HOST_NAME_OR_IP_ADDRESS[:PORT_NUMBER]/. Note that JavaScript must be
enabled for the tool to work properly.

Security issues
Groonga servers don't support user authentication, so anyone can view and modify databases
hosted by Groonga servers. It is recommended to restrict the IP addresses that can access
Groonga servers, for example with iptables or similar tools.

Various data types
Groonga is a full text search engine but also serves as a column-oriented data store.
Groonga supports various data types, such as numeric types, string types, date and time
type, longitude and latitude types, etc. This tutorial shows a list of data types and
explains how to use them.

Overview
The basic data types of Groonga are roughly divided into 5 groups --- boolean type,
numeric types, string types, date/time type and longitude/latitude types. The numeric
types are further divided according to whether integer or floating point number, signed or
unsigned and the number of bits allocated to each integer. The string types are further
divided according to the maximum length. The longitude/latitude types are further divided
according to the geographic coordinate system. For more details, see /reference/types.

In addition, Groonga supports reference types and vector types. Reference types are
designed for accessing other tables. Vector types are designed for storing a variable
number of values in one element.

First, let's create a table for this tutorial.

Execution example:

table_create --name ToyBox --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

Boolean type
The boolean type is used to store true or false. To create a boolean type column, specify
Bool as the type parameter of the /reference/commands/column_create command. The default
value of the boolean type is false.

The following example creates a boolean type column and adds three records. Note that the
third record has the default value because no value is specified.

Execution example:

column_create --table ToyBox --name is_animal --type Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table ToyBox
[
{"_key":"Monkey","is_animal":true}
{"_key":"Flower","is_animal":false}
{"_key":"Block"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
select --table ToyBox --output_columns _key,is_animal
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "is_animal",
# "Bool"
# ]
# ],
# [
# "Monkey",
# true
# ],
# [
# "Flower",
# false
# ],
# [
# "Block",
# false
# ]
# ]
# ]
# ]

Numeric types
The numeric types are divided into integer types and a floating point number type. The
integer types are further divided into the signed integer types and unsigned integer
types. In addition, you can choose the number of bits allocated to each integer. For more
details, see /reference/types. The default value of the numeric types is 0.

The following example creates an Int8 column and a Float column, and then updates existing
records. The /reference/commands/load command updates the weight column as expected. On
the other hand, the price column values are different from the specified values because
15.9 is not an integer and 200 is too large. 15.9 is converted to 15 by removing the
fractional part. 200 causes an overflow and the result becomes -56. Note that the result
of an overflow/underflow is undefined.
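The -56 above is what plain 8-bit two's-complement wrap-around produces. The following
Python sketch only explains the observed value; remember that Groonga leaves overflow and
underflow behavior undefined:

```python
def wrap_int8(value):
    """Wrap an integer into the signed 8-bit range [-128, 127]."""
    return (value + 128) % 256 - 128

print(wrap_int8(200))  # -56: 200 does not fit in Int8 and wraps around
print(wrap_int8(15))   # 15: in range, unchanged
```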

Execution example:

column_create --table ToyBox --name price --type Int8
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table ToyBox --name weight --type Float
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table ToyBox
[
{"_key":"Monkey","price":15.9}
{"_key":"Flower","price":200,"weight":0.13}
{"_key":"Block","weight":25.7}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
select --table ToyBox --output_columns _key,price,weight
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "price",
# "Int8"
# ],
# [
# "weight",
# "Float"
# ]
# ],
# [
# "Monkey",
# 15,
# 0.0
# ],
# [
# "Flower",
# -56,
# 0.13
# ],
# [
# "Block",
# 0,
# 25.7
# ]
# ]
# ]
# ]

String types
The string types are divided according to the maximum length. For more details, see
/reference/types. The default value is the zero-length string.

The following example creates a ShortText column and updates existing records. The third
record ("Block" key record) has the default value (zero-length string) because it's not
updated.

Execution example:

column_create --table ToyBox --name name --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table ToyBox
[
{"_key":"Monkey","name":"Grease"}
{"_key":"Flower","name":"Rose"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select --table ToyBox --output_columns _key,name
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "name",
# "ShortText"
# ]
# ],
# [
# "Monkey",
# "Grease"
# ],
# [
# "Flower",
# "Rose"
# ],
# [
# "Block",
# ""
# ]
# ]
# ]
# ]

Date and time type
The date and time type of Groonga is Time. Actually, a Time column stores a date and time
as the number of microseconds since the Epoch, 1970-01-01 00:00:00. A Time value can
represent a date and time before the Epoch because the actual data type is a signed
integer. Note that /reference/commands/load and /reference/commands/select commands use a
decimal number to represent a date and time in seconds. The default value is 0.0, which
means the Epoch.

NOTE:
Groonga internally holds a Time value as a pair of integers: the first represents
seconds and the second represents microseconds. Groonga therefore shows a Time value
as a floating point number, whose integral part is the seconds and whose fractional
part is the microseconds.
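Because a Time value is a decimal number of seconds since the Epoch, converting it to a
calendar date only needs the standard library. A Python sketch (UTC is assumed here):

```python
from datetime import datetime, timezone

value = 1234567890.123456  # a Time value: seconds.microseconds since the Epoch

# Split into the seconds/microseconds pair that Groonga holds internally
seconds = int(value)
microseconds = round((value - seconds) * 1_000_000)
moment = datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=microseconds)
print(moment.isoformat())  # 2009-02-13T23:31:30.123456+00:00
```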

The following example creates a Time column and updates existing records. The first record
("Monkey" key record) has the default value (0.0) because it's not updated.

Execution example:

column_create --table ToyBox --name time --type Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table ToyBox
[
{"_key":"Flower","time":1234567890.1234569999}
{"_key":"Block","time":-1234567890}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select --table ToyBox --output_columns _key,time
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "time",
# "Time"
# ]
# ],
# [
# "Monkey",
# 0.0
# ],
# [
# "Flower",
# 1234567890.12346
# ],
# [
# "Block",
# -1234567890.0
# ]
# ]
# ]
# ]

Longitude and latitude types
The longitude and latitude types are divided according to the geographic coordinate
system. For more details, see /reference/types. To represent a longitude and latitude,
Groonga uses a string formatted as follows:

· "latitude x longitude" in milliseconds (e.g.: "128452975x503157902")

· "latitude x longitude" in degrees (e.g.: "35.6813819x139.7660839")

A number without a decimal point represents a latitude or longitude in milliseconds, and
a number with a decimal point represents it in degrees. Note that a combination of a
number with a decimal point and a number without a decimal point (e.g. 35.1x139) must not
be used. A comma (',') is also available as a delimiter. The default value is "0x0".
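The two notations are related by a factor of 3,600,000: one degree is 60 minutes, one
minute is 60 seconds, and one second is 1,000 milliseconds. A conversion sketch using the
coordinates from the examples above:

```python
def degrees_to_milliseconds(degrees):
    """Convert a coordinate in degrees to geodetic milliseconds."""
    return round(degrees * 3600 * 1000)

# The degree values above map onto the millisecond values above
print(degrees_to_milliseconds(35.6813819))   # 128452975
print(degrees_to_milliseconds(139.7660839))  # 503157902
```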

The following example creates a WGS84GeoPoint column and updates existing records. The
second record ("Flower" key record) has the default value ("0x0") because it's not
updated.

Execution example:

column_create --table ToyBox --name location --type WGS84GeoPoint
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table ToyBox
[
{"_key":"Monkey","location":"128452975x503157902"}
{"_key":"Block","location":"35.6813819x139.7660839"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select --table ToyBox --output_columns _key,location
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ]
# ],
# [
# "Monkey",
# "128452975x503157902"
# ],
# [
# "Flower",
# "0x0"
# ],
# [
# "Block",
# "128452975x503157902"
# ]
# ]
# ]
# ]

Reference types
Groonga supports a reference column, which stores references to records in its associated
table. In practice, a reference column stores the IDs of the referred records in the
associated table and enables access to those records.

You can specify a column of the associated table in the output_columns parameter of a
/reference/commands/select command. The format is Src.Dest, where Src is the name of the
reference column and Dest is the name of the target column. If only the reference column
is specified, it is handled as Src._key. Note that if a reference does not point to a
valid record, a /reference/commands/select command outputs the default value of the target
column.

The following example adds a reference column to the Site table that was created in
tutorial-introduction-create-table. The new column, named link, is designed for storing
links among records in the Site table.

Execution example:

column_create --table Site --name link --type Site
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Site
[
{"_key":"http://example.org/","link":"http://example.net/"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select --table Site --output_columns _key,title,link._key,link.title --query title:@this
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ],
# [
# "link._key",
# "ShortText"
# ],
# [
# "link.title",
# "ShortText"
# ]
# ],
# [
# "http://example.org/",
# "This is test record 1!",
# "http://example.net/",
# "test record 2."
# ]
# ]
# ]
# ]

The type parameter of the /reference/commands/column_create command specifies the table to
be associated with the reference column. In this example, the reference column is
associated with its own table. Then, the /reference/commands/load command registers a link
from "http://example.org" to "http://example.net". Note that a reference column requires
the primary key, not the ID, of the record to be referred to. After that, the link is
confirmed by the /reference/commands/select command. In this case, the primary key and the
title of the referred record are output because link._key and link.title are specified in
the output_columns parameter.

Vector types
Groonga supports vector columns, in which each record can store a variable number of
values. To create a vector column, specify the COLUMN_VECTOR flag in the flags parameter
of a /reference/commands/column_create command. A vector column is useful for representing
a many-to-many relationship.

The previous example used a regular column, so each record could have at most one link.
Obviously, this is insufficient because a site usually has more than one link. To solve
this problem, the following example uses a vector column.

Execution example:

column_create --table Site --name links --flags COLUMN_VECTOR --type Site
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Site
[
{"_key":"http://example.org/","links":["http://example.net/","http://example.org/","http://example.com/"]},
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select --table Site --output_columns _key,title,links._key,links.title --query title:@this
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ],
# [
# "links._key",
# "ShortText"
# ],
# [
# "links.title",
# "ShortText"
# ]
# ],
# [
# "http://example.org/",
# "This is test record 1!",
# [
# "http://example.net/",
# "http://example.org/",
# "http://example.com/"
# ],
# [
# "test record 2.",
# "This is test record 1!",
# "test test record three."
# ]
# ]
# ]
# ]
# ]

The only difference at the first step is the flags parameter, which specifies that a
vector column be created. The type parameter of the /reference/commands/column_create
command is the same as in the previous example. Then, the /reference/commands/load command
registers three links from "http://example.org/" to "http://example.net/",
"http://example.org/" and "http://example.com/". After that, the links are confirmed by
the /reference/commands/select command. In this case, the primary keys and the titles are
output as arrays because links._key and links.title are specified in the output_columns
parameter.
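
The array output of a vector column can be sketched the same way as the scalar reference above (again an illustrative dict, not Groonga's storage):

```python
# Sketch of a vector column: "links" stores a list of primary keys, so
# links._key and links.title come back as arrays.
site = {
    "http://example.org/": {"title": "This is test record 1!",
                            "links": ["http://example.net/", "http://example.org/",
                                      "http://example.com/"]},
    "http://example.net/": {"title": "test record 2.", "links": []},
    "http://example.com/": {"title": "test test record three.", "links": []},
}

rec = site["http://example.org/"]
titles = [site[key]["title"] for key in rec["links"]]  # resolve each reference
print(titles)
```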

Various search conditions
Groonga supports narrowing down search results with JavaScript-like syntax and sorting by
computed values. In addition, Groonga can narrow down and sort search results by location
information (latitude and longitude).

Narrow down & full-text search with JavaScript-like syntax
The filter parameter of the select command accepts a search condition. The difference
from the query parameter is that the filter parameter takes a condition written in
JavaScript-like syntax.

Execution example:

select --table Site --filter "_id <= 1" --output_columns _id,_key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/"
# ]
# ]
# ]
# ]

Let's look at the above query in detail. Here is the condition specified in the filter
parameter:

_id <= 1

In this case, the query returns the records whose _id value is equal to or less than 1.

Moreover, you can use && for AND search and || for OR search.
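
The two filter expressions used below behave like ordinary boolean conditions; a tiny Python sketch over plain records (illustrative only; Groonga evaluates these server-side):

```python
# Sketch of filter evaluation: keep the records for which the condition holds.
records = [{"_id": i} for i in range(1, 10)]

def select(records, condition):
    return [r["_id"] for r in records if condition(r)]

# "&&" maps to "and", "||" maps to "or" in this sketch.
print(select(records, lambda r: 4 <= r["_id"] <= 6))              # [4, 5, 6]
print(select(records, lambda r: r["_id"] <= 2 or r["_id"] >= 7))  # [1, 2, 7, 8, 9]
```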

Execution example:

select --table Site --filter "_id >= 4 && _id <= 6" --output_columns _id,_key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# 4,
# "http://example.net/afr"
# ],
# [
# 5,
# "http://example.org/aba"
# ],
# [
# 6,
# "http://example.com/rab"
# ]
# ]
# ]
# ]
select --table Site --filter "_id <= 2 || _id >= 7" --output_columns _id,_key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/"
# ],
# [
# 2,
# "http://example.net/"
# ],
# [
# 7,
# "http://example.net/atv"
# ],
# [
# 8,
# "http://example.org/gat"
# ],
# [
# 9,
# "http://example.com/vdw"
# ]
# ]
# ]
# ]

If you specify the query parameter and the filter parameter at the same time, you get the
records that meet both conditions.

Sort by using scorer
The select command accepts a scorer parameter, which is an expression applied to each
record of the full-text search result.

Like the filter parameter, this parameter accepts an expression written in JavaScript-like
syntax.

Execution example:

select --table Site --filter "true" --scorer "_score = rand()" --output_columns _id,_key,_score --sortby _score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# 6,
# "http://example.com/rab",
# 424238335
# ],
# [
# 9,
# "http://example.com/vdw",
# 596516649
# ],
# [
# 7,
# "http://example.net/atv",
# 719885386
# ],
# [
# 2,
# "http://example.net/",
# 846930886
# ],
# [
# 8,
# "http://example.org/gat",
# 1649760492
# ],
# [
# 3,
# "http://example.com/",
# 1681692777
# ],
# [
# 4,
# "http://example.net/afr",
# 1714636915
# ],
# [
# 1,
# "http://example.org/",
# 1804289383
# ],
# [
# 5,
# "http://example.org/aba",
# 1957747793
# ]
# ]
# ]
# ]
select --table Site --filter "true" --scorer "_score = rand()" --output_columns _id,_key,_score --sortby _score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# 4,
# "http://example.net/afr",
# 783368690
# ],
# [
# 2,
# "http://example.net/",
# 1025202362
# ],
# [
# 5,
# "http://example.org/aba",
# 1102520059
# ],
# [
# 1,
# "http://example.org/",
# 1189641421
# ],
# [
# 3,
# "http://example.com/",
# 1350490027
# ],
# [
# 8,
# "http://example.org/gat",
# 1365180540
# ],
# [
# 9,
# "http://example.com/vdw",
# 1540383426
# ],
# [
# 7,
# "http://example.net/atv",
# 1967513926
# ],
# [
# 6,
# "http://example.com/rab",
# 2044897763
# ]
# ]
# ]
# ]

_score is one of the pseudo columns; the score of the full-text search is assigned to it.
See /reference/columns/pseudo for details about the _score column.

In the above query, the expression given to the scorer parameter is:

_score = rand()

In this case, the score of the full-text search is overwritten by the value of the rand()
function.

The sortby parameter is:

_score

This means sorting the search result by _score in ascending order.

As a result, the order of the search result is randomized.
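
The scorer-then-sort pipeline above can be sketched in Python (illustrative only; record shapes are ours):

```python
import random

# Sketch of "--scorer '_score = rand()'" followed by "--sortby _score":
# assign every record a random score, then sort ascending by that score.
records = [{"_id": i} for i in range(1, 10)]
for r in records:
    r["_score"] = random.randint(0, 2**31 - 1)  # the rand() analogue

result = sorted(records, key=lambda r: r["_score"])
# Ascending _score order, hence a shuffled _id order on each run.
assert all(a["_score"] <= b["_score"] for a, b in zip(result, result[1:]))
```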

Narrow down & sort by using location information
Groonga can store location information (longitude and latitude) and use it both to narrow
down and to sort search results.

Groonga supports two column types for storing location information: TokyoGeoPoint and
WGS84GeoPoint. TokyoGeoPoint is for the Japanese geodetic system and WGS84GeoPoint is for
the world geodetic system.

Specify latitude and longitude as follows:

· "[latitude in milliseconds]x[longitude in milliseconds]"(e.g.: "128452975x503157902")

· "[latitude in milliseconds],[longitude in milliseconds]"(e.g.: "128452975,503157902")

· "[latitude in degrees]x[longitude in degrees]"(e.g.: "35.6813819x139.7660839")

· "[latitude in degrees],[longitude in degrees]"(e.g.: "35.6813819,139.7660839")

Let's store the locations of two stations in Japan in WGS84GeoPoint: Tokyo station and
Shinjuku station. The latitude of Tokyo station is 35 degrees 40 minutes 52.975 seconds
and its longitude is 139 degrees 45 minutes 57.902 seconds. The latitude of Shinjuku
station is 35 degrees 41 minutes 27.316 seconds and its longitude is 139 degrees 42
minutes 0.929 seconds. Thus, their locations in milliseconds are "128452975x503157902"
and "128487316x502920929" respectively, and in degrees "35.6813819x139.7660839" and
"35.6909211x139.7002581" respectively.

Let's register location information in milliseconds.

Execution example:

column_create --table Site --name location --type WGS84GeoPoint
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Site
[
{"_key":"http://example.org/","location":"128452975x503157902"}
{"_key":"http://example.net/","location":"128487316x502920929"},
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select --table Site --query "_id:1 OR _id:2" --output_columns _key,location
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ]
# ],
# [
# "http://example.org/",
# "128452975x503157902"
# ],
# [
# "http://example.net/",
# "128487316x502920929"
# ]
# ]
# ]
# ]

Then assign the geo distance calculated by the /reference/functions/geo_distance function
to the scorer parameter.

Let's show the geo distance from Akihabara station in Japan. In the world geodetic
system, the latitude of Akihabara station is 35 degrees 41 minutes 55.259 seconds and its
longitude is 139 degrees 46 minutes 27.188 seconds. Specify "128515259x503187188" to the
geo_distance function.

Execution example:

select --table Site --query "_id:1 OR _id:2" --output_columns _key,location,_score --scorer '_score = geo_distance(location, "128515259x503187188")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://example.org/",
# "128452975x503157902",
# 2054
# ],
# [
# "http://example.net/",
# "128487316x502920929",
# 6720
# ]
# ]
# ]
# ]

As you can see, the geo distance between Tokyo station and Akihabara station is 2054
meters, and the geo distance between Akihabara station and Shinjuku station is 6720
meters.
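
The Tokyo-Akihabara figure can be roughly cross-checked with the haversine formula (Groonga's geo_distance uses its own approximation, so the results differ by a few meters; the function below is our sketch, not a Groonga API):

```python
from math import radians, sin, cos, asin, sqrt

# Great-circle distance in meters on a sphere of radius ~6371 km.
def haversine_m(lat1, lon1, lat2, lon2, r=6371000):
    p1, p2 = radians(lat1), radians(lat2)
    dp, dl = p2 - p1, radians(lon2 - lon1)
    return 2 * r * asin(sqrt(sin(dp / 2) ** 2 + cos(p1) * cos(p2) * sin(dl / 2) ** 2))

# Tokyo station vs. Akihabara station (35 deg 41 min 55.259 sec,
# 139 deg 46 min 27.188 sec), both in degrees:
print(round(haversine_m(35.6813819, 139.7660839, 35.6986831, 139.7742189)))
# roughly 2060 meters, close to Groonga's 2054
```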

The return value of the geo_distance function can also be used for sorting, by specifying
the pseudo _score column in the sortby parameter.

Execution example:

select --table Site --query "_id:1 OR _id:2" --output_columns _key,location,_score --scorer '_score = geo_distance(location, "128515259x503187188")' --sortby -_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://example.net/",
# "128487316x502920929",
# 6720
# ],
# [
# "http://example.org/",
# "128452975x503157902",
# 2054
# ]
# ]
# ]
# ]

Groonga can also narrow down to the records within a specified distance in meters from a
certain point.

In such a case, use the /reference/functions/geo_in_circle function in the filter
parameter.

For example, search for the records within 5000 meters of Akihabara station.

Execution example:

select --table Site --output_columns _key,location --filter 'geo_in_circle(location, "128515259x503187188", 5000)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ]
# ],
# [
# "http://example.org/",
# "128452975x503157902"
# ]
# ]
# ]
# ]

There is also the /reference/functions/geo_in_rectangle function, which searches for
points within a specified rectangular region.

Drilldown
You learned how to search and sort search results in the previous sections. Now you can
search as you like, but how do you count the number of records that have a specific value
in a column?

A naive solution is to execute one query per column value and use the number of matched
records as the count. It is simple, but not reasonable for many records.

If you are familiar with SQL, you may wonder, "Does Groonga have functionality similar to
SQL's GROUP BY?"

Of course, Groonga provides such functionality. It is called drilldown.

Drilldown lets you get the number of records that belong to each value of a column at
once.

To illustrate this feature, imagine classifying sites by domain and grouping them by the
country the domain belongs to.

Here are concrete examples of how to use this feature.

In this example, we add two columns to the Site table. The domain column is used for the
TLD (top level domain) and the country column is used for the country name. The types of
these columns are the SiteDomain table, which uses the domain name as its primary key,
and the SiteCountry table, which uses the country name as its primary key.

Execution example:

table_create --name SiteDomain --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create --name SiteCountry --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Site --name domain --flags COLUMN_SCALAR --type SiteDomain
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Site --name country --flags COLUMN_SCALAR --type SiteCountry
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Site
[
{"_key":"http://example.org/","domain":".org","country":"japan"},
{"_key":"http://example.net/","domain":".net","country":"brazil"},
{"_key":"http://example.com/","domain":".com","country":"japan"},
{"_key":"http://example.net/afr","domain":".net","country":"usa"},
{"_key":"http://example.org/aba","domain":".org","country":"korea"},
{"_key":"http://example.com/rab","domain":".com","country":"china"},
{"_key":"http://example.net/atv","domain":".net","country":"china"},
{"_key":"http://example.org/gat","domain":".org","country":"usa"},
{"_key":"http://example.com/vdw","domain":".com","country":"japan"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 9]

Here is an example of drilldown with the domain column. Three kinds of values are used in
the domain column: ".org", ".net" and ".com".

Execution example:

select --table Site --limit 0 --drilldown domain
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# ".org",
# 3
# ],
# [
# ".net",
# 3
# ],
# [
# ".com",
# 3
# ]
# ]
# ]
# ]

Here is a summary of the above query.

Drilldown by domain column
┌─────────┬──────────────────────────┬─────────────────────────────────┐
│Group by │ The number of group      │ Group consists of the           │
│         │ records                  │ following records               │
├─────────┼──────────────────────────┼─────────────────────────────────┤
│.org     │ 3                        │ · http://example.org/           │
│         │                          │ · http://example.org/aba        │
│         │                          │ · http://example.org/gat        │
├─────────┼──────────────────────────┼─────────────────────────────────┤
│.net     │ 3                        │ · http://example.net/           │
│         │                          │ · http://example.net/afr        │
│         │                          │ · http://example.net/atv        │
├─────────┼──────────────────────────┼─────────────────────────────────┤
│.com     │ 3                        │ · http://example.com/           │
│         │                          │ · http://example.com/rab        │
│         │                          │ · http://example.com/vdw        │
└─────────┴──────────────────────────┴─────────────────────────────────┘

The result of drilldown is returned in the _nsubrecs column. In this case, the Site table
is grouped by the ".org", ".net" and ".com" domains, and _nsubrecs shows that each of the
three domains has three records.

If you drill down on a column whose type is a table, you can also retrieve the values of
columns stored in the referenced table. The _nsubrecs pseudo column is added to the table
used for drilldown; this pseudo column stores the number of records in each group.

Then, let's investigate the referenced table in detail. As the Site table uses the
SiteDomain table as the type of its domain column, you can use
--drilldown_output_columns to see the details of the referenced columns.

Execution example:

select --table Site --limit 0 --drilldown domain --drilldown_output_columns _id,_key,_nsubrecs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# 1,
# ".org",
# 3
# ],
# [
# 2,
# ".net",
# 3
# ],
# [
# 3,
# ".com",
# 3
# ]
# ]
# ]
# ]

Now that you can see the details of each grouped domain, drill down on the country column
for the records whose domain is ".org".

Execution example:

select --table Site --limit 0 --filter "domain._id == 1" --drilldown country
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "japan",
# 1
# ],
# [
# "korea",
# 1
# ],
# [
# "usa",
# 1
# ]
# ]
# ]
# ]

Drilldown with multiple columns
The drilldown feature supports multiple columns. Pass comma-separated column names as the
drilldown parameter, and you get the result of drilldown for each column at once.

Execution example:

select --table Site --limit 0 --drilldown domain,country
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# ".org",
# 3
# ],
# [
# ".net",
# 3
# ],
# [
# ".com",
# 3
# ]
# ],
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "japan",
# 3
# ],
# [
# "brazil",
# 1
# ],
# [
# "usa",
# 2
# ],
# [
# "korea",
# 1
# ],
# [
# "china",
# 2
# ]
# ]
# ]
# ]

Sorting drilldown results
Use --drilldown_sortby if you want to sort the result of drilldown. For example, sort by
_nsubrecs in ascending order.

Execution example:

select --table Site --limit 0 --drilldown country --drilldown_sortby _nsubrecs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "brazil",
# 1
# ],
# [
# "korea",
# 1
# ],
# [
# "usa",
# 2
# ],
# [
# "china",
# 2
# ],
# [
# "japan",
# 3
# ]
# ]
# ]
# ]

Limiting drilldown results
The number of drilldown results is limited to 10 by default. Use the drilldown_limit and
drilldown_offset parameters to customize drilldown results.

Execution example:

select --table Site --limit 0 --drilldown country --drilldown_sortby _nsubrecs --drilldown_limit 2 --drilldown_offset 2
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "usa",
# 2
# ],
# [
# "china",
# 2
# ]
# ]
# ]
# ]

Note that drilldown on a column that stores strings is slower than drilldown on columns
of other types. If you need to drill down on a string value, create a table whose primary
key type is string, then create a column that refers to that table.

Tag search and reverse resolution of reference relationships
As you know, Groonga can store, in a column, an array of references to another table. In
fact, such array data that refers to another table enables tag search.

Tag search is very fast because Groonga uses an inverted index as its data structure.

Tag search
Let's consider creating a search engine for a web site for sharing movies. Each movie may
be associated with multiple keywords that represent its content.

Let's create tables for movie information, then search the movies.

First, create the Video table, which stores movie information. The Video table has two
columns: the title column stores the title of the movie, and the tags column stores
multiple tags referring to the Tag table.

Next, create the Tag table, which stores tag information. The Tag table has one column:
the tag string is stored as the primary key, and the index_tags column stores indexes for
the tags column of the Video table.

Execution example:

table_create --name Video --flags TABLE_HASH_KEY --key_type UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create --name Tag --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Video --name title --flags COLUMN_SCALAR --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Video --name tags --flags COLUMN_VECTOR --type Tag
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Tag --name index_tags --flags COLUMN_INDEX --type Video --source tags
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Video
[
{"_key":1,"title":"Soccer 2010","tags":["Sports","Soccer"]},
{"_key":2,"title":"Zenigata Kinjirou","tags":["Variety","Money"]},
{"_key":3,"title":"groonga Demo","tags":["IT","Server","groonga"]},
{"_key":4,"title":"Moero!! Ultra Baseball","tags":["Sports","Baseball"]},
{"_key":5,"title":"Hex Gone!","tags":["Variety","Quiz"]},
{"_key":6,"title":"Pikonyan 1","tags":["Animation","Pikonyan"]},
{"_key":7,"title":"Draw 8 Month","tags":["Animation","Raccoon"]},
{"_key":8,"title":"K.O.","tags":["Animation","Music"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 8]

After creating an index column, you can do very fast full-text search. The index column
is also updated automatically when the stored data changes.

List the movies that are tagged with a specific keyword.

Execution example:

select --table Video --query tags:@Variety --output_columns _key,title
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "UInt32"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 2,
# "Zenigata Kinjirou"
# ],
# [
# 5,
# "Hex Gone!"
# ]
# ]
# ]
# ]
select --table Video --query tags:@Sports --output_columns _key,title
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "UInt32"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "Soccer 2010"
# ],
# [
# 4,
# "Moero!! Ultra Baseball"
# ]
# ]
# ]
# ]
select --table Video --query tags:@Animation --output_columns _key,title
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "UInt32"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 6,
# "Pikonyan 1"
# ],
# [
# 7,
# "Draw 8 Month"
# ],
# [
# 8,
# "K.O."
# ]
# ]
# ]
# ]

You can search by tags such as "Variety", "Sports" and "Animation".
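
The inverted index behind this tag search can be sketched as a tag-to-keys mapping (an illustrative model, not Groonga's on-disk structure; data mirrors the Video table above):

```python
# Inverted index sketch: tag -> set of record keys, the COLUMN_INDEX analogue.
videos = {
    1: ["Sports", "Soccer"], 2: ["Variety", "Money"], 3: ["IT", "Server", "groonga"],
    4: ["Sports", "Baseball"], 5: ["Variety", "Quiz"], 6: ["Animation", "Pikonyan"],
    7: ["Animation", "Raccoon"], 8: ["Animation", "Music"],
}

index = {}
for key, tags in videos.items():
    for tag in tags:
        index.setdefault(tag, set()).add(key)  # posting list per tag

print(sorted(index["Variety"]))    # [2, 5]
print(sorted(index["Animation"]))  # [6, 7, 8]
```

A tag lookup is then a single dictionary access instead of a scan over every record, which is why tag search stays fast as the table grows.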

Reverse resolution of reference relationships
Groonga supports indexes for reverse resolution among tables. Tag search is one concrete
example.

For example, in a social networking site you can search friendships by reverse
resolution.

The following example shows how to create the User table, which stores user information:
the username column stores the user name, the friends column stores the list of the
user's friends in an array, and the index_friends column is the index column.

Execution example:

table_create --name User --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table User --name username --flags COLUMN_SCALAR --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table User --name friends --flags COLUMN_VECTOR --type User
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table User --name index_friends --flags COLUMN_INDEX --type User --source friends
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table User
[
{"_key":"ken","username":"健作","friends":["taro","jiro","tomo","moritapo"]}
{"_key":"moritapo","username":"森田","friends":["ken","tomo"]}
{"_key":"taro","username":"ぐるんが太郎","friends":["jiro","tomo"]}
{"_key":"jiro","username":"ぐるんが次郎","friends":["taro","tomo"]}
{"_key":"tomo","username":"トモちゃん","friends":["ken","hana"]}
{"_key":"hana","username":"花子","friends":["ken","taro","jiro","moritapo","tomo"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 6]

Let's show the list of users whose friend list contains a specified user.

Execution example:

select --table User --query friends:@tomo --output_columns _key,username
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "username",
# "ShortText"
# ]
# ],
# [
# "ken",
# "健作"
# ],
# [
# "taro",
# "ぐるんが太郎"
# ],
# [
# "jiro",
# "ぐるんが次郎"
# ],
# [
# "moritapo",
# "森田"
# ],
# [
# "hana",
# "花子"
# ]
# ]
# ]
# ]
select --table User --query friends:@jiro --output_columns _key,username
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "username",
# "ShortText"
# ]
# ],
# [
# "ken",
# "健作"
# ],
# [
# "taro",
# "ぐるんが太郎"
# ],
# [
# "hana",
# "花子"
# ]
# ]
# ]
# ]

Then, drill down to count how many times each user is listed as a friend.

Execution example:

select --table User --limit 0 --drilldown friends
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 6
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "friends",
# "User"
# ],
# [
# "index_friends",
# "UInt32"
# ],
# [
# "username",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 6
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "taro",
# 3
# ],
# [
# "jiro",
# 3
# ],
# [
# "tomo",
# 5
# ],
# [
# "moritapo",
# 2
# ],
# [
# "ken",
# 3
# ],
# [
# "hana",
# 1
# ]
# ]
# ]
# ]

As you can see, the result follows the reverse resolution of reference relationships.

Geo location search with index
Groonga supports adding indexes to columns that store geo location information. Searching
an enormous number of records is very fast because Groonga uses such indexes on geo
location columns.

Execution example:

table_create --name GeoSite --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table GeoSite --name location --type WGS84GeoPoint
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create --name GeoIndex --flags TABLE_PAT_KEY --key_type WGS84GeoPoint
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table GeoIndex --name index_point --type GeoSite --flags COLUMN_INDEX --source location
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table GeoSite
[
{"_key":"http://example.org/","location":"128452975x503157902"},
{"_key":"http://example.net/","location":"128487316x502920929"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select --table GeoSite --filter 'geo_in_circle(location, "128515259x503187188", 5000)' --output_columns _key,location
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ]
# ],
# [
# "http://example.org/",
# "128452975x503157902"
# ]
# ]
# ]
# ]

These indexes are also used when sorting the records of a geo location search.

Execution example:

select --table GeoSite --filter 'geo_in_circle(location, "128515259x503187188", 50000)' --output_columns _key,location,_score --sortby '-geo_distance(location, "128515259x503187188")' --scorer '_score = geo_distance(location, "128515259x503187188")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://example.org/",
# "128452975x503157902",
# 2054
# ],
# [
# "http://example.net/",
# "128487316x502920929",
# 6720
# ]
# ]
# ]
# ]

match_columns parameter
Full-text search against multiple columns
Groonga supports full-text search against multiple columns. Let's consider a blog site.
Usually, a blog site has a table that contains a title column and a content column. How
do you search for the blog entries that contain specified keywords in the title or the
content?

In such a case, there are two ways to create indexes. One way is to create a column index
for each column. The other way is to create one column index for multiple columns. Either
way, Groonga supports a similar full-text search syntax.

Creating column index against each column
Here is an example which creates a column index for each column.

First, create the Blog1 table, then add a title column which stores the title string and
a message column which stores the content of the blog entry.

Then create the IndexBlog1 table for column indexes, and add an index_title column for
the title column and an index_message column for the message column.

Execution example:

table_create --name Blog1 --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Blog1 --name title --flags COLUMN_SCALAR --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Blog1 --name message --flags COLUMN_SCALAR --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create --name IndexBlog1 --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table IndexBlog1 --name index_title --flags COLUMN_INDEX|WITH_POSITION --type Blog1 --source title
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table IndexBlog1 --name index_message --flags COLUMN_INDEX|WITH_POSITION --type Blog1 --source message
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Blog1
[
{"_key":"grn1","title":"Groonga test","message":"Groonga message"},
{"_key":"grn2","title":"baseball result","message":"rakutan eggs 4 - 4 Groonga moritars"},
{"_key":"grn3","title":"Groonga message","message":"none"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

The match_columns option of the select command accepts multiple columns as the search
target. Specify the query string with the query option. Then you can do full-text search
over the title and content of blog entries.

Let's try searching the blog entries.

Execution example:

select --table Blog1 --match_columns title||message --query groonga
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "grn1",
# "Groonga message",
# "Groonga test"
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ],
# [
# 2,
# "grn2",
# "rakutan eggs 4 - 4 Groonga moritars",
# "baseball result"
# ]
# ]
# ]
# ]
select --table Blog1 --match_columns title||message --query message
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ],
# [
# 1,
# "grn1",
# "Groonga message",
# "Groonga test"
# ]
# ]
# ]
# ]
select --table Blog1 --match_columns title --query message
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ]
# ]
# ]
# ]

Creating one column index against multiple columns
Groonga also supports one column index covering multiple columns.

The difference from the previous example is that only one column index exists: a single
common column index covers both the title and message columns.

Even though the same column index is used, Groonga can search against the title column
only, the message column only, or either of them.

Execution example:

table_create --name Blog2 --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Blog2 --name title --flags COLUMN_SCALAR --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Blog2 --name message --flags COLUMN_SCALAR --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create --name IndexBlog2 --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table IndexBlog2 --name index_blog --flags COLUMN_INDEX|WITH_POSITION|WITH_SECTION --type Blog2 --source title,message
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Blog2
[
{"_key":"grn1","title":"Groonga test","message":"Groonga message"},
{"_key":"grn2","title":"baseball result","message":"rakutan eggs 4 - 4 Groonga moritars"},
{"_key":"grn3","title":"Groonga message","message":"none"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

Let's run the same queries as in the previous section. You get the same search results.

Execution example:

select --table Blog2 --match_columns title||message --query groonga
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "grn1",
# "Groonga message",
# "Groonga test"
# ],
# [
# 2,
# "grn2",
# "rakutan eggs 4 - 4 Groonga moritars",
# "baseball result"
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ]
# ]
# ]
# ]
select --table Blog2 --match_columns title||message --query message
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "grn1",
# "Groonga message",
# "Groonga test"
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ]
# ]
# ]
# ]
select --table Blog2 --match_columns title --query message
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ]
# ]
# ]
# ]

NOTE:
There may be a question: "which is the better solution for indexing?" It depends on
the case.

· Indexes for each column - The update performance tends to be better than with a
  multiple-column index because each index has its own buffer for updating. On the
  other hand, disk usage is less efficient.

· One index for multiple columns - It saves disk usage because the columns share a
  common buffer. On the other hand, the update performance is not as good.

Full text search with specific index name
TODO
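Although this section is still to be written, a hedged sketch: match_columns also accepts
an index column name in the form Table.index_column. Against the Blog1 schema above, a
search restricted to the title index might look like the following (assuming the
IndexBlog1.index_title index created earlier):

select --table Blog1 --match_columns IndexBlog1.index_title --query groonga

Specifying the index column directly lets you choose which index is used when several
indexes cover the same source column.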

Nested index search among related tables by column index
If there are relationships among multiple tables via column indexes, you can search
across those tables by specifying a reference column name.

Here is a concrete example.

There are tables which store blog articles and comments on those articles. The table
which stores articles has columns for the article content and its comment, and the
comment column refers to the Comments table. The table which stores comments has a
column for the comment content and a column index pointing back to the article table.

If you want to search for articles which contain a specified keyword in a comment, you
would normally execute a full-text search on the comments table first, then search for
the records which refer to the matched comments.

But by specifying the reference column index, you can search those records in one step.

Here is the sample schema.

Execution example:

table_create Comments TABLE_HASH_KEY UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Comments content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Articles TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Articles content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Articles comment COLUMN_SCALAR Comments
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Lexicon TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon articles_content COLUMN_INDEX|WITH_POSITION Articles content
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon comments_content COLUMN_INDEX|WITH_POSITION Comments content
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Comments article COLUMN_INDEX Articles comment
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is the sample data.

Execution example:

load --table Comments
[
{"_key": 1, "content": "I'm using Groonga too!"},
{"_key": 2, "content": "I'm using Groonga and Mroonga!"},
{"_key": 3, "content": "I'm using Mroonga too!"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
load --table Articles
[
{"content": "Groonga is fast!", "comment": 1},
{"content": "Groonga is useful!"},
{"content": "Mroonga is fast!", "comment": 3}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

You can write a query which searches for records containing the specified keyword as a
comment, then fetches the articles which refer to them.

Query for searching the records described above:

select Articles --match_columns comment.content --query groonga --output_columns "_id, _score, *"

You need to join the comment column of the Articles table and the content column of the
Comments table with a period ( . ) as the --match_columns argument.

At first, this query executes a full-text search on the content column of the Comments
table, then fetches the records of the Articles table which refer to the matched records
of the Comments table. (Because of this, if you omit the command which creates the
article index column of the Comments table, you can't get the intended search results.)

Execution example:

select Articles --match_columns comment.content --query groonga --output_columns "_id, _score, *"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ],
# [
# "comment",
# "Comments"
# ],
# [
# "content",
# "Text"
# ]
# ],
# [
# 1,
# 1,
# 1,
# "Groonga is fast!"
# ]
# ]
# ]
# ]

Now you can search for articles which contain specific keywords in their comments.

The nested index search feature is not limited to a relationship between two tables.

Here is a sample schema similar to the previous one. The difference is an added table
which represents a 'Reply', extending the relationship to three tables.

Execution example:

table_create Replies2 TABLE_HASH_KEY UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Replies2 content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Comments2 TABLE_HASH_KEY UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Comments2 content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Comments2 comment COLUMN_SCALAR Replies2
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Articles2 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Articles2 content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Articles2 comment COLUMN_SCALAR Comments2
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Lexicon2 TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon2 articles_content COLUMN_INDEX|WITH_POSITION Articles2 content
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon2 comments_content COLUMN_INDEX|WITH_POSITION Comments2 content
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon2 replies_content COLUMN_INDEX|WITH_POSITION Replies2 content
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Comments2 article COLUMN_INDEX Articles2 comment
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Replies2 reply_to COLUMN_INDEX Comments2 comment
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is the sample data.

Execution example:

load --table Replies2
[
{"_key": 1, "content": "I'm using Rroonga too!"},
{"_key": 2, "content": "I'm using Groonga and Mroonga and Rroonga!"},
{"_key": 3, "content": "I'm using Nroonga too!"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
load --table Comments2
[
{"_key": 1, "content": "I'm using Groonga too!", "comment": 1},
{"_key": 2, "content": "I'm using Groonga and Mroonga!", "comment": 2},
{"_key": 3, "content": "I'm using Mroonga too!"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
load --table Articles2
[
{"content": "Groonga is fast!", "comment": 1},
{"content": "Groonga is useful!", "comment": 2},
{"content": "Mroonga is fast!", "comment": 3}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

Query for searching the records described above:

select Articles2 --match_columns comment.content --query mroonga --output_columns "_id, _score, *"
select Articles2 --match_columns comment.comment.content --query mroonga --output_columns "_id, _score, *"

The first query searches for mroonga in the Comments2 table; the second one searches for
mroonga in the Replies2 and Comments2 tables by using the reference column indexes.

Execution example:

select Articles2 --match_columns comment.content --query mroonga --output_columns "_id, _score, *"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ],
# [
# "comment",
# "Comments2"
# ],
# [
# "content",
# "Text"
# ]
# ],
# [
# 2,
# 1,
# 2,
# "Groonga is useful!"
# ],
# [
# 3,
# 1,
# 3,
# "Mroonga is fast!"
# ]
# ]
# ]
# ]
select Articles2 --match_columns comment.comment.content --query mroonga --output_columns "_id, _score, *"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ],
# [
# "comment",
# "Comments2"
# ],
# [
# "content",
# "Text"
# ]
# ],
# [
# 2,
# 1,
# 2,
# "Groonga is useful!"
# ]
# ]
# ]
# ]

As a result, the first query matches two articles because the Comments2 table has two
records which contain mroonga as a keyword.

On the other hand, the second one matches only one article because the Replies2 table has
only one record which contains mroonga as a keyword, and only one record in the Comments2
table refers to it.

Indexes with Weight
TODO
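Although this section is still to be written, a hedged sketch: match_columns accepts
per-column weights in the form "column * weight". Against the Blog1 schema above, giving
title matches ten times the score of message matches might look like:

select --table Blog1 --match_columns 'title * 10 || message' --query groonga --output_columns _key,_score

Records which match in title then score higher than records which match only in message.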

Prefix search with patricia trie
Groonga supports creating a table with the patricia trie option. By specifying it, you
can do prefix search against the primary key.

Furthermore, you can do suffix search against the primary key by specifying an additional option.

Prefix search by primary key
A table created by the table_create command with TABLE_PAT_KEY in the flags option
supports prefix search by primary key.

Execution example:

table_create --name PatPrefix --flags TABLE_PAT_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table PatPrefix
[
{"_key":"James"},
{"_key":"Jason"},
{"_key":"Jennifer"},
{"_key":"Jeff"},
{"_key":"John"},
{"_key":"Joseph"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 6]
select --table PatPrefix --query _key:^Je
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# 3,
# "Jennifer"
# ],
# [
# 4,
# "Jeff"
# ]
# ]
# ]
# ]

Suffix search by primary key
A table created by the table_create command with TABLE_PAT_KEY and KEY_WITH_SIS in the
flags option supports both prefix search and suffix search by primary key.

If you set the KEY_WITH_SIS flag, records for suffix search are also added automatically
when you add data. So a naive search hits the automatically added records in addition to
the original ones. To search only the original records, you need a workaround.

For example, to distinguish the original records from the automatically added ones, add
an original column indicating that a record is original, and add original == true to the
search condition. Note that you should use the --filter option, because the --query
option cannot specify a Bool value intuitively.

Execution example:

table_create --name PatSuffix --flags TABLE_PAT_KEY|KEY_WITH_SIS --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table PatSuffix --name original --type Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table PatSuffix
[
{"_key":"ひろゆき","original":true},
{"_key":"まろゆき","original":true},
{"_key":"ひろあき","original":true},
{"_key":"ゆきひろ","original":true}
]
# [[0, 1337566253.89858, 0.000355720520019531], 4]
select --table PatSuffix --query _key:$ゆき
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "original",
# "Bool"
# ]
# ],
# [
# 3,
# "ゆき",
# false
# ],
# [
# 2,
# "ろゆき",
# false
# ],
# [
# 5,
# "まろゆき",
# true
# ],
# [
# 1,
# "ひろゆき",
# true
# ]
# ]
# ]
# ]
select --table PatSuffix --filter '_key @$ "ゆき" && original == true'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "original",
# "Bool"
# ]
# ],
# [
# 5,
# "まろゆき",
# true
# ],
# [
# 1,
# "ひろゆき",
# true
# ]
# ]
# ]
# ]

Additional information about lexicon for full text search
Groonga manages the lexicon for full text search as a table. Thus, Groonga can hold
additional information for each lexicon entry. For example, Groonga can hold the
frequency of a word, a stop-word flag, the importance of a word, and so on.

TODO: Write document.
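As a hedged sketch of this idea: because the lexicon is an ordinary table, you can add
your own columns to it. For example, a hypothetical Bool flag column on the IndexBlog1
lexicon created earlier might be defined like this (the column name is illustrative):

column_create --table IndexBlog1 --name is_stop_word --flags COLUMN_SCALAR --type Bool

Such a column can then hold per-word information alongside the index columns.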

Let's create micro-blog
Let's create a micro-blog with full text search using Groonga. A micro-blog is a
broadcast medium in the form of a blog, mainly used to post small messages, as on
Twitter.

Create a table
Let's create the tables.

table_create --name Users --flags TABLE_HASH_KEY --key_type ShortText
table_create --name Comments --flags TABLE_HASH_KEY --key_type ShortText
table_create --name HashTags --flags TABLE_HASH_KEY --key_type ShortText
table_create --name Bigram --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
table_create --name GeoIndex --flags TABLE_PAT_KEY --key_type WGS84GeoPoint

column_create --table Users --name name --flags COLUMN_SCALAR --type ShortText
column_create --table Users --name follower --flags COLUMN_VECTOR --type Users
column_create --table Users --name favorites --flags COLUMN_VECTOR --type Comments
column_create --table Users --name location --flags COLUMN_SCALAR --type WGS84GeoPoint
column_create --table Users --name location_str --flags COLUMN_SCALAR --type ShortText
column_create --table Users --name description --flags COLUMN_SCALAR --type ShortText
column_create --table Users --name followee --flags COLUMN_INDEX --type Users --source follower

column_create --table Comments --name comment --flags COLUMN_SCALAR --type ShortText
column_create --table Comments --name last_modified --flags COLUMN_SCALAR --type Time
column_create --table Comments --name replied_to --flags COLUMN_SCALAR --type Comments
column_create --table Comments --name replied_users --flags COLUMN_VECTOR --type Users
column_create --table Comments --name hash_tags --flags COLUMN_VECTOR --type HashTags
column_create --table Comments --name location --flags COLUMN_SCALAR --type WGS84GeoPoint
column_create --table Comments --name posted_by --flags COLUMN_SCALAR --type Users
column_create --table Comments --name favorited_by --flags COLUMN_INDEX --type Users --source favorites

column_create --table HashTags --name hash_index --flags COLUMN_INDEX --type Comments --source hash_tags

column_create --table Bigram --name users_index --flags COLUMN_INDEX|WITH_POSITION|WITH_SECTION --type Users --source name,location_str,description
column_create --table Bigram --name comment_index --flags COLUMN_INDEX|WITH_POSITION --type Comments --source comment

column_create --table GeoIndex --name users_location --type Users --flags COLUMN_INDEX --source location
column_create --table GeoIndex --name comments_location --type Comments --flags COLUMN_INDEX --source location

Users table
This table stores user information: the user's name, profile, follower list, and so on.

_key User ID

name User name

follower
List of following users

favorites
List of favorite comments

location
Current location of user (geolocation)

location_str
Current location of user (string)

description
User profile

followee
Index for the follower column in the Users table. With this index, you can search
for the users who follow a given person.

Comments table
This table stores comments and their metadata: the content of the comment, the posted
date, the comment it replies to, and so on.

_key Comment ID

comment
Content of comment

last_modified
Posted date

replied_to
The comment this comment replies to

replied_users
List of users this comment replies to

hash_tags
List of hash tags about comment

location
Posted place (for geolocation)

posted_by
The user who wrote the comment

favorited_by
Index for the favorites column in the Users table. With this index, you can search
for the users who marked a comment as a favorite.

HashTags table
This is the table which stores hash tags for comments.

_key Hash tag

hash_index
Index for Comments.hash_tags. With this index, you can get the list of comments
which have a specified hash tag.

Bigram table
This table stores indexes for full text search over user information and comments.

_key Word

users_index
Index of user information. This column indexes the user name (Users.name), current
location (Users.location_str), and profile (Users.description).

comment_index
Index of the content of comments (Comments.comment).

GeoIndex table
This table stores indexes of the location columns to search by geolocation
efficiently.

users_location
Index of the location column of the Users table

comments_location
Index of the location column of the Comments table

Loading data
Then, load the example data.

load --table Users
[
{
"_key": "alice",
"name": "Alice",
"follower": ["bob"],
"favorites": [],
"location": "152489000x-255829000",
"location_str": "Boston, Massachusetts",
"description": "Groonga developer"
},
{
"_key": "bob",
"name": "Bob",
"follower": ["alice","charlie"],
"favorites": ["alice:1","charlie:1"],
"location": "146249000x-266228000",
"location_str": "Brooklyn, New York City",
"description": ""
},
{
"_key": "charlie",
"name": "Charlie",
"follower": ["alice","bob"],
"favorites": ["alice:1","bob:1"],
"location": "146607190x-267021260",
"location_str": "Newark, New Jersey",
"description": "Hmm,Hmm"
}
]

load --table Comments
[
{
"_key": "alice:1",
"comment": "I've created micro-blog!",
"last_modified": "2010/03/17 12:05:00",
"posted_by": "alice"
},
{
"_key": "bob:1",
"comment": "First post. test,test...",
"last_modified": "2010/03/17 12:00:00",
"posted_by": "bob"
},
{
"_key": "alice:2",
"comment": "@bob Welcome!!!",
"last_modified": "2010/03/17 12:05:00",
"replied_to": "bob:1",
"replied_users": ["bob"],
"posted_by": "alice"
},
{
"_key": "bob:2",
"comment": "@alice Thanks!",
"last_modified": "2010/03/17 13:00:00",
"replied_to": "alice:2",
"replied_users": ["alice"],
"posted_by": "bob"
},
{
"_key": "bob:3",
"comment": "I've just used 'Try-Groonga' now! #groonga",
"last_modified": "2010/03/17 14:00:00",
"hash_tags": ["groonga"],
"location": "146566000x-266422000",
"posted_by": "bob"
},
{
"_key": "bob:4",
"comment": "I'm come at city of New York for development camp! #groonga #travel",
"last_modified": "2010/03/17 14:05:00",
"hash_tags": ["groonga", "travel"],
"location": "146566000x-266422000",
"posted_by": "bob"
},
{
"_key": "charlie:1",
"comment": "@alice @bob I've tried to register!",
"last_modified": "2010/03/17 15:00:00",
"replied_users": ["alice", "bob"],
"location": "146607190x-267021260",
"posted_by": "charlie"
},
{
"_key": "charlie:2",
"comment": "I'm at the Museum of Modern Art in NY now!",
"last_modified": "2010/03/17 15:05:00",
"location": "146741340x-266319590",
"posted_by": "charlie"
}
]

The follower and favorites columns in the Users table and the replied_users column in the
Comments table are vector columns, so specify their values as arrays.

The location column in the Users table and the location column in the Comments table use
the GeoPoint type. This type accepts values in the form "[latitude]x[longitude]".

The last_modified column in the Comments table uses the Time type.

There are two ways to specify the value. The first is to specify the epoch (seconds since
00:00:00 on Jan 1, 1970) directly. In this case, you can specify microseconds as the
fractional part, which is converted to a microsecond-based value when the data is loaded.
The second is to specify the timestamp as a string in the following format:
"(YEAR)/(MONTH)/(DAY) (HOUR):(MINUTE):(SECOND)". In this case, the string is cast to the
proper microsecond-based value when the data is loaded.

Search
Let's search micro-blog.

Search users by keyword
In this section, we search the micro-blog against multiple columns by keyword. See
match_columns for how to search multiple columns at once.

Let's search for users by the micro-blog's user name, location, and description entries.

Execution example:

select --table Users --match_columns name,location_str,description --query "New York" --output_columns _key,name
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "name",
# "ShortText"
# ]
# ],
# [
# "bob",
# "Bob"
# ]
# ]
# ]
# ]

Using "New York" as the search keyword, "Bob", who lives in New York, is listed in the
search results.

Search users by geolocation data (GeoPoint)
In this section, we search for users by a column whose type is GeoPoint. See search for
details about GeoPoint columns.

The following example searches for users who live within 20 km of a specified location.

Execution example:

select --table Users --filter 'geo_in_circle(location,"146710080x-266315480",20000)' --output_columns _key,name
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "name",
# "ShortText"
# ]
# ],
# [
# "charlie",
# "Charlie"
# ],
# [
# "bob",
# "Bob"
# ]
# ]
# ]
# ]

It shows that "Bob" and "Charlie" live within 20 km of "Grand Central Terminal" station.

Search users who follows specific user
In this section, we do reverse resolution of the reference relationships described at
index.

The following example shows reverse resolution of the follower column of the Users table.

Execution example:

select --table Users --query follower:@bob --output_columns _key,name
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "name",
# "ShortText"
# ]
# ],
# [
# "alice",
# "Alice"
# ],
# [
# "charlie",
# "Charlie"
# ]
# ]
# ]
# ]

It shows that "Alice" and "Charlie" follow "Bob".

Search comments by using the value of GeoPoint type
In this section, we search for comments which were written within a specific area.

We also use drilldown, which is described at drilldown. The following example shows how
to drill down on search results. As a result, we get counts grouped by user and by hash
tag, respectively.

Execution example:

select --table Comments --filter 'geo_in_circle(location,"146867000x-266280000",20000)' --output_columns posted_by.name,comment --drilldown hash_tags,posted_by
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "posted_by.name",
# "ShortText"
# ],
# [
# "comment",
# "ShortText"
# ]
# ],
# [
# "Charlie",
# "I'm at the Museum of Modern Art in NY now!"
# ],
# [
# "Bob",
# "I've just used 'Try-Groonga' now! #groonga"
# ],
# [
# "Bob",
# "I'm come at city of New York for development camp! #groonga #travel"
# ],
# [
# "Charlie",
# "@alice @bob I've tried to register!"
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "groonga",
# 2
# ],
# [
# "travel",
# 1
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "charlie",
# 2
# ],
# [
# "bob",
# 2
# ]
# ]
# ]
# ]

The above query searches for comments posted within 20 km of Central Park in New York
City.

As the specified range is 20 km, all comments with a location are collected. The search
results contain two #groonga hash tags and one #travel hash tag, and bob and charlie each
posted two comments.

Search comments by keyword
In this section, we search for comments which contain a specific keyword, and also
calculate the value of _score, which is described at search.

Execution example:

select --table Comments --query comment:@Now --output_columns comment,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "comment",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "I've just used 'Try-Groonga' now! #groonga",
# 1
# ],
# [
# "I'm at the Museum of Modern Art in NY now!",
# 1
# ]
# ]
# ]
# ]

Using 'Now' as the keyword, the above query returns two comments. The number of
occurrences of 'Now' is returned as the value of _score.

Search comments by keyword and geolocation
In this section, we search for comments by a specific keyword and geolocation. By using
the --query and --filter options together, the following query returns records which
match both conditions.

Execution example:

select --table Comments --query comment:@New --filter 'geo_in_circle(location,"146867000x-266280000",20000)' --output_columns posted_by.name,comment --drilldown hash_tags,posted_by
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "posted_by.name",
# "ShortText"
# ],
# [
# "comment",
# "ShortText"
# ]
# ],
# [
# "Bob",
# "I'm come at city of New York for development camp! #groonga #travel"
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "groonga",
# 1
# ],
# [
# "travel",
# 1
# ]
# ],
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "bob",
# 1
# ]
# ]
# ]
# ]

It returns one comment which meets both conditions. It also returns the result of the
drilldown: there is one matching comment, posted by Bob.

Search comments by hash tags
In this section, we search for comments which contain a specific hash tag, using reverse
resolution of reference relationships.

Execution example:

select --table Comments --query hash_tags:@groonga --output_columns posted_by.name,comment --drilldown posted_by
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "posted_by.name",
# "ShortText"
# ],
# [
# "comment",
# "ShortText"
# ]
# ],
# [
# "Bob",
# "I've just used 'Try-Groonga' now! #groonga"
# ],
# [
# "Bob",
# "I'm come at city of New York for development camp! #groonga #travel"
# ]
# ],
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "bob",
# 2
# ]
# ]
# ]
# ]

The above query returns 2 comments that contain the #groonga hash tag. It also returns
drilldown results grouped by the poster, showing that both comments were posted by Bob.

Search comments by user id
In this section, we search for comments posted by a specific user.

Execution example:

select --table Comments --query posted_by:bob --output_columns comment --drilldown hash_tags
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "comment",
# "ShortText"
# ]
# ],
# [
# "First post. test,test..."
# ],
# [
# "@alice Thanks!"
# ],
# [
# "I've just used 'Try-Groonga' now! #groonga"
# ],
# [
# "I'm come at city of New York for development camp! #groonga #travel"
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "groonga",
# 2
# ],
# [
# "travel",
# 1
# ]
# ]
# ]
# ]

The above query returns 4 comments posted by Bob. It also returns drilldown results by
hash tag: 2 comments contain #groonga and 1 comment contains #travel.
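The drilldown result above can be sketched as a simple group-and-count in Python. This is
only a conceptual illustration of what --drilldown computes (group by a column and report
_nsubrecs per group), not Groonga's implementation:

```python
from collections import Counter

# Bob's four comments, with their hash tags as in the tutorial data.
records = [
    {"comment": "First post. test,test...", "hash_tags": []},
    {"comment": "@alice Thanks!", "hash_tags": []},
    {"comment": "I've just used 'Try-Groonga' now! #groonga",
     "hash_tags": ["groonga"]},
    {"comment": "I'm come at city of New York for development camp! #groonga #travel",
     "hash_tags": ["groonga", "travel"]},
]

# Group by hash tag and count records per group (the _nsubrecs value).
drilldown = Counter(tag for record in records for tag in record["hash_tags"])
print(drilldown)  # → Counter({'groonga': 2, 'travel': 1})
```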

Search user's favorite comments
In this section, we search user's favorite comments.

Execution example:

select --table Users --query _key:bob --output_columns favorites.posted_by,favorites.comment
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "favorites.posted_by",
# "Users"
# ],
# [
# "favorites.comment",
# "ShortText"
# ]
# ],
# [
# [
# "alice",
# "charlie"
# ],
# [
# "I've created micro-blog!",
# "@alice @bob I've tried to register!"
# ]
# ]
# ]
# ]
# ]

The above query returns Bob's favorite comments.

Search comments by posted time
In this section, we search for comments by posted time. See the Time type in data.

Let's search for comments whose posted time is not later than a specified time.

Execution example:

select Comments --filter 'last_modified<=1268802000' --output_columns posted_by.name,comment,last_modified --drilldown hash_tags,posted_by
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "posted_by.name",
# "ShortText"
# ],
# [
# "comment",
# "ShortText"
# ],
# [
# "last_modified",
# "Time"
# ]
# ],
# [
# "Alice",
# "I've created micro-blog!",
# 1268795100.0
# ],
# [
# "Bob",
# "First post. test,test...",
# 1268794800.0
# ],
# [
# "Alice",
# "@bob Welcome!!!",
# 1268795100.0
# ],
# [
# "Bob",
# "@alice Thanks!",
# 1268798400.0
# ],
# [
# "Bob",
# "I've just used 'Try-Groonga' now! #groonga",
# 1268802000.0
# ]
# ],
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "groonga",
# 1
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "alice",
# 2
# ],
# [
# "bob",
# 3
# ]
# ]
# ]
# ]

The above query returns 5 comments posted at or before 2010/03/17 14:00:00. It also
returns drilldown results by poster: there are 2 comments by Alice and 3 comments by Bob.
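Groonga's Time type stores the number of seconds since the Unix epoch, so the threshold
1268802000 in the --filter expression is an epoch timestamp. The conversion can be checked
with plain Python (assuming JST, UTC+9, which is the timezone the tutorial times appear to
use):

```python
from datetime import datetime, timezone, timedelta

jst = timezone(timedelta(hours=9))  # UTC+9

# 1268802000 seconds since the epoch, rendered in JST.
threshold = datetime.fromtimestamp(1268802000, tz=jst)
print(threshold.strftime("%Y/%m/%d %H:%M:%S"))  # → 2010/03/17 14:00:00
```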

Query expansion
Groonga accepts the query_expander parameter for the /reference/commands/select command.
It enables you to expand your query string.

For example, if a user searches for "theatre" instead of "theater", query expansion can
return the results of "theatre OR theater". This reduces missed search results and returns
what the user really wants.

Preparation
To use query expansion, you need to create a table that stores documents and a synonym
table that stores query strings and their replacement strings. In the synonym table, the
primary key represents the original string and a ShortText column represents the
replacement string.

Let's create the document table and the synonym table.

Execution example:

table_create Doc TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Doc body COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Term TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Term Doc_body COLUMN_INDEX|WITH_POSITION Doc body
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Synonym TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Synonym body COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Doc
[
{"_key": "001", "body": "Play all night in this theater."},
{"_key": "002", "body": "theatre is British spelling."}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
load --table Synonym
[
{"_key": "theater", "body": "(theater OR theatre)"},
{"_key": "theatre", "body": "(theater OR theatre)"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]

In this case, no search results are missed, because the synonym table accepts both
"theatre" and "theater" as query strings.

Search
Now let's use the prepared synonym table. First, use the select command without the
query_expander parameter.

Execution example:

select Doc --match_columns body --query "theater"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "body",
# "ShortText"
# ]
# ],
# [
# 1,
# "001",
# "Play all night in this theater."
# ]
# ]
# ]
# ]
select Doc --match_columns body --query "theatre"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "body",
# "ShortText"
# ]
# ],
# [
# 2,
# "002",
# "theatre is British spelling."
# ]
# ]
# ]
# ]

Each of the above queries returns only the record that exactly matches its query string.

Next, use the query_expander parameter with the body column of the Synonym table.

Execution example:

select Doc --match_columns body --query "theater" --query_expander Synonym.body
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "body",
# "ShortText"
# ]
# ],
# [
# 1,
# "001",
# "Play all night in this theater."
# ],
# [
# 2,
# "002",
# "theatre is British spelling."
# ]
# ]
# ]
# ]
select Doc --match_columns body --query "theatre" --query_expander Synonym.body
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "body",
# "ShortText"
# ]
# ],
# [
# 1,
# "001",
# "Play all night in this theater."
# ],
# [
# 2,
# "002",
# "theatre is British spelling."
# ]
# ]
# ]
# ]

In both cases, the query string is replaced with "(theater OR theatre)", so synonyms are
taken into account in the full text search.
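Conceptually, query expansion is a lookup-and-substitute step applied to the query string
before searching. The following Python sketch mimics that behavior with an in-memory
dictionary standing in for the Synonym table; it is an illustration of the idea, not
Groonga's implementation:

```python
# The synonym table from the preparation step, as a dictionary:
# primary key -> replacement stored in Synonym.body.
synonyms = {
    "theater": "(theater OR theatre)",
    "theatre": "(theater OR theatre)",
}

def expand_query(query):
    """Replace each word that has a synonym entry, keep the rest as is."""
    return " ".join(synonyms.get(word, word) for word in query.split())

print(expand_query("theater"))  # → (theater OR theatre)
print(expand_query("theatre"))  # → (theater OR theatre)
```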

SERVER


Server packages
The groonga package is the minimum set for the fulltext search engine. If you want to use
Groonga as a server, you can install additional preconfigured packages.

There are two packages for server use.

· groonga-httpd (nginx and HTTP protocol based server package)

· groonga-server-gqtp (/spec/gqtp protocol based server package)

There is a reason why Groonga supports not only GQTP but also two HTTP server packages.
GQTP (/spec/gqtp - Groonga Query Transfer Protocol) is designed to reduce overhead and
improve performance, but fewer client libraries support it than HTTP. As HTTP is a mature
protocol, you can take advantage of existing tools, and there are many client libraries
(see related projects for details). If you use the groonga-httpd package, you can also
take advantage of nginx functionality.

We recommend using groonga-httpd at first, because it provides full server functionality.
If you have performance issues derived from protocol overhead, consider using
groonga-server-gqtp.

NOTE:
In previous versions, there was a groonga-server-http package (a simple HTTP protocol
based server package). It is now obsolete; please use the groonga-httpd package
instead. groonga-server-http became a transitional package for groonga-httpd.

groonga-httpd
groonga-httpd is a nginx and HTTP protocol based server package.

Preconfigured setting:

┌───────────────────┬───────────────────────────────────────┐
│Item │ Default value │
├───────────────────┼───────────────────────────────────────┤
│Port number │ 10041 │
├───────────────────┼───────────────────────────────────────┤
│Access log path │ /var/log/groonga/httpd/access.log │
├───────────────────┼───────────────────────────────────────┤
│Error log path │ /var/log/groonga/http-query.log │
├───────────────────┼───────────────────────────────────────┤
│Database │ /var/lib/groonga/db/* │
├───────────────────┼───────────────────────────────────────┤
│Configuration file │ /etc/groonga/httpd/groonga-httpd.conf │
└───────────────────┴───────────────────────────────────────┘

Start HTTP server
Starting groonga HTTP server(Debian/Ubuntu/CentOS):

% sudo service groonga-httpd start

Starting groonga HTTP server(Fedora):

% sudo systemctl start groonga-httpd

Stop HTTP server
Stopping groonga HTTP server(Debian/Ubuntu/CentOS):

% sudo service groonga-httpd stop

Stopping groonga HTTP server(Fedora):

% sudo systemctl stop groonga-httpd

Restart HTTP server
Restarting groonga HTTP server(Debian/Ubuntu/CentOS):

% sudo service groonga-httpd restart

Restarting groonga HTTP server(Fedora):

% sudo systemctl restart groonga-httpd

groonga-server-gqtp
groonga-server-gqtp is a /spec/gqtp protocol based server package.

┌────────────┬───────────────────────────────────┐
│Item │ Default value │
├────────────┼───────────────────────────────────┤
│Port number │ 10043 │
├────────────┼───────────────────────────────────┤
│process-log │ /var/log/groonga/groonga-gqtp.log │
├────────────┼───────────────────────────────────┤
│query-log │ /var/log/groonga/gqtp-query.log │
├────────────┼───────────────────────────────────┤
│Database │ /var/lib/groonga/db/* │
└────────────┴───────────────────────────────────┘

Configuration file for server setting (Debian/Ubuntu):

/etc/default/groonga/groonga-server-gqtp

Configuration file for server setting (CentOS):

/etc/sysconfig/groonga-server-gqtp

Start GQTP server
Starting groonga GQTP server(Debian/Ubuntu/CentOS):

% sudo service groonga-server-gqtp start

Starting groonga GQTP server(Fedora):

% sudo systemctl start groonga-server-gqtp

Stop GQTP server
Stopping groonga GQTP server(Debian/Ubuntu/CentOS):

% sudo service groonga-server-gqtp stop

Stopping groonga GQTP server(Fedora):

% sudo systemctl stop groonga-server-gqtp

Restart GQTP server
Restarting groonga GQTP server(Debian/Ubuntu/CentOS):

% sudo service groonga-server-gqtp restart

Restarting groonga GQTP server(Fedora):

% sudo systemctl restart groonga-server-gqtp

groonga-server-http
groonga-server-http is a simple HTTP protocol based server package.

NOTE:
The groonga-server-http package has been a transitional package since Groonga 4.0.8.
Please use groonga-httpd instead.

Preconfigured setting:

┌────────────┬───────────────────────────────────┐
│Item │ Default value │
├────────────┼───────────────────────────────────┤
│Port number │ 10041 │
├────────────┼───────────────────────────────────┤
│process-log │ /var/log/groonga/groonga-http.log │
├────────────┼───────────────────────────────────┤
│query-log │ /var/log/groonga/http-query.log │
├────────────┼───────────────────────────────────┤
│Database │ /var/lib/groonga/db/* │
└────────────┴───────────────────────────────────┘

Configuration file for server setting (Debian/Ubuntu):

/etc/default/groonga/groonga-server-http

Configuration file for server setting (CentOS):

/etc/sysconfig/groonga-server-http

Start HTTP server
Starting groonga HTTP server(Debian/Ubuntu/CentOS):

% sudo service groonga-server-http start

Starting groonga HTTP server(Fedora):

% sudo systemctl start groonga-server-http

Stop HTTP server
Stopping groonga HTTP server(Debian/Ubuntu/CentOS):

% sudo service groonga-server-http stop

Stopping groonga HTTP server(Fedora):

% sudo systemctl stop groonga-server-http

Restart HTTP server
Restarting groonga HTTP server(Debian/Ubuntu/CentOS):

% sudo service groonga-server-http restart

Restarting groonga HTTP server(Fedora):

% sudo systemctl restart groonga-server-http

HTTP
Groonga provides two HTTP server implementations.

· http/groonga

· http/groonga-httpd

http/groonga is a simple implementation. It is fast but doesn't have many HTTP features.
It is convenient for trying Groonga because it requires just a few command line options to
run.

http/groonga-httpd is an nginx based implementation. It is also fast and has many HTTP
features.

Comparison
There are many differences between groonga and groonga-httpd. Here is a comparison table.

┌─────────────────────────┬────────────────────────┬──────────────────────┐
│ │ groonga │ groonga-httpd │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Performance │ o │ o │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Using multi CPU cores │ o (by multi threading) │ o (by multi process) │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Configuration file │ optional │ required │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Custom prefix path │ x │ o │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Custom command version │ o │ o │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Multi databases │ x │ o │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Authentication │ x │ o │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Gzip compression │ x │ o │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│POST │ o │ o │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│HTTPS │ x │ o │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Access log │ x │ o │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Upgrading without │ x │ o │
│downtime │ │ │
└─────────────────────────┴────────────────────────┴──────────────────────┘

Performance
Both groonga and groonga-httpd are very fast. They can achieve the same throughput.

Using multi CPU cores
Groonga scales across multiple CPU cores. groonga scales by multi-threading;
groonga-httpd scales by multi-processing.

groonga uses the same number of threads as CPU cores by default. If you have 8 CPU cores,
8 threads are used by default.

groonga-httpd uses 1 process by default. You need to set the worker_processes directive to
use multiple CPU cores. If you have 8 CPU cores, specify worker_processes 8 in the
configuration file like the following:

worker_processes 8;

http {
  # ...
}

Configuration file
groonga can work without a configuration file. All configuration items, such as the port
number and the maximum number of threads, can be specified on the command line. A
configuration file can also be used to specify them.

It's very easy to run a groonga HTTP server because groonga requires just a few options.
Here is the simplest command line that starts an HTTP server with groonga:

% groonga --protocol http -d /PATH/TO/DATABASE

groonga-httpd requires a configuration file to run. Here is the simplest configuration
file that starts an HTTP server with groonga-httpd:

events {
}

http {
  server {
    listen 10041;

    location /d/ {
      groonga on;
      groonga_database /PATH/TO/DATABASE;
    }
  }
}

Custom prefix path
groonga accepts a path that starts with /d/ as a command URL, such as
http://localhost:10041/d/status. You cannot change the prefix path /d/.

groonga-httpd can customize the prefix path. For example, you can use
http://localhost:10041/api/status as a command URL. Here is a sample configuration that
uses /api/ as the prefix path:

events {
}

http {
  server {
    listen 10041;

    location /api/ { # <- change this
      groonga on;
      groonga_database /PATH/TO/DATABASE;
    }
  }
}

Custom command version
Groonga has the /reference/command/command_version mechanism. It is for upgrading groonga
commands while keeping backward compatibility.

groonga can change the default command version with the --default-command-version option.
Here is a sample command line that uses command version 2 as the default command version:

% groonga --protocol http --default-command-version 2 -d /PATH/TO/DATABASE

groonga-httpd cannot customize the default command version yet, but it will be supported
soon. Once supported, you will be able to provide groonga commands with different command
versions in the same groonga-httpd process. Here is a sample configuration that provides
command version 1 commands under /api/1/ and command version 2 commands under /api/2/:

events {
}

http {
  server {
    listen 10041;

    groonga_database /PATH/TO/DATABASE;

    location /api/1/ {
      groonga on;
      groonga_default_command_version 1;
    }

    location /api/2/ {
      groonga on;
      groonga_default_command_version 2;
    }
  }
}

Multi databases
groonga can use only one database in a process.

groonga-httpd can use one or more databases in a process. Here is a sample configuration
that provides the /tmp/db1 database under the /db1/ path and the /tmp/db2 database under
the /db2/ path:

events {
}

http {
  server {
    listen 10041;

    location /db1/ {
      groonga on;
      groonga_database /tmp/db1;
    }

    location /db2/ {
      groonga on;
      groonga_database /tmp/db2;
    }
  }
}

Authentication
HTTP supports authentication schemes such as basic authentication and digest
authentication. They can be used to restrict the use of dangerous commands such as
/reference/commands/shutdown.

groonga doesn't support any authentication. To restrict the use of dangerous commands,
other tools such as iptables or a reverse proxy are needed.

groonga-httpd supports basic authentication. Here is a sample configuration that restricts
the use of the /reference/commands/shutdown command:

events {
}

http {
  server {
    listen 10041;

    groonga_database /PATH/TO/DATABASE;

    location /d/shutdown {
      groonga on;
      auth_basic "manager is required!";
      auth_basic_user_file "/etc/managers.htpasswd";
    }

    location /d/ {
      groonga on;
    }
  }
}

Gzip compression
HTTP supports response compression by gzip with Content-Encoding: gzip response header. It
can reduce network flow. It is useful for large search response.

groonga doesn't support compression. To support compression, reverse proxy is needed.

groonga-httpd supports gzip compression. Here is a sample configuration to compress
response by gzip:

events {
}

http {
  server {
    listen 10041;

    groonga_database /PATH/TO/DATABASE;

    location /d/ {
      groonga on;
      gzip on;
      gzip_types *;
    }
  }
}

Note that gzip_types * is specified. This is an important part of the configuration.
gzip_types specifies the data formats to be gzipped by their MIME types. groonga-httpd
returns data in JSON, XML or MessagePack format, but those formats aren't included in the
default value of gzip_types, which is text/html.

To compress response data from groonga-httpd with gzip, you need to specify gzip_types *
or gzip_types application/json text/xml application/x-msgpack explicitly. gzip_types * is
recommended, for two reasons. First, groonga may support more formats in the future.
Second, all requests for the location are processed by groonga, so you don't need to
consider other modules.

POST
You can load your data by POSTing JSON data. You need to follow these rules to load data
by POST.

· The Content-Type header value must be application/json.

· The JSON data is sent as the request body.

· The table name is specified by a query parameter such as table=NAME.

Here is an example curl command line that loads two users, alice and bob, into the Users
table:

% curl --data-binary '[{"_key": "alice"}, {"_key": "bob"}]' -H "Content-Type: application/json" "http://localhost:10041/d/load?table=Users"
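The same load request can be built with Python's standard library instead of curl. The
sketch below only constructs the request object (it is not sent), using the tutorial's
host, port and table name; checking the method and header shows the rules above are
satisfied:

```python
import json
import urllib.request

records = [{"_key": "alice"}, {"_key": "bob"}]

# Table name goes in the query string; JSON goes in the request body.
request = urllib.request.Request(
    "http://localhost:10041/d/load?table=Users",
    data=json.dumps(records).encode("utf-8"),
    headers={"Content-Type": "application/json"},  # required by Groonga
    method="POST",
)
print(request.get_method())                # → POST
print(request.get_header("Content-type"))  # → application/json
```

To actually send it, pass the request to urllib.request.urlopen() while the server is
running.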

HTTPS
TODO

Access log
TODO

Upgrading without downtime
TODO

groonga
TODO

groonga-httpd
TODO

GQTP
Summary
GQTP is the acronym standing for "Groonga Query Transfer Protocol".

GQTP is a protocol designed for Groonga. It's a stateful protocol. You can send multiple
commands in one session.

GQTP will be faster than /server/http when you send many light commands like
/reference/commands/status. GQTP will have almost the same performance as HTTP when you
send heavy commands like /reference/commands/select.

We recommend using HTTP in most cases, because there are many HTTP client libraries.

If you want to use GQTP, you can use the following libraries:

· Ruby: groonga-client

· Python: poyonga

· Go: goroo

· PHP: proonga

· C/C++: Groonga (Groonga itself can also be used as a library)

Though it's not a library, you can also use /reference/executables/groonga as a GQTP
client.

How to run
/reference/executables/groonga is a GQTP server implementation. You can run a Groonga
server by the following command line:

groonga --protocol gqtp -s [options] DB_PATH

You can run a Groonga server as a daemon by the following command line:

groonga --protocol gqtp -d [options] DB_PATH

See /reference/executables/groonga for available options.

Memcached binary protocol
Groonga supports the memcached binary protocol. The following form shows how to run
Groonga as a memcached binary protocol server daemon.

Form:

groonga [-p PORT_NUMBER] -d --protocol memcached DB_PATH

The --protocol option and its argument specify the protocol of the server. "memcached"
specifies to use the memcached binary protocol.

You don't need to create a table. When Groonga receives a request, it creates a table
automatically. The table name is Memcache.
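Because the server speaks the standard memcached binary protocol, any conforming client
can talk to it. As a low-level illustration, the sketch below builds a 24-byte memcached
binary request header with Python's struct module, following the protocol specification
(this is generic memcached framing, not Groonga-specific code):

```python
import struct

MAGIC_REQUEST = 0x80  # memcached binary protocol: request packet
OPCODE_GET = 0x00     # GET command

def build_get_request(key: bytes) -> bytes:
    """Build a memcached binary GET request: 24-byte header + key."""
    header = struct.pack(
        "!BBHBBHIIQ",
        MAGIC_REQUEST,  # magic
        OPCODE_GET,     # opcode
        len(key),       # key length
        0,              # extras length
        0,              # data type
        0,              # vbucket id
        len(key),       # total body length (extras + key + value)
        0,              # opaque (echoed back by the server)
        0,              # CAS
    )
    return header + key

packet = build_get_request(b"alice")
print(len(packet))  # → 29 (24-byte header + 5-byte key)
```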

CLIENT


Groonga supports the original protocol (/spec/gqtp), the memcached binary protocol and
HTTP.

As HTTP and the memcached binary protocol are mature protocols, you can use existing
client libraries.

There are also client libraries in several programming languages that provide a convenient
API for connecting to a Groonga server. See Client libraries for details.

REFERENCE MANUAL


Executables
This section describes the executable files provided by the groonga package.

grndb
Summary
NOTE:
This executable command is an experimental feature.

New in version 4.0.9.

grndb manages a Groonga database.

Here are features:

· Checks whether database is broken or not.

· Recovers broken database automatically if the database is recoverable.

Syntax
grndb requires a command and a database path:

grndb COMMAND [OPTIONS] DATABASE_PATH

Here are available commands:

· check - Checks whether database is broken or not.

· recover - Recovers database.

Usage
Here is an example to check the database at /var/lib/groonga/db/db:

% grndb check /var/lib/groonga/db/db

Here is an example to recover the database at /var/lib/groonga/db/db:

% grndb recover /var/lib/groonga/db/db

Commands
This section describes available commands.

check
It checks an existing Groonga database. If the database is broken, grndb reports the
reasons and exits with a non-zero exit status.

NOTE:
You must not use this command against an opened database. If the database is opened,
this command may report a wrong result.

check has some options.

--target
New in version 5.1.2.

It specifies a check target object.

If your database is large and you know which object is unreliable, this option will help
you. check needs more time for a large database. You can reduce the check time by using
the --target option to narrow the check targets.

The check target is checked recursively, because objects related to an unreliable object
may also be unreliable.

If the check target is a table, all columns of the table are also checked recursively.

If the check target is a table and its key type is another table, that table is also
checked recursively.

If the check target is a column and its value type is a table, that table is also checked
recursively.

If the check target is an index column, the table specified as the value type and all
sources are also checked recursively.

Here is an example that checks only Entries table and its columns:

% grndb check --target Entries /var/lib/groonga/db/db

Here is an example that checks only Entries.name column:

% grndb check --target Entries.name /var/lib/groonga/db/db

recover
It recovers an existing broken Groonga database.

If the database is not broken, grndb does nothing and exits with 0 exit status.

If the database is broken and only index columns are broken, grndb recovers those index
columns and exits with 0 exit status. It may take a long time for large indexed data.

If the database is broken and tables or data columns are broken, grndb reports the broken
reasons and exits with a non-zero exit status. You can find out whether the database is
recoverable or not with the check command.

NOTE:
You must not use this command against an opened database. If the database is opened,
this command may break the database.

grnslap
Name
grnslap - a tool to check the performance of the communication layer of a groonga process

Synopsis
grnslap [options] [dest]

Description
grnslap is a tool that issues highly concurrent requests to a groonga process to check its
performance.

It can send requests over both GQTP, Groonga's original protocol, and HTTP. You can also
specify the request concurrency.

Query content can be given on standard input. By feeding queries that are close to the
query patterns of your production environment, you can benchmark under near-production
conditions.

Currently, grnslap is not installed by make install.

Options
-P Specifies the request protocol.

http
Sends requests over HTTP. When HTTP paths (including GET parameters) are given
on standard input, one per line (LF-separated), they are accessed in order.

gqtp
Sends requests over GQTP. When GQTP requests are given on standard input, one
per line (LF-separated), they are issued in order.

-m Specifies the request concurrency. The default is 10.

Arguments
dest Specifies the host name and port number of the destination (default:
'localhost:10041'). If the port number is omitted, 10041 is used.

Example
Send requests to http://localhost:10041/d/status with a concurrency of 100.

> yes /d/status | head -n 100 | grnslap -P http -m 100 localhost:10041
2009-11-12 19:34:09.998696|begin: max_concurrency=100 max_tp=10000
2009-11-12 19:34:10.011208|end : n=100 min=46 max=382 avg=0 qps=7992.966190 etime=0.012511

groonga executable file
Summary
groonga executable file provides the following features:

· Fulltext search server

· Fulltext search shell

· Client for Groonga fulltext search server

Groonga can be used as a library. If you want to use Groonga as a library, you need to
write a program in C, C++ and so on. Library use is useful for embedding the fulltext
search feature into your application, but it's not easy to use.

You can use the groonga executable file to get the fulltext search feature.

If you want to try Groonga, the fulltext search shell usage is useful. You don't need any
server or client; you just need one terminal. You can try Groonga like the following:

% groonga -n db
> status
[[0,1429687763.70845,0.000115633010864258],{"alloc_count":195,...}]
> quit
%

If you want to create an application that has a fulltext search feature, the fulltext
search server usage is useful. You can use Groonga as a server like an RDBMS (Relational
DataBase Management System). The client-server model is a popular architecture.

Normally, the client usage for a Groonga fulltext search server isn't used.

Syntax
groonga executable file has the following four modes:

· Standalone mode

· Server mode

· Daemon mode

· Client mode

These modes share some common options, which are described in a later section.

Standalone mode
In standalone mode, the groonga executable file runs one or more Groonga
/reference/command against a local Groonga database.

Here is the syntax to run a shell that executes Groonga commands against a temporary
database:

groonga [options]

Here is the syntax to create a new database and run a shell that executes Groonga
commands against the new database:

groonga [options] -n DB_PATH

Here is the syntax to run a shell that executes Groonga commands against an existing database:

groonga [options] DB_PATH

Here is the syntax to run a Groonga command against an existing database and exit:

groonga [options] DB_PATH COMMAND [command arguments]

Server mode
In server mode, the groonga executable file runs as a server. The server accepts
connections from other processes on the local machine or remote machines and executes
received Groonga /reference/command against a local Groonga database.

You can choose one protocol from /server/http and /server/gqtp. Normally, HTTP is
suitable, but GQTP is the default protocol. This section describes only HTTP protocol
usage.

In server mode, the groonga executable file runs in the foreground. If you want to run a
Groonga server in the background, see Daemon mode.

Here is the syntax to run a Groonga server with a temporary database:

groonga [options] --protocol http -s

Here is the syntax to create a new database and run a Groonga server with the new database:

groonga [options] --protocol http -s -n DB_PATH

Here is the syntax to run a Groonga server with an existing database:

groonga [options] --protocol http -s DB_PATH

Daemon mode
In daemon mode, the groonga executable file runs as a daemon. A daemon is similar to a
server, but it runs in the background. See Server mode about servers.

Here is the syntax to run a Groonga daemon with a temporary database:

groonga [options] --protocol http -d

Here is the syntax to create a new database and run a Groonga daemon with the new database:

groonga [options] --protocol http -d -n DB_PATH

Here is the syntax to run a Groonga daemon with an existing database:

groonga [options] --protocol http -d DB_PATH

The --pid-path option will be useful in daemon mode.

Client mode
In client mode, the groonga executable file runs as a client for a GQTP protocol Groonga
server. Its usage is similar to Standalone mode. You can run a shell or execute one
command. You need to specify a server address instead of a local database.

Note that you cannot use the groonga executable file as a client for an HTTP protocol
Groonga server.

Here is the syntax to run a shell that executes Groonga commands against a Groonga server
running at 192.168.0.1:10043:

groonga [options] -c --host 192.168.0.1 --port 10043

Here is the syntax to run a Groonga command against a Groonga server running at
192.168.0.1:10043 and exit:

groonga [options] -c --host 192.168.0.1 --port 10043 COMMAND [command arguments]

Options
-n Creates a new database.

-c Runs groonga in client mode.

-s Runs groonga in server mode. Use "Ctrl+C" to stop the groonga process.

-d Runs groonga in daemon mode. In contrast to server mode, groonga forks into the
background in daemon mode. For example, to stop the local daemon process, use "curl
http://127.0.0.1:10041/d/shutdown".

-e, --encoding <encoding>
Specifies the encoding used for the Groonga database. This option takes effect
when you create a new Groonga database. One of the following values can be
specified: none, euc, utf8, sjis, latin or koi8r.

-l, --log-level <log level>
Specifies the log level as an integer value between 0 and 8. The meaning of each
value is:

┌──────────┬─────────────┐
│log level │ description │
├──────────┼─────────────┤
│0 │ Nothing │
├──────────┼─────────────┤
│1 │ Emergency │
├──────────┼─────────────┤
│2 │ Alert │
├──────────┼─────────────┤
│3 │ Critical │
├──────────┼─────────────┤
│4 │ Error │
├──────────┼─────────────┤
│5 │ Warning │
├──────────┼─────────────┤
│6 │ Notice │
├──────────┼─────────────┤
│7 │ Info │
├──────────┼─────────────┤
│8 │ Debug │
└──────────┴─────────────┘

-a, --address <ip/hostname>
Deprecated since version 1.2.2: Use --bind-address instead.

--bind-address <ip/hostname>
New in version 1.2.2.

Specifies the address to listen on when running in server or daemon mode. (The
default is the host name returned by hostname.)

-p, --port <port number>
Specifies the TCP port number used in client, server, or daemon mode. (The default
in client mode is 10043; in server or daemon mode, the default is 10041 for HTTP
and 10043 for GQTP.)

-i, --server-id <ip/hostname>
Specifies the address used as the server ID when running in server or daemon mode.
(The default is the host name returned by `hostname`.)

-h, --help
Outputs a help message.

--document-root <path>
Specifies the directory that stores static pages when groonga is used as an HTTP
server.

By default, files for a general-purpose database management page are installed
under /usr/share/groonga/admin_html. If you start groonga with this directory as
the value of the document-root option, you can use the web-based database
management tool by accessing http://hostname:port/index.html with a web browser.

--protocol <protocol>
Specifies either http or gqtp. (The default is gqtp.)

--log-path <path>
ログを出力するファイルのパスを指定します。(デフォルトは/var/log/groonga/groonga.logです)

--log-rotate-threshold-size <threshold>
New in version 5.0.3.

Specifies the threshold for log rotation. The log file is rotated when its size is
larger than or equal to the threshold (default: 0; disabled).

--query-log-path <path>
Specifies the path of the file to which the query log is written. (By default, no query log is written.)

--query-log-rotate-threshold-size <threshold>
New in version 5.0.3.

Specifies the threshold for query log rotation. The query log file is rotated when
its size is larger than or equal to the threshold (default: 0; disabled).

-t, --max-threads <max threads>
Specifies the maximum number of threads to use. (The default is the number of CPU cores on the machine.)

--pid-path <path>
Specifies the path of the file in which the PID is saved. (By default, it is not saved.)

--config-path <path>
Specifies the path of the configuration file. The configuration file uses the following format:

# Everything after '#' is a comment.
; Everything after ';' is also a comment.

# Options are specified as 'key = value'.
pid-path = /var/run/groonga.pid

# Whitespace around '=' is ignored. The line below means the same as the line above.
pid-path=/var/run/groonga.pid

# A key is the same as the corresponding '--XXX' style option name.
# For example, the key corresponding to '--pid-path' is 'pid-path'.
# However, the 'config-path' key is ignored.

--cache-limit <limit>
Specifies the maximum number of cache entries. (The default is 100.)

--default-match-escalation-threshold <threshold>
Specifies the threshold at which search behavior escalates. (The default is 0.)

Command line parameters
dest Specifies the path of the database to use.

In client mode, specifies the host name and port number of the server to connect to (the default is 'localhost:10043'). If the port number is omitted, 10043 is assumed.

command [args]
In standalone and client mode, the command to execute and its arguments can be given as command line arguments. If no command is given, command strings are read from standard input, one line at a time, until EOF, and executed in order.

Command
Instructions that operate on a database through the groonga command are called commands. Commands are mainly written in C and become available by loading them into the groonga process. Each command has a unique name and zero or more arguments.

Arguments can be specified in one of the following two ways:

Format 1: command_name value1 value2,..

Format 2: command_name --name1 value1 --name2 value2,..

With format 1, values must be given in the defined order and intermediate arguments cannot be omitted. With format 2, each argument name must be written explicitly as "--name", but arguments can be given in any order and intermediate arguments can be omitted.

When giving command strings on standard input, the command name, argument names, and values are separated by spaces. To specify a value that contains a space or any of the characters "'(), enclose the value in single quotes (') or double quotes ("). Inside a value, a newline character is written as '\n'. To include the quote character used for the enclosure inside the value, put a backslash ('\') before that character. To specify a backslash character itself as part of a value, put another backslash before it.
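
These quoting rules can be sketched in Python. The helper below is hypothetical (not part of groonga) and only illustrates the escaping described above:

```python
def quote_groonga_value(value):
    # Per the rules above: a backslash is doubled, the quote character is
    # prefixed with a backslash, and a newline is written as \n. The value
    # is then enclosed in double quotes.
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
    return '"%s"' % escaped

print(quote_groonga_value('say "hello"\nworld'))
```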

A command can be continued over multiple lines by ending each line with a '\' character:

table_create --name Terms \
--flags TABLE_PAT_KEY \
--key_type ShortText \
--default_tokenizer TokenBigram

Builtin command
The following commands are predefined as built-in commands.

status Shows the status of the groonga process.

table_list
Shows the list of tables defined in the database.

column_list
Shows the list of columns defined in a table.

table_create
Adds a table to the database.

column_create
Adds a column to a table.

table_remove
Removes a table defined in the database.

column_remove
Removes a column defined in a table.

load Inserts records into a table.

select Searches for records in a table and shows them.

define_selector
Defines a new search command with customized search conditions.

quit Ends the session with the database.

shutdown
Stops the server (daemon) process.

log_level
Sets the log output level.

log_put
Writes a log entry.

clearlock
Releases locks.

Usage
Create a new database:

% groonga -n /tmp/hoge.db quit
%

Define a table in an existing database:

% groonga /tmp/hoge.db table_create Table 0 ShortText
[[0]]
%

Start the server:

% groonga -d /tmp/hoge.db
%

Start as an HTTP server:

% groonga -d -p 80 --protocol http --document-root /usr/share/groonga/admin_html /tmp/hoge.db
%

Connect to the server and show the table list:

% groonga -c localhost table_list
[[0],[["id","name","path","flags","domain"],[256,"Table","/tmp/hoge.db.0000100",49152,14]]]
%

groonga-benchmark
Name
groonga-benchmark - groonga benchmark program

Synopsis
groonga-benchmark [options...] [script] [db]

Description
groonga-benchmark is a general-purpose benchmark tool for groonga.

It can verify the behavior and measure the execution speed of groonga, both when groonga is used as a standalone process and when it is used as a server program.

Data files for groonga-benchmark can be created by hand, or existing ones can be used. Existing data files are downloaded from ftp.groonga.org as needed. Therefore, in an environment where groonga and groonga-benchmark run and an Internet connection is available, the behavior of groonga can be verified without knowledge of groonga commands.

It currently runs on Linux and Windows. It is not installed by make install.

Options
-i, --host <ip/hostname>
Specifies the groonga server to connect to, by IP address or host name. Note that the connection fails if no groonga server is running at the specified destination. If this option is not specified, groonga-benchmark automatically starts a groonga server on localhost and connects to it.

-p, --port <port number>
Specifies the port number used by the automatically started groonga server, or by the explicitly specified groonga server. Note that the connection fails if the port used by the target groonga server differs from the port number specified with this option.

--dir Shows the script files available on ftp.groonga.org.

--ftp Communicates with ftp.groonga.org over FTP to synchronize script files and send log files.

--log-output-dir
By default, the log file produced when groonga-benchmark finishes is written to the current directory. This option changes the output destination to an arbitrary directory.

--groonga <groonga_path>
Specifies the path of the groonga command. By default, the groonga command is searched for in PATH.

--protocol <gqtp|http>
Specifies gqtp or http as the protocol the groonga command uses.

Arguments
script A text file that describes how groonga-benchmark should run (the instructions are called groonga-benchmark directives below). Its extension is .scr.

db The groonga database used by groonga-benchmark. If the specified database does not exist, groonga-benchmark creates it. The database specified by this argument is also used when a groonga server is started automatically. Note that when the groonga server to connect to is specified explicitly, the database used is the one in use by that server.

Usage
First, in a shell (or the command prompt on Windows), type:

groonga-benchmark test.scr any_db_name

If groonga-benchmark works normally, a file named:

test-username-number.log

should be created. If it is not created, see the "Troubleshooting" section of this document.

Script file
A script file is a text file that contains groonga-benchmark directives.
Multiple groonga-benchmark directives can be written on one line, separated by ";" semicolons. When a line contains multiple groonga-benchmark directives, they are executed in parallel.
Lines beginning with "#" are treated as comments.

groonga-benchmark directives
The following 11 groonga-benchmark directives are currently supported.
do_local command_file [threads] [repeats]
Executes the command file with groonga-benchmark alone. If a thread count is specified, multiple threads execute the same command file simultaneously. If a repeat count is specified, the contents of the command file are executed repeatedly. Both the thread count and the repeat count default to 1 when omitted. To run the file multiple times with one thread, specify do_local command_file 1 [repeats] explicitly.

do_gqtp command_file [threads] [repeats]
Executes the command file on a groonga server via GQTP. The thread count and repeat count have the same meaning as for do_local.

do_http command_file [threads] [repeats]
Executes the command file on a groonga server via HTTP. The thread count and repeat count have the same meaning as for do_local.

rep_local command_file [threads] [repeats]
Executes the command file with groonga-benchmark alone and reports in more detail.

rep_gqtp command_file [threads] [repeats]
Executes the command file on a groonga server via GQTP and reports in more detail.
The thread count and repeat count have the same meaning as for do_local.

rep_http command_file [threads] [repeats]
Executes the command file on a groonga server via HTTP and reports in more detail.
The thread count and repeat count have the same meaning as for do_local.

out_local command_file output_file
Executes the command file with groonga-benchmark alone and writes the results of all commands to the "output file". The results are used by the test_local and test_gqtp directives. Note that this "output file" is different from the log created automatically when groonga-benchmark runs. Apart from comments being allowed, this directive is the same as:

groonga < command_file > output_file

out_gqtp command_file output_file
Executes the command file on a groonga server via GQTP. Otherwise the same as the out_local directive.

out_http command_file output_file
Executes the command file on a groonga server via HTTP. Otherwise the same as the out_local directive.

test_local command_file input_file
Executes the command file with groonga-benchmark alone and compares the result of each command with the input file. If there is a difference other than non-essential elements such as processing time, the difference is written to a file named input_file.diff.

Command file
A command file is a text file that contains one built-in groonga command per line. There is no restriction on its extension. For the built-in groonga commands, see /reference/command.

Sample
Here is a sample script file:

# sample script
rep_local test.ddl
do_local test.load;
do_gqtp test.select 10 10; do_local test.status 10

It means the following.

Line 1 A comment line.

Line 2 Executes the command file test.ddl with groonga alone and reports in detail.

Line 3 Executes the command file test.load with groonga alone. (The trailing ";" semicolon is required when writing multiple groonga-benchmark directives on one line; as in this example, it is also harmless when executing a single groonga-benchmark directive.)

Line 4 Executes the command file test.select on a groonga server with 10 concurrent threads. Each thread repeats the contents of test.select 10 times. At the same time, executes the command file test.status with groonga alone in 10 threads.

Special directives
Special commands can be embedded in the comment lines of a script file. The following two special directives are currently supported.

#SET_HOST <ip/hostname>
Equivalent to the -i, --host option. SET_HOST takes precedence both when the IP address/host name specified on the command line differs from the one specified with SET_HOST and when no command line option is specified. As with the command line option, the server is not started automatically when SET_HOST is used.

#SET_PORT <port number>
Equivalent to the -p, --port option. SET_PORT takes precedence both when the port number specified on the command line differs from the one specified with SET_PORT and when no command line option is specified.

Special directives can be written anywhere in the script file. When the same special directive appears multiple times in one file, the "last" one takes effect.

For example, even if a port is specified on the command line like this:

$ ./groonga-benchmark --port 20010 test.scr testdb

if the contents of test.scr are:

#SET_PORT 10900
rep_local test.ddl
do_local test.load;
rep_gqtp test.select 10 10; rep_local test.status 10
#SET_PORT 10400

the automatically started groonga server uses port number 10400.

groonga-benchmark results
When groonga-benchmark finishes normally, a log file named script_name-username-start_time.log (the script name without its extension) is created in the current directory. The log file is automatically sent to ftp.groonga.org. The log file is JSON-formatted text like the following.

[{"script": "test.scr",
"user": "homepage",
"date": "2010-04-14 22:47:04",
"CPU": "Intel(R) Pentium(R) 4 CPU 2.80GHz",
"BIT": 32,
"CORE": 1,
"RAM": "975MBytes",
"HDD": "257662232KBytes",
"OS": "Linux 2.4.20-24.7-i686",
"HOST": "localhost",
"PORT": "10041",
"VERSION": "0.1.8-100-ga54c5f8"
},
{"jobs": "rep_local test.ddl",
"detail": [
[0, "table_create res_table --key_type ShortText", 1490, 3086, [0,1271252824.25846,0.00144
7]],
[0, "column_create res_table res_column --type Text", 3137, 5956, [0,1271252824.2601,0.002
741]],
[0, "column_create res_table user_column --type Text", 6020, 8935, [0,1271252824.26298,0.0
02841]],
[0, "column_create res_table mail_column --type Text", 8990, 11925, [0,1271252824.26595,0.
002861]],
[0, "column_create res_table time_column --type Time", 12008, 13192, [0,1271252824.26897,0
.001147]],
[0, "status", 13214, 13277, [0,1271252824.27018,3.0e-05]],
[0, "table_create thread_table --key_type ShortText", 13289, 14541, [0,1271252824.27025,0.
001213]],
[0, "column_create thread_table thread_title_column --type ShortText", 14570, 17380, [0,12
71252824.27153,0.002741]],
[0, "status", 17435, 17480, [0,1271252824.2744,2.7e-05]],
[0, "table_create lexicon_table --flags 129 --key_type ShortText --default_tokenizer Token
Bigram", 17491, 18970, [0,1271252824.27446,0.001431]],
[0, "column_create lexicon_table inv_res_column 514 res_table res_column ", 18998, 33248,
[0,1271252824.27596,0.01418]],
[0, "column_create lexicon_table inv_thread_column 514 thread_table thread_title_column ",
33285, 48472, [0,1271252824.29025,0.015119]],
[0, "status", 48509, 48554, [0,1271252824.30547,2.7e-05]]],
"summary" :[{"job": "rep_local test.ddl", "latency": 48607, "self": 47719, "qps": 272.4281
73, "min": 45, "max": 15187, "queries": 13}]},
{"jobs": "do_local test.load; ",
"summary" :[{"job": "do_local test.load", "latency": 68693, "self": 19801, "qps": 1010.049
997, "min": 202, "max": 5453, "queries": 20}]},
{"jobs": "do_gqtp test.select 10 10; do_local test.status 10",
"summary" :[{"job": " do_local test.status 10", "latency": 805990, "self": 737014, "qps":
54.273053, "min": 24, "max": 218, "queries": 40},{"job": "do_gqtp test.select 10 10", "lat
ency": 831495, "self": 762519, "qps": 1967.164097, "min": 73, "max": 135631, "queries": 15
00}]},
{"total": 915408, "qps": 1718.359464, "queries": 1573}]

Limitations
· A script file line may contain multiple groonga-benchmark directives, but the total number of threads is limited to a maximum of 64.

· A groonga command in a command file may be at most 5000000 bytes long.

Troubleshooting
If groonga-benchmark does not work properly, first check the following.

· Are you connected to the Internet? When the --ftp option is specified, groonga-benchmark communicates with ftp.groonga.org on every run. If ftp.groonga.org is not reachable, groonga-benchmark does not work properly.

· Is a groonga server already running? Unless a server is specified explicitly with the -i, --host option, groonga-benchmark automatically starts a groonga server on localhost. If a groonga server is already running, groonga-benchmark may not work properly.

· Is the specified DB appropriate? groonga-benchmark does not check the contents of the DB given as an argument. If the specified DB does not exist, it is created automatically, but if the file exists, groonga-benchmark keeps running regardless of its contents and the results may be invalid.

If none of the above is the cause, the problem is in groonga-benchmark or groonga. Please report it.

groonga-httpd
Summary
groonga-httpd is a program to communicate with a Groonga server using the HTTP protocol.
It provides the same functionality as groonga-server-http. While groonga-server-http
supports HTTP only through a minimal built-in HTTP server, groonga-httpd fully supports
HTTP through an embedded nginx. All of the standards compliance and features provided by
nginx are also available in groonga-httpd.

groonga-httpd has a Web-based administration tool implemented with HTML and JavaScript.
You can access it at http://hostname:port/.

Synopsis
groonga-httpd [nginx options]

Usage
Set up
First, you'll need to edit the groonga-httpd configuration file to specify a database.
Edit /etc/groonga/httpd/groonga-httpd.conf to enable the groonga_database directive like
this:

# Match this to the file owner of groonga database files if groonga-httpd is
# run as root.
#user groonga;
...
http {
...
# Don't change the location; currently only /d/ is supported.
location /d/ {
groonga on; # <= This means to turn on groonga-httpd.

# Specify an actual database and enable this.
groonga_database /var/lib/groonga/db/db;
}
...
}

Then, run groonga-httpd. Note that control returns to the console immediately, because
groonga-httpd runs as a daemon process by default.:

% groonga-httpd

Request queries
To check, request a simple query (/reference/commands/status).

Execution example:

% curl http://localhost:10041/d/status
[
[
0,
1337566253.89858,
0.000355720520019531
],
{
"uptime": 0,
"max_command_version": 2,
"n_queries": 0,
"cache_hit_rate": 0.0,
"version": "4.0.1",
"alloc_count": 161,
"command_version": 1,
"starttime": 1395806036,
"default_command_version": 1
}
]

Loading data by POST
You can load data by POSTing JSON data.

Here is an example curl command line that loads two users, alice and bob, into the Users table:

% curl --data-binary '[{"_key": "alice"}, {"_key": "bob"}]' -H "Content-Type: application/json" "http://localhost:10041/d/load?table=Users"

If you load users from a JSON file, prepare the JSON file like this:

[
{"_key": "alice"},
{"_key": "bob"}
]

Then specify the JSON file on the curl command line:

% curl -X POST 'http://localhost:10041/d/load?table=Users' -H 'Content-Type: application/json' -d @users.json
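
The same load request can be composed with Python's standard library. This sketch only builds the request object (the host, port, and table name follow the examples above); it does not send anything:

```python
import json
from urllib.request import Request

# Records to load, matching the users.json example above.
records = [{"_key": "alice"}, {"_key": "bob"}]

request = Request(
    "http://localhost:10041/d/load?table=Users",
    data=json.dumps(records).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# urllib switches to the POST method automatically when a body is given.
print(request.get_method())
```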

Browse the administration tool
Also, you can browse the Web-based administration tool at http://localhost:10041/.

Shut down
Finally, to terminate the running groonga-httpd daemon, run this:

% groonga-httpd -s stop

Configuration directives
This section describes only important directives. They are groonga-httpd specific
directives and performance related directives.

The following directives can be used in the groonga-httpd configuration file. By default,
it's located at /etc/groonga/httpd/groonga-httpd.conf.

Groonga-httpd specific directives
The following directives aren't provided by nginx. They are provided by groonga-httpd to
configure groonga-httpd specific configurations.

groonga
Synopsis:

groonga on | off;

Default
groonga off;

Context
location

Specifies whether Groonga is enabled in the location block. The default is off. You need
to specify on to enable groonga.

Examples:

location /d/ {
groonga on; # Enables groonga under /d/... path
}

location /d/ {
groonga off; # Disables groonga under /d/... path
}

groonga_database
Synopsis:

groonga_database /path/to/groonga/database;

Default
groonga_database /usr/local/var/lib/groonga/db/db;

Context
http, server, location

Specifies the path to a Groonga database. This directive is required.

groonga_database_auto_create
Synopsis:

groonga_database_auto_create on | off;

Default
groonga_database_auto_create on;

Context
http, server, location

Specifies whether the Groonga database is created automatically. If the value is on and
the Groonga database specified by groonga_database doesn't exist, the Groonga database is
created automatically. If the Groonga database exists, groonga-httpd does nothing.

If the parent directory doesn't exist, it is also created recursively.

The default value is on. Normally, the value doesn't need to be changed.

groonga_base_path
Synopsis:

groonga_base_path /d/;

Default
The same value as location name.

Context
location

Specifies the base path in the URI. Groonga uses the /d/command?parameter1=value1&... path
form to run a command. This form is used by groonga-httpd, but groonga-httpd also supports
the /other-prefix/command?parameter1=value1&... form. To support that form, groonga-httpd
removes the base path from the head of the request URI and prepends /d/ to the processed
request URI. With this path conversion, users can use a custom path prefix while Groonga
always sees the /d/command?parameter1=value1&... form.

Normally, this directive isn't needed. It is needed only for per-command configuration.

Here is an example configuration to add authorization to /reference/commands/shutdown
command:

groonga_database /var/lib/groonga/db/db;

location /d/shutdown {
groonga on;
# groonga_base_path is needed.
# Because /d/shutdown is handled as the base path.
# Without this configuration, /d/shutdown/shutdown path is required
# to run shutdown command.
groonga_base_path /d/;
auth_basic "manager is required!";
auth_basic_user_file "/etc/managers.htpasswd";
}

location /d/ {
groonga on;
# groonga_base_path isn't needed.
# Because location name is the base path.
}

groonga_log_path
Synopsis:

groonga_log_path path | off;

Default
/var/log/groonga/httpd/groonga.log

Context
http, server, location

Specifies the Groonga log path in the http, server or location block. The default is
/var/log/groonga/httpd/groonga.log. You can disable logging by specifying off.

Examples:

location /d/ {
groonga on;
# You can disable log for groonga.
groonga_log_path off;
}

groonga_log_level
Synopsis:

groonga_log_level none | emergency | alert | critical | error | warning | notice | info | debug | dump;

Default
notice

Context
http, server, location

Specifies Groonga log level in the http, server or location block. The default is notice.
You can disable logging by specifying none as log level.

Examples:

location /d/ {
groonga on;
# You can customize log level for groonga.
groonga_log_level notice;
}

groonga_query_log_path
Synopsis:

groonga_query_log_path path | off;

Default
/var/log/groonga/httpd/groonga-query.log

Context
http, server, location

Specifies Groonga's query log path in the http, server or location block. The default is
/var/log/groonga/httpd/groonga-query.log. You can disable logging by specifying off.

Examples:

location /d/ {
groonga on;
# You can disable query log for groonga.
groonga_query_log_path off;
}

Query log is useful for the following cases:

· Detecting slow query.

· Debugging.

You can analyze your query log by groonga-query-log package. The package provides useful
tools.

For example, there is a tool that analyzes your query log; it can detect slow queries.
There is also a tool that replays the same queries from your query log; it can be used to
test a new Groonga version before updating the production environment.

Performance related directives
The following directives are related to the performance of groonga-httpd.

worker_processes
For optimum performance, set this to be equal to the number of CPUs or cores. In many
cases, Groonga queries may be CPU-intensive work, so to fully utilize multi-CPU/core
systems, it's essential to set this accordingly.

This isn't a groonga-httpd specific directive, but an nginx's one. For details, see
http://wiki.nginx.org/CoreModule#worker_processes.

By default, this is set to 1. It is nginx's default.

groonga_cache_limit
This directive is introduced to customize cache limit for each worker process.

Synopsis:

groonga_cache_limit CACHE_LIMIT;

Default
100

Context
http, server, location

Specifies Groonga's query cache limit in the http, server or location block. The default
value is 100. You can disable the query cache by explicitly specifying 0 for
groonga_cache_limit.

Examples:

location /d/ {
groonga on;
# You can customize query cache limit for groonga.
groonga_cache_limit 100;
}

proxy_cache
In short, you can use nginx's reverse proxy and cache mechanism instead of Groonga's
built-in query cache feature.

Query cache
Groonga has a query cache feature for the /reference/commands/select command. The feature
improves performance in many cases.

The query cache feature works well with groonga-httpd unless you use the
/reference/commands/cache_limit command with 2 or more workers. Normally, the
/reference/commands/cache_limit command isn't used, so there is no problem in most cases.

Here is a description of the problem with using the /reference/commands/cache_limit
command with 2 or more workers.

Groonga's query cache is only available within the same process. This means that workers
can't share the cache. If you don't change the cache size, this isn't a big problem. If
you want to change the cache size with the /reference/commands/cache_limit command, there
is a problem:

There is no portable way to change the cache size for all workers.

For example, there are 3 workers:

+-- worker 1
client -- groonga-httpd (master) --+-- worker 2
+-- worker 3

The client requests /reference/commands/cache_limit command and the worker 1 receives it:

+-> worker 1 (changed!)
client -> groonga-httpd (master) --+-- worker 2
+-- worker 3

The client requests /reference/commands/cache_limit command again and the worker 1
receives it again:

+-> worker 1 (changed again!!!)
client -> groonga-httpd (master) --+-- worker 2
+-- worker 3

In this case, worker 2 and worker 3 don't receive any requests, so they don't change
their cache size.

You can't choose which worker handles a request. So you can't change the cache sizes of
all workers with the /reference/commands/cache_limit command.

Reverse proxy and cache
You can use nginx's reverse proxy and cache feature for query cache:

+-- worker 1
client -- groonga-httpd (master) -- reverse proxy + cache --+-- worker 2
+-- worker 3

You can use the same cache configuration for all workers but you can't change cache
configuration dynamically by HTTP.

Here is a sample configuration:

...
http {
proxy_cache_path /var/cache/groonga-httpd levels=1:2 keys_zone=groonga:10m;
proxy_cache_valid 10m;
...
# Reverse proxy and cache
server {
listen 10041;
...
# Only select command
location /d/select {
# Pass through groonga with cache
proxy_cache groonga;
proxy_pass http://localhost:20041;
}

location / {
# Pass through groonga
proxy_pass http://localhost:20041;
}
}

# groonga
server {
listen 20041;
location /d/ {
groonga on;
groonga_database /var/lib/groonga/db/db;
}
}
...
}

See the following nginx documentations for parameter details:

· http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_path

· http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_valid

· http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache

· http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_pass

Note that you need to remove cache files created by nginx by hand after you load new data
to Groonga. For the above sample configuration, run the following command to remove cache
files:

% groonga DB_PATH < load.grn
% rm -rf /var/cache/groonga-httpd/*

If you use Groonga's query cache feature, you don't need to expire cache by hand. It is
done automatically.

Available nginx modules
All standard HTTP modules are available. HttpRewriteModule is disabled when you don't have
PCRE (Perl Compatible Regular Expressions). For the list of standard HTTP modules, see
http://wiki.nginx.org/Modules.

Groonga HTTP server
Name
Groonga HTTP server

Synopsis
groonga -d --protocol http DB_PATH

Summary
You can communicate over HTTP if you specify http for the --protocol option. If you
specify a static page path with --document-root, the file under that path corresponding
to the URI of an HTTP request is returned.

Groonga has a Web-based administration tool implemented with HTML and JavaScript. If you
don't specify --document-root, the path where the administration tool is installed is
used, so you can use the administration tool by accessing http://HOSTNAME:PORT/ in a Web
browser.

Command
A Groonga server started with http accepts the same commands as Groonga started in the
other modes.

A command takes arguments. Each argument has a name, and there are two special arguments:
output_type and command_version.

In standalone mode or client mode, a command is specified by the following format.
Format 1: COMMAND_NAME VALUE1 VALUE2,..

Format 2: COMMAND_NAME --PARAMETER_NAME1 VALUE1 --PARAMETER_NAME2 VALUE2,..

Format 1 and Format 2 can be mixed. The output type is specified by output_type in these
formats.

In HTTP server mode, a command is specified in the following format:

Format: /d/COMMAND_NAME.OUTPUT_TYPE?ARGUMENT_NAME1=VALUE1&ARGUMENT_NAME2=VALUE2&...

Note that command names, argument names, and values must be URL-encoded.

Only the GET method can be used.

You can specify JSON, TSV or XML as the output type.
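
As a rough illustration, the URL-encoding step can be done with Python's standard library. The helper name, table, and query below are illustrative, not part of Groonga:

```python
from urllib.parse import urlencode

def groonga_url(host, port, command, output_type, **arguments):
    # Percent-encode argument names and values, then build the
    # /d/COMMAND_NAME.OUTPUT_TYPE?... form described above.
    return "http://%s:%d/d/%s.%s?%s" % (
        host, port, command, output_type, urlencode(arguments))

url = groonga_url("localhost", 10041, "select", "json",
                  table="Site", query="title:@groonga")
print(url)
```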

command_version is specified for command specification compatibility. See
/reference/command/command_version for details.

Return value
The command outputs its execution result in the specified output type.

groonga-suggest-create-dataset
NAME
groonga-suggest-create-dataset - Defines schema for a suggestion dataset

SYNOPSIS
groonga-suggest-create-dataset [options] DATABASE DATASET

DESCRIPTION
groonga-suggest-create-dataset creates a dataset for /reference/suggest. A database has
many datasets. This command just defines schema for a suggestion dataset.

This command generates some tables and columns for /reference/suggest.

Here is the list of such tables. If you specify 'query' as the dataset name, the
'_DATASET' suffix below is replaced with it. Thus, the 'item_query', 'pair_query',
'sequence_query' and 'event_query' tables are generated.

· event_type

· bigram

· kana

· item_DATASET

· pair_DATASET

· sequence_DATASET

· event_DATASET

· configuration
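
The substitution of the dataset name into the '_DATASET' suffix can be illustrated with a short Python sketch:

```python
# Dataset-specific table names are derived by replacing the DATASET
# placeholder with the dataset name, "query" in this example.
dataset = "query"
templates = ["item_DATASET", "pair_DATASET", "sequence_DATASET", "event_DATASET"]
tables = [name.replace("DATASET", dataset) for name in templates]
print(tables)
```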

OPTIONS
None.

EXIT STATUS
TODO

FILES
TODO

EXAMPLE
TODO

SEE ALSO
/reference/suggest groonga-suggest-httpd groonga-suggest-learner

groonga-suggest-httpd
Summary
groonga-suggest-httpd is a program that provides an interface which accepts HTTP
requests, returns suggestion results for a dataset, and saves logs for learning.
groonga-suggest-httpd behaves similarly in terms of suggestion functionality, but the
parameter names are different.

Synopsis
groonga-suggest-httpd [options] database_path

Usage
Set up
First, you need to set up a database for suggestion.

Execution example:

% groonga-suggest-create-dataset /tmp/groonga-databases/groonga-suggest-httpd query

Launch groonga-suggest-httpd
Execute groonga-suggest-httpd command:

Execution example:

% groonga-suggest-httpd /tmp/groonga-databases/groonga-suggest-httpd

After executing the above command, groonga-suggest-httpd accepts HTTP requests on port 8080.

If you just want to save requests into a log file, use the -l option.

Here is an example that saves log files under the logs directory, with the log prefix for each file:

% groonga-suggest-httpd -l logs/log /tmp/groonga-databases/groonga-suggest-httpd

Under the logs directory, log files such as logYYYYmmddHHMMSS-00 are created.

Request to groonga-suggest-httpd
Here are sample requests that make groonga learn, for the query dataset:

% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=92619&t=complete&q=g'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=93850&t=complete&q=gr'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=94293&t=complete&q=gro'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=94734&t=complete&q=groo'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=95147&t=complete&q=grooon'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=95553&t=complete&q=groonga'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=95959&t=submit&q=groonga'
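
A learning request like the ones above can be assembled with Python's standard library. The s parameter is the current time in milliseconds; the other values are illustrative:

```python
import time
from urllib.parse import urlencode

params = {
    "i": "127.0.0.1",              # unique ID to distinguish the user
    "l": "query",                  # dataset name for learning
    "s": int(time.time() * 1000),  # elapsed time since the epoch, in ms
    "t": "complete",               # type of query
    "q": "groo",                   # what the user has typed so far
}
url = "http://localhost:8080/?" + urlencode(params)
print(url)
```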

Options
-p, --port
Specify the HTTP server port number. The default value is 8080.

-t, --n-threads
Specify the number of threads. The default value is 8. This option accepts up to 128
as the maximum value, but use the number of CPU cores for best performance.

-s, --send-endpoint
Specify endpoint for sender.

-r, --receive-endpoint
Specify endpoint for receiver.

-l, --log-base-path
Specify path prefix of log.

--n-lines-per-log-file
Specify the number of lines in a log file. The default value is 1,000,000.

-d, --daemon
Specify this option to daemonize.

--disable-max-fd-check
Specify this option to disable checking max fd on start.

Command line parameters
There is one required parameter - database_path.

database_path
Specifies the path to a Groonga database. This database must be created by the
groonga-suggest-create-dataset command, because that command performs the initialization
required for suggestion.

GET parameters
groonga-suggest-httpd accepts the following GET parameters.

Some parameters are required, depending on the type of query.

Required parameters
┌────┬──────────────────────────┬──────┐
│Key │ Description │ Note │
├────┼──────────────────────────┼──────┤
│q │ UTF-8 encoded string │ │
│ │ which user fills in form │ │
├────┼──────────────────────────┼──────┤
│t │ The type of query. The │ │
│ │ value of type must be │ │
│ │ complete, correct, │ │
│ │ suggest or submit. It │ │
│ │ also accepts multiple │ │
│ │ types of query │ │
│ │ concatenated by |. Note │ │
│ │ that submit is an │ │
│ │ invalid value when you │ │
│ │ specify multiple types │ │
│ │ of query. │ │
└────┴──────────────────────────┴──────┘

Required parameters for learning
┌────┬──────────────────────────┬──────────────────────────┐
│Key │ Description │ Note │
├────┼──────────────────────────┼──────────────────────────┤
│s │ Elapsed time from 0:00 │ Note that you need to │
│ │ January 1, 1970 │ specify the value of s │
│ │ │ in milliseconds │
├────┼──────────────────────────┼──────────────────────────┤
│i │ Unique ID to distinguish │ Use session ID or IP │
│ │ the user │ address for example │
├────┼──────────────────────────┼──────────────────────────┤
│l │ The name of the dataset │ Note that the dataset │
│ │ for learning. It also │ name must match the │
│ │ accepts multiple dataset │ regular expression │
│ │ names concatenated by | │ [A-Za-z ][A-Za-z0-9 ]{0,15} │
└────┴──────────────────────────┴──────────────────────────┘

Required parameters for suggestion
┌────┬──────────────────────────┬──────────────────────────┐
│Key │ Description │ Note │
├────┼──────────────────────────┼──────────────────────────┤
│n │ Specify the name of │ This dataset name is │
│ │ dataset for suggestion │ used to calculate │
│ │ │ suggestion results │
└────┴──────────────────────────┴──────────────────────────┘

Optional parameter
┌─────────┬──────────────────────────┬──────────────────────────┐
│Key │ Description │ Note │
├─────────┼──────────────────────────┼──────────────────────────┤
│callback │ Specify the name of a │ The name of the function │
│ │ function if you prefer │ must match the regular │
│ │ JSONP as response format │ expression │
│ │ │ [A-Za-z ][A-Za-z0-9 ]{0,15} │
└─────────┴──────────────────────────┴──────────────────────────┘

Return value
groonga-suggest-httpd command returns following response in JSON or JSONP format.

In JSON format:

{TYPE: [[CANDIDATE_1, SCORE_1], [CANDIDATE_2, SCORE_2], ... [CANDIDATE_N, SCORE_N]]}

In JSONP format:

FUNCTION({TYPE: [[CANDIDATE_1, SCORE_1], [CANDIDATE_2, SCORE_2], ... [CANDIDATE_N, SCORE_N]]})

TYPE
One of complete, correct and suggest.

CANDIDATE_N
The string of candidate (UTF-8).

SCORE_N
The score.
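
Decoding such a response is straightforward with a JSON parser. Here is a minimal Python sketch with a hand-written sample body (the candidates and scores are made up for illustration):

```python
import json

# A sample completion response in the JSON format described above.
body = '{"complete": [["groonga", 100], ["groonga-httpd", 10]]}'

response = json.loads(body)
for candidate, score in response["complete"]:
    print(candidate, score)
```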

groonga-suggest-learner
Summary
groonga-suggest-learner is a program that learns suggestion data from data sent by
groonga-suggest-httpd. Usually, it is used together with groonga-suggest-httpd, but it
can also be launched standalone. In that case, groonga-suggest-learner loads data from a
log directory.

Synopsis
groonga-suggest-learner [options] database_path

Usage
groonga-suggest-learner supports two ways of learning data. One is learning data from
groonga-suggest-httpd, the other is learning data from already existing log files.

Learning data from groonga-suggest-httpd
Execute groonga-suggest-learner:

groonga-suggest-learner testdb/db

Learning data from log files
Execute groonga-suggest-learner with the -l option.

Here is a sample that loads log data under the logs directory:

groonga-suggest-learner -l logs testdb/db

Options
-r <endpoint>, --receive-endpoint <endpoint>
Uses <endpoint> as the receiver endpoint.

-s <endpoint>, --send-endpoint <endpoint>
Uses <endpoint> as the sender endpoint.

-d, --daemon
Runs as a daemon.

-l <directory>, --log-base-path <directory>
Reads logs from <directory>.

--log-path <path>
Outputs log to <path>.

--log-level <level>
Uses <level> for log level. <level> must be between 1 and 9. Larger level outputs
more logs.

Parameters
There is one required parameter - database_path.

database_path
Specifies the path to a groonga database.

Related tables
Here is the list of tables in which learned data is stored. If you specify query as the
dataset name, the _DATASET suffix below is replaced with it. Thus, the event_query table
is used.

· event_DATASET
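
The suffix substitution can be sketched like this (the helper name is hypothetical):

```python
# Build the learned-data table name for a dataset by replacing the
# _DATASET suffix; e.g. dataset "query" yields the event_query table.
def learned_table_name(template, dataset):
    return template.replace("_DATASET", "_" + dataset)

print(learned_table_name("event_DATASET", "query"))
```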

Output
Groonga supports the following output format types:

· JSON

· XML

· TSV (Tab Separated Values)

· MessagePack

JSON is the default output format.

Usage
Groonga has the following query interfaces:

· command line

· HTTP

They provide different ways to change the output format type.

Command line
You can use the command line query interface by running groonga DB_PATH or groonga -c.
These groonga commands show a > prompt. In this query interface, you can specify the
output format type with the output_type option.

If you don't specify the output_type option, you will get a result in JSON format:

> status
[[0,1327721628.10738,0.000131845474243164],{"alloc_count":142,"starttime":1327721626,"uptime":2,"version":"1.2.9-92-gb87d9f8","n_queries":0,"cache_hit_rate":0.0,"command_version":1,"default_command_version":1,"max_command_version":2}]

You can specify json as the output_type value to get a result in JSON format explicitly:

> status --output_type json
[[0,1327721639.08321,7.93933868408203e-05],{"alloc_count":144,"starttime":1327721626,"uptime":13,"version":"1.2.9-92-gb87d9f8","n_queries":0,"cache_hit_rate":0.0,"command_version":1,"default_command_version":1,"max_command_version":2}]

Specify xml as the output_type value to get a result in XML format:

> status --output_type xml
<?xml version="1.0" encoding="utf-8"?>
<RESULT CODE="0" UP="1327721649.61095" ELAPSED="0.000126361846923828">
<RESULT>
<TEXT>alloc_count</TEXT>
<INT>146</INT>
<TEXT>starttime</TEXT>
<INT>1327721626</INT>
<TEXT>uptime</TEXT>
<INT>23</INT>
<TEXT>version</TEXT>
<TEXT>1.2.9-92-gb87d9f8</TEXT>
<TEXT>n_queries</TEXT>
<INT>0</INT>
<TEXT>cache_hit_rate</TEXT>
<FLOAT>0.0</FLOAT>
<TEXT>command_version</TEXT>
<INT>1</INT>
<TEXT>default_command_version</TEXT>
<INT>1</INT>
<TEXT>max_command_version</TEXT>
<INT>2</INT></RESULT>
</RESULT>

Specify tsv as the output_type value to get a result in TSV format:

> status --output_type tsv
0 1327721664.82675 0.000113964080810547
"alloc_count" 146
"starttime" 1327721626
"uptime" 38
"version" "1.2.9-92-gb87d9f8"
"n_queries" 0
"cache_hit_rate" 0.0
"command_version" 1
"default_command_version" 1
"max_command_version" 2
END

Specify msgpack as the output_type value to get a result in MessagePack format:

> status --output_type msgpack
(... omitted because MessagePack is binary data format. ...)

HTTP
You can use the HTTP query interface by running groonga --protocol http -s DB_PATH. The
Groonga HTTP server listens on port 10041 by default. In this query interface, you specify
the output format type by extension.

If you don't specify an extension, you will get a result in JSON format:

% curl http://localhost:10041/d/status
[[0,1327809294.54311,0.00082087516784668],{"alloc_count":155,"starttime":1327809282,"uptime":12,"version":"1.2.9-92-gb87d9f8","n_queries":0,"cache_hit_rate":0.0,"command_version":1,"default_command_version":1,"max_command_version":2}]

You can specify json as the extension to get a result in JSON format explicitly:

% curl http://localhost:10041/d/status.json
[[0,1327809319.01929,9.5367431640625e-05],{"alloc_count":157,"starttime":1327809282,"uptime":37,"version":"1.2.9-92-gb87d9f8","n_queries":0,"cache_hit_rate":0.0,"command_version":1,"default_command_version":1,"max_command_version":2}]

Specify xml as the extension to get a result in XML format:

% curl http://localhost:10041/d/status.xml
<?xml version="1.0" encoding="utf-8"?>
<RESULT CODE="0" UP="1327809339.5782" ELAPSED="9.56058502197266e-05">
<RESULT>
<TEXT>alloc_count</TEXT>
<INT>159</INT>
<TEXT>starttime</TEXT>
<INT>1327809282</INT>
<TEXT>uptime</TEXT>
<INT>57</INT>
<TEXT>version</TEXT>
<TEXT>1.2.9-92-gb87d9f8</TEXT>
<TEXT>n_queries</TEXT>
<INT>0</INT>
<TEXT>cache_hit_rate</TEXT>
<FLOAT>0.0</FLOAT>
<TEXT>command_version</TEXT>
<INT>1</INT>
<TEXT>default_command_version</TEXT>
<INT>1</INT>
<TEXT>max_command_version</TEXT>
<INT>2</INT></RESULT>
</RESULT>

Specify tsv as the extension to get a result in TSV format:

% curl http://localhost:10041/d/status.tsv
0 1327809366.84187 8.44001770019531e-05
"alloc_count" 159
"starttime" 1327809282
"uptime" 84
"version" "1.2.9-92-gb87d9f8"
"n_queries" 0
"cache_hit_rate" 0.0
"command_version" 1
"default_command_version" 1
"max_command_version" 2
END

Specify msgpack as the extension to get a result in MessagePack format:

% curl http://localhost:10041/d/status.msgpack
(... omitted because MessagePack is binary data format. ...)

Command
A command is the most important processing unit in the query API. You request processing
from Groonga with a command.

This section describes commands, including the built-in commands.

Command version
Summary
Groonga 1.1 introduces the concept of a command version. A command version represents the
compatibility of the specification of Groonga commands such as select and load. Even when
the Groonga package version changes, compatibility is guaranteed for all commands as long
as the same command version is available. If command versions differ, commands with the
same name may behave incompatibly.

A given version of Groonga supports two command versions at the same time. You can choose
which command version to use with the default-command-version parameter, given as a
command line option or in the config file when starting groonga, or per command with the
command_version parameter.

Command versions start at 1 and increase by 1 with each update. The current Groonga
command specification is treated as command-version 1. The next release of Groonga will
support both command-version 1 and command-version 2.

Version status
Each command version supported by a given version of Groonga has one of the following
statuses: develop, stable or deprecated.

develop
Still under development; the specification may change.

stable Available and stable; recommended for use at that time.

deprecated
Available and stable, but scheduled for removal; its use is not recommended.

Of the two command versions supported by a given version of Groonga, exactly one is always
stable. The other is either develop or deprecated.

For example, the command versions supported by Groonga transition as follows:

groonga1.1: command-version1=stable command-version2=develop
groonga1.2: command-version1=deprecated command-version2=stable
groonga1.3: command-version2=stable command-version3=develop
groonga1.4: command-version2=deprecated command-version3=stable
groonga1.5: command-version3=stable command-version4=develop

A command version is first released with develop status and later becomes stable.
Two generations after that, the command version becomes deprecated. When the next command
version is released, the deprecated command version is no longer supported.

If you run a groonga command without specifying the default-command-version or
command_version parameter, the command version that is stable at that time is assumed.

If you start a groonga process with a default-command-version parameter whose status is
not stable, a warning message is written to the log file. If you specify a command version
outside the supported range, an error occurs and the process stops immediately.

How to specify the command version
There are two ways to specify the command version: as an argument to the groonga
executable, or as an argument to each command.

default-command-version parameter
You can specify the default-command-version parameter as an argument to the groonga
executable. (It can also be specified in the config file.)

Example:

groonga --default-command-version 1

All commands executed in that process use the specified version as the default command version. If the specified command version is stable, the process starts without any message. If it is develop or deprecated, a warning message is written to the groonga.log file. If it is not supported, an error message is written to standard error and the process exits immediately.

command_version parameter
You can specify command_version for every groonga command, such as select and load.

Example:

select --command_version 1 --table tablename

The command is executed with the specified command version. If the specified command version is not supported, an error is returned. If command_version is not specified, the value given by default-command-version at process startup is assumed.

Output format
Summary
Commands output their results in JSON, MessagePack, XML or TSV format.

JSON and MessagePack outputs have the same structure. XML and TSV each have their own
structure.

JSON or MessagePack is the recommended format. XML is useful for checking results
visually. TSV is only for special uses; normally you don't need it.

JSON and MessagePack
This section describes the structure of a command result in JSON and MessagePack formats.
JSON is used to show the structure because MessagePack is a binary format, which is not
suitable for documentation.

JSON and MessagePack use the following structure:

[HEADER, BODY]

For example:

[
[
0,
1337566253.89858,
0.000355720520019531
],
[
[
[
1
],
[
[
"_id",
"UInt32"
],
[
"_key",
"ShortText"
],
[
"content",
"Text"
],
[
"n_likes",
"UInt32"
]
],
[
2,
"Groonga",
"I started to use groonga. It's very fast!",
10
]
]
]
]

In the example, the following part is HEADER:

[
0,
1337566253.89858,
0.000355720520019531
]

The following part is BODY:

[
[
[
1
],
[
[
"_id",
"UInt32"
],
[
"_key",
"ShortText"
],
[
"content",
"Text"
],
[
"n_likes",
"UInt32"
]
],
[
2,
"Groonga",
"I started to use groonga. It's very fast!",
10
]
]
]

HEADER
HEADER is an array. The content of HEADER has some patterns.

Success case
HEADER has three elements on success:

[0, UNIX_TIME_WHEN_COMMAND_IS_STARTED, ELAPSED_TIME]

The first element is always 0.

UNIX_TIME_WHEN_COMMAND_IS_STARTED is the number of seconds since 1970-01-01 00:00:00 UTC
when the command started processing. ELAPSED_TIME is the elapsed time for processing
the command, in seconds. Both UNIX_TIME_WHEN_COMMAND_IS_STARTED and ELAPSED_TIME are
float values with nanosecond precision.
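
A minimal sketch of unpacking a success-case HEADER (the response string below is a made-up example):

```python
import json

# Hypothetical success response: HEADER followed by a BODY of true.
raw = '[[0, 1337566253.89858, 0.000355720520019531], true]'
header, body = json.loads(raw)

return_code, started_at, elapsed = header
print(return_code)  # 0 means success
```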

Error case
HEADER has four or five elements on error:

[
RETURN_CODE,
UNIX_TIME_WHEN_COMMAND_IS_STARTED,
ELAPSED_TIME,
ERROR_MESSAGE,
ERROR_LOCATION
]

ERROR_LOCATION may be omitted from HEADER, but the other four elements are always
included.

RETURN_CODE is a non-zero value. See return_code for available return codes.

UNIX_TIME_WHEN_COMMAND_IS_STARTED and ELAPSED_TIME are the same as success case.

ERROR_MESSAGE is an error message in string.

ERROR_LOCATION is optional. It is included if the error location was collected.
ERROR_LOCATION is an array with one or two elements:

[
LOCATION_IN_GROONGA,
LOCATION_IN_INPUT
]

LOCATION_IN_GROONGA is the location in the Groonga source where the error occurred. It is
useful for Groonga developers but not for users. LOCATION_IN_GROONGA is an array with
three elements:

[
FUNCTION_NAME,
SOURCE_FILE_NAME,
LINE_NUMBER
]

FUNCTION_NAME is the name of the function where the error occurred.

SOURCE_FILE_NAME is the name of the Groonga source file where the error occurred.

LINE_NUMBER is the line number in SOURCE_FILE_NAME where the error occurred.

LOCATION_IN_INPUT is optional. It is included when the location in the input file where
the error occurred was collected. The input file can be specified with the --file command
line option of the groonga command. LOCATION_IN_INPUT is an array with three
elements:

[
INPUT_FILE_NAME,
LINE_NUMBER,
LINE_CONTENT
]

INPUT_FILE_NAME is the name of the input file where the error occurred.

LINE_NUMBER is the line number in INPUT_FILE_NAME where the error occurred.

LINE_CONTENT is the content of the line at LINE_NUMBER in INPUT_FILE_NAME.
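
Putting the error-case pieces together, a client might unpack HEADER like this (the response, function name and file names below are invented for illustration):

```python
import json

# Hypothetical error response with all five HEADER elements.
raw = json.dumps([
    [-22,                      # RETURN_CODE (GRN_INVALID_ARGUMENT)
     1337566253.89858,         # UNIX_TIME_WHEN_COMMAND_IS_STARTED
     0.0001,                   # ELAPSED_TIME
     "invalid argument",       # ERROR_MESSAGE
     [["grn_expr_parse", "expr.c", 123],         # LOCATION_IN_GROONGA
      ["commands.grn", 5, "select --filter ("]]  # LOCATION_IN_INPUT
    ],
    "invalid argument"         # BODY (error message)
])

header, body = json.loads(raw)
return_code, started_at, elapsed, message = header[:4]
location = header[4] if len(header) > 4 else None  # ERROR_LOCATION is optional
location_in_input = location[1] if location and len(location) > 1 else None
print(return_code, message)
```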

BODY
The content of BODY depends on the executed command. It may be omitted.

On error, BODY may be an error message.

XML
TODO

TSV
TODO

See also
· return_code describes return codes.

Pretty print
Summary
New in version 5.1.0.

Groonga supports pretty print when you choose JSON for output_format.

Usage
Just specify yes to output_pretty parameter:

> status --output_pretty yes
[
[
0,
1448344438.43783,
5.29289245605469e-05
],
{
"alloc_count": 233,
"starttime": 1448344437,
"start_time": 1448344437,
"uptime": 1,
"version": "5.0.9-135-g0763d91",
"n_queries": 0,
"cache_hit_rate": 0.0,
"command_version": 1,
"default_command_version": 1,
"max_command_version": 2
}
]

Here is a result without output_pretty parameter:

> status
[[0,1448344438.43783,5.29289245605469e-05],{"alloc_count":233,"starttime":1448344437,...}]

Request ID
Summary
New in version 4.0.9.

You can assign an ID to each request.

The ID can be used to cancel the request. See also /reference/commands/request_cancel
for details about canceling a request.

Request IDs should be managed by the user. If you assign the same ID to multiple running
requests, you can't cancel each request individually.

The simplest ID sequence is incrementing numbers such as 1, 2, ....

A request ID is a string. The maximum request ID size is 4096 bytes.
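
The simplest scheme above can be sketched as follows (the helper is hypothetical; any unique string up to 4096 bytes works):

```python
import itertools

# Incrementing numbers rendered as strings: the simplest request ID
# sequence. IDs must be unique among running requests so that a
# cancel targets the right one.
_counter = itertools.count(1)

def next_request_id():
    return str(next(_counter))

first = next_request_id()
second = next_request_id()
print(first, second)
```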

How to assign ID to request
All commands accept the request_id parameter. You can assign an ID to a request by adding
the request_id parameter.

Here is an example that assigns the ID id-1 to a request:

select Users --request_id id-1

See also
· /reference/commands/request_cancel

Return code
Summary
The return code shows whether processing succeeded or not. If processing did not succeed,
the return code shows the error type.

The return code is used in both the C API and the query API. In the C API, you can check
the return code via grn_ctx_t::rc. In the query API, you can check the return code by
looking at the header element. See output_format for the header element in the query API.

List
Here is the list of return codes. GRN_SUCCESS (= 0) means that processing succeeded.
Return codes with negative values indicate error types. GRN_END_OF_DATA is a special
return code: it is used only in the C API and never appears in the query API.

· 0: GRN_SUCCESS

· 1: GRN_END_OF_DATA

· -1: GRN_UNKNOWN_ERROR

· -2: GRN_OPERATION_NOT_PERMITTED

· -3: GRN_NO_SUCH_FILE_OR_DIRECTORY

· -4: GRN_NO_SUCH_PROCESS

· -5: GRN_INTERRUPTED_FUNCTION_CALL

· -6: GRN_INPUT_OUTPUT_ERROR

· -7: GRN_NO_SUCH_DEVICE_OR_ADDRESS

· -8: GRN_ARG_LIST_TOO_LONG

· -9: GRN_EXEC_FORMAT_ERROR

· -10: GRN_BAD_FILE_DESCRIPTOR

· -11: GRN_NO_CHILD_PROCESSES

· -12: GRN_RESOURCE_TEMPORARILY_UNAVAILABLE

· -13: GRN_NOT_ENOUGH_SPACE

· -14: GRN_PERMISSION_DENIED

· -15: GRN_BAD_ADDRESS

· -16: GRN_RESOURCE_BUSY

· -17: GRN_FILE_EXISTS

· -18: GRN_IMPROPER_LINK

· -19: GRN_NO_SUCH_DEVICE

· -20: GRN_NOT_A_DIRECTORY

· -21: GRN_IS_A_DIRECTORY

· -22: GRN_INVALID_ARGUMENT

· -23: GRN_TOO_MANY_OPEN_FILES_IN_SYSTEM

· -24: GRN_TOO_MANY_OPEN_FILES

· -25: GRN_INAPPROPRIATE_I_O_CONTROL_OPERATION

· -26: GRN_FILE_TOO_LARGE

· -27: GRN_NO_SPACE_LEFT_ON_DEVICE

· -28: GRN_INVALID_SEEK

· -29: GRN_READ_ONLY_FILE_SYSTEM

· -30: GRN_TOO_MANY_LINKS

· -31: GRN_BROKEN_PIPE

· -32: GRN_DOMAIN_ERROR

· -33: GRN_RESULT_TOO_LARGE

· -34: GRN_RESOURCE_DEADLOCK_AVOIDED

· -35: GRN_NO_MEMORY_AVAILABLE

· -36: GRN_FILENAME_TOO_LONG

· -37: GRN_NO_LOCKS_AVAILABLE

· -38: GRN_FUNCTION_NOT_IMPLEMENTED

· -39: GRN_DIRECTORY_NOT_EMPTY

· -40: GRN_ILLEGAL_BYTE_SEQUENCE

· -41: GRN_SOCKET_NOT_INITIALIZED

· -42: GRN_OPERATION_WOULD_BLOCK

· -43: GRN_ADDRESS_IS_NOT_AVAILABLE

· -44: GRN_NETWORK_IS_DOWN

· -45: GRN_NO_BUFFER

· -46: GRN_SOCKET_IS_ALREADY_CONNECTED

· -47: GRN_SOCKET_IS_NOT_CONNECTED

· -48: GRN_SOCKET_IS_ALREADY_SHUTDOWNED

· -49: GRN_OPERATION_TIMEOUT

· -50: GRN_CONNECTION_REFUSED

· -51: GRN_RANGE_ERROR

· -52: GRN_TOKENIZER_ERROR

· -53: GRN_FILE_CORRUPT

· -54: GRN_INVALID_FORMAT

· -55: GRN_OBJECT_CORRUPT

· -56: GRN_TOO_MANY_SYMBOLIC_LINKS

· -57: GRN_NOT_SOCKET

· -58: GRN_OPERATION_NOT_SUPPORTED

· -59: GRN_ADDRESS_IS_IN_USE

· -60: GRN_ZLIB_ERROR

· -61: GRN_LZO_ERROR

· -62: GRN_STACK_OVER_FLOW

· -63: GRN_SYNTAX_ERROR

· -64: GRN_RETRY_MAX

· -65: GRN_INCOMPATIBLE_FILE_FORMAT

· -66: GRN_UPDATE_NOT_ALLOWED

· -67: GRN_TOO_SMALL_OFFSET

· -68: GRN_TOO_LARGE_OFFSET

· -69: GRN_TOO_SMALL_LIMIT

· -70: GRN_CAS_ERROR

· -71: GRN_UNSUPPORTED_COMMAND_VERSION

See also
· output_format shows where the return code appears in the query API response.

· /spec/gqtp: The GQTP protocol also uses the return code as its status, but as a 2-byte
unsigned integer. Return codes with negative values therefore appear as positive status
values in the GQTP protocol. You can convert a GQTP status value to a return code by
interpreting it as a 2-byte signed integer.
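
The conversion can be sketched as follows (the function name is illustrative):

```python
# A GQTP status is a 2-byte unsigned integer; reinterpret it as a
# 2-byte signed integer to recover the Groonga return code.
def gqtp_status_to_return_code(status):
    return status - 0x10000 if status >= 0x8000 else status

# GRN_INVALID_ARGUMENT (-22) is carried on the wire as 65514.
print(gqtp_status_to_return_code(65514))
print(gqtp_status_to_return_code(0))  # GRN_SUCCESS
```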

cache_limit
Summary
cache_limit gets or sets the maximum number of query cache entries. The query cache is
used only by the select command.

If the maximum number of query cache entries is 100, only the 100 most recent select
commands are cached. The cache eviction algorithm is LRU (least recently used).
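
The eviction behavior can be illustrated with a toy LRU cache (the class and method names below are invented for illustration; this is not Groonga's implementation):

```python
from collections import OrderedDict

# Toy LRU cache: with max_entries = N, only the N most recently used
# queries stay cached, mirroring cache_limit's semantics for select.
class QueryCache:
    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()

    def fetch(self, query, compute):
        if query in self.entries:
            self.entries.move_to_end(query)   # mark as recently used
            return self.entries[query]
        result = compute()
        self.entries[query] = result
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict least recently used
        return result

cache = QueryCache(max_entries=2)
cache.fetch("select A", lambda: "result A")
cache.fetch("select B", lambda: "result B")
cache.fetch("select A", lambda: "result A")   # touch A
cache.fetch("select C", lambda: "result C")   # evicts B, the LRU entry
print(list(cache.entries))
```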

Syntax
This command takes only one optional parameter:

cache_limit [max=null]

Usage
You can get the current maximum number of cache entries by executing cache_limit without a
parameter.

Execution example:

cache_limit
# [[0, 1337566253.89858, 0.000355720520019531], 100]

You can set the maximum number of cache entries by executing cache_limit with the max parameter.

Here is an example that sets 10 as the max number of cache entries.

Execution example:

cache_limit 10
# [[0, 1337566253.89858, 0.000355720520019531], 100]
cache_limit
# [[0, 1337566253.89858, 0.000355720520019531], 10]

If the max parameter is used, the return value is the maximum number of cache entries
before it was changed.

Parameters
This section describes all parameters.

max
Specifies the maximum number of query cache entries as a number.

If the max parameter isn't specified, the current maximum number of query cache entries
isn't changed; cache_limit just returns the current value.

Return value
cache_limit returns the current max number of query cache entries:

[HEADER, N_ENTRIES]

HEADER
See /reference/command/output_format about HEADER.

N_ENTRIES
N_ENTRIES is the current max number of query cache entries. It is a number.

See also
· select

check
Summary
check - display the state of an object

This section describes check, one of the Groonga built-in commands. Built-in commands are executed by sending a request to the groonga server via an argument of the groonga executable, standard input, or a socket.

The check command displays the state of the specified object in the groonga process. It is mainly intended for troubleshooting, such as when a database is corrupt. Because it is for debugging, the format of the return value is not guaranteed to be stable (the format is likely to change).

Syntax
check obj

Usage
Display the state of the index column name of the Terms table:

check Terms.name
[{"flags":"00008202",
"max sid":1,
"number of garbage segments":0,
"number of array segments":1,
"max id of array segment":1,
"number of buffer segments":110,
"max id of buffer segment":111,
"max id of physical segment in use":111,
"number of unmanaged segments":4294967185,
"total chunk size":7470239,
"max id of chunk segments in use":127,
"number of garbage chunk":[0,0,0,0,0,0,0,0,2,2,0,0,0,0,0]},
{"buffer id":0,
"chunk size":94392,
"buffer term":["596","59777","6",...],
"buffer free":152944,
"size in buffer":7361,
"nterms":237,
"nterms with chunk":216,
"buffer id":1,
"chunk size":71236,
"buffer term":[["に述",18149,18149,2,25,6,6],
["に追",4505,4505,76,485,136,174],
["に退",26568,26568,2,9,2,2],
...],
"buffer free":120000,
"size in buffer":11155,
"nterms":121,
"nterms with chunk":116},
{"buffer id":1,
...},
...]

Parameters
obj
Specifies the name of the object whose state is displayed.

Return value
The returned value depends on the object being checked.

For an index column:

An array like the following is output:

[index state, buffer state 1, buffer state 2, ...]

The index state contains the following items in hash form:
flags
The flag values set on the column, expressed in hexadecimal.

max sid
The largest ID among the segments.

number of garbage segments
The number of garbage segments.

number of array segments
The number of array segments.

max id of array segment
The largest ID among the array segments.

number of buffer segments
The number of buffer segments.

max id of buffer segment
The largest ID among the buffer segments.

max id of physical segment in use
The largest ID among the physical segments in use.

number of unmanaged segments
The number of unmanaged segments.

total chunk size
The total chunk size.

max id of chunk segments in use
The largest ID among the chunk segments in use.

number of garbage chunk
The number of garbage entries per chunk.

Each buffer state contains the following items in hash form:
buffer id
The buffer ID.

chunk size
The size of the chunk.

buffer term
The list of terms in the buffer. The state of each term is an array like:
[term, term ID registered in the buffer, term ID registered in the lexicon,
size in the buffer, size in the chunk]

buffer free
The free space in the buffer.

size in buffer
The amount of the buffer in use.

nterms
The number of terms in the buffer.

nterms with chunk
The number of terms in the buffer that use a chunk.

clearlock
Summary
Deprecated since version 4.0.9: Use lock_clear instead.

clearlock - clear locks set on an object

This section describes clearlock, one of the Groonga built-in commands. Built-in commands are executed by sending a request to the groonga server via an argument of the groonga executable, standard input, or a socket.

clearlock takes a target object (database, table, index, etc.) and recursively clears the locks set on that object.

Syntax
clearlock objname

Usage
Clear all locks on the open database:

clearlock
[true]

Clear the lock on the body column of the Entry table:

clearlock Entry.body
[true]

Parameters
objname
Specifies the name of the target object. If empty, the open db object is the target.

Return value
[success flag]

success flag
Returns true if no error occurred, or false if an error occurred.

See also
load

column_copy
Summary
New in version 5.0.7.

column_copy copies all column values to another column.

You can implement the following features with this command:

· Changing column configuration

· Changing table configuration

You can change column configuration by the following steps:

1. Create a new column with new configuration

2. Copy all values from the current column to the new column

3. Remove the current column

4. Rename the new column to the current column

You can change table configuration by the following steps:

1. Create a new table with new configuration

2. Create all same columns to the new table

3. Copy all column values from the current table to the new table

4. Remove the current table

5. Rename the new table to the current table

Concrete examples are shown later.

You can't copy column values from a TABLE_NO_KEY table to another table, nor to a
TABLE_NO_KEY table from another table, because Groonga can't map records without a record
key.

You can copy column values from a TABLE_NO_KEY table to the same TABLE_NO_KEY table.

You can copy column values from a TABLE_HASH_KEY / TABLE_PAT_KEY / TABLE_DAT_KEY table to
the same or another TABLE_HASH_KEY / TABLE_PAT_KEY / TABLE_DAT_KEY table.

Syntax
This command takes four parameters.

All parameters are required:

column_copy from_table
from_name
to_table
to_name

Usage
Here are use cases of this command:

· Changing column configuration

· Changing table configuration

How to change column configuration
You can change a column's value type. For example, you can change a UInt32 column to a
ShortText column.

You can change a column's type. For example, you can change a COLUMN_SCALAR column to a
COLUMN_VECTOR column.

You can move a column to another table. For example, you can move the high_score column to
the Users table from the Players table.

Here are basic steps to change column configuration:

1. Create a new column with new configuration

2. Copy all values from the current column to the new column

3. Remove the current column

4. Rename the new column to the current column

Here is an example to change a column value type to ShortText from Int32.

Here are schema and data:

Execution example:

table_create Logs TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs serial COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Logs
[
{"_key": "log1", "serial": 1}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

The following commands change Logs.serial column value type to ShortText from Int32:

Execution example:

column_create Logs new_serial COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_copy Logs serial Logs new_serial
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_remove Logs serial
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_rename Logs new_serial serial
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Logs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "serial",
# "ShortText"
# ]
# ],
# [
# 1,
# "log1",
# "1"
# ]
# ]
# ]
# ]

You can see from the select response that Logs.serial stores ShortText values.

Here is an example to change column type to COLUMN_VECTOR from COLUMN_SCALAR.

Here are schema and data:

Execution example:

table_create Entries TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries tag COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Entries
[
{"_key": "entry1", "tag": "Groonga"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

The following commands change Entries.tag column to COLUMN_VECTOR from COLUMN_SCALAR:

Execution example:

column_create Entries new_tag COLUMN_VECTOR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_copy Entries tag Entries new_tag
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_remove Entries tag
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_rename Entries new_tag tag
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Entries
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "entry1",
# [
# "Groonga"
# ]
# ]
# ]
# ]
# ]

You can see from the select response that Entries.tag stores COLUMN_VECTOR values.

Here is an example to move high_score column to Users table from Players table.

Here are schema and data:

Execution example:

table_create Players TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Players high_score COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Players
[
{"_key": "player1", "high_score": 100}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

The following commands move high_score column to Users table from Players table:

Execution example:

table_create Users TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users high_score COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_copy Players high_score Users high_score
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_remove Players high_score
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "high_score",
# "Int32"
# ]
# ],
# [
# 1,
# "player1",
# 100
# ]
# ]
# ]
# ]

You can see from the select response that Users.high_score was moved from
Players.high_score.

How to change table configuration
You can change a table's key type. For example, you can change the key type to ShortText
from Int32.

You can change a table's type. For example, you can change a TABLE_HASH_KEY table to a
TABLE_PAT_KEY table.

You can also change other options such as the default tokenizer and normalizer. For
example, you can change the default tokenizer to TokenBigramSplitSymbolAlphaDigit from
TokenBigram.

NOTE:
You can't change a TABLE_NO_KEY table, because TABLE_NO_KEY tables don't have record
keys and Groonga can't identify the copy destination record without a record key.

Here are basic steps to change table configuration:

1. Create a new table with new configuration

2. Create all same columns to the new table

3. Copy all column values from the current table to the new table

4. Remove the current table

5. Rename the new table to the current table

Here is an example to change table key type to ShortText from Int32.

Here are schema and data:

Execution example:

table_create IDs TABLE_HASH_KEY Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create IDs label COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create IDs used COLUMN_SCALAR Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table IDs
[
{"_key": 100, "label": "ID 100", "used": true}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

The following commands change IDs table key type to ShortText from Int32:

Execution example:

table_create NewIDs TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create NewIDs label COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create NewIDs used COLUMN_SCALAR Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_copy IDs label NewIDs label
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_copy IDs used NewIDs used
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_remove IDs
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_rename NewIDs IDs
# [[0, 1337566253.89858, 0.000355720520019531], true]
select IDs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "label",
# "ShortText"
# ],
# [
# "used",
# "Bool"
# ]
# ],
# [
# 1,
# "100",
# "ID 100",
# true
# ]
# ]
# ]
# ]

You can see from the select response that IDs stores ShortText keys.

Here is an example to change table type to TABLE_PAT_KEY from TABLE_HASH_KEY.

Here are schema and data:

Execution example:

table_create Names TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Names used COLUMN_SCALAR Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Names
[
{"_key": "alice", "used": false}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

The following commands change the Names table to TABLE_PAT_KEY from TABLE_HASH_KEY:

Execution example:

table_create NewNames TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create NewNames used COLUMN_SCALAR Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_copy Names used NewNames used
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_remove Names
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_rename NewNames Names
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Names --filter '_key @^ "ali"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "used",
# "Bool"
# ]
# ],
# [
# 1,
# "alice",
# false
# ]
# ]
# ]
# ]

You can tell that Names is a TABLE_PAT_KEY table because select can use the
script-syntax-prefix-search-operator. You can't use the script-syntax-prefix-search-operator
with TABLE_HASH_KEY.

Parameters
This section describes parameters.

Required parameters
All parameters are required.

from_table
Specifies the table name of the source column.

You can specify any table including TABLE_NO_KEY table.

If you specify a TABLE_NO_KEY table, to_table must be the same table.

Here is an example to use from_table.

Here are schema and data:

Execution example:

table_create FromTable TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create FromTable from_column COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create FromTable to_column COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table FromTable
[
{"_key": "key1", "from_column": "value1"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select FromTable --output_columns _key,from_column,to_column
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "from_column",
# "ShortText"
# ],
# [
# "to_column",
# "ShortText"
# ]
# ],
# [
# "key1",
# "value1",
# ""
# ]
# ]
# ]
# ]

You can copy all values to to_column from from_column:

Execution example:

column_copy FromTable from_column FromTable to_column
# [[0, 1337566253.89858, 0.000355720520019531], true]
select FromTable --output_columns _key,from_column,to_column
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "from_column",
# "ShortText"
# ],
# [
# "to_column",
# "ShortText"
# ]
# ],
# [
# "key1",
# "value1",
# "value1"
# ]
# ]
# ]
# ]

from_name
Specifies the name of the column whose values are copied.

See from_table for an example.

to_table
Specifies the table name of the destination column.

You can specify the same table name as from_table when you want to copy column values in
the same table.

You can't specify a TABLE_NO_KEY table as to_table, because Groonga can't identify
destination records without a record key.

There is one exception: if you specify the same name as from_table for to_table, you can
use a TABLE_NO_KEY table as to_table, because Groonga can identify destination records
when the source table and destination table are the same.

Here is an example to use to_table.

Here are schema and data:

Execution example:

table_create Table TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Table column COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create ToTable TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create ToTable to_column COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Table
[
{"_key": "key1", "column": "value1"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

You can copy all values to ToTable.to_column from Table.column:

Execution example:

column_copy Table column ToTable to_column
# [[0, 1337566253.89858, 0.000355720520019531], true]
select ToTable --output_columns _key,to_column
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "to_column",
# "ShortText"
# ]
# ],
# [
# "key1",
# "value1"
# ]
# ]
# ]
# ]

to_name
Specifies the destination column name.

See to_table for example.

Optional parameters
There is no optional parameter.

Return value
The command returns true as body on success such as:

[HEADER, true]

If the command fails, error details are in HEADER.

See /reference/command/output_format for HEADER.
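Every command in this document returns a response with the same header shape,
[return_code, start_unix_time, elapsed_seconds], followed by the body. As a minimal
illustration (this helper is not part of Groonga itself), a JSON response can be split in
Python:

```python
import json

def parse_response(raw):
    # A Groonga JSON response is [HEADER, BODY].  HEADER is
    # [return_code, start_unix_time, elapsed_seconds]; a return code
    # of 0 means success, a negative value means an error.
    header, body = json.loads(raw)
    return header, body

# The successful column_copy response shown above:
header, body = parse_response(
    '[[0, 1337566253.89858, 0.000355720520019531], true]')
```

On error (see the config_delete example later in this document), the header carries a
negative return code and an error message after the elapsed time.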

column_create
Summary
column_create - Adds a column to a table

This section describes column_create, one of the Groonga built-in commands. Built-in
commands are executed by passing a request to the groonga server via command line
arguments of the groonga executable, standard input, or a socket.

column_create adds a column to a table in the database in use.

Syntax
column_create table name flags type [source]

Usage
Creates a column named body in the Entry table that stores ShortText values:

column_create Entry body --type ShortText
[true]

Creates a full inverted index column named entry_body in the Term table, which indexes the values of the body column of the Entry table:

column_create Term entry_body COLUMN_INDEX|WITH_POSITION Entry body
[true]

Parameters
table
Specifies the name of the table to which the column is added.

name
Specifies the name of the column to be created. The column name must be unique within the
table.

Column names containing a period ('.') or a colon (':') can't be created. Names starting
with an underscore ('_') are reserved and can't be used.

flags
Specifies the column attributes as one of the following numeric values, or as symbol names combined with a pipe ('|').

0, COLUMN_SCALAR
Creates a column that stores a single value.

1, COLUMN_VECTOR
Creates a column that stores an array of multiple values.

2, COLUMN_INDEX
Creates an index column.

There are two flags to compress the value of a column, but you can't specify these flags
for now because there is a memory leak issue (GitHub#6) when the value of the column is
referred to. This issue occurs with both of them (zlib and lzo).

16, COMPRESS_ZLIB
Compress the value of the column by using zlib. This flag is enabled when you build
Groonga with --with-zlib.

32, COMPRESS_LZO
Compress the value of the column by using lzo. This flag is enabled when you build
Groonga with --with-lzo.

For index columns, you can specify additional attributes by adding the following values
to flags.

128, WITH_SECTION
Creates an index that stores section information.

256, WITH_WEIGHT
Creates an index that stores weight information.

512, WITH_POSITION
Creates an index that stores position information (a full inverted index).
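The numeric values listed above are bit flags, so the pipe ('|') syntax corresponds to a
bitwise OR of the numbers. A small illustrative sketch (the helper name is hypothetical):

```python
# Numeric flag values, as listed above.
FLAGS = {
    "COLUMN_SCALAR": 0,
    "COLUMN_VECTOR": 1,
    "COLUMN_INDEX": 2,
    "COMPRESS_ZLIB": 16,
    "COMPRESS_LZO": 32,
    "WITH_SECTION": 128,
    "WITH_WEIGHT": 256,
    "WITH_POSITION": 512,
}

def flags_value(spec):
    # Convert a spec like "COLUMN_INDEX|WITH_POSITION" into the
    # equivalent numeric value (2 | 512 == 514).
    value = 0
    for name in spec.split("|"):
        value |= FLAGS[name.strip()]
    return value
```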

type
Specifies the value type. You can specify a Groonga built-in type, a user-defined type defined in the same database, or an already defined table.

source
When you create an index column, specify the column to be indexed as the source argument.

Return value
[HEADER, SUCCEEDED]

HEADER
See /reference/command/output_format about HEADER.

SUCCEEDED
It is true on success, false otherwise.

column_list
Summary
column_list command lists columns in a table.

Syntax
This command takes only one required parameter:

column_list table

Usage
Here is a simple example of column_list command.

Execution example:

table_create Users TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users age COLUMN_SCALAR UInt8
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users tags COLUMN_VECTOR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_list Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# "id",
# "UInt32"
# ],
# [
# "name",
# "ShortText"
# ],
# [
# "path",
# "ShortText"
# ],
# [
# "type",
# "ShortText"
# ],
# [
# "flags",
# "ShortText"
# ],
# [
# "domain",
# "ShortText"
# ],
# [
# "range",
# "ShortText"
# ],
# [
# "source",
# "ShortText"
# ]
# ],
# [
# 256,
# "_key",
# "",
# "",
# "COLUMN_SCALAR",
# "Users",
# "ShortText",
# []
# ],
# [
# 257,
# "age",
# "/tmp/groonga-databases/commands_column_list.0000101",
# "fix",
# "COLUMN_SCALAR|PERSISTENT",
# "Users",
# "UInt8",
# []
# ],
# [
# 258,
# "tags",
# "/tmp/groonga-databases/commands_column_list.0000102",
# "var",
# "COLUMN_VECTOR|PERSISTENT",
# "Users",
# "ShortText",
# []
# ]
# ]
# ]

Parameters
This section describes parameters of column_list.

Required parameters
All parameters are required.

table
Specifies the name of the table whose columns are to be listed.

Return value
column_list returns the list of column information in the table:

[
HEADER,
[
COLUMN_LIST_HEADER,
COLUMN_INFORMATION1,
COLUMN_INFORMATION2,
...
]
]

HEADER
See /reference/command/output_format about HEADER.

COLUMN_LIST_HEADER
COLUMN_LIST_HEADER describes the content of each COLUMN_INFORMATION.

COLUMN_LIST_HEADER is the following format:

[
["id", "UInt32"],
["name", "ShortText"],
["path", "ShortText"],
["type", "ShortText"],
["flags", "ShortText"],
["domain", "ShortText"],
["range", "ShortText"],
["source", "ShortText"]
]

It means the following:

· The first element of COLUMN_INFORMATION is the id value, and its type is UInt32.

· The second element of COLUMN_INFORMATION is the name value, and its type is
ShortText.

· The third element ....

See the following COLUMN_INFORMATION description for details.

This field provides metadata about the column information, so it is useful for
programs rather than humans.

COLUMN_INFORMATION
Each COLUMN_INFORMATION is the following format:

[
ID,
NAME,
PATH,
TYPE,
FLAGS,
DOMAIN,
RANGE,
SOURCES
]

ID
The column ID in the Groonga database. Normally, you don't need to care about it.

NAME
The column name.

PATH
The path for storing column data.

TYPE
The type of the column. It is one of the following:

┌──────┬──────────────────────────────────┐
│Value │ Description                      │
├──────┼──────────────────────────────────┤
│fix   │ The column is a fixed size       │
│      │ column. A scalar column whose    │
│      │ type is a fixed size type is a   │
│      │ fixed size column.               │
├──────┼──────────────────────────────────┤
│var   │ The column is a variable size    │
│      │ column. A vector column, or a    │
│      │ scalar column whose type is a    │
│      │ variable size type, is a         │
│      │ variable size column.            │
├──────┼──────────────────────────────────┤
│index │ The column is an index column.   │
└──────┴──────────────────────────────────┘

FLAGS
The flags of the column. Each flag is separated by | like
COLUMN_VECTOR|WITH_WEIGHT. FLAGS must include one of COLUMN_SCALAR, COLUMN_VECTOR
or COLUMN_INDEX. Other flags are optional.

Here are the available flags:

┌──────────────┬──────────────────────────────────┐
│Flag          │ Description                      │
├──────────────┼──────────────────────────────────┤
│COLUMN_SCALAR │ The column is a scalar column.   │
├──────────────┼──────────────────────────────────┤
│COLUMN_VECTOR │ The column is a vector column.   │
├──────────────┼──────────────────────────────────┤
│COLUMN_INDEX  │ The column is an index column.   │
├──────────────┼──────────────────────────────────┤
│WITH_WEIGHT   │ The column can have weight.      │
│              │ COLUMN_VECTOR and COLUMN_INDEX   │
│              │ may have it. COLUMN_SCALAR       │
│              │ doesn't have it.                 │
├──────────────┼──────────────────────────────────┤
│WITH_SECTION  │ The column can have section      │
│              │ information. COLUMN_INDEX may    │
│              │ have it. COLUMN_SCALAR and       │
│              │ COLUMN_VECTOR don't have it.     │
│              │                                  │
│              │ A multiple column index has it.  │
├──────────────┼──────────────────────────────────┤
│WITH_POSITION │ The column can have position     │
│              │ information. COLUMN_INDEX may    │
│              │ have it. COLUMN_SCALAR and       │
│              │ COLUMN_VECTOR don't have it.     │
│              │                                  │
│              │ A full text search index must    │
│              │ have it.                         │
├──────────────┼──────────────────────────────────┤
│PERSISTENT    │ The column is a persistent       │
│              │ column. It means that the column │
│              │ isn't a                          │
│              │ /reference/columns/pseudo.       │
└──────────────┴──────────────────────────────────┘
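Because FLAGS is returned as a single '|'-separated string, a client program can split it
to test for a particular flag. A minimal sketch (the helper name is hypothetical):

```python
def has_flag(flags_field, flag):
    # FLAGS looks like "COLUMN_VECTOR|PERSISTENT" in column_list output.
    return flag in flags_field.split("|")
```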

DOMAIN
The name of the table that has the column.

RANGE
The value type name of the column. It is a type name or a table name.

SOURCES
An array of the source column names of the index. If the index column is multiple
column index, the array has two or more source column names.

It is always an empty array for COLUMN_SCALAR and COLUMN_VECTOR.
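Since COLUMN_LIST_HEADER gives the position and name of every value in a
COLUMN_INFORMATION row, a program can pair them up to get named fields. A sketch of that
pairing (the function name is hypothetical):

```python
def columns_as_dicts(body):
    # body is column_list's body: [COLUMN_LIST_HEADER, info1, info2, ...].
    # Each header entry is [name, type]; each info row lists the values
    # in the same order as the header entries.
    header, *rows = body
    names = [name for name, _type in header]
    return [dict(zip(names, row)) for row in rows]
```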

See also
· /reference/commands/column_create

· /reference/column

column_remove
Summary
column_remove - Removes a column defined in a table

This section describes column_remove, one of the Groonga built-in commands. Built-in
commands are executed by passing a request to the groonga server via command line
arguments of the groonga executable, standard input, or a socket.

column_remove removes a column defined in a table.
The indexes associated with the column are also removed. [1]

Syntax
column_remove table name

Usage
column_remove Entry body

[true]
Footnotes

[1] The index is removed even when the column is part of a multiple section index.

Parameters
table
Specifies the name of the table that has the column to be removed.

name
Specifies the name of the column to be removed.

Return value
[SUCCEEDED_OR_NOT]

SUCCEEDED_OR_NOT
It is true if no error occurred, false otherwise.

column_rename
Summary
column_rename command renames a column.

It is a light operation. It just changes a relationship between name and the column
object. It doesn't copy column values.

It is a dangerous operation. You must stop all operations, including read operations,
while you run column_rename. If the following sequence occurs, the Groonga process may crash:

· An operation (like select) starts that accesses the column to be renamed by its
current name. The current column name is called the old column name below, because
the column is about to be renamed.

· column_rename runs. The select is still running.

· The select accesses the column by the old column name, but it can't find the column
by the old name because the column has been renamed to the new column name. This may
crash the Groonga process.

Syntax
This command takes three parameters.

All parameters are required:

column_rename table name new_name

Usage
Here is a simple example of column_rename command.

Execution example:

table_create Users TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users score COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Users
[
{"_key": "Alice", "score": 2},
{"_key": "Bob", "score": 0},
{"_key": "Carlos", "score": -1}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
column_rename Users score point
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_list Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# "id",
# "UInt32"
# ],
# [
# "name",
# "ShortText"
# ],
# [
# "path",
# "ShortText"
# ],
# [
# "type",
# "ShortText"
# ],
# [
# "flags",
# "ShortText"
# ],
# [
# "domain",
# "ShortText"
# ],
# [
# "range",
# "ShortText"
# ],
# [
# "source",
# "ShortText"
# ]
# ],
# [
# 256,
# "_key",
# "",
# "",
# "COLUMN_SCALAR",
# "Users",
# "ShortText",
# []
# ],
# [
# 257,
# "point",
# "/tmp/groonga-databases/commands_column_rename.0000101",
# "fix",
# "COLUMN_SCALAR|PERSISTENT",
# "Users",
# "Int32",
# []
# ]
# ]
# ]
select Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "point",
# "Int32"
# ]
# ],
# [
# 1,
# "Alice",
# 2
# ],
# [
# 2,
# "Bob",
# 0
# ],
# [
# 3,
# "Carlos",
# -1
# ]
# ]
# ]
# ]

Parameters
This section describes parameters of column_rename.

Required parameters
All parameters are required.

table
Specifies the name of table that has the column to be renamed.

name
Specifies the column name to be renamed.

new_name
Specifies the new column name.

Return value
[HEADER, SUCCEEDED_OR_NOT]

HEADER
See /reference/command/output_format about HEADER.

SUCCEEDED_OR_NOT
It is true on success, false otherwise.

config_delete
Summary
New in version 5.1.2.

config_delete command deletes the specified configuration item.

Syntax
This command takes only one required parameter:

config_delete key

Usage
Here is an example to delete alias.column configuration item:

Execution example:

config_set alias.column Aliases.real_name
# [[0, 1337566253.89858, 0.000355720520019531], true]
config_get alias.column
# [[0, 1337566253.89858, 0.000355720520019531], "Aliases.real_name"]
config_delete alias.column
# [[0, 1337566253.89858, 0.000355720520019531], true]
config_get alias.column
# [[0, 1337566253.89858, 0.000355720520019531], ""]

Here is an example to delete nonexistent configuration item:

Execution example:

config_delete nonexistent
# [
# [
# -22,
# 1337566253.89858,
# 0.000355720520019531,
# "[config][delete] failed to delete",
# [
# [
# "grn_config_delete",
# "config.c",
# 166
# ]
# ]
# ],
# false
# ]

config_delete returns an error when you try to delete a nonexistent configuration item.

Parameters
This section describes all parameters.

Required parameters
There is one required parameter.

key
Specifies the key of the target configuration item.

The max key size is 4KiB.

You can't use an empty string as key.

Optional parameters
There is no optional parameter.

Return value
config_delete command returns whether deleting a configuration item is succeeded or not:

[HEADER, SUCCEEDED_OR_NOT]

HEADER
See /reference/command/output_format about HEADER.

SUCCEEDED_OR_NOT
It is true on success, false otherwise.

See also
· /reference/configuration

· config_get

· config_set

config_get
Summary
New in version 5.1.2.

config_get command returns the value of the specified configuration item.

Syntax
This command takes only one required parameter:

config_get key

Usage
Here is an example to set a value to alias.column configuration item and get the value:

Execution example:

config_set alias.column Aliases.real_name
# [[0, 1337566253.89858, 0.000355720520019531], true]
config_get alias.column
# [[0, 1337566253.89858, 0.000355720520019531], "Aliases.real_name"]

Here is an example to get nonexistent configuration item value:

Execution example:

config_get nonexistent
# [[0, 1337566253.89858, 0.000355720520019531], ""]

config_get returns an empty string for nonexistent configuration item key.

Parameters
This section describes all parameters.

Required parameters
There is one required parameter.

key
Specifies the key of the target configuration item.

The max key size is 4KiB.

You can't use an empty string as key.

Optional parameters
There is no optional parameter.

Return value
config_get command returns the value of the specified configuration item:

[HEADER, VALUE]

HEADER
See /reference/command/output_format about HEADER.

VALUE
VALUE is the value of the configuration item specified by key. It's a string.

See also
· /reference/configuration

· config_set

· config_delete

config_set
Summary
New in version 5.1.2.

config_set command sets a value to the specified configuration item.

Syntax
This command takes two required parameters:

config_set key value

Usage
Here is an example to set a value to alias.column configuration item and confirm the set
value:

Execution example:

config_set alias.column Aliases.real_name
# [[0, 1337566253.89858, 0.000355720520019531], true]
config_get alias.column
# [[0, 1337566253.89858, 0.000355720520019531], "Aliases.real_name"]

Parameters
This section describes all parameters.

Required parameters
There are two required parameters.

key
Specifies the key of the target configuration item.

The max key size is 4KiB.

You can't use an empty string as key.

value
Specifies the value of the target configuration item specified by key.

The max value size is 4091B (= 4KiB - 5B).
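The size limits above (4KiB for the key, 4091B for the value) can be checked on the
client side before sending config_set. A minimal sketch, assuming the limits apply to the
UTF-8 byte length (the helper name is hypothetical):

```python
MAX_KEY_SIZE = 4 * 1024          # 4KiB
MAX_VALUE_SIZE = 4 * 1024 - 5    # 4091B (= 4KiB - 5B)

def validate_config_item(key, value):
    # Check the constraints described above before sending config_set.
    encoded_key = key.encode("utf-8")
    if not encoded_key:
        raise ValueError("key must not be an empty string")
    if len(encoded_key) > MAX_KEY_SIZE:
        raise ValueError("key is larger than 4KiB")
    if len(value.encode("utf-8")) > MAX_VALUE_SIZE:
        raise ValueError("value is larger than 4091B")
```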

Optional parameters
There is no optional parameter.

Return value
config_set command returns whether setting a configuration item value is succeeded or not:

[HEADER, SUCCEEDED_OR_NOT]

HEADER
See /reference/command/output_format about HEADER.

SUCCEEDED_OR_NOT
It is true on success, false otherwise.

See also
· /reference/configuration

· config_get

· config_delete

database_unmap
Summary
New in version 5.0.7.

database_unmap unmaps already mapped tables and columns in the database. "Map" means
loading from disk into memory. "Unmap" means releasing the mapped memory.

NOTE:
Normally, you don't need to use database_unmap because the OS manages memory cleverly.
If the remaining system memory runs low, the OS moves memory used by Groonga to disk
until Groonga needs that memory again. The OS moves unused memory preferentially.

CAUTION:
You can use this command only when thread_limit returns 1. It means that this command
doesn't work with multithreading.

Syntax
This command takes no parameters:

database_unmap

Usage
You can unmap the database after you change the max number of threads to 1:

Execution example:

thread_limit --max 1
# [[0, 1337566253.89858, 0.000355720520019531], 2]
database_unmap
# [[0, 1337566253.89858, 0.000355720520019531], true]

If the max number of threads is larger than 1, database_unmap fails:

Execution example:

thread_limit --max 2
# [[0, 1337566253.89858, 0.000355720520019531], 1]
database_unmap
# [
# [
# -2,
# 1337566253.89858,
# 0.000355720520019531,
# "[database_unmap] the max number of threads must be 1: <2>",
# [
# [
# "proc_database_unmap",
# "proc.c",
# 6931
# ]
# ]
# ],
# false
# ]

Parameters
This section describes all parameters.

Required parameters
There is no required parameter.

Optional parameters
There is no optional parameter.

Return value
The command returns true as body on success such as:

[HEADER, true]

If the command fails, error details are in HEADER.

See /reference/command/output_format for HEADER.

define_selector
Summary
define_selector - Defines a search command

This section describes define_selector, one of the Groonga built-in commands. Built-in
commands are executed by passing a request to the groonga server via command line
arguments of the groonga executable, standard input, or a socket.

define_selector defines a new search command with customized search conditions.

Syntax
define_selector name table [match_columns [query [filter [scorer [sortby
[output_columns [offset [limit [drilldown [drilldown_sortby
[drilldown_output_columns [drilldown_offset [drilldown_limit]]]]]]]]]]]]]

Usage
Defines a selector command named entry_selector that outputs all records and all columns of the Entry table:

define_selector entry_selector Entry
[true]

Parameters
name
Specifies the name of the selector command to be defined.

table
Specifies the table to be searched.

match_columns
Specifies the default value of the match_columns argument of the selector command to be added.

query
Specifies the default value of the query argument of the selector command to be added.

filter
Specifies the default value of the filter argument of the selector command to be added.

scorer
Specifies the default value of the scorer argument of the selector command to be added.

sortby
Specifies the default value of the sortby argument of the selector command to be added.

output_columns
Specifies the default value of the output_columns argument of the selector command to be added.

offset
Specifies the default value of the offset argument of the selector command to be added.

limit
Specifies the default value of the limit argument of the selector command to be added.

drilldown
Specifies the default value of the drilldown argument of the selector command to be added.

drilldown_sortby
Specifies the default value of the drilldown_sortby argument of the selector command to be added.

drilldown_output_columns
Specifies the default value of the drilldown_output_columns argument of the selector command to be added.

drilldown_offset
Specifies the default value of the drilldown_offset argument of the selector command to be added.

drilldown_limit
Specifies the default value of the drilldown_limit argument of the selector command to be added.

Return value
[SUCCEEDED_OR_NOT]

SUCCEEDED_OR_NOT
It is true if no error occurred, false otherwise.

See also
/reference/grn_expr

defrag
Summary
defrag command resolves fragmentation of specified objects.

This section describes defrag, one of the Groonga built-in commands. Built-in commands
are executed by passing a request to the groonga server via command line arguments of
the groonga executable, standard input, or a socket.

defrag resolves fragmentation of the specified object (a database or a variable size
column).

Syntax
defrag objname threshold

Usage
Resolves fragmentation of the open database:

defrag
[300]

Resolves fragmentation of the body column of the Entry table:

defrag Entry.body
[30]

Parameters
objname
Specifies the name of the target object. If it is empty, the open database object is the target.

Return value
[NUMBER_OF_DEFRAGGED_SEGMENTS]

NUMBER_OF_DEFRAGGED_SEGMENTS
Returns the number of segments whose fragmentation was resolved.

delete
Summary
delete command deletes the specified record of a table.

Cascade delete
Multiple tables may be associated; for example, the key of one table may be referenced
by records of another table. In such a case, if you delete a key of the referenced
table, the values referencing that key in the other table's records are also cleared.

Note that when the type of the other table's column is COLUMN_VECTOR, only the value of
the referenced key is removed from the vector value.

Syntax
delete table [key [id [filter]]]

Usage
Here are a schema definition and sample data to show usage.

Delete the record from the Entry table which has "2" as its key.

Execution example:

delete Entry 2
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Entry
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "UInt32"
# ],
# [
# "status",
# "ShortText"
# ]
# ],
# [
# 1,
# 1,
# "OK"
# ]
# ]
# ]
# ]

Here is the example about cascaded delete.

The country column of the Users table refers to the Country table.

"Cascaded delete" removes the record that matches the specified key and clears the references to that key.

Execution example:

table_create Country TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Users TABLE_HASH_KEY UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users name COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users country COLUMN_SCALAR Country
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Users
[
{"_key": 1, "name": "John", "country": "United States"},
{"_key": 2, "name": "Mike", "country": "United States"},
{"_key": 3, "name": "Takashi", "country": "Japan"},
{"_key": 4, "name": "Hanako", "country": "Japan"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 4]
load --table Country
[
{"_key": "United States"},
{"_key": "Japan"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
delete Country "United States"
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Country
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# 2,
# "Japan"
# ]
# ]
# ]
# ]
select Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "UInt32"
# ],
# [
# "country",
# "Country"
# ],
# [
# "name",
# "ShortText"
# ]
# ],
# [
# 1,
# 1,
# "",
# "John"
# ],
# [
# 2,
# 2,
# "",
# "Mike"
# ],
# [
# 3,
# 3,
# "Japan",
# "Takashi"
# ],
# [
# 4,
# 4,
# "Japan",
# "Hanako"
# ]
# ]
# ]
# ]

Parameters
table
Specifies the name of the table from which to delete records.

key
Specifies the key of the record to delete. If the table is a TABLE_NO_KEY table, the key
is just ignored. (Use the id parameter in such a case.)

id
Specifies the id of the record to delete. If you specify the id parameter, you must not
specify the key parameter.

filter
Specifies a grn_expr expression to identify the record. If you specify the filter
parameter, you must not specify the key and id parameters.
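The mutual exclusion between key, id and filter described above can be enforced on the
client side before the command is sent. A sketch (the helper name is hypothetical; `id`
and `filter` mirror the delete parameter names and shadow Python built-ins only inside
this toy function):

```python
def delete_args(table, key=None, id=None, filter=None):
    # key, id and filter are mutually exclusive, as described above.
    given = [(n, v) for n, v in
             (("key", key), ("id", id), ("filter", filter))
             if v is not None]
    if len(given) > 1:
        names = ", ".join(n for n, _ in given)
        raise ValueError("specify only one of: " + names)
    args = {"table": table}
    args.update(dict(given))
    return args
```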

Return value
[HEADER, SUCCEEDED_OR_NOT]

HEADER
See /reference/command/output_format about HEADER.

SUCCEEDED_OR_NOT
It is true on success, false otherwise.

See also
load

dump
Summary
dump - Outputs the schema and data of a database

This section describes dump, one of the Groonga built-in commands. Built-in commands are
executed by passing a request to the groonga server via command line arguments of the
groonga executable, standard input, or a socket.

dump outputs the schema and data of a database in a format that can be loaded back
later. Because the result of dump can be large, it is mainly intended for command line
use; backing up a database is the main use case.

The format that dump outputs can be interpreted directly by Groonga, so you can copy a database as follows:

% groonga original/db dump > dump.grn
% mkdir backup
% groonga -n backup/db < dump.grn

Syntax
dump [tables]
[dump_plugins]
[dump_schema]
[dump_records]
[dump_indexes]

Usage
Here is the sample schema and data to check dump behaviour:

plugin_register token_filters/stop_word
table_create Bookmarks TABLE_HASH_KEY ShortText
column_create Bookmarks title COLUMN_SCALAR ShortText
table_create Lexicon TABLE_PAT_KEY ShortText
table_create Sites TABLE_NO_KEY
column_create Sites url COLUMN_SCALAR ShortText
column_create Lexicon bookmark_title COLUMN_INDEX Bookmarks title
load --table Bookmarks
[
{"_key":"Groonga", "title":"Introduction to Groonga"},
{"_key":"Mroonga", "title":"Introduction to Mroonga"}
]
load --table Sites
[
{"_key": 1, "url":"http://groonga.org"},
{"_key": 2, "url":"http://mroonga.org"}
]

Dump all data in database:

> dump
plugin_register token_filters/stop_word

table_create Sites TABLE_NO_KEY
column_create Sites url COLUMN_SCALAR ShortText

table_create Bookmarks TABLE_HASH_KEY ShortText
column_create Bookmarks title COLUMN_SCALAR ShortText

table_create Lexicon TABLE_PAT_KEY ShortText

load --table Sites
[
["_id","url"],
[1,"http://groonga.org"],
[2,"http://mroonga.org"]
]

load --table Bookmarks
[
["_key","title"],
["Groonga","Introduction to Groonga"],
["Mroonga","Introduction to Mroonga"]
]

column_create Lexicon bookmark_title COLUMN_INDEX Bookmarks title

Dump schema and specific table data:

> dump Bookmarks
plugin_register token_filters/stop_word

table_create Sites TABLE_NO_KEY
column_create Sites url COLUMN_SCALAR ShortText

table_create Bookmarks TABLE_HASH_KEY ShortText
column_create Bookmarks title COLUMN_SCALAR ShortText

table_create Lexicon TABLE_PAT_KEY ShortText

load --table Bookmarks
[
["_key","title"],
["Groonga","Introduction to Groonga"],
["Mroonga","Introduction to Mroonga"]
]

column_create Lexicon bookmark_title COLUMN_INDEX Bookmarks title

Dump plugin only:

> dump --dump_schema no --dump_records no --dump_indexes no
plugin_register token_filters/stop_word

Dump records only:

> dump --dump_schema no --dump_plugins no --dump_indexes no
load --table Sites
[
["_id","url"],
[1,"http://groonga.org"],
[2,"http://mroonga.org"]
]

load --table Bookmarks
[
["_key","title"],
["Groonga","Introduction to Groonga"],
["Mroonga","Introduction to Mroonga"]
]

Dump schema only:

> dump --dump_records no --dump_plugins no --dump_indexes no
table_create Sites TABLE_NO_KEY
column_create Sites url COLUMN_SCALAR ShortText

table_create Bookmarks TABLE_HASH_KEY ShortText
column_create Bookmarks title COLUMN_SCALAR ShortText

table_create Lexicon TABLE_PAT_KEY ShortText

Parameters
There are optional parameters.

Optional parameters
tables
Specifies the tables to be dumped, separated by ',' (comma). Nonexistent tables are ignored.

dump_plugins
New in version 5.0.3.

You can customize the output whether it contains registered plugins or not. To exclude
registered plugins from the output, specify no.

The default value is yes.

dump_schema
New in version 5.0.3.

You can customize the output whether it contains database schema or not. To exclude
database schema from the output, specify no.

The default value is yes.

dump_records
New in version 5.0.3.

You can customize the output whether it contains records or not. To exclude records from
the output, specify no.

The default value is yes.

dump_indexes
New in version 5.0.3.

You can customize the output whether it contains indexes or not. To exclude indexes from
the output, specify no.

The default value is yes.

Return value
The schema and data of the database are output as Groonga built-in command calls. The output_type specification is ignored.

io_flush
Summary
NOTE:
This command is an experimental feature.

New in version 5.0.5.

io_flush flushes all changes in memory to disk explicitly. Normally, you don't need to
use io_flush explicitly, because flushing is done automatically, and effectively, by the OS.

You need to use io_flush explicitly when your system may crash unexpectedly or your
Groonga process may not be shut down in a normal way. (For example, using shutdown is a
normal shutdown process.) In such a case, it's better to use io_flush after you change
your Groonga database. Here are the commands that change your Groonga database:

· load

· delete

· truncate

· table_create

· table_remove

· table_rename

· column_create

· column_remove

· column_rename

· plugin_register

· plugin_unregister

If you use the select-scorer parameter of select to change existing column values,
select is also included in the above list.

Note that io_flush may be a heavy process. If there are many changes in memory, flushing
them to disk is a heavy process.

Syntax
This command takes two parameters.

All parameters are optional:

io_flush [target_name=null]
[recursive=yes]

Usage
You can flush all changes in memory to disk with no arguments:

Execution example:

io_flush
# [[0, 1337566253.89858, 0.000355720520019531], true]

If you know what has changed, you can narrow the flush targets. Here is a correspondence
table between commands and flush targets.

┌─────────────────────────┬───────────────────────────────┐
│Command                  │ Flush targets                 │
├─────────────────────────┼───────────────────────────────┤
│load and delete          │ Target table and its columns. │
├─────────────────────────┼───────────────────────────────┤
│truncate                 │ Target table and its columns. │
├─────────────────────────┼───────────────────────────────┤
│table_create             │ Target table and database.    │
├─────────────────────────┼───────────────────────────────┤
│table_remove and         │ Database.                     │
│table_rename             │                               │
├─────────────────────────┼───────────────────────────────┤
│column_create            │ Target column and database.   │
├─────────────────────────┼───────────────────────────────┤
│column_remove and        │ Database.                     │
│column_rename            │                               │
├─────────────────────────┼───────────────────────────────┤
│plugin_register and      │ Database.                     │
│plugin_unregister        │                               │
└─────────────────────────┴───────────────────────────────┘

Parameters
This section describes all parameters.

Required parameters
There is no required parameter.

Optional parameters
There are optional parameters.

target_name
Specifies a flush target object name. Target object is one of database, table or column.

If you omit this parameter, the database is the flush target object:

Execution example:

io_flush
# [[0, 1337566253.89858, 0.000355720520019531], true]

If you specify a table name, the table is the flush target object:

Execution example:

table_create Users TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
io_flush --target_name Users
# [[0, 1337566253.89858, 0.000355720520019531], true]

If you specify a column name, the column is the flush target object:

Execution example:

column_create Users age COLUMN_SCALAR UInt8
# [[0, 1337566253.89858, 0.000355720520019531], true]
io_flush --target_name Users.age
# [[0, 1337566253.89858, 0.000355720520019531], true]

recursive
Specifies whether child objects of the flush target object are also flush target objects.

The child objects of the database are all tables and all columns.

The child objects of a table are all of its columns.

A column has no child objects.

The recursive value must be yes or no. yes means that the specified flush target object
and all of its child objects are flush targets. no means that only the specified flush
target object is a flush target.

The following io_flush flushes all changes in database, all tables and all columns:

Execution example:

io_flush --recursive yes
# [[0, 1337566253.89858, 0.000355720520019531], true]

The following io_flush flushes all changes only in database:

Execution example:

io_flush --recursive no
# [[0, 1337566253.89858, 0.000355720520019531], true]

If you specify another value (neither yes nor no) or omit the recursive parameter, yes is used.

yes is used in the following case because an invalid recursive argument is specified:

Execution example:

io_flush --recursive invalid
# [[0, 1337566253.89858, 0.000355720520019531], true]

yes is used in the following case because recursive parameter isn't specified:

Execution example:

io_flush
# [[0, 1337566253.89858, 0.000355720520019531], true]

Return value
The command returns true as body on success such as:

[HEADER, true]

If the command fails, error details are in HEADER.

See /reference/command/output_format for HEADER.

load
Summary
load loads data as records into the current database and updates the values of each column.

Syntax
load values table [columns [ifexists [input_type]]]

Parameters
This section describes all parameters.

values
Specifies the values to be loaded as records. The values must satisfy the input_type
format. If you specify "json" as input_type, you can choose one of the formats below:

Format 1:
[[COLUMN_NAME1, COLUMN_NAME2,..], [VALUE1, VALUE2,..], [VALUE1, VALUE2,..],..]

Format 2:
[{COLUMN_NAME1: VALUE1, COLUMN_NAME2: VALUE2}, {COLUMN_NAME1: VALUE1,
COLUMN_NAME2: VALUE2},..]

The [COLUMN_NAME1, COLUMN_NAME2,..] header row in Format 1 is effective only when the
columns parameter isn't specified.
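The two formats carry the same information. For illustration, here is a minimal Python
sketch (to_format1 is a hypothetical helper, not part of Groonga) that converts Format 2
records into Format 1:

```python
def to_format1(records):
    """Convert Format 2 (a list of objects) into Format 1
    (a header row of column names followed by value rows)."""
    if not records:
        return []
    # Column order is taken from the first record.
    columns = list(records[0].keys())
    return [columns] + [[record.get(column) for column in columns]
                        for record in records]
```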

When the target table has a primary key, you must specify the _key column (the pseudo
column associated with the primary key) as one of the COLUMN_NAMEs.

If the values parameter isn't specified, values are read from the standard input until
every opening bracket is matched by its closing one. In that case you don't have to
enclose the values in single quotes or double quotes, but if you pass them with the
values parameter, you should.

Values read this way also don't need spaces (' ') to be enclosed in single quotes or
double quotes.

table
Specifies the name of the table to which you want to add records.

columns
Specifies the column names of the added records, separated by commas.

ifexists
Specifies a grn_expr string that is executed when a record with the same primary key as
an added record already exists in your table (default: true). If its evaluated value is
true, the values of the other columns (all columns except the _key column) are updated.

input_type
Specifies an input format for values. It supports JSON only.

Usage
Here is an example that adds records to the "Entry" table.

load --table Entry --input_type json --values [{\"_key\":\"Groonga\",\"body\":\"It's very fast!!\"}]

[1]
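Escaping the values argument by hand, as above, is error-prone. Here is a hedged sketch
that builds the equivalent request for the HTTP interface, letting json.dumps and
urlencode handle the escaping (build_load_query is an illustrative helper; /d/load is
the HTTP form of the load command):

```python
import json
from urllib.parse import urlencode

def build_load_query(table, records):
    """Build a load request path for the HTTP interface (sketch).
    json.dumps handles JSON escaping, urlencode handles URL escaping,
    so no manual backslash-quoting is needed."""
    values = json.dumps(records, ensure_ascii=False)
    return "/d/load?" + urlencode({"table": table, "values": values})
```

Sending the resulting path to a running groonga HTTP server would perform the same load
as the command line above.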

This example shows how to add values from standard input.

load --table Entry --input_type json
[
{"_key": "Groonga", "body": "It's very fast!!"}
]

[1]

Return value
JSON format
load returns the number of added records, such as:

[NUMBER]

See also
/reference/grn_expr

lock_acquire
Summary
New in version 5.1.2.

The lock_acquire command acquires the lock of the target object. The target object is
the database, a table or a column.

NOTE:
This is a dangerous command. You must release the locks you acquire with lock_release
when they are no longer needed. If you forget to release them, your database may be
broken.
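One way to keep acquire and release paired is a try/finally wrapper. This is a sketch
only; send_command stands in for whatever transport you use to reach the server and is
not a Groonga API:

```python
def with_object_lock(send_command, critical_section, target_name=None):
    """Run critical_section while holding the lock, always releasing it."""
    suffix = " " + target_name if target_name else ""
    send_command("lock_acquire" + suffix)
    try:
        return critical_section()
    finally:
        # Released even if critical_section raises, so the lock isn't leaked.
        send_command("lock_release" + suffix)
```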

Syntax
This command takes only one optional parameter:

lock_acquire [target_name=null]

If the target_name parameter is omitted, the database is used as the target object.

Usage
Here is an example to acquire the lock of the database:

Execution example:

lock_acquire
# [[0, 1337566253.89858, 0.000355720520019531], true]

While the database is locked, you can't create a new table or column. Release the lock
of the database to show the other examples.

Execution example:

lock_release
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an example to acquire the lock of Entries table:

Execution example:

table_create Entries TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
lock_acquire Entries
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an example to acquire the lock of Sites.title column:

Execution example:

table_create Sites TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Sites title COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
lock_acquire Sites.title
# [[0, 1337566253.89858, 0.000355720520019531], true]

Parameters
This section describes all parameters.

target_name
Specifies the name of a table or column.

If you don't specify it, the database is used as the target object.

The default is none. It means that the target object is the database.

Return value
The lock_acquire command returns whether the lock is acquired or not:

[HEADER, SUCCEEDED_OR_NOT]

HEADER
See /reference/command/output_format about HEADER.

SUCCEEDED_OR_NOT
If the command succeeded, it returns true; otherwise it returns false.

See also
· lock_release

· lock_clear

lock_clear
Summary
New in version 4.0.9.

The lock_clear command clears the locks of the target object recursively. The target
object is the database, a table or a column.

NOTE:
This is a dangerous command. You must not use this command while another process or
thread is performing a write operation on the target object. If you do, your database
may be broken and/or your process may crash.

Syntax
This command takes only one optional parameter:

lock_clear [target_name=null]

If the target_name parameter is omitted, the database is used as the target object. It
means that all locks in the database are cleared.

Usage
Here is an example to clear all locks in the database:

Execution example:

lock_clear
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an example to clear locks of Entries table and Entries table columns:

Execution example:

table_create Entries TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries body COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
lock_clear Entries
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an example to clear the lock of Sites.title column:

Execution example:

table_create Sites TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Sites title COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
lock_clear Sites.title
# [[0, 1337566253.89858, 0.000355720520019531], true]

Parameters
This section describes all parameters.

target_name
Specifies the name of a table or column.

If you don't specify it, the database is used as the target object.

The default is none. It means that the target object is the database.

Return value
The lock_clear command returns whether the locks are cleared successfully or not:

[HEADER, SUCCEEDED_OR_NOT]

HEADER
See /reference/command/output_format about HEADER.

SUCCEEDED_OR_NOT
If the command succeeded, it returns true; otherwise it returns false.

lock_release
Summary
New in version 5.1.2.

The lock_release command releases the lock of the target object. The target object is
the database, a table or a column.

NOTE:
This is a dangerous command. You must only release locks that you acquired with
lock_acquire. If you release locks that you didn't acquire, your database may be broken.

Syntax
This command takes only one optional parameter:

lock_release [target_name=null]

If the target_name parameter is omitted, the database is used as the target object.

Usage
Here is an example to release the lock of the database:

Execution example:

lock_acquire
# [[0, 1337566253.89858, 0.000355720520019531], true]
lock_release
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an example to release the lock of Entries table:

Execution example:

table_create Entries TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
lock_acquire Entries
# [[0, 1337566253.89858, 0.000355720520019531], true]
lock_release Entries
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an example to release the lock of Sites.title column:

Execution example:

table_create Sites TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Sites title COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
lock_acquire Sites.title
# [[0, 1337566253.89858, 0.000355720520019531], true]
lock_release Sites.title
# [[0, 1337566253.89858, 0.000355720520019531], true]

Parameters
This section describes all parameters.

target_name
Specifies the name of a table or column.

If you don't specify it, the database is used as the target object.

The default is none. It means that the target object is the database.

Return value
The lock_release command returns whether the lock is released successfully or not:

[HEADER, SUCCEEDED_OR_NOT]

HEADER
See /reference/command/output_format about HEADER.

SUCCEEDED_OR_NOT
If the command succeeded, it returns true; otherwise it returns false.

See also
· lock_acquire

· lock_clear

log_level
Summary
log_level - set the log output level

This section describes log_level, one of the Groonga built-in commands. Built-in
commands are executed by passing them as arguments of the groonga executable, via the
standard input, or by sending a request to a groonga server over a socket.

log_level sets the log output level.

Syntax
log_level level

Usage
log_level warning
[true]

Parameters
level
Specifies the new log output level as one of the following values:
EMERG ALERT CRIT error warning notice info debug

Return value
[SUCCEEDED_OR_NOT]

SUCCEEDED_OR_NOT
Returns true if no error occurred, or false on error.

See also
log_put log_reopen

log_put
Summary
log_put - output a log message

This section describes log_put, one of the groonga built-in commands. Built-in commands
are executed by passing them as arguments of the groonga executable, via the standard
input, or by sending a request to a groonga server over a socket.

log_put writes message to the log.

Syntax
log_put level message

Usage
log_put ERROR ****MESSAGE****
[true]

Parameters
level
Specifies the log output level of the message as one of the following values:
EMERG ALERT CRIT error warning notice info debug

message
Specifies the string to output.

Return value
[SUCCEEDED_OR_NOT]

SUCCEEDED_OR_NOT
Returns true if no error occurred, or false on error.

See also
log_level log_reopen

log_reopen
Summary
log_reopen - reopen the log file

This section describes log_reopen, one of the Groonga built-in commands. Built-in
commands are executed by passing them as arguments of the groonga executable, via the
standard input, or by sending a request to a groonga server over a socket.

log_reopen reopens the log file.

Currently, this is supported only when the default log function is used.

Syntax
log_reopen

Usage
log_reopen

[true]

Log rotation with log_reopen
1. Move the log file with mv or a similar command. Log messages are still written to
   the moved file.

2. Execute the log_reopen command.

3. A new log file is created with the same name as the original log file.
   Subsequent log messages are written to the new log file.
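The rotation steps can be scripted. Here is a minimal Python sketch, where send_command
is a hypothetical stand-in for your connection to the groonga server:

```python
import os

def rotate_log(log_path, send_command):
    """Rotate a groonga log file (sketch of the steps above)."""
    rotated_path = log_path + ".old"
    os.rename(log_path, rotated_path)  # step 1: output follows the moved file
    send_command("log_reopen")         # step 2: a new file appears at log_path
    return rotated_path
```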

Parameters
None.

Return value
[SUCCEEDED_OR_NOT]

SUCCEEDED_OR_NOT
Returns true if no error occurred, or false on error.

See also
log_level log_put

logical_count
Summary
New in version 5.0.0.

logical_count is a command to count matched records even when the actual records are
stored in partitioned tables. It is useful because there is less need to care about the
maximum number of records per table (see /limitations).

Note that this feature is not mature yet, so there are some limitations.

· Create partitioned tables whose names contain the "_YYYYMMDD" postfix. The postfix
is hardcoded, so you must create a table for each day.

· Load the proper data into the partitioned tables on your own.
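Because the "_YYYYMMDD" postfix is hardcoded, the per-day table names can be generated
mechanically. An illustrative sketch (shard_table_names is a hypothetical helper):

```python
from datetime import date, timedelta

def shard_table_names(logical_table, first_day, last_day):
    """List table names following the hardcoded
    LOGICAL_TABLE_NAME + "_YYYYMMDD" naming rule, one per day."""
    names = []
    day = first_day
    while day <= last_day:
        names.append("%s_%s" % (logical_table, day.strftime("%Y%m%d")))
        day += timedelta(days=1)
    return names
```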

Syntax
This command takes many parameters.

The required parameters are logical_table and shard_key:

logical_count logical_table
shard_key
[min]
[min_border]
[max]
[max_border]
[filter]

Usage
Register the sharding plugin in advance to use the logical_count command.

Note that logical_count is implemented as an experimental plugin, and the specification
may be changed in the future.

Here is a simple example that shows how to use this feature. Let's count specific logs
that are stored in multiple tables.

Here is the schema and data.

Execution example:

table_create Logs_20150203 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20150203 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20150203 message COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Logs_20150204 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20150204 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20150204 message COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Logs_20150205 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20150205 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20150205 message COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]

Execution example:

load --table Logs_20150203
[
{"timestamp": "2015-02-03 23:59:58", "message": "Start"},
{"timestamp": "2015-02-03 23:59:58", "message": "Shutdown"},
{"timestamp": "2015-02-03 23:59:59", "message": "Start"},
{"timestamp": "2015-02-03 23:59:59", "message": "Shutdown"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 4]
load --table Logs_20150204
[
{"timestamp": "2015-02-04 00:00:00", "message": "Start"},
{"timestamp": "2015-02-04 00:00:00", "message": "Shutdown"},
{"timestamp": "2015-02-04 00:00:01", "message": "Start"},
{"timestamp": "2015-02-04 00:00:01", "message": "Shutdown"},
{"timestamp": "2015-02-04 23:59:59", "message": "Start"},
{"timestamp": "2015-02-04 23:59:59", "message": "Shutdown"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 6]
load --table Logs_20150205
[
{"timestamp": "2015-02-05 00:00:00", "message": "Start"},
{"timestamp": "2015-02-05 00:00:00", "message": "Shutdown"},
{"timestamp": "2015-02-05 00:00:01", "message": "Start"},
{"timestamp": "2015-02-05 00:00:01", "message": "Shutdown"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 4]

There are three tables, one for each day from 2015 Feb 03 to 2015 Feb 05:

· Logs_20150203

· Logs_20150204

· Logs_20150205

Then, the data is loaded into each corresponding table.

Let's count the logs that contain "Shutdown" in the message column and whose timestamp
value is "2015-02-04 00:00:00" or later.

Here is the query to achieve the above purpose.

Execution example:

logical_count Logs timestamp --filter 'message == "Shutdown"' --min "2015-02-04 00:00:00" --min_border "include"
# [[0, 1337566253.89858, 0.000355720520019531], 5]
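You can check the count of 5 by hand against the loaded data. Here is the equivalent
computation in Python over the same records:

```python
from datetime import datetime

# The sample data from the load examples above: (timestamp, message).
logs = [
    ("2015-02-03 23:59:58", "Start"), ("2015-02-03 23:59:58", "Shutdown"),
    ("2015-02-03 23:59:59", "Start"), ("2015-02-03 23:59:59", "Shutdown"),
    ("2015-02-04 00:00:00", "Start"), ("2015-02-04 00:00:00", "Shutdown"),
    ("2015-02-04 00:00:01", "Start"), ("2015-02-04 00:00:01", "Shutdown"),
    ("2015-02-04 23:59:59", "Start"), ("2015-02-04 23:59:59", "Shutdown"),
    ("2015-02-05 00:00:00", "Start"), ("2015-02-05 00:00:00", "Shutdown"),
    ("2015-02-05 00:00:01", "Start"), ("2015-02-05 00:00:01", "Shutdown"),
]

# --filter 'message == "Shutdown"' with --min "2015-02-04 00:00:00" included.
min_timestamp = datetime(2015, 2, 4, 0, 0, 0)
count = sum(1 for timestamp, message in logs
            if message == "Shutdown"
            and datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S") >= min_timestamp)
```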

There is a well-known limitation on the number of records per table. With the sharding
feature, you can overcome it because the limitation is applied per table.

NOTE:
There is no convenient feature such as PARTITION BY in SQL. Thus, you must create each
table with table_create, including the "_YYYYMMDD" postfix in the table name.

Parameters
This section describes parameters of logical_count.

Required parameters
There are required parameters, logical_table and shard_key.

logical_table
Specifies the logical table name. It means the table name without the "_YYYYMMDD"
postfix. If you use actual tables such as "Logs_20150203", "Logs_20150204" and so on,
the logical table name is "Logs".

shard_key
Specifies the name of the column that is treated as the shard key in each partitioned
table.

Optional parameters
There are optional parameters.

min
Specifies the minimum value of shard_key.

min_border
Specifies whether the minimum value itself is included or not. Specify include or
exclude as the value of this parameter.

max
Specifies the maximum value of shard_key.

max_border
Specifies whether the maximum value itself is included or not. Specify include or
exclude as the value of this parameter.
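Taken together, min, max and the *_border parameters describe an interval. Here is a
sketch of the equivalent predicate (include keeps the boundary value itself, exclude
drops it; in_shard_range is an illustrative helper, not a Groonga API):

```python
def in_shard_range(value, min_value=None, min_border="include",
                   max_value=None, max_border="include"):
    """Return whether value falls inside the interval described by the
    min/max and min_border/max_border parameters."""
    if min_value is not None:
        if value < min_value or (min_border == "exclude" and value == min_value):
            return False
    if max_value is not None:
        if value > max_value or (max_border == "exclude" and value == max_value):
            return False
    return True
```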

filter
Specifies the filter condition. It corresponds to select-filter in select.
Return value
TODO

[HEADER, LOGICAL_COUNT]

logical_parameters
Summary
New in version 5.0.6.

logical_parameters is a command for test. Normally, you don't need to use this command.

logical_parameters provides the following two features:

· It returns the current parameters for logical_* commands.

· It sets new parameters for logical_* commands.

Here is a list of parameters:

· range_index

NOTE:
The parameters are independent in each thread (to be exact, each grn_ctx). If you want
to control the parameters completely, you should reduce the maximum number of threads
to 1 with /reference/commands/thread_limit while you're using the parameters.

Syntax
This command takes only one optional parameter:

logical_parameters [range_index=null]

Usage
You need to register sharding plugin to use this command:

Execution example:

plugin_register sharding
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can get all the current parameter values by calling the command without parameters:

Execution example:

logical_parameters
# [[0, 1337566253.89858, 0.000355720520019531], {"range_index": "auto"}]

You can set new values by calling with parameters:

Execution example:

logical_parameters --range_index never
# [[0, 1337566253.89858, 0.000355720520019531], {"range_index": "auto"}]

logical_parameters returns the parameter values before new values are set when you set new
values.

Parameters
This section describes parameters.

Required parameters
There is no required parameter.

Optional parameters
There is one optional parameter.

range_index
Specifies how to use range index in logical_range_filter by keyword.

Here are available keywords:

· auto (default)

· always

· never

If auto is specified, the range index is used only when it'll be efficient. This is the
default value.

Execution example:

logical_parameters --range_index auto
# [[0, 1337566253.89858, 0.000355720520019531], {"range_index": "never"}]

If always is specified, the range index is always used. It'll be useful for testing a
case where the range index is used.

Execution example:

logical_parameters --range_index always
# [[0, 1337566253.89858, 0.000355720520019531], {"range_index": "auto"}]

If never is specified, the range index is never used. It'll be useful for testing a
case where the range index isn't used.

Execution example:

logical_parameters --range_index never
# [[0, 1337566253.89858, 0.000355720520019531], {"range_index": "always"}]

Return value
The command returns the current parameters for logical_* command:

[
HEADER,
{"range_index": HOW_TO_USE_RANGE_INDEX}
]

HOW_TO_USE_RANGE_INDEX value is one of the followings:

· "auto"

· "always"

· "never"

See /reference/command/output_format for HEADER.

logical_range_filter
Summary
New in version 5.0.0.

TODO: Write summary

Syntax
This command takes many parameters.

The required parameters are logical_table and shard_key:

logical_range_filter
logical_table
shard_key
[min=null]
[min_border=null]
[max=null]
[max_border=null]
[order=ascending]
[filter=null]
[offset=0]
[limit=10]
[output_columns=_key,*]
[use_range_index=null]

There are some parameters that can be used only as named parameters. You can't use them
as ordered parameters; you must specify the parameter name.

Here are the parameters that can be used only as named parameters:

· cache=no

Usage
Register the sharding plugin in advance to use the logical_range_filter command.

TODO: Add examples

Parameters
This section describes parameters of logical_range_filter.

Required parameters
There are required parameters, logical_table and shard_key.

logical_table
Specifies the logical table name. It means the table name without the "_YYYYMMDD"
postfix. If you use actual tables such as "Logs_20150203", "Logs_20150204" and so on,
the logical table name is "Logs".

TODO: Add examples

shard_key
Specifies the name of the column that is treated as the shard key in each partitioned
table.

TODO: Add examples

Optional parameters
There are optional parameters.

min
Specifies the minimum value of shard_key.

TODO: Add examples

min_border
Specifies whether the minimum value itself is included or not. Specify include or
exclude as the value of this parameter.

TODO: Add examples

max
Specifies the maximum value of shard_key.

TODO: Add examples

max_border
Specifies whether the maximum value itself is included or not. Specify include or
exclude as the value of this parameter.

TODO: Add examples

order
TODO

filter
TODO

offset
TODO

limit
TODO

output_columns
TODO

use_range_index
Specifies whether range_index is used or not. Note that this is a parameter for
testing. It should not be used in production.

TODO: Add examples

Cache related parameter
cache
Specifies whether to cache the result of this query or not.

If the result of this query is cached, the next same query returns its response quickly
by using the cache.

It doesn't control whether an existing cached result is used or not.

Here are available values:

┌──────┬──────────────────────────────────┐
│Value │ Description                      │
├──────┼──────────────────────────────────┤
│no    │ Don't cache the output of this   │
│      │ query.                           │
├──────┼──────────────────────────────────┤
│yes   │ Cache the output of this query.  │
│      │ It's the default value.          │
└──────┴──────────────────────────────────┘

TODO: Add examples

The default value is yes.

Return value
TODO

[HEADER, LOGICAL_FILTERED]

logical_select
Summary
New in version 5.0.5.

logical_select is a sharding version of select. logical_select searches records from
multiple tables and outputs them.

You need to register the sharding plugin with plugin_register because logical_select is
included in the sharding plugin.

Syntax
This command takes many parameters.

The required parameters are logical_table and shard_key. Other parameters are optional:

logical_select logical_table
shard_key
[min=null]
[min_border="include"]
[max=null]
[max_border="include"]
[filter=null]
[sortby=null]
[output_columns="_id, _key, *"]
[offset=0]
[limit=10]
[drilldown=null]
[drilldown_sortby=null]
[drilldown_output_columns="_key, _nsubrecs"]
[drilldown_offset=0]
[drilldown_limit=10]
[drilldown_calc_types=NONE]
[drilldown_calc_target=null]

logical_select has the following named parameters for advanced drilldown:

· drilldown[${LABEL}].keys=null

· drilldown[${LABEL}].sortby=null

· drilldown[${LABEL}].output_columns="_key, _nsubrecs"

· drilldown[${LABEL}].offset=0

· drilldown[${LABEL}].limit=10

· drilldown[${LABEL}].calc_types=NONE

· drilldown[${LABEL}].calc_target=null

You can use one or more alphanumeric characters, _ and . for ${LABEL}. For example,
parent.sub1 is a valid ${LABEL}.

Parameters that have the same ${LABEL} are grouped.

For example, the following parameters specify one drilldown:

· --drilldown[label].keys column

· --drilldown[label].sortby -_nsubrecs

The following parameters specify two drilldowns:

· --drilldown[label1].keys column1

· --drilldown[label1].sortby -_nsubrecs

· --drilldown[label2].keys column2

· --drilldown[label2].sortby _key
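The label-based grouping can be sketched as follows; group_drilldowns is a hypothetical
helper for illustration, not a Groonga API:

```python
import re

# A label is one or more alphanumeric characters, _ and .
LABELED = re.compile(r"drilldown\[([A-Za-z0-9_.]+)\]\.([a-z_]+)")

def group_drilldowns(parameters):
    """Group drilldown[LABEL].x parameters by LABEL (sketch)."""
    groups = {}
    for name, value in parameters.items():
        match = LABELED.fullmatch(name)
        if match:
            label, key = match.groups()
            groups.setdefault(label, {})[key] = value
    return groups
```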

Differences from select
Most logical_select features can be used like the corresponding select features. For
example, the parameter names are the same, the output format is the same and so on.

But there are some differences from select:

· logical_table and shard_key parameters are required instead of table parameter.

· sortby isn't supported when multiple shards are used. (It is supported when only
one shard is used.)

· _value.${KEY_NAME} in drilldown[${LABEL}].sortby doesn't work with multiple shards.
It works with one shard. _key in drilldown[${LABEL}].sortby works with multiple
shards.

· match_columns and query aren't supported yet.

· cache isn't supported yet.

· match_escalation_threshold isn't supported yet.

· query_flags isn't supported yet.

· query_expander isn't supported yet.

· adjuster isn't supported yet.

Usage
Let's learn about logical_select usage with examples. This section shows many popular
usages.

You need to register the sharding plugin because logical_select is included in the
sharding plugin.

Execution example:

plugin_register sharding
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here are a schema definition and sample data to show usage.

Execution example:

table_create Entries_20150708 TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries_20150708 created_at COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries_20150708 content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries_20150708 n_likes COLUMN_SCALAR UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries_20150708 tag COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Entries_20150709 TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries_20150709 created_at COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries_20150709 content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries_20150709 n_likes COLUMN_SCALAR UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries_20150709 tag COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_key_index_20150708 \
COLUMN_INDEX|WITH_POSITION Entries_20150708 _key
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_content_index_20150708 \
COLUMN_INDEX|WITH_POSITION Entries_20150708 content
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_key_index_20150709 \
COLUMN_INDEX|WITH_POSITION Entries_20150709 _key
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_content_index_20150709 \
COLUMN_INDEX|WITH_POSITION Entries_20150709 content
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Entries_20150708
[
{"_key": "The first post!",
"created_at": "2015/07/08 00:00:00",
"content": "Welcome! This is my first post!",
"n_likes": 5,
"tag": "Hello"},
{"_key": "Groonga",
"created_at": "2015/07/08 01:00:00",
"content": "I started to use Groonga. It's very fast!",
"n_likes": 10,
"tag": "Groonga"},
{"_key": "Mroonga",
"created_at": "2015/07/08 02:00:00",
"content": "I also started to use Mroonga. It's also very fast! Really fast!",
"n_likes": 15,
"tag": "Groonga"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
load --table Entries_20150709
[
{"_key": "Good-bye Senna",
"created_at": "2015/07/09 00:00:00",
"content": "I migrated all Senna system!",
"n_likes": 3,
"tag": "Senna"},
{"_key": "Good-bye Tritonn",
"created_at": "2015/07/09 01:00:00",
"content": "I also migrated all Tritonn system!",
"n_likes": 3,
"tag": "Senna"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]

There are two tables, Entries_20150708 and Entries_20150709, for blog entries.

NOTE:
You need to use ${LOGICAL_TABLE_NAME}_${YYYYMMDD} naming rule for table names. In this
example, LOGICAL_TABLE_NAME is Entries and YYYYMMDD is 20150708 or 20150709.

An entry has a title, a created time, content, the number of likes and a tag. The title
is the key of Entries_YYYYMMDD. The created time is the value of the
Entries_YYYYMMDD.created_at column, the content is the value of the
Entries_YYYYMMDD.content column, the number of likes is the value of the
Entries_YYYYMMDD.n_likes column and the tag is the value of the Entries_YYYYMMDD.tag
column.

The Entries_YYYYMMDD._key and Entries_YYYYMMDD.content columns are indexed using the
TokenBigram tokenizer, so both of them are ready for full text search.

OK. The schema and data for examples are ready.

Simple usage
TODO

Parameters
This section describes parameters of logical_select.

Required parameters
There are required parameters, logical_table and shard_key.

logical_table
Specifies the logical table name. It means the table name without the _YYYYMMDD
postfix. If you use actual tables such as Entries_20150708, Entries_20150709 and so on,
the logical table name is Entries.

You can show 10 records by specifying logical_table and shard_key parameters. They are
required parameters.

Execution example:

logical_select --logical_table Entries --shard_key created_at
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "created_at",
# "Time"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 1436281200.0,
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 1436284800.0,
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 1436288400.0,
# 15,
# "Groonga"
# ],
# [
# 1,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 1436367600.0,
# 3,
# "Senna"
# ],
# [
# 2,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 1436371200.0,
# 3,
# "Senna"
# ]
# ]
# ]
# ]
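The body above follows the select output layout: [[[n_hits], [[name, type], ...],
row, ...]]. Here is a small sketch that unpacks such a body into dictionaries
(parse_result_set is an illustrative helper, not part of Groonga):

```python
def parse_result_set(body):
    """Unpack a select-style body into (n_hits, list of row dicts)."""
    result_set = body[0]
    n_hits = result_set[0][0]
    column_names = [name for name, _type in result_set[1]]
    rows = [dict(zip(column_names, row)) for row in result_set[2:]]
    return n_hits, rows
```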

If a nonexistent table is specified, an error is returned.

Execution example:

logical_select --logical_table Nonexistent --shard_key created_at
# [
# [
# -22,
# 1337566253.89858,
# 0.000355720520019531,
# "[logical_select] no shard exists: logical_table: <Nonexistent>: shard_key: <created_at>",
# [
# [
# "Groonga::Context.set_groonga_error",
# "lib/mrb/scripts/context.rb",
# 27
# ]
# ]
# ]
# ]

shard_key
Specifies the name of the column that is treated as the shard key. The shard key is a
column that stores the data used for distributing records to suitable shards.

The shard key must be of Time type for now.

See logical_table for how to specify shard_key.

Optional parameters
There are optional parameters.

min
Specifies the minimum value of the shard_key column. If a shard doesn't have any
matched records, the shard isn't searched.

For example, if min is "2015/07/09 00:00:00", Entries_20150708 isn't searched, because
Entries_20150708 has only records for "2015/07/08".

The following example only uses the Entries_20150709 table. Entries_20150708 isn't used.

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--min "2015/07/09 00:00:00"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "created_at",
# "Time"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 1436367600.0,
# 3,
# "Senna"
# ],
# [
# 2,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 1436371200.0,
# 3,
# "Senna"
# ]
# ]
# ]
# ]

min_border
Specifies whether the minimum value is included or not. Here are the available values.

┌────────┬──────────────────────────────────┐
│Value   │ Description                      │
├────────┼──────────────────────────────────┤
│include │ Includes min value. This is the  │
│        │ default.                         │
├────────┼──────────────────────────────────┤
│exclude │ Doesn't include min value.       │
└────────┴──────────────────────────────────┘

Here is an example for exclude. The result doesn't include the "Good-bye Senna" record
because its created_at value is "2015/07/09 00:00:00".

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--min "2015/07/09 00:00:00" \
--min_border "exclude"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "created_at",
# "Time"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 1436371200.0,
# 3,
# "Senna"
# ]
# ]
# ]
# ]

max
Specifies the maximum value of the shard_key column. If a shard doesn't have any
matched records, the shard isn't searched.

For example, if max is "2015/07/08 23:59:59", Entries_20150709 isn't searched, because
Entries_20150709 has only records for "2015/07/09".

The following example only uses the Entries_20150708 table. Entries_20150709 isn't used.

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--max "2015/07/08 23:59:59"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "created_at",
# "Time"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 1436281200.0,
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 1436284800.0,
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 1436288400.0,
# 15,
# "Groonga"
# ]
# ]
# ]
# ]

max_border
Specifies whether the maximum value is included or not. Here are the available values.

┌────────┬──────────────────────────────────┐
│Value   │ Description                      │
├────────┼──────────────────────────────────┤
│include │ Includes max value. This is the  │
│        │ default.                         │
├────────┼──────────────────────────────────┤
│exclude │ Doesn't include max value.       │
└────────┴──────────────────────────────────┘

Here is an example for exclude. The result doesn't include the "Good-bye Senna" record
because its created_at value is "2015/07/09 00:00:00".

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--max "2015/07/09 00:00:00" \
--max_border "exclude"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "created_at",
# "Time"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 1436281200.0,
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 1436284800.0,
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 1436288400.0,
# 15,
# "Groonga"
# ]
# ]
# ]
# ]

Search related parameters
logical_select provides select compatible search related parameters.

match_columns and query aren't supported yet. Only filter is supported for now.

match_columns
Not implemented yet.

query
Not implemented yet.

filter
Corresponds to select-filter in select. See select-filter for details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--filter "n_likes <= 5"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "created_at",
# "Time"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 1436281200.0,
# 5,
# "Hello"
# ],
# [
# 1,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 1436367600.0,
# 3,
# "Senna"
# ],
# [
# 2,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 1436371200.0,
# 3,
# "Senna"
# ]
# ]
# ]
# ]

Advanced search parameters
logical_select doesn't implement advanced search parameters yet.

match_escalation_threshold
Not implemented yet.

query_flags
Not implemented yet.

query_expander
Not implemented yet.

Output related parameters
output_columns
Corresponds to select-output-columns in select. See select-output-columns for details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--output_columns '_key, *'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "created_at",
# "Time"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# "The first post!",
# "Welcome! This is my first post!",
# 1436281200.0,
# 5,
# "Hello"
# ],
# [
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 1436284800.0,
# 10,
# "Groonga"
# ],
# [
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 1436288400.0,
# 15,
# "Groonga"
# ],
# [
# "Good-bye Senna",
# "I migrated all Senna system!",
# 1436367600.0,
# 3,
# "Senna"
# ],
# [
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 1436371200.0,
# 3,
# "Senna"
# ]
# ]
# ]
# ]

sortby
Corresponds to select-sortby in select. See select-sortby for details.

sortby has a limitation: it works only when the number of search target shards is one. If
there are two or more search target shards, sortby doesn't work.

Here is an example that uses only one shard:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--min "2015/07/08 00:00:00" \
--min_border "include" \
--max "2015/07/09 00:00:00" \
--max_border "exclude" \
--sortby _key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "created_at",
# "Time"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 1436284800.0,
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 1436288400.0,
# 15,
# "Groonga"
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 1436281200.0,
# 5,
# "Hello"
# ]
# ]
# ]
# ]

offset
Corresponds to select-offset in select. See select-offset for details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--offset 2
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "created_at",
# "Time"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 1436288400.0,
# 15,
# "Groonga"
# ],
# [
# 1,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 1436367600.0,
# 3,
# "Senna"
# ],
# [
# 2,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 1436371200.0,
# 3,
# "Senna"
# ]
# ]
# ]
# ]

limit
Corresponds to select-limit in select. See select-limit for details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--limit 2
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "created_at",
# "Time"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 1436281200.0,
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 1436284800.0,
# 10,
# "Groonga"
# ]
# ]
# ]
# ]

scorer
Not implemented yet.

Drilldown related parameters
All drilldown related parameters in select are supported. See
select-drilldown-related-parameters for details.

drilldown
Corresponds to select-drilldown in select. See select-drilldown for details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--output_columns _key,tag \
--drilldown tag
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# "The first post!",
# "Hello"
# ],
# [
# "Groonga",
# "Groonga"
# ],
# [
# "Mroonga",
# "Groonga"
# ],
# [
# "Good-bye Senna",
# "Senna"
# ],
# [
# "Good-bye Tritonn",
# "Senna"
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 1
# ],
# [
# "Groonga",
# 2
# ],
# [
# "Senna",
# 2
# ]
# ]
# ]
# ]

drilldown_sortby
Corresponds to select-drilldown-sortby in select. See select-drilldown-sortby for details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--limit 0 \
--output_columns _id \
--drilldown tag \
--drilldown_sortby -_nsubrecs,_key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Groonga",
# 2
# ],
# [
# "Senna",
# 2
# ],
# [
# "Hello",
# 1
# ]
# ]
# ]
# ]

drilldown_output_columns
Corresponds to select-drilldown-output-columns in select. See
select-drilldown-output-columns for details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--limit 0 \
--output_columns _id \
--drilldown tag \
--drilldown_output_columns _key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# "Hello"
# ],
# [
# "Groonga"
# ],
# [
# "Senna"
# ]
# ]
# ]
# ]

drilldown_offset
Corresponds to select-drilldown-offset in select. See select-drilldown-offset for details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--limit 0 \
--output_columns _id \
--drilldown tag \
--drilldown_offset 1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Groonga",
# 2
# ],
# [
# "Senna",
# 2
# ]
# ]
# ]
# ]

drilldown_limit
Corresponds to select-drilldown-limit in select. See select-drilldown-limit for details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--limit 0 \
--output_columns _id \
--drilldown tag \
--drilldown_limit 2
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 1
# ],
# [
# "Groonga",
# 2
# ]
# ]
# ]
# ]

drilldown_calc_types
Corresponds to select-drilldown-calc-types in select. See select-drilldown-calc-types for
details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--limit -1 \
--output_columns tag,n_likes \
--drilldown tag \
--drilldown_calc_types MAX,MIN,SUM,AVG \
--drilldown_calc_target n_likes \
--drilldown_output_columns _key,_nsubrecs,_max,_min,_sum,_avg
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "tag",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# "Hello",
# 5
# ],
# [
# "Groonga",
# 10
# ],
# [
# "Groonga",
# 15
# ],
# [
# "Senna",
# 3
# ],
# [
# "Senna",
# 3
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ],
# [
# "_max",
# "Int64"
# ],
# [
# "_min",
# "Int64"
# ],
# [
# "_sum",
# "Int64"
# ],
# [
# "_avg",
# "Float"
# ]
# ],
# [
# "Hello",
# 1,
# 5,
# 5,
# 5,
# 5.0
# ],
# [
# "Groonga",
# 2,
# 15,
# 10,
# 25,
# 12.5
# ],
# [
# "Senna",
# 2,
# 3,
# 3,
# 6,
# 3.0
# ]
# ]
# ]
# ]
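The aggregate columns in the drilldown above (_nsubrecs, _max, _min, _sum and _avg) can be reproduced with plain Python over the (tag, n_likes) pairs. This is just a model of the calculation, not Groonga code:

```python
from collections import defaultdict

# (tag, n_likes) pairs from the Entries logical table.
records = [
    ("Hello", 5),
    ("Groonga", 10),
    ("Groonga", 15),
    ("Senna", 3),
    ("Senna", 3),
]

# Group records by the drilldown key (tag).
groups = defaultdict(list)
for tag, n_likes in records:
    groups[tag].append(n_likes)

# Per key: _nsubrecs, _max, _min, _sum, _avg over the calc target.
stats = {
    tag: (len(v), max(v), min(v), sum(v), sum(v) / len(v))
    for tag, v in groups.items()
}
print(stats["Groonga"])
# (2, 15, 10, 25, 12.5)
```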

drilldown_calc_target
Corresponds to select-drilldown-calc-target in select. See select-drilldown-calc-target
for details.

See also drilldown_calc_types for an example.

Advanced drilldown related parameters
All advanced drilldown related parameters in select are supported. See
select-advanced-drilldown-related-parameters for details.

There are some limitations:

· _value.${KEY_NAME} in drilldown[${LABEL}].sortby doesn't work with multiple shards;
it works with only one shard. _key in drilldown[${LABEL}].sortby works with
multiple shards.

drilldown[${LABEL}].keys
Corresponds to select-drilldown-label-keys in select. See select-drilldown-label-keys for
details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--limit 0 \
--output_columns _id \
--drilldown[tag.n_likes].keys tag,n_likes \
--drilldown[tag.n_likes].output_columns _value.tag,_value.n_likes,_nsubrecs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ]
# ]
# ],
# {
# "tag.n_likes": [
# [
# 4
# ],
# [
# [
# "tag",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 5,
# 1
# ],
# [
# "Groonga",
# 10,
# 1
# ],
# [
# "Groonga",
# 15,
# 1
# ],
# [
# "Senna",
# 3,
# 2
# ]
# ]
# }
# ]
# ]

drilldown[${LABEL}].output_columns
Corresponds to select-drilldown-label-output-columns in select. See
select-drilldown-label-output-columns for details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--limit 0 \
--output_columns _id \
--drilldown[tag].keys tag \
--drilldown[tag].output_columns _key,_nsubrecs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ]
# ]
# ],
# {
# "tag": [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 1
# ],
# [
# "Groonga",
# 2
# ],
# [
# "Senna",
# 2
# ]
# ]
# }
# ]
# ]

drilldown[${LABEL}].sortby
Corresponds to drilldown_sortby in not labeled drilldown.

drilldown[${LABEL}].sortby has a limitation.

_value.${KEY_NAME} in drilldown[${LABEL}].sortby doesn't work with multiple shards; it
works with only one shard. _key in drilldown[${LABEL}].sortby works with multiple shards.

Here is an example that uses _value.${KEY_NAME} with only one shard:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--min "2015/07/08 00:00:00" \
--min_border "include" \
--max "2015/07/09 00:00:00" \
--max_border "exclude" \
--limit 0 \
--output_columns _id \
--drilldown[tag.n_likes].keys tag,n_likes \
--drilldown[tag.n_likes].output_columns _nsubrecs,_value.n_likes,_value.tag \
--drilldown[tag.n_likes].sortby -_nsubrecs,_value.n_likes,_value.tag
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ]
# ]
# ],
# {
# "tag.n_likes": [
# [
# 3
# ],
# [
# [
# "_nsubrecs",
# "Int32"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# 5,
# "Hello"
# ],
# [
# 1,
# 10,
# "Groonga"
# ],
# [
# 1,
# 15,
# "Groonga"
# ]
# ]
# }
# ]
# ]

drilldown[${LABEL}].offset
Corresponds to drilldown_offset in not labeled drilldown.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--limit 0 \
--output_columns _id \
--drilldown[tag.n_likes].keys tag \
--drilldown[tag.n_likes].offset 1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ]
# ]
# ],
# {
# "tag.n_likes": [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Groonga",
# 2
# ],
# [
# "Senna",
# 2
# ]
# ]
# }
# ]
# ]

drilldown[${LABEL}].limit
Corresponds to drilldown_limit in not labeled drilldown.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--limit 0 \
--output_columns _id \
--drilldown[tag.n_likes].keys tag \
--drilldown[tag.n_likes].limit 2
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ]
# ]
# ],
# {
# "tag.n_likes": [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 1
# ],
# [
# "Groonga",
# 2
# ]
# ]
# }
# ]
# ]

drilldown[${LABEL}].calc_types
Corresponds to drilldown_calc_types in not labeled drilldown.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--limit 0 \
--output_columns _id \
--drilldown[tag].keys tag \
--drilldown[tag].calc_types MAX,MIN,SUM,AVG \
--drilldown[tag].calc_target n_likes \
--drilldown[tag].output_columns _key,_nsubrecs,_max,_min,_sum,_avg
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ]
# ]
# ],
# {
# "tag": [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ],
# [
# "_max",
# "Int64"
# ],
# [
# "_min",
# "Int64"
# ],
# [
# "_sum",
# "Int64"
# ],
# [
# "_avg",
# "Float"
# ]
# ],
# [
# "Hello",
# 1,
# 5,
# 5,
# 5,
# 5.0
# ],
# [
# "Groonga",
# 2,
# 15,
# 10,
# 25,
# 12.5
# ],
# [
# "Senna",
# 2,
# 3,
# 3,
# 6,
# 3.0
# ]
# ]
# }
# ]
# ]

drilldown[${LABEL}].calc_target
Corresponds to drilldown_calc_target in not labeled drilldown.

See also drilldown[${LABEL}].calc_types for an example.

Return value
The return value format of logical_select is compatible with select. See
select-return-value for details.

logical_shard_list
Summary
New in version 5.0.7.

logical_shard_list returns all existing shard names for the specified logical table.

Syntax
This command takes only one required parameter:

logical_shard_list logical_table

Usage
You need to register the sharding plugin to use this command:

Execution example:

plugin_register sharding
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here are sample shards:

Execution example:

table_create Logs_20150801 TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20150801 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Logs_20150802 TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20150802 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Logs_20150930 TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20150930 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can get all the shard names in ascending order by specifying Logs as the logical table
name:

Execution example:

logical_shard_list --logical_table Logs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "name": "Logs_20150801"
# },
# {
# "name": "Logs_20150802"
# },
# {
# "name": "Logs_20150930"
# }
# ]
# ]

Parameters
This section describes parameters.

Required parameters
There is one required parameter.

logical_table
Specifies the logical table name. logical_shard_list returns a list of the shard names of
the logical table:

Execution example:

logical_shard_list --logical_table Logs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "name": "Logs_20150801"
# },
# {
# "name": "Logs_20150802"
# },
# {
# "name": "Logs_20150930"
# }
# ]
# ]

The list is sorted by shard name in ascending order.
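The shard enumeration can be sketched in Python. This is a simplified model, not Groonga's implementation: a shard of a logical table is assumed to be any table whose name starts with the logical table name plus an underscore.

```python
def list_shards(table_names, logical_table):
    # A shard of a logical table is a table named
    # "<logical_table>_<postfix>"; return them in ascending order.
    prefix = logical_table + "_"
    return sorted(name for name in table_names if name.startswith(prefix))

tables = ["Logs_20150930", "Bookmarks", "Logs_20150801", "Logs_20150802"]
print(list_shards(tables, "Logs"))
# ['Logs_20150801', 'Logs_20150802', 'Logs_20150930']
```

Because shard postfixes are dates in _YYYYMMDD form, sorting by name also sorts the shards chronologically.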

Optional parameters
There is no optional parameter.

Return value
The command returns a list of shard names in ascending order:

[
HEADER,
[
{"name": "SHARD_NAME_1"},
{"name": "SHARD_NAME_2"},
...
{"name": "SHARD_NAME_N"}
]
]

See /reference/command/output_format for HEADER.

See also
· /reference/sharding

logical_table_remove
Summary
New in version 5.0.5.

logical_table_remove removes tables and their columns for the specified logical table. If
there are one or more indexes on the keys or columns of those tables, they are also
removed.

If you specify only part of a shard, the table for the shard isn't removed.
logical_table_remove just deletes the matched records in the table.

For example, there are the following records in a table:

· Record1: 2016-03-18 00:30:00

· Record2: 2016-03-18 01:00:00

· Record3: 2016-03-18 02:00:00

logical_table_remove deletes "Record1" and "Record2" when you specify the range between
2016-03-18 00:00:00 and 2016-03-18 01:30:00. It doesn't delete "Record3", and it doesn't
remove the table.

New in version 6.0.1: You can also remove tables and columns that reference the target
table, and tables related to the target shard, by using the dependent parameter.

Syntax
This command takes many parameters.

The required parameters are logical_table and shard_key:

logical_table_remove logical_table
shard_key
[min=null]
[min_border="include"]
[max=null]
[max_border="include"]
[dependent=no]

Usage
You specify the logical table name and the shard key of the tables you want to remove.

This section describes the following:

· Basic usage

· Removes parts of a logical table

· Unremovable cases

· Removes with related tables

· Decreases used resources

Basic usage
Register the sharding plugin in advance to use this command.

Execution example:

register sharding
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can remove all tables for the logical table by specifying only logical_table and
shard_key.

Here are commands to create 2 shards:

Execution example:

table_create Logs_20160318 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20160318 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Logs_20160319 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20160319 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can confirm existing shards by logical_shard_list:

Execution example:

logical_shard_list --logical_table Logs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "name": "Logs_20160318"
# },
# {
# "name": "Logs_20160319"
# }
# ]
# ]

You can remove all shards:

Execution example:

logical_table_remove \
--logical_table Logs \
--shard_key timestamp
# [[0, 1337566253.89858, 0.000355720520019531], true]

There are no shards after you remove all shards:

Execution example:

logical_shard_list --logical_table Logs
# [[0, 1337566253.89858, 0.000355720520019531], []]

Removes parts of a logical table
You can specify range of shards by the following parameters:

· min

· min_border

· max

· max_border

See the following documents of logical_select for each parameter:

· logical-select-min

· logical-select-min-border

· logical-select-max

· logical-select-max-border

If the specified range doesn't cover all records in a shard, the table for the shard isn't
removed. Only the target records in the table are deleted.

If the specified range covers all records in a shard, the table for the shard is removed.

Here is a logical table to show the behavior. The logical table has two shards:

Execution example:

table_create Logs_20160318 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20160318 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Logs_20160318
[
{"timestamp": "2016-03-18 00:30:00"},
{"timestamp": "2016-03-18 01:00:00"},
{"timestamp": "2016-03-18 02:00:00"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
table_create Logs_20160319 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20160319 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Logs_20160319
[
{"timestamp": "2016-03-19 00:30:00"},
{"timestamp": "2016-03-19 01:00:00"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]

There are the following records in Logs_20160318 table:

· Record1: "2016-03-18 00:30:00"

· Record2: "2016-03-18 01:00:00"

· Record3: "2016-03-18 02:00:00"

There are the following records in Logs_20160319 table:

· Record1: "2016-03-19 00:30:00"

· Record2: "2016-03-19 01:00:00"

The following range doesn't cover "Record1" in Logs_20160318 table but covers all records
in Logs_20160319 table:

┌───────────┬───────────────────────┐
│Parameter  │ Value                 │
├───────────┼───────────────────────┤
│min        │ "2016-03-18 01:00:00" │
├───────────┼───────────────────────┤
│min_border │ "include"             │
├───────────┼───────────────────────┤
│max        │ "2016-03-19 01:30:00" │
├───────────┼───────────────────────┤
│max_border │ "include"             │
└───────────┴───────────────────────┘

logical_table_remove with this range deletes "Record2" and "Record3" in Logs_20160318 table
but doesn't remove Logs_20160318 table, because "Record1" remains in it.

logical_table_remove with this range removes Logs_20160319 table because the range covers
all records in Logs_20160319 table.

Here is an example to use logical_table_remove with the range:

Execution example:

logical_table_remove \
--logical_table Logs \
--shard_key timestamp \
--min "2016-03-18 01:00:00" \
--min_border "include" \
--max "2016-03-19 01:30:00" \
--max_border "include"
# [[0, 1337566253.89858, 0.000355720520019531], true]

dump shows that there is "Record1" in Logs_20160318 table:

Execution example:

dump
# plugin_register sharding
#
# table_create Logs_20160318 TABLE_NO_KEY
# column_create Logs_20160318 timestamp COLUMN_SCALAR Time
#
# load --table Logs_20160318
# [
# ["_id","timestamp"],
# [1,1458228600.0]
# ]

Unremovable cases
There are some unremovable cases because logical_table_remove uses the same checks as
table_remove. See table-remove-unremovable-cases for details.

Removes with related tables
New in version 6.0.1.

If you understand the consequences, you can also remove tables and columns that depend on
the target shard with one logical_table_remove command by using the --dependent yes
parameter.

Here are the conditions for dependency. If a table or column satisfies one of the
conditions, it depends on the target shard:

· Tables and columns that reference the target shard

· Tables for the shard (= The table has the same _YYYYMMDD postfix as the target shard
and is referenced from the target shard)

If there are one or more tables or columns that reference the target shard,
logical_table_remove fails by default. This avoids dangling references.

The Bookmarks.log_20160320 column in the following example references the target
shard:

Execution example:

table_create Logs_20160320 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20160320 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Bookmarks TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Bookmarks log_20160320 COLUMN_SCALAR Logs_20160320
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can't remove Logs_20160320 by logical_table_remove by default:

Execution example:

logical_table_remove \
--logical_table Logs \
--shard_key timestamp
# [
# [
# -2,
# 1337566253.89858,
# 0.000355720520019531,
# "operation not permitted: <[table][remove] a column that references the table exists: <Bookmarks.log_20160320> -> <Logs_20160320",
# [
# [
# "Groonga::Sharding::LogicalTableRemoveCommand.remove_table",
# "/home/kou/work/c/groonga.clean/plugins/sharding/logical_table_remove.rb",
# 80
# ]
# ]
# ]
# ]

You can remove Logs_20160320 by logical_table_remove with --dependent yes parameter.
Bookmarks.log_20160320 is also removed:

Execution example:

logical_table_remove \
--logical_table Logs \
--shard_key timestamp \
--dependent yes
# [[0, 1337566253.89858, 0.000355720520019531], true]

object_exist shows that Logs_20160320 table and Bookmarks.log_20160320 column are removed:

Execution example:

object_exist Logs_20160320
# [[0, 1337566253.89858, 0.000355720520019531], false]
object_exist Bookmarks.log_20160320
# [[0, 1337566253.89858, 0.000355720520019531], false]

If there are one or more tables for the target shard, logical_table_remove with --dependent
yes also removes them. Tables that have the same _YYYYMMDD postfix as the target shard are
treated as tables for the target shard.

Here are two tables that have the _20160320 postfix. NotRelated_20160320 table isn't used
by Logs_20160320 table. Users_20160320 table is used by Logs_20160320 table. Servers table
also exists and is used by Logs_20160320 table:

Execution example:

table_create NotRelated_20160320 TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Users_20160320 TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Servers TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Logs_20160320 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20160320 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20160320 user COLUMN_SCALAR Users_20160320
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20160320 server COLUMN_SCALAR Servers
# [[0, 1337566253.89858, 0.000355720520019531], true]

logical_table_remove with the --dependent yes parameter removes only Logs_20160320 table
and Users_20160320 table, because Users_20160320 table has the _20160320 postfix and is
used by Logs_20160320. NotRelated_20160320 table and Servers table aren't removed:
NotRelated_20160320 table has the _20160320 postfix but isn't used by Logs_20160320, and
Servers table is used by Logs_20160320 but doesn't have the _20160320 postfix:

Execution example:

logical_table_remove \
--logical_table Logs \
--shard_key timestamp \
--dependent yes
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can confirm that Logs_20160320 table and Users_20160320 table are removed but
NotRelated_20160320 table and Servers table aren't removed:

Execution example:

object_exist Logs_20160320
# [[0, 1337566253.89858, 0.000355720520019531], false]
object_exist Users_20160320
# [[0, 1337566253.89858, 0.000355720520019531], false]
object_exist NotRelated_20160320
# [[0, 1337566253.89858, 0.000355720520019531], true]
object_exist Servers
# [[0, 1337566253.89858, 0.000355720520019531], true]
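The selection rule just demonstrated can be sketched in Python. This is a hypothetical model of the rule described above, not Groonga code: among the tables the shard references, only those sharing the shard's _YYYYMMDD postfix are removed together with it.

```python
def removed_with_shard(shard, referenced_tables):
    # With --dependent yes, a table referenced by the shard is removed
    # together with it only when it shares the shard's _YYYYMMDD postfix.
    postfix = "_" + shard.rsplit("_", 1)[1]
    return [t for t in referenced_tables if t.endswith(postfix)]

# Logs_20160320 references Users_20160320 and Servers;
# NotRelated_20160320 isn't referenced by the shard at all.
print(removed_with_shard("Logs_20160320", ["Users_20160320", "Servers"]))
# ['Users_20160320']
```

Servers is referenced but lacks the postfix, and NotRelated_20160320 has the postfix but is never referenced, so neither is removed.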

Decreases used resources
You can decrease resources for this command. See table-remove-decreases-used-resources for
details. Because logical_table_remove uses the same logic as table_remove.

Parameters
This section describes parameters of logical_table_remove.

Required parameters
There are two required parameters.

logical_table
Specifies the logical table name. It means the table name without the _YYYYMMDD postfix. If
you use actual tables such as Logs_20150203, Logs_20150204 and so on, the logical table
name is Logs.

See also logical-select-logical-table.

shard_key
Specifies the column name which is treated as the shard key.

See also logical-select-shard-key.

Optional parameters
There are optional parameters.

min
Specifies the minimum value of shard_key column.

See also logical-select-min.

min_border
Specifies whether the minimum value is included or not. include and exclude are available.
The default is include.

See also logical-select-min-border.

max
Specifies the maximum value of shard_key column.

See also logical-select-max.

max_border
Specifies whether the maximum value is included or not. include and exclude are available.
The default is include.

See also logical-select-max-border.

dependent
New in version 6.0.1.

Specifies whether tables and columns that depend on the target shard are also removed or
not.

Here are the conditions for dependency. If a table or column satisfies one of the
conditions, it depends on the target shard:

· Tables and columns that reference the target shard

· Tables for the shard (= The table has the same _YYYYMMDD postfix as the target shard
and is referenced from the target shard)

If this value is yes, tables and columns that depend on the target shard are also removed.
Otherwise, they aren't removed: if there are one or more tables that reference the target
shard, an error is returned, and tables for the shard are left untouched.

You should use this parameter carefully. This is a dangerous parameter.

See Removes with related tables how to use this parameter.

Return value
The command returns true as body on success such as:

[HEADER, true]

If the command fails, error details are in HEADER.

See /reference/command/output_format for HEADER.

normalize
NOTE:
This command is an experimental feature.

This command may be changed in the future.

Summary
normalize command normalizes text by the specified normalizer.

You don't need to create a table to use normalize command. It is useful for checking the
results of a normalizer.

Syntax
This command takes three parameters.

normalizer and string are required. Others are optional:

normalize normalizer
string
[flags=NONE]

Usage
Here is a simple example of normalize command.

Execution example:

normalize NormalizerAuto "aBcDe 123"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "normalized": "abcde 123",
# "types": [],
# "checks": []
# }
# ]
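For comparison, NormalizerAuto's result for this input can be approximated in Python with Unicode NFKC normalization plus lower-casing. This is only a rough stand-in; NormalizerAuto's actual rules depend on the database encoding and are more elaborate.

```python
import unicodedata

def normalizer_auto_like(text):
    # Rough approximation of NormalizerAuto: Unicode NFKC
    # normalization followed by lower-casing.
    return unicodedata.normalize("NFKC", text).lower()

print(normalizer_auto_like("aBcDe 123"))
# abcde 123
```

NFKC also folds full-width characters, so for example "ＡＢＣ" becomes "abc" with this sketch.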

Parameters
This section describes parameters of normalize.

Required parameters
There are two required parameters: normalizer and string.

normalizer
Specifies the normalizer name. normalize command uses the normalizer that is named
normalizer.

See /reference/normalizers about built-in normalizers.

Here is an example to use built-in NormalizerAuto normalizer.

TODO

If you want to use other normalizers, you need to register an additional normalizer plugin
by the register command. For example, you can use a MySQL compatible normalizer by
registering groonga-normalizer-mysql.

string
Specifies any string which you want to normalize.

If you want to include spaces in string, you need to quote string with single quotes (')
or double quotes (").

Here is an example to use spaces in string.

TODO

Optional parameters
There are optional parameters.

flags
Specifies normalization customization options. You can specify multiple options separated
by "|". For example, REMOVE_BLANK|WITH_TYPES.

Here are available flags.

┌────────────────────────────┬───────────────┐
│ Flag                       │ Description   │
├────────────────────────────┼───────────────┤
│ NONE                       │ Just ignored. │
├────────────────────────────┼───────────────┤
│ REMOVE_BLANK               │ TODO          │
├────────────────────────────┼───────────────┤
│ WITH_TYPES                 │ TODO          │
├────────────────────────────┼───────────────┤
│ WITH_CHECKS                │ TODO          │
├────────────────────────────┼───────────────┤
│ REMOVE_TOKENIZED_DELIMITER │ TODO          │
└────────────────────────────┴───────────────┘

Here is an example that uses REMOVE_BLANK.

TODO

Here is an example that uses WITH_TYPES.

TODO

Here is an example that uses REMOVE_TOKENIZED_DELIMITER.

TODO

Return value
[HEADER, normalized_text]

HEADER
See /reference/command/output_format about HEADER.

normalized_text
normalized_text is an object that has the following attributes.

┌────────────┬────────────────────────────────┐
│ Name       │ Description                    │
├────────────┼────────────────────────────────┤
│ normalized │ The normalized text.           │
├────────────┼────────────────────────────────┤
│ types      │ An array of types of the       │
│            │ normalized text. The N-th type │
│            │ shows the type of the N-th     │
│            │ character in normalized.       │
└────────────┴────────────────────────────────┘

See also
· /reference/normalizers

normalizer_list
Summary
normalizer_list command lists normalizers in a database.

Syntax
This command takes no parameters:

normalizer_list

Usage
Here is a simple example.

Execution example:

normalizer_list
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "name": "NormalizerAuto"
# },
# {
# "name": "NormalizerNFKC51"
# }
# ]
# ]

It returns normalizers in a database.

Return value
normalizer_list command returns normalizers. Each normalizer has an attribute that
contains its name. More attributes may be added in the future:

[HEADER, normalizers]

HEADER
See /reference/command/output_format about HEADER.

normalizers
normalizers is an array of normalizers. Each normalizer is an object that has the
following attributes.

┌──────┬──────────────────┐
│ Name │ Description      │
├──────┼──────────────────┤
│ name │ Normalizer name. │
└──────┴──────────────────┘

See also
· /reference/normalizers

· /reference/commands/normalize

object_exist
Summary
New in version 5.0.6.

object_exist returns whether an object with the specified name exists in the database.

It's a light operation. It just checks for the existence of the name in the database. It
doesn't load the specified object from disk.

object_exist doesn't check the object type. The existing object may be a table, a column,
a function and so on.

Syntax
This command takes only one required parameter:

object_exist name

Usage
You can check whether the name is already used in database:

Execution example:

object_exist Users
# [[0, 1337566253.89858, 0.000355720520019531], false]
table_create Users TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
object_exist Users
# [[0, 1337566253.89858, 0.000355720520019531], true]

The first object_exist Users returns false because the Users table doesn't exist yet.

The second object_exist Users returns true because the Users table has been created.

Parameters
This section describes all parameters.

Required parameters
There is only one required parameter.

name
Specifies the object name to be checked.

If you want to check existence of a column, use TABLE_NAME.COLUMN_NAME format like the
following:

Execution example:

table_create Logs TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
object_exist Logs.timestamp
# [[0, 1337566253.89858, 0.000355720520019531], true]

In Logs.timestamp, Logs is the table name and timestamp is the column name.

Optional parameters
There is no optional parameter.

Return value
The command returns true as body if an object with the specified name exists in the
database, such as:

[HEADER, true]

The command returns false otherwise such as:

[HEADER, false]

See /reference/command/output_format for HEADER.
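If you talk to Groonga over its HTTP interface (the /d/COMMAND?PARAMETERS form used
elsewhere in this manual), building the object_exist request is a one-liner. The sketch
below only builds the URL; the host and port are assumptions:

```python
from urllib.parse import urlencode

def object_exist_url(name, base="http://localhost:10041"):
    """Build an HTTP request URL for object_exist.

    For a column, pass the TABLE_NAME.COLUMN_NAME form described
    above, e.g. "Logs.timestamp".
    """
    return f"{base}/d/object_exist?{urlencode({'name': name})}"

print(object_exist_url("Users"))
# -> http://localhost:10041/d/object_exist?name=Users
print(object_exist_url("Logs.timestamp"))
# -> http://localhost:10041/d/object_exist?name=Logs.timestamp
```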

object_inspect
Summary
New in version 6.0.0.

object_inspect inspects an object. You can confirm details of an object.

For example:

· If the object is a table, you can confirm the number of records in the table.

· If the object is a column, you can confirm the type of value of the column.

Syntax
This command takes only one optional parameter:

object_inspect [name=null]

Usage
You can inspect an object in the database specified by name:

Execution example:

table_create Users TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Users
[
{"_key": "Alice"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
object_inspect Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "name": "Users",
# "n_records": 1,
# "value": {
# "type": null
# },
# "key": {
# "total_size": 5,
# "max_total_size": 4294967295,
# "type": {
# "size": 4096,
# "type": {
# "id": 32,
# "name": "type"
# },
# "id": 14,
# "name": "ShortText"
# }
# },
# "type": {
# "id": 48,
# "name": "table:hash_key"
# },
# "id": 256
# }
# ]

The object_inspect Users returns the following information:

· The name of the table: "name": "Users"

· The total used key size: "key": {"total_size": 5} ("Alice" is 5 bytes of data)

· The maximum total key size: "key": {"max_total_size": 4294967295}

· and so on.
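Because the inspection result is plain nested JSON, pulling out the values highlighted
above takes only a few lines. A sketch using a subset of the example output:

```python
import json

# Subset of the object_inspect Users output shown above.
raw = """
{
  "name": "Users",
  "n_records": 1,
  "key": {
    "total_size": 5,
    "max_total_size": 4294967295,
    "type": {"id": 14, "name": "ShortText", "size": 4096}
  },
  "type": {"id": 48, "name": "table:hash_key"},
  "id": 256
}
"""

info = json.loads(raw)
print(info["name"])                 # table name: Users
print(info["n_records"])            # number of records: 1
print(info["key"]["total_size"])    # total used key size: 5
print(info["key"]["type"]["name"])  # key type: ShortText
```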

You can inspect the database by not specifying name:

Execution example:

object_inspect
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "name_table": {
# "name": "",
# "n_records": 256,
# "value": null,
# "key": {
# "type": null
# },
# "type": {
# "id": 50,
# "name": "table:dat_key"
# },
# "id": 0
# },
# "type": {
# "id": 55,
# "name": "db"
# }
# }
# ]

The object_inspect returns the following information:

· The table type for object name management: "key": {"type": {"name": "table:dat_key"}}

· and so on.

Parameters
This section describes all parameters.

Required parameters
There is no required parameter.

Optional parameters
There is only one optional parameter.

name
Specifies the object name to be inspected.

If name isn't specified, the database is inspected.

Return value
The command returns an object (nested key and value pairs) that includes details of the
object (such as table) as body:

[HEADER, object]

See /reference/command/output_format for HEADER.

The format of the details depends on the object type. For example, a table has key
information but a function doesn't.

Database
Database inspection returns the following information:

{
"type": {
"id": DATABASE_TYPE_ID,
"name": DATABASE_TYPE_NAME
},
"name_table": DATABASE_NAME_TABLE
}

DATABASE_TYPE_ID
DATABASE_TYPE_ID is always 55.

DATABASE_TYPE_NAME
DATABASE_TYPE_NAME is always "db".

DATABASE_NAME_TABLE
DATABASE_NAME_TABLE is a table for managing object names in the database. The table is
table-pat-key or table-dat-key. Normally, it's table-dat-key.

See Table for format details.

Table
Table inspection returns the following information:

{
"name": TABLE_NAME,
"type": {
"id": TABLE_TYPE_ID,
"name": TABLE_TYPE_NAME
},
"key": {
"type": TABLE_KEY_TYPE,
"total_size": TABLE_KEY_TOTAL_SIZE
"max_total_size": TABLE_KEY_MAX_TOTAL_SIZE
},
"value": {
"type": TABLE_VALUE_TYPE,
},
"n_records": TABLE_N_RECORDS
}

There are some exceptions:

· table-no-key doesn't return key information because it doesn't have key.

· table-dat-key doesn't return value information because it doesn't have value.

TABLE_NAME
The name of the inspected table.

TABLE_TYPE_ID
The type ID of the inspected table.

Here is a list of type IDs:

┌────────────────┬────┐
│ Table type     │ ID │
├────────────────┼────┤
│ table-hash-key │ 48 │
├────────────────┼────┤
│ table-pat-key  │ 49 │
├────────────────┼────┤
│ table-dat-key  │ 50 │
├────────────────┼────┤
│ table-no-key   │ 51 │
└────────────────┴────┘

TABLE_TYPE_NAME
The type name of the inspected table.

Here is a list of type names:

┌────────────────┬──────────────────┐
│ Table type     │ Name             │
├────────────────┼──────────────────┤
│ table-hash-key │ "table:hash_key" │
├────────────────┼──────────────────┤
│ table-pat-key  │ "table:pat_key"  │
├────────────────┼──────────────────┤
│ table-dat-key  │ "table:dat_key"  │
├────────────────┼──────────────────┤
│ table-no-key   │ "table:no_key"   │
└────────────────┴──────────────────┘

TABLE_KEY_TYPE
The type of key of the inspected table.

See Type for format details.

TABLE_KEY_TOTAL_SIZE
The total key size of the inspected table in bytes.

TABLE_KEY_MAX_TOTAL_SIZE
The maximum total key size of the inspected table in bytes.

TABLE_VALUE_TYPE
The type of value of the inspected table.

See Type for format details.

TABLE_N_RECORDS
The number of records of the inspected table.

It's a 64-bit unsigned integer value.

Type
Type inspection returns the following information:

{
"id": TYPE_ID,
"name": TYPE_NAME,
"type": {
"id": TYPE_ID_OF_TYPE,
"name": TYPE_NAME_OF_TYPE
},
"size": TYPE_SIZE
}

TYPE_ID
The ID of the inspected type.

Here is an ID list of builtin types:

┌──────────────────────────────┬────┐
│ Type                         │ ID │
├──────────────────────────────┼────┤
│ builtin-type-bool            │ 3  │
├──────────────────────────────┼────┤
│ builtin-type-int8            │ 4  │
├──────────────────────────────┼────┤
│ builtin-type-uint8           │ 5  │
├──────────────────────────────┼────┤
│ builtin-type-int16           │ 6  │
├──────────────────────────────┼────┤
│ builtin-type-uint16          │ 7  │
├──────────────────────────────┼────┤
│ builtin-type-int32           │ 8  │
├──────────────────────────────┼────┤
│ builtin-type-uint32          │ 9  │
├──────────────────────────────┼────┤
│ builtin-type-int64           │ 10 │
├──────────────────────────────┼────┤
│ builtin-type-uint64          │ 11 │
├──────────────────────────────┼────┤
│ builtin-type-float           │ 12 │
├──────────────────────────────┼────┤
│ builtin-type-time            │ 13 │
├──────────────────────────────┼────┤
│ builtin-type-short-text      │ 14 │
├──────────────────────────────┼────┤
│ builtin-type-text            │ 15 │
├──────────────────────────────┼────┤
│ builtin-type-long-text       │ 16 │
├──────────────────────────────┼────┤
│ builtin-type-tokyo-geo-point │ 17 │
├──────────────────────────────┼────┤
│ builtin-type-wgs84-geo-point │ 18 │
└──────────────────────────────┴────┘

TYPE_NAME
The name of the inspected type.

Here is a name list of builtin types:

· builtin-type-bool

· builtin-type-int8

· builtin-type-uint8

· builtin-type-int16

· builtin-type-uint16

· builtin-type-int32

· builtin-type-uint32

· builtin-type-int64

· builtin-type-uint64

· builtin-type-float

· builtin-type-time

· builtin-type-short-text

· builtin-type-text

· builtin-type-long-text

· builtin-type-tokyo-geo-point

· builtin-type-wgs84-geo-point

TYPE_ID_OF_TYPE
TYPE_ID_OF_TYPE is always 32.

TYPE_NAME_OF_TYPE
TYPE_NAME_OF_TYPE is always type.

TYPE_SIZE
TYPE_SIZE is the size of the inspected type in bytes. If the inspected type is a
variable-size type, the size means the maximum size.

object_remove
Summary
New in version 6.0.0.

object_remove removes an object. You can remove any object including table, column,
command and so on. Normally, you should use specific remove command such as table_remove
and column_remove.

object_remove is dangerous because it can remove any object. You should use object_remove
carefully.

object_remove has "force mode". You can remove a broken object by "force mode". "Force
mode" is useful to resolve problems reported by /reference/executables/grndb.

Syntax
This command takes two parameters:

object_remove name
[force=no]

Usage
You can remove an object in the database specified by name:

Execution example:

object_remove Users
# [
# [
# -22,
# 1337566253.89858,
# 0.000355720520019531,
# "[object][remove] target object doesn't exist: <Users>",
# [
# [
# "command_object_remove",
# "proc_object.c",
# 121
# ]
# ]
# ],
# false
# ]
table_create Users TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
object_remove Users
# [[0, 1337566253.89858, 0.000355720520019531], true]

The first object_remove Users returns false because the Users table doesn't exist yet.

The second object_remove Users returns true because the Users table has been created.

You can't remove a broken object by default:

Execution example:

table_create Users TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
thread_limit 1
# [[0, 1337566253.89858, 0.000355720520019531], 1]
database_unmap
# [[0, 1337566253.89858, 0.000355720520019531], true]
echo "BROKEN" > ${DB_PATH}.0000100
object_remove Users
# [
# [
# -22,
# 1337566253.89858,
# 0.000355720520019531,
# "[object][remove] failed to open the target object: <Users>",
# [
# [
# "command_object_remove",
# "proc_object.c",
# 116
# ]
# ]
# ],
# false
# ]
object_exist Users
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can remove a broken object by --force yes:

Execution example:

object_remove Users --force yes
# [
# [
# -65,
# 1337566253.89858,
# 0.000355720520019531,
# "[io][open] file size is too small: <7>(required: >= 64): </tmp/groonga-databases/commands_object_remove.0000100>",
# [
# [
# "grn_io_open",
# "io.c",
# 565
# ]
# ]
# ],
# false
# ]
object_exist Users
# [[0, 1337566253.89858, 0.000355720520019531], false]

--force yes means you enable "force mode". You can remove a broken object in "force mode".

Parameters
This section describes all parameters.

Required parameters
There is only one required parameter.

name
Specifies the object name to be removed.

If you want to remove a column, use TABLE_NAME.COLUMN_NAME format like the following:

Execution example:

table_create Logs TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
object_remove Logs.timestamp
# [[0, 1337566253.89858, 0.000355720520019531], true]

In Logs.timestamp, Logs is the table name and timestamp is the column name.

Optional parameters
There is one optional parameter.

force
Specifies whether to remove the object in "force mode".

You can't remove a broken object by default. But you can remove a broken object in "force
mode".

force value must be yes or no. yes means that "force mode" is enabled. no means that
"force mode" is disabled.

The default value is no. It means that "force mode" is disabled by default.

Return value
The command returns true as body when the command removed the specified object without any
error. For example:

[HEADER, true]

The command returns false as body when the command gets any errors. For example:

[HEADER, false]

See /reference/command/output_format for HEADER.

Note that false doesn't always mean that "the command couldn't remove the object". If you
enable "force mode", the command removes the object even if the object is broken. In that
case, the object is removed and false is returned as body.

plugin_register
New in version 5.0.1.

Summary
plugin_register command registers a plugin. You need to register a plugin before you use a
plugin.

You need only one plugin_register command per plugin in the same database because
registered plugin information is written into the database. When you restart your groonga
process, the groonga process loads all registered plugins without plugin_register commands.

You can unregister a registered plugin by plugin_unregister.

Syntax
This command takes only one required parameter:

plugin_register name

Usage
Here is an example that registers the QueryExpanderTSV query expander, which is included
in ${PREFIX}/lib/groonga/plugins/query_expanders/tsv.so.

Execution example:

plugin_register query_expanders/tsv
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can omit ${PREFIX}/lib/groonga/plugins/ and suffix (.so). They are completed
automatically.

You can also specify an absolute path, such as plugin_register
/usr/lib/groonga/plugins/query_expanders/tsv.so.
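The completion rule above can be sketched in a few lines. This is only an illustration:
the plugins directory below stands in for ${PREFIX}/lib/groonga/plugins, and the .so
suffix is platform dependent:

```python
import os.path

PLUGINS_DIR = "/usr/lib/groonga/plugins"  # stands in for ${PREFIX}/lib/groonga/plugins
SUFFIX = ".so"                            # platform dependent

def complete_plugin_path(name):
    """Mimic the name completion: absolute paths are used as-is;
    otherwise the plugins directory and the suffix are added."""
    if os.path.isabs(name):
        return name
    if not name.endswith(SUFFIX):
        name += SUFFIX
    return os.path.join(PLUGINS_DIR, name)

print(complete_plugin_path("query_expanders/tsv"))
# -> /usr/lib/groonga/plugins/query_expanders/tsv.so
```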

Return value
plugin_register returns true as body on success such as:

[HEADER, true]

If plugin_register fails, error details are in HEADER.

See /reference/command/output_format for HEADER.

See also
· plugin_unregister

plugin_unregister
NOTE:
This command is an experimental feature.

New in version 5.0.1.

Summary
plugin_unregister command unregisters a plugin.

Syntax
This command takes only one required parameter:

plugin_unregister name

Usage
Here is an example that unregisters the QueryExpanderTSV query expander, which is included
in ${PREFIX}/lib/groonga/plugins/query_expanders/tsv.so.

Execution example:

plugin_unregister query_expanders/tsv
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can omit ${PREFIX}/lib/groonga/plugins/ and suffix (.so). They are completed
automatically.

You can also specify an absolute path, such as plugin_unregister
/usr/lib/groonga/plugins/query_expanders/tsv.so.

Return value
plugin_unregister returns true as body on success such as:

[HEADER, true]

If plugin_unregister fails, error details are in HEADER.

See /reference/command/output_format for HEADER.

See also
· plugin_register

quit
Summary
quit - ends the session

This section describes quit, one of Groonga's built-in commands. Built-in commands are
executed by passing them as an argument to the groonga executable, via standard input, or
by sending a request to a groonga server over a socket.

quit ends the session with the groonga process. If issued from a client process, it closes
the connection to the groonga process.

Syntax
quit

Usage
quit

Parameters
None.

Return value
None.

range_filter
Summary
TODO: write me

Syntax
Usage
Return value
See also
· /reference/commands/select

register
Deprecated since version 5.0.1: Use plugin_register instead.

Summary
register command registers a plugin. You need to register a plugin before you use a
plugin.

You need only one register command per plugin in the same database because registered
plugin information is written into the database. When you restart your groonga process,
the groonga process loads all registered plugins without register commands.

NOTE:
Registered plugins can be removed since Groonga 5.0.1. Use plugin_unregister in such a
case.

Syntax
This command takes only one required parameter:

register path

Usage
Here is an example that registers the QueryExpanderTSV query expander, which is included
in ${PREFIX}/lib/groonga/plugins/query_expanders/tsv.so.

Execution example:

register query_expanders/tsv
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can omit ${PREFIX}/lib/groonga/plugins/ and suffix (.so). They are completed
automatically.

You can also specify an absolute path, such as register
/usr/lib/groonga/plugins/query_expanders/tsv.so.

Return value
register returns true as body on success such as:

[HEADER, true]

If register fails, error details are in HEADER.

See /reference/command/output_format for HEADER.

See also
· plugin_register

· plugin_unregister

reindex
Summary
New in version 5.1.0.

reindex command recreates one or more index columns.

If you specify a database as target object, all index columns are recreated.

If you specify a table as target object, all index columns in the table are recreated.

If you specify a data column as target object, all index columns for the data column are
recreated.

If you specify an index column as target object, the index column is recreated.

This command is useful when your index column is broken. The target object is one of
database, table and column.

NOTE:
You can't use the target index columns while the reindex command is running. If you use
the same database from multiple processes, all processes except the one running reindex
should reopen the database. You can use database_unmap for reopening the database.

Syntax
This command takes only one optional parameter:

reindex [target_name=null]

If the target_name parameter is omitted, the database is used as the target object. It
means that all index columns in the database are recreated.

Usage
Here is an example to recreate all index columns in the database:

Execution example:

reindex
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an example to recreate all index columns (Lexicon.entry_key and
Lexicon.entry_body) in Lexicon table:

Execution example:

table_create Entry TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entry body COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Lexicon TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon entry_key COLUMN_INDEX|WITH_POSITION \
Entry _key
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon entry_body COLUMN_INDEX|WITH_POSITION \
Entry body
# [[0, 1337566253.89858, 0.000355720520019531], true]
reindex Lexicon
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an example to recreate all index columns (BigramLexicon.site_title and
RegexpLexicon.site_title) of Site.title data column:

Execution example:

table_create Site TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Site title COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create BigramLexicon TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create BigramLexicon site_title COLUMN_INDEX|WITH_POSITION \
Site title
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create RegexpLexicon TABLE_PAT_KEY ShortText \
--default_tokenizer TokenRegexp \
--normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create RegexpLexicon site_title COLUMN_INDEX|WITH_POSITION \
Site title
# [[0, 1337566253.89858, 0.000355720520019531], true]
reindex Site.title
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an example to recreate an index column (Timestamp.index):

Execution example:

table_create Logs TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Timestamp TABLE_PAT_KEY Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Timestamp logs_timestamp COLUMN_INDEX Logs timestamp
# [[0, 1337566253.89858, 0.000355720520019531], true]
reindex Timestamp.logs_timestamp
# [[0, 1337566253.89858, 0.000355720520019531], true]

Parameters
This section describes all parameters.

target_name
Specifies the name of table or column.

If you don't specify it, the database is used as the target object.

The default is none. It means that the target object is the database.

Return value
reindex command returns whether recreation is succeeded or not:

[HEADER, SUCCEEDED_OR_NOT]

HEADER
See /reference/command/output_format about HEADER.

SUCCEEDED_OR_NOT
If the command succeeded, it returns true; otherwise it returns false.

request_cancel
Summary
NOTE:
This command is an experimental feature.

New in version 4.0.9.

request_cancel command cancels a running request.

There are some limitations:

· Request ID must be managed by the user. (You need to assign a unique key to each request.)

· A cancel request may be ignored. (You can send the request_cancel command multiple times
  for the same request ID.)

· Only the multithreaded type of Groonga server is supported. (You can use it with a
  /reference/executables/groonga based server but can't use it with
  /reference/executables/groonga-httpd.)

See /reference/command/request_id about request ID.

If a request is canceled, the canceled request has -5 (GRN_INTERRUPTED_FUNCTION_CALL) as
/reference/command/return_code.

Syntax
This command takes only one required parameter:

request_cancel id

Usage
Here is an example of request_cancel command:

$ curl 'http://localhost:10041/d/select?table=LargeTable&filter=true&request_id=unique-id-1' &
# The above "select" takes a long time...
# Point: "request_id=unique-id-1"
$ curl 'http://localhost:10041/d/request_cancel?id=unique-id-1'
[[...], {"id": "unique-id-1", "canceled": true}]
# Point: "id=unique-id-1"

Assume that the first select command takes a long time. unique-id-1 request ID is assigned
to the select command by request_id=unique-id-1 parameter.

The second request_cancel command passes id=unique-id-1 parameter. unique-id-1 is the same
request ID passed in select command.

The select command may not be canceled immediately, and the cancel request may be ignored.

You can send a cancel request for the same request ID multiple times. Once the target
request is canceled or finished, the "canceled" value in the return value changes from
true to false:

$ curl 'http://localhost:10041/d/request_cancel?id=unique-id-1'
[[...], {"id": "unique-id-1", "canceled": true}]
# "select" is still running... ("canceled" is "true")
$ curl 'http://localhost:10041/d/request_cancel?id=unique-id-1'
[[...], {"id": "unique-id-1", "canceled": true}]
# "select" is still running... ("canceled" is "true")
$ curl 'http://localhost:10041/d/request_cancel?id=unique-id-1'
[[...], {"id": "unique-id-1", "canceled": false}]
# "select" is canceled or finished. ("canceled" is "false")

If the select command is canceled, response of the select command has -5
(GRN_INTERRUPTED_FUNCTION_CALL) as /reference/command/return_code:

$ curl 'http://localhost:10041/d/select?table=LargeTable&filter=true&request_id=unique-id-1' &
[[-5, ...], ...]
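Because a cancel request may be ignored, a client typically resends request_cancel until
"canceled" flips to false. Here is a minimal Python sketch of that loop; the fetch
function is abstracted away (a real client would perform an HTTP GET), and the host and
port are assumptions:

```python
import json
from urllib.parse import urlencode

def request_cancel_url(request_id, base="http://localhost:10041"):
    return f"{base}/d/request_cancel?{urlencode({'id': request_id})}"

def cancel_until_done(request_id, fetch, max_tries=10):
    """Resend request_cancel until the target request is canceled
    or finished. fetch(url) must return the response body as JSON
    text; a real client would do an HTTP GET here."""
    for _ in range(max_tries):
        header, result = json.loads(fetch(request_cancel_url(request_id)))
        if not result["canceled"]:
            return True  # target request is canceled or finished
    return False

# Fake fetch that reports "still running" twice, then done:
states = iter([True, True, False])
fake_fetch = lambda url: json.dumps(
    [[0], {"id": "unique-id-1", "canceled": next(states)}])
print(cancel_until_done("unique-id-1", fake_fetch))  # -> True
```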

Parameters
This section describes parameters of request_cancel.

Required parameters
There is one required parameter, id.

id
Specifies the ID for the target request.

Return value
request_cancel command returns the result of the cancel request:

[
HEADER,
{
"id": ID,
"canceled": CANCEL_REQUEST_IS_ACCEPTED_OR_NOT
}
]

HEADER
See /reference/command/output_format about HEADER.

ID
The ID of the target request.

CANCEL_REQUEST_IS_ACCEPTED_OR_NOT
If the cancel request is accepted, this is true, otherwise this is false.

Note that "the cancel request is accepted" doesn't mean that "the target request is
canceled". It just means that "the cancel request is notified to the target request, but
the cancel request may be ignored by the target request".

If no request is assigned the specified request ID, this is false.

See also
· /reference/command/request_id

ruby_eval
Summary
ruby_eval command evaluates Ruby script and returns the result.

Syntax
This command takes only one required parameter:

ruby_eval script

Usage
You can execute any script which mruby supports by calling ruby_eval.

Here is an example that just calculates 1 + 2 as a Ruby script.

Execution example:

register ruby/eval
# [[0, 1337566253.89858, 0.000355720520019531], true]
ruby_eval "1 + 2"
# [[0, 1337566253.89858, 0.000355720520019531], {"value": 3}]

Register the ruby/eval plugin in advance to use the ruby_eval command.

Note that ruby_eval is implemented as an experimental plugin, and the specification may be
changed in the future.

Parameters
This section describes all parameters.

script
Specifies the Ruby script which you want to evaluate.

Return value
ruby_eval returns the evaluated result with metadata such as exception information
(Including metadata isn't implemented yet):

[HEADER, {"value": EVALUATED_VALUE}]

HEADER
See /reference/command/output_format about HEADER.

EVALUATED_VALUE
EVALUATED_VALUE is the evaluated value of script.

ruby_eval supports only numbers as evaluated values for now. More types will be supported
in the future.

See also
· /reference/commands/ruby_load

ruby_load
Summary
ruby_load command loads specified Ruby script.

Syntax
This command takes only one required parameter:

ruby_load path

Usage
You can load any script file which mruby supports by calling ruby_load.

Here is an example that just loads expression.rb as a Ruby script.

Execution example:

register ruby/load
# [[0, 1337566253.89858, 0.000355720520019531], true]
ruby_load "expression.rb"
# [[0, 1337566253.89858, 0.000355720520019531], {"value": null}]

Register the ruby/load plugin in advance to use the ruby_load command.

Note that ruby_load is implemented as an experimental plugin, and the specification may be
changed in the future.

Parameters
This section describes all parameters.

path
Specifies the Ruby script path which you want to load.

Return value
ruby_load returns the loaded result with metadata such as exception information (Including
metadata isn't implemented yet):

[HEADER, {"value": LOADED_VALUE}]

HEADER
See /reference/command/output_format about HEADER.

LOADED_VALUE
LOADED_VALUE is the loaded value of the Ruby script.

ruby_load just returns null as LOADED_VALUE for now. Returning real values will be
supported in the future.

See also
· /reference/commands/ruby_eval

schema
Summary
New in version 5.0.9.

schema command returns schema in the database.

This command is useful when you want to inspect the database, for example, for
visualizing the database, creating a GUI for the database and so on.
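For example, a visualizer only needs to walk the returned object. The sketch below
iterates over a trimmed-down schema body that follows the structure of the example output
in the Usage section (only the fields the sketch uses are kept):

```python
# Trimmed-down schema body (only "tables", their "type", "columns",
# and each column's "type" and "value_type").
schema = {
    "tables": {
        "Memos": {
            "type": "hash table",
            "columns": {
                "content": {"type": "scalar",
                            "value_type": {"name": "Text"}},
            },
        },
        "Terms": {
            "type": "patricia trie",
            "columns": {
                "memos_content_index": {"type": "index",
                                        "value_type": {"name": "Memos"}},
            },
        },
    },
}

lines = []
for table_name, table in sorted(schema["tables"].items()):
    lines.append(f"{table_name} ({table['type']})")
    for column_name, column in sorted(table["columns"].items()):
        lines.append(f"  {column_name}: {column['type']} "
                     f"-> {column['value_type']['name']}")
print("\n".join(lines))
```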

Syntax
This command takes no parameters:

schema

Usage
Here is an example schema to show example output:

Execution example:

table_create Memos TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Memos content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms memos_content_index \
COLUMN_INDEX|WITH_POSITION \
Memos content
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an output of schema command against this example schema:

Execution example:

schema
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "tables": {
# "Terms": {
# "normalizer": {
# "name": "NormalizerAuto"
# },
# "name": "Terms",
# "tokenizer": {
# "name": "TokenBigram"
# },
# "command": {
# "command_line": "table_create --name Terms --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto",
# "name": "table_create",
# "arguments": {
# "key_type": "ShortText",
# "default_tokenizer": "TokenBigram",
# "normalizer": "NormalizerAuto",
# "flags": "TABLE_PAT_KEY",
# "name": "Terms"
# }
# },
# "indexes": [],
# "key_type": {
# "type": "type",
# "name": "ShortText"
# },
# "value_type": null,
# "token_filters": [],
# "type": "patricia trie",
# "columns": {
# "memos_content_index": {
# "name": "memos_content_index",
# "weight": false,
# "section": false,
# "compress": null,
# "command": {
# "command_line": "column_create --table Terms --name memos_content_index --flags COLUMN_INDEX|WITH_POSITION --type Memos --sources content",
# "name": "column_create",
# "arguments": {
# "table": "Terms",
# "flags": "COLUMN_INDEX|WITH_POSITION",
# "name": "memos_content_index",
# "sources": "content",
# "type": "Memos"
# }
# },
# "indexes": [],
# "sources": [
# {
# "table": "Memos",
# "name": "content",
# "full_name": "Memos.content"
# }
# ],
# "value_type": {
# "type": "reference",
# "name": "Memos"
# },
# "full_name": "Terms.memos_content_index",
# "position": true,
# "table": "Terms",
# "type": "index"
# }
# }
# },
# "Memos": {
# "normalizer": null,
# "name": "Memos",
# "tokenizer": null,
# "command": {
# "command_line": "table_create --name Memos --flags TABLE_HASH_KEY --key_type ShortText",
# "name": "table_create",
# "arguments": {
# "key_type": "ShortText",
# "flags": "TABLE_HASH_KEY",
# "name": "Memos"
# }
# },
# "indexes": [],
# "key_type": {
# "type": "type",
# "name": "ShortText"
# },
# "value_type": null,
# "token_filters": [],
# "type": "hash table",
# "columns": {
# "content": {
# "name": "content",
# "weight": false,
# "section": false,
# "compress": null,
# "command": {
# "command_line": "column_create --table Memos --name content --flags COLUMN_SCALAR --type Text",
# "name": "column_create",
# "arguments": {
# "table": "Memos",
# "flags": "COLUMN_SCALAR",
# "name": "content",
# "type": "Text"
# }
# },
# "indexes": [
# {
# "table": "Terms",
# "section": 0,
# "name": "memos_content_index",
# "full_name": "Terms.memos_content_index"
# }
# ],
# "sources": [],
# "value_type": {
# "type": "type",
# "name": "Text"
# },
# "full_name": "Memos.content",
# "position": false,
# "table": "Memos",
# "type": "scalar"
# }
# }
# }
# },
# "normalizers": {
# "NormalizerNFKC51": {
# "name": "NormalizerNFKC51"
# },
# "NormalizerAuto": {
# "name": "NormalizerAuto"
# }
# },
# "token_filters": {},
# "tokenizers": {
# "TokenBigramSplitSymbolAlphaDigit": {
# "name": "TokenBigramSplitSymbolAlphaDigit"
# },
# "TokenRegexp": {
# "name": "TokenRegexp"
# },
# "TokenBigramIgnoreBlankSplitSymbolAlphaDigit": {
# "name": "TokenBigramIgnoreBlankSplitSymbolAlphaDigit"
# },
# "TokenBigram": {
# "name": "TokenBigram"
# },
# "TokenDelimit": {
# "name": "TokenDelimit"
# },
# "TokenUnigram": {
# "name": "TokenUnigram"
# },
# "TokenBigramSplitSymbol": {
# "name": "TokenBigramSplitSymbol"
# },
# "TokenDelimitNull": {
# "name": "TokenDelimitNull"
# },
# "TokenBigramIgnoreBlankSplitSymbolAlpha": {
# "name": "TokenBigramIgnoreBlankSplitSymbolAlpha"
# },
# "TokenBigramSplitSymbolAlpha": {
# "name": "TokenBigramSplitSymbolAlpha"
# },
# "TokenTrigram": {
# "name": "TokenTrigram"
# },
# "TokenMecab": {
# "name": "TokenMecab"
# },
# "TokenBigramIgnoreBlankSplitSymbol": {
# "name": "TokenBigramIgnoreBlankSplitSymbol"
# },
# "TokenBigramIgnoreBlank": {
# "name": "TokenBigramIgnoreBlank"
# }
# },
# "plugins": {},
# "types": {
# "UInt64": {
# "can_be_key_type": true,
# "name": "UInt64",
# "can_be_value_type": true,
# "size": 8
# },
# "Int32": {
# "can_be_key_type": true,
# "name": "Int32",
# "can_be_value_type": true,
# "size": 4
# },
# "Int16": {
# "can_be_key_type": true,
# "name": "Int16",
# "can_be_value_type": true,
# "size": 2
# },
# "LongText": {
# "can_be_key_type": false,
# "name": "LongText",
# "can_be_value_type": false,
# "size": 2147483648
# },
# "TokyoGeoPoint": {
# "can_be_key_type": true,
# "name": "TokyoGeoPoint",
# "can_be_value_type": true,
# "size": 8
# },
# "Text": {
# "can_be_key_type": false,
# "name": "Text",
# "can_be_value_type": false,
# "size": 65536
# },
# "ShortText": {
# "can_be_key_type": true,
# "name": "ShortText",
# "can_be_value_type": false,
# "size": 4096
# },
# "Float": {
# "can_be_key_type": true,
# "name": "Float",
# "can_be_value_type": true,
# "size": 8
# },
# "UInt8": {
# "can_be_key_type": true,
# "name": "UInt8",
# "can_be_value_type": true,
# "size": 1
# },
# "UInt32": {
# "can_be_key_type": true,
# "name": "UInt32",
# "can_be_value_type": true,
# "size": 4
# },
# "Object": {
# "can_be_key_type": true,
# "name": "Object",
# "can_be_value_type": true,
# "size": 8
# },
# "UInt16": {
# "can_be_key_type": true,
# "name": "UInt16",
# "can_be_value_type": true,
# "size": 2
# },
# "Int64": {
# "can_be_key_type": true,
# "name": "Int64",
# "can_be_value_type": true,
# "size": 8
# },
# "Time": {
# "can_be_key_type": true,
# "name": "Time",
# "can_be_value_type": true,
# "size": 8
# },
# "Bool": {
# "can_be_key_type": true,
# "name": "Bool",
# "can_be_value_type": true,
# "size": 1
# },
# "WGS84GeoPoint": {
# "can_be_key_type": true,
# "name": "WGS84GeoPoint",
# "can_be_value_type": true,
# "size": 8
# },
# "Int8": {
# "can_be_key_type": true,
# "name": "Int8",
# "can_be_value_type": true,
# "size": 1
# }
# }
# }
# ]

Parameters
This section describes all parameters.

Required parameters
There is no required parameter.

Optional parameters
There is no optional parameter.

Return value
The schema command returns the schema stored in the database:

[HEADER, SCHEMA]

HEADER
See /reference/command/output_format about HEADER.

SCHEMA
SCHEMA is an object that consists of the following information:

{
"plugins": PLUGINS,
"types": TYPES,
"tokenizers": TOKENIZERS,
"normalizers": NORMALIZERS,
"token_filters": TOKEN_FITLERS,
"tables": TABLES
}

PLUGINS
PLUGINS is an object. Its key is plugin name and its value is plugin detail:

{
"PLUGIN_NAME_1": PLUGIN_1,
"PLUGIN_NAME_2": PLUGIN_2,
...
"PLUGIN_NAME_n": PLUGIN_n
}

PLUGIN
PLUGIN is an object that describes plugin detail:

{
"name": PLUGIN_NAME
}

Here are properties of PLUGIN:

┌─────┬──────────────────────────────────┐
│Name │ Description │
├─────┼──────────────────────────────────┤
name │ The plugin name. It's used in │
│ │ plugin_register. │
└─────┴──────────────────────────────────┘

TYPES
TYPES is an object. Its key is type name and its value is type detail:

{
"TYPE_NAME_1": TYPE_1,
"TYPE_NAME_2": TYPE_2,
...
"TYPE_NAME_n": TYPE_n
}

TYPE
TYPE is an object that describes type detail:

{
"name": TYPE_NAME,
"size": SIZE_OF_ONE_VALUE_IN_BYTE,
"can_be_key_type": BOOLEAN,
"can_be_value_type": BOOLEAN
}

Here are properties of TYPE:

┌──────────────────┬──────────────────────────────────┐
│Name │ Description │
├──────────────────┼──────────────────────────────────┤
name │ The type name. │
├──────────────────┼──────────────────────────────────┤
size │ The number of bytes of one │
│ │ value. │
├──────────────────┼──────────────────────────────────┤
can_be_key_type │ true when the type can be used │
│ │ for table key, false otherwise. │
├──────────────────┼──────────────────────────────────┤
can_be_value_type │ true when the type can be used │
│ │ for table value, false │
│ │ otherwise. │
└──────────────────┴──────────────────────────────────┘
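The properties above can be put to work on the client side. As an illustrative sketch (this is not part of the Groonga API), a client could use the can_be_key_type and size properties from a parsed schema response to list which types are valid table keys, smallest first. The sample data is abridged from the output shown earlier:

```python
# Hedged sketch: filter the "types" object of a parsed `schema` response.
# The dict below is an abridged, hand-copied subset of the example output.
types = {
    "ShortText": {"name": "ShortText", "size": 4096,
                  "can_be_key_type": True, "can_be_value_type": False},
    "Text": {"name": "Text", "size": 65536,
             "can_be_key_type": False, "can_be_value_type": False},
    "UInt32": {"name": "UInt32", "size": 4,
               "can_be_key_type": True, "can_be_value_type": True},
}

# Keep only types usable as a table key, ordered by per-value size.
key_types = sorted(
    (t["name"] for t in types.values() if t["can_be_key_type"]),
    key=lambda name: types[name]["size"],
)
print(key_types)  # → ['UInt32', 'ShortText']
```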

TOKENIZERS
TOKENIZERS is an object. Its key is tokenizer name and its value is tokenizer detail:

{
"TOKENIZER_NAME_1": TOKENIZER_1,
"TOKENIZER_NAME_2": TOKENIZER_2,
...
"TOKENIZER_NAME_n": TOKENIZER_n
}

TOKENIZER
TOKENIZER is an object that describes tokenizer detail:

{
"name": TOKENIZER_NAME
}

Here are properties of TOKENIZER:

┌─────┬──────────────────────────────────┐
│Name │ Description │
├─────┼──────────────────────────────────┤
name │ The tokenizer name. It's used │
│ │ for │
│ │ table-create-default-tokenizer. │
└─────┴──────────────────────────────────┘

NORMALIZERS
NORMALIZERS is an object. Its key is normalizer name and its value is normalizer detail:

{
"NORMALIZER_NAME_1": NORMALIZER_1,
"NORMALIZER_NAME_2": NORMALIZER_2,
...
"NORMALIZER_NAME_n": NORMALIZER_n
}

NORMALIZER
NORMALIZER is an object that describes normalizer detail:

{
"name": NORMALIZER_NAME
}

Here are properties of NORMALIZER:

┌─────┬──────────────────────────────────┐
│Name │ Description │
├─────┼──────────────────────────────────┤
name │ The normalizer name. It's used │
│ │ for table-create-normalizer. │
└─────┴──────────────────────────────────┘

TOKEN_FILTERS
TOKEN_FILTERS is an object. Its key is token filter name and its value is token filter
detail:

{
"TOKEN_FILTER_NAME_1": TOKEN_FILTER_1,
"TOKEN_FILTER_NAME_2": TOKEN_FILTER_2,
...
"TOKEN_FILTER_NAME_n": TOKEN_FILTER_n
}

TOKEN_FILTER
TOKEN_FILTER is an object that describes token filter detail:

{
"name": TOKEN_FILTER_NAME
}

Here are properties of TOKEN_FILTER:

┌─────┬──────────────────────────────────┐
│Name │ Description │
├─────┼──────────────────────────────────┤
name │ The token filter name. It's used │
│ │ for table-create-token-filters. │
└─────┴──────────────────────────────────┘

TABLES
TABLES is an object. Its key is table name and its value is table detail:

{
"TABLE_NAME_1": TABLE_1,
"TABLE_NAME_2": TABLE_2,
...
"TABLE_NAME_n": TABLE_n
}

TABLE
TABLE is an object that describes table detail:

{
"name": TABLE_NAME
"type": TYPE,
"key_type": KEY_TYPE,
"value_type": VALUE_TYPE,
"tokenizer": TOKENIZER,
"normalizer": NORMALIZER,
"token_filters": [
TOKEN_FILTER_1,
TOKEN_FILTER_2,
...,
TOKEN_FILTER_n,
],
"indexes": [
INDEX_1,
INDEX_2,
...,
INDEX_n
],
"command": COMMAND,
"columns": {
"COLUMN_NAME_1": COLUMN_1,
"COLUMN_NAME_2": COLUMN_2,
...,
"COLUMN_NAME_3": COLUMN_3,
}
}

Here are properties of TABLE:

┌──────────────┬──────────────────────────────────┐
│Name │ Description │
├──────────────┼──────────────────────────────────┤
name │ The table name. │
├──────────────┼──────────────────────────────────┤
type │ The table type. │
│ │ │
│ │ This is one of the followings: │
│ │ │
│ │ · array: table-no-key │
│ │ │
│ │ · hash: table-hash-key │
│ │ │
│ │ · patricia trie: │
│ │ table-pat-key │
│ │ │
│ │ · double array trie: │
│ │ table-dat-key │
└──────────────┴──────────────────────────────────┘

key_type │ The type of the table's key. │
│ │ │
│ │ If the table type is array, this │
│ │ is null. │
│ │ │
│ │ If the table type isn't array, │
│ │ this is an object that has the │
│ │ following properties: │
│ │ │
│ │ · name: The type name. │
│ │ │
│ │ · type: reference if │
│ │ the type is a │
│ │ table, type │
│ │ otherwise. │
├──────────────┼──────────────────────────────────┤
value_type │ The type of the table's value. │
│ │ │
│ │ If the table doesn't use value, │
│ │ this is null. │
│ │ │
│ │ If the table uses value, this is │
│ │ an object that has the following │
│ │ properties: │
│ │ │
│ │ · name: The type name. │
│ │ │
│ │ · type: reference if │
│ │ the type is a │
│ │ table, type │
│ │ otherwise. │
├──────────────┼──────────────────────────────────┤
tokenizer │ The tokenizer of the table. It's │
│ │ specified by │
│ │ table-create-default-tokenizer. │
│ │ │
│ │ If the table doesn't use │
│ │ tokenizer, this is null. │
│ │ │
│ │ If the table uses tokenizer, │
│ │ this is an object that has the │
│ │ following properties: │
│ │ │
│ │ · name: The tokenizer │
│ │ name. │
├──────────────┼──────────────────────────────────┤
normalizer │ The normalizer of the table. │
│ │ It's specified by │
│ │ table-create-normalizer. │
│ │ │
│ │ If the table doesn't use │
│ │ normalizer, this is null. │
│ │ │
│ │ If the table uses normalizer, │
│ │ this is an object that has the │
│ │ following properties: │
│ │ │
│ │ · name: The normalizer │
│ │ name. │
├──────────────┼──────────────────────────────────┤
token_filters │ The token filters of the table. │
│ │ It's specified by │
│ │ table-create-token-filters. │
│ │ │
│ │ This is an array of objects. │
│ │ Each object has the following │
│ │ properties: │
│ │ │
│ │ · name: The token │
│ │ filter name. │
├──────────────┼──────────────────────────────────┤
indexes │ The indexes of the table's key. │
│ │ │
│ │ This is an array of INDEX. │
├──────────────┼──────────────────────────────────┤
command │ The Groonga command information │
│ │ to create the table. │
│ │ │
│ │ This is COMMAND. │
├──────────────┼──────────────────────────────────┤
columns │ The columns of the table. │
│ │ │
│ │ This is an object whose key │
│ │ is a column name and whose │
│ │ value is COLUMN. │
└──────────────┴──────────────────────────────────┘

INDEX
INDEX is an object that describes index detail:

{
"full_name": INDEX_COLUMN_NAME_WITH_TABLE_NAME,
"table": TABLE_NAME,
"name": INDEX_COLUMN_NAME,
"section": SECTION
}

Here are properties of INDEX:

┌──────────┬──────────────────────────────────┐
│Name │ Description │
├──────────┼──────────────────────────────────┤
full_name │ The index column name with table │
│ │ name. │
│ │ │
│ │ For example, Terms.index. │
├──────────┼──────────────────────────────────┤
table │ The table name of the index │
│ │ column. │
│ │ │
│ │ For example, Terms. │
├──────────┼──────────────────────────────────┤
name │ The index column name. │
│ │ │
│ │ For example, index. │
├──────────┼──────────────────────────────────┤
section │ The section number in the index │
│ │ column for the table's key. │
│ │ │
│ │ If the index column isn't a │
│ │ multiple column index, this is │
│ │ 0. │
└──────────┴──────────────────────────────────┘

COMMAND
COMMAND is an object that describes how to create the table or column:

{
"name": COMMAND_NAME,
"arguments": {
"KEY_1": "VALUE_1",
"KEY_2": "VALUE_2",
...,
"KEY_n": "VALUE_n"
},
"command_line": COMMAND_LINE
}

Here are properties of COMMAND:

┌─────────────┬──────────────────────────────────┐
│Name │ Description │
├─────────────┼──────────────────────────────────┤
name │ The Groonga command name to │
│ │ create the table or column. │
├─────────────┼──────────────────────────────────┤
arguments │ The arguments of the Groonga │
│ │ command to create the table or │
│ │ column. │
│ │ │
│ │ This is an object whose key │
│ │ is the argument name and whose │
│ │ value is the argument value. │
├─────────────┼──────────────────────────────────┤
command_line │ The Groonga command line to │
│ │ create the table or column. │
│ │ │
│ │ This is a string that can be │
│ │ evaluated by Groonga. │
└─────────────┴──────────────────────────────────┘
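The name, arguments and command_line properties carry the same information in two forms. As a hedged sketch (not part of Groonga itself), the relationship can be shown by rebuilding a command line from name and arguments; in practice you would simply use the command_line value that Groonga already provides:

```python
# Hedged sketch: rebuild a command line from COMMAND's "name" and
# "arguments". The data is copied from the schema example output above.
command = {
    "name": "table_create",
    "arguments": {
        "name": "Memos",
        "flags": "TABLE_HASH_KEY",
        "key_type": "ShortText",
    },
}

# Each argument becomes a --key value pair after the command name.
rebuilt = command["name"] + "".join(
    f" --{key} {value}" for key, value in command["arguments"].items()
)
print(rebuilt)
# → table_create --name Memos --flags TABLE_HASH_KEY --key_type ShortText
```

This mirrors the command_line value shown for the Memos table earlier, which is what makes COMMAND useful for replaying schema definitions.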

COLUMN
COLUMN is an object that describes column detail:

{
"name": COLUMN_NAME,
"table": TABLE_NAME,
"full_name": COLUMN_NAME_WITH_TABLE,
"type": TYPE,
"value_type": VALUE_TYPE,
"compress": COMPRESS,
"section": SECTION,
"weight": WEIGHT,
"compress": COMPRESS,
"section": BOOLEAN,
"weight": BOOLEAN,
"position": BOOLEAN,
"sources": [
SOURCE_1,
SOURCE_2,
...,
SOURCE_n
],
"indexes": [
INDEX_1,
INDEX_2,
...,
INDEX_n
],
"command": COMMAND
}

Here are properties of COLUMN:

┌───────────┬───────────────────────────────────────┐
│Name │ Description │
├───────────┼───────────────────────────────────────┤
name │ The column name. │
│ │ │
│ │ For example, age. │
├───────────┼───────────────────────────────────────┤
table │ The table name of the column. │
│ │ │
│ │ For example, Users. │
├───────────┼───────────────────────────────────────┤
full_name │ The column name with table name. │
│ │ │
│ │ For example, Users.age. │
├───────────┼───────────────────────────────────────┤
type │ The column type. │
│ │ │
│ │ This is one of the followings: │
│ │ │
│ │ · scalar: │
│ │ /reference/columns/scalar
│ │ │
│ │ · vector: │
│ │ /reference/columns/vector
│ │ │
│ │ · index: │
│ │ /reference/columns/index
├───────────┼───────────────────────────────────────┤
value_type │ The type of the column's value. │
│ │ │
│ │ This is an object that has the │
│ │ following properties: │
│ │ │
│ │ · name: The type name. │
│ │ │
│ │ · type: reference if the │
│ │ type is a table, type │
│ │ otherwise. │
├───────────┼───────────────────────────────────────┤
compress │ The compression method of the column. │
│ │ │
│ │ If the column doesn't use any │
│ │ compression methods, this is null. │
│ │ │
│ │ If the column uses a compression │
│ │ method, this is one of the │
│ │ followings: │
│ │ │
│ │ · zlib: The column uses │
│ │ zlib to compress column │
│ │ value. │
│ │ │
│ │ · lz4: The column uses LZ4 │
│ │ to compress column value. │
├───────────┼───────────────────────────────────────┤
section │ Whether the column can store section │
│ │ information or not. │
│ │ │
│ │ true if the column is created with │
│ │ WITH_SECTION flag, false otherwise. │
│ │ │
│ │ Normally, if the column isn't an │
│ │ index column, this is false. │
├───────────┼───────────────────────────────────────┤
weight │ Whether the column can store weight │
│ │ information or not. │
│ │ │
│ │ true if the column is created with │
│ │ WITH_WEIGHT flag, false otherwise. │
├───────────┼───────────────────────────────────────┤
position │ Whether the column can store position │
│ │ information or not. │
│ │ │
│ │ true if the column is created with │
│ │ WITH_POSITION flag, false otherwise. │
│ │ │
│ │ Normally, if the column isn't an │
│ │ index column, this is false. │
├───────────┼───────────────────────────────────────┤
sources │ The source columns of the index │
│ │ column. │
│ │ │
│ │ This is an array of SOURCE. │
│ │ │
│ │ Normally, if the column isn't an │
│ │ index column, this is an empty array. │
├───────────┼───────────────────────────────────────┤
indexes │ The indexes of the column. │
│ │ │
│ │ This is an array of INDEX. │
├───────────┼───────────────────────────────────────┤
command │ The Groonga command information to │
│ │ create the column. │
│ │ │
│ │ This is COMMAND. │
└───────────┴───────────────────────────────────────┘

SOURCE
SOURCE is an object that describes source detail:

{
"name": COLUMN_NAME,
"table": TABLE_NAME,
"full_name": COLUMN_NAME_WITH_TABLE_NAME
}

Here are properties of SOURCE:

┌──────────┬──────────────────────────────────┐
│Name │ Description │
├──────────┼──────────────────────────────────┤
name │ The source column name. │
│ │ │
│ │ For example, content. │
│ │ │
│ │ This may be a _key pseudo │
│ │ column. │
├──────────┼──────────────────────────────────┤
table │ The table name of the source │
│ │ column. │
│ │ │
│ │ For example, Memos. │
├──────────┼──────────────────────────────────┤
full_name │ The source column name with │
│ │ table name. │
│ │ │
│ │ For example, Memos.content. │
└──────────┴──────────────────────────────────┘

See also
· table_create

· column_create

select
Summary
select searches records that are matched to specified conditions from a table and then
outputs them.

select is the most important command in groonga. You need to understand select to use the
full power of Groonga.

Syntax
This command takes many parameters.

The only required parameter is table. Other parameters are optional:

select table
[match_columns=null]
[query=null]
[filter=null]
[scorer=null]
[sortby=null]
[output_columns="_id, _key, *"]
[offset=0]
[limit=10]
[drilldown=null]
[drilldown_sortby=null]
[drilldown_output_columns="_key, _nsubrecs"]
[drilldown_offset=0]
[drilldown_limit=10]
[cache=yes]
[match_escalation_threshold=0]
[query_expansion=null]
[query_flags=ALLOW_PRAGMA|ALLOW_COLUMN|ALLOW_UPDATE|ALLOW_LEADING_NOT|NONE]
[query_expander=null]
[adjuster=null]
[drilldown_calc_types=NONE]
[drilldown_calc_target=null]

select has the following named parameters for advanced drilldown:

· drilldown[${LABEL}].keys=null

· drilldown[${LABEL}].sortby=null

· drilldown[${LABEL}].output_columns="_key, _nsubrecs"

· drilldown[${LABEL}].offset=0

· drilldown[${LABEL}].limit=10

· drilldown[${LABEL}].calc_types=NONE

· drilldown[${LABEL}].calc_target=null

You can use one or more alphabetic characters, digits, _ and . for ${LABEL}. For example,
parent.sub1 is a valid ${LABEL}.

Parameters that have the same ${LABEL} are grouped.

For example, the following parameters specify one drilldown:

· --drilldown[label].keys column

· --drilldown[label].sortby -_nsubrecs

The following parameters specify two drilldowns:

· --drilldown[label1].keys column1

· --drilldown[label1].sortby -_nsubrecs

· --drilldown[label2].keys column2

· --drilldown[label2].sortby _key
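The grouping rule above can be sketched in client code. This is an illustrative Python sketch (the parameter names are from the examples above; the regular expression is an assumption about the ${LABEL} character set stated earlier, not Groonga's own parser):

```python
# Hedged sketch: group drilldown[${LABEL}].* parameters by label.
import re
from collections import defaultdict

parameters = {
    "drilldown[label1].keys": "column1",
    "drilldown[label1].sortby": "-_nsubrecs",
    "drilldown[label2].keys": "column2",
    "drilldown[label2].sortby": "_key",
}

drilldowns = defaultdict(dict)
for name, value in parameters.items():
    # ${LABEL} may contain alphabetic characters, digits, _ and .
    match = re.fullmatch(r"drilldown\[([A-Za-z0-9_.]+)\]\.(\w+)", name)
    if match:
        label, option = match.groups()
        drilldowns[label][option] = value

print(dict(drilldowns))
```

Running this yields two groups, label1 and label2, each holding its own keys and sortby options, which is exactly the "two drilldowns" case described above.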

Usage
Let's learn about select usage with examples. This section shows many popular usages.

Here are a schema definition and sample data to show usage.

Execution example:

table_create Entries TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries n_likes COLUMN_SCALAR UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries tag COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_key_index COLUMN_INDEX|WITH_POSITION Entries _key
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_content_index COLUMN_INDEX|WITH_POSITION Entries content
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Entries
[
{"_key": "The first post!",
"content": "Welcome! This is my first post!",
"n_likes": 5,
"tag": "Hello"},
{"_key": "Groonga",
"content": "I started to use Groonga. It's very fast!",
"n_likes": 10,
"tag": "Groonga"},
{"_key": "Mroonga",
"content": "I also started to use Mroonga. It's also very fast! Really fast!",
"n_likes": 15,
"tag": "Groonga"},
{"_key": "Good-bye Senna",
"content": "I migrated all Senna system!",
"n_likes": 3,
"tag": "Senna"},
{"_key": "Good-bye Tritonn",
"content": "I also migrated all Tritonn system!",
"n_likes": 3,
"tag": "Senna"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 5]

There is a table, Entries, for blog entries. An entry has a title, content, the number of
likes and a tag. The title is the key of Entries. The content is the value of the
Entries.content column. The number of likes is the value of the Entries.n_likes column.
The tag is the value of the Entries.tag column.

The Entries._key column and Entries.content column are indexed using the TokenBigram
tokenizer, so both Entries._key and Entries.content are ready for fulltext search.

OK. The schema and data for examples are ready.

Simple usage
Here is the simplest usage with the above schema and data. It outputs all records in the
Entries table.

Execution example:

select Entries
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]

Why does the command output all records? There are two reasons. First, the command doesn't
specify any search conditions, and no search condition means all records are matched.
Second, there are only 5 records in total. The select command outputs at most 10 records
by default, and 5 is less than 10, so the command outputs all records.

Search conditions
Search conditions are specified by query or filter. You can also specify both query and
filter. It means that selected records must be matched against both query and filter.

Search condition: query
query is designed for search box in Web page. Imagine a search box in google.com. You
specify search conditions for query as space separated keywords. For example, search
engine means a matched record should contain two words, search and engine.

Normally, the query parameter is used for specifying fulltext search conditions. It can
also be used for non-fulltext search conditions, but filter is usually used for that
purpose.

The query parameter is used together with the match_columns parameter when query specifies
fulltext search conditions. match_columns specifies which columns and indexes are matched
against query.

Here is a simple query usage example.

Execution example:

select Entries --match_columns content --query fast
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]

The select command searches records from the Entries table that contain the word fast in
the content column value.

query has its own syntax, but the details aren't described here. See
/reference/grn_expr/query_syntax for details.

Search condition: filter
filter is designed for complex search conditions. You specify search conditions for filter
as ECMAScript like syntax.

Here is a simple filter usage example.

Execution example:

select Entries --filter 'content @ "fast" && _key == "Groonga"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ]
# ]
# ]
# ]

The select command searches records from the Entries table that contain the word fast in
the content column value and have Groonga as _key. There are three operators in the
command: @, && and ==. @ is the fulltext search operator. && and == are the same as in
ECMAScript: && is the logical AND operator and == is the equality operator.

filter supports more operators and syntax, such as grouping with (...), but the details
aren't described here. See /reference/grn_expr/script_syntax for details.

Paging
You can specify the range of output records with offset and limit. Here is an example that
outputs only the 2nd record.

Execution example:

select Entries --offset 1 --limit 1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ]
# ]
# ]
# ]

offset is zero-based. --offset 1 means the output range starts from the 2nd record.

limit specifies the maximum number of output records. --limit 1 means at most one record
is output. If no records are matched, the select command outputs no records.
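A typical use of offset and limit is pagination. As a hedged sketch (client-side arithmetic, not a Groonga feature), a 1-based page number maps to the zero-based offset/limit pair that select expects like this:

```python
# Hedged sketch: translate a 1-based page number into offset/limit.
def page_to_offset_limit(page, per_page=10):
    # Page 1 starts at record 0, page 2 at record per_page, and so on.
    return (page - 1) * per_page, per_page

# With per_page=1, page 2 is exactly the --offset 1 --limit 1 example above.
offset, limit = page_to_offset_limit(2, per_page=1)
print(offset, limit)  # → 1 1
```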

The total number of records
You can use --limit 0 to retrieve the total number of records without any record content.

Execution example:

select Entries --limit 0
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ]
# ]
# ]

--limit 0 is also useful for retrieving only the number of matched records.
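The total count returned by a --limit 0 request is what a paginated UI needs to compute its number of pages. A hedged client-side sketch (not part of Groonga):

```python
# Hedged sketch: derive the page count from the total record count,
# e.g. the count retrieved with `select Entries --limit 0`.
def page_count(n_records, per_page=10):
    # Ceiling division: a partially filled last page still counts.
    return (n_records + per_page - 1) // per_page

print(page_count(5, per_page=2))  # → 3
```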

Drilldown
You can get additional grouped results against the search result in one select. You need
two or more SELECTs in SQL, but select in Groonga can do it in one command.

This feature is called drilldown in Groonga. It's also called faceted search in other
search engines.

For example, think about the following situation.

You search for entries that contain the word fast:

Execution example:

select Entries --filter 'content @ "fast"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]

You want to use tag as an additional search condition like --filter 'content @ "fast" &&
tag == "???"'. But you don't know a suitable tag until you see the result of content @
"fast".

If you know the number of matched records for each available tag, you can choose a
suitable tag. You can use drilldown in this case:

Execution example:

select Entries --filter 'content @ "fast"' --drilldown tag
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ],
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Groonga",
# 2
# ]
# ]
# ]
# ]

--drilldown tag returns a list of pairs of each available tag and the number of matched
records. You can avoid the "no hit" case by choosing a tag from the list. You can also
avoid the "too many search results" case by choosing a tag with only a few matched
records.
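The drilldown part of the response above has the same shape as the main result set: a count, a list of column definitions, then the group rows. A hedged sketch of reading it on the client side (the nested list is hand-copied from the example output; this is not a Groonga client API):

```python
# Hedged sketch: extract tag -> count pairs from the drilldown result
# set shown above: [[n_groups], [column definitions], group rows...].
drilldown_result = [
    [1],
    [["_key", "ShortText"], ["_nsubrecs", "Int32"]],
    ["Groonga", 2],
]

# Rows start after the count and the column definitions.
tag_counts = {row[0]: row[1] for row in drilldown_result[2:]}
print(tag_counts)  # → {'Groonga': 2}
```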

You can create the following UI with the drilldown results:

· Links to narrow search results. (Users don't need to input a search query by their
keyboard. They just click a link.)

Most EC sites use this UI. See the side menu on Amazon.

Groonga supports not only counting grouped records but also finding the maximum and/or
minimum value from grouped records, summing values in grouped records and so on. See
Drilldown related parameters for details.

Parameters
This section describes all parameters. Parameters are categorized.

Required parameters
There is a required parameter, table.

table
Specifies a table to be searched. table must be specified.

If a nonexistent table is specified, an error is returned.

Execution example:

select Nonexistent
# [
# [
# -22,
# 1337566253.89858,
# 0.000355720520019531,
# "invalid table name: <Nonexistent>",
# [
# [
# "grn_select",
# "proc.c",
# 1217
# ]
# ]
# ]
# ]

Search related parameters
There are search related parameters. Typically, match_columns and query parameters are
used for implementing a search box. filter parameters is used for implementing complex
search feature.

If both query and filter are specified, selected records must be matched against both
query and filter. If both query and filter aren't specified, all records are selected.

match_columns
Specifies the default target columns for fulltext search by the query parameter value. A
target column for fulltext search can also be specified in the query parameter itself. The
difference between match_columns and query is that match_columns supports weights and
score functions while query doesn't.

Weight is the relative importance of a target column. When a record is matched by fulltext
search, a target column with a higher weight contributes more to the hit score than one
with a lower weight. The default weight is 1.

Here is a simple match_columns usage example.

Execution example:

select Entries --match_columns content --query fast --output_columns '_key, _score'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Groonga",
# 1
# ],
# [
# "Mroonga",
# 2
# ]
# ]
# ]
# ]

--match_columns content means the default target column for fulltext search is content
column and its weight is 1. --output_columns '_key, _score' means that the select command
outputs _key value and _score value for matched records.

Pay attention to the _score value. _score is the number of matches against the query
parameter value. In this example, the query parameter value is fast. A _score value of 1
means that fast appears in the content column only once. A _score value of 2 means that
fast appears in the content column twice.

To specify weight, column * weight syntax is used. Here is a weight usage example.

Execution example:

select Entries --match_columns 'content * 2' --query fast --output_columns '_key, _score'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Groonga",
# 2
# ],
# [
# "Mroonga",
# 4
# ]
# ]
# ]
# ]

--match_columns 'content * 2' means the default target column for fulltext search is
content column and its weight is 2.

Pay attention to _score value. _score value is doubled because weight is 2.

You can specify one or more columns as the default target columns for fulltext search. If
multiple columns are specified, fulltext search is done against all of them and their
scores are accumulated. If any of the columns matches the query parameter value, the
record is treated as matched.

To specify multiple columns, use the column1 * weight1 || column2 * weight2 || ... syntax.
* weight can be omitted; if it is omitted, 1 is used as the weight. Here is a
multiple-columns usage example.

Execution example:

select Entries --match_columns '_key * 10 || content' --query groonga --output_columns '_key, _score'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Groonga",
# 11
# ]
# ]
# ]
# ]

--match_columns '_key * 10 || content' means the default target columns for fulltext
search are the _key and content columns, with _key weighted 10 and content weighted 1.
This weight allocation means the _key column value is more important than the content
column value. In this example, the title of a blog entry is more important than its
content.
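The accumulated score in the example above can be checked by hand. A hedged sketch of the arithmetic (weights and match counts are taken from the example; the scoring shown is the simple count-times-weight accumulation described above, not Groonga's internal implementation):

```python
# Hedged sketch: accumulated weighted scoring for '_key * 10 || content'.
weights = {"_key": 10, "content": 1}
# The record "Groonga" matches once in _key and once in content.
match_counts = {"_key": 1, "content": 1}

score = sum(weights[col] * match_counts[col] for col in weights)
print(score)  # → 11
```

This reproduces the _score of 11 shown in the execution example above.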

You can also specify score function. See /reference/scorer for details.

Note that score function isn't related to scorer parameter.

query
Specifies the query text. Normally, it is used for fulltext search with the match_columns
parameter. The query parameter is designed for a fulltext search form in a Web page. A
query text should be formatted in /reference/grn_expr/query_syntax. The syntax is similar
to a common search form like Google's. For example, word1 word2 means that Groonga
searches for records that contain both word1 and word2. word1 OR word2 means that Groonga
searches for records that contain either word1 or word2.
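The two query forms can be sketched as predicates over a single column value. This is a
hedged Python illustration; the helper names and the sample text are invented, and plain
substring matching stands in for tokenized fulltext search:

```python
# "word1 word2" is a logical AND: every word must appear in the value.
# "word1 OR word2" is a logical OR: at least one word must appear.
def matches_all(text, words):
    return all(word in text for word in words)

def matches_any(text, words):
    return any(word in text for word in words)

content = "I started to use Groonga. It's very fast!"
# matches_all(content, ["fast", "Groonga"]) corresponds to "fast Groonga";
# matches_any(content, ["Groonga", "Mroonga"]) to "Groonga OR Mroonga".
```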

Here is a simple logical and search example.

Execution example:

select Entries --match_columns content --query "fast groonga"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ]
# ]
# ]
# ]

The select command searches records that contain two words fast and groonga in content
column value from Entries table.

Here is a simple logical or search example.

Execution example:

select Entries --match_columns content --query "groonga OR mroonga"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]

The select command searches records that contain one of two words groonga or mroonga in
content column value from Entries table.

See /reference/grn_expr/query_syntax for other syntax.

It can be used not only for fulltext search but also for other conditions. For example,
column:value means that the value of column column is equal to value. column:<value means
that the value of column column is less than value.

Here is a simple equality operator search example.

Execution example:

select Entries --query _key:Groonga
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ]
# ]
# ]
# ]

The select command searches records whose _key column value is Groonga from the Entries table.

Here is a simple less than operator search example.

Execution example:

select Entries --query n_likes:<11
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]

The select command searches records whose n_likes column value is less than 11 from the
Entries table.

See /reference/grn_expr/query_syntax for other operations.

filter
Specifies the filter text. Normally, it is used for complex search conditions. filter can
be used with the query parameter. If both filter and query are specified, they are
combined with logical AND. It means that matched records must match both the filter and
the query.

filter parameter is designed for complex conditions. A filter text should be formatted in
/reference/grn_expr/script_syntax. The syntax is similar to ECMAScript. For example,
column == "value" means that the value of column column is equal to "value". column <
value means that the value of column column is less than value.

Here is a simple equality operator search example.

Execution example:

select Entries --filter '_key == "Groonga"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ]
# ]
# ]
# ]

The select command searches records whose _key column value is Groonga from the Entries table.

Here is a simple less than operator search example.

Execution example:

select Entries --filter 'n_likes < 11'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]

The select command searches records whose n_likes column value is less than 11 from the
Entries table.

See /reference/grn_expr/script_syntax for other operators.

Advanced search parameters
match_escalation_threshold
Specifies the threshold that determines whether the search strategy escalation is used or
not. The threshold is compared against the number of matched records. If the number of
matched records is equal to or less than the threshold, the search strategy escalation is
used. See /spec/search about the search strategy escalation.

The default threshold is 0. It means that the search strategy escalation is used only
when no records are matched.

The default threshold can be customized by one of the following:

· --with-match-escalation-threshold option of configure

· --match-escalation-threshold option of groonga command

· match-escalation-threshold configuration item in configuration file
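The escalation decision itself can be written down in a few lines of Python. This is only
an illustration of the rule above; the helper name is invented:

```python
# Escalation runs when the number of records matched by the exact search
# is equal to or less than the threshold.
def should_escalate(n_matched, threshold):
    return n_matched <= threshold

# Default threshold 0: escalate only when nothing matched.
# Threshold -1: never escalate, even for an empty result.
```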

Here is a simple match_escalation_threshold usage example. The first select doesn't have
match_escalation_threshold parameter. The second select has match_escalation_threshold
parameter.

Execution example:

select Entries --match_columns content --query groo
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ]
# ]
# ]
# ]
select Entries --match_columns content --query groo --match_escalation_threshold -1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 0
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ]
# ]
# ]

The first select command searches records that contain the word groo in content column
value from the Entries table. But no records are matched because the TokenBigram
tokenizer tokenizes groonga as groonga, not gr|ro|oo|on|ng|ga. (The
TokenBigramSplitSymbolAlpha tokenizer tokenizes groonga as gr|ro|oo|on|ng|ga. See
/reference/tokenizers for details.) It means that groonga is indexed but groo isn't
indexed. So no records are matched against groo by exact match. In this case, the search
strategy escalation is used because the number of matched records (0) is equal to
match_escalation_threshold (0). One record is matched against groo by unsplit search.

The second select command also searches records that contain the word groo in content
column value from the Entries table. It also finds no matched records by exact match. In
this case, the search strategy escalation is not used because the number of matched
records (0) is larger than match_escalation_threshold (-1). So no more searches are
executed and no records are matched.

query_expansion
Deprecated. Use query_expander instead.

query_flags
It customizes the query parameter syntax. You cannot update column values via the query
parameter by default. But if you specify ALLOW_COLUMN|ALLOW_UPDATE as query_flags, you
can update column values by query.

Here are available values:

· ALLOW_PRAGMA

· ALLOW_COLUMN

· ALLOW_UPDATE

· ALLOW_LEADING_NOT

· NONE

ALLOW_PRAGMA enables pragma at the head of query. This is not implemented yet.

ALLOW_COLUMN enables search against columns that are not included in match_columns. To
specify a column, the COLUMN:... syntaxes are used.

ALLOW_UPDATE enables column update by query with COLUMN:=NEW_VALUE syntax. ALLOW_COLUMN is
also required to update column because the column update syntax specifies column.

ALLOW_LEADING_NOT enables a leading NOT condition with the -WORD syntax. The query
searches records that don't match WORD. A leading NOT condition query is heavy in many
cases because it matches many records. So this flag is disabled by default. Be careful
when you use this flag.

NONE is just ignored. You can use NONE to specify no flags.

They can be combined with | such as ALLOW_COLUMN|ALLOW_UPDATE.

The default value is ALLOW_PRAGMA|ALLOW_COLUMN.
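Parsing a query_flags value can be sketched as splitting on | into a set of known flag
names. This is a Python illustration only; the parsing helper is invented, and only the
flag names come from the list above:

```python
# Split "FLAG1|FLAG2|..." into a set of flags. NONE contributes nothing,
# so it can be used to specify "no flags" explicitly.
KNOWN_FLAGS = {"ALLOW_PRAGMA", "ALLOW_COLUMN", "ALLOW_UPDATE",
               "ALLOW_LEADING_NOT", "NONE"}

def parse_query_flags(value):
    flags = set()
    for name in value.split("|"):
        name = name.strip()
        if name not in KNOWN_FLAGS:
            raise ValueError("unknown query flag: " + name)
        if name != "NONE":
            flags.add(name)
    return flags
```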

Here is a usage example of ALLOW_COLUMN.

Execution example:

select Entries --query content:@mroonga --query_flags ALLOW_COLUMN
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]

The select command searches records that contain mroonga in content column value from
Entries table.

Here is a usage example of ALLOW_UPDATE.

Execution example:

table_create Users TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users age COLUMN_SCALAR UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Users
[
{"_key": "alice", "age": 18},
{"_key": "bob", "age": 20}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select Users --query age:=19 --query_flags ALLOW_COLUMN|ALLOW_UPDATE
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "age",
# "UInt32"
# ]
# ],
# [
# 1,
# "alice",
# 19
# ],
# [
# 2,
# "bob",
# 19
# ]
# ]
# ]
# ]
select Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "age",
# "UInt32"
# ]
# ],
# [
# 1,
# "alice",
# 19
# ],
# [
# 2,
# "bob",
# 19
# ]
# ]
# ]
# ]

The first select command sets age column value of all records to 19. The second select
command outputs updated age column values.

Here is a usage example of ALLOW_LEADING_NOT.

Execution example:

select Entries --match_columns content --query -mroonga --query_flags ALLOW_LEADING_NOT
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]

The select command searches records that don't contain mroonga in content column value
from Entries table.

Here is a usage example of NONE.

Execution example:

select Entries --match_columns content --query 'mroonga OR _key:Groonga' --query_flags NONE
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]

The select command searches records that contain one of the two words mroonga or
_key:Groonga in content from the Entries table. Note that _key:Groonga doesn't mean that
the value of the _key column is equal to Groonga, because the ALLOW_COLUMN flag is not
specified.

See also /reference/grn_expr/query_syntax.

query_expander
It's for query expansion. Query expansion substitutes specific words in the query with
other words. Normally, it's used for synonym search.

It specifies a column that is used to substitute query parameter value. The format of this
parameter value is "${TABLE}.${COLUMN}". For example, "Terms.synonym" specifies synonym
column in Terms table.

The table for query expansion is called the "substitution table". The substitution
table's key must be ShortText. So an array table (TABLE_NO_KEY) can't be used for query
expansion, because an array table doesn't have a key.

Column for query expansion is called "substitution column". Substitution column's value
type must be ShortText. Column type must be vector (COLUMN_VECTOR).

Query expansion substitutes keys of the substitution table in the query with values in
the substitution column. If a word in the query is a key of the substitution table, the
word is substituted with the substitution column value that is associated with the key.
Substitution isn't performed recursively. It means that substitution target words in the
substituted query aren't substituted again.
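The substitution rule can be sketched in Python. This is a hedged illustration of the
behavior described above, not Groonga's implementation; the word-by-word split and the
helper name are simplifications invented for the example:

```python
# Non-recursive query expansion: each word that is a key of the
# substitution table is replaced once by its OR-joined synonym values.
# The substituted text is not scanned again, so expansion never recurses.
def expand_query(query, synonyms):
    words = []
    for word in query.split():
        if word in synonyms:
            values = synonyms[word]
            if values:  # an empty list acts as a "stop word": drop the term
                words.append("((" + ") OR (".join(values) + "))")
        else:
            words.append(word)
    return " ".join(words)

thesaurus = {
    "mroonga": ["mroonga", "tritonn", "groonga mysql"],
    "groonga": ["groonga", "senna"],
}
expanded = expand_query("mroonga", thesaurus)
# → "((mroonga) OR (tritonn) OR (groonga mysql))"
```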

Here is a sample substitution table to show a simple query_expander usage example.

Execution example:

table_create Thesaurus TABLE_PAT_KEY ShortText --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Thesaurus synonym COLUMN_VECTOR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Thesaurus
[
{"_key": "mroonga", "synonym": ["mroonga", "tritonn", "groonga mysql"]},
{"_key": "groonga", "synonym": ["groonga", "senna"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]

Thesaurus substitution table has two synonyms, "mroonga" and "groonga". If a user
searches with "mroonga", Groonga searches with "((mroonga) OR (tritonn) OR (groonga
mysql))". If a user searches with "groonga", Groonga searches with "((groonga) OR
(senna))".

Normally, it's a good idea for the substitution table to use a normalizer. For example,
if a normalizer is used, substitution target words are matched in a case insensitive
manner. See /reference/normalizers for available normalizers.

Note that those synonym values include the key value itself, such as "mroonga" and
"groonga". It's recommended that you include the key value. If you don't include the key
value, the substituted value doesn't include the original substitution target value.
Normally, including the original value gives better search results. If you have a word
that you don't want to be searched, you should not include the original word. For
example, you can implement "stop words" with an empty vector value.

Here is a simple query_expander usage example.

Execution example:

select Entries --match_columns content --query "mroonga"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]
select Entries --match_columns content --query "mroonga" --query_expander Thesaurus.synonym
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]
select Entries --match_columns content --query "((mroonga) OR (tritonn) OR (groonga mysql))"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]

The first select command doesn't use query expansion, so the record that has "tritonn"
isn't found. The second select command uses query expansion, so the record that has
"tritonn" is found. The third select command doesn't use query expansion but returns the
same result as the second select command, because the third one uses the already expanded
query.

Each substitution value can contain any /reference/grn_expr/query_syntax syntax such as
(...) and OR. You can write complex substitutions by using that syntax.

Here is a complex substitution usage example that uses query syntax.

Execution example:

load --table Thesaurus
[
{"_key": "popular", "synonym": ["popular", "n_likes:>=10"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select Entries --match_columns content --query "popular" --query_expander Thesaurus.synonym
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]

The load command registers a new synonym "popular". It is substituted with ((popular) OR
(n_likes:>=10)). The substituted query matches entries that contain the word "popular" or
that have 10 or more likes.

The select command outputs records whose n_likes column value is equal to or more than 10
from the Entries table.

Output related parameters
output_columns
Specifies output columns separated by ,.

Here is a simple output_columns usage example.

Execution example:

select Entries --output_columns '_id, _key' --limit 1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!"
# ]
# ]
# ]
# ]

The select command just outputs _id and _key column values.

* is a special value. It means all columns that are not /reference/columns/pseudo columns.

Here is a * usage example.

Execution example:

select Entries --output_columns '_key, *' --limit 1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ]
# ]
# ]
# ]

The select command outputs the _key pseudo column, content, n_likes and tag column values
but doesn't output the _id pseudo column value.

The default value is _id, _key, *. It means that all column values except _score are
output.

sortby
Specifies sort keys separated by ,. Each sort key is column name.

Here is a simple sortby usage example.

Execution example:

select Entries --sortby 'n_likes, _id'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]

The select command sorts by n_likes column value in ascending order. Records that have
the same n_likes value are sorted by _id in ascending order. "Good-bye Senna" and
"Good-bye Tritonn" are such a case.

If you want to sort in descending order, add - before the column name.
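The sort key semantics can be sketched in Python (an illustration only; the helper and
the sample records are invented). A stable sort applied from the last key to the first
gives the tie-breaking behavior described above:

```python
# Multi-key sort: a leading "-" means descending. Applying the keys in
# reverse order with a stable sort makes later keys act as tie breakers.
def sort_records(records, sortby):
    keys = [k.strip() for k in sortby.split(",")]
    for key in reversed(keys):
        descending = key.startswith("-")
        name = key.lstrip("-")
        records = sorted(records, key=lambda r: r[name], reverse=descending)
    return records

entries = [
    {"_id": 4, "n_likes": 3},
    {"_id": 1, "n_likes": 5},
    {"_id": 5, "n_likes": 3},
    {"_id": 3, "n_likes": 15},
]
result = sort_records(entries, "-n_likes, _id")
# n_likes descending, ties broken by _id ascending: _id order 3, 1, 4, 5
```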

Here is a descending order sortby usage example.

Execution example:

select Entries --sortby '-n_likes, _id'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]

The select command sorts by n_likes column value in descending order. But ascending order
is used for sorting by _id.

You can use _score pseudo column in sortby if you use query or filter parameter.

Execution example:

select Entries --match_columns content --query fast --sortby -_score --output_columns '_key, _score'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Mroonga",
# 2
# ],
# [
# "Groonga",
# 1
# ]
# ]
# ]
# ]

The select command sorts matched records by hit score in descending order and outputs
record key and hit score.

If you use _score without the query or filter parameter, it is just ignored, but you get
a warning in the log file.

offset
Specifies the offset that determines the output records range. Offset is zero-based.
--offset 1 means that the output range starts from the 2nd record.

Execution example:

select Entries --sortby _id --offset 3 --output_columns _key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# "Good-bye Senna"
# ],
# [
# "Good-bye Tritonn"
# ]
# ]
# ]
# ]

The select command outputs from the 4th record.

You can specify a negative value. It means the number of matched records + offset. If you
have 3 matched records and specify --offset -2, you get records from the 2nd record (3 +
-2 = 1; 1 means the 2nd record because offset is zero-based) to the 3rd record.
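The rule can be written down as a tiny helper (a Python illustration; the function name
is invented):

```python
# A negative offset is resolved against the number of matched records:
# the effective zero-based start position is n_matched + offset.
def resolve_offset(offset, n_matched):
    return offset if offset >= 0 else n_matched + offset

# 3 matched records, --offset -2: start at position 1, i.e. the 2nd record.
```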

Execution example:

select Entries --sortby _id --offset -2 --output_columns _key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# "Good-bye Senna"
# ],
# [
# "Good-bye Tritonn"
# ]
# ]
# ]
# ]

The select command outputs from the 4th record because the total number of records is 5.

The default value is 0.

limit
Specifies the maximum number of output records. If the number of matched records is less
than limit, all records are output.

Here is a simple limit usage example.

Execution example:

select Entries --sortby _id --offset 2 --limit 3 --output_columns _key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# "Mroonga"
# ],
# [
# "Good-bye Senna"
# ],
# [
# "Good-bye Tritonn"
# ]
# ]
# ]
# ]

The select command outputs the 3rd, the 4th and the 5th records.

You can specify a negative value. It means the number of matched records + limit + 1. For
example, --limit -1 outputs all records. It's a very useful value to show all records.
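The rule can be written down the same way as for offset (a Python illustration; the
helper is invented):

```python
# A negative limit is resolved as n_matched + limit + 1, so --limit -1
# always yields the full set of matched records.
def resolve_limit(limit, n_matched):
    return limit if limit >= 0 else n_matched + limit + 1
```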

Here is a simple negative limit value usage example.

Execution example:

select Entries --limit -1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]

The select command outputs all records.

The default value is 10.

scorer
Specifies a grn_expr in script syntax that is applied to every record that matches the
search conditions.

scorer is called after the search completes and before sorting starts. So you can
customize the sort order of the search results by specifying an expression that
manipulates the score of each record.

Drilldown related parameters
This section describes basic drilldown related parameters. Advanced drilldown related
parameters are described in another section.

drilldown
Specifies keys for grouping separated by ,.

Records matched by the specified search conditions are grouped by each key. If you
specify no search condition, all records are grouped by each key.
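Grouping itself is just counting records per key value, which can be sketched in Python
(an illustration only; the helper is invented, and the records mirror the Entries data
used in the examples):

```python
from collections import Counter

# Drilldown sketch: group matched records by a key column and report
# each group's key and record count (_nsubrecs).
def drilldown(records, key):
    counts = Counter(record[key] for record in records)
    return [{"_key": k, "_nsubrecs": n} for k, n in counts.items()]

entries = [
    {"_key": "The first post!", "tag": "Hello"},
    {"_key": "Groonga", "tag": "Groonga"},
    {"_key": "Mroonga", "tag": "Groonga"},
    {"_key": "Good-bye Senna", "tag": "Senna"},
    {"_key": "Good-bye Tritonn", "tag": "Senna"},
]
groups = drilldown(entries, "tag")
# Hello → 1, Groonga → 2, Senna → 2
```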

Here is a simple drilldown example:

Execution example:

select Entries \
--output_columns _key,tag \
--drilldown tag
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# "The first post!",
# "Hello"
# ],
# [
# "Groonga",
# "Groonga"
# ],
# [
# "Mroonga",
# "Groonga"
# ],
# [
# "Good-bye Senna",
# "Senna"
# ],
# [
# "Good-bye Tritonn",
# "Senna"
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 1
# ],
# [
# "Groonga",
# 2
# ],
# [
# "Senna",
# 2
# ]
# ]
# ]
# ]

The select command outputs the following information:

· There is one record that has the "Hello" tag.

· There are two records that have the "Groonga" tag.

· There are two records that have the "Senna" tag.

Here is a drilldown with search condition example:

Execution example:

select Entries \
--output_columns _key,tag \
--filter 'n_likes >= 5' \
--drilldown tag
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# "The first post!",
# "Hello"
# ],
# [
# "Groonga",
# "Groonga"
# ],
# [
# "Mroonga",
# "Groonga"
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 1
# ],
# [
# "Groonga",
# 2
# ]
# ]
# ]
# ]

The select command outputs the following information:

· Among records whose n_likes value is 5 or larger:

· There is one record that has the "Hello" tag.

· There are two records that have the "Groonga" tag.

Here is a drilldown with multiple group keys example:

Execution example:

select Entries \
--limit 0 \
--output_column _id \
--drilldown tag,n_likes
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 1
# ],
# [
# "Groonga",
# 2
# ],
# [
# "Senna",
# 2
# ]
# ],
# [
# [
# 4
# ],
# [
# [
# "_key",
# "UInt32"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# 5,
# 1
# ],
# [
# 10,
# 1
# ],
# [
# 15,
# 1
# ],
# [
# 3,
# 2
# ]
# ]
# ]
# ]

The select command outputs the following information:

· About tag:

· There is one record that has the "Hello" tag.

· There are two records that have the "Groonga" tag.

· There are two records that have the "Senna" tag.

· About n_likes:

· There is one record whose n_likes value is 5.

· There is one record whose n_likes value is 10.

· There is one record whose n_likes value is 15.

· There are two records whose n_likes value is 3.

drilldown_sortby
Specifies sort keys for drilldown outputs separated by ,. Each sort key is a column name.

You can refer to the number of grouped records via the _nsubrecs
/reference/columns/pseudo.

Here is a simple drilldown_sortby example:

Execution example:

select Entries \
--limit 0 \
--output_column _id \
--drilldown 'tag, n_likes' \
--drilldown_sortby '-_nsubrecs, _key'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Groonga",
# 2
# ],
# [
# "Senna",
# 2
# ],
# [
# "Hello",
# 1
# ]
# ],
# [
# [
# 4
# ],
# [
# [
# "_key",
# "UInt32"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# 3,
# 2
# ],
# [
# 5,
# 1
# ],
# [
# 10,
# 1
# ],
# [
# 15,
# 1
# ]
# ]
# ]
# ]

Drilldown result is sorted by the number of grouped records (= _nsubrecs) in descending
order. Grouped results that have the same number of records are sorted by the grouped key
(= _key) in ascending order.

The sort keys are used in all group keys specified in drilldown:

Execution example:

select Entries \
--limit 0 \
--output_column _id \
--drilldown 'tag, n_likes' \
--drilldown_sortby '-_nsubrecs, _key'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Groonga",
# 2
# ],
# [
# "Senna",
# 2
# ],
# [
# "Hello",
# 1
# ]
# ],
# [
# [
# 4
# ],
# [
# [
# "_key",
# "UInt32"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# 3,
# 2
# ],
# [
# 5,
# 1
# ],
# [
# 10,
# 1
# ],
# [
# 15,
# 1
# ]
# ]
# ]
# ]

The same sort keys are used in tag drilldown and n_likes drilldown.

If you want to use different sort keys for each drilldown, use Advanced drilldown related
parameters.

drilldown_output_columns
Specifies output columns for drilldown separated by ,.

Here is a drilldown_output_columns example:

Execution example:

select Entries \
--limit 0 \
--output_column _id \
--drilldown tag \
--drilldown_output_columns _key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# "Hello"
# ],
# [
# "Groonga"
# ],
# [
# "Senna"
# ]
# ]
# ]
# ]

The select command just outputs the grouped key.

If the grouped key is a reference type column (= a column whose type is a table), you can
access columns of the table referenced by the reference type column.

Here are a schema definition and sample data to show drilldown against a reference type
column:

Execution example:

table_create Tags TABLE_HASH_KEY ShortText --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Tags label COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Tags priority COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Items TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Items tag COLUMN_SCALAR Tags
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Tags
[
{"_key": "groonga", "label": "Groonga", "priority": 10},
{"_key": "mroonga", "label": "Mroonga", "priority": 5}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
load --table Items
[
{"_key": "A", "tag": "groonga"},
{"_key": "B", "tag": "groonga"},
{"_key": "C", "tag": "mroonga"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

The Tags table is a referenced table. Items.tag is a reference type column.

You can refer to Tags.label as label in drilldown_output_columns:

Execution example:

select Items \
--limit 0 \
--output_column _id \
--drilldown tag \
--drilldown_output_columns '_key, label'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "tag",
# "Tags"
# ]
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "label",
# "ShortText"
# ]
# ],
# [
# "groonga",
# "Groonga"
# ],
# [
# "mroonga",
# "Mroonga"
# ]
# ]
# ]
# ]

You can use * to refer to all columns in the referenced table (= Tags):

Execution example:

select Items \
--limit 0 \
--output_column _id \
--drilldown tag \
--drilldown_output_columns '_key, *'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "tag",
# "Tags"
# ]
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "label",
# "ShortText"
# ],
# [
# "priority",
# "Int32"
# ]
# ],
# [
# "groonga",
# "Groonga",
# 10
# ],
# [
# "mroonga",
# "Mroonga",
# 5
# ]
# ]
# ]
# ]

* is expanded to label, priority.

The default value of drilldown_output_columns is _key, _nsubrecs. It means that the
grouped key and the number of records in each group are output.

You can use more /reference/columns/pseudo in drilldown_output_columns, such as _max,
_min, _sum and _avg, when you use drilldown_calc_types. See the drilldown_calc_types
documentation for details.

drilldown_offset
Specifies the offset that determines the range of drilldown output records. Offset is
zero-based. --drilldown_offset 1 means that the output range starts from the 2nd record.

Here is a drilldown_offset example:

Execution example:

select Entries \
--limit 0 \
--output_column _id \
--drilldown tag \
--drilldown_sortby _key \
--drilldown_offset 1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 1
# ],
# [
# "Senna",
# 2
# ]
# ]
# ]
# ]

The select command outputs from the 2nd record.

You can specify a negative value. It is treated as the number of grouped results +
offset. If you have 3 grouped results and specify --drilldown_offset -2, you get grouped
results from the 2nd (3 + -2 = 1; 1 means the 2nd because offset is zero-based) grouped
result to the 3rd grouped result.

Execution example:

select Entries \
--limit 0 \
--output_column _id \
--drilldown tag \
--drilldown_sortby _key \
--drilldown_offset -2
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 1
# ],
# [
# "Senna",
# 2
# ]
# ]
# ]
# ]

The select command outputs from the 2nd grouped result because the total number of grouped
results is 3.

The default value of drilldown_offset is 0.
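The negative offset rule described above can be sketched in a few lines of Python. This
is only an illustration of the documented rule, not Groonga code; the function name is
made up for the example:

```python
def effective_offset(offset, n_groups):
    # A negative offset is treated as the number of grouped
    # results + offset (zero-based), as described above.
    if offset < 0:
        return n_groups + offset
    return offset

# 3 grouped results with --drilldown_offset -2:
# 3 + -2 = 1, i.e. output starts from the 2nd grouped result.
print(effective_offset(-2, 3))  # → 1
```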

drilldown_limit
Specifies the maximum number of groups in a drilldown. If the number of groups is less
than drilldown_limit, all groups are output.

Here is a drilldown_limit example:

Execution example:

select Entries \
--limit 0 \
--output_column _id \
--drilldown tag \
--drilldown_sortby _key \
--drilldown_offset 1 \
--drilldown_limit 2
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 1
# ],
# [
# "Senna",
# 2
# ]
# ]
# ]
# ]

The select command outputs the 2nd and the 3rd groups.

You can specify a negative value. It is treated as the number of groups + drilldown_limit
+ 1. For example, --drilldown_limit -1 outputs all groups. It's a useful value when you
want to show all groups.

Here is a negative drilldown_limit value example.

Execution example:

select Entries \
--limit 0 \
--output_column _id \
--drilldown tag \
--drilldown_sortby _key \
--drilldown_limit -1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Groonga",
# 2
# ],
# [
# "Hello",
# 1
# ],
# [
# "Senna",
# 2
# ]
# ]
# ]
# ]

The select command outputs all groups.

The default value of drilldown_limit is 10.
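The negative limit rule can also be sketched in Python. Again, this only illustrates the
documented rule and is not Groonga code:

```python
def effective_limit(limit, n_groups):
    # A negative limit is treated as the number of groups +
    # limit + 1, so --drilldown_limit -1 outputs all groups.
    if limit < 0:
        return n_groups + limit + 1
    return limit

print(effective_limit(-1, 3))  # → 3 (all groups)
print(effective_limit(2, 3))   # → 2
```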

drilldown_calc_types
Specifies how to calculate (aggregate) values in grouped records by a drilldown. You can
specify multiple calculation types separated by ",". For example, MAX,MIN.

Calculation target values are read from a column of grouped records. The column is
specified by drilldown_calc_target.

You can read calculated values by /reference/columns/pseudo such as _max and _min in
drilldown_output_columns.

You can use the following calculation types:

┌──────────┬───────────────────────────┬───────────────────────┬─────────────────────┐
│Type name │ /reference/columns/pseudo │ Need                  │ Description         │
│          │ name                      │ drilldown_calc_target │                     │
├──────────┼───────────────────────────┼───────────────────────┼─────────────────────┤
│NONE      │ Nothing.                  │ Not needed.           │ Just ignored.       │
├──────────┼───────────────────────────┼───────────────────────┼─────────────────────┤
│COUNT     │ _nsubrecs                 │ Not needed.           │ Counting grouped    │
│          │                           │                       │ records. It's       │
│          │                           │                       │ always enabled. So  │
│          │                           │                       │ you don't need to   │
│          │                           │                       │ specify it.         │
├──────────┼───────────────────────────┼───────────────────────┼─────────────────────┤
│MAX       │ _max                      │ Needed.               │ Finding the maximum │
│          │                           │                       │ integer value from  │
│          │                           │                       │ integer values in   │
│          │                           │                       │ grouped records.    │
├──────────┼───────────────────────────┼───────────────────────┼─────────────────────┤
│MIN       │ _min                      │ Needed.               │ Finding the minimum │
│          │                           │                       │ integer value from  │
│          │                           │                       │ integer values in   │
│          │                           │                       │ grouped records.    │
├──────────┼───────────────────────────┼───────────────────────┼─────────────────────┤
│SUM       │ _sum                      │ Needed.               │ Summing integer     │
│          │                           │                       │ values in grouped   │
│          │                           │                       │ records.            │
├──────────┼───────────────────────────┼───────────────────────┼─────────────────────┤
│AVG       │ _avg                      │ Needed.               │ Averaging           │
│          │                           │                       │ integer/float       │
│          │                           │                       │ values in grouped   │
│          │                           │                       │ records.            │
└──────────┴───────────────────────────┴───────────────────────┴─────────────────────┘
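As an illustration, the calculation types behave like the following Python sketch over
the sample Entries data used in these examples (tags and n_likes values taken from the
documentation's sample records). This mirrors the semantics only; it is not how Groonga
implements drilldown:

```python
from collections import defaultdict

# (tag, n_likes) pairs from the sample Entries data.
entries = [
    ("Hello", 5), ("Groonga", 10), ("Groonga", 15),
    ("Senna", 3), ("Senna", 3),
]

groups = defaultdict(list)
for tag, n_likes in entries:
    groups[tag].append(n_likes)

for tag, values in groups.items():
    # COUNT (_nsubrecs) is always computed; MAX, MIN, SUM and
    # AVG need drilldown_calc_target (here: n_likes).
    print(tag, len(values), max(values), min(values),
          sum(values), sum(values) / len(values))
```

The printed values match the _nsubrecs, _max, _min, _sum and _avg values shown in the
execution examples below.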

Here is a MAX example:

Execution example:

select Entries \
--limit -1 \
--output_column _id,n_likes \
--drilldown tag \
--drilldown_calc_types MAX \
--drilldown_calc_target n_likes \
--drilldown_output_columns _key,_max
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_max",
# "Int64"
# ]
# ],
# [
# "Hello",
# 5
# ],
# [
# "Groonga",
# 15
# ],
# [
# "Senna",
# 3
# ]
# ]
# ]
# ]

The select command groups all records by the tag column value, finds the maximum n_likes
column value for each group and outputs pairs of the grouped key and the maximum n_likes
column value for the group. It uses the _max /reference/columns/pseudo to read the
maximum n_likes column value.

Here is a MIN example:

Execution example:

select Entries \
--limit -1 \
--output_column _id,n_likes \
--drilldown tag \
--drilldown_calc_types MIN \
--drilldown_calc_target n_likes \
--drilldown_output_columns _key,_min
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_min",
# "Int64"
# ]
# ],
# [
# "Hello",
# 5
# ],
# [
# "Groonga",
# 10
# ],
# [
# "Senna",
# 3
# ]
# ]
# ]
# ]

The select command groups all records by the tag column value, finds the minimum n_likes
column value for each group and outputs pairs of the grouped key and the minimum n_likes
column value for the group. It uses the _min /reference/columns/pseudo to read the
minimum n_likes column value.

Here is a SUM example:

Execution example:

select Entries \
--limit -1 \
--output_column _id,n_likes \
--drilldown tag \
--drilldown_calc_types SUM \
--drilldown_calc_target n_likes \
--drilldown_output_columns _key,_sum
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_sum",
# "Int64"
# ]
# ],
# [
# "Hello",
# 5
# ],
# [
# "Groonga",
# 25
# ],
# [
# "Senna",
# 6
# ]
# ]
# ]
# ]

The select command groups all records by the tag column value, sums all n_likes column
values for each group and outputs pairs of the grouped key and the summed n_likes column
values for the group. It uses the _sum /reference/columns/pseudo to read the summed
n_likes column values.

Here is an AVG example:

Execution example:

select Entries \
--limit -1 \
--output_column _id,n_likes \
--drilldown tag \
--drilldown_calc_types AVG \
--drilldown_calc_target n_likes \
--drilldown_output_columns _key,_avg
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_avg",
# "Float"
# ]
# ],
# [
# "Hello",
# 5.0
# ],
# [
# "Groonga",
# 12.5
# ],
# [
# "Senna",
# 3.0
# ]
# ]
# ]
# ]

The select command groups all records by the tag column value, averages all n_likes
column values for each group and outputs pairs of the grouped key and the averaged
n_likes column values for the group. It uses the _avg /reference/columns/pseudo to read
the averaged n_likes column values.

Here is an example that uses all calculation types:

Execution example:

select Entries \
--limit -1 \
--output_column _id,n_likes \
--drilldown tag \
--drilldown_calc_types MAX,MIN,SUM,AVG \
--drilldown_calc_target n_likes \
--drilldown_output_columns _key,_nsubrecs,_max,_min,_sum,_avg
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ],
# [
# "_max",
# "Int64"
# ],
# [
# "_min",
# "Int64"
# ],
# [
# "_sum",
# "Int64"
# ],
# [
# "_avg",
# "Float"
# ]
# ],
# [
# "Hello",
# 1,
# 5,
# 5,
# 5,
# 5.0
# ],
# [
# "Groonga",
# 2,
# 15,
# 10,
# 25,
# 12.5
# ],
# [
# "Senna",
# 2,
# 3,
# 3,
# 6,
# 3.0
# ]
# ]
# ]
# ]

The select command specifies multiple calculation types separated by "," like
MAX,MIN,SUM,AVG. You can use the _nsubrecs /reference/columns/pseudo in
drilldown_output_columns without specifying COUNT in drilldown_calc_types, because COUNT
is always enabled.

The default value of drilldown_calc_types is NONE. It means that only COUNT is enabled,
because NONE is just ignored and COUNT is always enabled.

drilldown_calc_target
Specifies the target column for drilldown_calc_types.

If you specify a calculation type that needs a target column such as MAX in
drilldown_calc_types but you omit drilldown_calc_target, the calculation result is always
0.

You can specify only one column name like --drilldown_calc_target n_likes. You can't
specify multiple column names like --drilldown_calc_target _key,n_likes.

You can refer to a value in a referenced record by chaining column names with "." like
--drilldown_calc_target reference_column.nested_reference_column.value.

See drilldown_calc_types for how to use drilldown_calc_target.

The default value of drilldown_calc_target is null. It means that no calculation target
column is specified.

Advanced drilldown related parameters
You can get multiple drilldown results by specifying multiple group keys in drilldown,
but you need to use the same configuration for all drilldowns. For example,
drilldown_output_columns is used by all drilldowns.

You can use a separate configuration for each drilldown with the following parameters:

· drilldown[${LABEL}].keys

· drilldown[${LABEL}].sortby

· drilldown[${LABEL}].output_columns

· drilldown[${LABEL}].offset

· drilldown[${LABEL}].limit

· drilldown[${LABEL}].calc_types

· drilldown[${LABEL}].calc_target

${LABEL} is a variable. You can use the following characters for ${LABEL}:

· Alphabets

· Digits

· .

· _

NOTE:
You can use other characters too, but it's better to use only these characters.

Parameters that have the same ${LABEL} value are grouped. Grouped parameters are used for
one drilldown.

For example, there are 2 groups for the following parameters:

· --drilldown[label1].keys _key

· --drilldown[label1].output_columns _nsubrecs

· --drilldown[label2].keys tag

· --drilldown[label2].output_columns _key,_nsubrecs

drilldown[label1].keys and drilldown[label1].output_columns are grouped.
drilldown[label2].keys and drilldown[label2].output_columns are also grouped.

In the label1 group, _key is used as the group key and _nsubrecs is used for the output
columns.

In the label2 group, tag is used as the group key and _key,_nsubrecs is used for the
output columns.

See the documentation of the corresponding drilldown_XXX parameter for how to use each of
the following parameters:

· drilldown[${LABEL}].sortby: drilldown_sortby

· drilldown[${LABEL}].offset: drilldown_offset

· drilldown[${LABEL}].limit: drilldown_limit

· drilldown[${LABEL}].calc_types: drilldown_calc_types

· drilldown[${LABEL}].calc_target: drilldown_calc_target

The following parameters need more description:

· drilldown[${LABEL}].keys

· drilldown[${LABEL}].output_columns

The output format is also a bit different and needs more description.

drilldown[${LABEL}].keys
drilldown can specify multiple keys for multiple drilldowns, but it can't specify
multiple keys for one drilldown.

drilldown[${LABEL}].keys can't specify multiple keys for multiple drilldowns, but it can
specify multiple keys for one drilldown.

You can specify multiple keys separated by ",".

Here is an example to group by multiple keys, tag and n_likes column values:

Execution example:

select Entries \
--limit -1 \
--output_column tag,n_likes \
--drilldown[tag.n_likes].keys tag,n_likes \
--drilldown[tag.n_likes].output_columns _value.tag,_value.n_likes,_nsubrecs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ],
# {
# "tag.n_likes": [
# [
# 4
# ],
# [
# [
# "tag",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 5,
# 1
# ],
# [
# "Groonga",
# 10,
# 1
# ],
# [
# "Groonga",
# 15,
# 1
# ],
# [
# "Senna",
# 3,
# 2
# ]
# ]
# }
# ]
# ]

tag.n_likes is used as the label for the drilldown parameter group. You can refer to
grouped keys with the _value.${KEY_NAME} syntax in drilldown[${LABEL}].output_columns.
${KEY_NAME} is a column name that is used as a group key. tag and n_likes are ${KEY_NAME}
in this case.

Note that you can't use the _value.${KEY_NAME} syntax when you specify just one key in
drilldown[${LABEL}].keys like --drilldown[tag].keys tag. You should use _key in that
case. It's the same rule as in drilldown_output_columns.

drilldown[${LABEL}].output_columns
It's almost the same as drilldown_output_columns. The difference between
drilldown_output_columns and drilldown[${LABEL}].output_columns is how to refer to group
keys.

drilldown_output_columns uses the _key /reference/columns/pseudo to refer to the group
key. drilldown[${LABEL}].output_columns also uses the _key /reference/columns/pseudo to
refer to the group key when you specify only one group key in drilldown[${LABEL}].keys.

Here is an example that refers to a single group key by the _key
/reference/columns/pseudo:

Execution example:

select Entries \
--limit 0 \
--output_column _id \
--drilldown[tag.n_likes].keys tag,n_likes \
--drilldown[tag.n_likes].output_columns _value.tag,_value.n_likes
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ],
# {
# "tag.n_likes": [
# [
# 4
# ],
# [
# [
# "tag",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# "Hello",
# 5
# ],
# [
# "Groonga",
# 10
# ],
# [
# "Groonga",
# 15
# ],
# [
# "Senna",
# 3
# ]
# ]
# }
# ]
# ]

But you can't refer to each group key by the _key /reference/columns/pseudo in
drilldown[${LABEL}].output_columns. You need to use the _value.${KEY_NAME} syntax.
${KEY_NAME} is a column name that is used as a group key in drilldown[${LABEL}].keys.

Here is an example that refers to each group key in multiple group keys with the
_value.${KEY_NAME} syntax:

Execution example:

select Entries \
--limit 0 \
--output_column _id \
--drilldown[tag.n_likes].keys tag,n_likes \
--drilldown[tag.n_likes].output_columns _value.tag,_value.n_likes
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ],
# {
# "tag.n_likes": [
# [
# 4
# ],
# [
# [
# "tag",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# "Hello",
# 5
# ],
# [
# "Groonga",
# 10
# ],
# [
# "Groonga",
# 15
# ],
# [
# "Senna",
# 3
# ]
# ]
# }
# ]
# ]

TIP:
Why the _value.${KEY_NAME} syntax?

It's implementation-specific information.

_key is a vector value. The vector value consists of all group keys. You can see the
byte sequence of the vector value by referring to _key in
drilldown[${LABEL}].output_columns.

When you specify multiple group keys in drilldown[${LABEL}].keys, there is one grouped
record in _value to refer to each grouped value. So you can refer to each group key with
the _value.${KEY_NAME} syntax.

On the other hand, when you specify only one group key in drilldown[${LABEL}].keys,
there is no grouped record in _value. So you can't refer to the group key with the
_value.${KEY_NAME} syntax.

Output format for drilldown[${LABEL}] style
There is a difference in output format between drilldown and drilldown[${LABEL}].keys.
drilldown uses an array to output multiple drilldown results. drilldown[${LABEL}].keys
uses pairs of a label and a drilldown result.

drilldown uses the following output format:

[
HEADER,
[
SEARCH_RESULT,
DRILLDOWN_RESULT1,
DRILLDOWN_RESULT2,
...
]
]

drilldown[${LABEL}].keys uses the following output format:

[
HEADER,
[
SEARCH_RESULT,
{
"LABEL1": DRILLDOWN_RESULT1,
"LABEL2": DRILLDOWN_RESULT2,
...
}
]
]
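Assuming responses in the two shapes above, a client can normalize both styles into a
mapping from label to drilldown result. This is a minimal sketch, not an official Groonga
client API; the function name and the positional fallback labels are made up for the
example:

```python
def drilldown_results(body):
    """Extract drilldown results from the body part of a
    select response (everything after HEADER).

    Plain --drilldown appends drilldown results after
    SEARCH_RESULT as extra array elements; the labeled
    drilldown[LABEL] style appends one object mapping each
    label to its drilldown result.
    """
    extra = body[1:]  # skip SEARCH_RESULT
    if len(extra) == 1 and isinstance(extra[0], dict):
        return extra[0]
    # Fall back to positional labels for the plain style.
    return {str(i): result for i, result in enumerate(extra)}

labeled = [["SEARCH"], {"tag.n_likes": [[4], [], []]}]
plain = [["SEARCH"], [[3], [], []], [[4], [], []]]
print(sorted(drilldown_results(labeled)))  # → ['tag.n_likes']
print(sorted(drilldown_results(plain)))    # → ['0', '1']
```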

Cache related parameter
cache
Specifies whether to cache the result of this query or not.

If the result of this query is cached, the next identical query returns its response
quickly by using the cache.

It doesn't control whether an existing cached result is used or not.

Here are available values:

┌──────┬──────────────────────────────────┐
│Value │ Description                      │
├──────┼──────────────────────────────────┤
│no    │ Don't cache the output of this   │
│      │ query.                           │
├──────┼──────────────────────────────────┤
│yes   │ Cache the output of this query.  │
│      │ It's the default value.          │
└──────┴──────────────────────────────────┘

Here is an example to disable caching the result of this query:

Execution example:

select Entries --cache no
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]

The default value is yes.

Score related parameters
There is a score related parameter, adjuster.

adjuster
Specifies one or more score adjust expressions. You need to use adjuster with query or
filter. adjuster doesn't work for a request that doesn't search.

You can increase the score of specific records with adjuster. You can use adjuster to
set a high score for important records.

For example, you can use adjuster to increase the score of records that have the groonga
tag.

Here is the syntax:

--adjuster "SCORE_ADJUST_EXPRESSION1 + SCORE_ADJUST_EXPRESSION2 + ..."

Here is the SCORE_ADJUST_EXPRESSION syntax:

COLUMN @ "KEYWORD" * FACTOR

Note the following:

· COLUMN must be indexed.

· "KEYWORD" must be a string.

· FACTOR must be a positive integer.

Here is a sample adjuster usage example that uses just one SCORE_ADJUST_EXPRESSION:

Execution example:

select Entries \
--filter true \
--adjuster 'content @ "groonga" * 5' \
--output_columns _key,content,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "The first post!",
# "Welcome! This is my first post!",
# 1
# ],
# [
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 6
# ],
# [
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 1
# ],
# [
# "Good-bye Senna",
# "I migrated all Senna system!",
# 1
# ],
# [
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 1
# ]
# ]
# ]
# ]

The select command matches all records. Then it applies adjuster. The adjuster increases
the score of records that have "groonga" in the Entries.content column by 5. There is
only one record that has "groonga" in the Entries.content column. So the record whose
key is "Groonga" has score 6 (= 1 + 5).

You can omit FACTOR. If you omit FACTOR, it is treated as 1.

Here is a sample adjuster usage example that omits FACTOR:

Execution example:

select Entries \
--filter true \
--adjuster 'content @ "groonga"' \
--output_columns _key,content,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "The first post!",
# "Welcome! This is my first post!",
# 1
# ],
# [
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 2
# ],
# [
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 1
# ],
# [
# "Good-bye Senna",
# "I migrated all Senna system!",
# 1
# ],
# [
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 1
# ]
# ]
# ]
# ]

The adjuster in the select command doesn't have FACTOR. So the factor is treated as 1.
There is only one record that has "groonga" in the Entries.content column. So the record
whose key is "Groonga" has score 2 (= 1 + 1).

Here is a sample adjuster usage example that uses multiple SCORE_ADJUST_EXPRESSION:

Execution example:

select Entries \
--filter true \
--adjuster 'content @ "groonga" * 5 + content @ "started" * 3' \
--output_columns _key,content,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "The first post!",
# "Welcome! This is my first post!",
# 1
# ],
# [
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 9
# ],
# [
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 4
# ],
# [
# "Good-bye Senna",
# "I migrated all Senna system!",
# 1
# ],
# [
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 1
# ]
# ]
# ]
# ]

The adjuster in the select command has two SCORE_ADJUST_EXPRESSION s. The final increased
score is the sum of the scores of these SCORE_ADJUST_EXPRESSION s. Both
SCORE_ADJUST_EXPRESSION s in the select command are applied to the record whose key is
"Groonga". So the final increased score of the record is the sum of the scores of both
SCORE_ADJUST_EXPRESSION s.

The first SCORE_ADJUST_EXPRESSION is content @ "groonga" * 5. It increases the score by
5.

The second SCORE_ADJUST_EXPRESSION is content @ "started" * 3. It increases the score by
3.

The final increased score is 9 (= 1 + 5 + 3).
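The score arithmetic above can be sketched as follows. This is only an illustration of
how matching SCORE_ADJUST_EXPRESSION s add up; Groonga does the keyword matching with
full-text indexes, not the naive substring check used here, and the function name is made
up for the example:

```python
def adjusted_score(base_score, content, adjusters):
    # adjusters: list of (keyword, factor) pairs; each keyword
    # that matches the content adds its factor to the score.
    score = base_score
    for keyword, factor in adjusters:
        if keyword in content.lower():
            score += factor
    return score

content = "I started to use Groonga. It's very fast!"
print(adjusted_score(1, content, [("groonga", 5), ("started", 3)]))
# → 9 (= 1 + 5 + 3)
```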

A SCORE_ADJUST_EXPRESSION has one factor per "KEYWORD". This means that the scores of
all records that have "KEYWORD" are increased by the same value. You can also change the
score increment per record even for the same "KEYWORD". It is useful for tuning search
scores. See weight-vector-column for details.

Return value
select returns response with the following format:

[
HEADER,
[
SEARCH_RESULT,
DRILLDOWN_RESULT_1,
DRILLDOWN_RESULT_2,
...,
DRILLDOWN_RESULT_N
]
]

If select fails, error details are in HEADER.

See /reference/command/output_format for HEADER.

There are zero or more DRILLDOWN_RESULT entries. If neither drilldown nor
drilldown[${LABEL}].keys is specified, they are omitted like the following:

[
HEADER,
[
SEARCH_RESULT
]
]

If drilldown has two or more keys like --drilldown "_key, column1, column2", multiple
DRILLDOWN_RESULT entries exist:

[
HEADER,
[
SEARCH_RESULT,
DRILLDOWN_RESULT_FOR_KEY,
DRILLDOWN_RESULT_FOR_COLUMN1,
DRILLDOWN_RESULT_FOR_COLUMN2
]
]

If drilldown[${LABEL}].keys is used, only one DRILLDOWN_RESULT exists:

[
HEADER,
[
SEARCH_RESULT,
DRILLDOWN_RESULT_FOR_LABELED_DRILLDOWN
]
]

DRILLDOWN_RESULT format is different between drilldown and drilldown[${LABEL}].keys. It's
described later.

SEARCH_RESULT is the following format:

[
[N_HITS],
COLUMNS,
RECORDS
]

See Simple usage for a concrete example of the format.

N_HITS is the number of matched records before limit is applied.

COLUMNS describes the output columns specified by output_columns. It uses the following
format:

[
[COLUMN_NAME_1, COLUMN_TYPE_1],
[COLUMN_NAME_2, COLUMN_TYPE_2],
...,
[COLUMN_NAME_N, COLUMN_TYPE_N]
]

COLUMNS includes one or more output column information entries. Each output column
information entry includes the following:

· Column name as string

· Column type as string or null

Column name is extracted from value specified as output_columns.

Column type is Groonga's type name or null. It doesn't describe whether the column value
is vector or scalar. You need to determine that by whether the real column value is an
array or not.

See /reference/types for type details.

null is used when the column value type isn't determined. For example, a function call in
output_columns such as --output_columns "snippet_html(content)" uses null.

Here is an example of COLUMNS:

[
["_id", "UInt32"],
["_key", "ShortText"],
["n_likes", "UInt32"]
]

RECORDS includes column values for each matched record. Included records are selected by
offset and limit. It uses the following format:

[
[
RECORD_1_COLUMN_1,
RECORD_1_COLUMN_2,
...,
RECORD_1_COLUMN_N
],
[
RECORD_2_COLUMN_1,
RECORD_2_COLUMN_2,
...,
RECORD_2_COLUMN_N
],
...,
[
RECORD_N_COLUMN_1,
RECORD_N_COLUMN_2,
...,
RECORD_N_COLUMN_N
]
]

Here is an example of RECORDS:

[
[
1,
"The first post!",
5
],
[
2,
"Groonga",
10
],
[
3,
"Mroonga",
15
]
]
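With COLUMNS and RECORDS in the formats above, a client can zip them into per-record
dictionaries. This is a minimal sketch using the example values shown above, not an
official Groonga client API:

```python
# COLUMNS and RECORDS as in the examples above.
columns = [["_id", "UInt32"], ["_key", "ShortText"], ["n_likes", "UInt32"]]
records = [
    [1, "The first post!", 5],
    [2, "Groonga", 10],
    [3, "Mroonga", 15],
]

# Pair each record's values with the column names.
names = [name for name, _type in columns]
rows = [dict(zip(names, record)) for record in records]
print(rows[0]["_key"])     # → The first post!
print(rows[2]["n_likes"])  # → 15
```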

DRILLDOWN_RESULT format is different between drilldown and drilldown[${LABEL}].keys.

drilldown uses the same format as SEARCH_RESULT:

[
[N_HITS],
COLUMNS,
RECORDS
]

And drilldown generates one or more DRILLDOWN_RESULT entries when drilldown has one or
more keys.

drilldown[${LABEL}].keys uses the following format. Multiple drilldown[${LABEL}].keys are
mapped to one object (key-value pairs):

{
"LABEL_1": [
[N_HITS],
COLUMNS,
RECORDS
],
"LABEL_2": [
[N_HITS],
COLUMNS,
RECORDS
],
...,
"LABEL_N": [
[N_HITS],
COLUMNS,
RECORDS
]
}

Each drilldown[${LABEL}].keys corresponds to the following:

"LABEL": [
[N_HITS],
COLUMNS,
RECORDS
]

The following value part is the same format as SEARCH_RESULT:

[
[N_HITS],
COLUMNS,
RECORDS
]

See also Output format for drilldown[${LABEL}] style for drilldown[${LABEL}] style
drilldown output format.
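Since each labeled value shares the SEARCH_RESULT shape, walking the object is
straightforward. A minimal sketch, with hypothetical labels and counts:

```python
# Each labeled drilldown value has the same [[N_HITS], COLUMNS, RECORDS]
# shape as SEARCH_RESULT. The labels and data here are hypothetical.
drilldown_result = {
    "tag": [[2],
            [["_key", "ShortText"], ["_nsubrecs", "Int32"]],
            [["Groonga", 10], ["Mroonga", 5]]],
    "year": [[1],
             [["_key", "Int32"], ["_nsubrecs", "Int32"]],
             [[2016, 15]]],
}

summaries = {}
for label, (hits, _columns, records) in drilldown_result.items():
    summaries[label] = {"n_hits": hits[0], "n_records": len(records)}
```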

See also
· /reference/grn_expr/query_syntax

· /reference/grn_expr/script_syntax

shutdown
Summary
shutdown stops the Groonga server process.

shutdown uses graceful shutdown by default. If there are running commands, the Groonga
server process stops after those commands are finished. New command requests aren't
processed after the shutdown command is executed.

New in version 6.0.1: shutdown uses immediate shutdown when immediate is specified as the
mode parameter. The Groonga server process stops immediately even when there are running
commands.

NOTE:
You need to set /reference/command/request_id to all requests to use immediate
shutdown.

Syntax
This command takes only one optional parameter:

shutdown [mode=graceful]

Usage
shutdown uses graceful shutdown by default:

Execution example:

shutdown
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can specify graceful as the mode parameter explicitly:

Execution example:

shutdown --mode graceful
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can choose immediate shutdown by specifying immediate as the mode parameter:

Execution example:

shutdown --mode immediate
# [[0, 1337566253.89858, 0.000355720520019531], true]

Immediate shutdown is useful when you don't have time for graceful shutdown. For example,
on Windows shutdown, Windows kills services that take a long time to stop.
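As a hedged illustration of how this command is typically reached over the HTTP interface
(commands are exposed as /d/<command>; the host and the conventional default port 10041
are assumptions here), a minimal sketch that builds the request URL:

```python
from urllib.parse import urlencode

def shutdown_url(host="localhost", port=10041, mode="graceful"):
    # Builds the /d/shutdown request URL; pass mode="immediate"
    # to request an immediate shutdown instead. The host and the
    # default port are assumptions for this sketch.
    query = urlencode({"mode": mode})
    return f"http://{host}:{port}/d/shutdown?{query}"
```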

Parameters
This section describes parameters of this command.

Required parameters
There is no required parameter.

Optional parameters
There is one optional parameter.

mode
Specifies shutdown mode. Here are available shutdown modes:

┌──────────┬──────────────────────────────────┐
│Value │ Description │
├──────────┼──────────────────────────────────┤
graceful │ Stops after running commands are │
│ │ finished. │
│ │ │
│ │ This is the default. │
├──────────┼──────────────────────────────────┤
immediate │ New in version 6.0.1: Stops │
│ │ immediately even if there are │
│ │ some running commands. │
└──────────┴──────────────────────────────────┘

Return value
shutdown returns true as body when shutdown is accepted:

[HEADER, true]

If shutdown isn't accepted, error details are in HEADER.

See /reference/command/output_format for HEADER.

status
Summary
status returns the current status of the context that processes the request.

A context is a unit that processes requests. Normally, a context is created for each
thread.

Syntax
This command takes no parameters:

status

Usage
Here is a simple example:

Execution example:

status
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "uptime": 0,
# "max_command_version": 2,
# "start_time": 1441980651,
# "cache_hit_rate": 0.0,
# "version": "5.0.7-126-gb6fd7f7",
# "alloc_count": 206,
# "command_version": 1,
# "starttime": 1441980651,
# "default_command_version": 1,
# "n_queries": 0
# }
# ]

It returns the current status of the context that processes the request. See Return value
for details.

Parameters
This section describes all parameters.

Required parameters
There is no required parameter.

Optional parameters
There is no optional parameter.

Return value
The command returns the current status as an object:

[
HEADER,
{
"alloc_count": ALLOC_COUNT,
"cache_hit_rate": CACHE_HIT_RATE,
"command_version": COMMAND_VERSION,
"default_command_version": DEFAULT_COMMAND_VERSION,
"max_command_version": MAX_COMMAND_VERSION,
"n_queries": N_QUERIES,
"start_time": START_TIME,
"starttime": STARTTIME,
"uptime": UPTIME,
"version": VERSION
}
]

See /reference/command/output_format for HEADER.

Here are descriptions about values. See Usage for real values:

┌────────────────────────┬────────────────────────────────────┬────────────┐
│Key │ Description │ Example │
├────────────────────────┼────────────────────────────────────┼────────────┤
alloc_count │ The number of allocated │ 1400
│ │ memory blocks that │ │
│ │ aren't freed. If this │ │
│ │ value is continuously │ │
│ │ increased, there may be │ │
│ │ a memory leak. │ │
├────────────────────────┼────────────────────────────────────┼────────────┤
cache_hit_rate │ Percentage of cache used │ 29.4
│ │ responses in the Groonga │ │
│ │ process. If there are 10 │ │
│ │ requests and 7 responses │ │
│ │ are created from cache, │ │
│ │ cache_hit_rate is 70.0. │ │
│ │ The percentage is │ │
│ │ computed from only │ │
│ │ requests that use │ │
│ │ commands that support │ │
│ │ cache. │ │
│ │ │ │
│ │ Here are commands that │ │
│ │ support cache: │ │
│ │ │ │
│ │ · select │ │
│ │ │ │
│ │ · logical_select │ │
│ │ │ │
│ │ · logical_range_filter │ │
│ │ │ │
│ │ · logical_count │ │
├────────────────────────┼────────────────────────────────────┼────────────┤
command_version │ The │ 1
│ │ /reference/command/command_version │ │
│ │ that is used by the context. │ │
├────────────────────────┼────────────────────────────────────┼────────────┤
default_command_version │ The default │ 1
│ │ /reference/command/command_version │ │
│ │ of the Groonga process. │ │
├────────────────────────┼────────────────────────────────────┼────────────┤
max_command_version │ The max │ 2
│ │ /reference/command/command_version │ │
│ │ of the Groonga process. │ │
├────────────────────────┼────────────────────────────────────┼────────────┤
n_queries │ The number of requests processed │ 29
│ │ by the Groonga process. It counts │ │
│ │ only requests that use commands │ │
│ │ that support cache. │ │
│ │ │ │
│ │ Here are commands that support │ │
│ │ cache: │ │
│ │ │ │
│ │ · select │ │
│ │ │ │
│ │ · logical_select │ │
│ │ │ │
│ │ · logical_range_filter │ │
│ │ │ │
│ │ · logical_count │ │
├────────────────────────┼────────────────────────────────────┼────────────┤
start_time │ New in version 5.0.8. │ 1441761403
│ │ │ │
│ │ │ │
│ │ The time that the Groonga process │ │
│ │ started in UNIX time. │ │
├────────────────────────┼────────────────────────────────────┼────────────┤
starttime │ Deprecated since version 5.0.8: │ 1441761403
│ │ Use start_time instead. │ │
├────────────────────────┼────────────────────────────────────┼────────────┤
uptime │ The elapsed time since the Groonga │ 216639
│ │ process started in second. │ │
│ │ │ │
│ │ For example, 216639 means that 2.5 │ │
│ │ (= 216639 / 60 / 60 / 24 = 2.507) │ │
│ │ days. │ │
├────────────────────────┼────────────────────────────────────┼────────────┤
version │ The version of the Groonga │ 5.0.7
│ │ process. │ │
└────────────────────────┴────────────────────────────────────┴────────────┘
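As a quick check of the arithmetic in the table above, a minimal sketch that derives days
of uptime and the number of cached responses from a status body (the sample values mirror
the descriptions above):

```python
# Derive human-friendly values from a status response body.
# The sample values mirror the examples in the table above.
status = {"uptime": 216639, "n_queries": 10, "cache_hit_rate": 70.0}

uptime_days = status["uptime"] / 60 / 60 / 24   # seconds -> days

# cache_hit_rate is already a percentage: with 10 requests and
# 7 responses served from cache, it is 7 / 10 * 100 = 70.0.
cached_responses = status["n_queries"] * status["cache_hit_rate"] / 100
```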

suggest
NOTE:
The suggest feature specification isn't stable. The specification may be changed.

Summary
suggest - returns completion, correction and/or suggestion for a query.

The suggest command returns completion, correction and/or suggestion for a specified
query.

See /reference/suggest/introduction about completion, correction and suggestion.

Syntax
suggest types table column query [sortby [output_columns [offset [limit [frequency_threshold [conditional_probability_threshold [prefix_search]]]]]]]

Usage
Here are learned data for completion.

Execution example:

load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)'
[
{"sequence": "1", "time": 1312950803.86057, "item": "e"},
{"sequence": "1", "time": 1312950803.96857, "item": "en"},
{"sequence": "1", "time": 1312950804.26057, "item": "eng"},
{"sequence": "1", "time": 1312950804.56057, "item": "engi"},
{"sequence": "1", "time": 1312950804.76057, "item": "engin"},
{"sequence": "1", "time": 1312950805.86057, "item": "engine", "type": "submit"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 6]

Here are learned data for correction.

Execution example:

load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)'
[
{"sequence": "2", "time": 1312950803.86057, "item": "s"},
{"sequence": "2", "time": 1312950803.96857, "item": "sa"},
{"sequence": "2", "time": 1312950804.26057, "item": "sae"},
{"sequence": "2", "time": 1312950804.56057, "item": "saer"},
{"sequence": "2", "time": 1312950804.76057, "item": "saerc"},
{"sequence": "2", "time": 1312950805.76057, "item": "saerch", "type": "submit"},
{"sequence": "2", "time": 1312950809.76057, "item": "serch"},
{"sequence": "2", "time": 1312950810.86057, "item": "search", "type": "submit"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 8]

Here are learned data for suggestion.

Execution example:

load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)'
[
{"sequence": "3", "time": 1312950803.86057, "item": "search engine", "type": "submit"},
{"sequence": "3", "time": 1312950808.86057, "item": "web search realtime", "type": "submit"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]

Here is a completion example.

Execution example:

suggest --table item_query --column kana --types complete --frequency_threshold 1 --query en
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "complete": [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "engine",
# 1
# ]
# ]
# }
# ]

Here is a correction example.

Execution example:

suggest --table item_query --column kana --types correct --frequency_threshold 1 --query saerch
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "correct": [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "search",
# 1
# ]
# ]
# }
# ]

Here is a suggestion example.

Execution example:

suggest --table item_query --column kana --types suggest --frequency_threshold 1 --query search
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "suggest": [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "search engine",
# 1
# ],
# [
# "web search realtime",
# 1
# ]
# ]
# }
# ]

Here is a mixed example.

Execution example:

suggest --table item_query --column kana --types complete|correct|suggest --frequency_threshold 1 --query search
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "suggest": [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "search engine",
# 1
# ],
# [
# "web search realtime",
# 1
# ]
# ],
# "complete": [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "search",
# 2
# ],
# [
# "search engine",
# 2
# ]
# ],
# "correct": [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "search",
# 2
# ]
# ]
# }
# ]

Parameters
types Specifies what types are returned by the suggest command.

Here are available types:

complete
The suggest command does completion.

correct
The suggest command does correction.

suggest
The suggest command does suggestion.

You can specify one or more types separated by |. Here are examples:
It returns correction:

correct

It returns correction and suggestion:

correct|suggest

It returns completion, correction and suggestion:

complete|correct|suggest

table Specifies a table name that has the item_${DATA_SET_NAME} format. For example,
item_query is the table name if you created a dataset by the following command:

groonga-suggest-create-dataset /tmp/db-path query

column Specifies a column name that stores furigana in Katakana in the table specified by table.

query Specifies query for completion, correction and/or suggestion.

sortby Specifies sort key.

Default:
-_score

output_columns
Specifies output columns.

Default:
_key,_score

offset Specifies returned records offset.

Default:
0

limit Specifies the number of returned records.

Default:
10

frequency_threshold
Specifies threshold for item frequency. Returned records must have _score that is
greater than or equal to frequency_threshold.

Default:
100

conditional_probability_threshold
Specifies the threshold for conditional probability. Conditional probability is computed
from the learned data. It is the probability that a query is submitted, given that the
query occurred. Returned records must have a conditional probability that is greater than
or equal to conditional_probability_threshold.

Default:
0.2
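For intuition, the conditional probability can be thought of as the ratio of submissions
to occurrences of an item in the learned data. A minimal sketch of the threshold check
(the counts are hypothetical):

```python
def passes_threshold(n_submits, n_occurrences, threshold=0.2):
    # Conditional probability: how often the item led to a query
    # submission, relative to how often it occurred at all.
    probability = n_submits / n_occurrences
    return probability >= threshold

# An item seen 10 times and submitted 3 times passes the default 0.2;
# one submitted only once out of 10 does not.
```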

prefix_search
Specifies whether optional prefix search is used or not in completion.

Here are available values:

yes Prefix search is always used.

no Prefix search is never used.

auto Prefix search is used only when other search can't find any records.

Default:
auto

similar_search
Specifies whether optional similar search is used or not in correction.

Here are available values:

yes Similar search is always used.

no Similar search is never used.

auto Similar search is used only when other search can't find any records.

Default:
auto

Return value
Here is a returned JSON format:

{"type1": [["candidate1", score of candidate1],
["candidate2", score of candidate2],
...],
"type2": [["candidate1", score of candidate1],
["candidate2", score of candidate2],
...],
...}

type
A type specified by types.

candidate
A candidate for completion, correction or suggestion.

score of candidate
A score of the corresponding candidate. A candidate with a higher score is a more likely
candidate for completion, correction or suggestion. Returned candidates are sorted by
score of candidate in descending order by default.
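A minimal sketch that picks the top candidate per type from the returned object (the
sample data mirrors the mixed example above):

```python
# Pick the highest-scored candidate for each returned type.
# Candidates are already sorted by score descending by default,
# but selecting by max keeps the sketch robust.
suggest_result = {
    "complete": [["search", 2], ["search engine", 2]],
    "correct": [["search", 2]],
    "suggest": [["search engine", 1], ["web search realtime", 1]],
}

best = {
    type_: max(candidates, key=lambda pair: pair[1])[0]
    for type_, candidates in suggest_result.items()
}
```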

See also
· /reference/suggest

· /reference/executables/groonga-suggest-create-dataset

table_create
Summary
table_create creates a new table in the current database. You need to create one or more
tables to store and search data.

Syntax
This command takes many parameters.

The only required parameter is name; the others are optional:

table_create name
[flags=TABLE_HASH_KEY]
[key_type=null]
[value_type=null]
[default_tokenizer=null]
[normalizer=null]
[token_filters=null]

Usage
The table_create command creates a new persistent table. See /reference/tables for table
details.

Create data store table
You can use all table types for data store table. See /reference/tables for all table
types.

Table type is specified as TABLE_${TYPE} to flags parameter.

Here is an example to create TABLE_NO_KEY table:

Execution example:

table_create Logs TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]

The table_create command creates a table that is named Logs and is TABLE_NO_KEY type.

If your records aren't searched by key, a TABLE_NO_KEY type table is suitable, because
TABLE_NO_KEY doesn't support keys but is fast and compact. Storing logs into a Groonga
database is such a case.

If your records are searched by key or referenced by one or more columns, TABLE_NO_KEY
type isn't suitable. A lexicon for fulltext search is such a case.

Create large data store table
If you want to store many large keys, your table may not be able to store them: if the
total key data is larger than 4GiB, you can't store all key data in your table by default.

You can expand the maximum total key size from 4GiB to 1TiB with the KEY_LARGE flag. The
KEY_LARGE flag can be used only with TABLE_HASH_KEY. You can't use the KEY_LARGE flag with
TABLE_NO_KEY, TABLE_PAT_KEY or TABLE_DAT_KEY.

Here is an example to create a table that can store many large keys:

Execution example:

table_create Paths TABLE_HASH_KEY|KEY_LARGE ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

The table_create command creates a table that is named Paths and is TABLE_HASH_KEY type.
The Paths table can store many large keys.

Create lexicon table
You can use all table types except TABLE_NO_KEY for a lexicon table. A lexicon table needs
key support, but TABLE_NO_KEY doesn't support keys.

Here is an example to create TABLE_PAT_KEY table:

Execution example:

table_create Lexicon TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]

The table_create command creates the following table:

· The table is named Lexicon.

· The table is TABLE_PAT_KEY type table.

· The table's key is ShortText type.

· The table uses TokenBigram tokenizer to extract tokens from a normalized text.

· The table uses NormalizerAuto normalizer to normalize a text.

TABLE_PAT_KEY is a suitable table type for a lexicon table. A lexicon table is used for
fulltext search.

In fulltext search, predictive search may be used for fuzzy search. Predictive search is
supported by TABLE_PAT_KEY and TABLE_DAT_KEY.

A lexicon table has many keys because a fulltext search target text has many tokens. A
table that has many keys should keep its size in check, because a large table requires a
lot of memory; requiring a lot of memory causes disk I/O, which blocks fast search. So
table size is important for a table that has many keys, and TABLE_PAT_KEY is smaller than
TABLE_DAT_KEY.

For these reasons, TABLE_PAT_KEY is a suitable table type for a lexicon table.

Create tag index table
You can use all table types except TABLE_NO_KEY for a tag index table. A tag index table
needs key support, but TABLE_NO_KEY doesn't support keys.

Here is an example to create TABLE_HASH_KEY table:

Execution example:

table_create Tags TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

The table_create command creates a table that is named Tags, is TABLE_HASH_KEY type and
has ShortText type key.

TABLE_HASH_KEY or TABLE_DAT_KEY are suitable table types for tag index table.

If you need only the exact match tag search feature, TABLE_HASH_KEY is suitable. It is the
common case.

If you also need the predictive tag search feature (for example, finding "groonga" by the
keyword "gr"), TABLE_DAT_KEY is suitable. TABLE_DAT_KEY requires more space, but that is
not important because the number of tags will not be large.

Create range index table
You can use the TABLE_PAT_KEY and TABLE_DAT_KEY table types for a range index table. A
range index table needs range search support, but TABLE_NO_KEY and TABLE_HASH_KEY don't
support it.

Here is an example to create TABLE_DAT_KEY table:

Execution example:

table_create Ages TABLE_DAT_KEY UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]

The table_create command creates a table that is named Ages, is TABLE_DAT_KEY type and has
UInt32 type key.

TABLE_PAT_KEY and TABLE_DAT_KEY are suitable table types for range index table.

If you don't have many indexed items, TABLE_DAT_KEY is suitable. The index for ages in the
above example is such a case: it will have only about 0-100 items, because humans don't
live much longer than that.

If you have many indexed items, TABLE_PAT_KEY is suitable, because TABLE_PAT_KEY is
smaller than TABLE_DAT_KEY.

Parameters
This section describes all parameters.

name
Specifies a table name to be created. name must be specified.

Here are available characters:

· 0 .. 9 (digit)

· a .. z (alphabet, lower case)

· A .. Z (alphabet, upper case)

· # (hash)

· @ (at mark)

· - (hyphen)

· _ (underscore) (NOTE: Underscore can't be used as the first character.)

You need to create a name with one or more of the above characters. Note that you cannot
use _ as the first character, such as _name.
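The naming rules above can be condensed into a single regular expression. A minimal
sketch of a client-side validator (this is an illustration, not Groonga's own
implementation):

```python
import re

# First character: digit, letter, '#', '@' or '-' (no underscore);
# remaining characters may also include '_'.
TABLE_NAME_RE = re.compile(r"\A[0-9A-Za-z#@-][0-9A-Za-z#@_-]*\Z")

def is_valid_table_name(name):
    return bool(TABLE_NAME_RE.match(name))
```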

flags
Specifies a table type and table customize options.

Here are available flags:

┌───────────────┬──────────────────────────────────┐
│Flag │ Description │
├───────────────┼──────────────────────────────────┤
TABLE_NO_KEY │ Array table. See also │
│ │ table-no-key. │
├───────────────┼──────────────────────────────────┤
TABLE_HASH_KEY │ Hash table. See also │
│ │ table-hash-key. │
├───────────────┼──────────────────────────────────┤
TABLE_PAT_KEY │ Patricia trie. See also │
│ │ table-pat-key. │
├───────────────┼──────────────────────────────────┤
TABLE_DAT_KEY │ Double array trie. See also │
│ │ table-dat-key. │
├───────────────┼──────────────────────────────────┤
KEY_WITH_SIS │ Enable Semi Infinite String. │
│ │ Require TABLE_PAT_KEY. │
├───────────────┼──────────────────────────────────┤
KEY_LARGE │ Expand the maximum total key │
│ │ size to 1TiB from 4GiB. Require │
│ │ TABLE_HASH_KEY. │
└───────────────┴──────────────────────────────────┘

NOTE:
Since Groonga 2.1.0, the KEY_NORMALIZE flag is deprecated. Use the normalizer option
with NormalizerAuto instead.

You must specify one of TABLE_${TYPE} flags. You cannot specify two or more TABLE_${TYPE}
flags. For example, TABLE_NO_KEY|TABLE_HASH_KEY is invalid.

You can combine flags with | (vertical bar) such as TABLE_PAT_KEY|KEY_WITH_SIS.

See /reference/tables for difference between table types.

The default flags are TABLE_HASH_KEY.
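The combination rule can be checked mechanically: split on | and require exactly one
TABLE_${TYPE} flag. A minimal client-side sketch (an illustration, not Groonga's own
validation):

```python
def validate_flags(flags):
    # Exactly one TABLE_${TYPE} flag is required; other flags such
    # as KEY_WITH_SIS or KEY_LARGE may be combined with '|'.
    parts = flags.split("|")
    table_flags = [p for p in parts if p.startswith("TABLE_")]
    return len(table_flags) == 1
```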

key_type
Specifies key type.

If you specify TABLE_HASH_KEY, TABLE_PAT_KEY or TABLE_DAT_KEY as the flags parameter, you
need to specify the key_type option.

See /reference/types for all types.

The default value is none.

value_type
Specifies value type.

You can use a value when you specify TABLE_NO_KEY, TABLE_HASH_KEY or TABLE_PAT_KEY as the
flags parameter. The value type must be a fixed size type. For example, UInt32 can be used
but ShortText cannot. Use columns instead of the value for variable size data.

The default value is none.

default_tokenizer
Specifies the default tokenizer that is used on searching and data loading.

You must specify default_tokenizer for a table that is used as a lexicon of a fulltext
search index. See /reference/tokenizers for available tokenizers. You must choose a
tokenizer from the list for fulltext search.

You don't need to specify default_tokenizer in the following cases:

· You don't use the table as a lexicon.

· You use the table as a lexicon but you don't need fulltext search. For example:

· Index target data isn't text data such as Int32 and Time.

· You just need exact match search, prefix search and so on.

You can't use default_tokenizer with the TABLE_NO_KEY flag, because a table that uses the
TABLE_NO_KEY flag can't be used as a lexicon.

You must specify TABLE_HASH_KEY, TABLE_PAT_KEY or TABLE_DAT_KEY as flags when you want to
use the table as a lexicon.

The default value is none.

normalizer
Specifies a normalizer that is used to normalize key.

You cannot use normalizer with TABLE_NO_KEY because TABLE_NO_KEY doesn't support key.

See /reference/normalizers for all normalizers.

The default value is none.

token_filters
Specifies token filters that are used to process tokenized tokens.

You cannot use token_filters with TABLE_NO_KEY because TABLE_NO_KEY doesn't support key.

See /reference/token_filters for all token filters.

The default value is none.

Return value
table_create returns true as body on success such as:

[HEADER, true]

If table_create fails, error details are in HEADER.

See /reference/command/output_format for HEADER.

See also
· /reference/tables

· /reference/commands/column_create

· /reference/tokenizers

· /reference/normalizers

· /reference/command/output_format

table_list
Summary
table_list - lists the tables defined in the database

This section describes table_list, one of the built-in Groonga commands. Built-in commands
are executed by passing them as arguments to the groonga executable, via standard input,
or by sending a request to a groonga server over a socket.

table_list lists the tables defined in the database.

Syntax
table_list

Usage
Execution example:

table_list
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# "id",
# "UInt32"
# ],
# [
# "name",
# "ShortText"
# ],
# [
# "path",
# "ShortText"
# ],
# [
# "flags",
# "ShortText"
# ],
# [
# "domain",
# "ShortText"
# ],
# [
# "range",
# "ShortText"
# ],
# [
# "default_tokenizer",
# "ShortText"
# ],
# [
# "normalizer",
# "ShortText"
# ]
# ],
# [
# 259,
# "Ages",
# "/tmp/groonga-databases/commands_table_create.0000103",
# "TABLE_DAT_KEY|PERSISTENT",
# "UInt32",
# null,
# null,
# null
# ],
# [
# 257,
# "Lexicon",
# "/tmp/groonga-databases/commands_table_create.0000101",
# "TABLE_PAT_KEY|PERSISTENT",
# "ShortText",
# null,
# "TokenBigram",
# "NormalizerAuto"
# ],
# [
# 256,
# "Logs",
# "/tmp/groonga-databases/commands_table_create.0000100",
# "TABLE_NO_KEY|PERSISTENT",
# null,
# null,
# null,
# null
# ],
# [
# 258,
# "Tags",
# "/tmp/groonga-databases/commands_table_create.0000102",
# "TABLE_HASH_KEY|PERSISTENT",
# "ShortText",
# null,
# null,
# null
# ]
# ]
# ]
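The returned body pairs a header of [name, type] descriptions with one row per table. A
minimal sketch (using a trimmed, hypothetical body) that turns rows into dictionaries:

```python
# Pair the table_list header with each table row.
# The sample body is a trimmed version of the output above.
body = [
    [["id", "UInt32"], ["name", "ShortText"], ["flags", "ShortText"]],
    [256, "Logs", "TABLE_NO_KEY|PERSISTENT"],
    [258, "Tags", "TABLE_HASH_KEY|PERSISTENT"],
]

header = [name for name, _type in body[0]]
tables = [dict(zip(header, row)) for row in body[1:]]
```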

Parameters
None.

Return value
The list of tables is returned in the following format:

[[[TABLE_INFO_NAME_1, TABLE_INFO_TYPE_1], ...], TABLE_INFO_1, ...]

TABLE_INFO_NAME_n
Each table information entry (TABLE_INFO_n) contains multiple values; these names
describe what each value means. The names are as follows:

id
The ID assigned to the table object.

name
The table name.

path
The name of the file that stores the table's records.

flags
The flags attribute of the table.

domain
The type of the primary key values.

range
The type to which the values belong.

TABLE_INFO_TYPE_n
The type of the corresponding table information value.

TABLE_INFO_n
An array of the values described by TABLE_INFO_NAME_n. The values appear in the same
order as TABLE_INFO_NAME_n.

table_remove
Summary
table_remove removes a table and its columns. If there are one or more indexes against the
key of the table and its columns, they are also removed.

New in version 6.0.1: You can also remove tables and columns that reference the target
table by using dependent parameter.

Syntax
This command takes two parameters:

table_remove name
[dependent=no]

Usage
You just specify the name of the table that you want to remove. table_remove removes the
table and its columns. If the table and its columns are indexed, all index columns for
them are also removed.

This section describes the following:

· Basic usage

· Unremovable cases

· Removes a table with tables and columns that reference the target table

· Decreases used resources

Basic usage
Let's think about the following case:

· There is one table Entries.

· Entries table has some columns.

· Entries table's key is indexed.

· A column of Entries is indexed.

Here are commands that create Entries table:

Execution example:

table_create Entries TABLE_HASH_KEY UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries title COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here are commands that create an index for Entries table's key:

Execution example:

table_create EntryKeys TABLE_HASH_KEY UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create EntryKeys key_index COLUMN_INDEX Entries _key
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here are commands that create an index for Entries table's column:

Execution example:

table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms content_index COLUMN_INDEX Entries content
# [[0, 1337566253.89858, 0.000355720520019531], true]

Let's confirm the current schema before running table_remove:

Execution example:

dump
# table_create Entries TABLE_HASH_KEY UInt32
# column_create Entries content COLUMN_SCALAR Text
# column_create Entries title COLUMN_SCALAR ShortText
#
# table_create EntryKeys TABLE_HASH_KEY UInt32
#
# table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
#
# column_create EntryKeys key_index COLUMN_INDEX Entries _key
# column_create Terms content_index COLUMN_INDEX Entries content

If you remove Entries table, the following tables and columns are removed:

· Entries

· Entries.title

· Entries.content

· EntryKeys.key_index

· Terms.content_index

The following tables (lexicons) aren't removed:

· EntryKeys

· Terms

Let's run table_remove:

Execution example:

table_remove Entries
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is schema after table_remove. Only EntryKeys and Terms exist:

Execution example:

dump
# table_create EntryKeys TABLE_HASH_KEY UInt32
#
# table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto

Unremovable cases
There are some unremovable cases:

· One or more tables use the table as key type.

· One or more columns use the table as value type.

Both cases block dangling references. If the table were removed while referenced as a
type, the tables and columns that refer to it would be broken.

If the target table satisfies one of these conditions, table_remove fails. The target
table and its columns aren't removed.

Here is an example of the case where the table is used as a key type.

The following commands create a table to be removed and a table that uses the table to be
removed as key type:

Execution example:

table_create ReferencedByTable TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create ReferenceTable TABLE_HASH_KEY ReferencedByTable
# [[0, 1337566253.89858, 0.000355720520019531], true]

table_remove against ReferencedByTable fails:

Execution example:

table_remove ReferencedByTable
# [
# [
# -2,
# 1337566253.89858,
# 0.000355720520019531,
# "[table][remove] a table that references the table exists: <ReferenceTable._key> -> <ReferencedByTable>",
# [
# [
# "is_removable_table",
# "db.c",
# 8831
# ]
# ]
# ],
# false
# ]

You need to remove ReferenceTable before you remove ReferencedByTable:

Execution example:

table_remove ReferenceTable
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_remove ReferencedByTable
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an example of the case where the table is used as a value type.

The following commands create a table to be removed and a column that uses the table to be
removed as value type:

Execution example:

table_create ReferencedByColumn TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Table TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Table reference_column COLUMN_SCALAR ReferencedByColumn
# [[0, 1337566253.89858, 0.000355720520019531], true]

table_remove against ReferencedByColumn fails:

Execution example:

table_remove ReferencedByColumn
# [
# [
# -2,
# 1337566253.89858,
# 0.000355720520019531,
# "[table][remove] a column that references the table exists: <Table.reference_column> -> <ReferencedByColumn>",
# [
# [
# "is_removable_table",
# "db.c",
# 8851
# ]
# ]
# ],
# false
# ]

You need to remove Table.reference_column before you remove ReferencedByColumn:

Execution example:

column_remove Table reference_column
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_remove ReferencedByColumn
# [[0, 1337566253.89858, 0.000355720520019531], true]

Removes a table with tables and columns that reference the target table
New in version 6.0.1.

If you understand what you'll do, you can also remove tables and columns that reference
the target table with one table_remove command by using the --dependent yes parameter.

ReferencedTable in the following schema is referenced from a table and a column:

Execution example:

table_create ReferencedTable TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Table1 TABLE_HASH_KEY ReferencedTable
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Table2 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Table2 reference_column COLUMN_SCALAR ReferencedTable
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can't remove ReferencedTable by default:

Execution example:

table_remove ReferencedTable
# [
# [
# -2,
# 1337566253.89858,
# 0.000355720520019531,
# "[table][remove] a table that references the table exists: <Table1._key> -> <ReferencedTable>",
# [
# [
# "is_removable_table",
# "db.c",
# 8831
# ]
# ]
# ],
# false
# ]

You can remove ReferencedTable, Table1 and Table2.reference_column by using --dependent
yes parameter. Table1 and Table2.reference_column reference ReferencedTable:

Execution example:

table_remove ReferencedTable --dependent yes
# [[0, 1337566253.89858, 0.000355720520019531], true]

Decreases used resources
table_remove opens all tables and columns in the database to check the Unremovable cases.

If you have many tables and columns, table_remove may use many resources. There is a
workaround to avoid the case.

table_remove closes temporarily opened tables and columns used for the check when the max
number of threads is 1.

You can confirm and change the current max number of threads by thread_limit.

The feature is used in the following case:

Execution example:

table_create Entries TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
thread_limit 1
# [[0, 1337566253.89858, 0.000355720520019531], 1]
table_remove Entries
# [[0, 1337566253.89858, 0.000355720520019531], true]

The feature isn't used in the following case:

Execution example:

table_create Entries TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
thread_limit 2
# [[0, 1337566253.89858, 0.000355720520019531], 1]
table_remove Entries
# [[0, 1337566253.89858, 0.000355720520019531], true]

Parameters
This section describes all parameters.

Required parameters
There is only one required parameter.

name
Specifies the table name to be removed.

See Usage for how to use this parameter.

Optional parameters
There is only one optional parameter.

dependent
New in version 6.0.1.

Specifies whether tables and columns that reference the target table are also removed or
not.

If this value is yes, tables and columns that reference the target table are also removed.
Otherwise, they aren't removed and an error is returned.

In other words, if there are any tables and columns that reference the target table, the
target table isn't removed by default.

You should use this parameter carefully. This is a dangerous parameter.

See Removes a table with tables and columns that reference the target table for how to
use this parameter.

Return value
The command returns true as the body on success, such as:

[HEADER, true]

If the command fails, error details are in HEADER.

See /reference/command/output_format for HEADER.
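A client can tell success from failure by looking at HEADER[0], the return code (0 on success, negative on error). A minimal sketch, assuming the JSON output format shown above:

```python
import json

# A groonga command response is [HEADER, BODY]; HEADER[0] is the return
# code: 0 on success, negative on error.
def is_success(response_text):
    header = json.loads(response_text)[0]
    return header[0] == 0

print(is_success('[[0, 1337566253.89858, 0.0003], true]'))             # True
print(is_success('[[-2, 1337566253.89858, 0.0003, "error"], false]'))  # False
```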

table_rename
Summary
table_rename command renames a table.

It is a light operation. It just changes the relationship between the name and the table
object. It doesn't copy the table or its column values.

It is a dangerous operation. You must stop all operations, including read operations,
while you run table_rename. If the following sequence occurs, the Groonga process may
crash:

· An operation (like select) that accesses the table to be renamed by the
current table name starts. The current table name is called the old table
name below, because the table is about to be renamed.

· table_rename runs. The select is still running.

· The select accesses the table to be renamed by the old table name. But the
select can't find the table by the old name because the table has been
renamed to the new table name. This may crash the Groonga process.

Syntax
This command takes two parameters.

All parameters are required:

table_rename name new_name

Usage
Here is a simple example of table_rename command.

Execution example:

table_create Users TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users score COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Users
[
{"_key": "Alice", "score": 2},
{"_key": "Bob", "score": 0},
{"_key": "Carlos", "score": -1}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
table_rename Users Players
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_list
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# "id",
# "UInt32"
# ],
# [
# "name",
# "ShortText"
# ],
# [
# "path",
# "ShortText"
# ],
# [
# "flags",
# "ShortText"
# ],
# [
# "domain",
# "ShortText"
# ],
# [
# "range",
# "ShortText"
# ],
# [
# "default_tokenizer",
# "ShortText"
# ],
# [
# "normalizer",
# "ShortText"
# ]
# ],
# [
# 256,
# "Players",
# "/tmp/groonga-databases/commands_table_rename.0000100",
# "TABLE_PAT_KEY|PERSISTENT",
# "ShortText",
# null,
# null,
# null
# ]
# ]
# ]
select Players
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "score",
# "Int32"
# ]
# ],
# [
# 1,
# "Alice",
# 2
# ],
# [
# 2,
# "Bob",
# 0
# ],
# [
# 3,
# "Carlos",
# -1
# ]
# ]
# ]
# ]

Parameters
This section describes parameters of table_rename.

Required parameters
All parameters are required.

name
Specifies the table name to be renamed.

new_name
Specifies the new table name.

Return value
The command returns true as the body on success, such as:

[HEADER, true]

If the command fails, error details are in HEADER.

See /reference/command/output_format for HEADER.

table_tokenize
Summary
table_tokenize command tokenizes text by the specified table's tokenizer.

Syntax
This command takes many parameters.

table and string are required parameters. Others are optional:

table_tokenize table
string
[flags=NONE]
[mode=GET]

Usage
Here is a simple example.

Execution example:

register token_filters/stop_word
# [[0,0.0,0.0],true]
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto \
--token_filters TokenFilterStopWord
# [[0,0.0,0.0],true]
column_create Terms is_stop_word COLUMN_SCALAR Bool
# [[0,0.0,0.0],true]
load --table Terms
[
{"_key": "and", "is_stop_word": true}
]
# [[0,0.0,0.0],1]
table_tokenize Terms "Hello and Good-bye" --mode GET
# [
# [
# 0,
# 0.0,
# 0.0
# ],
# [
# {
# "value": "hello",
# "position": 0
# },
# {
# "value": "good",
# "position": 2
# },
# {
# "value": "-",
# "position": 3
# },
# {
# "value": "bye",
# "position": 4
# }
# ]
# ]

The Terms table uses the TokenBigram tokenizer, the NormalizerAuto normalizer and the
TokenFilterStopWord token filter. The command returns tokens generated by tokenizing
"Hello and Good-bye" with the TokenBigram tokenizer. The tokens are normalized by the
NormalizerAuto normalizer and the and token is removed by the TokenFilterStopWord token
filter.
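The tokenize → normalize → filter pipeline can be mimicked in a few lines. This is only an illustrative sketch of the behaviour, not the groonga implementation; the regex stands in roughly for TokenBigram's symbol splitting on this input, and lower-casing stands in for NormalizerAuto.

```python
import re

# Hypothetical stop-word set, mirroring the Terms table loaded above.
STOP_WORDS = {"and"}

def tokenize_pipeline(text):
    normalized = text.lower()                            # ~ NormalizerAuto
    tokens = re.findall(r"[a-z0-9]+|[^a-z0-9\s]", normalized)
    return [t for t in tokens if t not in STOP_WORDS]    # ~ TokenFilterStopWord

print(tokenize_pipeline("Hello and Good-bye"))  # ['hello', 'good', '-', 'bye']
```

Note how the result matches the token values in the execution example above.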

Parameters
This section describes all parameters. Parameters are categorized.

Required parameters
There are required parameters, table and string.

table
Specifies the lexicon table. table_tokenize command uses the tokenizer, the normalizer
and the token filters that are set to the lexicon table.

string
Specifies any string which you want to tokenize.

See the tokenize-string option in /reference/commands/tokenize for details.

Optional parameters
There are optional parameters.

flags
Specifies tokenization customization options. You can specify multiple options separated
by "|".

The default value is NONE.

See the tokenize-flags option in /reference/commands/tokenize for details.

mode
Specifies a tokenize mode.

The default value is GET.

See the tokenize-mode option in /reference/commands/tokenize for details.

Return value
table_tokenize command returns tokenized tokens.

See the tokenize-return-value section in /reference/commands/tokenize for details.

See also
· /reference/tokenizers

· /reference/commands/tokenize

thread_limit
Summary
New in version 5.0.7.

thread_limit has the following two features:

· It returns the max number of threads.

· It sets the max number of threads.

/reference/executables/groonga is the only Groonga server that supports full thread_limit
features.

/reference/executables/groonga-httpd supports only the feature that returns the max
number of threads. It always returns 1 because /reference/executables/groonga-httpd uses
a single thread model.

If you're using Groonga as a library, thread_limit doesn't work unless you set custom
functions with grn_thread_set_get_limit_func() and grn_thread_set_set_limit_func(). If
you set a function with grn_thread_set_get_limit_func(), the feature that returns the max
number of threads works. If you set a function with grn_thread_set_set_limit_func(), the
feature that sets the max number of threads works.

Syntax
This command takes only one optional parameter:

thread_limit [max=null]

Usage
You can get the max number of threads by calling thread_limit without any parameters:

Execution example:

thread_limit
# [[0, 1337566253.89858, 0.000355720520019531], 2]

If it returns 0, your Groonga server doesn't support the feature.

You can set the max number of threads by passing the max parameter:

Execution example:

thread_limit --max 4
# [[0, 1337566253.89858, 0.000355720520019531], 2]

It returns the previous max number of threads when you pass the max parameter.

Parameters
This section describes all parameters.

Required parameters
There is no required parameter.

Optional parameters
There is one optional parameter.

max
Specifies the new max number of threads.

You must specify a positive integer:

Execution example:

thread_limit --max 3
# [[0, 1337566253.89858, 0.000355720520019531], 4]

If you specify the max parameter, thread_limit returns the max number of threads before
max is applied.

Return value
The command returns the max number of threads as body:

[HEADER, N_MAX_THREADS]

If max is specified, N_MAX_THREADS is the max number of threads before max is applied.

See /reference/command/output_format for HEADER.

tokenize
Summary
tokenize command tokenizes text by the specified tokenizer. It is useful to debug
tokenization.

Syntax
This command takes many parameters.

tokenizer and string are required parameters. Others are optional:

tokenize tokenizer
string
[normalizer=null]
[flags=NONE]
[mode=ADD]
[token_filters=NONE]

Usage
Here is a simple example.

Execution example:

tokenize TokenBigram "Fulltext Search"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "Fu"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "ul"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "ll"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "lt"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "te"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "ex"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "xt"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "t "
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": " S"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "Se"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "ea"
# },
# {
# "position": 11,
# "force_prefix": false,
# "value": "ar"
# },
# {
# "position": 12,
# "force_prefix": false,
# "value": "rc"
# },
# {
# "position": 13,
# "force_prefix": false,
# "value": "ch"
# },
# {
# "position": 14,
# "force_prefix": false,
# "value": "h"
# }
# ]
# ]

It has only the required parameters: tokenizer is TokenBigram and string is "Fulltext
Search". It returns tokens generated by tokenizing "Fulltext Search" with the TokenBigram
tokenizer. It doesn't normalize "Fulltext Search".

Parameters
This section describes all parameters. Parameters are categorized.

Required parameters
There are required parameters, tokenizer and string.

tokenizer
Specifies the tokenizer name. tokenize command uses the tokenizer that is named tokenizer.

See /reference/tokenizers about built-in tokenizers.

Here is an example that uses the built-in TokenTrigram tokenizer.

Execution example:

tokenize TokenTrigram "Fulltext Search"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "Ful"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "ull"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "llt"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "lte"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "tex"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "ext"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "xt "
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "t S"
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": " Se"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "Sea"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "ear"
# },
# {
# "position": 11,
# "force_prefix": false,
# "value": "arc"
# },
# {
# "position": 12,
# "force_prefix": false,
# "value": "rch"
# },
# {
# "position": 13,
# "force_prefix": false,
# "value": "ch"
# },
# {
# "position": 14,
# "force_prefix": false,
# "value": "h"
# }
# ]
# ]

If you want to use other tokenizers, you need to register an additional tokenizer plugin
with the register command. For example, you can use a KyTea based tokenizer by
registering tokenizers/kytea.

string
Specifies any string which you want to tokenize.

If you want to include spaces in string, you need to quote the string with single quotes
(') or double quotes (").

Here is an example that uses spaces in string.

Execution example:

tokenize TokenBigram "Groonga is a fast fulltext earch engine!"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "Gr"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "ro"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "oo"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "on"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "ng"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "ga"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "a "
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": " i"
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": "is"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "s "
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": " a"
# },
# {
# "position": 11,
# "force_prefix": false,
# "value": "a "
# },
# {
# "position": 12,
# "force_prefix": false,
# "value": " f"
# },
# {
# "position": 13,
# "force_prefix": false,
# "value": "fa"
# },
# {
# "position": 14,
# "force_prefix": false,
# "value": "as"
# },
# {
# "position": 15,
# "force_prefix": false,
# "value": "st"
# },
# {
# "position": 16,
# "force_prefix": false,
# "value": "t "
# },
# {
# "position": 17,
# "force_prefix": false,
# "value": " f"
# },
# {
# "position": 18,
# "force_prefix": false,
# "value": "fu"
# },
# {
# "position": 19,
# "force_prefix": false,
# "value": "ul"
# },
# {
# "position": 20,
# "force_prefix": false,
# "value": "ll"
# },
# {
# "position": 21,
# "force_prefix": false,
# "value": "lt"
# },
# {
# "position": 22,
# "force_prefix": false,
# "value": "te"
# },
# {
# "position": 23,
# "force_prefix": false,
# "value": "ex"
# },
# {
# "position": 24,
# "force_prefix": false,
# "value": "xt"
# },
# {
# "position": 25,
# "force_prefix": false,
# "value": "t "
# },
# {
# "position": 26,
# "force_prefix": false,
# "value": " e"
# },
# {
# "position": 27,
# "force_prefix": false,
# "value": "ea"
# },
# {
# "position": 28,
# "force_prefix": false,
# "value": "ar"
# },
# {
# "position": 29,
# "force_prefix": false,
# "value": "rc"
# },
# {
# "position": 30,
# "force_prefix": false,
# "value": "ch"
# },
# {
# "position": 31,
# "force_prefix": false,
# "value": "h "
# },
# {
# "position": 32,
# "force_prefix": false,
# "value": " e"
# },
# {
# "position": 33,
# "force_prefix": false,
# "value": "en"
# },
# {
# "position": 34,
# "force_prefix": false,
# "value": "ng"
# },
# {
# "position": 35,
# "force_prefix": false,
# "value": "gi"
# },
# {
# "position": 36,
# "force_prefix": false,
# "value": "in"
# },
# {
# "position": 37,
# "force_prefix": false,
# "value": "ne"
# },
# {
# "position": 38,
# "force_prefix": false,
# "value": "e!"
# },
# {
# "position": 39,
# "force_prefix": false,
# "value": "!"
# }
# ]
# ]

Optional parameters
There are optional parameters.

normalizer
Specifies the normalizer name. tokenize command uses the normalizer that is named
normalizer. A normalizer is important for N-gram family tokenizers such as TokenBigram.

A normalizer detects the character type of each character while normalizing. N-gram
family tokenizers use character types while tokenizing.

Here is an example that doesn't use a normalizer.

Execution example:

tokenize TokenBigram "Fulltext Search"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "Fu"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "ul"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "ll"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "lt"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "te"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "ex"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "xt"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "t "
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": " S"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "Se"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "ea"
# },
# {
# "position": 11,
# "force_prefix": false,
# "value": "ar"
# },
# {
# "position": 12,
# "force_prefix": false,
# "value": "rc"
# },
# {
# "position": 13,
# "force_prefix": false,
# "value": "ch"
# },
# {
# "position": 14,
# "force_prefix": false,
# "value": "h"
# }
# ]
# ]

All alphabetic characters are tokenized into two-character tokens. For example, Fu is a token.

Here is an example that uses a normalizer.

Execution example:

tokenize TokenBigram "Fulltext Search" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "fulltext"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "search"
# }
# ]
# ]

Consecutive alphabetic characters are tokenized as one token. For example, fulltext is a token.

If you want to tokenize by two characters with a normalizer, use
TokenBigramSplitSymbolAlpha.

Execution example:

tokenize TokenBigramSplitSymbolAlpha "Fulltext Search" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "fu"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "ul"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "ll"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "lt"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "te"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "ex"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "xt"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "t"
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": "se"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "ea"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "ar"
# },
# {
# "position": 11,
# "force_prefix": false,
# "value": "rc"
# },
# {
# "position": 12,
# "force_prefix": false,
# "value": "ch"
# },
# {
# "position": 13,
# "force_prefix": false,
# "value": "h"
# }
# ]
# ]

All alphabetic characters are tokenized into two-character tokens, and they are
normalized to lower case. For example, fu is a token.

flags
Specifies tokenization customization options. You can specify multiple options separated
by "|". For example, NONE|ENABLE_TOKENIZED_DELIMITER.

Here are available flags.

┌───────────────────────────┬──────────────────────────────────┐
│Flag │ Description │
├───────────────────────────┼──────────────────────────────────┤
NONE │ Just ignored. │
├───────────────────────────┼──────────────────────────────────┤
ENABLE_TOKENIZED_DELIMITER │ Enables tokenized delimiter. See │
│ │ /reference/tokenizers about │
│ │ tokenized delimiter details. │
└───────────────────────────┴──────────────────────────────────┘

Here is an example that uses ENABLE_TOKENIZED_DELIMITER.

Execution example:

tokenize TokenDelimit "Full￾text Sea￾crch" NormalizerAuto ENABLE_TOKENIZED_DELIMITER
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "full"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "text sea"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "crch"
# }
# ]
# ]

TokenDelimit tokenizer is one of the tokenizers that support the tokenized delimiter.
ENABLE_TOKENIZED_DELIMITER enables the tokenized delimiter. The tokenized delimiter is a
special character that indicates a token border: U+FFFE. That code point is not assigned
to any character, so it never appears in a normal string, which makes it a good character
for this purpose. If ENABLE_TOKENIZED_DELIMITER is enabled, the target string is treated
as an already tokenized string and the tokenizer just splits it at the tokenized
delimiters.
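Producing an "already tokenized" string is just a matter of joining tokens with U+FFFE. A minimal sketch:

```python
# U+FFFE is the tokenized delimiter: a code point not assigned to any
# character, so it never appears in normal text.
TOKENIZED_DELIMITER = "\ufffe"

# Join pre-decided tokens into a string suitable for a tokenizer that
# supports ENABLE_TOKENIZED_DELIMITER.
tokens = ["Full", "text Sea", "crch"]
pre_tokenized = TOKENIZED_DELIMITER.join(tokens)

# Splitting on the delimiter recovers the original token borders.
print(pre_tokenized.split(TOKENIZED_DELIMITER))  # ['Full', 'text Sea', 'crch']
```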

mode
Specifies a tokenize mode. If ADD is specified, the text is tokenized by the rule for
adding a document. If GET is specified, the text is tokenized by the rule for searching
for a document. If the mode is omitted, the text is tokenized in ADD mode.

The default mode is ADD.

Here is an example of the ADD mode.

Execution example:

tokenize TokenBigram "Fulltext Search" --mode ADD
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "Fu"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "ul"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "ll"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "lt"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "te"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "ex"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "xt"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "t "
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": " S"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "Se"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "ea"
# },
# {
# "position": 11,
# "force_prefix": false,
# "value": "ar"
# },
# {
# "position": 12,
# "force_prefix": false,
# "value": "rc"
# },
# {
# "position": 13,
# "force_prefix": false,
# "value": "ch"
# },
# {
# "position": 14,
# "force_prefix": false,
# "value": "h"
# }
# ]
# ]

The last alphabetic character is tokenized as a one-character token.

Here is an example of the GET mode.

Execution example:

tokenize TokenBigram "Fulltext Search" --mode GET
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "Fu"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "ul"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "ll"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "lt"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "te"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "ex"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "xt"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "t "
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": " S"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "Se"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "ea"
# },
# {
# "position": 11,
# "force_prefix": false,
# "value": "ar"
# },
# {
# "position": 12,
# "force_prefix": false,
# "value": "rc"
# },
# {
# "position": 13,
# "force_prefix": false,
# "value": "ch"
# }
# ]
# ]

The last alphabetic characters are tokenized as a two-character token; no trailing
one-character token is generated.

token_filters
Specifies the token filter names. tokenize command uses the token filters that are named
token_filters.

See /reference/token_filters about token filters.

Return value
tokenize command returns tokenized tokens. Each token has some attributes in addition to
the token itself. More attributes may be added in the future:

[HEADER, tokens]

HEADER
See /reference/command/output_format about HEADER.

tokens
tokens is an array of tokens. A token is an object that has the following attributes.

┌─────────┬─────────────────┐
│Name │ Description │
├─────────┼─────────────────┤
value │ Token itself. │
├─────────┼─────────────────┤
position │ The N-th token. │
└─────────┴─────────────────┘
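Picking the token surfaces out of a response is a one-liner once the JSON is parsed. A minimal sketch, assuming the [HEADER, tokens] format described above:

```python
import json

# A tokenize response: [HEADER, tokens], each token an object with
# "value" (the token itself) and "position" (the N-th token).
response = '''
[[0, 1337566253.89858, 0.0003],
 [{"value": "fulltext", "position": 0},
  {"value": "search", "position": 1}]]
'''

header, tokens = json.loads(response)
values = [token["value"] for token in tokens]
print(values)  # ['fulltext', 'search']
```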

See also
· /reference/tokenizers

tokenizer_list
Summary
tokenizer_list command lists tokenizers in a database.

Syntax
This command takes no parameters:

tokenizer_list

Usage
Here is a simple example.

Execution example:

tokenizer_list
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "name": "TokenMecab"
# },
# {
# "name": "TokenDelimit"
# },
# {
# "name": "TokenUnigram"
# },
# {
# "name": "TokenBigram"
# },
# {
# "name": "TokenTrigram"
# },
# {
# "name": "TokenBigramSplitSymbol"
# },
# {
# "name": "TokenBigramSplitSymbolAlpha"
# },
# {
# "name": "TokenBigramSplitSymbolAlphaDigit"
# },
# {
# "name": "TokenBigramIgnoreBlank"
# },
# {
# "name": "TokenBigramIgnoreBlankSplitSymbol"
# },
# {
# "name": "TokenBigramIgnoreBlankSplitSymbolAlpha"
# },
# {
# "name": "TokenBigramIgnoreBlankSplitSymbolAlphaDigit"
# },
# {
# "name": "TokenDelimitNull"
# },
# {
# "name": "TokenRegexp"
# }
# ]
# ]

It returns tokenizers in a database.

Return value
tokenizer_list command returns tokenizers. Each tokenizer has an attribute that contains
its name. More attributes may be added in the future:

[HEADER, tokenizers]

HEADER
See /reference/command/output_format about HEADER.

tokenizers
tokenizers is an array of tokenizers. A tokenizer is an object that has the following
attributes.

┌─────┬─────────────────┐
│Name │ Description │
├─────┼─────────────────┤
name │ Tokenizer name. │
└─────┴─────────────────┘

See also
· /reference/tokenizers

· /reference/commands/tokenize

truncate
Summary
truncate command deletes all records from the specified table or all values from the
specified column.

Syntax
This command takes only one required parameter:

truncate target_name

New in version 4.0.9: The target_name parameter can be used since 4.0.9. You need to use
the table parameter for 4.0.8 or earlier.

For backward compatibility, truncate command accepts the table parameter, but it should
not be used in newly written code.

Usage
Here is a simple example of truncate command against a table.

Execution example:

table_create Users TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users score COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Users
[
{"_key": "Alice", "score": 2},
{"_key": "Bob", "score": 0},
{"_key": "Carlos", "score": -1}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
select Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "score",
# "Int32"
# ]
# ],
# [
# 1,
# "Alice",
# 2
# ],
# [
# 2,
# "Bob",
# 0
# ],
# [
# 3,
# "Carlos",
# -1
# ]
# ]
# ]
# ]
truncate Users
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 0
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "score",
# "Int32"
# ]
# ]
# ]
# ]
# ]

Here is a simple example of truncate command against a column.

Execution example:

table_create Users TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users score COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Users
[
{"_key": "Alice", "score": 2},
{"_key": "Bob", "score": 0},
{"_key": "Carlos", "score": -1}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
select Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "score",
# "Int32"
# ]
# ],
# [
# 1,
# "Alice",
# 2
# ],
# [
# 2,
# "Bob",
# 0
# ],
# [
# 3,
# "Carlos",
# -1
# ]
# ]
# ]
# ]
truncate Users.score
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "score",
# "Int32"
# ]
# ],
# [
# 1,
# "Alice",
# 0
# ],
# [
# 2,
# "Bob",
# 0
# ],
# [
# 3,
# "Carlos",
# 0
# ]
# ]
# ]
# ]

Parameters
This section describes parameters of truncate.

Required parameters
There is one required parameter, target_name.

target_name
Specifies the name of the table or the column.

Return value
truncate command returns whether the truncation succeeded or not:

[HEADER, SUCCEEDED_OR_NOT]

HEADER
See /reference/command/output_format about HEADER.

SUCCEEDED_OR_NOT
If the command succeeded, it returns true; otherwise it returns false.

Data types
Name
Groonga data types

Description
Groonga has data types to store values.

The primary key of a table and each column value belong to one of the data types in a
Groonga database. Normally, a column's value type is common to all records in one table.

A primary key type and a column type can be a Groonga built-in type, a user-defined type
or a user-defined table.

If you specify another table as the primary key type, the table becomes a subset of the
table used as the primary key type.

If you specify another table as the column type, the column becomes a reference key into
the table used as the column type.

Builtin types
The following types are defined as builtin types.

Bool
Boolean type. The possible values are true and false. (default: false)

When you store a value with the /reference/commands/load command, it becomes false if you
specify false, 0 or an empty string, and true if you specify anything else.
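The coercion rule can be sketched in one function. This is only an illustrative model of the rule stated above, not groonga's implementation:

```python
# When loading into a Bool column: false, 0 and "" become false;
# everything else becomes true.
def coerce_bool(value):
    return value not in (False, 0, "")

print(coerce_bool(0))      # False
print(coerce_bool(""))     # False
print(coerce_bool("yes"))  # True
```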

Int8
Signed 8bit integer. It's -128 or more and 127 or less. (default: 0)

UInt8
Unsigned 8bit integer. It's 0 or more and 255 or less. (default: 0)

Int16
Signed 16bit integer. It's -32,768 or more and 32,767 or less. (default: 0)

UInt16
Unsigned 16bit integer. It's 0 or more and 65,535 or less. (default: 0)

Int32
Signed 32bit integer. It's -2,147,483,648 or more and 2,147,483,647 or less. (default: 0)

UInt32
Unsigned 32bit integer. It's 0 or more and 4,294,967,295 or less. (default: 0)

Int64
Signed 64bit integer. It's -9,223,372,036,854,775,808 or more and
9,223,372,036,854,775,807 or less. (default: 0)

UInt64
Unsigned 64bit integer. It's 0 or more and 18,446,744,073,709,551,615 or less. (default:
0)

Float
Double-precision floating-point number of IEEE 754 as a real number. (default: 0.0)

See IEEE floating point - Wikipedia, the free encyclopedia or IEEE 754: Standard for
Binary Floating-Point for details of IEEE 754 format.

Time
Date and Time, the number of seconds that have elapsed since 1970-01-01 00:00:00 by 64 bit
signed integer. (default: 0)

When you store a value with the /reference/commands/load command, specify the number of
seconds elapsed since 1970-01-01 00:00:00. To specify the date and time more precisely
than seconds, use a decimal.
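For example, a client can compute the value to load from a timestamp. This sketch uses Python's standard library; the timezone-aware datetime ensures the epoch seconds are computed in UTC:

```python
import datetime

# Build the value for a Time column: seconds since the epoch, with a
# decimal fraction for sub-second precision.
dt = datetime.datetime(2012, 5, 21, 1, 30, 53, 500000,
                       tzinfo=datetime.timezone.utc)
print(dt.timestamp())  # 1337563853.5
```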

ShortText
String of 4,095 or less bytes. (default: "")

Text
String of 65,535 or less bytes. (default: "")

LongText
String of 2,147,483,647 or less bytes. (default: "")

TokyoGeoPoint
Latitude and longitude in the old Japanese geodetic system, represented as a pair of
integers that express the longitude and the latitude in milliseconds. (default: 0x0)

A longitude or latitude of x degrees, y minutes and z seconds in degrees-minutes-seconds
form is converted to milliseconds by the formula (((x * 60) + y) * 60 + z) * 1000.

When you store a value with the /reference/commands/load command, specify it as a string
of the form "longitude-in-millisecondsxlatitude-in-milliseconds" or
"longitude-in-decimalxlatitude-in-decimal". You can use ',' instead of 'x' as the
separator between longitude and latitude.

See 測地系 - Wikipedia (geodetic system) for details about geodetic systems.

WGS84GeoPoint
Latitude and longitude in the World Geodetic System (WGS 84), represented as a pair of
integers that express the longitude and the latitude in milliseconds. (default: 0x0)

The conversion from degrees-minutes-seconds form to milliseconds and the way to specify
values in the /reference/commands/load command are the same as for TokyoGeoPoint.
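The conversion formula above is straightforward to apply in code. A minimal sketch (the sample coordinate below is purely illustrative):

```python
# Convert degrees-minutes-seconds to the millisecond representation used
# by TokyoGeoPoint/WGS84GeoPoint: (((x * 60) + y) * 60 + z) * 1000.
def dms_to_msec(degrees, minutes, seconds):
    return (((degrees * 60) + minutes) * 60 + seconds) * 1000

# An illustrative longitude/latitude pair, formatted as the "lngxlat"
# string accepted by the load command.
lng = dms_to_msec(139, 44, 28)
lat = dms_to_msec(35, 39, 29)
print(f"{lng}x{lat}")  # 503068000x128369000
```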

Limitations about types
Types that can't be specified in primary key of table
Text and LongText can't be specified in primary key of table.

Types that can't be stored as vectors
A Groonga column can store a vector of a certain type. However, although vectors of the
three types ShortText, Text and LongText can be stored and output, they can't be
specified in search conditions or drilldown conditions.

A table type can be stored as a vector. So if you want to use a vector of ShortText in
search conditions or drilldown conditions, create a separate table whose primary key type
is ShortText and use that table as the column type.

Tables
Summary
Table in Groonga manages the relation between IDs and keys. Groonga provides four table
types: TABLE_NO_KEY, TABLE_HASH_KEY, TABLE_PAT_KEY and TABLE_DAT_KEY.

All tables except TABLE_NO_KEY provide both fast ID lookup by key and fast key lookup by
ID. TABLE_NO_KEY doesn't support keys; it only manages IDs, so it provides neither ID
lookup by key nor key lookup by ID.
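The ID <-> key relation that a key-supporting table maintains can be modeled with two maps. This is only an illustration of the concept, not groonga's on-disk structures:

```python
# Model of a key-supporting table: bidirectional ID <-> key lookup.
class KeyTable:
    def __init__(self):
        self._key_to_id = {}
        self._id_to_key = {}
        self._next_id = 1

    def add(self, key):
        # Keys are unique; adding an existing key returns its ID.
        if key in self._key_to_id:
            return self._key_to_id[key]
        record_id = self._next_id
        self._next_id += 1
        self._key_to_id[key] = record_id
        self._id_to_key[record_id] = key
        return record_id

    def get_id(self, key):
        return self._key_to_id.get(key)      # fast ID lookup by key

    def get_key(self, record_id):
        return self._id_to_key.get(record_id)  # fast key lookup by ID

table = KeyTable()
print(table.add("Alice"))     # 1
print(table.get_key(1))       # Alice
print(table.get_id("Alice"))  # 1
```

A TABLE_NO_KEY table, by contrast, would keep only the ID side of this structure.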

Characteristics
Here is a characteristics table of all table types in Groonga. (The TABLE_ prefix is
omitted in the table.)

┌─────────────────┬────────┬────────────┬───────────────┬──────────────────┐
│ │ NO_KEYHASH_KEYPAT_KEYDAT_KEY
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Data structure │ Array │ Hash table │ Patricia trie │ Double array │
│ │ │ │ │ trie │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│ID support │ o │ o │ o │ o │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Key support │ x │ o │ o │ o │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Value support │ o │ o │ o │ x │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Key -> ID speed │ - │ oo │ x │ o │
│ │ │ │ │ │
│ · o: fast │ │ │ │ │
│ │ │ │ │ │
│ · x: slow │ │ │ │ │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Update speed │ ooo │ o │ o │ x │
│ │ │ │ │ │
│ · o: fast │ │ │ │ │
│ │ │ │ │ │
│ · x: slow │ │ │ │ │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Size │ ooo │ o │ oo │ x │
│ │ │ │ │ │
│ · o: │ │ │ │ │
│ small │ │ │ │ │
│ │ │ │ │ │
│ · x: │ │ │ │ │
│ large │ │ │ │ │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Key update │ - │ x │ x │ o │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Common prefix │ - │ x │ o │ o │
│search │ │ │ │ │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Predictive │ - │ x │ o │ o │
│search │ │ │ │ │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Range search │ - │ x │ o │ o │
└─────────────────┴────────┴────────────┴───────────────┴──────────────────┘

TABLE_NO_KEY
TABLE_NO_KEY is very fast and very small, but it doesn't support keys. TABLE_NO_KEY is the
only table type that doesn't support keys.

You cannot use TABLE_NO_KEY as a lexicon for full text search because a lexicon stores
tokens as keys. TABLE_NO_KEY is useful for records that have no key, such as logs.

TABLE_HASH_KEY
TABLE_HASH_KEY is fast but it doesn't support advanced search functions such as common
prefix search and predictive search.

TABLE_HASH_KEY is useful for index for exact search such as tag search.

TABLE_PAT_KEY
TABLE_PAT_KEY is small and supports advanced search functions.

TABLE_PAT_KEY is useful for lexicon for fulltext search and index for range search.

TABLE_DAT_KEY
TABLE_DAT_KEY is fast and supports key update, but it is large. It is not suitable for
storing many records. TABLE_DAT_KEY is the only table type that supports key update.

TABLE_DAT_KEY is used inside the Groonga database. The Groonga database needs to convert
object names such as ShortText, TokenBigram and table names to object IDs, and it needs to
support renaming objects. Those features are implemented with TABLE_DAT_KEY. Because the
number of objects is small, the large data size of TABLE_DAT_KEY can be ignored.

Record ID
Record IDs are assigned automatically. You cannot assign record IDs yourself.

Record IDs of deleted records may be reused.

The valid record ID range is between 1 and 268435455. (Both 1 and 268435455 are valid
IDs.)

Persistent table and temporary table
A table is either a persistent table or a temporary table.

Persistent table
A persistent table is named and registered to a database. Records in a persistent table
aren't deleted when the table or database is closed.

A persistent table can be created with the /reference/commands/table_create command.

Temporary table
A temporary table is anonymous. Records in a temporary table are deleted when the table is
closed. Temporary tables are used to store search results, sort results, group (drilldown)
results and so on. TABLE_HASH_KEY is used for search results and group results.
TABLE_NO_KEY is used for sort results.

Limitations
The maximum number of records is 268435455. You cannot add 268435456 or more records to a
table.

The maximum key size is 4,096 bytes. You cannot use a key of 4,097 bytes or larger. You
can use a column instead of a key for data of 4,097 bytes or larger. The Text and LongText
types support such data.

The maximum total key size is 4GiB. You need to split a table, split a database (sharding)
or reduce each key size to handle a total key size of 4GiB or larger.

See also
· /reference/commands/table_create

Column
A column is a data store object or an index object for fast search.

A column belongs to a table. A table has zero or more columns.

Both data store columns and index columns have a type. The type of a data store column
specifies the column's data range; in other words, it is the "value type". The type of an
index column specifies the set of documents to be indexed. A set of documents is a table
in Groonga; in other words, the type of an index column must be a table.

Here are data store columns:

Scalar column
Summary
TODO

Usage
TODO

Vector column
Summary
A vector column is a data store object. It can store zero or more scalar values. In short,
a scalar value is a single value such as a number or a string. See scalar for details
about scalar values.

One use case for vector columns is storing tags. You can use a vector column to store tag
values.

You can use a vector column as an index search target in the same way as a scalar column.
You can also set a weight for each element. When an element that has a weight of one or
more is matched, the record gets a higher score than in the no-weight case. This is a
vector column specific feature. A vector column that can store weights is called a weight
vector column.

You can also do full text search against each text element. But the search score becomes
too high when weights are used, so use full text search with weights carefully.

Usage
There are three vector column types:

· Normal vector column

· Reference vector column

· Weight vector column

This section describes how to use these types.

Normal vector column
A normal vector column stores zero or more scalar values, such as numbers and strings.

A normal vector column stores elements of a single type. You can't mix types. For example,
you can't store a number and a string in the same normal vector column.

A normal vector column is useful when a record has multiple values for one key. Tags are
the most popular use case.

How to create
Use the /reference/commands/column_create command to create a normal vector column. The
point is the COLUMN_VECTOR flag:

Execution example:

table_create Bookmarks TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Bookmarks tags COLUMN_VECTOR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can set zero or more tags to a bookmark.

How to load
You can load vector data with JSON array syntax:

[ELEMENT1, ELEMENT2, ELEMENT3, ...]

Let's load the following data:

┌────────────────────┬─────────────────────────────────┐
│ _key               │ tags                            │
├────────────────────┼─────────────────────────────────┤
│ http://groonga.org/│ ["groonga"]                     │
├────────────────────┼─────────────────────────────────┤
│ http://mroonga.org/│ ["mroonga", "mysql", "groonga"] │
├────────────────────┼─────────────────────────────────┤
│ http://ranguba.org/│ ["ruby", "groonga"]             │
└────────────────────┴─────────────────────────────────┘

Here is a command that loads the data:

Execution example:

load --table Bookmarks
[
{"_key": "http://groonga.org/", "tags": ["groonga"]},
{"_key": "http://mroonga.org/", "tags": ["mroonga", "mysql", "groonga"]},
{"_key": "http://ranguba.org/", "tags": ["ruby", "groonga"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

The loaded data can be output in JSON array syntax:

Execution example:

select Bookmarks
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "tags",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://groonga.org/",
# [
# "groonga"
# ]
# ],
# [
# 2,
# "http://mroonga.org/",
# [
# "mroonga",
# "mysql",
# "groonga"
# ]
# ],
# [
# 3,
# "http://ranguba.org/",
# [
# "ruby",
# "groonga"
# ]
# ]
# ]
# ]
# ]

How to search
You need to create an index to search a normal vector column:

Execution example:

table_create Tags TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Tags bookmark_index COLUMN_INDEX Bookmarks tags
# [[0, 1337566253.89858, 0.000355720520019531], true]

There is no vector column specific way; you can create an index just as you would for a
scalar column.

You can search for an element in tags with full text search syntax.

With select-match-columns and select-query:

Execution example:

select Bookmarks --match_columns tags --query mysql --output_columns _key,tags,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "tags",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://mroonga.org/",
# [
# "mroonga",
# "mysql",
# "groonga"
# ],
# 1
# ]
# ]
# ]
# ]

You can also use weight in select-match-columns:

Execution example:

select Bookmarks --match_columns 'tags * 3' --query mysql --output_columns _key,tags,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "tags",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://mroonga.org/",
# [
# "mroonga",
# "mysql",
# "groonga"
# ],
# 3
# ]
# ]
# ]
# ]

With select-filter:

Execution example:

select Bookmarks --filter 'tags @ "mysql"' --output_columns _key,tags,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "tags",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://mroonga.org/",
# [
# "mroonga",
# "mysql",
# "groonga"
# ],
# 1
# ]
# ]
# ]
# ]

Reference vector column
TODO

A reference vector column is space-efficient when there are many elements with the same
value. A reference vector column keeps reference record IDs, not the values themselves. A
record ID is smaller than the value itself.

How to create
TODO

How to load
TODO

How to search
TODO

Weight vector column
A weight vector column is similar to a normal vector column. It stores elements, and it
can also store a weight for each element. The weight is the degree of importance of the
element.

A weight is a non-negative integer. 0 is the default weight, which means no weight.

If the weight is one or larger, the search score is increased by the weight. If the weight
is 0, the score is 1. If the weight is 10, the score is 11 (= 1 + 10).

Weight vector columns are useful for tuning search scores. See also select-adjuster. You
can increase the search score of specific records.
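As a quick illustration of this arithmetic (a sketch, not Groonga's implementation):

```python
def weighted_score(weight):
    # A matched element contributes a base score of 1 plus its weight.
    return 1 + weight

print(weighted_score(0))   # 1
print(weighted_score(10))  # 11
```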

Limitations
There are some limitations for now. They will be resolved in the future.

Here are limitations:

· You need to use a string representation for element values on load. For example, you
can't use 29 for the number 29. You need to use "29" for the number 29.

How to create
Use the /reference/commands/column_create command to create a weight vector column. The
point is the COLUMN_VECTOR|WITH_WEIGHT flags:

Execution example:

table_create Bookmarks TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Bookmarks tags COLUMN_VECTOR|WITH_WEIGHT ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

If you don't specify the WITH_WEIGHT flag, it is just a normal vector column.

You can set zero or more tags with weights on a bookmark.

How to load
You can load vector data with JSON object syntax:

{"ELEMENT1": WEIGHT1, "ELEMENT2": WEIGHT2, "ELEMENT3": WEIGHT3, ...}

Let's load the following data:

┌────────────────────┬──────────────────────────────────┐
│ _key               │ tags                             │
├────────────────────┼──────────────────────────────────┤
│ http://groonga.org/│ {"groonga": 100}                 │
├────────────────────┼──────────────────────────────────┤
│ http://mroonga.org/│ {"mroonga": 100, "mysql": 50,    │
│                    │ "groonga": 10}                   │
├────────────────────┼──────────────────────────────────┤
│ http://ranguba.org/│ {"ruby": 100, "groonga": 50}     │
└────────────────────┴──────────────────────────────────┘

Here is a command that loads the data:

Execution example:

load --table Bookmarks
[
{"_key": "http://groonga.org/",
"tags": {"groonga": 100}},
{"_key": "http://mroonga.org/",
"tags": {"mroonga": 100,
"mysql": 50,
"groonga": 10}},
{"_key": "http://ranguba.org/",
"tags": {"ruby": 100,
"groonga": 50}}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

The loaded data can be output in JSON object syntax:

Execution example:

select Bookmarks
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "tags",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://groonga.org/",
# {
# "groonga": 100
# }
# ],
# [
# 2,
# "http://mroonga.org/",
# {
# "mroonga": 100,
# "groonga": 10,
# "mysql": 50
# }
# ],
# [
# 3,
# "http://ranguba.org/",
# {
# "ruby": 100,
# "groonga": 50
# }
# ]
# ]
# ]
# ]

How to search
You need to create an index to search a weight vector column. Don't forget to specify the
WITH_WEIGHT flag to column_create:

Execution example:

table_create Tags TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Tags bookmark_index COLUMN_INDEX|WITH_WEIGHT Bookmarks tags
# [[0, 1337566253.89858, 0.000355720520019531], true]

There is no weight vector column specific way except the WITH_WEIGHT flag; you can create
an index just as you would for a scalar column.

You can search for an element in tags with full text search syntax.

With select-match-columns and select-query:

Execution example:

select Bookmarks --match_columns tags --query groonga --output_columns _key,tags,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "tags",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://groonga.org/",
# {
# "groonga": 100
# },
# 101
# ],
# [
# "http://mroonga.org/",
# {
# "mroonga": 100,
# "groonga": 10,
# "mysql": 50
# },
# 11
# ],
# [
# "http://ranguba.org/",
# {
# "ruby": 100,
# "groonga": 50
# },
# 51
# ]
# ]
# ]
# ]

You can also use weight in select-match-columns. The score is (1 +
weight_in_weight_vector) * weight_in_match_columns:

Execution example:

select Bookmarks --match_columns 'tags * 3' --query groonga --output_columns _key,tags,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "tags",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://groonga.org/",
# {
# "groonga": 100
# },
# 303
# ],
# [
# "http://mroonga.org/",
# {
# "mroonga": 100,
# "groonga": 10,
# "mysql": 50
# },
# 33
# ],
# [
# "http://ranguba.org/",
# {
# "ruby": 100,
# "groonga": 50
# },
# 153
# ]
# ]
# ]
# ]
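The score formula above can be sketched in Python; the weights are the ones loaded earlier in this section:

```python
def match_score(element_weight, match_columns_weight=1):
    # (1 + weight_in_weight_vector) * weight_in_match_columns
    return (1 + element_weight) * match_columns_weight

# --match_columns 'tags * 3' --query groonga
print(match_score(100, 3))  # 303 (http://groonga.org/)
print(match_score(10, 3))   # 33  (http://mroonga.org/)
print(match_score(50, 3))   # 153 (http://ranguba.org/)
```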

With select-filter:

Execution example:

select Bookmarks --filter 'tags @ "groonga"' --output_columns _key,tags,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "tags",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://groonga.org/",
# {
# "groonga": 100
# },
# 101
# ],
# [
# "http://mroonga.org/",
# {
# "mroonga": 100,
# "groonga": 10,
# "mysql": 50
# },
# 11
# ],
# [
# "http://ranguba.org/",
# {
# "ruby": 100,
# "groonga": 50
# },
# 51
# ]
# ]
# ]
# ]

How to apply just weight
You can use the weights in a weight vector column just to increase the search score,
without changing the set of matched records.

Use select-adjuster for the purpose:

Execution example:

select Bookmarks \
--filter true \
--adjuster 'tags @ "mysql" * 10 + tags @ "groonga" * 5' \
--output_columns _key,tags,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "tags",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://groonga.org/",
# {
# "groonga": 100
# },
# 506
# ],
# [
# "http://mroonga.org/",
# {
# "mroonga": 100,
# "groonga": 10,
# "mysql": 50
# },
# 566
# ],
# [
# "http://ranguba.org/",
# {
# "ruby": 100,
# "groonga": 50
# },
# 256
# ]
# ]
# ]
# ]

The select command uses --filter true, so all records are matched with score 1. Then it
applies --adjuster. The adjuster does the following:

· tags @ "mysql" * 10 increases the score by (1 + weight) * 10 for records that have the
"mysql" tag.

· tags @ "groonga" * 5 increases the score by (1 + weight) * 5 for records that have the
"groonga" tag.

For example, the record "http://mroonga.org/" has both the "mysql" tag and the "groonga"
tag. So its score is increased by 565 (= ((1 + 50) * 10) + ((1 + 10) * 5) = (51 * 10) +
(11 * 5) = 510 + 55). The search score is 1 from --filter true before --adjuster is
applied. So the final search score of record "http://mroonga.org/" is 566 (= 1 + 565).
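That arithmetic can be sketched as:

```python
def adjusted_score(base_score, matched):
    # matched: (element_weight, adjuster_factor) pairs, one for each
    # `column @ "value" * factor` term that matches the record.
    return base_score + sum((1 + w) * f for w, f in matched)

# http://mroonga.org/: "mysql" has weight 50, "groonga" has weight 10;
# --filter true gives the base score 1.
print(adjusted_score(1, [(50, 10), (10, 5)]))  # 566
```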

Pseudo column
Name
Pseudo columns

Description
Several columns are automatically defined in tables created in a Groonga database.

All of these columns have names that start with an underscore ('_'). Which pseudo columns
are defined depends on the table type.

_id
A unique number assigned to each record. It is defined for all tables. Its value is an
integer between 1 and 1073741824, normally incremented by 1 in the order records are
added. The value of _id is immutable; it cannot be changed as long as the record exists.
However, the _id values of deleted records may be reused.

_key
The primary key value of the record. It is defined only for tables that have a primary
key. A primary key value is unique within the table and cannot be changed.

_value
The value of the record. It is defined only for tables whose value_type is specified. It
can be changed freely.

_score
The score value of each record. It is defined only for tables generated as search
results.

Its value is set during search processing, but it can be changed freely.

_nsubrecs
The number of records that had the same primary key value. It is defined only for tables
generated as search results. When grouping (drilldown) is performed, the number of records
in the pre-grouping table that had the same grouping key value is recorded in _nsubrecs of
the table that stores the grouping result.

Here is an index column:

Index column
Summary
TODO

Usage
TODO

Normalizers
Summary
Groonga has a normalizer module that normalizes text. It is used when tokenizing text and
when storing table keys. For example, A and a are processed as the same character after
normalization.

Normalizer modules can be added as plugins. You can customize text normalization by
registering your own normalizer plugins with Groonga.

A normalizer module is attached to a table. A table can have zero or one normalizer
module. You can attach a normalizer module to a table with the table-create-normalizer
option of /reference/commands/table_create.

Here is an example table_create that uses NormalizerAuto normalizer module:

Execution example:

table_create Dictionary TABLE_HASH_KEY ShortText --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]

NOTE:
Groonga 2.0.9 or earlier doesn't have the --normalizer option in table_create. The
KEY_NORMALIZE flag was used instead.

You can open an old database with Groonga 2.1.0 or later. An old database is a database
created by Groonga 2.0.9 or earlier. But you cannot open such a database with Groonga
2.0.9 or earlier once it has been opened by Groonga 2.1.0 or later: when Groonga 2.1.0 or
later opens the old database, the KEY_NORMALIZE flag information in it is converted to
normalizer information, so Groonga 2.0.9 or earlier can no longer find the KEY_NORMALIZE
flag information in the opened database.

Keys of a table that has a normalizer module are normalized:

Execution example:

load --table Dictionary
[
{"_key": "Apple"},
{"_key": "black"},
{"_key": "COLOR"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
select Dictionary
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# 1,
# "apple"
# ],
# [
# 2,
# "black"
# ],
# [
# 3,
# "color"
# ]
# ]
# ]
# ]

The NormalizerAuto normalizer normalizes text to downcased text. For example, "Apple" is
normalized to "apple", "black" is normalized to "black" and "COLOR" is normalized to
"color".

If a table is a lexicon for full text search, tokenized tokens are normalized, because
tokens are stored as table keys and table keys are normalized as described above.

Built-in normalizers
Here is a list of built-in normalizers:

· NormalizerAuto

· NormalizerNFKC51

NormalizerAuto
Normally you should use the NormalizerAuto normalizer. NormalizerAuto was the normalizer
used in Groonga 2.0.9 or earlier. The KEY_NORMALIZE flag in table_create on Groonga 2.0.9
or earlier is equal to the --normalizer NormalizerAuto option in table_create on Groonga
2.1.0 or later.

NormalizerAuto supports all encodings. It uses Unicode NFKC (Normalization Form
Compatibility Composition) for UTF-8 encoded text. It uses encoding specific original
normalizations for other encodings. The results of those original normalizations are
similar to NFKC.

For example, half-width katakana (such as U+FF76 HALFWIDTH KATAKANA LETTER KA) +
half-width katakana voiced sound mark (U+FF9E HALFWIDTH KATAKANA VOICED SOUND MARK) is
normalized to full-width katakana with voiced sound mark (U+30AC KATAKANA LETTER GA). The
former is two characters but the latter is one character.
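This NFKC behavior can be reproduced with Python's standard unicodedata module (Python tracks a newer Unicode version than 5.1, but the result for this pair is the same):

```python
import unicodedata

# U+FF76 HALFWIDTH KATAKANA LETTER KA + U+FF9E HALFWIDTH KATAKANA
# VOICED SOUND MARK: two characters before normalization.
halfwidth = "\uFF76\uFF9E"
fullwidth = unicodedata.normalize("NFKC", halfwidth)
print(fullwidth == "\u30AC")           # True: U+30AC KATAKANA LETTER GA
print(len(halfwidth), len(fullwidth))  # 2 1
```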

Here is an example that uses NormalizerAuto normalizer:

Execution example:

table_create NormalLexicon TABLE_HASH_KEY ShortText --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]

NormalizerNFKC51
NormalizerNFKC51 normalizes texts by Unicode NFKC (Normalization Form Compatibility
Composition) for Unicode version 5.1. It supports only UTF-8 encoding.

Normally you don't need to use NormalizerNFKC51 explicitly. You can use NormalizerAuto
instead.

Here is an example that uses NormalizerNFKC51 normalizer:

Execution example:

table_create NFKC51Lexicon TABLE_HASH_KEY ShortText --normalizer NormalizerNFKC51
# [[0, 1337566253.89858, 0.000355720520019531], true]

Additional normalizers
There are additional normalizers:

· groonga-normalizer-mysql

See also
· /reference/commands/table_create

Tokenizers
Summary
Groonga has a tokenizer module that tokenizes text. It is used in the following cases:

· Indexing text
[image] A tokenizer is used when indexing text.

· Searching by query
[image] A tokenizer is used when searching by query.

Tokenizer is an important module for full text search. You can change the trade-off
between precision and recall by changing the tokenizer.

Normally, TokenBigram is a suitable tokenizer. If you don't know much about tokenizers,
it's recommended that you choose TokenBigram.

You can try a tokenizer with /reference/commands/tokenize and
/reference/commands/table_tokenize. Here is an example of trying the TokenBigram
tokenizer with /reference/commands/tokenize:

Execution example:

tokenize TokenBigram "Hello World"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "He"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "el"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "ll"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "lo"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "o "
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": " W"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "Wo"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "or"
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": "rl"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "ld"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "d"
# }
# ]
# ]

What is "tokenize"?
"Tokenize" is the process of extracting zero or more tokens from a text. There are several
tokenize methods.

For example, Hello World is tokenized to the following tokens by the bigram tokenize
method:

· He

· el

· ll

· lo

· o_ (_ means a white-space)

· _W (_ means a white-space)

· Wo

· or

· rl

· ld

In the above example, 10 tokens are extracted from the single text Hello World.

For example, Hello World is tokenized to the following tokens by the white-space-separate
tokenize method:

· Hello

· World

In the above example, 2 tokens are extracted from the single text Hello World.

Tokens are used as search keys. You can find indexed documents only by tokens that are
extracted by the tokenize method in use. For example, you can find Hello World by ll with
the bigram tokenize method, but you can't find Hello World by ll with the
white-space-separate tokenize method, because the white-space-separate tokenize method
doesn't extract an ll token. It just extracts the Hello and World tokens.
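The two tokenize methods can be sketched as follows (a simplified illustration, not Groonga's actual implementation; note that the pure bigram method also emits a trailing one-character token):

```python
def bigram_tokenize(text):
    # Every pair of adjacent characters, plus the final one-character
    # token that pure bigram tokenization emits.
    return [text[i:i + 2] for i in range(len(text) - 1)] + [text[-1]]

def whitespace_tokenize(text):
    return text.split()

print(bigram_tokenize("Hello World"))      # ['He', 'el', 'll', ...]
print(whitespace_tokenize("Hello World"))  # ['Hello', 'World']

# Only the bigram method lets a query for "ll" match "Hello World".
print("ll" in bigram_tokenize("Hello World"))      # True
print("ll" in whitespace_tokenize("Hello World"))  # False
```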

In general, a tokenize method that generates small tokens increases recall but decreases
precision. A tokenize method that generates large tokens increases precision but decreases
recall.

For example, we can find both Hello World and A or B by or with the bigram tokenize
method. Hello World is noise for people who want to search for the logical "or". This
means that precision is decreased, but recall is increased.

We can find only A or B by or with the white-space-separate tokenize method, because World
is tokenized to the single token World with the white-space-separate tokenize method. This
means that precision is increased for people who want to search for the logical "or", but
recall is decreased because Hello World, which contains or, isn't found.

Built-in tokenizers
Here is a list of built-in tokenizers:

· TokenBigram

· TokenBigramSplitSymbol

· TokenBigramSplitSymbolAlpha

· TokenBigramSplitSymbolAlphaDigit

· TokenBigramIgnoreBlank

· TokenBigramIgnoreBlankSplitSymbol

· TokenBigramIgnoreBlankSplitAlpha

· TokenBigramIgnoreBlankSplitAlphaDigit

· TokenUnigram

· TokenTrigram

· TokenDelimit

· TokenDelimitNull

· TokenMecab

· TokenRegexp

TokenBigram
TokenBigram is a bigram based tokenizer. It's recommended for most cases.

The bigram tokenize method tokenizes a text into tokens of two adjacent characters. For
example, Hello is tokenized to the following tokens:

· He

· el

· ll

· lo

The bigram tokenize method is good for recall because you can find all texts by a query
consisting of two or more characters.

In general, you can't find all texts by a query consisting of one character, because no
one-character token exists. But in Groonga you can find all texts even by a one-character
query, because Groonga finds tokens that start with the query by predictive search. For
example, Groonga can find the ll and lo tokens by the query l.

The bigram tokenize method isn't good for precision, because you can find texts that
include the query inside a word. For example, you can find world by or. This is more of a
problem for ASCII-only languages than for non-ASCII languages. TokenBigram has a solution
for this problem, described below.

TokenBigram behaves differently when it's combined with a normalizer
(/reference/normalizers).

If no normalizer is used, TokenBigram uses the pure bigram tokenize method (all tokens
except the last one have two characters):

Execution example:

tokenize TokenBigram "Hello World"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "He"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "el"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "ll"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "lo"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "o "
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": " W"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "Wo"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "or"
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": "rl"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "ld"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "d"
# }
# ]
# ]

If a normalizer is used, TokenBigram uses a white-space-separate-like tokenize method for
ASCII characters and the bigram tokenize method for non-ASCII characters.

You may be confused by this combined behavior, but it's reasonable for most use cases,
such as English text (only ASCII characters) and Japanese text (a mix of ASCII and
non-ASCII characters).

Most languages that consist of only ASCII characters use white-space as the word
separator. The white-space-separate tokenize method is suitable for that case.

Languages that consist of non-ASCII characters don't use white-space as the word
separator. The bigram tokenize method is suitable for that case.

The mixed tokenize method is suitable for the mixed-language case.

If you want to use the bigram tokenize method for ASCII characters as well, see the
TokenBigramSplitXXX type tokenizers such as TokenBigramSplitSymbolAlpha.

Let's confirm TokenBigram's behavior with examples.

TokenBigram uses one or more white-spaces as the token delimiter for ASCII characters:

Execution example:

tokenize TokenBigram "Hello World" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "hello"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "world"
# }
# ]
# ]

TokenBigram uses character type changes as token delimiters for ASCII characters. A
character type is one of the following:

· Alphabet

· Digit

· Symbol (such as (, ) and !)

· Hiragana

· Katakana

· Kanji

· Others

The following example shows two token delimiters:

· between 100 (digits) and cents (alphabets)

· between cents (alphabets) and !!! (symbols)

Execution example:

tokenize TokenBigram "100cents!!!" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "100"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "cents"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "!!!"
# }
# ]
# ]
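The character-type boundary rule can be sketched with a simplified classifier (an illustration, not Groonga's actual character classes):

```python
from itertools import groupby

def char_type(c):
    # Simplified character classes; Groonga also distinguishes
    # hiragana, katakana, kanji and others.
    if c.isalpha():
        return "alphabet"
    if c.isdigit():
        return "digit"
    return "symbol"

def split_on_type_change(text):
    # Start a new token whenever the character type changes.
    return ["".join(group) for _, group in groupby(text, key=char_type)]

print(split_on_type_change("100cents!!!"))  # ['100', 'cents', '!!!']
```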

Here is an example where TokenBigram uses the bigram tokenize method for non-ASCII
characters:

Execution example:

tokenize TokenBigram "日本語の勉強" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "日本"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "本語"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "語の"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "の勉"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "勉強"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "強"
# }
# ]
# ]

TokenBigramSplitSymbol
TokenBigramSplitSymbol is similar to TokenBigram. The difference between them is symbol
handling. TokenBigramSplitSymbol tokenizes symbols by the bigram tokenize method:

Execution example:

tokenize TokenBigramSplitSymbol "100cents!!!" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "100"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "cents"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "!"
# }
# ]
# ]

TokenBigramSplitSymbolAlpha
TokenBigramSplitSymbolAlpha is similar to TokenBigram. The difference between them is
symbol and alphabet handling. TokenBigramSplitSymbolAlpha tokenizes symbols and alphabets
by the bigram tokenize method:

Execution example:

tokenize TokenBigramSplitSymbolAlpha "100cents!!!" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "100"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "ce"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "en"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "nt"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "ts"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "s!"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": "!"
# }
# ]
# ]

TokenBigramSplitSymbolAlphaDigit
TokenBigramSplitSymbolAlphaDigit is similar to TokenBigram. The difference between them is
symbol, alphabet and digit handling. TokenBigramSplitSymbolAlphaDigit tokenizes symbols,
alphabets and digits by the bigram tokenize method. This means that all characters are
tokenized by the bigram tokenize method:

Execution example:

tokenize TokenBigramSplitSymbolAlphaDigit "100cents!!!" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "10"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "00"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "0c"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "ce"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "en"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "nt"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "ts"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "s!"
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "!"
# }
# ]
# ]
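The bigram tokenize method used by these tokenizers can be sketched in Python. This is an illustrative approximation, not Groonga's implementation: every position yields an overlapping two-character token, and the last position yields a one-character token.

```python
def bigram_tokens(text):
    # Overlapping 2-character windows; the final window is the
    # trailing 1-character token.
    return [text[i:i + 2] for i in range(len(text))]

# Mirrors the TokenBigramSplitSymbolAlphaDigit output above.
print(bigram_tokens("100cents!!!"))
# → ['10', '00', '0c', 'ce', 'en', 'nt', 'ts', 's!', '!!', '!!', '!']
```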

TokenBigramIgnoreBlank
TokenBigramIgnoreBlank is similar to TokenBigram. The difference between them is blank
handling. TokenBigramIgnoreBlank ignores white-spaces in continuous symbols and non-ASCII
characters.

You can find the difference between them with the 日 本 語 ! ! ! text because it has
symbols and non-ASCII characters.

Here is a result by TokenBigram:

Execution example:

tokenize TokenBigram "日 本 語 ! ! !" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "日"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "本"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "語"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "!"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "!"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "!"
# }
# ]
# ]

Here is a result by TokenBigramIgnoreBlank:

Execution example:

tokenize TokenBigramIgnoreBlank "日 本 語 ! ! !" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "日本"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "本語"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "語"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "!!!"
# }
# ]
# ]
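The behavior above can be approximated in Python as follows. This is only a sketch under simplifying assumptions (any ASCII character is treated as a symbol, anything else as a non-ASCII character); the real tokenizer classifies characters more precisely.

```python
from itertools import groupby

def ignore_blank_tokens(text):
    # Drop white-spaces, then bigram-tokenize non-ASCII runs and
    # keep each run of symbols as a single grouped token.
    squeezed = text.replace(" ", "")
    tokens = []
    for is_ascii, run in groupby(squeezed, key=lambda ch: ord(ch) < 128):
        run = "".join(run)
        if is_ascii:
            tokens.append(run)  # grouped symbols such as "!!!"
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run)))
    return tokens

print(ignore_blank_tokens("日 本 語 ! ! !"))
# → ['日本', '本語', '語', '!!!']
```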

TokenBigramIgnoreBlankSplitSymbol
TokenBigramIgnoreBlankSplitSymbol is similar to TokenBigram. The differences between them
are as follows:

· Blank handling

· Symbol handling

TokenBigramIgnoreBlankSplitSymbol ignores white-spaces in continuous symbols and non-ASCII
characters.

TokenBigramIgnoreBlankSplitSymbol tokenizes symbols with the bigram tokenize method.

You can find the difference between them with the 日 本 語 ! ! ! text because it has
symbols and non-ASCII characters.

Here is a result by TokenBigram:

Execution example:

tokenize TokenBigram "日 本 語 ! ! !" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "日"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "本"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "語"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "!"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "!"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "!"
# }
# ]
# ]

Here is a result by TokenBigramIgnoreBlankSplitSymbol:

Execution example:

tokenize TokenBigramIgnoreBlankSplitSymbol "日 本 語 ! ! !" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "日本"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "本語"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "語!"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "!"
# }
# ]
# ]

TokenBigramIgnoreBlankSplitSymbolAlpha
TokenBigramIgnoreBlankSplitSymbolAlpha is similar to TokenBigram. The differences between
them are as follows:

· Blank handling

· Symbol and alphabet handling

TokenBigramIgnoreBlankSplitSymbolAlpha ignores white-spaces in continuous symbols and
non-ASCII characters.

TokenBigramIgnoreBlankSplitSymbolAlpha tokenizes symbols and alphabets with the bigram
tokenize method.

You can find the difference between them with the Hello 日 本 語 ! ! ! text because it
has symbols and non-ASCII characters with white-spaces, and alphabets.

Here is a result by TokenBigram:

Execution example:

tokenize TokenBigram "Hello 日 本 語 ! ! !" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "hello"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "日"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "本"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "語"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "!"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "!"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "!"
# }
# ]
# ]

Here is a result by TokenBigramIgnoreBlankSplitSymbolAlpha:

Execution example:

tokenize TokenBigramIgnoreBlankSplitSymbolAlpha "Hello 日 本 語 ! ! !" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "he"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "el"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "ll"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "lo"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "o日"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "日本"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "本語"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "語!"
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "!"
# }
# ]
# ]

TokenBigramIgnoreBlankSplitSymbolAlphaDigit
TokenBigramIgnoreBlankSplitSymbolAlphaDigit is similar to TokenBigram. The differences
between them are as follows:

· Blank handling

· Symbol, alphabet and digit handling

TokenBigramIgnoreBlankSplitSymbolAlphaDigit ignores white-spaces in continuous symbols and
non-ASCII characters.

TokenBigramIgnoreBlankSplitSymbolAlphaDigit tokenizes symbols, alphabets and digits with
the bigram tokenize method. This means that all characters are tokenized with the bigram
tokenize method.

You can find the difference between them with the Hello 日 本 語 ! ! ! 777 text because
it has symbols and non-ASCII characters with white-spaces, alphabets and digits.

Here is a result by TokenBigram:

Execution example:

tokenize TokenBigram "Hello 日 本 語 ! ! ! 777" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "hello"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "日"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "本"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "語"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "!"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "!"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "!"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "777"
# }
# ]
# ]

Here is a result by TokenBigramIgnoreBlankSplitSymbolAlphaDigit:

Execution example:

tokenize TokenBigramIgnoreBlankSplitSymbolAlphaDigit "Hello 日 本 語 ! ! ! 777" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "he"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "el"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "ll"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "lo"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "o日"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "日本"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "本語"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "語!"
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "!7"
# },
# {
# "position": 11,
# "force_prefix": false,
# "value": "77"
# },
# {
# "position": 12,
# "force_prefix": false,
# "value": "77"
# },
# {
# "position": 13,
# "force_prefix": false,
# "value": "7"
# }
# ]
# ]

TokenUnigram
TokenUnigram is similar to TokenBigram. The difference between them is the token unit.
TokenBigram uses 2 characters per token. TokenUnigram uses 1 character per token.

Execution example:

tokenize TokenUnigram "100cents!!!" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "100"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "cents"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "!!!"
# }
# ]
# ]

TokenTrigram
TokenTrigram is similar to TokenBigram. The difference between them is the token unit.
TokenBigram uses 2 characters per token. TokenTrigram uses 3 characters per token.

Execution example:

tokenize TokenTrigram "10000cents!!!!!" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "10000"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "cents"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "!!!!!"
# }
# ]
# ]

TokenDelimit
TokenDelimit extracts tokens by splitting text on one or more space characters (U+0020).
For example, Hello World is tokenized to Hello and World.

TokenDelimit is suitable for tag text. You can extract groonga, full-text-search and http
as tags from groonga full-text-search http.

Here is an example of TokenDelimit:

Execution example:

tokenize TokenDelimit "Groonga full-text-search HTTP" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "groonga"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "full-text-search"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "http"
# }
# ]
# ]
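A minimal Python sketch of this splitting, with lowercasing standing in for NormalizerAuto (an illustration, not Groonga's implementation):

```python
import re

def delimit_tokens(text):
    # Split on one or more space characters (U+0020) and lowercase.
    return [token.lower() for token in re.split(r" +", text) if token]

print(delimit_tokens("Groonga full-text-search HTTP"))
# → ['groonga', 'full-text-search', 'http']
```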

TokenDelimitNull
TokenDelimitNull is similar to TokenDelimit. The difference between them is the separator
character. TokenDelimit uses a space character (U+0020) but TokenDelimitNull uses a NUL
character (U+0000).

TokenDelimitNull is also suitable for tag text.

Here is an example of TokenDelimitNull:

Execution example:

tokenize TokenDelimitNull "Groonga\u0000full-text-search\u0000HTTP" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "groongau0000full-text-searchu0000http"
# }
# ]
# ]

TokenMecab
TokenMecab is a tokenizer based on MeCab part-of-speech and morphological analyzer.

MeCab isn't specific to Japanese. You can use MeCab for other languages by creating a
dictionary for those languages. You can use NAIST Japanese Dictionary for Japanese.

TokenMecab is good for precision rather than recall. With TokenBigram, a 京都 query finds
both 東京都 and 京都 texts, but 東京都 isn't expected. With TokenMecab, the 京都 query
finds only the 京都 text.

If you want to support neologisms, you need to keep updating your MeCab dictionary. This
incurs maintenance cost. (TokenBigram doesn't require dictionary maintenance because
TokenBigram doesn't use a dictionary.) mecab-ipadic-NEologd : Neologism dictionary for
MeCab may help you.

Here is an example of TokenMecab. 東京都 is tokenized to 東京 and 都. These tokens don't
include 京都:

Execution example:

tokenize TokenMecab "東京都"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "東京"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "都"
# }
# ]
# ]

TokenRegexp
New in version 5.0.1.

CAUTION:
This tokenizer is experimental. Specification may be changed.

CAUTION:
This tokenizer can be used only with UTF-8. You can't use this tokenizer with EUC-JP,
Shift_JIS and so on.

TokenRegexp is a tokenizer for supporting regular expression search by index.

In general, regular expression search is evaluated as sequential search. But the following
cases can be evaluated as index search:

· Literal only case such as hello

· The beginning of text and literal case such as \A/home/alice

· The end of text and literal case such as \.txt\z

In most cases, index search is faster than sequential search.

TokenRegexp is based on the bigram tokenize method. TokenRegexp adds the beginning-of-text
mark (U+FFEF) at the beginning of text and the end-of-text mark (U+FFF0) at the end of
text when you index text:

Execution example:

tokenize TokenRegexp "/home/alice/test.txt" NormalizerAuto --mode ADD
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "￯"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "/h"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "ho"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "om"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "me"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "e/"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "/a"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "al"
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": "li"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "ic"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "ce"
# },
# {
# "position": 11,
# "force_prefix": false,
# "value": "e/"
# },
# {
# "position": 12,
# "force_prefix": false,
# "value": "/t"
# },
# {
# "position": 13,
# "force_prefix": false,
# "value": "te"
# },
# {
# "position": 14,
# "force_prefix": false,
# "value": "es"
# },
# {
# "position": 15,
# "force_prefix": false,
# "value": "st"
# },
# {
# "position": 16,
# "force_prefix": false,
# "value": "t."
# },
# {
# "position": 17,
# "force_prefix": false,
# "value": ".t"
# },
# {
# "position": 18,
# "force_prefix": false,
# "value": "tx"
# },
# {
# "position": 19,
# "force_prefix": false,
# "value": "xt"
# },
# {
# "position": 20,
# "force_prefix": false,
# "value": "t"
# },
# {
# "position": 21,
# "force_prefix": false,
# "value": "￰"
# }
# ]
# ]

Token filters
Summary
Groonga has a token filter module that applies additional processing to tokenized tokens.

Token filter modules can be added as plugins.

You can customize tokenized tokens by registering your token filter plugins to Groonga.

A table can have zero or more token filters. You can attach token filters to a table by
table-create-token-filters option in /reference/commands/table_create.

Here is an example table_create that uses TokenFilterStopWord token filter module:

Execution example:

register token_filters/stop_word
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto \
--token_filters TokenFilterStopWord
# [[0, 1337566253.89858, 0.000355720520019531], true]

Available token filters
Here is the list of available token filters:

· TokenFilterStopWord

· TokenFilterStem

TokenFilterStopWord
TokenFilterStopWord removes stop words from tokenized tokens when searching the documents.

Because TokenFilterStopWord removes tokens at search time, you can specify stop words
after adding the documents.

Stop words are specified with the is_stop_word column on the lexicon table.

Here is an example that uses TokenFilterStopWord token filter:

Execution example:

register token_filters/stop_word
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Memos TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Memos content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto \
--token_filters TokenFilterStopWord
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms is_stop_word COLUMN_SCALAR Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Terms
[
{"_key": "and", "is_stop_word": true}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
load --table Memos
[
{"content": "Hello"},
{"content": "Hello and Good-bye"},
{"content": "Good-bye"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
select Memos --match_columns content --query "Hello and"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 1,
# "Hello"
# ],
# [
# 2,
# "Hello and Good-bye"
# ]
# ]
# ]
# ]

The and token is marked as a stop word in the Terms table.

"Hello", which doesn't have and in its content, is also matched because and is a stop word
and is removed from the query.
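The search-time filtering can be sketched in Python. The lexicon dict below is a stand-in for the Terms table; this is an illustration, not the plugin's implementation.

```python
def filter_stop_words(query_tokens, lexicon):
    # Drop query tokens whose lexicon entry marks them as stop words.
    return [t for t in query_tokens
            if not lexicon.get(t, {}).get("is_stop_word")]

lexicon = {"and": {"is_stop_word": True}}
print(filter_stop_words(["hello", "and"], lexicon))
# → ['hello']
```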

TokenFilterStem
TokenFilterStem stems tokenized tokens.

Here is an example that uses TokenFilterStem token filter:

Execution example:

register token_filters/stem
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Memos TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Memos content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto \
--token_filters TokenFilterStem
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Memos
[
{"content": "I develop Groonga"},
{"content": "I'm developing Groonga"},
{"content": "I developed Groonga"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
select Memos --match_columns content --query "develops"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 1,
# "I develop Groonga"
# ],
# [
# 2,
# "I'm developing Groonga"
# ],
# [
# 3,
# "I developed Groonga"
# ]
# ]
# ]
# ]

All of the develop, developing, developed and develops tokens are stemmed as develop. So
we can find develop, developing and developed with the develops query.
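The effect can be illustrated with a deliberately naive Python stemmer. The real TokenFilterStem uses a proper stemming library; the suffix rules below are only for demonstration.

```python
def naive_stem(token):
    # Strip a few common suffixes, keeping at least a 4-character stem.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[:-len(suffix)]
    return token

print([naive_stem(t)
       for t in ("develop", "developing", "developed", "develops")])
# → ['develop', 'develop', 'develop', 'develop']
```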

See also
· /reference/commands/table_create

Query expanders
QueryExpanderTSV
Summary
QueryExpanderTSV is a query expander plugin that reads synonyms from a TSV (Tab Separated
Values) file. This plugin provides fewer features than the embedded query expansion
feature. For example, it doesn't support word normalization. But it may be easier to use
because you can manage your synonyms in a TSV file. You can edit your synonyms with a
spreadsheet application such as Excel. With the embedded query expansion feature, you
manage your synonyms in a Groonga table.

Install
You need to register query_expanders/tsv as a plugin before you use QueryExpanderTSV:

plugin_register query_expanders/tsv

Usage
You just add --query_expander QueryExpanderTSV parameter to select command:

select --query "QUERY" --query_expander QueryExpanderTSV

If QUERY has registered synonyms, they are expanded. For example, there are the following
synonyms.

┌────────┬───────────┬───────────────┐
│word │ synonym 1 │ synonym 2 │
├────────┼───────────┼───────────────┤
│groonga │ groonga │ Senna │
├────────┼───────────┼───────────────┤
│mroonga │ mroonga │ groonga MySQL │
└────────┴───────────┴───────────────┘

The table means that synonym 1 and synonym 2 are synonyms of word. For example, groonga
and Senna are synonyms of groonga. And mroonga and groonga MySQL are synonyms of mroonga.

Here is an example of query expansion that uses groonga as the query:

select --query "groonga" --query_expander QueryExpanderTSV

The above command equals to the following command:

select --query "groonga OR Senna" --query_expander QueryExpanderTSV

Here is another example of query expansion that uses mroonga search as the query:

select --query "mroonga search" --query_expander QueryExpanderTSV

The above command equals to the following command:

select --query "(mroonga OR (groonga MySQL)) search" --query_expander QueryExpanderTSV

It is important that only registered words (groonga and mroonga) are expanded to synonyms;
unregistered words (search) are not expanded. Query expansion doesn't occur recursively.
groonga appears in (mroonga OR (groonga MySQL)) as a query expansion result
but it isn't expanded again.
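The single-pass expansion can be sketched in Python. For simplicity this sketch always parenthesizes an expanded word and wraps multi-word synonyms in parentheses; the real plugin may format the expanded query differently.

```python
def expand_query(query, synonyms):
    # Each registered word is replaced once by its OR-combined
    # synonyms; unregistered words and expansion results stay as-is.
    expanded = []
    for word in query.split():
        if word in synonyms:
            parts = [f"({s})" if " " in s else s for s in synonyms[word]]
            expanded.append("(" + " OR ".join(parts) + ")")
        else:
            expanded.append(word)
    return " ".join(expanded)

synonyms = {
    "groonga": ["groonga", "Senna"],
    "mroonga": ["mroonga", "groonga MySQL"],
}
print(expand_query("mroonga search", synonyms))
# → (mroonga OR (groonga MySQL)) search
```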

Normally, you need to include the word itself in its synonyms. For example, groonga and
mroonga are included in their own synonyms. If you want to ignore the word itself, don't
include it in the synonyms. For example, if you want to use query expansion as spelling
correction, you should use the following synonyms.

┌───────┬─────────┐
│word │ synonym │
├───────┼─────────┤
│gronga │ groonga │
└───────┴─────────┘

gronga in word has a typo: an o is missing. groonga in synonym is the correct word.

Here is an example of using query expansion as spelling correction:

select --query "gronga" --query_expander QueryExpanderTSV

The above command equals to the following command:

select --query "groonga" --query_expander QueryExpanderTSV

The former command has a typo in --query value but the latter command doesn't have any
typos.

TSV File
Synonyms are defined in a TSV format file. This section describes the file.

Location
The file name should be synonyms.tsv and it must be located in the configuration
directory. For example, /etc/groonga/synonyms.tsv is a TSV file location. The location is
decided at build time.

You can change the location at run time with the GRN_QUERY_EXPANDER_TSV_SYNONYMS_FILE
environment variable:

% env GRN_QUERY_EXPANDER_TSV_SYNONYMS_FILE=/tmp/synonyms.tsv groonga

With the above command, /tmp/synonyms.tsv file is used.

Format
You can define zero or more synonyms in a TSV file. You define a word and its synonyms
with one line per pair. word is expanded to synonyms in the --query value. Synonyms are
combined with OR. For example, the groonga and Senna synonyms are expanded as groonga OR
Senna.

The first column is word and the rest of the columns are synonyms of the word. Here is a
sample line where word is groonga and the synonyms are groonga and Senna. (TAB) means a
tab character (U+0009):

groonga(TAB)groonga(TAB)Senna

Comment lines are supported. Lines that start with # are ignored. Here is an example with
a comment line. The groonga line is ignored as a comment:

#groonga(TAB)groonga(TAB)Senna
mroonga(TAB)mroonga(TAB)groonga MySQL
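Parsing this format can be sketched in a few lines of Python (an illustration of the format, not the plugin's loader):

```python
def load_synonyms(tsv_text):
    # One word per line followed by tab-separated synonyms;
    # lines starting with "#" are comments and are ignored.
    synonyms = {}
    for line in tsv_text.splitlines():
        if not line or line.startswith("#"):
            continue
        word, *syns = line.split("\t")
        synonyms[word] = syns
    return synonyms

tsv = "#groonga\tgroonga\tSenna\nmroonga\tmroonga\tgroonga MySQL"
print(load_synonyms(tsv))
# → {'mroonga': ['mroonga', 'groonga MySQL']}
```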

Limitation
You need to restart Groonga to reload your synonyms. The TSV file is loaded only at
plugin load time.

See also
· select-query-expansion

Scorer
Summary
Groonga has a scorer module that customizes the score function. The score function
computes the score of a matched record. The default score function uses the number of
appearances of matched terms. This is known as TF (term frequency).

TF is a fast score function but it's not suitable for the following cases:

· Search query contains one or more frequently-appearing words such as "the" and "a".

· Document contains many same keywords such as "They are keyword, keyword, keyword ...
and keyword". Search engine spammer may use the technique.

A score function can solve these cases. For example, TF-IDF (term frequency-inverse
document frequency) can solve the first case. Okapi BM25 can solve the second case. But
they are slower than TF.

Groonga provides TF-IDF based scorer as /reference/scorers/scorer_tf_idf but doesn't
provide Okapi BM25 based scorer yet.
You don't need to solve scoring only with the score function. The score function is
highly dependent on the search query. You may be able to use metadata of the matched
record.

For example, Google uses PageRank for scoring. You may be able to use data type
("title" data are more important than "memo" data), tag, geolocation and so on.

Don't think only about the score function for scoring.

Usage
This section describes how to use scorer.

Here are a schema definition and sample data to show usage.

Sample schema:

Execution example:

table_create Memos TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Memos title COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Memos content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms title_index COLUMN_INDEX|WITH_POSITION Memos title
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms content_index COLUMN_INDEX|WITH_POSITION Memos content
# [[0, 1337566253.89858, 0.000355720520019531], true]

Sample data:

Execution example:

load --table Memos
[
{
"_key": "memo1",
"title": "Groonga is easy",
"content": "Groonga is very easy full text search engine!"
},
{
"_key": "memo2",
"title": "Mroonga is easy",
"content": "Mroonga is more easier full text search engine!"
},
{
"_key": "memo3",
"title": "Rroonga is easy",
"content": "Ruby is very helpful."
},
{
"_key": "memo4",
"title": "Groonga is fast",
"content": "Groonga! Groonga! Groonga! Groonga is very fast!"
},
{
"_key": "memo5",
"title": "PGroonga is fast",
"content": "PGroonga is very fast!"
},
{
"_key": "memo6",
"title": "PGroonga is useful",
"content": "SQL is easy because many client libraries exist."
},
{
"_key": "memo7",
"title": "Mroonga is also useful",
"content": "MySQL has replication feature. Mroonga can use it."
}
]
# [[0, 1337566253.89858, 0.000355720520019531], 7]

You can specify a custom score function in select-match-columns. There are several syntaxes.

For score function that doesn't require any parameter such as
/reference/scorers/scorer_tf_idf:

SCORE_FUNCTION(COLUMN)

You can specify weight:

SCORE_FUNCTION(COLUMN) * WEIGHT

For score function that requires one or more parameters such as
/reference/scorers/scorer_tf_at_most:

SCORE_FUNCTION(COLUMN, ARGUMENT1, ARGUMENT2, ...)

You can specify weight:

SCORE_FUNCTION(COLUMN, ARGUMENT1, ARGUMENT2, ...) * WEIGHT

You can use different score function for each select-match-columns:

SCORE_FUNCTION1(COLUMN1) ||
SCORE_FUNCTION2(COLUMN2) * WEIGHT ||
SCORE_FUNCTION3(COLUMN3, ARGUMENT1) ||
...

Here is a simplest example:

Execution example:

select Memos \
--match_columns "scorer_tf_idf(content)" \
--query "Groonga" \
--output_columns "content, _score" \
--sortby "-_score"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "content",
# "Text"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Groonga! Groonga! Groonga! Groonga is very fast!",
# 2
# ],
# [
# "Groonga is very easy full text search engine!",
# 1
# ]
# ]
# ]
# ]

Groonga! Groonga! Groonga! Groonga is very fast! contains 4 Groonga terms. If you used the
TF based scorer, which is the default, _score would be 4. But the actual _score is 2
because the select command uses the TF-IDF based scorer scorer_tf_idf().

Here is an example that uses weight:

Execution example:

select Memos \
--match_columns "scorer_tf_idf(content) * 10" \
--query "Groonga" \
--output_columns "content, _score" \
--sortby "-_score"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "content",
# "Text"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Groonga! Groonga! Groonga! Groonga is very fast!",
# 22
# ],
# [
# "Groonga is very easy full text search engine!",
# 10
# ]
# ]
# ]
# ]

Groonga! Groonga! Groonga! Groonga is very fast! has 22 as _score. It had 2 as _score in
the previous example that doesn't specify a weight.

Here is an example that uses a scorer that requires one argument. The
/reference/scorers/scorer_tf_at_most scorer requires one argument. You can limit the TF
score with this scorer.

Execution example:

select Memos \
--match_columns "scorer_tf_at_most(content, 2.0)" \
--query "Groonga" \
--output_columns "content, _score" \
--sortby "-_score"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "content",
# "Text"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Groonga! Groonga! Groonga! Groonga is very fast!",
# 2
# ],
# [
# "Groonga is very easy full text search engine!",
# 1
# ]
# ]
# ]
# ]

Groonga! Groonga! Groonga! Groonga is very fast! contains 4 Groonga terms. If you used a
normal TF based scorer, which is the default, _score would be 4. But the actual _score is
2 because the scorer used in the select command limits the maximum score value to 2.

Here is an example that uses multiple scorers:

Execution example:

select Memos \
--match_columns "scorer_tf_idf(title) || scorer_tf_at_most(content, 2.0)" \
--query "Groonga" \
--output_columns "title, content, _score" \
--sortby "-_score"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "title",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Groonga is fast",
# "Groonga! Groonga! Groonga! Groonga is very fast!",
# 3
# ],
# [
# "Groonga is easy",
# "Groonga is very easy full text search engine!",
# 2
# ]
# ]
# ]
# ]

The --match_columns uses scorer_tf_idf(title) and scorer_tf_at_most(content, 2.0). The
_score value is the sum of them.

You can use the default scorer and custom scorer in the same --match_columns. You can use
the default scorer by just specifying a match column:

Execution example:

select Memos \
--match_columns "title || scorer_tf_at_most(content, 2.0)" \
--query "Groonga" \
--output_columns "title, content, _score" \
--sortby "-_score"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "title",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Groonga is fast",
# "Groonga! Groonga! Groonga! Groonga is very fast!",
# 3
# ],
# [
# "Groonga is easy",
# "Groonga is very easy full text search engine!",
# 2
# ]
# ]
# ]
# ]

The --match_columns uses the default scorer (TF) for title and
/reference/scorers/scorer_tf_at_most for content. The _score value is the sum of them.

Built-in scorers
Here are the built-in scorers:

scorer_tf_at_most
NOTE:
This scorer is an experimental feature.

New in version 5.0.1.

Summary
scorer_tf_at_most is a scorer based on TF (term frequency).

TF based scorers, including the TF-IDF based scorer, have a problem in the following case:

If a document contains many occurrences of the same keyword, such as "They are keyword,
keyword, keyword ... and keyword", the document gets a high score. That is not expected.
Search engine spammers may use this technique.

scorer_tf_at_most is a TF based scorer but it can solve the case.

scorer_tf_at_most limits the maximum score value. This means that scorer_tf_at_most
limits the effect of a match.

If a document contains many occurrences of the same keyword, such as "They are keyword,
keyword, keyword ... and keyword", scorer_tf_at_most(column, 2.0) returns at most 2 as
the score.
You don't need to solve scoring only with the score function. The score function is
highly dependent on the search query. You may be able to use metadata of the matched
record.

For example, Google uses PageRank for scoring. You may be able to use data type
("title" data are more important than "memo" data), tag, geolocation and so on.

Don't think only about the score function for scoring.

Syntax
This scorer has two parameters:

scorer_tf_at_most(column, max)
scorer_tf_at_most(index, max)

Usage
This section describes how to use this scorer.

Here are a schema definition and sample data to show usage.

Sample schema:

Execution example:

table_create Logs TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs message COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms message_index COLUMN_INDEX|WITH_POSITION Logs message
# [[0, 1337566253.89858, 0.000355720520019531], true]

Sample data:

Execution example:

load --table Logs
[
{"message": "Notice"},
{"message": "Notice Notice"},
{"message": "Notice Notice Notice"},
{"message": "Notice Notice Notice Notice"},
{"message": "Notice Notice Notice Notice Notice"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 5]

You specify scorer_tf_at_most in select-match-columns like the following:

Execution example:

select Logs \
--match_columns "scorer_tf_at_most(message, 3.0)" \
--query "Notice" \
--output_columns "message, _score" \
--sortby "-_score"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "message",
# "Text"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Notice Notice Notice Notice Notice",
# 3
# ],
# [
# "Notice Notice Notice Notice",
# 3
# ],
# [
# "Notice Notice Notice",
# 3
# ],
# [
# "Notice Notice",
# 2
# ],
# [
# "Notice",
# 1
# ]
# ]
# ]
# ]

If a document has three or more Notice terms, its score is 3, because the select command
specifies 3.0 as the maximum score.

If a document has one or two Notice terms, its score is 1 or 2, because those values are
less than the maximum score 3.0.

Parameters
This section describes all parameters.

Required parameters
There is only one required parameter.

column
The data column that is the match target. The data column must be indexed.

index
The index column to be used for search.

Optional parameters
There is no optional parameter.

Return value
This scorer returns score as builtin-type-float.

/reference/commands/select returns _score as Int32, not Float, because it casts Float to
Int32 to keep backward compatibility.

The score is computed as TF capped at the specified maximum.

See also
· ../scorer

scorer_tf_idf
NOTE:
This scorer is an experimental feature.

New in version 5.0.1.

Summary
scorer_tf_idf is a scorer based on the TF-IDF (term frequency-inverse document frequency)
score function.

To put it simply, TF (term frequency) divided by DF (document frequency) is TF-IDF. "TF"
means "more occurrences are more important". "TF divided by DF" means "occurrences of an
important (rare) term are more important".
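The idea can be sketched in Python. This is an illustrative model using the common
log-based IDF weight; Groonga's scorer_tf_idf uses its own internal formula:

```python
import math

def tf_idf(term_frequency, n_documents, document_frequency):
    # TF weighted by inverse document frequency: occurrences of a
    # term that appears in few documents count for more.
    idf = math.log(n_documents / document_frequency)
    return term_frequency * idf

# A term found in 1 of 13 documents is rare, so one occurrence of it
# outweighs one occurrence of a term found in 4 of 13 documents.
print(tf_idf(1, 13, 1) > tf_idf(1, 13, 4))  # True
```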

The default score function in Groonga is TF (term frequency). It doesn't care about term
importance but is fast.

TF-IDF cares about term importance but is slower than TF.

TF-IDF computes a more suitable score than TF in many cases, but it's not perfect.

If a document contains many occurrences of the same keyword, such as "They are keyword,
keyword, keyword ... and keyword", both TF and TF-IDF increase its score. Search engine
spammers may exploit this behavior, and TF-IDF doesn't guard against it.

Okapi BM25 can solve this case, but it's slower than TF-IDF and is not yet implemented in
Groonga.

Groonga provides the scorer_tf_at_most scorer, which can also solve this case.

You don't need to rely only on a score function for scoring. A score function depends
heavily on the search query. You may also be able to use metadata of matched records.

For example, Google uses PageRank for scoring. You may also be able to use the data type
("title" data are more important than "memo" data), tags, geolocation and so on.

Don't think about scoring only in terms of a score function.

Syntax
This scorer has only one parameter:

scorer_tf_idf(column)
scorer_tf_idf(index)

Usage
This section describes how to use this scorer.

Here are a schema definition and sample data to show usage.

Sample schema:

Execution example:

table_create Logs TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs message COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms message_index COLUMN_INDEX|WITH_POSITION Logs message
# [[0, 1337566253.89858, 0.000355720520019531], true]

Sample data:

Execution example:

load --table Logs
[
{"message": "Error"},
{"message": "Warning"},
{"message": "Warning Warning"},
{"message": "Warning Warning Warning"},
{"message": "Info"},
{"message": "Info Info"},
{"message": "Info Info Info"},
{"message": "Info Info Info Info"},
{"message": "Notice"},
{"message": "Notice Notice"},
{"message": "Notice Notice Notice"},
{"message": "Notice Notice Notice Notice"},
{"message": "Notice Notice Notice Notice Notice"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 13]

You specify scorer_tf_idf in select-match-columns like the following:

Execution example:

select Logs \
--match_columns "scorer_tf_idf(message)" \
--query "Error OR Info" \
--output_columns "message, _score" \
--sortby "-_score"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "message",
# "Text"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Info Info Info Info",
# 3
# ],
# [
# "Error",
# 2
# ],
# [
# "Info Info Info",
# 2
# ],
# [
# "Info Info",
# 1
# ],
# [
# "Info",
# 1
# ]
# ]
# ]
# ]

Both Info Info Info and Error have a score of 2, even though Info Info Info contains three
Info terms. This is because Error is a more important term than Info: the number of
documents that contain Info is 4, while the number of documents that contain Error is 1. A
term that appears in fewer documents is more characteristic, and a characteristic term is
an important term.

Parameters
This section describes all parameters.

Required parameters
There is only one required parameter.

column
The data column that is the match target. The data column must be indexed.

index
The index column to be used for search.

Optional parameters
There is no optional parameter.

Return value
This scorer returns score as builtin-type-float.

/reference/commands/select returns _score as Int32, not Float, because it casts Float to
Int32 to keep backward compatibility.

The score is computed by a TF-IDF based algorithm.

See also
· ../scorer

grn_expr
grn_expr is an object that searches records with specified conditions and manipulates a
database. It's pronounced "gurun expression".

Conditions for searching records in a database can be represented by combining condition
expressions, such as the equal condition expression and the less-than condition
expression, with set operations such as AND, OR and NOT. grn_expr executes those
conditions to search records. You can also use advanced searches such as similar search
and near search with grn_expr. You can also use flexible full text search. For example,
you can control hit scores for specified words and improve recall by dynamically
re-searching with a high-recall algorithm. The number of matched records is used to
determine whether to re-search or not.

There are three ways to create grn_expr:

· Parsing /reference/grn_expr/query_syntax string.

· Parsing /reference/grn_expr/script_syntax string.

· Calling grn_expr related APIs.

/reference/grn_expr/query_syntax is the common search-form syntax used by Internet search
sites. It's simple and easy to use, but it has limitations: you cannot use all condition
expressions and set operations in /reference/grn_expr/query_syntax. You can use
/reference/grn_expr/query_syntax with the query option of /reference/commands/select.

/reference/grn_expr/script_syntax is an ECMAScript-like syntax. You can use all condition
expressions and set operations in /reference/grn_expr/script_syntax. You can use
/reference/grn_expr/script_syntax with the filter option and the scorer option of
/reference/commands/select.

You can use Groonga as a library and create a grn_expr by calling grn_expr related APIs.
Calling APIs gives you the full feature set, like /reference/grn_expr/script_syntax.
Calling APIs is useful for creating a custom syntax that builds a grn_expr. This approach
is used in rroonga, the Ruby bindings of Groonga. Rroonga can create a grn_expr with Ruby
syntax instead of parsing a string.

Query syntax
Query syntax is a syntax to specify search conditions for a common Web search form. It is
similar to the syntax of Google's search form. For example, word1 word2 means that Groonga
searches records that contain both word1 and word2. word1 OR word2 means that Groonga
searches records that contain either word1 or word2.

Query syntax consists of conditional expressions, combined expressions and assignment
expressions. Normally, assignment expressions can be ignored, because assignment
expressions are disabled in the --query option of /reference/commands/select. You can use
them if you use Groonga as a library and customize the query syntax parser options.

A conditional expression specifies a condition. A combined expression consists of one or
more conditional expressions, combined expressions or assignment expressions. An
assignment expression assigns a value to a column.

Sample data
Here are a schema definition and sample data to show usage.

Execution example:

table_create Entries TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries n_likes COLUMN_SCALAR UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_key_index COLUMN_INDEX|WITH_POSITION Entries _key
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_content_index COLUMN_INDEX|WITH_POSITION Entries content
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Entries
[
{"_key": "The first post!",
"content": "Welcome! This is my first post!",
"n_likes": 5},
{"_key": "Groonga",
"content": "I started to use Groonga. It's very fast!",
"n_likes": 10},
{"_key": "Mroonga",
"content": "I also started to use Mroonga. It's also very fast! Really fast!",
"n_likes": 15},
{"_key": "Good-bye Senna",
"content": "I migrated all Senna system!",
"n_likes": 3},
{"_key": "Good-bye Tritonn",
"content": "I also migrated all Tritonn system!",
"n_likes": 3}
]
# [[0, 1337566253.89858, 0.000355720520019531], 5]

There is a table, Entries, for blog entries. An entry has a title, content and the number
of likes for the entry. The title is the key of Entries. The content is the value of the
Entries.content column. The number of likes is the value of the Entries.n_likes column.

The Entries._key column and the Entries.content column are indexed using the TokenBigram
tokenizer. So both Entries._key and Entries.content are ready for full text search.

OK. The schema and data for examples are ready.

Escape
There are special characters in query syntax. To use a special character as itself, it
should be escaped by prepending \. For example, " is a special character. It is escaped as
\".

Here is a special character list:

· [space] (escaped as [backslash][space]) (You should substitute [space] with a white
space character that is 0x20 in ASCII and [backslash] with \\.)

· " (escaped as \")

· ' (escaped as \')

· ( (escaped as \()

· ) (escaped as \))

· \ (escaped as \\)

You can use quoting instead of escaping special characters, except \ (backslash). You
still need to escape a backslash as \\ inside quotes.

Quote syntax is "..." or '...'. You need to escape " as \" in the "..." quote syntax. You
need to escape ' as \' in the '...' quote syntax. For example, Alice's brother (Bob) can
be quoted as "Alice's brother (Bob)" or 'Alice\'s brother (Bob)'.
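A small helper that applies these escaping rules might look like the following. This is a
hypothetical utility for illustration; it is not part of Groonga:

```python
# Characters that are special in Groonga's query syntax.
QUERY_SPECIAL_CHARACTERS = ' "\'()\\'

def escape_query(text):
    # Prepend a backslash to each special character.
    return ''.join(
        '\\' + character if character in QUERY_SPECIAL_CHARACTERS else character
        for character in text)

print(escape_query('(fast)'))  # \(fast\)
```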

NOTE:
There is an important point you have to be careful about. The \ (backslash) character
is interpreted by the command line shell. So if you want to search for ( itself, for
example, you need to escape it twice (\\() in the command line shell. The command line
shell interprets \\( as \(, then passes that literal to Groonga. Groonga interprets \(
as (, then searches for ( itself in the database. If Groonga doesn't search as you
intended, confirm whether the special characters are escaped properly.

Conditional expression
Here is the list of available conditional expressions.

Full text search condition
Its syntax is keyword.

Full text search condition specifies a full text search condition against the default
match columns. Match columns are full text search target columns.

You should specify the default match columns for full text search. They can be specified
by --match_columns option of /reference/commands/select. If you don't specify the default
match columns, this conditional expression fails.

This conditional expression does a full text search with keyword. keyword should not
contain any spaces. If keyword contains a space, such as search keyword, it means two full
text search conditions: search and keyword. If you want to specify a keyword that contains
one or more spaces, use the phrase search condition described below.

Here is a simple example.

Execution example:

select Entries --match_columns content --query fast
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression matches records that contain a word fast in content column value.

content column is the default match column.

Phrase search condition
Its syntax is "search keyword".

Phrase search condition specifies a phrase search condition against the default match
columns.

You should specify the default match columns for full text search. They can be specified
by --match_columns option of /reference/commands/select. If you don't specify the default
match columns, this conditional expression fails.

This conditional expression does a phrase search with search keyword. A phrase search
matches records that contain both search and keyword, where the terms appear adjacent and
in that order. Thus, Put a search keyword in the form is matched, but Search by the
keyword and There is a keyword. Search by it! aren't matched.

Here is a simple example.

Execution example:

select Entries --match_columns content --query '"I started"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]

The expression matches records that contain a phrase I started in content column value. I
also started isn't matched because I and started aren't adjacent.

content column is the default match column.

Full text search condition (with explicit match column)
Its syntax is column:@keyword.

It's similar to the full text search condition, but it doesn't require the default match
columns. You specify the match column for the full text search condition with column:
instead of the --match_columns option of /reference/commands/select.

This conditional expression is useful when you want to use two or more full text searches
against different columns. The default match columns specified by the --match_columns
option can't be specified multiple times, so you need to specify the second match column
with this conditional expression.

The difference between the full text search condition and the full text search condition
(with explicit match column) is whether advanced match columns are supported. The full
text search condition supports advanced match columns, but the full text search condition
(with explicit match column) doesn't. Advanced match columns have the following features:

· Weight is supported.

· Using multiple columns is supported.

· Using index column as a match column is supported.

See description of --match_columns option of /reference/commands/select about them.

Here is a simple example.

Execution example:

select Entries --query content:@fast
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression matches records that contain a word fast in content column value.

Phrase search condition (with explicit match column)
Its syntax is column:@"search keyword".

It's similar to the phrase search condition, but it doesn't require the default match
columns. You specify the match column for the phrase search condition with column:
instead of the --match_columns option of /reference/commands/select.

The difference between the phrase search condition and the phrase search condition (with
explicit match column) is similar to the difference between the full text search condition
and the full text search condition (with explicit match column). The phrase search
condition supports advanced match columns, but the phrase search condition (with explicit
match column) doesn't. See the description of the full text search condition (with
explicit match column) about advanced match columns.

Here is a simple example.

Execution example:

select Entries --query 'content:@"I started"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]

The expression matches records that contain a phrase I started in content column value. I
also started isn't matched because I and started aren't adjacent.

Prefix search condition
Its syntax is column:^value or value*.

This conditional expression does a prefix search with value. A prefix search matches
records that contain a word that starts with value.

You can use fast prefix search against a column. The column must be indexed and the index
table must be a patricia trie table (TABLE_PAT_KEY) or a double array trie table
(TABLE_DAT_KEY). You can also use fast prefix search against the _key pseudo column of a
patricia trie table or a double array trie table. You don't need to index _key.

Prefix search can be used with other table types, but it causes a scan of all records.
That's not a problem for a small number of records, but it takes more time for a large
number of records.

It doesn't require the default match columns, unlike the full text search condition and
the phrase search condition.

Here is a simple example.

Execution example:

select Entries --query '_key:^Goo'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ]
# ]
# ]
# ]

The expression matches records that contain a word that starts with Goo in _key pseudo
column value. Good-bye Senna and Good-bye Tritonn are matched with the expression.

Suffix search condition
Its syntax is column:$value.

This conditional expression does a suffix search with value. A suffix search matches
records that contain a word that ends with value.

You can use fast suffix search against a column. The column must be indexed and the index
table must be a patricia trie table (TABLE_PAT_KEY) with the KEY_WITH_SIS flag. You can
also use fast suffix search against the _key pseudo column of a patricia trie table
(TABLE_PAT_KEY) with the KEY_WITH_SIS flag. You don't need to index _key. We recommend
index column based fast suffix search over _key based fast suffix search, because _key
based fast suffix search returns automatically registered substrings. (TODO: write
document about suffix search and link to it from here.)

NOTE:
Fast suffix search can be used only for non-ASCII characters such as hiragana in
Japanese. You cannot use fast suffix search for ASCII characters.

Suffix search can be used with other table types, or with a patricia trie table without
the KEY_WITH_SIS flag, but it causes a scan of all records. That's not a problem for a
small number of records, but it takes more time for a large number of records.

It doesn't require the default match columns, unlike the full text search condition and
the phrase search condition.

Here is a simple example. It uses fast suffix search for hiragana in Japanese, which
consists of non-ASCII characters.

Execution example:

table_create Titles TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Titles content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create SuffixSearchTerms TABLE_PAT_KEY|KEY_WITH_SIS ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create SuffixSearchTerms index COLUMN_INDEX Titles content
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Titles
[
{"content": "ぐるんが"},
{"content": "むるんが"},
{"content": "せな"},
{"content": "とりとん"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 4]
select Titles --query 'content:$んが'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 2,
# "むるんが"
# ],
# [
# 1,
# "ぐるんが"
# ]
# ]
# ]
# ]

The expression matches records whose content column value ends with んが. ぐるんが and
むるんが are matched by the expression.

Equal condition
Its syntax is column:value.

It matches records whose column value is equal to value.

It doesn't require the default match columns, unlike the full text search condition and
the phrase search condition.

Here is a simple example.

Execution example:

select Entries --query _key:Groonga
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]

The expression matches records whose _key column value is equal to Groonga.

Not equal condition
Its syntax is column:!value.

It matches records whose column value isn't equal to value.

It doesn't require the default match columns, unlike the full text search condition and
the phrase search condition.

Here is a simple example.

Execution example:

select Entries --query _key:!Groonga
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose _key column value is not equal to Groonga.

Less than condition
Its syntax is column:<value.

It matches records whose column value is less than value.

If the column type is a numerical type such as Int32, the column value and value are
compared as numbers. If the column type is a text type such as ShortText, the column value
and value are compared as bit sequences.

It doesn't require the default match columns, unlike the full text search condition and
the phrase search condition.

Here is a simple example.

Execution example:

select Entries --query n_likes:<10
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is less than 10.

Greater than condition
Its syntax is column:>value.

It matches records whose column value is greater than value.

If the column type is a numerical type such as Int32, the column value and value are
compared as numbers. If the column type is a text type such as ShortText, the column value
and value are compared as bit sequences.

It doesn't require the default match columns, unlike the full text search condition and
the phrase search condition.

Here is a simple example.

Execution example:

select Entries --query n_likes:>10
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is greater than 10.

Less than or equal to condition
Its syntax is column:<=value.

It matches records whose column value is less than or equal to value.

If the column type is a numerical type such as Int32, the column value and value are
compared as numbers. If the column type is a text type such as ShortText, the column value
and value are compared as bit sequences.

It doesn't require the default match columns, unlike the full text search condition and
the phrase search condition.

Here is a simple example.

Execution example:

select Entries --query n_likes:<=10
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is less than or equal to 10.

Greater than or equal to condition
Its syntax is column:>=value.

It matches records whose column value is greater than or equal to value.

If the column type is a numerical type such as Int32, the column value and value are
compared as numbers. If the column type is a text type such as ShortText, the column value
and value are compared as bit sequences.

It doesn't require the default match columns, unlike the full text search condition and
the phrase search condition.

Here is a simple example.

Execution example:

select Entries --query n_likes:>=10
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is greater than or equal to 10.

Regular expression condition
New in version 5.0.1.

Its syntax is column:~pattern.

It matches records whose column value matches pattern. pattern must be a valid
/reference/regular_expression.

The following example uses .roonga as the pattern. It matches Groonga, Mroonga and so on.

Execution example:

select Entries --query content:~.roonga
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

In most cases, a regular expression is evaluated sequentially, so it may be slow against
many records.

In some cases, Groonga evaluates a regular expression with an index, which is very fast.
See /reference/regular_expression for details.

Combined expression
Here is the list of available combined expressions.

Logical OR
Its syntax is a OR b.

a and b are conditional expressions, combined expressions or assignment expressions.

If at least one of a and b is matched, a OR b is matched.

Here is a simple example.

Execution example:

select Entries --query 'n_likes:>10 OR content:@senna'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is greater than 10 or that
contain the word senna in the content column value.

Logical AND
Its syntax is a + b or just a b.

a and b are conditional expressions, combined expressions or assignment expressions.

If both a and b are matched, a + b is matched.

You can prepend + to the first expression, as in +a. The + is just ignored.

Here is a simple example.

Execution example:

select Entries --query 'n_likes:>=10 + content:@groonga'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is greater than or equal to 10
and that contain the word groonga in the content column value.

Logical NOT
Its syntax is a - b.

a and b are conditional expressions, combined expressions or assignment expressions.

If a is matched and b is not matched, a - b is matched.

You cannot prepend - to the first expression, as in -a. It's a syntax error.

Here is a simple example.

Execution example:

select Entries --query 'n_likes:>=10 - content:@groonga'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is greater than or equal to 10
and that don't contain the word groonga in the content column value.

Grouping
Its syntax is (...). ... is space separated expression list.

(...) groups one or more expressions so they can be processed as a single expression. a b
OR c means that a and b are matched, or c is matched. a (b OR c) means that a is matched
and at least one of b and c is matched.

Here is a simple example.

Execution example:

select Entries --query 'n_likes:<5 content:@senna OR content:@fast'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
select Entries --query 'n_likes:<5 (content:@senna OR content:@fast)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ]
# ]
# ]
# ]

The first expression doesn't use grouping. It matches records where both n_likes:<5 and
content:@senna are matched, or where content:@fast is matched.

The second expression uses grouping. It matches records where n_likes:<5 and at least one
of content:@senna and content:@fast are matched.
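The difference in how the two queries combine their conditions can be sketched with plain
booleans. This is only an illustration of the precedence, not Groonga code:

```python
def without_grouping(a, b, c):
    # "a b OR c" is interpreted as (a AND b) OR c.
    return (a and b) or c

def with_grouping(a, b, c):
    # "a (b OR c)" is interpreted as a AND (b OR c).
    return a and (b or c)

# When a is false but c is true, only the ungrouped form matches:
print(without_grouping(False, False, True))  # True
print(with_grouping(False, False, True))     # False
```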

Assignment expression
This section is for advanced users, because assignment expressions are disabled in the
--query option of /reference/commands/select by default. You need to specify
ALLOW_COLUMN|ALLOW_UPDATE as the --query_flags option value to enable assignment
expressions.

Assignment expressions in query syntax have some limitations, so you should use
/reference/grn_expr/script_syntax instead of query syntax for assignment.

There is only one syntax for assignment expressions: column:=value.

value is assigned to column. value is always processed as a string in query syntax and
is cast to the type of column automatically. This causes some limitations. For example,
you cannot use boolean literals such as true and false for a Bool type column. You would
need to use an empty string for false, but query syntax doesn't support the column:=
syntax.

See /reference/cast for details about casting.

Script syntax
Script syntax is a syntax to specify complex search conditions. It is similar to
ECMAScript. For example, _key == "book" means that Groonga searches records whose _key
value is "book". All values are strings in query_syntax, but each value has its own type
in script syntax. For example, "book" is a string, 1 is an integer, TokenBigram is the
object named TokenBigram and so on.

Script syntax doesn't support the full ECMAScript syntax. For example, it doesn't
support statements such as the if control statement, the for iteration statement and
variable definition statements. Function definitions are not supported either. However,
script syntax adds its own additional operators. They are described after the
ECMAScript-style syntax is described.

Security
For security reasons, you should not pass user input to Groonga directly. A malicious
user may input a query that retrieves records that should not be shown to that user.

Think about the following case.

A Groonga application constructs a Groonga request by the following program:

filter = "column @ \"#{user_input}\""
select_options = {
  # ...
  :filter => filter,
}
groonga_client.select(select_options)

user_input is input from the user. If the input is query, here is the constructed
select-filter parameter:

column @ "query"

If the input is x" || true || ", here is the constructed select-filter parameter:

column @ "x" || true || ""

This query matches all records. The user will get all records from your database, which
may include records the user should not see.

It's better to accept user input only as a value. That means you don't allow user input
to contain operators such as @ and &&. If you accept operators, a user can craft a
malicious query.

If user input contains only a value, you can block malicious queries by escaping the
value. Here is how to escape each type of user input value:

· True value: Convert it to true.

· False value: Convert it to false.

· Numerical value: Convert it to Integer or Float. For example, 1.2, -10, 314e-2 and so
on.

· String value: Replace " with \" and \ with \\ in the string value, then surround the
  substituted value with ". For example, double " quote and back \ slash should be
  converted to "double \" quote and back \\ slash".

Sample data
Here are a schema definition and sample data to show usage.

Execution example:

table_create Entries TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries n_likes COLUMN_SCALAR UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_key_index COLUMN_INDEX|WITH_POSITION Entries _key
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_content_index COLUMN_INDEX|WITH_POSITION Entries content
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Entries
[
{"_key": "The first post!",
"content": "Welcome! This is my first post!",
"n_likes": 5},
{"_key": "Groonga",
"content": "I started to use Groonga. It's very fast!",
"n_likes": 10},
{"_key": "Mroonga",
"content": "I also started to use Mroonga. It's also very fast! Really fast!",
"n_likes": 15},
{"_key": "Good-bye Senna",
"content": "I migrated all Senna system!",
"n_likes": 3},
{"_key": "Good-bye Tritonn",
"content": "I also migrated all Tritonn system!",
"n_likes": 3}
]
# [[0, 1337566253.89858, 0.000355720520019531], 5]

There is a table, Entries, for blog entries. An entry has a title, content and the
number of likes for the entry. The title is the key of Entries. The content is the value
of the Entries.content column. The number of likes is the value of the Entries.n_likes
column.

The Entries._key column and Entries.content column are indexed using the TokenBigram
tokenizer. So both Entries._key and Entries.content are ready for full text search.

Now the schema and data for the examples are ready.

Literals
Integer
An integer literal is a sequence of digits 0 to 9, such as 1234567890. + or - can be
prepended as a sign, such as +29 and -29. Integer literals must be decimal. Octal,
hexadecimal and other notations can't be used.

The maximum value of an integer literal is 9223372036854775807 (= 2 ** 63 - 1). The
minimum value of an integer literal is -9223372036854775808 (= -(2 ** 63)).

Float
A float literal is a sequence of digits with a decimal point, such as 3.14. + or - can
be prepended as a sign, such as +3.14 and -3.14. The ${MANTISSA}e${EXPONENT} and
${MANTISSA}E${EXPONENT} formats are also supported. For example, 314e-2 is the same as
3.14.

String
A string literal is "...". You need to escape " in a literal by prepending \, such as
\". For example, "Say \"Hello!\"." is a literal for the Say "Hello!". string.

The string encoding must be the same as the encoding of the database. The default
encoding is UTF-8. It can be changed by the --with-default-encoding configure option,
the --encoding /reference/executables/groonga option and so on.

Boolean
Boolean literal is true and false. true means true and false means false.

Null
The null literal is null. Groonga doesn't support null values, but the null literal is
supported.

Time
NOTE:
This is the groonga original notation.

A time literal doesn't exist. There are a string time notation, an integer time notation
and a float time notation.

The string time notation is "YYYY/MM/DD hh:mm:ss.uuuuuu" or "YYYY-MM-DD
hh:mm:ss.uuuuuu". YYYY is the year, MM is the month, DD is the day, hh is the hour, mm
is the minute, ss is the second and uuuuuu is the microsecond. It is local time. For
example, "2012/07/23 02:41:10.436218" is 2012-07-23T02:41:10.436218 in ISO 8601 format.

The integer time notation is the number of seconds that have elapsed since midnight UTC,
January 1, 1970. It is also known as POSIX time. For example, 1343011270 is
2012-07-23T02:41:10Z in ISO 8601 format.

The float time notation is the number of seconds and microseconds that have elapsed
since midnight UTC, January 1, 1970. For example, 1343011270.436218 is
2012-07-23T02:41:10.436218Z in ISO 8601 format.

Geo point
NOTE:
This is the groonga original notation.

A geo point literal doesn't exist. There is a string geo point notation.

String geo point notation has the following patterns:

· "LATITUDE_IN_MSECxLONGITUDE_IN_MSEC"

· "LATITUDE_IN_MSEC,LONGITUDE_IN_MSEC"

· "LATITUDE_IN_DEGREExLONGITUDE_IN_DEGREE"

· "LATITUDE_IN_DEGREE,LONGITUDE_IN_DEGREE"

x and , can be used as the separator. Latitude and longitude can be represented in
milliseconds or degrees.
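One degree is 3,600,000 milliseconds (60 minutes x 60 seconds x 1,000), so converting
between the two representations is a single multiplication. The helper names below are
hypothetical, shown only to illustrate the arithmetic; the sample coordinate is
approximate.

```ruby
MSEC_PER_DEGREE = 3_600_000  # 60 min * 60 sec * 1000 msec per degree

def degree_to_msec(degree)
  (degree * MSEC_PER_DEGREE).round
end

def msec_to_degree(msec)
  msec / MSEC_PER_DEGREE.to_f
end

# An approximate latitude of Tokyo Station in both representations:
degree_to_msec(35.681236)   # => 128452450
msec_to_degree(128452450)   # => roughly 35.681236 degrees
```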

Array
Array literal is [element1, element2, ...].

Object literal
An object literal is {name1: value1, name2: value2, ...}. Groonga doesn't support object
literals yet.

Control syntaxes
Script syntax doesn't support statements, so you cannot use control statements such as
if. You can only use the A ? B : C expression as a control syntax.

A ? B : C returns B if A is true, and C otherwise.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes == (_id == 1 ? 5 : 3)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records where the _id column value is equal to 1 and the n_likes
column value is equal to 5, or the _id column value is not equal to 1 and the n_likes
column value is equal to 3.

Grouping
Its syntax is (...). ... is a comma-separated expression list.

(...) groups one or more expressions so that they can be processed as a single
expression. a && b || c means that a and b are matched, or c is matched. a && (b || c)
means that a and at least one of b and c are matched.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes < 5 && content @ "senna" || content @ "fast"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
select Entries --filter 'n_likes < 5 && (content @ "senna" || content @ "fast")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ]
# ]
# ]
# ]

The first expression doesn't use grouping. It matches records where both n_likes < 5 and
content @ "senna" are matched, or content @ "fast" is matched.

The second expression uses grouping. It matches records where n_likes < 5 and at least
one of content @ "senna" and content @ "fast" are matched.

Function call
Its syntax is name(argument1, argument2, ...).

name(argument1, argument2, ...) calls the function named name with the arguments
argument1, argument2 and so on.

See /reference/function for the list of available functions.

Here is a simple example.

Execution example:

select Entries --filter 'edit_distance(_key, "Groonga") <= 1'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression uses /reference/functions/edit_distance. It matches records whose _key
column value is similar to "Groonga". Similarity to "Groonga" is computed as edit
distance. If the edit distance is less than or equal to 1, the value is treated as
similar. In this case, "Groonga" and "Mroonga" are treated as similar.
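For reference, edit distance (Levenshtein distance) is the minimum number of
single-character insertions, deletions and substitutions needed to turn one string into
the other. The minimal Ruby implementation below only illustrates the idea; it is not
Groonga's actual code.

```ruby
# Minimal Levenshtein distance via dynamic programming.
def edit_distance(a, b)
  m, n = a.length, b.length
  prev = (0..n).to_a               # distances from the empty prefix of a
  (1..m).each do |i|
    curr = [i] + [0] * n
    (1..n).each do |j|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      curr[j] = [prev[j] + 1,                 # delete a[i-1]
                 curr[j - 1] + 1,             # insert b[j-1]
                 prev[j - 1] + cost].min      # substitute (or keep)
    end
    prev = curr
  end
  prev[n]
end

edit_distance("Groonga", "Mroonga")  # => 1 (one substitution: G -> M)
edit_distance("Groonga", "Senna")    # distance greater than 1, so no match
```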

Basic operators
Groonga supports operators defined in ECMAScript.

Arithmetic operators
Here are arithmetic operators.

Addition operator
Its syntax is number1 + number2.

The operator adds number1 and number2 and returns the result.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes == 10 + 5'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 15 (= 10 + 5).

Subtraction operator
Its syntax is number1 - number2.

The operator subtracts number2 from number1 and returns the result.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes == 20 - 5'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 15 (= 20 - 5).

Multiplication operator
Its syntax is number1 * number2.

The operator multiplies number1 and number2 and returns the result.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes == 3 * 5'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 15 (= 3 * 5).

Division operator
Its syntax is number1 / number2 and number1 % number2.

The operator divides number1 by number2. / returns the quotient of the result. % returns
the remainder of the result.

Here are simple examples.

Execution example:

select Entries --filter 'n_likes == 26 / 7'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 3 (= 26 / 7).

Execution example:

select Entries --filter 'n_likes == 26 % 7'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 5 (= 26 % 7).

Logical operators
Here are logical operators.

Logical NOT operator
Its syntax is !condition.

The operator inverts the boolean value of condition.

Here is a simple example.

Execution example:

select Entries --filter '!(n_likes == 5)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is not equal to 5.

Logical AND operator
Its syntax is condition1 && condition2.

The operator returns true if both of condition1 and condition2 are true, false otherwise.

Here is a simple example.

Execution example:

select Entries --filter 'content @ "fast" && n_likes >= 10'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression matches records whose content column value has the word fast and whose
n_likes column value is greater than or equal to 10.

Logical OR operator
Its syntax is condition1 || condition2.

The operator returns true if either condition1 or condition2 is true, false otherwise.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes == 5 || n_likes == 10'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 5 or 10.

Logical AND NOT operator
Its syntax is condition1 &! condition2.

The operator returns true if condition1 is true but condition2 is false, and false
otherwise. It returns the difference set.

Here is a simple example.

Execution example:

select Entries --filter 'content @ "fast" &! content @ "mroonga"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]

The expression matches records whose content column value has the word fast but doesn't
have the word mroonga.

Bitwise operators
Here are bitwise operators.

Bitwise NOT operator
Its syntax is ~number.

The operator returns bitwise NOT of number.

Here is a simple example.

Execution example:

select Entries --filter '~n_likes == -6'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 5, because bitwise
NOT of 5 is equal to -6.

Bitwise AND operator
Its syntax is number1 & number2.

The operator returns bitwise AND between number1 and number2.

Here is a simple example.

Execution example:

select Entries --filter '(n_likes & 1) == 1'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is an odd number, because
bitwise AND between an odd number and 1 is equal to 1 and bitwise AND between an even
number and 1 is equal to 0.
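The parity trick works because the lowest bit of a binary number is 1 exactly for odd
numbers. The same arithmetic can be checked in Ruby for the n_likes values in the sample
data:

```ruby
# n & 1 keeps only the lowest bit: 1 for odd numbers, 0 for even ones.
[5, 10, 15, 3, 3].each do |n_likes|
  matches = (n_likes & 1) == 1
  puts "#{n_likes} & 1 = #{n_likes & 1} -> #{matches ? 'matched' : 'not matched'}"
end
# Only 10 is even, so four of the five records match the filter.
```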

Bitwise OR operator
Its syntax is number1 | number2.

The operator returns bitwise OR between number1 and number2.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes == (1 | 4)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 5 (= 1 | 4).

Bitwise XOR operator
Its syntax is number1 ^ number2.

The operator returns bitwise XOR between number1 and number2.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes == (10 ^ 15)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 5 (= 10 ^ 15).

Shift operators
Here are shift operators.

Left shift operator
Its syntax is number1 << number2.

The operator performs a bitwise left shift operation on number1 by number2.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes == (5 << 1)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 10 (= 5 << 1).

Signed right shift operator
Its syntax is number1 >> number2.

The operator shifts the bits of number1 right by number2. The sign of the result is the
same as that of number1.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes == -(-10 >> 1)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 5 (= -(-10 >> 1) =
-(-5)).
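Ruby's >> is also an arithmetic (sign-preserving) shift, so the example's arithmetic can
be checked directly:

```ruby
# Signed right shift keeps the sign bit, so -10 >> 1 is -5,
# not a large positive number.
-10 >> 1       # => -5
-(-10 >> 1)    # => 5, the value the filter compares against n_likes
10 >> 1        # => 5, same magnitude for the positive case
```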

Unsigned right shift operator
Its syntax is number1 >>> number2.

The operator shifts the bits of number1 right by number2. The leftmost number2 bits are
filled with 0.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes == (2147483648 - (-10 >>> 1))'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 5 (= 2147483648 -
(-10 >>> 1) = 2147483648 - 2147483643).
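Ruby has no >>> operator, but the 32-bit unsigned shift can be reproduced by masking the
value to 32 bits before shifting. The helper name below is hypothetical; it only
demonstrates the arithmetic used in the example:

```ruby
# Simulate a 32-bit unsigned right shift (ECMAScript's >>>):
# reinterpret the value as an unsigned 32-bit integer, then shift.
def unsigned_right_shift_32(value, count)
  (value & 0xFFFFFFFF) >> count
end

unsigned_right_shift_32(-10, 1)
# => 2147483643, because -10 as unsigned 32-bit is 4294967286

2147483648 - unsigned_right_shift_32(-10, 1)
# => 5, the value the filter compares against n_likes
```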

Comparison operators
Here are comparison operators.

Equal operator
Its syntax is object1 == object2.

The operator returns true if object1 is equal to object2, and false otherwise.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes == 5'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 5.

Not equal operator
Its syntax is object1 != object2.

The operator returns true if object1 is not equal to object2, and false otherwise.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes != 5'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is not equal to 5.

Less than operator
TODO: ...

Less than or equal to operator
TODO: ...

Greater than operator
TODO: ...

Greater than or equal to operator
TODO: ...

Assignment operators
Addition assignment operator
Its syntax is column1 += column2.

The operator performs addition assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score += n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 4
# ],
# [
# "Good-bye Tritonn",
# 3,
# 4
# ],
# [
# "Groonga",
# 10,
# 11
# ],
# [
# "Mroonga",
# 15,
# 16
# ],
# [
# "The first post!",
# 5,
# 6
# ]
# ]
# ]
# ]

The value of _score produced by --filter is always 1 in this case; the scorer then
performs an addition assignment operation such as '_score = _score + n_likes' for each
record.

For example, the n_likes value of the record whose _key is "Good-bye Senna" is 3.

So the expression 1 + 3 is evaluated and the result is stored in the _score column.
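The per-record computation can be sketched outside Groonga. With --filter true, every
matched record starts with a _score of 1, and the scorer then updates that value. The
data structure below is illustrative only; it is not how Groonga stores records.

```ruby
# Two of the sample records, as plain hashes for illustration.
entries = [
  { _key: "Good-bye Senna", n_likes: 3 },
  { _key: "Groonga",        n_likes: 10 },
]

# --filter true assigns _score = 1 to every record, then the scorer
# '_score += n_likes' runs once per record.
scored = entries.map do |entry|
  score = 1
  score += entry[:n_likes]
  entry.merge(_score: score)
end

scored.map { |e| e[:_score] }  # => [4, 11]
```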

Subtraction assignment operator
Its syntax is column1 -= column2.

The operator performs subtraction assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score -= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# -2
# ],
# [
# "Good-bye Tritonn",
# 3,
# -2
# ],
# [
# "Groonga",
# 10,
# -9
# ],
# [
# "Mroonga",
# 15,
# -14
# ],
# [
# "The first post!",
# 5,
# -4
# ]
# ]
# ]
# ]

The value of _score produced by --filter is always 1 in this case; the scorer then
performs a subtraction assignment operation such as '_score = _score - n_likes' for each
record.

For example, the n_likes value of the record whose _key is "Good-bye Senna" is 3.

So the expression 1 - 3 is evaluated and the result is stored in the _score column.

Multiplication assignment operator
Its syntax is column1 *= column2.

The operator performs multiplication assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score *= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 3
# ],
# [
# "Good-bye Tritonn",
# 3,
# 3
# ],
# [
# "Groonga",
# 10,
# 10
# ],
# [
# "Mroonga",
# 15,
# 15
# ],
# [
# "The first post!",
# 5,
# 5
# ]
# ]
# ]
# ]

The value of _score produced by --filter is always 1 in this case; the scorer then
performs a multiplication assignment operation such as '_score = _score * n_likes' for
each record.

For example, the n_likes value of the record whose _key is "Good-bye Senna" is 3.

So the expression 1 * 3 is evaluated and the result is stored in the _score column.

Division assignment operator
Its syntax is column1 /= column2.

The operator performs division assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score /= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 0
# ],
# [
# "Good-bye Tritonn",
# 3,
# 0
# ],
# [
# "Groonga",
# 10,
# 0
# ],
# [
# "Mroonga",
# 15,
# 0
# ],
# [
# "The first post!",
# 5,
# 0
# ]
# ]
# ]
# ]

The value of _score produced by --filter is always 1 in this case; the scorer then
performs a division assignment operation such as '_score = _score / n_likes' for each
record.

For example, the n_likes value of the record whose _key is "Good-bye Senna" is 3.

So the expression 1 / 3 is evaluated and the result is stored in the _score column.

Modulo assignment operator
Its syntax is column1 %= column2.

The operator performs modulo assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score %= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 1
# ],
# [
# "Good-bye Tritonn",
# 3,
# 1
# ],
# [
# "Groonga",
# 10,
# 1
# ],
# [
# "Mroonga",
# 15,
# 1
# ],
# [
# "The first post!",
# 5,
# 1
# ]
# ]
# ]
# ]

The value of _score produced by --filter is always 1 in this case; the scorer then
performs a modulo assignment operation such as '_score = _score % n_likes' for each
record.

For example, the n_likes value of the record whose _key is "Good-bye Senna" is 3.

So the expression 1 % 3 is evaluated and the result is stored in the _score column.

Bitwise left shift assignment operator
Its syntax is column1 <<= column2.

The operator performs left shift assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score <<= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 8
# ],
# [
# "Good-bye Tritonn",
# 3,
# 8
# ],
# [
# "Groonga",
# 10,
# 1024
# ],
# [
# "Mroonga",
# 15,
# 32768
# ],
# [
# "The first post!",
# 5,
# 32
# ]
# ]
# ]
# ]

The value of _score produced by --filter is always 1 in this case; the scorer then
performs a left shift assignment operation such as '_score = _score << n_likes' for each
record.

For example, the n_likes value of the record whose _key is "Good-bye Senna" is 3.

So the expression 1 << 3 is evaluated and the result is stored in the _score column.

Bitwise signed right shift assignment operator
Its syntax is column1 >>= column2.

The operator performs signed right shift assignment operation on column1 by column2.

Bitwise unsigned right shift assignment operator
Its syntax is column1 >>>= column2.

The operator performs unsigned right shift assignment operation on column1 by column2.

Bitwise AND assignment operator
Its syntax is column1 &= column2.

The operator performs bitwise AND assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score &= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 1
# ],
# [
# "Good-bye Tritonn",
# 3,
# 1
# ],
# [
# "Groonga",
# 10,
# 0
# ],
# [
# "Mroonga",
# 15,
# 1
# ],
# [
# "The first post!",
# 5,
# 1
# ]
# ]
# ]
# ]

The value of _score produced by --filter is always 1 in this case; the scorer then
performs a bitwise AND assignment operation such as '_score = _score & n_likes' for each
record.

For example, the n_likes value of the record whose _key is "Groonga" is 10.

So the expression 1 & 10 is evaluated and the result is stored in the _score column.

Bitwise OR assignment operator
Its syntax is column1 |= column2.

The operator performs bitwise OR assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score |= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 3
# ],
# [
# "Good-bye Tritonn",
# 3,
# 3
# ],
# [
# "Groonga",
# 10,
# 11
# ],
# [
# "Mroonga",
# 15,
# 15
# ],
# [
# "The first post!",
# 5,
# 5
# ]
# ]
# ]
# ]

The value of _score produced by --filter is always 1 in this case; the scorer then
performs a bitwise OR assignment operation such as '_score = _score | n_likes' for each
record.

For example, the n_likes value of the record whose _key is "Groonga" is 10.

So the expression 1 | 10 is evaluated and the result is stored in the _score column.

Bitwise XOR assignment operator
Its syntax is column1 ^= column2.

The operator performs bitwise XOR assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score ^= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 2
# ],
# [
# "Good-bye Tritonn",
# 3,
# 2
# ],
# [
# "Groonga",
# 10,
# 11
# ],
# [
# "Mroonga",
# 15,
# 14
# ],
# [
# "The first post!",
# 5,
# 4
# ]
# ]
# ]
# ]

The value of _score set by --filter is always 1 in this case, then the bitwise XOR
assignment operation such as '_score = _score ^ n_likes' is performed for each record.

For example, the value of _score about the record which stores "Good-bye Senna" as the
_key is 3.

So the expression 1 ^ 3 is evaluated and stored to _score column as the execution result.
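
The three bitwise assignment scorers above all follow the same per-record pattern: every
record starts with _score 1 (from --filter true), then the operator combines it with
n_likes. As a rough sketch in plain Python (an illustration of the arithmetic, not Groonga
itself), using the sample n_likes values from the examples:

```python
# Sketch of how the bitwise assignment scorers update _score.
# --filter true gives every record an initial _score of 1.
n_likes = {
    "Good-bye Senna": 3,
    "Good-bye Tritonn": 3,
    "Groonga": 10,
    "Mroonga": 15,
    "The first post!": 5,
}

def score(op, likes, initial=1):
    # op is one of '&', '|', '^' applied as _score op= n_likes
    if op == '&':
        return initial & likes
    if op == '|':
        return initial | likes
    if op == '^':
        return initial ^ likes
    raise ValueError(op)

and_scores = {k: score('&', v) for k, v in n_likes.items()}
or_scores = {k: score('|', v) for k, v in n_likes.items()}
xor_scores = {k: score('^', v) for k, v in n_likes.items()}

print(and_scores["Groonga"])  # 1 & 10 = 0
print(or_scores["Groonga"])   # 1 | 10 = 11
print(xor_scores["Mroonga"])  # 1 ^ 15 = 14
```

The printed values match the _score column in the three execution examples above.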

Original operators
Script syntax adds original binary operators to the ECMAScript syntax. They perform
search-specific operations. They start with @ or *.

Match operator
Its syntax is column @ value.

The operator searches value by the inverted index of column. Normally it performs full
text search, but it can also perform tag search, because tag search is implemented with an
inverted index as well.

query_syntax uses this operator by default.

Here is a simple example.

Execution example:

select Entries --filter 'content @ "fast"' --output_columns content
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "content",
# "Text"
# ]
# ],
# [
# "I started to use Groonga. It's very fast!"
# ],
# [
# "I also started to use Mroonga. It's also very fast! Really fast!"
# ]
# ]
# ]
# ]

The expression matches records that contain the word fast in the content column value.

Prefix search operator
Its syntax is column @^ value.

The operator does prefix search with value. Prefix search searches records that contain a
word that starts with value.

You can use fast prefix search against a column. The column must be indexed and the index
table must be a patricia trie table (TABLE_PAT_KEY) or a double array trie table
(TABLE_DAT_KEY). You can also use fast prefix search against the _key pseudo column of a
patricia trie table or a double array trie table. You don't need to index _key.

Prefix search can be used with other table types, but it causes a scan of all records.
That's not a problem for a small number of records, but it takes more time for a large
number of records.
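
A trie makes prefix search fast because all keys sharing a prefix are adjacent in key
order. A minimal Python sketch of the same idea over a sorted key list, using two binary
searches (an illustration of the principle, not Groonga's actual implementation):

```python
import bisect

# Keys of the Entries table, kept in sorted order as a trie effectively does.
keys = sorted([
    "Good-bye Senna",
    "Good-bye Tritonn",
    "Groonga",
    "Mroonga",
    "The first post!",
])

def prefix_search(prefix):
    # All keys starting with `prefix` form a contiguous range in sorted
    # order, so two binary searches find the whole range.
    lo = bisect.bisect_left(keys, prefix)
    hi = bisect.bisect_left(keys, prefix + "\uffff")
    return keys[lo:hi]

print(prefix_search("Goo"))  # ['Good-bye Senna', 'Good-bye Tritonn']
```

This is why the lookup cost depends on the prefix, not on the total number of records,
when a trie-backed table is available.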

Here is a simple example.

Execution example:

select Entries --filter '_key @^ "Goo"' --output_columns _key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# "Good-bye Tritonn"
# ],
# [
# "Good-bye Senna"
# ]
# ]
# ]
# ]

The expression matches records that contain a word that starts with Goo in _key pseudo
column value. Good-bye Senna and Good-bye Tritonn are matched with the expression.

Suffix search operator
Its syntax is column @$ value.

This operator does suffix search with value. Suffix search searches records that contain a
word that ends with value.

You can use fast suffix search against a column. The column must be indexed and the index
table must be a patricia trie table (TABLE_PAT_KEY) with the KEY_WITH_SIS flag. You can
also use fast suffix search against the _key pseudo column of a patricia trie table
(TABLE_PAT_KEY) with the KEY_WITH_SIS flag. You don't need to index _key. We recommend
that you use index column based fast suffix search instead of _key based fast suffix
search, because _key based fast suffix search returns automatically registered substrings.
(TODO: write document about suffix search and link to it from here.)

NOTE:
Fast suffix search can be used only for non-ASCII characters such as hiragana in
Japanese. You cannot use fast suffix search for ASCII characters.

Suffix search can be used with other table types, or with a patricia trie table without
the KEY_WITH_SIS flag, but it causes a scan of all records. That's not a problem for a
small number of records, but it takes more time for a large number of records.
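
A common way to picture suffix search is as prefix search over reversed keys: a query for
keys ending with a string becomes a query for reversed keys starting with the reversed
string. This Python sketch illustrates that idea only; it is not how Groonga's
KEY_WITH_SIS substring registration is actually laid out:

```python
# Suffix search sketched as prefix search over reversed keys.
titles = ["ぐるんが", "むるんが", "せな", "とりとん"]
reversed_index = {t[::-1]: t for t in titles}

def suffix_search(suffix):
    # A suffix of the original string is a prefix of the reversed string.
    rev = suffix[::-1]
    return sorted(t for r, t in reversed_index.items() if r.startswith(rev))

print(suffix_search("んが"))  # ['ぐるんが', 'むるんが']
```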

Here is a simple example. It uses fast suffix search for hiragana in Japanese, which
consists of non-ASCII characters.

Execution example:

table_create Titles TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Titles content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create SuffixSearchTerms TABLE_PAT_KEY|KEY_WITH_SIS ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create SuffixSearchTerms index COLUMN_INDEX Titles content
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Titles
[
{"content": "ぐるんが"},
{"content": "むるんが"},
{"content": "せな"},
{"content": "とりとん"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 4]
select Titles --query 'content:$んが'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 2,
# "むるんが"
# ],
# [
# 1,
# "ぐるんが"
# ]
# ]
# ]
# ]

The expression matches records whose content column value ends with んが. ぐるんが and
むるんが match the expression.

Near search operator
Its syntax is column *N "word1 word2 ...".

The operator does near search with the words word1 word2 .... Near search searches
records that contain the words and in which the words appear near each other. The near
distance is always 10 for now. The unit of near distance is the number of characters for
N-gram family tokenizers and the number of words for morphological analysis family
tokenizers.

(TODO: Add a description about TokenBigram doesn't split ASCII only word into tokens. So
the unit for ASCII words with TokenBigram is the number of words even if TokenBigram is a
N-gram family tokenizer.)

Note that an index column for full text search must be defined for column.

Here is a simple example.

Execution example:

select Entries --filter 'content *N "I fast"' --output_columns content
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "content",
# "Text"
# ]
# ],
# [
# "I started to use Groonga. It's very fast!"
# ]
# ]
# ]
# ]
select Entries --filter 'content *N "I Really"' --output_columns content
# [[0, 1337566253.89858, 0.000355720520019531], [[[0], [["content", "Text"]]]]]
select Entries --filter 'content *N "also Really"' --output_columns content
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "content",
# "Text"
# ]
# ],
# [
# "I also started to use Mroonga. It's also very fast! Really fast!"
# ]
# ]
# ]
# ]

The first expression matches records that contain I and fast where the near distance of
those words is within 10 words. So the record whose content is I started to use Groonga.
It's very fast! is matched.

The second expression matches records that contain I and Really where the near distance
of those words is within 10 words. The record whose content is I also started to use
Mroonga. It's also very fast! Really fast! is not matched, because the number of words
between I and Really is 11.

The third expression matches records that contain also and Really where the near distance
of those words is within 10 words. The record whose content is I also started to use
Mroonga. It's also very fast! Really fast! is matched, because the number of words
between also and Really is 10.
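
The distance check above can be sketched in Python by tokenizing into words and measuring
the gap between two matched words. Note this uses a simple \w+ tokenizer standing in for
Groonga's tokenizers, so the exact counts can differ from the examples above (for
instance, "It's" splits differently):

```python
import re

def near(text, word1, word2, max_distance=10):
    # Split into word tokens, the distance unit for word-based tokenizers.
    tokens = re.findall(r"\w+", text.lower())
    pos1 = [i for i, t in enumerate(tokens) if t == word1.lower()]
    pos2 = [i for i, t in enumerate(tokens) if t == word2.lower()]
    # Number of words between two occurrences: absolute gap minus 1.
    return any(abs(i - j) - 1 <= max_distance for i in pos1 for j in pos2)

text = "I also started to use Mroonga. It's also very fast! Really fast!"
print(near(text, "also", "Really"))  # True
```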

Similar search
Its syntax is column *S "document".

The operator does similar search with document document. Similar search searches records
that have similar content to document.

Note that an index column for full text search must be defined for column.

Here is a simple example.

Execution example:

select Entries --filter 'content *S "I migrated all Solr system!"' --output_columns content
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "content",
# "Text"
# ]
# ],
# [
# "I migrated all Senna system!"
# ],
# [
# "I also migrated all Tritonn system!"
# ]
# ]
# ]
# ]

The expression matches records that have similar content to I migrated all Solr system!.
In this case, records that have I migrated all XXX system! content are matched.

Term extract operator
Its syntax is _key *T "document".

The operator extracts terms from document. Terms must be registered as keys of the table
of _key.

Note that the table must be a patricia trie (TABLE_PAT_KEY) or a double array trie
(TABLE_DAT_KEY). You can't use a hash table (TABLE_HASH_KEY) or an array (TABLE_NO_KEY),
because they don't support longest common prefix search, which is used to implement the
operator.
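
Term extraction can be pictured as a longest-common-prefix lookup attempted at each
position of the normalized document. This Python sketch approximates NormalizerAuto with
lowercasing and the trie lookup with a longest-first linear scan, so it illustrates the
loop rather than Groonga's actual data structures:

```python
def extract_terms(document, terms):
    # Normalize both sides (lowercasing stands in for NormalizerAuto).
    doc = document.lower()
    normalized = {t.lower() for t in terms}
    longest_first = sorted(normalized, key=len, reverse=True)
    found = set()
    i = 0
    while i < len(doc):
        # Longest common prefix search at position i.
        for term in longest_first:
            if doc.startswith(term, i):
                found.add(term)
                i += len(term)
                break
        else:
            i += 1
    return sorted(found)

words = ["groonga", "mroonga", "Senna", "Tritonn"]
doc = "Groonga is the successor project to Senna."
print(extract_terms(doc, words))  # ['groonga', 'senna']
```

The result mirrors the execution example above: normalization lets Groonga in the
document match the registered key groonga.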

Here is a simple example.

Execution example:

table_create Words TABLE_PAT_KEY ShortText --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Words
[
{"_key": "groonga"},
{"_key": "mroonga"},
{"_key": "Senna"},
{"_key": "Tritonn"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 4]
select Words --filter '_key *T "Groonga is the successor project to Senna."' --output_columns _key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# "groonga"
# ],
# [
# "senna"
# ]
# ]
# ]
# ]

The expression extracts terms that are included in the document Groonga is the successor
project to Senna.. In this case, the NormalizerAuto normalizer is specified for Words, so
Groonga can be extracted even though it is loaded as groonga into Words. All extracted
terms are also normalized.

Regular expression operator
New in version 5.0.1.

Its syntax is column @~ "pattern".

The operator searches records by the regular expression pattern. If a record's column
value matches pattern, the record is matched.

pattern must be valid regular expression syntax. See /reference/regular_expression about
regular expression syntax details.

The following example uses .roonga as pattern. It matches Groonga, Mroonga and so on.

Execution example:

select Entries --filter 'content @~ ".roonga"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

In most cases, a regular expression is evaluated sequentially, so it may be slow for many
records.

In some cases, Groonga evaluates a regular expression with an index, which is very fast.
See /reference/regular_expression for details.

See also
· /reference/api/grn_expr: grn_expr related APIs

Regular expression
Summary
NOTE:
Regular expression support is an experimental feature.

New in version 5.0.1.

Groonga supports pattern match by regular expression. A regular expression is a widely
used format to describe a pattern, and is useful for representing complex patterns.

In most cases, pattern match by regular expression is evaluated as sequential search. It
will be slow for many records and long texts.

In some cases, pattern match by regular expression can be evaluated with an index. That
is much faster than sequential search. The patterns that can be evaluated with an index
are described later.

New in version 5.0.7: Groonga normalizes the match target text with the normalizer-auto
normalizer when it doesn't use an index for regular expression search. This means that a
regular expression containing upper case letters, such as Groonga, never matches, because
the normalizer-auto normalizer normalizes all alphabetic characters to lower case.
groonga matches both Groonga and groonga.

Why is the match target text normalized? It increases the number of index-searchable
patterns. If Groonga didn't normalize the match target text, you would need to write a
complex regular expression such as [Dd][Ii][Ss][Kk] or (?i)disk for case-insensitive
match. Groonga can't use an index for such complex regular expressions.

If you write the disk regular expression for case-insensitive match, Groonga can search
the pattern with an index. It's fast.

You may find this behavior strange, but the fast search it enables will help you.

There are many regular expression syntaxes. Groonga uses the same syntax as Ruby, because
Groonga uses the same regular expression engine as Ruby: Onigmo. A characteristic
difference from most other regular expression syntaxes is the meaning of ^ and $. In
Ruby's regular expression syntax, ^ means the beginning of a line and $ means the end of
a line. In most other regular expression syntaxes, ^ means the beginning of the text and
$ means the end of the text. Ruby's regular expression syntax uses \A for the beginning
of text and \z for the end of text.
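
The anchor difference can be illustrated with Python's re module, where ^ is
text-anchored by default and becomes line-anchored under MULTILINE, which matches the
Ruby/Onigmo behavior described above (a side-by-side illustration, not Groonga code):

```python
import re

text = "first line\nsecond line"

# Default: ^ anchors to the beginning of the whole text.
print(re.findall(r"^\w+", text))                # ['first']
# MULTILINE: ^ anchors to each line, like Ruby/Onigmo's ^.
print(re.findall(r"^\w+", text, re.MULTILINE))  # ['first', 'second']
# \A always means the beginning of the text in both engines.
print(re.findall(r"\A\w+", text, re.MULTILINE)) # ['first']
```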

New in version 5.0.6: Groonga uses multiline mode since 5.0.6. This means that . matches
\n. But this is meaningless in practice, because \n is removed by the normalizer-auto
normalizer.

You can use regular expression in select-query and select-filter options of
/reference/commands/select command.

Usage
Here are a schema definition and sample data to show usage. There is only one table,
Logs. The Logs table has only a message column. Log messages are stored in the message
column.

Execution example:

table_create Logs TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs message COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Logs
[
{"message": "host1:[error]: No memory"},
{"message": "host1:[warning]: Remained disk space is less than 30%"},
{"message": "host1:[error]: Disk full"},
{"message": "host2:[error]: No memory"},
{"message": "host2:[info]: Shutdown"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 5]

Here is an example that uses regular expression in select-query. You need to use
${COLUMN}:~${REGULAR_EXPRESSION} syntax.

Execution example:

select Logs --query 'message:~"disk (space|full)"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "message",
# "Text"
# ]
# ],
# [
# 2,
# "host1:[warning]: Remained disk space is less than 30%"
# ],
# [
# 3,
# "host1:[error]: Disk full"
# ]
# ]
# ]
# ]

Here is an example that uses regular expression in select-filter. You need to use
${COLUMN} @~ ${REGULAR_EXPRESSION} syntax.

Execution example:

select Logs --filter 'message @~ "disk (space|full)"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "message",
# "Text"
# ]
# ],
# [
# 2,
# "host1:[warning]: Remained disk space is less than 30%"
# ],
# [
# 3,
# "host1:[error]: Disk full"
# ]
# ]
# ]
# ]

Index
Groonga can search records by regular expression with an index. That is much faster than
sequential search.

But it doesn't support all regular expression patterns. It supports only the following
patterns. More patterns will be supported in the future.

· Literal only pattern such as disk

· The beginning of text and literal only pattern such as \Adisk

· The end of text and literal only pattern such as disk\z

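The three index-searchable shapes can be recognized mechanically. Here is a hedged Python
sketch of such a classifier; it is purely illustrative, and Groonga's real decision is
made inside its expression optimizer:

```python
# Characters that make a pattern non-literal in regular expression syntax.
SPECIAL = set(".^$*+?()[]{}|\\")

def literal_only(s):
    return not any(c in SPECIAL for c in s)

def index_searchable(pattern):
    # \Aliteral, literal\z, or a bare literal are index-searchable.
    if pattern.startswith("\\A"):
        return literal_only(pattern[2:])
    if pattern.endswith("\\z"):
        return literal_only(pattern[:-2])
    return literal_only(pattern)

print(index_searchable("disk"))               # True
print(index_searchable("\\Adisk"))            # True
print(index_searchable("disk\\z"))            # True
print(index_searchable("disk (space|full)"))  # False
```
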
You need to create an index for fast regular expression search. Here are the
requirements for the index:

· Lexicon must be table-pat-key table.

· Lexicon must use token-regexp tokenizer.

· Index column must have the WITH_POSITION flag.

Other configurations such as lexicon's normalizer are optional. You can choose what you
like. If you want to use case-insensitive search, use normalizer-auto normalizer.

Here are recommended index definitions. In