
NAME


groonga - Groonga documentation


CHARACTERISTICS OF GROONGA


Groonga overview
Groonga is a fast and accurate full text search engine based on an inverted index. One of the
characteristics of Groonga is that a newly registered document instantly appears in search
results. Also, Groonga allows updates without read locks. These characteristics result in
superior performance on real-time applications.

Groonga is also a column-oriented database management system (DBMS). Compared with
well-known row-oriented systems, such as MySQL and PostgreSQL, column-oriented systems are
more suited to aggregate queries. Due to this advantage, Groonga can cover the weaknesses
of row-oriented systems.

The basic functions of Groonga are provided in a C library. Also, libraries for using
Groonga in other languages, such as Ruby, are provided by related projects. In addition,
groonga-based storage engines are provided for MySQL and PostgreSQL. These libraries and
storage engines allow any application to use Groonga. See usage examples.

Full text search and Instant update
In widely used DBMSs, updates are immediately processed, for example, a newly registered
record appears in the result of the next query. In contrast, some full text search engines
do not support instant updates, because it is difficult to dynamically update inverted
indexes, the underlying data structure.

Groonga also uses inverted indexes but supports instant updates. In addition, Groonga
allows you to search documents even when updating the document collection. Due to these
superior characteristics, Groonga is very flexible as a full text search engine. Also,
Groonga always shows good performance because it divides a large task, inverted index
merging, into smaller tasks.

Column store and aggregate query
People can collect more than enough data in the Internet era. However, it is difficult to
extract informative knowledge from a large database, and such a task requires a many-sided
analysis through trial and error. For example, search refinement by date, time and
location may reveal hidden patterns. Aggregate queries are useful for this kind of task.

An aggregate query groups search results by specified column values and then counts the
number of records in each group. For example, an aggregate query in which a location
column is specified counts the number of records per location. Making a graph from the
result of an aggregate query against a date column is an easy way to visualize changes
over time. Also, a combination of refinement by location and an aggregate query against a
date column allows visualization of changes over time in a specific location. Thus
refinement and aggregation are important for data mining.
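
For example, in Groonga this kind of aggregate query can be written with the drilldown
parameter of the select command. The following is a minimal sketch, assuming a
hypothetical Logs table with date and location columns; each group in the drilldown
result contains a key and the number of grouped records (_nsubrecs):

select --table Logs --filter 'date >= "2016-03-01 00:00:00"' --drilldown location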

A column-oriented architecture allows Groonga to efficiently process aggregate queries
because a column-oriented database, which stores records by column, allows an aggregate
query to access only a specified column. On the other hand, an aggregate query on a
row-oriented database, which stores records by row, has to access neighboring columns,
even though those columns are not required.

Inverted index and tokenizer
An inverted index is a traditional data structure used for large-scale full text search. A
search engine based on inverted index extracts index terms from a document when it is
added. Then in retrieval, a query is divided into index terms to find documents containing
those index terms. In this way, index terms play an important role in full text search and
thus the way of extracting index terms is a key to a better search engine.

A tokenizer is a module to extract index terms. A Japanese full text search engine
commonly uses a word-based tokenizer (hereafter referred to as a word tokenizer) and/or a
character-based n-gram tokenizer (hereafter referred to as an n-gram tokenizer). A word
tokenizer-based search engine is superior in time, space and precision, which is the
fraction of relevant documents in a search result. On the other hand, an n-gram
tokenizer-based search engine is superior in recall, which is the fraction of retrieved
documents in the perfect search result. The best choice depends on the application in
practice.

Groonga supports both word and n-gram tokenizers. The simplest built-in tokenizer uses
spaces as word delimiters. Built-in n-gram tokenizers (n = 1, 2, 3) are also available by
default. In addition, yet another built-in word tokenizer is available if MeCab, a
part-of-speech and morphological analyzer, is embedded. Note that a tokenizer is pluggable
and you can develop your own tokenizer, such as a tokenizer based on another
part-of-speech tagger or a named-entity recognizer.
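
For example, the tokenizer is chosen per lexicon table with the default_tokenizer
parameter of the table_create command. Here is a minimal sketch (the table name Terms is
just an example); replacing TokenBigram with TokenMecab switches to the MeCab-based word
tokenizer, and TokenDelimit is the built-in tokenizer that uses spaces as word delimiters:

table_create --name Terms --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram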

Sharable storage and read lock-free
Multi-core processors are mainstream today and the number of cores per processor is
increasing. In order to exploit multiple cores, executing multiple queries in parallel or
dividing a query into sub-queries for parallel processing is becoming more important.

A database of Groonga can be shared with multiple threads/processes. Also, multiple
threads/processes can execute read queries in parallel even when another thread/process is
executing an update query because Groonga uses read lock-free data structures. This
feature is suited to a real-time application that needs to update a database while
executing read queries. In addition, Groonga allows you to build flexible systems. For
example, a database can receive read queries through the built-in HTTP server of Groonga
while accepting update queries through MySQL.
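
For example, one process can serve a database over HTTP while another process opens the
same database for a read query. A sketch, assuming the database path used in the tutorial
later in this document:

% groonga -d --protocol http /tmp/groonga-databases/introduction.db
% groonga /tmp/groonga-databases/introduction.db status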

Geo-location (latitude and longitude) search
Location services are getting more convenient because of mobile devices with GPS. For
example, if you are going to have lunch or dinner at a nearby restaurant, a local search
service for restaurants may be very useful, and for such services, fast geo-location
search is becoming more important.

Groonga provides inverted index-based fast geo-location search, which supports a query to
find points in a rectangle or circle. Groonga gives high priority to points near the
center of an area. Also, Groonga supports distance measurement and you can sort points by
distance from any point.
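
For example, points within a circle can be found with the geo_in_circle function in a
filter expression. A minimal sketch, assuming a hypothetical Shops table with a location
column of a GeoPoint type; the center "128452975x503157902" is a GeoPoint literal
(latitude x longitude) and the radius 5000 is in meters:

select --table Shops --filter 'geo_in_circle(location, "128452975x503157902", 5000)'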

Groonga library
The basic functions of Groonga are provided in a C library and any application can use
Groonga as a full text search engine or a column-oriented database. Also, libraries for
languages other than C/C++, such as Ruby, are provided in related projects. See related
projects for details.

Groonga server
Groonga provides a built-in server command which supports HTTP, the memcached binary
protocol and the Groonga Query Transfer Protocol (/spec/gqtp). Also, a Groonga server
supports query caching, which significantly reduces response time for repeated read
queries. Using this command, Groonga is available even on a server that does not allow you
to install new libraries.
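
For example, the following sketch starts an HTTP server daemon on the default port 10041
and then checks it by sending a status command over HTTP (the database path is just an
example; see the tutorial later in this document for details):

% groonga -d --protocol http /tmp/groonga-databases/introduction.db
% curl 'http://localhost:10041/d/status'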

Mroonga storage engine
Groonga works not only as an independent column-oriented DBMS but also as storage engines
of well-known DBMSs. For example, Mroonga is a MySQL pluggable storage engine using
Groonga. By using Mroonga, you can use Groonga for column-oriented storage and full text
search. A combination of a built-in storage engine, MyISAM or InnoDB, and a Groonga-based
full text search engine is also available. All the combinations have good and bad points
and the best one depends on the application. See related projects for details.

INSTALL


This section describes how to install Groonga on each environment. There are packages for
major platforms. It's recommended that you use a package instead of building Groonga by
yourself. But don't worry: there is also a document about building Groonga from source.

We distribute both 32-bit and 64-bit packages but we strongly recommend the 64-bit package
for servers. You should use a 32-bit package only for tests or development. You will
encounter out-of-memory errors with a 32-bit package even when processing only
medium-sized data.

Windows
This section describes how to install Groonga on Windows. You can install Groonga by
extracting a zip package or by running an installer.

We distribute both 32-bit and 64-bit packages but we strongly recommend the 64-bit package
for servers. You should use a 32-bit package only for tests or development. You will
encounter out-of-memory errors with a 32-bit package even when processing only
medium-sized data.

Installer
For 32-bit environment, download x86 executable binary from packages.groonga.org:

· http://packages.groonga.org/windows/groonga/groonga-6.0.1-x86.exe

Then run it.

For 64-bit environment, download x64 executable binary from packages.groonga.org:

· http://packages.groonga.org/windows/groonga/groonga-6.0.1-x64.exe

Then run it.

Use the command prompt in the start menu to run /reference/executables/groonga.

zip
For 32-bit environment, download x86 zip archive from packages.groonga.org:

· http://packages.groonga.org/windows/groonga/groonga-6.0.1-x86.zip

Then extract it.

For 64-bit environment, download x64 zip archive from packages.groonga.org:

· http://packages.groonga.org/windows/groonga/groonga-6.0.1-x64.zip

Then extract it.

You can find /reference/executables/groonga in the bin folder.

Build from source
First, you need to install required tools for building Groonga on Windows. Here are
required tools:

· Microsoft Visual Studio Express 2013 for Windows Desktop

· CMake

Download zipped source from packages.groonga.org:

· http://packages.groonga.org/source/groonga/groonga-6.0.1.zip

Then extract it.

Move to the Groonga source folder:

> cd c:\Users\%USERNAME%\Downloads\groonga-6.0.1

Configure with cmake. The following command line is for the 64-bit version. To build the
32-bit version, use the -G "Visual Studio 12 2013" parameter instead:

groonga-6.0.1> cmake . -G "Visual Studio 12 2013 Win64" -DCMAKE_INSTALL_PREFIX=C:\Groonga

Build:

groonga-6.0.1> cmake --build . --config Release

Install:

groonga-6.0.1> cmake --build . --config Release --target Install

After the above steps, /reference/executables/groonga is found at
c:\Groonga\bin\groonga.exe.

Mac OS X
This section describes how to install Groonga on Mac OS X. You can install Groonga with
MacPorts or Homebrew.

MacPorts
Install:

% sudo port install groonga

Homebrew
Install:

% brew install groonga

If you want to use MeCab as a tokenizer, specify --with-mecab option:

% brew install groonga --with-mecab

Then install and configure a MeCab dictionary.

Install:

% brew install mecab-ipadic

Configure:

% sed -i '' -e 's,dicrc.*=.*,dicrc = /usr/local/lib/mecab/dic/ipadic,g' /usr/local/etc/mecabrc

Build from source
Install Xcode.

Download source:

% curl -O http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz
% tar xvzf groonga-6.0.1.tar.gz
% cd groonga-6.0.1

Configure (see source-configure about configure options):

% ./configure

Build:

% make -j$(/usr/sbin/sysctl -n hw.ncpu)

Install:

% sudo make install

Debian GNU/Linux
This section describes how to install Groonga related deb packages on Debian GNU/Linux.
You can install them by apt.

We distribute both 32-bit and 64-bit packages but we strongly recommend the 64-bit package
for servers. You should use a 32-bit package only for tests or development. You will
encounter out-of-memory errors with a 32-bit package even when processing only
medium-sized data.

wheezy
Add the Groonga apt repository.

/etc/apt/sources.list.d/groonga.list:

deb http://packages.groonga.org/debian/ wheezy main
deb-src http://packages.groonga.org/debian/ wheezy main

Install:

% sudo apt-get update
% sudo apt-get install -y --allow-unauthenticated groonga-keyring
% sudo apt-get update
% sudo apt-get install -y -V groonga

NOTE:
The groonga package provides the minimum set of features for a full-text search engine.
If you want to use Groonga as a server, you can install additional preconfigured
packages.

There are two packages for server use:

· groonga-httpd (an HTTP server package based on nginx)

· groonga-server-gqtp (a GQTP protocol based server package)

See the /server section for details.

If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo apt-get install -y -V groonga-tokenizer-mecab

If you want to use TokenFilterStem as a token filter, install groonga-token-filter-stem
package.

Install groonga-token-filter-stem package:

% sudo apt-get install -y -V groonga-token-filter-stem

There is a package that provides Munin plugins. If you want to monitor Groonga status by
Munin, install groonga-munin-plugins package.

Install groonga-munin-plugins package:

% sudo apt-get install -y -V groonga-munin-plugins

There is a package that provides MySQL compatible normalizer as a Groonga plugin. If you
want to use that one, install groonga-normalizer-mysql package.

Install groonga-normalizer-mysql package:

% sudo apt-get install -y -V groonga-normalizer-mysql

jessie
New in version 5.0.3.

Add the Groonga apt repository.

/etc/apt/sources.list.d/groonga.list:

deb http://packages.groonga.org/debian/ jessie main
deb-src http://packages.groonga.org/debian/ jessie main

Install:

% sudo apt-get update
% sudo apt-get install -y --allow-unauthenticated groonga-keyring
% sudo apt-get update
% sudo apt-get install -y -V groonga

NOTE:
The groonga package provides the minimum set of features for a full-text search engine.
If you want to use Groonga as a server, you can install additional preconfigured
packages.

There are two packages for server use:

· groonga-httpd (an HTTP server package based on nginx)

· groonga-server-gqtp (a GQTP protocol based server package)

See the /server section for details.

If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo apt-get install -y -V groonga-tokenizer-mecab

If you want to use TokenFilterStem as a token filter, install groonga-token-filter-stem
package.

Install groonga-token-filter-stem package:

% sudo apt-get install -y -V groonga-token-filter-stem

There is a package that provides Munin plugins. If you want to monitor Groonga status by
Munin, install groonga-munin-plugins package.

Install groonga-munin-plugins package:

% sudo apt-get install -y -V groonga-munin-plugins

There is a package that provides MySQL compatible normalizer as a Groonga plugin. If you
want to use that one, install groonga-normalizer-mysql package.

Install groonga-normalizer-mysql package:

% sudo apt-get install -y -V groonga-normalizer-mysql

Build from source
Install required packages to build Groonga:

% sudo apt-get install -y -V wget tar build-essential zlib1g-dev liblzo2-dev libmsgpack-dev libzmq-dev libevent-dev libmecab-dev

Download source:

% wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz
% tar xvzf groonga-6.0.1.tar.gz
% cd groonga-6.0.1

Configure (see source-configure about configure options):

% ./configure

Build:

% make -j$(grep '^processor' /proc/cpuinfo | wc -l)

Install:

% sudo make install

Ubuntu
This section describes how to install Groonga related deb packages on Ubuntu. You can
install them by apt.

We distribute both 32-bit and 64-bit packages but we strongly recommend the 64-bit package
for servers. You should use a 32-bit package only for tests or development. You will
encounter out-of-memory errors with a 32-bit package even when processing only
medium-sized data.

PPA (Personal Package Archive)
The Groonga APT repository for Ubuntu uses PPA (Personal Package Archive) on Launchpad.
You can install Groonga by APT from the PPA.

Here are supported Ubuntu versions:

· 12.04 LTS Precise Pangolin

· 14.04 LTS Trusty Tahr

· 15.04 Vivid Vervet

· 15.10 Wily Werewolf

Enable the universe repository to install Groonga:

% sudo apt-get -y install software-properties-common
% sudo add-apt-repository -y universe

Add the ppa:groonga/ppa PPA to your system:

% sudo add-apt-repository -y ppa:groonga/ppa
% sudo apt-get update

Install:

% sudo apt-get -y install groonga

NOTE:
The groonga package provides the minimum set of features for a full-text search engine.
If you want to use Groonga as a server, you can install additional preconfigured
packages.

There are two packages for server use:

· groonga-httpd (an HTTP server package based on nginx)

· groonga-server-gqtp (a GQTP protocol based server package)

See the /server section for details.

If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo apt-get -y install groonga-tokenizer-mecab

If you want to use TokenFilterStem as a token filter, install groonga-token-filter-stem
package.

Install groonga-token-filter-stem package:

% sudo apt-get -y install groonga-token-filter-stem

There is a package that provides Munin plugins. If you want to monitor Groonga status by
Munin, install groonga-munin-plugins package.

Install groonga-munin-plugins package:

% sudo apt-get -y install groonga-munin-plugins

There is a package that provides MySQL compatible normalizer as a Groonga plugin. If you
want to use that one, install groonga-normalizer-mysql package.

Install groonga-normalizer-mysql package:

% sudo apt-get -y install groonga-normalizer-mysql

Build from source
Install required packages to build Groonga:

% sudo apt-get -V -y install wget tar build-essential zlib1g-dev liblzo2-dev libmsgpack-dev libzmq-dev libevent-dev libmecab-dev

Download source:

% wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz
% tar xvzf groonga-6.0.1.tar.gz
% cd groonga-6.0.1

Configure (see source-configure about configure options):

% ./configure

Build:

% make -j$(grep '^processor' /proc/cpuinfo | wc -l)

Install:

% sudo make install

CentOS
This section describes how to install Groonga related RPM packages on CentOS. You can
install them by yum.

We distribute both 32-bit and 64-bit packages but we strongly recommend the 64-bit package
for servers. You should use a 32-bit package only for tests or development. You will
encounter out-of-memory errors with a 32-bit package even when processing only
medium-sized data.

CentOS 5
Install:

% sudo rpm -ivh http://packages.groonga.org/centos/groonga-release-1.1.0-1.noarch.rpm
% sudo yum makecache
% sudo yum install -y groonga

NOTE:
The groonga package provides the minimum set of features for a full-text search engine.
If you want to use Groonga as a server, you can install additional preconfigured
packages.

There are two packages for server use:

· groonga-httpd (an HTTP server package based on nginx)

· groonga-server-gqtp (a GQTP protocol based server package)

See the /server section for details.

If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo yum install -y groonga-tokenizer-mecab

There is a package that provides Munin plugins. If you want to monitor Groonga status by
Munin, install groonga-munin-plugins package.

NOTE:
The groonga-munin-plugins package requires the munin-node package, which isn't included
in the official CentOS repository. You need to enable the Repoforge (RPMforge) repository
or the EPEL repository to install it by yum.

Enable Repoforge (RPMforge) repository on i386 environment:

% wget http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.3-1.el5.rf.i386.rpm
% sudo rpm -ivh rpmforge-release-0.5.3-1.el5.rf.i386.rpm

Enable Repoforge (RPMforge) repository on x86_64 environment:

% wget http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.3-1.el5.rf.x86_64.rpm
% sudo rpm -ivh rpmforge-release-0.5.3-1.el5.rf.x86_64.rpm

Enable EPEL repository on any environment:

% wget http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
% sudo rpm -ivh epel-release-5-4.noarch.rpm

Install groonga-munin-plugins package:

% sudo yum install -y groonga-munin-plugins

There is a package that provides MySQL compatible normalizer as a Groonga plugin. If you
want to use that one, install groonga-normalizer-mysql package.

Install groonga-normalizer-mysql package:

% sudo yum install -y groonga-normalizer-mysql

CentOS 6
Install:

% sudo rpm -ivh http://packages.groonga.org/centos/groonga-release-1.1.0-1.noarch.rpm
% sudo yum makecache
% sudo yum install -y groonga

NOTE:
The groonga package provides the minimum set of features for a full-text search engine.
If you want to use Groonga as a server, you can install additional preconfigured
packages.

There are two packages for server use:

· groonga-httpd (an HTTP server package based on nginx)

· groonga-server-gqtp (a GQTP protocol based server package)

See the /server section for details.

If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo yum install -y groonga-tokenizer-mecab

There is a package that provides Munin plugins. If you want to monitor Groonga status by
Munin, install groonga-munin-plugins package.

NOTE:
The groonga-munin-plugins package requires the munin-node package, which isn't included
in the official CentOS repository. You need to enable the EPEL repository to install it
by yum.

Enable EPEL repository on any environment:

% sudo rpm -ivh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm

Install groonga-munin-plugins package:

% sudo yum install -y groonga-munin-plugins

There is a package that provides MySQL compatible normalizer as a Groonga plugin. If you
want to use that one, install groonga-normalizer-mysql package.

Install groonga-normalizer-mysql package:

% sudo yum install -y groonga-normalizer-mysql

CentOS 7
Install:

% sudo yum install -y http://packages.groonga.org/centos/groonga-release-1.1.0-1.noarch.rpm
% sudo yum install -y groonga

NOTE:
The groonga package provides the minimum set of features for a full-text search engine.
If you want to use Groonga as a server, you can install additional preconfigured
packages.

There are two packages for server use:

· groonga-httpd (an HTTP server package based on nginx)

· groonga-server-gqtp (a GQTP protocol based server package)

See the /server section for details.

If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo yum install -y groonga-tokenizer-mecab

There is a package that provides Munin plugins. If you want to monitor Groonga status by
Munin, install groonga-munin-plugins package.

NOTE:
The groonga-munin-plugins package requires the munin-node package, which isn't included
in the official CentOS repository. You need to enable the EPEL repository to install it
by yum.

Enable EPEL repository:

% sudo yum install -y epel-release

Install groonga-munin-plugins package:

% sudo yum install -y groonga-munin-plugins

There is a package that provides MySQL compatible normalizer as a Groonga plugin. If you
want to use that one, install groonga-normalizer-mysql package.

Install groonga-normalizer-mysql package:

% sudo yum install -y groonga-normalizer-mysql

Build from source
Install required packages to build Groonga:

% sudo yum install -y wget tar gcc-c++ make mecab-devel

Download source:

% wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz
% tar xvzf groonga-6.0.1.tar.gz
% cd groonga-6.0.1

Configure (see source-configure about configure options):

% ./configure

Build:

% make -j$(grep '^processor' /proc/cpuinfo | wc -l)

Install:

% sudo make install

Fedora
This section describes how to install Groonga related RPM packages on Fedora. You can
install them by yum.

NOTE:
Since the Groonga 3.0.2 release, Groonga related RPM packages are in the official Fedora
yum repository (Fedora 18). So you can use them instead of the Groonga yum repository
now. There are some exceptions: MeCab dictionaries (mecab-ipadic or mecab-jumandic) are
still provided by the Groonga yum repository.

We distribute both 32-bit and 64-bit packages but we strongly recommend the 64-bit package
for servers. You should use a 32-bit package only for tests or development. You will
encounter out-of-memory errors with a 32-bit package even when processing only
medium-sized data.

Fedora 21
Install:

% sudo yum install -y groonga

Note that additional packages such as mecab-ipadic and mecab-jumandic require you to
install the groonga-release package, which provides the Groonga yum repository, beforehand:

% sudo rpm -ivh http://packages.groonga.org/fedora/groonga-release-1.1.0-1.noarch.rpm
% sudo yum update

NOTE:
The groonga package provides the minimum set of features for a full-text search engine.
If you want to use Groonga as a server, you can install additional preconfigured
packages.

There are two packages for server use:

· groonga-httpd (an HTTP server package based on nginx)

· groonga-server-gqtp (a GQTP protocol based server package)

See the /server section for details.

If you want to use MeCab as a tokenizer, install groonga-tokenizer-mecab package.

Install groonga-tokenizer-mecab package:

% sudo yum install -y groonga-tokenizer-mecab

Then install MeCab dictionary. (mecab-ipadic or mecab-jumandic)

Install IPA dictionary:

% sudo yum install -y mecab-ipadic

Or install Juman dictionary:

% sudo yum install -y mecab-jumandic

There is a package that provides Munin plugins. If you want to monitor Groonga status by
Munin, install groonga-munin-plugins package.

Install groonga-munin-plugins package:

% sudo yum install -y groonga-munin-plugins

There is a package that provides MySQL compatible normalizer as a Groonga plugin. If you
want to use that one, install groonga-normalizer-mysql package.

Install groonga-normalizer-mysql package:

% sudo yum install -y groonga-normalizer-mysql

Build from source
Install required packages to build Groonga:

% sudo yum install -y wget tar gcc-c++ make mecab-devel libedit-devel

Download source:

% wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz
% tar xvzf groonga-6.0.1.tar.gz
% cd groonga-6.0.1

Configure (see source-configure about configure options):

% ./configure

Build:

% make -j$(grep '^processor' /proc/cpuinfo | wc -l)

Install:

% sudo make install

Oracle Solaris
This section describes how to install Groonga from source on Oracle Solaris.

Oracle Solaris 11
Install required packages to build Groonga:

% sudo pkg install gnu-tar gcc-45 system/header

Download source:

% wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz
% gtar xvzf groonga-6.0.1.tar.gz
% cd groonga-6.0.1

Configure with the CFLAGS="-m64" CXXFLAGS="-m64" variables. They are needed for building
the 64-bit version. To build the 32-bit version, just remove those variables (see
source-configure about configure options):

% ./configure CFLAGS="-m64" CXXFLAGS="-m64"

Build:

% make

Install:

% sudo make install

Others
This section describes how to install Groonga from source in a UNIX-like environment.

To get more details about installing Groonga from source in a specific environment, find
the document for that environment in /install.

Dependencies
Groonga doesn't require any special libraries but requires some tools for building.

Tools
Here are required tools:

· wget, curl or Web browser for downloading source archive

· tar and gzip for extracting source archive

· shell (many shells such as dash, bash and zsh will work)

· C compiler and C++ compiler (gcc and g++ are supported but other compilers may work)

· make (GNU make is supported but other make like BSD make will work)

You must get them ready.

You can use CMake instead of the shell-based build, but this document doesn't describe
building with CMake.

Here are optional tools:

· pkg-config for detecting libraries

· sudo for installing built Groonga

Get them ready beforehand if you need them.

Libraries
All libraries are optional. Here are optional libraries:

· MeCab for tokenizing full-text search target document by morphological analysis

· KyTea for tokenizing full-text search target document by morphological analysis

· ZeroMQ for /reference/suggest

· libevent for /reference/suggest

· MessagePack for supporting MessagePack output and /reference/suggest

· libedit for command line editing in /reference/executables/groonga

· zlib for compressing column value

· LZ4 for compressing column value

If you want to use all or some of those libraries, you need to install them before
installing Groonga.

Build from source
Groonga uses the GNU build system, so the following are the simplest build steps:

% wget http://packages.groonga.org/source/groonga/groonga-6.0.1.tar.gz
% tar xvzf groonga-6.0.1.tar.gz
% cd groonga-6.0.1
% ./configure
% make
% sudo make install

After the above steps, /reference/executables/groonga is found at /usr/local/bin/groonga.

The default build will work well but you can customize Groonga at the configure step.

The following describes details about each step.

configure
First, you need to run configure. Here are important configure options:

--prefix=PATH
Specifies the install base directory. Groonga related files are installed under the
${PATH}/ directory.

The default is /usr/local. In this case, /reference/executables/groonga is installed into
/usr/local/bin/groonga.

Here is an example that installs Groonga into ~/local for single-user use instead of
system-wide use:

% ./configure --prefix=$HOME/local

--localstatedir=PATH
Specifies the base directory for modifiable files such as log files, PID files and
database files. For example, the log file is placed at ${PATH}/log/groonga.log.

The default is /usr/local/var.

Here is an example that uses the system-wide /var for modifiable files:

% ./configure --localstatedir=/var

--with-log-path=PATH
Specifies the default log file path. You can override the default log path with the
/reference/executables/groonga command's --log-path command line option, so this is not a
critical build option; it's just for convenience.

The default is /usr/local/var/log/groonga.log. The /usr/local/var part is changed by
--localstatedir option.

Here is an example that places the log file in the shared NFS directory /nfs/log/groonga.log:

% ./configure --with-log-path=/nfs/log/groonga.log

--with-default-encoding=ENCODING
Specifies the default encoding. Available encodings are euc_jp, sjis, utf8, latin1, koi8r
and none.

The default is utf8.

Here is an example that uses Shift_JIS as the default encoding:

% ./configure --with-default-encoding=sjis

--with-match-escalation-threshold=NUMBER
Specifies the default match escalation threshold. See select-match-escalation-threshold
for details about the match escalation threshold. -1 means that a match operation never
escalates.

The default is 0.

Here is an example that match escalation isn't used by default:

% ./configure --with-match-escalation-threshold=-1

--with-zlib
Enables column value compression by zlib.

The default is disabled.

Here is an example that enables column value compression by zlib:

% ./configure --with-zlib

--with-lz4
Enables column value compression by LZ4.

The default is disabled.

Here is an example that enables column value compression by LZ4:

% ./configure --with-lz4

--with-message-pack=MESSAGE_PACK_INSTALL_PREFIX
Specifies where MessagePack is installed. If MessagePack isn't installed with
--prefix=/usr, you need to specify this option with the path that you used for building
MessagePack.

If you installed MessagePack with --prefix=$HOME/local option, you should specify
--with-message-pack=$HOME/local to Groonga's configure.

The default is /usr.

Here is an example that uses MessagePack built with --prefix=$HOME/local option:

% ./configure --with-message-pack=$HOME/local

--with-munin-plugins
Installs Munin plugins for Groonga. They are installed into
${PREFIX}/share/groonga/munin/plugins/.

Those plugins are not installed by default.

Here is an example that installs Munin plugins for Groonga:

% ./configure --with-munin-plugins

--with-package-platform=PLATFORM
Installs platform specific system management files such as init script. Available
platforms are redhat and fedora. redhat is for Red Hat and Red Hat clone distributions
such as CentOS. fedora is for Fedora.

Those system management files are not installed by default.

Here is an example that installs CentOS specific system management files:

% ./configure --with-package-platform=redhat

--help
Shows all configure options.

make
After configure succeeds, you can build Groonga with make:

% make

If you have a multi-core CPU, you can build faster by using the -j option. If you have a
4-core CPU, the -j4 option is a good choice:

% make -j4

If you get errors from make, please report them to us: /contribution/report

make install
Now, you can install the built Groonga:

% sudo make install

If you have write permission for ${PREFIX} (e.g. the --prefix=$HOME/local case), you
don't need to use sudo. In this case, use make install:

% make install

COMMUNITY


There are some places for sharing Groonga information. We welcome you to join our
community.

Mailing List
There are mailing lists for discussion about Groonga.

For English speakers
[email protected]

For Japanese speakers
[email protected]

Chat room
There are chat rooms for discussion about Groonga.

For English speakers
groonga/en chat room on Gitter

For Japanese speakers
groonga/ja chat room on Gitter

Twitter
@groonga tweets Groonga related information.

Please follow the account to get the latest Groonga related information!

Facebook
Groonga page on Facebook shares Groonga related information.

Please like the page to get the latest Groonga related information!

TUTORIAL


Basic operations
A Groonga package provides a C library (libgroonga) and a command line tool (groonga).
This tutorial explains how to use the command line tool, with which you can create/operate
databases, start a server, establish a connection with a server, etc.

Create a database
The first step to using Groonga is to create a new database. The following shows how to do
it.

Form:

groonga -n DB_PATH

The -n option specifies to create a new database and DB_PATH specifies the path of the new
database. Actually, a database consists of a series of files and DB_PATH specifies the
file which will be the entrance to the new database. DB_PATH also specifies the path
prefix for other files. Note that database creation fails if DB_PATH points to an existing
file (for example: db open failed (DB_PATH): syscall error 'DB_PATH' (File exists)). How
to operate an existing database is described in the next section.

This command creates a new database and then enters an interactive mode in which Groonga
prompts you to enter commands for operating that database. You can terminate this mode
with Ctrl-d.

Execution example:

% groonga -n /tmp/groonga-databases/introduction.db

After this database creation, you can find a series of files in /tmp/groonga-databases.

Operate a database
The following shows how to operate an existing database.

Form:

groonga DB_PATH [COMMAND]

DB_PATH specifies the path of a target database. This command fails if the specified
database does not exist.

If COMMAND is specified, Groonga executes COMMAND and returns the result. Otherwise,
Groonga starts in an interactive mode that reads commands from the standard input and
executes them one by one. This tutorial focuses on the interactive mode.

Let's see the status of a Groonga process by using a /reference/commands/status command.

Execution example:

% groonga /tmp/groonga-databases/introduction.db
status
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "uptime": 0,
# "max_command_version": 2,
# "n_queries": 0,
# "cache_hit_rate": 0.0,
# "version": "5.0.6-128-g8029ddb",
# "alloc_count": 206,
# "command_version": 1,
# "starttime": 1439995916,
# "default_command_version": 1
# }
# ]

As shown in the above example, a command returns a JSON array. The first element contains
an error code, execution time, etc. The second element is the result of an operation.

NOTE:
You can format the JSON output using additional tools, for example grnwrap, Grnline, jq
and so on.

Command format
Commands for operating a database accept arguments as follows:

Form_1: COMMAND VALUE_1 VALUE_2 ..

Form_2: COMMAND --NAME_1 VALUE_1 --NAME_2 VALUE_2 ..

In the first form, arguments must be passed in order. These arguments are called
positional arguments because the position of each argument determines its meaning.

In the second form, you can specify a parameter name with its value, so the order of
arguments is not defined. These arguments are known as named parameters or keyword
arguments.

If you want to specify a value which contains white-spaces or special characters, such as
quotes and parentheses, please enclose the value with single-quotes or double-quotes.

For details, see also the paragraph of "command" in /reference/executables/groonga.
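
For example, the following two invocations of table_create are equivalent; the first
passes positional arguments (name, flags and key type, in this order) and the second
uses named parameters:

table_create Site TABLE_HASH_KEY ShortText
table_create --name Site --flags TABLE_HASH_KEY --key_type ShortText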

Basic commands
/reference/commands/status
shows status of a Groonga process.

/reference/commands/table_list
shows a list of tables in a database.

/reference/commands/column_list
shows a list of columns in a table.

/reference/commands/table_create
adds a table to a database.

/reference/commands/column_create
adds a column to a table.

/reference/commands/select
searches records from a table and shows the result.

/reference/commands/load
inserts records to a table.

Create a table
A /reference/commands/table_create command creates a new table.

In most cases, a table has a primary key which must be specified with its data type and
index type.

There are various data types such as integers, strings, etc. See also /reference/types for
more details. The index type determines the search performance and the availability of
prefix searches. The details will be described later.

Let's create a table. The following example creates a table with a primary key. The name
parameter specifies the name of the table. The flags parameter specifies the index type
for the primary key. The key_type parameter specifies the data type of the primary key.

Execution example:

table_create --name Site --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

The second element of the result indicates that the operation succeeded.

View a table
A /reference/commands/select command can enumerate records in a table.

Execution example:

select --table Site
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 0
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ]
# ]
# ]
# ]

When only a table name is specified with a table parameter, a /reference/commands/select
command returns the first (at most) 10 records in the table. [0] in the result shows the
number of records in the table. The next array is a list of columns. ["_id","UInt32"] is a
column of UInt32, named _id. ["_key","ShortText"] is a column of ShortText, named _key.

The above two columns, _id and _key, are necessary columns. The _id column stores IDs
that are automatically allocated by Groonga. The _key column is associated with the
primary key. You are not allowed to rename these columns.

Create a column
A /reference/commands/column_create command creates a new column.

Let's add a column. The following example adds a column to the Site table. The table
parameter specifies the target table. The name parameter specifies the name of the column.
The type parameter specifies the data type of the column.

Execution example:

column_create --table Site --name title --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
select --table Site
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 0
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ]
# ]
# ]

Load records
A /reference/commands/load command loads JSON-formatted records into a table.

The following example loads nine records into the Site table.

Execution example:

load --table Site
[
{"_key":"http://example.org/","title":"This is test record 1!"},
{"_key":"http://example.net/","title":"test record 2."},
{"_key":"http://example.com/","title":"test test record three."},
{"_key":"http://example.net/afr","title":"test record four."},
{"_key":"http://example.org/aba","title":"test test test record five."},
{"_key":"http://example.com/rab","title":"test test test test record six."},
{"_key":"http://example.net/atv","title":"test test test record seven."},
{"_key":"http://example.org/gat","title":"test test record eight."},
{"_key":"http://example.com/vdw","title":"test test record nine."},
]
# [[0, 1337566253.89858, 0.000355720520019531], 9]

The second element of the result indicates how many records were successfully loaded. In
this case, all the records are successfully loaded.

Let's make sure that these records are correctly stored.

Execution example:

select --table Site
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/",
# "This is test record 1!"
# ],
# [
# 2,
# "http://example.net/",
# "test record 2."
# ],
# [
# 3,
# "http://example.com/",
# "test test record three."
# ],
# [
# 4,
# "http://example.net/afr",
# "test record four."
# ],
# [
# 5,
# "http://example.org/aba",
# "test test test record five."
# ],
# [
# 6,
# "http://example.com/rab",
# "test test test test record six."
# ],
# [
# 7,
# "http://example.net/atv",
# "test test test record seven."
# ],
# [
# 8,
# "http://example.org/gat",
# "test test record eight."
# ],
# [
# 9,
# "http://example.com/vdw",
# "test test record nine."
# ]
# ]
# ]
# ]

Get a record
A /reference/commands/select command can search records in a table.

If a search condition is specified with a query parameter, a /reference/commands/select
command searches records matching the search condition and returns the matched records.

Let's get a record having a specified record ID. The following example gets the first
record in the Site table. More precisely, the query parameter specifies a record whose _id
column stores 1.

Execution example:

select --table Site --query _id:1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/",
# "This is test record 1!"
# ]
# ]
# ]
# ]

Next, let's get a record having a specified key. The following example gets the record
whose primary key is "http://example.org/". More precisely, the query parameter specifies
a record whose _key column stores "http://example.org/".

Execution example:

select --table Site --query '_key:"http://example.org/"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/",
# "This is test record 1!"
# ]
# ]
# ]
# ]

Create a lexicon table for full text search
Let's move on to how to perform full text search.

Groonga uses an inverted index to provide fast full text search. So, the first step is to
create a lexicon table which stores an inverted index, also known as postings lists. The
primary key of this table is associated with a vocabulary made up of index terms and each
record stores postings lists for one index term.

The following shows a command which creates a lexicon table named Terms. The data type of
its primary key is ShortText.

Execution example:

table_create --name Terms --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]

The /reference/commands/table_create command takes many parameters but you don't need to
understand all of them. Please skip the next paragraph if you are not interested in how it
works.

The TABLE_PAT_KEY flag specifies to store index terms in a patricia trie. The
default_tokenizer parameter specifies the method for tokenizing text. This example uses
TokenBigram, which is a so-called N-gram tokenizer.

The normalizer parameter specifies how to normalize index terms.

Create an index column for full text search
The second step is to create an index column, which allows you to search records from its
associated column. That is to say, this step specifies which column needs an index.

Let's create an index column. The following example creates an index column for a column
in the Site table.

Execution example:

column_create --table Terms --name blog_title --flags COLUMN_INDEX|WITH_POSITION --type Site --source title
# [[0, 1337566253.89858, 0.000355720520019531], true]

The table parameter specifies the index table and the name parameter specifies the index
column. The type parameter specifies the target table and the source parameter specifies
the target column. The COLUMN_INDEX flag specifies to create an index column, and the
WITH_POSITION flag specifies to create a full inverted index, which contains the positions
of each index term. This combination, COLUMN_INDEX|WITH_POSITION, is recommended for
general purposes.

NOTE:
You can create a lexicon table and index columns before/during/after loading records.
If a target column already has records, Groonga creates an inverted index in a static
manner. In contrast, if you load records into an already indexed column, Groonga
updates the inverted index in a dynamic manner.

Full text search
Now it's time: you can perform full text search with a /reference/commands/select command.

A query for full text search is specified with a query parameter. The following example
searches records whose "title" column contains "this". The '@' specifies a full text
search. Note that a lower case query matches upper case and capitalized terms in a record
if NormalizerAuto was specified when creating the lexicon table.

Execution example:

select --table Site --query title:@this
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/",
# "This is test record 1!"
# ]
# ]
# ]
# ]

In this example, the first record matches the query because its title contains "This",
which is the capitalized form of the query.

A /reference/commands/select command accepts an optional parameter, named match_columns,
that specifies the default target columns. This parameter is used if target columns are
not specified in a query. [1]

The combination of "--match_columns title" and "--query this" brings you the same result
as "--query title:@this" does.

Execution example:

select --table Site --match_columns title --query this
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/",
# "This is test record 1!"
# ]
# ]
# ]
# ]

Specify output columns
An output_columns parameter of a /reference/commands/select command specifies columns to
appear in the search result. If you want to specify more than one column, please separate
the column names with commas (',').

Execution example:

select --table Site --output_columns _key,title,_score --query title:@test
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://example.org/",
# "This is test record 1!",
# 1
# ],
# [
# "http://example.net/",
# "test record 2.",
# 1
# ],
# [
# "http://example.com/",
# "test test record three.",
# 2
# ],
# [
# "http://example.net/afr",
# "test record four.",
# 1
# ],
# [
# "http://example.org/aba",
# "test test test record five.",
# 3
# ],
# [
# "http://example.com/rab",
# "test test test test record six.",
# 4
# ],
# [
# "http://example.net/atv",
# "test test test record seven.",
# 3
# ],
# [
# "http://example.org/gat",
# "test test record eight.",
# 2
# ],
# [
# "http://example.com/vdw",
# "test test record nine.",
# 2
# ]
# ]
# ]
# ]

This example specifies three output columns including the _score column, which stores the
relevance score of each record.

Specify output ranges
A /reference/commands/select command returns a part of its search result if offset and/or
limit parameters are specified. These parameters are useful for paginating a search
result, i.e. showing a search result page by page, which is a widely used interface.

An offset parameter specifies the starting point and a limit parameter specifies the
maximum number of records to be returned. If you need the first record in a search result,
the offset parameter must be 0 or omitted.

Execution example:

select --table Site --offset 0 --limit 3
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/",
# "This is test record 1!"
# ],
# [
# 2,
# "http://example.net/",
# "test record 2."
# ],
# [
# 3,
# "http://example.com/",
# "test test record three."
# ]
# ]
# ]
# ]
select --table Site --offset 3 --limit 3
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 4,
# "http://example.net/afr",
# "test record four."
# ],
# [
# 5,
# "http://example.org/aba",
# "test test test record five."
# ],
# [
# 6,
# "http://example.com/rab",
# "test test test test record six."
# ]
# ]
# ]
# ]
select --table Site --offset 7 --limit 3
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 8,
# "http://example.org/gat",
# "test test record eight."
# ],
# [
# 9,
# "http://example.com/vdw",
# "test test record nine."
# ]
# ]
# ]
# ]

Sort a search result
A /reference/commands/select command sorts its result when used with a sortby parameter.

A sortby parameter specifies a column as the sorting criterion. A search result is
arranged in ascending order of the column values. If you want to sort a search result in
reverse order, please add a leading hyphen ('-') to the column name.

The following example shows records in the Site table in reverse order.

Execution example:

select --table Site --sortby -_id
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 9,
# "http://example.com/vdw",
# "test test record nine."
# ],
# [
# 8,
# "http://example.org/gat",
# "test test record eight."
# ],
# [
# 7,
# "http://example.net/atv",
# "test test test record seven."
# ],
# [
# 6,
# "http://example.com/rab",
# "test test test test record six."
# ],
# [
# 5,
# "http://example.org/aba",
# "test test test record five."
# ],
# [
# 4,
# "http://example.net/afr",
# "test record four."
# ],
# [
# 3,
# "http://example.com/",
# "test test record three."
# ],
# [
# 2,
# "http://example.net/",
# "test record 2."
# ],
# [
# 1,
# "http://example.org/",
# "This is test record 1!"
# ]
# ]
# ]
# ]

The next example uses the _score column as the sorting criterion for ranking the search
result. The result is sorted in relevance order.

Execution example:

select --table Site --query title:@test --output_columns _id,_score,title --sortby -_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 6,
# 4,
# "test test test test record six."
# ],
# [
# 5,
# 3,
# "test test test record five."
# ],
# [
# 7,
# 3,
# "test test test record seven."
# ],
# [
# 8,
# 2,
# "test test record eight."
# ],
# [
# 3,
# 2,
# "test test record three."
# ],
# [
# 9,
# 2,
# "test test record nine."
# ],
# [
# 1,
# 1,
# "This is test record 1!"
# ],
# [
# 4,
# 1,
# "test record four."
# ],
# [
# 2,
# 1,
# "test record 2."
# ]
# ]
# ]
# ]

If you want to specify more than one column, please separate the column names with commas
(','). In such a case, the search result is sorted by the values in the first column, and
then records having the same value in the first column are sorted by the values in the
second column.

Execution example:

select --table Site --query title:@test --output_columns _id,_score,title --sortby -_score,_id
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 6,
# 4,
# "test test test test record six."
# ],
# [
# 5,
# 3,
# "test test test record five."
# ],
# [
# 7,
# 3,
# "test test test record seven."
# ],
# [
# 3,
# 2,
# "test test record three."
# ],
# [
# 8,
# 2,
# "test test record eight."
# ],
# [
# 9,
# 2,
# "test test record nine."
# ],
# [
# 1,
# 1,
# "This is test record 1!"
# ],
# [
# 2,
# 1,
# "test record 2."
# ],
# [
# 4,
# 1,
# "test record four."
# ]
# ]
# ]
# ]
Footnote

[1] Currently, a match_columns parameter is available only if there exists an inverted
index for full text search. A match_columns parameter for a regular column is not
supported.

Remote access
You can use Groonga as a server which allows remote access. Groonga supports its original
protocol (GQTP), the memcached binary protocol and HTTP.

Hypertext transfer protocol (HTTP)
How to run an HTTP server
Groonga supports the hypertext transfer protocol (HTTP). The following form shows how to
run Groonga as an HTTP server daemon.

Form:

groonga [-p PORT_NUMBER] -d --protocol http DB_PATH

The --protocol option and its argument specify the protocol of the server. "http"
specifies to use HTTP. If the -p option is not specified, Groonga uses the default port
number 10041.

The following command runs an HTTP server that listens on the port number 80.

Execution example:

% sudo groonga -p 80 -d --protocol http /tmp/groonga-databases/introduction.db
%

NOTE:
You must have root privileges to listen on port number 80 (a well known port). There is
no such limitation for port numbers 1024 and over.

How to send a command to an HTTP server
You can send a command to an HTTP server by sending a GET request to /d/COMMAND_NAME.
Command parameters can be passed as parameters of the GET request. The format is
"?NAME_1=VALUE_1&NAME_2=VALUE_2&...".

The following example shows how to send commands to an HTTP server.

Execution example:

http://HOST_NAME_OR_IP_ADDRESS[:PORT_NUMBER]/d/status
Executed command:
status
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "uptime": 0,
# "max_command_version": 2,
# "n_queries": 0,
# "cache_hit_rate": 0.0,
# "version": "5.0.6-128-g8029ddb",
# "alloc_count": 185,
# "command_version": 1,
# "starttime": 1439995935,
# "default_command_version": 1
# }
# ]
http://HOST_NAME_OR_IP_ADDRESS[:PORT_NUMBER]/d/select?table=Site&query=title:@this
Executed command:
select --table Site --query title:@this
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/",
# "japan",
# ".org",
# "http://example.net/",
# [
# "http://example.net/",
# "http://example.org/",
# "http://example.com/"
# ],
# "128452975x503157902",
# "This is test record 1!"
# ]
# ]
# ]
# ]

Administration tool (HTTP)
An HTTP server of Groonga provides a browser based administration tool that makes database
management easy. After starting an HTTP server, you can use the administration tool by
accessing http://HOST_NAME_OR_IP_ADDRESS[:PORT_NUMBER]/. Note that JavaScript must be
enabled for the tool to work properly.

Security issues
Groonga servers don't support user authentication: everyone can view and modify databases
hosted by Groonga servers. We recommend restricting the IP addresses that can access
Groonga servers. You can use iptables or similar tools for this purpose.
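
As a minimal sketch, the following iptables rules accept connections to the default
Groonga port 10041 only from a trusted subnet (192.168.0.0/24 is just an example) and
drop all other connections:

% sudo iptables -A INPUT -p tcp --dport 10041 -s 192.168.0.0/24 -j ACCEPT
% sudo iptables -A INPUT -p tcp --dport 10041 -j DROP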

Various data types
Groonga is a full text search engine but also serves as a column-oriented data store.
Groonga supports various data types, such as numeric types, string types, date and time
type, longitude and latitude types, etc. This tutorial shows a list of data types and
explains how to use them.

Overview
The basic data types of Groonga are roughly divided into 5 groups --- the boolean type,
numeric types, string types, the date/time type and longitude/latitude types. The numeric
types are further divided according to whether they hold integers or floating point
numbers, whether they are signed or unsigned, and the number of bits allocated to each
integer. The string types are further divided according to the maximum length. The
longitude/latitude types are further divided according to the geographic coordinate
system. For more details, see /reference/types.

In addition, Groonga supports reference types and vector types. Reference types are
designed for accessing other tables. Vector types are designed for storing a variable
number of values in one element.

First, let's create a table for this tutorial.

Execution example:

table_create --name ToyBox --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

Boolean type
The boolean type is used to store true or false. To create a boolean type column, specify
Bool as the type parameter of the /reference/commands/column_create command. The default
value of the boolean type is false.

The following example creates a boolean type column and adds three records. Note that the
third record has the default value because no value is specified.

Execution example:

column_create --table ToyBox --name is_animal --type Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table ToyBox
[
{"_key":"Monkey","is_animal":true}
{"_key":"Flower","is_animal":false}
{"_key":"Block"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
select --table ToyBox --output_columns _key,is_animal
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "is_animal",
# "Bool"
# ]
# ],
# [
# "Monkey",
# true
# ],
# [
# "Flower",
# false
# ],
# [
# "Block",
# false
# ]
# ]
# ]
# ]

Numeric types
The numeric types are divided into integer types and a floating point number type. The
integer types are further divided into the signed integer types and unsigned integer
types. In addition, you can choose the number of bits allocated to each integer. For more
details, see /reference/types. The default value of the numeric types is 0.

The following example creates an Int8 column and a Float column, and then updates existing
records. The /reference/commands/load command updates the weight column as expected. On
the other hand, the price column values differ from the specified values because 15.9 is
not an integer and 200 is too large for Int8. 15.9 is converted to 15 by removing the
fractional part. 200 causes an overflow and the result becomes -56, because 200 does not
fit into a signed 8-bit integer (200 - 256 = -56). Note that the result of an
overflow/underflow is undefined.

Execution example:

column_create --table ToyBox --name price --type Int8
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table ToyBox --name weight --type Float
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table ToyBox
[
{"_key":"Monkey","price":15.9}
{"_key":"Flower","price":200,"weight":0.13}
{"_key":"Block","weight":25.7}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
select --table ToyBox --output_columns _key,price,weight
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "price",
# "Int8"
# ],
# [
# "weight",
# "Float"
# ]
# ],
# [
# "Monkey",
# 15,
# 0.0
# ],
# [
# "Flower",
# -56,
# 0.13
# ],
# [
# "Block",
# 0,
# 25.7
# ]
# ]
# ]
# ]

String types
The string types are divided according to the maximum length. For more details, see
/reference/types. The default value is the zero-length string.

The following example creates a ShortText column and updates existing records. The third
record ("Block" key record) has the default value (zero-length string) because it's not
updated.

Execution example:

column_create --table ToyBox --name name --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table ToyBox
[
{"_key":"Monkey","name":"Grease"}
{"_key":"Flower","name":"Rose"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select --table ToyBox --output_columns _key,name
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "name",
# "ShortText"
# ]
# ],
# [
# "Monkey",
# "Grease"
# ],
# [
# "Flower",
# "Rose"
# ],
# [
# "Block",
# ""
# ]
# ]
# ]
# ]

Date and time type
The date and time type of Groonga is Time. Actually, a Time column stores a date and time
as the number of microseconds since the Epoch, 1970-01-01 00:00:00. A Time value can
represent a date and time before the Epoch because the actual data type is a signed
integer. Note that the /reference/commands/load and /reference/commands/select commands
use a decimal number to represent a date and time in seconds. The default value is 0.0,
which means the Epoch.

NOTE:
Groonga internally holds a Time value as a pair of integers: the first represents
seconds and the second represents microseconds. Accordingly, Groonga shows a Time
value as a floating point number whose integral part is the seconds and whose
fractional part is the microseconds.
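
For example, the Time value 1234567890.123457 represents 1234567890 seconds and 123457
microseconds after the Epoch, that is, 2009-02-13 23:31:30.123457 UTC.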

The following example creates a Time column and updates existing records. The first record
("Monkey" key record) has the default value (0.0) because it's not updated.

Execution example:

column_create --table ToyBox --name time --type Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table ToyBox
[
{"_key":"Flower","time":1234567890.1234569999}
{"_key":"Block","time":-1234567890}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select --table ToyBox --output_columns _key,time
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "time",
# "Time"
# ]
# ],
# [
# "Monkey",
# 0.0
# ],
# [
# "Flower",
# 1234567890.12346
# ],
# [
# "Block",
# -1234567890.0
# ]
# ]
# ]
# ]

Longitude and latitude types
The longitude and latitude types are divided according to the geographic coordinate
system. For more details, see /reference/types. To represent a longitude and latitude,
Groonga uses a string formatted as follows:

· "longitude x latitude" in milliseconds (e.g.: "128452975x503157902")

· "longitude x latitude" in degrees (e.g.: "35.6813819x139.7660839")

A number with/without a decimal point represents a longitude or latitude in
milliseconds/degrees respectively. Note that a combination of a number with a decimal
point and a number without a decimal point (e.g. 35.1x139) must not be used. A comma (',')
is also available as a delimiter. The default value is "0x0".
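
A degree value is converted to milliseconds by multiplying by 3,600,000 (60 minutes x 60
seconds x 1,000 milliseconds). For example, 35.6813819 degrees x 3,600,000 = 128452975
milliseconds (rounded), which is why the millisecond and degree notations in the example
below are stored as the same value.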

The following example creates a WGS84GeoPoint column and updates existing records. The
second record ("Flower" key record) has the default value ("0x0") because it's not
updated.

Execution example:

column_create --table ToyBox --name location --type WGS84GeoPoint
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table ToyBox
[
{"_key":"Monkey","location":"128452975x503157902"}
{"_key":"Block","location":"35.6813819x139.7660839"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select --table ToyBox --output_columns _key,location
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ]
# ],
# [
# "Monkey",
# "128452975x503157902"
# ],
# [
# "Flower",
# "0x0"
# ],
# [
# "Block",
# "128452975x503157902"
# ]
# ]
# ]
# ]

Reference types
Groonga supports a reference column, which stores references to records in its associated
table. In practice, a reference column stores the IDs of the referred records in the
associated table and enables access to those records.

You can specify a column of the associated table in the output_columns parameter of the
/reference/commands/select command. The format is Src.Dest, where Src is the name of the
reference column and Dest is the name of the target column. If only the reference column
is specified, it is handled as Src._key. Note that if a reference does not point to a
valid record, the /reference/commands/select command outputs the default value of the
target column.

The following example adds a reference column to the Site table that was created in
tutorial-introduction-create-table. The new column, named link, is designed for storing
links among records in the Site table.

Execution example:

column_create --table Site --name link --type Site
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Site
[
{"_key":"http://example.org/","link":"http://example.net/"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select --table Site --output_columns _key,title,link._key,link.title --query title:@this
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ],
# [
# "link._key",
# "ShortText"
# ],
# [
# "link.title",
# "ShortText"
# ]
# ],
# [
# "http://example.org/",
# "This is test record 1!",
# "http://example.net/",
# "test record 2."
# ]
# ]
# ]
# ]

The type parameter of the /reference/commands/column_create command specifies the table to
be associated with the reference column. In this example, the reference column is
associated with its own table. Then, the /reference/commands/load command registers a link
from "http://example.org/" to "http://example.net/". Note that a reference column requires
the primary key, not the ID, of the record to be referred to. After that, the link is
confirmed by the /reference/commands/select command. In this case, the primary key and the
title of the referred record are output because link._key and link.title are specified in
the output_columns parameter.

Vector types
Groonga supports a vector column, in which each element can store a variable number of
values. To create a vector column, specify the COLUMN_VECTOR flag to the flags parameter
of a /reference/commands/column_create command. A vector column is useful to represent a
many-to-many relationship.

The previous example used a regular column, so each record could have at most one link.
Obviously, this is insufficient because a site usually has more than one link. To solve
this problem, the following example uses a vector column.

Execution example:

column_create --table Site --name links --flags COLUMN_VECTOR --type Site
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Site
[
{"_key":"http://example.org/","links":["http://example.net/","http://example.org/","http://example.com/"]},
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select --table Site --output_columns _key,title,links._key,links.title --query title:@this
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ],
# [
# "links._key",
# "ShortText"
# ],
# [
# "links.title",
# "ShortText"
# ]
# ],
# [
# "http://example.org/",
# "This is test record 1!",
# [
# "http://example.net/",
# "http://example.org/",
# "http://example.com/"
# ],
# [
# "test record 2.",
# "This is test record 1!",
# "test test record three."
# ]
# ]
# ]
# ]
# ]

The only difference at the first step is the flags parameter, which specifies that a
vector column be created. The type parameter of the /reference/commands/column_create
command is the same as in the previous example. Then, the /reference/commands/load command
registers three links from "http://example.org/" to "http://example.net/",
"http://example.org/" and "http://example.com/". After that, the links are confirmed by
the /reference/commands/select command. In this case, the primary keys and the titles are
output as arrays because links._key and links.title are specified in the output_columns
parameter.

Various search conditions
Groonga supports narrowing down search results with a JavaScript-like syntax and sorting
them by calculated values. Additionally, Groonga can narrow down and sort search results
by location information (latitude and longitude).

Narrow down & full-text search with a JavaScript-like syntax
The filter parameter of the select command accepts a search condition. The difference from
the query parameter is that the condition of the filter parameter is written in a
JavaScript-like syntax.

Execution example:

select --table Site --filter "_id <= 1" --output_columns _id,_key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/"
# ]
# ]
# ]
# ]

Let's look at the above query in detail. Here is the condition specified in the filter
parameter:

_id <= 1

In this case, the query returns the records whose _id value is equal to or less than 1.

Moreover, you can use && for AND search and || for OR search.

Execution example:

select --table Site --filter "_id >= 4 && _id <= 6" --output_columns _id,_key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# 4,
# "http://example.net/afr"
# ],
# [
# 5,
# "http://example.org/aba"
# ],
# [
# 6,
# "http://example.com/rab"
# ]
# ]
# ]
# ]
select --table Site --filter "_id <= 2 || _id >= 7" --output_columns _id,_key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://example.org/"
# ],
# [
# 2,
# "http://example.net/"
# ],
# [
# 7,
# "http://example.net/atv"
# ],
# [
# 8,
# "http://example.org/gat"
# ],
# [
# 9,
# "http://example.com/vdw"
# ]
# ]
# ]
# ]

If you specify both the query parameter and the filter parameter, you get the records that
meet both conditions.
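
For example, the following sketch (not executed here) combines full-text search on the
title column via the query parameter with a range condition via the filter parameter:

select --table Site --query title:@this --filter "_id <= 4" --output_columns _id,_key,title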

Sort by using scorer
The select command accepts a scorer parameter, which is an expression applied to each
record of the full-text search results.

Like the filter parameter, this parameter accepts an expression written in a
JavaScript-like syntax.

Execution example:

select --table Site --filter "true" --scorer "_score = rand()" --output_columns _id,_key,_score --sortby _score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# 6,
# "http://example.com/rab",
# 424238335
# ],
# [
# 9,
# "http://example.com/vdw",
# 596516649
# ],
# [
# 7,
# "http://example.net/atv",
# 719885386
# ],
# [
# 2,
# "http://example.net/",
# 846930886
# ],
# [
# 8,
# "http://example.org/gat",
# 1649760492
# ],
# [
# 3,
# "http://example.com/",
# 1681692777
# ],
# [
# 4,
# "http://example.net/afr",
# 1714636915
# ],
# [
# 1,
# "http://example.org/",
# 1804289383
# ],
# [
# 5,
# "http://example.org/aba",
# 1957747793
# ]
# ]
# ]
# ]
select --table Site --filter "true" --scorer "_score = rand()" --output_columns _id,_key,_score --sortby _score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# 4,
# "http://example.net/afr",
# 783368690
# ],
# [
# 2,
# "http://example.net/",
# 1025202362
# ],
# [
# 5,
# "http://example.org/aba",
# 1102520059
# ],
# [
# 1,
# "http://example.org/",
# 1189641421
# ],
# [
# 3,
# "http://example.com/",
# 1350490027
# ],
# [
# 8,
# "http://example.org/gat",
# 1365180540
# ],
# [
# 9,
# "http://example.com/vdw",
# 1540383426
# ],
# [
# 7,
# "http://example.net/atv",
# 1967513926
# ],
# [
# 6,
# "http://example.com/rab",
# 2044897763
# ]
# ]
# ]
# ]

_score is one of the pseudo columns. The score of full-text search is assigned to it. See
/reference/columns/pseudo for details about the _score column.

In the above query, the expression given to the scorer parameter is:

_score = rand()

In this case, the score of the full-text search is overwritten by the value of the rand()
function.

The expression given to the sortby parameter is:

_score

This means that the search result is sorted by _score in ascending order.

As a result, the order of the search results is randomized.
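
To sort in descending order instead, prefix the column name in the sortby parameter with
a minus sign, as in the geolocation examples later in this tutorial:

select --table Site --filter "true" --scorer "_score = rand()" --output_columns _id,_key,_score --sortby -_score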

Narrow down & sort by using location information
Groonga supports to store location information (Longitude & Latitude) and not only narrow
down but also sort by using it.

Groonga supports two kind of column types to store location information. One is
TokyoGeoPoint, the other is WGS84GeoPoint. TokyoGeoPoint is used for Japan geodetic
system. WGS84GeoPoint is used for world geodetic system.

Specify longitude and latitude as follows:

· "[latitude in milliseconds]x[longitude in milliseconds]"(e.g.: "128452975x503157902")

· "[latitude in milliseconds],[longitude in milliseconds]"(e.g.: "128452975,503157902")

· "[latitude in degrees]x[longitude in degrees]"(e.g.: "35.6813819x139.7660839")

· "[latitude in degrees],[longitude in degrees]"(e.g.: "35.6813819,139.7660839")

Let's store the locations of two stations in Japan in WGS 84: Tokyo station and Shinjuku
station. The latitude of Tokyo station is 35 degrees 40 minutes 52.975 seconds and its
longitude is 139 degrees 45 minutes 57.902 seconds. The latitude of Shinjuku station is 35
degrees 41 minutes 27.316 seconds and its longitude is 139 degrees 42 minutes 0.929
seconds. Thus, their locations in milliseconds are "128452975x503157902" and
"128487316x502920929" respectively, and in degrees "35.6813819x139.7660839" and
"35.6909211x139.7002581" respectively.

Let's register the location information in milliseconds.

Execution example:

column_create --table Site --name location --type WGS84GeoPoint
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Site
[
{"_key":"http://example.org/","location":"128452975x503157902"}
{"_key":"http://example.net/","location":"128487316x502920929"},
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select --table Site --query "_id:1 OR _id:2" --output_columns _key,location
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ]
# ],
# [
# "http://example.org/",
# "128452975x503157902"
# ],
# [
# "http://example.net/",
# "128487316x502920929"
# ]
# ]
# ]
# ]

Then assign the geo distance calculated by the /reference/functions/geo_distance function
to _score in the scorer parameter.

Let's show the geo distance from Akihabara station in Japan. In the world geodetic system,
the latitude of Akihabara station is 35 degrees 41 minutes 55.259 seconds and its
longitude is 139 degrees 46 minutes 27.188 seconds. So specify "128515259x503187188" as
the second argument of the geo_distance function.

Execution example:

select --table Site --query "_id:1 OR _id:2" --output_columns _key,location,_score --scorer '_score = geo_distance(location, "128515259x503187188")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://example.org/",
# "128452975x503157902",
# 2054
# ],
# [
# "http://example.net/",
# "128487316x502920929",
# 6720
# ]
# ]
# ]
# ]

As you can see, the geo distance between Tokyo station and Akihabara station is 2054
meters, and the geo distance between Akihabara station and Shinjuku station is 6720
meters.

The return value of the geo_distance function can also be used for sorting, by specifying
the pseudo column _score in the sortby parameter.

Execution example:

select --table Site --query "_id:1 OR _id:2" --output_columns _key,location,_score --scorer '_score = geo_distance(location, "128515259x503187188")' --sortby -_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://example.net/",
# "128487316x502920929",
# 6720
# ],
# [
# "http://example.org/",
# "128452975x503157902",
# 2054
# ]
# ]
# ]
# ]

Groonga can also narrow down results to the records within a specified distance in meters
from a certain point.

In such a case, use the /reference/functions/geo_in_circle function in the filter
parameter.

For example, let's search for the records that exist within 5000 meters of Akihabara
station.

Execution example:

select --table Site --output_columns _key,location --filter 'geo_in_circle(location, "128515259x503187188", 5000)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ]
# ],
# [
# "http://example.org/",
# "128452975x503157902"
# ]
# ]
# ]
# ]

There is also the /reference/functions/geo_in_rectangle function, which searches for
points within a specified rectangular region.
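
As a minimal sketch (not executed here), geo_in_rectangle takes the point column, the
top-left corner and the bottom-right corner of the rectangle; the corner values below are
chosen only for illustration:

select --table Site --output_columns _key,location --filter 'geo_in_rectangle(location, "128515259x503157902", "128452975x503187188")'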

Drilldown
You learned how to search and sort search results in the previous sections. Now you can
search as you like, but how do you count the records that have a specific value in a
column?

A naive solution is to execute one query per column value and count the records in each
result. It is simple, but it is not reasonable when there are many values.

If you are familiar with SQL, you may wonder, "Is there functionality similar to SQL's
GROUP BY in Groonga?"

Of course, Groonga provides such functionality. It is called drilldown.

Drilldown enables you to get the number of records that belong to each value of a column
at once.

To illustrate this feature, imagine classifying sites by domain and grouping them by the
country the domain belongs to.

Here are concrete examples of how to use this feature.

In this example, we add two columns to the Site table. The domain column is used for the
TLD (top level domain) and the country column is used for the country name. The types of
these columns are the SiteDomain table, which uses the domain name as its primary key, and
the SiteCountry table, which uses the country name as its primary key.

Execution example:

table_create --name SiteDomain --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create --name SiteCountry --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Site --name domain --flags COLUMN_SCALAR --type SiteDomain
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Site --name country --flags COLUMN_SCALAR --type SiteCountry
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Site
[
{"_key":"http://example.org/","domain":".org","country":"japan"},
{"_key":"http://example.net/","domain":".net","country":"brazil"},
{"_key":"http://example.com/","domain":".com","country":"japan"},
{"_key":"http://example.net/afr","domain":".net","country":"usa"},
{"_key":"http://example.org/aba","domain":".org","country":"korea"},
{"_key":"http://example.com/rab","domain":".com","country":"china"},
{"_key":"http://example.net/atv","domain":".net","country":"china"},
{"_key":"http://example.org/gat","domain":".org","country":"usa"},
{"_key":"http://example.com/vdw","domain":".com","country":"japan"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 9]

Here is an example of drilldown with the domain column. Three kinds of values are used in
the domain column: ".org", ".net" and ".com".

Execution example:

select --table Site --limit 0 --drilldown domain
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# ".org",
# 3
# ],
# [
# ".net",
# 3
# ],
# [
# ".com",
# 3
# ]
# ]
# ]
# ]

Here is a summary of the above query.

Drilldown by domain column
┌─────────┬───────────────────────┬───────────────────────────┐
│Group by │ Number of records in  │ Records in the group      │
│         │ the group             │                           │
├─────────┼───────────────────────┼───────────────────────────┤
│.org     │ 3                     │ · http://example.org/     │
│         │                       │ · http://example.org/aba  │
│         │                       │ · http://example.org/gat  │
├─────────┼───────────────────────┼───────────────────────────┤
│.net     │ 3                     │ · http://example.net/     │
│         │                       │ · http://example.net/afr  │
│         │                       │ · http://example.net/atv  │
├─────────┼───────────────────────┼───────────────────────────┤
│.com     │ 3                     │ · http://example.com/     │
│         │                       │ · http://example.com/rab  │
│         │                       │ · http://example.com/vdw  │
└─────────┴───────────────────────┴───────────────────────────┘

The drilldown counts are returned in the _nsubrecs column. In this case, the Site table is
grouped by the ".org", ".net" and ".com" domains. _nsubrecs shows that each of the three
domains has three records.

If you drill down on a column whose type is a table, you can get the values of the columns
stored in the referenced table. The _nsubrecs pseudo column is added to the table used for
drilldown; it stores the number of records in each group.

Next, let's investigate the referenced table in detail. As the Site table uses the
SiteDomain table as the type of its domain column, you can use --drilldown_output_columns
to see the details of the referenced columns.

Execution example:

select --table Site --limit 0 --drilldown domain --drilldown_output_columns _id,_key,_nsubrecs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# 1,
# ".org",
# 3
# ],
# [
# 2,
# ".net",
# 3
# ],
# [
# 3,
# ".com",
# 3
# ]
# ]
# ]
# ]

Now that you can see the details of each grouped domain, let's drill down by the country
column for the group whose domain value is ".org" (domain._id == 1).

Execution example:

select --table Site --limit 0 --filter "domain._id == 1" --drilldown country
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "japan",
# 1
# ],
# [
# "korea",
# 1
# ],
# [
# "usa",
# 1
# ]
# ]
# ]
# ]

Drilldown with multiple columns
The drilldown feature supports multiple columns. Specify comma-separated column names in
the drilldown parameter. You get the drilldown result for each column at once.

Execution example:

select --table Site --limit 0 --drilldown domain,country
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# ".org",
# 3
# ],
# [
# ".net",
# 3
# ],
# [
# ".com",
# 3
# ]
# ],
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "japan",
# 3
# ],
# [
# "brazil",
# 1
# ],
# [
# "usa",
# 2
# ],
# [
# "korea",
# 1
# ],
# [
# "china",
# 2
# ]
# ]
# ]
# ]

Sorting drilldown results
Use --drilldown_sortby if you want to sort the drilldown results. For example, sort by
_nsubrecs in ascending order.

Execution example:

select --table Site --limit 0 --drilldown country --drilldown_sortby _nsubrecs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "brazil",
# 1
# ],
# [
# "korea",
# 1
# ],
# [
# "usa",
# 2
# ],
# [
# "china",
# 2
# ],
# [
# "japan",
# 3
# ]
# ]
# ]
# ]

Limiting drilldown results
The number of drilldown results is limited to 10 by default. Use the drilldown_limit and
drilldown_offset parameters to customize the drilldown output.

Execution example:

select --table Site --limit 0 --drilldown country --drilldown_sortby _nsubrecs --drilldown_limit 2 --drilldown_offset 2
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "usa",
# 2
# ],
# [
# "china",
# 2
# ]
# ]
# ]
# ]

Note that drilldown on a column that stores strings is slower than drilldown on columns
that store other types. If you need to drill down on string values, create a table whose
primary key type is a string type, then create a column that refers to that table and
drill down on that column.
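
As a minimal sketch of this schema pattern (the Items and Tags table names below are
hypothetical, not part of this tutorial's schema):

table_create --name Tags --flags TABLE_HASH_KEY --key_type ShortText
table_create --name Items --flags TABLE_HASH_KEY --key_type ShortText
column_create --table Items --name tag --flags COLUMN_SCALAR --type Tags

A drilldown on the tag column of Items then groups records by the Tags table instead of
by raw strings.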

Tag search and reverse resolution of reference relationships
As you know, Groonga can store an array of references to another table in a vector column.
In fact, such vector data enables tag search.

Tag search is very fast because Groonga uses an inverted index as its data structure.

Tag search
Let's consider creating a search engine for a web site that shares movies. Each movie may
be associated with multiple keywords that represent its content.

Let's create tables for the movie information, then search the movies.

First, create the Video table, which stores movie information. The Video table has two
columns: the title column stores the title of the movie, and the tags column stores
multiple tags that refer to the Tag table.

Next, create the Tag table, which stores tag information. The Tag table has one column.
The tag string is stored as the primary key, and the index_tags column stores the index
for the tags column of the Video table.

Execution example:

table_create --name Video --flags TABLE_HASH_KEY --key_type UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create --name Tag --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Video --name title --flags COLUMN_SCALAR --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Video --name tags --flags COLUMN_VECTOR --type Tag
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Tag --name index_tags --flags COLUMN_INDEX --type Video --source tags
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Video
[
{"_key":1,"title":"Soccer 2010","tags":["Sports","Soccer"]},
{"_key":2,"title":"Zenigata Kinjirou","tags":["Variety","Money"]},
{"_key":3,"title":"groonga Demo","tags":["IT","Server","groonga"]},
{"_key":4,"title":"Moero!! Ultra Baseball","tags":["Sports","Baseball"]},
{"_key":5,"title":"Hex Gone!","tags":["Variety","Quiz"]},
{"_key":6,"title":"Pikonyan 1","tags":["Animation","Pikonyan"]},
{"_key":7,"title":"Draw 8 Month","tags":["Animation","Raccoon"]},
{"_key":8,"title":"K.O.","tags":["Animation","Music"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 8]

After creating the index column, you can search very fast. The index column is also
updated automatically when the stored data changes.

Let's list the movies that are tagged with specific keywords.

Execution example:

select --table Video --query tags:@Variety --output_columns _key,title
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "UInt32"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 2,
# "Zenigata Kinjirou"
# ],
# [
# 5,
# "Hex Gone!"
# ]
# ]
# ]
# ]
select --table Video --query tags:@Sports --output_columns _key,title
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "UInt32"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "Soccer 2010"
# ],
# [
# 4,
# "Moero!! Ultra Baseball"
# ]
# ]
# ]
# ]
select --table Video --query tags:@Animation --output_columns _key,title
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "UInt32"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 6,
# "Pikonyan 1"
# ],
# [
# 7,
# "Draw 8 Month"
# ],
# [
# 8,
# "K.O."
# ]
# ]
# ]
# ]

You can search by tags such as "Variety", "Sports" and "Animation".

Reverse resolution of reference relationships
Groonga supports indexes for reverse resolution of references among tables. Tag search is
one concrete example of this.

For example, in a social networking site you can search friendships by reverse resolution.

The following example creates a User table that stores user information: a username column
that stores the user name, a friends column that stores the list of the user's friends as
an array, and an index_friends column as the index for the friends column.

Execution example:

table_create --name User --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table User --name username --flags COLUMN_SCALAR --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table User --name friends --flags COLUMN_VECTOR --type User
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table User --name index_friends --flags COLUMN_INDEX --type User --source friends
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table User
[
{"_key":"ken","username":"健作","friends":["taro","jiro","tomo","moritapo"]}
{"_key":"moritapo","username":"森田","friends":["ken","tomo"]}
{"_key":"taro","username":"ぐるんが太郎","friends":["jiro","tomo"]}
{"_key":"jiro","username":"ぐるんが次郎","friends":["taro","tomo"]}
{"_key":"tomo","username":"トモちゃん","friends":["ken","hana"]}
{"_key":"hana","username":"花子","friends":["ken","taro","jiro","moritapo","tomo"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 6]

Let's list the users whose friend lists contain a specified user.

Execution example:

select --table User --query friends:@tomo --output_columns _key,username
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "username",
# "ShortText"
# ]
# ],
# [
# "ken",
# "健作"
# ],
# [
# "taro",
# "ぐるんが太郎"
# ],
# [
# "jiro",
# "ぐるんが次郎"
# ],
# [
# "moritapo",
# "森田"
# ],
# [
# "hana",
# "花子"
# ]
# ]
# ]
# ]
select --table User --query friends:@jiro --output_columns _key,username
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "username",
# "ShortText"
# ]
# ],
# [
# "ken",
# "健作"
# ],
# [
# "taro",
# "ぐるんが太郎"
# ],
# [
# "hana",
# "花子"
# ]
# ]
# ]
# ]

Then drill down to count how many times each user is listed as a friend.

Execution example:

select --table User --limit 0 --drilldown friends
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 6
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "friends",
# "User"
# ],
# [
# "index_friends",
# "UInt32"
# ],
# [
# "username",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 6
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "taro",
# 3
# ],
# [
# "jiro",
# 3
# ],
# [
# "tomo",
# 5
# ],
# [
# "moritapo",
# 2
# ],
# [
# "ken",
# 3
# ],
# [
# "hana",
# 1
# ]
# ]
# ]
# ]

As you can see, the results follow the reverse resolution of the reference relationships.

Geo location search with index
Groonga can add indexes to columns that store geolocation information. Groonga is very
fast because it uses such indexes to search an enormous number of records by geolocation.

Execution example:

table_create --name GeoSite --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table GeoSite --name location --type WGS84GeoPoint
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create --name GeoIndex --flags TABLE_PAT_KEY --key_type WGS84GeoPoint
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table GeoIndex --name index_point --type GeoSite --flags COLUMN_INDEX --source location
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table GeoSite
[
{"_key":"http://example.org/","location":"128452975x503157902"},
{"_key":"http://example.net/","location":"128487316x502920929"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select --table GeoSite --filter 'geo_in_circle(location, "128515259x503187188", 5000)' --output_columns _key,location
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ]
# ],
# [
# "http://example.org/",
# "128452975x503157902"
# ]
# ]
# ]
# ]

These indexes are also used when sorting the records of a geolocation search.

Execution example:

select --table GeoSite --filter 'geo_in_circle(location, "128515259x503187188", 50000)' --output_columns _key,location,_score --sortby '-geo_distance(location, "128515259x503187188")' --scorer '_score = geo_distance(location, "128515259x503187188")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://example.org/",
# "128452975x503157902",
# 2054
# ],
# [
# "http://example.net/",
# "128487316x502920929",
# 6720
# ]
# ]
# ]
# ]

match_columns parameter
Full-text search against multiple columns
Groonga supports full-text search against multiple columns. Let's consider a blog site.
Usually, a blog site has a table with a title column and a content column. How do you
search for the blog entries that contain specified keywords in the title or the content?

In such a case, there are two ways to create indexes. One way is to create a column index
for each column. The other way is to create one column index for multiple columns. Either
way, Groonga supports the same full-text search syntax.

Creating a column index for each column
Here is an example that creates a column index for each column.

First, create the Blog1 table, and add a title column, which stores the title string, and
a message column, which stores the content of a blog entry.

Then create the IndexBlog1 table for the column indexes, and add an index_title column for
the title column and an index_message column for the message column.

Execution example:

table_create --name Blog1 --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Blog1 --name title --flags COLUMN_SCALAR --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Blog1 --name message --flags COLUMN_SCALAR --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create --name IndexBlog1 --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table IndexBlog1 --name index_title --flags COLUMN_INDEX|WITH_POSITION --type Blog1 --source title
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table IndexBlog1 --name index_message --flags COLUMN_INDEX|WITH_POSITION --type Blog1 --source message
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Blog1
[
{"_key":"grn1","title":"Groonga test","message":"Groonga message"},
{"_key":"grn2","title":"baseball result","message":"rakutan eggs 4 - 4 Groonga moritars"},
{"_key":"grn3","title":"Groonga message","message":"none"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

The match_columns option of the select command accepts multiple columns as search targets,
and the query option accepts the query string. With these, you can run full-text search
over both the title and the content of blog entries.

Let's try searching blog entries.

Execution example:

select --table Blog1 --match_columns title||message --query groonga
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "grn1",
# "Groonga message",
# "Groonga test"
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ],
# [
# 2,
# "grn2",
# "rakutan eggs 4 - 4 Groonga moritars",
# "baseball result"
# ]
# ]
# ]
# ]
select --table Blog1 --match_columns title||message --query message
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ],
# [
# 1,
# "grn1",
# "Groonga message",
# "Groonga test"
# ]
# ]
# ]
# ]
select --table Blog1 --match_columns title --query message
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ]
# ]
# ]
# ]

Creating one column index for multiple columns
Groonga also supports one column index for multiple columns.

The difference from the previous example is that only one column index exists: there is
one common column index for the title and message columns. Note that such a shared index
is created with the WITH_SECTION flag so that it can record which source column each
posting came from.

Even though the same column index is used, Groonga can search against the title column
only, the message column only, or the title or message columns.

Execution example:

table_create --name Blog2 --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Blog2 --name title --flags COLUMN_SCALAR --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Blog2 --name message --flags COLUMN_SCALAR --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create --name IndexBlog2 --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table IndexBlog2 --name index_blog --flags COLUMN_INDEX|WITH_POSITION|WITH_SECTION --type Blog2 --source title,message
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Blog2
[
{"_key":"grn1","title":"Groonga test","message":"Groonga message"},
{"_key":"grn2","title":"baseball result","message":"rakutan eggs 4 - 4 Groonga moritars"},
{"_key":"grn3","title":"Groonga message","message":"none"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

Let's run the same queries as in the previous section. You get the same search results.

Execution example:

select --table Blog2 --match_columns title||message --query groonga
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "grn1",
# "Groonga message",
# "Groonga test"
# ],
# [
# 2,
# "grn2",
# "rakutan eggs 4 - 4 Groonga moritars",
# "baseball result"
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ]
# ]
# ]
# ]
select --table Blog2 --match_columns title||message --query message
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "grn1",
# "Groonga message",
# "Groonga test"
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ]
# ]
# ]
# ]
select --table Blog2 --match_columns title --query message
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ]
# ]
# ]
# ]

NOTE:
You may ask, "Which is the better solution for indexing?" It depends on the case.

· An index for each column - The update performance tends to be better than with a
multiple-column index because each index has its own buffer for updating. On the
other hand, the efficiency of disk usage is not so good.

· One index for multiple columns - It saves disk usage because the columns share a
common buffer. On the other hand, the update performance is not so good.

Full text search with specific index name
TODO

Nested index search among related tables by column index
If there are relationships among multiple tables with column indexes, you can search those
tables at once by specifying reference column names.

Here is a concrete example.

There are tables that store blog articles and comments for the articles. The table that
stores articles has columns for the article content and the comment, and the comment
column refers to the Comments table. The table that stores comments has a column for the
comment content and a column index to the articles table.

If you want to search for the articles that contain a specified keyword in their comments,
you would normally need to run a full-text search on the comments table first and then
search for the records that refer to the matched comments.

However, you can search such records in one step by using the index on the reference
column.

Here is the sample schema.

Execution example:

table_create Comments TABLE_HASH_KEY UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Comments content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Articles TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Articles content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Articles comment COLUMN_SCALAR Comments
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Lexicon TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon articles_content COLUMN_INDEX|WITH_POSITION Articles content
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon comments_content COLUMN_INDEX|WITH_POSITION Comments content
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Comments article COLUMN_INDEX Articles comment
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is the sample data.

Execution example:

load --table Comments
[
{"_key": 1, "content": "I'm using Groonga too!"},
{"_key": 2, "content": "I'm using Groonga and Mroonga!"},
{"_key": 3, "content": "I'm using Mroonga too!"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
load --table Articles
[
{"content": "Groonga is fast!", "comment": 1},
{"content": "Groonga is useful!"},
{"content": "Mroonga is fast!", "comment": 3}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

You can write a query that searches for the records containing a specified keyword as a
comment and then fetches the articles that refer to them.

Query for searching the records described above:

select Articles --match_columns comment.content --query groonga --output_columns "_id, _score, *"

You concatenate the comment column of the Articles table and the content column of the
Comments table with a period ( . ) as the --match_columns argument.

At first, this query executes a full-text search on the content of the Comments table,
then fetches the records of the Articles table that refer to the matched records of the
Comments table. (Because of this, if you omit the query that creates the article index
column of the Comments table, you can't get the intended search results.)

Execution example:

select Articles --match_columns comment.content --query groonga --output_columns "_id, _score, *"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ],
# [
# "comment",
# "Comments"
# ],
# [
# "content",
# "Text"
# ]
# ],
# [
# 1,
# 1,
# 1,
# "Groonga is fast!"
# ]
# ]
# ]
# ]

Now you can search for articles that contain specific keywords in their comments.

The nested index search feature is not limited to relationships between two tables.

Here is a sample schema similar to the previous one. The difference is an added table that
represents replies, so the relationship is extended to three tables.

Execution example:

table_create Replies2 TABLE_HASH_KEY UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Replies2 content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Comments2 TABLE_HASH_KEY UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Comments2 content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Comments2 comment COLUMN_SCALAR Replies2
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Articles2 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Articles2 content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Articles2 comment COLUMN_SCALAR Comments2
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Lexicon2 TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon2 articles_content COLUMN_INDEX|WITH_POSITION Articles2 content
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon2 comments_content COLUMN_INDEX|WITH_POSITION Comments2 content
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon2 replies_content COLUMN_INDEX|WITH_POSITION Replies2 content
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Comments2 article COLUMN_INDEX Articles2 comment
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Replies2 reply_to COLUMN_INDEX Comments2 comment
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is the sample data.

Execution example:

load --table Replies2
[
{"_key": 1, "content": "I'm using Rroonga too!"},
{"_key": 2, "content": "I'm using Groonga and Mroonga and Rroonga!"},
{"_key": 3, "content": "I'm using Nroonga too!"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
load --table Comments2
[
{"_key": 1, "content": "I'm using Groonga too!", "comment": 1},
{"_key": 2, "content": "I'm using Groonga and Mroonga!", "comment": 2},
{"_key": 3, "content": "I'm using Mroonga too!"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
load --table Articles2
[
{"content": "Groonga is fast!", "comment": 1},
{"content": "Groonga is useful!", "comment": 2},
{"content": "Mroonga is fast!", "comment": 3}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

Query for searching the records described above:

select Articles2 --match_columns comment.content --query mroonga --output_columns "_id, _score, *"
select Articles2 --match_columns comment.comment.content --query mroonga --output_columns "_id, _score, *"

The first query searches for mroonga in the Comments2 table; the second one searches for
mroonga in the Replies2 table through the Comments2 table by using the reference column
indexes.

Execution example:

select Articles2 --match_columns comment.content --query mroonga --output_columns "_id, _score, *"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ],
# [
# "comment",
# "Comments2"
# ],
# [
# "content",
# "Text"
# ]
# ],
# [
# 2,
# 1,
# 2,
# "Groonga is useful!"
# ],
# [
# 3,
# 1,
# 3,
# "Mroonga is fast!"
# ]
# ]
# ]
# ]
select Articles2 --match_columns comment.comment.content --query mroonga --output_columns "_id, _score, *"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ],
# [
# "comment",
# "Comments2"
# ],
# [
# "content",
# "Text"
# ]
# ],
# [
# 2,
# 1,
# 2,
# "Groonga is useful!"
# ]
# ]
# ]
# ]

As a result, the first query matches two articles because the Comments2 table has two
records that contain the keyword mroonga.

On the other hand, the second query matches only one article because the Replies2 table
has only one record that contains the keyword mroonga, and only one Comments2 record
refers to it.

Indexes with Weight
TODO

Prefix search with patricia trie
Groonga supports creating a table with the patricia trie option. With this option, you can
do prefix search.

Furthermore, you can do suffix search against the primary key by specifying an additional
option.

Prefix search by primary key
A table created with TABLE_PAT_KEY in the flags option of the table_create command
supports prefix search by primary key.

Execution example:

table_create --name PatPrefix --flags TABLE_PAT_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table PatPrefix
[
{"_key":"James"}
{"_key":"Jason"}
{"_key":"Jennifer"},
{"_key":"Jeff"},
{"_key":"John"},
{"_key":"Joseph"},
]
# [[0, 1337566253.89858, 0.000355720520019531], 6]
select --table PatPrefix --query _key:^Je
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# 3,
# "Jennifer"
# ],
# [
# 4,
# "Jeff"
# ]
# ]
# ]
# ]

Suffix search by primary key
A table created with TABLE_PAT_KEY and KEY_WITH_SIS in the flags option of the
table_create command supports both prefix search and suffix search by primary key.

If you set the KEY_WITH_SIS flag, records for suffix search are added automatically when
you load data. So a plain search hits the automatically added records in addition to the
original records. To search only the original records, you need a trick.

For example, to distinguish the original records from the automatically added ones, add an
original column indicating that a record is original, and add original == true to the
search condition. Note that you should use the --filter option here, because the --query
option cannot specify a Bool value intuitively.

Execution example:

table_create --name PatSuffix --flags TABLE_PAT_KEY|KEY_WITH_SIS --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table PatSuffix --name original --type Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table PatSuffix
[
{"_key":"ひろゆき","original":true},
{"_key":"まろゆき","original":true},
{"_key":"ひろあき","original":true},
{"_key":"ゆきひろ","original":true}
]
# [[0, 1337566253.89858, 0.000355720520019531], 4]
select --table PatSuffix --query _key:$ゆき
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "original",
# "Bool"
# ]
# ],
# [
# 3,
# "ゆき",
# false
# ],
# [
# 2,
# "ろゆき",
# false
# ],
# [
# 5,
# "まろゆき",
# true
# ],
# [
# 1,
# "ひろゆき",
# true
# ]
# ]
# ]
# ]
select --table PatSuffix --filter '_key @$ "ゆき" && original == true'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "original",
# "Bool"
# ]
# ],
# [
# 5,
# "まろゆき",
# true
# ],
# [
# 1,
# "ひろゆき",
# true
# ]
# ]
# ]
# ]

Additional information about the lexicon for full text search
Groonga manages the lexicon for full text search as a table. Thus, Groonga can hold
additional information for each lexicon entry, for example, the frequency of a word, a
stop word flag, the importance of a word and so on.

TODO: Write document.

Let's create micro-blog
Let's create a micro-blog with full text search powered by Groonga. A micro-blog is a
broadcast medium in the form of a blog, mainly used to post small messages as on Twitter.

Create a table
Let's create table.

table_create --name Users --flags TABLE_HASH_KEY --key_type ShortText
table_create --name Comments --flags TABLE_HASH_KEY --key_type ShortText
table_create --name HashTags --flags TABLE_HASH_KEY --key_type ShortText
table_create --name Bigram --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
table_create --name GeoIndex --flags TABLE_PAT_KEY --key_type WGS84GeoPoint

column_create --table Users --name name --flags COLUMN_SCALAR --type ShortText
column_create --table Users --name follower --flags COLUMN_VECTOR --type Users
column_create --table Users --name favorites --flags COLUMN_VECTOR --type Comments
column_create --table Users --name location --flags COLUMN_SCALAR --type WGS84GeoPoint
column_create --table Users --name location_str --flags COLUMN_SCALAR --type ShortText
column_create --table Users --name description --flags COLUMN_SCALAR --type ShortText
column_create --table Users --name followee --flags COLUMN_INDEX --type Users --source follower

column_create --table Comments --name comment --flags COLUMN_SCALAR --type ShortText
column_create --table Comments --name last_modified --flags COLUMN_SCALAR --type Time
column_create --table Comments --name replied_to --flags COLUMN_SCALAR --type Comments
column_create --table Comments --name replied_users --flags COLUMN_VECTOR --type Users
column_create --table Comments --name hash_tags --flags COLUMN_VECTOR --type HashTags
column_create --table Comments --name location --flags COLUMN_SCALAR --type WGS84GeoPoint
column_create --table Comments --name posted_by --flags COLUMN_SCALAR --type Users
column_create --table Comments --name favorited_by --flags COLUMN_INDEX --type Users --source favorites

column_create --table HashTags --name hash_index --flags COLUMN_INDEX --type Comments --source hash_tags

column_create --table Bigram --name users_index --flags COLUMN_INDEX|WITH_POSITION|WITH_SECTION --type Users --source name,location_str,description
column_create --table Bigram --name comment_index --flags COLUMN_INDEX|WITH_POSITION --type Comments --source comment

column_create --table GeoIndex --name users_location --type Users --flags COLUMN_INDEX --source location
column_create --table GeoIndex --name comments_location --type Comments --flags COLUMN_INDEX --source location

Users table
This is the table which stores user information: the user name, profile, follow list and
so on.

_key User ID

name User name

follower
List of users this user follows

favorites
List of favorite comments

location
Current location of the user (geolocation)

location_str
Current location of the user (string)

description
User profile

followee
Index for the follower column in the Users table. With this index, you can search
for the users who follow a given user.

Comments table
This is the table which stores comments and their metadata: the content of the comment,
the posted date, the comment it replies to, and so on.

_key Comment ID

comment
Content of the comment

last_modified
Posted date

replied_to
The comment this comment replies to

replied_users
List of users this comment replies to

hash_tags
List of hash tags attached to the comment

location
Posted place (geolocation)

posted_by
The user who wrote the comment

favorited_by
Index for the favorites column in the Users table. With this index, you can search
for the users who marked a comment as a favorite.

HashTags table
This is the table which stores hash tags for comments.

_key Hash tag

hash_index
Index for Comments.hash_tags. With this index, you can search for the comments
that have a specified hash tag.

Bigram table
This is the table which stores the indexes for full text search on user information and
comments.

_key Word

users_index
Indexes of user information. This column contains indexes of the user name
(Users.name), current location (Users.location_str) and profile (Users.description).

comment_index
Index of the content of comments (Comments.comment).

GeoIndex table
This is the table which stores indexes of the location columns to search by geolocation
efficiently.

users_location
Index of the location column of the Users table

comments_location
Index of the location column of the Comments table

Loading data
Next, load the example data.

load --table Users
[
{
"_key": "alice",
"name": "Alice",
"follower": ["bob"],
"favorites": [],
"location": "152489000x-255829000",
"location_str": "Boston, Massachusetts",
"description": "Groonga developer"
},
{
"_key": "bob",
"name": "Bob",
"follower": ["alice","charlie"],
"favorites": ["alice:1","charlie:1"],
"location": "146249000x-266228000",
"location_str": "Brooklyn, New York City",
"description": ""
},
{
"_key": "charlie",
"name": "Charlie",
"follower": ["alice","bob"],
"favorites": ["alice:1","bob:1"],
"location": "146607190x-267021260",
"location_str": "Newark, New Jersey",
"description": "Hmm,Hmm"
}
]

load --table Comments
[
{
"_key": "alice:1",
"comment": "I've created micro-blog!",
"last_modified": "2010/03/17 12:05:00",
"posted_by": "alice"
},
{
"_key": "bob:1",
"comment": "First post. test,test...",
"last_modified": "2010/03/17 12:00:00",
"posted_by": "bob"
},
{
"_key": "alice:2",
"comment": "@bob Welcome!!!",
"last_modified": "2010/03/17 12:05:00",
"replied_to": "bob:1",
"replied_users": ["bob"],
"posted_by": "alice"
},
{
"_key": "bob:2",
"comment": "@alice Thanks!",
"last_modified": "2010/03/17 13:00:00",
"replied_to": "alice:2",
"replied_users": ["alice"],
"posted_by": "bob"
},
{
"_key": "bob:3",
"comment": "I've just used 'Try-Groonga' now! #groonga",
"last_modified": "2010/03/17 14:00:00",
"hash_tags": ["groonga"],
"location": "146566000x-266422000",
"posted_by": "bob"
},
{
"_key": "bob:4",
"comment": "I'm come at city of New York for development camp! #groonga #travel",
"last_modified": "2010/03/17 14:05:00",
"hash_tags": ["groonga", "travel"],
"location": "146566000x-266422000",
"posted_by": "bob"
},
{
"_key": "charlie:1",
"comment": "@alice @bob I've tried to register!",
"last_modified": "2010/03/17 15:00:00",
"replied_users": ["alice", "bob"],
"location": "146607190x-267021260",
"posted_by": "charlie"
},
{
"_key": "charlie:2",
"comment": "I'm at the Museum of Modern Art in NY now!",
"last_modified": "2010/03/17 15:05:00",
"location": "146741340x-266319590",
"posted_by": "charlie"
}
]

The follower and favorites columns in the Users table and the replied_users column in the
Comments table are vector columns, so specify their values as arrays.

The location columns in the Users table and the Comments table use the GeoPoint type. This
type accepts values in the form "[latitude]x[longitude]".

The last_modified column in the Comments table uses the Time type.

There are two ways to specify a Time value. The first is to specify the epoch (seconds
since 00:00:00 on Jan 1, 1970) directly. In this case, you can give microseconds as the
fractional part; the fractional part is converted to a microsecond-based time when the
data is loaded. The second is to specify the timestamp as a string in the format
"(YEAR)/(MONTH)/(DAY) (HOUR):(MINUTE):(SECOND)"; the string is cast to a proper
microsecond-based time when the data is loaded.
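
As a quick illustration (a sketch: the record keys alice:98 and alice:99 and their values
are made up for this example), the following load stores the same timestamp twice, first
as an epoch number and then as a formatted string. 1268802000 corresponds to
2010/03/17 14:00:00 in the example data above.

load --table Comments
[
{"_key": "alice:98", "comment": "Epoch example", "last_modified": 1268802000},
{"_key": "alice:99", "comment": "String example", "last_modified": "2010/03/17 14:00:00"}
]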

Search
Let's search the micro-blog.

Search users by keyword
In this section, we search the micro-blog for users by keyword across multiple columns.
See match_columns for searching multiple columns at once.

Let's search for users by the micro-blog's user name, location and description entries.

Execution example:

select --table Users --match_columns name,location_str,description --query "New York" --output_columns _key,name
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "name",
# "ShortText"
# ]
# ],
# [
# "bob",
# "Bob"
# ]
# ]
# ]
# ]

By using "New York" as the search keyword, "Bob", who lives in New York, is listed in the
search results.

Search users by geolocation data (GeoPoint)
In this section, we search for users by columns that use the GeoPoint type. See search
about GeoPoint columns.

The following example searches for users who live within 20 km of the specified location.

Execution example:

select --table Users --filter 'geo_in_circle(location,"146710080x-266315480",20000)' --output_columns _key,name
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "name",
# "ShortText"
# ]
# ],
# [
# "charlie",
# "Charlie"
# ],
# [
# "bob",
# "Bob"
# ]
# ]
# ]
# ]

It shows that "Bob" and "Charlie" live within 20 km of Grand Central Terminal station.

Search users who follow a specific user
In this section, we perform reverse resolution of reference relationships, which is
described at index.

The following example shows reverse resolution of the follower column of the Users table.

Execution example:

select --table Users --query follower:@bob --output_columns _key,name
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "name",
# "ShortText"
# ]
# ],
# [
# "alice",
# "Alice"
# ],
# [
# "charlie",
# "Charlie"
# ]
# ]
# ]
# ]

It shows that "Alice" and "Charlie" follow "Bob".

Search comments by using the value of GeoPoint type
In this section, we search for comments which were written within a specific area.

We also use drilldown, which is described at drilldown. The following example shows how
to drill down into search results. As a result, we get record counts grouped by user and
by hash tag.

Execution example:

select --table Comments --filter 'geo_in_circle(location,"146867000x-266280000",20000)' --output_columns posted_by.name,comment --drilldown hash_tags,posted_by
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "posted_by.name",
# "ShortText"
# ],
# [
# "comment",
# "ShortText"
# ]
# ],
# [
# "Charlie",
# "I'm at the Museum of Modern Art in NY now!"
# ],
# [
# "Bob",
# "I've just used 'Try-Groonga' now! #groonga"
# ],
# [
# "Bob",
# "I'm come at city of New York for development camp! #groonga #travel"
# ],
# [
# "Charlie",
# "@alice @bob I've tried to register!"
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "groonga",
# 2
# ],
# [
# "travel",
# 1
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "charlie",
# 2
# ],
# [
# "bob",
# 2
# ]
# ]
# ]
# ]

The above query searches for comments posted within 20 km of Central Park in New York
City.

As the specified range is 20 km, all comments with a location are collected. You can see
that the search results contain two #groonga hash tags and one #travel hash tag, and that
bob and charlie each posted two comments.

Search comments by keyword
In this section, we search for comments which contain a specific keyword. In addition,
let's look at the value of _score, which is described at search.

Execution example:

select --table Comments --query comment:@Now --output_columns comment,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "comment",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "I've just used 'Try-Groonga' now! #groonga",
# 1
# ],
# [
# "I'm at the Museum of Modern Art in NY now!",
# 1
# ]
# ]
# ]
# ]

By using 'Now' as the keyword, the above query returns two comments. The result also
contains the number of occurrences of 'Now' as the value of _score.

Search comments by keyword and geolocation
In this section, we search for comments by a specific keyword and geolocation. By using
both the --query and --filter options, the following query returns records which match
both conditions.

Execution example:

select --table Comments --query comment:@New --filter 'geo_in_circle(location,"146867000x-266280000",20000)' --output_columns posted_by.name,comment --drilldown hash_tags,posted_by
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "posted_by.name",
# "ShortText"
# ],
# [
# "comment",
# "ShortText"
# ]
# ],
# [
# "Bob",
# "I'm come at city of New York for development camp! #groonga #travel"
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "groonga",
# 1
# ],
# [
# "travel",
# 1
# ]
# ],
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "bob",
# 1
# ]
# ]
# ]
# ]

It returns one comment which meets both conditions. It also returns the result of the
drilldown: there is one comment, posted by Bob.

Search comments by hash tags
In this section, we search for comments which contain a specific hash tag, using reverse
resolution of reference relationships.

Execution example:

select --table Comments --query hash_tags:@groonga --output_columns posted_by.name,comment --drilldown posted_by
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "posted_by.name",
# "ShortText"
# ],
# [
# "comment",
# "ShortText"
# ]
# ],
# [
# "Bob",
# "I've just used 'Try-Groonga' now! #groonga"
# ],
# [
# "Bob",
# "I'm come at city of New York for development camp! #groonga #travel"
# ]
# ],
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "bob",
# 2
# ]
# ]
# ]
# ]

The above query returns two comments which contain the #groonga hash tag. It also returns
the result of the drilldown grouped by the user who posted them: there are two comments,
both posted by Bob.

Search comments by user id
In this section, we search for comments which were posted by a specific user.

Execution example:

select --table Comments --query posted_by:bob --output_columns comment --drilldown hash_tags
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "comment",
# "ShortText"
# ]
# ],
# [
# "First post. test,test..."
# ],
# [
# "@alice Thanks!"
# ],
# [
# "I've just used 'Try-Groonga' now! #groonga"
# ],
# [
# "I'm come at city of New York for development camp! #groonga #travel"
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "groonga",
# 2
# ],
# [
# "travel",
# 1
# ]
# ]
# ]
# ]

The above query returns the four comments posted by Bob. It also returns the result of
the drilldown by hash tag: there are two comments which contain #groonga and one comment
which contains #travel.

Search user's favorite comments
In this section, we search for a user's favorite comments.

Execution example:

select --table Users --query _key:bob --output_columns favorites.posted_by,favorites.comment
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "favorites.posted_by",
# "Users"
# ],
# [
# "favorites.comment",
# "ShortText"
# ]
# ],
# [
# [
# "alice",
# "charlie"
# ],
# [
# "I've created micro-blog!",
# "@alice @bob I've tried to register!"
# ]
# ]
# ]
# ]
# ]

The above query returns Bob's favorite comments.

Search comments by posted time
In this section, we search for comments by posted time. See the Time type in data.

Let's search for comments whose posted time is at or before the specified time.

Execution example:

select Comments --filter 'last_modified<=1268802000' --output_columns posted_by.name,comment,last_modified --drilldown hash_tags,posted_by
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "posted_by.name",
# "ShortText"
# ],
# [
# "comment",
# "ShortText"
# ],
# [
# "last_modified",
# "Time"
# ]
# ],
# [
# "Alice",
# "I've created micro-blog!",
# 1268795100.0
# ],
# [
# "Bob",
# "First post. test,test...",
# 1268794800.0
# ],
# [
# "Alice",
# "@bob Welcome!!!",
# 1268795100.0
# ],
# [
# "Bob",
# "@alice Thanks!",
# 1268798400.0
# ],
# [
# "Bob",
# "I've just used 'Try-Groonga' now! #groonga",
# 1268802000.0
# ]
# ],
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "groonga",
# 1
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "alice",
# 2
# ],
# [
# "bob",
# 3
# ]
# ]
# ]
# ]

The above query returns the five comments posted at or before 2010/03/17 14:00:00. It
also returns the result of the drilldown by posting user: there are two comments by Alice
and three comments by Bob.

Query expansion
Groonga accepts the query_expander parameter for the /reference/commands/select command.
It enables you to expand your query string.

For example, if a user searches for "theatre" instead of "theater", query expansion makes
it possible to return the results of "theatre OR theater". This reduces missed search
results and gives the user what they really want.

Preparation
To use query expansion, you need to create a table which stores documents and a synonym
table which stores query strings and their replacement strings. In the synonym table, the
primary key represents the original string and a ShortText column represents the
replacement string.

Let's create the document table and the synonym table.

Execution example:

table_create Doc TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Doc body COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Term TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Term Doc_body COLUMN_INDEX|WITH_POSITION Doc body
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Synonym TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Synonym body COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Doc
[
{"_key": "001", "body": "Play all night in this theater."},
{"_key": "002", "body": "theatre is British spelling."}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
load --table Synonym
[
{"_key": "theater", "body": "(theater OR theatre)"},
{"_key": "theatre", "body": "(theater OR theatre)"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]

In this case, no search results are missed, because the synonym table accepts both
"theatre" and "theater" as query strings.

Search
Now, let's use the prepared synonym table. First, use the select command without the
query_expander parameter.

Execution example:

select Doc --match_columns body --query "theater"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "body",
# "ShortText"
# ]
# ],
# [
# 1,
# "001",
# "Play all night in this theater."
# ]
# ]
# ]
# ]
select Doc --match_columns body --query "theatre"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "body",
# "ShortText"
# ]
# ],
# [
# 2,
# "002",
# "theatre is British spelling."
# ]
# ]
# ]
# ]

The above queries return only the records which exactly match the query string.

Next, use the query_expander parameter with the body column of the Synonym table.

Execution example:

select Doc --match_columns body --query "theater" --query_expander Synonym.body
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "body",
# "ShortText"
# ]
# ],
# [
# 1,
# "001",
# "Play all night in this theater."
# ],
# [
# 2,
# "002",
# "theatre is British spelling."
# ]
# ]
# ]
# ]
select Doc --match_columns body --query "theatre" --query_expander Synonym.body
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "body",
# "ShortText"
# ]
# ],
# [
# 1,
# "001",
# "Play all night in this theater."
# ],
# [
# 2,
# "002",
# "theatre is British spelling."
# ]
# ]
# ]
# ]

In both cases, the query string is replaced with "(theater OR theatre)", so synonyms are
taken into account in the full text search.

SERVER


Server packages
The groonga package is the minimum set of the full text search engine. If you want to use
Groonga as a server, you can install additional preconfigured packages.

There are two packages for server use.

· groonga-httpd (nginx and HTTP protocol based server package)

· groonga-server-gqtp (/spec/gqtp protocol based server package)

There is a reason why groonga provides not only the GQTP server package but also the HTTP
server package. /spec/gqtp - Groonga Query Transfer Protocol is designed to reduce
overhead and improve performance, but GQTP has less client library support than HTTP. As
HTTP is a mature protocol, you can take advantage of existing tools, and there are many
client libraries (see related projects for details). If you use the groonga-httpd
package, you can also benefit from nginx functionality.

We recommend using groonga-httpd at first, because it provides full server functionality.
If you have performance issues caused by protocol overhead, consider using
groonga-server-gqtp.

NOTE:
In previous versions, there was a groonga-server-http package (a simple HTTP
protocol based server package). It is now marked as obsolete; please use the
groonga-httpd package instead. The groonga-server-http package became a
transitional package for groonga-httpd.

groonga-httpd
groonga-httpd is a nginx and HTTP protocol based server package.

Preconfigured setting:

┌───────────────────┬───────────────────────────────────────┐
│Item │ Default value │
├───────────────────┼───────────────────────────────────────┤
│Port number │ 10041 │
├───────────────────┼───────────────────────────────────────┤
│Access log path │ /var/log/groonga/httpd/access.log │
├───────────────────┼───────────────────────────────────────┤
│Error log path │ /var/log/groonga/http-query.log │
├───────────────────┼───────────────────────────────────────┤
│Database │ /var/lib/groonga/db/* │
├───────────────────┼───────────────────────────────────────┤
│Configuration file │ /etc/groonga/httpd/groonga-httpd.conf │
└───────────────────┴───────────────────────────────────────┘

Start HTTP server
Starting the groonga HTTP server (Debian/Ubuntu/CentOS):

% sudo service groonga-httpd start

Starting the groonga HTTP server (Fedora):

% sudo systemctl start groonga-httpd

Stop HTTP server
Stopping the groonga HTTP server (Debian/Ubuntu/CentOS):

% sudo service groonga-httpd stop

Stopping the groonga HTTP server (Fedora):

% sudo systemctl stop groonga-httpd

Restart HTTP server
Restarting the groonga HTTP server (Debian/Ubuntu/CentOS):

% sudo service groonga-httpd restart

Restarting the groonga HTTP server (Fedora):

% sudo systemctl restart groonga-httpd

groonga-server-gqtp
groonga-server-gqtp is a /spec/gqtp protocol based server package.

┌────────────┬───────────────────────────────────┐
│Item │ Default value │
├────────────┼───────────────────────────────────┤
│Port number │ 10043 │
├────────────┼───────────────────────────────────┤
│process-log │ /var/log/groonga/groonga-gqtp.log │
├────────────┼───────────────────────────────────┤
│query-log │ /var/log/groonga/gqtp-query.log │
├────────────┼───────────────────────────────────┤
│Database │ /var/lib/groonga/db/* │
└────────────┴───────────────────────────────────┘

Configuration file for server setting (Debian/Ubuntu):

/etc/default/groonga/groonga-server-gqtp

Configuration file for server setting (CentOS):

/etc/sysconfig/groonga-server-gqtp

Start GQTP server
Starting the groonga GQTP server (Debian/Ubuntu/CentOS):

% sudo service groonga-server-gqtp start

Starting the groonga GQTP server (Fedora):

% sudo systemctl start groonga-server-gqtp

Stop GQTP server
Stopping the groonga GQTP server (Debian/Ubuntu/CentOS):

% sudo service groonga-server-gqtp stop

Stopping the groonga GQTP server (Fedora):

% sudo systemctl stop groonga-server-gqtp

Restart GQTP server
Restarting the groonga GQTP server (Debian/Ubuntu/CentOS):

% sudo service groonga-server-gqtp restart

Restarting the groonga GQTP server (Fedora):

% sudo systemctl restart groonga-server-gqtp

groonga-server-http
groonga-server-http is a simple HTTP protocol based server package.

NOTE:
The groonga-server-http package has been a transitional package since Groonga
4.0.8. Please use groonga-httpd instead.

Preconfigured setting:

┌────────────┬───────────────────────────────────┐
│Item │ Default value │
├────────────┼───────────────────────────────────┤
│Port number │ 10041 │
├────────────┼───────────────────────────────────┤
│process-log │ /var/log/groonga/groonga-http.log │
├────────────┼───────────────────────────────────┤
│query-log │ /var/log/groonga/http-query.log │
├────────────┼───────────────────────────────────┤
│Database │ /var/lib/groonga/db/* │
└────────────┴───────────────────────────────────┘

Configuration file for server setting (Debian/Ubuntu):

/etc/default/groonga/groonga-server-http

Configuration file for server setting (CentOS):

/etc/sysconfig/groonga-server-http

Start HTTP server
Starting the groonga HTTP server (Debian/Ubuntu/CentOS):

% sudo service groonga-server-http start

Starting the groonga HTTP server (Fedora):

% sudo systemctl start groonga-server-http

Stop HTTP server
Stopping the groonga HTTP server (Debian/Ubuntu/CentOS):

% sudo service groonga-server-http stop

Stopping the groonga HTTP server (Fedora):

% sudo systemctl stop groonga-server-http

Restart HTTP server
Restarting the groonga HTTP server (Debian/Ubuntu/CentOS):

% sudo service groonga-server-http restart

Restarting the groonga HTTP server (Fedora):

% sudo systemctl restart groonga-server-http

HTTP
Groonga provides two HTTP server implementations.

· http/groonga

· http/groonga-httpd

http/groonga is a simple implementation. It is fast but doesn't have many HTTP features.
It is convenient for trying Groonga because it requires just a few command line options
to run.

http/groonga-httpd is an nginx based implementation. It is also fast and has many HTTP
features.

Comparison
There are many differences between groonga and groonga-httpd. Here is a comparison table.

┌─────────────────────────┬────────────────────────┬──────────────────────┐
│ │ groonga │ groonga-httpd │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Performance │ o │ o │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Using multi CPU cores │ o (by multi threading) │ o (by multi process) │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Configuration file │ optional │ required │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Custom prefix path │ x │ o │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Custom command version │ o │ o │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Multi databases │ x │ o │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Authentication │ x │ o │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Gzip compression │ x │ o │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│POST │ o │ o │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│HTTPS │ x │ o │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Access log │ x │ o │
├─────────────────────────┼────────────────────────┼──────────────────────┤
│Upgrading without │ x │ o │
│downtime │ │ │
└─────────────────────────┴────────────────────────┴──────────────────────┘

Performance
Both groonga and groonga-httpd are very fast. They provide about the same throughput.

Using multi CPU cores
Groonga scales across multiple CPU cores. groonga scales by multi-threading;
groonga-httpd scales by multi-processing.

groonga uses the same number of threads as CPU cores by default. If you have 8 CPU cores,
8 threads are used by default.

groonga-httpd uses 1 process by default. You need to set the worker_processes directive
to use multiple CPU cores. If you have 8 CPU cores, specify worker_processes 8 in the
configuration file like the following:

worker_processes 8;

http {
# ...
}

Configuration file
groonga can work without a configuration file. All configuration items, such as the port
number and the maximum number of threads, can be specified on the command line. A
configuration file can also be used to specify them.

It's very easy to run a groonga HTTP server because groonga requires just a few options.
Here is the simplest command line that starts an HTTP server with groonga:

% groonga --protocol http -d /PATH/TO/DATABASE
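
If you need more control, the common options can be combined on the same command line.
The following is a sketch, not a required invocation: the port number and thread count
are arbitrary values, and the option names are the ones described in the groonga
executable file section of this manual:

% groonga --protocol http --port 10042 --max-threads 4 -d /PATH/TO/DATABASE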

groonga-httpd requires a configuration file to run. Here is the simplest configuration
file that starts an HTTP server with groonga-httpd:

events {
}

http {
server {
listen 10041;

location /d/ {
groonga on;
groonga_database /PATH/TO/DATABASE;
}
}
}

Custom prefix path
groonga accepts a path that starts with /d/ as a command URL, such as
http://localhost:10041/d/status. You cannot change the prefix path /d/.

groonga-httpd can customize the prefix path. For example, you can use
http://localhost:10041/api/status as a command URL. Here is a sample configuration that
uses /api/ as the prefix path:

events {
}

http {
server {
listen 10041;

location /api/ { # <- change this
groonga on;
groonga_database /PATH/TO/DATABASE;
}
}
}

Custom command version
Groonga has the /reference/command/command_version mechanism. It is for upgrading groonga
commands while keeping backward compatibility.

groonga can change the default command version with the --default-command-version option.
Here is a sample command line that uses command version 2 as the default command version:

% groonga --protocol http --default-command-version 2 -d /PATH/TO/DATABASE

groonga-httpd cannot customize the default command version yet, but it will be supported
soon. Once it is supported, you will be able to provide groonga commands with different
command versions in the same groonga-httpd process. Here is a sample configuration that
provides command version 1 commands under /api/1/ and command version 2 commands under
/api/2/:

events {
}

http {
server {
listen 10041;

groonga_database /PATH/TO/DATABASE;

location /api/1/ {
groonga on;
groonga_default_command_version 1;
}

location /api/2/ {
groonga on;
groonga_default_command_version 2;
}
}
}

Multi databases
groonga can use only one database per process.

groonga-httpd can use one or more databases in a single process. Here is a sample
configuration that serves the /tmp/db1 database under the /db1/ path and the /tmp/db2
database under the /db2/ path:

events {
}

http {
server {
listen 10041;

location /db1/ {
groonga on;
groonga_database /tmp/db1;
}

location /db2/ {
groonga on;
groonga_database /tmp/db2;
}
}
}

Authentication
HTTP supports authentication mechanisms such as basic authentication and digest
authentication. They can be used to restrict the use of dangerous commands such as
/reference/commands/shutdown.

groonga doesn't support any authentication. To restrict the use of dangerous commands,
other tools such as iptables or a reverse proxy are needed.

groonga-httpd supports basic authentication. Here is a sample configuration that
restricts the use of the /reference/commands/shutdown command:

events {
}

http {
server {
listen 10041;

groonga_database /PATH/TO/DATABASE;

location /d/shutdown {
groonga on;
auth_basic "manager is required!";
auth_basic_user_file "/etc/managers.htpasswd";
}

location /d/ {
groonga on;
}
}
}

Gzip compression
HTTP supports compressing responses with gzip, indicated by the Content-Encoding: gzip
response header. It can reduce network traffic and is useful for large search responses.

groonga doesn't support compression. To get compression, a reverse proxy is needed.

groonga-httpd supports gzip compression. Here is a sample configuration that compresses
responses with gzip:

events {
}

http {
server {
listen 10041;

groonga_database /PATH/TO/DATABASE;

location /d/ {
groonga on;
gzip on;
gzip_types *;
}
}
}

Note that gzip_types * is specified. This is an important part of the configuration.
gzip_types specifies the data formats to be gzipped, by MIME type. groonga-httpd returns
data in JSON, XML or MessagePack format, but those formats aren't included in the default
value of gzip_types, which is text/html.

To compress response data from groonga-httpd with gzip, you need to specify gzip_types *
or gzip_types application/json text/xml application/x-msgpack explicitly. gzip_types * is
recommended, for two reasons: first, groonga may support more formats in the future;
second, all requests for the location are processed by groonga, so you don't need to
consider other modules.

POST
You can load your data by POSTing JSON data. You need to follow these rules to load data
by POST.

· Content-Type header value must be application/json.

· JSON data is sent as body.

· Table name is specified by query parameter such as table=NAME.

Here is an example curl command line that loads two users alice and bob to Users table:

% curl --data-binary '[{"_key": "alice"}, {"_key": "bob"}]' -H "Content-Type: application/json" "http://localhost:10041/d/load?table=Users"

HTTPS
TODO

Access log
TODO

Upgrading without downtime
TODO

groonga
TODO

groonga-httpd
TODO

GQTP
Summary
GQTP is an acronym standing for "Groonga Query Transfer Protocol".

GQTP is a protocol designed for Groonga. It's a stateful protocol: you can send multiple
commands in one session.

GQTP will be faster than /server/http when you send many light commands like
/reference/commands/status. GQTP will have almost the same performance as HTTP when you
send heavy commands like /reference/commands/select.

We recommend that you use HTTP in most cases, because there are many HTTP client
libraries.

If you want to use GQTP, you can use the following libraries:

· Ruby: groonga-client

· Python: poyonga

· Go: goroo

· PHP: proonga

· C/C++: Groonga (Groonga itself can also be used as a library)

It's not a library, but you can also use /reference/executables/groonga as a GQTP client.
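
For example, assuming a GQTP server is already running on localhost at the default port
10043, a quick way to try the protocol (a sketch based on the client mode described later
in this manual) is:

% groonga -c --host localhost --port 10043 status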

How to run
/reference/executables/groonga is a GQTP server implementation. You can run a Groonga
server by the following command line:

groonga --protocol gqtp -s [options] DB_PATH

You can run a Groonga server as a daemon by the following command line:

groonga --protocol gqtp -d [options] DB_PATH

See /reference/executables/groonga for available options.

Memcached binary protocol
Groonga supports the memcached binary protocol. The following form shows how to run
Groonga as a memcached binary protocol server daemon.

Form:

groonga [-p PORT_NUMBER] -d --protocol memcached DB_PATH

The --protocol option and its argument specify the protocol of the server. "memcached"
specifies the memcached binary protocol.

You don't need to create a table. When Groonga receives a request, it creates a table
automatically. The table name will be Memcache.
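
For example, here is a hedged concrete instance of the form above; the port number 11211
(the conventional memcached port) and the database path are arbitrary choices:

% groonga -p 11211 -d --protocol memcached /tmp/memcached.db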

CLIENT


Groonga supports its original protocol (/spec/gqtp), the memcached binary protocol and
HTTP.

As HTTP and the memcached binary protocol are mature protocols, you can use existing
client libraries.

There are also client libraries which provide a convenient API for connecting to a
Groonga server in several programming languages. See Client libraries for details.

REFERENCE MANUAL


Executables
This section describes the executable files provided by the groonga package.

grndb
Summary
NOTE:
This executable command is an experimental feature.

New in version 4.0.9.

grndb manages a Groonga database.

Here are features:

· Checks whether a database is broken or not.

· Recovers a broken database automatically if the database is recoverable.

Syntax
grndb requires command and database path:

grndb COMMAND [OPTIONS] DATABASE_PATH

Here are available commands:

· check - Checks whether database is broken or not.

· recover - Recovers database.

Usage
Here is an example to check the database at /var/lib/groonga/db/db:

% grndb check /var/lib/groonga/db/db

Here is an example to recover the database at /var/lib/groonga/db/db:

% grndb recover /var/lib/groonga/db/db

Commands
This section describes available commands.

check
It checks an existing Groonga database. If the database is broken, grndb reports the
reasons and exits with a non-zero exit status.

NOTE:
You must not use this command against a database that is opened by another
process. If the database is opened, this command may report wrong results.

check has some options.

--target
New in version 5.1.2.

It specifies the object to check.

If your database is large and you know which object is unreliable, this option will help
you. check needs more time for a large database; you can reduce the check time by
narrowing the check target with the --target option.

The check target is checked recursively, because objects related to an unreliable object
may also be unreliable.

If the check target is a table, all of its columns are also checked recursively.

If the check target is a table and its key type is another table, that table is also
checked recursively.

If the check target is a column and its value type is a table, that table is also checked
recursively.

If the check target is an index column, the table specified as its value type and all of
its sources are also checked recursively.
Here is an example that checks only Entries table and its columns:

% grndb check --target Entries /var/lib/groonga/db/db

Here is an example that checks only Entries.name column:

% grndb check --target Entries.name /var/lib/groonga/db/db

recover
It recovers an existing broken Groonga database.

If the database is not broken, grndb does nothing and exits with a 0 exit status.

If only one or more index columns are broken, grndb recovers those index columns and
exits with a 0 exit status. This may take a long time for large indexed data.

If tables or data columns are broken, grndb reports the reasons and exits with a non-zero
exit status. You can find out whether the database is recoverable or not with the check
command.
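
Because check signals a broken database with a non-zero exit status, you can chain the
two commands in a shell so that recover runs only when needed. This is a sketch, not an
official workflow:

% grndb check /var/lib/groonga/db/db || grndb recover /var/lib/groonga/db/db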

NOTE:
You must not use this command against a database that is opened by another
process. If the database is opened, this command may break the database.

grnslap
Name
grnslap - a tool to check the performance of the communication layer of a groonga process

Synopsis
grnslap [options] [dest]

Description
grnslap is a tool that issues many concurrent requests to a groonga process in order to
check its performance.

It can send requests over both GQTP, Groonga's own protocol, and HTTP. You can also
specify the concurrency of the requests.

Query contents can be given on the standard input. By feeding queries close to the query
patterns of your production environment, you can test under conditions close to
production.

Currently, it is not installed by make install.

Options
-P Specifies the request protocol.

http
Sends requests over HTTP. When target HTTP paths (including GET parameters) are given on
the standard input in LF-separated form, those paths are accessed in order.

gqtp
Sends requests over GQTP. When GQTP requests are given on the standard input in
LF-separated form, those requests are performed in order.

-m Specifies the concurrency of the requests. The default is 10.

Arguments
dest Specifies the host name and port number of the destination (the default is
'localhost:10041'). If the port number is omitted, 10041 is assumed.

Sample
Send requests to http://localhost:10041/d/status with a concurrency of 100.

> yes /d/status | head -n 100 | grnslap -P http -m 100 localhost:10041
2009-11-12 19:34:09.998696|begin: max_concurrency=100 max_tp=10000
2009-11-12 19:34:10.011208|end : n=100 min=46 max=382 avg=0 qps=7992.966190 etime=0.012511

groonga executable file
Summary
The groonga executable file provides the following features:

· Fulltext search server

· Fulltext search shell

· Client for Groonga fulltext search server

Groonga can be used as a library. If you want to use Groonga as a library, you need to
write a program in C, C++ and so on. Using the library is useful for embedding the full
text search feature into your application, but it's not the easiest way to use Groonga.

You can use the groonga executable file to get the full text search feature directly.

If you want to try Groonga, the full text search shell usage is convenient. You don't
need any server or client; you just need one terminal. You can try Groonga like the
following:

% groonga -n db
> status
[[0,1429687763.70845,0.000115633010864258],{"alloc_count":195,...}]
> quit
%

If you want to create an application that has a full text search feature, the full text
search server usage is suitable. You can use Groonga as a server, like an RDBMS
(Relational DataBase Management System). The client-server model is a popular
architecture.

The client usage for a Groonga full text search server is normally not needed.

Syntax
The groonga executable file has the following four modes:

· Standalone mode

· Server mode

· Daemon mode

· Client mode

These modes share common options, which are described in a later section.

Standalone mode
In standalone mode, the groonga executable file runs one or more Groonga
/reference/command against a local Groonga database.

Here is the syntax to run a shell that executes Groonga commands against a temporary
database:

groonga [options]

Here is the syntax to create a new database and run a shell that executes Groonga
commands against the new database:

groonga [options] -n DB_PATH

Here is the syntax to run a shell that executes Groonga commands against an existing
database:

groonga [options] DB_PATH

Here is the syntax to run a Groonga command against an existing database and exit:

groonga [options] DB_PATH COMMAND [command arguments]

Server mode
In server mode, the groonga executable file runs as a server. The server accepts
connections from other processes on the local machine or remote machines, and executes
received Groonga /reference/command against a local Groonga database.

You can choose one protocol from /server/http and /server/gqtp. Normally HTTP is the more
suitable choice, but GQTP is the default protocol. This section describes only HTTP
protocol usage.

In server mode, the groonga executable file runs in the foreground. If you want to run a
Groonga server in the background, see Daemon mode.

Here is the syntax to run a Groonga server with a temporary database:

groonga [options] --protocol http -s

Here is the syntax to create a new database and run a Groonga server with the new
database:

groonga [options] --protocol http -s -n DB_PATH

Here is the syntax to run a Groonga server with an existing database:

groonga [options] --protocol http -s DB_PATH

Daemon mode
In daemon mode, the groonga executable file runs as a daemon. A daemon is similar to a
server, but it runs in the background. See Server mode about servers.

Here is the syntax to run a Groonga daemon with a temporary database:

groonga [options] --protocol http -d

Here is the syntax to create a new database and run a Groonga daemon with the new
database:

groonga [options] --protocol http -d -n DB_PATH

Here is the syntax to run a Groonga daemon with an existing database:

groonga [options] --protocol http -d DB_PATH

The --pid-path option is useful in daemon mode.

Client mode
In client mode, the groonga executable file runs as a client for a GQTP protocol Groonga
server. Its usage is similar to Standalone mode: you can run a shell or execute one
command. You need to specify the server address instead of a local database.

Note that you can't use the groonga executable file as a client for an HTTP protocol
Groonga server.

Here is the syntax to run a shell that executes Groonga commands against a Groonga server
running at 192.168.0.1:10043:

groonga [options] -c --host 192.168.0.1 --port 10043

Here is the syntax to run a Groonga command against a Groonga server running at
192.168.0.1:10043 and exit:

groonga [options] -c --host 192.168.0.1 --port 10043 COMMAND [command arguments]

Options
-n Creates a new database.

-c Runs groonga in client mode.

-s Runs groonga in server mode. Use "Ctrl+C" to stop the groonga process.

-d Runs groonga in daemon mode. In contrast to server mode, the groonga process forks
in daemon mode. For example, to stop a local daemon process, use "curl
http://127.0.0.1:10041/d/shutdown".

-e, --encoding <encoding>
Specifies the encoding used for the Groonga database. This option takes effect
when you create a new Groonga database. Specify one of the following values:
none, euc, utf8, sjis, latin or koi8r.
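
For example (a sketch; the database path is arbitrary), the following creates a new
UTF-8 encoded database and exits immediately:

% groonga -e utf8 -n /tmp/utf8.db quit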

-l, --log-level <log level>
Specifies the log level as an integer value between 0 and 8. The meanings of the
values are:

┌──────────┬─────────────┐
│log level │ description │
├──────────┼─────────────┤
│0 │ Nothing │
├──────────┼─────────────┤
│1 │ Emergency │
├──────────┼─────────────┤
│2 │ Alert │
├──────────┼─────────────┤
│3 │ Critical │
├──────────┼─────────────┤
│4 │ Error │
├──────────┼─────────────┤
│5 │ Warning │
├──────────┼─────────────┤
│6 │ Notice │
├──────────┼─────────────┤
│7 │ Info │
├──────────┼─────────────┤
│8 │ Debug │
└──────────┴─────────────┘

-a, --address <ip/hostname>
Deprecated since version 1.2.2: Use --bind-address instead.

--bind-address <ip/hostname>
New in version 1.2.2.

Specifies the address to listen on when running in server or daemon mode. (The
default is the host name returned by hostname.)

-p, --port <port number>
Specifies the TCP port number used in client, server, or daemon mode. (The default
is 10043 in client mode; in server or daemon mode the default is 10041 for HTTP and
10043 for GQTP.)

-i, --server-id <ip/hostname>
Specifies the address used as the server ID when running in server or daemon mode. (The default is the host name returned by `hostname`.)

-h, --help
Prints a help message.

--document-root <path>
Specifies the directory that stores static pages when groonga is used as an HTTP server.

By default, files for a general-purpose database administration tool are installed under /usr/share/groonga/admin_html. If you start groonga with this directory given as the document-root option value, you can use the Web-based database administration tool by accessing http://hostname:port/index.html with a Web browser.

--protocol <protocol>
Specifies either http or gqtp. (The default is gqtp.)

--log-path <path>
Specifies the path of the file to which logs are written. (The default is /var/log/groonga/groonga.log.)

--log-rotate-threshold-size <threshold>
New in version 5.0.3.

Specifies threshold for log rotation. Log file is rotated when log file size is
larger than or equals to the threshold (default: 0; disabled).

--query-log-path <path>
Specifies the path of the file to which the query log is written. (By default, no query log is written.)

--query-log-rotate-threshold-size <threshold>
New in version 5.0.3.

Specifies threshold for query log rotation. Query log file is rotated when query
log file size is larger than or equals to the threshold (default: 0; disabled).

-t, --max-threads <max threads>
Specifies the maximum number of threads to use. (The default is the number of CPU cores of the machine.)

--pid-path <path>
Specifies the path where the PID is saved. (By default, it is not saved.)

--config-path <path>
Specifies the path of the configuration file. The configuration file uses the following format:

# Everything after '#' is a comment.
; Everything after ';' is also a comment.

# Specify an option as 'key = value'.
pid-path = /var/run/groonga.pid

# Spaces around '=' are ignored. The line below means the same as the line above.
pid-path=/var/run/groonga.pid

# A 'key' is the same as the corresponding '--XXX' style option name.
# For example, the key for '--pid-path' is 'pid-path'.
# However, an option whose key is 'config-path' is ignored.

--cache-limit <limit>
Specifies the maximum number of cache entries. (The default is 100.)

--default-match-escalation-threshold <threshold>
Specifies the threshold for escalating the search behavior. (The default is 0.)

Command line parameters
dest Specifies the path of the database to use.

In client mode, specifies the host name and port number of the server to connect to (the default is 'localhost:10043'). If the port number is omitted, 10043 is assumed.

command [args]
In standalone and client modes, the command to execute and its arguments can be given as command line arguments. If no command is given on the command line, groonga reads command strings from the standard input, one line at a time until EOF, and executes them in order.
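
For example (a sketch; commands.grn is a hypothetical file that contains one command per
line), you can feed a batch of commands to a database from the standard input:

% groonga /tmp/groonga.db < commands.grn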

Command
Commands are instructions that manipulate the database through the groonga command.
Commands are mainly written in C and become available by loading them into the groonga
process. Each command has a unique name and zero or more arguments.

Arguments can be specified in either of the following two ways:

Form 1: COMMAND_NAME VALUE1 VALUE2,..

Form 2: COMMAND_NAME --ARGUMENT_NAME1 VALUE1 --ARGUMENT_NAME2 VALUE2,..

In form 1, the values must be given in the defined order, and intermediate argument
values cannot be omitted. In form 2, each argument name must be given explicitly in the
'--ARGUMENT_NAME' style; in exchange, the arguments can be specified in any order, and
some of them can be omitted.
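
For example, the following two table_create invocations are equivalent; the first uses
form 1 (positional values) and the second uses form 2 (named arguments):

table_create Terms TABLE_PAT_KEY ShortText

table_create --name Terms --flags TABLE_PAT_KEY --key_type ShortText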

When command strings are given from the standard input, the command name, argument names
and values are separated by spaces. To specify a value that contains a space or any of
the characters "'(), enclose the value in single quotes (') or double quotes ("). Inside
a string value, a newline character is written as '\n'. To include the quote character
used for the enclosure in the value, put a backslash ('\') before it. To specify a
backslash character itself, escape it with another backslash.
You can write command list with continuous line which is represented by '\' character.:

table_create --name Terms \
--flags TABLE_PAT_KEY \
--key_type ShortText \
--default_tokenizer TokenBigram

Builtin command
The following commands are predefined as built-in commands.

status Shows the status of the groonga process.

table_list
Shows the list of tables defined in the database.

column_list
Shows the list of columns defined in a table.

table_create
Adds a table to the database.

column_create
Adds a column to a table.

table_remove
Removes a table defined in the database.

column_remove
Removes a column defined in a table.

load Inserts records into a table.

select Searches a table for records and shows them.

define_selector
Defines a new search command with customized search conditions.

quit Ends the session with the database.

shutdown
Stops the server (daemon) process.

log_level
Sets the log output level.

log_put
Writes a log entry.

clearlock
Releases locks.

Usage
Create a new database:

% groonga -n /tmp/hoge.db quit
%

Define a table in the created database:

% groonga /tmp/hoge.db table_create Table 0 ShortText
[[0]]
%

Start the server:

% groonga -d /tmp/hoge.db
%

Start as an HTTP server:

% groonga -d -p 80 --protocol http --document-root /usr/share/groonga/admin_html /tmp/hoge.db
%

Connect to the server and show the list of tables:

% groonga -c localhost table_list
[[0],[["id","name","path","flags","domain"],[256,"Table","/tmp/hoge.db.0000100",49152,14]]]
%

groonga-benchmark
Name
groonga-benchmark - groonga test program

Synopsis
groonga-benchmark [options...] [script] [db]

Description
groonga-benchmark is a general-purpose benchmark tool for groonga.

It can verify behavior and measure execution speed both when groonga is used as a
standalone process and when it is used as a server program.

Data files for groonga-benchmark can be created by yourself, or existing ones can be
used. Existing data files are downloaded from ftp.groonga.org as needed, so as long as
groonga and groonga-benchmark run and the machine can connect to the Internet, you can
verify groonga's behavior without any knowledge of groonga commands.

Currently it runs on Linux and Windows. It is not installed by make install.

Options
-i, --host <ip/hostname>
Specifies the groonga server to connect to, by IP address or host name. Note that the
connection fails if no groonga server is running at the specified destination. If this
option is not specified, groonga-benchmark automatically starts a groonga server on
localhost and connects to it.

-p, --port <port number>
Specifies the port number used by the automatically started groonga server, or by the
explicitly specified groonga server. Note that the connection fails if the port used by
the target groonga server differs from the port number specified by this option.

--dir Shows the script files available on ftp.groonga.org.

--ftp Communicates with ftp.groonga.org over FTP to synchronize script files and to send
log files.

--log-output-dir
By default, log files are written to the current directory when groonga-benchmark
finishes. With this option, you can change the output directory to an arbitrary one.

--groonga <groonga_path>
Specifies the path of the groonga command. By default, the groonga command is searched
for in PATH.

--protocol <gqtp|http>
Specifies gqtp or http as the protocol used by the groonga command.

Arguments
script A text file that describes how groonga-benchmark should operate (hereafter called
groonga-benchmark directives). Its extension is .scr.

db The groonga database used by groonga-benchmark. If the specified database does not
exist, groonga-benchmark creates it. This database is also used when a groonga server is
started automatically. Note that when you explicitly specify a groonga server to connect
to, the database actually used is the one that server is using.

Usage
First, on a shell (a command prompt on Windows), type:

groonga-benchmark test.scr ANY_DB_NAME

If groonga-benchmark works correctly, a file named:

test-USERNAME-NUMBER.log

will be created. If it is not created, see the "Troubleshooting" section of this
document.

Script file
A script file is a text file that contains groonga-benchmark directives. Multiple
groonga-benchmark directives can be written on one line, separated by ";" (semicolons).
When one line contains multiple directives, they are executed in parallel. Lines starting
with "#" are treated as comments.

groonga-benchmark directives
The following groonga-benchmark directives are currently supported.
do_local COMMAND_FILE [THREADS] [REPEATS]
Executes the command file with groonga-benchmark alone. If a thread count is specified,
the same command file is executed concurrently in that many threads. If a repeat count is
specified, the contents of the command file are executed repeatedly. Both default to 1
when omitted. To run a file several times in a single thread, specify it explicitly, as
in: do_local COMMAND_FILE 1 [REPEATS].

do_gqtp COMMAND_FILE [THREADS] [REPEATS]
Executes the command file against a groonga server via GQTP. The thread and repeat counts
have the same meanings as for do_local.

do_http COMMAND_FILE [THREADS] [REPEATS]
Executes the command file against a groonga server via HTTP. The thread and repeat counts
have the same meanings as for do_local.

rep_local COMMAND_FILE [THREADS] [REPEATS]
Executes the command file with groonga-benchmark alone and reports in more detail.

rep_gqtp COMMAND_FILE [THREADS] [REPEATS]
Executes the command file against a groonga server via GQTP and reports in more detail.
The thread and repeat counts have the same meanings as for do_local.

rep_http COMMAND_FILE [THREADS] [REPEATS]
Executes the command file against a groonga server via HTTP and reports in more detail.
The thread and repeat counts have the same meanings as for do_local.

out_local COMMAND_FILE OUTPUT_FILE
Executes the command file with groonga-benchmark alone and writes the results of every
command to the "output file". The results are used by the test_local and test_gqtp
directives. Note that this "output file" is different from the log file that
groonga-benchmark creates automatically. Apart from the fact that comments can be used
with groonga-benchmark, this is the same as:

groonga < COMMAND_FILE > OUTPUT_FILE

out_gqtp COMMAND_FILE OUTPUT_FILE
Executes the command file against a groonga server via GQTP. Otherwise the same as
out_local.

out_http COMMAND_FILE OUTPUT_FILE
Executes the command file against a groonga server via HTTP. Otherwise the same as
out_local.

test_local COMMAND_FILE INPUT_FILE
Executes the command file with groonga-benchmark alone and compares the result of each
command with the input file. If there are differences other than non-essential elements
such as processing time, the differences are written to a file named INPUT_FILE.diff.

Command file
A command file is a text file that contains one groonga built-in command per line. There
is no restriction on its extension. For groonga built-in commands, see
/reference/command.

Sample
Here is a sample script file:

# sample script
rep_local test.ddl
do_local test.load;
do_gqtp test.select 10 10; do_local test.status 10

The meaning of each line is as follows.

Line 1: A comment line.

Line 2: Executes the command file test.ddl with groonga alone and reports in detail.

Line 3: Executes the command file test.load with groonga alone. (The trailing ";"
semicolon is required when writing multiple groonga-benchmark directives on one line, but
it is harmless when executing a single directive as in this example.)

Line 4: Executes the command file test.select against a groonga server in 10 concurrent
threads; each thread repeats the contents of test.select 10 times. At the same time,
executes the command file test.status with groonga alone in 10 threads.

Special directives
Special commands can be embedded in the comment lines of a script file. The following two
special directives are currently supported.

#SET_HOST <ip/hostname>
Equivalent to the -i, --host option. SET_HOST takes precedence both when the IP
address/host name given on the command line differs from the one given by SET_HOST and
when no command line option is specified. As with the command line option, the server is
not started automatically when SET_HOST is used.

#SET_PORT <port number>
Equivalent to the -p, --port option. SET_PORT takes precedence both when the port number
given on the command line differs from the one given by SET_PORT and when no command line
option is specified.

Special directives can be written anywhere in a script file. If the same special
directive appears multiple times in one file, the last one wins.

For example, even if you specify the port on the command line:

$ ./groonga-benchmark --port 20010 test.scr testdb

if the contents of test.scr are:

#SET_PORT 10900
rep_local test.ddl
do_local test.load;
rep_gqtp test.select 10 10; rep_local test.status 10
#SET_PORT 10400

then the automatically started groonga server uses port number 10400.

groonga-benchmark results
When groonga-benchmark finishes successfully, a log file named
SCRIPTNAME-USERNAME-STARTTIME.log (the script name without its extension) is created in
the current directory. The log file is automatically sent to ftp.groonga.org. The log
file is JSON-formatted text like the following:

[{"script": "test.scr",
"user": "homepage",
"date": "2010-04-14 22:47:04",
"CPU": Intel(R) Pentium(R) 4 CPU 2.80GHz",
"BIT": 32,
"CORE": 1,
"RAM": "975MBytes",
"HDD": "257662232KBytes",
"OS": "Linux 2.4.20-24.7-i686",
"HOST": "localhost",
"PORT": "10041",
"VERSION": "0.1.8-100-ga54c5f8"
},
{"jobs": "rep_local test.ddl",
"detail": [
[0, "table_create res_table --key_type ShortText", 1490, 3086, [0,1271252824.25846,0.00144
7]],
[0, "column_create res_table res_column --type Text", 3137, 5956, [0,1271252824.2601,0.002
741]],
[0, "column_create res_table user_column --type Text", 6020, 8935, [0,1271252824.26298,0.0
02841]],
[0, "column_create res_table mail_column --type Text", 8990, 11925, [0,1271252824.26595,0.
002861]],
[0, "column_create res_table time_column --type Time", 12008, 13192, [0,1271252824.26897,0
.001147]],
[0, "status", 13214, 13277, [0,1271252824.27018,3.0e-05]],
[0, "table_create thread_table --key_type ShortText", 13289, 14541, [0,1271252824.27025,0.
001213]],
[0, "column_create thread_table thread_title_column --type ShortText", 14570, 17380, [0,12
71252824.27153,0.002741]],
[0, "status", 17435, 17480, [0,1271252824.2744,2.7e-05]],
[0, "table_create lexicon_table --flags 129 --key_type ShortText --default_tokenizer Token
Bigram", 17491, 18970, [0,1271252824.27446,0.001431]],
[0, "column_create lexicon_table inv_res_column 514 res_table res_column ", 18998, 33248,
[0,1271252824.27596,0.01418]],
[0, "column_create lexicon_table inv_thread_column 514 thread_table thread_title_column ",
33285, 48472, [0,1271252824.29025,0.015119]],
[0, "status", 48509, 48554, [0,1271252824.30547,2.7e-05]]],
"summary" :[{"job": "rep_local test.ddl", "latency": 48607, "self": 47719, "qps": 272.4281
73, "min": 45, "max": 15187, "queries": 13}]},
{"jobs": "do_local test.load; ",
"summary" :[{"job": "do_local test.load", "latency": 68693, "self": 19801, "qps": 1010.049
997, "min": 202, "max": 5453, "queries": 20}]},
{"jobs": "do_gqtp test.select 10 10; do_local test.status 10",
"summary" :[{"job": " do_local test.status 10", "latency": 805990, "self": 737014, "qps":
54.273053, "min": 24, "max": 218, "queries": 40},{"job": "do_gqtp test.select 10 10", "lat
ency": 831495, "self": 762519, "qps": 1967.164097, "min": 73, "max": 135631, "queries": 15
00}]},
{"total": 915408, "qps": 1718.359464, "queries": 1573}]

Limitations
· Multiple groonga-benchmark directives can be written on one line of a script file, but
the total number of threads across them is limited to 64.

· The maximum length of a groonga command in a command file is 5000000 bytes.

Troubleshooting
If groonga-benchmark does not work correctly, first check the following.

· Are you connected to the Internet? When the --ftp option is specified,
groonga-benchmark communicates with ftp.groonga.org every time it runs. groonga-benchmark
does not work correctly if ftp.groonga.org is unreachable.

· Is a groonga server already running? Unless a server is explicitly specified with the
-i, --host option, groonga-benchmark automatically starts a groonga server on localhost.
If a groonga server is already running, groonga-benchmark may not work correctly.

· Is the specified database appropriate? groonga-benchmark does not check the contents of
the database given as an argument. If the specified database does not exist, it is
created automatically; but if the file already exists, groonga-benchmark keeps running
regardless of its contents, and the results may be abnormal.

If none of the above is the cause, the problem is in groonga-benchmark or groonga. Please
report it.

groonga-httpd
Summary
groonga-httpd is a program to communicate with a Groonga server using the HTTP protocol.
It provides the same functionality as groonga-server-http. While groonga-server-http has
limited HTTP support with a minimal built-in HTTP server, groonga-httpd has full HTTP
support with an embedded nginx. All standards compliance and features provided by nginx
are also available in groonga-httpd.

groonga-httpd has a Web-based administration tool implemented with HTML and JavaScript.
You can access it at http://hostname:port/.

Synopsis
groonga-httpd [nginx options]

Usage
Set up
First, you'll need to edit the groonga-httpd configuration file to specify a database.
Edit /etc/groonga/httpd/groonga-httpd.conf to enable the groonga_database directive like
this:

# Match this to the file owner of groonga database files if groonga-httpd is
# run as root.
#user groonga;
...
http {
...
# Don't change the location; currently only /d/ is supported.
location /d/ {
groonga on; # <= This means to turn on groonga-httpd.

# Specify an actual database and enable this.
groonga_database /var/lib/groonga/db/db;
}
...
}

Then, run groonga-httpd. Note that control immediately returns to the console
because groonga-httpd runs as a daemon process by default:

% groonga-httpd

Request queries
To check, request a simple query (/reference/commands/status).

Execution example:

% curl http://localhost:10041/d/status
[
[
0,
1337566253.89858,
0.000355720520019531
],
{
"uptime": 0,
"max_command_version": 2,
"n_queries": 0,
"cache_hit_rate": 0.0,
"version": "4.0.1",
"alloc_count": 161,
"command_version": 1,
"starttime": 1395806036,
"default_command_version": 1
}
]

Loading data by POST
You can load data by POSTing JSON data.

Here is an example curl command line that loads two users alice and bob to Users table:

% curl --data-binary '[{"_key": "alice"}, {"_key": "bob"}]' -H "Content-Type: application/json" "http://localhost:10041/d/load?table=Users"

If you load users from a JSON file, prepare the JSON file like this:

[
{"_key": "alice"},
{"_key": "bob"}
]

Then specify the JSON file in the curl command line:

% curl -X POST 'http://localhost:10041/d/load?table=Users' -H 'Content-Type: application/json' -d @users.json

Browse the administration tool
Also, you can browse the Web-based administration tool at http://localhost:10041/.

Shut down
Finally, to terminate the running groonga-httpd daemon, run this:

% groonga-httpd -s stop

Configuration directives
This section describes only important directives. They are groonga-httpd specific
directives and performance related directives.

The following directives can be used in the groonga-httpd configuration file. By default,
it's located at /etc/groonga/httpd/groonga-httpd.conf.

Groonga-httpd specific directives
The following directives aren't provided by nginx. They are provided by groonga-httpd for
groonga-httpd specific configuration.

groonga
Synopsis:

groonga on | off;

Default
groonga off;

Context
location

Specifies whether Groonga is enabled in the location block. The default is off. You need
to specify on to enable groonga.

Examples:

location /d/ {
groonga on; # Enables groonga under /d/... path
}

location /d/ {
groonga off; # Disables groonga under /d/... path
}

groonga_database
Synopsis:

groonga_database /path/to/groonga/database;

Default
groonga_database /usr/local/var/lib/groonga/db/db;

Context
http, server, location

Specifies the path to a Groonga database. This directive is required.

groonga_database_auto_create
Synopsis:

groonga_database_auto_create on | off;

Default
groonga_database_auto_create on;

Context
http, server, location

Specifies whether the Groonga database is created automatically or not. If the value is on
and the Groonga database specified by groonga_database doesn't exist, the Groonga database
is created automatically. If the Groonga database exists, groonga-httpd does nothing.

If the parent directory doesn't exist, it is also created recursively.

The default value is on. Normally, the value doesn't need to be changed.
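
For example, if you prefer to create the database yourself and want to keep groonga-httpd from creating a new one, you can turn the directive off; a minimal sketch:

location /d/ {
  groonga on;
  groonga_database /var/lib/groonga/db/db;
  groonga_database_auto_create off;
}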

groonga_base_path
Synopsis:

groonga_base_path /d/;

Default
The same value as location name.

Context
location

Specifies the base path in the URI. Groonga uses the /d/command?parameter1=value1&... path to run
commands. That form is used in groonga-httpd, but groonga-httpd also supports the
/other-prefix/command?parameter1=value1&... form. To support that form, groonga-httpd
removes the base path from the head of the request URI and prepends /d/ to the processed
request URI. With this path conversion, users can use a custom path prefix while Groonga
always sees the /d/command?parameter1=value1&... form.

Normally, this directive isn't needed. It is only needed for per-command configuration.

Here is an example configuration to add authorization to /reference/commands/shutdown
command:

groonga_database /var/lib/groonga/db/db;

location /d/shutdown {
groonga on;
# groonga_base_path is needed.
# Because /d/shutdown is handled as the base path.
# Without this configuration, /d/shutdown/shutdown path is required
# to run shutdown command.
groonga_base_path /d/;
auth_basic "manager is required!";
auth_basic_user_file "/etc/managers.htpasswd";
}

location /d/ {
groonga on;
# groonga_base_path isn't needed
# because the location name is the base path.
}

groonga_log_path
Synopsis:

groonga_log_path path | off;

Default
/var/log/groonga/httpd/groonga.log

Context
http, server, location

Specifies the Groonga log path in the http, server or location block. The default is
/var/log/groonga/httpd/groonga.log. You can disable logging by specifying off.

Examples:

location /d/ {
groonga on;
# You can disable log for groonga.
groonga_log_path off;
}

groonga_log_level
Synopsis:

groonga_log_level none | emergency | alert | critical | error | warning | notice | info | debug | dump;

Default
notice

Context
http, server, location

Specifies Groonga log level in the http, server or location block. The default is notice.
You can disable logging by specifying none as log level.

Examples:

location /d/ {
groonga on;
# You can customize log level for groonga.
groonga_log_level notice;
}

groonga_query_log_path
Synopsis:

groonga_query_log_path path | off;

Default
/var/log/groonga/httpd/groonga-query.log

Context
http, server, location

Specifies Groonga's query log path in the http, server or location block. The default is
/var/log/groonga/httpd/groonga-query.log. You can disable logging by specifying off.

Examples:

location /d/ {
groonga on;
# You can disable query log for groonga.
groonga_query_log_path off;
}

Query log is useful for the following cases:

· Detecting slow queries.

· Debugging.

You can analyze your query log with the groonga-query-log package. The package provides
useful tools.

For example, there is a tool that analyzes your query log and detects slow queries. There
is also a tool that replays the queries in your query log, so you can test a new Groonga
version before updating your production environment.

Performance related directives
The following directives are related to the performance of groonga-httpd.

worker_processes
For optimum performance, set this to be equal to the number of CPUs or cores. In many
cases, Groonga queries may be CPU-intensive work, so to fully utilize multi-CPU/core
systems, it's essential to set this accordingly.

This isn't a groonga-httpd specific directive, but an nginx's one. For details, see
http://wiki.nginx.org/CoreModule#worker_processes.

By default, this is set to 1. It is nginx's default.
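
For example, on a server with 4 CPU cores (the core count here is an assumption; match it to your hardware), you would write:

worker_processes 4;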

groonga_cache_limit
This directive is introduced to customize cache limit for each worker process.

Synopsis:

groonga_cache_limit CACHE_LIMIT;

Default
100

Context
http, server, location

Specifies the max number of Groonga's query cache entries in the http, server or location
block. The default value is 100. You can disable the query cache by explicitly specifying
0 as groonga_cache_limit.

Examples:

location /d/ {
groonga on;
# You can customize query cache limit for groonga.
groonga_cache_limit 100;
}
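
Here is a sketch that disables the built-in query cache entirely, for example when you want nginx's proxy cache (described below) to be the only cache layer:

location /d/ {
  groonga on;
  groonga_cache_limit 0;
}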

proxy_cache
In short, you can use nginx's reverse proxy and cache mechanism instead of Groonga's
built-in query cache feature.

Query cache
Groonga has query cache feature for /reference/commands/select command. The feature
improves performance in many cases.

The query cache feature works well with groonga-httpd unless you use the
/reference/commands/cache_limit command with 2 or more workers. Normally, the
/reference/commands/cache_limit command isn't used, so there is no problem in most cases.

Here is a description of the problem that occurs when the /reference/commands/cache_limit
command is used with 2 or more workers.

Groonga's query cache is only shared within the same process; workers can't share the
cache. If you don't change the cache size, this isn't a big problem. If you want to change
the cache size with the /reference/commands/cache_limit command, there is a problem:

There is no portable way to change the cache size of all workers.

For example, there are 3 workers:

+-- worker 1
client -- groonga-httpd (master) --+-- worker 2
+-- worker 3

The client requests /reference/commands/cache_limit command and the worker 1 receives it:

+-> worker 1 (changed!)
client -> groonga-httpd (master) --+-- worker 2
+-- worker 3

The client requests /reference/commands/cache_limit command again and the worker 1
receives it again:

+-> worker 1 (changed again!!!)
client -> groonga-httpd (master) --+-- worker 2
+-- worker 3

In this case, worker 2 and worker 3 haven't received any requests, so they don't change
their cache size.

You can't choose which worker receives a request, so you can't change the cache size of
all workers with the /reference/commands/cache_limit command.

Reverse proxy and cache
You can use nginx's reverse proxy and cache feature for query cache:

+-- worker 1
client -- groonga-httpd (master) -- reverse proxy + cache --+-- worker 2
+-- worker 3

You can use the same cache configuration for all workers, but you can't change the cache
configuration dynamically over HTTP.

Here is a sample configuration:

...
http {
proxy_cache_path /var/cache/groonga-httpd levels=1:2 keys_zone=groonga:10m;
proxy_cache_valid 10m;
...
# Reverse proxy and cache
server {
listen 10041;
...
# Only select command
location /d/select {
# Pass through groonga with cache
proxy_cache groonga;
proxy_pass http://localhost:20041;
}

location / {
# Pass through groonga
proxy_pass http://localhost:20041;
}
}

# groonga
server {
listen 20041;
location /d/ {
groonga on;
groonga_database /var/lib/groonga/db/db;
}
}
...
}

See the following nginx documentations for parameter details:

· http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_path

· http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_valid

· http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache

· http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_pass

Note that you need to remove the cache files created by nginx by hand after you load new
data into Groonga. For the above sample configuration, run the following commands to
remove the cache files:

% groonga DB_PATH < load.grn
% rm -rf /var/cache/groonga-httpd/*

If you use Groonga's query cache feature, you don't need to expire the cache by hand; it
is done automatically.

Available nginx modules
All standard HTTP modules are available. HttpRewriteModule is disabled when you don't have
PCRE (Perl Compatible Regular Expressions). For the list of standard HTTP modules, see
http://wiki.nginx.org/Modules.

Groonga HTTP server
Name
Groonga HTTP server

Synopsis
groonga -d --protocol http DB_PATH

Summary
You can communicate over HTTP by specifying http to the --protocol option. If you also
specify a static page path with --document-root, the file under that path corresponding
to the URI of the HTTP request is output.

Groonga has a Web-based administration tool implemented with HTML and JavaScript. If you
don't specify --document-root, the installed path of the administration tool is used, so
you can use the administration tool by accessing http://HOSTNAME:PORT/ in a Web browser.

Command
A Groonga server started with http accepts the same commands as Groonga started in the
other modes.

A command takes arguments. Each argument has a name. There are also two special arguments:
output_type and command_version.

In standalone mode or client mode, a command is specified in one of the following formats.
Format 1: COMMAND_NAME VALUE1 VALUE2,..

Format 2: COMMAND_NAME --PARAMETER_NAME1 VALUE1 --PARAMETER_NAME2 VALUE2,..

Format 1 and Format 2 can be mixed. The output type is specified by output_type in these
formats.

In HTTP server mode, a command is specified in the following format:

Format: /d/COMMAND_NAME.OUTPUT_TYPE?ARGUMENT_NAME1=VALUE1&ARGUMENT_NAME2=VALUE2&...

Note that command names, argument names and values must be URL-encoded.

Only the GET method can be used.

You can specify JSON, TSV or XML as the output type.
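
For example, assuming a Users table with a name column exists (both names are hypothetical here), a select command whose filter contains spaces and double quotes is URL-encoded like this:

% curl 'http://localhost:10041/d/select.json?table=Users&filter=name%20%3D%3D%20%22alice%22'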

command_version is specified for command specification compatibility. See
/reference/command/command_version for details.

Return value
The execution result is output that follows output type specification by the command.

groonga-suggest-create-dataset
NAME
groonga-suggest-create-dataset - Defines schema for a suggestion dataset

SYNOPSIS
groonga-suggest-create-dataset [options] DATABASE DATASET

DESCRIPTION
groonga-suggest-create-dataset creates a dataset for /reference/suggest. A database has
many datasets. This command just defines schema for a suggestion dataset.

This command generates some tables and columns for /reference/suggest.

Here is the list of such tables. If you specify query as the dataset name, the _DATASET
suffix below is replaced by it. Thus, the item_query, pair_query, sequence_query and
event_query tables are generated.

· event_type

· bigram

· kana

· item_DATASET

· pair_DATASET

· sequence_DATASET

· event_DATASET

· configuration

OPTIONS
None.

EXIT STATUS
TODO

FILES
TODO

EXAMPLE
Here is an example that defines the schema for a dataset named query (the same command is
used in the groonga-suggest-httpd section below):

% groonga-suggest-create-dataset /tmp/groonga-databases/groonga-suggest-httpd query

SEE ALSO
/reference/suggest groonga-suggest-httpd groonga-suggest-learner

groonga-suggest-httpd
Summary
groonga-suggest-httpd is a program that provides an interface which accepts HTTP requests,
returns suggestion results, and saves logs for learning. In terms of suggestion
functionality, groonga-suggest-httpd behaves like /reference/suggest, but the parameter
names are different.

Synopsis
groonga-suggest-httpd [options] database_path

Usage
Set up
First, you need to set up a database for suggestions.

Execution example:

% groonga-suggest-create-dataset /tmp/groonga-databases/groonga-suggest-httpd query

Launch groonga-suggest-httpd
Execute groonga-suggest-httpd command:

Execution example:

% groonga-suggest-httpd /tmp/groonga-databases/groonga-suggest-httpd

After executing the above command, groonga-suggest-httpd accepts HTTP requests on port 8080.

If you just want to save requests into a log file, use the -l option.

Here is an example that saves log files under the logs directory, using log as the prefix for each file:

% groonga-suggest-httpd -l logs/log /tmp/groonga-databases/groonga-suggest-httpd

Under the logs directory, log files such as logYYYYmmddHHMMSS-00 are created.

Request to groonga-suggest-httpd
Here are sample requests that make the query dataset learn the search query groonga:

% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=92619&t=complete&q=g'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=93850&t=complete&q=gr'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=94293&t=complete&q=gro'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=94734&t=complete&q=groo'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=95147&t=complete&q=grooon'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=95553&t=complete&q=groonga'
% curl 'http://localhost:8080/?i=127.0.0.1&l=query&s=95959&t=submit&q=groonga'

Options
-p, --port
Specify the HTTP server port number. The default value is 8080.

-t, --n-threads
Specify the number of threads. The default value is 8. This option accepts up to
128, but for performance use a value matching the number of CPU cores.

-s, --send-endpoint
Specify endpoint for sender.

-r, --receive-endpoint
Specify endpoint for receiver.
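
For example, you might pair groonga-suggest-httpd with a separate groonga-suggest-learner process through these endpoints; a sketch, assuming a ZeroMQ-style endpoint such as tcp://127.0.0.1:1234 (the address and the send/receive pairing are assumptions, not taken from this manual):

% groonga-suggest-httpd -s tcp://127.0.0.1:1234 /tmp/groonga-databases/groonga-suggest-httpd
% groonga-suggest-learner -r tcp://127.0.0.1:1234 /tmp/groonga-databases/groonga-suggest-learner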

-l, --log-base-path
Specify path prefix of log.

--n-lines-per-log-file
Specify the number of lines in a log file. The default value is 1,000,000.

-d, --daemon
Specify this option to daemonize.

--disable-max-fd-check
Specify this option to disable checking max fd on start.

Command line parameters
There is one required parameter - database_path.

database_path
Specifies the path to a Groonga database. This database must be created by
groonga-suggest-create-dataset command because it executes required initialization for
suggestion.

GET parameters
groonga-suggest-httpd accepts the following GET parameters.

Some parameters are required depending on the type of query.

Required parameters
┌────┬──────────────────────────┬──────┐
│Key │ Description              │ Note │
├────┼──────────────────────────┼──────┤
│q   │ UTF-8 encoded string     │      │
│    │ which the user fills in  │      │
│    │ the form                 │      │
├────┼──────────────────────────┼──────┤
│t   │ The type of query. The   │      │
│    │ value of type must be    │      │
│    │ complete, correct,       │      │
│    │ suggest or submit. It    │      │
│    │ also accepts multiple    │      │
│    │ types of query           │      │
│    │ concatenated by |. Note  │      │
│    │ that submit is an        │      │
│    │ invalid value when you   │      │
│    │ specify multiple types   │      │
│    │ of query.                │      │
└────┴──────────────────────────┴──────┘

Required parameters for learning
┌────┬──────────────────────────┬──────────────────────────────┐
│Key │ Description              │ Note                         │
├────┼──────────────────────────┼──────────────────────────────┤
│s   │ Elapsed time from 0:00   │ Note that you need to        │
│    │ January 1, 1970          │ specify the value of s in    │
│    │                          │ milliseconds                 │
├────┼──────────────────────────┼──────────────────────────────┤
│i   │ Unique ID to distinguish │ Use a session ID or an IP    │
│    │ users                    │ address for example          │
├────┼──────────────────────────┼──────────────────────────────┤
│l   │ Specify the name of the  │ Note that the dataset name   │
│    │ dataset for learning. It │ must match the regular       │
│    │ also accepts multiple    │ expression                   │
│    │ dataset names            │ [A-Za-z ][A-Za-z0-9 ]{0,15}  │
│    │ concatenated by |        │                              │
└────┴──────────────────────────┴──────────────────────────────┘

Required parameters for suggestion
┌────┬──────────────────────────┬──────────────────────────┐
│Key │ Description              │ Note                     │
├────┼──────────────────────────┼──────────────────────────┤
│n   │ Specify the name of the  │ This dataset name is     │
│    │ dataset for suggestion   │ used to calculate        │
│    │                          │ suggestion results       │
└────┴──────────────────────────┴──────────────────────────┘

Optional parameter
┌─────────┬──────────────────────────┬──────────────────────────────┐
│Key      │ Description              │ Note                         │
├─────────┼──────────────────────────┼──────────────────────────────┤
│callback │ Specify the name of a    │ The name of the function     │
│         │ function if you prefer   │ must match the regular       │
│         │ JSONP as response format │ expression                   │
│         │                          │ [A-Za-z ][A-Za-z0-9 ]{0,15}  │
└─────────┴──────────────────────────┴──────────────────────────────┘

Return value
groonga-suggest-httpd returns the following response in JSON or JSONP format.

In JSON format:

{TYPE: [[CANDIDATE_1, SCORE_1], [CANDIDATE_2, SCORE_2], ... [CANDIDATE_N, SCORE_N]]}

In JSONP format:

FUNCTION({TYPE: [[CANDIDATE_1, SCORE_1], [CANDIDATE_2, SCORE_2], ... [CANDIDATE_N, SCORE_N]]})

TYPE
One of complete, correct and suggest.

CANDIDATE_N
The string of candidate (UTF-8).

SCORE_N
The score.
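
For example, a complete request for q=gr might return a response like the following (the candidates and scores are illustrative only):

{"complete": [["groonga", 100], ["great", 10]]}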

groonga-suggest-learner
Summary
groonga-suggest-learner is a program that learns suggestion data derived from
groonga-suggest-httpd. Usually it is used together with groonga-suggest-httpd, but it can
also be launched standalone. In that case, groonga-suggest-learner loads data from a log
directory.

Synopsis
groonga-suggest-learner [options] database_path

Usage
groonga-suggest-learner supports two ways of learning data. One is learning data from
groonga-suggest-httpd, the other is learning data from already existing log files.

Learning data from groonga-suggest-httpd
Execute groonga-suggest-learner:

groonga-suggest-learner testdb/db

Learning data from log files
Execute groonga-suggest-learner with the -l option.

Here is a sample that loads log data under the logs directory:

groonga-suggest-learner -l logs testdb/db

Options
-r <endpoint>, --receive-endpoint <endpoint>
Uses <endpoint> as the receiver endpoint.

-s <endpoint>, --send-endpoint <endpoint>
Uses <endpoint> as the sender endpoint.

-d, --daemon
Runs as a daemon.

-l <directory>, --log-base-path <directory>
Reads logs from <directory>.

--log-path <path>
Outputs log to <path>.

--log-level <level>
Uses <level> for log level. <level> must be between 1 and 9. Larger level outputs
more logs.

Parameters
There is one required parameter - database_path.

database_path
Specifies the path to a groonga database.

Related tables
Here is the list of tables in which learned data is stored. If you specify query as the
dataset name, the _DATASET suffix below is replaced by it; thus, the event_query table is used.

· event_DATASET

Output
Groonga supports the following output format types:

· JSON

· XML

· TSV (Tab Separated Values)

· MessagePack

JSON is the default output format.

Usage
Groonga has the following query interfaces:

· command line

· HTTP

They provide different ways to change the output format type.

Command line
You can use the command line query interface by groonga DB_PATH or groonga -c. These
groonga commands show a > prompt. In this query interface, you can specify the output
format type with the output_type option.

If you don't specify the output_type option, you will get a result in JSON format:

> status
[[0,1327721628.10738,0.000131845474243164],{"alloc_count":142,"starttime":1327721626,"uptime":2,"version":"1.2.9-92-gb87d9f8","n_queries":0,"cache_hit_rate":0.0,"command_version":1,"default_command_version":1,"max_command_version":2}]

You can specify json as output_type value to get a result in JSON format explicitly:

> status --output_type json
[[0,1327721639.08321,7.93933868408203e-05],{"alloc_count":144,"starttime":1327721626,"uptime":13,"version":"1.2.9-92-gb87d9f8","n_queries":0,"cache_hit_rate":0.0,"command_version":1,"default_command_version":1,"max_command_version":2}]

You need to specify xml as output_type value to get a result in XML format:

> status --output_type xml
<?xml version="1.0" encoding="utf-8"?>
<RESULT CODE="0" UP="1327721649.61095" ELAPSED="0.000126361846923828">
<RESULT>
<TEXT>alloc_count</TEXT>
<INT>146</INT>
<TEXT>starttime</TEXT>
<INT>1327721626</INT>
<TEXT>uptime</TEXT>
<INT>23</INT>
<TEXT>version</TEXT>
<TEXT>1.2.9-92-gb87d9f8</TEXT>
<TEXT>n_queries</TEXT>
<INT>0</INT>
<TEXT>cache_hit_rate</TEXT>
<FLOAT>0.0</FLOAT>
<TEXT>command_version</TEXT>
<INT>1</INT>
<TEXT>default_command_version</TEXT>
<INT>1</INT>
<TEXT>max_command_version</TEXT>
<INT>2</INT></RESULT>
</RESULT>

You need to specify tsv as output_type value to get a result in TSV format:

> status --output_type tsv
0 1327721664.82675 0.000113964080810547
"alloc_count" 146
"starttime" 1327721626
"uptime" 38
"version" "1.2.9-92-gb87d9f8"
"n_queries" 0
"cache_hit_rate" 0.0
"command_version" 1
"default_command_version" 1
"max_command_version" 2
END

You need to specify msgpack as output_type value to get a result in MessagePack format:

> status --output_type msgpack
(... omitted because MessagePack is binary data format. ...)

HTTP
You can use the HTTP query interface by groonga --protocol http -s DB_PATH. The Groonga
HTTP server starts on port 10041 by default. In this query interface, you can specify the
output format type by extension.

If you don't specify an extension, you will get a result in JSON format:

% curl http://localhost:10041/d/status
[[0,1327809294.54311,0.00082087516784668],{"alloc_count":155,"starttime":1327809282,"uptime":12,"version":"1.2.9-92-gb87d9f8","n_queries":0,"cache_hit_rate":0.0,"command_version":1,"default_command_version":1,"max_command_version":2}]

You can specify json as extension to get a result in JSON format explicitly:

% curl http://localhost:10041/d/status.json
[[0,1327809319.01929,9.5367431640625e-05],{"alloc_count":157,"starttime":1327809282,"uptime":37,"version":"1.2.9-92-gb87d9f8","n_queries":0,"cache_hit_rate":0.0,"command_version":1,"default_command_version":1,"max_command_version":2}]

You need to specify xml as extension to get a result in XML format:

% curl http://localhost:10041/d/status.xml
<?xml version="1.0" encoding="utf-8"?>
<RESULT CODE="0" UP="1327809339.5782" ELAPSED="9.56058502197266e-05">
<RESULT>
<TEXT>alloc_count</TEXT>
<INT>159</INT>
<TEXT>starttime</TEXT>
<INT>1327809282</INT>
<TEXT>uptime</TEXT>
<INT>57</INT>
<TEXT>version</TEXT>
<TEXT>1.2.9-92-gb87d9f8</TEXT>
<TEXT>n_queries</TEXT>
<INT>0</INT>
<TEXT>cache_hit_rate</TEXT>
<FLOAT>0.0</FLOAT>
<TEXT>command_version</TEXT>
<INT>1</INT>
<TEXT>default_command_version</TEXT>
<INT>1</INT>
<TEXT>max_command_version</TEXT>
<INT>2</INT></RESULT>
</RESULT>

You need to specify tsv as extension to get a result in TSV format:

% curl http://localhost:10041/d/status.tsv
0 1327809366.84187 8.44001770019531e-05
"alloc_count" 159
"starttime" 1327809282
"uptime" 84
"version" "1.2.9-92-gb87d9f8"
"n_queries" 0
"cache_hit_rate" 0.0
"command_version" 1
"default_command_version" 1
"max_command_version" 2
END

You need to specify msgpack as extension to get a result in MessagePack format:

% curl http://localhost:10041/d/status.msgpack
(... omitted because MessagePack is binary data format. ...)

Command
A command is the most important processing unit in the query API. You request processing
from Groonga with a command.

This section describes commands and the built-in commands.

Command version
Summary
Groonga 1.1 introduces the concept of a command version. A command version represents the compatibility of the specifications of Groonga commands such as select and load. Even when the Groonga package version is updated, compatibility is guaranteed for all commands as long as the same command version is available. Commands with the same name may behave incompatibly if their command versions differ.

A given version of Groonga supports two command versions at the same time.
The command version to use can be specified by the default-command-version parameter, given as a command line option or in the config file when starting groonga. It can also be specified per command with the command_version parameter.

Command versions start at 1 and increase by 1 with each update. The current specification
of the Groonga commands is treated as command-version 1. The next Groonga release will
support both command-version 1 and command-version 2.

Status of command versions
Each command version supported by a given version of Groonga has one of the following
statuses: develop, stable or deprecated.

develop
Still under development; the specification may change.

stable
Usable and the specification is stable. Recommended at that point in time.

deprecated
Usable and stable, but scheduled for removal; its use is discouraged.

Of the two command versions supported by a given version of Groonga, exactly one is always stable. The other is either develop or deprecated.

For example, the command versions supported by Groonga transition as follows:

groonga1.1: command-version1=stable command-version2=develop
groonga1.2: command-version1=deprecated command-version2=stable
groonga1.3: command-version2=stable command-version3=develop
groonga1.4: command-version2=deprecated command-version3=stable
groonga1.5: command-version3=stable command-version4=develop

A command version is first released with develop status and later becomes stable.
Two generations after that, the command version becomes deprecated. When the next command version is released, the deprecated command version becomes unsupported.

When a groonga command is executed without the default-command-version parameter or the command_version parameter, the command version that is stable at that time is assumed.

If the default-command-version parameter specifies a command version that is not stable when the groonga process starts, a warning message is written to the log file. If an unsupported command version is specified, an error occurs and the process stops immediately.

How to specify the command version
The command version can be specified either as an argument of the groonga executable or as an argument of each command.

default-command-version parameter
The default-command-version parameter can be specified as an argument of the groonga executable
(it can also be specified in the config file).

Execution example:

groonga --default-command-version 1

The specified version is used as the default command version for all commands executed in the process. If the specified command version is stable, groonga starts without any message. If it is develop or deprecated, a warning message is written to the groonga.log file. If it is unsupported, an error message is printed to standard error and the process terminates immediately.

command_version parameter
command_version can be specified for every groonga command such as select and load.

Execution example:

select --command_version 1 --table tablename

The command is executed with the specified command version. If the specified command version is unsupported, an error is returned. When command_version isn't specified, the value specified by default-command-version at process startup is assumed.

Output format
Summary
Commands output their results in JSON, MessagePack, XML or TSV format.

JSON and MessagePack output have the same structure. XML and TSV each have their own
structure.

JSON or MessagePack is the recommended format. XML is useful for visually checking
results. TSV is only for special uses; normally you don't need it.

JSON and MessagePack
This section describes the structure of command results in JSON and MessagePack format.
JSON is used to show the structure because MessagePack is a binary format, which isn't
suitable for documentation.

JSON and MessagePack use the following structure:

[HEADER, BODY]

For example:

[
[
0,
1337566253.89858,
0.000355720520019531
],
[
[
[
1
],
[
[
"_id",
"UInt32"
],
[
"_key",
"ShortText"
],
[
"content",
"Text"
],
[
"n_likes",
"UInt32"
]
],
[
2,
"Groonga",
"I started to use groonga. It's very fast!",
10
]
]
]
]

In the example, the following part is HEADER:

[
0,
1337566253.89858,
0.000355720520019531
]

The following part is BODY:

[
[
[
1
],
[
[
"_id",
"UInt32"
],
[
"_key",
"ShortText"
],
[
"content",
"Text"
],
[
"n_likes",
"UInt32"
]
],
[
2,
"Groonga",
"I started to use groonga. It's very fast!",
10
]
]
]

HEADER
HEADER is an array. The content of HEADER has some patterns.

Success case
HEADER has three elements on success:

[0, UNIX_TIME_WHEN_COMMAND_IS_STARTED, ELAPSED_TIME]

The first element is always 0.

UNIX_TIME_WHEN_COMMAND_IS_STARTED is the number of seconds since 1970-01-01 00:00:00 UTC
when the command started processing. ELAPSED_TIME is the elapsed time for processing
the command in seconds. Both UNIX_TIME_WHEN_COMMAND_IS_STARTED and ELAPSED_TIME are float
values with nanosecond precision.

Error case
HEADER has four or five elements on error:

[
RETURN_CODE,
UNIX_TIME_WHEN_COMMAND_IS_STARTED,
ELAPSED_TIME,
ERROR_MESSAGE,
ERROR_LOCATION
]

ERROR_LOCATION may not be included in HEADER, but the other four elements are always included.

RETURN_CODE is a non-zero value. See return_code for available return codes.

UNIX_TIME_WHEN_COMMAND_IS_STARTED and ELAPSED_TIME are the same as in the success case.

ERROR_MESSAGE is an error message string.

ERROR_LOCATION is optional. If the error location was collected, ERROR_LOCATION is included.
ERROR_LOCATION is an array with one or two elements:

[
LOCATION_IN_GROONGA,
LOCATION_IN_INPUT
]

LOCATION_IN_GROONGA is the location in the Groonga source code where the error occurred.
It is useful for Groonga developers but not for users. LOCATION_IN_GROONGA is an array
with three elements:

[
FUNCTION_NAME,
SOURCE_FILE_NAME,
LINE_NUMBER
]

FUNCTION_NAME is the name of the function where the error occurred.

SOURCE_FILE_NAME is the name of the Groonga source file where the error occurred.

LINE_NUMBER is the line number in SOURCE_FILE_NAME where the error occurred.

LOCATION_IN_INPUT is optional. LOCATION_IN_INPUT is included when the location of the
error in the input file was collected. The input file can be specified by the --file
command line option of the groonga command. LOCATION_IN_INPUT is an array with three
elements:

[
INPUT_FILE_NAME,
LINE_NUMBER,
LINE_CONTENT
]

INPUT_FILE_NAME is the name of the input file where the error occurred.

LINE_NUMBER is the line number in INPUT_FILE_NAME where the error occurred.

LINE_CONTENT is the content at LINE_NUMBER in INPUT_FILE_NAME.
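
As an illustration, an error HEADER might look like the following (the return code is real, but the message, function name, file names and line numbers are all hypothetical):

[
  -22,
  1337566253.89858,
  0.000355720520019531,
  "invalid argument",
  [
    ["grn_expr_parse", "expr.c", 1234],
    ["commands.grn", 5, "select Users --filter ..."]
  ]
]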

BODY
BODY content depends on the executed command. It may be omitted.

In the error case, BODY may be an error message.

XML
TODO

TSV
TODO

See also
· return_code describes about return code.

Pretty print
Summary
New in version 5.1.0.

Groonga supports pretty print when you choose JSON for output_format.

Usage
Just specify yes to output_pretty parameter:

> status --output_pretty yes
[
[
0,
1448344438.43783,
5.29289245605469e-05
],
{
"alloc_count": 233,
"starttime": 1448344437,
"start_time": 1448344437,
"uptime": 1,
"version": "5.0.9-135-g0763d91",
"n_queries": 0,
"cache_hit_rate": 0.0,
"command_version": 1,
"default_command_version": 1,
"max_command_version": 2
}
]

Here is a result without output_pretty parameter:

> status
[[0,1448344438.43783,5.29289245605469e-05],{"alloc_count":233,"starttime":1448344437,...}]

Request ID
Summary
New in version 4.0.9.

You can assign an ID to each request.

The ID can be used to cancel the request. See also /reference/commands/request_cancel
for details about canceling a request.

Request IDs should be managed by the user. If you assign the same ID to multiple running
requests, you can't cancel a specific request.

The simplest ID sequence is incrementing numbers such as 1, 2, ....

A request ID is a string. The maximum request ID size is 4096 bytes.

How to assign ID to request
All commands accept the request_id parameter. You can assign an ID to a request by adding
the request_id parameter.

Here is an example to assign id-1 ID to a request:

select Users --request_id id-1
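
If the request is still running, you can cancel it from another connection by passing the same ID to the request_cancel command; a sketch (see /reference/commands/request_cancel for the exact behavior):

request_cancel --id id-1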

See also
· /reference/commands/request_cancel

Return code
Summary
A return code shows whether processing succeeded or not. If processing failed, the return
code shows the error type.

Return codes are used in the C API and the query API. You can check the return code via
grn_ctx_t::rc in the C API. You can check the return code by looking at the header element
in the query API. See output_format about the header element in the query API.

List
Here is the list of return codes. GRN_SUCCESS (= 0) means that processing succeeded.
Return codes with negative values show error types. GRN_END_OF_DATA is a special return
code: it is used only in the C API and is never shown in the query API.

· 0: GRN_SUCCESS

· 1: GRN_END_OF_DATA

· -1: GRN_UNKNOWN_ERROR

· -2: GRN_OPERATION_NOT_PERMITTED

· -3: GRN_NO_SUCH_FILE_OR_DIRECTORY

· -4: GRN_NO_SUCH_PROCESS

· -5: GRN_INTERRUPTED_FUNCTION_CALL

· -6: GRN_INPUT_OUTPUT_ERROR

· -7: GRN_NO_SUCH_DEVICE_OR_ADDRESS

· -8: GRN_ARG_LIST_TOO_LONG

· -9: GRN_EXEC_FORMAT_ERROR

· -10: GRN_BAD_FILE_DESCRIPTOR

· -11: GRN_NO_CHILD_PROCESSES

· -12: GRN_RESOURCE_TEMPORARILY_UNAVAILABLE

· -13: GRN_NOT_ENOUGH_SPACE

· -14: GRN_PERMISSION_DENIED

· -15: GRN_BAD_ADDRESS

· -16: GRN_RESOURCE_BUSY

· -17: GRN_FILE_EXISTS

· -18: GRN_IMPROPER_LINK

· -19: GRN_NO_SUCH_DEVICE

· -20: GRN_NOT_A_DIRECTORY

· -21: GRN_IS_A_DIRECTORY

· -22: GRN_INVALID_ARGUMENT

· -23: GRN_TOO_MANY_OPEN_FILES_IN_SYSTEM

· -24: GRN_TOO_MANY_OPEN_FILES

· -25: GRN_INAPPROPRIATE_I_O_CONTROL_OPERATION

· -26: GRN_FILE_TOO_LARGE

· -27: GRN_NO_SPACE_LEFT_ON_DEVICE

· -28: GRN_INVALID_SEEK

· -29: GRN_READ_ONLY_FILE_SYSTEM

· -30: GRN_TOO_MANY_LINKS

· -31: GRN_BROKEN_PIPE

· -32: GRN_DOMAIN_ERROR

· -33: GRN_RESULT_TOO_LARGE

· -34: GRN_RESOURCE_DEADLOCK_AVOIDED

· -35: GRN_NO_MEMORY_AVAILABLE

· -36: GRN_FILENAME_TOO_LONG

· -37: GRN_NO_LOCKS_AVAILABLE

· -38: GRN_FUNCTION_NOT_IMPLEMENTED

· -39: GRN_DIRECTORY_NOT_EMPTY

· -40: GRN_ILLEGAL_BYTE_SEQUENCE

· -41: GRN_SOCKET_NOT_INITIALIZED

· -42: GRN_OPERATION_WOULD_BLOCK

· -43: GRN_ADDRESS_IS_NOT_AVAILABLE

· -44: GRN_NETWORK_IS_DOWN

· -45: GRN_NO_BUFFER

· -46: GRN_SOCKET_IS_ALREADY_CONNECTED

· -47: GRN_SOCKET_IS_NOT_CONNECTED

· -48: GRN_SOCKET_IS_ALREADY_SHUTDOWNED

· -49: GRN_OPERATION_TIMEOUT

· -50: GRN_CONNECTION_REFUSED

· -51: GRN_RANGE_ERROR

· -52: GRN_TOKENIZER_ERROR

· -53: GRN_FILE_CORRUPT

· -54: GRN_INVALID_FORMAT

· -55: GRN_OBJECT_CORRUPT

· -56: GRN_TOO_MANY_SYMBOLIC_LINKS

· -57: GRN_NOT_SOCKET

· -58: GRN_OPERATION_NOT_SUPPORTED

· -59: GRN_ADDRESS_IS_IN_USE

· -60: GRN_ZLIB_ERROR

· -61: GRN_LZO_ERROR

· -62: GRN_STACK_OVER_FLOW

· -63: GRN_SYNTAX_ERROR

· -64: GRN_RETRY_MAX

· -65: GRN_INCOMPATIBLE_FILE_FORMAT

· -66: GRN_UPDATE_NOT_ALLOWED

· -67: GRN_TOO_SMALL_OFFSET

· -68: GRN_TOO_LARGE_OFFSET

· -69: GRN_TOO_SMALL_LIMIT

· -70: GRN_CAS_ERROR

· -71: GRN_UNSUPPORTED_COMMAND_VERSION

See also
· output_format shows where return code is appeared in query API response.

· /spec/gqtp: The GQTP protocol also uses the return code as its status, but as a 2byte
unsigned integer. So return codes with negative values appear as positive status values
in the GQTP protocol. You can convert a GQTP status value to a return code by
interpreting it as a 2byte signed integer.
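
For example, GRN_INVALID_ARGUMENT (-22) appears as 65514 (= 65536 - 22) in the GQTP status field, and reading 65514 back as a 2byte signed integer yields -22 again.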

cache_limit
Summary
cache_limit gets or sets the max number of query cache entries. The query cache is used
only by the select command.

If the max number of query cache entries is 100, only the results of the 100 most recent
select commands are cached. The cache expiration algorithm is LRU (least recently used).

Syntax
This command takes only one optional parameter:

cache_limit [max=null]

Usage
You can get the current max number of cache entries by executing cache_limit without a
parameter.

Execution example:

cache_limit
# [[0, 1337566253.89858, 0.000355720520019531], 100]

You can set the max number of cache entries by executing cache_limit with the max parameter.

Here is an example that sets 10 as the max number of cache entries.

Execution example:

cache_limit 10
# [[0, 1337566253.89858, 0.000355720520019531], 100]
cache_limit
# [[0, 1337566253.89858, 0.000355720520019531], 10]

If the max parameter is used, the return value is the max number of cache entries before
the max parameter was set.

Parameters
This section describes all parameters.

max
Specifies the max number of query cache entries as a number.

If the max parameter isn't specified, the current max number of query cache entries isn't
changed; cache_limit just returns the current max number of query cache entries.

Return value
cache_limit returns the current max number of query cache entries:

[HEADER, N_ENTRIES]

HEADER
See /reference/command/output_format about HEADER.

N_ENTRIES
N_ENTRIES is the current max number of query cache entries. It is a number.

See also
· select

check
Summary
check - displays the state of an object

This section describes check, one of the built-in Groonga commands. Built-in commands are executed by passing them as arguments of the groonga executable, via standard input, or by sending a request to a groonga server via a socket.

The check command displays the state of the specified object in the groonga process. It is mainly intended for troubleshooting abnormal situations such as a corrupted database. Because it is for debugging, the format of the returned value is not guaranteed to be stable (the format is likely to change).

Syntax
check obj

Usage
Display the state of the index column name of the table Terms:

check Terms.name
[{"flags":"00008202",
"max sid":1,
"number of garbage segments":0,
"number of array segments":1,
"max id of array segment":1,
"number of buffer segments":110,
"max id of buffer segment":111,
"max id of physical segment in use":111,
"number of unmanaged segments":4294967185,
"total chunk size":7470239,
"max id of chunk segments in use":127,
"number of garbage chunk":[0,0,0,0,0,0,0,0,2,2,0,0,0,0,0]},
{"buffer id":0,
"chunk size":94392,
"buffer term":["596","59777","6",...],
"buffer free":152944,
"size in buffer":7361,
"nterms":237,
"nterms with chunk":216,
"buffer id":1,
"chunk size":71236,
"buffer term":[["に述",18149,18149,2,25,6,6],
["に追",4505,4505,76,485,136,174],
["に退",26568,26568,2,9,2,2],
...],
"buffer free":120000,
"size in buffer":11155,
"nterms":121,
"nterms with chunk":116},
{"buffer id":1,
...},
...]

Parameters
obj
Specifies the name of the object whose state is to be displayed.

Return value
The returned value depends on the object being checked.

For an index column:

An array like the following is output.

[state of the index, state of buffer 1, state of buffer 2, ...]

The state of the index contains the following items in hash form.
flags
The specified flag values, expressed in hexadecimal.

max sid
The largest ID among the segments.

number of garbage segments
The number of garbage segments.

number of array segments
The number of array segments.

max id of array segment
The largest ID among the array segments.

number of buffer segments
The number of buffer segments.

max id of buffer segment
The largest ID among the buffer segments.

max id of physical segment in use
The largest ID among the physical segments in use.

number of unmanaged segments
The number of unmanaged segments.

total chunk size
The total size of the chunks.

max id of chunk segments in use
The largest ID among the chunk segments in use.

number of garbage chunk
The number of garbage chunks for each chunk size.

The state of a buffer contains the following items in hash form.
buffer id
The buffer ID.

chunk size
The size of the chunk.

buffer term
The list of terms in the buffer. The state of each term is an array like the following:
[term, term ID registered in the buffer, term ID registered in the lexicon,
size in the buffer, size in the chunk]

buffer free
The free space in the buffer.

size in buffer
The used space in the buffer.

nterms
The number of terms in the buffer.

nterms with chunk
The number of terms in the buffer that use a chunk.

clearlock
Summary
Deprecated since version 4.0.9: Use lock_clear instead.

clearlock - releases the locks set on an object

This section describes clearlock, one of the built-in Groonga commands. Built-in commands are executed by passing them as arguments of the groonga executable, via standard input, or by sending a request to a groonga server via a socket.

clearlock recursively releases the locks held on the specified object (a database, table, index and so on).

Syntax
clearlock objname

Usage
Release all locks in the open database:

clearlock
[true]

Release the lock on the column body of the table Entry:

clearlock Entry.body
[true]

Parameters
objname
Specifies the name of the target object. If empty, the open db object is the target.

Return value
[success flag]

success flag
true if no error occurred, false if an error occurred.

See also
load

column_copy
Summary
New in version 5.0.7.

column_copy copies all column values to other column.

You can implement the following features with this command:

· Changing column configuration

· Changing table configuration

You can change column configuration by the following steps:

1. Create a new column with new configuration

2. Copy all values from the current column to the new column

3. Remove the current column

4. Rename the new column to the current column

You can change table configuration by the following steps:

1. Create a new table with new configuration

2. Create all same columns to the new table

3. Copy all column values from the current table to the new table

4. Remove the current table

5. Rename the new table to the current table

Concrete examples are shown later.

You can't copy column values from a TABLE_NO_KEY table to another table, and you can't
copy column values from another table to a TABLE_NO_KEY table, because Groonga can't map
records without a record key.

You can copy column values from a TABLE_NO_KEY table to the same TABLE_NO_KEY table, as
the sketch below shows.
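
For example, copying within a single TABLE_NO_KEY table looks like this (a sketch; the History table and its columns are hypothetical):

table_create History TABLE_NO_KEY
column_create History old_value COLUMN_SCALAR ShortText
column_create History new_value COLUMN_SCALAR ShortText
column_copy History old_value History new_value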

You can copy column values from a TABLE_HASH_KEY / TABLE_PAT_KEY / TABLE_DAT_KEY table to
the same or another TABLE_HASH_KEY / TABLE_PAT_KEY / TABLE_DAT_KEY table.

Syntax
This command takes four parameters.

All parameters are required:

column_copy from_table
from_name
to_table
to_name

Usage
Here are use cases of this command:

· Changing column configuration

· Changing table configuration

How to change column configuration
You can change column value type. For example, you can change UInt32 column value to
ShortText column value.

You can change column type. For example, you can change COLUMN_SCALAR column to
COLUMN_VECTOR column.

You can move a column to other table. For example, you can move high_score column to Users
table from Players table.

Here are basic steps to change column configuration:

1. Create a new column with new configuration

2. Copy all values from the current column to the new column

3. Remove the current column

4. Rename the new column to the current column

Here is an example to change column value type to Int32 from ShortText.

Here are schema and data:

Execution example:

table_create Logs TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs serial COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Logs
[
{"_key": "log1", "serial": 1}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

The following commands change Logs.serial column value type to ShortText from Int32:

Execution example:

column_create Logs new_serial COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_copy Logs serial Logs new_serial
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_remove Logs serial
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_rename Logs new_serial serial
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Logs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "serial",
# "ShortText"
# ]
# ],
# [
# 1,
# "log1",
# "1"
# ]
# ]
# ]
# ]

The response of select shows that Logs.serial now stores ShortText values.

Here is an example to change column type to COLUMN_VECTOR from COLUMN_SCALAR.

Here are schema and data:

Execution example:

table_create Entries TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries tag COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Entries
[
{"_key": "entry1", "tag": "Groonga"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

The following commands change Entries.tag column to COLUMN_VECTOR from COLUMN_SCALAR:

Execution example:

column_create Entries new_tag COLUMN_VECTOR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_copy Entries tag Entries new_tag
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_remove Entries tag
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_rename Entries new_tag tag
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Entries
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "entry1",
# [
# "Groonga"
# ]
# ]
# ]
# ]
# ]

The response of select shows that Entries.tag now stores COLUMN_VECTOR values.

Here is an example to move high_score column to Users table from Players table.

Here are schema and data:

Execution example:

table_create Players TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Players high_score COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Players
[
{"_key": "player1", "high_score": 100}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

The following commands move high_score column to Users table from Players table:

Execution example:

table_create Users TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users high_score COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_copy Players high_score Users high_score
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_remove Players high_score
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "high_score",
# "Int32"
# ]
# ],
# [
# 1,
# "player1",
# 100
# ]
# ]
# ]
# ]

The response of select shows that high_score has been moved from Players to Users.

How to change table configuration
You can change the table key type. For example, you can change the key type to ShortText
from Int32.

You can change the table type. For example, you can change a TABLE_HASH_KEY table to a
TABLE_PAT_KEY table.

You can also change other options such as the default tokenizer and normalizer. For example,
you can change the default tokenizer to TokenBigramSplitSymbolAlphaDigit from TokenBigram.

NOTE:
You can't change a TABLE_NO_KEY table, because TABLE_NO_KEY doesn't have a record key and
Groonga can't identify the copy destination record without one.

Here are basic steps to change table configuration:

1. Create a new table with new configuration

2. Create all same columns to the new table

3. Copy all column values from the current table to the new table

4. Remove the current table

5. Rename the new table to the current table

Here is an example to change table key type to ShortText from Int32.

Here are schema and data:

Execution example:

table_create IDs TABLE_HASH_KEY Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create IDs label COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create IDs used COLUMN_SCALAR Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table IDs
[
{"_key": 100, "label": "ID 100", used: true}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

The following commands change IDs table key type to ShortText from Int32:

Execution example:

table_create NewIDs TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create NewIDs label COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create NewIDs used COLUMN_SCALAR Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_copy IDs label NewIDs label
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_copy IDs used NewIDs used
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_remove IDs
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_rename NewIDs IDs
# [[0, 1337566253.89858, 0.000355720520019531], true]
select IDs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "Int32"
# ],
# [
# "label",
# "ShortText"
# ],
# [
# "used",
# "Bool"
# ]
# ],
# [
# 1,
# "100",
# "ID 100",
# true
# ]
# ]
# ]
# ]

The response of select shows that IDs now stores ShortText keys.

Here is an example to change table type to TABLE_PAT_KEY from TABLE_HASH_KEY.

Here are schema and data:

Execution example:

table_create Names TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Names used COLUMN_SCALAR Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Names
[
{"_key": "alice", "used": false}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

The following commands change Names table to TABLE_PAT_KEY from TABLE_HASH_KEY:

Execution example:

table_create NewNames TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create NewNames used COLUMN_SCALAR Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_copy Names used NewNames used
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_remove Names
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_rename NewNames Names
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Names --filter '_key @^ "ali"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "used",
# "Bool"
# ]
# ],
# [
# 1,
# "alice",
# false
# ]
# ]
# ]
# ]

You can confirm that Names is now a TABLE_PAT_KEY table because select can use
script-syntax-prefix-search-operator, which can't be used with TABLE_HASH_KEY.

Parameters
This section describes parameters.

Required parameters
All parameters are required.

from_table
Specifies the name of the table that has the source column.

You can specify any table including TABLE_NO_KEY table.

If you specify TABLE_NO_KEY table, to_table must be the same table.

Here is an example to use from_table.

Here are schema and data:

Execution example:

table_create FromTable TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create FromTable from_column COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create FromTable to_column COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table FromTable
[
{"_key": "key1", "from_column": "value1"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select FromTable --output_columns _key,from_column,to_column
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "from_column",
# "ShortText"
# ],
# [
# "to_column",
# "ShortText"
# ]
# ],
# [
# "key1",
# "value1",
# ""
# ]
# ]
# ]
# ]

You can copy all values to to_column from from_column:

Execution example:

column_copy FromTable from_column FromTable to_column
# [[0, 1337566253.89858, 0.000355720520019531], true]
select FromTable --output_columns _key,from_column,to_column
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "from_column",
# "ShortText"
# ],
# [
# "to_column",
# "ShortText"
# ]
# ],
# [
# "key1",
# "value1",
# "value1"
# ]
# ]
# ]
# ]

from_name
Specifies the name of the column whose values are copied.

See from_table for example.

to_table
Specifies the name of the table that has the destination column.

You can specify the same table name as from_table when you want to copy column values within
the same table.

You can't specify a TABLE_NO_KEY table as to_table, because Groonga can't identify
destination records without a record key.

There is one exception: if you specify the same name as from_table for to_table, you can
use a TABLE_NO_KEY table as to_table, because Groonga can identify destination records when
the source table and the destination table are the same table.

Here is an example to use to_table.

Here are schema and data:

Execution example:

table_create Table TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Table column COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create ToTable TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create ToTable to_column COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Table
[
{"_key": "key1", "column": "value1"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

You can copy all values to ToTable.to_column from Table.column:

Execution example:

column_copy Table column ToTable to_column
# [[0, 1337566253.89858, 0.000355720520019531], true]
select ToTable --output_columns _key,to_column
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "to_column",
# "ShortText"
# ]
# ],
# [
# "key1",
# "value1"
# ]
# ]
# ]
# ]

to_name
Specifies the destination column name.

See to_table for example.

Optional parameters
There is no optional parameter.

Return value
The command returns true as the body on success, such as:

[HEADER, true]

If the command fails, error details are in HEADER.

See /reference/command/output_format for HEADER.

column_create
Summary
column_create - adds a column

This section describes column_create, one of the built-in Groonga commands. Built-in commands are executed by passing them as arguments of the groonga executable, via standard input, or by sending a request to a groonga server via a socket.

column_create adds a column to a table in the database in use.

Syntax
column_create table name flags type [source]

Usage
Create a column named body that stores ShortText values in the table Entry:

column_create Entry body --type ShortText
[true]

Create a full inverted index column named entry_body in the table Term, indexing the values of the body column of the Entry table:

column_create Term entry_body COLUMN_INDEX|WITH_POSITION Entry body
[true]

Parameters
table
Specifies the name of the table to add the column to.

name
Specifies the name of the column to create. The column name must be unique in the table.

Column names containing a period ('.') or a colon (':') can't be created. Names starting with an underscore ('_') are reserved and can't be used.

flags
Specifies the column attributes, either as one of the following numbers or as symbol names combined with a pipe ('|').

0, COLUMN_SCALAR
Creates a column that stores a single value.

1, COLUMN_VECTOR
Creates a column that stores an array of multiple values.

2, COLUMN_INDEX
Creates an index column.

There are two flags to compress the value of a column, but you can't specify these flags
for now because there is a memory leak issue (GitHub#6) when the value of the column is
referenced. This issue occurs with both of them (zlib and lzo).

16, COMPRESS_ZLIB
Compresses the value of the column using zlib. This flag is enabled when you build
Groonga with --with-zlib.

32, COMPRESS_LZO
Compresses the value of the column using lzo. This flag is enabled when you build
Groonga with --with-lzo.

For index columns, additional attributes can be specified by adding the following values
to flags.

128, WITH_SECTION
Creates an index that stores section information.

256, WITH_WEIGHT
Creates an index that stores weight information.

512, WITH_POSITION
Creates an index that stores position information (a full inverted index).

type
Specifies the value type. You can specify a Groonga built-in type, a user-defined type already defined in the same database, or a defined table.

source
When creating an index column, specifies the column to be indexed with the source argument.

Return value
[HEADER, SUCCEEDED]

HEADER
See /reference/command/output_format about HEADER.

SUCCEEDED
It is true on success, false otherwise.

column_list
Summary
column_list command lists columns in a table.

Syntax
This command takes only one required parameter:

column_list table

Usage
Here is a simple example of column_list command.

Execution example:

table_create Users TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users age COLUMN_SCALAR UInt8
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users tags COLUMN_VECTOR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_list Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# "id",
# "UInt32"
# ],
# [
# "name",
# "ShortText"
# ],
# [
# "path",
# "ShortText"
# ],
# [
# "type",
# "ShortText"
# ],
# [
# "flags",
# "ShortText"
# ],
# [
# "domain",
# "ShortText"
# ],
# [
# "range",
# "ShortText"
# ],
# [
# "source",
# "ShortText"
# ]
# ],
# [
# 256,
# "_key",
# "",
# "",
# "COLUMN_SCALAR",
# "Users",
# "ShortText",
# []
# ],
# [
# 257,
# "age",
# "/tmp/groonga-databases/commands_column_list.0000101",
# "fix",
# "COLUMN_SCALAR|PERSISTENT",
# "Users",
# "UInt8",
# []
# ],
# [
# 258,
# "tags",
# "/tmp/groonga-databases/commands_column_list.0000102",
# "var",
# "COLUMN_VECTOR|PERSISTENT",
# "Users",
# "ShortText",
# []
# ]
# ]
# ]

Parameters
This section describes parameters of column_list.

Required parameters
All parameters are required.

table
Specifies the name of the table whose columns are to be listed.

Return value
column_list returns the list of column information in the table:

[
HEADER,
[
COLUMN_LIST_HEADER,
COLUMN_INFORMATION1,
COLUMN_INFORMATION2,
...
]
]

HEADER
See /reference/command/output_format about HEADER.

COLUMN_LIST_HEADER
COLUMN_LIST_HEADER describes the content of each COLUMN_INFORMATION.

COLUMN_LIST_HEADER has the following format:

[
["id", "UInt32"],
["name", "ShortText"],
["path", "ShortText"],
["type", "ShortText"],
["flags", "ShortText"],
["domain", "ShortText"],
["range", "ShortText"],
["source", "ShortText"]
]

It means the following:

· The first element in COLUMN_INFORMATION is the id value; its type is UInt32.

· The second element in COLUMN_INFORMATION is the name value; its type is
ShortText.

· The third element and the rest follow the same pattern.

See the following COLUMN_INFORMATION description for details.

This field provides metadata about the column information, so it is useful for programs
rather than humans.

COLUMN_INFORMATION
Each COLUMN_INFORMATION has the following format:

[
ID,
NAME,
PATH,
TYPE,
FLAGS,
DOMAIN,
RANGE,
SOURCES
]

ID
The column ID in the Groonga database. Normally, you don't need to care about it.

NAME
The column name.

PATH
The path for storing column data.

TYPE
The type of the column. It is one of the following:

┌──────┬──────────────────────────────────┐
│Value │ Description                      │
├──────┼──────────────────────────────────┤
│fix   │ The column is a fixed-size       │
│      │ column. A scalar column whose    │
│      │ type is a fixed-size type is a   │
│      │ fixed-size column.               │
├──────┼──────────────────────────────────┤
│var   │ The column is a variable-size    │
│      │ column. A vector column, or a    │
│      │ scalar column whose type is a    │
│      │ variable-size type, is a         │
│      │ variable-size column.            │
├──────┼──────────────────────────────────┤
│index │ The column is an index column.   │
└──────┴──────────────────────────────────┘

FLAGS
The flags of the column. Each flag is separated by | like
COLUMN_VECTOR|WITH_WEIGHT. FLAGS must include one of COLUMN_SCALAR, COLUMN_VECTOR
or COLUMN_INDEX. Other flags are optional.

Here are the available flags:

┌──────────────┬──────────────────────────────────┐
│Flag          │ Description                      │
├──────────────┼──────────────────────────────────┤
│COLUMN_SCALAR │ The column is a scalar column.   │
├──────────────┼──────────────────────────────────┤
│COLUMN_VECTOR │ The column is a vector column.   │
├──────────────┼──────────────────────────────────┤
│COLUMN_INDEX  │ The column is an index column.   │
├──────────────┼──────────────────────────────────┤
│WITH_WEIGHT   │ The column can have weight.      │
│              │ COLUMN_VECTOR and COLUMN_INDEX   │
│              │ may have it. COLUMN_SCALAR       │
│              │ doesn't have it.                 │
├──────────────┼──────────────────────────────────┤
│WITH_SECTION  │ The column can have section      │
│              │ information. COLUMN_INDEX may    │
│              │ have it. COLUMN_SCALAR and       │
│              │ COLUMN_VECTOR don't have it.     │
│              │                                  │
│              │ A multiple column index has it.  │
├──────────────┼──────────────────────────────────┤
│WITH_POSITION │ The column can have position     │
│              │ information. COLUMN_INDEX may    │
│              │ have it. COLUMN_SCALAR and       │
│              │ COLUMN_VECTOR don't have it.     │
│              │                                  │
│              │ A full text search index must    │
│              │ have it.                         │
├──────────────┼──────────────────────────────────┤
│PERSISTENT    │ The column is a persistent       │
│              │ column. It means that the column │
│              │ isn't a                          │
│              │ /reference/columns/pseudo.       │
└──────────────┴──────────────────────────────────┘

DOMAIN
The name of the table that has the column.

RANGE
The value type name of the column. It is a type name or a table name.

SOURCES
An array of the source column names of the index. If the index column is a multiple
column index, the array has two or more source column names.

It is always an empty array for COLUMN_SCALAR and COLUMN_VECTOR.

See also
· /reference/commands/column_create

· /reference/column

column_remove
Summary
column_remove - removes a column defined in a table

This section describes column_remove, one of Groonga's built-in commands. Built-in commands are executed by passing them as arguments to the groonga executable, via standard input, or by sending a request to a groonga server over a socket.

column_remove removes a column defined in a table. Any indexes associated with the column
are removed as well. [1]

Syntax
column_remove table name

Usage
column_remove Entry body

[true]
Footnotes

[1] The index is removed even when it is part of a multi-section index.

Parameters
table
Specifies the name of the table in which the column to be removed is defined.

name
Specifies the name of the column to be removed.

Return value
[SUCCESS_OR_NOT]

SUCCESS_OR_NOT
Returns true if no error occurred, or false if an error occurred.

column_rename
Summary
column_rename command renames a column.

It is a light operation. It just changes the relationship between the name and the column
object. It doesn't copy column values.

It is a dangerous operation. You must stop all operations, including read operations,
while you run column_rename. If the following sequence occurs, the Groonga process may
crash:

· An operation (such as select) that accesses the column by its current name starts.
The current column name is called the old column name below, because the column is
about to be renamed.

· column_rename runs. The select is still running.

· The select accesses the column by the old column name. But the select can't find the
column by the old name because the column has been renamed to the new column name. It
may crash the Groonga process.

Syntax
This command takes three parameters.

All parameters are required:

column_rename table name new_name

Usage
Here is a simple example of column_rename command.

Execution example:

table_create Users TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users score COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Users
[
{"_key": "Alice", "score": 2},
{"_key": "Bob", "score": 0},
{"_key": "Carlos", "score": -1}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
column_rename Users score point
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_list Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# "id",
# "UInt32"
# ],
# [
# "name",
# "ShortText"
# ],
# [
# "path",
# "ShortText"
# ],
# [
# "type",
# "ShortText"
# ],
# [
# "flags",
# "ShortText"
# ],
# [
# "domain",
# "ShortText"
# ],
# [
# "range",
# "ShortText"
# ],
# [
# "source",
# "ShortText"
# ]
# ],
# [
# 256,
# "_key",
# "",
# "",
# "COLUMN_SCALAR",
# "Users",
# "ShortText",
# []
# ],
# [
# 257,
# "point",
# "/tmp/groonga-databases/commands_column_rename.0000101",
# "fix",
# "COLUMN_SCALAR|PERSISTENT",
# "Users",
# "Int32",
# []
# ]
# ]
# ]
select Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "point",
# "Int32"
# ]
# ],
# [
# 1,
# "Alice",
# 2
# ],
# [
# 2,
# "Bob",
# 0
# ],
# [
# 3,
# "Carlos",
# -1
# ]
# ]
# ]
# ]

Parameters
This section describes parameters of column_rename.

Required parameters
All parameters are required.

table
Specifies the name of table that has the column to be renamed.

name
Specifies the column name to be renamed.

new_name
Specifies the new column name.

Return value
[HEADER, SUCCEEDED_OR_NOT]

HEADER
See /reference/command/output_format about HEADER.

SUCCEEDED_OR_NOT
It is true on success, false otherwise.

config_delete
Summary
New in version 5.1.2.

config_delete command deletes the specified configuration item.

Syntax
This command takes only one required parameter:

config_delete key

Usage
Here is an example to delete alias.column configuration item:

Execution example:

config_set alias.column Aliases.real_name
# [[0, 1337566253.89858, 0.000355720520019531], true]
config_get alias.column
# [[0, 1337566253.89858, 0.000355720520019531], "Aliases.real_name"]
config_delete alias.column
# [[0, 1337566253.89858, 0.000355720520019531], true]
config_get alias.column
# [[0, 1337566253.89858, 0.000355720520019531], ""]

Here is an example to delete nonexistent configuration item:

Execution example:

config_delete nonexistent
# [
# [
# -22,
# 1337566253.89858,
# 0.000355720520019531,
# "[config][delete] failed to delete",
# [
# [
# "grn_config_delete",
# "config.c",
# 166
# ]
# ]
# ],
# false
# ]

config_delete returns an error when you try to delete a nonexistent configuration item.

Parameters
This section describes all parameters.

Required parameters
There is one required parameter.

key
Specifies the key of the target configuration item.

The max key size is 4KiB.

You can't use an empty string as key.

Optional parameters
There is no optional parameter.

Return value
config_delete command returns whether the configuration item was deleted successfully:

[HEADER, SUCCEEDED_OR_NOT]

HEADER
See /reference/command/output_format about HEADER.

SUCCEEDED_OR_NOT
It returns true on success, false otherwise.

See also
· /reference/configuration

· config_get

· config_set

config_get
Summary
New in version 5.1.2.

config_get command returns the value of the specified configuration item.

Syntax
This command takes only one required parameter:

config_get key

Usage
Here is an example to set a value to alias.column configuration item and get the value:

Execution example:

config_set alias.column Aliases.real_name
# [[0, 1337566253.89858, 0.000355720520019531], true]
config_get alias.column
# [[0, 1337566253.89858, 0.000355720520019531], "Aliases.real_name"]

Here is an example to get nonexistent configuration item value:

Execution example:

config_get nonexistent
# [[0, 1337566253.89858, 0.000355720520019531], ""]

config_get returns an empty string for a nonexistent configuration item key.

Parameters
This section describes all parameters.

Required parameters
There is one required parameter.

key
Specifies the key of the target configuration item.

The max key size is 4KiB.

You can't use an empty string as key.

Optional parameters
There is no optional parameter.

Return value
config_get command returns the value of the specified configuration item:

[HEADER, VALUE]

HEADER
See /reference/command/output_format about HEADER.

VALUE
VALUE is the value of the configuration item specified by key. It's a string.

See also
· /reference/configuration

· config_set

· config_delete

config_set
Summary
New in version 5.1.2.

config_set command sets a value to the specified configuration item.

Syntax
This command takes two required parameters:

config_set key value

Usage
Here is an example to set a value to alias.column configuration item and confirm the set
value:

Execution example:

config_set alias.column Aliases.real_name
# [[0, 1337566253.89858, 0.000355720520019531], true]
config_get alias.column
# [[0, 1337566253.89858, 0.000355720520019531], "Aliases.real_name"]

Parameters
This section describes all parameters.

Required parameters
There are two required parameters.

key
Specifies the key of the target configuration item.

The max key size is 4KiB.

You can't use an empty string as key.

value
Specifies the value of the target configuration item specified by key.

The max value size is 4091B (= 4KiB - 5B).

Optional parameters
There is no optional parameter.

Return value
config_set command returns whether the configuration item value was set successfully:

[HEADER, SUCCEEDED_OR_NOT]

HEADER
See /reference/command/output_format about HEADER.

SUCCEEDED_OR_NOT
It returns true on success, false otherwise.

See also
· /reference/configuration

· config_get

· config_delete

database_unmap
Summary
New in version 5.0.7.

database_unmap unmaps already mapped tables and columns in the database. "Map" means
loading from disk into memory. "Unmap" means releasing the mapped memory.

NOTE:
Normally, you don't need to use database_unmap because the OS manages memory cleverly. If
available system memory runs low, the OS moves memory used by Groonga to disk until
Groonga needs that memory again. The OS moves unused memory preferentially.

CAUTION:
You can use this command only when thread_limit returns 1. It means that this command
doesn't work with multithreading.

Syntax
This command takes no parameters:

database_unmap

Usage
You can unmap the database after you change the max number of threads to 1:

Execution example:

thread_limit --max 1
# [[0, 1337566253.89858, 0.000355720520019531], 2]
database_unmap
# [[0, 1337566253.89858, 0.000355720520019531], true]

If the max number of threads is larger than 1, database_unmap fails:

Execution example:

thread_limit --max 2
# [[0, 1337566253.89858, 0.000355720520019531], 1]
database_unmap
# [
# [
# -2,
# 1337566253.89858,
# 0.000355720520019531,
# "[database_unmap] the max number of threads must be 1: <2>",
# [
# [
# "proc_database_unmap",
# "proc.c",
# 6931
# ]
# ]
# ],
# false
# ]

Parameters
This section describes all parameters.

Required parameters
There is no required parameter.

Optional parameters
There is no optional parameter.

Return value
The command returns true as body on success such as:

[HEADER, true]

If the command fails, error details are in HEADER.

See /reference/command/output_format for HEADER.

define_selector
Summary
define_selector - defines a search command

This section describes define_selector, one of Groonga's built-in commands. Built-in commands are executed by passing them as arguments to the groonga executable, via standard input, or by sending a request to a groonga server over a socket.

define_selector defines a new search command with customized search conditions.

Syntax
define_selector name table [match_columns [query [filter [scorer [sortby
[output_columns [offset [limit [drilldown [drilldown_sortby
[drilldown_output_columns [drilldown_offset [drilldown_limit]]]]]]]]]]]]]

Usage
Define a selector command that outputs all records and all column values of the Entry table:

define_selector entry_selector Entry
[true]

Parameters
name
Specifies the name of the selector command to define.

table
Specifies the table to search.

match_columns
Specifies the default value for the match_columns argument of the selector command being added.

query
Specifies the default value for the query argument of the selector command being added.

filter
Specifies the default value for the filter argument of the selector command being added.

scorer
Specifies the default value for the scorer argument of the selector command being added.

sortby
Specifies the default value for the sortby argument of the selector command being added.

output_columns
Specifies the default value for the output_columns argument of the selector command being added.

offset
Specifies the default value for the offset argument of the selector command being added.

limit
Specifies the default value for the limit argument of the selector command being added.

drilldown
Specifies the default value for the drilldown argument of the selector command being added.

drilldown_sortby
Specifies the default value for the drilldown_sortby argument of the selector command being added.

drilldown_output_columns
Specifies the default value for the drilldown_output_columns argument of the selector command being added.

drilldown_offset
Specifies the default value for the drilldown_offset argument of the selector command being added.

drilldown_limit
Specifies the default value for the drilldown_limit argument of the selector command being added.

Return value
[SUCCESS_OR_NOT]

SUCCESS_OR_NOT
Returns true if no error occurred, or false if an error occurred.

See also
/reference/grn_expr

defrag
Summary
defrag command resolves fragmentation of specified objects.

This section describes defrag, one of Groonga's built-in commands. Built-in commands are
executed by passing them as arguments to the groonga executable, via standard input, or by
sending a request to a groonga server over a socket.

defrag takes a target object (a database or a variable-size column) and resolves the
object's fragmentation.

Syntax
defrag objname threshold

Usage
Resolve fragmentation of the currently open database:

defrag
[300]

Resolve fragmentation of the body column of the Entry table:

defrag Entry.body
[30]

Parameters
objname
Specifies the name of the target object. If empty, the currently open database object is the target.

Return value
[NUMBER_OF_DEFRAGGED_SEGMENTS]

NUMBER_OF_DEFRAGGED_SEGMENTS
Returns the number of segments that were defragmented.

delete
Summary
delete command deletes the specified record of a table.

Cascade delete
Multiple tables may be associated with each other. For example, the keys of one table may
be referenced by another table's records. In such a case, if you delete a key from the
referenced table, the values that refer to it in the other table's records are also
cleared.

Note that when the type of the referencing column is COLUMN_VECTOR, only the value that
refers to the deleted key is removed from the vector value.
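
Here is a minimal sketch of the vector case (the Tags/Posts schema below is illustrative,
not part of this reference):

table_create Tags TABLE_HASH_KEY ShortText
table_create Posts TABLE_HASH_KEY ShortText
column_create Posts tags COLUMN_VECTOR Tags
load --table Posts
[
{"_key": "post1", "tags": ["groonga", "mroonga"]}
]
delete Tags groonga
select Posts --output_columns _key,tags

After the delete, the tags value of post1 should contain only "mroonga".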

Syntax
delete table [key [id [filter]]]

Usage
Here are a schema definition and sample data to show usage.

Delete the record from the Entry table whose key is "2".

Execution example:

delete Entry 2
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Entry
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "UInt32"
# ],
# [
# "status",
# "ShortText"
# ]
# ],
# [
# 1,
# 1,
# "OK"
# ]
# ]
# ]
# ]

Here is an example of cascaded delete.

The country column of the Users table references the Country table.

"Cascaded delete" removes the record that matches the specified key and clears the values
that refer to that key.

Execution example:

table_create Country TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Users TABLE_HASH_KEY UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users name COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users country COLUMN_SCALAR Country
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Users
[
{"_key": 1, "name": "John", "country": "United States"},
{"_key": 2, "name": "Mike", "country": "United States"},
{"_key": 3, "name": "Takashi", "country": "Japan"},
{"_key": 4, "name": "Hanako", "country": "Japan"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 4]
load --table Country
[
{"_key": "United States"},
{"_key": "Japan"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
delete Country "United States"
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Country
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# 2,
# "Japan"
# ]
# ]
# ]
# ]
select Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "UInt32"
# ],
# [
# "country",
# "Country"
# ],
# [
# "name",
# "ShortText"
# ]
# ],
# [
# 1,
# 1,
# "",
# "John"
# ],
# [
# 2,
# 2,
# "",
# "Mike"
# ],
# [
# 3,
# 3,
# "Japan",
# "Takashi"
# ],
# [
# 4,
# 4,
# "Japan",
# "Hanako"
# ]
# ]
# ]
# ]

Parameters
table
Specifies the name of the table to delete records from.

key
Specifies the key of the record to delete. If the table uses TABLE_NO_KEY, the key is
just ignored. (Use the id parameter in such a case.)

id
Specifies the id of the record to delete. If you specify the id parameter, you must not
specify the key parameter.

filter
Specifies a grn_expr expression that identifies the records to delete. If you specify the
filter parameter, you must not specify the key or id parameters.
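
For example, assuming the Entry table from the usage above, a hedged sketch that deletes
records by expression instead of by key:

delete Entry --filter '_key == 2'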

Return value
[HEADER, SUCCEEDED_OR_NOT]

HEADER
See /reference/command/output_format about HEADER.

SUCCEEDED_OR_NOT
It returns true on success, false otherwise.

See also
load

dump
Summary
dump - outputs the database schema and data

This section describes dump, one of Groonga's built-in commands. Built-in commands are
executed by passing them as arguments to the groonga executable, via standard input, or by
sending a request to a groonga server over a socket.

dump outputs the database schema and data in a format that can be loaded back later.
Because the output can be large, dump is mainly intended to be used from the command line.
Its main use is backing up a database.

The format that dump outputs can be interpreted directly by Groonga, so you can copy a
database as follows:

% groonga original/db dump > dump.grn
% mkdir backup
% groonga -n backup/db < dump.grn

Syntax
dump [tables]
[dump_plugins]
[dump_schema]
[dump_records]
[dump_indexes]

Usage
Here is the sample schema and data to check dump behaviour:

plugin_register token_filters/stop_word
table_create Bookmarks TABLE_HASH_KEY ShortText
column_create Bookmarks title COLUMN_SCALAR ShortText
table_create Lexicon TABLE_PAT_KEY ShortText
table_create Sites TABLE_NO_KEY
column_create Sites url COLUMN_SCALAR ShortText
column_create Lexicon bookmark_title COLUMN_INDEX Bookmarks title
load --table Bookmarks
[
{"_key":"Groonga", "title":"Introduction to Groonga"},
{"_key":"Mroonga", "title":"Introduction to Mroonga"}
]
load --table Sites
[
{"_key": 1, "url":"http://groonga.org"},
{"_key": 2, "url":"http://mroonga.org"}
]

Dump all data in database:

> dump
plugin_register token_filters/stop_word

table_create Sites TABLE_NO_KEY
column_create Sites url COLUMN_SCALAR ShortText

table_create Bookmarks TABLE_HASH_KEY ShortText
column_create Bookmarks title COLUMN_SCALAR ShortText

table_create Lexicon TABLE_PAT_KEY ShortText

load --table Sites
[
["_id","url"],
[1,"http://groonga.org"],
[2,"http://mroonga.org"]
]

load --table Bookmarks
[
["_key","title"],
["Groonga","Introduction to Groonga"],
["Mroonga","Introduction to Mroonga"]
]

column_create Lexicon bookmark_title COLUMN_INDEX Bookmarks title

Dump schema and specific table data:

> dump Bookmarks
plugin_register token_filters/stop_word

table_create Sites TABLE_NO_KEY
column_create Sites url COLUMN_SCALAR ShortText

table_create Bookmarks TABLE_HASH_KEY ShortText
column_create Bookmarks title COLUMN_SCALAR ShortText

table_create Lexicon TABLE_PAT_KEY ShortText

load --table Bookmarks
[
["_key","title"],
["Groonga","Introduction to Groonga"],
["Mroonga","Introduction to Mroonga"]
]

column_create Lexicon bookmark_title COLUMN_INDEX Bookmarks title

Dump plugins only:

> dump --dump_schema no --dump_records no --dump_indexes no
plugin_register token_filters/stop_word

Dump records only:

> dump --dump_schema no --dump_plugins no --dump_indexes no
load --table Sites
[
["_id","url"],
[1,"http://groonga.org"],
[2,"http://mroonga.org"]
]

load --table Bookmarks
[
["_key","title"],
["Groonga","Introduction to Groonga"],
["Mroonga","Introduction to Mroonga"]
]

Dump schema only:

> dump --dump_records no --dump_plugins no --dump_indexes no
table_create Sites TABLE_NO_KEY
column_create Sites url COLUMN_SCALAR ShortText

table_create Bookmarks TABLE_HASH_KEY ShortText
column_create Bookmarks title COLUMN_SCALAR ShortText

table_create Lexicon TABLE_PAT_KEY ShortText

Parameters
There are optional parameters.

Optional parameters
tables
Specifies the tables to output, separated by commas (','). Nonexistent tables are ignored.

dump_plugins
New in version 5.0.3.

You can customize the output whether it contains registered plugins or not. To exclude
registered plugins from the output, specify no.

The default value is yes.

dump_schema
New in version 5.0.3.

You can customize the output whether it contains database schema or not. To exclude
database schema from the output, specify no.

The default value is yes.

dump_records
New in version 5.0.3.

You can customize the output whether it contains records or not. To exclude records from
the output, specify no.

The default value is yes.

dump_indexes
New in version 5.0.3.

You can customize the output whether it contains indexes or not. To exclude indexes from
the output, specify no.

The default value is yes.

Return value
Outputs the database schema and data as Groonga built-in command invocations. The output_type specification is ignored.

io_flush
Summary
NOTE:
This command is an experimental feature.

New in version 5.0.5.

io_flush explicitly flushes all changes in memory to disk. Normally, you don't need to use
io_flush explicitly, because flushing is done automatically by the OS, and flushing by the
OS is efficient.

You need to use io_flush explicitly when your system may crash unexpectedly or your
Groonga process may not be shut down normally. (For example, using shutdown is a normal
shutdown process.) In that case, it's better to use io_flush after you change your Groonga
database. Here are the commands that change your Groonga database:

· load

· delete

· truncate

· table_create

· table_remove

· table_rename

· column_create

· column_remove

· column_rename

· plugin_register

· plugin_unregister

If you use the select-scorer parameter in select to change existing column values, select
also belongs to the above list.

Note that io_flush may be a heavy operation: if there are many changes in memory, flushing
them to disk takes time.

Syntax
This command takes two parameters.

All parameters are optional:

io_flush [target_name=null]
[recursive=yes]

Usage
You can flush all changes in memory to disk with no arguments:

Execution example:

io_flush
# [[0, 1337566253.89858, 0.000355720520019531], true]

If you know what has changed, you can narrow the flush targets. Here is a correspondence
table between commands and flush targets.

┌─────────────────────────┬──────────────────────────────┬────────────────────────┐
│Command                  │ Flush targets                │ io_flush arguments     │
├─────────────────────────┼──────────────────────────────┼────────────────────────┤
│load and delete          │ Target table and its         │ Table and its columns  │
│                         │ columns.                     │                        │
├─────────────────────────┼──────────────────────────────┼────────────────────────┤
│truncate                 │ Target table and its         │ Table and its columns  │
│                         │ columns.                     │                        │
├─────────────────────────┼──────────────────────────────┼────────────────────────┤
│table_create             │ Target table and database.   │ Table, then database   │
├─────────────────────────┼──────────────────────────────┼────────────────────────┤
│table_remove and         │ Database.                    │ Database               │
│table_rename             │                              │                        │
├─────────────────────────┼──────────────────────────────┼────────────────────────┤
│column_create            │ Target column and database.  │ Table, then database   │
├─────────────────────────┼──────────────────────────────┼────────────────────────┤
│column_remove and        │ Database.                    │ Database               │
│column_rename            │                              │                        │
├─────────────────────────┼──────────────────────────────┼────────────────────────┤
│plugin_register and      │ Database.                    │ Database               │
│plugin_unregister        │                              │                        │
└─────────────────────────┴──────────────────────────────┴────────────────────────┘
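
For example (Users is a hypothetical table name; both invocations use only the target_name
and recursive parameters documented below), flushing a table and its columns after load,
and flushing only the database after table_remove, might look like:

io_flush --target_name Users --recursive yes
io_flush --recursive no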

Parameters
This section describes all parameters.

Required parameters
There is no required parameter.

Optional parameters
There are optional parameters.

target_name
Specifies the name of the flush target object. The target object is one of database,
table, or column.

If you omit this parameter, the database is the flush target object:

Execution example:

io_flush
# [[0, 1337566253.89858, 0.000355720520019531], true]

If you specify a table name, the table is the flush target object:

Execution example:

table_create Users TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
io_flush --target_name Users
# [[0, 1337566253.89858, 0.000355720520019531], true]

If you specify a column name, the column is the flush target object:

Execution example:

column_create Users age COLUMN_SCALAR UInt8
# [[0, 1337566253.89858, 0.000355720520019531], true]
io_flush --target_name Users.age
# [[0, 1337566253.89858, 0.000355720520019531], true]

recursive
Specifies whether the child objects of the flush target object are also flush targets.

The child objects of the database are all its tables and all their columns.

The child objects of a table are all its columns.

A column has no child objects.

The recursive value must be yes or no. yes means that the specified flush target object
and all its child objects are flush targets. no means that only the specified flush
target object is a flush target.

The following io_flush flushes all changes in the database, all tables and all columns:

Execution example:

io_flush --recursive yes
# [[0, 1337566253.89858, 0.000355720520019531], true]

The following io_flush flushes all changes only in the database:

Execution example:

io_flush --recursive no
# [[0, 1337566253.89858, 0.000355720520019531], true]

If you specify another value (neither yes nor no) or omit the recursive parameter, yes is
used.

yes is used in the following case because an invalid recursive argument is specified:

Execution example:

io_flush --recursive invalid
# [[0, 1337566253.89858, 0.000355720520019531], true]

yes is used in the following case because the recursive parameter isn't specified:

Execution example:

io_flush
# [[0, 1337566253.89858, 0.000355720520019531], true]

Return value
The command returns true as body on success such as:

[HEADER, true]

If the command fails, error details are in HEADER.

See /reference/command/output_format for HEADER.

load
Summary
load loads data as records into the current database and updates the values of each column.

Syntax
load values table [columns [ifexists [input_type]]]

Parameters
This section describes all parameters.

values
Specifies the values to be loaded into records. The values must satisfy the input_type
format. If you specify "json" as input_type, you can choose one of the formats below:

Format 1:
[[COLUMN_NAME1, COLUMN_NAME2,..], [VALUE1, VALUE2,..], [VALUE1, VALUE2,..],..]

Format 2:
[{COLUMN_NAME1: VALUE1, COLUMN_NAME2: VALUE2}, {COLUMN_NAME1: VALUE1,
COLUMN_NAME2: VALUE2},..]

The [COLUMN_NAME1, COLUMN_NAME2,..] header row in Format 1 is effective only when the
columns parameter isn't specified.

When the target table has a primary key, you must specify the _key column (the pseudo
column associated with the primary key) as one of the COLUMN_NAMEs.
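
For example, a sketch of Format 1 against the Entry table used in the Usage section below
(the records themselves are illustrative):

load --table Entry --input_type json
[
["_key", "body"],
["Groonga", "It's very fast!!"],
["Mroonga", "It's also very fast!!"]
]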

If no values are given via the values parameter, they are read from standard input until
every opening bracket is matched by its closing one. In that case you don't have to
enclose the values in single quotes or double quotes, but if you pass them via the values
parameter, you should.

When values are read from standard input this way, you also don't have to enclose spaces
(' ') in single quotes or double quotes.

table
Specifies the name of the table you want to add records to.

columns
Specifies the column names of the added records, separated by commas.

ifexists
Specifies a grn_expr string that is executed when a record with the same primary key as an
added record already exists in the table. If the ifexists grn_expr string (default: true)
evaluates to true, the values of the other columns (all columns except _key) are updated.
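
For example, a hedged sketch that skips updates for existing records by passing false, so
a record whose key already exists is left untouched:

load --table Entry --input_type json --ifexists false
[
{"_key": "Groonga", "body": "This value is ignored when the key already exists"}
]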

input_type
Specifies an input format for values. It supports JSON only.

Usage
Here is an example to add records to "Entry" table.

load --table Entry --input_type json --values [{\"_key\":\"Groonga\",\"body\":\"It's very fast!!\"}]

[1]

This example shows how to add values from standard input.

load --table Entry --input_type json
[
{"_key": "Groonga", "body": "It's very fast!!"}
]

[1]

Return value
JSON format
load returns the number of added records such as

[NUMBER]

See also
/reference/grn_expr

lock_acquire
Summary
New in version 5.1.2.

lock_acquire command acquires the lock of the target object. The target object is one of
database, table and column.

NOTE:
This is a dangerous command. You must release the locks you acquire with lock_release
when they are no longer needed. If you forget to release them, your database may be
broken.

Syntax
This command takes only one optional parameter:

lock_acquire [target_name=null]

If the target_name parameter is omitted, the database is used as the target object.

Usage
Here is an example to acquire the lock of the database:

Execution example:

lock_acquire
# [[0, 1337566253.89858, 0.000355720520019531], true]

While the database is locked, you can't create a new table or column. Release the lock of
the database before running the next examples.

Execution example:

lock_release
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an example to acquire the lock of Entries table:

Execution example:

table_create Entries TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
lock_acquire Entries
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an example to acquire the lock of Sites.title column:

Execution example:

table_create Sites TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Sites title COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
lock_acquire Sites.title
# [[0, 1337566253.89858, 0.000355720520019531], true]

Parameters
This section describes all parameters.

target_name
Specifies the name of a table or column.

If you don't specify it, the database is used as the target object.

The default is none. It means that the target object is the database.

Return value
lock_acquire command returns whether the lock was acquired:

[HEADER, SUCCEEDED_OR_NOT]

HEADER
See /reference/command/output_format about HEADER.

SUCCEEDED_OR_NOT
It returns true on success, false otherwise.

See also
· lock_release

· lock_clear

lock_clear
Summary
New in version 4.0.9.

lock_clear command clears the locks of the target object recursively. The target object
is one of database, table and column.

NOTE:
This is a dangerous command. You must not use this command while another process or
thread is writing to the target object. If you do, your database may be broken and/or
your process may crash.

Syntax
This command takes only one optional parameter:

lock_clear [target_name=null]

If the target_name parameter is omitted, the database is used as the target object. It
means that all locks in the database are cleared.

Usage
Here is an example to clear all locks in the database:

Execution example:

lock_clear
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an example to clear locks of Entries table and Entries table columns:

Execution example:

table_create Entries TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries body COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
lock_clear Entries
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an example to clear the lock of Sites.title column:

Execution example:

table_create Sites TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Sites title COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
lock_clear Sites.title
# [[0, 1337566253.89858, 0.000355720520019531], true]

Parameters
This section describes all parameters.

target_name
Specifies the name of a table or column.

If you don't specify it, the database is used as the target object.

The default is none. It means that the target object is the database.

Return value
lock_clear command returns whether the locks were cleared successfully:

[HEADER, SUCCEEDED_OR_NOT]

HEADER
See /reference/command/output_format about HEADER.

SUCCEEDED_OR_NOT
It returns true on success, false otherwise.

lock_release
Summary
New in version 5.1.2.

lock_release command releases the lock of the target object. The target object is one of
database, table and column.

NOTE:
This is a dangerous command. You must release only the locks that you acquired with
lock_acquire. If you release a lock you didn't acquire, your database may be broken.

Syntax
This command takes only one optional parameter:

lock_release [target_name=null]

If the target_name parameter is omitted, the database is used as the target object.

Usage
Here is an example to release the lock of the database:

Execution example:

lock_acquire
# [[0, 1337566253.89858, 0.000355720520019531], true]
lock_release
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an example to release the lock of Entries table:

Execution example:

table_create Entries TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
lock_acquire Entries
# [[0, 1337566253.89858, 0.000355720520019531], true]
lock_release Entries
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an example to release the lock of Sites.title column:

Execution example:

table_create Sites TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Sites title COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
lock_acquire Sites.title
# [[0, 1337566253.89858, 0.000355720520019531], true]
lock_release Sites.title
# [[0, 1337566253.89858, 0.000355720520019531], true]

Parameters
This section describes all parameters.

target_name
Specifies the name of a table or column.

If you don't specify it, the database is used as the target object.

The default is none. It means that the target object is the database.

Return value
lock_release command returns whether the lock was released successfully:

[HEADER, SUCCEEDED_OR_NOT]

HEADER
See /reference/command/output_format about HEADER.

SUCCEEDED_OR_NOT
It returns true on success, false otherwise.

See also
· lock_acquire

· lock_clear

log_level
Summary
log_level - sets the log output level

This section describes log_level, one of Groonga's built-in commands. Built-in commands are executed by passing them as arguments to the groonga executable, via standard input, or by sending a request to a groonga server over a socket.

log_level sets the log output level.

Syntax
log_level level

Usage
log_level warning
[true]

Parameters
level
Specifies the log output level as one of the following:
EMERG ALERT CRIT error warning notice info debug

Return value
[SUCCESS_OR_NOT]

SUCCESS_OR_NOT
Returns true if no error occurred, or false if an error occurred.

See also
log_put log_reopen

log_put
Summary
log_put - writes a log message

This section describes log_put, one of Groonga's built-in commands. Built-in commands are executed by passing them as arguments to the groonga executable, via standard input, or by sending a request to a groonga server over a socket.

log_put writes message to the log.

Syntax
log_put level message

Usage
log_put ERROR ****MESSAGE****
[true]

Parameters
level
Specifies the log output level as one of the following:
EMERG ALERT CRIT error warning notice info debug

message
Specifies the string to output.

Return value
[SUCCESS_OR_NOT]

SUCCESS_OR_NOT
Returns true if no error occurred, or false if an error occurred.

See also
log_level log_reopen

log_reopen
Summary
log_reopen - reopens the log file

This section describes log_reopen, one of Groonga's built-in commands. Built-in commands are executed by passing them as arguments to the groonga executable, via standard input, or by sending a request to a groonga server over a socket.

log_reopen reopens the log file.

Currently, this is supported only when the default log function is used.

Syntax
log_reopen

Usage
log_reopen

[true]

Log rotation with log_reopen
1. Move the log file with mv or a similar tool. Logs continue to be written to the file
at its new location.

2. Run the log_reopen command.

3. A new log file is created with the same name as the old log file. Subsequent logs are
written to the new file. (A concrete sketch follows.)
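
As a sketch of this procedure (the log path and HTTP port are illustrative; this assumes a
groonga server running with the HTTP protocol on the default port 10041):

% mv /var/log/groonga/groonga.log /var/log/groonga/groonga.log.1
% curl http://localhost:10041/d/log_reopen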

Parameters
None.

Return value
[SUCCESS_OR_NOT]

SUCCESS_OR_NOT
Returns true if no error occurred, or false if an error occurred.

See also
log_level log_put

logical_count
Summary
New in version 5.0.0.

logical_count is a command that counts matched records even when the actual records are
stored in multiple partitioned tables. It is useful because you rarely need to care about
the maximum number of records per table (see /limitations).

Note that this feature is not mature yet, so there are some limitations:

· You must create partitioned tables whose names contain the "_YYYYMMDD" postfix. The
postfix is hardcoded, so you must create one table per day.

· You must load the proper data into the partitioned tables on your own.

Syntax
This command takes many parameters.

The required parameters are logical_table and shard_key:

logical_count logical_table
shard_key
[min]
[min_border]
[max]
[max_border]
[filter]

Usage
Register the sharding plugin in advance to use the logical_count command.

Note that logical_count is implemented as an experimental plugin, and its specification
may change in the future.

Here is a simple example that shows how to use this feature. Let's count specific logs
that are stored in multiple tables.

Here is the schema and data.

Execution example:

table_create Logs_20150203 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20150203 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20150203 message COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Logs_20150204 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20150204 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20150204 message COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Logs_20150205 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20150205 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20150205 message COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]

Execution example:

load --table Logs_20150203
[
{"timestamp": "2015-02-03 23:59:58", "message": "Start"},
{"timestamp": "2015-02-03 23:59:58", "message": "Shutdown"},
{"timestamp": "2015-02-03 23:59:59", "message": "Start"},
{"timestamp": "2015-02-03 23:59:59", "message": "Shutdown"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 4]
load --table Logs_20150204
[
{"timestamp": "2015-02-04 00:00:00", "message": "Start"},
{"timestamp": "2015-02-04 00:00:00", "message": "Shutdown"},
{"timestamp": "2015-02-04 00:00:01", "message": "Start"},
{"timestamp": "2015-02-04 00:00:01", "message": "Shutdown"},
{"timestamp": "2015-02-04 23:59:59", "message": "Start"},
{"timestamp": "2015-02-04 23:59:59", "message": "Shutdown"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 6]
load --table Logs_20150205
[
{"timestamp": "2015-02-05 00:00:00", "message": "Start"},
{"timestamp": "2015-02-05 00:00:00", "message": "Shutdown"},
{"timestamp": "2015-02-05 00:00:01", "message": "Start"},
{"timestamp": "2015-02-05 00:00:01", "message": "Shutdown"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 4]

There are three tables, one for each day from Feb 03, 2015 to Feb 05, 2015:

· Logs_20150203

· Logs_20150204

· Logs_20150205

Then the data is loaded into the corresponding tables.

Let's count the logs whose message column contains "Shutdown" and whose timestamp value is
"2015-02-04 00:00:00" or later.

Here is the query that achieves this:

Execution example:

logical_count Logs timestamp --filter 'message == "Shutdown"' --min "2015-02-04 00:00:00" --min_border "include"
# [[0, 1337566253.89858, 0.000355720520019531], 5]

There is a well-known limitation on the number of records per table. With the sharding
feature, you can overcome it because the limitation applies per table.

NOTE:
There is no convenient statement such as SQL's PARTITION BY. Thus, you must create each
table with table_create, using the "_YYYYMMDD" postfix in the table name.

Parameters
This section describes parameters of logical_count.

Required parameters
There are two required parameters: logical_table and shard_key.

logical_table
Specifies the logical table name, that is, the table name without the "_YYYYMMDD"
postfix. If the actual tables are "Logs_20150203", "Logs_20150204" and so on, the logical
table name is "Logs".

shard_key
Specifies the name of the column that is treated as the shard key in each partitioned table.

Optional parameters
There are optional parameters.

min
Specifies the min value of shard_key.

min_border
Specifies whether the minimum borderline value is included or not. Specify include or
exclude as the value of this parameter.

max
Specifies the max value of shard_key.

max_border
Specifies whether the maximum borderline value is included or not. Specify include or
exclude as the value of this parameter.

filter
Specifies a filter condition as a grn_expr string. Only records that match the filter are
counted.
Return value
TODO

[HEADER, LOGICAL_COUNT]

logical_parameters
Summary
New in version 5.0.6.

logical_parameters is a command for testing. Normally, you don't need to use this command.

logical_parameters provides the following two features:

· It returns the current parameters for logical_* commands.

· It sets new parameters for logical_* commands.

Here is a list of parameters:

· range_index

NOTE:
The parameters are independent in each thread (to be exact, each grn_ctx). If you want
to control the parameters completely, you should reduce the max number of threads to 1
with /reference/commands/thread_limit while you're using the parameters.

Syntax
This command takes only one optional parameter:

logical_parameters [range_index=null]

Usage
You need to register the sharding plugin to use this command:

Execution example:

plugin_register sharding
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can get all the current parameter values by calling the command without parameters:

Execution example:

logical_parameters
# [[0, 1337566253.89858, 0.000355720520019531], {"range_index": "auto"}]

You can set new values by calling the command with parameters:

Execution example:

logical_parameters --range_index never
# [[0, 1337566253.89858, 0.000355720520019531], {"range_index": "auto"}]

When you set new values, logical_parameters returns the parameter values from before the
new values were set.

Parameters
This section describes parameters.

Required parameters
There is no required parameter.

Optional parameters
There is one optional parameter.

range_index
Specifies how to use range index in logical_range_filter by keyword.

Here are available keywords:

· auto (default)

· always

· never

If auto is specified, the range index is used only when it will be efficient. This is the
default value.

Execution example:

logical_parameters --range_index auto
# [[0, 1337566253.89858, 0.000355720520019531], {"range_index": "never"}]

If always is specified, the range index is always used. It is useful for testing cases
where the range index is used.

Execution example:

logical_parameters --range_index always
# [[0, 1337566253.89858, 0.000355720520019531], {"range_index": "auto"}]

If never is specified, the range index is never used. It is useful for testing cases
where the range index isn't used.

Execution example:

logical_parameters --range_index never
# [[0, 1337566253.89858, 0.000355720520019531], {"range_index": "always"}]

Return value
The command returns the current parameters for logical_* commands:

[
HEADER,
{"range_index": HOW_TO_USE_RANGE_INDEX}
]

The HOW_TO_USE_RANGE_INDEX value is one of the following:

· "auto"

· "always"

· "never"

See /reference/command/output_format for HEADER.

logical_range_filter
Summary
New in version 5.0.0.

TODO: Write summary

Syntax
This command takes many parameters.

The required parameters are logical_table and shard_key:

logical_range_filter
logical_table
shard_key
[min=null]
[min_border=null]
[max=null]
[max_border=null]
[order=ascending]
[filter=null]
[offset=0]
[limit=10]
[output_columns=_key,*]
[use_range_index=null]

Some parameters can be used only as named parameters, not as ordered parameters. You must
specify the parameter name for them.

Here are parameters that can be only used as named parameters:

· cache=no

Usage
Register the sharding plugin in advance to use the logical_range_filter command.

TODO: Add examples

Parameters
This section describes parameters of logical_range_filter.

Required parameters
There are two required parameters: logical_table and shard_key.

logical_table
Specifies the logical table name, that is, the table name without the "_YYYYMMDD"
postfix. If the actual tables are "Logs_20150203", "Logs_20150204" and so on, the logical
table name is "Logs".

TODO: Add examples

shard_key
Specifies the name of the column that is treated as the shard key in each partitioned table.

TODO: Add examples

Optional parameters
There are optional parameters.

min
Specifies the min value of shard_key.

TODO: Add examples

min_border
Specifies whether the minimum borderline value is included or not. Specify include or
exclude as the value of this parameter.

TODO: Add examples

max
Specifies the max value of shard_key.

TODO: Add examples

max_border
Specifies whether the maximum borderline value is included or not. Specify include or
exclude as the value of this parameter.

TODO: Add examples

order
TODO

filter
TODO

offset
TODO

limit
TODO

output_columns
TODO

use_range_index
Specifies whether the range index is used or not. Note that this is a parameter for
testing. It should not be used in production.

TODO: Add examples

Cache related parameter
cache
Specifies whether the result of this query is cached or not.

If the result of this query is cached, the next identical query returns its response
quickly by using the cache.

It doesn't control whether an existing cached result is used or not.

Here are available values:

┌──────┬──────────────────────────────────┐
│Value │ Description                      │
├──────┼──────────────────────────────────┤
│no    │ Don't cache the output of this   │
│      │ query.                           │
├──────┼──────────────────────────────────┤
│yes   │ Cache the output of this query.  │
│      │ It's the default value.          │
└──────┴──────────────────────────────────┘

TODO: Add examples

The default value is yes.
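
As a sketch of how the parameter is passed (assuming the Logs_YYYYMMDD schema used by
logical_count above):

logical_range_filter \
  --logical_table Logs \
  --shard_key timestamp \
  --cache no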

Return value
TODO

[HEADER, LOGICAL_FILTERED]

logical_select
Summary
New in version 5.0.5.

logical_select is a sharding version of select. logical_select searches records from
multiple tables and outputs them.

You need to register the sharding plugin with plugin_register because logical_select is
included in the sharding plugin.

Syntax
This command takes many parameters.

The required parameters are logical_table and shard_key. Other parameters are optional:

logical_select logical_table
shard_key
[min=null]
[min_border="include"]
[max=null]
[max_border="include"]
[filter=null]
[sortby=null]
[output_columns="_id, _key, *"]
[offset=0]
[limit=10]
[drilldown=null]
[drilldown_sortby=null]
[drilldown_output_columns="_key, _nsubrecs"]
[drilldown_offset=0]
[drilldown_limit=10]
[drilldown_calc_types=NONE]
[drilldown_calc_target=null]

logical_select has the following named parameters for advanced drilldown:

· drilldown[${LABEL}].keys=null

· drilldown[${LABEL}].sortby=null

· drilldown[${LABEL}].output_columns="_key, _nsubrecs"

· drilldown[${LABEL}].offset=0

· drilldown[${LABEL}].limit=10

· drilldown[${LABEL}].calc_types=NONE

· drilldown[${LABEL}].calc_target=null

You can use one or more alphabetic characters, digits, _ and . for ${LABEL}. For example,
parent.sub1 is a valid ${LABEL}.

Parameters that have the same ${LABEL} are grouped.

For example, the following parameters specify one drilldown:

· --drilldown[label].keys column

· --drilldown[label].sortby -_nsubrecs

The following parameters specify two drilldowns:

· --drilldown[label1].keys column1

· --drilldown[label1].sortby -_nsubrecs

· --drilldown[label2].keys column2

· --drilldown[label2].sortby _key
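
For example, the following sketch (using the Entries logical table and its tag column from
the Usage section below) computes one labeled drilldown, sorted by descending group size:

logical_select \
  --logical_table Entries \
  --shard_key created_at \
  --drilldown[tags].keys tag \
  --drilldown[tags].sortby -_nsubrecs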

Differences from select
Most of logical_select features can be used like corresponding select features. For
example, parameter name is same, output format is same and so on.

But there are some differences from select:

· logical_table and shard_key parameters are required instead of table parameter.

· sortby isn't supported when multiple shards are used. (When only one shard is used,
it is supported.)

· _value.${KEY_NAME} in drilldown[${LABEL}].sortby doesn't work with multiple shards.
It works with one shard. _key in drilldown[${LABEL}].sortby works with multiple
shards.

· match_columns and query aren't supported yet.

· cache isn't supported yet.

· match_escalation_threshold isn't supported yet.

· query_flags isn't supported yet.

· query_expander isn't supported yet.

· adjuster isn't supported yet.

Usage
Let's learn about logical_select usage with examples. This section shows many popular
usages.

You need to register the sharding plugin because logical_select is included in the
sharding plugin.

Execution example:

plugin_register sharding
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here are a schema definition and sample data to show usage.

Execution example:

table_create Entries_20150708 TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries_20150708 created_at COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries_20150708 content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries_20150708 n_likes COLUMN_SCALAR UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries_20150708 tag COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Entries_20150709 TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries_20150709 created_at COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries_20150709 content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries_20150709 n_likes COLUMN_SCALAR UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries_20150709 tag COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_key_index_20150708 \
COLUMN_INDEX|WITH_POSITION Entries_20150708 _key
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_content_index_20150708 \
COLUMN_INDEX|WITH_POSITION Entries_20150708 content
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_key_index_20150709 \
COLUMN_INDEX|WITH_POSITION Entries_20150709 _key
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_content_index_20150709 \
COLUMN_INDEX|WITH_POSITION Entries_20150709 content
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Entries_20150708
[
{"_key": "The first post!",
"created_at": "2015/07/08 00:00:00",
"content": "Welcome! This is my first post!",
"n_likes": 5,
"tag": "Hello"},
{"_key": "Groonga",
"created_at": "2015/07/08 01:00:00",
"content": "I started to use Groonga. It's very fast!",
"n_likes": 10,
"tag": "Groonga"},
{"_key": "Mroonga",
"created_at": "2015/07/08 02:00:00",
"content": "I also started to use Mroonga. It's also very fast! Really fast!",
"n_likes": 15,
"tag": "Groonga"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
load --table Entries_20150709
[
{"_key": "Good-bye Senna",
"created_at": "2015/07/09 00:00:00",
"content": "I migrated all Senna system!",
"n_likes": 3,
"tag": "Senna"},
{"_key": "Good-bye Tritonn",
"created_at": "2015/07/09 01:00:00",
"content": "I also migrated all Tritonn system!",
"n_likes": 3,
"tag": "Senna"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]

There are two tables, Entries_20150708 and Entries_20150709, for blog entries.

NOTE:
You need to use the ${LOGICAL_TABLE_NAME}_${YYYYMMDD} naming rule for table names. In
this example, LOGICAL_TABLE_NAME is Entries and YYYYMMDD is 20150708 or 20150709.

An entry has a title, created time, content, the number of likes, and a tag. The title is
the key of Entries_YYYYMMDD. The created time is the value of the
Entries_YYYYMMDD.created_at column. The content is the value of the
Entries_YYYYMMDD.content column. The number of likes is the value of the
Entries_YYYYMMDD.n_likes column. The tag is the value of the Entries_YYYYMMDD.tag column.

The Entries_YYYYMMDD._key and Entries_YYYYMMDD.content columns are indexed using the
TokenBigram tokenizer, so both Entries_YYYYMMDD._key and Entries_YYYYMMDD.content are
ready for full text search.

OK. The schema and data for examples are ready.

Simple usage
TODO

Parameters
This section describes parameters of logical_select.

Required parameters
There are two required parameters: logical_table and shard_key.

logical_table
Specifies the logical table name, that is, the table name without the _YYYYMMDD postfix.
If the actual tables are Entries_20150708, Entries_20150709 and so on, the logical table
name is Entries.

You can show 10 records by specifying logical_table and shard_key parameters. They are
required parameters.

Execution example:

logical_select --logical_table Entries --shard_key created_at
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "created_at",
# "Time"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 1436281200.0,
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 1436284800.0,
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 1436288400.0,
# 15,
# "Groonga"
# ],
# [
# 1,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 1436367600.0,
# 3,
# "Senna"
# ],
# [
# 2,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 1436371200.0,
# 3,
# "Senna"
# ]
# ]
# ]
# ]

If a nonexistent table is specified, an error is returned.

Execution example:

logical_select --logical_table Nonexistent --shard_key created_at
# [
# [
# -22,
# 1337566253.89858,
# 0.000355720520019531,
# "[logical_select] no shard exists: logical_table: <Nonexistent>: shard_key: <created_at>",
# [
# [
# "Groonga::Context.set_groonga_error",
# "lib/mrb/scripts/context.rb",
# 27
# ]
# ]
# ]
# ]

shard_key
Specifies the name of the column that is treated as the shard key. The shard key is a
column whose data is used to distribute records to suitable shards.

Shard key must be Time type for now.

See logical_table for how to specify shard_key.

Optional parameters
There are optional parameters.

min
Specifies the minimum value of the shard_key column. If a shard doesn't have any matching
records, that shard isn't searched.

For example, if min is "2015/07/09 00:00:00", Entries_20150708 isn't searched, because
Entries_20150708 has only records for "2015/07/08".

The following example uses only the Entries_20150709 table. Entries_20150708 isn't used.

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--min "2015/07/09 00:00:00"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "created_at",
# "Time"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 1436367600.0,
# 3,
# "Senna"
# ],
# [
# 2,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 1436371200.0,
# 3,
# "Senna"
# ]
# ]
# ]
# ]

min_border
Specifies whether the minimum value is included or not. Here are the available values.

┌────────┬──────────────────────────────────┐
│Value   │ Description                      │
├────────┼──────────────────────────────────┤
│include │ Includes min value. This is the  │
│        │ default.                         │
├────────┼──────────────────────────────────┤
│exclude │ Doesn't include min value.       │
└────────┴──────────────────────────────────┘

Here is an example for exclude. The result doesn't include the "Good-bye Senna" record
because its created_at value is "2015/07/09 00:00:00".

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--min "2015/07/09 00:00:00" \
--min_border "exclude"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "created_at",
# "Time"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 1436371200.0,
# 3,
# "Senna"
# ]
# ]
# ]
# ]

max
Specifies the maximum value of the shard_key column. If a shard doesn't have any matched
records, the shard isn't searched.

For example, if max is "2015/07/08 23:59:59", Entries_20150709 isn't searched, because
Entries_20150709 has only records for "2015/07/09".

The following example only uses the Entries_20150708 table. Entries_20150709 isn't used.

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--max "2015/07/08 23:59:59"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "created_at",
# "Time"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 1436281200.0,
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 1436284800.0,
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 1436288400.0,
# 15,
# "Groonga"
# ]
# ]
# ]
# ]

max_border
Specifies whether the maximum value is included or not. Here are the available values.

┌────────┬──────────────────────────────────┐
│Value   │ Description                      │
├────────┼──────────────────────────────────┤
│include │ Includes max value. This is the  │
│        │ default.                         │
├────────┼──────────────────────────────────┤
│exclude │ Doesn't include max value.       │
└────────┴──────────────────────────────────┘

Here is an example for exclude. The result doesn't include the "Good-bye Senna" record
because its created_at value is "2015/07/09 00:00:00".

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--max "2015/07/09 00:00:00" \
--max_border "exclude"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "created_at",
# "Time"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 1436281200.0,
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 1436284800.0,
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 1436288400.0,
# 15,
# "Groonga"
# ]
# ]
# ]
# ]

Search related parameters
logical_select provides select-compatible search related parameters.

match_columns and query aren't supported yet. Only filter is supported for now.

match_columns
Not implemented yet.

query
Not implemented yet.

filter
Corresponds to select-filter in select. See select-filter for details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--filter "n_likes <= 5"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "created_at",
# "Time"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 1436281200.0,
# 5,
# "Hello"
# ],
# [
# 1,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 1436367600.0,
# 3,
# "Senna"
# ],
# [
# 2,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 1436371200.0,
# 3,
# "Senna"
# ]
# ]
# ]
# ]

Advanced search parameters
logical_select doesn't implement advanced search parameters yet.

match_escalation_threshold
Not implemented yet.

query_flags
Not implemented yet.

query_expander
Not implemented yet.

Output related parameters
output_columns
Corresponds to select-output-columns in select. See select-output-columns for details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--output_columns '_key, *'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "created_at",
# "Time"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# "The first post!",
# "Welcome! This is my first post!",
# 1436281200.0,
# 5,
# "Hello"
# ],
# [
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 1436284800.0,
# 10,
# "Groonga"
# ],
# [
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 1436288400.0,
# 15,
# "Groonga"
# ],
# [
# "Good-bye Senna",
# "I migrated all Senna system!",
# 1436367600.0,
# 3,
# "Senna"
# ],
# [
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 1436371200.0,
# 3,
# "Senna"
# ]
# ]
# ]
# ]

sortby
Corresponds to select-sortby in select. See select-sortby for details.

sortby has a limitation. It works only when the number of search target shards is one. If
the number of search target shards is larger than one, sortby doesn't work.

Here is an example that uses only one shard:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--min "2015/07/08 00:00:00" \
--min_border "include" \
--max "2015/07/09 00:00:00" \
--max_border "exclude" \
--sortby _key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "created_at",
# "Time"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 1436284800.0,
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 1436288400.0,
# 15,
# "Groonga"
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 1436281200.0,
# 5,
# "Hello"
# ]
# ]
# ]
# ]

offset
Corresponds to select-offset in select. See select-offset for details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--offset 2
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "created_at",
# "Time"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 1436288400.0,
# 15,
# "Groonga"
# ],
# [
# 1,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 1436367600.0,
# 3,
# "Senna"
# ],
# [
# 2,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 1436371200.0,
# 3,
# "Senna"
# ]
# ]
# ]
# ]

limit
Corresponds to select-limit in select. See select-limit for details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--limit 2
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "created_at",
# "Time"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 1436281200.0,
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 1436284800.0,
# 10,
# "Groonga"
# ]
# ]
# ]
# ]

scorer
Not implemented yet.

Drilldown related parameters
All drilldown related parameters in select are supported. See
select-drilldown-related-parameters for details.

drilldown
Corresponds to select-drilldown in select. See select-drilldown for details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--output_columns _key,tag \
--drilldown tag
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# "The first post!",
# "Hello"
# ],
# [
# "Groonga",
# "Groonga"
# ],
# [
# "Mroonga",
# "Groonga"
# ],
# [
# "Good-bye Senna",
# "Senna"
# ],
# [
# "Good-bye Tritonn",
# "Senna"
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 1
# ],
# [
# "Groonga",
# 2
# ],
# [
# "Senna",
# 2
# ]
# ]
# ]
# ]

drilldown_sortby
Corresponds to select-drilldown-sortby in select. See select-drilldown-sortby for details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--limit 0 \
--output_columns _id \
--drilldown tag \
--drilldown_sortby -_nsubrecs,_key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Groonga",
# 2
# ],
# [
# "Senna",
# 2
# ],
# [
# "Hello",
# 1
# ]
# ]
# ]
# ]

drilldown_output_columns
Corresponds to select-drilldown-output-columns in select. See
select-drilldown-output-columns for details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--limit 0 \
--output_columns _id \
--drilldown tag \
--drilldown_output_columns _key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# "Hello"
# ],
# [
# "Groonga"
# ],
# [
# "Senna"
# ]
# ]
# ]
# ]

drilldown_offset
Corresponds to select-drilldown-offset in select. See select-drilldown-offset for details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--limit 0 \
--output_columns _id \
--drilldown tag \
--drilldown_offset 1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Groonga",
# 2
# ],
# [
# "Senna",
# 2
# ]
# ]
# ]
# ]

drilldown_limit
Corresponds to select-drilldown-limit in select. See select-drilldown-limit for details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--limit 0 \
--output_columns _id \
--drilldown tag \
--drilldown_limit 2
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 1
# ],
# [
# "Groonga",
# 2
# ]
# ]
# ]
# ]

drilldown_calc_types
Corresponds to select-drilldown-calc-types in select. See select-drilldown-calc-types for
details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--limit -1 \
--output_columns tag,n_likes \
--drilldown tag \
--drilldown_calc_types MAX,MIN,SUM,AVG \
--drilldown_calc_target n_likes \
--drilldown_output_columns _key,_nsubrecs,_max,_min,_sum,_avg
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "tag",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# "Hello",
# 5
# ],
# [
# "Groonga",
# 10
# ],
# [
# "Groonga",
# 15
# ],
# [
# "Senna",
# 3
# ],
# [
# "Senna",
# 3
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ],
# [
# "_max",
# "Int64"
# ],
# [
# "_min",
# "Int64"
# ],
# [
# "_sum",
# "Int64"
# ],
# [
# "_avg",
# "Float"
# ]
# ],
# [
# "Hello",
# 1,
# 5,
# 5,
# 5,
# 5.0
# ],
# [
# "Groonga",
# 2,
# 15,
# 10,
# 25,
# 12.5
# ],
# [
# "Senna",
# 2,
# 3,
# 3,
# 6,
# 3.0
# ]
# ]
# ]
# ]

drilldown_calc_target
Corresponds to select-drilldown-calc-target in select. See select-drilldown-calc-target
for details.

See also drilldown_calc_types for an example.

Advanced drilldown related parameters
All advanced drilldown related parameters in select are supported. See
select-advanced-drilldown-related-parameters for details.

There are some limitations:

· _value.${KEY_NAME} in drilldown[${LABEL}].sortby doesn't work with multiple shards.
It works with only one shard. _key in drilldown[${LABEL}].sortby works with multiple
shards.

drilldown[${LABEL}].keys
Corresponds to select-drilldown-label-keys in select. See select-drilldown-label-keys for
details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--limit 0 \
--output_columns _id \
--drilldown[tag.n_likes].keys tag,n_likes \
--drilldown[tag.n_likes].output_columns _value.tag,_value.n_likes,_nsubrecs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ]
# ]
# ],
# {
# "tag.n_likes": [
# [
# 4
# ],
# [
# [
# "tag",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 5,
# 1
# ],
# [
# "Groonga",
# 10,
# 1
# ],
# [
# "Groonga",
# 15,
# 1
# ],
# [
# "Senna",
# 3,
# 2
# ]
# ]
# }
# ]
# ]

drilldown[${LABEL}].output_columns
Corresponds to select-drilldown-label-output-columns in select. See
select-drilldown-label-output-columns for details.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--limit 0 \
--output_columns _id \
--drilldown[tag].keys tag \
--drilldown[tag].output_columns _key,_nsubrecs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ]
# ]
# ],
# {
# "tag": [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 1
# ],
# [
# "Groonga",
# 2
# ],
# [
# "Senna",
# 2
# ]
# ]
# }
# ]
# ]

drilldown[${LABEL}].sortby
Corresponds to drilldown_sortby in the non-labeled drilldown.

drilldown[${LABEL}].sortby has a limitation.

_value.${KEY_NAME} in drilldown[${LABEL}].sortby doesn't work with multiple shards. It
works with only one shard. _key in drilldown[${LABEL}].sortby works with multiple shards.

Here is an example that uses _value.${KEY_NAME} with only one shard:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--min "2015/07/08 00:00:00" \
--min_border "include" \
--max "2015/07/09 00:00:00" \
--max_border "exclude" \
--limit 0 \
--output_columns _id \
--drilldown[tag.n_likes].keys tag,n_likes \
--drilldown[tag.n_likes].output_columns _nsubrecs,_value.n_likes,_value.tag \
--drilldown[tag.n_likes].sortby -_nsubrecs,_value.n_likes,_value.tag
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ]
# ]
# ],
# {
# "tag.n_likes": [
# [
# 3
# ],
# [
# [
# "_nsubrecs",
# "Int32"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# 5,
# "Hello"
# ],
# [
# 1,
# 10,
# "Groonga"
# ],
# [
# 1,
# 15,
# "Groonga"
# ]
# ]
# }
# ]
# ]

drilldown[${LABEL}].offset
Corresponds to drilldown_offset in the non-labeled drilldown.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--limit 0 \
--output_columns _id \
--drilldown[tag.n_likes].keys tag \
--drilldown[tag.n_likes].offset 1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ]
# ]
# ],
# {
# "tag.n_likes": [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Groonga",
# 2
# ],
# [
# "Senna",
# 2
# ]
# ]
# }
# ]
# ]

drilldown[${LABEL}].limit
Corresponds to drilldown_limit in the non-labeled drilldown.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--limit 0 \
--output_columns _id \
--drilldown[tag.n_likes].keys tag \
--drilldown[tag.n_likes].limit 2
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ]
# ]
# ],
# {
# "tag.n_likes": [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 1
# ],
# [
# "Groonga",
# 2
# ]
# ]
# }
# ]
# ]

drilldown[${LABEL}].calc_types
Corresponds to drilldown_calc_types in the non-labeled drilldown.

Here is an example:

Execution example:

logical_select \
--logical_table Entries \
--shard_key created_at \
--limit 0 \
--output_columns _id \
--drilldown[tag].keys tag \
--drilldown[tag].calc_types MAX,MIN,SUM,AVG \
--drilldown[tag].calc_target n_likes \
--drilldown[tag].output_columns _key,_nsubrecs,_max,_min,_sum,_avg
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ]
# ]
# ],
# {
# "tag": [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ],
# [
# "_max",
# "Int64"
# ],
# [
# "_min",
# "Int64"
# ],
# [
# "_sum",
# "Int64"
# ],
# [
# "_avg",
# "Float"
# ]
# ],
# [
# "Hello",
# 1,
# 5,
# 5,
# 5,
# 5.0
# ],
# [
# "Groonga",
# 2,
# 15,
# 10,
# 25,
# 12.5
# ],
# [
# "Senna",
# 2,
# 3,
# 3,
# 6,
# 3.0
# ]
# ]
# }
# ]
# ]

drilldown[${LABEL}].calc_target
Corresponds to drilldown_calc_target in the non-labeled drilldown.

See also drilldown[${LABEL}].calc_types for an example.

Return value
The return value format of logical_select is compatible with select. See
select-return-value for details.

logical_shard_list
Summary
New in version 5.0.7.

logical_shard_list returns all existing shard names against the specified logical table
name.

Syntax
This command takes only one required parameter:

logical_shard_list logical_table

Usage
You need to register the sharding plugin to use this command:

Execution example:

plugin_register sharding
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here are sample shards:

Execution example:

table_create Logs_20150801 TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20150801 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Logs_20150802 TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20150802 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Logs_20150930 TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20150930 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can get all the shard names in ascending order by specifying Logs as the logical table
name:

Execution example:

logical_shard_list --logical_table Logs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "name": "Logs_20150801"
# },
# {
# "name": "Logs_20150802"
# },
# {
# "name": "Logs_20150930"
# }
# ]
# ]

Parameters
This section describes parameters.

Required parameters
There is one required parameter.

logical_table
Specifies the logical table name. logical_shard_list returns a list of the shard names of
the logical table:

Execution example:

logical_shard_list --logical_table Logs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "name": "Logs_20150801"
# },
# {
# "name": "Logs_20150802"
# },
# {
# "name": "Logs_20150930"
# }
# ]
# ]

The list is sorted by shard name in ascending order.

Optional parameters
There is no optional parameter.

Return value
The command returns a list of shard names in ascending order:

[
HEADER,
[
{"name": "SHARD_NAME_1"},
{"name": "SHARD_NAME_2"},
...
{"name": "SHARD_NAME_N"}
]
]

See /reference/command/output_format for HEADER.

See also
· /reference/sharding

logical_table_remove
Summary
New in version 5.0.5.

logical_table_remove removes tables and their columns for the specified logical table. If
there are one or more indexes on the keys of the tables or on their columns, they are also
removed.

If you specify only part of a shard, the table for the shard isn't removed;
logical_table_remove just deletes the records in the specified range.

For example, there are the following records in a table:

· Record1: 2016-03-18 00:30:00

· Record2: 2016-03-18 01:00:00

· Record3: 2016-03-18 02:00:00

logical_table_remove deletes "Record1" and "Record2" when you specify the range between
2016-03-18 00:00:00 and 2016-03-18 01:30:00. logical_table_remove doesn't delete
"Record3" and doesn't remove the table.

New in version 6.0.1: You can also remove tables and columns that reference the target
table and tables related with the target shard by using dependent parameter.

Syntax
This command takes many parameters.

The required parameters are logical_table and shard_key:

logical_table_remove logical_table
shard_key
[min=null]
[min_border="include"]
[max=null]
[max_border="include"]
[dependent=no]

Usage
You specify the logical table name and the shard key for what you want to remove.

This section describes the following topics:

· Basic usage

· Removes parts of a logical table

· Unremovable cases

· Removes with related tables

· Decreases used resources

Basic usage
Register the sharding plugin in advance to use this command.

Execution example:

register sharding
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can remove all tables for the logical table by specifying only logical_table and
shard_key.

Here are commands to create 2 shards:

Execution example:

table_create Logs_20160318 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20160318 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Logs_20160319 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20160319 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can confirm existing shards by logical_shard_list:

Execution example:

logical_shard_list --logical_table Logs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "name": "Logs_20160318"
# },
# {
# "name": "Logs_20160319"
# }
# ]
# ]

You can remove all shards:

Execution example:

logical_table_remove \
--logical_table Logs \
--shard_key timestamp
# [[0, 1337566253.89858, 0.000355720520019531], true]

There are no shards after you remove all shards:

Execution example:

logical_shard_list --logical_table Logs
# [[0, 1337566253.89858, 0.000355720520019531], []]

Removes parts of a logical table
You can specify a range of shards by the following parameters:

· min

· min_border

· max

· max_border

See the following documents of logical_select for each parameter:

· logical-select-min

· logical-select-min-border

· logical-select-max

· logical-select-max-border

If the specified range doesn't cover all records in a shard, the table for the shard isn't
removed. Only the target records in the table are deleted.

If the specified range covers all records in a shard, the table for the shard is removed.

Here is a logical table to show the behavior. The logical table has two shards:

Execution example:

table_create Logs_20160318 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20160318 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Logs_20160318
[
{"timestamp": "2016-03-18 00:30:00"},
{"timestamp": "2016-03-18 01:00:00"},
{"timestamp": "2016-03-18 02:00:00"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
table_create Logs_20160319 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20160319 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Logs_20160319
[
{"timestamp": "2016-03-19 00:30:00"},
{"timestamp": "2016-03-19 01:00:00"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]

There are the following records in Logs_20160318 table:

· Record1: "2016-03-18 00:30:00"

· Record2: "2016-03-18 01:00:00"

· Record3: "2016-03-18 02:00:00"

There are the following records in Logs_20160319 table:

· Record1: "2016-03-19 00:30:00"

· Record2: "2016-03-19 01:00:00"

The following range doesn't cover "Record1" in Logs_20160318 table but covers all records
in Logs_20160319 table:

┌───────────┬───────────────────────┐
│Parameter  │ Value                 │
├───────────┼───────────────────────┤
│min        │ "2016-03-18 01:00:00" │
├───────────┼───────────────────────┤
│min_border │ "include"             │
├───────────┼───────────────────────┤
│max        │ "2016-03-19 01:30:00" │
├───────────┼───────────────────────┤
│max_border │ "include"             │
└───────────┴───────────────────────┘

logical_table_remove with the range deletes "Record2" and "Record3" in Logs_20160318 table
but doesn't remove Logs_20160318 table, because "Record1" remains in Logs_20160318 table.

logical_table_remove with the range removes Logs_20160319 table because the range covers
all records in Logs_20160319 table.

Here is an example to use logical_table_remove with the range:

Execution example:

logical_table_remove \
--logical_table Logs \
--shard_key timestamp \
--min "2016-03-18 01:00:00" \
--min_border "include" \
--max "2016-03-19 01:30:00" \
--max_border "include"
# [[0, 1337566253.89858, 0.000355720520019531], true]

dump shows that there is "Record1" in Logs_20160318 table:

Execution example:

dump
# plugin_register sharding
#
# table_create Logs_20160318 TABLE_NO_KEY
# column_create Logs_20160318 timestamp COLUMN_SCALAR Time
#
# load --table Logs_20160318
# [
# ["_id","timestamp"],
# [1,1458228600.0]
# ]

Unremovable cases
There are some unremovable cases. See table-remove-unremovable-cases for details, because
logical_table_remove uses the same checks.

Removes with related tables
New in version 6.0.1.

If you understand what you are doing, you can also remove tables and columns that depend
on the target shard with one logical_table_remove command by using the --dependent yes
parameter.

Here are the conditions for dependency. If a table or column satisfies one of the
following conditions, it depends on the target shard:

· Tables and columns that reference the target shard

· Tables for the shard (= The table has the same _YYYYMMDD postfix as the target shard
and is referenced from the target shard)

If there are one or more tables or columns that reference the target shard,
logical_table_remove fails by default. This avoids dangling references.

In the following example, the Bookmarks.log_20160320 column references the target shard:

Execution example:

table_create Logs_20160320 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20160320 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Bookmarks TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Bookmarks log_20160320 COLUMN_SCALAR Logs_20160320
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can't remove Logs_20160320 by logical_table_remove by default:

Execution example:

logical_table_remove \
--logical_table Logs \
--shard_key timestamp
# [
# [
# -2,
# 1337566253.89858,
# 0.000355720520019531,
# "operation not permitted: <[table][remove] a column that references the table exists: <Bookmarks.log_20160320> -> <Logs_20160320",
# [
# [
# "Groonga::Sharding::LogicalTableRemoveCommand.remove_table",
# "/home/kou/work/c/groonga.clean/plugins/sharding/logical_table_remove.rb",
# 80
# ]
# ]
# ]
# ]

You can remove Logs_20160320 by logical_table_remove with the --dependent yes parameter.
Bookmarks.log_20160320 is also removed:

Execution example:

logical_table_remove \
--logical_table Logs \
--shard_key timestamp \
--dependent yes
# [[0, 1337566253.89858, 0.000355720520019531], true]

object_exist shows that Logs_20160320 table and Bookmarks.log_20160320 column are removed:

Execution example:

object_exist Logs_20160320
# [[0, 1337566253.89858, 0.000355720520019531], false]
object_exist Bookmarks.log_20160320
# [[0, 1337566253.89858, 0.000355720520019531], false]

If there are one or more tables for the target shard, logical_table_remove with
--dependent yes also removes them. Tables that have the same _YYYYMMDD postfix as the
target shard are treated as tables for the target shard.

Here are two tables that have the _20160320 postfix. The NotRelated_20160320 table isn't
used by the Logs_20160320 table. The Users_20160320 table is used by the Logs_20160320
table. The Servers table exists and is used by the Logs_20160320 table:

Execution example:

table_create NotRelated_20160320 TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Users_20160320 TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Servers TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Logs_20160320 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20160320 timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20160320 user COLUMN_SCALAR Users_20160320
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs_20160320 server COLUMN_SCALAR Servers
# [[0, 1337566253.89858, 0.000355720520019531], true]

logical_table_remove with the --dependent yes parameter removes only the Logs_20160320
table and the Users_20160320 table, because Users_20160320 has the _20160320 postfix and
is used by Logs_20160320. The NotRelated_20160320 table and the Servers table aren't
removed: NotRelated_20160320 has the _20160320 postfix but isn't used by Logs_20160320,
and Servers is used by Logs_20160320 but doesn't have the _20160320 postfix:

Execution example:

logical_table_remove \
--logical_table Logs \
--shard_key timestamp \
--dependent yes
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can confirm that Logs_20160320 table and Users_20160320 table are removed but
NotRelated_20160320 table and Servers table aren't removed:

Execution example:

object_exist Logs_20160320
# [[0, 1337566253.89858, 0.000355720520019531], false]
object_exist Users_20160320
# [[0, 1337566253.89858, 0.000355720520019531], false]
object_exist NotRelated_20160320
# [[0, 1337566253.89858, 0.000355720520019531], true]
object_exist Servers
# [[0, 1337566253.89858, 0.000355720520019531], true]

Decreases used resources
You can decrease the resources used by this command. See
table-remove-decreases-used-resources for details, because logical_table_remove uses the
same logic as table_remove.

Parameters
This section describes parameters of logical_table_remove.

Required parameters
There are two required parameters, logical_table and shard_key.

logical_table
Specifies the logical table name. It means the table name without the _YYYYMMDD postfix.
If you use actual tables such as Logs_20150203, Logs_20150204 and so on, the logical table
name is Logs.

See also logical-select-logical-table.

shard_key
Specifies the name of the column that is treated as the shard key.

See also logical-select-shard-key.

Optional parameters
There are optional parameters.

min
Specifies the minimum value of shard_key column.

See also logical-select-min.

min_border
Specifies whether the minimum value is included or not. include and exclude are available.
The default is include.

See also logical-select-min-border.

max
Specifies the maximum value of shard_key column.

See also logical-select-max.

max_border
Specifies whether the maximum value is included or not. include and exclude are available.
The default is include.

See also logical-select-max-border.

dependent
New in version 6.0.1.

Specifies whether tables and columns that depend on the target shard are also removed or
not.

Here are the conditions for dependency. If a table or column satisfies one of the
following conditions, it depends on the target shard:

· Tables and columns that reference the target shard

· Tables for the shard (= The table has the same _YYYYMMDD postfix as the target shard
and is referenced from the target shard)

If this value is yes, tables and columns that depend on the target shard are also removed.
Otherwise, they aren't removed: if there are one or more tables or columns that reference
the target shard, an error is returned, and tables for the shard are not touched.

You should use this parameter carefully. This is a dangerous parameter.

See Removes with related tables for how to use this parameter.

Return value
The command returns true as body on success such as:

[HEADER, true]

If the command fails, error details are in HEADER.

See /reference/command/output_format for HEADER.

normalize
NOTE:
This command is an experimental feature.

This command may be changed in the future.

Summary
normalize command normalizes text by the specified normalizer.

There is no need to create a table to use the normalize command. It is useful for checking
the results of a normalizer.

Syntax
This command takes three parameters.

normalizer and string are required. Others are optional:

normalize normalizer
string
[flags=NONE]

Usage
Here is a simple example of normalize command.

Execution example:

normalize NormalizerAuto "aBcDe 123"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "normalized": "abcde 123",
# "types": [],
# "checks": []
# }
# ]

Parameters
This section describes the parameters of normalize.

Required parameters
There are two required parameters: normalizer and string.

normalizer
Specifies the normalizer name. The normalize command uses the normalizer with the
specified name.

See /reference/normalizers about built-in normalizers.

Here is an example that uses the built-in NormalizerAuto normalizer.
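
The following sketch is illustrative rather than taken from the original document; the
input string is hypothetical, and the output follows the same format as the Usage example
above:

Execution example:

normalize NormalizerAuto "Groonga IS fast"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "normalized": "groonga is fast",
# "types": [],
# "checks": []
# }
# ]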

If you want to use other normalizers, you need to register an additional normalizer plugin
by the register command. For example, you can use a MySQL compatible normalizer by
registering groonga-normalizer-mysql.

string
Specifies any string which you want to normalize.

If you want to include spaces in string, you need to quote string with single quotes (')
or double quotes (").

Here is an example that uses spaces in string.
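
A sketch under the same assumptions as the Usage example (the input is the same string, so
the output should match it as well; only the quoting differs):

Execution example:

normalize NormalizerAuto 'aBcDe 123'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "normalized": "abcde 123",
# "types": [],
# "checks": []
# }
# ]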

Optional parameters
There is one optional parameter, flags.

flags
Specifies normalization customization options. You can specify multiple options separated
by "|". For example, REMOVE_BLANK|WITH_TYPES.

Here are the available flags.

┌───────────────────────────┬──────────────────────────────────┐
│Flag                       │ Description                      │
├───────────────────────────┼──────────────────────────────────┤
│NONE                       │ Just ignored.                    │
├───────────────────────────┼──────────────────────────────────┤
│REMOVE_BLANK               │ Removes blank characters such as │
│                           │ spaces from the normalized text. │
├───────────────────────────┼──────────────────────────────────┤
│WITH_TYPES                 │ Includes character type          │
│                           │ information in types.            │
├───────────────────────────┼──────────────────────────────────┤
│WITH_CHECKS                │ Includes check information in    │
│                           │ checks.                          │
├───────────────────────────┼──────────────────────────────────┤
│REMOVE_TOKENIZED_DELIMITER │ Removes the tokenized delimiter  │
│                           │ (U+FFFE) from the normalized     │
│                           │ text.                            │
└───────────────────────────┴──────────────────────────────────┘

Here is an example that uses REMOVE_BLANK.
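
A sketch, not from the original document; REMOVE_BLANK is expected to drop the blank
between "aBcDe" and "123":

Execution example:

normalize NormalizerAuto "aBcDe 123" REMOVE_BLANK
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "normalized": "abcde123",
# "types": [],
# "checks": []
# }
# ]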

Here is an example that uses WITH_TYPES.
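
A sketch, not from the original document; with WITH_TYPES, each element of types describes
the corresponding character in normalized. The exact type labels shown here are
illustrative and may vary by Groonga version:

Execution example:

normalize NormalizerAuto "aBcDe 123" WITH_TYPES
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "normalized": "abcde 123",
# "types": [
# "alpha",
# "alpha",
# "alpha",
# "alpha",
# "alpha",
# "others|blank",
# "digit",
# "digit",
# "digit"
# ],
# "checks": []
# }
# ]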

Here is an example that uses REMOVE_TOKENIZED_DELIMITER.

TODO

Return value
[HEADER, normalized_text]

HEADER
See /reference/command/output_format about HEADER.

normalized_text
normalized_text is an object that has the following attributes.

┌───────────┬──────────────────────────────────┐
│Name       │ Description                      │
├───────────┼──────────────────────────────────┤
│normalized │ The normalized text.             │
├───────────┼──────────────────────────────────┤
│types      │ An array of types of the         │
│           │ normalized text. The N-th types  │
│           │ shows the type of the N-th       │
│           │ character in normalized.         │
├───────────┼──────────────────────────────────┤
│checks     │ An array of checks of the        │
│           │ normalized text. It stays empty  │
│           │ unless the WITH_CHECKS flag is   │
│           │ specified.                       │
└───────────┴──────────────────────────────────┘

See also
· /reference/normalizers

normalizer_list
Summary
normalizer_list command lists normalizers in a database.

Syntax
This command takes no parameters:

normalizer_list

Usage
Here is a simple example.

Execution example:

normalizer_list
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "name": "NormalizerAuto"
# },
# {
# "name": "NormalizerNFKC51"
# }
# ]
# ]

It returns normalizers in a database.

Return value
normalizer_list command returns normalizers. Each normalizer has an attribute that
contains its name. More attributes may be added in the future:

[HEADER, normalizers]

HEADER
See /reference/command/output_format about HEADER.

normalizers
normalizers is an array of normalizer. Normalizer is an object that has the following
attributes.

┌─────┬──────────────────┐
│Name │ Description      │
├─────┼──────────────────┤
│name │ Normalizer name. │
└─────┴──────────────────┘

See also
· /reference/normalizers

· /reference/commands/normalize

object_exist
Summary
New in version 5.0.6.

object_exist returns whether an object with the specified name exists in the database or
not.

It's a light operation. It just checks for the existence of the name in the database. It
doesn't load the specified object from disk.

object_exist doesn't check the object type. The existing object may be a table, a column,
a function and so on.

Syntax
This command takes only one required parameter:

object_exist name

Usage
You can check whether the name is already used in database:

Execution example:

object_exist Users
# [[0, 1337566253.89858, 0.000355720520019531], false]
table_create Users TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
object_exist Users
# [[0, 1337566253.89858, 0.000355720520019531], true]

The object_exist Users returns false before you create Users table.

The object_exist Users returns true after you create Users table.

Parameters
This section describes all parameters.

Required parameters
There is only one required parameter.

name
Specifies the object name to be checked.

If you want to check existence of a column, use TABLE_NAME.COLUMN_NAME format like the
following:

Execution example:

table_create Logs TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
object_exist Logs.timestamp
# [[0, 1337566253.89858, 0.000355720520019531], true]

Logs is the table name and timestamp is the column name in Logs.timestamp.

Optional parameters
There is no optional parameter.

Return value
The command returns true as body if an object with the specified name exists in the
database, such as:

[HEADER, true]

The command returns false otherwise such as:

[HEADER, false]

See /reference/command/output_format for HEADER.

object_inspect
Summary
New in version 6.0.0.

object_inspect inspects an object. You can confirm details of an object.

For example:

· If the object is a table, you can confirm the number of records in the table.

· If the object is a column, you can confirm the type of value of the column.

Syntax
This command takes only one optional parameter:

object_inspect [name=null]

Usage
You can inspect an object in the database specified by name:

Execution example:

table_create Users TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Users
[
{"_key": "Alice"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
object_inspect Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "name": "Users",
# "n_records": 1,
# "value": {
# "type": null
# },
# "key": {
# "total_size": 5,
# "max_total_size": 4294967295,
# "type": {
# "size": 4096,
# "type": {
# "id": 32,
# "name": "type"
# },
# "id": 14,
# "name": "ShortText"
# }
# },
# "type": {
# "id": 48,
# "name": "table:hash_key"
# },
# "id": 256
# }
# ]

The object_inspect Users returns the following information:

· The name of the table: "name": Users

· The total used key size: "key": {"total_size": 5} ("Alice" is 5 byte data)

· The maximum total key size: "key": {"max_total_size": 4294967295}

· and so on.

You can inspect the database by not specifying name:

Execution example:

object_inspect
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "name_table": {
# "name": "",
# "n_records": 256,
# "value": null,
# "key": {
# "type": null
# },
# "type": {
# "id": 50,
# "name": "table:dat_key"
# },
# "id": 0
# },
# "type": {
# "id": 55,
# "name": "db"
# }
# }
# ]

The object_inspect returns the following information:

· The table type for object name management: "key": {"type": {"name": "table:dat_key"}}

· and so on.

Parameters
This section describes all parameters.

Required parameters
There is no required parameter.

Optional parameters
There is only one optional parameter.

name
Specifies the object name to be inspected.

If name isn't specified, the database is inspected.

Return value
The command returns an object (nested key and value pairs) that includes details of the
object (such as table) as body:

[HEADER, object]

See /reference/command/output_format for HEADER.

The format of the details depends on the object type. For example, a table has key
information but a function doesn't have key information.

Database
Database inspection returns the following information:

{
"type": {
"id": DATABASE_TYPE_ID,
"name": DATABASE_TYPE_NAME
},
"name_table": DATABASE_NAME_TABLE
}

DATABASE_TYPE_ID
DATABASE_TYPE_ID is always 55.

DATABASE_TYPE_NAME
DATABASE_TYPE_NAME is always "db".

DATABASE_NAME_TABLE
DATABASE_NAME_TABLE is a table for managing object names in the database. The table is
table-pat-key or table-dat-key. Normally, it's table-dat-key.

See Table for format details.

Table
Table inspection returns the following information:

{
"name": TABLE_NAME,
"type": {
"id": TABLE_TYPE_ID,
"name": TABLE_TYPE_NAME
},
"key": {
"type": TABLE_KEY_TYPE,
"total_size": TABLE_KEY_TOTAL_SIZE,
"max_total_size": TABLE_KEY_MAX_TOTAL_SIZE
},
"value": {
"type": TABLE_VALUE_TYPE
},
"n_records": TABLE_N_RECORDS
}

There are some exceptions:

· table-no-key doesn't return key information because it doesn't have key.

· table-dat-key doesn't return value information because it doesn't have value.

TABLE_NAME
The name of the inspected table.

TABLE_TYPE_ID
The type ID of the inspected table.

Here is a list of type IDs:

┌───────────────┬────┐
│Table type     │ ID │
├───────────────┼────┤
│table-hash-key │ 48 │
├───────────────┼────┤
│table-pat-key  │ 49 │
├───────────────┼────┤
│table-dat-key  │ 50 │
├───────────────┼────┤
│table-no-key   │ 51 │
└───────────────┴────┘

TABLE_TYPE_NAME
The type name of the inspected table.

Here is a list of type names:

┌───────────────┬──────────────────┐
│Table type     │ Name             │
├───────────────┼──────────────────┤
│table-hash-key │ "table:hash_key" │
├───────────────┼──────────────────┤
│table-pat-key  │ "table:pat_key"  │
├───────────────┼──────────────────┤
│table-dat-key  │ "table:dat_key"  │
├───────────────┼──────────────────┤
│table-no-key   │ "table:no_key"   │
└───────────────┴──────────────────┘

TABLE_KEY_TYPE
The type of key of the inspected table.

See Type for format details.

TABLE_KEY_TOTAL_SIZE
The total key size of the inspected table in bytes.

TABLE_KEY_MAX_TOTAL_SIZE
The maximum total key size of the inspected table in bytes.

TABLE_VALUE_TYPE
The type of value of the inspected table.

See Type for format details.

TABLE_N_RECORDS
The number of records of the inspected table.

It's a 64bit unsigned integer value.

Type
Type inspection returns the following information:

{
"id": TYPE_ID,
"name": TYPE_NAME,
"type": {
"id": TYPE_ID_OF_TYPE,
"name": TYPE_NAME_OF_TYPE
},
"size": TYPE_SIZE
}

TYPE_ID
The ID of the inspected type.

Here is an ID list of builtin types:

┌─────────────────────────────┬────┐
│Type                         │ ID │
├─────────────────────────────┼────┤
│builtin-type-bool            │ 3  │
├─────────────────────────────┼────┤
│builtin-type-int8            │ 4  │
├─────────────────────────────┼────┤
│builtin-type-uint8           │ 5  │
├─────────────────────────────┼────┤
│builtin-type-int16           │ 6  │
├─────────────────────────────┼────┤
│builtin-type-uint16          │ 7  │
├─────────────────────────────┼────┤
│builtin-type-int32           │ 8  │
├─────────────────────────────┼────┤
│builtin-type-uint32          │ 9  │
├─────────────────────────────┼────┤
│builtin-type-int64           │ 10 │
├─────────────────────────────┼────┤
│builtin-type-uint64          │ 11 │
├─────────────────────────────┼────┤
│builtin-type-float           │ 12 │
├─────────────────────────────┼────┤
│builtin-type-time            │ 13 │
├─────────────────────────────┼────┤
│builtin-type-short-text      │ 14 │
├─────────────────────────────┼────┤
│builtin-type-text            │ 15 │
├─────────────────────────────┼────┤
│builtin-type-long-text       │ 16 │
├─────────────────────────────┼────┤
│builtin-type-tokyo-geo-point │ 17 │
├─────────────────────────────┼────┤
│builtin-type-wgs84-geo-point │ 18 │
└─────────────────────────────┴────┘

TYPE_NAME
The name of the inspected type.

Here is a name list of builtin types:

· builtin-type-bool

· builtin-type-int8

· builtin-type-uint8

· builtin-type-int16

· builtin-type-uint16

· builtin-type-int32

· builtin-type-uint32

· builtin-type-int64

· builtin-type-uint64

· builtin-type-float

· builtin-type-time

· builtin-type-short-text

· builtin-type-text

· builtin-type-long-text

· builtin-type-tokyo-geo-point

· builtin-type-wgs84-geo-point

TYPE_ID_OF_TYPE
TYPE_ID_OF_TYPE is always 32.

TYPE_NAME_OF_TYPE
TYPE_NAME_OF_TYPE is always "type".

TYPE_SIZE
TYPE_SIZE is the size of the inspected type in bytes. If the inspected type is a
variable-size type, the size means the maximum size.
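For example, in the Usage output above, the ShortText type is reported with "size": 4096;
because ShortText is a variable-size type, 4096 is its maximum size rather than a fixed
size.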

object_remove
Summary
New in version 6.0.0.

object_remove removes an object. You can remove any object including tables, columns,
commands and so on. Normally, you should use a specific remove command such as
table_remove and column_remove.

object_remove is dangerous because you can remove any object. You should use object_remove
carefully.

object_remove has a "force mode". You can remove a broken object in "force mode". "Force
mode" is useful to resolve problems reported by /reference/executables/grndb.

Syntax
This command takes two parameters:

object_remove name
[force=no]

Usage
You can remove an object in the database specified by name:

Execution example:

object_remove Users
# [
# [
# -22,
# 1337566253.89858,
# 0.000355720520019531,
# "[object][remove] target object doesn't exist: <Users>",
# [
# [
# "command_object_remove",
# "proc_object.c",
# 121
# ]
# ]
# ],
# false
# ]
table_create Users TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
object_remove Users
# [[0, 1337566253.89858, 0.000355720520019531], true]

The object_remove Users returns false before you create Users table.

The object_remove Users returns true after you create Users table.

You can't remove a broken object by default:

Execution example:

table_create Users TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
thread_limit 1
# [[0, 1337566253.89858, 0.000355720520019531], 1]
database_unmap
# [[0, 1337566253.89858, 0.000355720520019531], true]
echo "BROKEN" > ${DB_PATH}.0000100
object_remove Users
# [
# [
# -22,
# 1337566253.89858,
# 0.000355720520019531,
# "[object][remove] failed to open the target object: <Users>",
# [
# [
# "command_object_remove",
# "proc_object.c",
# 116
# ]
# ]
# ],
# false
# ]
object_exist Users
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can remove a broken object by --force yes:

Execution example:

object_remove Users --force yes
# [
# [
# -65,
# 1337566253.89858,
# 0.000355720520019531,
# "[io][open] file size is too small: <7>(required: >= 64): </tmp/groonga-databases/commands_object_remove.0000100>",
# [
# [
# "grn_io_open",
# "io.c",
# 565
# ]
# ]
# ],
# false
# ]
object_exist Users
# [[0, 1337566253.89858, 0.000355720520019531], false]

--force yes means you enable "force mode". You can remove a broken object in "force mode".

Parameters
This section describes all parameters.

Required parameters
There is only one required parameter.

name
Specifies the object name to be removed.

If you want to remove a column, use TABLE_NAME.COLUMN_NAME format like the following:

Execution example:

table_create Logs TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
object_remove Logs.timestamp
# [[0, 1337566253.89858, 0.000355720520019531], true]

Logs is the table name and timestamp is the column name in Logs.timestamp.

Optional parameters
There is one optional parameter.

force
Specifies whether to remove the object in "force mode".

You can't remove a broken object by default. But you can remove a broken object in "force
mode".

force value must be yes or no. yes means that "force mode" is enabled. no means that
"force mode" is disabled.

The default value is no. It means that "force mode" is disabled by default.

Return value
The command returns true as body when the command removed the specified object without any
error. For example:

[HEADER, true]

The command returns false as body when the command gets any errors. For example:

[HEADER, false]

See /reference/command/output_format for HEADER.

Note that false doesn't mean "the command can't remove the object". If you enable "force
mode", the command removes the object even if the object is broken. In that case, the
object is removed and false is returned as body.

plugin_register
New in version 5.0.1.

Summary
plugin_register command registers a plugin. You need to register a plugin before you use
it.

You need just one plugin_register command per plugin in the same database because
registered plugin information is written into the database. When you restart your groonga
process, the process loads all registered plugins without plugin_register commands.

You can unregister a registered plugin by plugin_unregister.

Syntax
This command takes only one required parameter:

plugin_register name

Usage
Here is a sample that registers the QueryExpanderTSV query expander that is included in
${PREFIX}/lib/groonga/plugins/query_expanders/tsv.so.

Execution example:

plugin_register query_expanders/tsv
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can omit ${PREFIX}/lib/groonga/plugins/ and the suffix (.so). They are completed
automatically.

You can specify an absolute path such as plugin_register
/usr/lib/groonga/plugins/query_expanders/tsv.so.

Return value
plugin_register returns true as body on success such as:

[HEADER, true]

If plugin_register fails, error details are in HEADER.

See /reference/command/output_format for HEADER.

See also
· plugin_unregister

plugin_unregister
NOTE:
This command is an experimental feature.

New in version 5.0.1.

Summary
plugin_unregister command unregisters a plugin.

Syntax
This command takes only one required parameter:

plugin_unregister name

Usage
Here is a sample that unregisters the QueryExpanderTSV query expander that is included in
${PREFIX}/lib/groonga/plugins/query_expanders/tsv.so.

Execution example:

plugin_unregister query_expanders/tsv
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can omit ${PREFIX}/lib/groonga/plugins/ and the suffix (.so). They are completed
automatically.

You can specify an absolute path such as plugin_unregister
/usr/lib/groonga/plugins/query_expanders/tsv.so.

Return value
plugin_unregister returns true as body on success such as:

[HEADER, true]

If plugin_unregister fails, error details are in HEADER.

See /reference/command/output_format for HEADER.

See also
· plugin_register

quit
Summary
quit - ends the session

This section describes quit, one of the built-in commands of Groonga. Built-in commands
are executed by sending a request to the groonga server via arguments of the groonga
executable, standard input, or a socket.

quit ends the session with the groonga process. A client process disconnects from the
groonga process.

Syntax
quit

Usage
quit

Parameters
There are no parameters.

Return value
There is no return value.

range_filter
Summary
TODO: write me

Syntax
Usage
Return value
See also
· /reference/commands/select

register
Deprecated since version 5.0.1: Use plugin_register instead.

Summary
register command registers a plugin. You need to register a plugin before you use it.

You need just one register command per plugin in the same database because registered
plugin information is written into the database. When you restart your groonga process,
the process loads all registered plugins without register commands.

NOTE:
Registered plugins can be removed since Groonga 5.0.1. Use plugin_unregister in such a
case.

Syntax
This command takes only one required parameter:

register path

Usage
Here is a sample that registers the QueryExpanderTSV query expander that is included in
${PREFIX}/lib/groonga/plugins/query_expanders/tsv.so.

Execution example:

register query_expanders/tsv
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can omit ${PREFIX}/lib/groonga/plugins/ and the suffix (.so). They are completed
automatically.

You can specify an absolute path such as register
/usr/lib/groonga/plugins/query_expanders/tsv.so.

Return value
register returns true as body on success such as:

[HEADER, true]

If register fails, error details are in HEADER.

See /reference/command/output_format for HEADER.

See also
· plugin_register

· plugin_unregister

reindex
Summary
New in version 5.1.0.

reindex command recreates one or more index columns.

If you specify a database as the target object, all index columns are recreated.

If you specify a table as the target object, all index columns in the table are recreated.

If you specify a data column as the target object, all index columns for the data column
are recreated.

If you specify an index column as the target object, the index column is recreated.

This command is useful when your index column is broken. The target object is one of
database, table and column.

NOTE:
You can't use target index columns while the reindex command is running. If you use the
same database from multiple processes, all processes except the one running reindex
should reopen the database. You can use database_unmap for reopening the database.

Syntax
This command takes only one optional parameter:

reindex [target_name=null]

If the target_name parameter is omitted, the database is used as the target object. It
means that all index columns in the database are recreated.

Usage
Here is an example to recreate all index columns in the database:

Execution example:

reindex
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an example to recreate all index columns (Lexicon.entry_key and
Lexicon.entry_body) in Lexicon table:

Execution example:

table_create Entry TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entry body COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Lexicon TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon entry_key COLUMN_INDEX|WITH_POSITION \
Entry _key
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon entry_body COLUMN_INDEX|WITH_POSITION \
Entry body
# [[0, 1337566253.89858, 0.000355720520019531], true]
reindex Lexicon
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an example to recreate all index columns (BigramLexicon.site_title and
RegexpLexicon.site_title) of Site.title data column:

Execution example:

table_create Site TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Site title COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create BigramLexicon TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create BigramLexicon site_title COLUMN_INDEX|WITH_POSITION \
Site title
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create RegexpLexicon TABLE_PAT_KEY ShortText \
--default_tokenizer TokenRegexp \
--normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create RegexpLexicon site_title COLUMN_INDEX|WITH_POSITION \
Site title
# [[0, 1337566253.89858, 0.000355720520019531], true]
reindex Site.title
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an example to recreate an index column (Timestamp.logs_timestamp):

Execution example:

table_create Logs TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs timestamp COLUMN_SCALAR Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Timestamp TABLE_PAT_KEY Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Timestamp logs_timestamp COLUMN_INDEX Logs timestamp
# [[0, 1337566253.89858, 0.000355720520019531], true]
reindex Timestamp.logs_timestamp
# [[0, 1337566253.89858, 0.000355720520019531], true]

Parameters
This section describes all parameters.

target_name
Specifies the name of a table or column.

If you don't specify it, the database is used as the target object.

The default is none. It means that the target object is the database.

Return value
reindex command returns whether recreation succeeded or not:

[HEADER, SUCCEEDED_OR_NOT]

HEADER
See /reference/command/output_format about HEADER.

SUCCEEDED_OR_NOT
If the command succeeded, it returns true; otherwise it returns false.

request_cancel
Summary
NOTE:
This command is an experimental feature.

New in version 4.0.9.

request_cancel command cancels a running request.

There are some limitations:

· Request ID must be managed by the user. (You need to assign a unique key for each
request.)

· A cancel request may be ignored. (You can send the request_cancel command multiple
times for the same request ID.)

· Only the multithreading type of Groonga server is supported. (You can use it with a
/reference/executables/groonga based server but not with
/reference/executables/groonga-httpd.)

See /reference/command/request_id about request ID.

If a request is canceled, the canceled request has -5 (GRN_INTERRUPTED_FUNCTION_CALL) as
its /reference/command/return_code.

Syntax
This command takes only one required parameter:

request_cancel id

Usage
Here is an example of request_cancel command:

$ curl 'http://localhost:10041/d/select?table=LargeTable&filter=true&request_id=unique-id-1' &
# The above "select" takes a long time...
# Point: "request_id=unique-id-1"
$ curl 'http://localhost:10041/d/request_cancel?id=unique-id-1'
[[...], {"id": "unique-id-1", "canceled": true}]
# Point: "id=unique-id-1"

Assume that the first select command takes a long time. The unique-id-1 request ID is
assigned to the select command by the request_id=unique-id-1 parameter.

The second request_cancel command passes the id=unique-id-1 parameter. unique-id-1 is the
same request ID passed to the select command.

The select command may not be canceled immediately, and the cancel request may be ignored.

You can send a cancel request for the same request ID multiple times. Once the target
request is canceled or finished, the "canceled" value in the return value changes from
true to false:

$ curl 'http://localhost:10041/d/request_cancel?id=unique-id-1'
[[...], {"id": "unique-id-1", "canceled": true}]
# "select" is still running... ("canceled" is "true")
$ curl 'http://localhost:10041/d/request_cancel?id=unique-id-1'
[[...], {"id": "unique-id-1", "canceled": true}]
# "select" is still running... ("canceled" is "true")
$ curl 'http://localhost:10041/d/request_cancel?id=unique-id-1'
[[...], {"id": "unique-id-1", "canceled": false}]
# "select" is canceled or finished. ("canceled" is "false")

If the select command is canceled, the response of the select command has -5
(GRN_INTERRUPTED_FUNCTION_CALL) as its /reference/command/return_code:

$ curl 'http://localhost:10041/d/select?table=LargeTable&filter=true&request_id=unique-id-1' &
[[-5, ...], ...]

Parameters
This section describes parameters of request_cancel.

Required parameters
There is one required parameter, id.

id
Specifies the ID for the target request.

Return value
request_cancel command returns the result of the cancel request:

[
HEADER,
{
"id": ID,
"canceled": CANCEL_REQUEST_IS_ACCEPTED_OR_NOT
}
]

HEADER
See /reference/command/output_format about HEADER.

ID
The ID of the target request.

CANCEL_REQUEST_IS_ACCEPTED_OR_NOT
If the cancel request is accepted, this is true, otherwise this is false.

Note that "cancel request is accepted" doesn't means that "the target request is
canceled". It just means "cancel request is notified to the target request but the
cancel request may be ignored by the target request".

If request assigned with the request ID doesn't exist, this is false.

See also
· /reference/command/request_id

ruby_eval
Summary
ruby_eval command evaluates Ruby script and returns the result.

Syntax
This command takes only one required parameter:

ruby_eval script

Usage
You can execute any script which mruby supports by calling ruby_eval.

Here is an example that just calculates 1 + 2 as a Ruby script.

Execution example:

register ruby/eval
# [[0, 1337566253.89858, 0.000355720520019531], true]
ruby_eval "1 + 2"
# [[0, 1337566253.89858, 0.000355720520019531], {"value": 3}]

Register the ruby/eval plugin in advance to use the ruby_eval command.

Note that ruby_eval is implemented as an experimental plugin, and the specification may be
changed in the future.

Parameters
This section describes all parameters.

script
Specifies the Ruby script which you want to evaluate.

Return value
ruby_eval returns the evaluated result with metadata such as exception information
(Including metadata isn't implemented yet):

[HEADER, {"value": EVALUATED_VALUE}]

HEADER
See /reference/command/output_format about HEADER.

EVALUATED_VALUE
EVALUATED_VALUE is the evaluated value of the Ruby script.

ruby_eval supports only numbers as the evaluated value for now. More types will be
supported in the future.

See also
· /reference/commands/ruby_load

ruby_load
Summary
ruby_load command loads specified Ruby script.

Syntax
This command takes only one required parameter:

ruby_load path

Usage
You can load any script file which mruby supports by calling ruby_load.

Here is an example that just loads expression.rb as a Ruby script.

Execution example:

register ruby/load
# [[0, 1337566253.89858, 0.000355720520019531], true]
ruby_load "expression.rb"
# [[0, 1337566253.89858, 0.000355720520019531], {"value": null}]

Register the ruby/load plugin in advance to use the ruby_load command.

Note that ruby_load is implemented as an experimental plugin, and the specification may be
changed in the future.

Parameters
This section describes all parameters.

path
Specifies the Ruby script path which you want to load.

Return value
ruby_load returns the loaded result with metadata such as exception information (Including
metadata isn't implemented yet):

[HEADER, {"value": LOADED_VALUE}]

HEADER
See /reference/command/output_format about HEADER.

LOADED_VALUE
LOADED_VALUE is the loaded value of the Ruby script.

ruby_load just returns null as LOADED_VALUE for now. Returning the real loaded value will
be supported in the future.

See also
/reference/commands/ruby_eval

schema
Summary
New in version 5.0.9.

schema command returns the schema of the database.

This command is useful when you want to inspect the database. For example, you can use it
for visualizing the database, creating a GUI for the database and so on.
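
For example, a GUI client can fetch the schema over HTTP. This is a sketch that assumes a
Groonga HTTP server is listening on the default port:

$ curl 'http://localhost:10041/d/schema'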

Syntax
This command takes no parameters:

schema

Usage
Here is an example schema to show example output:

Execution example:

table_create Memos TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Memos content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms memos_content_index \
COLUMN_INDEX|WITH_POSITION \
Memos content
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is the output of the schema command for this example schema:

Execution example:

schema
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "tables": {
# "Terms": {
# "normalizer": {
# "name": "NormalizerAuto"
# },
# "name": "Terms",
# "tokenizer": {
# "name": "TokenBigram"
# },
# "command": {
# "command_line": "table_create --name Terms --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto",
# "name": "table_create",
# "arguments": {
# "key_type": "ShortText",
# "default_tokenizer": "TokenBigram",
# "normalizer": "NormalizerAuto",
# "flags": "TABLE_PAT_KEY",
# "name": "Terms"
# }
# },
# "indexes": [],
# "key_type": {
# "type": "type",
# "name": "ShortText"
# },
# "value_type": null,
# "token_filters": [],
# "type": "patricia trie",
# "columns": {
# "memos_content_index": {
# "name": "memos_content_index",
# "weight": false,
# "section": false,
# "compress": null,
# "command": {
# "command_line": "column_create --table Terms --name memos_content_index --flags COLUMN_INDEX|WITH_POSITION --type Memos --sources content",
# "name": "column_create",
# "arguments": {
# "table": "Terms",
# "flags": "COLUMN_INDEX|WITH_POSITION",
# "name": "memos_content_index",
# "sources": "content",
# "type": "Memos"
# }
# },
# "indexes": [],
# "sources": [
# {
# "table": "Memos",
# "name": "content",
# "full_name": "Memos.content"
# }
# ],
# "value_type": {
# "type": "reference",
# "name": "Memos"
# },
# "full_name": "Terms.memos_content_index",
# "position": true,
# "table": "Terms",
# "type": "index"
# }
# }
# },
# "Memos": {
# "normalizer": null,
# "name": "Memos",
# "tokenizer": null,
# "command": {
# "command_line": "table_create --name Memos --flags TABLE_HASH_KEY --key_type ShortText",
# "name": "table_create",
# "arguments": {
# "key_type": "ShortText",
# "flags": "TABLE_HASH_KEY",
# "name": "Memos"
# }
# },
# "indexes": [],
# "key_type": {
# "type": "type",
# "name": "ShortText"
# },
# "value_type": null,
# "token_filters": [],
# "type": "hash table",
# "columns": {
# "content": {
# "name": "content",
# "weight": false,
# "section": false,
# "compress": null,
# "command": {
# "command_line": "column_create --table Memos --name content --flags COLUMN_SCALAR --type Text",
# "name": "column_create",
# "arguments": {
# "table": "Memos",
# "flags": "COLUMN_SCALAR",
# "name": "content",
# "type": "Text"
# }
# },
# "indexes": [
# {
# "table": "Terms",
# "section": 0,
# "name": "memos_content_index",
# "full_name": "Terms.memos_content_index"
# }
# ],
# "sources": [],
# "value_type": {
# "type": "type",
# "name": "Text"
# },
# "full_name": "Memos.content",
# "position": false,
# "table": "Memos",
# "type": "scalar"
# }
# }
# }
# },
# "normalizers": {
# "NormalizerNFKC51": {
# "name": "NormalizerNFKC51"
# },
# "NormalizerAuto": {
# "name": "NormalizerAuto"
# }
# },
# "token_filters": {},
# "tokenizers": {
# "TokenBigramSplitSymbolAlphaDigit": {
# "name": "TokenBigramSplitSymbolAlphaDigit"
# },
# "TokenRegexp": {
# "name": "TokenRegexp"
# },
# "TokenBigramIgnoreBlankSplitSymbolAlphaDigit": {
# "name": "TokenBigramIgnoreBlankSplitSymbolAlphaDigit"
# },
# "TokenBigram": {
# "name": "TokenBigram"
# },
# "TokenDelimit": {
# "name": "TokenDelimit"
# },
# "TokenUnigram": {
# "name": "TokenUnigram"
# },
# "TokenBigramSplitSymbol": {
# "name": "TokenBigramSplitSymbol"
# },
# "TokenDelimitNull": {
# "name": "TokenDelimitNull"
# },
# "TokenBigramIgnoreBlankSplitSymbolAlpha": {
# "name": "TokenBigramIgnoreBlankSplitSymbolAlpha"
# },
# "TokenBigramSplitSymbolAlpha": {
# "name": "TokenBigramSplitSymbolAlpha"
# },
# "TokenTrigram": {
# "name": "TokenTrigram"
# },
# "TokenMecab": {
# "name": "TokenMecab"
# },
# "TokenBigramIgnoreBlankSplitSymbol": {
# "name": "TokenBigramIgnoreBlankSplitSymbol"
# },
# "TokenBigramIgnoreBlank": {
# "name": "TokenBigramIgnoreBlank"
# }
# },
# "plugins": {},
# "types": {
# "UInt64": {
# "can_be_key_type": true,
# "name": "UInt64",
# "can_be_value_type": true,
# "size": 8
# },
# "Int32": {
# "can_be_key_type": true,
# "name": "Int32",
# "can_be_value_type": true,
# "size": 4
# },
# "Int16": {
# "can_be_key_type": true,
# "name": "Int16",
# "can_be_value_type": true,
# "size": 2
# },
# "LongText": {
# "can_be_key_type": false,
# "name": "LongText",
# "can_be_value_type": false,
# "size": 2147483648
# },
# "TokyoGeoPoint": {
# "can_be_key_type": true,
# "name": "TokyoGeoPoint",
# "can_be_value_type": true,
# "size": 8
# },
# "Text": {
# "can_be_key_type": false,
# "name": "Text",
# "can_be_value_type": false,
# "size": 65536
# },
# "ShortText": {
# "can_be_key_type": true,
# "name": "ShortText",
# "can_be_value_type": false,
# "size": 4096
# },
# "Float": {
# "can_be_key_type": true,
# "name": "Float",
# "can_be_value_type": true,
# "size": 8
# },
# "UInt8": {
# "can_be_key_type": true,
# "name": "UInt8",
# "can_be_value_type": true,
# "size": 1
# },
# "UInt32": {
# "can_be_key_type": true,
# "name": "UInt32",
# "can_be_value_type": true,
# "size": 4
# },
# "Object": {
# "can_be_key_type": true,
# "name": "Object",
# "can_be_value_type": true,
# "size": 8
# },
# "UInt16": {
# "can_be_key_type": true,
# "name": "UInt16",
# "can_be_value_type": true,
# "size": 2
# },
# "Int64": {
# "can_be_key_type": true,
# "name": "Int64",
# "can_be_value_type": true,
# "size": 8
# },
# "Time": {
# "can_be_key_type": true,
# "name": "Time",
# "can_be_value_type": true,
# "size": 8
# },
# "Bool": {
# "can_be_key_type": true,
# "name": "Bool",
# "can_be_value_type": true,
# "size": 1
# },
# "WGS84GeoPoint": {
# "can_be_key_type": true,
# "name": "WGS84GeoPoint",
# "can_be_value_type": true,
# "size": 8
# },
# "Int8": {
# "can_be_key_type": true,
# "name": "Int8",
# "can_be_value_type": true,
# "size": 1
# }
# }
# }
# ]

Parameters
This section describes all parameters.

Required parameters
There is no required parameter.

Optional parameters
There is no optional parameter.

Return value
schema command returns the schema of the database:

[HEADER, SCHEMA]

HEADER
See /reference/command/output_format about HEADER.

SCHEMA
SCHEMA is an object that consists of the following information:

{
"plugins": PLUGINS,
"types": TYPES,
"tokenizers": TOKENIZERS,
"normalizers": NORMALIZERS,
"token_filters": TOKEN_FITLERS,
"tables": TABLES
}

PLUGINS
PLUGINS is an object. Its key is plugin name and its value is plugin detail:

{
"PLUGIN_NAME_1": PLUGIN_1,
"PLUGIN_NAME_2": PLUGIN_2,
...
"PLUGIN_NAME_n": PLUGIN_n
}

PLUGIN
PLUGIN is an object that describes plugin detail:

{
"name": PLUGIN_NAME
}

Here are properties of PLUGIN:

┌─────┬──────────────────────────────────┐
│Name │ Description │
├─────┼──────────────────────────────────┤
name │ The plugin name. It's used in │
│ │ plugin_register. │
└─────┴──────────────────────────────────┘

TYPES
TYPES is an object. Its key is type name and its value is type detail:

{
"TYPE_NAME_1": TYPE_1,
"TYPE_NAME_2": TYPE_2,
...
"TYPE_NAME_n": TYPE_n
}

TYPE
TYPE is an object that describes type detail:

{
"name": TYPE_NAME,
"size": SIZE_OF_ONE_VALUE_IN_BYTE,
"can_be_key_type": BOOLEAN,
"can_be_value_type": BOOLEAN
}

Here are properties of TYPE:

┌──────────────────┬──────────────────────────────────┐
│Name │ Description │
├──────────────────┼──────────────────────────────────┤
name │ The type name. │
├──────────────────┼──────────────────────────────────┤
size │ The number of bytes of one │
│ │ value. │
├──────────────────┼──────────────────────────────────┤
can_be_key_typetrue when the type can be used │
│ │ for table key, false otherwise. │
├──────────────────┼──────────────────────────────────┤
can_be_value_typetrue when the type can be used │
│ │ for table value, false
│ │ otherwise. │
└──────────────────┴──────────────────────────────────┘

TOKENIZERS
TOKENIZERS is an object. Its key is tokenizer name and its value is tokenizer detail:

{
"TOKENIZER_NAME_1": TOKENIZER_1,
"TOKENIZER_NAME_2": TOKENIZER_2,
...
"TOKENIZER_NAME_n": TOKENIZER_n
}

TOKENIZER
TOKENIZER is an object that describes tokenizer detail:

{
"name": TOKENIZER_NAME
}

Here are properties of TOKENIZER:

┌─────┬──────────────────────────────────┐
│Name │ Description │
├─────┼──────────────────────────────────┤
name │ The tokenizer name. It's used │
│ │ for │
│ │ table-create-default-tokenizer. │
└─────┴──────────────────────────────────┘

NORMALIZERS
NORMALIZERS is an object. Its key is normalizer name and its value is normalizer detail:

{
"NORMALIZER_NAME_1": NORMALIZER_1,
"NORMALIZER_NAME_2": NORMALIZER_2,
...
"NORMALIZER_NAME_n": NORMALIZER_n
}

NORMALIZER
NORMALIZER is an object that describes normalizer detail:

{
"name": NORMALIZER_NAME
}

Here are properties of NORMALIZER:

┌─────┬──────────────────────────────────┐
│Name │ Description │
├─────┼──────────────────────────────────┤
name │ The normalizer name. It's used │
│ │ for table-create-normalizer. │
└─────┴──────────────────────────────────┘

TOKEN_FILTERS
TOKEN_FILTERS is an object. Its key is token filter name and its value is token filter
detail:

{
"TOKEN_FILTER_NAME_1": TOKEN_FILTER_1,
"TOKEN_FILTER_NAME_2": TOKEN_FILTER_2,
...
"TOKEN_FILTER_NAME_n": TOKEN_FILTER_n
}

TOKEN_FILTER
TOKEN_FILTER is an object that describes token filter detail:

{
"name": TOKEN_FILTER_NAME
}

Here are properties of TOKEN_FILTER:

┌─────┬──────────────────────────────────┐
│Name │ Description │
├─────┼──────────────────────────────────┤
name │ The token filter name. It's used │
│ │ for table-create-token-filters. │
└─────┴──────────────────────────────────┘

TABLES
TABLES is an object. Its key is table name and its value is table detail:

{
"TABLE_NAME_1": TABLE_1,
"TABLE_NAME_2": TABLE_2,
...
"TABLE_NAME_n": TABLE_n
}

TABLE
TABLE is an object that describes table detail:

{
"name": TABLE_NAME
"type": TYPE,
"key_type": KEY_TYPE,
"value_type": VALUE_TYPE,
"tokenizer": TOKENIZER,
"normalizer": NORMALIZER,
"token_filters": [
TOKEN_FILTER_1,
TOKEN_FILTER_2,
...,
TOKEN_FILTER_n,
],
"indexes": [
INDEX_1,
INDEX_2,
...,
INDEX_n
],
"command": COMMAND,
"columns": {
"COLUMN_NAME_1": COLUMN_1,
"COLUMN_NAME_2": COLUMN_2,
...,
"COLUMN_NAME_3": COLUMN_3,
}
}

Here are properties of TABLE:

┌──────────────┬──────────────────────────────────┐
│Name │ Description │
├──────────────┼──────────────────────────────────┤
name │ The table name. │
├──────────────┼──────────────────────────────────┤
type │ The table type. │
│ │ │
│ │ This is one of the followings: │
│ │ │
│ │ · array: table-no-key │
│ │ │
│ │ · hash: table-hash-key │
│ │ │
│ │ · patricia trie: │
│ │ table-pat-key │
│ │ │
│ │ · double array trie: │
│ │ table-dat-key │
├──────────────┼──────────────────────────────────┤
key_type │ The type of the table's key. │
│ │ │
│ │ If the table type is array, this │
│ │ is null. │
│ │ │
│ │ If the table type isn't array, │
│ │ this is an object that has the │
│ │ following properties: │
│ │ │
│ │ · name: The type name. │
│ │ │
│ │ · type: reference if │
│ │ the type is an │
│ │ table, type
│ │ otherwise. │
├──────────────┼──────────────────────────────────┤
value_type │ The type of the table's value. │
│ │ │
│ │ If the table doesn't use value, │
│ │ this is null. │
│ │ │
│ │ If the table uses value, this is │
│ │ an object that has the following │
│ │ properties: │
│ │ │
│ │ · name: The type name. │
│ │ │
│ │ · type: reference if │
│ │ the type is an │
│ │ table, type
│ │ otherwise. │
├──────────────┼──────────────────────────────────┤
tokenizer │ The tokenizer of the table. It's │
│ │ specified by │
│ │ table-create-default-tokenizer. │
│ │ │
│ │ If the table doesn't use │
│ │ tokenizer, this is null. │
│ │ │
│ │ If the table uses tokenizer, │
│ │ this is an object that has the │
│ │ following properties: │
│ │ │
│ │ · name: The tokenizer │
│ │ name. │
├──────────────┼──────────────────────────────────┤
normalizer │ The normalizer of the table. │
│ │ It's specified by │
│ │ table-create-normalizer. │
│ │ │
│ │ If the table doesn't use │
│ │ normalizer, this is null. │
│ │ │
│ │ If the table uses normalizer, │
│ │ this is an object that has the │
│ │ following properties: │
│ │ │
│ │ · name: The normalizer │
│ │ name. │
├──────────────┼──────────────────────────────────┤
token_filters │ The token filters of the table. │
│ │ It's specified by │
│ │ table-create-token-filters. │
│ │ │
│ │ This is an array of an object. │
│ │ The object has the following │
│ │ properties: │
│ │ │
│ │ · name: The token │
│ │ filter name. │
├──────────────┼──────────────────────────────────┤
indexes │ The indexes of the table's key. │
│ │ │
│ │ This is an array of INDEX. │
├──────────────┼──────────────────────────────────┤
command │ The Groonga command information │
│ │ to create the table. │
│ │ │
│ │ This is COMMAND. │
├──────────────┼──────────────────────────────────┤
columns │ The columns of the table. │
│ │ │
│ │ This is an object that its key │
│ │ is a column name and its value │
│ │ is COLUMN. │
└──────────────┴──────────────────────────────────┘

INDEX
INDEX is an object that describes index detail:

{
"full_name": INDEX_COLUMN_NAME_WITH_TABLE_NAME,
"table": TABLE_NAME,
"name": INDEX_COLUMN_NAME,
"section": SECTION
}

Here are properties of INDEX:

┌──────────┬──────────────────────────────────┐
│Name │ Description │
├──────────┼──────────────────────────────────┤
full_name │ The index column name with table │
│ │ name. │
│ │ │
│ │ For example, Terms.index. │
├──────────┼──────────────────────────────────┤
table │ The table name of the index │
│ │ column. │
│ │ │
│ │ For example, Terms. │
├──────────┼──────────────────────────────────┤
name │ The index column name. │
│ │ │
│ │ For example, index. │
├──────────┼──────────────────────────────────┤
section │ The section number in the index │
│ │ column for the table's key. │
│ │ │
│ │ If the index column isn't │
│ │ multiple column index, this is │
│ │ 0. │
└──────────┴──────────────────────────────────┘

COMMAND
COMMAND is an object that describes how to create the table or column:

{
"name": COMMAND_NAME,
"arguments": {
"KEY_1": "VALUE_1",
"KEY_2": "VALUE_2",
...,
"KEY_n": "VALUE_n"
},
"command_line": COMMAND_LINE
}

Here are properties of COMMAND:

┌─────────────┬──────────────────────────────────┐
│Name │ Description │
├─────────────┼──────────────────────────────────┤
name │ The Groonga command name to │
│ │ create the table or column. │
├─────────────┼──────────────────────────────────┤
arguments │ The arguments of the Groonga │
│ │ command to create the table or │
│ │ column. │
│ │ │
│ │ This is an object that its key │
│ │ is argument name and its value │
│ │ is argument value. │
├─────────────┼──────────────────────────────────┤
command_line │ The Groonga command line to │
│ │ create the table or column. │
│ │ │
│ │ This is a string that can be │
│ │ evaluated by Groonga. │
└─────────────┴──────────────────────────────────┘
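
Because command_line is an executable Groonga command line, you can replay it against
another database to recreate the same object. For example, the Memos table in the example
output above can be recreated from its command_line value (a sketch):

Execution example:

table_create --name Memos --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]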

COLUMN
COLUMN is an object that describes column detail:

{
"name": COLUMN_NAME,
"table": TABLE_NAME,
"full_name": COLUMN_NAME_WITH_TABLE,
"type": TYPE,
"value_type": VALUE_TYPE,
"compress": COMPRESS,
"section": SECTION,
"weight": WEIGHT,
"compress": COMPRESS,
"section": BOOLEAN,
"weight": BOOLEAN,
"position": BOOLEAN,
"sources": [
SOURCE_1,
SOURCE_2,
...,
SOURCE_n
],
"indexes": [
INDEX_1,
INDEX_2,
...,
INDEX_n
],
"command": COMMAND
}

Here are properties of COLUMN:

┌───────────┬───────────────────────────────────────┐
│Name │ Description │
├───────────┼───────────────────────────────────────┤
name │ The column name. │
│ │ │
│ │ For example, age. │
├───────────┼───────────────────────────────────────┤
table │ The table name of the column. │
│ │ │
│ │ For example, Users. │
├───────────┼───────────────────────────────────────┤
full_name │ The column name with table name. │
│ │ │
│ │ For example, Users.age. │
├───────────┼───────────────────────────────────────┤
type │ The column type. │
│ │ │
│ │ This is one of the followings: │
│ │ │
│ │ · scalar: │
│ │ /reference/columns/scalar
│ │ │
│ │ · vector: │
│ │ /reference/columns/vector
│ │ │
│ │ · index: │
│ │ /reference/columns/index
├───────────┼───────────────────────────────────────┤
value_type │ The type of the column's value. │
│ │ │
│ │ This is an object that has the │
│ │ following properties: │
│ │ │
│ │ · name: The type name. │
│ │ │
│ │ · type: reference if the │
│ │ type is an table, type
│ │ otherwise. │
├───────────┼───────────────────────────────────────┤
compress │ The compression method of the column. │
│ │ │
│ │ If the column doesn't use any │
│ │ compression methods, this is null. │
│ │ │
│ │ If the column uses a compression │
│ │ method, this is one of the │
│ │ followings: │
│ │ │
│ │ · zlib: The column uses │
│ │ zlib to compress column │
│ │ value. │
│ │ │
│ │ · lz4: The column uses LZ4 │
│ │ to compress column value. │
├───────────┼───────────────────────────────────────┤
section │ Whether the column can store section │
│ │ information or not. │
│ │ │
│ │ true if the column is created with │
│ │ WITH_SECTION flag, false otherwise. │
│ │ │
│ │ Normally, if the column isn't an │
│ │ index column, this is false. │
├───────────┼───────────────────────────────────────┤
weight │ Whether the column can store weight │
│ │ information or not. │
│ │ │
│ │ true if the column is created with │
│ │ WITH_WEIGHT flag, false otherwise. │
├───────────┼───────────────────────────────────────┤
position │ Whether the column can store position │
│ │ information or not. │
│ │ │
│ │ true if the column is created with │
│ │ WITH_POSITION flag, false otherwise. │
│ │ │
│ │ Normally, if the column isn't an │
│ │ index column, this is false. │
├───────────┼───────────────────────────────────────┤
sources │ The source columns of the index │
│ │ column. │
│ │ │
│ │ This is an array of SOURCE. │
│ │ │
│ │ Normally, if the column isn't an │
│ │ index column, this is an empty array. │
├───────────┼───────────────────────────────────────┤
indexes │ The indexes of the column. │
│ │ │
│ │ This is an array of INDEX. │
├───────────┼───────────────────────────────────────┤
command │ The Groonga command information to │
│ │ create the column. │
│ │ │
│ │ This is COMMAND. │
└───────────┴───────────────────────────────────────┘

SOURCE
SOURCE is an object that describes source detail:

{
"name": COLUMN_NAME,
"table": TABLE_NAME,
"full_name": COLUMN_NAME_WITH_TABLE_NAME
}

Here are properties of SOURCE:

┌──────────┬──────────────────────────────────┐
│Name │ Description │
├──────────┼──────────────────────────────────┤
name │ The source column name. │
│ │ │
│ │ For example, content. │
│ │ │
│ │ This may be a _key pseudo │
│ │ column. │
├──────────┼──────────────────────────────────┤
table │ The table name of the source │
│ │ column. │
│ │ │
│ │ For example, Memos. │
├──────────┼──────────────────────────────────┤
full_name │ The source column name with │
│ │ table name. │
│ │ │
│ │ For example, Memos.content. │
└──────────┴──────────────────────────────────┘

See also
· table_create

· column_create

select
Summary
select searches records that match specified conditions in a table and then outputs
them.

select is the most important command in Groonga. You need to understand select to use the
full power of Groonga.

Syntax
This command takes many parameters.

The only required parameter is table. Other parameters are optional:

select table
[match_columns=null]
[query=null]
[filter=null]
[scorer=null]
[sortby=null]
[output_columns="_id, _key, *"]
[offset=0]
[limit=10]
[drilldown=null]
[drilldown_sortby=null]
[drilldown_output_columns="_key, _nsubrecs"]
[drilldown_offset=0]
[drilldown_limit=10]
[cache=yes]
[match_escalation_threshold=0]
[query_expansion=null]
[query_flags=ALLOW_PRAGMA|ALLOW_COLUMN|ALLOW_UPDATE|ALLOW_LEADING_NOT|NONE]
[query_expander=null]
[adjuster=null]
[drilldown_calc_types=NONE]
[drilldown_calc_target=null]

select has the following named parameters for advanced drilldown:

· drilldown[${LABEL}].keys=null

· drilldown[${LABEL}].sortby=null

· drilldown[${LABEL}].output_columns="_key, _nsubrecs"

· drilldown[${LABEL}].offset=0

· drilldown[${LABEL}].limit=10

· drilldown[${LABEL}].calc_types=NONE

· drilldown[${LABEL}].calc_target=null

You can use one or more alphabetic characters, digits, _ and . for ${LABEL}. For example,
parent.sub1 is a valid ${LABEL}.

Parameters that have the same ${LABEL} are grouped.

For example, the following parameters specify one drilldown:

· --drilldown[label].keys column

· --drilldown[label].sortby -_nsubrecs

The following parameters specify two drilldowns:

· --drilldown[label1].keys column1

· --drilldown[label1].sortby -_nsubrecs

· --drilldown[label2].keys column2

· --drilldown[label2].sortby _key
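
For example, the following sketch specifies one labeled drilldown that groups the search
result by the tag column under the label tags (the Entries table is introduced in the
Usage section below; the grouped result is returned under the tags label):

select Entries --filter 'content @ "fast"' --drilldown[tags].keys tag --drilldown[tags].output_columns '_key, _nsubrecs'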

Usage
Let's learn about select usage with examples. This section shows many popular usages.

Here are a schema definition and sample data to show usage.

Execution example:

table_create Entries TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries n_likes COLUMN_SCALAR UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries tag COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_key_index COLUMN_INDEX|WITH_POSITION Entries _key
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_content_index COLUMN_INDEX|WITH_POSITION Entries content
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Entries
[
{"_key": "The first post!",
"content": "Welcome! This is my first post!",
"n_likes": 5,
"tag": "Hello"},
{"_key": "Groonga",
"content": "I started to use Groonga. It's very fast!",
"n_likes": 10,
"tag": "Groonga"},
{"_key": "Mroonga",
"content": "I also started to use Mroonga. It's also very fast! Really fast!",
"n_likes": 15,
"tag": "Groonga"},
{"_key": "Good-bye Senna",
"content": "I migrated all Senna system!",
"n_likes": 3,
"tag": "Senna"},
{"_key": "Good-bye Tritonn",
"content": "I also migrated all Tritonn system!",
"n_likes": 3,
"tag": "Senna"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 5]

There is a table, Entries, for blog entries. An entry has a title, content, the number of
likes for the entry and a tag. The title is the key of Entries. The content is the value
of the Entries.content column. The number of likes is the value of the Entries.n_likes
column. The tag is the value of the Entries.tag column.

The Entries._key column and Entries.content column are indexed using the TokenBigram
tokenizer. So both Entries._key and Entries.content are ready for fulltext search.

OK. The schema and data for examples are ready.

Simple usage
Here is the most simple usage with the above schema and data. It outputs all records in
Entries table.

Execution example:

select Entries
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]

Why does the command output all records? There are two reasons. The first reason is that
the command doesn't specify any search conditions. No search condition means all records
are matched. The second reason is that the total number of records is 5. The select
command outputs at most 10 records by default. There are only 5 records, which is less
than 10, so the command outputs all of them.

Search conditions
Search conditions are specified by query or filter. You can also specify both query and
filter. In that case, selected records must match both query and filter.

Search condition: query
query is designed for a search box in a Web page. Imagine the search box on google.com.
You specify search conditions for query as space-separated keywords. For example, search
engine means that a matched record should contain two words, search and engine.

Normally, the query parameter is used for specifying fulltext search conditions. It can
also be used for non-fulltext search conditions, but filter is usually used for that
purpose.

The query parameter is used together with the match_columns parameter when query specifies
fulltext search conditions. match_columns specifies which columns and indexes are matched
against query.

Here is a simple query usage example.

Execution example:

select Entries --match_columns content --query fast
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]

The select command searches records that contain the word fast in the content column
value from the Entries table.

query has its own syntax, but its details aren't described here. See
/reference/grn_expr/query_syntax for details.

Search condition: filter
filter is designed for complex search conditions. You specify search conditions for
filter in an ECMAScript-like syntax.

Here is a simple filter usage example.

Execution example:

select Entries --filter 'content @ "fast" && _key == "Groonga"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ]
# ]
# ]
# ]

The select command searches records that contain the word fast in the content column
value and have Groonga as _key from the Entries table. There are three operators in the
command: @, && and ==. @ is the fulltext search operator. && and == are the same as in
ECMAScript: && is the logical AND operator and == is the equality operator.

filter supports more operators and syntax, such as grouping with (...), but the details
aren't described here. See /reference/grn_expr/script_syntax for details.
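
For example, grouping with (...) can combine conditions. Here is a sketch against the same
Entries table; output_columns narrows the output to keep the example short:

Execution example:

select Entries --filter '(n_likes < 5 || n_likes > 10) && tag == "Senna"' --output_columns '_key, n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# "Good-bye Senna",
# 3
# ],
# [
# "Good-bye Tritonn",
# 3
# ]
# ]
# ]
# ]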

Paging
You can specify the range of output records with offset and limit. Here is an example
that outputs only the 2nd record.

Execution example:

select Entries --offset 1 --limit 1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ]
# ]
# ]
# ]

offset is zero-based. --offset 1 means the output range starts from the 2nd record.

limit specifies the maximum number of output records. --limit 1 means that at most one
record is output. If no records are matched, the select command outputs no records.

The total number of records
You can use --limit 0 to retrieve the total number of records without any record
contents.

Execution example:

select Entries --limit 0
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ]
# ]
# ]

--limit 0 is also useful for retrieving only the number of matched records.

Drilldown
You can get additional grouped results for the search result in one select. You would
need two or more SELECTs in SQL, but select in Groonga can do it in one command.

This feature is called drilldown in Groonga. It's also called faceted search in other
search engines.

For example, think about the following situation.

You search entries that have the word fast:

Execution example:

select Entries --filter 'content @ "fast"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]

You want to use tag as an additional search condition, like --filter 'content @ "fast" &&
tag == "???"'. But you don't know a suitable tag until you see the result of content @
"fast".

If you know the number of matched records for each available tag, you can choose a
suitable tag. You can use drilldown for this case:

Execution example:

select Entries --filter 'content @ "fast"' --drilldown tag
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ],
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Groonga",
# 2
# ]
# ]
# ]
# ]

--drilldown tag returns a list of pairs of an available tag and the number of matched
records. You can avoid the "no hit search" case by choosing a tag from the list. You can
also avoid the "too many search results" case by choosing a tag with few matched records
from the list.

You can create the following UI with the drilldown results:

· Links to narrow down search results. (Users don't need to input a search query with
their keyboard. They just click a link.)

Most EC sites use this UI. See the side menu at Amazon.

Groonga supports not only counting grouped records but also finding the maximum and/or
minimum value from grouped records, summing values in grouped records and so on. See
Drilldown related parameters for details.
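
For example, the following sketch finds the maximum n_likes value and the sum of n_likes
per tag. It assumes that drilldown_calc_types accepts a comma-separated list and that the
calculated values are exposed as the _max and _sum columns in drilldown_output_columns;
the output shown is illustrative:

Execution example:

select Entries --limit 0 --output_columns _id --drilldown tag --drilldown_calc_types MAX,SUM --drilldown_calc_target n_likes --drilldown_output_columns '_key, _nsubrecs, _max, _sum'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ],
# [
# "_max",
# "Int64"
# ],
# [
# "_sum",
# "Int64"
# ]
# ],
# [
# "Hello",
# 1,
# 5,
# 5
# ],
# [
# "Groonga",
# 2,
# 15,
# 25
# ],
# [
# "Senna",
# 2,
# 3,
# 6
# ]
# ]
# ]
# ]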

Parameters
This section describes all parameters. Parameters are categorized.

Required parameters
There is a required parameter, table.

table
Specifies a table to be searched. table must be specified.

If a nonexistent table is specified, an error is returned.

Execution example:

select Nonexistent
# [
# [
# -22,
# 1337566253.89858,
# 0.000355720520019531,
# "invalid table name: <Nonexistent>",
# [
# [
# "grn_select",
# "proc.c",
# 1217
# ]
# ]
# ]
# ]

Search related parameters
There are search related parameters. Typically, the match_columns and query parameters
are used for implementing a search box. The filter parameter is used for implementing
complex search features.

If both query and filter are specified, selected records must match both query and
filter. If neither query nor filter is specified, all records are selected.

match_columns
Specifies the default target column for fulltext search by the query parameter value. A
target column for fulltext search can also be specified in the query parameter itself. The
difference between match_columns and query is whether weight and score functions are
supported or not: match_columns supports them but query doesn't.

Weight is the relative importance of a target column. A target column with a higher weight
contributes more to the hit score than a target column with a lower weight when a record
is matched by fulltext search. The default weight is 1.

Here is a simple match_columns usage example.

Execution example:

select Entries --match_columns content --query fast --output_columns '_key, _score'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Groonga",
# 1
# ],
# [
# "Mroonga",
# 2
# ]
# ]
# ]
# ]

--match_columns content means that the default target column for fulltext search is the
content column and its weight is 1. --output_columns '_key, _score' means that the select
command outputs the _key value and _score value for matched records.

Pay attention to the _score value. The _score value is the number of matches against the
query parameter value. In the example, the query parameter value is fast. A _score value
of 1 means that fast appears in the content column only once. A _score value of 2 means
that fast appears in the content column twice.

To specify weight, column * weight syntax is used. Here is a weight usage example.

Execution example:

select Entries --match_columns 'content * 2' --query fast --output_columns '_key, _score'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Groonga",
# 2
# ],
# [
# "Mroonga",
# 4
# ]
# ]
# ]
# ]

--match_columns 'content * 2' means that the default target column for fulltext search is
the content column and its weight is 2.

Pay attention to the _score value. The _score value is doubled because the weight is 2.

You can specify one or more columns as the default target columns for fulltext search. If
multiple columns are specified, fulltext search is done on all of them and the scores are
accumulated. If any of the columns matches the query parameter value, the record is
treated as matched.

To specify multiple columns, the column1 * weight1 || column2 * weight2 || ... syntax is
used. * weight can be omitted; if it is omitted, 1 is used as the weight. Here is a
multiple-columns usage example.

Execution example:

select Entries --match_columns '_key * 10 || content' --query groonga --output_columns '_key, _score'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Groonga",
# 11
# ]
# ]
# ]
# ]

--match_columns '_key * 10 || content' means that the default target columns for fulltext
search are the _key and content columns, the _key column's weight is 10 and the content
column's weight is 1. This weight allocation means that the _key column value is more
important than the content column value. In this example, the title of a blog entry is
more important than its content.

You can also specify a score function. See /reference/scorer for details.

Note that score functions aren't related to the scorer parameter.
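
Here is a sketch of a score function in match_columns. It assumes the scorer_tf_at_most
score function described in /reference/scorer is available; it caps the term-frequency
based score so that repeated words can't inflate _score beyond the given limit:

select Entries --match_columns 'scorer_tf_at_most(content, 2.0)' --query fast --output_columns '_key, _score'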

query
Specifies the query text. Normally, it is used for fulltext search with the match_columns
parameter. The query parameter is designed for a fulltext search form in a Web page. A
query text should be formatted in /reference/grn_expr/query_syntax. The syntax is similar
to common search forms like Google's. For example, word1 word2 means that Groonga searches
records that contain both word1 and word2. word1 OR word2 means that Groonga searches
records that contain either word1 or word2.

Here is a simple logical and search example.

Execution example:

select Entries --match_columns content --query "fast groonga"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ]
# ]
# ]
# ]

The select command searches records that contain the two words fast and groonga in the
content column value from the Entries table.

Here is a simple logical or search example.

Execution example:

select Entries --match_columns content --query "groonga OR mroonga"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]

The select command searches records that contain one of the two words groonga or mroonga
in the content column value from the Entries table.

See /reference/grn_expr/query_syntax for other syntax.

query can be used not only for fulltext search but also for other conditions. For
example, column:value means that the value of the column column is equal to value.
column:<value means that the value of the column column is less than value.

Here is a simple equality operator search example.

Execution example:

select Entries --query _key:Groonga
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ]
# ]
# ]
# ]

The select command searches records whose _key column value is Groonga from the Entries
table.

Here is a simple less than operator search example.

Execution example:

select Entries --query n_likes:<11
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]

The select command searches records whose n_likes column value is less than 11 from the
Entries table.

See /reference/grn_expr/query_syntax for other operations.

filter
Specifies the filter text. Normally, it is used for complex search conditions. filter can
be used together with the query parameter. If both filter and query are specified, they
are combined with logical AND. It means that matched records must match both filter and
query.

The filter parameter is designed for complex conditions. A filter text should be formatted
in /reference/grn_expr/script_syntax. The syntax is similar to ECMAScript. For example,
column == "value" means that the value of the column column is equal to "value". column <
value means that the value of the column column is less than value.

Here is a simple equality operator search example.

Execution example:

select Entries --filter '_key == "Groonga"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ]
# ]
# ]
# ]

The select command searches records whose _key column value is Groonga from the Entries
table.

Here is a simple less than operator search example.

Execution example:

select Entries --filter 'n_likes < 11'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]

The select command searches records whose n_likes column value is less than 11 from the
Entries table.

See /reference/grn_expr/script_syntax for other operators.

Advanced search parameters
match_escalation_threshold
Specifies the threshold used to determine whether search strategy escalation is used or
not. The threshold is compared against the number of matched records. If the number of
matched records is equal to or less than the threshold, search strategy escalation is
used. See /spec/search about search strategy escalation.

The default threshold is 0. It means that search strategy escalation is used only when no
records are matched.

The default threshold can be customized by one of the following.

· --with-match-escalation-threshold option of configure

· --match-escalation-threshold option of the groonga command

· match-escalation-threshold configuration item in configuration file
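
For example, the groonga command accepts the option at startup. This is a sketch; the
database path is hypothetical:

$ groonga --match-escalation-threshold 100 /tmp/groonga-databases/db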

Here is a simple match_escalation_threshold usage example. The first select doesn't have
match_escalation_threshold parameter. The second select has match_escalation_threshold
parameter.

Execution example:

select Entries --match_columns content --query groo
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ]
# ]
# ]
# ]
select Entries --match_columns content --query groo --match_escalation_threshold -1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 0
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ]
# ]
# ]

The first select command searches records that contain the word groo in the content
column value from the Entries table. But no records are matched by exact match, because
the TokenBigram tokenizer tokenizes groonga as groonga, not gr|ro|oo|on|ng|ga. (The
TokenBigramSplitSymbolAlpha tokenizer tokenizes groonga as gr|ro|oo|on|ng|ga. See
/reference/tokenizers for details.) It means that groonga is indexed but groo isn't
indexed. So no records are matched against groo by exact match. In this case, search
strategy escalation is used because the number of matched records (0) is equal to
match_escalation_threshold (0). One record is matched against groo by unsplit search.

The second select command also searches records that contain the word groo in the content
column value from the Entries table. And it also doesn't find matched records by exact
match. In this case, search strategy escalation is not used because the number of matched
records (0) is larger than match_escalation_threshold (-1). So no more searches are
executed. And no records are matched.

query_expansion
Deprecated. Use query_expander instead.

query_flags
It customizes the query parameter syntax. You cannot update column values with the query
parameter by default. But if you specify ALLOW_COLUMN|ALLOW_UPDATE as query_flags, you can
update column values with query.

Here are available values:

· ALLOW_PRAGMA

· ALLOW_COLUMN

· ALLOW_UPDATE

· ALLOW_LEADING_NOT

· NONE

ALLOW_PRAGMA enables pragmas at the head of query. This is not implemented yet.

ALLOW_COLUMN enables search against columns that are not included in match_columns. To
specify a column, the COLUMN:... syntaxes are used.

ALLOW_UPDATE enables column updates by query with the COLUMN:=NEW_VALUE syntax.
ALLOW_COLUMN is also required to update columns because the column update syntax specifies
a column.

ALLOW_LEADING_NOT enables a leading NOT condition with the -WORD syntax. The query
searches records that don't match WORD. A leading NOT condition query is a heavy query in
many cases because it matches many records, so this flag is disabled by default. Be
careful when you use this flag.

NONE is just ignored. You can use NONE for specifying no flags.

They can be combined with | such as ALLOW_COLUMN|ALLOW_UPDATE.

The default value is ALLOW_PRAGMA|ALLOW_COLUMN.

Here is a usage example of ALLOW_COLUMN.

Execution example:

select Entries --query content:@mroonga --query_flags ALLOW_COLUMN
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]

The select command searches records that contain mroonga in content column value from
Entries table.

Here is a usage example of ALLOW_UPDATE.

Execution example:

table_create Users TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users age COLUMN_SCALAR UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Users
[
{"_key": "alice", "age": 18},
{"_key": "bob", "age": 20}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select Users --query age:=19 --query_flags ALLOW_COLUMN|ALLOW_UPDATE
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "age",
# "UInt32"
# ]
# ],
# [
# 1,
# "alice",
# 19
# ],
# [
# 2,
# "bob",
# 19
# ]
# ]
# ]
# ]
select Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "age",
# "UInt32"
# ]
# ],
# [
# 1,
# "alice",
# 19
# ],
# [
# 2,
# "bob",
# 19
# ]
# ]
# ]
# ]

The first select command sets the age column value of all records to 19. The second
select command outputs the updated age column values.

Here is a usage example of ALLOW_LEADING_NOT.

Execution example:

select Entries --match_columns content --query -mroonga --query_flags ALLOW_LEADING_NOT
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]

The select command searches records that don't contain mroonga in content column value
from Entries table.

Here is a usage example of NONE.

Execution example:

select Entries --match_columns content --query 'mroonga OR _key:Groonga' --query_flags NONE
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]

The select command searches records that contain one of the two words mroonga or
_key:Groonga in content from the Entries table. Note that _key:Groonga doesn't mean that
the value of the _key column is equal to Groonga, because the ALLOW_COLUMN flag is not
specified.

See also /reference/grn_expr/query_syntax.

query_expander
It's for query expansion. Query expansion substitutes specific words in query with other
words. Normally, it's used for synonym search.

It specifies a column that is used to substitute the query parameter value. The format of
this parameter value is "${TABLE}.${COLUMN}". For example, "Terms.synonym" specifies the
synonym column in the Terms table.

The table for query expansion is called a "substitution table". The substitution table's
key must be ShortText. So an array table (TABLE_NO_KEY) can't be used for query expansion,
because an array table doesn't have a key.

The column for query expansion is called a "substitution column". The substitution
column's value type must be ShortText and its column type must be vector (COLUMN_VECTOR).

Query expansion substitutes keys of the substitution table in query with values in the
substitution column. If a word in query is a key of the substitution table, the word is
substituted with the substitution column value that is associated with the key.
Substitution isn't performed recursively; substitution target words in the substituted
query aren't substituted again.

Here is a sample substitution table to show a simple query_expander usage example.

Execution example:

table_create Thesaurus TABLE_PAT_KEY ShortText --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Thesaurus synonym COLUMN_VECTOR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Thesaurus
[
{"_key": "mroonga", "synonym": ["mroonga", "tritonn", "groonga mysql"]},
{"_key": "groonga", "synonym": ["groonga", "senna"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]

The Thesaurus substitution table has two synonyms, "mroonga" and "groonga". If a user
searches with "mroonga", Groonga searches with "((mroonga) OR (tritonn) OR (groonga
mysql))". If a user searches with "groonga", Groonga searches with "((groonga) OR
(senna))".

Normally, it's a good idea for the substitution table to use a normalizer. For example, if
a normalizer is used, the substitute target word is matched in a case insensitive manner.
See /reference/normalizers for available normalizers.

Note that those synonym values include the key value such as "mroonga" and "groonga". It's
recommended that you include the key value. If you don't include the key value, the
substituted value doesn't include the original substitute target value. Normally,
including the original value gives a better search result. If you have a word that you
don't want to be searched, you should not include the original word. For example, you can
implement "stop words" by an empty vector value, as shown below.

Here is a simple query_expander usage example.

Execution example:

select Entries --match_columns content --query "mroonga"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]
select Entries --match_columns content --query "mroonga" --query_expander Thesaurus.synonym
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]
select Entries --match_columns content --query "((mroonga) OR (tritonn) OR (groonga mysql))"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]

The first select command doesn't use query expansion, so a record that has "tritonn" isn't
found. The second select command uses query expansion, so a record that has "tritonn" is
found. The third select command doesn't use query expansion, but it returns the same
result as the second select command because it uses the already expanded query.

Each substitute value can contain any /reference/grn_expr/query_syntax syntax such as
(...) and OR. You can perform complex substitutions by using this syntax.

Here is a complex substitution usage example that uses query syntax.

Execution example:

load --table Thesaurus
[
{"_key": "popular", "synonym": ["popular", "n_likes:>=10"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select Entries --match_columns content --query "popular" --query_expander Thesaurus.synonym
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]

The load command registers a new synonym "popular". It is substituted with ((popular) OR
(n_likes:>=10)). The substituted query matches entries that contain the word "popular" or
that have 10 or more likes.

The select command outputs records whose n_likes column value is equal to or greater than
10 from the Entries table.

Output related parameters
output_columns
Specifies output columns separated by ,.

Here is a simple output_columns usage example.

Execution example:

select Entries --output_columns '_id, _key' --limit 1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!"
# ]
# ]
# ]
# ]

The select command just outputs _id and _key column values.

* is a special value. It means all columns that are not /reference/columns/pseudo columns.

Here is a * usage example.

Execution example:

select Entries --output_columns '_key, *' --limit 1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ]
# ]
# ]
# ]

The select command outputs _key pseudo column, content column, n_likes column and tag
column values but doesn't output the _id pseudo column value.

The default value is _id, _key, *. It means that all column values except _score are
outputted.
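
For example, the following two commands should output the same columns; the second one
just spells out the default explicitly (a sketch, not an executed example):

select Entries --limit 1
select Entries --output_columns '_id, _key, *' --limit 1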

sortby
Specifies sort keys separated by ,. Each sort key is a column name.

Here is a simple sortby usage example.

Execution example:

select Entries --sortby 'n_likes, _id'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ]
# ]
# ]
# ]

The select command sorts by the n_likes column value in ascending order. Records that have
the same n_likes value are sorted by _id in ascending order. "Good-bye Senna" and
"Good-bye Tritonn" are such a case.

If you want to sort in descending order, add - before the column name.

Here is a descending order sortby usage example.

Execution example:

select Entries --sortby '-n_likes, _id'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]

The select command sorts by the n_likes column value in descending order, but ascending
order is used for sorting by _id.

You can use _score pseudo column in sortby if you use query or filter parameter.

Execution example:

select Entries --match_columns content --query fast --sortby -_score --output_columns '_key, _score'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Mroonga",
# 2
# ],
# [
# "Groonga",
# 1
# ]
# ]
# ]
# ]

The select command sorts matched records by hit score in descending order and outputs
record key and hit score.

If you use _score without the query or filter parameter, it's just ignored, but you get a
warning in the log file.

offset
Specifies an offset to determine the output records range. Offset is zero-based. --offset
1 means the output range starts from the 2nd record.

Execution example:

select Entries --sortby _id --offset 3 --output_columns _key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# "Good-bye Senna"
# ],
# [
# "Good-bye Tritonn"
# ]
# ]
# ]
# ]

The select command outputs from the 4th record.

You can specify a negative value. It means the number of matched records + offset. If you
have 3 matched records and specify --offset -2, you get records from the 2nd record (3 +
-2 = 1; 1 means the 2nd because offset is zero-based) to the 3rd record.

Execution example:

select Entries --sortby _id --offset -2 --output_columns _key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# "Good-bye Senna"
# ],
# [
# "Good-bye Tritonn"
# ]
# ]
# ]
# ]

The select command outputs from the 4th record because the total number of records is 5.

The default value is 0.

limit
Specifies the max number of output records. If the number of matched records is less than
limit, all records are outputted.

Here is a simple limit usage example.

Execution example:

select Entries --sortby _id --offset 2 --limit 3 --output_columns _key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# "Mroonga"
# ],
# [
# "Good-bye Senna"
# ],
# [
# "Good-bye Tritonn"
# ]
# ]
# ]
# ]

The select command outputs the 3rd, the 4th and the 5th records.

You can specify a negative value. It means the number of matched records + limit + 1. For
example, --limit -1 outputs all records. It's a very useful value for showing all records.

Here is a simple negative limit value usage example.

Execution example:

select Entries --limit -1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]

The select command outputs all records.

The default value is 10.

scorer
Specifies a grn_expr in script syntax that is applied to every record that matches the
search condition.

scorer is invoked after the search process completes and before the sort process runs.
Therefore, if you specify an expression that manipulates each record's score, you can
customize the sort order of the search results.
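
Here is a minimal sketch (not an executed example) that replaces each record's score with
its n_likes column value so that the search result can be sorted by it:

select Entries --filter true --scorer '_score = n_likes' --sortby -_score --output_columns _key,n_likes,_score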

Drilldown related parameters
This section describes basic drilldown related parameters. Advanced drilldown related
parameters are described in another section.

drilldown
Specifies keys for grouping separated by ,.

Records matched by the specified search conditions are grouped by each key. If you specify
no search condition, all records are grouped by each key.

Here is a simple drilldown example:

Execution example:

select Entries \
--output_columns _key,tag \
--drilldown tag
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# "The first post!",
# "Hello"
# ],
# [
# "Groonga",
# "Groonga"
# ],
# [
# "Mroonga",
# "Groonga"
# ],
# [
# "Good-bye Senna",
# "Senna"
# ],
# [
# "Good-bye Tritonn",
# "Senna"
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 1
# ],
# [
# "Groonga",
# 2
# ],
# [
# "Senna",
# 2
# ]
# ]
# ]
# ]

The select command outputs the following information:

· There is one record that has the "Hello" tag.

· There are two records that have the "Groonga" tag.

· There are two records that have the "Senna" tag.

Here is a drilldown with search condition example:

Execution example:

select Entries \
--output_columns _key,tag \
--filter 'n_likes >= 5' \
--drilldown tag
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# "The first post!",
# "Hello"
# ],
# [
# "Groonga",
# "Groonga"
# ],
# [
# "Mroonga",
# "Groonga"
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 1
# ],
# [
# "Groonga",
# 2
# ]
# ]
# ]
# ]

The select command outputs the following information:

· In records that have 5 or larger as the n_likes value:

· There is one record that has the "Hello" tag.

· There are two records that have the "Groonga" tag.

Here is a drilldown with multiple group keys example:

Execution example:

select Entries \
--limit 0 \
--output_column _id \
--drilldown tag,n_likes
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 1
# ],
# [
# "Groonga",
# 2
# ],
# [
# "Senna",
# 2
# ]
# ],
# [
# [
# 4
# ],
# [
# [
# "_key",
# "UInt32"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# 5,
# 1
# ],
# [
# 10,
# 1
# ],
# [
# 15,
# 1
# ],
# [
# 3,
# 2
# ]
# ]
# ]
# ]

The select command outputs the following information:

· About tag:

· There is one record that has the "Hello" tag.

· There are two records that have the "Groonga" tag.

· There are two records that have the "Senna" tag.

· About n_likes:

· There is one record whose n_likes value is 5.

· There is one record whose n_likes value is 10.

· There is one record whose n_likes value is 15.

· There are two records whose n_likes value is 3.

drilldown_sortby
Specifies sort keys for drilldown outputs separated by ,. Each sort key is a column name.

You can refer to the number of grouped records by the _nsubrecs /reference/columns/pseudo.

Here is a simple drilldown_sortby example:

Execution example:

select Entries \
--limit 0 \
--output_column _id \
--drilldown 'tag, n_likes' \
--drilldown_sortby '-_nsubrecs, _key'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Groonga",
# 2
# ],
# [
# "Senna",
# 2
# ],
# [
# "Hello",
# 1
# ]
# ],
# [
# [
# 4
# ],
# [
# [
# "_key",
# "UInt32"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# 3,
# 2
# ],
# [
# 5,
# 1
# ],
# [
# 10,
# 1
# ],
# [
# 15,
# 1
# ]
# ]
# ]
# ]

The drilldown result is sorted by the number of grouped records (= _nsubrecs) in
descending order. If there are groups that have the same number of records, those groups
are sorted by the grouped key (= _key) in ascending order.

The same sort keys are used for all group keys specified in drilldown:

Execution example:

select Entries \
--limit 0 \
--output_column _id \
--drilldown 'tag, n_likes' \
--drilldown_sortby '-_nsubrecs, _key'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Groonga",
# 2
# ],
# [
# "Senna",
# 2
# ],
# [
# "Hello",
# 1
# ]
# ],
# [
# [
# 4
# ],
# [
# [
# "_key",
# "UInt32"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# 3,
# 2
# ],
# [
# 5,
# 1
# ],
# [
# 10,
# 1
# ],
# [
# 15,
# 1
# ]
# ]
# ]
# ]

The same sort keys are used in tag drilldown and n_likes drilldown.

If you want to use different sort keys for each drilldown, use the Advanced drilldown
related parameters, as sketched below.
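
As a preview, here is a sketch (not an executed example; the labels tags and likes are
arbitrary names) that sorts the tag drilldown by descending _nsubrecs but the n_likes
drilldown by key:

select Entries \
--limit 0 \
--drilldown[tags].keys tag \
--drilldown[tags].sortby -_nsubrecs \
--drilldown[likes].keys n_likes \
--drilldown[likes].sortby _key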

drilldown_output_columns
Specifies output columns for drilldown separated by ,.

Here is a drilldown_output_columns example:

Execution example:

select Entries \
--limit 0 \
--output_column _id \
--drilldown tag \
--drilldown_output_columns _key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# "Hello"
# ],
# [
# "Groonga"
# ],
# [
# "Senna"
# ]
# ]
# ]
# ]

The select command just outputs the grouped key.

If the grouped key is a referenced type column (= a column whose type is a table), you can
access columns of the table referenced by the referenced type column.

Here are a schema definition and sample data to show drilldown against referenced type
column:

Execution example:

table_create Tags TABLE_HASH_KEY ShortText --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Tags label COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Tags priority COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Items TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Items tag COLUMN_SCALAR Tags
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Tags
[
{"_key": "groonga", label: "Groonga", priority: 10},
{"_key": "mroonga", label: "Mroonga", priority: 5}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
load --table Items
[
{"_key": "A", "tag": "groonga"},
{"_key": "B", "tag": "groonga"},
{"_key": "C", "tag": "mroonga"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

The Tags table is a referenced table. Items.tag is a referenced type column.

You can refer to Tags.label by label in drilldown_output_columns:

Execution example:

select Items \
--limit 0 \
--output_column _id \
--drilldown tag \
--drilldown_output_columns '_key, label'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "tag",
# "Tags"
# ]
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "label",
# "ShortText"
# ]
# ],
# [
# "groonga",
# "Groonga"
# ],
# [
# "mroonga",
# "Mroonga"
# ]
# ]
# ]
# ]

You can use * to refer to all columns in the referenced table (= Tags):

Execution example:

select Items \
--limit 0 \
--output_column _id \
--drilldown tag \
--drilldown_output_columns '_key, *'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "tag",
# "Tags"
# ]
# ]
# ],
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "label",
# "ShortText"
# ],
# [
# "priority",
# "Int32"
# ]
# ],
# [
# "groonga",
# "Groonga",
# 10
# ],
# [
# "mroonga",
# "Mroonga",
# 5
# ]
# ]
# ]
# ]

* is expanded to label, priority.

The default value of drilldown_output_columns is _key, _nsubrecs. It means that the
grouped key and the number of records in the group are output.

You can use more /reference/columns/pseudo in drilldown_output_columns such as _max, _min,
_sum and _avg when you use drilldown_calc_types. See the drilldown_calc_types
documentation for details.

drilldown_offset
Specifies an offset to determine the range of drilldown output records. Offset is
zero-based. --drilldown_offset 1 means the output range starts from the 2nd record.

Here is a drilldown_offset example:

Execution example:

select Entries \
--limit 0 \
--output_column _id \
--drilldown tag \
--drilldown_sortby _key \
--drilldown_offset 1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 1
# ],
# [
# "Senna",
# 2
# ]
# ]
# ]
# ]

The select command outputs from the 2nd record.

You can specify a negative value. It means the number of grouped results + offset. If you
have 3 grouped results and specify --drilldown_offset -2, you get grouped results from the
2nd grouped result (3 + -2 = 1; 1 means the 2nd because offset is zero-based) to the 3rd
grouped result.

Execution example:

select Entries \
--limit 0 \
--output_column _id \
--drilldown tag \
--drilldown_sortby _key \
--drilldown_offset -2
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 1
# ],
# [
# "Senna",
# 2
# ]
# ]
# ]
# ]

The select command outputs from the 2nd grouped result because the total number of grouped
results is 3.

The default value of drilldown_offset is 0.

drilldown_limit
Specifies the max number of groups in a drilldown. If the number of groups is less than
drilldown_limit, all groups are outputted.

Here is a drilldown_limit example:

Execution example:

select Entries \
--limit 0 \
--output_column _id \
--drilldown tag \
--drilldown_sortby _key \
--drilldown_offset 1 \
--drilldown_limit 2
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 1
# ],
# [
# "Senna",
# 2
# ]
# ]
# ]
# ]

The select command outputs the 2nd and the 3rd groups.

You can specify a negative value. It means the number of groups + drilldown_limit + 1. For
example, --drilldown_limit -1 outputs all groups. It's a very useful value for showing all
groups.

Here is a negative drilldown_limit value example.

Execution example:

select Entries \
--limit 0 \
--output_column _id \
--drilldown tag \
--drilldown_sortby _key \
--drilldown_limit -1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Groonga",
# 2
# ],
# [
# "Hello",
# 1
# ],
# [
# "Senna",
# 2
# ]
# ]
# ]
# ]

The select command outputs all groups.

The default value of drilldown_limit is 10.

drilldown_calc_types
Specifies how to calculate (aggregate) values in grouped records by a drilldown. You can
specify multiple calculation types separated by ",". For example, MAX,MIN.

Calculation target values are read from a column of grouped records. The column is
specified by drilldown_calc_target.

You can read calculated value by /reference/columns/pseudo such as _max and _min in
drilldown_output_columns.

You can use the following calculation types:

┌──────────┬───────────────────────────┬───────────────────────┬─────────────────────┐
│Type name │ /reference/columns/pseudo │ Need                  │ Description         │
│          │ name                      │ drilldown_calc_target │                     │
├──────────┼───────────────────────────┼───────────────────────┼─────────────────────┤
│NONE      │ Nothing.                  │ Not needed.           │ Just ignored.       │
├──────────┼───────────────────────────┼───────────────────────┼─────────────────────┤
│COUNT     │ _nsubrecs                 │ Not needed.           │ Counting grouped    │
│          │                           │                       │ records. It's       │
│          │                           │                       │ always enabled. So  │
│          │                           │                       │ you don't need to   │
│          │                           │                       │ specify it.         │
├──────────┼───────────────────────────┼───────────────────────┼─────────────────────┤
│MAX       │ _max                      │ Needed.               │ Finding the maximum │
│          │                           │                       │ integer value from  │
│          │                           │                       │ integer values in   │
│          │                           │                       │ grouped records.    │
├──────────┼───────────────────────────┼───────────────────────┼─────────────────────┤
│MIN       │ _min                      │ Needed.               │ Finding the minimum │
│          │                           │                       │ integer value from  │
│          │                           │                       │ integer values in   │
│          │                           │                       │ grouped records.    │
├──────────┼───────────────────────────┼───────────────────────┼─────────────────────┤
│SUM       │ _sum                      │ Needed.               │ Summing integer     │
│          │                           │                       │ values in grouped   │
│          │                           │                       │ records.            │
├──────────┼───────────────────────────┼───────────────────────┼─────────────────────┤
│AVG       │ _avg                      │ Needed.               │ Averaging           │
│          │                           │                       │ integer/float       │
│          │                           │                       │ values in grouped   │
│          │                           │                       │ records.            │
└──────────┴───────────────────────────┴───────────────────────┴─────────────────────┘

Here is a MAX example:

Execution example:

select Entries \
--limit -1 \
--output_column _id,n_likes \
--drilldown tag \
--drilldown_calc_types MAX \
--drilldown_calc_target n_likes \
--drilldown_output_columns _key,_max
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_max",
# "Int64"
# ]
# ],
# [
# "Hello",
# 5
# ],
# [
# "Groonga",
# 15
# ],
# [
# "Senna",
# 3
# ]
# ]
# ]
# ]

The select command groups all records by the tag column value, finds the maximum n_likes
column value for each group and outputs pairs of grouped key and the maximum n_likes
column value for the group. It uses the _max /reference/columns/pseudo to read the maximum
n_likes column value.

Here is a MIN example:

Execution example:

select Entries \
--limit -1 \
--output_column _id,n_likes \
--drilldown tag \
--drilldown_calc_types MIN \
--drilldown_calc_target n_likes \
--drilldown_output_columns _key,_min
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_min",
# "Int64"
# ]
# ],
# [
# "Hello",
# 5
# ],
# [
# "Groonga",
# 10
# ],
# [
# "Senna",
# 3
# ]
# ]
# ]
# ]

The select command groups all records by the tag column value, finds the minimum n_likes
column value for each group and outputs pairs of grouped key and the minimum n_likes
column value for the group. It uses the _min /reference/columns/pseudo to read the minimum
n_likes column value.

Here is a SUM example:

Execution example:

select Entries \
--limit -1 \
--output_column _id,n_likes \
--drilldown tag \
--drilldown_calc_types SUM \
--drilldown_calc_target n_likes \
--drilldown_output_columns _key,_sum
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_sum",
# "Int64"
# ]
# ],
# [
# "Hello",
# 5
# ],
# [
# "Groonga",
# 25
# ],
# [
# "Senna",
# 6
# ]
# ]
# ]
# ]

The select command groups all records by tag column value, sums all n_likes column values
for each group and outputs pairs of grouped key and the summed n_likes column values for
the group. It uses _sum /reference/columns/pseudo to read the summed n_likes column
values.

Here is an AVG example:

Execution example:

select Entries \
--limit -1 \
--output_column _id,n_likes \
--drilldown tag \
--drilldown_calc_types AVG \
--drilldown_calc_target n_likes \
--drilldown_output_columns _key,_avg
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_avg",
# "Float"
# ]
# ],
# [
# "Hello",
# 5.0
# ],
# [
# "Groonga",
# 12.5
# ],
# [
# "Senna",
# 3.0
# ]
# ]
# ]
# ]

The select command groups all records by tag column value, averages all n_likes column
values for each group and outputs pairs of grouped key and the averaged n_likes column
values for the group. It uses _avg /reference/columns/pseudo to read the averaged n_likes
column values.

Here is an example that uses all calculation types:

Execution example:

select Entries \
--limit -1 \
--output_column _id,n_likes \
--drilldown tag \
--drilldown_calc_types MAX,MIN,SUM,AVG \
--drilldown_calc_target n_likes \
--drilldown_output_columns _key,_nsubrecs,_max,_min,_sum,_avg
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ],
# [
# "_max",
# "Int64"
# ],
# [
# "_min",
# "Int64"
# ],
# [
# "_sum",
# "Int64"
# ],
# [
# "_avg",
# "Float"
# ]
# ],
# [
# "Hello",
# 1,
# 5,
# 5,
# 5,
# 5.0
# ],
# [
# "Groonga",
# 2,
# 15,
# 10,
# 25,
# 12.5
# ],
# [
# "Senna",
# 2,
# 3,
# 3,
# 6,
# 3.0
# ]
# ]
# ]
# ]

The select command specifies multiple calculation types separated by "," like
MAX,MIN,SUM,AVG. You can use the _nsubrecs /reference/columns/pseudo in
drilldown_output_columns without specifying COUNT in drilldown_calc_types, because COUNT
is always enabled.

The default value of drilldown_calc_types is NONE. It means that only COUNT is enabled,
because NONE is just ignored and COUNT is always enabled.

drilldown_calc_target
Specifies the target column for drilldown_calc_types.

If you specify a calculation type that needs a target column such as MAX in
drilldown_calc_types but you omit drilldown_calc_target, the calculation result is always
0.

You can specify only one column name like --drilldown_calc_target n_likes. You can't
specify multiple column names like --drilldown_calc_target _key,n_likes.

You can refer to a value reachable through references from the target record by chaining
"." like --drilldown_calc_target reference_column.nested_reference_column.value.
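
For example, using the Items and Tags tables defined earlier, the following sketch (not an
executed example) reads the aggregated value through the tag reference column:

select Items \
--limit 0 \
--drilldown tag \
--drilldown_calc_types MAX \
--drilldown_calc_target tag.priority \
--drilldown_output_columns _key,_max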

See drilldown_calc_types to know how to use drilldown_calc_target.

The default value of drilldown_calc_target is null. It means that no calculation target
column is specified.

Advanced drilldown related parameters
You can get multiple drilldown results by specifying multiple group keys by drilldown. But
you need to use the same configuration for all drilldowns. For example,
drilldown_output_columns is used by all drilldowns.

You can use a configuration for each drilldown by the following parameters:

· drilldown[${LABEL}].keys

· drilldown[${LABEL}].sortby

· drilldown[${LABEL}].output_columns

· drilldown[${LABEL}].offset

· drilldown[${LABEL}].limit

· drilldown[${LABEL}].calc_types

· drilldown[${LABEL}].calc_target

${LABEL} is a variable. You can use the following characters for ${LABEL}:

· Alphabets

· Digits

· .

· _

NOTE:
You can use other characters, but it's better to use only these characters.

Parameters that have the same ${LABEL} value are grouped. Grouped parameters are used for
one drilldown.

For example, there are 2 groups for the following parameters:

· --drilldown[label1].keys _key

· --drilldown[label1].output_columns _nsubrecs

· --drilldown[label2].keys tag

· --drilldown[label2].output_columns _key,_nsubrecs

drilldown[label1].keys and drilldown[label1].output_columns are grouped.
drilldown[label2].keys and drilldown[label2].output_columns are also grouped.

In the label1 group, _key is used as the group key and _nsubrecs is used for the output
columns.

In the label2 group, tag is used as the group key and _key,_nsubrecs is used for the
output columns.

See the documentation for the corresponding drilldown_XXX parameter to learn how to use
the following parameters:

· drilldown[${LABEL}].sortby: drilldown_sortby

· drilldown[${LABEL}].offset: drilldown_offset

· drilldown[${LABEL}].limit: drilldown_limit

· drilldown[${LABEL}].calc_types: drilldown_calc_types

· drilldown[${LABEL}].calc_target: drilldown_calc_target

The following parameters need more description:

· drilldown[${LABEL}].keys

· drilldown[${LABEL}].output_columns

The output format is also a bit different. It needs more description as well.

drilldown[${LABEL}].keys
drilldown can specify multiple keys for multiple drilldowns. But it can't specify multiple
keys for one drilldown.

drilldown[${LABEL}].keys can't specify multiple keys for multiple drilldowns. But it can
specify multiple keys for one drilldown.

You can specify multiple keys separated by ",".

Here is an example to group by multiple keys, tag and n_likes column values:

Execution example:

select Entries \
--limit -1 \
--output_column tag,n_likes \
--drilldown[tag.n_likes].keys tag,n_likes \
--drilldown[tag.n_likes].output_columns _value.tag,_value.n_likes,_nsubrecs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ],
# {
# "tag.n_likes": [
# [
# 4
# ],
# [
# [
# "tag",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "Hello",
# 5,
# 1
# ],
# [
# "Groonga",
# 10,
# 1
# ],
# [
# "Groonga",
# 15,
# 1
# ],
# [
# "Senna",
# 3,
# 2
# ]
# ]
# }
# ]
# ]

tag.n_likes is used as the label for the drilldown parameter group. You can refer to
grouped keys by the _value.${KEY_NAME} syntax in drilldown[${LABEL}].output_columns.
${KEY_NAME} is the name of a column that is used as a group key. tag and n_likes are the
${KEY_NAME} values in this case.

Note that you can't use the _value.${KEY_NAME} syntax when you specify just one key in
drilldown[${LABEL}].keys like --drilldown[tag].keys tag. You should use _key in that case,
as the sketch below shows. It's the same rule as in drilldown_output_columns.
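
Here is a sketch (not an executed example) of the single key case that uses _key:

select Entries \
--limit 0 \
--drilldown[tag].keys tag \
--drilldown[tag].output_columns _key,_nsubrecs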

drilldown[${LABEL}].output_columns
It's almost the same as drilldown_output_columns. The difference between
drilldown_output_columns and drilldown[${LABEL}].output_columns is how to refer to group
keys.

drilldown_output_columns uses the _key /reference/columns/pseudo to refer to the group
key. drilldown[${LABEL}].output_columns also uses the _key /reference/columns/pseudo to
refer to the group key when you specify only one group key in drilldown[${LABEL}].keys.

Here is an example of referring to a single group key by the _key
/reference/columns/pseudo:

Execution example:

select Entries \
--limit 0 \
--output_column _id \
--drilldown[tag.n_likes].keys tag,n_likes \
--drilldown[tag.n_likes].output_columns _value.tag,_value.n_likes
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ],
# {
# "tag.n_likes": [
# [
# 4
# ],
# [
# [
# "tag",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# "Hello",
# 5
# ],
# [
# "Groonga",
# 10
# ],
# [
# "Groonga",
# 15
# ],
# [
# "Senna",
# 3
# ]
# ]
# }
# ]
# ]

But you can't refer to each group key by the _key /reference/columns/pseudo in
drilldown[${LABEL}].output_columns when you specify multiple group keys. You need to use
the _value.${KEY_NAME} syntax. ${KEY_NAME} is a column name that is used as a group key in
drilldown[${LABEL}].keys.

Here is an example that refers to each group key among multiple group keys by the
_value.${KEY_NAME} syntax:

Execution example:

select Entries \
--limit 0 \
--output_column _id \
--drilldown[tag.n_likes].keys tag,n_likes \
--drilldown[tag.n_likes].output_columns _value.tag,_value.n_likes
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ]
# ],
# {
# "tag.n_likes": [
# [
# 4
# ],
# [
# [
# "tag",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# "Hello",
# 5
# ],
# [
# "Groonga",
# 10
# ],
# [
# "Groonga",
# 15
# ],
# [
# "Senna",
# 3
# ]
# ]
# }
# ]
# ]

TIP:
Why the _value.${KEY_NAME} syntax?

It's implementation specific information.

_key is a vector value. The vector value consists of all the group keys. You can see
the byte sequence of the vector value by referring to _key in
drilldown[${LABEL}].output_columns.

There is one grouped record in _value to refer to each grouped value when you specify
multiple group keys in drilldown[${LABEL}].keys. So you can refer to each group key by
the _value.${KEY_NAME} syntax.

On the other hand, there is no grouped record in _value when you specify only one group
key in drilldown[${LABEL}].keys. So you can't refer to the group key by the
_value.${KEY_NAME} syntax.

Output format for drilldown[${LABEL}] style
There is a difference in output format between drilldown and drilldown[${LABEL}].keys.
drilldown uses an array to output multiple drilldown results. drilldown[${LABEL}].keys
uses pairs of label and drilldown result.

drilldown uses the following output format:

[
HEADER,
[
SEARCH_RESULT,
DRILLDOWN_RESULT1,
DRILLDOWN_RESULT2,
...
]
]

drilldown[${LABEL}].keys uses the following output format:

[
HEADER,
[
SEARCH_RESULT,
{
"LABEL1": DRILLDOWN_RESULT1,
"LABEL2": DRILLDOWN_RESULT2,
...
}
]
]

Cache related parameter
cache
Specifies whether to cache the result of this query or not.

If the result of this query is cached, the next identical query returns its response
quickly by using the cache.

It doesn't control whether an existing cached result is used or not.

Here are available values:

┌──────┬──────────────────────────────────┐
│Value │ Description                      │
├──────┼──────────────────────────────────┤
│no    │ Don't cache the output of this   │
│      │ query.                           │
├──────┼──────────────────────────────────┤
│yes   │ Cache the output of this query.  │
│      │ It's the default value.          │
└──────┴──────────────────────────────────┘

Here is an example to disable caching the result of this query:

Execution example:

select Entries --cache no
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5,
# "Hello"
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10,
# "Groonga"
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15,
# "Groonga"
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3,
# "Senna"
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3,
# "Senna"
# ]
# ]
# ]
# ]

The default value is yes.

Score related parameters
There is a score related parameter, adjuster.

adjuster
Specifies one or more score adjust expressions. You need to use adjuster with query or
filter. adjuster doesn't work for requests that don't perform a search.

You can increase the score of specific records by adjuster. You can use adjuster to set a
high score for important records.

For example, you can use adjuster to increase the score of records that have the groonga
tag.

Here is the syntax:

--adjuster "SCORE_ADJUST_EXPRESSION1 + SCORE_ADJUST_EXPRESSION2 + ..."

Here is the SCORE_ADJUST_EXPRESSION syntax:

COLUMN @ "KEYWORD" * FACTOR

Note the following:

· COLUMN must be indexed.

· "KEYWORD" must be a string.

· FACTOR must be a positive integer.

Here is a sample adjuster usage example that uses just one SCORE_ADJUST_EXPRESSION:

Execution example:

select Entries \
--filter true \
--adjuster 'content @ "groonga" * 5' \
--output_columns _key,content,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "The first post!",
# "Welcome! This is my first post!",
# 1
# ],
# [
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 6
# ],
# [
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 1
# ],
# [
# "Good-bye Senna",
# "I migrated all Senna system!",
# 1
# ],
# [
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 1
# ]
# ]
# ]
# ]

The select command matches all records. Then it applies the adjuster. The adjuster
increases the score of records that have "groonga" in the Entries.content column by 5.
There is only one record that has "groonga" in the Entries.content column. So the record
whose key is "Groonga" has score 6 (= 1 + 5).

You can omit FACTOR. If you omit FACTOR, it is treated as 1.

Here is a sample adjuster usage example that omits FACTOR:

Execution example:

select Entries \
--filter true \
--adjuster 'content @ "groonga"' \
--output_columns _key,content,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "The first post!",
# "Welcome! This is my first post!",
# 1
# ],
# [
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 2
# ],
# [
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 1
# ],
# [
# "Good-bye Senna",
# "I migrated all Senna system!",
# 1
# ],
# [
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 1
# ]
# ]
# ]
# ]

The adjuster in the select command doesn't have a FACTOR. So the factor is treated as 1.
There is only one record that has "groonga" in the Entries.content column. So the record
whose key is "Groonga" has score 2 (= 1 + 1).

Here is a sample adjuster usage example that uses multiple SCORE_ADJUST_EXPRESSIONs:

Execution example:

select Entries \
--filter true \
--adjuster 'content @ "groonga" * 5 + content @ "started" * 3' \
--output_columns _key,content,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "The first post!",
# "Welcome! This is my first post!",
# 1
# ],
# [
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 9
# ],
# [
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 4
# ],
# [
# "Good-bye Senna",
# "I migrated all Senna system!",
# 1
# ],
# [
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 1
# ]
# ]
# ]
# ]

The adjuster in the select command has two SCORE_ADJUST_EXPRESSIONs. The final increased
score is the sum of the scores of these SCORE_ADJUST_EXPRESSIONs. All
SCORE_ADJUST_EXPRESSIONs in the select command are applied to the record whose key is
"Groonga". So the final increased score of the record is the sum of the scores of all
SCORE_ADJUST_EXPRESSIONs.

The first SCORE_ADJUST_EXPRESSION is content @ "groonga" * 5. It increases the score by 5.

The second SCORE_ADJUST_EXPRESSION is content @ "started" * 3. It increases the score by 3.

The final increased score is 9 (= 1 + 5 + 3).

A SCORE_ADJUST_EXPRESSION has one factor per "KEYWORD". This means that the increased
scores of all records that have "KEYWORD" are the same value. You can change the increased
score for each record that has the same "KEYWORD". It is useful to tune the search score.
See weight-vector-column for details.

Return value
select returns a response with the following format:

[
HEADER,
[
SEARCH_RESULT,
DRILLDOWN_RESULT_1,
DRILLDOWN_RESULT_2,
...,
DRILLDOWN_RESULT_N
]
]

If select fails, error details are in HEADER.

See /reference/command/output_format for HEADER.

There are zero or more DRILLDOWN_RESULTs. If neither drilldown nor
drilldown[${LABEL}].keys is specified, they are omitted like the following:

[
HEADER,
[
SEARCH_RESULT
]
]

If drilldown has two or more keys like --drilldown "_key, column1, column2", multiple
DRILLDOWN_RESULTs exist:

[
HEADER,
[
SEARCH_RESULT,
DRILLDOWN_RESULT_FOR_KEY,
DRILLDOWN_RESULT_FOR_COLUMN1,
DRILLDOWN_RESULT_FOR_COLUMN2
]
]

If drilldown[${LABEL}].keys is used, only one DRILLDOWN_RESULT exists:

[
HEADER,
[
SEARCH_RESULT,
DRILLDOWN_RESULT_FOR_LABELED_DRILLDOWN
]
]

The DRILLDOWN_RESULT format is different between drilldown and drilldown[${LABEL}].keys.
It's described later.

SEARCH_RESULT is the following format:

[
[N_HITS],
COLUMNS,
RECORDS
]

See Simple usage for concrete example of the format.

N_HITS is the number of matched records before limit is applied.

COLUMNS describes the output columns specified by output_columns. It uses the following
format:

[
[COLUMN_NAME_1, COLUMN_TYPE_1],
[COLUMN_NAME_2, COLUMN_TYPE_2],
...,
[COLUMN_NAME_N, COLUMN_TYPE_N]
]

COLUMNS includes information about one or more output columns. Each output column's
information includes the following:

· Column name as string

· Column type as string or null

Column name is extracted from value specified as output_columns.

Column type is Groonga's type name or null. It doesn't describe whether the column value
is vector or scalar. You need to determine that by checking whether the real column value
is an array or not.

See /reference/types for type details.

null is used when the column value type isn't determined. For example, a function call in
output_columns such as --output_columns "snippet_html(content)" uses null.
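
For example, a command like the following sketch (not an executed example) would report
null as the type of the computed column:

select Entries --match_columns content --query fast --output_columns 'snippet_html(content)'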

Here is an example of COLUMNS:

[
["_id", "UInt32"],
["_key", "ShortText"],
["n_likes", "UInt32"]
]

RECORDS includes column values for each matched record. Included records are selected by
offset and limit. It uses the following format:

[
[
RECORD_1_COLUMN_1,
RECORD_1_COLUMN_2,
...,
RECORD_1_COLUMN_N
],
[
RECORD_2_COLUMN_1,
RECORD_2_COLUMN_2,
...,
RECORD_2_COLUMN_N
],
...
[
RECORD_N_COLUMN_1,
RECORD_N_COLUMN_2,
...,
RECORD_N_COLUMN_N
]
]

Here is an example of RECORDS:

[
[
1,
"The first post!",
5
],
[
2,
"Groonga",
10
],
[
3,
"Mroonga",
15
]
]

DRILLDOWN_RESULT format is different between drilldown and drilldown[${LABEL}].keys.

drilldown uses the same format as SEARCH_RESULT:

[
[N_HITS],
COLUMNS,
RECORDS
]

And drilldown generates one or more DRILLDOWN_RESULTs when drilldown has one or more keys.

drilldown[${LABEL}].keys uses the following format. Multiple drilldown[${LABEL}].keys are
mapped to one object (key-value pairs):

{
"LABEL_1": [
[N_HITS],
COLUMNS,
RECORDS
],
"LABEL_2": [
[N_HITS],
COLUMNS,
RECORDS
],
...,
"LABEL_N": [
[N_HITS],
COLUMNS,
RECORDS
]
}

Each drilldown[${LABEL}].keys corresponds to the following:

"LABEL": [
[N_HITS],
COLUMNS,
RECORDS
]

The following value part is the same format as SEARCH_RESULT:

[
[N_HITS],
COLUMNS,
RECORDS
]

See also Output format for drilldown[${LABEL}] style for drilldown[${LABEL}] style
drilldown output format.

See also
· /reference/grn_expr/query_syntax

· /reference/grn_expr/script_syntax

shutdown
Summary
shutdown stops the Groonga server process.

shutdown uses graceful shutdown by default. If there are running commands, the Groonga
server process stops after those commands have finished. New command requests aren't
processed after the shutdown command is executed.

New in version 6.0.1: shutdown supports immediate shutdown by specifying immediate as the
mode parameter. The Groonga server process stops immediately even when there are running
commands.

NOTE:
You need to set /reference/command/request_id on all requests to use immediate
shutdown, for example as in the sketch below.
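
For example, a request with an ID attached might look like the following sketch (id-1 is
an arbitrary value; not an executed example):

select Entries --request_id id-1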

Syntax
This command takes only one optional parameter:

shutdown [mode=graceful]

Usage
shutdown uses graceful shutdown by default:

Execution example:

shutdown
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can specify graceful as the mode parameter explicitly:

Execution example:

shutdown --mode graceful
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can choose immediate shutdown by specifying immediate as the mode parameter:

Execution example:

shutdown --mode immediate
# [[0, 1337566253.89858, 0.000355720520019531], true]

Immediate shutdown is useful when you don't have time for a graceful shutdown. For
example, Windows kills services that take a long time to stop during Windows shutdown.

Parameters
This section describes parameters of this command.

Required parameters
There is no required parameter.

Optional parameters
There are optional parameters.

mode
Specifies shutdown mode. Here are available shutdown modes:

┌──────────┬──────────────────────────────────┐
│Value     │ Description                      │
├──────────┼──────────────────────────────────┤
│graceful  │ Stops after running commands are │
│          │ finished.                        │
│          │                                  │
│          │ This is the default.             │
├──────────┼──────────────────────────────────┤
│immediate │ New in version 6.0.1: Stops      │
│          │ immediately even if there are    │
│          │ some running commands.           │
└──────────┴──────────────────────────────────┘

Return value
shutdown returns true as the body when the shutdown request is accepted:

[HEADER, true]

If the shutdown request isn't accepted, error details are in HEADER.

See /reference/command/output_format for HEADER.

status
Summary
status returns the current status of the context that processes the request.

A context is a unit that processes requests. Normally, a context is created for each
thread.

Syntax
This command takes no parameters:

status

Usage
Here is a simple example:

Execution example:

status
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "uptime": 0,
# "max_command_version": 2,
# "start_time": 1441980651,
# "cache_hit_rate": 0.0,
# "version": "5.0.7-126-gb6fd7f7",
# "alloc_count": 206,
# "command_version": 1,
# "starttime": 1441980651,
# "default_command_version": 1,
# "n_queries": 0
# }
# ]

It returns the current status of the context that processes the request. See Return value
for details.

Parameters
This section describes all parameters.

Required parameters
There is no required parameter.

Optional parameters
There is no optional parameter.

Return value
The command returns the current status as an object:

[
HEADER,
{
"alloc_count": ALLOC_COUNT,
"cache_hit_rate": CACHE_HIT_RATE,
"command_version": COMMAND_VERSION,
"default_command_version": DEFAULT_COMMAND_VERSION,
"max_command_version": MAX_COMMAND_VERSION,
"n_queries": N_QUERIES,
"start_time": START_TIME,
"starttime": STARTTIME,
"uptime": UPTIME,
"version": VERSION
}
]

See /reference/command/output_format for HEADER.

Here are descriptions about values. See Usage for real values:

┌────────────────────────┬────────────────────────────────────┬────────────┐
│Key                     │ Description                        │ Example    │
├────────────────────────┼────────────────────────────────────┼────────────┤
│alloc_count             │ The number of allocated memory     │ 1400       │
│                        │ blocks that aren't freed. If this  │            │
│                        │ value increases continuously,      │            │
│                        │ there may be a memory leak.        │            │
├────────────────────────┼────────────────────────────────────┼────────────┤
│cache_hit_rate          │ Percentage of responses served     │ 29.4       │
│                        │ from the cache in the Groonga      │            │
│                        │ process. If there are 10 requests  │            │
│                        │ and 7 responses are created from   │            │
│                        │ the cache, cache_hit_rate is 70.0. │            │
│                        │ The percentage is computed only    │            │
│                        │ from requests that use commands    │            │
│                        │ that support the cache.            │            │
│                        │                                    │            │
│                        │ Here are commands that support     │            │
│                        │ the cache:                         │            │
│                        │                                    │            │
│                        │ · select                           │            │
│                        │                                    │            │
│                        │ · logical_select                   │            │
│                        │                                    │            │
│                        │ · logical_range_filter             │            │
│                        │                                    │            │
│                        │ · logical_count                    │            │
├────────────────────────┼────────────────────────────────────┼────────────┤
│command_version         │ The                                │ 1          │
│                        │ /reference/command/command_version │            │
│                        │ that is used by the context.       │            │
├────────────────────────┼────────────────────────────────────┼────────────┤
│default_command_version │ The default                        │ 1          │
│                        │ /reference/command/command_version │            │
│                        │ of the Groonga process.            │            │
├────────────────────────┼────────────────────────────────────┼────────────┤
│max_command_version     │ The max                            │ 2          │
│                        │ /reference/command/command_version │            │
│                        │ of the Groonga process.            │            │
├────────────────────────┼────────────────────────────────────┼────────────┤
│n_queries               │ The number of requests processed   │ 29         │
│                        │ by the Groonga process. It counts  │            │
│                        │ only requests that use commands    │            │
│                        │ that support the cache.            │            │
│                        │                                    │            │
│                        │ Here are commands that support     │            │
│                        │ the cache:                         │            │
│                        │                                    │            │
│                        │ · select                           │            │
│                        │                                    │            │
│                        │ · logical_select                   │            │
│                        │                                    │            │
│                        │ · logical_range_filter             │            │
│                        │                                    │            │
│                        │ · logical_count                    │            │
├────────────────────────┼────────────────────────────────────┼────────────┤
│start_time              │ New in version 5.0.8.              │ 1441761403 │
│                        │                                    │            │
│                        │ The time that the Groonga process  │            │
│                        │ started, in UNIX time.             │            │
├────────────────────────┼────────────────────────────────────┼────────────┤
│starttime               │ Deprecated since version 5.0.8:    │ 1441761403 │
│                        │ Use start_time instead.            │            │
├────────────────────────┼────────────────────────────────────┼────────────┤
│uptime                  │ The elapsed time since the Groonga │ 216639     │
│                        │ process started, in seconds.       │            │
│                        │                                    │            │
│                        │ For example, 216639 means 2.5 days │            │
│                        │ (= 216639 / 60 / 60 / 24 = 2.507). │            │
├────────────────────────┼────────────────────────────────────┼────────────┤
│version                 │ The version of the Groonga         │ 5.0.7      │
│                        │ process.                           │            │
└────────────────────────┴────────────────────────────────────┴────────────┘

suggest
NOTE:
The suggest feature specification isn't stable. The specification may be changed.

Summary
suggest - returns completion, correction and/or suggestion for a query.

The suggest command returns completion, correction and/or suggestion for a specified
query.

See /reference/suggest/introduction about completion, correction and suggestion.

Syntax
suggest types table column query [sortby [output_columns [offset [limit [frequency_threshold [conditional_probability_threshold [prefix_search]]]]]]]

Usage
Here are learned data for completion.

Execution example:

load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)'
[
{"sequence": "1", "time": 1312950803.86057, "item": "e"},
{"sequence": "1", "time": 1312950803.96857, "item": "en"},
{"sequence": "1", "time": 1312950804.26057, "item": "eng"},
{"sequence": "1", "time": 1312950804.56057, "item": "engi"},
{"sequence": "1", "time": 1312950804.76057, "item": "engin"},
{"sequence": "1", "time": 1312950805.86057, "item": "engine", "type": "submit"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 6]

Here are learned data for correction.

Execution example:

load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)'
[
{"sequence": "2", "time": 1312950803.86057, "item": "s"},
{"sequence": "2", "time": 1312950803.96857, "item": "sa"},
{"sequence": "2", "time": 1312950804.26057, "item": "sae"},
{"sequence": "2", "time": 1312950804.56057, "item": "saer"},
{"sequence": "2", "time": 1312950804.76057, "item": "saerc"},
{"sequence": "2", "time": 1312950805.76057, "item": "saerch", "type": "submit"},
{"sequence": "2", "time": 1312950809.76057, "item": "serch"},
{"sequence": "2", "time": 1312950810.86057, "item": "search", "type": "submit"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 8]

Here are learned data for suggestion.

Execution example:

load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)'
[
{"sequence": "3", "time": 1312950803.86057, "item": "search engine", "type": "submit"},
{"sequence": "3", "time": 1312950808.86057, "item": "web search realtime", "type": "submit"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]

Here is a completion example.

Execution example:

suggest --table item_query --column kana --types complete --frequency_threshold 1 --query en
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "complete": [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "engine",
# 1
# ]
# ]
# }
# ]

Here is a correction example.

Execution example:

suggest --table item_query --column kana --types correct --frequency_threshold 1 --query saerch
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "correct": [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "search",
# 1
# ]
# ]
# }
# ]

Here is a suggestion example.

Execution example:

suggest --table item_query --column kana --types suggest --frequency_threshold 1 --query search
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "suggest": [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "search engine",
# 1
# ],
# [
# "web search realtime",
# 1
# ]
# ]
# }
# ]

Here is a mixed example.

Execution example:

suggest --table item_query --column kana --types complete|correct|suggest --frequency_threshold 1 --query search
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "suggest": [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "search engine",
# 1
# ],
# [
# "web search realtime",
# 1
# ]
# ],
# "complete": [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "search",
# 2
# ],
# [
# "search engine",
# 2
# ]
# ],
# "correct": [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "search",
# 2
# ]
# ]
# }
# ]

Parameters
types Specifies what types are returned by the suggest command.

Here are available types:

complete
The suggest command does completion.

correct
The suggest command does correction.

suggest
The suggest command does suggestion.

You can specify one or more types separated by |. Here are examples.

It returns correction:

correct

It returns correction and suggestion:

correct|suggest

It returns completion, correction and suggestion:

complete|correct|suggest

table Specifies a table name that has the item_${DATA_SET_NAME} format. For example,
item_query is the table name if you created a dataset by the following command:

groonga-suggest-create-dataset /tmp/db-path query

column Specifies the name of a column of table that has furigana in Katakana.

query Specifies the query for completion, correction and/or suggestion.

sortby Specifies sort key.

Default:
-_score

output_columns
Specifies output columns.

Default:
_key,_score

offset Specifies the offset of returned records.

Default:
0

limit Specifies the maximum number of returned records.

Default:
10
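
As a sketch, the sort and paging parameters can be combined in one call; the output is
omitted here because the results depend on your learned data, and the offset and limit
values are only illustrative:

suggest --table item_query --column kana --types complete --frequency_threshold 1 \
  --query en \
  --sortby -_score --output_columns _key,_score \
  --offset 0 --limit 5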

frequency_threshold
Specifies the threshold for item frequency. Returned records must have a _score that is
greater than or equal to frequency_threshold.

Default:
100

conditional_probability_threshold
Specifies the threshold for conditional probability. Conditional probability is computed
from the learned data: it is the probability that the query is finally submitted, given
that the intermediate query occurred while typing. Returned records must have a
conditional probability that is greater than or equal to
conditional_probability_threshold.

Default:
0.2
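
For example, the default frequency threshold of 100 often filters out everything on a
freshly learned dataset. Here is a sketch that loosens both thresholds; the values are
only illustrative and the output is omitted:

suggest --table item_query --column kana --types correct \
  --query saerch \
  --frequency_threshold 1 \
  --conditional_probability_threshold 0.1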

prefix_search
Specifies whether the optional prefix search is used in completion.

Here are available values:

yes Prefix search is always used.

no Prefix search is never used.

auto Prefix search is used only when other search can't find any records.

Default:
auto

similar_search
Specifies whether the optional similar search is used in correction.

Here are available values:

yes Similar search is always used.

no Similar search is never used.

auto Similar search is used only when other search can't find any records.

Default:
auto
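
Here is a sketch that combines both switches: it disables prefix search in completion and
forces similar search in correction. The output is omitted because the results depend on
the learned data:

suggest --table item_query --column kana --types complete|correct \
  --query serch \
  --frequency_threshold 1 \
  --prefix_search no \
  --similar_search yes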

Return value
Here is a returned JSON format:

{"type1": [["candidate1", score of candidate1],
["candidate2", score of candidate2],
...],
"type2": [["candidate1", score of candidate1],
["candidate2", score of candidate2],
...],
...}

type
A type specified by types.

candidate
A candidate for completion, correction or suggestion.

score of candidate
The score of the corresponding candidate. A candidate with a higher score is a more
likely candidate for completion, correction or suggestion. Returned candidates are sorted
by score of candidate in descending order by default.

See also
· /reference/suggest

· /reference/executables/groonga-suggest-create-dataset

table_create
Summary
table_create creates a new table in the current database. You need to create one or more
tables to store and search data.

Syntax
This command takes many parameters.

The only required parameter is name; the others are optional:

table_create name
[flags=TABLE_HASH_KEY]
[key_type=null]
[value_type=null]
[default_tokenizer=null]
[normalizer=null]
[token_filters=null]

Usage
The table_create command creates a new persistent table. See /reference/tables for
details about tables.

Create data store table
You can use all table types for a data store table. See /reference/tables for all table
types.

The table type is specified as TABLE_${TYPE} in the flags parameter.

Here is an example to create TABLE_NO_KEY table:

Execution example:

table_create Logs TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]

The table_create command creates a table that is named Logs and is TABLE_NO_KEY type.

If your records aren't searched by key, a TABLE_NO_KEY type table is suitable, because
TABLE_NO_KEY doesn't support keys but is a fast and small table. Storing logs in a
Groonga database is such a case.

If your records are searched by key or referenced by one or more columns, the
TABLE_NO_KEY type isn't suitable. A lexicon for fulltext search is such a case.

Create large data store table
If you want to store many large keys, your table may not be able to store them. If the
total key data is larger than 4GiB, you can't store all the key data in your table by
default.

You can expand the maximum total key size from 4GiB to 1TiB with the KEY_LARGE flag. The
KEY_LARGE flag can be used only with TABLE_HASH_KEY. You can't use the KEY_LARGE flag
with TABLE_NO_KEY, TABLE_PAT_KEY or TABLE_DAT_KEY.

Here is an example to create a table that can store many large keys:

Execution example:

table_create Paths TABLE_HASH_KEY|KEY_LARGE ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

The table_create command creates a table that is named Paths and is TABLE_HASH_KEY type.
The Paths table can store many large keys.

Create lexicon table
You can use all table types except TABLE_NO_KEY for a lexicon table. A lexicon table
needs key support but TABLE_NO_KEY doesn't support keys.

Here is an example to create TABLE_PAT_KEY table:

Execution example:

table_create Lexicon TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]

The table_create command creates the following table:

· The table is named Lexicon.

· The table is TABLE_PAT_KEY type table.

· The table's key is ShortText type.

· The table uses TokenBigram tokenizer to extract tokens from a normalized text.

· The table uses NormalizerAuto normalizer to normalize a text.

TABLE_PAT_KEY is a suitable table type for a lexicon table. A lexicon table is used for
fulltext search.

In fulltext search, predictive search may be used for fuzzy search. Predictive search is
supported by TABLE_PAT_KEY and TABLE_DAT_KEY.

A lexicon table has many keys because a fulltext target text has many tokens. For a table
that has many keys, you should consider table size, because a large table requires a lot
of memory, and requiring a lot of memory causes disk I/O, which blocks fast search. So
table size is important for a table that has many keys. TABLE_PAT_KEY has a smaller table
size than TABLE_DAT_KEY.

Because of the above reasons, TABLE_PAT_KEY is a suitable table type for a lexicon table.

Create tag index table
You can use all table types except TABLE_NO_KEY for a tag index table. A tag index table
needs key support but TABLE_NO_KEY doesn't support keys.

Here is an example to create TABLE_HASH_KEY table:

Execution example:

table_create Tags TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

The table_create command creates a table that is named Tags, is TABLE_HASH_KEY type and
has ShortText type key.

TABLE_HASH_KEY or TABLE_DAT_KEY are suitable table types for tag index table.

If you need only an exact match tag search feature, TABLE_HASH_KEY is suitable. It is the
common case.

If you also need a predictive tag search feature (for example, searching for "groonga" by
the keyword "gr"), TABLE_DAT_KEY is suitable. TABLE_DAT_KEY has a large table size, but
that is not important because the number of tags will not be large.

Create range index table
You can use the TABLE_PAT_KEY and TABLE_DAT_KEY table types for a range index table. A
range index table needs range search support, but TABLE_NO_KEY and TABLE_HASH_KEY don't
support it.

Here is an example to create TABLE_DAT_KEY table:

Execution example:

table_create Ages TABLE_DAT_KEY UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]

The table_create command creates a table that is named Ages, is TABLE_DAT_KEY type and has
UInt32 type key.

TABLE_PAT_KEY and TABLE_DAT_KEY are suitable table types for range index table.

If you don't have many indexed items, TABLE_DAT_KEY is suitable. The index for age in the
above example is such a case: it will have only about 0-100 items, because humans don't
live much longer than that.

If you have many indexed items, TABLE_PAT_KEY is suitable, because TABLE_PAT_KEY is
smaller than TABLE_DAT_KEY.

Parameters
This section describes all parameters.

name
Specifies a table name to be created. name must be specified.

Here are available characters:

· 0 .. 9 (digit)

· a .. z (alphabet, lower case)

· A .. Z (alphabet, upper case)

· # (hash)

· @ (at mark)

· - (hyphen)

· _ (underscore) (NOTE: Underscore can't be used as the first character.)

You need to create a name with one or more of the above characters. Note that you cannot
use _ as the first character, such as _name.
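
For example, both of the following names follow the rules above; the table names here are
only for illustration. A name such as _secret would be rejected because it starts with _:

table_create Entries2015 TABLE_HASH_KEY ShortText
table_create user_names TABLE_HASH_KEY ShortText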

flags
Specifies a table type and table customize options.

Here are available flags:

┌───────────────┬──────────────────────────────────┐
│Flag           │ Description                      │
├───────────────┼──────────────────────────────────┤
│TABLE_NO_KEY   │ Array table. See also            │
│               │ table-no-key.                    │
├───────────────┼──────────────────────────────────┤
│TABLE_HASH_KEY │ Hash table. See also             │
│               │ table-hash-key.                  │
├───────────────┼──────────────────────────────────┤
│TABLE_PAT_KEY  │ Patricia trie. See also          │
│               │ table-pat-key.                   │
├───────────────┼──────────────────────────────────┤
│TABLE_DAT_KEY  │ Double array trie. See also      │
│               │ table-dat-key.                   │
├───────────────┼──────────────────────────────────┤
│KEY_WITH_SIS   │ Enable Semi Infinite String.     │
│               │ Requires TABLE_PAT_KEY.          │
├───────────────┼──────────────────────────────────┤
│KEY_LARGE      │ Expand the maximum total key     │
│               │ size to 1TiB from 4GiB. Requires │
│               │ TABLE_HASH_KEY.                  │
└───────────────┴──────────────────────────────────┘

NOTE:
Since Groonga 2.1.0, the KEY_NORMALIZE flag is deprecated. Use the normalizer option
with NormalizerAuto instead.

You must specify one of TABLE_${TYPE} flags. You cannot specify two or more TABLE_${TYPE}
flags. For example, TABLE_NO_KEY|TABLE_HASH_KEY is invalid.

You can combine flags with | (vertical bar) such as TABLE_PAT_KEY|KEY_WITH_SIS.

See /reference/tables for difference between table types.

The default flags are TABLE_HASH_KEY.
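
As a minimal sketch, here is a table that combines a table type flag with a customize
flag; Completions is a hypothetical table name, and KEY_WITH_SIS requires TABLE_PAT_KEY
as noted in the table above:

table_create Completions TABLE_PAT_KEY|KEY_WITH_SIS ShortText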

key_type
Specifies key type.

If you specify TABLE_HASH_KEY, TABLE_PAT_KEY or TABLE_DAT_KEY as the flags parameter, you
need to specify the key_type option.

See /reference/types for all types.

The default value is none.

value_type
Specifies value type.

You can use value when you specify TABLE_NO_KEY, TABLE_HASH_KEY or TABLE_PAT_KEY as the
flags parameter. The value type must be a fixed size type. For example, UInt32 can be
used but ShortText cannot. Use columns instead of value for variable size data.

The default value is none.
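
A minimal sketch that stores a fixed size value in each record; Counters is a
hypothetical table name and UInt32 is the value type:

table_create Counters TABLE_HASH_KEY ShortText UInt32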

default_tokenizer
Specifies the default tokenizer that is used on searching and data loading.

You must specify default_tokenizer for a table that is used as the lexicon of a fulltext
search index. See /reference/tokenizers for available tokenizers. You must choose a
tokenizer from that list for fulltext search.

You don't need to specify default_tokenizer in the following cases:

· You don't use the table as a lexicon.

· You use the table as a lexicon but you don't need fulltext search. For example:

· Index target data isn't text data such as Int32 and Time.

· You just need exact match search, prefix search and so on.

You can't use default_tokenizer with the TABLE_NO_KEY flag because a table that uses the
TABLE_NO_KEY flag can't be used as a lexicon.

You must specify TABLE_HASH_KEY, TABLE_PAT_KEY or TABLE_DAT_KEY in flags when you want to
use the table as a lexicon.

The default value is none.

normalizer
Specifies a normalizer that is used to normalize keys.

You cannot use normalizer with TABLE_NO_KEY because TABLE_NO_KEY doesn't support key.

See /reference/normalizers for all normalizers.

The default value is none.

token_filters
Specifies token filters that are used to process tokenized tokens.

You cannot use token_filters with TABLE_NO_KEY because TABLE_NO_KEY doesn't support key.

See /reference/token_filters for all token filters.

The default value is none.
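
A minimal sketch of a lexicon that uses a token filter; BodyTerms is a hypothetical table
name, and the token filter plugin must be registered first, as in the table_tokenize
example later in this document:

register token_filters/stop_word
table_create BodyTerms TABLE_PAT_KEY ShortText \
  --default_tokenizer TokenBigram \
  --normalizer NormalizerAuto \
  --token_filters TokenFilterStopWord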

Return value
table_create returns true as body on success such as:

[HEADER, true]

If table_create fails, error details are in HEADER.

See /reference/command/output_format for HEADER.

See also
· /reference/tables

· /reference/commands/column_create

· /reference/tokenizers

· /reference/normalizers

· /reference/command/output_format

table_list
Summary
table_list - lists the tables defined in the database

This section describes table_list, one of the built-in Groonga commands. Built-in
commands are executed by passing them as arguments of the groonga executable file, via
standard input, or by sending a request to a groonga server over a socket.

table_list displays the list of tables defined in the database.

Syntax
table_list

Usage
Execution example:

table_list
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# "id",
# "UInt32"
# ],
# [
# "name",
# "ShortText"
# ],
# [
# "path",
# "ShortText"
# ],
# [
# "flags",
# "ShortText"
# ],
# [
# "domain",
# "ShortText"
# ],
# [
# "range",
# "ShortText"
# ],
# [
# "default_tokenizer",
# "ShortText"
# ],
# [
# "normalizer",
# "ShortText"
# ]
# ],
# [
# 259,
# "Ages",
# "/tmp/groonga-databases/commands_table_create.0000103",
# "TABLE_DAT_KEY|PERSISTENT",
# "UInt32",
# null,
# null,
# null
# ],
# [
# 257,
# "Lexicon",
# "/tmp/groonga-databases/commands_table_create.0000101",
# "TABLE_PAT_KEY|PERSISTENT",
# "ShortText",
# null,
# "TokenBigram",
# "NormalizerAuto"
# ],
# [
# 256,
# "Logs",
# "/tmp/groonga-databases/commands_table_create.0000100",
# "TABLE_NO_KEY|PERSISTENT",
# null,
# null,
# null,
# null
# ],
# [
# 258,
# "Tags",
# "/tmp/groonga-databases/commands_table_create.0000102",
# "TABLE_HASH_KEY|PERSISTENT",
# "ShortText",
# null,
# null,
# null
# ]
# ]
# ]

Parameters
There are no parameters.

Return value
The list of tables is returned in the following format:

[[[table information name 1, table information type 1], ...], table information 1, ...]

table information name n
Table information n contains multiple values. This part outputs the names that indicate
what each of those values means. The information names are as follows.

id
The ID assigned to the table object

name
The table name

path
The name of the file that stores the table's records

flags
The flags attribute of the table

domain
The type that the primary key values belong to

range
The type that the values belong to

table information type n
Outputs the type of each table information value.

table information n
Outputs an array of the values indicated by the table information names. The order of the
values is the same as the order of the table information names.

table_remove
Summary
table_remove removes a table and its columns. If there are one or more indexes against key
of the table and its columns, they are also removed.

New in version 6.0.1: You can also remove tables and columns that reference the target
table by using dependent parameter.

Syntax
This command takes two parameters:

table_remove name
[dependent=no]

Usage
You just specify the table name that you want to remove. table_remove removes the table
and its columns. If the table and its columns are indexed, all index columns for the
table and its columns are also removed.

This section describes the following topics:

· Basic usage

· Unremovable cases

· Removes a table with tables and columns that reference the target table

· Decreases used resources

Basic usage
Let's think about the following case:

· There is one table Entries.

· Entries table has some columns.

· Entries table's key is indexed.

· A column of Entries is indexed.

Here are commands that create Entries table:

Execution example:

table_create Entries TABLE_HASH_KEY UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries title COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here are commands that create an index for Entries table's key:

Execution example:

table_create EntryKeys TABLE_HASH_KEY UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create EntryKeys key_index COLUMN_INDEX Entries _key
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here are commands that create an index for Entries table's column:

Execution example:

table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms content_index COLUMN_INDEX Entries content
# [[0, 1337566253.89858, 0.000355720520019531], true]

Let's confirm the current schema before running table_remove:

Execution example:

dump
# table_create Entries TABLE_HASH_KEY UInt32
# column_create Entries content COLUMN_SCALAR Text
# column_create Entries title COLUMN_SCALAR ShortText
#
# table_create EntryKeys TABLE_HASH_KEY UInt32
#
# table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
#
# column_create EntryKeys key_index COLUMN_INDEX Entries _key
# column_create Terms content_index COLUMN_INDEX Entries content

If you remove Entries table, the following tables and columns are removed:

· Entries

· Entries.title

· Entries.content

· EntryKeys.key_index

· Terms.content_index

The following tables (lexicons) aren't removed:

· EntryKeys

· Terms

Let's run table_remove:

Execution example:

table_remove Entries
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is schema after table_remove. Only EntryKeys and Terms exist:

Execution example:

dump
# table_create EntryKeys TABLE_HASH_KEY UInt32
#
# table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto

Unremovable cases
There are some unremovable cases:

· One or more tables use the table as key type.

· One or more columns use the table as value type.

Both cases block dangling references. If the table were referenced as a type and then
removed, the tables and columns that refer to the table would be broken.

If the target table matches one of these cases, table_remove fails. The target table and
its columns aren't removed.

Here is an example of the case where the table is used as a key type.

The following commands create a table to be removed and a table that uses the table to be
removed as key type:

Execution example:

table_create ReferencedByTable TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create ReferenceTable TABLE_HASH_KEY ReferencedByTable
# [[0, 1337566253.89858, 0.000355720520019531], true]

table_remove against ReferencedByTable fails:

Execution example:

table_remove ReferencedByTable
# [
# [
# -2,
# 1337566253.89858,
# 0.000355720520019531,
# "[table][remove] a table that references the table exists: <ReferenceTable._key> -> <ReferencedByTable>",
# [
# [
# "is_removable_table",
# "db.c",
# 8831
# ]
# ]
# ],
# false
# ]

You need to remove ReferenceTable before you remove ReferencedByTable:

Execution example:

table_remove ReferenceTable
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_remove ReferencedByTable
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here is an example of the case where the table is used as a value type.

The following commands create a table to be removed and a column that uses the table to be
removed as value type:

Execution example:

table_create ReferencedByColumn TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Table TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Table reference_column COLUMN_SCALAR ReferencedByColumn
# [[0, 1337566253.89858, 0.000355720520019531], true]

table_remove against ReferencedByColumn fails:

Execution example:

table_remove ReferencedByColumn
# [
# [
# -2,
# 1337566253.89858,
# 0.000355720520019531,
# "[table][remove] a column that references the table exists: <Table.reference_column> -> <ReferencedByColumn>",
# [
# [
# "is_removable_table",
# "db.c",
# 8851
# ]
# ]
# ],
# false
# ]

You need to remove Table.reference_column before you remove ReferencedByColumn:

Execution example:

column_remove Table reference_column
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_remove ReferencedByColumn
# [[0, 1337566253.89858, 0.000355720520019531], true]

Removes a table with tables and columns that reference the target table
New in version 6.0.1.

If you understand what you're doing, you can also remove tables and columns that
reference the target table with one table_remove command by using the --dependent yes
parameter.

ReferencedTable in the following schema is referenced from a table and a column:

Execution example:

table_create ReferencedTable TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Table1 TABLE_HASH_KEY ReferencedTable
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Table2 TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Table2 reference_column COLUMN_SCALAR ReferencedTable
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can't remove ReferencedTable by default:

Execution example:

table_remove ReferencedTable
# [
# [
# -2,
# 1337566253.89858,
# 0.000355720520019531,
# "[table][remove] a table that references the table exists: <Table1._key> -> <ReferencedTable>",
# [
# [
# "is_removable_table",
# "db.c",
# 8831
# ]
# ]
# ],
# false
# ]

You can remove ReferencedTable, Table1 and Table2.reference_column by using --dependent
yes parameter. Table1 and Table2.reference_column reference ReferencedTable:

Execution example:

table_remove ReferencedTable --dependent yes
# [[0, 1337566253.89858, 0.000355720520019531], true]

Decreases used resources
table_remove opens all tables and columns in the database to check the unremovable cases
described above.

If you have many tables and columns, table_remove may use many resources. There is a
workaround to avoid that.

table_remove closes temporarily opened tables and columns during the check when the max
number of threads is 1.

You can confirm and change the current max number of threads by thread_limit.

The feature is used in the following case:

Execution example:

table_create Entries TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
thread_limit 1
# [[0, 1337566253.89858, 0.000355720520019531], 1]
table_remove Entries
# [[0, 1337566253.89858, 0.000355720520019531], true]

The feature isn't used in the following case:

Execution example:

table_create Entries TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
thread_limit 2
# [[0, 1337566253.89858, 0.000355720520019531], 1]
table_remove Entries
# [[0, 1337566253.89858, 0.000355720520019531], true]

Parameters
This section describes all parameters.

Required parameters
There is only one required parameter.

name
Specifies the table name to be removed.

See Usage for how to use this parameter.

Optional parameters
There is only one optional parameter.

dependent
New in version 6.0.1.

Specifies whether tables and columns that reference the target table are also removed or
not.

If this value is yes, tables and columns that reference the target table are also removed.
Otherwise, they aren't removed and an error is returned.

In other words, if there are any tables and columns that reference the target table, the
target table isn't removed by default.

You should use this parameter carefully. It is a dangerous parameter.

See Removes a table with tables and columns that reference the target table for how to
use this parameter.

Return value
The command returns true as body on success such as:

[HEADER, true]

If the command fails, error details are in HEADER.

See /reference/command/output_format for HEADER.

table_rename
Summary
table_rename command renames a table.

It is a light operation. It just changes the relationship between the name and the table
object. It doesn't copy the table or its column values.

It is a dangerous operation. You must stop all operations, including read operations,
while you run table_rename. If the following sequence occurs, the Groonga process may
crash:

· An operation (like select) that accesses the table to be renamed by the current
table name starts. The current table name is called the old table name below
because the table is renamed.

· table_rename runs. The select is still running.

· The select accesses the table to be renamed by the old table name. But the select
can't find the table by the old name because the table has been renamed to the new
table name. It may crash the Groonga process.

Syntax
This command takes two parameters.

All parameters are required:

table_rename name new_name

Usage
Here is a simple example of table_rename command.

Execution example:

table_create Users TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users score COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Users
[
{"_key": "Alice", "score": 2},
{"_key": "Bob", "score": 0},
{"_key": "Carlos", "score": -1}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
table_rename Users Players
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_list
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# "id",
# "UInt32"
# ],
# [
# "name",
# "ShortText"
# ],
# [
# "path",
# "ShortText"
# ],
# [
# "flags",
# "ShortText"
# ],
# [
# "domain",
# "ShortText"
# ],
# [
# "range",
# "ShortText"
# ],
# [
# "default_tokenizer",
# "ShortText"
# ],
# [
# "normalizer",
# "ShortText"
# ]
# ],
# [
# 256,
# "Players",
# "/tmp/groonga-databases/commands_table_rename.0000100",
# "TABLE_PAT_KEY|PERSISTENT",
# "ShortText",
# null,
# null,
# null
# ]
# ]
# ]
select Players
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "score",
# "Int32"
# ]
# ],
# [
# 1,
# "Alice",
# 2
# ],
# [
# 2,
# "Bob",
# 0
# ],
# [
# 3,
# "Carlos",
# -1
# ]
# ]
# ]
# ]

Parameters
This section describes parameters of table_rename.

Required parameters
All parameters are required.

name
Specifies the table name to be renamed.

new_name
Specifies the new table name.

Return value
The command returns true as body on success such as:

[HEADER, true]

If the command fails, error details are in HEADER.

See /reference/command/output_format for HEADER.

table_tokenize
Summary
table_tokenize command tokenizes text by the specified table's tokenizer.

Syntax
This command takes many parameters.

table and string are required parameters. Others are optional:

table_tokenize table
string
[flags=NONE]
[mode=GET]

Usage
Here is a simple example.

Execution example:

register token_filters/stop_word
# [[0,0.0,0.0],true]
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto \
--token_filters TokenFilterStopWord
# [[0,0.0,0.0],true]
column_create Terms is_stop_word COLUMN_SCALAR Bool
# [[0,0.0,0.0],true]
load --table Terms
[
{"_key": "and", "is_stop_word": true}
]
# [[0,0.0,0.0],1]
table_tokenize Terms "Hello and Good-bye" --mode GET
# [
# [
# 0,
# 0.0,
# 0.0
# ],
# [
# {
# "value": "hello",
# "position": 0
# },
# {
# "value": "good",
# "position": 2
# },
# {
# "value": "-",
# "position": 3
# },
# {
# "value": "bye",
# "position": 4
# }
# ]
# ]

The Terms table has the TokenBigram tokenizer, the NormalizerAuto normalizer and the
TokenFilterStopWord token filter set. The command returns the tokens that are generated
by tokenizing "Hello and Good-bye" with the TokenBigram tokenizer. The text is normalized
by the NormalizerAuto normalizer, and the and token is removed by the TokenFilterStopWord
token filter.

Parameters
This section describes all parameters. Parameters are categorized.

Required parameters
There are two required parameters, table and string.

table
Specifies the lexicon table. The table_tokenize command uses the tokenizer, the
normalizer and the token filters that are set for the lexicon table.

string
Specifies any string which you want to tokenize.

See the tokenize-string option in /reference/commands/tokenize for details.

Optional parameters
There are optional parameters.

flags
Specifies a tokenization customize options. You can specify multiple options separated by
"|".

The default value is NONE.

See the tokenize-flags option in /reference/commands/tokenize for details.

mode
Specifies a tokenize mode.

The default value is GET.

See the tokenize-mode option in /reference/commands/tokenize for details.

Return value
table_tokenize command returns tokenized tokens.

See the tokenize-return-value section in /reference/commands/tokenize for details.

See also
· /reference/tokenizers

· /reference/commands/tokenize

thread_limit
Summary
New in version 5.0.7.

thread_limit has the following two features:

· It returns the max number of threads.

· It sets the max number of threads.

/reference/executables/groonga is the only Groonga server that supports full thread_limit
features.

/reference/executables/groonga-httpd supports only the feature that returns the max
number of threads. The max number of threads of /reference/executables/groonga-httpd is
always 1 because /reference/executables/groonga-httpd uses a single thread model.

If you're using Groonga as a library, thread_limit doesn't work unless you set custom
functions by grn_thread_set_get_limit_func() and grn_thread_set_set_limit_func(). If you
set a function by grn_thread_set_get_limit_func(), the feature that returns the max
number of threads works. If you set a function by grn_thread_set_set_limit_func(), the
feature that sets the max number of threads works.

Syntax
This command takes only one optional parameter:

thread_limit [max=null]

Usage
You can get the max number of threads by calling thread_limit without any parameters:

Execution example:

thread_limit
# [[0, 1337566253.89858, 0.000355720520019531], 2]

If it returns 0, your Groonga server doesn't support the feature.

You can set the max number of threads by passing the max parameter:

Execution example:

thread_limit --max 4
# [[0, 1337566253.89858, 0.000355720520019531], 2]

It returns the previous max number of threads when you pass the max parameter.

Parameters
This section describes all parameters.

Required parameters
There is no required parameter.

Optional parameters
There is one optional parameter.

max
Specifies the new max number of threads.

You must specify a positive integer:

Execution example:

thread_limit --max 3
# [[0, 1337566253.89858, 0.000355720520019531], 4]

If you specify the max parameter, thread_limit returns the max number of threads before
max is applied.

Return value
The command returns the max number of threads as body:

[HEADER, N_MAX_THREADS]

If max is specified, N_MAX_THREADS is the max number of threads before max is applied.

See /reference/command/output_format for HEADER.

tokenize
Summary
tokenize command tokenizes text by the specified tokenizer. It is useful for debugging
tokenization.

Syntax
This command takes many parameters.

tokenizer and string are required parameters. Others are optional:

tokenize tokenizer
string
[normalizer=null]
[flags=NONE]
[mode=ADD]
[token_filters=NONE]

Usage
Here is a simple example.

Execution example:

tokenize TokenBigram "Fulltext Search"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "Fu"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "ul"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "ll"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "lt"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "te"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "ex"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "xt"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "t "
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": " S"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "Se"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "ea"
# },
# {
# "position": 11,
# "force_prefix": false,
# "value": "ar"
# },
# {
# "position": 12,
# "force_prefix": false,
# "value": "rc"
# },
# {
# "position": 13,
# "force_prefix": false,
# "value": "ch"
# },
# {
# "position": 14,
# "force_prefix": false,
# "value": "h"
# }
# ]
# ]

It has only the required parameters: tokenizer is TokenBigram and string is "Fulltext
Search". It returns the tokens that are generated by tokenizing "Fulltext Search" with
the TokenBigram tokenizer. It doesn't normalize "Fulltext Search".

Parameters
This section describes all parameters. Parameters are categorized.

Required parameters
There are two required parameters, tokenizer and string.

tokenizer
Specifies the tokenizer name. tokenize command uses the tokenizer that is named tokenizer.

See /reference/tokenizers about built-in tokenizers.

Here is an example that uses the built-in TokenTrigram tokenizer.

Execution example:

tokenize TokenTrigram "Fulltext Search"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "Ful"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "ull"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "llt"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "lte"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "tex"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "ext"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "xt "
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "t S"
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": " Se"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "Sea"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "ear"
# },
# {
# "position": 11,
# "force_prefix": false,
# "value": "arc"
# },
# {
# "position": 12,
# "force_prefix": false,
# "value": "rch"
# },
# {
# "position": 13,
# "force_prefix": false,
# "value": "ch"
# },
# {
# "position": 14,
# "force_prefix": false,
# "value": "h"
# }
# ]
# ]

If you want to use other tokenizers, you need to register an additional tokenizer plugin
by the register command. For example, you can use a KyTea based tokenizer by registering
tokenizers/kytea.

string
Specifies any string which you want to tokenize.

If you want to include spaces in string, you need to quote string with single quotes (')
or double quotes (").

Here is an example to use spaces in string.

Execution example:

tokenize TokenBigram "Groonga is a fast fulltext earch engine!"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "Gr"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "ro"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "oo"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "on"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "ng"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "ga"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "a "
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": " i"
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": "is"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "s "
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": " a"
# },
# {
# "position": 11,
# "force_prefix": false,
# "value": "a "
# },
# {
# "position": 12,
# "force_prefix": false,
# "value": " f"
# },
# {
# "position": 13,
# "force_prefix": false,
# "value": "fa"
# },
# {
# "position": 14,
# "force_prefix": false,
# "value": "as"
# },
# {
# "position": 15,
# "force_prefix": false,
# "value": "st"
# },
# {
# "position": 16,
# "force_prefix": false,
# "value": "t "
# },
# {
# "position": 17,
# "force_prefix": false,
# "value": " f"
# },
# {
# "position": 18,
# "force_prefix": false,
# "value": "fu"
# },
# {
# "position": 19,
# "force_prefix": false,
# "value": "ul"
# },
# {
# "position": 20,
# "force_prefix": false,
# "value": "ll"
# },
# {
# "position": 21,
# "force_prefix": false,
# "value": "lt"
# },
# {
# "position": 22,
# "force_prefix": false,
# "value": "te"
# },
# {
# "position": 23,
# "force_prefix": false,
# "value": "ex"
# },
# {
# "position": 24,
# "force_prefix": false,
# "value": "xt"
# },
# {
# "position": 25,
# "force_prefix": false,
# "value": "t "
# },
# {
# "position": 26,
# "force_prefix": false,
# "value": " e"
# },
# {
# "position": 27,
# "force_prefix": false,
# "value": "ea"
# },
# {
# "position": 28,
# "force_prefix": false,
# "value": "ar"
# },
# {
# "position": 29,
# "force_prefix": false,
# "value": "rc"
# },
# {
# "position": 30,
# "force_prefix": false,
# "value": "ch"
# },
# {
# "position": 31,
# "force_prefix": false,
# "value": "h "
# },
# {
# "position": 32,
# "force_prefix": false,
# "value": " e"
# },
# {
# "position": 33,
# "force_prefix": false,
# "value": "en"
# },
# {
# "position": 34,
# "force_prefix": false,
# "value": "ng"
# },
# {
# "position": 35,
# "force_prefix": false,
# "value": "gi"
# },
# {
# "position": 36,
# "force_prefix": false,
# "value": "in"
# },
# {
# "position": 37,
# "force_prefix": false,
# "value": "ne"
# },
# {
# "position": 38,
# "force_prefix": false,
# "value": "e!"
# },
# {
# "position": 39,
# "force_prefix": false,
# "value": "!"
# }
# ]
# ]

Optional parameters
There are optional parameters.

normalizer
Specifies the normalizer name. tokenize command uses the normalizer that is named
normalizer. A normalizer is important for N-gram family tokenizers such as TokenBigram.

A normalizer detects the character type of each character while normalizing. N-gram
family tokenizers use those character types while tokenizing.

Here is an example that doesn't use a normalizer.

Execution example:

tokenize TokenBigram "Fulltext Search"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "Fu"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "ul"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "ll"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "lt"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "te"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "ex"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "xt"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "t "
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": " S"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "Se"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "ea"
# },
# {
# "position": 11,
# "force_prefix": false,
# "value": "ar"
# },
# {
# "position": 12,
# "force_prefix": false,
# "value": "rc"
# },
# {
# "position": 13,
# "force_prefix": false,
# "value": "ch"
# },
# {
# "position": 14,
# "force_prefix": false,
# "value": "h"
# }
# ]
# ]

All alphabetical characters are tokenized two characters at a time. For example, Fu is a
token.

Here is an example that uses normalizer.

Execution example:

tokenize TokenBigram "Fulltext Search" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "fulltext"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "search"
# }
# ]
# ]

Consecutive alphabetical characters are tokenized as one token. For example, fulltext is
a token.

If you want to tokenize two characters at a time with a normalizer, use
TokenBigramSplitSymbolAlpha.

Execution example:

tokenize TokenBigramSplitSymbolAlpha "Fulltext Search" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "fu"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "ul"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "ll"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "lt"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "te"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "ex"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "xt"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "t"
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": "se"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "ea"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "ar"
# },
# {
# "position": 11,
# "force_prefix": false,
# "value": "rc"
# },
# {
# "position": 12,
# "force_prefix": false,
# "value": "ch"
# },
# {
# "position": 13,
# "force_prefix": false,
# "value": "h"
# }
# ]
# ]

All alphabetical characters are tokenized two characters at a time, and they are
normalized to lower case. For example, fu is a token.

flags
Specifies a tokenization customize options. You can specify multiple options separated by
"|". For example, NONE|ENABLE_TOKENIZED_DELIMITER.

Here are available flags.

┌───────────────────────────┬──────────────────────────────────┐
│Flag                       │ Description                      │
├───────────────────────────┼──────────────────────────────────┤
│NONE                       │ Just ignored.                    │
├───────────────────────────┼──────────────────────────────────┤
│ENABLE_TOKENIZED_DELIMITER │ Enables tokenized delimiter. See │
│                           │ /reference/tokenizers about      │
│                           │ tokenized delimiter details.     │
└───────────────────────────┴──────────────────────────────────┘

Here is an example that uses ENABLE_TOKENIZED_DELIMITER.

Execution example:

tokenize TokenDelimit "Full￾text Sea￾crch" NormalizerAuto ENABLE_TOKENIZED_DELIMITER
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "full"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "text sea"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "crch"
# }
# ]
# ]

The TokenDelimit tokenizer is one of the tokenizers that support the tokenized delimiter.
ENABLE_TOKENIZED_DELIMITER enables the tokenized delimiter. The tokenized delimiter is a
special character that indicates a token border. It is U+FFFE. That code point is not
assigned to any character, which means it does not appear in normal strings, so it is a
good character for this purpose. If ENABLE_TOKENIZED_DELIMITER is enabled, the target
string is treated as an already tokenized string, and the tokenizer just splits it at the
tokenized delimiters.

mode
Specifies a tokenize mode. If ADD is specified, the text is tokenized by the rule for
adding a document. If GET is specified, the text is tokenized by the rule for searching a
document. If the mode is omitted, the text is tokenized in ADD mode.

The default mode is ADD.

Here is an example of the ADD mode.

Execution example:

tokenize TokenBigram "Fulltext Search" --mode ADD
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "Fu"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "ul"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "ll"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "lt"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "te"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "ex"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "xt"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "t "
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": " S"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "Se"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "ea"
# },
# {
# "position": 11,
# "force_prefix": false,
# "value": "ar"
# },
# {
# "position": 12,
# "force_prefix": false,
# "value": "rc"
# },
# {
# "position": 13,
# "force_prefix": false,
# "value": "ch"
# },
# {
# "position": 14,
# "force_prefix": false,
# "value": "h"
# }
# ]
# ]

The last alphabetical character is tokenized as a one-character token.

Here is an example of the GET mode.

Execution example:

tokenize TokenBigram "Fulltext Search" --mode GET
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "Fu"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "ul"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "ll"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "lt"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "te"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "ex"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "xt"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "t "
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": " S"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "Se"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "ea"
# },
# {
# "position": 11,
# "force_prefix": false,
# "value": "ar"
# },
# {
# "position": 12,
# "force_prefix": false,
# "value": "rc"
# },
# {
# "position": 13,
# "force_prefix": false,
# "value": "ch"
# }
# ]
# ]

The last alphabetical characters are tokenized as a two-character token.

token_filters
Specifies the token filter names. tokenize command uses the token filters that are named
in token_filters.

See /reference/token_filters about token filters.

Return value
tokenize command returns the tokenized tokens. Each token has some attributes besides the
token itself. More attributes may be added in the future:

[HEADER, tokens]

HEADER
See /reference/command/output_format about HEADER.

tokens
tokens is an array of token. Token is an object that has the following attributes.

┌─────────┬─────────────────┐
│Name     │ Description     │
├─────────┼─────────────────┤
│value    │ Token itself.   │
├─────────┼─────────────────┤
│position │ The N-th token. │
└─────────┴─────────────────┘

See also
· /reference/tokenizers

tokenizer_list
Summary
tokenizer_list command lists tokenizers in a database.

Syntax
This command takes no parameters:

tokenizer_list

Usage
Here is a simple example.

Execution example:

tokenizer_list
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "name": "TokenMecab"
# },
# {
# "name": "TokenDelimit"
# },
# {
# "name": "TokenUnigram"
# },
# {
# "name": "TokenBigram"
# },
# {
# "name": "TokenTrigram"
# },
# {
# "name": "TokenBigramSplitSymbol"
# },
# {
# "name": "TokenBigramSplitSymbolAlpha"
# },
# {
# "name": "TokenBigramSplitSymbolAlphaDigit"
# },
# {
# "name": "TokenBigramIgnoreBlank"
# },
# {
# "name": "TokenBigramIgnoreBlankSplitSymbol"
# },
# {
# "name": "TokenBigramIgnoreBlankSplitSymbolAlpha"
# },
# {
# "name": "TokenBigramIgnoreBlankSplitSymbolAlphaDigit"
# },
# {
# "name": "TokenDelimitNull"
# },
# {
# "name": "TokenRegexp"
# }
# ]
# ]

It returns the tokenizers in the database.

Return value
tokenizer_list command returns the tokenizers. Each tokenizer has an attribute that
contains its name. More attributes may be added in the future:

[HEADER, tokenizers]

HEADER
See /reference/command/output_format about HEADER.

tokenizers
tokenizers is an array of tokenizer. Tokenizer is an object that has the following
attributes.

┌─────┬─────────────────┐
│Name │ Description     │
├─────┼─────────────────┤
│name │ Tokenizer name. │
└─────┴─────────────────┘

See also
· /reference/tokenizers

· /reference/commands/tokenize

truncate
Summary
truncate command deletes all records from the specified table or all values from the
specified column.

Syntax
This command takes only one required parameter:

truncate target_name

New in version 4.0.9: The target_name parameter can be used since 4.0.9. You need to use
the table parameter for 4.0.8 or earlier.

For backward compatibility, truncate command accepts the table parameter. But it should
not be used in newly written code.

Usage
Here is a simple example of truncate command against a table.

Execution example:

table_create Users TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users score COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Users
[
{"_key": "Alice", "score": 2},
{"_key": "Bob", "score": 0},
{"_key": "Carlos", "score": -1}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
select Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "score",
# "Int32"
# ]
# ],
# [
# 1,
# "Alice",
# 2
# ],
# [
# 2,
# "Bob",
# 0
# ],
# [
# 3,
# "Carlos",
# -1
# ]
# ]
# ]
# ]
truncate Users
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 0
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "score",
# "Int32"
# ]
# ]
# ]
# ]
# ]

Here is a simple example of truncate command against a column.

Execution example:

table_create Users TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users score COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Users
[
{"_key": "Alice", "score": 2},
{"_key": "Bob", "score": 0},
{"_key": "Carlos", "score": -1}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
select Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "score",
# "Int32"
# ]
# ],
# [
# 1,
# "Alice",
# 2
# ],
# [
# 2,
# "Bob",
# 0
# ],
# [
# 3,
# "Carlos",
# -1
# ]
# ]
# ]
# ]
truncate Users.score
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Users
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "score",
# "Int32"
# ]
# ],
# [
# 1,
# "Alice",
# 0
# ],
# [
# 2,
# "Bob",
# 0
# ],
# [
# 3,
# "Carlos",
# 0
# ]
# ]
# ]
# ]

Parameters
This section describes parameters of truncate.

Required parameters
There is one required parameter, target_name.

target_name
Specifies the name of the table or column to be truncated.

Return value
truncate command returns whether the truncation succeeded or not:

[HEADER, SUCCEEDED_OR_NOT]

HEADER
See /reference/command/output_format about HEADER.

SUCCEEDED_OR_NOT
If the command succeeded, it returns true; otherwise it returns false.

Data types
Name
Groonga data types

Description
Groonga classifies the data it stores into data types.

In a Groonga database, a table's primary key and each column value belong to some data
type. Normally, the value type of a column is common to all records in one table.

A primary key type and a column type can be a Groonga defined type, a user defined type
or a user defined table.

If you specify another table as the primary key type, this table becomes a subset of the
table of the primary key type.

If you specify another table as the column type, this column becomes a reference key into
the table of the column type.

Builtin types
The following types are defined as builtin types.

Bool
Boolean type. The possible values are true and false. (default: false)

To store a value by the /reference/commands/load command, the value becomes false if you
specify false, 0 or an empty string, and true if you specify anything else.
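
For example (a sketch; Flags and available are hypothetical names), the first two records
below store false and the last one stores true:

table_create Flags TABLE_HASH_KEY ShortText
column_create Flags available COLUMN_SCALAR Bool
load --table Flags
[
{"_key": "a", "available": 0},
{"_key": "b", "available": ""},
{"_key": "c", "available": 1}
]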

Int8
Signed 8bit integer. It's -128 or more and 127 or less. (default: 0)

UInt8
Unsigned 8bit integer. It's 0 or more and 255 or less. (default: 0)

Int16
Signed 16bit integer. It's -32,768 or more and 32,767 or less. (default: 0)

UInt16
Unsigned 16bit integer. It's 0 or more and 65,535 or less. (default: 0)

Int32
Signed 32bit integer. It's -2,147,483,648 or more and 2,147,483,647 or less. (default: 0)

UInt32
Unsigned 32bit integer. It's 0 or more and 4,294,967,295 or less. (default: 0)

Int64
Signed 64bit integer. It's -9,223,372,036,854,775,808 or more and
9,223,372,036,854,775,807 or less. (default: 0)

UInt64
Unsigned 64bit integer. It's 0 or more and 18,446,744,073,709,551,615 or less. (default:
0)

Float
Double-precision floating-point number of IEEE 754 as a real number. (default: 0.0)

See IEEE floating point - Wikipedia, the free encyclopedia or IEEE 754: Standard for
Binary Floating-Point for details of IEEE 754 format.

Time
Date and time, expressed as the number of seconds that have elapsed since 1970-01-01
00:00:00, stored in a 64bit signed integer. (default: 0)

To store a value by the /reference/commands/load command, specify the number of seconds
elapsed since 1970-01-01 00:00:00. To specify a date and time more precisely than
seconds, use a decimal.
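
For example (a sketch; Events and timestamp are hypothetical names), the decimal part
stores sub-second precision:

table_create Events TABLE_NO_KEY
column_create Events timestamp COLUMN_SCALAR Time
load --table Events
[
{"timestamp": 1441761403.25}
]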

ShortText
String of 4,095 or less bytes. (default: "")

Text
String of 65,535 or less bytes. (default: "")

LongText
String of 2,147,483,647 or less bytes. (default: "")

TokyoGeoPoint
Longitude and latitude in the old Japanese geodetic system (Tokyo Datum), represented as
a pair of integers that express the longitude and the latitude in milliseconds. (default:
0x0)

A longitude or latitude of x degrees, y minutes and z seconds in degrees-minutes-seconds
notation is converted to milliseconds by the formula (((x * 60) + y) * 60 + z) * 1000.

To store a value by the /reference/commands/load command, specify a string of the form
"longitude in milliseconds x latitude in milliseconds" or "longitude in decimal notation
x latitude in decimal notation". You can use ',' as well as 'x' as the separator between
longitude and latitude.

See the Wikipedia article on geodetic systems for details about geodetic systems.
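
For example, applying the conversion formula above to 35 degrees 40 minutes 52 seconds:

((35 * 60) + 40) * 60 + 52 = 128452 seconds
128452 * 1000 = 128452000 milliseconds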

WGS84GeoPoint
Longitude and latitude in the World Geodetic System (WGS 84), represented as a pair of
integers that express the longitude and the latitude in milliseconds. (default: 0x0)

The conversion from degrees-minutes-seconds notation to milliseconds and the way values
are specified in the /reference/commands/load command are the same as for TokyoGeoPoint.

Limitations about types
Types that can't be specified in the primary key of a table
Text and LongText can't be specified in the primary key of a table.

Types that can't be stored as a vector
A Groonga column can store a vector of some type. However, although vectors of the three
types ShortText, Text and LongText can be stored and output, they can't be specified in
search conditions or drilldown conditions.

A table type can be stored as a vector. So if you want to use a vector of ShortText in
search conditions or drilldown conditions, create a separate table whose primary key is
ShortText and use that table as the column type.
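
A schema sketch of this workaround; Tags, Entries and the tags column are hypothetical
names. Instead of a ShortText vector, the column references a table whose primary key is
ShortText:

table_create Tags TABLE_HASH_KEY ShortText
table_create Entries TABLE_NO_KEY
column_create Entries tags COLUMN_VECTOR Tags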

Tables
Summary
A table in Groonga manages the relation between IDs and keys. Groonga provides four table
types: TABLE_NO_KEY, TABLE_HASH_KEY, TABLE_PAT_KEY and TABLE_DAT_KEY.

All tables except TABLE_NO_KEY provide both fast ID search by key and fast key search by
ID. TABLE_NO_KEY doesn't support keys; it only manages IDs. So TABLE_NO_KEY provides
neither ID search nor key search.

Characteristics
Here is a characteristics table of all tables in Groonga. (The TABLE_ prefix is omitted
in the table.)

┌─────────────────┬────────┬────────────┬───────────────┬──────────────────┐
│                 │ NO_KEY │ HASH_KEY   │ PAT_KEY       │ DAT_KEY          │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Data structure │ Array │ Hash table │ Patricia trie │ Double array │
│ │ │ │ │ trie │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│ID support │ o │ o │ o │ o │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Key support │ x │ o │ o │ o │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Value support │ o │ o │ o │ x │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Key -> ID speed │ - │ oo │ x │ o │
│ │ │ │ │ │
│ · o: fast │ │ │ │ │
│ │ │ │ │ │
│ · x: slow │ │ │ │ │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Update speed │ ooo │ o │ o │ x │
│ │ │ │ │ │
│ · o: fast │ │ │ │ │
│ │ │ │ │ │
│ · x: slow │ │ │ │ │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Size │ ooo │ o │ oo │ x │
│ │ │ │ │ │
│ · o: │ │ │ │ │
│ small │ │ │ │ │
│ │ │ │ │ │
│ · x: │ │ │ │ │
│ large │ │ │ │ │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Key update │ - │ x │ x │ o │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Common prefix │ - │ x │ o │ o │
│search │ │ │ │ │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Predictive │ - │ x │ o │ o │
│search │ │ │ │ │
├─────────────────┼────────┼────────────┼───────────────┼──────────────────┤
│Range search │ - │ x │ o │ o │
└─────────────────┴────────┴────────────┴───────────────┴──────────────────┘

TABLE_NO_KEY
TABLE_NO_KEY is very fast and very small but it doesn't support keys. TABLE_NO_KEY is the
only table type that doesn't support keys.

You cannot use TABLE_NO_KEY as a lexicon for full text search because a lexicon stores
tokens as keys. TABLE_NO_KEY is useful for records that need no key, such as logs.
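
Here is a minimal sketch of such a log table; the table and column names are
hypothetical:

table_create Logs TABLE_NO_KEY
column_create Logs message COLUMN_SCALAR ShortText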

TABLE_HASH_KEY
TABLE_HASH_KEY is fast but it doesn't support advanced search functions such as common
prefix search and predictive search.

TABLE_HASH_KEY is useful as an index for exact search, such as tag search.

TABLE_PAT_KEY
TABLE_PAT_KEY is small and supports advanced search functions.

TABLE_PAT_KEY is useful as a lexicon for full text search and as an index for range
search.

TABLE_DAT_KEY
TABLE_DAT_KEY is fast and supports key update but it is large. It is not suitable for
storing many records. TABLE_DAT_KEY is the only table type that supports key update.

TABLE_DAT_KEY is used inside the Groonga database itself. A Groonga database needs to
convert object names, such as ShortText, TokenBigram and table names, to object IDs, and
it needs to rename objects. Those features are implemented with TABLE_DAT_KEY. The number
of such objects is small, so the large data size of TABLE_DAT_KEY doesn't matter in this
case.

Record ID
Record IDs are assigned automatically. You cannot assign record IDs yourself.

Record IDs of deleted records may be reused.

The valid record ID range is 1 to 268435455, inclusive.

Persistent table and temporary table
A table is either a persistent table or a temporary table.

Persistent table
A persistent table is named and registered to the database. Records in a persistent table
aren't deleted after closing the table or the database.

A persistent table can be created by the /reference/commands/table_create command.

Temporary table
A temporary table is anonymous. Records in a temporary table are deleted after closing
the table. Temporary tables are used to store search results, sort results, group
(drilldown) results and so on. TABLE_HASH_KEY is used for search results and group
results. TABLE_NO_KEY is used for sort results.

Limitations
The maximum number of records is 268435455. You cannot add 268435456 or more records to a
table.

The maximum key size is 4096 bytes. You cannot use keys of 4097 bytes or larger. You can
use a column instead of a key for data of 4097 bytes or larger. The Text and LongText
types support data of 4097 bytes or larger.

The maximum total key size is 4GiB. You need to split a table, split a database
(sharding) or reduce each key size to handle 4GiB or more of total key data.

See also
· /reference/commands/table_create

Column
A column is a data store object or an index object for fast search.

A column belongs to a table. A table has zero or more columns.

Both data store columns and index columns have a type. The type of a data store column
specifies the data range; in other words, it is the "value type". The type of an index
column specifies the set of documents to be indexed. A set of documents is a table in
Groonga; in other words, the type of an index column must be a table.

Here are data store columns:

Scalar column
Summary
A scalar column is a data store object. It stores a single value, such as a number or a
string, per record.

Usage
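Here is a minimal sketch of creating and loading a scalar column; the Entries table and
its title column are hypothetical:

table_create Entries TABLE_HASH_KEY ShortText
column_create Entries title COLUMN_SCALAR ShortText
load --table Entries
[
{"_key": "entry1", "title": "Groonga overview"}
]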

Vector column
Summary
Vector column is a data store object. It can store zero or more scalar values. In short,
a scalar value is a single value, such as a number or a string. See scalar for details
about scalar values.

One use case for vector columns is storing tags. You can use a vector column to store tag
values.

You can use a vector column as an index search target in the same way as a scalar column.
You can also set a weight for each element. When an element that has a weight of one or
more is matched, the record gets a higher score than it would without the weight. This is
a vector column specific feature. A vector column that can store weights is called a
weight vector column.

You can also do full text search against each text element. But the search score can
become too high when weights are used, so use full text search with weights carefully.

Usage
There are three vector column types:

· Normal vector column

· Reference vector column

· Weight vector column

This section describes how to use these types.

Normal vector column
Normal vector column stores zero or more scalar values, such as numbers or strings.

A normal vector column stores elements of a single type; you can't mix types. For
example, you can't store a number and a string in the same normal vector column.

Normal vector column is useful when a record has multiple values for one key. Tags are
the most popular use case.

How to create
Use the /reference/commands/column_create command to create a normal vector column. The
point is the COLUMN_VECTOR flag:

Execution example:

table_create Bookmarks TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Bookmarks tags COLUMN_VECTOR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

You can set zero or more tags to a bookmark.

How to load
You can load vector data by JSON array syntax:

[ELEMENT1, ELEMENT2, ELEMENT3, ...]

Let's load the following data:

┌────────────────────┬─────────────────────────────────┐
│_key                │tags                             │
├────────────────────┼─────────────────────────────────┤
│http://groonga.org/ │["groonga"]                      │
├────────────────────┼─────────────────────────────────┤
│http://mroonga.org/ │["mroonga", "mysql", "groonga"]  │
├────────────────────┼─────────────────────────────────┤
│http://ranguba.org/ │["ruby", "groonga"]              │
└────────────────────┴─────────────────────────────────┘

Here is a command that loads the data:

Execution example:

load --table Bookmarks
[
{"_key": "http://groonga.org/", "tags": ["groonga"]},
{"_key": "http://mroonga.org/", "tags": ["mroonga", "mysql", "groonga"]},
{"_key": "http://ranguba.org/", "tags": ["ruby", "groonga"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

The loaded data can be output in JSON array syntax:

Execution example:

select Bookmarks
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "tags",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://groonga.org/",
# [
# "groonga"
# ]
# ],
# [
# 2,
# "http://mroonga.org/",
# [
# "mroonga",
# "mysql",
# "groonga"
# ]
# ],
# [
# 3,
# "http://ranguba.org/",
# [
# "ruby",
# "groonga"
# ]
# ]
# ]
# ]
# ]

How to search
You need to create an index to search a normal vector column:

Execution example:

table_create Tags TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Tags bookmark_index COLUMN_INDEX Bookmarks tags
# [[0, 1337566253.89858, 0.000355720520019531], true]

There is nothing specific to vector columns here; you can create an index in the same way
as for a scalar column.

You can search for an element in tags with the full text search syntax.

With select-match-columns and select-query:

Execution example:

select Bookmarks --match_columns tags --query mysql --output_columns _key,tags,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "tags",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://mroonga.org/",
# [
# "mroonga",
# "mysql",
# "groonga"
# ],
# 1
# ]
# ]
# ]
# ]

You can also use weight in select-match-columns:

Execution example:

select Bookmarks --match_columns 'tags * 3' --query mysql --output_columns _key,tags,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "tags",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://mroonga.org/",
# [
# "mroonga",
# "mysql",
# "groonga"
# ],
# 3
# ]
# ]
# ]
# ]

With select-filter:

Execution example:

select Bookmarks --filter 'tags @ "mysql"' --output_columns _key,tags,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "tags",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://mroonga.org/",
# [
# "mroonga",
# "mysql",
# "groonga"
# ],
# 1
# ]
# ]
# ]
# ]

Reference vector column
Reference vector column is space-efficient when many elements have the same value. A
reference vector column keeps reference record IDs, not the values themselves. A record
ID is smaller than the value itself.

How to create
TODO

How to load
TODO
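
Here is a minimal sketch covering creation and loading; all table and column names are
hypothetical. Because TagList has a key, tag strings that are not registered yet are
expected to be registered automatically on load:

table_create TagList TABLE_HASH_KEY ShortText
table_create Entries TABLE_HASH_KEY ShortText
column_create Entries tags COLUMN_VECTOR TagList
load --table Entries
[
{"_key": "entry1", "tags": ["groonga", "search"]}
]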

How to search
TODO

Weight vector column
Weight vector column is similar to normal vector column. It can store elements, and it
can also store a weight for each of them. A weight is the degree of importance of the
element.

A weight is a non-negative integer. 0 is the default weight and means no weight.

If the weight is one or larger, the search score is increased by the weight. If the
weight is 0, the score is 1. If the weight is 10, the score is 11 (= 1 + 10).

Weight vector column is useful for tuning search scores: you can increase the search
score of specific records. See also select-adjuster.

Limitations
There are some limitations for now. They will be resolved in the future.

Here are the limitations:

· You need to use the string representation for element values on load. For example,
you can't use 29 for the number 29. You need to use "29" for the number 29.

How to create
Use the /reference/commands/column_create command to create a weight vector column. The
point is the COLUMN_VECTOR|WITH_WEIGHT flags:

Execution example:

table_create Bookmarks TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Bookmarks tags COLUMN_VECTOR|WITH_WEIGHT ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

If you don't specify WITH_WEIGHT flag, it is just a normal vector column.

You can set zero or more tags with weight to a bookmark.

How to load
You can load vector data by JSON object syntax:

{"ELEMENT1": WEIGHT1, "ELEMENT2": WEIGHT2, "ELEMENT3": WEIGHT3, ...}

Let's load the following data:

┌────────────────────┬──────────────────────────────────┐
│_key                │tags                              │
├────────────────────┼──────────────────────────────────┤
│http://groonga.org/ │{"groonga": 100}                  │
├────────────────────┼──────────────────────────────────┤
│http://mroonga.org/ │{"mroonga": 100, "mysql": 50,     │
│                    │"groonga": 10}                    │
├────────────────────┼──────────────────────────────────┤
│http://ranguba.org/ │{"ruby": 100, "groonga": 50}      │
└────────────────────┴──────────────────────────────────┘

Here is a command that loads the data:

Execution example:

load --table Bookmarks
[
{"_key": "http://groonga.org/",
"tags": {"groonga": 100}},
{"_key": "http://mroonga.org/",
"tags": {"mroonga": 100,
"mysql": 50,
"groonga": 10}},
{"_key": "http://ranguba.org/",
"tags": {"ruby": 100,
"groonga": 50}}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

The loaded data can be output in JSON object syntax:

Execution example:

select Bookmarks
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "tags",
# "ShortText"
# ]
# ],
# [
# 1,
# "http://groonga.org/",
# {
# "groonga": 100
# }
# ],
# [
# 2,
# "http://mroonga.org/",
# {
# "mroonga": 100,
# "groonga": 10,
# "mysql": 50
# }
# ],
# [
# 3,
# "http://ranguba.org/",
# {
# "ruby": 100,
# "groonga": 50
# }
# ]
# ]
# ]
# ]

How to search
You need to create an index to search a weight vector column. Don't forget to specify the
WITH_WEIGHT flag to column_create:

Execution example:

table_create Tags TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Tags bookmark_index COLUMN_INDEX|WITH_WEIGHT Bookmarks tags
# [[0, 1337566253.89858, 0.000355720520019531], true]

There is nothing specific to weight vector columns here except the WITH_WEIGHT flag; you
can create an index in the same way as for a scalar column.

You can search for an element in tags with the full text search syntax.

With select-match-columns and select-query:

Execution example:

select Bookmarks --match_columns tags --query groonga --output_columns _key,tags,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "tags",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://groonga.org/",
# {
# "groonga": 100
# },
# 101
# ],
# [
# "http://mroonga.org/",
# {
# "mroonga": 100,
# "groonga": 10,
# "mysql": 50
# },
# 11
# ],
# [
# "http://ranguba.org/",
# {
# "ruby": 100,
# "groonga": 50
# },
# 51
# ]
# ]
# ]
# ]

You can also use weight in select-match-columns. The score is (1 +
weight_in_weight_vector) * weight_in_match_columns:

Execution example:

select Bookmarks --match_columns 'tags * 3' --query groonga --output_columns _key,tags,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "tags",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://groonga.org/",
# {
# "groonga": 100
# },
# 303
# ],
# [
# "http://mroonga.org/",
# {
# "mroonga": 100,
# "groonga": 10,
# "mysql": 50
# },
# 33
# ],
# [
# "http://ranguba.org/",
# {
# "ruby": 100,
# "groonga": 50
# },
# 153
# ]
# ]
# ]
# ]

With select-filter:

Execution example:

select Bookmarks --filter 'tags @ "groonga"' --output_columns _key,tags,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "tags",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://groonga.org/",
# {
# "groonga": 100
# },
# 101
# ],
# [
# "http://mroonga.org/",
# {
# "mroonga": 100,
# "groonga": 10,
# "mysql": 50
# },
# 11
# ],
# [
# "http://ranguba.org/",
# {
# "ruby": 100,
# "groonga": 50
# },
# 51
# ]
# ]
# ]
# ]

How to apply just weight
You can use the weights in a weight vector column to just increase the search score
without changing the set of matched records.

Use select-adjuster for the purpose:

Execution example:

select Bookmarks \
--filter true \
--adjuster 'tags @ "mysql" * 10 + tags @ "groonga" * 5' \
--output_columns _key,tags,_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "tags",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "http://groonga.org/",
# {
# "groonga": 100
# },
# 506
# ],
# [
# "http://mroonga.org/",
# {
# "mroonga": 100,
# "groonga": 10,
# "mysql": 50
# },
# 566
# ],
# [
# "http://ranguba.org/",
# {
# "ruby": 100,
# "groonga": 50
# },
# 256
# ]
# ]
# ]
# ]

The select command uses --filter true. So all records are matched with score 1. Then it
applies --adjuster. The adjuster does the following:

· tags @ "mysql" * 10 increases the score by (1 + weight) * 10 for records that have
the "mysql" tag.

· tags @ "groonga" * 5 increases the score by (1 + weight) * 5 for records that have
the "groonga" tag.

For example, record "http://mroonga.org/" has both the "mysql" tag and the "groonga" tag.
So its score is increased by 565 (= ((1 + 50) * 10) + ((1 + 10) * 5) = (51 * 10) + (11 *
5) = 510 + 55). The search score is 1 by --filter true before applying --adjuster. So the
final search score of record "http://mroonga.org/" is 566 (= 1 + 565).

Pseudo column
Name
Pseudo column

Description
Several columns are defined automatically for each table created in a Groonga database.

All of these columns have names that start with an underscore ('_'). Which pseudo columns
are defined depends on the kind of table.

_id
A unique number assigned to each record. It is defined for all tables. Its value is an
integer between 1 and 1073741824, and it is normally incremented by 1 in the order in
which records are added. The value of _id is immutable: it cannot be changed as long as
the record exists. However, the _id values of deleted records may be reused.

_key
The primary key value of the record. It is defined only for tables that have a primary
key. Primary key values are unique within a table and cannot be changed.

_value
The value of the record. It is defined only for tables created with a value_type. It can
be changed freely.

_score
The score value of each record. It is defined only for tables generated as search
results.

Its value is set during search processing, but it can be changed freely.

_nsubrecs
The number of records that had the same primary key value. It is defined only for tables
generated as search results. When a grouping (drilldown) operation is executed, the
number of records in the source table that had the same grouping key value is recorded in
_nsubrecs of the table that stores the grouping result.
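
Pseudo columns can be output like normal columns. Here is a minimal sketch that reuses
the Bookmarks table from the earlier examples (output omitted):

select Bookmarks --output_columns _id,_key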

Here is an index column:

Index column
Summary
An index column is an object for fast search. It holds an inverted index for the data
store columns it indexes. As described above, the type of an index column must be a
table: the keys of that table (for example, tokens) are mapped to the indexed records.

Usage
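Here is a minimal sketch of creating an index column for full text search; the Entries
table and its content column are hypothetical, and the lexicon setup follows the patterns
used elsewhere in this document:

table_create Entries TABLE_HASH_KEY ShortText
column_create Entries content COLUMN_SCALAR Text
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto
column_create Terms entries_content COLUMN_INDEX|WITH_POSITION Entries content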

Normalizers
Summary
Groonga has a normalizer module that normalizes text. It is used when tokenizing text and
when storing table keys. For example, A and a are processed as the same character after
normalization.

Normalizer modules can be added as plugins. You can customize text normalization by
registering your normalizer plugins to Groonga.

A normalizer module is attached to a table. A table can have zero or one normalizer
module. You can attach a normalizer module to a table with the table-create-normalizer
option of /reference/commands/table_create.

Here is an example table_create that uses NormalizerAuto normalizer module:

Execution example:

table_create Dictionary TABLE_HASH_KEY ShortText --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]

NOTE:
Groonga 2.0.9 or earlier doesn't have the --normalizer option in table_create. The
KEY_NORMALIZE flag was used instead.

You can open an old database (one created by Groonga 2.0.9 or earlier) with Groonga
2.1.0 or later. But once you have done so, you cannot open the database with Groonga
2.0.9 or earlier anymore: when Groonga 2.1.0 or later opens the old database, the
KEY_NORMALIZE flag information in it is converted to normalizer information, so
Groonga 2.0.9 or earlier can no longer find the KEY_NORMALIZE flag information in the
opened database.

Keys of a table that has a normalizer module are normalized:

Execution example:

load --table Dictionary
[
{"_key": "Apple"},
{"_key": "black"},
{"_key": "COLOR"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
select Dictionary
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# 1,
# "apple"
# ],
# [
# 2,
# "black"
# ],
# [
# 3,
# "color"
# ]
# ]
# ]
# ]

NormalizerAuto normalizer normalizes text by downcasing it. For example, "Apple" is
normalized to "apple", "black" stays "black" and "COLOR" is normalized to "color".

If a table is a lexicon for full text search, tokenized tokens are normalized, because
tokens are stored as table keys and table keys are normalized as described above.

Built-in normalizers
Here is a list of built-in normalizers:

· NormalizerAuto

· NormalizerNFKC51

NormalizerAuto
Normally you should use the NormalizerAuto normalizer. NormalizerAuto is the
normalization that Groonga 2.0.9 or earlier used. The KEY_NORMALIZE flag in table_create
on Groonga 2.0.9 or earlier is equivalent to the --normalizer NormalizerAuto option in
table_create on Groonga 2.1.0 or later.
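
As a sketch of the equivalence, the following command on Groonga 2.0.9 or earlier:

table_create Dictionary TABLE_HASH_KEY|KEY_NORMALIZE ShortText

creates a table equivalent to the one created by this command on Groonga 2.1.0 or later:

table_create Dictionary TABLE_HASH_KEY ShortText --normalizer NormalizerAuto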

NormalizerAuto supports all encodings. It uses Unicode NFKC (Normalization Form
Compatibility Composition) for UTF-8 encoded text. It uses encoding-specific original
normalizations for other encodings. The results of those original normalizations are
similar to NFKC.

For example, half-width katakana (such as U+FF76 HALFWIDTH KATAKANA LETTER KA) +
half-width katakana voiced sound mark (U+FF9E HALFWIDTH KATAKANA VOICED SOUND MARK) is
normalized to full-width katakana with voiced sound mark (U+30AC KATAKANA LETTER GA). The
former is two characters but the latter is one character.

Here is an example that uses NormalizerAuto normalizer:

Execution example:

table_create NormalLexicon TABLE_HASH_KEY ShortText --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]

NormalizerNFKC51
NormalizerNFKC51 normalizes texts by Unicode NFKC (Normalization Form Compatibility
Composition) for Unicode version 5.1. It supports only UTF-8 encoding.

Normally you don't need to use NormalizerNFKC51 explicitly. You can use NormalizerAuto
instead.

Here is an example that uses NormalizerNFKC51 normalizer:

Execution example:

table_create NFKC51Lexicon TABLE_HASH_KEY ShortText --normalizer NormalizerNFKC51
# [[0, 1337566253.89858, 0.000355720520019531], true]

Additional normalizers
There are additional normalizers:

· groonga-normalizer-mysql

See also
· /reference/commands/table_create

Tokenizers
Summary
Groonga has a tokenizer module that tokenizes text. It is used in the following cases:

· Indexing text
  [image] Tokenizer is used when indexing text.

· Searching by query
  [image] Tokenizer is used when searching by query.

The tokenizer is an important module for full text search. You can change the trade-off
between precision and recall by changing the tokenizer.

Normally, TokenBigram is a suitable tokenizer. If you don't know much about tokenizers,
choosing TokenBigram is recommended.

You can try a tokenizer by /reference/commands/tokenize and
/reference/commands/table_tokenize. Here is an example to try TokenBigram tokenizer by
/reference/commands/tokenize:

Execution example:

tokenize TokenBigram "Hello World"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "He"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "el"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "ll"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "lo"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "o "
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": " W"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "Wo"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "or"
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": "rl"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "ld"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "d"
# }
# ]
# ]

What is "tokenize"?
"Tokenize" is the process that extracts zero or more tokens from a text. There are
several tokenize methods.

For example, Hello World is tokenized to the following tokens by the bigram tokenize
method:

· He

· el

· ll

· lo

· o_ (_ means a white-space)

· _W (_ means a white-space)

· Wo

· or

· rl

· ld

In the above example, 10 tokens are extracted from one text Hello World.

For example, Hello World is tokenized to the following tokens by the white-space-separate
tokenize method:

· Hello

· World

In the above example, 2 tokens are extracted from one text Hello World.

Tokens are used as search keys. You can find indexed documents only by tokens that are
extracted by the tokenize method in use. For example, you can find Hello World by ll with
the bigram tokenize method, but you can't find Hello World by ll with the
white-space-separate tokenize method, because the white-space-separate tokenize method
doesn't extract the ll token. It just extracts the Hello and World tokens.

In general, a tokenize method that generates small tokens increases recall but decreases
precision. A tokenize method that generates large tokens increases precision but
decreases recall.

For example, we can find both Hello World and A or B by or with the bigram tokenize
method. Hello World is noise for people who want to search for the logical operator "or".
It means that precision is decreased. But recall is increased.

We can find only A or B by or with the white-space-separate tokenize method, because
World is tokenized to the single token World with the white-space-separate tokenize
method. It means that precision is increased for people who want to search for the
logical operator "or". But recall is decreased because Hello World, which contains or,
isn't found.

Built-in tokenizers
Here is a list of built-in tokenizers:

· TokenBigram

· TokenBigramSplitSymbol

· TokenBigramSplitSymbolAlpha

· TokenBigramSplitSymbolAlphaDigit

· TokenBigramIgnoreBlank

· TokenBigramIgnoreBlankSplitSymbol

· TokenBigramIgnoreBlankSplitAlpha

· TokenBigramIgnoreBlankSplitAlphaDigit

· TokenUnigram

· TokenTrigram

· TokenDelimit

· TokenDelimitNull

· TokenMecab

· TokenRegexp

TokenBigram
TokenBigram is a bigram-based tokenizer. It's recommended for most cases.

The bigram tokenize method tokenizes a text into tokens of two adjacent characters. For
example, Hello is tokenized to the following tokens:

· He

· el

· ll

· lo

The bigram tokenize method is good for recall because you can find all texts by a query
that consists of two or more characters.

In general, you can't find all texts by a query that consists of one character, because
no one-character token exists. But in Groonga you can find all texts even by a
one-character query, because Groonga finds tokens that start with the query by predictive
search. For example, Groonga can find the ll and lo tokens by the query l.

The bigram tokenize method isn't good for precision because you can find texts that
include the query inside a word. For example, you can find world by or. This is more of a
problem for ASCII-only languages than for non-ASCII languages. TokenBigram has a solution
for this problem, described below.

TokenBigram behaves differently when it is used with a normalizer (see
/reference/normalizers).

If no normalizer is used, TokenBigram uses the pure bigram tokenize method (all tokens
except the last one have two characters):

Execution example:

tokenize TokenBigram "Hello World"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "He"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "el"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "ll"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "lo"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "o "
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": " W"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "Wo"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "or"
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": "rl"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "ld"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "d"
# }
# ]
# ]

If a normalizer is used, TokenBigram uses a white-space-separate-like tokenize method for
ASCII characters. TokenBigram uses the bigram tokenize method for non-ASCII characters.

You may be confused by this combined behavior. But it's reasonable for most use cases
such as English text (only ASCII characters) and Japanese text (ASCII and non-ASCII
characters are mixed).

Most languages that consist of only ASCII characters use white-space as the word
separator. The white-space-separate tokenize method is suitable for that case.

Languages that consist of non-ASCII characters don't use white-space as the word
separator. The bigram tokenize method is suitable for that case.

The mixed tokenize method is suitable for the mixed language case.

If you want to use the bigram tokenize method for ASCII characters, see the
TokenBigramSplitXXX type tokenizers such as TokenBigramSplitSymbolAlpha.

Let's confirm TokenBigram's behavior by example.

TokenBigram uses one or more white-spaces as the token delimiter for ASCII characters:

Execution example:

tokenize TokenBigram "Hello World" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "hello"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "world"
# }
# ]
# ]

TokenBigram also uses a change of character type as a token delimiter for ASCII
characters. The character type is one of the following:

· Alphabet

· Digit

· Symbol (such as (, ) and !)

· Hiragana

· Katakana

· Kanji

· Others

The following example shows two token delimiters:

· between 100 (digits) and cents (alphabets)

· between cents (alphabets) and !!! (symbols)

Execution example:

tokenize TokenBigram "100cents!!!" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "100"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "cents"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "!!!"
# }
# ]
# ]

Here is an example where TokenBigram uses the bigram tokenize method for non-ASCII
characters.

Execution example:

tokenize TokenBigram "日本語の勉強" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "日本"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "本語"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "語の"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "の勉"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "勉強"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "強"
# }
# ]
# ]

TokenBigramSplitSymbol
TokenBigramSplitSymbol is similar to TokenBigram. The difference between them is symbol
handling. TokenBigramSplitSymbol tokenizes symbols by the bigram tokenize method:

Execution example:

tokenize TokenBigramSplitSymbol "100cents!!!" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "100"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "cents"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "!"
# }
# ]
# ]

TokenBigramSplitSymbolAlpha
TokenBigramSplitSymbolAlpha is similar to TokenBigram. The difference between them is
symbol and alphabet handling. TokenBigramSplitSymbolAlpha tokenizes symbols and alphabets
by the bigram tokenize method:

Execution example:

tokenize TokenBigramSplitSymbolAlpha "100cents!!!" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "100"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "ce"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "en"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "nt"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "ts"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "s!"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": "!"
# }
# ]
# ]

TokenBigramSplitSymbolAlphaDigit
TokenBigramSplitSymbolAlphaDigit is similar to TokenBigram. The difference between them
is symbol, alphabet and digit handling. TokenBigramSplitSymbolAlphaDigit tokenizes
symbols, alphabets and digits by the bigram tokenize method. This means that all
characters are tokenized by the bigram tokenize method:

Execution example:

tokenize TokenBigramSplitSymbolAlphaDigit "100cents!!!" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "10"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "00"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "0c"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "ce"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "en"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "nt"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "ts"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "s!"
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "!"
# }
# ]
# ]

TokenBigramIgnoreBlank
TokenBigramIgnoreBlank is similar to TokenBigram. The difference between them is blank
handling. TokenBigramIgnoreBlank ignores white-spaces in continuous symbols and non-ASCII
characters.

You can see the difference between them with the text 日 本 語 ! ! ! because it
contains symbols and non-ASCII characters.

Here is a result by TokenBigram:

Execution example:

tokenize TokenBigram "日 本 語 ! ! !" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "日"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "本"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "語"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "!"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "!"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "!"
# }
# ]
# ]

Here is a result by TokenBigramIgnoreBlank:

Execution example:

tokenize TokenBigramIgnoreBlank "日 本 語 ! ! !" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "日本"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "本語"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "語"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "!!!"
# }
# ]
# ]

TokenBigramIgnoreBlankSplitSymbol
TokenBigramIgnoreBlankSplitSymbol is similar to TokenBigram. The differences between them
are the following:

· Blank handling

· Symbol handling

TokenBigramIgnoreBlankSplitSymbol ignores white-spaces in continuous symbols and
non-ASCII characters.

TokenBigramIgnoreBlankSplitSymbol tokenizes symbols by the bigram tokenize method.

You can see the difference between them with the text 日 本 語 ! ! ! because it
contains symbols and non-ASCII characters.

Here is a result by TokenBigram:

Execution example:

tokenize TokenBigram "日 本 語 ! ! !" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "日"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "本"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "語"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "!"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "!"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "!"
# }
# ]
# ]

Here is a result by TokenBigramIgnoreBlankSplitSymbol:

Execution example:

tokenize TokenBigramIgnoreBlankSplitSymbol "日 本 語 ! ! !" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "日本"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "本語"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "語!"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "!"
# }
# ]
# ]

TokenBigramIgnoreBlankSplitSymbolAlpha
TokenBigramIgnoreBlankSplitSymbolAlpha is similar to TokenBigram. The differences between
them are the following:

· Blank handling

· Symbol and alphabet handling

TokenBigramIgnoreBlankSplitSymbolAlpha ignores white-spaces in continuous symbols and
non-ASCII characters.

TokenBigramIgnoreBlankSplitSymbolAlpha tokenizes symbols and alphabets by the bigram
tokenize method.

You can see the difference between them with the text Hello 日 本 語 ! ! ! because it
contains symbols and non-ASCII characters with white-spaces, and alphabets.

Here is a result by TokenBigram:

Execution example:

tokenize TokenBigram "Hello 日 本 語 ! ! !" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "hello"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "日"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "本"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "語"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "!"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "!"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "!"
# }
# ]
# ]

Here is a result by TokenBigramIgnoreBlankSplitSymbolAlpha:

Execution example:

tokenize TokenBigramIgnoreBlankSplitSymbolAlpha "Hello 日 本 語 ! ! !" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "he"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "el"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "ll"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "lo"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "o日"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "日本"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "本語"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "語!"
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "!"
# }
# ]
# ]

TokenBigramIgnoreBlankSplitSymbolAlphaDigit
TokenBigramIgnoreBlankSplitSymbolAlphaDigit is similar to TokenBigram. The differences
between them are the following:

· Blank handling

· Symbol, alphabet and digit handling

TokenBigramIgnoreBlankSplitSymbolAlphaDigit ignores white-spaces in continuous symbols
and non-ASCII characters.

TokenBigramIgnoreBlankSplitSymbolAlphaDigit tokenizes symbols, alphabets and digits by
the bigram tokenize method. This means that all characters are tokenized by the bigram
tokenize method.

You can see the difference between them with the text Hello 日 本 語 ! ! ! 777 because
it contains symbols and non-ASCII characters with white-spaces, alphabets and digits.

Here is a result by TokenBigram:

Execution example:

tokenize TokenBigram "Hello 日 本 語 ! ! ! 777" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "hello"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "日"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "本"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "語"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "!"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "!"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "!"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "777"
# }
# ]
# ]

Here is a result by TokenBigramIgnoreBlankSplitSymbolAlphaDigit:

Execution example:

tokenize TokenBigramIgnoreBlankSplitSymbolAlphaDigit "Hello 日 本 語 ! ! ! 777" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "he"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "el"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "ll"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "lo"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "o日"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "日本"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "本語"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "語!"
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "!!"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "!7"
# },
# {
# "position": 11,
# "force_prefix": false,
# "value": "77"
# },
# {
# "position": 12,
# "force_prefix": false,
# "value": "77"
# },
# {
# "position": 13,
# "force_prefix": false,
# "value": "7"
# }
# ]
# ]

TokenUnigram
TokenUnigram is similar to TokenBigram. The difference between them is the token unit.
TokenBigram uses 2 characters per token. TokenUnigram uses 1 character per token.

Execution example:

tokenize TokenUnigram "100cents!!!" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "100"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "cents"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "!!!"
# }
# ]
# ]

TokenTrigram
TokenTrigram is similar to TokenBigram. The difference between them is the token unit.
TokenBigram uses 2 characters per token. TokenTrigram uses 3 characters per token.

Execution example:

tokenize TokenTrigram "10000cents!!!!!" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "10000"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "cents"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "!!!!!"
# }
# ]
# ]

TokenDelimit
TokenDelimit extracts tokens by splitting the text on one or more space characters
(U+0020). For example, Hello World is tokenized to Hello and World.

TokenDelimit is suitable for tag text. You can extract groonga, full-text-search and http
as tags from groonga full-text-search http.

Here is an example of TokenDelimit:

Execution example:

tokenize TokenDelimit "Groonga full-text-search HTTP" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "groonga"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "full-text-search"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "http"
# }
# ]
# ]

TokenDelimitNull
TokenDelimitNull is similar to TokenDelimit. The difference between them is the separator
character. TokenDelimit uses the space character (U+0020) but TokenDelimitNull uses the
NUL character (U+0000).

TokenDelimitNull is also suitable for tag text.

Here is an example of TokenDelimitNull:

Execution example:

tokenize TokenDelimitNull "Groonga\u0000full-text-search\u0000HTTP" NormalizerAuto
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "groongau0000full-text-searchu0000http"
# }
# ]
# ]

TokenMecab
TokenMecab is a tokenizer based on the MeCab part-of-speech and morphological analyzer.

MeCab doesn't depend on Japanese. You can use MeCab for other languages by creating a
dictionary for those languages. You can use the NAIST Japanese Dictionary for Japanese.

TokenMecab is good for precision rather than recall. With TokenBigram you can find both
東京都 and 京都 texts by the query 京都, but 東京都 isn't expected. With TokenMecab you
can find only the 京都 text by the query 京都.

If you want to support neologisms, you need to keep updating your MeCab dictionary, which
has a maintenance cost. (TokenBigram doesn't require dictionary maintenance because
TokenBigram doesn't use a dictionary.) mecab-ipadic-NEologd : Neologism dictionary for
MeCab may help you.

Here is an example of TokenMecab. 東京都 is tokenized to 東京 and 都. They don't include
京都:

Execution example:

tokenize TokenMecab "東京都"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "東京"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "都"
# }
# ]
# ]

TokenRegexp
New in version 5.0.1.

CAUTION:
This tokenizer is experimental. Specification may be changed.

CAUTION:
This tokenizer can be used only with UTF-8. You can't use this tokenizer with EUC-JP,
Shift_JIS and so on.

TokenRegexp is a tokenizer for supporting regular expression search by index.

In general, regular expression search is evaluated as sequential search. But the following
cases can be evaluated as index search:

· Literal only case such as hello

· The beginning of text and literal case such as \A/home/alice

· The end of text and literal case such as \.txt\z

In most cases, index search is faster than sequential search.

TokenRegexp is based on the bigram tokenize method. TokenRegexp adds a beginning-of-text
mark (U+FFEF) at the beginning of the text and an end-of-text mark (U+FFF0) at the end of
the text when you index text:

Execution example:

tokenize TokenRegexp "/home/alice/test.txt" NormalizerAuto --mode ADD
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# {
# "position": 0,
# "force_prefix": false,
# "value": "￯"
# },
# {
# "position": 1,
# "force_prefix": false,
# "value": "/h"
# },
# {
# "position": 2,
# "force_prefix": false,
# "value": "ho"
# },
# {
# "position": 3,
# "force_prefix": false,
# "value": "om"
# },
# {
# "position": 4,
# "force_prefix": false,
# "value": "me"
# },
# {
# "position": 5,
# "force_prefix": false,
# "value": "e/"
# },
# {
# "position": 6,
# "force_prefix": false,
# "value": "/a"
# },
# {
# "position": 7,
# "force_prefix": false,
# "value": "al"
# },
# {
# "position": 8,
# "force_prefix": false,
# "value": "li"
# },
# {
# "position": 9,
# "force_prefix": false,
# "value": "ic"
# },
# {
# "position": 10,
# "force_prefix": false,
# "value": "ce"
# },
# {
# "position": 11,
# "force_prefix": false,
# "value": "e/"
# },
# {
# "position": 12,
# "force_prefix": false,
# "value": "/t"
# },
# {
# "position": 13,
# "force_prefix": false,
# "value": "te"
# },
# {
# "position": 14,
# "force_prefix": false,
# "value": "es"
# },
# {
# "position": 15,
# "force_prefix": false,
# "value": "st"
# },
# {
# "position": 16,
# "force_prefix": false,
# "value": "t."
# },
# {
# "position": 17,
# "force_prefix": false,
# "value": ".t"
# },
# {
# "position": 18,
# "force_prefix": false,
# "value": "tx"
# },
# {
# "position": 19,
# "force_prefix": false,
# "value": "xt"
# },
# {
# "position": 20,
# "force_prefix": false,
# "value": "t"
# },
# {
# "position": 21,
# "force_prefix": false,
# "value": "￰"
# }
# ]
# ]

Token filters
Summary
Groonga has a token filter module that processes tokenized tokens.

Token filter modules can be added as plugins.

You can customize how tokens are processed by registering your token filter plugins to
Groonga.

A table can have zero or more token filters. You can attach token filters to a table with
the table-create-token-filters option of /reference/commands/table_create.

Here is an example table_create that uses TokenFilterStopWord token filter module:

Execution example:

register token_filters/stop_word
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto \
--token_filters TokenFilterStopWord
# [[0, 1337566253.89858, 0.000355720520019531], true]

Available token filters
Here is the list of available token filters:

· TokenFilterStopWord

· TokenFilterStem

TokenFilterStopWord
TokenFilterStopWord removes stop words from tokenized tokens when searching documents.

You can specify stop words after adding documents, because TokenFilterStopWord removes
tokens at search time.

Stop words are specified via the is_stop_word column on the lexicon table.

Here is an example that uses TokenFilterStopWord token filter:

Execution example:

register token_filters/stop_word
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Memos TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Memos content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto \
--token_filters TokenFilterStopWord
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms is_stop_word COLUMN_SCALAR Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Terms
[
{"_key": "and", "is_stop_word": true}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
load --table Memos
[
{"content": "Hello"},
{"content": "Hello and Good-bye"},
{"content": "Good-bye"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
select Memos --match_columns content --query "Hello and"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 1,
# "Hello"
# ],
# [
# 2,
# "Hello and Good-bye"
# ]
# ]
# ]
# ]

The and token is marked as a stop word in the Terms table.

"Hello", which doesn't contain and in its content, is also matched, because and is a stop
word and is removed from the query.

TokenFilterStem
TokenFilterStem stems tokenized tokens.

Here is an example that uses TokenFilterStem token filter:

Execution example:

register token_filters/stem
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Memos TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Memos content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto \
--token_filters TokenFilterStem
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Memos
[
{"content": "I develop Groonga"},
{"content": "I'm developing Groonga"},
{"content": "I developed Groonga"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
select Memos --match_columns content --query "develops"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 1,
# "I develop Groonga"
# ],
# [
# 2,
# "I'm developing Groonga"
# ],
# [
# 3,
# "I developed Groonga"
# ]
# ]
# ]
# ]

All of the develop, developing, developed and develops tokens are stemmed to develop. So
we can find develop, developing and developed by the develops query.

See also
· /reference/commands/table_create

Query expanders
QueryExpanderTSV
Summary
QueryExpanderTSV is a query expander plugin that reads synonyms from a TSV (Tab Separated
Values) file. This plugin provides fewer features than the embedded query expansion
feature; for example, it doesn't support word normalization. But it may be easier to use
because you can manage your synonyms in a TSV file. You can edit your synonyms with a
spreadsheet application such as Excel. With the embedded query expansion feature, you
manage your synonyms in a Groonga table.

Install
You need to register query_expanders/tsv as a plugin before you use QueryExpanderTSV:

plugin_register query_expanders/tsv

Usage
You just add --query_expander QueryExpanderTSV parameter to select command:

select --query "QUERY" --query_expander QueryExpanderTSV

If QUERY has registered synonyms, they are expanded. For example, there are the following
synonyms.

┌────────┬───────────┬───────────────┐
│word │ synonym 1 │ synonym 2 │
├────────┼───────────┼───────────────┤
│groonga │ groonga │ Senna │
├────────┼───────────┼───────────────┤
│mroonga │ mroonga │ groonga MySQL │
└────────┴───────────┴───────────────┘

The table means that synonym 1 and synonym 2 are synonyms of word. For example, groonga
and Senna are synonyms of groonga. And mroonga and groonga MySQL are synonyms of mroonga.

Here is an example of query expansion that uses groonga as the query:

select --query "groonga" --query_expander QueryExpanderTSV

The above command equals to the following command:

select --query "groonga OR Senna" --query_expander QueryExpanderTSV

Here is another example of query expansion that uses mroonga search as the query:

select --query "mroonga search" --query_expander QueryExpanderTSV

The above command equals to the following command:

select --query "(mroonga OR (groonga MySQL)) search" --query_expander QueryExpanderTSV

It is important that only registered words (groonga and mroonga) are expanded to
synonyms; unregistered words (search) are not expanded. Query expansion doesn't occur
recursively: groonga appears in (mroonga OR (groonga MySQL)) as a query expansion result,
but it isn't expanded again.

Normally, you need to include the word itself in its synonyms. For example, groonga and
mroonga are included in their own synonyms. If you want to ignore the word itself, don't
include it in the synonyms. For example, if you want to use query expansion as spelling
correction, you should use the following synonyms.

┌───────┬─────────┐
│word │ synonym │
├───────┼─────────┤
│gronga │ groonga │
└───────┴─────────┘

gronga in the word column has a typo: an o is missing. groonga in the synonym column is
the correct word.

Here is an example of using query expansion as spelling correction:

select --query "gronga" --query_expander QueryExpanderTSV

The above command equals to the following command:

select --query "groonga" --query_expander QueryExpanderTSV

The former command has a typo in --query value but the latter command doesn't have any
typos.

TSV File
Synonyms are defined in a TSV format file. This section describes it.

Location
The file name should be synonyms.tsv and it is located in the configuration directory.
For example, /etc/groonga/synonyms.tsv is a TSV file location. The location is decided at
build time.

You can change the location by environment variable GRN_QUERY_EXPANDER_TSV_SYNONYMS_FILE
at run time:

% env GRN_QUERY_EXPANDER_TSV_SYNONYMS_FILE=/tmp/synonyms.tsv groonga

With the above command, /tmp/synonyms.tsv file is used.

Format
You can define zero or more synonyms in a TSV file. You define a word and synonyms pair by
a line. word is expanded to synonyms in --query value. Synonyms are combined by OR. For
example, groonga and Senna synonyms are expanded as groonga OR Senna.

The first column is word and the rest columns are synonyms of the word. Here is a sample
line for word is groonga and synonyms are groonga and Senna. (TAB) means a tab character
(U+0009):

groonga(TAB)groonga(TAB)Senna

Comment line is supported. Lines that start with # are ignored. Here is an example for
comment line. groonga line is ignored as comment line:

#groonga(TAB)groonga(TAB)Senna
mroonga(TAB)mroonga(TAB)groonga MySQL

Limitation
You need to restart groonga to reload your synonyms. TSV file is loaded only at the plugin
load time.

See also
· select-query-expansion

Scorer
Summary
Groonga has a scorer module that customizes the score function. The score function
computes the score of each matched record. The default score function uses the number of
appearances of the search terms. It is also known as TF (term frequency).

TF is a fast score function but it's not suitable for the following cases:

· Search query contains one or more frequently-appearing words such as "the" and "a".

· Document contains many same keywords such as "They are keyword, keyword, keyword ...
and keyword". Search engine spammer may use the technique.

A different score function can solve these cases. For example, TF-IDF (term
frequency-inverse document frequency) can solve the first case. Okapi BM25 can solve the
second case. But they are slower than TF.

Groonga provides TF-IDF based scorer as /reference/scorers/scorer_tf_idf but doesn't
provide Okapi BM25 based scorer yet.
You don't need to solve scoring only with the score function. The score function
depends highly on the search query. You may also be able to use metadata of the
matched records.

For example, Google uses PageRank for scoring. You may be able to use data type
("title" data are more important than "memo" data), tag, geolocation and so on.

Don't think about only the score function for scoring.

Usage
This section describes how to use scorer.

Here are a schema definition and sample data to show usage.

Sample schema:

Execution example:

table_create Memos TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Memos title COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Memos content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms title_index COLUMN_INDEX|WITH_POSITION Memos title
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms content_index COLUMN_INDEX|WITH_POSITION Memos content
# [[0, 1337566253.89858, 0.000355720520019531], true]

Sample data:

Execution example:

load --table Memos
[
{
"_key": "memo1",
"title": "Groonga is easy",
"content": "Groonga is very easy full text search engine!"
},
{
"_key": "memo2",
"title": "Mroonga is easy",
"content": "Mroonga is more easier full text search engine!"
},
{
"_key": "memo3",
"title": "Rroonga is easy",
"content": "Ruby is very helpful."
},
{
"_key": "memo4",
"title": "Groonga is fast",
"content": "Groonga! Groonga! Groonga! Groonga is very fast!"
},
{
"_key": "memo5",
"title": "PGroonga is fast",
"content": "PGroonga is very fast!"
},
{
"_key": "memo6",
"title": "PGroonga is useful",
"content": "SQL is easy because many client libraries exist."
},
{
"_key": "memo7",
"title": "Mroonga is also useful",
"content": "MySQL has replication feature. Mroonga can use it."
}
]
# [[0, 1337566253.89858, 0.000355720520019531], 7]

You can specify a custom score function in select-match-columns. There are several syntaxes.

For a score function that doesn't require any parameters, such as
/reference/scorers/scorer_tf_idf:

SCORE_FUNCTION(COLUMN)

You can specify a weight:

SCORE_FUNCTION(COLUMN) * WEIGHT

For a score function that requires one or more parameters, such as
/reference/scorers/scorer_tf_at_most:

SCORE_FUNCTION(COLUMN, ARGUMENT1, ARGUMENT2, ...)

You can specify a weight:

SCORE_FUNCTION(COLUMN, ARGUMENT1, ARGUMENT2, ...) * WEIGHT

You can use a different score function for each match column in select-match-columns:

SCORE_FUNCTION1(COLUMN1) ||
SCORE_FUNCTION2(COLUMN2) * WEIGHT ||
SCORE_FUNCTION3(COLUMN3, ARGUMENT1) ||
...

Here is the simplest example:

Execution example:

select Memos \
--match_columns "scorer_tf_idf(content)" \
--query "Groonga" \
--output_columns "content, _score" \
--sortby "-_score"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "content",
# "Text"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Groonga! Groonga! Groonga! Groonga is very fast!",
# 2
# ],
# [
# "Groonga is very easy full text search engine!",
# 1
# ]
# ]
# ]
# ]

Groonga! Groonga! Groonga! Groonga is very fast! contains four Groonga terms. With the TF
based scorer, which is the default, _score would be 4. But the actual _score is 2 because
the select command uses the TF-IDF based scorer scorer_tf_idf().

Here is an example that uses a weight:

Execution example:

select Memos \
--match_columns "scorer_tf_idf(content) * 10" \
--query "Groonga" \
--output_columns "content, _score" \
--sortby "-_score"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "content",
# "Text"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Groonga! Groonga! Groonga! Groonga is very fast!",
# 22
# ],
# [
# "Groonga is very easy full text search engine!",
# 10
# ]
# ]
# ]
# ]

Groonga! Groonga! Groonga! Groonga is very fast! has 22 as _score. It had 2 as _score in
the previous example, which doesn't specify a weight. (The weight is applied to the
underlying Float score before it is cast to Int32, so the result isn't exactly 2 * 10.)

Here is an example that uses a scorer that requires an argument. The
/reference/scorers/scorer_tf_at_most scorer requires one argument. You can limit the TF
score with this scorer.

Execution example:

select Memos \
--match_columns "scorer_tf_at_most(content, 2.0)" \
--query "Groonga" \
--output_columns "content, _score" \
--sortby "-_score"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "content",
# "Text"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Groonga! Groonga! Groonga! Groonga is very fast!",
# 2
# ],
# [
# "Groonga is very easy full text search engine!",
# 1
# ]
# ]
# ]
# ]

Groonga! Groonga! Groonga! Groonga is very fast! contains four Groonga terms. With the
normal TF based scorer, which is the default, _score would be 4. But the actual _score is 2
because the scorer used in the select command limits the maximum score value to 2.

Here is an example that uses multiple scorers:

Execution example:

select Memos \
--match_columns "scorer_tf_idf(title) || scorer_tf_at_most(content, 2.0)" \
--query "Groonga" \
--output_columns "title, content, _score" \
--sortby "-_score"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "title",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Groonga is fast",
# "Groonga! Groonga! Groonga! Groonga is very fast!",
# 3
# ],
# [
# "Groonga is easy",
# "Groonga is very easy full text search engine!",
# 2
# ]
# ]
# ]
# ]

The --match_columns uses scorer_tf_idf(title) and scorer_tf_at_most(content, 2.0). The
_score value is the sum of their scores.

You can use the default scorer and a custom scorer in the same --match_columns. You use the
default scorer by just specifying a match column:

Execution example:

select Memos \
--match_columns "title || scorer_tf_at_most(content, 2.0)" \
--query "Groonga" \
--output_columns "title, content, _score" \
--sortby "-_score"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "title",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Groonga is fast",
# "Groonga! Groonga! Groonga! Groonga is very fast!",
# 3
# ],
# [
# "Groonga is easy",
# "Groonga is very easy full text search engine!",
# 2
# ]
# ]
# ]
# ]

The --match_columns uses the default scorer (TF) for title and
/reference/scorers/scorer_tf_at_most for content. The _score value is the sum of their
scores.

Built-in scorers
Here are the built-in scorers:

scorer_tf_at_most
NOTE:
This scorer is an experimental feature.

New in version 5.0.1.

Summary
scorer_tf_at_most is a scorer based on TF (term frequency).

TF based scorers, including the TF-IDF based scorer, have a problem in the following case:

If a document contains the same keyword many times, such as "They are keyword, keyword,
keyword ... and keyword", the document gets a high score. This is not expected. Search
engine spammers may use this technique.

scorer_tf_at_most is a TF based scorer but it can solve this case.

scorer_tf_at_most limits the maximum score value. In other words, it limits the effect of a
match: the score is min(TF, max).

Even if a document contains the same keyword many times, such as "They are keyword,
keyword, keyword ... and keyword", scorer_tf_at_most(column, 2.0) returns at most 2 as the
score.
You don't need to solve scoring only with the score function. The score function highly
depends on the search query. You may also be able to use metadata of matched records.

For example, Google uses PageRank for scoring. You may be able to use data type
("title" data are more important than "memo" data), tags, geolocation and so on.

Don't think about only the score function when you think about scoring.

Syntax
This scorer has two parameters:

scorer_tf_at_most(column, max)
scorer_tf_at_most(index, max)

Usage
This section describes how to use this scorer.

Here are a schema definition and sample data to show usage.

Sample schema:

Execution example:

table_create Logs TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs message COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms message_index COLUMN_INDEX|WITH_POSITION Logs message
# [[0, 1337566253.89858, 0.000355720520019531], true]

Sample data:

Execution example:

load --table Logs
[
{"message": "Notice"},
{"message": "Notice Notice"},
{"message": "Notice Notice Notice"},
{"message": "Notice Notice Notice Notice"},
{"message": "Notice Notice Notice Notice Notice"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 5]

You specify scorer_tf_at_most in select-match-columns like the following:

Execution example:

select Logs \
--match_columns "scorer_tf_at_most(message, 3.0)" \
--query "Notice" \
--output_columns "message, _score" \
--sortby "-_score"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "message",
# "Text"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Notice Notice Notice Notice Notice",
# 3
# ],
# [
# "Notice Notice Notice Notice",
# 3
# ],
# [
# "Notice Notice Notice",
# 3
# ],
# [
# "Notice Notice",
# 2
# ],
# [
# "Notice",
# 1
# ]
# ]
# ]
# ]

If a document has three or more Notice terms, its score is 3, because the select command
specifies 3.0 as the max score.

If a document has one or two Notice terms, its score is 1 or 2, because the score is less
than the max score 3.0.

Parameters
This section describes all parameters.

Required parameters
The first parameter is required. You must specify either column or index.

column
The data column that is the match target. The data column must be indexed.

index
The index column to be used for search.

Optional parameters
There is no optional parameter.

Return value
This scorer returns the score as builtin-type-float.

/reference/commands/select returns _score as Int32, not Float, because it casts the Float
to Int32 to keep backward compatibility.

The score is computed as TF with the specified limit.

See also
· ../scorer

scorer_tf_idf
NOTE:
This scorer is an experimental feature.

New in version 5.0.1.

Summary
scorer_tf_idf is a scorer based on the TF-IDF (term frequency-inverse document frequency)
score function.

To put it simply, TF-IDF is TF (term frequency) divided by DF (document frequency). "TF"
means that "more occurrences are more important". "TF divided by DF" means that
"occurrences of an important term are more important".

The default score function in Groonga is TF (term frequency). It doesn't care about term
importance but is fast.

TF-IDF cares about term importance but is slower than TF.

TF-IDF computes a more suitable score than TF in many cases. But it's not perfect.

If a document contains the same keyword many times, such as "They are keyword, keyword,
keyword ... and keyword", its score increases with both TF and TF-IDF. Search engine
spammers may use this technique, and TF-IDF doesn't guard against it.

Okapi BM25 can solve this case, but it's slower than TF-IDF and not implemented in Groonga
yet.

Groonga provides the scorer_tf_at_most scorer, which can also solve this case.
You don't need to solve scoring only with the score function. The score function highly
depends on the search query. You may also be able to use metadata of matched records.

For example, Google uses PageRank for scoring. You may be able to use data type
("title" data are more important than "memo" data), tags, geolocation and so on.

Don't think about only the score function when you think about scoring.

Syntax
This scorer has only one parameter:

scorer_tf_idf(column)
scorer_tf_idf(index)

Usage
This section describes how to use this scorer.

Here are a schema definition and sample data to show usage.

Sample schema:

Execution example:

table_create Logs TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs message COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms message_index COLUMN_INDEX|WITH_POSITION Logs message
# [[0, 1337566253.89858, 0.000355720520019531], true]

Sample data:

Execution example:

load --table Logs
[
{"message": "Error"},
{"message": "Warning"},
{"message": "Warning Warning"},
{"message": "Warning Warning Warning"},
{"message": "Info"},
{"message": "Info Info"},
{"message": "Info Info Info"},
{"message": "Info Info Info Info"},
{"message": "Notice"},
{"message": "Notice Notice"},
{"message": "Notice Notice Notice"},
{"message": "Notice Notice Notice Notice"},
{"message": "Notice Notice Notice Notice Notice"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 13]

You specify scorer_tf_idf in select-match-columns like the following:

Execution example:

select Logs \
--match_columns "scorer_tf_idf(message)" \
--query "Error OR Info" \
--output_columns "message, _score" \
--sortby "-_score"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "message",
# "Text"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Info Info Info Info",
# 3
# ],
# [
# "Error",
# 2
# ],
# [
# "Info Info Info",
# 2
# ],
# [
# "Info Info",
# 1
# ],
# [
# "Info",
# 1
# ]
# ]
# ]
# ]

Both the score of Info Info Info and the score of Error are 2, even though Info Info Info
includes three Info terms. This is because Error is a more important term than Info: the
number of documents that include Info is 4, while the number of documents that include
Error is 1. A term that is included in fewer documents is more characteristic, and a
characteristic term is an important term.

Parameters
This section describes all parameters.

Required parameters
There is only one required parameter. You must specify either column or index.

column
The data column that is the match target. The data column must be indexed.

index
The index column to be used for search.

Optional parameters
There is no optional parameter.

Return value
This scorer returns the score as builtin-type-float.

/reference/commands/select returns _score as Int32, not Float, because it casts the Float
to Int32 to keep backward compatibility.

The score is computed by a TF-IDF based algorithm.

See also
· ../scorer

grn_expr
grn_expr is an object that searches records with specified conditions and manipulates a
database. It's pronounced "gurun expression".

Conditions for searching records in a database can be represented by combining condition
expressions, such as the equal condition expression and the less than condition expression,
with set operations such as AND, OR and NOT. grn_expr executes those conditions to search
records. You can also use advanced searches such as similar search and near search with
grn_expr, as well as flexible full text search. For example, you can control hit scores for
specified words and improve recall by dynamically re-searching with a high-recall
algorithm. The number of matched records is used to determine whether to re-search or not.

There are three ways to create grn_expr:

· Parsing /reference/grn_expr/query_syntax string.

· Parsing /reference/grn_expr/script_syntax string.

· Calling grn_expr related APIs.

/reference/grn_expr/query_syntax is for the common search form used on Internet search
sites. It's simple and easy to use but it has limitations: you cannot use all condition
expressions and set operations in /reference/grn_expr/query_syntax. You can use
/reference/grn_expr/query_syntax with the query option in /reference/commands/select.

/reference/grn_expr/script_syntax is an ECMAScript-like syntax. You can use all condition
expressions and set operations in /reference/grn_expr/script_syntax. You can use
/reference/grn_expr/script_syntax with the filter option and the scorer option in
/reference/commands/select.

You can use Groonga as a library and create a grn_expr by calling grn_expr related APIs.
Calling APIs gives you the full features, like /reference/grn_expr/script_syntax, and is
useful for creating a custom syntax that builds a grn_expr. The APIs are used in Rroonga,
the Ruby bindings of Groonga. Rroonga can create a grn_expr with Ruby's syntax instead of
parsing a string, as the following sketch shows.
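
Here is a sketch in Rroonga (assuming a database at /tmp/db that contains the Entries table
defined below; the select block is translated into a grn_expr through the C API):

require "groonga"

Groonga::Database.open("/tmp/db")
entries = Groonga["Entries"]
# Full text match on content AND a numeric comparison, combined with &.
matched = entries.select do |record|
  (record.content =~ "fast") & (record.n_likes >= 10)
end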

Query syntax
Query syntax is a syntax to specify search conditions for a common Web search form. It is
similar to the syntax of Google's search form. For example, word1 word2 means that groonga
searches for records that contain both word1 and word2. word1 OR word2 means that groonga
searches for records that contain either word1 or word2.

Query syntax consists of conditional expressions, combined expressions and assignment
expressions. Normally, assignment expressions can be ignored because they are disabled in
the --query option of /reference/commands/select. You can use them if you use Groonga as a
library and customize the query syntax parser options.

A conditional expression specifies a condition. A combined expression consists of one or
more conditional expressions, combined expressions or assignment expressions. An assignment
expression assigns a value to a column.

Sample data
Here are a schema definition and sample data to show usage.

Execution example:

table_create Entries TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries n_likes COLUMN_SCALAR UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_key_index COLUMN_INDEX|WITH_POSITION Entries _key
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_content_index COLUMN_INDEX|WITH_POSITION Entries content
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Entries
[
{"_key": "The first post!",
"content": "Welcome! This is my first post!",
"n_likes": 5},
{"_key": "Groonga",
"content": "I started to use Groonga. It's very fast!",
"n_likes": 10},
{"_key": "Mroonga",
"content": "I also started to use Mroonga. It's also very fast! Really fast!",
"n_likes": 15},
{"_key": "Good-bye Senna",
"content": "I migrated all Senna system!",
"n_likes": 3},
{"_key": "Good-bye Tritonn",
"content": "I also migrated all Tritonn system!",
"n_likes": 3}
]
# [[0, 1337566253.89858, 0.000355720520019531], 5]

There is a table, Entries, for blog entries. An entry has a title, content and the number
of likes for the entry. The title is the key of Entries. The content is the value of the
Entries.content column. The number of likes is the value of the Entries.n_likes column.

The Entries._key column and Entries.content column are indexed using the TokenBigram
tokenizer. So both Entries._key and Entries.content are full text search ready.

OK. The schema and data for examples are ready.

Escape
There are special characters in query syntax. To use a special character as itself, it
should be escaped by prepending \. For example, " is a special character. It is escaped as
\".

Here is a special character list:

· [space] (escaped as [backslash][space]) (You should substitute [space] with a white
space character that is 0x20 in ASCII and [backslash] with \\.)

· " (escaped as \")

· ' (escaped as \')

· ( (escaped as \()

· ) (escaped as \))

· \ (escaped as \\)

You can use quoting instead of escaping special characters, except for \ (backslash). You
still need to escape a backslash as \\ inside quotes.

Quote syntax is "..." or '...'. You need to escape " as \" in the "..." quote syntax, and '
as \' in the '...' quote syntax. For example, Alice's brother (Bob) can be quoted as
"Alice's brother (Bob)" or 'Alice\'s brother (Bob)'.

NOTE:
There is an important point you have to take care of: the \ (backslash) character is
interpreted by the command line shell. So if you want to search for ( itself, for
example, you need to escape it twice (\\() on the command line. The shell interprets
\\( as \(, then passes that literal to Groonga. Groonga regards \( as (, then searches
for ( itself in the database. If you can't perform the intended search with Groonga,
confirm whether the special characters are escaped properly.
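
For example, here is a sketch of searching for the phrase Alice's brother (Bob) from a
command line shell (assuming a database at /tmp/db with the Entries table below). The
shell's double quotes turn \" into ", so Groonga receives a quoted phrase:

% groonga /tmp/db select Entries --match_columns content --query "\"Alice's brother (Bob)\""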

Conditional expression
Here is available conditional expression list.

Full text search condition
Its syntax is keyword.

Full text search condition specifies a full text search condition against the default
match columns. Match columns are full text search target columns.

You should specify the default match columns for full text search. They can be specified by
the --match_columns option of /reference/commands/select. If you don't specify the default
match columns, this conditional expression fails.

This conditional expression does full text search with keyword. keyword should not contain
any spaces. If keyword contains a space, such as search keyword, it means two full text
search conditions: search and keyword. If you want to specify a keyword that contains one
or more spaces, you can use the phrase search condition described below.

Here is a simple example.

Execution example:

select Entries --match_columns content --query fast
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression matches records that contain the word fast in the content column value.

The content column is the default match column.

Phrase search condition
Its syntax is "search keyword".

Phrase search condition specifies a phrase search condition against the default match
columns.

You should specify the default match columns for full text search. They can be specified by
the --match_columns option of /reference/commands/select. If you don't specify the default
match columns, this conditional expression fails.

This conditional expression does phrase search with search keyword. Phrase search searches
for records that contain search and keyword where those terms appear adjacently and in that
order. Thus, Put a search keyword in the form is matched, but Search by the keyword and
There is a keyword. Search by it! aren't matched.

Here is a simple example.

Execution example:

select Entries --match_columns content --query '"I started"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]

The expression matches records that contain the phrase I started in the content column
value. I also started isn't matched because I and started aren't adjacent.

The content column is the default match column.

Full text search condition (with explicit match column)
Its syntax is column:@keyword.

It's similar to the full text search condition but it doesn't require the default match
columns. You specify the match column for the full text search condition with column:
instead of the --match_columns option of /reference/commands/select.

This conditional expression is useful when you want to use two or more full text searches
against different columns. The default match columns specified by the --match_columns
option can't be specified multiple times, so you need to specify the second match column
with this conditional expression.

The difference between full text search condition and full text search condition (with
explicit match column) is whether advanced match columns are supported or not. Full text
search condition supports advanced match columns but full text search condition (with
explicit match column) doesn't. Advanced match columns have the following features:

· Weight is supported.

· Using multiple columns is supported.

· Using an index column as a match column is supported.

See description of --match_columns option of /reference/commands/select about them.

Here is a simple example.

Execution example:

select Entries --query content:@fast
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression matches records that contain the word fast in the content column value.

Phrase search condition (with explicit match column)
Its syntax is column:@"search keyword".

It's similar to the phrase search condition but it doesn't require the default match
columns. You specify the match column for the phrase search condition with column: instead
of the --match_columns option of /reference/commands/select.

The difference between phrase search condition and phrase search condition (with explicit
match column) is similar to the difference between full text search condition and full text
search condition (with explicit match column). Phrase search condition supports advanced
match columns but phrase search condition (with explicit match column) doesn't. See the
description of full text search condition (with explicit match column) about advanced match
columns.

Here is a simple example.

Execution example:

select Entries --query 'content:@"I started"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]

The expression matches records that contain the phrase I started in the content column
value. I also started isn't matched because I and started aren't adjacent.

Prefix search condition
Its syntax is column:^value or value*.

This conditional expression does prefix search with value. Prefix search searches for
records that contain a word that starts with value.

You can use fast prefix search against a column. The column must be indexed and the index
table must be a patricia trie table (TABLE_PAT_KEY) or a double array trie table
(TABLE_DAT_KEY). You can also use fast prefix search against the _key pseudo column of a
patricia trie table or a double array trie table. You don't need to index _key.

Prefix search can be used with other table types but it causes a scan over all records.
That's not a problem for a small number of records, but it takes more time for a large
number of records.

Unlike full text search condition and phrase search condition, it doesn't require the
default match columns.

Here is a simple example.

Execution example:

select Entries --query '_key:^Goo'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ]
# ]
# ]
# ]

The expression matches records that contain a word that starts with Goo in the _key pseudo
column value. Good-bye Senna and Good-bye Tritonn match the expression.

Suffix search condition
Its syntax is column:$value.

This conditional expression does suffix search with value. Suffix search searches for
records that contain a word that ends with value.

You can use fast suffix search against a column. The column must be indexed and the index
table must be a patricia trie table (TABLE_PAT_KEY) with the KEY_WITH_SIS flag. You can
also use fast suffix search against the _key pseudo column of a patricia trie table
(TABLE_PAT_KEY) with the KEY_WITH_SIS flag. You don't need to index _key. We recommend that
you use index column based fast suffix search instead of _key based fast suffix search.
_key based fast suffix search returns automatically registered substrings. (TODO: write
document about suffix search and link to it from here.)

NOTE:
Fast suffix search can be used only for non-ASCII characters such as hiragana in
Japanese. You cannot use fast suffix search for ASCII characters.

Suffix search can be used with other table types, or with a patricia trie table without the
KEY_WITH_SIS flag, but it causes a scan over all records. That's not a problem for a small
number of records, but it takes more time for a large number of records.

Unlike full text search condition and phrase search condition, it doesn't require the
default match columns.

Here is a simple example. It uses fast suffix search for hiragana in Japanese, which is
non-ASCII.

Execution example:

table_create Titles TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Titles content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create SuffixSearchTerms TABLE_PAT_KEY|KEY_WITH_SIS ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create SuffixSearchTerms index COLUMN_INDEX Titles content
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Titles
[
{"content": "ぐるんが"},
{"content": "むるんが"},
{"content": "せな"},
{"content": "とりとん"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 4]
select Titles --query 'content:$んが'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 2,
# "むるんが"
# ],
# [
# 1,
# "ぐるんが"
# ]
# ]
# ]
# ]

The expression matches records whose content column value ends with んが. ぐるんが and
むるんが match the expression.

Equal condition
Its syntax is column:value.

It matches records whose column value is equal to value.

Unlike full text search condition and phrase search condition, it doesn't require the
default match columns.

Here is a simple example.

Execution example:

select Entries --query _key:Groonga
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]

The expression matches records whose _key column value is equal to Groonga.

Not equal condition
Its syntax is column:!value.

It matches records whose column value isn't equal to value.

Unlike full text search condition and phrase search condition, it doesn't require the
default match columns.

Here is a simple example.

Execution example:

select Entries --query _key:!Groonga
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose _key column value is not equal to Groonga.

Less than condition
Its syntax is column:<value.

It matches records whose column value is less than value.

If the column type is a numerical type such as Int32, the column value and value are
compared as numbers. If the column type is a text type such as ShortText, they are compared
as bit sequences.

Unlike full text search condition and phrase search condition, it doesn't require the
default match columns.

Here is a simple example.

Execution example:

select Entries --query n_likes:<10
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is less than 10.

Greater than condition
Its syntax is column:>value.

It matches records whose column value is greater than value.

If the column type is a numerical type such as Int32, the column value and value are
compared as numbers. If the column type is a text type such as ShortText, they are compared
as bit sequences.

Unlike full text search condition and phrase search condition, it doesn't require the
default match columns.

Here is a simple example.

Execution example:

select Entries --query n_likes:>10
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is greater than 10.

Less than or equal to condition
Its syntax is column:<=value.

It matches records whose column value is less than or equal to value.

If the column type is a numerical type such as Int32, the column value and value are
compared as numbers. If the column type is a text type such as ShortText, they are compared
as bit sequences.

Unlike full text search condition and phrase search condition, it doesn't require the
default match columns.

Here is a simple example.

Execution example:

select Entries --query n_likes:<=10
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is less than or equal to 10.

Greater than or equal to condition
Its syntax is column:>=value.

It matches records whose column value is greater than or equal to value.

If the column type is a numerical type such as Int32, the column value and value are
compared as numbers. If the column type is a text type such as ShortText, they are compared
as bit sequences.

Unlike full text search condition and phrase search condition, it doesn't require the
default match columns.

Here is a simple example.

Execution example:

select Entries --query n_likes:>=10
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is greater than or equal to 10.

Regular expression condition
New in version 5.0.1.

Its syntax is column:~pattern.

It matches records whose column value matches pattern. pattern must be a valid
/reference/regular_expression.

The following example uses .roonga as the pattern. It matches Groonga, Mroonga and so on.

Execution example:

select Entries --query content:~.roonga
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

In most cases, a regular expression is evaluated sequentially, so it may be slow against
many records.

In some cases, Groonga evaluates a regular expression with an index. That's very fast. See
/reference/regular_expression for details.

Combined expression
Here is available combined expression list.

Logical OR
Its syntax is a OR b.

a and b are conditional expressions, combined expressions or assignment expressions.

If at least one of a and b is matched, a OR b is matched.

Here is a simple example.

Execution example:

select Entries --query 'n_likes:>10 OR content:@senna'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is greater than 10 or whose
content column value contains the word senna.

Logical AND
Its syntax is a + b or just a b.

a and b are conditional expressions, combined expressions or assignment expressions.

If both a and b are matched, a + b is matched.

You can prepend + to the first expression, such as +a. The + is just ignored.

Here is a simple example.

Execution example:

select Entries --query 'n_likes:>=10 + content:@groonga'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is greater than or equal to 10
and whose content column value contains the word groonga.

Logical NOT
Its syntax is a - b.

a and b are conditional expressions, combined expressions or assignment expressions.

If a is matched and b is not matched, a - b is matched.

You cannot prepend - to the first expression, such as -a. It's a syntax error.

Here is a simple example.

Execution example:

select Entries --query 'n_likes:>=10 - content:@groonga'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is greater than or equal to 10
and whose content column value doesn't contain the word groonga.

Grouping
Its syntax is (...). ... is a space separated expression list.

(...) groups one or more expressions so that they can be processed as a single expression.
a b OR c means that both a and b are matched or c is matched. a (b OR c) means that a and at
least one of b and c are matched.

Here is a simple example.

Execution example:

select Entries --query 'n_likes:<5 content:@senna OR content:@fast'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
select Entries --query 'n_likes:<5 (content:@senna OR content:@fast)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ]
# ]
# ]
# ]

The first expression doesn't use grouping. It matches records where n_likes:<5 and
content:@senna are matched, or content:@fast is matched.

The second expression uses grouping. It matches records where n_likes:<5 and at least one
of content:@senna or content:@fast are matched.

Assignment expression
This section is for advanced users, because assignment expression is disabled in the
--query option of /reference/commands/select by default. You need to specify
ALLOW_COLUMN|ALLOW_UPDATE as the --query_flags option value to enable assignment
expression.

Assignment expression in query syntax has some limitations, so you should use
/reference/grn_expr/script_syntax instead of query syntax for assignment.

There is only one syntax for assignment expression: column:=value.

value is assigned to column. value is always processed as a string in query syntax and is
cast to the type of column automatically. This causes some limitations. For example, you
cannot use boolean literals such as true and false for a Bool type column. You need to use
an empty string for false, but query syntax doesn't support the column:= syntax.

See /reference/cast about cast.
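
Here is a sketch of what an assignment looks like with --query_flags enabled (assuming the
Entries table defined above; the value 20 is a string in query syntax and is cast to
UInt32):

select Entries \
  --query '_key:Groonga n_likes:=20' \
  --query_flags 'ALLOW_COLUMN|ALLOW_UPDATE'

This selects the record whose _key is Groonga and assigns 20 to its n_likes column.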

Script syntax
Script syntax is a syntax to specify complex search conditions. It is similar to
ECMAScript. For example, _key == "book" means that groonga searches for records whose _key
value is "book". All values are strings in query syntax, but each value has its own type in
script syntax. For example, "book" is a string, 1 is an integer, TokenBigram is the object
named TokenBigram and so on.

Script syntax doesn't support the full ECMAScript syntax. For example, script syntax
doesn't support statements such as the if control statement, the for iteration statement
and variable definition statements. Function definition is not supported either. But script
syntax adds original additional operators. They are described after the ECMAScript syntax
is described.

Security
For security reasons, you should not pass input from users to Groonga directly. An evil
user may input a query that retrieves records that should not be shown to that user.

Think about the following case.

A Groonga application constructs a Groonga request by the following program:

filter = "column @ \"#{user_input}\""
select_options = {
# ...
:filter => filter,
}
groonga_client.select(select_options)

user_input is input from the user. If the input is query, here is the constructed
select-filter parameter:

column @ "query"

If the input is x" || true || ", here is the constructed select-filter parameter:

column @ "x" || true || ""

This query matches all records, so the (possibly evil) user will get all records from your
database.

It's better to receive user input only as a value. That means you don't accept user input
that can contain operators such as @ and &&. If you accept operators, users can create evil
queries.

If user input contains only a value, you can block evil queries by escaping the user input
value. Here is a list of how to escape user input values (a sketch of an escape helper in
Ruby follows the list):

· True value: Convert it to true.

· False value: Convert it to false.

· Numerical value: Convert it to Integer or Float. For example, 1.2, -10, 314e-2 and so
on.

· String value: Replace " with \" and \ with \\ in the string value and surround the
substituted string value with ". For example, double " quote and back \ slash should
be converted to "double \" quote and back \\ slash".
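
Here is a minimal sketch of such an escape helper in Ruby (escape_string_value is our own
name, not a Groonga API; backslashes must be escaped before double quotes):

# Escape a user-supplied string for use as a script syntax string literal.
def escape_string_value(value)
  escaped = value.gsub("\\") { "\\\\" }   # \ -> \\
  escaped = escaped.gsub("\"") { "\\\"" } # " -> \"
  "\"#{escaped}\""
end

filter = "content @ #{escape_string_value(user_input)}"

With the input x" || true || ", the constructed filter is content @ "x\" || true || \"",
which matches the literal string instead of injecting operators.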

Sample data
Here are a schema definition and sample data to show usage.

Execution example:

table_create Entries TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries n_likes COLUMN_SCALAR UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_key_index COLUMN_INDEX|WITH_POSITION Entries _key
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms entries_content_index COLUMN_INDEX|WITH_POSITION Entries content
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Entries
[
{"_key": "The first post!",
"content": "Welcome! This is my first post!",
"n_likes": 5},
{"_key": "Groonga",
"content": "I started to use Groonga. It's very fast!",
"n_likes": 10},
{"_key": "Mroonga",
"content": "I also started to use Mroonga. It's also very fast! Really fast!",
"n_likes": 15},
{"_key": "Good-bye Senna",
"content": "I migrated all Senna system!",
"n_likes": 3},
{"_key": "Good-bye Tritonn",
"content": "I also migrated all Tritonn system!",
"n_likes": 3}
]
# [[0, 1337566253.89858, 0.000355720520019531], 5]

There is a table, Entries, for blog entries. An entry has a title, content and the number
of likes for the entry. The title is the key of Entries. The content is the value of the
Entries.content column. The number of likes is the value of the Entries.n_likes column.

The Entries._key column and Entries.content column are indexed using the TokenBigram
tokenizer. So both Entries._key and Entries.content are full text search ready.

OK. The schema and data for examples are ready.

Literals
Integer
An integer literal is a sequence of 0 to 9, such as 1234567890. + or - can be prepended as
a sign, such as +29 and -29. Integer literals must be decimal; octal notation, hexadecimal
notation and so on can't be used.

The maximum value of an integer literal is 9223372036854775807 (= 2 ** 63 - 1). The minimum
value of an integer literal is -9223372036854775808 (= -(2 ** 63)).

Float
A float literal is a sequence of 0 to 9, . and 0 to 9, such as 3.14. + or - can be
prepended as a sign, such as +3.14 and -3.14. The ${RADIX}e${EXPONENTIAL} and
${RADIX}E${EXPONENTIAL} formats are also supported. For example, 314e-2 is the same as
3.14.

String
A string literal is "...". You need to escape " in the literal by prepending \, such as \".
For example, "Say \"Hello!\"." is a literal for the string Say "Hello!".

The string encoding must be the same as the encoding of the database. The default encoding
is UTF-8. It can be changed by the --with-default-encoding configure option, the --encoding
option of /reference/executables/groonga and so on.

Boolean
Boolean literals are true and false.

Null
The null literal is null. Groonga doesn't support null values, but the null literal is
supported.

Time
NOTE:
This is the groonga original notation.

A time literal doesn't exist. Instead, there are string time notation, integer time
notation and float time notation.

String time notation is "YYYY/MM/DD hh:mm:ss.uuuuuu" or "YYYY-MM-DD hh:mm:ss.uuuuuu". YYYY
is the year, MM is the month, DD is the day, hh is the hour, mm is the minute, ss is the
second and uuuuuu is the microsecond. It is local time. For example, "2012/07/23
02:41:10.436218" is 2012-07-23T02:41:10.436218 in ISO 8601 format.

Integer time notation is the number of seconds that have elapsed since midnight UTC,
January 1, 1970. It is also known as POSIX time. For example, 1343011270 is
2012-07-23T02:41:10Z in ISO 8601 format.

Float time notation is the number of seconds, including microseconds, that have elapsed
since midnight UTC, January 1, 1970. For example, 1343011270.436218 is
2012-07-23T02:41:10.436218Z in ISO 8601 format.

Geo point
NOTE:
This is the groonga original notation.

A geo point literal doesn't exist. Instead, there is string geo point notation.

String geo point notation has the following patterns:

· "LATITUDE_IN_MSECxLONGITUDE_IN_MSEC"

· "LATITUDE_IN_MSEC,LONGITUDE_IN_MSEC"

· "LATITUDE_IN_DEGREExLONGITUDE_IN_DEGREE"

· "LATITUDE_IN_DEGREE,LONGITUDE_IN_DEGREE"

x and , can be used as the separator. Latitude and longitude can be represented in
milliseconds or degrees. For example, "35.6813819x139.7660839" represents latitude
35.6813819 degrees and longitude 139.7660839 degrees.

Array
Array literal is [element1, element2, ...].

Object literal
Object literal is {name1: value1, name2: value2, ...}. Groonga doesn't support object
literal yet.

Control syntaxes
Script syntax doesn't support statements, so you cannot use control statements such as if.
You can only use the A ? B : C expression as a control syntax.

A ? B : C returns B if A is true, and C otherwise.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes == (_id == 1 ? 5 : 3)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose _id column value is equal to 1 and n_likes column
value is equal to 5, or whose _id column value is not equal to 1 and n_likes column value
is equal to 3.

Grouping
Its syntax is (...). ... is a comma separated expression list.

(...) groups one or more expressions so that they can be processed as a single expression.
a && b || c means that a and b are matched or c is matched. a && (b || c) means that a and
at least one of b and c are matched.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes < 5 && content @ "senna" || content @ "fast"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]
select Entries --filter 'n_likes < 5 && (content @ "senna" || content @ "fast")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ]
# ]
# ]
# ]

The first expression doesn't use grouping. It matches records where n_likes < 5 and content
@ "senna" are matched, or content @ "fast" is matched.

The second expression uses grouping. It matches records where n_likes < 5 and at least one
of content @ "senna" or content @ "fast" are matched.

Function call
Its syntax is name(argument1, argument2, ...).

name(argument1, argument2, ...) calls the function named name with the arguments argument1,
argument2 and ....

See /reference/function for the list of available functions.

Here is a simple example.

Execution example:

select Entries --filter 'edit_distance(_key, "Groonga") <= 1'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression uses /reference/functions/edit_distance. It matches records whose _key
column value is similar to "Groonga". Similarity to "Groonga" is computed as edit distance.
If the edit distance is less than or equal to 1, the value is treated as similar. In this
case, "Groonga" and "Mroonga" are treated as similar.

Basic operators
Groonga supports operators defined in ECMAScript.

Arithmetic operators
Here are arithmetic operators.

Addition operator
Its syntax is number1 + number2.

The operator adds number1 and number2 and returns the result.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes == 10 + 5'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 15 (= 10 + 5).

Subtraction operator
Its syntax is number1 - number2.

The operator subtracts number2 from number1 and returns the result.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes == 20 - 5'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 15 (= 20 - 5).

Multiplication operator
Its syntax is number1 * number2.

The operator multiplies number1 and number2 and returns the result.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes == 3 * 5'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 15 (= 3 * 5).

Division operator
Its syntax is number1 / number2 or number1 % number2.

The operator divides number1 by number2. / returns the quotient of the result. % returns
the remainder of the result.

Here are simple examples.

Execution example:

select Entries --filter 'n_likes == 26 / 7'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 3 (= 26 / 7).

Execution example:

select Entries --filter 'n_likes == 26 % 7'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 5 (= 26 % 7).

Logical operators
Here are logical operators.

Logical NOT operator
Its syntax is !condition.

The operator inverts the boolean value of condition.

Here is a simple example.

Execution example:

select Entries --filter '!(n_likes == 5)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is not equal to 5.

Logical AND operator
Its syntax is condition1 && condition2.

The operator returns true if both of condition1 and condition2 are true, false otherwise.

Here is a simple example.

Execution example:

select Entries --filter 'content @ "fast" && n_likes >= 10'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression matches records whose content column value has the word fast and whose
n_likes column value is greater than or equal to 10.

Logical OR operator
Its syntax is condition1 || condition2.

The operator returns true if either condition1 or condition2 is true, false otherwise.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes == 5 || n_likes == 10'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 5 or 10.

Logical AND NOT operator
Its syntax is condition1 &! condition2.

The operator returns true if condition1 is true but condition2 is false, and false
otherwise. It returns the difference set.

Here is a simple example.

Execution example:

select Entries --filter 'content @ "fast" &! content @ "mroonga"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]

The expression matches records whose content column value has the word fast but doesn't
have the word mroonga.

Bitwise operators
Here are bitwise operators.

Bitwise NOT operator
Its syntax is ~number.

The operator returns bitwise NOT of number.

Here is a simple example.

Execution example:

select Entries --filter '~n_likes == -6'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 5 because bitwise NOT
of 5 is equal to -6.

Bitwise AND operator
Its syntax is number1 & number2.

The operator returns bitwise AND between number1 and number2.

Here is a simple example.

Execution example:

select Entries --filter '(n_likes & 1) == 1'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is an odd number because bitwise
AND between an odd number and 1 is equal to 1 and bitwise AND between an even number and 1
is equal to 0.

Bitwise OR operator
Its syntax is number1 | number2.

The operator returns bitwise OR between number1 and number2.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes == (1 | 4)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 5 (= 1 | 4).

Bitwise XOR operator
Its syntax is number1 ^ number2.

The operator returns bitwise XOR between number1 and number2.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes == (10 ^ 15)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 5 (= 10 ^ 15).

Shift operators
Here are shift operators.

Left shift operator
Its syntax is number1 << number2.

The operator performs a bitwise left shift operation on number1 by number2.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes == (5 << 1)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 10 (= 5 << 1).

Signed right shift operator
Its syntax is number1 >> number2.

The operator shifts the bits of number1 to the right by number2. The sign of the result is
the same as that of number1.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes == -(-10 >> 1)'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 5 (= -(-10 >> 1) =
-(-5)).

Unsigned right shift operator
Its syntax is number1 >>> number2.

The operator shifts the bits of number1 to the right by number2. The leftmost number2 bits
are filled with 0.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes == (2147483648 - (-10 >>> 1))'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 5 (= 2147483648 -
(-10 >>> 1) = 2147483648 - 2147483643).

Comparison operators
Here are comparison operators.

Equal operator
Its syntax is object1 == object2.

The operator returns true if object1 equals object2, false otherwise.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes == 5'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 1,
# "The first post!",
# "Welcome! This is my first post!",
# 5
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is equal to 5.

Not equal operator
Its syntax is object1 != object2.

The operator returns true if object1 does not equal object2, false otherwise.

Here is a simple example.

Execution example:

select Entries --filter 'n_likes != 5'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 4,
# "Good-bye Senna",
# "I migrated all Senna system!",
# 3
# ],
# [
# 5,
# "Good-bye Tritonn",
# "I also migrated all Tritonn system!",
# 3
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

The expression matches records whose n_likes column value is not equal to 5.

Less than operator
TODO: ...

Less than or equal to operator
TODO: ...

Greater than operator
TODO: ...

Greater than or equal to operator
TODO: ...
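
Although those sections are not written yet, here is a minimal sketch of the four
operators against the same Entries data. The commands below are illustrative only and
their outputs are omitted; the semantics are the usual ECMAScript comparisons.

select Entries --filter 'n_likes < 10'
select Entries --filter 'n_likes <= 10'
select Entries --filter 'n_likes > 10'
select Entries --filter 'n_likes >= 10'

The first command matches the records whose n_likes column value is 3 or 5, the second
also matches the record whose value is 10, the third matches only the record whose value
is 15 and the fourth matches the records whose value is 10 or 15.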

Assignment operators
Addition assignment operator
Its syntax is column1 += column2.

The operator performs addition assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score += n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 4
# ],
# [
# "Good-bye Tritonn",
# 3,
# 4
# ],
# [
# "Groonga",
# 10,
# 11
# ],
# [
# "Mroonga",
# 15,
# 16
# ],
# [
# "The first post!",
# 5,
# 6
# ]
# ]
# ]
# ]

The value of _score set by --filter is always 1 in this case; then the addition
assignment operation '_score = _score + n_likes' is performed for each record.

For example, the value of n_likes of the record which stores "Good-bye Senna" as the
_key is 3.

So the expression 1 + 3 is evaluated and the result 4 is stored in the _score column as the execution result.

Subtraction assignment operator
Its syntax is column1 -= column2.

The operator performs subtraction assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score -= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# -2
# ],
# [
# "Good-bye Tritonn",
# 3,
# -2
# ],
# [
# "Groonga",
# 10,
# -9
# ],
# [
# "Mroonga",
# 15,
# -14
# ],
# [
# "The first post!",
# 5,
# -4
# ]
# ]
# ]
# ]

The value of _score set by --filter is always 1 in this case; then the subtraction
assignment operation '_score = _score - n_likes' is performed for each record.

For example, the value of n_likes of the record which stores "Good-bye Senna" as the
_key is 3.

So the expression 1 - 3 is evaluated and the result -2 is stored in the _score column as the execution result.

Multiplication assignment operator
Its syntax is column1 *= column2.

The operator performs multiplication assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score *= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 3
# ],
# [
# "Good-bye Tritonn",
# 3,
# 3
# ],
# [
# "Groonga",
# 10,
# 10
# ],
# [
# "Mroonga",
# 15,
# 15
# ],
# [
# "The first post!",
# 5,
# 5
# ]
# ]
# ]
# ]

The value of _score set by --filter is always 1 in this case; then the multiplication
assignment operation '_score = _score * n_likes' is performed for each record.

For example, the value of n_likes of the record which stores "Good-bye Senna" as the
_key is 3.

So the expression 1 * 3 is evaluated and the result 3 is stored in the _score column as the execution result.

Division assignment operator
Its syntax is column1 /= column2.

The operator performs division assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score /= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 0
# ],
# [
# "Good-bye Tritonn",
# 3,
# 0
# ],
# [
# "Groonga",
# 10,
# 0
# ],
# [
# "Mroonga",
# 15,
# 0
# ],
# [
# "The first post!",
# 5,
# 0
# ]
# ]
# ]
# ]

The value of _score set by --filter is always 1 in this case; then the division
assignment operation '_score = _score / n_likes' is performed for each record.

For example, the value of n_likes of the record which stores "Good-bye Senna" as the
_key is 3.

So the expression 1 / 3 is evaluated and the result 0 (integer division) is stored in the _score column as the execution result.

Modulo assignment operator
Its syntax is column1 %= column2.

The operator performs modulo assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score %= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 1
# ],
# [
# "Good-bye Tritonn",
# 3,
# 1
# ],
# [
# "Groonga",
# 10,
# 1
# ],
# [
# "Mroonga",
# 15,
# 1
# ],
# [
# "The first post!",
# 5,
# 1
# ]
# ]
# ]
# ]

The value of _score set by --filter is always 1 in this case; then the modulo
assignment operation '_score = _score % n_likes' is performed for each record.

For example, the value of n_likes of the record which stores "Good-bye Senna" as the
_key is 3.

So the expression 1 % 3 is evaluated and the result 1 is stored in the _score column as the execution result.

Bitwise left shift assignment operator
Its syntax is column1 <<= column2.

The operator performs left shift assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score <<= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 8
# ],
# [
# "Good-bye Tritonn",
# 3,
# 8
# ],
# [
# "Groonga",
# 10,
# 1024
# ],
# [
# "Mroonga",
# 15,
# 32768
# ],
# [
# "The first post!",
# 5,
# 32
# ]
# ]
# ]
# ]

The value of _score set by --filter is always 1 in this case; then the left shift
assignment operation '_score = _score << n_likes' is performed for each record.

For example, the value of n_likes of the record which stores "Good-bye Senna" as the
_key is 3.

So the expression 1 << 3 is evaluated and the result 8 is stored in the _score column as the execution result.

Bitwise signed right shift assignment operator
Its syntax is column1 >>= column2.

The operator performs signed right shift assignment operation on column1 by column2.

Bitwise unsigned right shift assignment operator
Its syntax is column1 >>>= column2.

The operator performs unsigned right shift assignment operation on column1 by column2.
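
Neither right shift assignment operator has an execution example above. Here is a minimal
sketch against the same Entries data (outputs omitted). Note that _score set by --filter
true is always 1, so shifting it right by any positive n_likes yields 0 for every record:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score >>= n_likes'
select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score >>>= n_likes'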

Bitwise AND assignment operator
Its syntax is column1 &= column2.

The operator performs bitwise AND assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score &= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 1
# ],
# [
# "Good-bye Tritonn",
# 3,
# 1
# ],
# [
# "Groonga",
# 10,
# 0
# ],
# [
# "Mroonga",
# 15,
# 1
# ],
# [
# "The first post!",
# 5,
# 1
# ]
# ]
# ]
# ]

The value of _score set by --filter is always 1 in this case; then the bitwise AND
assignment operation '_score = _score & n_likes' is performed for each record.

For example, the value of n_likes of the record which stores "Groonga" as the _key is
10.

So the expression 1 & 10 is evaluated and the result 0 is stored in the _score column as the execution result.

Bitwise OR assignment operator
Its syntax is column1 |= column2.

The operator performs bitwise OR assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score |= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 3
# ],
# [
# "Good-bye Tritonn",
# 3,
# 3
# ],
# [
# "Groonga",
# 10,
# 11
# ],
# [
# "Mroonga",
# 15,
# 15
# ],
# [
# "The first post!",
# 5,
# 5
# ]
# ]
# ]
# ]

The value of _score set by --filter is always 1 in this case; then the bitwise OR
assignment operation '_score = _score | n_likes' is performed for each record.

For example, the value of n_likes of the record which stores "Groonga" as the _key is
10.

So the expression 1 | 10 is evaluated and the result 11 is stored in the _score column as the execution result.

Bitwise XOR assignment operator
Its syntax is column1 ^= column2.

The operator performs bitwise XOR assignment operation on column1 by column2.

Execution example:

select Entries --output_columns _key,n_likes,_score --filter true --scorer '_score ^= n_likes'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "n_likes",
# "UInt32"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Good-bye Senna",
# 3,
# 2
# ],
# [
# "Good-bye Tritonn",
# 3,
# 2
# ],
# [
# "Groonga",
# 10,
# 11
# ],
# [
# "Mroonga",
# 15,
# 14
# ],
# [
# "The first post!",
# 5,
# 4
# ]
# ]
# ]
# ]

The value of _score set by --filter is always 1 in this case; then the bitwise XOR
assignment operation '_score = _score ^ n_likes' is performed for each record.

For example, the value of n_likes of the record which stores "Good-bye Senna" as the
_key is 3.

So the expression 1 ^ 3 is evaluated and the result 2 is stored in the _score column as the execution result.

Original operators
Script syntax adds original binary operators to ECMAScript syntax. They perform
search-specific operations. They start with @ or *.

Match operator
Its syntax is column @ value.

The operator searches for value using the inverted index of column. Normally it performs
full text search, but it can also perform tag search, because tag search is also
implemented with an inverted index.

query_syntax uses this operator by default.

Here is a simple example.

Execution example:

select Entries --filter 'content @ "fast"' --output_columns content
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "content",
# "Text"
# ]
# ],
# [
# "I started to use Groonga. It's very fast!"
# ],
# [
# "I also started to use Mroonga. It's also very fast! Really fast!"
# ]
# ]
# ]
# ]

The expression matches records that contain the word fast in the content column value.

Prefix search operator
Its syntax is column @^ value.

The operator performs prefix search with value. Prefix search searches for records that
contain a word that starts with value.

You can use fast prefix search against a column. The column must be indexed and the index
table must be a patricia trie table (TABLE_PAT_KEY) or a double array trie table
(TABLE_DAT_KEY). You can also use fast prefix search against the _key pseudo column of a
patricia trie table or a double array trie table. You don't need to index _key.

Prefix search can be used with other table types, but it causes a full scan of all
records. That is not a problem for a small number of records, but it takes more time for a
large number of records.

Here is a simple example.

Execution example:

select Entries --filter '_key @^ "Goo"' --output_columns _key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# "Good-bye Tritonn"
# ],
# [
# "Good-bye Senna"
# ]
# ]
# ]
# ]

The expression matches records that contain a word starting with Goo in the _key pseudo
column value. Good-bye Senna and Good-bye Tritonn match the expression.

Suffix search operator
Its syntax is column @$ value.

This operator performs suffix search with value. Suffix search searches for records that
contain a word that ends with value.

You can use fast suffix search against a column. The column must be indexed and the index
table must be a patricia trie table (TABLE_PAT_KEY) with the KEY_WITH_SIS flag. You can
also use fast suffix search against the _key pseudo column of a patricia trie table
(TABLE_PAT_KEY) with the KEY_WITH_SIS flag. You don't need to index _key. We recommend
that you use index column based fast suffix search instead of _key based fast suffix
search. _key based fast suffix search returns automatically registered substrings. (TODO:
write document about suffix search and link to it from here.)

NOTE:
Fast suffix search can be used only for non-ASCII characters such as hiragana in
Japanese. You cannot use fast suffix search for ASCII characters.

Suffix search can be used with other table types, or with a patricia trie table without
the KEY_WITH_SIS flag, but it causes a full scan of all records. That is not a problem for
a small number of records, but it takes more time for a large number of records.

Here is a simple example. It uses fast suffix search for hiragana in Japanese, which
consists of non-ASCII characters.

Execution example:

table_create Titles TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Titles content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create SuffixSearchTerms TABLE_PAT_KEY|KEY_WITH_SIS ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create SuffixSearchTerms index COLUMN_INDEX Titles content
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Titles
[
{"content": "ぐるんが"},
{"content": "むるんが"},
{"content": "せな"},
{"content": "とりとん"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 4]
select Titles --query 'content:$んが'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 2,
# "むるんが"
# ],
# [
# 1,
# "ぐるんが"
# ]
# ]
# ]
# ]

The expression matches records whose content column value ends with んが. ぐるんが and
むるんが match the expression.

Near search operator
Its syntax is column *N "word1 word2 ...".

The operator performs near search with the words word1 word2 .... Near search searches for
records that contain the words where the words appear within a near distance. The near
distance is always 10 for now. The unit of near distance is the number of characters in
N-gram family tokenizers and the number of words in morphological analysis family
tokenizers.

(TODO: Add a description about the fact that TokenBigram doesn't split an ASCII-only word
into tokens. So the unit for ASCII words with TokenBigram is the number of words even
though TokenBigram is an N-gram family tokenizer.)

Note that an index column for full text search must be defined for column.

Here is a simple example.

Execution example:

select Entries --filter 'content *N "I fast"' --output_columns content
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "content",
# "Text"
# ]
# ],
# [
# "I started to use Groonga. It's very fast!"
# ]
# ]
# ]
# ]
select Entries --filter 'content *N "I Really"' --output_columns content
# [[0, 1337566253.89858, 0.000355720520019531], [[[0], [["content", "Text"]]]]]
select Entries --filter 'content *N "also Really"' --output_columns content
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "content",
# "Text"
# ]
# ],
# [
# "I also started to use Mroonga. It's also very fast! Really fast!"
# ]
# ]
# ]
# ]

The first expression matches records that contain I and fast where the near distance of
those words is within 10 words. So the record whose content is I started to use Groonga.
It's very fast! is matched. The number of words between I and fast is just 10.

The second expression matches records that contain I and Really where the near distance of
those words is within 10 words. So the record whose content is I also started to use
Mroonga. It's also very fast! Really fast! is not matched. The number of words between I
and Really is 11.

The third expression matches records that contain also and Really where the near distance
of those words is within 10 words. So the record whose content is I also started to use
Mroonga. It's also very fast! Really fast! is matched. The number of words between also
and Really is 10.

Similar search operator
Its syntax is column *S "document".

The operator performs similar search with the document document. Similar search searches
for records whose content is similar to document.

Note that an index column for full text search must be defined for column.

Here is a simple example.

Execution example:

select Entries --filter 'content *S "I migrated all Solr system!"' --output_columns content
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "content",
# "Text"
# ]
# ],
# [
# "I migrated all Senna system!"
# ],
# [
# "I also migrated all Tritonn system!"
# ]
# ]
# ]
# ]

The expression matches records that have similar content to I migrated all Solr system!.
In this case, records that have I migrated all XXX system! content are matched.

Term extract operator
Its syntax is _key *T "document".

The operator extracts terms from document. Terms must be registered as keys of the table
to which _key belongs.

Note that the table must be a patricia trie (TABLE_PAT_KEY) or a double array trie
(TABLE_DAT_KEY). You can't use a hash table (TABLE_HASH_KEY) or an array (TABLE_NO_KEY)
because they don't support longest common prefix search. Longest common prefix search is
used to implement the operator.

Here is a simple example.

Execution example:

table_create Words TABLE_PAT_KEY ShortText --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Words
[
{"_key": "groonga"},
{"_key": "mroonga"},
{"_key": "Senna"},
{"_key": "Tritonn"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 4]
select Words --filter '_key *T "Groonga is the successor project to Senna."' --output_columns _key
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# "groonga"
# ],
# [
# "senna"
# ]
# ]
# ]
# ]

The expression extracts terms that are included in the document Groonga is the successor
project to Senna.. In this case, the NormalizerAuto normalizer is specified for Words. So
Groonga can be extracted even though it was loaded as groonga into Words. All extracted
terms are also normalized.

Regular expression operator
New in version 5.0.1.

Its syntax is column @~ "pattern".

The operator searches records by the regular expression pattern. If a record's column
value matches pattern, the record is matched.

pattern must be valid regular expression syntax. See /reference/regular_expression about
regular expression syntax details.

The following example uses .roonga as pattern. It matches Groonga, Mroonga and so on.

Execution example:

select Entries --filter 'content @~ ".roonga"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "content",
# "Text"
# ],
# [
# "n_likes",
# "UInt32"
# ]
# ],
# [
# 2,
# "Groonga",
# "I started to use Groonga. It's very fast!",
# 10
# ],
# [
# 3,
# "Mroonga",
# "I also started to use Mroonga. It's also very fast! Really fast!",
# 15
# ]
# ]
# ]
# ]

In most cases, a regular expression is evaluated sequentially, so it may be slow against
many records.

In some cases, Groonga evaluates a regular expression with an index. That is very fast.
See /reference/regular_expression for details.

See also
· /reference/api/grn_expr: grn_expr related APIs

Regular expression
Summary
NOTE:
Regular expression support is an experimental feature.

New in version 5.0.1.

Groonga supports pattern matching by regular expression. A regular expression is a widely
used format for describing a pattern. Regular expressions are useful to represent complex
patterns.

In most cases, pattern matching by regular expression is evaluated as a sequential search.
It will be slow when there are many records and long texts.

In some cases, pattern matching by regular expression can be evaluated with an index. That
is much faster than sequential search. The patterns that can be evaluated with an index
are described later.

New in version 5.0.7: Groonga normalizes the match target text with the normalizer-auto
normalizer when Groonga doesn't use an index for regular expression search. It means that
a regular expression that contains upper case characters such as Groonga never matches,
because the normalizer-auto normalizer normalizes all alphabetic characters to lower case.
groonga matches both Groonga and groonga.

Why is the match target text normalized? It is to increase the number of patterns that can
be searched with an index. If Groonga didn't normalize the match target text, you would
need to write a complex regular expression such as [Dd][Ii][Ss][Kk] or (?i)disk for
case-insensitive match. Groonga can't use an index against such a complex regular
expression.

If you write the regular expression disk for case-insensitive match, Groonga can search
the pattern with an index. It's fast.

You may feel this behavior is strange. But fast search based on this behavior will help
you.

There are many regular expression syntaxes. Groonga uses the same syntax as Ruby, because
Groonga uses the same regular expression engine as Ruby. The regular expression engine is
Onigmo. A characteristic difference from other regular expression syntaxes is ^ and $. In
the regular expression syntax in Ruby, ^ means the beginning of a line and $ means the end
of a line. In most other regular expression syntaxes, ^ means the beginning of the text
and $ means the end of the text. The regular expression syntax in Ruby uses \A for the
beginning of text and \z for the end of text.

New in version 5.0.6: Groonga uses multiline mode since 5.0.6. It means that . matches
\n.

But this is meaningless in practice, because \n is removed by the normalizer-auto
normalizer.

You can use regular expressions in the select-query and select-filter options of the
/reference/commands/select command.

Usage
Here are a schema definition and sample data to show usage. There is only one table, Logs.
The Logs table has only a message column. Log messages are stored in the message column.

Execution example:

table_create Logs TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Logs message COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Logs
[
{"message": "host1:[error]: No memory"},
{"message": "host1:[warning]: Remained disk space is less than 30%"},
{"message": "host1:[error]: Disk full"},
{"message": "host2:[error]: No memory"},
{"message": "host2:[info]: Shutdown"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 5]

Here is an example that uses regular expression in select-query. You need to use
${COLUMN}:~${REGULAR_EXPRESSION} syntax.

Execution example:

select Logs --query 'message:~"disk (space|full)"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "message",
# "Text"
# ]
# ],
# [
# 2,
# "host1:[warning]: Remained disk space is less than 30%"
# ],
# [
# 3,
# "host1:[error]: Disk full"
# ]
# ]
# ]
# ]

Here is an example that uses regular expression in select-filter. You need to use
${COLUMN} @~ ${REGULAR_EXPRESSION} syntax.

Execution example:

select Logs --filter 'message @~ "disk (space|full)"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "message",
# "Text"
# ]
# ],
# [
# 2,
# "host1:[warning]: Remained disk space is less than 30%"
# ],
# [
# 3,
# "host1:[error]: Disk full"
# ]
# ]
# ]
# ]

Index
Groonga can search records by regular expression with an index. It's much faster than
sequential search.

But it doesn't support all regular expression patterns. It supports only the following
regular expression patterns. More patterns will be supported in the future.

· Literal only pattern such as disk

· The beginning of text and literal only pattern such as \Adisk

· The end of text and literal only pattern such as disk\z

You need to create an index for fast regular expression search. Here are the requirements
for the index:

· Lexicon must be table-pat-key table.

· Lexicon must use token-regexp tokenizer.

· Index column must have WITH_POSITION flag.

Other configurations such as lexicon's normalizer are optional. You can choose what you
like. If you want to use case-insensitive search, use normalizer-auto normalizer.

Here are recommended index definitions. In general, these index definitions are reasonable.

Execution example:

table_create RegexpLexicon TABLE_PAT_KEY ShortText \
--default_tokenizer TokenRegexp \
--normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create RegexpLexicon logs_message_index \
COLUMN_INDEX|WITH_POSITION Logs message
# [[0, 1337566253.89858, 0.000355720520019531], true]

Now, you can use index for regular expression search. The following regular expression can
be evaluated by index because it uses only "the beginning of text" and "literal".

Execution example:

select Logs --query message:~\\\\Ahost1
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "message",
# "Text"
# ]
# ],
# [
# 1,
# "host1:[error]: No memory"
# ],
# [
# 2,
# "host1:[warning]: Remained disk space is less than 30%"
# ],
# [
# 3,
# "host1:[error]: Disk full"
# ]
# ]
# ]
# ]

Here is an example that uses select-filter instead of select-query. It uses the same
regular expression as the previous example.

Execution example:

select Logs --filter 'message @~ "\\\\Ahost1:"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "message",
# "Text"
# ]
# ],
# [
# 1,
# "host1:[error]: No memory"
# ],
# [
# 2,
# "host1:[warning]: Remained disk space is less than 30%"
# ],
# [
# 3,
# "host1:[error]: Disk full"
# ]
# ]
# ]
# ]

\ escaping may confuse you because there are several steps between you and Groonga that
require escaping. Here are the steps that require \ escaping:

· Shell: only when you pass a Groonga command from the command line, such as the following:

% groonga /tmp/db select Logs --filter '"message @~ \"\\\\Ahost1:\""'

--filter '"message @~ \"\\\\Ahost1:\""' is evaluated as the following two arguments
by shell:

· --filter

· "message @~ \"\\\\Ahost1:\""

· Groonga command parser: only when you pass a Groonga command in command line style
(COMMAND ARG1_VALUE ARG2_VALUE ...), not HTTP path style
(/d/COMMAND?ARG1_NAME=ARG1_VALUE&ARG2_NAME=ARG2_VALUE).

"message @~ \"\\\\Ahost1:\"" is evaluated as the following value by Groonga command
parser:

· message @~ "\\Ahost1:"

· /reference/grn_expr parser. \ escape is required in both
/reference/grn_expr/query_syntax and /reference/grn_expr/script_syntax.

"\\Ahost1:" string literal in script syntax is evaluated as the following value:

· \Ahost1:

The value is evaluated as regular expression.
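
If you want to skip the shell and Groonga command parser escaping steps entirely, the
HTTP path style needs only URL encoding. As a sketch (the path below is illustrative and
the output is omitted), the same filter can be sent as:

/d/select?table=Logs&filter=message%20%40~%20%22%5C%5CAhost1%3A%22

This decodes to message @~ "\\Ahost1:", which the grn_expr parser then evaluates as the
regular expression \Ahost1:.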

Syntax
This section describes only commonly used syntaxes. See the Onigmo syntax documentation
for other syntaxes and details.

Escape
In regular expression, there are the following special characters:

· \

· |

· (

· )

· [

· ]

· .

· *

· +

· ?

· {

· }

· ^

· $

If you want to write a pattern that matches these special characters as is, you need to
escape them.

You can escape them by putting \ before the special character. Here are regular
expressions that match each special character itself:

· \\

· \|

· \(

· \)

· \[

· \]

· \.

· \*

· \+

· \?

· \{

· \}

· \^

· \$

If your regular expression doesn't work as you expected, check whether any special
characters are used without escaping.

Choice
Choice syntax is A|B. The regular expression matches when either A pattern or B pattern is
matched.

Execution example:

select Logs --filter 'message @~ "warning|info"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "message",
# "Text"
# ]
# ],
# [
# 2,
# "host1:[warning]: Remained disk space is less than 30%"
# ],
# [
# 5,
# "host2:[info]: Shutdown"
# ]
# ]
# ]
# ]

CAUTION:
Regular expression that uses this syntax can't be evaluated by index.

Group
Group syntax is (...). Group provides the following features:

· Back reference

· Scope reducing

You can refer to matched groups with the \n (n is the group number) syntax. For example,
e(r)\1o\1 matches error, because \1 is replaced with the match result of the first group (r).

Execution example:

select Logs --filter 'message @~ "e(r)\\\\1o\\\\1"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "message",
# "Text"
# ]
# ],
# [
# 1,
# "host1:[error]: No memory"
# ],
# [
# 3,
# "host1:[error]: Disk full"
# ],
# [
# 4,
# "host2:[error]: No memory"
# ]
# ]
# ]
# ]

You can also use more powerful back reference features. See "8. Back reference" section in
Onigmo documentation for details.

Group syntax also reduces scope. For example, \[(warning|info)\] reduces the scope of the
choice syntax. The regular expression matches [warning] and [info].

Execution example:

select Logs --filter 'message @~ "\\\\[(warning|info)\\\\]"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "message",
# "Text"
# ]
# ],
# [
# 2,
# "host1:[warning]: Remained disk space is less than 30%"
# ],
# [
# 5,
# "host2:[info]: Shutdown"
# ]
# ]
# ]
# ]

You can also use more powerful group related features. See "7. Extended groups" section in
Onigmo documentation for details.

CAUTION:
Regular expression that uses this syntax can't be evaluated by index.

Character class
Character class syntax is [...]. Character class is useful to specify multiple characters
to be matched.

For example, [12] matches 1 or 2.

Execution example:

select Logs --filter 'message @~ "host[12]"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "message",
# "Text"
# ]
# ],
# [
# 1,
# "host1:[error]: No memory"
# ],
# [
# 2,
# "host1:[warning]: Remained disk space is less than 30%"
# ],
# [
# 3,
# "host1:[error]: Disk full"
# ],
# [
# 4,
# "host2:[error]: No memory"
# ],
# [
# 5,
# "host2:[info]: Shutdown"
# ]
# ]
# ]
# ]

You can specify characters by range. For example, [0-9] matches one digit.

Execution example:

select Logs --filter 'message @~ "[0-9][0-9]%"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "message",
# "Text"
# ]
# ],
# [
# 2,
# "host1:[warning]: Remained disk space is less than 30%"
# ]
# ]
# ]
# ]

You can also use more powerful character class related features. See "6. Character class"
section in Onigmo documentation for details.

CAUTION:
Regular expression that uses this syntax can't be evaluated by index.

Anchor
There are the following commonly used anchor syntaxes. Some anchors can be evaluated by
index.

┌───────┬───────────────────────┬─────────────┐
│Anchor │ Description           │ Index ready │
├───────┼───────────────────────┼─────────────┤
│^      │ The beginning of line │ x           │
├───────┼───────────────────────┼─────────────┤
│$      │ The end of line       │ x           │
├───────┼───────────────────────┼─────────────┤
│\A     │ The beginning of text │ o           │
├───────┼───────────────────────┼─────────────┤
│\z     │ The end of text       │ o           │
└───────┴───────────────────────┴─────────────┘

Here is an example that uses \z.

Execution example:

select Logs --filter 'message @~ "%\\\\z"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "message",
# "Text"
# ]
# ],
# [
# 2,
# "host1:[warning]: Remained disk space is less than 30%"
# ]
# ]
# ]
# ]

You can also use more anchors. See "5. Anchors" section in Onigmo documentation for
details.

CAUTION:
Regular expression that uses this syntax except \A and \z can't be evaluated by index.

Quantifier
There are the following commonly used quantifier syntaxes.

┌───────────┬─────────────────┐
│Quantifier │ Description │
├───────────┼─────────────────┤
? │ 0 or 1 time │
├───────────┼─────────────────┤
* │ 0 or more times │
├───────────┼─────────────────┤
+ │ 1 or more times │
└───────────┴─────────────────┘

For example, er+or matches error, errror and so on.

Execution example:

select Logs --filter 'message @~ "er+or"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "message",
# "Text"
# ]
# ],
# [
# 1,
# "host1:[error]: No memory"
# ],
# [
# 3,
# "host1:[error]: Disk full"
# ],
# [
# 4,
# "host2:[error]: No memory"
# ]
# ]
# ]
# ]

You can also use more quantifiers. See "4. Quantifier" section in Onigmo documentation for
details.

CAUTION:
Regular expression that uses this syntax can't be evaluated by index.

Others
There are more syntaxes. If you're interested in them, see Onigmo documentation for
details. You may be interested in "character type" and "character" syntaxes.

Function
Functions can be used in some commands. For example, you can use functions in the
--filter, --scorer and --output_columns options of commands/select.

This section describes functions and built-in functions.

TODO: Add documentation about functions.

between
Summary
between is used for checking whether the specified value exists in a specific range. It is
often used in combination with the select-filter option in /reference/commands/select.

Syntax
between has five parameters:

between(column_or_value, min, min_border, max, max_border)

Usage
Here are a schema definition and sample data to show usage:

Execution example:

table_create Users TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users age COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Ages TABLE_HASH_KEY Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Ages user_age COLUMN_INDEX Users age
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Users
[
{"_key": "Alice", "age": 12},
{"_key": "Bob", "age": 13},
{"_key": "Calros", "age": 15},
{"_key": "Dave", "age": 16},
{"_key": "Eric", "age": 20}
{"_key": "Frank", "age": 21}
]
# [[0, 1337566253.89858, 0.000355720520019531], 6]

Here is a query to show the persons who match the PG-13 rating (MPAA).

Execution example:

select Users --filter 'between(age, 13, "include", 16, "include")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "age",
# "Int32"
# ]
# ],
# [
# 2,
# "Bob",
# 13
# ],
# [
# 3,
# "Calros",
# 15
# ],
# [
# 4,
# "Dave",
# 16
# ]
# ]
# ]
# ]

It returns users who are 13, 14, 15 or 16 years old.

The between function accepts not only a column of a table but also a value as the first
parameter.

If you specify a value as the first parameter, whether the value is within the range is
checked. If the value is within the specified range, all records are returned because
between returns true.

If the value is not within the specified range, no records are returned because between
returns false.

Execution example:

select Users --filter 'between(14, 13, "include", 16, "include")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 6
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "age",
# "Int32"
# ]
# ],
# [
# 1,
# "Alice",
# 12
# ],
# [
# 2,
# "Bob",
# 13
# ],
# [
# 3,
# "Calros",
# 15
# ],
# [
# 4,
# "Dave",
# 16
# ],
# [
# 5,
# "Eric",
# 20
# ],
# [
# 6,
# "Frank",
# 21
# ]
# ]
# ]
# ]

In the above case, it returns all the records because 14 is between 13 and 16. This
behavior can be used for checking whether a specified value is within a range.
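
The border parameters also let you exclude the endpoints. As a minimal sketch against the
same Users data (output omitted), the following matches only ages 14 and 15, so only the
record whose _key is "Calros" is returned:

select Users --filter 'between(age, 13, "exclude", 16, "exclude")'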

Parameters
There are five required parameters: column_or_value, min, min_border, max and max_border.

column_or_value
Specifies a column of the table or the value.

min
Specifies the minimum border value of the range. You can control whether the value of min
is included or excluded with the min_border parameter.

min_border
Specifies whether the specified range contains the value of min or not. The value of
min_border is either "include" or "exclude". If it is "include", the min value is
included. If it is "exclude", the min value is not included.

max
Specifies the maximum border value of the range. You can control whether the value of max
is included or excluded with the max_border parameter.

max_border
Specifies whether the specified range contains the value of max or not. The value of
max_border is either "include" or "exclude". If it is "include", the max value is
included. If it is "exclude", the max value is not included.

Return value
between returns whether the value of the column (or the specified value) is within the
specified range or not. If a record matches the specified range, it returns true.
Otherwise, it returns false.

edit_distance
Name
edit_distance - calculates the edit distance between two specified strings

Syntax
edit_distance(string1, string2)

Description
This section describes edit_distance, one of the Groonga built-in functions. Built-in
functions can be called in a script syntax grn_expr.

The edit_distance() function calculates the edit distance between the string specified as
string1 and the string specified as string2.

Parameters
string1
Specifies a string.

string2
Specifies another string.

Return value
Returns the edit distance between the two specified strings as a UInt32 value.


edit_distance(title, "hoge")
1
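
As a usage sketch (output omitted), edit_distance can also be called in the
--output_columns or --filter options of select. The Entries table here is the one defined
earlier and the string "Groonga" is just an illustrative value:

select Entries --output_columns '_key, edit_distance(_key, "Groonga")'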

geo_distance
Summary
geo_distance calculates the distance between two specified points.

Syntax
geo_distance requires two points. The parameter approximate_type is optional:

geo_distance(point1, point2)
geo_distance(point1, point2, approximate_type)

The default value of approximate_type is "rectangle". If you omit approximate_type,
geo_distance calculates the value of distance as if "rectangle" was specified.

Usage
geo_distance is one of the Groonga built-in functions.

You can call a built-in function in /reference/grn_expr.

The geo_distance function calculates the distance (approximate value) between the
coordinate of point1 and the coordinate of point2.

NOTE:
Groonga provides three built-in functions for calculating the distance:
geo_distance(), geo_distance2() and geo_distance3(). They differ in the algorithm used
for calculating the distance. geo_distance2() and geo_distance3() have been deprecated
since version 1.2.9. Use geo_distance(point1, point2, "sphere") instead of
geo_distance2(point1, point2). Use geo_distance(point1, point2, "ellipsoid") instead
of geo_distance3(point1, point2).

Let's learn about geo_distance usage with examples. This section shows simple usages.

Here are two schema definitions and sample data to show the difference according to the
usage. Those samples show how to calculate the distance between New York City and London.

1. Using the column value of location for calculating the distance (Cities table)

2. Using the explicitly specified coordinates for calculating the distance (Geo table)

Using the column value of location
Here are a schema definition of Cities table and sample data to show usage.

table_create Cities TABLE_HASH_KEY ShortText
column_create Cities location COLUMN_SCALAR WGS84GeoPoint
load --table Cities
[
{"_key": "New York City", "location": "146566000x-266422000"}
]

Execution example:

table_create Cities TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Cities location COLUMN_SCALAR WGS84GeoPoint
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Cities
[
{"_key": "New York City", "location": "146566000x-266422000"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

This execution example creates a table named Cities which has one column named location.
The location column stores the value of a coordinate. The coordinate of New York City is
stored as sample data.

Execution example:

select Cities --output_columns _score --filter 1 --scorer '_score = geo_distance(location, "185428000x-461000", "rectangle")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_score",
# "Int32"
# ]
# ],
# [
# 5715104
# ]
# ]
# ]
# ]

This sample shows that geo_distance uses the value of the location column and the
specified coordinate to calculate the distance.

The value ("185428000x-461000") passed to geo_distance as the second argument is the
coordinate of London.

Using the explicitly specified value of location
Here are a schema definition of Geo table and sample data to show usage.

table_create Geo TABLE_HASH_KEY ShortText
column_create Geo distance COLUMN_SCALAR Int32
load --table Geo
[
{
"_key": "the record for geo_distance() result"
}
]

Execution example:

table_create Geo TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Geo distance COLUMN_SCALAR Int32
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Geo
[
{
"_key": "the record for geo_distance() result"
}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

This execution example creates a table named Geo which has one column named distance.
The distance column stores the value of the distance.

Execution example:

select Geo --output_columns distance --scorer 'distance = geo_distance("146566000x-266422000", "185428000x-461000", "rectangle")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "distance",
# "Int32"
# ]
# ],
# [
# 5807750
# ]
# ]
# ]
# ]

This sample shows that geo_distance uses the coordinate of New York City and the
coordinate of London to calculate the distance.

Parameters
Required parameters
There are two required parameters, point1 and point2.

point1
Specifies the start point that you want to calculate the value of distance between two
points.

You can specify the value of GeoPoint type. [1]

See /reference/types about GeoPoint.

point2
Specifies the end point that you want to calculate the value of distance between two
points.

You can specify the value of GeoPoint type or the string indicating the coordinate.

See /reference/types about GeoPoint and the coordinate.

Optional parameter
There is an optional parameter, approximate_type.

approximate_type
Specifies how to approximate the geographical features for calculating the value of
distance.

You can specify one of the following values for approximate_type.

· rectangle

· sphere

· ellipsoid

NOTE:
There is a limitation in geo_distance. geo_distance cannot calculate the distance
between two points across the meridian, the equator or the date line if you use sphere
or ellipsoid as the approximate type. There is no such limitation for rectangle. This
is a temporary limitation of the current Groonga implementation; it will be fixed in a
future release.

rectangle
This parameter approximates the geographical features with rectangle (square)
approximation for calculating the distance.

Since the distance is calculated with a simple formula, the calculation is fast. However,
the error of the distance increases as the points approach the poles.

You can also specify rect as an abbreviation.

Here is a sample about calculating the value of distance with column value.

Execution example:

select Cities --output_columns _score --filter 1 --scorer '_score = geo_distance(location, "185428000x-461000", "rectangle")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_score",
# "Int32"
# ]
# ],
# [
# 5715104
# ]
# ]
# ]
# ]

Here is a sample about calculating the value of distance with explicitly specified point.

Execution example:

select Geo --output_columns distance --scorer 'distance = geo_distance("146566000x-266422000", "185428000x-461000", "rectangle")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "distance",
# "Int32"
# ]
# ],
# [
# 5807750
# ]
# ]
# ]
# ]

Here are samples about calculating the value of distance with explicitly specified point
across meridian, equator, the date line.

Execution example:

select Geo --output_columns distance --scorer 'distance = geo_distance("175904000x8464000", "145508000x-13291000", "rectangle")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "distance",
# "Int32"
# ]
# ],
# [
# 1051293
# ]
# ]
# ]
# ]

This sample shows the distance across the meridian. The return value of
geo_distance("175904000x8464000", "145508000x-13291000", "rectangle") is the distance
from Paris, France to Madrid, Spain.

Execution example:

select Geo --output_columns distance --scorer 'distance = geo_distance("146566000x-266422000", "-56880000x-172310000", "rectangle")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "distance",
# "Int32"
# ]
# ],
# [
# 6880439
# ]
# ]
# ]
# ]

This sample shows the distance across the equator. The return value of
geo_distance("146566000x-266422000", "-56880000x-172310000", "rectangle") is the
distance from New York City, the United States to Brasilia, Brazil.

Execution example:

select Geo --output_columns distance --scorer 'distance = geo_distance("143660000x419009000", "135960000x-440760000", "rectangle")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "distance",
# "Int32"
# ]
# ],
# [
# 10475205
# ]
# ]
# ]
# ]

This sample shows the distance across the date line. The return value of
geo_distance("143660000x419009000", "135960000x-440760000", "rectangle") is the distance
from Beijing, China to San Francisco, the United States.

NOTE:
geo_distance uses rectangle (square) approximation by default. If you omit
approximate_type, geo_distance behaves as if rectangle were specified.

NOTE:
geo_distance accepts a string indicating the coordinate as the value of point1 only
when the value of approximate_type is "rectangle". If you specify a string indicating
the coordinate as the value of point1 with sphere or ellipsoid, geo_distance returns 0
as the distance.

sphere
This parameter approximates the geographical features with spherical approximation for
calculating the distance.

It is slower than rectangle, but the error of the distance is smaller than with rectangle.

You can also specify sphr as an abbreviation.

Here is a sample about calculating the value of distance with column value.

Execution example:

select Cities --output_columns _score --filter 1 --scorer '_score = geo_distance(location, "185428000x-461000", "sphere")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_score",
# "Int32"
# ]
# ],
# [
# 5715102
# ]
# ]
# ]
# ]

ellipsoid
This parameter approximates the geographical features with ellipsoid approximation for
calculating the distance.

It calculates the distance with Hubeny's formula. It is slower than sphere, but the error
of the distance is smaller than with sphere.

You can also specify ellip as an abbreviation.

Here is a sample about calculating the value of distance with column value.

Execution example:

select Cities --output_columns _score --filter 1 --scorer '_score = geo_distance(location, "185428000x-461000", "ellipsoid")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_score",
# "Int32"
# ]
# ],
# [
# 5706263
# ]
# ]
# ]
# ]

Return value
geo_distance returns the distance as a Float value. The unit of the return value is
meters.

Footnotes

[1] You can specify either TokyoGeoPoint or WGS84GeoPoint.

geo_in_circle
Name
geo_in_circle - checks whether a point is within a circle.

Syntax
geo_in_circle(point, center, radius_or_point[, approximate_type])

Description
This section describes geo_in_circle, one of the Groonga built-in functions. Built-in
functions can be called in a script syntax grn_expr.

The geo_in_circle() function checks whether the coordinate specified as point is within
the circle whose center is the coordinate specified as center.

Parameters
point
Specifies the coordinate to be checked against the circle. You can specify a Point type
value. [1]

center
Specifies the coordinate of the center of the circle. You can specify a Point type value
or a string indicating a coordinate.

radius_or_point
Specifies the radius of the circle. If a number is specified, it is treated as the radius
in meters. If a Point type value or a string indicating a coordinate is specified, it is
treated as the coordinate of a point on the circumference of the circle.

approximate_type
Specifies how to approximate the geographical features for calculating the distance from
the radius. The following values can be specified.

"rectangle"
Approximates with rectangle approximation. It is fast because the distance is calculated
with a simple formula, but the error increases near the poles.

You can also specify "rect" as an abbreviation.

This approximation is the default. If approximate_type is omitted, rectangle
approximation is used.

"sphere"
Approximates with spherical approximation. It is slower than "rectangle", but the error
is smaller.

You can also specify "sphr" as an abbreviation.

"ellipsoid"
Approximates with ellipsoid approximation. The distance is calculated with Hubeny's
formula. It is slower than "sphere", but the error is smaller.

You can also specify "ellip" as an abbreviation.

Return value
Returns whether the coordinate specified as point is within the circle as a Bool value.


geo_in_circle(pos, "100x100", 100)
true
Footnotes

[1] You can specify either TokyoGeoPoint (Japan geodetic system) or WGS84GeoPoint (world geodetic system).
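
As a usage sketch (output omitted), geo_in_circle can be used in --filter. The Cities
table is the one from the geo_distance section and the 6000000-meter radius around the
London coordinate is just an illustrative value:

select Cities --filter 'geo_in_circle(location, "185428000x-461000", 6000000)'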

geo_in_rectangle
Name
geo_in_rectangle - checks whether a point is within a rectangle.

Syntax
geo_in_rectangle(point, top_left, bottom_right)

Description
This section describes geo_in_rectangle, one of the Groonga built-in functions. Built-in
functions can be called in a script syntax grn_expr.

The geo_in_rectangle() function checks whether the coordinate specified as point is
within the rectangle formed by top_left and bottom_right.

Parameters
point
Specifies the coordinate to be checked against the rectangle. You can specify a Point
type value. [1]

top_left
Specifies the coordinate of the top left corner of the rectangle. You can specify a Point
type value or a string indicating a coordinate.

bottom_right
Specifies the coordinate of the bottom right corner of the rectangle. You can specify a
Point type value or a string indicating a coordinate.

Return value
Returns whether the coordinate specified as point is within the rectangle as a Bool value.


geo_in_rectangle(pos, "150x100", "100x150")
true
Footnotes

[1] You can specify either TokyoGeoPoint (Japan geodetic system) or WGS84GeoPoint (world geodetic system).
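
As a usage sketch (output omitted), geo_in_rectangle can be used in --filter. The Cities
table is the one from the geo_distance section and the two corner coordinates below are
illustrative values chosen to surround the New York City coordinate:

select Cities --filter 'geo_in_rectangle(location, "146666000x-266522000", "146466000x-266322000")'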

highlight_full
CAUTION:
This feature is experimental. API will be changed.

Summary
highlight_full tags the target text. It can be used to highlight search keywords. You can
specify whether to use HTML escaping, the normalizer name, and the open/close tags for
each keyword.

Syntax
highlight_full has required parameters and optional parameters:

highlight_full(column, normalizer_name, use_html_escape,
keyword1, open_tag1, close_tag1,
...
[keywordN, open_tagN, close_tagN])

Usage
Here are a schema definition and sample data to show usage.

Execution example:

table_create Entries TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries body COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Entries
[
{"body": "Mroonga is a MySQL storage engine based on Groonga. <b>Rroonga</b> is a Ruby binding of Groonga."}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

highlight_full can be used only in --output_columns in /reference/commands/select.

highlight_full requires Groonga 4.0.5 or later.

highlight_full requires /reference/command/command_version 2 or later.

The following example uses HTML escaping and the NormalizerAuto normalizer. It specifies
the tags <span class="keyword1"> and </span> for the keyword groonga, and the tags <span
class="keyword2"> and </span> for the keyword mysql.

Execution example:

select Entries --output_columns 'highlight_full(body, "NormalizerAuto", true, "Groonga", "<span class=\\"keyword1\\">", "</span>", "mysql", "<span class=\\"keyword2\\">", "</span>")' --command_version 2
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "highlight_full",
# "null"
# ]
# ],
# [
# "Mroonga is a <span class=\"keyword2\">MySQL</span> storage engine based on <span class=\"keyword1\">Groonga</span>. &lt;b&gt;Rroonga&lt;/b&gt; is a Ruby binding of <span class=\"keyword1\">Groonga</span>."
# ]
# ]
# ]
# ]

The text is scanned for the keywords for tagging after it is normalized by the
NormalizerAuto normalizer.

The keywords groonga and mysql match the first record's body. highlight_full surrounds the
keyword groonga contained in the text with <span class="keyword1"> and </span>, and the
keyword mysql contained in the text with <span class="keyword2"> and </span>.

Special characters such as < and > are escaped as &lt; and &gt;.

You can specify a string literal instead of a column.

Execution example:

select Entries --output_columns 'highlight_full("Groonga is very fast fulltext search engine.", "NormalizerAuto", true, "Groonga", "<span class=\\"keyword1\\">", "</span>", "mysql", "<span class=\\"keyword2\\">", "</span>")' --command_version 2 --match_columns body --query "groonga"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "highlight_full",
# "null"
# ]
# ],
# [
# "<span class=\"keyword1\">Groonga</span> is very fast fulltext search engine."
# ]
# ]
# ]
# ]

Parameters
There are three required parameters: column, normalizer_name and use_html_escape. There
are also optional parameters that come in groups of three: keywordN, open_tagN and close_tagN.

column
Specifies a column of the table.

normalizer_name
Specifies a normalizer name.

use_html_escape
Specifies whether to use HTML escape. If it is true, HTML escape is used. If it is false,
HTML escape is not used.

keywordN
Specifies a keyword for tagging. You can specify multiple keywords, one per group of three
arguments.

open_tagN
Specifies an open tag. You can specify multiple open tags, one per group of three arguments.

close_tagN
Specifies a close tag. You can specify multiple close tags, one per group of three arguments.

Return value
highlight_full returns a tagged string or null. If highlight_full can't find any keywords,
it returns null.

See also
· /reference/commands/select

· /reference/functions/highlight_html

highlight_html
CAUTION:
This feature is experimental. API will be changed.

New in version 4.0.5.

Summary
highlight_html tags target text. It can be used to highlight search keywords. The tagged
text is ready for embedding in HTML. Special characters such as < and > are escaped as
&lt; and &gt;. Each keyword is surrounded with <span class="keyword"> and </span>. For
example, the tagged text of I am a groonga user. <3 for the keyword groonga is I am a <span
class="keyword">groonga</span> user. &lt;3.

Syntax
This function has only one parameter:

highlight_html(text)

Usage
Here are a schema definition and sample data to show usage.

Execution example:

table_create Entries TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Entries body COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms document_index COLUMN_INDEX|WITH_POSITION Entries body
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Entries
[
{"body": "Mroonga is a MySQL storage engine based on Groonga. <b>Rroonga</b> is a Ruby binding of Groonga."}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

highlight_html can be used only in --output_columns in /reference/commands/select.

highlight_html requires /reference/command/command_version 2 or later.

You also need to specify --query and/or --filter. Keywords are extracted from --query and
--filter arguments.

The following example uses --query "groonga mysql". In this case, groonga and mysql are
used as keywords.

Execution example:

select Entries --match_columns body --query 'groonga mysql' --output_columns 'highlight_html(body)' --command_version 2
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "highlight_html",
# "null"
# ]
# ],
# [
# "Mroonga is a <span class=\"keyword\">MySQL</span> storage engine based on <span class=\"keyword\">Groonga</span>. &lt;b&gt;Rroonga&lt;/b&gt; is a Ruby binding of <span class=\"keyword\">Groonga</span>."
# ]
# ]
# ]
# ]

The text is scanned for the keywords for tagging after it is normalized by the
NormalizerAuto normalizer.

--query "groonga mysql" matches only the first record's body. highlight_html(body)
surrounds the keywords groonga and mysql contained in the text with <span class="keyword">
and </span>.

You can specify a string literal instead of a column.

Execution example:

select Entries --output_columns 'highlight_html("Groonga is very fast fulltext search engine.")' --command_version 2 --match_columns body --query "groonga"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "highlight_html",
# "null"
# ]
# ],
# [
# "<span class=\"keyword\">Groonga</span> is very fast fulltext search engine."
# ]
# ]
# ]
# ]

Parameters
This section describes all parameters.

Required parameters
There is only one required parameter.

text
The text to be highlighted in HTML.

Optional parameters
There is no optional parameter.

Return value
highlight_html returns a tagged string or null. If highlight_html can't find any keywords,
it returns null.

See also
· /reference/commands/select

· /reference/functions/highlight_full

html_untag
Summary
html_untag strips HTML tags from HTML and outputs plain text.

html_untag is used in --output_columns described at select-output-columns.

Syntax
html_untag requires only one argument. It is html.

html_untag(html)

Requirements
html_untag requires Groonga 3.0.5 or later.

html_untag requires /reference/command/command_version 2 or later.

Usage
Here are a schema definition and sample data to show usage.

Sample schema:

Execution example:

table_create WebClips TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create WebClips content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

Sample data:

Execution example:

load --table WebClips
[
{"_key": "http://groonga.org", "content": "groonga is <span class='emphasize'>fast</span>"},
{"_key": "http://mroonga.org", "content": "mroonga is <span class=\"emphasize\">fast</span>"},
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]

Here is a simple usage of the html_untag function, which strips HTML tags from the value of
the content column.

Execution example:

select WebClips --output_columns "html_untag(content)" --command_version 2
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "html_untag",
# "null"
# ]
# ],
# [
# "groonga is fast"
# ],
# [
# "mroonga is fast"
# ]
# ]
# ]
# ]

When executing the above query, you can see that the "span" tag with the "class" attribute
is stripped. Note that you must specify --command_version 2 to use the html_untag function.

Parameters
There is one required parameter, html.

html
Specifies HTML text to be untagged.

Return value
html_untag returns the plain text that results from stripping HTML tags from the HTML text.

in_values
Summary
New in version 4.0.7.

in_values enables you to simplify a query that uses multiple OR or == operators. Using this
function in such cases is also recommended from a performance point of view (see the sketch
after the syntax below).

Syntax
in_values requires two or more arguments - target_value and one or more values.

in_values(target_value, value1, ..., valueN)
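
For illustration, the following two --filter expressions select the same records; the
in_values form is the recommended one (a minimal sketch assuming a ShortText column named
tag, as in the usage below):

in_values(tag, "groonga", "mroonga")
tag == "groonga" || tag == "mroonga"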

Usage
Here is a schema definition and sample data.

Sample schema:

Execution example:

table_create Tags TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Memos TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Memos tag COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Tags memos_tag COLUMN_INDEX Memos tag
# [[0, 1337566253.89858, 0.000355720520019531], true]

Sample data:

Execution example:

load --table Memos
[
{"_key": "Groonga is fast", "tag": "groonga"},
{"_key": "Mroonga is fast", "tag": "mroonga"},
{"_key": "Rroonga is fast", "tag": "rroonga"},
{"_key": "Droonga is fast", "tag": "droonga"},
{"_key": "Groonga is a HTTP server", "tag": "groonga"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 5]

Here is a simple usage of the in_values function, which selects the records whose tag
column value is "groonga", "mroonga" or "droonga".

Execution example:

select Memos --output_columns _key,tag --filter 'in_values(tag, "groonga", "mroonga", "droonga")' --sortby _id
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 4
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "tag",
# "ShortText"
# ]
# ],
# [
# "Groonga is fast",
# "groonga"
# ],
# [
# "Mroonga is fast",
# "mroonga"
# ],
# [
# "Droonga is fast",
# "droonga"
# ],
# [
# "Groonga is a HTTP server",
# "groonga"
# ]
# ]
# ]
# ]

When executing the above query, you get the records except the "rroonga" one, because
"rroonga" is not specified as a value in in_values.

Parameters
There are two or more required parameters: target_value and one or more values.

target_value
Specifies a column of the table that is specified by table parameter in select.

value
Specifies a value of the column which you want to select.

Return value
in_values returns whether the value of the column exists in the specified parameter values
or not.

If the record matches one of the specified values, it returns true. Otherwise, it returns
false.

now
Name
now - returns the current time

Syntax
now()

Description
This section describes now, one of the Groonga built-in functions. Built-in functions can be called in script format grn_expr.

The now() function returns a Time type value corresponding to the current time.

Return value
Returns a Time type object corresponding to the current time.


now()
1256791194.55541

prefix_rk_search()
Summary
prefix_rk_search() selects records by /reference/operations/prefix_rk_search.

You need to create a table-pat-key table for prefix RK search.

You can't use prefix_rk_search() for sequential scan. It's a selector-only procedure.

Syntax
prefix_rk_search() requires two arguments. They are column and query:

prefix_rk_search(column, query)

column must be _key for now.

query must be a string.

Usage
Here are a schema definition and sample data to show usage:

Execution example:

table_create Readings TABLE_PAT_KEY ShortText --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Readings
[
{"_key": "ニホン"},
{"_key": "ニッポン"},
{"_key": "ローマジ"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

Here is a simple usage of the prefix_rk_search() function, which selects ニホン and ニッポン
by ni:

Execution example:

select Readings --filter 'prefix_rk_search(_key, "ni")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# 2,
# "ニッポン"
# ],
# [
# 1,
# "ニホン"
# ]
# ]
# ]
# ]

You can implement a /reference/suggest/completion-like feature by combining it with sub_filter.

Create a table that has completion candidates as records. Each record has zero or more
readings, which are stored in the Readings table. Don't forget to define an index column
for Items.readings in the Readings table. The index column is needed by sub_filter:

Execution example:

table_create Items TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Items readings COLUMN_VECTOR Readings
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Readings items_index COLUMN_INDEX Items readings
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Items
[
{"_key": "日本", "readings": ["ニホン", "ニッポン"]},
{"_key": "ローマ字", "readings": ["ローマジ"]},
{"_key": "漢字", "readings": ["カンジ"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

You can find the 日本 record in the Items table by niho, because prefix RK search with niho
selects the ニホン reading, and ニホン is one of the readings of the 日本 record:

Execution example:

select Items \
--filter 'sub_filter(readings, "prefix_rk_search(_key, \\"niho\\")")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "readings",
# "Readings"
# ]
# ],
# [
# 1,
# "日本",
# [
# "ニホン",
# "ニッポン"
# ]
# ]
# ]
# ]
# ]

You need to combine this with the script-syntax-prefix-search-operator to support completion
targets that have no readings.

Add one completion target that has no readings:

Execution example:

load --table Items
[
{"_key": "nihon", "readings": []}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

Combine with the script-syntax-prefix-search-operator to support completion targets that have no readings:

Execution example:

select Items \
--filter 'sub_filter(readings, "prefix_rk_search(_key, \\"niho\\")") || \
_key @^ "niho"'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "readings",
# "Readings"
# ]
# ],
# [
# 1,
# "日本",
# [
# "ニホン",
# "ニッポン"
# ]
# ],
# [
# 4,
# "nihon",
# []
# ]
# ]
# ]
# ]

Normally, you want to use case-insensitive search for completion. Use --normalizer
NormalizerAuto and a label column for that case:

Execution example:

table_create LooseItems TABLE_HASH_KEY ShortText --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create LooseItems label COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create LooseItems readings COLUMN_VECTOR Readings
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Readings loose_items_index COLUMN_INDEX LooseItems readings
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table LooseItems
[
{"_key": "日本", "label": "日本", "readings": ["ニホン", "ニッポン"]},
{"_key": "ローマ字", "label": "ローマ字", "readings": ["ローマジ"]},
{"_key": "漢字", "label": "漢字", "readings": ["カンジ"]},
{"_key": "Nihon", "label": "日本", "readings": []}
]
# [[0, 1337566253.89858, 0.000355720520019531], 4]

Use LooseItems.label for display:

Execution example:

select LooseItems \
--filter 'sub_filter(readings, "prefix_rk_search(_key, \\"nIhO\\")") || \
_key @^ "nIhO"' \
--output_columns '_key,label'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "label",
# "ShortText"
# ]
# ],
# [
# "日本",
# "日本"
# ],
# [
# "nihon",
# "日本"
# ]
# ]
# ]
# ]

Parameters
There are two required parameters, column and query.

column
Always specify _key for now.

query
Specifies a query in romaji, katakana or hiragana, as a string.

Return value
prefix_rk_search() function returns matched records.

See also
· /reference/operations/prefix_rk_search

· /reference/functions/sub_filter

query
Summary
query provides the --match_columns and --query parameters of /reference/commands/select
as a function. You can specify multiple query functions in the --filter parameter of
/reference/commands/select.

Because of this flexibility, you can control full text search behavior by combining
multiple query functions.

query can be used only in --filter in /reference/commands/select.

Syntax
query requires two arguments - match_columns and query_string.

The parameter query_expander or substitution_table is optional.

query(match_columns, query_string)
query(match_columns, query_string, query_expander)
query(match_columns, query_string, substitution_table)

Usage
Here are a schema definition and sample data to show usage.

Sample schema:

Execution example:

table_create Documents TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Documents content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms documents_content_index COLUMN_INDEX|WITH_POSITION Documents content
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Users TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users name COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users memo COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Lexicon TABLE_HASH_KEY ShortText \
--default_tokenizer TokenBigramSplitSymbolAlphaDigit \
--normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon users_name COLUMN_INDEX|WITH_POSITION Users name
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon users_memo COLUMN_INDEX|WITH_POSITION Users memo
# [[0, 1337566253.89858, 0.000355720520019531], true]

Sample data:

Execution example:

load --table Users
[
{"name": "Alice", "memo": "groonga user"},
{"name": "Alisa", "memo": "mroonga user"},
{"name": "Bob", "memo": "rroonga user"},
{"name": "Tom", "memo": "nroonga user"},
{"name": "Tobby", "memo": "groonga and mroonga user. mroonga is ..."},
]
# [[0, 1337566253.89858, 0.000355720520019531], 5]

Here is a simple usage of the query function, which executes a full text search for the
keyword 'alice' in --filter, without using the --match_columns and --query arguments.

Execution example:

select Users --output_columns name,_score --filter 'query("name * 10", "alice")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "name",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Alice",
# 10
# ]
# ]
# ]
# ]

When executing the above query, the keyword 'alice' is matched with the weight 10.

Here are the contrasting examples with/without query.

Execution example:

select Users --output_columns name,memo,_score --match_columns "memo * 10" --query "memo:@groonga OR memo:@mroonga OR memo:@user" --sortby -_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "name",
# "ShortText"
# ],
# [
# "memo",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Tobby",
# "groonga and mroonga user. mroonga is ...",
# 4
# ],
# [
# "Alice",
# "groonga user",
# 2
# ],
# [
# "Alisa",
# "mroonga user",
# 2
# ],
# [
# "Bob",
# "rroonga user",
# 1
# ],
# [
# "Tom",
# "nroonga user",
# 1
# ]
# ]
# ]
# ]

In this case, the keywords 'groonga', 'mroonga' and 'user' are given the same weight.
You can't give a different weight to each keyword in this way.

Execution example:

select Users --output_columns name,memo,_score --filter 'query("memo * 10", "groonga") || query("memo * 20", "mroonga") || query("memo * 1", "user")' --sortby -_score
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 5
# ],
# [
# [
# "name",
# "ShortText"
# ],
# [
# "memo",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "Tobby",
# "groonga and mroonga user. mroonga is ...",
# 51
# ],
# [
# "Alisa",
# "mroonga user",
# 21
# ],
# [
# "Alice",
# "groonga user",
# 11
# ],
# [
# "Tom",
# "nroonga user",
# 1
# ],
# [
# "Bob",
# "rroonga user",
# 1
# ]
# ]
# ]
# ]

On the other hand, by specifying multiple query calls, the keywords 'groonga', 'mroonga' and
'user' are given different weights.

As a result, you can control the full text search result by giving different weights to the
keywords according to your purpose.

Parameters
Required parameters
There are two required parameters, match_columns and query_string.

match_columns
Specifies the default target columns for full text search by the query_string parameter
value. It has the same role as the select-match-columns parameter in select.

query_string
Specifies the search condition in /reference/grn_expr/query_syntax. It has the same role as
the query parameter in select.

See select-match-columns about query parameter in select.

Optional parameter
There are some optional parameters.

query_expander
Specifies the plugin name for query expansion.

There is one query expander bundled in the official release: /reference/query_expanders/tsv.

See /reference/query_expanders/tsv about details.

substitution_table
Specifies the substitution table and substitution column name for query expansion, in the
format ${TABLE}.${COLUMN}.

See select-query-expander about details.
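
For illustration, here is a hedged sketch of passing a substitution table (the Thesaurus
table and its synonym column are hypothetical and must be created and loaded beforehand):

select Users --filter 'query("memo", "groonga", "Thesaurus.synonym")'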

Return value
query returns whether any record matches or not. If one or more records match, it returns
true. Otherwise, it returns false.

TODO
· Support query_flags

See also
· /reference/commands/select

rand
Name
rand - generates a random number

Syntax
rand([max])

Description
This section describes rand, one of the Groonga built-in functions. Built-in functions can be called in script format grn_expr.

The rand() function returns a pseudo-random integer between 0 and max.

Parameters
max
Specifies the maximum value of the return value. If omitted, RAND_MAX is assumed.

Return value
Returns an Int32 value between 0 and max.


rand(10)
3

snippet_html
CAUTION:
This feature is experimental. API will be changed.

Summary
snippet_html extracts snippets of the target text around search keywords (KWIC: KeyWord In
Context). The snippets are ready for embedding in HTML. Special characters such as < and >
are escaped as &lt; and &gt;. Each keyword is surrounded with <span class="keyword"> and
</span>. For example, a snippet of I am a groonga user. <3 for the keyword groonga is I am a
<span class="keyword">groonga</span> user. &lt;3.

Syntax
snippet_html has only one parameter:

snippet_html(column)

snippet_html has many parameters internally but they can't be specified for now. You will
be able to customize those parameters soon.

Usage
Here are a schema definition and sample data to show usage.

Execution example:

table_create Documents TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Documents content COLUMN_SCALAR Text
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms documents_content_index COLUMN_INDEX|WITH_POSITION Documents content
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Documents
[
["content"],
["Groonga is a fast and accurate full text search engine based on inverted index. One of the characteristics of groonga is that a newly registered document instantly appears in search results. Also, groonga allows updates without read locks. These characteristics result in superior performance on real-time applications."],
["Groonga is also a column-oriented database management system (DBMS). Compared with well-known row-oriented systems, such as MySQL and PostgreSQL, column-oriented systems are more suited for aggregate queries. Due to this advantage, groonga can cover weakness of row-oriented systems."]
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]

snippet_html can be used only in --output_columns in /reference/commands/select.

You need to specify the --command_version 2 argument explicitly because function calls in
--output_columns are an experimental feature in Groonga 2.0.9. It will be enabled by default
soon.

You also need to specify --query and/or --filter. Keywords are extracted from --query and
--filter arguments.

The following example uses --query "fast performance". In this case, fast and performance
are used as keywords.

Execution example:

select Documents --output_columns "snippet_html(content)" --command_version 2 --match_columns content --query "fast performance"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "snippet_html",
# "null"
# ]
# ],
# [
# [
# "Groonga is a <span class=\"keyword\">fast</span> and accurate full text search engine based on inverted index. One of the characteristics of groonga is that a newly registered document instantly appears in search results. Also, gro",
# "onga allows updates without read locks. These characteristics result in superior <span class=\"keyword\">performance</span> on real-time applications."
# ]
# ]
# ]
# ]
# ]

--query "fast performance" matches to only the first record's content.
snippet_html(content) extracts two text parts that include the keywords fast or
performance and surrounds the keywords with <span class="keyword"> and </span>.

The max number of text parts is 3. If there are 4 or more text parts that include the
keywords, only the leading 3 parts are only used.

The max size of a text part is 200byte. The unit is bytes not characters. The size doesn't
include inserted <span keyword="keyword"> and </span>.

Both the max number of text parts and the max size of a text part aren't customizable.

You can specify a string literal instead of a column.

Execution example:

select Documents --output_columns 'snippet_html("Groonga is very fast fulltext search engine.")' --command_version 2 --match_columns content --query "fast performance"
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "snippet_html",
# "null"
# ]
# ],
# [
# [
# "Groonga is very <span class=\"keyword\">fast</span> fulltext search engine."
# ]
# ]
# ]
# ]
# ]

Return value
snippet_html returns an array of string or null. If snippet_html can't find any snippets,
it returns null.

An element of array is a snippet:

[SNIPPET1, SNIPPET2, SNIPPET3]

A snippet includes one or more keywords. The max byte size of a snippet, excluding <span
class="keyword"> and </span>, is 200 bytes. The unit isn't the number of characters.

The array size is at least 0 and at most 3. The max size of 3 will be customizable soon.

TODO
· Make the max number of text parts customizable.

· Make the max size of a text part customizable.

· Make keywords customizable.

· Make tag that surrounds a keyword customizable.

· Make normalization customizable.

· Support options by object literal.

See also
· /reference/commands/select

sub_filter
Summary
sub_filter evaluates filter_string in the scope context.

sub_filter can be used only in --filter in /reference/commands/select.

Syntax
sub_filter requires two arguments. They are scope and filter_string.

sub_filter(scope, filter_string)

Usage
Here are a schema definition and sample data to show usage.

Sample schema:

Execution example:

table_create Comment TABLE_PAT_KEY UInt32
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Comment name COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Comment content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Blog TABLE_PAT_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Blog title COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Blog content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Blog comments COLUMN_VECTOR Comment
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Comment blog_comment_index COLUMN_INDEX Blog comments
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Lexicon TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon comment_content COLUMN_INDEX|WITH_POSITION Comment content
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon comment_name COLUMN_INDEX|WITH_POSITION Comment name
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Lexicon blog_content COLUMN_INDEX|WITH_POSITION Blog content
# [[0, 1337566253.89858, 0.000355720520019531], true]

Sample data:

Execution example:

load --table Comment
[
{"_key": 1, "name": "A", "content": "groonga"},
{"_key": 2, "name": "B", "content": "groonga"},
{"_key": 3, "name": "C", "content": "rroonga"},
{"_key": 4, "name": "A", "content": "mroonga"},
]
# [[0, 1337566253.89858, 0.000355720520019531], 4]
load --table Blog
[
{"_key": "groonga's blog", "content": "content of groonga's blog", comments: [1, 2, 3]},
{"_key": "mroonga's blog", "content": "content of mroonga's blog", comments: [2, 3, 4]},
{"_key": "rroonga's blog", "content": "content of rroonga's blog", comments: [3]},
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

Here is a simple usage of the sub_filter function, which tries to extract the blog entries
that user 'A' commented on mentioning 'groonga'.

Execution example:

select Blog --output_columns _key --filter "comments.name @ \"A\" && comments.content @ \"groonga\""
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# "groonga's blog"
# ],
# [
# "mroonga's blog"
# ]
# ]
# ]
# ]

When executing the above query, not only "groonga's blog" but also "mroonga's blog" is
returned. This is not what you want, because user "A" does not mention "groonga" on
"mroonga's blog".

Without sub_filter, the query only requires that the following conditions are met:

· There is at least one comment written by user "A".

· There is at least one comment that mentions "groonga".

Execution example:

select Blog --output_columns _key --filter 'sub_filter(comments, "name @ \\"A\\" && content @ \\"groonga\\"")'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ]
# ],
# [
# "groonga's blog"
# ]
# ]
# ]
# ]

On the other hand, executing the above query returns the intended result, because the second
argument of sub_filter is evaluated in the comments column's context.

It means that sub_filter requires that the following condition is met:

· There is at least one comment in which user "A" mentions "groonga".

Parameters
There are two required parameters, scope and filter_string.

scope
Specifies a column of the table that is specified by the table parameter in select. The
column has a limitation, which is described later. filter_string is evaluated in the column's
context. It means that filter_string is evaluated like select --table TYPE_OF_THE_COLUMN
--filter FILTER_STRING.

The specified column's type must be a table. In other words, the column must be of a
reference type.

You can chain columns with the COLUMN_1.COLUMN_2.COLUMN_3...COLUMN_N syntax, for example
user.group.name, as sketched below.
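
For illustration, here is a hedged sketch of a chained scope (the Memos table, its user
reference column, the group reference column and the name column are all hypothetical):

select Memos --filter 'sub_filter(user.group, "name @ \\"admin\\"")'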

See select-table about table parameter in select.

filter_string
Specifies a search condition in /reference/grn_expr/script_syntax. It is evaluated in
scope context.

Return value
sub_filter returns whether any record matches or not. If one or more records match, it
returns true. Otherwise, it returns false.

See also
· /reference/commands/select

· /reference/grn_expr/script_syntax

vector_size
Summary
New in version 5.0.3.

vector_size returns the size of the value of a vector column.

To enable this function, register the functions/vector plugin with the following command:

plugin_register functions/vector

Then use the vector_size function with the --command_version 2 option. Note that you must
specify --command_version 2 to use the vector_size function.

Syntax
vector_size requires one argument - target.

vector_size(target)

Usage
Here is a schema definition and sample data.

Sample schema:

Execution example:

table_create Memos TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Memos tags COLUMN_VECTOR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

Sample data:

Execution example:

load --table Memos
[
{"_key": "Groonga", "tags": ["Groonga"]},
{"_key": "Rroonga", "tags": ["Groonga", "Ruby"]},
{"_key": "Nothing"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

Here is a simple usage of the vector_size function, which returns tags and size: the value
of the tags column and the number of its elements.

Execution example:

select Memos --output_columns 'tags, vector_size(tags)' --command_version 2
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "tags",
# "ShortText"
# ],
# [
# "vector_size",
# "Object"
# ]
# ],
# [
# [
# "Groonga"
# ],
# 1
# ],
# [
# [
# "Groonga",
# "Ruby"
# ],
# 2
# ],
# [
# [],
# 0
# ]
# ]
# ]
# ]

Parameters
There is one required parameter, target.

target
Specifies a vector column of the table that is specified by the table parameter in select.

Return value
vector_size returns the size of the value of the target vector column.

Operations
Groonga has multiple search operations. This section describes these search operations.

Geolocation search
Groonga supports geolocation search. It uses an index for search. It means that you can
search by geolocation as fast as full text search.

Supported features
Groonga supports only point as data type. Line, surface and so on aren't supported yet.
Here is a feature list:

1. Groonga can store a point to a column.

2. Groonga can search records that have a point in the specified rectangle.

3. Groonga can search records that have a point in the specified circle.

4. Groonga can calculate distance between two points.

5. Groonga can sort records by distance from the specified point in ascending order.

Here are use cases for Groonga's geolocation search:

· You list McDonald's around a station.

· You list KFC around the current location, sorted by distance from the current location in
ascending order, with the distance shown.

Here are unsupported use cases:

· You search McDonald's in a city. (Groonga doesn't support geolocation search by shapes
other than a rectangle and a circle.)

· You store a region instead of a point as a lake record. (A column can't have geolocation
data other than a point.)

The following figures illustrate Groonga's geolocation search features.

Here is a figure that only has records. A black point represents a record. The following
figures show how records are treated. [image: only records]

Coming soon...

Prefix RK search
Summary
Groonga supports prefix RK search. RK means Romaji and Kana (reading). Prefix RK search
can find registered katakana text by a query in romaji, hiragana or katakana. Found
registered texts start with the query.

Prefix RK search is useful for completing Japanese text, because romaji is widely used to
input Japanese on computers. See also Japanese input methods on Wikipedia.

If users can search Japanese text in romaji, they don't need to convert romaji to
hiragana, katakana or kanji by themselves. For example, if you register a reading for
"日本" as "ニホン", users can find "日本" by "ni", "に" or "ニ".

The feature is helpful because it reduces one or more operations of users.

This feature is used in /reference/suggest/completion.

You can use this feature in select-filter by /reference/functions/prefix_rk_search.

Usage
You need a table-pat-key table to use prefix RK search.

You need to put readings in katakana into the TABLE_PAT_KEY table as keys:

Execution example:

table_create Readings TABLE_PAT_KEY ShortText --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Readings
[
{"_key": "ニホン"},
{"_key": "ニッポン"},
{"_key": "ローマジ"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

You can find ニホン and ニッポン by prefix RK search with ni as the query against the
Readings table.

You can find ローマジ by prefix RK search with r as the query against the Readings table.
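
For example, here is a minimal sketch that uses /reference/functions/prefix_rk_search in
--filter against the Readings table above:

select Readings --filter 'prefix_rk_search(_key, "ni")'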

How to convert romaji to reading
Prefix RK search is based on the JIS X 4063:2000 specification.

The specification has been obsoleted. See ローマ字入力 on Japanese Wikipedia for JIS X
4063:2000.

Normally, you get converted results as expected.

See also
· /reference/suggest/completion

· /reference/functions/prefix_rk_search

Configuration
New in version 5.1.2.

Groonga can manage configuration items in each database. These configuration items are
persistent. It means that these configuration items are still usable after a Groonga process
exits and restarts.

Summary
You can change some Groonga behaviors such as /spec/search by some ways such as request
parameter (select-match-escalation-threshold) and build parameter
(install-configure-with-match-escalation-threshold).

Configuration is one of these ways. You can change some Groonga behaviors per database by
configuration.

A configuration item consists of a key and a value. Both key and value are strings. The max
key size is 4KiB. The max value size is 4091B (= 4KiB - 5B).

You can set a configuration item by /reference/commands/config_set.

You can get a configuration item by /reference/commands/config_get.

You can delete a configuration item by /reference/commands/config_delete.

You can confirm all configuration items by /reference/commands/dump.
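
For illustration, here is a minimal round trip with these commands (the key alias.column is
just an example; it is actually used by the alias feature described below):

config_set alias.column Aliases.real_name
config_get alias.column
config_delete alias.column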

Commands
Alias
New in version 5.1.2.

You can refer to a table and a column by multiple names by using the alias feature.

Summary
The alias feature is useful for the following cases:

· You want to rename a table but you can't change some Groonga clients that use the
current table name.

· You want to change a column's type without downtime.

In the former case, some Groonga clients can keep using the current table name after you
rename the table, because the alias feature maps the current table name to the new table
name.

In the latter case, all Groonga clients access the column by an aliased name such as
aliased_column. aliased_column refers to current_column. You create a new column new_column
with the new type and copy data from current_column by /reference/commands/column_copy. You
then change aliased_column to refer to new_column instead of current_column. Now, all
Groonga clients access new_column via aliased_column without stopping search requests, as
sketched below.
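
For illustration, here is a hedged sketch of that flow (the Entries table, the column names
and the Int32 type are hypothetical; the Aliases table is set up as described in the usage
below):

column_create Entries new_column COLUMN_SCALAR Int32
column_copy Entries current_column Entries new_column
load --table Aliases
[
{"_key": "Entries.aliased_column", "real_name": "Entries.new_column"}
]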

Usage
You manage the alias-to-real-name mapping with a normal table and a normal column.

You can use any table type except table-no-key for the table. table-hash-key is
recommended because only exact key match search is used for the alias feature, and
table-hash-key is the fastest table type for exact key match search.

The column must be a /reference/columns/scalar whose type is ShortText. You can also use the
Text and LongText types but they are meaningless, because the max table/column name size is
4KiB and ShortText can store 4KiB of data.

Here are example definitions of table and column for managing aliases:

Execution example:

table_create Aliases TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Aliases real_name COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

You need to register the table and column by configuration. The alias feature uses the
alias.column configuration item. You can register the table and column by the following
/reference/commands/config_set:

Execution example:

config_set alias.column Aliases.real_name
# [[0, 1337566253.89858, 0.000355720520019531], true]

Here are schema and data to show how to use alias:

Execution example:

table_create Users TABLE_HASH_KEY ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Users age COLUMN_SCALAR UInt8
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Users
[
{"_key": "alice", "age": 14},
{"_key": "bob", "age": 29}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]

You can use Users.age in /reference/commands/select:

Execution example:

select Users --filter 'age < 20'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "age",
# "UInt8"
# ]
# ],
# [
# 1,
# "alice",
# 14
# ]
# ]
# ]
# ]

You can't use Users.age once you rename Users.age to Users.years by
/reference/commands/column_rename:

Execution example:

column_rename Users age years
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Users --filter 'age < 20'
# [
# [
# -63,
# 1337566253.89858,
# 0.000355720520019531,
# "Syntax error: <age| |< 20>",
# [
# [
# "yy_syntax_error",
# "grn_ecmascript.lemon",
# 34
# ]
# ]
# ],
# []
# ]

But you can use Users.age by registering a Users.age to Users.years mapping in Aliases.

Execution example:

load --table Aliases
[
{"_key": "Users.age", "real_name": "Users.years"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select Users --filter 'age < 20'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "years",
# "UInt8"
# ]
# ],
# [
# 1,
# "alice",
# 14
# ]
# ]
# ]
# ]

Now, you can use Users.age as an alias of Users.years.

How to resolve alias
This section describes how to resolve alias.

Groonga uses the alias feature when a nonexistent object name (table name, column name,
command name, function name and so on) is referred to. It means that you can't override an
existing object (table, column, command, function and so on) with the alias feature.

For example, alias isn't resolved in the following example because Users.years exists:

Execution example:

column_rename Users years years_old
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Users --filter 'age < 20'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "years_old",
# "UInt8"
# ]
# ],
# [
# 1,
# "alice",
# 14
# ]
# ]
# ]
# ]

Aliases are resolved recursively. If you rename Users.years to Users.years_old and refer to
Users.age, Groonga replaces Users.age with Users.years, and then Users.years with
Users.years_old, because the Aliases table has the following records:

┌────────────┬─────────────────┐
│_key        │ real_name       │
├────────────┼─────────────────┤
│Users.age   │ Users.years     │
├────────────┼─────────────────┤
│Users.years │ Users.years_old │
└────────────┴─────────────────┘

Here is an example where Users.age is resolved recursively:

Execution example:

column_rename Users years years_old
# [[0, 1337566253.89858, 0.000355720520019531], true]
select Users --filter 'age < 20'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "years_old",
# "UInt8"
# ]
# ],
# [
# 1,
# "alice",
# 14
# ]
# ]
# ]
# ]

See also
· /reference/configuration

· /reference/commands/config_set

· /reference/commands/table_create

· /reference/commands/column_create

· /reference/commands/select

Suggest
Groonga has the suggest feature. This section describes how to use it and how it works.

Introduction
The suggest feature in Groonga provides the following features:

· Completion

· Correction

· Suggestion

Completion
Completion helps user input. If a user inputs a partial word, Groonga can return complete
words from registered words.

For example, there are registered words:

· "groonga"

· "complete"

· "correction"

· "suggest"

An user inputs "co" and groonga returns "complete" and "correction" because they starts
with "co".

An user inputs "sug" and groonga returns "suggest" because "suggest" starts with "sug".

An user inputs "ab" and groonga returns nothing because no word starts with "ab".

Correction
Correction also helps user input. If a user inputs a wrong word, groonga can return correct
words from registered correction pairs.

For example, there are registered correction pairs:

┌───────────┬──────────────┐
│wrong word │ correct word │
├───────────┼──────────────┤
│grroonga │ groonga │
├───────────┼──────────────┤
│gronga │ groonga │
├───────────┼──────────────┤
│gronnga │ groonga │
└───────────┴──────────────┘

An user inputs "gronga" and groonga returns "groonga" because "gronga" is in wrong word
and corresponding correct word is "groonga".

An user inputs "roonga" and groonga returns nothing because "roonga" isn't in wrong word.

Suggestion
Suggestion helps users filter many found documents. If a user inputs a query, groonga
can return new queries that have additional keywords, from registered related query
pairs.

For example, there are registered related query pairs:

┌────────┬───────────────────────┐
│keyword │ related query │
├────────┼───────────────────────┤
│groonga │ groonga search engine │
├────────┼───────────────────────┤
│search │ Google search │
├────────┼───────────────────────┤
│speed │ groonga speed │
└────────┴───────────────────────┘

An user inputs "groonga" and groonga returns "groonga search engine" because "groonga" is
in keyword column and related query column is "groonga search engine".

An user inputs "MySQL" and groonga returns nothing because "MySQL" isn't in keyword column
values.

Learning
The suggest feature requires registered data before using the feature. Those data can be
registered from user inputs. Gronnga-suggest-httpd and groonga-suggest-learner commands
are provided for the propose.

Completion
This section describes the following completion features:

· How it works

· How to use

· How to learn

How it works
The completion feature uses three searches to compute completed words:

1. Prefix RK search against registered words.

2. Cooccurrence search against learned data.

3. Prefix search against registered words. (optional)

Prefix RK search
See /reference/operations/prefix_rk_search for prefix RK search.

If you create a dataset named example with the
/reference/executables/groonga-suggest-create-dataset executable, you can update pairs of
registered words and their readings for prefix RK search by explicitly loading data into the
_key and kana columns of the item_example table.

Cooccurrence search
Cooccurrence search can find registered words from a user's partial input. It uses user
input sequences that are learned from query logs, access logs and so on.

For example, there is the following user input sequence:

┌────────┬────────────┐
│input │ submit │
├────────┼────────────┤
│s │ no │
├────────┼────────────┤
│se │ no │
├────────┼────────────┤
│sea │ no │
├────────┼────────────┤
│sear │ no │
├────────┼────────────┤
│searc │ no │
├────────┼────────────┤
│search │ yes │
├────────┼────────────┤
│e │ no │
├────────┼────────────┤
│en │ no │
├────────┼────────────┤
│eng │ no │
├────────┼────────────┤
│engi │ no │
├────────┼────────────┤
│engin │ no │
├────────┼────────────┤
│engine │ no │
├────────┼────────────┤
│enginen │ no (typo!) │
├────────┼────────────┤
│engine │ yes │
└────────┴────────────┘

Groonga creates the following completion pairs:

┌────────┬────────────────┐
│input │ completed word │
├────────┼────────────────┤
│s │ search │
├────────┼────────────────┤
│se │ search │
├────────┼────────────────┤
│sea │ search │
├────────┼────────────────┤
│sear │ search │
├────────┼────────────────┤
│searc │ search │
├────────┼────────────────┤
│e │ engine │
├────────┼────────────────┤
│en │ engine │
├────────┼────────────────┤
│eng │ engine │
├────────┼────────────────┤
│engi │ engine │
├────────┼────────────────┤
│engin │ engine │
├────────┼────────────────┤
│engine │ engine │
├────────┼────────────────┤
│enginen │ engine         │
└────────┴────────────────┘

All not-submitted user inputs (e.g. "s", "se" and so on) before each user submission map
to the submitted input (e.g. "search").

To be precise, this description isn't complete because it omits time stamps. Groonga
doesn't care about all not-submitted user inputs before each user submission; it only cares
about the not-submitted user inputs within a minute before the submission. Groonga doesn't
use user inputs from more than a minute earlier.

If an user inputs "sea" and cooccurrence search returns "search" because "sea" is in input
column and corresponding completed word column value is "search".

Prefix search
Prefix search can find registered words that start with the user's input. This search
doesn't care about romaji, katakana and hiragana, unlike prefix RK search.

This search isn't always run. It's only run when it's requested explicitly or when both
prefix RK search and cooccurrence search return nothing.

For example, there is a registered word "search". A user can find "search" by "s", "se",
"sea", "sear", "searc" and "search".

How to use
Groonga provides the /reference/commands/suggest command for completion. The --type complete
option requests completion.

For example, here is a command to get completion results for "en":

Execution example:

suggest --table item_query --column kana --types complete --frequency_threshold 1 --query en
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "complete": [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "engine",
# 1
# ]
# ]
# }
# ]

How it learns
Cooccurrence search uses learned data, which is based on query logs, access logs and so
on. To create learned data, Groonga needs the user input sequence with time stamps and the
user-submitted input with a time stamp.

For example, an user wants to search by "engine". The user inputs the query with the
following sequence:

1. 2011-08-10T13:33:23+09:00: e

2. 2011-08-10T13:33:23+09:00: en

3. 2011-08-10T13:33:24+09:00: eng

4. 2011-08-10T13:33:24+09:00: engi

5. 2011-08-10T13:33:24+09:00: engin

6. 2011-08-10T13:33:25+09:00: engine (submit!)

Groonga can learn from the input sequence by the following command:

load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)'
[
{"sequence": "1", "time": 1312950803.86057, "item": "e"},
{"sequence": "1", "time": 1312950803.96857, "item": "en"},
{"sequence": "1", "time": 1312950804.26057, "item": "eng"},
{"sequence": "1", "time": 1312950804.56057, "item": "engi"},
{"sequence": "1", "time": 1312950804.76057, "item": "engin"},
{"sequence": "1", "time": 1312950805.86057, "item": "engine", "type": "submit"}
]

How to update reading data
Groonga requires registered words and their readings for prefix RK search. This section
describes how to register a word and its reading.

Here is an example that registers "日本", which means Japan in English:

Execution example:

load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)'
[
{"sequence": "1", "time": 1312950805.86058, "item": "日本", "type": "submit"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

Here is an example that updates reading data to complete "日本":

Execution example:

load --table item_query
[
{"_key":"日本", "kana":["ニホン", "ニッポン"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

Then you can complete the registered word "日本" by the romaji input "nihon".

Execution example:

suggest --table item_query --column kana --types complete --frequency_threshold 1 --query nihon
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "complete": [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "日本",
# 2
# ]
# ]
# }
# ]

Without loading the above reading data, you can't complete the registered word "日本" by the
query "nihon".

You can register multiple readings for a registered word, because the kana column in the
item_query table is defined as a /reference/columns/vector.

This is why you can also complete the registered word "日本" by the query "nippon".

Execution example:

suggest --table item_query --column kana --types complete --frequency_threshold 1 --query nippon
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "complete": [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "日本",
# 2
# ]
# ]
# }
# ]

This feature is very convenient because you can search registered words even when a
Japanese input method is disabled.

If there are multiple candidates in the completed result, you can customize their priority
by setting the value of the boost column in the item_query table.

Here is an example that customizes priority for prefix RK search:

Execution example:

load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)'
[
{"sequence": "1", "time": 1312950805.86059, "item": "日本語", "type": "submit"}
{"sequence": "1", "time": 1312950805.86060, "item": "日本人", "type": "submit"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
load --table item_query
[
{"_key":"日本語", "kana":"ニホンゴ"}
{"_key":"日本人", "kana":"ニホンジン"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
suggest --table item_query --column kana --types complete --frequency_threshold 1 --query nihon
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "complete": [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "日本",
# 2
# ],
# [
# "日本人",
# 2
# ],
# [
# "日本語",
# 2
# ]
# ]
# }
# ]
load --table item_query
[
{"_key":"日本人", "boost": 100},
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
suggest --table item_query --column kana --types complete --frequency_threshold 1 --query nihon
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "complete": [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "日本人",
# 102
# ],
# [
# "日本",
# 2
# ],
# [
# "日本語",
# 2
# ]
# ]
# }
# ]

Correction
This section describes the following correction features:

· How it works

· How to use

· How to learn

How it works
The correction feature uses two searches to compute corrected words:

1. Cooccurrence search against learned data.

2. Similar search against registered words. (optional)

Cooccurrence search
Cooccurrence search can find registered words from a user's wrong input. It uses user
submission sequences that are learned from query logs, access logs and so on.

For example, there are the following user submissions:

┌────────────────┬───────────────────────────┐
│query │ time │
├────────────────┼───────────────────────────┤
│serach (typo!) │ 2011-08-10T22:20:50+09:00 │
├────────────────┼───────────────────────────┤
│search (fixed!) │ 2011-08-10T22:20:52+09:00 │
└────────────────┴───────────────────────────┘

Groonga creates the following correction pair from the above submissions:

┌───────┬────────────────┐
│input │ corrected word │
├───────┼────────────────┤
│serach │ search │
└───────┴────────────────┘

Groonga treats consecutive submissions within a minute as an input correction by the user.
The not-submitted user input sequence between the two submissions isn't used as learned
data for correction.

If an user inputs "serach" and cooccurrence search returns "search" because "serach" is in
input column and corresponding corrected word column value is "search".

Similar search
Similar search can find registered words that have one or more tokens in common with the
user input. The TokenBigram tokenizer is used for tokenization, because the suggest dataset
schema created by /reference/executables/groonga-suggest-create-dataset uses the TokenBigram
tokenizer as the default tokenizer.

For example, there is a registered query "search engine". A user can find "search engine"
by "web search service", "sound engine" and so on, because "search engine" and "web search
service" have the same token "search", and "search engine" and "sound engine" have the same
token "engine".

"search engine" is tokenized to the "search" and "engine" tokens. (Groonga's TokenBigram
tokenizer doesn't tokenize runs of alphabetical characters or digits into two-character
tokens, to reduce search noise. The TokenBigramSplitSymbolAlphaDigit tokenizer should be
used to force two-character tokenization.) "web search service" is tokenized to "web",
"search" and "service". "sound engine" is tokenized to "sound" and "engine".

How to use
Groonga provides the /reference/commands/suggest command for correction. The --type correct
option requests correction.

For example, here is a command to get correction results for "saerch":

Execution example:

suggest --table item_query --column kana --types correction --frequency_threshold 1 --query saerch
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "correct": [
# [
# 1
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "search",
# 1
# ]
# ]
# }
# ]

How it learns
Cooccurrence search uses learned data. They are based on query logs, access logs and so
on. To create learned data, groonga needs user submit inputs with time stamp.

For example, an user wants to search by "search" but the user has typo "saerch" before
inputs the correct query. The user inputs the query with the following sequence:

1. 2011-08-10T13:33:23+09:00: s

2. 2011-08-10T13:33:23+09:00: sa

3. 2011-08-10T13:33:24+09:00: sae

4. 2011-08-10T13:33:24+09:00: saer

5. 2011-08-10T13:33:24+09:00: saerc

6. 2011-08-10T13:33:25+09:00: saerch (submit!)

7. 2011-08-10T13:33:29+09:00: serch (correcting...)

8. 2011-08-10T13:33:30+09:00: search (submit!)

Groonga can learn from the input sequence by the following command:

load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)'
[
{"sequence": "1", "time": 1312950803.86057, "item": "s"},
{"sequence": "1", "time": 1312950803.96857, "item": "sa"},
{"sequence": "1", "time": 1312950804.26057, "item": "sae"},
{"sequence": "1", "time": 1312950804.56057, "item": "saer"},
{"sequence": "1", "time": 1312950804.76057, "item": "saerc"},
{"sequence": "1", "time": 1312950805.76057, "item": "saerch", "type": "submit"},
{"sequence": "1", "time": 1312950809.76057, "item": "serch"},
{"sequence": "1", "time": 1312950810.86057, "item": "search", "type": "submit"}
]

Suggestion
This section describes the following suggestion features:

· How it works

· How to use

· How to learn

How it works
The suggestion feature uses a search to compute suggested words:

1. Cooccurrence search against learned data.

Cooccurrence search
Cooccurrence search can find related words from the user's input. It uses user submissions
that are learned from query logs, access logs and so on.

For example, there are the following user submissions:

┌────────────────────┐
│query │
├────────────────────┤
│search engine │
├────────────────────┤
│web search realtime │
└────────────────────┘

Groonga creates the following suggestion pairs:

┌─────────┬─────────────────────┐
│input    │ suggested words     │
├─────────┼─────────────────────┤
│search   │ search engine       │
├─────────┼─────────────────────┤
│engine   │ search engine       │
├─────────┼─────────────────────┤
│web      │ web search realtime │
├─────────┼─────────────────────┤
│search   │ web search realtime │
├─────────┼─────────────────────┤
│realtime │ web search realtime │
└─────────┴─────────────────────┘

Those pairs are created by the following steps:

1. Tokenizes the user input query with the TokenDelimit tokenizer, which uses a space as the
token delimiter. (e.g. "search engine" is tokenized to the two tokens "search" and "engine".)

2. For each token, creates a pair that consists of the token and the original query.

If a user inputs "search", cooccurrence search returns "search engine" and "web search
realtime", because "search" is in two input columns and the corresponding suggested word
columns have "search engine" and "web search realtime".

How to use
Groonga provides the /reference/commands/suggest command to use suggestion. The --types suggest
option requests suggestions.

For example, here is a command to get suggestion results for "search":

Execution example:

suggest --table item_query --column kana --types suggest --frequency_threshold 1 --query search
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "suggest": [
# [
# 2
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_score",
# "Int32"
# ]
# ],
# [
# "search engine",
# 1
# ],
# [
# "web search realtime",
# 1
# ]
# ]
# }
# ]

How it learns
Cooccurrence search uses learned data. It is based on query logs, access logs and so
on. To create learned data, Groonga needs the user input sequence with time stamps and the
user's submitted inputs with time stamps.

For example, suppose a user wants to search by "engine". The user inputs the query with the
following sequence:

1. 2011-08-10T13:33:23+09:00: search engine (submit)

2. 2011-08-10T13:33:28+09:00: web search realtime (submit)

Groonga can learn from the submissions with the following command:

load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)'
[
{"sequence": "1", "time": 1312950803.86057, "item": "search engine", "type": "submit"},
{"sequence": "1", "time": 1312950808.86057, "item": "web search realtime", "type": "submit"}
]

How to extract learning data
The learning data is stored in the item_DATASET and pair_DATASET tables. By using the select
command on these tables, you can extract all learning data.

Here is the query to extract all learning data:

select item_DATASET --limit -1
select pair_DATASET --filter 'freq0 > 0 || freq1 > 0 || freq2 > 0' --limit -1

Without '--limit -1', you can't get all data. In the pair table, a valid value of the freq0,
freq1 and freq2 columns must be larger than 0.

Don't execute the above queries via HTTP requests because an enormous number of records will
be fetched.

Indexing
Groonga supports both online index construction and offline index construction since
2.0.0.

Online index construction
In online index construction, registered documents become searchable quickly while
indexing. But indexing costs more than it does with offline index
construction.

Online index construction is suitable for a search system that values freshness. For
example, a search system for tweets, news, blog posts and so on will value freshness.
Online index construction can make fresh documents searchable and keep searchable while
indexing.

Offline index construction
In offline index construction, the indexing cost is lower than with online index
construction. Indexing time will be shorter. The index will be smaller. Resources required for
indexing will be smaller. But a document being registered isn't searchable until all
registered documents are indexed.

Offline index construction is suitable for a search system that values low resource
usage. If a search system doesn't value freshness, offline index construction will be
suitable. For example, a reference manual search system doesn't value freshness because a
reference manual is updated only at a release.

How to use
Groonga uses online index construction by default. When we register a document, we can search
it quickly.

Groonga uses offline index construction when we add an index to a column that already has
data.

We define a schema:

Execution example:

table_create Tweets TABLE_NO_KEY
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Tweets content COLUMN_SCALAR ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Lexicon TABLE_HASH_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]

We register data:

Execution example:

load --table Tweets
[
{"content":"Hello!"},
{"content":"I just start it!"},
{"content":"I'm sleepy... Have a nice day... Good night..."}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]

We can search with sequential search when we don't have an index:

Execution example:

select Tweets --match_columns content --query 'good nice'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 3,
# "I'm sleepy... Have a nice day... Good night..."
# ]
# ]
# ]
# ]

We create an index for Tweets.content. Data already registered in Tweets.content is indexed
by offline index construction:

Execution example:

column_create Lexicon tweet COLUMN_INDEX|WITH_POSITION Tweets content
# [[0, 1337566253.89858, 0.000355720520019531], true]

We search with the index. We get a matched record:

Execution example:

select Tweets --match_columns content --query 'good nice'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 3,
# "I'm sleepy... Have a nice day... Good night..."
# ]
# ]
# ]
# ]

We register data again. They are indexed by online index construction:

Execution example:

load --table Tweets
[
{"content":"Good morning! Nice day."},
{"content":"Let's go shopping."}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]

We can also get newly registered records by searching:

Execution example:

select Tweets --match_columns content --query 'good nice'
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "content",
# "ShortText"
# ]
# ],
# [
# 3,
# "I'm sleepy... Have a nice day... Good night..."
# ],
# [
# 4,
# "Good morning! Nice day."
# ]
# ]
# ]
# ]

Sharding
New in version 5.0.0.

Groonga has /limitations on table size. You can't add more than 268,435,455 records to one
table.

Groonga supports time based sharding to resolve the limitation.

It works within one database. It doesn't work with multiple databases. This means that this
sharding feature isn't for distributing large data to multiple hosts.

If you want a distributed sharding feature, use Mroonga or PGroonga. You can then use the
sharding features of MySQL or PostgreSQL. You'll also be able to use Droonga for distributed
sharding soon.

Summary
Sharding is implemented in the sharding plugin. The plugin is written in mruby. You need to
enable mruby when you build Groonga.

You can confirm whether your Groonga supports mruby or not with the --version command line
argument of /reference/executables/groonga:

% groonga --version
groonga 5.0.5 [...,mruby,...]

configure options: <...>

If you find mruby in the output, your Groonga supports mruby.

The sharding plugin provides only search commands. They have a logical_ prefix in their command
names, such as /reference/commands/logical_select and
/reference/commands/logical_range_filter.

The sharding plugin doesn't provide schema definition commands or data load commands yet. You
need to use existing commands such as /reference/commands/table_create,
/reference/commands/column_create and /reference/commands/load.

The sharding plugin imposes some rules on tables and columns. You need to follow these
rules. They are described later.

Glossary
┌───────────────────┬───────────────────────────────────┐
│Name               │ Description                       │
├───────────────────┼───────────────────────────────────┤
│Logical table      │ It's a table that consists of     │
│                   │ shards. It doesn't exist in the   │
│                   │ Groonga database. It just exists  │
│                   │ in our minds.                     │
├───────────────────┼───────────────────────────────────┤
│Logical table name │ The name of a logical table. It's │
│                   │ the prefix of shard names. For    │
│                   │ example, Logs is a logical table  │
│                   │ name and Logs_20150814 and        │
│                   │ Logs_20150815 are shard names.    │
├───────────────────┼───────────────────────────────────┤
│Shard              │ It's a table that has the records │
│                   │ for one day or one month. One     │
│                   │ shard has only partial records.   │
│                   │                                   │
│                   │ A shard name (= table name) must  │
│                   │ follow the                        │
│                   │ ${LOGICAL_TABLE_NAME}_${YYYYMMDD} │
│                   │ format or the                     │
│                   │ ${LOGICAL_TABLE_NAME}_${YYYYMM}   │
│                   │ format. ${LOGICAL_TABLE_NAME} is  │
│                   │ expanded to the logical table     │
│                   │ name. ${YYYYMMDD} is expanded to  │
│                   │ the day. ${YYYYMM} is expanded to │
│                   │ the month.                        │
│                   │                                   │
│                   │ For example, Logs_20150814        │
│                   │ consists of the logical name Logs │
│                   │ and the day 20150814.             │
└───────────────────┴───────────────────────────────────┘

Rules
TODO

Commands
Log
Groonga has two log files: the process log and the query log. The process log covers
everything executables/groonga does. The query log is just for query processing.

Process log
The process log is enabled by default. The log path can be customized with the --log-path
option. Each log message has a log level. If a message's level is lower than the Groonga
process's log level, it isn't logged. The log level can be customized with -l or
commands/log_level.

Format
Process log uses the following format:

#{TIME_STAMP}|#{L}| #{MESSAGE}

TIME_STAMP
The time stamp. It uses the following format:

YYYY-MM-DD hh:mm:ss.SSSSSS

YYYY Year with four digits.

MM Month with two digits.

DD Day with two digits.

hh Hour with two digits.

mm Minute with two digits.

ss Second with two digits.

SSSSSS Microsecond with six digits.

Example:

2011-07-05 06:25:18.345734

L Log level with a character. Here is the map from characters to log levels.

E Emergency

A Alert

C Critical

e Error

w Warning

n Notification

i Information

d Debug

- Dump

Example:

E

MESSAGE
Details about the log with free format.

Example:

log opened.

Example:

2011-07-05 08:35:09.276421|n| grn_init
2011-07-05 08:35:09.276553|n| RLIMIT_NOFILE(4096,4096)

Query log
Query log is disabled by default. It can be enabled by --query-log-path option.

Format
Query log uses the following formats:

#{TIME_STAMP}|#{MESSAGE}
#{TIME_STAMP}|#{ID}|>#{QUERY}
#{TIME_STAMP}|#{ID}|:#{ELAPSED_TIME} #{PROGRESS}
#{TIME_STAMP}|#{ID}|<#{ELAPSED_TIME} #{RETURN_CODE}

TIME_STAMP
The time stamp. It uses the following format:

YYYY-MM-DD hh:mm:ss.SSSSSS

YYYY Year with four digits.

MM Month with two digits.

DD Day with two digits.

hh Hour with two digits.

mm Minute with two digits.

ss Second with two digits.

SSSSSS Microsecond with six digits.

Example:

2011-07-05 06:25:18.345734

ID The ID of a thread. A Groonga process creates threads to process requests concurrently.
Each thread outputs some logs for a request. This ID can be used to extract the log
sequence of one thread.

Example:

45ea3034

> A character that indicates that a query has started.

: A character that indicates that a query is being processed.

< A character that indicates that a query has finished.

MESSAGE
Details about the log with free format.

Example:

query log opened.

QUERY A query to be processed.

Example:

select users --match_columns hobby --query music

ELAPSED_TIME
Elapsed time in nanoseconds since the query started.

Example:

000000000075770
(It means 75,770 nanoseconds.)

PROGRESS
The work processed up to that point.

Example:

select(313401)
(It means that 'select' was processed and 313,401 records remain.)

RETURN_CODE
A return code for the query.

Example:

rc=0
(It means the return code is 0. 0 means GRN_SUCCESS.)

Example:

2011-07-05 06:25:19.458756|45ea3034|>select Properties --limit 0
2011-07-05 06:25:19.458829|45ea3034|:000000000072779 select(19)
2011-07-05 06:25:19.458856|45ea3034|:000000000099998 output(0)
2011-07-05 06:25:19.458875|45ea3034|<000000000119062 rc=0
2011-07-05 06:25:19.458986|45ea3034|>quit

Tuning
Summary
There are some tuning parameters for handling a large database.

Parameters
This section describes tuning parameters.

The max number of open files per process
This parameter is for handling a large database.

Groonga creates one or more files per table and column. If your database has many tables
and columns, the Groonga process needs to open many files.

The system limits the max number of open files per process, so you may need to relax the
limit.

Here is an expression that computes how many files are opened by Groonga:

3 (for DB) +
N tables +
N columns (except index columns) +
(N index columns * 2) +
X (the number of plugins etc.)

Here is an example schema:

table_create Entries TABLE_HASH_KEY ShortText
column_create Entries content COLUMN_SCALAR Text
column_create Entries n_likes COLUMN_SCALAR UInt32
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
column_create Terms entries_key_index COLUMN_INDEX|WITH_POSITION Entries _key
column_create Terms entries_content_index COLUMN_INDEX|WITH_POSITION Entries content

This example opens at least 11 files:

3 +
2 (Entries and Terms) +
2 (Entries.content and Entries.n_likes) +
4 (Terms.entries_key_index and Terms.entries_content_index) +
X = 11 + X

Memory usage
This parameter is for handling a large database.

Groonga maps database files onto memory and accesses them there. Groonga doesn't map
unnecessary files onto memory. Groonga maps files when they are needed.

If you access all data in the database, all database files are mapped onto memory. If the
total size of your database files is 6GiB, your Groonga process uses 6GiB of memory.

Normally, not all of your database files are mapped onto memory. But it can happen, for
example, when you dump your database.

Normally, you must have memory and swap larger than your database. Linux has tuning
parameters to work with less memory and swap than the database size.

Linux
This section describes how to configure parameters on Linux.

nofile
You can relax the max number of open files per process parameter by creating a
configuration file /etc/security/limits.d/groonga.conf with the following content:

${USER} soft nofile ${MAX_VALUE}
${USER} hard nofile ${MAX_VALUE}

If you run the Groonga process as the groonga user and your Groonga process needs to open
fewer than 10000 files, use the following configuration:

groonga soft nofile 10000
groonga hard nofile 10000

The configuration is applied after you restart your Groonga service or log in again as the
groonga user.

vm.overcommit_memory
This is a Memory usage related parameter. You can handle a database that is larger than your
memory and swap by setting the vm.overcommit_memory kernel parameter to 1. 1 means that
Groonga can always map database files onto memory. We recommend this configuration.

See the Linux kernel documentation about overcommit for details of the vm.overcommit_memory
parameter.

You can set the configuration by putting a configuration file /etc/sysctl.d/groonga.conf
that has the following content:

vm.overcommit_memory = 1

The configuration can be applied by restarting your system or by running the following command:

% sudo sysctl --system

vm.max_map_count
This is a Memory usage related parameter. You can handle a database larger than 16GiB by
increasing the vm.max_map_count kernel parameter. The parameter limits the max
number of memory maps.

The default value of the kernel parameter may be 65530 or 65536. Groonga maps a 256KiB
memory chunk at one time. If a database is larger than 16GiB, Groonga reaches the
limit. (256KiB * 65536 = 16GiB)

You need to increase the value of the kernel parameter to handle a database of 16GiB or
larger. For example, you can handle a database of almost 32GiB with 65536 * 2 =
131072. You can set the configuration by putting a configuration file
/etc/sysctl.d/groonga.conf that has the following content:

vm.max_map_count = 131072

Note that your real configuration file will be the following because you already have
vm.overcommit_memory configuration:

vm.overcommit_memory = 1
vm.max_map_count = 131072

The configuration can be applied by restarting your system or by running the following command:

% sudo sysctl -p

FreeBSD
This section describes how to configure parameters on FreeBSD.

kern.maxfileperproc
TODO

API
Groonga can be used as a full-text search library. This section describes the APIs that are
provided by Groonga.

Overview
Summary
You can use Groonga as a library. You need to use the following APIs to initialize and
finalize Groonga.

grn_init() initializes Groonga. In contrast, grn_fin() finalizes Groonga.

You must call grn_init() only once, before you use any APIs provided by Groonga. You
must call grn_fin() only once, after you finish using the APIs provided by Groonga.

Example
Here is an example that uses Groonga as a full-text search library.

#include <groonga.h>
#include <stdlib.h>

int
main(void)
{
  grn_rc rc;
  /* It initializes resources used by Groonga. */
  rc = grn_init();
  if (rc != GRN_SUCCESS) {
    return EXIT_FAILURE;
  }
  /* Some Groonga API calling codes... */
  /* It releases resources used by Groonga. */
  grn_fin();
  return EXIT_SUCCESS;
}

Reference
grn_rc grn_init(void)
grn_init() initializes resources that are used by Groonga. You must call it just
once before you call other Groonga APIs.

Returns
GRN_SUCCESS on success, not GRN_SUCCESS on error.

grn_rc grn_fin(void)
grn_fin() releases resources that are used by Groonga. You can't call other Groonga
APIs after you call grn_fin().

Returns
GRN_SUCCESS on success, not GRN_SUCCESS on error.

Global configurations
Summary
Groonga has the global configurations. You can access them by API.

Reference
int grn_get_lock_timeout(void)
Returns the lock timeout.

A grn_ctx acquires a lock for updating a shared value. If another grn_ctx is already
updating the same value, the grn_ctx that tries to acquire the lock can't acquire it.
A grn_ctx that can't acquire a lock waits 1 millisecond and then tries to acquire the
lock again. The try is repeated up to timeout times. If the grn_ctx still can't acquire
the lock after timeout tries, the acquisition fails.

The default lock timeout is 10000000. It means that Groonga doesn't report a lock
failure for about 3 hours. (1 [msec] * 10000000 = 10000 [sec] = 166.666... [min]
= 2.777... [hour])

Returns
The lock timeout.

grn_rc grn_set_lock_timeout(int timeout)
Sets the lock timeout.

See grn_get_lock_timeout() about lock timeout.

There are some special values for timeout.

· 0: It means that Groonga doesn't retry acquiring a lock. Groonga reports a
failure after one lock acquirement failure.

· negative value: It means that Groonga retries acquiring a lock until Groonga
can acquire a lock.

Parameters

· timeuot -- The new lock timeout.

Returns
GRN_SUCCESS. It doesn't fail.
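For example, here is a minimal sketch that temporarily makes Groonga fail fast on lock
contention and then restores the previous timeout. The concrete value 1000 is only an
illustration:

   int previous_timeout;
   previous_timeout = grn_get_lock_timeout();
   /* Fail after roughly 1 second: 1000 tries * 1 millisecond. */
   grn_set_lock_timeout(1000);
   /* ... updates that may contend for locks ... */
   grn_set_lock_timeout(previous_timeout);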

Plugin
Summary
Groonga supports plugins. You can create a new plugin with the following APIs.

TODO: Describe how to create a minimal plugin here, or create a tutorial about it.

Reference
grn_rc GRN_PLUGIN_INIT(grn_ctx *ctx)

grn_rc GRN_PLUGIN_REGISTER(grn_ctx *ctx)

grn_rc GRN_PLUGIN_FIN(grn_ctx *ctx)

GRN_PLUGIN_MALLOC(ctx, size)
GRN_PLUGIN_MALLOC() allocates size bytes and returns a pointer to the allocated
memory space. Note that the memory space is associated with ctx.

GRN_PLUGIN_REALLOC(ctx, ptr, size)
GRN_PLUGIN_REALLOC() resizes the memory space pointed to by ptr or allocates a new
memory space of size bytes. GRN_PLUGIN_REALLOC() returns a pointer to the memory
space. The contents are unchanged, or are copied from the old memory space to the new
memory space.

GRN_PLUGIN_FREE(ctx, ptr)
GRN_PLUGIN_FREE() frees a memory space allocated by GRN_PLUGIN_MALLOC() or
GRN_PLUGIN_REALLOC(). This means that ptr must be a pointer returned by
GRN_PLUGIN_MALLOC() or GRN_PLUGIN_REALLOC().
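Here is a minimal sketch of the allocate-grow-free life cycle of these macros, assuming it
runs inside a plugin function that has ctx. The buffer sizes are arbitrary:

   char *buffer;
   buffer = GRN_PLUGIN_MALLOC(ctx, 256);
   if (!buffer) {
     return; /* Allocation failed. Error information is in ctx. */
   }
   /* Grow the buffer. A robust version should keep the old pointer
      to avoid leaking it when the reallocation fails. */
   buffer = GRN_PLUGIN_REALLOC(ctx, buffer, 1024);
   if (!buffer) {
     return;
   }
   /* ... use buffer ... */
   GRN_PLUGIN_FREE(ctx, buffer);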

GRN_PLUGIN_LOG(ctx, level, ...)
GRN_PLUGIN_LOG() reports a log of level. Its error message is generated from the
varying number of arguments, in which the first one is the format string and the
rest are its arguments. See grn_log_level in "groonga.h" for more details of level.

GRN_PLUGIN_ERROR(ctx, error_code, ...)
GRN_PLUGIN_ERROR() reports an error of error_code. Its error message is generated
from the varying number of arguments, in which the first one is the format string
and the rest are its arguments. See grn_rc in "groonga.h" for more details of
error_code.

grn_plugin_mutex
grn_plugin_mutex is available to make a critical section. See the following
functions.

grn_plugin_mutex *grn_plugin_mutex_open(grn_ctx *ctx)
grn_plugin_mutex_open() returns a pointer to a new object of grn_plugin_mutex.
Memory for the new object is obtained with GRN_PLUGIN_MALLOC().
grn_plugin_mutex_open() returns NULL if sufficient memory is not available.

void grn_plugin_mutex_close(grn_ctx *ctx, grn_plugin_mutex *mutex)
grn_plugin_mutex_close() finalizes an object of grn_plugin_mutex and then frees
memory allocated for that object.

void grn_plugin_mutex_lock(grn_ctx *ctx, grn_plugin_mutex *mutex)
grn_plugin_mutex_lock() locks a mutex object. If the object is already locked, the
calling thread waits until the object is unlocked.

void grn_plugin_mutex_unlock(grn_ctx *ctx, grn_plugin_mutex *mutex)
grn_plugin_mutex_unlock() unlocks a mutex object. grn_plugin_mutex_unlock() should
not be called for an unlocked object.
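Here is a minimal sketch of a critical section built from these functions, assuming it runs
inside a plugin that shares data between threads:

   grn_plugin_mutex *mutex;
   mutex = grn_plugin_mutex_open(ctx);
   if (!mutex) {
     return; /* Not enough memory. */
   }
   grn_plugin_mutex_lock(ctx, mutex);
   /* ... touch the shared data here; only one thread runs this at a time ... */
   grn_plugin_mutex_unlock(ctx, mutex);
   grn_plugin_mutex_close(ctx, mutex);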

grn_obj *grn_plugin_proc_alloc(grn_ctx *ctx, grn_user_data *user_data, grn_id domain,
grn_obj_flags flags)
grn_plugin_proc_alloc() allocates a grn_obj object. You can use it in a function
that is registered as a GRN_PROC_FUNCTION.

grn_obj *grn_plugin_proc_get_var(grn_ctx *ctx, grn_user_data *user_data, const char *name,
int name_size)
It gets a variable value from grn_user_data by specifying the variable name.

Parameters

· name -- The variable name.

· name_size -- The number of bytes of name. If name_size is negative, name
must be NUL-terminated. name_size is computed by strlen(name) for the
case.

Returns
A variable value on success, NULL otherwise.

grn_obj *grn_plugin_proc_get_var_by_offset(grn_ctx *ctx, grn_user_data *user_data,
unsigned int offset)
It gets a variable value from grn_user_data by specifying the offset position of
the variable.

Parameters

· offset -- The offset position of the variable.

Returns
A variable value on success, NULL otherwise.

const char *grn_plugin_win32_base_dir(void)
Deprecated since version 5.0.9.: Use grn_plugin_windows_base_dir() instead.

It returns the Groonga install directory. The install directory is computed from
the directory that has groonga.dll. You can use the directory to generate install
directory aware path. It only works on Windows. It returns NULL on other platforms.

const char *grn_plugin_windows_base_dir(void)
New in version 5.0.9.

It returns the Groonga install directory. The install directory is computed from
the directory that has groonga.dll. You can use the directory to generate install
directory aware path. It only works on Windows. It returns NULL on other platforms.

int grn_plugin_charlen(grn_ctx *ctx, const char *str_ptr, unsigned int str_length,
grn_encoding encoding)
grn_plugin_charlen() returns the length (#bytes) of the first character in the
string specified by str_ptr and str_length. If the starting bytes are invalid as a
character, grn_plugin_charlen() returns 0. See grn_encoding in "groonga.h" for more
details of encoding.

int grn_plugin_isspace(grn_ctx *ctx, const char *str_ptr, unsigned int str_length,
grn_encoding encoding)
grn_plugin_isspace() returns the length (#bytes) of the first character in the
string specified by str_ptr and str_length if it is a space character. Otherwise,
grn_plugin_isspace() returns 0.

grn_rc grn_plugin_expr_var_init(grn_ctx *ctx, grn_expr_var *var, const char *name,
int name_size)
It initializes a grn_expr_var.

Parameters

· var -- The pointer of grn_expr_var object to be initialized.

· name -- The name of grn_expr_var object to be initialized.

· name_size -- The number of bytes of name. If name_size is negative, name
must be NUL-terminated. name_size is computed by strlen(name) for the
case.

Returns
GRN_SUCCESS. It doesn't fail.

grn_obj * grn_plugin_command_create(grn_ctx *ctx, const char *name, int name_size,
grn_proc_func func, unsigned int n_vars, grn_expr_var *vars)
It creates a command.

Parameters

· name -- The proc name of the command to be created.

· name_size -- The number of bytes of name. If name_size is negative, name
must be NUL-terminated. name_size is computed by strlen(name) for the
case.

· func -- The function name to be called by the created command.

· n_vars -- The number of the variables of the command to create.

· vars -- The pointer of initialized grn_expr_var object.

Returns
The created command object if it creates a command successfully, NULL
otherwise. See ctx for error details.
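Here is a minimal sketch that registers a command from GRN_PLUGIN_REGISTER(). The command
name hello and its variable name are hypothetical; building the response is omitted:

   static grn_obj *
   command_hello(grn_ctx *ctx, int nargs, grn_obj **args,
                 grn_user_data *user_data)
   {
     grn_obj *name;
     /* Get the "name" variable passed to the command. */
     name = grn_plugin_proc_get_var(ctx, user_data, "name", -1);
     /* ... build and output a response from name here ... */
     return NULL;
   }

   grn_rc
   GRN_PLUGIN_REGISTER(grn_ctx *ctx)
   {
     grn_expr_var vars[1];
     grn_plugin_expr_var_init(ctx, &vars[0], "name", -1);
     grn_plugin_command_create(ctx, "hello", -1, command_hello, 1, vars);
     return ctx->rc;
   }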

grn_cache
Summary
NOTE:
This API is experimental.

grn_cache is a data store that keeps responses of the /reference/commands/select command. It
is not a general-purpose cache object. It is only for the /reference/commands/select command.

You just change the current cache object with grn_cache_current_set(); caching of
/reference/commands/select command responses is done internally.

The /reference/commands/select command uses one global cache object. If you open multiple
databases, that one cache is shared. This is an important problem.

If you open multiple databases and use the /reference/commands/select command, you need to use
a grn_cache object. This is the /reference/executables/groonga-httpd case. If you open only one
database or don't use the /reference/commands/select command, you don't need to use a grn_cache
object. This is the rroonga case.

Example
Here is an example that changes the current cache object.

grn_cache *cache;
grn_cache *cache_previous;
cache = grn_cache_open(ctx);
cache_previous = grn_cache_current_get(ctx);
grn_cache_current_set(ctx, cache);
/* grn_ctx_send(ctx, ...); */
grn_cache_current_set(ctx, cache_previous);

Reference
grn_cache
It is an opaque cache object. You can create a grn_cache by grn_cache_open() and
free the created object by grn_cache_close().

grn_cache *grn_cache_open(grn_ctx *ctx)
Creates a new cache object.

If memory allocation for the new cache object is failed, NULL is returned. Error
information is stored into the ctx.

Parameters

· ctx -- The context.

Returns
A newly allocated cache object on success, NULL otherwise. The returned
cache object must be freed by grn_cache_close().

grn_rc grn_cache_close(grn_ctx *ctx, grn_cache *cache)
Frees the resources of the cache.

Parameters

· ctx -- The context.

· cache -- The cache object to be freed.

Returns
GRN_SUCCESS on success, not GRN_SUCCESS otherwise.

grn_rc grn_cache_current_set(grn_ctx *ctx, grn_cache *cache)
Sets the cache object that is used in /reference/commands/select command.

Parameters

· ctx -- The context.

· cache -- The cache object that is used in /reference/commands/select
command.

Returns
GRN_SUCCESS on success, not GRN_SUCCESS otherwise.

grn_cache *grn_cache_current_get(grn_ctx *ctx)
Gets the cache object that is used in /reference/commands/select command.

Parameters

· ctx -- The context.

Returns
The cache object that is used in /reference/commands/select command. It may
be NULL.

grn_rc grn_cache_set_max_n_entries(grn_ctx *ctx, grn_cache *cache, unsigned int n)
Sets the max number of entries of the cache object.

Parameters

· ctx -- The context.

· cache -- The cache object to be changed.

· n -- The new max number of entries of the cache object.

Returns
GRN_SUCCESS on success, not GRN_SUCCESS otherwise.

unsigned int grn_cache_get_max_n_entries(grn_ctx *ctx, grn_cache *cache)
Gets the max number of entries of the cache object.

Parameters

· ctx -- The context.

· cache -- The target cache object.

Returns
The max number of entries of the cache object.

grn_column
Summary
TODO...

Example
TODO...

Reference
GRN_COLUMN_NAME_ID
It returns the name of /reference/columns/pseudo _id.

It is useful to use with GRN_COLUMN_NAME_ID_LEN like the following:

grn_obj *id_column;
id_column = grn_ctx_get(ctx, GRN_COLUMN_NAME_ID, GRN_COLUMN_NAME_ID_LEN);

Since 3.1.1.

GRN_COLUMN_NAME_ID_LEN
It returns the byte size of GRN_COLUMN_NAME_ID.

Since 3.1.1.

GRN_COLUMN_NAME_KEY
It returns the name of /reference/columns/pseudo _key.

It is useful to use with GRN_COLUMN_NAME_KEY_LEN like the following:

grn_obj *key_column;
key_column = grn_ctx_get(ctx, GRN_COLUMN_NAME_KEY, GRN_COLUMN_NAME_KEY_LEN);

Since 3.1.1.

GRN_COLUMN_NAME_KEY_LEN
It returns the byte size of GRN_COLUMN_NAME_KEY.

Since 3.1.1.

GRN_COLUMN_NAME_VALUE
It returns the name of /reference/columns/pseudo _value.

It is useful to use with GRN_COLUMN_NAME_VALUE_LEN like the following:

grn_obj *value_column;
value_column = grn_ctx_get(ctx, GRN_COLUMN_NAME_VALUE, GRN_COLUMN_NAME_VALUE_LEN);

Since 3.1.1.

GRN_COLUMN_NAME_VALUE_LEN
It returns the byte size of GRN_COLUMN_NAME_VALUE.

Since 3.1.1.

GRN_COLUMN_NAME_SCORE
It returns the name of /reference/columns/pseudo _score.

It is useful to use with GRN_COLUMN_NAME_SCORE_LEN like the following:

grn_obj *score_column;
score_column = grn_ctx_get(ctx, GRN_COLUMN_NAME_SCORE, GRN_COLUMN_NAME_SCORE_LEN);

Since 3.1.1.

GRN_COLUMN_NAME_SCORE_LEN
It returns the byte size of GRN_COLUMN_NAME_SCORE.

Since 3.1.1.

GRN_COLUMN_NAME_NSUBRECS
It returns the name of /reference/columns/pseudo _nsubrecs.

It is useful to use with GRN_COLUMN_NAME_NSUBRECS_LEN like the following:

grn_obj *nsubrecs_column;
nsubrecs_column = grn_ctx_get(ctx, GRN_COLUMN_NAME_NSUBRECS, GRN_COLUMN_NAME_NSUBRECS_LEN);

Since 3.1.1.

GRN_COLUMN_NAME_NSUBRECS_LEN
It returns the byte size of GRN_COLUMN_NAME_NSUBRECS.

Since 3.1.1.

grn_obj *grn_column_create(grn_ctx *ctx, grn_obj *table, const char *name, unsigned
int name_size, const char *path, grn_obj_flags flags, grn_obj *type)
Defines a new column in table. name can't be omitted. You can't define multiple columns
with the same name in one table.

Parameters

· table -- Specifies the target table.

· name -- Specifies the column name.

· name_size -- Specifies the size (in bytes) of the name parameter.

· path -- Specifies the file path where the column is stored. It is valid only
when GRN_OBJ_PERSISTENT is specified in flags.
If NULL, a file path is assigned automatically.

· flags --

GRN_OBJ_PERSISTENT specifies a persistent column.

GRN_OBJ_COLUMN_INDEX specifies an inverted index column.

GRN_OBJ_COLUMN_SCALAR specifies that a scalar value (a single value) is stored.

GRN_OBJ_COLUMN_VECTOR specifies that an array of values is stored.

GRN_OBJ_COMPRESS_ZLIB specifies that values are stored compressed with zlib.

GRN_OBJ_COMPRESS_LZO specifies that values are stored compressed with LZO.

GRN_OBJ_WITH_SECTION together with GRN_OBJ_COLUMN_INDEX
specifies that section (paragraph) information is also stored in the inverted index.

GRN_OBJ_WITH_WEIGHT together with GRN_OBJ_COLUMN_INDEX
specifies that weight information is also stored in the inverted index.

GRN_OBJ_WITH_POSITION together with GRN_OBJ_COLUMN_INDEX
specifies that position information is also stored in the inverted index.

· type --
Specifies the type of the column values. A predefined type or a table can be specified.
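For example, a persistent scalar column could be created like the following sketch. The
table name Entries and column name content are hypothetical, and error handling is omitted:

   grn_obj *table;
   grn_obj *column;
   table = grn_ctx_get(ctx, "Entries", -1);
   column = grn_column_create(ctx, table, "content", 7,
                              NULL,
                              GRN_OBJ_PERSISTENT | GRN_OBJ_COLUMN_SCALAR,
                              grn_ctx_at(ctx, GRN_DB_SHORT_TEXT));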

grn_rc grn_column_index_update(grn_ctx *ctx, grn_obj *column, grn_id id, unsigned
int section, grn_obj *oldvalue, grn_obj *newvalue)
Updates the entries corresponding to id and section, within the values of column that
correspond to the keys derived from oldvalue and newvalue. column must be a
GRN_OBJ_COLUMN_INDEX type column.

Parameters

· column -- Specifies the target column.

· id -- Specifies the ID of the target record.

· section -- Specifies the section number of the target record.

· oldvalue -- Specifies the value before the update.

· newvalue -- Specifies the value after the update.

grn_obj *grn_column_table(grn_ctx *ctx, grn_obj *column)
Returns the table that column belongs to.

Parameters

· column -- Specifies the target column.

grn_rc grn_column_rename(grn_ctx *ctx, grn_obj *column, const char *name, unsigned
int name_size)
Renames column to name in the db used by ctx. column must be a persistent object.

Parameters

· column -- Specifies the target column.

· name -- Specifies the new name.

· name_size -- Specifies the size (in bytes) of the name parameter.

int grn_column_name(grn_ctx *ctx, grn_obj *obj, char *namebuf, int buf_size)
Returns the length of the name of the column obj. If buf_size is greater than or equal to
the length of the name, copies the name into namebuf.

Parameters

· obj -- Specifies the target object.

· namebuf -- Specifies the buffer (prepared by the caller) to store the name.

· buf_size -- Specifies the size (in bytes) of namebuf.
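Here is a sketch of the usual calling pattern, assuming column is an opened column object:

   char name[GRN_TABLE_MAX_KEY_SIZE];
   int name_size;
   name_size = grn_column_name(ctx, column, name, sizeof(name));
   /* name now holds name_size bytes of the column name
      (not NUL-terminated). */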

int grn_column_index(grn_ctx *ctx, grn_obj *column, grn_operator op, grn_obj **indexbuf,
int buf_size, int *section)
Returns the number of indexes attached to column that can execute the operation op. It also
stores the IDs of those indexes in indexbuf, up to the number specified by buf_size.

Parameters

· column -- Specifies the target column.

· op -- Specifies the operation you want to execute with the index.

· indexbuf -- Specifies the buffer (prepared by the caller) to store the indexes.

· buf_size -- Specifies the size of indexbuf.

· section --
Specifies an int-sized buffer (prepared by the caller) to store the section number.

grn_rc grn_column_truncate(grn_ctx *ctx, grn_obj *column)

NOTE:
This is a dangerous API. You must not use this API when other thread or process
accesses the target column. If you use this API against shared column, the
process that accesses the column may be broken and the column may be broken.

New in version 4.0.9.

Clears all values in the column.

Parameters

· column -- The column to be truncated.

Returns
GRN_SUCCESS on success, not GRN_SUCCESS on error.

grn_command_version
Summary
TODO...

Example
TODO...

Reference
grn_command_version

GRN_COMMAND_VERSION_MIN

GRN_COMMAND_VERSION_STABLE

GRN_COMMAND_VERSION_MAX

grn_command_version grn_get_default_command_version(void)
Returns the default command_version.

grn_rc grn_set_default_command_version(grn_command_version version)
Changes the default command_version.

Parameters

· version -- Specifies the new default command_version.

grn_content_type
Summary
grn_content_type shows input type and output type. Currently, it is used only for output
type.

Normally, you don't need to use this type. It is used internally in grn_ctx_send().

Reference
grn_content_type
Here are available values:

GRN_CONTENT_NONE
It means outputting nothing or using the original format.
/reference/commands/dump uses this type.

GRN_CONTENT_TSV
It means tab separated values format.

GRN_CONTENT_JSON
It means JSON format.

GRN_CONTENT_XML
It means XML format.

GRN_CONTENT_MSGPACK
It means MessagePack format. You need MessagePack library on building
Groonga. If you don't have MessagePack library, you can't use this type.

grn_ctx
Summary
grn_ctx is the most important object. grn_ctx keeps the current information such as:

· The last occurred error.

· The current encoding.

· The default thresholds. (e.g. select-match-escalation-threshold)

· The default command version. (See /reference/command/command_version)

grn_ctx provides platform features such as:

· Memory management.

· Logging.

Most APIs receive grn_ctx as the first argument.

You can't use the same grn_ctx from two or more threads. You need to create one grn_ctx per
thread. You can use two or more grn_ctx objects in one thread, but that is not needed for the
usual use case.

Example
TODO...

Reference
grn_ctx
TODO...

grn_rc grn_ctx_init(grn_ctx *ctx, int flags)
Initializes ctx.

Parameters

· ctx -- Specifies a pointer to the ctx structure to be initialized.

· flags -- Specifies options for the ctx to be initialized.

Returns
GRN_SUCCESS on success, not GRN_SUCCESS on error.

grn_rc grn_ctx_fin(grn_ctx *ctx)
Releases the memory managed by ctx and finishes using it.

If ctx is initialized by grn_ctx_open() not grn_ctx_init(), you need to use
grn_ctx_close() instead of grn_ctx_fin().

Parameters

· ctx -- Specifies a pointer to the ctx structure to be released.

Returns
GRN_SUCCESS on success, not GRN_SUCCESS on error.

grn_ctx *grn_ctx_open(int flags)
Returns an initialized grn_ctx object.

While the structure of a grn_ctx object initialized by grn_ctx_init() is allocated by the
API caller, grn_ctx_open() allocates the structure inside the Groonga library. A grn_ctx
initialized by either of them can be released by grn_ctx_fin(). For a grn_ctx structure
allocated by grn_ctx_open(), it is safe to release grn_obj objects created with that
grn_ctx by grn_obj_close() even after the grn_ctx itself has been released by grn_ctx_fin().

Parameters

· flags -- Specifies options for the ctx to be initialized.

Returns
An initialized grn_ctx object.

grn_rc grn_ctx_close(grn_ctx *ctx)
It calls grn_ctx_fin() and frees allocated memory for ctx by grn_ctx_open().

Parameters

· ctx -- no longer needed grn_ctx.

Returns
GRN_SUCCESS on success, not GRN_SUCCESS on error.
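Here is a minimal sketch of both initialization styles:

   /* Caller-allocated: initialize with grn_ctx_init() and
      finalize with grn_ctx_fin(). */
   grn_ctx ctx;
   grn_ctx_init(&ctx, 0);
   /* ... use &ctx ... */
   grn_ctx_fin(&ctx);

   /* Library-allocated: create with grn_ctx_open() and
      free with grn_ctx_close(). */
   grn_ctx *ctx2;
   ctx2 = grn_ctx_open(0);
   /* ... use ctx2 ... */
   grn_ctx_close(ctx2);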

grn_rc grn_ctx_set_finalizer(grn_ctx *ctx, grn_proc_func *func)
Sets a function that is called when ctx is destroyed.

Parameters

· ctx -- Specifies the target ctx.

· func -- Specifies the function to be called when ctx is destroyed.

Returns
GRN_SUCCESS on success, not GRN_SUCCESS on error.

grn_command_version grn_ctx_get_command_version(grn_ctx *ctx)
Returns the command_version.

grn_rc grn_ctx_set_command_version(grn_ctx *ctx, grn_command_version version)
Changes the command_version.

Parameters

· version -- Specifies the new command_version.

grn_rc grn_ctx_use(grn_ctx *ctx, grn_obj *db)
Specifies the db that ctx operates on. If NULL is specified, ctx enters the state where
it operates on no db (the state just after init).

Don't use it with a grn_ctx that has the GRN_CTX_PER_DB flag.

Parameters

· db -- Specifies the db that ctx uses.

grn_obj *grn_ctx_db(grn_ctx *ctx)
Returns the db that ctx currently operates on. Returns NULL if no db is in use.

grn_obj *grn_ctx_get(grn_ctx *ctx, const char *name, int name_size)
Searches the db used by ctx for an object named name and returns it. Returns NULL if no
object with that name exists.

Parameters

· name -- The name of the object to search for.

· name_size -- The number of bytes of name. If a negative value is specified,
name is assumed to be a NUL-terminated string.
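For example, the following sketch looks up a table by name. The table name Entries is
hypothetical:

   grn_obj *table;
   table = grn_ctx_get(ctx, "Entries", -1);
   if (!table) {
     /* No object named "Entries" exists in the db. */
   }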

grn_obj *grn_ctx_at(grn_ctx *ctx, grn_id id)
Searches ctx, or the db used by ctx, for the object corresponding to id and returns it.
Returns NULL if no object with that id exists.

Parameters

· id -- Specifies the id of the object to search for.

grn_rc grn_ctx_get_all_tables(grn_ctx *ctx, grn_obj *tables_buffer)
It pushes all tables in the database of ctx into tables_buffer. tables_buffer
should be initialized as GRN_PVECTOR. You can use GRN_PTR_INIT() with
GRN_OBJ_VECTOR flags to initialize tables_buffer.

Here is an example:

grn_rc rc;
grn_obj tables;
int i;
int n_tables;

GRN_PTR_INIT(&tables, GRN_OBJ_VECTOR, GRN_ID_NIL);
rc = grn_ctx_get_all_tables(ctx, &tables);
if (rc != GRN_SUCCESS) {
GRN_OBJ_FIN(ctx, &tables);
/* Handle error. */
return;
}

n_tables = GRN_BULK_VSIZE(&tables) / sizeof(grn_obj *);
for (i = 0; i < n_tables; i++) {
grn_obj *table = GRN_PTR_VALUE_AT(&tables, i);
/* Use table. */
}

/* Free resources. */
for (i = 0; i < n_tables; i++) {
grn_obj *table = GRN_PTR_VALUE_AT(&tables, i);
grn_obj_unlink(ctx, table);
}
GRN_OBJ_FIN(ctx, &tables);

Parameters

· ctx -- The context object.

· tables_buffer -- The output buffer to store tables.

Returns
GRN_SUCCESS on success, not GRN_SUCCESS on error.

grn_content_type grn_ctx_get_output_type(grn_ctx *ctx)
Gets the current output type of the context.

Normally, this function isn't needed.

Parameters

· ctx -- The context object.

Returns
The output type of the context.

grn_rc grn_ctx_set_output_type(grn_ctx *ctx, grn_content_type type)
Sets the new output type to the context. It is used by executing a command by
grn_expr_exec(). If you use grn_ctx_send(), the new output type isn't used.
grn_ctx_send() sets output type from command line internally.

Normally, this function isn't needed.

Parameters

· ctx -- The context object.

· type -- The new output type.

Returns
GRN_SUCCESS on success, not GRN_SUCCESS on error.

grn_bool grn_ctx_is_opened(grn_ctx *ctx, grn_id id)
Checks whether object with the ID is opened or not.

Parameters

· ctx -- The context object.

· id -- The object ID to be checked.

Returns
GRN_TRUE if object with the ID is opened, GRN_FALSE otherwise.

grn_db
Summary
TODO...

Example
TODO...

Reference
TODO...

grn_db TODO...

grn_db_create_optarg
It is used for specifying options for grn_db_create().

char **grn_db_create_optarg.builtin_type_names
Specifies an array of NUL-terminated strings that are the names of the built-in
types.

int grn_db_create_optarg.n_builtin_type_names
Specifies the number of strings specified in optarg.builtin_type_names. The offset
in the array corresponds to the value of the enum type grn_builtin_type.

grn_obj *grn_db_create(grn_ctx *ctx, const char *path, grn_db_create_optarg *optarg)
Creates a new db.

Parameters

· ctx -- Specifies an initialized grn_ctx.

· path -- Specifies the file path where the created db is stored. If NULL, it
becomes a temporary db. If a non-NULL path is specified, it becomes a persistent db.

· optarg --

Currently, it is not used. It is just ignored.

It was meant to be specified to change the names of the built-in types of the db to be
created. optarg.builtin_type_names specifies an array of NUL-terminated strings that are
the names of the built-in types. optarg.n_builtin_type_names specifies the number of
strings specified in optarg.builtin_type_names. The offset in the array corresponds to the
value of the enum type grn_builtin_type.
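Here is a minimal sketch that creates a persistent db. The path is only an illustration:

   grn_obj *db;
   db = grn_db_create(ctx, "/tmp/example.db", NULL);
   if (!db) {
     /* Creation failed. Error information is in ctx. */
   }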

grn_obj *grn_db_open(grn_ctx *ctx, const char *path)
Opens an existing db.

Parameters

· path -- Specifies the file path of the db to be opened.

void grn_db_touch(grn_ctx *ctx, grn_obj *db)
Sets the last modified time of the db's content to the current time.

The last modified time is used, for example, to decide whether a cache is still valid.

Parameters

· db -- Specifies the db whose content was modified.

grn_obj *grn_obj_db(grn_ctx *ctx, grn_obj *obj)
Returns the db that obj belongs to.

Parameters

· obj -- Specifies the target object.

grn_rc grn_db_recover(grn_ctx *ctx, grn_obj *db)

NOTE:
This is an experimental API.

NOTE:
This is a dangerous API. You must not use this API when other thread or process
opens the target database. If you use this API against shared database, the
database may be broken.

New in version 4.0.9.

Checks the passed database and recovers it if it is broken and it can be recovered.

This API uses lock existence for checking whether the database is broken or not.

Here are recoverable cases:

· Index column is broken. The index column must have source column.

Here are unrecoverable cases:

· Object name management feature is broken.

· Table is broken.

· Data column is broken.

Object name management feature is used for managing table name, column name and so
on. If the feature is broken, the database can't be recovered. Please re-create the
database from backup.

Table and data column can be recovered by removing an existence lock and re-add
data.

Parameters

· db -- The database to be recovered.

Returns
GRN_SUCCESS on success, not GRN_SUCCESS on error.

grn_rc grn_db_unmap(grn_ctx *ctx, grn_obj *db)

NOTE:
This is an experimental API.

NOTE:
This is a thread unsafe API. You can't touch the database while this API is
running.

New in version 5.0.7.

Unmaps all opened tables and columns in the passed database. Resources used by
these opened tables and columns are freed.

Normally, this API isn't needed, because resources used by opened tables and
columns are managed by the OS automatically.

Parameters

· db -- The database whose tables and columns are to be unmapped.

Returns
GRN_SUCCESS on success, not GRN_SUCCESS on error.

grn_encoding
Summary
TODO...

Example
TODO...

Reference
grn_encoding
TODO...

grn_encoding grn_get_default_encoding(void)
Returns the default encoding.

grn_rc grn_set_default_encoding(grn_encoding encoding)
Changes the default encoding.

Parameters

· encoding -- Specifies the new default encoding.

const char *grn_encoding_to_string(grn_encoding encoding)
Returns string representation for the encoding. For example,
'grn_encoding_to_string(GRN_ENC_UTF8)' returns '"utf8"'.

"unknown" is returned for invalid encoding.

Parameters

· encoding -- The encoding.

grn_encoding grn_encoding_parse(const char *name)
Parses encoding name and returns grn_encoding. For example,
'grn_encoding_parse("UTF8")' returns 'GRN_ENC_UTF8'.

GRN_ENC_UTF8 is returned for invalid encoding name.

Parameters

· name -- The encoding name.

grn_expr
grn_expr is a grn_obj that represents an expression. Here is a list of what an expression
can do:

· An expression can apply some operations to a record via grn_expr_exec().

· An expression can represent a search condition. grn_table_select() can select records
that match the search condition represented by an expression.

There are two string representations of expression:

· /reference/grn_expr/query_syntax

· /reference/grn_expr/script_syntax

grn_expr_parse() parses a string-represented expression and appends the parsed expression to
another expression.

Example
TODO...

Reference
GRN_API grn_obj *grn_expr_create(grn_ctx *ctx, const char *name, unsigned int name_size)

GRN_API grn_rc grn_expr_close(grn_ctx *ctx, grn_obj *expr)

GRN_API grn_obj *grn_expr_add_var(grn_ctx *ctx, grn_obj *expr, const char *name, unsigned
int name_size)

GRN_API grn_obj *grn_expr_get_var_by_offset(grn_ctx *ctx, grn_obj *expr, unsigned
int offset)

GRN_API grn_obj *grn_expr_append_obj(grn_ctx *ctx, grn_obj *expr, grn_obj *obj,
grn_operator op, int nargs)

GRN_API grn_obj *grn_expr_append_const(grn_ctx *ctx, grn_obj *expr, grn_obj *obj,
grn_operator op, int nargs)

GRN_API grn_obj *grn_expr_append_const_str(grn_ctx *ctx, grn_obj *expr, const char *str,
unsigned int str_size, grn_operator op, int nargs)

GRN_API grn_obj *grn_expr_append_const_int(grn_ctx *ctx, grn_obj *expr, int i,
grn_operator op, int nargs)

GRN_API grn_rc grn_expr_append_op(grn_ctx *ctx, grn_obj *expr, grn_operator op, int nargs)

grn_rc grn_expr_get_keywords(grn_ctx *ctx, grn_obj *expr, grn_obj *keywords)
Extracts keywords from expr and stores to keywords. Keywords in keywords are owned
by expr. Don't unlink them. Each keyword is GRN_BULK and its domain is GRN_DB_TEXT.

keywords must be GRN_PVECTOR.

Here is an example code:

grn_obj keywords;
GRN_PTR_INIT(&keywords, GRN_OBJ_VECTOR, GRN_ID_NIL);
grn_expr_get_keywords(ctx, expr, &keywords);
{
int i, n_keywords;
n_keywords = GRN_BULK_VSIZE(&keywords) / sizeof(grn_obj *);
for (i = 0; i < n_keywords; i++) {
grn_obj *keyword = GRN_PTR_VALUE_AT(&keywords, i);
const char *keyword_content;
int keyword_size;
keyword_content = GRN_TEXT_VALUE(keyword);
keyword_size = GRN_TEXT_LEN(keyword);
/*
Use keyword_content and keyword_size.
You don't need to unlink keyword.
keyword is owned by expr.
*/
}
}
GRN_OBJ_FIN(ctx, &keywords);

Parameters

· ctx -- The context that creates the expr.

· expr -- The expression to be extracted.

· keywords --

The container to store extracted keywords. It must be GRN_PVECTOR.

Each extracted keyword is GRN_BULK and its domain is GRN_DB_TEXT.

Extracted keywords are owned by expr. Don't unlink them.

Returns
GRN_SUCCESS on success, not GRN_SUCCESS on error.

grn_rc grn_expr_syntax_escape(grn_ctx *ctx, const char *string, int string_size, const
char *target_characters, char escape_character, grn_obj *escaped_string)
Escapes target_characters in string by escape_character.

Parameters

· ctx -- Its encoding must be the same encoding of string. It is used for
allocating buffer for escaped_string.

· string -- String expression representation.

· string_size -- The byte size of string. -1 means string is NULL terminated
string.

· target_characters -- NULL terminated escape target characters. For
example, "+-><~*()\"\\:" is target_characters for
/reference/grn_expr/query_syntax.

· escape_character -- The character used to escape a character in
target_characters. For example, \\ (backslash) is the escape_character for
/reference/grn_expr/query_syntax.

· escaped_string -- The output of escaped string. It should be text typed
bulk.

Returns
GRN_SUCCESS on success, not GRN_SUCCESS on error.

grn_rc grn_expr_syntax_escape_query(grn_ctx *ctx, const char *query, int query_size,
grn_obj *escaped_query)
Escapes special characters in /reference/grn_expr/query_syntax.

Parameters

· ctx -- Its encoding must be the same encoding of query. It is used for
allocating buffer for escaped_query.

· query -- String expression representation in
/reference/grn_expr/query_syntax.

· query_size -- The byte size of query. -1 means query is NULL terminated
string.

· escaped_query -- The output of escaped query. It should be text typed
bulk.

Returns
GRN_SUCCESS on success, not GRN_SUCCESS on error.
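Here is a minimal sketch of escaping a user-supplied query. The input string is only an
illustration:

   grn_obj escaped_query;
   GRN_TEXT_INIT(&escaped_query, 0);
   grn_expr_syntax_escape_query(ctx, "(unbalanced", -1, &escaped_query);
   /* Use GRN_TEXT_VALUE(&escaped_query) and GRN_TEXT_LEN(&escaped_query);
      special characters such as '(' are now escaped. */
   GRN_OBJ_FIN(ctx, &escaped_query);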

GRN_API grn_rc grn_expr_compile(grn_ctx *ctx, grn_obj *expr)

GRN_API grn_obj *grn_expr_exec(grn_ctx *ctx, grn_obj *expr, int nargs)

GRN_API grn_obj *grn_expr_alloc(grn_ctx *ctx, grn_obj *expr, grn_id domain,
grn_obj_flags flags)

grn_geo
Summary
TODO...

Example
TODO...

Reference
grn_geo_point

grn_rc grn_geo_select_in_rectangle(grn_ctx *ctx, grn_obj *index, grn_obj *top_left_point,
grn_obj *bottom_right_point, grn_obj *res, grn_operator op)
It selects records that are in the rectangle specified by the top_left_point parameter
and the bottom_right_point parameter. Records are searched via the index parameter. Found
records are added to the res parameter table with the op parameter operation.

Parameters

· index -- the index column for TokyoGeoPoint or WGS84GeoPoint type.

· top_left_point -- the top left point of the target rectangle. (ShortText,
Text, LongText, TokyoGeoPoint or WGS84GeoPoint)

· bottom_right_point -- the bottom right point of the target rectangle.
(ShortText, Text, LongText, TokyoGeoPoint or WGS84GeoPoint)

· res -- the table to store found record IDs. It must be GRN_TABLE_HASH_KEY
type table.

· op -- the operator for matched records.

int grn_geo_estimate_in_rectangle(grn_ctx *ctx, grn_obj *index, grn_obj *top_left_point,
grn_obj *bottom_right_point)
It estimates the number of records in the rectangle specified by the top_left_point
parameter and the bottom_right_point parameter. The number of records is estimated with the
index parameter. If an error occurs, -1 is returned.

Parameters

· index -- the index column for TokyoGeoPoint or WGS84GeoPoint type.

· top_left_point -- the top left point of the target rectangle. (ShortText,
Text, LongText, TokyoGeoPoint or WGS84GeoPoint)

· bottom_right_point -- the bottom right point of the target rectangle.
(ShortText, Text, LongText, TokyoGeoPoint or WGS84GeoPoint)

grn_obj *grn_geo_cursor_open_in_rectangle(grn_ctx *ctx, grn_obj *index,
grn_obj *top_left_point, grn_obj *bottom_right_point, int offset, int limit)
It opens a cursor to get records in the rectangle specified by top_left_point
parameter and bottom_right_point parameter.

Parameters

· index -- the index column for TokyoGeoPoint or WGS84GeoPoint type.

· top_left_point -- the top left point of the target rectangle. (ShortText,
Text, LongText, TokyoGeoPoint or WGS84GeoPoint)

· bottom_right_point -- the bottom right point of the target rectangle.
(ShortText, Text, LongText, TokyoGeoPoint or WGS84GeoPoint)

· offset -- the cursor returns records from offset parameter position.
offset parameter is based on 0.

· limit -- the cursor returns at most limit parameter records. -1 means no
limit.

grn_posting *grn_geo_cursor_next(grn_ctx *ctx, grn_obj *cursor)
It returns the next posting, which has a record ID. It returns NULL after all records
have been returned.

Parameters

· cursor -- the geo cursor.
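Here is a sketch of a typical cursor loop, assuming index, top_left_point and
bottom_right_point are prepared elsewhere:

   grn_obj *cursor;
   grn_posting *posting;
   cursor = grn_geo_cursor_open_in_rectangle(ctx, index,
                                             top_left_point,
                                             bottom_right_point,
                                             0, -1);
   while ((posting = grn_geo_cursor_next(ctx, cursor))) {
     grn_id record_id = posting->rid;
     /* Use record_id. */
   }
   grn_obj_close(ctx, cursor);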

grn_hook
Summary
TODO...

Example
TODO...

Reference
grn_hook_entry
TODO...

grn_rc grn_obj_add_hook(grn_ctx *ctx, grn_obj *obj, grn_hook_entry entry, int offset,
grn_obj *proc, grn_obj *data)
Adds a hook to obj.

Parameters

· obj -- Specifies the target object.

· entry --

GRN_HOOK_GET defines a hook that is called when the object is referenced.

GRN_HOOK_SET defines a hook that is called when the object is updated.

GRN_HOOK_SELECT
defines a hook that is called as needed while a search is executed; it can
inspect the progress of the processing and instruct it to abort.

· offset --

The execution order of the hook. The new hook is inserted just before the hook
corresponding to offset.

If 0 is specified, the hook is inserted at the beginning. If -1 is specified, it is
appended at the end.

When multiple hooks are defined for an object, they are called in order of their position.

· proc -- Specifies the procedure.

· data -- Specifies hook-specific data.

int grn_obj_get_nhooks(grn_ctx *ctx, grn_obj *obj, grn_hook_entry entry)
Returns the number of hooks defined for obj.

Parameters

· obj -- Specifies the target object.

· entry -- Specifies the hook type.

grn_obj *grn_obj_get_hook(grn_ctx *ctx, grn_obj *obj, grn_hook_entry entry, int offset,
grn_obj *data)
Returns the procedure (proc) of a hook defined for obj. If hook-specific data is defined,
its content is copied into data.

Parameters

· obj -- Specifies the target object.

· entry -- Specifies the hook type.

· offset -- Specifies the execution order of the hook.

· data -- Specifies the buffer to store the hook-specific data.

grn_rc grn_obj_delete_hook(grn_ctx *ctx, grn_obj *obj, grn_hook_entry entry, int offset)
Deletes a hook defined for obj.

Parameters

· obj -- Specifies the target object.

· entry -- Specifies the hook type.

· offset -- Specifies the execution order of the hook.

grn_ii
Summary
A buffered index builder.

It is an internal API prepared for specific applications.

TODO...

Example
TODO...

Reference
grn_ii

grn_ii_buffer

grn_ii_buffer *grn_ii_buffer_open(grn_ctx *ctx, grn_ii *ii, long long unsigned
int update_buffer_size)

grn_rc grn_ii_buffer_append(grn_ctx *ctx, grn_ii_buffer *ii_buffer, grn_id rid, unsigned
int section, grn_obj *value)

grn_rc grn_ii_buffer_commit(grn_ctx *ctx, grn_ii_buffer *ii_buffer)

grn_rc grn_ii_buffer_close(grn_ctx *ctx, grn_ii_buffer *ii_buffer)

grn_index_cursor
Summary
TODO...

Example
TODO...

Reference
grn_obj *grn_index_cursor_open(grn_ctx *ctx, grn_table_cursor *tc, grn_obj *index,
grn_id rid_min, grn_id rid_max, int flags)
Creates and returns a cursor for retrieving, in order, the values of a
GRN_OBJ_COLUMN_INDEX type column for each record obtained from grn_table_cursor.

You can limit the record IDs to be retrieved by specifying rid_min and rid_max.

Release the returned grn_index_cursor with grn_obj_close().

Parameters

· tc -- Specifies the target cursor.

· index -- Specifies the target index column.

· rid_min -- Specifies the lower bound of the record IDs to be output.

· rid_max -- Specifies the upper bound of the record IDs to be output.

grn_posting *grn_index_cursor_next(grn_ctx *ctx, grn_obj *ic, grn_id *tid)
Retrieves the index values within the cursor's range, in order.

If tid is not NULL, the current target record ID of the table_cursor that was specified
when the index_cursor was created is stored in it.

The returned grn_posting structure doesn't need to be released.

Parameters

· ic -- Specifies the target cursor.

· tid -- Specifies a buffer to receive the table record ID.
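Here is a sketch of a typical loop, assuming table_cursor and index_column are prepared
elsewhere:

   grn_obj *index_cursor;
   grn_posting *posting;
   grn_id term_id;
   index_cursor = grn_index_cursor_open(ctx, table_cursor, index_column,
                                        GRN_ID_NIL, GRN_ID_MAX, 0);
   while ((posting = grn_index_cursor_next(ctx, index_cursor, &term_id))) {
     /* posting->rid is a record ID; term_id is the current
        record ID of table_cursor. */
   }
   grn_obj_close(ctx, index_cursor);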

grn_info
Summary
TODO...

Example
TODO...

Reference
grn_info_type
TODO...

grn_obj *grn_obj_get_info(grn_ctx *ctx, grn_obj *obj, grn_info_type type,
grn_obj *valuebuf)
Stores the information of obj corresponding to type in valuebuf.

Parameters

· obj -- Specifies the target obj.

· type -- Specifies the kind of information to get.

· valuebuf -- Specifies the buffer (prepared by the caller) to store the value.

grn_rc grn_obj_set_info(grn_ctx *ctx, grn_obj *obj, grn_info_type type, grn_obj *value)
Updates the information of obj corresponding to type with the content of value.

Parameters

· obj -- Specifies the target obj.

· type -- Specifies the kind of information to set.

grn_obj *grn_obj_get_element_info(grn_ctx *ctx, grn_obj *obj, grn_id id,
grn_info_type type, grn_obj *value)
Stores, in value, the information corresponding to type of the record corresponding to id
in obj. The caller must prepare a buffer of sufficient size for type.

Parameters

· obj -- Specifies the target obj.

· id -- Specifies the target ID.

· type -- Specifies the kind of information to get.

· value -- Specifies the buffer (prepared by the caller) to store the value.

grn_rc grn_obj_set_element_info(grn_ctx *ctx, grn_obj *obj, grn_id id, grn_info_type type,
grn_obj *value)
Updates the information corresponding to type of the record corresponding to id in obj
with the content of value.

Parameters

· obj -- Specifies the target object.

· id -- Specifies the target ID.

· type -- Specifies the kind of information to set.

· value -- Specifies the value to be set.

grn_match_escalation
Summary
TODO...

Example
TODO...

Reference
long long int grn_ctx_get_match_escalation_threshold(grn_ctx *ctx)
Returns the threshold at which search behavior escalates. See the document about the
search specification for details of escalation.

grn_rc grn_ctx_set_match_escalation_threshold(grn_ctx *ctx, long long int threshold)
Changes the threshold at which search behavior escalates. See the document about the
search specification for details of escalation.

Parameters

· threshold -- Specifies the new threshold at which search behavior escalates.

long long int grn_get_default_match_escalation_threshold(void)
Returns the default threshold at which search behavior escalates. See the document about
the search specification for details of escalation.

grn_rc grn_set_default_match_escalation_threshold(long long int threshold)
Changes the default threshold at which search behavior escalates. See the document about
the search specification for details of escalation.

Parameters

· threshold --
Specifies the new default threshold at which search behavior escalates.

grn_obj
Summary
TODO...

Example
TODO...

Reference
grn_obj
TODO...

grn_obj *grn_obj_column(grn_ctx *ctx, grn_obj *table, const char *name, unsigned
int name_size)
If name is a column name, returns the corresponding column of table. Returns NULL if no
corresponding column exists.

If name is an accessor string, returns the corresponding accessor. An accessor string is a
string in which column names and the like are joined with '.'. '_id' and '_key' are special
accessors that return the record ID and the record key respectively. Examples: 'col1' /
'col2.col3' / 'col2._id'

Parameters

· table -- Specifies the target table.

· name -- Specifies the column name.

grn_bool grn_obj_is_builtin(grn_ctx *ctx, grn_obj *obj)
Checks whether obj is a Groonga built-in object.

Parameters

· ctx -- context

· obj -- target object

Returns
GRN_TRUE for a built-in Groonga object, GRN_FALSE otherwise.

grn_obj *grn_obj_get_value(grn_ctx *ctx, grn_obj *obj, grn_id id, grn_obj *value)
Gets the value of the record corresponding to id in obj. The value is returned as the
return value.

Parameters

· obj -- Specifies the target object.

· id -- Specifies the ID of the target record.

· value -- Specifies the buffer (prepared by the caller) to store the value.

int grn_obj_get_values(grn_ctx *ctx, grn_obj *obj, grn_id offset, void **values)
For the column specified by obj, sets values to a pointer to an array that stores, in
ascending order, the column values of the records whose IDs are consecutive, starting from
the record ID specified by offset.

The number of values obtained is returned. -1 is returned on error.

NOTE:
Only columns whose values are fixed-size can be specified as obj. Records corresponding
to IDs in the range aren't guaranteed to be valid. For tables on which delete operations
have been executed, you must check the existence of each record separately, for example
by grn_table_at().

Parameters

· obj -- Specifies the target object.

· offset -- Specifies the record ID at which the range of values to get starts.

· values -- The array of values is set here.

grn_rc grn_obj_set_value(grn_ctx *ctx, grn_obj *obj, grn_id id, grn_obj *value, int flags)
objのIDに対応するレコードの値を更新します。対応するレコードが存在しない場合は
GRN_INVALID_ARGUMENT を返します。

Parameters

· obj -- 対象objectを指定します。

· id -- 対象レコードのIDを指定します。

· value -- 格納する値を指定します。

· flags --

以下の値を指定できます。

· GRN_OBJ_SET

· GRN_OBJ_INCR

· GRN_OBJ_DECR

· GRN_OBJ_APPEND

· GRN_OBJ_PREPEND

· GRN_OBJ_GET

· GRN_OBJ_COMPARE

· GRN_OBJ_LOCK

· GRN_OBJ_UNLOCK

GRN_OBJ_SET_MASK

GRN_OBJ_SET
Replaces the value of the record with value.

GRN_OBJ_INCR
Adds value to the value of the record.

GRN_OBJ_DECR
Subtracts value from the value of the record.

GRN_OBJ_APPEND
Appends value to the end of the value of the record.

GRN_OBJ_PREPEND
Prepends value to the beginning of the value of the record.

GRN_OBJ_GET
Sets the new value of the record into value.

GRN_OBJ_COMPARE
Checks whether the value of the record is equal to value.

GRN_OBJ_LOCK
Locks the record. When specified together with GRN_OBJ_COMPARE, locks the record only when its value is equal to value.

GRN_OBJ_UNLOCK
Unlocks the record.
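
A minimal sketch of GRN_OBJ_SET, assuming column is a ShortText column object and id is an existing record ID:

grn_obj value;
GRN_TEXT_INIT(&value, 0);                 /* bulk that holds the new value */
GRN_TEXT_PUTS(ctx, &value, "new value");
grn_rc rc = grn_obj_set_value(ctx, column, id, &value, GRN_OBJ_SET);
GRN_OBJ_FIN(ctx, &value);
if (rc != GRN_SUCCESS) {
  /* e.g. GRN_INVALID_ARGUMENT when the record doesn't exist */
}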

grn_rc grn_obj_remove(grn_ctx *ctx, grn_obj *obj)
Frees obj from memory. If it is a persistent object, also deletes the corresponding set of files.

Parameters

· obj -- The target object.

grn_rc grn_obj_rename(grn_ctx *ctx, grn_obj *obj, const char *name, unsigned
int name_size)
Renames obj to name in the database used by ctx. obj must be a persistent object.

Parameters

· obj -- The target object.

· name -- The new name.

· name_size -- The size (in bytes) of the name parameter.

grn_rc grn_obj_close(grn_ctx *ctx, grn_obj *obj)
Frees obj, a temporary object, from memory. Objects that belong to obj are also recursively freed.

Persistent objects, such as tables, columns and exprs, must not be freed with this function. In general, you should use grn_obj_unlink(), which does not require you to care whether an object is temporary or persistent.

Parameters

· obj -- The target object.

grn_rc grn_obj_reinit(grn_ctx *ctx, grn_obj *obj, grn_id domain, unsigned char flags)
Changes the type of obj.

obj must have been initialized, for example with the GRN_OBJ_INIT() macro.

Parameters

· obj -- The target object.

· domain -- The new type of obj.

· flags -- If GRN_OBJ_VECTOR is specified, obj becomes an object that stores a vector of values of the domain type.

void grn_obj_unlink(grn_ctx *ctx, grn_obj *obj)
Frees obj from memory. Objects that belong to obj are also recursively freed.

const char *grn_obj_path(grn_ctx *ctx, grn_obj *obj)
Returns the file path corresponding to obj. Returns NULL for a temporary object.

Parameters

· obj -- The target object.

int grn_obj_name(grn_ctx *ctx, grn_obj *obj, char *namebuf, int buf_size)
Returns the length of the name of obj. Returns 0 for an unnamed object.

For a named object, if buf_size is greater than or equal to the length of the name, the name is copied into namebuf.

Parameters

· obj -- The target object.

· namebuf -- A buffer (prepared by the caller) to store the name.

· buf_size -- The size (in bytes) of namebuf.

grn_id grn_obj_get_range(grn_ctx *ctx, grn_obj *obj)
Returns the ID of the object that represents the range of values that obj can take. For example, it returns values such as GRN_DB_INT in grn_builtin_type.

Parameters

· obj -- The target object.

int grn_obj_expire(grn_ctx *ctx, grn_obj *obj, int threshold)
Releases, where possible, memory occupied by obj, using threshold as a guideline.

Parameters

· obj -- The target object.

int grn_obj_check(grn_ctx *ctx, grn_obj *obj)
Checks the integrity of the file corresponding to obj.

Parameters

· obj -- The target object.

grn_rc grn_obj_lock(grn_ctx *ctx, grn_obj *obj, grn_id id, int timeout)
Locks obj. Returns GRN_RESOURCE_DEADLOCK_AVOIDED if the lock cannot be acquired within timeout seconds.

Parameters

· obj -- The target object.

grn_rc grn_obj_unlock(grn_ctx *ctx, grn_obj *obj, grn_id id)
Unlocks obj.

Parameters

· obj -- The target object.

grn_rc grn_obj_clear_lock(grn_ctx *ctx, grn_obj *obj)
Forcibly clears the lock on obj.

Parameters

· obj -- The target object.

unsigned int grn_obj_is_locked(grn_ctx *ctx, grn_obj *obj)
Returns a nonzero value if obj is currently locked.

Parameters

· obj -- The target object.

int grn_obj_defrag(grn_ctx *ctx, grn_obj *obj, int threshold)
Defragments, where possible, the DB file space occupied by obj, using threshold as a guideline.

Returns the number of segments that were defragmented.

Parameters

· obj -- The target object.

grn_id grn_obj_id(grn_ctx *ctx, grn_obj *obj)
Returns the ID of obj.

Parameters

· obj -- The target object.

grn_rc grn_obj_delete_by_id(grn_ctx *ctx, grn_obj *db, grn_id id, grn_bool removep)
Deletes the table, column or other object in db corresponding to id. This is an internal API prepared for mroonga.

Parameters

· db -- The target database.

· id -- The object (table, column and so on) ID to be deleted.

· removep -- If GRN_TRUE, clear object cache and remove relation between ID
and key in database. Otherwise, just clear object cache.

grn_rc grn_obj_path_by_id(grn_ctx *ctx, grn_obj *db, grn_id id, char *buffer)
Returns the path corresponding to id in db. This is an internal API prepared for mroonga.

Parameters

· db -- The target database.

· id -- The object (table, column and so on) ID whose path is to be returned.

· buffer -- path string corresponding to the id will be set in this buffer.

grn_rc grn_obj_cast_by_id(grn_ctx *ctx, grn_obj *source, grn_obj *destination,
grn_bool add_record_if_not_exist)
It casts the value of source to a value with the type of destination. The casted value is
appended to destination.

Both source and destination must be bulk.

If destination is a reference type bulk (a reference type bulk means that the type of
destination is a table), add_record_if_not_exist is used. If the source value doesn't
exist in the table that is the type of destination, the source value is added to the
table.

Parameters

· ctx -- The context object.

· source -- The bulk to be casted.

· destination -- The bulk to specify cast target type and store casted
value.

· add_record_if_not_exist -- Whether adding a new record if source value
doesn't exist in cast target table. This parameter is only used when
destination is a reference type bulk.

Returns
GRN_SUCCESS on success, not GRN_SUCCESS on error.
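
A minimal sketch, assuming both bulks are plain (non-reference) bulks, casting a text value to Int32:

grn_obj source, destination;
GRN_TEXT_INIT(&source, 0);
GRN_TEXT_PUTS(ctx, &source, "29");
GRN_INT32_INIT(&destination, 0);
grn_rc rc = grn_obj_cast_by_id(ctx, &source, &destination, GRN_FALSE);
/* on success the casted value is appended to destination:
   GRN_INT32_VALUE(&destination) == 29 */
GRN_OBJ_FIN(ctx, &source);
GRN_OBJ_FIN(ctx, &destination);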

grn_proc
Summary
TODO...

Example
TODO...

Reference
grn_proc_type
TODO...

grn_proc_func
TODO...

grn_obj *grn_proc_create(grn_ctx *ctx, const char *name, int name_size,
grn_proc_type type, grn_proc_func *init, grn_proc_func *next, grn_proc_func *fin, unsigned
int nvars, grn_expr_var *vars)
Defines a new proc (procedure) named name in the database used by ctx.

Parameters

· name -- The name of the proc to create.

· name_size -- The number of bytes of the name parameter. If a negative value is
specified, the name parameter is assumed to be a NULL-terminated string.

· type -- The kind of proc.

· init -- A pointer to the initialization function.

· next -- A pointer to the function that does the actual processing.

· fin -- A pointer to the finalization function.

· nvars -- The number of variables used by the proc.

· vars -- The definitions of the variables used by the proc (an array of grn_expr_var structures).
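
A minimal registration sketch, assuming the grn_proc_func signature from include/groonga.h; the proc name "hello" and its empty body are hypothetical:

static grn_obj *
proc_hello(grn_ctx *ctx, int nargs, grn_obj **args, grn_user_data *user_data)
{
  /* a real proc would inspect args and build a result here */
  return NULL;
}

/* name_size is -1, so "hello" is treated as a NULL-terminated string. */
grn_proc_create(ctx, "hello", -1, GRN_PROC_FUNCTION,
                proc_hello, /* init */
                NULL,       /* next */
                NULL,       /* fin */
                0, NULL);   /* no variables */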

grn_obj *grn_proc_get_info(grn_ctx *ctx, grn_user_data *user_data, grn_expr_var **vars,
unsigned int *nvars, grn_obj **caller)
Using user_data as a key, gets the currently executing grn_proc_func function, the array of defined variables (grn_expr_var) and their number.

Parameters

· user_data -- The user_data that was passed to grn_proc_func.

· nvars -- The number of variables is returned here.

grn_rc grn_obj_set_finalizer(grn_ctx *ctx, grn_obj *obj, grn_proc_func *func)
Sets the function that is called when the object is destroyed.

It can only be set on tables, columns, procs and exprs.

Parameters

· obj -- The target object.

· func -- The function to be called when the object is destroyed.

grn_search
Summary
TODO...

Example
TODO...

Reference
grn_search_optarg

grn_rc grn_obj_search(grn_ctx *ctx, grn_obj *obj, grn_obj *query, grn_obj *res,
grn_operator op, grn_search_optarg *optarg)
Searches obj for records that match query, and adds records to or removes records from res according to op.

Parameters

· obj -- The object to search.

· query -- The search query.

· res -- The table that stores the search result.

· op -- One of GRN_OP_OR, GRN_OP_AND, GRN_OP_AND_NOT and GRN_OP_ADJUST.

· optarg -- Detailed search conditions.

grn_table
Summary
TODO...

Example
TODO...

Reference
grn_obj *grn_table_create(grn_ctx *ctx, const char *name, unsigned int name_size, const
char *path, grn_obj_flags flags, grn_obj *key_type, grn_obj *value_type)
Defines a new table named name in the database used by ctx.

Parameters

· name --

The name of the table to create. If NULL, the table is unnamed.

When creating a named table in a persistent db, GRN_OBJ_PERSISTENT must be
specified in flags.

· path -- The file path of the table to create. It is only effective when
GRN_OBJ_PERSISTENT is specified in flags. If NULL, a file path is assigned
automatically.

· flags --

Specifying GRN_OBJ_PERSISTENT makes the table persistent.

One of GRN_OBJ_TABLE_PAT_KEY, GRN_OBJ_TABLE_HASH_KEY and GRN_OBJ_TABLE_NO_KEY
must be specified.

If GRN_OBJ_KEY_NORMALIZE is specified, normalized strings become the keys.

If GRN_OBJ_KEY_WITH_SIS is specified, all suffixes of key strings are registered
automatically.

· key_type --

The type of keys. It is ignored when GRN_OBJ_TABLE_NO_KEY is specified.
An existing type or table can be specified.

If table B is created with table A as its key_type, B is always a subset of A.

· value_type -- The type of the area that stores the value corresponding to each key.
Apart from columns, a table can have exactly one area that stores a value corresponding to each key.

grn_id grn_table_add(grn_ctx *ctx, grn_obj *table, const void *key, unsigned int key_size,
int *added)
Adds a new record corresponding to key to table and returns its ID. If a record corresponding to key already exists in table, returns the ID of that record.

For a table created with GRN_OBJ_TABLE_NO_KEY, key and key_size are ignored.

Parameters

· table -- The target table.

· key -- The search key.

· added -- If a value other than NULL is specified, it is set to 1 when a new record is added, and to 0 when the record already existed.
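
A minimal sketch, assuming table was created with ShortText keys:

int added = 0;
grn_id id = grn_table_add(ctx, table, "groonga", strlen("groonga"), &added);
if (id == GRN_ID_NIL) {
  /* error */
} else if (added) {
  /* a new record was created */
} else {
  /* the key already existed; id is the existing record's ID */
}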

grn_id grn_table_get(grn_ctx *ctx, grn_obj *table, const void *key, unsigned int key_size)
It finds a record that has the given key and returns the ID of the found record. If
the table parameter is a database, it finds an object (table, column and so on) that
has the given key and returns the ID of the found object.

Parameters

· table -- The table or database.

· key -- The record or object key to be found.

grn_id grn_table_at(grn_ctx *ctx, grn_obj *table, grn_id id)
Checks whether a record corresponding to id exists in table. Returns the given ID if it exists, and GRN_ID_NIL otherwise.

Note: this operation is fairly costly, so avoid calling it too frequently.

Parameters

· table -- The target table.

· id -- The ID to look up.

grn_id grn_table_lcp_search(grn_ctx *ctx, grn_obj *table, const void *key, unsigned
int key_size)
If table was created with GRN_TABLE_PAT_KEY or GRN_TABLE_DAT_KEY, performs a longest common prefix search and returns the corresponding ID.

If table was created with GRN_TABLE_HASH_KEY, searches for an exactly matching key and returns the corresponding ID.

Parameters

· table -- The target table.

· key -- The search key.

int grn_table_get_key(grn_ctx *ctx, grn_obj *table, grn_id id, void *keybuf, int buf_size)
Gets the key of the record in table corresponding to id.

Returns the key length if the record exists, and 0 if it is not found. If the key is found and buf_size is greater than or equal to the key length, the key is copied into keybuf.

Parameters

· table -- The target table.

· id -- The ID of the target record.

· keybuf -- A buffer (prepared by the caller) to store the key.

· buf_size -- The size (in bytes) of keybuf.

grn_rc grn_table_delete(grn_ctx *ctx, grn_obj *table, const void *key, unsigned
int key_size)
Deletes the record in table corresponding to key. Returns GRN_INVALID_ARGUMENT if no such record exists.

Parameters

· table -- The target table.

· key -- The search key.

· key_size -- The size of the search key.

grn_rc grn_table_delete_by_id(grn_ctx *ctx, grn_obj *table, grn_id id)
Deletes the record in table corresponding to id. Returns GRN_INVALID_ARGUMENT if no such record exists.

Parameters

· table -- The target table.

· id -- The record ID.

grn_rc grn_table_update_by_id(grn_ctx *ctx, grn_obj *table, grn_id id, const
void *dest_key, unsigned int dest_key_size)
Changes the key of the record in table corresponding to id. Specify the new key and its byte length as dest_key and dest_key_size.

This operation can only be used on tables of the GRN_TABLE_DAT_KEY type.

Parameters

· table -- The target table.

· id -- The record ID.

grn_rc grn_table_update(grn_ctx *ctx, grn_obj *table, const void *src_key, unsigned
int src_key_size, const void *dest_key, unsigned int dest_key_size)
Changes the key of the record in table corresponding to src_key. Specify the new key and its byte length as dest_key and dest_key_size.

This operation can only be used on tables of the GRN_TABLE_DAT_KEY type.

Parameters

· table -- The target table.

· src_key -- The key of the target record.

· src_key_size -- The length (in bytes) of the key of the target record.

· dest_key -- The new key.

· dest_key_size -- The length (in bytes) of the new key.

grn_rc grn_table_truncate(grn_ctx *ctx, grn_obj *table)
Deletes all records of table at once.

Note: in a multithreaded environment, access from another thread may touch an address that no longer exists and cause SIGSEGV.

Parameters

· table -- The target table.

grn_table_sort_key
TODO...

grn_table_sort_flags
TODO...

int grn_table_sort(grn_ctx *ctx, grn_obj *table, int offset, int limit, grn_obj *result,
grn_table_sort_key *keys, int n_keys)
Sorts the records in table and stores the top limit records in result.

keys.key can be a column, accessor or proc of table. keys.flags can be either GRN_TABLE_SORT_ASC or GRN_TABLE_SORT_DESC. GRN_TABLE_SORT_ASC sorts in ascending order, GRN_TABLE_SORT_DESC in descending order. keys.offset is a member for internal use.

Parameters

· table -- The target table.

· offset -- Of the sorted records, records are stored into result starting from the offset-th record (0-based).

· limit -- The maximum number of records to store in result.

· result -- The table that stores the result.

· keys -- A pointer to the array of sort keys.

· n_keys -- The size of the array of sort keys.
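
A minimal sketch of a one-key descending sort, assuming result was created beforehand (for example with grn_table_create() using table as key_type) and that table has a score column (a hypothetical name):

grn_table_sort_key keys[1];
memset(keys, 0, sizeof(keys));
keys[0].key = grn_obj_column(ctx, table, "score", strlen("score"));
keys[0].flags = GRN_TABLE_SORT_DESC;
int n = grn_table_sort(ctx, table, 0, 10, result, keys, 1);
/* result now holds up to 10 records; n is the number actually stored */
grn_obj_unlink(ctx, keys[0].key);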

grn_table_group_result
TODO...

grn_table_group_flags
TODO...

grn_rc grn_table_group(grn_ctx *ctx, grn_obj *table, grn_table_sort_key *keys, int n_keys,
grn_table_group_result *results, int n_results)
Groups the records of table by the specified conditions.

Parameters

· table -- The target table.

· keys -- A pointer to the array of grouping key structures.

· n_keys -- The size of the array of grouping key structures.

· results -- A pointer to the array of structures that store the grouping results.

· n_results -- The size of the array of structures that store the grouping results.

grn_rc grn_table_setoperation(grn_ctx *ctx, grn_obj *table1, grn_obj *table2,
grn_obj *res, grn_operator op)
Performs the set operation specified by op on table1 and table2 and stores the result in res.

Unless table1 or table2 itself is specified as res, table1 and table2 are not destroyed.

Parameters

· table1 -- The target table1.

· table2 -- The target table2.

· res -- The table that stores the result.

· op -- The kind of operation to perform.

grn_rc grn_table_difference(grn_ctx *ctx, grn_obj *table1, grn_obj *table2, grn_obj *res1,
grn_obj *res2)
Removes the records that table1 and table2 have in common, and stores the results in res1 and res2 respectively.

Parameters

· table1 -- The target table1.

· table2 -- The target table2.

· res1 -- The table that stores the result.

· res2 -- The table that stores the result.

int grn_table_columns(grn_ctx *ctx, grn_obj *table, const char *name, unsigned
int name_size, grn_obj *res)
Stores into the res parameter the IDs of the columns of table whose names start with the name parameter. If the name_size parameter is 0, all column IDs are stored.

Parameters

· table -- The target table.

· name -- The prefix of the column names to get.

· name_size -- The length of the name parameter.

· res -- A table of the GRN_TABLE_HASH_KEY type that stores the result.

Returns
The number of column IDs stored.

unsigned int grn_table_size(grn_ctx *ctx, grn_obj *table)
Returns the number of records registered in table.

Parameters

· table -- The target table.

grn_rc grn_table_rename(grn_ctx *ctx, grn_obj *table, const char *name, unsigned
int name_size)
Renames table to name in the database used by ctx. All columns of the table are renamed at the same time. table must be a persistent object.

Parameters

· name_size -- The size (in bytes) of the name parameter.

grn_table_cursor
Summary
TODO...

Example
TODO...

Reference
grn_table_cursor
TODO...

grn_table_cursor *grn_table_cursor_open(grn_ctx *ctx, grn_obj *table, const void *min,
unsigned int min_size, const void *max, unsigned int max_size, int offset, int limit,
int flags)
Creates and returns a cursor for retrieving the records registered in table one by one.

Parameters

· table -- The target table.

· min -- The lower bound of keys. (NULL means no lower bound.) See below for
GRN_CURSOR_PREFIX.

· min_size -- The size of min. See below for GRN_CURSOR_PREFIX.

· max -- The upper bound of keys. (NULL means no upper bound.) See below for
GRN_CURSOR_PREFIX.

· max_size -- The size of max. It may be ignored for GRN_CURSOR_PREFIX.

· flags --

If GRN_CURSOR_ASCENDING is specified, records are retrieved in ascending order.

If GRN_CURSOR_DESCENDING is specified, records are retrieved in descending order.
(When GRN_CURSOR_PREFIX below is specified and records with nearby keys are
retrieved, or when a common prefix search is performed, GRN_CURSOR_ASCENDING /
GRN_CURSOR_DESCENDING are ignored.)

If GRN_CURSOR_GT is specified, a key equal to min is excluded from the cursor
range. (When min is NULL, when GRN_CURSOR_PREFIX below is specified and records
with nearby keys are retrieved, or when a common prefix search is performed,
GRN_CURSOR_GT is ignored.)

If GRN_CURSOR_LT is specified, a key equal to max is excluded from the cursor
range. (When max is NULL or when GRN_CURSOR_PREFIX below is specified,
GRN_CURSOR_LT is ignored.)

If GRN_CURSOR_BY_ID is specified, records are retrieved in ID order. (When
GRN_CURSOR_PREFIX below is specified, GRN_CURSOR_BY_ID is ignored.)
For a table created with GRN_OBJ_TABLE_PAT_KEY, specifying GRN_CURSOR_BY_KEY
retrieves records in key order. (For tables created with GRN_OBJ_TABLE_HASH_KEY or
GRN_OBJ_TABLE_NO_KEY, GRN_CURSOR_BY_KEY is ignored.)

If GRN_CURSOR_PREFIX is specified, a cursor that retrieves the following records
is created for a table created with GRN_OBJ_TABLE_PAT_KEY. If max is NULL,
records whose keys have min as a prefix are retrieved. The max_size parameter is
ignored.

If max and max_size are specified and the key of the table is of the ShortText type,
a common prefix search against max is performed and records whose common prefix is at
least min_size bytes long are retrieved. min is ignored.

If max and max_size are specified and the key of the table is of a fixed-size type,
records are retrieved in order starting from the nodes near max in the PAT tree.
However, for nodes that correspond to bits below min_size bytes in the patricia trie
of keys and that lie in a different direction from max, the corresponding records
are not retrieved. Note that being close in the PAT tree is not the same as having a
close key value. In this case, the value that the max pointer points to must be at
least as wide as the key size of the target table. min is ignored.

The three flags GRN_CURSOR_BY_ID / GRN_CURSOR_BY_KEY / GRN_CURSOR_PREFIX cannot be
specified at the same time.

For a table created with GRN_OBJ_TABLE_PAT_KEY, specifying GRN_CURSOR_PREFIX and
GRN_CURSOR_RK retrieves the records whose keys have, as a prefix, the string obtained
by converting a half-width lower-case alphabetic string into full-width katakana
according to the old JIS X 4063:2000 standard. Only GRN_ENC_UTF8 is supported.
GRN_CURSOR_ASCENDING / GRN_CURSOR_DESCENDING have no effect, so records cannot be
retrieved in ascending or descending key order.

· offset --

Of the matching records, records are retrieved starting from the offset-th record
(0-based).

A negative value cannot be specified when GRN_CURSOR_PREFIX is specified.

· limit --

Of the matching records, only limit records are retrieved. If -1 is specified, all
records are assumed.

A negative value smaller than -1 cannot be specified when GRN_CURSOR_PREFIX is
specified.

grn_rc grn_table_cursor_close(grn_ctx *ctx, grn_table_cursor *tc)
Frees a cursor created with grn_table_cursor_open().

Parameters

· tc -- The target cursor.

grn_id grn_table_cursor_next(grn_ctx *ctx, grn_table_cursor *tc)
Advances the cursor to the next record and returns its ID. Returns GRN_ID_NIL when the end of the cursor range is reached.

Parameters

· tc -- The target cursor.

int grn_table_cursor_get_key(grn_ctx *ctx, grn_table_cursor *tc, void **key)
Sets the key of the current record of the cursor into the key parameter and returns its length.

Parameters

· tc -- The target cursor.

· key -- A pointer to the key of the current record is set here.

int grn_table_cursor_get_value(grn_ctx *ctx, grn_table_cursor *tc, void **value)
Sets the value of the current record of the cursor into the value parameter and returns its length.

Parameters

· tc -- The target cursor.

· value -- A pointer to the value of the current record is set here.

grn_rc grn_table_cursor_set_value(grn_ctx *ctx, grn_table_cursor *tc, const void *value,
int flags)
Replaces the value of the current record of the cursor with the given content. Returns GRN_INVALID_ARGUMENT if the cursor has no current record.

Parameters

· tc -- The target cursor.

· value -- The new value.

· flags -- The same values as the flags of grn_obj_set_value() can be specified.

grn_rc grn_table_cursor_delete(grn_ctx *ctx, grn_table_cursor *tc)
Deletes the current record of the cursor. Returns GRN_INVALID_ARGUMENT if the cursor has no current record.

Parameters

· tc -- The target cursor.

grn_obj *grn_table_cursor_table(grn_ctx *ctx, grn_table_cursor *tc)
Returns the table that the cursor belongs to.

Parameters

· tc -- The target cursor.
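
Putting these functions together, a minimal sketch that iterates over all records of a table in ascending order:

grn_table_cursor *tc;
tc = grn_table_cursor_open(ctx, table,
                           NULL, 0,  /* no lower bound */
                           NULL, 0,  /* no upper bound */
                           0, -1,    /* offset 0, all records */
                           GRN_CURSOR_ASCENDING);
if (tc) {
  grn_id id;
  while ((id = grn_table_cursor_next(ctx, tc)) != GRN_ID_NIL) {
    void *key;
    int key_size = grn_table_cursor_get_key(ctx, tc, &key);
    /* use id, key and key_size here */
  }
  grn_table_cursor_close(ctx, tc);
}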

grn_thread_*
Summary
Groonga provides thread related APIs with grn_thread_ prefix.

Normally, you don't need to use these APIs.

You may want to use these APIs when you write a Groonga server.

Example
Here is a real-world use case of grn_thread_* APIs by /reference/executables/groonga.
/reference/executables/groonga increases its thread pool size when the max number of
threads is increased. /reference/executables/groonga decreases its thread pool size and
stops surplus threads when the max number of threads is decreased.

static grn_mutex q_mutex;
static grn_cond q_cond;
static uint32_t nfthreads;
static uint32_t max_nfthreads;

static uint32_t
groonga_get_thread_limit(void *data)
{
  return max_nfthreads;
}

static void
groonga_set_thread_limit(uint32_t new_limit, void *data)
{
  uint32_t i;
  uint32_t current_nfthreads;

  MUTEX_LOCK(q_mutex);
  current_nfthreads = nfthreads;
  max_nfthreads = new_limit;
  MUTEX_UNLOCK(q_mutex);

  if (current_nfthreads > new_limit) {
    for (i = 0; i < current_nfthreads; i++) {
      MUTEX_LOCK(q_mutex);
      COND_SIGNAL(q_cond);
      MUTEX_UNLOCK(q_mutex);
    }
  }
}

int
main(int argc, char **argv)
{
  /* ... */
  grn_thread_set_get_limit_func(groonga_get_thread_limit, NULL);
  grn_thread_set_set_limit_func(groonga_set_thread_limit, NULL);

  grn_init();

  /* ... */
}

Reference
uint32_t (*grn_thread_get_limit_func)(void *data)
It's the type of function that returns the max number of threads.

void (*grn_thread_set_limit_func)(uint32_t new_limit, void *data)
It's the type of function that sets the max number of threads.

uint32_t grn_thread_get_limit(void)
It returns the max number of threads.

If grn_thread_get_limit_func isn't set by grn_thread_set_get_limit_func(), it
always returns 0.

Returns
The max number of threads or 0.

void grn_thread_set_limit(uint32_t new_limit)
It sets the max number of threads.

If grn_thread_set_limit_func isn't set by grn_thread_set_set_limit_func(), it does
nothing.

Parameters

· new_limit -- The new max number of threads.

void grn_thread_set_get_limit_func(grn_thread_get_limit_func func, void *data)
It sets the custom function that returns the max number of threads.

data is passed to func when func is called from grn_thread_get_limit().

Parameters

· func -- The custom function that returns the max number of threads.

· data -- A user data pointer to be passed to func when func is called.

void grn_thread_set_set_limit_func(grn_thread_set_limit_func func, void *data)
It sets the custom function that sets the max number of threads.

data is passed to func when func is called from grn_thread_set_limit().

Parameters

· func -- The custom function that sets the max number of threads.

· data -- A user data pointer to be passed to func when func is called.

grn_type
Summary
TODO...

Example
TODO...

Reference
grn_builtin_type
TODO...

grn_obj *grn_type_create(grn_ctx *ctx, const char *name, unsigned int name_size,
grn_obj_flags flags, unsigned int size)
Defines a new type named name in the db.

Parameters

· name -- The name of the type to create.

· flags -- One of GRN_OBJ_KEY_VAR_SIZE, GRN_OBJ_KEY_FLOAT, GRN_OBJ_KEY_INT and
GRN_OBJ_KEY_UINT.

· size -- The maximum length for GRN_OBJ_KEY_VAR_SIZE; otherwise, the length
(in bytes).

grn_user_data
Summary
TODO...

Example
TODO...

Reference
grn_user_data
TODO...

grn_user_data *grn_obj_user_data(grn_ctx *ctx, grn_obj *obj)
Returns a pointer to the user data that can be registered on the object. It can only be used with tables, columns, procs and exprs.

Parameters

· obj -- The target object.

SPECIFICATION


GQTP
GQTP is the acronym of Groonga Query Transfer Protocol. GQTP is the original protocol for
groonga.

Protocol
GQTP is a stateful client-server protocol. The following sequence is one processing
unit:

· Client sends a request

· Server receives the request

· Server processes the request

· Server sends a response

· Client receives the response

You can perform zero or more processing units in a session.

Both request and response consist of a GQTP header and a body. The GQTP header is fixed-size
data. The body is variable-size data and its size is stored in the GQTP header. The content of
the body isn't defined by GQTP.

GQTP header
GQTP header consists of the following unsigned integer values:

┌───────────┬───────┬───────────────────────┐
│Name │ Size │ Description │
├───────────┼───────┼───────────────────────┤
protocol │ 1byte │ Protocol type. │
├───────────┼───────┼───────────────────────┤
query_type │ 1byte │ Content type of body. │
├───────────┼───────┼───────────────────────┤
key_length │ 2byte │ Not used. │
├───────────┼───────┼───────────────────────┤
level │ 1byte │ Not used. │
├───────────┼───────┼───────────────────────┤
flags │ 1byte │ Flags. │
├───────────┼───────┼───────────────────────┤
status │ 2byte │ Return code. │
├───────────┼───────┼───────────────────────┤
size │ 4byte │ Body size. │
├───────────┼───────┼───────────────────────┤
opaque │ 4byte │ Not used. │
├───────────┼───────┼───────────────────────┤
cas │ 8byte │ Not used. │
└───────────┴───────┴───────────────────────┘

All header values are encoded by network byte order.

The following sections describe the available values of each header field.

The total size of the GQTP header is 24 bytes.
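
For illustration only, the layout can be sketched as a C struct; the field names follow the table above, and every multi-byte field must be converted to network byte order before sending:

typedef struct {
  uint8_t  protocol;    /* always 0xc7 */
  uint8_t  query_type;  /* content type of body */
  uint16_t key_length;  /* not used */
  uint8_t  level;       /* not used */
  uint8_t  flags;       /* MORE, TAIL, ... */
  uint16_t status;      /* return code */
  uint32_t size;        /* body size */
  uint32_t opaque;      /* not used */
  uint64_t cas;         /* not used */
} gqtp_header;          /* 24 bytes, no padding with natural alignment */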

protocol
The value is always 0xc7 in both request and response GQTP header.

query_type
The value is one of the following values:

┌────────┬───────┬───────────────────────┐
│Name │ Value │ Description │
├────────┼───────┼───────────────────────┤
NONE │ 0 │ Free format. │
├────────┼───────┼───────────────────────┤
TSV │ 1 │ Tab Separated Values. │
├────────┼───────┼───────────────────────┤
JSON │ 2 │ JSON. │
├────────┼───────┼───────────────────────┤
XML │ 3 │ XML. │
├────────┼───────┼───────────────────────┤
MSGPACK │ 4 │ MessagePack. │
└────────┴───────┴───────────────────────┘

This is not used in the request GQTP header.

It is used in the response GQTP header: the body is formatted as the specified type.

flags
The value is bitwise OR of the following values:

┌──────┬───────┬─────────────────────────┐
│Name │ Value │ Description │
├──────┼───────┼─────────────────────────┤
MORE │ 0x01 │ There are more data. │
├──────┼───────┼─────────────────────────┤
TAIL │ 0x02 │ There are no more data. │
├──────┼───────┼─────────────────────────┤
HEAD │ 0x04 │ Not used. │
├──────┼───────┼─────────────────────────┤
QUIET │ 0x08 │ Be quiet. │
├──────┼───────┼─────────────────────────┤
QUIT │ 0x10 │ Quit. │
└──────┴───────┴─────────────────────────┘

You must specify either the MORE or the TAIL flag.

If you use the MORE flag, you should also use the QUIET flag, because you don't need a
response for a partial request.

Use the QUIT flag to quit the session.

status
Here are the available values. New statuses will be added in the future.

· 0: SUCCESS

· 1: END_OF_DATA

· 65535: UNKNOWN_ERROR

· 65534: OPERATION_NOT_PERMITTED

· 65533: NO_SUCH_FILE_OR_DIRECTORY

· 65532: NO_SUCH_PROCESS

· 65531: INTERRUPTED_FUNCTION_CALL

· 65530: INPUT_OUTPUT_ERROR

· 65529: NO_SUCH_DEVICE_OR_ADDRESS

· 65528: ARG_LIST_TOO_LONG

· 65527: EXEC_FORMAT_ERROR

· 65526: BAD_FILE_DESCRIPTOR

· 65525: NO_CHILD_PROCESSES

· 65524: RESOURCE_TEMPORARILY_UNAVAILABLE

· 65523: NOT_ENOUGH_SPACE

· 65522: PERMISSION_DENIED

· 65521: BAD_ADDRESS

· 65520: RESOURCE_BUSY

· 65519: FILE_EXISTS

· 65518: IMPROPER_LINK

· 65517: NO_SUCH_DEVICE

· 65516: NOT_A_DIRECTORY

· 65515: IS_A_DIRECTORY

· 65514: INVALID_ARGUMENT

· 65513: TOO_MANY_OPEN_FILES_IN_SYSTEM

· 65512: TOO_MANY_OPEN_FILES

· 65511: INAPPROPRIATE_I_O_CONTROL_OPERATION

· 65510: FILE_TOO_LARGE

· 65509: NO_SPACE_LEFT_ON_DEVICE

· 65508: INVALID_SEEK

· 65507: READ_ONLY_FILE_SYSTEM

· 65506: TOO_MANY_LINKS

· 65505: BROKEN_PIPE

· 65504: DOMAIN_ERROR

· 65503: RESULT_TOO_LARGE

· 65502: RESOURCE_DEADLOCK_AVOIDED

· 65501: NO_MEMORY_AVAILABLE

· 65500: FILENAME_TOO_LONG

· 65499: NO_LOCKS_AVAILABLE

· 65498: FUNCTION_NOT_IMPLEMENTED

· 65497: DIRECTORY_NOT_EMPTY

· 65496: ILLEGAL_BYTE_SEQUENCE

· 65495: SOCKET_NOT_INITIALIZED

· 65494: OPERATION_WOULD_BLOCK

· 65493: ADDRESS_IS_NOT_AVAILABLE

· 65492: NETWORK_IS_DOWN

· 65491: NO_BUFFER

· 65490: SOCKET_IS_ALREADY_CONNECTED

· 65489: SOCKET_IS_NOT_CONNECTED

· 65488: SOCKET_IS_ALREADY_SHUTDOWNED

· 65487: OPERATION_TIMEOUT

· 65486: CONNECTION_REFUSED

· 65485: RANGE_ERROR

· 65484: TOKENIZER_ERROR

· 65483: FILE_CORRUPT

· 65482: INVALID_FORMAT

· 65481: OBJECT_CORRUPT

· 65480: TOO_MANY_SYMBOLIC_LINKS

· 65479: NOT_SOCKET

· 65478: OPERATION_NOT_SUPPORTED

· 65477: ADDRESS_IS_IN_USE

· 65476: ZLIB_ERROR

· 65475: LZO_ERROR

· 65474: STACK_OVER_FLOW

· 65473: SYNTAX_ERROR

· 65472: RETRY_MAX

· 65471: INCOMPATIBLE_FILE_FORMAT

· 65470: UPDATE_NOT_ALLOWED

· 65469: TOO_SMALL_OFFSET

· 65468: TOO_LARGE_OFFSET

· 65467: TOO_SMALL_LIMIT

· 65466: CAS_ERROR

· 65465: UNSUPPORTED_COMMAND_VERSION

size
The size of the body. The maximum body size is 4GiB because size is a 4byte unsigned integer.
If you want to send data of 4GiB or larger, use the MORE flag.

Example
How to run a GQTP server
Groonga has a special protocol, named Groonga Query Transfer Protocol (GQTP), for remote
access to a database. The following form shows how to run Groonga as a GQTP server.

Form:

groonga [-p PORT_NUMBER] -s DB_PATH

The -s option specifies to run Groonga as a server. DB_PATH specifies the path of the
existing database to be hosted. The -p option and its argument, PORT_NUMBER, specify the
port number of the server. The default port number is 10043, which is used when you don't
specify PORT_NUMBER.

The following command runs a server that listens on the default port number. The server
accepts operations to the specified database.

Execution example:

% groonga -s /tmp/groonga-databases/introduction.db
Ctrl-c
%

How to run a GQTP daemon
You can also run a GQTP server as a daemon by using the -d option, instead of the -s
option.

Form:

groonga [-p PORT_NUMBER] -d DB_PATH

A Groonga daemon prints its process ID as follows. In this example, the process ID is
12345. Then, the daemon opens a specified database and accepts operations to that
database.

Execution example:

% groonga -d /tmp/groonga-databases/introduction.db
12345
%

How to run a GQTP client
You can run Groonga as a GQTP client as follows:

Form:

groonga [-p PORT_NUMBER] -c [HOST_NAME_OR_IP_ADDRESS]

This command establishes a connection with a GQTP server and then enters into interactive
mode. HOST_NAME_OR_IP_ADDRESS specifies the hostname or the IP address of the server. If
not specified, Groonga uses the default hostname "localhost". The -p option and its
argument, PORT_NUMBER, specify the port number of the server. If not specified, Groonga
uses the default port number 10043.

Execution example:

% groonga -c
status
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# {
# "uptime": 0,
# "max_command_version": 2,
# "n_queries": 0,
# "cache_hit_rate": 0.0,
# "version": "4.0.1",
# "alloc_count": 140,
# "command_version": 1,
# "starttime": 1395806078,
# "default_command_version": 1
# }
# ]
> ctrl-d
%

In interactive mode, Groonga reads commands from the standard input and executes them one
by one.

How to terminate a GQTP server
You can terminate a GQTP server with a /reference/commands/shutdown command.

Execution example:

% groonga -c
> shutdown
%

See also
· /reference/executables/groonga

· /server/gqtp

Search
This section describes how the /reference/commands/select command searches with the
query parameter.

Search behaviors
There are the following three search behaviors, and they are used selectively depending on the search results:

1. Exact match search

2. Unsplit search

3. Partial match search

Before explaining how the behaviors are used selectively, each behavior is described first.

Exact match search
Documents to be searched are tokenized (split) into terms, and the index is managed in a lexicon with those terms as its units.
Search keywords are tokenized in the same way.

Searching for documents that contain the same sequence of terms as the sequence obtained by tokenizing the search keyword is called an exact match search.

For example, in an index that uses the TokenMecab tokenizer, the string "東京都民" is stored as a sequence of two terms:
東京 / 都民

When this index is searched with the keyword "東京都", the keyword is processed as a sequence of two terms:
東京 / 都

This sequence of terms does not match the sequence "東京 / 都民", so an exact match search does not hit.

In contrast, in an index that uses the TokenBigram tokenizer, the string "東京都民" is stored as a sequence of four terms:
東京 / 京都 / 都民 / 民

When this index is searched with the keyword "東京都", the keyword is processed as a sequence of two terms:
東京 / 京都

This sequence of terms is contained in the sequence "東京 / 京都 / 都民", so an exact match search hits.

Note that the TokenBigram tokenizer does not generate bigrams for strings of alphabetic characters, digits and symbols; it treats such a string as one continuous token. For example, the string "楽しいbilliard" is stored as a sequence of three terms:
楽し / しい / billiard

When this is searched with the keyword "bill", the keyword is processed as one term:
bill

This sequence of terms is not contained in the sequence "楽し / しい / billiard", so an exact match does not hit.

In contrast, the TokenBigramSplitSymbolAlpha tokenizer generates bigrams for alphabetic strings as well, and the string "楽しいbilliard" is stored as a sequence of eleven terms:
楽し / しい / いb / bi / il / ll / li / ia / ar / rd / d

When this is searched with the keyword "bill", the keyword is processed as three terms:
bi / il / ll

This sequence of terms is contained in the sequence "楽し / しい / いb / bi / il / ll /
li / ia / ar / rd / d", so an exact match hits.

Unsplit search
Unsplit search is available only when the lexicon is built with a patricia trie.

The behavior of unsplit search differs between N-gram tokenizers such as TokenBigram and the TokenMecab tokenizer.

With an N-gram tokenizer, a prefix search is performed with the keyword.

For example, searching with the keyword "bill" hits both "bill" and "billiard".

With the TokenMecab tokenizer, a prefix search is performed with the keyword before it is split.

For example, searching with the keyword "スープカレー" hits "スープカレーバー" (treated as one word), but does not hit "スープカレー" (treated as the two words "スープ" and "カレー") or "スープカレーライス" (treated as the two words "スープ" and "カレーライス").

Partial match search
Partial match search is available only when the lexicon is built with a patricia trie and the KEY_WITH_SIS option is specified. If the KEY_WITH_SIS option is not specified, it is equivalent to unsplit search.

The behavior of partial match search differs between N-gram tokenizers such as TokenBigram and the TokenMecab tokenizer.

With bigrams, prefix, substring and suffix searches are performed.

For example, searching with the keyword "ill" hits both "bill" and "billiard".

With the TokenMecab tokenizer, prefix, substring and suffix searches are performed with the split keyword.

For example, searching with the keyword "スープカレー" hits "スープカレー" (treated as the two words "スープ" and "カレー"), "スープカレーライス" (treated as the two words "スープ" and "カレーライス") and also "スープカレーバー" (treated as one word).

Selective use of the searches
Groonga basically performs only exact match searches. Only when the number of hits of the exact match search is at or below a certain threshold does it perform an unsplit search, and if the number of hits is still at or below the threshold, it performs a partial match search. (The default value of the threshold is 0.)

However, if a result set already exists, only an exact match search is performed even when the number of hits is at or below the threshold.

For example, with a query like the following, an exact match search, an unsplit search and a partial match search are performed in order, as long as the number of hits of each search is at or below the threshold:

select Shops --match_columns description --query スープカレー

However, when a result set exists before the full text search is performed, as in the following (when point > 3 hits more records than the threshold), only an exact match search is performed:

select Shops --filter '"point > 3 && description @ \"スープカレー\""'

Therefore, even if description contains "スープカレーライス", it does not hit, because "スープカレーライス" does not exactly match "スープカレー".

LIMITATIONS


Groonga has some limitations.

Limitations of table
A table has the following limitations.

· The maximum one key size: 4KiB

· The maximum total size of keys: 4GiB or 1TiB (by specifying KEY_LARGE flag to
table-create-flags)

· The maximum number of records: 268,435,455 (more than 268 million)

Keep in mind that these limitations may vary depending on conditions.

Limitations of indexing
A full-text index has the following limitations.

· The maximum number of distinct terms: 268,435,455 (more than 268 million)

· The maximum index size: 256GiB

Keep in mind that these limitations may vary depending on conditions.

Limitations of column
A column has the following limitation.

· The maximum stored data size of a column: 256GiB

TROUBLESHOOTING


Different full text search results for the same search keyword
Even with the same search keyword, the results of a full text search may differ depending on the other conditions specified together with it. This section describes the cause and the workarounds.


First, here is an example where the search results actually differ.

The DDL is as follows. It creates an index on the body column of the Blogs table after tokenizing it with the TokenMecab tokenizer:

table_create Blogs TABLE_NO_KEY
column_create Blogs body COLUMN_SCALAR ShortText
column_create Blogs updated_at COLUMN_SCALAR Time
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenMecab --normalizer NormalizerAuto
column_create Terms blog_body COLUMN_INDEX|WITH_POSITION Blogs body

Only one record of test data is loaded:

load --table Blogs
[
["body", "updated_at"],
["東京都民に深刻なダメージを与えました。", "2010/9/21 10:18:34"],
]

First, search with only the full text search condition. In this case, it hits:

> select Blogs --filter 'body @ "東京都"'
[[0,4102.268052438,0.000743783],[[[1],[["_id","UInt32"],["updated_at","Time"],["body","ShortText"]],[1,1285031914.0,"東京都民に深刻なダメージを与えました。"]]]]

Next, search with a combination of a range condition and the full text search (1285858800
is 2010/10/1 0:0:0 in seconds). This also hits:

> select Blogs --filter 'body @ "東京都" && updated_at < 1285858800'
[[0,4387.524084839,0.001525487],[[[1],[["_id","UInt32"],["updated_at","Time"],["body","ShortText"]],[1,1285031914.0,"東京都民に深刻なダメージを与えました。"]]]]

Finally, search with the order of the range condition and the full text search swapped. The individual conditions are the same, but in this case it does not hit:

> select Blogs --filter 'updated_at < 1285858800 && body @ "東京都"'
[[0,4400.292570838,0.000647716],[[[0],[["_id","UInt32"],["updated_at","Time"],["body","ShortText"]]]]]

The following explains why this behavior occurs.

Cause
This behavior occurs because multiple search behaviors are used selectively during full text search. Only a brief explanation is given here; see /spec/search for details.

There are the following three search behaviors:

1. Exact match search

2. Unsplit search

3. Partial match search

Groonga basically performs only exact match searches. In the example above, "東京都民に深刻なダメージを与えました。" is searched with the query "東京都", but when the TokenMecab tokenizer is used, this query does not match.

The search target "東京都民に深刻なダメージを与えました。" is tokenized as
東京 / 都民 / に / 深刻 / な / ダメージ / を / 与え / まし / た / 。

while the query "東京都" is tokenized as
東京 / 都

so they do not match exactly.

Groonga performs an unsplit search only when the number of hits of the exact match search does not exceed a certain threshold, and if the number of hits still does not exceed the threshold, it performs a partial match search (the threshold defaults to 1). The data in this case hits with a partial match search, so specifying only the "東京都" query hits.

However, when the threshold is already exceeded before the full text search, as in the following ("updated_at < 1285858800" hits one record, exceeding the threshold), no partial match search or the like is performed, even when the exact match search hits nothing:

select Blogs --filter 'updated_at < 1285858800 && body @ "東京都"'

Therefore, changing the order of the conditions changes the search results. Two ways to avoid this situation are introduced below; each involves trade-offs, so consider carefully whether to adopt them.

Workaround 1: change the tokenizer
The TokenMecab tokenizer tokenizes using a dictionary prepared in advance, so it can be described as a tokenizer that values precision over recall. N-gram tokenizers such as TokenBigram, on the other hand, value recall. For example, with TokenMecab, "東京都" never exactly matches "京都", but with TokenBigram it does. Conversely, TokenMecab does not exactly match "東京都民", but TokenBigram does.

Specifying an N-gram tokenizer in this way raises recall, but precision drops and search noise is more likely to be included. To adjust this balance, specify a weight for each index used in match_columns of /reference/commands/select.

Here is a concrete example based on the previous one. First, add an index that uses TokenBigram:

table_create Bigram TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
column_create Bigram blog_body COLUMN_INDEX|WITH_POSITION Blogs body

With this alone, the record that did not match before now hits:

> select Blogs --filter 'updated_at < 1285858800 && body @ "東京都"'
[[0,7163.448064902,0.000418127],[[[1],[["_id","UInt32"],["updated_at","Time"],["body","ShortText"]],[1,1285031914.0,"東京都民に深刻なダメージを与えました。"]]]]

However, N-gram tokenizers hit more terms than the TokenMecab tokenizer, so the hit scores of the N-gram index are weighted more heavily. N-gram tokenizers often have lower precision than the TokenMecab tokenizer, so as-is, search noise is likely to appear at the top of the results.

Therefore, specify weights so that the index built with the TokenMecab tokenizer is valued more highly than the index built with the TokenBigram tokenizer. This can be specified with the match_columns option:

> select Blogs --match_columns 'Terms.blog_body * 10 || Bigram.blog_body * 3' --query '東京都' --output_columns '_score, body'
[[0,8167.364602632,0.000647003],[[[1],[["_score","Int32"],["body","ShortText"]],[13,"東京都民に深刻なダメージを与えました。"]]]]

In this case the score is 13: 10 because it matched via the Terms.blog_body index (which uses the TokenMecab tokenizer), plus 3 because it matched via the Bigram.blog_body index (which uses the TokenBigram tokenizer). By weighting the TokenMecab tokenizer more heavily like this, you can raise recall while keeping search noise from rising to the top.

This example was Japanese text, so the TokenBigram tokenizer was sufficient, but for alphabetic text you also need tokenizers such as TokenBigramSplitSymbolAlpha. For example, with the TokenBigram tokenizer, "楽しいbilliard" becomes
楽し / しい / billiard

and "bill" does not match exactly. On the other hand, with the TokenBigramSplitSymbolAlpha tokenizer it becomes
楽し / しい / いb / bi / il / ll / li / ia / ar / rd / d

and "bill" matches exactly.

When using the TokenBigramSplitSymbolAlpha tokenizer, you still need to consider the weighting in the same way.

The available bigram tokenizers are as follows.

· TokenBigram:
Tokenizes with bigrams. Consecutive symbols, alphabetic characters and digits are treated as one word.

· TokenBigramSplitSymbol:
Tokenizes symbols with bigrams as well. Consecutive alphabetic characters and digits are treated as one word.

· TokenBigramSplitSymbolAlpha:
Tokenizes symbols and alphabetic characters with bigrams as well. Consecutive digits are treated as one word.

· TokenBigramSplitSymbolAlphaDigit:
Tokenizes symbols, alphabetic characters and digits with bigrams as well.

· TokenBigramIgnoreBlank:
Tokenizes with bigrams. Consecutive symbols, alphabetic characters and digits are treated as one word. Whitespace is ignored.

· TokenBigramIgnoreBlankSplitSymbol:
Tokenizes symbols with bigrams as well. Consecutive alphabetic characters and digits are treated as one word. Whitespace is ignored.

· TokenBigramIgnoreBlankSplitSymbolAlpha:
Tokenizes symbols and alphabetic characters with bigrams as well. Consecutive digits are treated as one word. Whitespace is ignored.

· TokenBigramIgnoreBlankSplitSymbolAlphaDigit:
Tokenizes symbols, alphabetic characters and digits with bigrams as well. Whitespace is ignored.

Workaround 2: raise the threshold
The threshold that decides whether unsplit search and partial match search are used can be changed with the --with-match-escalation-threshold configure option. With the following, unsplit search and partial match search are performed whenever the number of hits is 100 or fewer, even when the exact match search hits:

% ./configure --with-match-escalation-threshold=100

In this case too, as with workaround 1, note that search noise is more likely to appear at the top of the results. If search noise increases, you need to lower the specified value.

How to avoid mmap Cannot allocate memory error
Example
The following mmap error may appear in the log file:

2013-06-04 08:19:34.835218|A|4e86e700|mmap(4194304,551,432017408)=Cannot allocate memory <13036498944>

Note that <13036498944> is the total size of mmap (almost 12GB) in this case.

Solution
You need to check the following points:

· Is there enough free memory?

· Is the maximum number of mappings exceeded?

To check whether there is enough free memory, you can use the vmstat command.

To check whether the maximum number of mappings is exceeded, you can inspect the value of
vm.max_map_count.

If the issue is fixed by raising the value of vm.max_map_count, that is exactly the
reason.

As Groonga allocates memory in 256KB chunks, you can estimate the size of the database you
can handle with the following formula:

(database size) = vm.max_map_count * (memory chunk size)

If you want to handle a Groonga database of 16GiB or more, you must specify at least 65536 as
the value of vm.max_map_count:

database size (16GiB) = vm.max_map_count (65536) * memory chunk size (256KB)

You can modify vm.max_map_count temporarily with sudo sysctl -w vm.max_map_count=65536.

Then save the configuration value to /etc/sysctl.conf or /etc/sysctl.d/*.conf.
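
For example, a persistent setting is a single line in a file such as /etc/sysctl.d/10-groonga.conf (the file name is an arbitrary choice; any *.conf name works):

# allow enough mappings for a 16GiB Groonga database (65536 * 256KB)
vm.max_map_count = 65536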

See /reference/tuning documentation about tuning related parameters.

DEVELOPMENT


This section describes development with Groonga. You may develop an application that
uses Groonga as its database, a library that uses libgroonga, language bindings of
libgroonga and so on.

Travis CI
This section describes using Groonga on Travis CI. Travis CI is a hosted continuous
integration service for the open source community.

You can use Travis CI for your open source software. This section only describes Groonga
related configuration. See Travis CI: Documentation about general Travis CI.

Configuration
Travis CI is running on 64-bit Ubuntu 12.04 LTS Server Edition. (See Travis CI: About
Travis CI Environment.) You can use apt-line for Ubuntu 12.04 LTS provided by Groonga
project to install Groonga on Travis CI.

You can customize the build lifecycle with .travis.yml. (See Travis CI: Configuring your Travis
CI build with .travis.yml.) You can use the before_install hook or the install hook. You should
use before_install if your software uses a language that is supported by Travis CI such as
Ruby. You should use install otherwise.

Add the following sudo and before_install configuration to .travis.yml:

sudo: required
before_install:
- curl --silent --location https://github.com/groonga/groonga/raw/master/data/travis/setup.sh | sh

The sudo: required configuration is needed because the sudo command is used in the setup script.

If you need to use install hook instead of before_install, you just substitute
before_install: with install:.

With the above configuration, you can use Groonga for your build.

Examples
Here are open source software that use Groonga on Travis CI:

· rroonga (Ruby bindings)

· rroonga on Travis CI

· .travis.yml for rroonga

· nroonga (node.js bindings)

· nroonga on Travis CI

· .travis.yml for nroonga

· logaling-command (A glossary management command line tool)

· logaling-command on Travis CI

· .travis.yml for logaling-command

HOW TO CONTRIBUTE TO GROONGA


We welcome your contributions to the groonga project. There are many ways to contribute,
such as using groonga, introducing it to others, and so on. For example, if you find a bug when
using groonga, you are welcome to report the bug. Coding and documentation contributions are
also welcome for groonga and its related projects.

As a user:
If you are interested in groonga, please read this document and try it.

As a spokesman:
Please introduce groonga to your friends and colleagues.

As a developer: Bug report, development and documentation
This section describes the details.

How to report a bug
There are two ways to report a bug:

· Submit a bug to the issue tracker

· Report a bug to the mailing list

You can use either way; it makes no difference to us.

Submit a bug to the issue tracker
Groonga project uses GitHub issue tracker.

You can use English or Japanese to report a bug.

Report a bug to the mailing list
Groonga project has /community for discussing about groonga. Please send a mail that
describes a bug.

How to contribute in documentation topics
We use Sphinx as the documentation tool.

Introduction
This documentation describes how to write, generate and manage the Groonga
documentation.

Install depended software
Groonga uses Sphinx as documentation tool.

Here are command lines to install Sphinx.

Debian GNU/Linux, Ubuntu:

% sudo apt-get install -V -y python-sphinx

CentOS, Fedora:

% sudo yum install -y python-pip
% sudo pip install sphinx

OS X:

% brew install python
% brew install gettext
% export PATH=`brew --prefix gettext`/bin:$PATH
% pip install sphinx

If the version of Python on your platform is too old, you'll need to install a newer
version of Python 2.7 by hand. For example, here are installation steps based on
pyenv:

% pyenv install 2.7.11
% pyenv global 2.7.11
% pip install sphinx

Run configure with --enable-document
Groonga disables documentation generation by default. You need to enable it explicitly by
adding --enable-document option to configure:

% ./configure --enable-document

Now, your Groonga build is documentation ready.

Generate HTML
You can generate HTML by the following command:

% make -C doc html

You can find generated HTML documentation at doc/locale/en/html/.

Update
You can find sources of documentation at doc/source/. The sources should be written in
English. See i18n about how to translate documentation.

You can update the target file when you update the existing documentation file.

You need to update file list after you add a new file, change file path and delete
existing file. You can update file list by the following command:

% make -C doc update-files

The command updates doc/files.am.

I18N
We originally had documentation only in Japanese. We have started to support I18N
documentation using the gettext-based Sphinx I18N feature. We use English as the base
language and translate English into other languages such as Japanese. We put all
documentation into doc/source/ and process it with Sphinx.

But we still use Japanese in doc/source/ for now. We need to translate the Japanese
documentation in doc/source/ into English. We welcome your help in translating the
documentation.

Translation flow
After doc/source/*.txt are updated, we can start translation.

Here is a translation flow:

1. Install Sphinx, if it is not installed.

2. Clone Groonga repository.

3. Update .po files.

4. Edit .po files.

5. Generate HTML files.

6. Confirm HTML output.

7. Repeat 2.-4. until you get a good result.

8. Send your works to us!

Here are command lines to do the above flow. Following sections describes details.

# Please fork https://github.com/groonga/groonga on GitHub
% git clone https://github.com/${YOUR_GITHUB_ACCOUNT}/groonga.git
% ./autogen.sh
% ./configure --enable-document
% cd doc/locale/${LANGUAGE}/LC_MESSAGES # ${LANGUAGE} is language code such as 'ja'.
% make update # *.po are updated
% editor *.po # translate *.po # you can use your favorite editor
% cd ..
% make html
% browser html/index.html # confirm translation
% git add LC_MESSAGES/*.po
% git commit
% git push

How to install Sphinx
See the introduction.

How to clone Groonga repository
First, please fork Groonga repository on GitHub. You just access
https://github.com/groonga/groonga and press Fork button. Now you can clone your Groonga
repository:

% git clone https://github.com/${YOUR_GITHUB_ACCOUNT}/groonga.git

Then you need to configure your cloned repository:

% cd groonga
% ./autogen.sh
% ./configure --enable-document

The above steps are only needed for the first setup.

If you have troubles on the above steps, you can use source files available on
http://packages.groonga.org/source/groonga/ .

How to update .po files
You can update .po files by running make update on doc/locale/${LANGUAGE}/LC_MESSAGES.
(Please substitute ${LANGUAGE} with your language code such as 'ja'.):

% cd doc/locale/ja/LC_MESSAGES
% make update

How to edit .po
There are some tools to edit .po files. .po files are just text, so you can use your
favorite editor. Here is a list of editors specialized for editing .po files.

Emacs's po-mode
It is bundled in gettext.

Poedit It is a .po editor and works on many platforms.

gted It is also a .po editor and is implemented as an Eclipse plugin.

How to generate HTML files
You can generate HTML files with updated .po files by running make html on
doc/locale/${LANGUAGE}. (Please substitute ${LANGUAGE} with your language code such as
'ja'.):

% cd doc/locale/ja/
% make html

You can also generate HTML files for all languages by running make html on doc/locale:

% cd doc/locale
% make html

NOTE:
.mo files are updated automatically by make html. So you don't care about .mo files.

How to confirm HTML output
HTML files are generated in doc/locale/${LANGUAGE}/html/. (Please substitute ${LANGUAGE}
with your language code such as 'ja'.) You can confirm HTML output by your favorite
browser:

% firefox doc/locale/ja/html/index.html

How to send your works
We can receive your work via a pull request on GitHub, or via e-mail with a patch or the
.po files themselves attached.

How to send pull request
Here are command lines to send pull request:

% git add doc/locale/ja/LC_MESSAGES/*.po
% git commit
% git push

Now you can send pull request on GitHub. You just access your repository page on GitHub
and press Pull Request button.

SEE ALSO:
Help.GitHub - Sending pull requests.

How to send patch
Here are command lines to create patch:

% git add doc/locale/ja/LC_MESSAGES/*.po
% git commit
% git format-patch origin/master

You can find 000X-YYY.patch files in the current directory. Please send those files to us!

SEE ALSO:
/community describes our contact information.

How to send .po files
Please archive doc/locale/${LANGUAGE}/LC_MESSAGES/ (substitute ${LANGUAGE} with your
language code such as 'ja') and send it to us! We will extract and merge the files into
the Groonga repository.

SEE ALSO:
/community describes our contact information.

How to add new language
Here are command lines to add a new translation language:

% cd doc/locale
% make add LOCALE=${LANGUAGE} # specify your language code such as 'de'.

Please substitute ${LANGUAGE} with your language code such as 'ja'.

SEE ALSO:
Codes for the Representation of Names of Languages.

C API
We still have C API documentation in include/groonga.h. But we want to move it into
doc/source/c-api/*.txt. We welcome your help in moving the C API documentation.

We will use the C domain markup of Sphinx.

For Groonga developers
Repository
There is a repository of Groonga on GitHub. If you want to check out Groonga, type the
following command:

% git clone --recursive https://github.com/groonga/groonga.git

There is the list of related projects of Groonga (grntest, fluent-plugin-groonga and so
on).

How to build Groonga at the repository
This document describes how to build Groonga at the repository for each build system. You
can choose GNU Autotools or CMake if you develop Groonga on GNU/Linux or Unix (*BSD,
Solaris, OS X and so on). You need to use CMake if you develop on Windows.

How to build Groonga at the repository by GNU Autotools
This document describes how to build Groonga at the repository by GNU Autotools.

You can't choose this way if you develop Groonga on Windows. If you want to use Windows
for developing Groonga, see windows_cmake.

Install depended software
TODO

· Autoconf

· Automake

· GNU Libtool

· Ruby

· Git

· Cutter

· ...

Checkout Groonga from the repository
Users use the released source archives, but developers must build Groonga at the
repository, because the source code in the repository is the latest.

The Groonga repository is hosted on GitHub. Checkout the latest source code from the
repository:

% git clone --recursive git@github.com:groonga/groonga.git

Create configure
You need to create configure. configure is included in the source archive but not in
the repository.

configure is a build tool that detects your system and generates build configurations for
your environment.

Run autogen.sh to create configure:

% ./autogen.sh

Run configure
You can customize your build configuration by passing options to configure.

Here are recommended configure options for developers:

% ./configure --prefix=/tmp/local --enable-debug --enable-mruby --with-ruby

Here are descriptions of these options:

--prefix=/tmp/local
It specifies that you install your Groonga into temporary directory. You can do
"clean install" by removing /tmp/local directory. It'll be useful for debugging
install.

--enable-debug
It enables debug options for C/C++ compiler. It's useful for debugging on debugger
such as GDB and LLDB.

--enable-mruby
It enables mruby support. The feature isn't enabled by default but developers
should enable the feature.

--with-ruby
It's needed for --enable-mruby and running functional tests.

Run make
Now, you can build Groonga.

Here is a recommended make command line for developers:

% make -j8 > /dev/null

-j8 decreases build time by enabling parallel build. If you have 8 or more CPU cores, you
can increase 8 to decrease build time further.

You can see only warning and error messages by adding > /dev/null. Developers shouldn't add
new warnings and errors in new commits.

See also
· /contribution/development/test

How to build Groonga at the repository by CMake on GNU/Linux or Unix
This document describes how to build Groonga at the repository by CMake on GNU/Linux or
Unix.

Unix is *BSD, Solaris, OS X and so on.

If you want to use Windows for developing Groonga, see windows_cmake.

You can't choose this way if you want to release Groonga. Groonga release system is only
supported by GNU Autotools build. See unix_autotools about GNU Autotools build.

Install depended software
TODO

· CMake

· Ruby

· Git

· Cutter

· ...

Checkout Groonga from the repository
Users use the released source archives, but developers must build Groonga at the
repository, because the source code in the repository is the latest.

The Groonga repository is hosted on GitHub. Checkout the latest source code from the
repository:

% git clone --recursive git@github.com:groonga/groonga.git

Run cmake
You need to create a Makefile for your environment.

You can customize your build configuration by passing options to cmake.

Here are recommended cmake options for developers:

% cmake . -DCMAKE_INSTALL_PREFIX=/tmp/local -DGRN_WITH_DEBUG=on -DGRN_WITH_MRUBY=on

Here are descriptions of these options:

-DCMAKE_INSTALL_PREFIX=/tmp/local
It specifies that you install your Groonga into temporary directory. You can do "clean
install" by removing /tmp/local directory. It'll be useful for debugging install.

-DGRN_WITH_DEBUG=on
It enables debug options for C/C++ compiler. It's useful for debugging on debugger such
as GDB and LLDB.

-DGRN_WITH_MRUBY=on
It enables mruby support. The feature isn't enabled by default but developers should
enable the feature.

Run make
Now, you can build Groonga.

Here is a recommended make command line for developers:

% make -j8 > /dev/null

-j8 decreases build time by enabling parallel build. If you have 8 or more CPU cores, you
can increase 8 to decrease build time further.

You can see only warning and error messages by adding > /dev/null. Developers shouldn't add
new warnings and errors in new commits.

See also
· /contribution/development/test

How to build Groonga at the repository by CMake on Windows
This document describes how to build Groonga at the repository by CMake on Windows.

If you want to use GNU/Linux or Unix for developing Groonga, see unix_cmake.

Unix is *BSD, Solaris, OS X and so on.

Install depended software
· Microsoft Visual Studio Express 2013 for Windows Desktop

· CMake

· Ruby

· RubyInstaller for Windows is recommended.

· Git: There are some Git clients for Windows. For example:

· The official Git package

· TortoiseGit

· Git for Windows

· GitHub Desktop

Checkout Groonga from the repository
Users use the released source archives, but developers must build Groonga at the
repository, because the source code in the repository is the latest.

The Groonga repository is hosted on GitHub. Checkout the latest source code from the
repository:

> git clone --recursive git@github.com:groonga/groonga.git

Run cmake
You need to create a build configuration for your environment.

You can customize your build configuration by passing options to cmake.

You must pass the -G option. Here are the available -G values:

· "Visual Studio 12 2013": For 32bit build.

· "Visual Studio 12 2013 Win64": For 64bit build.

Here are recommended cmake options for developers:

> cmake . -G "Visual Studio 12 2013 Win64" -DCMAKE_INSTALL_PREFIX=C:\Groonga -DGRN_WITH_MRUBY=on

Here are descriptions of these options:

-G "Visual Studio 12 2013 Win64"

-DCMAKE_INSTALL_PREFIX=C:\Groonga
It specifies that you install your Groonga into C:\\Groonga folder.

-DGRN_WITH_MRUBY=on
It enables mruby support. The feature isn't enabled by default but developers should
enable the feature.

Build Groonga
Now, you can build Groonga.

You can use Visual Studio or cmake --build.

Here is a command line to build Groonga by cmake --build:

> cmake --build . --config Debug

See also
· /contribution/development/test

Groonga communication architecture
Architecture with GQTP
· com accepts connections from outside.

· com is a single thread.

· com creates edges.

· An edge corresponds one-to-one with a connection.

· An edge contains a ctx.

· A worker corresponds one-to-one with a thread.

· The number of workers has a fixed upper limit.

· A worker can be bound to one edge.

· Each edge has its own queue.

· msgs are enqueued into the edge's queue by com.
When the edge is not bound to a worker, the edge whose queue received the msg is also enqueued into a queue named ctx_new.

Guidelines for developing well in cooperation with users
This collects practices that help development proceed well in cooperation with the
users of Groonga.
Writing them down lets us share them with people who newly join development.

Twitter
To encourage people to use Groonga, we registered the twitter account Groonga and
use it daily for release announcements and user support.

When it is used only for release announcements, no interaction needs to be considered.
But when several people provide support through the account without a shared
understanding of how to support and why, the support becomes inconsistent.

Here are the points to keep in mind during support, so that the reassurance of being
supported on twitter leads to growth of the Groonga user base.

Review past tweets
Reason
Nobody likes getting a reply from someone who hasn't read what they tweeted.

What to do
It is desirable to review past tweets and be able to propose what the user should do:

Good example: With ○○ the cause is □□. Doing ×× will fix it.

Provide information from our side
Reason
A user in trouble may have tweeted several times and provided information within those limits.
If a solution can be found from those limited tweets, the user is saved extra work.
Asking for more and more information forces the user to do that much more checking.

What to do
It is desirable to propose one or two solutions when first reaching out. Try not to make the user feel much of a burden:

Good example: With ○○, □□ is a possibility, so could you try ××?

Don't redirect twitter conversations to other places (such as redmine) if possible
Reason
Being able to tweet casually is what matters on twitter; demanding something that can't be done casually may make users shrink away.

If you suddenly ask for a bug report on redmine, they may hesitate:

Bad example: Could you report the reproduction steps on the ML or redmine?

If people can't tweet casually about Groonga, developers can't find people in trouble and users stay in trouble, which is an unhappy state for both.

What to do
Complete the conversation on twitter.

Implementing queries
A Groonga database stores a large amount of data, and the parts you need can be retrieved from it at high speed. Groonga provides several means of expressing and executing the queries that ask the Groonga database for the parts you need.

Interfaces for query execution
Groonga provides user programs with several layered interfaces, from a simple, low-level library interface to a complex, high-level command interface.

Interfaces for query execution are provided for each of these layers. They are described below, starting from the lowest layer.

DB_API
DB_API provides a group of C API functions for operating a Groonga database. DB_API provides simple operation functions for the individual parts that make up a database. Complex queries can be executed by combining the features of DB_API. All of the query interfaces described below are implemented by combining the features of DB_API.

grn_expr
grn_expr is a data structure for expressing the conditions of search and update operations on a Groonga database; more complex conditions can be expressed by combining multiple conditions recursively. To execute a query expressed as a grn_expr, use the grn_table_select() function.

Groonga executable
A command interpreter for operating Groonga databases. It interprets the commands passed to it and returns the execution results. The actual processing of a command is written in C. Functions that users define in C can be incorporated into the Groonga executable as new commands. Each command receives several string arguments and interprets and executes them as a query. Each command is free to decide whether to interpret its arguments as a grn_expr or in another format and operate on the database through DB_API.

Queries that grn_expr can express
grn_expr can express various operations such as assignment and function calls. A grn_expr that expresses a search query is specifically called a condition expression. The individual elements that make up a condition expression are called relational expressions. A condition expression is one or more relational expressions, or condition expressions combined with logical operators.

There are the following three logical operators:

&& (logical AND)
|| (logical OR)
! (logical NOT)

The following eleven relational expressions are provided. User-defined functions can also be used as new relational expressions.

equal(==)
not_equal(!=)
less(<)
greater(>)
less_equal(<=)
greater_equal(>=)
contain()
near()
similar()
prefix()
suffix()

grn_table_select()
The grn_table_select() function is used to execute a search query expressed as a grn_expr. As arguments, pass the table to be searched, the grn_expr representing the query, the table that stores the search result, and the operator specifying how records that match the search are reflected in the search result. The following four operators can be specified:

GRN_OP_OR
GRN_OP_AND
GRN_OP_BUT
GRN_OP_ADJUST

GRN_OP_ORは、検索対象テーブルの中からクエリにマッチするレコードを検索結果テーブルに加えます。GRN_OP_OR以外の演算子は、検索結果テーブルが空でない場合にだけ意味を持ちます。GRN_OP_ANDは、検索結果テーブルの中からクエリにマッチしないレコードを取り除きます。GRN_OP_BUTは、検索結果テーブルの中からクエリにマッチするレコードを取り除きます。GRN_OP_ADJUSTは、検索結果テーブルの中でクエリにマッチするレコードに対してスコア値の更新のみを行います。

grn_table_select()は、データベース上に定義されたテーブルや索引などを組み合わせて可能な限り高速に指定されたクエリを実行しようとします。

Relational expressions
A relational expression expresses a condition that the data being searched
for must satisfy, as a relation between the specified values. Every
relational expression can additionally take a callback that is evaluated
when the relation holds and an arg that is passed to the callback function.
When no callback is given and arg alone is given as a number, it is treated
as a coefficient for the score value. The main relational expressions are
described below.

equal(v1, v2, arg, callback)
Expresses that the value of v1 and the value of v2 are equal.

not_equal(v1, v2, arg, callback)
Expresses that the value of v1 and the value of v2 are not equal.

less(v1, v2, arg, callback)
Expresses that the value of v1 is less than the value of v2.

greater(v1, v2, arg, callback)
Expresses that the value of v1 is greater than the value of v2.

less_equal(v1, v2, arg, callback)
Expresses that the value of v1 is less than or equal to the value of v2.

greater_equal(v1, v2, arg, callback)
Expresses that the value of v1 is greater than or equal to the value of v2.

contain(v1, v2, mode, arg, callback)
Expresses that the value of v1 contains the value of v2. When the value of
v1 is decomposed into elements, one of the following modes can be specified
to define how the second value matches each element.

EXACT: the value of v2 is decomposed into elements like the value of v1, and the elements match exactly (default)
UNSPLIT: the value of v2 is not decomposed into elements
PREFIX: an element of the value of v1 prefix-matches the value of v2
SUFFIX: an element of the value of v1 suffix-matches the value of v2
PARTIAL: an element of the value of v1 matches a substring of the value of v2

near(v1, v2, arg, callback)
Expresses that the elements of the value of v2 appear close together in the
value of v1. (An array of values is passed as v2.)

similar(v1, v2, arg, callback)
Expresses that the value of v1 and the value of v2 are similar.

prefix(v1, v2, arg, callback)
Expresses that the value of v1 prefix-matches the value of v2.

suffix(v1, v2, arg, callback)
Expresses that the value of v1 suffix-matches the value of v2.

Query examples
Various search queries can be expressed with grn_expr.

Search example 1
GRN_EXPR_CREATE_FOR_QUERY(ctx, table, query, var);
grn_expr_append_obj(ctx, query, contain, GRN_OP_PUSH, 1);
grn_expr_append_obj(ctx, query, column, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, string, GRN_OP_PUSH, 1);
grn_expr_append_op(ctx, query, GRN_OP_CALL, 3);
result = grn_table_select(ctx, table, query, NULL, GRN_OP_OR);

Returns in result the records of table whose column value contains string.
If a record r1 whose column value is 'needle in haystack' and a record r2
whose column value is 'haystack' are registered in table, and 'needle' is
specified as string, only record r1 hits.

Search example 2
GRN_EXPR_CREATE_FOR_QUERY(ctx, table, query, var);
grn_expr_append_obj(ctx, query, contain, GRN_OP_PUSH, 1);
grn_expr_append_obj(ctx, query, column1, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, string, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, exact, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, score1, GRN_OP_PUSH, 1);
grn_expr_append_op(ctx, query, GRN_OP_CALL, 5);
result = grn_table_select(ctx, table, query, NULL, GRN_OP_OR);
grn_obj_close(ctx, query);
GRN_EXPR_CREATE_FOR_QUERY(ctx, table, query, var);
grn_expr_append_obj(ctx, query, contain, GRN_OP_PUSH, 1);
grn_expr_append_obj(ctx, query, column2, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, string, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, exact, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, score2, GRN_OP_PUSH, 1);
grn_expr_append_op(ctx, query, GRN_OP_CALL, 5);
grn_table_select(ctx, table, query, result, GRN_OP_ADJUST);
grn_obj_close(ctx, query);

For the records of table whose column1 value hits string in exact mode, the
resulting score value is multiplied by score1 and the records are set in
result. Then, for the records set in result whose column2 value hits string
in exact mode, the resulting score value multiplied by score2 is added to
the original score value.

Search example 3
GRN_EXPR_CREATE_FOR_QUERY(ctx, table, query, var);
grn_expr_append_obj(ctx, query, contain, GRN_OP_PUSH, 1);
grn_expr_append_obj(ctx, query, column1, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, string, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, exact, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, score1, GRN_OP_PUSH, 1);
grn_expr_append_op(ctx, query, GRN_OP_CALL, 5);
result = grn_table_select(ctx, table, query, NULL, GRN_OP_OR);
grn_obj_close(ctx, query);
if (grn_table_size(ctx, result) < t1) {
GRN_EXPR_CREATE_FOR_QUERY(ctx, table, query, var);
grn_expr_append_obj(ctx, query, contain, GRN_OP_PUSH, 1);
grn_expr_append_obj(ctx, query, column1, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, string, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, partial, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, score2, GRN_OP_PUSH, 1);
grn_expr_append_op(ctx, query, GRN_OP_CALL, 5);
grn_table_select(ctx, table, query, result, GRN_OP_OR);
grn_obj_close(ctx, query);
}

For the records of table whose column1 value hits string in exact mode, the
resulting score value is multiplied by score1 and the records are set in
result. If the number of records found is smaller than t1, the search is
run again in partial mode, and the records that hit are added to result
with their score values multiplied by score2.

Search example 4
GRN_EXPR_CREATE_FOR_QUERY(ctx, table, query, var);
grn_expr_append_obj(ctx, query, contain, GRN_OP_PUSH, 1);
grn_expr_append_const(ctx, query, string, GRN_OP_PUSH, 1);
grn_expr_append_obj(ctx, query, column, GRN_OP_PUSH, 1);
grn_expr_append_op(ctx, query, GRN_OP_CALL, 3);
result = grn_table_select(ctx, table, query, NULL, GRN_OP_OR);

Returns in result the records of table whose column value is contained in
string. If a record r1 whose column value is 'needle' and a record r2 whose
column value is 'haystack' are registered in table, and 'hay in haystack'
is specified as string, only record r2 hits.

Release procedure
Prerequisites
The prerequisites for the release procedure are as follows.

· The build environment is Debian GNU/Linux (sid)

· The command line examples use zsh

The following working directories are used in the examples.

· GROONGA_DIR=$HOME/work/groonga

· GROONGA_CLONE_DIR=$HOME/work/groonga/groonga.clean

· GROONGA_ORG_PATH=$HOME/work/groonga/groonga.org

· CUTTER_DIR=$HOME/work/cutter

· CUTTER_SOURCE_PATH=$HOME/work/cutter/cutter

Preparing the build environment
The packages that should be installed in advance for the Groonga release
work are listed below.

The explanation assumes Debian GNU/Linux (sid) as the build environment;
adapt it as needed for other environments:

% sudo apt-get install -V debootstrap createrepo rpm mercurial python-docutils python-jinja2 ruby-full mingw-w64 g++-mingw-w64 mecab libmecab-dev nsis gnupg2 dh-autoreconf python-sphinx bison

Vagrant is used to build the Debian (.deb) and Red Hat (.rpm) packages. The
version installable with apt-get is old, so downloading and installing the
latest version from the website is recommended.

If you do not have virtualization software for Vagrant (VirtualBox, VMware
and so on), install it as well. VirtualBox can be installed with apt-get if
you add the contrib section to sources.list:

% cat /etc/apt/sources.list
deb http://ftp.jp.debian.org/debian/ sid main contrib
deb-src http://ftp.jp.debian.org/debian/ sid main contrib
% sudo apt-get update
% sudo apt-get install virtualbox

Also install ruby's rake package with the following command:

% sudo gem install rake

Importing the secret key for package signing
The release work involves signing the RPM packages, which requires the
package signing key.

In the Groonga project, the signing key is encrypted with the public key of
each release manager and stored under the packages/ directory of the
repository.

The release manager decrypts the secret key stored in the repository and
then imports it with the following commands:

% cd packages
% gpg --decrypt release-key-secret.asc.gpg.(release manager) > (decrypted key file)
% gpg --import (decrypted key file)

When the import finishes successfully, the Groonga signing key can be seen
with gpg --list-keys:

pub 1024R/F10399C0 2012-04-24
uid groonga Key (groonga Official Signing Key)
<[email protected]>
sub 1024R/BC009774 2012-04-24

The key cannot be used just by importing it; you also need to trust and
sign the imported key.

Run the following commands to sign it (the prompts in between are omitted):

% gpg --edit-key [email protected]
gpg> trust
gpg> sign
gpg> save
gpg> quit

This step is performed when someone newly becomes a release manager or when
the package signing key changes.

Creating the working directory for the release
The Groonga release must be built in a release-only environment (compile
flags).

It is possible to work without separating the release and development
directories, but that risks releasing with the wrong compile flags.

The explanation below therefore assumes that the source code has been
cloned into a release working directory (groonga.clean) under $GROONGA_DIR.

To get the source code in a clean state for the release, run the following
command in $GROONGA_DIR:

% git clone --recursive [email protected]:groonga/groonga.git groonga.clean

Do this for every release.

Summarizing the changes
Summarize the changes since the previous release in
$GROONGA_CLONE_DIR/doc/source/news.txt. This summary is also used for the
release announcement.

To see the change history since the previous release, run the following
command:

% git log -p --reverse $(git tag | tail -1)..

Search the log for ^commit and add changes using the following criteria as
a guide.

Include

· Changes that affect users

· Changes that break compatibility

Do not include

· Internal changes (variable renaming, refactoring and so on)

Getting the Groonga website
Like Groonga itself, the source of the Groonga website is hosted in a
repository on github.

During the release work, a command described later (make
update-latest-release) replaces the version on the top page.

To get the source code of the Groonga website as $GROONGA_ORG_PATH, run the
following command in $GROONGA_DIR:

% git clone [email protected]:groonga/groonga.org.git

This puts the groonga.org source in $GROONGA_ORG_PATH.

Getting the cutter source code
The Groonga release work uses scripts included in cutter.

Get the cutter source code in the previously prepared $HOME/work/cutter
directory with the following command:

% git clone [email protected]:clear-code/cutter.git

This puts the cutter source in the $CUTTER_SOURCE_PATH directory.

Generating the configure script
A fresh clone of the Groonga source code does not contain the configure
script, so it cannot be built with make as is.

Run autogen.sh in $GROONGA_CLONE_DIR as follows:

% sh autogen.sh

This generates the configure script.

Running the configure script
Run the configure script to generate the Makefile.

To build for a release, run configure with the following options:

% ./configure \
--prefix=/tmp/local \
--with-launchpad-uploader-pgp-key=(keyID registered on Launchpad) \
--with-groonga-org-path=$HOME/work/groonga/groonga.org \
--enable-document \
--with-ruby \
--enable-mruby \
--with-cutter-source-path=$HOME/work/cutter/cutter

The configure option --with-groonga-org-path specifies the place where the
Groonga website repository was cloned.

The configure option --with-cutter-source-path specifies the place where
the cutter source was cloned.

Paths relative to the place where the Groonga source code was cloned can
also be specified, as follows:

% ./configure \
--prefix=/tmp/local \
--with-launchpad-uploader-pgp-key=(keyID registered on Launchpad) \
--with-groonga-org-path=../groonga.org \
--enable-document \
--with-ruby \
--enable-mruby \
--with-cutter-source-path=../../cutter/cutter

Confirm in advance that you can log in to packages.groonga.org via ssh as
the packages user.

To check that you can log in, run the following command:

% ssh [email protected]

Running make update-latest-release
For the make update-latest-release command, specify the date of the
previous release as OLD_RELEASE_DATE and the date of the upcoming release
as NEW_RELEASE_DATE.

For the 2.0.2 release, the following command was run:

% make update-latest-release OLD_RELEASE=2.0.1 OLD_RELEASE_DATE=2012-03-29 NEW_RELEASE_DATE=2012-04-29

This updates the version strings in the source of the top page of the
cloned Groonga website (index.html, ja/index.html), in the spec files of
the RPM packages, and so on.

Running make update-files
To update the locale messages and the lists of changed files, run the
following command:

% make update-files

Running make update-files adds newly added files and so on to the various
.am files.

These files are needed for the release, so commit all of them.

Running make update-po
To synchronize the latest documentation with the translated versions,
update the po files with the following command:

% make update-po

Running make update-po updates the .po files under
doc/locale/ja/LC_MESSAGES.

Translating the po files
Translate the .po files updated by the make update-po command.

To check the translation results as HTML, run the following commands:

% make -C doc/locale/ja html
% make -C doc/locale/en html

When the check is done, commit the translated po files.

Setting the release tag
To tag the release, run the following command:

% make tag

NOTE:
Running configure after tagging makes the tag be reflected in the version
number used when generating the documentation.

Creating the release archive file
To create the source archive file for the release, run the following
command in $GROONGA_CLONE_DIR:

% make dist

This creates $GROONGA_CLONE_DIR/groonga-(version).tar.gz.

NOTE:
If you run make dist before tagging, the version may stay old, and the
version shown by groonga --version will not be updated. It is desirable to
confirm that the version and version.sh in the tar.gz generated by make
dist match the tag.

Building the packages
Now that the release archive file exists, package it.

Packaging is done for the following three targets.

· Debian (.deb)

· Red Hat (.rpm)

· Windows (.exe, .zip)

Package building consists of several subtasks.

Downloading the packages needed for the build
To download the packages needed to build the deb packages, run the
following commands:

% cd packages/apt
% make download

This downloads the related .deb packages and source archives for lucid and
later under the current directory.

To download the packages needed to build the rpm packages, run the
following commands:

% cd packages/yum
% make download

This downloads the RPM/SRPM packages of Groonga, MySQL and so on under the
current directory.

To download the packages needed to build the Windows packages, run the
following commands:

% cd packages/windows
% make download

This downloads the Groonga installers and zip archives under the current
directory.

To download what the source package needs, run the following commands:

% cd packages/source
% make download

This downloads the previously released source archives (.tar.gz) into the
packages/source/files directory.

Building the Debian packages
Move to Groonga's packages/apt subdirectory and run the following commands:

% cd packages/apt
% make build PALALLEL=yes

Running make build PALALLEL=yes builds the combinations of distribution
releases and architectures in parallel.

The following are currently supported.

· Debian GNU/Linux

· wheezy i386/amd64

· jessie i386/amd64

When the build finishes successfully, .deb packages are generated under
$GROONGA_CLONE_DIR/packages/apt/repositories.

make build sometimes cannot build everything at once. In that case, isolate
the failing part by building separately, per distribution or per
architecture.

To sign the generated packages, run the following command:

% make sign-packages

To apply the files to be released to the repository, run the following
command:

% make update-repository

To sign the repository with GnuPG, run the following command:

% make sign-repository

Building the Red Hat packages
Move to Groonga's packages/yum subdirectory and run the following commands:

% cd packages/yum
% make build PALALLEL=yes

Running make build PALALLEL=yes builds the combinations of distribution
releases and architectures in parallel.

The following are currently supported.

· centos-5 i386/x86_64

· centos-6 i386/x86_64

· centos-7 i386/x86_64

When the build finishes successfully, RPM packages are generated under
$GROONGA_CLONE_DIR/packages/yum/repositories.

· repositories/yum/centos/5/i386/Packages

· repositories/yum/centos/5/x86_64/Packages

· repositories/yum/centos/6/i386/Packages

· repositories/yum/centos/6/x86_64/Packages

· repositories/yum/centos/7/i386/Packages

· repositories/yum/centos/7/x86_64/Packages

To sign the RPMs to be released, run the following command:

% make sign-packages

To apply the files to be released to the repository, run the following
command:

% make update-repository

Building the Windows packages
Move to the packages/windows subdirectory and run the following commands:

% cd packages/windows
% make build
% make package
% make installer

Running make release executes everything from build to upload at once, but
it can fail along the way, so running the steps one by one is recommended.

make build cross-compiles Groonga. When it finishes successfully, x64/x86
binaries are created under the dist-x64/dist-x86 directories.

When make package finishes successfully, zip archives are created under the
files directory.

When make installer finishes successfully, Windows installers are created
under the files directory.

Checking that the packages work
Check that the built packages work before releasing them.

For the Debian and Red Hat packages, before uploading to the production
environment, point at a local apt or yum repository and confirm that the
upgrade works correctly.

Here, ruby is used to serve the repositories through a web server as
follows:

% ruby -run -e httpd -- packages/yum/repositories (for yum)
% ruby -run -e httpd -- packages/apt/repositories (for apt)

Preparing grntest
Running grntest requires the Groonga test data and the grntest source.

First, extract the Groonga source into an arbitrary directory:

% tar zxvf groonga-(version).tar.gz

Next, extract the grntest source under Groonga's test/function directory,
that is, place the grntest source as test/function/grntest:

% ls test/function/grntest/
README.md binlib license test

Running grntest
grntest lets you specify the Groonga command explicitly. In the per-package
checks described below, run it as follows:

% GROONGA=(path to groonga) test/function/run-test.sh

At the end, the grntest results are summarized as follows:

55 tests, 52 passes, 0 failures, 3 not checked tests.
94.55% passed.

Confirm that grntest reports no errors.

For Debian packages
The verification procedure for Debian packages is as follows.

· Install the previous version into a chroot environment

· Edit /etc/hosts in the chroot environment so that packages.groonga.org points at the host

· Start a web server on the host with the document root set to the one in the build environment (repositories/apt/packages)

· Run the upgrade procedure

· Extract the grntest archive and run the tests with the installed version

· Confirm that grntest finishes successfully

For Red Hat packages
The verification procedure for Red Hat packages is as follows.

· Install the previous version into a chroot environment

· Edit /etc/hosts in the chroot environment so that packages.groonga.org points at the host

· Start a web server on the host with the document root set to the one in the build environment (packages/yum/repositories)

· Run the upgrade procedure

· Extract the grntest archive and run the tests with the installed version

· Confirm that grntest finishes successfully

For Windows packages
· Do a fresh install and an overwrite install

· Extract the grntest archive and run the tests with the installed version

· Confirm that grntest finishes successfully

Check the zip archive the same way by running grntest.

Writing the release announcement
For a release, send out a release announcement to let people know about
Groonga widely.

Write the release announcement based on the changes summarized in news.txt.

The release announcement includes the following.

· A link to the installation instructions

· An introduction of the release topics

· A link to the release changes

· The release changes (the contents of news.txt)

The introduction of the release topics gives points that appeal to people
who are about to start using Groonga, and information that users of
existing versions need when they upgrade.

If the release contains incompatible changes, it is also important to
include workarounds and similar guidance.

For reference, links to past release announcements are shown below.

· [Groonga-talk] [ANN] Groonga 2.0.2

· http://sourceforge.net/mailarchive/message.php?msg_id=29195195

· [groonga-dev,00794] [ANN] Groonga 2.0.2

· http://osdn.jp/projects/groonga/lists/archive/dev/2012-April/000794.html

Uploading the packages
After the checks are complete, upload the packages and archives for Debian,
Red Hat, Windows and the source code.

To upload the Debian packages, run the following commands:

% cd packages/apt
% make upload

To upload the Red Hat packages, run the following commands:

% cd packages/yum
% make upload

To upload the Windows packages, run the following commands:

% cd packages/windows
% make upload

To upload the source archives, run the following commands:

% cd packages/source
% make upload

When the upload finishes successfully, the repository data, packages and
archives for the release are applied to packages.groonga.org.

Uploading the Ubuntu packages
To upload the Ubuntu packages, run the following commands:

% cd packages/ubuntu
% make upload

The following are currently supported.

· precise i386/amd64

· trusty i386/amd64

· vivid i386/amd64

When the upload finishes successfully, builds run on launchpad.net and the
build results are reported by e-mail. When the builds succeed, the packages
for the release are applied to the Groonga team PPA on launchpad.net. The
published packages can be checked at the following URL.
https://launchpad.net/~groonga/+archive/ubuntu/ppa

Updating blogroonga (the blog)
Create the release announcements published at http://groonga.org/blog/ and
http://groonga.org/ja/blog/.

Basically, the contents of the release announcement are posted as they are.

Add the following new files to the cloned website source.

· groonga.org/en/_post/(release date)-release.md

· groonga.org/ja/_post/(release date)-release.md

To preview the edited contents before pushing them, you need Jekyll,
RedCloth (a Textile parser), RDiscount (a Markdown parser) and a JavaScript
interpreter (therubyracer, Node.js and so on). To install them, run the
following command:

% sudo gem install jekyll RedCloth rdiscount therubyracer

After installing jekyll, start a local web server with the following
command:

% jekyll serve --watch

Then open http://localhost:4000 in a browser and check that the contents
are correct.

NOTE:
To upload an article in an unpublished state, set published: to false in
the .md file:

---
layout: post.en
title: Groonga 2.0.5 has been released
published: false
---

Uploading the documentation
With the documentation under doc/source updated and translated, upload the
documentation.

To do so, first run the following command:

% make update-document

This copies the updated documentation under doc/locale of the cloned
groonga.org.

After confirming that the generated documentation has no problems, commit
and push to apply it to groonga.org.

Updating Homebrew
Homebrew is a way to manage packages on OS X.

To make Groonga easy to install, send a pull request to Homebrew.
https://github.com/mxcl/homebrew

The Groonga formula has already been merged, so the work for each release
is to update the contents of the formula.

For Groonga 3.0.6, the following update was sent as a pull request.
https://github.com/mxcl/homebrew/pull/21456/files

As the URL above shows, update the url and the sha1 checksum of the source
archive.

Release announcement
Send the release announcement you wrote to the mailing lists.

· groonga-dev [email protected]

· Groonga-talk [email protected]

Announcing the release on Twitter
The blogroonga release entry has a tweet button for sharing the link with
your followers (placed at the bottom of the page). Announce the release
using that button.

When you go through this button, the release title (such as "Groonga 2.0.8
has been released") and the URL of the blogroonga release entry are
inserted into the tweet automatically.

Do this for both the English and the Japanese blogroonga entries. Logging
in with the groonga account in advance makes the announcement go smoothly.

This completes the release work.

What to do after the release
Once the release announcement has been sent, development of the next
version begins.

· Add a new version to the Groonga project

· Update Groonga's base_version

Adding a new version to the Groonga project
Add the new version on the Groonga project settings page. (example:
release-2.0.6)

Updating the Groonga version
Run the following command in $GROONGA_CLONE_DIR:

% make update-version NEW_VERSION=2.0.6

This updates $GROONGA_CLONE_DIR/base_version, so commit it.

NOTE:
base_version is used in the names of release files such as the tar.gz.

Build TIPS
Building in parallel
Specifying make build PALALLEL=yes runs the builds in parallel in chroot
environments.

Building only for specific environments
For Debian packages, you can build only specific releases and architectures
by explicitly specifying the CODES and ARCHITECTURES options.

To build only i386 for squeeze, run the following command:

% make build ARCHITECTURES=i386 CODES=squeeze

The ARCHITECTURES and CODES options are also effective for subtasks such as
build-package-deb and build-repository-deb, not only for the build command.

For Red Hat packages, you can build only specific releases and
architectures by explicitly specifying the ARCHITECTURES and DISTRIBUTIONS
options.

To build only i386 for fedora, run the following command:

% make build ARCHITECTURES=i386 DISTRIBUTIONS=fedora

The ARCHITECTURES and DISTRIBUTIONS options are also effective for subtasks
such as build-in-chroot and build-repository-rpm, not only for the build
command.

For centos, specifying CENTOS_VERSIONS lets you build only specific
versions.

Finding the passphrase for package signing
The passphrase of the secret key needed for package signing is written in
the first line of the decrypted text of the release manager's secret key.

Generating documentation with an explicitly specified version
When you want to replace part of the documentation after a release, the
version embedded in the generated HTML may not match the release if nothing
is specified, because part of the git commit hash is used.

To avoid this, explicitly specify DOCUMENT_VERSION and
DOCUMENT_VERSION_FULL as follows:

% make update-document DOCUMENT_VERSION=3.0.1 DOCUMENT_VERSION_FULL=3.0.1

Test methods
TODO: Write in English.

TODO: Write about test/command/run-test.sh.

Setting up the test environment
Installing Cutter
Groonga uses Cutter as its test framework.

For how to install Cutter, see the per-platform Cutter installation
instructions.

Installing lcov
Measuring coverage information requires lcov 1.6 or later. On Debian and
Ubuntu it can be installed as follows:

% sudo aptitude install -y lcov

Installing clang
Static analysis of the source code requires clang (scan-build). On Debian
and Ubuntu it can be installed as follows:

% sudo aptitude install -y clang

Installing libmemcached
Running the tests for the memcached binary protocol requires libmemcached.
On Debian squeeze or later and Ubuntu Karmic or later it can be installed
as follows:

% sudo aptitude install -y libmemcached-dev

Running the tests
Run the following command in the top directory of Groonga:

make check

Coverage information
Run the following command in the top directory of Groonga:

make coverage

This outputs HTML with the coverage information under the coverage
directory.

Coverage has three targets, Lines/Functions/Branches, corresponding to
lines, functions and branches respectively. Functions is the most important
target. Try to keep every function tested.

Edit the parts that the tests do not cover carefully. Increasing the parts
that the tests cover is also important.

Various kinds of tests
The tests can also be run by executing ./run-test.sh in the test/unit
directory. run-test.sh takes several options. For details, run
./run-test.sh --help and see the help.

Testing only a specific test function
You can test only a specific test function (called a test in Cutter).

Example:

% ./run-test.sh -n test_text_otoj

Testing only a specific test file
You can test only a specific test file (called a test case in Cutter).

Example:

% ./run-test.sh -t test_string

Detecting invalid memory accesses and memory leaks
Setting the environment variable CUTTER_CHECK_LEAK to yes runs the tests
under valgrind, detecting invalid memory accesses and memory leaks.

This can be used not only with run-test.sh but also with make check.

Example:

% CUTTER_CHECK_LEAK=yes make check

Running tests in a debugger
Setting the environment variable CUTTER_DEBUG to yes starts gdb with an
environment ready to run the tests. Running run inside gdb starts the
tests.

This can be used not only with run-test.sh but also with make check.

Example:

% CUTTER_DEBUG=yes make check

Static analysis
scan-build can be used to run static analysis on the source code. The
analysis results are output as HTML to a directory named scan_build:

% scan-build ./configure --prefix=/usr
% make clean
% scan-build -o ./scan_build make -j4

configure needs to be run only once.

