kcc - Online in the Cloud

NAME

kcc - Kanji code converter with automatic encoding detection

SYNOPSIS

kcc [ -IOchnvxz ] [ -b bufsize ] [ file ] ...

DESCRIPTION

kcc is a filter that reads files sequentially, converts their kanji encoding, and writes the
result to stdout. If no file is specified, or - is given as a filename, it reads from stdin.
You can specify the kanji encodings for input and output. If you don't specify the input
encoding, kcc detects it automatically.

Available kanji encodings are JIS (7 bit and/or 8 bit), Shift JIS, EUC, and DEC. For input,
7 bit JIS may be mixed with one of EUC, DEC, or Shift JIS. Both SI/SO and ESC(I are
recognized as designating JIS halfwidth kana.

OPTIONS

-O, -IO
I specifies the input kanji encoding and O the output kanji encoding. If no input encoding
is specified, it is detected automatically; if neither input nor output is specified, the
output encoding is 7 bit JIS.

You can specify one of the following for the input encoding option, I.

e EUC (may be mixed with 7 bit JIS)
d DEC (may be mixed with 7 bit JIS)
s Shift JIS (may be mixed with 7 bit JIS)
j, 7, or k
7 bit JIS
8 8 bit JIS

You can specify one of the following for the output encoding option, O.

e EUC
d DEC
s Shift JIS
jXY or 7XY
7 bit JIS (using SI/SO to designate JIS halfwidth kana)
kXY 7 bit JIS (using ESC(I to designate JIS halfwidth kana)
8XY 8 bit JIS

The XY part of the O option selects the escape sequences used in JIS encoding; BJ is the
default. Designation of supplementary kanji is fixed to ESC$(D.

X Kanji is designated by:
B ESC$B (JIS X0208-1983)
@ ESC$@ (JIS X0208-1978)
+ ESC&@ESC$B (JIS X0208-1990)
Y Alphanumerics are designated by:
B ESC(B (ASCII)
J ESC(J (JIS Roman; JIS X0201)
H ESC(H (Swedish; strongly deprecated)
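To make the XY table concrete, here is a minimal Python sketch that maps an XY suffix to the escape sequences it selects. The names `designations`, `KANJI`, `ROMAN`, and `SUPPLEMENTARY` are hypothetical and not part of kcc; only the byte strings come from this manual.

```python
# Hypothetical helper (not part of kcc): map the XY suffix of the
# jXY/7XY/kXY/8XY output options to JIS designation escape sequences.
ESC = "\x1b"

KANJI = {
    "B": ESC + "$B",              # JIS X0208-1983
    "@": ESC + "$@",              # JIS X0208-1978
    "+": ESC + "&@" + ESC + "$B", # JIS X0208-1990
}
ROMAN = {
    "B": ESC + "(B",  # ASCII
    "J": ESC + "(J",  # JIS Roman (JIS X0201)
    "H": ESC + "(H",  # Swedish; strongly deprecated
}
SUPPLEMENTARY = ESC + "$(D"  # fixed regardless of XY, per the manual

def designations(xy: str = "BJ") -> tuple[str, str]:
    """Return (kanji, alphanumeric) escape sequences; "BJ" is kcc's default."""
    if len(xy) != 2 or xy[0] not in KANJI or xy[1] not in ROMAN:
        raise ValueError(f"bad XY suffix: {xy!r}")
    return KANJI[xy[0]], ROMAN[xy[1]]

if __name__ == "__main__":
    kanji, roman = designations("+J")  # the suffix used by "kcc -k+J"
    print(repr(kanji), repr(roman))
```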

-v Output the result of input encoding detection to stderr.

-x Extension mode. During auto detection of the input encoding, also recognize
user-defined characters and extended character regions (code points outside the EUC range,
undefined halfwidth kana, C1 control characters, and/or the Shift JIS extended character
region). DEC and EUC are also distinguished from each other in this mode.

-z Shrink mode. Do not recognize halfwidth kana (except in 7 bit JIS) during input
encoding detection. For files without halfwidth kana, this option makes auto detection
much more accurate.

-h Normally, halfwidth kana converted to DEC become fullwidth katakana. With this option,
they become hiragana.

-n Convert user-defined characters, extended characters, and supplementary kanji to a
fullwidth white box, and convert undefined halfwidth kana to a halfwidth centered dot.

-b bufsize
Specify the buffer size. The default is 8 KB.

-c Do not convert; check the input encoding and print the result to stdout. Unlike normal
auto detection, the whole file is checked. However, if an inconsistency of encodings is
found, reading stops and "data" is printed. Options other than -x and -z are ignored.

EXAMPLES

% kcc -e file
Input encoding is detected automatically, and the output is in EUC.

% kcc -sj file1 file2
Two Shift JIS files are concatenated and converted to JIS.

% command | kcc -k+J
The output of command is converted to 7 bit JIS (kanji in JIS X0208 via ESC&@ESC$B,
alphanumerics in JIS Roman via ESC(J, halfwidth kana via ESC(I).

% kcc -c file
Detects the encoding of the contents of file (no conversion).
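For comparison, Python's standard codecs can perform the same kind of conversion when the input encoding is given explicitly. This sketch is not kcc and does no auto detection; `convert` is a hypothetical helper. Python's `iso2022_jp` codec returns to ASCII with ESC(B, whereas kcc's default BJ suffix uses ESC(J, so the output is not byte-for-byte identical to `kcc -sj`.

```python
import sys

def convert(data: bytes, src: str = "shift_jis", dst: str = "iso2022_jp") -> bytes:
    """Decode with an explicitly named input encoding, then re-encode (no auto detection)."""
    return data.decode(src).encode(dst)

if __name__ == "__main__":
    # Roughly analogous to: kcc -sj file1 file2
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            sys.stdout.buffer.write(convert(f.read()))
```

Passing `dst="euc_jp"` gives the rough equivalent of `kcc -se`.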

BUG

Auto detection of the input encoding works well in normal cases; however, it has the
following problems.

7 bit JIS is reliably recognized by its escape sequences. EUC and DEC are treated as the
same (referred to as the EUC series). Halfwidth kana of 8 bit JIS are the same as halfwidth
kana of Shift JIS (referred to as the Shift JIS series). However, the EUC series and the
Shift JIS series, which are both 8 bit encodings, share large regions of the code space. So
the problem in auto detection is distinguishing these two.

Detection between the EUC series and the Shift JIS series is done line by line. As soon as
the input is found not to be the Shift JIS series, or not to be the EUC series, the encoding
is determined. If an inconsistency is found, the input is treated as "data", and the output
is not guaranteed.

While kcc is deciding between the EUC series and the Shift JIS series after the first 8 bit
code is found, conversion is suspended and input is held in the buffer; if the buffer fills
up, kcc assumes the EUC series and starts converting. Rationale: documents containing kanji
usually include JIS non-kanji characters or JIS level 1 kanji, and in Shift JIS these fall
in a region not shared with EUC, so Shift JIS input is reliably detected. If the encoding
still can't be determined, it is very likely EUC.
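The two paragraphs above describe a line-by-line elimination with a buffered fallback to EUC. The following Python sketch illustrates that idea under simplifying assumptions: it checks only byte-range validity, ignores 7 bit JIS escape sequences, DEC, EUC's SS3 supplementary kanji and the Shift JIS extended region, reports pure 7 bit input as "ASCII", and treats end of input while still ambiguous like a full buffer. All names are hypothetical; this is not kcc's actual code.

```python
from typing import Iterable, List, Tuple

def is_euc(line: bytes) -> bool:
    """True if every byte fits EUC: ASCII, SS2 + kana byte, or a 0xA1-0xFE pair."""
    i, n = 0, len(line)
    while i < n:
        b = line[i]
        if b < 0x80:
            i += 1
        elif b == 0x8E:  # SS2 introduces one halfwidth kana byte
            if i + 1 >= n or not 0xA1 <= line[i + 1] <= 0xDF:
                return False
            i += 2
        elif 0xA1 <= b <= 0xFE:  # two-byte character
            if i + 1 >= n or not 0xA1 <= line[i + 1] <= 0xFE:
                return False
            i += 2
        else:  # includes SS3 (0x8F), which this sketch does not handle
            return False
    return True

def is_sjis(line: bytes) -> bool:
    """True if every byte fits standard Shift JIS (extended lead bytes omitted)."""
    i, n = 0, len(line)
    while i < n:
        b = line[i]
        if b < 0x80 or 0xA1 <= b <= 0xDF:  # ASCII or single-byte halfwidth kana
            i += 1
        elif 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF:  # double-byte lead byte
            if i + 1 >= n:
                return False
            t = line[i + 1]
            if not (0x40 <= t <= 0x7E or 0x80 <= t <= 0xFC):
                return False
            i += 2
        else:
            return False
    return True

def detect(lines: Iterable[bytes], bufsize: int = 8192) -> Tuple[str, List[bytes]]:
    """Return (verdict, lines held so far); verdict is EUC, SJIS, data, or ASCII."""
    pending: List[bytes] = []
    held = 0          # bytes buffered since the first line containing 8 bit codes
    seen_8bit = False
    for line in lines:
        pending.append(line)
        if not seen_8bit and max(line, default=0) < 0x80:
            continue  # still pure 7 bit: nothing to decide yet
        seen_8bit = True
        held += len(line)
        euc, sjis = is_euc(line), is_sjis(line)
        if euc and not sjis:
            return "EUC", pending   # ruled out the Shift JIS series
        if sjis and not euc:
            return "SJIS", pending  # ruled out the EUC series
        if not euc and not sjis:
            return "data", pending  # inconsistent input
        if held >= bufsize:
            return "EUC", pending   # buffer full: assume the EUC series
    # Assumption: ambiguity at end of input is resolved like a full buffer.
    return ("EUC" if seen_8bit else "ASCII"), pending
```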

If the input is 8 bit JIS and every run of halfwidth kana has an even length, it will be
wrongly detected as EUC kanji. Be careful.
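The following Python sketch shows why this happens. Python has no 8 bit JIS codec, so the sketch borrows the `shift_jis` codec to produce single-byte halfwidth kana, which are the same bytes in 8 bit JIS as noted above; `looks_like_euc` is a hypothetical helper.

```python
def looks_like_euc(data: bytes) -> bool:
    """True if the bytes decode cleanly as EUC-JP."""
    try:
        data.decode("euc_jp")
        return True
    except UnicodeDecodeError:
        return False

# Halfwidth kana as single bytes in the 0xA1-0xDF range.
even = "ｱｲ".encode("shift_jis")   # a run of two kana
odd = "ｱｲｳ".encode("shift_jis")   # a run of three kana

print(looks_like_euc(even), even.decode("euc_jp"))  # True, then a single kanji
print(looks_like_euc(odd))                          # False: lone trailing byte
```

An even-length run pairs up into valid EUC two-byte characters, while an odd-length run leaves a byte that EUC cannot decode.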

If the input contains no halfwidth kana, use -z; detection becomes much more accurate,
because the shared region is then restricted to the area of JIS level 2 kanji.
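As a rough check of this claim, the Python sketch below compares first-byte ranges only; ignoring trail bytes and the Shift JIS extended region is a simplifying assumption. With halfwidth kana counted, Shift JIS bytes overlap EUC first bytes for JIS rows 1 to 79; without them, the overlap shrinks to rows 64 to 79, which lie inside the JIS level 2 kanji rows 48 to 84. The helper `ku` is hypothetical.

```python
SJIS_LEAD = set(range(0x81, 0xA0)) | set(range(0xE0, 0xF0))  # double-byte lead bytes
SJIS_KANA = set(range(0xA1, 0xE0))                           # single-byte halfwidth kana
EUC_FIRST = set(range(0xA1, 0xFF))                           # EUC two-byte first bytes

def ku(euc_first_byte: int) -> int:
    """JIS row (ku) number addressed by an EUC first byte."""
    return euc_first_byte - 0xA0

with_kana = sorted(ku(b) for b in EUC_FIRST & (SJIS_LEAD | SJIS_KANA))
without_kana = sorted(ku(b) for b in EUC_FIRST & SJIS_LEAD)  # what -z leaves shared

print(with_kana[0], with_kana[-1])        # 1 79
print(without_kana[0], without_kana[-1])  # 64 79
```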

The extended region of Shift JIS, the user-defined area of EUC, C1 control characters of
EUC, and undefined halfwidth kana of EUC are outside the scope of auto detection, so
detection fails if the input contains these characters. In that case, use the -x option
to enable extension mode, or specify the input encoding.
