NAME
sisu - documents: markup, structuring, publishing in multiple standard formats, and search
SYNOPSIS
sisu [-short-options|--long-options] [filename/wildcard]
sisu [-abCcDdeFGghIikLMmNnoPpQqRrSsTtUuVvWwXxYyZ_0-9] [filename/wildcard]
sisu --txt --html --epub --odt --pdf --wordmap --sqlite --manpage --texinfo --sisupod
--source --qrcode [filename/wildcard]
sisu [-Ddcv] [instruction] [filename/wildcard]
sisu --pg (--createdb|update [filename/wildcard]|--dropall)
sisu [operations]
sisu [-CcFLSVvW]
sisu (--configure|--webrick|--sample-search-form)
SISU - MANUAL,
RALPH AMISSAH
WHAT IS SISU?
INTRODUCTION - WHAT IS SISU?
SiSU is a lightweight markup based document creation and publishing framework that is
controlled from the command line. Prepare documents for SiSU using your text editor of
choice, then use SiSU to generate various output document formats.
From a single lightly prepared document (plain-text UTF-8), sisu custom builds several
standard output formats which share a common (text object) numbering system for citation
of content within a document (that also has implications for search). The sisu engine
works with an abstraction of the document's structure and content from which it is
possible to generate different forms of representation of the document. SiSU produces:
plain-text, HTML, XHTML, XML, EPUB, ODF: ODT (OpenDocument), LaTeX, PDF, and populates an
SQL database (PostgreSQL or SQLite) with text objects (roughly paragraph-sized chunks),
so that document searches are done at this level of granularity.
Outputs share a common citation numbering system, associated with text objects and any
semantic meta-data provided about the document.
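The idea of a shared object numbering can be pictured with a small Python sketch. This is a toy model only, not how the sisu engine works: it assumes every blank-line separated chunk is one text object, and the helper names number_objects and search are hypothetical.

```python
def number_objects(text):
    """Assign 1-based numbers to blank-line separated chunks (toy stand-in for text objects)."""
    chunks = [chunk.strip() for chunk in text.split("\n\n") if chunk.strip()]
    return dict(enumerate(chunks, start=1))


def search(objects, term):
    """Return the numbers of the objects that contain the term (case-insensitive)."""
    return [number for number, chunk in objects.items() if term.lower() in chunk.lower()]


document = (
    "Title of the work\n\n"
    "First paragraph about markup.\n\n"
    "Second paragraph about search and markup."
)
objects = number_objects(document)
print(search(objects, "markup"))  # [2, 3]
```

Because the numbers belong to the objects rather than to a page or a file, the same number can be cited whichever output format a reader happens to use.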
SiSU also provides concordance files, document content certificates and manifests of
generated output. Book indexes may be made.
Some document markup samples are provided in the package sisu-markup-samples.
Homepages:
* <http://www.sisudoc.org/>
* <http://www.jus.uio.no/sisu>
COMMANDS SUMMARY
DESCRIPTION
SiSU is a document publishing system that, from a simple single marked-up document,
produces multiple output formats including: plaintext, HTML, XHTML, XML, EPUB, ODT
(OpenDocument (ODF) text), LaTeX, PDF, info, and SQL (PostgreSQL and SQLite), which
share text object numbers ("object citation numbering") and the same document structure
information. For more see: <http://sisudoc.org> or <http://www.jus.uio.no/sisu>
DOCUMENT PROCESSING COMMAND FLAGS
-[0-9] [filename/wildcard]
see --act
--ao [filename/wildcard/url]
assumed for most other flags; creates new intermediate files for processing
(abstract objects, document abstraction) that are used in all subsequent processing
of other output. To skip this step see -n. Alias -m.
--act[s0-9] [filename/wildcard]
--act0 to --act9 are configurable shortcuts for multiple flags (-0 to -9 are synonyms),
configured in sisurc.yml. The sisu default action on a specified file where no flag is
provided is --act0. Use --act or --acts for information on the current actions ascribed
to --act0 to --act9.
--asciidoc [filename/wildcard]
asciidoc, smart text (not available)
-b [filename/wildcard]
see --xhtml
--by-* see --output-by-*
-C configure/initialise shared output directory files (config files such as css and
dtd files are not updated if they already exist unless a modifier is used).
-C --init-site configure/initialise site; more extensive than -C on its own: shared
output directory files are force-updated, so existing shared output config files such
as css and dtd files are updated when this modifier is used.
-c [filename/wildcard]
see --color-toggle
--color
see --color-on
--color-off
turn off color in output to terminal
--color-on
turn on color in output to terminal
--color-toggle [filename/wildcard]
toggle ANSI screen colour on or off, depending on the default set: if the sisurc
colour default is set to 'true', output to screen is in colour unless the -c flag is
used; if the sisurc colour default is set to 'false' or is undefined, screen output
is without colour unless the -c flag is used. Alias -c
--configure
configure/initialise shared output directory files (config files such as css and dtd
files are not updated if they already exist unless a modifier is used). The equivalent
of: -C --init-site configure/initialise site, more extensive than -C on its own;
shared output directory files are force-updated, so existing shared output config
files such as css and dtd files are updated if -CC is used.
--concordance [filename/wildcard]
produces a concordance (wordmap), a rudimentary index of all the words in a document.
(Concordance files are not generated for documents of over 260,000 words unless
this limit is increased in the file sisurc.yml). Alias -w
-d [filename/wildcard/url]
see --docbook
--dal [filename/wildcard/url]
document abstraction level, renamed abstract objects in sisu5; see --ao
--delete [filename/wildcard]
see --zap
--digests [filename/wildcard/url]
document digest or document content certificate ( DCC ) as sha digest tree of the
document: the digest for the document, and digests for each object contained within
the document (together with information on software versions that produced it)
(digest.txt). --digests -V for verbose digest output to screen.
--docbook [filename/wildcard/url]
docbook xml
--dom [filename/wildcard/url]
see --xml-dom
--dump[=directory_path] [filename/wildcard]
places output in the directory specified; if none is specified, in the current
directory (pwd). Unlike with default settings, HTML files have embedded css. Compare
--redirect
-e [filename/wildcard]
see --epub
--epub [filename/wildcard]
produces an epub document, [sisu version >=2] (filename.epub). Alias -e
--errors-as-warnings
override stop processing on error. Alias --no-stop
--exc-*
exclude output feature, overrides configuration settings; see also --inc-*
* --exc-numbering: see --exc-ocn
* --exc-ocn: exclude "object citation numbering" (switches off object citation
numbers); affects html (seg, scroll), epub, xhtml, xml, pdf
* --exc-toc: exclude table of contents; affects html (scroll), epub, pdf
* --exc-links-to-manifest, --exc-manifest-links: exclude links to manifest; affects
html (seg, scroll)
* --exc-search-form: exclude search form; affects html (seg, scroll), manifest
* --exc-minitoc: exclude mini table of contents; affects html (seg), concordance,
manifest
* --exc-manifest-minitoc: exclude mini table of contents; affects manifest
* --exc-html-minitoc: exclude mini table of contents; affects html (seg), concordance
* --exc-html-navigation: exclude navigation; affects html (seg)
* --exc-html-navigation-bar: exclude navigation bar; affects html (seg)
* --exc-html-search-form: exclude search form; affects html (seg, scroll)
* --exc-html-right-pane: exclude right pane/column; affects html (seg, scroll)
* --exc-html-top-band: exclude top band; affects html (seg, scroll), concordance
(minitoc forced on to provide seg navigation)
* --exc-segsubtoc: exclude sub table of contents; affects html (seg), epub
-F [--webserv=webrick]
see --sample-search-form
-f [optional string part of filename]
see --find
--fictionbook [filename/wildcard/url]
fictionbook xml (not available)
--find [optional string part of filename]
see --glob
-G [optional string part of filename]
see --glob
-g [filename/wildcard]
see --git
--git [filename/wildcard]
produces or updates markup source file structure in a git repo (experimental and
subject to change). Alias -g
--glob [optional string part of filename]
without match string, glob all .sst .ssm files in directory (including language
subdirectories). With match string, find files that match given string in directory
(including language subdirectories). Alias -G, -f, --find
-h [filename/wildcard]
see --html
--harvest *.ss[tm]
makes two lists of sisu output based on the sisu markup documents in a directory:
(i) a list of authors and their works (year and titles), and (ii) a list by topic with
titles and authors. Makes use of header metadata fields (author, title, date,
topic_register). Can be used with maintenance (-M) and remote placement (-R) flags.
--html [filename/wildcard]
produces html output, in two forms (i) segmented text with table of contents
(toc.html and index.html) and (ii) the document in a single file (scroll.html).
Alias -h
--html-scroll [filename/wildcard]
produces html output, the document in a single file (scroll.html) only. Compare
--html-seg and --html
--html-seg [filename/wildcard]
produces html output, segmented text with table of contents (toc.html and
index.html). Compare --html-scroll and --html
--html-strict [filename/wildcard]
produces html with --strict option. see --strict
-I [filename/wildcard]
see --texinfo
-i [filename/wildcard]
see --manpage
--i18n-*
these flags affect output by filetype and filename: --i18n-mono (--monolingual)
output filenames without language code for default language ('en' or as set);
--i18n-multi (--multilingual) language code provided as part of the output
filename, this is the default. Where output is in one language only, the language
code may not be desired. See also --output-by-*
--inc-*
include output feature, overrides configuration settings, (usually the default if
none set), has precedence over --exc-* (exclude output feature). Some detail
provided under --exc-*, see --exc-*
-j [filename/wildcard]
copies images associated with a file for use by html, xhtml & xml outputs
(automatically invoked by --dump & --redirect).
-k see --color-off
--keep-processing-files [filename/wildcard/url]
see --maintenance
-M [filename/wildcard/url]
see --maintenance
-m [filename/wildcard/url]
see --dal (document abstraction level/layer)
--machine [filename/wildcard/url]
see --dal (document abstraction level/layer)
--maintenance [filename/wildcard/url]
maintenance mode, interim processing files are preserved and their locations
indicated. (also see -V). Aliases -M and --keep-processing-files.
--manifest [filename/wildcard]
produces an html summary of output generated (hyperlinked to content) and document
specific metadata (sisu_manifest.html). This step is assumed for most processing
flags.
--manpage [filename/wildcard]
produces man page of file, not suitable for all outputs. Alias -i
--markdown [filename/wildcard/url]
markdown smart text (not available)
--monolingual
see --i18n-*
--multilingual
see --i18n-*
-N [filename/wildcard/url]
see --digests
-n [filename/wildcard/url]
skip the creation of intermediate processing files (document abstraction) if they
already exist, this skips the equivalent of -m which is otherwise assumed by most
processing flags.
--no-* see --exc-*
--no-stop
override stop processing on error. Alias --errors-as-warnings
--numbering
turn on "object citation numbers". See --inc-ocn and --exc-ocn
-o [filename/wildcard/url]
see --odt
--ocn "object citation numbers". See --inc-ocn and --exc-ocn
--odf [filename/wildcard/url]
see --odt
--odt [filename/wildcard/url]
output basic document in opendocument file format (opendocument.odt). Alias -o
--output-by-*
select output directory structure from 3 alternatives: --output-by-language,
(language directory (based on language code) with filetype (html, epub, pdf etc.)
subdirectories); --output-by-filetype, (filetype directories with language code as
part of filename); --output-by-filename, (filename directories with language code
as part of filename). This is configurable. Alias --by-*
-P [language_directory/filename language_directory]
see --po4a
-p [filename/wildcard]
see --pdf
--papersize-(a4|a5|b5|letter|legal)
in conjunction with --pdf set pdf papersize, overriding any configuration settings;
to set more than one papersize repeat the option: --pdf --papersize-a4
--papersize-letter. See also --papersize=*
--papersize=a4,a5,b5,letter,legal
in conjunction with --pdf set pdf papersize, overriding any configuration settings;
to set more than one papersize list after the equal sign with a comma separator:
--papersize=a4,letter. See also --papersize-*
--pdf [filename/wildcard]
produces LaTeX pdf (portrait.pdf & landscape.pdf). Orientation and papersize may be
set on the command-line. Default paper size is set in the config file, or document
header, or provided with an additional command line parameter, e.g. --papersize-a4.
Preset sizes include: 'A4', U.S. 'letter' and 'legal' and book sizes 'A5' and 'B5'
(system defaults to A4). Orientation is set with --landscape or --portrait, e.g. "sisu
--pdf-a4 --pdf-letter --landscape --verbose [filename/wildcard]" or "sisu --pdf
--landscape --a4 --letter --verbose [filename/wildcard]". --pdf defaults to both
landscape & portrait output, and a4 if no other papersizes are configured. Related
options --pdf-landscape --pdf-portrait --pdf-papersize-* --pdf-papersize=[list]. Alias -p
--pdf-l [filename/wildcard]
See --pdf-landscape
--pdf-landscape [filename/wildcard]
sets orientation, produces LaTeX pdf landscape.pdf. Default paper size is set in the
config file, or document header, or provided with an additional command line
parameter, e.g. --papersize-a4. Preset sizes include: 'A4', U.S. 'letter' and
'legal' and book sizes 'A5' and 'B5'; see --papersize-* or --papersize=[list]. Alias
--pdf-l or in conjunction with --pdf --landscape
--pdf-p [filename/wildcard]
See --pdf-portrait
--pdf-portrait [filename/wildcard]
sets orientation, produces LaTeX pdf portrait.pdf. Default paper size is set in the
config file, or document header, or provided with an additional command line
parameter, e.g. --papersize-a4. Preset sizes include: 'A4', U.S. 'letter' and
'legal' and book sizes 'A5' and 'B5'; see --papersize-* or --papersize=[list]. Alias
--pdf-p or in conjunction with --pdf --portrait
--pg-[instruction] [filename]
database PostgreSQL (--pgsql may be used instead). Possible instructions include:
--pg-createdb; --pg-create; --pg-dropall; --pg-import [filename]; --pg-update
[filename]; --pg-remove [filename]; see database section below.
--po [language_directory/filename language_directory]
see --po4a
--po4a [language_directory/filename language_directory]
produces .pot and po files for the file in the languages specified by the language
directory. SiSU markup is placed in subdirectories named with the language code,
e.g. en/ fr/ es/. The sisu config file must set the output directory structure to
multilingual. v3, experimental
-Q [filename/wildcard]
see --qrcode
-q [filename/wildcard]
see --quiet
--qrcode [filename/wildcard]
generate QR code image of metadata (used in manifest).
--quiet [filename/wildcard]
quiet mode, less output to screen.
-R [filename/wildcard]
see --rsync
-r [filename/wildcard]
see --scp
--redirect[=directory_path] [filename/wildcard]
places output in subdirectory under specified directory, subdirectory uses the
filename (without the suffix). If no output directory is specified places the
subdirectory under the current directory (pwd). Unlike using default settings HTML
files have embedded css. Compare --dump
--rst [filename/wildcard/url]
ReST (rST restructured text) smart text (not available)
--rsync [filename/wildcard]
copies sisu output files to remote host using rsync. This requires that sisurc.yml
has been provided with information on hostname and username, and that you have your
"keys" and ssh agent in place. Note that the behavior of rsync differs depending on
whether -R is used alone or with other flags. Used alone, the rsync --delete parameter
is sent, useful for cleaning the remote directory; when -R is used together with other
flags, it is not. Also see --scp. Alias -R
-S see --sisupod
-S [filename/wildcard]
see --sisupod
-s [filename/wildcard]
see --source
--sample-search-form [--db-(pg|sqlite)]
generate examples of (naive) cgi search form for SQLite or PgSQL. This depends on your
already having used sisu to populate an SQLite or PgSQL database (the SQLite
version scans the output directories for existing sisu_sqlite databases, so it is
first necessary to create them, before generating the search form); see --sqlite &
--pg and the database section below. Optional additional parameters:
--db-user='www-data'. The samples are dumped in the present work directory, which must
be writable (with screen instructions given that they be copied to the cgi-bin
directory). Alias -F
--sax [filename/wildcard/url]
see --xml-sax
--scp [filename/wildcard]
copies sisu output files to remote host using scp. This requires that sisurc.yml
has been provided with information on hostname and username, and that you have your
"keys" and ssh agent in place. Also see --rsync. Alias -r
--sha256
set hash digest where used to sha256
--sha512
set hash digest where used to sha512
--sqlite-[instruction] [filename]
database type set to SQLite. This produces one of two possible databases: without
additional database related instructions it produces a discrete SQLite file for the
document processed; with additional instructions it produces a common SQLite
database of all processed documents that (come from the same document preparation
directory and as a result) share the same output directory base path. Possible
instructions include: --sqlite-createdb; --sqlite-create; --sqlite-dropall;
--sqlite-import [filename]; --sqlite-update [filename]; --sqlite-remove
[filename]; see database section below.
--sisupod
produces a sisupod, a zipped sisu directory of markup files including sisu markup
source files and the directory's local configuration file, images and skins. Note:
only local configuration files and skins are included (this option is tested only
with zsh). Alias -S
--sisupod [filename/wildcard]
produces a zipped file of the prepared document specified along with associated
images, by default named sisupod.zip; they may alternatively be named with the
filename extension .ssp. This provides a quick way of gathering the relevant parts
of a sisu document which can then for example be emailed. A sisupod includes the sisu
markup source file (along with associated documents if a master file, or available
in multilingual versions), together with related images and skin. SiSU commands
can be run directly against a sisupod contained in a local directory, or provided
as a url on a remote site. As there is a security issue with skins provided by
other users, they are not applied unless the flag --trust or --trusted is added to
the command instruction; it is recommended that files that are not your own are
treated as untrusted. The directory structure of the unzipped file is understood by
sisu, and sisu commands can be run within it. Note: if you wish to send multiple
files, it quickly becomes more space efficient to zip the sisu markup directory,
rather than the individual files for sending. See the -S option without
[filename/wildcard]. Alias -S
--source [filename/wildcard]
copies sisu markup file to output directory. Alias -s
--strict
together with --html, produces more w3c compliant html, for example not having
purely numeric identifiers for text: the object location url#33 becomes url#o33
-T [filename/wildcard (*.termsheet.rb)]
standard form document builder, preprocessing feature
-t [filename/wildcard]
see --txt
--texinfo [filename/wildcard]
produces texinfo and info file, (view with pinfo). Alias -I
--textile [filename/wildcard/url]
textile smart text (not available)
--txt [filename/wildcard]
produces plaintext with Unix linefeeds and without markup (object numbers are
omitted); has footnotes at end of each paragraph that contains them [-A for
equivalent dos (linefeed) output file] [see -e for endnotes]. Options include:
--endnotes for endnotes; --footnotes for footnotes at the end of each paragraph;
--unix for unix linefeed (default); --msdos for msdos linefeed. Alias -t
--txt-asciidoc [filename/wildcard]
see --asciidoc
--txt-markdown [filename/wildcard]
see --markdown
--txt-rst [filename/wildcard]
see --rst
--txt-textile [filename/wildcard]
see --textile
-U [filename/wildcard]
see --urls
-u [filename/wildcard]
provides url mapping of output files for the flags requested for processing, also
see -U
--urls [filename/wildcard]
prints url output list/map for the available processing flags options and resulting
files that could be requested, (can be used to get a list of processing options in
relation to a file, together with information on the output that would be
produced), -u provides url output mapping for those flags requested for processing.
The default assumes sisu_webrick is running and provides webrick url mappings where
appropriate, but these can be switched to file system paths in sisurc.yml. Alias -U
-V on its own, provides SiSU version and environment information (sisu --help env)
-V [filename/wildcard]
even more verbose than the -v flag.
-v on its own, provides SiSU version information
-v [filename/wildcard]
see --verbose
--verbose [filename/wildcard]
provides verbose output of what is being generated, where output is placed (and
error messages if any), as with -u flag provides a url mapping of files created for
each of the processing flag requests. Alias -v
--very-verbose [filename/wildcard]
provides more verbose output of what is being generated. See --verbose. Alias -V
--version
sisu version
-W see --webrick
-w [filename/wildcard]
see --concordance
--webrick
starts Ruby's webrick webserver pointed at sisu output directories; the default
port is set to 8081 and can be changed in the resource configuration files. [tip:
the webrick server requires link suffixes, so html output should be created using
the -h option rather than -H; also, note -F webrick]. Alias -W
--wordmap [filename/wildcard]
see --concordance
--xhtml [filename/wildcard]
produces xhtml/XML output for browser viewing (sax parsing). Alias -b
--xml-dom [filename/wildcard]
produces XML output with deep document structure, in the nature of dom. Alias -X
--xml-sax [filename/wildcard]
produces XML output shallow structure (sax parsing). Alias -x
-X [filename/wildcard]
see --xml-dom
-x [filename/wildcard]
see --xml-sax
-Y [filename/wildcard]
produces a short sitemap entry for the document, based on html output and the
sisu_manifest. --sitemaps generates/updates the sitemap index of existing sitemaps.
(Experimental, [g,y,m announcement this week])
-y [filename/wildcard]
see --manifest
-Z [filename/wildcard]
see --zap
--zap [filename/wildcard]
Zap, if used with other processing flags deletes output files of the type about to
be processed, prior to processing. If -Z is used as the lone processing related
flag (or in conjunction with a combination of -[mMvVq]), will remove the related
document output directory. Alias -Z
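As an illustration of how the processing flags combine, the following Python sketch assembles a PDF command line from the --pdf, --landscape and --papersize= options described above. It is a sketch under stated assumptions: the helper build_pdf_command is hypothetical, nothing is executed, and cisg.sst is only an example file name.

```python
import shlex

# Paper sizes named in the --papersize-* and --papersize= descriptions above.
KNOWN_PAPERSIZES = ("a4", "a5", "b5", "letter", "legal")


def build_pdf_command(filename, papersizes=(), landscape=False):
    """Return an argv list for a sisu PDF run; the command is built, not executed."""
    unknown = [size for size in papersizes if size not in KNOWN_PAPERSIZES]
    if unknown:
        raise ValueError(f"unsupported paper size(s): {', '.join(unknown)}")
    argv = ["sisu", "--pdf"]
    if landscape:
        argv.append("--landscape")
    if papersizes:
        # Several sizes are given as one comma-separated list after the equal sign.
        argv.append("--papersize=" + ",".join(papersizes))
    argv.append(filename)
    return argv


command = build_pdf_command("cisg.sst", papersizes=("a4", "letter"), landscape=True)
print(shlex.join(command))
# sisu --pdf --landscape --papersize=a4,letter cisg.sst
```

Checking the sizes against the documented list before building the command catches a mistyped size early, instead of leaving it to the sisu run.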
COMMAND LINE MODIFIERS
--no-ocn
[with --html --pdf or --epub] switches off object citation numbering. Produce
output without identifying numbers in margins of html or LaTeX /pdf output.
--no-annotate
strips output text of editor endnotes[^*1] denoted by asterisk or dagger/plus sign
--no-asterisk
strips output text of editor endnotes[^*2] denoted by asterisk sign
--no-dagger
strips output text of editor endnotes[^+1] denoted by dagger/plus sign
DATABASE COMMANDS
dbi - database interface
--pg or --pgsql set for PostgreSQL; --sqlite (default) set for SQLite; -d is modifiable
with --db=[database type (PgSQL or SQLite)]
--pg -v --createall
initial step, creates required relations (tables, indexes) in an existing PostgreSQL
database (a database should be created manually and given the same name as the working
directory, as requested) (rb.dbi) [-dv --createall SQLite equivalent]. It may be
necessary to run sisu -Dv --createdb initially. NOTE: at the present time for
PostgreSQL it may be necessary to manually create the database. The command would
be directory name (without path)]. Please use only alphanumerics and underscores.
--pg -v --import
[filename/wildcard] imports data specified to PostgreSQL db (rb.dbi) [ -dv --import
SQLite equivalent]
--pg -v --update
[filename/wildcard] updates/imports specified data to PostgreSQL db (rb.dbi) [ -dv
--update SQLite equivalent]
--pg --remove
[filename/wildcard] removes specified data from PostgreSQL db (rb.dbi) [-d --remove
SQLite equivalent]
--pg --dropall
kills data and drops (PostgreSQL or SQLite) db, tables & indexes [-d --dropall
SQLite equivalent]
The -v is for verbose output.
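The order of the database steps above (create the relations first, then import documents) can be sketched in Python. This is only an illustration under stated assumptions: the helper database_commands is hypothetical, it only builds the command lines from the flags listed above and does not run sisu, and the file name cisg.sst is used purely as an example.

```python
import shlex


def database_commands(filenames, backend="pg"):
    """Build (but do not run) the sisu calls that create relations and import documents."""
    if backend == "pg":
        prefix = ["sisu", "--pg", "-v"]   # PostgreSQL form listed above
    elif backend == "sqlite":
        prefix = ["sisu", "-dv"]          # SQLite equivalent listed above
    else:
        raise ValueError(f"unknown backend: {backend!r}")
    return [
        prefix + ["--createall"],          # initial step: tables and indexes
        prefix + ["--import", *filenames],  # then import the documents
    ]


for argv in database_commands(["cisg.sst"], backend="sqlite"):
    print(shlex.join(argv))
# sisu -dv --createall
# sisu -dv --import cisg.sst
```

Building the argument list first and printing it keeps the sketch free of side effects; whether the PostgreSQL database itself has to be created by hand beforehand is governed by the NOTE above.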
COMMAND LINE WITH FLAGS - BATCH PROCESSING
In the data directory run sisu -mh filename or wildcard, e.g. "sisu -h cisg.sst" or "sisu
-h *.{sst,ssm}" to produce the html version of all documents.
Running sisu (alone without any flags, filenames or wildcards) brings up the interactive
help, as does any sisu command that is not recognised. Enter to escape.
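The wildcard form *.{sst,ssm} relies on brace expansion in the invoking shell (for example bash or zsh). Where that is not available, the file list can be collected explicitly. The following Python sketch shows one way to do so; the helper html_batch_command is hypothetical, the command is only built and printed, and the temporary files exist just to make the example self-contained.

```python
import shlex
import tempfile
from pathlib import Path

MARKUP_SUFFIXES = {".sst", ".ssm"}  # sisu markup file extensions


def html_batch_command(directory):
    """Return an argv list running 'sisu -h' on every markup file in the directory."""
    files = sorted(
        path.name
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix in MARKUP_SUFFIXES
    )
    # Return nothing when there are no markup files, rather than a bare 'sisu -h'.
    return ["sisu", "-h", *files] if files else []


with tempfile.TemporaryDirectory() as data_dir:
    for name in ("cisg.sst", "manual.ssm", "notes.txt"):
        Path(data_dir, name).write_text("", encoding="utf-8")
    print(shlex.join(html_batch_command(data_dir)))
# sisu -h cisg.sst manual.ssm
```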
HELP
SISU MANUAL
The most up to date information on sisu should be contained in the sisu_manual, available
at:
<http://sisudoc.org/sisu/sisu_manual/>
The manual can be generated from source, found respectively, either within the SiSU
tarball or installed locally at:
./data/doc/sisu/markup-samples/sisu_manual
/usr/share/doc/sisu/markup-samples/sisu_manual
move to the respective directory and type e.g.:
sisu sisu_manual.ssm
SISU MAN PAGES
If SiSU is installed on your system usual man commands should be available, try:
man sisu
Most SiSU man pages are generated directly from sisu documents that are used to prepare
the sisu manual, the source files for which are located within the SiSU tarball at:
./data/doc/sisu/markup-samples/sisu_manual
Once installed, the equivalent directory is:
/usr/share/doc/sisu/markup-samples/sisu_manual
Available man pages are converted back to html using man2html:
/usr/share/doc/sisu/html/
./data/doc/sisu/html
An online version of the sisu man page is available here:
* various sisu man pages <http://www.jus.uio.no/sisu/man/> [^1]
* sisu.1 <http://www.jus.uio.no/sisu/man/sisu.1.html> [^2]
SISU BUILT-IN INTERACTIVE HELP, [DISCONTINUED]
This fell out of date and has been discontinued.
INTRODUCTION TO SISU MARKUP[^3]
SUMMARY
SiSU source documents are plaintext (UTF-8)[^4] files.
All paragraphs are separated by an empty line.
Markup is comprised of:
* at the top of a document, the document header made up of semantic meta-data about the
document and if desired additional processing instructions (such as an instruction to
automatically number headings from a particular level down)
* followed by the prepared substantive text of which the most important single
characteristic is the markup of different heading levels, which define the primary outline
of the document structure. Markup of substantive text includes:
* heading levels, which define document structure
* text basic attributes, italics, bold etc.
* grouped text (objects), which are to be treated differently, such as code
blocks or poems.
* footnotes/endnotes
* linked text and images
* paragraph actions, such as indent, bulleted, numbered-lists, etc.
MARKUP RULES, DOCUMENT STRUCTURE AND METADATA REQUIREMENTS
minimal content/structure requirement:
[metadata]
A~ (level A [title])
1~ (at least one level 1 [segment/(chapter)])
structure rules (document hierarchy, heading levels):
there are two sets of heading levels ABCD (title & parts if any) and 123 (segment &
subsegments if any)
sisu has the following levels:
A~ [title] .
    required (== 1) followed by B~ or 1~
B~ [part] *
    followed by C~ or 1~
C~ [subpart] *
    followed by D~ or 1~
D~ [subsubpart] *
    followed by 1~
1~ [segment (chapter)] +
    required (>= 1) followed by text or 2~
text *
    followed by more text or 1~, 2~
    or relevant part *()
2~ [subsegment] *
    followed by text or 3~
text *
    followed by more text or 1~, 2~ or 3~
    or relevant part, see *()
3~ [subsubsegment] *
    followed by text
text *
    followed by more text or 1~, 2~ or 3~ or relevant part, see *()
*(B~ if none other used;
  if C~ is last used: C~ or B~;
  if D~ is used: D~, C~ or B~)
* level A~ is the title and is mandatory
* there can only be one level A~
* heading levels BCD, are optional and there may be several of each
(where all three are used corresponding to e.g. Book Part Section)
* sublevels that are used must follow each other sequentially
(alphabetically),
* heading levels A~ B~ C~ D~ are followed by other heading levels rather
than substantive text
which may be the subsequent sequential (alphabetic) heading part level
or a heading (segment) level 1~
* there must be at least one heading (segment) level 1~
(the level on which the text is segmented, in a book would correspond
to the Chapter level)
* additional heading levels 1~ 2~ 3~ are optional and there may be several
of each
* heading levels 1~ 2~ 3~ are followed by text (which may be followed by
the same heading level)
and/or the next lower numeric heading level (followed by text)
or indeed return to the relevant part level
(as a corollary to the rules above substantive text/ content
must be preceded by a level 1~ (2~ or 3~) heading)
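The heading rules above can be checked mechanically. The Python sketch below is one possible reading of those rules, not the check that sisu itself performs: it looks only at the order of heading markers and ignores substantive text, and the helper names heading_markers and check_headings are hypothetical.

```python
import re

PART_LEVELS = ("A", "B", "C", "D")   # title and part levels
SEGMENT_LEVELS = ("1", "2", "3")     # segment levels


def heading_markers(markup):
    """Collect heading level markers from lines that start with 'A~', '1~', etc."""
    return re.findall(r"^([A-D1-3])~", markup, flags=re.MULTILINE)


def check_headings(markers):
    """Return a list of problems found in a marker sequence such as ['A', 'B', '1']."""
    if not markers or markers[0] != "A":
        return ["the first heading must be the level A~ title"]
    problems = []
    part_depth = 0  # index in PART_LEVELS of the part level currently in use
    seg_depth = 0   # 0 directly after a title/part heading, otherwise 1, 2 or 3
    for position, marker in enumerate(markers[1:], start=2):
        if marker == "A":
            problems.append(f"heading {position}: there can only be one level A~")
        elif marker in PART_LEVELS:
            level = PART_LEVELS.index(marker)
            if seg_depth == 0:
                # Directly after a part heading only the next part level may follow.
                allowed = level == part_depth + 1
            else:
                # After a segment: return to a part level already in use (or to B~).
                allowed = level <= max(part_depth, 1)
            if allowed:
                part_depth, seg_depth = level, 0
            else:
                problems.append(f"heading {position}: level {marker}~ is not allowed here")
        elif marker in SEGMENT_LEVELS:
            level = int(marker)
            if level <= seg_depth + 1:
                seg_depth = level
            else:
                problems.append(f"heading {position}: level {marker}~ skips a level")
        else:
            problems.append(f"heading {position}: unknown level {marker!r}")
    if "1" not in markers:
        problems.append("at least one level 1~ segment is required")
    elif markers[-1] in PART_LEVELS:
        problems.append("the last part heading must be followed by a level 1~ segment")
    return problems


sample = "A~ Title\n\nB~ Part\n\n1~ Chapter\n\nText.\n\n2~ Section\n\nMore text.\n\n1~ Next\n\nText."
print(heading_markers(sample))                  # ['A', 'B', '1', '2', '1']
print(check_headings(heading_markers(sample)))  # []
print(check_headings(["A", "2"]))               # two problems are reported
```

Tracking only two numbers, the part level in use and the current segment depth, is enough for this reading because every rule above refers to the heading immediately before or to the last part level used.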
MARKUP EXAMPLES
ONLINE
Online markup examples are available together with the respective outputs produced from
<http://www.jus.uio.no/sisu/SiSU/examples.html> or from
<http://www.jus.uio.no/sisu/sisu_examples/>
There is of course this document, which provides a cursory overview of sisu markup and
the respective output produced: <http://www.jus.uio.no/sisu/sisu_markup/>
an alternative presentation of markup syntax: /usr/share/doc/sisu/on_markup.txt.gz
INSTALLED
With SiSU installed sample skins may be found in: /usr/share/doc/sisu/markup-samples (or
equivalent directory) and if sisu-markup-samples is installed also under:
/usr/share/doc/sisu/markup-samples-non-free
MARKUP OF HEADERS
Headers contain either: semantic meta-data about a document, which can be used by any
output module of the program, or; processing instructions.
Note: the first line of a document may include information on the markup version used in
the form of a comment. Comments are a percentage mark at the start of a paragraph (and as
the first character in a line of text) followed by a space and the comment:
% this would be a comment
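The comment rule can be illustrated with a short Python sketch. It assumes only what is stated above: paragraphs are separated by empty lines, and a comment is a paragraph that starts with a percentage mark followed by a space. The helper names split_paragraphs and is_comment are hypothetical.

```python
import re


def split_paragraphs(source):
    """Split markup text into paragraphs separated by one or more empty lines."""
    return [block.strip("\n") for block in re.split(r"\n\s*\n", source) if block.strip()]


def is_comment(paragraph):
    """A comment paragraph starts with a percentage mark followed by a space."""
    return paragraph.startswith("% ")


sample = (
    "% SiSU 4.0\n\n"
    "@title: Example\n\n"
    "% this would be a comment\n\n"
    "1~ Chapter\n\n"
    "Some 100% plain text."
)
paragraphs = split_paragraphs(sample)
print([p for p in paragraphs if is_comment(p)])
# ['% SiSU 4.0', '% this would be a comment']
print([p for p in paragraphs if not is_comment(p)])
# ['@title: Example', '1~ Chapter', 'Some 100% plain text.']
```

A percentage mark inside running text, as in the last paragraph of the sample, is not treated as a comment because it is not the first character of the paragraph.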
SAMPLE HEADER
This current document is loaded by a master document that has a header similar to this
one:
% SiSU master 4.0
@title: SiSU
 :subtitle: Manual
@creator:
 :author: Amissah, Ralph
@publisher: [publisher name]
@rights: Copyright (C) Ralph Amissah 2007, part of SiSU documentation, License GPL 3
@classify:
 :topic_register: SiSU:manual;electronic documents:SiSU:manual
 :subject: ebook, epublishing, electronic book, electronic publishing,
   electronic document, electronic citation, data structure,
   citation systems, search
% used_by: manual
@date:
 :published: 2008-05-22
 :created: 2002-08-28
 :issued: 2002-08-28
 :available: 2002-08-28
 :modified: 2010-03-03
@make:
 :num_top: 1
 :breaks: new=C; break=1
 :bold: /Gnu|Debian|Ruby|SiSU/
 :home_button_text: {SiSU}http://sisudoc.org; {git}http://git.sisudoc.org
 :footer: {SiSU}http://sisudoc.org; {git}http://git.sisudoc.org
 :manpage: name=sisu - documents: markup, structuring, publishing in multiple standard formats, and search;
   synopsis=sisu [-abcDdeFhIiMmNnopqRrSsTtUuVvwXxYyZz0-9] [filename/wildcard ]
   . sisu [-Ddcv] [instruction]
   . sisu [-CcFLSVvW]
@links:
 { SiSU Homepage }http://www.sisudoc.org/
 { SiSU Manual }http://www.sisudoc.org/sisu/sisu_manual/
 { Book Samples & Markup Examples }http://www.jus.uio.no/sisu/SiSU/examples.html
 { SiSU Download }http://www.jus.uio.no/sisu/SiSU/download.html
 { SiSU Changelog }http://www.jus.uio.no/sisu/SiSU/changelog.html
 { SiSU Git repo }http://git.sisudoc.org/gitweb/?p=code/sisu.git;a=summary
 { SiSU List Archives }http://lists.sisudoc.org/pipermail/sisu/
 { SiSU @ Debian }http://packages.qa.debian.org/s/sisu.html
 { SiSU Project @ Debian }http://qa.debian.org/developer.php?login=[email protected]
 { SiSU @ Wikipedia }http://en.wikipedia.org/wiki/SiSU
AVAILABLE HEADERS
Header tags appear at the beginning of a document and provide meta information on the
document (such as the Dublin Core), or information as to how the document as a whole is
to be processed. All header instructions take the form @headername: or, on the next line
and indented by one space, :subheadername: All Dublin Core meta tags are available.
@identifier: information or instructions
where the "identifier" is a tag recognised by the program, and the "information" or
"instructions" belong to the tag/identifier specified
Note: a header where used should only be used once; all headers apart from @title: are
optional; the @structure: header is used to describe document structure, and can be useful
to know.
This is a sample header
% SiSU 2.0 [declared file-type identifier with markup version]
@title: [title text] [this header is the only one that is mandatory]
:subtitle: [subtitle if any]
:language: English
@creator:
:author: [Lastname, First names]
:illustrator: [Lastname, First names]
:translator: [Lastname, First names]
:prepared_by: [Lastname, First names]
@date:
:published: [year or yyyy-mm-dd]
:created: [year or yyyy-mm-dd]
:issued: [year or yyyy-mm-dd]
:available: [year or yyyy-mm-dd]
:modified: [year or yyyy-mm-dd]
:valid: [year or yyyy-mm-dd]
:added_to_site: [year or yyyy-mm-dd]
:translated: [year or yyyy-mm-dd]
@rights:
:copyright: Copyright (C) [Year and Holder]
:license: [Use License granted]
:text: [Year and Holder]
:translation: [Name, Year]
:illustrations: [Name, Year]
@classify:
:topic_register: SiSU:markup sample:book;book:novel:fantasy
:type:
:subject:
:description:
:keywords:
:abstract:
:loc: [Library of Congress classification]
:dewey: [Dewey classification]
@identify:
:isbn: [ISBN]
:oclc:
@links: { SiSU }http://www.sisudoc.org
{ FSF }http://www.fsf.org
@make:
:num_top: 1
:headings: [text to match for each level
(e.g. PART; Chapter; Section; Article; or another: none; BOOK|FIRST|SECOND; none; CHAPTER;)]
:breaks: new=:C; break=1
:promo: sisu, ruby, sisu_search_libre, open_society
:bold: [regular expression of words/phrases to be made bold]
:italics: [regular expression of words/phrases to italicise]
:home_button_text: {SiSU}http://sisudoc.org; {git}http://git.sisudoc.org
:footer: {SiSU}http://sisudoc.org; {git}http://git.sisudoc.org
@original:
:language: [language]
@notes:
:comment:
:prefix: [prefix is placed just after table of contents]
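The header layout described above (a top-level @headername: line, followed by :subheadername: lines indented by one space) can be read mechanically. The following is a minimal Python sketch of that idea, not SiSU's own parser: the function name parse_headers and the "_value" key are hypothetical, and the sketch assumes each header fits on a single line.

```python
import re

def parse_headers(text):
    """Collect @header: and indented :subheader: lines into a nested dict."""
    headers = {}
    current = None
    for line in text.splitlines():
        if line.startswith("%"):            # comment line, e.g. "% SiSU 2.0"
            continue
        top = re.match(r"@(\w+):\s*(.*)$", line)
        sub = re.match(r"\s+:(\w+):\s*(.*)$", line)
        if top:
            current = top.group(1)
            # text on the @header: line itself is kept under a hypothetical "_value" key
            headers[current] = {"_value": top.group(2)} if top.group(2) else {}
        elif sub and current is not None:
            headers[current][sub.group(1)] = sub.group(2)
    return headers

sample = """% SiSU 2.0
@title: SiSU
 :subtitle: Manual
@creator:
 :author: Amissah, Ralph
"""
print(parse_headers(sample))
```

Continuation lines (such as a :subject: value wrapped over several lines) are ignored by this sketch.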
MARKUP OF SUBSTANTIVE TEXT
HEADING LEVELS
Heading levels are :A~, :B~, :C~, 1~, 2~, 3~ ... :A - :C being part / section headings,
followed by other heading levels, and 1 - 6 being headings followed by substantive text or
sub-headings. :A~ is usually the title; :A~? is a conditional level 1 heading (used where
a stand-alone document may be imported into another).
:A~ [heading text] Top level heading [this usually has similar content to the title
@title: ] NOTE: the heading levels described here are in 0.38 notation, see heading
:B~ [heading text] Second level heading [this is a heading level divider]
:C~ [heading text] Third level heading [this is a heading level divider]
1~ [heading text] Top level heading preceding substantive text of document or sub-heading
2, the heading level that would normally be marked 1. or 2. or 3. etc. in a document, and
the level on which sisu by default would break html output into named segments, names are
provided automatically if none are given (a number), otherwise takes the form
1~my_filename_for_this_segment
2~ [heading text] Second level heading preceding substantive text of document or
sub-heading 3, the heading level that would normally be marked 1.1 or 1.2 or 1.3 or 2.1
etc. in a document.
3~ [heading text] Third level heading preceding substantive text of document, that would
normally be marked 1.1.1 or 1.1.2 or 1.2.1 or 2.1.1 etc. in a document
1~filename level 1 heading,
% the primary division such as Chapter that is followed by substantive text, and may be further subdivided (this is the level on which by default html segments are made)
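As an illustration of the heading markers described above, the following Python sketch splits a heading line into its level, optional segment name and heading text. It is an assumption-laden sketch rather than SiSU's implementation: parse_heading is a hypothetical name, the heading text is assumed to follow the marker after at least one space, and the conditional form :A~? is not treated specially.

```python
import re

# level marker (:A-:C or 1-6), "~", optional segment name, then the heading text
HEADING = re.compile(r"^(:[A-C]|[1-6])~(\S*)\s+(.*)$")

def parse_heading(line):
    """Return level, segment name and text for a heading line, else None."""
    match = HEADING.match(line)
    if not match:
        return None
    level, name, text = match.groups()
    return {"level": level.lstrip(":"), "name": name or None, "text": text}

print(parse_heading(":A~ SiSU Manual"))
print(parse_heading("1~intro Introduction"))
print(parse_heading("ordinary paragraph"))
```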
FONT ATTRIBUTES
markup example:
normal text, *{emphasis}*, !{bold text}!, /{italics}/, _{underscore}_, "{citation}",
^{superscript}^, ,{subscript},, +{inserted text}+, -{strikethrough}-, #{monospace}#
normal text
*{emphasis}* [note: can be configured to be represented by bold, italics or underscore]
!{bold text}!
/{italics}/
_{underscore}_
"{citation}"
^{superscript}^
,{subscript},
+{inserted text}+
-{strikethrough}-
#{monospace}#
resulting output:
normal text, emphasis, bold text , italics, underscore, "citation", ^superscript^,
[subscript], ++inserted text++, --strikethrough--, monospace
normal text
emphasis [note: can be configured to be represented by bold, italics or underscore]
bold text
italics
underscore
"citation"
^superscript^
[subscript]
++inserted text++
--strikethrough--
monospace
INDENTATION AND BULLETS
markup example:
ordinary paragraph
_1 indent paragraph one step
_2 indent paragraph two steps
_9 indent paragraph nine steps
resulting output:
ordinary paragraph
indent paragraph one step
indent paragraph two steps
indent paragraph nine steps
markup example:
_* bullet text
_1* bullet text, first indent
_2* bullet text, two step indent
resulting output:
* bullet text
* bullet text, first indent
* bullet text, two step indent
Numbered List (not to be confused with headings/titles, (document structure))
markup example:
# numbered list numbered list 1., 2., 3, etc.
_# numbered list numbered list indented a., b., c., d., etc.
HANGING INDENTS
markup example:
_0_1 first line no indent,
rest of paragraph indented one step
_1_0 first line indented,
rest of paragraph no indent
in each case level may be 0-9
resulting output:
first line no indent, rest of paragraph indented one step; first line no
indent, rest of paragraph indented one step; first line no indent, rest of
paragraph indented one step; first line no indent, rest of paragraph indented
one step; first line no indent, rest of paragraph indented one step; first
line no indent, rest of paragraph indented one step; first line no indent,
rest of paragraph indented one step; first line no indent, rest of paragraph
indented one step; first line no indent, rest of paragraph indented one step;
A regular paragraph.
first line indented, rest of paragraph no indent first line indented, rest of paragraph
no indent first line indented, rest of paragraph no indent first line indented, rest of
paragraph no indent first line indented, rest of paragraph no indent first line indented,
rest of paragraph no indent first line indented, rest of paragraph no indent first line
indented, rest of paragraph no indent first line indented, rest of paragraph no indent
first line indented, rest of paragraph no indent first line indented, rest of paragraph no
indent
in each case level may be 0-9
live-build
A collection of scripts used to build customized Debian
Livesystems.
live-build
was formerly known as live-helper, and even earlier known as live-package.
live-build
A collection of scripts used to build customized Debian
Livesystems. live-build
was formerly known as live-helper, and even earlier known as live-package.
FOOTNOTES / ENDNOTES
Footnotes and endnotes are marked up at the location where they would be indicated within
a text. They are automatically numbered. The output type determines whether footnotes or
endnotes will be produced.
markup example:
~{ a footnote or endnote }~
resulting output:
[^5]
markup example:
normal text~{ self contained endnote marker & endnote in one }~ continues
resulting output:
normal text[^6] continues
markup example:
normal text ~{* unnumbered asterisk footnote/endnote, insert multiple asterisks if required }~ continues
normal text ~{** another unnumbered asterisk footnote/endnote }~ continues
resulting output:
normal text [^*] continues
normal text [^**] continues
markup example:
normal text ~[* editors notes, numbered asterisk footnote/endnote series ]~ continues
normal text ~[+ editors notes, numbered plus symbol footnote/endnote series ]~ continues
resulting output:
normal text [^*3] continues
normal text [^+2] continues
Alternative endnote pair notation for footnotes/endnotes:
% note the endnote marker "~^"
normal text~^ continues
^~ endnote text following the paragraph in which the marker occurs
the standard and pair notation cannot be mixed in the same document
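The automatic numbering of inline notes can be sketched as follows. This is only an illustration in Python under stated assumptions, not how sisu itself is implemented: number_notes is a hypothetical name, numbering starts at 1, each note is assumed to sit on a single line, and only the ~{ ... }~ and ~{* ... }~ forms are handled.

```python
import re

def number_notes(text):
    """Replace ~{ note }~ with [^n] markers; return (text, notes)."""
    notes = []

    def repl(match):
        body = match.group(1).strip()
        stars = re.match(r"\*+", body)
        if stars:                               # unnumbered asterisk note
            mark = stars.group(0)
            notes.append((mark, body[len(mark):].strip()))
        else:                                   # next number in the numbered series
            mark = str(sum(1 for m, _ in notes if m.isdigit()) + 1)
            notes.append((mark, body))
        return f"[^{mark}]"

    return re.sub(r"~\{(.*?)\}~", repl, text), notes

out, notes = number_notes("normal text~{ a note }~ continues ~{* starred }~ and~{ second }~")
print(out)     # normal text[^1] continues [^*] and[^2]
print(notes)
```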
LINKS
NAKED URLS WITHIN TEXT, DEALING WITH URLS
Urls found within text are marked up automatically. A url within text is automatically
hyperlinked to itself and by default decorated with angled braces, unless it is
contained within a code block (in which case it is passed as normal text), or escaped
by a preceding underscore (in which case the decoration is omitted).
markup example:
normal text http://www.sisudoc.org/ continues
resulting output:
normal text <http://www.sisudoc.org/> continues
An escaped url without decoration
markup example:
normal text _http://www.sisudoc.org/ continues
deb _http://www.jus.uio.no/sisu/archive unstable main non-free
resulting output:
normal text http://www.sisudoc.org/ continues
deb http://www.jus.uio.no/sisu/archive unstable main non-free
where a code block is used there is neither decoration nor hyperlinking, code blocks are
discussed later in this document
resulting output:
deb http://www.jus.uio.no/sisu/archive unstable main non-free
deb-src http://www.jus.uio.no/sisu/archive unstable main non-free
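The decoration rule for naked urls (angle brackets by default, none when the url is escaped with a preceding underscore) can be sketched in Python as below. The name decorate_urls and the url pattern are assumptions made for illustration; the sketch does not know about code blocks or about linked text of the form { text }http://url, which sisu treats differently.

```python
import re

# optional underscore escape, then a url running up to whitespace or an angle bracket
URL = re.compile(r"(_?)(https?://[^\s<>]+)")

def decorate_urls(text):
    """Wrap naked urls in angle brackets; drop a leading underscore escape."""
    def repl(match):
        escaped, url = match.groups()
        return url if escaped else f"<{url}>"
    return URL.sub(repl, text)

print(decorate_urls("normal text http://www.sisudoc.org/ continues"))
# normal text <http://www.sisudoc.org/> continues
print(decorate_urls("deb _http://www.jus.uio.no/sisu/archive unstable main non-free"))
# deb http://www.jus.uio.no/sisu/archive unstable main non-free
```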
LINKING TEXT
To link text or an image to a url the markup is as follows
markup example:
about { SiSU }http://url.org markup
resulting output:
about SiSU <http://www.sisudoc.org/> markup
A shortcut notation is available so the url link may also be provided automatically as a
footnote
markup example:
about {~^ SiSU }http://url.org markup
resulting output:
about SiSU <http://www.sisudoc.org/> [^7] markup
Internal document links to a tagged location, including an ocn
markup example:
about { text links }#link_text
resulting output:
about ⌠text links⌡⌈link_text⌋
Shared document collection link
markup example:
about { SiSU book markup examples }:SiSU/examples.html
resulting output:
about ⌠ SiSU book markup examples⌡⌈:SiSU/examples.html⌋
LINKING IMAGES
markup example:
{ tux.png 64x80 }image
% various url linked images
{tux.png 64x80 "a better way" }http://www.sisudoc.org/
{GnuDebianLinuxRubyBetterWay.png 100x101 "Way Better - with Gnu/Linux, Debian and Ruby" }http://www.sisudoc.org/
{~^ ruby_logo.png "Ruby" }http://www.ruby-lang.org/en/
resulting output:
[ tux.png ]
tux.png 64x80 "Gnu/Linux - a better way" <http://www.sisudoc.org/>
GnuDebianLinuxRubyBetterWay.png 100x101 "Way Better - with Gnu/Linux, Debian and Ruby"
<http://www.sisudoc.org/>
ruby_logo.png 70x90 "Ruby" <http://www.ruby-lang.org/en/> [^8]
linked url footnote shortcut
{~^ [text to link] }http://url.org
% maps to: { [text to link] }http://url.org ~{ http://url.org }~
% which produces hyper-linked text within a document/paragraph, with an endnote providing the url for the text location used in the hyperlink
text marker *~name
note at a heading level the same is automatically achieved by providing names to headings
1, 2 and 3 i.e. 2~[name] and 3~[name] or in the case of auto-heading numbering, without
further intervention.
LINK SHORTCUT FOR MULTIPLE VERSIONS OF A SISU DOCUMENT IN THE SAME DIRECTORY TREE
markup example:
!_ /{"Viral Spiral"}/, David Bollier
{ "Viral Spiral", David Bollier [3sS]}viral_spiral.david_bollier.sst
Viral Spiral, David Bollier
"Viral Spiral", David Bollier <http://corundum/sisu_manual/en/manifest/viral_spiral.david_bollier.html>
document manifest <http://corundum/sisu_manual/en/manifest/viral_spiral.david_bollier.html>
⌠html, segmented text⌡「http://corundum/sisu_manual/en/html/viral_spiral.david_bollier.html」
⌠html, scroll, document in one⌡「http://corundum/sisu_manual/en/html/viral_spiral.david_bollier.html」
⌠epub⌡「http://corundum/sisu_manual/en/epub/viral_spiral.david_bollier.epub」
⌠pdf, landscape⌡「http://corundum/sisu_manual/en/pdf/viral_spiral.david_bollier.pdf」
⌠pdf, portrait⌡「http://corundum/sisu_manual/en/pdf/viral_spiral.david_bollier.pdf」
⌠odf: odt, open document text⌡「http://corundum/sisu_manual/en/odt/viral_spiral.david_bollier.odt」
⌠xhtml scroll⌡「http://corundum/sisu_manual/en/xhtml/viral_spiral.david_bollier.xhtml」
⌠xml, sax⌡「http://corundum/sisu_manual/en/xml/viral_spiral.david_bollier.xml」
⌠xml, dom⌡「http://corundum/sisu_manual/en/xml/viral_spiral.david_bollier.xml」
⌠concordance⌡「http://corundum/sisu_manual/en/html/viral_spiral.david_bollier.html」
⌠dcc, document content certificate (digests)⌡「http://corundum/sisu_manual/en/digest/viral_spiral.david_bollier.txt」
⌠markup source text⌡「http://corundum/sisu_manual/en/src/viral_spiral.david_bollier.sst」
⌠markup source (zipped) pod⌡「http://corundum/sisu_manual/en/pod/viral_spiral.david_bollier.sst.zip」
GROUPED TEXT / BLOCKED TEXT
There are two markup syntaxes for blocked text, using curly braces or using tics
BLOCKED TEXT CURLY BRACE SYNTAX
at the start of a line on its own use name of block type with an opening curly brace,
follow with the content of the block, and close with a closing curly brace and the name of
the block type, e.g.
code{
this is a code block
}code
poem{
this here is a poem
}poem
BLOCKED TEXT TIC SYNTAX
start a line with three backticks, a space, and the name of the block type,
follow with the content of the block, and close with three backticks on a line of their
own, e.g.
``` code
this is a code block
```
``` poem
this here is a poem
```
TABLES
Tables may be prepared in either of two forms
markup example:
table{ c3; 40; 30; 30;
This is a table
this would become column two of row one
column three of row one is here
And here begins another row
column two of row two
column three of row two, and so on
}table
resulting output:
This is a table             | this would become column two of row one | column three of row one is here
And here begins another row | column two of row two                   | column three of row two, and so on
a second form may be easier to work with in cases where there is not much information in
each column
markup example: [^9]
!_ Table 3.1: Contributors to Wikipedia, January 2001 - June 2005
{table~h 24; 12; 12; 12; 12; 12; 12;}
|Jan. 2001|Jan. 2002|Jan. 2003|Jan. 2004|July 2004|June 2006
Contributors* | 10| 472| 2,188| 9,653| 25,011| 48,721
Active contributors** | 9| 212| 846| 3,228| 8,442| 16,945
Very active contributors*** | 0| 31| 190| 692| 1,639| 3,016
No. of English language articles| 25| 16,000| 101,000| 190,000| 320,000| 630,000
No. of articles, all languages | 25| 19,000| 138,000| 490,000| 862,000|1,600,000
* Contributed at least ten times; ** at least 5 times in last month; *** more than 100 times in last month.
resulting output:
Table 3.1: Contributors to Wikipedia, January 2001 - June 2005
                                |Jan. 2001|Jan. 2002|Jan. 2003|Jan. 2004|July 2004|June 2006
Contributors*                   |       10|      472|    2,188|    9,653|   25,011|   48,721
Active contributors**           |        9|      212|      846|    3,228|    8,442|   16,945
Very active contributors***     |        0|       31|      190|      692|    1,639|    3,016
No. of English language articles|       25|   16,000|  101,000|  190,000|  320,000|  630,000
No. of articles, all languages  |       25|   19,000|  138,000|  490,000|  862,000|1,600,000
* Contributed at least ten times; ** at least 5 times in last month; *** more than 100
times in last month.
POEM
basic markup:
poem{
Your poem here
}poem
Each verse in a poem is given an object number.
markup example:
poem{
`Fury said to a
mouse, That he
met in the
house,
"Let us
both go to
law: I will
prosecute
YOU. --Come,
I'll take no
denial; We
must have a
trial: For
really this
morning I've
nothing
to do."
Said the
mouse to the
cur, "Such
a trial,
dear Sir,
With
no jury
or judge,
would be
wasting
our
breath."
"I'll be
judge, I'll
be jury,"
Said
cunning
old Fury:
"I'll
try the
whole
cause,
and
condemn
you
to
death."'
}poem
resulting output:
`Fury said to a
mouse, That he
met in the
house,
"Let us
both go to
law: I will
prosecute
YOU. --Come,
I'll take no
denial; We
must have a
trial: For
really this
morning I've
nothing
to do."
Said the
mouse to the
cur, "Such
a trial,
dear Sir,
With
no jury
or judge,
would be
wasting
our
breath."
"I'll be
judge, I'll
be jury,"
Said
cunning
old Fury:
"I'll
try the
whole
cause,
and
condemn
you
to
death."'
GROUP
basic markup:
group{
Your grouped text here
}group
A group is treated as an object and given a single object number.
markup example:
group{
`Fury said to a
mouse, That he
met in the
house,
"Let us
both go to
law: I will
prosecute
YOU. --Come,
I'll take no
denial; We
must have a
trial: For
really this
morning I've
nothing
to do."
Said the
mouse to the
cur, "Such
a trial,
dear Sir,
With
no jury
or judge,
would be
wasting
our
breath."
"I'll be
judge, I'll
be jury,"
Said
cunning
old Fury:
"I'll
try the
whole
cause,
and
condemn
you
to
death."'
}group
resulting output:
`Fury said to a
mouse, That he
met in the
house,
"Let us
both go to
law: I will
prosecute
YOU. --Come,
I'll take no
denial; We
must have a
trial: For
really this
morning I've
nothing
to do."
Said the
mouse to the
cur, "Such
a trial,
dear Sir,
With
no jury
or judge,
would be
wasting
our
breath."
"I'll be
judge, I'll
be jury,"
Said
cunning
old Fury:
"I'll
try the
whole
cause,
and
condemn
you
to
death."'
CODE
Code tags code{ ... }code (used as with other group tags described above) are used to
escape regular sisu markup, and have been used extensively within this document to provide
examples of SiSU markup. You cannot however use code tags to escape code tags. They are
however used in the same way as group or poem tags.
A code-block is treated as an object and given a single object number. [an option to
number each line of code may be considered at some later time]
use of code tags instead of poem compared, resulting output:
`Fury said to a
mouse, That he
met in the
house,
"Let us
both go to
law: I will
prosecute
YOU. --Come,
I'll take no
denial; We
must have a
trial: For
really this
morning I've
nothing
to do."
Said the
mouse to the
cur, "Such
a trial,
dear Sir,
With
no jury
or judge,
would be
wasting
our
breath."
"I'll be
judge, I'll
be jury,"
Said
cunning
old Fury:
"I'll
try the
whole
cause,
and
condemn
you
to
death."'
From SiSU 2.7.7 on you can number codeblocks by placing a hash after the opening code tag
code{# as demonstrated here:
1 | `Fury said to a
2 | mouse, That he
3 | met in the
4 | house,
5 | "Let us
6 | both go to
7 | law: I will
8 | prosecute
9 | YOU. --Come,
10 | I'll take no
11 | denial; We
12 | must have a
13 | trial: For
14 | really this
15 | morning I've
16 | nothing
17 | to do."
18 | Said the
19 | mouse to the
20 | cur, "Such
21 | a trial,
22 | dear Sir,
23 | With
24 | no jury
25 | or judge,
26 | would be
27 | wasting
28 | our
29 | breath."
30 | "I'll be
31 | judge, I'll
32 | be jury,"
33 | Said
34 | cunning
35 | old Fury:
36 | "I'll
37 | try the
38 | whole
39 | cause,
40 | and
41 | condemn
42 | you
43 | to
44 | death."'
ADDITIONAL BREAKS - LINEBREAKS WITHIN OBJECTS, COLUMN AND PAGE-BREAKS
LINE-BREAKS
To break a line within a "paragraph object", two backslashes \\ with a space before and a
space or newline after them may be used.
To break a line within a "paragraph object",
two backslashes \\ with a space before
and a space or newline after them \\
may be used.
The html break br enclosed in angle brackets (though undocumented) is available in
versions prior to 3.0.13 and 2.9.7 (it remains available for the time being, but is
deprecated).
To draw a dividing line dividing paragraphs, see the section on page breaks.
PAGE BREAKS
Page breaks are only relevant and honored in some output formats. A page break or a new
page may be inserted manually using the following markup on a line on its own:
page new =\\= breaks the page, starts a new page.
page break -\\- breaks a column, starts a new column, if using columns, else breaks the
page, starts a new page.
page break line across page -..- draws a dividing line, dividing paragraphs.
page break:
-\\-
page (break) new:
=\\=
page (break) line across page (dividing paragraphs):
-..-
BIBLIOGRAPHY / REFERENCES
There are three ways to prepare a bibliography using sisu (which are mutually exclusive):
(i) manually preparing and marking up as regular text in sisu a list of references, this
is treated as a regular document segment (and placed before endnotes if any); (ii)
preparing a bibliography, marking a heading level 1~!biblio (note the exclamation mark)
and preparing a bibliography using various metadata tags including for author: title:
year: a list of which is provided below, or; (iii) as an assistance in preparing a
bibliography, marking a heading level 1~!biblio and tagging citations within footnotes for
inclusion, identifying citations and having a parser attempt to extract them and build a
bibliography of the citations provided.
For the heading/section sequence: endnotes, bibliography then book index to occur, the
name biblio or bibliography must be given to the bibliography section, like so:
1~!biblio
A MARKUP TAGGED METADATA BIBLIOGRAPHY SECTION
Here instead of writing your full citations directly in footnotes, each time you have new
material to cite, you add it to your bibliography section (if it has not been added yet)
providing the information you need against an available list of tags (provided below).
The required tags are au: ti: and yr: [^10] a short quick example might be as follows:
1~!biblio
au: von Hippel, E.
ti: Perspective: User Toolkits for Innovation
lng: (language)
jo: Journal of Product Innovation Management
vo: 18
ed: (editor)
yr: 2001
note:
sn: Hippel, /{User Toolkits}/ (2001)
id: vHippel_2001
% form:
au: Benkler, Yochai
ti: The Wealth of Networks
st: How Social Production Transforms Markets and Freedom
lng: (language)
pb: Harvard University Press
edn: (edition)
yr: 2006
pl: U.S.
url: http://cyber.law.harvard.edu/wealth_of_networks/Main_Page
note:
sn: Benkler, /{Wealth of Networks}/ (2006)
id: Benkler2006
au: Quixote, Don; Panza, Sancho
ti: Taming Windmills, Keeping True
jo: Imaginary Journal
yr: 1605
url: https://en.wikipedia.org/wiki/Don_Quixote
note: made up to provide an example of author markup for an article with two authors
sn: Quixote & Panza, /{Taming Windmills}/ (1605)
id: quixote1605
Note that the section name !biblio (or !bibliography) is required for the bibliography to
be treated specially as such, and placed after the auto-generated endnote section.
Using this method, work goes into preparing the bibliography, the tags author or editor,
year and title are required and will be used to sort the bibliography that is placed under
the Bibliography section
The metadata tags may include shortname (sn:) and id, if provided, which are used for
substitution within text. Every time the given id is found within the text it will be
replaced by the given short title of the work (it is for this reason the short title has
sisu markup to italicize the title), it should work with any page numbers to be added, the
short title should be one that can easily be used to look up the full description in the
bibliography.
The following footnote~{ quixote1605, pp 1000 - 1001, also Benkler2006 p 1. }~
would be presented as:
Quixote and Panza, Taming Windmills (1605), pp 1000 - 1001 also, Benkler, Wealth of
Networks, (2006) p 1 or rather[^11]
au: author Surname, FirstNames (if multiple semi-colon separator)
(required unless editor to be used instead)
ti: title (required)
st: subtitle
jo: journal
vo: volume
ed: editor (required if author not provided)
tr: translator
src: source (generic field where others are not appropriate)
in: in (like src)
pl: place/location (state, country)
pb: publisher
edn: edition
yr: year (yyyy or yyyy-mm or yyyy-mm-dd) (required)
pg: pages
url: http://url
note: note
id: create_short_identifier e.g. authorSurnameYear
(used in substitutions: when found within text will be
replaced by the short name provided)
sn: short name e.g. Author, /{short title}/, Year
(used in substitutions: when an id is found within text
the short name will be used to replace it)
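The id to short name substitution described above can be pictured with the Python sketch below. It assumes the tagged entries have already been read into dictionaries keyed by tag (au, ti, id, sn, ...); the function name substitute_ids is hypothetical and the sketch does not reproduce sisu's actual processing.

```python
import re

def substitute_ids(text, entries):
    """Replace each bibliography id found in text with its short name."""
    short = {e["id"]: e["sn"] for e in entries if "id" in e and "sn" in e}
    if not short:
        return text
    # longest ids first, so that one id cannot shadow a longer one
    pattern = "|".join(re.escape(i) for i in sorted(short, key=len, reverse=True))
    return re.sub(rf"\b(?:{pattern})\b", lambda m: short[m.group(0)], text)

entries = [
    {"id": "quixote1605", "sn": "Quixote & Panza, /{Taming Windmills}/ (1605)"},
    {"id": "Benkler2006", "sn": "Benkler, /{Wealth of Networks}/ (2006)"},
]
print(substitute_ids("see quixote1605, pp 1000 - 1001, also Benkler2006 p 1.", entries))
```

The short names keep their sisu markup (here the italics), so the substituted text is still markup to be processed further.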
TAGGING CITATIONS FOR INCLUSION IN THE BIBLIOGRAPHY
Here whenever you make a citation that you wish to be included in the bibliography, you
tag the citation as such using special delimiters (which are subsequently removed from the
final text produced by sisu).
Here you would write something like the following, either in regular text or a footnote
See .: Quixote, Don; Panza, Sancho /{Taming Windmills, Keeping True}/ (1605) :.
SiSU will parse for a number of patterns within the delimiters to try to make out the
authors, title, date etc. and from that create a Bibliography. This is more limited than
the previously described method of preparing a tagged bibliography, and using an id within
text to identify the work, which also lends itself to greater consistency.
GLOSSARY
Using the section name 1~!glossary results in the Glossary being treated specially as
such, and placed after the auto-generated endnote section (before the bibliography/list of
references if there is one).
The Glossary is ordinary text marked up in a manner deemed suitable for that purpose.
e.g. with the term in bold, possibly with a hanging indent.
1~!glossary
_0_1 *{GPL}* An abbreviation that stands for "General Public License." ...
_0_1 [provide your list of terms and definitions]
In the given example the first line is not indented, subsequent lines are indented by one
level, and the term to be defined is in bold text.
BOOK INDEX
To make an index, append to the paragraph the book index terms that relate to it, using an
equal sign and curly braces.
Currently two levels are provided, a main term and if needed a sub-term. Sub-terms are
separated from the main term by a colon.
Paragraph containing main term and sub-term.
={Main term:sub-term}
The index syntax starts on a new line, but there should not be an empty line between
paragraph and index markup.
The structure of the resulting index would be:
Main term, 1
sub-term, 1
Several terms may relate to a paragraph, they are separated by a semicolon. If the term
refers to more than one paragraph, indicate the number of paragraphs.
Paragraph containing main term, second term and sub-term.
={first term; second term: sub-term}
The structure of the resulting index would be:
First term, 1,
Second term, 1,
sub-term, 1
If multiple sub-terms appear under one paragraph, they are separated under the main term
heading from each other by a pipe symbol.
Paragraph containing main term, second term and sub-term.
={Main term:
sub-term+2|second sub-term;
Another term
}
A paragraph that continues discussion of the first sub-term
The plus two in the example provided indicates that the first sub-term spans two additional
paragraphs. The logical structure of the resulting index would be:
Main term, 1,
sub-term, 1-3,
second sub-term, 1,
Another term, 1
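The index markup rules above (terms separated by semicolons, sub-terms introduced by a colon and separated by pipes, +N for additional paragraphs) can be sketched in Python as follows. The function name parse_index and the returned structure are assumptions chosen for illustration; the paragraph number is supplied by the caller, and terms that themselves contain +, :, | or ; are not handled.

```python
import re

def parse_index(markup, paragraph):
    """Parse ={...} book-index markup into {term: (range, {sub-term: range})}."""
    inner = re.fullmatch(r"=\{(.*)\}", markup.strip(), re.DOTALL).group(1)

    def split_span(term):
        # "sub-term+2" on paragraph 1 becomes ("sub-term", "1-3")
        name, _, extra = term.strip().partition("+")
        last = paragraph + int(extra) if extra else paragraph
        span = f"{paragraph}-{last}" if last > paragraph else str(paragraph)
        return name.strip(), span

    index = {}
    for entry in inner.split(";"):
        if not entry.strip():
            continue
        main, _, subs = entry.partition(":")
        name, span = split_span(main)
        index[name] = (span, dict(split_span(s) for s in subs.split("|") if s.strip()))
    return index

markup = "={Main term:\nsub-term+2|second sub-term;\nAnother term\n}"
print(parse_index(markup, 1))
```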
COMPOSITE DOCUMENTS MARKUP
It is possible to build a document by creating a master document that requires other
documents. The documents required may be complete documents that could be generated
independently, or they could be markup snippets, prepared so as to be easily available to
be placed within another text. If the calling document is a master document (built from
other documents), it should be named with the suffix .ssm. Within this document you would
provide information on the other documents that should be included within the text. These
may be other documents that would be processed in a regular way (.sst, regular markup
files), or markup bits prepared only for inclusion within a master document (.ssi,
insert/information files). A secondary file of the composite document is built prior to
processing with the same prefix and the suffix ._sst
basic markup for importing a document into a master document
<< filename1.sst
<< filename2.ssi
The form described above should be relied on. Within the Vim editor it results in the
text thus linked becoming hyperlinked to the document it is calling in, which is
convenient for editing.
SUBSTITUTIONS
markup example:
The current Debian is ${debian_stable} the next debian will be ${debian_testing}
Configure substitution in _sisu/sisu_document_make
@make:
:substitute: /${debian_stable}/,'*{Wheezy}*' /${debian_testing}/,'*{Jessie}*'
resulting output:
The current Debian is Wheezy the next debian will be Jessie
Configure substitution in _sisu/sisu_document_make
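How a :substitute: rule of the form shown above maps onto the text can be sketched in Python. This is an illustration only: apply_substitutions is a hypothetical name, and the sketch assumes the text between the slashes is matched literally, which may differ from how sisu itself interprets the pattern.

```python
import re

def apply_substitutions(text, rule):
    """Apply a :substitute: rule of the form /pattern/,'replacement' ... to text."""
    for pattern, replacement in re.findall(r"/(.+?)/,'(.*?)'", rule):
        text = text.replace(pattern, replacement)   # literal match, not a regex
    return text

rule = "/${debian_stable}/,'*{Wheezy}*' /${debian_testing}/,'*{Jessie}*'"
text = "The current Debian is ${debian_stable} the next debian will be ${debian_testing}"
print(apply_substitutions(text, rule))
# The current Debian is *{Wheezy}* the next debian will be *{Jessie}*
```

The result is still sisu markup: the *{ }* emphasis is rendered when the document is processed.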
SISU FILETYPES
SiSU has plaintext and binary filetypes, and can process either type of document.
.SST .SSM .SSI MARKED UP PLAIN TEXT
SiSU documents are prepared as plain-text (utf-8) files with SiSU markup. They may make
reference to and contain images (for example), which are stored in the directory
beneath them, _sisu/image. SiSU plaintext markup files are of three types that
may be distinguished by the file extension used: regular text, .sst; master
documents, .ssm, composite documents that incorporate other text, which can be any
regular text or text insert; and inserts, .ssi, the contents of which are like regular
text except that they are not processed on their own.
SiSU processing can be done directly against a sisu document, which may be located
locally or on a remote server for which a url is provided.
SiSU source markup can be shared with the command:
sisu -s [filename]
SISU TEXT - REGULAR FILES (.SST)
The most common form of document in SiSU, see the section on SiSU markup.
SISU MASTER FILES (.SSM)
Composite documents which incorporate other SiSU documents which may be either regular
SiSU text .sst which may be generated independently, or inserts prepared solely for the
purpose of being incorporated into one or more master documents.
The mechanism by which master files incorporate other documents is described as one of
the headings under SiSU markup in the SiSU manual.
Note: Master documents may be prepared in a similar way to regular documents, and
processing will occur normally if a .sst file is renamed .ssm without requiring any other
documents; the .ssm marker flags that the document may contain other documents.
Note: a secondary file of the composite document is built prior to processing with the
same prefix and the suffix ._sst [^12]
SISU INSERT FILES (.SSI)
Inserts are documents prepared solely for the purpose of being incorporated into one or
more master documents. They resemble regular SiSU text files (.sst). Since sisu -5.5.0
(6.1.0) .ssi files can, like .ssm files, include other .sst or .ssm files. .ssi files cannot
be called by the sisu processor directly and can only be incorporated in other documents.
Making a file a .ssi file is a quick and convenient way of breaking up a document that is
to be included in a master document, and of flagging that the file is not intended to be
processed on its own.
SISUPOD, ZIPPED BINARY CONTAINER (SISUPOD.ZIP, .SSP)
A sisupod is a zipped SiSU text file or set of SiSU text files and any associated images
that they contain (this will be extended to include sound and multimedia-files).
SiSU plaintext files rely on a recognised directory structure to find contents such as
images associated with documents; for example, all images for all documents
contained in a directory are located in the sub-directory _sisu/image. Without the
ability to create a sisupod it can be inconvenient to manually identify all other
files associated with a document. A sisupod automatically bundles all associated
files with the document that is turned into a pod.
The structure of the sisupod is such that it may for example contain a single
document and its associated images; a master document and its associated documents
and anything else; or the zipped contents of a whole directory of prepared SiSU
documents.
The command to create a sisupod is:
sisu -S [filename]
Alternatively, make a pod of the contents of a whole directory:
sisu -S
SiSU processing can be done directly against a sisupod, which may be located
locally or on a remote server for which a url is provided.
<http://www.sisudoc.org/sisu/sisu_commands>
<http://www.sisudoc.org/sisu/sisu_manual>
CONFIGURATION
CONFIGURATION FILES
CONFIG.YML
SiSU configuration parameters are adjusted in the configuration file, which can be used to
override the defaults set. This includes such things as which directory interim processing
should be done in and where the generated output should be placed.
The SiSU configuration file is a yaml file, which means indentation is significant.
SiSU resource configuration is determined by looking at the following files if they exist:
./_sisu/v7/sisurc.yml
./_sisu/sisurc.yml
~/.sisu/v7/sisurc.yml
~/.sisu/sisurc.yml
/etc/sisu/v7/sisurc.yml
/etc/sisu/sisurc.yml
The search is in the order listed, and the first one found is used.
In the absence of instructions in any of these it falls back to the internal program
defaults.
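The lookup described above can be sketched as follows. This is a minimal illustration of "first file found wins", not SiSU's own (Ruby) implementation; the function name find_sisurc and the injectable exists predicate are assumptions made for the example.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional

# Candidate locations, in the search order listed in the manual.
SISURC_CANDIDATES = [
    "./_sisu/v7/sisurc.yml",
    "./_sisu/sisurc.yml",
    "~/.sisu/v7/sisurc.yml",
    "~/.sisu/sisurc.yml",
    "/etc/sisu/v7/sisurc.yml",
    "/etc/sisu/sisurc.yml",
]


def find_sisurc(
    candidates: Iterable[str] = SISURC_CANDIDATES,
    exists: Optional[Callable[[Path], bool]] = None,
) -> Optional[Path]:
    """Return the first candidate that exists, or None.

    None stands for "no configuration file found", in which case the
    program would fall back to its internal defaults.
    The `exists` predicate is a hypothetical hook so the search order can
    be exercised without touching the real filesystem.
    """
    if exists is None:
        exists = lambda path: path.is_file()
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if exists(path):
            return path
    return None
```

Calling find_sisurc() with no arguments checks the six locations above against the filesystem; a file in ./_sisu/ therefore shadows one in ~/.sisu/ or /etc/sisu/.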
Configuration determines the output and processing directories and the database access
details.
If SiSU is installed, a sample sisurc.yml may be found in /etc/sisu/sisurc.yml
SISU_DOCUMENT_MAKE
Most sisu document headers relate to metadata; the exception is the @make: header, which
provides processing related information. The default contents of the @make: header may be
set by placing them in a file sisu_document_make.
The search order is as for resource configuration:
./_sisu/v7/sisu_document_make
./_sisu/sisu_document_make
~/.sisu/v7/sisu_document_make
~/.sisu/sisu_document_make
/etc/sisu/v7/sisu_document_make
/etc/sisu/sisu_document_make
A sample sisu_document_make can be found in the _sisu/ directory along with the
provided sisu markup samples.
CSS - CASCADING STYLE SHEETS (FOR HTML, XHTML AND XML)
CSS files to modify the appearance of SiSU html, XHTML or XML may be placed in the
configuration directory (./_sisu/css, ~/.sisu/css or /etc/sisu/css), and these will be
copied to the output directories with the command sisu -CC.
The basic CSS file for html output is html.css; placing a file of that name in directory
_sisu/css or equivalent will result in the default file of that name being overwritten.
HTML: html.css
XML DOM: dom.css
XML SAX: sax.css
XHTML: xhtml.css
The default homepage may use homepage.css or html.css
Under consideration is to permit the placement of a CSS file with a different name in
the _sisu/css directory or equivalent.[^13]
ORGANISING CONTENT - DIRECTORY STRUCTURE AND MAPPING
SiSU v3 has new options for the source directory tree, and for output directory
structures, of which there are 3 alternatives.
DOCUMENT SOURCE DIRECTORY
The document source directory is the directory in which sisu processing commands are
given. It contains the sisu source files (.sst .ssm .ssi), or (for sisu v3) may contain
subdirectories with language codes which contain the sisu source files, so all English
files would go in subdirectory en/, French in fr/, Spanish in es/ and so on. ISO 639-1
codes are used (as varied by po4a). A list of available languages (and possible
sub-directory names) can be obtained with the command "sisu --help lang". The list of
languages is limited to languages supported by XeTeX polyglossia.
GENERAL DIRECTORIES
./subject_name/
% files stored at this level e.g. sisu_manual.sst or
% for sisu v3 may be under language sub-directories
% e.g.
./subject_name/en
./subject_name/fr
./subject_name/es
./subject_name/_sisu
./subject_name/_sisu/css
./subject_name/_sisu/image
DOCUMENT OUTPUT DIRECTORY STRUCTURES
OUTPUT DIRECTORY ROOT
The output directory root can be set in the sisurc.yml file. Under the root,
subdirectories are made for each directory in which a document set resides. If you have a
directory named poems or conventions, that directory will be created under the output
directory root and the output for all documents contained in the directory of a particular
name will be generated to subdirectories beneath that directory (poems or conventions). A
document will be placed in a subdirectory of the same name as the document with the
filetype identifier stripped (.sst .ssm).
The last part of a directory path, representing the sub-directory in which a document set
resides, is the directory name that will be used for the output directory. This has
implications for the organisation of document collections as it could make sense to place
documents of a particular subject, or type within a directory identifying them. This
grouping as suggested could be by subject (sales_law, english_literature); or just as
conveniently by some other classification (X University). The mapping means it is also
possible to place in the same output directory documents that are for organisational
purposes kept separately, for example documents on a given subject of two different
institutions may be kept in two different directories of the same name, under a directory
named after each institution, and these would be output to the same output directory.
Skins could be associated with each institution on a directory basis and resulting
documents will take on the appropriate different appearance.
ALTERNATIVE OUTPUT STRUCTURES
There are 3 possible output structures, described as being by language, by filetype or
by filename; the selection is made in sisurc.yml
#% output_dir_structure_by: language; filetype; or filename
output_dir_structure_by: language #(language & filetype, preferred?)
#output_dir_structure_by: filetype
#output_dir_structure_by: filename #(default, closest to original v1 & v2)
BY LANGUAGE
The by language directory structure separates output files by language code (all files of
a given language), and within the language directory by filetype.
Its selection is configured in sisurc.yml
output_dir_structure_by: language
|-- en
|-- epub
|-- hashes
|-- html
| |-- viral_spiral.david_bollier
| |-- manifest
| |-- qrcode
| |-- odt
| |-- sitemaps
| |-- txt
| |-- xhtml
| `-- xml
|-- po4a
| `-- live-manual
| |-- po
| |-- fr
| `-- pot
`-- _sisu
|-- css
|-- image
|-- image_sys -> ../../_sisu/image_sys
`-- xml
|-- rnc
|-- rng
`-- xsd
#by: language subject_dir/en/manifest/filename.html
BY FILETYPE
The by filetype directory structure separates output files by filetype, all html files in
one directory, pdfs in another and so on. Filenames are given a language extension.
Its selection is configured in sisurc.yml
output_dir_structure_by: filetype
|-- epub
|-- hashes
|-- html
|-- viral_spiral.david_bollier
|-- manifest
|-- qrcode
|-- odt
|-- po4a
|-- live-manual
| |-- po
| |-- fr
| `-- pot
|-- _sisu
| |-- css
| |-- image
| |-- image_sys -> ../../_sisu/image_sys
| `-- xml
| |-- rnc
| |-- rng
| `-- xsd
|-- sitemaps
|-- txt
|-- xhtml
`-- xml
#by: filetype subject_dir/html/filename/manifest.en.html
BY FILENAME
The by filename directory structure places most output of a particular file (the
different filetypes) in a common directory.
Its selection is configured in sisurc.yml
output_dir_structure_by: filename
|-- epub
|-- po4a
|-- live-manual
| |-- po
| |-- fr
| `-- pot
|-- _sisu
| |-- css
| |-- image
| |-- image_sys -> ../../_sisu/image_sys
| `-- xml
| |-- rnc
| |-- rng
| `-- xsd
|-- sitemaps
|-- src
|-- pod
`-- viral_spiral.david_bollier
#by: filename subject_dir/filename/manifest.en.html
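The three "#by:" comment lines in the preceding sections each give the location of a document's manifest under one structure. As a rough sketch, the mapping they describe can be written as a single function. The function name manifest_path and its parameters are hypothetical; the path patterns are copied from those comment lines and nothing beyond them is assumed about how SiSU lays out other files.

```python
def manifest_path(structure: str, subject_dir: str, filename: str, lang: str = "en") -> str:
    """Return the manifest location for one document under a given
    output_dir_structure_by setting (language, filetype or filename).

    Patterns follow the '#by:' comment lines of the manual:
      language: subject_dir/en/manifest/filename.html
      filetype: subject_dir/html/filename/manifest.en.html
      filename: subject_dir/filename/manifest.en.html
    """
    if structure == "language":
        return f"{subject_dir}/{lang}/manifest/{filename}.html"
    if structure == "filetype":
        return f"{subject_dir}/html/{filename}/manifest.{lang}.html"
    if structure == "filename":
        return f"{subject_dir}/{filename}/manifest.{lang}.html"
    raise ValueError(f"unknown output_dir_structure_by value: {structure!r}")
```

For example, manifest_path("filename", "manual", "sisu_manual") gives manual/sisu_manual/manifest.en.html, while the "language" setting moves the language code from the file name into the directory path.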
REMOTE DIRECTORIES
./subject_name/
% containing sub_directories named after the generated files from which they are made
./subject_name/src
% contains shared source files text and binary e.g. sisu_manual.sst and sisu_manual.sst.zip
./subject_name/_sisu
% configuration file e.g. sisurc.yml
./subject_name/_sisu/skin
% skins in various skin directories doc, dir, site, yml
./subject_name/_sisu/css
./subject_name/_sisu/image
% images for documents contained in this directory
./subject_name/_sisu/mm
SISUPOD
./sisupod/
% files stored at this level e.g. sisu_manual.sst
./sisupod/_sisu
% configuration file e.g. sisurc.yml
./sisupod/_sisu/skin
% skins in various skin directories doc, dir, site, yml
./sisupod/_sisu/css
./sisupod/_sisu/image
% images for documents contained in this directory
./sisupod/_sisu/mm
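To make the sisupod layout above concrete, the following sketch builds a zip archive in memory with documents at the top level of sisupod/ and images under sisupod/_sisu/image/. It is an illustration only and not what "sisu -S" does internally; the function name build_pod and the file name logo.png are made up for the example, and a real sisupod may contain further entries (configuration, css, mm).

```python
import io
import zipfile
from typing import Dict


def build_pod(documents: Dict[str, str], images: Dict[str, bytes]) -> bytes:
    """Bundle source documents and their images into a zip archive.

    documents maps a source file name (e.g. 'sisu_manual.sst') to its text;
    images maps an image file name to its raw bytes.
    The directory layout mirrors the SISUPOD listing in the manual.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as pod:
        for name, text in documents.items():
            # source files are stored at the top level of the pod
            pod.writestr(f"sisupod/{name}", text)
        for name, data in images.items():
            # images for the bundled documents go in _sisu/image
            pod.writestr(f"sisupod/_sisu/image/{name}", data)
    return buffer.getvalue()
```

The returned bytes can be written to a .zip file or inspected with zipfile.ZipFile(io.BytesIO(data)).namelist().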
HOMEPAGES
SiSU is about the ability to auto-generate documents. Home pages are regarded as custom
built items, and are not created by SiSU. More accurately, SiSU has a default home page,
which will not be appropriate for use with other sites, and the means to provide your own
home page instead in one of two ways as part of a site's configuration, these being:
1. through placing your home page and other custom built documents in the subdirectory
_sisu/home/ (this probably being the easier and more convenient option)
2. through providing what you want as the home page in a skin.
Document sets are contained in directories, usually organised by site or subject. Each
directory can/should have its own homepage. See the section on directory structure and
organisation of content.
HOME PAGE AND OTHER CUSTOM BUILT PAGES IN A SUB-DIRECTORY
Custom built pages, including the home page index.html, may be placed within the
configuration directory _sisu/home/ in any of the locations that are searched for the
configuration directory, namely ./_sisu ; ~/.sisu ; /etc/sisu. From there they are copied
to the root of the output directory with the command:
sisu -CC
MARKUP AND OUTPUT EXAMPLES
MARKUP EXAMPLES
Current markup examples and document output samples are provided off <http://sisudoc.org>
or <http://www.jus.uio.no/sisu> and in the sisu-markup-samples package available off
<http://git.sisudoc.org>
For some documents hardly any markup is required at all, other than a header and an
indication of the levels to be taken into account by the program in generating its
output.
SISU MARKUP SAMPLES
A few additional sample books prepared as sisu markup samples, with output formats to be
generated using SiSU, are contained in a separate package, sisu-markup-samples.
sisu-markup-samples contains books (prepared using sisu markup) that were released by
their authors under various licenses, mostly different Creative Commons licences that do
not permit inclusion in the Debian Project as they have requirements that do not meet the
Debian Free Software Guidelines for various reasons, most commonly that they require that
the original substantive text remain unchanged, and sometimes that the works be used only
non-commercially.
Accelerando, Charles Stross (2005) accelerando.charles_stross.sst
Alice's Adventures in Wonderland, Lewis Carroll (1865)
alices_adventures_in_wonderland.lewis_carroll.sst
CONTENT, Cory Doctorow (2008) content.cory_doctorow.sst
Democratizing Innovation, Eric von Hippel (2005)
democratizing_innovation.eric_von_hippel.sst
Down and Out in the Magic Kingdom, Cory Doctorow (2003)
down_and_out_in_the_magic_kingdom.cory_doctorow.sst
For the Win, Cory Doctorow (2010) for_the_win.cory_doctorow.sst
Free as in Freedom - Richard Stallman's Crusade for Free Software, Sam Williams (2002)
free_as_in_freedom.richard_stallman_crusade_for_free_software.sam_williams.sst
Free as in Freedom 2.0 - Richard Stallman and the Free Software Revolution, Sam Williams
(2002), Richard M. Stallman (2010)
free_as_in_freedom_2.richard_stallman_and_the_free_software_revolution.sam_williams.richard_stallman.sst
Free Culture - How Big Media Uses Technology and the Law to Lock Down Culture and Control
Creativity, Lawrence Lessig (2004) free_culture.lawrence_lessig.sst
Free For All - How Linux and the Free Software Movement Undercut the High Tech Titans,
Peter Wayner (2002) free_for_all.peter_wayner.sst
GNU GENERAL PUBLIC LICENSE v2, Free Software Foundation (1991) gpl2.fsf.sst
GNU GENERAL PUBLIC LICENSE v3, Free Software Foundation (2007) gpl3.fsf.sst
Gulliver's Travels, Jonathan Swift (1726 / 1735) gullivers_travels.jonathan_swift.sst
Little Brother, Cory Doctorow (2008) little_brother.cory_doctorow.sst
The Cathedral and the Bazaar, Eric Raymond (2000)
the_cathedral_and_the_bazaar.eric_s_raymond.sst
The Public Domain - Enclosing the Commons of the Mind, James Boyle (2008)
the_public_domain.james_boyle.sst
The Wealth of Networks - How Social Production Transforms Markets and Freedom, Yochai
Benkler (2006) the_wealth_of_networks.yochai_benkler.sst
Through the Looking Glass, Lewis Carroll (1871)
through_the_looking_glass.lewis_carroll.sst
Two Bits - The Cultural Significance of Free Software, Christopher Kelty (2008)
two_bits.christopher_kelty.sst
UN Contracts for International Sale of Goods, UN (1980)
un_contracts_international_sale_of_goods_convention_1980.sst
Viral Spiral, David Bollier (2008) viral_spiral.david_bollier.sst
SISU SEARCH - INTRODUCTION
Because the document structure of sites created is clearly defined, and the text object
citation system is available hypothetically at least, for all forms of output, it is
possible to search the sql database, and either read results from that database, or map
the results to the html or other output, which has richer text markup.
SiSU can populate a relational sql type database with documents at an object level,
including object numbers that are shared across different output types, making a document
corpus searchable with that degree of granularity. Basically, your match criteria are met
by these documents and at these locations within each document, which can be viewed within
the database directly or in various output formats.
SiSU can populate an sql database (sqlite3 or postgresql) with documents made up of their
objects. It also can generate a cgi search form that can be used to query the database.
In order to use the built in search functionality you would take the following steps.
* use sisu to populate an sql database with sisu markup content
* sqlite3 should work out of the box
* postgresql may require some initial database configuration
* provide a way to query the database, which sisu can assist with by
* generating a sample ruby cgi search form, required (sisu configuration
recommended)
* adding a query field for this search form to be added to all html files
(sisu configuration required)
SQL
POPULATE THE DATABASE
To populate the sql database, run sisu against a sisu markup file with one of the
following sets of flags:
sisu --sqlite filename.sst
creates an sqlite3 database containing searchable content of just the sisu markup
document selected
sisu --sqlite --update filename.sst
creates an sqlite3 database containing searchable content of marked up document(s)
selected by the user from a common directory
sisu --pg --update filename.sst
fills a postgresql database with searchable content of marked up document(s) selected by
the user from a common directory
For postgresql, the first time the command is run in a given directory the user will be
prompted to create the requisite database. At the time of writing the prompt sisu provides
is as follows:
no connection with pg database established, you may need to run:
createdb "SiSU.7a.current"
after that don't forget to run:
sisu --pg --createall
before attempting to populate the database
The named database that sisu expects to find must exist and if necessary be created using
postgresql tools. If the database exists but the database tables do not, sisu will attempt
to create the tables it needs, the equivalent of the requested sisu --pg --createall
command.
Once this is done, the sql database is populated and ready to be queried.
SQL TYPE DATABASES
SiSU feeds sisu markup documents into sql type databases, PostgreSQL [^14] and/or SQLite
[^15], together with information related to document structure.
This is one of the more interesting output forms, as all the structural data of the
documents are retained (though can be ignored by the user of the database should they so
choose). All site texts/documents are (currently) streamed to four tables:
* one containing semantic (and other) headers, including title, author,
subject (the Dublin Core...);
* another the substantive texts by individual "paragraph" (or object) - along
with structural information, each paragraph being identifiable by its
paragraph number (if it has one which almost all of them do), and the
substantive text of each paragraph quite naturally being searchable (both in
formatted and clean text versions for searching); and
* a third containing endnotes cross-referenced back to the paragraph from
which they are referenced (both in formatted and clean text versions for
searching).
* a fourth table with a one to one relation with the headers table contains
full text versions of output, eg. pdf, html, xml, and ascii.
There is of course the possibility to add further structures.
At this level SiSU loads a relational database with documents chunked into objects, their
smallest logical structurally constituent parts, as text objects, with their object
citation number and all other structural information needed to construct the document.
Text is stored (at this text object level) with and without elementary markup tagging, the
stripped version being so as to facilitate ease of searching.
Being able to search a relational database at an object level with the SiSU citation
system is an effective way of locating content generated by SiSU. As individual text
objects of a document are stored (and indexed) together with object numbers, and all
versions of the document have the same numbering, complex searches can be tailored to
return just the locations of the search results relevant for all available output
formats, with live links to the precise locations in the database or in html/xml
documents; or, the structural information provided makes it possible to search the full
contents of the database and have headings in which search content appears, or to search
only headings etc. (as the Dublin Core is incorporated it is easy to make use of that as
well).
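The idea of object level search can be illustrated with a small in-memory SQLite database. The sketch below is not SiSU's actual schema: the table names (metadata, objects), the column names, and the sample document are all invented for illustration. It only shows how storing each text object with its object citation number, in marked-up and clean versions, lets a search return document titles together with the numbers of the matching objects.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with a hypothetical two-table layout."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE metadata (doc_id INTEGER PRIMARY KEY, title TEXT, author TEXT);
        CREATE TABLE objects (
            doc_id INTEGER REFERENCES metadata(doc_id),
            ocn    INTEGER,  -- object citation number
            body   TEXT,     -- text object with elementary markup
            clean  TEXT      -- stripped version, used for searching
        );
        """
    )
    db.execute("INSERT INTO metadata VALUES (1, 'Example Document', 'A. N. Author')")
    db.executemany(
        "INSERT INTO objects VALUES (?, ?, ?, ?)",
        [
            (1, 1, "*{Introduction}*", "Introduction"),
            (1, 2, "Culture is /{shared}/ by many.", "Culture is shared by many."),
            (1, 3, "Law shapes culture.", "Law shapes culture."),
        ],
    )
    return db


def search(db: sqlite3.Connection, term: str):
    """Return (title, object citation number) pairs for objects matching term."""
    rows = db.execute(
        "SELECT m.title, o.ocn FROM objects o JOIN metadata m USING (doc_id) "
        "WHERE o.clean LIKE ? ORDER BY m.title, o.ocn",
        (f"%{term}%",),
    )
    return rows.fetchall()
```

Here search(build_demo_db(), "culture") returns [('Example Document', 2), ('Example Document', 3)]. Because the object numbers are shared across outputs, the same pair could be used to locate the passage in html, pdf or any other generated format.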
POSTGRESQL
NAME
SiSU - Structured information, Serialized Units - a document publishing system, postgresql
dependency package
DESCRIPTION
Information related to using postgresql with sisu (and related to the sisu_postgresql
dependency package, which is a dummy package to install dependencies needed for SiSU to
populate a postgresql database, this being part of SiSU - man sisu).
SYNOPSIS
sisu -D [instruction] [filename/wildcard if required]
sisu -D --pg --[instruction] [filename/wildcard if required]
COMMANDS
Mappings to two databases are provided by default, postgresql and sqlite. The same
commands are used within sisu to construct and populate databases; however, -d (lowercase)
denotes sqlite and -D (uppercase) denotes postgresql. Alternatively --sqlite or --pgsql
may be used.
-D or --pgsql may be used interchangeably.
CREATE AND DESTROY DATABASE
--pgsql --createall
initial step, creates required relations (tables, indexes) in existing (postgresql)
database (a database should be created manually and given the same name as working
directory, as requested) (rb.dbi)
sisu -D --createdb
creates database where no database existed before
sisu -D --create
creates database tables where no database tables existed before
sisu -D --dropall
destroys database (including all its content)! kills data and drops tables, indexes
and database associated with a given directory (and directories of the same name).
sisu -D --recreate
destroys existing database and builds a new empty database structure
IMPORT AND REMOVE DOCUMENTS
sisu -D --import -v [filename/wildcard]
populates database with the contents of the file. Imports document(s) specified to
a postgresql database (at an object level).
sisu -D --update -v [filename/wildcard]
updates file contents in database
sisu -D --remove -v [filename/wildcard]
removes specified document from postgresql database.
SQLITE
NAME
SiSU - Structured information, Serialized Units - a document publishing system.
DESCRIPTION
Information related to using sqlite with sisu (and related to the sisu_sqlite dependency
package, which is a dummy package to install dependencies needed for SiSU to populate an
sqlite database, this being part of SiSU - man sisu).
SYNOPSIS
sisu -d [instruction] [filename/wildcard if required]
sisu -d --(sqlite|pg) --[instruction] [filename/wildcard if required]
COMMANDS
Mappings to two databases are provided by default, postgresql and sqlite. The same
commands are used within sisu to construct and populate databases; however, -d (lowercase)
denotes sqlite and -D (uppercase) denotes postgresql. Alternatively --sqlite or --pgsql
may be used.
-d or --sqlite may be used interchangeably.
CREATE AND DESTROY DATABASE
--sqlite --createall
initial step, creates required relations (tables, indexes) in existing (sqlite)
database (a database should be created manually and given the same name as working
directory, as requested) (rb.dbi)
sisu -d --createdb
creates database where no database existed before
sisu -d --create
creates database tables where no database tables existed before
sisu -d --dropall
destroys database (including all its content)! kills data and drops tables, indexes
and database associated with a given directory (and directories of the same name).
sisu -d --recreate
destroys existing database and builds a new empty database structure
IMPORT AND REMOVE DOCUMENTS
sisu -d --import -v [filename/wildcard]
populates database with the contents of the file. Imports document(s) specified to
an sqlite database (at an object level).
sisu -d --update -v [filename/wildcard]
updates file contents in database
sisu -d --remove -v [filename/wildcard]
removes specified document from sqlite database.
CGI SEARCH FORM
For the search form, which is a single search page:
* configure the search form
* generate the sample search form with the sisu command, (this will be based on the
configuration settings and existing found sisu databases)
For postgresql web content you may need to edit the search cgi script. Two things to look
out for are that the user is set correctly (you may want www-data rather than your
username), and that any different databases that you wish to be able to query are listed.
@user='www-data'
* check the search form, copy it to the appropriate cgi directory and set the correct
permissions
For a search form to appear on each html page, you need to:
* rely on the above mentioned configuration of the search form
* configure the html search form to be on
* run the html command
SETUP SEARCH FORM
You will need a web server (httpd) with cgi enabled, and a postgresql installation in
which you are able to create databases.
Setup postgresql, make sure you are able to create and write to the database, e.g.:
sudo su postgres
createuser -d -a ralph
You then need to create the database that sisu will use, for example for the sisu manual
in the directory manual/en (when you try to populate a database that does not exist, sisu
prompts you to create it):
createdb SiSU.7a.manual
SiSU is then able to create the required tables that allow you to populate the database
with documents in the directory for which it has been created:
sisu --pg --createall -v
You can then start to populate the database, in this example with a single document:
sisu --pg --update -v en/sisu_manual.ssm
To create a sample search form, from within the same directory run:
sisu --sample-search-form --db-pg
and copy the resulting cgi form to your cgi-bin directory.
A sample setup for nginx is provided that assumes data will be stored under /srv/www and
cgi scripts under /srv/cgi.
SEARCH - DATABASE FRONTEND SAMPLE, UTILISING DATABASE AND SISU FEATURES,
INCLUDING OBJECT CITATION NUMBERING (BACKEND CURRENTLY POSTGRESQL)
Sample search frontend <http://search.sisudoc.org> [^16] A small database and sample
query front-end (search form) that makes use of the citation system, object citation
numbering, to demonstrate functionality.[^17]
SiSU can provide information on which documents are matched and at what locations within
each document the matches are found. These results are relevant across all outputs using
object citation numbering, which includes html, XML, EPUB, LaTeX, PDF and indeed the SQL
database. You can then refer to one of the other outputs or in the SQL database expand the
text within the matched objects (paragraphs) in the documents matched.
Note you may set results either for documents matched and object number locations within
each matched document meeting the search criteria; or display the names of the documents
matched along with the objects (paragraphs) that meet the search criteria.[^18]
sisu -F --webserv-webrick
builds a cgi web search frontend for the database created
The following is feedback on the setup on a machine provided by the help command:
sisu --help sql
Postgresql
user: ralph
current db set: SiSU_sisu
port: 5432
dbi connect: DBI:Pg:database=SiSU_sisu;port=5432
sqlite
current db set: /home/ralph/sisu_www/sisu/sisu_sqlite.db
dbi connect DBI:SQLite:/home/ralph/sisu_www/sisu/sisu_sqlite.db
Note on databases built
By default, [unless otherwise specified] databases are built on a directory basis,
from collections of documents within that directory. The name of the directory you
choose to work from is used as the database name, i.e. if you are working in a
directory called /home/ralph/ebook the database SiSU_ebook is used. [otherwise a
manual mapping for the collection is necessary]
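The directory-to-database naming convention in this note can be sketched as below. The function name database_name is hypothetical, and the prefix parameter is an assumption added because other parts of this manual show database names of a different form (for example SiSU.7a.manual); the default prefix follows the SiSU_ebook example in the note.

```python
import os


def database_name(working_dir: str, prefix: str = "SiSU_") -> str:
    """Derive a database name from the last component of the working directory."""
    # normpath removes any trailing slash so basename returns the directory name
    directory = os.path.basename(os.path.normpath(working_dir))
    return prefix + directory
```

With the default prefix, database_name("/home/ralph/ebook") gives "SiSU_ebook". Two directories with the same final name map to the same database, which matches the note that a database is associated with a given directory and directories of the same name.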
SEARCH FORM
sisu -F
generates a sample search form, which must be copied to the web-server cgi
directory
sisu -F --webserv-webrick
generates a sample search form for use with the webrick server, which must be
copied to the web-server cgi directory
sisu -W
starts the webrick server which should be available wherever sisu is properly
installed
The generated search form must be copied manually to the webserver directory as
instructed
SISU_WEBRICK
NAME
SiSU - Structured information, Serialized Units - a document publishing system
SYNOPSIS
sisu_webrick [port]
or
sisu -W [port]
DESCRIPTION
sisu_webrick is part of SiSU (man sisu). sisu_webrick starts the Ruby Webrick web-server,
serving the directories to which SiSU output is written, and providing a list of these
directories (assuming SiSU is in use and they exist).
The default port for sisu_webrick is set to 8081; this may be modified in the yaml file
~/.sisu/sisurc.yml, a sample of which is provided as /etc/sisu/sisurc.yml (or in the
equivalent directory on your system).
SUMMARY OF MAN PAGE
sisu_webrick may be started on its own with the command: sisu_webrick [port] or using
the sisu command with the -W flag: sisu -W [port]
Where no port is given and settings are unchanged the default port is 8081.
DOCUMENT PROCESSING COMMAND FLAGS
sisu -W [port] starts the Ruby Webrick web-server, serving SiSU output directories, on the
port provided, or if no port is provided and the defaults have not been changed in
~/.sisu/sisurc.yml then on port 8081.
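The precedence described here (port given on the command line, else a port set in the configuration file, else 8081) can be written out as a short sketch. The function and parameter names are hypothetical, and how the port is actually stored in sisurc.yml is not stated in this section, so the configured value is simply passed in as a number.

```python
from typing import Optional

DEFAULT_PORT = 8081  # default stated in the manual


def resolve_port(cli_port: Optional[int] = None, configured_port: Optional[int] = None) -> int:
    """Pick the port to serve on: command line first, then configuration, then default."""
    if cli_port is not None:
        return cli_port
    if configured_port is not None:
        return configured_port
    return DEFAULT_PORT
```

So resolve_port() yields 8081, while resolve_port(9000, 8111) yields 9000 because a port given on the command line takes precedence.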
SUMMARY OF FEATURES
* sparse/minimal markup (clean utf-8 source texts). Documents are prepared in a single
UTF-8 file using a minimalistic mnemonic syntax. Typical literature, documents like "War
and Peace" require almost no markup, and most of the headers are optional.
* markup is easily readable/parsable by the human eye (basic markup is simpler and more
sparse than the most basic HTML), [this may also be converted to XML representations of
the same input/source document].
* markup defines document structure (this may be done once in a header pattern-match
description, or for heading levels individually); basic text attributes (bold, italics,
underscore, strike-through etc.) as required; and semantic information related to the
document (header information, extended beyond the Dublin core and easily further extended
as required); the headers may also contain processing instructions. SiSU markup is
primarily an abstraction of document structure and document metadata to permit taking
advantage of the basic strengths of existing alternative practical standard ways of
representing documents [be that browser viewing, paper publication, sql search etc.]
(html, epub, xml, odf, latex, pdf, sql)
* for output produces reasonably elegant output of established industry and
institutionally accepted open standard formats.[3] takes advantage of the different
strengths of various standard formats for representing documents, amongst the output
formats currently supported are:
* HTML - both as a single scrollable text and a segmented document
* XHTML
* EPUB
* XML - both in sax and dom style xml structures for further development as required
* ODT - Open Document Format text, the iso standard for document storage
* LaTeX - used to generate pdf
* PDF (via LaTeX )
* SQL - population of an sql database (PostgreSQL or SQLite), (at the same object
level that is used to cite text within a document)
Also produces: concordance files; document content certificates (md5 or sha256 digests of
headings, paragraphs, images etc.) and html manifests (and sitemaps of content). (b) takes
advantage of the strengths implicit in these very different output types, (e.g. PDFs
produced using typesetting of LaTeX, databases populated with documents at an individual
object/paragraph level, making possible granular search (and related possibilities))
* ensuring content can be cited in a meaningful way regardless of selected output format.
Online publishing (and publishing in multiple document formats) lacks a useful way of
citing text internally within documents (important to academics generally and to lawyers)
as page numbers are meaningless across browsers and formats. sisu seeks to provide a
common way of pinpointing the text within a document (which can be utilized for citation
and by search engines). The outputs share a common numbering system that is meaningful (to
man and machine) across all digital outputs whether paper, screen, or database oriented
(pdf, HTML, EPUB, xml, sqlite, postgresql); this numbering system can be used to
reference content.
* Granular search within documents. SQL databases are populated at an object level
(roughly headings, paragraphs, verse, tables) and become searchable with that degree of
granularity; the output information provides the object/paragraph numbers which are
relevant across all generated outputs; it is also possible to look at just the matching
paragraphs of the documents in the database; [output indexing also works well with search
indexing tools like hyperestraier].
* long term maintainability of document collections in a world of changing formats,
having a very sparsely marked-up source document base. There is a considerable degree of
future-proofing, output representations are "upgradeable", and new document formats may be
added, e.g. addition of odf (open document text) module in 2006, epub in 2009 and html5
output sometime in future, without modification of existing prepared texts
* SQL search aside, documents are generated as required and static once generated.
* documents produced are static files, and may be batch processed, this needs to be done
only once but may be repeated for various reasons as desired (updated content, addition of
new output formats, updated technology document presentations/representations)
* document source (plaintext utf-8) if shared on the net may be used as input and
processed locally to produce the different document outputs
* document source may be bundled together (automatically) with associated documents
(multiple language versions or master document with inclusions) and images and sent as a
zip file called a sisupod, if shared on the net these too may be processed locally to
produce the desired document outputs
* generated document outputs may automatically be posted to remote sites.
* for basic document generation, the only software dependency is Ruby, and a few standard
Unix tools (this covers plaintext, HTML, EPUB, XML, ODF, LaTeX). To use a database you
of course need that, and to convert the LaTeX generated to pdf, a latex processor like
tetex or texlive.
* as a developer's tool it is flexible and extensible
Syntax highlighting for SiSU markup is available for a number of text editors.
SiSU is less about document layout than about finding a way with little markup to be able
to construct an abstract representation of a document that makes it possible to produce
multiple representations of it which may be rather different from each other and used for
different purposes, whether layout and publishing, or search of content
i.e. to be able to take advantage from this minimal preparation starting point of some of
the strengths of rather different established ways of representing documents for different
purposes, whether for search (relational database, or indexed flat files generated for
that purpose whether of complete documents, or say of files made up of objects), online
viewing (e.g. html, xml, pdf), or paper publication (e.g. pdf) ...
The solution arrived at is to extract structural information about the document (about
headings within the document) and to track objects (which are serialized and also given
hash values) in the manner described. This makes possible representations that are quite
different from those offered at present. For example, objects could be saved individually
and identified by their hashes, with an index of how the objects relate to each other to
form a document.
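The idea of saving objects individually under their hashes, with an index describing how
they form a document, can be sketched roughly as follows. This is an illustrative sketch
only, not SiSU's implementation: the function names, the in-memory dictionary used as a
store, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


def object_digest(text):
    """Hash one text object (SHA-256 is an assumed choice, not SiSU's documented one)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def store_document(objects, store):
    """Save each object under its hash and return an index of (object number, hash)."""
    index = []
    for number, text in enumerate(objects, start=1):
        digest = object_digest(text)
        store[digest] = text              # object saved individually, keyed by its hash
        index.append((number, digest))    # index records the order of objects
    return index


def rebuild_document(index, store):
    """Reassemble the document by following the index through the object store."""
    return [store[digest] for _number, digest in index]


# Hypothetical document made of three paragraph-sized text objects.
objects = [
    "Introduction",
    "SiSU tracks text objects.",
    "Each object is given a hash value.",
]
store = {}
index = store_document(objects, store)
rebuilt = rebuild_document(index, store)
```

In this sketch the index alone fixes the order of the objects, so identical objects shared
between documents would be stored only once.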
*1. square brackets
*2. square brackets
+1. square brackets
1. <http://www.jus.uio.no/sisu/man/>
2. <http://www.jus.uio.no/sisu/man/sisu.1.html>
3. From sometime after SiSU 0.58 it should be possible to describe SiSU markup using
SiSU, which though not an original design goal is useful.
4. files should be prepared using UTF-8 character encoding
5. a footnote or endnote
6. self contained endnote marker & endnote in one
*. unnumbered asterisk footnote/endnote, insert multiple asterisks if required
**. another unnumbered asterisk footnote/endnote
*3. editors notes, numbered asterisk footnote/endnote series
+2. editors notes, numbered plus symbol footnote/endnote series
7. <http://www.sisudoc.org/>
8. <http://www.ruby-lang.org/en/>
9. Table from the Wealth of Networks by Yochai Benkler
<http://www.jus.uio.no/sisu/the_wealth_of_networks.yochai_benkler>
10. for which you may alternatively use the full form author: title: and year:
11. Quixote and Panza, Taming Windmills (1605), pp 1000 - 1001 also, Benkler, Wealth of
Networks (2006), p 1
12. is not a regular file to be worked on, and thus less likely that people will have
"accidents", working on a .ssc file that is overwritten by subsequent processing.
It may be however that when the resulting file is shared .ssc is an appropriate
suffix to use.
13. SiSU has worked this way in the past, though this was dropped as it was thought the
complexity outweighed the flexibility, however, the balance was rather fine and
this behaviour could be reinstated.
14. <http://www.postgresql.org/> <http://advocacy.postgresql.org/>
<http://en.wikipedia.org/wiki/Postgresql>
15. <http://www.hwaci.com/sw/sqlite/> <http://en.wikipedia.org/wiki/Sqlite>
16. <http://search.sisudoc.org>
17. (which could be extended further with current back-end). As regards scaling of the
database, it is as scalable as the database (here Postgresql) and hardware allow.
18. when this feature was demonstrated to an IBM software innovations evaluator in 2004,
he said, to paraphrase: this could be of interest to us. We have large document
management systems; you can search hundreds of thousands of documents and we can
tell you which documents meet your search criteria, but there is no way we can tell
you, without opening each document, where within each your matches are found.
SEE ALSO
sisu(1),
sisu-epub(1),
sisu-harvest(1),
sisu-html(1),
sisu-odf(1),
sisu-pdf(1),
sisu-pg(1),
sisu-sqlite(1),
sisu-txt(1),
sisu_vim(7).
HOMEPAGE
More information about SiSU can be found at <http://www.sisudoc.org/> or
<http://www.jus.uio.no/sisu/>
SOURCE
<http://git.sisudoc.org/>
AUTHOR
SiSU is written by Ralph Amissah.