hxpipe - convert XML file to a format easier to parse with Perl or AWK
hxpipe [ -l ] [ -- ] [ file-or-URL ]
hxpipe parses an HTML or XML file and outputs a line-oriented representation of it that is
well suited to further processing with AWK or similar tools. The format is similar to the
ESIS (Element Structure Information Set) that is output by nsgmls/onsgmls.
The reverse operation, converting back to mark-up, is performed by the hxunpipe program.
The output format is as follows:
<!--comment-->
Comments are output as
*comment
I.e., a single line starting with "*" followed by the text of the comment. Line
feeds, carriage returns and tabs in the text are written as "\n", "\r" and "\t",
respectively. Text that looks like a numerical character entity is written with
the "&" replaced by "\". The line ends with a line feed.
Note that onsgmls outputs comments starting with a "_" instead of a "*" and
doesn't replace the "&" of numerical character entities by "\" (and by default
it omits comments altogether).
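The escaping rules above can be reversed with a small helper. The following Python function is a minimal sketch, not part of hxpipe: the name `unescape` is hypothetical, and the handling of a doubled backslash (`\\`) is an assumption, since the text above does not say how a literal backslash is written.

```python
import re

# A backslash followed by n, r, t or another backslash, or a backslash that
# stands in for the "&" of a numerical character entity (\#169; or \#xA9;).
_ESCAPE = re.compile(r"\\(#[0-9]+;|#[xX][0-9A-Fa-f]+;|[nrt\\])")

_SIMPLE = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\"}


def unescape(text):
    """Undo the escaping applied to comment, PI, attribute and text lines."""
    def replace(match):
        body = match.group(1)
        if body.startswith("#"):
            # Restore the "&" of a numerical character entity.
            return "&" + body
        # Assumption: "\\" denotes a literal backslash.
        return _SIMPLE[body]
    return _ESCAPE.sub(replace, text)
```

The function takes the part of the line after the leading marker character, without the trailing line feed.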
<?pi?>
Processing instructions are output as
?pi
I.e., a single line starting with a "?" followed by the text of the processing
instruction. The text is escaped as for comments (see above).
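<!DOCTYPE root PUBLIC "-//foo//DTD bar//EN" "http://example.org/dtd">
<!DOCTYPE root PUBLIC "-//foo//DTD bar//EN">
<!DOCTYPE root SYSTEM "http://example.org/dtd">
<!DOCTYPE root>
DOCTYPEs are output as one of the following:
!root "-//foo//DTD bar//EN" http://example.org/dtd
!root "-//foo//DTD bar//EN"
!root "" http://example.org/dtd
!root ""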
for respectively: a DOCTYPE with (1) both a public and a system identifier, (2)
only a public identifier, (3) only a system identifier, or (4) neither of the
two. I.e., a single line starting with a "!", followed by a space and a possibly
empty quoted string, followed optionally by a space and arbitrary text. Note the
quotes for the public identifier and the absence of quotes for the system
identifier.
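A DOCTYPE line of the shape described above can be split into its parts with one regular expression. This is a sketch under the assumption that the public identifier never contains a double quote; the function name `parse_doctype` is hypothetical.

```python
import re

# "!" + root name, a space, a possibly empty quoted public identifier,
# then optionally a space and the unquoted system identifier.
_DOCTYPE = re.compile(r'^!(\S+) "([^"]*)"(?: (.*))?$')


def parse_doctype(line):
    """Return (root, public_id, system_id); absent identifiers become None."""
    match = _DOCTYPE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError("not a DOCTYPE line: %r" % line)
    root, public_id, system_id = match.groups()
    # An empty quoted string means there is no public identifier.
    return root, public_id or None, system_id or None
```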
<elt att1="value1" att2="value2">
A start tag is output as
Aatt1 CDATA value1
Aatt2 CDATA value2
(elt
I.e., as zero or more lines for the attributes and one line for the element
type. Each line for an attribute starts with "A" followed by the name of the
attribute, a space, the literal string "CDATA", another space, and the attribute
value. The text of the attribute value is escaped as for comments (see above).
The line for the element type starts with "(" followed by the element type.
hxpipe does not read DTDs and assumes that attributes are always CDATA. It never
generates other types (IMPLIED, TOKEN, ID, etc.), unlike onsgmls.
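Because the attribute lines precede the line for the element type, a reader has to collect "A" lines until it reaches the "(" line. The sketch below shows one way to do that in Python; `read_start_tag` is a hypothetical name, and attribute values are returned still escaped.

```python
def read_start_tag(lines):
    """Consume 'A' lines up to and including the '(' line.

    Returns (element_type, attributes), where attributes maps each
    attribute name to its (still escaped) value.
    """
    attributes = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("A"):
            # Layout: A<name> CDATA <value>; the value may contain spaces.
            name, _, rest = line[1:].partition(" ")
            _attr_type, _, value = rest.partition(" ")
            attributes[name] = value
        elif line.startswith("("):
            return line[1:], attributes
        else:
            raise ValueError("unexpected line in start tag: %r" % line)
    raise ValueError("no '(' line found")
```

Dictionaries keep insertion order in current Python versions, so the attributes come back in the order hxpipe wrote them.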
</elt>
End tags are output as
)elt
I.e., as a line starting with ")" followed by the element type.
<empty att1="val1" att2="val2"/>
Empty elements (in XML) are output as
Aatt1 CDATA val1
Aatt2 CDATA val2
|empty
I.e., as zero or more lines for attributes and one line starting with "|"
followed by the element type.
Note that onsgmls never outputs "|". (However, it can optionally output a line
consisting of a single "e" just before the "(" line, to indicate that the
element is empty.)
text
Text is output as
-text
I.e., as a single line starting with a "-". The text is escaped as for comments
(see above).
When the -l option is in effect, hxpipe will intersperse the output with lines
of the form
L12
where "12" is replaced with the line number in the source where the next output
line originated.
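Since every output line starts with a one-character marker, the whole format can be dispatched on the first character. The Python generator below is an illustrative sketch: the names `KINDS` and `events` are hypothetical, and carrying the most recent "L" number forward until the next "L" line is an assumption made for convenience.

```python
# Marker character at the start of each line, mapped to a descriptive name.
KINDS = {
    "*": "comment",
    "?": "pi",
    "!": "doctype",
    "A": "attribute",
    "(": "start",
    ")": "end",
    "|": "empty",
    "-": "text",
}


def events(lines):
    """Yield (kind, payload, source_line) for each line of hxpipe output.

    source_line is None unless "L" lines are present (the -l option).
    """
    source_line = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        marker, payload = line[0], line[1:]
        if marker == "L":
            # Remember the line number for the output lines that follow.
            source_line = int(payload)
            continue
        yield KINDS.get(marker, "unknown"), payload, source_line
```

The payload is passed through unchanged, so escapes such as "\n" are still present in it.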
hxpipe does not normalize the input and does not add missing tags. It is thus possible that
there are unequal numbers of "(" and ")" lines. If it is important that every start tag is
matched by an end tag, pipe the input through hxnormalize -x first.
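Whether a given output contains unequal numbers of "(" and ")" lines can be checked by counting them per element type. The following Python function is a rough sketch of such a check (`tag_balance` is a hypothetical name); it only counts and does not verify nesting order.

```python
from collections import Counter


def tag_balance(lines):
    """Return {element_type: surplus} for element types whose counts differ.

    A positive surplus means more "(" than ")" lines; negative is the reverse.
    """
    balance = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("("):
            balance[line[1:]] += 1
        elif line.startswith(")"):
            balance[line[1:]] -= 1
    # Keep only the element types that do not balance out.
    return {name: count for name, count in balance.items() if count != 0}
```

An empty result means every element type has as many start tags as end tags. Text lines cannot be mistaken for tags, because they always begin with "-".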
The following options are supported:
-l Add "L" lines to the output to indicate the line numbers in the source.
The following operand is supported:
file-or-URL The name or URL of an HTML file. If absent, standard input is read instead.
The following exit values are returned:
0 Successful completion.
> 0 An error occurred in the parsing of the HTML file. hxpipe will try to correct
the error and produce output anyway.
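Because output is still produced when the exit value is greater than 0, a calling script may prefer to record the status rather than abort on it. The Python sketch below illustrates that; the helper names `build_command` and `run` are hypothetical, it assumes the executable is on the search path as "hxpipe", and it assumes input and output are UTF-8.

```python
import subprocess
import sys


def build_command(source=None, line_numbers=False, executable="hxpipe"):
    """Build an argument list following: hxpipe [ -l ] [ -- ] [ file-or-URL ]."""
    command = [executable]
    if line_numbers:
        command.append("-l")
    if source is not None:
        # "--" ends the options, so a name starting with "-" is safe.
        command.extend(["--", source])
    return command


def run(command, stdin_text=None):
    """Run the command and return (stdout, exit_status) without raising."""
    result = subprocess.run(
        command,
        input=stdin_text,
        capture_output=True,
        text=True,
        encoding="utf-8",  # assumption: UTF-8 on both sides
        check=False,       # a non-zero status is reported, not raised
    )
    if result.returncode != 0:
        print("warning: exit status %d" % result.returncode, file=sys.stderr)
    return result.stdout, result.returncode
```

For example, `run(build_command("page.html", line_numbers=True))` would invoke `hxpipe -l -- page.html` and hand back whatever output was produced together with the exit status.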
To use a proxy to retrieve remote files, set the environment variables http_proxy and
ftp_proxy. E.g., http_proxy="http://localhost:8080/"