Steve Ball
NAME
::xml::parser - XML parser support for Tcl
SYNOPSIS
package require xml
package require parserclass
xml2.6
::xml ::sgml ::xml::tclparser
::xml::parserclass option ?arg arg ...?
::xml::parser ?name? ?-option value ...?
parser option arg

DESCRIPTION
TclXML provides event-based parsing of XML documents. The application
may register callback scripts for certain document features, and when
the parser encounters those features while parsing the document the
callback is evaluated.

The parser may also perform other functions, such as normalisation,
validation and/or entity expansion. Generally, these functions are
under the control of configuration options. Whether these functions
can be performed at all depends on the parser implementation.

The TclXML package provides a generic interface for use by a Tcl
application, along with a low-level interface for use by a parser
implementation. Each implementation provides a class of XML parser,
and these register themselves using the ::xml::parserclass create
command. One of the registered parser classes will be the default
parser class.

Loading the package with the generic package require xml command
allows the package to automatically determine the default parser
class. In order to select a particular parser class as the default,
that class's package may be loaded directly, e.g. package require
expat. In all cases, all available parser classes are registered with
the TclXML package; the difference is simply in which one becomes the
default.

COMMANDS

::xml::parserclass
The ::xml::parserclass command is used to manage XML parser classes.

Command Options
The following command options may be used:

create name ?-createcommand script? ?-createentityparsercommand script?
       ?-parsecommand script? ?-configurecommand script?
       ?-getcommand script? ?-deletecommand script?
    Creates an XML parser class with the given name.

destroy name
    Destroys an XML parser class.

info names
    Returns information about registered XML parser classes.

::xml::parser
The ::xml::parser command creates an XML parser object. The return
value of the command is the name of the newly created parser.

The parser scans an XML document's syntactical structure, evaluating
callback scripts for each feature found. At the very least the parser
will normalise the document and check the document for
well-formedness. If the document is not well-formed then the
-errorcommand option will be evaluated. Some parser classes may
perform additional functions, such as validation. Additional features
provided by the various parser classes are described in the section
Parser Classes.

Parsing is performed synchronously. The command blocks until the
entire document has been parsed. Parsing may be terminated by an
application callback, see the section Callback Return Codes.
Incremental parsing is also supported by using the -final
configuration option.
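
As a minimal sketch of the behaviour described above, the following
script creates a parser, registers a start-tag callback and replaces
the default error callback. The names onStart, onError, seen and
errors are hypothetical, and the parsing step is only attempted if
the TclXML package can be loaded. Whether parsing can usefully
continue after an error depends on the parser class.

```tcl
# Sketch only: the parsing step requires the TclXML package.
# onStart, onError, seen and errors are hypothetical names.
set seen {}
set errors {}

# Element start callback: record the element name and the number of
# attributes (attlist is a flat list of name/value pairs).
proc onStart {name attlist args} {
    lappend ::seen [list $name [expr {[llength $attlist] / 2}]]
}

# Error callback: record the problem instead of raising a Tcl error.
proc onError {code msg} {
    lappend ::errors [list $code $msg]
}

if {![catch {package require xml}]} {
    set parser [::xml::parser -elementstartcommand onStart \
                    -errorcommand onError]
    $parser parse {<doc><item id="1"/><item id="2" lang="en"/></doc>}
    puts "elements seen: $seen"
    puts "errors: $errors"
}
```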

Configuration Options
The ::xml::parser command accepts the following configuration options:

-attlistdeclcommand script
    Specifies the prefix of a Tcl command to be evaluated whenever an
    attribute list declaration is encountered in the DTD subset of an
    XML document. The command evaluated is:

        script name attrname type default value

    where:

        name      Element type name
        attrname  Attribute name being declared
        type      Attribute type
        default   Attribute default, such as #IMPLIED
        value     Default attribute value. Empty string if none given.
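
A sketch of an -attlistdeclcommand callback, assuming the five
arguments are appended exactly as listed above. The proc name
onAttlistDecl and the array attdecls are hypothetical; the parsing
step runs only if TclXML is available, and whether the callback fires
depends on the parser class reading the internal DTD subset.

```tcl
# Hypothetical storage: maps "element,attribute" to
# a list {type default value}.
array set attdecls {}

proc onAttlistDecl {name attrname type default value} {
    set ::attdecls($name,$attrname) [list $type $default $value]
}

if {![catch {package require xml}]} {
    set parser [::xml::parser -attlistdeclcommand onAttlistDecl]
    $parser parse {<!DOCTYPE doc [
<!ELEMENT doc EMPTY>
<!ATTLIST doc id CDATA #IMPLIED>
]>
<doc/>}
    parray attdecls
}
```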

-baseurl URI
    Specifies the base URI for resolving relative URIs that may be
    used in the XML document to refer to external entities.

-characterdatacommand script
    Specifies the prefix of a Tcl command to be evaluated whenever
    character data is encountered in the XML document being parsed.
    The command evaluated is:

        script data

    where:

        data      Character data in the document

-commentcommand script
    Specifies the prefix of a Tcl command to be evaluated whenever a
    comment is encountered in the XML document being parsed. The
    command evaluated is:

        script data

    where:

        data      Comment data

-defaultcommand script
    Specifies the prefix of a Tcl command to be evaluated when no
    other callback has been defined for a document feature which has
    been encountered. The command evaluated is:

        script data

    where:

        data      Document data

-defaultexpandinternalentities boolean
    Specifies whether entities declared in the internal DTD subset
    are expanded with their replacement text. If entities are not
    expanded then the entity references will be reported with no
    expansion.

-doctypecommand script
    Specifies the prefix of a Tcl command to be evaluated when the
    document type declaration is encountered. The command evaluated
    is:

        script name public system dtd

    where:

        name      The name of the document element
        public    Public identifier for the external DTD subset
        system    System identifier for the external DTD subset.
                  Usually a URI.
        dtd       The internal DTD subset

    See also -startdoctypedeclcommand and -enddoctypedeclcommand.

-elementdeclcommand script
    Specifies the prefix of a Tcl command to be evaluated when an
    element markup declaration is encountered. The command evaluated
    is:

        script name model

    where:

        name      The element type name
        model     Content model specification

-elementendcommand script
    Specifies the prefix of a Tcl command to be evaluated when an
    element end tag is encountered. The command evaluated is:

        script name args

    where:

        name      The element type name that has ended
        args      Additional information about this element

    Additional information about the element takes the form of
    configuration options. Possible options are:

        -empty boolean
                  The empty element syntax was used for this element
        -namespace uri
                  The element is in the XML namespace associated with
                  the given URI

-elementstartcommand script
    Specifies the prefix of a Tcl command to be evaluated when an
    element start tag is encountered. The command evaluated is:

        script name attlist args

    where:

        name      The element type name that has started
        attlist   A Tcl list containing the attributes for this
                  element. The list of attributes is formatted as
                  pairs of attribute names and their values.
        args      Additional information about this element

    Additional information about the element takes the form of
    configuration options. Possible options are:

        -empty boolean
                  The empty element syntax was used for this element
        -namespace uri
                  The element is in the XML namespace associated with
                  the given URI
        -namespacedecls list
                  The start tag included one or more XML Namespace
                  declarations. list is a Tcl list giving the
                  namespaces declared. The list is formatted as pairs
                  of values: the first value is the namespace URI and
                  the second value is the prefix used for the
                  namespace in this document. A default XML namespace
                  declaration will have an empty string for the
                  prefix.

-endcdatasectioncommand script
    Specifies the prefix of a Tcl command to be evaluated when the
    end of a CDATA section is encountered. The command is evaluated
    with no further arguments.

-enddoctypedeclcommand script
    Specifies the prefix of a Tcl command to be evaluated when the
    end of the document type declaration is encountered. The command
    is evaluated with no further arguments.

-entitydeclcommand script
    Specifies the prefix of a Tcl command to be evaluated when an
    entity declaration is encountered. The command evaluated is:

        script name args

    where:

        name      The name of the entity being declared
        args      Additional information about the entity
                  declaration. An internal entity shall have a single
                  argument, the replacement text. An external parsed
                  entity shall have two additional arguments, the
                  public and system identifiers of the external
                  resource. An external unparsed entity shall have
                  three additional arguments, the public and system
                  identifiers followed by the notation name.

-entityreferencecommand script
    Specifies the prefix of a Tcl command to be evaluated when an
    entity reference is encountered. The command evaluated is:

        script name

    where:

        name      The name of the entity being referenced

-errorcommand script
    Specifies the prefix of a Tcl command to be evaluated when a
    fatal error is detected. The error may be due to the XML document
    not being well-formed. In the case of a validating parser class,
    the error may also be due to the XML document not obeying
    validity constraints. By default, a callback script is provided
    which causes an error return code, but an application may supply
    a script which attempts to continue parsing. The command
    evaluated is:

        script errorcode errormsg

    where:

        errorcode A single word description of the error, intended
                  for use by an application
        errormsg  A human-readable description of the error

-externalentitycommand script
    Specifies the prefix of a Tcl command to be evaluated to resolve
    an external entity reference. If the parser has been configured
    to validate the XML document, a default script is supplied that
    resolves the URI given as the system identifier of the external
    entity and recursively parses the entity's data. If the parser
    has been configured as a non-validating parser, then by default
    external entities are not resolved. This option can be used to
    override the default behaviour. The command evaluated is:

        script name baseuri uri id

    where:

        name      The Tcl command name of the current parser
        baseuri   An absolute URI for the current entity which is to
                  be used to resolve relative URIs
        uri       The system identifier of the external entity,
                  usually a URI
        id        The public identifier of the external entity. If no
                  public identifier was given in the entity
                  declaration then id will be an empty string.

-final boolean
    Specifies whether the XML document being parsed is complete. If
    the document is to be incrementally parsed then this option will
    be set to false, and when the last fragment of document is parsed
    it is set to true. For example,

        set parser [::xml::parser -final 0]
        $parser parse $data1
        $parser parse $data2
        $parser configure -final 1
        $parser parse $finaldata
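
The same pattern can be wrapped in a helper that feeds a channel to
the parser in fixed-size chunks. This is a sketch, not part of
TclXML: feedChunks and its chunkSize default are hypothetical. It
reads one chunk ahead so that -final is switched on only before the
last fragment, and it uses nothing but the configure and parse
methods shown above.

```tcl
# Hypothetical helper: parse the contents of a channel incrementally.
# parser is a command created by ::xml::parser; chan is an open,
# blocking, readable channel.
proc feedChunks {parser chan {chunkSize 4096}} {
    $parser configure -final 0
    set pending [read $chan $chunkSize]
    while {1} {
        # Read ahead; an empty result means $pending was the last chunk.
        set next [read $chan $chunkSize]
        if {$next eq ""} break
        $parser parse $pending
        set pending $next
    }
    # Mark the document as complete before the last fragment.
    $parser configure -final 1
    $parser parse $pending
}
```

Typical use would be feedChunks $parser $chan, where $chan was
returned by open or socket.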

-ignorewhitespace boolean
    If this option is set to true then spans of character data in the
    XML document which are composed only of white-space (CR, LF,
    space, tab) will not be reported to the application. In other
    words, the data passed to every invocation of the
    -characterdatacommand script will contain at least one
    non-white-space character.

-notationdeclcommand script
    Specifies the prefix of a Tcl command to be evaluated when a
    notation declaration is encountered. The command evaluated is:

        script name uri

    where:

        name      The name of the notation
        uri       An external identifier for the notation, usually a
                  URI.

-notstandalonecommand script
    Specifies the prefix of a Tcl command to be evaluated when the
    parser determines that the XML document being parsed is not a
    standalone document.

-paramentityparsing boolean
    Controls whether external parameter entities are parsed.

-parameterentitydeclcommand script
    Specifies the prefix of a Tcl command to be evaluated when a
    parameter entity declaration is encountered. The command
    evaluated is:

        script name args

    where:

        name      The name of the parameter entity
        args      For an internal parameter entity there is only one
                  additional argument, the replacement text. For
                  external parameter entities there are two
                  additional arguments, the system and public
                  identifiers respectively.

-parser name
    The name of the parser class to instantiate for this parser
    object. This option may only be specified when the parser
    instance is created.

-processinginstructioncommand script
    Specifies the prefix of a Tcl command to be evaluated when a
    processing instruction is encountered. The command evaluated is:

        script target data

    where:

        target    The name of the processing instruction target
        data      Remaining data from the processing instruction
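
A sketch of a -processinginstructioncommand callback that collects
every processing instruction as a target/data pair. The proc name
onPI and the list pis are hypothetical, and the parsing step is only
attempted when the TclXML package can be loaded.

```tcl
# Hypothetical accumulator for {target data} pairs.
set pis {}

proc onPI {target data} {
    lappend ::pis [list $target $data]
}

if {![catch {package require xml}]} {
    set parser [::xml::parser -processinginstructioncommand onPI]
    $parser parse {<doc><?render mode="draft"?></doc>}
    foreach pi $pis {
        puts "target=[lindex $pi 0] data=[lindex $pi 1]"
    }
}
```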

-reportempty boolean
    If this option is enabled then when an element is encountered
    that uses the special empty element syntax, additional arguments
    are appended to the -elementstartcommand and -elementendcommand
    callbacks. The arguments -empty 1 are appended. For example:

        script -empty 1

-startcdatasectioncommand script
    Specifies the prefix of a Tcl command to be evaluated when the
    start of a CDATA section is encountered. No arguments are
    appended to the script.

-startdoctypedeclcommand script
    Specifies the prefix of a Tcl command to be evaluated at the
    start of a document type declaration. No arguments are appended
    to the script.

-unknownencodingcommand script
    Specifies the prefix of a Tcl command to be evaluated when a
    character is encountered with an unknown encoding. This option
    has not been implemented.

-unparsedentitydeclcommand script
    Specifies the prefix of a Tcl command to be evaluated when a
    declaration is encountered for an unparsed entity. The command
    evaluated is:

        script system public notation

    where:

        system    The system identifier of the external entity,
                  usually a URI
        public    The public identifier of the external entity
        notation  The name of the notation for the external entity

-validate boolean
    Enables validation of the XML document to be parsed. Any changes
    to this option are ignored after an XML document has started to
    be parsed, but the option may be changed after a reset.

-warningcommand script
    Specifies the prefix of a Tcl command to be evaluated when a
    warning condition is detected. A warning condition is where the
    XML document has not been authored correctly, but is still
    well-formed and may be valid. For example, the special empty
    element syntax may be used for an element which has not been
    declared to have empty content. By default, a callback script is
    provided which silently ignores the warning. The command
    evaluated is:

        script warningcode warningmsg

    where:

        warningcode
                  A single word description of the warning, intended
                  for use by an application
        warningmsg
                  A human-readable description of the warning

-xmldeclcommand script
    Specifies the prefix of a Tcl command to be evaluated when the
    XML declaration is encountered. The command evaluated is:

        script version encoding standalone

    where:

        version   The version number of the XML specification to
                  which this document purports to conform
        encoding  The character encoding of the document
        standalone
                  A boolean declaring whether the document is
                  standalone

Parser Command
The ::xml::parser command creates a new Tcl command with the same
name as the parser. This command may be used to invoke various
operations on the parser object. It has the following general form:

    name option arg

option and the arg determine the exact behaviour of the command. The
following commands are possible for parser objects:

cget -option
    Returns the current value of the configuration option given by
    option. Option may have any of the values accepted by the parser
    object.

configure ?-option value ...?
    Modify the configuration options of the parser object. Option may
    have any of the values accepted by the parser object.

entityparser ?option value ...?
    Creates a new parser object. The new object inherits the same
    configuration options as the parent parser object, but is able to
    parse XML data in a parsed entity. The option -dtdsubset allows
    markup declarations to be treated as being in the internal or
    external DTD subset.

free name
    Frees all resources associated with the parser object. The object
    is not usable after this command has been invoked.

get name args
    Returns information about the XML document being parsed. Each
    parser class provides different information, see the
    documentation for the parser class.

parse xml args
    Parses the XML document. The usual desired effect is for various
    application callbacks to be evaluated. Other functions will also
    be performed by the parser class; at the very least this includes
    checking the XML document for well-formedness.

reset
    Initialises the parser object in preparation for parsing a new
    XML document.

CALLBACK RETURN CODES
Every callback script evaluated by a parser may return a return code
other than TCL_OK. Return codes are interpreted as follows:

break
    Suppresses invocation of all further callback scripts. The parse
    method returns the TCL_OK return code.

continue
    Suppresses invocation of further callback scripts until the
    current element has finished.

error
    Suppresses invocation of all further callback scripts. The parse
    method also returns the TCL_ERROR return code.

default
    Any other return code suppresses invocation of all further
    callback scripts. The parse method returns the same return code.

APPLICATION EXAMPLES
This script outputs the character data of an XML document read from
stdin.

    package require xml

    proc cdata {data args} {
        puts -nonewline $data
    }

    set parser [::xml::parser -characterdatacommand cdata]
    $parser parse [read stdin]

This script counts the number of elements in an XML document read
from stdin.

    package require xml

    proc EStart {varName name attlist args} {
        upvar #0 $varName var
        incr var
    }

    set count 0
    set parser [::xml::parser -elementstartcommand [list EStart count]]
    $parser parse [read stdin]
    puts "The XML document contains $count elements"
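
A further sketch illustrates the break return code described under
Callback Return Codes: it captures the text of the first title
element and then suppresses all further callbacks. The element name
title, the procs titleStart, titleData and titleEnd, and the
variables inTitle and title are hypothetical; the parsing step runs
only if the TclXML package can be loaded.

```tcl
# Hypothetical state shared by the callbacks below.
set inTitle 0
set title ""

proc titleStart {name attlist args} {
    if {$name eq "title"} { set ::inTitle 1 }
}

proc titleData {data} {
    if {$::inTitle} { append ::title $data }
}

proc titleEnd {name args} {
    if {$name eq "title"} {
        set ::inTitle 0
        # The break return code stops all further callbacks.
        return -code break
    }
}

if {![catch {package require xml}]} {
    set parser [::xml::parser -elementstartcommand titleStart \
                    -characterdatacommand titleData \
                    -elementendcommand titleEnd]
    $parser parse {<doc><title>Hello</title><body>ignored</body></doc>}
    puts "title: $title"
}
```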

PARSER CLASSES
This section will discuss how a parser class is implemented.

Tcl Parser Class
The pure-Tcl parser class requires no compilation; it is a collection
of Tcl scripts. This parser implementation is non-validating, ie. it
can only check well-formedness in a document. However, by enabling
the -validate option it will read the document's DTD and resolve
external entities.

This parser implementation aims to implement XML v1.0 and supports
XML Namespaces.

Generally the parser produces XML Infoset information items. That is,
it gives the application a slightly higher-level view than the raw
XML syntax. For example, it does not report CDATA Sections.

Expat Parser Class
This section will discuss the Expat parser class.

SEE ALSO
TclDOM, a Tcl interface for the W3C Document Object Model.

KEYWORDS

Tcl Built-In Commands          Tcl          TclXML(n)