4. XML Document Type Definition (DTD)
Because XML lets you define your own meaningful tags, you need a mechanism for declaring the elements, attributes, and structure of your XML documents. A DTD is one such mechanism for specifying the markup of an XML document. A DTD serves the following purposes:
· Defines and documents XML elements (tags, or markup)
· Enforces constraints on attributes (parameters)
· Enables an XML parser to validate an XML document during the document parsing phase
Element Declarations
The <!ELEMENT> keyword declares an element in a DTD. Each element declaration gives the element's name and its content type.
The naming rules for elements are:
o must begin with a letter or an underscore, and
o may contain only letters, digits, hyphens, underscores, and periods
An element may be declared with one of four content types: EMPTY, ANY, a set of child elements, or mixed content.
Most useful DTDs declare both elements and attributes. The keyword “ATTLIST” allows a list of attributes to be defined for an element, each with one of the following 10 XML attribute types: CDATA, ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION, and enumerated. Attributes give applications extra information or meaning about the element.
Character references, which may appear in attribute values and element content, use the following notation:
o &# - this prefix shows that the character code is decimal, as in &#65;
o &#x - this prefix indicates hexadecimal notation, as in &#x41;
· Occurrence operators control how many times a child element may appear in the content model of a declared element:
o ? - May appear 0 or 1 time
o + - Must appear 1 or more times
o * - May appear 0 or more times
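The occurrence operators behave much like the same symbols in regular expressions. The Python sketch below is an illustration only, not a DTD validator: it assumes a hypothetical content model (title, author+, note?, keyword*) for a hypothetical book element and checks an ordered list of child element names against it.

```python
import re

# Hypothetical content model (not taken from the text above):
#   <!ELEMENT book (title, author+, note?, keyword*)>
# Each child name is written with a trailing space so that the
# regular expression only ever matches whole element names.
BOOK_MODEL = re.compile(r"title (?:author )+(?:note )?(?:keyword )*")


def fits_model(child_names):
    """Return True if the ordered child names satisfy the content model."""
    flattened = "".join(name + " " for name in child_names)
    return BOOK_MODEL.fullmatch(flattened) is not None


print(fits_model(["title", "author"]))                    # note and keyword omitted
print(fits_model(["title", "author", "author", "note"]))  # author repeated
print(fits_model(["title"]))                              # author missing
print(fits_model(["title", "author", "note", "note"]))    # note repeated
```

A validating XML parser performs this kind of check for every element, using the content models declared in the DTD.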
Language and White Space Attributes
The two attributes predefined by the W3C XML Recommendation are xml:lang and xml:space. If white space is significant in an XML document, the following declaration can be used:
<!ATTLIST element xml:space (default|preserve) 'defaultchoice'>
The xml:space attribute takes the value “preserve” to ask applications to keep all white space, or “default” to let the application's default white-space handling apply.
<?xml version="1.0"?>
<quote xml:space="preserve">
…
</quote>
xml:lang - an attribute that specifies the language used in the contents and attribute values of an element in the XML document. For example, “en-US” identifies US English and “en-GB” identifies British English. It can be used in the following format:
<?xml version="1.0"?>
<description xml:lang="en-US">
…
</description>
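As a rough illustration of how an application might read these two attributes, the following Python sketch uses the standard-library ElementTree parser. The catalog wrapper element and the sample text are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is permanently bound to this namespace URI.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical document; only xml:lang and xml:space come from the text.
DOC = """<catalog>
  <description xml:lang="en-US">color</description>
  <description xml:lang="en-GB">colour</description>
  <quote xml:space="preserve">  two leading spaces</quote>
</catalog>"""

root = ET.fromstring(DOC)

# ElementTree exposes namespaced attributes as "{namespace-uri}local-name".
languages = [d.get(f"{{{XML_NS}}}lang") for d in root.findall("description")]
quote = root.find("quote")
space_mode = quote.get(f"{{{XML_NS}}}space")

print(languages)
print(space_mode)
print(repr(quote.text))
```

Note that xml:space is only a signal to the application; the parser itself passes the white space through either way.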
ATTLIST Declaration Syntax
The ATTLIST declaration defines
· the element that can have the attribute
· the name of the attribute
· the type of the attribute, and
· the default attribute value.
The syntax is
<!ATTLIST element_name attribute_name attribute_type default_value>
Default Attribute Value: In Example 1, the width attribute of the square element is declared with a default value of “0”. An XML document can override the default with a desired width value.
Example 1:
<!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">
<square width="100"></square>
For CDATA type attributes, a default value should be enclosed by a pair
of double quotes as shown in the following syntax:
<!ATTLIST element_name attribute_name CDATA "default-value">
Example 2:
For example, a payment application may define the default payment method to be credit_card:
<!ATTLIST payment type CDATA "credit_card">
<payment type="credit_card">
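To see a default value take effect, the sketch below places the declarations from Example 1 in an internal DTD and reads the document with Python's standard-library parser. The shapes wrapper element is an assumption added so that two squares fit in one document. The parser does not validate, but it does read the internal DTD and supply declared defaults.

```python
import xml.etree.ElementTree as ET

# Example 1's declarations placed in an internal DTD subset.
# "shapes" is a hypothetical wrapper element added for this sketch.
# The first square omits width; the second one overrides the default.
DOC = """<!DOCTYPE shapes [
  <!ELEMENT shapes (square*)>
  <!ELEMENT square EMPTY>
  <!ATTLIST square width CDATA "0">
]>
<shapes>
  <square/>
  <square width="100"/>
</shapes>"""

# Iterating over the root element yields its child elements in order.
widths = [square.get("width") for square in ET.fromstring(DOC)]
print(widths)
```

The first square reports the declared default, while the second reports the value written in the document.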
#IMPLIED Attribute Value: This option is appropriate when you
· don't want to force the author to include an attribute, and
· don't have an option for a default value either
Example 3:
Syntax:
<!ATTLIST element-name attribute-name attribute-type #IMPLIED>
Example:
<!ATTLIST contact phone CDATA #IMPLIED>
<contact phone="260-481-6339">
#FIXED attribute value is used when an attribute value must stay constant and may not be changed. If an author uses another value, a validating XML parser will report an error.
Example 4:
Syntax:
<!ATTLIST element_name attribute_name attribute_type #FIXED "value">
Example:
<!ATTLIST vendor company CDATA #FIXED "A-Dell">
<vendor company="A-Dell">
#REQUIRED attribute value is used when you don't have an option for a default value, but the attribute must be present.
Example 5:
Syntax:
<!ATTLIST element-name attribute-name attribute-type #REQUIRED>
Example:
<!ATTLIST staff job_id CDATA #REQUIRED>
<staff job_id="2354">
Enumerated attribute values are used when you want the attribute value to be one of a fixed set of legal values. In this declaration, the possible values are separated by vertical bars.
Example 6:
Syntax:
<!ATTLIST element-name attribute-name (eval|eval|..) default-value>
Example:
<!ATTLIST payment type (credit_card|check|cash) "cash">
<payment type="credit_card">
or
<payment type="cash">
or
<payment type="check">
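The rule that an enumerated declaration expresses can be sketched by hand in a few lines of Python. This is not how a parser is implemented; the constants below simply mirror the payment declaration from Example 6, and the function name is an assumption made up for this illustration.

```python
# Hand-written mirror of the declaration from Example 6:
#   <!ATTLIST payment type (credit_card|check|cash) "cash">
ALLOWED_TYPES = ("credit_card", "check", "cash")
DEFAULT_TYPE = "cash"


def effective_payment_type(attributes):
    """Return the 'type' value, applying the default and the enumeration."""
    value = attributes.get("type", DEFAULT_TYPE)
    if value not in ALLOWED_TYPES:
        # A validating parser would report a validity error here.
        raise ValueError(f"type={value!r} is not one of {ALLOWED_TYPES}")
    return value


print(effective_payment_type({}))                 # attribute omitted
print(effective_payment_type({"type": "check"}))  # attribute given
```

Passing a value outside the list, such as {"type": "barter"}, raises ValueError in this sketch, which corresponds to the validity error a validating parser would report.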
The Document Type Declaration
In an XML document, the document type definition can be declared as either
· Internal DTD - declared inside the XML document itself, so the document is standalone
· External DTD - stored on a Web server for use by many XML documents; it is further subdivided into the following two types:
o private DTD (using SYSTEM keyword) - for use by a single user or a group of users
o public DTD (using PUBLIC keyword) - for wider use
Rules for defining private external DTD:
<!DOCTYPE root_element SYSTEM "DTD_location">
where DTD_location can be a relative or absolute URL.
Rules for defining public external DTD:
<!DOCTYPE root_element PUBLIC "DTD_name" "DTD_location">
where DTD_name is a public identifier that the parser tries to resolve first. If it cannot be resolved, the parser uses DTD_location instead. The following two DTD declarations show how to declare public external DTDs:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN"
"http://www.w3.org/TR/REC-html40/strict.dtd">
Internal DTD
For some XML documents, the DTD applies only to that one XML file. This is referred to as an internal DTD. We insert the DTD element and attribute declarations directly inside the XML document. Note that the symbols “[” and “]” are used as delimiters inside the <!DOCTYPE> directive, as shown in Example 7.
Example 7: CDATA (character data) attributes defined in an internal DTD.
This example explains how to use ATTLIST and CDATA within an XML page.
The DTD declares that “window” is an element and that it is EMPTY. The height and width are two required attributes of the window element. After the DTD is declared within the DOCTYPE, the attributes can be assigned numbers represented in character format.
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!DOCTYPE window [
<!ELEMENT window EMPTY>
<!ATTLIST window height CDATA #REQUIRED>
<!ATTLIST window width CDATA #REQUIRED>]>
<window height="32" width="32"/>
Example 8: A private external DTD example using SYSTEM keyword.
The following DOCTYPE declaration uses the SYSTEM keyword to show that this DTD will be used as a private DTD:
<!DOCTYPE picture SYSTEM "http://www.etcs.ipfw.edu/~lin/xml/dtds/picture.dtd">
It shows that the root element of the XML document is “picture” and that the DTD for this document is located at the URI: http://www.etcs.ipfw.edu/~lin/xml/dtds/picture.dtd
We can also use the following relative URLs to locate the DTD, depending on where the DTD is located:
· If it resides on the same site: <!DOCTYPE picture SYSTEM "/dtds/picture.dtd">
· If it resides in the same directory: <!DOCTYPE picture SYSTEM "picture.dtd">
Example 9: An ID Example.
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!DOCTYPE employee_name [
<!ELEMENT employee_name (#PCDATA)>
<!ATTLIST employee_name employee_no ID #REQUIRED>
]>
<employee_name employee_no="temp999111000">Jon Doe</employee_name>
Example 10: An IDREF Example
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!DOCTYPE project_team [
<!ELEMENT project_team (employee_name)*>
<!ELEMENT employee_name (#PCDATA)>
<!ATTLIST employee_name employee_no ID #REQUIRED>
<!ATTLIST employee_name projectmanager_1 IDREF #IMPLIED>
<!ATTLIST employee_name projectmanager_2 IDREF #IMPLIED>
]>
<project_team>
<employee_name employee_no="nj12345678">Al King</employee_name>
<employee_name employee_no="ny12890000">Sam Bush</employee_name>
<employee_name employee_no="temp999111000"
projectmanager_1="nj12345678" projectmanager_2="ny12890000">Jon Doe</employee_name>
</project_team>
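The two rules behind ID and IDREF (every ID value is unique, and every IDREF value matches some ID) can be checked by hand. The Python sketch below does so for the document in Example 10. It is a minimal illustration: the attribute names are copied from the DTD rather than read from it, because the standard-library parser does not validate.

```python
import xml.etree.ElementTree as ET

# Attribute names taken from the DTD in Example 10; they are listed by
# hand here because ElementTree does not expose declared attribute types.
ID_ATTR = "employee_no"
IDREF_ATTRS = ("projectmanager_1", "projectmanager_2")

DOC = """<project_team>
  <employee_name employee_no="nj12345678">Al King</employee_name>
  <employee_name employee_no="ny12890000">Sam Bush</employee_name>
  <employee_name employee_no="temp999111000"
      projectmanager_1="nj12345678" projectmanager_2="ny12890000">Jon Doe</employee_name>
</project_team>"""

employees = list(ET.fromstring(DOC))
ids = [e.get(ID_ATTR) for e in employees]

# Rule 1: every ID value must be unique within the document.
duplicate_ids = sorted({i for i in ids if ids.count(i) > 1})

# Rule 2: every IDREF value must match some ID in the document.
dangling_refs = [
    (e.text, attr, e.get(attr))
    for e in employees
    for attr in IDREF_ATTRS
    if e.get(attr) is not None and e.get(attr) not in ids
]

print("duplicate IDs:", duplicate_ids)
print("dangling IDREFs:", dangling_refs)
```

Both lists are empty for this document, since each employee_no is distinct and both project managers refer to existing employees.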
Example 11: ENTITY Example
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE lab_report [
<!ELEMENT lab_report (results)*>
<!ELEMENT results EMPTY>
<!ATTLIST results image ENTITY #REQUIRED>
<!-- An ENTITY attribute must name an unparsed entity, hence NDATA -->
<!NOTATION gif SYSTEM "image/gif">
<!ENTITY ipfw SYSTEM
"http://www.ipfw.edu/icons/ipfw.gif" NDATA gif>
]>
<lab_report>
<results image="ipfw"/>
</lab_report>
Example 12: An ENTITIES Example
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE lab_report_all [
<!ELEMENT lab_report_all (results)*>
<!ELEMENT results EMPTY>
<!ATTLIST results images ENTITIES #REQUIRED>
<!NOTATION gif SYSTEM "image/gif">
<!ENTITY imga SYSTEM
"http://www.ecet.ipfw.edu/ecet/lab/imga.gif" NDATA gif>
<!ENTITY imgb SYSTEM
"http://www.ecet.ipfw.edu/ecet/lab/imgb.gif" NDATA gif>
<!ENTITY imgc SYSTEM
"http://www.ecet.ipfw.edu/ecet/lab/imgc.gif" NDATA gif>
]>
<lab_report_all>
<results images="imga imgb imgc"/>
</lab_report_all>
Example 13: NMTOKEN Example
<?xml version="1.0"?>
<!DOCTYPE student_name [
<!ELEMENT student_name (#PCDATA)>
<!ATTLIST student_name student_no NMTOKEN #REQUIRED>
]>
<student_name student_no="999888777">Mike Adams</student_name>
Example 14: NOTATION Example
<?xml version="1.0"?>
<!DOCTYPE program_code [
<!ELEMENT program_code (#PCDATA)>
<!NOTATION C PUBLIC "ANSI C">
<!ATTLIST program_code lang NOTATION (C) #REQUIRED>
]>
<program_code lang="C">Some C statements</program_code>
Example 15: ENUMERATION Example
<?xml version="1.0"?>
<!DOCTYPE assignments [
<!ELEMENT assignments (homework)*>
<!ELEMENT homework (#PCDATA)>
<!ATTLIST homework status (urgent|normal) #REQUIRED>
]>
<assignments>
<homework status="urgent">This is an urgent homework, its due date is now</homework>
<homework status="normal">This homework is due next week</homework>
</assignments>
External DTD Examples
The list below shows a DTD definition for the parent element “person”, which contains three child elements: “first_name”, “middle_name”, and “last_name”. It also specifies that each child element may contain PCDATA (parsed character data), that is, text. At the end, it defines an empty tag called “nothing”. We may edit and save this person.dtd file in an appropriate Web server directory. The Web server should also be configured to serve it with the MIME media type application/xml-dtd.
Example 16: A person DTD.
<!ELEMENT person (first_name, middle_name, last_name)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT middle_name (#PCDATA)>
<!ELEMENT last_name (#PCDATA)>
<!ELEMENT nothing EMPTY>
If the person.dtd file is saved in the Web server folder shown below, any XML document may include a reference to this external DTD with the following DOCTYPE declaration:
<!DOCTYPE person SYSTEM "http://www.etcs.ipfw.edu/~lin/xml/dtds/person.dtd">
where “person” is the root element of the XML document and must be declared in person.dtd.
If the display of XML content is needed, an XML-compliant application will process the following three files to display it:
Example 17: An XML-based business letter with a document type definition file. The letter.dtd is the external document type definition (DTD) file, which defines the meaning, constraints, and types of the elements to be used in the letter.xml file.
<?xml version = "1.0"?>
<!-- letter.xml -->
<!-- An XML Formatted Business Letter -->
<!DOCTYPE letter SYSTEM "letter.dtd">
<letter>
<contact type = "from">
<name>Paul Lin</name>
<address1>ECET Dept, Purdue U Fort Wayne campus</address1>
<address2>2101 Coliseum Blvd. East</address2>
<city>Fort Wayne</city>
<state>Indiana</state>
<zip>46805</zip>
<phone>219-481-6339</phone>
<flag gender = "M"/>
</contact>
<contact type = "to">
<name>Lisa Smith</name>
<address1>PO. Box 54321</address1>
<address2/> <!-- Empty -->
<city>Othertown</city>
<state>Otherstate</state>
<zip>46815</zip>
<phone>219-555-4321</phone>
<flag gender = "F"/>
</contact>
<paragraph>Dear Customer,</paragraph>
<paragraph>Thank you very much for your business.</paragraph>
<paragraph>Sincerely, Paul Lin</paragraph>
</letter>
<!-- letter.dtd -->
<!-- Document Type Definition file is saved with a dtd extension -->
<!ELEMENT letter (contact+, paragraph+)>
<!ELEMENT contact (name, address1, address2, city, state, zip, phone, flag)>
<!ATTLIST contact type CDATA #IMPLIED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address1 (#PCDATA)>
<!ELEMENT address2 (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT flag EMPTY>
<!ATTLIST flag gender (M | F) "M">
<!ELEMENT paragraph (#PCDATA)>
XML provides two kinds of entities (constants): general entities and parameter entities. The five built-in general entities, &amp;, &quot;, &apos;, &lt;, and &gt;, are for use in XML documents. To declare an entity, the keyword ENTITY is used.
General entities are also used for text replacement, similar in purpose to the #define directive in the C language.
Internal parsed entities use the following declaration syntax:
<!ENTITY entity_name "entity_content">
After the entity is defined, the following syntax:
&entity_name;
can be used to reference the defined entity content: entity_content.
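The text replacement can be observed with Python's standard-library parser, which expands internal general entities declared in the internal DTD. In the sketch below, the note element and the entity named school are assumptions made up for this illustration; the built-in entities and the character references are standard XML.

```python
import xml.etree.ElementTree as ET

# "school" is a hypothetical internal general entity declared in the
# internal DTD subset. &amp; and &lt; are built-in entities, and
# &#65; / &#x41; are decimal and hexadecimal references to "A".
DOC = """<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!ENTITY school "Purdue Fort Wayne">
]>
<note>&school; &amp; friends, 1 &lt; 2, &#65;&#x41;</note>"""

# The parser substitutes every reference before handing text to the program.
text = ET.fromstring(DOC).text
print(text)
```

The application never sees the references themselves; it receives the replacement text, much as a C compiler sees the expansion of a #define.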
External parsed entities use the following declaration syntax:
<!ENTITY entity_name SYSTEM "URI_of_entity_content">
<!ENTITY entity_name PUBLIC "Public_Identifier" "URI_of_entity_content">
Much of the information on the Internet, including plain text, PDF files, image files (JPEG, GIF), streaming video (QuickTime movies), and MIDI sound files, is not well-formed XML content but is a necessary component of many XML documents. Unparsed entities allow us to refer to non-XML content, which XML parsers bypass completely.
NDATA - stands for notation data; it is the keyword that specifies the type (notation) of an unparsed entity.
The NOTATION declaration adds information about the type of content.
Simple MIME type content can be notated as:
<!NOTATION jpg SYSTEM "image/jpeg">
<!NOTATION gif SYSTEM "image/gif">
<!ELEMENT photo EMPTY>
<!ATTLIST photo source ENTITY #REQUIRED>
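An application has to look up notations and unparsed entities itself, because the parser only reports the declarations and never fetches the content. The sketch below shows one way to collect them with Python's xml.parsers.expat module. The entity name portrait and the file portrait.jpg are hypothetical, and the handler function names are made up for this example.

```python
import xml.parsers.expat

# "portrait" and "portrait.jpg" are hypothetical names for illustration.
DOC = b"""<!DOCTYPE photo [
  <!NOTATION jpg SYSTEM "image/jpeg">
  <!ENTITY portrait SYSTEM "portrait.jpg" NDATA jpg>
  <!ELEMENT photo EMPTY>
  <!ATTLIST photo source ENTITY #REQUIRED>
]>
<photo source="portrait"/>"""

notations = {}         # notation name -> system identifier
unparsed = {}          # entity name -> (system identifier, notation name)
photo_sources = []     # values of the source attribute seen on photo


def on_notation(name, base, system_id, public_id):
    notations[name] = system_id


def on_unparsed_entity(name, base, system_id, public_id, notation_name):
    unparsed[name] = (system_id, notation_name)


def on_start_element(name, attributes):
    if name == "photo":
        photo_sources.append(attributes["source"])


parser = xml.parsers.expat.ParserCreate()
parser.NotationDeclHandler = on_notation
parser.UnparsedEntityDeclHandler = on_unparsed_entity
parser.StartElementHandler = on_start_element
parser.Parse(DOC, True)

# Resolve the attribute value to a file and a MIME type by hand.
system_id, notation_name = unparsed[photo_sources[0]]
print(system_id, notations[notation_name])
```

The image file is never opened by the parser; the application receives only its location and declared type and decides what to do with them.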
Parameter Entity Syntax
The syntax for an external parameter entity is:
<!ENTITY % entity_name SYSTEM "URI">
where entity_name is the name of the entity and URI is the path to the file containing the entity content.
To reference the external parameter entity in a DTD, use the following syntax:
%entity_name;
Example 18: Declaring external parameter entities to reuse a DTD. There are two DTD files in this example: customer.dtd and order.dtd. We can declare an external parameter entity in order.dtd to use the customer.dtd file: <!ENTITY % customer_info SYSTEM "customer.dtd">. Then we use the entity reference %customer_info; to include customer.dtd.
The customer.dtd file defines the elements for a customer and the customer's shipping address:
<?xml version='1.0' encoding='UTF-8' ?>
<!ELEMENT shipping_address (street , city , state , zip)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
<!ELEMENT name (firstName , lastName)>
<!ELEMENT firstName (#PCDATA)>
<!ELEMENT lastName (#PCDATA)>
<!ELEMENT customer (name , shipping_address?)>
<!ATTLIST customer Id CDATA #REQUIRED >
Thus, the order.dtd for a customer order can reuse the predefined customer information in customer.dtd. It looks like this:
<?xml version='1.0' encoding='UTF-8' ?>
<!ENTITY % customer_info SYSTEM "customer.dtd">
%customer_info;
<!ELEMENT order (customer , shipping_address , sales_rep_id , order_date ,
promised_date? , ship_date , status_code , ship_method_code , payment_method ,
order_line+)>
<!ATTLIST order order_number CDATA #REQUIRED
ActiveFlag (Y | N ) 'Y' >
<!ELEMENT sales_rep_id (#PCDATA)>
<!ELEMENT order_date (#PCDATA)>
<!ELEMENT promised_date (#PCDATA)>
<!ELEMENT status_code (#PCDATA)>
<!ELEMENT ship_method_code (#PCDATA)>
<!ELEMENT payment_method (#PCDATA)>
<!ELEMENT order_line (ordered_product , quantity)>
<!ATTLIST order_line order_line_num CDATA #REQUIRED >
<!ELEMENT product (UPCcode , description , color , in_stock_num ,
reorder_level_num , size_code?)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT UPCcode (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT color (#PCDATA)>
<!ELEMENT in_stock_num (#PCDATA)>
<!ELEMENT reorder_level_num (#PCDATA)>
<!ELEMENT ordered_product (UPCcode , color , description , size_code?)>
<!ELEMENT size_code (#PCDATA)>
<!ELEMENT ship_date (#PCDATA)>