4. XML Document Type Definition (DTD)

 

Because XML lets you define your own meaningful tags, you need a mechanism for specifying the elements, attributes, and structure that your XML documents may use. A DTD is one such mechanism. A DTD serves the following purposes:

·         Defines and documents XML elements (tags, or markup)

·         Constrains the attributes (parameters) that each element may carry

·         Enables a validating XML parser to check an XML document against these rules during the document parsing phase

Element Declarations

The keyword <!ELEMENT> is used to declare elements in a DTD. An element declaration gives the element's name and its content type.

The naming rules for the elements are:

o        must begin with a letter or an underscore, and

o        may contain only letters, digits, hyphens, underscores, and periods (colons are also legal but are reserved for namespaces)

Elements may be declared as one of four types: EMPTY, ANY, a set of child elements, or mixed content.
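
The four content types can be seen by letting a parser report each declaration. The following is a minimal sketch, assuming Python's standard xml.parsers.expat module; the "note" document and the content_kinds helper are hypothetical and exist only for this illustration. Note that expat is a non-validating parser: it reads the declarations but does not enforce them.

```python
import xml.parsers.expat as expat
from xml.parsers.expat import model

# Hypothetical document whose internal DTD uses all four content types.
DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (title, body)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT body (#PCDATA | em | br)*>
  <!ELEMENT em ANY>
  <!ELEMENT br EMPTY>
]>
<note><title>Hi</title><body>plain <em>text</em><br/></body></note>"""

# Map expat's content-type constants to the names used in the text.
KIND_NAMES = {
    model.XML_CTYPE_EMPTY: "EMPTY",
    model.XML_CTYPE_ANY: "ANY",
    model.XML_CTYPE_MIXED: "mixed content",
    model.XML_CTYPE_SEQ: "child elements",
    model.XML_CTYPE_CHOICE: "child elements",
}


def content_kinds(doc):
    """Return {element name: content type} for every <!ELEMENT> declaration."""
    kinds = {}

    def on_element_decl(name, content_model):
        # content_model is a tuple: (type, quantifier, name, children)
        kinds[name] = KIND_NAMES[content_model[0]]

    parser = expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(doc, True)
    return kinds


for element, kind in content_kinds(DOC).items():
    print(element, "->", kind)
```

An element declared as (#PCDATA) alone is reported as mixed content, because text-only content is the simplest form of the mixed content model.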

 

Attribute Declarations

Most practical XML documents use attributes as well as elements. The keyword “ATTLIST” declares a list of attributes for an element, each with one of the ten XML attribute types: CDATA, ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION, and enumerated. Attributes give applications extra information about the element.

·         Character references insert a character by its numeric code:

o        &#          -           the code that follows is written in decimal, for example &#65; for “A”

o        &#x         -           the code that follows is written in hexadecimal, for example &#x41; for “A”

·         Occurrence operators specify how many times a child element may appear in the content model of a declared element:

o        ?          -           May appear 0 or 1 time

o        +           -          Must appear 1 or more times

o        *          -          May appear 0 or more times
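
The occurrence operators can be read back from an element declaration. The sketch below assumes Python's standard xml.parsers.expat module; the "letter" content model and the child_occurrences helper are hypothetical names chosen for illustration. A child with no operator must appear exactly once.

```python
import xml.parsers.expat as expat
from xml.parsers.expat import model

# Hypothetical content model that uses each occurrence operator once.
DOC = """<?xml version="1.0"?>
<!DOCTYPE letter [
  <!ELEMENT letter (contact+, paragraph*, note?, signature)>
  <!ELEMENT contact (#PCDATA)>
  <!ELEMENT paragraph (#PCDATA)>
  <!ELEMENT note (#PCDATA)>
  <!ELEMENT signature (#PCDATA)>
]>
<letter><contact>Paul Lin</contact><signature>PL</signature></letter>"""

# Map expat's quantifier constants to the wording used in the list above.
OCCURRENCE = {
    model.XML_CQUANT_NONE: "exactly once",
    model.XML_CQUANT_OPT: "0 or 1 time (?)",
    model.XML_CQUANT_PLUS: "1 or more times (+)",
    model.XML_CQUANT_REP: "0 or more times (*)",
}


def child_occurrences(doc, parent):
    """Return {child name: occurrence rule} for the children of `parent`."""
    result = {}

    def on_element_decl(name, content_model):
        if name != parent:
            return
        # Each child is itself a (type, quantifier, name, children) tuple.
        for _type, quantifier, child_name, _children in content_model[3]:
            result[child_name] = OCCURRENCE[quantifier]

    parser = expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(doc, True)
    return result


for child, rule in child_occurrences(DOC, "letter").items():
    print(child, "->", rule)
```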

 

Language and White Space Attributes

The two attributes predefined by the W3C XML Recommendation are xml:lang and xml:space. If white space is significant in an element, the following declaration can be used, where 'defaultchoice' stands for either 'default' or 'preserve':

<!ATTLIST element xml:space (default|preserve) 'defaultchoice'>

The xml:space attribute takes the value “preserve” (keep all white space) or “default” (let the application apply its default white-space handling).

<?xml version="1.0"?>

<quote xml:space="preserve">

</quote>

 

xml:lang - an attribute that specifies the language used in the content and attribute values of an element in the XML document.  For example, “en-US” identifies US English and “en-GB” identifies British English. It is used in the following format:

<?xml version="1.0"?>

<description xml:lang="en-US">

</description>

 

ATTLIST Declaration Syntax

The ATTLIST declaration specifies

·         the element that can have the attribute,

·         the name of the attribute,

·         the type of the attribute, and

·         the default attribute value.

The syntax is

<!ATTLIST element_name   attribute_name  attribute_type default_value>

 

Default Attribute Value: in the following example, the width attribute of the square element is declared with a default value of “0”. An XML document can override the default by supplying its own width value.
Example 1:
<!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">
<square width="100"></square>
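
The effect of a default value can be observed with a parser that reads the internal DTD. This is a minimal sketch, assuming Python's standard xml.parsers.expat module; the "shapes" wrapper element and the collect_attributes helper are hypothetical additions for the illustration. The first square omits width, so the parser supplies the declared default; the second overrides it.

```python
import xml.parsers.expat as expat

# Hypothetical wrapper document around the square element of Example 1.
DOC = """<?xml version="1.0"?>
<!DOCTYPE shapes [
  <!ELEMENT shapes (square*)>
  <!ELEMENT square EMPTY>
  <!ATTLIST square width CDATA "0">
]>
<shapes><square/><square width="100"/></shapes>"""


def collect_attributes(doc, element):
    """Return one attribute dict per <element> start tag reported by expat."""
    seen = []

    def on_start(name, attrs):
        if name == element:
            seen.append(dict(attrs))

    parser = expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(doc, True)
    return seen


for attrs in collect_attributes(DOC, "square"):
    print(attrs)
```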

 

For CDATA type attributes, the default value must be enclosed in a pair of quotation marks, as shown in the following syntax:

<!ATTLIST element_name attribute_name CDATA "default-value">
 
Example 2:
For example, a payment application may define the default payment method to be credit card:

<!ATTLIST payment type CDATA "credit_card">

<payment type="credit_card"/>

 

#IMPLIED Attribute Value: use this option when you

·         don't want to force the author to include an attribute, and

·         don't have a sensible default value either

 

Example 3:

Syntax:
<!ATTLIST element-name attribute-name attribute-type #IMPLIED>
Example:
<!ATTLIST contact phone CDATA #IMPLIED>
<contact phone="260-481-6339"/>

 

#FIXED attribute value is used when an attribute must always have the same value and changes are not allowed. If an author supplies a different value, a validating XML parser reports an error.

 

Example 4:

Syntax:
 
<!ATTLIST element_name attribute_name attribute_type #FIXED "value">
 
Example:
<!ATTLIST vendor company CDATA #FIXED "A-Dell">
<vendor company="A-Dell"/>

 

#REQUIRED attribute value is used when you don't have a sensible default value, but the attribute must be present.

 
 
Example 5:
Syntax:
<!ATTLIST element-name attribute-name attribute-type #REQUIRED>
Example:
<!ATTLIST staff job_id CDATA #REQUIRED>
<staff job_id="2354"/>
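
The four forms described above (a plain default value, #IMPLIED, #FIXED, and #REQUIRED) can be told apart when a parser reports each attribute declaration. The sketch below assumes Python's standard xml.parsers.expat module; the "records" root element and the default_forms helper are hypothetical and only gather the declarations from Examples 1 and 3 to 5 in one place.

```python
import xml.parsers.expat as expat

# Hypothetical DTD that collects the attribute declarations of Examples 1, 3, 4, 5.
DOC = """<?xml version="1.0"?>
<!DOCTYPE records [
  <!ELEMENT records (square | contact | vendor | staff)*>
  <!ELEMENT square EMPTY>
  <!ELEMENT contact EMPTY>
  <!ELEMENT vendor EMPTY>
  <!ELEMENT staff EMPTY>
  <!ATTLIST square width CDATA "0">
  <!ATTLIST contact phone CDATA #IMPLIED>
  <!ATTLIST vendor company CDATA #FIXED "A-Dell">
  <!ATTLIST staff job_id CDATA #REQUIRED>
]>
<records/>"""


def default_forms(doc):
    """Return {(element, attribute): form of the default declaration}."""
    forms = {}

    def on_attlist_decl(elname, attname, att_type, default, required):
        if default is None:
            # No value in the declaration: the attribute is optional or mandatory.
            form = "#REQUIRED" if required else "#IMPLIED"
        else:
            # expat flags a #FIXED value as "required" with a default present.
            form = "#FIXED" if required else "default value"
        forms[(elname, attname)] = form

    parser = expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist_decl
    parser.Parse(doc, True)
    return forms


for (element, attribute), form in default_forms(DOC).items():
    print(element, attribute, "->", form)
```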

 

 

Enumerated attribute values are used when you want the attribute value to be one of a fixed set of legal values. In the declaration, the possible values are separated by vertical bars.

 

Example 6:

Syntax:
<!ATTLIST element-name attribute-name (eval|eval|..) default-value>
 
Example:
<!ATTLIST payment type (credit_card|check|cash) "cash">
 
<payment type="credit_card"/>
or
<payment type="cash"/>
or
<payment type="check"/>
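
A validating parser rejects any value outside the enumerated set. Python's standard xml.parsers.expat module does not validate, so the following sketch performs that single check by hand. It assumes that expat reports an enumerated type as a string of the form "(a|b|c)"; the enumeration_errors helper, the DOC_TEMPLATE string, and the value "bitcoin" are hypothetical.

```python
import xml.parsers.expat as expat

# The payment declaration from Example 6, with a slot for the attribute value.
DOC_TEMPLATE = """<?xml version="1.0"?>
<!DOCTYPE payment [
  <!ELEMENT payment EMPTY>
  <!ATTLIST payment type (credit_card|check|cash) "cash">
]>
<payment type="{value}"/>"""


def enumeration_errors(doc):
    """Return messages for attribute values outside their declared enumeration."""
    allowed = {}  # (element, attribute) -> set of legal values
    errors = []

    def on_attlist_decl(elname, attname, att_type, default, required):
        # Assumption: an enumerated type arrives as the string "(a|b|c)".
        if att_type.startswith("("):
            allowed[(elname, attname)] = set(att_type.strip("()").split("|"))

    def on_start(name, attrs):
        for attname, value in attrs.items():
            legal = allowed.get((name, attname))
            if legal is not None and value not in legal:
                errors.append(f"{name}/@{attname}: {value!r} is not one of {sorted(legal)}")

    parser = expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist_decl
    parser.StartElementHandler = on_start
    parser.Parse(doc, True)
    return errors


print(enumeration_errors(DOC_TEMPLATE.format(value="check")))
print(enumeration_errors(DOC_TEMPLATE.format(value="bitcoin")))
```

The declarations are read before the root element starts, so one pass over the document is enough for this check.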

 

 
The Document Type Declaration
In an XML document, the document type definition can be declared as either
·         Internal DTD - placed inside the XML document itself, for use by that document alone
·         External DTD - stored in a separate file, typically on a Web server, so that many XML documents can share it. External DTDs are further divided into the following two types:
o        private DTD (using the SYSTEM keyword) - for use by a single user or a group of users
o        public DTD (using the PUBLIC keyword) - for wider use
Syntax for declaring a private external DTD:
<!DOCTYPE root_element SYSTEM "DTD_location">
where DTD_location can be a relative or absolute URL.

Syntax for declaring a public external DTD:
<!DOCTYPE root_element PUBLIC "DTD_name" "DTD_location">
where DTD_name is a public identifier that the XML processor may try to resolve first. If it cannot be resolved, the DTD is fetched from DTD_location. The following two declarations show how to reference public external DTDs:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
  "http://www.w3.org/TR/REC-html40/loose.dtd">

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN"
  "http://www.w3.org/TR/REC-html40/strict.dtd">

 

Internal DTD

Some XML documents carry their DTD inside the XML file itself. This is referred to as an internal DTD. We insert the DTD element and attribute declarations directly inside the XML document. Note that the symbols “[” and “]” delimit the declarations inside the <!DOCTYPE> declaration, as shown in Example 7.
 
Example 7: CDATA (character data) attributes defined in an internal DTD.
This example shows how to use ATTLIST and CDATA within an XML document.
The DTD declares that “window” is an element and that it is EMPTY. The height and width are two attributes of the window element. After the DTD is declared within the DOCTYPE, the attributes can be assigned numbers written as character strings.

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!DOCTYPE window [
  <!ELEMENT window EMPTY>
  <!ATTLIST window height CDATA #REQUIRED>
  <!ATTLIST window width CDATA #REQUIRED>]>
<window height="32" width="32"/>

 

Example 8: A private external DTD example using SYSTEM keyword.
The following DOCTYPE declaration uses the SYSTEM keyword to show that this DTD will be used as a private DTD:
<!DOCTYPE picture SYSTEM "http://www.etcs.ipfw.edu/~lin/xml/dtds/picture.dtd">
It shows that the root element of the XML document is “picture” and that the DTD for this document is located at the URI: http://www.etcs.ipfw.edu/~lin/xml/dtds/picture.dtd
We can also use the following relative URLs to locate the DTD, depending on where the DTD resides:
·         If it resides on the same site: <!DOCTYPE picture SYSTEM "/dtds/picture.dtd">
·         If it resides in the same directory: <!DOCTYPE picture SYSTEM "picture.dtd">

 

Example 9: An ID Example.

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!DOCTYPE employee_name [
  <!ELEMENT employee_name (#PCDATA)>
  <!ATTLIST employee_name employee_no ID #REQUIRED>
]>
<employee_name employee_no="temp999111000">Jon Doe</employee_name>

 

Example 10: An IDREF Example

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!DOCTYPE project_team [
  <!ELEMENT project_team (employee_name)*>
  <!ELEMENT employee_name (#PCDATA)>
  <!ATTLIST employee_name employee_no ID #REQUIRED>
  <!ATTLIST employee_name projectmanager_1 IDREF #IMPLIED>
  <!ATTLIST employee_name projectmanager_2 IDREF #IMPLIED>
]>
<project_team>
  <employee_name employee_no="nj12345678">Al King</employee_name>
  <employee_name employee_no="ny12890000">Sam Bush</employee_name>
  <employee_name employee_no="temp999111000"
    projectmanager_1="nj12345678" projectmanager_2="ny12890000">Jon Doe</employee_name>
</project_team>

 

Example 11: ENTITY Example

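<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE lab_report [
  <!ELEMENT lab_report (results)*>
  <!ELEMENT results EMPTY>
  <!ATTLIST results image ENTITY #REQUIRED>
  <!-- An ENTITY attribute must name an unparsed entity,
       which needs NDATA and a declared notation -->
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY ipfw SYSTEM
    "http://www.ipfw.edu/icons/ipfw.gif" NDATA gif>
]>
<lab_report>
  <results image="ipfw"/>
</lab_report>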

 

Example 12: An ENTITIES Example

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE lab_report_all [
  <!ELEMENT lab_report_all (results)*>
  <!ELEMENT results EMPTY>
  <!ATTLIST results images ENTITIES #REQUIRED>
  <!-- Each name listed in an ENTITIES attribute must be an unparsed entity -->
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY imga SYSTEM
    "http://www.ecet.ipfw.edu/ecet/lab/imga.gif" NDATA gif>
  <!ENTITY imgb SYSTEM
    "http://www.ecet.ipfw.edu/ecet/lab/imgb.gif" NDATA gif>
  <!ENTITY imgc SYSTEM
    "http://www.ecet.ipfw.edu/ecet/lab/imgc.gif" NDATA gif>
]>
<lab_report_all>
  <results images="imga imgb imgc"/>
</lab_report_all>

 

Example 13: NMTOKEN Example

<?xml version="1.0"?>
<!DOCTYPE student_name [
  <!ELEMENT student_name (#PCDATA)>
  <!ATTLIST student_name student_no NMTOKEN #REQUIRED>
]>
<student_name student_no="999888777">Mike Adams</student_name>

 

Example 14: NOTATION Example

<?xml version="1.0"?>
<!DOCTYPE program_code [
  <!ELEMENT program_code (#PCDATA)>
  <!NOTATION C PUBLIC "ANSI C">
  <!ATTLIST program_code lang NOTATION (C) #REQUIRED>
]>
<program_code lang="C">Some C statements</program_code>

 

Example 15: Enumeration Example

<?xml version="1.0"?>
<!DOCTYPE assignments [
  <!ELEMENT assignments (homework)*>
  <!ELEMENT homework (#PCDATA)>
  <!ATTLIST homework status (urgent|normal) #REQUIRED>
]>
<assignments>
  <homework status="urgent">This is an urgent
    homework, its due date is now</homework>
  <homework status="normal">This homework due next week</homework>
</assignments>

 

 

External DTD Examples

 

The listing below shows a DTD for the parent element “person”, which contains three child elements: “first_name”, “middle_name”, and “last_name”. It also specifies that each child element contains PCDATA (parsed character data), that is, text. We may edit and save this person.dtd file in an appropriate Web server directory. The Web server should also be configured to serve the file with the MIME media type application/xml-dtd.

 
Example 16: A person DTD.
<!ELEMENT person (first_name, middle_name, last_name)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT middle_name (#PCDATA)>
<!ELEMENT last_name (#PCDATA)>

 

If the person.dtd file is saved at the following Web server location, any XML document may reference this external DTD with the following DOCTYPE declaration:

<!DOCTYPE person SYSTEM "http://www.etcs.ipfw.edu/~lin/xml/dtds/person.dtd">

 

where “person” is the root element of the document and must match the element declared in person.dtd.

 

Building and Using DTDs 

If the display of XML content is needed, an XML-compliant application will process the following three files to display it:

 

Example 17: An XML-based business letter with a document type definition file. The letter.dtd file is the external document type definition (DTD), which defines the structure, constraints, and types of the elements used in the letter.xml file.

 

 

<?xml version = "1.0"?>

<!-- letter.xml               -->

<!-- An XML Formatted Business Letter -->

<!DOCTYPE letter SYSTEM "letter.dtd">

<letter>

   <contact type = "from">

      <name>Paul Lin</name>

      <address1>ECET Dept, Purdue U Fort Wayne campus</address1>

      <address2>2101 Coliseum Blvd. East</address2>

      <city>Fort Wayne</city>

      <state>Indiana</state>

      <zip>46805</zip>

      <phone>219-481-6339</phone>

      <flag gender = "M"/>

   </contact>

 

   <contact type = "to">

      <name>Lisa Smith</name>

      <address1>PO. Box 54321</address1>

      <address2/>          <!-- Empty -->

      <city>Othertown</city>

      <state>Otherstate</state>

      <zip>46815</zip>

      <phone>219-555-4321</phone>

      <flag gender = "F"/>

   </contact>

   <paragraph>Dear Customer,</paragraph>

   <paragraph>Thank you very much for your business.</paragraph>

   <paragraph>Sincerely, Paul Lin</paragraph>

</letter>

 

<!-- letter.dtd -->

<!--Document Type Definition file is saved with a dtd extension -->

<!ELEMENT letter (contact+, paragraph+)>

<!ELEMENT contact (name, address1, address2, city, state,

   zip, phone, flag)>

<!ATTLIST contact type CDATA #IMPLIED>

<!ELEMENT name (#PCDATA)>

<!ELEMENT address1 (#PCDATA)>

<!ELEMENT address2 (#PCDATA)>

<!ELEMENT city (#PCDATA)>

<!ELEMENT state (#PCDATA)>

<!ELEMENT zip (#PCDATA)>

<!ELEMENT phone (#PCDATA)>

<!ELEMENT flag EMPTY>

<!ATTLIST flag gender CDATA #IMPLIED>

<!ELEMENT paragraph (#PCDATA)>

 

Entities

XML provides two kinds of entities (named constants): general entities and parameter entities. The built-in general entities &amp;, &quot;, &apos;, &lt;, and &gt; are available in every XML document. To declare an entity, the keyword ENTITY is used.

 

General Entities

General entities are also used for text replacement, which is similar in purpose to the #define directive in the C language.

 

Internal Parsed Entities

Internal parsed entities use the following declaration syntax:

<!ENTITY entity_name "entity_content">

After the entity is defined, the following syntax:

          &entity_name;

can be used to reference the defined entity; the parser replaces the reference with entity_content.
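
The replacement can be watched in action. This is a minimal sketch, assuming Python's standard xml.parsers.expat module; the "memo" element, the "campus" entity, and the text_of helper are hypothetical names used only for this illustration.

```python
import xml.parsers.expat as expat

# Hypothetical document that declares and then references an internal entity.
DOC = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (#PCDATA)>
  <!ENTITY campus "Fort Wayne campus">
]>
<memo>Meet at the &campus;.</memo>"""


def text_of(doc):
    """Return the character data of the document after entity replacement."""
    chunks = []
    parser = expat.ParserCreate()
    # expat may deliver the text in several pieces, so collect and join them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(doc, True)
    return "".join(chunks)


print(text_of(DOC))
```

The application never sees the reference &campus; itself, only the replacement text, which is what makes the comparison with #define apt.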

 

External Parsed Entities

<!ENTITY entity_name SYSTEM "URI_of_entity_content">

<!ENTITY entity_name PUBLIC "Public_Identifier" "URI_of_entity_content">

 

Unparsed Entities and Notations

Much of the information on the Internet, including plain text, PDF files, image files (JPEG, GIF), streaming video (QuickTime movies), and MIDI sound files, is not well-formed XML content but is a necessary component of many XML documents. Unparsed entities allow us to reference such non-XML content, which the XML parser does not parse.

 

NDATA - stands for notation data, a keyword for specifying the type (notation) of an unparsed entity.

 

NOTATION

The NOTATION declaration adds information about the type of content.

Simple MIME type content can be notated as:

<!NOTATION jpg SYSTEM "image/jpeg">

<!NOTATION gif SYSTEM "image/gif">

 

Embedding Unparsed Content

<!ELEMENT photo EMPTY>

<!ATTLIST photo source ENTITY #REQUIRED>

 

Parameter Entities

XML Parameter entities enable you to create a structure that allows a document author to choose from two or more possible DTD structures without giving that person control over the actual DTD.

 
Parameter Entity Syntax

The syntax for an external parameter entity is:

<!ENTITY % entity_name SYSTEM "URI">

where entity_name is the name of the entity and URI is the path to the file containing the entity content.

To reference the external parameter entity in a DTD, use the following syntax:
%entity_name;

Example 18: Declaring an external parameter entity to reuse a DTD.  There are two DTD files in this example: customer.dtd and order.dtd.  We declare an external parameter entity in order.dtd to pull in the customer.dtd file:  <!ENTITY % customer_info SYSTEM "customer.dtd">.  Then we use the entity reference %customer_info; to include the declarations from customer.dtd.

The customer.dtd file defines the elements for a customer and the customer’s shipping address:

<?xml version='1.0' encoding='UTF-8' ?>
<!ELEMENT shipping_address (street , city , state , zip)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
<!ELEMENT name (firstName , lastName)>
<!ELEMENT firstName (#PCDATA)>
<!ELEMENT lastName (#PCDATA)>
<!ELEMENT customer (name , shipping_address?)>
<!ATTLIST customer Id CDATA #REQUIRED >

 

Thus, the order.dtd for a customer order can reuse the customer information already defined in customer.dtd. It looks like this:

<?xml version='1.0' encoding='UTF-8' ?>
<!ENTITY % customer_info SYSTEM "customer.dtd">

%customer_info;

<!ELEMENT order (customer , shipping_address , sales_rep_id , order_date , promised_date? , ship_date , status_code , ship_method_code , payment_method , order_line+)>

<!ATTLIST order order_number CDATA #REQUIRED
                               ActiveFlag (Y | N ) 'Y' >
<!ELEMENT sales_rep_id (#PCDATA)>
<!ELEMENT order_date (#PCDATA)>
<!ELEMENT promised_date (#PCDATA)>
<!ELEMENT ship_method_code (#PCDATA)>
<!ELEMENT payment_method (#PCDATA)>
<!ELEMENT order_line (ordered_product , quantity)>
<!ATTLIST order_line order_line_num CDATA #REQUIRED >
<!ELEMENT product (UPCcode , description , color , in_stock_num , reorder_level_num , size_code?)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT UPCcode (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT color (#PCDATA)>
<!ELEMENT in_stock_num (#PCDATA)>
<!ELEMENT reorder_level_num (#PCDATA)>
<!ELEMENT ordered_product (UPCcode , color , description , size_code?)>
<!ELEMENT size_code (#PCDATA)>
<!ELEMENT ship_date (#PCDATA)>
<!ELEMENT status_code (#PCDATA)>

 

 

Web Site References