9. Extensible Markup Language (XML): An Introduction

 

 

XML - an Overview

XML (Extensible Markup Language) is a standard developed by Jon Bosak and other members of the World Wide Web Consortium, W3C (http://www.w3.org/XML). Its roots go back to 1969, when Charles Goldfarb, an IBM researcher, led the design team for the Generalized Markup Language (GML), the forerunner of the Standard Generalized Markup Language (SGML). Although SGML is an international standard for marking up data, it has the following weaknesses: it is extremely complex, it is expensive to implement, and it is not supported by common browsers such as Netscape and Internet Explorer. HTML, in contrast, is free and simple, but it cannot describe the data it represents.

XML was developed to overcome the shortcomings of SGML and HTML. XML promises to increase both the efficiency and the flexibility of handling computerized information, as needed in E-Commerce. For example, you can define a new pair of tags, <PRICE> and </PRICE>, that is specific to your application, rather than being limited to the tags available in HTML.

 

What is XML?

·        A meta language for describing data (meta-data)

 

·        A standard protocol for exchanging and publishing information in a structured manner

 

A quick look at an XML document:

<?xml version="1.0"?>

<!-- dataset.xml -->

<dataset>

  <row num="1">

    <lastname>SMITH</lastname>

    <firstname>Dan</firstname>

    <salary>80000</salary>

    <jobtitle>Web Developer</jobtitle>

  </row>

</dataset>
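To illustrate how a program might read such a document, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The embedded text is a copy of the sample above, written with straight quotes and a closed <jobtitle> tag so that a parser accepts it. The function name read_rows is a hypothetical helper chosen for this illustration.

```python
import xml.etree.ElementTree as ET

# Copy of the dataset.xml sample, with plain ASCII quotes and a
# properly closed <jobtitle> element so that it is well-formed.
DATASET_XML = """<?xml version="1.0"?>
<!-- dataset.xml -->
<dataset>
  <row num="1">
    <lastname>SMITH</lastname>
    <firstname>Dan</firstname>
    <salary>80000</salary>
    <jobtitle>Web Developer</jobtitle>
  </row>
</dataset>
"""


def read_rows(xml_text):
    """Return one dict per <row>, keyed by attribute and child element names."""
    root = ET.fromstring(xml_text)
    rows = []
    for row in root.findall("row"):
        record = {"num": row.get("num")}  # the num attribute of <row>
        for field in row:                 # each child element of <row>
            record[field.tag] = (field.text or "").strip()
        rows.append(record)
    return rows


if __name__ == "__main__":
    for record in read_rows(DATASET_XML):
        print(record)
```

All values come back as strings; converting the salary to a number is left to the application, since XML itself does not assign data types to element content.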

 

Contents of this XML page include the

 

Benefits of XML (source - http://www.softwareag.com/xml/about/xml_ben.htm)

 

Additional highlights of XML are:

 

The design goals for XML are: (source – http://www.w3.org/TR/REC-xml/)

  1. XML shall be straightforwardly usable over the Internet.
  2. XML shall support a wide variety of applications.
  3. XML shall be compatible with SGML.
  4. It shall be easy to write programs which process XML documents.
  5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
  6. XML documents should be human-legible and reasonably clear.
  7. The XML design should be prepared quickly.
  8. The design of XML shall be formal and concise.
  9. XML documents shall be easy to create.
  10. Terseness in XML markup is of minimal importance.

 

XML in 10 points (source- http://www.w3.org/XML/1999/XML-in-10-points)

  1. XML is for structuring data
  2. XML looks a bit like HTML
  3. XML is text, but isn’t meant to be read
  4. XML is verbose by design
  5. XML is a family of technologies
  6. XML is new, but not that new
  7. XML leads HTML to XHTML
  8. XML is modular
  9. XML is the basis for RDF (Resource Description Framework) and the Semantic Web
  10. XML is license-free, platform-independent and well supported

 

Examples of XML Applications

·        Separation of data storage and display within HTML pages - we can use XML to carry data. The data in an XML file can be loaded into HTML pages as “data islands”, and HTML is then used for data layout and display.

 

 

 

XML Associated Files or programs

In addition to the XML file itself, some files and programs are needed for processing XML-compliant applications:

 

o       SAX (Simple API for XML) http://www.saxproject.org/ - was originally a Java-only API. The current version is SAX 2.0.1, and there are versions for several programming language environments other than Java.
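As one illustration of a SAX version outside Java, the following is a minimal sketch using the xml.sax module from Python's standard library. SAX is event-driven: the parser calls handler methods as it meets each start tag, rather than building a tree in memory. The TagCounter class, the count_tags helper, and the two-row sample document are hypothetical and exist only for this illustration.

```python
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Counts how many times each element name is opened."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser once for every start tag it encounters.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_tags(xml_text):
    """Parse xml_text with SAX and return a dict of element-name counts."""
    handler = TagCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.counts


# Hypothetical sample document, not taken from the text above.
SAMPLE = (
    "<dataset>"
    "<row num='1'><lastname>SMITH</lastname></row>"
    "<row num='2'><lastname>LEE</lastname></row>"
    "</dataset>"
)

if __name__ == "__main__":
    print(count_tags(SAMPLE))
```

Because the handler keeps only the running counts, this style of processing can suit documents that are too large to hold in memory as a full tree.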

 

 

XML Syntax and Terminology

Elements that have no text between their two tags (empty elements) are written in XML using the empty-tag notation, which ends with a forward slash.

Start Tag: <ElementName AttributeName="AttributeValue">

End Tag: </ElementName>

 

Components of XML Documents

 

XML Declaration

The XML declaration appears on the first line of an XML document, before the content:

<?xml version="version_number"
      encoding="encoding_declaration"
      standalone="yes_or_no" ?>

 

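This declaration is interpreted as follows:

Version Declaration (required): the version of the XML standard that the document conforms to, for example "1.0".

Encoding Declaration (optional): the character encoding of the document, for example "UTF-8".

Standalone Document Declaration (optional): "yes" if the document does not depend on external markup declarations, "no" otherwise.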

 

 

Basic XML Document Page Creation

 

XML pages can be created with any text editor. We will use the Microsoft Notepad editor to create the following email XML example page, save it as email.xml, and view it with Internet Explorer.

 

Example 1: A well-formed email prepared in XML format. The XML declaration appears on the first line and gives the version of the XML standard and the character encoding. It is followed by a comment line, which shows the XML file name with .xml as the extension.

<?xml version="1.0" encoding="UTF-8" ?>

<!-- email.xml -->

<EMAIL>

  <TO>lin@ipfw.edu</TO>

  <FROM>lin@hotmail.com</FROM>

  <CC>XMLL_in_Action@hotmail.com</CC>

  <SUBJECT>XML Example: Basic email</SUBJECT>

  <BODY>This is an XML-based email.</BODY>

</EMAIL>
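A document is well-formed when a parser can read it without syntax errors, which can be checked programmatically. The sketch below uses Python's standard xml.etree.ElementTree module and treats a parse error as "not well-formed". The helper name is_well_formed and the deliberately broken variant BAD_EMAIL are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Copy of the email.xml example, with a correctly written encoding attribute.
GOOD_EMAIL = """<?xml version="1.0" encoding="UTF-8" ?>
<!-- email.xml -->
<EMAIL>
  <TO>lin@ipfw.edu</TO>
  <FROM>lin@hotmail.com</FROM>
  <CC>XMLL_in_Action@hotmail.com</CC>
  <SUBJECT>XML Example: Basic email</SUBJECT>
  <BODY>This is an XML-based email.</BODY>
</EMAIL>
"""

# Hypothetical broken variant: the </SUBJECT> end tag has been removed,
# so the start and end tags no longer nest properly.
BAD_EMAIL = GOOD_EMAIL.replace("</SUBJECT>", "")


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        # Bytes are passed so that the encoding declaration is honoured.
        ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(GOOD_EMAIL))
    print(is_well_formed(BAD_EMAIL))
```

This checks well-formedness only. It does not validate the document against a DTD or schema.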

 

 

Example 2: XML elements (tags) created to capture information in <PRICE>, <PICTURE>, and <ARTICLE>.

<PRICE Currency="Euro"> 26.02 </PRICE>

 

<PICTURE>   </PICTURE>

 

<ARTICLE>

  <TITLE>     </TITLE>

  <DATE>      </DATE>

  <AUTHOR>

    <FIRSTNAME> </FIRSTNAME>

    <LASTNAME>  </LASTNAME>

  </AUTHOR>

  <SUMMARY>   </SUMMARY>

  <CONTENT>   </CONTENT>

</ARTICLE>

 

 

Example 3: Empty elements.

 

<img align="center" src="http://www.etcs.ipfw.edu/~lin/netdiagram.gif" />

<br></br>

<br/>
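The two notations for an empty element are equivalent to a parser. The following sketch, which uses Python's standard xml.etree.ElementTree module, shows this, and also shows that a start tag and end tag separated by a space are not strictly empty, because the space counts as character data. The helper name is_empty is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_empty(elem):
    """True if the element has no character data and no child elements."""
    return elem.text is None and len(elem) == 0


long_form = ET.fromstring("<br></br>")   # start tag immediately followed by end tag
short_form = ET.fromstring("<br/>")      # empty-tag notation
spaced = ET.fromstring("<br> </br>")     # a space between the tags is content

if __name__ == "__main__":
    print(is_empty(long_form), is_empty(short_form), is_empty(spaced))
    # Both empty forms serialize to the same bytes.
    print(ET.tostring(long_form) == ET.tostring(short_form))
```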

 

 

XML Elements, Attributes, and Values

XML elements (tags) are the basic components of a document. All elements in an XML document have parent-child relationships. Elements can have different content types and may carry a set of attribute specifications. Attributes are contained within an element’s opening tag. Each attribute has a value, delimited by quotation marks, that describes the purpose and content of the particular element. Information contained in attributes is called metadata, or “data about data”, which describes the content, quality, condition, or other characteristics of data. For Web applications, metadata can be seen as machine-understandable information, a topic addressed by the W3C Metadata Activity (http://www.w3.org/Metadata/Activity.html).

 

XML elements must follow the naming rules defined in the XML specification (www.w3.org/TR/REC-xml):

Root Element

Content within an XML document can be encoded as either elements or attributes. For example, a book title can be expressed in one of the following ways:

As an element:

<book>

  <title>XML-RPC</title>

  *****

</book>

As an attribute:

<book title="XML-RPC">

  *****

</book>
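The choice between the two encodings affects how a program retrieves the value. The sketch below, using Python's standard xml.etree.ElementTree module, reads the title from either form. The function name book_title is a hypothetical helper, and the "*****" placeholders from the listings are omitted.

```python
import xml.etree.ElementTree as ET

# The two encodings of the same book title.
AS_ELEMENT = "<book><title>XML-RPC</title></book>"
AS_ATTRIBUTE = '<book title="XML-RPC"></book>'


def book_title(xml_text):
    """Return the book title whether it is stored as an attribute or an element."""
    book = ET.fromstring(xml_text)
    title = book.get("title")            # attribute form: <book title="...">
    if title is None:
        title = book.findtext("title")   # element form: <book><title>...</title>
    return title.strip() if title is not None else None


if __name__ == "__main__":
    print(book_title(AS_ELEMENT))
    print(book_title(AS_ATTRIBUTE))
```

An attribute holds a single string and cannot contain child elements, so content that may later need internal structure is usually safer as an element.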

 

XML Trees

XML documents are structured as hierarchical trees, with a single root element that contains all other elements. As shown in Example 1, the element <EMAIL> is the root element, which has five child elements: <TO>, <FROM>, <CC>, <SUBJECT>, and <BODY>. In Example 4, <lab-room> is the root element, which contains three elements: <labtable>, <labchair>, and <pc>. Further elements and attributes describe the lab tables, lab chairs, and personal computers in terms of <quantity>, <quality>, <color>, <manufacturer>, etc.

 

Example 4: An XML file describing a university computer lab (et226lab.xml).

 

<?xml version="1.0"?>

<!-- et226lab.xml -->

<lab-room>

  <labtable type="rectangular" wood="maple">

    <quantity>8</quantity>

    <quality>good</quality>

    <color>brown</color>

    <manufacturer>Steel Case</manufacturer>

  </labtable>

  <labchair wood="oak">

    <quantity>20</quantity>

    <quality>good</quality>

    <cushion included="false">

      <color>brown</color>

    </cushion>

  </labchair>

 

  <pc>

    <quantity>14</quantity>

    <monitor>14</monitor>

    <cpu> Intel Pentium 4 1.2 GHz</cpu>

    <harddisk> 40Gbytes</harddisk> 

  </pc>

</lab-room>
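The tree structure of this document can be made visible by walking it recursively. The following sketch uses Python's standard xml.etree.ElementTree module to print an indented outline, one line per element. The function name outline and the output layout are choices made for this illustration only.

```python
import xml.etree.ElementTree as ET

# Copy of the et226lab.xml example.
LAB_XML = """<?xml version="1.0"?>
<!-- et226lab.xml -->
<lab-room>
  <labtable type="rectangular" wood="maple">
    <quantity>8</quantity>
    <quality>good</quality>
    <color>brown</color>
    <manufacturer>Steel Case</manufacturer>
  </labtable>
  <labchair wood="oak">
    <quantity>20</quantity>
    <quality>good</quality>
    <cushion included="false">
      <color>brown</color>
    </cushion>
  </labchair>
  <pc>
    <quantity>14</quantity>
    <monitor>14</monitor>
    <cpu> Intel Pentium 4 1.2 GHz</cpu>
    <harddisk> 40Gbytes</harddisk>
  </pc>
</lab-room>
"""


def outline(elem, depth=0):
    """Return indented lines describing elem and all of its descendants."""
    attrs = "".join(f' {k}="{v}"' for k, v in sorted(elem.attrib.items()))
    line = "  " * depth + elem.tag + attrs
    text = (elem.text or "").strip()
    if text:
        line += ": " + text
    lines = [line]
    for child in elem:  # children are visited in document order
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(ET.fromstring(LAB_XML))))
```

Each level of indentation corresponds to one parent-child step, so <color> inside <cushion> appears three levels below the root.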

 

 

Example 5: XML documents can be viewed in tree-structure format with Microsoft Internet Explorer 6.0. Open the et226lab.xml document with Internet Explorer 6.0; the collapsed view shows only the root element, and you can click the “-” or “+” sign to contract or expand the list of elements.

 


 

 

 

 

XML Document Processing

The function of XML markup is to describe a document's storage and logical structure, and to associate attribute-value pairs with its logical structures.

 

Extracting XML Data Using Internet Explorer

There is often a need to extract XML data and display it in a Web browser. For example, we have an XML page named home_appliance_catalog.xml, which describes the <WASHER> products of a <HOME_APPLIANCES> catalog. To bind this XML data to a Web page, we need to make a reference to the XML data source. This is accomplished by placing the following code in the example HTML page named “ha_catalog_display.html”:

<xml  src="home_appliance_catalog.xml"

      id="xmlhomeapp"

      async="false">

</xml>

 

We then set the DATASRC property of an HTML table to “#xmlhomeapp”, matching the id above, to reference the same XML data island:

<table datasrc="#xmlhomeapp" width="100%" border="1">

 

We then use HTML <SPAN> tags with the DATAFLD attribute to extract the bound data and display the content:

 

<tr align="center">

<td><span datafld="product_name"></span></td>

<td><span datafld="product_id"></span></td>

<td><span datafld="EnergyStarQualified"></span></td>

<td><span datafld="price"></span></td>

</tr>

 

When you open the ha_catalog_display.html page with Microsoft Internet Explorer, the XML content should display in HTML table format.

 

Example 6: Binding XML data and displaying it with the IE Web browser. We created two files, ha_catalog_display.html and home_appliance_catalog.xml, as shown below. Then open the HTML page to read the XML data and display it in HTML table format.

 

 

 

 

<html>

<!-- ha_catalog_display.html -->

<body>

<xml  src="home_appliance_catalog.xml"

      id="xmlhomeapp"

      async="false">

</xml>

<table datasrc="#xmlhomeapp" width="100%" border="1">

<thead>

      <th>Product Name</th>

      <th>Product ID</th>

      <th>EnergyStar Qualified</th>

      <th>Price</th>

</thead>

<tr align="center">

<td><span datafld="product_name"></span></td>

<td><span datafld="product_id"></span></td>

<td><span datafld="EnergyStarQualified"></span></td>

<td><span datafld="price"></span></td>

</tr>

</table>

</body>

</html>

 

<?xml version="1.0" encoding="utf-8"  ?>

<!-- home_appliance_catalog.xml -->

<!-- Edited with Microsoft Notepad -->

<HOME_APPLIANCES>

      <WASHER>

            <product_name>WH Front Load Washer</product_name>

            <product_id>      GHW8200</product_id>

            <EnergyStarQualified>Yes</EnergyStarQualified>

            <price>$350.00</price>

      </WASHER>

 

      <WASHER>

            <product_name>WH Top Load Washer</product_name>

            <product_id>      GHW8100</product_id>

            <EnergyStarQualified>No</EnergyStarQualified>

            <price>$300.00</price>

      </WASHER>

      <WASHER>

            <product_name>Washer/Dryer All In One     </product_name>

            <product_id>      GHW9300</product_id>

            <EnergyStarQualified>Yes</EnergyStarQualified>

            <price>$950.00</price>

      </WASHER>

</HOME_APPLIANCES>
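The <xml> data-island tag and the DATASRC and DATAFLD attributes are specific to Internet Explorer. As a browser-independent illustration of the same idea, the sketch below uses Python's standard xml.etree.ElementTree module to pull the same four fields out of each <WASHER> record. It is not the data-binding mechanism described above, only a comparable extraction step, and the function name catalog_rows is hypothetical.

```python
import xml.etree.ElementTree as ET

# Copy of home_appliance_catalog.xml (indentation tidied).
CATALOG_XML = """<?xml version="1.0" encoding="utf-8" ?>
<!-- home_appliance_catalog.xml -->
<HOME_APPLIANCES>
  <WASHER>
    <product_name>WH Front Load Washer</product_name>
    <product_id>      GHW8200</product_id>
    <EnergyStarQualified>Yes</EnergyStarQualified>
    <price>$350.00</price>
  </WASHER>
  <WASHER>
    <product_name>WH Top Load Washer</product_name>
    <product_id>      GHW8100</product_id>
    <EnergyStarQualified>No</EnergyStarQualified>
    <price>$300.00</price>
  </WASHER>
  <WASHER>
    <product_name>Washer/Dryer All In One     </product_name>
    <product_id>      GHW9300</product_id>
    <EnergyStarQualified>Yes</EnergyStarQualified>
    <price>$950.00</price>
  </WASHER>
</HOME_APPLIANCES>
"""

# Same fields, in the same order, as the DATAFLD attributes in the HTML page.
FIELDS = ("product_name", "product_id", "EnergyStarQualified", "price")


def catalog_rows(xml_text):
    """Return one tuple of field values per <WASHER> element."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    rows = []
    for washer in root.findall("WASHER"):
        # strip() removes the padding spaces inside some of the elements
        rows.append(tuple(washer.findtext(f, default="").strip() for f in FIELDS))
    return rows


if __name__ == "__main__":
    for row in catalog_rows(CATALOG_XML):
        print(" | ".join(row))
```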

 

 

 

XML-Based Languages

 

WML (Wireless Markup Language) - the markup language of the WAP (Wireless Application Protocol) specification, endorsed by Ericsson, Motorola, Nokia, and Unwired Planet.

 

VoiceXML Standard - VoiceXML is a Document Type Definition (DTD) created by four companies, AT&T, IBM, Lucent Technologies, and Motorola, for implementing interactive voice response (IVR) applications as Web-based telephony systems.

MathML - Mathematical Markup Language (http://www.w3.org/1999/07/REC-MathML-19990707/), with elements such as <plus/>, <times/>, <power/>, <mrow>, <msup>, <mi>, <mn>

EdaXML - XML-based symbols mapped directly to a database of millions of electronic components. The availability of these XML symbols allows printed circuit board (PCB) designers and design teams to perform cross-platform design using electronic design automation (EDA) tools from a variety of vendors.

XML for Automation Devices - covers basic elements in automation, including algorithms, programs, controllers, and interfaces (http://www.gca.org/papers/xmleurope2001/papers/html/s07-1.html)

CIM/XML – a language for representing power system models http://www.langdale.com.au/CIMXML/

 

The Control System Modeling Language www.slac.stanford.edu/econf/C011127/talks/THCT004.pdf

 

The LandXML schema (http://www.anvil.eu.com/XML.htm) - facilitates the exchange of data created during the Land Planning, Civil Engineering and Land Survey process.

 

 
