Learning XML: Creating Self-Describing Data

XML: Learning XML: Creating self-describing data [Book Reviews]. Published in: IEEE Micro (Volume: 21, Issue: 2, March-April). Page(s): 95 -.

Complementing that development is ADO. These serve a function similar to the Java virtual machine. While similar in form and philosophy to Java, C# introduces a few improvements. For example, it standardizes the getting and setting of object properties. Finally, to mix a metaphor, Microsoft has embraced XML and jumped into it with both feet. XML also provides the base format for remote interfacing, and it serves as the working data format for ADO. Both the System and System.Data class hierarchies devote a namespace to XML.

Wrox Press specializes in books for programmers. .NET software development kit. They give a pretty coherent picture, hedged in disclaimers, of what the first release (probably still about a year away) will look like. If you expect to develop software for Microsoft systems in the next few years, you need to know all about .NET, and these authors are excellent guides to the territory.

Ray not to be confused with Eric J. At appropriate points, Ray delves deeply into details by presenting complete, clearly written examples. If you plan to work with XML to produce technical documentation, this book pays for itself many times over.

Many authors dump sample code into their books, but the XML, XSLT, and Perl examples in this book are well organized, clearly formatted, well annotated, and easy to understand. For brevity, completeness of coverage, clarity of writing, and usefulness of examples, this is the best XML book I have seen.

There is an empty element inside this example.


Figure shows the syntax for a container element. An element can have any number of attributes, but no two attributes can have the same name.



Following the start tag is the element's content (6), which in turn is followed by an end tag (7). The end tag consists of an opening angle bracket, a slash, the element's name, and a closing bracket. The end tag has no attributes, and the element name must match the start tag's name exactly. An element name must start with a letter or an underscore, and can contain any number of letters, numbers, hyphens, periods, and underscores. The colon symbol is used in namespaces, as explained in "Namespaces: Expanding Your Vocabulary," so avoid using it in element names that don't use a namespace.
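The naming rule above can be approximated with a short check. This is only a sketch in Python: the pattern is limited to ASCII letters for brevity (XML itself also admits many non-ASCII letters), and the function name `is_simple_element_name` is a made-up helper, not part of any library.

```python
import re

# Simplified form of the rule in the text: a letter or underscore first,
# then any number of letters, digits, hyphens, periods, and underscores.
# Assumption: ASCII only. Colons are left out, since the text reserves
# them for namespace prefixes.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_simple_element_name(name: str) -> bool:
    """Return True if `name` fits the simplified naming rule."""
    return NAME_RE.fullmatch(name) is not None


for candidate in ["chapter", "_note", "sect-1.2", "2nd-edition", "my name", "a=b"]:
    print(candidate, is_simple_element_name(candidate))
```

Names that begin with a digit or contain a separator character (space, equals sign) are rejected, which matches the restrictions described in the surrounding text.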

Space, tab, newline, equals sign, and any quote characters are separators for element names, attribute names, and attribute values, so they are not allowed either. There is no specific limit on the length of an element name, but probably anything over 40 characters is unnecessarily long. There can be no space between the opening angle bracket and the element name, but adding extra space anywhere else in the element tag is okay. This allows you to break an element across lines to make it more readable.

There are two rules about the positioning of start and end tags: the end tag must come after the start tag, and an element's start and end tags must both reside in the same parent element. To understand the second rule, think of elements as boxes. A box can sit inside or outside another box, but it can't protrude through the box without making a hole in the side. Thus, overlapping elements, where an element's end tag appears outside the element in which it was opened, don't work.

Anything in the content that is not an element is text, or character data. The text can include any character in the character set that was specified in the prolog.
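The box rule can be tried out with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper `is_well_formed` and the sample element names `a` and `b` are illustrative assumptions, not taken from the book.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


nested = "<a><b>inside</b></a>"       # b sits entirely inside a
overlapping = "<a><b>oops</a></b>"    # b protrudes through the side of a

print(is_well_formed(nested))
print(is_well_formed(overlapping))
```

The parser accepts the nested form and rejects the overlapping one, because the end tag for `a` arrives while `b` is still open.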

However, some characters must be represented in a special way so as not to confuse the parser. For example, the left angle bracket (<) is reserved for element tags. Including it directly in content causes an ambiguous situation: is it the start of a tag or just data? To resolve this conflict, you need to use a special code in place of the offending character, in this case &lt;. Such a substitution is known as an entity reference. We'll describe entities and entity references later, in the discussion of entities.

In XML, all characters are preserved as a matter of course, including the whitespace characters space, tab, and newline; compare this to programming languages such as Perl and C, where whitespace characters are essentially ignored.
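As a rough illustration of both points, the sketch below escapes a left angle bracket before placing text in an element, then parses the result and checks that the whitespace comes back unchanged. It uses Python's standard library (`xml.sax.saxutils.escape` and `xml.etree.ElementTree`); the element name `snippet` and the sample text are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Text containing a reserved character plus runs of spaces, a newline, and a tab.
raw_text = "if a < b:   swap(a, b)\n\tdone"

# escape() substitutes entity references for &, <, and >.
escaped = escape(raw_text)
print(escaped)

document = f"<snippet>{escaped}</snippet>"
root = ET.fromstring(document)

# The parser turns &lt; back into < and keeps every whitespace character.
print(repr(root.text))


def parses(text: str) -> bool:
    """Return True if `text` is accepted by the parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Without the substitution, the bare < makes the document ill-formed.
print(parses(f"<snippet>{raw_text}</snippet>"))
```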

In markup languages such as HTML, multiple sequential spaces are collapsed by the browser into a single space, and lines can be broken anywhere to suit the formatter. XML, on the other hand, keeps all space characters by default.

If you are coming to XML from HTML, some important changes you should take note of include:

  • Element names are case-sensitive in XML. HTML allows you to write tags in whatever case you want.
  • In XML, container elements always require both a start and an end tag. In HTML, on the other hand, you can drop the end tag in some cases.
  • Empty XML elements require a slash before the right bracket.
  • XML elements treat whitespace as part of the content, preserving it unless they are explicitly told not to. But in HTML, most elements throw away extra spaces and line breaks when formatting content in the browser.

You should not assume any kind of formatting or presentational style based on markup alone. Instead, XML leaves presentation for stylesheets, which are separate documents that map the elements to styles.
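The first three differences from HTML are easy to confirm with a parser. This is a small sketch using Python's standard `xml.etree.ElementTree`; the helper `parses` and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if `text` is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Case sensitivity: Note and note are different element names.
print(parses("<Note>hi</Note>"), parses("<Note>hi</note>"))

# Container elements need both tags; HTML-style omission is an error.
print(parses("<doc><p>one</p></doc>"), parses("<doc><p>one</doc>"))

# Empty elements need the slash before the right bracket.
print(parses("<doc><br/></doc>"), parses("<doc><br></doc>"))
```

In each pair the first document parses and the second is rejected.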

Sometimes you need to convey more information about an element than its name and content can express. The use of attributes lets you describe details about the element more clearly. An attribute can be used to give the element a unique label so it can be easily located, or it can describe a property about the element, such as the location of a file at the end of a link. It can be used to describe some aspect of the element's behavior or to create a subtype. An attribute consists of a property name (1), an equals sign (2), and a value in quotes (3).

An element can have any number of attributes, as long as each has a unique name. Attributes are separated by spaces. They must always follow the element name, but they can be in any order. The values must be in single (') or double (") quotes.
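To make these rules concrete, here is a minimal sketch with Python's standard `xml.etree.ElementTree`. The element name `image` and its three attributes are invented for the example.

```python
import xml.etree.ElementTree as ET

# Three attributes after the element name, mixing double and single quotes.
first = ET.fromstring('<image id="fig1" file="beach.png" align=\'left\'/>')

# The same attributes in a different order describe the same element.
second = ET.fromstring('<image align="left" id="fig1" file="beach.png"/>')

print(first.attrib)
print(first.attrib == second.attrib)

# Two attributes with the same name are not allowed.
try:
    ET.fromstring('<image id="fig1" id="fig2"/>')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False
print(duplicate_accepted)
```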


If the value contains quotes, use the opposite kind of quote to contain it.

Attribute values can be constrained to certain types if you use a DTD. One such type is ID: no two elements in a document can have the same ID. For example, an element somewhere in the document can carry an ID-type attribute so that it can be located by that label. We talk more about these attributes in Chapter 3, "Connecting Resources with Links". Another way a DTD can restrict attributes is by creating an allowed set of values.

You may want to use an attribute called day that can have one of seven values, one for each day of the week. For a more detailed explanation of attribute types, see Chapter 5, "Document Models".

Some attribute names have been set aside for special purposes by the XML working group. These attributes are reserved for XML's use and begin with the prefix xml:. Two of them, xml:lang and xml:space, are defined by XML itself; two other names are defined by XLink. These special attribute names are described here:

xml:lang
Classifies an element by the language of its content. This is useful for creating conditional text, which is content selected by an XML processor based on criteria such as what language the user wants to view a document in. We'll return to this topic in Chapter 7, "Internationalization".

xml:space
Specifies whether whitespace should be preserved in an element's content. If set to "preserve", any XML processor displaying the document should honor all newlines, spaces, and tabs in the element's content. If it is set to "default", then the processor can do whatever it wants with whitespace. Thus, if you want to compress whitespace in an element, set the attribute xml:space="default".

The XLink link attribute
Signals to an XLink processor that an element is a link element. For information on how to use this attribute, see Chapter 3, "Connecting Resources with Links".

The XLink remapping attribute
In addition to the link attribute, XLink relies on certain attribute names, such as title. But to prevent conflict with other potential uses of those attributes, XLink defines a second reserved attribute that lets you remap them. That is, you can say, "When XLink is looking for an attribute called title, I want you to use the attribute called linkname instead."
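The conditional-text idea behind the language attribute can be sketched in a few lines. This example uses Python's standard `xml.etree.ElementTree`, which reports attributes with the reserved `xml:` prefix under the fixed namespace URI `http://www.w3.org/XML/1998/namespace`. The document content and the helper `select_language` are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# The reserved xml: prefix is always bound to this namespace URI.
XML_NS = "http://www.w3.org/XML/1998/namespace"

doc = (
    "<doc>"
    '<para xml:lang="en">Hello</para>'
    '<para xml:lang="de">Hallo</para>'
    '<code xml:space="preserve">  x = 1\n  y = 2</code>'
    "</doc>"
)
root = ET.fromstring(doc)


def select_language(tree_root, lang):
    """Return the text of every para element tagged with `lang`."""
    key = f"{{{XML_NS}}}lang"
    return [p.text for p in tree_root.iter("para") if p.get(key) == lang]


print(select_language(root, "en"))
print(select_language(root, "de"))

# The parser hands over whitespace untouched; xml:space is a hint that
# tells a downstream processor whether it may compress that whitespace.
code = root.find("code")
print(code.get(f"{{{XML_NS}}}space"), repr(code.text))
```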

What happens when you want to include elements or attributes from different document types? If you can survive without a DTD (and most browsers will tolerate documents without them), you can use a feature of XML called namespaces. A namespace is a group of element and attribute names. You can declare that an element exists within a particular namespace and that it should be validated against that namespace's DTD. By appending a namespace prefix to an element or attribute name, you tell the parser which namespace it comes from. Imagine, for example, that the English language is divided into namespaces corresponding to conceptual topics.

We'll take two of these, say hardware and food. The topic hardware contains words such as hammer and bolt, while food has words like fruit and meat.


Both namespaces contain the word nut, which has a different meaning in each context even though it's spelled the same in both. It really is two different words with the same name, but how can we express that fact without causing a namespace clash? This same problem can occur in XML, where two XML objects in different namespaces can have the same name, resulting in ambiguity about where they came from.

The solution is to have each element or attribute specify which namespace it comes from by including the namespace as a prefix. In this qualified element name, a namespace prefix (1) is joined by a colon (2) to the local name of the element or attribute (3). Namespaces aren't useful only for preventing name clashes. More generally, they help the XML processor sort out different groups of elements for different treatments. A browser displaying a document that contains equations, for instance, needs to know when to enter "math equation mode" and when to be in "regular XML mode." Similarly, the transformation language XSLT relies on namespaces to distinguish between XML objects that are data, and those that are instructions for processing the data.

The instructional elements and attributes have an xsl: namespace prefix. Anything without a namespace prefix is treated as data in the transformation process. A namespace must be declared in the document before you can use it. The declaration is in the form of an attribute inside an element. Any descendants of that element become part of the namespace. The declaration starts with the keyword xmlns (1). This is followed by a colon, then a namespace prefix (2), an equals sign, and finally a URL in quotes (3).

If a namespace prefix such as bob isn't to your liking, you can use any name you want, as long as it observes the element-naming rules. As a result, b, bobs-company, or wiggledy would all be acceptable. Be careful not to use prefixes like xml, xsl, or other names reserved by XML and related languages. The value of the xmlns attribute is a URL, but there doesn't even have to be a document at the location it points to.

Specifying the URL is a formality to provide additional information about the namespace, such as who owns it and what version you're using. Any element in the document can contain a namespace declaration. Most often, the root element will contain the declarations used in the document, but that's not a requirement.
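The hardware-and-food analogy translates directly into markup. The sketch below uses Python's standard `xml.etree.ElementTree`; the prefixes `hw` and `food` and the two `example.com` URLs are hypothetical, chosen only to show that the parser keeps the two `nut` elements apart by the URL each prefix was declared with.

```python
import xml.etree.ElementTree as ET

# Both namespaces are declared on the root element, so they apply to all
# of its descendants. The URLs are placeholders; nothing needs to exist there.
doc = """<shopping xmlns:hw="http://example.com/hardware"
          xmlns:food="http://example.com/food">
  <hw:nut>hex, 10 mm</hw:nut>
  <food:nut>almond</food:nut>
</shopping>"""

root = ET.fromstring(doc)

# ElementTree reports each name as {namespace URL}local-name.
tags = [child.tag for child in root]
print(tags)

# When searching, any prefix works as long as it maps to the same URL.
lookup = {"h": "http://example.com/hardware"}
hardware_nut = root.find("h:nut", lookup)
print(hardware_nut.text)

# Using a prefix that was never declared is an error.
try:
    ET.fromstring("<shopping><hw:nut/></shopping>")
    undeclared_accepted = True
except ET.ParseError:
    undeclared_accepted = False
print(undeclared_accepted)
```

The prefix is only shorthand; what identifies the namespace is the URL in the declaration, which is why the lookup can use `h` where the document used `hw`.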

You may find it useful to limit the scope of a namespace to a region inside the document by declaring the namespace in a deeper element. In that case, the namespace applies only to that element and its descendants. A single document can combine two namespaces, for example myns and eq. We can declare one of the namespaces to be the default by omitting the colon and the prefix from its declaration. Elements in the default namespace don't need the namespace prefix, resulting in clearer markup.

Namespaces can be a headache if used in conjunction with a DTD.

It would be nice if the parser ignored any elements or attributes from another namespace, so your document would validate under a DTD that had no knowledge of the namespace. Unfortunately, that is not the case. Another problem with namespaces is that they don't import a DTD or any other kind of information about the elements and attributes you're using.

So you can actually make up your own elements, add the namespace prefix, and the parser will be none the wiser. This makes namespaces less useful for those who want to constrain their documents to conform to a DTD. For these and other reasons, namespaces are a point of contention among XML planners.

It's not clear what will happen in the future, but something needs to be done to bridge the gap between structure enforcement and namespaces. With the basic parts of XML markup defined, there is one more component we need to look at.


An entity is a placeholder for content, which you declare once and can use many times almost anywhere in the document. It doesn't add anything semantically to the markup. Rather, it's a convenience to make XML easier to write, maintain, and read. Entities can be used for different reasons, but they always eliminate an inconvenience.

They do everything from standing in for impossible-to-type characters to marking the place where a file should be imported.



You can define entities of your own to stand in for recurring text such as a company name or legal boilerplate. Entities can hold a single character, a string of text, or even a chunk of XML markup. Without entities, XML would be much less useful. For example, an entity can stand in for a long URL: whenever you enter the entity in a document, it will be replaced with the URL's text.

The two major entity types are parameter entities and general entities. In this section, we'll focus on general entities. General entities are placeholders for any content that occurs at the level of or inside the root element of an XML document.


An entity consists of a name and a value. When an XML parser begins to process a document, it first reads a series of declarations, some of which define entities by associating a name with a value. The value is anything from a single character to a file of XML markup. As the parser scans the XML document, it encounters entity references, which are special markers derived from entity names. For each entity reference, the parser consults a table in memory for something with which to replace the marker.

It replaces the entity reference with the appropriate replacement text or markup, then resumes parsing just before that point, so the new text is parsed too. Any entity references inside the replacement text are also replaced; this process repeats as many times as necessary. There are two kinds of syntax for entity references.
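The declare-then-replace cycle can be watched in action. This is a minimal sketch with Python's standard `xml.etree.ElementTree`, which expands entities declared in a document's internal subset. The entity names `company` and `signoff`, their values, and the `letter` element are invented for the example.

```python
import xml.etree.ElementTree as ET

# Two general entities are declared before the root element.
# The second one refers to the first inside its own replacement text.
doc = """<!DOCTYPE letter [
  <!ENTITY company "Example Widgets">
  <!ENTITY signoff "Yours, the &company; staff">
]>
<letter>Thanks for choosing &company;. &signoff;</letter>"""

root = ET.fromstring(doc)

# Every reference has been replaced, including the nested one.
print(root.text)
```

Only trusted input should be parsed this way; because replacement repeats as many times as necessary, deeply nested entity declarations in untrusted documents can be used to exhaust memory.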

A document can declare several general entities and reference them in its text. A numbered character entity, by contrast, can be referenced without being declared; no declaration is necessary because numbered character entities are implicitly defined in XML as references to characters in the current character set. For more information about character sets, see Chapter 7, "Internationalization". The XML parser simply replaces the entity with the correct character. All entities besides predefined ones must be declared before they are used in a document.

Two acceptable places to declare them are in the internal subset, which is ideal for local entities, and in an external DTD, which is more suitable for entities shared between documents. If the parser runs across an entity reference that hasn't been declared, either implicitly (a predefined entity) or explicitly, it can't insert replacement text in the document because it doesn't know what to replace the entity with. This error prevents the document from being well-formed.

Entities that contain a single character are called, naturally, character entities. These fall into several groups.

Some characters cannot be used in the text of an XML document because they conflict with the special markup delimiters. The XML specification provides the following predefined character entities, so you can express these characters safely: &amp; for the ampersand (&), &lt; for the left angle bracket (<), &gt; for the right angle bracket (>), &apos; for the apostrophe ('), and &quot; for the double quote (").

XML supports Unicode, a huge character set with tens of thousands of different symbols, letters, and ideograms. You should be able to use any Unicode character in your document.

The problem is how to enter a nonstandard character from a keyboard with a limited number of keys, or how to represent one in a text-only editor display. One solution is to use a numbered character entity, an entity whose name is of the form &#n;, where n is a number that represents the character's position in the Unicode character set. The number in the name of the entity can be expressed in decimal or hexadecimal format. Note that the hexadecimal version is distinguished with an x as the prefix to the number. Any character that XML permits in a document can be written this way. We'll discuss character sets and encodings in more detail in Chapter 7, "Internationalization".
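As a quick illustration of the two number formats, the sketch below writes the same character (c with cedilla, position 231, or E7 in hexadecimal) both ways and lets Python's standard `xml.etree.ElementTree` resolve them. The helper `numbered_references` is a made-up convenience function, not a library call.

```python
import xml.etree.ElementTree as ET


def numbered_references(char: str) -> tuple:
    """Return the decimal and hexadecimal references for one character."""
    position = ord(char)
    return f"&#{position};", f"&#x{position:X};"


decimal_ref, hex_ref = numbered_references("\u00e7")
print(decimal_ref, hex_ref)

# No declaration is needed; the parser replaces each reference
# with the character at that position.
root = ET.fromstring(f"<p>{decimal_ref} and {hex_ref}</p>")
print(root.text == "\u00e7 and \u00e7")
```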

The problem with numbered character entities is that they're hard to remember.

Imagine an application designed to display the original version of note. Then imagine a newer version of note. Many computer systems contain data in incompatible formats. Exchanging data between incompatible systems or upgraded systems is a time-consuming task for web developers. Large amounts of data must be converted, and incompatible data is often lost. XML stores data in plain text format.