Week 1

Q1. Answer

Let the following elements be part of a complete XML document. Which of them are well-formed? If they are not well-formed, what is wrong?

<p/>paragraph</p>
<p/><p>paragraph</p>
< title>My book</title>
<Author>Me</author>
<date>2002-11-11</date>
<1>Item 1</1>

Q2. Answer

Write an XML document describing these exercises: the root element is <exercises>. The
root has an attribute number with the value 1. The root element has three child elements:
<due>, which contains the due date of the exercise as text, and two <item> elements for the
first two exercises. Write some text in the <item> elements: in English in the first one and
in some other language in the second one. Specify the language used. Test the document in an
XML-aware browser (IE, Mozilla). How does the browser react to a well-formed document?
What if there are errors in the document?

Q3. Answer

What XML markup would you suggest for the following dictionary extract? Decide on a suitable
granularity.
Main Entry: language
Pronunciation: ’la[ng]-gwij, -wij
Function: noun
Etymology: Middle English, from Old French, from langue tongue, language, from Latin
lingua - more at TONGUE
Date: 14th century
1 a : the words, their pronunciation, and the methods of combining them used and understood
by a community b (1) : audible, articulate, meaningful sound as produced by the
action of the vocal organs (2) : a systematic means of communicating ideas or feelings by
the use of conventionalized signs, sounds, gestures, or marks having understood meanings
(3) : the suggestion by objects, actions, or conditions of associated ideas or feelings
(4) : the means by which animals communicate (5) : a formal system of signs and symbols
(as FORTRAN or a calculus in logic) including rules for the formation and transformation
of admissible expressions (6) : MACHINE LANGUAGE 1
2 a : form or manner of verbal expression; specifically : STYLE b : the vocabulary and
phraseology belonging to an art or a department of knowledge c : PROFANITY
3 : the study of language especially as a school subject
Check that the markup is well-formed (for instance with xmllint or one of the other tools
mentioned in the slides). What messages does the parser write if there are errors, and what
if there are none?

Q4. Answer

Can you come up with a way to use the text in exercise 3 that isn’t supported by your
tagging? How would you amend the tagging? How difficult do you think it is to come up
with a suitable tagging?



A1.

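<p/>paragraph</p>
Not well-formed. <p/> is an empty-element tag, so the closing </p> has no matching start tag. The first tag should be <p> without the slash.

<p/><p>paragraph</p>
Well-formed: an empty <p/> element followed by a <p> element with text content.

< title>My book</title>
Not well-formed. In the start tag there is a space after the opening angle bracket; the element name must follow "<" directly.

<Author>Me</author>
Not well-formed. XML names are case-sensitive, so <Author> and </author> do not match. Both tags must be spelled the same way, for example <author>.

<date>2002-11-11</date>
Well-formed.

<1>Item 1</1>
Not well-formed. An element name must not start with a digit.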
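
The answers above can be double-checked with a parser. The following is a minimal Python sketch using the standard library module xml.etree.ElementTree; the wrapper element <root> and the helper name is_well_formed are assumptions made for illustration, standing in for the "complete XML document" of the exercise.

```python
import xml.etree.ElementTree as ET

# The six fragments from Q1, in the order they appear in the exercise.
FRAGMENTS = [
    "<p/>paragraph</p>",
    "<p/><p>paragraph</p>",
    "< title>My book</title>",
    "<Author>Me</author>",
    "<date>2002-11-11</date>",
    "<1>Item 1</1>",
]


def is_well_formed(fragment):
    """Return True if the fragment parses as the content of a root element."""
    try:
        # <root> is a hypothetical wrapper: it plays the role of the
        # surrounding document so that sibling elements are allowed.
        ET.fromstring("<root>" + fragment + "</root>")
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for fragment in FRAGMENTS:
        print(fragment, "->", is_well_formed(fragment))
```

Only the second and the fifth fragment are accepted by the parser.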

A2.

<exercises number="1">

    <due>
    07-February-2012
    </due>

    <item xml:lang="en">
    This text is in English!
    </item>

    <item xml:lang="ru">
    Это Россия.
    </item>

</exercises>

If the document contains errors, the browser shows an error message instead of the document.
For example:

XML Parsing Error: not well-formed
Location: file:///c:/WINNT/profiles/csala/Desktop/Webtech.xml
Line Number 13, Column 6:

    Это Россия.
------------^

If the document is well-formed, the browser displays the document tree:

This XML file does not appear to have any style information associated with it. The document tree is shown below.

<exercises number="1">
<due>
    07-February-2012
    </due>
<item xml:lang="en">
    This text is in English!
    </item>
<item xml:lang="ru">
    Это Россия.
    </item>
</exercises>
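
Besides looking at the document in a browser, the attribute and the language markers can be read back programmatically. This is a small Python sketch, assuming a compact version of the exercise document in which the root carries the number attribute and each <item> carries an xml:lang attribute; the function name summarize is made up for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the reserved "xml:" prefix through its namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Compact version of the exercise document (an assumption for this sketch).
DOCUMENT = """<exercises number="1">
    <due>07-February-2012</due>
    <item xml:lang="en">This text is in English!</item>
    <item xml:lang="ru">Это Россия.</item>
</exercises>"""


def summarize(xml_text):
    """Parse the document and collect the values the exercise asks for."""
    root = ET.fromstring(xml_text)
    return {
        "number": root.get("number"),
        "due": root.findtext("due").strip(),
        "languages": [item.get(XML_LANG) for item in root.findall("item")],
    }


if __name__ == "__main__":
    print(summarize(DOCUMENT))
```

If an <item> had no xml:lang attribute, its entry in the languages list would be None.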

A3.

<?xml version="1.0" encoding="UTF-8" ?>
<main>Main Entry:
    <language>language</language>
    <pronunciation>Pronunciation: &apos;la[ng]-gwij, -wij</pronunciation>
    <function>Function: noun</function>
    <etymology>Etymology: Middle English, from Old French, from langue tongue, language, from Latin
    lingua - more at TONGUE</etymology>
    <date>Date: 14th century</date>
    <meaning_1>1 <meaning_1a>a : the words, their pronunciation, and the methods of combining them used and understood
    by a community </meaning_1a><meaning_1b>b
    <meaning_1b_1>(1) : audible, articulate, meaningful sound as produced by the
    action of the vocal organs</meaning_1b_1>
    <meaning_1b_2>(2) : a systematic means of communicating ideas or feelings by
    the use of conventionalized signs, sounds, gestures, or marks having understood meanings</meaning_1b_2>
    <meaning_1b_3>(3) : the suggestion by objects, actions, or conditions of associated ideas or feelings</meaning_1b_3>
    <meaning_1b_4>(4) : the means by which animals communicate </meaning_1b_4>
    <meaning_1b_5>(5) : a formal system of signs and symbols
    (as FORTRAN or a calculus in logic) including rules for the formation and transformation
    of admissible expressions </meaning_1b_5>
    <meaning_1b_6>(6) : MACHINE LANGUAGE 1</meaning_1b_6></meaning_1b></meaning_1>
    <meaning_2>2 <meaning_2a>a : form or manner of verbal expression; specifically : STYLE </meaning_2a><meaning_2b>b : the vocabulary and
    phraseology belonging to an art or a department of knowledge </meaning_2b><meaning_2c>c : PROFANITY</meaning_2c></meaning_2>
    <meaning_3>3 : the study of language especially as a school subject</meaning_3>
</main>
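
The question also asks what the parser reports. The messages of xmllint are not reproduced here; as a rough stand-in, the following Python sketch shows how the standard library parser reports success and failure. The function name describe_parse and the two shortened test documents are assumptions for illustration only, and the exact wording of the message depends on the parser.

```python
import xml.etree.ElementTree as ET


def describe_parse(xml_text):
    """Return a one-line report: 'well-formed' or the parser's complaint."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        # error.position holds the line and column of the first error found.
        line, column = error.position
        return f"not well-formed (line {line}, column {column}): {error}"
    return "well-formed"


# Shortened, hypothetical variants of the dictionary entry.
GOOD = "<main><date>Date: 14th century</date></main>"
BAD = "<main><date>Date: 14th century</main>"  # the </date> tag is missing

if __name__ == "__main__":
    print(describe_parse(GOOD))
    print(describe_parse(BAD))
```

The parser stops at the first error, so a document with several mistakes has to be checked again after each fix.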

A4.

    Yes. If the text contained many special characters such as < > & " ', the tagging would be tedious to maintain by hand, because such characters have to be replaced with entity references like &lt; &amp; &apos; &quot;.
    There can also be structural problems. If we put a new word in place of language, it is very likely that the new word has a
    different number of meanings, so my construction with numbered "meaning" elements and numbered child elements is not compatible with every
    word.
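
Both problems can be eased with tooling. The Python sketch below is one possible amendment, not the only one: it uses generic elements whose number is stored in an attribute, and it lets the library do the escaping. The element names entry, headword and sense, the attribute n, and the function name build_entry are hypothetical choices for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_entry(word, senses):
    """Build an entry with one generic <sense> element per meaning."""
    entry = ET.Element("entry")  # hypothetical element names throughout
    ET.SubElement(entry, "headword").text = word
    for number, definition in enumerate(senses, start=1):
        # The number lives in an attribute, so any count of meanings fits.
        sense = ET.SubElement(entry, "sense", {"n": str(number)})
        sense.text = definition
    # The serializer escapes <, > and & in text content automatically.
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    # Escaping a plain string without building a tree.
    print(escape('a < b & "c"', {'"': "&quot;", "'": "&apos;"}))
    print(build_entry("language", ["the words & their pronunciation",
                                   "form or manner of verbal expression"]))
```

With this structure a word with one meaning and a word with ten meanings use the same element names, and only the number of <sense> elements changes.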