Jun 18, 2012 at 09:34
Regarding the types of lexical/conceptual resources: when should a resource be classified as a wordList, and when as a terminologicalResource?
UiB has created metadata for several word lists compiled by The Norwegian Language Council. These are special word lists of historical names and events, geographical names, names of inhabitants of different places in Norway, state names, names of public departments, translational correspondences between Norwegian Bokmål and Nynorsk, etc. Most of these lists do not simply list the names; they also describe each name, e.g.:
- Inhabitant names are listed with their geographical origin (name: altaværing; municipality: Alta; county: Finnmark)
- Geographical names are listed with the type of geographical entity and the country (Aachen town in Germany)
- Historical names are listed with years of birth and death (for persons) and a description (Abdul Rahman Putra (1903–90) Malayan prince and politician)
- A wordList (dictionary?) of correspondences between words and expressions in Norwegian Bokmål and Norwegian Nynorsk
These lists are not the result of terminological work, nor are they exhaustive. It thus seems somewhat wrong to classify them as terminologicalResources, but on the other hand, don't they contain too much information to pass as mere wordLists?
Thank you for raising this issue. I agree that your resource looks neither like a prototypical word list nor like a prototypical terminology.
What you have here seems to me more like a gazetteer, especially the geographical part.
Such resources are sometimes described as lexica, or (machine-readable) dictionaries.
The latter option (machine-readable dictionary) is the one I would recommend here; other users have chosen it for similar resources (http://metashare.fbk.eu/repository/browse/1086/).
We know that any classification of lexicalConceptual resources is rather arbitrary; it is therefore crucial to provide a good description using the other metadata.
The metadata group
Dear Francesca, thank you very much for the recommendation. We will do our best to supply information about these resources in as much detail as possible.
A further question is which standard format should be used for lists of named entities (with their definitions). Is LMF the most appropriate, or are there other options? I have included a draft proposal based on LMF; please comment.
<Lexicon>
  <feat att="name" val="geonavn"/>
  <feat att="language" val="nb"/>
  <LexicalEntry>
    <Lemma>
      <feat att="partOfSpeech" val="proper noun"/>
      <FormRepresentation>
        <feat att="writtenForm" val="Aalst"/>
        <feat att="originalLanguage" val="nl"/>
        <feat att="variantType" val="preferred"/>
      </FormRepresentation>
      <FormRepresentation>
        <feat att="writtenForm" val="Alost"/>
        <feat att="originalLanguage" val="fr"/>
        <feat att="variantType" val="dispreferred"/>
      </FormRepresentation>
    </Lemma>
    <Sense>
      <Definition>by i Belgia</Definition>
    </Sense>
  </LexicalEntry>
  <LexicalEntry>
    <Lemma>
      <feat att="partOfSpeech" val="proper noun"/>
      <FormRepresentation>
        <feat att="writtenForm" val="Etiopia"/>
        <feat att="variantType" val="current"/>
      </FormRepresentation>
      <FormRepresentation>
        <feat att="writtenForm" val="Abessinia"/>
        <feat att="variantType" val="obsolete"/>
      </FormRepresentation>
    </Lemma>
    <Sense>
      <Definition>stat i Afrika</Definition>
    </Sense>
  </LexicalEntry>
</Lexicon>
P.S. Sorry for the messed-up code; I don't know how to prevent this 'smart' system from totally destroying my formatting.
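One way to check that a draft like this is machine-readable as intended is to walk it with a generic XML parser and pull out, for each LexicalEntry, its written forms with their variantType and the Sense definition. The minimal Python sketch below does that with the standard-library xml.etree.ElementTree module. It assumes the unnamespaced element and feat names exactly as in the draft; the helper names feats and read_entries are hypothetical, not part of any LMF tooling.

```python
import xml.etree.ElementTree as ET

# Copy of the draft proposal, kept inline so the sketch runs on its own.
LMF_DRAFT = """<Lexicon>
  <feat att="name" val="geonavn"/>
  <feat att="language" val="nb"/>
  <LexicalEntry>
    <Lemma>
      <feat att="partOfSpeech" val="proper noun"/>
      <FormRepresentation>
        <feat att="writtenForm" val="Aalst"/>
        <feat att="originalLanguage" val="nl"/>
        <feat att="variantType" val="preferred"/>
      </FormRepresentation>
      <FormRepresentation>
        <feat att="writtenForm" val="Alost"/>
        <feat att="originalLanguage" val="fr"/>
        <feat att="variantType" val="dispreferred"/>
      </FormRepresentation>
    </Lemma>
    <Sense>
      <Definition>by i Belgia</Definition>
    </Sense>
  </LexicalEntry>
  <LexicalEntry>
    <Lemma>
      <feat att="partOfSpeech" val="proper noun"/>
      <FormRepresentation>
        <feat att="writtenForm" val="Etiopia"/>
        <feat att="variantType" val="current"/>
      </FormRepresentation>
      <FormRepresentation>
        <feat att="writtenForm" val="Abessinia"/>
        <feat att="variantType" val="obsolete"/>
      </FormRepresentation>
    </Lemma>
    <Sense>
      <Definition>stat i Afrika</Definition>
    </Sense>
  </LexicalEntry>
</Lexicon>"""


def feats(elem):
    """Collect the direct <feat att=... val=...> children of an element into a dict."""
    return {f.get("att"): f.get("val") for f in elem.findall("feat")}


def read_entries(xml_text):
    """Return one (variants, definition) pair per LexicalEntry.

    variants is a list of (writtenForm, variantType) tuples in document order.
    """
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("LexicalEntry"):
        variants = []
        for form in entry.findall("Lemma/FormRepresentation"):
            f = feats(form)
            variants.append((f["writtenForm"], f.get("variantType")))
        definition = entry.findtext("Sense/Definition")
        entries.append((variants, definition))
    return entries


for variants, definition in read_entries(LMF_DRAFT):
    print(variants, "-", definition)
```

Because every property is a generic feat, a consumer needs no schema knowledge beyond the att names, which is one argument for aligning those names with a shared registry.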
Dear Koenraad,
As far as I know, there is no standard for representing NEs (you may want to check the progress of the ISO group on named entity annotation). LMF has been used for named entities before (Toral A., Munoz R., Monachini M. 2008).
The LMF you suggest seems OK; the only problem is the multilingual dimension. In LMF, multilingual resources have a special extension (see Revision 16, page 48). In your case, however, using the extension would mean creating a separate lexical entry for each NE variant and connecting them, which seems very impractical.
One question: how did you define the language of the whole lexicon (vs. the language of each FormRepresentation)? Is it the language of the definitions?
By the way, if you want to use standard feats, check the ISOcat feature names, if you haven't done so already.
Best regards,
Francesca Frontini
The metadata group
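The question about the lexicon-level language can be made concrete with a small Python sketch. It takes one possible reading of the draft, which is an assumption and not something LMF or the draft itself specifies: the language feat on Lexicon acts as a default, and an originalLanguage feat on a FormRepresentation overrides it. The XML is an abridged copy of the draft, and the helper names feats and form_languages are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the draft: one form with an explicit originalLanguage, one without.
LMF_DRAFT = """<Lexicon>
  <feat att="name" val="geonavn"/>
  <feat att="language" val="nb"/>
  <LexicalEntry>
    <Lemma>
      <FormRepresentation>
        <feat att="writtenForm" val="Aalst"/>
        <feat att="originalLanguage" val="nl"/>
        <feat att="variantType" val="preferred"/>
      </FormRepresentation>
      <FormRepresentation>
        <feat att="writtenForm" val="Alost"/>
        <feat att="originalLanguage" val="fr"/>
        <feat att="variantType" val="dispreferred"/>
      </FormRepresentation>
    </Lemma>
  </LexicalEntry>
  <LexicalEntry>
    <Lemma>
      <FormRepresentation>
        <feat att="writtenForm" val="Etiopia"/>
        <feat att="variantType" val="current"/>
      </FormRepresentation>
    </Lemma>
  </LexicalEntry>
</Lexicon>"""


def feats(elem):
    """Collect the direct <feat att=... val=...> children of an element into a dict."""
    return {f.get("att"): f.get("val") for f in elem.findall("feat")}


def form_languages(xml_text):
    """Map each written form to a language code.

    Assumed rule: use the form's own originalLanguage if present,
    otherwise fall back to the Lexicon-level language feat.
    """
    root = ET.fromstring(xml_text)
    default_lang = feats(root).get("language")
    result = {}
    for form in root.iter("FormRepresentation"):
        f = feats(form)
        result[f["writtenForm"]] = f.get("originalLanguage", default_lang)
    return result


print(form_languages(LMF_DRAFT))
# prints {'Aalst': 'nl', 'Alost': 'fr', 'Etiopia': 'nb'}
```

If the lexicon-level language is instead meant to describe only the definitions, this fallback would not be justified, and each FormRepresentation would need its own explicit language feat.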
Sorry for the formatting...
As you can see, I've had the same problem with my text.