2022.01.17 01:42

Thesaurus as information retrieval tool

Skill as well as patience and subject knowledge are needed, since thesauri vary greatly in quality and in format (see Section 5 below). Nowadays it is hard to find indexers with the requisite training, while trained end-users are very rare indeed. Therefore modern systems tend to automate both indexing and searching, using an electronic version of the thesaurus. Generally a software package designed for that application is used, with indexing support capabilities that vary from (at the simple end) speeding up the task of thesaurus navigation to (at the sophisticated end) delivering totally automatic indexing.

Forty years ago Caplan reported a number of failures in trials of thesaurus-based automatic indexing.

More recently Lancaster provided a more promising account of the techniques available, but concluded p. Eight years later Tudhope et al.

Ten years later, research into metadata enrichment with thesaurus terms was outlined in Tudhope and Binding. Kempf and Neubert describe several modes of implementation, including one that exploits inter-KOS mappings.

Clause 16 of ISO advises on the thesaurus features needed to enable such functions. Meantime, as described in Section 7 below, a new breed of KOS is emerging in the enterprise search sector, loosely named "taxonomy", and stimulating a demand for automatic categorization tools.

Unfortunately a great many in-house applications go unreported in the research literature, including research by the vendors of software for automatic categorization. In the experience of this author, the support for thesaurus-based indexing in off-the-shelf library management packages is rarely as effective or user-friendly as it could be. Likewise the quality of automatic or semi-automated indexing tools varies greatly and much care is needed to obtain reliable outputs.

Further discussion of automatic indexing is outside the scope of this article, particularly since most cases do not use a thesaurus. Turning now to search applications, here too the electronic medium speeds up thesaurus navigation. Furthermore, with suitable software it enables broadening or narrowing a search at will. Consider, for example, a search for "packaging AND fruit". Relevant results would include any items dealing with the packaging of any of potentially hundreds of different types of fruit.

The technique known as search explosion exploits the hierarchical relationships in a thesaurus to expand the search statement automatically and cover all those hundreds of fruit types. It is similarly possible to extend a search via associative relationships, and this is usually termed search expansion. These and other search functions are reviewed in Shiri et al. The case study of the STW Thesaurus for Economics by Kempf and Neubert illustrates similar techniques, and other ways in which a thesaurus can be used to enhance retrieval, even when the user is unaware of its support.
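The two techniques just described can be sketched in a few lines. This is only an illustration under stated assumptions: the toy vocabulary, the link tables NARROWER and RELATED, and the function names are all hypothetical, and a real retrieval system would read the relationships from the thesaurus data rather than from hard-coded dictionaries.

```python
from collections import deque

# Hypothetical toy thesaurus: narrower-term and related-term links only.
NARROWER = {
    "fruit": ["citrus fruit", "apples", "berries"],
    "citrus fruit": ["oranges", "lemons"],
    "berries": ["strawberries"],
}
RELATED = {
    "packaging": ["containers"],
}


def explode(term, narrower=NARROWER):
    """Search explosion: the term plus all descendants via hierarchical links."""
    seen = [term]
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in narrower.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


def expand(term, related=RELATED):
    """Search expansion: the term plus its directly related (associative) terms."""
    return [term] + [t for t in related.get(term, []) if t != term]


def build_query(left, right):
    """Combine two groups of terms as (a OR b ...) AND (c OR d ...)."""
    def group(terms):
        return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"
    return group(left) + " AND " + group(right)


if __name__ == "__main__":
    print(build_query(expand("packaging"), explode("fruit")))
```

Here the search for "packaging AND fruit" is rewritten so that the fruit side covers every narrower term found by walking down the hierarchy, while the packaging side takes only one associative step. Limiting expansion to a single step is a design choice of this sketch, since associative links tend to drift away from the original topic faster than hierarchical ones.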

Evidently indexing and searching have moved on from the early days, when a thesaurus and its IR system could operate usefully in isolation, and even without a computer. Thesaurus use in today's IR applications relies on electronic manipulation, involving transfer of data from one subsystem to another. Success depends on interoperability, i. There are now at least two main contexts for thesaurus interoperability:. The vertical context sees a thesaurus transformed from a static map of concepts, terms and relationships to a functioning system.

The horizontal context crosses a different boundary, to be described next. A single search across multiple databases would be relatively straightforward if all used the same natural language, the same machine protocols, and the same indexing language. To overcome the disparities found in real life, two approaches to interoperability are especially relevant for KOSs, namely inter-vocabulary mappings and Linked Data. A mapping is defined as a "relationship between a concept in one vocabulary and one or more concepts in another" ISO , clause 3.

For example, an equivalence mapping between the concepts labelled instant coffee in one thesaurus and soluble coffee in another, would establish that they are viewed as identical for semantic purposes. When sets of mappings are available between many KOSs, it opens the prospect of extending searches widely and multilingually.

It contains concepts from more than KOSs as well as relationships from within the KOSs and many mappings between their respective concepts. Andrade and Lopes Gines de Lara assess its usefulness in retrieval from relevant databases. The influence of this construct has led some authors to speak of a metathesaurus wherever existing thesauri are integrated, linked or mapped together Shiri — and a variety of ways is possible.

Not all mappings are as simple as equivalence. ISO (International Organization for Standardization) provides for hierarchical and associative mappings as well as equivalence. Hierarchical mappings are directional: either broader or narrower. Equivalence mappings subdivide into simple or compound; compound equivalence has two subtypes (intersecting or cumulative), while simple equivalence can be qualified as exact or inexact.
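The mapping types listed above can be modelled as a small data structure. The sketch below is an assumption-laden illustration, not the data model of any standard: the class names, the sample mappings other than instant coffee / soluble coffee, and the translate function are hypothetical. It shows one plausible way to turn a mapping into a search expression, reading an intersecting compound as AND and a cumulative compound as OR.

```python
from dataclasses import dataclass
from enum import Enum


class MappingType(Enum):
    EXACT_EQUIVALENCE = "exact equivalence"
    INEXACT_EQUIVALENCE = "inexact equivalence"
    INTERSECTING_COMPOUND = "intersecting compound equivalence"
    CUMULATIVE_COMPOUND = "cumulative compound equivalence"
    BROADER = "broader"
    NARROWER = "narrower"
    ASSOCIATIVE = "associative"


@dataclass(frozen=True)
class Mapping:
    source: str       # concept label in the source vocabulary
    targets: tuple    # one or more concept labels in the target vocabulary
    kind: MappingType


# Hypothetical sample data; only the first pair comes from the text above.
MAPPINGS = [
    Mapping("instant coffee", ("soluble coffee",), MappingType.EXACT_EQUIVALENCE),
    Mapping("women executives", ("women", "executives"), MappingType.INTERSECTING_COMPOUND),
    Mapping("inland waterways", ("rivers", "canals"), MappingType.CUMULATIVE_COMPOUND),
]


def translate(term, mappings=MAPPINGS):
    """Rewrite a source term as a target-vocabulary search expression.

    Only equivalence mappings are handled; hierarchical and associative
    mappings return None because using them changes the scope of the search.
    """
    for m in mappings:
        if m.source != term:
            continue
        if m.kind in (MappingType.EXACT_EQUIVALENCE, MappingType.INEXACT_EQUIVALENCE):
            return m.targets[0]
        if m.kind is MappingType.INTERSECTING_COMPOUND:
            return " AND ".join(m.targets)
        if m.kind is MappingType.CUMULATIVE_COMPOUND:
            return " OR ".join(m.targets)
    return None
```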

Figure 3 shows the range of mapping types, with an example of each. Even more subtlety is possible in applications that need to distinguish between subtypes of hierarchical mapping.

See Figure 4. While thesaurus mapping projects have a much longer history (see, for example, Horsnell, or Hood and Eberman, or Hoppe reporting on earlier UMLS work), the growth of the Internet and the WWW has made them more widely applicable.

Thus Zeng and Chan drew attention to opportunities emerging in the Internet context and Vizine-Goetz et al. Mayr and Petras illustrated the possibilities. Several other mapping projects were reported in Proceedings of the Cologne Conference on Interoperability and Semantics in Knowledge Organization Boteram et al.

Doerr analysed perceived semantic problems of thesaurus mapping. Confusingly for us today, his use of the term mapping differs from the ISO definition, applying to relationships within one vocabulary rather than between different ones. Thus he deplored the weakness of thesaurus semantics for hierarchical relationships when compared with class subsumption in an ontology.

According to Isaac and Baker , 2. On the one hand, vast resources have come within our reach; on the other hand, individual resources may be expressed in a multiplicity of languages, like the Tower of Babel. SKOS publication followed on from a research report by Miles , 1 aiming "to develop a formal theory of retrieval using controlled vocabularies that have a simple and intuitive structure [such as thesauri, classification schemes, subject heading systems, taxonomies and other types of structured vocabulary], to provide the necessary theoretical foundations for the development of Semantic Web languages and design patterns for distributed retrieval applications".

Since a number of extensions have been added to SKOS to support interoperability in particular contexts; work on some mapping tools for thesauri is described in Endnote 2. Turning to the other main interoperability opportunity, the principles of Linked Data are set out in Tim Berners-Lee's paper at www. As he explains p. With linked data, when you have some of it, you can find other, related, data.

Like the web of hypertext, the web of data is constructed with documents on the web. However, unlike the web of hypertext, where links are relationships anchors in hypertext documents written in HTML, for data they links [sic] between arbitrary things described by RDF". Once that is in place, anyone anywhere can set up a direct link to any concept or class. For example, if a web page or a bibliographic record in a database on the Web has been indexed with the thesaurus concept renewable energy, the person interested in that concept can move directly from the thesaurus to those and other relevant pages.
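The renewable energy example can be pictured as a handful of subject-predicate-object statements. The following is a minimal sketch, assuming that each resource points at the concept with a dct:subject statement; every URI is a hypothetical placeholder, and a real application would query an RDF store rather than a Python list.

```python
# All URIs below are hypothetical placeholders.
CONCEPT = "http://example.org/thesaurus/renewable-energy"

# Each tuple is (subject, predicate, object), in the manner of an RDF triple.
TRIPLES = [
    (CONCEPT, "skos:prefLabel", "renewable energy"),
    ("http://example.org/pages/wind-farms", "dct:subject", CONCEPT),
    ("http://example.net/records/42", "dct:subject", CONCEPT),
    ("http://example.net/records/43", "dct:subject", "http://example.org/thesaurus/coal"),
]


def resources_about(concept_uri, triples=TRIPLES):
    """Return every resource that has been indexed with the given concept."""
    return sorted(s for s, p, o in triples if p == "dct:subject" and o == concept_uri)


if __name__ == "__main__":
    for uri in resources_about(CONCEPT):
        print(uri)
```

Note that the two matching resources sit on different hosts. Nothing has to be gathered into one database for the concept URI to connect them.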

This opens up the prospect for any thesaurus published on the Web to act as a connecting hub for an immense literature in the subject field concerned, without any need to assemble the disparate documents in one collection or database. A vision of Wikipedia as the connecting linked data hub for hundreds of thesauri and other KOSs is outlined in a speculative paper from Garcia-Marco They hope thereby to enable potential universal access to information in different formats and languages, about the works of art and countless other exhibits in museums, libraries and galleries around the world.

Although not the primary purpose, thesauri may also be used for precoordinate indexing (Wellisch; International Organization for Standardization). When this is done, users of the precoordinate index (typically found at the back of a book) are not expected to consult a thesaurus, since cross-references to synonyms etc. are normally built into the index itself. Conversely, a thesaurus may be used not for indexing but only for searching. This removes the need for compliance with the standards.

See Section 5. Lykke Nielsen , states that "the thesaurus is a tool that helps individual users to get an understanding of the collective knowledge domain". Broughton www. It makes us think about the nature of concepts, the form of their labels [and about] their relationships" and slide 9 "there's something fundamental about this approach to modelling information domains that should not be lightly abandoned".

More generally Soergel has argued that the construction of any sort of knowledge organization schema, particularly with entity-relationship modelling, facet analysis and a graphical presentation of concepts, is a useful learning discipline. Still more uses are emerging as the Internet pervades the office and everyday living. To satisfy these new uses, however, the standard thesaurus model may need to evolve.

To Peter Mark Roget, working in the middle of the 19th century, we owe the insight that it would be valuable to supply "a collection of the words [the English language] contains and of the idiomatic combinations peculiar to it, arranged, not in alphabetical order, as they are in a dictionary, but according to the ideas which they express" (Roget). His aim, rather than information retrieval, was to help "find the word, or words, by which [an] idea may be most fitly and aptly expressed" (ibid.).

It led to developments such as faceted classification, post-coordinate indexing, and experiment with various sorts of cards, all of which were to prove helpful when the idea of an IR thesaurus was conceived.

According to Roberts the first suggestion of using a thesaurus in the context of IR came from Calvin Mooers in At around the same time C. Bernier and E. Crane made a similar, independent suggestion, but "expressed the view that a general thesaurus was not an appropriate form for retrieval purposes" Roberts , Much experiment followed over the next decade, but none of the various thesaurus approaches described by Joyce and Needham — e.

It was after this gestation period that "the first full-scale, operational in-house retrieval thesaurus [was produced] to solve pressing practical problems at E. Du Pont Nemours and Co. Krooks and Lancaster credit Eugene Wall with developing the principles that determined the shape of this pioneering compilation.

A fuller description of these works can be found in Krooks and Lancaster , and in Aitchison and Dextre Clarke Widespread use of thesauri continued throughout the s, s and s, in IR systems that mostly relied on cards of various types, sizes and materials, including some that were sorted by machines Dextre Clarke ; Sharp These were post-coordinate systems, which require each document to be indexed by selecting relevant terms from a controlled vocabulary such as a thesaurus.

A thesaurus was used too by many of the bibliographic databases that were hosted online by services such as Lockheed's Dialog system, followed later by CD-ROM distribution. Notable pioneers of construction methodology included Jean Viet, Jean Aitchison and Donald Leatherdale, who each produced a number of influential thesauri.

Dextre Clarke provides a vivid account of how the tools and technology of those times were used. Further impetus came from development of national and international standards for thesaurus construction, indicating the extent of interest from the information-using community. The most influential, listed in chronological order of their first editions, included:.

These and other KOS standards are discussed in Dextre Clarke b , although this article pre-dates publication of the two parts of ISO , in and respectively International Organization for Standards ; As noted by Dextre Clarke , 86 "standardisation has not brought uniformity".

Having the status of guidelines rather than mandatory requirements, all the standards left plenty of scope for continuing experiment. While nearly all published thesauri include an alphabetical list of terms (which may be as simple as the extract in Figure 2, or can show additional attributes and relationships), very often the alphabetical list is complemented by other types of display. The great weakness of any alphabetical list is the need to know a term before one can find the corresponding concept(s).

Thus an alphabetical list does not respect Roget's vision that it would be useful to arrange terms systematically according to the ideas or concepts they represent. His literary insight applies equally in the context of information retrieval. A search concerning wood, for example, could equally be expressed using the term timber, and an alphabetical list would place these terms far apart even though the underlying concept may well be the same.

The classic TEST thesaurus addressed this weakness by providing three indexes: permuted, hierarchical and by subject category. A derived style, slightly more elaborate, was followed in several thesauri designed by Jean Viet, including the influential Macrothesaurus from the Organization for Economic Cooperation and Development (OECD).

A different approach was adopted by Aitchison, Gomersall and Ireland in their ground-breaking vocabulary Thesaurofacet , comprising a faceted classification fully integrated with a thesaurus. This approach relies on concept-based analysis from the very start, enabling elaboration of the faceted classification and subsequent derivation of a thesaurus.

Biswas and Smith review a number of other efforts to combine a classification scheme with a thesaurus, especially the "Classaurus" and its variants developed in India by Bhattacharyya, Devadason and others.

Broughton (a) also advocates facet analysis as the soundest basis for thesaurus construction, and claims that "the generation of a thesaurus from its equivalent faceted classification is almost as automatic a process as thesaurus construction can ever hope to be" (Broughton b). Rather than a full-blooded classification, the systematic listing of preferred terms in MeSH (Medical Subject Headings) was a set of extensive hierarchical "Tree Structures" with an elaborate expressive notation that served both as a vocabulary look-up device and as a search key in the databases of MEDLARS (Medical Literature Analysis and Retrieval System) and later Medline.

The first edition of the multilingual thesaurus AGROVOC Leatherdale , taking a different approach, avoided the need for a separate hierarchical section by embedding the complete upper and lower hierarchical context of each concept within the alphabetical display. See Figure 5. Throughout the s, s and s much of the experiment was constrained by the need to provide users with printed copies of the thesaurus, and update them regularly.

But after that only the electronic version has been maintained. From the s onwards, most new thesauri have been published in electronic media only. If the focus is on an electronic version, not only are the costs and hassle of printed distribution eliminated, but also there is greater freedom to change the presentation frequently in response to feedback, and develop features that support indexing and searching of any associated databases.

A tailing-off in the popularity of thesauri has occurred from approximately the end of the s, probably due to increasing availability of desktop computers, as well as the rise of the Internet Dextre Clarke The new technologies have enabled alternative retrieval methods that for most applications appear less expensive than post-coordinate indexing plus thesaurus development and maintenance. From that time onwards, while a good thesaurus works no less effectively than before, its role has been relegated to relatively fewer search applications Dextre Clarke , such as retrieval from image collections MacFarlane , cultural heritage collections and bibliographic databases.

In these situations it still brings benefits, especially when implemented in linked data mode Tudhope and Binding Shiri paints an optimistic picture of the opportunities. Latest versions of the standards Z, ISO and ISO are dated reaffirmed , confirmed and respectively. While in them the basic principles of thesaurus design show little change from previous versions, it's clear the context in which a thesaurus operates has changed markedly. Interoperability is now the key to success — and is reflected in the content of the standards.

See Endnote 3 for some clarification of the differences between these standards. Some authors believe the way relationships are treated in a thesaurus could usefully evolve.

Alexiev et al. He points out further that the most useful types of relationship to specify may vary from one domain to another. Enthusiasts for change may like to note that the current standards are already permissive of developments e. All international standards are reviewed on a five-year cycle, enabling proponents to make the case for revision as soon as such developments have proved their worth. The passage of time will tell whether the thesaurus continues as before in its relatively few niche applications, or whether it blossoms into new networked opportunities, perhaps revitalised by an infusion of ideas from ontologies and other types of KOS.

Section 7 below summarizes the challenges and opportunities for continuing exploitation and evolution. Despite their astonishing variability in aspects such as subject scope, size, specificity, function, format, layout, language, quality of construction, etc.

Much of the variation seems stylistic rather than fundamental, with one style borrowing features from another and a proliferation of hybrids.

This section will therefore start by describing the "bare minimum" that can be expected in any IR thesaurus, and continue with some discussion of frequently observed differences in style, before discussing some categories of thesaurus that might or might not be considered distinct types. Taken together these three requirements typically lead to a list of all terms and relationships, with entries alphabetically arranged, in the style of the extract in Figure 2.

Alongside these traditional requirements it is worth noting a trend towards applications in which the thesaurus is implemented behind the scenes; this reduces or obviates the need for any kind of display, or indeed for designating the preferred term for a concept.

While the vocabulary illustrated in Figure 2 complies with the standards, a more ambitious thesaurus would also incorporate Scope Notes, History Notes, faceted arrays introduced by node labels, concept groups and other optional features. The data model in Figure 1 points to very many opportunities for enhancing a thesaurus in ways that are standards-compliant and supportive of interoperability in networked applications.
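To make the distinction between the bare minimum and the optional extras concrete, here is a minimal sketch of a concept record and of an alphabetical display generated from it. The class, its field names and the sample data are hypothetical and far simpler than the data model in Figure 1; the tags SN, UF, USE, BT, NT and RT are the conventional abbreviations, assumed here to be the ones the display should use.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    preferred: str                                      # preferred term
    non_preferred: list = field(default_factory=list)   # entry terms (synonyms etc.)
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)
    scope_note: str = ""                                # optional extra


def alphabetical_display(concepts):
    """Build one alphabetical list with an entry for every term, preferred or not."""
    entries = {}
    for c in concepts:
        lines = []
        if c.scope_note:
            lines.append(f"  SN {c.scope_note}")
        lines += [f"  UF {t}" for t in sorted(c.non_preferred)]
        lines += [f"  BT {t}" for t in sorted(c.broader)]
        lines += [f"  NT {t}" for t in sorted(c.narrower)]
        lines += [f"  RT {t}" for t in sorted(c.related)]
        entries[c.preferred] = lines
        # Each non-preferred term gets its own entry pointing to the preferred term.
        for alt in c.non_preferred:
            entries[alt] = [f"  USE {c.preferred}"]
    out = []
    for heading in sorted(entries, key=str.lower):
        out.append(heading)
        out.extend(entries[heading])
    return "\n".join(out)


if __name__ == "__main__":
    pigs = Concept("pigs", non_preferred=["hogs", "swine"], broader=["livestock"])
    print(alphabetical_display([pigs]))
```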

As to format, the alphabetic list is often supplemented by other displays to help users find the right term, such as a classified display, a set of hierarchical trees, a permuted index, or even a graphical display. Thesauri that were developed to serve a particular database sometimes show extras, such as the number of postings for each term. In lieu of explicit display, some of the extra features may be hidden, invoked only as functions of a retrieval system. In this section we discuss presentational differences, which may not be fundamental to thesaurus operation but can still influence user acceptance and hence retrieval effectiveness.

In certain domains a particularly influential thesaurus has influenced the development of subsequent vocabularies. Other historical influences have been thesaurus maintenance software and preferences of the original designers. Many thesauri funded in the twentieth century by the Commission of the European Communities used the ASTUTE software, which generated alphabetical displays with entries in the style of Figure 5.

The style and conceptual approach of pioneers Jean Viet and Jean Aitchison see 4. Shiri provides an update, including screen layouts for electronic thesauri, to be discussed next. Arguably an electronic format is just another stylistic variation, not affecting the fundamentals. In such cases the underlying content and structure of both online and printed versions are the same. That said, the electronic medium offers enhanced opportunities for thesaurus design, maintenance, presentation and implementation, enabling interactive retrieval functions for the users as described in Section 3 above.

Shiri discusses several examples and offers guidelines for the design of thesaurus-enhanced search interfaces. All the stylistic variations described so far can apply to monolingual or to multilingual thesauri. The inclusion of more than one language is not just another variable — it makes a big difference to design, maintenance and use.

The display illustrated is for use by speakers of English; an alternative, language-inverted display for speakers of Spanish would show all the terms and relationships for that language. Multilingual thesauri can be subdivided into two types — symmetrical or not. In a symmetrical thesaurus, every concept has a preferred term in each of the languages, and the scope and relational structure is identical in each.

In a non-symmetrical thesaurus not every concept need be represented in all the languages, and the hierarchical structure may vary from one language to another to accommodate cultural differences. An original aim of the OECD's Macrothesaurus published in was to "create a documentary language for processing information in the broad field of economic and social development, while striving for compatibility with sectoral thesauri serving agriculture, industry, labour, education, population, science, technology, culture communication, health and the environment" Viet , v.

Both the name and the aim were popular, and so years later the term macrothesaurus with a small m was borrowed as a generic name for any broad-level thesaurus that either contains or is aligned with a number of microthesauri having greater specificity in a more limited field. The normal situation according to Aitchison et al. They acknowledge, however, that sometimes the macrothesaurus is a separate entity, managed independently of any corresponding microthesauri. In practice it is not easy to maintain alignment between specialized thesauri used by different communities, unless management is centralized.

Concepts belong to more than one microthesaurus if appropriate. Each microthesaurus contains a hierarchically structured list of concepts, terms and relationships, and can be downloaded separately see and browse at eurovoc. The search thesaurus is one designed for use, not in indexing, but only at the search stage see more discussion in Aitchison et al. At first glance this would not seem to make it very different.

And indeed, sometimes a normal standards-compliant thesaurus is applied only at the search stage, and then described as a "search thesaurus". A deeper study, however, reminds us of the way a standard thesaurus is designed to work: "The concepts are represented by terms, and for each concept, one of the possible representations is selected as the preferred term" ISO , clause 4.

In the case of the thesaurus shown in Figure 2, for example, an indexer would assign the term pigs to every item in the collection that deals with pigs or sows, or hogs, or porkers. The searcher would use only the term pigs to retrieve all these items. But if the same tool was being used as a search thesaurus, indexing would not have taken place. Thus the notion of a preferred term is inapplicable to a search thesaurus designed as such. Standards such as ISO become irrelevant, allowing even greater freedom of content, style and structure.
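The contrast between the two modes of use can be sketched as follows. This is an illustration only: the ENTRY_TERMS table reuses the pigs example from the text, while the function names and the whitespace-based matching are hypothetical simplifications. In the first mode the indexer's choices are normalised to the preferred term; in the second, no indexing has taken place, so the query itself must carry every synonym.

```python
# Entry vocabulary taken from the pigs example: every term leads to one preferred term.
ENTRY_TERMS = {"pigs": "pigs", "sows": "pigs", "hogs": "pigs", "porkers": "pigs"}


def preferred(term):
    """Return the preferred term for a term, or the term itself if it is unknown."""
    return ENTRY_TERMS.get(term, term)


def index_document(text):
    """Indexing side: assign the preferred term for any entry term found in the text."""
    words = text.lower().split()
    return {ENTRY_TERMS[w] for w in words if w in ENTRY_TERMS}


def controlled_search(index, query):
    """Search an indexed collection using only the preferred term."""
    target = preferred(query)
    return sorted(doc_id for doc_id, terms in index.items() if target in terms)


def search_thesaurus_query(query):
    """Search-thesaurus side: list every synonym to be ORed in a free-text search."""
    target = preferred(query)
    return sorted(t for t, p in ENTRY_TERMS.items() if p == target)


if __name__ == "__main__":
    docs = {1: "Feeding sows in winter", 2: "Hogs and housing", 3: "Sheep dip"}
    index = {doc_id: index_document(text) for doc_id, text in docs.items()}
    print(controlled_search(index, "pigs"))
    print(" OR ".join(search_thesaurus_query("pigs")))
```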

Lopez-Huertas proposes one example, structured very differently from the standard thesaurus. Another fully worked example is Knapp's The contemporary thesaurus of social science terms and synonyms which attempts to remind readers of many alternative ways of expressing the same idea, using a layout quite different from the standard see Fig.

In practice not many such works have been published.

Pioneer(s). Early settler(s). Pilgrim(s). Frontiers(man,men). Backwoods(man,men). Early colonist(s). Homesteader(s). Early immigrant(s). Consider also: discoverer(s), explorer(s), pathfinder(s), scout(s), trailblazer(s), leader(s). See also: Explorers; Pioneering; Scientists.

The converse of the search thesaurus is the indexing thesaurus , to be used for indexing and not for search. While applications are sometimes found in which indexing is enhanced automatically with the help of a thesaurus, a standard thesaurus is usually applied, rather than one designed for indexing alone.

While the criteria presented by Mader and Haslhofer apply to a range of KOSs and not specifically thesauri, they do help evaluate interoperability in the context of SKOS use, for any controlled vocabulary.

Much earlier, Owens and Cochrane described four approaches — structural, formative, observational and comparative — to thesaurus evaluation. None of these directly measures the effectiveness with which a thesaurus succeeds in the purpose for which it was intended — retrieving information.

Such a measure is difficult if not impossible to devise, partly because the thesaurus is only one of several components in the retrieval system, and partly because there are so many variables in the context of use. Lengthy experiments in the s and s studied the effects on precision and recall as different features of indexing languages were tested, but ultimately failed to provide conclusive support for the use of any controlled vocabulary Keen ; Soergel ; Svenonius ; Dextre Clarke Despite efforts over many years, we still do not have definitive proof that development and use of a thesaurus is a worthwhile investment.
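For readers unfamiliar with the two measures mentioned above, precision is the proportion of retrieved items that are relevant, and recall is the proportion of relevant items that are retrieved. A minimal sketch follows; the function name and the convention of returning 0.0 for an empty set are assumptions of this example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search, given two collections of item ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Four items retrieved, of which two are among the three relevant ones.
    print(precision_recall([1, 2, 3, 4], [2, 4, 5]))
```

Figures of this kind describe the retrieval system as a whole, which is one reason why they cannot isolate the contribution of the thesaurus alone.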

Dextre Clarke provides an account of the continuing debate. Although quantitative proof of efficacy may be lacking, there is plenty of qualitative evidence of thesauri prospering and supporting users in some key areas of a changing environment.

Modes of evaluation may have to adapt to reflect the new context. The thesaurus as conceived by the current national and international standards is still based on the assumption "that human intellect is usually involved in the selection of indexing terms and in the selection of search terms. If both the indexer and the searcher are guided to choose the same term for the same concept, then relevant documents will be retrieved.

This is the main principle underlying thesaurus design" ISO , "Introduction", vi. Nowadays opportunities to apply the thesaurus may shrink because the trained indexer and searcher are increasingly scarce.

End-users are largely unaware of thesauri (this is confirmed, for example, by Greenberg); trained indexers and searchers are usually deemed unaffordable.

Areas where the thesaurus seems most likely to survive and flourish include:. Examples of developments like these may be found in the special issue of Knowledge Organization mentioned above, and in Shiri Simultaneously as true thesauri still thrive in the types of application just listed, a parallel future may lie in their gradual transformation under the banner taxonomy. This term, long applied to the practice and science of classification and especially the Linnaean classification of biological organisms, has been widely applied since the s to a variety of KOSs found in electronic media.

Applications include corporate intranets, online retail sales outlets, digital libraries, public sector advice websites, as well as displacement of the thesaurus in some of its traditional occupations. White points to the value of KO tools and techniques in some of these contexts.

There is still little uniformity among the "taxonomies" being developed for such applications, which may be simple heading lists, or may be complex hybrids that combine features from thesauri, traditional classification schemes, faceted schemes, ontologies and other types of KOS.

In comparison with the widespread adoption of web search engines, their value is barely recognized. But there certainly is a very large need and opportunity for the principles of knowledge organization to be applied towards helping millions of workers in the knowledge society to find information resources of all kinds.

The names we shall find for the emerging hybrid vocabularies are hard to predict, but we can safely say that the thesaurus will pass some of its genes into new tools for searching the cyberworld to come.

The latter site also lists relevant events, blogs, publishers and links to some associated products such as software. The first of these also carries an extensive bibliography. Contents include:. All the articles have useful reference lists. On its website at nkos. For an overview, see Tudhope and Lykke Nielsen Information retrieval is used here broadly to mean "the activity of obtaining information resources relevant to an information need from one or more collections of information resources" definition adapted from Wikipedia.

It is not limited to use in systems held on computer. As explained on the W3C (World Wide Web Consortium) website, "SKOS [Simple Knowledge Organization System] is an area of work developing specifications and standards to support the use of knowledge organization systems (KOS) such as thesauri, classification schemes, subject heading systems and taxonomies within the framework of the Semantic Web".
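As a rough illustration of what a thesaurus concept looks like when expressed in SKOS, the sketch below writes one concept in Turtle syntax using the SKOS properties skos:prefLabel, skos:altLabel and skos:broader. The function name, the example.org base URI and the sample labels are hypothetical, and the code does no escaping of quotation marks inside labels, so it is suitable only for simple demonstration data.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def to_skos_turtle(base, concept_id, pref, alts=(), broader=(), lang="en"):
    """Serialise one concept as Turtle (no escaping: labels must not contain quotes)."""
    predicates = ["a skos:Concept", f'skos:prefLabel "{pref}"@{lang}']
    predicates += [f'skos:altLabel "{alt}"@{lang}' for alt in alts]
    predicates += [f"skos:broader <{base}{b}>" for b in broader]
    body = f"<{base}{concept_id}> " + " ;\n    ".join(predicates) + " ."
    return SKOS_PREFIX + "\n\n" + body


if __name__ == "__main__":
    print(to_skos_turtle(
        "http://example.org/thesaurus/",  # hypothetical base URI
        "pigs",
        "pigs",
        alts=["hogs"],
        broader=["livestock"],
    ))
```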

Development work on this specification took place around the same time as the development of BS and ISO , with regular communication between the corresponding teams, so that a high degree of compatibility was achieved. The data models of these two standards are not identical, because ISO must provide for the needs of all sorts of thesauri whether for Web use or for other applications while SKOS must provide for all sorts of KOS including classification schemes and many others that do not comply with ISO At the time of writing July this latter site is unavailable but work is in hand to restore it.

More background on all the above developments is provided at www. It should be noted that exploitation of these interoperability standards and opportunities demands skill and attention to detail. Some practical examples and cautionary tales are provided in De Smedt and Lindenthal Here are some key points of similarity or difference:.

In view of the relatively wider scope of ISO in respect of thesauri, including in-depth treatment of interoperability and provision of a data model, it has been referenced more often than Z

Aitchison, Jean. International thesaurus of refugee terms.
Aitchison, Jean and Stella Dextre Clarke. The thesaurus: a historical viewpoint, with a look to the future.
Aitchison, Jean and Alan Gilchrist. Thesaurus construction: a practical manual. London: Aslib.
Thesaurus construction and use: a practical manual.
Thesaurofacet: a thesaurus and faceted classification for engineering and related subjects.
American Institute of Chemical Engineers. Chemical Engineering Thesaurus: a wordbook for use with the concept coordination system of information storage and retrieval.
American National Standards Institute. American National Standard Guidelines for thesaurus structure, construction and use. ANSI Z
Knowledge Organization 43, no.
Armed Services Technical Information Agency.
Baca, Murtha and Melissa Gill. Encoding multilingual knowledge systems in the digital age: the Getty vocabularies. Knowledge Organization 42, no.
Berners-Lee, Tim. Available at www.
Biswas, Subal C and Fred Smith. Classed thesauri in indexing and retrieval: a literature review and critical evaluation of online alphabetic classaurus. Library and Information Science Research 11, no.
Wuerzburg, Germany: Ergon Verlag.
British Standards Institution.
Broughton, Vanda. The need for a faceted classification as the basis of all methods of information retrieval. Aslib Proceedings 58, no.
Caplan, Priscilla Louise. Thesaurus-based automatic indexing: a study of indexing failure. Chapel Hill, North Carolina University.
Caracciolo, Caterina and Johannes Keizer. What KOS can do, with the proper tools available.
De Keyser, Pierre.

The TGN includes names and associated information about places. Places in TGN include administrative political entities and physical features. Current and historical places are included. Other information related to history, population, culture, art and architecture is included. It documents the standardization and registration of metadata to make data understandable and shareable. Simple Knowledge Organization System (SKOS) is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary.
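To make the SKOS description above more concrete, here is a minimal sketch in Python of how two thesaurus concepts might be written out as SKOS in Turtle syntax. The `Concept` class, the `to_turtle` helper and the `http://example.org/thesaurus/` URIs are illustrative assumptions and do not come from any thesaurus or tool mentioned in this text; only the SKOS namespace and the properties `skos:prefLabel`, `skos:altLabel` and `skos:broader` are taken from the W3C recommendation.

```python
from dataclasses import dataclass, field

# Namespace of the W3C SKOS core vocabulary.
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


@dataclass
class Concept:
    """One thesaurus concept (hypothetical helper class, for illustration only)."""
    uri: str
    pref_label: str                                    # preferred term
    alt_labels: list[str] = field(default_factory=list)  # non-preferred terms
    broader: list[str] = field(default_factory=list)     # URIs of broader concepts


def to_turtle(concepts, lang="en"):
    """Serialize concepts as SKOS in Turtle syntax.

    Simplification: labels are assumed to contain no double quotes or
    backslashes, so no string escaping is performed.
    """
    lines = [f"@prefix skos: <{SKOS_NS}> .", ""]
    for c in concepts:
        props = ["a skos:Concept", f'skos:prefLabel "{c.pref_label}"@{lang}']
        props += [f'skos:altLabel "{a}"@{lang}' for a in c.alt_labels]
        props += [f"skos:broader <{b}>" for b in c.broader]
        lines.append(f"<{c.uri}> " + " ;\n    ".join(props) + " .")
    return "\n".join(lines)


# Example URIs under example.org are placeholders, not a real vocabulary.
fruit = Concept("http://example.org/thesaurus/fruit", "fruit")
citrus = Concept(
    "http://example.org/thesaurus/citrus-fruit",
    "citrus fruit",
    alt_labels=["citrus"],
    broader=[fruit.uri],
)

turtle = to_turtle([fruit, citrus])
print(turtle)
```

In practice an RDF library would normally handle serialization and escaping; plain strings are used here only to keep the sketch self-contained.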

ISO was the ISO international standard for monolingual thesauri for information retrieval, first published and later revised. The official title of the standard was "Guidelines for the establishment and development of monolingual thesauri". The AgMES initiative was developed by the Food and Agriculture Organization (FAO) of the United Nations and aims to encompass issues of semantic standards in the domain of agriculture with respect to description, resource discovery, interoperability and data exchange for different types of information resources.

Agricultural Information Management Standards, abbreviated to AIMS, is a space for accessing and discussing agricultural information management standards, tools and methodologies, connecting information workers worldwide to build a global community of practice. Information management standards, tools and good practices can be found on AIMS. Its full title was "Guidelines for the establishment and development of multilingual thesauri".

It was withdrawn when replaced by ISO. See more explanation on the official website for ISO. Metadata is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata. AGRIS is a global public domain database with more than 12 million structured bibliographical records on agricultural science and technology.

The database was maintained by Coherence in Information for Agricultural Research for Development, and its content is provided by participating institutions from 65 countries.

In natural language processing, linguistics, and neighboring fields, Linguistic Linked Open Data (LLOD) describes a method and an interdisciplinary community concerned with creating, sharing, and re-using language resources in accordance with Linked Data principles.

They are often used by writers to help find the best word to express an idea. ISO is the international standard for thesauri, published in two parts as follows:
ISO Information and documentation - Thesauri and interoperability with other vocabularies
Part 1: Thesauri for information retrieval [published August]
Part 2: Interoperability with other vocabularies [published March]

The pre-history of the information retrieval thesaurus. Journal of Documentation, 40(4), p.
The thesaurus: a historical viewpoint, with a look to the future.
The evolution of guidelines for thesaurus construction. Libri, 43(4), p.
From ISO to ISO: the evolution of thesaurus standards towards interoperability and data modeling. Information Standards Quarterly, 24(1), p.

National Information Standards Organization.
