Terminology

A
Administrative Metadata

Metadata used in managing and administering objects, e.g. location or creator information.

A
Attributes

Aspects, properties, features, characteristics, parameters associated with an entity or an object in an ontology.

A
Authority Record

A record that registers the preferred form of a personal or corporate name, geographic region or subject term. It may also indicate variant forms of the established heading, biographical or cultural information associated with the heading, and related headings.

A
Axioms

Statements that are assumed to be true without proof and are widely accepted on their own merits.

A
C
Classes

Sets, collections, classes, types of objects, kinds of things.

C
Classification

The act of systematically arranging ideas or objects into categories according to specific criteria.

C
Computational Linguistics

An academic field dealing with the statistical and/or rule-based modeling of natural language from a computational perspective.

C
Conference Room Pilot

A process used to validate a software application against the business needs of its end-users by having them carry out typical or key business activities with the new software. A commercial advantage of a conference room pilot is that it may allow the customer to prove that the new software will do the job (that is, meet business requirements and expectations) before committing to buy it, thus avoiding the purchase of an inappropriate application. The term is most commonly used in the context of 'out of the box' (OOTB) or 'commercial off-the-shelf' (COTS) software.

C
Content Model

A schema that defines data (including metadata) structures, including the types of elements, sub-elements and values they contain.

C
Content Standard

Standard authorities or sets of rules that determine the vocabulary, syntax or format of what is entered into a data or metadata element, e.g., Library of Congress Subject Headings, Anglo-American Cataloging Rules.

C
Controlled Vocabulary

Sets of predefined, authorized and curated terms that have been preselected by the designer of the vocabulary. In the organization of information, a pick list or drop-down list of authorized terms is a good example of a controlled vocabulary. Often lists of department names are presented as a controlled vocabulary so there is consistency in the use of terms.
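
The pick-list idea can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the department names and the `validate_term` helper are hypothetical and chosen only to mirror the example in the definition.

```python
# Hypothetical controlled vocabulary of department names (illustrative only).
DEPARTMENTS = {"Finance", "Human Resources", "Legal", "Marketing"}


def validate_term(term, vocabulary):
    """Return the authorized form of `term`, or None if it is not authorized.

    Matching ignores case and surrounding whitespace, so that free-typed
    input is normalized to the single authorized spelling.
    """
    lookup = {entry.casefold(): entry for entry in vocabulary}
    return lookup.get(term.strip().casefold())
```

With this sketch, `validate_term(" finance ", DEPARTMENTS)` yields the authorized form `"Finance"`, while a term outside the list, such as `"Sales"`, is rejected with `None`.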

C
D
Descriptive Metadata

Information describing the intellectual content of the object, such as cataloguing records, finding aids or similar schemes.

D
Disambiguation

The removal of ambiguity from the meaning of a word or term. In text analysis, disambiguation is the process of identifying which sense (i.e. meaning) of a word is used in a sentence when the word has multiple meanings (polysemy). When referring to metadata, disambiguation means the process of eliminating name ambiguity and name variation within a data set.

D
E
Embedded Metadata

Metadata that is maintained and stored within the object it describes; the opposite of stand-alone metadata.

E
Entities

In the field of text analysis, entities refer to person names, organizations, locations, dates, specialized terms and product terminology.

E
Entity Extraction

The process of automatically extracting document metadata from unstructured text documents. Extracting key entities such as person names, organizations, locations, dates, specialized terms and product terminology from free-form text can empower organizations to not only improve keyword search but also open the door to semantic search, faceted search and document repurposing.
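
As a rough illustration of the idea, the sketch below pulls two entity types out of free-form text. It is deliberately simplistic and makes several assumptions: organizations are found by exact lookup in a small hypothetical list (`ORGANIZATIONS`), and dates are recognized only in the `YYYY-MM-DD` form. Production systems typically rely on linguistic rules or machine learning instead.

```python
import re

# Hypothetical list of known organizations (illustrative only).
ORGANIZATIONS = ["Library of Congress", "World Wide Web Consortium"]

# Matches calendar dates written as YYYY-MM-DD; other formats are ignored.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_entities(text):
    """Return a dict mapping entity type to the strings found in `text`."""
    return {
        "organization": [org for org in ORGANIZATIONS if org in text],
        "date": DATE_PATTERN.findall(text),
    }
```

The returned dictionary can then be stored as document metadata, for example to drive a faceted search on organization or date.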

E
Events

A change in the value of one or more attributes or a change in the relationship between two or more entities.

E
Extensible

Having the potential to expand in scope, area or size. In the case of the Dublin Core, the ability to extend a core set of metadata with additional elements.

E
F
Functional Terms

Complex structures formed from relationships that can be used in place of an individual term in a statement.

F
G
Granularity

The extent to which a system is broken down into small parts. For example, the term industry granularity describes the extent to which industry subject headings are broken down into narrower subheadings so that users can navigate to very fine-grained terms.

G
Graph Database

A kind of database that uses graph structures with nodes, edges, and properties to represent and store information. Can be considered as a candidate technology for storing facts because factual information behaves like a network, and a graph database describes many different networked relationships for each entity.

G
I
Indexing

The process of evaluating information entities and creating terms that aid in finding and accessing the entity. Index terms may be in natural language, controlled vocabulary or a classification notation.

I
Information Architecture

The art and science of organizing and labelling websites, intranets, online communities and software to support usability. It is an emerging discipline and community of practice focused on bringing principles of design and architecture to the digital landscape. Typically, it involves a model or concept of information which is used and applied to activities that require explicit details of complex information systems. These activities include library systems and database development.

I
Information Extraction

The subset of NLP technology that specifically deals with extracting structured information (entity types, relations, events, sentiments) from unstructured text.

I
Inline tagging

A system that "tags" (or keyword links) exactly where a concept is referenced in a document rather than sending the index terms from the controlled vocabulary or taxonomy to an XML field or other location. Inline tagging is especially useful for lengthy documents. Full text lends itself to inline tagging because each search takes the user directly to the information they were searching for - in its original context.

I
Instances

Individual objects, entities.

I
K
Keyword

A word that occurs in a text more often than would be expected by chance alone. Keywords are calculated by carrying out a statistical test (e.g., log linear) which compares the word frequencies in a text against their expected frequencies derived from a much larger corpus, which acts as a reference for general language use.
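
The comparison of observed and expected frequencies can be sketched as follows. Note the assumption: the definition names a log-linear test only as an example, and this sketch substitutes the log-likelihood ratio statistic, a test commonly used for the same purpose. The function and parameter names are illustrative.

```python
import math


def log_likelihood(freq_text, size_text, freq_ref, size_ref):
    """Log-likelihood score for one word (a stand-in for the test in the text).

    freq_text, freq_ref: occurrences of the word in the text and in the
    reference corpus. size_text, size_ref: total word counts of each.
    """
    total = freq_text + freq_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_text = size_text * total / (size_text + size_ref)
    expected_ref = size_ref * total / (size_text + size_ref)
    score = 0.0
    for observed, expected in ((freq_text, expected_text), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score
```

A score near zero means the word is about as frequent in the text as in the reference corpus. A large score signals a deviation in either direction, so a keyword candidate should also have a higher relative frequency in the text than in the reference corpus.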

K
Knowledge Base

A repository based on semantic technology that combines extracted facts, expressed as triples, with the conceptual knowledge from an OWL ontology.

K
Knowledge Domain

The content of a particular field of knowledge, which can include an area of human endeavor, an autonomous computer activity, or other specialized discipline.

K
Knowledge Model

A formal (machine readable) description of the general knowledge in a domain. It is expressed in a formal ontology language like OWL in terms of concept types, relation types and axioms in the domain.

K
Knowledge Module

The combination of the knowledge model and the linguistic rules for a particular domain that enables deep semantic extraction from content.

K
L
Lemma

The canonical form chosen to represent the set of all forms of a word that have the same meaning, e.g. run, runs, ran, running. In this case run is the lemma.
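
A minimal sketch of reducing word forms to a shared lemma, using the run example above. The `LEMMAS` table is a hypothetical hand-built lookup; real lemmatizers use much larger dictionaries or morphological rules.

```python
# Hypothetical lookup table from word form to lemma (illustrative only).
LEMMAS = {"run": "run", "runs": "run", "ran": "run", "running": "run"}


def lemmatize(tokens):
    """Map each token to its lemma; unknown tokens are only lowercased."""
    return [LEMMAS.get(token.lower(), token.lower()) for token in tokens]
```

For instance, `lemmatize(["Ran", "running", "home"])` gives `["run", "run", "home"]`.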

L
Linguistic Rules

The body of rules that express how concepts and relation types are referred to in the text. They use various language elements like parts of speech and stems to determine how to identify and link entities into facts. Linguistic rules are often used to recognize patterns so that information can be extracted from unstructured text.

L
Linguistics

The scientific study of language.

L
Linked Data

A method of exposing, sharing, and connecting data by connecting RDF Triples. An "open data" crowd sourcing initiative exists to connect all data on the web.

L
M
Machine Learning

A scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases. A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data; the difficulty lies in the fact that the set of all possible behaviors given all possible inputs is too large to be covered by the set of observed examples. Hence the software program must generalize from the given examples, so as to be able to produce a useful output in new cases.

M
Metadata

Data about data. Often, metadata describes the creation of the data, the purpose of the data, the time and date of creation, identifies the creator or author of data, and provides data about the contribution of the data and its provenance. Metadata also describes the format and physical location of an artifact.

M
Metadata Elements

Properties of the object that are defined in a specification, including "author/creator", "title", and "subject".

M
Metadata Functions

The grouping of metadata elements by their purpose.

M
Metadata Record

A full set of structured metadata, comprising all relevant elements, that describes one object.

M
Metadata Tags

Encoding that identifies the metadata elements. XML metadata tags identify metadata elements with a representative vocabulary term or an intelligible abbreviation.
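
To make this concrete, the sketch below encodes a small metadata record as XML with Python's standard `xml.etree.ElementTree` module. The `record` root tag and the `to_xml` helper are assumptions made for illustration; they do not come from any particular metadata standard.

```python
import xml.etree.ElementTree as ET


def to_xml(record):
    """Encode a dict of metadata elements as XML, one tag per element."""
    root = ET.Element("record")  # hypothetical wrapper tag
    for element, value in record.items():
        ET.SubElement(root, element).text = value
    return ET.tostring(root, encoding="unicode")
```

Calling `to_xml({"title": "Annual Report", "creator": "J. Smith"})` produces a string in which each metadata element is identified by its own tag, such as `<title>` and `<creator>`.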

M
N
Natural Language Processing

A field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and natural languages. Specifically, the process of a computer extracting meaningful information from natural language input and/or producing natural language output. Modern NLP algorithms are grounded in machine learning, especially statistical machine learning.

N
O
Ontology

A rigorous organization of a knowledge domain that includes multiple types of relationships, including hierarchical and equivalence relationships. It contains all the relevant entities in a domain and describes how they are related. An ontology is the explicit specification of a conceptualization; it captures the structure of a domain.

O
OWL

A semantic markup language for publishing and sharing ontologies on the World Wide Web.

O
P
Parsing

The process of analyzing a string of text to determine its structure and meaning. It may be divided into two parts: lexical analysis and semantic parsing. Lexical analysis divides strings into components based on punctuation or tagging. Semantic parsing then attempts to determine the meaning of the string.
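
The lexical analysis step alone can be sketched with a regular expression that splits a string into word and punctuation components. This is a simplified illustration under the assumption that whitespace and punctuation are the only boundaries; it does not attempt the semantic parsing step.

```python
import re

# A token is either a run of word characters or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split `text` into word and punctuation tokens, dropping whitespace."""
    return TOKEN_PATTERN.findall(text)
```

For example, `tokenize("Cars are vehicles.")` gives `["Cars", "are", "vehicles", "."]`.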

P
Precision

A measure of exactness or fidelity in information retrieval systems, i.e. the number of relevant documents retrieved by a search divided by the total number of documents retrieved. 

P
Predictive Analysis

A variety of statistical techniques from modeling, machine learning, data mining and game theory that analyze current and historical facts to make predictions about future events. In business, predictive models exploit patterns found in historical and transactional data to identify risks and opportunities. Models capture relationships among many factors to allow assessment of risk or potential associated with a particular set of conditions, guiding decision making for candidate transactions.

P
Preservation Metadata

Metadata related to the preservation management of information resources, e.g., metadata used to document preservation processes performed on information.

P
Provenance

The origin, or the source of something, or the history of the ownership or location. The term is used in information retrieval for describing the process of going back to the original source of a piece of data.

P
R
RDF

A data model for objects ("resources") and relations between them. RDF provides a method for representing the meaning of data which can be embedded in XML syntax.

R
RDF Schema

An extension of the RDF model that, like a taxonomy, provides preassembled constructs (classes, objects, etc.) aimed specifically at representing knowledge objects.

R
RDF Triples

Each unique fact can be represented as an RDF Triple and includes a subject, predicate and object.

R
Recall

The number of relevant documents retrieved by a search divided by the total number of existing relevant documents.
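
Recall and the Precision measure defined earlier in this list can both be computed from two sets of document identifiers. The sketch below is a minimal illustration; the function names and the choice to return 0.0 for an empty denominator are assumptions.

```python
def precision(retrieved, relevant):
    """Relevant documents retrieved divided by all documents retrieved."""
    if not retrieved:
        return 0.0  # assumed convention when nothing was retrieved
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved, relevant):
    """Relevant documents retrieved divided by all existing relevant documents."""
    if not relevant:
        return 0.0  # assumed convention when no relevant documents exist
    return len(retrieved & relevant) / len(relevant)
```

If a search retrieves documents {1, 2, 3, 4} and the relevant documents are {2, 4, 5, 6, 7}, two relevant documents were found: precision is 2/4 = 0.5 and recall is 2/5 = 0.4.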

R
Relationships

Ways in which classes, objects, individuals and entities can be related.

R
Resource

Anything that has an identity. Familiar examples include an electronic document, an image, a service (e.g., "today's weather report for Los Angeles"), and a collection of other resources. Not all resources are network "retrievable"; e.g., human beings, corporations, and bound books in a library can also be considered resources.

R
Restrictions

Formally stated descriptions of what must be true in order for an assertion to be accepted.

R
Rights Management

Metadata dealing with the intellectual property rights of an object.

R
Rules

Statements, usually in an if-then form, that describe logical inferences.
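
One way to picture if-then rules at work is a small forward-chaining loop that keeps applying rules until no new facts appear. This is a sketch under simple assumptions: facts are plain strings, and each rule is a hypothetical pair of (set of premises, conclusion).

```python
def forward_chain(facts, rules):
    """Apply if-then rules repeatedly and return every fact that follows.

    `rules` is a list of (premises, conclusion) pairs, where `premises`
    is a set of facts that must all be known for the rule to fire.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known
```

Given the rules "if is_car then is_vehicle" and "if is_vehicle then is_transport", starting from the single fact `is_car` yields all three facts.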

R
S
Schema or Schemes

In general terms, any organization, coding, outline or plan of concepts. In terms of metadata, a systematic, orderly combination of elements or terms. In terms of DCMI term declarations represented in XML or RDF schema language, schemas are machine-processable specifications which define the structure and syntax of metadata specifications in a formal schema language. In terms of an encoding scheme, a set of rules for encoding information that supports a specific community of users.

S
Semantic Enrichment

The process of adding metadata to content to provide additional context to the content and to describe the meaning of words, phrases and concepts presented in the content.

S
Semantic Model

A set of machine-interpretable representations used to model an area of knowledge or some part of the world, including software. Examples of such models are ontologies that embody some community agreement, logic-based representations, etc. Depending upon the framework or language used for modeling, different terminologies exist for denoting the building blocks of semantic models.

S
Semantic Search

A process to improve search accuracy by understanding searcher intent through natural language processing.

S
Semantic Web

A future-state environment that will automatically connect multiple items of content that have the same conceptual characteristics (semantics). Even when data is not physically tagged with specific metadata, the Semantic Web will provide the coding language that can resort to an ontology which describes all possible relationships in the universe.

S
Semantics

The study of meaning, typically focusing on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for. Linguistic semantics is the study of meaning that is used by humans to express themselves through language.

S
Sentiment

The attitude of a speaker or a writer with respect to some topic or the overall tonality of a document. Sentiment analysis refers to the application of natural language processing, computational linguistics, and text analytics to identify and extract subjective information in source materials.

S
Social Media

Web-based and mobile technologies used to turn communication into interactive dialogue. Social Media allows individuals to interact with organizations and companies on a one-to-one basis and share user-generated content quickly across social networks.

S
SPARQL

A query language purpose-built to retrieve RDF Triples. The language was developed by the World Wide Web Consortium (W3C).

S
Structural Metadata

Metadata that defines the digital object's internal organization and is needed for display and navigation of that object.

S
Subject Headings

An alphabetical list of words or phrases, each representing a concept, that is under authority control, e.g. the Library of Congress Subject Headings.

S
T
Taxonomy

The practice and science of classification. A taxonomic scheme is a particular classification ("the taxonomy of ..."), arranged in a hierarchical structure. Typically this is organized by supertype-subtype relationships, also called generalization-specialization relationships, or less formally, parent-child relationships. In such an inheritance relationship, the subtype by definition has the same properties, behaviors, and constraints as the supertype plus one or more additional properties, behaviors, or constraints. For example: car is a subtype of vehicle, so any car is also a vehicle, but not every vehicle is a car. Therefore a type needs to satisfy more constraints to be a car than to be a vehicle. Another example: any shirt is also a piece of clothing, but not every piece of clothing is a shirt. Hence, a type must satisfy more parameters to be a shirt than to be a piece of clothing.
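
The parent-child structure can be sketched as a mapping from each type to its supertype, using the car and shirt examples from the definition. The `PARENT` table and `is_subtype` helper are hypothetical, and the sketch assumes the hierarchy contains no cycles.

```python
# Hypothetical parent links: each type points at its supertype.
PARENT = {"car": "vehicle", "truck": "vehicle", "shirt": "clothing"}


def is_subtype(child, ancestor):
    """Return True if `child` is `ancestor` or descends from it."""
    current = child
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)  # None once the top of the tree is reached
    return False
```

Here `is_subtype("car", "vehicle")` is true, while `is_subtype("vehicle", "car")` is false: any car is a vehicle, but not every vehicle is a car.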

T
Technical Metadata

Metadata created for, or generated by, a computer system, relating to how the system or its content behaves or needs to be processed.

T
Temporal

Time-related characteristics of the intellectual content of the resource.

T
Thesaurus

A controlled vocabulary of terms or concepts that are structured hierarchically (parent/child relationships) or as equivalences (synonyms) and related terms (associative).

T
Triplestore

A purpose-built database for the storage and retrieval of Resource Description Framework (RDF) metadata. Much like a relational database, one stores information in a triplestore and retrieves it via a query language. Unlike a relational database, a triplestore is optimized for the storage and retrieval of many short statements called triples, in the form of subject-predicate-object.
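
The store-and-query pattern can be illustrated with a toy in-memory structure. This is only a sketch: the `TripleStore` class and its `match` method are hypothetical, and a real triplestore would be queried through a language such as SPARQL and would index its triples for speed.

```python
class TripleStore:
    """Toy in-memory store of (subject, predicate, object) statements."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        """Store one statement; duplicates are ignored."""
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return the stored triples that fit the pattern, sorted.

        A value of None acts as a wildcard for that position.
        """
        pattern = (subject, predicate, obj)
        return sorted(
            triple
            for triple in self._triples
            if all(p is None or p == v for p, v in zip(pattern, triple))
        )
```

After adding the statements ("car", "subtypeOf", "vehicle") and ("truck", "subtypeOf", "vehicle"), the call `match(predicate="subtypeOf", obj="vehicle")` returns both triples, which is roughly what a query with a variable in the subject position would do.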

T
U
URI

A string of characters used to identify a name or a resource on the Internet.

U
Use Case

A list of steps, typically defining interactions between a user and a system, to achieve a goal.

U
User Scenarios

A narrative, which most commonly describes foreseeable interactions of user roles and the technical system. This narrative describes one way that a system is envisaged to be used in the context of an activity in a defined time-frame. The time-frame for a scenario could be a single transaction; a business operation; a day or other period; or the whole operational life of a system. Similarly the scope of a scenario could be a single system or piece of equipment; an equipped team or department; or an entire organization. Scenarios are frequently used as part of the system development process.

U
W
Website Architecture

An approach to the design and planning of websites which, like architecture itself, involves technical, aesthetic and functional criteria. The focus of website architecture is on the user and on user requirements. This requires particular attention to usability, web content, navigation architecture, information architecture and web design.

W
Wireframe

A skeletal three-dimensional model in which only lines and vertices are represented. In web design, a wireframe is a visual guide that represents the skeletal framework of a website. The wireframe depicts the page layout or arrangement of the website’s content, including interface elements and navigational systems, and how they work together. The wireframe usually lacks typographic style, color, or graphics, since the main focus lies in the functionality, behavior, and priority of the content.

W
X
XML

Computer language often used to structure documents; it imposes no semantic constraints on the meaning of these documents.

X