Metadata (meta data, or sometimes metainformation) is "data about data", of any sort
in any media. Metadata is text, voice, or image that describes what
the audience wants or needs to see or experience. The audience
could be a person, group, or software program. Metadata is
important because it aids in clarifying and finding the actual
data. An item of metadata may describe an individual
datum, or content item, or a collection of data
including multiple content items and hierarchical levels, such as a
database schema. In data processing,
metadata provides information about, or documentation of, other
data managed within an application or environment. This commonly
defines the structure or
schema of
the primary data.
For example, metadata would document data about data elements or
attributes (name, size, data type, etc.) and data about records or
data structures (length, fields, columns, etc.) and data about data
(where it is located, how it is associated, ownership, etc.).
Metadata may include descriptive information about the context,
quality and condition, or characteristics of the data. It may be
recorded with high or low
granularity.
An example of metadata occurs within file systems. Associated with
every file on the storage medium is metadata that records the date
the file was created, the date it was last modified and the date
the file (or indeed the metadata itself) was last accessed.
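As a minimal sketch of how a program can read these timestamps, Python's standard library exposes them through os.stat. The helper name file_timestamps and the temporary demo file are assumptions made for illustration. Note that st_ctime is the creation time on Windows but the time of the last metadata change on Unix-like systems.

```python
import os
import tempfile
from datetime import datetime, timezone


def file_timestamps(path):
    """Return a few metadata fields the file system keeps for `path`."""
    info = os.stat(path)

    def to_utc(seconds):
        return datetime.fromtimestamp(seconds, tz=timezone.utc)

    return {
        "size_bytes": info.st_size,
        "modified": to_utc(info.st_mtime),
        "accessed": to_utc(info.st_atime),
        # Creation time on Windows, last metadata change on Unix-like systems.
        "changed_or_created": to_utc(info.st_ctime),
    }


# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"hello")
    demo_path = handle.name

demo_info = file_timestamps(demo_path)
print(demo_info)
os.remove(demo_path)
```

None of these values are stored in the file's contents; they come from the file system's own records about the file.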
Purpose
Metadata provides context for data.
Metadata is used to facilitate the understanding, usage, and
management of data, by both humans and computers. Thus metadata can
describe the data conceptually so that others can understand them;
it can describe the data syntactically so others can use them; and
the two types of descriptions together can facilitate decisions
about how to manage the data.
The metadata required to work effectively with data varies with the type
of data, their context of use, and their purpose. Often data
providers will provide users access to a variety of metadata
fields, which can be used individually or in combinations, and
applied by different users to achieve different goals. These users
can be human 'end users', or other computing systems.
See also the
Use section below for more details about the
use of metadata.
Hierarchies
When structured into a hierarchical arrangement, metadata is more
properly called an
ontology or
schema. Both terms describe "what exists" for some
purpose or to enable some action. For instance, the arrangement of
subject headings in a library catalog serves not only as a guide to
finding books on a particular subject in the stacks, but also as a
guide to what subjects "exist" in the library's own ontology and
how more specialized topics are related to or derived from the more
general subject headings.
Metadata is frequently stored in a central location and used to
help organizations standardize their data. This information is
typically stored in a
metadata
registry.
Examples
These examples list metadata that describe particular digital
entities. For clarity and consistency with some definitions of
metadata, these examples are expressed with respect to digitized
form of each entity, not data that is represented solely in a
physical object like a book. (For information resources that are
not in digital form, metadata is only the information that
describes the information content, not the information about the
physical representation.)
In most cases, the examples illustrate the use of metadata to
describe the entity's content (conceptually), how the entity came
to be (provenance), and information necessary for the system to use
it. The last set of information about system-related details is
typically hidden from the user, but includes the internal file
name, location, and creation/access times for the digital
entity.
Because the concept of metadata is specific to each situation—"one
person's data is another person's metadata"—examples should be
considered illustrative rather than absolute.
Video Recording
The television show or movie recorded on a
digital video recorder has extensive
metadata. These may include the title, director, actors, summary of
the contents, length of the recording, critical rating, and the
date and source of this recording. System use metadata includes the
file name and current status (viewing status, 'save until'
date).
Book
Examples of
metadata regarding a book
would be the title, author(s), date of publication, subject, a
unique identifier (such as
International Standard Book
Number ), number of pages, and the language of the text.
Metadata unique to the electronic format includes usage (last
opened, current page, times read) and other user-provided data
(ranking, tags, annotations). System use metadata might include
purchase and digital rights information for the content.
Image
Digital images include both digital
photographs, and images that have been created or modified on a
computer. Metadata for a digital photograph typically includes the
date and time at which it was created and details of the camera
settings (such as focal length, aperture, exposure). Many digital
cameras record metadata in their digital images, in formats like
exchangeable image file
format (EXIF) or
JPEG. Some cameras can
automatically include extended metadata such as the location the
picture was taken (e.g., from a GPS). Most image editing software
includes at least some metadata in the digital image, and can
include content about the image's provenance and licensing.
Audio
Audio recordings may also be labeled with metadata. When audio
formats moved from analog to digital, it became possible to embed
this metadata within the digital content itself. (Without any
metadata, the digital content is simply a file containing the audio
waveform.)
Metadata can be used to name, describe, catalog and indicate
ownership or copyright for a digital audio file, as well as allow
user characterizations of the audio content (ratings, tags, and
other auxiliary metadata). Its presence simplifies locating a
specific audio file within a group, through use of a search engine
that accesses the metadata. The typical audio player or audio
application on a computer relies heavily on metadata to provide
user features.
As different digital audio formats were developed, it was agreed
that a standardized and specific location would be set aside within
the digital files where this information could be stored. As a
result, almost all digital audio formats, including
mp3, broadcast wav and
AIFF files,
have similar standardized locations that can be populated with
metadata. This "information about information" has become one of
the great advantages of working with digital audio files, since the
catalog and descriptive information that makes up the metadata is
built right into the audio file itself, ready for easy access and
use.
Web page
The
HTML format used to define web pages allows
for the inclusion of a variety of types of metadata, from simple
descriptive text, dates and keywords to highly-granular information
such as the
Dublin Core and
e-GMS standards. Pages can be
geotagged with
coordinates. Metadata may be
included in the page's header or in a separate file.
Microformats allow metadata to be added to
on-page data in a way that users don't see, but computers can
readily access.
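A short sketch of how software might read header metadata from a page, using Python's standard html.parser module. The sample page, the class name MetaTagCollector, and the author name are invented for illustration; "DC.creator" follows the common convention for embedding Dublin Core elements in meta tags.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name] = content


# Hypothetical page: only the <body> text is shown to a human reader.
SAMPLE_PAGE = """<html><head>
<title>Example</title>
<meta name="description" content="A page about metadata">
<meta name="keywords" content="metadata, example">
<meta name="DC.creator" content="Jane Doe">
</head><body><p>Visible text.</p></body></html>"""

collector = MetaTagCollector()
collector.feed(SAMPLE_PAGE)
page_meta = collector.meta
print(page_meta)
```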
Levels
The hierarchy of metadata descriptions can go on forever, but
usually context or semantic understanding makes extensively
detailed explanations unnecessary.
The role played by any particular
datum
depends on the context. For example, when considering the geography
of London, "E8 3BJ" would be a datum and "Post Code" would be
metadatum. But, when considering the data management of an
automated system that manages geographical data, "Post Code" might
be a datum and then "data item name" and "6 characters, starting
with A–Z" would be metadata.
In any particular context, metadata characterizes the data it
describes, not the entity described by that data. So, in relation
to "E8 3BJ", the datum "is in London" is a further description of
the place in the real world which has the post code "E8 3BJ", not
of the code itself. Therefore, although it is providing information
connected to "E8 3BJ" (telling us that this is the post code of a
place in London), this would not normally be considered metadata,
as it is describing "E8 3BJ" as a place in the real world and not
as data.
Definitions
Etymology
Meta is a classical Greek preposition (μετ’
αλλων εταιρων) and prefix (μεταβασις) conveying the following
senses in English, depending upon the case of the associated noun:
among; along with; with; by means of; in the midst of; after;
behind. In
epistemology, the word means
"about (its own category)"; thus metadata is "data about the
data".
Varying definitions
The term was introduced intuitively, without a formal definition.
Because of that, today there are various definitions. The most
common one is the literal translation:
- "Data about data are referred to as metadata."
Example: "12345" is data, and with no additional context is
meaningless.
When "12345" is given a meaningful name (metadata) of "ZIP code",
one can understand (by further placing "ZIP code" within the context
of a postal address) that "12345" refers to the General Electric
plant in Schenectady, New York.
As for most people the difference between data and
information is merely a
philosophical one of no relevance in practical
use, other definitions are:
- Metadata is information about data.
- Metadata is information about information.
- Metadata contains information about that data or other
data
There are more sophisticated definitions, such as:
- "Metadata is structured, encoded data that describe
characteristics of information-bearing entities to aid in the
identification, discovery, assessment, and management of the
described entities."
- "[Metadata is a set of] optional structured descriptions that
are publicly available to explicitly assist in locating
objects."
These are used more rarely because they tend to concentrate on one
purpose of metadata — to find "objects", "entities" or "resources"
— and ignore others, such as using metadata to optimize
compression algorithms, or to perform
additional computations using the data.
The metadata concept has been extended into the world of systems to
include any "data about data": the names of tables, columns,
programs, and the like. Different views of this "system metadata"
are detailed below, but beyond that is the recognition that
metadata can describe all aspects of systems: data, activities,
people and organizations involved, locations of data and processes,
access methods, limitations, timing and events, as well as
motivation and rules.
Fundamentally, then, metadata is "the data that describe the
structure and workings of an organization's use of information, and
which describe the systems it uses to manage that information". To
do a model of metadata is to do an "
Enterprise model" of the information
technology industry itself.
Markup
In the context of the web and the work of the
W3C in providing markup technologies of
HTML,
XML and
SGML the concept of metadata has specific context that
is perhaps clearer than in other information domains. With markup
technologies there is metadata, markup and data content. The
metadata describes characteristics about the data, while the markup
identifies the specific type of data content and acts as a
container for that document instance. This page in Wikipedia is
itself an example of such usage, where the textual information is
data, how it is packaged, linked, referenced, styled and displayed
is markup and aspects and characteristics of that markup are
metadata set globally across Wikipedia.
In the context of markup the metadata is architected to allow
optimization of document instances to contain only a minimum amount
of metadata, while the metadata itself is likely referenced
externally such as in a
schema definition
(
XSD) instance. Also it should be noted that
markup provides specialised mechanisms that handle referential
data, again avoiding confusion over what is metadata or data, and
allowing optimizations. The reference and ID mechanisms in markup
allow reference links between related data items, and links to
data items that can then be repeated about a data item, such as an
address or product details. These are then all themselves simply
more data items and markup instances rather than metadata.
Similarly there are concepts such as classifications, ontologies
and associations for which markup mechanisms are provided. A data
item can then be linked to such categories via markup and hence
provide a clean delineation between what is metadata, and actual
data instances. Therefore the concepts and descriptions in a
classification would be metadata, but the actual classification
entry for a data item is simply another data instance.
Some examples can illustrate the points here. Items in bold are
data content, in italic are metadata, normal text items are all
markup.
The two examples show in-line use of metadata within markup
relating to a data instance (XML) compared to simple markup
(HTML).
A simple
HTML instance example:
<span style="normalText">Example</span>
And then an
XML instance example with
metadata:
nillable="true">John
Where the inline assertion that a person's middle name may be an
empty data item is metadata about the data item. Such definitions
however are usually not placed inline in XML. Instead these
definitions are moved away into the
schema
definition that contains the metadata for the entire document
instance. This again illustrates another important aspect of
metadata in the context of markup. The metadata is optimally
defined only once for a collection of data instances. Hence
repeated items of markup are rarely metadata, but rather more
markup data instances themselves.
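To make the inline case concrete, the following sketch parses a small XML fragment with Python's standard xml.etree.ElementTree module and separates data content from the inline assertion about it. The element names (person, firstName, middleName) are hypothetical and chosen only for illustration; in practice such a declaration would normally live in the schema definition, as described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document instance; element names are invented for this sketch.
DOCUMENT = """<person>
  <firstName>Jane</firstName>
  <middleName nillable="true">John</middleName>
</person>"""

root = ET.fromstring(DOCUMENT)
for element in root:
    # element.tag is markup, element.text is data content,
    # element.attrib holds the inline assertion about the data item.
    print(element.tag, repr(element.text), element.attrib)

middle = root.find("middleName")
first = root.find("firstName")
```

Only the middle name carries the inline attribute. Repeating it on every instance would be redundant, which is why a schema is the usual home for it.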
Difference between data and metadata
Usually it is not possible to distinguish between (plain) data and
metadata because:
- Something can be data and metadata at the same time. The
headline of an article is both its title (metadata) and part of its
text (data).
- Data and metadata can change their roles. A poem, as such,
would be regarded as data, but if there is a song that uses it as
lyrics, the whole poem could be attached to an audio file of the
song as metadata. Thus, the labeling depends on the point of
view.
These considerations apply no matter which of the above definitions
is considered, except where explicit markup is used to denote what
is data and what is metadata.
Use
Metadata has many different applications; this section lists some
of the most common.
Metadata is used to speed up and enrich searching for resources. In
general, search queries using metadata can save users from
performing more complex filter operations manually. It is now
common for web browsers (with the notable exception of Mozilla
Firefox), P2P applications and media management software to
automatically download and locally cache metadata, to improve the
speed at which files can be accessed and searched.
Metadata may also be associated with files manually. This is often
the case with documents which are scanned into a document storage
repository such as FileNet or Documentum. Once the documents have
been converted into an electronic format a user brings the image up
in a viewer application, manually reads the document and keys
values into an online application to be stored in a metadata
repository.
Metadata provides additional information to users of the data it
describes. This information may be descriptive ("These pictures
were taken by children in the school's third grade class.") or
algorithmic ("Checksum=139F").
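A minimal sketch of algorithmic metadata, assuming a SHA-256 digest as the checksum (the source does not say which algorithm produced "139F"). The function names and the sample bytes are invented for illustration.

```python
import hashlib


def describe(payload: bytes, note: str) -> dict:
    """Build a metadata record with a descriptive and an algorithmic field."""
    return {"note": note, "sha256": hashlib.sha256(payload).hexdigest()}


def is_intact(payload: bytes, metadata: dict) -> bool:
    """Recompute the checksum and compare it with the stored one."""
    return hashlib.sha256(payload).hexdigest() == metadata["sha256"]


photo_bytes = b"pretend these are image bytes"
record = describe(photo_bytes, "Taken by the school's third grade class")

print(is_intact(photo_bytes, record))            # unchanged data
print(is_intact(photo_bytes + b"!", record))     # altered data
```

The note helps a person interpret the data, while the checksum lets a program verify it without human involvement.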
Metadata helps to bridge the
semantic
gap. By telling a computer how data items are related and how
these relations can be evaluated automatically, it becomes possible
to process even more complex filter and search operations. For
example, if a search engine understands that "Van Gogh" was a
"Dutch painter", it can answer a search query on "Dutch painters"
with a link to a web page about Vincent Van Gogh, although the
exact words "Dutch painters" never occur on that page. This
approach, called knowledge representation, is of special interest
to the
semantic web and
artificial intelligence.
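The idea can be sketched with a tiny set of subject-predicate-object facts. This is an illustrative toy, not how any particular search engine works; the predicate names and the example.org URLs are placeholders.

```python
# Facts about entities, stored separately from the pages that mention them.
FACTS = {
    ("Vincent van Gogh", "nationality", "Dutch"),
    ("Vincent van Gogh", "occupation", "painter"),
    ("Rembrandt", "nationality", "Dutch"),
    ("Rembrandt", "occupation", "painter"),
    ("Claude Monet", "nationality", "French"),
    ("Claude Monet", "occupation", "painter"),
}

# Placeholder URLs for pages about each entity.
PAGES = {
    "Vincent van Gogh": "https://example.org/van-gogh",
    "Rembrandt": "https://example.org/rembrandt",
    "Claude Monet": "https://example.org/monet",
}


def subjects_with(predicate, value):
    """Return every subject for which the given fact is recorded."""
    return {s for s, p, o in FACTS if p == predicate and o == value}


def search(nationality, occupation):
    """Answer a query from the facts, not from the words on the pages."""
    names = subjects_with("nationality", nationality) & subjects_with("occupation", occupation)
    return [(name, PAGES[name]) for name in sorted(names)]


dutch_painters = search("Dutch", "painter")
print(dutch_painters)
```

The query matches pages whose text may never contain the phrase "Dutch painters", because the match is made on the recorded relations.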
Certain metadata is designed to optimize
lossy compression. For example, if a video
has metadata that allows a computer to tell foreground from
background, the latter can be compressed more aggressively to
achieve a higher compression rate.
Some metadata is intended to enable variable content presentation.
For example, if a picture has metadata that indicates the most
important region — the one where there is a person — an image
viewer on a small screen, such as a mobile phone's, can narrow
the picture to that region and thus show the user the most
interesting details. A similar kind of metadata is intended to
allow blind people to access diagrams and pictures, by converting
them for special output devices or reading their description using
text-to-speech software.
Other descriptive metadata can be used to automate workflows. For
example, if a "smart" software tool knows content and structure of
data, it can convert it automatically and pass it to another
"smart" tool as input. As a result, users save the many
copy-and-paste operations required when
analyzing data with "dumb" tools.
Metadata is becoming an increasingly important part of
electronic discovery.
Application and file system metadata
derived from
electronic
documents and files can be important evidence. Recent changes
to the
Federal Rules of
Civil Procedure make metadata routinely discoverable as part of
civil litigation. Parties to
litigation are required to maintain and produce metadata as part of
discovery, and
spoliation of metadata can lead to
sanctions.
Metadata has become important on the
World Wide Web because of the need to find
useful information from the mass of information available.
Manually-created metadata adds value because it ensures
consistency. If a web page about a certain topic contains a word or
phrase, then all web pages about that topic should contain that
same word or phrase. Metadata also ensures variety, so that if a
topic goes by two names each will be used. For example, an article
about "
sport utility vehicles"
would also be
tagged "4 wheel
drives", "4WDs" and "four wheel drives", as this is how SUVs are
known in some countries.
Examples of metadata for an
audio CD
include the
MusicBrainz project and
All Media Guide's
Allmusic. Similarly,
MP3 files
have metadata tags in a format called
ID3.
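As a rough sketch, the oldest and simplest variant, ID3v1, stores a fixed 128-byte block at the end of the file: the marker "TAG", then title, artist, and album (30 bytes each), year (4 bytes), comment (30 bytes), and a one-byte genre code. The function name read_id3v1 and the sample values are invented for illustration. Later ID3v2 tags use a different, more flexible layout and are not handled here.

```python
import struct

# "<" disables padding, so the layout is exactly 128 bytes.
ID3V1_LAYOUT = struct.Struct("<3s30s30s30s4s30sB")


def read_id3v1(data: bytes):
    """Return the ID3v1 fields found at the end of `data`, or None."""
    if len(data) < ID3V1_LAYOUT.size:
        return None
    marker, title, artist, album, year, comment, genre = ID3V1_LAYOUT.unpack(
        data[-ID3V1_LAYOUT.size:]
    )
    if marker != b"TAG":
        return None

    def text(raw):
        # Fields are padded with NUL bytes (or spaces) up to their fixed width.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "comment": text(comment),
        "genre_code": genre,
    }


# Fake file contents: arbitrary "audio" bytes followed by a tag block.
fake_tag = ID3V1_LAYOUT.pack(b"TAG", b"Song", b"Artist", b"Album", b"1999", b"", 12)
fake_file = b"\x00" * 256 + fake_tag

tags = read_id3v1(fake_file)
print(tags)
```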
Types
Metadata can be classified by:
- Content. Metadata can either describe the resource
itself (for example, name and size of a file) or the
content of the resource (for example, "This video shows a
boy playing football").
- Mutability. With respect to the whole resource, metadata can be
either immutable (for example, the "Title" of a video does
not change as the video itself is being played) or mutable
(the "Scene description" does change).
- Logical function. There are three layers of logical function:
at the bottom the subsymbolic layer that contains the raw
data itself, then the symbolic layer with metadata
describing the raw data, and on the top the logical layer
containing metadata that allows logical reasoning using the
symbolic layer
Types of metadata are:
- descriptive metadata.
- administrative metadata.
- structural metadata.
- technical metadata.
- use metadata
To successfully develop and use metadata, several important issues
should be treated with care:
Risks
Microsoft Office files include
metadata beyond their printable content, such as the original
author's name, the creation date of the document, and the amount of
time spent editing it. Unintentional disclosure can be awkward or
even, in professional practices requiring confidentiality, raise
malpractice concerns. Some of a Microsoft Office document's metadata
can be seen by clicking
File then
Properties from
the program's menu. Other metadata is not visible except through
external analysis of a file, such as is done in forensics. The
author of the Microsoft Word-based
Melissa computer virus in 1999 was
caught due to Word metadata that uniquely identified the computer
used to create the original infected document.
Lifecycle
Even in the early phases of planning and designing it is necessary
to keep track of all metadata created. It is not economical to
start attaching metadata only after the production process has been
completed. For example, if metadata created by a digital camera at
recording time is not stored immediately, it may have to be
restored afterwards manually with great effort. Therefore, it is
necessary for different groups of resource producers to cooperate
using compatible methods and standards.
- Manipulation. Metadata must adapt if the resource it describes
changes. It should be merged when two resources are merged. These
operations are seldom performed by today's software; for example,
image editing programs usually do not keep track of the Exif metadata created by
digital cameras.
- Destruction. It can be useful to keep metadata even after the
resource it describes has been destroyed, for example in change
histories within a text document or to archive file deletions due
to digital rights management. None of today's metadata standards
consider this phase.
Storage
Metadata can be stored either
internally, in the same file
as the data, or
externally, in a separate file. Metadata
that is embedded with content is called
embedded metadata.
A data repository typically stores the metadata
detached
from the data. Both ways have advantages and disadvantages:
- Internal storage allows transferring metadata together with the
data it describes; thus, metadata is always at hand and can be
manipulated easily. This method creates high redundancy and does
not allow holding metadata together.
- External storage allows bundling metadata, for example in a
database, for more efficient searching. There is no redundancy and
metadata can be transferred simultaneously when using streaming. However, as most formats use
URIs for that purpose,
the method of how the metadata is linked to its data should be
treated with care. What if a resource does not have a URI
(resources on a local hard disk or web pages that are created
on-the-fly using a content management system)? What if metadata can
only be evaluated if there is a connection to the Web, especially
when using RDF? How
can one detect that a resource has been replaced by another with the same
name but different content?
Moreover, there is the question of data format: storing metadata in
a human-readable format such as XML can be useful because users can
understand and edit it without specialized tools. On the other
hand, these formats are not optimized for storage capacity; it may
be useful to store metadata in a binary, non-human-readable format
instead to speed up transfer and save memory.
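The trade-off can be illustrated with a small sketch that encodes the same record both ways. The field names and the fixed binary layout (two 16-bit integers and one 32-bit integer) are assumptions chosen for this example.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical image metadata record.
record = {"width": 1920, "height": 1080, "created": 1700000000}

# Human-readable form: self-describing, editable in any text editor.
element = ET.Element("image")
for key, value in record.items():
    ET.SubElement(element, key).text = str(value)
as_xml = ET.tostring(element)

# Binary form: compact, but meaningless without knowing the layout "<HHI".
as_binary = struct.pack("<HHI", record["width"], record["height"], record["created"])

print(as_xml)
print(len(as_xml), "bytes as XML,", len(as_binary), "bytes as binary")

# Reading the binary form back requires the same layout string.
decoded = struct.unpack("<HHI", as_binary)
```

The binary form is much smaller, but it depends on both sides agreeing on the layout in advance, which is itself a piece of metadata.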
Types
In general, there are two distinct classes of metadata: structural
or control metadata and guide metadata. Structural metadata is used
to describe the structure of computer systems such as tables,
columns and indexes. Guide metadata is used to help humans find
specific items and is usually expressed as a set of keywords in a
natural language.
Metadata can be divided into 3 distinct categories:
- Administrative
- Descriptive
- Structural
Information Technology and Software Engineering metadata
General IT metadata
In contrast, David Marco, another metadata theorist, defines
metadata as "all physical data and knowledge from inside and
outside an organization, including information about the physical
data, technical and business processes, rules and constraints of
the data, and structures of the data used by a corporation." Others
have included web services, systems and interfaces. In fact, the
entire
Zachman Framework (see
Enterprise Architecture) can
be represented as metadata.
Notice that such definitions expand metadata's scope considerably,
to encompass most or all of the data required by the
Management Information Systems
capability. In this sense, the concept of metadata has significant
overlaps with the
ITIL concept of a
Configuration Management Database (
CMDB), and
also with disciplines such as
Enterprise Architecture and
IT portfolio management.
This broader definition of metadata has precedent. Third generation
corporate repository products (such as those eventually merged into
the CA Advantage line) not only store information about data
definitions (COBOL copybooks, DBMS schema), but also about the
programs accessing those data structures, and the
Job Control Language and batch job
infrastructure dependencies as well. These products (some of which
are still in production) can provide a very complete picture of a
mainframe computing environment, supporting exactly the kinds of
impact analysis required for ITIL-based processes such as
Incident and
Change Management. The
ITIL Back Catalogue includes the
Data Management
volume which recognizes the role of these metadata products on the
mainframe, posing the
CMDB as the distributed
computing equivalent. CMDB vendors however have generally not
expanded their scope to include data definitions, and metadata
solutions are also available in the distributed world. Determining
the appropriate role and scope for each is thus a challenge for
large IT organizations requiring the services of both.
Since metadata is pervasive, centralized attempts at tracking it
need to focus on the most highly leveraged assets. Enterprise
Assets may only constitute a small percentage of the entire IT
portfolio.
Some practitioners have successfully managed IT metadata using the
Dublin Core metamodel.
IT metadata management products
First generation data dictionary/metadata repository tools would be
those only supporting a specific
DBMS, such as
IDMS's IDD (integrated data dictionary), the
IMS Data Dictionary,
and
ADABAS's Predict.
Second generation would be ASG's DATAMANAGER product which could
support many different file and DBMS types.
Third generation repository products became briefly popular in the
early 1990s along with the rise of widespread use of
RDBMS engines such as IBM's
DB2.
Fourth generation products link the repository with more
Extract, transform, load tools and
can be connected with architectural modeling tools.
Fifth generation products are taking things to a
new level by integrating distributed computing, specialized
hardware, extreme visualization, and analytics, in a sense that now
allows vertical uses of metadata in all sorts of things such as
applications, messaging buses etc.
Relational database metadata
Each
relational database system
has its own mechanisms for storing metadata. Examples of
relational-database metadata include:
- Tables of all tables in a database, their names, sizes and
number of rows in each table.
- Tables of columns in each database, what tables they are used
in, and the type of data stored in each column.
In database terminology, this set of metadata is referred to as the
catalog. The
SQL
standard specifies a uniform means to access the catalog, called
the
INFORMATION_SCHEMA, but not all databases
implement it, even if they implement other aspects of the SQL
standard. For an example of database-specific metadata access
methods, see
Oracle metadata.
Programmatic access to metadata is possible using APIs such as
JDBC, or SchemaCrawler.
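As one example of a database-specific mechanism, SQLite (bundled with Python as the sqlite3 module) does not provide INFORMATION_SCHEMA; its catalog is exposed through the sqlite_master table and the table_info pragma. The table and column names below are invented for the sketch.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL, pages INTEGER)"
)
connection.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")

# Table of all tables: names come from the catalog, not from user data.
table_names = [
    row[0]
    for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# Columns of one table: each row is (cid, name, type, notnull, dflt_value, pk).
book_columns = [
    (row[1], row[2]) for row in connection.execute("PRAGMA table_info(book)")
]

print(table_names)
print(book_columns)
connection.close()
```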
Data warehouse metadata
A data warehouse (DW) is a repository
of an organization's electronically stored data. Data warehouses
are designed to manage and store the data, whereas
business intelligence (BI) focuses on
the usage of data to facilitate reporting and analysis.
The purpose of a data warehouse is to house standardized,
structured, consistent, integrated, correct, cleansed and timely
data, extracted from various operational systems in an
organization. The extracted data is integrated in the
data warehouse environment in order to
provide an enterprise-wide perspective, one version of the
truth. Data is structured in a way to specifically address the
reporting and analytic requirements.
An essential component of a
data
warehouse/
business
intelligence system is the metadata and tools to manage and
retrieve metadata.
Ralph Kimball
describes metadata as the DNA of the data warehouse as metadata
defines the elements of the
data
warehouse and how they work together.
As data warehouse tools and the users of business intelligence
products mature, the importance of metadata will increase. Metadata
will in the future also be used for creating simple
ETL functionality for storing data as well as for
creating functionality to display business information to
end-users. An increase in users developing even
ETL functionality is most likely in the future.
As the volume of data and information increases, metadata will
probably become very important for finding relevant existing
information to be used in a specific decision process. Simple
search tools like Google might be developed.
The enterprise perspective of the
data
warehouse also applies to metadata. The ideal is to have
one single enterprise metadata repository used by all processes of
the DW/BI system. An ongoing process in the data warehouse
lifecycle is to keep the metadata repository updated reflecting
changes in business and data.
Often metadata is split into two categories: internal metadata, which
is relevant for system administrators, and external metadata, which
is relevant for end-users.
According to
Ralph Kimball metadata
can be divided into 2 similar categories - Technical metadata and
Business metadata. Technical metadata correspond to internal
metadata, business metadata to external metadata. Kimball adds a
third category named Process metadata. This applies for the back
room of the
data warehouse (the
ETL –
extract,
transform and
load - part of
the DW/BI system), the front room (reports, and the BI
applications) and for the presentation server connecting the front
and the back room. Data is stored on the presentation server in a
dimensional structure together with an aggregate navigator.
Technical metadata is mostly definitional, while process and
business metadata are mainly descriptive.
Examples for each category of metadata:
- Technical metadata for ETL
- Source descriptions of all data sources, including record
layouts and column definitions.
- Business metadata for ETL
- Data quality screen specifications including the code for data
quality tests, severity score of the potential error, and action to
be taken when error occurs.
- Process metadata for ETL
- ETL operations statistics including start times, end times, CPU
seconds used, disk reads, disk writes, and row counts
- Technical metadata for presentation server
- Database system tables containing standard RDBMS table, column, view, index, and security
information
- Business metadata for presentation server
- Business metadata regarding the presentation server is provided
by the BI application’s semantic layer, the OLAP definitions, or
the database system table and column directly.
- Process metadata for presentation server
- Database monitoring system tables containing information about
the use of tables throughout the presentation server.
- Technical metadata for BI
- BI semantic layer definition including business names for all
tables and columns mapped to appropriate presentation server
objects, join paths, computed columns, and business groupings. May
also include aggregate navigation and drill across
functionality
- Business metadata for BI
- The BI semantic layer definition already contains robust
business-oriented metadata. Additional BI business metadata
includes the following:
- Conformed attribute and fact definitions and business rules
including slowly changing dimensions policies, null handling, and
error handling for each column.
- Process metadata for BI
- Report and query execution statistics including user, column,
table and application usage tracking, run times, and result set row
count
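As an illustration of the process metadata listed above, the
following minimal sketch records start time, end time, CPU seconds,
and row counts for a single ETL step. The function name
run_with_process_metadata, the dictionary keys, and the sample rows
are hypothetical and chosen only for this example; a real ETL tool
would write these statistics to its own audit tables.

```python
import time
from datetime import datetime, timezone


def run_with_process_metadata(step_name, step, rows):
    """Run `step` on every row and return (result, process metadata).

    The metadata dictionary is a hypothetical layout, not a standard one.
    """
    started = datetime.now(timezone.utc)
    cpu_before = time.process_time()
    result = [step(row) for row in rows]
    cpu_after = time.process_time()
    finished = datetime.now(timezone.utc)
    metadata = {
        "step": step_name,
        "start_time": started.isoformat(),
        "end_time": finished.isoformat(),
        "cpu_seconds": cpu_after - cpu_before,
        "rows_in": len(rows),
        "rows_out": len(result),
    }
    return result, metadata


# Example step: cast a text column to an integer.
source_rows = [{"amount": "10"}, {"amount": "25"}]
loaded, stats = run_with_process_metadata(
    "cast_amount",
    lambda row: {"amount": int(row["amount"])},
    source_rows,
)
```

The data itself (loaded) and the data about the run (stats) are kept
separate, which mirrors the distinction between primary data and
process metadata.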
Business Intelligence metadata
Described in connection with metadata for the data warehouse.
File system metadata
Nearly all file systems keep metadata about files out-of-band. Some
systems keep metadata in directory entries; others keep it in
specialized structures such as inodes, or even in the name of a file.
Metadata can range from simple timestamps, mode bits, and other
special-purpose information used by the implementation itself, to
icons and free-text comments, to arbitrary attribute-value pairs.
With more complex and open-ended metadata, it becomes useful to
search for files based on the metadata contents. The Unix find
utility was an early example, although it is inefficient when
scanning hundreds of thousands of files on a modern computer system,
because it walks the directory tree on every search.
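The kind of search described above can be sketched with Python's
standard library. This is a minimal illustration, not how find itself
is implemented: the helper names describe and find_by_metadata, and
the size and modification-time filters, are assumptions made for the
example.

```python
import os
import stat
import tempfile


def describe(path):
    """Return a few metadata fields the file system keeps for `path`."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "mode": stat.filemode(info.st_mode),  # mode bits as text
        "modified": info.st_mtime,            # seconds since the epoch
        "accessed": info.st_atime,
    }


def find_by_metadata(root, min_size=0, modified_after=0.0):
    """Walk `root` and yield files matching the size and mtime filters.

    Every call visits every file, which is why this approach is slow
    on large trees compared with an indexed search.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            if info.st_size >= min_size and info.st_mtime >= modified_after:
                yield path


# Demonstration on a throwaway directory with one small and one large file.
with tempfile.TemporaryDirectory() as root:
    small = os.path.join(root, "small.txt")
    large = os.path.join(root, "large.txt")
    with open(small, "w") as handle:
        handle.write("x")
    with open(large, "w") as handle:
        handle.write("x" * 4096)
    matches = sorted(
        os.path.basename(p) for p in find_by_metadata(root, min_size=1024)
    )
    large_info = describe(large)
```

Note that the search never opens the files: it reads only the
metadata that the file system stores alongside them.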
Apple Computer's Mac OS X operating system has supported cataloguing
and searching for file metadata through a feature known as Spotlight
since version 10.4. Microsoft developed similar functionality with
the Instant Search system in Windows Vista, and it is also present in
SharePoint Server.
Linux implements file metadata using extended file attributes.
Program metadata
Metadata is casually used to describe the controlling data used in
software architectures that are more abstract or configurable. Most
executable file formats include what may be termed "metadata" that
specifies certain, usually configurable, behavioral runtime
characteristics. However, it is difficult if not impossible to
precisely distinguish program "metadata" from general aspects of
stored-program computing architecture; if the machine reads it and
acts upon it, it is a computational instruction, and the prefix
"meta" has little significance.
In Java, the class file format contains metadata used by the Java
compiler and the Java virtual machine to dynamically link classes and
to support reflection. Since J2SE 5.0, the Java Platform, Standard
Edition has included a metadata facility to allow additional
annotations that are used by development tools.
In MS-DOS, the COM file format does not include metadata, while the
EXE file and Windows PE formats do. These metadata can include the
company that published the program, the date the program was created,
the version number and more.
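To make the difference concrete, the sketch below reads the creation
timestamp from the header of a PE image, assuming the standard layout
(an "MZ" DOS header whose field at offset 0x3C points to a "PE\0\0"
signature followed by the COFF header). The function name
read_pe_header and the sample bytes are made up for the example.
Publisher and version strings live in a separate resource section and
are not parsed here.

```python
import struct
from datetime import datetime, timezone


def read_pe_header(data):
    """Return basic COFF header fields from the bytes of a PE image.

    Returns None when the bytes carry no PE header, as with a raw COM file.
    """
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        return None
    if len(data) < pe_offset + 12:
        return None
    # COFF header: Machine (2 bytes), NumberOfSections (2), TimeDateStamp (4)
    machine, sections, timestamp = struct.unpack_from(
        "<HHI", data, pe_offset + 4
    )
    return {
        "machine": machine,
        "sections": sections,
        "created": datetime.fromtimestamp(timestamp, tz=timezone.utc),
    }


# Synthetic stand-in for a real executable: DOS header, then PE signature.
sample = (
    b"MZ" + bytes(0x3A) + struct.pack("<I", 0x40)
    + b"PE\0\0" + struct.pack("<HHI", 0x8664, 3, 1_000_000_000)
)
header = read_pe_header(sample)
com_like = read_pe_header(b"\xc3")  # a COM file is just machine code
```

A COM file yields None because it has no header at all: the loader
copies its bytes into memory and starts executing at the first one.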
In the Microsoft .NET executable format, extra metadata is included
to allow reflection at runtime.
Existing software metadata
The Object Management Group (OMG) has defined a metadata format for
representing entire existing applications for the purposes of
software mining, software modernization and software assurance. This
specification, called the OMG Knowledge Discovery Metamodel (KDM), is
the OMG's foundation for "modeling in reverse". KDM is a common
language-independent intermediate representation that provides an
integrated view of an entire enterprise application, including its
behavior (program flow), data, and structure. One of the
applications of KDM is business rules mining. The Knowledge
Discovery Metamodel includes a fine-grained, low-level representation
(called "micro KDM") suitable for performing static analysis of
programs.
Document metadata
Most programs that create documents, including Microsoft SharePoint,
Microsoft Word and other Microsoft Office products, save metadata
with the document files. These metadata can contain the name of the
person who created the file (obtained from the operating system),
the name of the person who last edited the file, how many times the
file has been printed, and even how many revisions have been made to
the file. Other saved material, such as deleted text (saved in case
of an undelete command), document comments and the like, is also
commonly referred to as "metadata", and the inadvertent inclusion of
this material in distributed files has sometimes led to undesirable
disclosures.
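As a rough illustration of how easily such fields can be read, the
sketch below assumes the newer .docx packaging used by Microsoft
Word, in which the file is a ZIP archive and the author, last editor
and revision count are stored in the part docProps/core.xml. The
function name read_core_properties and the sample values are
hypothetical; the sample archive is built in memory as a stand-in for
a real document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(docx_file):
    """Return author-related fields from a .docx path or file object."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))

    def text(tag):
        node = root.find(tag, NS)
        return node.text if node is not None else None

    return {
        "creator": text("dc:creator"),
        "last_modified_by": text("cp:lastModifiedBy"),
        "revision": text("cp:revision"),
    }


# Build a minimal stand-in archive containing only the metadata part.
SAMPLE_CORE_XML = (
    '<cp:coreProperties xmlns:cp="' + NS["cp"] + '" '
    'xmlns:dc="' + NS["dc"] + '">'
    "<dc:creator>Alice Example</dc:creator>"
    "<cp:lastModifiedBy>Bob Example</cp:lastModifiedBy>"
    "<cp:revision>7</cp:revision>"
    "</cp:coreProperties>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docProps/core.xml", SAMPLE_CORE_XML)
buffer.seek(0)
properties = read_core_properties(buffer)
```

Anyone who receives the file can run the same few lines, which is why
these fields are a common source of unintended disclosure.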
Document metadata is particularly important in legal environments,
where litigation can request this sensitive information, which can
include many private and potentially damaging details. Such data has
been linked to multiple lawsuits that have drawn corporations into
legal complications.
Many law firms today use metadata removal tools, which clean
documents before they are sent outside the firm. This process
partially protects law firms from potentially unsafe leaks of
sensitive data through electronic discovery. Removal of metadata is
only one aspect of redaction, a technique that is effective only when
performed thoroughly and completely.
For a list of executable formats, see object file.
Digital library metadata
There are three categories of metadata that are frequently used to
describe objects in a digital library:
- descriptive - Information describing the intellectual content of
the object, such as MARC cataloguing records, finding aids or similar
schemes. It is typically used for bibliographic purposes and for
search and retrieval.
- structural - Information that ties each object to others to make up
logical units (e.g., information that relates individual images of
pages from a book to the others that make up the book).
- administrative - Information used to manage the object or control
access to it. This may include information on how it was scanned, its
storage format, copyright and licensing information, and information
necessary for the long-term preservation of the digital objects.
Standards for metadata in digital libraries include Dublin Core,
METS, the PREMIS schema, and OAI-PMH.
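A small sketch of what a descriptive record might look like, using
element names from the Dublin Core element set (title, creator,
format, rights). The wrapping element name "record", the helper
dublin_core_record, and the sample values are assumptions made for
the example; real repositories usually embed such elements in a
container defined by another standard.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build an XML element holding one Dublin Core element per field."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        element = ET.SubElement(record, "{" + DC_NS + "}" + name)
        element.text = value
    return record


record = dublin_core_record({
    "title": "Scanned field notebook",  # descriptive
    "creator": "Example, A.",           # descriptive
    "format": "image/tiff",             # administrative: storage format
    "rights": "Public domain",          # administrative: licensing
})
xml_text = ET.tostring(record, encoding="unicode")
```

Structural metadata, such as the order of page images within a book,
is not covered by these elements and would need a scheme such as METS.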
Image metadata
Examples of image file formats that carry metadata include
Exchangeable image file format (EXIF) and Tagged Image File Format
(TIFF). Metadata embedded in TIFF or EXIF files is one way of
acquiring additional data about an image.
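The following sketch shows how such embedded tags can be read,
assuming the baseline TIFF layout: a byte-order mark, the number 42,
and an offset to a directory of 12-byte entries. Only single SHORT
and LONG values are decoded, and the function name read_tiff_tags and
the sample bytes are invented for the example; production code would
normally rely on an imaging library instead.

```python
import struct

# A few well-known baseline TIFF tag numbers.
TAG_NAMES = {256: "ImageWidth", 257: "ImageLength", 274: "Orientation"}


def read_tiff_tags(data):
    """Decode single SHORT/LONG tags from the first directory of a TIFF."""
    byte_order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if byte_order is None:
        raise ValueError("missing TIFF byte-order mark")
    magic, ifd_offset = struct.unpack_from(byte_order + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a TIFF file")
    (entry_count,) = struct.unpack_from(byte_order + "H", data, ifd_offset)
    tags = {}
    for index in range(entry_count):
        entry = ifd_offset + 2 + 12 * index
        tag, field_type, count = struct.unpack_from(
            byte_order + "HHI", data, entry
        )
        if count != 1:
            continue  # multi-value fields are skipped in this sketch
        if field_type == 3:    # SHORT: 16-bit value stored inline
            (value,) = struct.unpack_from(byte_order + "H", data, entry + 8)
        elif field_type == 4:  # LONG: 32-bit value stored inline
            (value,) = struct.unpack_from(byte_order + "I", data, entry + 8)
        else:
            continue
        tags[TAG_NAMES.get(tag, tag)] = value
    return tags


# Synthetic little-endian TIFF header with three directory entries.
sample = (
    b"II" + struct.pack("<HI", 42, 8)
    + struct.pack("<H", 3)
    + struct.pack("<HHII", 256, 4, 1, 640)
    + struct.pack("<HHII", 257, 4, 1, 480)
    + struct.pack("<HHIHH", 274, 3, 1, 1, 0)
    + struct.pack("<I", 0)  # no further directories
)
tags = read_tiff_tags(sample)
```

The pixel data is never touched: width, height and orientation are
all recovered from the directory alone.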
Tagging pictures with subjects, related emotions, and other
descriptive phrases helps Internet users find pictures easily rather
than having to search through entire image collections. A prime
example of an image tagging service is Flickr, where users upload
images and then describe the contents. Other patrons of the site can
then search for those tags. Flickr uses a folksonomy: a free-text
keyword system in which the community defines the vocabulary through
use rather than through a controlled vocabulary.
Users can also tag photos for organizational purposes using, for
example, Adobe's Extensible Metadata Platform (XMP).
Digital photography is increasingly making use of technical metadata
tags describing the conditions of exposure. Photographers shooting
Camera RAW file formats can use applications such as Adobe Bridge or
Apple Computer's Aperture to work with camera metadata for
post-processing.
Geospatial metadata
Metadata that describe geographic objects (such as datasets, maps,
features, or simply documents with a geospatial component) have a
history going back to at least 1994 (see the MIT Library page on
FGDC Metadata). This class of metadata is described more fully on the
Geospatial metadata page.
Meta-metadata
Since metadata are also data, it is possible to have metadata of
metadata, or "meta-metadata". Machine-generated meta-metadata, such
as the inverted index created by a free-text search engine, is
generally not considered metadata, though.
Metadata and the Law
United States
Problems involving metadata in litigation in the United States are
becoming widespread. Courts have looked at various questions
involving metadata, including the discoverability of metadata by
parties. Although the Federal Rules of Civil Procedure have only
specified rules about electronic documents, subsequent case law has
elaborated on the requirement of parties to reveal metadata.
References
- Hoberman, Steve, Data Modeling Made Simple, 2nd
Edition, Technics Publications, LLC, 2009, page 313
- Liddell & Scott, An Intermediate Greek-English
Lexicon, OUP, pp. 500ff.
- James Martin, Strategic Data Planning Methodologies,
Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1982, p.127
- American Library Association, Task Force on Metadata
Summary Report, June 1999
- D. C. A. Bulterman, Is It Time For a Moratorium on Metadata?,
IEEE MultiMedia, Oct-Dec 2004
- William R. Durrell, Data Administration: A Practical Guide
to Data Administration, McGraw-Hill, 1985
- Bretherton, F. P. and Singley, P. T. 1994, Metadata: A User's
View, Proceedings of the International Conference on Very Large
Data Bases (VLDB), 1091-1094
- David Marco, Building and Managing the Meta Data Repository: A
Full Lifecycle Guide, Wiley, 2000, ISBN 0-471-35523-2
- David C. Hay, Data Model Patterns: A Metadata Map, Morgan
Kaufmann, 2006, ISBN 0-12-088798-3
- R. Todd Stephens (2003). Utilizing Metadata as a Knowledge
Communication Tool. Proceedings of the International Professional
Communication Conference 2004. Minneapolis, MN: Institute of
Electrical and Electronics Engineers, Inc.
- Inmon, W.H. Tech Topic: What is a Data Warehouse? Prism
Solutions. Volume 1. 1995.
(http://en.wikipedia.org/wiki/Data_warehouse)
- Ralph Kimball, The Data Warehouse Lifecycle Toolkit, Second
Edition. New York, Wiley, 2008, ISBN 978-0-470-14977-5, pages 10,
115-117, 131-132, 140, 154-155
- Matteo Golfarelli and Stefano Rizzi, Data Warehouse Design:
Modern Principles and Methodologies, McGraw-Hill, 1st edition, ISBN
978-0-07-161039-1, page 25
- http://www.odl.ox.ac.uk/metadata.htm
- http://www.cs.cornell.edu/wya/DigLib/MS1999/Chapter4.html
External links