The
World Wide Web, abbreviated as
WWW and
W3 and commonly known as
The Web, is a system of interlinked
hypertext documents accessed via the
Internet. With a
web
browser, one can view
web pages that
may contain
text,
images,
videos, and other
multimedia and navigate between them
using
hyperlinks.
Using concepts from
earlier hypertext systems, English
physicist
Sir Tim Berners-Lee, now the
Director of the World Wide Web
Consortium, wrote a proposal in March 1989 for what would
eventually become the World Wide Web. He was later joined by
Belgian computer scientist Robert
Cailliau while both were working at CERN
in Geneva, Switzerland. In 1990, they proposed using "HyperText
[...] to link and access information of various kinds as a web of
nodes in which the user can browse at will", and released that web
in December.
"The World-Wide Web (W3) was developed to be a pool of human
knowledge, which would allow collaborators in remote sites to share
their ideas and all aspects of a common project." If two projects
are independently created, rather than have a central figure make
the changes, the two bodies of information could form into one
cohesive piece of work.
History
In March 1989,
Tim Berners-Lee wrote
a proposal that referenced
ENQUIRE, a
database and software project he had built in 1980, and described a
more elaborate information management system. With help from
Robert Cailliau, he published a more
formal proposal (on November 12, 1990) to build a "
Hypertext project" called "WorldWideWeb" (one
word, also "W3") as a "web" of "hypertext documents" to be viewed
by "
browsers", using a
client-server architecture. This proposal
estimated that a read-only web would be developed within three
months and that it would take six months to achieve "the creation
of new links and new material by readers, [so that] authorship
becomes universal" as well as "the automatic notification of a
reader when new material of interest to him/her has become
available". See
Web 2.0 and
RSS/
Atom, which have
taken a little longer to mature.
The proposal had been modeled after the
Dynatext SGML reader, by
Electronic Book Technology, a spin-off from the Institute for
Research in Information and Scholarship at Brown University. The
Dynatext system, licensed by CERN, was technically advanced and was
a key player in the extension of SGML ISO 8879:1986 to Hypermedia
within
HyTime, but it was considered too
expensive and had an inappropriate licensing policy for use in the
general high energy physics community, namely a fee for each
document and each document alteration.
A
NeXT Computer was used by
Berners-Lee as the world's first
web
server and also to write the first
web
browser,
WorldWideWeb, in 1990. By
Christmas 1990, Berners-Lee had built all the tools necessary for a
working Web: the
first web browser
(which was a web editor as well), the first web server, and the
first web pages which described the project itself. On August 6,
1991, he posted a short summary of the World Wide Web project on
the
alt.hypertext newsgroup. This
date also marked the debut of the Web as a publicly available
service on the Internet.
The first server outside Europe was set up at
SLAC
in December 1991. The crucial underlying
concept of
hypertext originated with older
projects from the 1960s, such as the Hypertext Editing System (HES)
at Brown University (by Ted Nelson and Andries van Dam, among
others), Ted Nelson's Project Xanadu, and Douglas Engelbart's
oN-Line System (NLS). Both Nelson and
Engelbart were in turn inspired by
Vannevar Bush's
microfilm-based "
memex,"
which was described in the 1945 essay "
As We May Think".
Berners-Lee's breakthrough was to marry hypertext to the Internet.
In his book
Weaving The Web, he explains that he had
repeatedly suggested that a marriage between the two technologies
was possible to members of
both technical communities, but
when no one took up his invitation, he finally tackled the project
himself. In the process, he developed three essential technologies:
a system of globally unique identifiers for resources on the Web
and elsewhere, the Universal Document Identifier (UDI), later known
as Uniform Resource Locator (URL) and Uniform Resource Identifier
(URI); the publishing language HyperText Markup Language (HTML);
and the Hypertext Transfer Protocol (HTTP).
The World Wide Web had a number of differences from other hypertext
systems that were then available. The Web required only
unidirectional links rather than bidirectional ones. This made it
possible for someone to link to another resource without action by
the owner of that resource. It also significantly reduced the
difficulty of implementing web servers and browsers (in comparison
to earlier systems), but in turn presented the chronic problem of
link rot. Unlike predecessors such as
HyperCard, the World Wide Web was
non-proprietary, making it possible to develop servers and clients
independently and to add extensions without licensing restrictions.
On April
30, 1993, CERN
announced
that the World Wide Web would be free to anyone, with no fees
due. Coming two months after the announcement that the
Gopher protocol was no longer free
to use, this produced a rapid shift away from Gopher and towards
the Web. An early popular web browser was
ViolaWWW, which was modeled after
HyperCard.
Scholars
generally agree that a turning point for the World Wide Web began
with the introduction of the Mosaic web browser in 1993, a graphical
browser developed by a team at the National Center
for Supercomputing Applications at the University of Illinois at
Urbana-Champaign
(NCSA-UIUC), led by Marc
Andreessen. Funding for Mosaic came from the U.S.
High-Performance Computing and Communications Initiative,
a funding program initiated by the
High
Performance Computing and Communication Act of 1991, one
of
several computing
developments initiated by U.S. Senator
Al
Gore. Prior to the release of Mosaic, graphics were not
commonly mixed with text in web pages, and the Web was less popular
than older protocols in use over the Internet, such as
Gopher and
Wide Area Information Servers
(WAIS). Mosaic's graphical user interface allowed the Web to
become, by far, the most popular Internet protocol.
The World
Wide Web Consortium (W3C) was founded by Tim Berners-Lee after he
left the European Organization for Nuclear Research (CERN) in
October 1994. It was founded at the Massachusetts Institute of
Technology Laboratory for Computer Science (MIT/LCS) with support
from the Defense Advanced Research Projects Agency (DARPA), which
had pioneered the Internet, and the European Commission. By the end
of 1994, while the total number of websites was still minute
compared to present standards, quite a number of notable websites
were already active, many of which are the precursors of or
inspiration for today's most popular services.
Connected by the existing Internet, other
websites were created around the world, adding
international standards for
domain names
and HTML. Since then, Berners-Lee has
played an active role in guiding the development of web standards
(such as the
markup languages in
which web pages are composed), and in recent years has advocated
his vision of a
Semantic Web. The World
Wide Web enabled the spread of information over the
Internet through an easy-to-use and flexible
format. It thus played an important role in popularizing use of the
Internet. Although the two terms are sometimes
conflated in popular use,
World Wide Web
is not
synonymous with
Internet.
The Web is an application built on top of the Internet.
Function
The terms Internet and World Wide Web are often used in everyday
speech without much distinction. However, the Internet and the
World Wide Web are not one and the same. The Internet is a global
system of interconnected
computer
networks. In contrast, the Web is one of the services that runs
on the Internet. It is a collection of interconnected documents and
other resources, linked by hyperlinks and URLs. In short, the Web
is an
application running on
the Internet. Viewing a
web page on the
World Wide Web normally begins either by typing the
URL of the page into a
web browser, or by following a
hyperlink to that page or resource. The web
browser then initiates a series of communication messages, behind
the scenes, in order to fetch and display it.
First, the server-name portion of the URL is resolved into an
IP address using the global, distributed
Internet database known as the
domain name system, or DNS. This IP
address is necessary to contact the
Web
server. The browser then requests the resource by sending an
HTTP request to the Web
server at that particular address. In the case of a typical web
page, the
HTML text of the page is requested
first and
parsed immediately by the web
browser, which then makes additional requests for images and any
other files that form parts of the page. Statistics measuring a
website's popularity are usually based either on the number of
'
page views' or associated server
'
hits' (file requests) that take
place.
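The request sequence described above can be sketched in Python. This is an illustrative model only, not how any particular browser is implemented: the function and class names (build_get_request, ResourceCollector, follow_up_requests) are hypothetical, nothing is sent over the network, and the DNS step is only indicated in a comment. The sketch builds the text of an HTTP GET request for a URL, then parses the returned HTML to find the images, scripts and stylesheets for which additional requests would be made.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


def build_get_request(url):
    """Return the text of a minimal HTTP/1.1 GET request for `url`."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # parts.hostname is the server name that DNS would resolve to an IP
    # address (for example with socket.getaddrinfo) before connecting.
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.hostname}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )


class ResourceCollector(HTMLParser):
    """Collect URLs of images, scripts and stylesheets used by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.resources.append(urljoin(self.base_url, attrs["href"]))


def follow_up_requests(page_url, html_text):
    """Return the extra GET requests implied by the page's HTML."""
    collector = ResourceCollector(page_url)
    collector.feed(html_text)
    return [build_get_request(u) for u in collector.resources]
```

Each entry returned by follow_up_requests corresponds to one server "hit" in the sense used above, while the original page request corresponds to a single page view.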
While receiving these files from the web server, browsers may
progressively
render the page onto the
screen as specified by its HTML,
CSS, and other web languages. Any
images and other resources are incorporated to produce the
on-screen web page that the user sees. Most web pages will
themselves contain
hyperlinks to other
related pages and perhaps to downloads, source documents,
definitions and other web resources. Such a collection of useful,
related resources, interconnected via hypertext links, is what was
dubbed a "web" of information. Making it available on the Internet
created what
Tim Berners-Lee first
called the
WorldWideWeb (in its original
CamelCase, which was subsequently discarded) in
November 1990.
What Does W3 Define?
W3, or WWW, stands for several different things. The main ones
are:
- the idea of a boundless information world in which all items
have a reference by which they can be retrieved;
- the address system (URI) which the project implemented to make
this world possible, despite many different protocols;
- a network protocol (HTTP) used by native W3 servers giving
performance and features not otherwise available;
- a markup language (HTML) which every W3 client is required to
understand, and is used for the transmission of basic things such
as text, menus and simple on-line help information across the
net;
- the body of data available on the Internet using all or some of
the preceding listed items.
Linking
Over time, many web resources pointed to by hyperlinks disappear,
relocate, or are replaced with different content. This phenomenon
is referred to in some circles as "
link
rot" and the hyperlinks affected by it are often called
"
dead links". The ephemeral nature of the
Web has prompted many efforts to archive web sites.
The Internet
Archive
is one of the best-known efforts; it has been
active since 1996.
Ajax updates
JavaScript is a
scripting language that was
initially developed in 1995 by
Brendan
Eich, then of
Netscape, for use within
web pages. The standardized version is
ECMAScript. To overcome some of the limitations
of the page-by-page model described above, some web applications
also use
Ajax (
asynchronous JavaScript and
XML). JavaScript delivered with the page can
make additional HTTP requests to the server, either in response to
user actions such as mouse clicks or after a set time has elapsed. The
server's responses are used to modify the current page rather than
creating a new page with each response. Thus the server only needs
to provide limited, incremental information. Since multiple Ajax
requests can be handled at the same time, users can interact with a
page even while data is being retrieved. Some web applications
regularly
poll the server
to ask if new information is available.
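The incremental-update idea can be modeled outside the browser. The following Python sketch is an assumption-laden illustration, not Ajax itself (which runs as JavaScript in the page): FeedServer and PageState are hypothetical names, and an in-memory object stands in for the HTTP exchange. It shows a client polling for changes and receiving only the items it has not yet seen, which are then merged into the page already on display.

```python
class FeedServer:
    """In-memory stand-in for a web server holding a list of messages."""

    def __init__(self):
        self.messages = []

    def post(self, text):
        self.messages.append(text)

    def updates_since(self, last_seen):
        # Return only the items the client has not seen yet, plus the
        # new position, instead of re-sending the whole page.
        return self.messages[last_seen:], len(self.messages)


class PageState:
    """Stand-in for the page currently displayed in the browser."""

    def __init__(self):
        self.items = []
        self.last_seen = 0

    def poll(self, server):
        new_items, self.last_seen = server.updates_since(self.last_seen)
        self.items.extend(new_items)  # modify the current page in place
        return len(new_items)
```

Calling poll repeatedly on a timer corresponds to the regular polling described above; a poll that returns zero new items leaves the page untouched.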
WWW prefix
Many
web addresses begin with
www, because of the long-standing practice of naming
Internet hosts (servers) according to the services they provide.
Thus the host name of a web server is often www, just as it is
ftp for an FTP server and news or nntp for a USENET news server.
These host names then appear as
DNS subdomain
names, as in "www.example.com". The use of such subdomain names is
not required by any technical or policy standard; indeed, the first
ever web server was called "nxoc01.cern.ch", and many web sites
exist without a
www subdomain prefix, or with some other
prefix such as "www2", "secure" etc. These subdomain prefixes have
no technical significance; they are simply chosen names. Many web
servers are set up so that both the domain by itself (e.g.,
example.com) and the www subdomain (e.g., www.example.com) refer to
the same site; others require one form or the other, or map the two
names to different web sites.
When a single word is typed into the address bar and the
return key is pressed, some web browsers
automatically try adding "www." to the beginning of it and possibly
".com", ".org" and ".net" at the end. For example, typing 'apple'
may resolve to http://www.apple.com/ and 'openoffice' to
http://www.openoffice.org. This feature began to be included in
early versions of Mozilla Firefox (when it still had the working
title 'Firebird') in early 2003 ("automatically adding
www.___.com", mozillaZine, May 16, 2003,
http://forums.mozillazine.org/viewtopic.php?f=9&t=10980). It is
reported that Microsoft was granted a US patent for the same idea
in 2008, but only with regard to mobile devices (Mike Masnick,
"Microsoft Patents Adding 'www.' And '.com' To Text", Techdirt,
July 7, 2008,
http://www.techdirt.com/articles/20080626/0203581527.shtml). The
'http://' or 'https://' part of web
addresses does
have meaning: These refer to Hypertext Transfer Protocol and
to HTTP Secure and so define the
communication protocol that will be used to request and receive the
page and all its images and other resources. The HTTP
network protocol is fundamental to the way the World Wide Web
works, and the encryption involved in HTTPS adds an essential layer
if confidential information such as passwords or bank details are
to be exchanged over the public internet. Web browsers
often prepend this 'scheme' part to URLs too, if it is
omitted. Despite this, Berners-Lee himself has admitted
that the two 'forward slashes' (//) were in fact initially
unnecessary. In overview, RFC 2396 defined web URLs to have the
following form: scheme://host/path?query#fragment. Here 'host' is,
for example, the web server (like www.example.com), and 'path'
identifies the web page. The web server processes the 'query'
(which can be data sent via a form, e.g. terms sent to a search
engine), and the returned page depends on it. Finally, the
'fragment' is not sent to the web server. It identifies the portion
of the page which the browser shows first.
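The parts of a web URL described above can be separated with the urlsplit and parse_qs functions of Python's standard urllib.parse module. The example URL below is made up for illustration, and the request_target variable is a hypothetical name for what the browser would actually send to the server, which excludes the fragment.

```python
from urllib.parse import parse_qs, urlsplit

url = "http://www.example.com/search/results?q=hypertext&lang=en#top"
parts = urlsplit(url)

scheme = parts.scheme            # protocol used to fetch the page
host = parts.netloc              # the web server
path = parts.path                # identifies the page on that server
query = parse_qs(parts.query)    # data processed by the server
fragment = parts.fragment        # kept by the browser, never sent

# The server sees only the path and the query; the fragment stays local.
request_target = parts.path + ("?" + parts.query if parts.query else "")
```

Here the browser would request /search/results?q=hypertext&lang=en from www.example.com and, once the page arrived, scroll to the element identified by "top".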
In
English,
www is pronounced by saying the name of each letter individually
(double-u double-u double-u). Although some technical users
pronounce it dub-dub-dub, this is not widespread. The English writer
Douglas Adams once quipped in
The Independent on Sunday (1999): "The World
Wide Web is the only thing I know of whose shortened form takes
three times longer to say than what it's short for," with Stephen
Fry later pronouncing it in his "Podgrammes" series of podcasts as
"wuh wuh wuh." In Mandarin Chinese, World Wide Web is commonly
translated via phono-semantic matching as wàn wéi wǎng (万维网),
which preserves the initials www and literally means "myriad
dimensional net", a translation that reflects the design concept
and proliferation of the World Wide Web. Tim Berners-Lee's
web-space states that
World Wide
Web is officially spelled as three separate words, each
capitalized, with no intervening hyphens.
Safety
Privacy
Computer users, who save time and money, and who gain conveniences
and entertainment, may or may not have surrendered the right to
privacy in exchange for using a number of
technologies including the Web. Worldwide, more than a half billion
people have used a
social network
service, and of Americans who grew up with the Web, half
created an online profile and are part of a generational shift that
could be changing norms.
Facebook
progressed from U.S. college students to a 70% non-U.S. audience,
and in 2009 prior to launching a beta test of the "transition
tools" to set privacy preferences, estimated that only 20% of its
members use privacy settings.
Privacy representatives from 60 countries have resolved to ask for
laws to complement industry self-regulation, for education for
children and other minors who use the Web, and for default
protections for users of social networks. They also believe data
protection for
personally identifiable
information benefits business more than the sale of that
information. Users can opt in to browser features that clear their
personal histories locally and block some cookies and advertising
networks, but they are still tracked through websites' server logs
and, in particular, web beacons. Berners-Lee
and colleagues see hope in accountability and appropriate use
achieved by extending the Web's architecture to policy awareness,
perhaps with audit logging, reasoners and appliances. Among
services paid for by
advertising,
Yahoo! could collect the most data about
users of commercial websites, about 2,500 bits of information per
month about each typical user of its site and its affiliated
advertising network sites. Yahoo! was followed by
MySpace with about half that potential and then by
AOL–
TimeWarner,
Google,
Facebook,
Microsoft, and
eBay.
Security
The Web has become criminals' preferred pathway for spreading
malware. Cybercrime carried out on the Web
can include
identity theft, fraud,
espionage and intelligence gathering. Web-based vulnerabilities now
outnumber traditional computer security concerns, and as measured
by
Google, about one in ten web pages may
contain malicious code. Most Web-based attacks take place on
legitimate websites, and most, as measured by
Sophos, are hosted in the United States, China and
Russia. The most common of all malware threats is
SQL injection against websites.
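A small example may clarify how SQL injection works and how it is usually avoided. The sketch below uses Python's standard sqlite3 module with an in-memory database; the table, the function names (lookup_unsafe, lookup_safe) and the data are hypothetical and chosen only for illustration. The unsafe version pastes user input into the SQL text, so a crafted value changes the meaning of the query; the safe version passes the input as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def lookup_unsafe(name):
    # Vulnerable: the input becomes part of the SQL statement itself.
    sql = f"SELECT secret FROM users WHERE name = '{name}'"
    return conn.execute(sql).fetchall()


def lookup_safe(name):
    # Parameterized: the input is passed as data, never parsed as SQL.
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)
    ).fetchall()


# Input crafted so that the WHERE clause is always true.
malicious = "nobody' OR '1'='1"
```

With the malicious input, lookup_unsafe returns every row in the table, whereas lookup_safe finds no user with that literal name and returns nothing.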
Through HTML and URIs the Web was vulnerable to attacks like
cross-site scripting (XSS) that
came with the introduction of JavaScript and were exacerbated to
some degree by Web 2.0 and Ajax
web
design that favors the use of scripts. Today by one estimate,
70% of all websites are open to XSS attacks on their users.
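The core of a cross-site scripting flaw is that text supplied by one user is inserted into a page as markup and then run by another user's browser. The following Python sketch, with hypothetical function names and a made-up payload, contrasts inserting a comment verbatim with escaping it first using html.escape from the standard library. Escaping on output is one common mitigation, not a complete defense.

```python
from html import escape


def render_comment_unsafe(comment):
    # The comment is inserted as-is, so any markup in it becomes live.
    return "<p>" + comment + "</p>"


def render_comment_safe(comment):
    # Special characters are replaced by entities and shown as text.
    return "<p>" + escape(comment) + "</p>"


payload = "<script>alert('xss')</script>"
unsafe_html = render_comment_unsafe(payload)
safe_html = render_comment_safe(payload)
```

In unsafe_html the script element survives intact and would execute in a visitor's browser; in safe_html the angle brackets are encoded, so the browser displays the text instead of running it.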
Proposed solutions vary to extremes. Large security vendors like
McAfee already design governance and
compliance suites to meet post-9/11 regulations, and some, like
Finjan have recommended active real-time
inspection of code and all content regardless of its source. Some
have argued that for enterprises to see security as a business
opportunity rather than a cost center, "ubiquitous, always-on
digital rights management" enforced in the infrastructure by a
handful of organizations must replace the hundreds of companies
that today secure data and networks.
Jonathan Zittrain has said users sharing
responsibility for computing safety is far preferable to locking
down the Internet.
Availability
Standards
Many formal standards and other technical specifications define the
operation of different aspects of the World Wide Web, the Internet,
and computer information exchange. Many of the documents are the
work of the
World Wide Web
Consortium (W3C), headed by Berners-Lee, but some are produced
by the
Internet
Engineering Task Force (IETF) and other organizations.
Usually, when web standards are discussed, the following
publications are seen as foundational:
Additional publications provide definitions of other essential
technologies for the World Wide Web, including, but not limited to,
the following:
- Uniform Resource Identifier (URI), which is a universal
system for referencing resources on the Internet, such as hypertext
documents and images. URIs, often called URLs, are defined by the
IETF's RFC 3986 / STD 66: Uniform Resource Identifier (URI):
Generic Syntax, as well as its predecessors and numerous
URI scheme-defining RFCs;
- HyperText Transfer Protocol (HTTP), especially as
defined by RFC 2616: HTTP/1.1 and RFC 2617: HTTP
Authentication, which specify how the browser and server
authenticate each other.
Accessibility
Access to the Web is for everyone regardless of disability,
including visual, auditory, physical, speech, cognitive, and
neurological disabilities. Accessibility features also help people
with temporary disabilities, such as a broken arm, and an ageing
population whose abilities change over time. The Web is used for
receiving information as well as providing information and
interacting with society, making it essential that the Web be
accessible in order to provide equal access and
equal opportunity to people with
disabilities. Tim Berners-Lee once noted, "The power of the Web is
in its universality. Access by everyone regardless of disability is
an essential aspect." Many countries regulate
web accessibility as a requirement for
websites. International cooperation in the W3C
Web Accessibility Initiative
led to simple guidelines that web content authors as well as
software developers can use to make the Web accessible to persons
who may or may not be using
assistive technology.
Internationalization
The W3C
Internationalization
Activity ensures that web technology will work in all languages,
scripts, and cultures. Beginning in 2004 or 2005,
Unicode gained ground and eventually in December
2007 surpassed both
ASCII and Western European
as the Web's most frequently used
character encoding. Originally RFC 3986
allowed resources to be identified by
URI in a
subset of US-ASCII. RFC 3987 allows more characters—any character
in the
Universal Character
Set—and now a resource can be identified by
IRI in any
language.
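The relationship between an IRI and the ASCII-only URI that is actually transmitted can be illustrated with Python's standard library. This is a simplified sketch under stated assumptions: the path and host name are invented examples, the path is converted by percent-encoding its UTF-8 bytes with urllib.parse.quote, and the host is converted with Python's built-in "idna" codec, which implements an older version of the internationalized domain name rules.

```python
from urllib.parse import quote, unquote

# A path containing a non-ASCII character, as allowed in an IRI.
iri_path = "/wiki/Zürich"

# Percent-encode the UTF-8 bytes to obtain an ASCII-only URI path.
uri_path = quote(iri_path)

# The conversion is reversible, so the original text can be shown again.
round_trip = unquote(uri_path)

# Host names use a separate ASCII-compatible encoding.
host = "bücher.example"
ascii_host = host.encode("idna").decode("ascii")
```

A browser can therefore display the readable form to the user while sending only characters from the US-ASCII subset over the network.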
Statistics
According to a 2001 study, there were far more than 550
billion documents on the Web, mostly in the invisible Web, or
deep Web. A 2002 survey of 2,024 million
Web pages determined that by far the most Web content was in
English: 56.4%; next were pages in German (7.7%), French (5.6%),
and Japanese (4.9%). A more recent study, which used Web searches
in 75 different languages to sample the Web, determined that there
were over 11.5 billion Web pages in the
publicly indexable Web as of the end of January
2005. The indexable web contains at least 25.21 billion pages. On
July 25, 2008, Google software engineers Jesse Alpert and Nissan
Hajaj announced that
Google Search had
discovered one trillion unique URLs. Over 109.5 million websites
operated. Of these 74% were commercial or other sites operating in
the
.com generic
top-level domain.
Technology
Speed issues
Frustration over
congestion issues in the
Internet infrastructure and the high
latency that results in slow
browsing has led to an alternative, pejorative name for the World
Wide Web: the
World Wide Wait. How to speed up the Internet is the subject of
ongoing discussion over the use of peering and QoS
technologies. Other solutions to reduce the World Wide Wait can be
found at W3C. Standard
guidelines for
ideal Web response times are:
- 0.1 second (one tenth of a second). Ideal response time. The
user doesn't sense any interruption.
- 1 second. Highest acceptable response time. Download times
above 1 second interrupt the user experience.
- 10 seconds. Unacceptable response time. The user experience is
interrupted and the user is likely to leave the site or
system.
Caching
If a user revisits a Web page after only a short interval, the page
data may not need to be re-obtained from the source Web server.
Almost all web browsers
cache recently
obtained data, usually on the local hard drive. HTTP requests sent
by a browser will usually only ask for data that has changed since
the last download. If the locally cached data are still current, they
will be reused. Caching helps reduce the amount of Web traffic on
the Internet. The decision about expiration is made independently
for each downloaded file, whether image,
stylesheet,
JavaScript, HTML, or whatever other content the
site may provide. Thus even on sites with highly dynamic content,
many of the basic resources only need to be refreshed occasionally.
Web site designers find it worthwhile to collate resources such as
CSS data and JavaScript into a few site-wide files so that they can
be cached efficiently. This helps reduce page download times and
lowers demands on the Web server.
There are other components of the Internet that can cache Web
content. Corporate and academic
firewalls often cache Web resources
requested by one user for the benefit of all. (See also
Caching proxy server.) Some
search engines, such as
Google or
Yahoo!, also store
cached content from websites. Apart from the facilities built into
Web servers that can determine when files have been updated and so
need to be re-sent, designers of dynamically generated Web pages
can control the HTTP headers sent back to requesting users, so that
transient or sensitive pages are not cached.
Internet banking and news sites frequently
use this facility. Data requested with an
HTTP 'GET' is likely to be
cached if other conditions are met; data obtained in response to a
'POST' is assumed to depend on the data that was POSTed and so is
not cached.
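The caching behavior described in this section can be sketched as a server-side decision. The Python function below is a hypothetical model, not the logic of any real web server: respond and the resource dictionary are invented for illustration, while ETag, If-None-Match and Cache-Control are standard HTTP header names and 304 is the standard "Not Modified" status. The sketch answers a revalidation request with an empty 304 response when the browser's copy is still current, and marks sensitive pages and POST responses so that they are not stored.

```python
def respond(resource, request_headers, method="GET"):
    """Return (status, headers, body) for a request.

    `resource` is a dict with the keys "etag" (version tag),
    "body" (bytes) and "sensitive" (bool).
    """
    headers = {"ETag": resource["etag"]}

    # Transient or sensitive pages, and responses to POST, are sent
    # in full every time and flagged so that caches do not keep them.
    if resource["sensitive"] or method == "POST":
        headers["Cache-Control"] = "no-store"
        return 200, headers, resource["body"]

    # The browser already holds this exact version: send no body.
    if request_headers.get("If-None-Match") == resource["etag"]:
        return 304, headers, b""

    # First download, or the file changed since the last download.
    headers["Cache-Control"] = "max-age=3600"
    return 200, headers, resource["body"]
```

Because the decision is made per resource, a site-wide stylesheet can be revalidated cheaply on every visit while a bank statement page is always fetched afresh.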