Memorandum submitted by Dr Richard Clayton
1. My name is Dr Richard Clayton and I am
a researcher in the Security Group of the Computer Laboratory
at the University of Cambridge. Over the past few years I have
published several academic papers on various schemes for blocking
Internet content.[1]
ISP BLOCKING OF CONTENT
2. The Government (by which I mean senior
civil servants and ministers) appear to be under the impression
that it is now possible to require ISPs to block undesirable Internet
content and that this will be effective.[2]
Since your inquiry may be tempted to endorse blocking by ISPs,
it is important to explain that this impression is, in almost
all practical respects, entirely mistaken.
3. Blocking can only practically be done
at individual ISPs; there is no "Internet backbone"
where it can be done for everyone at once. Hence for a blocking
scheme to work at a national level in the UK, every single ISP
(and there are nearly 70 major suppliers, and quite a number
of smaller ones) needs to deploy suitable equipment. Since all
of these ISPs have different network designs, they would each
need to design their own particular scheme for content blocking.
However, despite all this complexity, it is possible to distinguish
four basic schemes by which blocking can be done.
4. The first is to block particular IP addresses, that is, particular
machines on the Internet. Schemes that do this are reasonably
low cost if only a few machines (up to a few thousand) are to be blocked,
but can become very expensive indeed if tens of thousands of machines
are to be blocked (leaving aside the question as to where such
an extensive list might come from). The key problem with this
blocking method is that many machines are shared between multiple
content providers. Thus a machine in Romania might be hosting
not only some undesirable content such as DraculaBitesYou.com,
but also the Transylvanian Tourist Board. Both sites will have
the same IP address and so both would be blocked, which is
clearly undesirable for the Romanian tourist industry. Similarly,
IP address blocking could only block the whole of FaceBook, or
geocities.com, rather than individual parts of these enormous
sites.
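To illustrate the point, the following sketch (in Python, with invented domain names and documentation-only addresses, not any ISP's actual implementation) shows that an IP-level filter sees only the destination address, never the particular site being requested:

```python
# Sketch only: an IP-level filter never sees which website is being requested,
# only the destination address. Domain names and addresses are invented
# (192.0.2.x is a documentation-only range).

BLOCKED_IPS = {"192.0.2.80"}

# Two quite different sites that happen to share one hosting machine.
HOSTING = {
    "draculabitesyou.example": "192.0.2.80",   # the intended target
    "tourist-board.example":   "192.0.2.80",   # innocent co-hosted site
}

def is_blocked(domain):
    return HOSTING[domain] in BLOCKED_IPS

for site in HOSTING:
    print(site, "blocked" if is_blocked(site) else "allowed")
# Both sites come out "blocked": the filter cannot tell them apart.
```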
5. The second scheme is to block particular
domains by arranging that an invalid response is given when the
domain name is "resolved" to give the IP address where
it is currently hosted. This permits the blocking of DraculaBitesYou.com
(which would fail to be resolved properly) whereas the Tourist
Board would still be available. This scheme is very low cost,
and "scales" wellviz: it is still low
cost when tens of thousands of domain names are to be blocked.
However, it is still ineffective if it is desired to block only one
part of a large website such as FaceBook or GeoCities.
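The essence of the scheme can again be sketched in a few lines (the names are invented; a real deployment would alter answers inside the ISP's recursive resolvers rather than in application code):

```python
# Sketch only: the ISP's resolver gives an invalid (here: empty) answer for
# blocked domain names. Names and addresses are invented.

BLOCKED_DOMAINS = {"draculabitesyou.example"}

REAL_ANSWERS = {
    "draculabitesyou.example": "192.0.2.80",
    "tourist-board.example":   "192.0.2.80",
}

def resolve(domain):
    if domain in BLOCKED_DOMAINS:
        return None                    # stands in for NXDOMAIN / a bogus answer
    return REAL_ANSWERS.get(domain)

print(resolve("draculabitesyou.example"))   # None: the blocked name fails
print(resolve("tourist-board.example"))     # 192.0.2.80: co-hosted site still works
# Note: the whole domain succeeds or fails; individual pages cannot be singled out.
```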
6. The third scheme involves the use of
"proxy" machines. Here the ISP arranges that instead
of connections being made directly to the remote machine, the
connection is forced to be made to an ISP machine (the proxy), which
pretends to be the remote machine. This proxy relays all of the
traffic to and from the remote machine, but can apply filtering
rules as material passes by. This scheme can be made very precise:
individual images from a page can be suppressed, and there are
none of the potential "overblocking" issues that occur
with the previous two schemes. However, it is an extremely expensive
scheme for an ISP to implement, because all of the traffic must
be handled by the proxy (which must therefore be a very capable
and hence very costly device). Also, in practice, to avoid
"single points of failure", ISPs will need to purchase
multiple proxies, and this only adds to the cost.
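A toy filtering proxy, shown purely to illustrate the principle (the blocked URL is invented, and a production proxy is a far more substantial piece of engineering, which is precisely where the cost lies), might look like this:

```python
# Sketch only: a filtering proxy sees the full URL of every request it relays,
# so it can suppress a single image while passing everything else through.
# The blocked URL is invented; error handling and caching are omitted.

import urllib.request

BLOCKED_URLS = {"http://draculabitesyou.example/bite.jpg"}

def proxy_fetch(url):
    """Fetch a URL on behalf of the customer, applying filtering rules."""
    if url in BLOCKED_URLS:
        return b"403 blocked by proxy"          # suppress just this resource
    with urllib.request.urlopen(url) as reply:  # relay all other traffic
        return reply.read()

# The expense arises because every byte of every customer's web traffic
# must pass through proxy_fetch (or its real-world equivalent).
```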
7. The fourth scheme is that of "deep
packet inspection" whereby traffic is examined by devices
at the ISP as they pass the packets of data to and from the customers.
Unlike the previous scheme, there is no proxy masquerading as
the remote machine; instead, if a "bad" connection
is being made, further packets will be discarded (or perhaps some
extra "reset" packets will be introduced to persuade
the machines to close the connection). Deep packet inspection
is expensive, albeit not as expensive as using proxies. As a scheme
it is famous because it is used for censoring the Internet by
the Chinese Government, as a key part of their "Great
Firewall of China", and it is also currently being used
by Comcast, the American ISP, to block some types of peer-to-peer
file-sharing traffic.[3]
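Conceptually, a deep packet inspection device applies a decision like the following to every packet that flows past it (this is a simplified simulation with an invented pattern, not a description of any real DPI product):

```python
# Sketch only: a simplified model of deep packet inspection. A real device
# reassembles TCP streams and matches far more sophisticated signatures; here a
# single invented pattern stands in for the signature database.

BLOCKED_PATTERNS = (b"Host: draculabitesyou.example",)

def inspect(packet_payload):
    """Decide what a DPI box might do with one packet."""
    if any(p in packet_payload for p in BLOCKED_PATTERNS):
        return "drop packet / inject TCP reset"   # persuade both ends to hang up
    return "forward unchanged"

print(inspect(b"GET /bite.jpg HTTP/1.1\r\nHost: draculabitesyou.example\r\n"))
print(inspect(b"GET / HTTP/1.1\r\nHost: tourist-board.example\r\n"))
```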
8. It is also possible to create hybrid
schemes, of which the most famous is BT's Anti-Child-Abuse Initiative,
commonly known as CleanFeed. In the CleanFeed system
traffic to particular IP addresses is specially treated. However,
instead of simply being blocked (as in the first scheme above),
it is instead passed through a proxy machine which then applies
the necessary filtering. Hence it is an amalgam of schemes 1 and
3 and is a distinct improvement on both of them. Essentially it
combines all the exactness of the proxy scheme, but it need only
handle a small proportion of the traffic, and so the proxy machines
can be smaller and cheaper: the traffic to both DraculaBitesYou.com
and the Transylvanian Tourist Board is redirected to the proxy,
and the proxy then permits unfettered access to the tourist information
whilst blocking inappropriate pages from the Dracula site.
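The two-stage structure can be sketched as follows (this is my own simplified model of the published CleanFeed design, with invented names, addresses and lists, not BT's code):

```python
# Sketch only: a two-stage hybrid. Stage 1 redirects traffic for a short list
# of suspect IP addresses to a proxy; stage 2 filters by exact URL.

SUSPECT_IPS  = {"192.0.2.80"}                        # stage 1: coarse IP list
BLOCKED_URLS = {"http://draculabitesyou.example/"}   # stage 2: precise URL list

HOSTING = {
    "draculabitesyou.example": "192.0.2.80",
    "tourist-board.example":   "192.0.2.80",
    "example.org":             "198.51.100.7",
}

def handle_request(url):
    host = url.split("/")[2]
    if HOSTING[host] not in SUSPECT_IPS:
        return "routed normally (the proxy never sees this traffic)"
    # Only the small fraction of traffic to suspect addresses reaches the proxy,
    # so the proxy can be much smaller and cheaper than in the pure proxy scheme.
    if url in BLOCKED_URLS:
        return "blocked by the proxy"
    return "relayed by the proxy (co-hosted content still reachable)"

for u in ("http://draculabitesyou.example/",
          "http://tourist-board.example/",
          "http://example.org/"):
    print(u, "->", handle_request(u))
```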
9. Unfortunately, although CleanFeed
is a good engineering design, it suffers from a significant problem
in that it is possible to reverse engineer the list of sites that
are being blocked. If this list is of paedophile sites (as is
currently the case in the BT deployment) then this permits people
who are interested in such material to obtain information about
places to visit which they might not otherwise have known about.
This runs counter to the public policy aims of the system.
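The underlying difficulty can be modelled very simply: because listed addresses are treated differently from unlisted ones, the system acts as an "oracle" that can be queried. The sketch below is a deliberately artificial simulation; the observable differences exploited in practice are more subtle, and are described in my paper cited at [1].

```python
# Sketch only: a toy model of the "oracle" problem. An attacker probes many
# candidate addresses and records which ones receive the special (proxied)
# treatment, thereby reconstructing part of the supposedly secret list. The
# tell-tale here is artificially obvious; real deployments leak through more
# subtle differences in network behaviour.

SECRET_SUSPECT_IPS = {"192.0.2.80", "192.0.2.81"}   # what the attacker wants

def probe(ip):
    """Simulate the externally observable treatment of one candidate address."""
    return "proxied" if ip in SECRET_SUSPECT_IPS else "direct"

candidates = ["192.0.2.%d" % i for i in range(1, 255)]
recovered = [ip for ip in candidates if probe(ip) == "proxied"]
print(recovered)    # the attacker has rebuilt the listed addresses in this range
```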
10. CleanFeed's problems aside, the
real problem with all of the blocking schemes is that they are
all pretty trivial to evade. Two generic ways of avoiding blocking
are encryption and proxy services.
11. If traffic is encrypted this defeats
any system that relies on looking at the packets to determine
what they contain, either by deep packet inspection or within
a proxy. Since a lot of traffic is encrypted anyway (when people
are doing online shopping, or using a webmail system) it isn't
possible to just decide to block encrypted material. It is worth
noting in passing that encryption is becoming very widespread
in the peer-to-peer world of file sharing. This is because of
the deployment of peer-to-peer blocking systems and "traffic
shaping" (systems that slow down peer-to-peer traffic). The
use of encryption allows file sharing to continue at full speed, hence
its growing popularity. It must be expected that if content blocking
disrupted what people wished to do on the Internet then a similar
evolutionary process would occur and encryption would rapidly
become very popular so that access to the content could continue.
12. Proxy services work just like the ISP
proxies already described, except that, being located in another
jurisdiction, they are not bound by UK rules on what to block,
so they will not filter anything but will allow unfettered access
to content. Since the connections to these proxies are encrypted,
and proxies have legitimate uses for people who need anonymity (so
that blocking them would be unreasonable), they provide a
simple and easy-to-use way of evading blocking schemes. Their
only downside (apart from making connections slightly slower)
is that the providers of proxies usually charge for usage, although
there are free systems available.
13. There are more complex anonymity systems
available such as Tor,[4]
originally developed by the US Navy to permit them to surf the
web without revealing their identity, and which currently has
several hundred thousand users. Someone who uses Tor will also
evade local content blocking schemes, and if the "exit
node" is in an appropriate jurisdiction, will evade blocking
altogether.
14. Besides these generic evasion schemes,
other more specific methods of evasion are possible: for example,
if ISP A runs CleanFeed then it will be correctly resolving domain
names; so the customers of ISP B (where blocking is done
by arranging for incorrect name resolution) can use ISP A's
resolvers and thereby evade B's scheme!
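For example (a sketch only, using the third-party dnspython package, version 2 or later; 8.8.8.8 is a well-known public resolver, and example.com merely stands in for a blocked domain; the customer could equally point their operating system's DNS settings elsewhere):

```python
# Sketch only: a customer of an ISP that blocks by giving wrong DNS answers can
# simply ask a resolver elsewhere for the correct answer.

import dns.resolver

resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = ["8.8.8.8"]          # any resolver outside the blocking ISP

for record in resolver.resolve("example.com", "A"):
    print(record.address)
```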
15. Furthermore, all of the discussion so
far has assumed that it is possible to construct a list of what
is to be blocked in the first place, and that's considerably more
difficult than it seems.
16. A taste of the difficulties can be seen
by examining the criminals who set up fake bank websites (phishing)
or who illegally sell pharmaceuticals online. It is far from uncommon
for them to purchase thousands of domain names (with a complaisant
registrar, they can avoid paying for them, provided they recycle
the domains within five days)[5]
and then arrange for all of the domains to point at a single website; they
then send out their spam to advertise their wares, and anyone
who wants to block access must learn the thousands of domain names,
and arrange to block them all.
17. The other scheme that is now widely
used by the most successful phishing gangs is called "fast-flux".
Here it is arranged that a domain resolves to a particular IP
address for just a few minutes; thereafter it resolves to another
machine for a few more minutes, and so on. The machines are not
owned by the criminals, but are consumer machines onto which they
have planted some malicious software (such as might be done to
recruit them into a "botnet"). This software will either
serve up the content directly, or will cause the machines to act
as a proxy and relay traffic to the criminals' actual website.
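From the outside, fast-flux simply looks like a domain whose answer keeps changing, with a very short "time to live" on each answer. The following is a simulation with invented addresses, not a lookup of any real domain:

```python
# Sketch only: what fast-flux looks like to an observer who repeatedly resolves
# the same domain. The addresses are invented stand-ins for compromised consumer
# machines; the short TTL forces clients to keep asking again.

import itertools

COMPROMISED_MACHINES = ["203.0.113.5", "198.51.100.23", "192.0.2.144"]
rotation = itertools.cycle(COMPROMISED_MACHINES)

def fast_flux_answer(domain):
    """Return (address, ttl_seconds): a different machine every few minutes."""
    return next(rotation), 300

for _ in range(4):
    print(fast_flux_answer("phish-site.example"))
```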
18. Thus it can be seen that it is relatively
straightforward to turn the blocking systems' problem into that
of blocking thousands of domains, which resolve to thousands of
addresses, with this all changing dynamically every few minutes.
Quite clearly, this is a difficult problem and would make creating
and distributing blocking lists very difficult indeed. The banks
(who try to remove phishing websites) do manage to tackle their
similar problem, with websites removed within a day or so, but
they have the law on their side: the phishing website is illegal
in every jurisdiction. If what the blocking system is preventing
access to is legal content where it is hosted, then it is far
less likely to be removed. An example of this sort of difficulty
occurs even with indecent images of children, often thought
to be a clear example of material that is illegal everywhere.
However, if the images have been computer-generated, then the
United States Supreme Court held that these were lawful[6]
(no child was harmed in their production), whereas they are illegal
to make or possess in the United Kingdom (where we assume that
they incite viewers into moving on to further activity that does
involve real children).
19. Nonetheless, despite all of the issues
I have described above, and despite informed criticism from the
ISP industry, the UK Government continues to believe that it is
possible to block Internet content. The Home Office has instructed
UK consumer ISPs to block all websites on the Internet Watch Foundation
list (these are sites that the IWF is unable to get removed from
the Internet, occasionally because the content is lawful where
they are hosted, but mainly, in my view, because of
the ineffective manner in which they contact the hosting sites,
passing reports via multiple law enforcement agencies, which seldom
leads to timely action). This blocking is, in my view, for the
reasons set out above, rather a waste of time and money, and I
urge you not to consider extending this system or endorsing it
in any way.
END-USER BLOCKING SYSTEMS
20. What does make some sense, and I believe
you should be seriously considering the complex issues that surround
it, is the voluntary use of blocking software on end-user machines.
This software can have direct access to the requests being made
by the user, and the data that is returned. It should therefore,
in principle, be unaffected by the use of encryption or of proxy
systems.
21. Furthermore, because it is specific
to a particular machine, or even to a particular person using
the machine, it can be customised to block precisely what is appropriate.
It could, for example, block different sites for an eight-year-old
boy and for his 14-year-old sister, whilst permitting an adult
to view what they wished. ISPs are of course unaware of who is
using each of the machines within a particular house, and what
their age is, and so ISP-level blocking can only be "on"
or "off" for everyone.
22. However, there are a number of generic
problems with end-user blocking software.
23. Some brands are better than others, but
there is currently no way for a potential purchaser to distinguish between them.
The industry has been developing a BSI "kite-mark" for
several years, but the launch continues to be delayed and so consumers
cannot tell if a product meets a minimum standard. Previous surveys
by Which? have indicated that many do not; a 2005 report
by Intertek summarises the situation across Europe.[7]
24. Some filtering software is trivially
easy to circumvent or to switch off altogether, and of course
this is one of the issues as to whether or not it is "fit
for purpose" and should be given a kite-mark! Although considerable
skill may be needed to first discover how to do this, it is the
sort of information that will rapidly circulate in chat-rooms
and playgrounds, so "obscurity" cannot be seen as an
alternative to robust protection mechanisms. Famously, the system
endorsed by the Australian Government in summer 2007 was circumvented
within 30 minutes by a 16-year-old child, and the history
of this type of software is littered with similar incidents.[8]
Nevertheless, in a family situation, it may well be that even
though a filter could be turned off, a reasonably well-behaved
child would not turn it off as a matter of course: so it would
still provide some level of protection and some reassurance to
parents.
25. Most of the filtering software on the
market is fairly simple-minded when considering what should be
blocked. Typical designs contain "blacklists" of sites
that should always be blocked, "whitelists" of sites
that should never be blocked, and then some heuristic rules that
attempt to deduce whether a site that has never been heard of
should or should not be blocked. The lists should be relatively
trouble-free (though one current system is reported to block the
CBeebies site when it is accessed as www.bbc.net.uk/cbeebies rather
than the normal www.bbc.co.uk address), but the heuristics are
generally extremely fallible, and are unable to distinguish discussions
(and images) of breast cancer from breast enhancement, or SuperBowl
XXX sites from "XXX" porn sites; hence the use
of the site lists to fix up the shortcomings.[9]
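The typical structure just described can be caricatured in a few lines (the lists and keywords are invented, but the false positive shown is exactly the kind of mistake the heuristics make):

```python
# Sketch only: the whitelist / blacklist / heuristic structure of a typical
# end-user filter, with invented lists and deliberately naive keyword rules.

WHITELIST = {"www.bbc.co.uk"}
BLACKLIST = {"draculabitesyou.example"}
SUSPECT_KEYWORDS = ("xxx", "breast")        # naive heuristic rules

def should_block(host, page_text):
    if host in WHITELIST:
        return False                        # always allowed
    if host in BLACKLIST:
        return True                         # always blocked
    text = page_text.lower()                # unknown site: fall back on heuristics
    return any(keyword in text for keyword in SUSPECT_KEYWORDS)

print(should_block("health.example", "breast cancer screening advice"))  # True: a false positive
print(should_block("www.bbc.co.uk", "CBeebies games"))                   # False: whitelisted
```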
26. The anecdotal evidence (anecdotal because
the blocking software firms do not publish detailed information
about their products) is that what is blocked is informed by the
concerns of the middle-class, white, Calvinist, Connecticut males
who commissioned the software, so there is wholesale blocking
of sites discussing homosexuality, the occult or gambling. This
may well be appropriate for many younger children, but many parents
(especially those not from a "WASP" tradition) may not
be so worried about these issues; and in particular blocking of
informational sites on controversial topics may make little sense
for teenagers trying to do research for their schoolwork.
27. In this context, it is very instructive
to look at survey results from the schools blocking system in
the Republic of Ireland (where the schools use a specialist ISP
for Internet access, which implements a centralised blocking system).
Here some 85% of primary school teachers believed that the blocking
system was "just right", whereas in the secondary schools,
this fell to 52%, with 40% of survey respondents asking for
a way of overriding the system on a case by case basis.[10]
The lesson here is that blocking systems have significant limitations
for older children where subtle judgments must be made as to whether
particular content should be accessible. In particular, these
judgments depend as much on the context of the access as on the nature
of the material, distinctions that no automated filtering
system can hope to address.
28. It should be noted that although considerable
lip service is paid to the desirability of end-user blocking software,
in practice it is relatively rarely used. The reason for this
appears to be that when it comes down to it, parents are not prepared
to pay ongoing subscriptions for software that contains up-to-date
lists of websites. Without an income stream the software providers
are not prepared to keep their lists current. Trying to break
this vicious circle with Government money would raise questions
of state subsidy and competition law, so unless some sort
of charity comes forward to put serious money behind the shipping
of free software to all, it seems that this software will remain
highly praised, and rather more seldom used.
WEBSITE LABELLING
29. Finally, I wish to briefly comment upon
another Bad Idea, which continues to have remarkable traction
in the UK Government and within the EU Commission. This is the
notion of website self-labelling.
30. The idea is a simple one: there
is much to be said for end-user blocking systems, but they have
difficulties in rating sites they have not previously encountered.
So shouldn't the sites rate themselves and the blocking systems
can use those ratings?
31. This idea first surfaced in the mid-1990s
with the RSACi scheme, which was almost identical to RSAC's existing
videogame rating system. It later evolved into an ICRA scheme
that had slightly more sensible categories. Meanwhile, since there
were a handful of labelling schemes, the W3C organisation developed
PICS as a way of permitting multiple labels on a single webpage.
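For completeness, a label of this kind was typically carried in a meta element within the page's HTML, and a filter would simply look for it. The sketch below uses a simplified, invented label; real ICRA/PICS labels used a longer vocabulary of rating categories:

```python
# Sketch only: looking for a PICS/ICRA self-label in a page. Such labels were
# commonly carried in a <meta http-equiv="PICS-Label"> element; the label text
# below is a simplified invention rather than a genuine ICRA vocabulary.

from html.parser import HTMLParser

SAMPLE_PAGE = """<html><head>
<meta http-equiv="PICS-Label"
      content='(PICS-1.1 "http://www.icra.org/ratingsv02.html" l r (n 0 s 0 v 0 l 0))'>
</head><body>Hello</body></html>"""

class PicsLabelFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.labels = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("http-equiv", "").lower() == "pics-label":
            self.labels.append(attrs.get("content", ""))

finder = PicsLabelFinder()
finder.feed(SAMPLE_PAGE)
print(finder.labels or "no self-label found (by far the most common case)")
```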
32. Labelling was never very popular; although
tens of thousands of websites were labelled, this was only ever
a tiny fraction of a percent of the total. The difficulty was that
it was just too hard to label sites correctly, because once you
move away from sites consisting of sales brochures, extremely
complex judgments arise. For example, for a Guy Fawkes themed
site in 1996, I was involved in extensive discussions as to what
rating should be applied to a webpage that discussed 1605 torture
techniques and explained what "hung drawn and quartered"
actually meant.[11]
In practice, any website that carries anything other than completely
innocuous material will face these types of decisions every day, and
will rapidly conclude that spending their staff time on these
complex decisions makes very little economic sense.
33. This abrogation of responsibility for
self-labelling can be seen on the UK Department of Health website, which,
in common with many other UK Government sites, is ICRA-labelled.
However, chapter 19 of the Inquiry into Child Abuse in North Wales
is viewable at http://www.dh.gov.uk/en/Publicationsandstatistics/Publications/PublicationsPolicyAndGuidance/Browsable/DH_4927518 and is labelled "no
potentially offensive language" despite the presence of a
four letter word for sexual intercourse within some reported speech, which
is precisely what the labelling is supposed to warn about.[12]
34. The bottom line on labelling is that
although it sounds a neat idea, more than a decade of experience
shows that it is completely impractical; take-up has always been
minuscule; when used to pay lip service to political correctness
it is often inaccurate; and the burdens on webspace creators are
so significant that even its advocates find it just too much work.
It is high time that politicians stopped endorsing self-labelling
schemes as an apparently easy out when considering how the web
should be rated. I trust that this inquiry will have more sense!
FINALLY
35. I believe that the above discussion,
though rather long, clearly sets out the technical issues relating
to content blocking systems. If I can be of further assistance
to the inquiry on this, or other technical matters, I would be
pleased to help.
January 2008
1  Richard Clayton: Failures in a Hybrid Content Blocking System, in George Danezis and David Martin, editors, Privacy Enhancing Technologies, Fifth International Workshop, PET 2005, Cavtat, Croatia, May 30-June 1 2005, volume 3856 of LNCS, pages 78-92, Springer.
   Richard Clayton: Anonymity and Traceability in Cyberspace. Technical Report UCAM-CL-TR-653, University of Cambridge Computer Laboratory, November 2005.
   Richard Clayton, Stephen J Murdoch and Robert NM Watson: Ignoring the Great Firewall of China, in George Danezis and Philippe Golle, editors, Privacy Enhancing Technologies, Sixth International Workshop, PET 2006, Cambridge, UK, 28-30 June 2006, volume 4258 of LNCS, pages 20-35, Springer.
2  "Recently, it has become technically feasible for ISPs to block home users' access to websites irrespective of where in the world they are hosted." Vernon Coaker, Hansard, 15 May 2006, Column 715W.
3  Peter Svensson, Associated Press, Comcast Blocks Some Internet Traffic, 19 October 2007. http://www.breitbart.com/article.php?id=D8SCEBLG0 For much more detail about Comcast's activity see: http://www.eff.org/wp/detecting-packet-injection
4  http://www.torproject.org
5  Joel Hruska, Cybersquatting daisy-chain leads to Dell trademark infringement lawsuit, ars technica, 30 November 2007. http://arstechnica.com/news.ars/post/20071130-cybersquatting-daisy-chain-leads-to-dell-trademark-infringement-lawsuit.html
6  Ashcroft v Free Speech Coalition (00-795) 535 US 234 (2002), 198 F.3d 1083, affirmed.
7  http://www.anec.org/attachments/ANEC-R&T-2006-ICT-002%20(1).pdf
8  Nick Higginbottom and Ben Packham, Student cracks Government's $84 million porn filter, news.com.au, 26 August 2007. http://www.news.com.au/story/0,23599,22304224-2,00.html
9  The ACLU report Censorship in a box contains a number of other examples of unexpected blocking decisions. http://www.aclu.org/privacy/speech/14915pub20020916.html
10  Ronan Byrne: Content Filtering on Ireland's Schools Network: Delivering a Safer Online Environment for Irish Schools? TERENA 2007, Copenhagen, Denmark, 21-24 May 2007. http://tnc2007.terena.org/programme/presentations/show.php?pres_id=39
11  For a longer version of this section, along with hyperlinks to relevant material, see: http://www.lightbluetouchpaper.org/2007/09/17/web-content-labelling/
12  http://www.icra.org/cgi-bin/rdf-tester/labelTester.cgi?lang=en&url=http%3A%2F%2Fwww.dh.gov.uk%2Fen%2FPublicationsandstatistics%2FPublications%2FPublicationsPolicyAndGuidance%2FBrowsable%2FDH_4927518