Web caching can be done solely using an IPCop server, or on a separate server with authentication, which allows monitoring of who visits which sites.

There are good reasons to cache web data: on sites running Squid it has been shown that between 12% and 50% of the data requested by web browsers can be served from cache. This has several effects. It makes much better use of the Internet connection, in effect possibly doubling its capacity, and it speeds up web access for local clients, since images and web pages are often pulled from a cache on a local machine rather than from a hard drive on a computer in a remote part of the world.
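As an illustration, the disk and memory Squid devotes to its cache are controlled by a couple of directives in squid.conf; the sizes below are assumptions for a small site, not recommendations:

```
# Disk cache: 1000 MB under /var/spool/squid, with 16 first-level
# and 256 second-level subdirectories (the defaults).
cache_dir ufs /var/spool/squid 1000 16 256

# Memory used for in-transit and frequently requested objects.
cache_mem 64 MB
```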

There are a number of proxy servers available for UNIX and Linux; I am going to concentrate solely on Squid for ease of use. Squid is widely deployed: it is used for caching and web filtering by all state schools in the London Boroughs, as well as by many small and large organisations.

The Squid HTTP proxy is part of most Linux distributions. The default installation normally allows access only from the local machine. The configuration file is heavily commented and very extensive. Squid is a complex and very capable piece of software, so the configuration may seem daunting at first. Squid is available from here: http://www.squid-cache.org

An example configuration is available here. There is extensive documentation available for Squid, and a mailing list where many of the queries you may have are covered, in detail, many times over :-)

What people often get stuck on is the ACLs (Access Control Lists). ACLs are important: if the proxy is left open to all, it will be abused by mail spammers and others.

It is possible to have password authentication against a Samba server, a Windows computer (NT, W2K or XP), htpasswd files, Novell servers and LDAP servers. I have probably missed a few – check the documentation.
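As a sketch of the htpasswd case (the file location and helper path are assumptions – distributions vary, and newer Squid releases rename the helper basic_ncsa_auth):

```
# Create the password file and add a user, at a root prompt:
#   htpasswd -c /etc/squid/passwd alice
#
# Then point Squid's basic authenticator at it in squid.conf:
auth_param basic program /usr/lib/squid/ncsa_auth /etc/squid/passwd
auth_param basic children 5
auth_param basic realm Squid proxy-caching web server
```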

It is also possible to provide seamless (single sign-on) logins against a Samba or NT server using NTLM; again, check the docs for how to do this.
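A sketch of the NTLM setup using Samba's ntlm_auth helper (the helper path is an assumption, and the proxy machine must already be running winbindd and be joined to the domain):

```
# In squid.conf – requires a working winbindd joined to the domain:
auth_param ntlm program /usr/bin/ntlm_auth --helper-protocol=squid-2.5-ntlmssp
auth_param ntlm children 5
```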

I misunderstood the ACLs for a long time: I had assumed that multiple ACLs on one http_access line were ORed together, when in fact they are ANDed.

So, for example, if you create an ACL for the local network and one for a colleague's network, to allow both of these you would have to do the following:

acl local src 192.168.1.0/24        # example network
acl colleague src 192.168.2.0/24    # example network

http_access allow local
http_access allow colleague

The following would NOT work:

http_access allow local colleague

The second form would not work because listing both ACLs on one line requires a request to match both of them – and a computer / user would never be in both networks at once.

The http_access rules are stackable: Squid works through them in order and acts on the first one that matches; hence you can order them to make finer adjustments to your access policy.

You might then want to allow remote users passworded access if they are not on either of the two networks. If the password ACL were put first, everyone would be asked for a password, which would be undesirable.
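Putting this together, a sketch of the ordering (the networks and the final deny are illustrative assumptions):

```
acl local src 192.168.1.0/24
acl colleague src 192.168.2.0/24
acl password proxy_auth REQUIRED

# Known networks match first and get through without a prompt;
# anyone else is challenged for a password; everything else is denied.
http_access allow local
http_access allow colleague
http_access allow password
http_access deny all
```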

auth_param basic program /usr/lib/squid/pam_auth
auth_param basic children 5
auth_param basic realm Squid proxy-caching web server
auth_param basic credentialsttl 2 hours

The realm is required, and here the authentication is set to last for two hours before a re-request.

So here we have PAM authentication. PAM is modular and allows authentication against a whole range of services. Note that you also have to make a Squid entry for PAM in /etc/pam.d/squid, consisting of:

auth required /lib/security/pam_unix.so
account required /lib/security/pam_unix.so

You will also need to set the file /usr/lib/squid/pam_auth to be SUID root. This is so that when the squid user executes the file, it runs with root permissions and can read the system password database to grant authentication.

I have been caught on this more than once as it works well when testing as root :-)

To change the file to SUID, execute the following at a root prompt:

chmod 4755 /usr/lib/squid/pam_auth

Each user on the system with a password will be able to use the proxy.

I have found PAM a bit flaky for authenticating against a Samba or NT server; here it is better to use NTLM authentication. In addition, NTLM authentication is built into Internet Explorer, so the browser uses the existing login details and authenticates against the Samba server you have set up without prompting the user.

Whichever authenticator you use, requiring a login is then just another ACL:

acl password proxy_auth REQUIRED
http_access allow password

WPAD – Web Proxy Auto Discovery

You may ask what on earth this is for! It is the check box that is set by default in Windows Internet Explorer, labelled “Automatically detect settings”.

WPAD may be provided by a number of methods: DHCP, DNS (by A record), and DNS (by SRV record).

I have found very limited documentation on WPAD by DHCP and have not managed to implement it.
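For reference, the DHCP method is usually done with DHCP option 252; a sketch for ISC dhcpd (the URL is an assumption for your own network):

```
# In dhcpd.conf – declare option 252 and hand out the URL of wpad.dat:
option wpad code 252 = text;
option wpad "http://proxy.domain.co.uk/wpad.dat";
```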

WPAD using DNS by A record is relatively straightforward once you know the pitfalls :-)

WPAD by SRV record is not implemented in many clients.

A client set to discover its own proxy looks at its own FQDN (Fully Qualified Domain Name), such as workstation1.domain.co.uk, and will look for the following hosts in turn: wpad.domain.co.uk, then wpad.co.uk. It looks for the A record of each host.
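In BIND zone-file terms this is just an A record (or a CNAME) for wpad in your domain; the address below is an example:

```
; In the zone file for domain.co.uk:
wpad    IN  A     192.168.1.10
; or alias it to the web server that holds the wpad.dat file:
; wpad  IN  CNAME proxy.domain.co.uk.
```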

It will then try to download the following file:

http://wpad.domain.co.uk/wpad.dat
This is important if you are running virtual hosts.

The client does not send a domain name when it asks for the wpad.dat file; it only connects by IP address (or at least IE6 does). Therefore, if you are running multiple websites on that IP address, the wpad.dat file must be placed in the default website. In short, you must be able to access it by IP.
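If the web server is Apache, it also helps to serve the file with the proxy auto-config MIME type (a commonly used recipe; adjust for your own server):

```
# In httpd.conf or the default virtual host:
AddType application/x-ns-proxy-autoconfig .dat
```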

The dat file would be similar to this, although it may be much more complex (the network and mask in isInNet are example values):

  function FindProxyForURL(url, host)
  {
      if (isPlainHostName(host) ||
          dnsDomainIs(host, "domain.co.uk") ||
          isInNet(host, "192.168.1.0", "255.255.255.0"))
          return "DIRECT";
      return "PROXY proxy.domain.co.uk:3128";
  }
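The logic can be sanity-checked outside a browser by stubbing the PAC helper functions (the stubs below are simplified assumptions; in a real browser, isInNet performs a DNS lookup):

```javascript
// Simplified stand-ins for the helpers a browser's PAC engine provides.
function isPlainHostName(host) {
    return host.indexOf(".") === -1;   // no dots => plain local name
}
function dnsDomainIs(host, domain) {
    return host.length >= domain.length &&
           host.substring(host.length - domain.length) === domain;
}
function isInNet(host, net, mask) {
    return false;                      // needs DNS resolution; stubbed out here
}

function FindProxyForURL(url, host) {
    if (isPlainHostName(host) ||
        dnsDomainIs(host, "domain.co.uk") ||
        isInNet(host, "192.168.1.0", "255.255.255.0"))
        return "DIRECT";
    return "PROXY proxy.domain.co.uk:3128";
}

console.log(FindProxyForURL("http://intranet/", "intranet"));
console.log(FindProxyForURL("http://www.example.com/", "www.example.com"));
```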

At present there is a patch available for Mozilla which will allow it to do WPAD; however, for Mozilla it may be easier to copy the wpad.dat file to a proxy.pac file on the same server and set Mozilla to use the URL http://ipaddress/proxy.pac as an automatic configuration URL (see Mozilla’s settings).