Web Filtering

The HTTP caching reviewed so far uses Squid Cache. For filtering we will likewise look at the configuration details and the modules that can be used with Squid:

* url_regex
* Squid Guard

url_regex

This is integral to Squid. It reads URL patterns from a specified file, and if a requested URL matches one of them, Squid either allows or denies the request depending on the configuration. Here is a segment from the example Squid configuration file, which refers to an example file of banned patterns.

acl filter url_regex "/etc/squid/banned"
http_access deny filter

Here we have an acl called filter; its type is url_regex and it reads its patterns from the file /etc/squid/banned.
The http_access rule denies any request that matches. As you can see from the example file, this is set up to block advert sites and other rubbish. Blocking a new site is just a matter of adding another domain to the list.
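To make the matching behaviour concrete, here is a minimal Python sketch of the idea, not Squid's actual implementation. The patterns are hypothetical stand-ins for the contents of /etc/squid/banned, and Python's re module is only an approximation of the regular expression library Squid uses.

```python
import re

# Hypothetical lines from /etc/squid/banned, one regular expression per line.
# These are illustrative and not taken from the original example file.
BANNED_LINES = [
    r"ads\.example\.com",
    r"banners\.example\.net",
    r"/adserver/",
]

def load_patterns(lines):
    """Compile one pattern per non-empty line."""
    return [re.compile(line.strip()) for line in lines if line.strip()]

def is_denied(url, patterns):
    """Return True if any pattern matches anywhere in the URL."""
    return any(p.search(url) for p in patterns)

patterns = load_patterns(BANNED_LINES)
print(is_denied("http://ads.example.com/banner.gif", patterns))
print(is_denied("http://www.example.org/index.html", patterns))
```

Because each pattern is searched for anywhere in the URL, a short or unescaped entry can block more than intended, so it is worth escaping the dots in domain names.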

After a new entry has been added, tell Squid to reload its configuration with the following command:
squid -k reconfigure

To restart the http cache, you can run the following command:
service squid restart
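Adding an entry can also be scripted. The following is a rough sketch that assumes the banned file holds one pattern per line; the helper name add_banned_domain is hypothetical, and the demonstration writes to a temporary file rather than to /etc/squid/banned. Squid would still need to be reconfigured afterwards.

```python
import tempfile
from pathlib import Path

def add_banned_domain(banned_file, domain):
    """Append a pattern for `domain` unless it is already listed.

    Dots are escaped so they match literally. Returns True if the
    file was changed.
    """
    path = Path(banned_file)
    pattern = domain.replace(".", r"\.")
    text = path.read_text() if path.exists() else ""
    if pattern in text.splitlines():
        return False
    with path.open("a") as handle:
        # Make sure the new entry starts on its own line.
        if text and not text.endswith("\n"):
            handle.write("\n")
        handle.write(pattern + "\n")
    return True

# Demonstration against a temporary file, not the real banned list.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "banned"
    print(add_banned_domain(demo_file, "ads.example.com"))
    print(add_banned_domain(demo_file, "ads.example.com"))
    print(demo_file.read_text())
```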

Squid Guard

Squid Guard has to be downloaded and compiled, which is easier than it sounds. It depends on the gcc package being installed.

The build runs as follows:

tar zxvf squidguard-xxxx.tar.gz

cd squidGuard-xxx

./configure

make

The install has to be done as root.
make install

Have a read of the documentation and any other information on the site. You will also have to download and install the block lists. A large number of different blacklists are available, covering categories from porn to violence. They are regularly updated and contain tens of thousands of sites and IPs. They are normally located in /var/spool/squidguard/

The Access Control Lists work very similarly to those in the squid configuration file.

Read what documentation you can. Once you have it up and working, it is launched from Squid using the redirector config option (redirect_program in older Squid releases, url_rewrite_program in newer ones); have a look at the sample file for details.
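To give a feel for what a redirector does, here is a toy sketch in Python. It assumes the classic helper interface, in which Squid writes one request per line (URL first, followed by client and method details) and the helper answers with a replacement URL or a blank line to leave the request unchanged. Newer Squid releases also accept a different reply syntax, so check the documentation for your version. The blocked host and block page URL are hypothetical, and this is not how Squid Guard itself is implemented.

```python
import io
from urllib.parse import urlsplit

BLOCKED_HOSTS = {"ads.example.com"}            # hypothetical block list
BLOCK_PAGE = "http://localhost/blocked.html"   # hypothetical block page

def rewrite(request_line):
    """Return the block page URL for a blocked host, or '' for no change."""
    fields = request_line.split()
    if not fields:
        return ""
    host = urlsplit(fields[0]).hostname or ""
    return BLOCK_PAGE if host in BLOCKED_HOSTS else ""

def serve(requests, replies):
    """Answer each request line with exactly one reply line."""
    for line in requests:
        replies.write(rewrite(line) + "\n")
        replies.flush()  # the caller waits for each answer, so do not buffer

# Demonstration with in-memory streams; a real helper would be called
# with sys.stdin and sys.stdout instead.
demo_in = io.StringIO(
    "http://ads.example.com/banner.gif 192.0.2.10/- - GET\n"
    "http://www.example.org/ 192.0.2.10/- - GET\n"
)
demo_out = io.StringIO()
serve(demo_in, demo_out)
print(repr(demo_out.getvalue()))
```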

Whenever you download or change any of the list files, rebuild the database files using the command:

squidGuard -C all

You will note that there are blocklist directories such as:
drwxr-xr-x 2 squid squid 4096 Mar 3 01:23 ads
drwxr-xr-x 2 squid squid 4096 Feb 11 19:12 aggressive
drwxr-xr-x 2 squid squid 4096 Feb 11 19:12 audio-video
drwxr-xr-x 2 squid squid 4096 Feb 11 19:12 drugs
drwxr-xr-x 2 squid squid 4096 Feb 11 19:12 gambling
drwxr-xr-x 2 squid squid 4096 Feb 11 19:12 hacking
drwxr-xr-x 2 squid squid 4096 Feb 12 18:26 mail

Within these directories you will find files such as:
ls -l /var/spool/squidguard/blacklists/ads
total 184
-rw-r----- 1 squid squid 44500 Mar 3 01:23 domains
-rw-r--r-- 1 squid squid 122880 Mar 3 01:24 domains.db
-rw-r--r-- 1 squid squid 27 Feb 25 13:18 expressions
-rw-r----- 1 squid squid 3147 Feb 7 23:55 urls
-rw-r--r-- 1 squid squid 8192 Mar 3 01:24 urls.db

Note that alongside domains and urls there are domains.db and urls.db; these are the database files built by the command above.

The blocklists also provide a good list. If you build your ACLs with good followed by !bad, a URL will be accepted if it is found in the good list, even if it also appears in any of the blacklists.
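The effect of listing good before !bad can be sketched as a first-match-wins walk over the rule. The code below is an illustration of that ordering only, not Squid Guard's own code. The list contents are hypothetical, and denying a request when nothing matches is an assumption made for the sketch.

```python
def domain_in_list(host, domains):
    """True if host equals a listed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)

def evaluate_pass(host, rule, lists):
    """Walk the rule left to right; the first matching entry decides."""
    for entry in rule:
        negated = entry.startswith("!")
        name = entry.lstrip("!")
        if name == "all" or domain_in_list(host, lists.get(name, ())):
            return not negated
    return False  # assumption: deny when no entry matches

# Hypothetical list contents for illustration.
lists = {
    "good": {"partner.example.com"},
    "bad": {"example.com"},
}
rule = ["good", "!bad", "all"]

for host in ("partner.example.com", "ads.example.com", "www.example.org"):
    print(host, evaluate_pass(host, rule, lists))
```

Here partner.example.com is accepted because good is checked first, even though it also falls under the bad entry example.com. Reversing the order to !bad then good would block it.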