This is the result of a little personal project of mine, to explore how useful using a full-blown RDBMS would be when building an INDEX for the FreeBSD ports, plus teach myself something about designing database schemas and writing SQL in general. I have previous form in this area, so fiddling about with data from ports seemed an ideal model system. One eventual result might be to add alternate backing stores for the FreeBSD::Portindex cache, but that's something for another day. It has resulted in some interesting ideas about how I can improve FreeBSD::Portindex in other ways though.
The current version is 1090 released on 11th June 2012.
[Download]
[Checksum]
Revision 1090 (2012-06-11) [Download] [Checksum]
Fix dumb mistake in the previous release. This gives correct UNIQUENAME results now.
Revision 1087 (2012-06-09) [Download] [Checksum]
Track the UNIQUENAME of ports. Add uniquename and uniquename_dependency tables, plus a sample SQL query listing all the ports using non-unique UNIQUENAME values.
Revision 1069 (2012-06-01) [Download] [Checksum]
Some make variables are lists of keywords: occasionally a keyword appears multiple times in a list (which is an error, really.) This causes the scan phase to fail with a uniqueness constraint violation. Unique the elements of some list-variables that have been seen to be affected.
Revision 972 (2012-01-29) [Download] [Checksum]
Note: if you are upgrading, you will need to drop and recreate the database schema as described below.
Revision 962 (2012-01-20) [Download] [Checksum]
Note: I'm just being lazy and using the SVN revision number as the version, hence the inflated value and irregular sequence.
Optionally set some environment variables. These allow you to chose which ports tree to analyse and which database to populate with the results. If left unset, the default values are as shown.
% export PORTSDIR=/usr/ports % export PGDATABASE=portindex
% createdb -U pgsql -O ${USER} -l C -T template0 -E SQL_ASCII $PGDATABASE
You don't have to use SQL_ASCII encoding — UTF8 (which the usual default) will be fine. There are rumours that SQL_ASCII is faster though. In any case, using the C locale is important so you get the correct collation order.
Edit ${PGDATA}/data/pg_hba.conf if necessary to set access rights — the scripts default to connecting to a postgresql server on localhost via unix domain socket using the same username as the Unix account and no password. However, all of those defaults are modifiable from the command line or by various standard methods used by PostgreSQL. See perldoc portindexdb for details.
Note: relies on the PGDATABASE variable to connect to the intended database.
% psql < schema.sql
% portindexdb
This will scan your ports tree and then build the database from the information it collects. It takes about 1.2ks (20 minutes) on my system, most of which time is spent scanning through all the ports tree extracting the raw data.
There are a number of sample queries in queries.sql, including one that will generate a standard INDEX file, if you turn off the column headings and result padding in psql(1).
If you find this useful, or have suggestions for improvements or bug reports, or in fact just want to comment on it in any way, I would be glad to hear from you.