Measuring The Circle:

Armed only with a Ring of Confidence, Bob looks towards the Fortean future.


INDEXING THE INFINITELY PROBABLE

In the last few weeks I have finally begun a project that has long been on my mind: an online index to Fortean Times. The initial stages will explore the potential of Internet tools (such as wikis) to make this a live project available to all via the CFI website.

As some of you may know, Steve Moore and I compiled a detailed index to FT up to issue 105 (1997), but, in the 12 years since then, the count has risen by another 149 issues.

The index to issues 1-66 filled a volume of its own, after which indexes for 67-105 appeared in instalments in our Fortean Studies series, which at that time was fortunate to appear each year. I'm not counting the two indexes done by Ian Simmons for issues 106-129, because they were, unfortunately, critically limited in scope and detail by the shortage of available space and of Ian's time.

There were significant logistical problems with our indexes. Besides the daunting hard work of compiling them to the sophisticated level of cross-referencing that good Fortean research requires, they were a devil to proof and impossible to correct without a new printing. Printing itself was expensive and, in hindsight, probably not the best way to distribute the work to those who would find it most useful. We regret that we could not fold indexes to the seven volumes of Fortean Studies - or to any other FT publication, such as our handful of Occasional Papers - into a grand index. Finally, while we have plans for further research publications, we have no affordable way to include conventional indexes.

Seeing a few online indexes to periodicals has inspired me to try to advance this dream of an FT index. Of course the chief advantages of an online reference work are that it will be easy to accumulate and maintain and it will even accommodate more than one contributor. It must be flexible enough to be queried from nearly every conceivable angle, especially principal queries such as issues by date and issue number; contents lists of each issue; articles by title and author; reviews by title, author and reviewer; obituaries by name; report items by title and issue data. The main indexes should include subjects/topics, titles, names, dates, and geographical data. That is the least I'd want from conventional indexing ... but I think we can improve on that.
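To make those query angles concrete, here is a minimal Python sketch of what one index record and a simple lookup might look like. Everything in it is an assumption for illustration: the field names, the `IndexEntry` and `find` names, and the sample data (which uses deliberately impossible issue numbers and years) are invented, not a settled design.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class IndexEntry:
    """One indexable item (article, review, obituary or report) in one issue."""
    issue: int            # issue number
    year: int             # cover year of the issue
    page: int
    kind: str             # "article", "review", "obituary" or "report"
    title: str
    author: str = ""
    reviewer: str = ""    # only meaningful for reviews
    subjects: List[str] = field(default_factory=list)
    places: List[str] = field(default_factory=list)


def find(entries: Iterable[IndexEntry], **criteria) -> List[IndexEntry]:
    """Return the entries that match every criterion.

    Scalar fields must equal the given value; list fields (subjects,
    places) must contain it.
    """
    results = []
    for entry in entries:
        for name, wanted in criteria.items():
            value = getattr(entry, name)
            if isinstance(value, list):
                if wanted not in value:
                    break
            elif value != wanted:
                break
        else:
            # No criterion failed, so this entry is a match.
            results.append(entry)
    return results


# Invented sample data, purely for illustration.
SAMPLE = [
    IndexEntry(901, 2099, 12, "article", "Example Article",
               author="A. Writer", subjects=["falls"], places=["England"]),
    IndexEntry(901, 2099, 40, "review", "Example Book",
               author="B. Author", reviewer="C. Critic", subjects=["ufos"]),
    IndexEntry(902, 2099, 5, "report", "Example Report",
               subjects=["falls"], places=["France"]),
]

falls = find(SAMPLE, subjects="falls")
reviews_by_critic = find(SAMPLE, kind="review", reviewer="C. Critic")
```

Because any field can be used as a criterion, the same records answer "issues by number", "reviews by reviewer" and "items by subject" without a separate index for each.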

In order to avoid copyright problems I have decided that, initially, this great index will have no content other than a descriptive content summary where necessary. Image-scans of pages, illustrations and text can all be attached later, when we enter a more expansive phase - e.g. digitising the extensive news-clipping collection from which FT's reports were written, along with several times as much material that has not been used in FT.

In this way, the CFI site will have some useful content to offer members and the public. I know from direct experience that the FT team and most of its family of writers have a serious need for a solid index of this sort, as back-content queries arise several times a day.

My idea for keeping it up to date is to negotiate with Dennis Publishing to get page files off the FT team for each issue when it is published. The text files could be processed into a list of indexable words and these parsed into the wiki-type pages using an online editor. For all earlier issues for which we have no archived text files, systematic scanning to OCR will be necessary. If 250 people volunteered to scan an issue each, that entire stage could be done in a week. More realistically, if we get, say, five people to scan an issue each every two weeks, it could be done within a year. We'll see.
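As a rough idea of what "processing a text file into a list of indexable words" could mean, here is a minimal Python sketch. It assumes plain English text, a tiny hand-made stop-word list and a rule that drops words of one or two letters; the names `STOP_WORDS`, `WORD_RE` and `indexable_words` are invented for illustration, and real OCR output would need a good deal more cleaning than this.

```python
import re
from collections import Counter

# A tiny, assumed stop-word list; a real one would be far longer.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to", "was", "were"}

# Matches runs of unaccented letters, allowing one internal apostrophe.
# Accented characters and digits are ignored in this sketch.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def indexable_words(text: str) -> Counter:
    """Count the candidate index words in a piece of text.

    Lower-cases the text, then discards stop words and words of
    fewer than three letters.
    """
    words = WORD_RE.findall(text.lower())
    return Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)


counts = indexable_words("The frogs fell on the town, and the frogs were small.")
```

The counts give a crude measure of how prominent a word is on a page, which might help decide what deserves a wiki entry.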

But scanning is just the start of the process, of course; the text files have to be boiled down into lists of indexable words. The existing index, detailed as it was - we had 12 indexes (from topics, countries and dates to contributors and reviews by name and title) - was rigid in that you could only query on what we had decided was an important keyword, and all the cross-links and references were ones we had chosen. As we cannot anticipate every possible line of inquiry or what datum may be important to your particular query, I am trying to develop a structure which will allow such versatility in querying the data. Instead of multiple but parallel conventional indexes (as outlined above) what if we have just one database that can cope with user-defined queries with impressive speed?
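One common way to get a single structure that answers arbitrary word queries is an inverted index, which maps each word to the places it occurs. The Python sketch below is only an illustration of that general technique, not a description of the software I will end up using; the names `build_index` and `search` and the sample data (with made-up issue numbers) are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Location = Tuple[int, int]  # (issue number, page number)


def build_index(
    items: Iterable[Tuple[Location, Iterable[str]]]
) -> Dict[str, Set[Location]]:
    """Map each word to the set of locations where it appears."""
    index: Dict[str, Set[Location]] = defaultdict(set)
    for location, words in items:
        for word in words:
            index[word.lower()].add(location)
    return dict(index)


def search(index: Dict[str, Set[Location]], *terms: str) -> List[Location]:
    """Return the sorted locations that contain every one of the terms."""
    if not terms:
        return []
    hits = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*hits))


# Invented sample data, purely for illustration.
index = build_index([
    ((901, 4), ["frogs", "rain"]),
    ((901, 9), ["frogs"]),
    ((902, 2), ["rain", "fish"]),
])

both = search(index, "frogs", "rain")
```

The point of the design is that nobody has to decide in advance which words are "important": any word that was listed can be combined with any other at query time.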

This is definitely not an attempt to categorise data that defies categorisation; even Fort had difficulties with his system of shoeboxes and 1,300 headings. I have abandoned the traditional form of indexing under alphabetical hierarchies (with laboured cross-referencing) as too rigid, cumbersome, time-consuming, prone to error (who proofreads an index?) and hard to correct. Instead, I'm embracing the new technology of wikis and high-speed processing (on demand) accessible by all via the internet. The degree of flexibility needed should be possible if I list all the indexable words for each item on each page in each issue; these will then be used to generate the wiki entries. I am currently experimenting with a few OCR and text analysis programs to better judge the procedures (and their difficulties) for generating word-lists from scanned pages.

If all goes well, indexes for each issue could be put up regularly and the whole completed within a handful of years. Thereafter, it would have to be updated every month. I've embarked on an examination of suitable indexing software and am learning how to edit wikis.

Daunting? Maybe, but if it works, it will be worth the effort and I'm willing to give it my time. I just need to figure out the best way to do it.
