Welcome to the tOKo website

tOKo is an open source tool for text analysis and browsing a corpus of documents. It implements a wide variety of text analysis and browsing functions in an interactive user interface.

An important application area of tOKo is ontology development. tOKo supports both constructing an ontology from a corpus and relating the ontology back to the corpus (for example, by highlighting concepts from the ontology in a document).

Another application area is community research. Here the objective is to analyse the exchange of information, for example in a community forum or through a collection of interconnected weblogs.


Below is an incomplete list of features. Where a feature is currently documented, a link to more information is given.


  • Automatically create a tOKo application from a collection of HTML or text documents, and from some weblog formats.
  • A corpus may be hierarchical. Documents can be typed (e.g. folder vs. file) and may contain a body of text as well as metadata. See also creating a corpus.

Text analysis

  • Word frequency: by word class, by lemma, etc.
  • Context: postfix, infix, prefix.
  • KWIC (keyword in context).
  • Parsing phrases: identification of word class information (English, Dutch).
  • An extremely powerful pattern search facility.
  • ...
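
To make the word frequency and KWIC features above more concrete, here is a minimal Python sketch of both ideas. It is not tOKo's implementation or API (which is in Prolog and not documented here); the function names `tokenize`, `word_frequency` and `kwic`, the simple tokenizer, and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequency(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def kwic(tokens, keyword, width=2):
    """Return (prefix, keyword, postfix) triples for every occurrence of keyword.

    prefix and postfix hold up to `width` tokens before and after the keyword.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            prefix = tokens[max(0, i - width):i]
            postfix = tokens[i + 1:i + 1 + width]
            hits.append((" ".join(prefix), tok, " ".join(postfix)))
    return hits


# Hypothetical sample text, not taken from any tOKo corpus.
text = "The ontology relates concepts. Each concept in the ontology has a label."
tokens = tokenize(text)
freq = word_frequency(tokens)
lines = kwic(tokens, "ontology", width=2)

for prefix, keyword, postfix in lines:
    # Right-align the prefix so the keyword lines up in one column.
    print(f"{prefix:>10} [{keyword}] {postfix}")
```

A real tool would count by lemma or word class rather than by surface form, which requires a tagger or lemmatiser that this sketch leaves out.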

Ontology development

  • Create concepts and specify the relation to natural language of the concepts (including abbreviations and other alternate spellings).
  • Organise concepts in is-a and part-of hierarchies.
  • ...
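
As a rough illustration of the ideas above, the following Python sketch models concepts with natural-language labels, is-a and part-of links, and a lookup that finds which concepts are mentioned in a piece of text. It is an assumption-laden sketch rather than tOKo's own data model: the `Concept` and `Ontology` classes, their methods, and the example concepts are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    labels: set = field(default_factory=set)     # terms, abbreviations, alternate spellings
    is_a: list = field(default_factory=list)     # names of parent concepts
    part_of: list = field(default_factory=list)  # names of containing concepts


class Ontology:
    def __init__(self):
        self.concepts = {}

    def add(self, name, labels=(), is_a=(), part_of=()):
        """Register a concept; its own name always counts as a label."""
        self.concepts[name] = Concept(name, set(labels) | {name}, list(is_a), list(part_of))

    def ancestors(self, name):
        """Return all concepts reachable from `name` by following is-a links."""
        seen = []
        stack = list(self.concepts[name].is_a)
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.append(parent)
                stack.extend(self.concepts[parent].is_a)
        return seen

    def concepts_in(self, text):
        """Return names of concepts with at least one label occurring in text as a whole word."""
        found = []
        for concept in self.concepts.values():
            for label in concept.labels:
                if re.search(r"\b" + re.escape(label) + r"\b", text, re.IGNORECASE):
                    found.append(concept.name)
                    break
        return found


# Hypothetical example ontology.
onto = Ontology()
onto.add("vehicle")
onto.add("car", labels={"automobile"}, is_a=["vehicle"])
onto.add("engine", labels={"motor"}, part_of=["car"])

mentioned = onto.concepts_in("The automobile has a new motor.")
print(mentioned)
print(onto.ancestors("car"))
```

The `concepts_in` lookup is the kind of step that could drive highlighting ontology concepts in a document: each match ties a span of text back to a concept, regardless of which label was used.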

API (Application Programming Interface)

All functions available from the user interface can also be accessed through a Prolog API. This API is not documented yet. Use the contact page to influence the order in which the API will be documented.

HTTP interface

For a number of projects, tOKo runs as a web server through an HTTP interface. The specification and implementation of the HTTP interface are driven by demand. The HTTP interface is not currently documented.