Creating an application

The first step after installation is to create a corpus and then automatically derive an application from it. A corpus is a set of documents packaged in such a way that tOKo can handle it. Generally, there are two cases:

  1. The documents are in an open format recognised by tOKo. Currently these are plain text, HTML, and Movable Type (for bloggers).
  2. In all other cases the documents need pre-processing to convert them into one of the recognised formats, or you need to wrap some XML around them to create the corpus.
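The first case can be checked quickly before running the wizard. Below is a minimal Python sketch that is not part of tOKo. The function `classify` and its extension table are hypothetical, and sorting files by extension is an assumption made for illustration. Movable Type exports are ordinary text files, so this check cannot tell them apart from other plain text.

```python
from pathlib import Path

# Hypothetical table of extensions treated as formats tOKo reads directly.
# Classifying by extension is an assumption made for this sketch.
RECOGNISED = {
    ".txt": "plain text",
    ".text": "plain text",
    ".html": "HTML",
    ".htm": "HTML",
}


def classify(folder):
    """Split the files below folder into directly usable ones and ones needing pre-processing."""
    ready, needs_preprocessing = [], []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue  # skip sub-folders themselves; rglob still visits the files inside them
        if path.suffix.lower() in RECOGNISED:
            ready.append(path)
        else:
            needs_preprocessing.append(path)
    return ready, needs_preprocessing
```

Calling `classify("my_documents")` returns two lists of paths. Any file in the second list would need converting, or wrapping in XML, before it can become part of a corpus.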

Assuming you have started tOKo, for example by double-clicking applications/startup/run.pl, use the wizard under File / New application ... to create a corpus and the corresponding application.

The wizard first asks what kind of documents you have:

  • HTML or text files in a folder. The next question asks for the folder that contains the files.
  • Movable Type file. The next question asks for details about your weblog; see Movable Type file below.
  • Existing corpus. The wizard asks for the corpus file and the so-called knowledge base. See creating a corpus for more information.

If all goes well, tOKo has now created a corpus. The next dialog asks for details about the application:

  • Application. The name of the application. This is normally a short lowercase mnemonic, for example abc. Press Return after entering it; some of the other fields are then filled in automatically.
  • Namespace. A URL representing the full name of the application, for example: http://www.example.org/abc# (it is usual to put a # or / at the end).
  • Directory. The directory, below applications, in which the application will be stored. Defaults to the application name.
  • Language. Natural language of the corpus.
  • Meta class. Normally derived automatically from the application name (e.g. ABC_Class).
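The automatically filled fields can be illustrated with a small Python sketch. The names `derive_defaults`, `AppDefaults` and `has_conventional_ending` are hypothetical. The rule that the meta class is the upper-cased application name followed by `_Class` is only inferred from the single example ABC_Class and may not match tOKo in every case.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class AppDefaults:
    directory: str   # folder below applications
    meta_class: str  # e.g. ABC_Class
    run_file: str    # file to double-click to start the application


def derive_defaults(app_name):
    """Illustrative guess at the fields filled in after entering app_name."""
    directory = app_name  # Directory defaults to the application name
    meta_class = app_name.upper() + "_Class"  # assumed rule, inferred from abc -> ABC_Class
    run_file = str(PurePosixPath("applications", directory, "run.pl"))
    return AppDefaults(directory, meta_class, run_file)


def has_conventional_ending(namespace):
    """True if the namespace URL ends with # or /, as is usual (not required)."""
    return namespace.endswith(("#", "/"))
```

For example, `derive_defaults("abc")` gives the directory abc, the meta class ABC_Class and the start file applications/abc/run.pl. `has_conventional_ending("http://www.example.org/abc#")` is true.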

When the wizard finishes, the application is stored in its directory. Start it by double-clicking the run.pl file there (e.g. applications/abc/run.pl).

Movable Type file

Movable Type (MT) is a weblogging platform that defines an import and export format for saving or backing up a weblog. tOKo can read the MT format and automatically create a corpus from it that includes categories, keywords, comments and trackbacks. TypePad is based on Movable Type.

The MT format contains all the content of a weblog, but not the metadata describing the weblog itself. After you select the MT option in File / New application ..., the wizard asks for this metadata:

  • Source. The file to which you have exported your weblog.
  • Permalink prefix. The prefix shared by all permalinks, normally something like http://my.blog.net/archives/.
  • URL. Full URL of the weblog.
  • Permalink. The method used to derive a permalink from the title of a post. MT appears to provide two options: truncated (at most 15 characters) and full title. Select the one that applies to your blog.
  • Title. The title of the weblog.
  • Person. Name of the blogger (can be the same as the title).
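The two permalink options can be sketched in Python as follows. Only the 15-character limit comes from the description above. The cleaning rules in the hypothetical `title_to_basename` function are assumptions loosely modelled on MT's behaviour, and so is the `.html` suffix. The function lowercases the title, removes HTML tags, drops punctuation and turns spaces into underscores. Compare its output with a real permalink from your blog before choosing an option.

```python
import re


def title_to_basename(title, limit=None):
    """Turn a post title into a permalink basename (cleaning rules are assumptions)."""
    name = title.lower()
    name = re.sub(r"<[^>]*>", "", name)       # drop any HTML tags
    name = re.sub(r"[^a-z0-9_\s]", "", name)  # keep ASCII letters, digits, underscores, whitespace
    name = re.sub(r"\s+", "_", name.strip())  # each run of whitespace becomes one underscore
    if limit is not None:
        name = name[:limit]                   # "truncated" option: at most limit characters
    return name


def permalink(prefix, title, truncated=True, suffix=".html"):
    """Combine the permalink prefix and the basename; the .html suffix is an assumption."""
    return prefix + title_to_basename(title, 15 if truncated else None) + suffix
```

Under these assumptions, the title "Creating an application with tOKo" with the prefix http://my.blog.net/archives/ gives http://my.blog.net/archives/creating_an_app.html for the truncated option. The full-title option gives http://my.blog.net/archives/creating_an_application_with_toko.html. Non-ASCII letters are simply dropped here, which may differ from what MT does.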