Creating an application
The first step after installation is to create a corpus
and then automatically derive an application from it. A corpus is a set of
documents packaged in such a way that tOKo can handle it. Generally, there
are two cases:
- The documents are in an open format recognised by tOKo. Currently
these are plain text, HTML, and Movable Type (for bloggers).
- In all other cases you need to pre-process the documents into one
of the recognised formats, or wrap some XML around
the documents to create the corpus.
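As a rough illustration of the two cases above, the sketch below sorts the files in a folder into those that could be handed to tOKo directly and those that would need pre-processing first. The mapping from file extension to format is an assumption made for this example; tOKo's own format detection is not described here and may work differently.

```python
from pathlib import Path

# Assumed mapping from file extension to a recognised format.
# This is illustrative only; tOKo may detect formats differently.
RECOGNISED = {".txt": "plain text", ".html": "HTML", ".htm": "HTML"}


def triage(folder):
    """Split the files in a folder into ready-to-use and to-be-pre-processed."""
    ready, needs_preprocessing = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip sub-folders
        if path.suffix.lower() in RECOGNISED:
            ready.append(path.name)
        else:
            needs_preprocessing.append(path.name)
    return ready, needs_preprocessing
```

Calling triage on the folder you intend to give to the wizard shows which files still need converting before the corpus is created.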
Assuming you have started tOKo, for example by double-clicking
applications/startup/run.pl, a corpus and the corresponding
application are then created by the wizard.
The wizard first asks what kind of documents you have:
- HTML or text files in a folder.
The next question asks for the folder that contains the files.
- Movable Type file. The next
question asks for details about your weblog. See below.
- Existing corpus.
The next question asks for the corpus file and the so-called knowledge base. See creating a corpus for more information.
If all is well, tOKo has now created a corpus. The next dialog asks
for details about the application:
- Application. The name of the application. This is normally
a short lowercase mnemonic, for example abc. Hit return
after entering it; some of the other fields will then be filled in automatically.
- Namespace. A URL representing the full name of the
application, for example: http://www.example.org/abc# (it is
usual to put a # or / at the end).
- Directory. The directory in which the application will be
stored (below applications). Defaults to the same as
- Language. Natural language of the corpus.
- Meta class. Normally derived automatically from the
application name (e.g. ABC_Class).
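The way the other fields follow from the application name can be sketched as below. This is a minimal illustration based only on the examples given above (abc, http://www.example.org/abc#, ABC_Class, applications/abc); the function name default_fields, the base_url parameter, and the exact derivation rules are assumptions, not tOKo's actual implementation.

```python
def default_fields(application, base_url="http://www.example.org/"):
    """Derive illustrative defaults for the wizard fields from an application name.

    Hypothetical helper: the rules mirror the examples in the text,
    not tOKo's real behaviour.
    """
    name = application.strip().lower()
    return {
        "application": name,
        # Namespace ends in '#', as is usual for such URLs.
        "namespace": f"{base_url}{name}#",
        # Assumed: stored below 'applications' under the application name.
        "directory": f"applications/{name}",
        # Assumed: upper-cased name followed by '_Class'.
        "meta_class": f"{name.upper()}_Class",
    }
```

For the application name abc this yields the namespace http://www.example.org/abc# and the meta class ABC_Class, matching the examples in the list.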
When the wizard is finished, the application is stored in the
directory. It can be started by double-clicking the
run.pl file (e.g. applications/abc/run.pl).
Movable Type file
Movable Type (MT) is a weblogging infrastructure which defines an import
and export format that allows you to save or back up a weblog. tOKo
can read the MT format and automatically create a corpus from it that
includes categories, keywords, comments and trackbacks. Typepad uses Movable Type.
The MT format contains all the content of a weblog, but it does
not contain the metadata. After you select the MT option, the wizard asks for:
- Source. The file to which you have exported your weblog.
- Permalink prefix. Prefix of the permalinks. It is
normally something like: http://my.blog.net/archives/.
- URL. Full URL of the weblog.
- Permalink. Method to derive a permalink from the title
of a post. MT seems to provide two options: truncated (at
most 15 characters) and full title. Select the one that is
applicable for your blog.
- Title. The title of the weblog.
- Person. Name of the blogger (can be the same as the title).
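To make the interplay between the permalink prefix and the permalink method concrete, here is a small sketch of how a permalink might be derived from a post title. Only the two options (truncated to at most 15 characters, or full title) come from the description above; the slug rule (lowercase, punctuation dropped, whitespace turned into underscores), the .html suffix, and the function names are assumptions made for illustration, so check the result against the real permalinks of your weblog.

```python
import re


def slug(title, truncated=False, limit=15):
    """Turn a post title into a URL-friendly name (assumed rule, not MT's own)."""
    s = title.lower()
    s = re.sub(r"[^a-z0-9\s_]", "", s)   # drop punctuation and other symbols
    s = re.sub(r"\s+", "_", s.strip())   # whitespace runs become underscores
    return s[:limit] if truncated else s


def permalink(prefix, title, truncated=False):
    """Combine the permalink prefix with the slug; the '.html' suffix is assumed."""
    return f"{prefix}{slug(title, truncated)}.html"
```

With the prefix http://my.blog.net/archives/ and the title "Hello, World of Blogs!", the full-title method gives http://my.blog.net/archives/hello_world_of_blogs.html, while the truncated method gives http://my.blog.net/archives/hello_world_of_.html.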