Ingestion

Almost all operations in ayfie are based on linguistically motivated document profiles.

Each document is initially ingested as a sequence of tokens (and is always searchable as such). During ingestion, all relevant terminology in the document is identified. Terminology can consist of entity names, domain vocabulary, or special expressions such as acronyms and measurements. It is often represented not by single terms but by multi-word expressions, e.g. “revenue sharing agreement” or “Jim McGhee”.
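To make the idea of multi-word terminology concrete, here is a minimal Python sketch. It is not ayfie's implementation: the term list `KNOWN_TERMS` and the functions `tokenize` and `find_terms` are hypothetical names chosen for illustration, and the greedy longest-match lookup is only one simple way such expressions could be spotted in a token sequence.

```python
import re

# Hypothetical terminology list (an assumption for this sketch);
# a real system would draw on far richer linguistic resources.
KNOWN_TERMS = {
    ("revenue", "sharing", "agreement"),
    ("jim", "mcghee"),
}
MAX_TERM_LEN = max(len(term) for term in KNOWN_TERMS)


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def find_terms(tokens):
    """Greedy longest-match lookup of known terms in a token sequence."""
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest possible span first, then shorter ones.
        for n in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in KNOWN_TERMS:
                found.append(" ".join(candidate))
                i += n
                break
        else:
            # No known term starts at this position; move on one token.
            i += 1
    return found


tokens = tokenize("Jim McGhee signed the revenue sharing agreement.")
terms = find_terms(tokens)
```

In this sketch the token sequence stays available for plain search, while `terms` holds the multi-word expressions recognised on top of it.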

Ayfie groups these terms together by spelling variant, inflectional form and similar criteria, and enriches the groups with synonyms. The resulting term groups make up the document profile, i.e. a linguistic signature that describes the document content at an abstract level.
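The grouping step can be sketched along the same lines. The lookup tables `CANONICAL` and `SYNONYMS` and the function `build_profile` below are hypothetical and stand in for whatever linguistic resources the product actually uses; the sketch only shows how variants might collapse into term groups that are then enriched with synonyms.

```python
from collections import defaultdict

# Hypothetical lookup tables (assumptions for this sketch):
# each variant maps to a canonical form, each canonical form to synonyms.
CANONICAL = {
    "agreement": "agreement",
    "agreements": "agreement",   # inflectional form
    "color": "color",
    "colors": "color",           # inflectional form
    "colour": "color",           # spelling variant
}
SYNONYMS = {
    "agreement": {"contract"},
}


def build_profile(terms):
    """Group terms under a canonical form and add known synonyms."""
    groups = defaultdict(set)
    for term in terms:
        lowered = term.lower()
        # Unknown terms form their own group.
        key = CANONICAL.get(lowered, lowered)
        groups[key].add(lowered)
    for key in groups:
        groups[key] |= SYNONYMS.get(key, set())
    return dict(groups)


profile = build_profile(["Agreement", "agreements", "colour", "colors"])
```

Here `profile` maps each canonical form to its term group, which is a toy version of the abstract signature described above.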

