The process by which documents or entities are assigned to groups (classes) or taxonomies. Classification becomes more precise through training, in which people check and correct machine-generated results.
The algorithmic grouping of documents based on extracted concepts and weights. Clustering is rarely fully automated; a fully automated process typically produces results that do not match a user's intuition about how documents should be grouped. Modern text analytics platforms allow for dynamic drill-down, supervised clustering and other semi-automated (but highly precise and efficient) processes.
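As an illustration of grouping by concepts and weights, the following minimal sketch assumes each document has already been reduced to a dictionary of concept weights. The concept names, the weights, the `greedy_cluster` helper and the 0.5 similarity threshold are all hypothetical; real platforms use more sophisticated methods.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse concept-weight vectors."""
    dot = sum(w * b.get(concept, 0.0) for concept, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def greedy_cluster(docs, threshold=0.5):
    """Assign each document to the first cluster whose seed (first member)
    is similar enough; otherwise start a new cluster."""
    clusters = []
    for doc_id, vector in docs.items():
        for cluster in clusters:
            if cosine(vector, docs[cluster[0]]) >= threshold:
                cluster.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters

# Hypothetical concept weights, as if produced by an extraction step.
docs = {
    "d1": {"contract": 0.9, "payment": 0.4},
    "d2": {"contract": 0.8, "term": 0.5},
    "d3": {"fraud": 0.9, "audit": 0.3},
    "d4": {"audit": 0.7, "fraud": 0.6},
}

clusters = greedy_cluster(docs)
print(clusters)  # [['d1', 'd2'], ['d3', 'd4']]
```

The result depends heavily on the chosen threshold and on the order in which documents are processed, which is one reason unsupervised output may not match a reviewer's intuition.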
Computer investigation and analysis techniques used to establish legal evidence. Areas of application include investigations into computer crime or misuse, theft of trade secrets, theft or destruction of intellectual property, and fraud. Computer forensics specialists use many methods to capture computer system data and to recover deleted, encrypted, or damaged file information.
Written or recorded information, typically in the form of electronic documents (e.g., Word files, emails or PDFs), images (e.g., JPG or PNG files), video (e.g., MP4 files) or audio files (e.g., MP3 files).
The most central information present in a document or piece of content. Examples are the monetary terms and length of a business contract.
A collection of related documents or texts. Examples include Wikipedia and the collected works of Shakespeare.
Reducing the size of a set of electronic documents using mutually agreed criteria (dates, keywords, custodians, etc.) to decrease volume while increasing the relevance of the remaining information.
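A minimal sketch of how such criteria might be applied is shown below. The `Document` fields, the `cull` function and the sample data are hypothetical and chosen only for illustration; here a document is kept only if it meets all three criteria (custodian, date range, and at least one keyword).

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Document:
    doc_id: str
    custodian: str
    sent: date
    text: str

def cull(documents, custodians, start, end, keywords):
    """Keep only documents that satisfy every agreed criterion."""
    wanted_custodians = {c.lower() for c in custodians}
    wanted_keywords = [k.lower() for k in keywords]
    kept = []
    for doc in documents:
        if doc.custodian.lower() not in wanted_custodians:
            continue  # custodian not in scope
        if not (start <= doc.sent <= end):
            continue  # outside the agreed date range
        body = doc.text.lower()
        if not any(k in body for k in wanted_keywords):
            continue  # no agreed keyword present
        kept.append(doc)
    return kept

# Hypothetical sample set.
docs = [
    Document("A1", "alice", date(2020, 3, 1), "Draft contract terms attached."),
    Document("A2", "alice", date(2018, 1, 5), "Contract renewal reminder."),
    Document("B1", "bob", date(2020, 4, 2), "Lunch on Friday?"),
    Document("C1", "carol", date(2020, 5, 9), "Signed contract enclosed."),
]

kept = cull(docs, ["alice", "bob"], date(2020, 1, 1), date(2020, 12, 31), ["contract"])
print([d.doc_id for d in kept])  # ['A1']
```

Simple substring matching is used for keywords here; a real review tool would more likely rely on an indexed search with stemming or exact-term options.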
The Electronic Discovery Reference Model (EDRM) defines a custodian as: "Person having administrative control of a document or electronic file; for example, the custodian of an email is the owner of the mailbox which contains the message."