KNIME Analytics Platform is the open source software for creating data science applications and services. Intuitive, open, and continuously integrating new developments, KNIME makes understanding data and designing data science workflows and reusable components accessible to everyone.
The KNIME Text Processing feature was designed and developed to read and process textual data and transform it into numerical data (document and term vectors), so that regular KNIME data mining nodes (e.g. for clustering and classification) can be applied to it. The feature parses texts available in various formats (e.g. XML, Microsoft Word, or PDF) and represents documents and terms internally as KNIME data cells stored in a data table.
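To make "document vectors" concrete, the plain-Python sketch below builds a bag-of-words matrix: one row per document, one column per distinct term, each cell holding a raw term count. It illustrates the general technique only and is not KNIME's implementation. The naive tokenizer and the helper names `tokenize` and `document_vectors` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, keep runs of letters/digits
    return re.findall(r"[a-z0-9]+", text.lower())


def document_vectors(docs):
    """Build a bag-of-words matrix: one row per document, one column per term."""
    # The vocabulary (column order) is every distinct term, sorted alphabetically
    vocab = sorted({term for doc in docs for term in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        # Each cell holds the raw frequency of that term in the document
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


docs = ["KNIME reads text", "KNIME mines text data"]
vocab, vectors = document_vectors(docs)
print(vocab)    # ['data', 'knime', 'mines', 'reads', 'text']
print(vectors)  # [[0, 1, 0, 1, 1], [1, 1, 1, 0, 1]]
```

Reading the same matrix column-wise gives one vector per term. Rows like these are plain numbers, which is what makes generic clustering or classification algorithms applicable.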
Different kinds of named entities, such as names of persons and organizations, genes and proteins, or chemical compounds, can be recognized and tagged, thus enriching the documents semantically.
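One simple way to recognize named entities is a dictionary lookup: every known surface string is mapped to a tag, and occurrences in the text are marked. The sketch below assumes that approach; it is a plain-Python illustration, not KNIME's tagger, and the dictionary, tag labels, and the `tag_entities` helper are hypothetical.

```python
import re


def tag_entities(text, dictionary):
    """Return (surface string, tag, start offset) for each dictionary hit in text."""
    if not dictionary:
        return []
    # Longest entries first, so at a given position "Marie Curie" wins over "Curie"
    terms = sorted(dictionary, key=len, reverse=True)
    # \b word boundaries assume entries start and end with word characters
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b")
    # finditer yields non-overlapping matches, scanning left to right
    return [(m.group(0), dictionary[m.group(0)], m.start()) for m in pattern.finditer(text)]


# Hypothetical dictionary: surface strings mapped to entity tags
entities = {"Marie Curie": "PERSON", "Curie": "PERSON", "BRCA1": "GENE", "aspirin": "CHEMICAL"}
print(tag_entities("Marie Curie never studied BRCA1 or aspirin.", entities))
# [('Marie Curie', 'PERSON', 0), ('BRCA1', 'GENE', 26), ('aspirin', 'CHEMICAL', 35)]
```

Matching is case-sensitive here. Dictionary lookup cannot find entities that are missing from the list, which is why statistical or model-based taggers are often used alongside or instead of it.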
Furthermore, documents can be filtered (e.g. by stop word or named entity filters), stemmed for various languages, and preprocessed in many other ways. Word frequencies can be computed, keywords extracted, and documents visualized (e.g. as tag clouds).
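The preprocessing chain described above (filter stop words, stem, count frequencies, pick keywords) can be sketched in plain Python as follows. This is an illustration under several assumptions, not KNIME's behavior: the stop word list is a tiny hypothetical one, `crude_stem` is a toy suffix stripper rather than a real stemmer such as Porter or Snowball, and "keywords" here simply means the most frequent stems.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list; real lists are far longer
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def crude_stem(word):
    # Toy suffix stripper for English, NOT Porter/Snowball: drop the first
    # matching suffix if at least three characters remain
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    # Tokenize, filter stop words, then stem what is left
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def top_keywords(text, k):
    # Rank stems by raw frequency; ties keep first-seen order
    return Counter(preprocess(text)).most_common(k)


text = "Texts are parsed, texts are filtered, and filtered terms are counted."
print(preprocess(text))       # ['text', 'pars', 'text', 'filter', 'filter', 'term', 'count']
print(top_keywords(text, 2))  # [('text', 2), ('filter', 2)]
```

The mangled stem `pars` shows why real stemmers rely on language-specific rules, and raw frequency is only the simplest possible keyword heuristic.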