Data Extraction From Structured & Unstructured Sources

Description

Data extraction is the process of retrieving data from (usually unstructured or poorly structured) sources for further processing or storage. Once imported into the intermediate extracting system, the data is usually transformed, and possibly enriched with metadata, before being exported to the next stage in the data workflow.
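The extract, transform, add-metadata, export sequence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the "key: value" input layout, the sample invoice text, and the file name invoice.txt are all hypothetical and do not come from any particular product.

```python
import json
from datetime import datetime, timezone


def extract(raw_text):
    """Pull 'key: value' pairs out of loosely structured text (assumed layout)."""
    record = {}
    for line in raw_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            record[key.strip()] = value.strip()
    return record


def transform(record):
    """Normalize field names to lowercase snake_case."""
    return {key.lower().replace(" ", "_"): value for key, value in record.items()}


def add_metadata(record, source):
    """Attach provenance so later stages know where the data came from."""
    return {
        "data": record,
        "metadata": {
            "source": source,
            "extracted_at": datetime.now(timezone.utc).isoformat(),
        },
    }


def export(record):
    """Serialize the record for the next stage in the workflow."""
    return json.dumps(record, sort_keys=True)


# Hypothetical sample input, invented for this sketch.
raw = """Invoice Number: 10042
Customer: Example Corp
Total: 199.00
"""

staged = add_metadata(transform(extract(raw)), source="invoice.txt")
print(export(staged))
```

Keeping each stage as a separate function mirrors the workflow in the text: the extraction step can be swapped for a different source format without touching the transformation or export steps.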

Typical unstructured data sources include web pages, emails, documents, PDFs, scanned text, mainframe reports, and spool files. Extracting data from these sources has become a considerable technical challenge. Whereas data extraction historically had to cope with changes in physical hardware formats, most current work involves extracting data from unstructured sources and from differing software formats. Extracting data from the web in particular is referred to as web scraping.
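As a rough illustration of web scraping, the sketch below uses Python's standard-library html.parser module to pull visible text and link targets out of an HTML page. The class name, the scrape helper, and the sample page are assumptions made for this example; a production scraper would also need to handle fetching, character encodings, and malformed markup.

```python
from html.parser import HTMLParser


class LinkAndTextExtractor(HTMLParser):
    """Collects link targets and visible text, skipping script and style content."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def scrape(html):
    """Return the links and visible text found in an HTML string."""
    parser = LinkAndTextExtractor()
    parser.feed(html)
    parser.close()
    return {"links": parser.links, "text": " ".join(parser.text_parts)}


# Hypothetical page, invented for this sketch.
page = """<html><head><style>p { color: red; }</style></head>
<body><h1>Price list</h1>
<p>Widget: <a href="/widgets/1">details</a></p>
<script>var x = 1;</script></body></html>"""

result = scrape(page)
print(result["links"])
print(result["text"])
```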

Business Value

Data extraction typically takes one of three approaches: (1) text pattern matching, such as regular expressions, to identify small- or large-scale structure; (2) a table-based approach that identifies common sections within a limited domain, for example finding skills, previous work experience, and qualifications in emailed resumes by matching a standard set of commonly used headings; or (3) text analytics that attempts to understand the text and link it to other information.
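The first two approaches can be sketched briefly in Python. The email pattern, the heading table, the function names, and the sample resume below are all assumptions chosen for illustration; real resumes use far more varied headings, and the simplified email pattern will not cover every valid address.

```python
import re

# Approach 1: pattern matching. A deliberately simplified email pattern.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Approach 2: a lookup table of commonly used headings (hypothetical set),
# mapped to canonical section names.
KNOWN_HEADINGS = {
    "skills": "skills",
    "experience": "work_experience",
    "work experience": "work_experience",
    "qualifications": "qualifications",
    "education": "qualifications",
}


def find_emails(text):
    """Return every substring that matches the email pattern."""
    return EMAIL_RE.findall(text)


def split_sections(text):
    """Group non-empty lines under the most recent recognized heading."""
    sections = {}
    current = None
    for line in text.splitlines():
        label = line.strip().rstrip(":").lower()
        if label in KNOWN_HEADINGS:
            current = KNOWN_HEADINGS[label]
            sections.setdefault(current, [])
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return sections


# Hypothetical resume text, invented for this sketch.
resume = """Jane Doe
jane.doe@example.com

Skills:
Python, SQL

Work Experience
Data analyst, Example Corp, 2019-2023

Qualifications
BSc Statistics
"""

print(find_emails(resume))
print(split_sections(resume))
```

Lines that appear before any recognized heading, such as the name and email address here, are ignored by split_sections, which is why the pattern-matching pass is run separately over the whole text.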

Company Name	Product Name	Type
Averbis	Information Discovery	Commercial
Everteam	Enterprise Data Integration	Commercial
Hyland	Document Filters	Commercial
iManage	iManage RAVN Extract	Commercial
Information Builders	Data Management Platform	Commercial
NetOwl	NetOwl Extractor	Commercial
SAP	SAP HANA	Commercial
SAP	SAP Data Services	Commercial
SAP	SAP Data Services Text Data Processing	Commercial
SAP	SAP Business Intelligence (BI) Solutions	Commercial

