Data is a set of information or characteristics collected through observation. Many industries continuously collect and accumulate various forms of data:
- Business Industry (sales, revenue, profit, stock price data)
- Finance Industry (crime rate, unemployment rate, literacy rate data)
- Social Industry (homelessness data)
Structured data is data stored in a fixed format within a repository, such as spreadsheets or tables in relational database systems. Because it is defined by a schema, searching it follows a "table search, column search, row search" sequence: first locate the table, then the relevant columns, then the matching rows. This type of data usually resides in an internal system and has a designated internal format, so it is relatively easy to collect.
- RDBMS table
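The "table search, column search, row search" sequence can be illustrated with Python's built-in `sqlite3` module. This is a minimal sketch: the in-memory `sales` table and its rows are hypothetical sample data, chosen only to show how a declared schema lets a query narrow down from table to columns to rows.

```python
import sqlite3

# Build a small in-memory relational table. The schema (column names and types)
# is declared up front, which is what makes this data "structured".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales (region, product, revenue) VALUES (?, ?, ?)",
    [
        ("North", "Widget", 1200.0),
        ("South", "Widget", 800.0),
        ("North", "Gadget", 450.0),
    ],
)

# Table search -> column search -> row search:
# choose the table (sales), then the columns (product, revenue), then filter rows.
rows = conn.execute(
    "SELECT product, revenue FROM sales WHERE region = ? ORDER BY revenue DESC",
    ("North",),
).fetchall()
print(rows)  # [('Widget', 1200.0), ('Gadget', 450.0)]
conn.close()
```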
Semi-structured data is stored in a file format that contains metadata, a characteristic it shares with structured data. Understanding the data's structure is essential to parsing it successfully. This type of data is often provided through APIs and requires some data processing. Transforming it into structured data also requires some modifications to its data architecture.
- HTML documents retrieved from URLs
- XML and JSON returned by APIs
- Web logs, IoT sensor data
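To show how knowing the structure of semi-structured data makes it possible to parse it and reshape it into a structured form, here is a minimal sketch using Python's standard `json` module. The payload, its keys (`sensor_id`, `readings`, `ts`, `temp`, `humidity`), and the `to_rows` helper are all hypothetical; XML could be handled along the same lines with `xml.etree.ElementTree`.

```python
import json

# A hypothetical API response. Keys act as metadata describing each value,
# but nesting and optional fields mean there is no fixed table schema.
payload = """
{
  "sensor_id": "s-01",
  "readings": [
    {"ts": "2024-01-01T00:00:00", "temp": 21.5},
    {"ts": "2024-01-01T01:00:00", "temp": 21.9, "humidity": 40}
  ]
}
"""

def to_rows(raw: str, columns=("sensor_id", "ts", "temp", "humidity")):
    """Flatten the nested JSON into fixed-width rows (a structured form)."""
    doc = json.loads(raw)  # parsing relies on knowing the key structure
    rows = []
    for reading in doc["readings"]:
        record = {"sensor_id": doc["sensor_id"], **reading}
        # Missing optional fields become None so every row has every column.
        rows.append(tuple(record.get(col) for col in columns))
    return rows

rows = to_rows(payload)
for row in rows:
    print(row)
```

The fixed `columns` tuple plays the role of the target table schema; choosing it is the kind of architectural change needed to move semi-structured data into a structured store.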
Unstructured data does not have a predefined format and is not organized in a predefined manner. Text, images, and videos are examples of unstructured data. HTML data is sometimes classified as semi-structured, but it is difficult to classify precisely because in some cases it is collected through text mining. Unstructured data must be parsed and converted into a meta structure before it can be used. Processing it is relatively challenging, because organizing it into a structured format also requires modifications to the data architecture.
- Images and videos in binary file format
- Texts in script file format
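Parsing free text into a meta structure can be sketched with Python's `re` module and `collections.Counter`. This is a simplified illustration under stated assumptions: the sample review, the tiny `STOPWORDS` set, and the `to_record` helper are hypothetical, and real text processing would use proper tokenization rather than a single regular expression.

```python
import re
from collections import Counter

# Free text has no schema: nothing marks which part is a product or an opinion.
text = (
    "Delivery was fast. The blender works well, but the blender lid leaks. "
    "Would buy again."
)

# Hypothetical minimal stopword list, for illustration only.
STOPWORDS = {"the", "was", "but", "a", "and"}

def to_record(doc: str, top_n: int = 3) -> dict:
    """Parse raw text into a small structured record (a simple 'meta structure')."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+\s*", doc) if s]
    # Lowercase word tokens; punctuation is discarded by the pattern.
    tokens = re.findall(r"[a-z']+", doc.lower())
    content = [t for t in tokens if t not in STOPWORDS]
    return {
        "n_sentences": len(sentences),
        "n_tokens": len(tokens),
        "top_terms": Counter(content).most_common(top_n),
    }

record = to_record(text)
print(record)
```

Once the text is reduced to fields such as `n_sentences` and `top_terms`, it can be stored in a table like any structured data.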
Internal data is data whose original source is stored in an internal system. Because both the original data and the collected data are stored internally, data exchange is easier and faces relatively few technical restrictions.
External data is data whose original source is stored in an external system. Because the original data is stored outside the organization, exchanging it requires consent from the data provider. Collecting it also requires analyzing the collection cycle and collection methods.
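The practical difference between internal and external sources can be captured in a small source descriptor. This is a hypothetical sketch, not a standard: the `DataSource` class, its fields, and the example source names are illustrative assumptions. It encodes the two requirements described above for external data: provider consent before collection and a planned collection cycle.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class DataSource:
    # Hypothetical descriptor; field names are illustrative, not a standard.
    name: str
    external: bool                          # True if the original data lives outside our system
    provider_consent: bool = False          # only relevant for external sources
    collection_cycle: timedelta = timedelta(days=1)

    def can_collect(self) -> bool:
        """Internal sources need no outside approval; external ones need provider consent."""
        return (not self.external) or self.provider_consent

    def next_collection(self, last_run: datetime) -> datetime:
        """Schedule the next pull from the agreed collection cycle."""
        return last_run + self.collection_cycle

crm = DataSource("crm_sales", external=False)
weather = DataSource("weather_api", external=True, collection_cycle=timedelta(hours=1))

print(crm.can_collect())      # True
print(weather.can_collect())  # False until consent is recorded
weather.provider_consent = True
print(weather.next_collection(datetime(2024, 1, 1, 12, 0)))  # 2024-01-01 13:00:00
```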