Data Management

Anuradha Mohanty
6 min read · Jun 24, 2021

Data management is the ability of an organization to manage its data using various technologies. It also ensures that only authorized users can access the data within an organization. Data management is required at every step of an organization, be it to run daily operations or to make analytical queries on the stored data.

Data comes from everywhere and at any time. Both the frequency and the velocity of incoming data have changed.

  • Where does the data come from?
  • What is the data all about?

Data management starts with collecting and storing data, then governing its flow (by running governing logic on top of it) to determine the data quality and whether the data is in a usable format. We also need to integrate data from multiple sources and derive value from it. Designing the policies and rules that manage data quality and security issues is equally important. Finally, the right technologies must be chosen to manage the data at every stage.

An enterprise needs to manage data for two reasons:

  1. To run the daily business: To run its daily transactions, a company builds a system that can handle update, delete, and insert queries as quickly as possible while allowing different users to access the system concurrently. A transaction system must also verify and update each transaction in the system. Such systems can be supported by DBMSes based on the relational model.
  2. To make data-driven decisions: A data-driven decision-making process requires historical data, collected from all the relevant parts of the company and stored in one place. This data is then modeled using dimensional models for analysis.
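The transactional side described in the first item can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the `accounts` table, the `transfer` function, and the sample balances are hypothetical and not part of the original article. It shows two updates being verified and applied as a single unit, so a failed check leaves the data unchanged.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one atomic transaction."""
    # `with conn` commits if the block succeeds and rolls back on an exception.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        # Verification step: reject the whole transaction if the source is overdrawn.
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
with conn:
    conn.executemany(
        "INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)]
    )

transfer(conn, 1, 2, 30)       # succeeds and is committed
try:
    transfer(conn, 1, 2, 500)  # fails verification and is rolled back
except ValueError:
    pass

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

After both calls, only the first transfer is reflected in `balances`; the second one leaves no partial update behind.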

Data Warehouses are central repositories where data about a particular subject is integrated and analysis is done on that data.

There are two methods of data management:

  1. File Systems:

a. The data is stored in files. Different files are created for each activity. Customers, daily transactions, employees, sales, marketing, and other such business activities have their own files.

b. Programs are written to update any information or query these files.

c. One program adds daily transactions, another queries the customer file, and yet another handles data manipulation.

  2. DBMS:

a. The data is stored in databases. A database is a collection of tables. There is a different database for each activity: customers, daily transactions, employees, sales, marketing, and other business activities have their own databases.

b. SQL commands are written to update any information or query these databases.
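The difference between the two methods can be illustrated with a small sketch. The customer records, the function names, and the `customers` table below are made up for the example. The first function plays the role of a purpose-built program that scans a file; the second asks the same question with a SQL command.

```python
import csv
import io
import sqlite3

# Hypothetical contents of a "customers" file.
CUSTOMERS_CSV = "id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n3,Meera,Pune\n"


def customers_in_city_file(csv_text, city):
    """File-system approach: a dedicated program reads and filters the file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["name"] for row in reader if row["city"] == city]


def customers_in_city_db(conn, city):
    """DBMS approach: the same question expressed as a SQL query."""
    rows = conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY id", (city,)
    )
    return [name for (name,) in rows]


# Load the same records into an in-memory database table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
records = [
    (int(row["id"]), row["name"], row["city"])
    for row in csv.DictReader(io.StringIO(CUSTOMERS_CSV))
]
with conn:
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)

from_file = customers_in_city_file(CUSTOMERS_CSV, "Pune")
from_db = customers_in_city_db(conn, "Pune")
```

With files, every new question tends to need a new program; with a DBMS, a new question is usually just a new query against the same tables.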

Components of Data Management

Data management involves designing various policies that govern the flow of data within an organization. Besides this, creating rules for data-quality processes, providing the required data to relevant business users, and integrating data for consistency and analytics are the other important aspects of data management.

The components of data management are:

  1. Data Governance: a set of rules and policies that govern the flow of data in a company. Every bit of data is collected, stored, and managed according to the rules set up by the company. Typical decisions include:

  • Where to store the activity log data of a customer on the app, which applications to use to store various kinds of data, and who can access the data stored at various places.
  • Which connectors to use to extract data from various sources, which applications to use to analyze the data, and whether to use a data warehouse or a data lake as the central repository.
  • How to handle incorrect data: if certain fields in a table are null, whether to use default values according to the data type or a descriptive value such as ‘Not Specified’; if a row has many null fields, whether to reject it or to amend the row and add it to the data processing stream.
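A rule for handling null fields could look roughly like the sketch below. The field names, the default values, and the `MAX_NULLS` threshold are assumptions chosen for illustration; an actual governance policy would define its own.

```python
# Hypothetical policy: default value per field, chosen according to its data type.
DEFAULTS = {"name": "Not Specified", "age": 0, "city": "Not Specified"}

# Hypothetical threshold: rows with more null fields than this are rejected.
MAX_NULLS = 1


def apply_null_policy(row):
    """Return a cleaned copy of `row`, or None if the row should be rejected."""
    missing = [field for field in DEFAULTS if row.get(field) is None]
    if len(missing) > MAX_NULLS:
        return None  # too many null fields: keep the row out of the stream
    cleaned = dict(row)
    for field in missing:
        cleaned[field] = DEFAULTS[field]
    return cleaned


kept = apply_null_policy({"name": "Asha", "age": None, "city": "Pune"})
rejected = apply_null_policy({"name": None, "age": None, "city": "Pune"})
```

Here `kept` has its missing `age` filled with the default, while `rejected` is `None` because two of its three fields are null.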

Data Governance decides the tools that need to be used to implement data security, data integration, and data quality. It sets clear and standardized rules throughout the organization.

Data governance answers the following questions:

  • What is the need for setting certain rules for data usage?
  • What are these rules?
  • Where can one find specific company data?
  • Who has access to certain data?
  • Which technologies can be used for data management and governance?
  • How can certain rules be implemented for efficient usage of data?

[Figure: Data governance in file-based systems and DBMS]

  2. Data Integration: the process of compiling data in one central location. Every piece of data should have only one copy that business users can access; multiple copies of the same data should not be available at various locations. If data integration is not done, multiple copies of the same data can exist, making the data inconsistent. Data inconsistency occurs when the same data, copied to two different places, shows different values for a particular field.
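The kind of inconsistency described above can be detected by comparing two copies of the same records field by field. The sketch below assumes each copy is a dictionary keyed by a record ID; the `crm_copy` and `billing_copy` data are hypothetical.

```python
def find_inconsistencies(copy_a, copy_b):
    """Return {record_id: {field: (value_in_a, value_in_b)}} for disagreeing records."""
    conflicts = {}
    # Only records present in both copies can disagree with each other.
    for record_id in copy_a.keys() & copy_b.keys():
        rec_a, rec_b = copy_a[record_id], copy_b[record_id]
        differing = {
            field: (rec_a.get(field), rec_b.get(field))
            for field in rec_a.keys() | rec_b.keys()
            if rec_a.get(field) != rec_b.get(field)
        }
        if differing:
            conflicts[record_id] = differing
    return conflicts


# Hypothetical copies of the same customer data held in two systems.
crm_copy = {
    101: {"name": "Asha", "city": "Pune"},
    102: {"name": "Ravi", "city": "Delhi"},
}
billing_copy = {
    101: {"name": "Asha", "city": "Mumbai"},
    102: {"name": "Ravi", "city": "Delhi"},
}

conflicts = find_inconsistencies(crm_copy, billing_copy)
```

Keeping a single integrated copy removes the need for checks like this, because there is nothing to disagree with.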

  3. Data Quality: the process of cleaning and standardizing the collected data. A complete description of the stored data should be available at every stage of data management. The data must describe every detail of the entity that it represents, and the stored information must not be incorrect. A description of how the stored data is related must also be available. Data quality ensures that the data can be used for transactional and analytical purposes.

[Figure: Data Quality]

  4. Data Security: the process of securing the data within an organization as well as outside it. Data governance policies decide the rules regarding data usage and access within the organization, and data security ensures that only authorized users can access the data.
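One simple way to picture "only authorized users can access the data" is a role-based permission check. The roles, dataset names, and actions below are hypothetical, and real systems usually rely on the access controls of the database or platform rather than hand-written lookups.

```python
# Hypothetical access rules, as a governance policy might define them.
ROLE_PERMISSIONS = {
    "analyst": {"sales": {"read"}},
    "admin": {"sales": {"read", "write"}, "employees": {"read", "write"}},
}


def is_allowed(role, dataset, action):
    """Return True only if the role has been granted the action on the dataset."""
    # Unknown roles or datasets fall back to an empty set, so access is denied.
    return action in ROLE_PERMISSIONS.get(role, {}).get(dataset, set())


analyst_can_read_sales = is_allowed("analyst", "sales", "read")
analyst_can_read_employees = is_allowed("analyst", "employees", "read")
```

Denying by default means a missing rule blocks access instead of granting it.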

Uses of Managed Data

Data flows from various data sources and is finally used by business users to produce reports that answer business queries and to plan business activities, including sales and marketing, based on those reports.

A company collects data from various sources such as web servers, surveys, transactional systems, flat files, etc. Within an organization, data can be collected and stored in many different formats. To make data-driven decisions based on relevant and important data, every company creates and defines certain processes. One of these processes is ETL. ETL stands for Extract, Transform and Load.

Data is first Extracted by connecting to various data sources that are relevant to the business requirements of a company. The raw data is then Transformed into specific formats as per data management policies. In the transformation phase, the data is cleaned, structured and enriched, and, finally, Loaded into a central repository.
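The three ETL steps can be sketched as three small functions. This is a minimal illustration, not a production pipeline: the order data, the cleaning rules, and the `orders` table are assumptions, and an in-memory SQLite database stands in for the central repository.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system, with messy names and a missing amount.
WEB_ORDERS_CSV = "order_id,customer,amount\n1, asha ,120.50\n2,RAVI,\n3,meera,80\n"


def extract(csv_text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: clean and structure rows, rejecting those without an amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # assumed policy: drop rows with no amount
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(amount))
        )
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into the central repository."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    with conn:
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)


conn = sqlite3.connect(":memory:")
records = transform(extract(WEB_ORDERS_CSV))
load(conn, records)

# A simple analytical query on the loaded data.
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
```

Keeping the three steps as separate functions makes it easy to swap the source or the repository without touching the cleaning rules.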

This central repository could be either a data warehouse, if the data is structured, or a data lake, if the data is semi-structured or unstructured. The data stored in the central repository is then used for analytical reporting, such as sales figures and revenue reports, along with the other reports created based on the business requirements of a company. You can also run machine learning models on this data to analyse the market before launching new promotional campaigns or products.

[Figure: Flow of Data from Source to Reporting]

Reference: MSc Curriculum, LJMU
