Knowing how data is generated and how to collect it will make your data analysis more efficient. You analyze data to solve a problem, and that problem narrows down what kind of data needs to be gathered. You need to figure out which data will help you solve the problem, and the considerations listed below will help you achieve that goal.
Data Collection Considerations
- Decide how you will collect the data
- Choose where you are getting the data
- Decide which data to use
- Decide how much data to collect (or use)
- Select the right data type
- Determine the time frame
How You Will Collect The Data
You will need to decide whether you will collect the data yourself (1st-party data) or receive it from others (2nd/3rd party data).
Where You Will Get The Data
If you decide not to collect the data yourself, you can get it from 2nd-party or 3rd-party providers. The difference between the two is whether the provider collected the data itself: a 2nd-party provider shares data that it collected directly, while a 3rd-party provider supplies data that it did not collect itself.
Which Data To Use
Remember you are trying to solve a problem, so stick to the information that can actually help you achieve the goal.
How Much Data To Collect
If you are collecting data yourself, deciding on a reasonable sample size will help make your analysis more accurate. Some tasks can be completed by randomly sampling from existing historical data. Each project has different needs.
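As a rough illustration of random sampling from historical data, the sketch below draws a fixed-size sample without replacement using Python's standard library. The names `draw_random_sample` and `historical_orders`, the sample size of 100, and the seed are all hypothetical choices made for this example, not recommendations.

```python
import random


def draw_random_sample(records, sample_size, seed=0):
    """Return `sample_size` records chosen at random, without replacement."""
    if sample_size > len(records):
        raise ValueError("sample_size cannot exceed the number of records")
    rng = random.Random(seed)  # fixed seed (an assumption) so the draw is reproducible
    return rng.sample(records, sample_size)


# Hypothetical historical data: 1,000 order IDs.
historical_orders = list(range(1, 1001))

sample = draw_random_sample(historical_orders, sample_size=100)
print(len(sample))  # number of sampled records
```

The right sample size depends on the project, so treat the value above as a placeholder.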
Select The Right Data Type
If you are analyzing trends over time, then time is the variable you care about. That means the right data type for this task is time series data, that is, observations recorded with a time field such as 'date', 'hour', or 'second'.
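To make the idea of time fields concrete, here is a minimal sketch that parses timestamp strings into `datetime` values and counts events per date and per hour. The sample timestamps and the `"%Y-%m-%d %H:%M:%S"` format are assumptions for illustration; real data may use a different format.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw timestamps, stored as plain strings.
raw_timestamps = [
    "2024-03-01 09:15:00",
    "2024-03-01 09:45:30",
    "2024-03-01 14:05:10",
    "2024-03-02 09:20:00",
]

# Convert the strings into datetime objects so time fields can be extracted.
parsed = [datetime.strptime(ts, "%Y-%m-%d %H:%M:%S") for ts in raw_timestamps]

# Count events at two different granularities: by date and by hour of day.
events_per_date = Counter(dt.date().isoformat() for dt in parsed)
events_per_hour = Counter(dt.hour for dt in parsed)

print(events_per_date)
print(events_per_hour)
```

Storing time as a proper date/time type rather than free text is what makes this kind of grouping possible.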
Determine The Time Frame
You will need to determine how long you will collect the data for. If there is no time to collect new data, you will need to use existing historical data to solve the problem.
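When working from historical data, applying a time frame usually means keeping only the records that fall between a start and an end date. The following is a minimal sketch under that assumption; the `within_time_frame` helper, the record layout, and the sales figures are hypothetical.

```python
from datetime import date


def within_time_frame(records, start, end):
    """Keep records whose 'date' falls within [start, end], inclusive."""
    return [r for r in records if start <= r["date"] <= end]


# Hypothetical historical records.
historical = [
    {"date": date(2023, 12, 30), "sales": 120},
    {"date": date(2024, 1, 15), "sales": 95},
    {"date": date(2024, 2, 10), "sales": 130},
    {"date": date(2024, 4, 2), "sales": 110},
]

# Example time frame: the first quarter of 2024.
first_quarter = within_time_frame(historical, date(2024, 1, 1), date(2024, 3, 31))
print(first_quarter)
```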
Flow Chart
If you are collecting new data:
- Select the right data type
- Determine the time frame
- Decide how you will collect the data
- Decide how much data to collect
If you are using existing data:
- Select the right data type
- Determine the time frame
- Decide where you will get the data
- Decide which data to use
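The two branches of the flow chart can also be written as a small lookup, which may be handy as a checklist in a project script. This is only a sketch; the function name `collection_steps` and its boolean argument are hypothetical.

```python
def collection_steps(collecting_new_data):
    """Return the ordered checklist for the chosen branch of the flow chart."""
    # Both branches start with the same two steps.
    shared = ["Select the right data type", "Determine the time frame"]
    if collecting_new_data:
        return shared + [
            "Decide how you will collect the data",
            "Decide how much data to collect",
        ]
    return shared + [
        "Decide where you will get the data",
        "Decide which data to use",
    ]


for step in collection_steps(collecting_new_data=False):
    print("-", step)
```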