
Understanding Step One
Defining the Problem and/or Goal
What does this mean? — Clearly identify the question you’re trying to answer or the problem you’re aiming to solve using data. This sets the direction for the entire data science process and ensures the work is aligned with a real-world objective.
What to ask. — “What do I want to understand, predict, or improve?” From there, narrow it down to a clear, specific question that data can help answer. That question becomes the foundation for the rest of the project.
When to start step one. — Start defining your data problem or goal when you have a business challenge, decision to make, or curiosity that could be informed by data.

Understanding Step Two
Collecting the Data
What does this mean? — Gather relevant data from available sources to support analysis and address the defined problem or goal.
What to ask. — “What data do I need to answer my question, and where can I get it?” This could involve pulling from internal systems, public sources, surveys, or sensors—anything that provides relevant, trustworthy information.
When to start step two. — Begin collecting data once you’ve clearly defined your problem or goal.

Understanding Step Three
Clean / Pre-Process Data
What does this mean? — Prepare the data by correcting errors, filling in missing values, and formatting it for analysis.
What to ask. — “Is my data accurate, complete, and usable?” Clean data lays the foundation for meaningful results.
When to start step three. — Start cleaning / pre-processing data after it’s been collected. This step involves removing errors, duplicates, and irrelevant information, and formatting everything consistently so it’s ready for analysis.
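
As a rough illustration of the cleaning tasks described above (removing duplicates, formatting consistently, filling in missing values), here is a minimal Python sketch. The records, field names, and the choice to fill missing ages with the median are all assumptions made for this example; the right rules depend on your own data.

```python
from statistics import median

# Hypothetical raw records: one duplicate id, one missing age,
# and city names with inconsistent spacing and capitalization.
raw_records = [
    {"id": 1, "city": " Boston ", "age": 34},
    {"id": 2, "city": "boston", "age": None},
    {"id": 2, "city": "boston", "age": None},  # duplicate of the row above
    {"id": 3, "city": "CHICAGO", "age": 51},
    {"id": 4, "city": "Chicago", "age": 29},
]


def clean(records):
    """Drop repeated ids, normalize city names, fill missing ages with the median."""
    seen_ids = set()
    deduped = []
    for rec in records:
        if rec["id"] in seen_ids:
            continue  # keep only the first occurrence of each id
        seen_ids.add(rec["id"])
        deduped.append(dict(rec))  # copy so the raw input stays untouched

    # Median of the ages that are present (assumed fill rule for this sketch).
    known_ages = [r["age"] for r in deduped if r["age"] is not None]
    fill_age = median(known_ages)

    for rec in deduped:
        rec["city"] = rec["city"].strip().title()
        if rec["age"] is None:
            rec["age"] = fill_age
    return deduped


cleaned = clean(raw_records)
for row in cleaned:
    print(row)
```

Filling with the median is only one option. Depending on the question defined in step one, dropping incomplete rows or flagging them for review may be the safer choice.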

Understanding Step Four
Standardize & Integrate Data Sets
What does this mean? — Ensure consistency across data sets by formatting them uniformly and combining multiple sources into a cohesive structure.
What to ask. — “Are my data sources aligned and speaking the same language?” This makes your analysis more reliable and scalable.
When to start step four. — Move to standardizing and integrating data once it’s been cleaned and you’re working with multiple sources or formats. This step ensures that all datasets use consistent units, labels, and structures so they can be combined and compared accurately.
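
To make "consistent units, labels, and structures" concrete, the sketch below combines two hypothetical inventory sources that disagree on key names, capitalization, and weight units. The source names, fields, and values are invented for illustration; only the pound-to-kilogram factor is a real constant.

```python
KG_PER_LB = 0.45359237  # exact definition of the international pound

# Hypothetical source A: lowercase key, weights already in kilograms.
warehouse_a = [
    {"sku": "A-100", "weight_kg": 2.0},
    {"sku": "B-200", "weight_kg": 0.5},
]

# Hypothetical source B: different key name, lowercase SKUs, weights in pounds.
warehouse_b = [
    {"SKU": "a-100", "weight_lb": 4.4},
    {"SKU": "c-300", "weight_lb": 11.0},
]


def standardize_a(row):
    """Map a source-A row onto the shared schema."""
    return {"sku": row["sku"].upper(), "weight_kg": row["weight_kg"], "source": "a"}


def standardize_b(row):
    """Map a source-B row onto the shared schema, converting pounds to kilograms."""
    return {
        "sku": row["SKU"].upper(),
        "weight_kg": round(row["weight_lb"] * KG_PER_LB, 2),
        "source": "b",
    }


# Once every row uses the same labels and units, the sources can be combined.
combined = [standardize_a(r) for r in warehouse_a] + [standardize_b(r) for r in warehouse_b]
for row in combined:
    print(row)
```

Keeping a "source" field on each row is a design choice for this sketch: it lets you trace any value back to where it came from after the data sets are merged.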

Understanding Step Five
Transform Data
What does this mean? — Modify data to create new features or formats that better support analysis, modeling, or visualization.
What to ask. — “Does my data need to be reshaped or enhanced to support deeper insights?” Transformation makes raw data more meaningful and actionable.
When to start step five. — Begin transforming data when you need to convert it into formats or features better suited for analysis or modeling. This step may involve creating new variables, aggregating data, or applying mathematical functions.
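
The three kinds of transformation mentioned above can be sketched in a few lines of Python. The order records and field names here are hypothetical, and the log transform is just one example of a mathematical function you might apply.

```python
import math
from collections import defaultdict

# Hypothetical cleaned order records.
orders = [
    {"date": "2024-01-15", "quantity": 2, "unit_price": 10.0},
    {"date": "2024-01-20", "quantity": 1, "unit_price": 25.0},
    {"date": "2024-02-03", "quantity": 4, "unit_price": 5.0},
]

for order in orders:
    # Creating a new variable from existing ones.
    order["revenue"] = order["quantity"] * order["unit_price"]
    # Applying a mathematical function: log(1 + x) compresses large values.
    order["log_revenue"] = math.log1p(order["revenue"])
    # Reshaping: pull the "YYYY-MM" month out of the ISO date string.
    order["month"] = order["date"][:7]

# Aggregating: total revenue per month.
monthly_revenue = defaultdict(float)
for order in orders:
    monthly_revenue[order["month"]] += order["revenue"]

print(dict(monthly_revenue))
```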

Understanding Step Six
Mine Data
What does this mean? — Analyze data using statistical methods, algorithms, or machine learning to discover patterns, trends, or insights.
What to ask. — “What insights am I hoping to discover from this data that will inform decisions or strategies?” Mining helps turn data into powerful, predictive, or explanatory tools.
When to start step six. — Start mining data when you’re ready to uncover patterns, relationships, or trends that aren’t immediately obvious. This stage often uses statistical models, machine learning, or AI techniques to extract knowledge from the data.
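
As one small example of a statistical method for finding a relationship, the sketch below fits a least-squares line and computes a Pearson correlation between two variables. The advertising and sales figures are made up for illustration, and real projects would usually rely on a statistics or machine learning library rather than hand-written formulas.

```python
import math
from statistics import mean

# Hypothetical observations: ad spend and sales, both in thousands.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


slope, intercept = fit_line(ad_spend, sales)
r = pearson_r(ad_spend, sales)
print(f"sales is roughly {slope:.2f} * ad_spend + {intercept:.2f} (r = {r:.3f})")
```

A strong correlation like this describes a pattern in the data; it does not by itself show that spending causes sales, which is part of what the evaluation step has to weigh.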

Understanding Step Seven
Evaluate & Interpret Data
What does this mean? — Assess the quality and relevance of the findings, drawing conclusions and identifying actionable insights based on the data analysis.
What to ask. — “Do these results make sense given the original goal, and what decisions can they inform?” Evaluation ensures the insights are valid, actionable, and aligned with your initial objectives.
When to start step seven. — Begin evaluating and interpreting your data when your analysis or model has produced results, and you need to understand what they mean. This stage is about assessing the accuracy, relevance, and implications of your findings.
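
One common way to assess accuracy is to compare a model's error on held-out data against a simple baseline. The sketch below assumes a hypothetical model whose predictions are already available, uses mean absolute error as the metric, and takes a constant baseline value (12.0) that stands in for an average computed from training data. All numbers are invented for illustration.

```python
from statistics import mean

# Hypothetical held-out values and the model's predictions for them.
actual = [10.0, 12.0, 15.0, 11.0]
predicted = [11.0, 12.5, 14.0, 10.0]

# Assumed baseline: always predict a constant taken from the training data.
baseline_value = 12.0


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])


model_mae = mean_absolute_error(actual, predicted)
baseline_mae = mean_absolute_error(actual, [baseline_value] * len(actual))

# A model is only worth interpreting further if it beats the naive baseline.
model_beats_baseline = model_mae < baseline_mae

print(f"model MAE = {model_mae}, baseline MAE = {baseline_mae}")
print("model beats baseline:", model_beats_baseline)
```

Beating a baseline is a minimum check, not a conclusion. Whether an error of this size is acceptable still depends on the goal defined in step one.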