When leveraged correctly, data science enables insights and decisions that can increase profitability, drive operational excellence, and support competitive growth and agility. Using data science does not require an organization to create a formal, dedicated data science team or even make a substantial investment. With eight simple steps, you and your organization can get started today.
Identify & Support Analytical Talent
This step is foundational because you need the right people to execute the plan. Look for people who are entrepreneurial, have the appropriate technical skill set (some knowledge of math, statistics, and programming), and are excellent communicators. As data science matures in your organization, change will be continuous, so the ability to maintain open communication with key stakeholders is important.
Continuously motivating your organization’s analytical talent is also vital. To turn something abstract into something attainable, you need to foster a culture of learning and give your team the time and resources to grow. Building something from the ground up takes sustained motivation, so value even the smallest milestones.
Identify a “Small” Question
Think ‘use case’ and start with a realistic question: one that falls within your line of business and an area you know well. It should also be a question for which you already have, or can obtain, the data needed to build the model.
Further, because unsupervised learning problems are more subjective to evaluate, it is recommended that you start with a supervised learning question (that is, a predictive question with a known outcome to learn from) and graduate to an unsupervised learning question once you are more comfortable with the process.
Obtain, Prepare, and Cleanse Raw Data
This task is not unique to data science and is often the most time-consuming part of the process. Virtually no organization has one clean, centralized system. More likely, there will be disparate systems, information stored in spreadsheets on someone’s computer, and few shared unique identifiers between sources.
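As a rough illustration of what this cleanup can look like, the sketch below joins two made-up exports whose customer identifiers are formatted inconsistently. The column names, sample rows, and helper functions (`normalize_id`, `parse_revenue`, `merge`) are all assumptions invented for this example; real sources will need their own rules.

```python
import csv
import io

# Hypothetical extracts from two systems; every name and value here is made up.
CRM_EXPORT = """customer_id,name,region
 C001 ,Acme Corp,North
c002,Beta LLC,
C003,Gamma Inc,South
C003,Gamma Inc,South
"""

BILLING_EXPORT = """cust,annual_revenue
C001,"12,500"
C002,8300
C004,4100
"""


def normalize_id(raw):
    """Trim whitespace and upper-case so IDs from different systems line up."""
    return raw.strip().upper()


def parse_revenue(raw):
    """Turn '12,500' into 12500.0; return None when the value is blank."""
    cleaned = raw.replace(",", "").strip()
    return float(cleaned) if cleaned else None


def load_customers(text):
    customers = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = normalize_id(row["customer_id"])
        # Later rows with the same ID overwrite earlier ones, which removes
        # exact duplicates. Blank regions are stored as None.
        customers[key] = {
            "name": row["name"].strip(),
            "region": row["region"].strip() or None,
        }
    return customers


def load_revenue(text):
    reader = csv.DictReader(io.StringIO(text))
    return {normalize_id(r["cust"]): parse_revenue(r["annual_revenue"]) for r in reader}


def merge(customers, revenue):
    """Left join on the normalized ID; also report billing IDs with no CRM match."""
    merged = [
        {"customer_id": key, **fields, "annual_revenue": revenue.get(key)}
        for key, fields in sorted(customers.items())
    ]
    unmatched = sorted(set(revenue) - set(customers))
    return merged, unmatched


merged, unmatched = merge(load_customers(CRM_EXPORT), load_revenue(BILLING_EXPORT))
for record in merged:
    print(record)
print("Billing IDs with no CRM record:", unmatched)
```

Reporting the unmatched identifiers, rather than silently dropping them, is a deliberate choice here: gaps between sources are often the first thing worth raising with the people who own those systems.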
Perform Exploratory Data Analysis
After the data is ‘usable’ and standardized, the next step is to evaluate it. Exploratory data analysis helps identify trends and anomalies and provides insight into which features may be ‘telling’. Visualizations such as boxplots, histograms, scatterplots, or even bar charts can help with the analysis.
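Before reaching for a plotting tool, the numbers behind a boxplot can be computed directly. The following is a minimal sketch using only Python's standard library; the `orders` data is invented, and the 1.5 x IQR rule used to flag anomalies is one common convention, not the only option.

```python
import statistics

# Made-up monthly order counts; 410 is a deliberately planted anomaly.
orders = [102, 98, 110, 95, 105, 99, 410, 101, 97, 108, 103, 100]


def five_number_summary(values):
    """Return the statistics a boxplot draws: min, quartiles, and max."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


def iqr_outliers(values, k=1.5):
    """Flag values more than k interquartile ranges outside the quartiles."""
    summary = five_number_summary(values)
    iqr = summary["q3"] - summary["q1"]
    low = summary["q1"] - k * iqr
    high = summary["q3"] + k * iqr
    return [v for v in values if v < low or v > high]


summary = five_number_summary(orders)
outliers = iqr_outliers(orders)
print(summary)
print("Possible anomalies:", outliers)
```

A flagged value is a prompt for a question (data entry error, one-off event, or real signal?), not an automatic reason to delete the row.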
Perform Feature Engineering
Feature engineering is essentially creating new features from existing ones. Examples include breaking out a date into separate ‘month’, ‘day’, and ‘year’ features, or creating a moving average. Creating features is not difficult, but thinking through which additional features provide the most value can be time-consuming. Community resources and your organization’s experts can be extremely insightful here: their experience often sheds light on the most helpful ways to manipulate your available data to extract further insights.
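The two examples mentioned above, splitting a date and computing a moving average, can be sketched in a few lines of standard-library Python. The function names and sample values are assumptions made for illustration, and the trailing window (which looks only at current and earlier values) is just one of several ways to define a moving average.

```python
from datetime import date


def date_parts(d):
    """Break a date into separate year, month, and day features."""
    return {"year": d.year, "month": d.month, "day": d.day}


def moving_average(values, window):
    """Trailing moving average; positions without a full window get None."""
    averages = []
    for i in range(len(values)):
        if i + 1 < window:
            averages.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            averages.append(sum(chunk) / window)
    return averages


# Made-up daily sales records.
sale_dates = [date(2024, 3, 13), date(2024, 3, 14), date(2024, 3, 15)]
sales = [10, 20, 30, 40, 50]

parts = [date_parts(d) for d in sale_dates]
smoothed = moving_average(sales, window=3)
print(parts)
print(smoothed)
```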
Determine Algorithm & Build Model
Once you have a defined question, someone to build the model, the data, and the features, the next step is to determine which algorithm to use (e.g. linear regression, random forest). From there, it is time to build the model.
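To make "building the model" concrete, here is a minimal sketch of the simplest case: a one-feature linear regression fitted by ordinary least squares, written in plain Python. The `ad_spend` and `revenue` figures are invented, and in practice you would more likely rely on an established statistics or machine learning library than hand-written math.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training data: advertising spend and resulting revenue.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [3.1, 4.9, 7.2, 8.8, 11.0]

model = fit_line(ad_spend, revenue)
print("slope, intercept:", model)
print("predicted revenue at spend 6.0:", predict(model, 6.0))
```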
Perform Feature Selection & Parameter Tuning
More features do not always result in a stronger model. Extraneous features can bog the model down and lead it to over-fit, tying its predictions to data that does not actually affect what you are trying to predict; this problem is often discussed under the heading of the curse of dimensionality. Trial and error, along with general critical thinking, will help you determine which features and parameters best inform the model while weeding out the noise.
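The trial-and-error described here can be made systematic. The sketch below shows two simple techniques under stated assumptions: a correlation filter that keeps only features related to the target, and a small search that picks a parameter (the `k` of a toy nearest-neighbor model) by its error on held-out validation data. All data, names, and the 0.5 threshold are invented for illustration; correlation filtering only catches linear relationships, so treat it as a first pass.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def select_features(features, target, threshold=0.5):
    """Keep features whose absolute correlation with the target meets the threshold."""
    return [name for name, values in features.items()
            if abs(pearson(values, target)) >= threshold]


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def validation_mse(train, validation, k):
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in validation]
    return sum(errors) / len(errors)


def tune_k(train, validation, candidates):
    """Return the candidate k with the lowest validation error."""
    return min(candidates, key=lambda k: validation_mse(train, validation, k))


# Made-up data: one informative feature and one that is pure noise.
features = {
    "ad_spend": [1, 2, 3, 4, 5, 6],
    "office_temp": [21, 19, 22, 20, 21, 20],
}
target = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
kept = select_features(features, target)

train = [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.2), (5, 9.8), (6, 12.1)]
validation = [(2.4, 4.8), (4.6, 9.2)]
best_k = tune_k(train, validation, candidates=[1, 2, 3, 6])

print("Features kept:", kept)
print("Best k:", best_k)
```

Scoring each candidate on data the model did not train on is the important design choice: a parameter chosen by training error alone tends to reward over-fitting.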
Evaluate Model Outputs & Communicate Results
Once you finish building the model and evaluate its output, there are two potential paths:
- Your model is great! Time to celebrate and move on to round two.
- You determine that you need more data, that your question is no longer valid, that you need to try a different type of model, or that you need to add new features. This is an iterative process, so do not be discouraged. The more you perform these steps, the better and more proficient both you and your models will get at answering critical business questions.
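One simple way to decide which of the two paths you are on is to compare the model's error on held-out data against a naive baseline, such as always predicting the average of the training targets. The sketch below assumes root mean squared error (RMSE) as the metric; the function names and all numbers, including the model predictions, are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def compare_to_baseline(train_targets, test_actual, test_predicted):
    """Compare model error with a baseline that always predicts the training mean."""
    baseline_value = sum(train_targets) / len(train_targets)
    baseline_rmse = rmse(test_actual, [baseline_value] * len(test_actual))
    model_rmse = rmse(test_actual, test_predicted)
    return model_rmse < baseline_rmse, model_rmse, baseline_rmse


# Made-up numbers: training targets, held-out actuals, and model predictions.
train_targets = [3.1, 4.9, 7.2, 8.8, 11.0]
test_actual = [13.0, 15.2]
test_predicted = [12.91, 14.88]

better, model_rmse, baseline_rmse = compare_to_baseline(
    train_targets, test_actual, test_predicted
)
print("Model beats baseline:", better)
print("Model RMSE:", round(model_rmse, 3), "Baseline RMSE:", round(baseline_rmse, 3))
```

If the model cannot beat a baseline this simple, that points toward the second path: more data, new features, a different model, or a revised question.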
Make sure your team documents its work as you move through these steps. A repeatable, documented process can significantly expedite future project timelines and help onboard new talent as your department grows. Just like model building, documentation is an iterative process that is improved and refined over time.
In conclusion, a substantial investment is not required to begin embedding data science into your organization. With a little drive and intellectual curiosity, you can embrace data science by leveraging your existing analytical talent and get started today.