Credit: Data Science Central
Machine learning (ML) has become a top strategic priority for almost every organization: whoever adopts it early and quickly builds the corporate capability will gain a competitive advantage in the market.
However, because ML is still a relatively new field and data science skills are scarce at many companies, how to move at optimal speed without disaster or wasted time has become an important question for every executive and senior manager.
This article shares five best practices drawn from our own experience to help you run your project with fewer pitfalls and setbacks:
1. Be very selective about the first ML project.
The selection of the initial ML project is extremely important. A good selection gets the project halfway to success. The project should have a well-defined scope, a clear and measurable outcome, and a realistic timeline of less than three months.
Avoid large, “long-term” vision projects at the beginning, even if they can be broken down into smaller phases. The problem with this type of project is that, because of its scale, the objective is usually very high-level and the scope of the smaller phases can also be vague and ill-defined at the start. This can lead to unexpected results and, given the longer timeline, to a dilemma over whether to continue or stop the project.
We highly recommend starting with a few candidate projects and taking the time to choose the right one. A careful choice makes the first project far more likely to succeed and proceed smoothly, with less wasted time and fewer wasted resources, while laying a solid foundation for the organization to move forward on machine learning with more confidence and clarity.
2. Leverage external expertise to perform the proof-of-concept (POC) and select vendors.
In many cases, an organization has no prior experience with machine learning. As a result, executives and managers cannot make sound judgments on either vendor selection or hiring for the ML project, no matter how enthusiastic they are.
Our experience has taught us that if this stage is not done correctly, it can waste a lot of time and lead to reinventing the wheel. We recommend first hiring a consulting firm with experience in machine learning projects, which will work closely with the company on the vendor assessment and POC project planning.
Today there are many consulting firms that can do this. The criteria below help make a good selection:
- They should have prior successful experience with similar use cases.
- They should not sell their own ML software or solutions, since firms that do are likely to give biased recommendations.
- They should have good relationships with multiple software and solution vendors.
- They should have staff who are strong in data science and project management.
Many consulting firms have also teamed up with cloud vendors, which makes the POC easier, faster, and more cost-effective.
3. Plan the POC carefully so that it completes quickly while delivering realistic results.
Once the consulting firm is chosen, the next stage is the proof-of-concept (POC). This is another critical stage. In our case, we ran many POCs during the initial two years, and the majority of them ended up not being conclusive enough for the company to move on. The top lessons learned are listed below:
- The sample data was too small or not representative from the beginning; no matter how excellent the ML result was, the result was still not convincing.
- The success criteria were not defined at the beginning, and those who should validate the ML results were not involved until the very end.
- The investment was not enough to allow a POC test with a realistic sample size and more methodologies.
- The team focused on a particular algorithm too early, leaving too few options to choose from at the end.
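The first lesson, a sample that is too small or not representative, can be checked before any modeling starts. The following is a minimal sketch of one such check, not a method from our projects: it compares how often each category appears in the sample versus the full data set. The segment names, the row counts, and the `MIN_ROWS` and `MAX_GAP` thresholds are all hypothetical and would need to be agreed on by the project team.

```python
from collections import Counter


def category_shares(values):
    """Return the fraction of rows in each category (assumes a non-empty list)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def max_share_gap(population, sample):
    """Largest absolute difference in category share between population and sample."""
    pop = category_shares(population)
    smp = category_shares(sample)
    categories = set(pop) | set(smp)
    return max(abs(pop.get(c, 0.0) - smp.get(c, 0.0)) for c in categories)


# Hypothetical data: customer segments in the full data set and in the POC sample.
population = ["retail"] * 700 + ["business"] * 300
sample = ["retail"] * 95 + ["business"] * 5

# Hypothetical thresholds chosen by the project team, not industry standards.
MIN_ROWS = 50
MAX_GAP = 0.10

gap = max_share_gap(population, sample)
big_enough = len(sample) >= MIN_ROWS
representative = gap <= MAX_GAP

print(f"rows={len(sample)}, largest share gap={gap:.2f}")
print("sample is usable" if big_enough and representative else "sample needs rework")
```

A check like this only covers one dimension at a time; a real POC would likely repeat it for every attribute the business validators consider important.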
To make the POC successful and useful, it must be run as a formal project that starts with a project scope and success criteria that are clearly defined and documented. In addition, all the key project team members should participate from the very beginning, including business analysts and the internal people who can validate and approve the ML results.
Lastly, testing with different algorithms and different vendors should be baked into the project plan, so that the organization can make the final decision based on several options with clear pros and cons.
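One way to keep the final decision tied to the documented success criteria is to record those criteria as data before the POC starts and then score every candidate against them. The sketch below is illustrative only: the `SuccessCriterion` class, the metric names, the thresholds, and the candidate results are all hypothetical and do not come from any real POC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriterion:
    """A measurable target agreed on before the POC starts (hypothetical structure)."""
    metric: str
    threshold: float
    higher_is_better: bool = True

    def is_met(self, value):
        if self.higher_is_better:
            return value >= self.threshold
        return value <= self.threshold


def evaluate_candidates(criteria, results):
    """Map each candidate name to the list of criteria it fails.

    `results` maps candidate name -> {metric name: measured value}.
    A metric that was never measured counts as a failure, so gaps stay visible.
    """
    report = {}
    for name, metrics in results.items():
        report[name] = [
            c.metric
            for c in criteria
            if c.metric not in metrics or not c.is_met(metrics[c.metric])
        ]
    return report


# Criteria documented up front (illustrative values).
criteria = [
    SuccessCriterion("recall", 0.80),
    SuccessCriterion("training_hours", 4.0, higher_is_better=False),
]

# Made-up measurements for three candidate algorithm/vendor combinations.
results = {
    "vendor_a_gradient_boosting": {"recall": 0.84, "training_hours": 3.0},
    "vendor_b_neural_net": {"recall": 0.88, "training_hours": 9.5},
    "in_house_logistic_regression": {"recall": 0.76},
}

report = evaluate_candidates(criteria, results)
for name, failed in report.items():
    print(name, "PASS" if not failed else "FAIL: " + ", ".join(failed))
```

Listing the failed criteria per candidate, rather than a single score, gives the validators the pros and cons of each option side by side.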
4. Build up internal centralized data science expertise and capability as an investment.
After the POC is completed with promising results, the team should have a good idea of how to proceed and can now decide on:
- How many data scientists would be needed
- Future team and organization structure
- Future architecture for machine learning
Note that this is a great opportunity to establish a centralized machine learning capability for the whole organization and to avoid silos within the company from the beginning. The budget should be reviewed by the top executives in the same room, and funding should be granted in support of a long-term vision. Otherwise, the project and budget can still be killed or delayed because they are perceived as “too much cost” or not enough return on investment. At this stage, the goal is investment in future growth or in the large potential for cost savings.
5. Scale up with complementary resources and establish a “citizen data science” culture for larger and sustainable implementations.
After several machine learning projects have been completed successfully and the core competency has been built, the organization is ready to plan larger systems and more projects that leverage machine learning methodologies. At this stage, there are two key areas that need to be recognized and transformed:
- Data engineers, business analysts, and project managers need to work closely with data scientists for a successful large-scale, systematic implementation that accommodates the special features and quick cycles of machine learning algorithms. Data scientists are good at developing and testing algorithms and prototypes in quick cycles but typically do not have expertise in change management and system programming.
- Given the rapid advances in ML libraries, software, and platforms, it has become feasible to train internal staff in data science skills and to grow a population of “citizen data scientists” across the organization. In other words, it is important for an organization to build a machine learning culture and develop talent internally as a complement to external hiring.
To initiate and develop the core competency of machine learning within a company is not as simple as hiring data scientists and kicking off projects. These two factors are necessary but not sufficient for success.
It is imperative that organizations address these five key areas. They are required steps toward building scalable machine learning capabilities and sustainable core competencies that can drive continued revenue and profit growth.