A lot of companies struggle to bring their data science projects to production, mainly because of a huge knowledge gap: data scientists understand model building well but lack productionization skills. The simple reason is that these skills are not taught in YouTube videos and are barely touched by data science courses or the Kaggle learning method.

The objective of this newsletter is to share my learnings from the various deployments I have done.

Tech rule of deployment: Fewer dependencies ∝ Faster deployment

Let’s understand data science model deployment with a real problem. A friend of mine called me some time back, requesting that I help him with his use case and deploy the model to production.

We discussed the problem for an hour or two to understand the constraints.

Discussed Constraints Summary:

  • Data source is Elasticsearch (ES is updated very frequently with new entries)

  • Real-time or near-real-time inference (a delay of up to 10 minutes is acceptable)

  • Low on Budget

  • Minimum Failure Rate with Fallbacks

  • Alerting system in case of any failure

     

After understanding the constraints and the problem he was trying to solve, I proposed an architecture (see the diagram below) that runs near-real-time batch inference every 5 minutes, falls back to the last model update (the backend fetches the previous update's results from S3), and uses a simple Slack webhook alert route.

After about two weeks, he called to share that the solution worked well 🥳. The above is a tried and tested, low-budget, low-maintenance model productionisation design.

 

Let’s reason!

Here is how the architecture design above solves for the constraints.

 

Alternative: SageMaker Batch Inference. This could have been a good option, but we did not go for it because he already had an EC2 instance running 24x7 that was underutilized. Beyond that, for a 5-minute near-real-time inference cycle it is safer not to use SageMaker batch inference, and the SageMaker real-time inference option is costlier.

 

Dev familiarity is another factor that is super important when building an architecture design; EC2 has always been a playground for him.

 

Another learning point: if you are running a model with an update frequency of 1 day and want to run the compute on a fresh EC2 machine on every run, you can still use the above architecture. You can follow the AWS blog “How do I stop and start Amazon EC2 instances at regular intervals using Lambda?” This helps save on EC2 cost by running the machine only during the compute window.
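For illustration, here is a minimal sketch of the start/stop Lambda handlers that such a scheduled rule would invoke; the instance ID and region are placeholders, not the actual setup.

```python
# Minimal sketch of the Lambda handlers a scheduled rule (e.g. EventBridge) would
# trigger to start the EC2 instance before the compute window and stop it after.
# The instance ID and region are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
INSTANCE_IDS = ["i-0123456789abcdef0"]  # hypothetical instance ID

def start_handler(event, context):
    # Runs just before the daily compute window
    ec2.start_instances(InstanceIds=INSTANCE_IDS)

def stop_handler(event, context):
    # Runs once the compute window is over
    ec2.stop_instances(InstanceIds=INSTANCE_IDS)
```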

Why Airflow?

Apache Airflow is an open-source scheduler to manage your regular jobs. It is an excellent tool to organize, execute, and monitor your workflows so that they work seamlessly.

 

For the above case, Airflow would trigger the data fetch from ES and the prediction task at a defined interval - every 5 minutes. To write the schedule expression, you can use crontab.guru (this expression-writing tool is excellent; I often use it when writing an Airflow task).
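To make the scheduling concrete, here is a minimal Airflow DAG sketch for the 5-minute loop; fetch_from_es and run_inference are hypothetical placeholders for the actual ES query and prediction code.

```python
# Minimal Airflow DAG sketch for the 5-minute batch-inference loop.
# fetch_from_es() and run_inference() are hypothetical helpers standing in for
# the ES query and the model prediction + S3 upload steps.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def fetch_from_es(**context):
    ...  # query Elasticsearch for entries added since the last run

def run_inference(**context):
    ...  # load the model from S3, predict, overwrite results on S3

with DAG(
    dag_id="near_real_time_inference",
    start_date=datetime(2022, 1, 1),
    schedule_interval="*/5 * * * *",  # every 5 minutes (see crontab.guru)
    catchup=False,
    default_args={"retries": 1, "retry_delay": timedelta(minutes=1)},
) as dag:
    fetch = PythonOperator(task_id="fetch_from_es", python_callable=fetch_from_es)
    predict = PythonOperator(task_id="run_inference", python_callable=run_inference)
    fetch >> predict
```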

Other Reasoning Pointers on the Architecture:

  • The model is loaded from S3 into memory for inferencing so that local disk storage is not occupied by model weights (see the sketch after this list). Advice: I have seen many data scientists keep the model file on machine storage, and when something goes wrong with EC2 all files get deleted and, with them, all their efforts. Always keep the model on S3 as a backup.

     

  • Output is overwritten on S3 and the backend picks up model results from S3; S3 storage is reliable and among the cheapest options on AWS.

     

  • For alerts, Slack is the best option: it is always on during office hours, and when you are unavailable your team members still have visibility on alerts. You can also add Airflow failure and retry emails, but I prefer alerts via webhooks on office communication tools: Slack/Flock/Teams.
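Here is a minimal sketch of the in-memory model load from S3 and the Slack webhook alert mentioned above; the bucket, key, webhook URL, and joblib serialization format are assumptions, not the exact setup.

```python
# Minimal sketch: stream the model from S3 into memory (no local disk copy)
# and post a Slack webhook alert if anything fails.
# Bucket, key, and webhook URL are placeholders; joblib is an assumed format.
import io
import boto3
import joblib
import requests

s3 = boto3.client("s3")
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def load_model(bucket="my-model-bucket", key="models/latest/model.joblib"):
    # Download the model object into an in-memory buffer instead of local disk
    buffer = io.BytesIO()
    s3.download_fileobj(bucket, key, buffer)
    buffer.seek(0)
    return joblib.load(buffer)

def alert(message):
    # Simple Slack webhook alert used on any failure in the pipeline
    requests.post(SLACK_WEBHOOK_URL, json={"text": message}, timeout=10)

try:
    model = load_model()
except Exception as exc:
    alert(f"Model load from S3 failed: {exc}")
    raise
```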

 

I know there are better alternatives to the Airflow component, like Dagster or Prefect; similarly, there are newer/comparable alternatives for the other components in the architecture. But never forget the factor of dev familiarity when choosing tools for your model pipelines. The older the tool, the better the support, and one cannot really downplay that.

“WHAT IF WE WANT TO DEPLOY A REAL-TIME MODEL IN PRODUCTION?” — my friend asked, and readers here must be thinking the same.

We discussed the constraints, summarized below:

  • A pool of items is fetched from ES
  • Real-time model output with a timeout of 100ms is required
  • On timeout, the model falls back to a cached version
  • Logging is required

Below is the real-time model productionisation architecture design 👇

Reasoning and Other Takeaways on the Real-Time Architecture

  • I have shown the dockerized model deployed on EC2. It can be deployed on ECS/SageMaker as well; I would leave that choice to you.
  • For caching, my personal preference is Redis.
  • Kibana is used for logging responses and info logs within the model service.
  • For model serving, you can use MLflow, BentoML, FastAPI, Cortex, etc. I prefer BentoML: in under 10 minutes you can serve your ML model over an HTTP API endpoint and build a Docker image that is ready to be deployed in production (see the serving sketch after this list).
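Here is a minimal sketch of the 100ms timeout-with-cache-fallback pattern, using FastAPI for serving (one of the options above) and Redis for the cache; predict(), the key names, and the hosts are hypothetical placeholders rather than the actual service.

```python
# Minimal sketch of the 100ms timeout + cached-fallback pattern with FastAPI and
# Redis. predict() is a hypothetical stand-in for the real model call.
import asyncio
import json

import redis
from fastapi import FastAPI

app = FastAPI()
cache = redis.Redis(host="localhost", port=6379)
TIMEOUT_SECONDS = 0.1  # 100ms budget for the model

async def predict(item_id: str) -> dict:
    ...  # real model inference goes here

@app.get("/score/{item_id}")
async def score(item_id: str):
    try:
        result = await asyncio.wait_for(predict(item_id), timeout=TIMEOUT_SECONDS)
        # Refresh the cache with the latest successful prediction
        cache.set(f"prediction:{item_id}", json.dumps(result))
        return {"source": "model", "result": result}
    except asyncio.TimeoutError:
        # Timeout hit: fall back to the cached version of the prediction
        cached = cache.get(f"prediction:{item_id}")
        return {"source": "cache", "result": json.loads(cached) if cached else None}
```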

Conclusion

I hope that data science model productionization architecture design is no longer an out-of-syllabus question. There is much more to it than we can cover here, but do spend some time racking your brain on it!
