An introduction to this modern gradient-boosting library

Photo by Manja Vitolic on Unsplash

If you’ve worked as a data scientist, competed in Kaggle competitions, or even browsed data science articles on the internet, there’s a high chance that you’ve heard of XGBoost. Even today, it is often the go-to algorithm for many Kagglers and data scientists working on general machine learning tasks.

While XGBoost is popular for good reasons, it does have some limitations, which I mentioned in my article below.

Odds are you’ve heard of XGBoost, but have you ever heard of CatBoost? CatBoost is another open-source gradient boosting library that was created by researchers at Yandex. While it might be slower…


Using this Python library to send model training updates.

Photo by Sara Kurfeß on Unsplash

Imagine this scenario: you’re working on a deep learning project and have just started a time-consuming training job on a GPU. Based on your estimates, the job will take about fifteen hours to finish. Obviously, you don’t want to watch your model train for that long, but you still want to know when it finishes training while you’re away from your computer or working on a different task.

Recently, HuggingFace released a Python library called knockknock that allows developers to set up and receive notifications when their models are done training.
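
The general idea of being notified when a long job ends can be sketched in plain Python. This is not knockknock's API: the `notify_when_done` decorator, the `train_model` placeholder, and the list used as a notification channel are all hypothetical names chosen for illustration, and a real setup would send an email or chat message instead of appending to a list.

```python
import functools
import time
from typing import Callable, List


def notify_when_done(send: Callable[[str], None]):
    """Wrap a function so that `send` receives a status message when it ends.

    `send` is any callable taking one string (hypothetical notification hook).
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                # Report the crash, then re-raise so the caller still sees it.
                elapsed = time.monotonic() - start
                send(f"{func.__name__} crashed after {elapsed:.1f}s: {exc!r}")
                raise
            elapsed = time.monotonic() - start
            send(f"{func.__name__} finished in {elapsed:.1f}s")
            return result
        return wrapper
    return decorator


# Stand-in notification channel: messages are collected in a list.
sent_messages: List[str] = []


@notify_when_done(sent_messages.append)
def train_model(epochs: int) -> float:
    """Placeholder for a long training job; halves a dummy loss each epoch."""
    loss = 1.0
    for _ in range(epochs):
        loss *= 0.5
    return loss


final_loss = train_model(3)
print(sent_messages)
```

Wrapping the training function rather than editing its body keeps the notification logic separate from the model code, so the same decorator can be reused on any long-running function.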


How you can use this powerful API for analyzing articles on the web.

Image by the author.

In the Information Age, we have a huge amount of information available to us at our fingertips. The internet is so large that actually estimating its size is a complex task. When it comes to information, our problem is not the absence of information, but rather making sense of the vast amount of information available to us.

What if you could automatically sift through hundreds of web pages and gather the most important points and keywords without having to read everything? This is where TLDR comes into play!

TLDR (too long, didn’t read) is an API that I created for…


How to deploy your ML models quickly with this API-building tool.

Photo by toine G on Unsplash

Knowing how to integrate machine learning models into usable applications is an important skill for data scientists. In my previous article linked below, I demonstrated how you can quickly and easily build web apps to showcase your models with Streamlit.

However, what if you want to integrate your machine learning model into a larger software solution instead of a simple standalone web application? What if you are working alongside a software engineer who is building a large application and needs to access your model through a REST API? This is exactly where FastAPI comes into play.

FastAPI is a Python web…


The quickest way to embed your models into web apps.

A stream in a mountain landscape with trees.
Photo by Tom Gainor on Unsplash

If you’re a data scientist or a machine learning engineer, you are probably reasonably confident in your ability to build models to solve real-world business problems. But how good are you at front-end web development? Can you build a visually appealing web application to showcase your models? Chances are, you may be a Python specialist, but not a front-end JavaScript expert.

But thankfully, you don’t have to be one! Streamlit is a Python framework that makes it very easy for machine learning and data science practitioners to build web apps in pure Python. …


Getting Started

Train, visualize, evaluate, interpret, and deploy models with minimal code

Photo by Armando Arauz on Unsplash

When we approach supervised machine learning problems, it can be tempting to just see how a random forest or gradient boosting model performs and stop experimenting if we are satisfied with the results. What if you could compare many different models with just one line of code? What if you could reduce each step of the data science process from feature engineering to model deployment to just a few lines of code?

This is exactly where PyCaret comes into play. PyCaret is a high-level, low-code Python library that makes it easy to compare, train, evaluate, tune, and deploy machine learning…


Understanding the hype behind this language model that generates human-like text

Photo by Patrick Tomasso on Unsplash

GPT-3 (Generative Pre-trained Transformer 3) is a language model that was created by OpenAI, an artificial intelligence research laboratory in San Francisco. The 175-billion parameter deep learning model is capable of producing human-like text and was trained on large text datasets with hundreds of billions of words.

“I am open to the idea that a worm with 302 neurons is conscious, so I am open to the idea that GPT-3 with 175 billion parameters is conscious too.” — David Chalmers

Since last summer, GPT-3 has made headlines, and entire startups have been created using this tool. However, it’s important to…


Using data from high energy collisions to detect new particles

Photo by Yulia Buchatskaya on Unsplash

An interesting branch of physics that is still being researched and developed today is the study of subatomic particles. Scientists at particle physics laboratories around the world use particle accelerators to smash particles together at high speeds in the search for new particles. Finding new particles involves distinguishing events of interest (signal processes) from background processes.

The HEPMASS Dataset, which is publicly available in the UCI Machine Learning Repository, contains data from Monte Carlo simulations of 10.5 million particle collisions. The dataset contains labeled samples with 27 normalized features and a mass feature for each particle collision. …


Using Spotify’s data to generate music recommendations.

Photo by Marcela Laskoski on Unsplash

Have you ever wondered how Spotify recommends songs and playlists based on your listening history? Do you wonder how Spotify manages to find songs that sound similar to the ones you’ve already listened to?

Interestingly, Spotify has a web API that developers can use to retrieve audio features and metadata about songs such as the song’s popularity, tempo, loudness, key, and the year in which it was released. We can use this data to build music recommendation systems that recommend songs to users based on both the audio features and the metadata of the songs that they have listened to.
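
One simple way to turn such features into recommendations is content-based filtering: average the feature vectors of the songs a user has listened to, then rank the remaining songs by cosine similarity to that average. The sketch below assumes this approach (the article's own method is not shown here) and does not call the Spotify API. The song names and the three feature values per song are made up for illustration and are assumed to be already scaled to a comparable range.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat an all-zero vector as similar to nothing.
    return dot / (norm_a * norm_b)


def recommend(catalog: Dict[str, List[float]],
              listened: List[str], k: int = 2) -> List[str]:
    """Return the k unheard songs closest to the user's average song."""
    history = [catalog[name] for name in listened]
    # Column-wise mean of the listened songs forms the user profile.
    profile = [sum(column) / len(history) for column in zip(*history)]
    candidates = [name for name in catalog if name not in listened]
    ranked = sorted(
        candidates,
        key=lambda name: cosine_similarity(profile, catalog[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical catalog: each song has three made-up, pre-scaled features.
catalog = {
    "song_a": [0.9, 0.8, 0.1],
    "song_b": [0.8, 0.9, 0.2],
    "song_c": [0.1, 0.2, 0.9],
    "song_d": [0.85, 0.75, 0.15],
}

picks = recommend(catalog, listened=["song_a"], k=2)
print(picks)
```

Cosine similarity compares the direction of the feature vectors rather than their magnitude, which is why the features need to be on comparable scales before they are combined.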


Building movie recommender systems with deep learning.

Spotlight on a stage.
Photo by Nick Fewings on Unsplash

In my previous article, I demonstrated how to build shallow recommender systems based on techniques such as matrix factorization using Surprise.

But what if you want to build a recommender system that uses techniques that are more sophisticated than simple matrix factorization? What if you want to build recommender systems with deep learning? What if you want to use a user’s viewing history to predict the next movie that they will watch?

This is where Spotlight, a Python library that uses PyTorch to create recommender systems with deep learning, comes into play. …

Amol Mavuduru

Software Engineer, Former Researcher, and Aspiring Data Scientist
