Getting Started

Train, visualize, evaluate, interpret, and deploy models with minimal code

Photo by Armando Arauz on Unsplash

When we approach supervised machine learning problems, it can be tempting to just see how a random forest or gradient boosting model performs and stop experimenting if we are satisfied with the results. What if you could compare many different models with just one line of code? What if you could reduce each step of the data science process from feature engineering to model deployment to just a few lines of code?

This is exactly where PyCaret comes into play. PyCaret is a high-level, low-code Python library that makes it easy to compare, train, evaluate, tune, and deploy machine learning…

Understanding the hype behind this language model that generates human-like text

Photo by Patrick Tomasso on Unsplash

GPT-3 (Generative Pre-trained Transformer 3) is a language model that was created by OpenAI, an artificial intelligence research laboratory in San Francisco. The 175-billion parameter deep learning model is capable of producing human-like text and was trained on large text datasets with hundreds of billions of words.

“I am open to the idea that a worm with 302 neurons is conscious, so I am open to the idea that GPT-3 with 175 billion parameters is conscious too.” — David Chalmers

Since last summer, GPT-3 has made headlines, and entire startups have been created using this tool. However, it’s important to…

Using data from high energy collisions to detect new particles

Photo by Yulia Buchatskaya on Unsplash

An interesting branch of physics that is still being researched and developed today is the study of subatomic particles. Scientists at particle physics laboratories around the world use particle accelerators to smash particles together at high speeds in the search for new particles. Finding new particles involves distinguishing events of interest (signal processes) from background processes.

The HEPMASS dataset, which is publicly available in the UCI Machine Learning Repository, contains data from Monte Carlo simulations of 10.5 million particle collisions. Each labeled sample has 27 normalized features and a mass feature for the corresponding collision. …
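In the same spirit, the signal-vs-background task can be framed as binary classification. The sketch below uses synthetic stand-in data (not the real 10.5-million-row UCI dataset, which is far too large to download here); only the feature count (27) mirrors the description above.

```python
# Hedged sketch: signal-vs-background classification in the style of the
# HEPMASS task, on synthetic stand-in data with 27 features.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# y = 1 marks a signal process, y = 0 a background process.
X, y = make_classification(n_samples=2000, n_features=27, n_informative=10,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

clf = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
```

On the real dataset the same pipeline applies, with the normalized features as inputs and the signal/background label as the target.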

Using Spotify’s data to generate music recommendations.

Photo by Marcela Laskoski on Unsplash

Have you ever wondered how Spotify recommends songs and playlists based on your listening history? Do you wonder how Spotify manages to find songs that sound similar to the ones you’ve already listened to?

Interestingly, Spotify has a web API that developers can use to retrieve audio features and metadata about songs such as the song’s popularity, tempo, loudness, key, and the year in which it was released. We can use this data to build music recommendation systems that recommend songs to users based on both the audio features and the metadata of the songs that they have listened to.
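A minimal sketch of the idea: once audio features have been retrieved (in practice you would fetch them with a client such as spotipy; the song names and feature values below are made up for illustration), a simple content-based recommender can rank songs by cosine similarity of their scaled feature vectors.

```python
# Hedged sketch of a content-based recommender over audio features of the
# kind the Spotify Web API exposes (tempo, loudness, key, popularity).
# All songs and feature values here are invented placeholders.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity

songs = ["Song A", "Song B", "Song C", "Song D"]
# rows: songs; columns: tempo (BPM), loudness (dB), key, popularity
features = np.array([
    [120.0,  -5.0, 0, 80],
    [122.0,  -5.5, 0, 75],
    [ 60.0, -20.0, 7, 10],
    [118.0,  -6.0, 1, 70],
])

scaled = StandardScaler().fit_transform(features)
sim = cosine_similarity(scaled)

def recommend(song_index, k=2):
    """Return the k songs most similar to the given song."""
    order = np.argsort(sim[song_index])[::-1]
    order = [i for i in order if i != song_index]
    return [songs[i] for i in order[:k]]
```

Calling `recommend(0)` returns the two songs closest to "Song A" in feature space, which is the core of a feature-based recommendation system.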

Building movie recommender systems with deep learning.

Spotlight on a stage.
Photo by Nick Fewings on Unsplash

In my previous article, I demonstrated how to build shallow recommender systems based on techniques such as matrix factorization using Surprise.

But what if you want to build a recommender system that uses techniques that are more sophisticated than simple matrix factorization? What if you want to build recommender systems with deep learning? What if you want to use a user’s viewing history to predict the next movie that they will watch?

This is where Spotlight, a Python library that uses PyTorch to create recommender systems with deep learning, comes into play. …

A step-by-step approach to getting started and developing your skills in this rapidly changing field.

Photo by Myriam Jessier on Unsplash

For several years, Data Scientist was ranked as the best job in America by Glassdoor. Today it no longer holds the top spot in job rankings, but it still ranks near the top of the list. It’s no secret that data science is a broad and rapidly growing field, especially as advances in artificial intelligence push the limits of what we previously believed was possible.

If you are reading this article, you probably want to learn data science or get better at data science if you’ve already started learning. One of the most challenging parts of learning data science is…

Using this Python library to build a book recommendation system.

Surprise with confetti.
Photo by Hugo Ruiz on Unsplash

If you’ve ever worked on a data science project, you probably have a default library that you use for standard tasks. Most people will probably use Pandas for data manipulation, Scikit-learn for general-purpose machine learning applications, and TensorFlow or PyTorch for deep learning. But what would you use to build a recommender system? This is where Surprise comes into play.

Surprise is an open-source Python library that makes it easy for developers to build recommender systems with explicit rating data. In this article, I will show you how you can use Surprise to build a book recommendation system using the…

An introduction to Facebook’s updated forecasting library.

Stock price prediction with NeuralProphet.
Image created by author using NeuralProphet.

Just recently, Facebook, in collaboration with researchers at Stanford and Monash University, released a new open-source time-series forecasting library called NeuralProphet. NeuralProphet is an extension of Prophet, a forecasting library that was released in 2017 by Facebook’s Core Data Science Team.

NeuralProphet is an upgraded version of Prophet that is built using PyTorch and uses deep learning models such as AR-Net for time-series forecasting. The main benefit of using NeuralProphet is that it features a simple API inspired by Prophet, but gives you access to more sophisticated deep learning models for time-series forecasting.

How to Use NeuralProphet


You can install NeuralProphet directly with pip…

Exploring the strengths and limitations of this metaphor in the information age.

Oil refinery at night.
Photo by Robin Sommer on Unsplash

“Data is the new oil. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value.” — Clive Humby, 2006

Clive Humby, a British mathematician and data science entrepreneur, coined the phrase “data is the new oil,” and many others have repeated it since. In 2011, Gartner senior vice-president Peter Sondergaard took the concept even further.

“Information is the oil of the 21st century, and…

An introduction to the models that have revolutionized natural language processing in the last few years.

Photo by Arseny Togulev on Unsplash

One innovation that has taken natural language processing to new heights in the last three years is the development of transformers. And no, I’m not talking about the giant robots that turn into cars in the famous science-fiction film series directed by Michael Bay.

Transformers are semi-supervised machine learning models that are used primarily with text data and have largely replaced recurrent neural networks in natural language processing tasks. The goal of this article is to explain how transformers work and to show you how you can use them in your own machine learning projects.

How Transformers Work

Transformers were originally introduced by researchers…

Amol Mavuduru

Software Engineer, Former Researcher, and Aspiring Data Scientist
