Cloud Providers: AWS, Azure, GCP
In the previous tutorials, we explored how running software in the cloud isn’t just about where it lives; it’s also about how much of it you want to manage. Now that you’ve seen the different levels of...
Project Tutorial: Build A Python Word Guessing Game
In this tutorial, I'll guide you through creating a word-guessing game similar to Wordle using Python. While many coding projects focus on data analysis, building a game is not only fun but also helps...
Introduction to Prompt Engineering for Data Professionals
As a data professional, you're comfortable with tools like Python, SQL, data visualization, and statistical analysis. You can transform messy datasets into valuable insights and build models that help...
Project Tutorial: Predicting Heart Disease with Machine Learning
In this tutorial, we'll walk through a complete machine learning project to predict the likelihood of heart disease in patients. As a data scientist working for a healthcare solutions company, your...
Practical Application of Prompt Engineering for Data Professionals
In Part 1 of this tutorial, you learned about prompt engineering fundamentals and strategies to communicate effectively with AI models. Now, we'll put those skills into practice with a common data...
Project Tutorial: Customer Segmentation Using K-Means Clustering
In this project walkthrough, we'll explore how to segment credit card customers using unsupervised machine learning. By analyzing customer behavior and demographic data, we'll identify distinct...
Using LLMs to Improve Data Communication
Whether you’ve just uncovered a key data insight or have already walked through chart selection and storytelling, the challenge is the same: How do you communicate your message so your audience...
The Evolution of AI in Data Science and What It Means for Your Career
The term "data science" (and the practice itself) has evolved dramatically over the years. In recent years, its popularity has grown considerably due to innovations in data collection, technology, and...
10 Data Science Jobs That Are in Demand
Data science continues to be a vital field, sparking innovation in everything from healthcare to finance. Even with the tech layoffs of 2023, data science jobs were largely spared, highlighting their...
Project Tutorial: Answering Business Questions Using SQL
In this project walkthrough, we'll explore how to use SQL for data analysis from a digital music store and answer critical business questions. By working with the Chinook database—a sample database...
How to Choose the Right Cloud Service Provider for Your Team
Many development teams spend more time than necessary comparing Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP), often getting lost in endless feature lists as if shopping...
Setting Up Apache Airflow with Docker Locally (Part I)
Let’s imagine this: you’re a data engineer working for a company that relies heavily on data. Your everyday job? Extract data, transform it, and load it somewhere, maybe a database, maybe a dashboard,...
Project Tutorial: Predicting Insurance Costs with Linear Regression
In this project walkthrough, we'll explore how to build a linear regression model to predict patient medical insurance costs. By analyzing demographic and health data, we'll develop a predictive model...
Introduction to Snowflake
Snowflake is one of the most in-demand tools for modern data engineers. Many companies use it as part of their analytics pipelines and cloud data platforms, and it's frequently listed as a required or...
PySpark Tutorial for Beginners – Install and Learn Apache Spark with Python
Imagine you're an analyst who's responsible for analyzing customer data for a growing e-commerce company. Last year, your Python pandas scripts handled the data just fine. But now, with millions of...
Cloud Setup for Airflow (Part II)
You’ve built and tested your ETL pipeline locally using Apache Airflow and Docker — well done. But running it on your own machine has its limits. What happens when your laptop is off? Or when other...
Working with RDDs in PySpark
In the previous tutorial, you saw how to set up PySpark locally and got your first taste of SparkSession, the modern entry point that coordinates Spark's distributed processing. You saw how the Driver...
Introduction to Docker
Have you ever heard someone say, "Well, it worked on my machine"? It’s one of the most common problems in software and data workflows: something works fine in your local environment, but completely...
Working with DataFrames in PySpark
In our previous tutorial, you learned about RDDs and saw how Spark's distributed processing makes big data analysis possible. You worked with transformations and actions, and experienced lazy evaluation...
Using Spark SQL in PySpark for Distributed Data Analysis
In our previous tutorials, you've built a solid foundation in PySpark fundamentals—from setting up your development environment to working with Resilient Distributed Datasets and DataFrames. You've...