
Storytelling in Data Science and Business Intelligence

Data science and business intelligence have become increasingly important in supporting day-to-day decision-making within organizations. Both fields span industries such as healthcare, finance, and marketing, and they share the goal of extracting valuable insights from data. Data science uses scientific methods, algorithms, and processes to extract information from both structured and unstructured data. It encompasses techniques such as machine learning, statistical analysis, data mining, and data visualization to uncover relationships, patterns, and trends within data. Business intelligence, by comparison, focuses mainly on structured data, and business intelligence tools are used to report on and visualize that data. Storytelling matters in both fields: it adds context to the data, so that decision-makers can understand the implications more easily and make well-informed decisions.

Data storytelling is the skill of crafting a contextual narrative around data. Alongside analyzing data, finding patterns in it, and visualizing it, storytelling is one of the skills you need to become a data analyst, data scientist, or business intelligence specialist.

Communicating Insights and Engagement

Insights gleaned from large amounts of data are often complex and technical in nature. By crafting a compelling story around the data, data scientists and other data specialists can communicate their findings effectively to decision-makers and other stakeholders. Stories are more memorable than raw data and statistics alone. Make your story more engaging by including real-life examples, and use data visualization to make the data easier to understand and interpret.

Data Visualizations to Enhance Storytelling

Data scientists can use various Python libraries to visualize data; Matplotlib, Seaborn, and Plotly are among the most commonly used. The Plotly package lets you produce interactive, publication-quality graphs. Suppose you are investigating the occurrence of malaria in Africa over the past decade. You can plot the number of malaria cases per year as an interactive chart with Plotly. Readers can hover over, zoom into, and pan across such a chart, and an animated version plays through the years almost like a video.

Power BI is a powerful business intelligence tool that data specialists also use. It offers a wide range of visuals, including bar charts, line charts, pie charts, scatter plots, maps, gauges, and many more. Users can interact with the visualizations by clicking on them, filtering data, and selecting data. They can also publish reports and collaborate as a team.

How to Create Data Stories

Knowing your audience and your objectives is the first thing to consider when creating a data story. Who is your audience, and what are they interested in? Tailor your story to their interests and level of expertise. And what message do you want to convey?

Structure your narrative so that it flows. Start by introducing the topic and giving some background information; this is where you present the problem statement or question and, if necessary, share some past or industry trends. In the middle, present the main findings and insights, and support them with visuals. End with a conclusion or summary, discuss the implications, and leave room for discussion. Feel free to share a summary of your story as a PDF report or an MS PowerPoint presentation.

My Example of Data Storytelling

I also have an example of data storytelling of my own. In 2021, I followed the IBM Data Science track on Coursera. For the Capstone Project, students were asked to create an MS PowerPoint presentation to showcase their findings. Instead of creating the presentation, I published my findings on Lydia’s Library.

Check out my data storytelling about the best neighborhood in Vancouver:

https://lydiaslibrary.home.blog/2021/03/04/my-coursera-ibm-data-science-project-battle-of-neighborhoods-in-vancouver/

Data Science: Important Skills you should master to become a Data Scientist

Data scientist is often called one of the hottest professions of the 21st century, and the demand for data scientists continues to grow. Data is everywhere and is generated from many different sources. Analyzing it helps identify patterns that may contain valuable information. Data scientists typically have at least a university degree. In addition, online learning tools can help you acquire the hard skills you need to succeed as a data scientist.

1. Learn Programming Languages

Learning a programming language is key to succeeding as a data scientist. The most commonly used languages in data science are R (often written in the RStudio editor) and Python. Both can be used for statistical computing, building models, and data visualization. Online learning platforms, such as Coursera and DataCamp, offer courses in R and Python. R works with packages, which contain functions. To install a package, use the install.packages() function; to load an installed package into your session, use the library() function. In Python, you likewise have to install a package (for example with pip) before you can import it for the first time. Scientific libraries used in Python include Pandas, NumPy, and SciPy. In most online introductory courses, you learn how to do arithmetic, add comments, assign variables, create a data frame or a matrix, use some summarizing commands, and print output.
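
To give an idea of what such an introductory course covers, here is a small Python sketch with arithmetic, a comment, variable assignment, a data frame, a few summarizing commands, and print statements. It assumes the Pandas package is installed; the student names and scores are invented for the example.

```python
import pandas as pd

# This is a comment: Python ignores everything after the '#' sign.

# Arithmetic and variable assignment
hours_per_week = 5 * 8
weeks = 4
total_hours = hours_per_week * weeks

# A small data frame with invented values
df = pd.DataFrame(
    {"student": ["Ann", "Ben", "Cara"], "score": [70, 80, 90]}
)

# Summarizing commands
mean_score = df["score"].mean()
summary = df["score"].describe()  # count, mean, std, min, quartiles, max

# Print statements
print("Total hours:", total_hours)
print(df)
print("Mean score:", mean_score)
print(summary)
```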

2. Machine Learning

Machine Learning (often abbreviated as ML) is used to build models that learn from data and make predictions. Major Machine Learning techniques include regression, classification, and clustering. We distinguish two main types of learning: supervised learning and unsupervised learning. As a data scientist, you need to know the difference between the two, and the commonly used techniques in each.

Supervised learning:

To supervise literally means to ‘oversee’. With supervised learning, you ‘teach’ the model. And how exactly do you ‘teach’ the model? You train it on examples for which the answer is already known. In supervised learning, both the input data and the desired output data are provided; in other words, the data is labeled.

There are two main types of supervised learning techniques: classification and regression. Classification predicts the class or category to which an item belongs. Regression, on the other hand, predicts a continuous value instead of a discrete one. The most basic regression model is the linear model.
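
The difference between the two can be sketched in a few lines of plain Python. The example below fits a straight line with ordinary least squares (regression) and predicts a category with a one-nearest-neighbor rule (classification). The nearest-neighbor rule is just one simple classifier picked for illustration, and all function names and data values are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept with ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_class(x, labeled_points):
    """Return the label of the training point closest to x (1-nearest neighbor)."""
    nearest_x, nearest_label = min(labeled_points, key=lambda p: abs(p[0] - x))
    return nearest_label


# Regression: labeled data with a continuous output (invented values)
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
predicted_value = slope * 5 + intercept  # continuous prediction for x = 5

# Classification: labeled data with a discrete output (invented values)
training = [(1.0, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]
predicted_label = predict_class(7.5, training)  # discrete prediction

print(predicted_value, predicted_label)
```

In both cases the model learns from labeled examples; only the kind of output differs.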

Unsupervised learning:

In unsupervised learning, the data is unlabeled and the model works on its own to discover structure in it. One method of unsupervised learning is clustering, which groups similar data points together.
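
As an illustration of clustering, here is a bare-bones sketch of k-means, one well-known clustering method, written in plain Python. It is a simplified version under stated assumptions: the starting centroids are simply the first k points, the number of iterations is fixed, and the points themselves are invented. In practice you would more likely use a library implementation.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iterations=10):
    """Group points into k clusters; returns (centroids, labels)."""
    centroids = list(points[:k])  # simplification: start from the first k points
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[nearest_centroid(point, centroids)].append(point)
        # Update step: move each centroid to the mean of its cluster
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    labels = [nearest_centroid(point, centroids) for point in points]
    return centroids, labels


# Unlabeled, invented data: two visibly separate groups
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Note that no labels are given to the algorithm; the group numbers come out of the data itself.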

3. Data Visualization

Data visualization is a very important way to gain insight from data. It refers to a graphical presentation of your data, such as bar plots, histograms, scatterplots, and boxplots. Displaying your data visually makes outliers easy to detect. It also makes the data easier to comprehend, and it is a quick way to get valuable insights.

In languages such as R and Python, you can use plot functions to make bar plots, histograms, scatterplots, boxplots, etc.
In Python, you have to install a visualization package first. The most widely used package for plots and graphs is Matplotlib. The Seaborn package can also be used to make plots such as heat maps, time series plots, and violin plots. In R, a commonly used function for drawing graphs is ggplot() from the ggplot2 package.
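
A short Matplotlib sketch shows how a plot can reveal an outlier. It assumes Matplotlib is installed; the values are invented, with one deliberately extreme number, and plot_distribution is a hypothetical helper name.

```python
import matplotlib.pyplot as plt

# Invented measurements; 95 is deliberately far away from the rest.
values = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]


def plot_distribution(values):
    """Draw a histogram and a boxplot of the same values side by side."""
    fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))
    ax_hist.hist(values, bins=10)
    ax_hist.set_title("Histogram")
    ax_hist.set_xlabel("Value")
    ax_hist.set_ylabel("Count")
    ax_box.boxplot(values)
    ax_box.set_title("Boxplot")
    fig.tight_layout()
    return fig


fig = plot_distribution(values)
# plt.show() displays the figure; fig.savefig("distribution.png") saves it.
```

In the boxplot, the extreme value is drawn as a separate point beyond the whiskers, which makes it easy to spot at a glance.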

4. Know how to work with Unstructured Data

As a data scientist, it is critical to know how to work with unstructured data. Unstructured data has no predefined format or data model; examples include social media posts, customer reviews on websites, videos, and blog posts. According to IBM, around 80 percent of all generated data is unstructured.
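
A common first step with unstructured text is to turn it into something you can count. The sketch below splits a few invented customer reviews into lowercase words and counts how often each word occurs, using only Python's standard library. It is a deliberately simple approach; real projects usually add steps such as removing stop words.

```python
import re
from collections import Counter

# Invented customer reviews, used only for illustration
reviews = [
    "Great product, fast delivery!",
    "The delivery was slow but the product works.",
    "Great support and great price.",
]


def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


# Count every word across all reviews
word_counts = Counter(
    word for review in reviews for word in tokenize(review)
)

print(word_counts.most_common(3))
```

Once the text is reduced to counts like these, it can be analyzed with the same tools you use for structured data.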
