Python for Data Visualization: Getting Started with Matplotlib
Data appears in many forms across many fields, so knowing how to present it is a key skill for anyone who works with data. Good visualizations make it easier to identify patterns, trends, and relationships within the data. Matplotlib is one of the most widely used Python libraries for visualizing data. It is robust, adaptable, and approachable, which makes it a good starting point for newcomers to data science and analytics.
In this article we take a closer look at what you can do with Matplotlib, covering the basic concepts and the most common plots you can build.
What's Matplotlib?
Matplotlib is an open-source plotting library for the Python programming language. It was created for building static, animated, and interactive visualizations in Python. It can produce a wide variety of graphs, such as line plots, bar charts, histograms, and scatter plots, which makes it one of the most important tools for data analysts and data scientists.
The Matplotlib documentation includes plenty of examples. The library's main purpose is to let you create rich, interactive plots that represent complex data in a simple, easy-to-understand way. It is built on top of NumPy, so it also fits naturally into numerical and scientific computing.
Installing and Configuring Matplotlib
Let's begin by installing Matplotlib. If you use the Anaconda Python distribution, the package may already be on your system. Otherwise, you can install it with pip from the command line:
Installation
pip install matplotlib
Once Matplotlib is installed, you can import it in your Python code with this statement:
Importing Matplotlib
import matplotlib.pyplot as plt
By convention, the pyplot module is imported under the short alias 'plt', and you will see this abbreviation used almost everywhere. Through it, the functions you need for your plots and visualizations are easily accessible.
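A quick way to confirm that the installation and the import both worked is to print the installed version. This is only a minimal sketch; the version string you see depends on your own environment.

```python
import matplotlib
import matplotlib.pyplot as plt

# Print the installed version to confirm that Matplotlib can be imported
print("Matplotlib version:", matplotlib.__version__)

# plt is simply a short alias for the matplotlib.pyplot module
print("plt refers to:", plt.__name__)
```

If the import fails with a ModuleNotFoundError, the package is most likely not installed in the Python environment you are running.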
Drawing Your First Graph
To begin, let's make a line plot, a type of graph that shows how a variable changes over time or against another variable. To create a line plot you need x and y coordinates as input, where x represents the independent variable and y represents the dependent variable.
Take a look at this simple example:
Simple Line Plot
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
plt.plot(x, y)
plt.show()
In this code:
- plt.plot(x, y) creates the plot, with x along the horizontal axis and y along the vertical axis.
- plt.show() displays the plot.
The result is a simple line graph that shows y increasing as x increases.
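Since Matplotlib is built on NumPy, the same kind of plot can also be drawn from computed arrays instead of hand-typed lists. The sketch below assumes NumPy is installed; the range 1 to 5 and the choice of 50 points are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

# 50 evenly spaced x values between 1 and 5 (illustrative range and count)
x = np.linspace(1, 5, 50)

# NumPy applies the operation to every element, so y holds the squares of x
y = x ** 2

fig = plt.figure()
# plt.plot returns a list with one Line2D object per plotted line
lines = plt.plot(x, y)
```

Add plt.show() at the end to display the figure. With 50 points the curve looks smooth, whereas five points produce visible straight segments.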
Customizing Your Plot
Matplotlib gives you a great deal of control over the look of your plots. You can change colors and line styles, and you can add a title, x and y axis labels, and a legend. Some common modifications are shown here:
Customized Line Plot
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
plt.plot(x, y, color='green', linestyle='--', marker='o', markersize=8)
plt.title('Sample Line Plot')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.grid(True)
plt.show()
Compared with the first plot:
- The line color is changed to green with color='green'.
- The line is dashed with linestyle='--'.
- Each data point gets a round marker with marker='o'.
- The markers are enlarged with markersize=8.
- plt.title('Sample Line Plot') adds a title to the figure.
- plt.xlabel('X Axis') and plt.ylabel('Y Axis') label the x and y axes.
- plt.grid(True) adds a grid, which makes values easier to read.
These customizations improve the clarity and appearance of the plot and of the information it displays.
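A legend is mentioned among the customization options but does not appear in the example above. One way to add it, sketched here with a hypothetical second data series (linear) for comparison, is to give each line a label and then call plt.legend().

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
squares = [1, 4, 9, 16, 25]
linear = [1, 2, 3, 4, 5]  # hypothetical second series, added for illustration

fig = plt.figure()
# label= sets the text shown for each line in the legend
plt.plot(x, squares, color='green', linestyle='--', marker='o', label='y = x^2')
plt.plot(x, linear, color='blue', marker='s', label='y = x')
plt.title('Two Lines with a Legend')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.grid(True)

# loc chooses the corner where the legend box is placed
legend = plt.legend(loc='upper left')
```

Add plt.show() to display the result. Without the label arguments, plt.legend() has nothing to list, so the labels and the legend call go together.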
Types of Plots in Matplotlib
Matplotlib provides a wide range of plot types for looking at data from different perspectives. The most widely used ones are:
- Line Plot: Best suited to showing trends over time, such as a time series.
- Bar Chart: Mostly used for comparing values across different categories.
- Histogram: Shows the number of data points that fall within each range of values; the ranges are called bins.
- Scatter Plot: Plots two variables against each other to show the relationship between them.
- Pie Chart: Helps to visualize the proportions of the parts of a whole in a circular format.
The same kinds of customization apply to these other plot types, so you can adapt each chart to your data and your audience.
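The plot types listed above, other than the line plot already covered, each have their own pyplot function. The following is a rough sketch that draws one of each in a separate figure; every data value and variable name here is made up for illustration.

```python
import matplotlib.pyplot as plt

# All values below are made-up illustration data
categories = ['A', 'B', 'C', 'D']
counts = [23, 17, 35, 29]

# Bar chart: one bar per category
bar_fig = plt.figure()
bars = plt.bar(categories, counts, color='steelblue')
plt.title('Bar Chart')

# Histogram: counts how many values fall into each of 5 bins
scores = [55, 61, 64, 68, 70, 71, 73, 75, 78, 80, 82, 85, 88, 91, 97]
hist_fig = plt.figure()
bin_counts, bin_edges, _ = plt.hist(scores, bins=5, edgecolor='black')
plt.title('Histogram')

# Scatter plot: one point per (hours, marks) pair
hours = [1, 2, 3, 4, 5, 6]
marks = [52, 58, 65, 70, 78, 85]
scatter_fig = plt.figure()
points = plt.scatter(hours, marks, color='red')
plt.title('Scatter Plot')

# Pie chart: autopct prints each slice's share as a percentage
pie_fig = plt.figure()
plt.pie(counts, labels=categories, autopct='%1.1f%%')
plt.title('Pie Chart')
```

Calling plt.show() afterwards opens all four figures. Each call to plt.figure() starts a new, empty figure, which keeps the charts from being drawn on top of one another.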
Subplots
If you want several plots within a single figure, you can use subplots. Subplots arrange multiple plots in a grid inside one figure. In Matplotlib, one way to do this is with the plt.subplot() function.
For example:
Subplot Example
import matplotlib.pyplot as plt
# Create the first plot
plt.subplot(1, 2, 1)  # (rows, columns, index)
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.title('First Plot')
# Create the second plot
plt.subplot(1, 2, 2)
plt.plot([1, 2, 3, 4], [1, 2, 3, 4])
plt.title('Second Plot')
plt.tight_layout()
plt.show()
In this case:
- plt.subplot(1, 2, 1) defines a grid of 1 row and 2 columns and selects the first position for the first plot.
- plt.subplot(1, 2, 2) selects the second position in the same grid for the second plot.
- plt.tight_layout() automatically adjusts the spacing so that the subplots and their titles do not overlap.
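As an alternative to calling plt.subplot() once per plot, Matplotlib also offers plt.subplots() (note the trailing s), which creates the figure and all of its axes in one call. This is a different style from the example above, not a replacement for it. The sketch redraws the same two plots, and the figsize value is an arbitrary choice.

```python
import matplotlib.pyplot as plt

# Create a figure with a 1 x 2 grid of axes in a single call
# (figsize is in inches; 8 x 4 is an arbitrary choice)
fig, axes = plt.subplots(1, 2, figsize=(8, 4))

# Each element of axes is one subplot with its own plotting methods
axes[0].plot([1, 2, 3, 4], [1, 4, 9, 16])
axes[0].set_title('First Plot')

axes[1].plot([1, 2, 3, 4], [1, 2, 3, 4])
axes[1].set_title('Second Plot')

# Adjust spacing so the subplots and titles do not overlap
fig.tight_layout()
```

Add plt.show() to display the figure. Working with explicit axes objects tends to be easier to follow once a figure has more than two or three subplots, because each call states which subplot it affects.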
Saving Your Plots
Once the figure has been created and styled, you can save it as an image file with Matplotlib's savefig() function. You pass it a filename, and the file extension (for example PNG, JPEG, or PDF) determines the output format.
Here is an example you can try yourself:
Saving Plot
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
plt.plot(x, y)
plt.savefig('line_plot.png')  # Save the plot as a PNG image
plt.show()
The plot is then saved in your working folder as line_plot.png. savefig() also accepts other options, such as the resolution (DPI) and saving the image with a transparent background.
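The resolution and transparency options can be passed to savefig() as the dpi and transparent keyword arguments. The sketch below is illustrative only: the file names are made up, and it writes into a temporary scratch folder (an assumption made here so the example does not clutter your project) rather than the working folder.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

# Scratch folder for the output files; replace with any folder you like
out_dir = Path(tempfile.mkdtemp())

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

fig = plt.figure()
plt.plot(x, y)

# Higher dpi means more pixels per inch, so a sharper and larger image file
hires_path = out_dir / 'line_plot_hires.png'
plt.savefig(hires_path, dpi=300)

# transparent=True makes the figure background see-through
transparent_path = out_dir / 'line_plot_transparent.png'
plt.savefig(transparent_path, transparent=True)

# The .pdf extension selects PDF output
pdf_path = out_dir / 'line_plot.pdf'
plt.savefig(pdf_path)

for path in (hires_path, transparent_path, pdf_path):
    print(path.name, path.stat().st_size, 'bytes')
```

Call savefig() before plt.show() when you use both, since with interactive backends the figure may already be closed by the time plt.show() returns.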
Conclusion
In conclusion, Matplotlib is an important part of the Python toolkit for data visualization, because a few lines of code are enough to produce a wide range of graphs and charts. Working through its basic concepts helps you understand your data better, which in turn supports your analysis. As you keep using Python for data visualization, you will find that Matplotlib lets you present your data in ways that are both attractive and meaningful.