Python in Digital Scholarship

This guide introduces the use of Python in research and instruction and describes the resources available in the Freedman Center.

Data Visualization in Python

Effective data visualization is key for both exploratory analysis and presenting results. Python offers robust libraries for creating a wide range of charts and plots to visually represent data. The two foundational plotting libraries in Python are Matplotlib and Seaborn.

  • Matplotlib: Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. It provides low-level plotting functions and a high degree of control over plot appearance. With Matplotlib, one can create basic charts such as line graphs, scatter plots, bar charts, and histograms, and build more complex figures by combining subplots and customizing axes, colors, and annotations. Matplotlib’s API (especially its pyplot module) is quite similar to MATLAB’s plotting interface, which makes it familiar to those with an engineering or scientific background. For example, the following code produces a histogram of the values in the measurement column of a pandas DataFrame named df:

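    import matplotlib.pyplot as plt

    # Assumes df is an existing pandas DataFrame with a numeric 'measurement' column
    plt.figure(figsize=(6, 4))             # figure size in inches (width, height)
    plt.hist(df['measurement'], bins=20)   # split the value range into 20 equal-width bins
    plt.title('Distribution of Measurement')
    plt.xlabel('Value')
    plt.ylabel('Frequency')
    plt.show()                             # display the figure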
  • Seaborn: Seaborn is a high-level library for statistical graphics built on top of Matplotlib. It provides a more concise interface and attractive default styles for common statistical plots. Seaborn integrates closely with Pandas DataFrames, so columns can be plotted without much data manipulation. For instance, with Seaborn imported as sns, you can create a boxplot of a variable grouped by category with one function call (sns.boxplot(x='category', y='value', data=df)); a violin plot works the same way. Seaborn also includes built-in themes and color palettes that make charts look polished by default, which is helpful for publications or presentations. It excels at visualizing distributions (histograms, KDE plots), relationships (scatter or line plots with regression fits), and categorical comparisons (bar plots, box plots, etc.) with minimal code.
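
As a rough illustration of the Seaborn call described above, the sketch below builds a small DataFrame and draws a grouped boxplot. The column names category and value follow the inline example; the data values, the whitegrid theme, and the plot labels are assumptions invented for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data, invented for illustration: two categories, four values each.
df = pd.DataFrame({
    "category": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "value": [1.0, 2.0, 2.5, 4.0, 3.0, 3.5, 5.0, 6.0],
})

sns.set_theme(style="whitegrid")  # apply one of Seaborn's built-in themes

fig, ax = plt.subplots(figsize=(6, 4))
sns.boxplot(x="category", y="value", data=df, ax=ax)  # one box per category
ax.set_title("Value by Category")
ax.set_xlabel("Category")
ax.set_ylabel("Value")
fig.tight_layout()
```

In an interactive session, remove the matplotlib.use("Agg") line and call plt.show() to display the figure; swapping sns.boxplot for sns.violinplot with the same arguments should give the violin version.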

When creating visualizations, always ensure clarity: label axes, include units, and use a chart type appropriate for the data. Python makes it possible to generate high-quality figures programmatically and save them in formats such as PNG or SVG for inclusion in documents. Because the code that generated a figure can be shared, figures are reproducible, which is a significant advantage in research and teaching. In summary, Python’s visualization libraries enable both quick exploratory plots and polished figures for publication, all within the same environment used for analysis.
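
The advice above can be sketched in a few lines: label the axes with units, then write the same figure to both PNG and SVG. The data values, variable names, and file names below are hypothetical and chosen only for this example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; figures are written to files only
import matplotlib.pyplot as plt

# Hypothetical measurements, invented for illustration only.
time_s = [0, 1, 2, 3, 4, 5]
temperature_c = [20.0, 20.8, 21.9, 23.1, 24.0, 25.2]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(time_s, temperature_c, marker="o")
ax.set_title("Temperature over Time")
ax.set_xlabel("Time (s)")            # axis labels include units
ax.set_ylabel("Temperature (°C)")
fig.tight_layout()

# Save the same figure in a raster format (PNG) and a vector format (SVG).
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "temperature.png")
svg_path = os.path.join(out_dir, "temperature.svg")
fig.savefig(png_path, dpi=300)       # dpi sets the PNG resolution
fig.savefig(svg_path)                # output format is inferred from the file extension
plt.close(fig)
```

PNG is a reasonable default for slides and web pages, while SVG stays sharp at any zoom level, which tends to suit print publications.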

In addition to Matplotlib and Seaborn, other visualization tools include Plotly and Bokeh, which produce interactive web-based graphics (useful when you need interactive charts or dashboards). These libraries support zooming and hover tooltips and are often used in web applications or interactive reports. For most academic purposes, Matplotlib and Seaborn suffice for creating figures for papers, lab reports, or exploratory analysis.