Style your Data Analysis

Panda with “style”.

Have you ever felt that table output looks boring and plain when you check it in a Jupyter Notebook?
Tables allow your data consumers to gather insight by reading the underlying data. However, there are often instances where leveraging the visual system is much more efficient in communicating insight from the data. Knowing this, you may often find yourself in scenarios where you want to provide your consumers access to the underlying data by means of a table, while still providing visual representations of the data so that they can quickly and effectively gather the insight they need.
In this article I would like to share with you some ways to improve the experience of gathering insight from data with pandas' native styling functionality. This styling functionality allows you to add conditional formatting, bar charts, supplementary information to your dataframes, and more.

Options/Settings API

Pandas allows us to customize some parts of its behavior through its Options/Settings API. In our case we will use display-related options to improve our pandas experience when we run into certain issues:

  • There are too many columns/rows in the dataframe and some columns/rows in the middle are omitted on display.
    For example, to show at most 7 rows and at most 7 columns, set the display.max_rows and display.max_columns options to 7.
  • Columns containing long text get truncated, or columns containing floats display too many or too few digits. The display.max_colwidth and display.precision options control this, and they only change how the data is displayed, not the data itself.
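
The two cases above can be sketched with pd.set_option. This is a minimal example, not the article's original snippet: the sample dataframe and the specific values (100 characters, 2 decimals) are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only to show the effect of the options.
df = pd.DataFrame({"price": [1234.56789, 0.5], "note": ["short", "x" * 200]})

# Case 1: show at most 7 rows and 7 columns before pandas elides the middle.
pd.set_option("display.max_rows", 7)
pd.set_option("display.max_columns", 7)

# Case 2: allow up to 100 characters per cell and show floats with 2 decimals.
pd.set_option("display.max_colwidth", 100)
pd.set_option("display.precision", 2)

# option_context applies a setting only inside the with-block.
with pd.option_context("display.precision", 4):
    precision_inside = pd.get_option("display.precision")

precision_outside = pd.get_option("display.precision")
```

Calling pd.reset_option with an option name, for example pd.reset_option("display.max_rows"), restores that option's default. The options affect display only, so the values stored in df are unchanged.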

You can check more use cases here.

Time to style!

In a lot of cases, having a way to tell at a glance whether a value is negative or positive can be an advantage when analyzing data. To accomplish this we are going to use the dataframe’s style property.

Let’s build a function that colors values in a dataframe column green or red depending on their sign:

def color_negative_red(value):
    """
    Colors elements in a dataframe
    green if positive and red if
    negative. Zero and NaN values
    fall through to black.
    """

    if value < 0:
        color = 'red'
    elif value > 0:
        color = 'green'
    else:
        color = 'black'

    return 'color: %s' % color

Yeah! We are returning a CSS style. Now let’s apply this function to our dataframe. The DataFrame.style property returns a Styler object that has some useful methods for formatting and displaying dataframes. One of those methods is applymap, which applies our function to every element of the dataframe. (In pandas 2.1 and later the same method is called map, and applymap is deprecated.)

df.style.applymap(color_negative_red)

If your data set has non-numeric values, you can select the columns to which the function will be applied using the subset argument of the applymap method, e.g. subset=["column1", "column2"].

Housing prices dataset

You can also apply arbitrary CSS to the dataframe elements using the Styler object’s set_table_styles() method:

# Set CSS properties for th elements in dataframe
th_props = [
    ('font-size', '11px'),
    ('text-align', 'center'),
    ('font-weight', 'bold'),
    ('color', '#6d6d6d'),
    ('background-color', '#f7f7f9')
]

# Set CSS properties for td elements in dataframe
td_props = [
    ('font-size', '11px')
]

# Set table styles
styles = [
    dict(selector="th", props=th_props),
    dict(selector="td", props=td_props)
]

(df.style
    .applymap(color_negative_red, subset=['longitude','latitude'])
    .set_table_styles(styles))

Great! We are styling, but this is just a small part of what you can do with the pandas dataframe styling functionality. Other ways to style include applying custom background color gradients and custom captions, among other things:

import seaborn as sns

# Set colormap equal to seaborn's light green color palette
cm = sns.light_palette("green", as_cmap=True)

(df.style
    .background_gradient(cmap=cm, subset=['latitude','longitude'])
    .highlight_max(subset=['longitude','latitude'])
    .set_caption('This is a custom caption.')
    .set_table_styles(styles))

If you have seen the data set that I am using, there are some columns that represent currency values. A good idea would be to add a dollar sign and thousands separators to this data. To do it we can use the format method of the Styler object.

df.head().style.format({"median_house_value": "${:20,.0f}",
                        "median_income": "${:20,.4f}"})
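
When Styler.format receives a string, it applies it to each value with Python's str.format, so the effect of a format spec can be checked in plain Python first. The sketch below uses two sample values picked for illustration. It also shows a variant without the field width of 20, which is a suggestion rather than part of the original example.

```python
# Sample values chosen for illustration only.
house_value = 452600.0
income = 8.3252

# The spec used above: width 20 pads the number with spaces after the "$".
padded = "${:20,.0f}".format(house_value)

# Dropping the width keeps the dollar sign next to the digits.
compact_value = "${:,.0f}".format(house_value)
compact_income = "${:,.4f}".format(income)

print(repr(padded))
print(compact_value, compact_income)
```

Browsers normally collapse repeated spaces, so the padding is hard to notice in a notebook, but it becomes visible if the formatted values are exported as plain text.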

Conclusion:

The examples above are just simple cases. I invite you to go deeper and style your dataframes so that you can gather insight more easily from your tables and give more color to your data analysis.
I am open to any suggestion or correction. Thanks a lot for reading.
