Dataframe Creation with Pandas

Pandas lets you not only read in dataframes but also create them. Together, pandas and Python give you several ways to build a dataframe. One of them is to create it from a list of lists. As with any use of pandas, you first need to import the library.

import pandas as pd

pd is the conventional alias for the pandas module. A list of lists is built much like a matrix: each inner list is one row. The items in every inner list must follow the same order, matching the column names that you will supply later.

Method 1

The conventional way of making a list of lists is to assign a variable a group of lists, each in square brackets and separated by commas, with an additional set of square brackets surrounding the whole group.

The list used here, this_band, depicts a band, pairing each member's first name with an instrument (or vocals for a singer).
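A minimal sketch of such a list of lists follows. The variable name this_band matches the code used in the rest of this section; the member names and instruments are hypothetical placeholders, since any band would do.

```python
# Hypothetical sample data: each inner list is one row,
# always ordered as [name, instrument].
this_band = [
    ["Ana", "vocals"],
    ["Ben", "guitar"],
    ["Cleo", "drums"],
    ["Dev", "bass"],
]
```

Keeping every inner list in the same order matters because pandas assigns values to columns by position.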

When making a dataframe, it is good practice to name the columns if the column names are not part of the list of lists; otherwise pandas labels them 0, 1, and so on.

df = pd.DataFrame(this_band, columns=['name', 'instrument'])

In this example, you set the variable df equal to a pd.DataFrame object built from this_band, with the columns "name" and "instrument".

Then, to see the results of this creation, you can print the dataframe.

print(df)

The print statement, in this case, displays a neatly aligned table with name and instrument columns.

You will notice that pandas has added an index because you did not pick one of the columns to be the index.

You could have picked the first name to be the index, although it would not make the best choice because first names can be repeated.

To set the index to name, call the dataframe's set_index method and pass it the name of the column you want, in quotes.

So in the first line after creating the dataframe, instead of printing it, you could set the index. Because set_index returns a new dataframe rather than changing the original, assign the result back to df.

df = df.set_index('name')
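The difference between the default index and a chosen one can be sketched as follows. The band data is the same hypothetical placeholder set, and the names default_labels, df_by_name, and lookup are illustrative only.

```python
import pandas as pd

# Hypothetical sample data, ordered as [name, instrument].
this_band = [
    ["Ana", "vocals"],
    ["Ben", "guitar"],
    ["Cleo", "drums"],
    ["Dev", "bass"],
]
df = pd.DataFrame(this_band, columns=["name", "instrument"])

# With no index specified, pandas labels the rows 0, 1, 2, ...
default_labels = list(df.index)

# set_index returns a new dataframe; df itself is left unchanged.
df_by_name = df.set_index("name")

# Rows can now be looked up by name instead of by position.
lookup = df_by_name.loc["Cleo", "instrument"]
```

After this runs, df still has both columns and its integer index, while df_by_name uses the names as row labels and keeps only the instrument column.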

Method 2

Another method for making a dataframe is to form a dictionary that holds lists for each column.

Because this method uses a dictionary, you surround the column definitions with curly brackets: each column name is a key, and its list of values is the matching value.
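A minimal sketch of the dictionary approach follows. The variable name this_data matches the code used later in this section; the names and instruments are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical sample data: each key becomes a column name,
# and each list holds that column's values from top to bottom.
this_data = {
    "name": ["Ana", "Ben", "Cleo", "Dev"],
    "instrument": ["vocals", "guitar", "drums", "bass"],
}

# No columns argument is needed; the dictionary keys supply the names.
df = pd.DataFrame(this_data)
```

The lists must all be the same length, because each position across the lists forms one row.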

Again, if you print out the dataframe before naming an index, you will get the same output as the previous example.

Instead of selecting an index column from the columns already created, you can also supply your own list of index labels when the dataframe is created.

df = pd.DataFrame(this_data, index=['231-44-7865', '844-23-1976', '931-22-7451', '777-11-8990'])

If you print this dataframe, the four supplied strings appear as the row labels in place of the default 0 to 3.
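As a sketch, assuming this_data holds a hypothetical four-member band (the names and instruments are placeholders, and member_ids and row are illustrative names), the custom labels can be passed in and then used for lookups:

```python
import pandas as pd

# Hypothetical sample data with four rows, one per index label below.
this_data = {
    "name": ["Ana", "Ben", "Cleo", "Dev"],
    "instrument": ["vocals", "guitar", "drums", "bass"],
}
member_ids = ["231-44-7865", "844-23-1976", "931-22-7451", "777-11-8990"]

# The index list must have exactly as many entries as there are rows.
df = pd.DataFrame(this_data, index=member_ids)

# A row can now be selected by its label with .loc.
row = df.loc["931-22-7451"]
```

Unlike first names, identifiers of this kind are unique, which is what makes them a better index choice.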

Method 3

A third method, which takes slightly more code to build the same dataframe, also uses dictionaries. Here each item is paired with the name of its column. This form is useful only when each column name holds a single item; otherwise, a list has to be linked to each column heading, as in Method 2.
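One way to read this method, assumed here rather than confirmed by the text, is a list of dictionaries in which each dictionary describes one row and pairs every column name with a single value. The data and the variable name band_rows are hypothetical.

```python
import pandas as pd

# Assumed interpretation: one dictionary per row,
# each mapping a column name to exactly one value.
band_rows = [
    {"name": "Ana", "instrument": "vocals"},
    {"name": "Ben", "instrument": "guitar"},
    {"name": "Cleo", "instrument": "drums"},
    {"name": "Dev", "instrument": "bass"},
]

# pandas collects the keys as column names and adds a default index.
df = pd.DataFrame(band_rows)
```

The code is longer than in Method 2 because the column names are repeated in every row.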

If you import numpy, you could also use a numpy array to make a dataframe.

import numpy as np

df = pd.DataFrame(np.array([[30, 5, 7], [21, 51, 16], [17, 81, 19]]), columns=['x', 'y', 'z'])

Then print it:

print(df)

The result is a 3-by-3 table with the columns x, y, and z and a default index of 0 to 2.
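A self-contained sketch of the NumPy route, using the same values, with the printed table shown in comments (the variable name values is illustrative):

```python
import numpy as np
import pandas as pd

# Each inner list of the array becomes one row of the dataframe.
values = np.array([[30, 5, 7], [21, 51, 16], [17, 81, 19]])
df = pd.DataFrame(values, columns=["x", "y", "z"])

print(df)
# Output:
#     x   y   z
# 0  30   5   7
# 1  21  51  16
# 2  17  81  19
```

As with a list of lists, the array carries no column names of its own, so they are passed through the columns argument.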

Method 4

A fourth way to make a pandas dataframe is to merge two lists with the zip function.

Again, pandas would generate a new default index for this dataframe.

When you print the result, you get a two-column table, with one column for each of the original lists.
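The zip approach can be sketched as follows, assuming two hypothetical lists of equal length named names and instruments; band_rows is likewise an illustrative name.

```python
import pandas as pd

# Hypothetical sample data: two parallel lists of the same length.
names = ["Ana", "Ben", "Cleo", "Dev"]
instruments = ["vocals", "guitar", "drums", "bass"]

# zip pairs the items by position; list() turns the pairs
# into a list of (name, instrument) tuples, one per row.
band_rows = list(zip(names, instruments))

df = pd.DataFrame(band_rows, columns=["name", "instrument"])
```

Note that zip stops at the end of the shorter list, so lists of unequal length would silently drop the extra items.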

References

Different ways to create Pandas Dataframe. GeeksforGeeks.org. Retrieved from https://www.geeksforgeeks.org/different-ways-to-create-pandas-dataframe/ on Dec. 3, 2019.

pandas.DataFrame. Pydata.org. Retrieved from https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html on Dec. 3, 2019.