Starting With ggplot2 in R

Updated on January 26, 2018
Harsh Diwan

Harsh is a SQL DBA with over 13 years of experience. He has worked on most technologies related to SQL Server. He loves politics, reading

Using ggplot2

ggplot2 Package

Introducing ggplot2 package

One of the most important aspects of data analytics is visualization of the data. Visualization is probably the most powerful tool you have for viewing your data from different angles. It also lets you put your conclusions across convincingly. A picture is worth a thousand words.

R has thousands of packages that handle a wide variety of tasks. ggplot2 is one such package, designed for creating and displaying plots.

So in this article, I am going to show how we can construct a plot using ggplot2 in R from scratch. I am going to start with a blank plot and then add elements to it to build some basic plots.

Note that the package is named ggplot2.

The actual function that we will use for creating plots is named ggplot.

This can be confusing, but I am afraid that’s how they are named: the package is ggplot2, and the function that we use from that package is ggplot.

Objective

The objective of this article is not to build a fancy and amazing plot. The objective is to introduce the reader to the process of building a plot bit by bit from scratch.

After reading this article you should understand the different elements of the ggplot2 plotting system and how to use them. Please note, however, that this is only a basic introduction. In reality ggplot2 is a very powerful and very large plotting system, and you could easily write a book on it.

This post covers some basic building blocks of a ggplot graph and builds 3 plots from them.

Building blocks of ggplot

Before we do that, we need to understand the basic building blocks of a ggplot graph.

  • Plot – This is the plotting area on which we will build the plot.
  • Data – This is the data that will be used in the plot.
  • Aesthetic mapping – This is the organization of your data on the plot. It tells ggplot which variables go on which axis, what color the points should be, what shape they should be, etc. Aesthetic mapping basically controls the visual aspects of the geometric objects that we plot.
  • Geom – These are the different geometric objects that we will place on the plot area. They can be shapes like a dot for a scatter plot, lines, curves etc. These objects represent your data on the plot.

Each of these blocks corresponds to a function, or to an argument of a function, in ggplot2. So for each of these blocks, we will write a small piece of code.

There is a lot more to ggplot than this, but for the time being we will start with actually seeing how these 4 elements work.

Let's get started

So without any further ado, let’s fire up R and start building a ggplot graph.

But before you can start exploring ggplot2, you need to install it if you haven’t already done so.

Install ggplot2 package

install.packages("ggplot2") 

Once the installation has completed successfully, let’s load the package.

library(ggplot2)
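If you run the same script on more than one machine, a small guard can skip the download when the package is already there. The following is only a sketch of that idea. It assumes a CRAN mirror is reachable whenever an install is actually needed.

```r
# Install ggplot2 only if it is not already available on this machine.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Attach the package so that ggplot() and the geom_* functions can be called directly.
library(ggplot2)

# Print the installed version; handy when comparing results with other tutorials.
print(packageVersion("ggplot2"))
```

The install and load steps do the same thing as the two commands above; the guard simply avoids reinstalling on every run.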

Create a blank plot

Now that we have installed and loaded the ggplot2 package, let’s build a plot from scratch. First we need to create the first element we introduced earlier.

Plot – This is the plotting area on which we will build the plot.

ggplot()

That’s it. Note that the function name that we used is ggplot. It is not ggplot2. ggplot2 is the package name which contains this function.

This will create an empty plot. You should be able to see it in the Plots pane of RStudio.

Blank plot created using ggplot

Feed data to plot

Now, let’s move to the second point.

Data – This is the data that will be used in the plot.

Let’s give some data to ggplot. This will not be plotted yet; we are just making some data accessible to the plot. Also please note that ggplot expects its data as a data frame. If you pass something else, it tries to convert it to a data frame and stops with an error when it can’t, so a matrix, vector or list is best converted to a data frame first. The reason becomes clearer later: the aesthetic mapping refers to columns of the data by name.

For this demonstration I am going to use a built-in data set in R named iris. It ships with R in the datasets package, so you don’t need to install any additional package for it.
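If your own data lives in a matrix, one way to get it into a plot is to convert it first. The snippet below is a minimal sketch; the matrix m and the column names height and weight are made up for illustration and are not part of this article’s data.

```r
library(ggplot2)

# A small hypothetical matrix with two named columns (illustrative values only).
m <- matrix(c(150, 160, 170, 180,
              50,  58,  66,  80),
            ncol = 2,
            dimnames = list(NULL, c("height", "weight")))

# Convert the matrix to a data frame before handing it to ggplot().
df <- as.data.frame(m)

# The data frame is now attached to the plot; nothing is drawn yet.
p <- ggplot(data = df)
```

After the conversion, height and weight are ordinary data frame columns and can be used like any other variable.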

You can see what this data is by running the following command in R.

head(iris)
IRIS data set

As you can see, it has 5 fields. 4 of these fields are numeric measurements and the last one, Species, is categorical. The data set holds measurements of 150 flowers from 3 different species of iris.

Now, we will use this data set and see how we can plot it using ggplot2.
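Beyond head(), a few base R commands give a fuller picture of the data set before plotting. This is just one possible way to inspect it; the variable name species_counts is my own.

```r
# Structure: the type of each column and the first few values.
str(iris)

# Number of rows and columns.
print(dim(iris))

# How many flowers of each species are recorded.
species_counts <- table(iris$Species)
print(species_counts)

# Confirm that iris is a data frame, which is what ggplot() works with.
print(is.data.frame(iris))
```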

Now, let’s feed this data to ggplot. You do this by passing an argument named data to the ggplot function as shown below. The data that we feed to ggplot here is the data frame named iris.

ggplot(data = iris)

Aesthetic mapping

Your plot will still be blank. With this command, we have only passed the data frame iris to ggplot. Now let’s get to the third point.

Aesthetic mapping – This is the organization of your data on the plot.

Now we will define the aesthetic mapping for the data. In its simplest form, we just define which variable goes on the X axis and which goes on the Y axis. You do this by calling another function named aes and passing its result to the ggplot function as the mapping argument.

ggplot(data = iris , mapping = aes(x = Sepal.Length , y = Sepal.Width))

With this command, we have told ggplot to put Sepal Length on the X axis and Sepal Width on the Y axis. Now, let’s take a look at our plot. It looks like this.

Plot with X and Y axis mapping

Earlier the plot was blank. Now we can see two axes. On the X axis we see Sepal Length and on the Y axis we see Sepal Width. ggplot has also drawn a grid scaled to the ranges of Sepal Length and Sepal Width.

But we still don’t see any data points on the plot. All that our command has done is set up the plot. That is exactly what the ggplot function does on its own: it prepares the data, the mapping and the axes, but draws no data.
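It may help to see that aes() is an ordinary function call whose result can be stored and inspected. The sketch below assumes nothing beyond the columns of iris used so far; the variable names mapping and p are my own.

```r
library(ggplot2)

# aes() only records which columns are mapped to which aesthetics;
# it does not look at the data at this point.
mapping <- aes(x = Sepal.Length, y = Sepal.Width)

# The names of the aesthetics that were mapped.
print(names(mapping))

# The stored mapping can then be passed to ggplot() together with the data.
p <- ggplot(data = iris, mapping = mapping)
```

Written this way, the data and the mapping are clearly two separate inputs to the same ggplot() call.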

Now we will get to the fourth point.

Geom

The actual plotting of data on the plot will be done by geometric objects i.e. geom. Now, let’s add the geom to our plot.

For this we add one of the geom_* functions to the ggplot call with the + operator, as shown below. Note that this command is not complete. But when you type up to this point in an editor with autocompletion, such as RStudio, you will see a list of the geom options that are available to you.

ggplot(data = iris , mapping = aes(x = Sepal.Length , y = Sepal.Width)) + geom_

You can see the options in the screenshot below. Which geom you choose depends on what kind of plot you want.

Geom options in ggplot

Now, let’s complete the command. For this demonstration, I will plot a scatter plot which is just points.

ggplot(data = iris , mapping = aes(x = Sepal.Length , y = Sepal.Width)) + geom_point()

Now, let’s take a look at our plot.

Our first plot with ggplot

Scatter plot using ggplot

And there you are. Your first plot with ggplot is ready.
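Since each step so far added one piece to the same call, the plot can also be built in stages by storing it in a variable. This is a sketch of that workflow rather than a required style; the names base_plot and scatter are my own.

```r
library(ggplot2)

# Data and aesthetic mapping only: axes and grid, but no data drawn.
base_plot <- ggplot(data = iris,
                    mapping = aes(x = Sepal.Length, y = Sepal.Width))

# Adding a geom with + returns a new plot; base_plot itself is unchanged.
scatter <- base_plot + geom_point()

# Printing a plot object is what actually draws it.
print(base_plot)
print(scatter)
```

Keeping the base plot in a variable makes it easy to try different geoms on the same data and mapping.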

But it’s a bit dull, isn’t it? Let’s add some color to it.

ggplot(data = iris , mapping = aes(x = Sepal.Length , y = Sepal.Width)) + geom_point(color = "red")

Can you spot the difference between this command and the earlier one? I added a parameter called color to geom_point and passed it the value "red". This tells ggplot to color all the points red.

This is how our plot looks now.

Scatter plot in red
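A fixed value such as "red" applies to every point. As the list of building blocks mentioned, color can instead be driven by the data through the aesthetic mapping. The sketch below contrasts the two; placing aes(color = Species) inside geom_point is one common way to do it, not the only one, and the variable names are my own.

```r
library(ggplot2)

base_plot <- ggplot(data = iris,
                    mapping = aes(x = Sepal.Length, y = Sepal.Width))

# Setting: color is a constant given outside aes(), so all points are red.
fixed_color <- base_plot + geom_point(color = "red")

# Mapping: color is given inside aes(), so each species gets its own color
# and ggplot adds a legend.
by_species <- base_plot + geom_point(mapping = aes(color = Species))

print(fixed_color)
print(by_species)
```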

Well, let’s say I am bored of dots in my scatter plot and I want to change the shape of my points. I add one more parameter named shape and pass it a value of 4, which stands for a cross. As you can see in the screenshot below the command, ggplot has changed the shape of the points in the scatter plot.

ggplot(data = iris , mapping = aes(x = Sepal.Length , y = Sepal.Width)) + geom_point(color = "red"  , shape = 4)
Scatter plot in red but with cross instead of dots

Well, I suppose you get the picture, don’t you? To change the points, you add more parameters to the geom function.

What parameters you can pass depends on the geom you are using. This is just the tip of the iceberg and if you start digging deeper in ggplot, you would find the opportunities almost endless.
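One way to check which aesthetics a geom understands is to ask the object behind it. ggplot2 exports these objects (GeomPoint for geom_point, GeomLine for geom_line) and they have an aesthetics() method; the help pages such as ?geom_point list the same information in a friendlier form. Treat the following as a quick sketch for exploration, not as the recommended interface.

```r
library(ggplot2)

# Aesthetics understood by the point geom and by the line geom.
point_aes <- GeomPoint$aesthetics()
line_aes  <- GeomLine$aesthetics()

print(point_aes)
print(line_aes)

# shape is meaningful for points but not for lines.
print("shape" %in% point_aes)
print("shape" %in% line_aes)
```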

Now, let’s change the geom from point to a line. This will generate a line plot instead of scatter plot.

ggplot(data = iris , mapping = aes(x = Sepal.Length , y = Sepal.Width)) + geom_line(color = "red")
Line plot instead of scatter plot. Looks a bit ugly, doesn't it?
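The jagged look comes from geom_line joining all 150 observations into a single path in order of their x value, regardless of species. If you would rather see one line per species, one option, sketched here with a variable name of my own, is to map color to Species.

```r
library(ggplot2)

# Mapping color to Species also splits the data into one group per species,
# so geom_line() draws three separate lines instead of a single one.
line_by_species <- ggplot(data = iris,
                          mapping = aes(x = Sepal.Length, y = Sepal.Width,
                                        color = Species)) +
  geom_line()

print(line_by_species)
```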

As you can see, the points have been replaced by a line. How about plotting one numeric variable against a categorical one?

In our iris data, Species is a categorical variable. It is not numeric like length or width; it is a class label.

Let’s draw another scatter plot, but with Species on the X axis instead of Sepal Length. For this, I have to change the aes call inside ggplot: I pass Species to x instead of Sepal.Length.

ggplot(data = iris , mapping = aes(x = Species , y = Sepal.Width)) + geom_point(color = "red")

And this is the output we get.

Scatter plot with one numeric variable and one categorical variable

But generally, if you want to plot a categorical variable against a numeric variable, you might want a box plot instead of a scatter plot. A box plot shows the median and the quartiles of each group as a box, whiskers that reach out to the most extreme values within 1.5 times the interquartile range from the box, and any points beyond the whiskers as individual outliers.

So now let’s change the geom from point to boxplot.

ggplot(data = iris , mapping = aes(x = Species , y = Sepal.Width)) + geom_boxplot(color = "red")
Box plot between a numeric and categorical variable
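To see what each box is built from, the numbers behind the plot can be inspected. This sketch uses layer_data(), a ggplot2 helper that returns the data computed for a layer, and compares it with medians computed in base R. The variable names medians, box_plot and box_stats are my own.

```r
library(ggplot2)

# Median sepal width per species, computed directly from the data.
medians <- tapply(iris$Sepal.Width, iris$Species, median)
print(medians)

# The statistics ggplot computes for each box: whisker ends (ymin, ymax),
# quartiles (lower, upper) and the median (middle).
box_plot <- ggplot(data = iris, mapping = aes(x = Species, y = Sepal.Width)) +
  geom_boxplot()
box_stats <- layer_data(box_plot, 1)
print(box_stats[, c("ymin", "lower", "middle", "upper", "ymax")])
```

The middle column should line up with the medians from tapply(), one row per species.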

Are you bored of red yet? Let’s change the color of these boxes and also add a fill color inside the boxes.

ggplot(data = iris , mapping = aes(x = Species , y = Sepal.Width)) + geom_boxplot(color = "purple" , fill = "black")
Box plot with color and fill

Summary

So far, we have created a scatter plot, a line plot and a box plot, and we have added some color to them.

You can do a lot more than the 3 plots that I have illustrated so far. Realistically speaking, ggplot is remarkably powerful.

This is one handy tool to have in your toolbox.

Limitations

But just like any tool, it also has its limitations. There are some things lattice can do that it can’t. It’s not very good with 3D plots, and you may need to use rgl for those. On its own it also can’t handle graph theory type graphs that have nodes, or decision tree structures.

So that’s it for this time folks. Please let me know what you think in the comments section below. If you want any improvements in this post please let me know and I would be glad to implement your suggestions.
