This tutorial will show you how to use the free NHGIS service to obtain U.S. Census data that you can use in GIS analysis and mapping.
This tutorial was written with beginners in mind. If you are familiar with GIS, consider reading the section titled "Concise Instructions for People With GIS Experience" at the bottom of this document.
One of the most common geospatial data requests we receive at the AGSL involves U.S. Census data. If your research involves racial and ethnic groups, housing, language, employment, education, wealth distribution, or any of the other metrics the U.S. Census measures, chances are you would like to have this data too. Obtaining the numbers on their own is easy enough — see American FactFinder — but adding a spatial dimension to your research requires a different approach.
The National Historical Geographic Information System (NHGIS), a service provided by the Minnesota Population Center at the University of Minnesota, is one of the best ways to get this data. If you're learning about GIS at UWM, you likely want a shapefile that you can use for analysis in ArcGIS or QGIS. Data from NHGIS can also be adapted for web maps, interactive applications, and cartography.
Another advantage of NHGIS: It can be used to obtain data from as far back as the first U.S. Census in 1790. Not only does it provide the raw numbers, but you can also obtain shapefiles containing historical Census boundaries.
NHGIS can be intimidating to navigate. While the team behind it has done an excellent job balancing usability and comprehensiveness, demographic data is by nature multifaceted and complex. Once you get used to it, however, you will undoubtedly end up relying on NHGIS a lot for your research. It is absolutely worth learning to use.
The end result of this tutorial will be a map of Milwaukee County showing the percentage of Black residents in each Census Tract using the most current data. This will take less than one hour to accomplish. The process for examining other locations, years, variables, and enumeration units will not be much different. Feel free to follow along by using the same data as our examples, or other data of your choosing. We will use ArcMap to work with the data we obtain.
A few other notes:
If you require further assistance, please e-mail the AGSL GIS data team. Happy mapping!
The first thing one must do is create a Minnesota Population Center account. The process is quick and painless. A permanent record of all extracts (data) you obtain will be tied to your account, allowing you to access or revise them later.
Now the fun part. On the NHGIS homepage, click the Select Data link on the left menu. You will see a page similar to this:
These filters will allow us to tell NHGIS what specific data we want.
Select Geographic Levels to specify your enumeration unit. In our example, we want to obtain data on a Census Tract level.
Click on the yellow plus sign next to Census Tract (or whatever geographic level you will be using). This will add that level to the Selected Geographic Level Filters area at the top of the page. Note that you can select as many units as you want, but this may add a layer of confusion to your process. Do whatever you are comfortable with. Once you are satisfied with the level(s) you have chosen, click Submit at the bottom of the window.
A bunch of rows will now appear on the page. There are far too many to look through with such a limited filter; let's add more.
Click on Years. If you only want to look at U.S. Census data, the Decennial Years column will be your only concern; U.S. Census data is collected every ten years. Some options will be greyed out, but you can still select them. The greyed-out options are years for which data is not available given the other filters you've selected. For example, since we have chosen Census Tracts as our geographic level, every decennial year prior to 1910 is grey. This is because Census Tracts were not used before 1910!
Select the year(s) you're interested in, and hit Submit. We'll use the most current data, which is from the 2010 Census.
The rows under Select Data will update once more, but there are still a lot of records. Let's click on Topics to really narrow it down.
Find the topic(s) you wish to map in the list that pops up. In our experience, most UWM students are concerned with Population data, but there are other categories that can be accessed using the tabs on the left. Again, options may be greyed out if data for those topics cannot be obtained given the other filters you have selected. We're interested in measuring the African American population in Milwaukee County, so we will select Race as our topic.
The first plus sign next to Race is the Table Topic Filter; the second is the Breakdown Filter. You may click the blue question mark next to each of these terms at the top of the table to learn more. For the purposes of this tutorial, we will just note that we are only interested in the Table Topic Filter, and only in very niche projects would one be interested in the Breakdown Filter. Click on the first plus sign and Submit once more.
We could select Datasets to narrow down our search even further, but it is not necessary considering the specificity of the other three filters. By all means take a look at the options available to you, but at this point most of them will probably be greyed out.
Underneath the Apply Filters section is a table containing the results of your search. (You should be looking underneath the Source Tables tab, not Time Series Tables or GIS Boundary Files.) Unless you're looking for some very obscure data type, chances are you will still have a significant number of rows to search, spread across multiple pages. Click on the column headers to sort the results and help make sense of it all. Sometimes there is no easy way to find the data table you want; in that case, search through every record and apply more filters if possible.
As you might imagine, many researchers want to use race data. If we click on the Popularity column heading, we can see that one row is much more popular than the others.
It is likely this first row is the data we want. Let's confirm it by seeing what information the data table contains. Click on the table name, P3. Race, to open a window detailing the contents of this dataset.
Don't worry if most of this doesn't make sense to you. At the very least, we can determine from the Universe field that this table measures the entire population of a geographic level. "Census Tract" is one of those available geographic levels. Finally, it looks like "Black or African American alone" is one of the variables available to us. In other words, this dataset contains the total population of people who identify as Black or African American in each Census Tract. That is exactly what we're looking for.
Exit the Data Table Details window to return to the filter search results. Click on the yellow plus sign on the left to add this data to our Data Cart. You can select multiple datasets if you would like — this could be useful if, for instance, you are not entirely certain that you've selected the right dataset.
You may notice that we've been using the term "data table." That's because when you download data from NHGIS, it is delivered to you in the form of a table — the kind of thing you could open in, for instance, Microsoft Excel. It has no associated geographic coordinates, and it's definitely not a shapefile.
In the same area where you selected which dataset you want, click on the GIS Boundary Files tab. If you've provided adequate Geographic Levels and Years filters, the list of results will be mercifully short. In fact, our query only returns one boundary file!
This boundary file contains all Census Tract boundaries used in the 2010 U.S. Census. We'll add it to our Data Cart by clicking on the yellow plus sign. The cart will update accordingly.
The worst part is officially over. Congratulations! We still have some work left to do, however. NHGIS will provide a data table and boundary shapefile, and it will be our job to join them. Our data will include every Census Tract in the U.S., but we are only concerned with Milwaukee County — so we will need to trim the data too. All of this is easy, and if you have some GIS experience you may already know how to do it on your own.
But we're not going to worry about any of that until we actually have our data. Click Continue in the Data Cart, then click Continue again on the next page. Keep all of the options at their defaults. You may want to add a brief description of your project, so you can find this dataset later to modify or re-download. Click on Submit to be brought to your Extracts History.
Note that the status of your new extract is queued. It takes a little time for NHGIS to process a simple dataset like ours; more complicated extracts will, of course, take longer. Take a break for a few minutes. By default, NHGIS will send you an e-mail once your data is ready for download. Refresh the page until your extract's status is completed.
You will need to download the data table and the GIS files as separate .ZIP files. Do so, and then extract them once the downloads have finished. (If you're unable to extract your data from the .ZIP file, try 7-Zip.) Note that boundary shapefiles come in the form of a .ZIP file within another .ZIP file, and you'll need to extract both.
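If you would rather script this step, here is a minimal sketch using Python's standard `zipfile` module. It extracts an archive and then unpacks any .ZIP files found inside it, which covers the ZIP-within-a-ZIP boundary shapefile case. The function name `extract_nested` and the file names in the usage note are hypothetical placeholders, not NHGIS's actual naming.

```python
import zipfile
from pathlib import Path


def extract_nested(zip_path, dest_dir):
    """Extract zip_path into dest_dir, then unpack any .zip files it contained.

    Returns the paths of all extracted files that are not themselves ZIP archives.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        names = zf.namelist()

    extracted = []
    for name in names:
        path = dest_dir / name
        if path.is_dir():
            continue  # directory entries carry no data of their own
        if path.suffix.lower() == ".zip":
            # Inner archive (e.g. the boundary shapefile): unpack it into a
            # folder named after the archive, recursing in case of deeper nesting
            extracted.extend(extract_nested(path, path.with_suffix("")))
        else:
            extracted.append(path)
    return extracted
```

For example, `extract_nested("nhgis0017_shape.zip", "nhgis_data")` (hypothetical names) would return the paths of the .shp, .dbf, .prj, and other files once both layers of ZIP have been unpacked.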
All of your data is now ready for use.
If you have ever taken a GIS course, this should be familiar ground for you. If not, no worries. The ability to easily perform joins in GIS software is one of the advantages of using NHGIS, and it's a big one.
First take a look at the table you downloaded. Along with the .CSV file containing the data itself, you received a codebook (.TXT file) in the same directory. Unfortunately, the columns in the data table you just downloaded are often not self-explanatory: for this project, there is no clearly labeled "% Black or African American" column. This codebook is a necessary part of understanding your data. Open it and take a look.
You can quickly glance at the Context Fields. What we're mostly interested in is the part of the document starting with "Breakdown."
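Because the codebook is plain text, you can also pull the column codes out of it programmatically. This is a hedged sketch: it assumes each variable appears on its own line as a six-character code, a colon, and a label (for example `H7X003:      Black or African American alone`). Check that assumption against your own codebook before relying on it; `parse_codebook` is a hypothetical helper, not part of NHGIS.

```python
import re

# Assumed shape of a variable line in the codebook's table section, e.g.
#     H7X003:      Black or African American alone
# i.e. a 3-character table code, a 3-digit variable number, a colon, then a label.
# Context fields such as "GISJOIN:" or "YEAR:" do not fit this pattern.
VARIABLE_LINE = re.compile(r"^\s*([A-Z0-9]{3}\d{3}):\s+(.+?)\s*$")


def parse_codebook(text):
    """Return {column code: label} for every variable line found in the codebook text."""
    codes = {}
    for line in text.splitlines():
        match = VARIABLE_LINE.match(line)
        if match:
            codes[match.group(1)] = match.group(2)
    return codes
```

Reading the .TXT file into a string and passing it to `parse_codebook` gives a dictionary you can search for the labels you care about.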
Our variable of interest, "Black or African American alone," is associated with the column heading H7X003. Because we want to map the Black population as a percentage of the total population in a Census Tract, we should also know that H7X001 is the column for the total population in a tract. Take the time to note all of the columns that you will want to use.
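The percentage itself is simple arithmetic: H7X003 divided by H7X001, times 100. As a sketch, here is the same calculation in Python with the standard `csv` module, assuming the .CSV has a single header row containing GISJOIN, H7X001, and H7X003 (if your extract includes an extra descriptive header row, skip it first). Tracts with a total population of zero, such as tracts covering only water, return None instead of raising a division-by-zero error. The function names are hypothetical.

```python
import csv


def percent_black(row):
    """Percent of a tract's population that is Black or African American alone.

    Per the codebook: H7X003 = Black or African American alone, H7X001 = total.
    Returns None for tracts with zero population, where a percentage is undefined.
    """
    total = int(row["H7X001"])
    black = int(row["H7X003"])
    if total == 0:
        return None
    return 100.0 * black / total


def percentages_by_tract(csv_file):
    """Map each GISJOIN code in an open NHGIS tract CSV to its percent value."""
    return {row["GISJOIN"]: percent_black(row) for row in csv.DictReader(csv_file)}
```

A tract with 200 residents, 50 of whom are counted in H7X003, comes out at 25.0 percent.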
Keep both the boundary file and data table you downloaded somewhere on your computer where you'll be able to find them later.
It's time to open ArcMap. Note that what we'll be doing can be done in pretty much any GIS software, such as the free QGIS. Most UWM students will develop their GIS skills with ArcGIS, and it is available on all public computers across campus that are running Windows. This includes the computers on the first floor of the library.
We'll work on a new, blank map document. Use the Add Data button to add both the .CSV table and shapefile you downloaded from NHGIS to your document. If that sentence made no sense to you, see the ArcGIS documentation: Adding layers to a map.
If your computer is old, this next part will require some patience. Your boundary shapefile contains boundaries from the entire United States, so ArcMap may take a little while to render it all. If your computer is having difficulty processing so much data, you can temporarily pause map drawing with the Pause Drawing button in the bottom left corner of the map view.
We need to associate the data in the .CSV data table with the boundaries in our shapefile. This will be done by joining the tables. In GIS, the term join has a very specific meaning. In our case, it involves finding a column that both our .CSV data table and our shapefile's attribute table have in common, and merging the two into one big table.
The intricacies of joins/relates in GIS are beyond the scope of this document. If you want to learn more — and as a GIS analyst you really must! — check out the ArcGIS documentation: About joining and relating tables.
If you have not done this sort of thing in the past, right click on one of the file names in the left column — in our example, US_tract_2010 and nhgis0017_ds172_2010_tract.csv — and open the (attribute) table. Thoroughly examining the table of any data you download is good practice. Do you see which field we're going to use to join the tables? If it's not immediately obvious, you're thinking about it too much. Look at the names of all the columns!
NHGIS very conveniently includes a GISJOIN column in all of its data. In other words, they've done a big part of the hard work for us! Right-click on the shapefile name (for us, US_tract_2010), hover over Joins and Relates, and click on Join.
The drop-down should already have "Join attributes from a table" selected, so leave that as it is. The first blank field prompts you to choose a field from the current file for your join. That would be GISJOIN. The second drop-down should already have your data table selected, but if not, do that. Finally, the last part asks which field you're going to base the join on — again, that would be GISJOIN. It doesn't matter which option you select under Join Options, though you might as well play it safe and select the first. Your window should look something like this:
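Conceptually, a join is a lookup keyed on the shared column. The Python sketch below illustrates the idea with plain dictionaries standing in for the shapefile's attribute records and the CSV rows; it shows the concept, not how ArcMap implements joins. `keep_all=True` corresponds to Keep all records (tracts without a matching row get empty values), and `keep_all=False` to Keep only matching records. It assumes GISJOIN values are unique in the CSV, and unlike ArcMap it does not prefix joined field names with the table name. `join_on_gisjoin` is a hypothetical name.

```python
def join_on_gisjoin(shape_records, table_rows, keep_all=True):
    """Attach CSV attributes to boundary records that share the same GISJOIN value.

    shape_records: list of dicts (one per boundary feature's attributes).
    table_rows: list of dicts (one per CSV row), each with a GISJOIN key.
    """
    lookup = {row["GISJOIN"]: row for row in table_rows}
    # Fields to copy over: every CSV column except the shared key
    table_fields = [f for row in table_rows[:1] for f in row if f != "GISJOIN"]

    joined = []
    for record in shape_records:
        match = lookup.get(record["GISJOIN"])
        if match is None and not keep_all:
            continue  # "Keep only matching records" drops unmatched features
        merged = dict(record)
        for field in table_fields:
            # Unmatched features get None, similar to null values in ArcMap
            merged[field] = match[field] if match is not None else None
        joined.append(merged)
    return joined
```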
It's generally a good idea to Validate Join and make sure everything matches up as it should. Since there's so much data here, your computer may take a long time to finish validating. Give it a go if you'd like some peace of mind, and then click OK.
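The question validation answers is essentially how well the two sets of GISJOIN codes line up. This hedged sketch compares them directly, reporting how many codes match and listing any that appear on only one side; it is a stand-in for the idea, not a reproduction of ArcMap's Validate Join report. Because it works on Python sets, it stays fast even for every tract in the country. `validate_join` is a hypothetical name.

```python
def validate_join(shape_keys, table_keys):
    """Summarize how two collections of GISJOIN codes line up before joining."""
    shape_keys, table_keys = set(shape_keys), set(table_keys)
    return {
        "matched": len(shape_keys & table_keys),
        # Features that would end up with empty values after the join
        "boundaries_without_data": sorted(shape_keys - table_keys),
        # CSV rows that have no boundary to attach to
        "data_without_boundaries": sorted(table_keys - shape_keys),
    }
```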
Open the attribute table for your boundary shapefile once more. Scroll all the way to the right. Do these column names look familiar? They should be the same ones you examined in the codebook earlier. Congratulations — all of your data is now in one convenient (and, frankly, huge) shapefile!
If you want to make a map of the entire United States, you're pretty much ready to go. If not, we still have some work to do. Regardless, take some time to appreciate the fact that you can now, with very little effort, make a detailed thematic map of the whole country. And you could do it with anything in the U.S. Census. Hopefully you feel that all of the effort you put into getting here was worth it!
There are a few ways to trim your data so it only includes your area of interest. The easiest method uses FIPS codes, which are already included in NHGIS data.
Check out the attribute table for your shapefile. You will find two columns: one begins with STATEFP, the other with COUNTYFP. They will be followed by a number; in our case, this is 10, because we are using data from 2010. Each state and county in the United States has a unique combination of FIPS codes. A simple way to find the FIPS code for an area is to use the official U.S. Census website: 2010 FIPS Codes for Counties and County Equivalent Entities. The FIPS code for Wisconsin is 55; for Milwaukee County, 079.
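One pitfall worth knowing: FIPS codes are stored as text with leading zeros. If you open the .CSV in spreadsheet software such as Excel, it may convert '079' into the number 79, which no longer matches the text value '079' in a query. This small Python helper (hypothetical, for illustration) restores the standard widths of two digits for states and three for counties.

```python
def normalize_fips(state, county):
    """Return zero-padded (state, county) FIPS strings, e.g. (55, 79) -> ('55', '079').

    Accepts ints or strings, so values that lost their leading zeros are repaired.
    """
    return str(int(state)).zfill(2), str(int(county)).zfill(3)
```

Concatenating the two parts gives the five-digit county code, for example '55' + '079' = '55079' for Milwaukee County.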
Click on the icon on the top left of the attribute table window, then select Select by Attributes. Alternatively, you can use the Select menu at the top of the main ArcMap window.
This is where you will tell ArcMap which data you want. Leave the Method box as the default, Create a new selection. Find STATEFP10 (it may be preceded by your shapefile name) and double-click on it. Click on the equals sign (=). Now click on Get Unique Values, which will return all state FIPS codes present in your data. In our example, we're looking for Wisconsin's code, 55. Find it toward the bottom of the list and double-click on it. Your WHERE statement in the bottom box should now read something like:
"US_tract_2010.STATEFP10" = '55'
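Get Unique Values simply lists each distinct value found in a field. A rough Python equivalent, assuming the attribute records are represented as dictionaries (an illustration, not ArcMap's internals), uses `collections.Counter`, which also tells you how many tracts carry each value. `unique_values` is a hypothetical name.

```python
from collections import Counter


def unique_values(records, field):
    """Return (value, count) pairs for one field, sorted by value."""
    counts = Counter(record[field] for record in records)
    return sorted(counts.items())
```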
We are interested in all Census Tracts that have both the Wisconsin state FIPS code and the Milwaukee County FIPS code. This means our query will include the word AND, which requires every record our query returns to match both FIPS codes. In SQL, AND is a logical operator: it combines two conditions, and a record is selected only if both are true. For more information, see SQL AND & OR Operators at W3Schools.
Complete the second half of the query, using the same method above but with the COUNTYFP field instead. If you did it correctly, your final statement should look something like this:
"US_tract_2010.STATEFP10" = '55' AND "US_tract_2010.COUNTYFP10" = '079'
ArcMap will read this query like instructions, in much the same way you are reading this tutorial. In plain English, what you have effectively told the software is: select all of the records in the US_tract_2010 shapefile where the state FIPS code is 55 and the county FIPS code is 079. If the state FIPS code for a Census Tract is 55, but the county FIPS code is not 079, that Tract will not be included.
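To see the AND logic in action outside ArcMap, the sketch below runs the same condition with Python's built-in `sqlite3` module against a throwaway in-memory table. SQLite's syntax differs slightly from the ArcMap dialog: the field names are written unqualified, and the FIPS values are passed as parameters. The rows in the usage note are made-up placeholders, not real tracts, and `select_county` is a hypothetical name.

```python
import sqlite3


def select_county(rows, state_fips, county_fips):
    """Return GISJOIN codes of rows matching both FIPS codes, sorted.

    rows: (GISJOIN, STATEFP10, COUNTYFP10) tuples stored as text, like the shapefile fields.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE US_tract_2010 (GISJOIN TEXT, STATEFP10 TEXT, COUNTYFP10 TEXT)")
    con.executemany("INSERT INTO US_tract_2010 VALUES (?, ?, ?)", rows)
    query = (
        "SELECT GISJOIN FROM US_tract_2010 "
        "WHERE STATEFP10 = ? AND COUNTYFP10 = ? ORDER BY GISJOIN"
    )
    result = [gisjoin for (gisjoin,) in con.execute(query, (state_fips, county_fips))]
    con.close()
    return result
```

With rows `("T1", "55", "079")`, `("T2", "55", "025")`, and `("T3", "17", "079")`, `select_county(rows, "55", "079")` returns `["T1"]`: T2 is in the right state but the wrong county, and T3 has county code 079 in a different state, so AND excludes both.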
Click Apply. It may be difficult to see, but ArcMap has selected the entirety of Milwaukee County. Zoom and pan your map to see for yourself.
As you may notice, however, the rest of the U.S. is still present. Indeed, selecting alone does not isolate your data. Right-click on your shapefile name in the Table of Contents on the left, hover over Selection, and click on Create Layer From Selected Features. Remember this option — it will save you a lot of time and energy throughout your GIS career!
You should have a new layer listed on your Table of Contents; it will be called something like "US_tract_2010 selection." Go ahead and remove your original shapefile and data table (right click them and select Remove), since all of our data is now concentrated in this brand new layer. Zoom in a bit if necessary; you will notice that you have isolated your area of interest!
You have the data you need. The rest is in your hands. Symbolization, projection, and ancillary map elements are all things you will need to consider when constructing your final map. If you are enrolled in a GIS class, you will undoubtedly learn the skills to symbolize your map to your needs and liking.
We wrote this document with beginners — people with little or no GIS experience — in mind. If you have some familiarity with basic GIS operations, perhaps from a GIS course, you may find our in-depth approach extraneous. The following text is an abbreviated list of instructions. If you need additional help with a step, reference the respective section in this tutorial.
AND operator if you are using more than one. In our example, the SQL statement was "US_tract_2010.STATEFP10" = '55' AND "US_tract_2010.COUNTYFP10" = '079'. Run the query.