February 1993 - Volume 9:4
By Lauris Olson, Pat Hildebrand, and Janusz Szyrmer

With the publication of the 1990 Census of Population and Housing, the U.S. Census Bureau continues its leadership in developing ways to handle large amounts of demographic, social, and economic data. The decennial census pioneered punch cards and electric tabulating machines (1890), population sampling techniques (1940), automated data entry using optical scanning (1960), and electronic data releases (1970). For 1990, the Census Bureau has enhanced public access to census data with CD-ROM releases and a variety of geographic data products based upon a new nationwide digital map and geographic database.

The Census Bureau's new efforts have increased the opportunities available to census data users at Penn. Like the 1980 Census data products, the new 1990 Census electronic data products will be available at Social Science Computing's Social Science Data Center, but for the first time they are also available at the Reference Department of Van Pelt Library. Although both sites may obtain the same title, each site will receive it in a different format. Answering a census question therefore requires an awareness not only of data product content and structure, but also of the available formats and their capabilities.
1990 Census electronic data releases

Early in 1990, every housing unit in America received a 1990 Census "short-form" questionnaire; one of every six housing units received a "long-form" questionnaire. The public won't see the completed questionnaires again until 2062, when confidentiality laws allow them to be released as the popularly named "schedules." In the interim, however, the Census Bureau publishes printed reports and electronic data products presenting data summarized from the questionnaires.

Summary Tape Files, or STFs, present questionnaire data as subject-specific tables for hierarchically arranged geographic areas: e.g., states, urbanized areas and metropolitan statistical areas, counties, municipalities and places, and census tracts and blocks. Organization of the STFs reflects the two questionnaire forms. "Short form" data--basic information like age, race, family structure, owned/rented housing, and number of rooms--appear in STF 1 and STF 2. The sample, or "long form," data are released in STF 3 and STF 4; sample subjects include income and poverty status, education, ancestry, mortgage costs, and kitchen facilities.

Why four Summary Tape Files for two data components? STF 1 and STF 3 present basic tables for the whole population, with some tables repeated, or "iterated," for racial groups; STF 2 and STF 4 use the same respective data, but iterate each table for individual racial groups.

Although the public must wait 72 years to see individual questionnaires, two data files--the Public Use Microdata Sample (PUMS) files--provide a five percent sample of each county's individual questionnaires and one percent of each metropolitan statistical area's questionnaires. The questionnaires in these files are stripped of all identifying information to protect confidentiality.

The STFs and PUMS files allow the use of 1990 Census data at different levels of analysis.
The PUMS questionnaire responses can be used to generate profiles of personal characteristics or to correlate diverse attributes at an individual level within a large region. The STFs describe communities at a number of geographic levels.
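To make the individual-level use of PUMS concrete, here is a minimal sketch in Python of a weighted cross-tabulation. The records, the field names (`education`, `in_poverty`, `weight`), and the uniform weight of 20 (one record standing in for roughly 20 persons in a five percent sample) are illustrative assumptions, not the actual PUMS record layout.

```python
from collections import defaultdict

# Hypothetical person records; real PUMS files use their own coded fields
# and may supply a separate weight for each record.
SAMPLE_RECORDS = [
    {"education": "college", "in_poverty": False, "weight": 20},
    {"education": "college", "in_poverty": False, "weight": 20},
    {"education": "high school", "in_poverty": True, "weight": 20},
    {"education": "high school", "in_poverty": False, "weight": 20},
]


def weighted_crosstab(records, row_key, col_key, weight_key="weight"):
    """Sum record weights for every (row value, column value) pair."""
    table = defaultdict(int)
    for record in records:
        table[(record[row_key], record[col_key])] += record[weight_key]
    return dict(table)


if __name__ == "__main__":
    table = weighted_crosstab(SAMPLE_RECORDS, "education", "in_poverty")
    for (education, in_poverty), persons in sorted(table.items()):
        print(education, in_poverty, persons)
```

An STF, by contrast, delivers tables that are already tabulated for each geographic area, so attributes cannot be combined in new ways at the level of the individual record.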
1990 Census products at SSDC

The campus source for electronic census data has long been the Social Science Data Center (SSDC). SSDC has PUMS of various sample sizes and geographic areas for the years 1900, 1910, 1940, 1950, 1960, 1970, and 1980. For the 1990 Census, SSDC will obtain one percent and five percent PUMS in the near future. Several 1990 Census data products are currently available at SSDC: P.L. 94-171 (congressional redistricting data) and STF 1A ("short form" data down to block groups) for various states, STF 1B (block level "short form" data) for Pennsylvania only, STF 1C (the "National File"--"short form" data for the entire U.S., states, counties, and metropolitan areas), STF 3A (sample data down to block groups) for various states, and the Modified Age/Race, Sex and Hispanic Origin State and County (MARS) file.

These Census Bureau data products, like most SSDC data, are obtained from the Inter-University Consortium for Political and Social Research (ICPSR). Therefore, use of these data must be in accordance with ICPSR by-laws. ICPSR stipulates that the data be used only for educational purposes, such as classes, or for academic research. Further, the data cannot be used at a location other than the University of Pennsylvania without express written permission from ICPSR.

The data format at SSDC is generally for analysis, not for look-up, i.e., in most cases, it is raw data. Census data products from ICPSR are received in the original formats released by the Census Bureau. However, some datasets put together at Penn either by SSDC or the Population Studies Center use SAS formatting. In addition, many ICPSR files have been put together for special purposes, such as merging data with the Panel Study of Income Dynamics. These special files can be especially useful to researchers trying to combine data from the decennial census with other studies available at SSDC.

In most cases, SSDC obtains datasets only when they are requested.
When a request has been made through the SSDC consultants' office, the time needed to receive the data depends on a number of factors, but it generally takes a minimum of two weeks. As a rule of thumb, the larger the data file, the longer the wait. SSDC will process requests from outside the School of Arts and Sciences on an individual basis.
1990 Census products at Van Pelt Library

The Census Bureau currently plans to release 13 of its 20 electronic data products on CD-ROM. As part of the U.S. Government Printing Office's library depository program, Van Pelt Library's Reference Department will be receiving all of these CD-ROM files, along with the entire printed 1990 Census report series. Each data release will be available at Van Pelt Library in its entirety on a walk-in basis. At present, the complete CD-ROM STF 1 series is available for the entire U.S., and the STF 3 series is awaiting completion. Some data not yet released on CD-ROM can be obtained at Van Pelt Reference using CENDATA, the Census Bureau's fee-based online database.
STF data on CD-ROM are in dBASE III Plus format, and include several dBASE index files to speed access. The STFs can be manipulated using any dBASE-compatible software. At Van Pelt Reference, in addition to dBASE IV, two DOS applications provided by the Census Bureau are ready to use the STFs. GO is a menu-driven program for viewing, and printing or downloading, selected data tables for a specific location; it can also be used to download a data table for a class of locations. EXTRACT allows the user to select data items from several data tables, select data for multiple geographic areas, query data, and create user-defined data items.

Both GO and EXTRACT are easy to use: they require neither programming skills nor training. Neither is especially powerful, however. It is easier and faster to use them to create a subset of data to be taken away and reworked at your office using dBASE, Excel, or other software.

The biggest obstacle to using the 1990 Census on CD-ROM is a technological one. The files are huge, and processing a query can take 30 minutes or longer. To help census users walk away with large chunks of 1990 Census data quickly, Van Pelt Reference has created STF 1 datasets for each county in the eight-county Philadelphia, PA-NJ Primary Metropolitan Statistical Area. These datasets were created by selecting records for each county, county subdivision and place, census tract, and block group, and compressing them to fit onto diskettes: the STF 1 file for Philadelphia--2,230 records in more than 20 Mbytes--has been compressed onto two diskettes. Software is available at Van Pelt Reference for archiving other datasets.
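For readers who take a subset away to rework, the following is a minimal Python sketch of reading a dBASE III file and keeping only the records at one geographic level. It relies on the commonly documented dBASE III layout (a 32-byte header, 32-byte field descriptors, fixed-width records). The field name `SUMLEV` and the helper `select_level` are assumptions made for illustration, and memo fields and other dBASE variants are not handled.

```python
import struct


def read_dbf(stream):
    """Yield a dict for each non-deleted record in a dBASE III file object."""
    header = stream.read(32)
    # Bytes 4-7 hold the record count, 8-9 the header length,
    # and 10-11 the length of one record (all little-endian).
    n_records, header_len, record_len = struct.unpack("<IHH", header[4:12])

    # Field descriptors are 32 bytes each; the list ends with a 0x0D byte.
    descriptors = stream.read(header_len - 32)
    fields = []
    for offset in range(0, len(descriptors) - 1, 32):
        chunk = descriptors[offset:offset + 32]
        if chunk[0] == 0x0D:
            break
        name = chunk[:11].split(b"\x00", 1)[0].decode("ascii")
        fields.append((name, chr(chunk[11]), chunk[16]))

    for _ in range(n_records):
        raw = stream.read(record_len)
        if len(raw) < record_len:
            break  # truncated file
        if raw[:1] == b"*":
            continue  # record is flagged as deleted
        record, pos = {}, 1  # byte 0 is the deletion flag
        for name, field_type, length in fields:
            text = raw[pos:pos + length].decode("ascii", "replace").strip()
            pos += length
            if field_type == "N":
                if not text:
                    record[name] = None
                elif "." in text:
                    record[name] = float(text)
                else:
                    record[name] = int(text)
            else:
                record[name] = text
        yield record


def select_level(records, level, field="SUMLEV"):
    """Keep records whose (assumed) summary-level field equals `level`."""
    return [r for r in records if r.get(field) == level]
```

With a file copied from one of the diskettes (the name `stf1.dbf` is hypothetical), `select_level(read_dbf(open("stf1.dbf", "rb")), "050")` would return the records at one level, assuming the file carries a `SUMLEV` field coded that way. Reading the file sequentially like this ignores the dBASE index files, so it suits small subsets rather than the full CD-ROM files.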
1990 Census geographic data

SSDC has on hand the principal geographic reference data tapes for the 1990 Census: TIGER/Census Tract Comparability File for matching 1980 and 1990 census tracts, Geographic Reference File--Names, and TIGER/Census Tract Street Index File for matching street addresses with census tracts.

Van Pelt Reference has 1990 Census geographic data products on CD-ROM. The principal geographic dataset, the TIGER/Line files for the entire U.S., is already available at the Library. The geographic reference data files, as well as census tract and block map images on CD-ROM, are expected at the Library during 1993.

The TIGER/Line files contain digital mapping data. That is, they provide latitude and longitude coordinates for points, lines, and polygons representing the geographic areas identified in the 1990 Census; additional data include ZIP codes and address ranges for urban areas. The Library has the capability to display maps drawn from TIGER data, but the TIGER files are more effectively used in geographic information system (GIS) applications: at Penn, successful TIGER-using GIS packages include ARCInfo and ARCView, GIS Plus, SAS/Graph, TransCAD, and AtlasPro.

As with the STF data, TIGER files for the eight-county Philadelphia, PA-NJ PMSA are available compressed on diskettes. Archiving software is available at Van Pelt Reference for compressing TIGER/Line files for other regions from the CD-ROMs.
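Address ranges are what make it possible to match a street address to a census tract. As a rough sketch in Python, the example below assumes a simplified, hypothetical record for one side of a street segment (street name, first and last house numbers, and a tract code) and assumes each side carries all-odd or all-even numbers. It is not the actual TIGER/Line or Street Index record layout, and the street and tract codes are made up.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SegmentSide:
    """One side of a street segment (hypothetical, simplified layout)."""
    street: str      # street name in upper case
    from_addr: int   # house number at one end of the segment
    to_addr: int     # house number at the other end
    tract: str       # census tract code (made up in this example)


def find_tract(sides: Iterable[SegmentSide], street: str, number: int) -> Optional[str]:
    """Return the tract of the first segment side whose range holds the address."""
    wanted = street.strip().upper()
    for side in sides:
        low, high = sorted((side.from_addr, side.to_addr))
        same_parity = number % 2 == low % 2  # odd and even sides are separate
        if side.street == wanted and low <= number <= high and same_parity:
            return side.tract
    return None


SIDES = [
    SegmentSide("MAIN ST", 101, 199, "0001.00"),  # odd side
    SegmentSide("MAIN ST", 100, 198, "0002.00"),  # even side
]

if __name__ == "__main__":
    print(find_tract(SIDES, "Main St", 151))
    print(find_tract(SIDES, "Main St", 150))
```

A GIS package would normally go further and interpolate a latitude and longitude along the segment; this sketch stops at identifying the tract.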
Training and informational opportunities

It's easy to learn more about the 1990 Census at Penn. The rapid changes taking place in campus computing make the SSDC consultants' office (398 McNeil, 898-6454) the best information source for current SSDC data holdings and accessing census data through SSDC. For Van Pelt Library's census data holdings, use Franklin, the Penn Library's online catalog (search FCAT using t=1990 census); view PennInfo's Library menu to see press releases, product descriptions, and sample tables; or ask at the Van Pelt Reference Desk.

The Library and SSDC hold census awareness presentations as part of their regularly scheduled orientation programs. Van Pelt Library's Noontime CD-ROM demonstrations include the 1990 Census data products. Check Penn Printout or PennInfo, or call Van Pelt Reference (898-8118). Reference Librarians provide instructional sessions for Penn courses using census data, and, of course, they are ready to sit down and help any individual. Contact the SSDC consultants' office for SSDC presentations and assistance.

To assist Penn faculty, students, and researchers using 1990 Census data, the Social Science Data Center and Van Pelt Library are sponsoring a one-day Penn Census Users seminar in mid-March 1993. Speakers will discuss a variety of topics, including 1980 and 1990 data comparability and confidentiality, and Penn and other census data users will describe their research, which includes the use of geographic information systems.
LAURIS OLSON is a Reference Librarian at Van Pelt Library. PAT HILDEBRAND is Database Administrator, and DR. JANUSZ SZYRMER is Associate Director, at the Social Science Data Center.