Workshop Report-Methodological Advances and the Human Capital Initiative NSF 97-97

Workshop Report

Methodological Advances
and the
Human Capital Initiative

NATIONAL SCIENCE FOUNDATION

This document reports results from a workshop on "Methodological Advances and the Human Capital Initiative" held 12 July 1996 at the National Science Foundation. The views and comments contained in this document are not necessarily those of the National Science Foundation, but are instead exclusively those of the workshop participants. For further information or additional copies of this report, contact Cheryl L. Eavey, National Science Foundation, at (703) 306-1729, or CEAVEY@NSF.GOV.

Methodological Advances and the
Human Capital Initiative

Since Fiscal Year 1995, the National Science Foundation has provided funding opportunities for research on human capital issues. NSF's Human Capital Initiative (HCI) supports fundamental research "which advances basic understanding of the causes of the psychological, social, economic, and cultural capacities for productive citizenship" (NSF 95-8). In particular, background material for the initiative identified research agendas for six high priority thematic areas: Workplace, Education, Families, Neighborhoods, Disadvantage, and Poverty (NSF, 1994).¹

The Human Capital Initiative offers researchers a unique opportunity to address important substantive issues, ones that often require new ways of conceptualizing or combining data, new or modified methods of analysis or new formal models for relating the constructs of interest to observable data. The HCI thus provides researchers a vehicle for advancing the measurement, methodological, and statistical components of their disciplines. Such advances, made in the context of one or more disciplines, open up issues not previously amenable to empirical or theoretical analysis.

The Methodology, Measurement, and Statistics (MMS) Program of the National Science Foundation invites proposals that embed advances in methodology, data analysis, and/or formal modeling within the context of well-justified substantive research issues, as well as more generally. MMS recognizes that methodological developments relevant for human capital issues may require substantial background, both substantively and methodologically. MMS thus encourages collaborations across the social, behavioral, economic, and statistical sciences. Proposals for conferences and/or workshops on methodological topics appropriate for addressing HCI issues also are welcome.

In order to stimulate discussion of methodological needs in human capital research and to identify potential areas of research, the Methodology, Measurement, and Statistics Program convened a workshop on 12 July 1996 to address the topic "Methodological Advances and the Human Capital Initiative." Workshop discussions were informed by and built upon the agenda for HCI research as described in the various NSF brochures and announcements. The document, Investing in Human Resources: A Strategic Plan for the Human Capital Initiative, outlines a strategy for HCI research "designed to increase understanding of the nature and causes of existing problems and to evaluate the effectiveness of policies aimed at improving the human resources of America's citizens" (NSF, 1994). In addition to formulating research agendas for HCI's six substantive areas, the report also briefly suggests data and methodological needs. Data needs identified include the extension of longitudinal data sets, the collection of data from multiple sources, and embedded studies that merge alternative forms of empirical analysis. Methodological needs identified include expanded methodologies for dynamic modeling and models that link micro-level behavior of individuals with macro-level institutions and environments.

MMS workshop participants agreed with the focus on understanding causal relations and on the importance of the data and methodological needs identified in the strategic plan. Indeed, the centrality of longitudinal data for addressing many important human capital questions led to long discussions of design and analysis issues relevant for studies based on panel data. Ultimately, discussions converged on the following topics:

1) Design and Analysis of Longitudinal Data
2) Data Integration
3) Establishment Surveys
4) Data Collection Procedures in Survey Research
5) Survey Research and Measurement Issues
6) Methods for Linking Diverse Approaches to Understanding Behavior

Examples of specific methodological questions related to each of these topics are given below. These topics are intended to be illustrative of some important areas of methodological and/or statistical research relevant to the MMS Program and consistent with the goals and strategic plan of the Human Capital Initiative.

Longitudinal Data: Issues of Design and Analysis

Longitudinal studies play a central role in many human capital research projects, and renewed attention to the design of such studies is critical to understanding a broad array of human capital issues. Many longitudinal studies begin with a single cohort or a small group of cohorts and follow them over time. Samples need to be refreshed, however, either because of attrition or because of changes in the population after the cohorts were initiated (e.g., the arrival of immigrants). In addition, often there is interest in generalizing beyond those specific cohorts to a larger population.

Most longitudinal studies approach design issues that follow initial sample selection on an ad hoc basis; thus, we have limited cumulative knowledge and systematic study to bring to bear on new investigations. How and when should attrition be modelled? Should data always be gathered at equally-spaced time intervals? As data are gathered over time, the units of measurement change in a dynamic fashion. Designs for continued data collection inevitably require choices that have serious implications for analysis and inference. MMS welcomes proposals addressing questions of design in longitudinal studies.

Beyond questions of design, the analysis of longitudinal data invariably leads to specific methodological problems. Advances on the topics identified below would enhance the value of longitudinal data for addressing complex human capital questions.

Nonparametric methods. Large longitudinal data sets, some containing hundreds of thousands of person-years of observations, often are useful for addressing issues related to HCI. For example, major data sets on labor issues, such as the National Longitudinal Survey (NLS) and the Panel Study of Income Dynamics (PSID), can be used to study the number and duration of poverty spells for individuals with different levels of education. Motivated by the sensitivity of results to specific functional form assumptions, recent research has developed less restrictive procedures for use in large samples. These include classical smoothing procedures, neural networks, and Bayesian hierarchical models. MMS welcomes research that further refines and applies such nonparametric methods to human capital issues.
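
As an illustration of a classical smoothing procedure, the sketch below implements a Nadaraya-Watson kernel smoother, which estimates a conditional mean as a locally weighted average and imposes no functional form. It is a minimal sketch, not a method endorsed by the workshop: the function names, the Gaussian kernel, the bandwidth, and the schooling and spell-length values are all hypothetical and are not drawn from the NLS or the PSID.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_smooth(x, y, x0, bandwidth):
    """Nadaraya-Watson estimate of E[y | x = x0]: a locally weighted mean of y."""
    weights = [gaussian_kernel((xi - x0) / bandwidth) for xi in x]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observations near x0; increase the bandwidth")
    return sum(w * yi for w, yi in zip(weights, y)) / total


# Hypothetical data (assumption): years of schooling and length of a
# poverty spell in years for eight individuals.
schooling = [8, 10, 12, 12, 14, 16, 16, 18]
spell_years = [6.0, 5.0, 4.0, 3.5, 2.5, 2.0, 1.5, 1.0]

# Smoothed spell length at three schooling levels.
fitted = [kernel_smooth(schooling, spell_years, s, bandwidth=2.0) for s in (10, 12, 16)]
```

The bandwidth governs the trade-off: a small value tracks the data closely, while a large value approaches the overall mean. In practice it would be chosen by a data-driven rule such as cross-validation rather than fixed in advance.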

Discrete choice. Many individual decisions in the human capital accumulation process are discrete; for example, decisions to leave or return to school, fertility, etc. Recent advances have made it possible to study simple models of sequences of discrete choice over time, such as labor market participation. Methods for inference in structural models of dynamic decision making are a promising line of research for understanding the human capital investment process.
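
To make the idea of a sequence of discrete choices concrete, the sketch below computes the log-likelihood of a 0/1 participation history under a first-order logit model in which last period's choice shifts this period's probability. This is a deliberately simplified, reduced-form illustration rather than a structural model of dynamic decision making; the function names, parameters, and the assumed initial state are hypothetical.

```python
import math


def logistic(z):
    """Logistic function mapping a real index to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def sequence_log_likelihood(choices, intercept, state_dependence, initial_state=0):
    """Log-likelihood of a 0/1 choice sequence under the assumed model
    P(d_t = 1 | d_{t-1}) = logistic(intercept + state_dependence * d_{t-1})."""
    loglik = 0.0
    previous = initial_state
    for d in choices:
        p = logistic(intercept + state_dependence * previous)
        loglik += math.log(p if d == 1 else 1.0 - p)
        previous = d
    return loglik


# Hypothetical participation history over five periods (1 = in the labor force).
history = [1, 1, 0, 1, 1]
value = sequence_log_likelihood(history, intercept=-0.5, state_dependence=1.5)
```

Maximizing this quantity over the two parameters, summed across individuals, would give estimates of the baseline propensity and of persistence. A structural approach would instead derive the choice probabilities from an explicit model of forward-looking behavior.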

Individual, cohort, and age effects. Several important policy questions address changes over time in individuals' responses to opportunities available to them, such as propensities to invest in education. Separating secular changes from individual heterogeneity and from changes over the life cycle raises specific methodological questions. The ability to address these policy questions is determined, in large part, by advances in methods that address these issues.
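
One source of the difficulty can be seen in the familiar linear age-period-cohort specification, shown here as a textbook illustration rather than a model proposed by the workshop. The symbols are assumptions introduced for this sketch: $A_{it}$ is the age of individual $i$ in period $t$, $P_t$ is the calendar period, and $C_i$ is the birth cohort.

```latex
\begin{align*}
y_{it} &= \alpha + \beta_A A_{it} + \beta_P P_t + \beta_C C_i + \varepsilon_{it},
  \qquad C_i = P_t - A_{it}, \\
y_{it} &= \alpha + (\beta_A - \beta_C)\, A_{it} + (\beta_P + \beta_C)\, P_t + \varepsilon_{it}.
\end{align*}
```

Because cohort is an exact linear function of period and age, only the combinations $\beta_A - \beta_C$ and $\beta_P + \beta_C$ can be recovered from the data. Separating the three linear effects requires additional restrictions or outside information.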

Data Integration

Micro-level data present many problems, including availability, reliability, and continuity. Consider, for example, a study that seeks to relate the environment experienced by young males in inner-city neighborhoods to their economic productivity. A researcher would have to look at health, physical environment, crime, education, and economic standing (among other factors) to help explain the productivity of these citizens. No single source of data is likely to be adequate. Integrating data from disparate sources and/or different measurement scales presents a host of statistical, procedural, and computational problems.² Proposals for research that addresses the barriers to data integration across space, time, and sources will benefit researchers working in many empirical settings. Methodological issues that require additional research include the following:

Small-area data estimation. As analysts strive to explain the decisions made by economic agents, there is increasing pressure to move from macro to micro, and ultimately, individual-level scales of analyses. Moving to higher levels of resolution often requires estimating small-area attribute values (for example of households or places of work) from larger units of analysis. We are just beginning to understand how to perform these estimations.

Estimating missing values. A particularly pressing problem is the estimation of missing values in individual-level data, where privacy concerns and/or missing measurements generate samples with large numbers of zeros.

Boundary value estimation and/or transformations. The artificial truncation of a spatial process raises particular concerns about the value of the recorded measurements in the areal units at the boundary. This is similar to the problem of truncation in event history analysis. The corrections that have been proposed are theoretically arbitrary and computationally intensive.

Extraction and exploration techniques. As different kinds of agencies move to collect data in a geo-referenced framework, researchers will have to deal with the computational and storage burden this referencing entails. Even when researchers have no interest in maintaining the geo-referencing, they may require techniques to extract the data from the data set. Further, faced with the volume of data described above, they may need to develop new tools for data visualization as a means of exploring such data and developing preliminary research questions.

Establishment Surveys

Many substantive questions highlighted by the strategic plan for the Human Capital Initiative have to do with the performance of organizations. Examples include the capacity of educational institutions or training vendors to contribute to building a skilled workforce, the capacity of employers to renew and increase the training and skills of their employees, and the ability of work organizations to innovate and to produce products and services effectively and efficiently. Both short- and long-term organizational performance -- current output levels as well as the infrastructural capacity to sustain and increase outputs -- are of importance.

"Establishment surveys," in which the units studied are organizations, such as workplaces, schools, hospitals, or agencies of the government, are natural vehicles for addressing such questions. In such surveys, one or more individual informants provide data on behalf of the establishment. Establishment survey data are sometimes integrated with individual-level data on organizational members, or with archival data on the places or industries that constitute an establishment's setting or competitive environment.

Establishment surveys have long been used as components of systems of national accounts, for the estimation of population totals such as employment or output levels. Scholars in the social sciences are now turning to establishment surveys for different purposes -- to develop, for example, knowledge about work organizations, school processes and effects, the development and diffusion of human resources practices, and innovation, as well as to study schools and work organizations as contexts for learning and skills acquisition.

The methodological literature on establishment surveys is much less extensive than that on surveys of individuals, and many methodological problems involved in such studies have been little-studied or are poorly understood. Other problems result from the changing purposes of establishment surveys and the changing nature of organizational phenomena. Problems requiring attention include, but are not limited to, the following:

the definition of units of analysis;
the development of suitable sampling frames for establishment surveys, and/or the analysis of the coverage and biases of existing frames;
the appropriate manner in which to integrate establishment survey data with other archival information, especially when sources of data define units in ways that overlap only partially;
the appropriate way in which to conduct longitudinal establishment surveys, when units may change over time, for example as a result of mergers and spinoffs;
the manner in which respondents within an establishment should be selected, and the consequences of different methods of respondent selection for data quality;
the capacity of respondents to report accurately and reliably on different organizational properties and phenomena;
the development of appropriate instruments for measurement of key variables such as workplace organization, environmental setting, skill acquisition, or establishment performance; and
the handling of missing data problems that arise as a result of nonresponse or selective availability of data in archival sources.

Data Collection Procedures in Survey Research

Social data are affected by choices made in the design and implementation of data collection procedures. For example, choices of how questions are asked, the details of sampling frames, actual times and conditions of measurement, etc., must be made in every data collection. Evidence of how such choices affect social data includes the considerable work on the effects of how questions are asked and of which respondent provides the data when more than one respondent might have them, as well as the effects of question context and time of measurement found in the National Assessment of Educational Progress (NAEP). In many cases, the actual choices made are not the only ones possible. The choices are to some extent arbitrary, and reasonable people might choose other alternatives that most researchers would regard as equally valid.

Such effects of design and context might be considered as introducing their own components of variance, which are part of the uncertainty of the resulting information but are not captured in estimates of sampling standard error. That is, if we consider several surveys that made different detailed design choices, those surveys would produce estimates that are much more variable than would be expected from sampling error alone.

Policy decisions typically are concerned with questions that are broader than the particularities of a single data collection design. For example, policy makers want to know how social class is related to reading achievement, not how social class measured in a specific way is related to a score on a particular set of achievement test items. The standard method of calculating uncertainty in information and policy analyses is based on sampling uncertainty; but for the reasons outlined above, sampling standard errors alone provide an underestimate of the uncertainty in the data and in summaries produced from them.
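
The argument above can be sketched numerically. Suppose several surveys estimate the same quantity under different design choices, each reporting its own sampling standard error. A simple unweighted method-of-moments calculation then attributes the excess of the observed between-survey variance over the average sampling variance to a design component. This is a minimal sketch under stated assumptions: the function names and all of the numbers are hypothetical, and the estimator is one of several possible choices rather than a procedure prescribed by the workshop.

```python
import math
from statistics import mean, variance


def design_variance(estimates, sampling_ses):
    """Unweighted method-of-moments estimate of the between-design variance:
    observed variance across surveys minus the average sampling variance,
    truncated at zero."""
    observed = variance(estimates)  # sample variance (n - 1 denominator)
    sampling = mean([se ** 2 for se in sampling_ses])
    return max(0.0, observed - sampling)


def total_standard_error(sampling_se, between_design_var):
    """Standard error that adds the design component to the sampling variance."""
    return math.sqrt(sampling_se ** 2 + between_design_var)


# Hypothetical estimates of the same quantity from five surveys that made
# different design choices, with their reported sampling standard errors.
estimates = [0.30, 0.42, 0.25, 0.38, 0.45]
sampling_ses = [0.03, 0.03, 0.04, 0.03, 0.04]

tau2 = design_variance(estimates, sampling_ses)
widened = total_standard_error(sampling_ses[0], tau2)
```

In this made-up example the widened standard error for the first survey is well over twice its reported sampling standard error, which is the sense in which sampling error alone understates uncertainty.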

Studies that provide insight about more realistic estimates of uncertainty of information produced by human capital research would be highly desirable. Such studies might include systematic investigations of variations in procedures and models for reasonable distribution of variation in those procedures. Studies of actual replications that have already been conducted might serve as "natural experiments." Ideas for general methods that might be broadly applicable would be particularly interesting.

Survey Research and Measurement Issues

Research designed to meet the goals of the Human Capital Initiative may require conceptual and definitional innovations in existing (canonical) categories of social measurement. MMS welcomes proposals aimed at improving the core concepts of human capital research, the classification systems used in surveys, and the implications of categories and concepts embedded in administrative data systems. Among the most obvious candidates are the core concepts grounding the social, behavioral, and economic sciences, including household, race and ethnicity, neighborhood, establishment, and occupation. Boundary questions and categorization issues abound in such concepts. The social sciences have always been sensitive to these questions. The core concepts and classification systems themselves are the result of the assumptions and procedures of earlier generations of users, which themselves were embedded in the technologies and questions of interest to their creators. Nevertheless, the consciously interdisciplinary focus of MMS and the Human Capital Initiative provides an opportunity to foster the development of new methods attuned to the particular substantive issues raised in the intersecting conceptual and classifying devices used to analyze human experience.

Questions include, for example, whether the current concept of a household adequately captures the diverse living arrangements of Americans, including the phenomena of blended families, children in joint custody, or even the homeless. How can researchers model appropriately the multiple ethnic identities of Americans and discover and test the salience of particular categories? How do concepts and classification schemes taken from existing data systems affect the research process of the secondary data user? What new concepts and classification systems need to be developed to meet the needs of the Human Capital Initiative? How can or should one develop common measurements to capture the experience, for example, at 'home' and 'work'? What measurement schemes are required for units of analysis at different levels of aggregation, for example, poor people versus poor neighborhoods?

Methods for Linking Diverse Approaches
to Understanding Behavior

Understanding the objective and subjective determinants of behavior is among the most challenging problems facing empirical researchers seeking to understand schooling decisions, labor supply, and other behaviors related to human capital formation. How do individuals form expectations about the consequences of alternative actions? How do the decisions people make depend on these expectations, on their preferences, and on the constraints they face? Different behavioral and social science disciplines have used distinctive empirical research strategies to address these fundamental questions. Economists have sought to infer the structure of decision making almost entirely from data on actual choices. Sociologists and social psychologists have collected and analyzed data on attitudinal measures elicited from respondents in sample surveys. Cognitive psychologists have conducted experiments aimed at understanding the subjective constructs that people use in framing alternatives and reaching decisions. MMS invites proposals that aim to extract the best elements of these strategies, to creatively synthesize them, and to improve upon them. Proposed research might, for example, seek to integrate experimental and survey approaches to illuminate issues of common concern.

Recent Projects Related to the Human Capital
Initiative Supported by the MMS Program

SBR-9422901: "Estimation of Hierarchical Models with Dichotomous Outcomes in Small-Sample, Social Research Settings"
Michael Seltzer, University of California/Los Angeles

This project uses the Gibbs sampler to develop and implement estimation strategies that will enable researchers to obtain robust estimates of parameters and appropriate intervals in applications of hierarchical models with dichotomous outcomes in small-sample, social research settings. Guidelines for proper implementation and use of these strategies will be developed through analyses of a series of simulated data sets and through analyses of the data from two studies: A multi-site evaluation of a dropout prevention initiative, and an NSF-funded study of the effects of different mathematics.

SBR-9631387: "Project to Revise the Historical Labor Statistics of the United States"
Susan B. Carter, University of California/Riverside

This project will revise Chapter D (Labor) of the United States Census Bureau's Historical Statistics of the United States. Historical Statistics is a massive, two-volume compendium of 54 chapters on topics touching all of the social, behavioral, humanistic, and natural sciences. This award assists in a major collaborative effort to produce an updated, revised, expanded, and electronically-accessible "millennial edition" of Historical Statistics. In addition to a completed revision of Chapter D, this project will develop a protocol for the revision of the remaining chapters of Historical Statistics.

SBR-9515136: "Improving Within-School and School-Community Systemic Linkages for At-Risk Students"
Kenneth K. Wong, University of Chicago
Larry Hedges, University of Chicago

This project investigates empirically the impact of recent federal reform initiatives legislated by Title I of the Elementary and Secondary Education Act on the narrowing of the achievement gap between educationally at-risk students and their more advantaged peers. The project makes use of the comprehensive, Congressionally mandated Prospects data files, which consist of standardized reading and math achievement scores for a nationally-representative sample of nearly 40,000 students, and detailed information regarding the students themselves, and their schools, classrooms, and families. The investigators' analyses will generate national estimates of the extent and intensity of these reform activities, and will produce empirically-based paradigms for the improvement of federal Title I programs and the schools that serve at-risk students.

SBR-9423018: "Causal Inference Applied to Income Effects"
Donald B. Rubin, Harvard University
Guido Imbens, Harvard University

The objective of this project is to measure validly the treatment effects of giving additional income to low and middle income families. The study uses the Massachusetts State Lottery as a natural experiment in which some families are randomly assigned additional income and some are not. Subjects in both the treatment and control group will be surveyed by mail and by phone. The use of this natural experiment will allow the researchers to make valid inferences about the effects of additional income on these families using a rigorous definition of causality. The data from the surveys will be linked to earnings records from the Social Security Administration.

Workshop Participants

Margo Anderson
History and Urban Affairs
University of Wisconsin

Carl Amrhein
Department of Geography
University of Toronto

Cheryl Eavey
Methodology, Measurement, and
Statistics Program
National Science Foundation

Stephen Fienberg
Department of Statistics
Carnegie Mellon University

John Geweke
Department of Economics
University of Minnesota

Larry Hedges
Department of Education
University of Chicago

Charles Manski
Department of Economics
University of Wisconsin

Peter Marsden
Department of Sociology
Harvard University

John Sprague
Department of Political Science
Washington University

Thomas Wallsten
Department of Psychology
University of North Carolina, Chapel Hill

ADDENDUM

Methodology, Measurement, and Statistics Program

The Methodology, Measurement, and Statistics Program (MMS) is an interdisciplinary program in the Division of Social, Behavioral, and Economic Research at the National Science Foundation that supports fundamental research in three primary areas:

- Research on methodological aspects of new or existing procedures for data collection; research to evaluate or compare existing databases and data collection procedures; and the collection of unique databases with cross-disciplinary implications, especially when paired with developments in measurement or methodology.

- The methodological infrastructure of social and behavioral research.

Up-to-date information on the program, including recent awards lists and announcements of special funding opportunities, is available on the MMS Home Page:

https://www.nsf.gov/sbe/sber/mms/start.htm

The Foundation provides awards for research and education in the sciences and engineering. The awardee is wholly responsible for the conduct of such research and preparation of the results for publication. The Foundation, therefore, does not assume responsibility for the research findings or their interpretation.

The Foundation welcomes proposals from all qualified scientists and engineers and strongly encourages women, minorities, and persons with disabilities to compete fully in any of the research and education related programs described here. In accordance with federal statutes, regulations, and NSF policies, no person on grounds of race, color, age, sex, national origin, or disability shall be excluded from participation in, be denied the benefits of, or be subject to discrimination under any program or activity receiving financial assistance from the National Science Foundation.

Facilitation Awards for Scientists and Engineers with Disabilities (FASED) provide funding for special assistance or equipment to enable persons with disabilities (investigators and other staff, including student research assistants) to work on NSF projects. See the program announcement or contact the program coordinator at (703) 306-1636.

The National Science Foundation has TDD (Telephonic Device for the Deaf) capability, which enables individuals with hearing impairment to communicate with the Foundation about NSF programs, employment, or general information. To access NSF TDD dial (703) 306-0090; for FIRS, 1-800-877-8339.

FOOTNOTES:

1). NSF brochure #95-8 outlines funding opportunities and proposal submission information for the Human Capital Initiative. The companion document, Investing in Human Resources: A Strategic Plan for the Human Capital Initiative, was prepared for the NSF and provides a synthesis of the reports of working groups convened at the request of the Foundation to produce research agendas for high priority HCI areas.

2). Examples of data integration include combining qualitative with quantitative data; aggregating point-form data into areal data; integration across space, time, and space/time dimensions; and integration of data from different quantitative data collections (e.g., surveys) for the purposes of making general references (meta-analysis).
