Governments and other agencies repeat many important surveys at regular time intervals, but the population mean is usually estimated from the latest survey alone. Time series estimators of the population mean that use the repeated surveys are superior to estimators obtained from the last survey only. This superiority may be affected by several factors, such as the sampling variance, the number of surveys, and the coefficients and orders of the ARMA model. The main objective of this paper is to compare, through extensive simulation studies, the time series estimator for repeated surveys developed by Scott, Smith, and Jones (1977) with the last survey estimator. The impact of the factors that may affect the efficiency of the time series estimator is also investigated.
Repeated surveys are widely used in economics and social science, by industry, government, and research institutions, to identify the characteristics of the population under study. They are often conducted on a regular basis to monitor not only the level of some parameters of interest but also their changes between surveys, and to build up information on the trends of the phenomenon. Time series analysis of repeated surveys can give more accurate estimates of the parameter of interest than the traditional sample survey approach, which estimates the parameter separately at each time period (see Scott et al. (1977)).
In Egypt, many surveys are repeated at fixed time intervals (monthly, quarterly, semiannually, or annually), such as the labor force surveys conducted by the Egyptian CAPMAS and business surveys. Other surveys are repeated on an occasional basis; examples include exit polls and the monitoring of TV ratings. The time series nature of these repeated surveys is seldom taken into account, so the benefits of time series analysis of repeated surveys are lost. The repeated nature of these survey data creates a need for estimation procedures that combine the information available from different time points to produce the best estimates of the current mean (or percentage).
The use of repeated surveys to estimate the mean or percentage of some indicator has been the subject of many studies over the past decades. Starting with Jessen (1942), studies such as Patterson (1950), Blight and Scott (1973), Scott and Smith (1974), Feder (2001), Van den Brakel and Krieg (2009), and many others have explored different estimation methods for repeated surveys. Section 2 presents the literature on the time series methods used in repeated surveys. Section 3 provides a simulation study that compares the last survey estimate with the time series estimator developed by Scott et al. (1977). Section 4 presents an application of time series methods to the yearly Egyptian unemployment rate data (1980-2012) obtained by the Egyptian CAPMAS from the labor force survey.
Time series estimators for repeated surveys use the information in previous estimates to improve the current estimate. They can have lower variance than the corresponding traditional estimators (see Steel and McLaren (2008) and Haslett (1986)). The improved estimator is no longer a function of the sample at a single time period t; instead, it is defined on all samples taken between time periods 1 and t, so it is derived from t periods of data.
Previous studies on the analysis of repeated surveys can be classified into two categories: the classical methods (non-stochastic approach) and the time series methods (stochastic approach). The pioneering work on the classical methods was that of Jessen (1942) for sampling on two occasions. Yates (1960) extended Jessen's work to the case of a constant sample size and a fixed replacement fraction on each occasion. Patterson (1950) generalized Jessen's results to sampling on h occasions (surveys) with partial overlap of sampling units. Eckler (1955) obtained a minimum variance estimate of the population value using rotation sampling. Cochran (1963) developed the optimum overlap for Patterson's (1950) estimator when all samples are of equal size. Rao and Graham (1964) used the rotation sample design to develop a unified finite population theory for composite estimators of the current level. Singh (1968) studied the effect of a multi-stage sample design on Patterson's estimator.
The time series approach addresses two limitations of the classical methods. First, the classical methods completely ignore any relationship between successive values of the population parameters. Second, they require the individual unit values to be available to the analyst; in many cases these values are not available and a secondary analysis must be used, which in turn calls for time series methods. The time series methods can themselves be classified into two categories: the ARMA approach and the state space approach.
Considerable interest has been shown in estimation methods for repeated surveys based on time series analysis (the ARMA approach). Blight and Scott (1973) used a Bayesian approach to discuss the effect of ignoring the relationship between successive values of the population parameters, and explored the effect of using this relationship in the estimation process. Scott and Smith (1974) took a more general approach than Blight and Scott (1973): they assumed a stochastic model for the population means, either a stationary process or a non-stationary process that can be reduced to stationary form by differencing or by subtracting deterministic components. Scott et al. (1977) extended the results of Scott and Smith (1974) to surveys with complex designs and applied their results to both overlapping and non-overlapping surveys. Jones (1979) compared the approaches of Blight and Scott (1973) and Scott and Smith (1974) with that of Patterson (1950) in terms of the mean square error of the current mean. Jones (1980) used least squares theory to derive a general form for the estimators of Blight and Scott (1973), Scott and Smith (1974), and Patterson (1950); the estimators of the population means obtained by these authors can be obtained from Jones's estimator using the generalized least squares method.
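As an illustration of the ARMA approach described above, the signal-plus-noise model and the resulting minimum mean square error linear estimator can be sketched as follows. The notation is assumed here for illustration and is not necessarily that of the cited papers; the sketch also assumes that the model parameters are known and that the sampling errors are uncorrelated with the population means.

```latex
% Survey estimate = population mean + sampling error (symbols are illustrative)
y_s = \theta_s + e_s, \qquad s = 1, \dots, t,
\qquad E(e_s) = 0, \quad \operatorname{Var}(e_s) = S^2 .

% Population means follow a stationary process with mean \mu and
% covariance matrix \Sigma_\theta (for example an ARMA process).
% With y = (y_1, \dots, y_t)^\top and \Sigma_e the covariance matrix
% of the sampling errors:
\Sigma_y = \Sigma_\theta + \Sigma_e, \qquad
c = \operatorname{Cov}(y, \theta_t) = \text{last column of } \Sigma_\theta .

% Best linear estimator of the current mean and its mean square error
\hat{\theta}_t = \mu + c^\top \Sigma_y^{-1} (y - \mu \mathbf{1}), \qquad
\operatorname{MSE}(\hat{\theta}_t) = \gamma_0 - c^\top \Sigma_y^{-1} c ,
\quad \gamma_0 = \operatorname{Var}(\theta_t).
```

For non-overlapping surveys with independent sampling errors, \Sigma_e = S^2 I. Because the last survey estimate y_t is itself a linear estimator with mean square error S^2, the mean square error of the best linear estimator cannot exceed S^2 in this setting.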
The problem of stationarity was also considered by Jones (1980). A more recent approach among the time series methods for repeated surveys is the state space approach, which consists of a measurement equation and a transition equation. The population means are then estimated using the Kalman filter. Unlike the ARMA approach, the state space approach can be useful when only a small number of observations is available or when the data contain missing observations. Subsequent work on the state space approach includes Tam (1987), Srinath and Quenneville (1987), Pfeffermann (1991), Feder (2001), Silva and Smith (2001), Lind (2005), Pfeffermann and Tiller (2006), Sadik and Notodiputro (2007), Van den Brakel and Krieg (2009), and Krieg and Van den Brakel (2012).
The simulation design in the current study covers two main cases:
Case 1: Correctly specified time series models
Case 2: Misspecified time series models
Under each case, we investigate the following factors:
(1) The sampling variance (S²)
The survey sampling errors were assumed to follow a normal distribution with zero mean and sampling variance S², where S² takes the values 0.5, 1, or 2.
(2) The number of surveys (t)
Different numbers of surveys t (that is, the number of sample means) were used to trace the efficiency of the two estimators: the time series estimator of Scott et al. (1977) and the last survey estimator. The numbers of surveys used in the simulation are 10, 20, 50, and 100, where 10 represents the smallest number of surveys and 100 the largest.
To check the effect of the series length, each series of t surveys was formed from the last t values of the series of sample means, so that the sample means included for a smaller value of t are also included for every larger value of t (for example, the series with t = 20 contains the sample means of the series with t = 10).
(3) The ARMA model coefficients
Different values of the ARMA model coefficients (the autoregressive and moving average parameters) were considered in order to investigate the effect of parameter uncertainty on the efficiency of the time series estimator: 0, 0.2, 0.5, 0.7, or 0.9 for the AR(1) and MA(1) models.
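The design above can be sketched in code. The following is a minimal Python illustration, not the authors' implementation. It assumes that the population means follow a zero-mean AR(1) process with known coefficient and innovation variance, that sampling errors are independent across surveys (non-overlapping samples), and it uses the best linear estimator under these assumptions as a stand-in for the time series estimator. The function names and the parameter `sigma2_eps` are hypothetical.

```python
import numpy as np

def ar1_autocov(phi, sigma2_eps, t):
    """Covariance matrix of t consecutive values of a stationary AR(1) process."""
    lags = np.abs(np.subtract.outer(np.arange(t), np.arange(t)))
    return sigma2_eps / (1.0 - phi**2) * phi**lags

def simulate_mse(phi=0.5, s2=1.0, t=20, sigma2_eps=1.0, n_rep=5000, seed=0):
    """Monte Carlo MSE of a time series estimator and of the last survey estimator.

    Assumptions (not from the paper): zero-mean AR(1) population means,
    known parameters, independent normal sampling errors with variance s2.
    """
    rng = np.random.default_rng(seed)
    cov_theta = ar1_autocov(phi, sigma2_eps, t)
    cov_y = cov_theta + s2 * np.eye(t)
    # Weights of the best linear estimator of the current mean theta_t
    weights = np.linalg.solve(cov_y, cov_theta[:, -1])

    # Simulate population means (rows = replications) and survey estimates
    chol = np.linalg.cholesky(cov_theta)
    theta = rng.standard_normal((n_rep, t)) @ chol.T
    y = theta + rng.normal(0.0, np.sqrt(s2), size=(n_rep, t))

    ts_estimate = y @ weights      # uses all t surveys
    last_estimate = y[:, -1]       # uses the latest survey only
    target = theta[:, -1]
    mse_ts = np.mean((ts_estimate - target) ** 2)
    mse_last = np.mean((last_estimate - target) ** 2)
    return mse_ts, mse_last

# Sampling variances and numbers of surveys taken from the design above
for s2 in (0.5, 1.0, 2.0):
    for t in (10, 20, 50, 100):
        mse_ts, mse_last = simulate_mse(phi=0.5, s2=s2, t=t)
        print(f"S2={s2}, t={t}: MSE(ts)={mse_ts:.3f}, MSE(last)={mse_last:.3f}")
```

In this sketch the simulated MSE of the last survey estimator should be close to the sampling variance, which gives a simple check that the simulation is set up correctly.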
Based on the above design, all the resulting scenarios are given in Table 1.
The boxplots of the time series estimator are also taller than those of the last survey estimator: 50% of the MSE values of the time series estimator lie between 0.30 and 0.64 for AR(1), 0.28 and 0.68 for AR(2), 0.32 and 0.80 for MA(1), and 0.32 and 0.87 for MA(2), whereas 100% of the MSE values of the last survey estimator lie between about 0.5 and 2 for the different time series models. This is to be expected, because the MSE of the last survey estimator equals the sampling variance S², and it suggests that the simulation is correctly implemented. From Figure 1, we conclude that the MSE values of the time series estimator are more homogeneous than those of the last survey estimator and are centered on small values, which reflects the efficiency of the time series estimator.
The analysis of repeated surveys using time series methods is seldom taken into account. The estimation of the mean of a phenomenon usually depends on the last survey alone, although it is more efficient to use time series analysis in the estimation process, since time series estimators for repeated surveys can have lower variance than the corresponding traditional estimators. This was confirmed by the simulation study, which indicated that the mean square error (MSE) of the time series estimator is smaller than that of the last survey estimator.
The current study shows that the two MSEs get larger when the sampling variance is larger, and that the superiority of the time series estimator over the last survey estimator is positively correlated with the value of S². It also shows that this superiority increases with the number of surveys. However, across the different values of the ARMA coefficients, there is no systematic pattern between the coefficient values and the MSE of the time series estimator under the different time series models. The results also show that, when misspecified time series models are used, the MSE of the time series estimator is always less than that of the last survey estimator unless a small number of surveys is used. These results are supported by the practical example of the unemployment rate, in which the time series method is more efficient than the last survey estimator: its confidence interval, and hence its variance, is very small compared with that of the last survey estimator.
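The direction of the effects of the sampling variance and the number of surveys can be illustrated with a short calculation. The Python sketch below assumes a zero-mean AR(1) model for the population means with known parameters and independent sampling errors, and computes the theoretical MSE of the best linear estimator under those assumptions. It is an illustration only and does not reproduce the simulation results of this study; the function name and the parameter `sigma2_eps` are hypothetical.

```python
import numpy as np

def theoretical_mse(phi, s2, t, sigma2_eps=1.0):
    """MSE of the best linear estimator of the current mean from t surveys.

    Assumes AR(1) population means with known coefficient phi and innovation
    variance sigma2_eps, and independent sampling errors with variance s2.
    """
    lags = np.abs(np.subtract.outer(np.arange(t), np.arange(t)))
    cov_theta = sigma2_eps / (1.0 - phi**2) * phi**lags
    c = cov_theta[:, -1]                    # Cov(y, theta_t)
    cov_y = cov_theta + s2 * np.eye(t)      # Cov(y)
    return cov_theta[-1, -1] - c @ np.linalg.solve(cov_y, c)

# The last survey estimator has MSE equal to the sampling variance s2,
# so the gain of the time series estimator is s2 minus its own MSE.
for s2 in (0.5, 1.0, 2.0):
    for t in (10, 20, 50, 100):
        mse_ts = theoretical_mse(0.5, s2, t)
        print(f"S2={s2}, t={t}: MSE(ts)={mse_ts:.4f}, gain={s2 - mse_ts:.4f}")
```

Under these assumptions the gain over the last survey estimator grows with the sampling variance, and adding earlier surveys cannot increase the MSE of the best linear estimator.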