Department of Mathematical Sciences


Statistics Colloquium/Seminar Series

2009-2010

 

For more information, contact the Colloquium/Seminar Organizer, Dr. Hokwon Cho

(To see Math Dept Colloquia/Seminars, click next: Math Dept Seminar)

 

Fall 2009

 

·         Fri. 12:30 p.m. November 20, SEB-1240:  Dr. Donatello Telesca

Department of Biostatistics, University of California, Los Angeles
Title: Functional Mixed Registration Models

Abstract: Functional data often exhibit a common shape but also variations in amplitude and phase across curves. The analysis often proceeds by synchronization of the data through curve registration. We propose a Bayesian hierarchical model for curve registration. Our methodology is extended to define a class of probability models, which combine curve registration with functional mixed effects modeling, discriminating phase and amplitude variability in a joint fashion. We discuss this class of models with a focus on penalized smoothing splines and propose Bayesian inferential procedures based on Markov Chain Monte Carlo samples from the posterior distribution of the functions of interest. We illustrate the application of our model using simulated data as well as two datasets, namely, the Berkeley study on human growth and a study on the pharmacokinetics of the drug Remifentanil. Time permitting, we will introduce a generalized view of curve registration with applications to longitudinal counts of criminal activity.

 

·         Fri. 1:00 p.m. September 25, SEB-1240:  Dr. Peter Müller

Department of Biostatistics,  University of Texas, M.D. Anderson Cancer Center, Houston
Title: Modeling Dependent Gene Expression

Abstract: We consider statistical inference for high throughput genomic data. Most traditional statistical methods implicitly assume independent sampling (conditional on some hyperparameters). Recognizing the limitations of independent modeling, we develop a model that includes a simple dependence structure across genes (or proteins). The important features of the proposed model are the ease of representing typical prior information on the nature of dependencies, model-based parsimonious representation of the signal as an ordinal outcome, and the use of a coherent probability model over both the structure and the strength of the conjectured dependencies.
As part of the inference we reduce the recorded data to a trinary response representing underexpression, average expression and overexpression. For proteins the trinary response is further reduced to a binary indicator for activation. To achieve this, we use an extension of a model proposed in recent literature. Inference in the described model is implemented through a straightforward Markov chain Monte Carlo (MCMC) simulation, including posterior simulation over conditional dependence and independence. We use the proposed dependence probability model to derive inference about molecular pathways, including differential pathway activation across biologic conditions.

 

Spring 2010

 

·         Seminar topics and schedule will be forthcoming.

 

2008-2009 Statistics Colloquium/Seminar

Spring 2009

 

·         Fri. 11:30 a.m. April 24, CBC C-224:  Dr. Charles Davis

President, Environmetrics and Statistics Ltd
Title: A Model for Measurements of Lognormally Distributed Environmental Contaminants

Abstract: Lognormal (LN) distributions are often assumed for environmental contaminants, with perhaps some justification. But decisions are made from measurements, not the unobservable concentrations themselves. These often do not have LN distributions. Rather, at fixed concentrations distributions of measurements are often normally distributed, and if low-level measurements are unbiased one has negative values; standard LN inference techniques fail in this setting. This reality is universally ignored; measurement values are censored at a Reporting Limit, the negative values are never seen, and we continue to develop (and publish) methods for left-censored LN environmental data.
    A mixture model for such data is presented. The motivating application involves Upper Tolerance Limits (UTLs = upper confidence limits for upper percentiles) which arise in facility surveys for worker protection. We are dealing with ICP-AES measurements for beryllium surface contamination, and have obtained large quantities of uncensored data. We discuss the model and its five physically meaningful parameters in terms of the measurement process. We show that conventional censored-data LN methods provide conservative UTLs that, paradoxically, become more (not less) conservative as the RL decreases. We pay some attention to maximum likelihood estimation using uncensored data, and then present attractive alternate approaches.

 

·         Fri. 11:30 a.m. February 27, CBC C-224:  Dr. Kevin Quinn

Department of Government & Institute for Quantitative Social Science, Harvard University
Title: Measuring Explicit Political Positions of Media

Abstract: We amass a new, large-scale dataset of newspaper editorials that allows us to calculate fine-grained measures of the political positions of newspaper editorial pages. Collecting and classifying over 1500 editorials adopted by 25 major US newspapers on 495 Supreme Court cases from 1994 to 2004, we apply an item response theoretic approach to place newspaper editorial boards on a substantively meaningful—and long validated—scale of political preferences. We validate the measures, show how they can be used to shed light on the permeability of the wall between news and editorial desks, and argue that the general strategy we employ has great potential for more widespread use.

 

·         Fri. 11:30 a.m. February 13, CBC C-224:  Dr. Barry Arnold

Department of Statistics, University of California, Riverside
Title: Some models involving hidden truncation in non-Gaussian settings

Abstract: The Azzalini skew-normal density of the form 2φ(x)Φ(λx) can be viewed as having arisen by considering a bivariate random variable (X,Y) with a classical bivariate normal density and focussing on the conditional distribution of X given Y < E(Y). The same family of distributions is encountered if we consider the conditional distribution of X given Y > E(Y). A slightly more general family is provided by considering the conditional distribution of X given Y > y0 where y0 is not necessarily equal to E(Y). The resulting model (which we can call a hidden truncation model, since we only observe X if the unobserved or hidden variable Y exceeds a threshold value) is a flexible extension of the classical univariate normal model with potential to fit a broad spectrum of data configurations which may not be well fitted by a classical normal model. In the present paper we consider several other basic bivariate non-Gaussian models and investigate the nature of their corresponding hidden truncation models. In particular, it is of interest to identify situations in which hidden truncation fails to augment the basic model. Additive component representations provide an alternative to the hidden truncation paradigm in the normal case. It is conjectured that it is only in the normal case that the two models coincide.
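
The hidden truncation construction in this abstract can be illustrated with a small simulation. The following is only a sketch and not taken from the talk: it assumes a standard bivariate normal pair (X, Y) with correlation rho, keeps X only when Y > E(Y) = 0, and compares the sample mean with the known skew-normal mean for λ = rho / sqrt(1 - rho²). The function names, the value rho = 0.8 and the sample size are illustrative choices.

```python
import math
import random


def skew_normal_pdf(x, lam):
    """Azzalini skew-normal density 2 * phi(x) * Phi(lam * x)."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(lam * x / math.sqrt(2.0)))
    return 2.0 * phi * Phi


def sample_hidden_truncation(n, rho, rng):
    """Draw X from a standard bivariate normal (X, Y) with correlation rho,
    keeping X only when the hidden variable Y exceeds E(Y) = 0."""
    kept = []
    while len(kept) < n:
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        if y > 0.0:
            kept.append(x)
    return kept


rho = 0.8                                   # illustrative correlation (assumption)
lam = rho / math.sqrt(1.0 - rho * rho)      # implied skewness parameter
rng = random.Random(0)
xs = sample_hidden_truncation(50_000, rho, rng)

empirical_mean = sum(xs) / len(xs)
theoretical_mean = rho * math.sqrt(2.0 / math.pi)   # mean of the skew-normal with this lam

# Riemann-sum check that the density integrates to about 1 on [-10, 10].
area = sum(skew_normal_pdf(-10.0 + 0.01 * i, lam) * 0.01 for i in range(2001))
```

Conditioning on Y < 0 instead yields the same family with the sign of λ reversed, which matches the remark in the abstract that both truncation directions lead to the same family of distributions.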

 

·         Fri. 1:00 p.m. February 6, CBC C-224:  Dr. Grace Chiu

Department of Statistics and Actuarial Science, University of Waterloo
Title: Gauging Ecosystem Health with Latent Health Factor Models

Abstract: We propose a model-based approach for constructing ecological health indices through statistical inference. Our latent health factor index (LHFI) is obtained by estimating an unobservable health factor term in a mixed-effects ANOCOVA that directly models the relationship among indicator variables (or metrics) and health. Unlike  conventional indices (e.g. IBI and O/E index) that rely on domain-specific calibrations of metrics against reference conditions whose non-constancy is largely unaccounted for, our methodology (a) involves no explicit reference conditions while metrics are intrinsically "calibrated" in the context of multiple comparisons, and (b) can naturally incorporate spatio-temporal influences on calibration schemes.

 

·         Fri. 2:30 p.m. January 23, CBC C-224:  Dr. Abel Rodriguez

Department of Applied Mathematics and Statistics, University of California, Santa Cruz
Title: Multilevel Functional Clustering and the Nested Dirichlet Process

Abstract: This talk discusses clustering procedures for nested samples of curves, where multiple profiles are collected for each subject in the study. We start by considering the application of standard functional clustering tools to this problem, which leads to groupings based on the average profile for each subject. After discussing some of the shortcomings of this approach, we present a model based on a generalization of the nested Dirichlet process that uses the information on the distribution of curves to generate the clusters. The method is illustrated using data from the Early Pregnancy Study on hormone profiles along multiple menstrual periods for a cohort of women. The resulting model simultaneously clusters both curves and subjects, allowing us to identify outlier curves for each group of women, as well as outlying women whose distribution of profiles differs from the rest.

 

Fall 2008

 

·         Friday 11:30 a.m. November 21, CBC C-224: Dr. N. Balakrishnan

Department of Statistics, McMaster University

Title: Over/Under-Dispersed Poisson Distributions and Processes

Abstract: In this talk, I will establish several connections of the Poisson weight function to overdispersion and underdispersion. Specifically, I will show that the logconvexity (logconcavity) of the mean weight function is a necessary and sufficient condition for overdispersion (underdispersion) when the Poisson weight function does not depend on the original Poisson parameter. I will also discuss some properties of the weighted Poisson distributions (WPDs). I will then introduce a notion of pointwise duality between two WPDs and discuss some associated properties. Next, after presenting some illustrative examples and providing a discussion on various Poisson weight functions used in practice, I will make some concluding remarks. Finally, I will use these results to introduce and discuss over/under-dispersed Poisson processes.
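
The link between the weight function and dispersion described in this abstract can be checked numerically. The sketch below is not from the talk; the two weights are illustrative assumptions chosen because their mean weight functions have closed forms: w(k) = k + 1 gives E[w(Y)] = θ + 1 (log-concave in θ), and w(k) = 1/(k + 1) gives E[w(Y)] = (1 - e^(-θ))/θ (log-convex in θ). The function name and the truncation point k_max are also hypothetical.

```python
import math


def weighted_poisson_moments(theta, weight, k_max=200):
    """Mean and variance of the pmf proportional to
    weight(k) * theta**k * exp(-theta) / k!, truncated at k_max."""
    terms = []
    for k in range(k_max + 1):
        log_poisson = k * math.log(theta) - theta - math.lgamma(k + 1)
        terms.append(weight(k) * math.exp(log_poisson))
    total = sum(terms)
    mean = sum(k * t for k, t in enumerate(terms)) / total
    second_moment = sum(k * k * t for k, t in enumerate(terms)) / total
    return mean, second_moment - mean * mean


theta = 2.0  # illustrative Poisson parameter (assumption)

# Log-concave mean weight function: variance falls below the mean.
mean_under, var_under = weighted_poisson_moments(theta, lambda k: k + 1.0)

# Log-convex mean weight function: variance exceeds the mean.
mean_over, var_over = weighted_poisson_moments(theta, lambda k: 1.0 / (k + 1.0))
```

For θ = 2 the first weight gives mean 8/3 and variance 20/9, which can be confirmed by hand from the Poisson moments, so the weighted distribution is underdispersed; the second weight gives a variance larger than its mean.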
 

·         Friday 11:30 a.m. November 7, CBC C-224: Dr. Glen Meeden

Department of Statistics, University of Minnesota, Twin Cities

Title: A Noninformative Bayesian Approach to Finite Population Sampling Using Auxiliary Variables

Abstract: In finite population sampling, prior information is often available in the form of partial knowledge about an auxiliary variable; for example, its mean may be known. In such cases, the ratio estimator and the regression estimator are often used for estimating the population mean of the characteristic of interest. The Polya posterior has been developed as a noninformative Bayesian approach to survey sampling. It is appropriate when little or no prior information about the population is available. Here we show that it can be extended to incorporate types of partial prior information about auxiliary variables. We will see that it typically yields procedures with good frequentist properties even in some problems where standard frequentist methods are difficult to apply. Moreover, one does not need to select a model which explicitly relates the characteristic of interest to the auxiliary variables.
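
As a rough illustration of the basic Polya posterior mentioned in this abstract (without the auxiliary-variable extension, which is the subject of the talk and is not reproduced here), the sketch below completes a finite population by Polya urn sampling: a value is drawn at random from the urn and returned together with a copy, until the urn holds as many values as the population. The sample values, population size and number of draws are made-up numbers for illustration, and the function name is hypothetical.

```python
import random


def polya_posterior_means(sample, population_size, draws, rng):
    """Simulate population means under the basic Polya posterior.

    Each simulated population starts from the observed sample; unobserved
    units are filled in one at a time by copying a value drawn uniformly
    from the units already in the urn."""
    means = []
    for _ in range(draws):
        urn = list(sample)
        while len(urn) < population_size:
            urn.append(rng.choice(urn))
        means.append(sum(urn) / population_size)
    return means


rng = random.Random(1)
sample = [12.0, 15.0, 9.0, 20.0, 14.0]      # hypothetical observed values
means = polya_posterior_means(sample, population_size=50, draws=2000, rng=rng)

posterior_mean = sum(means) / len(means)
ordered = sorted(means)
# Approximate 95% interval for the population mean from the simulated draws.
interval = (ordered[int(0.025 * len(ordered))], ordered[int(0.975 * len(ordered)) - 1])
```

The average of the simulated population means should sit close to the sample mean (14.0 here), while the spread of the draws gives an interval estimate without specifying a model for the population.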

 

·         Friday 11:30 a.m. October 10, CBC C-224: Dr. Lurdes Inoue

Department of Biostatistics, University of Washington, Seattle
Title: Modeling disease progression

Abstract: In this talk we discuss some modeling approaches to investigate disease progression.  First, we propose a model that links longitudinal biomarker and disease progression. Specifically, we consider an underlying latent disease process that describes the onset of the disease and models the transition to an advanced stage of the disease as dependent on the biomarker levels. Next, we propose a variation of the above model to investigate disease progression using data prospectively collected in a screening study. We illustrate our methods through simulations and a case study in prostate cancer.

 

·         Friday 11:30 a.m. October 3, CBC C-224: Dr. Ben Kedem

Department of Statistics, University of Maryland, College Park

Title: Bayesian Spatial Prediction

Abstract: We discuss Bayesian spatial/temporal prediction in transformed Gaussian random fields where the transformation belongs to a parametric family. Monte Carlo integration is used in the approximation of the predictive density function, which is easy to implement in this framework. The BTG software for the implementation of the method will be discussed by means of spatial and temporal examples. As a byproduct, we provide a Bayesian way to tackle the distribution problem of average rainfall rate.

 

·         Friday 3:30 p.m. September 12, CBC C-224: Dr. Ashis SenGupta

Department of Statistics, University of California, Riverside and Applied Statistics Unit, Indian Statistical Institute, Kolkata, India
Title: Directional Data on 3-smooth Manifolds: Probability Models, Independence and Regression Analyses

Abstract: Observations on angular propagations, directional orientations, and even strictly periodic phenomena can be cast in the arena of directional data (DD). Such observations are frequently encountered in almost every sphere of applied science, ranging from e.g. agriculture to zoology, chronotherapy to defence, etc. There has been a paucity of probability distributions to model DD, even on 3-smooth manifolds, e.g. torus, cylinder, and hence on their higher dimensional generalizations. We present here unified approaches to derivations of such distributions. Tests for orthogonality of the directional random variables are then obtained based on these distributions. Next, models for regression with linear and circular variables are presented and related inference procedures are developed. Both classical and Bayesian approaches are discussed. Finally, the proposed methods are illustrated by several real-life examples.

 

2007-2008 Statistics Colloquium/Seminar

Spring 2008

 

·         Friday 11:30 a.m. May 2, CBC C-224: Dr. Kaushik Ghosh

Department of Mathematical Sciences, University of Nevada, Las Vegas
Title: Joint Modeling of Longitudinal Data and Informative Dropout in the Presence of Multiple Change Points

Abstract: In longitudinal studies of patients with the Human Immunodeficiency Virus (HIV), objectives of interest often include modeling of individual-level trajectories of HIV Ribonucleic Acid (RNA) as a function of time. Empirical evidence suggests that individual trajectories often possess multiple points of rapid change, which may vary from subject to subject, both in number and in location. The presence of such changepoints makes the modeling of individual viral RNA levels difficult, since the usual methods become unsuitable.

          In this talk, we present a new robust multiple-change point model for longitudinal trajectories. The proposed method uses a joint model to incorporate information from the longitudinal data as well as from informative dropouts, which are common in such studies. A Dirichlet process prior is used to model the distribution of the changepoints. The Dirichlet process leads to a natural clustering, and thus, sharing of information among subjects with similar trajectories. A fully Bayesian approach for model fitting and prediction is implemented using the Gibbs sampler on the ACTG 398 clinical trial data.
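
The clustering behaviour of the Dirichlet process noted in this abstract can be seen in its Chinese restaurant process representation. The sketch below is a generic illustration, not the joint model from the talk: subject i joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to a concentration parameter alpha. The value alpha = 1.0, the number of subjects and the function name are assumptions made for the example.

```python
import random


def chinese_restaurant_process(n, alpha, rng):
    """Assign n subjects to clusters sequentially.

    Subject i (0-based) joins existing cluster c with probability
    sizes[c] / (i + alpha) and starts a new cluster with probability
    alpha / (i + alpha)."""
    assignments = []
    sizes = []
    for i in range(n):
        u = rng.random() * (i + alpha)
        cumulative = 0.0
        chosen = len(sizes)          # default: open a new cluster
        for c, size in enumerate(sizes):
            cumulative += size
            if u < cumulative:
                chosen = c
                break
        if chosen == len(sizes):
            sizes.append(0)
        sizes[chosen] += 1
        assignments.append(chosen)
    return assignments, sizes


rng = random.Random(2)
assignments, sizes = chinese_restaurant_process(n=200, alpha=1.0, rng=rng)
number_of_clusters = len(sizes)
```

Because large clusters attract new members, the number of clusters typically grows slowly with the number of subjects, which is what allows subjects with similar trajectories to share information.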

 

·         Friday 11:30 a.m. March 28, CBC C-224: Dr. Yuedong Wang

Department of Statistics and Applied Probability, University of California, Santa Barbara
Title: Nonlinear Nonparametric Regression Models

Abstract: Almost all of the current nonparametric regression methods such as smoothing splines, generalized additive models and varying coefficients models assume a linear relationship when nonparametric functions are regarded as parameters. In this talk we present a general class of nonlinear nonparametric models that allow nonparametric functions to act nonlinearly. They  arise in many fields as either theoretical or empirical models. We propose new estimation methods based on an extension of the Gauss-Newton method to infinite dimensional spaces and the backfitting procedure. We extend the generalized cross validation and the generalized maximum likelihood methods to estimate smoothing parameters. Connections between nonlinear nonparametric models and nonlinear mixed effects models are established. Approximate Bayesian confidence intervals are derived for inference. We will also present a user friendly R function for fitting these models. The methods will be illustrated using two real data examples.

 

·         Friday 2:00 p.m. February 1, CBC C-224: Toby White, Ph.D candidate

Department of Statistics, University of Washington, Seattle
Title: Extensions of Latent Class Transition Models with  Application to Disability Survey Data

Abstract: Latent class transition models are used to partition a population into a small number of relatively homogeneous subgroups so that the movement of individuals among these subgroups can be followed through time. One context for these models involves the U.S. elderly chronically disabled, who may be grouped into one of 4-5 disability classes which differ by both type and severity of disability. Such data appear in longitudinal surveys, which can have large assessment intervals, considerable right and left censoring, and staggered entry and exit. Thus, methodology is needed to account for all the possible time sequences at which individuals can be observed, since traditional latent class transition models assume a complete set of observations for each individual. I develop a group-based modeling approach that encompasses various time sequences of observation, and use the E-M algorithm with adjustments to estimate model parameters and parameter standard errors. I also extend basic latent class transition models to incorporate age, period, and cohort effects, while satisfying identifiability constraints. I illustrate this methodology using ADL and IADL data from the National Long-Term Care Survey (1982-2004), and discuss transition probability estimates among classes of varying disability level and death.

 

Fall 2007

 

·         Fri. 11:30 a.m. November 30, CBC C-225: Dr. Anton Westveld

Department of Mathematical Sciences, University of Nevada, Las Vegas
Title: Modeling Foreign Direct Investment as a Longitudinal Social Network

Abstract: An extensive literature in international and comparative political economy has focused on how the mobility of capital affects the ability of governments to tax and regulate firms. The conventional wisdom holds that governments are in competition with each other to attract foreign direct investment (FDI). Nation-states observe the fiscal and regulatory decisions of competitor governments, and are forced to either respond with policy changes or risk losing foreign direct investment, along with the politically salient jobs that come with these investments. The political economy of FDI suggests a network of investments with complicated dependencies.

          We propose an empirical strategy for modeling investment patterns in 24 advanced industrialized countries from 1985-2000. Using bilateral FDI data we estimate how increases in flows of FDI affect the flows of FDI in other countries. Our statistical model is based on the methodology developed by Westveld & Hoff (2007). The model allows the temporal examination of each nation's activity level in investing, attractiveness to investors, and reciprocity between pairs of nations. We extend the model by treating the reported inflow and outflow data as independent replicates of the true value and allowing for a mixture model for the fixed effects portion of the network model. Using a fully Bayesian approach, we also impute missing data within the MCMC algorithm used to fit the model. A working paper can be found at: http://faculty.unlv.edu/westveld/Papers/FDI.pdf.

 

·         Fri. 11:00 a.m. October 12, CBC C-225:  Dr. Junyong Park

Department of Mathematics and Statistics, University of Maryland, Baltimore County
Title: Robust Test for Detecting a Signal in a High Dimensional Sparse Normal Vector

Abstract: We consider the problem of testing whether a high dimensional observation vector contains a signal, i.e., testing the null hypothesis that all the mean values are zero versus the alternative that non-zero means exist. The setup is one in which the dimension of the vector is large and the mean vector is 'sparse', i.e., only a small fraction of the mean values is non-zero. We suggest a test which is not sensitive to the exact tail behavior under the normality assumption. In particular, if the 'moderate deviation' tail of the distribution is represented as the product of the tail of a standard normal and a 'slowly changing' function, our suggested test is robust. A need for a robust test is expected when the observations are of a normalized form for which the normality assumption is commonly justified by the central limit theorem.

 

 

Fall 2006

 

·         Fri. 11:00 a.m. November 3, CBC C-225:  Dr. Nitis Mukhopadhyay

Department of Statistics, University of Connecticut, Storrs
Title: A New Two-Stage Sampling Methodology Designed for an Application in Horticulture

Abstract: A horticulturist was considering the number of days each marigold variety took from planting seeds to reach a stage when first bud appeared. The primary interest was to estimate the maximum waiting time between “seeding” and “first budding” among three varieties. It was thought that a 99% confidence interval of width one day would suffice since the data could be recorded with accuracy of one-half day. We assumed a normal distribution for the response variable. The horticulturist provided positive lower bounds for the variances that led to unequal pilot sample sizes.

          Accordingly, a new two-stage sampling design had to be developed and implemented. We will show that the data validated all assumptions made during the course of this investigation.

Some of the important exact as well as large-sample properties of the proposed methodology will also be summarized. Interpretations of the properties would be highlighted with real data.

          Finally, we will argue that the new methodology is theoretically superior to an existing methodology in case the pilot sizes could somehow be “chosen” equal. Using the data on hand, the superiority of the new methodology will be indicated.