Statistics seminars are typically held in Ross N638 on Friday mornings, from 10:30 to 11:30am.

**Seminars 2019-2020**

**Friday, October 25, 2019, 10:30a.m. to 11:30a.m., N638 Ross**

**Fateh Chebana, Institut National de la Recherche Scientifique (INRS)**

**Index construction with a constrained grouped additive index model**

Abstract: In the last twenty years, a number of countries have implemented heat-health warning systems to monitor heat waves and take action to reduce their impact on the population. Such warning systems often rely on two components: indices representing the monitored environmental phenomenon of interest, and thresholds above which the phenomenon is considered to pose a high risk to human health. The work presented here focuses on the former (indices). In this regard, we introduce a novel and general method to construct one or several indices. In order to construct interpretable indices with good predictive performance, the proposed model uses several constraints and a grouping of the explanatory variables. Hence the model is called the constrained grouped additive index model (c-GAIM). We show the use of c-GAIM to construct indices for the heat-health warning system of the province of Quebec.

## Seminars 2017-2018

**Friday, March 9, 2018, 10:30-11:30am**

**John Braun, University of British Columbia**

**"Dionysus: A Bootstrapped Version of the Prometheus Wildland Fire Growth Model"**

The Prometheus Fire Growth Model is a deterministic wildfire simulator used to predict the growth of a wildfire, in Canada and other countries. Given weather, topographical and fuel information, the simulated fire front is plotted at equally spaced times. Unpredictability of fire behaviour makes deterministic predictions inaccurate. This talk will briefly describe a risk analysis study undertaken using Prometheus applied to random weather streams, pointing out limitations with that approach and motivating an alternative viewpoint, based on bootstrapping. By statistically modelling the data to which the Prometheus model equations are fit, it is possible to obtain a distribution of fire front predictions. This approach allows us to estimate the probability that a growing fire will eventually burn a particular location. Repeated stochastic simulation is not required, so probability contours require no more computing time than deterministic contours. We conclude with a discussion of the implications for improved fire risk analysis.

## Seminars 2016-2017

- Friday, March 24, 2017, 10:30-11:30am
- Frank Konietschke, The University of Texas at Dallas
- "Rank-Based Procedures in Factorial Designs: Hypotheses about Nonparametric Treatment Effects"

Existing tests for factorial designs in the nonparametric case are based on hypotheses formulated in terms of distribution functions. Typical null hypotheses, however, are formulated in terms of some parameters or effect measures, particularly in heteroscedastic settings. In this talk we extend this idea to nonparametric models by introducing a novel nonparametric ANOVA-type statistic based on ranks, which is suitable for testing hypotheses formulated in terms of meaningful nonparametric treatment effects in general factorial designs. This is achieved by a careful in-depth study of the common distribution of rank-based estimators for the treatment effects. Since the statistic is asymptotically not a pivotal quantity, we propose different approximation techniques, discuss their theoretical properties and compare them in extensive simulations together with two additional Wald-type tests. An extension of the presented idea to general repeated measures designs is briefly outlined. The proposed rank-based procedures maintain the pre-assigned type-I error rate quite accurately, also in unbalanced and heteroscedastic models. A real data example illustrates the application of the proposed methods.

References: Brunner, E., Konietschke, F., Pauly, M., Puri, M.L. (2016). Rank-Based Procedures in Factorial Designs: Hypotheses about Nonparametric Treatment Effects. Journal of the Royal Statistical Society - Series B. Accepted for publication.

- Friday, March 17, 2017, 10:30-11:30am
- Professor Madan Puri, Indiana University
- "Asymptotic Normality, Rates of Convergence and Large Deviation Probabilities for a Broad Class of Statistics"

In this talk, I will introduce a broad class of statistics which includes, as special cases, unsigned simple linear rank statistics, signed rank statistics, linear combinations of functions of order statistics, linear functions of the concomitants of order statistics and rank combinatorial statistics, among possibly others. For this class, (i) asymptotic normality is established, (ii) the rates of convergence to normality are derived, and (iii) large deviation probabilities are investigated. The results obtained supersede all the results obtained thus far in this direction.

## Seminars 2015-2016

- FRIDAY, January 22, 2016, 10:30-11:30am
- Professor Wesolowski, The Technical University of Warsaw
- "Asymmetric Simple Exclusion Processes and Quadratic Harnesses"

Asymmetric simple exclusion process (ASEP) is a Markov model for random particles that cannot occupy the same position, and tend to move to the adjacent site with the rate that is larger to the right than to the left. We establish a correspondence between a family of Markov processes called quadratic harnesses (processes with linear conditional expectations and quadratic conditional variances given a past-future filtration) and finite state ASEPs with open boundaries. As applications, we give a quick proof of the large deviations principle for the total number of particles in the system, and show how explicit formulas for the average occupancy of a site arise for special choices of parameters. The talk is based on joint research with Wlodek Bryc (Univ. of Cincinnati).

- MONDAY, November 30th, 10:30-11:30am
- Guido Montúfar, Max Planck Institute for Mathematics in the Sciences
- "Dimension of Restricted Boltzmann Machines"

A Restricted Boltzmann Machine is a probabilistic graphical model with a full bipartite graph between n observable binary variables and m hidden binary variables. The probability distributions defined by this model are normalized entry-wise products of tables of non-negative rank at most two, and they form a semialgebraic set. Using tropical geometry, Cueto, Morton, and Sturmfels showed that for many choices of n and m the corresponding set has the expected dimension, equal to the minimum of the number of model parameters and the ambient dimension, and conjectured that this was true for all n and m. In this talk I present a positive solution to that conjecture.

**FRIDAY, November 13th, 10:30-11:30am**

**Zhou Zhou, University of Toronto**

**Inference for Non-stationary Time Series Regression with Inequality Constraints**

We consider statistical inference for time series linear regression where the response and predictor processes may experience general forms of abrupt and smooth non-stationary behaviours over time. Meanwhile, the regression parameters are subject to linear inequality constraints. A simple and unified procedure for structural stability checking and parameter inference is proposed. The proposed methodology is shown to be consistent, whether or not the true regression parameters are on the boundary of the restricted parameter space, by utilizing an asymptotically invariant geometric property of polyhedral cones.

**CANSSI statistical seminar**

**WEDNESDAY October 28 2-3pm (note the unusual time)**

**Jiahua Chen, Department of Statistics, University of British Columbia**

**Small Area Quantile Estimation**

Sample surveys are widely used to obtain information about totals, means, medians and other parameters of finite populations. In many applications, similar information is also desired on sub-populations such as individuals in specific geographic areas and socio-demographic groups. Often, the surveys are conducted at national or similarly high levels. The random nature of the probability sampling can result in few sampling units from many sub-populations that were unplanned at the design stage. Estimating parameters of these sub-populations (small areas) with satisfactory precision and evaluating their accuracy pose serious challenges to statisticians. Short of direct information, statisticians resort to pooling information across small areas via suitable model assumptions, administrative archives and census data. In this paper, we propose three estimators of small area quantiles for populations admitting a linear structure with normal error distributions or error distributions satisfying a semi-parametric density ratio model (DRM). We study the asymptotic properties of the DRM-based method and find it root-n consistent. Extensive simulation studies are used to reveal properties of the three methods under various foreseeable populations. The DRM-based method is found to be significantly more efficient when the error distribution is skewed and has comparable efficiency with the other methods in other cases. This talk is sponsored by the CANSSI CRT Project on Statistical Inference for Complex Surveys with Missing Observations.

**October 23 1-2pm (note the unusual time)**

**Ying Zhang, Acadia University**

Nonparametric statistical procedures are commonly used in analyzing trends in water resources time series. One popular procedure is associated with the Mann-Kendall (MK) tau correlation for detecting monotonic trend in time series data with seasonal and serial dependence. However, there is little rigorous discussion in the literature about its validity and alternatives. In this talk, the asymptotic normality of the seasonal MK test is determined for a large family of absolutely regular processes, a bootstrap sampling version of this test is proposed and its performance is studied through simulation. These simulations compare the performance of the traditional test, the bootstrapped version referred to above, as well as a bootstrapped version of Spearman's rho partial correlation. The simulation results indicate that both bootstrap tests perform comparably to the traditional test when the seasonal effect is deterministic, but the traditional test can fail to converge to the nominal levels when the seasonal effect is stochastic. Both bootstrapped tests perform similarly to each other in terms of accuracy and power. This is joint work with Paul Cabilio and Khurram Nadeem.

**THURSDAY October 15, 2-3pm (note the unusual time)**

**Yingli Qin, University of Waterloo**

**Testing the order of a population spectral distribution for high-dimensional data**

Large covariance matrices play a fundamental role in various high-dimensional statistics. Investigating the limiting behavior of the eigenvalues can reveal informative structures of large covariance matrices, which is particularly important in high-dimensional principal component analysis and covariance matrix estimation. In this paper, we propose a framework to test the number of distinct population eigenvalues for large covariance matrices, i.e. the order of a Population Spectral Distribution. The limiting distribution of our test statistic for a Population Spectral Distribution of order 2 is developed along with its (N,p) consistency. We will also report some simulation results and real data analysis.

**TUESDAY October 13, 11am-noon (note the unusual time)**

**Giles Hooker, Department of Statistical Science, Department of Biological Statistics and Computational Biology, Cornell University**

**Detecting Evolution in Experimental Ecology: Diagnostics for Missing State Variables in Ordinary Differential Equation Models**

This talk considers goodness-of-fit diagnostics for time-series data from processes approximately modeled by systems of nonlinear ordinary differential equations. In particular, we seek to determine three nested causes of lack of fit: (i) unmodeled stochastic forcing, (ii) mis-specified functional forms and (iii) mis-specified state variables. Testing lack of fit in differential equations is challenging since the model is expressed in terms of rates of change of the measured variables. Here, lack of fit is represented on the model scale via time-varying parameters. We develop tests for each of the three cases above through bootstrap and permutation methods. A motivating example is presented from laboratory-based ecology in which algae are grown on a nitrogen-rich medium and rotifers are introduced as a predator. The resulting data exhibit dynamics that do not correspond to those generated by classical ecological models. A hypothesized explanation is that more than one algal species is present in the chemostat. We assess the statistical evidence for this claim and show that while models incorporating multiple algal species provide better agreement with the data, their existence cannot be demonstrated without strong model assumptions. We conclude with an examination of the use of control theory to design inputs into chemostat systems to improve parameter estimation and power to detect missing components.