Computational Strategies for Large-Scale Statistical Data Analysis


02 - 06 Jul 2018

ICMS, 15 South College Street, Edinburgh

Organisers:

  • Guang Cheng, Purdue University
  • Chenlei Leng, University of Warwick

About:

Large-scale data is increasingly encountered in biology, medicine, engineering, the social sciences and economics as measurement technology advances. A distinctive feature of such data is that it usually comes with a large sample size and/or a large number of features, creating challenges for storage, processing and analysis. Classical statistical methodology, theory and computation, by contrast, were developed under the assumption that the entire dataset resides in a central location. As a result, most classical statistical methods face computational challenges when analysing large-scale data in the big data era. In particular, big data is often said to possess the so-called 4D features: it is Distributed, Dirty, high-Dimensional and Dynamic. These features make it very challenging to apply traditional statistical thinking to massive data.

The main aims of this workshop were:

  • to exchange developments in distributed data analysis and aggregated inference, with attention to the computational complexity and statistical properties of the relevant estimators;
  • to discuss open challenges, exchange research ideas and forge collaborations across three research areas: statistics, machine learning and optimisation;
  • to promote the development of software with justified statistical properties and efficient computation;
  • to engage more young UK researchers to work at the interface of computing and statistics.

Speakers

  • David Dunson, Duke University: Scaling up Bayesian Inference
  • Ata Kaban, University of Birmingham: Structure Aware Generalisation Error Bounds Using Random Projections
  • Eric Xing, Carnegie Mellon University: On System and Algorithm Co-Design and Automatic Machine Learning
  • Yoonkyung Lee, The Ohio State University: Dimensionality Reduction for Exponential Family Data
  • Jinchi Lv, University of Southern California: Asymptotics of Eigenvectors and Eigenvalues for Large Structured Random Matrices
  • Faming Liang, Purdue University: Markov Neighbourhood Regression for High-Dimensional Inference
  • Ping Ma, University of Georgia: Asympirical Analysis: a New Paradigm for Data Science
  • Mladen Kolar, University of Chicago: Recovery of Simultaneous Low Rank and Two-Way Sparse Coefficient Matrices, a Nonconvex Approach
  • Guang Cheng, Purdue University: Large-Scale Nearest Neighbour Classification with Statistical Guarantee
  • Yining Chen, LSE: Narrowest-Over-Threshold Detection of Multiple Change-points and Change-point-like Features
  • Haeran Cho, University of Bristol: Multiscale MOSUM Procedure with Localised Pruning
  • Jason Lee, University of Southern California: Geometry of Optimization Landscapes and Implicit Regularization of Optimization Algorithms
  • Chen Zhang, University College London: Variational Gaussian Approximation for Poisson Data
  • Jeremias Knoblauch, University of Warwick: Bayesian Online Changepoint Detection and Model Selection in High-Dimensional Data
  • Stanislav Volgushev, University of Toronto: Distributed Inference for Quantile Regression Processes
  • Hua Zhou, University of California: Global Solutions of Generalized Canonical Correlation Analysis Problems
  • Matteo Fasiolo, University of Bristol: Calibrated Additive Quantile Regression
  • Wenxuan Zhong, University of Georgia: Leverage Sampling to Overcome the Computational Challenges for Big Spatial Data
  • Moulinath Banerjee, University of Michigan: Divide and Conquer in Nonstandard Problems: the Super-Efficiency Phenomenon
  • Qifan Song, Purdue University: Bayesian Shrinkage Towards Sharp Minimaxity
  • Binyan Jiang, The Hong Kong Polytechnic University: Penalized Interaction Estimation for Ultra High Dimensional Quadratic Regression
  • Chao Zheng, Lancaster University: Revisiting Huber's M-Estimation: a Tuning-Free Approach
  • Xin Bing, Cornell University: A Fast Algorithm with Minimax Optimal Guarantees for Topic Models with an Unknown Number of Topics
  • Cheng Qian, LSE: Covariance and Graphical Modelling for High-Dimensional Longitudinal and Functional Data
  • Didong Li, Duke University: Efficient Manifold and Subspace Approximations with Spherelets
  • Xiaoming Huo, Georgia Institute of Technology: Non-Convex Optimization and Statistical Properties