Peter Haas works at the IBM Almaden Research Center in San Jose, California, where he is a member of the Services Research Department, and also works closely with the Computer Science Department. He is also a Consulting Professor in the Department of Management Science and Engineering at Stanford University, where he conducts joint research and teaches a graduate course on computer simulation. He is interested in the application of techniques from Applied Probability and Statistics to the design and performance analysis of service systems and systems for data management, integration, mining, and exploration. In addition, Peter is interested in techniques for modeling, simulation, and control of complex systems, especially discrete-event stochastic systems, with applications to computer, manufacturing, work-flow, service, and telecommunication systems. Some specific topics of past and present interest are given below.
Peter has recently been working on the Splash project, whose goal is to build a novel computational framework that integrates independent data, models, and simulations into comprehensive system models; such models support collaborative decision making in complex domains such as health.
Peter also conducts research on techniques for modeling, simulation, and control of complex discrete-event stochastic systems, with applications to manufacturing, computer, telecommunication, work-flow, and transportation systems. He has made fundamental contributions to the theory of stochastic Petri nets (SPNs) and generalized semi-Markov processes (GSMPs), models that can formally represent a broad class of complex discrete-event systems. His results include (1) the first formal definitions of non-Markovian SPNs and colored SPNs, (2) novel methods for specifying and simulating delays in SPNs and GSMPs, and (3) a simulation theory that gives conditions on the building blocks of an SPN or GSMP under which the associated system-state and delay processes are stable and amenable to output-analysis techniques such as the regenerative, batch-means, or spectral method.
Peter has contributed in a number of ways to the development of methods for, and applications of, database sampling. He helped create the proposed ISO standard for random sampling in SQL queries and provided statistical expertise during the implementation of sampling in IBM's DB2 product; he has also developed novel techniques for supporting and enhancing this type of sampling. As part of his query-optimization and data-integration work, Peter helped develop a sampling-based "bump-hunting" method for discovering fuzzy algebraic constraints in relational data, as well as the CORDS algorithm for automatic discovery of correlations and soft functional dependencies; use of these methods has resulted in order-of-magnitude speedups in query processing. He has developed state-of-the-art sampling-based methods for estimating the number of distinct values of a database attribute and for accelerating association-rule mining, as well as sampling-based methods for quickly estimating the answers to "aggregation" queries that compute statistics, such as selectivities, sums, averages, and distinct-value counts, over relational expressions. His work with the CONTROL group at UC Berkeley and with Jeffrey Naughton at the University of Wisconsin has focused on extending these methods to permit online, interactive processing of aggregation queries. More recently, Peter has developed sampling-based algorithms for creating and maintaining multiple sample synopses in a "synopsis warehouse" to support enterprise information discovery, as well as a sampling-based method for optimizing scan sharing in main-memory databases on multi-core CPU machines.
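To make the idea of online aggregation more concrete, here is a minimal Python sketch (not the CONTROL or DB2 implementation) of a running AVG estimate reported together with a large-sample confidence interval. It assumes rows are scanned in uniformly random order, so that every prefix behaves like a random sample, and it uses a normal (CLT) approximation with no finite-population correction; the function name `online_average` and its parameters are hypothetical.

```python
import math
import random


def online_average(values, z=1.96, report_every=1000):
    """Yield (rows_seen, running_mean, ci_half_width) while scanning `values`.

    Assumes the rows arrive in uniformly random order, so every prefix acts
    like a simple random sample. The half-width uses a large-sample (CLT)
    normal approximation (z=1.96 gives roughly 95% confidence) with no
    finite-population correction.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's update)
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= 2 and n % report_every == 0:
            variance = m2 / (n - 1)  # sample variance of the rows seen so far
            yield n, mean, z * math.sqrt(variance / n)


# Illustrative usage on synthetic data (shuffled to mimic a random scan order).
random.seed(0)
rows = [random.gauss(100.0, 15.0) for _ in range(10_000)]
random.shuffle(rows)
for n, estimate, half_width in online_average(rows, report_every=2500):
    print(f"after {n} rows: AVG = {estimate:.2f} +/- {half_width:.2f}")
```

Each report narrows the interval roughly in proportion to 1/sqrt(n), which is what lets a user stop an aggregation query early once the estimate is precise enough.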
As part of the LEO project, Peter helped develop a variety of techniques that enable query optimizers in database systems to improve their performance over time by learning from query feedback. These include the use of maximum-entropy techniques for consistent selectivity estimation and feedback-based histogram maintenance, as well as feedback-based methods for detecting statistical dependencies and for automatically configuring the collection of query-optimizer statistics. A number of his algorithms have been incorporated into the DB2 and IDS products. Peter has also studied the application of probabilistic methods to problems in query optimization for XML and relational databases. He has worked on enhancing the statistical-processing capabilities of the DB2 and Visual Warehouse products, helping both to implement this functionality and to develop the ISO SQL standard for specifying linear-regression queries over relational databases. He has also developed statistical and data-mining techniques for detecting and predicting anomalies in complex software systems. In addition, he helped develop novel hash-based methods for accurate distinct-value estimation in the presence of multiset operations.
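As a rough illustration of hash-based distinct-value estimation, the following Python sketch keeps a k-minimum-values (KMV) synopsis in the spirit of this line of work: hash each value into [0, 1), keep the k smallest distinct hash values, and estimate the number of distinct values as (k - 1)/U_(k), where U_(k) is the k-th smallest retained hash. Two synopses can be merged to estimate the distinct count of a union. The SHA-1-based hash and all function names here are assumptions for illustration; this is a simplified sketch, not the published algorithms or product code.

```python
import hashlib
import heapq


def unit_hash(value):
    """Map a value to a pseudo-uniform point in [0, 1) using 53 bits of SHA-1 (illustrative choice)."""
    digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def kmv_synopsis(values, k):
    """Return the k smallest distinct hash values of `values`, in ascending order."""
    heap = []        # max-heap (stored negated) of at most k hash values
    members = set()  # hash values currently held, so duplicates are skipped
    for v in values:
        h = unit_hash(v)
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            evicted = -heapq.heapreplace(heap, -h)  # drop the current largest
            members.discard(evicted)
            members.add(h)
    return sorted(-x for x in heap)


def kmv_estimate(synopsis, k):
    """Estimate the distinct-value count as (k - 1) / U_(k)."""
    if len(synopsis) < k:
        return float(len(synopsis))  # fewer than k distinct hashes: exact (ignoring collisions)
    return (k - 1) / synopsis[k - 1]


def kmv_union(syn_a, syn_b, k):
    """Synopsis of the union: the k smallest hash values across both synopses."""
    return sorted(set(syn_a) | set(syn_b))[:k]


# Illustrative usage: two overlapping ranges with 60,000 distinct values each.
K = 256
a = kmv_synopsis(range(0, 60_000), K)
b = kmv_synopsis(range(40_000, 100_000), K)
print(round(kmv_estimate(a, K)))                   # should be near 60,000
print(round(kmv_estimate(kmv_union(a, b, K), K)))  # should be near 100,000
```

Keeping only k hash values bounds memory regardless of dataset size; the trade-off is a relative estimation error that shrinks roughly like 1/sqrt(k).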
Peter is a member of the Institute for Operations Research and the Management Sciences (INFORMS), and is currently serving as President of the INFORMS Simulation Society. He was a Co-Chair of the 2011 INFORMS Simulation Society Research Workshop. He is also a member of ACM and of the ACM Special Interest Group on Management of Data (ACM SIGMOD). He is an Associate Editor for the journals Operations Research (Simulation Area) and the VLDB Journal, and is an Area Editor (Stochastic Models) for ACM Transactions on Modeling and Computer Simulation (TOMACS). Peter is currently a Co-Editor of a TOMACS special issue on Simulation in Complex Service Systems and, in 2009, was a Guest Editor (with Dan Suciu) for a special issue of the VLDB Journal on Uncertain and Probabilistic Databases. He has served on the program committees for ACM SIGMOD (2002, 2005, 2007), VLDB (2004, 2006), PNPM '03, PODS '11, ACM SIGKDD 2004, and WSC 2012, among others.
Peter Haas received an S.B. magna cum laude in Engineering and Applied Mathematics from Harvard University in 1978, where he won the Blumberg Creative Science Award. In 1979 he received an M.S. in Environmental Engineering from Stanford University. From 1979 to 1981 he was a Staff Scientist at Radian Corporation (now URS Corp.), where he helped develop computer models of air quality and performed air-quality modeling studies for the EPA, the Texas Air Control Board, and corporate clients. He received an M.S. in Statistics and a Ph.D. in Operations Research from Stanford University in 1984 and 1986, respectively. (The OR department has since been renamed the Department of Management Science and Engineering.) After a brief stint as an Assistant Professor and Leavey Fellow in the Department of Decision and Information Sciences at Santa Clara University, he became a Research Staff Member at IBM Almaden Research Center in 1987, where he has been ever since. During 1992-93 he spent a sabbatical year at the University of Wisconsin-Madison, where he was an Honorary Fellow at the Center for the Mathematical Sciences. In 1999, his paper on interactive query processing earned him awards from both ACM SIGMOD and the IBM Research Division, and he was a keynote speaker at the 11th International Conference on Scientific and Statistical Database Management (SSDBM 1999). In 2003 he received an IBM Outstanding Technical Achievement Award for his work on sampling and mining in databases, and in 2005 he received a Research Division Award for his work on learning optimizers. He also received the 2003 Outstanding Publication Award from the INFORMS College on Simulation (now the INFORMS Simulation Society) for his book on stochastic Petri nets.
Peter is a four-time winner of the IBM Research Division's Pat Goldberg Memorial Best Paper Award in Computer Science, Electrical Engineering and Mathematics: in 1999, for his paper on join techniques for online aggregation; in 2003, for his paper on automatic discovery and exploitation of fuzzy algebraic constraints in databases; in 2005, for his paper on the use of maximum-entropy methods in query optimization; and in 2008, for his paper on the Monte Carlo Database System. His papers on maximum-entropy methods (VLDB '05) and on incremental sample maintenance (VLDB '06) were both selected as among the top VLDB papers for their respective years, with extended versions appearing in the VLDB Journal. In 2007, Peter received the ACM SIGMOD Test of Time Award for his 1997 paper, Online Aggregation, coauthored with Joe Hellerstein and Helen Wang. In 2009, his work on distinct-value estimation under multiset operations was selected to appear in the "Research Highlights" section of Communications of the ACM. In 2011 he won a Best Paper Honorable Mention for his "Data is Dead" paper in the Challenges and Visions Track at VLDB 2011, as well as a Best Paper Award at the 2011 NIPS Big Learning Workshop for his work on matrix factorization over massive data. He has also received a number of IBM Invention Achievement Awards for patents filed and granted, including a Supplemental Patent Award (for distinguished patents). He has over 100 published research papers, books, and articles, and over 20 patents either filed or pending.
Note: some articles require an ACM or IEEE digital library subscription, or must be purchased, to download.
Published July 2002 and available from the Springer Catalog. Click here for errata.
Published September 2002 and available from the IBM Redbook website.
"The Monte Carlo Database System: Stochastic Analysis Close to the Data." R. Jampani, L. Perez, M. Wu, F. Xu, C. Jermaine, and P. J. Haas. ACM Trans. Database Sys. 36(3), 2011, Article 18. Download paper.
"MCDB-R: Risk Analytics in the Database." S. Arumugam, R. Jampani, L. L. Perez, F. Xu, C. Jermaine, and P. J. Haas. PVLDB 3(1), 2010, 782-793. Download paper.
"Uncertainty management in rule-based information extraction systems." E. Michelakis, R. Krishnamurthy, P. J. Haas, S. Vaithyanathan. Proc. 2009 ACM SIGMOD Intl. Conf. Management of Data, 101-114. Download paper.
"E = MC3: managing uncertain enterprise data in a cluster-computing environment." F. Xu, K. S. Beyer, V. Ercegovac, P. J. Haas, and E. J. Shekita. Proc. 2009 ACM SIGMOD Intl. Conf. Management of Data, 441-454. Download paper.
"Database meets simulation: tools and techniques." P. J. Haas and C. Jermaine. Proc. 2009 INFORMS Simulation Society Research Workshop, 119-124. Download paper.
"MCDB: A Monte Carlo Approach to Managing Uncertain Data." R. Jampani, L. Perez, M. Wu, F. Xu, C. Jermaine, and P. J. Haas. Proc. 2008 ACM SIGMOD Intl. Conf. Management of Data, 687-700. Download paper.
"Resolution-Aware Query Answers for Business Intelligence." Y. Sismanis, L. Wang, A. Fuxman, P. J. Haas, and B. Reinwald. Proc. 25th Intl. Conf. Data Engineering, 2009, 976-987. Download paper.
"Data-Stream Sampling: Basic Techniques and Results." P. J. Haas. In Data Stream Management: Processing High Speed Data Streams. Springer-Verlag, 2011.
"Maintaining Bounded-Size Sample Synopses of Evolving Datasets." R. Gemulla, W. Lehner, and P. J. Haas. VLDB Journal, 17(2), 2008, 173-201. Download paper.
"Maintaining Bernoulli Samples over Evolving Multisets." R. Gemulla, W. Lehner, and P. J. Haas. Proc. Twenty-Sixth ACM SIGACT-SIGMOD-SIGART Symp. Principles of Database Sys., 2007. Download paper.
"Techniques for Warehousing of Sample Data." P. G. Brown and P. J. Haas. Proc. 22nd Intl. Conf. Data Engrg., 2006. Download paper.
"An Estimator of the Number of Species from Quadrat Sampling." P. J. Haas, Y. Liu, and L. Stokes. Biometrics, 62, 2006, 135-141. (Online version here.)
"Efficient Data Reduction Methods for On-Line Association Rule Discovery." H. Brönimann, B. Chen, M. Dash, P. J. Haas, Y. Qiao, and P. Scheuermann. In Data Mining: Next Generation Challenges and Future Directions. AAAI Press, 2004, 125-146. Details.
"CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies." I. Ilyas, V. Markl, P. J. Haas, P. G. Brown and A. Aboulnaga. Proc. 2004 ACM SIGMOD Intl. Conf. Management of Data, 2004. Download paper.
"BHUNT: Automatic Discovery of Fuzzy Algebraic Constraints in Relational Data." P. G. Brown and P. J. Haas. Proc. 29th Intl. Conf. on Very Large Data Bases, 2003, 668-679. Download paper.
"Efficient Data Reduction with EASE." H. Brönimann, B. Chen, M. Dash, P. J. Haas, P. Scheuermann. Proc. 9th Intl. Conf. Knowledge Discovery and Data Mining (KDD), 2003, 59-68. Download paper.
"A New Two-Phase Sampling Based Algorithm for Discovering Association Rules." B. Chen, P. J. Haas, and P. Scheuermann. Proc. 8th Intl. Conf. Knowledge Discovery and Data Mining (KDD), 2002, 462-468. Download paper.
"Estimating the Number of Classes in a Finite Population." P. J. Haas and L. Stokes. J. Amer. Statist. Assoc., 93, 1998, 1475-1487. Download extended version of this paper.
"The New Jersey Data Reduction Report." D. Barbara, W. DuMouchel, C. Faloutsos, P. J. Haas, J. M. Hellerstein, Y. Ioannidis, H. V. Jagadish, T. Johnson, R. Ng, V. Poosala, and K. Sevcik. Data Engrg. Bull., 20, 1997, 3-45. Download online version.
"Selectivity and Cost Estimation for Joins Based on Random Sampling." P. J. Haas, J. F. Naughton, S. Seshadri, and A. N. Swami. J. Computer and System Sciences, 52, 1996, 550-569. (Special issue devoted to best papers from PODS '93.)
"Sampling-Based Selectivity Estimation Using Augmented Frequent Value Statistics." P. J. Haas and A. N. Swami. Proc. Eleventh Intl. Conf. Data Engineering, 1995, 522-531. Download paper.
"Sequential Procedures for Query Size Estimation." P. J. Haas and A. N. Swami. Proc. 1992 ACM SIGMOD Intl. Conf. Management of Data, 1-11. Download paper.
"Very large scale Bayesian inference using MCDB." Z. Cai, Z. Vagena, C. Jermaine, and P. J. Haas. NIPS Big Learning Workshop, 2011. Download paper.
"Large-scale matrix factorization with distributed stochastic gradient descent." R. Gemulla, P. J. Haas, E. Nijkamp, and Y. Sismanis. Proc. 17th Intl. Conf. Knowledge Discovery and Data Mining (KDD), 2011, 69-77. (Also presented at XLDB 2011, and winner of Best Paper Award at 2011 NIPS Big Learning Workshop.) Download paper. Technical report.
"Ricardo: Integrating R and Hadoop." S. Das, Y. Sismanis, K. S. Beyer, R. Gemulla, P. J. Haas, and J. McPherson. Proc. 2010 ACM SIGMOD Intl. Conf. Management of Data, 2010. Download paper.
"Distinct-Value Synopses for Multiset Operations." K. Beyer, R. Gemulla, P. J. Haas, B. Reinwald, and Y. Sismanis. Commun. ACM, 52(10), 2009, 87-95. Download paper.
"Main-memory scan sharing for multi-core CPUs." L. Qiao, V. Raman, F. Reiss, P. J. Haas, and G. M. Lohman. Proc. 34th Intl. Conf. on Very Large Data Bases, 2008, 610-621. Download paper.
"On Synopses for Distinct-Value Estimation under Multiset Operations." K. Beyer, P. J. Haas, B. Reinwald, Y. Sismanis, and R. Gemulla. Proc. 2007 ACM SIGMOD Intl. Conf. Management of Data, 2007. Download paper.
"Toward automated large scale information integration and discovery." P. Brown, P. J. Haas, J. Myllymaki, H. Pirahesh, B. Reinwald, and Y. Sismanis. In Data Management in a Connected World, T. Härder and W. Lehner, eds. Springer-Verlag, 2005. Download paper.
"Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches." G. Cormode, M. Garofalakis, P. J. Haas, and C. Jermaine. Foundations and Trends in Databases, 4(1-3), 2011, 1-294. Download paper.
"A Bi-Level Bernoulli Scheme for Database Sampling." P. J. Haas and C. König. Proc. 2004 ACM SIGMOD Intl. Conf. Management of Data, 2004. Download paper.
"The Need for Speed: Speeding Up DB2 Using Sampling." P. J. Haas. IDUG Solutions Journal, 10, 2003, 32-34. Download article.
"A Scalable Hash Ripple Join Algorithm." G. Luo, P. J. Haas, and J. F. Naughton. Proc. 2002 ACM SIGMOD Intl. Conf. Management of Data, 2002, 252-262. Download paper.
"Hoeffding Inequalities for Join-Selectivity Estimation and Online Aggregation." P. J. Haas. Computing Science and Statistics, 31, 2000, 74-78. (Proceedings of Interface '99.) Download extended version of this paper.
"Techniques for Online Exploration of Large Object-Relational Datasets." P. J. Haas. Proc. 11th Intl. Conf. Scientific and Statistical Database Management, 1999, 4-12 (keynote paper). Download paper.
"Interactive Data Analysis: The CONTROL Project." J. M. Hellerstein, R. Avnur, A. Chou, C. Hidber, C. Olston, V. Raman, T. Roth, and P. J. Haas. IEEE Computer, 32, August 1999, 51-59 (cover feature). Download paper. Abstract.
"Ripple Joins for Online Aggregation." P. J. Haas and J. M. Hellerstein. Proc. 1999 ACM SIGMOD Intl. Conf. Management of Data, 287-298. Download paper. See extended IBM Technical Report.
"Online Aggregation." J. M. Hellerstein, P. J. Haas, and H. J. Wang. Proc. 1997 ACM SIGMOD Intl. Conf. Management of Data, 171-182. (Winner of 2007 SIGMOD Test-of-Time Award.) Reprinted in Readings in Database Systems, Third Edition, Morgan Kaufmann, San Francisco. Download paper.
"Large-Sample and Deterministic Confidence Intervals for Online Aggregation." P. J. Haas. Proc. Ninth Intl. Conf. Scientific and Statistical Database Management, 1997, 51-63. Download paper.
"Regenerative simulation." P. J. Haas. Encyclopedia of Operations Research and Management Science, 3rd Edition, 2012. In preparation.
"Splash: a platform for analysis and simulation of health." W. C. Tan, P. J. Haas, R. Mak, C. A. Kieliszewski, P. Selinger, P. P. Maglio, S. Glissmann, M. Cefkin, and Y. Li. Proc. 2nd ACM SIGHIT Intl. Health Informatics Symp., 2012, 543-552. Download paper.
"Data is dead...without what-if models." P. J. Haas, P. P. Maglio, P. G. Selinger, and W.-C. Tan. PVLDB, 4(12), 2011, 1486-1489. (Best Paper Honorable Mention.) Download paper.
"Social Factors in Creating an Integrated Capability for Health System Modeling and Simulation." P. P. Maglio, M. Cefkin, P. J. Haas, and P. Selinger. Proc. 2010 Intl. Conf. Social Computing, Behavioral Modeling, and Prediction (SBP 2010), 44-51. Download paper.
"Laws of Large Numbers and Functional Central Limit Theorems for Generalized Semi-Markov Processes." P. W. Glynn and P. J. Haas. Commun. Statist. Stochastic Models, 22, 2006, 201-231. Download paper.
"On Functional Central Limit Theorems for Semi-Markov and Related Processes." P. W. Glynn and P. J. Haas. Commun. Statist. Theory Methods, 33, 2004, 487-506. (Special issue on semi-Markov processes.) Download paper.
"Estimation Methods for Delays in Non-Regenerative Discrete-Event Systems." P. J. Haas. Commun. Statist. Stochastic Models, 19, 2003, 1-35. Download paper.
"On the Validity of Long-Run Estimation Methods for Discrete-Event Systems." P. J. Haas and P. W. Glynn. Perf. Eval. Rev., 30, 2002, 35-37. Special issue on the 4th Workshop Math. Perform. Modeling and Analysis (MAMA 2002). Download paper.
"On Simulation Output Analysis for Generalized Semi-Markov Processes." P. J. Haas. Commun. Statist. Stochastic Models, 15, 1999, 53-80. Download paper.
"Estimation Methods for Passage Times Based on One-Dependent Cycles." P. J. Haas and G. S. Shedler. Discrete Event Dynamic Systems: Theory and Applications, 6, 1996, 43-72. Download paper.
"Recurrence and Regeneration in Non-Markovian Networks of Queues." P. J. Haas and G. S. Shedler. Commun. Statist. Stochastic Models, 3, 1987, 29-52. Download paper.
"Regenerative Generalized Semi-Markov Processes." P. J. Haas and G. S. Shedler. Commun. Statist. Stochastic Models, 3, 1987, 409-438. Download paper.
"Modeling and Simulation with Stochastic Petri Nets." P. J. Haas. (Invited tutorial.) Proc. Winter Simulation Conference '04. Download paper.
"Estimation Methods for Non-Regenerative Stochastic Petri Nets." P. J. Haas. IEEE Trans. Software Engrg., 25, 1999, 218-236. (Special section devoted to the best papers from PNPM '97.) Download paper.
"Passage Times in Colored Stochastic Petri Nets." P. J. Haas and G. S. Shedler. Commun. Statist. Stochastic Models, 9, 1993, 31-79. Download paper.
"Stochastic Petri Nets: Modeling Power and Limit Theorems." P. J. Haas and G. S. Shedler. Probab. Engrg. Inform. Sciences, 5, 1991, 477-498.
"Stochastic Petri Nets With Timed and Immediate Transitions." P. J. Haas and G. S. Shedler. Commun. Statist. Stochastic Models, 5, 1989, 563-600. (Special issue devoted to Computer-Experimental Methods in Probability.) Download paper.
"Regenerative Stochastic Petri Nets." P. J. Haas and G. S. Shedler. Performance Evaluation, 6, 1986, 189-204. Download paper.
"Discovering and exploiting statistical properties for query optimization in relational databases: A survey." P. J. Haas, I. F. Ilyas, G. M. Lohman, and V. Markl. Statistical Analysis and Data Mining, 1, 2009, 223-250. Download paper.
"Detecting attribute dependencies from query feedback." P. J. Haas, F. Hueske, and V. Markl. Proc. 33rd Intl. Conf. on Very Large Data Bases, 2007, 830-841. Download paper.
"Consistent selectivity estimation via maximum entropy." V. Markl, P. J. Haas, M. Kutsch, N. Megiddo, and T. M. Tran. VLDB Journal, 16, 2007, 55-76. (Special issue devoted to best papers from VLDB 2005.) Online version, or try here.
"ISOMER: Consistent histogram construction using query feedback." U. Srivastava, P. J. Haas, V. Markl, and N. Megiddo. Proc. 22nd Intl. Conf. Data Engrg., 2006. Download paper.
"Statistical learning techniques for costing XML queries." N. Zhang, P. J. Haas, V. Josifovsky, G. M. Lohman, and C. Zhang. Proc. 31st Intl. Conf. on Very Large Data Bases, 2005, 289-300. Download paper.
"Automated Statistics Collection in DB2 UDB." A. Aboulnaga, P. J. Haas, M. Kandil, S. Lightstone, G. Lohman, V. Markl, I. Popivanov, and V. Raman. Proc. 30th Intl. Conf. on Very Large Data Bases, 2004, 1146-1157. Download paper.
"Improved Histograms for Selectivity Estimation of Range Predicates." V. Poosala, Y. E. Ioannidis, P. J. Haas, and E. J. Shekita. Proc. 1996 ACM SIGMOD Intl. Conf. Management of Data, 294-305. Download paper.
"GORDIAN: Efficient and Scalable Discovery of All Composite Keys." Y. Sismanis, P. J. Haas, and B. Reinwald. Proc. 32nd Intl. Conf. on Very Large Data Bases, 2006, 691-702. Download paper.
"Watermarking Relational Data: Framework, Algorithms, and Analysis." R. Agrawal, P. J. Haas, and J. Kiernan. VLDB Journal, 12, 2003, 157-169. Download paper.
"The Maximum and Mean of a Random Length Sequence." P. J. Haas. J. Appl. Probab., 29, 1992, 460-466. Download paper.
"The Effects of NO2-Aerosol Interaction on Indices of Perceived Visibility Impairment." P. J. Haas and A. J. Fabrick. Atmospheric Environment, 15, 1981, 2171-2177. Abstract.
phaas $AT$ us.ibm.com
IBM Almaden Research Center, K55/B1
650 Harry Rd.
San Jose, CA 95120-6099
External: 1-408-927-1702
Last updated: Thursday, 4/26/2012