Research
My current research focuses on new database architectures, such as key-value stores, column-oriented databases, and databases on new storage devices.
Previously, I worked on data warehousing, business intelligence, Web services, data integration, and Web data management.
Selected Publications
--- New Database Exploration
- Efficient and Scalable Data Evolution with Column Oriented Databases. Ziyang Liu, Bin He, Hui-I Hsiao, and Yi Chen. EDBT 2011.
- High Performance Database Logging using Storage Class Memory. Ru Fang, Hui-I Hsiao, Bin He, C. Mohan, and Yun Wang. ICDE 2011.
- CODS: Evolving Data Efficiently and Scalably in Column Oriented Databases. Ziyang Liu, Sivaramakrishnan Natarajan, Bin He, Hui-I Hsiao, and Yi Chen. VLDB 2010 Demo.
- Top-K Aggregation Queries Over Large Networks. Xifeng Yan, Bin He, Feida Zhu, and Jiawei Han. ICDE 2010.
--- Business Intelligence and Data Warehousing
- Efficient Iceberg Query Evaluation using Compressed Bitmap Index. Ziyang Liu, Yu Huang, Bin He, Hui-I Hsiao, and Yi Chen. IEEE Transactions on Knowledge and Data Engineering (TKDE) 2010.
- SIMPLE: A Strategic Information Mining Platform for Licensing and Execution. Ying Chen, W. Scott Spangler, Jeffrey T. Kreulen, Stephen Boyer, Thomas D. Griffin, Alfredo Alba, Amit Behal, Bin He, Linda Kato, Ana Lelescu, Cheryl A. Kieliszewski, Xian Wu, and Li Zhang. ICDM Workshops 2009.
- Understanding Complex IT Environments using Information Analytics and Visualization. Amit Behal, Ying Chen, Cheryl A. Kieliszewski, Ana Lelescu, Bin He, Jie Cui, Jeffrey T. Kreulen, E. Michael Maximilien, James Rhodes, and W. Scott Spangler. CHIMIT 2007.
- Business Insights Workbench - An Interactive Insights Discovery Solution. Amit Behal, Ying Chen, Cheryl A. Kieliszewski, Ana Lelescu, Bin He, Jie Cui, Jeffrey T. Kreulen, James Rhodes, and W. Scott Spangler. HCI (9) 2007: 834-843.
- BIwTL: A Business Information Warehouse Toolkit and Language for Warehousing Simplification and Automation. Bin He, Rui Wang, Ying Chen, Ana Lelescu, and James Rhodes. SIGMOD 2007.
- COBRA - Mining Web for Corporate Brand and Reputation Analysis. W. Scott Spangler, Ying Chen, Larry Proctor, Ana Lelescu, Amit Behal, Bin He, Thomas D. Griffin, Anna Liu, Brad Wade, and Trevor Davis. Web Intelligence 2007. Journal version in Web Intelligence and Agent Systems 7(3): 243-254, 2009.
--- Deep Web Integration and Exploration (Ph.D. work)
- Accessing the Deep Web: A Survey. Bin He, Mitesh Patel, Zhen Zhang, and Kevin Chen-Chuan Chang. Communications of the ACM (CACM), 50(5): 94-101, May 2007.
- Automatic Complex Schema Matching across Web Query Interfaces: A Correlation Mining Approach. Bin He and Kevin Chen-Chuan Chang. ACM Transactions on Database Systems (TODS), 31(1): 346-395, March 2006.
- Light-weight Domain-based Form Assistant: Querying Web Databases On the Fly. Zhen Zhang, Bin He, and Kevin Chen-Chuan Chang. VLDB 2005.
- Making Holistic Schema Matching Robust: An Ensemble Approach. Bin He and Kevin Chen-Chuan Chang. SIGKDD 2005, Full Paper.
- Toward Large Scale Integration: Building a MetaQuerier over Databases on the Web. Kevin Chen-Chuan Chang, Bin He, and Zhen Zhang. CIDR 2005.
- A Holistic Paradigm for Large Scale Schema Matching. Bin He and Kevin Chen-Chuan Chang. SIGMOD Record, 33(4): 20-25, December 2004. Invited paper.
- Organizing Structured Web Sources by Query Schemas: A Clustering Approach. Bin He, Tao Tao, and Kevin Chen-Chuan Chang. CIKM 2004.
- Structured Databases on the Web: Observations and Implications. Kevin Chen-Chuan Chang, Bin He, Chengkai Li, Mitesh Patel, and Zhen Zhang. SIGMOD Record, 33(3): 61-70, September 2004.
- Discovering Complex Matchings across Web Query Interfaces: A Correlation Mining Approach. Bin He, Kevin Chen-Chuan Chang, and Jiawei Han. SIGKDD 2004, Full Paper.
- Understanding Web Query Interfaces: Best-Effort Parsing with Hidden Syntax. Zhen Zhang, Bin He, and Kevin Chen-Chuan Chang. SIGMOD 2004.
- Statistical Schema Matching across Web Query Interfaces. Bin He and Kevin Chen-Chuan Chang. SIGMOD 2003.
Honors
- Master Inventor, IBM, 2011
- Invention Plateau Awards, IBM, 2008, 2009, 2010, 2011
- Invention Achievement Awards, IBM, 2007, 2008, 2009, 2010, 2011
- Patent Issuance Award, IBM, 2010
- Runner Up of Almaden Grand Challenge Competition, IBM, 2009
- Outstanding Technical Achievement Award, IBM, 2008
- Runner Up of ASR Best Paper Award, IBM, 2008
- Bravo! Awards, IBM, 2006, 2008
- Innovation Matters for Business Insights Workbench, IBM, 2006
- Winner of ComputerWorld Horizon Award for Business Insights Workbench, 2006
Patents and Inventions
- SCM-conscious Transactional Key-Value Store. Filed 2011.
- Space Management for Storage Class Memory (SCM). Filed 2011.
- Building a Bi-Temporal Key Value Cache System MemcachBT. Filed 2011.
- MetaQuerier WebCrawler - Building a Database of Online Databases - A Structure-Driven Web Form Crawler. University of Illinois, Office of Technology. No. TF06043, 2011.
- A Method for Understanding Web Query Interfaces: Best-Effort Parsing with Hidden Syntax. University of Illinois, Office of Technology. No. TF04110, 2011.
- High Performance Logging using Storage Class Memory. Filed 2011.
- Space Efficient Aggregation for Column Oriented Data Store. Filed 2010.
- Support Complex Comparisons in Dynamically Formed Groups. Filed 2010.
- Efficient Computation of Top-K Aggregation Over Graph and Network Data. Filed 2009.
- Efficient Iceberg Query Evaluation using Compressed Bitmap Index. Filed 2009.
- An Efficient Subject-Independent Document Readership Classifier. Filed 2009.
- Efficient and Scalable Data Evolution with Column Oriented Databases. Filed 2009.
- Concurrency Control for Multiple ETL Processes. Filed 2009.
- Efficient ETL Schemes for Versioning Data Warehouses. Filed 2008.
- Support Multi-Value Slice and Dice in Data Warehouses. Filed 2008.
- A General Data Filtering and Optimization Framework for ETL processes. Filed 2008.
- A Simplified ER Model to Access Structured Data. Filed 2008.
- Adaptive Aggregation: Improving the Performance of Grouping and Duplicate Elimination By Avoiding Unnecessary Disk Access. Filed 2008.
- A Weighted Hasse Diagram Approach to Optimal Extraction and Transformation in an ETL Process. Filed 2008.
- Efficient Update Methods for Large Volume Data Updates in Data Warehouses. Filed 2007.
- Business Information Warehouse Toolkit and Language for Warehousing Simplification and Automation. Filed 2007.
- Failure Recovery and Error Correction Techniques for Data Loading in Large Information Warehouses. Filed 2007. US Patent 7739547 B2, Granted June 2010.
- Method and System for Extracting Web Query Interfaces. Filed 2004. US Patent 7552116, Granted September 2009.