Yunyao Li's Home Page



* If a file is not available online, you can contact me and I will usually be able to send you a copy.

  • David Simmen, Fred Reiss, and Yunyao Li. Enabling Enterprise Mashups over Unstructured Text. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2009 - Demonstration Track) [PDF]

  • Rajasekar Krishnamurthy, Yunyao Li, Sriram Raghavan, Frederick Reiss, Shivakumar Vaithyanathan, and Huaiyu Zhu. SystemT: A System for Declarative Information Extraction. In SIGMOD Record, 37, 4, Dec. 2008 [PDF]

  • Yunyao Li, Rajasekar Krishnamurthy, Sriram Raghavan, Shivakumar Vaithyanathan, and H.V. Jagadish. Regular Expression Learning for Information Extraction. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP 2008), Honolulu, Hawaii, October 2008 [PDF][PPT] (Data set used in this paper will be available soon)

  • Yunyao Li, Cong Yu and H. V. Jagadish. Enabling Schema-Free XQuery [1] with Meaningful Query Focus. Very Large Data Base Journal (VLDB J.) 17(3), 2008

  • Yunyao Li, Huahai Yang, H. V. Jagadish. NaLIX: A Generic Natural Language Search Environment for XML Data. ACM Trans. Database Syst. 32, 4. Nov. 2007, 30. [link]
  • Yunyao Li, Ishan Chaudhuri, Huahai Yang, Satinder Singh and H. V. Jagadish. Enabling Domain-Awareness for a Generic Natural Language Interface. In Proceedings of 22nd Conference on Artificial Intelligence (AAAI 2007), Vancouver, British Columbia, Canada, July 2007 [PDF] (Query set used in this paper is available here.)
  • Yunyao Li, Ishan Chaudhuri, Huahai Yang, Satinder Singh, H. V. Jagadish. DaNaLIX: a Domain-adaptive Natural Language Interface for Querying XML. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2007 - Demonstration Track), Beijing, China, June 2007 [PDF]
  • H. V. Jagadish, Adriane Chapman, Aaron Elkiss, Magesh Jayapandian, Yunyao Li, Arnab Nandi and Cong Yu. Making Database Systems Usable. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2007), Beijing, China, June 2007. [PDF]
  • Yunyao Li, Rajasekar Krishnamurthy, Shivakumar Vaithyanathan, and H.V. Jagadish. Getting Work Done on the Web: Supporting Transactional Queries. In Proceedings of 29th Annual International ACM SIGIR Conference on Research & Development on Information Retrieval (SIGIR 2006), Seattle, WA, August 2006 [PDF][Presentation (PPT)](Data used in this paper is available here)
  • Many searches on the web have a transactional intent. We argue that pages satisfying transactional needs can be distinguished from the more common pages that have some information and links, but cannot be used to execute a transaction. Based on this hypothesis, we provide a recipe for constructing a transaction annotator. By constructing an annotator with one corpus and then demonstrating its classification performance on another, we establish its robustness. Finally, we show experimentally that a search procedure that exploits such pre-annotation greatly outperforms traditional search for retrieving transactional pages.

  • Yunyao Li, Huahai Yang and H.V. Jagadish. Term Disambiguation in Natural Language Query for XML. In Proceedings of 7th International Conference on Flexible Query Answering Systems (FQAS 2006), Milano, Italy, June 2006 [PDF]
  • Converting a natural language query sentence into a formal database query is a major challenge. We have constructed NaLIX, a natural language interface for querying XML data. Through our experience with NaLIX, we find that failures in natural language query understanding can often be dealt with as ambiguities in term meanings. These failures are typically the result of either the user’s poor knowledge of the database schema or the system’s lack of linguistic coverage. With automatic term expansion techniques and appropriate interactive feedback, we are able to resolve these ambiguities. In this paper, we describe our approach and present results demonstrating its effectiveness.

  • Yunyao Li, Huahai Yang and H.V. Jagadish. Constructing a Generic Natural Language Interface for an XML Database. In Proceedings of International Conference on Extending Database Technology (EDBT 2006), Munich, Germany, March 2006 [PDF]
  • We describe the construction of a generic natural language query interface to an XML database. Our interface can accept an arbitrary English sentence as a query, which can be quite complex and include aggregation, nesting, and value joins, among other things. This query is translated, potentially after reformulation, into an XQuery expression. The translation is based on mapping grammatical proximity of natural language parsed tokens in the parse tree of the query sentence to proximity of corresponding elements in the XML data to be retrieved. Our experimental assessment, through a user study, demonstrates that this type of natural language interface is good enough to be usable now, with no restrictions on the application domain.

  • Yunyao Li, Huahai Yang and H.V. Jagadish. NaLIX: an Interactive Natural Language Interface for Querying XML. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2005), Baltimore, MD, June 2005 [PDF] (best demo)
  • Database query languages can be intimidating to the non-expert, leading to the immense recent popularity for keyword based search in spite of its significant limitations. The holy grail has been the development of a natural language query interface. We present NaLIX, a generic interactive natural language query interface to an XML database. Our system can accept an arbitrary English language sentence as query input, which can include aggregation, nesting, and value joins, among other things. This query is translated, potentially after reformulation, into an XQuery expression that can be evaluated against an XML database. The translation is done through mapping grammatical proximity of natural language parsed tokens to proximity of corresponding elements in the result XML. In this demonstration, we show that NaLIX, while far from being able to pass the Turing test, is perfectly usable in practice, and able to handle even quite complex queries in a variety of application domains. In addition, we also demonstrate how carefully designed features in NaLIX facilitate the interactive query process and improve the usability of the interface.

  • Yunyao Li, Cong Yu and H. V. Jagadish. Schema-Free XQuery [1]. In Proceedings of International Conference on Very Large Data Bases (VLDB 2004), Toronto, Canada, September, 2004 [PDF][Presentation (pdf)]
  • The widespread adoption of XML holds out the promise that document structure can be exploited to specify precise database queries. However, the user may have only a limited knowledge of the XML structure, and hence may be unable to produce a correct XQuery, especially in the context of a heterogeneous information collection. The default is to use keyword-based search and we are all too familiar with how difficult it is to obtain precise answers by these means. We seek to address these problems by introducing the notion of Meaningful Lowest Common Ancestor Structure (MLCAS) for finding related nodes within an XML document. By automatically computing MLCAS and expanding ambiguous tag names, we add new functionality to XQuery and enable users to take full advantage of XQuery in querying XML data precisely and efficiently without requiring (perfect) knowledge of the document structure. Such a Schema-Free XQuery is potentially of value not just to casual users with partial knowledge of schema, but also to experts working in a data integration or data evolution context. In such a context, a schema-free query, once written, can be applied universally to multiple data sources that supply similar content under different schemas, and applied "forever" as these schemas evolve. Our experimental evaluation found that it was possible to express a wide variety of queries in a schema-free manner and have them return correct results over a broad diversity of schemas. Furthermore, the evaluation of a schema-free query is not expensive using a novel stack-based algorithm we develop for computing MLCAS: from 1 to 4 times the execution time of an equivalent schema-aware query.

  • Yunyao Li and H. V. Jagadish. Compatibility Determination in Web Services. In ICEC 2003 E-Government and E-Services Workshop, Pittsburgh, PA, September 2003 [PDF][Presentation (PPT)]
  • Determining the compatibility between Web services plays a critical role in supporting dynamic discovery and collaboration of Web services in the inherently heterogeneous web environment. In this paper we present a compatibility determination algorithm. The algorithm takes two graphs (each representing the external interface of a Web service) as inputs, and produces the following as outputs: (1) a judgment of compatibility (true or false), and (2) the differences between the two graphs. The outputs can be used by a Web service to enable dynamic collaboration between Web services. A visualized representation of the differences found can enable a human to determine the criticality of the differences.

     

    NOTE

    1. An implementation of Schema-Free XQuery is released as part of the TIMBER project.

     

    Disclaimer: These documents are made available as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each copyright holder. These works may not be reposted without the explicit permission of the copyright holder.

    Last updated 10.08.2009
    ©2007-2009 Yunyao Li