Text and Information


Welcome to the Information Management Principles home page at IBM Almaden Computer Science Research. Members of this group design techniques for dealing with information overload in today's interconnected world of web and intranet servers. We focus on unstructured and semi-structured information sources such as hypertext. Although the web is growing exponentially, an individual's capacity to read and digest material is essentially fixed. Most of us respond to the information explosion by reading only relevant and authoritative material. Relevance depends on the user and on the information need; both can be characterized by the documents the user has seen or liked and by the link structure among those documents. Authority, or quality, can be attributed to a document on the basis of the hyperlinks that cite it. Techniques from machine learning and graph algorithms are being used to mine large hypertext databases for relevance and quality.
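One concrete way to turn hyperlink citations into authority scores is the hub/authority iteration described in Kleinberg's "Authoritative sources in a hyperlinked environment" (listed below). The sketch that follows is a simplified, illustrative version under stated assumptions. It runs on a small hypothetical link graph given as a Python dict. It uses a fixed number of iterations in place of a convergence test. It also omits the paper's step of assembling a query-focused subgraph from search results. A good authority is a page cited by good hubs, and a good hub is a page that cites good authorities.

```python
import math


def _normalize(scores):
    """Scale a score dict to unit L2 norm (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return scores
    return {p: v / norm for p, v in scores.items()}


def hits(links, iterations=50):
    """Hub/authority power iteration on a directed link graph.

    links: dict mapping a page to the list of pages it links to
    (a hypothetical toy representation, not any particular crawler's format).
    iterations: fixed iteration count, an assumption made for simplicity.
    Returns (hub, authority) dicts, each normalized to unit L2 norm.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of the pages citing it.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub score of a page: sum of authority scores of the pages it cites.
        new_hub = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_hub[src] += new_auth[dst]
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return hub, auth


# Hypothetical toy graph: pages a, b and e all cite c; only a and b cite d.
graph = {"a": ["c", "d"], "b": ["c", "d"], "e": ["c"]}
hub, auth = hits(graph)
print(max(auth, key=auth.get))  # c
```

Page c gets the highest authority score because every hub in the toy graph cites it. Pages a and b tie as the best hubs because they cite both authorities.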



Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies. Soumen Chakrabarti, Byron Dom, Rakesh Agrawal and Prabhakar Raghavan. VLDB Journal 1998 (invited paper). [Acrobat]
Enhanced hypertext categorization using hyperlinks. Soumen Chakrabarti, Byron Dom and Piotr Indyk. SIGMOD 1998. [Postscript, PowerPoint]
A probabilistic analysis of latent semantic indexing. Christos Papadimitriou, Prabhakar Raghavan, Hisao Tamaki and Santosh Vempala. PODS 1998. [Postscript]
Inferring Web communities from link topologies. David Gibson, Jon Kleinberg and Prabhakar Raghavan. ACM Hypertext 1998. [Postscript]
Automatic resource compilation by analyzing hyperlink structure and associated text. Soumen Chakrabarti, Byron Dom, Prabhakar Raghavan, Sridhar Rajagopalan, David Gibson and Jon Kleinberg. WWW7 1998. [HTML]
Authoritative sources in a hyperlinked environment. Jon Kleinberg. SODA 1998. [Postscript]
Using taxonomy, discriminants and signatures to navigate in text databases. Soumen Chakrabarti, Byron Dom, Rakesh Agrawal and Prabhakar Raghavan. VLDB 1997. [Postscript]
Model selection in unsupervised learning with applications to document clustering. Byron Dom and Shivakumar Vaithyanathan. ICML 1999. [Postscript]
Generalized model selection for unsupervised learning in high dimensions. Byron Dom and Shivakumar Vaithyanathan. NIPS 1999. [Postscript]


Copyright notice: The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.