TEXAS: Taxonomy Extraction with Applications in Semantics
Taxonomies form the backbone of knowledge-based systems by organizing knowledge in a machine-interpretable manner and facilitating information integration. Hierarchical structures provide valuable input to knowledge-intensive applications such as question answering and textual entailment, and are useful tools for browsing and navigating document collections, especially for exploration and discovery.
Although some taxonomies are readily available as part of language and web resources such as WordNet and Wikipedia, not all domains are covered and existing taxonomies are often too small to fully describe a domain. Automatic taxonomy extraction methods have been developed in recent years to address this problem, but issues remain in evaluation, comparison and application of extracted taxonomies [1, 2, 3, 4, 5, 6]. Depending on the application, multiple perspectives can be equally valid both in the selection of concepts and in the extraction of relations between them. This makes the resulting taxonomies difficult to compare, as they are based on different requirements. For instance, WordNet is a lexical semantic resource that is used mainly for tracking hyponymic substitution (e.g. ‘table’ can be replaced by ‘furniture’) with the main requirement of broad lexical coverage. On the other hand, subject hierarchies, such as the ACM Subject Classification, are used mainly for document collection browsing (e.g. fine-grained topic distinction such as ‘information retrieval’ vs. ‘information extraction’) with the main requirements of comprehensibility and coherence.
The TEXAS workshop aims to address these issues by providing a venue for presenting and discussing approaches that evaluate taxonomy extraction and its subtasks (term/concept extraction, term/concept relation discovery, taxonomy construction and cleaning) in the context of semantic applications such as entity search, entity disambiguation and linking, information integration and summarization, knowledge acquisition, knowledge sharing, and inference in NLP tasks (question answering, textual entailment). In this way, progress towards automatically constructed hierarchies can be measured relative to other tasks and real-world applications.
References:
Kozareva, Zornitsa, and Eduard Hovy. "A semi-supervised method to learn and construct taxonomies using the web." Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2010.
Navigli, Roberto, Paola Velardi, and Stefano Faralli. "A graph-based algorithm for inducing lexical taxonomies from scratch." Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Three. AAAI Press, 2011.
Medelyan, Olena, et al. "Constructing a Focused Taxonomy from a Document Collection." The Semantic Web: Semantics and Big Data. Springer Berlin Heidelberg, 2013. 367-381.
Stoica, Emilia, Marti A. Hearst, and Megan Richardson. "Automating Creation of Hierarchical Faceted Metadata Structures." HLT-NAACL. 2007.
Wang, Wei, Payam M. Barnaghi, and Andrzej Bargiela. "Probabilistic topic models for learning terminological ontologies." Knowledge and Data Engineering, IEEE Transactions on 22.7 (2010): 1028-1040.
Velardi, Paola, Stefano Faralli, and Roberto Navigli. "OntoLearn Reloaded: A Graph-Based Algorithm for Taxonomy Induction." Computational Linguistics 39.3 (2013): 665-707.
Zavitsanos, Elias, Georgios Paliouras, and George A. Vouros. "Gold Standard Evaluation of Ontology Learning Methods through Ontology Transformation and Alignment." Knowledge and Data Engineering, IEEE Transactions on 23.11 (2011): 1635-1648.
Expected research topics of relevance to the workshop:
- application-based evaluation of taxonomies in question answering, document browsing, document clustering, expert finding or other applications
- using automatically constructed taxonomies for searching, browsing and organizing information
- constructing taxonomies for/from social media
- probabilistic models for topic hierarchies (hierarchical topic modeling)
- constructing taxonomies using hierarchical clustering
- using distributional models for taxonomy construction
- acquisition and modeling of categorical structure and modeling human category acquisition
- constructing topic categorization systems and subject hierarchies
- constructing hierarchical faceted metadata structures
- methods for transforming semi-structured knowledge resources into taxonomies
- merging and aligning existing resources for taxonomy construction
- comparing, aligning and evaluating existing hierarchical structures
- domain glossary acquisition and extracting taxonomies from definitions
- constructing application/domain specific taxonomies from existing resources (lexical resources, Linked Open Data, Wikipedia category structure, semantic networks)
- using different hierarchical structures (e.g., tree, DAG) and relation types (e.g., hyponymy, meronymy) for taxonomy construction
- attaching Named Entities to hierarchical structures and using Named Entities to drive taxonomy construction by extensional analysis
- multilinguality and taxonomies: constructing and using multilingual taxonomies
Important dates:
- paper submission: July 26, 2014
- paper notification: August 26, 2014
- camera ready: Sept. 15, 2014
- workshop: Oct. 29, 2014
Submissions should be made electronically, using the Softconf system at https://www.softconf.com/emnlp2014/texas2014/.
Submissions should follow the two-column format of the ACL 2014 proceedings and should not exceed 8 pages of content plus one additional page of references.
The LaTeX style files and the Microsoft Word style files tailored for this year's conference are available at: http://emnlp2014.org/call.html.
The reviewing of papers will be double-blind, so please make sure your paper shows the title but no author information. Likewise, the paper submitted for review should not contain any self-identifying references. For example, rather than "We showed previously (Smith, 2001), ...", use citations such as "Smith (2001) previously showed ...". References to your own work in thesis proposals should also be anonymized: you may, for example, write “in X (2000) we showed” and omit those papers from the reference list.
Program committee:
- Nathalie Aussenac-Gilles, IRIT, France
- Kostadin Cholakov, Technische Universität Darmstadt, Germany
- Philipp Cimiano, University of Bielefeld, Germany
- Eduard Hovy, Language Technologies Institute, CMU, USA
- Grace Hui Yang, Georgetown University, USA
- Alyona Medelyan, Pingar, New Zealand
- Rogelio Nazar, Pontificia Universidad Católica de Valparaíso, Chile
- George Paliouras, NCSR Demokritos, Greece
- Michael Schroeder, BiOTEC, TU Dresden, Germany
- Dominic Widdows, Google, USA
Organizers:
- Dr. Paul Buitelaar - Unit for Natural Language Processing, Insight, National University of Ireland, Galway
- Dr. Georgeta Bordea - Unit for Natural Language Processing, Insight, National University of Ireland, Galway
- Prof. Roberto Navigli - Linguistic Computing Laboratory, Dept. of Computer Science, Sapienza University of Rome, Italy
- Stefano Faralli - Linguistic Computing Laboratory, Dept. of Computer Science, Sapienza University of Rome, Italy
The TEXAS workshop will be supported by the following projects:
- The “MultiJEDI” ERC Starting Grant (http://multijedi.org/), led by Prof. Roberto Navigli at the Linguistic Computing Laboratory of the Sapienza University of Rome, Italy.
- The Linked Data and Text Mining research area (http://nlp.deri.ie/), led by Dr. Paul Buitelaar at INSIGHT (http://www.insight-centre.org/), the Irish National Centre for Data Analytics, National University of Ireland, Galway.