Submissions/Wikipedia API, Python and 100+ languages
This is a rejected submission for Wikimania 2012.
- Submission no. 678
- Title of the submission
Wikipedia API, Python and 100+ languages
- Type of submission (workshop, tutorial, panel, presentation)
- Author of the submission
- E-mail address
- Country of origin
- Affiliation, if any (organization, company etc.)
University of Maryland, Human-Computer Interaction Lab
Johns Hopkins University, Human Language Technology Center of Excellence
- Personal homepage or blog
- Abstract (at least 300 words to describe your proposal)
As a research assistant at UMD/JHU, I work on projects focused on bilingual and monolingual translation, machine translation, crowd-sourced translation and natural language processing. Wikipedia projects are among the essential resources for this work: they provide hundreds of languages and hundreds of thousands of articles that are openly available for access and reuse in language research.
But manipulating such a large amount of data in many different languages is not always a simple task.
In this workshop/tutorial, I want to cover the following topics:
- Wikipedia API
- Processing Wikipedia articles using Python
- Parsing and extracting sentences/words out of Wikipedia articles
- Working with 100+ languages in a unified fashion
- Tips and tricks
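The topics above could be illustrated with something like the following minimal Python sketch. It assumes the MediaWiki Action API at `https://{lang}.wikipedia.org/w/api.php`, the `prop=langlinks` query module, and `prop=extracts` from the TextExtracts extension (installed on Wikipedia today). The helper names, the User-Agent string and the regex-based sentence splitter are hypothetical illustrations, not the tooling this talk would present. Only the standard library is used.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumption: each language edition exposes the Action API at this path, and the
# language code is also the subdomain (true for most, but not all, editions).
API_TEMPLATE = "https://{lang}.wikipedia.org/w/api.php"
# Wikimedia asks API clients to send a descriptive User-Agent; this is a placeholder.
USER_AGENT = "wiki-lang-research-sketch/0.1 (contact: you@example.org)"


def build_query_url(lang, params):
    """Return a GET URL for the Action API of the given language edition."""
    query = {"format": "json", "formatversion": "2"}
    query.update(params)
    return API_TEMPLATE.format(lang=lang) + "?" + urllib.parse.urlencode(query)


def fetch_json(url):
    """Download and decode one API response (requires network access)."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def langlinks_params(title):
    # prop=langlinks lists the interlanguage links of one article.
    return {"action": "query", "prop": "langlinks", "titles": title, "lllimit": "max"}


def extract_params(title):
    # prop=extracts (TextExtracts); explaintext asks for plain text instead of HTML.
    return {"action": "query", "prop": "extracts", "titles": title, "explaintext": "1"}


def parse_langlinks(data):
    """Map language code -> local article title from a formatversion=2 response."""
    links = {}
    for page in data.get("query", {}).get("pages", []):
        for link in page.get("langlinks", []):
            links[link["lang"]] = link["title"]
    return links


def parse_extract(data):
    """Return the plain-text extract of the first page, or '' if it is missing."""
    pages = data.get("query", {}).get("pages", [])
    return pages[0].get("extract", "") if pages else ""


# Naive splitter: break after Latin terminators (.!?) followed by whitespace, or
# directly after CJK terminators and the Devanagari danda. Abbreviations and
# scripts without sentence punctuation (e.g. Thai) are NOT handled.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+|(?<=[。！？।])\s*")


def split_sentences(text):
    """Split plain text into sentences, dropping empty pieces."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def sentences_in_all_languages(title, source_lang="en"):
    """Fetch one article and all its interlanguage versions, split into sentences."""
    data = fetch_json(build_query_url(source_lang, langlinks_params(title)))
    titles = {source_lang: title, **parse_langlinks(data)}
    result = {}
    for lang, local_title in titles.items():
        text = parse_extract(fetch_json(build_query_url(lang, extract_params(local_title))))
        result[lang] = split_sentences(text)
    return result
```

For example, `sentences_in_all_languages("Python (programming language)")` would return a dict mapping each language code to a list of sentences. It issues one HTTP request per language edition, so it suits small experiments rather than bulk processing; the parsing helpers are kept separate from the network calls so they can be tested offline.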
As one outcome of this talk, I want to gather requirements from other people who use Wikipedia for language research, in order to build better tools and libraries for the language research community.
- Research, Analysis, and Education
- Technology and Infrastructure
- Length of presentation/talk
- Will you attend Wikimania if your submission is not accepted?
- Slides or further information (optional)
I will prepare slides if the talk is selected.
- Special request as to time of presentations
If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with four tildes (~~~~).