Submissions/Wikipedia API, Python and 100+ languages

From Wikimania 2012 • Washington, D.C., USA

This is a rejected submission for Wikimania 2012.

Submission no. 678
Title of the submission

Wikipedia API, Python and 100+ languages

Type of submission (workshop, tutorial, panel, presentation)


Author of the submission

Dmitry Kachaev

E-mail address



Country of origin


Affiliation, if any (organization, company etc.)

University of Maryland, Human-Computer Interaction Lab

Johns Hopkins University, Human Language Technology Center of Excellence

Personal homepage or blog

Abstract (at least 300 words to describe your proposal)

As a research assistant at UMD/JHU, I work on projects focused on bilingual and monolingual translation, machine translation, crowd-sourced translation, and natural language processing. Wikipedia projects are among the essential resources for this work: they provide open access to hundreds of languages and hundreds of thousands of articles that can be used freely in language research.

But manipulating such a large amount of data in different languages is not always a simple task.

In this workshop/tutorial I want to cover the following topics:

  • Wikipedia API
  • Processing Wikipedia articles using Python
  • Parsing and extracting sentences/words out of Wikipedia articles
  • Working with 100+ languages in a unified fashion
  • Tips and tricks
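
The topics above can be illustrated with a minimal Python sketch. It is not the workshop material, only an assumption about what such code might look like: it builds a request against the MediaWiki Action API of any language edition, reads the plain-text extract (via the TextExtracts `prop=extracts` module), and splits it into sentences with a deliberately naive regular expression. The function names, the User-Agent string, and the splitting rule are all hypothetical; real sentence segmentation across 100+ languages needs language-specific handling.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(lang, title):
    """Build an Action API URL for the plain-text extract of one article.

    `lang` is a Wikipedia language code such as "en", "de" or "ja", so the
    same call works for every language edition.
    """
    params = {
        "action": "query",
        "prop": "extracts",   # provided by the TextExtracts extension
        "explaintext": 1,     # plain text instead of HTML
        "titles": title,
        "format": "json",
    }
    return "https://%s.wikipedia.org/w/api.php?%s" % (lang, urlencode(params))


def extract_text(response):
    """Pull the article text out of a decoded API response (a dict)."""
    pages = response.get("query", {}).get("pages", {})
    return "\n".join(page.get("extract", "") for page in pages.values())


def fetch_article(lang, title):
    """Fetch one article as plain text (performs a network request)."""
    request = Request(
        build_extract_url(lang, title),
        # Hypothetical User-Agent; replace with your own contact details.
        headers={"User-Agent": "wiki-language-sketch/0.1 (example)"},
    )
    with urlopen(request) as resp:
        return extract_text(json.load(resp))


def split_sentences(text):
    """Naive splitter: break after . ! ? plus whitespace, or after CJK stops.

    It will mis-split abbreviations and does not cover every script.
    """
    parts = re.split(r"(?<=[.!?])\s+|(?<=[。！？])", text)
    return [part.strip() for part in parts if part.strip()]
```

With network access, `split_sentences(fetch_article("fr", "Paris"))` would return the sentences of the French article; only the language code changes when switching to another edition.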

As one outcome of this talk, I want to gather requirements from other people who use Wikipedia for language research, in order to build better tools and libraries that the language research community can use.

  • Research, Analysis, and Education
  • Technology and Infrastructure
Length of presentation/talk

25 minutes

Will you attend Wikimania if your submission is not accepted?


Slides or further information (optional)

Will prepare slides if talk is selected.

Special request as to time of presentations

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers decide which sessions are of high interest. Sign with four tildes (~~~~).

  1. Valid entry (talk) 02:16, 22 March 2012 (UTC) if it doesn't collide with my presentation time
  2. Yuvipanda (talk) 00:00, 22 March 2012 (UTC)
  3. Sumanah (talk) 18:24, 21 March 2012 (UTC)
  4. Logicwiki (talk) 07:44, 22 March 2012 (UTC)
  5. Bináris (talk) 22:23, 23 March 2012 (UTC)