How can I use the Wiktionary API for getting pronunciation data?

Wiktionary doesn't have an API of its own. MediaWiki, the software that Wiktionary runs on, does have an API, but it is completely unaware of the structure and content of Wiktionary entries.

The best you can do is use the MediaWiki API to find the wiki page for the word you want, then look at its table of contents. If the table of contents has a section for the language you want, and within that a Pronunciation section, use another API call to get the wikitext of that section, which you will have to parse yourself. Different words may use different templates, or none at all, since Wiktionary is constantly evolving.
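The two calls described above can be sketched in Python roughly as follows. This is only a minimal sketch: it assumes the English Wiktionary endpoint at https://en.wiktionary.org/w/api.php, the `action=parse` module with `prop=sections` and `prop=wikitext`, and that pronunciations are written with an `{{IPA|...}}` template. The function names and the User-Agent string are made up for illustration, and the template check will miss entries that use other templates.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumed endpoint for the English Wiktionary; other editions use other hosts.
API_URL = "https://en.wiktionary.org/w/api.php"


def api_get(params):
    """Send one GET request to the MediaWiki API and decode the JSON reply."""
    query = urllib.parse.urlencode({**params, "format": "json"})
    request = urllib.request.Request(
        f"{API_URL}?{query}",
        headers={"User-Agent": "pronunciation-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def find_pronunciation_index(sections, language):
    """Return the index of the Pronunciation section under the given language."""
    in_language = False
    for section in sections:
        if int(section["level"]) == 2:
            # Level-2 headings are the language sections.
            in_language = section["line"] == language
        elif in_language and section["line"] == "Pronunciation":
            return section["index"]
    return None


def extract_ipa(wikitext):
    """Collect IPA strings from {{IPA|...}} templates (one convention of many)."""
    found = []
    for arguments in re.findall(r"\{\{IPA\|([^{}]*)\}\}", wikitext):
        for argument in arguments.split("|"):
            # Keep only arguments that look like /phonemic/ or [phonetic] text.
            if argument.startswith(("/", "[")):
                found.append(argument)
    return found


def pronunciation_wikitext(word, language="English"):
    """Fetch the wikitext of the Pronunciation section, or None if absent."""
    toc = api_get({"action": "parse", "page": word, "prop": "sections"})
    if "error" in toc:
        return None  # for example, the page does not exist
    index = find_pronunciation_index(toc["parse"]["sections"], language)
    if index is None:
        return None
    data = api_get(
        {"action": "parse", "page": word, "prop": "wikitext", "section": index}
    )
    return data["parse"]["wikitext"]["*"]
```

Calling `extract_ipa(pronunciation_wikitext("work") or "")` would then give whatever IPA strings the section happens to contain. Keeping the section lookup separate from the network calls makes it easy to adjust when an entry nests Pronunciation under a different heading.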

There are also mailing lists for Wiktionary and for the MediaWiki API.


Here is what I did for a similar situation.

  1. Visit Scraping Links With PHP. It will teach you how to scrape links using PHP. Please do not just copy and paste; try to learn from it.
  2. Now that we have our links, we need to separate the audio (*.ogg) ones from the normal links. For that we can use the pathinfo function in PHP. The official documentation for pathinfo should be a good start.
  3. Create an XML document out of the result.
  4. Deliver the content using Ajax or any other preferred way.
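Steps 2 and 3 could look roughly like this. It is a sketch in Python rather than PHP, so `posixpath.splitext` stands in for `pathinfo`; the function names and the XML element names (`pronunciations`, `audio`) are invented for the example, and the input is assumed to be a list of already scraped link URLs.

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def audio_links(links, extension=".ogg"):
    """Keep only the links whose path ends in the given file extension."""
    result = []
    for link in links:
        # Look at the path only, so query strings and fragments are ignored.
        path = urlsplit(link).path
        if posixpath.splitext(path)[1].lower() == extension:
            result.append(link)
    return result


def links_to_xml(links):
    """Wrap the links in a small XML document (element names are made up)."""
    root = ET.Element("pronunciations")
    for link in links:
        ET.SubElement(root, "audio", {"src": link})
    return ET.tostring(root, encoding="unicode")
```

The resulting string can then be served to the page however you like, for example as the response to an Ajax request.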

Or you can give "http://api.forvo.com/demo" a try. It looks promising.

I will not give you the full answer, because then it would not be fun any more. I hope it helps.


You could build on Wiktionary DBpedia and send a SPARQL query like the following one to its SPARQL endpoint:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wt:<http://wiktionary.dbpedia.org/terms/>

SELECT DISTINCT ?spell ?pronounce
WHERE { 
  ?spell rdfs:label "work"@en ;
         wt:hasLangUsage ?use .

  ?use dc:language wt:English ;
       wt:hasPronunciation ?pronounce .
}

In this case "work" is the word whose pronunciation you want to look up.

EDIT:

A similar project is dbnary, which is more active and delivers more reliable results. You can use the SPARQL endpoint with the following query:

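PREFIX lemon: <http://lemon-model.net/lemon#>
PREFIX lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#>

SELECT DISTINCT ?pronun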
WHERE {
  ?form lemon:writtenRep "work"@en ;
        lexinfo:pronunciation ?pronun .
}
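
Either query can be sent over plain HTTP. The following is a minimal Python sketch, assuming the endpoint follows the standard SPARQL protocol (a `query` parameter on a GET request) and can return the standard JSON results format. The endpoint URL in the usage note is an assumption that you should check against the project's documentation, and the function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_request(endpoint, query):
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def binding_values(results, variable):
    """Pull the values bound to one variable out of a SPARQL JSON result."""
    return [
        row[variable]["value"]
        for row in results["results"]["bindings"]
        if variable in row
    ]


def run_query(endpoint, query, variable):
    """Send the query and return the values bound to the given variable."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        return binding_values(json.load(response), variable)
```

With the second query stored in a string `query`, a call might look like `run_query("http://kaiko.getalp.org/sparql", query, "pronun")`, where the URL is only my guess at the dbnary endpoint.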