Lexical data sources

This is a reference post listing the lexical data sources that I know of. I will update it as I discover more.

Word lists, preferably with phonetic/phonological forms

Indo-European

Dutch: CELEX
English: CELEX, CMU pronouncing dictionary, WordReference.com
French: Lexique, WordReference.com
German: CELEX
Greek: GreekLex
Italian: WordReference.com
Portuguese: PorLex
Slovak: Slovak National Corpus
Spanish: EsPal, Spanish Gigaword 3rd Edition (Mendonça, Graff & DiPersio, 2009)

Note that WordReference.com is a multilingual dictionary site rather than a word list, but it does contain IPA transcriptions for some languages. The source of these transcriptions is not given.
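
Of the sources above, the CMU pronouncing dictionary is distributed as a plain-text file, so it is easy to load directly. The following is a minimal sketch in Python, assuming the classic file layout: one entry per line, a headword followed by ARPAbet symbols, alternate pronunciations marked with a "(n)" suffix on the headword, and comment lines starting with ";;;". The function names (parse_cmudict, strip_stress) and the inline sample are my own illustrations, not part of the dictionary's distribution.

```python
import re
from collections import defaultdict

# Tiny inline sample in the CMUdict plain-text layout (illustrative lines only).
SAMPLE = """\
;;; comment lines start with three semicolons
READ  R EH1 D
READ(1)  R IY1 D
LEXICON  L EH1 K S IH0 K AA2 N
"""

# Alternate pronunciations carry a "(n)" suffix on the headword.
VARIANT_SUFFIX = re.compile(r"\(\d+\)$")


def parse_cmudict(text):
    """Map each headword to a list of pronunciations (lists of ARPAbet symbols)."""
    entries = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        headword, *phones = line.split()
        headword = VARIANT_SUFFIX.sub("", headword)  # READ(1) -> READ
        entries[headword].append(phones)
    return dict(entries)


def strip_stress(phones):
    """Drop the stress digits (0, 1, 2) that CMUdict attaches to vowels."""
    return [p.rstrip("012") for p in phones]


lexicon = parse_cmudict(SAMPLE)
print(lexicon["READ"])
print(strip_stress(lexicon["LEXICON"][0]))
```

To use this on the real dictionary, read the downloaded file into a string and pass it to parse_cmudict. Newer releases may differ in details such as headword case or trailing comments, so check the file you have before relying on this.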

Austronesian

The Austronesian Basic Vocabulary Database

Bantu

CBOLD (Comparative Bantu OnLine Dictionary)

Semitic

Arabic: Aralex (based on written Arabic)

Sino-Tibetan

Hong Kong Cantonese: Hong Kong Cantonese Adult Corpus (HKCAC; Leung & Law, 2001)

Semantic word-webs

Global list

Written forms and frequency data

Most SUBTLEX-xx databases can be found here: http://crr.ugent.be/programs-data/subtitle-frequencies
These give word frequencies based on subtitles, which are generally a pretty good guide to real usage.

Indo-European

Dutch
American English
British English
German
Greek
Spanish

Sino-Tibetan

Mandarin Chinese

Isolate

Korean (coming soon)