Most similar languages

Speak new languages using words you already know!

Mutual intelligibility is a relationship between different but related languages in which speakers of one language can understand speakers of the other without prior study.

Lexical similarity is a measure of the degree to which the word sets of two given languages are similar. A lexical similarity of 1 (or 100%) would mean a total overlap between vocabularies, whereas 0 means there are no common words.

Our similarity measure:

S = similarity

C = coverage

W = common words

N = number of words shared with other languages

S(L1|L2) = S(L2|L1) = ( W(L1|L2) + W(L2|L1) ) / ( 2 * min( N(L1), N(L2) ) )

C(L1|L2) = C(L2|L1) = ( W(L1|L2) + W(L2|L1) ) / ( N(L1) + N(L2) )
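
As a rough illustration of the two formulas above, here is a minimal Python sketch. The function names and the input counts are made up for the example and are not taken from the site's data; returning 0 when a denominator is zero is also an assumption, since the source does not say how that case is handled.

```python
def similarity(w12: int, w21: int, n1: int, n2: int) -> float:
    """S(L1|L2) = (W(L1|L2) + W(L2|L1)) / (2 * min(N(L1), N(L2)))."""
    denom = 2 * min(n1, n2)
    if denom == 0:
        return 0.0  # assumption: nothing to compare means no similarity
    return (w12 + w21) / denom


def coverage(w12: int, w21: int, n1: int, n2: int) -> float:
    """C(L1|L2) = (W(L1|L2) + W(L2|L1)) / (N(L1) + N(L2))."""
    denom = n1 + n2
    if denom == 0:
        return 0.0  # assumption: same convention as in similarity()
    return (w12 + w21) / denom


# Hypothetical counts, for illustration only:
# W(L1|L2) = 30, W(L2|L1) = 30, N(L1) = 100, N(L2) = 200
s = similarity(30, 30, 100, 200)
c = coverage(30, 30, 100, 200)
print(f"S = {s:.0%}, C = {c:.0%}")  # prints "S = 30%, C = 20%"
```

Because 2 * min(N(L1), N(L2)) is never larger than N(L1) + N(L2), coverage can never exceed similarity for the same pair, and both measures are symmetric in L1 and L2.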



Lexical similarity values, with coverage in parentheses, for pairs of selected Romance, Germanic, and Slavic languages (compare with Ethnologue data):

            Catalan    English    French     German     Italian    Portuguese Romanian   Russian    Spanish
Catalan     1          34% (28%)  28% (19%)  19% (0%)   36% (20%)  41% (23%)  25% (10%)  14% (0%)   86% (75%)
English     34% (28%)  1          46% (39%)  51% (29%)  37% (26%)  28% (21%)  44% (23%)  25% (14%)  29% (20%)
French      28% (19%)  46% (39%)  1          33% (23%)  22% (19%)  20% (17%)  21% (13%)  12% (0%)   34% (20%)
German      19% (0%)   51% (29%)  33% (23%)  1          16% (13%)  15% (12%)  11% (10%)  15% (15%)  21% (0%)
Italian     36% (20%)  37% (26%)  22% (19%)  16% (13%)  1          37% (35%)  31% (24%)  12% (0%)   61% (27%)
Portuguese  41% (23%)  28% (21%)  20% (17%)  15% (12%)  37% (35%)  1          24% (18%)  10% (0%)   86% (41%)
Romanian    25% (10%)  44% (23%)  21% (13%)  11% (10%)  31% (24%)  24% (18%)  1          -          63% (19%)
Russian     14% (0%)   25% (14%)  12% (0%)   15% (15%)  12% (0%)   10% (0%)   -          1          15% (0%)
Spanish     86% (75%)  29% (20%)  34% (20%)  21% (0%)   61% (27%)  86% (41%)  63% (19%)  15% (0%)   1
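
To show how the values above might be queried, here is a small Python sketch that stores a few similarity values copied from the table and ranks the closest languages for a given language. The names `similarity_pct`, `lookup`, and `most_similar` are hypothetical helpers for this example, and only a subset of pairs is included. Each pair is stored once, because the measure is symmetric.

```python
# Similarity S in percent, copied from the table above (subset of pairs only).
similarity_pct = {
    ("Catalan", "Spanish"): 86,
    ("Portuguese", "Spanish"): 86,
    ("Romanian", "Spanish"): 63,
    ("Italian", "Spanish"): 61,
    ("English", "German"): 51,
    ("English", "French"): 46,
    ("German", "Russian"): 15,
}


def lookup(a: str, b: str):
    """Return S for a pair in either order, or None if the pair is not stored."""
    return similarity_pct.get((a, b), similarity_pct.get((b, a)))


def most_similar(language: str, k: int = 3):
    """Return up to k (other_language, S) tuples, highest S first."""
    rows = [
        (b if a == language else a, s)
        for (a, b), s in similarity_pct.items()
        if language in (a, b)
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)[:k]


print(lookup("Spanish", "Catalan"))  # 86
print(most_similar("English", 2))    # [('German', 51), ('French', 46)]
```

A pair that is missing from the dictionary, such as Romanian and Russian (shown as "-" in the table), returns None from `lookup` rather than a misleading 0.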

Chart: Most similar languages (axis: number of common words). Pairs with small sample size have been removed.