Patterns of Frequency in the Basque Lexicon (PFBL)

This online application lets the public look up the frequency of structural patterns in the Basque lexicon, including the following.

  • Word frequency
  • Syllabic structure of Basque words: number of letters, number of syllables, and frequency of patterns such as CV, VV, VC, and so forth.
  • Similar words: words that differ by an added letter, a removed letter, transposed letters, and so forth.
  • Repeated syllables, groupings of two or three letters, and their location in the word.
  • Morphology of each lemma, its frequency, its grammatical category, and so forth.
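
As an illustration of the structural patterns listed above, the following Python sketch derives a CV pattern, lists letter groupings with their positions, and tests whether two strings are similar. It is not the application's own code. The function names are hypothetical, the vowel set is assumed to be a, e, i, o, u, each letter is classified on its own (so digraphs such as tx or tz count as two consonants), and similarity is limited to one added letter, one removed letter, or one pair of adjacent transposed letters.

```python
VOWELS = set("aeiou")  # assumption: plain five-vowel inventory

def cv_pattern(word):
    """Map each letter to C or V, letter by letter (digraphs count as CC)."""
    return "".join("V" if ch in VOWELS else "C" for ch in word.lower())

def ngram_positions(word, n):
    """Return (grouping, start index) pairs for every run of n letters."""
    w = word.lower()
    return [(w[i:i + n], i) for i in range(len(w) - n + 1)]

def is_similar(a, b):
    """True if b is a with one letter added, one letter removed,
    or two adjacent letters transposed."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: only an adjacent transposition qualifies.
        diffs = [i for i in range(len(a)) if a[i] != b[i]]
        return (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and a[diffs[0]] == b[diffs[1]]
                and a[diffs[1]] == b[diffs[0]])
    # Lengths differ by one: dropping one letter from the longer
    # string must give the shorter string.
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

print(cv_pattern("etxe"))          # VCCV
print(ngram_positions("etxe", 2))  # [('et', 0), ('tx', 1), ('xe', 2)]
print(is_similar("ur", "ura"))     # True
```

The checks work on plain strings and do not consult any word list, so a real search would still need to confirm that each candidate is an attested lemma.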

The database is drawn from the corpus Ereduzko prosa Gaur (EPG). Only common Basque words, that is to say, true Basque lemmas, have been included. Once proper names, words in other languages, and errors are left out, 22.7 million of the 25.1 million words in the corpus are included in this database.

There are three possible types of searches.

  • A data search: general information from the database.
  • A word search based on criteria: the user selects the criteria that limit the search, and the database returns lists of words matching those criteria.
  • A data search based on words: the user provides a list of words (or a piece of text) in a file, and the application will analyze each word.
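
The third type of search takes a word list or a piece of text as input. The Python sketch below shows only the first step such an analysis might take: splitting the text into words and reporting a few surface facts about each one. The function name and the output fields are hypothetical, and the counts come from the supplied text alone, whereas the application reports frequencies from its EPG-based database.

```python
import re
from collections import Counter

def analyze_text(text):
    """Split text into lowercase words and describe each distinct word."""
    # [^\W\d_]+ matches runs of Unicode letters, so letters such as ñ are kept.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)  # keeps first-seen order of distinct words
    return [
        {"word": w, "letters": len(w), "count_in_input": c}
        for w, c in counts.items()
    ]

for row in analyze_text("Etxe bat, etxe handi bat."):
    print(row)
```

Hyphenated forms are split at the hyphen in this sketch, which may or may not match how the application tokenizes its input.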