Non-English materials

Summary: Some strategies and tools for cataloging materials in non-English languages, with an emphasis on non-Western-European languages. 




Background

The two main challenges of non-English cataloging are:

  • achieving enough translation to allow for subject analysis; and 
  • achieving required transliteration of any transcribed non-Roman scripts

For Western European languages, OCLC searching and subject analysis are more easily within reach, even for non-speakers. Other languages, especially those in non-Roman scripts, typically require additional time and research, and the use of special strategies like OCR tools, online translators, and/or assistance from a native speaker. 

General tips

  • Google the ISBN of the resource; this can lead you to essential info like title and author in the published language and script (plus publisher summaries or book announcements to help with subject analysis)
  • Use the Google Translate extension in Google Chrome to translate web pages; this can further assist in researching the resource online

Character encoding & alternatives

  • While OCLC is Unicode compliant, LC and many other catalog systems cannot render certain Unicode characters. 
  • When working with diacritics and special characters, reference the LC PCC PS for RDA 1.4 (Language and Script), which provides alternative MARC-8/UTF-8 compliant characters for use in MARC records
    • specifically, the section "Special Letters, Diacritical Marks, and Punctuation Marks" 
    • this can apply to non-Roman scripts, but also to writing systems that are much closer to the Roman alphabet, like the one used for the Twi language
    • when alternative characters are needed, catalogers are encouraged to use variant titles to reflect both the original Unicode rendering and the alternative rendering
    • NOTE: this practice is inconsistently followed in OCLC, so when searching for copy, run queries for both the original character and the alternate character
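
As a rough aid for the copy-searching tip above, the following Python sketch lists the non-ASCII characters in a title and builds an alternate search string. The function names and the substitution table are hypothetical, and the replacements shown are placeholders only; the actual alternative characters must be taken from the LC PCC PS for RDA 1.4.

```python
import unicodedata

# Placeholder substitutions for illustration only (an assumption, not LC policy):
# look up the real alternatives in the LC PCC PS for RDA 1.4.
PLACEHOLDER_ALTERNATIVES = {
    "\u025b": "e",  # LATIN SMALL LETTER OPEN E, used in Twi
    "\u0254": "o",  # LATIN SMALL LETTER OPEN O, used in Twi
}


def list_special_characters(text):
    """Return (character, code point, Unicode name) for each distinct non-ASCII character."""
    seen = []
    for ch in text:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in seen]


def search_variants(text, alternatives=PLACEHOLDER_ALTERNATIVES):
    """Return the original string, plus a version with alternate characters if it differs."""
    alternate = "".join(alternatives.get(ch, ch) for ch in text)
    return [text] if alternate == text else [text, alternate]


print(list_special_characters("\u0254d\u0254"))
print(search_variants("\u0254d\u0254"))
```

The list of code points and names can be checked against the policy statement, and each string returned by search_variants can be run as a separate query when searching for copy.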

Publication info

  • In books where publication info and ISBNs do not appear on the t.p. verso, check the end of the book for a colophon
  • If place of publication isn't evident, supply a possible location based on the resource's language rather than using "[Place of publication not identified]"
  • Example: For a Bulgarian-language resource, use 264 _1 ‡a [Bulgaria?] : ‡b .....

Language codes

  • Lang fixed field
    • Only codes from the MARC code list for languages may be used in this field
    • The MARC code list does not always have a code for a specific language; in these cases, choose the most specific code available
      • Example: For the African language Tem, the most specific available MARC code is nic for the family of Niger-Kordofanian languages
  • 041 field
    • Both MARC codes and ISO 639-3 codes are valid for use in this field
    • ISO 639-3 codes may be applied using specific coding and following these PCC guidelines
      • Example: 041 07 ‡a [ISO code] ‡2 iso639-3
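
The 041 pattern above can be sketched in Python as follows. This is a minimal illustration, not an official tool: the function name is hypothetical, the indicators simply follow the example above, and the code kea (used here as the ISO 639-3 code for Kabuverdianu, i.e. Cape Verdean Creole) should be confirmed against the ISO 639-3 registry before use.

```python
import re


def build_041_iso(codes):
    """Format a 041 field whose codes come from ISO 639-3 (second indicator 7, source in ‡2)."""
    for code in codes:
        # ISO 639-3 identifiers are three lowercase ASCII letters.
        if not re.fullmatch(r"[a-z]{3}", code):
            raise ValueError(f"not a three-letter ISO 639-3 code: {code!r}")
    subfields = " ".join(f"‡a {code}" for code in codes)
    return f"041 07 {subfields} ‡2 iso639-3"


print(build_041_iso(["kea"]))  # 041 07 ‡a kea ‡2 iso639-3
```

The check only confirms the shape of the code; it does not verify that the code exists in the ISO 639-3 list.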

242 field for translated title

  • If you are able to determine or supply a reliable translation of the title, use the 242 field to include the translation and the language code

242 04 ‡a The Torah of Moses ‡y eng

  • An accompanying 500 note is also recommended, to say "Translated title supplied by cataloger" or similar language.
  • In general, online translation tools shouldn't be used for this purpose; translations should be from a trustworthy source like a language expert, accompanying info, etc.
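
The 242 example above can be assembled with a short Python sketch. The function name and article list are hypothetical; the first indicator is fixed at 0 to match the example, and the second indicator is computed only for English leading articles, so titles translated into other languages would need their own article lists.

```python
# Leading English articles to skip in filing; other languages need their own lists.
ENGLISH_ARTICLES = ("the ", "an ", "a ")


def build_242(translated_title, language_code="eng"):
    """Format a 242 field, setting the second indicator to the nonfiling character count."""
    lowered = translated_title.lower()
    nonfiling = 0
    for article in ENGLISH_ARTICLES:
        if lowered.startswith(article):
            nonfiling = len(article)  # article plus the following space
            break
    return f"242 0{nonfiling} ‡a {translated_title} ‡y {language_code}"


print(build_242("The Torah of Moses"))  # 242 04 ‡a The Torah of Moses ‡y eng
```

The translated title itself should still come from a trustworthy source, as noted above; the sketch only handles field formatting.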

5XX fields

  • 546 language note
    • these are useful additions because MARC language codes are not always granular enough to identify the specific language
      • example: language code Creole, actual language Cape Verdean Creole 
    • for resources in non-Roman scripts, use ‡b to indicate which script is in use
      • example: 

        546  In Wolof; ‡b Ajami script.

  • 520 summary note
    • catalogers are encouraged to include summaries as frequently and extensively as possible; they are especially helpful for non-English resources and can incorporate keyword-searchable terms that may not appear elsewhere in the record

Subject analysis

  • For materials in non-Roman scripts:
    • If there is not sufficient information for in-depth subject analysis, but otherwise the resource can be fully described, consider using the subject heading pattern Language ‡v Texts.

    • Example:

      650 _0 Wolof language ‡v Texts.

    • This will permit classification and some basic access at least.

  • Parallel subject terms in other languages are not required and generally should not be attempted




Transcription, transliteration, and translation tools


Online translators and keyboards

Use online translation tools with caution. Resulting translations cannot be considered fully reliable, so avoid making cataloging decisions based solely on the use of these tools. Also avoid quoting translated text from these tools in catalog records. Translation results are best used as one of several factors in cataloging treatment. 

  • www.stars21.com/translator 

    • Aggregate of online translator tools (Google, Yahoo, Microsoft, Yandex, and more)
    • Pop-up keyboards to create character strings in non-Roman scripts
    • Supplies accompanying translations
  • www.lexilogos.com/keyboard/index.htm
    • Keyboards for non-Roman scripts
    • Can supply transliteration, but not necessarily LC-compliant
    • Dictionaries for multiple languages
    • No translation available
  • www.translate.google.com
    • Can submit pictures for OCR translation (optical character recognition)
      • resulting OCRed text can be sent by email and copy/pasted into MARC records
      • if accurate, more efficient than typing out individual characters using one of the keyboards above
      • certain color combinations can cause some issues with the OCR function
    • The separate Google Drive and Google Docs apps can be used to transfer the OCRed text instead of emailing it to yourself
    • Staff without smartphones can check out iPads from Patron Services

Software

  • Microsoft Word
    • can render right-to-left script effectively
    • the available symbol and script character sets allow non-Roman text to be typed out
      • good font set for African languages!
  • Windows 10 touch keyboard
  • Connexion insert symbol
  • Sierra character map
  • MarcEdit

Transliteration tools

All transcribed non-Roman script information included in a record must be linked to a field containing a transliteration of that information in Romanized form. The Romanization must be constructed using the authorized Library of Congress tables. 

A note about these tables: Although they are considered authoritative by the Library of Congress, they often reflect older or outmoded Romanization schemes that are no longer commonly in use or viewed as accurate by language experts. Regardless, PCC requirements oblige catalogers to use only these tables when creating transliterated fields. Even if the resource includes a transliterated version of its title, you must still supply an additional transliteration using the LC tables.

Some automatic tools exist to help with transliteration, but use them with a critical eye, especially for languages that involve implied vowels, or vowel diacritics that transliteration tables cannot adequately record/interpret.

  • MarcEdit transliteration
    • Arabic
    • Hebrew
  • OCLC Connexion transliteration macros
    • Cyrillic



Specific language & script FYIs

Some languages have their own unique writing system, as in the case of Korean, which is written in the Hangul alphabet. In many other cases, different languages may share a writing system. For instance, Bulgarian, Russian, and Serbian languages share the Cyrillic alphabet. In these situations, some variations in characters and diacritics will occur, but in general, a primary set of characters is common throughout. 

Some languages may be represented in multiple scripts. Some African languages, like Wolof, might be rendered in either the Latin alphabet or in Ajami script, depending on the nature of the resource. 

Languages

Scripts

  • Semitic scripts (Arabic, Hebrew, Amharic, etc.)
    • Often employ implied or unwritten vowels, or vowels represented by small diacritics
  • Ajami script
    • The use of Arabic characters to represent non-Arabic languages, like Hausa, Swahili, and Wolof
  • N'ko script
    • A more recent writing system created for the Manding family of African languages
    • Read right to left like Arabic, Hebrew, etc.
    • Transliteration involves understanding vowel placement




Policies

Per PCC policy, any transcribed non-Roman script information must also be transliterated using appropriate LC romanization tables.


Contact: Joshua Barton or Autumn Faulkner
Team: CMS
Updated: November 2023
Created: September 2019