jautils module
Utility functions specific to the Japanese language.
-
jautils.get_additional_tokens(tokens) Generates new tokens by combining the given tokens and converting them to various character representations. The generated tokens can be used as additional search index tokens.
- Args:
- tokens: a list or set of unicode strings to expand from.
- Returns:
- A set of newly generated tokens to add to the search index.
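
The section does not say which combinations or conversions get_additional_tokens performs, so the following is only a rough sketch of the general idea. It assumes two hypothetical rules: every ordered pair of distinct tokens is concatenated, and katakana is converted to hiragana. The name get_additional_tokens_sketch is illustrative and not part of jautils.

```python
from itertools import permutations

# Assumption: katakana U+30A1..U+30F6 sit exactly 0x60 above their hiragana
# counterparts, which is true of the Unicode code charts.
_KATAKANA_TO_HIRAGANA = {cp: cp - 0x60 for cp in range(0x30A1, 0x30F7)}


def get_additional_tokens_sketch(tokens):
    """Return extra index tokens derived from `tokens` (hypothetical rules)."""
    tokens = set(tokens)
    candidates = set()

    # Rule 1 (assumed): join every ordered pair of distinct tokens.
    for first, second in permutations(tokens, 2):
        candidates.add(first + second)

    # Rule 2 (assumed): add a hiragana rendering of every token seen so far.
    for token in tokens | set(candidates):
        candidates.add(token.translate(_KATAKANA_TO_HIRAGANA))

    # Only report tokens that were not already in the input.
    return candidates - tokens


extra = get_additional_tokens_sketch(['ヤマダ', 'タロウ'])
```

With the two katakana tokens above, the sketch yields both concatenations plus the hiragana form of each token and each concatenation, so a query typed in either script could match the same index entry.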
-
jautils.hiragana_to_romaji(string) Replaces each hiragana character in a unicode string with its romaji equivalent.
- Args:
- string: a unicode string, possibly containing hiragana characters.
- Returns:
- The replaced string.
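
As an illustration of this kind of replacement, here is a minimal sketch using a deliberately tiny lookup table. The table contents, the Hepburn-style spellings, and the name hiragana_to_romaji_sketch are assumptions for the example; the romanization rules jautils actually applies are not stated in this section.

```python
# A deliberately small, hypothetical table; a real one covers all kana.
_ROMAJI = {
    'きょ': 'kyo', 'しゃ': 'sha',
    'あ': 'a', 'い': 'i', 'う': 'u', 'え': 'e', 'お': 'o',
    'か': 'ka', 'き': 'ki', 'く': 'ku', 'け': 'ke', 'こ': 'ko',
    'さ': 'sa', 'し': 'shi', 'す': 'su', 'せ': 'se', 'そ': 'so',
    'と': 'to', 'ん': 'n',
}


def hiragana_to_romaji_sketch(string):
    """Replace known hiragana with romaji, leaving other characters as-is."""
    result = []
    i = 0
    while i < len(string):
        pair = string[i:i + 2]
        if len(pair) == 2 and pair in _ROMAJI:
            # Prefer two-character digraphs such as 'きょ' over single kana.
            result.append(_ROMAJI[pair])
            i += 2
        elif string[i] in _ROMAJI:
            result.append(_ROMAJI[string[i]])
            i += 1
        else:
            # Not hiragana we know about: copy through unchanged.
            result.append(string[i])
            i += 1
    return ''.join(result)


print(hiragana_to_romaji_sketch('とうきょう'))  # toukyou
```

Trying the two-character match first matters because a digraph like きょ would otherwise be read as き followed by an unmapped small ょ.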
-
jautils.is_hiragana(string) Returns True if the argument is a non-empty string of only hiragana characters.
-
jautils.katakana_to_hiragana(string) Replaces each katakana character in a unicode string with the corresponding hiragana character.
- Args:
- string: a unicode string, possibly containing katakana characters.
- Returns:
- The replaced string.
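
One common way to implement such a conversion relies on the Unicode layout, where katakana U+30A1 to U+30F6 sit exactly 0x60 above the matching hiragana. The sketch below uses that offset; whether jautils does the same is an assumption, and katakana_to_hiragana_sketch is an illustrative name.

```python
# Map each katakana code point to the hiragana code point 0x60 below it.
_KATAKANA_TO_HIRAGANA = {cp: cp - 0x60 for cp in range(0x30A1, 0x30F7)}


def katakana_to_hiragana_sketch(string):
    """Replace katakana with hiragana; all other characters pass through."""
    return string.translate(_KATAKANA_TO_HIRAGANA)


print(katakana_to_hiragana_sketch('カタカナ'))  # かたかな
```

Characters outside the mapped range, such as the prolonged sound mark ー or Latin letters, are left unchanged by this sketch.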
-
jautils.normalize(string) Normalizes the string using Japanese-specific logic.
- Args:
- string: a unicode string to normalize.
- Returns:
- A unicode string obtained by normalizing the input string.
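
The exact normalization steps are not described here. As a hedged sketch of what Japanese-specific normalization often involves, the example below applies Unicode NFKC (which folds full-width Latin letters and half-width katakana into their standard forms) and trims surrounding whitespace. Both steps and the name normalize_sketch are assumptions, not the documented behavior of jautils.normalize.

```python
import unicodedata


def normalize_sketch(string):
    """Fold width variants with NFKC, then strip surrounding whitespace."""
    # NFKC maps full-width 'Ａ' to 'A' and half-width 'ｶ' to 'カ'.
    folded = unicodedata.normalize('NFKC', string)
    return folded.strip()


print(normalize_sketch('ＡＢＣ１２３'))  # ABC123
```

Folding width variants first means later steps, such as a katakana-to-hiragana conversion, only have to handle one form of each character.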
-
jautils.normalize_hiragana(string) Normalizes hiragana characters to absorb confusing spelling variations.
- Args:
- string: a unicode string, possibly containing hiragana characters.
- Returns:
- The normalized string.
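
Which spelling variations are absorbed is not listed in this section. Purely as an illustration, the sketch below collapses a few hiragana pairs that are pronounced alike in modern Japanese (ぢ and じ, づ and ず, を and お). The choice of pairs is an assumption, and normalize_hiragana_sketch is a hypothetical name.

```python
# Hypothetical mapping of homophonous hiragana to a single canonical form.
_VARIANTS = str.maketrans({
    'ぢ': 'じ',
    'づ': 'ず',
    'を': 'お',
})


def normalize_hiragana_sketch(string):
    """Collapse assumed spelling variants so equivalent spellings compare equal."""
    return string.translate(_VARIANTS)


print(normalize_hiragana_sketch('つづく'))  # つずく
```

Applying the same mapping to both indexed text and queries lets either spelling of a word match the other.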