jautils module

Utility functions specific to the Japanese language.

jautils.get_additional_tokens(tokens)

Generates new tokens by combining tokens and converting them to various character representations, which can be used as search index tokens.

Args:
tokens: a list or set of unicode strings to expand from.
Returns:
A set of newly generated tokens to add to the search index.

jautils.hiragana_to_romaji(string)

Replaces each occurrence of hiragana in a unicode string with the corresponding romaji.

Args:
string: a unicode string, possibly containing hiragana characters.
Returns:
The replaced string.
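A minimal sketch of one way such a conversion could work, not the module's actual implementation. The function name hiragana_to_romaji_sketch and the tiny Hepburn-style table are hypothetical and cover only a few characters; the lookup tries two-character combinations before single characters and leaves everything else untouched.

```python
# Hypothetical, deliberately incomplete table (Hepburn-style spellings).
# The real jautils module may use a different table and romanization scheme.
ROMAJI_TABLE = {
    'きゃ': 'kya', 'しゃ': 'sha', 'ちゃ': 'cha',
    'あ': 'a', 'い': 'i', 'う': 'u', 'え': 'e', 'お': 'o',
    'か': 'ka', 'き': 'ki', 'く': 'ku', 'け': 'ke', 'こ': 'ko',
    'た': 'ta', 'な': 'na', 'ま': 'ma', 'や': 'ya', 'ろ': 'ro',
}


def hiragana_to_romaji_sketch(string):
    """Replaces hiragana found in ROMAJI_TABLE; other characters pass through."""
    result = []
    i = 0
    while i < len(string):
        pair = string[i:i + 2]
        if len(pair) == 2 and pair in ROMAJI_TABLE:
            # Prefer the longer match so 'きゃ' becomes 'kya', not 'ki' + 'ゃ'.
            result.append(ROMAJI_TABLE[pair])
            i += 2
        elif string[i] in ROMAJI_TABLE:
            result.append(ROMAJI_TABLE[string[i]])
            i += 1
        else:
            # Not in the table: keep the character as-is.
            result.append(string[i])
            i += 1
    return ''.join(result)
```

With this table, hiragana_to_romaji_sketch('たなか Taro') yields 'tanaka Taro': the non-hiragana part is left unchanged.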

jautils.is_hiragana(string)

Returns True if the argument is a non-empty string of only hiragana characters.
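
The check described above could be sketched as follows. This is an illustration under stated assumptions, not the module's code: the name is_hiragana_sketch is hypothetical, "hiragana" is assumed to mean the letters U+3041 through U+3096, and the sketch targets Python 3, where every str is a Unicode string.

```python
import re

# Assumption: hiragana means the Unicode letters U+3041..U+3096.
# The real jautils.is_hiragana() may accept a different range.
_HIRAGANA_RE = re.compile(r'[\u3041-\u3096]+')


def is_hiragana_sketch(string):
    """Returns True for a non-empty string made only of hiragana letters."""
    # fullmatch requires the whole string to match; '+' rejects the empty string.
    return _HIRAGANA_RE.fullmatch(string) is not None
```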

jautils.katakana_to_hiragana(string)

Replaces each occurrence of katakana in a unicode string with the corresponding hiragana.

Args:
string: a unicode string, possibly containing katakana characters.
Returns:
The replaced string.
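One common way to perform this kind of conversion relies on the Unicode layout, where the katakana letters U+30A1 through U+30F6 sit exactly 0x60 code points above their hiragana counterparts. The sketch below uses that offset; the name katakana_to_hiragana_sketch is hypothetical, and the module itself may handle more cases (for example half-width katakana), which this sketch does not.

```python
KATAKANA_START = 0x30A1  # ァ
KATAKANA_END = 0x30F6    # ヶ
KATAKANA_TO_HIRAGANA_OFFSET = 0x60


def katakana_to_hiragana_sketch(string):
    """Shifts full-width katakana letters down to the hiragana block."""
    return ''.join(
        chr(ord(ch) - KATAKANA_TO_HIRAGANA_OFFSET)
        if KATAKANA_START <= ord(ch) <= KATAKANA_END
        else ch  # Anything else, including the long-vowel mark ー, is kept.
        for ch in string)
```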

jautils.normalize(string)

Normalizes the string using Japanese-specific logic.

Args:
string: a unicode string to normalize.
Returns:
A unicode string obtained by normalizing the input string.

jautils.normalize_hiragana(string)

Normalizes hiragana characters to absorb confusing spelling variations.

Args:
string: a unicode string, possibly containing hiragana characters.
Returns:
The normalized string.
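The documentation does not say which spelling variations are absorbed. As an illustration only, the sketch below folds a few hiragana that are easily confused in spelling onto a single form. The name normalize_hiragana_sketch and the chosen character pairs are assumptions, not the module's actual rules.

```python
# Assumed variant pairs, chosen for illustration only.
# The real jautils.normalize_hiragana() may use a different rule set.
_VARIANTS = str.maketrans({
    'ぢ': 'じ',  # both commonly romanized as "ji"
    'づ': 'ず',  # both commonly romanized as "zu"
    'ゐ': 'い',  # historical kana
    'ゑ': 'え',  # historical kana
})


def normalize_hiragana_sketch(string):
    """Maps each assumed variant to its canonical form; other text is kept."""
    return string.translate(_VARIANTS)
```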

jautils.should_normalize(string)

Checks if the string should be normalized by jautils.normalize() as opposed to text_query.normalize().

Args:
string: a unicode string to check.
Returns:
True if the string should be normalized by jautils.normalize().

jautils.sorted_by_popularity(tokens)

Sorts tokens by popularity (see NAME_CHAR_POPULARITY_MAP) so that tokens that are less popular in Japanese names come first, and returns the sorted tokens.

Args:
tokens: tokens to sort.
Returns:
Sorted tokens.
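
The ordering described above can be sketched with an ascending sort on a popularity score. The contents and structure of NAME_CHAR_POPULARITY_MAP are not documented here, so the sketch uses a hypothetical stand-in map with made-up scores, treats tokens missing from the map as having score 0, and uses the hypothetical name sorted_by_popularity_sketch.

```python
# Hypothetical stand-in for NAME_CHAR_POPULARITY_MAP; the scores are made up.
HYPOTHETICAL_POPULARITY_MAP = {'田': 900, '中': 700, '鷲': 5}


def sorted_by_popularity_sketch(tokens, popularity=HYPOTHETICAL_POPULARITY_MAP):
    """Returns tokens as a list, least popular first."""
    # Assumption: a token absent from the map counts as least popular (score 0).
    return sorted(tokens, key=lambda token: popularity.get(token, 0))
```

Putting rarer tokens first is useful when the result drives a search: a rare token narrows the candidate set more than a common one does.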