unicode::wordbreak_callback_base, unicode::wordbreak — unicode word-breaking rules
#include <courier-unicode.h>

class wordbreak : public unicode::wordbreak_callback_base {

public:

    using unicode::wordbreak_callback_base::operator<<;
    using unicode::wordbreak_callback_base::operator();

    int callback(bool flag)
    {
        // ...
    }
};

char32_t c;
std::u32string buf;

wordbreak compute_wordbreak;

compute_wordbreak << c;

compute_wordbreak(buf);
compute_wordbreak(buf.begin(), buf.end());

compute_wordbreak.finish();

// ...

unicode::wordbreakscan scan;

scan << c;

size_t nchars=scan.finish();

unicode::wordbreak_callback_base is a C++ binding for the Unicode
word-breaking rule implementation described in unicode_word_break(3).
Subclass unicode::wordbreak_callback_base and override its virtual
callback() method. callback() receives the output of the
word-breaking algorithm: a bool indicating whether a word break
exists before the corresponding Unicode character in the underlying
input sequence. callback() should return 0. A non-zero return value
reports an error and stops the word-breaking algorithm. See
unicode_word_break(3) for more information.
The input Unicode characters for the word-breaking algorithm are
provided by the << operator, one Unicode character at a time, or by
the () operator, passing either a container or a beginning and an
ending iterator value for an input sequence of Unicode characters.
finish() indicates the end of the Unicode character sequence.
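
The subclassing pattern described above can be sketched as follows. This is only an illustration, not part of the library: flag_collector, word_break_flags and split_at_breaks are hypothetical names, and the sketch assumes that callback() is invoked once per input character, as unicode_word_break(3) describes. The __has_include guard (C++17) is also an assumption of this sketch; it only lets the library-independent helper be compiled on its own.

```cpp
#include <cstddef>
#include <string>
#include <vector>

#if __has_include(<courier-unicode.h>)
#include <courier-unicode.h>

// Hypothetical subclass: records every flag passed to callback().
class flag_collector : public unicode::wordbreak_callback_base {
public:
    using unicode::wordbreak_callback_base::operator<<;
    using unicode::wordbreak_callback_base::operator();

    std::vector<bool> flags;

    int callback(bool flag) override
    {
        flags.push_back(flag);
        return 0; // zero: no error, keep processing
    }
};

// Hypothetical helper: runs the word-breaking algorithm over text.
inline std::vector<bool> word_break_flags(const std::u32string &text)
{
    flag_collector collector;

    collector(text);    // same as collector(text.begin(), text.end())
    collector.finish(); // marks the end of the input sequence
    return collector.flags;
}
#endif

// Hypothetical helper, independent of the library: starts a new piece
// wherever flags[i] is true, that is, wherever a break exists before
// text[i]. Positions without a recorded flag are treated as "no break".
std::vector<std::u32string> split_at_breaks(const std::u32string &text,
                                            const std::vector<bool> &flags)
{
    std::vector<std::u32string> pieces;

    for (std::size_t i = 0; i < text.size(); ++i) {
        bool brk = i < flags.size() && flags[i];

        if (pieces.empty() || brk)
            pieces.emplace_back();
        pieces.back().push_back(text[i]);
    }
    return pieces;
}
```

With the library available, split_at_breaks(text, word_break_flags(text)) would yield the pieces of text delimited by word breaks. Keeping the splitting step separate from the callback keeps callback() trivial, so it always returns 0.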
unicode::wordbreakscan is a C++ binding for the
unicode_wbscan_init(), unicode_wbscan_next() and
unicode_wbscan_end() functions described in unicode_word_break(3).
Its << operator takes the Unicode characters one at a time, and
finish() returns the number of characters before the first Unicode
word break. The << operator returns a bool that indicates whether
the first word break has already been found, in which case further
calls are not necessary.
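
A minimal sketch of the scanning loop described above, stopping as soon as << reports that the first word break is known. scan_first_word and first_word are hypothetical names used only for this illustration. The loop is written as a template so that it works with any object offering the same << and finish() members as unicode::wordbreakscan; the __has_include guard (C++17) is an assumption of this sketch that lets the loop compile without the library.

```cpp
#include <cstddef>
#include <string>

#if __has_include(<courier-unicode.h>)
#include <courier-unicode.h>
#endif

// Hypothetical helper: feeds characters to the scanner until it reports
// that the first word break has been found, then asks for the count.
template<typename scanner_type>
std::size_t scan_first_word(scanner_type &scan, const std::u32string &text)
{
    for (char32_t c : text) {
        if (scan << c)
            break; // first word break already found, no need to go on
    }
    return scan.finish(); // characters before the first word break
}

#if __has_include(<courier-unicode.h>)
// Hypothetical convenience wrapper around unicode::wordbreakscan.
inline std::u32string first_word(const std::u32string &text)
{
    unicode::wordbreakscan scan;

    return text.substr(0, scan_first_word(scan, text));
}
#endif
```

Stopping early matters mostly for long inputs: once << returns true, the remaining characters cannot change the value that finish() returns.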