public class WordIterator extends Object implements Selection.PositionIterator
Uses BreakIterator#getWordInstance() internally and caches the CharSequence
for performance reasons.
Also provides methods to determine word boundaries.
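As a rough illustration of the boundaries this class is built on, the sketch below walks a string with the standard java.text.BreakIterator word instance and collects each boundary offset. It is not the WordIterator implementation; the class name WordBoundaryWalk and the helper boundaries are hypothetical names chosen for this example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBoundaryWalk {
    /** Collects every boundary offset reported by a word BreakIterator (hypothetical helper). */
    static List<Integer> boundaries(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        // first() returns the first boundary; next() returns DONE when the text is exhausted.
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(boundaries("ab cd", Locale.US));
    }
}
```

Note that a word BreakIterator reports boundaries around whitespace and punctuation as well as around words, which is why this class adds methods that decide whether a boundary actually belongs to a word.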
DONE
Constructor and Description |
---|
WordIterator()
Constructs a WordIterator using the default locale.
|
WordIterator(Locale locale)
Constructs a new WordIterator for the specified locale.
|
Modifier and Type | Method and Description |
---|---|
int |
following(int offset) |
int |
getBeginning(int offset)
If
offset is within a word, returns the index of the first character of that
word, otherwise returns BreakIterator.DONE. |
int |
getEnd(int offset)
If
offset is within a word, returns the index of the last character of that
word plus one, otherwise returns BreakIterator.DONE. |
int |
getNextWordEndOnTwoWordBoundary(int offset)
If the
offset is within a word or on a word boundary that can only be
considered the end of a word (e.g. word_ where "_" is any character that would not
be considered part of the word) then this returns the index of the last character
plus one of that word. |
int |
getPrevWordBeginningOnTwoWordsBoundary(int offset)
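If the
offset is within a word or on a word boundary that can only be
considered the start of a word (e.g. _word where "_" is any character that would not
be considered part of the word) then this returns the index of the first character of
that word. |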
int |
getPunctuationBeginning(int offset)
If
offset is within a group of punctuation as defined
by isPunctuation(int) , returns the index of the first character
of that group, otherwise returns BreakIterator.DONE. |
int |
getPunctuationEnd(int offset)
If
offset is within a group of punctuation as defined
by isPunctuation(int) , returns the index of the last character
of that group plus one, otherwise returns BreakIterator.DONE. |
boolean |
isAfterPunctuation(int offset)
Indicates if the provided offset is after a punctuation character
as defined by
isPunctuation(int) . |
boolean |
isBoundary(int offset) |
boolean |
isOnPunctuation(int offset)
Indicates if the provided offset is at a punctuation character
as defined by
isPunctuation(int) . |
int |
nextBoundary(int offset)
Returns the position of the next boundary after the given offset.
|
int |
preceding(int offset) |
int |
prevBoundary(int offset)
Returns the position of the boundary preceding the given offset, or
DONE if the given offset specifies the starting position. |
void |
setCharSequence(CharSequence charSequence,
int start,
int end) |
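The getBeginning and getEnd contracts summarized in the table can be sketched on top of java.text.BreakIterator as follows. This is a minimal approximation under stated assumptions, not the class's actual code: the class name WordSpanSketch and its private helpers are hypothetical, and a "word character" is assumed to be a letter or digit.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class WordSpanSketch {
    private final String text;
    private final BreakIterator it;

    WordSpanSketch(String text, Locale locale) {
        this.text = text;
        this.it = BreakIterator.getWordInstance(locale);
        this.it.setText(text);
    }

    // Assumption: a code point belongs to a word if it is a letter or digit.
    private boolean onWordChar(int offset) {
        return offset < text.length() && Character.isLetterOrDigit(text.codePointAt(offset));
    }

    private boolean afterWordChar(int offset) {
        return offset > 0 && Character.isLetterOrDigit(text.codePointBefore(offset));
    }

    private void check(int offset) {
        // Valid range is [0..textLength], upper bound inclusive.
        if (offset < 0 || offset > text.length()) {
            throw new IllegalArgumentException("Invalid offset: " + offset);
        }
    }

    /** Index of the first character of the word containing offset, or DONE. */
    int getBeginning(int offset) {
        check(offset);
        if (onWordChar(offset)) {
            return it.isBoundary(offset) ? offset : it.preceding(offset);
        }
        // The index just past a word's last character still counts as part of that word.
        return afterWordChar(offset) ? it.preceding(offset) : BreakIterator.DONE;
    }

    /** Index of the last character of the word containing offset, plus one, or DONE. */
    int getEnd(int offset) {
        check(offset);
        if (afterWordChar(offset)) {
            return it.isBoundary(offset) ? offset : it.following(offset);
        }
        return onWordChar(offset) ? it.following(offset) : BreakIterator.DONE;
    }

    public static void main(String[] args) {
        WordSpanSketch w = new WordSpanSketch("ab  cd", Locale.US);
        System.out.println(w.getBeginning(1) + " " + w.getEnd(1) + " " + w.getBeginning(3));
    }
}
```

In the sample text, offset 3 sits between two spaces, so neither method can attribute it to a word and both return BreakIterator.DONE.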
public WordIterator()
public WordIterator(Locale locale)
Parameters:
locale - The locale to be used when analysing the text.
public void setCharSequence(CharSequence charSequence, int start, int end)
public int preceding(int offset)
Specified by:
preceding in interface Selection.PositionIterator
public int following(int offset)
Specified by:
following in interface Selection.PositionIterator
public boolean isBoundary(int offset)
public int nextBoundary(int offset)
Returns the position of the next boundary after the given offset. Returns DONE if there is no boundary after the given offset.
Parameters:
offset - the given start position to search from.
public int prevBoundary(int offset)
Returns the position of the boundary preceding the given offset, or DONE if the given offset specifies the starting position.
Parameters:
offset - the given start position to search from.
public int getBeginning(int offset)
If offset is within a word, returns the index of the first character of that
word, otherwise returns BreakIterator.DONE.
The offsets that are considered to be part of a word are the indexes of its characters,
as well as the index of its last character plus one.
If offset is the index of a low surrogate character, BreakIterator.DONE will be returned.
Valid range for offset is [0..textLength] (note the inclusive upper bound).
The returned value is within [0..offset] or BreakIterator.DONE.
Throws:
IllegalArgumentException - if offset is not valid.
public int getEnd(int offset)
If offset is within a word, returns the index of the last character of that
word plus one, otherwise returns BreakIterator.DONE.
The offsets that are considered to be part of a word are the indexes of its characters,
as well as the index of its last character plus one.
If offset is the index of a low surrogate character, BreakIterator.DONE will be returned.
Valid range for offset is [0..textLength] (note the inclusive upper bound).
The returned value is within [offset..textLength] or BreakIterator.DONE.
Throws:
IllegalArgumentException - if offset is not valid.
public int getPrevWordBeginningOnTwoWordsBoundary(int offset)
If the offset is within a word or on a word boundary that can only be
considered the start of a word (e.g. _word where "_" is any character that would not
be considered part of the word) then this returns the index of the first character of
that word.
If the offset is on a word boundary that can be considered the start and end of a
word, e.g. AABB (where AA and BB are both words) and the offset is the boundary
between AA and BB, this would return the start of the previous word, AA.
Returns BreakIterator.DONE if there is no previous boundary.
Throws:
IllegalArgumentException - if offset is not valid.
public int getNextWordEndOnTwoWordBoundary(int offset)
If the offset is within a word or on a word boundary that can only be
considered the end of a word (e.g. word_ where "_" is any character that would not
be considered part of the word) then this returns the index of the last character
plus one of that word.
If the offset is on a word boundary that can be considered the start and end of a
word, e.g. AABB (where AA and BB are both words) and the offset is the boundary
between AA and BB, this would return the end of the next word, BB.
Returns BreakIterator.DONE if there is no next boundary.
Throws:
IllegalArgumentException - if offset is not valid.
public int getPunctuationBeginning(int offset)
If offset is within a group of punctuation as defined
by isPunctuation(int), returns the index of the first character
of that group, otherwise returns BreakIterator.DONE.
Parameters:
offset - the offset to search from.
public int getPunctuationEnd(int offset)
If offset is within a group of punctuation as defined
by isPunctuation(int), returns the index of the last character
of that group plus one, otherwise returns BreakIterator.DONE.
Parameters:
offset - the offset to search from.
public boolean isAfterPunctuation(int offset)
Indicates if the provided offset is after a punctuation character
as defined by isPunctuation(int).
Parameters:
offset - the offset to check from.
public boolean isOnPunctuation(int offset)
Indicates if the provided offset is at a punctuation character
as defined by isPunctuation(int).
Parameters:
offset - the offset to check from.
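The punctuation methods can be illustrated with a plain code point scan. The sketch below follows the documented contract only and makes several assumptions that this page does not confirm: isPunctuation is taken to mean the Unicode punctuation general categories, an offset is treated as "within" a group when it is on or directly after a punctuation character, and the class name PunctuationGroupSketch is hypothetical. The real class may locate groups differently.

```java
public class PunctuationGroupSketch {
    static final int DONE = -1; // same value as java.text.BreakIterator.DONE

    // Assumption: punctuation means any Unicode punctuation general category.
    static boolean isPunctuation(int cp) {
        switch (Character.getType(cp)) {
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }

    static boolean isOnPunctuation(CharSequence text, int offset) {
        return offset >= 0 && offset < text.length()
                && isPunctuation(Character.codePointAt(text, offset));
    }

    static boolean isAfterPunctuation(CharSequence text, int offset) {
        return offset > 0 && offset <= text.length()
                && isPunctuation(Character.codePointBefore(text, offset));
    }

    /** Index of the first character of the punctuation group at offset, or DONE. */
    static int getPunctuationBeginning(CharSequence text, int offset) {
        if (!isOnPunctuation(text, offset) && !isAfterPunctuation(text, offset)) {
            return DONE;
        }
        int i = offset;
        while (isAfterPunctuation(text, i)) {
            i -= Character.charCount(Character.codePointBefore(text, i));
        }
        return i;
    }

    /** Index of the last character of the punctuation group at offset, plus one, or DONE. */
    static int getPunctuationEnd(CharSequence text, int offset) {
        if (!isOnPunctuation(text, offset) && !isAfterPunctuation(text, offset)) {
            return DONE;
        }
        int i = offset;
        while (isOnPunctuation(text, i)) {
            i += Character.charCount(Character.codePointAt(text, i));
        }
        return i;
    }

    public static void main(String[] args) {
        String s = "Wait... what?!";
        System.out.println(getPunctuationBeginning(s, 5) + " " + getPunctuationEnd(s, 5));
    }
}
```

Stepping by Character.charCount keeps the scan aligned on whole code points, so supplementary characters are never split across their surrogate pair.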