Question on String#indexOf(String)
madbot at gmail.com
Tue Apr 28 16:46:41 UTC 2009
The overhead of setting up one of the faster algorithms was only worth it
when the skip ahead is large and/or the data to search is large. So you need
heuristics to determine when the overhead is worth it. Those heuristics
would also have to be fast.
Of course you could also leave that decision to the client invoking the
search. They may know, for instance, if they will be repeatedly invoking the
search. If you're searching for a string with a good jump ahead in a lot of
data and/or you will be doing it repeatedly, it's a good idea to use
java.util.regex where you can get the boyer moore speed.
Remember, also, that once a large number of programmers get used to certain
performance tradeoffs it is painful to pull the rug out from underneath them
and change those tradeoffs.
Anyway, all that said, I don't think there's anything inherently wrong with
trying to apply some smarter algorithms. Just be prepared for a lot of
backlash because you made searching for fred in fredrick take longer.
On Tue, Apr 28, 2009 at 8:53 AM, Joshua Bloch <jjb at google.com> wrote:
> I vaguely recall that madbot (Mike McCloskey) did some performance work on
> this method, and came to the conclusion that more sophisticated algorithms
> didn't actually pay for themselves in the common cases. I'm copying him so
> he can confirm or deny.
> On Tue, Apr 28, 2009 at 3:50 AM, Jim Andreou <jim.andreou at gmail.com>wrote:
>> I wonder why String#indexOf(String) is implemented as it is. Apparently,
>> when a character mismatch with the searched pattern is found, the pattern is
>> only shifted by one character, but there are faster algorithms, for example
>> Was anything smarter tried out but had significant disadvantages for general
>> use? What advantages does the current implementation have? It looks very
>> Dimitris Andreou
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the core-libs-dev