Gap Buffer based AbstractStringBuilder implementation
gokdogan at gmail.com
Thu Dec 3 15:28:43 UTC 2009
On Tue, Dec 1, 2009 at 2:16 AM, Martin Buchholz <martinrb at google.com> wrote:
> On Thu, Nov 26, 2009 at 00:57, Goktug Gokdogan <gokdogan at gmail.com> wrote:
> > I think, we can overcome the slowdown. The point of my prototype is to
> > the general performance characteristics. Slowdown is more likely due to
> > poorly optimized extra method call to keep all logic in one place.
> > the gap buffer algorithm should add only one comparison overhead to the
> > common case which will not to be observable in benchmarks.
> I am coming to like the gap buffer idea.
> You can optimize for append mode - append methods do not have to
> check for the existence or length of an internal gap.
> An extra comparison overhead is likely to be cheap, but it is nevertheless
> likely to be a measurable slowdown.
> But as I said, it more and more seems to me the right thing to do.
> If done right, StringBuilder can start to strongly resemble an Emacs
> and be useful for largish amounts of data with internal modifications.
> > I have previously thought about implementing a similar behavior in other
> > array-based growing structures, but it is not worth it. You can easily
> > your previous data structures for pre-appending algorithms - just by
> > appending to end and iterating from reverse. But you can't do that in a
> > StringBuilder because StringBuilder itself is the composed data. So, in
> > opinion, ArrayList is not a good analogy for this case.
> > StringBuilder, as its name suggests, is for building strings and building
> > string by appending to end is only one of the ways of doing it.
> > I think we should go for this if we can do it without an observable
> > slowdown.
> I didn't understand this argument. StringBuilder and ArrayList still
> seem to me conceptually similar.
Let me try to explain my argument in another way.
StringBuilder, similar to the other builder pattern implementations, is for
constructing an object, not to collect some data. When append/insert/delete
etc. called on a StringBuilder object, the goal is to shape the final state
of the resulting String. At the end, you call toString method and you are
But ArrayList is more flexible in a way that you are able to reinterpret the
result. For example, you can iterate the items in reverse order or skip some
of them etc.. So, with this point of view, I believe that StringBuilder and
ArrayList are conceptually different.
While building a String with StringBuilder, as it is the builder class for
String, I expect it to provide good performance for different building
scenarios. If I was designing it in the first place, I would start with an
implementation that performs well in general (e.g. gap buffer), then I would
check the performance for the common case (ie. appending to end) . If it
does not perform well and can not be optimized, then I would consider adding
another builder class to API that performs well on the common case before
replacing the original.
But in our case, as the previous codes already depends on the current
performance characteristics of StringBuilder, I totally agree that we are
good to replace it, iff it still performs as good as the previous one. This
is why I suggested the change in the first place.
> > On Thu, Nov 26, 2009 at 1:02 AM, Martin Buchholz <martinrb at google.com>
> > wrote:
> >> On Wed, Nov 25, 2009 at 21:24, Martin Buchholz <martinrb at google.com>
> >> wrote:
> >> > On Mon, Nov 23, 2009 at 22:51, Goktug Gokdogan <gokdogan at gmail.com>
> >> > wrote:
> >> >> Nobody is interested or everybody is busy?
> >> >
> >> > I think there's a place for a StringBuilder-like
> >> > abstraction that uses a gap buffer,
> >> > but it shouldn't replace StringBuilder.
> >> Let me qualify that.
> >> It is hard to sell the small slowdown in the common case
> >> to make other (rare?) operations much faster.
> >> ArrayList should have been implemented to allow
> >> O(1) insert at both ends, like ArrayDeque,
> >> but it is hard to persuade the maintainers
> >> that such a change is worth making today,
> >> when the benchmarks all exercise only the common case.
> >> Similarily for a hypothetical GapArrayList.
> >> On the other hand, on modern cpus
> >> arithmetic is ever closer to being "free",
> >> so it is easier to justify the extra computation.
> >> Martin
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the core-libs-dev