convert: avoid malloc of original file size
authorJoey Hess <id@joeyh.name>
Thu, 7 Mar 2019 19:56:57 +0000 (14:56 -0500)
committerJunio C Hamano <gitster@pobox.com>
Fri, 8 Mar 2019 01:13:00 +0000 (10:13 +0900)
We write the output of a "clean" filter into a strbuf. Rather than
growing the strbuf dynamically as we read its output, we make the
initial allocation as large as the original input file. This is a good
guess when the filter is just tweaking a few bytes, but it's disastrous
when the point of the filter is to condense a very large file into a
short identifier (e.g., the way git-lfs and git-annex do). We may ask to
allocate many gigabytes, causing the allocation to fail and Git to
die().

Instead, let's just let strbuf do its usual growth.

When the clean filter does output something around the same size as the
worktree file, the buffer will need to be reallocated until it fits,
starting at 8192 and doubling in size. Benchmarking indicates that
reallocation is not a significant overhead for outputs up to a
few MB in size.

Signed-off-by: Joey Hess <id@joeyh.name>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
convert.c
index e0848226d21bbbeb26d9ac6217fe1e61ccc71fca..bdaec3411dfb93b3da6b78a0fac625d1fd3d4733 100644 (file)
--- a/convert.c
+++ b/convert.c
@@ -732,7 +732,7 @@ static int apply_single_file_filter(const char *path, const char *src, size_t le
        if (start_async(&async))
                return 0;       /* error was already reported */
 
-       if (strbuf_read(&nbuf, async.out, len) < 0) {
+       if (strbuf_read(&nbuf, async.out, 0) < 0) {
                err = error(_("read from external filter '%s' failed"), cmd);
        }
        if (close(async.out)) {