Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.

Overview

The Diff Match and Patch libraries offer robust algorithms to perform the operations required for synchronizing plain text.

  1. Diff:
    • Compare two blocks of plain text and efficiently return a list of differences.
    • Diff Demo
  2. Match:
    • Given a search string, find its best fuzzy match in a block of plain text. Weighted for both accuracy and location.
    • Match Demo
  3. Patch:
    • Apply a list of patches onto plain text. Use best-effort to apply patch even when the underlying text doesn't match.
    • Patch Demo

Originally built in 2006 to power Google Docs, this library is now available in C++, C#, Dart, Java, JavaScript, Lua, Objective C, and Python.

Reference

Languages

Although each language port of Diff Match Patch uses the same API, there are some language-specific notes.

A standardized speed test tracks the relative performance of diffs in each language.

Algorithms

This library implements Myer's diff algorithm which is generally considered to be the best general-purpose diff. A layer of pre-diff speedups and post-diff cleanups surround the diff algorithm, improving both performance and output quality.

This library also implements a Bitap matching algorithm at the heart of a flexible matching and patching strategy.

Comments
  • [WIP] initial typescript conversion

    [WIP] initial typescript conversion

    Following on from #58, I've kicked off the conversion of diff-match-patch starting with @NeilFraser's starting point as the first commit. I've included the built copy for now, though I may strip that out and rebase it based on feedback.

    One interesting thing between Neil's built version and my built version:

    diff --git a/typescript/built/diff_match_patch.js b/typescript/built/diff_match_patch.js
    index 8fbb030..dafa48a 100644
    --- a/typescript/built/diff_match_patch.js
    +++ b/typescript/built/diff_match_patch.js
    @@ -387,7 +387,7 @@ var diff_match_patch = /** @class */ (function () {
          */
         diff_match_patch.prototype.diff_linesToChars_ = function (text1, text2) {
             var lineArray = []; // e.g. lineArray[4] == 'Hello\n'
    -        var lineHash = {}; // e.g. lineHash['Hello\n'] == 4
    +        var lineHash = Object.create(null); // e.g. lineHash['Hello\n'] == 4
             // '\x00' is a valid character, but various debuggers don't like it.
             // So we'll insert a junk entry to avoid generating a null character.
             lineArray[0] = '';
    

    Not quite sure why typescript goes that route to make an object with a null prototype (that still behaves like an object).

    What works

    • Base tests ✅
    • Speed tests ❌
      • The speed tests use diff_match_patch_uncompressed.js which is not generated (by name) here
    • Typing! ✅

    Next steps

    • [ ] Determine how well the types line up with that of @types/diff-match-patch to ease migration
    • [ ] Make speed tests run
    • [ ] Determine how the uncompressed and compressed versions will be stored (I need direction here)
    opened by rgbkrk 9
  • simplify encodeURI using HttpUtility

    simplify encodeURI using HttpUtility

    This simplifies the encodeURI function, using HttpUtility instead of Uri. Tests pass. I suspect this also to be more performant (although I didn't test this specifically).

    opened by jhgbrt 6
  • dart version have something wrong, please fix it...

    dart version have something wrong, please fix it...

    like "message": "'Diff.==' ('(Diff) → bool') isn't a valid override of 'Object.==' ('(dynamic) →

    "message": "A value of type 'Operation' can't be assigned to a variable of type 'int'.",
    "startColumn": 27,
    "endColumn": 54,
    

    ...

    opened by BetterWorld-Liuser 5
  • C# - Exception when diffing long files (Invalid URI: The Uri string is too long.)

    C# - Exception when diffing long files (Invalid URI: The Uri string is too long.)

    I'm trying to diff several files sometimes up to around half a megabyte in size and with some of them I get a "Invalid URI: The Uri string is too long." exception. This is using the patch_make function.

    opened by XXLuigiMario 5
  • Line Diff causes an exception

    Line Diff causes an exception

    Running the example usage from here on the javascript version:

      const text1="this\n is\n a\n test\n";
      const text2="this\n wasn't\n a\n test\n";
      // Example code "as is":
      var dmp = new diff_match_patch();
      var a = dmp.diff_linesToChars_(text1, text2);
      var lineText1 = a[0];
      var lineText2 = a[1];
      var lineArray = a[2];
      var diffs = dmp.diff_main(lineText1, lineText2, false);
      dmp.diff_charsToLines_(diffs, lineArray);
    

    Causes this exception:

    Uncaught Error: Null input. (diff_main) at diff_match_patch.diff_main (:2:190) at :6:19

    opened by urigoren 5
  • 相同内容对比,Java和OC返回结果不一致!这样的问题该怎样解决?Compared with the same content, Java and OC return results are inconsistent! How can such a problem be solved?

    相同内容对比,Java和OC返回结果不一致!这样的问题该怎样解决?Compared with the same content, Java and OC return results are inconsistent! How can such a problem be solved?

    Compared with the same content, Java and OC return results are inconsistent! How can such a problem be solved?(相同内容对比,Java和OC返回结果不一致!这样的问题该怎样解决? Andios )Java on the left and OC on the right

    opened by xuxh6 4
  • Does not seem to work with big text

    Does not seem to work with big text

    Hi tried to compare two ancient greek codices of the New Testament but it seems to failed since for the Vaticanus and Sinaiticus I only have βιβλοϲγενεϲεωϲ|ιυ||χυ|υιου as common bloc.

    opened by kylak 3
  • Modified the Objective-C implementation to use ARC

    Modified the Objective-C implementation to use ARC

    This change does three things:

    1. Updates the Xcode project to support the latest Xcode (11.5),
    2. Make the Apple recommended settings to the Xcode project, and
    3. (most importantly) Updates the project and the code to use Automatic Reference Counting (ARC).

    Closes #97

    opened by stevenklassen8376 3
  • cleanupSemantic doesnt return anything

    cleanupSemantic doesnt return anything

    The last part of the example doesn't work because cleanupSemantic doesn't return the result.

    I looked at the code and indeed the function doesn't return anything.

    opened by jdruiter 3
  • Port C++ code to use Qt5 and its QStringRef

    Port C++ code to use Qt5 and its QStringRef

    Qt4 is no longer supported by Qt.

    Apart from that, porting to Qt5 should also improve the performance, as it has added QStringRef class and QString::leftRef, QString::midRef and QString::rightRef methods. I see that the code splits QStrings into substrings quite a lot, and it does so by making copies, but with QStringRef you can avoid making copies since QStringRef substrings reference the original QString.

    opened by nurupo 3
  • %0A in output

    %0A in output

    I encountered this in Java (using patch_toText(patch_make(...)), but I can see it on the JavaScript demo page, too, and I saw a past Google Code discussion touch on it for a Python usage, plus some Go code working around it.

    Enter the following into the demo page:

    Old version:

    a a

    New version:

    a a a

    (I entered both with trailing newlines, but the behavior is similar without.)

    Compute Patch gives:

    @@ -1,4 +1,6 @@
     a%0Aa%0A
    +a%0A
    

    I would have expected something more like:

    $ diff -u <(echo a; echo a) <(echo a; echo a; echo a) | tail -n +3
    @@ -1,2 +1,3 @@
     a
     a
    +a
    

    (Hmm, I notice only now that diff-match-match disagrees with diff -u about the number of lines involved. (At least, I think that's what the ,4/,6 and ,2/,3 are?))

    As a workaround, I can replace %0A sequences with the empty string, but as best I can tell, even that will leave various other characters escaped.

    I'm a little confused as to why URL encoding comes into this at all. I would expect for Patch.toString() to avoid all that, instead looking like this:

            text.append(aDiff.text.replaceAll("\n", ""))                                                                                                          
                .append("\n");                    
    

    (It seems strange that I need to remove internal \n characters at all. I haven't dug into it.)

    opened by cpovirk 3
  • diff_cleanupSemantic cleans last newline incorrect

    diff_cleanupSemantic cleans last newline incorrect

    The cleaner methodes like diff_cleanupSemantic not correct the last newline like expected. Sometimes (by multiple line insert/delete) the result grabs a newline character from the line before, and cut his own of. It would be better if complete lines are inserted/deleted. I made my own cleaner to correct this and I can post it here, but it should be done in the diff_cleanup.

    (tested only with the cs code)

    opened by Alisis33 4
  • [C#] Convert Javadoc in C# code to proper triple-slash documentation

    [C#] Convert Javadoc in C# code to proper triple-slash documentation

    I like this library and would like to give back: This PR converts the Javadoc in the C# source code to C# XML comments. With these changes, each class's documentation would be made usable by IDE's.

    Notably, this PR does not affect functionality, compat, etc. in any way, so there is no risk accepting this PR. Aside from the documentation convertion, my IDE auto-formatted the opening curly brace to start on the next line, as is the norm in C#.

    Edit: This PR is easier to review with whitespace ignored: https://github.com/google/diff-match-patch/pull/128/files?w=1

    opened by RenderMichael 1
  • Typescript (.ts) port of the file diff_match_patch_uncompressed.js

    Typescript (.ts) port of the file diff_match_patch_uncompressed.js

    This is useful for projects that use Typescript (e.g. a Visual Studio Code Extension). Tested using the same logic as is present in the Patch Demo. "visualSilicon" has also submitted a CLA for contributions to Google open-source projects.

    opened by ramin-zaghi 3
  • Wiki/Documentation: Suggested function name should be diff_wordsToChars

    Wiki/Documentation: Suggested function name should be diff_wordsToChars

    In the first sentence in the section about a word mode:

    https://github.com/google/diff-match-patch/wiki/Line-or-Word-Diffs#word-mode

    Computing a word-mode diff is exactly the same as the line-mode diff, except you will have to make a copy of diff_linesToChars and call it diff_linesToWords.

    The function should be renamed to "diff_wordsToChars", not "diff_linesToWords".

    Because words will be mapped to single chars, if you do the suggested modification. Analogous to lines that will be mapped to single chars in the original function.

    opened by ztiromoritz 0
  • Request for code review

    Request for code review

    Hi,

    I yesterday ran into a problem that my diff implementation (adopted from the internet) in some cases actually doesn't work reliably.

    Suppose I have a bit stream of zeros and ones, called simply "mine" (99658 bits) and "theirs" (99655 bits). These represent low-level magnetic information of two full revolutions of one and the same track on a 2DD floppy (nominally 100,000 bits).

    Apart from being of different lengths (theirs being by three bits shorter than mine), there is also a difference in the middle - 34,425-th bit is different, thanks to magnetic deterioration. Otherwise, the two bit streams are identical.

    image

    image

    image

    My implementation of the diff algorithm produces the following script that transforms mine into theirs (unlike the numbering in OpenOffice Calc, the numbering of bits is zero-based, hence off by 1). The first two seem to successfully deal with the difference in the middle. The latter two fail to remove the tail three extra bits in mine.

    • insertion of 1 bit from theirs at 34424 into mine at 34424,
    • deletion of 1 bit from mine at 34424,
    • deletion of 1 bit from mine at 99646,
    • deletion of 1 bit from mine at 99650.

    Because I've exhausted all my skills, I would like to ask if anyone could check my implementation of the diff algorithm and eventually fix the bug that's inside. Although I read Mr.Myers's paper "An O(ND) Difference Algorithm and its Variations," I'm not much more clever than before. The diff algorithm is implemented here (circa 100 lines of actual code): https://github.com/tomas-nestorovic/RIDE/blob/master/Main/src/Diff.h

    Edit: Attached mine.txt and theirs.txt with respective bit streams. theirs.txt mine.txt

    MANY THANKS IN ADVANCE FOR ANY RESPONSES! :-)

    opened by tomas-nestorovic 1
Owner
Google
Google ❤️ Open Source
Google
This is our take on the digitalisation of the board game "b00le0", where you can play versus our AI, or against one of your friends in an online match.

This is our take on the digitalisation of the board game "b00le0", where you can play versus our AI, or against one of your friends in an online match.

valko purzalko 22 Dec 8, 2022
Text - A spicy text library for C++ that has the explicit goal of enabling the entire ecosystem to share in proper forward progress towards a bright Unicode future.

ztd.text Because if text works well in two of the most popular systems programming languages, the entire world over can start to benefit properly. Thi

Shepherd's Oasis 228 Dec 25, 2022
An eventing framework for building high performance and high scalability systems in C.

NOTE: THIS PROJECT HAS BEEN DEPRECATED AND IS NO LONGER ACTIVELY MAINTAINED As of 2019-03-08, this project will no longer be maintained and will be ar

Meta Archive 1.7k Dec 14, 2022
Typewriter Effect with Rich Text + *Correct* Text Wrapping

Typewriter Effect with Rich Text + Correct Text Wrapping I've spent way too long getting this right. This is meant as a base class for a UMG dialogue

Sam Bloomberg 30 Nov 29, 2022
Tiny blocker of Windows tracking and telemetry written in plain C++/Win32 API.

Tiny blocker of Windows tracking and telemetry written in plain C++/Win32 API. Just run once as admin and forget. No questions asked. No harmful actions performed like other Windows spying blockers try.

null 6 Dec 23, 2022
A fast and simple 6502 emulator in plain C.

6502js But C A hobby project that replicates the 6502js emulator, but with the speed of C. Why? For fun and learning :D Simple on the outside: Replica

Lim Ding Wen 2 Nov 25, 2021
Dear ImGui Addons Branch = plain unmodified dear imgui plus some extra addon.

Dear ImGui (This library is available under a free and permissive license, but needs financial support to sustain its continued improvements. In addit

Flix 352 Dec 28, 2022
Quick patch to prevent fatal crashing when downloading title assets (boxart, etc) through FSD or Aurora.

Quick patch to prevent fatal crashing when downloading title assets (boxart, etc) through FSD or Aurora. As of v0.2-beta, this patch should work for everyone, regardless of geographic location (both in and outside of the US).

Stelio Kontos 27 Dec 11, 2022
IDA StrikeOut: A Hex-Rays decompiler plugin to patch the Ctree

StrikeOut is an plugin for the Hex-Rays Decompiler. It allows you to delete (hide) statements from the AST, thus simplifying the pseudocode output. This is a useful scenario when you are dealing with lots of junk code or code that don't necessarily increase your understanding of the pseudocode.

Elias Bachaalany 82 Dec 6, 2022
Pure Data patch export to lv2 plugin using heavy compiler + dpf example

Pure Data patch export to lv2 plugin using heavy compiler + dpf example Work in progress - Takes an audio input and writes to a 0.5 seconds buffers. 4

Qwrtst 3 Dec 27, 2022
Patch Onimusha 3 to allow any screen resolution and configure inputs.

Onimusha3Patch Patch Onmimusha 3 to allow any screen resolution. Fix the Configure input menu (see Configure input patch directory). How to use Compil

Xavier Monin 1 Nov 26, 2021
Use DOS object files (OMF) as patch files

omfpatch - Use Intel/Microsoft .OBJ files as binary diffs Overview This tool makes it possible to use MASM / TASM / JWasm / nasm as tool to write patc

Michael Karcher 2 Jan 30, 2022
Patch for Sierra's PowerChess to run on newer Windows Versions >9x

What is it? I recently stumbled upon the following thread: https://sourceforge.net/p/dxwnd/discussion/general/thread/98dd46dfc6/?page=0 Some people we

null 2 Mar 27, 2022
A window manager for GNOME, with rounded corners patch

Tested in gnome-shell 40.5, should works in gnome 40 and 41. integrate the blur effects with rounded corners.The source code can be found here. Issues

Yi 502 Jan 4, 2023
Patch for Titanfall 2 that helps prevent disconnects while the servers are being attacked by a DoS attack.

Titanfall2 DeltaBuf patch This patch for Titanfall 2 helps prevent disconnects while the servers are being attacked by a DoS attack. Disclaimer This i

null 7 Jan 8, 2023
Fork of junaburg's picom fork with a patch for rounded corners and shadows

picom new! : You'll now also find tryone's dual_kawase blur for the new backend, as well as rounded corners from sdhand if they are so desired, merged

Arian Rezazadeh 49 Dec 20, 2022
Samir Teymurov 1 Oct 6, 2021
A C++11 large integer library with effective high performance, simplistic in nature and also clean in the eyes.

BigIntegerCPP BigIntegerCPP is a C++11 port of large integer library used in CryptoLib4Pascal. It allows mostly parsing of numbers as strings in diffe

Telepati 26 Dec 22, 2022
Flexible, portable, high-performance bit fields C++ library. unsigned a:13 becomes F<13> a;

C-plus-plus-library-bit-fields Flexible, portible, high-performance bit fields C++ library. The bit fields are specified with a dummy structure where

Walt Karas 27 Oct 12, 2022