Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.

Overview

The Diff Match and Patch libraries offer robust algorithms to perform the operations required for synchronizing plain text.

  1. Diff:
    • Compare two blocks of plain text and efficiently return a list of differences.
    • Diff Demo
  2. Match:
    • Given a search string, find its best fuzzy match in a block of plain text. Weighted for both accuracy and location.
    • Match Demo
  3. Patch:
    • Apply a list of patches onto plain text. Use best-effort to apply patch even when the underlying text doesn't match.
    • Patch Demo

Originally built in 2006 to power Google Docs, this library is now available in C++, C#, Dart, Java, JavaScript, Lua, Objective C, and Python.

Reference

Languages

Although each language port of Diff Match Patch uses the same API, there are some language-specific notes.

A standardized speed test tracks the relative performance of diffs in each language.

Algorithms

This library implements Myer's diff algorithm which is generally considered to be the best general-purpose diff. A layer of pre-diff speedups and post-diff cleanups surround the diff algorithm, improving both performance and output quality.

This library also implements a Bitap matching algorithm at the heart of a flexible matching and patching strategy.

Comments
  • [WIP] initial typescript conversion

    [WIP] initial typescript conversion

    Following on from #58, I've kicked off the conversion of diff-match-patch starting with @NeilFraser's starting point as the first commit. I've included the built copy for now, though I may strip that out and rebase it based on feedback.

    One interesting thing between Neil's built version and my built version:

    diff --git a/typescript/built/diff_match_patch.js b/typescript/built/diff_match_patch.js
    index 8fbb030..dafa48a 100644
    --- a/typescript/built/diff_match_patch.js
    +++ b/typescript/built/diff_match_patch.js
    @@ -387,7 +387,7 @@ var diff_match_patch = /** @class */ (function () {
          */
         diff_match_patch.prototype.diff_linesToChars_ = function (text1, text2) {
             var lineArray = []; // e.g. lineArray[4] == 'Hello\n'
    -        var lineHash = {}; // e.g. lineHash['Hello\n'] == 4
    +        var lineHash = Object.create(null); // e.g. lineHash['Hello\n'] == 4
             // '\x00' is a valid character, but various debuggers don't like it.
             // So we'll insert a junk entry to avoid generating a null character.
             lineArray[0] = '';
    

    Not quite sure why typescript goes that route to make an object with a null prototype (that still behaves like an object).

    What works

    • Base tests ✅
    • Speed tests ❌
      • The speed tests use diff_match_patch_uncompressed.js which is not generated (by name) here
    • Typing! ✅

    Next steps

    • [ ] Determine how well the types line up with that of @types/diff-match-patch to ease migration
    • [ ] Make speed tests run
    • [ ] Determine how the uncompressed and compressed versions will be stored (I need direction here)
    opened by rgbkrk 9
  • simplify encodeURI using HttpUtility

    simplify encodeURI using HttpUtility

    This simplifies the encodeURI function, using HttpUtility instead of Uri. Tests pass. I suspect this also to be more performant (although I didn't test this specifically).

    opened by jhgbrt 6
  • dart version have something wrong, please fix it...

    dart version have something wrong, please fix it...

    like "message": "'Diff.==' ('(Diff) → bool') isn't a valid override of 'Object.==' ('(dynamic) →

    "message": "A value of type 'Operation' can't be assigned to a variable of type 'int'.",
    "startColumn": 27,
    "endColumn": 54,
    

    ...

    opened by BetterWorld-Liuser 5
  • C# - Exception when diffing long files (Invalid URI: The Uri string is too long.)

    C# - Exception when diffing long files (Invalid URI: The Uri string is too long.)

    I'm trying to diff several files sometimes up to around half a megabyte in size and with some of them I get a "Invalid URI: The Uri string is too long." exception. This is using the patch_make function.

    opened by XXLuigiMario 5
  • Line Diff causes an exception

    Line Diff causes an exception

    Running the example usage from here on the javascript version:

      const text1="this\n is\n a\n test\n";
      const text2="this\n wasn't\n a\n test\n";
      // Example code "as is":
      var dmp = new diff_match_patch();
      var a = dmp.diff_linesToChars_(text1, text2);
      var lineText1 = a[0];
      var lineText2 = a[1];
      var lineArray = a[2];
      var diffs = dmp.diff_main(lineText1, lineText2, false);
      dmp.diff_charsToLines_(diffs, lineArray);
    

    Causes this exception:

    Uncaught Error: Null input. (diff_main) at diff_match_patch.diff_main (:2:190) at :6:19

    opened by urigoren 5
  • 相同内容对比,Java和OC返回结果不一致!这样的问题该怎样解决?Compared with the same content, Java and OC return results are inconsistent! How can such a problem be solved?

    相同内容对比,Java和OC返回结果不一致!这样的问题该怎样解决?Compared with the same content, Java and OC return results are inconsistent! How can such a problem be solved?

    Compared with the same content, Java and OC return results are inconsistent! How can such a problem be solved?(相同内容对比,Java和OC返回结果不一致!这样的问题该怎样解决? Andios )Java on the left and OC on the right

    opened by xuxh6 4
  • Does not seem to work with big text

    Does not seem to work with big text

    Hi tried to compare two ancient greek codices of the New Testament but it seems to failed since for the Vaticanus and Sinaiticus I only have βιβλοϲγενεϲεωϲ|ιυ||χυ|υιου as common bloc.

    opened by kylak 3
  • Modified the Objective-C implementation to use ARC

    Modified the Objective-C implementation to use ARC

    This change does three things:

    1. Updates the Xcode project to support the latest Xcode (11.5),
    2. Make the Apple recommended settings to the Xcode project, and
    3. (most importantly) Updates the project and the code to use Automatic Reference Counting (ARC).

    Closes #97

    opened by stevenklassen8376 3
  • cleanupSemantic doesnt return anything

    cleanupSemantic doesnt return anything

    The last part of the example doesn't work because cleanupSemantic doesn't return the result.

    I looked at the code and indeed the function doesn't return anything.

    opened by jdruiter 3
  • Port C++ code to use Qt5 and its QStringRef

    Port C++ code to use Qt5 and its QStringRef

    Qt4 is no longer supported by Qt.

    Apart from that, porting to Qt5 should also improve the performance, as it has added QStringRef class and QString::leftRef, QString::midRef and QString::rightRef methods. I see that the code splits QStrings into substrings quite a lot, and it does so by making copies, but with QStringRef you can avoid making copies since QStringRef substrings reference the original QString.

    opened by nurupo 3
  • %0A in output

    %0A in output

    I encountered this in Java (using patch_toText(patch_make(...)), but I can see it on the JavaScript demo page, too, and I saw a past Google Code discussion touch on it for a Python usage, plus some Go code working around it.

    Enter the following into the demo page:

    Old version:

    a a

    New version:

    a a a

    (I entered both with trailing newlines, but the behavior is similar without.)

    Compute Patch gives:

    @@ -1,4 +1,6 @@
     a%0Aa%0A
    +a%0A
    

    I would have expected something more like:

    $ diff -u <(echo a; echo a) <(echo a; echo a; echo a) | tail -n +3
    @@ -1,2 +1,3 @@
     a
     a
    +a
    

    (Hmm, I notice only now that diff-match-match disagrees with diff -u about the number of lines involved. (At least, I think that's what the ,4/,6 and ,2/,3 are?))

    As a workaround, I can replace %0A sequences with the empty string, but as best I can tell, even that will leave various other characters escaped.

    I'm a little confused as to why URL encoding comes into this at all. I would expect for Patch.toString() to avoid all that, instead looking like this:

            text.append(aDiff.text.replaceAll("\n", ""))                                                                                                          
                .append("\n");                    
    

    (It seems strange that I need to remove internal \n characters at all. I haven't dug into it.)

    opened by cpovirk 3
  • Typescript (.ts) port of the file diff_match_patch_uncompressed.js

    Typescript (.ts) port of the file diff_match_patch_uncompressed.js

    This is useful for projects that use Typescript (e.g. a Visual Studio Code Extension). Tested using the same logic as is present in the Patch Demo. "visualSilicon" has also submitted a CLA for contributions to Google open-source projects.

    opened by ramin-zaghi 2
  • Wiki/Documentation: Suggested function name should be diff_wordsToChars

    Wiki/Documentation: Suggested function name should be diff_wordsToChars

    In the first sentence in the section about a word mode:

    https://github.com/google/diff-match-patch/wiki/Line-or-Word-Diffs#word-mode

    Computing a word-mode diff is exactly the same as the line-mode diff, except you will have to make a copy of diff_linesToChars and call it diff_linesToWords.

    The function should be renamed to "diff_wordsToChars", not "diff_linesToWords".

    Because words will be mapped to single chars, if you do the suggested modification. Analogous to lines that will be mapped to single chars in the original function.

    opened by ztiromoritz 0
  • Request for code review

    Request for code review

    Hi,

    I yesterday ran into a problem that my diff implementation (adopted from the internet) in some cases actually doesn't work reliably.

    Suppose I have a bit stream of zeros and ones, called simply "mine" (99658 bits) and "theirs" (99655 bits). These represent low-level magnetic information of two full revolutions of one and the same track on a 2DD floppy (nominally 100,000 bits).

    Apart from being of different lengths (theirs being by three bits shorter than mine), there is also a difference in the middle - 34,425-th bit is different, thanks to magnetic deterioration. Otherwise, the two bit streams are identical.

    image

    image

    image

    My implementation of the diff algorithm produces the following script that transforms mine into theirs (unlike the numbering in OpenOffice Calc, the numbering of bits is zero-based, hence off by 1). The first two seem to successfully deal with the difference in the middle. The latter two fail to remove the tail three extra bits in mine.

    • insertion of 1 bit from theirs at 34424 into mine at 34424,
    • deletion of 1 bit from mine at 34424,
    • deletion of 1 bit from mine at 99646,
    • deletion of 1 bit from mine at 99650.

    Because I've exhausted all my skills, I would like to ask if anyone could check my implementation of the diff algorithm and eventually fix the bug that's inside. Although I read Mr.Myers's paper "An O(ND) Difference Algorithm and its Variations," I'm not much more clever than before. The diff algorithm is implemented here (circa 100 lines of actual code): https://github.com/tomas-nestorovic/RIDE/blob/master/Main/src/Diff.h

    Edit: Attached mine.txt and theirs.txt with respective bit streams. theirs.txt mine.txt

    MANY THANKS IN ADVANCE FOR ANY RESPONSES! :-)

    opened by tomas-nestorovic 1
  • Repo status: has this been abandoned? Are there active forks?

    Repo status: has this been abandoned? Are there active forks?

    We're about 2.5 years out from the last commit. There are seemingly viable PRs to be merged, but no activity from maintainers. Is there an active fork somewhere?

    opened by derekperkins 1
  • Difference in result of cleanupSemantic() function on iOS and Android.

    Difference in result of cleanupSemantic() function on iOS and Android.

    We have used objective-C code for iOS and Java code for Android from library. We are using same set of function to find the difference of string in Android and iOS.

    But for some texts we are getting difference results for cleanupSemantic() function on Android and iOS.

    1. In code first we have used the diff_main() function which returns us list of Diff models. In iOS and Android its returning list of Diff() with length 399.

    2. Second, we have used cleanupSemantic() to get the result list of Diff() models. In iOS, cleanupSemantic() function returns list of Diff() with length 8 whereas in Android same function returns list of Diff() with length 11.

    3. Then we are using the diff_prettyHtml function to get highlighted text. We are using this highlighted html text to load into the WebView.

    4. We have customized diff_prettyHtml function to show inserted text in only green background without underline.

    Due to difference in result of cleanupSemantic() function on iOS and Android. We are not able to show same text difference on both iOS and Android devices. iOS result image: iOS_text

    Android result image: android_text-min From above images you can find that last 2 lines in Android is different from that is in iOS. You can also use previousText and updatedText value that we have posted in below example to replicate the issue.

    Code used on Android side is as follows:

    String **previousText** = “Customer shall pay Consultant’s expenses $200000 per month, as determined by Consultant in its reasonable business judgment, for performing the Services under this Agreement. Consultant shall invoice Customer for Services. All such invoiced amounts become due and payable to Consultant upon Client’s receipt of such invoice. Amounts that are not paid within fifteen (15) days following Customer’s receipt of such invoice will incur a penalty of $10000 per month or the maximum allowed by law, whichever is less.  Customer shall pay any amounts incurred by Consultant in the collection of past-due amounts owed, including, but not limited to, reasonable attorneys’ fees and costs. This is updated now<br>”;
    
    String **updatedText** = “!%@%%!%%@%%@515252555115   A computer program can easily produce gibberish - especially if it has been provided with garbage beforehand. This program does something a little different. It takes a block of text as input and works out the proportion of characters within the text according to a chosen order. For example, an order of 2 means the program looks at pairs of letters, an order of 3 means triplets of letters and so on. The software can reg”;
    
    diff_match_patch diffMatchPatch = new diff_match_patch();
                       
    LinkedList<diff_match_patch.Diff> listStrings = diffMatchPatch.diff_main(previousText, updatedText);
    
    diffMatchPatch.diff_cleanupSemantic(listStrings);
    
    String sb = diffMatchPatch.diff_prettyHtmlCustom (listStrings);
    
    webViewInstance.loadDataWithBaseURL("", sb, "text/html", "utf-8", "");
    

    We have some of the questions regarding the same:

    1. Is this difference in results is due to different implementation of library in Objective-C and Java ?
    2. Is there is any suggested modification or solution for this issue to make it work for both iOS and Android ?

    Please suggest us an solution to overcome this issue as soon as possible.

    We are very near to our app release, prompt reply will be very helpful.

    Thanks,

    opened by pradiptilala 3
  • confuse about diff-match-path compute results on microsoft TODO app, can somebody give some tips?

    confuse about diff-match-path compute results on microsoft TODO app, can somebody give some tips?

    I try to understand how diff-match-patch works on Microsoft TODO app.

    So i try to create a case about different copies on a base text , and use diff-match-patch to resolve to get a final result, and compared it with the result got from Microsoft TODO app.

    The following is my case: Here i define some names to indicate the case texts. -----------------define begin----------------------- BaseText I am the very model of a cartoon individual, My animation's comical, unusual, and whimsical, I'm quite adept at funny gags, comedic theory I have read, From wicked puns and stupid jokes to anvils that drop on your head.

    TextA: I am the very model of a modern Major-General, I've information vegetable, animal, and mineral, I know the kings of England, and I quote the fights historical, From Marathon to Waterloo, in order categorical.

    TextB: I am the very model of a cartoon individual, xxxxx My animation's comical, unusual, and whimsical, I'm quite adept at funny gags, comedic theory I have read, From wicked puns and stupid jokes to anvils that drop on your head.

    FinalResultText: I am the very model of a modern Major-Generxxx My aniI'veformation vegetable, animal, and mineral, I know the kings of England, and I quote the fights historical, From Marathon to Waterloo, in order categorical.

    ------------------define end-----------------

    The change timeline is:

    1. First UserA and UserB are all BaseText
    2. UserA change BaseText to TextA in online status, but UserB is offline status.
    3. After UserA finish change, UserB change his BaseText to TextB under offline status.
    4. UserB switch the network on, commit it changes to server.
    5. wait for a while, for Microsoft TODO App, both UserA and UserB change their texts to FinalResultText. But on this demo(https://neil.fraser.name/software/diff_match_patch/demos/patch.html), the result text is TextA.

    I can understand that on patch demo, patch of UserB changes cannot patch on TextA, so the final result is TextA. But i cannot understand why Microsoft use DiffMatchPatch but the result is FinalResultText.

    Any body known why it is?

    opened by initlifeinc 2
Owner
Google
Google ❤️ Open Source
Google
This is our take on the digitalisation of the board game "b00le0", where you can play versus our AI, or against one of your friends in an online match.

This is our take on the digitalisation of the board game "b00le0", where you can play versus our AI, or against one of your friends in an online match.

valko purzalko 22 Jul 3, 2022
Text - A spicy text library for C++ that has the explicit goal of enabling the entire ecosystem to share in proper forward progress towards a bright Unicode future.

ztd.text Because if text works well in two of the most popular systems programming languages, the entire world over can start to benefit properly. Thi

Shepherd's Oasis 205 Aug 30, 2022
Tiny blocker of Windows tracking and telemetry written in plain C++/Win32 API.

Tiny blocker of Windows tracking and telemetry written in plain C++/Win32 API. Just run once as admin and forget. No questions asked. No harmful actions performed like other Windows spying blockers try.

null 5 Mar 30, 2022
A fast and simple 6502 emulator in plain C.

6502js But C A hobby project that replicates the 6502js emulator, but with the speed of C. Why? For fun and learning :D Simple on the outside: Replica

Lim Ding Wen 2 Nov 25, 2021
Dear ImGui Addons Branch = plain unmodified dear imgui plus some extra addon.

Dear ImGui (This library is available under a free and permissive license, but needs financial support to sustain its continued improvements. In addit

Flix 349 Sep 21, 2022
An eventing framework for building high performance and high scalability systems in C.

NOTE: THIS PROJECT HAS BEEN DEPRECATED AND IS NO LONGER ACTIVELY MAINTAINED As of 2019-03-08, this project will no longer be maintained and will be ar

Meta Archive 1.7k Sep 3, 2022
Typewriter Effect with Rich Text + *Correct* Text Wrapping

Typewriter Effect with Rich Text + Correct Text Wrapping I've spent way too long getting this right. This is meant as a base class for a UMG dialogue

Sam Bloomberg 28 Sep 19, 2022
Quick patch to prevent fatal crashing when downloading title assets (boxart, etc) through FSD or Aurora.

Quick patch to prevent fatal crashing when downloading title assets (boxart, etc) through FSD or Aurora. As of v0.2-beta, this patch should work for everyone, regardless of geographic location (both in and outside of the US).

Stelio Kontos 26 Jun 26, 2022
IDA StrikeOut: A Hex-Rays decompiler plugin to patch the Ctree

StrikeOut is an plugin for the Hex-Rays Decompiler. It allows you to delete (hide) statements from the AST, thus simplifying the pseudocode output. This is a useful scenario when you are dealing with lots of junk code or code that don't necessarily increase your understanding of the pseudocode.

Elias Bachaalany 80 Aug 11, 2022
Pure Data patch export to lv2 plugin using heavy compiler + dpf example

Pure Data patch export to lv2 plugin using heavy compiler + dpf example Work in progress - Takes an audio input and writes to a 0.5 seconds buffers. 4

Qwrtst 2 Jun 13, 2022
Patch Onimusha 3 to allow any screen resolution and configure inputs.

Onimusha3Patch Patch Onmimusha 3 to allow any screen resolution. Fix the Configure input menu (see Configure input patch directory). How to use Compil

Xavier Monin 1 Nov 26, 2021
Use DOS object files (OMF) as patch files

omfpatch - Use Intel/Microsoft .OBJ files as binary diffs Overview This tool makes it possible to use MASM / TASM / JWasm / nasm as tool to write patc

Michael Karcher 2 Jan 30, 2022
Patch for Sierra's PowerChess to run on newer Windows Versions >9x

What is it? I recently stumbled upon the following thread: https://sourceforge.net/p/dxwnd/discussion/general/thread/98dd46dfc6/?page=0 Some people we

null 2 Mar 27, 2022
A window manager for GNOME, with rounded corners patch

Tested in gnome-shell 40.5, should works in gnome 40 and 41. integrate the blur effects with rounded corners.The source code can be found here. Issues

Yi 479 Sep 20, 2022
Patch for Titanfall 2 that helps prevent disconnects while the servers are being attacked by a DoS attack.

Titanfall2 DeltaBuf patch This patch for Titanfall 2 helps prevent disconnects while the servers are being attacked by a DoS attack. Disclaimer This i

null 6 Jul 30, 2022
Fork of junaburg's picom fork with a patch for rounded corners and shadows

picom new! : You'll now also find tryone's dual_kawase blur for the new backend, as well as rounded corners from sdhand if they are so desired, merged

Arian Rezazadeh 49 Sep 13, 2022
A C++11 large integer library with effective high performance, simplistic in nature and also clean in the eyes.

BigIntegerCPP BigIntegerCPP is a C++11 port of large integer library used in CryptoLib4Pascal. It allows mostly parsing of numbers as strings in diffe

Telepati 26 Sep 12, 2022
Flexible, portable, high-performance bit fields C++ library. unsigned a:13 becomes F<13> a;

C-plus-plus-library-bit-fields Flexible, portible, high-performance bit fields C++ library. The bit fields are specified with a dummy structure where

Walt Karas 26 Aug 31, 2022
Visualization Library is a C++ middleware for high-performance 2D and 3D graphics applications based on OpenGL 1.x-4.x supporting Windows, Linux and Mac OS X.

Visualization Library 2.2 Gallery About Visualization Library is a C++ middleware for high-performance 2D and 3D graphics applications based on the in

Michele 308 Sep 15, 2022