Onigmo is a regular expressions library forked from Oniguruma.

Overview

Onigmo (Oniguruma-mod)

https://github.com/k-takata/Onigmo

Onigmo is a regular expressions library forked from Oniguruma. It focuses on supporting newer expressions such as \K, \R, and (?(cond)yes|no), which are supported in Perl 5.10+.

Since Onigmo is used as the default regexp library of Ruby 2.0 or later, many patches are backported from Ruby 2.x.

See also the Wiki page: https://github.com/k-takata/Onigmo/wiki

License

BSD license.

Install

Case 1: Unix and Cygwin platform

  1. ./autogen.sh (If configure doesn't exist.)
  2. ./configure
  3. make
  4. make install
  • test

    make test

  • uninstall

    make uninstall

  • configuration check

    onigmo-config --cflags
    onigmo-config --libs
    onigmo-config --prefix
    onigmo-config --exec-prefix

Case 2: Windows 64/32bit platform (Visual C++)

Execute build_nmake.cmd. build_x64 or build_x86 will be used as a working/output directory.

  onigmo_s.lib:  static link library
  onigmo.lib:    import library for dynamic link
  onigmo.dll:    dynamic link library
  • test (ASCII/Shift_JIS/EUC-JP/Unicode)

    Execute build_nmake.cmd test. Python (with the same bitness as Onigmo) is needed to run the tests.

Case 3: Windows 64/32bit platform (MinGW)

Execute mingw32-make -f win32/Makefile.mingw. build_x86-64, build_i686, etc. will be used as a working/output directory.

  libonigmo.a:     static link library
  libonigmo.dll.a: import library for dynamic link
  onigmo.dll:      dynamic link library
  • test (ASCII/Shift_JIS/EUC-JP/Unicode)

    Execute mingw32-make -f win32/Makefile.mingw test. Python (with the same bitness as Onigmo) is needed to run the tests.

  • If you use MinGW on MSYS2, you can also use ./configure and make as on Unix. In this case, the DLL name will include the API version number. E.g.:

    libonigmo-6.dll

Regular Expressions

See doc/RE (or doc/RE.ja for Japanese).

Usage

Include onigmo.h in your program (Onigmo API). See doc/API for details of the Onigmo API.

If you want to disable UChar type (== unsigned char) definition in onigmo.h, define ONIG_ESCAPE_UCHAR_COLLISION and then include onigmo.h.

If you want to disable regex_t type definition in onigmo.h, define ONIG_ESCAPE_REGEX_T_COLLISION and then include onigmo.h.

Example of the compile/link command line on Unix or Cygwin (assuming prefix == /usr/local):

cc sample.c -L/usr/local/lib -lonigmo

If you want to use the static link library (onigmo_s.lib) on Win32, add the option -DONIG_EXTERN=extern to the C compiler.

Sample Programs

File                   Description
sample/simple.c        example of the minimum (Onigmo API)
sample/names.c         example of the named group callback
sample/encode.c        example of some encodings
sample/listcap.c       example of the capture history
sample/posix.c         POSIX API sample
sample/sql.c           example of the variable meta characters

Test Programs

File                   Description
sample/syntax.c        Perl, Java and ASIS syntax test
sample/crnl.c          CRNL test

Source Files

File                   Description
onigmo.h               Onigmo API header file (public)
onigmo-config.in       configuration check program template
onigmo.py              Onigmo module for Python
regenc.h               character encodings framework header file
regint.h               internal definitions
regparse.h             internal definitions for regparse.c and regcomp.c
regcomp.c              compiling and optimization functions
regenc.c               character encodings framework
regerror.c             error message function
regext.c               extended API functions (deluxe version API)
regexec.c              search and match functions
regparse.c             parsing functions
regsyntax.c            pattern syntax functions and built-in syntax definition
regtrav.c              capture history tree data traverse functions
regversion.c           version info function
st.h                   hash table functions header file
st.c                   hash table functions
onigmognu.h            GNU regex API header file (public)
reggnu.c               GNU regex API functions
onigmoposix.h          POSIX API header file (public)
regposerr.c            POSIX error message function
regposix.c             POSIX API functions
enc/mktable.c          character type table generator
enc/ascii.c            ASCII-8BIT encoding
enc/jis/               JIS properties data
enc/euc_jp.c           EUC-JP encoding
enc/euc_tw.c           EUC-TW encoding
enc/euc_kr.c           EUC-KR, EUC-CN encoding
enc/shift_jis.c        Shift_JIS encoding
enc/shift_jis.h        common part of Shift_JIS and Windows-31J encoding
enc/windows_31j.c      Windows-31J (CP932) encoding
enc/big5.c             Big5 encoding
enc/gb18030.c          GB18030 encoding
enc/gbk.c              GBK encoding
enc/koi8_r.c           KOI8-R encoding
enc/koi8_u.c           KOI8-U encoding
enc/iso_8859.h         common definition of ISO-8859 encoding
enc/iso_8859_1.c       ISO-8859-1 (Latin-1)
enc/iso_8859_2.c       ISO-8859-2 (Latin-2)
enc/iso_8859_3.c       ISO-8859-3 (Latin-3)
enc/iso_8859_4.c       ISO-8859-4 (Latin-4)
enc/iso_8859_5.c       ISO-8859-5 (Cyrillic)
enc/iso_8859_6.c       ISO-8859-6 (Arabic)
enc/iso_8859_7.c       ISO-8859-7 (Greek)
enc/iso_8859_8.c       ISO-8859-8 (Hebrew)
enc/iso_8859_9.c       ISO-8859-9 (Latin-5 or Turkish)
enc/iso_8859_10.c      ISO-8859-10 (Latin-6 or Nordic)
enc/iso_8859_11.c      ISO-8859-11 (Thai)
enc/iso_8859_13.c      ISO-8859-13 (Latin-7 or Baltic Rim)
enc/iso_8859_14.c      ISO-8859-14 (Latin-8 or Celtic)
enc/iso_8859_15.c      ISO-8859-15 (Latin-9 or West European with Euro)
enc/iso_8859_16.c      ISO-8859-16 (Latin-10)
enc/utf_8.c            UTF-8 encoding
enc/utf_16be.c         UTF-16BE encoding
enc/utf_16le.c         UTF-16LE encoding
enc/utf_32be.c         UTF-32BE encoding
enc/utf_32le.c         UTF-32LE encoding
enc/unicode.c          common codes of Unicode encoding
enc/unicode/           Unicode case folding data and properties data
enc/windows_1250.c     Windows-1250 (CP1250) encoding (Central/Eastern Europe)
enc/windows_1251.c     Windows-1251 (CP1251) encoding (Cyrillic)
enc/windows_1252.c     Windows-1252 (CP1252) encoding (Latin)
enc/windows_1253.c     Windows-1253 (CP1253) encoding (Greek)
enc/windows_1254.c     Windows-1254 (CP1254) encoding (Turkish)
enc/windows_1257.c     Windows-1257 (CP1257) encoding (Baltic Rim)
enc/cp949.c            CP949 encoding (only used in Ruby)
enc/emacs_mule.c       Emacs internal encoding (only used in Ruby)
enc/gb2312.c           GB2312 encoding (only used in Ruby)
enc/us_ascii.c         US-ASCII encoding (only used in Ruby)
win32/Makefile         Makefile for Win32 (VC++)
win32/Makefile.mingw   Makefile for Win32 (MinGW)
win32/config.h         config.h for Win32
win32/onigmo.rc        resource file for Win32

Comments
  • "ss" in look-behind raises syntax error

    "ss" in look-behind may raise syntax error.

    irb(main):006:0> /(?<!ss)/iu
    => /(?<!ss)/i
    irb(main):002:0> /(?<!ass)/iu
    SyntaxError: (irb):2: invalid pattern in look-behind: /(?<!ass)/i
    

    This seems to be because "ss" is expanded to "ß". Onigmo avoids the issue when the pattern contains only "ss", but when other characters are present it raises an error.

    bug 
    opened by nurse 13
  • extend utf8 to 31bits

    This extends UTF-8 to 31 bits.

    The motivation is explained at https://github.com/k-takata/Onigmo/issues/110

    I implemented the logic in the same manner as the original 21-bit logic.

    I added a test program for the UTF-8 codec.

    It can be run with the following steps (at least in my Mac environment):

    $ aclocal
    $ automake -ac
    $ ./configure
    $ make
    $ make test_enc_utf8
    $ ./test_enc_utf8
    

    31-bit mode is enabled by the USE_UTF8_31BITS flag in utf8.c. To ease testing of this PR, I enabled it. If this is accepted, I will amend the commit to disable it.

    The USE_UTF8_31BITS flag is also in test_enc_utf8.c. These two flags need to be kept in sync.

    Finally, I have attached my notes on implementing the decoding table.

    5 bytes
                111110yy 10yyyxxx 10xxxxxx 10xxxxxx 10xxxxxx 
    min            
    U+200000          00   001000   000000   000000   000000  26 bit
                      F8       88       80       80       80 
    max
    U+3FFFFFF         11   111111   111111   111111   111111  
                      FB       BF       BF       BF       BF 
    
    6 bytes
                1111110y 10yyyyxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
    min            
    U+4000000          0   000100   000000   000000   000000   000000  31 bit
                      FC       84       80       80       80       80
    max
    U+7FFFFFFF         1   111111   111111   111111   111111   111111
                      FD       BF       BF       BF       BF       BF
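The byte layouts above can be reproduced with a small encoder. This is a minimal sketch only: utf8_31bit_encode is a hypothetical helper written for illustration, not a function from Onigmo or from this PR, and it does not reject surrogate code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode cp with the extended UTF-8 scheme that allows 5- and 6-byte
 * sequences. Returns the number of bytes written to out (1-6), or 0 if
 * cp does not fit in 31 bits. Hypothetical helper, not Onigmo API. */
static size_t utf8_31bit_encode(uint32_t cp, unsigned char out[6])
{
    /* Lead-byte prefix for each sequence length (index = length). */
    static const unsigned char lead[7] = {0, 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC};
    size_t len, i;

    assert(out != NULL);
    if (cp < 0x80) len = 1;
    else if (cp < 0x800) len = 2;
    else if (cp < 0x10000) len = 3;
    else if (cp < 0x200000) len = 4;
    else if (cp < 0x4000000) len = 5;
    else if (cp <= 0x7FFFFFFF) len = 6;
    else return 0;

    /* Fill continuation bytes from the end: 10xxxxxx, six bits each. */
    for (i = len - 1; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    /* Whatever bits remain go into the lead byte. */
    out[0] = (unsigned char)(lead[len] | cp);
    return len;
}
```

Calling utf8_31bit_encode(0x200000, buf) fills buf with F8 88 80 80 80, which matches the 5-byte minimum row in the table.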
    
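    Surrogate pairs
    Encoding U+D800 - U+DBFF and U+DC00 - U+DFFF directly as code points
    under the UTF-8 rules gives
    ED A0 80 - ED AF BF, ED B0 80 - ED BF BF
    
    21 bits (original)
    
    S0: first byte
          1-byte character → ACCEPT
          lead byte of a 2-byte character → S1
          E0 → S2
          E1-EF (!ED) → S3
          ED → S4
          F0 → S5
          F1-F3 → S6
          F4 → S7
    S1: check for an ordinary continuation byte (1). 80-BF → ACCEPT
    S2: second byte of a 3-byte character;
        yyyy in the first byte was 0000,
        so y = 0 in the second byte is forbidden. A0-BF → S1
    S3: check for an ordinary continuation byte (2). 80-BF → S1
    S4: A0-BF here would be a surrogate,
        so only 80-9F is allowed → S1
    S5: second byte of a 4-byte character;
        yyy in the first byte was 000,
        so yy = 00 in the second byte is forbidden. 90-BF → S3
    S6: check for an ordinary continuation byte (3). 80-BF → S3
    S7: yyy in the first byte is 100, so
        to stay at or below U+10FFFF,
        yy in the second byte must be 00. 80-8F → S3
    
    31 bits
    
     S0: first byte
           1-byte character → ACCEPT
           lead byte of a 2-byte character → S1
           E0 → S2
           E1-EF (!ED) → S3
           ED → S4
           F0 → S5
           F1-F7 → S6
           F8 → S8
           F9-FB → S9
           FC → S10
           FD → S11
     S1: check for an ordinary continuation byte (1). 80-BF → ACCEPT
     S2: second byte of a 3-byte character;
         yyyy in the first byte was 0000,
         so y = 0 in the second byte is forbidden. A0-BF → S1
     S3: check for an ordinary continuation byte (2). 80-BF → S1
     S4: A0-BF here would be a surrogate,
         so only 80-9F is allowed → S1
     S5: second byte of a 4-byte character;
         yyy in the first byte was 000,
         so yy = 00 in the second byte is forbidden. 90-BF → S3
     S6: check for an ordinary continuation byte (3). 80-BF → S3
     S8: second byte of a 5-byte character;
         yy in the first byte was 00,
         so yyy = 000 in the second byte is forbidden. 88-BF → S6
     S9: check for an ordinary continuation byte (4). 80-BF → S6
    S10: second byte of a 6-byte character;
         y in the first byte was 0,
         so yyyy = 0000 in the second byte is forbidden. 84-BF → S9
    S11: check for an ordinary continuation byte (5). 80-BF → S9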
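The 31-bit state table can be read as "the lead byte fixes the length and the allowed range of the second byte; every later byte must be 80-BF". The following is a minimal sketch of a validator written that way. The function name is hypothetical, this is not the table-driven code from the PR, and rejecting C0 and C1 as overlong lead bytes is an assumption the memo does not spell out.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the byte length (1-6) of the first character in s[0..n) if it is
 * well-formed under the 31-bit state table, or 0 otherwise.
 * Hypothetical helper for illustration only. */
static size_t utf8_31bit_char_len(const unsigned char *s, size_t n)
{
    unsigned char lo = 0x80, hi = 0xBF;  /* allowed range for the 2nd byte */
    size_t len, i;

    assert(s != NULL);
    if (n == 0) return 0;

    if (s[0] <= 0x7F) return 1;                         /* S0: ACCEPT */
    else if (s[0] >= 0xC2 && s[0] <= 0xDF) len = 2;     /* S1 */
    else if (s[0] == 0xE0) { len = 3; lo = 0xA0; }      /* S2 */
    else if (s[0] == 0xED) { len = 3; hi = 0x9F; }      /* S4 */
    else if (s[0] >= 0xE1 && s[0] <= 0xEF) len = 3;     /* S3 */
    else if (s[0] == 0xF0) { len = 4; lo = 0x90; }      /* S5 */
    else if (s[0] >= 0xF1 && s[0] <= 0xF7) len = 4;     /* S6 */
    else if (s[0] == 0xF8) { len = 5; lo = 0x88; }      /* S8 */
    else if (s[0] >= 0xF9 && s[0] <= 0xFB) len = 5;     /* S9 */
    else if (s[0] == 0xFC) { len = 6; lo = 0x84; }      /* S10 */
    else if (s[0] == 0xFD) len = 6;                     /* S11 */
    else return 0;                                      /* 80-C1, FE, FF */

    if (n < len) return 0;                  /* truncated sequence */
    if (s[1] < lo || s[1] > hi) return 0;   /* restricted second byte */
    for (i = 2; i < len; i++)               /* ordinary continuation bytes */
        if (s[i] < 0x80 || s[i] > 0xBF) return 0;
    return len;
}
```

With this sketch, F8 88 80 80 80 is accepted as a 5-byte character, while F8 87 BF BF BF (overlong) and ED A0 80 (surrogate) are rejected.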
    
    opened by omochi 10
  • Irregal capture after recursive match

    Thank you very much for maintaining the excellent library.

    I found the latest version of Onigmo suffers from the following incorrect behavior when used with a recursive expression.

    /\(((?:[^(]|\g<0>)*)\)/ matches "(abc)(abc)" => OK
      matches[0] == (0, 5) corresponding to "(abc)" => OK
      matches[1] == (6, 4) corresponding to a string with negative length => NG

    /\(((?:[^()]|\g<0>)*)\)/ matches "((abc)(abc))" => OK
      matches[0] == (0, 12) corresponding to "((abc)(abc))" => OK
      matches[1] == (7, 11) corresponding to "abc)" => NG

    It seems that matches[].rm_so refers to the last capture while matches[].rm_eo refers to the top-level capture. I believe users would be happier if both referred to the top-level capture. When tested in Ruby, the former example returns an invalid string for $2, which causes an ArgumentError when passed to a function, as described at ruby/正規表現 (Ruby / Regular Expressions).

    bug 
    opened by osamutake 9
  • Support --enable-multithread for MinGW build

    We don't need to check pthread availability for MinGW build because Windows provides thread API by default.

    Note that we need to regenerate configure.

    opened by kou 8
  • Add LGTM code quality badges

    Hi there!

    I thought you might be interested in adding these LGTM code quality badges to your project. They indicate how you care about code quality and encourage your future contributors to do the same. To get an idea of the analyses reflected by these grades, check the alerts discovered by LGTM.

    N.B.: I am on the team behind LGTM.com, I'd appreciate your feedback on this initiative, whether you're interested or not, if you find time to drop me a line. Thanks.

    opened by xcorail 7
  • Implement Direct Threaded VM described in #51. Improving ~49%.

    Here are the benchmark scores. The benchmark suite is derived from http://sljit.sourceforge.net/regex_perf.html.

    Pattern                               Master    This PR   Improve Rate
    Twain                                 47 ms     47 ms     0%
    ^Twain                                47 ms     47 ms     0%
    Twain$                                47 ms     47 ms     0%
    Huck[a-zA-Z]+|Finn[a-zA-Z]+           127 ms    127 ms    0%
    a[^x]{20}b                            1172 ms   889 ms    31%
    Tom|Sawyer|Huckleberry|Finn           151 ms    153 ms    -1%
    .{0,3}(Tom|Sawyer|Huckleberry|Finn)   497 ms    449 ms    10%
    [a-zA-Z]+ing                          4032 ms   2705 ms   49%
    ^[a-zA-Z]{0,4}ing[^a-zA-Z]            96 ms     98 ms     -2%
    [a-zA-Z]+ing$                         4175 ms   2797 ms   49%
    ^[a-zA-Z ]{5,}$                       1770 ms   1623 ms   9%
    ^.{16,20}$                            1757 ms   1637 ms   7%
    ([a-f](.[d-m].){0,2}[h-n]){2}         1849 ms   1670 ms   11%
    ([A-Za-z]awyer|[A-Za-z]inn)[^a-zA-Z]  656 ms    607 ms    8%
    "[^"]{0,30}[?!\.]"                    115 ms    93 ms     24%
    Tom.{10,25}river|river.{10,25}Tom     260 ms    262 ms    -1%
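For readers unfamiliar with the technique named in the title, here is a generic sketch of direct-threaded dispatch. It is an illustration only, not the code this PR adds to Onigmo: the opcodes and run_threaded are hypothetical, and it relies on the "labels as values" extension of GCC and Clang, so it is not portable ISO C.

```c
#include <assert.h>
#include <stddef.h>

enum { OP_PUSH1, OP_ADD, OP_HALT };  /* hypothetical opcodes */

/* Runs a tiny stack program with direct-threaded dispatch: the program is
 * first translated into an array of handler addresses, so each handler
 * ends with one indirect jump instead of returning to a central switch.
 * Sketch only: opcodes must be valid and there are no stack checks. */
static int run_threaded(const int *code, size_t n)
{
    static void *const labels[] = { &&do_push1, &&do_add, &&do_halt };
    void *threaded[64];
    void **ip = threaded;
    int stack[64], sp = 0;
    size_t i;

    assert(n > 0 && n <= 64 && code[n - 1] == OP_HALT);
    for (i = 0; i < n; i++)
        threaded[i] = labels[code[i]];   /* opcode -> handler address */

    goto **ip++;                         /* dispatch the first instruction */

do_push1:
    stack[sp++] = 1;
    goto **ip++;
do_add:
    sp--;
    stack[sp - 1] += stack[sp];
    goto **ip++;
do_halt:
    return sp > 0 ? stack[sp - 1] : 0;
}
```

The program { OP_PUSH1, OP_PUSH1, OP_ADD, OP_PUSH1, OP_ADD, OP_HALT } leaves 3 on the stack. The usual argument for this design is that a separate indirect jump at the end of every handler gives the branch predictor more context than a single shared jump in a switch loop.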

    Env:

    $ uname -a
    Linux Dynabook 3.19.0-18-generic #18-Ubuntu SMP Tue May 19 18:31:35 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
    
    $ cat /proc/cpuinfo
    processor   : 0
    vendor_id   : GenuineIntel
    cpu family  : 6
    model       : 37
    model name  : Intel(R) Core(TM) i5 CPU       M 450  @ 2.40GHz
    stepping    : 5
    microcode   : 0x2
    cpu MHz     : 1199.000
    cache size  : 3072 KB
    physical id : 0
    siblings    : 4
    core id     : 0
    cpu cores   : 2
    apicid      : 0
    initial apicid  : 0
    fpu     : yes
    fpu_exception   : yes
    cpuid level : 11
    wp      : yes
    flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt lahf_lm ida arat dtherm tpr_shadow vnmi flexpriority ept vpid
    bugs        :
    bogomips    : 4788.65
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 36 bits physical, 48 bits virtual
    power management:
    
    processor   : 1
    vendor_id   : GenuineIntel
    cpu family  : 6
    model       : 37
    model name  : Intel(R) Core(TM) i5 CPU       M 450  @ 2.40GHz
    stepping    : 5
    microcode   : 0x2
    cpu MHz     : 1199.000
    cache size  : 3072 KB
    physical id : 0
    siblings    : 4
    core id     : 0
    cpu cores   : 2
    apicid      : 1
    initial apicid  : 1
    fpu     : yes
    fpu_exception   : yes
    cpuid level : 11
    wp      : yes
    flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt lahf_lm ida arat dtherm tpr_shadow vnmi flexpriority ept vpid
    bugs        :
    bogomips    : 4788.65
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 36 bits physical, 48 bits virtual
    power management:
    
    processor   : 2
    vendor_id   : GenuineIntel
    cpu family  : 6
    model       : 37
    model name  : Intel(R) Core(TM) i5 CPU       M 450  @ 2.40GHz
    stepping    : 5
    microcode   : 0x2
    cpu MHz     : 2133.000
    cache size  : 3072 KB
    physical id : 0
    siblings    : 4
    core id     : 2
    cpu cores   : 2
    apicid      : 4
    initial apicid  : 4
    fpu     : yes
    fpu_exception   : yes
    cpuid level : 11
    wp      : yes
    flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt lahf_lm ida arat dtherm tpr_shadow vnmi flexpriority ept vpid
    bugs        :
    bogomips    : 4788.65
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 36 bits physical, 48 bits virtual
    power management:
    
    processor   : 3
    vendor_id   : GenuineIntel
    cpu family  : 6
    model       : 37
    model name  : Intel(R) Core(TM) i5 CPU       M 450  @ 2.40GHz
    stepping    : 5
    microcode   : 0x2
    cpu MHz     : 1199.000
    cache size  : 3072 KB
    physical id : 0
    siblings    : 4
    core id     : 2
    cpu cores   : 2
    apicid      : 5
    initial apicid  : 5
    fpu     : yes
    fpu_exception   : yes
    cpuid level : 11
    wp      : yes
    flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt lahf_lm ida arat dtherm tpr_shadow vnmi flexpriority ept vpid
    bugs        :
    bogomips    : 4788.65
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 36 bits physical, 48 bits virtual
    power management:
    
    $ cat /proc/meminfo
    MemTotal:        7965524 kB
    MemFree:         6007232 kB
    MemAvailable:    6635616 kB
    Buffers:          147344 kB
    Cached:           791320 kB
    SwapCached:            0 kB
    Active:          1170092 kB
    Inactive:         614612 kB
    Active(anon):     848752 kB
    Inactive(anon):   164980 kB
    Active(file):     321340 kB
    Inactive(file):   449632 kB
    Unevictable:          64 kB
    Mlocked:              64 kB
    SwapTotal:             0 kB
    SwapFree:              0 kB
    Dirty:               856 kB
    Writeback:             0 kB
    AnonPages:        846212 kB
    Mapped:           257944 kB
    Shmem:            167692 kB
    Slab:              81000 kB
    SReclaimable:      52676 kB
    SUnreclaim:        28324 kB
    KernelStack:        7824 kB
    PageTables:        27924 kB
    NFS_Unstable:          0 kB
    Bounce:                0 kB
    WritebackTmp:          0 kB
    CommitLimit:     3982760 kB
    Committed_AS:    4154020 kB
    VmallocTotal:   34359738367 kB
    VmallocUsed:      551880 kB
    VmallocChunk:   34359178716 kB
    HardwareCorrupted:     0 kB
    AnonHugePages:    327680 kB
    CmaTotal:              0 kB
    CmaFree:               0 kB
    HugePages_Total:       0
    HugePages_Free:        0
    HugePages_Rsvd:        0
    HugePages_Surp:        0
    Hugepagesize:       2048 kB
    DirectMap4k:       99392 kB
    DirectMap2M:     8079360 kB
    
    $ gcc --version
    gcc (Ubuntu 4.9.2-10ubuntu13) 4.9.2
    Copyright (C) 2014 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    
    
    opened by KeenS 7
  • Fix security issues

    Fix the following security issues:

    • https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-9224
    • https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-9226
    • https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-9227
    • https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-9228
    • https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-9229

    And also fix a problem in onig_scan().

    opened by k-takata 6
  • Define PRIdPTRDIFF at regint.h if not defined yet

    Motivation:

    Because PRIdPTRDIFF is not defined on OS X with the default C compiler, Onigmo cannot be built with ONIG_DEBUG_COMPILE=1.

    $ cd Onigumo
    $ ./configure CFLAGS="-DONIG_DEBUG_COMPILE=1"
    $ make
    
    /bin/sh ./libtool --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I..  -I.. -I/usr/local/include -I../enc/unicode  -Wall -DONIG_DEBUG_COMPILE=1 -MT regexec.lo -MD -MP -MF .deps/regexec.Tpo -c -o regexec.lo ../regexec.c
    libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I.. -I.. -I/usr/local/include -I../enc/unicode -Wall -DONIG_DEBUG_COMPILE=1 -MT regexec.lo -MD -MP -MF .deps/regexec.Tpo -c ../regexec.c  -fno-common -DPIC -o .libs/regexec.o
    ../regexec.c:4412:43: error: expected ')'
        fprintf(stderr, "onig_search: error %"PRIdPTRDIFF"\n", r);
                                              ^
    ../regexec.c:4412:12: note: to match this '('
        fprintf(stderr, "onig_search: error %"PRIdPTRDIFF"\n", r);
               ^
    ../regexec.c:4421:43: error: expected ')'
        fprintf(stderr, "onig_search: error %"PRIdPTRDIFF"\n", r);
                                              ^
    ../regexec.c:4421:12: note: to match this '('
        fprintf(stderr, "onig_search: error %"PRIdPTRDIFF"\n", r);
               ^
    2 errors generated.
    

    Modifications:

    • Define PRIdPTRDIFF

    Result:

    No more build errors.
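A guarded fallback definition is one way to get this effect. The sketch below assumes "td", the C99 conversion for ptrdiff_t, as the fallback value; the actual definition added to regint.h by this PR may differ, and format_search_error is a hypothetical helper used only to exercise the macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Fallback used only when no header has defined the macro already.
 * "td" is an assumption, not necessarily what the PR uses. */
#ifndef PRIdPTRDIFF
# define PRIdPTRDIFF "td"
#endif

/* Formats the same message as the failing fprintf call into buf and
 * returns the number of characters written (as snprintf does). */
static int format_search_error(char *buf, size_t size, ptrdiff_t r)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "onig_search: error %"PRIdPTRDIFF"\n", r);
}
```

With the macro defined, the adjacent string literals concatenate into a single format string, which is exactly what the compiler could not do in the error output above.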

    opened by imasahiro 6
  • update config.guess

    The current version of config.guess is so old that it fails to detect aarch64.

    This PR updates the file to the most recent version; obtained from http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD

    opened by kazuho 6
  • Implement Absent Operator

    opened by k-takata 5
  • avoid negative character

    In ASCII, 'a' is greater than 'A', which means 'A' - 'a' is a negative number (-32, to be precise). In C, the type of 'a' and 'A' is signed int, so 'A' - 'a' is also a signed int: (signed int)-32.

    The problem is that OnigCodePoint is unsigned int. Adding a negative number to a variable of type OnigCodePoint (code here) introduces an unintentional conversion, (unsigned)(signed)-32, which is 4,294,967,264. Adding this value to code then wraps around, and the result ends up being the expected code point.

    This series of operations is not a serious problem, but because code >= 'a' holds, we can write (code - 'a') + 'A' to avoid it.
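The two forms can be compared side by side. This is a minimal sketch assuming a 32-bit unsigned int; CodePoint stands in for OnigCodePoint, and both function names are hypothetical.

```c
#include <assert.h>

typedef unsigned int CodePoint;  /* stand-in for OnigCodePoint */

/* Original form: 'A' - 'a' is the int -32, which is converted to unsigned
 * before the addition, so the sum relies on unsigned wraparound. */
static CodePoint to_upper_wrapping(CodePoint code)
{
    return code + ('A' - 'a');
}

/* Rerouted form: code >= 'a' is a precondition, so code - 'a' never goes
 * below zero and no negative intermediate value appears. */
static CodePoint to_upper_rerouted(CodePoint code)
{
    assert(code >= 'a' && code <= 'z');
    return (code - 'a') + 'A';
}
```

Both functions map 'q' to 'Q'; only the second does so without passing through 4,294,967,264.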

    See also https://travis-ci.org/ruby/ruby/jobs/452680646#L2190

    opened by shyouhei 4
  • Add CodeQL workflow for GitHub code scanning

    Hi k-takata/Onigmo!

    This is a one-off automatically generated pull request from LGTM.com :robot:. You might have heard that we’ve integrated LGTM’s underlying CodeQL analysis engine natively into GitHub. The result is GitHub code scanning!

    With LGTM fully integrated into code scanning, we are focused on improving CodeQL within the native GitHub code scanning experience. In order to take advantage of current and future improvements to our analysis capabilities, we suggest you enable code scanning on your repository. Please take a look at our blog post for more information.

    This pull request enables code scanning by adding an auto-generated codeql.yml workflow file for GitHub Actions to your repository — take a look! We tested it before opening this pull request, so all should be working :heavy_check_mark:. In fact, you might already have seen some alerts appear on this pull request!

    Where needed and if possible, we’ve adjusted the configuration to the needs of your particular repository. But of course, you should feel free to tweak it further! Check this page for detailed documentation.

    Questions? Check out the FAQ below!

    FAQ

    How often will the code scanning analysis run?

    By default, code scanning will trigger a scan with the CodeQL engine on the following events:

    • On every pull request — to flag up potential security problems for you to investigate before merging a PR.
    • On every push to your default branch and other protected branches — this keeps the analysis results on your repository’s Security tab up to date.
    • Once a week at a fixed time — to make sure you benefit from the latest updated security analysis even when no code was committed or PRs were opened.

    What will this cost?

    Nothing! The CodeQL engine will run inside GitHub Actions, making use of your unlimited free compute minutes for public repositories.

    What types of problems does CodeQL find?

    The CodeQL engine that powers GitHub code scanning is the exact same engine that powers LGTM.com. The exact set of rules has been tweaked slightly, but you should see almost exactly the same types of alerts as you were used to on LGTM.com: we’ve enabled the security-and-quality query suite for you.

    How do I upgrade my CodeQL engine?

    No need! New versions of the CodeQL analysis are constantly deployed on GitHub.com; your repository will automatically benefit from the most recently released version.

    The analysis doesn’t seem to be working

    If you get an error in GitHub Actions that indicates that CodeQL wasn’t able to analyze your code, please follow the instructions here to debug the analysis.

    How do I disable LGTM.com?

    If you have LGTM’s automatic pull request analysis enabled, then you can follow these steps to disable the LGTM pull request analysis. You don’t actually need to remove your repository from LGTM.com; it will automatically be removed in the next few months as part of the deprecation of LGTM.com (more info here).

    Which source code hosting platforms does code scanning support?

    GitHub code scanning is deeply integrated within GitHub itself. If you’d like to scan source code that is hosted elsewhere, we suggest that you create a mirror of that code on GitHub.

    How do I know this PR is legitimate?

    This PR is filed by the official LGTM.com GitHub App, in line with the deprecation timeline that was announced on the official GitHub Blog. The proposed GitHub Action workflow uses the official open source GitHub CodeQL Action. If you have any other questions or concerns, please join the discussion here in the official GitHub community!

    I have another question / how do I get in touch?

    Please join the discussion here to ask further questions and send us suggestions!

    opened by lgtm-com[bot] 0
  • Fix out of bounds memory read in `add_compile_string`

    This PR fixes out of bounds memory read in add_compile_string revealed by fuzzing fluent-bit: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=46086

    The root cause is that a call to enclen in compile_string_node results in a call to onigenc_mbclen_approximate. When the byte at p passed to the function is \xf2, onigenc_mbclen_approximate returns a size of 4 even though that byte is the last one in the multibyte sequence (the next byte is an unexpected string terminator, \0). The size is added to the overall string length, which results in reading past the end of the string.
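The failure mode, and one common way to guard against it, can be sketched without Onigmo at all. Both functions below are hypothetical stand-ins: lead_byte_len imitates a length function that trusts the lead byte alone, and bounded_char_len clamps its answer to the bytes that remain. This is not the fix applied in the PR.

```c
#include <assert.h>
#include <stddef.h>

/* Length implied by a UTF-8 lead byte alone; it never looks at the
 * following bytes, much like an "approximate" length function. */
static size_t lead_byte_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if (c >= 0xF0) return 4;   /* simplification: F0-FF all treated as 4 */
    if (c >= 0xE0) return 3;
    if (c >= 0xC0) return 2;
    return 1;                  /* stray continuation byte */
}

/* Clamp the implied length to the bytes that actually remain in [p, end). */
static size_t bounded_char_len(const unsigned char *p, const unsigned char *end)
{
    size_t len, remaining;

    assert(p != NULL && p < end);
    len = lead_byte_len(*p);
    remaining = (size_t)(end - p);
    return len < remaining ? len : remaining;
}
```

For a buffer that ends in \xf2, lead_byte_len reports 4 while bounded_char_len reports 1, so a caller that advances by the clamped value cannot step past end.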

    opened by sashashura 0
  • Fix out of bounds memory read in `onig_node_str_cat`

    This PR fixes out of bounds memory read in onig_node_str_cat revealed by fuzzing fluent-bit: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=46049

    The root cause is that a call to enclen inside the PFETCH macro, when invoked from fetch_token, results in a call to onigenc_mbclen_approximate. When the byte at p passed to the function is \xec, onigenc_mbclen_approximate returns a size of 4 even though that byte is the last one in the multibyte sequence (the next byte is an unexpected string terminator, \0). The size is added to the overall string length, which results in reading past the end of the string.

    opened by sashashura 3
  • Windows | fatal error C1083: Cannot open include file: 'alloca.h': No such file or directory

    Hi @k-takata, thanks for the lib. While I was trying to build the lib, I ran into this error on Windows 10 (64-bit operating system, x64-based processor).

    Onigmo\regint.h(235): fatal error C1083: Cannot open include file: 'alloca.h': No such file or directory (compiling source file regcomp.c)
    regtrav.c
    Onigmo\regint.h(235): fatal error C1083: Cannot open include file: 'alloca.h': No such file or directory (compiling source file reggnu.c)
    Onigmo\regint.h(235): fatal error C1083: Cannot open include file: 'alloca.h': No such file or directory (compiling source file regext.c)
    st.c
    regposix.c
    Onigmo\regint.h(235): fatal error C1083: Cannot open include file: 'alloca.h': No such file or directory (compiling source file regparse.c)
    Onigmo\regint.h(235): fatal error C1083: Cannot open include file: 'alloca.h': No such file or directory (compiling source file regenc.c)
    Onigmo\regint.h(235): fatal error C1083: Cannot open include file: 'alloca.h': No such file or directory (compiling source file regerror.c)
    Onigmo\regint.h(235): fatal error C1083: Cannot open include file: 'alloca.h': No such file or directory (compiling source file regexec.c)
    Onigmo\regint.h(235): fatal error C1083: Cannot open include file: 'alloca.h': No such file or directory (compiling source file regsyntax.c)
    Onigmo\regint.h(235): fatal error C1083: Cannot open include file: 'alloca.h': No such file or directory (compiling source file regtrav.c)
    Onigmo\regint.h(235): fatal error C1083: Cannot open include file: 'alloca.h': No such file or directory (compiling source file st.c)
    Onigmo\regint.h(235): fatal error C1083: Cannot open include file: 'alloca.h': No such file or directory (compiling source file regposix.c)
    NMAKE : fatal error U1077: '"C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.26.28801\bin\HostX86\x86\cl.EXE"' : return code '0x2'
    Stop.
    
    opened by drophouse01 2
  • Separating or excluding the regcomp/regexec/regerror functions from the library.

    The POSIX compatibility functions regcomp, regexec and regerror are included in the Onigmo library. There doesn't seem to be any way to avoid this without patching the code or build scripts. If there is one, please advise; we haven't found it.

    This is a problem for us. We have a large application with lots of libraries, some of which use the standard library functions regcomp et al., but we also use the Onigmo library (the native functions) in some places. If library A uses regcomp, library B uses the native Onigmo functions, and both are linked into the same binary, A will use the functions in the Onigmo library, not the standard library versions. You might argue that it shouldn't matter, but it does for us. It's important that we know exactly which function is used. Or to put it differently, we want regcomp to mean the standard library function and nothing else, everywhere.

    We can work around this by patching Makefile.in before building, but we would prefer if there was a supported way to do this. Either

    • A configure option to exclude the regcomp etc from the build completely
    • Put regcomp etc in a separate library

    Either method would be ok.

    opened by pem 0
Releases(Onigmo-6.2.0)
  • Onigmo-6.2.0(Jan 30, 2019)

    Changes from Onigmo 6.1.3 are listed below.

    Improvement

    • Add USE_CASE_MAP_API configuration. (PR #125)
    • Update Unicode data: Unicode 11.0.0, Emoji 11.0 (PR #112)
    • Make it possible to extend UTF-8 to 31 bits. (PR #111)
    • Support gperf 3.1 with backward compatibility. (PR #101)

    Bug fixes

    • Import the latest code from Ruby (PR #112)
    • Fix that "ss" in look-behind causes syntax error. (Issue #92) (PR #116)
    • Fix performance regression if quantifier lower bound is 1. (Issue #100) (PR #114)
    • Fix performance problem with /k/i and /s/i. (Issue #97) (PR #113) (Issue #120) (PR #121)
  • Onigmo-6.1.3(Sep 26, 2017)

    Changes from Onigmo 6.1.2 are listed below.

    Security fixes

    • PR #91
      • https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-9224
      • https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-9226
      • https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-9227
      • https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-9228
      • https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-9229

    Bug fixes

    • Add a declaration of onig_end() in onigmoposix.h
    • Fix .*\b (Issue #96)
    • Don't include shift_jis.c from windows_31j.c (Issue #88)
    onigmo-6.1.3.tar.gz(802.85 KB)
  • Onigmo-6.1.2(May 15, 2017)

    Changes from Onigmo 6.1.1 are listed below.

    Security fixes

    • Initialize return values. (Ruby r57660)
      • https://bugs.ruby-lang.org/issues/13234
      • https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-6181

    Improvement

    • Improve document about absence operator (Issue #87)
      The name of the operator was changed from "absent" to "absence".
    • Import the latest enc-unicode.rb from Ruby r58070. (Ruby r58065, r58066, r58069 and r58070.)

    Bug fixes

    • Fix macro expansion bug. (Ruby r58468)
    • Fix UTF-32 valid_encoding?. (Ruby r57816)
    • Fix missing const on onig_new_without_alloc. (Issue #85)
    onigmo-6.1.2.tar.gz(802.40 KB)
  • Onigmo-6.1.1(Jan 29, 2017)

  • Onigmo-6.1.0(Jan 16, 2017)

    Changes from Onigmo 6.0.0 are listed below.

    New features

    • Support absent operator (?~subexp) (Issue #82)
      This new operator matches any string that doesn't contain a string matching subexp. See the documentation for details.

    Bug fixes

    • mac: Fix loading library (PR #84)
    • Fix out-of-bounds read in set_bm_skip() (Issue #81)
    • Ignore /sample/scan (PR #80)
    • Suppress warning: sign compare (PR #79)
    • CRuby enc/*.c needs other way to detect (PR #78)
    • Fix backward search with .* (Issue #69)
    onigmo-6.1.0.tar.gz(802.04 KB)
  • Onigmo-6.0.0(Dec 10, 2016)

    This is the first major release of Onigmo. :tada: Changes from Onigmo 5.15.0 are listed below.

    Incompatible changes

    • The header file, library file and other files are renamed from oniguruma* or onig* to onigmo*. This aims to avoid conflict with Oniguruma 6.x. (Issue #66)
    • Source code for Ruby 2.x in ruby-2.x branch is now fully merged into the master branch. This includes the following changes:
      • Ruby specific parts are now surrounded by #ifdef RUBY .. #endif.
      • Ruby specific onig_compile() is now named onig_compile_ruby().
      • Some APIs now take an end parameter. E.g. onigenc_get_prev_char_head().
      • Some encoding names were changed. E.g.: "CP932" -> "Windows-31J".
      • Encoding structs and syntax structs are now constants.
    • The behavior of [[:punct:]] on Unicode encodings is changed. Now [[:punct:]] matches the nine characters $+<=>^|~` on all encodings. (Issue #42)
    • Drop support for very old compilers. For MSVC, VC2005 or later is required. For other platforms, ANSI C89 is required. (Issue #72)
    • Remove ONIG_OPTION_POSIX_REGION. (Issue #75)
    • The values of some constants were changed. E.g. ONIG_OPTION_NOTBOS
    • Remove onigenc_set_default_caseconv_table().
    • All THREAD_* macros are removed, because Onigmo is now thread-safe.

    New features

    • Support Token Threaded VM on GCC and Clang. (PR #52)
    • Update Unicode database from 7.0.0 to 9.0.0.
    • Support true extended grapheme cluster. (Issue #46)
    • Add onig_{get,set}_parse_depth_limit(). (Issue #68)
    • Add onig_initialize().
    • Add onig_scan().
    • Support \uHHHH in Ruby syntax.
    • Support \o{OOO} in Perl syntax.
    • Use separate build directories for x86 and x64 on Windows. (Issue #67)
    • Some new encodings were imported from Ruby: KOI8-U, Windows-1250, Windows-1251, Windows-1252, Windows-1253, Windows-1254 and Windows-1257.
    • Support multiprocess build on MSVC.
    • Add build_nmake script for building with nmake. (PR #47)

    Bug fixes

    • Fix that deeply nested capture groups cause stack overflow. (Issue #68)
    • Fix that some patterns cause crash. (Issue #65, etc.)
    • Fix that \G doesn't match properly with onig_search_gpos(). (Issue #53)
    • Fix that look-behind does not take Unicode case-folding into account. (Issue #18)
    • Fix that wrong capturing occurs after recursive match. (Issue #48)
    • Fix build error on systems using BSD make. (PR #55)
    • Fix that the behavior of named backreferences in Perl syntax differed from Perl. (Issue #74)
    • etc.

    Many changes from Ruby and also from Oniguruma were merged.

    onigmo-6.0.0.tar.gz(800.81 KB)
Owner
K.Takata