Typesense is a fast, typo-tolerant search engine for building delightful search experiences.

Overview

Typesense


Typesense Demo

Here are a couple of live demos that show Typesense in action on large datasets:

Features

  • Typo Tolerance: Handles typographical errors elegantly, out-of-the-box.
  • Simple and Delightful: Simple to set up, integrate with, operate and scale.
  • Blazing Fast: Built in C++. Meticulously architected from the ground-up for low-latency (<50ms) instant searches.
  • Tunable Ranking: Easy to tailor your search results to perfection.
  • Sorting: Sort results based on a particular field at query time (helpful for features like "Sort by Price (asc)").
  • Faceting & Filtering: Drill down and refine results.
  • Grouping & Distinct: Group similar results together to show more variety.
  • Federated Search: Search across multiple collections (indices) in a single HTTP request.
  • Scoped API Keys: Generate API keys that only allow access to certain records, for multi-tenant applications.
  • Synonyms: Define words as equivalents of each other, so searching for a word will also return results for the synonyms defined.
  • Curation & Merchandizing: Boost particular records to a fixed position in the search results, to feature them.
  • Raft-based Clustering: Set up a distributed cluster that is highly available.
  • Seamless Version Upgrades: As new versions of Typesense come out, upgrading is as simple as swapping out the binary and restarting Typesense.
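
To make the Scoped API Keys feature more concrete, here is a minimal sketch of how such a key could be derived offline from a parent search-only key. It assumes the HMAC-SHA256 scheme that the official Python client appears to use in its key-generation helper; the exact byte layout is an assumption and should be checked against the client you actually use. The parent key and the `filter_by` value are made up for illustration.

```python
import base64
import hashlib
import hmac
import json


def generate_scoped_search_key(search_key: str, embedded_params: dict) -> str:
    """Derive a scoped key from a parent search-only key (assumed scheme)."""
    params_json = json.dumps(embedded_params)
    # Sign the embedded parameters with the parent key.
    signature = hmac.new(
        search_key.encode("utf-8"), params_json.encode("utf-8"), hashlib.sha256
    ).digest()
    digest = base64.b64encode(signature).decode("utf-8")
    # Assumed layout: signature + first 4 characters of the parent key + parameters.
    raw = digest + search_key[:4] + params_json
    return base64.b64encode(raw.encode("utf-8")).decode("utf-8")


# Hypothetical parent key and tenant filter, for illustration only.
scoped_key = generate_scoped_search_key(
    "parent-search-only-key", {"filter_by": "company_id:124"}
)
print(scoped_key)
```

Because the parameters are signed rather than encrypted, anyone holding the scoped key can read the embedded filter; under this scheme they just cannot change it without invalidating the signature.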

Don't see a feature on this list? Search our issue tracker to see if someone has already requested it and upvote it, or open a new issue if not. We prioritize our roadmap based on user feedback, so we'd love to hear from you.

Who's using this?

Typesense is used by a range of users across different industries. We've only recently started documenting who's using it in our Showcase.

If you'd like to be included in the list, please feel free to edit SHOWCASE.md and send us a PR.

Install

Option 1: You can download the binary packages that we publish for Linux (x86-64) and Mac.

Option 2: You can also run Typesense from our official Docker image.

Option 3: Spin up a managed cluster with Typesense Cloud:

Deploy with Typesense Cloud

Quick Start

Here's a quick example showcasing how you can create a collection, index a document and search it on Typesense.

Let's begin by starting the Typesense server via Docker:

docker run -p 8108:8108 -v/tmp/data:/data typesense/typesense:0.19.0 --data-dir /data --api-key=Hu52dwsas2AdxdE

We have API Clients in a couple of languages, but let's use the Python client for this example.

Install the Python client for Typesense:

pip install typesense

We can now initialize the client and create a companies collection:

import typesense

client = typesense.Client({
  'api_key': 'Hu52dwsas2AdxdE',
  'nodes': [{
    'host': 'localhost',
    'port': '8108',
    'protocol': 'http'
  }],
  'connection_timeout_seconds': 2
})

create_response = client.collections.create({
  "name": "companies",
  "fields": [
    {"name": "company_name", "type": "string" },
    {"name": "num_employees", "type": "int32" },
    {"name": "country", "type": "string", "facet": True }
  ],
  "default_sorting_field": "num_employees"
})

Now, let's add a document to the collection we just created:

document = {
 "id": "124",
 "company_name": "Stark Industries",
 "num_employees": 5215,
 "country": "USA"
}

client.collections['companies'].documents.create(document)

Finally, let's search for the document we just indexed:

search_parameters = {
  'q'         : 'stork',
  'query_by'  : 'company_name',
  'filter_by' : 'num_employees:>100',
  'sort_by'   : 'num_employees:desc'
}

client.collections['companies'].documents.search(search_parameters)

Did you notice the typo in the query text? No big deal. Typesense handles typographic errors out-of-the-box!
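
The search call returns a dictionary. As a rough sketch, assuming the response carries a `found` count and a `hits` list whose entries wrap each stored record under a `document` key, the matches can be pulled out as shown here. The `sample_response` is hand-written for illustration and is not real server output; `matched_documents` is a hypothetical helper, not part of the client.

```python
def matched_documents(response: dict) -> list:
    """Return the stored documents from a search response (assumed shape)."""
    return [hit["document"] for hit in response.get("hits", [])]


# Hand-written stand-in for a response; not captured from a real server.
sample_response = {
    "found": 1,
    "hits": [
        {
            "document": {
                "id": "124",
                "company_name": "Stark Industries",
                "num_employees": 5215,
                "country": "USA",
            }
        }
    ],
}

for doc in matched_documents(sample_response):
    print(doc["id"], doc["company_name"])
```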

Step-by-step Walk-through

A step-by-step walk-through is available on our website here.

This will guide you through the process of starting up a Typesense server, indexing data in it and querying the data set.

API Documentation

Here's our official API documentation, available on our website: https://typesense.org/api.

If you notice any issues with the documentation or walk-through, please let us know or send us a PR here: https://github.com/typesense/typesense-website.

API Clients

While you can definitely use cURL to interact with Typesense Server directly, we offer official API clients to simplify using Typesense from your language of choice. The API clients come with a built-in smart retry strategy to ensure that API calls made via them are resilient, especially in an HA setup.

If we don't offer an API client in your language, you can still use any popular HTTP client library to access Typesense's APIs directly.

Here are some community-contributed clients and integrations:

We welcome community contributions to add more official client libraries and integrations. Please reach out to us at [email protected] or open an issue on GitHub to collaborate with us on the architecture. 🙏

Search UI Components

You can use our InstantSearch.js adapter to quickly build powerful search experiences, complete with filtering, sorting, pagination and more.

Here's how: https://typesense.org/docs/0.19.0/guide/#search-ui

Benchmarks

We tested a dataset with ~3 million records (Amazon product data) that was ~13GB on disk and we were able to achieve a throughput of 250 concurrent search queries per second on a 16GB RAM, 8-vCPU 3-node Typesense cluster.

We'd love to benchmark with larger datasets, if we can find large ones in the public domain. If you have any suggestions for structured datasets that are open, please let us know by opening an issue.

We'd also be delighted if you're able to share benchmarks from your own large datasets. Please send us a PR!

FAQ

How does this differ from Elasticsearch?

Elasticsearch is a large piece of software that takes a non-trivial amount of effort to set up, administer, scale and fine-tune. It offers you a few thousand configuration parameters to get to your ideal configuration. So it's better suited for large teams who have the bandwidth to get it production-ready, regularly monitor it and scale it, especially when they need to store billions of documents and petabytes of data (e.g. logs).

Typesense is built specifically for decreasing the "time to market" for a delightful search experience. It is a light-weight yet powerful & scalable alternative that focuses on Developer Happiness and Experience with a clean, well-documented API, clear semantics and smart defaults, so it just works well out-of-the-box, without you having to turn many knobs.

Elasticsearch also runs on the JVM, which by itself can be quite an effort to tune to run optimally. Typesense, on the other hand, is a single light-weight self-contained native binary, so it's simple to set up and operate.

See a side-by-side feature comparison here.

How does this differ from Algolia?

Algolia is a proprietary, hosted, search-as-a-service product that works well when cost is not an issue. From our experience, fast growing sites and apps quickly run into search & indexing limits, accompanied by expensive plan upgrades as they scale.

Typesense, on the other hand, is an open-source product that you can run on your own infrastructure or use via our managed SaaS offering, Typesense Cloud. The open source version is free to use (besides, of course, your own infra costs). With Typesense Cloud we do not charge by records or search operations. Instead, you get a dedicated cluster and you can throw as much data and traffic at it as it can handle. You only pay a fixed hourly cost & bandwidth charges for it, depending on the configuration you choose, similar to most modern cloud platforms.

From a product perspective, Typesense is closer in spirit to Algolia than Elasticsearch. However, we've addressed some important limitations with Algolia:

Algolia requires separate indices for each sort order, which counts towards your plan limits. Most of the index settings like fields to search, fields to facet, fields to group by, ranking settings, etc are defined upfront when the index is created vs being able to set them on the fly at query time.

With Typesense, these settings can be configured at search time via query parameters which makes it very flexible and unlocks new use cases. Typesense is also able to give you sorted results with a single index, vs having to create multiple. This helps reduce memory consumption.
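
As a small illustration of choosing the sort order at query time, the sketch here builds two sets of search parameters that target the same collection and differ only in `sort_by`. The `field:order` syntax matches the Quick Start example; `with_sort` is a hypothetical helper written for this illustration, not part of any client.

```python
base_query = {"q": "stark", "query_by": "company_name"}


def with_sort(query: dict, field: str, order: str = "desc") -> dict:
    """Return a copy of the query with a sort_by clause added."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    return {**query, "sort_by": f"{field}:{order}"}


# Same collection, same index, two different orderings.
largest_first = with_sort(base_query, "num_employees", "desc")
smallest_first = with_sort(base_query, "num_employees", "asc")
print(largest_first)
print(smallest_first)
```

Either dictionary could then be passed to the same `documents.search(...)` call used in the Quick Start.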

Algolia offers the following features that Typesense does not have currently: geospatial searches, personalization & server-based search analytics. With Typesense, we intend to bridge this gap, but in the meantime, please let us know if any of these are a showstopper for your use case by creating a feature request in our issue tracker.

See a side-by-side feature comparison here.

Speed is great, but what about the memory footprint?

A fresh Typesense server will consume about 30 MB of memory. As you start indexing documents, the memory use will increase correspondingly. How much it increases depends on the number and type of fields you index.

We've strived to keep the in-memory data structures lean. To give you a rough idea: when 1 million Hacker News titles are indexed along with their points, Typesense consumes 165 MB of memory. The same size of that data on disk in JSON format is 88 MB. If you have any numbers from your own datasets that we can add to this section, please send us a PR!
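
For a back-of-the-envelope estimate, the Hacker News figures above can be turned into a memory-to-JSON ratio. The sketch assumes memory grows linearly with the JSON size of the data, which the numbers above do not guarantee, since usage depends on the number and type of fields. Treat the result as a rough starting point only.

```python
HN_MEMORY_MB = 165  # memory used for 1 million Hacker News titles (from the text above)
HN_JSON_MB = 88     # size of the same data on disk as JSON (from the text above)

MEMORY_TO_JSON_RATIO = HN_MEMORY_MB / HN_JSON_MB


def estimate_memory_mb(json_size_mb: float) -> float:
    """Rough estimate that assumes the Hacker News ratio carries over."""
    return round(json_size_mb * MEMORY_TO_JSON_RATIO, 1)


print(round(MEMORY_TO_JSON_RATIO, 3))
print(estimate_memory_mb(500))  # hypothetical 500 MB JSON dataset
```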

Why the GPL license?

From our experience, companies are generally concerned when libraries they use are GPL licensed, since library code is directly integrated into their code, which leads to derivative work and triggers GPL compliance. However, Typesense Server is server software and we expect users to typically run it as a separate daemon, and not integrate it with their own code. GPL covers and allows for this use case generously (e.g. Linux is GPL licensed). It is AGPL, not GPL, that makes server software accessed over a network result in derivative work. For that reason we’ve opted to not use AGPL for Typesense.

Now, if someone makes modifications to Typesense Server, GPL actually allows them to keep the modifications to themselves as long as they don't distribute the modified code. So a company can, for example, modify Typesense Server and run the modified code internally and still not have to open source their modifications, as long as they make the modified code available to everyone who has access to the modified software.

Now, if someone makes modifications to Typesense server and distributes the modifications, that's where GPL kicks in. Given that we’ve published our work to the community, we'd like for others' modifications to also be made open to the community in the spirit of open source. We use GPL for this purpose. Other licenses would allow our open source work to be modified, made closed source and distributed, which we want to avoid with Typesense for the project’s long term sustainability.

Here's more background on why GPL, as described by Discourse: https://meta.discourse.org/t/why-gnu-license/2531. Many of the points mentioned there resonate with us.

Now, all of the above only applies to Typesense Server. Our client libraries are indeed meant to be integrated into our users’ code and so they use the Apache license.

So in summary, AGPL is what is usually problematic for server software and we’ve opted not to use it. We believe GPL for Typesense Server captures the essence of what we want for this open source project. GPL has a long history of successfully being used by popular open source projects. Our libraries are still Apache licensed.

If you have specifics that prevent you from using Typesense due to a licensing issue, we're happy to explore this topic further with you. Please reach out to us.

Support

👋 🌐 New: If you have general questions about Typesense, want to say hello or just follow along, we'd like to invite you to join our Slack Community.

We also do virtual office hours every Friday. Reserve a time slot here.

If you run into any problems or issues, please create a GitHub issue and we'll try our best to help.

We strive to provide good support through our issue trackers on GitHub. However, if you'd like to receive private & prioritized support with:

  • Guaranteed SLAs
  • Phone / video calls to discuss your specific use case and get recommendations on best practices
  • Private discussions over Slack
  • Guidance around deployment, ops and scaling best practices
  • Prioritized feature requests

We do offer Paid Support options. Please reach out to us at [email protected] to sign up.

Contributing

We are a lean team on a mission to democratize search and we'll take all the help we can get! If you'd like to get involved, here's information on where we could use your help: Contributing.md

Getting Latest Updates

If you'd like to get updates when we release new versions, click on the "Watch" button on the top and select "Releases only". GitHub will then send you notifications along with a changelog with each new release.

We also post updates to our Twitter account about releases and additional topics related to Typesense. Follow us here: @typesense.

👋 🌐 New: We'll also post updates on our Slack Community.

Build from source

Building with Docker

The docker build script takes care of all required dependencies, so it's the easiest way to build Typesense:

TYPESENSE_VERSION=nightly ./docker-build.sh --build-deploy-image --create-binary [--clean] [--depclean]

Building on your machine

Typesense requires the following dependencies:

  • C++11 compatible compiler (GCC >= 4.9.0, Apple Clang >= 8.0, Clang >= 3.9.0)
  • Snappy
  • zlib
  • OpenSSL (>=1.0.2)
  • curl
  • ICU
  • brpc
  • braft

./build.sh --create-binary [--clean] [--depclean]

The first build will take some time since other third-party libraries are pulled and built as part of the build process.


© 2016-2021 Typesense Inc.

Comments
  • Nodes and Kubernetes

    Description

    I decided to create a Helm chart (which I'd like to contribute) that can start a Typesense cluster. A couple of Typesense limitations get in the way:

    • Typesense needs a file listing the IP addresses of the nodes. It is not possible to know the IPs of future nodes in advance; we can only know their hostnames (a StatefulSet in Kubernetes gives pods deterministic names).
    • Typesense doesn't provide dynamic node creation. In Kubernetes, a HorizontalPodAutoscaler can be used to grow the cluster once a certain resource threshold is reached.

    The second point can be set aside for now, but the first one is a blocker.

    Do you think it would be possible to use node names (resolved by DNS) instead of IPs?

    Note: I tried to set up the "nodes" file with resolvable names, but it failed.
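
One possible stopgap, sketched under assumptions: resolve each hostname to an IP just before startup and write the result in the comma-separated `ip:peering_port:api_port` form that appears in Typesense log output. The hostnames, the default ports and the injected resolver below are hypothetical, and this does nothing about IPs changing after a pod restart.

```python
import socket
from typing import Callable, Iterable


def build_nodes_line(
    hostnames: Iterable[str],
    peering_port: int = 8107,
    api_port: int = 8108,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> str:
    """Resolve hostnames and join them as ip:peering_port:api_port entries."""
    return ",".join(
        f"{resolve(host)}:{peering_port}:{api_port}" for host in hostnames
    )


# A fake lookup table keeps the example runnable without a cluster.
fake_dns = {"typesense-0.ts": "10.0.0.11", "typesense-1.ts": "10.0.0.12"}
nodes_line = build_nodes_line(fake_dns, resolve=fake_dns.__getitem__)
print(nodes_line)
```

In a real deployment the default resolver (`socket.gethostbyname`) would be used and the resulting line written to the nodes file before the server starts.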

    opened by metal3d 43
  • Feature requests (running list)

    1. Allow search fields to be inferred

    When a collection has been set up, it's understood that its fields are to be indexed, and hence to be searched. If no query_by is sent in the params, the search should be performed across all fields.

    2. Hide certain fields from being sent in the response

    out_of and search_time_ms should not be sent except for the Admin key. I thought a search-only key would automatically hide them, but they still show up. Then I created a Scoped Search Key with "exclude_fields":"out_of", but it still shows up in search results.

    3. Control the snippet length

    Allow controlling the length of the snippet that is generated. It's currently too short.

    4. Control "allowed domains" under CORS

    Right now CORS is a single on/off setting. Is it possible to enhance this to support a set of named & wildcarded domains instead, like Django's CORS_ORIGIN_WHITELIST?

    fix-available 
    opened by zehawki 34
  • Maximizing good results and Minimizing irrelevant results

    We are trying to get good settings for searching thousands of items which would generate minimal irrelevant results. Example searches with different settings for drop tokens, num typos, and typo tokens.

    queryBy: "description,name,categories,attrib", num_typos: gtypo, drop_tokens_threshold: gdrop, typo_tokens_threshold: gttoken,

    Example 1: Searching for this Item with Drop tokens set to 0: IFP5550-BUN Viewsonic Viewboard Ifp5550 Collaboration Display - 55 Lcd - Arm Cortex A53 1.20 Ghz - 2 Gb - Infrared (irda) - Touchscreen - 16:9 Aspect Ratio - 3840 X 2160 - Led - 350 Nit - 1,200:1 Contrast Ratio - 2160p - Usb - Hdmi - Vga - Android 5.1 Lollipop Ez Bu`

    a) When searching for "IFP5550BUN" the item is found, but when searching for "IFP5550BUN Viewsonic" it is no longer found.
    b) When searching for "ifp5550 viewsonic bun" with typo tokens disabled, it finds the 1 item above, but when trying to filter those results on the brand "viewsonic", more items are added to the search results.

    Example 2: Having these 3 items: "SPZ007 space", "SPZ008 spacepole", "SPZ005 space"

    a) When searching for "SPZ00": items SPZ008 and SPZ005 are found, but item SPZ007 is not found when typo tokens are set to 0. It is found with typo tokens set to more than 0.
    b) When searching for "spz00 space", no items are found at all with settings set to 0.
    c) When searching for "spz00 space" with num typos set to 0, drop tokens set to more than 0 and typos kept at 1: thousands of 'irrelevant' items are found that have the word "space" in them but no mention of "spz00".
    d) When num typos is increased from 0 to 1, fewer results are found in the above search, not more.
    e) When searching for "spz00 space" with typos set to 1 and drop tokens set to 1, no items are found. But when typo tokens are also increased to more than one, hundreds of items are found, such as "Sp-800, Spare", that seem quite distant from "spz00 space".
    f) When num typos is increased further to 3 for the above search, there are hundreds FEWER results, but they seem even more irrelevant, with matches such as "TSP700II" highlighted 10 items down the list while the first items don't even have a highlight on a matching word.

    opened by dianos 32
  • Range filter works incorrectly (URL encoding issue)

    Description

    I tried to filter documents by duration int32 field using the following filter expression:

    duration:<=1200
    

    it returned 506 pages

    then I appended another condition:

    duration:>=600 && duration:<=1200
    

    Obviously it should return at most 506 pages, but it returns 1185 pages.

    P.S. filter duration:>=600 also returns 1185 pages

    I tried other combinations like duration: [>=600, <=1200] without success.
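
The issue title points at URL encoding. As an illustration only, using Python's standard `urllib.parse` and not Typesense's own request parser, the sketch below shows how a raw `&&` in a query string is read as parameter separators, so only the first condition stays in `filter_by`, and how percent-encoding keeps the whole expression intact. The `q=*` parameter is added here just to make the query string realistic.

```python
from urllib.parse import parse_qs, urlencode

filter_by = "duration:>=600 && duration:<=1200"

# Naive concatenation: the raw "&&" is treated as parameter separators,
# so the filter is cut off after the first condition.
naive_query = "q=*&filter_by=" + filter_by
print(parse_qs(naive_query)["filter_by"])

# Percent-encoding keeps the whole expression inside one parameter.
encoded_query = urlencode({"q": "*", "filter_by": filter_by})
print(encoded_query)
print(parse_qs(encoded_query)["filter_by"])
```

This would be consistent with the observation that the combined filter returns the same 1185 pages as `duration:>=600` on its own.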

    bug fix-available 
    opened by v-byte-cpu 31
  • Unable to run 0.23 on Linux arm64 on Jetson Nano 4

    Description

    Likely an issue with required libraries (possibly glibc)

    Steps to reproduce

    install the docker image or the linux arm64 .deb binary

    Expected Behavior

    Typesense server starts without crashing

    Actual Behavior

    $ ./typesense-server --data-dir=/tmp/typesense-data --api-key=xyz --enable-cors
    I20220604 18:36:24.442307 31503 typesense_server_utils.cpp:345] Starting Typesense 0.23.0
    I20220604 18:36:24.442425 31503 typesense_server_utils.cpp:348] Typesense is using jemalloc.
    I20220604 18:36:24.443177 31503 typesense_server_utils.cpp:397] Thread pool size: 32
    I20220604 18:36:24.461531 31503 store.h:61] Initializing DB by opening state dir: /tmp/typesense-data/db
    I20220604 18:36:24.517038 31503 store.h:61] Initializing DB by opening state dir: /tmp/typesense-data/meta
    I20220604 18:36:24.579599 31503 typesense_server_utils.cpp:478] Starting API service...
    I20220604 18:36:24.579901 31627 typesense_server_utils.cpp:248] Since no --nodes argument is provided, starting a single node Typesense cluster.
    I20220604 18:36:24.579946 31628 batched_indexer.cpp:120] Starting batch indexer with 32 threads.
    I20220604 18:36:24.580034 31503 http_server.cpp:174] Typesense has started listening on port 8108
    I20220604 18:36:24.582826 31628 batched_indexer.cpp:126] BatchedIndexer skip_index: -9999
    I20220604 18:36:24.588387 31627 server.cpp:1045] Server[braft::RaftStatImpl+braft::FileServiceImpl+braft::RaftServiceImpl+braft::CliServiceImpl] is serving on port=8107.
    I20220604 18:36:24.588434 31627 server.cpp:1048] Check out http://nvjet:8107 in web browser.
    I20220604 18:36:24.589870 31627 raft_server.cpp:65] Nodes configuration: 192.168.0.149:8107:8108
    E20220604 18:36:25.137292 31627 backward.hpp:4199] Stack trace (most recent call last) in thread 31627:
    E20220604 18:36:25.137346 31627 backward.hpp:4199] #6    Source "/build/glibc-bwB5WE/glibc-2.27/nptl/pthread_create.c", line 463, in start_thread [0x7fb1aee087]
    E20220604 18:36:25.137372 31627 backward.hpp:4199] #5    Object "/home/everdrone/Desktop/projects/typesense/typesense-server", at 0x130b8cb, in execute_native_thread_routine
    E20220604 18:36:25.137390 31627 backward.hpp:4199] #4    Source "/typesense/src/typesense_server_utils.cpp", line 454, in operator() [0x5dcb47]
    E20220604 18:36:25.137408 31627 backward.hpp:4199] #3    Source "/typesense/src/typesense_server_utils.cpp", line 293, in start_raft_server [0x5dc2ef]
    E20220604 18:36:25.137423 31627 backward.hpp:4199] #2    Source "/typesense/src/raft_server.cpp", line 96, in start [0x5c006f]
    E20220604 18:36:25.137440 31627 backward.hpp:4199] #1    Source "/opt/braft-938eeb5f67dd9ef592f7ec9bd37b9b822980a2c5/src/braft/raft.cpp", line 137, in Node [0x69244f]
    E20220604 18:36:25.137456 31627 backward.hpp:4199] #0  | Source "/opt/braft-938eeb5f67dd9ef592f7ec9bd37b9b822980a2c5/src/braft/node.cpp", line 151, in operator<<
    E20220604 18:36:25.137472 31627 backward.hpp:4199]     | Source "/usr/local/include/bvar/reducer.h", line 199, in modify<bvar::detail::AddTo<long int>, long int>
    E20220604 18:36:25.137487 31627 backward.hpp:4199]     | Source "/usr/local/include/bvar/detail/combiner.h", line 144, in compare_exchange_weak
    E20220604 18:36:25.137503 31627 backward.hpp:4199]     | Source "/usr/local/include/c++/10.3.0/bits/atomic_base.h", line 487, in compare_exchange_weak
    E20220604 18:36:25.137519 31627 backward.hpp:4199]       Source "/usr/local/include/c++/10.3.0/bits/atomic_base.h", line 464, in NodeImpl [0x675284]
    Illegal instruction (Illegal opcode [0x675284])
    E20220604 18:36:25.697453 31627 typesense_server.cpp:95] Typesense 0.23.0 is terminating abruptly.
    [1]    31503 illegal hardware instruction (core dumped)  ./typesense-server --data-dir=/tmp/typesense-data --api-key=xyz --enable-cors
    

    Metadata

    Typesense Version: 0.23

    OS: Ubuntu 18.04.6 LTS bionic

    opened by everdrone 28
  • Typesense CPU is always at 100%

    Description

    CPU consumption spikes from time to time, roughly once every 24 hours or more, and stays high for about 15 to 30 minutes.

    Steps to reproduce

    Just keep it running and do some search queries.

    Expected Behavior

    Typesense runs without hitting 100% CPU, or hits 100% only when traffic spikes or something similar happens.

    Actual Behavior

    Typesense stays at 100% CPU usage for a long time.

    Metadata

    Typesense Version: 0.23

    OS: Linux ARM V8

    opened by amine-y 26
  • [docs] How to backup cluster data ?

    Description

    I want to know how to back up all cluster data. So far I have found only the "Export documents from a collection" API. Moreover, I did not find any configuration options for exporting documents. I assume that this operation should at least have a batch size, like the Elasticsearch Scroll API.

    I also suggest enabling a discussions section, as in the MeiliSearch project (https://github.com/meilisearch/MeiliSearch/discussions), for questions like this.

    Expected Behavior

    Any option to backup all cluster data.

    question 
    opened by v-byte-cpu 24
  • Creating a specific key

    Description

    I would like to be able to specify a specific key for searching, like --search-only-api-key, but it's been marked deprecated and doesn't seem to work either. It seems keys created via the key management endpoint are always randomly generated.

    I'm using typesense for read-only data, and sometimes I reload the data with another snapshot. Instead of mutating the existing collection, I would like to simply start from scratch, create the collection, fill it with data, and re-create the same key for searching.

    Is there any way to do this?

    Cheers!

    bug enhancement fix-available 
    opened by fbergroth 23
  • pinned_hits not working on the first page and pinned items don't get removed from the search list

    Description

    I'm trying to do two queries:

    1. to get a list of Ids of data 
    2. to get a list of data that has certain ones on the top by using the list from the first query as `pinned_hits` value
    

    It works if the length of pinned_hits is less than or equal to the per_page number of the second query, but the pinned data still remains in its original position, which makes it look like the search result has duplicate data.

    It doesn't work when the length of pinned_hits is bigger than the per_page number. On the first page, nothing gets pinned, but items are pinned on the second and third pages. Duplicates still remain in the search result.
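
For reference, the two-query flow described above can be sketched as follows, assuming `pinned_hits` takes a comma-separated list of `record_id:position` pairs. The IDs are hypothetical, `build_pinned_hits` is a helper written for this illustration, and only the parameters relevant to pinning are shown.

```python
def build_pinned_hits(ids, start_position=1):
    """Join record IDs into 'id:position' pairs, in the order given."""
    return ",".join(
        f"{doc_id}:{position}"
        for position, doc_id in enumerate(ids, start=start_position)
    )


# Hypothetical IDs returned by the first query (per_page = 4).
first_query_ids = ["12", "7", "31", "4"]

# Relevant part of the second query's parameters.
second_query = {
    "q": "*",
    "per_page": 8,
    "pinned_hits": build_pinned_hits(first_query_ids),
}
print(second_query["pinned_hits"])
```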

    Steps to reproduce

    1. Do the first query with per_page value 4, and the second query with per_page value 4 or bigger and pinned_hits value as the list of data Ids from the first query.

    2. Do the first query with per_page value 250, and the second query with per_page value 8 and pinned_hits value as the list of data Ids from the first query.

    Expected Behavior

    1. Do the first query with per_page value 4, and the second query with per_page value 4 or bigger and pinned_hits value as the list of data Ids from the first query. -> Pinned items appear on the top of the list and get removed from the original position, so they don't repeat in the search result.

    2. Do the first query with per_page value 250, and the second query with per_page value 8 and pinned_hits value as the list of data Ids from the first query. -> First page should display the pinned items on the top of the list and those pinned items get removed from the original position.

    Actual Behavior

    1. Do the first query with per_page value 4, and the second query with per_page value 4 or a bigger number and pinned_hits value as the list of data Ids from the first query. -> Pinned items appear on the top of the list but it doesn't get removed from the original position so that there are repeated data in the search result.

    2. Do the first query with per_page value 250, and the second query with per_page value 8 and pinned_hits value as the list of data Ids from the first query. -> Nothing is pinned on the first page, but items are pinned on the second and third pages. Those pinned items don't get removed from the original position so they are repeated in the search result.

    Metadata

    Typesense Version: 0.17.0.rc4

    OS: Mac

    opened by axionos 22
  • Implement infix/suffix searching for specific fields

    Description

    Currently Typesense cannot find results if the search term isn't at the beginning of the word (prefix search). For example, the search query "DEF" would not return the result "ABCDEF", but "ABC" would.

    Suffix search would solve the problem when the searched term is at the end of the word, but that would still miss results where the searched string is neither at the start nor the end of the word. Suffix search is apparently also complex to implement.

    As a simpler approach, a form of brute force searching could be enabled on a field by field basis. Brute force searching quickly becomes a bottleneck on larger datasets, but when dealing with "smaller" data, this might be viable. For example, when a field contains only short strings, the search could complete in a reasonable time. It would be up to the user to enable brute force searching, knowing the trade offs.

    As a real life example, a product has the code 157B2210. This product should be returned when the search query is B2210 or even B22, if brute force searching is enabled.

    A workaround was suggested for the meantime: each character combination (substring) could be split into an array field and the search performed against that. But even a ten character long string has 55 unique substrings so the amount of data would get unnecessarily large if a better solution could be found.
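
The workaround and its cost can be sketched in a few lines. The helper below enumerates every contiguous substring of a value; the idea of storing the result in a separate array field (called `code_infixes` here) is an assumption made for illustration, not an existing Typesense feature.

```python
def all_substrings(text: str) -> list:
    """Return every contiguous substring of text, by start and end position."""
    n = len(text)
    return [text[i:j] for i in range(n) for j in range(i + 1, n + 1)]


# A ten-character string yields 10 * 11 / 2 = 55 substrings.
print(len(all_substrings("ABCDEFGHIJ")))

code = "157B2210"
document = {
    "code": code,
    # Hypothetical array field holding the de-duplicated substrings.
    "code_infixes": sorted(set(all_substrings(code))),
}
print("B2210" in document["code_infixes"], "B22" in document["code_infixes"])
```

The count grows quadratically with the string length, which is why this approach only suits fields that hold short strings.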

    enhancement 
    opened by TatuUlmanen 19
  • Question: How to setup high availability with Docker and Nginx

    Description

    I'm trying to set up a Typesense cluster with high availability.

    Currently, I have Typesense running in a Docker container, and Nginx proxies incoming requests to Typesense. When using just one node, this works flawlessly so far.

    However, I'm having issues defining the IP addresses for the --nodes argument.

    W20210209 05:01:44.249701    77 raft_server.cpp:482] Multi-node with no leader: refusing to reset peers.
    I20210209 05:01:49.069952    82 node.cpp:1484] node default_group:172.17.0.2:8107:8108 term 1 start pre_vote
    W20210209 05:01:49.070509    82 node.cpp:1494] node default_group:172.17.0.2:8107:8108 can't do pre_vote as it is not in [list of IP addresses, not unlike the example from the guide "192.168.12.1:8107:8108,192.168.12.2:8107:8108,192.168.12.3:8107:8108"]
    

    I did remove the specific IP addresses from that log.

    In short, it seems like the issue is that because of the Nginx proxy, the IP addresses that the nodes will need to use to communicate with each other are not the actual IP addresses of any one TypeSense image (since all inbound HTTP requests are proxied to localhost).

    Has anyone worked through a similar situation?

    Metadata

    Typesense Version: 0.19.0.rc16 (docker)

    OS: Ubuntu 20.04

    opened by Nick-Mazuk 19
  • Is there a way to import data directly instead of REST API for large batches

    For a large number of records, batch import via the REST interface likely adds significant overhead, especially when there are multiple large collections and frequent imports/syncs.
    Is there a CLI utility that calls the indexer functions in C++ directly, i.e. streams a JSONL file from disk (on the Typesense server host), then builds and saves the index to RocksDB, without the whole HTTP loop and throttling?

    opened by akamil-etsy 0
  • Add q parameter to export method

    Description

    Today, the export method allows the filter_by parameter. It would be great if the export method also allowed the q (query) parameter.

    Expected Behavior

    client.collections('companies').documents().export({ q, filter_by })

    enhancement feature:filtering 
    opened by redaben 0
  • Rate limit API updates & improvements

    Change Summary

    Updates and improvements according to previous discussions.

    PR Checklist

    opened by ozanarmagan 0
  • Add featured on Unzip.dev badge

    I've written a developer trends issue about search as a service on Unzip.dev and found out about Typesense. I created a badge showing that you are one of the key players in the field. As the creator of Unzip, I personally chose Typesense as my recommendation to my readers.

    Unzip.dev is a developer trends newsletter with over 2200 tech founders (and growing). Hopefully this badge PR will be helpful for potential users of Typesense 🙏

    Change Summary

    Added a "featured on" badge to the main README.

    PR Checklist

    opened by agamm 0
  • Exact match does not account for multiple tokens that are the same for Swedish, German, French and Italian locales

    Description

    I've tried searching for the query "I am here I" and the search failed in different ways depending on whether the locale is set to English or Swedish. For English, it didn't return any document, whereas for Swedish it returned all of them, although clearly only one document contains the whole phrase as it is.

    Steps to reproduce

    Here is the snippet to reproduce the issue

    import typesense
    
    ts_client = typesense.Client({
        'api_key': 'Hu52dwsas2AdxdE',
        'nodes': [{
            'host': 'localhost',
            'port': '8108',
            'protocol': 'http'
        }]
    })
    
    ts_client.collections.create({
        'name': 'test',
        'fields': [{'name': 'text', 'type': 'string', 'facet': False, 'locale': 'sv'}]
    })
    
    documents = [
            {'text': 'Hello world, I am here I am!'},
            {'text': 'World, here I am! Hello John.'},
            {'text': 'I am! Hello world! Here comes the introduction'}
    ]
    
    ts_client.collections['test'].documents.import_(documents)
    
    query_text = "I am here I"
    
    query = {
            'q': '"{}"'.format(query_text),
            'query_by': 'text',
            'split_join_tokens': 'off',
            'num_typos': 0,
            'typo_tokens_threshold': 0,
            'drop_tokens_threshold': 0,
            'highlight_fields': 'none',
            'highlight_full_fields': 'none'
    }
    
    print(ts_client.collections['test'].documents.search(query))
    

    Expected Behavior

    No matter the locale, return one document, which is the first document of the collection.

    Actual Behavior

    For English locale, returns 0 documents. For Swedish locale, returns 3 documents (the entire collection).

    Metadata

    Typesense Version: 0.23.1 (via Docker)

    OS: Ubuntu 22.10

    triage 
    opened by dkalpakchi 3
Releases(v0.23.1)
  • v0.23.1(Jul 7, 2022)

    This release contains a few enhancements and bug fixes for issues identified in v0.23.0 that we considered important enough to warrant a patch release. If you're using v0.23.0, we recommend upgrading.

    Bug Fixes

    • Fixed a potential crash caused by attempting to alter the id field.
    • Fixed synonym queries not respecting the prefix setting.
    • Fixed a rare bug present in exact filtering of string arrays.
    • Fixed a race condition in slow request logging that led to a crash under high concurrency.
    • Better validation of regular expression used in API key collection allow list.
    • ARM builds now work on more ARM processors (previously worked only on Graviton instances)

    Enhancements

    • Improved filtering performance: up to 5x-10x faster when querying a small subset of a large dataset.
    • Support for the sort_by parameter in override rules.
    • Allow word position in a field value to be used as a ranking signal via the prioritize_token_position search parameter.
    • Improved stability in rotation of geographically distributed clusters having a large dataset.

    Download

    Please download the appropriate binary archive for your operating system and architecture here: https://typesense.org/downloads/

    NOTE: The new version is also available on Typesense Cloud. If you're already using Typesense Cloud, please reach out to us to have your cluster upgraded to v0.23.1.

    Upgrade

    Before upgrading your existing Typesense cluster to v0.23.1, please review the changes above to prepare your application for the upgrade.

    • ⚠️ For self-hosted Typesense deployment, please refer to the important upgrade section of the 0.23 documentation.
    • For Typesense Cloud, email us at [email protected] and we can do an in-place upgrade for you.
    Source code(tar.gz)
    Source code(zip)
  • v0.23.0(Jun 4, 2022)

    This release contains new features, performance improvements and important bug fixes.

    New Features

    • Phrase search: wrap keywords in a query with double quotes to search them as a phrase, e.g. "new york".
    • Schema changes: we now support fields to be added or dropped from a collection schema in-place.
    • Improved multi-field matching: much better performance and accuracy when query keywords have to be matched across multiple fields in a document.
    • Infix searching: find string within strings, which is useful for entities like model number or email address.
    • Allow string fields to be sorted: sorting on string fields can be enabled by setting sort: true parameter of the field.
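
As a rough illustration of two of the features above, the following sketch builds a collection schema with a sortable string field and a phrase-search request as plain Python dictionaries. The collection name `places`, the field name `city` and the helper `phrase_query` are hypothetical. The dictionaries only mirror the `sort: true` field setting and the double-quote phrase syntax named in the notes; they would still have to be sent to a Typesense server by a client.

```python
# Hypothetical schema: a string field made sortable via sort: true.
schema = {
    "name": "places",  # assumed collection name
    "fields": [
        {"name": "city", "type": "string", "sort": True},
    ],
}


def phrase_query(phrase, query_by):
    """Build search parameters that match `phrase` as a phrase."""
    # Wrapping the keywords in double quotes requests a phrase search.
    return {"q": '"{}"'.format(phrase), "query_by": query_by}


params = phrase_query("new york", "city")
params["sort_by"] = "city:asc"  # sort on the (assumed) sortable string field
print(params)
```

The quoting step is the same one used in the issue reproduction snippet earlier on this page (`'"{}"'.format(query_text)`).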

    Enhancements

    • Improved update and delete performance of numerical fields by 10x.
    • Emplace mode for imports: using the emplace action creates a document if it does not exist in a collection or updates it (partially or fully) if it already exists.
    • Treat space as typo: search for basket ball if basketball is not found or vice-versa. You can disable this behavior by setting split_join_tokens: false.
    • Improved Cyrillic support: better highlighting and fuzzy search for fields configured with: el, ru, sr, uk and be locales.
    • ARM compatibility: an ARM build is now published for every release.
    • Each multi-search query can have an independent x-typesense-api-key key. This is useful to specify different scoped search API keys for each search.
    • Control the number of words that Typesense considers for typo and prefix searching via the max_candidates parameter.
    • CORS can now be enabled for a specific set of domains using the --cors-domains flag.
    • Search results are now highlighted by prefix, rather than the full word. Previously, searching for "app" would highlight the full word "apple" in the results, but now it will only highlight the "app"le prefix within the word.
    • "Remove Matched Tokens" can be used by itself in Overrides. So you can now set up rules like: if the query contains "word", remove "word" from the search query.
    • Ability to toggle whether filters should be applied to overrides or not using the filter_curated_hits flag.
    • Ability to hide out_of and search_time_ms from the search API response, using the exclude_fields parameter.
    • Ability to control typo tolerance for facet queries using facet_query_num_typos.
    • Ability to specify which subnet to use for peering using --peering-subnet server parameter.

    Bug Fixes

    • Fixed exact match of synonym query candidates not being ranked correctly.
    • Fix glibc incompatibility on recent Linux distros (Ubuntu 21.04+). #531
    • Fixed the snippet containing the full field value when highlight_full_fields is enabled.
    • Fixed --enable-cors=true flag format not working. Earlier, only the --enable-cors format worked.
    • Fixed exact match for repeated words (when searching for repeated words such as "Boom Boom"). #427
    • Fixed highlight_fields parameter not respecting include_fields. #556
    • Fixed document ids that are accepted with space char (%20) but cannot be deleted later. #574
    • Improved highlighting of text containing punctuation. #528
    • Fixed case sensitivity of facet fields. #504

    Deprecations / behavior changes

    • In prefix queries, only the prefix part of a word in the result is highlighted now, instead of the whole word. For example, given a query like "new y", the result will be highlighted as <mark>New Y</mark>ork City.
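
The behavior change above can be pictured with a small toy highlighter. This is only a sketch of the idea, not Typesense's actual highlighting code: the function `highlight_prefix` is hypothetical, it works word by word on whitespace-separated text, and unlike the example in the note it does not merge adjacent highlights into a single tag.

```python
def highlight_prefix(text, query, start="<mark>", end="</mark>"):
    """Toy highlighter: mark only the part of each word a query token covers."""
    tokens = [t.lower() for t in query.split()]
    out = []
    for word in text.split():
        lowered = word.lower()
        # Pick the longest query token that is a prefix of this word, if any.
        match = max((t for t in tokens if lowered.startswith(t)), key=len, default="")
        if match:
            n = len(match)
            out.append(start + word[:n] + end + word[n:])
        else:
            out.append(word)
    return " ".join(out)


print(highlight_prefix("apple pie", "app"))
print(highlight_prefix("New York City", "new y"))
```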

    Download

    Please download the appropriate binary archive for your operating system and architecture here: https://typesense.org/downloads/

    NOTE: The new version is also available on Typesense Cloud. If you're already using Typesense Cloud, please reach out to us to have your cluster upgraded to v0.23.0.

    Upgrade

    Before upgrading your existing Typesense cluster to v0.23.0, please review the behavior changes above to prepare your application for the upgrade.

    • For self-hosted Typesense deployment, please refer to the important upgrade section of the 0.23.0 documentation. This particular version requires a specific set of upgrade steps especially for multi-node clusters.
    • For Typesense Cloud, visit your clusters page, click on the cluster you want to upgrade, click on "Modify Configuration" on the right pane and schedule a time for your upgrade. Alternatively, you can also email [email protected].
    Source code(tar.gz)
    Source code(zip)
  • v0.22.2(Feb 4, 2022)

    This release contains a couple of bug fixes for issues identified in v0.22.1 that we considered important enough to warrant a patch release. If you're using v0.22.1, we recommend upgrading.

    Bug Fixes

    • Handle bad geo polygon vertices (e.g. duplicate points).
    • Fixed an edge case in ranking of words that share a prefix during prefix search.
    • Better validation + handling of unexpected data errors during indexing.
    • Fixed a rare but critical bug that manifested during document updates that had performance implications.

    Download

    Please download the appropriate binary archive for your operating system and architecture here: https://typesense.org/downloads/

    NOTE: The new version is also available on Typesense Cloud. If you're already using Typesense Cloud, please reach out to us to have your cluster upgraded to v0.22.2.

    Upgrade

    Before upgrading your existing Typesense cluster to v0.22, please review the behavior changes above to prepare your application for the upgrade.

    • ⚠️ For self-hosted Typesense deployment, please refer to the important upgrade section of the 0.22 documentation.
    • For Typesense Cloud, email us at [email protected] and we can do an in-place upgrade for you.
    Source code(tar.gz)
    Source code(zip)
  • v0.22.1(Dec 11, 2021)

    This release contains a couple of bug fixes for issues identified in v0.22.0 that we considered important enough to warrant a patch release. If you're using v0.22.0, we recommend upgrading.

    Bug Fixes

    • Fixed an edge case in exporting of documents using a filter_by condition: documents were being duplicated.
    • Allow a document to contain a dict/hashmap field when a wildcard auto (.*) field is present in the collection schema.

    Download

    Please download the appropriate binary archive for your operating system and architecture here: https://typesense.org/downloads/

    NOTE: The new version is also available on Typesense Cloud. If you're already using Typesense Cloud, please reach out to us to have your cluster upgraded to v0.22.1.

    Upgrade

    Before upgrading your existing Typesense cluster to v0.22, please review the behavior changes above to prepare your application for the upgrade.

    • ⚠️ For self-hosted Typesense deployment, please refer to the important upgrade section of the 0.22 documentation.
    • For Typesense Cloud, email us at [email protected] and we can do an in-place upgrade for you.
    Source code(tar.gz)
    Source code(zip)
  • v0.22.0(Dec 8, 2021)

    This release contains new features, performance improvements and important bug fixes.

    New Features

    • Customizable word separators: define a list of special characters via the token_separators configuration during schema creation. These characters are then used as word separators, in addition to space and new-line characters.
    • Index and search special characters: define a list of special characters that will be indexed as text via the symbols_to_index configuration during schema creation.
    • Dynamic filtering based on rules: overrides now support a filter_by clause that can apply filters dynamically to query rules defined in the override.
    • Server side caching: cache search requests for a configurable amount of time to improve perceived latencies on heavy queries. Refer to the use_cache and cache_ttl parameters. By default, caching is disabled.
    • Protection against expensive queries via the use of search_cutoff_ms parameter that attempts to return results early when the cutoff time has elapsed. This is not a strict guarantee and facet computation is not bound by this parameter.
    • Added geo_precision sorting option to geo fields. This will bucket geo points into "groups" determined by the given precision value, such that points that fall within the same group are treated as equal, and the next sorting field can be considered for ranking.

    Enhancements

    • Reduced memory consumption: 20-30% depending on the shape of your data.
    • Improved update performance: updates on string fields are now 5-6x faster.
    • Improved search performance: 20-25% faster on a variety of datasets we tested on.
    • Improved parallelization for multi-collection writes: collections are now indexed independently, making indexing much faster when you are writing to hundreds of collections at the same time.
    • Allow exhaustive searching via the exhaustive_search parameter. Setting ?exhaustive_search=true will make Typesense consider all prefixes and typo corrections of the words in the query without stopping early when enough results are found (drop_tokens_threshold and typo_tokens_threshold configurations are ignored).
    • Exact filtering on strings (using the := operator) no longer requires the field to be defined as a facet.
    • Make minimum word length for 1-char typo and 2-char typos configurable via min_len_1typo and min_len_2typo parameters. Defaults are 4 and 7 respectively.
    • Support filtering by document id in filter_by query.
    • Support API key permission for creating a specific collection: previously, there was no way to generate an API key that allows you to create a collection with a specific name.
    • Allow use of type auto for a field whose name does not contain a regular expression.
    • Geosearch polygon filter automatically sorts the geo points for the polygon in the correct order: so you don't have to define them in counter clockwise order anymore.

    Bug Fixes

    • Fixed edge cases in the import of large documents where imports sometimes hung mysteriously or ended prematurely.
    • Fixed documents with duplicate IDs within an import upsert batch being imported as two separate documents.
    • Fixed fields with names that contain a regular expression acting as an auto type instead of respecting the schema type.
    • Fixed a few edge cases in multi-field searching, especially around field weighting and boosting.
    • Fixed deletion of collections with slashes or spaces in their names not working: you can now URL encode the names while calling the API.

    Deprecations / behavior changes

    • Once you upgrade your Typesense server to v0.22, the data directory cannot be used with the v0.21.0 binary again. So, please take a snapshot/backup of the data directory before upgrading. See important upgrade instructions below.
    • The drop_tokens_threshold and typo_tokens_threshold now default to a value of 1. If you were relying on the earlier defaults (10 and 100 respectively), please set these parameters explicitly.
    • Minimum word length for 1-char typo correction has been increased to 4. Likewise, minimum length for 2-char typo has been increased to 7. This has helped to reduce false fuzzy matches. You can use the min_len_1typo and min_len_2typo parameters to customize these default values.
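
The word-length rule described above can be sketched as a small function. This is a simplified reading of the note, not the engine's implementation: the function `allowed_typos` is hypothetical, and combining the length rule with `num_typos` by taking the minimum is an assumption.

```python
def allowed_typos(word, num_typos=2, min_len_1typo=4, min_len_2typo=7):
    """Return how many typos this toy model tolerates for `word`."""
    if len(word) >= min_len_2typo:
        by_length = 2  # long enough for 2-char typo correction
    elif len(word) >= min_len_1typo:
        by_length = 1  # long enough for 1-char typo correction
    else:
        by_length = 0  # too short: exact match only
    # Assumed: the requested num_typos still caps the result.
    return min(num_typos, by_length)


for word in ("cat", "boat", "penguin"):
    print(word, allowed_typos(word))
```

With the defaults of 4 and 7, a three-letter word gets no typo correction, which is how the change reduces false fuzzy matches on short words.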

    Download

    Please download the appropriate binary archive for your operating system and architecture here: https://typesense.org/downloads/

    NOTE: The new version is also available on Typesense Cloud. If you're already using Typesense Cloud, please reach out to us to have your cluster upgraded to v0.22.0.

    Upgrade

    Before upgrading your existing Typesense cluster to v0.22, please review the behavior changes above to prepare your application for the upgrade.

    • For self-hosted Typesense deployment, please refer to the important upgrade section of the 0.22 documentation. This particular version requires a specific set of upgrade steps especially for multi-node clusters.
    • For Typesense Cloud, email us at [email protected] and we can do an in-place upgrade for you.
    Source code(tar.gz)
    Source code(zip)
  • v0.21.0(Jul 14, 2021)

    This release contains new features, performance improvements and important bug fixes.

    New Features

    • Geosearch: Use the geopoint data type to index locations, filter and sort on them. We support filtering on records within a given radius as well as within any arbitrarily defined geo polygon.
    • Wrap literal strings in filter_by values using backticks to ensure that the commas in filter values don't get parsed as a list separator. Example: filter_by: primary_artist_name:=[`Apple, Inc.`]
    • Support exclude / not equals operator for filtering string and boolean facets. Example: filter_by=author:!= JK Rowling
    • Ability to turn off prefix search on a per field basis. For example, if you are querying 3 fields and want to enable prefix searching only on the first field, use ?prefix=true,false,false. The order should match the order in query_by.
    • Ability to turn off typo tolerance on a per field basis. For example, if you are querying 3 fields and want to disable typo tolerance on the first field, use ?num_typos=0,2,2. The order should match the order in query_by.
    • You can now highlight fields that you don't query for. Use ?highlight_fields=title to specify a custom list of fields that should be highlighted.
    • Add filter_by, include_fields and exclude_fields options to documents/export endpoint.

    Enhancements

    • Increased maximum supported length of HTTP query string to 4K characters: if you wish to send larger payloads, use the multi-search end-point.
    • Accept null values for optional fields.
    • Support for indexing pre-segmented text: you can now index content from any logographic language into Typesense if you are able to segment / split the text into space-separated words yourself before indexing and querying. You should also set ?pre_segmented_query=true during searching.
    • If you have some overrides defined but want to disable all of them during query time, you can now do that by setting ?enable_overrides=false.

    Bug Fixes

    • Fixed some edge cases with typo correction not finding the correct matches
    • Ensure that exact matches are ranked above others. Set ?prioritize_exact_match=false to disable this behavior.
    • Fixed collections:* API key permission which was not previously being recognized by the authentication engine.
    • Fixed float facets being shown with imprecise values when rendered as strings.

    Deprecations

    • There is a change in the upsert behavior to conform to existing popular conventions: The upsert action now requires the whole document to be sent for indexing. If you wish to update part of a document, use the update action.
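
The difference between the two actions can be modeled with a tiny in-memory store. `ToyCollection` is a hypothetical stand-in written for this illustration, not the Typesense client, and it skips schema validation: a real server would also check an upserted document against the collection schema.

```python
class ToyCollection:
    """In-memory stand-in for a collection (not the Typesense client)."""

    def __init__(self):
        self.docs = {}

    def upsert(self, doc):
        # Whole-document semantics: the stored document is replaced entirely.
        self.docs[doc["id"]] = dict(doc)

    def update(self, doc):
        # Partial semantics: only the supplied fields change.
        if doc["id"] not in self.docs:
            raise KeyError(doc["id"])
        self.docs[doc["id"]].update(doc)


books = ToyCollection()
books.upsert({"id": "1", "title": "Dune", "year": 1965})

books.update({"id": "1", "year": 1966})  # title is kept
after_update = dict(books.docs["1"])

books.upsert({"id": "1", "title": "Dune Messiah"})  # year is gone
after_upsert = dict(books.docs["1"])

print(after_update)
print(after_upsert)
```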

    Download

    Please download the appropriate binary archive for your operating system and architecture here: https://typesense.org/downloads/

    NOTE: The new version is also available on Typesense Cloud. If you're already using Typesense Cloud, please reach out to us to have your cluster upgraded to v0.21.0.

    Upgrade

    • For self-hosted Typesense deployment, please replace the binary and restart the Typesense process.
    • For Typesense Cloud, email us at [email protected] and we can do an in-place upgrade for you.
    Source code(tar.gz)
    Source code(zip)
  • v0.20.0(Apr 29, 2021)

    This release contains new features, significant performance improvements and important bug fixes.

    New Features

    • Auto schema detection: you can now index documents without a pre-defined schema
    • Data validation during indexing: configure Typesense to coerce, reject or drop bad values
    • Concurrency improvements: utilize all CPU cores and scale to hundreds of thousands of collections

    Enhancements

    • Default sorting field is now optional: when not present, text match score and insertion order are used
    • Allow custom key value to be provided during creation of API keys
    • Faster parallel loading of collections on cold start
    • Ensure that all queried fields are highlighted in search response
    • Reduction in memory consumption of facet fields
    • Validate SSL certificate and key before loading SSL certs from disk

    Bug Fixes

    Deprecations

    • The catch-up-min-sequence-diff and catch-up-threshold-percentage flags that are used for determining the catch up status of a follower, are replaced with healthy-read-lag and healthy-write-lag flags.

    Download

    Please download the appropriate binary archive for your operating system and architecture here: https://typesense.org/downloads/

    NOTE: The new version is also available on Typesense Cloud. If you're already using Typesense Cloud, please reach out to us to have your cluster upgraded to v0.20.

    Source code(tar.gz)
    Source code(zip)
  • v0.19.0(Feb 16, 2021)

    This release contains new features, significant performance improvements and important bug fixes.

    New Features

    • Ability to send multiple search requests in a single HTTP request
    • Ability to limit total number of results that can be fetched using the limit_hits search parameter
    • Support for slow request logs
    • Support numerical range operator in filter_by field:[min..max]

    Enhancements

    • Improved filter_by and facet_by performance during searches by as much as 60% on some datasets.
    • Detailed stack traces with additional symbols and line numbers
    • Keep existing config files in place when updating RPM package
    • /operations/snapshot endpoint no longer blocks write operations

    Bug Fixes

    • Improved facet query validation
    • Improved override validation
    • Fixed a crash when import requests are aborted
    • Fixed a crash when integer filter values are used for creating scoped api keys

    Deprecations

    • The max_hits search parameter is deprecated and is no longer necessary.

    Download

    • Please download the appropriate binary archive for your operating system and architecture here: https://typesense.org/downloads/
    • The new version is also available on Typesense Cloud: https://cloud.typesense.org/
    Source code(tar.gz)
    Source code(zip)
  • v0.18.0(Dec 29, 2020)

    This release contains several new features and important bug fixes.

    New Features

    • [Feature] Support for query synonyms.
    • [Feature] Ability to assign custom weights to each field being queried upon.
    • [Feature] Total documents in a collection is now returned in every response via the out_of key.
    • [Feature] Exclusion operator - for excluding individual query tokens from results.
    • [Feature] Vote API for triggering rotation of leader in a multi-node cluster.
    • [Feature] On-demand snapshot API: allows you to create a backup with a single API call.
    • [Feature] Support for expiration of API keys
    • [Feature] Support regex for defining allowed collection names of an API key.
    • [Feature] Support operators in multi-valued numerical filter.

    Improvements

    • [Performance] Faster snapshot transfer and copy in multi-regional clusters.
    • [Performance] Speed up filters on numerical fields.
    • [Feature] Better search relevance when searching across multiple fields: addressed some edge cases in multi-field queries.
    • [Feature] Support bulk imports of up to 3 GB in a single POST API call.

    Bug fixes

    • [Bug fix] Fixed an edge case in fuzzy search that missed some tokens during exact searches.
    • [Bug fix] Prefix matches are assigned lesser importance than exact matches.
    • [Bug fix] Debug end-point is now available even when node is not ready to serve searches.
    • [Bug fix] Fixed >= operator not working well with negative values.
    • [Bug fix] Fixed non-ascii characters not encoded properly in highlight snippet.
    • [Bug fix] Fixed issue where duplicate results were returned across pages.
    • [Bug fix] Fixed issue with pinned results being duplicated.

    Download

    • Please download the appropriate binary archive for your operating system and architecture here: https://typesense.org/downloads/
    • The new version is also available on Typesense Cloud: https://cloud.typesense.org/
    Source code(tar.gz)
    Source code(zip)
  • v0.17.0(Nov 17, 2020)

    This release contains a few new features and important bug fixes.

    • [Feature] Matched tokens are returned in the highlight response structure.
    • [Feature] Customization of the start and end HTML tags used for highlighting (default being the mark tag).
    • [Feature] Delete documents that match a filter query.
    • [Feature] Tokenizer now splits text on new line characters, in addition to space.
    • [Bug fix] Fixed a bug that prevented single document updates from being available on the Raft log.
    • [Bug fix] Validate data types of the fields of a collection schema during collection creation.
    • [Bug fix] Ignore invalid unicode characters when returning search response. Earlier, this was causing a crash in some rare cases.
    • [Bug fix] Allow the colon character (:) to be present in the filter query value.

    Please download the appropriate binary archive for your operating system and architecture: https://typesense.org/downloads/

    Source code(tar.gz)
    Source code(zip)
  • v0.16.1(Nov 7, 2020)

    This is a maintenance release to fix an issue we identified with the updates feature.

    [Bug fix] Updates to string array fields were causing a crash during faceting.

    Please download the appropriate binary archive for your operating system and architecture: https://typesense.org/downloads/

    Source code(tar.gz)
    Source code(zip)
  • v0.16.0(Oct 25, 2020)

    The primary focus of this release is to provide update support for documents.

    • [Feature] Support partial updates or upserts of documents.
    • [Feature] Parameterize the number of tokens that surround a highlight via the new highlight_affix_num_tokens parameter.
    • [Bug fix] When a document is not imported due to an error, the full document was not always being returned in the import response. This has been addressed in this release.

    Please download the appropriate binary archive for your operating system and architecture: https://typesense.org/downloads/

    Source code(tar.gz)
    Source code(zip)
  • v0.15.0(Sep 19, 2020)

    This release contains several new features, bug fixes and performance improvements.

    • [Performance] Adopted jemalloc: we're now using jemalloc as the memory allocator. In our tests, jemalloc showed significantly better performance and lower memory fragmentation.
    • [Performance] Streaming import: You can now safely import large number of documents into Typesense without a drastic impact on search latency. We've also changed the output format of the import end-point: the response will now be in JSON lines rather than as a full-fledged JSON document.
    • [Performance] Significant performance improvement in wildcard queries and faceting involving array fields.
    • [Feature] Allow default sorting field to be an int64.
    • [Feature] Ensured that the server returns a 503 response when it is still catching up on the writes from the leader. This threshold can be controlled by the --catch-up-threshold-percentage argument (default: 95).
    • [Feature] Data snapshot interval can now be customized by the --snapshot-interval-seconds argument (default: 3600).
    • [Feature] Metrics API: we've added a /metrics.json end-point that returns CPU, storage and memory metrics.
    • [Feature] Exact filtering on string field: It's now possible to match a facet-enabled string field exactly in the filter query by using the := operator.
    • [Bug fix] Clustering improvements: We've fixed a number of performance issues and edge cases by extensively benchmarking the clustering implementation via multi-region deployments.
    • [Bug fix] Fixed a race condition that sometimes prevented a Typesense node from recognizing custom generated API keys.
    • [Bug fix] Fixed an edge case in text match score calculation that caused relevancy issues on long queries.
    • [Bug fix] Fixed a crash that happened when an int32 field was filtered by a number exceeding the range of a valid int32 value.

    Please download the appropriate binary archive for your operating system and architecture: https://typesense.org/downloads/

    Source code(tar.gz)
    Source code(zip)
  • v0.14.0(Jul 4, 2020)

    In this release, we announce support for grouping documents on one or more fields. There are also a number of bug fixes.

    • [Feature] Group by: documents can now be grouped on one or more fields. You can also limit each group to the top K hits within the documents matching that group.
    • [Bug fix] Fixed an edge case in filtering of documents by int64 field.
    • [Bug fix] Allow float array field to accept integer values (i.e. whole numbers).
    • [Bug fix] Deletion of records with optional fields.
    • [Bug fix] Collection schema API response should contain the optional attribute of fields in the schema.

    Please download the appropriate binary archive for your operating system and architecture: https://typesense.org/downloads/

    Source code(tar.gz)
    Source code(zip)
  • v0.13.0(Jun 1, 2020)

    In this release, we announce the support for API key management.

    • [Feature] API key management. You can generate API keys with fine-grained access control restrictions for better security.
    • [Deprecation] Command line --search-only-key option is removed. Please use the key generation API to generate a key with search-only permission.
    • [Deprecation] The max_hits search query parameter is removed. Please use the per_page parameter as a replacement.

    Please download the appropriate binary archive for your operating system and architecture: https://typesense.org/downloads/

    Source code(tar.gz)
    Source code(zip)
  • v0.12.0(May 24, 2020)

    This is a major release packed with multiple new features and a few bug fixes.

    • [Feature] Raft-based clustering for high write+read availability. We also deprecate the read-only replication feature supported via the --master argument.
    • [Feature] Ability to Curate / Merchandize search results is now available in the open source version, via the Overrides feature.
    • [Feature] Ability to pin or hide specific documents during query time
    • [Feature] Ability to create Aliases for collections is now available in the open source version.
    • [Feature] Allow the maximum number of results returned to be configurable via the max_hits parameter. Previously only the top 500 results were returned.
    • [Feature] Facet search: facet values that are returned can now be filtered via the facet_query parameter. The matching facet text is also highlighted.
    • [Feature] Allow integer and float values to be facetable.
    • [Feature] Facet stats such as min/max/avg are computed for numerical facet fields.
    • [Feature] Allow fields to be marked as optional in the schema.
    • [Feature] Expose typo_tokens_threshold parameter: If the number of results found for a specific query is less than this number, Typesense will attempt to look for tokens with more typos until enough results are found. Previously, this was hard-coded to 100.
    • [Feature] The underlying string similarity score is exposed as _text_match and can be used as a sorting field parameter.
    • [Security] Always enforce API key authentication for the search endpoint. Previously, the search endpoint was open unless a search-only API key was explicitly defined.
    • [Bug] Ensure that float fields defined as sorting fields accept integer values.
    • [Bug] Fixed an edge case that resulted in incomplete deletion of string array values when a document is deleted.
    • [Others] Adopted GPL v3 license.
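A minimal sketch of how several of the search parameters introduced in this release might be combined into one query string. The parameter names facet_query, typo_tokens_threshold and _text_match come from the notes above. The collection name, field names, the category:sho value format and the _text_match:desc sort syntax are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_search_path(collection, **params):
    """Return the search path and query string for a collection."""
    # Sorted for a stable, reproducible string.
    query = urlencode(sorted(params.items()))
    return f"/collections/{collection}/documents/search?{query}"


path = build_search_path(
    "products",                  # hypothetical collection
    q="shoe",
    query_by="name",             # hypothetical field
    facet_by="category",         # hypothetical facet field
    facet_query="category:sho",  # only return facet values matching "sho"
    typo_tokens_threshold=20,    # look for more typos if fewer than 20 results
    sort_by="_text_match:desc",  # sort on the exposed text match score
)
print(path)
```

urlencode percent-encodes the colons, so the values survive being sent as part of a URL.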

    Please download the appropriate binary archive for your operating system and architecture: https://typesense.org/downloads/

  • v0.11.2 (Feb 3, 2020)

    This is a maintenance release with bug fixes.

    Bug fixes

    • Ensure that default sorting field exists in schema during collection creation.
    • Fixed the environment variable examples mentioned in the command-line help text.
    • Ensure that the hits and found JSON fields are always returned in the response, even if the query produces no results.

    Downloads

    Please download the appropriate binary archive for your operating system and architecture: https://typesense.org/downloads/

    Docker: docker pull typesense/typesense:0.11.2

  • v0.11.0 (Nov 3, 2019)

    This is a maintenance release with some important bug fixes.

    Maintenance changes

    • DEB and RPM packages are now available: https://typesense.org/downloads/
    • OpenSSL upgrade.

    Bug fixes

    • Fixed an edge case in indexing of non-ASCII characters.
    • Fixed an edge case in replication.

    Downloads

    Please download the appropriate binary archive for your operating system and architecture: https://typesense.org/downloads/

    Docker: docker pull typesense/typesense:0.11.0

  • v0.10.1 (Jul 9, 2019)

    This release contains one new feature and a couple of bug fixes.

    New feature

    • You can control the maximum number of facet values returned in the search results via the max_facet_values parameter of the search end-point.
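To show what capping facet values means in practice, here is a toy sketch that counts facet values and keeps only the most frequent ones. It illustrates the semantics only and is not Typesense's implementation. The function name, the default of 10 and the output shape are assumptions.

```python
from collections import Counter


def top_facet_values(values, max_facet_values=10):
    """Count facet values and keep only the most frequent ones."""
    counts = Counter(values)
    return [
        {"value": value, "count": count}
        for value, count in counts.most_common(max_facet_values)
    ]
```

For example, top_facet_values(["a", "b", "a", "c", "a", "b"], 2) keeps the entries for "a" and "b" and discards "c".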

    Bug fixes

    • Fix long queries causing highlighter to misbehave and sometimes crash.
    • Fix facet counts not showing up in wildcard searches.

    Downloads

    Please download the appropriate binary archive for your operating system and architecture.

    Linux (64-bit): https://dl.typesense.org/releases/typesense-server-0.10.1-linux-amd64.tar.gz

    Mac OS X (64-bit): https://dl.typesense.org/releases/typesense-server-0.10.1-darwin-amd64.tar.gz

    Docker: docker pull typesense/typesense:0.10.1

  • v0.10.0 (Jun 2, 2019)

    This release contains new features, bug fixes and performance improvements.

    New features

    • Importing documents: added a /collections/:collection/documents/import endpoint to which you can POST documents for import.
    • Load configuration from environment variables or config file: Arguments to the Typesense server can now be passed via environment variables or through a configuration file.
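A short sketch of preparing a request for the import endpoint named above. The endpoint path comes from the release note. The assumption here is that the request body is newline-delimited JSON with one document per line. The helper names and sample documents are hypothetical.

```python
import json


def import_path(collection):
    """Return the import endpoint path for a collection."""
    return f"/collections/{collection}/documents/import"


def to_jsonl(documents):
    """Serialize documents as newline-delimited JSON, one document per line."""
    return "\n".join(json.dumps(doc, separators=(",", ":")) for doc in documents)


docs = [
    {"id": "1", "title": "First book"},   # hypothetical documents
    {"id": "2", "title": "Second book"},
]
body = to_jsonl(docs)
print(import_path("books"))
print(body)
```

Serializing one document per line keeps each record independent, so a malformed document can be reported without discarding the rest of the batch.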

    Bug fixes

    • Filter on facet fields: when a facet field is filtered upon, the filter value is now matched verbatim.
    • Improve memory and disk consistency: fixed rare edge cases in which the in-memory index could go out of sync with on-disk storage.

    Performance

    • Faster collection initialization: Server initialization time has been significantly reduced for large collections.

    Downloads

    Please download the appropriate binary archive for your operating system and architecture.

    Linux (64-bit): https://dl.typesense.org/releases/typesense-server-0.10.0-linux-amd64.tar.gz

    Mac OS X (64-bit): https://dl.typesense.org/releases/typesense-server-0.10.0-darwin-amd64.tar.gz

    Docker: docker pull typesense/typesense:0.10.0

  • v0.9.2 (Sep 8, 2018)

    This is largely a maintenance release with some minor feature additions:

    New features

    • Health check: added a /health endpoint that returns a 200 status code when the service is up.
    • Latin character substitution: Latin characters are "normalized" to their ASCII equivalents. For example, ß is substituted with ss.
    • Collection creation timestamp: the collection summary data now includes the created_at UNIX timestamp (seconds since epoch).
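The Latin character substitution can be approximated in a few lines. This is a sketch of the general technique (an explicit mapping plus Unicode decomposition), not Typesense's actual implementation. The SPECIAL table is an assumed, incomplete subset.

```python
import unicodedata

# Characters that NFKD does not decompose and that need an explicit mapping
# (assumed subset for illustration).
SPECIAL = {"ß": "ss", "æ": "ae", "Æ": "AE", "ø": "o", "Ø": "O"}


def normalize_latin(text):
    """Replace accented Latin characters with plain ASCII equivalents."""
    replaced = "".join(SPECIAL.get(ch, ch) for ch in text)
    # NFKD splits "é" into "e" plus a combining accent, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", replaced)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

With this sketch, "Straße" becomes "Strasse" and "café" becomes "cafe", so a query typed without diacritics can still match.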

    Bug fixes

    • Fixed a concurrency-related issue in replication.
    • Fixed the wrong datetime appearing in log files on Linux.

    Downloads

    Please download the appropriate binary archive for your operating system and architecture.

    Linux (64-bit): https://dl.typesense.org/releases/typesense-server-0.9.2-linux-amd64.tar.gz

    Mac OS X (64-bit): https://dl.typesense.org/releases/typesense-server-0.9.2-darwin-amd64.tar.gz

    Docker: docker pull typesense/typesense:0.9.2

  • v0.9.1 (Jun 9, 2018)

    This is a bug fix release:

    Fixed an edge case that caused Typesense to crash on some queries involving fields with large text content.

    Downloads

    Please download the appropriate binary archive for your operating system and architecture.

    Linux (64-bit): https://dl.typesense.org/releases/typesense-server-0.9.1-linux-amd64.tar.gz

    Mac OS X (64-bit): https://dl.typesense.org/releases/typesense-server-0.9.1-darwin-amd64.tar.gz

    Docker: docker pull typesense/typesense:0.9.1

  • v0.9.0 (Jun 4, 2018)

    New features

    1. Allow query string to be optional. You can now specify a wildcard query using a * to consider all records for filtering and sorting. If no sorting field is specified, records will be sorted according to the default_sorting_field specified during collection creation.

    2. Return all fields that match in the highlight section of search results. When searching for a query in multiple fields in a document, v0.8.0 returned only the best-matched field in the highlight section. v0.9.0 returns all fields that match the query string. This required a change in the JSON response structure. The exact details of the change are available in API Spec format, along with upgrade instructions.

    3. Highlight matched query text in string[] fields.

    4. Ability to restrict fields returned. Introduces two additional parameters to the search endpoint: include_fields and exclude_fields, which customize the document fields returned in the search result response.

    5. Introduces the drop_tokens_threshold parameter to the search endpoint: if the number of results found for a specific search query is less than this number, Typesense will attempt to drop tokens from the query until enough results are found. Tokens that have the fewest individual hits are dropped first. Set drop_tokens_threshold=0 to disable dropping of tokens.
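The token-dropping behaviour described in item 5 can be sketched on a toy inverted index. This is an illustration of the described rule only, not Typesense's implementation. The search function, the index layout and the sample data are all hypothetical.

```python
def search(index, tokens, drop_tokens_threshold):
    """Return (doc_ids, tokens_used) for an AND query with token dropping.

    index maps a token to the set of document ids that contain it.
    """
    remaining = list(tokens)
    while remaining:
        postings = [index.get(tok, set()) for tok in remaining]
        hits = set.intersection(*postings)
        # Stop when enough results are found or only one token is left.
        # A threshold of 0 disables dropping, since len(hits) >= 0 always holds.
        if len(hits) >= drop_tokens_threshold or len(remaining) == 1:
            return sorted(hits), remaining
        # Drop the token with the fewest individual hits first.
        weakest = min(remaining, key=lambda tok: len(index.get(tok, set())))
        remaining.remove(weakest)
    return [], []


index = {"red": {1, 2, 3, 4}, "running": {2, 3}, "zzz": {9}}
print(search(index, ["red", "running", "zzz"], 2))
```

In the example the full query matches nothing, so the rarest token ("zzz") is dropped and the remaining two tokens match documents 2 and 3.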

    Bug fixes

    1. Fixed a random crash during start-up caused by a RocksDB bug.
    2. Fixed an edge case in fuzzy prefix searches which ignored some matching words.

    Downloads

    Please download the appropriate binary archive for your operating system and architecture.

    Linux (64-bit): https://dl.typesense.org/releases/typesense-server-0.9.0-linux-amd64.tar.gz

    Mac OS X (64-bit): https://dl.typesense.org/releases/typesense-server-0.9.0-darwin-amd64.tar.gz

    Docker: docker pull typesense/typesense:0.9.0

  • v0.8.0 (Apr 4, 2018)

    Please download the appropriate binary archive for your operating system and architecture.

    Linux (64-bit): https://dl.typesense.org/releases/typesense-server-0.8.0-linux-amd64.tar.gz

    Mac OS X (64-bit): https://dl.typesense.org/releases/typesense-server-0.8.0-darwin-amd64.tar.gz
