Commit

Merge pull request #1097 from swirlai/jan-2024-prs
Documentation for DS-1474, DS-1276, DS-806, DS-1310, DS-1394
dnicodemus authored Jan 6, 2024
2 parents d64cd76 + 1341de5 commit ec07449
Showing 2 changed files with 60 additions and 1 deletion.
8 changes: 7 additions & 1 deletion docs/Developer-Guide.md
@@ -62,7 +62,7 @@ This guide is intended to provide developers with an overview of Swirl and how t

* Executes query processing as specified in `Search.query_processors`

* Constructs and validates the query for the SearchProvider using the `url`, `query_template`, and `query_mappings`
* Constructs and validates the query for the SearchProvider using the `url`, `query_template` or `query_template_json`, and `query_mappings`

* Connects to the SearchProvider, sends the query, and gathers the response

@@ -355,6 +355,12 @@ If you want to apply spellcheck to a single SearchProvider, put it in that Searc
{: .warning }
Use Spellcheck cautiously as it tends to cause a lack of results from sources that have sparse indexes and limited or no fuzzy search.

## Adjust Relevancy for a Single SearchProvider

Swirl 3.2 includes a new `RequireQueryStringInTitleResultProcessor`. When installed after the `MappingResultProcessor`, it drops results whose titles do not include the user's query.

This processor is intended for sources such as LinkedIn that frequently return related profiles that mention a person but aren't about them. (Swirl normally ranks these results poorly; this processor eliminates them entirely.)
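
The filtering behavior described above can be sketched in a few lines of Python. This is a hypothetical illustration, not Swirl's implementation: the function name `require_query_in_title`, the `title` key, and the case-insensitive whole-phrase match are all assumptions.

```python
def require_query_in_title(results, query_string):
    """Keep only results whose title contains the query string.

    Hypothetical sketch: the real processor may tokenize or normalize
    differently. Here the match is a case-insensitive substring test
    on an assumed 'title' key.
    """
    needle = query_string.strip().lower()
    return [r for r in results if needle in r.get("title", "").lower()]


# Made-up result items, for illustration only.
results = [
    {"title": "Jane Doe - Engineer", "url": "https://example.com/1"},
    {"title": "John Smith - Manager", "url": "https://example.com/2"},
]
kept = require_query_in_title(results, "jane doe")
```

In this sketch a result with no title is always dropped, and an empty query keeps everything; the actual processor may handle those cases differently.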

## Expire Search Objects

If your Swirl installation is using the [Search Expiration Service](Admin-Guide.md#search-expiration-service), users can specify the retention setting for each Search.
53 changes: 53 additions & 0 deletions docs/User-Guide.md
@@ -302,6 +302,8 @@ If you have the raw JSON of SearchProvider, install it by copying/pasting into t
3. Paste one SearchProvider's JSON at a time into the form and press the `POST` button
4. Swirl will respond with the finished SearchProvider

As of Swirl 3.2, you can copy/paste lists of SearchProviders into the endpoint, and Swirl will load them all.

## Bulk Loading

Use the included [`swirl_load.py`](https://github.com/swirlai/swirl-search/blob/main/swirl_load.py) script to load any SearchProvider instantly, including lists of providers.
@@ -330,6 +332,26 @@ From here, you can use the form at the bottom of the page to:
* DELETE this SearchProvider, forever
* Edit the configuration of the SearchProvider and `PUT` the changes

## Query Templating

Most SearchProviders require a `query_template`. This is usually bound to `query_mappings` during the federation process. For example, here is the template for the MongoDB movie table:

```
"query_template": "{'$text': {'$search': '{query_string}'}}",
```

This template is stored as a string, not as JSON. The single quotes inside it are required so that the enclosing JSON can use double quotes.

As of Swirl 3.2, the Elastic, OpenSearch, and MongoDB SearchProviders all use the new `query_template_json` field, which stores the template as JSON. For example, here is the MongoDB `query_template_json`:

```
"query_template_json": {
    "$text": {
        "$search": "{query_string}"
    }
},
```
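
The difference between the two fields can be illustrated with a sketch of how the `{query_string}` placeholder might be filled in. The function names below are hypothetical, and this is not Swirl's binding code: it assumes the string form is parsed as a Python literal after substitution, while the JSON form is walked as a nested structure.

```python
import ast


def bind_string_template(template, query_string):
    """Fill a string query_template, then parse it into a dict."""
    # Assumption: the filled string is a valid Python literal
    # (single-quoted keys and values), so ast.literal_eval can parse it.
    # A query containing a single quote would break this approach.
    return ast.literal_eval(template.replace("{query_string}", query_string))


def bind_json_template(template, query_string):
    """Recursively fill a query_template_json structure, returning a copy."""
    if isinstance(template, dict):
        return {k: bind_json_template(v, query_string) for k, v in template.items()}
    if isinstance(template, list):
        return [bind_json_template(v, query_string) for v in template]
    if isinstance(template, str):
        return template.replace("{query_string}", query_string)
    return template  # numbers, booleans, None pass through unchanged


old_style = "{'$text': {'$search': '{query_string}'}}"
new_style = {"$text": {"$search": "{query_string}"}}

from_string = bind_string_template(old_style, "space movies")
from_json = bind_json_template(new_style, "space movies")
```

Under these assumptions both calls produce the same query dict. The JSON form needs no string parsing step, so quote characters in the user's query cannot corrupt the structure.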

## Organizing SearchProviders with Active, Default and Tags

Three properties of SearchProviders are intended to allow expressive querying by targeting all or part of a query to groups of sources.
@@ -427,6 +449,8 @@ INFO search.py: invoking processor: CosineRelevancyPostResultProcessor
],
```

Swirl Release 3.2 includes a new `RequireQueryStringInTitleResultProcessor`. This processor may be installed after the `MappingResultProcessor`. It drops result items whose titles don't include the user's query. It is recommended for use with noisy services like LinkedIn via Google PSE.

## Authentication & Credentials

The `credentials` property stores any required authentication information for the SearchProvider. The supported types are as follows:
@@ -508,6 +532,34 @@ The mappings `url=link` and `body=snippet` map the Swirl result fields to the co
{: .highlight }
For Release 2.5.1, [`requests.py`](https://github.com/swirlai/swirl-search/blob/main/swirl/connectors/requests.py) was updated to handle XML responses from source APIs and convert them to JSON for mapping in SearchProvider configurations.

{: .highlight }
For Release 3.2, [`requests.py`](https://github.com/swirlai/swirl-search/blob/main/swirl/connectors/requests.py) was updated to handle list-of-list responses from source APIs, where the first list contains the field names. For example:

```
[
    [
        "urlkey",
        "timestamp",
        "original",
        "mimetype",
        "statuscode",
        "digest",
        "length"
    ],
    [
        "today,swirl)/",
        "20221012214440",
        "http://swirl.today/",
        "text/html",
        "301",
        "EU3373LKG36VJYZN2MKR4WENHBGK4DCL",
        "361"
    ],
    ...etc...
```

Swirl automatically converts this format to a JSON array of dicts, using the field names from the first element as the keys.
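
The conversion can be sketched as follows. This is a minimal illustration rather than the code in `requests.py`; the function name `rows_to_dicts` is hypothetical.

```python
def rows_to_dicts(rows):
    """Convert [[field names], [values], ...] into a list of dicts."""
    if not rows:
        return []
    header, *data = rows
    # zip() pairs each field name with the value in the same position;
    # if a row is shorter than the header, the extra names are skipped.
    return [dict(zip(header, row)) for row in data]


response = [
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
    ["today,swirl)/", "20221012214440", "http://swirl.today/", "text/html", "301",
     "EU3373LKG36VJYZN2MKR4WENHBGK4DCL", "361"],
]
records = rows_to_dicts(response)
```

Each resulting dict can then be addressed by field name in `result_mappings`, for example `original` for the URL in the response above.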

### Multiple Mappings

As of version 1.6, Swirl can map multiple SearchProvider fields to a single Swirl field, aggregating multiple responses in the PAYLOAD field as necessary.
@@ -554,6 +606,7 @@ The following table explains the `result_mappings` options:
| sw_btcconvert | An optional directive which will convert the provided Satoshi value to Bitcoin; it can be used anyplace in the template such as `result_mappings` | `sw_btcconvert(<fee>)` |
| NO_PAYLOAD | By default, Swirl copies all result keys from the SearchProvider to the PAYLOAD. If `NO_PAYLOAD` is specified, Swirl copies only the explicitly mapped fields.| `NO_PAYLOAD` |
| FILE_SYSTEM | If specified, Swirl will assume that this SearchProvider is a file system and weight matches against the `body` higher. | `FILE_SYSTEM` |
| LC_URL | If specified, Swirl will convert the `url` field to lower case. | `LC_URL` |
| BLOCK | As of Release 3.1.0, this feature is used exclusively by Swirl's RAG processing; that output appears in this `info` block of the Result object. | `BLOCK=ai_summary` |

#### Date Published Display
