Grebennikov Roman | Haystack EU, 2022
A Swiss army knife of re-ranking
Inspired by GCP Retail Events and the Segment.io Ecom Spec:
{
"event": "item",
"id": "81f46c34-a4bb-469c-8708-f8127cd67d27",
"item": "product1",
"timestamp": "1599391467000",
"fields": [
{"name": "title", "value": "Nice jeans"},
{"name": "price", "value": 25.0},
{"name": "color", "value": ["blue", "black"]},
{"name": "availability", "value": true}
]
}
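Item metadata travels as a list of name/value pairs, so one event can carry numbers, strings, lists and booleans without a fixed schema. Below is a minimal Python sketch of how a consumer might flatten such an event and read its timestamp, which holds epoch milliseconds as a string. The helpers `item_fields` and `event_time` are hypothetical, not part of Metarank.

```python
import json
from datetime import datetime, timezone

# The item event from the slide above; "fields" is a list of name/value pairs.
ITEM_EVENT = json.loads("""
{
  "event": "item",
  "id": "81f46c34-a4bb-469c-8708-f8127cd67d27",
  "item": "product1",
  "timestamp": "1599391467000",
  "fields": [
    {"name": "title", "value": "Nice jeans"},
    {"name": "price", "value": 25.0},
    {"name": "color", "value": ["blue", "black"]},
    {"name": "availability", "value": true}
  ]
}
""")


def item_fields(event: dict) -> dict:
    """Flatten the name/value pairs into a plain dict (hypothetical helper)."""
    if event.get("event") != "item":
        raise ValueError(f"expected an item event, got {event.get('event')!r}")
    return {f["name"]: f["value"] for f in event["fields"]}


def event_time(event: dict) -> datetime:
    """The timestamp is a string holding epoch milliseconds."""
    return datetime.fromtimestamp(int(event["timestamp"]) / 1000, tz=timezone.utc)


fields = item_fields(ITEM_EVENT)
print(fields["price"], fields["color"], fields["availability"])  # 25.0 ['blue', 'black'] True
print(event_time(ITEM_EVENT).isoformat())  # 2020-09-06T11:24:27+00:00
```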
{
"event": "ranking",
"id": "81f46c34-a4bb-469c-8708-f8127cd67d27",
"timestamp": "1599391467000",
"user": "user1",
"session": "session1",
"fields": [
{"name": "query", "value": "socks"}
],
"items": [
{"id": "item3", "relevancy": 2.0},
{"id": "item1", "relevancy": 1.0},
{"id": "item2", "relevancy": 0.5}
]
}
{
"event": "interaction",
"id": "0f4c0036-04fb-4409-b2c6-7163a59f6b7d",
"impression": "81f46c34-a4bb-469c-8708-f8127cd67d27",
"timestamp": "1599391467000",
"user": "user1",
"session": "session1",
"type": "purchase",
"item": "item1",
"fields": [
{"name": "count", "value": 1},
{"name": "shipping", "value": "DHL"}
]
}
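The interaction event points back to the ranking it happened in through its "impression" field, which holds the ranking event's "id". A minimal Python sketch of that join, assuming simple binary labels (1 for an item that received an interaction, 0 otherwise); `position_of` and `labels` are hypothetical helpers, not Metarank's API:

```python
from __future__ import annotations

# Trimmed copies of the ranking and interaction events from the slides.
RANKING = {
    "event": "ranking",
    "id": "81f46c34-a4bb-469c-8708-f8127cd67d27",
    "user": "user1",
    "items": [
        {"id": "item3", "relevancy": 2.0},
        {"id": "item1", "relevancy": 1.0},
        {"id": "item2", "relevancy": 0.5},
    ],
}
INTERACTION = {
    "event": "interaction",
    "impression": "81f46c34-a4bb-469c-8708-f8127cd67d27",
    "user": "user1",
    "type": "purchase",
    "item": "item1",
}


def position_of(rankings: dict[str, dict], interaction: dict) -> int | None:
    """0-based position of the interacted item in the ranking it points to."""
    ranking = rankings.get(interaction["impression"])
    if ranking is None:
        return None  # the referenced ranking was never seen
    ids = [it["id"] for it in ranking["items"]]
    return ids.index(interaction["item"]) if interaction["item"] in ids else None


def labels(ranking: dict, interactions: list[dict]) -> dict[str, int]:
    """Binary relevance labels: 1 for items that got an interaction, else 0."""
    hit = {i["item"] for i in interactions if i["impression"] == ranking["id"]}
    return {it["id"]: int(it["id"] in hit) for it in ranking["items"]}


by_id = {RANKING["id"]: RANKING}
print(position_of(by_id, INTERACTION))  # 1
print(labels(RANKING, [INTERACTION]))   # {'item3': 0, 'item1': 1, 'item2': 0}
```

Without this link an interaction is just a count; with it, each ranking becomes one labelled training group for a learning-to-rank model.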
Demo: ranklens dataset
Goal: cover 90% of the most common ML features
# take a value from item metadata
- name: budget
  type: number
  scope: item
  source: item.budget
  ttl: 60 days
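A minimal Python sketch of what a `number` feature with a TTL could do, assuming that `ttl` means a stored value expires 60 days after its last update (the slide does not spell out the semantics). The class `NumberFeature` and the item id "movie1" are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


class NumberFeature:
    """Latest numeric value of an item field, forgotten after a TTL (hypothetical)."""

    def __init__(self, field: str, ttl: timedelta) -> None:
        self.field = field
        self.ttl = ttl
        self.values: dict[str, tuple[float, datetime]] = {}

    def on_item_event(self, item: str, fields: dict, ts: datetime) -> None:
        value = fields.get(self.field)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            self.values[item] = (float(value), ts)

    def value(self, item: str, now: datetime) -> float | None:
        entry = self.values.get(item)
        if entry is None or now - entry[1] > self.ttl:
            return None  # never seen, or older than the TTL
        return entry[0]


budget = NumberFeature("budget", ttl=timedelta(days=60))  # mirrors "ttl: 60 days"
t0 = datetime(2022, 9, 1, tzinfo=timezone.utc)
budget.on_item_event("movie1", {"budget": 1_000_000}, t0)
print(budget.value("movie1", t0 + timedelta(days=30)))  # 1000000.0
print(budget.value("movie1", t0 + timedelta(days=61)))  # None
```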
# one-hot/label encode a string
- name: genre
  type: string
  scope: item
  source: item.genre
  values:
    - comedy
    - drama
    - action
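The two encodings the comment mentions, sketched in Python under stated assumptions: one-hot produces one column per entry of `values` (a list-valued field becomes multi-hot), while label encoding produces a single column, here with 0 reserved for values outside the list. The function names and the 0-for-unknown convention are illustrative, not Metarank's.

```python
from __future__ import annotations

GENRES = ["comedy", "drama", "action"]  # the "values" list from the config


def one_hot(value: str | list[str], values: list[str]) -> list[float]:
    """One column per known value; unknown values give an all-zero vector."""
    present = {value} if isinstance(value, str) else set(value)
    return [1.0 if v in present else 0.0 for v in values]


def label_encode(value: str, values: list[str]) -> int:
    """1-based index into values; 0 is reserved for unknown values (an assumption)."""
    return values.index(value) + 1 if value in values else 0


print(one_hot("drama", GENRES))        # [0.0, 1.0, 0.0]
print(one_hot("horror", GENRES))       # [0.0, 0.0, 0.0]
print(label_encode("action", GENRES))  # 3
```

Tree-based rankers can usually split on a single label-encoded column, while one-hot is the safer choice for models that would read the index as a magnitude.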
# index encode the mobile/desktop/tablet category
# from the User-Agent field
- name: platform
  type: ua
  field: platform
  source: ranking.ua
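To make the index encoding concrete, here is a deliberately rough Python sketch that classifies a User-Agent string by keywords and returns its index in a fixed category list. It is not a real UA parser and not how Metarank parses the `ua` field; it relies on the common convention that Android phones include "Mobile" in the UA while Android tablets omit it.

```python
PLATFORMS = ["mobile", "desktop", "tablet"]


def ua_platform(ua: str) -> str:
    """Very rough User-Agent classification by keyword (not a real UA parser)."""
    s = ua.lower()
    # iPads and Android tablets: Android phones normally include "Mobile" in the UA.
    if "ipad" in s or "tablet" in s or ("android" in s and "mobile" not in s):
        return "tablet"
    if "mobile" in s or "iphone" in s:
        return "mobile"
    return "desktop"


def platform_index(ua: str) -> int:
    """Index encoding: position of the detected platform in PLATFORMS."""
    return PLATFORMS.index(ua_platform(ua))


IPHONE = ("Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X) AppleWebKit/605.1.15 "
          "(KHTML, like Gecko) Version/15.0 Mobile/15E148 Safari/604.1")
print(ua_platform(IPHONE), platform_index(IPHONE))  # mobile 0
```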
# count how many clicks a product received
- name: click_count
  type: interaction_count
  scope: item
  interaction: click
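In batch form, an `interaction_count` feature reduces to counting interaction events of one type per item. A minimal Python sketch over a hypothetical in-memory event list (a streaming system would update the counters incrementally instead):

```python
from __future__ import annotations

from collections import Counter

# Hypothetical interaction events; only "click" should be counted.
EVENTS = [
    {"event": "interaction", "type": "click", "item": "item1"},
    {"event": "interaction", "type": "click", "item": "item1"},
    {"event": "interaction", "type": "purchase", "item": "item1"},
    {"event": "interaction", "type": "click", "item": "item2"},
]


def interaction_count(events: list[dict], interaction: str = "click") -> Counter:
    """Number of interactions of the given type per item."""
    return Counter(
        e["item"] for e in events
        if e.get("event") == "interaction" and e["type"] == interaction
    )


counts = interaction_count(EVENTS)
print(counts["item1"], counts["item2"], counts["item3"])  # 2 1 0
```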
# a sliding-window count of interaction events
# for a particular item
- name: item_click_count
  type: window_count
  interaction: click
  scope: item
  bucket_size: 24h # keep one counter per 24h bucket
  windows: [7, 14, 30, 60] # on each refresh, aggregate to roughly 1-2-4-8 week counts
  refresh: 1h
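The bucket-then-aggregate idea behind `window_count`, as a minimal Python sketch: each click increments a per-item counter for its 24h bucket, and each window sums the most recent buckets. The class and method names are hypothetical, and the `refresh: 1h` schedule is not modelled; sums are computed on demand.

```python
from __future__ import annotations

from collections import defaultdict

DAY_MS = 24 * 60 * 60 * 1000  # bucket_size: 24h, in milliseconds


class WindowCount:
    """Per-item daily click buckets summed over several trailing windows (sketch)."""

    def __init__(self, windows: tuple[int, ...] = (7, 14, 30, 60)) -> None:
        self.windows = windows
        # item -> day bucket index -> count
        self.buckets: dict[str, dict[int, int]] = defaultdict(lambda: defaultdict(int))

    def add(self, item: str, ts_ms: int) -> None:
        self.buckets[item][ts_ms // DAY_MS] += 1  # one counter per 24h bucket

    def features(self, item: str, now_ms: int) -> list[int]:
        today = now_ms // DAY_MS
        counts = self.buckets.get(item, {})
        # window w covers the w most recent buckets, including the current one
        return [sum(c for day, c in counts.items() if today - w < day <= today)
                for w in self.windows]


wc = WindowCount()
day = 19_000  # arbitrary day index since the epoch
for d in (day, day, day - 5, day - 20, day - 50):
    wc.add("item1", d * DAY_MS)
print(wc.features("item1", day * DAY_MS))  # [3, 3, 4, 5]
```

Storing daily buckets keeps memory bounded by the longest window (60 buckets per item) while still serving all four window lengths.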
# click-through rate
- name: CTR
  type: rate
  top: click # divide the number of clicks
  bottom: impression # by the number of impressions
  scope: item
  bucket: 24h # aggregate over 24-hour buckets
  periods: [7, 14, 30, 60] # sum buckets over multiple time ranges
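A `rate` feature divides two bucketed counters that are summed over the same period. A minimal Python sketch, using integer day indices as buckets for brevity; returning `None` when a period has no impressions is an assumption, since the slide does not say how a zero denominator is handled. `RateFeature` is a hypothetical name.

```python
from __future__ import annotations

from collections import defaultdict


class RateFeature:
    """top/bottom ratio per item (e.g. clicks / impressions) over daily buckets (sketch)."""

    def __init__(self, top: str = "click", bottom: str = "impression",
                 periods: tuple[int, ...] = (7, 14, 30, 60)) -> None:
        self.top, self.bottom, self.periods = top, bottom, periods
        # interaction type -> item -> day bucket -> count
        self.counts = {t: defaultdict(lambda: defaultdict(int)) for t in (top, bottom)}

    def add(self, interaction: str, item: str, day: int, n: int = 1) -> None:
        if interaction in self.counts:  # other interaction types are ignored
            self.counts[interaction][item][day] += n

    def _sum(self, interaction: str, item: str, today: int, period: int) -> int:
        buckets = self.counts[interaction].get(item, {})
        return sum(c for d, c in buckets.items() if today - period < d <= today)

    def features(self, item: str, today: int) -> list[float | None]:
        out = []
        for p in self.periods:
            bottom = self._sum(self.bottom, item, today, p)
            top = self._sum(self.top, item, today, p)
            out.append(top / bottom if bottom else None)  # undefined without impressions
        return out


ctr = RateFeature()
for day, imps, clicks in [(100, 10, 2), (90, 10, 1), (60, 20, 1)]:
    ctr.add("impression", "item1", day, imps)
    ctr.add("click", "item1", day, clicks)
print(ctr.features("item1", today=100))
```

For `item1` the four periods give CTRs of 0.2, 0.15, 0.15 and 0.1: the shorter windows react faster to a recent change in popularity, the longer ones are less noisy.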
# has this user previously interacted with
# another item that has the same field value?
- name: clicked_color
  type: interacted_with
  interaction: click
  field: metadata.color
  scope: user
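A minimal Python sketch of the `interacted_with` idea: remember, per user, which field values they clicked and on which items, then flag a candidate item if the user clicked a *different* item sharing one of its values. This sketch returns a 0/1 flag; whether the real feature emits a flag or a count is not stated on the slide. `InteractedWith` and the product ids are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


class InteractedWith:
    """Did the user interact with another item sharing a field value? (sketch)"""

    def __init__(self, field: str) -> None:
        self.field = field
        self.item_values: dict[str, set[str]] = {}
        # user -> field value -> items the user clicked that carry this value
        self.user_clicks: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))

    def on_item(self, item: str, fields: dict) -> None:
        value = fields.get(self.field)
        if value is not None:
            # fields like "color" may hold a single string or a list of strings
            self.item_values[item] = set(value) if isinstance(value, list) else {value}

    def on_interaction(self, user: str, item: str) -> None:
        for v in self.item_values.get(item, set()):
            self.user_clicks[user][v].add(item)

    def value(self, user: str, item: str) -> float:
        clicks = self.user_clicks.get(user, {})
        for v in self.item_values.get(item, set()):
            if clicks.get(v, set()) - {item}:  # clicked some *other* item with v
                return 1.0
        return 0.0


clicked_color = InteractedWith("color")
clicked_color.on_item("product1", {"color": ["blue", "black"]})
clicked_color.on_item("product2", {"color": "red"})
clicked_color.on_item("product3", {"color": ["black"]})
clicked_color.on_interaction("user1", "product1")
print(clicked_color.value("user1", "product3"))  # 1.0, shares "black" with product1
print(clicked_color.value("user1", "product2"))  # 0.0
```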
# n-gram match between the item title and the search query
- name: title_match
  type: field_match
  itemField: item.title
  rankingField: ranking.query
  method:
    type: ngram
    n: 3
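One way an n-gram `field_match` could score a query against a title, sketched in Python: split both into lowercase character 3-grams per word and take the Jaccard similarity of the two sets. The Jaccard formula and the per-word splitting are assumptions for illustration; the slide only names the method (`ngram`, `n: 3`).

```python
from __future__ import annotations


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Character n-grams per lowercased word; words shorter than n are kept whole."""
    grams: set[str] = set()
    for word in text.lower().split():
        if len(word) < n:
            grams.add(word)
        else:
            grams.update(word[i : i + n] for i in range(len(word) - n + 1))
    return grams


def ngram_match(query: str, title: str, n: int = 3) -> float:
    """Jaccard similarity of the two n-gram sets (the scoring formula is an assumption)."""
    q, t = char_ngrams(query, n), char_ngrams(title, n)
    union = q | t
    return len(q & t) / len(union) if union else 0.0


print(ngram_match("jeans", "Nice jeans"))  # 0.6
print(ngram_match("socks", "Nice jeans"))  # 0.0
```

Unlike exact token matching, n-grams give partial credit to near misses: `ngram_match("jean", "Nice jeans")` scores 0.4 even though "jean" is not a token of the title.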
Demo: ranklens config
Demo: importing data and training the model
Demo: sending requests
We built Metarank to solve our own problem, but it may be useful for you too.