Streamify: Local Yandex Music Analytics

Personal Yandex Music analytics on your laptop.

How to prepare future maps without risky guesses about location. Metadata stays local: ingestion, DuckDB/dbt, dashboard, reports, action queues and reproducible documentation.

Future Location Enrichment Contract

This document sketches a future, optional geo/location data contract for Streamify. It is not part of the current Yandex Music local ingestion path and should not be treated as an implemented feature.

Why Yandex Music Metadata Is Not Enough

The current Yandex Music adapter reads account-visible metadata through the yandex-music Python client: tracks, artists, albums, playlists, playlist membership, liked markers and derived library events. That metadata does not include a stable listening location field.

Even when timestamps are present, they describe a music library action such as a like, playlist membership or account-visible history item. They do not prove where the user was when listening. Region, catalog availability, account locale, artist country or playlist language are not user location signals.

Safe User Location Sources

Location enrichment must be user-supplied, optional and separable from music ingestion. Possible sources over time:

  • Google Timeline / Google Takeout Location History, if the user has it enabled and explicitly exports it.
  • iOS Significant Locations, noted as a possible on-device source but not practically exportable for reliable Streamify ingestion.
  • Photo EXIF GPS coordinates, only from user-selected photos and only with explicit consent.
  • Calendar or travel exports, such as flight, hotel, event or trip records that the user explicitly provides.
  • A manual city timeline maintained by the user, for example date ranges such as 2025-06-01 to 2025-06-14 in Tbilisi, Georgia.
  • Network or IP logs only if the user explicitly provides them and understands their limits. Streamify should not collect IP logs implicitly.

Sources that are not safe defaults:

  • Inferring user location from artist origin, track language, genre, playlist name or Yandex account region.
  • Scraping device, browser or network location without a deliberate user import step.
  • Treating coarse country or IP geolocation as precise movement history.

user_location_events

user_location_events should represent where the user may have been during a time interval, with confidence and provenance.

Suggested fields:

| Field | Type | Notes |
| --- | --- | --- |
| location_event_id | string | Stable hash of source, source row id and normalized time interval. |
| source | string | google_takeout, photo_exif, calendar_travel, manual_city_timeline, network_ip_log, or another explicit import source. |
| source_record_id | string | Optional source-local identifier, redacted where needed. |
| started_at | timestamp | Inclusive UTC start time. |
| ended_at | timestamp | Exclusive UTC end time; may equal started_at for point observations. |
| timezone | string | IANA timezone when known. |
| latitude | double | Optional; omit or round when only coarse location is needed. |
| longitude | double | Optional; omit or round when only coarse location is needed. |
| city | string | Optional normalized city. |
| region | string | Optional state/province/region. |
| country_code | string | Optional ISO 3166-1 alpha-2 code. |
| precision_meters | integer | Approximate spatial precision or bucket size. |
| confidence | double | 0.0 to 1.0; manual ranges and IP-derived locations should usually be lower confidence than direct GPS. |
| is_inferred | boolean | True when the row is inferred from an indirect source such as calendar travel or IP logs. |
| consent_scope | string | User-approved scope, such as analytics_only or city_level_only. |
| imported_at | timestamp | Time Streamify imported the row. |

The table should allow overlapping rows because real-world sources conflict. Downstream joins must choose a deterministic tie-break rule instead of assuming one location per timestamp.
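
One possible shape for such a row, with a stable `location_event_id`, is sketched below. It is an assumption-laden illustration rather than an existing Streamify model: the class `UserLocationEvent`, the helper `make_location_event_id`, the SHA-256 choice and the unit-separator join are all hypothetical, and only a subset of the suggested fields is shown.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


def _utc_iso(value: datetime) -> str:
    """Normalize an aware datetime to a UTC ISO-8601 string."""
    if value.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    return value.astimezone(timezone.utc).isoformat()


def make_location_event_id(source: str, source_record_id: Optional[str],
                           started_at: datetime, ended_at: datetime) -> str:
    """Hash source, source row id and the normalized interval (hypothetical scheme)."""
    parts = [source, source_record_id or "", _utc_iso(started_at), _utc_iso(ended_at)]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class UserLocationEvent:
    source: str
    started_at: datetime          # inclusive
    ended_at: datetime            # exclusive; equals started_at for a point
    confidence: float             # 0.0 to 1.0
    is_inferred: bool
    consent_scope: str
    source_record_id: Optional[str] = None
    city: Optional[str] = None
    country_code: Optional[str] = None

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within 0.0 to 1.0")
        if self.ended_at < self.started_at:
            raise ValueError("ended_at must not precede started_at")

    @property
    def location_event_id(self) -> str:
        return make_location_event_id(
            self.source, self.source_record_id, self.started_at, self.ended_at
        )


# The same instant expressed in two offsets yields the same id.
start_utc = datetime(2025, 5, 31, 20, 0, tzinfo=timezone.utc)
end_utc = datetime(2025, 6, 14, 20, 0, tzinfo=timezone.utc)
plus4 = timezone(timedelta(hours=4))
row_a = UserLocationEvent("manual_city_timeline", start_utc, end_utc,
                          0.6, False, "city_level_only", city="Tbilisi", country_code="GE")
row_b = UserLocationEvent("manual_city_timeline", start_utc.astimezone(plus4),
                          end_utc.astimezone(plus4), 0.6, False, "city_level_only")
print(row_a.location_event_id == row_b.location_event_id)
```

Hashing the UTC-normalized interval, rather than the raw source strings, means a re-import of the same export should not create duplicate ids, while two sources describing the same interval still produce distinct, possibly overlapping rows.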

artist_locations

artist_locations should describe artist-associated places, not user listening places. It can support questions such as geographic diversity of artists, but it must never be used as evidence of where the user listened.

Suggested fields:

| Field | Type | Notes |
| --- | --- | --- |
| artist_location_id | string | Stable hash of artist id, source and normalized location. |
| artist_id | string | Streamify/Yandex artist identifier when available. |
| artist_name | string | Display name for review and fallback matching. |
| source | string | Discogs, MusicBrainz, Wikidata, manual curation or another cited source. |
| source_url | string | Optional provenance URL. |
| location_type | string | origin, formed_in, based_in, birthplace, scene, or label_location. |
| started_at | date | Optional date when the association began. |
| ended_at | date | Optional date when the association ended. |
| city | string | Optional normalized city. |
| region | string | Optional region. |
| country_code | string | Optional ISO 3166-1 alpha-2 code. |
| latitude | double | Optional coarse coordinate for mapping. |
| longitude | double | Optional coarse coordinate for mapping. |
| confidence | double | 0.0 to 1.0; biographies and crowd-sourced sources require care. |
| notes | string | Optional caveat for ambiguous or multi-location artists. |
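
A question such as geographic diversity of artists could be answered from these rows alone, without touching any user location data. The sketch below counts artists per country for one `location_type`; the function name `artist_country_counts`, the `min_confidence` threshold and the sample rows are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, Mapping


def artist_country_counts(rows: Iterable[Mapping], location_type: str = "origin",
                          min_confidence: float = 0.5) -> Counter:
    """Count artists per country, using each artist's most confident row."""
    best = {}
    for row in rows:
        if row["location_type"] != location_type:
            continue
        if row.get("country_code") is None or row["confidence"] < min_confidence:
            continue
        current = best.get(row["artist_id"])
        # Keep only the highest-confidence association per artist.
        if current is None or row["confidence"] > current["confidence"]:
            best[row["artist_id"]] = row
    return Counter(row["country_code"] for row in best.values())


# Illustrative rows only; not real catalog data.
sample_rows = [
    {"artist_id": "a1", "location_type": "origin", "country_code": "GE", "confidence": 0.9},
    {"artist_id": "a1", "location_type": "origin", "country_code": "DE", "confidence": 0.6},
    {"artist_id": "a2", "location_type": "origin", "country_code": "GE", "confidence": 0.8},
    {"artist_id": "a3", "location_type": "birthplace", "country_code": "FR", "confidence": 0.9},
    {"artist_id": "a4", "location_type": "origin", "country_code": "JP", "confidence": 0.2},
]
counts = artist_country_counts(sample_rows)
print(dict(counts))
```

The result describes the artists in the library, not the listener: nothing here reads or implies a user location.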

Joining Location To Library Events

The future join should be timestamp-based and explicit about uncertainty:

  1. Normalize all user_library_events.event_at, user_location_events.started_at and user_location_events.ended_at values to UTC.
  2. For each library event with a usable timestamp, find location events where started_at <= event_at < ended_at.
  3. If no interval matches, optionally search nearest point observations within a configured window, such as 30 minutes for GPS-like data or one day for manual city timelines.
  4. Rank candidates by source trust, precision, confidence, non-inferred status and distance from the observation time.
  5. Persist the selected match in a bridge table such as user_library_event_locations, including location_event_id, match_method, match_confidence, time_delta_seconds and location_precision_meters.
  6. Keep unmatched music events. Missing location is expected and should not fail ingestion.
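
The interval match, nearest-point fallback and deterministic ranking described above can be sketched in plain Python. This is one possible reading of the steps, not an implemented join: the `SOURCE_TRUST` weights, the `LocationCandidate` class, the single `window` parameter and the sign convention for the time delta (observation time minus event time) are assumptions, and only the interval_exact and nearest_point methods are covered.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence, Tuple

# Hypothetical trust weights; higher means more trusted.
SOURCE_TRUST = {
    "google_takeout": 3,
    "photo_exif": 3,
    "calendar_travel": 2,
    "manual_city_timeline": 2,
    "network_ip_log": 1,
}


@dataclass(frozen=True)
class LocationCandidate:
    location_event_id: str
    source: str
    started_at: datetime   # inclusive, UTC
    ended_at: datetime     # exclusive, UTC; equals started_at for a point
    precision_meters: int
    confidence: float
    is_inferred: bool


def _rank_key(loc: LocationCandidate, event_at: datetime):
    """Sort key: trust, precision, confidence, non-inferred, time distance, id."""
    is_point = loc.started_at == loc.ended_at
    distance = abs((loc.started_at - event_at).total_seconds()) if is_point else 0.0
    return (-SOURCE_TRUST.get(loc.source, 0), loc.precision_meters,
            -loc.confidence, loc.is_inferred, distance, loc.location_event_id)


def match_location(event_at: datetime, candidates: Sequence[LocationCandidate],
                   window: timedelta = timedelta(minutes=30),
                   ) -> Optional[Tuple[LocationCandidate, str, int]]:
    """Return (candidate, match_method, time_delta_seconds) or None."""
    # Half-open interval match; point rows never match here because
    # started_at == ended_at makes the interval empty.
    covering = [c for c in candidates if c.started_at <= event_at < c.ended_at]
    if covering:
        best = min(covering, key=lambda c: _rank_key(c, event_at))
        return best, "interval_exact", 0
    # Fallback: point observations within the configured window.
    points = [c for c in candidates
              if c.started_at == c.ended_at and abs(c.started_at - event_at) <= window]
    if not points:
        return None  # unmatched events are kept; missing location is expected
    best = min(points, key=lambda c: _rank_key(c, event_at))
    return best, "nearest_point", int((best.started_at - event_at).total_seconds())


utc = timezone.utc
event_at = datetime(2025, 6, 5, 12, 0, tzinfo=utc)
manual = LocationCandidate("m1", "manual_city_timeline",
                           datetime(2025, 5, 31, 20, tzinfo=utc),
                           datetime(2025, 6, 14, 20, tzinfo=utc), 20000, 0.6, False)
ip_row = LocationCandidate("i1", "network_ip_log",
                           datetime(2025, 6, 5, 11, tzinfo=utc),
                           datetime(2025, 6, 5, 13, tzinfo=utc), 50000, 0.3, True)
gps_point = LocationCandidate("g1", "google_takeout",
                              datetime(2025, 6, 5, 12, 10, tzinfo=utc),
                              datetime(2025, 6, 5, 12, 10, tzinfo=utc), 25, 0.9, False)
print(match_location(event_at, [manual, ip_row, gps_point]))
```

Ending the sort key with `location_event_id` guarantees a single winner even when two overlapping rows tie on every other criterion. A per-source window, as the steps suggest, could replace the single `window` argument.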

Recommended bridge fields:

| Field | Type | Notes |
| --- | --- | --- |
| event_location_id | string | Stable hash of library event and selected location event. |
| library_event_id | string | Existing music/library event id. |
| location_event_id | string | Selected user location event id. |
| match_method | string | interval_exact, nearest_point, manual_range, calendar_range, ip_coarse, or similar. |
| match_confidence | double | Combined confidence after tie-breaking. |
| time_delta_seconds | integer | 0 for interval matches; signed delta for nearest-point matches. |
| location_precision_meters | integer | Spatial precision used for the match. |
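
A bridge row with these fields might be assembled as in the sketch below. The contract does not define how match_confidence is combined, so the linear decay over the matching window used here is purely an assumption, as are the function name `build_bridge_row` and the SHA-256 id scheme.

```python
import hashlib


def build_bridge_row(library_event_id: str, location_event_id: str,
                     match_method: str, location_confidence: float,
                     time_delta_seconds: int, window_seconds: int,
                     location_precision_meters: int) -> dict:
    """Assemble one user_library_event_locations row (hypothetical scoring)."""
    # Assumed scoring: confidence decays linearly to zero at the window edge.
    if window_seconds > 0:
        decay = max(0.0, 1.0 - abs(time_delta_seconds) / window_seconds)
    else:
        decay = 1.0 if time_delta_seconds == 0 else 0.0
    # Stable id from the library event and the selected location event.
    raw_key = f"{library_event_id}\x1f{location_event_id}".encode("utf-8")
    return {
        "event_location_id": hashlib.sha256(raw_key).hexdigest(),
        "library_event_id": library_event_id,
        "location_event_id": location_event_id,
        "match_method": match_method,
        "match_confidence": location_confidence * decay,
        "time_delta_seconds": time_delta_seconds,
        "location_precision_meters": location_precision_meters,
    }


bridge = build_bridge_row("lib-1", "loc-1", "nearest_point", 0.8, -900, 1800, 25)
print(bridge["match_method"], bridge["time_delta_seconds"])
```

Because the id depends only on the two event ids, re-running the join with the same selection produces the same row, which keeps the bridge table idempotent.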

Privacy Constraints

  • Location imports must be opt-in and separate from YANDEX_MUSIC_TOKEN setup.
  • Raw high-precision location files should remain local, ignored by git and excluded from reports by default.
  • Default analytics should use city, region or country buckets instead of exact coordinates.
  • Users must be able to delete imported location data without deleting music metadata.
  • Reports, snapshots and dashboards should label location-derived metrics as optional and source-dependent.
  • The manifest should store row counts, source names and checksums, not raw coordinates or sensitive source identifiers.
  • Consent scope should travel with derived rows so a city-only import is not later used for exact maps.
  • IP-derived rows must be marked inferred and coarse, and must never be collected implicitly.
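
Two of these constraints, consent scope travelling with derived rows and a manifest without raw coordinates, can be illustrated with a short sketch. The scope name `exact_map`, the helper names and the sample row are hypothetical; the contract itself only names analytics_only and city_level_only as example scopes.

```python
import hashlib

# Hypothetical allow-list: only these scopes may keep exact coordinates.
SCOPES_ALLOWING_COORDINATES = {"exact_map"}


def apply_consent_scope(row: dict) -> dict:
    """Return a copy of the row with coordinates dropped unless the scope allows them."""
    out = dict(row)
    if out.get("consent_scope") not in SCOPES_ALLOWING_COORDINATES:
        out["latitude"] = None
        out["longitude"] = None
    return out


def manifest_entry(source: str, raw_bytes: bytes, row_count: int) -> dict:
    """Describe an import by source name, row count and checksum only."""
    return {
        "source": source,
        "row_count": row_count,
        "sha256": hashlib.sha256(raw_bytes).hexdigest(),
    }


# Illustrative row; coordinates are approximate and for demonstration only.
imported = {
    "source": "photo_exif",
    "city": "Tbilisi",
    "country_code": "GE",
    "latitude": 41.7151,
    "longitude": 44.8271,
    "consent_scope": "city_level_only",
}
derived = apply_consent_scope(imported)
entry = manifest_entry("photo_exif", b"raw export bytes", row_count=1)
print(derived["city"], derived["latitude"], sorted(entry))
```

Using an allow-list means an unknown or missing scope falls back to bucket-level data, which matches the default of city, region or country analytics.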

Inference Caveats

Location enrichment can answer "what music-library event happened while the user's provided location data suggests they were in this place?" It cannot prove the user listened there unless the source event itself is a trustworthy listening event and the location source is accurate for the same time.

Important caveats:

  • Library likes and playlist edits can happen long after listening.
  • Manual city timelines are useful for coarse trip context but poor for exact movement.
  • Calendar travel can describe intended plans, not actual presence.
  • GPS and photo EXIF can be sparse and biased toward moments when photos were taken.
  • IP geolocation can be wrong because of VPNs, mobile carriers, corporate networks and provider databases.
  • Artist location is artist metadata, not user location.

Any product surface using this contract should show confidence and source labels, avoid precise claims, and prefer language such as "associated with your provided location timeline" over "listened in."