-

@ mleku
2025-05-18 15:25:13
humbug... so i did some research on these things and almost all of them involve having giant dictionaries for "stemming" and shit, and i'm like, uh. i just want something simple that gives a reasonable ordering, where most of the results in the top 50% are actually relevant
proximity searching was the most generalised type of search
most of the search algorithms are tailored for large documents, not small ones, and most nostr text events are between 1 and about 5000 words long, so i think that proximity search is the right search
i know i can write a function that evaluates relative ordering compared to the search query text. it's just a little tricky constructing the loop: it walks the search terms, builds a list of counts of how many of the hits in the text come in the same order as in the query, and then weights that by the distance between terms, as closer distances are more likely to be relevant
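
a rough sketch of what that loop could look like in go, under some assumptions that are not from the thread: the names `tokenize`, `Result` and `Proximity` are made up for illustration, words are split on anything that is not a unicode letter or digit (which will not segment chinese or japanese text written without spaces), and the weights (1 point per matched term, 1/distance per adjacent pair of query terms, doubled when the pair appears in query order) are arbitrary placeholders, not a tuned formula.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize lowercases s and splits it on every rune that is not a letter
// or digit. No stemming and no dictionaries (assumed splitting rule).
func tokenize(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Result is a hypothetical container for the scoring details.
type Result struct {
	Matched int     // query terms that occur somewhere in the text
	InOrder int     // adjacent query-term pairs whose closest hit is in query order
	Score   float64 // Matched plus the proximity weights of each pair
}

// Proximity scores text against query using whole-word matches only.
func Proximity(query, text string) Result {
	terms := tokenize(query)
	// Record every word position in the text, keyed by word.
	positions := make(map[string][]int)
	for i, w := range tokenize(text) {
		positions[w] = append(positions[w], i)
	}
	var res Result
	for _, t := range terms {
		if len(positions[t]) > 0 {
			res.Matched++
		}
	}
	// Walk adjacent pairs of query terms and find their closest hits.
	for i := 0; i+1 < len(terms); i++ {
		best, ordered := 0, false // best == 0 means no pair found yet
		for _, p := range positions[terms[i]] {
			for _, q := range positions[terms[i+1]] {
				d := q - p
				inOrder := d > 0
				if d < 0 {
					d = -d
				}
				if d == 0 {
					continue // same position: the query repeated a term
				}
				// Prefer the smaller gap; on a tie prefer query order.
				if best == 0 || d < best || (d == best && inOrder && !ordered) {
					best, ordered = d, inOrder
				}
			}
		}
		if best == 0 {
			continue
		}
		w := 1.0 / float64(best) // closer terms weigh more
		if ordered {
			res.InOrder++
			w *= 2 // assumed bonus for matching the query order
		}
		res.Score += w
	}
	res.Score += float64(res.Matched)
	return res
}

func main() {
	q := "proximity search"
	for _, text := range []string{
		"i think proximity search is the right search",
		"search for events by proximity",
		"gm nostr",
	} {
		fmt.Printf("%+v  %q\n", Proximity(q, text), text)
	}
}
```

the nested loop over positions is quadratic in the number of hits per term, which seems tolerable for notes of a few thousand words; sorting events by `Score` descending would give the ranking.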
i am NOT making this language specific because most of the time the posts i see on nostr are either english, spanish, portuguese, japanese, chinese or russian
i could maybe make it so the matching is fuzzy so it allows prefix or suffix or infix matches to apply as a hit, but i want to just try and make it work for simple whole word matches with proximity and sequence, i think
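
for the fuzzy variant, a minimal sketch of classifying a hit as whole, prefix, suffix or infix could look like this. the `MatchKind` type, the `Match` function and the ranking order (whole > prefix > suffix > infix) are assumptions for illustration only; a scorer could give the weaker kinds a smaller weight than a whole-word hit.

```go
package main

import (
	"fmt"
	"strings"
)

// MatchKind describes how a query term matched a word in the text.
// Higher values are assumed to be stronger matches.
type MatchKind int

const (
	NoMatch MatchKind = iota
	Infix             // term occurs inside the word
	Suffix            // word ends with the term
	Prefix            // word starts with the term
	Whole             // word equals the term
)

// Match compares term against word case-insensitively and reports the
// strongest kind of match that applies.
func Match(term, word string) MatchKind {
	term, word = strings.ToLower(term), strings.ToLower(word)
	switch {
	case term == "":
		return NoMatch // an empty term would otherwise match everything
	case word == term:
		return Whole
	case strings.HasPrefix(word, term):
		return Prefix
	case strings.HasSuffix(word, term):
		return Suffix
	case strings.Contains(word, term):
		return Infix
	}
	return NoMatch
}

func main() {
	for _, w := range []string{"search", "searching", "research", "researching", "find"} {
		fmt.Println(w, Match("search", w))
	}
}
```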