netdur 3 days ago [-]
I tried TQ for vector search and my findings are not good: it is not worth it if you cannot use a GPU. That said, I got the same search quality as 32f using an 8-bit quant.
I wrote an ANN extension for SQLite using TQ. I do save a lot on space, but 32f is still faster despite everything I have tried. Code here: https://github.com/netdur/munind/tree/main/src/tq
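For reference, a minimal sketch in TypeScript of the kind of ternary scheme under discussion. It assumes a simple symmetric threshold and 2-bit packing; the names and threshold rule are illustrative assumptions, not the extension code in the repo linked above.

```typescript
// Minimal ternary quantization (TQ) sketch: f32 -> {-1, 0, +1},
// packed 4 values per byte (2 bits each). The threshold scheme here
// is an illustrative assumption, not the linked extension's code.
function tqQuantize(v: Float32Array, threshold = 0.1): Uint8Array {
  const packed = new Uint8Array(Math.ceil(v.length / 4));
  for (let i = 0; i < v.length; i++) {
    // 2-bit codes: 0 => 0, 1 => +1, 2 => -1
    let code = 0;
    if (v[i] > threshold) code = 1;
    else if (v[i] < -threshold) code = 2;
    packed[i >> 2] |= code << ((i & 3) * 2);
  }
  return packed;
}

// Dot product over two packed ternary vectors. The per-element
// unpacking below is the "extra step" that keeps this slower than a
// plain f32 dot product, even though storage drops ~16x (32 -> 2 bits).
function tqDot(a: Uint8Array, b: Uint8Array, dim: number): number {
  const decode = [0, 1, -1, 0]; // 2-bit code -> value
  let sum = 0;
  for (let i = 0; i < dim; i++) {
    const ca = (a[i >> 2] >> ((i & 3) * 2)) & 3;
    const cb = (b[i >> 2] >> ((i & 3) * 2)) & 3;
    sum += decode[ca] * decode[cb];
  }
  return sum;
}
```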
You're right that 32f is faster on raw query time; quantization adds an extra step. The main benefit is download size, since gzip won't help much with raw float data, and that matters most in browser contexts.
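To make the size argument concrete, a rough back-of-envelope; N and D below are assumed figures, not measurements from the demo:

```typescript
// Back-of-envelope: N vectors of dimension D (both assumed figures).
const N = 10_000;
const D = 384; // typical small sentence-embedding dimension
const f32Bytes = N * D * 4;           // ~15.4 MB; gzip barely dents random-looking floats
const tqBytes = N * Math.ceil(D / 4); // 2 bits/value packed => ~0.96 MB over the wire
console.log({ f32Bytes, tqBytes, ratio: f32Bytes / tqBytes }); // ratio = 16
```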
ninja3925 3 days ago [-]
So I assumed it would get crushed by OPQ (which requires training).
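For context: OPQ is product quantization with a learned rotation, so it needs training data to fit its codebooks, which is exactly the step TQ skips. A minimal sketch of the plain-PQ encode step, assuming codebooks were already trained offline (e.g. via k-means); the learned rotation that makes it "O"PQ is omitted:

```typescript
// PQ encode step: split a D-dim vector into M subvectors and replace
// each with the index of its nearest centroid. codebooks[m] holds K
// centroids for subspace m, trained offline (e.g. k-means) -- the
// training step that TQ avoids entirely.
function pqEncode(v: Float32Array, codebooks: Float32Array[][]): Uint8Array {
  const M = codebooks.length;
  const sub = v.length / M;
  const codes = new Uint8Array(M); // D floats -> M bytes (for K <= 256)
  for (let m = 0; m < M; m++) {
    let best = 0;
    let bestDist = Infinity;
    for (let k = 0; k < codebooks[m].length; k++) {
      let d = 0;
      for (let j = 0; j < sub; j++) {
        const diff = v[m * sub + j] - codebooks[m][k][j];
        d += diff * diff;
      }
      if (d < bestDist) { bestDist = d; best = k; }
    }
    codes[m] = best;
  }
  return codes;
}
```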
refulgentis 3 days ago [-]
Sloppiest slop I've seen in a couple weeks:
- fork of a fork of a quantization technique
- Only contribution is...compiling JS to WASM by default?
- suspicious burst of ~nothing comments from new accounts
- 6 comments 7 hours in: 4 flagged/dead, the other 2 also spammy; confused and making category errors at best, more spam at worst.
- Demo shows it's worse: 800 ms instead of 2.6 ms for text embedding search.
- "but it saves space" - yes! 1.2 MB in RAM instead of 7.2 MB, at the cost of turning search into 1 s on a MacBook Pro M4 Max instead of a sub-frame duration.
- It's not even wrong to do this with the output embeddings; there are far more obvious ways to save space that don't affect retrieval time this much (see the int8 sketch below).
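One of the "more obvious ways" presumably meant here, and consistent with netdur's note upthread that 8-bit quantization matched 32f quality, is plain scalar int8 quantization. A minimal sketch, assuming per-vector max-abs scaling (other schemes exist):

```typescript
// Scalar int8 quantization: 4x smaller than f32, and the dot product
// stays a tight integer loop, so query time is barely affected.
// Per-vector max-abs scaling is an assumption, not a prescription.
function int8Quantize(v: Float32Array): { q: Int8Array; scale: number } {
  let maxAbs = 0;
  for (const x of v) maxAbs = Math.max(maxAbs, Math.abs(x));
  const scale = maxAbs / 127 || 1; // avoid a zero scale for all-zero vectors
  const q = new Int8Array(v.length);
  for (let i = 0; i < v.length; i++) q[i] = Math.round(v[i] / scale);
  return { q, scale };
}

function int8Dot(a: { q: Int8Array; scale: number },
                 b: { q: Int8Array; scale: number }): number {
  let sum = 0;
  for (let i = 0; i < a.q.length; i++) sum += a.q[i] * b.q[i];
  return sum * a.scale * b.scale;
}
```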
glohbalrob 3 days ago [-]
Very cool. I added Google's new multi embedding 2 model to my site the other week.
I guess I need to dig into this and see if it's faster and has more use cases! Thanks for publishing your work.
hhthrowaway1230 3 days ago [-]
Awesome! Also love the Gaussian splat demo, cool use case!