Andrew Feeney

We use the Fast Fourier Transform (FFT) to do lossy compression for things like images (e.g. JPEG). Most of the information is dropped and only the most important information is retained, which, when reversed, yields a noisy but recognisable version of the original image.
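To make the idea concrete, here's a minimal sketch (my own illustration, not from the thread) of FFT-based lossy compression on a 1-D signal: transform, drop all but the largest coefficients, invert, and get back an approximation of the original.

```python
import numpy as np

def fft_compress(signal, keep=0.1):
    """Keep only the largest-magnitude `keep` fraction of FFT coefficients."""
    coeffs = np.fft.fft(signal)
    n_keep = max(1, int(len(coeffs) * keep))
    # Zero out everything below the n_keep-th largest magnitude.
    threshold = np.sort(np.abs(coeffs))[-n_keep]
    lossy = np.where(np.abs(coeffs) >= threshold, coeffs, 0)
    # Input was real, so the reconstruction's imaginary part is ~0.
    return np.fft.ifft(lossy).real

# A smooth, periodic signal survives heavy truncation almost perfectly,
# because its energy sits in just a few frequency bins.
t = np.arange(256) / 256
signal = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 7 * t)
recovered = fft_compress(signal, keep=0.05)
error = np.abs(signal - recovered).max()
```

With only 5% of the coefficients kept, `error` is tiny here; noisier signals (or images) degrade more gracefully into the "noisy but recognisable" reconstructions described above.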

If this works on a simple matrix of multi-dimensional vectors (e.g. a bitmap), could it not also be done with word embeddings (like word2vec) to perform lossy compression on text? Is this a thing?
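A hypothetical version of that experiment (my sketch, with random vectors standing in for real word2vec embeddings): treat a document's embedding matrix (tokens × dimensions) as a 2-D "image", apply the same FFT-truncation trick, and see how much of the original survives. A real attempt would then snap each reconstructed row back to its nearest vocabulary vector to recover words.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a document: 128 tokens, each a 50-dim embedding.
# (Real word2vec vectors would replace this random matrix.)
doc = rng.normal(size=(128, 50))

# 2-D FFT over the whole matrix, keep the top 10% of coefficients.
coeffs = np.fft.fft2(doc)
n_keep = int(coeffs.size * 0.10)
threshold = np.sort(np.abs(coeffs).ravel())[-n_keep]
lossy = np.where(np.abs(coeffs) >= threshold, coeffs, 0)
# Conjugate-symmetric pairs share a magnitude, so symmetry is
# preserved and the inverse transform stays (essentially) real.
recovered = np.fft.ifft2(lossy).real

# The reconstruction is noisy but still correlated with the original.
corr = np.corrcoef(doc.ravel(), recovered.ravel())[0, 1]
```

For random vectors the energy is spread across all frequencies, so the reconstruction is quite noisy; real embedding sequences presumably have more structure, which is exactly what would make the question interesting.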

I'm so curious what the result would be. Could you use it as a tool to distill the meaning of a text, either by reducing the amount of text, or by reducing a large corpus down to just the gist without having to store the whole thing? Obviously this would be highly problematic and probably a bad idea, but maybe humorous at least?

Another thought I had was to use it to fit a larger corpus into a limited number of parameters when priming a generative network.

If this idea has any merit, which I'm sure it doesn't, it's hard to imagine that somebody who actually knows about this stuff (unlike me) hasn't already tried it, but I haven't yet found any references. Then again, I'm not really sure what to search for.

Since writing this I’ve found loads more academic papers in this area, but haven’t had time to dive in. But @aparrish has done some amazing work in this area which I’m only just scratching the surface of. Incidentally, she has done exactly what I had in mind with FFT and word2vec, and presented it in this talk from 2016. The result is perfect, no notes. youtube.com/watch?v=meovx9OqWJ