We use Fourier-type transforms to do #LossyCompression for things like images (JPEG actually uses the DCT, a close relative of the #FFT #algorithm). Most of the frequency coefficients are dropped and only the most significant ones are kept; inverting the transform then gives back a noisy but recognisable version of the original image.
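Roughly like this, I think (a minimal numpy sketch that keeps only the largest Fourier coefficients; not how JPEG actually does it, just the general idea):

```python
# Minimal sketch of FFT-based lossy image compression in numpy.
# Illustrative only: real JPEG uses a blockwise DCT plus quantisation.
import numpy as np

def compress_image(img: np.ndarray, keep: float = 0.05) -> np.ndarray:
    """Keep only the largest `keep` fraction of Fourier coefficients."""
    coeffs = np.fft.fft2(img)                         # to the frequency domain
    threshold = np.quantile(np.abs(coeffs), 1 - keep)
    coeffs[np.abs(coeffs) < threshold] = 0            # drop the small coefficients
    return np.real(np.fft.ifft2(coeffs))              # invert: noisy but recognisable

# Example with a random grayscale "image"
img = np.random.rand(128, 128)
reconstruction = compress_image(img, keep=0.05)
```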
If this works on a simple matrix of multi-dimensional vectors (e.g. a bitmap), couldn't it also be done on word embeddings like #word2vec to perform lossy compression on text? Is this a thing?
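Something like this is what I have in mind; a hedged sketch assuming a pre-trained word2vec model loaded with gensim (the variable names and the nearest-word snap-back at the end are just my guesses at how you'd try it, not an established method):

```python
# Hedged sketch: the same FFT-and-drop trick applied to a sequence of word vectors.
# Assumes a pre-trained word2vec model is available via gensim as `wv`.
import numpy as np
from gensim.models import KeyedVectors

def lossy_text(words: list[str], wv: KeyedVectors, keep: float = 0.25) -> list[str]:
    vectors = np.stack([wv[w] for w in words if w in wv])  # (n_words, dim)
    coeffs = np.fft.fft(vectors, axis=0)                   # transform along the sequence
    threshold = np.quantile(np.abs(coeffs), 1 - keep)
    coeffs[np.abs(coeffs) < threshold] = 0                 # the lossy step
    recon = np.real(np.fft.ifft(coeffs, axis=0))
    # Snap each reconstructed vector back to its nearest word in the vocabulary.
    return [wv.similar_by_vector(vec, topn=1)[0][0] for vec in recon]

# Usage (path is a placeholder for whatever pre-trained vectors you have):
# wv = KeyedVectors.load_word2vec_format("path/to/word2vec.bin", binary=True)
# print(lossy_text("the quick brown fox jumps over the lazy dog".split(), wv))
```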
I'm so curious as to what the result would be. Like, could you use it as a tool to distill the meaning of a text, either by reducing the amount of text, or by reducing a large corpus to just the gist without having to store the whole thing? Obviously this would be highly problematic and probably a bad idea, but maybe humorous at least?
Another thought I had was to use it to fit a larger corpus into a limited number of parameters when priming a generative network.
If this idea has any merit (which I'm sure it doesn't), it's hard to imagine that somebody who actually knows about this stuff (unlike me) hasn't already tried it, but I haven't found any references yet. Then again, I'm not really sure what to search for.
I've found some rather humorous examples of lossy text compression attempts, for instance: https://hackaday.io/project/5689/gallery#0904354f9f40a934977415877b354407