this post was submitted on 01 Sep 2023

Free and Open Source Software


If it's free and open source and it's also software, it can be discussed here. Subcommunity of Technology.


This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.

My small non-profit team produces a lot of content in the form of blogs, presentations, graphics, and mp3 and mp4 files. We are looking for a tool that can classify this content and let us search it for relevant information on a given topic. The goal is to get the most out of the IP we've already developed. Are any of you using #foss tools to do this? Bonus points if it supports natural language querying or generative AI.

TheHobbyist@lemmy.zip 3 points 1 year ago

I suppose you can split your content into three categories:

  • text
  • audio
  • image

For text, you can use LangChain, which lets you generate embeddings from text (read more here: https://js.langchain.com/docs/modules/data_connection/text_embedding/).
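
To make the embedding idea concrete, here is a rough sketch of search by vector similarity. It is not LangChain code: `embed` is a toy word-count stand-in for a real embedding model, and the file names and texts are made up for illustration. With a real model you would swap out `embed` and keep the rest.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model (assumption, not LangChain):
    represents a text as a sparse vector of lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents, top_k=3):
    """Return up to top_k (name, score) pairs, best match first.
    Documents with no overlap at all (score 0) are dropped."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(text)), name)
              for name, text in documents.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top_k] if score > 0]


# Hypothetical content library: file name -> extracted text or transcript.
documents = {
    "blog-fundraising.md": "Tips for fundraising campaigns and donor outreach",
    "talk-volunteers.pptx": "How we recruit and train volunteers",
    "podcast-ep3.txt": "Transcript: interview about donor retention and fundraising events",
}

results = search("fundraising donor", documents)
for name, score in results:
    print(f"{score:.2f}  {name}")
```

A word-count vector only matches exact words. A real embedding model would also place related wording (say, "donations" and "fundraising") close together, which is what makes natural language queries work.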

For images, you can use CLIP (this model is open source, from OpenAI). You can read more about it here: https://github.com/openai/CLIP
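
A rough sketch of how that could look, assuming `torch`, `Pillow` and the `clip` package from that repository are installed. The `clip.load`, `clip.tokenize`, `encode_image` and `encode_text` calls follow the repository's README. The helper names, file names and the made-up vectors in the demo are my own, for illustration only.

```python
import math


def embed_with_clip(image_paths, queries):
    """Embed images and text queries into CLIP's shared vector space.

    The heavy dependencies are imported here, inside the function, so the
    ranking helper below can be tried without them installed.
    """
    import torch
    import clip
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    # preprocess() turns one PIL image into one tensor; stack them into a batch.
    images = torch.stack([preprocess(Image.open(p)) for p in image_paths]).to(device)
    tokens = clip.tokenize(queries).to(device)

    with torch.no_grad():
        image_vecs = model.encode_image(images)
        query_vecs = model.encode_text(tokens)
    return image_vecs.float().cpu().tolist(), query_vecs.float().cpu().tolist()


def rank_images(query_vec, image_vecs, names):
    """Order image names by cosine similarity to the query vector."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    scored = sorted(((cosine(query_vec, vec), name)
                     for vec, name in zip(image_vecs, names)), reverse=True)
    return [name for _, name in scored]


# Demo with made-up 2-d vectors so it runs without the model.
names = ["logo.png", "chart.png"]
fake_image_vecs = [[1.0, 0.0], [0.0, 1.0]]
fake_query_vec = [0.1, 0.9]
ranking = rank_images(fake_query_vec, fake_image_vecs, names)
print(ranking)
```

With the model installed, you would call `embed_with_clip(paths, ["a bar chart of donations"])` once, store the image vectors, and then rank them against each new text query.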

For audio, I don't know of anything off the top of my head, but you are likely to find something similar to the tools above, possibly even open source.

astromd 1 points 1 year ago

Thanks for the suggestions. I have audio transcripts of all the mp3s.