Part 1: Reviews of AI products for knowledge management
Ni-Howdy, welcome back to the second article of Inf Thoughts! It’s been a while since the first newsletter, as your boy has been working on something interesting. This will be a two-part series zooming in on AI-generated content (AIGC), specifically voice and text. In this one, I will share some reviews of the artificial linguistic creativity tools I've been using for the last couple of months.
TL;DR:
Personal favorite knowledge management products
voice-to-text:
Otter.ai
Lark Minutes (by ByteDance)
text-to-voice:
Speechify
text-to-text:
Mem.ai
Rewind.ai
Market Maps, Articles, and Tweets for your reference
From Sonya Huang, Partner at Sequoia:


From Anne-Laure Le Cunff, founder of Ness Labs:


Generative AI index from Scale Ventures: https://www.scalevp.com/generative-ai
Great AI trend discussions with 10 investors: https://www.generalist.com/briefing/what-to-watch-in-a
As a data science master's student at UPenn and a former intern at VCs, I always make synthesizing information my top priority. More specifically, my pain points are summarizing lectures, books, podcasts, and articles to “build a second brain” at school, and generating high-quality, continuous information flows as a junior team member at work. I got familiar with natural language processing (NLP) and automatic speech recognition (ASR) products relatively early and was fascinated by how much suitable new gear could boost my productivity.
By functionality, ASR and NLP products can be put into 4 categories: voice-to-text, text-to-voice, voice-to-voice, and text-to-text. For voice-to-voice, I can see great application potential, such as more accurate real-time translation across languages (see the recent breakthrough from Meta). However, I currently don’t have much need for it, so I won’t comment too much.
“Our team first translated English or Hokkien speech to Mandarin text, and then translated it to Hokkien or English,” said Juan Pino, researcher at Meta. “They then added the paired sentences to the data used to train the AI model.”
For voice-to-text and text-to-voice, applications include generating meeting notes and podcast transcripts and reading books and articles out loud. The deal breakers for the two categories are more on the user experience side, since the technology is mature enough. The add-on features for a better user experience include collaboration functions, input/output format support, a unified user interface, and cross-platform performance. Otter.ai and Speechify are my go-to tools. Otter.ai is simple and cheap (with a student discount) to use across my devices. Speechify is a text-to-speech reader that offers by far the best voice selection (I listened to Venture Deals in Snoop Dogg’s voiceover…pretty awesome). The knowledge sticks better than when I just read or just listen, and it keeps me focused with information continuously pouring into my ears (voluntarily) and a time estimate for finishing the whole thing (that level of control is great).
However, voice-to-text and text-to-voice tools could still improve their multi-language support, especially under context-switching (words from different languages in one setting). Otter.ai is not fully capable of transcribing industry-specific content, such as my machine learning lectures or meeting notes from a fintech conference. Lark (飞书), the collaboration SaaS from ByteDance, took Lark Minutes (飞书妙记) out of beta in September 2022, and currently I’m using it FOR FREE (again, I’m a broke student). Although it doesn’t fully support context-switching yet either, and its English transcripts are not as accurate as Otter.ai’s, Lark Minutes supports 3 languages (Chinese, English, and Japanese), can automatically detect the input language, and can translate to the desired language accordingly.
The hottest, most exciting category is text-to-text, enabled by large language model (LLM) providers such as OpenAI and Hugging Face. Leveraging GPT-3 and the latest Whisper model from OpenAI, NLP products can perform speech transcription, translation, and text summarization through OpenAI APIs and Hugging Face pipelines. Grammarly, QuillBot, and Writer are all “text optimization” tools, and they are offering more premium features thanks to the new Transformer advancements. I’m still trying to build habits around content generators such as Lex and Rytr. Right now, content generation tools mostly help remove writer’s block by giving prompts, but getting good content out of them is always, as Sarah Guo puts it, a trial-and-error process with no clear instructions given. In fact, the following paragraph was written by Lex:
“When it comes to the question of what is the best AI product for knowledge management, the answer may not be as straightforward as one might hope. There are a variety of different AI products on the market, each with its own unique set of features and benefits. As such, it can be difficult to determine which AI product is best suited for a particular application or use case. However, there are a few general factors that can be considered when evaluating AI products for knowledge management.”
The most innovative products for personal knowledge management are Mem.ai and Rewind.ai. I group them together because, for me, the most powerful function of both is search. Mem.ai is a note-taking platform without file hierarchies that can ingest text from websites, emails, and tweets. It then leverages LLMs to provide holistic search results across my notes, my hashtags, my contacts, and other stored personal data.
“We leverage both OpenAI embeddings models and Pinecone vector search as fundamental pillars of Mem X. These technologies power features such as similar mems and smart results, among others.” — Mem.ai Memo
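To make the embeddings-plus-vector-search idea a bit more concrete, here is a minimal sketch of how a “similar mems” lookup could work in principle. This is my own toy illustration, not Mem.ai's actual code: the note titles, the 3-dimensional vectors, and the function names are all made up, and a real setup would get its vectors from an embeddings model and store them in a vector database instead of a Python dict.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)

def most_similar(query_vec, notes, top_k=2):
    """Rank stored notes by cosine similarity to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), title) for title, vec in notes.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:top_k]]

# Hypothetical toy "embeddings"; real ones have hundreds or thousands of dimensions.
NOTES = {
    "Seed round term sheet notes": [0.9, 0.1, 0.0],
    "Venture Deals book summary": [0.8, 0.2, 0.1],
    "Machine learning lecture 4": [0.1, 0.9, 0.2],
}

query = [0.85, 0.15, 0.05]  # pretend this is the embedding of "VC fundraising"
print(most_similar(query, NOTES))
```

With these made-up vectors, the two venture-related notes rank above the lecture note, even though none of the titles share a keyword with the query. That is the appeal of searching by meaning instead of by exact words.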
I personally love Mem.ai because of its non-folder design and its frictionless note-taking experience. I can easily save Twitter summaries (there are so many “mem it” tweets now that I’m not the only weirdo), newsletters, and texts into Mem.ai. As a Mem X subscriber, I’m waiting for the actual iOS app and the upcoming Mem X features such as intelligent tagging and action suggestions. It would also be next-level if the “upcoming event” feature could transcribe events that involve audio or video.
Rewind.ai is a “macOS app that enables you to find anything you’ve seen, said, or heard”. I had the chance to test out Rewind.ai early (shout out to Brett!) and I would say that it’s magical, even a bit of a ‘dream come true’. While every other SaaS is taking the “cloud-first, all-platform, strong-integration” approach, Rewind.ai is going for the “local-only, Apple-Silicon, no-integration-needed” route. Co-Founder & CEO Dan Siroker addressed all 3 differences in his Twitter thread:


“Local-only” is because of privacy concerns. Rewind.ai “records anything you’ve seen, said, or heard and makes it searchable. For your privacy, we store all of the recordings locally on your Mac. Only you have access to them.”
“Apple-Silicon” is because of the performance and technical advancement. In Rewind.ai’s words, they “utilize virtually every part of the System on a Chip (SoC) so that running Rewind doesn’t tax system resources (like the CPU and memory) while it is recording. It feels virtually imperceptible.”
“No-integration-needed” is because of Apple’s system support. In Rewind.ai’s words, they “use native macOS APIs and Optical Character Recognition (OCR) to recognize & index all the words that appear on your screen. This means you don’t need to integrate with cloud products like Gmail, Dropbox, or Slack.”
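The “recognize & index all the words” part is easier to picture with a small sketch. Below is a toy inverted index over text that has already been pulled off the screen. It is only my guess at the general shape of such a pipeline, not how Rewind.ai is actually built: the OCR step is skipped entirely, and the frames, timestamps, and function names are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(frames):
    """Map each word to the sorted timestamps of the frames that contain it."""
    index = defaultdict(set)
    for timestamp, text in frames:
        for word in tokenize(text):
            index[word].add(timestamp)
    return {word: sorted(stamps) for word, stamps in index.items()}

def search(index, query):
    """Return timestamps of frames that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))  # keep only frames matching all words so far
    return sorted(hits)

# Hypothetical screen captures after OCR: (timestamp, recognized text).
FRAMES = [
    ("2022-11-01T09:00", "Zoom: Q4 fundraising update with the partners"),
    ("2022-11-01T09:05", "Slack: lunch at noon?"),
    ("2022-11-02T14:30", "Gmail: Re: fundraising deck, Q4 numbers attached"),
]

index = build_index(FRAMES)
print(search(index, "Q4 fundraising"))  # ['2022-11-01T09:00', '2022-11-02T14:30']
```

Because the index is built from whatever text appears on screen, it does not care whether the words came from Zoom, Slack, or Gmail, which is why no per-app integration is needed. Note that this is exact keyword matching only.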
One more killer differentiator is the compression ability. Rewind.ai can compress raw recording data by up to 3,750x without a major loss of quality (definitely the next-gen Pied Piper). I’ve only used it for a short time, and the power of search will be unleashed as more data is added to the personal database. One use case that I find great is contextual version control for everything, whether it’s a message, a piece of code, or a random note. With Apple’s text detection, copying and pasting from frames is just too easy. Rewind.ai has already saved me from browser-crash-losing-text multiple times. However, I don’t think that Rewind.ai search currently supports languages other than English or supports semantic search. Also, even though the compression rate is super impressive, the “on average users use 14 GB per month” figure for storing recorded content does not really apply to me. I personally use a Mac with a 4K external monitor, and I don’t think I have the disk space to keep more than a year's worth of Rewind data.
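For a rough sense of scale, here is the back-of-the-envelope math using only the two quoted figures (14 GB per month on average, up to 3,750x compression). The function name is my own, I’m assuming decimal units (1 TB = 1,000 GB), and since 3,750x is an upper bound, the raw number is a ceiling and not a measurement.

```python
def storage_estimate(gb_per_month, months, compression_ratio):
    """Return (compressed GB on disk, raw-equivalent TB) for a recording period."""
    compressed_gb = gb_per_month * months
    raw_tb = compressed_gb * compression_ratio / 1000  # decimal TB, 1 TB = 1000 GB
    return compressed_gb, raw_tb

compressed_gb, raw_tb = storage_estimate(14, 12, 3750)
print(f"{compressed_gb} GB on disk, up to {raw_tb:.0f} TB raw equivalent")
# 168 GB on disk, up to 630 TB raw equivalent
```

So an average user would need about 168 GB for one year of history, and a high-resolution setup like mine would presumably need more.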
With that being said, I’m really bullish on their product development potential. Rewind.ai could be the personal search engine for the “noticed” contexts and links, while Mem.ai could be the personal search engine for the “unnoticed” contexts and links. For example, if I was bombarded by Zoom meetings and vaguely remembered a piece of important information from those conversations, then Rewind.ai could help me find it (if I opened it, of course). On the other hand, if there's a gold newsletter that has been sitting in my inbox for a long time and I never had the chance to read it, Mem.ai would be a good discovery source when I'm searching for the topic.
All the tools I mentioned above are awesome for my day-to-day information ETL (extract, transform, load) processes, and they complement each other well. However, I still feel there is room for tools that fill the gaps these great products have not covered. More on that later!
I’d also appreciate comments and suggestions on the content I wrote about and on how I could improve the newsletter experience. Let me know what you think!

*all personal opinions