Cream Mic: AI Desktop Assistant That Listens, Sees, and Answers in Real Time

Cream Mic AI Desktop Assistant

The best advisor is the one who is always present but never in the way. Who listens when spoken to, sees what you see, and delivers the answer before you finish formulating the question.

Every developer, engineer, and knowledge worker faces the same daily friction: context switching. You are deep in code — and need to look up an API signature. You are reviewing an architecture diagram — and need a second opinion. You are preparing for a technical interview — and need a sparring partner who knows system design as well as you do.

Each interruption breaks your flow. Each tab switch costs 23 minutes of focus recovery. Each Google search pulls you out of the work and into the noise.

Cream Mic eliminates this friction entirely. It runs as a lightweight overlay on your desktop — always available, never intrusive. Press a key to ask a question by voice. Press another to analyze whatever is on your screen. The AI responds in real time, streaming the answer directly into a clean floating panel. No browser. No tab switching. No broken flow.

Think of it as an AI copilot that lives on your screen — ready when you need it, invisible when you don't.

How Cream Mic Works

The Interface

Cream Mic is a PyQt6 desktop application with a frameless, semi-transparent window. It floats on top of all other windows but stays visually minimal — rounded corners, adjustable transparency, compact UI. It occupies as little attention as a sticky note, but delivers the power of a full AI assistant.

On macOS, it uses native Quartz rendering. On Windows, WinAPI transparency. Both look polished and professional.

The Hotkeys

The entire interaction model is built around two keys — because the fastest assistant is the one that requires zero setup:

Key First press Second press
Caps Lock Start recording your question Stop recording → transcription → AI answer streams in
Alt / Option Capture screenshot → Vision AI analyzes it → answer

Two keys. No menus. No typing required. You speak naturally, and the assistant responds.

The Voice Pipeline

When you press Caps Lock:

Recording starts (ask your question out loud)
    |
Stop recording (second press)
    |
Speech-to-text transcription (~500ms)
    |
LLM processes your question (streaming)
    |
Answer appears in the overlay, sentence by sentence
    |
Optional: TTS reads the answer back to you

From question to first words of the answer: 1-3 seconds. Faster than typing the query into any chat interface. Faster than opening a new tab.

Screenshot Analysis

Press Alt/Option at any time. Cream Mic captures your screen, sends it to a Vision AI, and returns an analysis. This is where it becomes more than a chatbot — it sees what you see.

Use cases:

You don't need to copy-paste code into a chat window. You don't need to describe what you're looking at. The AI sees your screen and responds with context.

Who Uses Cream Mic

Developers — Coding Without Leaving the Editor

You're deep in VS Code. A function is behaving unexpectedly. Instead of Alt-Tab to a browser, type a query, scroll through StackOverflow — press Caps Lock: "Why would this async function return undefined instead of the promise result?" The answer streams into the overlay while your editor stays in focus.

Screenshot an error traceback. Get an explanation and a fix in 2 seconds. No context switch. No flow disruption.

Engineers Preparing for Technical Interviews

Interview preparation is one of Cream Mic's strongest use cases. Use it as a practice partner:

The engineers who prepare with AI-assisted practice consistently outperform those who study alone. Not because the AI gives them answers — but because it gives them instant, structured feedback on their thinking.

Product Managers and Analysts

During meetings, research sessions, or strategy reviews — ask the AI to analyze what's on screen. A competitive analysis spreadsheet. A market research PDF. A product roadmap diagram. Get instant summaries, second opinions, and sanity checks without leaving the current context.

Researchers and Writers

Reading a dense paper? Screenshot a page and ask "What's the main finding here?" Working on a report? Ask for feedback on your draft by voice. The overlay keeps your working document in view while the AI provides input alongside it.

Provider Flexibility

Cream Mic is not locked to a single AI provider. You choose what works best for your needs:

Speech-to-Text Options

Provider Best for Privacy
ElevenLabs Highest accuracy, multilingual Cloud
Whisper (local) Free, fully private — nothing leaves your machine Local
Yandex SpeechKit Russian language Cloud
Sber SaluteSpeech Russian language (alternative) Cloud

For maximum privacy, use local Whisper. Your questions are transcribed entirely on your hardware.

Language Model Options

Model Best for
Gemini 2.5 Flash Fast answers, quick lookups, syntax questions
GPT-4.1-nano Deep technical reasoning, system design, complex code review

Switch between models depending on the task. Quick syntax question? Use Gemini Flash for speed. Complex architecture question? Use GPT for depth.

Text-to-Speech Voices

Voice Character
Nova Warm, professional
Alloy Neutral, versatile
Echo Deeper male voice
Fable British accent, authoritative
Onyx Low, calm
Shimmer Friendly, energetic

All voices support English and Russian. TTS plays sentence-by-sentence during streaming — you hear the first sentence within 400ms, not after the entire response generates.

Session Logging — Your AI-Powered Study Journal

Every question and answer is automatically saved to the data/ directory with timestamps:

[14:32:05] Q: How would you design a rate limiter for an API?
[14:32:08] A: I'd approach this with a token bucket algorithm. Here's how it works...

[14:35:12] Q: What happens when the bucket is empty?
[14:35:14] A: When tokens are depleted, requests are either queued or rejected with a 429 status...

Over time, this becomes a personal knowledge base — a searchable archive of every technical question you've asked and every AI-generated explanation you've received. Use it to:

The general who documents his campaigns learns faster than the one who relies on memory alone.

Customization

System Prompt — Define Your Assistant's Expertise

Configure the AI to match your domain:

The system prompt shapes every response. Different roles, different contexts — one tool.

Screenshot Prompt — Control How the AI Reads Your Screen

Cross-Platform Support

Feature macOS Windows 11
Overlay Native Quartz WinAPI
Hotkeys Quartz CGEventTap keyboard library
TTS playback afplay (system) pygame.mixer
Whisper (local) Supported Supported
Installation make install scripts\install.bat

Both platforms provide identical functionality with native rendering.

Frequently Asked Questions

What can I use Cream Mic for?
Coding assistance, technical research, interview preparation, document analysis, debugging, architecture reviews — anything where you need quick AI help without leaving your current workflow.

Does it work alongside any app?
Yes. It's a system-level overlay that works on top of any application — code editors, browsers, design tools, video calls, terminals.

What about latency?
First words of the answer appear 1-3 seconds after you stop recording. TTS plays each sentence with ~400ms delay. For a 5-sentence answer, you hear the first sentence in 1.5 seconds and the last in ~4 seconds.

Is my data private?
With local Whisper, your voice never leaves your machine. The LLM query is sent to the provider you select (OpenAI, Google, etc.) under their standard API privacy terms. Session logs are stored locally on your device.

Does it require internet?
For LLM responses and cloud STT: yes. For local Whisper transcription: the STT part works offline, but the AI response requires internet.

Can I use it for interview preparation?
Absolutely — this is one of the most popular use cases. Practice system design questions, get instant feedback, build a study log, and track your progress over time.

See also:

Try Cream Mic — your AI desktop assistant

Follow our Telegram channel for launch updates and early access.

Join NeCL on Telegram