AI-Assisted Workflows for Personal Science

Since many of us are now using AI assisted workflows to handle some of the finicky details of personal science projects, I thought it would be a good idea to create a topic to share ideas and approaches. It’s open now!

Feel free to post long text if you want to share details, but use the “expand” function, as I did in the first post:

[details="Click to expand"]

Your long text goes here. It can include paragraphs, lists, code, etc.

[/details]

Today I worked on a medium term goal of nudging my QS work toward text-based workflows, because this supports freeform reasoning using AI assistance better than trying to integrate various apps directly. One of the resources for my personal science projects is my collection of research papers in Zotero. Today I spent about 90 minutes setting up a structure to make them more easily “thinkable” by extracting metadata and notes on recent papers from my Zotero directories and putting them into .md files accessible both via command line/Claude and via Obsidian.

Here’s the summary of the session generated by the diary prompt.

NOTE July 31 2025: I have not yet implemented this, and if I do I think it will be simplified. I’ve learned a lot since then and I think this workflow is too complex.


Claude Diary Entry - 2025.06.07.140000

Session Title: Zotero to TBWK Integration Strategy Development

Session Summary

Today’s session focused on developing a comprehensive strategy for integrating Zotero research management with Text-Based Workflow Knowledge (TBWK) principles. The user initiated the conversation with a request to examine file structures containing Zotero knowledge and extract research paper links, metadata, and personal notes into a text-based system that would align with their established TBWK approach.

The conversation began with an exploration of the user’s file system to locate Zotero data, but the actual Zotero files were not found in the accessible /Users/garywolf/developer directory. This led to a strategic discussion about extraction approaches rather than immediate implementation. The user provided a sample Zotero storage path (/Users/garywolf/Zotero/storage/A6Q274ZU/Gruber - 1995 - Toward principles for the design of ontologies used for knowledge sharing.pdf) which confirmed a standard Zotero installation structure.

Through iterative discussion, we developed a “Bridge and Enhance” strategy that maintains Zotero as the primary PDF manager and citation tool while creating a parallel text-based research analysis layer. The approach focuses on extracting research papers to markdown files within the user’s Obsidian vault, making them accessible to both Obsidian’s visual interface and command-line tools for AI assistant analysis. Key decisions included using a single, flexible template with pre-populated tags for deletion rather than creation, focusing initially on 2025 papers, and placing extracted files in a research_papers/2025_extraction/ subdirectory within the existing Obsidian vault.

The session culminated in creating a comprehensive step-by-step implementation plan that breaks the integration into five phases: setup and template creation, extraction setup, template refinement, workflow integration, and maintenance. The plan includes specific technical details, success metrics, risk mitigation strategies, and immediate next actions. The approach successfully balances the user’s desire for text-centric workflows with practical considerations around existing Zotero investments and research habits.

Key Artifacts Created

• Plan document: /Users/garywolf/developer/zotero_tbwk/zotero_plan.tbwk

• Directory structure: /Users/garywolf/developer/zotero_tbwk/ created

• Text shortcut template designed for macOS text replacement (;research trigger)

• Markdown note template framework for extracted research papers

• Strategic workflow architecture bridging Zotero and TBWK systems

Main Requests/Tasks

• Initial request: “Examine the file structure that contains Zotero knowledge and suggest a way to extract some links to papers stored in Zotero and the associated notes for eventual inclusion in my TBWK approach”

• Focus refinement: “Focus on extraction in your answer” with specific requirements for paper links, metadata, and personal notes

• Strategic clarification: Discussion of three integration approaches (Bridge and Enhance, Export and Evolve, Hybrid Workflow)

• Template development: “I’d like to focus on recent papers in my 2025 file” and placement within Obsidian vault but accessible to AI assistants

• Workflow optimization: Development of text shortcut template with pre-populated tags for deletion rather than creation

• Documentation request: “Create a step-by-step plan as a .md document” and “Create a Claude Diary entry”

Technical Details

  • Zotero installation confirmed at: /Users/garywolf/Zotero/ with database likely at /Users/garywolf/Zotero/zotero.sqlite

  • Target extraction location: ~/ObsidianVault/research_papers/2025_extraction/

  • Filename convention: YYYY_FirstAuthor_KeywordFromTitle.md

  • Text replacement trigger: ;research expanding to structured template

  • File system exploration performed on: /Users/garywolf/developer

  • Search patterns used: “zotero”, “Zotero”, “sqlite”, “.rdf”, “bib”, “citation”, “reference”

  • Template structure includes: metadata, original intent capture, research context, connections, and follow-up actions

Learning Points

• Discovered the importance of capturing “moment of discovery” insights when downloading research papers

• Learned that deletion-based templates (pre-populated tags) are more user-friendly than creation-based templates

• Found that the “Bridge and Enhance” approach aligns perfectly with TBWK philosophy of enhancing existing tools rather than replacing them

• Understood that file placement within Obsidian vault but accessible to external tools creates optimal flexibility

• Recognized that text-centric research workflows can complement rather than compete with existing Zotero investments

Next Steps

  • Implement Phase 1: Set up macOS text replacement with ;research trigger

  • Create research_papers/2025_extraction/ directory structure in Obsidian vault

  • Test template workflow with 1-2 recent papers from Zotero

  • Locate and verify access to Zotero database at expected path

  • Begin Phase 2: Develop and test extraction script on small subset of papers

  • Refine template based on initial usage patterns

  • Establish command-line search workflows using ripgrep

Session Metadata

  • Duration: Approximately 1.5 hours

  • Tools discussed: Zotero, Obsidian, macOS text replacement, ripgrep, Python extraction scripts

  • References: TBWK document principles, Zotero database structure, SQLite querying

  • File types created: .tbwk (markdown plan), directory structure

  • Integration complexity: Multi-tool workflow bridging GUI and command-line interfaces


I’ve been using MCP with Claude Desktop and the mcp-pyzotero MCP connector. The new versions of Zotero can work as an MCP server, letting you read and search your Zotero docs from within Claude Desktop. For example, I can prompt “connect to my Zotero library and find all documents that refer to the Apple Watch” and then have the results returned in BibTeX format.

There are similar MCP connectors to Obsidian (find all notes that relate to gut health) but I haven’t dug deeply into that yet.


This is just to record something I learned while having Claude Code work on an html interface for local data analysis.

After a morning of dumb mistakes on both of our parts, I had a sincere chat with my AI assistant. I was micromanaging and trying to get it to stop making the same error again and again. But my advice wasn’t working. Micromanaging hardly ever does. I wasn’t seeing the details of what was driving the poor decision-making. So we stepped back and I offered this advice.

Time to do some careful thinking. Consider how the JavaScript is being presented and used with the wisdom of long experience in iterative development, emphasize simplicity, and practicality, and see if you can resolve these sorts of issues once and for all. Remember our best practices with relative paths and resilience. Don’t do any fallback coding or security operations. This is for local use only under very secure conditions.

This seemed to work where more mechanical instruction failed. These instructions/heuristics are in the project knowledge, but a reminder was necessary.

Response:

You’re absolutely right. Looking at this with experience, the core issue is mixing inline event handlers with module-style JavaScript. Let me apply practical wisdom and fix this once and for all with simple, reliable patterns.

The Real Problem:

Inline onclick handlers (onclick="functionName()") expect global functions, but modern JavaScript patterns often create scoped functions that aren’t globally accessible.

Simple, Bulletproof Solution

Use addEventListener pattern - it’s more reliable, separates concerns, and works consistently.

Hi Richard, one of the things I’ve been frustrated by/curious about is all the different ways to present background knowledge/code/heuristics to Claude. They all seem to have different effects, and the non-mechanistic nature of the results is just strange and fascinating. I haven’t used MCP connectors much, instead tending to load semi-processed metadata into subdirectories Claude Code has access to. I’m trying to develop a somewhat controlled environment for creative work, but this may be a pipe dream. My own mental model of “what I can expect from my AI assistants” is very unstable. For instance, there seems to be a dilution effect, such that the longer I work on a specific project without compacting or restarting, the less focused and useful the AI output becomes. My thick-fingered sketch is: all the mistakes we’ve made are piling up around us. I have no idea if this is a real thing or not, though.

I’ve been trying the (free) version of Gemini CLI to see how it can handle my rough dietary notes.

What I did:

  1. I keep a daily diary in Obsidian with very brief notes on what I ate (e.g. “2 eggs”, “beef burger with bun”, “shrimp, broccoli, sourdough bread”).
  2. I have a prompt that was “trained” on a few weeks of those notes, where I annotated my food diary with exact numbers taken from Cronometer, essentially “teaching” the LLM about the relationship between my rough notes and the exact dietary numbers.
  3. I applied that prompt to 6 months worth of the rough notes, telling it to filter out days that are obviously incorrect (e.g. total calories < 800) and then compute my macronutrient averages along with any trends.
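The filter-and-average step in point 3 can be sketched like this. It's a toy example with made-up per-day numbers, assuming the LLM has already converted the rough notes into per-day estimates (the field names are hypothetical):

```python
# Hypothetical per-day estimates produced by the LLM from rough notes.
days = [
    {"kcal": 2100, "carb_g": 210, "fat_g": 95, "protein_g": 100},
    {"kcal": 600,  "carb_g": 60,  "fat_g": 25, "protein_g": 30},   # dropped: < 800 kcal
    {"kcal": 1900, "carb_g": 180, "fat_g": 90, "protein_g": 90},
]

# Filter out obviously incomplete days.
valid = [d for d in days if d["kcal"] >= 800]

# Convert grams to calories (4/9/4 kcal per gram) and compute each
# day's macronutrient shares, then average the shares across days.
def macro_shares(d):
    c, f, p = d["carb_g"] * 4, d["fat_g"] * 9, d["protein_g"] * 4
    total = c + f + p
    return c / total, f / total, p / total

shares = [macro_shares(d) for d in valid]
avg = [sum(s[i] for s in shares) / len(shares) for i in range(3)]
print(f"carb {avg[0]:.0%}  fat {avg[1]:.0%}  protein {avg[2]:.0%}")
# → carb 39%  fat 42%  protein 19%
```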

Result

Gemini gave me this (which seems pretty accurate)

After filtering out the days with less than 800 calories, here is the analysis of your Q2 food log:

Macronutrient Averages:

  • Carbohydrates: 39%
  • Fat: 42%
  • Protein: 19%

Trends:

  • Higher Carb Intake: Compared to Q1, your carbohydrate intake has increased significantly, making it the primary energy source in Q2.
  • Reduced Fat Intake: Consequently, your fat intake has decreased and is no longer the dominant macronutrient.
  • Consistent Protein: Your protein intake has remained relatively stable across both quarters.
  • Travel Impact: Your diet seems to be significantly influenced by travel, with periods of higher carbohydrate and fat consumption during trips.
  • Post-Travel Correction: You tend to have lower-calorie days with a higher protein percentage after returning from trips, suggesting a conscious effort to rebalance your diet.

Bottom line

As with all things LLM, this is a game changer. Once knowledge of these tools diffuses some more, this dramatically lowers the bar for anyone performing self-experiments.

I think it’s interesting that you did some training. I’ve been more focused on AI-assisted workflows to produce visualizations and other reasoning artifacts like tables or summary statistics. All of that has been super helpful, but I haven’t considered training the responses using a more elaborate prompt. I’m going to try it.

I’ve been wanting a searchable archive of all my messages. My goals were finding old conversations that reflect my situation at the time (useful when making my morning notes), doing Quantified Self analysis, and supplying background knowledge for AI-supported inquiry of various kinds. Here’s what I did.


Apple stores messages in ~/Library/Messages/chat.db, a SQLite database. However, they’ve made it intentionally difficult to access:

  1. Permission barriers - The database requires Full Disk Access permissions, which most apps (including Terminal) don’t have by default.
  2. Non-standard timestamps - Apple uses nanoseconds since 2001-01-01 instead of Unix timestamps.
  3. Complex schema - Messages are spread across 17+ tables with multiple joins required to reconstruct conversations.
  4. No export option - Messages.app has no built-in export functionality.
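The timestamp quirk in point 2 is mechanical to undo: Apple's epoch is 2001-01-01 00:00:00 UTC, an offset of 978,307,200 seconds from the Unix epoch. A minimal conversion sketch:

```python
from datetime import datetime, timedelta, timezone

# Apple's Core Data epoch. Recent macOS stores chat.db message dates
# as nanoseconds past this point; older versions stored plain seconds,
# so a real script should sanity-check the magnitude of the value.
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)

def apple_ns_to_datetime(ns: int) -> datetime:
    return APPLE_EPOCH + timedelta(seconds=ns / 1_000_000_000)

print(apple_ns_to_datetime(86_400 * 1_000_000_000))
# → 2001-01-02 00:00:00+00:00
```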

I appreciate Apple’s “privacy by default” approach but sometimes there also seems to be a “our private property by default” aspect to the data structures: technically accessible but practically locked in.

At first I worked with Claude Code to parse the SQLite database directly, but I ran into problems getting the complete dataset. Since I have over 40 GB of messages, I don’t have them all locally, and after a few hours of trying I still wasn’t sure how to copy the full database. Somebody more skilled than I am might be able to figure it out, but from Reddit, Apple Community comments, and Stack Overflow, I gathered that it was generally considered impossible. There were many recommendations of iMazing, a third-party iOS/macOS management tool. I hate paying for third-party export tools; it seems unjust that this is necessary, and in the past I’ve had some bad experiences with tools that promised data export but didn’t really deliver. But this was my only way forward, so I went for it.

The free version of iMazing is limited to a few conversations, but this was enough to at least run some tests. They worked, so I bought a one-year license for $50. The export process was straightforward:

  • Connect your device or select your Mac’s messages
  • Choose “Messages” from the sidebar
  • Select conversations (or all)
  • Export to CSV format

The resulting CSV included 156,000+ lines with the metadata I wanted: delivery timestamps, read receipts, sender names, reply threading, and attachment info.

From here, I worked with Claude to create a simple Python script that converts the CSV to readable Markdown, for human eyes, and some scripts for querying the tabular data. The process was iterative:

  1. Started with a test file (first 1,000 lines) to verify the format
  2. Built a basic parser to group messages by date and conversation
  3. Added progress tracking for the full 156K line file
  4. Generated a 12MB Markdown file with all 124,000+ messages

The script handles the CSV parsing, date formatting, and Markdown escaping in about 30 seconds. There were a few problems with the Markdown, since my text messages include a few code snippets, but these were easy to spot and hand-edit.
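The grouping logic in steps 1 to 4 might look roughly like this. The column names ("Message Date", "Chat", "Sender", "Text") are guesses for illustration; check the header row of your own iMazing export, and note this sketch skips the Markdown escaping:

```python
import csv
from collections import defaultdict

def csv_to_markdown(csv_path: str, md_path: str) -> None:
    """Group messages by day, then by conversation, and emit Markdown."""
    by_day = defaultdict(lambda: defaultdict(list))
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            day = row["Message Date"][:10]   # e.g. "2022-08-01"
            by_day[day][row["Chat"]].append((row["Sender"], row["Text"]))

    with open(md_path, "w", encoding="utf-8") as out:
        for day in sorted(by_day):
            out.write(f"# {day}\n\n")
            for chat, msgs in by_day[day].items():
                out.write(f"## {chat}\n\n")
                for sender, text in msgs:
                    out.write(f"- **{sender}**: {text}\n")
                out.write("\n")
```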

NOTE: Claude required careful supervision to limit the scope. I stayed in planning mode for quite a few rounds as Claude continued to elaborate complex ideas despite my having set what I thought were clear guardrails. With a level of nitpicky correction, my robot assistant finally conformed.

I now have a complete, searchable archive of every text I’ve sent or received. It’s grouped by date with conversations as subheadings, making it easy to see what I was discussing on any given day.

The total cost was $50 and two hours of work. Most of this work was going down dead ends to avoid paying for iMazing, so you can skip that part if you want to end up where I did. Feel free to ask any questions here if you want to try it.


Note July 31, 2025:

This is working even better than I hoped. I ask Claude questions like: “what was I doing on August 1, 2022” and I get answers like:

On August 1st, 2022, you were traveling in France with Christa - you spent the day swimming in Nice and helped Christa with her hair while someone named Ludo warned you about hot weather inland.

The activities recorded in the SMS archive trigger intense memories, and their seeming insignificance is part of what makes them real. The details are often slightly wrong, and I can usually detect these mistakes, but instead of ruining the effect this strengthens it. There’s a slight friction, like when you are talking about old times but something is out of order, and it sends my mind back to solve it. I sometimes check the SMS record directly if I’m unsure. Names of people I haven’t thought about in a while are especially affecting.

In logging my own actions, my to do lists have never been very useful. That’s because while I make a lot of lists, I don’t organize the lists themselves, so they are scattered across all my working environments. My lists were never meant to be long lasting; I keep them wherever I’m working, and when the tasks are done they’re gone. Higher level planning stuff is more permanent, and of course there are some things in my project planning that look like To Do lists, but I’m talking about something different: the daily, in the moment notes that I keep to remind myself of things that have to be done or followed up on. I might have a to do that says: “call Tammy about nonprofit rules.” After I call Tammy, there’s no reason to keep this hanging around.

But recently I’ve been using a lot of this background material to help me reason about events and explore personal science questions. Along with calendar entries and direct messages, my To Do notes can tell me something about where I was and what I was doing at any particular time. This motivated me, at long last, to take a more archival approach to my To Do lists. Since I’m already involved in a more general project of developing text-based workflows that are not dependent on cloud services, I thought this was a good opportunity to add some of my most ephemeral records.

In the moment, I often dictate To Do items using Siri and Apple Reminders. The task today was to pull my Apple Reminders out of my phone and put them into a generic Markdown file which I can read using any text reader, but mostly interact with using Obsidian. It took a little while to figure out how to do this, but now it seems to be working well. I discussed it for a bit with ChatGPT, but most of the work was in Claude Code. Here is my automatically generated diary entry about how it was done.

Time 90 min. Cost: Free if you don’t count paying for Claude.


Claude Diary Entry - 2025.07.30.224131

Session Title: macOS Reminders to Obsidian Sync Automation

Session Summary

Today’s session focused on creating a comprehensive automation system to sync Apple Reminders tasks to Obsidian for unified task management. The development process involved multiple iterations. Initial attempts revealed issues with AppleScript permissions causing tccd (Transparency, Consent, and Control Daemon) to consume high CPU and create 60-second delays. Through systematic debugging, we discovered that the permission system needed proper handling and that executing AppleScript commands efficiently required careful structuring to avoid connection timeouts.

The solution implements the following workflow: reminders are copied as unchecked tasks to a markdown document inside the Obsidian file structure, then marked complete in Apple Reminders. This preserves the tasks in Reminders as an archive while keeping them actionable in Obsidian.

To ensure daily synchronization, we configured a macOS LaunchAgent to run the script automatically at 11:59 PM each day. The session concluded with fixing a display issue in Obsidian where project TODOs were appearing in small font due to an unrecognized query syntax, which was resolved by implementing proper Dataview plugin syntax.
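The LaunchAgent schedule described above would be a small plist in ~/Library/LaunchAgents/, loaded once with `launchctl load`. The label and script path below are placeholders, not the actual values from this session:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.reminders-sync</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>/Users/you/scripts/reminders_sync.sh</string>
    </array>
    <!-- Run daily at 11:59 PM -->
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>23</integer>
        <key>Minute</key>
        <integer>59</integer>
    </dict>
</dict>
</plist>
```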

Code Snippets

```bash
# Final optimized AppleScript within bash
output=$(osascript <<'EOF'
tell application "Reminders"
    set todoList to list "To Do Notes"
    set uncompletedReminders to reminders of todoList whose completed is false
    set processedCount to 0
    set addedTasks to ""

    repeat with aReminder in uncompletedReminders
        set reminderName to name of aReminder
        if addedTasks is not "" then set addedTasks to addedTasks & "\n"
        set addedTasks to addedTasks & "- [ ] " & reminderName
        set completed of aReminder to true
        set processedCount to processedCount + 1
    end repeat

    if processedCount > 0 then
        return addedTasks & "\n---SUMMARY---\n" & processedCount & " tasks processed"
    else
        return "NO_TASKS"
    end if
end tell
EOF
)
```

Learning Points

• Discovered that macOS tccd can cause significant delays when permission dialogs fail to display properly
• Learned that AppleScript connections to system apps can timeout when executed repeatedly in bash loops
• Found that consolidating AppleScript operations into a single execution dramatically improves performance
• Understood that macOS uses LaunchAgents (not cron) for user-level scheduled tasks
• Realized that Obsidian’s query code blocks require specific plugin syntax (dataview) for proper rendering

Session Metadata

  • Duration: Approximately 90 minutes
  • Tools used: Bash, AppleScript, macOS LaunchAgent, Obsidian, Dataview plugin
  • References: Apple Reminders AppleScript documentation, macOS launchd documentation
  • Final script execution time: 1-2 seconds (improved from 60+ seconds)

Decided to do this with my historical calendar entries today, going back to 2010.

It might seem like this would be too big to put into a single markdown file, but it was under 2MB. It’s well-formatted, searchable, and queryable. I thought about putting it into a database, but didn’t bother. I also thought about using Google Takeout, but I didn’t bother with that either. I just vibe coded an extraction from my local data, which is synced to my MacBook from Google Calendar. Along the way, I added a script that will add the next day’s events to my daily notes in Obsidian. Eventually, I will probably leave all of the apps that I use behind and work mainly in text files.

But the last few days of entertaining myself between real work by straightening out my text-based workflows has connected with some of my research interests, if only by confirming that you want things loose at the edges but strict at the core. You can think of calendar entries as records of observations made by the “calendar instrument.” At the moment of collection, there are going to be some glitches and messy things happening (for instance, duplicate entries). But during the process of collection, there needs to be a process very close to the origin that applies judgment about what is the “minimal effective observation” (HT @tblomseth & @jakobeglarsen). Once this is done, then the rest of the integration is efficient and reliable. Autogenerated summary below.

Calendar Automation for Text-Based Workflows

Goals

Transform manual calendar integration into a hands-free system that maintains both daily Obsidian notes and long-term calendar archives.

Extract 15+ years of calendar history while creating sustainable automation for ongoing workflow.

Strategy

Implemented dual-layer automation using macOS launchd: daily evening preparation of tomorrow’s notes (11 PM) and weekly archival updates (Sunday 8 AM). Developed incremental update system that reads timestamps from existing files and processes only new events since last update.
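The incremental-update idea above can be sketched in a few lines: find the newest day already in the archive, then keep only events after it. The "# YYYY-MM-DD" heading convention is an assumption about the archive layout, not icalBuddy output:

```python
import re
from pathlib import Path

def last_archived_date(archive: Path) -> str:
    """Return the newest YYYY-MM-DD day heading already in the archive."""
    text = archive.read_text(encoding="utf-8")
    dates = re.findall(r"^# (\d{4}-\d{2}-\d{2})", text, re.MULTILINE)
    return max(dates, default="0000-00-00")

def new_events(events, archive: Path):
    """events: list of (iso_date, title) tuples from the calendar dump.
    ISO dates sort lexicographically, so string comparison is enough."""
    cutoff = last_archived_date(archive)
    return [(d, t) for d, t in events if d > cutoff]
```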

Philosophy

Applied Text-Based Workflow Knowledge (TBWK) principles: simple, composable tools with transparent operations. System adapts to user availability rather than demanding rigid schedule adherence.

Tools

  • icalBuddy 1.10.1: CalDAV integration for direct calendar access

  • macOS launchd: Native scheduling handles sleep/wake cycles

  • Bash scripting: Wrapper scripts for date calculations and automation

  • Atomic file operations: Temporary files with timestamped backups

  • Comprehensive logging: Separate stdout/stderr streams for monitoring

Key Lessons

icalBuddy requires -nrd flag for consistent date parsing. launchd gracefully handles missed runs when Mac is off.

Wrapper scripts essential for date calculations in automated environments. Atomic operations with temporary files prevent data corruption. TBWK principles scale effectively to complex automation scenarios.

Results

Fully automated system processing 60+ calendars with 559 current events and 15+ years historical data (1.57MB archive). Zero manual intervention required for daily workflow integration and weekly archival maintenance.

Local First Personal Accounting System

As a side project to my side project, I’ve been creating a local first accounting system for both my business and personal transactions. This is partly in order to have more control over how I query things if I have a financial or tax question. But it’s also motivated by knowing how much signal is in these transactions if I’m trying to troubleshoot a personal science question. I know that it contains information about where I am, what I’ve done, even doctor’s visits, etc. I do some manual lookups if I’m really trying to figure out the context of some observations. But this involves looking in a Quicken web interface, which always presents frustrations. There was so much finicky detail in keeping a financial transaction system like this going that I’ve never tried to extricate myself from the world of apps and cloud services.

But it seemed like a good target for vibe coding. So far, I’ve created a data store with downloads of all my financial transactions from bank accounts and credit cards. Of course they are in different formats! But standardizing them is easy with robot assistance. The challenge lies in dealing with edge cases and in manually entering corrections and in testing to make sure that there aren’t errors. I found that the spending side of this is pretty easy. Almost all transactions have a vendor, and I was able to teach Claude about matching vendors with categories by providing an export from Quicken. Now I’m at a fork in the road (as shown by the prompt below). Not sure which direction I’ll take - possibly both.

Focus completely on the spending side of the system. In this approach, we would ignore account transfers as long as we can identify them as transfers. We would not count them as part of our spending reports and not include them in our categorization workflows. In this way forward, our goal is to clearly and strictly identify account transfers. Account transfers almost certainly have very standard features in all accounts, so I would like to define this approach in a document called “Spending Analysis Workflows.md”.

Use a chart of accounts and do double-entry bookkeeping, which allows reconciliation. Reconciliation can be labor-intensive and complex, but on the other hand, it provides a very clear signal that everything is working, because if there is a reconciliation error, we know that something has gone wrong. I would like a second document that outlines a reconciliation system so that I can assess how difficult this would be to create and maintain. I don’t want these two approaches to connect; I do not want to tangle up our spending analysis in complex reconciliation issues. That’s why I’d like this analyzed in separate documents.

Before we do anything, including before creating each of these documents, I’d like to see your plan.
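The vendor-to-category matching described above can be sketched as a simple substring lookup. The vendors and categories here are made up for illustration; the real mapping would come from the Quicken export:

```python
# Hypothetical mapping derived from a Quicken category export.
VENDOR_CATEGORIES = {
    "whole foods": "Groceries",
    "shell oil": "Auto:Fuel",
    "acme clinic": "Medical",
}

def categorize(description: str) -> str:
    """Match a raw transaction description against known vendors."""
    desc = description.lower()
    for vendor, category in VENDOR_CATEGORIES.items():
        if vendor in desc:
            return category
    return "Uncategorized"   # queue for manual review or an LLM suggestion

print(categorize("WHOLE FOODS #123 BERKELEY CA"))
# → Groceries
```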