Leveraging Machines
How I Evolved My Photo Culling Workflow using Claude API
Act I: The First Attempt
Two months ago, I decided to build a photo culling system, though “build” may be generous; I mostly vibe-coded my way through it. But I didn’t want to just delete bad photos. I wanted to understand why they were bad.
So I built a 3-stage local ML pipeline. It was ambitious. It was elegant. It was also my biggest mistake.
The Original Architecture
The pipeline ran locally with no API dependencies:
Stage 1: Technical Gatekeeping
↓ (Blur detection: Laplacian variance)
↓ (Exposure check: mean image intensity)
↓ (Duplicate detection: perceptual hashing)
Stage 2: Aesthetic Scoring
↓ (CLIP embeddings + scoring)
↓ (K-means clustering for scene diversity)
Stage 3: AI Semantic Review (Optional)
↓ (Ollama/LLaVA running locally)
↓ (Check: eyes closed? distracting elements? composition?)
The appeal was obvious: no API costs, no latency waiting for servers, everything runs on your machine.
On paper, it was beautiful. In practice, it was a nightmare.
The Dependency Hell
The pipeline required:
PyTorch + CUDA 11.8+ (or CPU mode, but slow)
transformers library (for CLIP)
CLIP model checkpoint (~1 GB download)
Ollama local LLM runtime (~5 GB for the LLaVA model)
OpenCV for Laplacian blur detection
scikit-learn for K-means clustering
PIL, pandas, numpy — and all their subversions
On a fresh machine, this could take 45 minutes to set up correctly. On an existing machine, it was chaos.
Here’s what happened when I tried to run it after a week away:
ModuleNotFoundError: No module named ‘transformers.utils.quantization_config’
ImportError: cannot import name ‘version’ from ‘importlib.metadata’
RuntimeError: CUDA out of memory (even on CPU mode)
FileNotFoundError: ~/.cache/huggingface/CLIP model not found
It took some debugging to fix. The culprit? An orphaned numpy installation from a previous project was shadowing the current one. Python was loading the wrong module, which broke transformers, which broke CLIP, which broke the entire pipeline.
I had built a system so brittle that it broke if you looked at it sideways.
But It Worked (Eventually)
Once I fixed the environment, the results were actually good. For a 7,000+ image shoot:
· Stage 1 filtered out more 3,520 blurry/duplicated images
· Stage 2 scored gave thumbs up to 88 of the remaining images with CLIP embeddings minutes
· Stage 3 reviewed the remain 88 images with Ollama and gave thumbs up to 81.
· Total time: several hours.
The output was a CSV with: - Blur metrics (Laplacian variance, center-crop sharpness, patch-wise max) - Exposure data (mean intensity, highlights, shadows) - Duplicate hashes - CLIP aesthetic scores - Scene clusters - Ollama VLM review (eyes closed, distracting elements, composition notes)
It was comprehensive. It was useful. And unleashing on thousands of images took too long.
The Moment of Clarity
Fast forward to last month. I came back from the Norwegian Fjords with over 7,000 images. I pulled up the photo_ai_workflow folder to run it. I had to leave it running overnight.
And in that moment of frustration, I realized something: I was optimizing for the wrong thing.
I had optimized for “no API costs” and “runs locally”. What I actually needed was “works reliably” and “gives me useful insights.”
I wondered: What if I just… paid Claude a few dollars instead?
Act II: The Pivot (A Complete Rewrite in 4 Hours)
What if the entire pipeline could be replaced with a single Claude call?
Not three stages. Not K-means clustering. Not LLM review of specific artifacts. Just: “Tell me if this image is stock-worthy and portfolio-worthy.”
I wrote assess_images_claude.py in an afternoon, or rather, I vibed it.
The New Architecture
send image + prompt → Claude → structured JSON response → CSV
That’s it. One API call per image. Done.
But the prompt was sophisticated:
SYSTEM_PROMPT = “”“You are two expert evaluators...
EVALUATOR 1 — STOCK REVIEWER:
Reject soft focus, motion blur, noise, poor exposure, chromatic aberration...
Approve technically clean, well-exposed, commercially useful images.
EVALUATOR 2 — PORTFOLIO JUDGE:
Assess compelling composition, distinctive light, emotional impact...
Technical perfection alone is not enough.
“”“
The response structure was JSON:
{
“technical”: {
“blur”: {”present”: bool, “severity”: “none|minor|moderate|severe”},
“exposure”: {”overall”: “correct|under|over”, “severity”: “...”},
“clipping”: {”highlights”: bool, “shadows”: bool, “severity”: “...”}
},
“stock”: {”verdict”: “APPROVE|REJECT|BORDERLINE”, “score”: 1-10},
“portfolio”: {”verdict”: “STRONG|CONSIDER|REJECT”, “score”: 1-10}
}
The Cost Problem
Submitting more than 7,000 images individually would cost ~$20. (Claude charges ~$0.003 per image at Haiku pricing.)
But Claude has a Batch API that costs half as much. Catch: each batch needs to be under ~25 MB.
So I implemented chunking:
for chunk of 50 images:
1. Resize to 768px (saves 80% of tokens)
2. Encode to base64
3. Submit batch immediately
4. Move to next 50 images
If one batch failed, I only lost ~50 images. And I could resume from saved batch IDs.
What I Gained
Local Pipeline → API Pipeline
─────────────────────────────────────────────────
45 min setup (+ env fixes) → 5 min to write
3 hours debug time → 0 debug time
Dependency management → Just `pip install anthropic pillow`
Laplacian blur variance → Claude’s reasoning
CLIP aesthetic score → Two explicit verdicts
Ollama VLM review → Structured reasoning
Fragile environment → No environment
$0 in API costs → $3 in API costs
The API pipeline was simpler, faster to write, more reliable, and only $3 more expensive because it was so much faster.
The Data: Local vs API
Here’s what surprised me:
The local pipeline gave me 11 metrics per image. The API pipeline gives me 1 verdict and reasoning.
Which is more useful?
The local pipeline told me: - Blur variance: 87.3 - Laplacian subject variance: 102.1 - Max patch variance: 156.2 - Perceptual hash: e7c3a9f
So… which images should I keep?
The API pipeline told me: - Stock: APPROVE — Sharp focus, well-balanced exposure, travel utility - Portfolio: CONSIDER — Solid composition but conventional framing
Which one actually tells me what to do with the image?
The API approach replaced low-level metrics with high-level reasoning. It wasn’t trying to measure blur; it was trying to understand the image.
The Real Difference
Here’s the thing: a Laplacian variance of 87 is meaningless. You know what’s meaningful? Claude saying: “Motion blur on the subject, hand-holding at 1/30th. Can’t use this for stock. But the composition was interesting — might fix in post for portfolio.”
That’s not a score. That’s a conversation.
The local pipeline was trying to be objective with metrics. The API pipeline is trying to be useful with explanation.
By The Numbers
On my Norwegian Fjords images:
What the local pipeline would have said:
3520 images failed Stage 1 (blur+dedup+exposure)
88 images scored > 0.70 on CLIP aesthetic
81 images passed Ollama evaluation
What the API pipeline actually said:
6172 images APPROVE for stock (26%)
306 images STRONG for portfolio (8%)
290 images are “gems” (both stock-approvable AND portfolio-strong)
203 images are “technically perfect but boring”
76 images are “risky but visually interesting”
I could act on the API results. The local pipeline metrics? I’d still be staring at them, wondering what to do next.
Why This Matters
This wasn’t just about building a better tool. It was about understanding when to use local intelligence vs remote intelligence.
Local ML is great when:
You have a clear, measurable objective (blur = bad, sharpness = good) -
You need to process terabytes without API overhead
You want to understand the mechanism (why is it blurry?)
Cost is critical and API fees would be prohibitive
API-based AI is great when:
The problem is subjective (is this portfolio-worthy?)
You want reasoning, not metrics
The problem is complex (artistic judgment vs image statistics)
You can afford to trade dollars for reliability and simplicity - You want to reason about multiple axes at once (stock vs portfolio simultaneously)
Photo culling is subjective. It requires judgment. It benefits from reasoning. The local pipeline was optimized for the wrong problem.
The Subtext
Here’s what I actually learned:
The API pipeline is:
✓ More reliable (no environment issues)
✓ Faster to develop (API > ML infrastructure)
✓ Better results (reasoning > metrics)
✓ Cheaper at scale (Batch API < GPU hardware)
✓ Easier to explain (verdicts > variance numbers)
The only advantage of the local pipeline was “$0 in API costs.” But I was spending that cost in time, complexity, and pain.
What I Did With 7,000+ Photos
1. Ran assess_images_claude.py on the full folder
2. Got a CSV with stock/portfolio verdicts for 7000+ unique images (first batch, API throttled)
3. Identified 290 “gems” (stock + portfolio strong) — straight to Lightroom
4. Postpone review of 608 “BORDERLINE” images.
5. Deleted 566 “stock rejects” — no point editing technically broken images
From 7,000+ images to 290 keepers in ~3 hours. And I understood why each image landed where it did.
The Code
If you want to steal this approach:
# Install dependencies
pip install anthropic pillow
# Run assessment
python assess_images_claude.py --input ./photos --output ./results
# Get CSV with verdicts
# → results/assessment_results.csv
The script handles everything:
Resizes images to 768px (80% token savings)
Chunks them in groups of 50 (~10 MB each)
Submits via Batch API (50% cheaper)
Polls until complete
Saves batch IDs for resume capability
Parses JSON responses
Outputs flat CSV for Excel
The Takeaway
Zero API cost not always the best approach.
The cheapest tool is the one you actually use.
A complex local pipeline that breaks every week and costs 3 hours to debug isn’t “$0 in API costs.” It extracts mental cost.”
A simple API call that costs $3 and works reliably is actually the cheaper option.
P.S.
The photo_ai_workflow project still lives on GitHub. Get it here. It’s open source. It’s a good reference for local ML pipelines, CLIP scoring, Ollama integration.
But my actual photo workflow? That’s now 500 lines of Python and a $3 API bill.
Sometimes the sophisticated solution is simpler than you think.
Want to try it? The full script is ready to go. Get it here. Takes 5 minutes to set up.
Got 5,000+ photos from a trip? Run it tonight and see what your real gems look like by morning.

