Two things
I care about.
Plain-language notes on what I actually work on. One picture, one paragraph, no hedging.
Semantic Segmentation across Domains
“Teaching a model that the road in Boston is still the road in Beijing.”
Domain shift between source and target; the prediction stays consistent across both.
I work on unsupervised domain adaptation for semantic segmentation — the model is trained on one city (or one LiDAR sensor, or one microscope), and is then expected to just work on another. The picture is rarely that kind.
My approach: instead of trusting confident pseudo-labels blindly, I look for structural priors — long-range dependencies in 3D point clouds, density patterns in LiDAR sweeps, anatomical regularities in medical slides — and use them as a quiet, geometry-aware supervisor.
Hallucinations in Large Vision-Language Models
“When the model sees a cat that isn't there.”
A hallucinated object disappears once decoding is corrected.
Large vision-language models (LVLMs) are eloquent storytellers, but they sometimes describe objects that the image never contained.
My recent work proposes attention contrastive decoding — a decoding-time intervention that quietly suppresses the heads hallucinating from the language prior, while keeping the heads that genuinely look at the image. No retraining required; the surgery is at inference.