Jul 2024 to Present
Lead Data Engineer
DigiLawyer
Founding/early data engineer building the ingestion + structuring layer and retrieval stack for a legal research product. The work spans messy sources, strict correctness requirements, and the day-to-day operational reality of keeping pipelines and search stable as usage grows.
Primary Impact
- Built high-throughput ingestion pipelines for Pakistan’s legal corpus (statutes + related materials), turning fragmented PDFs/HTML into structured, queryable records.
- Designed the retrieval layer (full-text + semantic/vector retrieval), with attention to latency, debuggability, and evaluation consistency.
- Helped scale the product as it grew from ~1k to ~16k users, a stage at which search quality and data freshness were directly user-facing.
- Led the work functionally as “head of data” with a small team (1 full-time teammate and multiple interns), focusing on documentation and operational discipline.
Core Technologies
Python, PostgreSQL, Airflow, pgvector, Full-text search, Qdrant, MeiliSearch, Scraping, BeautifulSoup, Docker, Kubernetes, Linux, AWS (EKS/EC2)