Data & AI engineering consulting

The data platform your roadmap promised. Built, shipped, running.

Import Knowledge is a senior data & AI engineering practice with 12+ years building data platforms on Snowflake, Databricks, and AWS across healthcare, biosciences, retail, and manufacturing. Now putting RAG pipelines and AI agents on top of them. Hands on keyboard, embedded in your team.

Currently taking select engagements · Remote · Americas & Europe · English / Español

Experience from
  • Bravado Health
  • Personify Health
  • Blue Cross Blue Shield RI
  • Bank of America
  • Walgreens
  • Foot Locker
  • Brunswick

# services

One engineer, the whole stack.

From raw source systems to governed models to AI that answers questions about your business, built on the platform you already pay for.

Data platform engineering

Warehouses and lakehouses on Snowflake and Databricks, plus Redshift or MotherDuck where they're the better fit. dbt transformation layers that are version-controlled, tested, and documented. Batch and streaming pipelines with Kafka, Airflow, AWS Glue, and Spark, plus change data capture into Apache Iceberg, all on infrastructure managed with Terraform or AWS CDK. The unglamorous parts are included too: dimensional modeling, performance tuning, and warehouse cost optimization.

  • Snowflake
  • Databricks
  • Redshift
  • MotherDuck
  • dbt
  • Kafka
  • Airflow
  • Spark
  • Iceberg
  • AWS Glue
  • Terraform
  • AWS CDK
  • Cost optimization

RAG & AI agents

Retrieval-augmented generation and agentic systems grounded in your governed data, not a demo bolted onto a vector store. Embedding pipelines, semantic search, MCP integrations, and agent evaluation, built with the AI tooling native to Snowflake and Databricks so security, lineage, and access control come along for the ride. Ungoverned agents are among the biggest security concerns in enterprise AI right now, and building agents on a governed platform is how you address that.

  • RAG
  • Agents
  • MCP
  • LangChain
  • Embeddings
  • Vector search
  • AI governance
  • Snowflake Cortex
  • Databricks AI
  • Evals

MLOps, catalogs & governance

The infrastructure that keeps data science shipping: SageMaker environments, CI/CD with GitHub Actions, Docker deployments, and data quality testing and observability baked into the pipelines. Plus data catalogs with OpenMetadata, DataHub, or Promethium, because discovery, lineage, and governance are the context layer your AI agents run on. If your models live in notebooks, this is the fix.

  • SageMaker
  • GitHub Actions
  • CI/CD
  • Docker
  • OpenMetadata
  • DataHub
  • Promethium
  • Lineage
  • Data observability

Healthcare & life-sciences data

EHR and clinical integrations over HL7 interfaces: Mirth, ModMed, eClinicalWorks, Surgimate. Medicare HCC risk-score and claims pipelines. LIMS and Salesforce CRM data wired into the analytics platform. Compliance-aware engineering from years inside digital health and biosciences.

  • EHR integration
  • HL7
  • Mirth
  • HCC risk scores
  • Claims data
  • LIMS
  • Salesforce
  • HIPAA-aware

# how we work

Three ways to bring me in.

01

Staff augmentation

An embedded senior engineer for a sprint, a quarter, or as long as the backlog demands. Your standups, your repo, and your code reviews, with your team leveling up while the work ships.

Best for: teams that are short a senior pair of hands right now.

02

Scoped project

A defined build with a defined end: a warehouse migration, a dbt refactor, an EHR integration, a RAG prototype your stakeholders can actually touch. Fixed scope, clear deliverable, documented handoff.

Best for: the project that's been on the roadmap for three quarters.

03

Long-term partner

Fractional, ongoing ownership of your data platform: architecture, maintenance, governance, and a roadmap that survives contact with reality. Senior data leadership without the full-time headcount.

Best for: companies whose data platform is critical but not their core business.

# selected work

A few platforms that stayed in production.

built at bravado health

Healthcare data platform, end to end

The backend database and data pipelines for an entire health organization: EHR and practice-management integrations (ModMed, eClinicalWorks, Surgimate, Mirth), an S3/Glue data lake feeding Snowflake, and Medicare HCC patient risk-score pipelines. Delivered while leading a team of four engineers.

built at bank of america

Cybersecurity data and analytics at a global bank

Security data turned into analysis and reporting that the security organization could act on, inside one of the largest banks in the world, where volume, stakes, and scrutiny are all at their highest. The cybersecurity thread goes back further, to security data-analytics work at Argonne National Laboratory.

built at virgin pulse · now personify health

The data layer behind a health plan's member app

The full data model and ETL for Blue Cross Blue Shield of Rhode Island's Your Blue Touch app, plus automated analytical insights for health-plan benefits navigation.

built at virgin pulse · now personify health

Real-time analytics on streaming data

Change data capture from Kafka streams with Apache Iceberg, ETL on AWS Glue, and dashboards moved from batch to real time with Apache Pinot and Superset. Analytical insights at wellness-platform scale with Spark and Presto.

walgreens · foot locker · brunswick

Forecasting and recommenders at retail scale

Demand forecasting, recommender systems, clustering, and enterprise reporting for some of the biggest names in retail, consumer goods, and manufacturing. At Brunswick that spanned brands including Lund Boats, Boston Whaler, and Life Fitness.

# about

One senior engineer. Not an agency.

Import Knowledge is a one-person practice built on over a decade in the data layer, in roles from analyst to data scientist to lead engineer to engineering manager, at companies from Walgreens and Bank of America to digital-health startups and biosciences labs. That path covers the whole arc: writing the SQL, training the models, designing the platform, and running the team that owns it. It also means knowing which of those your problem actually needs.

The foundation underneath is formal training in statistics and machine learning, from regression and time-series forecasting to clustering, recommenders, and ensemble methods. That matters more now, not less: RAG pipelines and AI agents are only as good as the data discipline behind them, and evaluating whether an AI system actually works is a statistics problem.

The same opinions run through every engagement: version-control everything, test what you ship, document what you build, and treat governance as a feature instead of paperwork. Platforms built this way are boring in the best sense. They survive team turnover, audits, and Monday mornings.

Engagements run remotely across the Americas and Europe, from the US, Canada, and Mexico to Brazil, the UK, Spain, and Germany, in English or Spanish. The practice takes on a small number of them at a time, so each one gets senior attention from day one. No hand-off to a junior bench, no discovery phase that never ends, and no slide deck where a pipeline should be.

Languages & engines

  • Python
  • SQL
  • Scala
  • JavaScript
  • Spark
  • Presto / Trino

Warehouses & lakehouses

  • Snowflake
  • Databricks
  • Redshift
  • MotherDuck
  • DuckDB
  • Postgres
  • Apache Iceberg
  • S3 / Glue data lakes

Pipelines, cloud & infra

  • dbt
  • Airflow
  • Kafka
  • AWS Glue
  • AWS
  • Azure
  • Terraform
  • AWS CDK
  • Docker
  • GitHub Actions
  • SageMaker

AI & machine learning

  • RAG
  • AI agents
  • MCP
  • LangChain
  • Snowflake Cortex
  • Vector databases
  • scikit-learn
  • XGBoost
  • Forecasting
  • Recommenders

Catalogs & BI

  • OpenMetadata
  • DataHub
  • Promethium
  • Tableau
  • QuickSight
  • Superset
  • Apache Pinot

Spoken languages

  • English / Español

# contact

Have a platform that needs building, or an AI initiative that needs to be real?

Tell me what you're trying to ship. I'll tell you honestly whether I'm the right person to help.