This internship role will be based out of Headquarters in Mountain View, California.
At LinkedIn, our approach to flexible work is centered on trust and optimized for culture, connection, clarity, and the evolving needs of our business. The work location of this role is hybrid, meaning it will be performed both from home and from a LinkedIn office on select days, as determined by the business needs of the team.
Are you interested in large-scale data processing? We are building the next generation of LinkedIn's data infrastructure, spanning analytical compute, data storage, and lakehouse platforms that power insights, analytics, and intelligent products across the company. As LinkedIn continues to grow in membership, usage, and data volume, you will help scale systems that process and query massive datasets reliably and efficiently.
In this role, you will work with distributed data processing engines and algorithms, developing a strong systems mindset around scalability, performance, and correctness. You will gain hands-on experience with query execution, data partitioning, caching, and distributed storage, and contribute to production systems built on and alongside cutting-edge open-source technologies.
Candidates must be currently enrolled in a PhD program, with an expected graduation date of December 2026 or later.
Our internships are 12 weeks in length, with the option of two intern sessions:
May 26th, 2026 - August 14th, 2026
June 15th, 2026 - September 4th, 2026
The ideal intern will contribute to scaling LinkedIn's data infrastructure to support continued growth in membership, traffic, and data volume. As usage of our products continues to expand, this role will focus on building and supporting large-scale data systems (such as Apache Spark, Flink, Trino, Iceberg, and Airflow) that power self-serve analytics, reporting, interactive querying, and the data pipelines behind LinkedIn's machine learning and AI-powered products.
As part of this work, the intern will:
Design and optimize distributed data processing and query execution workflows operating at LinkedIn scale.
Develop data abstractions and optimizations, such as materialized views and query rewriting, to improve performance, efficiency, and data freshness for analytics and AI workloads.
Contribute to the reliability and scalability of production data pipelines through monitoring, correctness validation, and performance tuning.