Job Description
This internship role will be based out of Headquarters in Mountain View, California.
At LinkedIn, our approach to flexible work is centered on trust and optimized for culture, connection, clarity, and the evolving needs of our business. The work location of this role is hybrid, meaning it will be performed both from home and from a LinkedIn office on select days, as determined by the business needs of the team.
Are you interested in cutting-edge work at the intersection of distributed systems and database internals? The Online Analytics Engines team is responsible for building and operating the OLAP database engines (query execution, cluster management, storage, and indexing) that serve low-latency analytics. The team is part of the Online Analytics Group in the Online Infrastructure Org, and frequently works with sister teams (Platform / Control Plane), partners (e.g., Compute Infrastructure), and customers.
The Engines team is working toward a roadmap to be the one-stop shop for low-latency analytics. This is the team that built and open-sourced Apache Pinot, a distributed OLAP database (powering 100+ use cases at LinkedIn), and it is currently working on operationalizing more engines (e.g., ClickHouse). As a PhD intern on this team, you will have the opportunity to demonstrate technical depth and own complex engineering problems in scalable system design and architecture, distributed systems, and database internals. You will tackle problems at LinkedIn scale and collaborate with incredible distributed systems engineers.
We value academic and industrial research. Our engineers are encouraged to innovate on the product by conducting their own research and leveraging existing work. The team is currently writing a paper that we will submit to a database research conference later this year. We frequently present our work at conferences (USENIX, Realtime Analytics Summit, ApacheCon, and P99Conf, to name a few).
Here are a few project ideas:
Build a high-performance engine in native code (C++, Rust).
Leverage modern hardware acceleration capabilities (e.g., LLVM-based JIT compilation).
Innovate on system performance optimizations across multiple resource dimensions (CPU, memory, I/O) to drive key efficiency gains for the fleet.
Example: re-imagine the I/O layer, which is heavily based on memory-mapping files on SSDs.
Example: a thread-per-core query architecture for improving tail latencies.
Capabilities
Indexes based on radix trees and Bw-trees for range queries, and hash indexes for point queries.
Expand complex SQL support: JOINs, window functions.
Add query planning support: cost-based and rule-based optimizers.
Upserts (in real time).
Tiered Storage
Index and store nested/unstructured/semi-structured event data (e.g., JSON).
Agentic AI-based capabilities to improve operations and eliminate toil.
Candidates must be currently enrolled in a PhD program, with an expected graduation date December 2026 or later.
Our internships are 12 weeks in length, with the option of two intern sessions:
May 26th, 2026 - August 14th, 2026
June 15th, 2026 - September 4th, 2026