We're looking for people to help us set the global standard for video understanding AI!
TwelveLabs builds world-class video AI models that process vast amounts of video data effectively and provide video-specific search, analysis, summarization, and insight generation.
The world's largest sports league uses TwelveLabs models to pick out highlights quickly and accurately from massive volumes of game footage, delivering a hyper-personalized viewing experience. Integrated control centers in Korea work with TwelveLabs to search CCTV footage efficiently so they can respond rapidly to emergencies, and major broadcasters and studios around the world use TwelveLabs models to produce content for billions of viewers.
TwelveLabs is a Deep Tech startup with offices in San Francisco and Seoul, and has been named to the CB Insights list of the world's top 100 AI startups for four consecutive years. We have raised more than $110 million in total from world-class VCs and companies including NVIDIA, NEA, Index Ventures, Databricks, and Snowflake, and ours are the only AI models developed in Korea that are served through Amazon Bedrock. We build innovative products with exceptional colleagues and grow together with customers around the world.
TwelveLabs works around the following core values.
Honesty and self-reflection about yourself and the team
Tenacity and humility that do not shy away from failure or feedback
A commitment to raising the team's capabilities together through continuous learning
If you enjoy growing by solving challenging problems together, that opportunity is here at TwelveLabs.
This team is responsible for the research and development of Marengo, TwelveLabs' multimodal embedding model. We research and develop models that unify modalities such as video, audio, and text in a single embedding space.
We work on research topics including contrastive learning, temporal video understanding, and multimodal representation learning, and we own the entire model development process, from building large-scale training data pipelines to model architecture design, distributed training optimization, and evaluation framework design. With access to world-class GPU resources such as NVIDIA B300, we run large-scale experiments quickly.
In an environment where the gap between research and production is very short, we collaborate closely with the Search, Product, and Infrastructure teams to continuously improve the quality of models used by thousands of customers worldwide.
As a Senior ML Research Engineer on the Marengo team, you will drive the research and development of TwelveLabs' multimodal embedding models, from data strategy and training pipeline optimization to model architecture experimentation and evaluation.
This is a research-heavy engineering role at the intersection of multimodal representation learning, large-scale distributed training, and data engineering. We're looking for a strong engineer-researcher who can take well-scoped research problems with moderate ambiguity, design rigorous experiments, and deliver reproducible results that ship to production.
Design and execute experiments to improve multimodal embedding model quality, spanning model architecture, training methodology, data composition, and evaluation
Build and optimize large-scale distributed training pipelines (multi-node, multi-GPU) for contrastive and representation learning
Develop and improve data curation, filtering, and quality assessment pipelines at scale
Conduct ablation studies to systematically evaluate design choices and communicate findings to guide technical direction
Implement evaluation frameworks and benchmarks that rigorously measure embedding model quality
Collaborate with the search/serving team to ensure model improvements translate to end-to-end retrieval quality gains
Even if you don't check every box, we encourage you to apply.
If you're a zero-to-one achiever, a ferocious learner, and a kind team player who motivates others, you'll find a home at TwelveLabs.
4–7 years of industry experience in computer vision, NLP, or multimodal learning, with a track record of shipping ML systems to production
Strong proficiency in Python and PyTorch, with hands-on experience in distributed model training
Experience in contrastive learning, representation learning, or embedding models, demonstrated through shipped products, publications, or open-source contributions
End-to-end ownership experience: taking a model from research idea through training to production deployment, not just running experiments in isolation
Ability to independently drive research projects from problem definition through experiment design to conclusions
Effective communication skills for collaborating with colleagues from diverse backgrounds
We evaluate based on relevant technical skills and industry impact rather than degrees alone. This role is typically a strong fit for engineers with an MS and meaningful industry experience building ML systems at scale.
Experience with temporal video understanding (segmentation, boundary detection, temporal grounding)
Experience with large-scale data curation (filtering, deduplication, quality scoring) for model training
Experience with training infrastructure optimization (mixed precision, gradient checkpointing, communication backends)
Familiarity with experiment tracking and reproducibility tools
Experience with petabyte-scale data processing
The gap between research and production is remarkably short here. Models you build will be used by thousands of companies worldwide within months. We work as a unified team toward the broader goal of video understanding, rather than solving isolated problems. Our research philosophy balances rigorous experimentation with real-world application: we aim to build multimodal systems that are powerful, trustworthy, and genuinely useful.
Work Location: Seoul Itaewon office + Pangyo satellite office
Additional Info: New enlistment or transfer under Korea's Technical Research Personnel (전문연구요원) program is possible.
Application Review → Recruiter Interview (remote, 30 min) → Loop Interview [Hiring Manager Interview & Live Coding Test Interview] (in person, about 90 min) → Loop Interview [System Design & Final Round Interview] (remote, about 90 min) → Reference Check → Offer
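A global team that grows together with global B2B customers
Hybrid work that combines autonomy with collaboration
A MacBook and KRW 700,000 worth of home-office equipment for every employee, with equipment refreshed to the latest models every 3 years
A corporate card with a monthly limit of KRW 600,000 to use freely on meals, transportation, and similar expenses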
In-office snack bar (snacks, coffee, and fresh food provided)
Two-week winter break at the end of each year
Annual health checkup support
English education program support