OSRE 2025
Applying MLOps to overcome reproducibility barriers in machine learning research
Topics: machine learning, MLOps, reproducibility
Skills: Python, machine learning, GitOps, systems, Linux, data, Docker
Difficulty: Hard
Size: Large (350 hours)
Mentors: Fraida Fund and Mohamed Saeed
Fraida Fund
CacheBench: Building a Benchmarking Suite for Cache Performance Evaluation
In this project, we aim to develop CacheBench, a comprehensive benchmarking suite for evaluating the performance of cache systems in modern computing environments. Caches play a crucial role in enhancing system performance by reducing latency and improving data access speeds.
Juncheng Yang
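To give a flavor of what cache-performance evaluation involves, here is a minimal sketch, not part of CacheBench itself, of an LRU cache that replays a key trace and reports its hit ratio (all class and function names are illustrative assumptions):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache that tracks hit and miss counts."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.store[key]
        self.misses += 1
        return None

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used

def hit_ratio(cache, trace):
    """Replay a key trace against the cache and report the hit ratio."""
    for key in trace:
        if cache.get(key) is None:
            cache.put(key, object())
    return cache.hits / (cache.hits + cache.misses)
```

A real benchmarking suite would replay recorded production traces against many eviction policies and cache sizes; this toy version only shows the measurement loop.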
FairFace: Reproducible Bias Evaluation in Facial AI Models via Controlled Skin Tone Manipulation
Bias in facial AI models remains a persistent issue, particularly concerning skin tone disparities. Many studies report that AI models perform differently on lighter vs. darker skin tones.
James Davis
IO logger: IO tracing in the modern computing era
Storage systems are critical components of modern computing infrastructures, and understanding their performance characteristics is essential for optimizing system efficiency. Many I/O tracing studies date from twenty to thirty years ago, but the landscape has changed significantly with the advent of
Juncheng Yang
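One simple way to collect I/O traces without kernel support is to interpose on the file API. The sketch below is a deliberate simplification, not the project's tracer; the `TracedFile` name is an assumption. It wraps a file object and records each operation's type, byte count, and timestamp:

```python
import io
import time

class TracedFile:
    """Wrap a file-like object and record each read/write as (op, nbytes, timestamp)."""
    def __init__(self, f, trace):
        self._f = f          # the underlying file object
        self._trace = trace  # shared list collecting trace records

    def read(self, n=-1):
        data = self._f.read(n)
        self._trace.append(("read", len(data), time.monotonic()))
        return data

    def write(self, data):
        n = self._f.write(data)
        self._trace.append(("write", n, time.monotonic()))
        return n
```

For example, wrapping an in-memory `io.BytesIO()` and issuing a write followed by a read yields a two-record trace; a production tracer would instead hook `open()`, syscalls, or the block layer.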
ReasonWorld: Real-World Reasoning with a Long-Term World Model
A world model is an internal representation of an environment that an AI system constructs from external information in order to plan, reason, and interpret its surroundings.
James Davis
AI for Science: Automating Domain Specific Tasks with Large Language Models
Recent advancements in Large Language Models (LLMs) have transformed various fields by demonstrating remarkable capabilities in processing and generating human-like text. This project aims to explore the development of an open-source framework that leverages LLMs to enhance discovery across specialized domains.
Daniel Wong, Luanzheng "Lenny" Guo
Enhancing Reproducibility in Distributed AI Training: Leveraging Checkpointing and Metadata Analytics
Reproducibility in distributed AI training is a crucial challenge due to several sources of uncertainty, including stragglers, data variability, and inherent randomness. Stragglers—slower processing nodes in a distributed system—can introduce timing discrepancies that affect the synchronization of model updates, leading to inconsistent states across training runs.
Luanzheng "Lenny" Guo
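The checkpoint-plus-metadata idea can be illustrated with a toy sketch, with no relation to any real training framework and all names hypothetical: each "training step" derives its randomness deterministically from (seed, step), and a checkpoint records a hash of the resulting state so that two independent runs can be compared for bit-identical agreement:

```python
import hashlib
import json
import random

def run_step(seed, step):
    """Stand-in for one training step: deterministic given (seed, step)."""
    rng = random.Random(seed * 1_000_000 + step)  # derive RNG from seed and step
    return [rng.random() for _ in range(4)]       # pretend "model update"

def checkpoint(seed, step, state):
    """Record metadata that lets a later run verify it reached the same state."""
    digest = hashlib.sha256(json.dumps(state).encode()).hexdigest()
    return {"seed": seed, "step": step, "state_sha256": digest}

# Two independent "runs" with the same seed produce identical checkpoint metadata.
ck_a = checkpoint(42, 10, run_step(42, 10))
ck_b = checkpoint(42, 10, run_step(42, 10))
```

In real distributed training the hard part is exactly what this sketch hides: stragglers and nondeterministic reduction order make the state hashes diverge, which is what such metadata analytics would detect.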
Enhancing Reproducibility in RAG Frameworks for Scientific Workflows
Retrieval-Augmented Generation (RAG) frameworks, which merge the capabilities of retrieval systems and generative models, significantly enhance the relevance and accuracy of responses produced by large language models (LLMs). These frameworks retrieve relevant documents from a large corpus and use these documents to inform the generative process, thereby improving the contextuality and precision of the generated content.
Luanzheng "Lenny" Guo
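The two RAG stages the excerpt describes, retrieval over a corpus followed by prompt augmentation for the generator, can be sketched in plain Python. The term-overlap scorer is a deliberately naive stand-in for a real retriever, and the prompt format is an assumption:

```python
def score(query, doc):
    """Naive relevance: count of query terms appearing in the document."""
    q = set(query.lower().split())
    return sum(1 for w in doc.lower().split() if w in q)

def retrieve(query, corpus, k=2):
    """Rank documents by score and keep the top k (the retrieval stage)."""
    ranked = sorted(corpus, key=lambda d: score(query, d), reverse=True)
    return ranked[:k]

def build_prompt(query, corpus):
    """Assemble the augmented prompt the LLM would receive (input to generation)."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    "HDF5 is a data model for scientific data.",
    "Checkpointing saves training state.",
    "RAG retrieves documents to ground generation.",
]
```

Reproducibility problems enter where this sketch is silent: embedding-model versions, index construction order, and tie-breaking in ranking can all shift which documents reach the prompt.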
Exploration of I/O Reproducibility with HDF5
Parallel I/O is a critical component in high-performance computing (HPC), allowing multiple processes to read and write data concurrently from a shared storage system. HDF5—a widely adopted data model and library for managing complex scientific data—supports parallel I/O but introduces challenges in I/O reproducibility, where repeated executions do not always produce identical results.
Luanzheng "Lenny" Guo, Wei Zhang
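A baseline test for I/O reproducibility is bit-level comparison of the output files produced by repeated runs. The sketch below uses hypothetical helper names and contains no HDF5-specific logic; it simply hashes each output file and checks whether all runs agree:

```python
import hashlib

def file_digest(path, chunk=1 << 20):
    """Stream a file in chunks and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def runs_reproduce(paths):
    """True if every output file from repeated runs is bit-identical."""
    digests = {file_digest(p) for p in paths}
    return len(digests) == 1
```

For HDF5 specifically, byte-level differences can arise even when the logical data is identical (e.g. from allocation order under parallel writes), so a full study would also compare at the dataset level rather than only hashing raw files.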
Peersky Browser
Peersky Browser is an experimental browser that serves as a personal gateway to a new way of accessing web content. In a world where a handful of big companies control most of the internet, Peersky leverages distributed web technologies (IPFS, Hypercore, and Web3) to return control to the users.
Akhilesh Thite