small data lab
@ Cornell Tech

Small Data

Creating Access to Your Data for You

Small Data are the myriad digital traces we each generate every day. Unfortunately, that data is often unavailable to us in a form that we can make sense of or act upon. Imagine a special kind of app running in the cloud that privately and securely turns your small data into big insights.

Open-Source and Modular

We've designed our infrastructure and apps to be modular and reusable so that they can be remixed and repurposed for new services and platform architectures, from research to commercial products. And everything, down to the front-end UIs of our small data apps, is open source. Fork away.

Developed Through Applied Research

We're not just building tools and services for theoretical use cases and "what if?" scenarios. Together with researchers spanning healthcare, behavioral economics, human-computer interaction, and policy, we iteratively develop and evolve our services to address issues and problems in the real world.


We're going beyond recommendation engines to services that contextually parse your small data to provide the right insights at the right time, to help you make (hopefully) the right plans and decisions. And as for the data that drives these insights, we take the utmost care and precaution with regard to its security while advancing architectural frameworks that maximize transparency and individual control of small data usage.


Immersive Recommendation

A new user-centric recommendation framework that incorporates cross-platform and diverse personal digital traces into recommendations.


The Ancile Project is developing a new software platform for managing microscale data in a privacy-sensitive manner.

Retrospective Data Learning

By analyzing personal retrospective data traces, we aim to learn temporal patterns and deviations that reveal individual behavior patterns.

Research Stack

This SDK and UX framework for building research study apps on Android and iOS supports scientific research.

Digital Marshmallow

The Digital Marshmallow Test (DMT) is a mobile app designed to identify, understand and help individuals who have difficulty regulating their behavior.


Beehive is a research platform designed to study human behavior outside of the lab (in the wild) using small data and ubiquitous mobile technology.


Pushcart is an effortless and compatible passive data collection tool that provides objective, high frequency, household-level data on food consumption and nutrition.


LIMBR is a suite of applications, built on the innovations developed under this funding, for effective pain reporting and self-directed management of medication and exercise tracking.


  • Please join us in congratulating Deborah Estrin on being selected as a 2018 MacArthur Foundation Fellow for her groundbreaking work and research in small data.

    You can read the article in the Cornell Chronicle

    MacArthur Foundation Announcement

  • OpenRec Release

    small data lab
    February, 2018

    We are excited to release OpenRec, a modular framework designed for researchers and practitioners to readily adapt and extend state-of-the-art recommendation algorithms. Check out our website and #wsdm18 paper.

  • First Smartphone Study Launches to Examine Impulse Control

    small data lab
    CornellTech Website, February, 2017

    The Feinstein Institute for Medical Research, Cornell Tech, and Sage Bionetworks announced today the launch of a pioneering study to examine the use of a smartphone application to identify and understand impulsivity in daily life. The team intends to develop additional apps using study data with the goal of providing support to those looking to change impulsive behaviors and better their ability to resist unhealthy temptations.

  • Deborah Estrin named 2017 IEEE Internet Award Recipient

    Deborah Estrin
    CornellTech Website, June, 2016

    Deborah Estrin, Professor of Computer Science at Cornell Tech and Professor of Public Health at Weill Cornell, has been named the 2017 IEEE Internet Award Recipient. Sponsored by the Nokia Corporation, the award is given for exceptional contributions to the advancement of Internet technology. Specifically, Estrin was selected for “formative contributions and thought leadership in Internet routing and in mobile sensing techniques and applications, from environmental monitoring to personal and community health.” IEEE is the world’s largest technical professional organization dedicated to advancing technology for the benefit of humanity.

  • The Jacobs Technion-Cornell Institute at Cornell Tech and AOL today announced a research technology called “Immersive Recommendations,” a concept where a user opts in to a tool that translates personal digital traces from one platform into content recommendations in another. The new technique was developed by Cornell Tech researchers to address “the cold-start problem”: how to engage users with relevant content when they first start using a platform. As an example, could a service like Netflix suggest better movies for a first-time user, if it tapped into their Twitter data? Could use your Medium posts to tailor events for you? Researchers will present the new paper at the 25th International World Wide Web Conference in Montreal, Canada.

  • small data, where n = me

    Deborah Estrin
    Communications of the ACM 2014

    We hear a lot about how big data, smart devices, and all the '-omics' (for example, genomics, proteomics, metabolomics, and so forth) are going to transform medicine—and they will. But there is another force that is going to change the way we think about and practice health, and that is our small data—small data derived from our individual digital traces.

  • We leave a trail of digital data breadcrumbs as we go about our days. With access and good apps, we could make sense of this "small data" to help get a clearer picture of our personal health. Deborah Estrin, networked sensing pioneer, Professor of Computer Science at the new Cornell Tech campus in New York City and co-founder of the non-profit startup, Open mHealth, explains at TEDMED 2013.

Our Amazing Team

Deborah Estrin


JP Pollak

Senior Researcher in residence

Michael Sobolev

Chief Scientist

Faisal Alquaddoomi

PhD Student

Fabian Okeke

PhD Student

Longqi Yang

PhD Student

Hongyi Wen

PhD Student

Eugene Bagdasaryan

PhD Student

Matthew Griffith

Developer in residence

Pargol Gheissari


Jason Waterman

Visiting Professor

Past Members

Recent Papers

Mobile Health Technologies for Older Adults with Cardiovascular Disease: Current Evidence and Future Directions. [PDF]
SEARCY, R., SUMMAPUND, J., ESTRIN, D., POLLAK, JP, SCHOENTHALER, A., TROXEL, A., DODSON, J.A., Current Geriatrics Reports (2019), January 2019
More than Just Words: Modeling Non-textual Characteristics of Podcasts. [PDF]
YANG, L., WANG, Y., DUNNE, D., SOBOLEV, M., NAAMAN, M., ESTRIN, D. In Twelfth ACM Conference on Web Search and Data Mining (WSDM’19), February 2019
Understanding User Interactions with Podcast Recommendations Delivered Via Voice. [PDF]
YANG, L., SOBOLEV, M., TSANGOURI, C., ESTRIN, D. In Twelfth ACM Conference on Recommender Systems (RecSys ’18) October 2018
Exploring Recommendations Under User-Controlled Data Filtering. [PDF]
WEN, H., YANG, L., SOBOLEV, M., ESTRIN, D. In Twelfth ACM Conference on Recommender Systems (RecSys ’18) October 2018
Unbiased Offline Recommender Evaluation for Missing-Not-At-Random Implicit Feedback. [PDF]
YANG, L., CUI, Y., XUAN, Y., WANG, C., BELONGIE, S., ESTRIN, D. In Twelfth ACM Conference on Recommender Systems (RecSys ’18) October 2018
An mHealth App for Self-Management of Chronic Lower Back Pain (Limbr): Pilot Study. [PDF]
Good Vibrations: Can a Digital Nudge Reduce Digital Overload? [PDF]
OKEKE, F., SOBOLEV, M., DELL, N., ESTRIN, D. International Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI 2018). September 2018
Characterizing User Skills from Application Usage Traces with Hierarchical Attention Recurrent Networks. [PDF]
YANG, L., FANG, C., JIN, H., HOFFMAN, M.D., ESTRIN, D. ACM Transactions on Intelligent Systems and Technology (TIST) June 2018
Small Data: Applications and Architecture. [PDF]
HSIEH, C.K., ALQUADDOOMI, F., OKEKE, F., POLLAK, JP, GUNASEKARA, L., ESTRIN, D. Proceedings of the Fourth International Conference on Big Data, Small Data, Linked Data and Open Data, April 2018.
Towards A Framework for Mobile Behavior Change Research.
OKEKE, F., SOBOLEV, M., ESTRIN, D. In Technology, Mind, and Society: APAScience, Washington DC, USA, April 2018
Ranking Subreddits by Classifier Indistinguishability in the Reddit Corpus. [PDF]
ALQUADDOOMI, F., ESTRIN, D. Proceedings of the Tenth International Conference on Information, Process, and Knowledge Management, March 2018.
OpenRec: A Modular Framework for Extensible Recommendation Algorithms. [PDF]
YANG, L., BAGDASARYAN, E., GRUENSTEIN, J., HSIEH, C., ESTRIN, D. Proceedings of the 11th ACM International Conference on Web Search and Data Mining (WSDM), February 2018.
Yum-me: A Personalized Nutrient-based Meal Recommender System. [PDF]
YANG, L., HSIEH, C., YANG, H., POLLAK, JP, DELL, N., BELONGIE, S., COLE, C., ESTRIN, D., ACM Transactions on Information Systems (TOIS), August 2017
Personalizing Software and Web Services by Integrating Unstructured Application Usage Traces. [PDF]
YANG, L., FANG, C., JIN, H., HOFFMAN, M.D., ESTRIN, D. In Proceedings of 26th International World Wide Web Conference (WWW), Perth, Australia, April 2017.
Collaborative Metric Learning. [PDF]
HSIEH, C.K., YANG, L., CUI, Y., LIN, T.Y., BELONGIE, S., ESTRIN, D., In Proceedings of 26th International World Wide Web Conference (WWW), Perth, Australia, April 2017.
The Pace of Technologic Change: Implications for Digital Health Behavior Intervention Research. [PDF]
PATRICK, K., HEKLER, E.B, ESTRIN, D., MOHR, D.C., RIPER, H., CRANE, D., GODINO, J., RILEY, W.T., American Journal of Preventive Medicine. 2016 Nov; 51(5):816-824.
Internet Scale Research Studies using SDL-RX. [PDF]
KIZER, J., SAHUGUET, A., LAKIN, N., CARROLL, M., POLLAK, JP AND ESTRIN, D. Presented at the Data For Good Exchange, September, 2016.
Your Activities of Daily Living (YADL): An Image-based Survey Technique for Patients with Arthritis. [PDF]
YANG, L., FREED, D., WU, A., WU, J., POLLAK, JP. AND ESTRIN, D. In Proceedings of the 10th International Conference on Pervasive Computing Technologies for Healthcare, Cancun, Mexico, May 2016.
Leveraging Multi-Modal Sensing for Mobile Health: a Case Review in Chronic Pain. [PDF]
AUNG, M. S. H., ALQUADDOOMI, F., HSIEH, A., RABBI, M., YANG, L., POLLAK, J.P., ESTRIN, D. and CHOUDHURY, T. IEEE Journal of Selected Topics in Signal Processing. 2016 Aug; 10(5): 962-974.
Immersive Recommendation: News and Event Recommendations Using Personal Digital Traces. [PDF]
HSIEH, C.K., YANG, L., WEI, H., NAAMAN, M. AND ESTRIN, D. In Proceedings of the 25th International World Wide Web Conference (WWW), Montréal, Québec, Canada, April 2016.
GroupLink: Group Event Recommendations Using Personal Digital Traces. [PDF]
WEI, H., HSIEH, C., YANG, L., ESTRIN, D., In the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW’16), March 2016
Reassembling Our Digital Selves. [PDF]
ESTRIN, D. AND JUELS, A. Daedalus, 145, 1, 43-53 (doi: 10.1162/DAED_a_00364). January 2016
Smartphone Data in Rheumatoid Arthritis - What Do Rheumatologists Want? [PDF]
SAY, P.R., STEIN, D., ANCKER, J.S., HSIEH, A., POLLAK, JP AND ESTRIN, D. In Proceedings of the AMIA Annual Symposium, San Francisco, CA, November 2015.
PlateClick: Bootstrapping Food Preferences Through an Adaptive Visual Interface. [PDF]
YANG, L., CUI, Y., ZHANG, F., POLLAK, JP., BELONGIE, S. AND ESTRIN, D. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management (CIKM), Melbourne, Australia, October 2015
Beyond Classification: Latent User Interests Profiling from Visual Contents Analysis. [PDF]
YANG, L., HSIEH, C. AND ESTRIN, D., In Proceedings of Data Mining Workshop (ICDMW), IEEE International Conference. November 2015
Pushcart: Supporting and Scaling Nutritionist-Client Relationships. [PDF]
BAUM, A., CARROLL, M., ESTRIN, D., GUNASEKARA, L. AND POLLAK, JP. In Proceedings of CSCW 2015: Workshop on Moving Beyond e-Health and the Quantified Self, Vancouver, Canada, March 2015.
small data, where n=me. [PDF]
ESTRIN, D. CACM, Viewpoint Column, Communications of the ACM, 57, 4, 32-34. April 2014
The Email Analysis Framework: Aiding the analysis of personal natural language texts. [PDF]
ALQUADDOOMI, F., KETCHAM, C. and ESTRIN, D. In Workshop on Linking The Quantified Self (LinkQS), Santiago, Chile, September 2014
Lifestreams: A Modular Sense-making Toolset for Identifying Important Patterns from Everyday Life. [PDF]
HSIEH, C.K., TANGMURARUNKIT, H., ALQUADDOOMI, F., JENKINS, J., KANG, J., KETCHAM, C., LONGSTAFF, B., SELSKY, J., SWENDEMAN, D., ESTRIN, D. AND RAMANATHAN, N. In Proceedings of the 11th ACM Conference on Embedded Networked Sensor Systems (SenSys), Rome Italy, November 2013.