small data lab
@ Cornell Tech

Small Data

Creating Access to Your Data for You

Small data are the myriad digital traces we each generate every day. Unfortunately, those data are often unavailable to us in a form that we can make sense of or act upon. Imagine a special kind of app running in the cloud that privately and securely turns your small data into big insights.

Open-Source and Modular

We've designed our infrastructure and apps to be modular and reusable so that they can be remixed and repurposed for new services and platform architectures, from research to commercial products. And everything, down to the front-end UIs of our small data apps, is open source. Fork away.

Developed Through Applied Research

We're not just building tools and services for theoretical use cases and "what if?" scenarios. We collaborate with researchers spanning healthcare, behavioral economics, human-computer interaction, and policy to iteratively develop and evolve our services so that they address problems in the real world.


We're going beyond recommendation engines to services that contextually parse your small data to provide the right insights at the right time, to help you make (hopefully) the right plans and decisions. And as for the data that drives these insights, we take the utmost care with regard to its security while advancing architectural frameworks that maximize transparency and individual control of small data usage.


Research Stack

This SDK and UX framework for building research study apps on Android and iOS supports scientific research, including participant consent capture, extensible input tasks, and the security and privacy safeguards needed for IRB approval.

Digital Marshmallow

The Digital Marshmallow Test (DMT) is a mobile app designed to identify, understand and help individuals who have difficulty regulating their behavior.


Beehive is a research platform designed to study human behavior outside of the lab (in the wild) using small data and ubiquitous mobile technology.

Immersive Recommendation

A new user-centric recommendation framework that incorporates diverse, cross-platform personal digital traces into recommendations.


The Ancile Project is developing a new software platform for managing microscale data in a privacy-sensitive manner.

Retrospective Data Learning

By analyzing personal retrospective data traces, we aim to learn temporal patterns and deviations that reveal individual behavior patterns.


  • First Smartphone Study Launches to Examine Impulse Control

    small data lab
    CornellTech Website, February, 2017

    The Feinstein Institute for Medical Research, Cornell Tech, and Sage Bionetworks announced today the launch of a pioneering study to examine the use of a smartphone application to identify and understand impulsivity in daily life. The team intends to develop additional apps using study data with the goal of providing support to those looking to change impulsive behaviors and better their ability to resist unhealthy temptations.

  • Deborah Estrin named 2017 IEEE Internet Award Recipient

    Deborah Estrin
    CornellTech Website, June, 2016

    Deborah Estrin, Professor of Computer Science at Cornell Tech and Professor of Public Health at Weill Cornell, has been named the 2017 IEEE Internet Award Recipient. Sponsored by the Nokia Corporation, the award is given for exceptional contributions to the advancement of Internet technology. Specifically, Estrin was selected for “formative contributions and thought leadership in Internet routing and in mobile sensing techniques and applications, from environmental monitoring to personal and community health.” IEEE is the world’s largest technical professional organization dedicated to advancing technology for the benefit of humanity.

  • The Jacobs Technion-Cornell Institute at Cornell Tech and AOL today announced a research technology called “Immersive Recommendations,” a concept where a user opts in to a tool that translates personal digital traces from one platform into content recommendations in another. The new technique was developed by Cornell Tech researchers to address “the cold-start problem”: how to engage users with relevant content when they first start using a platform. As an example, could a service like Netflix suggest better movies for a first-time user if it tapped into their Twitter data? Could an events service use your Medium posts to tailor events for you? Researchers will present the new paper at the 25th International World Wide Web Conference in Montreal, Canada.

  • small data, where n = me

    Deborah Estrin
    Communications of the ACM 2014

    We hear a lot about how big data, smart devices, and all the '-omics' (for example, genomics, proteomics, metabolomics, and so forth) are going to transform medicine—and they will. But there is another force that is going to change the way we think about and practice health, and that is our small data—small data derived from our individual digital traces.

  • We leave a trail of digital data breadcrumbs as we go about our days. With access and good apps, we could make sense of this "small data" to help get a clearer picture of our personal health. Deborah Estrin, networked sensing pioneer, Professor of Computer Science at the new Cornell Tech campus in New York City and co-founder of the non-profit startup, Open mHealth, explains at TEDMED 2013.

Our Amazing Team

Deborah Estrin


JP Pollak

Senior Researcher in residence

Michael Sobolev

Chief Scientist

Chen-Kang Hsieh

Post-Doc Student

Faisal Alquaddoomi

PhD Student

Fabian Okeke

PhD Student

Longqi Yang

PhD Student

Hongyi Wen

PhD Student

Eugene Bagdasaryan

PhD Student

Yulia Reznik


Josh Gruenstein



Recent Papers

Yum-me: A Personalized Nutrient-based Meal Recommender System. [PDF]
YANG, L., HSIEH, C., YANG, H., POLLAK, JP, DELL, N., BELONGIE, S., COLE, C. AND ESTRIN, D. ACM Transactions on Information Systems (TOIS), 2017.
Personalizing Software and Web Services by Integrating Unstructured Application Usage Traces. [PDF]
YANG, L., FANG, C., JIN, H., HOFFMAN, M.D., ESTRIN, D. In Proceedings of 26th International World Wide Web Conference (WWW), Perth, Australia, April 2017.
Collaborative Metric Learning. [PDF]
HSIEH, C.H., YANG, L., CUI, Y., LIN, T.Y., BELONGIE, S., ESTRIN, D., 26th International World Wide Web Conference (WWW), Perth, Australia, April 2017.
The Pace of Technologic Change: Implications for Digital Health Behavior Intervention Research. [PDF]
PATRICK, K., HEKLER, E.B, ESTRIN, D., MOHR, D.C., RIPER, H., CRANE, D., GODINO, J., RILEY, W.T., American Journal of Preventive Medicine. 2016 Nov; 51(5):816-824.
Internet Scale Research Studies using SDL-RX. [PDF]
KIZER, J., SAHUGUET, A., LAKIN, N., CARROLL, M., POLLAK, JP AND ESTRIN, D. Presented at the Data For Good Exchange, September, 2016.
Your Activities of Daily Living (YADL): An Image-based Survey Technique for Patients with Arthritis. [PDF]
YANG, L., FREED, D., WU, A., WU, J., POLLAK, JP. AND ESTRIN, D. In Proceedings of the 10th International Conference on Pervasive Computing Technologies for Healthcare, Cancun, Mexico, May 2016.
Leveraging Multi-Modal Sensing for Mobile Health: a Case Review in Chronic Pain. [PDF]
AUNG, M. S. H., ALQUADDOOMI, F., HSIEH, A., RABBI, M., YANG, L., POLLAK, J.P., ESTRIN, D. and CHOUDHURY, T. IEEE Journal of Selected Topics in Signal Processing. 2016 Aug; 10(5): 962-974.
Immersive Recommendation: News and Event Recommendations Using Personal Digital Traces. [PDF]
HSIEH, C.K., YANG, L., WEI, H., NAAMAN, M. AND ESTRIN, D. In Proceedings of the 25th International World Wide Web Conference (WWW), Montréal, Québec, Canada, April 2016.
GroupLink: Group Event Recommendations Using Personal Digital Traces. [PDF]
WEI, H., HSIEH, C., YANG, L. AND ESTRIN, D. In the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW’16), 2016.
Reassembling Our Digital Selves. [PDF]
ESTRIN, D. AND JUELS, A. 2016 Winter. Daedalus, 145, 1, 43-53 (doi: 10.1162/DAED_a_00364).
Smartphone Data in Rheumatoid Arthritis - What Do Rheumatologists Want? [PDF]
SAY, P.R., STEIN, D., ANCKER, J.S., HSIEH, A., POLLAK, JP AND ESTRIN, D. In Proceedings of the AMIA Annual Symposium, San Francisco, CA, November 2015.
PlateClick: Bootstrapping Food Preferences Through an Adaptive Visual Interface. [PDF]
YANG, L., CUI, Y., ZHANG, F., POLLAK, JP., BELONGIE, S. AND ESTRIN, D. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management (CIKM), Melbourne, Australia, October 2015
Beyond Classification: Latent User Interests Profiling from Visual Contents Analysis. [PDF]
YANG, L., HSIEH, C. AND ESTRIN, D., In Proceedings of Data Mining Workshop (ICDMW), IEEE International Conference. 2015
Pushcart: Supporting and Scaling Nutritionist-Client Relationships. [PDF]
BAUM, A., CARROLL, M., ESTRIN, D., GUNASEKARA, L. AND POLLAK, JP. In Proceedings of CSCW 2015: Workshop on Moving Beyond e-Health and the Quantified Self, Vancouver, Canada, March, 2015, CSCW 2015.
small data, where n=me. [PDF]
ESTRIN, D. 2014. CACM, Viewpoint Column, Communications of the ACM, 57, 4, 32-34.
The Email Analysis Framework: Aiding the analysis of personal natural language texts. [PDF]
ALQUADDOOMI, F., KETCHAM, C. and ESTRIN, D. In Workshop on Linking The Quantified Self (LinkQS), Santiago, Chile, 2014
Lifestreams: A Modular Sense-making Toolset for Identifying Important Patterns from Everyday Life. [PDF]
HSIEH, C.K., TANGMURARUNKIT, H., ALQUADDOOMI, F., JENKINS, J., KANG, J., KETCHAM, C., LONGSTAFF, B., SELSKY, J., SWENDEMAN, D., ESTRIN, D. AND RAMANATHAN, N. In Proceedings of the 11th ACM Conference on Embedded Networked Sensor Systems (SenSys), Rome, Italy, November 2013.