Yazar "Gursoy, Mehmet Emre" seçeneğine göre listele
Listeleniyor 1 - 5 / 5
Sayfa Başına Sonuç
Sıralama seçenekleri
Öğe Differentially private nearest neighbor classification(Springer, 2017) Gursoy, Mehmet Emre; Inan, Ali; Nergiz, Mehmet Ercan; Saygin, YucelInstance-based learning, and the k-nearest neighbors algorithm (k-NN) in particular, provide simple yet effective classification algorithms for data mining. Classifiers are often executed on sensitive information such as medical or personal data. Differential privacy has recently emerged as the accepted standard for privacy protection in sensitive data. However, straightforward applications of differential privacy to k-NN classification yield rather inaccurate results. Motivated by this, we develop algorithms to increase the accuracy of private instance-based classification. We first describe the radius neighbors classifier (r-N) and show that its accuracy under differential privacy can be greatly improved by a non-trivial sensitivity analysis. Then, for k-NN classification, we build algorithms that convert k-NN classifiers to r-N classifiers. We experimentally evaluate the accuracy of both classifiers using various datasets. Experiments show that our proposed classifiers significantly outperform baseline private classifiers (i.e., straightforward applications of differential privacy) and executing the classifiers on a dataset published using differential privacy. In addition, the accuracy of our proposed k-NN classifiers are at least comparable to, and in many cases better than, the other differentially private machine learning techniques.Öğe Explode: An Extensible Platform for Differentially Private Data Analysis(IEEE, 2016) Esmerdag, Emir; Gursoy, Mehmet Emre; Inan, Ali; Saygin, YucelDifferential privacy (DP) has emerged as a popular standard for privacy protection and received great attention from the research community. However, practitioners often find DP cumbersome to implement, since it requires additional protocols (e.g., for randomized response, noise addition) and changes to existing database systems. To avoid these issues we introduce Explode, a platform for differentially private data analysis. The power of Explode comes from its ease of deployment and use: The data owner can install Explode on top of an SQL server, without modifying any existing components. Explode then hosts a web application that allows users to conveniently perform many popular data analysis tasks through a graphical user interface, e.g., issuing statistical queries, classification, correlation analysis. Explode automatically converts these tasks to collections of SQL queries, and uses the techniques in [3] to determine the right amount of noise that should be added to satisfy DP while producing high utility outputs. This paper describes the current implementation of Explode, together with potential improvements and extensions.Öğe Graph-based modelling of query sets for differential privacy(Assoc Computing Machinery, 2016) Inan, Ali; Gursoy, Mehmet Emre; Esmerdag, Emir; Saygin, YucelDifferential privacy has gained attention from the community as the mechanism for privacy protection. Significant effort has focused on its application to data analysis, where statistical queries are submitted in batch and answers to these queries are perturbed with noise. The magnitude of this noise depends on the privacy parameter s and the sensitivity of the query set. However, computing the sensitivity is known to be NP-hard. In this study, we propose a method that approximates the sensitivity of a query set. Our solution builds a query-region-intersection graph. We prove that computing the maximum clique size of this graph is equivalent to bounding the sensitivity from above. Our bounds, to the best of our knowledge, are the tightest known in the literature. Our solution currently supports a limited but expressive subset of SQL queries (i.e., range queries), and almost all popular aggregate functions directly (except AVERAGE). Experimental results show the efficiency of our approach: even for large query sets (e.g., more than 2K queries over 5 attributes), by utilizing a state-of-the-art solution for the maximum clique problem, we can approximate sensitivity in under a minute.Öğe Privacy-Preserving Learning Analytics: Challenges and Techniques(IEEE Computer Soc, 2017) Gursoy, Mehmet Emre; Inan, Ali; Nergiz, Mehmet Ercan; Saygin, YucelEducational data contains valuable information that can be harvested through learning analytics to provide new insights for a better education system. However, sharing or analysis of this data introduce privacy risks for the data subjects, mostly students. Existing work in the learning analytics literature identifies the need for privacy and pose interesting research directions, but fails to apply state of the art privacy protection methods with quantifiable and mathematically rigorous privacy guarantees. This work aims to employ and evaluate such methods on learning analytics by approaching the problem from two perspectives: (1) the data is anonymized and then shared with a learning analytics expert, and (2) the learning analytics expert is given a privacy-preserving interface that governs her access to the data. We develop proof-of-concept implementations of privacy preserving learning analytics tasks using both perspectives and run them on real and synthetic datasets. We also present an experimental study on the trade-off between individuals' privacy and the accuracy of the learning analytics tasks.Öğe Sensitivity Analysis for Non-Interactive Differential Privacy: Bounds and Efficient Algorithms(IEEE Computer Soc, 2020) Inan, Ali; Gursoy, Mehmet Emre; Saygin, YucelDifferential privacy (DP) has gained significant attention lately as the state of the art in privacy protection. It achieves privacy by adding noise to query answers. We study the problem of privately and accurately answering a set of statistical range queries in batch mode (i.e., under non-interactive DP). The noise magnitude in DP depends directly on the sensitivity of a query set, and calculating sensitivity was proven to be NP-hard. Therefore, efficiently bounding the sensitivity of a given query set is still an open research problem. In this work, we propose upper bounds on sensitivity that are tighter than those in previous work. We also propose a formulation to exactly calculate sensitivity for a set of COUNT queries. However, it is impractical to implement these bounds without sophisticated methods. We therefore introduce methods that build a graph model G based on a query set Q, such that implementing the aforementioned bounds can be achieved by solving two well-known clique problems on G. We make use of the literature in solving these clique problems to realize our bounds efficiently. Experimental results show that for query sets with a few hundred queries, it takes only a few seconds to obtain results.