416DAT

Artifact Content

Artifact 7cd1608148d26dd473a19457106c57856c448525:

Wiki page [Project Proposal] by wtzou 2013-02-27 20:22:18.
D 2013-02-27T20:22:18.799
L Project\sProposal
P cbaab95de4f700aee357d04e48fe9172b67f2840
U wtzou
W 5191
<h1>Project Description</h1>
<p>
Our project is a comparative study of document store database systems. The five systems we will examine are MongoDB, OrientDB, Couchbase, Redis, and Apache Cassandra.

We will use the Yahoo! Cloud Serving Benchmark (YCSB) framework to collect performance data for each system. We will test performance by running common workloads across all of the systems with the YCSB client, an open source workload generator. YCSB already provides a predefined set of "Core" workloads. Because the client is extensible, we will also be able to define new workloads, examine different aspects of performance, and test various scenarios.

By the end of this study, we hope to achieve a better understanding of how document store databases perform in a distributed environment and how different offerings of document stores perform relative to each other under similar conditions. We also expect to gain a deeper appreciation of the challenges behind distributed systems in the context of databases.
</p>
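<p>
To make the idea of a workload concrete, the sketch below models an operation mix in Python. It is an illustration only and is not part of YCSB, which is written in Java and configured through property files. The keys <code>readproportion</code>, <code>updateproportion</code>, and <code>insertproportion</code> follow YCSB's core workload property names, and the 50/50 and 95/5 splits mirror core workloads A and B. The <code>custom_write_heavy</code> mix and the <code>generate_operations</code> helper are hypothetical names introduced for this example.
</p>

```python
import random
from collections import Counter

# Operation mixes keyed by YCSB-style property names. The 50/50 and 95/5
# splits mirror core workloads A and B; "custom_write_heavy" is a made-up
# example of a custom workload, not something defined by YCSB.
WORKLOADS = {
    "workload_a": {"readproportion": 0.5, "updateproportion": 0.5},
    "workload_b": {"readproportion": 0.95, "updateproportion": 0.05},
    "custom_write_heavy": {
        "readproportion": 0.1,
        "updateproportion": 0.3,
        "insertproportion": 0.6,
    },
}


def generate_operations(mix, operation_count, seed=0):
    """Return a list of operation names drawn at random according to the mix."""
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"proportions must sum to 1, got {total}")
    rng = random.Random(seed)  # seeded so repeated runs give the same sequence
    names = [name.replace("proportion", "") for name in mix]
    weights = list(mix.values())
    return rng.choices(names, weights=weights, k=operation_count)


if __name__ == "__main__":
    for name, mix in WORKLOADS.items():
        counts = Counter(generate_operations(mix, operation_count=10_000))
        print(name, dict(counts))
```

<p>
A seeded generator keeps the operation sequence identical across runs, which matters when the same workload is replayed against each database for comparison.
</p>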
<h1>Goals</h1>
<ul><li>Gather and compare performance measurements for several distributed databases under the YCSB core workloads and under custom workloads</li>
<li>Implement YCSB client interfaces for databases that the framework does not yet support</li>
<li>Learn about distributed document stores</li>
<li>Extend YCSB with custom workloads</li>

<li>Run workloads under simulated node failures and analyze the results
<ul><li>1 of 5 nodes “unplugged”</li>
<li>2 of 5 nodes “unplugged”</li></ul>
</li>
</ul>
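<p>
The failure scenarios above can be reasoned about with a small model before any cluster is touched. The sketch below assumes a hypothetical placement scheme in which each key is stored on a fixed number of consecutive nodes chosen by a hash. This does not describe how any of the five databases actually places or replicates data; the function names and the replica counts are assumptions made for illustration.
</p>

```python
import zlib


def replica_nodes(key, node_count, replicas):
    """Place a key on `replicas` consecutive nodes, starting at a hashed position.

    Hypothetical placement scheme; requires replicas <= node_count.
    """
    start = zlib.crc32(key.encode("utf-8")) % node_count
    return {(start + i) % node_count for i in range(replicas)}


def readable_fraction(keys, node_count, replicas, unplugged):
    """Fraction of keys that still have at least one replica on a live node."""
    live = set(range(node_count)) - set(unplugged)
    readable = sum(
        1 for key in keys if replica_nodes(key, node_count, replicas) & live
    )
    return readable / len(keys)


if __name__ == "__main__":
    keys = [f"user{i}" for i in range(10_000)]
    for unplugged in ([0], [0, 1]):
        for replicas in (1, 2, 3):
            fraction = readable_fraction(keys, 5, replicas, unplugged)
            print(
                f"{len(unplugged)}/5 unplugged, replicas={replicas}: "
                f"{fraction:.3f} of keys readable"
            )
```

<p>
Under this model, a key stays readable as long as the number of unplugged nodes is smaller than its replica count, which gives a baseline expectation to compare measured behavior against.
</p>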

<h1>Final Deliverables</h1>
<ul><li>A comprehensive report comparing the five selected document store databases, outlining key differences in performance and scalability under varying workloads in a distributed environment</li>
<li>If time permits, the interface adapters and workload extensions written during the project may be contributed to the YCSB framework</li>
</ul>

<h1>Internal Milestones</h1>
<ol>
<li>Get a YCSB instance up and running with the core workload (Feb 11)
</li>
<li>Add client interfaces if ones do not exist (Feb 18)
</li>
<li>Run core workloads on our databases and analyze results (local) (Feb 22)
</li>
<li>Configure Amazon instances to use the same resources (RAM, HDD/SSD speed, processors, caches); snapshots, system images, schemes (Mar 3)
</li>
<li>Set up databases on instances (Mar 8)
</li>
<li>Collect data (Mar 15)
</li>
<li>Data analysis and first report draft done (Mar 29)
</li>
</ol>
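<p>
For the data collection and analysis milestones, one possible way to reduce a run to comparable numbers is sketched below. It assumes per-operation latencies in milliseconds have already been gathered into a list; it does not parse YCSB's own output. The <code>summarize</code> function and the field names it returns are hypothetical.
</p>

```python
import statistics


def summarize(latencies_ms, elapsed_seconds):
    """Return throughput and latency statistics for one benchmark run."""
    if not latencies_ms or elapsed_seconds <= 0:
        raise ValueError("need at least one sample and a positive run time")
    ordered = sorted(latencies_ms)
    count = len(ordered)
    # Nearest-rank 95th percentile: the smallest sample with at least 95% of
    # all samples at or below it. Integer arithmetic avoids rounding surprises.
    rank = (95 * count + 99) // 100
    return {
        "operations": count,
        "throughput_ops_per_sec": count / elapsed_seconds,
        "mean_ms": statistics.fmean(ordered),
        "p95_ms": ordered[rank - 1],
        "max_ms": ordered[-1],
    }


if __name__ == "__main__":
    # Illustrative input only: 100 made-up samples over a 10 second run.
    sample = list(range(1, 101))
    for field, value in summarize(sample, elapsed_seconds=10).items():
        print(f"{field}: {value}")
```

<p>
Reporting a high percentile alongside the mean helps when comparing systems, since a few slow operations can be hidden by an average.
</p>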

<h1>Division Of Work</h1>

<ul>
<li>Each group member will be assigned one of the five databases and will:
<ul>
<li>research its implementation, constraints, and benefits
</li>
<li>write a client adapter inside the YCSB framework, if necessary
</li>
<li>run YCSB workloads against the database
</li>
<li>be responsible for communicating, analyzing, and interpreting the results and comparing them against the other four databases
</li>
</ul>
</li>
</ul>


<h1>Wish List (If Time Permits)</h1>
<ul>
<li>Perform further benchmarks on other well-known document store databases already supported by YCSB</li>
<li>Expand on various test scenarios and extend YCSB benchmarking workloads with custom tests
</li>
<li>Contribute extensions to the existing YCSB framework
</li>
</ul>

<h1>Risk Management</h1>

<p>Because our goals are modular, we can choose to skip extended workloads or custom workloads if the core workloads prove too complex to analyze effectively under our time constraints.
</p>

<h1>Tools, Scripts, Acknowledgements</h1>

<p>We will be relying heavily on research papers, documentation, and the YCSB benchmark framework for our report.</p>

<h1>Project References</h1>
<table border="1">
<tr>
<th>Links To Reference Documents</th>
</tr>
<tr>
<td><a href="http://research.yahoo.com/Web_Information_Management/YCSB">YCSB Overview</a></td>
</tr>
<tr>
<td><a href="http://research.yahoo.com/node/3202">YCSB Research Paper</a></td>
</tr>
<tr>
<td><a href="http://research.yahoo.com/files/ycsb-v4.pdf">YCSB Experimental Results</a></td>
</tr>
<tr>
<td><a href="http://github.com/brianfrankcooper/YCSB">YCSB GitHub Source Code</a></td>
</tr>
<tr>
<td><a href="https://github.com/brianfrankcooper/YCSB/wiki/Implementing-New-Workloads">YCSB - new workloads</a></td>
</tr>
<tr>
<td><a href="https://github.com/brianfrankcooper/YCSB/wiki/Core-Workloads">Core Workloads</a></td>
</tr>
<tr>
<td><a href="https://github.com/brianfrankcooper/YCSB/wiki/_pages">List of wiki pages for YCSB</a></td>
</tr>
<tr>
<td><a href="http://www.mongodb.org/">MongoDB</a></td>
</tr>
<tr>
<td><a href="http://www.orientdb.org/">OrientDB</a></td>
</tr>
<tr>
<td><a href="http://www.couchbase.com/">Couchbase</a></td>
</tr>
<tr>
<td><a href="http://redis.io/">Redis</a></td>
</tr>
<tr>
<td><a href="http://cassandra.apache.org/">Cassandra</a></td>
</tr>
</table>



<h1>Project Progress</h1>

<a href="http://chiselapp.com/user/jfang/repository/416DAT/wiki?name=Project+Description">Project Description and Progress</a>
Z 678c584db62121486ff6116ea2f3d561