As part of the SMART initiative, IBM Researchers are automating the process of data
partitioning. Given a workload of SQL statements, scientists are trying to determine
automatically how to partition data across multiple nodes to achieve optimal performance.
The traditional approach to this problem has been to use heuristic rules, but this method
does not consider all aspects of a query's performance. IBM Researchers are taking a slightly
different approach to automating and improving data partitioning by using the query optimizer
to recommend possible partitions for each table that will have a positive impact on the query
workload. Additionally, IBM data management experts have compared a rank-based enumeration
method with a random-based one, and experimental results have demonstrated that the former is
more effective.
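
To make the contrast between the two enumeration strategies concrete, here is a minimal sketch in Python. It is not IBM's implementation: the function names, the table and column names, and the benefit numbers are all hypothetical, and it assumes that the optimizer has already supplied, for each table, a list of candidate partitioning keys with an estimated benefit. Under that assumption, a rank-based enumeration can visit combinations of candidates in order of decreasing total estimated benefit, while a random-based one simply samples combinations.

```python
import heapq
import random


def rank_based_enumeration(candidates, limit):
    """Return up to `limit` configurations, highest total estimated benefit first.

    candidates: dict mapping table name -> list of (partition_key, benefit).
    The benefit values are assumed to come from the optimizer (hypothetical).
    """
    tables = sorted(candidates)
    # Rank each table's candidates by decreasing estimated benefit.
    ranked = {t: sorted(candidates[t], key=lambda c: -c[1]) for t in tables}

    def total(idx):
        return sum(ranked[t][i][1] for t, i in zip(tables, idx))

    # Best-first search: start from the top-ranked candidate of every table,
    # then step one table at a time to its next-ranked candidate.
    start = tuple(0 for _ in tables)
    heap = [(-total(start), start)]
    seen = {start}
    results = []
    while heap and len(results) < limit:
        neg_benefit, idx = heapq.heappop(heap)
        config = {t: ranked[t][i][0] for t, i in zip(tables, idx)}
        results.append((config, -neg_benefit))
        for pos, table in enumerate(tables):
            nxt = idx[:pos] + (idx[pos] + 1,) + idx[pos + 1:]
            if nxt[pos] < len(ranked[table]) and nxt not in seen:
                seen.add(nxt)
                heapq.heappush(heap, (-total(nxt), nxt))
    return results


def random_enumeration(candidates, limit, seed=0):
    """Return `limit` configurations, picking one candidate per table at random."""
    rng = random.Random(seed)
    tables = sorted(candidates)
    return [{t: rng.choice(candidates[t])[0] for t in tables} for _ in range(limit)]


# Hypothetical optimizer recommendations (illustrative values only).
example_candidates = {
    "orders": [("o_custkey", 5.0), ("o_orderkey", 8.0)],
    "lineitem": [("l_orderkey", 10.0), ("l_partkey", 3.0)],
}
top_configs = rank_based_enumeration(example_candidates, limit=3)
```

In a real system each enumerated configuration would then be costed against the whole workload, since per-table benefits are only estimates and can interact. The sketch only shows why a ranked order tends to reach promising configurations sooner than random sampling when the evaluation budget is limited.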