
Jaql

Overview

Jaql is a query language whose objectives are to research semi-structured query processing, extensibility, and parallelization. We use JSON (JavaScript Object Notation) as a simple yet flexible way to represent data that ranges from flat, relational data to semi-structured, XML data. As long as a "JSON view" over the data can be defined, Jaql can process it. In addition, because JSON is widely used in the Web 2.0 community, bindings have been developed for most scripting and programming languages. This ties in with the extensibility objective: by using JSON, Jaql makes it easy for developers to pass data between the query language and the language chosen for user-defined functions (UDFs), such as JavaScript, Java, Python, Ruby, or Perl.
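To make the "JSON view" idea concrete, the following sketch uses plain Python rather than Jaql. It exposes flat CSV rows as JSON objects, mixes them with nested, semi-structured JSON, and applies one function (standing in for a UDF) to both. The data, field names, and helper functions are hypothetical and only illustrate the data model; they are not part of Jaql's API.

```python
import csv
import io
import json

# Hypothetical flat, relational data (CSV) and nested, semi-structured data (JSON).
CSV_TEXT = "id,name,dept\n1,Alice,eng\n2,Bob,sales\n"
NESTED_JSON = '[{"id": 3, "name": "Carol", "dept": "eng", "skills": ["java", "sql"]}]'

def csv_as_json_view(text):
    """Expose each CSV row as a JSON object (a 'JSON view' over flat data)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["id"] = int(row["id"])  # CSV fields are strings; restore the numeric type
        rows.append(row)
    return rows

def in_dept(records, dept):
    """A UDF-like filter: works on both sources because both are JSON values."""
    return [r for r in records if r.get("dept") == dept]

records = csv_as_json_view(CSV_TEXT) + json.loads(NESTED_JSON)
print(json.dumps([r["name"] for r in in_dept(records, "eng")]))
```

Because both sources end up as lists of JSON objects, the filter does not need to know whether a record came from a flat table or carries extra nested fields such as `skills`.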

Jaql was inspired by many programming and query languages, including Lisp, SQL, XQuery, and PigLatin. It is a functional, declarative query language designed to process large data sets. For parallelism, Jaql rewrites high-level queries, when appropriate, into "low-level" queries consisting of Map-Reduce jobs that are evaluated using the Apache Hadoop project. Notably, the query rewriter produces valid Jaql queries. This departs from the rigid, declarative-only approach (apart from hints) of most relational databases: developers can interact with the "low-level" queries if needed and can add their own low-level functionality, such as indexed access or hash-based joins, that Map-Reduce platforms lack.
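The sketch below, in plain Python rather than Jaql, illustrates the general shape of such a rewrite: a high-level "group records by user and sum bytes" query split into map, shuffle, and reduce steps. It is a single-process simulation of the Map-Reduce model, not Jaql's rewriter and not Hadoop's API; the record fields and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical input: JSON-like records, e.g. web log entries.
LOGS = [
    {"user": "alice", "bytes": 120},
    {"user": "bob", "bytes": 40},
    {"user": "alice", "bytes": 60},
]

def map_phase(record):
    """Emit (key, value) pairs: the group key and the value to aggregate."""
    yield record["user"], record["bytes"]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Aggregate one group; groups are independent, so they can run in parallel."""
    return {"user": key, "total": sum(values)}

def group_by_sum(records):
    """The 'high-level query', expressed as map -> shuffle -> reduce."""
    pairs = (pair for r in records for pair in map_phase(r))
    return [reduce_phase(k, vs) for k, vs in sorted(shuffle(pairs).items())]

print(group_by_sum(LOGS))
```

In a real cluster the map and reduce calls would run on many machines and the shuffle would move data over the network; keeping each phase a separate function mirrors how a rewritten query exposes those steps to developers who want to adjust them.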

Jaql is developed as open source under the Apache 2.0 License. More information can be found here; feedback and contributions are greatly appreciated.

Project Contact: Eugene Shekita
