# HG changeset patch
# User Mike Becker
# Date 1431511254 -7200
# Node ID 9cec78f23cbfe22ba2562982ddd672aa75254ed6
# Parent  ee0de2b1872ee556f60e9cb2b6703e5643279073
started refactoring davqlparser

diff -r ee0de2b1872e -r 9cec78f23cbf libidav/davqlparser.c
--- a/libidav/davqlparser.c	Sat May 02 18:52:04 2015 +0200
+++ b/libidav/davqlparser.c	Wed May 13 12:00:54 2015 +0200
@@ -49,7 +49,7 @@
 
 static const char* _map_exprtype(davqlexprtype_t type) {
     switch(type) {
-    case DAVQL_UNDEFINED_TYP: return "undefined";
+    case DAVQL_UNDEFINED_TYPE: return "undefined";
     case DAVQL_NUMBER: return "NUMBER";
     case DAVQL_STRING: return "STRING";
     case DAVQL_TIMESTAMP: return "TIMESTAMP";
@@ -90,12 +90,14 @@
 }
 
 static void dav_debug_ql_fnames_print(DavQLStatement *stmt) {
-    printf("Field names: ");
-    UCX_FOREACH(field, stmt->fields) {
-        DavQLField *f = field->data;
-        printf("%.*s, ", sfmtarg(f->name));
+    if (stmt->fields) {
+        printf("Field names: ");
+        UCX_FOREACH(field, stmt->fields) {
+            DavQLField *f = field->data;
+            printf("%.*s, ", sfmtarg(f->name));
+        }
+        printf("\b\b \b\b\n");
     }
-    printf("\b\b \b\b\n");
 }
 
 static void dav_debug_ql_stmt_print(DavQLStatement *stmt) {
@@ -349,196 +351,160 @@
 // P A R S E R
 // ------------------------------------------------------------------------
 
-#define _unexpected_end_msg "unexpected end of statement"
-#define _invalid_msg "invalid statement"
-#define _unexpected_token "unexpected token (%.*s [->]%.*s %.*s)"
-#define _expected_token "expected token '%s' before '%.*s'"
-#define _expected_by "expected 'by' after 'order' (order [->]%.*s)"
-#define _missing_fmtspec "format specifier missing (%.*s [->]%.*s %.*s)"
-#define _invalid_fmtspec "invalid format specifier (%.*s [->]%.*s %.*s)"
-#define _unknown_fmtspec "unknown format specifier (%.*s [->]%.*s %.*s)"
-#define _missing_quote "missing closing quote symbol (%.*s)"
-#define _parser_state "parser reached invalid state"
-#define _unknown_attribute "unknown attribute '%.*s'"
-#define _duplicated_attribute "duplicated attribute '%.*s'"
-#define _invalid_depth "invalid depth"
-#define _invalid_path "invalid path"
+#define _error_context "(%.*s [->]%.*s %.*s)"
+#define _error_invalid "invalid statement"
+#define _error_unhandled "unhandled error " _error_context
+#define _error_unexpected_token "unexpected token " _error_context
+#define _error_invalid_token "invalid token " _error_context
+#define _error_missing_from "missing FROM keyword " _error_context
+#define _error_missing_by "missing BY keyword " _error_context
+#define _error_invalid_depth "invalid depth " _error_context
+#define _error_missing_expr "missing expression " _error_context
+#define _error_invalid_unary_op "invalid unary operator " _error_context
 
-#define _identifier_expected "identifier expected (%.*s [->]%.*s %.*s)"
-#define _idornum_expected "identifier or number expected (%.*s [->]%.*s %.*s)"
-#define _idorstr_expected "identifier or string expected (%.*s [->]%.*s %.*s)"
-#define _idorts_expected "identifier or timestamp expected (%.*s [->]%.*s %.*s)"
-
-#define token_sstr(listelem) ((sstr_t*)(listelem)->data)
+#define token_sstr(token) (((DavQLToken*)(token)->data)->value)
 
 static void dav_error_in_context(int errorcode, const char *errormsg,
         DavQLStatement *stmt, UcxList *token) {
     sstr_t emptystring = ST("");
     stmt->errorcode = errorcode;
     stmt->errormessage = ucx_sprintf(errormsg,
-        sfmtarg(token->prev?*token_sstr(token->prev):emptystring),
-        sfmtarg(*token_sstr(token)),
-        sfmtarg(token->next?*token_sstr(token->next):emptystring)).ptr;
+        sfmtarg(token&&token->prev?token_sstr(token->prev):emptystring),
+        sfmtarg(token?token_sstr(token):emptystring),
+        sfmtarg(token&&token->next?token_sstr(token->next):emptystring)).ptr;
 }
 
 // special symbols are single tokens - the % sign MUST NOT be a special symbol
 static const char *special_token_symbols = ",()+-*/&|^~=!<>";
 
+static _Bool iskeyword(DavQLToken *token) {
+    sstr_t keywords[] = {ST("get"), ST("set"), ST("from"), ST("at"), ST("as"),
+        ST("where"), ST("with"), ST("order"), ST("by"), ST("asc"), ST("desc")
+    };
+    for (int i = 0 ; i < sizeof(keywords)/sizeof(sstr_t) ; i++) {
+        if (!sstrcasecmp(token->value, keywords[i])) {
+            return 1;
+        }
+    }
+    return 0;
+}
+
+static UcxList* dav_parse_add_token(UcxList *tokenlist, DavQLToken *token) {
+
+    // determine token class (order of if-statements is very important!)
+    char firstchar = token->value.ptr[0];
+
+    if (isdigit(firstchar)) {
+        token->tokenclass = DAVQL_TOKEN_NUMBER;
+    } else if (firstchar == '%') {
+        token->tokenclass = DAVQL_TOKEN_FMTSPEC;
+    } else if (token->value.length == 1) {
+        switch (firstchar) {
+        case '(': token->tokenclass = DAVQL_TOKEN_OPENP; break;
+        case ')': token->tokenclass = DAVQL_TOKEN_CLOSEP; break;
+        case ',': token->tokenclass = DAVQL_TOKEN_COMMA; break;
+        case '=': token->tokenclass = DAVQL_TOKEN_EQ; break;
+        case '<': token->tokenclass = DAVQL_TOKEN_LT; break;
+        case '>': token->tokenclass = DAVQL_TOKEN_GT; break;
+        case '!': token->tokenclass = DAVQL_TOKEN_EXCLAIM; break;
+        default:
+            token->tokenclass = strchr(special_token_symbols, firstchar) ?
+                DAVQL_TOKEN_OPERATOR : DAVQL_TOKEN_IDENTIFIER;
+        }
+    } else if (firstchar == '\'') {
+        token->tokenclass = DAVQL_TOKEN_STRING;
+    } else if (firstchar == '`') {
+        token->tokenclass = DAVQL_TOKEN_IDENTIFIER;
+    } else if (iskeyword(token)) {
+        token->tokenclass = DAVQL_TOKEN_KEYWORD;
+    } else {
+        token->tokenclass = DAVQL_TOKEN_IDENTIFIER;
+    }
+
+    // remove quotes (extreme cool feature)
+    if (token->tokenclass == DAVQL_TOKEN_STRING ||
+        (token->tokenclass == DAVQL_TOKEN_IDENTIFIER && firstchar == '`')) {
+
+        char lastchar = token->value.ptr[token->value.length-1];
+        if (firstchar == lastchar) {
+            token->value.ptr++;
+            token->value.length -= 2;
+        } else {
+            token->tokenclass = DAVQL_TOKEN_INVALID;
+        }
+    }
+
+
+    return ucx_list_append(tokenlist, token);
+}
+
 static UcxList* dav_parse_tokenize(sstr_t src) {
     UcxList *tokens = NULL;
 
-    sstr_t *token = NULL;
+    DavQLToken *token = NULL;
     char insequence = '\0';
     for (size_t i = 0 ; i < src.length ; i++) {
         // quoted strings / identifiers are a single token
         if (src.ptr[i] == '\'' || src.ptr[i] == '`') {
             if (src.ptr[i] == insequence) {
                 // add quoted token to list
-                token->length++;
-                tokens = ucx_list_append(tokens, token);
+                token->value.length++;
+                tokens = dav_parse_add_token(tokens, token);
                 token = NULL;
                 insequence = '\0';
             } else if (insequence == '\0') {
                 insequence = src.ptr[i];
                 // always create new token for quoted strings
                 if (token) {
-                    tokens = ucx_list_append(tokens, token);
+                    tokens = dav_parse_add_token(tokens, token);
                 }
-                token = malloc(sizeof(sstr_t));
-                token->ptr = src.ptr + i;
-                token->length = 1;
+                token = malloc(sizeof(DavQLToken));
+                token->value.ptr = src.ptr + i;
+                token->value.length = 1;
             } else {
                 // add other kind of quotes to token
-                token->length++;
+                token->value.length++;
             }
         } else if (insequence) {
-            token->length++;
+            token->value.length++;
         } else if (isspace(src.ptr[i])) {
             // add token before spaces to list (if any)
             if (token) {
-                tokens = ucx_list_append(tokens, token);
+                tokens = dav_parse_add_token(tokens, token);
                 token = NULL;
             }
         } else if (strchr(special_token_symbols, src.ptr[i])) {
             // add token before special symbol to list (if any)
             if (token) {
-                tokens = ucx_list_append(tokens, token);
+                tokens = dav_parse_add_token(tokens, token);
                 token = NULL;
             }
             // add special symbol as single token to list
-            token = malloc(sizeof(sstr_t));
-            token->ptr = src.ptr + i;
-            token->length = 1;
-            tokens = ucx_list_append(tokens, token);
+            token = malloc(sizeof(DavQLToken));
+            token->value.ptr = src.ptr + i;
+            token->value.length = 1;
+            tokens = dav_parse_add_token(tokens, token);
             // set tokenizer ready to read more tokens
             token = NULL;
         } else {
             // if this is a new token, create memory for it
             if (!token) {
-                token = malloc(sizeof(sstr_t));
-                token->ptr = src.ptr + i;
-                token->length = 0;
+                token = malloc(sizeof(DavQLToken));
+                token->value.ptr = src.ptr + i;
+                token->value.length = 0;
             }
             // extend token length when reading more bytes
-            token->length++;
+            token->value.length++;
         }
     }
 
     if (token) {
-        tokens = ucx_list_append(tokens, token);
+        tokens = dav_parse_add_token(tokens, token);
     }
 
     return tokens;
 }
 
-static DavQLExpression* dav_parse_expression(
-        DavQLStatement* stmt, UcxList* starttoken, size_t n) {
-    if (n == 0) {
-        return NULL;
-    }
-
-    DavQLExpression *expr = calloc(1, sizeof(DavQLExpression));
-
-    // set pointer for source text
-    expr->srctext.ptr = token_sstr(starttoken)->ptr;
-
-    // special case - only one token
-    if (n == 1) {
-        expr->srctext.length = token_sstr(starttoken)->length;
-        char firstchar = expr->srctext.ptr[0];
-        char lastchar = expr->srctext.ptr[expr->srctext.length-1];
-        if (firstchar == '\'') {
-            expr->type = DAVQL_STRING;
-        } else if (isdigit(firstchar)) {
-            expr->type = DAVQL_NUMBER;
-        } else if (firstchar == '%') {
-            if (expr->srctext.length == 1) {
-                dav_error_in_context(DAVQL_ERROR_MISSING_FMTSPEC,
-                    _missing_fmtspec, stmt, starttoken);
-            } else if (expr->srctext.length == 2) {
-                switch (expr->srctext.ptr[1]) {
-                case 'd': expr->type = DAVQL_NUMBER; break;
-                case 's': expr->type = DAVQL_STRING; break;
-                case 't': expr->type = DAVQL_TIMESTAMP; break;
-                default:
-                    dav_error_in_context(DAVQL_ERROR_UNKNOWN_FMTSPEC,
-                        _unknown_fmtspec, stmt, starttoken);
-                }
-            } else {
-                dav_error_in_context(DAVQL_ERROR_INVALID_FMTSPEC,
-                        _invalid_fmtspec, stmt, starttoken);
-            }
-        } else {
-            expr->type = DAVQL_IDENTIFIER;
-        }
-        // remove quotes (if any)
-        if (firstchar == '\'' || firstchar == '`') {
-            if (lastchar != firstchar) {
-                stmt->errorcode = DAVQL_ERROR_MISSING_QUOTE;
-                stmt->errormessage =
-                    ucx_sprintf(_missing_quote, sfmtarg(expr->srctext)).ptr;
-            }
-            expr->srctext.ptr++;
-            if (expr->srctext.length > 2) {
-                expr->srctext.length -= 2;
-            } else {
-                expr->srctext.length = 0;
-            }
-        }
-    } else {
-        UcxList* token = starttoken;
-
-        // check, if first token is (
-        // if so, verify that last token is ) and throw both away
-        if (!sstrcmp(*token_sstr(token), S("("))) {
-            if (!sstrcmp(*token_sstr(ucx_list_get(token, n-1)), S(")"))) {
-                token = token->next;
-                n -= 2;
-            } else {
-                // TODO: throw syntax error
-            }
-        }
-
-        // process tokens
-        for (size_t i = 0 ; i < n ; i++) {
-            sstr_t tokendata = *token_sstr(token);
-
-            // TODO: make it so
-
-            // go to next token (if this is not the last token)
-            if (i < n-1) {
-                token = token->next;
-            }
-        }
-
-        // compute length of source text (including delimiters)
-        expr->srctext.length = token_sstr(token)->ptr +
-            token_sstr(token)->length - expr->srctext.ptr;
-    }
-
-    return expr;
-}
-
 static void dav_free_expression(DavQLExpression *expr) {
     if (expr->left) {
         dav_free_expression(expr->left);
@@ -548,377 +514,12 @@
     }
     free(expr);
 }
-
-#define _step_fieldlist_ 10 // field list
-#define _step_FROM_ 20 // FROM clause
-#define _step_WITH_ 30 // WITH clause
-#define _step_WITHopt_ 530 // expecting more WITH details or end
-#define _step_WHERE_ 40 // WHERE clause
-#define _step_ORDER_BY_ 50 // ORDER BY clause
-#define _step_ORDER_BYopt_ 550 // expecting more ORDER BY details or end
-#define _step_end_ 500 // expect end
-
-struct fieldlist_parser_state {
-    UcxList *expr_firsttoken;
-    DavQLField *currentfield;
-    size_t expr_len;
-    /*
-     * 0: begin of field list - may encounter "*" or "-" special fields
-     * 1: collect expression token
-     *    switch to step 2 on keyword "as"
-     *    expect "," or "from" only if expr_len is 1 (add to list and continue)
-     * 2: expect one token (identifier) for as clause
-     * 3: expect a ",": continue with step 1
-     *    or a "from": leave field list parser
-     * 4: expect end of field list (i.e. a "from" keyword)
-     */
-    int step;
-};
 
 static void dav_free_field(DavQLField *field) {
     dav_free_expression(field->expr);
     free(field);
 }
 
-static int dav_parse_fieldlist(DavQLStatement *stmt, UcxList *token,
-        struct fieldlist_parser_state *state) {
-    sstr_t tokendata = *token_sstr(token);
-
-    _Bool fromkeyword = !sstrcasecmp(tokendata, S("from"));
-    _Bool comma = !sstrcmp(tokendata, S(","));
-
-    switch (state->step) {
-    case 0:
-        if (!sstrcmp(tokendata, S("*")) || !sstrcmp(tokendata, S("-"))) {
-            DavQLField *field = malloc(sizeof(DavQLField));
-            field->name = tokendata;
-            field->expr = calloc(1, sizeof(DavQLExpression));
-            field->expr->type = DAVQL_IDENTIFIER;
-            field->expr->srctext = tokendata;
-            stmt->fields = ucx_list_append(stmt->fields, field);
-
-            if (tokendata.ptr[0] == '-') {
-                // no further fields may follow, if dash symbol has been found
-                state->step = 4;
-            } else {
-                state->step = 3;
-            }
-            return _step_fieldlist_;
-        }
-        // did not encounter special field, fall through to step 1
-        state->step = 1;
-    case 1:
-        if (fromkeyword || comma) {
-            // add possible identifier to list
-            if (state->expr_firsttoken) {
-                // TODO: skip comma in function call)
-                if (state->expr_len > 1) {
-                    stmt->errorcode = DAVQL_ERROR_UNEXPECTED_TOKEN;
-                    stmt->errormessage = ucx_sprintf(_expected_token,
-                        "AS", sfmtarg(tokendata)).ptr;
-                    return 0;
-                }
-
-                DavQLExpression *expr = dav_parse_expression(
-                    stmt, state->expr_firsttoken, state->expr_len);
-
-                if (expr->type != DAVQL_IDENTIFIER) {
-                    dav_free_expression(expr);
-                    stmt->errorcode = DAVQL_ERROR_UNEXPECTED_TOKEN;
-                    stmt->errormessage = ucx_sprintf(_expected_token,
-                        "AS", sfmtarg(tokendata)).ptr;
-                    return 0;
-                } // TODO: do not allow identifier when wildcard is present
-
-                DavQLField *field = malloc(sizeof(DavQLField));
-                field->expr = expr;
-                field->name = field->expr->srctext;
-                stmt->fields = ucx_list_append(stmt->fields, field);
-
-                state->expr_firsttoken = NULL;
-                state->expr_len = 0;
-
-                if (fromkeyword) {
-                    return _step_FROM_;
-                }
-            } else {
-                dav_error_in_context(DAVQL_ERROR_UNEXPECTED_TOKEN,
-                    _unexpected_token, stmt, token);
-                return 0;
-            }
-        } else if (!sstrcasecmp(tokendata, S("as"))) {
-            // TODO: return error, if expr_first_token is NULL
-            state->currentfield = malloc(sizeof(DavQLField));
-            state->currentfield->expr = dav_parse_expression(
-                    stmt, state->expr_firsttoken, state->expr_len);
-
-            state->expr_firsttoken = NULL;
-            state->expr_len = 0;
-
-            state->step = 2;
-        } else {
-            // collect tokens for field expression
-            if (state->expr_firsttoken) {
-                state->expr_len++;
-            } else {
-                state->expr_firsttoken = token;
-                state->expr_len = 1;
-            }
-        }
-
-        return _step_fieldlist_;
-    case 2: {
-        DavQLExpression *expr = dav_parse_expression(stmt, token, 1);
-        if (expr->type == DAVQL_IDENTIFIER) {
-            state->currentfield->name = expr->srctext;
-            stmt->fields = ucx_list_append(stmt->fields, state->currentfield);
-            state->currentfield = NULL;
-        } else {
-            dav_free_field(state->currentfield);
-            dav_error_in_context(DAVQL_ERROR_IDENTIFIER_EXPECTED,
-                    _identifier_expected, stmt, token);
-
-        }
-        dav_free_expression(expr);
-        state->step = 3;
-
-        return _step_fieldlist_;
-    }
-    case 3:
-        if (fromkeyword) {
-            return _step_FROM_;
-        } else if (comma) {
-            state->step = 1;
-            return _step_fieldlist_;
-        } else {
-            dav_error_in_context(DAVQL_ERROR_UNEXPECTED_TOKEN,
-                    _unexpected_token, stmt, token);
-            return 0;
-        }
-    case 4:
-        if (fromkeyword) {
-            return _step_FROM_;
-        } else {
-            stmt->errorcode = DAVQL_ERROR_UNEXPECTED_TOKEN;
-            stmt->errormessage = ucx_sprintf(_expected_token,
-                    "FROM", sfmtarg(tokendata)).ptr;
-            return 0;
-        }
-    default:
-        stmt->errorcode = DAVQL_ERROR_INVALID;
-        stmt->errormessage = strdup(_parser_state);
-        return 0;
-    }
-}
-
-static int dav_parse_from(DavQLStatement *stmt, UcxList *token) {
-    sstr_t tokendata = *token_sstr(token);
-
-    if (!sstrcasecmp(tokendata, S("with"))) {
-        return _step_WITH_;
-    } else if (!sstrcasecmp(tokendata, S("where"))) {
-        return _step_WHERE_;
-    } else if (!sstrcasecmp(tokendata, S("order"))) {
-        return _step_ORDER_BY_;
-    } else {
-        if (stmt->path.ptr) {
-            if (stmt->path.ptr[0] == '/') {
-                char *end = tokendata.ptr+tokendata.length;
-                stmt->path.length = end - stmt->path.ptr;
-            } else {
-                stmt->errorcode = DAVQL_ERROR_INVALID_PATH;
-                stmt->errormessage = strdup(_invalid_path);
-            }
-        } else {
-            if (tokendata.ptr[0] == '/' || !sstrcmp(tokendata, S("%s"))) {
-                stmt->path = tokendata;
-            } else {
-                stmt->errorcode = DAVQL_ERROR_INVALID_PATH;
-                stmt->errormessage = strdup(_invalid_path);
-            }
-        }
-        return _step_FROM_;
-    }
-}
-
-struct with_parser_state {
-    /*
-     * 0: key
-     * 1: =
-     * 2: value
-     * 3: comma or new clause or end
-     */
-    int step;
-    /*
-     * 1: depth
-     */
-    int key;
-    int keymask;
-};
-
-static int dav_parse_with_clause(DavQLStatement *stmt, UcxList *token,
-        struct with_parser_state *state) {
-    sstr_t tokendata = *token_sstr(token);
-
-    switch (state->step) {
-    case 0:
-        if (!sstrcasecmp(tokendata, S("depth"))) {
-            state->key = 1;
-            state->step = 1;
-            if (state->keymask & state->key) {
-                stmt->errorcode = DAVQL_ERROR_DUPLICATED_ATTRIBUTE;
-                stmt->errormessage = ucx_sprintf(_duplicated_attribute,
-                        sfmtarg(tokendata)).ptr;
-            } else {
-                state->keymask |= state->key;
-            }
-        } else {
-            stmt->errorcode = DAVQL_ERROR_UNKNOWN_ATTRIBUTE;
-            stmt->errormessage = ucx_sprintf(_unknown_attribute,
-                    sfmtarg(tokendata)).ptr;
-        }
-        return _step_WITH_; // continue parsing WITH clause
-    case 1:
-        if (sstrcmp(tokendata, S("="))) {
-            stmt->errorcode = DAVQL_ERROR_UNEXPECTED_TOKEN;
-            stmt->errormessage = ucx_sprintf(_expected_token,
-                    "=", sfmtarg(tokendata)).ptr;
-        } else {
-            state->step = 2;
-        }
-        return _step_WITH_; // continue parsing WITH clause
-    case 2:
-        switch (state->key) {
-        case 1: /* depth */
-            if (!sstrcasecmp(tokendata, S("infinity"))) {
-                stmt->depth = DAV_DEPTH_INFINITY;
-            } else {
-                DavQLExpression *depthexpr =
-                        dav_parse_expression(stmt, token, 1);
-
-                if (depthexpr->type == DAVQL_NUMBER) {
-                    if (depthexpr->srctext.ptr[0] == '%') {
-                        stmt->depth = DAV_DEPTH_PLACEHOLDER;
-                    } else {
-                        sstr_t depthstr = depthexpr->srctext;
-                        char *conv = malloc(depthstr.length+1);
-                        char *chk;
-                        memcpy(conv, depthstr.ptr, depthstr.length);
-                        conv[depthstr.length] = '\0';
-                        stmt->depth = strtol(conv, &chk, 10);
-                        if (*chk || stmt->depth < -1) {
-                            stmt->errorcode = DAVQL_ERROR_INVALID_DEPTH;
-                            stmt->errormessage = strdup(_invalid_depth);
-                        }
-                        free(conv);
-                    }
-                } else {
-                    stmt->errorcode = DAVQL_ERROR_INVALID_DEPTH;
-                    stmt->errormessage = strdup(_invalid_depth);
-                }
-
-                dav_free_expression(depthexpr);
-            }
-            break;
-        }
-        state->step = 3;
-        return _step_WITHopt_; // continue parsing WITH clause
-    case 3:
-        // a with clause may be continued with a comma
-        // or another clause may follow
-        if (!sstrcmp(tokendata, S(","))) {
-            state->step = 0; // reset clause parser
-            return _step_WITH_;
-        } else if (!sstrcasecmp(tokendata, S("where"))) {
-            return _step_WHERE_;
-        } else if (!sstrcasecmp(tokendata, S("order"))) {
-            return _step_ORDER_BY_;
-        } else {
-            dav_error_in_context(DAVQL_ERROR_UNEXPECTED_TOKEN,
-                _unexpected_token, stmt, token);
-            return 0;
-        }
-    default:
-        stmt->errorcode = DAVQL_ERROR_INVALID;
-        stmt->errormessage = strdup(_parser_state);
-        return 0;
-    }
-}
-
-struct orderby_parser_state {
-    /*
-     * 0: expect by keyword
-     * 1: expect identifier / number
-     * 2: expect asc / desc or comma
-     * 3: expect comma
-     */
-    int step;
-    DavQLOrderCriterion *crit;
-};
-
-static int dav_parse_orderby_clause(DavQLStatement *stmt, UcxList *token,
-        struct orderby_parser_state *state) {
-
-    sstr_t tokendata = *token_sstr(token);
-
-    switch (state->step) {
-    case 0:
-        if (!sstrcasecmp(tokendata, S("by"))) {
-            state->step++;
-        } else {
-            stmt->errorcode = DAVQL_ERROR_UNEXPECTED_TOKEN;
-            stmt->errormessage = ucx_sprintf(_expected_by,
-                    sfmtarg(tokendata)).ptr;
-        }
-        return _step_ORDER_BY_;
-    case 1:
-        state->crit = malloc(sizeof(DavQLOrderCriterion));
-        state->crit->column = dav_parse_expression(stmt, token, 1);
-        state->crit->descending = 0;
-
-        if (!state->crit->column || (
-            state->crit->column->type != DAVQL_NUMBER &&
-            state->crit->column->type != DAVQL_IDENTIFIER)) {
-            free(state->crit);
-            dav_error_in_context(DAVQL_ERROR_IDORNUM_EXPECTED,
-                    _idornum_expected, stmt, token);
-        } else {
-            stmt->orderby = ucx_list_append(stmt->orderby, state->crit);
-        }
-
-        // continue parsing clause, if more tokens available
-        state->step++;
-        return _step_ORDER_BYopt_;
-    case 2:
-        if (!sstrcasecmp(tokendata, S("desc"))) {
-            state->crit->descending = 1;
-        } else if (!sstrcasecmp(tokendata, S("asc"))) {
-            state->crit->descending = 0;
-        } else if (!sstrcmp(tokendata, S(","))) {
-            state->step = 1; // reset clause parser
-            return _step_ORDER_BY_; // statement must not end now
-        } else {
-            dav_error_in_context(DAVQL_ERROR_UNEXPECTED_TOKEN,
-                    _unexpected_token, stmt, token);
-            return 0;
-        }
-        // continue parsing clause, if more tokens available
-        state++;
-        return _step_ORDER_BYopt_;
-    case 3:
-        if (!sstrcmp(tokendata, S(","))) {
-            state->step = 1; // reset clause parser
-            return _step_ORDER_BY_; // statement must not end now
-        } else {
-            dav_error_in_context(DAVQL_ERROR_UNEXPECTED_TOKEN,
-                    _unexpected_token, stmt, token);
-            return 0;
-        }
-    }
-
-    return _step_end_;
-}
-
 static void dav_free_order_criterion(DavQLOrderCriterion *crit) {
     if (crit->column) { // do it null-safe though column is expected to be set
         dav_free_expression(crit->column);
@@ -926,6 +527,255 @@
     }
     free(crit);
 }
+#define token_is(token, expectedclass) (token && \
+    (((DavQLToken*)(token)->data)->tokenclass == expectedclass))
+
+#define tokenvalue_is(token, expectedvalue) (token && \
+    !sstrcasecmp(((DavQLToken*)(token)->data)->value, S(expectedvalue)))
+
+typedef int(*exprparser_f)(DavQLStatement*,UcxList*,DavQLExpression*);
+
+static int dav_parse_binary_expr(DavQLStatement* stmt, UcxList* token,
+        DavQLExpression* expr, exprparser_f parseL, char* opc, int* opv,
+        exprparser_f parseR) {
+
+    int total_consumed = 0, consumed;
+
+    // save temporarily on stack (copy to heap later on)
+    DavQLExpression left, right;
+
+    // RULE: LEFT, [Operator, RIGHT]
+    memset(&left, 0, sizeof(DavQLExpression));
+    consumed = parseL(stmt, token, &left);
+    if (!consumed) {
+        return 0;
+    }
+    total_consumed += consumed;
+    token = ucx_list_get(token, consumed);
+
+    char *op = token ? strchr(opc, token_sstr(token).ptr[0]) : NULL; // locate operator
+    if (token_is(token, DAVQL_TOKEN_OPERATOR) && op) {
+        expr->op = opv[op-opc];
+        total_consumed++;
+        token = token->next;
+        memset(&right, 0, sizeof(DavQLExpression));
+        consumed = parseR(stmt, token, &right);
+        if (!consumed) {
+            dav_error_in_context(DAVQL_ERROR_MISSING_EXPR,
+                _error_missing_expr, stmt, token);
+            return 0;
+        }
+        total_consumed += consumed;
+    }
+
+    if (expr->op == DAVQL_NOOP) {
+        memcpy(expr, &left, sizeof(DavQLExpression));
+    } else {
+        expr->left = malloc(sizeof(DavQLExpression));
+        memcpy(expr->left, &left, sizeof(DavQLExpression));
+        expr->right = malloc(sizeof(DavQLExpression));
+        memcpy(expr->right, &right, sizeof(DavQLExpression));
+    }
+
+    return total_consumed;
+}
+
+
+static int dav_parse_unary_expr(DavQLStatement* stmt, UcxList* token,
+        DavQLExpression* expr) {
+
+    int total_consumed = 0;
+    DavQLExpression *litexpr = expr;
+
+    // optional unary operator
+    if (token_is(token, DAVQL_TOKEN_OPERATOR)) {
+        char *op = strchr("+-~", token_sstr(token).ptr[0]);
+        if (op) {
+            expr->type = DAVQL_UNARY;
+            switch (*op) {
+            case '+': expr->op = DAVQL_ADD; break;
+            case '-': expr->op = DAVQL_SUB; break;
+            case '~': expr->op = DAVQL_NEG; break;
+            }
+            expr->left = calloc(sizeof(DavQLExpression), 1);
+            litexpr = expr->left;
+            total_consumed++;
+            token = token->next;
+        } else {
+            dav_error_in_context(DAVQL_ERROR_INVALID_UNARY_OP,
+                    _error_invalid_unary_op, stmt, token);
+            return 0;
+        }
+    }
+
+    // RULE: (ParExpression | AtomicExpression)
+    if (token_is(token, DAVQL_TOKEN_OPENP)) {
+        // TODO: make it so (and don't forget CLOSEP)
+    } else {
+        // RULE: FunctionCall
+        // TODO: make it so
+
+        // RULE: Identifier
+        /*else*/ if (token_is(token, DAVQL_TOKEN_IDENTIFIER)) {
+            total_consumed++;
+            litexpr->type = DAVQL_IDENTIFIER;
+            litexpr->srctext = token_sstr(token);
+        }
+
+        // RULE: Literal
+        // TODO: make it so
+    }
+
+
+    return total_consumed;
+}
+
+static int dav_parse_bitexpr(DavQLStatement* stmt, UcxList* token,
+        DavQLExpression* expr) {
+
+    return dav_parse_binary_expr(stmt, token, expr,
+        dav_parse_unary_expr,
+        "&|^", (int[]){DAVQL_AND, DAVQL_OR, DAVQL_XOR},
+        dav_parse_bitexpr);
+}
+
+static int dav_parse_multexpr(DavQLStatement* stmt, UcxList* token,
+        DavQLExpression* expr) {
+
+    return dav_parse_binary_expr(stmt, token, expr,
+        dav_parse_bitexpr,
+        "*/", (int[]){DAVQL_MUL, DAVQL_DIV},
+        dav_parse_multexpr);
+}
+
+static int dav_parse_expression(DavQLStatement* stmt, UcxList* token,
+        DavQLExpression* expr) {
+
+    // TODO: save source text
+
+    return dav_parse_binary_expr(stmt, token, expr,
+        dav_parse_multexpr,
+        "+-", (int[]){DAVQL_ADD, DAVQL_SUB},
+        dav_parse_expression);
+}
+
+static int dav_parse_format_spec(DavQLStatement* stmt, UcxList* token) {
+
+    return 0;
+}
+
+static int dav_parse_fieldlist(DavQLStatement *stmt, UcxList *token) {
+
+    // RULE: "-"
+    if (token_is(token, DAVQL_TOKEN_OPERATOR) && tokenvalue_is(token, "-")) {
+        DavQLField *field = malloc(sizeof(DavQLField));
+        field->expr = calloc(sizeof(DavQLExpression), 1);
+        field->expr->type = DAVQL_IDENTIFIER;
+        field->expr->srctext = field->name = token_sstr(token);
+        stmt->fields = ucx_list_append(stmt->fields, field);
+        return 1;
+    }
+
+    // RULE: "*", {",", Expression, " as ", Identifier}
+    if (token_is(token, DAVQL_TOKEN_OPERATOR) && tokenvalue_is(token, "*")) {
+        DavQLField *field = malloc(sizeof(DavQLField));
+        field->expr = calloc(sizeof(DavQLExpression), 1);
+        field->expr->type = DAVQL_IDENTIFIER;
+        field->expr->srctext = field->name = token_sstr(token);
+        stmt->fields = ucx_list_append(stmt->fields, field);
+
+        int total_consumed = 0;
+        int consumed = 1;
+
+        do {
+            token = ucx_list_get(token, consumed);
+            total_consumed += consumed;
+
+            if (token_is(token, DAVQL_TOKEN_COMMA)) {
+                total_consumed++; token = token->next;
+                DavQLExpression * expr = calloc(sizeof(DavQLExpression), 1);
+                consumed = dav_parse_expression(stmt, token, expr);
+                if (expr->type == DAVQL_UNDEFINED_TYPE) {
+                    dav_free_expression(expr);
+                } else {
+                    DavQLField *field = malloc(sizeof(DavQLField));
+                    field->expr = expr;
+                    field->name = expr->srctext;
+                    stmt->fields = ucx_list_append(stmt->fields, field);
+                }
+
+                // TODO: parse "as"
+            } else {
+                consumed = 0;
+            }
+        } while (consumed > 0);
+
+        return total_consumed;
+    }
+
+    // RULE: FieldExpression, {",", FieldExpression}
+    // TODO: make it so
+
+    return 0;
+}
+
+static int dav_parse_where_clause(DavQLStatement *stmt, UcxList *token) {
+    return 0;
+}
+
+static int dav_parse_with_clause(DavQLStatement *stmt, UcxList *token) {
+
+    int total_consumed = 0;
+
+    // RULE: "depth", "=", (Number | "infinity")
+    if (tokenvalue_is(token, "depth")) {
+        token = token->next; total_consumed++;
+        if (token_is(token, DAVQL_TOKEN_EQ)) {
+            token = token->next; total_consumed++;
+            if (tokenvalue_is(token, "infinity")) {
+                stmt->depth = DAV_DEPTH_INFINITY;
+                token = token->next; total_consumed++;
+            } else {
+                DavQLExpression *depthexpr = calloc(sizeof(DavQLExpression), 1);
+
+                int consumed = dav_parse_expression(stmt, token, depthexpr);
+
+                if (consumed) {
+                    if (depthexpr->type == DAVQL_NUMBER) {
+                        if (depthexpr->srctext.ptr[0] == '%') {
+                            stmt->depth = DAV_DEPTH_PLACEHOLDER;
+                        } else {
+                            sstr_t depthstr = depthexpr->srctext;
+                            char *conv = malloc(depthstr.length+1);
+                            char *chk;
+                            memcpy(conv, depthstr.ptr, depthstr.length);
+                            conv[depthstr.length] = '\0';
+                            stmt->depth = strtol(conv, &chk, 10);
+                            if (*chk || stmt->depth < -1) {
+                                dav_error_in_context(DAVQL_ERROR_INVALID_DEPTH,
+                                    _error_invalid_depth, stmt, token);
+                            }
+                            free(conv);
+                        }
+                        total_consumed += consumed;
+                    } else {
+                        dav_error_in_context(DAVQL_ERROR_INVALID_DEPTH,
+                            _error_invalid_depth, stmt, token);
+                    }
+                }
+
+                dav_free_expression(depthexpr);
+            }
+        }
+    }
+
+    return total_consumed;
+}
+
+static int dav_parse_orderby_clause(DavQLStatement *stmt, UcxList *token) {
+    return 0;
+}
+
 /**
  * Semantic analysis of a get statement.
  * @param stmt the statement to analyze.
@@ -943,77 +793,95 @@
 static void dav_parse_get_statement(DavQLStatement *stmt, UcxList *tokens) {
     stmt->type = DAVQL_GET;
 
-    int step = _step_fieldlist_;
+    // Consume field list
+    tokens = ucx_list_get(tokens, dav_parse_fieldlist(stmt, tokens));
 
-    struct with_parser_state state_with;
-    memset(&state_with, 0, sizeof(struct with_parser_state));
-    struct orderby_parser_state state_orderby;
-    memset(&state_orderby, 0, sizeof(struct orderby_parser_state));
-    struct fieldlist_parser_state state_fieldlist;
-    memset(&state_fieldlist, 0, sizeof(struct fieldlist_parser_state));
+    // Consume from keyword
+    if (token_is(tokens, DAVQL_TOKEN_KEYWORD)
+            && tokenvalue_is(tokens, "from")) {
+        tokens = tokens->next;
+    } else {
+        dav_error_in_context(DAVQL_ERROR_MISSING_TOKEN,
+            _error_missing_from, stmt, tokens);
+        return;
+    }
 
-    // Process tokens
-    UCX_FOREACH(token, tokens) {
-        switch (step) {
-        // too much input data
-        case _step_end_:
-            dav_error_in_context(DAVQL_ERROR_UNEXPECTED_TOKEN,
-                _unexpected_token, stmt, token);
-            break;
-        // field list
-        case _step_fieldlist_: {
-            step = dav_parse_fieldlist(stmt, token, &state_fieldlist);
-            break;
+    // Consume path
+    if (token_is(tokens, DAVQL_TOKEN_STRING)) {
+        stmt->path = token_sstr(tokens);
+        tokens = tokens->next;
+    } else if (token_is(tokens, DAVQL_TOKEN_OPERATOR)
+            && tokenvalue_is(tokens, "/")) {
+        stmt->path.ptr = token_sstr(tokens).ptr;
+        tokens = tokens->next;
+        while (tokens && !token_is(tokens, DAVQL_TOKEN_KEYWORD)) {
+            sstr_t toksstr = token_sstr(tokens);
+            stmt->path.length = toksstr.ptr-stmt->path.ptr+toksstr.length;
+            tokens = tokens->next;
         }
-        // from clause
-        case _step_FROM_: {
-            step = dav_parse_from(stmt, token);
-            break;
-        }
-        // with clause
-        case _step_WITH_:
-        case _step_WITHopt_: {
-            step = dav_parse_with_clause(stmt, token, &state_with);
-            break;
-        }
-        // where clause
-        case _step_WHERE_:
-            // TODO: implement
-            step = _step_end_;
-            break;
-        // order by clause
-        case _step_ORDER_BY_:
-        case _step_ORDER_BYopt_:
-            step = dav_parse_orderby_clause(stmt, token, &state_orderby);
-            break;
-        default:
-            stmt->errorcode = DAVQL_ERROR_INVALID;
-            stmt->errormessage = strdup(_parser_state);
-        }
-
-        // cancel processing, when an error has been detected
-        if (stmt->errorcode) {
-            break;
+    } else if (token_is(tokens, DAVQL_TOKEN_FMTSPEC)) {
+        // TODO: make it so
+    }
+
+    // Consume with clause (if any)
+    if (token_is(tokens, DAVQL_TOKEN_KEYWORD)
+            && tokenvalue_is(tokens, "with")) {
+        tokens = tokens->next;
+        tokens = ucx_list_get(tokens,
+            dav_parse_with_clause(stmt, tokens));
+    }
+    if (stmt->errorcode) {
+        return;
+    }
+
+    // Consume where clause (if any)
+    if (token_is(tokens, DAVQL_TOKEN_KEYWORD)
+            && tokenvalue_is(tokens, "where")) {
+        tokens = tokens->next;
+        tokens = ucx_list_get(tokens,
+            dav_parse_where_clause(stmt, tokens));
+    }
+    if (stmt->errorcode) {
+        return;
+    }
+
+    // Consume order by clause (if any)
+    if (token_is(tokens, DAVQL_TOKEN_KEYWORD)
+            && tokenvalue_is(tokens, "order")) {
+        tokens = tokens->next;
+        if (token_is(tokens, DAVQL_TOKEN_KEYWORD)
+            && tokenvalue_is(tokens, "by")) {
+            tokens = tokens->next;
+            tokens = ucx_list_get(tokens,
+                dav_parse_orderby_clause(stmt, tokens));
+        } else {
+            dav_error_in_context(DAVQL_ERROR_MISSING_TOKEN,
+                _error_missing_by, stmt, tokens);
+            return;
         }
     }
+    if (stmt->errorcode) {
+        return;
+    }
 
-    if (!stmt->errorcode) {
-        if (step < _step_end_) {
-            stmt->errorcode = DAVQL_ERROR_UNEXPECTED_END;
-            stmt->errormessage = strdup(_unexpected_end_msg);
+
+    if (tokens) {
+        if (token_is(tokens, DAVQL_TOKEN_INVALID)) {
+            dav_error_in_context(DAVQL_ERROR_INVALID_TOKEN,
+                _error_invalid_token, stmt, tokens);
         } else {
-            dav_analyze_get_statement(stmt);
+            dav_error_in_context(DAVQL_ERROR_UNEXPECTED_TOKEN,
+                _error_unexpected_token, stmt, tokens);
         }
+    } else {
+        dav_analyze_get_statement(stmt);
     }
 }
 
 static void dav_parse_set_statement(DavQLStatement *stmt, UcxList *tokens) {
     stmt->type = DAVQL_SET;
 
-    UCX_FOREACH(token, tokens) {
-        sstr_t tokendata = *token_sstr(token);
-
-    }
+    // TODO: make it so
 }
 
 DavQLStatement* dav_parse_statement(sstr_t srctext) {
@@ -1031,18 +899,15 @@
 
     if (tokens) {
         // use first token to determine query type
-        sstr_t token = *token_sstr(tokens);
-        free(tokens->data);
-        tokens = ucx_list_remove(tokens, tokens);
-        if (!sstrcasecmp(token, S("get"))) {
-            dav_parse_get_statement(stmt, tokens);
-        } else if (!sstrcasecmp(token, S("set"))) {
-            dav_parse_set_statement(stmt, tokens);
+
+        if (tokenvalue_is(tokens, "get")) {
+            dav_parse_get_statement(stmt, tokens->next);
+        } else if (tokenvalue_is(tokens, "set")) {
+            dav_parse_set_statement(stmt, tokens->next);
         } else {
             stmt->type = DAVQL_ERROR;
             stmt->errorcode = DAVQL_ERROR_INVALID;
-            stmt->errormessage = strdup(_invalid_msg);
+            stmt->errormessage = strdup(_error_invalid);
         }
 
         // free token data
@@ -1053,7 +918,7 @@
     } else {
         stmt->type = DAVQL_ERROR;
         stmt->errorcode = DAVQL_ERROR_INVALID;
-        stmt->errormessage = strdup(_invalid_msg);
+        stmt->errormessage = strdup(_error_invalid);
     }
 
     return stmt;
diff -r ee0de2b1872e -r 9cec78f23cbf libidav/davqlparser.h
--- a/libidav/davqlparser.h	Sat May 02 18:52:04 2015 +0200
+++ b/libidav/davqlparser.h	Wed May 13 12:00:54 2015 +0200
@@ -43,10 +43,22 @@
 typedef enum {DAVQL_ERROR, DAVQL_GET, DAVQL_SET} davqltype_t;
 
 /**
+ * Enumeration of possible token classes.
+ */
+typedef enum {
+    DAVQL_TOKEN_INVALID, DAVQL_TOKEN_KEYWORD,
+    DAVQL_TOKEN_IDENTIFIER, DAVQL_TOKEN_FMTSPEC,
+    DAVQL_TOKEN_STRING, DAVQL_TOKEN_NUMBER, DAVQL_TOKEN_TIMESTAMP,
+    DAVQL_TOKEN_COMMA, DAVQL_TOKEN_OPENP, DAVQL_TOKEN_CLOSEP,
+    DAVQL_TOKEN_EQ, DAVQL_TOKEN_LT, DAVQL_TOKEN_GT, DAVQL_TOKEN_EXCLAIM,
+    DAVQL_TOKEN_OPERATOR
+} davqltokenclass_t;
+
+/**
  * Enumeration of possible expression types.
  */
 typedef enum {
-    DAVQL_UNDEFINED_TYP,
+    DAVQL_UNDEFINED_TYPE,
     DAVQL_NUMBER, DAVQL_STRING, DAVQL_TIMESTAMP, DAVQL_IDENTIFIER,
     DAVQL_UNARY, DAVQL_BINARY, DAVQL_LOGICAL, DAVQL_FUNCCALL
 } davqlexprtype_t;
@@ -63,6 +75,11 @@
     DAVQL_LIKE, DAVQL_UNLIKE // comparisons
 } davqloperator_t;
 
+typedef struct {
+    davqltokenclass_t tokenclass;
+    sstr_t value;
+} DavQLToken;
+
 /**
  * An expression within a DAVQL query.
  */
@@ -145,7 +162,7 @@
  * AddExpression = MultExpression, [AddOperator, AddExpression];
  * MultExpression = BitwiseExpression, [MultOperator, MultExpression];
  * BitwiseExpression = UnaryExpression, [BitwiseOperator, BitwiseExpression];
- * UnaryExpression = [UnaryOperator], (AtomicExpression | ParExpression);
+ * UnaryExpression = [UnaryOperator], (ParExpression | AtomicExpression);
  * AtomicExpression = FunctionCall | Identifier | Literal;
  * ParExpression = "(", Expression, ")";
  *
@@ -174,9 +191,9 @@
  * LogicalOperator = " and " | " or " | " xor ";
  * Comparison = | "=" | "<" | ">" | "<=" | ">=" | "!=";
  *
- * FieldExpressions = "*", {",", Expression, " as ", Identifier}
- *                  | FieldExpression, {",", FieldExpression}
- *                  | "-";
+ * FieldExpressions = "-"
+ *                  | "*", {",", Expression, " as ", Identifier}
+ *                  | FieldExpression, {",", FieldExpression};
  * FieldExpression = Identifier
  *                 | Expression, " as ", Identifier;
  * SetExpressions = SetExpression, {",", SetExpression};
@@ -266,54 +283,30 @@
 /** Depth needs to be specified at runtime. */
 #define DAV_DEPTH_PLACEHOLDER -2
 
-/** Invalid path. */
-#define DAVQL_ERROR_INVALID_PATH 1
-
-/** Expected an identifier, but found something else. */
-#define DAVQL_ERROR_IDENTIFIER_EXPECTED 10
-
-/** Expected an identifier or literal, but found something else. */
-#define DAVQL_ERROR_IDORLIT_EXPECTED 11
+/** Unexpected token. */
+#define DAVQL_ERROR_UNEXPECTED_TOKEN 1
 
-/** Expected an identifier or number, but found something else. */
-#define DAVQL_ERROR_IDORNUM_EXPECTED 12
-
-/** Expected an identifier or string, but found something else. */
-#define DAVQL_ERROR_IDORSTR_EXPECTED 13
+/** A token has been found, for which no token class is applicable. */
+#define DAVQL_ERROR_INVALID_TOKEN 2
 
-/** Expected an identifier or timestamp, but found something else. */
-#define DAVQL_ERROR_IDORTS_EXPECTED 14
-
-/** The with-clause contains an unknown attribute.
*/ -#define DAVQL_ERROR_UNKNOWN_ATTRIBUTE 20 +/** A token that has been expected was not found. */ +#define DAVQL_ERROR_MISSING_TOKEN 11 -/** Depth must be greater than zero or infinity. */ -#define DAVQL_ERROR_INVALID_DEPTH 21 - -/** The with-clause contains an attribute more than once. */ -#define DAVQL_ERROR_DUPLICATED_ATTRIBUTE 29 - -/** The format specifier is missing. */ -#define DAVQL_ERROR_MISSING_FMTSPEC 30 - -/** The format specifier is unknown. */ -#define DAVQL_ERROR_UNKNOWN_FMTSPEC 31 +/** An expression has been expected, but was not found. */ +#define DAVQL_ERROR_MISSING_EXPR 12 -/** The format specifier is invalid. */ -#define DAVQL_ERROR_INVALID_FMTSPEC 39 - -/** A quote symbol (' or `) is missing. */ -#define DAVQL_ERROR_MISSING_QUOTE 50 +/** An operator has been found for a unary expression, but it is invalid. */ +#define DAVQL_ERROR_INVALID_UNARY_OP 21 -/** No more tokens to parse, but the parser expected more. */ -#define DAVQL_ERROR_UNEXPECTED_END 100 - -/** A token was found, which has not been expected. */ -#define DAVQL_ERROR_UNEXPECTED_TOKEN 101 +/** The depth is invalid. */ +#define DAVQL_ERROR_INVALID_DEPTH 101 /** Nothing about the statement seems legit. */ #define DAVQL_ERROR_INVALID -1 +/** Unhandled error */ +#define DAVQL_ERROR_UNHANDLED -2 + /** * Starts an interactive debugger for a DavQLStatement. *