libbash weekly report #9 (mid-term summary)

It’s been four months since I started working on this project. I just passed the mid-term evaluation so I’d like to talk about the current status of libbash.

My plan for the mid-term evaluation was to be able to parse the ebuilds that don’t inherit any eclass. With a few exceptions, I achieved that more than a month ago. If you read my weekly reports, you might have noticed that at some point we could parse more than 7000 ebuilds. That’s when I reached the goal.

If you want to see what libbash can handle so far, there are several test scripts where you can find the answer.

But why are there some exceptions? The answer lies in the limitations of the parser grammar. The parser grammar was supposed to be working by the end of GSoC 2010. However, while implementing the libbash runtime, we found that the grammar still had many issues. The parser grammar is critical: I can’t work on the runtime if the grammar does not work properly.

Then what are you doing these days? I have spent most of my time fixing problems in the parser grammar. In my GSoC application, I said that if I had enough time, I would try to disable global backtracking for the parser grammar. Now this seems to be required. I started working on it last week and it is going pretty well so far.

Some people doubt whether I can completely disable backtracking. I’m not a compiler expert and I can’t say with confidence whether bash is LL(k). As far as I can see, I can handle most of the syntax with an LL(k) parser. But what I am really trying to do is remove the global backtracking option from the ANTLR grammar. With that option, ANTLR backtracks automatically, so backtracking can happen in many places where it could have been avoided. It also makes problems in the parser grammar harder to track down. Using that option in production code is strongly discouraged. So I’m trying to use left factoring, syntactic predicates and the local backtracking option instead of the global one. So far the parser grammar is faster and its problems are much easier to fix. I’ll give a performance comparison in the next weekly report.

What I have done in the last week:

  • worked on a new parser grammar without the global backtracking option

Plan for this week:

  • finish the work with the new parser grammar and improve unit tests
  • incorporate the new parser grammar
  • fully support here document
  • improve the rules for parameter expansion
  • improve the rules for built-in and keyword test
  • fix the problem with \newline


libbash weekly report #8

This week I focused on improving the parser grammar. Petteri and I finally decided to remove global backtracking. We didn’t plan to do it at first because it would take a lot of time, and we thought it was not required for metadata generation. However, many stories are blocked by the limitations of the parser grammar. We have to remove backtracking so that semantic predicates can work in the grammar. Although this is quite difficult and will take a lot of time, we will get a cleaner and much faster grammar when it’s done. It is something we would have to do sooner or later anyway.

Here is what I have done in the past week:

  • Used bash to verify unit tests
  • Supported thread safety
  • Turned off backtracking for the command rules
  • Left factored the rules for pipeline and function definition
  • Improved double quoted string handling
  • Worked on upgrading to ANTLR 3.4
  • Started working on removing global backtracking

Note that we support thread safety now and our version of instruo can finish its work in less than 1 minute. We still haven’t upgraded to ANTLR 3.4 because of some upstream bugs, so that upgrade might take longer.

In the next few weeks, I will mainly focus on one story: removing global backtracking. I hope it can be done in two weeks.


libbash weekly report #7

This week we finally have our CI server working. Many thanks to robbat2 and the other people who helped. If your GSoC project might benefit from it (buildbot), please go ahead and ask for it. The number of ebuilds that we can handle hasn’t changed, as we still have problems in our parser grammar. Here’s what I have done in the last week:

  • Finished setting up the CI server
  • Broke down the walker grammar to reduce compile time
  • Unified header protection coding style
  • Cleaned up doxygen warnings

Two stories are blocked:

  • Upgrade to ANTLR 3.4

I fixed some problems in the libraries and reported some bugs upstream, but I’m still blocked by some upstream bugs. This story is quite important because many stories need semantic predicates. We hope ANTLR 3.4 supports them when backtracking is turned on.

  • Make the library thread safe

There are still some problems with the thread-safe code. I will continue working on it this week.

So this week I will continue working on the blocked stories. We will decide what to do next after upgrading to ANTLR 3.4.



libbash weekly report #6

Last week I tried to use semantic predicates in our parser grammar. We hoped that they could solve most of the problems we were facing. Unfortunately, it didn’t end well, as we cannot make semantic predicates work with backtracking. This week I will migrate to ANTLR 3.4 and see if it helps solve the problem. Here is what I have done in the last week:

  • Supported braces in command arguments
  • Improved comment handling
  • Supported ANSI C Quoting
  • Supported short-circuit evaluation of && and || in arithmetic expansion
  • Supported arithmetic expression
  • Supported break built-in
  • Improved our build system to reduce dependencies
  • Made arithmetic expansion follow POSIX
  • Improved exception hierarchy
  • Implemented shift built-in
  • Improved the ast_printer utility
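
A few of the constructs from this list, as they look in plain bash. This is only a small sketch for illustration: the function name first_flag and all variable names are made up here and are not taken from the libbash test scripts.

```bash
#!/bin/bash
# ANSI-C quoting: \t inside $'...' becomes a real tab character.
tabbed=$'a\tb'

# Short-circuit evaluation in arithmetic expansion: the right-hand side
# is not evaluated, so x keeps its old value in both cases.
x=0
and_result=$(( 0 && (x = 5) ))
or_result=$(( 1 || (x = 7) ))

# shift and break: walk the positional parameters until the first
# argument that starts with a dash (hypothetical helper).
first_flag() {
    found=
    while [ $# -gt 0 ]; do
        case $1 in
            -*) found=$1; break ;;
        esac
        shift
    done
    remaining=$#
}
first_flag a b -c d

echo "and=$and_result or=$or_result x=$x found=$found remaining=$remaining"
```

Here x stays 0 because neither assignment runs, and the loop stops at -c with two parameters left.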

A few stories are blocked due to the backtracking problem.

This week I will:

  • Upgrade to ANTLR 3.4 and see if it helps solve the backtracking problem
  • Fix errors in the CI Server
  • Break down walker grammar to reduce compile time
  • Use bash to verify test scripts
  • Support alias
  • Handle options to the local built-in
  • Upgrade to Paludis 0.64.1
  • Support thread-safety
  • Clean up warnings from doxygen


libbash weekly report #5

In the last week, I focused on parser grammar improvement. So far we can generate correct metadata for 8028 ebuilds. As we have made error handling POSIX compliant, any parsing failure will cause an exception. So making the parser work properly is the first thing that should be done. To be honest, fixing bugs in the parser is not easy. The logic there is already quite complicated and backtracking makes it worse. But I need to get through it. Here is what I have done in the last week:

  • Supported bash redirection for all kinds of commands
  • Supported the special parameter $-
  • Supported parsing -o and -a operators for built-in test
  • Supported brace expansion
  • Implemented eclass parse failure cache
  • Supported backslash escapes inside double quotes
  • Fixed variable indirection in arithmetic expressions
  • Supported regex match operator for keyword test
  • Tried to parse here document and improve variable expansion
  • Improved CI server configuration
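
For readers who are less familiar with these features, here is a short bash sketch of some of the constructs named in this list. The variable names and sample strings are made up for illustration and are not from the libbash test suite.

```bash
#!/bin/bash
# The special parameter $- holds the current shell option flags.
flags=$-

# Brace expansion generates one word per alternative.
braces=$(echo pre{a,b,c}post)

# Backslash escapes inside double quotes: \" and \$ become literal.
escaped="a \"quoted\" \$word"

# The -a (and) and -o (or) operators of the test built-in.
if [ -n "$braces" -a -n "$escaped" ]; then both=yes; else both=no; fi
if [ -z "$braces" -o -n "$escaped" ]; then either=yes; else either=no; fi

# The regex match operator of the [[ keyword; captured groups land in
# the BASH_REMATCH array.
re='^([a-z]+)-([0-9.]+)$'
if [[ "libbash-0.1" =~ $re ]]; then
    name=${BASH_REMATCH[1]}
    version=${BASH_REMATCH[2]}
fi

echo "flags=$flags braces=$braces both=$both either=$either"
echo "escaped=$escaped name=$name version=$version"
```

Keeping the regular expression in a variable avoids quoting surprises on the right-hand side of =~.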

This week I will:

  • Support braces in command arguments
  • Improve comment handling
  • Handle single quoted strings of the form $'string' (ANSI C quoting)
  • Support short-circuit evaluation of && and || in arithmetic expressions
  • Support arithmetic expression
  • Support break built-in
  • Support readonly built-in
  • Improve our build system to reduce dependencies
  • Make arithmetic expansion follow POSIX
  • Improve exception hierarchy
  • Implement shift built-in
  • Try boost::spirit::qi to implement a simple lexer for ANTLR

At the end of this weekly report, I’d like to mention a small tip for bash arithmetic. As you probably know, you can’t write $(( expression )) as a bash command on its own, because the shell would try to run the result as a command. But sometimes you just need to evaluate the expression and nothing else. You can certainly call the let built-in, but that requires you to quote your expression to avoid word splitting. So some people came up with this:

: $(( expression ))

‘:’ is a bash built-in that does nothing. So the argument gets evaluated first and then ‘:’ gets called. I find this unnecessary, as you can always use a bash arithmetic expression:

(( expression ))

This is exactly equivalent to

let "expression"
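
To make the comparison concrete, here is a small sketch that runs all three forms on the same counter. The variable names are only for illustration.

```bash
#!/bin/bash
count=0

: $(( count += 1 ))   # ':' does nothing; the expansion does the assignment
(( count += 1 ))      # arithmetic expression as a command, no quoting needed
let "count += 1"      # let needs the quotes to keep the expression one word

# (( )) and let follow the same exit status rule: the status is 1 when
# the expression evaluates to 0, and 0 otherwise.
status_paren=0
(( 0 )) || status_paren=$?
status_let=0
let "0" || status_let=$?

echo "count=$count status_paren=$status_paren status_let=$status_let"
# prints: count=3 status_paren=1 status_let=1
```

One difference worth keeping in mind: ‘:’ always returns 0, while (( expression )) and let return 1 when the result is 0, which matters in scripts that run under set -e.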


libbash weekly report #4

This week I’ve made an important change to the implementation of bash functions. So far we can generate correct metadata for 7927 ebuilds. Here’s what I have done:

  • supported not equals in arithmetic expansion
  • supported array offset expansion like ${*:1:2} and ${a[@]:1:2}
  • improved the implementation for bash functions
  • supported shopt -p
  • supported declare -p
  • supported printf
  • supported $#
  • fixed case statement with empty body
  • removed unnecessary abstractions in the implementation of arithmetic expansion
  • filed new stories from the output of instruo
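
As a quick illustration of some expansions from this list, here is a minimal bash sketch. The function name demo and the sample values are hypothetical and only serve the example.

```bash
#!/bin/bash
demo() {
    nargs=$#                       # number of positional parameters
    slice="${*:1:2}"               # two positional parameters starting at $1
    local a=(zero one two three)
    arr_slice="${a[@]:1:2}"        # two array elements starting at index 1
    not_equal=$(( nargs != 2 ))    # not equals in arithmetic expansion
}
demo x y z

printf 'nargs=%s slice=%s arr_slice=%s not_equal=%s\n' \
    "$nargs" "$slice" "$arr_slice" "$not_equal"
# prints: nargs=3 slice=x y arr_slice=one two not_equal=1
```

Note that positional parameters count from 1 while array indices count from 0, so ${*:1:2} starts at the first argument but ${a[@]:1:2} skips the first element.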

In the coming week, I’ll:

  • support bash redirections for all kinds of commands
  • support $-
  • fix problems with built-in test
  • support brace expansion
  • cache parsing failures
  • support indirect reference in arithmetic expansion
  • support escaped characters in double quoted string
  • fix bugs in arithmetic expansion
  • continue working on the story for the CI server

So far, we support most of the language features that are needed for metadata generation. But there are still a lot of small problems that we have to handle. The most difficult part is improving the parser grammar. Even a small change to the grammar can break a lot of things because the grammar is already complicated. So our progress may slow down when I have to deal with parser grammar improvements.

Hope we can get more stuff fixed in the next couple of weeks.
