Archive for July, 2011
I have finally finished the backtracking removal. Writing semantic predicates is now much easier. With this change, I successfully supported here documents, which was the biggest blocker before I started this work. In addition, the performance is better. I used valgrind to compare performance. Here is the output of ms_print (the post-processing tool for Massif) before and after backtracking removal: before, after. Memory usage dropped by about 38% and the library runs about 20% faster.
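To show why here documents call for a semantic predicate, here is a minimal Python sketch. It is not libbash code: the function name `read_here_document` and the line-based input format are assumptions made for illustration, and it ignores quoting, `<<-` and expansion. The point is only that the end of the body depends on a word read earlier, which a purely syntactic rule cannot express.

```python
def read_here_document(lines):
    """Split a here document off the front of a list of script lines.

    lines[0] is assumed to look like 'cat <<EOF' (hypothetical, simplified
    input). Returns (command, body_lines, remaining_lines).
    """
    command, _, delimiter = lines[0].partition("<<")
    delimiter = delimiter.strip()
    if not delimiter:
        raise SyntaxError("missing here-document delimiter")

    body = []
    for index, line in enumerate(lines[1:], start=1):
        # Predicate-style check: whether this line ends the body depends on
        # a value remembered from earlier input, not on the token type alone.
        if line == delimiter:
            return command.strip(), body, lines[index + 1:]
        body.append(line)
    raise SyntaxError(f"here document not terminated by {delimiter!r}")


script = ["cat <<EOF", "hello", "world", "EOF", "echo done"]
print(read_here_document(script))
```

In a grammar, the comparison against the remembered delimiter is the part that would live in a semantic predicate.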
What I have done in the past week:
- Completed the work on the new parser and incorporated it into our project
- Improved parameter expansion parsing and its runtime
- Improved built-in and keyword test
- Improved the runtime for case statement
- Improved arithmetic expansion
- Improved the local built-in
- Fixed some minor problems in compound statement and parameter expansion
- Reimplemented the export built-in
- Removed several tokens to avoid conflicts
- Improved here document and here string
- Fixed single quoted string in command substitution
In the following week, I will:
- Get backtracking removal pushed
- Fix our instruo implementation (now it crashes with the new grammar)
- Improve process substitution
- Support redirection without any command
- Reimplement the local built-in
- Fix some minor problems in variable expansion and bash test
- Remove some composite tokens
I started working on this project early, and I will start looking for a job soon because I am about to graduate from my university. So Petteri and I agreed to end the GSoC on 08.06. As a result, this is the last iteration of this year’s GSoC. I’ll write one more regular report and a final report before the end. I’ll continue my work on this project as soon as I get a job :).
It’s been four months since I started working on this project. I just passed the mid-term evaluation so I’d like to talk about the current status of libbash.
My plan for the mid-term evaluation was to be able to parse the ebuilds that don’t inherit any eclass. With a few exceptions, I achieved that more than a month ago. If you read my weekly reports, you might have noticed that we could parse more than 7000 ebuilds at some point. That’s when I achieved the goal.
If you want to see what libbash can handle so far, there are several test scripts where you can find the answer.
But why are there some exceptions? The answer is the limitations of the parser grammar. The parser grammar was supposed to be working at the end of GSoC 2010. However, we found there were still many issues in the grammar while implementing the runtime of libbash. The parser grammar is quite important and I can’t work on the runtime if the parser grammar does not work properly.
Then what are you doing these days? I spent most of the time fixing the problems in the parser grammar. In my GSoC application, I said that if I had enough time, I would try to disable global backtracking for the parser grammar. But now this seems to be required. I started working on it last week and it is going pretty well so far.
Some people doubt whether I can completely disable backtracking. I’m not a compiler expert and I don’t have the confidence to say whether bash is LL(k). As far as I can see, I can handle most of the syntax with an LL(k) parser. But what I am really trying to do is to remove the global backtracking option of the ANTLR grammar. With that option, ANTLR will backtrack automatically, so there might be many places where backtracking could have been avoided. It also makes the problems in the parser grammar even more complicated. It’s highly recommended not to use that option in production code. As a result, I’m trying to use left factoring, syntactic predicates and the local backtracking option instead of the global one. So far the parser grammar is faster and it’s much easier to fix the problems in it. I’ll give the performance comparison in the next weekly report.
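To illustrate the difference between trying whole alternatives with backtracking and left factoring them, here is a minimal Python sketch. It is not libbash code and not what ANTLR generates: the toy grammar (`statement : NAME '(' ')' '{' '}' | NAME NAME*`) and all function names are made up for this example.

```python
FUNCTION_TAIL = ["(", ")", "{", "}"]


def is_name(token):
    # Toy definition of NAME: anything that looks like an identifier.
    return token.isidentifier()


def parse_with_backtracking(tokens):
    """Try whole alternatives in order and rewind to the start on failure."""
    rewinds = 0

    # Alternative 1: NAME '(' ')' '{' '}'
    if len(tokens) == 5 and is_name(tokens[0]) and tokens[1:] == FUNCTION_TAIL:
        return "function", rewinds
    rewinds += 1  # alternative 1 failed: go back to token 0 and try again

    # Alternative 2: NAME NAME*
    if tokens and all(is_name(t) for t in tokens):
        return "command", rewinds
    raise SyntaxError(f"cannot parse {tokens!r}")


def parse_left_factored(tokens):
    """Consume the shared NAME once, then decide with one token of lookahead."""
    if not tokens or not is_name(tokens[0]):
        raise SyntaxError(f"expected a name in {tokens!r}")
    rest = tokens[1:]

    # Left-factored form: statement : NAME ( '(' ')' '{' '}' | NAME* )
    if rest[:1] == ["("]:
        if rest == FUNCTION_TAIL:
            return "function"
        raise SyntaxError(f"malformed function definition in {tokens!r}")
    if all(is_name(t) for t in rest):
        return "command"
    raise SyntaxError(f"cannot parse {tokens!r}")


print(parse_with_backtracking(["echo", "hello"]))
print(parse_left_factored(["echo", "hello"]))
print(parse_left_factored(["foo", "(", ")", "{", "}"]))
```

In the first parser every simple command pays for a failed attempt at the function alternative. In the left-factored one the shared prefix is read once and the error for a malformed function points at the right alternative, which is roughly why problems become easier to track down.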
What I have done in the last week:
- worked on a new parser grammar without the global backtracking option
Plan for this week:
- finish the work with the new parser grammar and improve unit tests
- incorporate the new parser grammar
- fully support here document
- improve the rules for parameter expansion
- improve the rules for built-in and keyword test
- fix the problem with \newline
This week I focused on improving the parser grammar. Petteri and I finally decided to remove the global backtracking. We didn’t plan to do it at first because it would cost a lot of time, and we thought it was not required for metadata generation. However, many stories are blocked by the limitations of the parser grammar. We have to remove backtracking so that semantic predicates can work in the grammar. Although doing it is quite difficult and will cost a lot of time, we will get a cleaner and much faster grammar when it’s done. It is something that we would have to do sooner or later anyway.
This is what I have done in the past week:
- Used bash to verify unit tests
- Supported thread safety
- Turned off backtracking for the command rules
- Left factored the rules for pipeline and function definition
- Improved double quoted string handling
- Worked on upgrading to ANTLR 3.4
- Started working on removing global backtracking
Note that we support thread safety now and our version of instruo can finish its work in less than 1 minute. We still haven’t upgraded to ANTLR 3.4 because of some upstream bugs. The process might take longer.
In the next few weeks, I will mainly focus on one story: removing global backtracking. I hope it can be done in two weeks.
This week we finally have our CI server working. Many thanks to robbat2 and other people who helped. If your GSoC project might benefit from it (buildbot), please go ahead and ask for it. The number of ebuilds that we can handle hasn’t changed as we still have the problem in our parser grammar. Here’s what I have done in the last week:
- Finished setting up the CI server
- Broke down the walker grammar to reduce compile time
- Unified header protection coding style
- Cleaned up doxygen warnings
Two stories are blocked:
- Upgrade to ANTLR 3.4
I fixed some problems in the libraries and reported some bugs upstream. I’m still blocked by some upstream bugs. This story is quite important because many stories need semantic predicates. We hope ANTLR 3.4 will support them when backtracking is turned on.
- Make the library thread safe
There are still some problems with the thread safe code. I will continue working on it this week.
So this week I will continue working on the blocked stories. We will decide what to do next after upgrading to ANTLR 3.4.