The Nightstar Zoo

Nightstar IRC Network - irc.nightstar.net
It is currently Wed Sep 20, 2017 6:13 am

All times are UTC - 6 hours [ DST ]




Post new topic Reply to topic  [ 21 posts ] 
Author Message
 Post subject: coroutines/protothreads
PostPosted: Mon Oct 10, 2005 9:40 pm 
I'm sure many of you caught that protothreads thing on /. the other day. The article links to libraries with macros and things to support it, but the gist of it is a stateful function:

Code:
int whatever (int count) {
    static int state = 0;
    static int i;

    switch (state) {
    case 0:
        state = 1;

        for (i = 0; i < count; i++) {
            return doSomething(i);
    case 1:
        }
        state = 2
    case 2:
        return -1;
    }
}


Now, this is obviously horrible style by C standards, and it's prettier when you've got all the macros and so forth working for you but I don't want to try to dress it up if we're going to have a frank discussion about whether or not it's acceptable.

Generators in Python are very powerful and to the programmer they don't look very different from this. OO gives us a place to put state but we still have to unwind the function to get the flow of control right, and that can be tricky or annoying. Threads and processes allow us to ignore flow-of-control issues but even thread synchronization has a lot of overhead compared to this (both computational and mental).

Now, Chalain made it clear he doesn't think optimization of this kind should be attempted early on even though he concedes that it may be necessary later, but that's optimizations for performance reasons. But I think that something like this, or a pretty variation could be a good idea for ease of development and style reasons alone.

I probably wouldn't think this if I hadn't had so much luck with generators in Python. Threads are a messy business. They're a really easy way to introduce bugs, and they take a non-trivial amount of programming to set up even in languages that make them easy like Java. Something like this has low CPU and mental overhead, which is nice, but it's also got low mental overhead. If you can make a function that will spew your values sequentially, you can make a function that will return them one at a time on demand with a few fairly trivial changes and no thread management issues.

Also, there's nothing there that can't be done in Java or C++. It might even look clean from the outside. In C you'd want to be passing a pointer to a struct, in Java or C++ you could probably have something like this associated with an object where it can keep its state, and it would look like you were just calling obj.getNext() or something which most people could understand.

So... what do you think? I don't really care about the performance, but I want generators. I just got finished telling off C++ programmers for wanting every other language to be like C++, but the whole point of learning lots of languages for me was to be able to bring things from one to the other. If knowing C makes me a better Java programmer, can knowing Python make me a better C programmer?


Top
  
 
 Post subject:
PostPosted: Tue Oct 11, 2005 12:49 pm 
One word. DynamicC
http://www.zworld.com/documentation/doc ... htm#951713


Top
  
 
 Post subject:
PostPosted: Tue Oct 11, 2005 2:44 pm 
One question you answered yourself: If you require a thread, use a thread, rather than trying to skirt it, adding complexity.

The given code doesn't do much (it'll doSomething(0) or return -1, assuming the misplaced brace is moved to where it ought to be). This is what I assume your above code is supposed to do (added a reset flag for giggles):
Code:
int func (int count, BOOL reset = FALSE) {
    static int i = 0;
    if (TRUE == reset) i = 0;
    if (i < count) return doSomething(i);
    return -1;
}

This is certainly doable, but in general using statics is fairly evil unless you absolutely need to do so. I can't ever remember running into a case where I needed to do something like this that couldn't be done more simply.

That's my CAN $0.025.


Top
  
 
 Post subject:
PostPosted: Tue Oct 11, 2005 4:00 pm 
Raif wrote:
One question you answered yourself: If you require a thread, use a thread, rather than trying to skirt it, adding complexity.


Well... have you used Python? My goal is something like generators.

Code:
def mygen ():
  for i in xrange(0, 10):
    yield i

g = mygen()
print g.next()
->  0
print g.next()
-> 1


Basically, instead of a return statement, you have a yield statement which causes the function to return flow of control to where it was. In Python there's an added advantage because calling functions is expensive, and this allows a stack frame to be recycled. I generally don't worry about the performance (or I wouldn't use Python...), but I've noticed that this makes many algorithms easier to implement. This is trivial to set up, easier than even the simplest case of threads.

An example of where this is preferable to threads is a lexer I wrote for python (all the ones I found online either used globals or a really ugly hack (doctstrings shouldn't be used to contain executable code and it makes some of the nicer extensions of the GNU lex impossible to do it that way)). It could have been done with threads, but the only difference would be additional mental overhead to set up all the threading stuff.

Raif wrote:
This is certainly doable, but in general using statics is fairly evil unless you absolutely need to do so. I can't ever remember running into a case where I needed to do something like this that couldn't be done more simply.


I hadn't either until I learned Python...

This is a better example, and I know it's easier to do in other ways but it's just an example for posting on a board... Now that I've got macros set up to make it reasonably easy to do, it's easier to set up than a producer/consumer thing with threads. It's well supported in Python, so there's no reason to take on the additional burden. I say if it can be made easy in C, then it should be used in C as well.

cgen.h:
Code:
#define cgen_begin(state) switch (state) { case 0:
#define cgen_yield(state, ret) { state = __LINE__; return(ret); \
 case __LINE__: {} }
#define cgen_end(state) state = -1; case -1: {} }

#define CGEN_START 0
#define CGEN_DONE -1


gentest.c:
Code:
#include <stdio.h>
#include <stdlib.h>

#include "cgen.h"

#define DONE -1

typedef struct genstate {
  int state, i;
} genstate_t;

void gen_init (genstate_t *s) {
  s->state = CGEN_START;
  s->i = 0;
}

int gen (genstate_t *s) {
  cgen_begin(s->state);

  for (s->i = 0; s->i < 10; s->i++) {
    cgen_yield(s->state, s->i);
  }

  cgen_end(s->state);

  return DONE;
}

int main () {
  int ret;
  genstate_t s;

  gen_init(&s);

  for (ret = gen(&s); ret != DONE; ret = gen(&s)) {
    printf("%i\n", ret);
  }

  exit(EXIT_SUCCESS);
}


compiles with -Wall -Werror :)


Top
  
 
 Post subject:
PostPosted: Tue Oct 11, 2005 4:32 pm 
First, I was not aware you could nest a case statement in a different scope like that. That said, the solution is fairly clever, but what I'd worry about is debuggability, and especially maintenance later down the road...

The code looks readable, and actually makes sense. What about when something goes haywire in that loop? What if you want to change what the loop does in a year? Can you be confident other programmers will understand the generator function idiom after you've left the project?

If not, take the safe road and avoid it. The way you've written it tells me it will never be asynchronous (do one iteration per function call and return). With that in mind, you need neither generator functions nor threading, and I'm still hard-pressed to think of any case where they'd be useful enough to justify the added complexity in C.

The big idea here is that if the language was designed with something like this in mind, then its implied that you're using an understood idiom. If not, it may not be so safe to make that assumption.


Top
  
 
 Post subject:
PostPosted: Tue Oct 11, 2005 5:04 pm 
Raif wrote:
The given code doesn't do much (it'll doSomething(0) or return -1, assuming the misplaced brace is moved to where it ought to be). This is what I assume your above code is supposed to do (added a reset flag for giggles):
Code:
int func (int count, BOOL reset = FALSE) {
    static int i = 0;
    if (TRUE == reset) i = 0;

    /* Added below line to fix the loop -- Pi */
    else i++;

    if (i < count) return doSomething(i);
    return -1;
}


I don't think that the brace was "misplaced". The goal of anthonyr's code is that, when you enter the function the second, third, ..., count times, you immediately continue the for() loop where you left off. It is an interesting effect of the implementation of a switch() statement (and goto's in general) that you can jump into and out of loop control structures. Figuring out what the weirldly-interlaced switch and for structures are doing is pretty straightforward once you've got a good idea of what assembly the code compiles to...

... which is probably why I hate the original code so much. It is for exactly the same reason why entry-level C students are shown the 'goto' statement and then told never to use it. Programmers are warned that a goto is useful for short-circuiting loops, and even then sparingly. A 'switch' statement really is nothing more than a variable goto without the stigma attached. The code anthonyr posted would actually be *cleaner* with gotos because readers of the code would not then enter it with the assumption that switch is supposed to have strict nesting like a loop control structure.

Raif's rewrite (corrected above to include the counter increment) is a pretty good way to implement the same thing with an order of magnitude more readability and almost no performance degradation. Unless you're writing for the obfuscated C contest, I can't think of a good reason to inflict this code on anyone.

Raif wrote:
One question you answered yourself: If you require a thread, use a thread, rather than trying to skirt it, adding complexity.


Well said.


Top
  
 
 Post subject:
PostPosted: Tue Oct 11, 2005 5:18 pm 
Raif wrote:
First, I was not aware you could nest a case statement in a different scope like that.


You've never met Duff's Device:
Code:
switch (count % 8)
{
   case 0:        do {  *to = *from++;
   case 7:              *to = *from++;
   case 6:              *to = *from++;
   case 5:              *to = *from++;
   case 4:              *to = *from++;
   case 3:              *to = *from++;
   case 2:              *to = *from++;
   case 1:              *to = *from++;
                     } while (--n > 0);
}
(please note that I'm not advocating its use, this is simply for informational purposes)

Raif wrote:
What about when something goes haywire in that loop?


There's two things that help with debugging... Because you can't use locals you need to use something like what I've done, and having a struct with all the variables available from outside tells you where it is and what it's doing (the way I've done it gives you the line number...). Also, the loop can be designed in isolation and then converted to be a generator after the fact. In principal, you pretty much just need to replace every enqueue operation with a yield and all locals with members of structs.

Raif wrote:
Can you be confident other programmers will understand the generator function idiom after you've left the project?


Maintenance is basically my only concern.

Raif wrote:
If not, take the safe road and avoid it.


Oh I wouldn't put this in production code. This will sit on a shelf in my mind until a situation comes up where it's orders of mangnitude better than any of the alternatives.

Raif wrote:
The way you've written it tells me it will never be asynchronous (do one iteration per function call and return). With that in mind, you need neither generator functions nor threading, and I'm still hard-pressed to think of any case where they'd be useful enough to justify the added complexity in C.


Clearly this wouldn't be that useful for trivial cases like the one I've written, but there are situations where one iteration relies on the previous one(s) and this is basically one of the possible ways of storing that state, another being a thread or process asynchronously taking care of it and another being the construction of a state machine so multiple independant invocations can work properly.

Raif wrote:
The big idea here is that if the language was designed with something like this in mind, then its implied that you're using an understood idiom. If not, it may not be so safe to make that assumption.


I agree, but there are times when something sufficiently powerful justifies expanding the knowledge that's expected of people. Threads are a perfect example of this.


Last edited by anthonyr on Tue Oct 11, 2005 5:50 pm, edited 1 time in total.

Top
  
 
 Post subject:
PostPosted: Tue Oct 11, 2005 5:49 pm 
Pi wrote:
Raif's rewrite (corrected above to include the counter increment) is a pretty good way to implement the same thing with an order of magnitude more readability and almost no performance degradation. Unless you're writing for the obfuscated C contest, I can't think of a good reason to inflict this code on anyone.


See the second version I posted afterwards.

The primary motivation is to avoid building a complex state machine (assuming it's doing something non-trivial), and to avoid needing threads.

A lesson I've learned from Python is that while it can't really do much that other languages can't do, it's often so inconvenient to do things in other languages that they never end up happening. That's its power, and generators are a good chunk of that. Bringing them to C is a big win if it can be done right.


Top
  
 
 Post subject:
PostPosted: Tue Oct 11, 2005 6:07 pm 
Pi wrote:
... which is probably why I hate the original code so much. It is for exactly the same reason why entry-level C students are shown the 'goto' statement and then told never to use it.

I had this exact same thought once I understood what the macros were doing.


Top
  
 
PostPosted: Tue Oct 11, 2005 7:47 pm 
It occured to me that an almost identical interface can be built using pthreads. I'll probably end up writing that because if it's pthreads-based I'll actually be able to use it.

It's the interface I care about. I guess using queues to communicate between threads was burned into my brain so strongly that it didn't occur to me I could do something else with them. :)


Top
  
 
PostPosted: Tue Oct 11, 2005 10:51 pm 
anthonyr wrote:
I'm sure many of you caught that protothreads thing on /. the other day. The article links to libraries with macros and things to support it, but the gist of it is a stateful function:

Code:
int whatever (int count) {
    static int state = 0;
    static int i;

    switch (state) {
    case 0:
        state = 1;

        for (i = 0; i < count; i++) {
            return doSomething(i);
    case 1:
        }
        state = 2
    case 2:
        return -1;
    }
}


Now, this is obviously horrible style by C standards, and it's prettier when you've got all the macros and so forth working for you but I don't want to try to dress it up if we're going to have a frank discussion about whether or not it's acceptable.


Wait, so this is supposed to run in a thread? Because that code is clearly not threadsafe.


Top
  
 
PostPosted: Tue Oct 11, 2005 11:07 pm 
gwalla wrote:
Wait, so this is supposed to run in a thread? Because that code is clearly not threadsafe.


No, it's supposed to be an ultra-light-weight thread replacement for embedded systems that I thought could be used to get features that C doesn't have.


Top
  
 
PostPosted: Thu Oct 13, 2005 2:13 pm 
I've given it some thought and I still don't buy some of the objections.

-Goto is evil

People are advocating the use of a thread instead, but I am not convinced a thread is a substantially different situation.

A userspace threading implementation preserves state of each thread on a stack. My example preserves it in a struct. A userspace threading implementation hides the ugliness (and I've looked at them, they're hideous) of the implementation behind a relatively easy to use interface. My example does the same.

If it's okay for userspace threads to do something messy where we can't see it, how is it not okay for my example to do something messy where we can't see it? If I were to make an identical interface that uses threads in the implementation instead of a switch statement, would it be okay even though it does the same thing? If not, then how are userspace threads okay when they do the same thing as kernel threads but in a messier way?

-Debugability

The struct that we hang onto contains the entire state state of the generator, including a line number of where it is. It would even be possible to record the state of the generator in the event of a problem and play back the case that went wrong.

-Maintainability

I accept that this prevents this idea from being used in production code, but I don't think this should prevent the idea from being developed and possibly used in the future if people eventually get used to it. This happened for threads.


Top
  
 
PostPosted: Thu Oct 13, 2005 2:57 pm 
anthonyr wrote:
-Maintainability
I accept that this prevents this idea from being used in production code, but I don't think this should prevent the idea from being developed and possibly used in the future if people eventually get used to it. This happened for threads.


All code is production code. If you don't believe that, you haven't been programming long enough. Maintainability is quite possibly the most critical thing with which you can infuse your code. As Chalain has tried to drive home over and over, it is better to have buggy slow code which is modular, readable, and maintainable, than it is to have perfect spaghetti code. The former can be fixed. The latter must be scrapped and restarted from scratch when (when, not if) the requirements change.

BTW, I've heard the argument that code written for a college assignment is truly throwaway, and I even used to believe it (Experience tells me differently). However, even if there is no conceivable way that you could ever look at this program again, do you really want to train yourself, while you're building your career skills, into a practice that is guaranteed to limit your usefulness to an employer?

All code is production code. Even if you don't believe that, behave as if it's true.


Top
  
 
PostPosted: Thu Oct 13, 2005 3:33 pm 
Pi wrote:
All code is production code.


I think it's fair to make an exception for code written with the explicit intent of experimenting with something like this and for no other purpose. To claim otherwise is to claim that one should never undertake experimentation.

Pi wrote:
BTW, I've heard the argument that code written for a college assignment is truly throwaway, and I even used to believe it (Experience tells me differently).


I agree with this.

Pi wrote:
However, even if there is no conceivable way that you could ever look at this program again, do you really want to train yourself, while you're building your career skills, into a practice that is guaranteed to limit your usefulness to an employer?


meh

I don't think it's fair to require that every line of code that I write for any purpose be done in a particular way. I'm capable of discarding an idea if it proves to be untenable.

Experimentation is a good thing if you can take something back that makes sense and discard what doesn't. I make it a point to write well written code even if it won't leave my hard drive, but code written with the intent of experimenting has to be different sometimes. I wouldn't put Lisp or Haskell in production either, but does that imply that I can't learn those languages and experiment with them? I benefit from knowing Haskell, it gave me a much better grasp on recursive algorithms even though I write them in C/Java/whatever.

Also... some things are simply fun to know. I don't have an obligation to remain ignorant of things that don't directly impact the code I write for work. For example, I think it's fun to know how userspace threads work even though in practice they simply follow POSIX and that's all I strictly need to know for work.


Last edited by anthonyr on Thu Oct 13, 2005 3:55 pm, edited 1 time in total.

Top
  
 
PostPosted: Thu Oct 13, 2005 3:53 pm 
anthonyr wrote:
A userspace threading implementation hides the ugliness (and I've looked at them, they're hideous) of the implementation behind a relatively easy to use interface.

This brings me to question what kind of threading code you've seen. The last threading code I dealt with (and will be dealing with again soon) was very carefully thought out, works well, and is quite clear.

Quote:
If I were to make an identical interface that uses threads in the implementation instead of a switch statement, would it be okay even though it does the same thing?

If it does the same thing it's not asynchronous, and is thus a poor application of threading.


Top
  
 
PostPosted: Thu Oct 13, 2005 3:59 pm 
Raif wrote:
This brings me to question what kind of threading code you've seen. The last threading code I dealt with (and will be dealing with again soon) was very carefully thought out, works well, and is quite clear.


I'm specifically thinking of the OpenBSD userspace threading library.

Quote:
If it does the same thing it's not asynchronous, and is thus a poor application of threading.


The cgen_yield() macro could enqueue a result just as easily as returning it.


Top
  
 
PostPosted: Thu Oct 13, 2005 4:23 pm 
anthonyr wrote:
The cgen_yield() macro could enqueue a result just as easily as returning it.

Doesn't really matter, at its heart it's still synchronous.


Top
  
 
PostPosted: Thu Oct 13, 2005 4:34 pm 
Raif wrote:
anthonyr wrote:
The cgen_yield() macro could enqueue a result just as easily as returning it.

Doesn't really matter, at its heart it's still synchronous.

Just to play devil's advocate here... On a uniprocessor system, everything, at its heart, is synchronous. :)

Multithreading in a uniprocessor system is just a (very) convenient abstraction. The difference between that and what anthonyr is dealing with is who's doing the thread switching. In most cases, it serves everyone's purpose just fine to understand that the kernel is magically serializing the execution of all of those threads, and not have to worry about the massive interrupt-driven state machine that is the scheduler.

Outside of trying to emulate (or rewrite) the kernel scheduler, I still don't understand where anything like what anthonyr suggests would be useful.


Top
  
 
PostPosted: Thu Oct 13, 2005 4:51 pm 
Pi wrote:
Multithreading in a uniprocessor system is just a (very) convenient abstraction. The difference between that and what anthonyr is dealing with is who's doing the thread switching.

True, and I was waiting for someone to bring it up.

One key difference in my eyes is that there are no interrupts, as one might encounter in real threading, and so it becomes a case where every execution of the program, the same portions of code "yield" at the same times, which comes down to a case where you can just split them out of the pseudo-threading and drop the entire idiom all together, resulting in a cleaner implementation without macro magic. :P

Where this isn't true, and you need/want interrupts, you've crossed over into the land where a real thread is what you need.

Quote:
Outside of trying to emulate (or rewrite) the kernel scheduler, I still don't understand where anything like what anthonyr suggests would be useful.

Exactly.

Edit: On discussing this with Pi a rephrasing might be more illuminating... If you can do away with interrupts, your system is inherently closed, and thus deterministic. With interrupts (*true* interrupts), that's not true, and that's when your system becomes asynchronous.


Top
  
 
PostPosted: Thu Oct 13, 2005 5:08 pm 
Raif wrote:
Doesn't really matter, at its heart it's still synchronous.


At heart it's indifferent. The generator abstraction is a powerful one, and "yield" doesn't just have the one meaning.

EDIT: There are obviously different implications if it's implemented with threads, but it shouldn't rely on external state either way.

Pi wrote:
Multithreading in a uniprocessor system is just a (very) convenient abstraction.


Yes. IMO the primary benefit of threading is getting a sane way to do non-blocking I/O.

Pi wrote:
Outside of trying to emulate (or rewrite) the kernel scheduler, I still don't understand where anything like what anthonyr suggests would be useful.


It's a lightweight way to get a producer thread (or rather thread-like-thing). It's also useful for doing lazy evaluation on computationally intensive problems, and it's much easier to write and modify than a state machine if the algorithm is complex.


Top
  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 21 posts ] 

All times are UTC - 6 hours [ DST ]


Who is online

Users browsing this forum: Yahoo [Bot] and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum

Search for:
Jump to:  
Powered by phpBB® Forum Software © phpBB Group