ImplementSyntaxHighlighting
last edited July 25, 2007 05:35:10 (144.126.163.15)
| |
|---|---|
| Edit / History / New / Search | Quick Links: Home / Recent Changes / Glossary / Jobs / Forums / Help |
|
I want to implement syntax highlighting in a text editor. Do I have to subclass NSTextField or something similar?
I've done this with a little perl text editor I've been using for a few years. Register for the NSTextStorageDidProcessEditingNotification? notification to learn when text was modified, get the NSTextStorage object from the notification, use NSTextStorage's editedRange to figure out where the recent edit was, and then colorize the NSTextView accordingly. Here's a meaningless blurb:
I ended up subclassing NSTextStorage and NSTextView to perform other tasks, but that shouldn't be required in order to do syntax coloring.
The following code listens to changes to a NSTextView and colorizes all instances of the word "Mike" in blue, FWIW. The example colorizes the entire document after every key-press. This becomes quite costly; a "real" editor will try to minimize the area it attempts to colorize. Xcode, for example, only colorizes text that's currently on-screen, and defers colorizing the rest of the document until you scroll through it. I added a text view to the default window. I created an NSObject subclass called Controller, instantiated it, added "window" and "textview" outlets, and hooked the outlets up to the window and text view. Then I saved the files. Here's the controller definition: Controller.h
Controller.m
I'm not subclassing NSTextView or NSTextStorage; I register a delegate for the NSTextStorage object and listen to the proper notification. The NSTextStorage delegate is allowed to modify the storage's text attributes, so long as the delegate doesn't change the text's characters directly (this apparently includes switching from western to Kanji fonts). I compiled the application, ran it, and watched my "mike" strings draw in blue. You should be able to get this little example up and running quickly, assuming you're familiar with using InterfaceBuilder. Personally, I'd start by colorizing only the area around the actual edit, rather than colorizing the entire document w/ each pass. This means getting the edited area using NSTextStorage's editedRange, rolling back the range to a previous point (such as a word boundary), and rolling forward to a following point (such as a word boundary again) and colorize that edited range. My own perl editor does this and I've been fairly happy with the results. If you want to try to get "just in time" syntax coloring to work, start by reading the documentation on all of the text classes, including NSTextView, NSTextStorage, NSTextContainer, and NSLayoutManager. Also read the TextArchitecture? topics found here: file:///Developer/Documentation/Cocoa/TasksAndConcepts/ProgrammingTopics/TextArchitecture/index.html or here: http://developer.apple.com/techpubs/macosx/Cocoa/TasksAndConcepts/ProgrammingTopics/TextArchitecture/index.html http://developer.apple.com/documentation/Cocoa/Conceptual/TextArchitecture/index.html This will be fairly difficult, probably involving a lot of subclassing and doing things "by hand". Even ProjectBuilder doesn't get this right all the time.
I've extended my previous program to emulate a simple on-the-fly spell checker. If you type a word that's not in the word dictionary the word is colored in red. Otherwise the word is black. The implementation doesn't support punctuation of any kind, including commas and periods, hyphenated words, or apostrophes. The word dictionary comes from the system's /usr/share/dict/words dictionary that comes with Mac OS X. Here's the controller definition: Controller.h
Controller.m
The interface is the same as that in the "Mike" program with the addition of an InstanceVariable for a dictionary of words. We create the dictionary in awakeFromNib using NSScanner to parse the word dictionary. We handle the edit event in textStorageDidProcessEditing: as before. We start with the editedRange; this is usually a single character, but it might represent an arbitrary amount of text if the user pasted text from the clipboard. We extend the edited range out to word boundaries, taking advantage of the fact that words are separated by whitespace. Typically the "area" range represents is a single word, but it might represent several lines of text if this was a paste event again. As a result we still loop over the range like before. If you set up and run this program you'll find your words change colors as you type them. The word will flash between red and black as you complete the word. For example, when typing "there": 't' is black, 'th' is black (it's in the dictionary file...), 'the' is black, 'ther' is red, 'there' is black. If you paste text in from a big text file or mail message the program will stop and colorize all of the new text. This is the disadvantage I mentioned in my previous entry. But, once all of your text is in the file, you can scroll to any part of the file and start typing without feeling any negative effects of the colorizer. That is, even when you edit the middle of the file you don't have to re-colorize the entire document. That's the advantage I mention in my previous entry, and that's the specific problem ThomasSempf mentions in his previous post above. This example is fairly simple, but the concept is quite scalable. My perl editor uses a more sophisticated version of this same strategy. We start with the edited range, roll back to the previous colored area (it might be a word, a keyword, a comment, a string) and roll forward as necessary until the colorizer is finished changing colors. This way, adding a "x" to the file affects only the characters immediately surrounding the "x", adding a "#" affects the characters just prior (you may have put it in the middle of a word) and all the characters before the next newline, and adding a "'" will cause all the strings in the file to temporarily unbalance until you add a second "'". This is all powered by a simple lexical analyzer I wrote to recognize common formatting in perl, shell, and make scripts. keywords are detected using a dictionary approach similar to the one I outline above; since keywords can vary depending on script type (perl keywords are different than shell keywords) the editor keeps a number of keyword dictionaries around.
Well after all (before you posted) i implemented it a more c way. I am just searching for brackets to color the text between them. Here it is, if you are interested:
You are on the right track for writing a more general parser. If you're looking for a standard delimiter, such as angle brackets or spaces, then the NSCharacterSet approach I mention above is quite good. But some jobs require you to look at each individual character, such as most recursive descent parsers. If you find using characterAtIndex: is too slow, there are some sophisticated improving access to the unicode string underneath. . I'm interested how it will work in a Document Based Application.
I think the correct way to implement syntax coloring would be to use temporary attributes (see NSLayoutManager.h). The following is taken from the docs: "Temporary attributes are used only for on-screen drawing and are not persistent in any way. NSTextView uses them to color misspelled words when continuous spellchecking is enabled."
This approach pre-dates the temporary attributes in NSLayoutManager. If you already have control of your text storage it doesn't seem too inconvenient. Using temporary attributes in NSLayoutManager should simplify some operations (you shouldn't need to remove the color attribute from the edited area, like the examples do above) and you need worry less about your text storage system. The big question is, how "temporary" are these temporary attributes? Under what circumstances are these attributes removed/ignored, exactly? I'll leave the NSLayoutManager example as an exercise ;-) See ImplementSyntaxHighlightingUsingTemporaryAttributes for more on this, though
I have a working syntax highlighter of sorts, based on the code above. The problem is taking care of elements of the language which span more than one word, for instance comments and strings. I have a very shaky implementation which spots single line comments, but I'm at a loss when it comes to knowing whether what I am highlighting is in a multi-line comment or string (Which could also be multi-line)
I described using a lexical analyzer to interpret perl code. Perl is fairly easy, since you don't need to parse the code completely; you can just look for the things you're interested in (comments, strings, keywords). For performance I decided to write this part of my editor in C operating on a char *. I didn't want to spend all my time in NSString's characterAtIndex:, API like regex(3) only works on char *, and so forth. I end up breaking the file into "tokens" by scanning through the text looking for interesting data: comments, strings, keywords, and none-of-the-above. Sample flow control looks like this:
You can imagine more sophisticated state machines that can do a better job of parsing text, depending on the kind of thing you are parsing. For example, if you are processing things like HTML, XML, or old-style property lists you may need a recursive descent parser in order to keep all your tags balanced. You can also use tools like lex and yacc to build these kinds of parsers for you.
I've got the code above to work (ThomasSempf's), but it won't work when I replace the '<' and '>' with '"' (that's a double quote inside single quotes). All it does is color the quotes blue, not the text inside them. Add an else to the code. The current version works because if the current character is the start tag '<', it cannot be the end tag '>'. If you make them both identical, it will assume the start tag is the end tag, and just colour it, nothing else, blue. Put an else in there somewhere, and you should be fine:
Given the current cooperation between the KDE folk & Apple, perhaps someone should look at porting the core of Quanta 3.1 across to Mac OS X? I guess for full featured-ness, we need to wait for the WebCore SDK - in the meantime, give Quanta a whirl under X11 - it is a fantastic web editor!
Here is the source code for an editor I wrote, NSDocument based, but single window, open files are handled by a drawer. It does syntax highlighting by an external highlighter object which highlights a NSTextContainer. http://ozy.student.utwente.nl/projects/ozyedit/
How's your speed on large documents? Unless I'm mistaken, you're rehighlighting the entire document each time, which can be very slow.
As you can see it uses some sort of caching approach to prevent it from rehighlighting it all. This was one of the design decisions: I want to extend the highlighter to be a syntax checker as well, highlighting your syntax errors while you type. Brackets will always match.
I've looked over the CodeEditor source and it will start running from a PB Project, but hangs. I was looking over the GNUstep site and they mention CodeEditor http://www.gnustep.org/experience/apps.html at the bottom of the page with the broken link you gave us above. In looking over the ProjectCenter? app though, the CodeEditor piece seems to be gone.
Here is yet another version. It works pretty well, but it isn't perfect. Please post any updates here so we can all benefit from the bug fixes.
You'll need to prepare an NSDictionary called "keyWords" that contains the color to use for each keyword. The keys are all of the key words and the values are the colors (NSColor objects) to set the text color. You can build an NSDictionary containing some key words like this:
CodeEditor, a fairly complete syntax coloring editor written for GnuStep has a new home: http://savannah.nongnu.org/projects/codeeditor
FCBlogEditor? has a neat implementation of syntax highlighting using flex. http://members.shaw.ca/akochoi/2003/11-09/index.html#1 Some modifications to highlight everything inside of <> brackets and inside of quotation marks. This code is working fine:
In the above code, "quote-open", "quote-close", etc, are defined in an NSDictionary as the beginning and ending symbols of a quote (currently just " "). Like I said, the code works fine, but every single keystroke raises an exception in the run log which reads:
I have narrowed it down to this line:
but I cannot figure out any way to make the exception go away. The code is there to compensate for the scanner thinking that the beginning quote is also the end quote. Does anyone see any way that I could fix this?
I resolved the issue by using [scanner scanString] instead of [scanner setScanLocation].
You should always bracket changes in a text storage between -[textStorage beginEditing] and [textStorage endEditing] (defined in NSMutableAttributedString super class). This may significantly improve your colorizer performance.
I've written a nice class called UKSyntaxColoredTextDocument? that demonstrates a fairly efficient approach at syntax coloring, where generally only what you actually changed gets recolored. Very useful for larger documents. See http://www.zathras.de/angelweb/sourcecode.htm#UKSyntaxColoredTextDocument. -- UliKusterer
Moved discussion of using temporary attributes for coloring syntax to ImplementSyntaxHighlightingUsingTemporaryAttributes | |
| Edit / History / New / Search | Quick Links: Home / Recent Changes / Glossary / Jobs / Forums / Help |