Biff's 2003 PDC Newsletter

Motto - Where the only Java is coursing through our veins

Technical content of this issue - 4 out of 5

Special Pop Culture Issue - In an attempt to keep things interesting, throughout this issue I am going to intentionally sprinkle references and quotes from songs, movies, TV shows, etc. For instance, in the first paragraph of Tuesday's issue I used the phrase "so I got that goin' for me" which some of you may have recognized as having come from Bill Murray in Caddyshack. See how many you can find in this issue - then watch tomorrow's issue to see how many you missed!

Errata

In Sunday's Letters, TH was erroneously identified as being in VA. He is actually in CA - I am currently in CA but have not seen him as CA is very large.

Session - Building Database Applications with SQL Server "Yukon": XQuery, XML Datatype

This session talked about advances in Yukon that move us toward an integrated storage system for relational and semi-structured (XML) data. It focused heavily on the new XML features in Yukon. From what I took away from the session, the key to all of the new features is a new datatype called XML that can be applied to a column. When the column is set up, an XML schema can be assigned to it to ensure that properly structured data is in the field. Let's pause here a minute and consider where we are - we now have a column in a table that has XML data in it that conforms to a particular schema (if we desire). Once we are there, the possibilities really start to open up.

There are several ways of searching the XML fields. The first option they suggested was to create a full text index and perform a full text search - my first thought was you have got to be kidding. Then they started talking about XQuery, a W3C working draft for querying XML data. It includes XPath, so within a query is a selection clause that looks familiar to anyone who has written XSLT or used XPath in some other way. You can now search the XML field based on an XQuery query to get just the data that you want, but it requires a full table scan. That's when they brought full text searching back up - do a full text search, taking advantage of the full text index to reduce the possible rowset, then do the XQuery on the results. Now the full text stuff made sense to me - but I still wanted more. Luckily the session was not yet over...

There is also the capability to create an XML Index on an XML column. This creates indexes on tags, values and paths. This seemed very, very cool, but I was concerned about the space taken up by such an index - surely it could get huge. There was no discussion of size, nor any indication that you could limit which nodes in the XML document were actually indexed. Finally, they spoke of the ability to modify data through XQuery - something they are building into Yukon, but in which they are ahead of the standard, which does not address modifying data yet.
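For anyone who hasn't written XPath before, here is a minimal sketch of the kind of selection clause that XQuery builds on. It runs client-side with System.Xml rather than inside Yukon, and the catalog/book/price document shape and the XPathSketch class are made up for illustration - none of this code was shown in the session.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Xml;

// Hypothetical helper: plain XPath 1.0 against an in-memory document.
// This is NOT the Yukon XQuery syntax, just the familiar path + predicate idea.
static class XPathSketch
{
    public static List<string> TitlesCheaperThan(string xml, decimal maxPrice)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        // The path walks catalog -> book; the [price < n] predicate filters the books.
        string path = "/catalog/book[price < "
            + maxPrice.ToString(CultureInfo.InvariantCulture)
            + "]/title";

        List<string> titles = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(path))
        {
            titles.Add(node.InnerText);
        }
        return titles;
    }
}
```

Calling XPathSketch.TitlesCheaperThan(xml, 20m) on a catalog of books returns the titles of the books whose price element is below 20. The point is only that the path-plus-predicate shape carries over into an XQuery against an XML column.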

Session - "Indigo": Connected Application Technology Roadmap (Slides)

This was another crowded Indigo session, but it was a good one. It talked about how Indigo would interact with the existing technologies it is replacing and what we can do now to make sure that we can transition easily (they keep saying "easy" - I do not think that word means what they think it means).

The session started by talking about existing DCOM applications and how they will work in Longhorn. One slide gave a synopsis of the two ways they could ensure that existing DCOM apps would still work under Indigo. Either they could A) Reimplement the interfaces of the existing technologies to take advantage of the Indigo bus, allowing existing apps to keep working and take advantage of better functionality; or B) Make sure Indigo supports existing protocols. When they laid it out that way, choosing A seemed pretty obvious. So, if you have an existing DCOM app that you install on Longhorn it will continue to work exactly as it does now using the exact interfaces and unmanaged code that it does now. You will be able to change it so that it uses the Indigo technology behind the original interfaces merely by changing configuration information - you won't even have to recompile. They went through several different scenarios in the slide deck that you should check out if you are interested - in the slide deck every block and arrow that is blue represents old technology, every one that is red represents new technology. All in all what they are planning looks great - of course, it's not done yet. One troubling moment - at one point the presenter made the comment about the current DCOM infrastructure that they are "never going to get rid of it." So we've been told and some choose to believe it, but with VB 6.0 going out of support in 18 months, I think they're wrong, wait and see.

As an example of how old and new will work together, he spoke of the ability of VB Script to access a web service by calling GetObject and passing a moniker set up to represent the web service. That sounded pretty cool.

The last half of the session discussed how code could be converted to take full advantage of Indigo and what to do now to make sure it was "easy." Here are the highlights-

HttpContext does not work in Indigo
ASMX, Enterprise Services and .NET Remoting will migrate easily, with just a few search and replaces. Check out the slide deck to see animated slides explaining the required changes.
Web Service Extensions (WSE) code cannot be migrated easily - it will need to be rewritten to take advantage of Indigo, although it will continue to work for the foreseeable future in its original form. He said, in so many words, that unless you gain a competitive advantage by getting something to market right now that needs support for the WS protocols, it's best to wait for Indigo and let WSE go by.
System.Messaging code will not migrate, but will need to be rewritten. I believe this namespace includes all the MSMQ code, so if you are using MSMQ you should be sure to abstract all direct calls behind an interface to reduce the impact of your future migration. There seemed to be a contradiction in the deck about System.Messaging and whether to use it or not, but I was not able to ask him about it. If I hear more I will let you know.

Session - Managed/Native Interop Best Practices and Common Pitfalls (That We Learned the Hard Way) (Slides)

This was a repeat of a session offered earlier in the week that filled up. Since it was held at lunch time, I was forced to skip lunch to attend - but I'm willing to make that sacrifice to increase the amount of cutting edge knowledge found in this newsletter.

First, if you are doing interop you must download CLR Spy. This runtime tool configures all 11 debug probes in the 1.1 version of the CLR, catching Interop errors that may take weeks to turn up otherwise. For instance, if you pass an unreferenced object to unmanaged code you won't see any problems as long as the GC process doesn't run - if the GC runs then your app sleeps with the fishes. When the GC runs is just luck, so you will probably find the bug one week after you go live. CLR Spy forces a GC before all unmanaged calls - so if you have passed an unreferenced object to unmanaged code the bug will show up immediately. This is a very cool tool and very free - download it now.

They pointed out two things that, although not bugs, will kill your performance. First - make sure you always call COM objects from the right threading model. The CLR runs in the MTA by default; if you are going to call COM objects you should control this with STAThread or Thread.ApartmentState (or ASPCompat in ASP.NET). Second, .NET does all strings in UNICODE, so make sure you use the UNICODE version of all API calls and not the ANSI version.

I haven't done a lot of Interop, so I didn't understand all the pitfalls - I'll just quote one from the slides that I did understand-

public class OSHandle
{
    public IntPtr h;

    public OSHandle()
    {
        // If an exception is thrown during this routine after the handle is
        // obtained then the finalizer will not be called and the handle will leak
        h = NativeMethods.CreateOSHandle();
    }

    ~OSHandle()
    {
        NativeMethods.ReleaseOSHandle(h);
    }
}

public class MyClass
{
    public void DoWork()
    {
        OSHandle osh = new OSHandle();
        // There are no more references to osh after this, so if the GC
        // runs between when this call begins and when it returns it will free osh
        // and we will have a problem like we discussed above
        NativeMethods.UseHandle(osh.h);
    }
}

This looks pretty safe, but there is a plethora of possible bugs. The solutions they suggested involved the SafeHandle classes, although some of those won't be available until Whidbey. The guy giving the session (Adam Nathan) has a book out - if you are doing lots of Interop you may want to check it out.
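Here is a rough sketch of the SafeHandle direction, assuming the Whidbey-era System.Runtime.InteropServices.SafeHandle base class. NativeMethods below is a made-up managed stand-in so the sample runs without a native DLL (real code would use DllImport), and OSSafeHandle and Worker are hypothetical names - this is not the code from the slides.

```csharp
using System;
using System.Runtime.InteropServices;

// Stand-in for the P/Invoke class in the slide sample (assumption: real code
// would declare these with [DllImport] against a native library).
static class NativeMethods
{
    public static int OpenCount;

    public static IntPtr CreateOSHandle() { OpenCount++; return new IntPtr(42); }
    public static bool ReleaseOSHandle(IntPtr h) { OpenCount--; return true; }
    public static int UseHandle(IntPtr h) { return h.ToInt32(); }
}

// The wrapper owns the handle; the runtime calls ReleaseHandle once, either
// from Dispose or from the SafeHandle finalizer.
sealed class OSSafeHandle : SafeHandle
{
    public OSSafeHandle() : base(IntPtr.Zero, true)
    {
        SetHandle(NativeMethods.CreateOSHandle());
    }

    public override bool IsInvalid
    {
        get { return handle == IntPtr.Zero; }
    }

    protected override bool ReleaseHandle()
    {
        return NativeMethods.ReleaseOSHandle(handle);
    }
}

static class Worker
{
    public static int DoWork()
    {
        // The using block keeps osh referenced until the call has returned,
        // then releases the handle deterministically.
        using (OSSafeHandle osh = new OSSafeHandle())
        {
            return NativeMethods.UseHandle(osh.DangerousGetHandle());
        }
    }
}
```

In real interop code the P/Invoke signature would typically return or accept the SafeHandle type directly, so the marshaler does the handle bookkeeping instead of DangerousGetHandle. On the 1.1 runtime, a GC.KeepAlive(osh) call after the unmanaged call is another way to keep the wrapper alive for the duration.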

Interlude

Looks like there won't be any Twinkies this year. Today's snack was Graham Crackers - another strike against the LA Convention Center. I would go to the PDC in an airplane hangar in Newark as long as there was good material, but a good venue really makes it a more enjoyable experience. This place was left wanting in the following departments-

Room size (at least for the Indigo talks) - I've never seen a PDC where so many sessions turned people away.
Facilities - if I may skirt the boundaries of good taste for just one paragraph, I will only say one thing about the Men's rooms - metal troughs - not everywhere, but enough.
Food - see previous newsletters
Chair size - just not big enough to hold three people in a row without significant overlap. I'm not talking pre-Subway Jared sized folks, just average sized people.

To be sure, logistics for something like this are amazing and Microsoft proved to be nimble, scheduling repeat sessions of full events, sometimes while the first session was still being held. And the computing facilities they set up are just amazing - they bring in hundreds of PCs hooked up to the internet and put them everywhere in the center so you can log on just about any time. Wireless Internet access all through the facility, network hookups for your laptop in dozens of places around the facility. It is just an amazing achievement involving hundreds of people to stage.

Session - "Indigo": The Web Services Protocols and Architecture (Slides)

This session concentrated on the web service protocols and looked at the actual XML that makes up the protocols. Like most of the other Indigo presentations this week, this one overflowed the room and spilled out into the hallway - web services are certainly the buzz this week. The guy giving the session was surprised at the turnout. The session was, in his words, a geeky talk even by PDC standards.

For the technical details I will just refer you to the slides and not even try to summarize. The session started with the interop demo between Indigo and IBM software that had been given by Bill Gates and a senior IBM executive in NY in September. It was the first time that Microsoft and IBM senior executives had been on the same stage since before the OS/2...er...unpleasantness. Anyway, if you're interested in the XML backing up things like WS-Addressing, WS-Policy, WS-Security, WS-Trust, WS-AtomicTransaction, WS-ReliableMessaging, etc., check out the slides.

Session - "Indigo": Building Secure Distributed Applications with Web Services (Slides)

I was able to understand some of this session, and I plan to go back to the slides myself to see if I can get a little more, but to be honest I just didn't understand enough of it to put together any kind of coherent summary for you. Sorry. On the positive side, there was a great song during the muzak before the session began - a free subscription to next year's newsletter for the first person who can identify this lyric - "I wish I was in Tijuana, eating barbecue iguana".

Session - Caching Techniques for Scalable Enterprise Applications (Slides)

I originally attempted to get into a BizTalk/Yukon/Indigo session during this timeslot, but it was full. Before I took off for the hotel, I thought I'd give this session a try and I'm glad I did. It demonstrated some really cool stuff but didn't tax the brain too much at 5:15 in the evening.

Before we get to the content, I'd like to relate one observation that came to me during this talk. ODBC gave way to DAO, OLEDB and ADO and now ADO.NET. COM gave way to .NET. SQL Server has been through several releases. Every Microsoft technology will eventually give way - but the Northwind database will live forever!

Caching is a cool thing. Keeping data in memory instead of going back to the database every time allows us to greatly reduce the database bottleneck - the problem is what to do when the database changes and the data in the cache is no longer valid. Up until now apps have used methods like expiration timers on the cache or database triggers - but nothing that really solves the problem in an elegant way. Yukon and Whidbey introduce a new mechanism that solves the problem very well.

The solution at the lowest level is based on a new construct called Query Processor (QP) Notifications. These are constructs associated with a particular query that monitor the database and send a notification when the results of that query change. The notification is sent via a new mechanism in Yukon called the SQL Service Broker (essentially a Message Queue for SQL Server). To use QP Notifications, a program creates a command and a notification request, associating them with one another. It then executes the command. Next it sets up an asynchronous query against the SQL Service Broker, and when a notification comes through it knows that it needs to run the query again to get the latest result. One important aspect is that no open connection to the database is maintained while waiting for the notification. There are a few more details and some caveats about the types of queries well suited for this strategy - you can find more info in the slides.

OK, neat stuff, but it involves some serious code. Fortunately there is a higher level abstraction available called SQLDependency. This class registers interest in a particular command and exposes an OnChanged event that fires when the query results change. This greatly simplifies the code required and is an excellent way to implement database caching based on notification in a business layer.
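To show the shape of notification-based caching in a business layer, here is a minimal sketch. It does not use the real Yukon or Whidbey classes - FakeDependency is a made-up stand-in that only mimics "an object with an OnChanged event", and QueryCache is a hypothetical name - so treat it as the pattern, not the API.

```csharp
using System;

// Hypothetical stand-in for a notification source. The real class talks to
// the database; this one is fired by hand so the sample can run anywhere.
class FakeDependency
{
    public event EventHandler OnChanged;

    public void RaiseChanged()
    {
        if (OnChanged != null) OnChanged(this, EventArgs.Empty);
    }
}

// Caches one query result and only re-runs the query after a notification.
class QueryCache
{
    private readonly Func<string> _runQuery;
    private string _value;
    private bool _valid;

    public int QueryCount; // how many times we actually hit the "database"

    public QueryCache(Func<string> runQuery, FakeDependency dependency)
    {
        _runQuery = runQuery;
        // A notification just marks the cached copy stale; the next Get reloads it.
        dependency.OnChanged += delegate { _valid = false; };
    }

    public string Get()
    {
        if (!_valid)
        {
            _value = _runQuery();
            QueryCount++;
            _valid = true;
        }
        return _value;
    }
}
```

Repeated calls to Get are served from memory until RaiseChanged fires, at which point the next Get runs the query again. The sketch ignores thread safety and any re-registration a real notification mechanism may require after it fires.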

There are a couple of further abstractions built on this functionality that are built into ASP.NET. If your application has ASP.NET pages talk directly to the database without a business layer in between, you can take advantage of this functionality by making your ASP.NET Object Cache or Response Cache dependent upon data invalidations.

Very cool stuff

Session - ObjectSpaces (Object-Relational Mapping)

Guest columnist Steve Harshbarger and I had dinner on Tuesday night and he was telling me about something called ObjectSpaces that had been mentioned at a session that sounded very cool. He went to a full session on this technology today, so I imposed on him to make a rare return appearance in the newsletter to share this stuff with you. So once again please welcome to the stage, Steve Harshbarger.

The .NET Framework 2.0 (Whidbey) will include a new data access feature called ObjectSpaces. It is specifically designed to bridge the object world (in your program) and the relational world (where objects are persisted in the database). The presenter was quick to point out two things. First, this is not a replacement for ADO Datasets. It is, in fact, a layer on top of them, and developers can use either technique depending on their requirements. Second, he stressed that when absolute control and performance are of concern, Datasets may be more appropriate. ObjectSpaces are appropriate when there is a strong object model aspect to the application.

All in all, ObjectSpaces seem fairly simple to use. They work like this:

You define objects in your application normally (i.e., as classes). These classes do not need to inherit from any ObjectSpaces base class or implement any special interfaces. Any vanilla class you write will work.
You define a database structure to store your objects. Note this only works with SQL Server (7.0, 2000, and the new Yukon next year).
You create XML-based mapping files to define how the objects relate to tables in the database. In the current alpha build of Whidbey, this is a manual process, but they showed a screen shot of a visual tool they're working on to build these files graphically. Picture your object model on the left and the database schema on the right and the ability to draw lines between the two and you get the idea of where they're going.

In code, you use a small set of ObjectSpaces classes (similar to ADO DataSets) to execute commands.

With this in place, ObjectSpaces handles "materializing" objects (i.e., populating them with data from the database) as well as persisting objects (saving, updating back to the database). It does this by generating appropriate SQL using the mapping files. It handles objects and all manner of relationships between them (one to many relationships, trees of hierarchically related objects, etc.).

To retrieve objects from the database, a query language called OPath is used. OPath lets you address objects by their names and properties. The examples showed how queries which were very simple to express in OPath generated very complex SQL in the background, the point being that this is a much less verbose way to express business concepts in code.

Related to retrieving objects were the concepts of "Spanning" and "Delay Loading". Spanning instructs the framework to retrieve all children of a given object when that object is retrieved. This allows you to instantiate a whole tree of related objects at once with one line of code.

Delay Loading is similar to spanning, but addresses the potential performance problem of loading up a whole lot of objects at once. By specifying delay loading, the framework doesn't actually go to the database for a given object in the tree until it is addressed in code. It's "just in time" loading of object data.
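The delay loading idea can be sketched in a few lines of plain C#. This is only the general "load on first touch" pattern, not the ObjectSpaces implementation - LazyCustomer and its loader delegate are hypothetical names, and the loader stands in for whatever query the framework would generate.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical parent object whose children are fetched only when first used.
class LazyCustomer
{
    private readonly Func<List<string>> _loadOrders; // stands in for the database call
    private List<string> _orders;

    public string Name { get; private set; }

    public LazyCustomer(string name, Func<List<string>> loadOrders)
    {
        Name = name;
        _loadOrders = loadOrders;
    }

    public List<string> Orders
    {
        get
        {
            // First access triggers the load; later accesses reuse the result.
            if (_orders == null) _orders = _loadOrders();
            return _orders;
        }
    }
}
```

Creating a LazyCustomer costs nothing on the "database" side; the loader runs once, the first time Orders is read. Spanning would be the opposite choice - run the loader up front while the parent is being retrieved.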

In terms of saving objects back to the database, they showed a number of features. First, the framework tracks changes to the objects in memory automatically. When you want to save, you merely call a PersistChanges method, and it executes the necessary UPDATE and INSERT statements to save the objects to the appropriate tables.

For multi user environments, both optimistic concurrency and transactions are supported. Under optimistic concurrency, the first user to save a change wins, and subsequent users are denied the save if their copy does not match what's in the database. Transaction support is what you'd expect - BeginTrans, Commit, and Rollback methods are all supported, and it works with DTC as well.
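Here is a small sketch of the "first save wins" rule, using an in-memory dictionary in place of a table. The version counter is an assumption made for the example - the session did not say how ObjectSpaces detects a mismatch - and VersionedRow and VersionedStore are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row: the version number changes every time the row is saved.
class VersionedRow
{
    public string Value;
    public int Version;
}

class VersionedStore
{
    private readonly Dictionary<int, VersionedRow> _rows = new Dictionary<int, VersionedRow>();

    public void Insert(int id, string value)
    {
        _rows[id] = new VersionedRow { Value = value, Version = 1 };
    }

    // Hands back a copy, the way each user holds their own in-memory object.
    public VersionedRow Read(int id)
    {
        VersionedRow row = _rows[id];
        return new VersionedRow { Value = row.Value, Version = row.Version };
    }

    // Saves only if the caller's copy still matches what is stored.
    public bool TrySave(int id, VersionedRow copy, string newValue)
    {
        VersionedRow current = _rows[id];
        if (current.Version != copy.Version) return false; // someone else saved first
        current.Value = newValue;
        current.Version++;
        return true;
    }
}
```

If two users read the same row and both try to save, the first TrySave succeeds and bumps the version, so the second user's stale copy no longer matches and the save is refused. That user would have to re-read the row and try again.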

Finally, they had a brief discussion of the ability to override the default mapping by using ADO.NET to feed your own SQL to populate objects for cases that are more complicated than supported by the framework. It seemed that the framework would handle some very complex cases, and a specific example of when you'd need to use this was not presented.

All in all, ObjectSpaces looked very promising and well designed. I know from my friends in the Java world (ed. note - "friends in the Java world"? I can only shake my head sadly) that Microsoft is behind in implementing something like this (they did, in fact, preview this at the 2000 PDC but are just now getting around to building it), but I'm not familiar enough with that side to compare. The "bits" for this are included in the code we received at the PDC, so I'm anxious to give it a try.

Letters - We Get Letters

Imagine our surprise to get this message from SC in NJ -

"Hey Biff. I sent my boss your PDC Newsletter and he's interested in meeting up with you...I expect you could get a dinner out of it"

Dear SC,
Now you're talking! Note to other readers out there - I think we've finally found the perfect letter to the editor! -Ed.

Sign Off

One more day - one more issue. See you tomorrow.

Biff
Night Watchman, Biff's PDC Newsletter