Wednesday, June 4, 2008

Getting Started With Linq To Sql and Sql Server Compact Edition (CE)

UPDATE 7.21.2008: Well, I'm now in the middle of my third project using Sql Server CE and I've got to say that it's really kind of a piece of shit and I can't in good conscience recommend using it with LINQ or without. It's a shame because having a decent offering in this space would really be a strength for Microsoft but SqlCE just kind of sucks. The tools suck, the IDE support sucks and the database format itself is so chock-full of limitations and oddities that using it is way more trouble than it's worth. My current advice is that if you've got a project you're thinking of using SqlCE on you should either ditch LINQ and use some other embedded database or else just bite the bullet and use the full version of Sql Server. Bummer huh? Old post preserved below: Today I had a little project to get done that caused me to take a look at using Microsoft's relatively new product, Sql Server 2005 Compact Edition. As near as I can tell CE (as we'll call Compact Edition from here on out) is sort of like a blend between normal Sql Server and MS Access. In terms of its SQL support and syntax it's like Sql Server. But the big difference is that there is no server. Instead CE utilizes a single database file which has an .sdf extension. This file works similarly to the old Access .mdf files in that you can move it around, copy it, etc. and every instance of the file is its own separate database. Accessing this file from code is very like accessing a normal Sql Server instance: you create a SqlCeConnection object with a connection string and then you can use objects that are immediately familiar like SqlCeDataReader, SqlCeCommand, etc. These objects are found in the System.Data.SqlServerCe namespace and operate almost exactly like their normal System.Data.SqlClient coutnerparts. But the object of this post is twofold: firstly to explain how to create and work with CE .sdf files and secondly to then show how to use Linq To Sql rather than the SqlServerCe data client classes to access that file in an application. Neither of these tasks is particularly difficult but neither is either obvious or (in my opinion) particularly well documented. So hopefully you've searched for something like 'getting started with Linq to Sql and Sql Server CE' or 'using Linq with Sql Server Compact Edition', you've ended up here and hopefully the following will be of service to you. To get started you'll need to create your CE database (.sdf) file. You can do this either with Visual Studio 2008 or with Sql Server Management Studio Express SP2. If you're using Management Studio you need to be certain you download the SP2 version for CE support. At the time of this writing it can be found here. Note that for CE files Management Studio provides absolutely no benefits that I can see over just using the built in VS2008 Server Explorer tools so for this post I'm going to be using VS2008 to create and manage the db. So to create the db in VS2008 either create a new executable project (WinForms or Console) or open an existing one. Then in the Solution Explorer pane right-click on the project and choose 'Add . . .' -> 'New Item . . .' from the context menu. This will bring up the Add New Item dialog. You might want to filter the list by click on 'Data' in the top left pane of this dialog. The template you are looking for is called 'Local Database'. Select this and then type a logical name for the database. This name will be the physical file name as well as the default name for your Linq DataContext when we create it. For this example we'll call it 'MyAppDB.sdf'. Click OK and the .sdf file will be added to your project. VS will immediately then prompt you to create a DataSource and DataSet for your new database. Since we're going to be using Linq we don't need either of those so just cancel out of the Data Source Configuration Window. At this point you should see MyAppDB.sdf listed as a file within your project in the Solution Explorer. If you open the Server Explorer you should also see it listed under Data Connections. Expand the listing under Data Connections in the Server Explorer and you should see sub-folders for Tables and Replication. If you do then you're db is created and ready to go. Now all you need to do is add some tables. You can either use SQL to do this by right-clicking the db entry under Data Connections and selecting 'New Query' or you can use the GUI to do it by right-clicking on the db's Tables sub-folder and choosing 'Create Table'. The GUI's probably a bit easier but it's also pretty limited. It will allow you to create a primary key and an identity field. It will allow you to create indexes and to set default values on fields. And of course it allows you to add, delete and modify the columns in the table. What it won't do however is let you create foreign keys. I have absolutely no idea why this is since CE supports them just fine but that's the way it is. If you want to create a foreign key you've got to do it with SQL. Just in case you can't remember the syntax it should be something like this:
alter table myChildTable
add constraint FK_ParentID 
foreign key (ParentID)
references myParentTable(ParentID) on delete cascade   
Once you've got your db schema all set up how you want it you're ready to generate your Linq DBML file so you can start coding against your database. Once again however for whatever reason Microsoft has failed to implement the most obvious way of getting this done. The DBML designer in VS doesn't have full support for CE. You can't just create a new Linq to Sql classes object in your project and then drag tables from the Server Explorer to the .dbml files designer canvas. In fact, there is no way (as far as I can tell) within VS to create a DBML file for a CE database. To do this you have to use the command line tool SqlMetal.exe. The easiest way to get at SqlMetal in my opinion is to go into your start menu and from there choose 'Microsoft Visual Studio 2008' -> 'Visual Studio Tools' -> 'Visual Studio 2008 Command Prompt'. This will open a console window whose path and everything are set up correctly to run SqlMetal. In this console window you then need to execute a command like this:
sqlmetal /dbml:MyAppDB.dbml MyAppDB.sdf /pluralize
SqlMetal is the code generator that VS uses to create a DBML file and its code-behinds when you use the designer. While the designer doesn't fully support CE, SqlMetal does. So what we're doing here is an end-run around the designer. Let's take a look at the arguments we're passing to the SqlMetal executable above. When we specify
'/dbml:MyAppDB.dbml'
we are telling SqlMetal two things. First we're telling it that we want it to generate a DBML file. Second we're telling it that the filepath of the newly generated file should be
\MyAppDB.dbml
. The second argument,
'MyAppDB.sdf'
is the filepath to the CE database file that has contains the table schema we want SqlMetal to run against. The final argument,
'/pluralize'
tells SqlMetal to automatically handle pluralization of the names of entities. For instance the entity generated to represent rows in a table named Employees will automatically be called Employee. This works pretty well and I recommend using it as it makes your code more readable. In American English at least. Now one thing you'll note is that I didn't specify paths to either the .dbml file or the .sdf file in the above call. You certainly could, but I find it easier to just cd to your project directory first, before you execute SqlMetal at all. That way everything just refers to the current directory (your project directory) by default. So, so far your steps were to:
  1. Create or open a project in Visual Studio 2008
  2. Add a Local Database file to that project
  3. Added one or more tables to the database created in step 2
  4. Opened a Visual Studio Command Prompt window and changed directory to your project directory
  5. Ran the SqlMetal command line tool with something like this:
    'sqlmetal /dbml:MyAppDB.dbml MyAppDB.sdf /pluralize'
At this point if you look in your project directory you should see that there is a new file there called MyAppDB.dbml just as you specified. Now go back into your project in VS and in the Solution Explorer right-click your project and from the context menu select 'Add' -> 'Existing Item . . .' and in the resulting dialog find MyAppDB.dbml and add it. Now you should see the .dbml file listed in the Solution Explorer under your project. Double-click it and voila! The designer opens up and you should see all of the tables in your database represented graphically, including their relationships if you defined any. That's really all there is to it. To program against your model simply create an instance of the related DataContext object and use Linq to query against it. For instance:
using (MyAppDB db = new MyAppDB("MyAppDB.sdf"))
{
 //add a row to the database
 MyTestEntity myEntity = new MyTestEntity();
 myEntity.TestProperty = "Test";
 db.MyTestEntities.InsertOnSubmit(myEntity);
 db.SubmitChanges();

 //load rows from the database into objects
 var query = from mte in db.MyTestEntities
    select mte;
 
 //should output one instance of the word 'Test'
 foreach (MyTestEntity e in query)
 {
  Console.WriteLine(e.TestProperty);
 }
}
Hopefully someone out there finds this helpful! If so leave a comment and let me know.

Tuesday, April 8, 2008

Bulk Deletes in Linq to Sql

UPDATE: Terry Aney independently came up with a similar solution to this problem as I did. Terry however has a better implementation in that his supports compound primary keys. He's also got similar support for batch updates in there. I recommend that you read about his approach here but then read his follow-up and download a later version of his code here. Thanks Terry! A couple of days ago I was working on an application that uses Linq for its data access layer and I was trying to figure out the Linq way of doing a bulk delete. To illustrate the type of operation I was attempting let's imagine a simple HR database. This database has two tables: Employees and EmployeeTypes. A relationship between the tables exists such that each Employee has one and only one type while each EmployeeType has 0, 1 or many Employees. This is accomplished with the following schema: Employees ======== EmployeeID int(primary key, identity) EmployeeName varchar EmployeeTypeID int (foreign key to EmployeeTypes) EmployeeTypes ============ EmployeeTypeID int (referenced by Employees) TypeDesc varchar Let's say that in my imaginary organization we had been experimenting with using offshore developers to decrease our development costs. Now let's imagine that (just like in most attempts at off shoring development) the higher ups have realized that the code coming from across the pond is total crap and that you really do get what you pay for. Therefore the powers that be have decided to can the whole offshore team. Now we need to update the database to get rid of all the offshore employees in the database. (In reality we probably wouldn't really want to delete these rows but this is just an example.) In Sql this would be a simple thing to accomplish. We could simply execute the following query:
DELETE FROM Employees
WHERE EmployeeTypeID IN
(SELECT EmployeeTypeID
FROM EmployeeTypes
WHERE TypeDesc = 'Offshore')
Much to my surprise there is no clear analog to this query in Linq. I thought that this would work:
var toDelete = from e in db.Employees
   where e.EmployeeType.TypeDesc == "Offshore"
   select e;

db.Employees.DeleteAllOnSubmit(toDelete);
db.SubmitChanges();
In fact it does work but when I ran the Sql Profiler I noticed something quite troubling. When Linq executed this code it was first running the select defined by my toDelete var and retrieving the results. It was then going through each Employee returned by the query and issuing a SEPARATE delete call for each. To me this seemed needlessly inefficient so I started Googling for a solution. I came up with two hits that attempted to solve this problem. The first was from Chris Rock (no not that Chris Rock.) He wrote a good post about a solution to the problem that a guy named Benjamin Eidelman wrote. You can see this solution at Chris's blog, RocksThoughts. But there were some problems with this solution. Firstly the code he posted did not build as posted. I traced down the problem to the fact that the capture groups in the regex he was using to parse the query were not named properly. I'm guessing that since .NET regexes put their capture group names between less than/greater than symbols his blogging app is stripping out the group names thinking that they are HTML. Fair enough. I've contacted him and hopefully he'll get the fix I supplied posted. A much bigger problem for me though was that when I tried to use his code to accomplish the above defined task I got an error that my query type wasn't supported. I tried to use his code like so:
db.DeleteAll(db.Employees.Where(e => e.EmployeeType.TypeDesc == "Offshore"));
This is definitely what I wanted but in examining the code from Chris's blog I quickly determined that it would only work if the Where clause was very simple and limited to checking values on the object being deleted. Joins are not supported by this code at all as far as I can see. And in fact while looking at the underlying code I realized that it relies heavily on string manipulation to achieve its results. And that certainly doesn't seem very Linq-y to me. Linq gives us the facility to pass around a query as a logical construct so it seemed to me that we should be operating on that construct rather than its interpreted string representation. So from Chris's blog I then followed a link to a Lambda expression-based solution to the same problem. The solution was from Jeffrey Zhao's blog. Here I thought I would find what I was looking for. A method that examined an IQueryable construct and parsed it into an efficient Sql DELETE statement. And yet upon further examination I found that this wasn't really what I was looking for either. First of all, rather than taking an IQueryable as an argument it takes a predicate defined as Expression>. While I think this cuts down on the readability of the consuming code I figured no big deal and so I tried my delete query with this library like so:
db.Employees.Delete(e => e.EmployeeType.TypeDesc == "Offshore");
This built but when I ran it it the generated query did not join in the EmployeeType table. Instead the query was looking for a field called TypeDesc on the Employees table. So of course it failed. I came to the conclusion that while this method was probably somewhat better than Mr. Eidelman's string manipulations it still didn't do what I wanted. If you look on the message boards, etc. you will see people asking this question and the answer is usually either to use one of the two above methods or else to write a custom stored procedure to accomplish this type of delete. But having to write a stored proc for every different type of bulk delete I might need to perform just rubbed me the wrong way. I wondered if it would be possible for me to come up with a solution more to my liking. After some thought I came up with something that so far I'm liking quite a bit. It takes its inspiration from the solutions explored above, combining the legibility of defining the records to delete in a simple Linq query with the relative type safety of using the expression tree to generate the where clause. Now it doesn't work in all situations.The main one I see right now is that it doesn't support tables with compound primary keys. But other than that I think it's pretty nice. The usage is pretty straightforward:
var toDelete = from e in db.Employees
        where e.EmployeeType.TypeDesc == "Offshore"
        select e.EmployeeID;

db.Employees.BulkDelete(toDelete);


Note that the query is selecting a list of the ids of the records to be deleted. This is key because basically what the code does is wrap the select query defined in Linq inside a delete query based on the primary key of the table being deleted from. So the query that is actually executed on the server from the above is:
delete from Employees
where EmployeeID in
(select EmployeeID
from Employees t0
inner join EmployeeTypes t1 on t0.EmployeeTypeID = t1.EmployeeTypeID
where t1.TypeDesc = 'Offshore')
Unfortunately I haven't yet been able to come up with a way to make the compiler ensure that the enumerable type returned by the user-supplied query matches the type of the primary key. That's just up to the developer right now. I'd love to hear if anyone's got any ideas beyond just making it only work with int. It should I think at the very least also support string and guid keys. But I think that the Sql Optimizer should be able to execute this sort of query quickly and efficiently. If anyone sees any problems please let me know! Here is the code for the extension method that actually does the work:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data.Linq;
using System.Data.Linq.Mapping;
using System.Data;

namespace LinqExtensions
{
public static class TableExtensions
{
    public static int BulkDelete<TEntity>(this Table<TEntity> table, IQueryable idsToDelete)
       where TEntity : class
    {
        MetaTable metaTable = table.Context.Mapping.GetTable(typeof(TEntity));
        string tableName = metaTable.TableName;
        if (metaTable.RowType.IdentityMembers.Count != 1)
        {
            var keyFields = from im in metaTable.RowType.IdentityMembers
                                 select im.MappedName;

            if (keyFields == null || keyFields.Count() < 1 )
            {
                keyFields = new List<string>();
            }

            throw new ApplicationException(string.Format("Error in TableExtensions.BulkDelete<TEntity>: Table {0} has a compound key.  BulkDelete can only operate on tables with a single key field.  (key fields found: {1}", tableName, string.Join(",", keyFields.ToArray())));
        }
        string primaryKey = metaTable.RowType.IdentityMembers[0].MappedName;

        System.Data.Common.DbCommand origCmd = table.Context.GetCommand(idsToDelete);
        string toDelete = origCmd.CommandText;
        string sql = string.Format("delete from {0} where {1} in ({2})", tableName, primaryKey, toDelete);

        origCmd.CommandText = sql;

        bool openedConn = false;

        if (origCmd.Connection.State != ConnectionState.Open)
        {
            openedConn = true;
            origCmd.Connection.Open();
        }

        try
        {
            return origCmd.ExecuteNonQuery();
        }
        finally
        {
            if (openedConn)
            {
                origCmd.Connection.Close();
            }
        }

    }
}
}

Post in the comments and let me know how it works for you!

Thursday, April 3, 2008

The REAL Current Directory in .NET

I've got this little open source C# app I wrote called PowerFlag. PowerFlag is a response to the unending plethora of spammers on Craigslist. Now, I love Craigslist but the spammers really get on my nerves. So I wrote this app to automatically flag as spam posts which contain keywords that the user defines. The app worked pretty great except for the fact that when I tried to run it through the Windows Scheduler it seemed to run fine judging by the logs but it never seemed to find any posts to flag. For instance if I ran the app by double-clicking it and clicked the Flag! button in the app it would find maybe 10 or so spam posts and flag them. This would be evident via a text box in the app and the log file that is created on every run. If I right-clicked the scheduled task and chose 'Run' though the log file would be created but it would say that no posts were found to flag. I lived with it this way for a couple of months before today I finally got around to figuring out what the root of the problem was and how to fix it. My problem was that the app uses two settings files and the definitions for the filepaths for those two files were stored in string constants which concatenated together System.Environment.CurrentDirectory with the file names of the files. By adding more logging I was able to see that when the app was run via the task scheduler the value of System.Environment.CurrentDirectory was not the directory the app executable was in but instead was for some reason c:\windows\system32. Upon further investigation I found different settings files for the app in this location as well. These settings files contained the initial settings that the program uses on first startup and did not contain all of the keywords I had defined using the app itself. The fix for this problem of System.Environment.CurrentDirectory not always being the same as the current directory of the executing app is to instead use this: System.IO.Path.GetDirectoryName(System.Reflection.Assembly._ GetExecutingAssembly().Location) This is pretty much guaranteed to always give you the path to the folder containing the current executable. Once I made this change my log files overflowed with flagged posts from the scheduled app. Voila!

Tuesday, March 11, 2008

How To Copy A SqlServer 2005 Database

So today I started a new project here at the office. I'm going to update our homebuilt CMS to include some new functionality that our sites require. The changes I plan to make will require changes to the database structure. Now the current version of our app's database is running on SqlServer 2005. Since other developers will continue work on the old app while I work on the new one it was necessary for me to create a copy of the database and do my development against this copy. The problem was that every time I need to make a copy of a database I have a hard time remembering the right way to go about it. Searching on Google was no help. I got links to a bunch of 3rd party crapware and out of date MSDN articles. None of which helped. Eventually I figured out a good way to do it and figured I'd post it here for anyone else who might need to do this. What I'm describing here is a way to create a duplicate of an existing database on the same server as the existing db. This method uses Sql Server Management Studio Express. I'm sure the full version would work as well. I imagine a similar procedure would work across servers too. The first step I did was to create a new database on the server which would be the duplicate db. For the purposes of this article let's assume that my original db is called 'Prod' and my new db is called 'Dev'. So I right clicked on the 'Databases' folder in SSMSE (Sql Server Management Studio Express), chose 'New Database . . .' and created a new db called 'Dev'. Next up I right-clicked on the 'Prod' db (the one I want to duplicate) and chose 'Tasks -> BackUp . . .' I created a full backup of the db just to make sure I had the latest data. Once the backup had completed I then right-clicked on the 'Dev' database and chose 'Tasks -> Restore -> Database . . .' On the 'General' page I specified 'Prod' as my 'From database' and 'Dev' as my 'To database'. Note that when you select the 'From database' the 'To database' value gets changed to that value. You need to make sure that the 'To database' value is actually the name of the duplicate database. Next I clicked over to the Options page in the Restore dialog. Here I had to change the 'Restore As' names of the files in the file listing. By default they are the same as the 'Original File Name' values. But that would cause the backup to overwrite the original database. What we want is to change the 'Restore As' names to be the names of the duplicate database's files. In this example I changed these two entries: C:\Program Files\Microsoft SQL Server\MSSQL\data\Prod.mdf C:\Program Files\Microsoft SQL Server\MSSQL\data\Prod_log.ldf To: C:\Program Files\Microsoft SQL Server\MSSQL\data\Dev.mdf C:\Program Files\Microsoft SQL Server\MSSQL\data\Dev_log.ldf Then, still on the Options page of the 'Restore Database' dialog, I checked the 'Overwrite the existing database' checkbox. Next just hit the 'OK' button and let it do its thing. The end result should be a duplicate of your original database! Hope this helps some of you out there. So first I created a new, empty db called 'Dev'.

Friday, February 8, 2008

Linq, LinqDataSource and Connection Strings

Today I figured something out that might be obvious to some but caused me confusion for a good amount of time. What I figured out is where the default constructor for a DataContext class gets the connection string it uses to connect to a database. You're probably already aware that when you create a new LINQ To SQL item in a Visual Studio 2008 project a custom DataContext derived class is generated by the SqlMetal tool (this happens invisibly in VS.) This DataContext has numerous constructors. One for example takes a connection string as an argument. A DataContext object created in this way will use the connection string supplied when it needs to access the database. In fact since I started using LINQ I have been relying heavily on this very constructor. I was using it despite the fact that I knew that there was a default constructor on the generated DataContext as well. The reason for this was that I wasn't really clear on where exactly the default constructor was getting its connection string value from. I did see that if I used the default constructor the data context would just automatically connect to the database I used to create the Linq To Sql file but that was all I knew. And since I do this for a living and most of the applications I develop are developed against a devel db and then deployed against a producton db I just sort of figured that the proper way to handle things would be to always use the previously mentioned DataContext constructor that takes a connection string argument. I then created a little utility method called GetDataContext that would read a connection string out of the app or web.config file and use that as the argument to the DataContext's constructor, passing back the resulting object. This method served me pretty well for a couple of months. But this morning it went pear-shaped on me. I was deploying a new LINQ-based web application to our production server. Once up there I tried to load the first page in my browser. To my surprise instead of the page I expected I got an error message that the SQL Server couldn't be reached. I checked and double checked my connection string value in my web.config to make sure it was pointing at the right place. It was. I verified that the same code ran fine in our test environment. It did. "What," I wondered, "could possibly be the problem?" My next step was to add some logging around that GetDB method. I told it to log the connection string it was using. I compiled and posted the new binaries and tried to access my page again. Again I got the error so I looked in the logs. Oddly enough there was no entry in the log at all for this last request. But I knew the page used the database to display a list. How was it that my GetDB method wasn't getting called? Then it dawned on me. For this particular list I'd decided to try out the new LinqDataSource object. I hadn't yet really worked with the LinqDataSource and I'd figured this would be a good opportunity. Now that I remembered using the LinqDataSource it occurred to me that I really didn't know where it was getting its connection string from. I knew that under the hood it must be creating an instance of my DataContext object. And I suspected it was using the default constructor. But how could I get it to use the proper connection string? First I looked at the properties on the LinqDataSource. There didn't appear to be anything there that would let me control the connection string. I then decided to reexamine my LINQ project and try to figure out where the default constructor's connection string was coming from. This led me to realize that the connection string was defined as a member of the Properties.Settings class and therefore the value for it was in the ConnectionStrings section of an app.config file in my LINQ project. So in order to get my application to work all I had to do was copy the connection string 'add' directive from the app.config in my LINQ project to the web.config of my web project and then update the connectionString value to the proper one for my production envirionment. Loaded up my page again and it worked like a charm. And now that I know this I'm going back and fixing all my LINQ applications to use the default contsructor. No more GetDB methods for me!