High-Speed Blog
Technology Moves as Fast as We Move It by Brian Buikema
Jan 28th
Scaling is a key feature of MongoDB. And even though manual sharding is supported by most databases, MongoDB supports the concept of autosharding. This 15 minute high speed post provides a detailed overview of autosharding in MongoDB and, specifically, how to create shards supporting autosharding in MongoDB.
The process of splitting up data and storing portions of data on different machines is called sharding. By splitting up data across machines, it becomes possible to store more data and handle much more load without requiring large or powerful machines, e.g., machines with powerful CPUs and/or massive amounts of RAM.
Two types of sharding can occur: manual sharding and autosharding.
In manual sharding, the application code manages storing different data on different servers and querying the appropriate server to get it back. Manual sharding can be done with virtually any database software package.
In MongoDB autosharding, some of the administrative overhead required in manual sharding is eliminated. The cluster of database servers, or shards, handles splitting up of data and rebalancing of data automatically.
The basic concept behind MongoDB’s sharding is to break up collections into smaller chunks, each holding a range of documents. These chunks can be distributed across shards so that each shard is responsible for a subset of the total dataset.
As an example, consider the following. When you set up sharding you choose a key from a collection and use that key’s values to split up the data. This key is called the shard key.
Suppose we had a collection of contacts. If we chose “lastName” as our shard key, one shard could hold documents where “lastName” starts with A-F, the next shard could hold last names from G-P, and the final shard could hold last names Q-Z. As you add or remove shards, MongoDB would rebalance this data so that each shard was getting a balanced amount of traffic and a practical amount of data.
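To make the contacts example concrete, here is a minimal Python sketch of range-based placement by shard key. The shard names and the fixed A-F, G-P, Q-Z boundaries are assumptions taken from the example above. MongoDB chooses and moves its own ranges, so treat this as an illustration of the idea and not of MongoDB’s internals.

```python
from bisect import bisect_right

# Hypothetical ranges from the contacts example:
# shard0 holds A-F, shard1 holds G-P, shard2 holds Q-Z.
LOWER_BOUNDS = ["A", "G", "Q"]            # first letter accepted by each shard
SHARDS = ["shard0", "shard1", "shard2"]


def shard_for(last_name):
    """Return the shard whose range contains the first letter of last_name."""
    first = last_name[:1].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError("expected a last name starting with a letter A-Z")
    # bisect_right counts the lower bounds that are <= first;
    # subtracting 1 turns that count into the index of the owning shard.
    return SHARDS[bisect_right(LOWER_BOUNDS, first) - 1]


contacts = ["Adams", "Garcia", "Patel", "Quinn", "zhang"]
placement = {name: shard_for(name) for name in contacts}
```

Rebalancing amounts to changing the boundaries and moving the affected documents, which is the work autosharding takes off your hands.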
So when should you decide to start sharding? Consider the following reasons:
- You’ve run out of disk space on your current machine.
- You want to write data faster than a single mongod can handle.
- You want to keep a larger portion of data in memory to improve performance.
Three different components are involved in sharding as follows:
A shard is a container that holds a subset of a collection’s data. A shard is either a single mongod server (for development/testing), or a replica set (for production).
mongos is the router process. It routes requests and aggregates responses. It doesn’t store any data or configuration information, although it does cache information from the config servers.
Config servers store the configuration of the cluster, for example, which data is located on which shard. This configuration is used by mongos to determine request routing.
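How the three components cooperate can be sketched with a toy model in Python. Everything here is hypothetical (the class names, the numeric key ranges, the in-memory lists standing in for shards). It only illustrates the division of labor: the config holds the range map, the router caches it and holds no data, and the shards hold the documents.

```python
class ConfigServer:
    """Holds cluster metadata only: which key range lives on which shard."""

    def __init__(self, ranges):
        # ranges: list of (low, high, shard_name); low inclusive, high exclusive
        self.ranges = ranges


class Router:
    """Stands in for mongos: stores no data, caches the config, routes requests."""

    def __init__(self, config, shards):
        self.cached_ranges = list(config.ranges)   # cached copy of the metadata
        self.shards = shards                       # shard_name -> list of documents

    def _shard_for(self, key):
        for low, high, name in self.cached_ranges:
            if low <= key < high:
                return name
        raise KeyError(key)

    def insert(self, doc):
        self.shards[self._shard_for(doc["_id"])].append(doc)

    def find_by_id(self, key):
        # Targeted query: only the shard that owns the key is asked.
        shard = self.shards[self._shard_for(key)]
        return next((d for d in shard if d["_id"] == key), None)

    def find_all(self):
        # Scatter-gather: ask every shard, then aggregate the responses.
        results = []
        for name in sorted(self.shards):
            results.extend(self.shards[name])
        return sorted(results, key=lambda d: d["_id"])


config = ConfigServer([(0, 100, "shard_a"), (100, 200, "shard_b")])
shards = {"shard_a": [], "shard_b": []}
router = Router(config, shards)
for i in (5, 150, 42):
    router.insert({"_id": i})
```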
First we need to start up our config server and mongos. We need to start a config server because mongos uses it to get its configuration.
$ mkdir -p ~/dbs/config
$ ./mongod --dbpath ~/dbs/config --port 20000
Now we start a mongos process for an application to connect to. Routing servers do not even need a data directory, but they need to know the location of the config server.
$ ./mongos --port 30000 --configdb localhost:20000
Shard administration is always done through a mongos.
Start a normal mongod instance (or replica set), since this is what a shard naturally is:
$ mkdir -p ~/dbs/shard1
$ ./mongod --dbpath ~/dbs/shard1 --port 10000
Now connect to the mongos process started earlier and add the shard to the cluster.
First, start up a shell connected to the mongos process as follows:
$ ./mongo localhost:30000/admin
Now add this shard with the addshard database command:
> db.runCommand({addshard : "localhost:10000", allowLocal : true})
{
"added" : "localhost:10000",
"ok" : true
}
The “allowLocal” key is necessary only if you are running the shard on localhost and lets MongoDB know that you’re in development and know what you are doing.
In order to allow MongoDB to distribute data, you have to explicitly turn sharding on at both the database and collection levels. For example, the following enables sharding for the database acme:
> db.runCommand({"enablesharding" : "acme"})
Once you’ve enabled sharding on the acme database, a collection is sharded by running the shardcollection command as follows:
> db.runCommand({"shardcollection" : "acme.products", "key" : {"_id" : 1}})
Now the collection will be sharded by the “_id” key. When data is added to acme.products, it will be automatically distributed across the shards based on the values of “_id”.
I hope this post enlightens you on the possibilities that MongoDB’s auto-sharding feature provides for ease of scaling.
Jan 26th
MongoDB is becoming the de facto standard for NoSQL databases that are schemaless, true document repositories. Companies developing state-of-the-art applications, from enterprises to small start-ups, are embracing MongoDB for its simple approach to developing schemaless databases that provide
A general set of MongoDB supported Use Cases, both well suited and less well suited are found here.
It should be noted that MongoDB shares the NoSQL spotlight with CouchDB. Since MongoDB and CouchDB are both document-oriented databases with schemaless JSON-style object data storage, folks are naturally asking questions. Please click here to compare MongoDB and CouchDB.
Here is a list of MongoDB features. Click on the links for details.
MongoDB was not created to be just another database that tries to do everything for everyone.
Instead, MongoDB was created to work with documents rather than rows, and to be extremely fast, massively scalable, and easy to use.
In order to accomplish this, some features were excluded, namely support for transactions. Therefore, MongoDB may not be a great fit for developing accounting applications. However, it is quite common to design software architectures around a hybrid of technologies. For example, a relational database works well for accounting and transactional components, while MongoDB is a great technology for storing and retrieving complex data and documents.
At a high-level, MongoDB provides the following (this is not an exhaustive list):
MongoDB is a document-oriented database. It is not a relational one. The primary reason for moving away from the relational model to the document-oriented model is to make scaling out easier, but additional advantages are found as well.
The fundamental idea is replacing the concept of a row with the more flexible model, the document. By supporting embedded documents and arrays, the document-oriented approach makes it possible to represent complex hierarchical relationships with a single record. This fits naturally with how data needs are addressed by developers using modern object-oriented languages.
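As a quick illustration of the single-record idea, here is a sketch in Python, where a document maps naturally onto a dict. The field names and values are made up for the example. The point is only that the embedded author document and the array of comments travel together as one record instead of being spread across several joined tables.

```python
import json

# A hypothetical blog post stored as one document: the author is an embedded
# document, the tags are an array, and the comments are an array of
# embedded documents.
post = {
    "title": "Autosharding in MongoDB",
    "author": {"name": "Brian", "site": "High-Speed Blog"},
    "tags": ["mongodb", "sharding"],
    "comments": [
        {"by": "reader1", "text": "Nice overview."},
        {"by": "reader2", "text": "Thanks!"},
    ],
}

# The whole hierarchy serializes and comes back as a single record.
serialized = json.dumps(post)
restored = json.loads(serialized)

# Nested data is reached by walking the structure, with no joins involved.
comment_authors = [c["by"] for c in restored["comments"]]
```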
More coming soon!
I highly recommend installing MongoDB through Ubuntu’s package repositories. The default repositories contain MongoDB, but may contain out-of-date versions. Software installation from repositories is done through aptitude. In order to obtain the latest versions of MongoDB, simply add the following line to your repository list found in /etc/apt/sources.list:
deb http://downloads.mongodb.org/distros/ubuntu 10.4 10gen
where 10.4 is the version of Ubuntu I run. Make sure to supply the version of Ubuntu you run.
Next, you need to tell aptitude to retrieve the new repositories as follows:
$ sudo aptitude update
And now install MongoDB using the following command:
$ sudo aptitude install mongodb-stable
As the last step in the installation process, you will need to create the data directory yourself. Make sure to run these commands from your non-root user account, so that chown hands the directory to that user.
$ sudo mkdir -p /data/db/
$ sudo chown `id -u` /data/db
You can start the MongoDB service as follows, and again, make sure to do this as a non-root user:
$ sudo start mongodb
$ sudo status mongodb
And start the MongoDB shell with the following command:
$ mongo
Once in the MongoDB shell, some basic commands include the following:
> show dbs
> show collections
> show users
> use <db name>
Here is a quick reference to the commands that control the execution of the mongod server process:
$ sudo status mongodb
$ sudo stop mongodb
$ sudo start mongodb
$ sudo restart mongodb
Upon starting the MongoDB shell, you may get an error message connecting to the server similar to the following:
Error: couldn’t connect to server 127.0.0.1} (anon):1137
You may also notice that even though you issued the command to start the service, it actually did not start. The next step is to verify that you are unable to start MongoDB.
Run sudo start mongodb. It will report
mongodb start/running, process XXXX
no matter what. However, when you run sudo status mongodb again, you’ll get
mongodb stop/waiting instead of mongodb start/running
Note: This condition is largely due to an unclean shutdown, and results in the creation of a lockfile /var/lib/mongodb/mongod.lock.
The fix is a quick two-step process: remove the lock file, then run a repair as the mongodb user:
$ sudo rm /var/lib/mongodb/mongod.lock
$ sudo -u mongodb mongod -f /etc/mongodb.conf --repair
Now when you run sudo start mongodb, it will report
mongodb start/running, process XXXX
and when you run sudo status mongodb again, you’ll get the expected
mongodb start/running, process XXXX
A MongoDB database is non-relational and schemaless. Therefore a MongoDB database is not bound to columns and data types like relational databases are. One of the primary benefits of a flexible schemaless design is that you are not restricted when programming in a dynamically typed programming language such as Python. Keep in mind that even though MongoDB is schemaless, the data structure is not completely devoid of schema as you still define collections and indexes, as we will discuss later. Nevertheless you will not need to predefine any data structure for any of your documents you will be adding.
The fundamental components of a MongoDB database are documents and collections.
A document is an item that contains the actual data, similar to a row in a SQL database table.
A collection is a group of documents, similar to a table in a SQL database. Unlike a SQL database, two or more completely different documents can co-exist in a single collection.
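To see what “completely different documents in a single collection” means in practice, here is a small stand-in written in plain Python. The list plays the role of a collection and each dict plays the role of a document. The find helper is hypothetical and only mimics equality matching; it is not the MongoDB query engine.

```python
# A collection modeled as a plain list; each document is a dict.
# The two documents share no fields other than _id, yet live side by side.
items = [
    {"_id": 1, "Title": "Test Data", "Value1": "1"},
    {"_id": 2, "name": "Ada", "emails": ["ada@example.com"]},
]

_MISSING = object()  # sentinel so a missing field never equals a real value


def find(collection, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field, _MISSING) == value
               for field, value in criteria.items())
    ]


by_name = find(items, name="Ada")
by_title = find(items, Title="Test Data")
```

No table definition had to change to store the second document, which is the flexibility the schemaless design buys you.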
More coming soon!
Python is a simple programming language that provides the natural ability to develop code that is perfectly readable. Here are some links to Python code examples:
In this section, we will develop simple, clear, and powerful code that works with MongoDB through the Python driver known as the PyMongo driver.
Before we begin to write Python code to access MongoDB databases, we first need to install the PyMongo driver.
I’m assuming you have Python 2.7 or greater up and running. The steps for Python installation are simple once you obtain the source from python.org/download and extract the contents from the tar file, and are commonly listed as follows:
$ ./configure
$ make
$ make test
$ sudo make install
Of course, you should always consult the readme file to obtain up-to-date instructions.
The steps required to install the PyMongo driver are as follows:
Obtain the Python setuptools egg for your version of Python. For example, I obtained setuptools-0.6c11-py2.7.egg from http://pypi.python.org/pypi/setuptools#using-setuptools-and-easyinstall.
Now execute the downloaded egg as if it were an actual shell script by typing the following (note that your setuptools egg filename may be different depending on the version of Python you are using):
$ sudo sh setuptools-0.6c11-py2.7.egg
$ sudo easy_install pymongo
That’s it! But let’s test to ensure that our installation was a success.
From the Python shell, type the following:
>>> import pymongo
You should be greeted with an empty prompt (>>>). That’s success.
To take it one step further, we’ll insert data into a database and also retrieve it back. Remember that in MongoDB, if we try to use a database that does not exist, MongoDB creates it for us automatically on the first insert, as is the case with mytestdb below.
Type the following in the Python shell:
>>> from pymongo import Connection
>>> c = Connection()
>>> db = c.mytestdb
>>> collection = db.items
>>> item = { "Title" : "Test Data", "Value1" : "1", "Value2" : "2"}
>>> collection.insert(item)
>>> collection.find_one()
{u'_id': ObjectId('4d432adc1d41c85d8a000000'), u'Value1': u'1', u'Value2': u'2', u'Title': u'Test Data'}
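If you are on a newer PyMongo, note that Connection was replaced by MongoClient (PyMongo 3.0 and later) and insert() by insert_one() (required as of PyMongo 4.0). Here is a sketch of the same round trip in that style. The save_and_fetch helper name is my own, and the demo function assumes the pymongo package is installed and a mongod is listening on the default localhost:27017.

```python
def save_and_fetch(collection, item):
    """Insert one document, then read it back by its generated _id."""
    result = collection.insert_one(item)            # newer form of insert()
    return collection.find_one({"_id": result.inserted_id})


def demo():
    # Assumption: pymongo is installed and mongod runs on localhost:27017.
    from pymongo import MongoClient                 # newer form of Connection

    client = MongoClient()
    items = client.mytestdb.items                   # created on first insert
    item = {"Title": "Test Data", "Value1": "1", "Value2": "2"}
    print(save_and_fetch(items, item))
```

Call demo() from a Python shell once your server is up. Looking the document up by inserted_id, instead of a bare find_one(), guarantees you get back the document you just wrote even when the collection already holds others.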
Now that you have the PyMongo driver installed and running, it is time to learn the basics in developing Python code that works with MongoDB.
Instead of reinventing an already great tutorial, I provide this link to the tutorial located on the mongodb.org site.
Enjoy.
Since sharding in MongoDB is a key feature and large topic to discuss, I created a separate post found here. This 15 minute high speed post provides a detailed overview of autosharding in MongoDB and, specifically, how to create shards supporting autosharding in MongoDB.
Enjoy.
Dec 3rd
Part 1 discussed designing and executing advanced performance testing for the deployment of Microsoft CRM Dynamics solutions.
This post discusses how to ensure your production CRM environment is tuned and optimized for top performance.
My knowledge in this area has been largely obtained by working closely with Robert Shurtleff, the Microsoft CRM Dynamics Architect at Ciber.
Therefore I supply a link to Robert’s post on this subject.
Enjoy!
Dec 3rd
MS CRM Dynamics can be configured and customized per specific business needs. However, neglecting performance testing can kill your CRM deployment.
Conversely, administering some common advanced performance tactics targeted at your specific usage can upgrade performance metrics to well above acceptable standards.
More coming soon.
Nov 23rd
If you’re developing a mobile app that requires an event reminder feature, or are curious to expand your toolbox of development knowledge, this post is for you. As the mobile app industry is still in its infancy, I find developing mobile apps challenging and fun, and it brings out the innovator in all of us.
In this post, I will
Note: Since a full working application is not provided, I leave it to the reader to wrap this feature within the appropriate activities and provide a user friendly interface to manage event reminders.
Event reminders simply remind us of an upcoming event. The most common events are birthdays, holidays, and meetings.
Most of us use a calendaring tool, such as Google Calendar or MS Outlook, to set event reminders from our laptops and desktops. We are reminded of an event via an alert, usually displayed as a dialog box and possibly accompanied by a sound, to get the user’s attention.
Event reminders are easily added, and allow us to specify a period of time in advance of the event, to alert us of the upcoming event.
In this post, we are interested in moving the event reminder feature into a mobile app. At this time, we are not interested in sync’ing event reminders to an existing calendaring tool. That is, we are only interested in developing a stand-alone event reminder feature that runs on a mobile device. Our requirements are as follows:
To effectively implement event reminders, components providing the following high-level functionality are necessary:
Consider the following Class Diagram depicting a design of the event reminder feature:
Note that the EventReminderAlarmService, EventReminderAlarmReceiver, and EventReminderAlertCheckTask are required to properly service and broadcast alerts using the Android platform. Each inherits functionality as follows:
Also, the AlertNotificationHelper class contains simple methods that provide a facade to the Android notification features. When we need to alert the user of the occurrence of an event reminder, we use this class to make a sound, flash the LEDs, and vibrate the device.
The EventReminderAlarmService class is a true Android Service that periodically checks for the occurrences of event reminders. The actual occurrence checks are done in the EventReminderAlertCheckTask class. An EventReminderAlert instance is created for each event reminder occurrence found.
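Since the check task itself is not listed in this post, here is a language-neutral sketch, written in Python for brevity, of what an occurrence check could look like. The function name, the tuple layout, and the rule (alert once the lead-time window opens and until the event starts) are my assumptions for illustration, not the actual EventReminderAlertCheckTask logic.

```python
from datetime import datetime, timedelta


def due_reminders(events, now, already_alerted=()):
    """Return names of events whose reminder window is open at `now`.

    events: iterable of (name, starts_at, lead_time) tuples, where lead_time
    is how far in advance of the event the user wants to be alerted.
    """
    due = []
    for name, starts_at, lead_time in events:
        if name in already_alerted:
            continue                      # do not alert twice for the same event
        if starts_at - lead_time <= now < starts_at:
            due.append(name)
    return due


events = [
    ("Birthday", datetime(2010, 11, 25, 9, 0), timedelta(days=1)),
    ("Meeting", datetime(2010, 11, 23, 15, 0), timedelta(minutes=30)),
    ("Holiday", datetime(2010, 12, 25, 0, 0), timedelta(days=2)),
]
now = datetime(2010, 11, 24, 10, 0)
due_now = due_reminders(events, now)
```

Each name returned would correspond to one alert handed to the notification helper.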
The EventReminderAlarmService class is as follows:
import android.app.AlarmManager;
import android.app.PendingIntent;
import android.app.Service;
import android.content.Context;
import android.content.Intent;
import android.os.AsyncTask;
import android.os.IBinder;
import android.os.SystemClock;

// Schedules a repeating alarm and kicks off the check for due event reminders.
// IDebugSwitch, User, Logon, and EventReminderAlertCheckTask are application
// classes that are not listed in this post.
public class EventReminderAlarmService extends Service implements IDebugSwitch {
    public static final String NEW_ALERTS_FOUND = "New_Alerts_Found";

    private AlarmManager alarms;
    private PendingIntent alarmIntent;
    private EventReminderAlertCheckTask alertCheckTask = null;

    @Override
    public int onStartCommand(Intent intent, int flags, int startId) {
        // todo: alertsEnabled should be read as a user preference/setting
        boolean alertsEnabled = true;
        long updateFreq = 24*60*60*1000; // 24 hours, in milliseconds

        if (alertsEnabled) {
            // Wake the device and fire the alarm intent once per updateFreq.
            int alarmType = AlarmManager.ELAPSED_REALTIME_WAKEUP;
            long timeToRefresh = SystemClock.elapsedRealtime() + updateFreq;
            alarms.setRepeating(alarmType, timeToRefresh, updateFreq, alarmIntent);
        }
        else {
            alarms.cancel(alarmIntent);
        }

        refreshAlerts();
        return Service.START_NOT_STICKY;
    }

    @Override
    public void onCreate() {
        alarms = (AlarmManager)getSystemService(Context.ALARM_SERVICE);
        // The broadcast fired by the alarm is picked up by EventReminderAlarmReceiver.
        String ALARM_ACTION = EventReminderAlarmReceiver.ACTION_REFRESH_ALERT_ALARM;
        Intent intentToFire = new Intent(ALARM_ACTION);
        alarmIntent = PendingIntent.getBroadcast(this, 0, intentToFire, 0);
    }

    @Override
    public IBinder onBind(Intent arg0) {
        // This service is started, not bound, so there is no binder to return.
        return null;
    }

    // Starts a new check unless one is already in flight.
    private void refreshAlerts() {
        if (alertCheckTask == null ||
                alertCheckTask.getStatus().equals(AsyncTask.Status.FINISHED)) {
            alertCheckTask = new EventReminderAlertCheckTask();
            User user = Logon.retrieveUser(getApplicationContext());
            alertCheckTask.execute(user);
        }
    }
}
And the EventReminderAlarmReceiver class is as follows:
import android.content.BroadcastReceiver;
import android.content.Context;
import android.content.Intent;

// Receives the repeating alarm broadcast and (re)starts the alarm service.
public class EventReminderAlarmReceiver extends BroadcastReceiver {
    public static final String ACTION_REFRESH_ALERT_ALARM = "com.iqspike.eventreminder.service.ACTION_REFRESH_ALERT_ALARM";

    @Override
    public void onReceive(Context context, Intent intent) {
        Intent startIntent = new Intent(context, EventReminderAlarmService.class);
        context.startService(startIntent);
    }
}
It is important to note that some entries are required in the AndroidManifest.xml file as follows:
<service android:enabled="true"
    android:name="com.iqspike.eventreminder.service.EventReminderAlarmService"/>
<receiver android:name=".EventReminderAlarmReceiver">
    <intent-filter>
        <action android:name="com.iqspike.eventreminder.service.ACTION_REFRESH_ALERT_ALARM"/>
    </intent-filter>
</receiver>
More coming soon!
Oct 22nd
In my development of mobile apps, I have come across several cases that could benefit from a server-side backup-restore process as follows:
NOTE: #3 is beyond the scope of this discussion and is left to the reader to adapt and enhance this more general server-side backup-restore process.
Some quick examples of these are the following.
I’m writing this post since I recently required a server-side backup-restore process for a mobile application I’m developing, and I’m betting many others can benefit from this discussion, including the example client and server-side code. Android is my target mobile platform, and ASP.NET MVC, using RESTful services, is my server-side platform of choice.
For this exercise, I will create a file of JSON-serialized records at the client. Each record represents one business entity, e.g., an address record, a contact record, a note record, etc.
A RESTful service accepts the file, the version of the user records, and the user’s Id, and stores the file on the server appropriately named to associate it to the user.
NOTE: Another option for storing records on the server is to store them as key-value pairs, suitable for storage in a document repository such as MongoDB.
The UserRecords is a self-contained entity that actually does most of the work for us. It is a full-fledged business object and DTO. We will start with the basics, and develop more as we go along. Our primary properties are as follows:
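public class UserRecords {
    // Id of the user who owns the backup
    public long UserId = -1;
    // The file of JSON-serialized records, as raw bytes
    public byte[] UserRecords = null;
    // Version of the user records
    public String RecordVersion = "1.0";

    public UserRecords() {
    }
}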
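To show the shape of the data before we build out the real client and server code, here is a rough Python sketch of packing records into such a payload and unpacking them again. The one-JSON-object-per-line layout and the two function names are assumptions I made for the sketch. Only the three field names come from the UserRecords class.

```python
import json


def build_backup(user_id, records, record_version="1.0"):
    """Serialize records, one JSON object per line, and wrap them with metadata."""
    body = "\n".join(json.dumps(record, sort_keys=True) for record in records)
    return {
        "UserId": user_id,
        "RecordVersion": record_version,
        "UserRecords": body.encode("utf-8"),   # the file contents as raw bytes
    }


def restore_records(backup):
    """Reverse build_backup: decode the bytes and parse each line as JSON."""
    text = backup["UserRecords"].decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


records = [
    {"type": "contact", "name": "Ada"},
    {"type": "note", "text": "call back"},
]
backup = build_backup(42, records)
restored = restore_records(backup)
```

Keeping the version next to the bytes lets the server (or a later client release) decide how to read an older backup.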
More coming soon!
Sep 26th
If you have been developing mobile apps for the last year, two things stick out that should make you pay special attention to your architectures. They are:
1) Saturation of wireless carriers’ networks, resulting in decreased throughput, a.k.a. a s-l-o-w network.
2) The increase in speed of the processors powering our mobile devices to over 1 GHz and quickly rising.
More coming soon!
Sep 7th
I am quite excited that Samsung is going to release the first Android tablets in the coming weeks. I consider myself a huge proponent of the Android platform and own a Droid. In fact, you can read all of my dozens of Android posts found here.
With that said, I’m very upset at the prospect of being required to purchase another two-year contract with a wireless carrier in order to have wireless data services, a.k.a. a data plan. PCWorld has an interesting article on this found here.
You ask, what choices do you really have? The first that comes to mind is Tethering. The second is to purchase a Wi-Fi only tablet when available, thus not requiring a data plan.
Let’s look at the Pros and Cons of owning both a Smartphone and Tablet.
You will be able to have two slightly different, slightly similar devices to choose from depending on your immediate needs. For example, smartphones are more compact than tablets.
Maybe we will start using the actual “phone” feature on our smartphones again.
And do more reading, typing, and gaming on our tablets.
You may have to buy two separate data plans, similar to the 2-year agreements we’re currently accustomed to.
Now you will own two quickly obsoleting devices.
You will have to learn how to use two different devices. Of course, if both are Android platforms, you should have little or no ramp-up time learning to use your devices.
I’ve had this discussion with a dozen or so folks. I want to purchase a powerfully equipped smartphone, complete with phone and data plans, and be able to purchase “touchscreen-only” tablets that come in a variety of sizes, e.g., 7″, 10″, etc., that cost no more than $99 and provide a docking mechanism for my smartphone.
In this way, my smartphone provides the CPU, memory, and power supply for the “touchscreen-only” tablet. And I, as a consumer, am not required to purchase a separate data plan for my tablet. How cool is that!
Additionally, I am free to buy “touchscreen-only” tablets as often as I choose, since they are relatively cheap at around $99 each. I’m betting that the market will flood with a variety of tablets, and consumers will want to buy-and-try these at will.
I am really interested in everyone’s thoughts on this, and am sincerely asking that you comment on this idea.
I hope you find this interesting and innovative. Remember, we are in the infant stage of all things mobile, and we will see lots of innovation and changes in the coming months.
Aug 23rd
In this post, we will develop the code for our location-based mobile application. Remember we designed our application in Part 2, and now we are ready to tackle the code!
More coming soon!
Aug 14th
In Part 1, we discussed how to get started using Google’s Places API, including all necessary registrations. We also touched lightly on the differences between Place searches and Place details.
Now let’s actually create something fun and useful. First we need an idea. And I have a good one!
Let’s quickly identify some core location-based features. These features are made possible by the ability of smart phones to track their GPS location and use GPS coordinates within applications to correlate user location to people and places in proximity to that location. Generally, the following is a list of such features:
Now, real innovation comes from your ability to enhance those features. For example, let’s say your smart phone contains an application that alerts you to store specials as you come within proximity to participating stores. So one day you are driving by and a proximity alert alerts you to a store special (or discount) at your favorite store. But you don’t have time, and drive on by.
Now let’s say you drive by the same store on your return home. Since you are within proximity to that store again, you get another proximity alert. However, this time the application somehow knows you were nearby about an hour ago, and doubles the discount offered earlier. You are excited and drive straight to the store.
That’s real innovation enhancing the core location-based features we discussed earlier.
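The “somehow knows” part can be as simple as remembering when the last alert for each store fired. Here is a small Python sketch of that rule. The function name, the 10 percent base discount, and the two-hour window are placeholders I picked for illustration; a real campaign would supply its own values.

```python
from datetime import datetime, timedelta


def discount_for(store_id, now, last_alerts, base_discount=10,
                 window=timedelta(hours=2)):
    """Return the percent discount to offer and record this alert.

    last_alerts maps store_id -> time of the previous proximity alert.
    If the consumer was alerted for this store within `window`,
    the discount is doubled.
    """
    previous = last_alerts.get(store_id)
    last_alerts[store_id] = now               # remember this pass for next time
    if previous is not None and now - previous <= window:
        return base_discount * 2
    return base_discount


alerts = {}
morning = datetime(2010, 8, 14, 17, 0)
first_pass = discount_for("store-1", morning, alerts)
return_trip = discount_for("store-1", morning + timedelta(hours=1), alerts)
```

Here the first pass yields the base discount and the return trip an hour later yields double, which is the scenario described above.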
As developers, we tend to learn new concepts and techniques best through examples. Given the example we discussed earlier in enhancing the core location-based features, let’s develop this into a working mobile application using the Android platform. To review, we stated the following:
For example, let’s say your smart phone contains an application that alerts you to store specials as you come within proximity to participating stores. So one day you are driving by and a proximity alert “alerts” you to a store special (or discount) at your favorite store. But you don’t have time, and drive on by.
Now let’s say you drive by the same store on your return home. Since you are within proximity to that store again, you get another proximity alert. However, this time the application somehow “knows” you were nearby about an hour ago, and doubles the discount offered earlier. You are excited and drive straight to the store.
In order to tackle this, we need to better understand our requirements and end goals.
A great place to start is defining the Use Cases. Of course we need to identify our key Actors first. I see the following:
Per Actor, the Use Cases are as follows:
Let’s identify some considerations and assumptions to make our development simple and well-defined.
We will track retailers and campaigns, and consumer proximity alert activities on the server using RESTful services. We’ll generate a quick report of customer proximity alert activities by querying the database and returning results in either JSON or XML.
We will assume a standard campaign with simple rules used by all retailers and exclude a full-blown retailer campaign management process. We will assume two or three registered retailers and exclude a full-blown retailer registration process.
And we will assume store specials/discounts never expire!
Now did you think we were done? We need to think about how we should manage the actual redemption of a reward. We can’t assume the consumer will just click a button. Therefore we need a website for the retailer to access and manage consumer reward redemptions and basic reporting needs.
And lastly, you may wonder what technology I have chosen for the server-side. It is ASP.NET MVC, as it makes it easy to create RESTful services supported by POCOs as Business Objects, working with DTOs and Repositories. Click here to review my previous post for more information on POCOs as Business Objects, DTOs, and Repositories.
Per our Considerations above, we will define our data model (or entity diagram) as follows:
Note that there is no relationship between Retailers and Campaigns. Normally one would exist (e.g., one-to-many), but we are assuming that all retailers are using the same campaign since we are excluding the retailer campaign management process per our considerations.
Our mobile application is considered a client that requires services to provide the complete system functionality. Remember that system functionality is required by both retailers and consumers, and all retailer relevant information is stored on the server. Therefore, we define our required services as follows:
Our mobile application needs several screens to provide basic settings management and rewards display/management functionality.
The required screens are:
We’ve already designed our data model, spec’ed our services, and identified our application screens. Since we are tapping into location-based features, we need to discuss and agree on certain design strategies that keep our application mobile friendly, and not one that sucks battery life or lives in the user thread (thus being a bad citizen).
We need a way to identify the places that match the user’s selected categories, and are within the user’s target proximity radius.
Therefore it makes sense for us to create a service that periodically calls the Places search API and returns those places that are within the user’s current proximity. From these results, we can easily filter those that match the user’s selected categories.
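The filtering step is easy to sketch on its own. The Python below assumes each place has already been reduced to a plain dict with name, category, lat, and lon keys. That shape is made up for the example and is not the Places API response format. Distance is computed with the standard haversine formula.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points (haversine formula)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def nearby_matches(places, here, radius_m, categories):
    """Names of places in the selected categories within radius_m of `here`."""
    lat, lon = here
    return [
        place["name"] for place in places
        if place["category"] in categories
        and distance_m(lat, lon, place["lat"], place["lon"]) <= radius_m
    ]


here = (42.0, -87.0)
places = [
    {"name": "Shoe Store", "category": "shoes", "lat": 42.001, "lon": -87.0},
    {"name": "Book Store", "category": "books", "lat": 42.001, "lon": -87.0},
    {"name": "Far Shoes", "category": "shoes", "lat": 43.0, "lon": -87.0},
]
matches = nearby_matches(places, here, radius_m=500, categories={"shoes"})
```

Running the check on a periodic background service, not on every location fix, is what keeps the battery cost down.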
In general, our mobile application will need the following:
And our server-side application will require the following:
Let’s describe some general classes that depict our system design as follows:
For brevity, I am showing classes that belong to all components found in our system, without showing a more detailed architectural, or component diagram.
And we’ll describe the sequence through those classes for the core use cases in retrieving places in proximity as follows:
Note that I did skip some important calls to the Android API to help keep our focus on the main sequence per our use cases.
We’ll start developing the mobile application next.
The following enhancements are my opinions of details required to make this a practical application. Since my intention focuses on location-based concepts, not full application development, some application features are excluded as follows:
In Part 3 we will start tackling the code for our location-based mobile application. See you in Part 3.