Known problems affecting Sakai
Sakai was moved to version 2.6 on May 20, 2009. This section lists only
problems with that new version. See the next section for issues with
the previous release.
Pauses
The Java software on which Sakai depends occasionally pauses to
reorganize its memory. This leads to periods of 20 - 30 seconds
in which Sakai does not respond. Currently we think a given
user would experience this once a week at most. We're
looking to see whether this can be improved, but it may require
a change in software that is not feasible until sometime in 2010.
Before November, we saw longer and more frequent pauses.
Some of them were long enough that users would have been asked
to log in again. (What's actually happening is that the system
is considered down, and users are moved to a different system.
That currently requires them to log in again.)
We're fairly sure we have fixed the situations that caused those.
Downtime
- 10/24. The database was down from 7:36 to 8:06 pm. One of the 4
front ends had to be restarted. The others survived, although it might
have been a few minutes after 8:06 before they were operating smoothly.
We're still investigating, but it looks like a hardware
problem with a disk drive. If we have any other problems, we'll move
to the backup server. That will probably require something like 30 min
of downtime.
- 10/14. Course tabs were missing from 7 - 9 am. Due to a miscommunication,
Sakai lost access to roster information.
- Wed 9/16, morning. We've had very slow response for periods of
about 5 min. There's a problem with a file server that Sakai and other
OIT services use. At this moment we don't know what is going on. I'll
update this entry when we do.
- Tues, 7/28, 8 - 9 am. Downtime was due to a failure of the
database software.
- Tues, 6/9, 9am. Restarted one of the 4 front ends, because
of a problem associated with the move to Java 1.6. That problem
has now been fixed, so this should be the last failure due to it.
- Wed, 6/3, 12:45pm. Restarted one of the 4 front ends. During the
6am restart we inadvertently introduced an inconsistent configuration that
affected a few of the tools.
- Saturday, 5/23, around 10am. Sakai was down for a few minutes.
When we put up the new version of Sakai we also went to the most
recent version of Java, 1.6. The JVM crashed. We've gone back to Java 1.5,
which is what we were using for the old version of Sakai.
Issues from the previous year
Sakai was moved to version 2.5 on May 19, 2008. This is a large enough
change that we're restarting the problem list.
Sakai has been down or slow:
- We're seeing a pattern of pauses of up to 3 minutes on one or another
of our 4 front ends. It doesn't happen very often, maybe every couple of days.
The system continues afterwards without any problem. Because it's so brief,
people typically don't report it. On May 20, as part of the new version,
we're changing to a much newer version of Java that is supposed to have
fixes that are potentially relevant. We'll be watching performance.
- May 6, 2009. Two of the four front ends were unresponsive from 4:10 to 4:30.
The system may have been slow for some time before that.
We can reproduce the problem, so we know what happened: a user was trying
to load data that was invalid, causing Sakai to run out of memory. We'll be
working on this, but the problem with the data was odd enough that
we don't expect it to recur.
- May 4, 2009. One front end was unresponsive for 10 min (7 pm), and
another for 20 min (11:30 pm). We believe users were moved to another
front end automatically. These are the two servers that weren't restarted April 21 and
22. This makes us suspect that we need to restart servers after about
a month of continuous uptime.
- April 22, 2009. Restarted one front end around 9:40 am, because it was
in an inconsistent state. Users on that front end weren't able to get to Resources. It
was otherwise working properly.
This problem may have been present since late yesterday. Again, users
on that one front end (1/4 of our users) would have had to log in again.
- April 21, 2009. One of the 4 front ends started doing
continuous garbage collections, around 4:30pm. We restarted it.
Users would have had to log in again.
- Week of March 3, 2009. We had one to two periods of 30 min a day
with very slow response. This appears to be due to heavy use of
the forums application. It made some database queries that took
a long time. As of 9:38 pm March 8, we improved the database query
so that the system could handle the load.
- March 3, 2009, 6 am - March 4, 12:50 am. This was a scheduled
downtime, to do routine work on the database. Unfortunately we
were unable to bring the database system back up. We had to
reconstruct the database from backups. This is a surprisingly
long process. Since this episode we've set up a backup database
system that is kept in sync with the primary. Changing to it
should take no more than 30 min.
- About 45 min in February, due to a file system filling up on
the database system.
- Nov 17/18, 23:27 - 00:20. This looks like a duplicate of the
problem in September. The system where the database runs failed.
We will be scheduling a move to a backup server, so we can work
on the system.
- About 2 hours in October, when we had trouble bringing the
system up after scheduled network maintenance.
- About 30 min in September, because the system where the database
runs crashed.
Tests and Quizzes
While these aren't exactly bugs, they are things students should
know about tests and quizzes:
- Be careful about opening several windows in your browser. If
you are taking a test, move to a different window, and then come back
to the test, the test may not work. This only matters if both windows
are on Sakai. It's safe to open a separate window and go to CNN, etc.
If you are taking a test and want to look at another part of
Sakai, use "Save and exit". This will save everything from your
test. Then you can go to other places in Sakai. You can come back
to Tests and Quizzes and restart the test. It will put you where
you were before.
- In timed tests, please make sure you submit the test before the
deadline. If you don't, Sakai will try to submit for you, using whatever
data you've already entered. But depending upon the situation this
may not work. It's your responsibility to submit before the deadline.
Note that the time limit is enforced by the server. Nothing you
do in your browser can stop the clock. Some attempts to do so can
cause problems.