[Postgresql] strange pattern in UPDATE/SELECT on one table
> => \d tokens
Table "public.tokens"
Column | Type | Modifiers
–----------+--------------------------+-----------
session_id | character varying | not null
username | character varying | not null
expires | timestamp with time zone | not null
id | character varying | not null
created | timestamp with time zone | not null
Indexes:
"tokens_pk" PRIMARY KEY, btree (id)
I measured the performance of UPDATEs/SELECTs on this table using the primary key in the where clause, and while most of the time the query completes in 2-3 ms, sometimes it can take 500-1000ms (0.3% of the time, but still).
My first two reactions were to REINDEX and DELETE/VACUUM/ANALYZE since old records are useless to me. That didn't change anything.
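For reference, the maintenance was roughly this (assuming "old" simply means expired):
DELETE FROM tokens WHERE expires < now();
VACUUM ANALYZE tokens;
REINDEX TABLE tokens;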
According to my measurements over the last 250k SELECTs/UPDATEs, the average query execution time was 3 ms, with a standard deviation of
Now something worth knowing is that my id is quite large and the query plan is the following:
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Index Scan using tokens_pk on tokens (cost=0.00..8.27 rows=1 width=128)
Index Cond: ((id)::text = '3aaa6faaab3aaadbaaaaaae181faaabc6aaaaa7c7aaaacce1da3aaa1222a5aaa'::text)
(2 rows)
I should add that the load on the server is quite low, there's CPU/RAM to spare and I/O is low also.
Would anyone know what's wrong with that table?
Your indexes are also odd, or rather your complete lack of them on anything but your primary key. You only ever do selects/updates with the id, never with the session_id, username, expires, created, or anything else in the where (or order by) clause?
but I agree it's quite large (64 characters) and could/should very much be reduced by 3/4
the where clause is on (id, expires) 95% of the time and 5% of the time on (id, username, expires) (the latter will change so that it's always on (id, expires))
keep in mind this works well for 99.7% of queries (<20 ms), and the table doesn't have a huge row count (it should oscillate between 5k and 20k)
for indexes, I assumed (maybe wrongly) that having one on id would be sufficient since id is unique and it's only a matter of checking if the token has expired or not
thanks for taking the time to answer
That will very likely grease the weasel all the way to market, but if not, an index on expires or (id, expires) might help too.
Edit: on second thought, the btree on that huge string is most likely sub-optimal, and if not balanced properly could lead to high lookup times for certain values (which could cause the issue), but again the table is small, so even if the tree is the worst possible I don't expect the performance to be that bad even in the worst case. Something to investigate anyway.
Is there anything that differentiates the slow queries from the fast queries? Are they only UPDATEs? Only SELECTs?
the only thing that differs between a fast and a slow query is the value of id… the rest of the clause being "expires > now()"
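So a slow query looks something like this (the column list and the UPDATE's SET clause are just illustrative; the WHERE clause is the point):
SELECT session_id, username
FROM tokens
WHERE id = $1 AND expires > now();

UPDATE tokens
SET expires = now() + interval '30 minutes'  -- interval is illustrative, not my real value
WHERE id = $1 AND expires > now();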
I'll try to see if some values of id are recurrently the source of issues
I didn't specify any caching of the table or index in memory, I'll look into it
@sob:
Edit: on second thought, the btree on that huge string is most likely sub-optimal, and if not balanced properly could lead to high lookup times for certain values (which could cause the issue), but again the table is small, so even if the tree is the worst possible I don't expect the performance to be that bad even in the worst case. Something to investigate anyway.
The ASCII representations for 0..9 and a..f don't lend themselves to a good random distribution, that's for sure.
Caching is another good thing to check for. Unfortunately, I'm not too familiar with PostgreSQL's tuning knobs, but disk will take forever (comparatively speaking).
The more I think about it, the more I like the idea of sticking an index on (id, expires)… if you're just looking for the existence or not of a row where id=bleh,expires>now(), I have reason to believe a btree can just crap that out straight from the index. 'course, if you're doing SELECT session_id,username,etc,etc, you probably won't buy too much except on a miss.
Another question: indexes are btree by default but can be hash if specified.
It seems btree is good at comparisons (<, >) and hash at equality tests (=).
My where clause involving expires will obviously always be a comparison, whereas my id is always an equality test.
Should I have a hash index on id and a btree on expires? Or a btree on (id, expires)?
It's not obvious to me what would work best…
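Concretely, the two shapes I'm hesitating between would be something like this (index names are just placeholders):
-- option 1: split by operator type - hash for the equality on id, btree for the range on expires
CREATE INDEX tokens_id_hash_idx ON tokens USING hash (id);
CREATE INDEX tokens_expires_idx ON tokens (expires);
-- option 2: one composite btree covering the whole where clause (id = ... AND expires > ...)
CREATE INDEX tokens_id_expires_idx ON tokens (id, expires);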
As I don't want to run the risk of having two identical integers, my UNIQUE constraint is on id and expires.
pilotest=> \d tokens
Table "public.tokens"
Column | Type | Modifiers
------------+--------------------------+-----------
session_id | character varying | not null
username | character varying | not null
expires | timestamp with time zone | not null
created | timestamp with time zone | not null
id | integer | not null
Indexes:
"tokens_id_key" UNIQUE, btree (id, expires)
And I still have queries that are extremely slow (some reach 1.5s); things actually seem to have gotten worse.
Most SELECT/UPDATE on that table get executed within 3ms. And some (identical in structure) go beyond 1s. I don't get it.
Still sounds weird, whatever's going on here… is it reproducible for particular queries, or does it happen on a periodic basis (every x seconds/minutes)?
Might be worthwhile watching "vmstat 1" and seeing if there's a spike in I/O around the time of a slow query. Sounds like there's also a fair amount of writes to go along with the reads, so maybe it is picking that moment to sync to disk…
(I really oughta read up on PostgreSQL more. I have a fairly active database running on it, sure, but it hasn't needed a lot of tuning, and most situations like this get covered up by considerable memcacheding.)
Then I removed the index altogether and I still have slow queries, but they're in the 100-200ms range instead of 1s.
I am obviously monitoring I/O, CPU load and others… the server is really underused.
SQL Antipatterns
How closely are you watching IO? This is one of those cases where you almost need as much resolution as you can get; five minutes won't cut it. Watching vmstat on a one-second interval should help cut through the noise and point towards where a problem might be, without capturing too much data and creating its own IO storm.
With that said, I suspect that focusing on the table structure, or indices, or certainly the data types is not going to make any difference. From what I can see you're talking quite small amounts of data, and even full scans probably shouldn't be subject to that much variation. Certainly worrying about an integer vs. 64 character strings as an index, or btree vs. hash indices is way premature optimization at this point (and unlikely to be a space-I/O-performance issue until you have a much larger data set). Yes, a large and frequent update cycle will create some bloat due to dead rows but if that was your problem I'd expect a gradual and overall slowdown in transaction rate, not spikes.
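(If you do want to sanity-check the dead-row angle, the standard statistics view will show it - something along these lines:)
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'tokens';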
My first inclination is that given the sporadic nature and the significant deviations in time, that you're just running into host contention for disk I/O. That is, it's not really your schema structure at all.
However, it's also possible that there are some things you can check for which are increasing the odds of your Linode having a spike in I/O itself, which could then make any host contention worse. A few thoughts (roughly ordered in decreasing priority):
* If you haven't, enable long duration query logging to ensure that the delay is internal to the database. Plus you'll then have timestamps to correlate with other database logging. (There are example settings in the snippet after this list.)
* If you haven't, enable checkpoint logging and see if it correlates at all to the increased timings. If your data patterns are creating large and brief checkpoints that can delay other I/O, you could increase checkpoint frequency and lengthen the checkpoint duration to more evenly distribute the load.
* Do you have autovacuum enabled and is it running? Enable logging for it if not sure. Depending on your access patterns its default configuration may not be aggressive enough for a table that is frequently updated. Or conversely, it may be kicking in too aggressively and interfering a little, so you might wish to take more explicit control and only do the vacuums when you're not doing heavy updates. (Though again, the total quantity of your data would seem to make this not a big deal either way.)
* When checking your query, be sure to use "explain analyze" and not just "explain" so it can include actual results vs. estimates. This is a pretty trivial planner result, but if you have a complicated one, http://explain.depesz.com/ can help highlight bottlenecks.
* Does your Linode have enough memory allocated to buffer/cache, and have you configured PostgreSQL reasonably? Its default configuration is extremely conservative. At a minimum check effective_cache_size so it has a reasonable estimate of the system cache available to it. (Although given your data size and the query plan you posted, I doubt this is an issue specifically in this case.)
* Check what I/O scheduler your kernel is using. If it's defaulting to cfq, change to deadline or even noop. There's little benefit from cfq in a Linode environment (and with a shared RAID array with BBU) and it can delay PostgreSQL I/O in favor of other processes.
* Try testing with fsync=off. Part of the bottleneck may be PostgreSQL waiting on WAL or table syncs, but the Linode RAID+BBU environment makes the need to wait on a sync much less of an issue. I don't think I'd run with fsync=off, but it is a useful test to see if it changes behavior. If you're using 8.3+, synchronous_commit=off should help with performance just as much but be safe from a consistency perspective (and data loss perspective given the BBU in the host RAID).
* Do you have any long lived transactions (I suspect not, but…)? If so, they can keep additional dead rows alive until they exit, since they technically need visibility as of when they started.
* Is anything else on your Linode generating sporadic I/O traffic? You might try something like iotop or collectl to watch per-process I/O at a high frequency while you try to generate a delayed query. If the increase in I/O at that point in time isn't huge, then it's good odds that it's something beyond your control.
* If you can watch your I/O wait % during such a problem and it jumps very high, that's also a good indication that you're purely waiting on I/O - which can be normal in some circumstances - but only if you are generating a lot.
* And finally, in terms of testing the basic setup, another choice is to spin up another Linode, clone your current one over there, and try the same tests. If you get different behavior, it again may point at heavier than usual host load. Another similar alternative would be to set up a VM at home with the same configuration - where you know there is no additional contention for disk I/O - and see how the same configuration and data performs there.
Assuming you find a discrepancy compared to your current Linode under the same configuration, at that point you could open a ticket and Linode will probably offer to migrate you to another host. Or you could just do it yourself by switching to your second Linode as primary and deleting the first if feasible.
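To make a few of those concrete, here's roughly what I'd check or set. The parameter names are standard PostgreSQL settings; the thresholds are just starting points, and the logging ones go in postgresql.conf followed by a reload rather than a per-session SET:
-- in postgresql.conf: log queries slower than 20 ms, plus checkpoint/autovacuum activity
-- log_min_duration_statement = 20
-- log_checkpoints = on
-- log_autovacuum_min_duration = 0

-- what the planner currently thinks it has to work with
SHOW effective_cache_size;
SHOW shared_buffers;

-- the safer alternative to fsync=off on 8.3+, settable per session
SET synchronous_commit = off;

-- and compare estimates vs. reality for one of the slow ids ('...' is whichever value you caught)
EXPLAIN ANALYZE SELECT 1 FROM tokens WHERE id = '...' AND expires > now();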
– David
Thank you for all these suggestions. I enabled query logging for long queries as a first step, I'll see tomorrow what the results are when users come back to the site.
Then I'll try to play with the linux I/O scheduler and fsync as you advised.
Thanks!
@sob:
I switched to a 32 bit integer for id, generated by rand().
why rand instead of using auto_increment?
though if you go auto_increment, do not use another value in the primary key
@sob:
I'm measuring I/O with iostat every 5 min indeed. Tried to run "vmstat 1" but there's really nothing going on, a few blocks out here and there but not much.
5 minute cycles are not going to be anywhere near granular enough for something like this. You'll want at least every second, or more frequently if the tool permits and you're able to reproduce the issue in a relatively tight time frame.
One thing I do sometimes is use a tool like vmstat, iostat, etc.. in a small shell loop where I let it run once a second for maybe 60 seconds, then generate a timestamp and then go back and loop again. To avoid polluting the I/O with the stat logging itself, just leave an ssh connection running the script and log the information on a remote host.
That way if you have a timestamp (in this case from the database long duration logging) you can find the nearest timestamp in the stat output logging and see what was happening in the vicinity.
Oh, and to your prior response, if you're enabling any logging, you may as well ensure that all of long duration queries, checkpoints and autovacuums are all logged - no reason to wait to at least turn on the logging for diagnostic purposes.
– David
rand() so that they're not predictable
db3l
of course 5 min is not granular enough, but I did run vmstat 1 on screen after your comment, trying to find spikes in I/O, and didn't see anything crazy
I enabled logging for long queries and reduced other I/O from other sources on the guest (such as debug logging of a couple of custom apps), that way I hope to verify where the delays come from (postgres or outside)
autovacuum is disabled (I prefer to run it myself at night, and there's not much to vacuum anyway as I move rows only at night in my archiving batch, and VACUUM ANALYZE after)
@sob:
of course 5 min is not granular enough, but I did run vmstat 1 on screen after your comment, trying to find spikes in I/O, and didn't see anything crazy
Did you get one of the longer queries occurring while you were running the vmstat? If so, and if there wasn't any noticeable increase in I/O load at the time, that's probably a reasonably good indicator that you're running into contention on the host external to your own guest. Though in that case I would expect you to see a noticeable spike in I/O wait percentage (final column in vmstat).
Of course, if you didn't have a long query simultaneous with the vmstat logging, then that particular run isn't all that useful.
– David
I'm logging queries > 20 ms in Postgresql, and the queries that php logs as >200ms (for instance) don't even show up as >20ms in the postgresql logs. That goes in the direction of contention for host IO right?
Checkpoints and long queries don't coincide, although I must admit I'm not sure what to make of such debug:
2011-05-09 16:30:00 EDT LOG: checkpoint complete: wrote 194 buffers (6.3%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=38.671 s, sync=0.002 s, total=38.675 s
I have 3 segments per checkpoint, of 16 MB each. So does this mean the 2*16MB write was spread over 38s?
I set up "vmstat 1 60 && date" in a loop to see if I get huge I/O spikes around the time the long queries take place.
Thanks again.
@sob:
I'm logging queries > 20 ms in Postgresql, and the queries that php logs as >200ms (for instance) don't even show up as >20ms in the postgresql logs. That goes in the direction of contention for host IO right?
Actually, no, that argues for no contention, unless your PHP script is also doing a lot of I/O. In other words, the database is able to do all of its work (including I/O) for all the queries made while that logging threshold was in place, without exceeding the threshold, which means that out of those 200ms, less than 10% is the database and disk I/O.
I suppose if I were you I might very briefly drop the threshold a bit further - something that has to be exceeded - just to make sure the logging was in fact working.
What sort of processing is the PHP code doing - are there any potential bottlenecks or ways it could get delayed?
> I'm not sure what to make of such debug:
2011-05-09 16:30:00 EDT LOG: checkpoint complete: wrote 194 buffers (6.3%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=38.671 s, sync=0.002 s, total=38.675 s
I have 3 segments per checkpoint, of 16 MB each. So does this mean the 2*16MB write was spread over 38s?
Not quite. You get a checkpoint when either checkpoint_segments WAL segments have been written or the checkpoint timeout is exceeded (whichever comes first). But how much data needs to be forced out at checkpoint time depends on how well the background writer has been keeping up with the actual changes.
See
At the point of a checkpoint, any remaining data that hasn't made it to the actual table/index files is flushed, at which point the WAL records are no longer needed for recovery (a checkpoint record is written to the WAL at that point). It is also at that point that earlier WAL segment files are permitted to be re-used.
In terms of the checkpoint, the log indicates how many remaining buffers were dirty and needed to be written, and the overall time to do so. The latter aims to spread your checkpoint over a time equal to the configured completion target as a percentage of the overall checkpoint timeout. This helps spread the I/O overhead.
So in your case, at the checkpoint time you had 194 remaining buffers to flush. I think a buffer is 8K, so that's about 1.5MB of data. The I/O was spread over 38 seconds. And the final fsync to ensure it was on the disk took almost no time. That latter point in particular is a good indication that there is little contention, since otherwise you'd block there waiting for the disk to fully acknowledge the sync.
If you wanted to minimize the checkpoint spike even further you could increase the checkpoint frequency (either by dropping the timeout or segment limit) which would then most likely have less dirty data to flush each time. But this is a pretty tiny amount of data.
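For reference, the relevant knobs would look something like this (the SHOW lines just display current values; the example settings are purely illustrative, go in postgresql.conf, and assume an 8.x-era config where checkpoint_segments still exists):
SHOW checkpoint_segments;          -- default is 3 (16 MB WAL segments)
SHOW checkpoint_timeout;           -- default is 5min
SHOW checkpoint_completion_target; -- default is 0.5

-- e.g. to checkpoint more often and spread the write-out further:
-- checkpoint_timeout = 2min
-- checkpoint_completion_target = 0.7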
To be honest, if the database isn't logging any queries beyond 20ms I'm not sure your issue is at the database level at all. You might need to work your way up the stack, and perhaps add some intermediate logging in your PHP code to try to determine where the remaining 180+ms is coming from in the longer queries. To minimize intrusion of the monitoring you could just store a timestamp at various points in the code path and then have a single top-level handler dump all the timestamps/deltas as part of some final output/log.
This assumes that you've definitely caught one of those 1000ms+ queries that started the thread. If not, then I suppose those extreme cases might still be contention within the database layer.
-- David
The following portion of code is what all my queries go through
// wall-clock start in seconds (microtime() returns "usec sec")
$starttime = explode(' ', microtime());
$starttime = $starttime[1] + $starttime[0];

$res = pg_query_params($dbconn, $query, $arguments);
if (!$res) {
    // ...
}

// elapsed time around the query only; log anything over 20 ms
$mtime = explode(' ', microtime());
$totaltime = $mtime[0] + $mtime[1] - $starttime;
if (1000*$totaltime > 20) {
    mylog(LOG_DEBUG, "dbfunc.inc", "query execution time", round(1000*$totaltime, 0)." ms ($query)");
}
mylog() formats the string and sends it to syslog… that's how I see which queries take more than 20ms
I'm using pgpool2 as a connection pooling service, which is supposed to reduce latency (and apparently it does the job well… if pgpool2 was the culprit I would see delays on all tables, not just "tokens")
now monitoring with vmstat, will see what the results are after tomorrow…
php is making a few queries per page viewed, writing a few <10k png's occasionally, really not much… and it's not like the 20ms I measure are from beginning to end of the page, it's really just around the execution of the query!
@sob:
I am certain I am logging queries above 20ms in postgresql as I see my VACUUM and others.
The only conclusion I think I can take from this is that if you have a query your application code is logging as taking 200ms, but there's no long duration log in the PostgreSQL logs, that over 80% of that time was above the level of the database engine. About the only thing that such logging can't account for is the network I/O between processes on the local host for a query and response but I have a really hard time believing that's anywhere but in the noise. Well, that or the connection setup itself if that's happening per connection.
Whether that's in the PHP code, in the libraries you are using to access the database, or in an intermediate layer like pgpool, I can't say, but it doesn't seem to be in the database, and by extrapolation, anything below that like the host disk subsystem.
This is different than how I read your original query which seemed to indicate logging showing it was the database itself periodically spiking and taking longer to process queries.
> php is making a few queries per page viewed, writing a few <10k png's occasionally, really not much… and it's not like the 20ms I measure are from beginning to end of the page, it's really just around the execution of the query!
Of course, execution of the query involves a lot more than just the database, given that you have the PHP db libraries, and then pgpool.
I suppose if I were in your shoes, I'd jettison pgpool, if only because it's a variable you can do without (and it's not like your scale likely needs it), and at least testing without it would be interesting. Connection setup/teardown with PostgreSQL is definitely high overhead comparatively speaking, so sharing longer lived connections is definitely a good thing in general though.
Otherwise, I guess I'd have to tear down the application code path and try to figure out just where the latency is coming from if it isn't the database. Probably with a bunch of logging since the problem only occurs sporadically. Maybe it'll turn out to be something like occasional garbage collection latency in the PHP interpreter, though that seems extreme to push up into the 1s range like some of your extreme cases.
It would be interesting to get timing from the actual database connection setup (pre-query). Do you use host names in pg_hba.conf and IP connections to the database even for local connections? I think that may involve a reverse DNS lookup, and maybe that's varying? But I'd think all of that would be very quick for localhost connections anyway.
Definitely odd, but also more and more looking like higher level than the basic database engine and schema.
– David
Postgresql didn't log it as above 20ms.
And vmstat shows this at the exact execution time of the query. There's I/O but it's not outrageous (I/O is zero around these seconds):
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 1 6376 44120 25540 216888 0 0 0 112 124 147 0 0 90 10
0 0 6376 42632 25540 216892 0 0 0 112 586 701 5 0 83 12
I'll bypass pgpool2 tonight when users are gone, and try to look at a higher level as advised.
Only local connections so far.
Thanks
@sob:
And vmstat shows this at the exact execution time of the query. There's I/O but it's not outrageous (I/O is zero around these seconds):
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 1 6376 44120 25540 216888 0 0 0 112 124 147 0 0 90 10
0 0 6376 42632 25540 216892 0 0 0 112 586 701 5 0 83 12
I will at least say that I/O wait values of 10% and 12% for those two seconds do show you are waiting on I/O a bit, and that's a bit higher than I'd expect for that quantity of data. Assuming this is 1s granularity, your 800ms logging is below the level of these stats, so those delays could account for it.
The wait isn't outrageous, particularly if it goes away quickly, but I was just watching one of my systems for a bit (reasonably idle) and see cases where it is able to push 2-4x that number of blocks out while still showing 100% idle and 0% wait. (Then again, I just caught one single entry with an I/O wait of 10 for 220 blocks, so…)
So host contention isn't completely off the table (isn't debugging fun). What is interesting is that whatever delay is occurring doesn't seem to be happening at or below the database engine, or it would delay the actual query execution, which is still intriguing, in terms of just where your overall request path is being delayed.
-- David
@sob:
I guess as a side question, if someone knows a tool that may be more suitable than jmeter to simulate load with authenticated users, it would help (I would need jmeter to be able to alternately select POST values for user/pwd from a list, for example, which doesn't seem possible now, at least to the best of my knowledge).
Since the problem is below the client request layer, how about just tweaking the top level PHP code to imagine that it's been receiving the requests? E.g., just write a top level loop (with any values/delays you want) and then execute the request handling path code just as it would be with a real web request.
You could also write some code to actually behave like a client, but that seems like it would be more work with very little diagnostic benefit.
Ideally you'd be able to generate a few delayed cases over a very short period (like no more than minutes). Having a controlled way to replicate the issue without depending on user traffic would be a big win when it comes to narrowing this down further.
Then you can re-run the same process under different scenarios and on different machines.
– David