

Re: Network disconnect recovery testing

Dermot
QuickFIX Documentation: http://www.quickfixengine.org/quickfix/doc/html/


Hi Mike,

Just getting back to orders that are sent (and don't go anywhere) during network downtime - is there a recommended approach to identifying and cleaning these up? I'm thinking of running a process every minute or so that checks "if currently logged on and order hasn't been acked in the last minute then delete from database". What do you think?

Dermot

----- Original Message -----
From: Mike Gatny <[hidden email]>
To: [hidden email]
Cc: Grant Birchmeier <[hidden email]>; "[hidden email]" <[hidden email]>
Date: 2017/4/14, Fri 00:31
Subject: Re: Network disconnect recovery testing

Dermot,

This is one of the reasons the qf/j implementation has config options like JdbcLogHeartBeats/FileLogHeartbeats, which let you switch off the logging of heartbeats.
I don't think it would be too hard to add this to MySQLLog for qf/c++.
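The qf/j options mentioned above simply keep heartbeats out of the log. A minimal sketch of the same idea for a custom C++ log, assuming the log sees each raw tag=value message as a string (as QuickFIX's `Log::onIncoming()`/`onOutgoing()` do). `msgTypeOf` and `shouldPersist` are hypothetical helpers, not part of QuickFIX's MySQLLog.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: extract MsgType (tag 35) from a raw tag=value FIX
// message whose fields are separated by SOH (0x01). Returns "" if absent.
std::string msgTypeOf(const std::string& raw) {
    const char SOH = '\x01';
    // Match "35=" only at a field boundary, so tags such as 135= are ignored.
    const std::string key = std::string(1, SOH) + "35=";
    std::string::size_type start;
    std::string::size_type pos = raw.find(key);
    if (pos != std::string::npos) {
        start = pos + key.size();
    } else if (raw.compare(0, 3, "35=") == 0) {
        start = 3;  // tag 35 as the very first field
    } else {
        return "";
    }
    std::string::size_type end = raw.find(SOH, start);
    return end == std::string::npos ? raw.substr(start) : raw.substr(start, end - start);
}

// Hypothetical filter a custom log could apply before writing a row:
// heartbeats (35=0) are skipped unless heartbeat logging is switched on.
bool shouldPersist(const std::string& raw, bool logHeartbeats) {
    return logHeartbeats || msgTypeOf(raw) != "0";
}
```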

--
Mike Gatny
Connamara Systems, LLC

On Thu, Apr 13, 2017 at 11:18 AM, <[hidden email]> wrote:
Hi Mike,

Thanks for the quick response. I haven't included PersistMessages in the config file but I can see it defaults to 'N' so that makes sense.

OK, noted on onLogon()/onLogout(). Up to now I've just been catching the event text "Received logon response" and "Disconnecting", which is a bit hacky but gives the same result. The only issue is that I've been trying to keep HeartBtInt=120 so as not to flood MySQL with too many messages. This means it takes a minute or two for QuickFIX to realise there's a network problem and disconnect, and in the meantime my app will still be sending orders. I guess I can't have my cake and eat it, so if I want to catch it sooner I'll have to increase the heartbeat frequency, right?
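To put rough numbers on this trade-off, here is a small sketch. It assumes the thresholds QuickFIX C++ uses in its session-state checks: a TestRequest after about 1.2 times HeartBtInt with nothing received, and a timeout/disconnect after about 2.4 times HeartBtInt. Treat both factors as assumptions and confirm them in the SessionState source of the version you run.

```cpp
#include <cassert>

// Assumed factors (verify against your QuickFIX version's SessionState):
// a TestRequest goes out after ~1.2 * HeartBtInt without inbound traffic,
// and the session is treated as timed out after ~2.4 * HeartBtInt.
constexpr double kTestRequestFactor = 1.2;
constexpr double kTimeoutFactor = 2.4;

struct DetectionWindow {
    double testRequestAfterSec;  // silence before a TestRequest is sent
    double disconnectAfterSec;   // silence before the session disconnects
};

DetectionWindow detectionWindow(int heartBtIntSec) {
    return {kTestRequestFactor * heartBtIntSec, kTimeoutFactor * heartBtIntSec};
}
```

Under those assumed factors, HeartBtInt=120 allows up to roughly 288 seconds of silence before the session drops, while HeartBtInt=30 cuts that to about 72 seconds, at the cost of four times as many heartbeat rows if heartbeats are logged.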

Dermot



----- Original Message -----
From: Mike Gatny <[hidden email]>
To: [hidden email]
Cc: Grant Birchmeier <[hidden email]>; "[hidden email]" <[hidden email]>
Date: 2017/4/13, Thu 23:55
Subject: Re: Network disconnect recovery testing

Dermot,

Have you set PersistMessages=N in your SessionSettings? This is one way to prevent resend (i.e. cause GapFill over App-level msgs).

Another way is to throw DoNotSend in toApp for possdup msgs:

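void App::toApp(FIX::Message& msg, const FIX::SessionID&) throw(FIX::DoNotSend)
{
  // PossDupFlag (tag 43) is set on messages QuickFIX is resending after a
  // ResendRequest; fresh messages normally do not carry it at all.
  try
  {
    FIX::PossDupFlag possDupFlag;
    msg.getHeader().getField(possDupFlag);
    if (possDupFlag)
      throw FIX::DoNotSend();  // QuickFIX sends a GapFill instead of this message
  }
  catch (FIX::FieldNotFound&)
  {
    // No PossDupFlag: a first-time send, let it go out unchanged.
  }
}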

As you say, this is almost certainly the behavior you want for sending orders, and those are the two main ways to achieve it.  If you are doing neither, QF will indeed resend instead of sending GapFills.

If you want to detect a connection problem, you can use onLogon() and onLogout() to alert or set some state.  You can think of onLogout() as "onDisconnect" as well -- it will get called even if you didn't get a clean Logout.
Another way to do it on the fly is:

FIX::Session* pSession = FIX::Session::lookupSession(sessionID);
// lookupSession() returns a null pointer if no such session exists
if (pSession && pSession->isLoggedOn()) { /* ... */ }
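A minimal sketch of the "set some state" approach: a flag that `onLogon()`/`onLogout()` would flip and that the order-sending code checks before submitting anything. `SessionGate` is a hypothetical name; in a real application your `FIX::Application` subclass would forward its two callbacks to it. The members are atomic because the callbacks typically arrive on a QuickFIX thread while orders are sent from your own thread.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

// Hypothetical helper: tracks FIX-level connectivity as seen by the
// Application callbacks. onLogout() also fires on an unclean disconnect.
class SessionGate {
public:
    using Clock = std::chrono::steady_clock;

    void onLogon() {
        lastLogon_.store(Clock::now().time_since_epoch().count());
        loggedOn_.store(true);
    }
    void onLogout() { loggedOn_.store(false); }

    // Check before sending an order; false means the session is down.
    bool canSend() const { return loggedOn_.load(); }

    // Time of the most recent logon (the clock's epoch if never logged on).
    Clock::time_point lastLogon() const {
        return Clock::time_point(Clock::duration(lastLogon_.load()));
    }

private:
    std::atomic<bool> loggedOn_{false};
    std::atomic<Clock::rep> lastLogon_{0};
};
```

Recording the logon time here also gives a clean reference point for any later "orders created before the last logon" check.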

--
Mike Gatny
Connamara Systems, LLC

On Thu, Apr 13, 2017 at 10:37 AM, <[hidden email]> wrote:
Hi Guys, I've been doing some recovery testing and I have a couple of questions.

Firstly, my setup is pretty simple: my trading application (initiator) and sell-side simulator (acceptor) run on two different networked machines, and both are QuickFIX apps. Here's what I'm doing:

1. Establish a connection between initiator and acceptor.
2. Send some orders and ack them on the acceptor side.
3. Break the network connection on the initiator side.
4. Send some more orders from the initiator.
5. Fill existing orders on the acceptor side.
6. Reconnect the initiator to the network.

At this point QuickFIX handles the reconnect and all the executions flow back to the initiator. However, the new orders I sent from the initiator during the downtime are not resent to the acceptor [they are gap filled instead]. Is this the correct behaviour? This is actually exactly the behaviour I want, but I thought QuickFIX would resend them. So that's my first question.

My second question is what is the best way to determine a connection problem before sending any order?

Thanks!
Dermot







------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Quickfix-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/quickfix-developers

Re: Network disconnect recovery testing

K. Frank

Hi Dermot!

On Mon, Apr 17, 2017 at 5:45 AM,  <[hidden email]> wrote:
> ...
> Hi Mike,
>
> Just getting back to orders that are sent (and don't go anywhere) during network downtime - is there a recommended approach to identifying and cleaning these up? I'm thinking to just run a process every minute or so that checks "if currently logged on and order hasn't been acked in the last minute then delete from database". What do you think?

Some words of advice from the trenches ...

Think through your recovery strategy carefully (as you seem to
be doing).

First, if you send an order and receive an ack, then your counter-
party has accepted responsibility for that order.  (Hopefully he
fulfills his responsibility.)

If you try to send an order and don't receive an ack, you don't
know whether your counter-party has it or not.  (Send order,
receive and process order, send ack -- network goes down --
don't receive ack.  Note, in this scenario, your counter-party
doesn't know -- until and unless you request a gap fill -- that
you didn't receive the ack.  FIX doesn't ack acks.)  Note, this
means that you REALLY don't want to delete unacked orders
from your database and forget about them -- they might be
live on your counter-party's side.  So you need, at least, to
put them in some kind of limbo state, and have a protocol
for cleaning them up (or leaving them live when you get your
ack after a reconnect and gap fill).
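One way to encode "don't delete, put it in limbo" is an explicit order-state machine in which an order that was sent but never acknowledged moves to an `Unknown` state on disconnect instead of disappearing. The state names and the `next` transition function are hypothetical, a sketch of the idea rather than anything QuickFIX provides.

```cpp
#include <cassert>

// Hypothetical local order states.
enum class OrderState {
    Sent,       // sent, no ack yet
    Acked,      // counter-party has accepted responsibility
    Unknown,    // sent, connection lost before any ack: may be live remotely
    Withdrawn,  // decided/confirmed not live; kept for audit, never deleted
    Closed      // filled or cancelled
};

enum class Event { Ack, Disconnect, ConfirmedNotLive, Done };

OrderState next(OrderState s, Event e) {
    switch (e) {
    case Event::Ack:
        // An ack always wins: the order is live on the other side, even if it
        // arrives late (e.g. in the resend after reconnect).
        return (s == OrderState::Closed) ? s : OrderState::Acked;
    case Event::Disconnect:
        return s == OrderState::Sent ? OrderState::Unknown : s;
    case Event::ConfirmedNotLive:
        return s == OrderState::Unknown ? OrderState::Withdrawn : s;
    case Event::Done:
        return s == OrderState::Acked ? OrderState::Closed : s;
    }
    return s;
}
```

An ack arriving for an order already marked `Withdrawn` is exactly the case that should raise an alarm, since the local book was wrong.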

You need to decide on a business protocol for handling network
outages.  Some counter-parties will let you (or require you to)
have them auto-cancel any orders (by pre-arrangement, not via FIX)
if connectivity drops.  Note such a "cure" can be worse than the
problem if your network goes down for a few milliseconds or a
few seconds or even a few minutes.  It depends on your use case.

Similarly, you need to decide what you want to do with orders
that you have attempted to issue but haven't sent yet.  Again,
if the outage was only for a few milliseconds or seconds, you
might be best to just send them.

The problem is with orders that you might have sent.  You can't
really decline to resend them (gap-filling over them instead), because
that could be inconsistent with the traffic your counter-party has
already seen.  You could negotiate with your counter-party an
"auto-cancel" policy for new, previously unseen orders that come in
a gap fill.

Let's say you do decide to filter out (somehow) possibly unsent traffic
when gap filling after an outage.  While you might want to "filter out"
new orders (or have your counter-party ignore / auto-cancel them)
you almost certainly do not want to filter out possibly unsent cancel
requests.
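The "drop new orders, keep cancels" rule can be written as a small decision function that a `toApp()` override would consult before throwing `FIX::DoNotSend` for a message carrying PossDupFlag=Y. Plain MsgType strings ("D" = NewOrderSingle, "F" = OrderCancelRequest) stand in for QuickFIX's message classes so the sketch compiles on its own; `shouldSuppressOnResend` is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Hypothetical policy for messages QuickFIX is about to resend (PossDupFlag=Y).
// Returns true if the message should be suppressed, i.e. toApp() should throw
// FIX::DoNotSend so QuickFIX gap-fills over it instead of resending it.
bool shouldSuppressOnResend(const std::string& msgType, bool possDup) {
    if (!possDup) return false;        // first-time traffic is never filtered here
    if (msgType == "D") return true;   // NewOrderSingle: price may be stale
    if (msgType == "F") return false;  // OrderCancelRequest: always let it through
    return false;                      // anything else: resend (conservative default)
}
```

In a real `toApp()`, the MsgType would come from the message header and `possDup` from the PossDupFlag field, as in the earlier toApp example.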

You are right that it makes a lot of sense to monitor whether you
have connectivity with your counter-party.  Note there are several
levels to this:

Business "connectivity" -- e.g., receive acks
FIX connectivity -- heartbeats, isLoggedOn
Network connectivity -- e.g., a free-standing "ping" watchdog
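The three levels can be combined into a single "OK to trade" check. This sketch assumes each level is tracked by its most recent positive signal (oldest outstanding unacked order, last inbound FIX message, last successful network probe) with a staleness limit per level; the structs and the default limits are hypothetical and would need tuning.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;
using std::chrono::seconds;

// Hypothetical snapshot of the three levels of connectivity.
struct Connectivity {
    bool hasUnacked = false;              // business: any order awaiting an ack?
    Clock::time_point oldestUnackedSend;  // business: when the oldest one was sent
    bool loggedOn = false;                // FIX: from onLogon()/onLogout()
    Clock::time_point lastFixMsg;         // FIX: last inbound message, incl. heartbeats
    Clock::time_point lastNetworkProbe;   // network: last successful watchdog ping
};

// Hypothetical staleness limits; tune to your venue and heartbeat interval.
struct Limits {
    seconds ack{10};
    seconds fix{30};
    seconds network{5};
};

// Pause new orders as soon as any level looks unhealthy.
bool okToTrade(const Connectivity& c, const Limits& lim, Clock::time_point now) {
    if (c.hasUnacked && now - c.oldestUnackedSend > lim.ack) return false;
    if (!c.loggedOn || now - c.lastFixMsg > lim.fix) return false;
    if (now - c.lastNetworkProbe > lim.network) return false;
    return true;
}
```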

It does make sense -- to reduce potential gap-fill load and economic
exposure if you don't manage a timely reconnect -- to pause
sending orders on your side if you detect possible connectivity
interruption.  You may or may not wish to queue up such unsent
traffic to send if you reestablish connectivity after a "short" amount
of time.  Note, there is real business logic in how you decide to
handle this.  Would you really not want to send cancel requests
that you generated and got queued up during a (real or imagined)
loss of connectivity?

Repeating two key points:

It is well worth designing your recovery strategy with care.

The details of your strategy will depend on your specific business
use case -- auto-cancellation policies, how long an outage puts
you into a clean-up mode, rather than recovering and proceeding
normally (keeping orders live, etc.).

And last:  Test to the extent you can afford and your business requires.
I have NEVER seen automated recovery work completely correctly in
institutional environments -- even with reasonably well-tested systems.
(I'm not saying it can't happen -- I've just never seen it.)


> Dermot


Good luck.


K. Frank


Re: Network disconnect recovery testing

Dermot


Hi K. Frank,

Really appreciate the comprehensive response. Advice from the trenches is exactly what I need!

So actually, for this business case we do NOT want to resend any orders or cancels that didn't get through during downtime. The order submission is fast-paced and the limit prices may be stale by the time we reconnect. Also, cancels are always linked to new orders. This makes the approach simple: any unacked messages can be 'cancelled' on our side. You're right about deleting orders, that's not a good idea. Instead I will set them to 'Withdrawn' status.

My plan is to have a process that runs 120 seconds after each successful logon [that should be enough time for any missing acks/fills to come back in the resend]. The process checks whether there are any orders in 'Sent' status (unacked) that were created before the last logon time, and updates these orders to 'Withdrawn' status.

Simple, but it should do the job. We definitely need to get unacked orders out of the way, as they will interfere with new orders being sent (we don't allow more than one open order of the same ticker/side). There are no auto-cancel requirements from the broker, so the policy is up to us.

What do you think?

Thanks!
Dermot



----- Original Message -----
From: K. Frank <[hidden email]>
To: Quickfix Developers List <[hidden email]>
Date: 2017/4/17, Mon 21:33
Subject: Re: [Quickfix-developers] Network disconnect recovery testing


Re: Network disconnect recovery testing

K. Frank

Hello Dermot!

On Tue, Apr 18, 2017 at 9:48 AM,  <[hidden email]> wrote:
> Hi K. Frank,
>
> Really appreciate the comprehensive response. Advice from the trenches is
> exactly what I need!
>
> So actually for this business case we do NOT want to resend any orders or
> cancels that didn't get through during downtime. The order submission is
> fast-paced and the limit prices may be stale by the time we reconnect.

This is reasonable.

> Also, cancels are always linked to new orders.

> (we don't allow more than one open order of the same ticker/side)

From this I take it that your trading strategy, roughly speaking,
consists of having outstanding limit orders (possibly on both sides)
for a set of securities, more or less continuously, at least until they
get filled.  You do, however, adjust limit prices by first cancelling such
an order and then issuing a new order with a new limit price (rather
than issuing a single cancel-replace request that, in essence, modifies
the limit price of an existing order).

As a side note: I would imagine that, if only as a matter of operational risk
control, there would be instances where you would want to cancel an
order without then issuing a new order.  It's your business logic, so
maybe you really don't ever do this, but it would seem prudent to have
this capability.  Anyway, if you do ever have "unpaired" cancels, it would
seem that you would want to resend them in the event of a reconnect and
a resend request from your counter-party.

> This makes the approach simple -
> any unacked messages can be 'cancelled' on our side. You're right about
> deleting orders, that's not a good idea. Instead will set them to
> 'Withdrawn' status.

This seems reasonable.  When QuickFIX reconnects for you and responds
to the resend request from your counter-party, it will give you a chance to
intervene before it actually resends a given message.  It does this through
the Application::toApp() callback that you can override in your MyApp (or
whatever) class that you have derived from Application.  You can then tweak
the messages before they are sent, or -- as in your use case -- throw a
DoNotSend exception, in which case QuickFIX will gap-fill the sequence
number (in effect telling your counter-party that the missed message was
an ignorable admin message), rather than resend the message.

> My plan is to have process which runs 120 seconds after each successful
> logon [that should be enough time for any missing acks/fills to get back in
> the resend].

This should work.  However, as a matter of general practice, using this kind
of magic-number wait time to ensure that something has happened should
be avoided when possible.

When you reconnect, your counter-party tells you what sequence number
it's on.  You (i.e., QuickFIX) notice that you've missed some messages so
you (QuickFIX) issue a resend request.  You can then wait until your
counter-party is done with its resend.

I'm not entirely sure how to get QuickFIX to tell you that the resend is done,
though.  The resent messages ought to have their PossDupFlag set, so it
might work to wait until you get a message that was sent post reconnect,
recognizable because its PossDupFlag isn't set.  (QuickFIX queues up out-of-order
messages for you, so messages should be delivered to your application logic
in order after the resend is complete.)  Maybe you could status a known
good order and use its status message as a flag that the resend is complete.

(It would be nice if you could get a callback from QuickFIX or query it directly
somehow to get it to tell you that the resend is complete.  I don't know of a
way to do this, but perhaps there is one.)
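A sketch of detecting "resend complete" without a timer, assuming you can record two numbers at reconnect: the inbound sequence number you expected next before the outage, and the MsgSeqNum on the counter-party's Logon. Everything below the Logon's number is what the resend must deliver, either as resent messages or as SequenceReset-GapFill, so the resend is complete once that range is covered. `ResendTracker` is hypothetical and does not rely on any QuickFIX accessor.

```cpp
#include <cassert>

// Hypothetical tracker for the counter-party's resend after a reconnect.
class ResendTracker {
public:
    // nextExpected: next inbound seq we expected before the outage.
    // logonSeq: MsgSeqNum carried by the counter-party's Logon.
    ResendTracker(int nextExpected, int logonSeq)
        : next_(nextExpected), target_(logonSeq) {}

    // Feed every inbound message in delivery order (QuickFIX already
    // re-orders out-of-sequence messages before handing them over).
    void onMessage(int seqNum) {
        if (seqNum == next_) ++next_;
    }

    // SequenceReset-GapFill: everything below newSeqNo is accounted for.
    void onGapFill(int newSeqNo) {
        if (newSeqNo > next_) next_ = newSeqNo;
    }

    // Complete once every sequence number below the Logon's has arrived.
    bool complete() const { return next_ >= target_; }

private:
    int next_;
    int target_;
};
```

Gap fills are admin messages, so in a QuickFIX application they would have to be fed in from `fromAdmin()` as well as application messages from `fromApp()`.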

> The process checks if there are any orders in 'Sent' status
> (unacked) and created before last logon time - update these orders to
> 'Withdrawn' status.

Yes, so the logic would be something like:

Inspect orders that QuickFIX is resending on your behalf in the toApp callback,
and throw the DoNotSend exception so that you don't resend new orders and
cancel requests that presumably haven't been sent.

Wait until your counter-party's resend is complete (e.g., as outlined above),
and then update unacked orders in your own order store to your proposed
"Withdrawn" status.
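The cleanup step above can be sketched over an in-memory order store: once the resend is known to be complete, any order still in `Sent` state that was created before the latest logon moves to `Withdrawn` (never deleted). The `Order` struct and `sweepUnacked` are hypothetical; in Dermot's setup the same update would run against the MySQL order table.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <vector>

using Clock = std::chrono::system_clock;

enum class Status { Sent, Acked, Withdrawn, Closed };

// Hypothetical local order record.
struct Order {
    std::string clOrdId;
    Status status;
    Clock::time_point createdAt;
};

// Run only after the counter-party's resend is complete.
// Returns the number of orders moved to Withdrawn.
int sweepUnacked(std::vector<Order>& orders, Clock::time_point lastLogon) {
    int n = 0;
    for (auto& o : orders) {
        if (o.status == Status::Sent && o.createdAt < lastLogon) {
            o.status = Status::Withdrawn;  // keep the record for audit
            ++n;
        }
    }
    return n;
}
```

Orders created after the logon are left alone, since their acks may simply still be in flight.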

> Simple but should do the job. Definitely need to get unacked orders out of
> the way as they will interfere with new orders being sent (we don't allow
> more than one open order of the same ticker/side). There's no auto-cancel
> requirements from the broker so the policy is up to us.

You still have a small potential issue.  According to you, after sending the
initial order for a ticker/side, you only send pairs of cancel / new orders.
It's possible that before a disconnect your cancel makes it through, but your
new order doesn't.

So you will want to have a little bit of logic that recognizes this situation
and lets you send another new order without it being part of a cancel / new
order pair (because the cancel was already sent and correctly processed).
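The check described above could look like this: per ticker/side, a standalone new order is allowed only when nothing is live, no cancel is awaiting confirmation, and no paired replacement is still pending. `Slot` and `canSendStandaloneNew` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical per ticker/side bookkeeping.
struct Slot {
    bool hasLiveOrder = false;        // an acked, unfilled order exists
    bool cancelPending = false;       // cancel sent, no confirmation yet
    bool replacementPending = false;  // paired new order sent, not yet acked
};

// True when a new order may be sent on its own, outside a cancel/new pair.
// Covers the case where the cancel got through before the outage but the
// paired new order did not (and has since been withdrawn locally).
bool canSendStandaloneNew(const Slot& s) {
    return !s.hasLiveOrder && !s.cancelPending && !s.replacementPending;
}
```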

> What do you think?

If you (try to) send an order, reconnect, get all of your counter-party's
messages through a proper resend request, and you don't get an ack for that
order, then it's fair to say that your counter-party has told you that he did
not receive the order.

But relying on your counter-party's silence to communicate this could be less
reliable than getting an affirmative statement that the order was not received.

To be redundantly cautious, you could proactively send status requests for your
unacked orders, and if your counter-party never got them, you should get back
some kind of unknown-order message.  I believe that the FIX specification says
you can do this using the ClOrdID (the order id generated on your end and sent
as part of your new order message).  However, bear in mind, some counter-parties
may require that you status the order using the OrderID (the order id assigned by
the counter-party and sent back in the ack).  Well, since you never got an ack,
you wouldn't be able to do it this way.  (As I said, I don't believe that
requiring the OrderID in place of the ClOrdID really meets the spec, but it is
a pitfall to watch out for.)

Similarly, if you tried to send a cancel (in your case, presumably as part of a
cancel / new order pair), but never got a ur-out (cancel confirmation), you could
status the order to confirm that it is still alive, rather than relying on silence
(i.e., the absence of the ur-out) to indicate that the order is alive.
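A sketch of that reconciliation step: pick out every order whose state cannot be trusted (new orders never acked, cancels never confirmed), status each one by ClOrdID, and act only on the affirmative answer. The reply is reduced here to a hypothetical three-way `StatusReply`; mapping a real ExecutionReport (or reject) onto it depends on your counter-party, which is the pitfall described above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical local states and simplified status replies.
enum class Local { Sent, CancelSent, Live, Withdrawn, Cancelled };
enum class StatusReply { UnknownOrder, Live, Cancelled };

struct Order {
    std::string clOrdId;
    Local state;
};

// Orders that need an order status request after a reconnect.
std::vector<std::string> needsStatusCheck(const std::vector<Order>& orders) {
    std::vector<std::string> ids;
    for (const auto& o : orders)
        if (o.state == Local::Sent || o.state == Local::CancelSent)
            ids.push_back(o.clOrdId);
    return ids;
}

// Apply the counter-party's affirmative answer instead of inferring from silence.
Local reconcile(Local current, StatusReply reply) {
    switch (reply) {
    case StatusReply::UnknownOrder:
        // Never reached the counter-party: safe to withdraw an unacked new order.
        // For a pending cancel this is an anomaly: keep the state, flag for review.
        return current == Local::Sent ? Local::Withdrawn : current;
    case StatusReply::Live:
        return Local::Live;  // e.g. a cancel that never arrived: order still working
    case StatusReply::Cancelled:
        return Local::Cancelled;
    }
    return current;
}
```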

Although this kind of double-check requires building some additional application
logic, it will probably be worth it if you're going to trade any significant flow
through your application or your application is going to be in service for more
than, say, a few months.

Trust me, stuff happens.

It always seems cheaper not to invest in this kind of additional development
cost.  Cheaper, at least, until you're on the losing side of a trading error
(and automated-trading errors can blow up in a hurry).

> Thanks!
> Dermot


Happy Trading!


K. Frank


> ----- Original Message -----
> From: K. Frank <[hidden email]>
> To: Quickfix Developers List <[hidden email]>
> Date: 2017/4/17, Mon 21:33
> Subject: Re: [Quickfix-developers] Network disconnect recovery testing
>
> Hi Dermot!
>
> On Mon, Apr 17, 2017 at 5:45 AM,  <[hidden email]> wrote:
>> ...
>> Hi Mike,
>>
>> Just getting back to orders that are sent (and don't go anywhere) during
>> network downtime - is there a recommended approach to identifying and
>> cleaning these up? I'm thinking to just run a process every minute or so
>> that checks "if currently logged on and order hasn't been acked in the last
>> minute then delete from database". What do you think?
>
> Some words of advice from the trenches ...
>
> Think through your recovery strategy carefully (as you seem to
> be doing).
> ...


Re: Network disconnect recovery testing

Dermot


Hi K. Frank,

Firstly, apologies for my very tardy response - I got sidetracked on another project and didn't want to send a cursory reply after the effort you put in.

So you hit the nail on the head: yes, our strategy will be sending limit orders on both sides until they are filled, with limits being changed continuously.  I'm using separate new order/cancel pairs rather than cancel-replace because limit orders may be replaced by a pegged algo (and vice versa), and that change isn't supported by the broker via cancel-replace.

Also, yes there will be cases where orders are cancelled outright (not individually but as a cancel-all function) and you're right it would make sense to have them included as a resend so I'll use the toApp() to control that logic for those orders.

[You still have a small potential issue.  According to you, after sending the initial order for a ticker/side, you only send pairs of cancel / new orders. It's possible that before a disconnect your cancel makes it through, but your new order doesn't] - this won't be a problem as the new order will not be triggered until I get the cancel confirm back.

[My plan is to have process which runs 120 seconds after each successful logon] - this will probably work 95% of the time but yes, point taken - it's hardly bulletproof. I could possibly use the QuickFIX event "resend ..... has been satisfied" to check when a resend has ended, but your suggestion of making order status requests would put all doubt to rest, so I'm going to try that one out.

Automated trading errors.. yes I've seen the damage algos can do, especially the volume driven ones!

Thanks again for the great advice. I'll probably be back with more on this one!

Regards,
Dermot



----- Original Message -----
From: K. Frank <[hidden email]>
To: Quickfix Developers List <[hidden email]>
Date: 2017/4/20, Thu 06:15
Subject: Re: [Quickfix-developers] Network disconnect recovery testing
